How to remove the BOM from a UTF-8 file?

Updated: 2023-11-27 11:13:16

A BOM (byte order mark) is the Unicode code point U+FEFF; its UTF-8 encoding is the three bytes 0xEF, 0xBB, 0xBF.
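
Since the BOM is just those three bytes, you can test for it by looking at the start of the file. The following is a minimal sketch. It assumes `head -c` (available in GNU and BSD head) and `od`. The helper name `has_utf8_bom` and the demo file names are hypothetical.

```shell
#!/bin/sh
# has_utf8_bom FILE (hypothetical helper): succeeds if FILE starts with EF BB BF.
has_utf8_bom() {
  # Read the first three bytes and print them as lowercase hex, spaces removed.
  first3=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
  [ "$first3" = "efbbbf" ]
}

# Demo files: one with a BOM (written as octal escapes), one without.
printf '\357\273\277hello\n' > with_bom.txt
printf 'hello\n' > no_bom.txt

has_utf8_bom with_bom.txt && echo "with_bom.txt: BOM present"
has_utf8_bom no_bom.txt || echo "no_bom.txt: no BOM"
```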

With bash (4.2 or later), you can create a UTF-8 BOM with the $'...' quoting form, which understands Unicode escapes: $'\uFEFF'. So with bash, a reliable way to remove a UTF-8 BOM from the beginning of a text file is:

sed -i $'1s/^\uFEFF//' file.txt

This will leave the file unchanged if it does not start with a UTF-8 BOM, and otherwise remove the BOM.
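
The claim that only a leading BOM is removed, and that a second run changes nothing, can be checked on a throwaway file. This sketch assumes GNU sed (for `-i`). The BOM bytes are produced with `printf` octal escapes rather than $'\uFEFF', so the demo does not depend on bash or the locale. The file name `demo.txt` is hypothetical.

```shell
#!/bin/sh
# Assumes GNU sed for -i; BOM bytes come from printf octal escapes.
bom=$(printf '\357\273\277')

# Line 1 starts with a BOM; line 2 also starts with one.
printf '%shello\n%sworld\n' "$bom" "$bom" > demo.txt

sed -i "1s/^$bom//" demo.txt   # removes the BOM from line 1 only
sed -i "1s/^$bom//" demo.txt   # second run: no leading BOM left, nothing changes

# Dump the bytes: line 1 no longer begins with EF BB BF, line 2 still does.
od -An -tx1 demo.txt
```

The `1` address and the `^` anchor together restrict the substitution to the very start of the file. A U+FEFF elsewhere is left alone.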

If you are using another shell, you might find that "$(printf '\ufeff')" produces the BOM character (this works with zsh, and with any shell lacking a printf builtin provided that /usr/bin/printf is the GNU version). If you want a POSIX-compatible version, you could use:

sed "$(printf '1s/^\357\273\277//')" file.txt

(The -i in-place edit flag is also a GNU extension; this version writes the possibly modified file to stdout.)
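
Because the POSIX form only writes to stdout, one common way to get an in-place effect without `-i` is to write to a temporary file and then move it over the original. This is a sketch: the helper name `strip_bom`, the `.tmp.$$` suffix and the demo file name are hypothetical choices.

```shell
#!/bin/sh
# strip_bom FILE (hypothetical helper): portable "in-place" BOM removal.
strip_bom() {
  tmp="$1.tmp.$$"   # temporary file next to the original
  # Only replace the original if sed succeeded.
  sed "$(printf '1s/^\357\273\277//')" "$1" > "$tmp" && mv "$tmp" "$1"
}

printf '\357\273\277hello\n' > portable.txt
strip_bom portable.txt
od -An -c portable.txt   # the leading EF BB BF bytes should be gone
```

`mv` replaces the original file, so its permissions and any hard links are not carried over. Using `cat "$tmp" > "$1" && rm "$tmp"` instead writes back into the original inode.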
