Updated: 2023-11-27 11:13:16
A BOM is Unicode code point U+FEFF; its UTF-8 encoding consists of the three bytes 0xEF, 0xBB, 0xBF.
With bash, you can create a UTF-8 BOM with the $'' special quoting form, which implements Unicode escapes: $'\uFEFF'. So with bash, a reliable way of removing a UTF-8 BOM from the beginning of a text file would be:
sed -i $'1s/^\uFEFF//' file.txt
This will leave the file unchanged if it does not start with a UTF-8 BOM, and otherwise remove the BOM.
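To confirm whether a file actually starts with the BOM before or after running the command, one option is to look at its first three bytes. The following is a minimal sketch, not part of the original answer: the function name has_bom and the file name demo.txt are made up for illustration, and it relies only on the standard od and tr utilities.

```shell
# Hypothetical helper: succeeds (exit status 0) if the file begins with EF BB BF.
has_bom() {
    # od: -An drops the offset column, -tx1 prints each byte as hex,
    # -N3 reads only the first three bytes. tr strips the padding.
    [ "$(od -An -tx1 -N3 "$1" | tr -d ' \n')" = "efbbbf" ]
}

# Build a small test file that starts with a UTF-8 BOM (octal 357 273 277).
printf '\357\273\277hello\n' > demo.txt

if has_bom demo.txt; then
    echo "demo.txt starts with a UTF-8 BOM"
else
    echo "demo.txt has no UTF-8 BOM"
fi
```

Running the same check again after the sed command gives a quick way to see that the BOM is gone.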
If you are using some other shell, you might find that "$(printf '\ufeff')" produces the BOM character (that works with zsh as well as any shell without a printf builtin, provided that /usr/bin/printf is the GNU version), but if you want a POSIX-compatible version you could use:
sed "$(printf '1s/^\357\273\277//')" file.txt
(The -i in-place edit flag is also a GNU extension; this version writes the possibly-modified file to stdout.)
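Since the portable command writes to stdout, changing the file itself needs an intermediate file. One way to do that is sketched below, as an illustration rather than part of the original answer: the function name strip_bom is made up, and mktemp is assumed to be available (it is very common, but not guaranteed on every system).

```shell
# Hypothetical helper: remove a leading UTF-8 BOM from the named file
# without relying on sed -i.
strip_bom() {
    file=$1
    tmp=$(mktemp) || return 1    # assumes mktemp exists on this system

    # Same sed script as above: printf turns the octal escapes into the
    # three BOM bytes, and the substitution applies to line 1 only.
    if sed "$(printf '1s/^\357\273\277//')" "$file" > "$tmp"; then
        # Copy the contents back rather than renaming, so the original
        # file keeps its permissions and ownership.
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

It would be called as strip_bom file.txt. A file that does not start with a BOM is rewritten with identical contents.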