Windows、Mac与Unix的换行符与编码转换
参考《鸟哥的linux私房菜》。
1. 换行符⚓
Windows系统的文件的换行符是CR
(^M)与LF
($),在linux中表现为:
[sink@dev vitest]$ cat -A .gitignore
autorun.inf^M$
uchkvk.pif^M$
可以使用dos2unix
来与linux格式相互转换。
1.1 安装⚓
1.1.1 从镜像获取⚓
挂载centos镜像,安装:
$ rpm -ivh dos2unix-6.0.3-4.el7.x86_64.rpm
1.1.2 从yum获取⚓
$ yum search dos2unix
dos2unix.x86_64 : Text file format converters
1.2 使用⚓
dos2unix/unix2dos/mac2unix/unix2mac [options] [file ...] [-n infile outfile ...]
-k, --keepdate keep output file date
-n, --newfile write to new file
infile original file in new file mode
outfile output file in new file mode
-o, --oldfile write to old file
file ... files to convert in old file mode
[sink@dev vitest]$ cat -A .gitignore
autorun.inf^M$
uchkvk.pif^M$
# linux
[sink@dev vitest]$ dos2unix -n .gitignore .gitignore-linux
dos2unix: converting file .gitignore to file .gitignore-linux in Unix format ...
[sink@dev vitest]$ cat -A .gitignore-linux
autorun.inf$
uchkvk.pif$
# Windows
[sink@dev vitest]$ unix2dos -n .gitignore-linux .gitignore-Windows
unix2dos: converting file .gitignore-linux to file .gitignore-Windows in DOS format ...
[sink@dev vitest]$ cat -A .gitignore-Windows
autorun.inf^M$
uchkvk.pif^M$
# macOS
[sink@dev vitest]$ unix2mac -n .gitignore-linux .gitignore-mac
unix2mac: converting file .gitignore-linux to file .gitignore-mac in Mac format ...
[sink@dev vitest]$ cat -A .gitignore-mac
autorun.inf^Muchkvk.pif^M
[sink@dev vitest]$ file .gitignore-*
.gitignore-linux: ASCII text
.gitignore-mac: ASCII text, with CR line terminators
.gitignore-Windows: ASCII text, with CRLF line terminators
1.3 其它字符转换方法⚓
tr [OPTION]... SET1 [SET2]
从标准输入 转换、去重、删除字符,写入标准输出
-c, -C, --complement 使用完整的 SET1
-d, --delete 删除 SET1 中的字符
-s, --squeeze-repeats 对相连的单个字符去重
SETs:
\\ backslash
\b backspace
\n new line
\r return
\t horizontal tab
CHAR1-CHAR2 all characters from CHAR1 to CHAR2 in ascending order
[CHAR*] in SET2, copies of CHAR until length of SET1
[CHAR*REPEAT] REPEAT copies of CHAR, REPEAT octal(八进制) if starting with 0
[:alnum:] all letters and digits
[:alpha:] all letters
[:blank:] all horizontal whitespace
[:digit:] all digits
[:lower:] all lower case letters
[:space:] all horizontal or vertical whitespace
[:upper:] all upper case letters
[:xdigit:] all hexadecimal digits
[=CHAR=] all characters which are equivalent to CHAR
[root@dev vitest]# echo abcdABCD | tr [:lower:] [:upper:]
ABCDABCD
[root@dev vitest]# echo abcdABCD | tr [a-z] [A-Z]
ABCDABCD
[root@dev vitest]# echo abcdABCD | tr [a-z] [1-9]
1234ABCD
[root@dev vitest]# cat -A .gitignore-Windows | tr -d '^M'
autorun.inf$
uchkvk.pif$
[root@dev vitest]# cat .gitignore-Windows | tr -d '\r' | cat -A
autorun.inf$
uchkvk.pif$
[root@dev vitest]# echo aaabbbb | tr -s a
abbbb
[root@dev vitest]# echo aaabbbb | tr -s a b
b
2. 编码⚓
在Windows系统中转换编码可以使用记事本的另存为
、Nodepade++
等工具。
iconv [OPTION...] [FILE...]
Convert encoding of given files from one encoding to another.
-f, --from-code=NAME encoding of original text
-t, --to-code=NAME encoding for output
-l, --list list all known coded character sets
-o, --output=FILE 输出到新的文件。默认更改原文件
# 先在Windows中编写文件,放到linux中
# 打开新终端设置GBK编码,此时可正常查看文件
[sink@dev vitest]$ cat charactor.txt
abcd
大运河
[sink@dev vitest]$ iconv -f gbk -t utf-8 -o charactor-utf8.txt charactor.txt
# 再打开新的终端设置为UTF-8编码,查看文件
[sink@dev vitest]$ cat charactor-utf8.txt
abcd
大运河
# 注意:file命令并不能准确的识别到Windows的文件编码
[sink@dev vitest]$ file -i charactor*
charactor.txt: text/plain; charset=iso-8859-1
charactor-utf8.txt: text/plain; charset=utf-8