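sed is pretty awesome, but even awesome tools have their limits. gawk exists to handle the problems sed can't.
gawk goes beyond editor commands and offers a full programming language for processing text. The basic command format is:
gawk options program file
Let's go straight to the examples.
First, suppose we have a text file named data: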
No.1 Google Gmail
No.2 Microsoft Windows
No.3 SAP ERP
No.4 Intel Core
No.5 Cisco Router
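Run the command and look at the output:

$ gawk '{print $1}' data
No.1
No.2
No.3
No.4
No.5

See what happened? The command prints the first column of the text, using whitespace (the default separator) to split each line into fields. Printing the second column works the same way:

$ gawk '{print $2}' data
Google
Microsoft
SAP
Intel
Cisco

What happens if we use '$0' instead?

$ gawk '{print $0}' data
No.1 Google Gmail
No.2 Microsoft Windows
No.3 SAP ERP
No.4 Intel Core
No.5 Cisco Router

Clearly, $0 stands for the whole line, so the entire file is printed.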
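To make the field variables a bit more concrete, here is a minimal sketch that prints two fields in swapped order. The temporary file and the `AWK` fallback variable are assumptions added for portability; they are not part of the original example.

```shell
# Use gawk when it is installed, otherwise fall back to the system awk (assumption).
AWK=$(command -v gawk || command -v awk)

# Recreate the sample data from the article in a temporary file.
data=$(mktemp)
cat > "$data" <<'EOF'
No.1 Google Gmail
No.2 Microsoft Windows
No.3 SAP ERP
No.4 Intel Core
No.5 Cisco Router
EOF

# Print the third field followed by the second field of every line.
# The comma in print inserts the output field separator (a space by default).
swapped=$("$AWK" '{print $3, $2}' "$data")
echo "$swapped"
rm -f "$data"
```

Any combination of $1, $2, $3 and $0 can be mixed in a single print statement this way.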
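Good. With this feature, analyzing system files becomes much easier. Take the passwd file: reading it raw is a bit painful, but now that we have gawk, let's start by printing just the first column.

$ gawk -F: '{print $1}' /etc/passwd
root
daemon
bin
sys
sync
games
man
lp
mail
news
uucp
proxy
www-data
backup
list
irc
gnats
nobody
systemd-timesync
systemd-network
systemd-resolve
systemd-bus-proxy
syslog
......

That result is satisfying, but note the -F option used here: it specifies the field separator that splits each line into data fields. The ':' right after -F means the colon is the separator.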
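The -F behavior can be tried without touching the real /etc/passwd. The sketch below uses two hypothetical passwd-style records (made up for illustration) and the same `AWK` fallback assumption, picking out the login name and the shell.

```shell
# Use gawk when it is installed, otherwise fall back to the system awk (assumption).
AWK=$(command -v gawk || command -v awk)

# Hypothetical passwd-style records, so the example does not depend on the local system.
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'

# -F: splits each line on colons; $1 is the login name and $7 the login shell.
shells=$(printf '%s\n' "$sample" | "$AWK" -F: '{print $1, $7}')
echo "$shells"
```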
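gawk can of course process data coming from a pipe as well:

$ echo "I have a pen" | gawk '{$4="apple"; print $0}'
I have a apple

This command replaces the fourth word with "apple" and then prints the whole line. Note: $4="apple" must use double quotes. With single quotes, the shell ends its quoted string early, gawk sees apple as an uninitialized (empty) variable, and the output is just "I have a".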
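The quoting pitfall is easy to reproduce. This sketch runs both variants side by side and wraps each result in brackets so the trailing space is visible; the `AWK` fallback variable is an assumption for portability.

```shell
# Use gawk when it is installed, otherwise fall back to the system awk (assumption).
AWK=$(command -v gawk || command -v awk)

# Double quotes inside the program: "apple" is a string constant.
good=$(echo "I have a pen" | "$AWK" '{$4="apple"; print $0}')

# Single quotes close the shell string early, so the program gawk receives is
# {$4=apple; print $0} and apple is an unset variable with an empty value.
bad=$(echo "I have a pen" | "$AWK" '{$4='apple'; print $0}')

printf '[%s]\n' "$good" "$bad"
```

The second result still has four fields; the fourth one is simply empty, which is why a separator space is left at the end.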
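Like sed, gawk lets you type the program in line by line:

$ gawk '{
> $4="apple"
> print $0}'

   apple

   apple

Because no input file was given, gawk reads from standard input. There is no original text, so every time you press Enter, gawk sets the fourth field of the empty line to apple and prints it. Press Ctrl+D to end the run.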
Create a script file named script:

{ print $1 "'s home directory is " $6}

Run it and look at the result:

$ gawk -F: -f script /etc/passwd
root's home directory is /root
daemon's home directory is /usr/sbin
bin's home directory is /bin
sys's home directory is /dev
sync's home directory is /bin
games's home directory is /usr/games
man's home directory is /var/cache/man
lp's home directory is /var/spool/lpd
mail's home directory is /var/mail
news's home directory is /var/spool/news
uucp's home directory is /var/spool/uucp
proxy's home directory is /bin
www-data's home directory is /var/www
backup's home directory is /var/backups
list's home directory is /var/list
irc's home directory is /var/run/ircd
gnats's home directory is /var/lib/gnats
nobody's home directory is /nonexistent
systemd-timesync's home directory is /run/systemd
systemd-network's home directory is /run/systemd/netif
systemd-resolve's home directory is /run/systemd/resolve
systemd-bus-proxy's home directory is /run/systemd
syslog's home directory is /home/syslog
_apt's home directory is /nonexistent
messagebus's home directory is /var/run/dbus
uuidd's home directory is /run/uuidd
lightdm's home directory is /var/lib/lightdm
whoopsie's home directory is /nonexistent
avahi-autoipd's home directory is /var/lib/avahi-autoipd
avahi's home directory is /var/run/avahi-daemon
dnsmasq's home directory is /var/lib/misc
colord's home directory is /var/lib/colord
speech-dispatcher's home directory is /var/run/speech-dispatcher
hplip's home directory is /var/run/hplip
kernoops's home directory is /
pulse's home directory is /var/run/pulse
rtkit's home directory is /proc
saned's home directory is /var/lib/saned
usbmux's home directory is /var/lib/usbmux
comac's home directory is /home/comac
sshd's home directory is /var/run/sshd
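Of course, the script can also be written like this:

{
text="'s home directory is "
print $1 text $6
}

Note: put each command on its own line. With one command per line, it doesn't matter whether you add semicolons.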
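For a quick check of the -f workflow, the sketch below writes the two-line program to a temporary file and feeds it a single hypothetical passwd-style record. The temporary file name and the `AWK` fallback variable are assumptions; the program body follows the article.

```shell
# Use gawk when it is installed, otherwise fall back to the system awk (assumption).
AWK=$(command -v gawk || command -v awk)

# Write the program to a temporary file (the article names this file "script").
script=$(mktemp)
cat > "$script" <<'EOF'
{
    text = "'s home directory is "
    print $1 text $6
}
EOF

# One hypothetical passwd-style line as input; -f reads the program from the file.
result=$(echo 'root:x:0:0:root:/root:/bin/bash' | "$AWK" -F: -f "$script")
echo "$result"
rm -f "$script"
```

Inside a program file the quoting is simpler than on the command line, because the shell never sees the awk source.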
The BEGIN and END keywords are easy to understand: a BEGIN block runs before gawk processes any input, and an END block runs after all input has been processed.
Let's look at an example:

$ gawk 'BEGIN { print "This is the key word BEGIN" } { print $0}' data
This is the key word BEGIN
No.1 Google Gmail
No.2 Microsoft Windows
No.3 SAP ERP
No.4 Intel Core
No.5 Cisco Router

The BEGIN text is printed before the contents of data. Now let's see what END does:

$ gawk 'BEGIN { print "This is the key word BEGIN" } { print $0} END { print "This is the key word END"}' data
This is the key word BEGIN
No.1 Google Gmail
No.2 Microsoft Windows
No.3 SAP ERP
No.4 Intel Core
No.5 Cisco Router
This is the key word END
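One way to see that BEGIN and END are tied to the start and end of input, not to individual lines, is to compare empty input with a two-line input. This is only an illustrative sketch; the messages and the `AWK` fallback variable are made up for the example.

```shell
# Use gawk when it is installed, otherwise fall back to the system awk (assumption).
AWK=$(command -v gawk || command -v awk)

prog='BEGIN {print "start"} {print "line: " $0} END {print "done"}'

# With empty input the main block never runs, but BEGIN and END still do.
empty=$(printf '' | "$AWK" "$prog")

# With two input lines the main block runs once per line, between BEGIN and END.
two=$(printf 'a\nb\n' | "$AWK" "$prog")

echo "$empty"
echo "$two"
```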
Simple, right? To finish, let's experience the real power of gawk, and you'll see how awesome this tool is.
Create a script file named script:

BEGIN {
    print "The latest list of users and shells"
    print "Userid shell "
    print "---------------------------------------------"
    FS=":"
}
{
    print $1 "\t\t\t" $7
}
END {
    print "---------------------------------------------"
    print "This is the end of the list"
}

Output:

$ gawk -f script /etc/passwd
The latest list of users and shells
Userid shell 
---------------------------------------------
root /bin/bash
daemon /usr/sbin/nologin
bin /usr/sbin/nologin
sys /usr/sbin/nologin
sync /bin/sync
games /usr/sbin/nologin
man /usr/sbin/nologin
lp /usr/sbin/nologin
mail /usr/sbin/nologin
news /usr/sbin/nologin
uucp /usr/sbin/nologin
proxy /usr/sbin/nologin
www-data /usr/sbin/nologin
backup /usr/sbin/nologin
list /usr/sbin/nologin
irc /usr/sbin/nologin
gnats /usr/sbin/nologin
nobody /usr/sbin/nologin
systemd-timesync /bin/false
systemd-network /bin/false
systemd-resolve /bin/false
systemd-bus-proxy /bin/false
syslog /bin/false
_apt /bin/false
messagebus /bin/false
uuidd /bin/false
lightdm /bin/false
whoopsie /bin/false
avahi-autoipd /bin/false
avahi /bin/false
dnsmasq /bin/false
colord /bin/false
speech-dispatcher /bin/false
hplip /bin/false
kernoops /bin/false
pulse /bin/false
rtkit /bin/false
saned /bin/false
usbmux /bin/false
comac /bin/bash
sshd /usr/sbin/nologin
---------------------------------------------
This is the end of the list

It produces such a clean, readable report. That's it for now; time for a short break.
[1] Linux Command Line and Shell Scripting Bible 2nd Edition. Richard Blum, Christine Bresnahan. WILEY Press.
Reposted from: http://wzhii.baihongyu.com/