
The awk Utility


foy


  1. CS465 - Unix The awk Utility

  2. Background • awk was developed by • Aho, Weinberger, and Kernighan (of K & R) • Was further extended at Bell Labs • Handles simple data-reformatting jobs easily with just a few lines of code. • Versions • awk - original version • nawk - new awk - improved awk • gawk - gnu awk - improved nawk

  3. How awk works • awk commands include patterns and actions • Scans the input line by line, searching for lines that match a certain pattern (or regular expression) • Performs a selected action on the matching lines • awk can be used: • at the command line for simple operations • in programs or scripts for larger applications

  4. Running awk
     • From the command line:
          $ awk '/pattern/ {action}' file
     • Or from an awk script file:
          $ cat awkscript
          # This is a comment
          /pattern/ {action}
          $ awk -f awkscript file

  5. awk's Format using Input from a File
          $ awk '/pattern/' filename
     • awk will act like grep: it prints every line that matches the pattern
          $ awk '{action}' filename
     • awk will apply the action to every line in the file
          $ awk '/pattern/ {action}' filename
     • awk will apply the action to every line in the file that matches the pattern


  9. Records and Fields
     • awk divides the input into records and fields
     • Each line is a record (by default)
     • Each record is split into fields, delimited by a special character (whitespace by default)
     • The delimiter can be changed with -F or FS

                    field-1   field-2   field-3
                       |         |         |
                       v         v         v
     record 1 ->    George    Jones     Admin
     record 2 ->    Anthony   Smith     Accounting

  10. awk field variables • awk creates variables $1, $2, $3… that correspond to the resulting fields (just like a shell script). • $1 is the first field, $2 is the second… • $0 is a special field which is the entire line • NF is always set to the number of fields in the current line (no dollar sign to access)
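A minimal sketch of the field variables in use, run on the two records from the previous slide. The use of $NF (the last field, since NF holds the field count) is an addition not shown on the slide.

```shell
#!/bin/sh
# Two sample records taken from the "Records and Fields" slide.
# $1 and $2 are the first two fields, NF is the field count,
# and $NF (an addition here) refers to the last field.
result=$(printf 'George Jones Admin\nAnthony Smith Accounting\n' |
    awk '{ print NF ": " $2 ", " $1 " (" $NF ")" }')
echo "$result"
```

Each input line should produce one output line that starts with the field count, followed by the last name, the first name, and the final field in parentheses.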

  11. Example #1
          $ cat students
          Bill White 7777771 1980/01/01 Science
          Jill Blue 1111117 1978/03/20 Arts
          Ben Teal 7171717 1985/02/26 CompSci
          Sue Beige 1717171 1963/09/12 Science
          $
          $ awk '/Science/{print $1, $2}' students
          Bill White
          Sue Beige
          $
     • The comma indicates that the output fields should be separated by a space; without it, the fields are concatenated:
          $ awk '/Science/{print $1 $2}' students
          BillWhite
          SueBeige

  12. Example #2
          $ cat phonelist
          Joe Smith 774-0888
          Mary Jones 772-2345
          Hank Knight 494-8888
          $
          $ awk '{print "Name: ", $1, $2, \
                 " Telephone:", $3}' phonelist
          Name:  Joe Smith  Telephone: 774-0888
          Name:  Mary Jones  Telephone: 772-2345
          Name:  Hank Knight  Telephone: 494-8888
          $
     • No pattern given, so the action matches ALL lines
     • Text strings to print are placed in double quotes

  13. Example #3
     • Given a username, display the person's real name:
          $ grep small /etc/passwd
          small000:x:1164:102:Faculty - Pam Smallwood:/export/home/small000:/bin/ksh
          $
          $ awk -F: '/small000/{print $5}' /etc/passwd
          Faculty - Pam Smallwood
          $

  14. awk using Input from Commands • You can run awk in a pipeline, using input from another command: $ command | awk '/pattern/ {action}' • Takes the output from the command and pipes it into awk which will then perform the action on all lines that match the pattern

  15. Piped awk Input Example
          $ w
          1:04pm up 25 day(s), 5:37, 6 users, load average: 0.00, 0.00, 0.01
          User tty login@ idle JCPU PCPU what
          pugli766 pts/8 Tue10pm 3days -ksh
          lin318 pts/17 10:58am 1:45 vi choosesort
          small000 pts/18 12:43pm w
          mcdev712 pts/10 11:52am 14 1 vi adddata
          gibbo201 pts/12 12:15pm 18 -ksh
          nelso828 pts/16 7:17pm 17:43 -ksh
          $
          $ w | awk '/ksh/{print $1}'
          pugli766
          gibbo201
          nelso828
          $

  16. Relational Operators
     • awk can use relational operators ( <, >, <=, >=, ==, != ) to compare a field to a value; the ! operator negates a pattern or expression
     • If the outcome of the comparison is true, then the action is performed; if no action is given, the matching record is printed
     • Examples:
     • To print every record in the log.txt file in which the second field is larger than 10:
          $ awk '$2 > 10' log.txt
     • To print every record in the log.txt file which does NOT contain 'Win32':
          $ awk '!/Win32/' log.txt
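A small sketch of these comparisons on made-up data. The three log lines below are invented for illustration, and the third command combines a string comparison with a numeric one using &&.

```shell
#!/bin/sh
# Hypothetical log contents, invented for this sketch.
log='alpha 5 Win32
beta 12 Linux
gamma 40 Win32'

# Second field numerically greater than 10 (no action, so the record is printed).
big=$(printf '%s\n' "$log" | awk '$2 > 10')

# Records that do NOT match the regular expression /Win32/.
nowin=$(printf '%s\n' "$log" | awk '!/Win32/')

# String comparison on one field combined with a numeric test on another.
both=$(printf '%s\n' "$log" | awk '$3 == "Win32" && $2 >= 10 { print $1 }')

echo "$big"
echo "$nowin"
echo "$both"
```

Note that string values are compared with == against a quoted string, while /regex/ tests the whole record.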

  17. Relational Operator Example
          $ who
          pugli766 pts/8  Jun 3 22:24 (da1-229-38-103.den.pcisys.net)
          lin318   pts/17 Jun 6 10:58 (12-254-120-56.client.attbi.com)
          small000 pts/18 Jun 6 13:16 (mackey.rbe36-213.den.pcisys.net)
          mcdev712 pts/10 Jun 6 11:52 (ip68-104-41-121.lv.lv.cox.net)
          gibbo201 pts/12 Jun 6 12:15 (12-219-115-107.client.mchsi.com)
          nelso828 pts/16 Jun 5 19:17 (65.100.138.177)
          $
          $ who | awk '$4 < 6 {print $1, $3, $4, $5}'
          pugli766 Jun 3 22:24
          nelso828 Jun 5 19:17
          $

  18. Piping awk output
          $ who
          pugli766 pts/8  Jun 3 22:24 (da1-229-38-103.den.pcisys.net)
          lin318   pts/17 Jun 6 10:58 (12-254-120-56.client.attbi.com)
          small000 pts/18 Jun 6 13:16 (mackey.rbe36-213.den.pcisys.net)
          mcdev712 pts/10 Jun 6 11:52 (ip68-104-41-121.lv.lv.cox.net)
          gibbo201 pts/12 Jun 6 12:15 (12-219-115-107.client.mchsi.com)
          nelso828 pts/16 Jun 5 19:17 (65.100.138.177)
          $
          $ who | awk '$4 == 6 {print $1}' | sort
          gibbo201
          lin318
          mcdev712
          small000
          $

  19. awk Programming
     • An awk program is a list of rules
     • Each rule is applied sequentially to each line (record)
     • Example:
          /pattern1/ { action1 }
          /pattern2/ { action2 }
          /pattern3/ { action3 }

  20. awk - pattern matching
     • Before processing, lines can be matched with a pattern:
          /pattern/ { action }     execute action if the line matches pattern
     • The pattern is a regular expression.
     • Examples:
          /^$/        { print "This line is blank" }
          /num/       { print "Line includes num" }
          /[0-9]+$/   { print "Integer at end:", $0 }
          /[A-Za-z]+/ { print "String:", $0 }
          /^[A-Z]/    { print "Starts w/uppercase letter" }
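A minimal sketch of several regular-expression rules applied to the same input. The three input lines are invented, and LC_ALL=C is set here only as a precaution so that [A-Z] means the ASCII uppercase letters.

```shell
#!/bin/sh
# Invented input: one line ending in digits, one blank line, one lowercase word.
# Every rule is tested against every line, so a line can trigger several actions.
result=$(printf 'Total 42\n\nabc\n' | LC_ALL=C awk '
/^$/      { print "blank" }
/[0-9]+$/ { print "ends with digits: " $0 }
/^[A-Z]/  { print "starts uppercase: " $0 }
')
echo "$result"
```

The first line matches two rules and so produces two output lines, the blank line matches one rule, and the last line matches none.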

  21. awk program from a file
     • The awk commands (program) can be placed into a file
     • The -f option (lowercase f) indicates that the commands come from the file whose name follows it
          $ awk -f awkfile datafile
     • The contents of the file called awkfile will be used as the commands for awk

  22. Example 1
          $ cat students
          Bill White 333333 1980/01/01 Science
          Jill Blue 333444 1978/03/20 Arts
          Bill Teal 555555 1985/02/26 CompSci
          Sue Beige 555777 1963/09/12 Science
          $ cat awkprog
          /5?5/ {print $1, $2}
          /3*4/ {print $5}
          $
          $ awk -f awkprog students
          Arts
          Bill Teal
          Sue Beige
          $
     • NOTE: All patterns are applied to each line before moving on to the next line

  23. Example 2
          $ cat students
          Bill White 333333 1980/01/01 Science
          Jill Blue 333444 1978/03/20 Arts
          Bill Teal 555555 1985/02/26 CompSci
          Sue Beige 555777 1963/09/12 Science
          $ cat awkprog
          /Science/ {print "Science stu:", $1, $2}
          /CompSci/ {print "Computing stu:", $1, $2}
          $
          $ awk -f awkprog students
          Science stu: Bill White
          Computing stu: Bill Teal
          Science stu: Sue Beige
          $

  24. More about Patterns • Patterns can be: • Empty: will match everything • Regular expressions: /reg-expression/ • Boolean Expressions: $2=="foo" && $7=="bar" • Ranges: /jones/,/smith/

  25. Example - Boolean Expressions
          $ cat students
          Bill White 333333 1980/01/01 Science
          Jill Blue 333444 1978/03/20 Arts
          Bill Teal 555555 1985/02/26 CompSci
          Sue Beige 555777 1963/09/12 Science
          $ cat awkprog
          $3 <= 444444 {print "Not counted"}
          $3 > 444444 {print $2 ",", $1}
          $
          $ awk -f awkprog students
          Not counted
          Not counted
          Teal, Bill
          Beige, Sue
          $

  26. Example - Ranges
          $ cat students
          Bill White 333333 1980/01/01 Science
          Jill Blue 333444 1978/03/20 Arts
          Bill Teal 555555 1985/02/26 CompSci
          Sue Beige 555777 1963/09/12 Science
          $
          $ awk '/333333/,/555555/' students
          Bill White 333333 1980/01/01 Science
          Jill Blue 333444 1978/03/20 Arts
          Bill Teal 555555 1985/02/26 CompSci
          $

  27. More Built-In awk Variables
     • Two types: Informative and Configuration
     • Informative:
          NR = Current record number (starts at 1)
               • Counts ALL records, not just those that match
          NF = Number of fields in the current record
          FILENAME = Current input data file
               • Undefined in the BEGIN block

  28. Example using NF
          $ cat names
          Pam Sue Laurie
          Bob Joe Bill Dave
          Joan Jill

          $
          $ awk '{print NF}' names
          3
          4
          2
          0
          $

  29. Example using a Boolean expression, NF, and NR
          $ cat names
          Pam Sue Laurie
          Bob Joe Bill Dave
          Joan Jill

          $
          $ awk 'NF > 2 {print NR ":", NF, "fields"}' names
          1: 3 fields
          2: 4 fields
          $

  30. Built-in awk functions
          log(expr)        natural logarithm
          index(s1,s2)     position of string s2 in string s1
          length(s)        string length
          substr(s,m,n)    n-char substring of s starting at m
          tolower(s)       converts string to lowercase
          printf()         print formatted - like C printf
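A short sketch that calls each of the functions listed above on one sample line. The input string "Hello World" is an arbitrary choice for illustration.

```shell
#!/bin/sh
# One sample input line; each statement exercises one built-in function.
result=$(echo 'Hello World' | awk '{
    print length($0)            # number of characters in the line
    print index($0, "World")    # 1-based position of "World" in the line
    print substr($0, 7, 5)      # 5 characters starting at position 7
    print tolower($1)           # first field converted to lowercase
    printf("%.3f\n", log(1))    # natural logarithm of 1
}')
echo "$result"
```

String positions in awk start at 1, which is why index() and substr() agree on position 7 for "World".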


  32. print & printf • Use print in an awk statement to output specific field(s) • printf is more versatile • works like printf in the C language • May contain a format specifier and a modifier

  33. Format Specification
     • A format specification consists of a percent symbol, a modifier, width and precision values, and a conversion character
     • To display the third field as a floating point number with two decimal places:
          awk '{printf("%.2f\n", $3)}' file
     • You can include additional text in the printf statement:
          awk '{printf("3rd value: %.2f\n", $3)}' file

  34. Specifiers, Width, Precision, & Modifiers
     • Type specifiers:
          %c   single character
          %d   integer (decimal)
          %f   floating point
          %s   string
     • Between the % and the specifier you can place the width and precision
          %6.2f means a floating point number in a field of width 6 with two decimal places
     • Modifiers control details of appearance:
          -    minus sign left-justifies the output (the default is right justification)
          +    plus sign forces the appearance of a sign (+ or -) on numeric output
          0    zero pads a right-justified number with zeros
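A minimal sketch of width, precision, and the three modifiers in action. The input line is made up, and the square brackets are there only to make the padding visible.

```shell
#!/bin/sh
# Invented input: a word, a floating point value, and an integer.
result=$(echo 'widget 3.14159 42' | awk '{
    printf("[%-10s]\n", $1)    # string left-justified in a field of width 10
    printf("[%6.2f]\n", $2)    # width 6, two decimal places, right-justified
    printf("[%+d]\n", $3)      # sign always shown
    printf("[%05d]\n", $3)     # zero-padded to width 5
}')
echo "$result"
```

Unlike print, printf does not add a newline on its own, so each format string ends with \n.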

  35. awk Variables
     • No need for declaration
     • Implicitly set to 0 AND the empty string
     • A variable's type is a combination of floating-point number and string
     • A variable is converted as needed, based on its use
          title = "Number of students"
          no = 100
          weight = 13.4
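A brief sketch of the implicit initialization and conversion described above. The variable name unset_var is hypothetical; the program runs entirely in a BEGIN block, so no input file is needed.

```shell
#!/bin/sh
# unset_var is never assigned: it acts as "" in string context and 0 in numeric context.
result=$(awk 'BEGIN {
    print "[" unset_var "]"     # used as a string: empty
    print unset_var + 5         # used as a number: 0
    no = "100"                  # a string that looks like a number
    print no + 1                # converted to a number for arithmetic
    weight = 13.4
    print "Weight: " weight     # converted to a string for concatenation
}')
echo "$result"
```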


  37. awk program execution
          BEGIN { ... }            Executes only once, before reading input data
          { ... }                  Executes for each input line
          specification { ... }    Executes for each input line that matches the specified /pattern/ or Boolean expression
          END { ... }              Executes once at the end, after all lines have been processed

  38. Example #1: Count # lines in file
          $ cat awkprog
          BEGIN {total = 0}
          {total = total + 1}
          END {print total " lines"}
          $ cat testfile
          Hello There
          Goodbye!
          $ awk -f awkprog testfile
          2 lines
          $
     • BEGIN sets total to 0 before processing any lines
     • For every line in the file, {total = total + 1} is executed
     • END prints total after all lines have been processed

  39. Ex #2: Count lines containing a pattern
     • /pattern/ { totalpattern++ } only executes if the pattern appears in the current line of the file
          $ cat Simpsons
          Marge   34
          Homer   32
          Lisa    10
          Bart    11
          Maggie  01
          $ cat countthem
          BEGIN {totalMa = 0; totalar = 0}
          /Ma/ { totalMa++ }
          /ar/ { totalar++ }
          END { print totalMa " Ma's"
                print totalar " ar's" }
          $
          $ awk -f countthem Simpsons
          2 Ma's
          2 ar's
          $

  40. Example #3: Add line numbers
          $ cat numawk
          BEGIN { print "Line numbers by awk" }
          { print NR ":", $0 }
          END { print "Done processing " FILENAME }
          $ cat testfile
          Hello There
          Goodbye!
          $
          $ awk -f numawk testfile
          Line numbers by awk
          1: Hello There
          2: Goodbye!
          Done processing testfile
          $

  41. More Built-In awk Variables
     • Two types: Informative and Configuration
     • Configuration:
          FS  = Input field separator
          OFS = Output field separator
                (default for both is space " ")
          RS  = Input record separator
          ORS = Output record separator
                (default for both is newline "\n")
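A minimal sketch of setting FS and OFS together in a BEGIN block. The two comma-delimited records are borrowed from the names file used later in these slides, and the " | " output separator is an arbitrary choice.

```shell
#!/bin/sh
# FS controls how input records are split; OFS is what print puts
# between arguments that are separated by commas.
result=$(printf 'Bill Jones,3333,M\nPam Smith,5555,F\n' |
    awk 'BEGIN { FS = ","; OFS = " | " } { print $2, $1 }')
echo "$result"
```

Because FS is a comma, "Bill Jones" stays together as a single field even though it contains a space.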

  42. Example #1: Reverse 2 columns
          $ cat switch
          BEGIN {FS="\t"}
          {print $2 "\t" $1}
          $ awk -f switch Simpsons
          34      Marge
          32      Homer
          10      Lisa
          11      Bart
          01      Maggie
          $
     • NOTE: Columns are separated by tabs
     • Alternatively, you could do the following (the quotes keep the shell from stripping the backslash):
          $ awk -F'\t' '{print $2 "\t" $1}' Simpsons

  43. Example #2: Sum a column
          $ cat awksum2
          BEGIN { FS = "\t"
                  sum = 0 }
          { sum = sum + $2 }
          END { print "Done"
                print "Total sum is " sum }
          $
          $ awk -f awksum2 Simpsons
          Done
          Total sum is 88
          $

  44. Example #3: Comma delimited file
          $ cat names
          Bill Jones,3333,M
          Pam Smith,5555,F
          Sue Smith,4444,F
          $
          $ awk -F, '{print $2}' names
          3333
          5555
          4444
          $

  45. Longer awk program
          $ cat awkprog
          BEGIN { print "Processing..." }
          # print number of fields in first line
          NR == 1 { print $0, NF, "fields" }
          /^Unix/ { print "Line starts with Unix: ", $0 }
          /Unix$/ { print "Line ends with Unix: " $0 }
          # finishing it up
          END { print NR " lines checked" }
          $

  46. awk program execution
          $ cat datfile
          First Line
          Unix is great!
          What else is better?
          This is Unix
          Yes it is Unix
          Goodbye!
          $
          $ awk -f awkprog datfile
          Processing...
          First Line 2 fields
          Line starts with Unix:  Unix is great!
          Line ends with Unix: This is Unix
          Line ends with Unix: Yes it is Unix
          6 lines checked
          $

  47. awk programming language syntax
          # awk has no true/false keywords; use 1 and 0
          if ( found == 1 )            # if (expr)
              print "Found"            #     action1
          else                         # else
              print "Not found"        #     action2

          while ( i <= 100 ) {         # while (cond) {
              i = i + 1                #     actions ...
              print i
          }                            # }

  48. awk programming language syntax
          do {                            # do {
              i = i + 1                   #     actions ...
              print i
          } while ( i < 100 )             # } while (cond)

          for ( i = 1; i < 10; i++ ) {    # for (set; test; incr) {
              sqr = i * i                 #     actions
              print i " squared is " sqr
          }                               # }
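A small sketch that combines for, if/else, and while in one action, looping over the fields of a single made-up input line. The loop variables i and n are chosen only for this illustration.

```shell
#!/bin/sh
# Invented input line of three numbers.
result=$(echo '4 9 16' | awk '{
    # Visit every field of the record, from $1 up to $NF.
    for (i = 1; i <= NF; i++) {
        if ($i % 2 == 0)
            print $i " is even"
        else
            print $i " is odd"
    }
    # Count upward until n squared reaches the last field.
    n = 0
    while (n * n < $NF)
        n++
    print "sqrt of " $NF " is " n
}')
echo "$result"
```

Using NF as the loop bound lets the same action handle records with any number of fields.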

  49. awk – longer example
     • Write an awk program that prints out the contents of a directory in the following format:
               BYTES    FILE
               24576    copyfile
                 736    copyfile.c
                 740    copyfile.c~
               24576    dirlist
                 989    dirlist.c
                 977    dirlist.c%
               24576    envadv
                 185    envadv.c
               <dir>    tmp
                 740    x.c

           Total: 73684 bytes in
           9 regular files

  50. awk example - code
          $ cat awkprog
          BEGIN { print " BYTES \t FILE"; sum = 0; filenum = 0 }
          # test for lines starting with -
          /^-/ { sum += $5
                 ++filenum
                 printf("%10d \t%s\n", $5, $9) }
          # test for directories - line starts with d
          /^d/ { print " <dir> \t", $9 }
          # conclusion
          END { print "\n Total: " sum " bytes in"
                print " " filenum " regular files" }
          $
