Linux Utilities COMP075 OS2
Command Line Utilities • UNIX use predated modern GUI shells • Command line utilities provided users with flexibility and power without use of GUI • Still very useful today • Can be scripted • Can be used through lightweight remote access • Can be used on servers in run level 3 • Faster (to write and to run)
sed • Stream Editor • Edits data read from files or standard input, writes to standard output • Uses regex extensively • One way of using: • sed reads a file (or a file piped to sed) • Applies edits • Output is redirected to a new file • Redirecting output to the input file clobbers it (the shell truncates the file before sed reads it) • Rename files when done (maybe delete the original)
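A minimal sketch of the file workflow above, assuming a scratch directory from mktemp; the file name words.txt and the edit itself are only illustrations. sed writes to a new file, which is then renamed over the original instead of being redirected onto it.

```sh
# Work in a throwaway directory so nothing real is touched
dir=$(mktemp -d)
printf 'colour\nflavour\n' > "$dir/words.txt"

# Edit into a NEW file: 'sed ... words.txt > words.txt' would let the shell
# truncate words.txt before sed ever read it, leaving it empty
sed 's/our$/or/' "$dir/words.txt" > "$dir/words.txt.new"

# Replace the original only after sed has finished successfully
mv "$dir/words.txt.new" "$dir/words.txt"
cat "$dir/words.txt"
```

GNU sed's -i option performs the same write-then-rename step internally.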
History • Based on line editors used in teletype days • Users could print one line at a time, then apply edits to that line • Edited line written back to file • Edit commands could be scripted • sed uses those same one-line-at-a-time editing commands, but reads the file for you and applies edits to every line
sed in Pipelines • sed is often used in pipelines • Records arrive on standard input • sed applies edits and writes to standard output • Standard output may be piped to another utility to further process the records
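A small pipeline sketch with sed in the middle: records arrive from printf on standard input, sed edits each one, and sort receives sed's standard output. The key=value data is made up for illustration.

```sh
# printf -> sed (edit each record) -> sort (further processing)
result=$(printf 'b=2\na=1\nc=3\n' | sed 's/=/ is /' | sort)
echo "$result"
```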
sed command sed options script files • Options control general operation: -r Use extended regular expressions • -e script • Provides a sed script • Script • Taken from the first non-option argument when there is no -e option • Files • Concatenated to provide input • If missing or “-”, sed reads standard input
Options • --version • --help • -n • Don't print the pattern space automatically; output only when a command (such as p) asks for it • -e script • Provides a sed script to run • Can be given multiple times • -f script-file • Script is read from the file
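A sketch of -n, repeated -e and -f on a three-line sample; the variable names and the temporary script file are hypothetical, chosen only for the demonstration.

```sh
input='one
two
three'

# -n suppresses automatic printing; the p command prints line 2 explicitly
second=$(printf '%s\n' "$input" | sed -n '2p')

# Several -e options are joined into one script and applied in order
both=$(printf '%s\n' "$input" | sed -e 's/one/1/' -e 's/three/3/')

# -f reads the script from a file (a temporary one here)
script=$(mktemp)
printf 's/two/2/\n' > "$script"
fromfile=$(printf '%s\n' "$input" | sed -f "$script")
rm -f "$script"

echo "$second"; echo "$both"; echo "$fromfile"
```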
Basic Operation • sed reads a record from input • The record goes in the pattern space without its trailing newline • There is also a hold space used by some commands • The script's commands are executed on the pattern space • Unless prevented (e.g. by -n), the edited record is written to output with the newline restored • The pattern space is then cleared (normally)
sed Scripts • A sed script can be a single sed command, or a text file or list of multiple commands • Commands are separated by newlines • To specify multiple commands on the sed command line you can: • Use multiple -e options • Separate the commands with “;” • Use a multiple line command
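The three ways of passing several commands can be compared directly; this sketch (GNU sed assumed) gives the same two substitutions each way and should produce identical output.

```sh
# 1. Multiple -e options
a=$(echo 'cat dog' | sed -e 's/cat/lion/' -e 's/dog/wolf/')
# 2. Commands separated by ;
b=$(echo 'cat dog' | sed 's/cat/lion/; s/dog/wolf/')
# 3. A multiple line command: one sed command per line inside the quotes
c=$(echo 'cat dog' | sed '
s/cat/lion/
s/dog/wolf/
')
echo "$a"; echo "$b"; echo "$c"
```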
Multi Line Command
sed '
> first command
> another
> last one
> '
• After the unclosed ' on the first line, BASH uses the '>' prompt to ask for continuation lines
• The last line has the closing ', so BASH runs the command once that line is entered
Quoting sed Commands • Lots of sed commands contain special characters that would otherwise need escaping for the shell • Remember the regex s/\\/\// that was confusing because of the escape characters? • To avoid extra shell escaping, get in the habit of enclosing sed commands in single quotes • Inside single quotes the shell passes every character through unchanged, so nothing needs shell escaping (sed's own regex escapes are still needed)
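A sketch of the backslash-to-slash regex from the slide, wrapped in single quotes so the shell leaves it alone. The second command relies on sed taking any character after s as the delimiter, which removes the need to escape the slash. The Windows-style path is invented for the example; printf '%s\n' is used instead of echo because some shells' echo would interpret the backslashes.

```sh
path='C:\dir\file'

# Single quotes: sed receives s/\\/\//g exactly; \\ is a literal backslash, \/ a literal slash
a=$(printf '%s\n' "$path" | sed 's/\\/\//g')

# Using | as the delimiter means the slash no longer needs escaping
b=$(printf '%s\n' "$path" | sed 's|\\|/|g')

echo "$a"; echo "$b"
```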
Basic Command Syntax • Old line editors preceded commands with a line address • sed retains this feature, although most of the time commands are applied to all lines • Basic command is: [address]command • Address may specify a range but only for some commands • Commands may be grouped inside braces to apply address to entire group
Grouped Commands
address{
command
command
command
}
• Portable scripts put the closing brace on its own line (GNU sed also accepts it after a “;”)
• Commands are separated by newlines
• Can separate with “;” but hard to read, better to use newlines
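A sketch of an address applied to a group: for lines starting with "#" both substitutions run, and every other line passes through untouched. The heading-style input is hypothetical.

```sh
result=$(printf '# title\nbody\n# note\n' | sed '/^#/{
s/^# *//
s/$/:/
}')
echo "$result"
```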
Addresses • number – apply to that line number • /regexp/I – apply to lines matching regexp • Optional “I” does a case-insensitive match • address,address • Specifies a range • address,+number • The matching line plus the next number lines • address! • Apply to every line that does NOT match the address
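One command per address form, as a sketch on a five-line sample. The I modifier and the address,+number form are GNU sed extensions, so this assumes GNU sed as shipped with Linux.

```sh
data='alpha
Beta
gamma
delta
epsilon'

r1=$(printf '%s\n' "$data" | sed -n '3p')           # line number
r2=$(printf '%s\n' "$data" | sed -n '/^b/Ip')       # regexp, case-insensitive
r3=$(printf '%s\n' "$data" | sed -n '2,4p')         # address,address range
r4=$(printf '%s\n' "$data" | sed -n '/gamma/,+1p')  # address,+number
r5=$(printf '%s\n' "$data" | sed '1!d')             # delete every line except line 1
```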
s/// • This is the most useful sed command • Basically the same as regex search and replace • Syntax is s/search/replace/flags • The first character after s is the delimiter, so s|search|replace| also works • Replace can be empty, which deletes the matched text
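A few s commands as a sketch: a plain replacement, a ":" delimiter to avoid escaping slashes in a path, and an empty replacement that deletes the match. The inputs are arbitrary examples.

```sh
r1=$(echo 'hello world' | sed 's/world/there/')         # basic search and replace
r2=$(echo '/usr/local/bin' | sed 's:/usr/local:/opt:')  # ':' chosen as the delimiter
r3=$(echo 'keep-DELETE-this' | sed 's/-DELETE//')       # empty replacement deletes the match
```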
s Flags • g – global, replace all matches in the line, not just the first • p – print the pattern space if a substitution was made • w filename – write to file if a substitution was made • i – case insensitive • m – multi-line • If the pattern space contains embedded newlines, ^ and $ also match just after and just before each embedded newline, as well as at the start and end of the entire pattern space
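A sketch contrasting the flags (GNU sed assumed, since the i flag is a GNU extension). Without g only the first match on the line changes; p combined with -n prints just the lines where a substitution happened.

```sh
r1=$(echo 'a-b-c' | sed 's/-/+/')                      # first match only
r2=$(echo 'a-b-c' | sed 's/-/+/g')                     # g: every match
r3=$(echo 'Error error ERROR' | sed 's/error/E/gi')    # i: case-insensitive, with g
r4=$(printf 'x1\ny\nx2\n' | sed -n 's/x/X/p')          # -n plus p: changed lines only
```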
Multi-Line Edits • sed normally processes one line at a time • To match patterns spanning lines, lines must be combined • N command reads the next line and appends it to the pattern space after an embedded newline • D command deletes up to the first embedded newline and restarts the script on whatever remains, without reading a new line (P prints up to that newline) • Multi-line edits are complex, but powerful once you figure them out
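Two multi-line sketches, assuming GNU sed. The first uses N to join each pair of lines. The second is a well-known idiom that collapses runs of identical adjacent lines: N appends the next line, P prints the first line only when the two differ, and D drops that first line and re-runs the script on the remainder.

```sh
# Join lines in pairs: N adds the next line, s replaces the embedded newline
pairs=$(printf 'a\nb\nc\nd\n' | sed 'N;s/\n/ /')

# Collapse repeated adjacent lines ($!N: don't try to read past the last line)
dedup=$(printf 'x\nx\ny\ny\ny\nz\n' | sed '$!N; /^\(.*\)\n\1$/!P; D')

echo "$pairs"; echo "$dedup"
```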
Other Commands • [address]d – deletes the matching lines and starts the next cycle with a new input line • [address]a – append text after the matching line • [address]i – insert text before the matching line • [address]c – replace the matching line(s) • a, i and c must be followed by lines of text like this:
200a\
line to append after line 200\
another line
• c can specify an address range
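A sketch of d, i, a and c on small samples (GNU sed assumed). Each text block follows a backslash-newline and ends at the first line that does not itself end in a backslash; the text lines carry no leading spaces, so no whitespace handling comes into play.

```sh
# d: drop comment lines
nocomments=$(printf '# comment\ncode\n' | sed '/^#/d')

# i before line 1, a after line 2, c replaces line 3
edited=$(printf 'one\ntwo\nthree\n' | sed '1i\
before one
2a\
after two
3c\
replaced three')

echo "$nocomments"; echo "$edited"
```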
grep • A very useful utility derived from the ed line editor • g/re/p is an ed command that means Global Regex Print • Any line that matches the regex is printed • grep basically does that • You supply a regex and grep filters the input and prints those lines that match
grep Command Line
grep switches pattern input-file
grep switches -e pattern | -f pattern-file input-file
• If input-file is omitted or -, grep reads standard input • -e allows multiple patterns • -f file contains patterns, one per line
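A sketch of the three ways to give grep its pattern, reading standard input throughout; the fruit list and the temporary pattern file are invented for the example. A line is printed if it matches any of the supplied patterns.

```sh
data='apple
banana
cherry
date'

r1=$(printf '%s\n' "$data" | grep 'an')               # single pattern
r2=$(printf '%s\n' "$data" | grep -e '^a' -e '^d')    # several -e patterns

pats=$(mktemp)
printf 'rr\nte\n' > "$pats"                            # one pattern per line
r3=$(printf '%s\n' "$data" | grep -f "$pats")
rm -f "$pats"
```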
grep switches • --help • -i ignore case • -v select non-matching lines • -w match whole words only • -c just count the matching lines • -l just print the names of files that contain a match • -L print the names of files with no match • -q quiet, no output produced • Exit status 0 means a match occurred • Used in scripts
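A sketch of the common switches on a made-up three-line sample. Note that -c counts matching lines, not individual matches, and that -q is tested through grep's exit status in an if statement, which is how it is normally used in scripts.

```sh
data='Alpha one
beta two
ALPHA three'

r1=$(printf '%s\n' "$data" | grep -i 'alpha')          # ignore case
r2=$(printf '%s\n' "$data" | grep -v -i 'alpha')       # non-matching lines
r3=$(printf '%s\n' "$data" | grep -c -i 'alpha')       # count of matching lines
r4=$(printf 'cat\ncatalog\n' | grep -w 'cat')          # whole word only

if printf '%s\n' "$data" | grep -q 'beta'; then        # exit status only
    found=yes
else
    found=no
fi
```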
More Switches • -H print the filename along with each matching line • -h don't print filenames, even with multiple files • -n print line numbers • -A -B -C num • Print num lines of context After, Before, or on both sides of each matching line
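A sketch of line numbers, context and filename switches; the log lines and file names a.log and b.log are hypothetical. -H forces the filename even for a single file, while -h suppresses it even for several.

```sh
log='start
error: disk
detail line
ok
end'

r1=$(printf '%s\n' "$log" | grep -n 'error')      # line number prefix
r2=$(printf '%s\n' "$log" | grep -A 1 'error')    # match plus one line after
r3=$(printf '%s\n' "$log" | grep -C 1 'detail')   # one line on each side

dir=$(mktemp -d)
printf 'error here\n' > "$dir/a.log"
printf 'error two\n' > "$dir/b.log"
r4=$(cd "$dir" && grep -H 'error' a.log)          # filename shown for one file
r5=$(cd "$dir" && grep -h 'error' a.log b.log)    # no filenames for two files
```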
awk • AWK is an interpreted programming language designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unix-like operating systems. AWK was very popular in the late 1970s and 1980s, but from the 1990s has not retained its level of awareness in light of newer languages like Perl, upon which AWK had a strong influence. • Wikipedia
History • Developed at Bell Labs by Alfred Aho, Peter Weinberger, and Brian Kernighan • Aho developed the pattern matching algorithms used in egrep and fgrep • Weinberger worked in image processing, and his face became famous • Kernighan helped develop Unix
AWK Operation • Much like sed • Processes records one at a time and outputs the result • But: • AWK script syntax is based on C programming • sed processes records, but awk recognizes fields within the records • awk allows finer control over the format of the output and is considered to be a report writing language
AWK Command Line • Similar to sed awk options script filename • -f filename supplies script in a file • awk scripts are generally more complex than sed scripts and are commonly run from a file • Script on command line usually needs single quotes to protect it from the shell • If filename is omitted or -, awk reads from standard input
AWK Command Line Options • Because AWK is capable of being used as a general purpose programming language, there are a lot of options that are rarely used • Commonly used are: -f file • AWK script contained in file -F separator • Specifies field separator -v var=value • Provides values for AWK variables
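A sketch of -F, -v and -f on two invented password-file style lines; the variable name min and the temporary script file are hypothetical.

```sh
passwd_sample='root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/sh'

# -F sets the field separator
r1=$(printf '%s\n' "$passwd_sample" | awk -F: '{ print $1 }')

# -v gives an awk variable a value before the script runs
r2=$(printf '%s\n' "$passwd_sample" | awk -F: -v min=1000 '$3 >= min { print $1 }')

# -f reads the script from a file
prog=$(mktemp)
printf '{ print $7 }\n' > "$prog"
r3=$(printf '%s\n' "$passwd_sample" | awk -F: -f "$prog")
rm -f "$prog"
```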
AWK Scripts • Like sed scripts, commands are separated by newlines (or ;) • Each line consists of a pattern and a procedure • Like the address and command in sed • Pattern can be a regex like /search/ • Procedure is a series of commands inside braces and separated by newlines or ; • Default procedure prints entire input line {print}
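A sketch of the pattern/procedure combinations on a made-up log: a pattern alone gets the default {print}, a procedure alone runs for every line, and a two-line script pairs a different procedure with each pattern.

```sh
data='error disk full
info all good
error fan speed'

r1=$(printf '%s\n' "$data" | awk '/^error/')          # pattern only: default { print }
r2=$(printf '%s\n' "$data" | awk '{ print $1 }')      # procedure only: every line
r3=$(printf '%s\n' "$data" | awk '/^error/ { print $2 }
/^info/ { print "-" }')                               # two pattern/procedure lines
```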
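Executable AWK Scripts • A file with an appropriate shebang line can become an executable program file
#!/usr/bin/awk -f
awk script
• First line provides the command used to process the file (adjust the path if awk is installed elsewhere) • If this file were called “myawk” and made executable with chmod +x myawk, you could run it like this: ./myawk data-file-name (or myawk data-file-name if its directory is on PATH)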
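A sketch that builds and runs an executable awk script. Rather than hard-coding a path, it writes whatever location `command -v awk` reports into the shebang line (on many Linux systems that is /usr/bin/awk). The names myawk and data.txt are hypothetical, and it assumes the temporary directory allows execution.

```sh
dir=$(mktemp -d)

# Shebang: the awk found on this system, followed by -f
printf '#!%s -f\n' "$(command -v awk)" > "$dir/myawk"
cat >> "$dir/myawk" <<'EOF'
# print the first field of each record
{ print $1 }
EOF
chmod +x "$dir/myawk"

printf 'alpha 1\nbeta 2\n' > "$dir/data.txt"
r=$("$dir/myawk" "$dir/data.txt")
echo "$r"
```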
Writing awk Scripts • Simple command line scripts are usually written using ; as a separator so the script can go on one line • In a script file, newlines separate commands, and proper indentation should be used inside blocks
{
    print $2
    ports += 1
}
Blocks in awk Scripts • Mostly blocks start with “pattern {” • Followed by commands • End with “}”
BEGIN {
    LastSource = " "
    ORS = " "
    Sources = 0
    Ports = 0
}
AWK Variables • Users can create and set their own variables • There are also a number of built-in variables that matter • FS is the field separator • Defaults to runs of spaces or tabs • Change with the -F option • Or assign to FS (FS = ...) • RS is the record separator • Defaults to newline • Change by assigning to RS (RS = ...)
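A sketch of FS and RS. With the default FS, runs of spaces and tabs separate fields and leading blanks are ignored; -F or an assignment in BEGIN changes the separator; setting RS to ";" makes each ";"-terminated chunk a record. The inputs are arbitrary.

```sh
r1=$(printf '  one \t two   three\n' | awk '{ print $2 }')          # default FS
r2=$(echo 'a,b,c' | awk -F, '{ print $3 }')                         # -F option
r3=$(echo 'a,b,c' | awk 'BEGIN { FS = "," } { print $2 }')          # FS assigned in BEGIN
r4=$(printf 'x;y;z' | awk 'BEGIN { RS = ";" } { print $0 }')        # RS = ";"
```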
AWK Variables • OFS is the output field separator • Defaults to " " (a single space) • ORS is the output record separator • Defaults to "\n" • To print the next output on the same line as the current one:
ORS = " "
print "something"
ORS = "\n"
print "something on the same line"
print "something on the next line"
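The ORS sequence above can be run as a BEGIN-only script (no input is read); the second command is a sketch of OFS, which awk places between the comma-separated items of print.

```sh
r1=$(awk 'BEGIN {
    ORS = " "
    print "something"
    ORS = "\n"
    print "something on the same line"
    print "something on the next line"
}')

# OFS set to "-" with -v, so print $1, $3 joins the two fields with "-"
r2=$(echo 'a b c' | awk -v OFS=- '{ print $1, $3 }')
```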
Patterns • Script lines start with optional pattern • Default matches all input lines • Pattern is: • /regex/ • BEGIN • Procedure runs before any input is read • END • Procedure runs after all input • pattern, pattern • Indicates a range, like in sed
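A sketch of a regex pattern and a range pattern on a hypothetical INI-style file. The range starts at the line matching the first regex and runs through the line matching the second, inclusive, as in sed.

```sh
conf='[main]
a=1
[extra]
b=2
[other]'

r1=$(printf '%s\n' "$conf" | awk '/=/')                          # regex pattern
r2=$(printf '%s\n' "$conf" | awk '/^\[extra\]/,/^\[other\]/')    # range pattern
```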
Compound Patterns • Multiple patterns can be combined using boolean operators • pattern && pattern • pattern || pattern • Parentheses can be used to disambiguate complex boolean expressions
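A sketch of compound patterns on invented name/number pairs, including parentheses to make the grouping explicit in the last one.

```sh
data='alice 42
bob 7
carol 99
dave 15'

r1=$(printf '%s\n' "$data" | awk '$2 > 10 && $2 < 50 { print $1 }')
r2=$(printf '%s\n' "$data" | awk '/^b/ || /^c/ { print $1 }')
r3=$(printf '%s\n' "$data" | awk '($2 < 10 || $2 > 90) && $1 != "bob" { print $1 }')
```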
Field Names • AWK breaks record into fields based on value of FS • $1 is first field, $2 is second • $0 refers to entire input record
User Defined Variables • Users can define their own variables as required by assigning a value to them • Variable names follow C identifier rules: letters, digits and underscores, not starting with a digit (no $ sigil as in Perl) • Evaluated as string or number depending on context
{ X = "string"; print X }
• There is also support for (associative) arrays
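A sketch of context-dependent evaluation and of an associative array. The variable names are hypothetical. The array's for-in order is unspecified, so the output is piped through sort.

```sh
# "3" is added as a number but concatenated as a string
r1=$(awk 'BEGIN { x = "3"; y = x + 4; s = x "4"; print y, s }')

# count[] is indexed by string: one counter per distinct first field
r2=$(printf 'bash\nsh\nbash\n' | awk '{ count[$1]++ } END { for (s in count) print s, count[s] }' | sort)
```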
Relational Expressions • Patterns can be written that use relational operators with variables and constants • Constants can be quoted strings or numbers or escape sequences • Like \n • Variables can be built in, user defined or field names
Operators • = += -= *= /= %= ^= **= • Assignment • && || ! • And, Or, Not • ~ !~ • Match, Don't match • < <= > >= != == • Relational • + - * / % ^ ** ++ -- • Arithmetic, increment and decrement • (** and **= are not part of POSIX awk)
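A sketch exercising relational, assignment, match and arithmetic operators on invented user/ID pairs. It sticks to ^ rather than ** because not every awk accepts **.

```sh
data='root 0
daemon 1
alice 1000
bob 1001'

r1=$(printf '%s\n' "$data" | awk '$2 >= 1000 { n += 1 } END { print n }')  # >= and +=
r2=$(printf '%s\n' "$data" | awk '$1 ~ /^[ab]/ { print $1 }')               # ~ match
r3=$(printf '%s\n' "$data" | awk '$1 !~ /o/ { print $1 }')                  # !~ no match
r4=$(awk 'BEGIN { print 17 % 5, 2 ^ 3 }')                                   # % and ^
```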
BEGIN, END Patterns • BEGIN pattern precedes the set-up procedure
BEGIN {
    total = 0
    LastSource = " "
}
• END pattern introduces the clean-up procedure
END { print "Total =", total }
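A sketch of a set-up, per-record and clean-up procedure together: the second column of some made-up data is summed and reported once at the end.

```sh
r=$(printf 'a 5\nb 7\nc 3\n' | awk '
BEGIN { total = 0 }
{ total += $2 }
END { print "Total =", total }')
echo "$r"
```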
Relational Patterns • Patterns are most commonly a regular expression
/Inext-DROP-DEFLT/
• Can be any expression
$1 != LastSource {
    print $1
    LastSource = $1
}
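The $1 != LastSource pattern can be tried on sample records (the IP addresses and ports are invented). Because LastSource starts out empty, the first record always prints, and after that a source prints only when it differs from the one on the previous record.

```sh
r=$(printf '10.0.0.1 80\n10.0.0.1 443\n10.0.0.2 22\n10.0.0.1 25\n' | awk '
$1 != LastSource {
    print $1
    LastSource = $1
}')
echo "$r"
```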
Commands • print exp-list • Writes a record to output, with the expressions separated by OFS and followed by ORS • if (exp) statement [ else statement ] • Many more, including loops, built-in functions and user-defined functions • Usually all we want to do with AWK is print, but “if” is useful to provide more control over output
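A sketch of if/else controlling what print writes. The threshold of 1000 for regular accounts is an assumption based on a common Linux convention, not something awk defines.

```sh
r=$(printf 'alice 1000\nroot 0\n' | awk '{
    if ($2 >= 1000)
        print $1, "regular user"
    else
        print $1, "system account"
}')
echo "$r"
```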
An Example • Find the descriptions of user accounts
cat /etc/passwd | grep -v '^[^:]*:[^:]*:[^:]*:[^:]*::' | awk -F: '{print $1 " " $5}' | sort
• grep -v drops lines in /etc/passwd whose fifth field (the description) is empty • awk prints the first and fifth fields using : as the separator • Pipes to sort to list by user name
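The same task on three invented passwd-style lines, so the result is predictable. The grep version anchors each field with [^:]* so that only an empty fifth field is filtered out; the awk-only version tests $5 directly and needs no grep.

```sh
sample='root:x:0:0:root:/root:/bin/bash
nobody:x:65534:65534::/nonexistent:/usr/sbin/nologin
alice:x:1000:1000:Alice Smith:/home/alice:/bin/bash'

r1=$(printf '%s\n' "$sample" | grep -v '^[^:]*:[^:]*:[^:]*:[^:]*::' | awk -F: '{ print $1 " " $5 }' | sort)
r2=$(printf '%s\n' "$sample" | awk -F: '$5 != "" { print $1 " " $5 }' | sort)
```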
sort sort options files • Files are concatenated and sorted • Output to standard output • If no files, or files = -, input from standard input • -t sep • Specifies the field separator • -k sort keys • Format field.char,field.char (start position and optional end position) • Without -k, the whole line is the sort key
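A sketch showing how -t and -k change the order compared with sorting on the whole line. The name:number data is invented; here the numbers all have two digits, so plain string order on field 2 happens to match numeric order (appending n, as in -k2,2n, requests numeric comparison).

```sh
data='alice:30
bob:10
carol:20'

r1=$(printf '%s\n' "$data" | sort)              # key is the whole line
r2=$(printf '%s\n' "$data" | sort -t: -k2,2)    # key is field 2 only
```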
Another Example • Exploring login shell names in password file • What login shells are mentioned? cat /etc/passwd | awk -F: '{print $7 }' | sort | uniq • How many are there? cat /etc/passwd | awk -F: '{print $7 }' | sort | uniq | wc -l • How many accounts for each? cat /etc/passwd | awk -F: '{print $7 }' | sort | uniq -c
uniq uniq switches • Looks for duplicate adjacent lines, which is why input is normally sorted first • Can read/write from/to files but normally used in a pipeline • Default is to filter out duplicates • Can also print only the duplicates (-d) • Can print the number of occurrences (-c)
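A sketch using a list of login shells like the one the previous example extracts from /etc/passwd (the values here are invented). The first command shows why sort comes first: without it no duplicates are adjacent, so nothing is removed. awk is used only to normalise the padding that uniq -c puts before each count.

```sh
shells='/bin/bash
/bin/sh
/bin/bash
/usr/sbin/nologin
/bin/bash'

r0=$(printf '%s\n' "$shells" | uniq | wc -l | tr -d ' ')                  # unsorted: nothing adjacent
r1=$(printf '%s\n' "$shells" | sort | uniq)                              # duplicates removed
r2=$(printf '%s\n' "$shells" | sort | uniq -d)                           # duplicated lines only
r3=$(printf '%s\n' "$shells" | sort | uniq -c | awk '{ print $1, $2 }')  # counts
```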
wc • Word Count wc switches files • If files are missing or -, reads standard input • -c count bytes • -l count lines • -w count words • -L print length of longest line
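A sketch of the four counts on a two-line sample; -L is a GNU coreutils extension. Some wc implementations pad the number with spaces, so tr strips them before comparison.

```sh
text='one two three
four five'

lines=$(printf '%s\n' "$text" | wc -l | tr -d ' ')
words=$(printf '%s\n' "$text" | wc -w | tr -d ' ')
bytes=$(printf '%s\n' "$text" | wc -c | tr -d ' ')     # includes the two newlines
longest=$(printf '%s\n' "$text" | wc -L | tr -d ' ')
echo "$lines $words $bytes $longest"
```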