Suppose you have a file of records and want to find the ones that meet criteria such as field1=xyz and field2=abc. How would you approach it? Simple! Load the file into a database, write a SQL query with a WHERE clause, and let the database take care of it for you. Is that the simplest way? Maybe not! Awk can do the same job without a database.
How does awk work?
awk program [files]
awk -f pfile [files]
Awk runs through a text file by reading and processing one record at a time (by default, a record is a line). Its commands are written to act repeatedly on each record as awk reads it in. Each record is broken into separate fields ($1, $2, and so on), and actions can be performed on individual fields as well as on the whole record.
When you run awk, you must provide two pieces of information as arguments. The first is the program or script to execute, and the second identifies the file(s) on which to perform the actions. Awk can also read from a pipe, in which case no file needs to be named on the command line: ls -l | awk '{print}'
By default, awk splits each record into fields on whitespace and assigns the values to $1, $2, and so on. So you can do something like ls -l | awk '{print $1 $3}' to filter out some of the information before printing it. So far so good! If you actually run the command above, you will notice that the fields are concatenated into a single string. To separate them, put " " between $1 and $3 in the print statement. Or tell awk what to use as the delimiter: write print $1, $3 and awk inserts its output field separator (OFS, a single space by default) between the fields.
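The ways of invoking awk described above can be tried side by side. This is only a minimal sketch: the sample data, the temporary directory, and the file names emp_demo.txt and first_field.awk are made up for illustration.

```shell
# Hypothetical sample data: name, state, and salary separated by spaces.
tmp=$(mktemp -d)
printf 'alice CA 100\nbob AL 200\n' > "$tmp/emp_demo.txt"

# Form 1: the program is given inline as the first argument.
inline=$(awk '{ print $1 }' "$tmp/emp_demo.txt")

# Form 2: the program lives in its own file and is loaded with -f.
printf '{ print $1 }\n' > "$tmp/first_field.awk"
from_file=$(awk -f "$tmp/first_field.awk" "$tmp/emp_demo.txt")

# Form 3: no file argument, so awk reads standard input from the pipe.
piped=$(cat "$tmp/emp_demo.txt" | awk '{ print $1 }')

printf '%s\n' "$inline" "$from_file" "$piped"
rm -rf "$tmp"
```

All three forms should print the same two names, since only the way the program and the input reach awk differs.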
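To see the difference between concatenating fields and separating them, here is a small sketch that uses one made-up line in the style of ls -l output instead of a real directory listing, so the result does not depend on your files.

```shell
# One hypothetical line of "ls -l"-style output.
line='drwxr-xr-x 2 alice staff 4096 Jan 1 10:00 docs'

# Adjacent expressions are concatenated with nothing in between.
joined=$(printf '%s\n' "$line" | awk '{ print $1 $3 }')

# A literal " " in the print statement separates the two fields.
spaced=$(printf '%s\n' "$line" | awk '{ print $1 " " $3 }')

# A comma tells print to insert OFS (a single space by default).
comma=$(printf '%s\n' "$line" | awk '{ print $1, $3 }')

# Setting OFS with -v changes what the comma inserts.
custom=$(printf '%s\n' "$line" | awk -v OFS='|' '{ print $1, $3 }')

printf '%s\n' "$joined" "$spaced" "$comma" "$custom"
```

The first command prints drwxr-xr-xalice, the next two print drwxr-xr-x alice, and the last prints drwxr-xr-x|alice.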
Executing more than one set of commands
So far, we have told awk to do one thing per record. What if we want to run more than one command on a record? Separate the commands with a semicolon (;): ls -l | awk '{ttl+=$5; print $9}'. As you can see, awk supports variables (e.g. ttl) as well. If you want to add pre-processing and post-processing commands, you can use BEGIN and END blocks:
ls -l | awk '
BEGIN {print "Custom Directory Listing"}
{ttl += $5;
print $9 " " $5 " " $3}
END {print "Total " ttl " bytes"}'
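Because real ls -l output differs from machine to machine, here is the same BEGIN/END program run against two made-up listing lines (the file names, owners, and sizes are assumptions for illustration), so you can check the result by hand.

```shell
# Two hypothetical lines in "ls -l" format: field 3 is the owner,
# field 5 the size in bytes, and field 9 the file name.
listing='-rw-r--r-- 1 alice staff 120 Jan 1 10:00 notes.txt
-rw-r--r-- 1 bob staff 80 Jan 2 11:30 todo.txt'

report=$(printf '%s\n' "$listing" | awk '
BEGIN { print "Custom Directory Listing" }   # runs once, before any input
{ ttl += $5; print $9 " " $5 " " $3 }        # runs for every record
END { print "Total " ttl " bytes" }          # runs once, after all input
')

printf '%s\n' "$report"
```

This prints the heading, then "notes.txt 120 alice" and "todo.txt 80 bob", then "Total 200 bytes". With a real ls -l, the listing usually starts with a "total" line that has no fifth or ninth field, so you may want to add a condition such as NF >= 9 in front of the main block.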
There are some built-in variables that you may find useful:
- FILENAME - name of the current input file
- FS - input field separator (you can set it on the command line: awk -F: '{…}' sets FS to a colon)
- OFS - output field separator
- NF - number of fields in the current record
- NR - number of current record (line #)
- RS - record separator
- $0 - entire input record
- $n - nth field in the current record; fields are separated by FS.
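A quick sketch of several of these variables working together. The colon-separated records below are made up in the style of /etc/passwd, and $NF (the field whose number is NF, i.e. the last one) is a common idiom that is not in the list above.

```shell
# Hypothetical colon-separated records.
data='root:x:0:0
alice:x:1000:1000:Alice'

# -F: sets FS to a colon; OFS is what print puts between
# comma-separated values. NR is the record number, NF the field count.
fields=$(printf '%s\n' "$data" | awk -F: -v OFS=' | ' '{ print NR, NF, $1, $NF }')

# RS changes what counts as a record: here records end at ";"
# instead of at a newline.
records=$(printf 'a;b;c' | awk -v RS=';' '{ print NR ": " $0 }')

printf '%s\n' "$fields" "$records"
```

The first command prints "1 | 4 | root | 0" and "2 | 5 | alice | Alice"; the second prints "1: a", "2: b", and "3: c".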
If tests and conditions
You can test fields with the comparison operators (==, >, <, >=, <=, !=) and combine conditions with the logical operators (&&, ||, !). The loops you can use are:
- while (condition) command
- do command while (condition)
- for (set; test; increment) command (break & continue work as expected)
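Here is a small sketch of tests and loops in action, including the field1=xyz, field2=abc search from the introduction. The three records are made up, with a number in the third field.

```shell
# Hypothetical records: field1, field2, and a numeric value.
records='xyz abc 10
xyz def 20
qrs abc 30'

# Comparisons joined with && work much like a SQL WHERE clause.
matches=$(printf '%s\n' "$records" | awk '$1 == "xyz" && $2 == "abc" { print $3 }')

# An if/else test inside the action.
sizes=$(printf '%s\n' "$records" | awk '{
    if ($3 >= 20) { print $1, "large" } else { print $1, "small" }
}')

# A for loop that walks the fields of a record in reverse order.
reversed=$(printf 'a b c\n' | awk '{
    for (i = NF; i >= 1; i--) printf "%s%s", $i, (i > 1 ? " " : "\n")
}')

# A while loop; BEGIN needs no input, so no file or pipe is required.
total=$(awk 'BEGIN {
    i = 1
    while (i <= 3) {
        total += i
        i++
    }
    print total
}')

printf '%s\n' "$matches" "$sizes" "$reversed" "$total"
```

The first command prints 10, the second labels the records small, large, large, the third prints "c b a", and the last prints 6.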
Pattern matching
Now you know the basics of awk. It is hard to discuss the power of awk without mentioning its pattern matching. For example, to search a list of employee records for those that contain 'AL', I can write my awk command like this: awk '/AL/ {print $3,$2}' emp_names
A pattern can be:
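To try this without a real employee file, here is a sketch with made-up data. The column layout (id, last name, first name, state) is an assumption, since the actual format of emp_names is not shown.

```shell
# Hypothetical emp_names content: id, last name, first name, state.
emp='1 Smith John AL
2 Doe Jane CA
3 ALLEN Mary NY'

# /AL/ matches "AL" anywhere in the record, so ALLEN matches as well.
anywhere=$(printf '%s\n' "$emp" | awk '/AL/ { print $3, $2 }')

# Testing only the 4th field limits the match to the state column.
state_only=$(printf '%s\n' "$emp" | awk '$4 ~ /^AL$/ { print $3, $2 }')

printf '%s\n' "$anywhere" "$state_only"
```

The first command prints "John Smith" and "Mary ALLEN", while the second prints only "John Smith". If a pattern should apply to one column, it is usually safer to match against that field than against the whole record.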
- /AL|IN/ (AL or IN)
- $2 ~ /A|B|C/ (with letter A, B or C in the 2nd field)
- $1 !~ /pattern/
- $1 != prev {print; prev = $1} (print all input lines in which the first field is different from the previous first field)
- /[0-9]+/
- /[a-zA-Z]+/
- /^abc/ (match any string with abc at the beginning)
- /p$/ (match any string with p at end of the string)
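- . (any single character)
- * (zero or more of the preceding item)
- + (one or more of the preceding item)
- ? (zero or one of the preceding item)
- {n}, {n,m}, {n,} (exactly n, between n and m, or at least n occurrences of the preceding item; some older awk implementations do not support these)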
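A few of the patterns above can be checked against small made-up inputs. This is a sketch rather than a complete tour; the {n,m} forms are left out because not every awk supports them.

```shell
# Alternation: lines containing AL or IN.
alt=$(printf 'AL\nCA\nIN\n' | awk '/AL|IN/')

# Negated match: first field does not start with abc.
neg=$(printf 'abc 1\nxyz 2\n' | awk '$1 !~ /^abc/ { print $2 }')

# Print a line only when its first field differs from the previous one.
changed=$(printf 'a 1\na 2\nb 3\nb 4\na 5\n' | awk '$1 != prev { print; prev = $1 }')

# Anchors plus +: the whole line must be one or more digits.
digits=$(printf '123\n12a\n\n7\n' | awk '/^[0-9]+$/')

# * allows zero or more b characters; ? allows at most one.
star=$(printf 'ac\nabc\nabbc\n' | awk '/^ab*c$/')
opt=$(printf 'ac\nabc\nabbc\n' | awk '/^ab?c$/')

printf '%s\n' "$alt" "$neg" "$changed" "$digits" "$star" "$opt"
```

Note that a pattern with no action prints the whole matching record, which is why several of these commands have no {print}.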





































