Mastering AWK in Linux
Posted on March 3, 2024 • 6 minutes • 1129 words
Awk is a powerful programming language and command-line utility in Unix and Unix-like operating systems, including Linux. It is primarily used for pattern scanning and processing. Awk operates on text files and can process data line by line.
Here’s a comprehensive overview of its usage and commands:
Basic Structure:
The basic structure of an awk command is as follows:
awk 'pattern { action }' filename
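Pattern: Specifies the condition under which the action is performed. If the pattern is omitted, the action runs for every input line.
Action: Specifies the command or set of commands to execute when the pattern matches. If the action is omitted, the default action prints the matching line.
Filename: Specifies the input file to process. If no file is given, awk reads from standard input.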
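The three parts can be seen side by side in a minimal sketch. The two-column input is made up and piped through printf, so no filename is given and awk reads standard input:

```shell
# Pattern and action: print field 1 only when field 2 is greater than 1
printf 'a 1\nb 2\nc 3\n' | awk '$2 > 1 { print $1 }'

# Pattern only: the default action prints the whole matching line
printf 'a 1\nb 2\nc 3\n' | awk '$2 > 1'

# Action only: runs for every input line
printf 'a 1\nb 2\nc 3\n' | awk '{ print $2 }'
```

The first command prints `b` and `c`, the second prints `b 2` and `c 3`, and the third prints `1`, `2` and `3`.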
Built-in Variables:
$0: Represents the entire input line.
$1, $2, …: Represent the fields of the input line, split by the field separator (by default, runs of spaces and tabs).
NF: Number of fields in the current line.
NR: Number of the current record (line), counted across all input files.
FS: Field separator (default is whitespace).
RS: Record separator (default is newline).
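A short sketch of these variables in action, using made-up input lines. `$NF` (the last field) is included because it combines two of the variables above:

```shell
# NR, NF, the first field and the last field ($NF) of each line
printf 'alpha beta gamma\ndelta epsilon\n' | awk '{ print NR, NF, $1, $NF }'

# Setting FS in BEGIN (same effect as -F':') before the first line is split
printf 'root:x:0\n' | awk 'BEGIN { FS = ":" } { print $1, $3 }'

# Setting RS so that ";" separates records (the input has no trailing newline)
printf 'x;y;z' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }'
```

These print `1 3 alpha gamma` / `2 2 delta epsilon`, then `root 0`, then `1: x`, `2: y`, `3: z`.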
Multiple Commands:
You can separate multiple commands with semicolons or put the program in a script file:
awk '{print $1; if ($2 > 50) print "High"}' filename
# or
awk -f script.awk filename
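As a sketch, the one-liner above can be moved into `script.awk` unchanged, except that newlines can replace the semicolons. The input lines are made up, and the heredoc overwrites any existing `script.awk` in the current directory:

```shell
# Write the program to script.awk; inside a file, newlines separate statements
cat > script.awk <<'EOF'
# Print field 1, then "High" if field 2 is greater than 50
{
    print $1
    if ($2 > 50) print "High"
}
EOF

printf 'alice 70\nbob 40\n' | awk -f script.awk
```

This prints `alice`, `High`, `bob`.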
Keywords:
1. BEGIN:
Specifies actions to be executed before processing the input.
Example:
BEGIN {print "Processing begins"}
2. END:
Specifies actions to be executed after processing all input.
Example:
END {print "Processing complete"}
3. if, else:
Conditional statements for decision making.
Example:
{if ($1 > 50) print "High"; else print "Low"}
4. while:
Loops through a set of statements as long as a specified condition is true.
Example:
{i=1; while (i<=NF) {print $i; i++}}
5. for:
Provides a compact way to iterate over a range of values.
Example:
{for (i=1; i<=NF; i++) print $i}
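The keywords above can be combined into one program. This sketch uses made-up numeric input. The counters `high`, `low` and `sum` are illustrative names and start at zero because awk initializes unset variables to 0:

```shell
printf '10 60\n70 20\n' | awk '
BEGIN { print "Processing begins" }      # runs once, before any input is read
{
    for (i = 1; i <= NF; i++) {          # classify every field of the line
        if ($i > 50) high++
        else low++
    }
    i = 1
    while (i <= NF) { sum += $i; i++ }   # the same traversal as a while loop
}
END {                                    # runs once, after the last line
    print "high:", high, "low:", low, "sum:", sum
    print "Processing complete"
}'
```

Expected output: `Processing begins`, `high: 2 low: 2 sum: 160`, `Processing complete`.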
Built-in Functions:
1. print:
Outputs text or variables. Arguments separated by commas are joined with the output field separator (a space by default).
Example:
{print $1, $3}
2. printf:
Formats and prints text.
Example:
{printf "Name: %-10s, Age: %2d\n", $1, $2}
3. length:
Returns the length of a string (of $0 if no argument is given).
Example:
{if (length($1) > 5) print $1}
4. split:
Splits a string into an array based on a specified delimiter and returns the number of elements. The first element is stored at index 1.
Example:
{split($1, name, "-"); print name[1]}
5. gsub, sub:
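sub replaces the first match of a regular expression in a string; gsub replaces every match. Both modify the target (default $0) in place and return the number of substitutions made.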
Example:
{gsub(/a/, "X", $1); print $1}
6. tolower, toupper:
Converts text to lowercase or uppercase.
Example:
{print tolower($1), toupper($2)}
7. getline:
Reads the next input record. With a variable (getline var), the record is stored in that variable and $0 is left unchanged.
Example:
{getline nextLine; print $0, nextLine}
8. close:
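Closes a file or pipe opened by getline (or by print/printf output redirection), so that it can be reopened from the beginning.
Example:
{while ((getline line < "file.txt") > 0) print $0, line; close("file.txt")}
Here, closing file.txt after reading it lets the next input record read it again from the start.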
9. delete:
Deletes an element from an array.
Example:
{delete array[$1]}
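A sketch exercising several of these functions on made-up strings. The variable names (`s`, `t`, `d`, `nxt`, `cmd`, `msg`) are illustrative only:

```shell
# sub vs gsub, split, printf, toupper and length on literal strings
awk 'BEGIN {
    s = "banana"; t = s
    sub(/a/, "A", s)                   # first match only -> bAnana
    gsub(/a/, "A", t)                  # every match      -> bAnAnA
    print s, t
    n = split("2024-03-03", d, "-")    # n = 3, d[1] = "2024"
    printf "%d parts, year=%s, %s has length %d\n", n, d[1], toupper("awk"), length("awk")
}'

# getline var: read the following record into nxt, pairing lines two at a time
printf 'a\nb\nc\nd\n' | awk '{ getline nxt; print $0, nxt }'

# close(): finish the command so that a later "cmd | getline" would start it afresh
awk 'BEGIN { cmd = "echo hello"; cmd | getline msg; close(cmd); print msg }'
```

Expected output: `bAnana bAnAnA` and `3 parts, year=2024, AWK has length 3`, then `a b` and `c d`, then `hello`.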
Example Commands:
Let’s consider an example file named sample.txt with the following content:
Name,Age,Score
John,25,80
Alice,30,92
Bob,22,75
Eve,35,88
1. Print the entire file:
awk '{print}' sample.txt
Output:
Name,Age,Score
John,25,80
Alice,30,92
Bob,22,75
Eve,35,88
2. Print specific fields (Name and Score):
awk -F',' '{print $1, $3}' sample.txt
Output:
Name Score
John 80
Alice 92
Bob 75
Eve 88
3. Print lines with Age greater than 25 (NR > 1 skips the header line):
awk -F',' 'NR > 1 && $2 > 25 {print}' sample.txt
Output:
Alice,30,92
Eve,35,88
4. Calculate and print the average Score (excluding the header line):
awk -F',' 'NR > 1 {sum += $3} END {print "Average Score:", sum / (NR - 1)}' sample.txt
Output:
Average Score: 83.75
5. Print lines with a specific pattern (containing “Alice”):
awk -F',' '/Alice/ {print}' sample.txt
Output:
Alice,30,92
6. Print lines longer than 12 characters (a pattern without an action prints the matching line):
awk 'length($0) > 12' sample.txt
Output:
Name,Age,Score
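Because the first line of sample.txt is a header, numeric tests in these examples also see it. This sketch recreates sample.txt (overwriting any existing copy) and shows why: the header's "Age" is not a number, so `$2 > 25` becomes a string comparison ("Age" > "25"), which is true.

```shell
cat > sample.txt <<'EOF'
Name,Age,Score
John,25,80
Alice,30,92
Bob,22,75
Eve,35,88
EOF

# Without a header check, the header line slips through the string comparison
awk -F',' '$2 > 25' sample.txt

# Skipping the first record leaves only genuine numeric comparisons
awk -F',' 'NR > 1 && $2 > 25' sample.txt
```

The first command prints `Name,Age,Score`, `Alice,30,92`, `Eve,35,88`; the second prints only the last two lines.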
Arrays
Declaration and Initialization:
In awk, arrays are a fundamental data structure that allows you to store and manipulate data in a structured way. Here are some key points about managing arrays in Awk:
- In Awk, arrays are associative, meaning you can use arbitrary indices (keys) for your array elements. You don’t need to declare the size of the array explicitly.
- Arrays are created automatically when you first use them. There is no separate declaration needed.
Accessing Array Elements:
Indexing:
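Array elements are accessed using their indices (keys). Because arrays are associative, there is no default starting index and an index cannot be omitted. Subscripts are stored as strings, so names[1] and names["1"] refer to the same element. Functions such as split() fill arrays starting at index 1.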
Example:
BEGIN {
    names[1] = "Alice"
    names[2] = "Bob"
    print names[1]    # Outputs: Alice
}
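A sketch of associative indexing with string keys. The array `age` and its contents are made up. It also shows the `in` operator, which tests for a key without creating it (merely referencing `age["carol"]` would create an empty element):

```shell
awk 'BEGIN {
    age["alice"] = 30          # string keys, no declaration or size needed
    age["bob"] = 22
    age[7] = "seven"           # numeric subscript is stored as the string "7"
    if ("alice" in age) print "alice is", age["alice"]
    if (!("carol" in age)) print "no carol"
    print age["7"]             # same element as age[7]
}'
```

Expected output: `alice is 30`, `no carol`, `seven`.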
Iterating Through Arrays:
Using for Loop:
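You can iterate through all the indices of an array using a for (key in array) loop. The order in which the keys are visited is unspecified. (Note that index cannot be used as the loop variable, because it is the name of a built-in function.)
Example:
for (key in array) {
    print key, array[key]
}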
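Since the for-in order is unspecified, one common approach when a stable order matters is to pipe the output through `sort`. This is a sketch with a made-up `score` array. `close("sort")` waits for sort to finish, so its output appears before the final line:

```shell
awk 'BEGIN {
    score["bob"] = 75; score["alice"] = 92; score["eve"] = 88
    for (name in score)                 # keys arrive in an unspecified order
        print name, score[name] | "sort"
    close("sort")                       # flush and wait for sort to finish
    print "done"
}'
```

Expected output: `alice 92`, `bob 75`, `eve 88`, `done`.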
Array Built-in Functions:
delete:
Deletes an element from an array.
Example:
delete names[1]
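A small sketch confirming that delete removes both the value and the key, so the key no longer shows up in for-in loops or `in` tests:

```shell
awk 'BEGIN {
    names[1] = "Alice"; names[2] = "Bob"
    delete names[1]                 # removes the element and its key
    count = 0
    for (k in names) count++        # only key 2 remains
    print count, (1 in names), (2 in names)
}'
```

Expected output: `1 0 1`.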
Array Example
Let’s consider a simple example where we use an array to count the occurrences of each word in a text file:
# File: word_count.awk
{
    for (i = 1; i <= NF; i++) {
        word = $i
        wordCount[word]++
    }
}
END {
    for (word in wordCount) {
        print word, ":", wordCount[word]
    }
}
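One way to try the word counter, as a sketch: write the program to word_count.awk (overwriting any existing file), feed it a made-up sentence, and sort the result because the END loop visits words in an unspecified order:

```shell
cat > word_count.awk <<'EOF'
# Count how often each whitespace-separated word occurs
{
    for (i = 1; i <= NF; i++) {
        word = $i
        wordCount[word]++
    }
}
END {
    for (word in wordCount) {
        print word, ":", wordCount[word]
    }
}
EOF

printf 'the cat saw the dog\nthe end\n' | awk -f word_count.awk | sort
```

Expected output: `cat : 1`, `dog : 1`, `end : 1`, `saw : 1`, `the : 3`.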
AWK Complete Example Program:
Let’s consider a scenario where we have a file containing information about students' grades, and we want to calculate the average grade for each student.
The file, let’s call it grades.txt, may look like this:
Alice 90 59 88
Bob 85 88 63
Charlie 95 92 88
David 78 80 55
Eve 88 90 85
Now, let’s create an Awk program to calculate the average grade for each student:
This program does the following:
1. Initialization (BEGIN block):
1. Prints the header before processing any input.
# BEGIN runs once, before any input is read
BEGIN {
    print "-------Average Marks------"
    print "Name\tAvgMark"
}
2. Processing (main block):
1. Extracts the name and grades from each line.
2. Accumulates grades and counts for each student.
# Main block: runs once for every input line
{
    name = $1
    for (i = 2; i <= NF; i++)
    {
        totalMarks[name] += $i    # running total of this student's marks
        markCount[name]++         # number of marks seen for this student
    }
}
3. After Processing (END block):
1. Iterates through the students.
2. Calculates the average grade for each student.
3. Prints the name and average grade in a formatted way.
# END runs once, after all input has been processed
END {
    for (student in totalMarks)
    {
        average = totalMarks[student] / markCount[student]
        printf "%s\t%.2f\n", student, average
    }
}
Full Program (file: calculate_average.awk):
# BEGIN runs once, before any input is read
# (the opening brace must be on the same line as BEGIN)
BEGIN {
    print "-------Average Marks------"
    print "Name\tAvgMark"
}
# Main block: runs once for every input line
{
    name = $1
    for (i = 2; i <= NF; i++)
    {
        totalMarks[name] += $i    # running total of this student's marks
        markCount[name]++         # number of marks seen for this student
    }
}
# END runs once, after all input has been processed
END {
    for (student in totalMarks)
    {
        average = totalMarks[student] / markCount[student]
        printf "%s\t%.2f\n", student, average
    }
}
You can then run this Awk program on the grades.txt file using the following command:
awk -f calculate_average.awk grades.txt
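Expected Output (the order of the student rows may differ, because for (student in totalMarks) visits keys in an unspecified order):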
-------Average Marks------
Name AvgMark
Alice 79.00
Charlie 91.67
Eve 87.67
Bob 78.67
David 71.00
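If a predictable row order is needed, one option is to sort the rows after awk has produced them. The sketch below recreates grades.txt (overwriting any existing copy) and wraps the same calculation in a hypothetical shell function, averages. It drops the dashed banner and prints the header from the shell, so that sort cannot move it:

```shell
cat > grades.txt <<'EOF'
Alice 90 59 88
Bob 85 88 63
Charlie 95 92 88
David 78 80 55
Eve 88 90 85
EOF

# Hypothetical helper: one "name<TAB>average" row per student, in hash order
averages() {
    awk '
    {
        for (i = 2; i <= NF; i++) {
            totalMarks[$1] += $i    # accumulate across lines that share a name
            markCount[$1]++
        }
    }
    END {
        for (student in totalMarks)
            printf "%s\t%.2f\n", student, totalMarks[student] / markCount[student]
    }' "$1"
}

printf 'Name\tAvgMark\n'    # header first, outside the sorted stream
averages grades.txt | sort
```

This prints the header followed by Alice 79.00, Bob 78.67, Charlie 91.67, David 71.00 and Eve 87.67 in alphabetical order.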
This example demonstrates the use of Awk to perform calculations and generate meaningful output based on the input data.
You can adapt and modify Awk programs to suit various data processing tasks.