March 3, 2024

Mastering AWK in Linux


Awk is a powerful programming language and command-line utility in Unix and Unix-like operating systems, including Linux.

It is primarily used for pattern scanning and processing.

Awk reads text input one record at a time (by default, one line) and splits each record into fields.

Here’s a comprehensive overview of its usage and commands:

Basic Structure:

The basic structure of an awk command is as follows:

awk 'pattern { action }' filename
  1. Pattern: Specifies the condition under which the action runs. If the pattern is omitted, the action runs for every line.
  2. Action: Specifies the command or set of commands to execute when the pattern matches. If the action is omitted, awk prints the matching line.
  3. Filename: Specifies the input file to be processed. If not provided, awk reads from the standard input.
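
As a rough illustration of the three forms a rule can take, the sketch below feeds a small made-up input (the names and numbers are hypothetical) through awk with a pattern and an action, a pattern only, and an action only:

```shell
# Hypothetical inline input: three lines of "name value".
input='alpha 10
beta 60
gamma 75'

# Pattern and action: print the first field of lines whose second field exceeds 50.
both=$(printf '%s\n' "$input" | awk '$2 > 50 { print $1 }')

# Pattern only: the default action prints the whole matching line.
pattern_only=$(printf '%s\n' "$input" | awk '$2 > 50')

# Action only: with no pattern, the action runs for every line.
action_only=$(printf '%s\n' "$input" | awk '{ print $1 }')

printf '%s\n' "$both" "$pattern_only" "$action_only"
```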

Inbuilt Variables

  1. $0 : Represents the entire input line.
  2. $1, $2, … : Represent fields in the input line, separated by a delimiter (default is space/tab).
  3. NF : Number of fields in the current line.
  4. NR : Current record (line) number.
  5. FS : Field separator (default is whitespace).
  6. RS : Record separator (default is newline).
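
To see several of these variables together, here is a small sketch. The colon-separated input is made up for illustration, and -F: sets FS to a colon:

```shell
# Hypothetical colon-separated input with two records of different widths.
out=$(printf 'root:x:0\nalice:x:1000:extra\n' |
  awk -F: '{ print NR, NF, $1, $NF }')   # $NF is the last field of the record

printf '%s\n' "$out"
# prints:
# 1 3 root 0
# 2 4 alice extra
```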

Multiple Commands:

You can separate multiple commands with semicolons or put them in a script file:

awk '{print $1; if ($2 > 50) print "High"}' filename
# or
awk -f script.awk filename
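
The script-file form might look like the sketch below. It writes the same two commands to a temporary file (created with mktemp, purely for illustration) and runs it with -f on made-up input:

```shell
# Write a small awk script to a temporary file (the path is arbitrary).
script=$(mktemp)
cat > "$script" <<'EOF'
# Print the first field, then flag large second fields.
{
    print $1
    if ($2 > 50) print "High"
}
EOF

# Hypothetical input: two lines of "label value".
out=$(printf 'a 10\nb 90\n' | awk -f "$script")
rm -f "$script"

printf '%s\n' "$out"
```

In a script file, a newline can take the place of the semicolon between commands.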

Keywords:

1. BEGIN:

Specifies actions to be executed before processing the input.

Example:

BEGIN {print "Processing begins"}

2. END:

Specifies actions to be executed after processing all input.

Example:

END {print "Processing complete"}
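
BEGIN and END are usually combined with a main rule. A minimal sketch, assuming three lines of made-up input and a hypothetical counter variable named count:

```shell
out=$(printf 'x\ny\nz\n' | awk '
  BEGIN { print "Processing begins" }      # runs once, before any input
        { count++ }                        # runs for every input line
  END   { print "Processing complete:", count, "lines" }')

printf '%s\n' "$out"
# prints:
# Processing begins
# Processing complete: 3 lines
```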

3. if, else:

Conditional statements for decision making.

Example:

{if ($1 > 50) print "High"; else print "Low"}

4. while:

Loops through a set of statements as long as a specified condition is true.

Example:

{i=1; while (i<=NF) {print $i; i++}}

5. for:

Provides a compact way to iterate over a range of values.

Example:

{for (i=1; i<=NF; i++) print $i}
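
As a slightly larger use of a loop over fields, the sketch below sums the numbers on each line. The input and the variable name total are made up for illustration:

```shell
out=$(printf '1 2 3\n10 20\n' | awk '{
  total = 0                                  # reset the sum for each line
  for (i = 1; i <= NF; i++) total += $i      # add every field on the line
  print "line", NR, "sum", total
}')

printf '%s\n' "$out"
# prints:
# line 1 sum 6
# line 2 sum 30
```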

Built-in Functions and Statements:

1. print:

Outputs text or variables.

Example:

{print $1, $3}

2. printf:

Formats and prints text.

Example:

{printf "Name: %-10s, Age: %2d\n", $1, $2}
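
To see what the format string does, the sketch below runs the same printf on two made-up records. %-10s left-aligns the name in a 10-character column, and %2d right-aligns the number in a 2-character column:

```shell
# Hypothetical input: "name age" pairs.
out=$(printf 'John 25\nAlice 7\n' |
  awk '{ printf "Name: %-10s, Age: %2d\n", $1, $2 }')

printf '%s\n' "$out"
```

Unlike print, printf does not add a newline by itself, which is why the format ends with \n.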

3. length:

Returns the length of a string.

Example:

{if (length($1) > 5) print $1}

4. split:

Splits a string into an array based on a specified delimiter.

Example:

{split($1, name, "-"); print name[1]}
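
split also returns the number of pieces it produced, and it numbers them from 1. A small sketch on a made-up date string (the array name parts is arbitrary):

```shell
out=$(printf '2024-03-15\n' | awk '{
  n = split($1, parts, "-")            # n is the number of elements created
  print n, parts[1], parts[2], parts[3]
}')

printf '%s\n' "$out"
# prints: 3 2024 03 15
```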

5. gsub, sub:

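gsub replaces every match of a regular expression in the target string; sub replaces only the first match. Both modify the target in place and return the number of substitutions made.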

Example:

{gsub(/a/, "X", $1); print $1}
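
The difference between the two is easiest to see side by side. The sketch below applies sub and gsub to copies of the same made-up word:

```shell
out=$(printf 'banana\n' | awk '{
  first = $1; all = $1
  sub(/a/, "X", first)        # replaces only the first "a"
  n = gsub(/a/, "X", all)     # replaces every "a" and returns the count
  print first, all, n
}')

printf '%s\n' "$out"
# prints: bXnana bXnXnX 3
```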

6. tolower, toupper:

Converts text to lowercase or uppercase.

Example:

{print tolower($1), toupper($2)}

7. getline:

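Reads the next input record. With a variable name, as below, the record is stored in that variable and $0 is left unchanged. getline returns 1 on success, 0 at end of input, and -1 on error.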

Example:

{getline nextLine; print $0, nextLine}
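
Because getline can fail at the end of the input, it is usually safer to check its return value. A minimal sketch on three made-up lines, where the last line has no partner:

```shell
out=$(printf 'a\nb\nc\n' | awk '{
  if ((getline nextLine) > 0)          # 1 means a record was read
    print $0, nextLine
  else
    print $0, "(no following line)"    # 0 means end of input
}')

printf '%s\n' "$out"
# prints:
# a b
# c (no following line)
```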

8. close:

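Closes a file or pipe that was opened by getline or by an output redirection, so that the next read starts again from the beginning.

Example (reads the first line of file.txt for every input record, because the file is closed after each read):

{getline nextLine < "file.txt"; close("file.txt"); print $0, nextLine}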
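
One way to see the effect of close is to read from a second file with getline. The sketch below uses a temporary file and the hypothetical variable names file and line. Closing the file after each read makes every record see its first line again:

```shell
# Temporary lookup file with two lines (created only for this example).
lookup=$(mktemp)
printf 'first line\nsecond line\n' > "$lookup"

out=$(printf 'a\nb\n' | awk -v file="$lookup" '{
  if ((getline line < file) > 0)
    print $0 ": " line
  close(file)   # without this, the second record would read "second line"
}')
rm -f "$lookup"

printf '%s\n' "$out"
# prints:
# a: first line
# b: first line
```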

9. delete:

Deletes an element from an array.

Example:

{delete array[$1]}

Example Commands:

Let’s consider an example file named sample.txt with the following content:

Name,Age,Score
John,25,80
Alice,30,92
Bob,22,75
Eve,35,88

1. Print the entire file:

awk '{print}' sample.txt

Output:

Name,Age,Score
John,25,80
Alice,30,92
Bob,22,75
Eve,35,88

2. Print specific fields (Name and Score):

awk -F',' '{print $1, $3}' sample.txt

Output:

Name Score
John 80
Alice 92
Bob 75
Eve 88
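
The comma in print inserts the output field separator, OFS, which is a single space by default. As a small sketch, setting OFS with -v changes what appears between the fields (the first two lines of sample.txt are inlined here so the example is self-contained):

```shell
out=$(printf 'Name,Age,Score\nJohn,25,80\n' |
  awk -F',' -v OFS=' | ' '{ print $1, $3 }')

printf '%s\n' "$out"
# prints:
# Name | Score
# John | 80
```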

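3. Print lines with Age greater than 25 (NR > 1 skips the header line, which would otherwise match because “Age” is compared with 25 as a string):

awk -F',' 'NR > 1 && $2 > 25 {print}' sample.txt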

Output:

Alice,30,92
Eve,35,88

4. Calculate and print the average Score (the header line is skipped and excluded from the count):

awk -F',' 'NR > 1 {sum+=$3} END {print "Average Score:", sum/(NR-1)}' sample.txt

Output:

Average Score: 83.75

5. Print lines with a specific pattern (containing “Alice”):

awk -F',' '/Alice/ {print}' sample.txt

Output:

Alice,30,92

6. Print lines longer than 12 characters:

awk 'length($0) > 12' sample.txt

Output:

Name,Age,Score

Arrays

Declaration and Initialization:

In awk, arrays are a fundamental data structure that allows you to store and manipulate data in a structured way.

Here are some key points about managing arrays in Awk:

  1. In Awk, arrays are associative, meaning you can use arbitrary indices (keys) for your array elements. You don’t need to declare the size of the array explicitly.
  2. Arrays are automatically created when you use them. There is no separate declaration needed.

Accessing Array Elements:

Indexing:

Array elements are accessed using their indices (keys).

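Awk arrays have no default or first index: a key can be any number or string, and referencing a key that does not exist creates it with an empty value. Functions such as split number the elements they create starting from 1.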

Example

names[1] = "Alice"
names[2] = "Bob"
print names[1]   # Outputs: Alice
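
Keys do not have to be numbers. The sketch below uses made-up names and ages as string keys, alongside an array filled by split for comparison (age and letters are arbitrary array names):

```shell
out=$(awk 'BEGIN {
  age["Alice"] = 30                     # string keys work like numeric ones
  age["Bob"] = 22
  n = split("x y z", letters, " ")      # split fills letters[1]..letters[n]
  print age["Alice"], age["Bob"], n, letters[1], letters[3]
}')

printf '%s\n' "$out"
# prints: 30 22 3 x z
```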

Iterating Through Arrays:

Using for Loop:

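You can iterate through all the indices of an array using a for loop. The order in which the indices are visited is not specified. The loop variable below is named key because index is the name of a built-in function and cannot be used as a variable.

Example:

for (key in array) {
    print key, array[key]
}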
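
Since the traversal order of this kind of loop is not guaranteed, one common approach is to pipe the output through sort when a stable order matters. A sketch with a made-up stock array:

```shell
out=$(awk 'BEGIN {
  stock["pear"] = 3; stock["apple"] = 7; stock["fig"] = 1
  for (key in stock) print key, stock[key]   # order is unspecified here
}' | sort)                                   # sort makes it predictable

printf '%s\n' "$out"
# prints:
# apple 7
# fig 1
# pear 3
```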

Array Built-in Functions:

delete:

Deletes an element from an array.

Example:

delete names[1]
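
To check whether an element still exists after a delete, the in operator can be used; unlike a plain reference, it does not create the element. A minimal sketch:

```shell
out=$(awk 'BEGIN {
  names[1] = "Alice"; names[2] = "Bob"
  delete names[1]
  if (1 in names) print "1 present"; else print "1 deleted"
  if (2 in names) print "2 present:", names[2]
}')

printf '%s\n' "$out"
# prints:
# 1 deleted
# 2 present: Bob
```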

Array Example

Let’s consider a simple example where we use an array to count the occurrences of each word in a text file:

# File: word_count.awk

{
    for (i = 1; i <= NF; i++) {
        word = $i
        wordCount[word]++
    }
}

END {
    for (word in wordCount) {
        print word, ":", wordCount[word]
    }
}
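
As a quick check of the idea, the same logic can be run inline on a couple of made-up lines. The output is piped through sort here only because the traversal order of the array is not guaranteed:

```shell
out=$(printf 'the cat\nthe dog the end\n' | awk '
  { for (i = 1; i <= NF; i++) wordCount[$i]++ }
  END { for (word in wordCount) print word, ":", wordCount[word] }
' | sort)

printf '%s\n' "$out"
# prints:
# cat : 1
# dog : 1
# end : 1
# the : 3
```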


AWK Complete Example Program:

Let’s consider a scenario where we have a file containing information about students' grades, and we want to calculate the average grade for each student.

The file, let’s call it grades.txt, may look like this:

Alice 90 59 88
Bob 85 88 63
Charlie 95 92 88
David 78 80 55
Eve 88 90 85

Now, let’s create an Awk program to calculate the average grade for each student:

This program does the following:

1. Initialization (BEGIN block):

1. Prints the header before processing any input.
# The BEGIN block runs once, before any input is read
BEGIN {
    print "-------Average Marks------"
    print "Name\tAvgMark"
}

2. Processing (main block):

1. Extracts the name and grade from each line.
2. Accumulates grades and counts for each student.
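# Main block: runs for every input line
{
    name = $1
    for (i = 2; i <= NF; i++)
    {
        # Accumulate the total and the count, so a student who appears
        # on more than one line is still averaged correctly
        totalMarks[name] += $i
        markCount[name]++
    }
}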

3. After Processing (END block):

1. Iterates through the students.
2. Calculates the average grade for each student.
3. Prints the name and average grade in a formatted way.
# The END block runs once, after all input has been processed
END {
    for(student in totalMarks)
    {
        average =  totalMarks[student] / markCount[student]
        printf "%s\t%.2f\n", student,average
    }
}

Full Program

calculate_average.awk file

# The BEGIN block runs once, before any input is read.
# The opening brace must be on the same line as BEGIN.
BEGIN {
    print "-------Average Marks------"
    print "Name\tAvgMark"
}

# Main block: runs for every input line
{
    name = $1
    for (i = 2; i <= NF; i++)
    {
        totalMarks[name] += $i
        markCount[name]++
    }
}

# The END block runs once, after all input has been processed.
# The opening brace must be on the same line as END.
END {
    for (student in totalMarks)
    {
        average = totalMarks[student] / markCount[student]
        printf "%s\t%.2f\n", student, average
    }
}

You can then run this Awk program on the grades.txt file using the following command:

awk -f calculate_average.awk grades.txt

Expected output (the row order may differ, because awk does not guarantee the traversal order of a for-in loop):

-------Average Marks------
Name	AvgMark
Alice	79.00
Charlie	91.67
Eve	87.67
Bob	78.67
David	71.00

This example demonstrates the use of Awk to perform calculations and generate meaningful output based on the input data.

You can adapt and modify Awk programs to suit various data processing tasks.
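
As one possible adaptation, when each student appears on exactly one line, the average can be computed per line without arrays, which also keeps the rows in input order. This is only a sketch: it assumes every line has a name followed by at least one mark, and it inlines the first two lines of grades.txt:

```shell
out=$(printf 'Alice 90 59 88\nBob 85 88 63\n' | awk '{
  total = 0
  for (i = 2; i <= NF; i++) total += $i       # sum the marks on this line
  printf "%s\t%.2f\n", $1, total / (NF - 1)   # NF - 1 marks follow the name
}')

printf '%s\n' "$out"
```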
