Data Processing at the Edge with Linux awk

Last June, data scientist and visualization expert Nick Strayer learned a valuable lesson in large-scale data processing: Sometimes even the latest “Big Data”-oriented software doesn’t work as well as what we already have in the Unix toolbox. Looking to parse 25TB of genetic data, he tried using tools such as Parquet and Spark, but in the end, he found the best solution was a combination of the R statistical programming language and the humble awk.

Sometimes we need to take large data sets and hack them into something easily analyzed by a human. Other times, that same data might need to be converted from one format to another, such as when you move from using one application to a new one.

awk is an awesome Linux command-line program for performing those types of tasks. It typically takes plain text as input and produces specifically formatted output. Of course, you could do that with common programming languages like Python or C, and if you like to develop lots of custom code, that’s one way to do it. But Linux already has awk, a built-in utility that’s programmable, so why not use it?

Today we’ll begin exploring awk with a few basic examples. Future articles will cover more advanced topics.

Start Simple

Say we want to print a big listing of files on our system and just show the file names. As always, Linux offers multiple solutions for this task. Use the standard “ls” command with the “-1” option (that’s the number one).

rob% ls -1

ls -1 (one) listing results

Simple, right?

The same could be done with awk, although we’ll need to do a little extra work. Again, get a list of the files, this time using the “ls” command with the “-l” option (the letter l) and redirect the output to a file.

rob% ls -l > rob2.txt

ls -l listing

I used the “head” command with the “-n” option for this screenshot, to display the first dozen lines of the rob2.txt file. The first line of the listing shows the number of 1k blocks used by the files in the directory. For our purposes, we can just delete it to clean up the file. Keep in mind that most real-world data conversions or transformations need a bit of manual intervention before everything can be automated. It’s just the nature of slicing and dicing data. I removed the line using the vi editor and re-saved the file for later use.

Notice the printout is much more complicated.
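As an alternative to deleting the “total” line by hand, awk itself can skip it. The following is a minimal sketch, assuming a listing whose first line is that summary line. The file names sample_ls.txt and sample_clean.txt and the listing contents are made up for illustration and are not the author's rob2.txt.

```shell
# Hypothetical sample standing in for the output of `ls -l`.
cat > sample_ls.txt <<'EOF'
total 24
-rw-r--r-- 1 rob rob 1204 Mar  3  2015 notes.txt
-rw-r--r-- 1 rob rob 2015 Jul 19  2016 report.txt
drwxr-xr-x 2 rob rob 4096 Nov 30  2014 archive
EOF

# NR is awk's running line (record) number. A pattern with no action
# prints the matching lines, so NR > 1 prints everything but line one.
awk 'NR > 1' sample_ls.txt > sample_clean.txt
cat sample_clean.txt
```

The same NR > 1 test can be placed in front of any of the print commands that follow, which avoids the editing step when the conversion has to be repeated.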

Start by running the file through awk, using its standard print syntax. This outputs each line in the rob2.txt file, much like the normal Linux “cat” command.

rob% awk '{print}' rob2.txt

Simple awk print result

We only want the file names, so we should use a field with the print statement. The file names are in field nine, and the fields are separated by whitespace (one or more spaces or tabs), which is awk’s default. Fields are indicated with the “$” sign.

rob% awk '{print $9}' rob2.txt

awk using only field nine result

That’s better. It looks just like the “ls -1” command output.
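To see how awk arrives at field nine, here is a small sketch that counts the fields on one listing line using the built-in NF variable. The two sample lines are hypothetical. The second one illustrates a limitation of this approach: a file name containing a space is split across several fields.

```shell
# Hypothetical `ls -l` lines; note the runs of two spaces in the date.
line='-rw-r--r-- 1 rob rob 1204 Mar  3  2015 notes.txt'
spaced='-rw-r--r-- 1 rob rob 88 Mar  3  2015 my file.txt'

# NF holds the number of fields on the current line. Runs of spaces
# count as a single separator, so the first line has nine fields.
count=$(printf '%s\n' "$line" | awk '{print NF}')
name=$(printf '%s\n' "$line" | awk '{print $9}')
echo "$count fields, name: $name"

# With a space in the file name, $9 holds only the first word.
partial=$(printf '%s\n' "$spaced" | awk '{print $9}')
echo "field nine of the second line: $partial"
```

For listings that may contain such names, field nine alone is not enough, so it is worth checking NF before trusting the output.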

A slightly more complex example is to print the file names, followed by the dates shown in the listing. Note that “ls -l” reports each file’s last-modification date, not its creation date.

rob% awk '{print $9,$6,$7,$8}' rob2.txt

awk using 4 fields result

Notice that we used other fields, and they can be placed in any order you like. We could easily have put the date in front of the file name, if we wanted. Field six is the month, field seven is the day and field eight is the year. (For recently modified files, “ls -l” typically shows the time of day in field eight instead of the year.)
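As a quick sketch of reordering, the command below puts the date in front of the file name. It also shows what the commas in the print statement do: they insert the output separator (a single space by default), while items written side by side are joined with nothing between them. The sample line is hypothetical.

```shell
# Hypothetical `ls -l` line.
line='-rw-r--r-- 1 rob rob 1204 Mar  3  2015 notes.txt'

# Commas between items: each item is separated by one space on output.
date_first=$(printf '%s\n' "$line" | awk '{print $6, $7, $8, $9}')
echo "$date_first"    # Mar 3 2015 notes.txt

# No comma: the two fields are concatenated directly.
glued=$(printf '%s\n' "$line" | awk '{print $9 $8}')
echo "$glued"         # notes.txt2015
```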

Sometimes files use commas or other characters for their field separator. Spreadsheets generate a comma separator when you export a .csv file (Comma Separated Values) from MS Excel or LibreOffice Calc. Use the “-F” option to specify the desired field separator in awk. Here’s an example of the command line you’d use for a comma.

rob% awk -F',' '{print $9,$6,$7,$8}' filename
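Here is a small runnable sketch of the “-F” option on comma-separated data. The column layout (name, month, day, year) and the values are invented for illustration, so the field numbers differ from the nine-field listing used elsewhere in this article.

```shell
# Hypothetical CSV data with a header row.
csv=$(printf '%s\n' \
  'name,month,day,year' \
  'notes.txt,Mar,3,2015' \
  'report.txt,Jul,19,2016')

# -F',' splits each line on commas; NR > 1 skips the header row.
out=$(printf '%s\n' "$csv" | awk -F',' 'NR > 1 {print $1, $4}')
echo "$out"
```

This prints each file name followed by its year. One caveat: splitting on a plain comma does not understand quoted values that themselves contain commas, so exports with such fields need more careful handling.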

You can also insert text into the printout. Adding a “Date =” label might be useful.

rob% awk '{print $9,"Date =",$6,$7,$8}' rob2.txt

awk using four fields and adding a date label

You Can Search, Too

awk has built-in search capabilities. Suppose we want to print out only the lines that contain “2015”. We could use the following.

rob% awk '/2015/ {print $9,"Date =",$6,$7,$8}' rob2.txt

awk searching for 2015 results

I verified the output with a quick grep for “2015” in the file.

rob% grep 2015 rob2.txt

grep for 2015 in rob2.txt results

Another way to search is by comparing a field to a value. We can compare field eight (the year) to “2015”.

rob% awk '{if ($8==2015) print $9,"Date =",$6,$7,$8}' rob2.txt

awk compare of field 8 to 2015 results
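The two search styles are not interchangeable. A /2015/ pattern matches the text anywhere on the line, while the field comparison looks only at field eight. The sketch below uses a hypothetical listing in which one file happens to be 2015 bytes in size, so the two commands give different results.

```shell
# Hypothetical listing: report.txt has a size of 2015 but a year of 2016.
listing=$(printf '%s\n' \
  '-rw-r--r-- 1 rob rob 1204 Mar  3  2015 notes.txt' \
  '-rw-r--r-- 1 rob rob 2015 Jul 19  2016 report.txt' \
  'drwxr-xr-x 2 rob rob 4096 Nov 30  2014 archive')

# Matches "2015" anywhere on the line, including the size column.
anywhere=$(printf '%s\n' "$listing" | awk '/2015/ {print $9}')

# Matches only when field eight is 2015.
year_only=$(printf '%s\n' "$listing" | awk '{if ($8==2015) print $9}')

echo "pattern match:    $(echo $anywhere)"
echo "field comparison: $year_only"
```

The pattern match returns both notes.txt and report.txt, while the field comparison returns only notes.txt. When the value you are looking for could appear in more than one column, the field comparison is the safer choice.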

Maybe you’d want to search for years greater than “2015.” Use a comparison there too.

rob% awk '{if ($8>2015) print $9,"Date =",$6,$7,$8}' rob2.txt

awk is field 8 greater-than 2015 results
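The comparison can also be written in front of the braces as a pattern, which is the more common awk idiom and behaves the same as the if statement. A brief sketch on a hypothetical listing:

```shell
# Hypothetical listing with years in field eight.
listing=$(printf '%s\n' \
  '-rw-r--r-- 1 rob rob 1204 Mar  3  2015 notes.txt' \
  '-rw-r--r-- 1 rob rob 2015 Jul 19  2016 report.txt' \
  'drwxr-xr-x 2 rob rob 4096 Nov 30  2014 archive')

# Form used in this article: an if statement inside the action.
with_if=$(printf '%s\n' "$listing" |
  awk '{if ($8>2015) print $9,"Date =",$6,$7,$8}')

# Equivalent form: the condition as a pattern before the action.
with_pattern=$(printf '%s\n' "$listing" |
  awk '$8 > 2015 {print $9,"Date =",$6,$7,$8}')

echo "$with_if"
echo "$with_pattern"
```

Both commands print the single line for report.txt. Keep in mind that the comparison is only meaningful when field eight really contains a year; a time of day such as 10:42 is not a number and will not compare the way a year does.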

One More Thing

I mentioned at the beginning of the article that awk is great for data conversion or translation.

Suppose we want to change the year from 2015 to 2016, when it occurs in field 8 (the year). It is as easy as replacing the “$8” field, in the print part, with “2016”.

rob% awk '{if ($8==2015) print $9,"Date =",$6,$7,"2016"}' rob2.txt

awk replacing 2015 with 2016 results

Although this is a trivial example, in principle it could be used in quite a few practical situations.
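One variation, offered as a sketch rather than as the method used above: instead of printing a literal “2016”, you can assign a new value to the field and print the whole line, so that every line is kept and only the matching ones change. The listing is hypothetical.

```shell
# Hypothetical listing with years in field eight.
listing=$(printf '%s\n' \
  '-rw-r--r-- 1 rob rob 1204 Mar  3  2015 notes.txt' \
  '-rw-r--r-- 1 rob rob 2015 Jul 19  2016 report.txt')

# First rule: rewrite field eight where it equals 2015.
# Second rule: print every line, changed or not.
updated=$(printf '%s\n' "$listing" | awk '$8 == 2015 {$8 = 2016} {print}')
echo "$updated"
```

A side effect worth knowing: when a field is assigned, awk rebuilds that line with single spaces between fields, so the modified line loses its original column alignment while untouched lines keep theirs.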

Going Further

awk has a lot of options and it can handle seriously large files. I usually use quick one-liners and send the results, on the fly, to my terminal or save them to a new file using standard Linux redirection (the > character). awk also has scripting capabilities, and those can get quite complex. We can investigate those details in a future story.

Data conversions and translations can be tedious. awk, while practically magical, does have a learning curve. With a little bit of practice, awk is certainly better than going through a data file manually.

Don’t forget that awk is available everywhere. You will find it on Linux servers, desktops, notebooks, Raspberry Pi boards and a variety of nano-Linux machines. Consider using awk for standalone, high-powered data processing at the edge.

TNS Managing Editor Joab Jackson contributed to this post. 

Contact Rob “drtorq” Reilly for consultation, speaking engagements and commissioned projects at [email protected] or 407-718-3274.

Source: InApps.net
