Linux Awk, Sort, and Uniq Log Searching

I have used awk to aid in log parsing for a few years now. A few years back, I managed HIGH traffic DNS and mail spooling servers at a telecom. The throughput made it very difficult for a homegrown network IDS box to keep track of everything. Occasionally, something would get through that caused latency on the cluster. Normally this was someone's DNS forwarding server that had hung and was blasting 5k queries a second at the DNS pool, which meant troubleshooting from the command line. The following is just an example of using these command strings to check an Apache log.

Move to the log directory:
# cd /IBM/HTTPServer/logs/

This server does a decent amount of traffic. The following command pulls out all the hits from Nov 11 (about 4 hours' worth, going by the system clock) and writes them to a file named novacces.
# cat access_log | grep "11/Nov/2008" > novacces
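To check how many hours the filtered file actually covers, a short sketch like the one below splits each line on colons, so that awk's $2 is the hour of the timestamp. It assumes the common log format shown later in this post and IPv4 client addresses (an IPv6 address contains colons and would shift the fields). The function name hits_per_hour is hypothetical.

```shell
# hits_per_hour FILE: count requests per hour of the day in an
# Apache-style access log. With -F: a line such as
#   127.x.x.92 - - [11/Nov/2008:00:00:03 +0000] "POST ..."
# splits so that $2 is the hour ("00") of the timestamp.
# Assumes IPv4 clients; IPv6 addresses contain colons of their own.
hits_per_hour() {
  awk -F: '{print $2}' "$1" | sort | uniq -c
}

# Usage: hits_per_hour novacces
```

Each output line is a hit count followed by the hour, so four or five lines of output would match the "about 4 hours" estimate.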

Now we look into novacces. The following command reads the file with cat and passes the data over to awk. Awk prints only the first field of each line using "print $1", which is the connecting IP address. From there, it's handed over to uniq -c, which collapses each run of consecutive identical lines into a single line and prefixes it with the number of lines in that run. That output is handed over to sort, using -g to compare according to general numerical value and -r to reverse the order. Reversing puts the IPs with the highest counts first, so you don't have to scroll through 100 pages. Besides, who cares about the one hitters?

# cat novacces | awk '{print $1}' | uniq -c | sort -gr | more
170 217.x.x.90
142 172.x.x.124
120 172.x.x.124
92 172.x.x.124
89 172.x.x.124
88 172.x.x.124
87 172.x.x.124
85 217.x.x.90
85 172.x.x.124
82 172.x.x.124
81 51.x.x.186

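I clipped the output here since I am only concerned with the higher numbers. This query actually produced about 700 lines of output. Notice that 172.x.x.124 shows up on several lines: uniq -c only merges adjacent identical lines, and the log is in time order rather than grouped by IP, so every separate burst of hits from one address gets its own count. To get a single total per IP, put a sort in front of uniq:
# cat novacces | awk '{print $1}' | sort | uniq -c | sort -gr | more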
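Because uniq -c only merges adjacent identical lines, the same address can appear several times in a listing like the one above. The small sketch below shows the difference on made-up input; the addresses 10.0.0.1 and 10.0.0.2 and the function names count_runs and count_totals are hypothetical, not taken from the log.

```shell
# Hypothetical sample: the first field of a few log lines, in time order.
sample='10.0.0.1
10.0.0.1
10.0.0.2
10.0.0.1
10.0.0.2
10.0.0.1'

# count_runs: uniq -c on unsorted input. Only adjacent duplicates are
# merged, so each IP can appear on several lines.
count_runs() {
  printf '%s\n' "$1" | uniq -c | sort -gr
}

# count_totals: sorting first groups identical IPs together, so
# uniq -c produces exactly one total per IP.
count_totals() {
  printf '%s\n' "$1" | sort | uniq -c | sort -gr
}

echo "Unsorted input:"
count_runs "$sample"
echo "Sorted input:"
count_totals "$sample"
```

The unsorted version prints five lines for only two addresses, while the sorted version prints one line per address (4 for 10.0.0.1, 2 for 10.0.0.2).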

Here is a sample of what the file actually contained:
# more novacces
127.x.x.92 - - [11/Nov/2008:00:00:03 +0000] "POST /servlet/heartbeat HTTP/1.1" 200 76
255.x.x.109 - - [11/Nov/2008:00:00:05 +0000] "POST /servlet/heartbeat HTTP/1.1" 200 76
255.x.x.189 - - [11/Nov/2008:00:00:15 +0000] "POST /servlet/heartbeat HTTP/1.1" 200 76

Note: In awk, the field number in "print $1" can be increased ($2, $3, and so on) to print other fields from each line. Also, remember that all IPs are always changed in my posts, so I do know that 255.x.x.x is not a valid IP address. Awk, uniq, more, and sort are all standard utilities provided by most *nix installations.
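Following on from that note: in the whitespace-separated format shown above, $7 is the requested path and $9 is the HTTP status code. The sketch below is a variation, not the command used earlier in this post: it tallies any field with an awk associative array, so the input does not need sorting before counting, and only the final sort orders the output. The function name count_field is hypothetical.

```shell
# count_field N FILE: print how many times each value of field N
# appears in FILE, highest count first. The awk array keeps a running
# total per value, so no sort is needed before counting.
count_field() {
  awk -v f="$1" '{ count[$f]++ } END { for (v in count) print count[v], v }' "$2" |
    sort -nr
}

# Examples (field numbers assume the common log format shown above):
#   count_field 1 novacces   # hits per client IP
#   count_field 9 novacces   # hits per HTTP status code
#   count_field 7 novacces   # hits per requested path
```

The array approach holds one entry per distinct value in memory, which is usually small for IPs or status codes, and it avoids sorting the whole file twice.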


~ by Kevin Goodman on November 11, 2008.
