To Splunk or Not To Splunk

Hmm. This is a decision I have been trying to make for about two weeks now. I have the trial software installed on a CentOS server, and man, was it easy! I just had to install about three package dependencies and the Splunk RPM, and everything worked fine from there. The only downside is that you have to get an “Enterprise Trial License” through their online system before you get user management and web-portal password protection. The license lasts for 30 days.

Everything runs great on this box, especially given the very chatty SNMP and syslog traffic of the network I manage. The only problem I have with Splunk is SNMP indexing. Their site says to create and use a FIFO file under /var. That’s fine, but a comment on the Splunk documentation page says this approach has been deprecated and that there is a new method for logging SNMP. OK, where? I could not find a link to it on their site within 10 minutes of looking, and the comment section does not link to it either!

I currently have about 60 servers remotely syslogging to Splunk, as well as about 10 Cisco switches, including Cisco MDS fiber switches. The example searches and the Splunk “Applications” have so far not been as hands-off as I wanted, which has left me writing a lot of my own saved searches. Saved searches can be run on a schedule via cron or Splunk’s built-in scheduler. I have some created for Linux port downs, Cisco interface drops, etc. Alerting has worked great, sending alerts via email and SNMP puts. No complaints at all there! I will add some more details tomorrow when they are in front of me.

http://splunk.com

Below are a few rules, or “searches”, that I have written and tailored to my colocation networks. Of course, xxxx is replaced with the actual log name, and these run on a schedule. I use the startminutesago modifier so that each run does not parse the logs from the beginning of time and pull in data I do not want. “maxspan” tells the transaction command to group only events that fall within 5 minutes of each other, so together the two basically limit each run to the last 5 minutes.
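To make the time window concrete, here is a rough Python sketch of what startminutesago=5 works out to for a given run time. This is only my illustration of the arithmetic, not anything Splunk ships: search_window and in_window are made-up helper names, the inclusive boundaries are an assumption, and the epoch value is the now= timestamp from the alert email further down.

```python
from datetime import datetime, timedelta, timezone


def search_window(now_epoch, start_minutes_ago):
    """Return (earliest, latest) as UTC datetimes for a run at now_epoch.

    Hypothetical helper: mirrors the idea of startminutesago, nothing more.
    """
    latest = datetime.fromtimestamp(now_epoch, tz=timezone.utc)
    earliest = latest - timedelta(minutes=start_minutes_ago)
    return earliest, latest


def in_window(event_epoch, now_epoch, start_minutes_ago):
    """True if the event falls inside the window (inclusive edges assumed)."""
    earliest, latest = search_window(now_epoch, start_minutes_ago)
    event = datetime.fromtimestamp(event_epoch, tz=timezone.utc)
    return earliest <= event <= latest


if __name__ == "__main__":
    # now= value copied from the alert email shown later in this post.
    earliest, latest = search_window(1224079500, 5)
    print(earliest.isoformat(), "->", latest.isoformat())
```

Anything older than five minutes at run time is simply outside the window, which is why a search scheduled every five minutes does not keep re-alerting on old events.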

Searches:

Cisco MDS Fiber Switch SANOS
source="/var/log/xxxx" IF_DOWN_LINK_FAILURE startminutesago=5 | transaction maxspan=5m

Cisco Ethernet Switch
source="/var/log/xxxx" punct="__::_..._:___::._:_%--:__//,____" startminutesago=5 | transaction maxspan=5m

Linux Ethernet Down
source="/var/log/xxxx" link down startminutesago=5 | transaction maxspan=5m

Cisco Failed Login
source="/var/log/xxxx" Authentication failure for illegal user startminutesago=5 | transaction maxspan=5m

Cisco Config Save
source="/var/log/xxxx" CFGWRITE startminutesago=5 | transaction maxspan=5m

Cisco Config Mode
source="/var/log/xxxx" Configuring console from pts startminutesago=5 | transaction maxspan=5m

Here is an example email sent to me by Splunk’s built-in alerting system:

Name: 'Cisco Config Mode'

Query Terms: 'now=1224079500 source="/var/log/messages" Configuring console from pts startminutesago=5 | transaction maxspan=5m'

Alert was triggered because of: 'Saved Search [Cisco Config Mode]: number of events(1) greater than 0'

Attached to the email is a CSV file that contains all the fields needed for auditing, as well as the event that triggered the alert:

Oct 15 10:00:32 192.x.x.x : 2008 Oct 15 06:49:00 UTC: %VSHD-5-VSHD_SYSLOG_CONFIG_I: Configuring console from pts/1 192.x.x.x
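If you want to pull the Cisco message fields out of a line like the one above for your own audit scripts, a rough Python sketch follows. It assumes the usual Cisco %FACILITY-SEVERITY-MNEMONIC: layout, and parse_cisco_message is a made-up helper name that has nothing to do with Splunk itself.

```python
import re

# Matches the Cisco-style tag "%FACILITY-SEVERITY-MNEMONIC: text".
# Assumption: facility and mnemonic use only A-Z, 0-9 and underscores.
CISCO_MSG = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>\d)-(?P<mnemonic>[A-Z0-9_]+): "
    r"(?P<message>.*)"
)


def parse_cisco_message(line):
    """Return the tag fields as a dict, or None if the line has no Cisco tag."""
    match = CISCO_MSG.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["severity"] = int(fields["severity"])
    return fields


if __name__ == "__main__":
    # Sample line copied from the alert CSV above (addresses already masked).
    sample = (
        "Oct 15 10:00:32 192.x.x.x : 2008 Oct 15 06:49:00 UTC: "
        "%VSHD-5-VSHD_SYSLOG_CONFIG_I: Configuring console from pts/1 192.x.x.x"
    )
    print(parse_cisco_message(sample))
```

Lines without a % tag, such as plain Linux syslog entries, come back as None, so the same helper can be run over a mixed log without blowing up.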

I will provide screenshots of Splunk in action at a later date.

~ by Kevin Goodman on October 21, 2008.
