Sunday, August 17, 2014

Using Logstash to process Nagios performance data and send it to Graphite

Nagios is a very powerful tool that lets you monitor various parts of your infrastructure. It collects a lot of information that can be used to learn more about your hosts and services.

Logstash is a tool to process and pipe all types of events and logs. It has many output plugins, one of which is Graphite.

I have found it difficult to find specific examples of doing just this by googling, so I thought I would put my results here so that they might help others adopt these tools more easily.

The Nagios Section

The Nagios-specific settings I use to process the performance data that Nagios collects are:

process_performance_data=1

host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata


host_perfdata_file=/var/log/nagios/host-perfdata
service_perfdata_file=/var/log/nagios/service-perfdata

host_perfdata_file_template=[HOSTPERFDATA]\t$TIMET$\t$HOSTNAME$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$
service_perfdata_file_template=[SERVICEPERFDATA]\t$TIMET$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$


These settings tell Nagios to process host and service performance data. They produce the /var/log/nagios/service-perfdata.out file with the fields described in the template.
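The process-host-perfdata and process-service-perfdata commands referenced above must also exist as command definitions in your Nagios object configuration. Since the *_perfdata_file templates already do the writing, the commands can be no-ops; a minimal sketch (the /bin/true trick is an assumption here, not taken from a running config):

```
define command {
        command_name    process-host-perfdata
        command_line    /bin/true
}

define command {
        command_name    process-service-perfdata
        command_line    /bin/true
}
```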

A sample line for the Current Load service will look like:

1408303233      localhost       Current Load    OK      1       HARD    0.004   0.103   OK - load average: 0.59, 0.51, 0.54     load1=0.590;5.000;10.000;0; load5=0.510;4.000;6.000;0; load15=0.540;3.000;4.000;0;
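To see what Logstash will have to do with such a line, here is a small Python sketch that splits the tab-separated fields and pulls out the load averages, mirroring what the grok filter below does (the field positions follow the sample line above):

```python
import re

# One line from service-perfdata, tab-separated per the Nagios template
# (fields follow the sample line above).
line = ("1408303233\tlocalhost\tCurrent Load\tOK\t1\tHARD\t0.004\t0.103\t"
        "OK - load average: 0.59, 0.51, 0.54\t"
        "load1=0.590;5.000;10.000;0; load5=0.510;4.000;6.000;0; "
        "load15=0.540;3.000;4.000;0;")

fields = line.split("\t")
timestamp, server, service = fields[0], fields[1], fields[2]

# Extract the three load averages from the plugin output, much like the
# grok "load average:" pattern in the Logstash config.
m = re.search(r"load average: ([\d.]+), ([\d.]+), ([\d.]+)", line)
load_1m, load_5m, load_15m = m.groups()

print(server, service, load_1m, load_5m, load_15m)
# -> localhost Current Load 0.59 0.51 0.54
```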

We can use Logstash to pick up this data, parse it into various fields, and send the appropriate fields to Graphite.

The Logstash Section
An example Logstash configuration looks like this:

input {
        file {
                type => "serviceperf"
                path => "/var/log/nagios/service-perfdata.out"
        }
}
filter {
        if [type] == "serviceperf" {
                grok {
                        match => [ "message" , "%{NUMBER:timestamp}\t%{HOST:server}\tCurrent Load\t%{WORD:state}\t%{GREEDYDATA} load average: %{NUMBER:load_avg_1m}, %{NUMBER:load_avg_5m}, %{NUMBER:load_avg_15m}"]
                        add_tag => ["cpu"]
                }
                date {
                        match => [ "timestamp", "UNIX" ]
                }
        }
}
output {
        if  "cpu" in [tags] {
                graphite {
                        host => "localhost"
                        port => 2003
                        metrics => [ "%{server}.load_avg_1m", "%{load_avg_1m}",
                                "%{server}.load_avg_5m", "%{load_avg_5m}",
                                "%{server}.load_avg_15m", "%{load_avg_15m}" ]
                }
        }
}
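For reference, the graphite output speaks Carbon's plaintext protocol: one "metric value timestamp" line per metric, sent to port 2003. A minimal Python sketch of the lines the output section above would produce (the graphite_line helper is hypothetical, for illustration only):

```python
import time

def graphite_line(metric, value, ts=None):
    # Carbon's plaintext protocol: "metric.path value timestamp\n"
    ts = int(ts if ts is not None else time.time())
    return "%s %s %d\n" % (metric, value, ts)

# The three metrics the output section above would emit for localhost:
for name, val in [("load_avg_1m", 0.59),
                  ("load_avg_5m", 0.51),
                  ("load_avg_15m", 0.54)]:
    print(graphite_line("localhost.%s" % name, val, 1408303233), end="")
```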



The configuration has three sections:
1. The input section reads the file that Nagios writes the performance data to.
2. The filter section matches each plain line of text and converts it into an event with key-value fields, from which we capture the desired values.
3. The output section sends the server name and the captured values through carbon-cache to Graphite.

I am not including the setup of the individual components in this blog post, as each of them could be a whole post of its own. If comments raise questions, I may write about those as well.

A good tool for building grok filter definitions is the Grok Debugger. You paste in the line you want to process and incrementally build a filter that meets your needs. In this example I wanted to extract the load average fields from the Nagios check and plot them through Graphite, which looks like the image below.

Sample graphite graph
I will try to write about how to setup each component in the future.

3 comments:

Anonymous said...

This is more easily tackled by installing collectd and using that input in parallel with nagios for monitoring. Then you can create charts in Kibana

Anonymous said...

Good writing, but I think you are not parsing performance data. Parsing perfdata is better to have a common filter for different types of service checks.

Unknown said...

Thanks for your post. Could you please guide me how to send the output to Elasticsearch instead of Graphite?