Oct 29, 2013

Log Collection and Analysis with logstash (and Redis, ElasticSearch & Kibana)

In this post I want to show you how to set up a decent and complete infrastructure for centralized log management based on logstash, demonstrated on Apache Tomcat logs on Windows. This post is adapted from a Strata Conference 2013 tutorial by Israel Ekpo and the official logstash getting started guide.

Logstash is a lightweight and simple-to-use toolchain to collect, transform and analyze log data. To do so, logstash requires a buffer to collect log events (Redis) and a fulltext search engine as the final destination for the collected events (ElasticSearch). Assembled, the whole thing looks like this:

logstash architecture & data flow

A shipper detects new log events on its node and dispatches them to a central broker. The indexer polls the broker for new events and pushes them to the storage & search facility. The web interface is then able to query the storage & search facility in many ways. So how do you set up and wire all this together? These are the four necessary steps:

1) Basic setup

Sample application to produce logs
If you don't have your own Tomcat webapp to produce sample logs, you can use this one: https://github.com/israelekpo/graph-service. Just check out the code, re-configure the path to store the data, build it with Maven and deploy the WAR to a Tomcat installation (the path to the installation is referred to as <catalina-home>). Then start up Tomcat and perform some requests. Further details are provided on the mentioned GitHub page. If you don't want to use cURL, you can also help yourself with the Chrome Advanced REST Client.
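To make Tomcat write some access-log entries, a couple of requests can be fired with cURL. The context path and port below are assumptions (default Tomcat port, WAR deployed under its artifact name) - adjust them to your deployment:

```
curl -i http://localhost:8080/graph-service/
curl -i http://localhost:8080/graph-service/does-not-exist
```

The second request deliberately produces a 404, which is handy later to verify that error responses show up in the collected access logs too.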

Download the logstash jar (logstash-<version>-flatjar.jar) from here and place it in any folder (referred to as <logstash-home>).

Download the ElasticSearch archive from here and uncompress it into a folder (referred to as <elasticsearch-home>).

Download the Redis archive from here and uncompress it into a folder (referred to as <redis-home>).

It's that easy.

2) The shipper
Now we have to set up the agent which collects log data on a node and puts it into Redis. All you need to do is write a configuration file (e.g. <somewhere>/logstash-agent.conf, referred to as <agent-config-filepath>) with the following content:

input {
  file {
    type => "CATALINA"
    path => "<catalina-home>/logs/catalina.log"
  }
  file {
    type => "ACCESS"
    path => "<catalina-home>/logs/localhost_access_log*.txt"
  }
}

output {
  stdout { codec => json }
  # host assumes Redis running locally - adjust if necessary
  redis { host => "127.0.0.1" data_type => "list" key => "logstash" }
}

This configuration file makes logstash collect log entries from the Tomcat engine log and from all Tomcat access logs (matched by a glob). By default, logstash performs a tail on each log file and processes only new entries. If you want to initially read the whole file (and then perform a tail), you have to add start_position => "beginning" to the file block.
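For example, to re-read the engine log from the start, the first file block from above would become (same placeholders as before):

```
file {
  type => "CATALINA"
  path => "<catalina-home>/logs/catalina.log"
  start_position => "beginning"
}
```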

How to test it:

  1. Startup Redis: <redis-home>\redis-server --loglevel verbose
  2. Startup a logstash agent: java -jar <logstash-home>\logstash-<version>-flatjar.jar agent -f "<agent-config-filepath>"
  3. Make the app produce some logs.
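With codec => json, each event is printed to stdout (and stored in Redis) as a JSON document. The exact set of fields depends on your logstash version; an access-log event looks roughly like this (all values are made up for illustration):

```
{
  "message": "127.0.0.1 - - [29/Oct/2013:10:15:02 +0100] \"GET /graph-service/ HTTP/1.1\" 200 512",
  "@timestamp": "2013-10-29T09:15:02.000Z",
  "@version": "1",
  "type": "ACCESS",
  "host": "mynode",
  "path": "<catalina-home>/logs/localhost_access_log.2013-10-29.txt"
}
```

The type field carries the value from the corresponding file block, which is what you will filter on later.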

On the Redis console you should then see the logstash agent connect:

To be really sure, you can open the Redis command line client (on Windows: <redis-home>\redis-cli.exe) and check whether the appropriate data is inside Redis:
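Two standard Redis commands are enough for that check. The logstash redis output pushes events onto a list, so LLEN shows how many events are queued and LRANGE peeks at entries; the key name matches the key => "logstash" setting from the agent configuration:

```
LLEN logstash
LRANGE logstash 0 0
```

Note that once the indexer (next step) is running, it drains this list, so an LLEN of 0 is not necessarily an error then.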

3) The indexer
The next step is to set up the indexer which transfers the log data from Redis into ElasticSearch. Same as above, the only thing you have to do is write a configuration file (e.g. <somewhere>/logstash-indexer.conf, referred to as <indexer-config-filepath>) with the following content:

input {
  redis {
    # host assumes Redis running locally - adjust if necessary
    host => "127.0.0.1"
    data_type => "list"
    key => "logstash"
    codec => json
  }
}

output {
  stdout { codec => json }
  elasticsearch {
    # host assumes ElasticSearch running locally - adjust if necessary
    host => "127.0.0.1"
  }
}

How to test it:

  1. Startup ElasticSearch: <elasticsearch-home>\bin\elasticsearch.bat
  2. Fire up a logstash agent with the right configuration: java -jar <logstash-home>\logstash-<version>-flatjar.jar agent -f "<indexer-config-filepath>"

Now the whole toolchain should be working. To test whether the log data really reaches ElasticSearch you can use its REST API or, better, let the Sense Chrome extension help you. Just install & open it and run the default query. You should see your logs as indexed documents:

Sense output
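If you prefer plain cURL over Sense, the same check works against the ElasticSearch REST API directly. This assumes ElasticSearch runs locally on its default port 9200; logstash writes into daily logstash-YYYY.MM.DD indices, so a wildcard index pattern covers them all:

```
curl "http://localhost:9200/logstash-*/_search?q=type:ACCESS&pretty=true"
```

The response should list your access-log events as hits, including the fields shown in the Sense screenshot above.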

4) The web interface
And now it gets really professional: we also start a web user interface to analyze and visualize the collected log entries. It's that easy:

java -jar <logstash-home>\logstash-<version>-flatjar.jar web

Just point your browser to the web interface (by default it listens on port 9292, i.e. http://localhost:9292) and start searching logs. The web interface is called 'kibana' - you can learn more about kibana at http://kibana.org.

Kibana user interface

