
Nariman Mani, P.Eng., PhD Computer and Software Engineering

    Introduction to Logstash - Unlocking the Power of Log Data

    April 7, 2024

    Dive into the world of Logstash, a key component of the ELK Stack (Elasticsearch, Logstash, Kibana), designed to simplify and enhance the way you handle log data. This guide aims to equip you with the knowledge to set up Logstash, build your first data processing pipeline, and explore its integration with monitoring tools like Datadog, laying a solid foundation for advanced data analysis and insight generation.

    Introduction to Logstash

    At its core, Logstash is an open-source, server-side data processing pipeline. Its role within the ELK Stack is pivotal, acting as the conduit through which data is ingested, filtered, and enhanced before being stored in Elasticsearch. But why is Logstash so critical for data analysis and monitoring?

    The value of Logstash lies in its ability to handle diverse data sources and formats, making it an indispensable tool for modern log management. It's not just about collecting logs; it's about making sense of them. Logstash's powerful filtering and enrichment capabilities allow you to transform raw data into structured, queryable information that drives insights and operational intelligence.

    Why Logstash Stands Out

    • Flexibility and Compatibility: Logstash can process data from a myriad of sources, including log files, metrics, web applications, data stores, and cloud services. This flexibility ensures that Logstash can fit into nearly any data processing workflow.
    • Robust Processing Features: With a rich set of input, filter, and output plugins, Logstash allows for detailed customization of the data processing pipeline. You can enrich your data with additional fields, remove unnecessary information, and even transform data formats on the fly.
    • Scalability and Resilience: Designed to handle peak loads and recover from temporary failures, Logstash ensures that your data processing is both scalable and reliable. Features like persistent queues and dead letter queues help manage data flow and integrity.
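
    For example, the persistent queue and dead letter queue mentioned above are enabled in Logstash's settings file (logstash.yml) rather than in the pipeline configuration. A minimal sketch, with illustrative size limits you would tune for your own environment:

    # logstash.yml
    queue.type: persisted          # buffer events on disk instead of in memory
    queue.max_bytes: 1gb           # cap the on-disk queue size
    dead_letter_queue.enable: true # capture events that Elasticsearch rejects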

    Getting Started with Logstash

    Installation

    1. Prerequisite: Ensure Java 8 or Java 11 is installed on your system.
    2. Download Logstash: Visit the official Elastic download page and choose the version that matches your operating system.
    3. Install: Follow the provided instructions for your OS.
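
    After installation, you can confirm that both the Java runtime and Logstash itself are available. The commands below assume you run them from the Logstash installation directory:

    # Confirm a supported Java version is on the path
    java -version

    # Print the installed Logstash version
    bin/logstash --version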

    Configuration Basics

    A Logstash configuration file has three parts: input, filter, and output.

    input {
      # Your input plugin configuration
    }
    
    filter {
      # Your filter plugin configuration
    }
    
    output {
      # Your output plugin configuration
    }
    

    Your First Pipeline

    Let's create a simple pipeline to process system logs.

    1. Input (log file):
    input {
      file {
        path => "/var/log/system.log"
        start_position => "beginning"
      }
    }
    
    2. Filter (parse date and message):
    filter {
      grok {
        # Split each line into a timestamp and the remaining message text
        match => { "message" => "%{TIMESTAMP_ISO8601:log_timestamp} %{GREEDYDATA:message}" }
        overwrite => [ "message" ]
      }
      date {
        # Use the parsed timestamp as the event's @timestamp
        match => [ "log_timestamp", "ISO8601" ]
      }
    }
    
    3. Output (to Elasticsearch):
    output {
      elasticsearch {
        hosts => ["localhost:9200"]
        index => "system-logs-%{+YYYY.MM.dd}"
      }
    }
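
    With the three sections saved into a single pipeline file (named logstash-simple.conf here, the same file name used in the examples later in this post), you can check the syntax before starting the pipeline:

    # Validate the configuration and exit without processing any events
    bin/logstash -f logstash-simple.conf --config.test_and_exit

    # Run the pipeline, reloading it automatically when the file changes
    bin/logstash -f logstash-simple.conf --config.reload.automatic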
    

    A More Practical Example: Generating and Sending Logs to Logstash

    Understanding how logs are generated and sent to Logstash is essential for setting up a robust log management solution. This section will cover a basic example of log generation and how these logs can be configured to be sent to Logstash for further processing.

    Generating Logs

    Logs can be generated by various sources: web servers, applications, databases, and operating systems, to name a few. For this example, let's consider a web server running Apache. Apache generates access and error logs that can provide valuable insights into your web server's operations.

    Apache logs are typically stored in /var/log/apache2/access.log and /var/log/apache2/error.log on a Linux system. These files are continuously updated as the web server processes requests.

    Configuring Logstash to Collect Apache Logs

    To send these logs to Logstash, you'll first need to configure Logstash to ingest these log files. This is done through the input section of the Logstash configuration file.

    1. Logstash Configuration:

    Here’s how you can modify the input section of your logstash-simple.conf to collect Apache access logs:

    input {
      file {
        path => "/var/log/apache2/access.log"
        start_position => "beginning"
      }
    }
    

    This configuration tells Logstash to watch the file at the specified path and, because start_position is set to "beginning", to read it from the start the first time the file is discovered rather than only tailing new lines.
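
    Note that the file input records how far it has read in a "sincedb" file, so on subsequent runs Logstash resumes where it left off rather than re-reading the whole log. When experimenting and you want the file re-read on every run, a common trick is to discard that bookkeeping, as in this sketch:

    input {
      file {
        path => "/var/log/apache2/access.log"
        start_position => "beginning"
        sincedb_path => "/dev/null"   # do not persist read positions (useful only for testing)
      }
    }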

    2. Filtering and Parsing Logs:

    To make the most out of your logs in Logstash, you can use filters to parse and transform the log data. For Apache logs, the grok filter is commonly used to parse the log entries into structured fields.

    Add this filter section to your logstash-simple.conf:

    filter {
      grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
      }
    }
    

    This grok pattern, %{COMBINEDAPACHELOG}, is designed to parse the typical format of Apache access logs, breaking down each log entry into fields like client IP, request path, HTTP response code, and more.
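
    To make this concrete, here is what a typical combined-format access log entry looks like (an invented request, shown only for illustration), followed by a few of the fields the filter extracts. The field names below are the classic pattern names; newer Logstash releases with ECS compatibility enabled store the same data under ECS-style names instead.

    192.168.1.10 - - [07/Apr/2024:12:34:56 +0000] "GET /index.html HTTP/1.1" 200 1024 "http://example.com/" "Mozilla/5.0"

    • clientip: 192.168.1.10
    • verb: GET
    • request: /index.html
    • response: 200
    • bytes: 1024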

    3. Sending Logs to Elasticsearch and Datadog:

    You can now extend the output section of your Logstash configuration to send the processed logs to Elasticsearch (explained previously), to Datadog (explained below), or to both.
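
    For instance, an output section that keeps an Elasticsearch destination alongside the Datadog output described later in this post could look like this sketch (the index name apache-logs-%{+YYYY.MM.dd} is just an illustrative choice):

    output {
      elasticsearch {
        hosts => ["localhost:9200"]
        index => "apache-logs-%{+YYYY.MM.dd}"  # illustrative index name for the Apache logs
      }
      # The Datadog http output described below is added here as a second destination.
    }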

    Testing Your Configuration

    After setting up your Logstash configuration to collect, parse, and send Apache logs, start Logstash with your configuration file:

    bin/logstash -f logstash-simple.conf
    

    Monitor the Logstash logs for any errors and ensure that your Apache logs are being processed and sent to your specified outputs.
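
    If Elasticsearch is one of your outputs, a quick way to confirm that documents are arriving is to list the indices Logstash has created. This assumes Elasticsearch is reachable on localhost:9200 without authentication:

    # List indices; a new index should appear once Logstash has shipped events
    curl -s 'http://localhost:9200/_cat/indices?v'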

    Extending Your Logstash Pipeline to Datadog

    After setting up a basic Logstash pipeline for processing and sending logs to Elasticsearch, you might also want to explore how to integrate Logstash with other analytics and monitoring tools. Datadog is a powerful service for monitoring applications and services, and it offers an HTTP API for log management. Here, we'll guide you through sending your logs from Logstash to Datadog using the HTTP API.

    Prerequisites

    • A Datadog account. If you don't have one, you can sign up for a free trial.
    • An API key from Datadog. You can find this in the Datadog UI under Integrations > APIs.

    Configuring Logstash to Send Logs to Datadog

    1. Output Plugin Configuration:

    To send logs to Datadog, you'll use the http output plugin of Logstash. This plugin allows Logstash to make HTTP requests to a specified URL, which in this case, will be Datadog's log intake API.

    Add the following output configuration to your Logstash pipeline (logstash-simple.conf), replacing YOUR_DATADOG_API_KEY with your actual Datadog API key:

    output {
      http {
        format => "json"
        http_method => "post"
        url => "https://http-intake.logs.datadoghq.com/v1/input/YOUR_DATADOG_API_KEY?ddsource=logstash&service=my_application"
        headers => {
          "Content-Type" => "application/json"
        }
        retry_failed => true
      }
    }
    

    This configuration does the following:

    • format: Specifies the encoding of the payload. json is required by Datadog.
    • http_method: The HTTP method, post, for sending the data.
    • url: The Datadog HTTP intake URL with your API key and optional parameters (ddsource and service) for better organizing your logs within Datadog.
    • headers: Sets the Content-Type header to application/json, as required by Datadog.
    • retry_failed: Enables automatic retries if the request fails.

    2. Running Your Extended Pipeline:

    With the Datadog output configuration in place, run your Logstash pipeline as before:

    bin/logstash -f logstash-simple.conf
    

    Logstash will now process your logs and send them to both Elasticsearch and Datadog, leveraging the power of both platforms for monitoring and analysis.
