Nariman Mani, P.Eng., PhD Computer and Software Engineering

[Figure: Network diagram for Logstash deployment in Kubernetes]

Prerequisites

A running Kubernetes cluster.
kubectl installed and configured to communicate with your cluster.
Basic familiarity with Kubernetes concepts like pods, deployments, and ConfigMaps.

Step 1: Create a Logstash Configuration

First, you need to define your Logstash configuration. This involves specifying the input, filter, and output sections of your Logstash pipeline. For Kubernetes, a common approach is to collect logs using a file or container log input, process them as needed, and then send them to Elasticsearch.

Save the following ConfigMap as logstash-configmap.yaml. This example pipeline reads Nginx access logs from a file path (populated from a mounted volume), parses each line with the grok filter, and sends the result to Elasticsearch. While testing, you can replace the elasticsearch output with stdout to see the parsed events in the pod logs.

apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-config
data:
  logstash.yml: |
    input {
      file {
        path => "/usr/share/logstash/logs/nginx-access.log"
        start_position => "beginning"
        sincedb_path => "/dev/null"
      }
    }
    filter {
      grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
      }
    }
    output {
      elasticsearch {
        hosts => ["http://elasticsearch:9200"]
        index => "nginx-logs-%{+YYYY.MM.dd}"
      }
    }

This configuration is basic and intended for demonstration. Adjust the input and output to suit your specific logging architecture and requirements. In the hosts field under output.elasticsearch, replace "http://elasticsearch:9200" with your actual Elasticsearch service URL.
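
While testing, it can help to print parsed events rather than index them. The following is a minimal sketch of the same ConfigMap with the output switched to stdout. It keeps the name logstash-config and the logstash.yml key so the rest of the setup is unchanged. The rubydebug codec is a choice made here for readability, not something the setup requires.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-config
data:
  logstash.yml: |
    input {
      file {
        path => "/usr/share/logstash/logs/nginx-access.log"
        start_position => "beginning"
        sincedb_path => "/dev/null"
      }
    }
    filter {
      grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
      }
    }
    output {
      # Debug variant: print each parsed event to the container's stdout
      # instead of sending it to Elasticsearch.
      stdout {
        codec => rubydebug
      }
    }
```

With this variant applied, the parsed events appear in the output of kubectl logs for the Logstash pod.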

Step 2: Deploy Logstash in Kubernetes

Next, you'll deploy Logstash in your Kubernetes cluster using a deployment configuration. You'll reference the ConfigMap created in the previous step to provide Logstash with its configuration.

Create a file named logstash-deployment.yaml with the following content:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: logstash
spec:
  replicas: 1
  selector:
    matchLabels:
      app: logstash
  template:
    metadata:
      labels:
        app: logstash
    spec:
      containers:
      - name: logstash
        image: docker.elastic.co/logstash/logstash:7.9.3
        volumeMounts:
        # The ConfigMap key holds pipeline configuration, so mount it into the
        # pipeline directory rather than over the logstash.yml settings file.
        - name: config-volume
          mountPath: /usr/share/logstash/pipeline/logstash.conf
          subPath: logstash.yml
        - name: log-volume
          mountPath: /usr/share/logstash/logs
      # Helper container: copies the Nginx access log into the shared volume.
      - name: log-copier
        image: busybox
        command: ["/bin/sh"]
        args: ["-c", "while true; do cp /var/log/nginx/access.log /logs/nginx-access.log; sleep 10; done"]
        volumeMounts:
        - name: nginx-logs
          mountPath: /var/log/nginx
          readOnly: true
        - name: log-volume
          mountPath: /logs
      volumes:
      - name: config-volume
        configMap:
          name: logstash-config
      - name: log-volume
        emptyDir: {}
      # Assumes Nginx writes its access log to /var/log/nginx on the node.
      - name: nginx-logs
        hostPath:
          path: /var/log/nginx

This Deployment creates a single Logstash pod with two containers. The logstash container mounts the pipeline configuration from the ConfigMap and reads logs from /usr/share/logstash/logs. The helper container, log-copier, crudely simulates log file updates by copying the Nginx access log into the same emptyDir volume every 10 seconds. Both containers run in one pod because an emptyDir volume is only shared between containers of the same pod; two separate pods would each get their own empty volume. The nginx-logs hostPath volume assumes Nginx writes its access log to /var/log/nginx on the node where the pod runs. Adjust the hostPath and the path in the cp command to the actual location of your Nginx access logs.

Step 3: Apply the Configuration

Apply the ConfigMap and Deployment to your Kubernetes cluster:

kubectl apply -f logstash-configmap.yaml
kubectl apply -f logstash-deployment.yaml

Step 4: Verify Deployment

Check the status of your deployment:

kubectl get pods -l app=logstash

View logs from the Logstash container to ensure it's processing logs correctly (the -c flag selects the container by name):

kubectl logs -f <logstash-pod-name> -c logstash

Collecting Logs

The path /usr/share/logstash/logs/nginx-access.log in the Logstash configuration is used to specify where Logstash expects to find the log files it should process. This specific path is part of the container's filesystem where Logstash runs, not the host machine's filesystem. Here's why this approach is taken, especially in a Kubernetes context:

Isolation: Running Logstash in a containerized environment like Kubernetes means working within isolated filesystems. Logstash, running inside its container, has its own separate filesystem from the host and other containers. By specifying a path like /usr/share/logstash/logs/nginx-access.log, you're pointing Logstash to a location within its container's filesystem where it expects to find log files.
Volume Mounts: Kubernetes allows you to mount volumes into containers, which is how data is shared between containers and between the host and containers. In the example setup, a shared volume is mounted into the Logstash container and is also available to whatever produces the Nginx logs: an emptyDir when the producer is another container in the same pod, or a hostPath or persistent volume when the logs live on the node or in another pod. When Nginx logs are written to this shared volume, Logstash can read and process them from its designated path.
Flexibility and Configuration: The path /usr/share/logstash/logs/nginx-access.log is an arbitrary choice made for demonstration. In practice, you can configure this path based on your specific deployment needs and how you set up your volumes in Kubernetes. The key is to ensure consistency between where your logs are written to and where Logstash expects to find them.
Simplifying Log Management: By centralizing logs from various sources to a specific directory that Logstash monitors, you simplify log management. Logstash continuously watches this directory for new or updated log files to process, transforming and forwarding them to Elasticsearch or another destination as configured.

In summary, /usr/share/logstash/logs/nginx-access.log is a path inside the Logstash container's filesystem. It works because a volume shared with the log producer is mounted there, which is how logs are made available for processing in a containerized environment.

When you have multiple pods within a Kubernetes cluster generating logs, managing and processing these logs efficiently becomes a key concern. Logstash, deployed within the cluster, can still aggregate and process logs from all these pods, but the setup becomes slightly more complex to ensure all logs are collected. Here’s an approach to handle this scenario:

Using a Sidecar Container for Log Collection

A common pattern for collecting logs from multiple pods is to use a sidecar container. This container runs alongside your application container within the same pod and is responsible for collecting logs from the application container and forwarding them to Logstash. The sidecar container can tail log files from a shared volume or capture stdout/stderr streams.

Steps to Aggregate Logs from Multiple Pods

Centralized Logging Volume: Configure each application pod to write its logs to a volume that is shared with the sidecar container in the same pod. This could be an emptyDir volume if temporary storage is sufficient, or a more persistent storage solution if needed.
Sidecar Container: Deploy a sidecar container in each pod that has the sole purpose of forwarding logs. This container could use tools like fluentd, filebeat, or a simple custom script that tails log files and sends them to Logstash.
Logstash Configuration: Configure Logstash to listen for incoming logs from these sidecar containers. Depending on how you set up the sidecar, Logstash might listen over a network protocol (e.g., HTTP or TCP) or process files from a shared volume if running as a DaemonSet within the cluster.
DaemonSet Deployment for Logstash: Alternatively, Logstash can be deployed as a DaemonSet, which runs one Logstash instance on every node. Each instance can then collect logs from the pods scheduled on its node, for example by reading the node's log directory through a hostPath mount, and forward the processed logs to a centralized Elasticsearch cluster.
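
The DaemonSet option can be sketched as follows. This is a minimal illustration, not a complete setup: the names logstash-node and logstash-node-pipeline are hypothetical, and the pipeline ConfigMap (not shown) is assumed to contain a file input that reads from under /var/log. Depending on the container runtime and on file permissions on the node, additional mounts or a securityContext may be needed.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: logstash-node                   # hypothetical name
spec:
  selector:
    matchLabels:
      app: logstash-node
  template:
    metadata:
      labels:
        app: logstash-node
    spec:
      containers:
      - name: logstash
        image: docker.elastic.co/logstash/logstash:7.9.3
        volumeMounts:
        # Pipeline files (*.conf) provided by the ConfigMap below.
        - name: pipeline
          mountPath: /usr/share/logstash/pipeline
        # Node log directory, mounted read-only into the container.
        - name: varlog
          mountPath: /var/log
          readOnly: true
      volumes:
      - name: pipeline
        configMap:
          name: logstash-node-pipeline  # hypothetical ConfigMap, not shown
      - name: varlog
        hostPath:
          path: /var/log
```

Because a DaemonSet schedules one pod per node, no replicas field is set; the number of Logstash instances follows the number of nodes.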

Example Configuration for a Sidecar Approach

Deployment YAML for an Application Pod with a Logging Sidecar

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-application
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-application
  template:
    metadata:
      labels:
        app: my-application
    spec:
      containers:
        - name: my-application
          image: my-application-image
          volumeMounts:
            - name: log-volume
              mountPath: /var/log/my-application
        - name: log-forwarder
          image: log-forwarder-image
          env:
            - name: LOGSTASH_HOST
              value: "logstash-service"
          volumeMounts:
            - name: log-volume
              mountPath: /var/log/my-application
      volumes:
        - name: log-volume
          emptyDir: {}

This deployment includes an application container and a log-forwarding sidecar container. Both containers mount the same volume where the application writes its logs. The sidecar container is responsible for forwarding these logs to Logstash.

Logstash Configuration to Receive Logs

Depending on the sidecar's mechanism (e.g., HTTP or TCP), Logstash’s input configuration needs to match this. For a TCP input from a sidecar container that forwards logs over TCP, the configuration might look like:

input {
  tcp {
    port => 5000
  }
}

Ensure the Logstash service within Kubernetes is accessible to the sidecar containers, possibly using a Kubernetes Service of type ClusterIP.
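
A minimal sketch of such a Service is shown below. It assumes the Logstash pods carry the label app: logstash (as in the Deployment above) and that the tcp input listens on port 5000. The name logstash-service matches the LOGSTASH_HOST value used in the sidecar example.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: logstash-service   # hostname the sidecars use to reach Logstash
spec:
  type: ClusterIP
  selector:
    app: logstash          # label on the Logstash pods
  ports:
  - name: tcp-input
    protocol: TCP
    port: 5000             # port the sidecars connect to
    targetPort: 5000       # port of the Logstash tcp input
```

Inside the cluster, sidecars in the same namespace can then reach Logstash at logstash-service:5000.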

Log Forwarder

The implementation of a log forwarder in a Kubernetes environment, particularly when used as a sidecar container, involves capturing logs from the application within the same pod and forwarding them to a centralized logging system like Logstash. A log forwarder typically focuses on efficient log collection, optional processing (like adding metadata), and reliable transmission of logs. Here's an overview of how you can implement a log forwarder, with examples using Filebeat and a custom script approach:

[Figure: Application pod with logging sidecar, component diagram]

Using Filebeat as a Log Forwarder

Filebeat is a lightweight, open-source shipper for log file data. As part of the Elastic Stack, it's designed to forward logs to Elasticsearch or Logstash while providing backpressure-sensitive protocols to handle large volumes of data.

Filebeat Configuration: Configure Filebeat to watch for log files in a specific directory (which your application writes to) and forward them to Logstash.
Deployment: Deploy Filebeat as a sidecar container in your application pods.

Example: Filebeat Sidecar Configuration

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-application-with-filebeat
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-application
  template:
    metadata:
      labels:
        app: my-application
    spec:
      containers:
      - name: my-application
        image: my-application-image
        volumeMounts:
        - name: log-volume
          mountPath: /var/log/my-app
      - name: filebeat-sidecar
        image: docker.elastic.co/beats/filebeat:7.9.3
        args: ["-c", "/etc/filebeat.yml", "-e"]
        volumeMounts:
        - name: log-volume
          mountPath: /var/log/my-app
        - name: config-volume
          mountPath: /etc/filebeat.yml
          subPath: filebeat.yml
      volumes:
      - name: log-volume
        emptyDir: {}
      - name: config-volume
        configMap:
          name: filebeat-config

This configuration assumes you have a ConfigMap named filebeat-config with your Filebeat configuration pointing to Logstash.
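
A minimal sketch of what that ConfigMap might contain is shown below. It makes several assumptions: the application writes files matching /var/log/my-app/*.log, a Service named logstash-service exposes port 5044, and the Logstash pipeline has a beats input listening on that port (for example, beats { port => 5044 }). None of these come from the Deployment above, so adjust them to your setup.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
data:
  filebeat.yml: |
    filebeat.inputs:
    # Read the log files the application writes to the shared volume.
    - type: log
      paths:
        - /var/log/my-app/*.log

    # Ship events to Logstash; host and port are assumptions for this sketch.
    output.logstash:
      hosts: ["logstash-service:5044"]
```

The key filebeat.yml matches the subPath used in the sidecar's volume mount, so the file appears at /etc/filebeat.yml inside the Filebeat container.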

Custom Script as a Log Forwarder

For simpler use cases or when you have specific forwarding needs, a custom script can be written and deployed as a sidecar container. This script can tail log files and forward them to Logstash.

Script Implementation: Implement a script in a language of your choice (e.g., Python, Bash) that tails a log file and sends each new line to Logstash.
Deployment: Deploy this script as a sidecar container in your application pods, similar to the Filebeat example.

Example: Custom Script Sidecar

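#!/bin/bash

# Tail logs from a specific file and forward each new line to Logstash.
# -F keeps following the file across rotation or re-creation.
tail -F /var/log/my-app/application.log | while IFS= read -r line
do
  # Example: forward the line to the Logstash TCP input using netcat.
  # printf is used instead of echo so lines starting with "-" or
  # containing backslashes are sent unchanged.
  printf '%s\n' "$line" | nc logstash-service 5000
done

This simplistic script reads new lines from the application's log file and forwards them to Logstash using netcat, opening a new TCP connection for each line. Depending on the netcat variant, a flag such as -q or -N may be needed so that nc exits once the line has been sent. The actual implementation can be more complex, based on your needs (e.g., handling multiline logs, adding metadata).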
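
One way to run such a script as a sidecar without building a custom image is to store it in a ConfigMap and execute it from a stock busybox container. The sketch below assumes this approach. The names log-forwarder-script and my-application-with-script-forwarder are hypothetical, and the script uses /bin/sh because busybox does not ship bash. How nc behaves at end of input differs between netcat variants, so treat the forwarding line as illustrative.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: log-forwarder-script                    # hypothetical name
data:
  forward.sh: |
    #!/bin/sh
    # Follow the application log and send each new line to Logstash over TCP.
    tail -F /var/log/my-app/application.log | while IFS= read -r line
    do
      printf '%s\n' "$line" | nc logstash-service 5000
    done
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-application-with-script-forwarder    # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-application
  template:
    metadata:
      labels:
        app: my-application
    spec:
      containers:
      - name: my-application
        image: my-application-image
        volumeMounts:
        - name: log-volume
          mountPath: /var/log/my-app
      - name: log-forwarder
        image: busybox
        # Run the script from the mounted ConfigMap.
        command: ["/bin/sh", "/scripts/forward.sh"]
        volumeMounts:
        - name: log-volume
          mountPath: /var/log/my-app
          readOnly: true
        - name: script-volume
          mountPath: /scripts
      volumes:
      - name: log-volume
        emptyDir: {}
      - name: script-volume
        configMap:
          name: log-forwarder-script
```

Keeping the script in a ConfigMap means it can be changed without rebuilding an image, at the cost of restarting the pods to pick up the new version.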

Conclusion

Deploying Logstash in Kubernetes allows you to efficiently manage logs across your cluster. By tailoring the input, filter, and output configurations, you can adapt Logstash to meet the specific needs of your Kubernetes environment, ensuring that your logging infrastructure is as dynamic and scalable as your containerized applications.