Nariman Mani, P.Eng., PhD Computer and Software Engineering
    Logstash in Kubernetes

    April 13, 2024

    Using Logstash in a Kubernetes environment can significantly enhance your ability to process and analyze logs generated by your containers and applications. Kubernetes, with its dynamic and distributed nature, produces logs that are crucial for monitoring the health and performance of your applications. Integrating Logstash into your Kubernetes cluster allows you to efficiently collect, transform, and forward these logs to a centralized logging solution like Elasticsearch.

    Network Diagram for Logstash Deployment in Kubernetes

    Prerequisites

    • A running Kubernetes cluster.
    • kubectl installed and configured to communicate with your cluster.
    • Basic familiarity with Kubernetes concepts like pods, deployments, and ConfigMaps.

    Step 1: Create a Logstash Configuration

    First, you need to define your Logstash configuration. This involves specifying the input, filter, and output sections of your Logstash pipeline. For Kubernetes, a common approach is to collect logs using a file or container log input, process them as needed, and then send them to Elasticsearch.

    Save the following Logstash configuration as logstash-configmap.yaml. This example reads an Nginx access log from a file path inside the Logstash container (backed by a volume mounted in Step 2), parses each line with the grok filter, and sends the resulting events to Elasticsearch. For a quick demonstration without Elasticsearch, you could swap the elasticsearch output for stdout. Note that the content is a Logstash pipeline definition (input, filter, output), so Logstash must load it as a pipeline file rather than as its logstash.yml settings file, whatever the ConfigMap key is called.

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: logstash-config
    data:
      logstash.yml: |
        input {
          file {
            path => "/usr/share/logstash/logs/nginx-access.log"
            start_position => "beginning"
            sincedb_path => "/dev/null"
          }
        }
        filter {
          grok {
            match => { "message" => "%{COMBINEDAPACHELOG}" }
          }
        }
        output {
          elasticsearch {
            hosts => ["http://elasticsearch:9200"]
            index => "nginx-logs-%{+YYYY.MM.dd}"
          }
        }
    
    

    This configuration is basic and intended for demonstration. Adjust the input and output to suit your specific logging architecture and requirements. In the hosts field under output.elasticsearch, replace "http://elasticsearch:9200" with your actual Elasticsearch service URL.

    Step 2: Deploy Logstash in Kubernetes

    Next, you'll deploy Logstash in your Kubernetes cluster using a deployment configuration. You'll reference the ConfigMap created in the previous step to provide Logstash with its configuration.

    Create a file named logstash-deployment.yaml with the following content:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: logstash
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: logstash
      template:
        metadata:
          labels:
            app: logstash
        spec:
          containers:
          - name: logstash
            image: docker.elastic.co/logstash/logstash:7.9.3
            volumeMounts:
            - name: config-volume
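              # Mount the pipeline definition where Logstash loads pipelines from.
              # The ConfigMap key is named logstash.yml, but its content is pipeline
              # syntax, so it must not replace the config/logstash.yml settings file.
              mountPath: /usr/share/logstash/pipeline/logstash.conf
              subPath: logstash.yml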
            - name: log-volume
              mountPath: /usr/share/logstash/logs
          # Helper container. It runs in the same pod as Logstash because an
          # emptyDir volume is only shared between containers of a single pod.
          - name: log-copier
            image: busybox
            command: ["/bin/sh"]
            args: ["-c", "while true; do cp /var/log/nginx/access.log /logs/nginx-access.log; sleep 10; done"]
            volumeMounts:
            - name: log-volume
              mountPath: /logs
            - name: nginx-logs
              mountPath: /var/log/nginx
              readOnly: true
          volumes:
          - name: config-volume
            configMap:
              name: logstash-config
          - name: log-volume
            emptyDir: {}
          # Assumption: Nginx writes its access log to /var/log/nginx on the node.
          - name: nginx-logs
            hostPath:
              path: /var/log/nginx
    
    

    This deployment creates a Logstash pod and mounts the pipeline configuration from the ConfigMap along with a log volume at /usr/share/logstash/logs. A helper named log-copier simulates log file updates by copying the Nginx access log into that volume every 10 seconds. Because an emptyDir volume is shared only between containers of the same pod, the helper has to run in the same pod as Logstash to share it, and it needs its own access to the source log, for example through a hostPath volume pointing at the log directory on the node. Replace /var/log/nginx/access.log with the actual path to your Nginx access logs.

    Step 3: Apply the Configuration

    Apply the ConfigMap and Deployment to your Kubernetes cluster:

    kubectl apply -f logstash-configmap.yaml
    kubectl apply -f logstash-deployment.yaml
    

    Step 4: Verify Deployment

    Check the status of your deployment:

    kubectl get pods -l app=logstash
    

    View logs from the Logstash pod to ensure it's processing logs correctly:

    kubectl logs -f <logstash-pod-name>
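
    To see parsed events in the pod log during this check, one option is to temporarily add a stdout output to the pipeline. The sketch below is a variant of the ConfigMap from Step 1 and reuses its name and key; the rubydebug codec is chosen here only for readability and is not part of the original setup.

    ```yaml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: logstash-config
    data:
      logstash.yml: |
        input {
          file {
            path => "/usr/share/logstash/logs/nginx-access.log"
            start_position => "beginning"
            sincedb_path => "/dev/null"
          }
        }
        filter {
          grok {
            match => { "message" => "%{COMBINEDAPACHELOG}" }
          }
        }
        output {
          # Print each parsed event to the container's stdout so it shows up in
          # `kubectl logs`. Remove this block once the pipeline is verified.
          stdout { codec => rubydebug }
          elasticsearch {
            hosts => ["http://elasticsearch:9200"]
            index => "nginx-logs-%{+YYYY.MM.dd}"
          }
        }
    ```

    Because the file is mounted with subPath, a running pod does not pick up ConfigMap changes on its own; after applying the variant, restart the pod, for example with kubectl rollout restart deployment/logstash.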
    

    Collecting Logs

    The path /usr/share/logstash/logs/nginx-access.log in the Logstash configuration is used to specify where Logstash expects to find the log files it should process. This specific path is part of the container's filesystem where Logstash runs, not the host machine's filesystem. Here's why this approach is taken, especially in a Kubernetes context:

    1. Isolation: Running Logstash in a containerized environment like Kubernetes means working within isolated filesystems. Logstash, running inside its container, has its own separate filesystem from the host and other containers. By specifying a path like /usr/share/logstash/logs/nginx-access.log, you're pointing Logstash to a location within its container's filesystem where it expects to find log files.

    2. Volume Mounts: Kubernetes allows you to mount volumes into containers. This mechanism is used to share data between containers in a pod and between the host and containers. In the example setup, a shared volume (an emptyDir, or a more persistent option depending on your requirements) is mounted into both the Logstash container and the container that generates or holds the Nginx logs. An emptyDir is shared only among containers of the same pod; to read logs that live on the node, use a hostPath volume instead. When Nginx logs are written to this shared volume, Logstash can read and process them from its designated path.

    3. Flexibility and Configuration: The path /usr/share/logstash/logs/nginx-access.log is an arbitrary choice made for demonstration. In practice, you can configure this path based on your specific deployment needs and how you set up your volumes in Kubernetes. The key is to ensure consistency between where your logs are written to and where Logstash expects to find them.

    4. Simplifying Log Management: By centralizing logs from various sources in a specific directory that Logstash monitors, you simplify log management. Logstash continuously watches the configured path for new or appended log lines, then transforms and forwards them to Elasticsearch or another destination as configured.

    In summary, the path /usr/share/logstash/logs/nginx-access.log is a convention used within the Logstash container's filesystem, aligning with how volumes are mounted and shared in Kubernetes, to facilitate efficient log processing in a containerized environment.

    When you have multiple pods within a Kubernetes cluster generating logs, managing and processing these logs efficiently becomes a key concern. Logstash, deployed within the cluster, can still aggregate and process logs from all these pods, but the setup becomes slightly more complex to ensure all logs are collected. Here’s an approach to handle this scenario:

    Using a Sidecar Container for Log Collection

    A common pattern for collecting logs from multiple pods is to use a sidecar container. This container runs alongside your application container within the same pod and is responsible for collecting logs from the application container and forwarding them to Logstash. The sidecar typically tails log files from a volume it shares with the application. Output written to stdout/stderr is captured by the container runtime on the node rather than inside the pod, so it is usually collected by a node-level agent instead.

    Steps to Aggregate Logs from Multiple Pods

    1. Shared Logging Volume: In each pod, have the application write its logs to a volume that the sidecar also mounts. An emptyDir volume works if temporary storage is sufficient; use a more persistent storage solution if the logs must outlive the pod.

    2. Sidecar Container: Deploy a sidecar container in each pod that has the sole purpose of forwarding logs. This container could use tools like fluentd, filebeat, or a simple custom script that tails log files and sends them to Logstash.

    3. Logstash Configuration: Configure Logstash to listen for incoming logs from these sidecar containers. Depending on how you set up the sidecar, Logstash might listen over a network protocol (e.g., HTTP, TCP, or the Beats protocol), or read log files directly from the node if it runs as a DaemonSet.

    4. DaemonSet Deployment for Logstash: Alternatively, Logstash can be deployed as a DaemonSet. This ensures that a Logstash instance runs on every node, where it can read the container log files kept on that node (commonly under /var/log/containers), which reduces the need for a sidecar in every pod. Each Logstash instance then forwards the processed logs to a centralized Elasticsearch cluster.
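
    The DaemonSet option in item 4 could be sketched as follows. This is a minimal outline under several assumptions: the ConfigMap name logstash-node-config is hypothetical and is assumed to hold a pipeline (under the key logstash.conf) whose file input reads /var/log/containers/*.log; the node keeps container logs under /var/log, which is the common layout; and the container runs as root because those log files are typically readable only by root.

    ```yaml
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: logstash-node
    spec:
      selector:
        matchLabels:
          app: logstash-node
      template:
        metadata:
          labels:
            app: logstash-node
        spec:
          containers:
          - name: logstash
            image: docker.elastic.co/logstash/logstash:7.9.3
            securityContext:
              runAsUser: 0   # assumption: node log files are readable only by root
            volumeMounts:
            - name: pipeline
              mountPath: /usr/share/logstash/pipeline/logstash.conf
              subPath: logstash.conf
            - name: varlog
              mountPath: /var/log
              readOnly: true
          volumes:
          - name: pipeline
            configMap:
              name: logstash-node-config   # hypothetical ConfigMap, not defined in this article
          - name: varlog
            hostPath:
              path: /var/log   # /var/log/containers holds symlinks into /var/log/pods
    ```

    On nodes that use Docker as the container runtime, the files under /var/log/pods are themselves symlinks into /var/lib/docker/containers, so that directory would need to be mounted as well.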

    Example Configuration for a Sidecar Approach

    Deployment YAML for an Application Pod with a Logging Sidecar

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-application
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: my-application
      template:
        metadata:
          labels:
            app: my-application
        spec:
          containers:
            - name: my-application
              image: my-application-image
              volumeMounts:
                - name: log-volume
                  mountPath: /var/log/my-application
            - name: log-forwarder
              image: log-forwarder-image
              env:
                - name: LOGSTASH_HOST
                  value: "logstash-service"
              volumeMounts:
                - name: log-volume
                  mountPath: /var/log/my-application
          volumes:
            - name: log-volume
              emptyDir: {}
    

    This deployment includes an application container and a log-forwarding sidecar container. Both containers mount the same volume where the application writes its logs. The sidecar container is responsible for forwarding these logs to Logstash.

    Logstash Configuration to Receive Logs

    Logstash's input configuration must match the mechanism the sidecar uses to send logs (e.g., HTTP or TCP). For a sidecar container that forwards logs over TCP, the configuration might look like:

    input {
      tcp {
        port => 5000
      }
    }
    

    Ensure the Logstash service within Kubernetes is accessible to the sidecar containers, possibly using a Kubernetes Service of type ClusterIP.
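
    A minimal sketch of such a Service is shown here. It assumes the Logstash pods carry the label app: logstash, as in the deployment from Step 2, and that the pipeline uses the TCP input on port 5000 shown above. The name logstash-service matches the hostname used by the sidecar examples in this article.

    ```yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: logstash-service
    spec:
      type: ClusterIP          # reachable only from inside the cluster
      selector:
        app: logstash          # must match the labels on the Logstash pods
      ports:
      - name: tcp-logs
        protocol: TCP
        port: 5000             # port the sidecars connect to
        targetPort: 5000       # port of the Logstash tcp input
    ```

    Sidecars in the same namespace can then reach Logstash at logstash-service:5000. From another namespace, the service name has to be qualified with the namespace it lives in.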

    Log Forwarder

    The implementation of a log forwarder in a Kubernetes environment, particularly when used as a sidecar container, involves capturing logs from the application within the same pod and forwarding them to a centralized logging system like Logstash. A log forwarder typically focuses on efficient log collection, optional processing (like adding metadata), and reliable transmission of logs. Here's an overview of how you can implement a log forwarder, with examples using Filebeat and a custom script approach:

    Application Pod with Logging Sidecar Component Diagram

    Using Filebeat as a Log Forwarder

    Filebeat is a lightweight, open-source shipper for log file data. As part of the Elastic Stack, it's designed to forward logs to Elasticsearch or Logstash while providing backpressure-sensitive protocols to handle large volumes of data.

    1. Filebeat Configuration: Configure Filebeat to watch for log files in a specific directory (which your application writes to) and forward them to Logstash.

    2. Deployment: Deploy Filebeat as a sidecar container in your application pods.

    Example: Filebeat Sidecar Configuration

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-application-with-filebeat
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: my-application
      template:
        metadata:
          labels:
            app: my-application
        spec:
          containers:
          - name: my-application
            image: my-application-image
            volumeMounts:
            - name: log-volume
              mountPath: /var/log/my-app
          - name: filebeat-sidecar
            image: docker.elastic.co/beats/filebeat:7.9.3
            args: [
              "-c", "/etc/filebeat.yml",
              "-e",
            ]
            volumeMounts:
            - name: log-volume
              mountPath: /var/log/my-app
            - name: config-volume
              mountPath: /etc/filebeat.yml
              subPath: filebeat.yml
          volumes:
          - name: log-volume
            emptyDir: {}
          - name: config-volume
            configMap:
              name: filebeat-config
    

    This configuration assumes you have a ConfigMap named filebeat-config with your Filebeat configuration pointing to Logstash.
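
    Such a ConfigMap might look like the sketch below. The log path pattern follows the volume mounted in the deployment above. The Logstash address logstash-service:5044 is an assumption, and it requires Logstash to run a beats input on that port (5044 is the conventional Beats port) rather than a plain TCP input.

    ```yaml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: filebeat-config
    data:
      filebeat.yml: |
        filebeat.inputs:
        - type: log
          enabled: true
          paths:
            - /var/log/my-app/*.log     # directory shared with the application container
        output.logstash:
          # Assumed address: Logstash must expose a beats input on this port.
          hosts: ["logstash-service:5044"]
    ```

    On the Logstash side, the matching input would be beats { port => 5044 }.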

    Custom Script as a Log Forwarder

    For simpler use cases or when you have specific forwarding needs, a custom script can be written and deployed as a sidecar container. This script can tail log files and forward them to Logstash.

    1. Script Implementation: Implement a script in a language of your choice (e.g., Python, Bash) that tails a log file and sends each new line to Logstash.

    2. Deployment: Deploy this script as a sidecar container in your application pods, similar to the Filebeat example.

    Example: Custom Script Sidecar

    #!/bin/bash
    
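    # Tail logs from a specific file and forward to Logstash.
    # -F keeps following the file across log rotation.
    tail -F /var/log/my-app/application.log | while IFS= read -r line
    do
      # Example: Forwarding to Logstash using netcat.
      # This opens one TCP connection per line: simple, but slow at high volume.
      printf '%s\n' "$line" | nc logstash-service 5000
    done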
    

    This simplistic script reads new lines from the application's log file and forwards them to Logstash using netcat. The actual implementation can be more complex, based on your needs (e.g., handling multiline logs, adding metadata).
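
    One way to deploy the script, sketched here under stated assumptions, is to store it in a ConfigMap and run it from a stock busybox image, which ships with sh, tail, and nc. The ConfigMap name log-forwarder-script, the file name forward.sh, and the deployment name are hypothetical, and the script is run with sh because busybox does not include bash.

    ```yaml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: log-forwarder-script       # hypothetical name
    data:
      forward.sh: |
        #!/bin/sh
        # Forward each new log line to Logstash over TCP.
        tail -F /var/log/my-app/application.log | while IFS= read -r line
        do
          printf '%s\n' "$line" | nc logstash-service 5000
        done
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-application-with-script
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: my-application-with-script
      template:
        metadata:
          labels:
            app: my-application-with-script
        spec:
          containers:
          - name: my-application
            image: my-application-image
            volumeMounts:
            - name: log-volume
              mountPath: /var/log/my-app
          - name: log-forwarder
            image: busybox
            command: ["/bin/sh", "/scripts/forward.sh"]
            volumeMounts:
            - name: log-volume
              mountPath: /var/log/my-app
              readOnly: true             # the forwarder only reads the logs
            - name: script-volume
              mountPath: /scripts
          volumes:
          - name: log-volume
            emptyDir: {}
          - name: script-volume
            configMap:
              name: log-forwarder-script
    ```

    Keeping the script in a ConfigMap avoids building a custom image, at the cost of relying on whichever tail and nc variants the base image provides.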

    Conclusion

    Deploying Logstash in Kubernetes allows you to efficiently manage logs across your cluster. By tailoring the input, filter, and output configurations, you can adapt Logstash to meet the specific needs of your Kubernetes environment, ensuring that your logging infrastructure is as dynamic and scalable as your containerized applications.
