On production servers, logs live everywhere: app, nginx, systemd, database, kernel. When something breaks, SSHing into 20 servers and running grep is not sustainable. The ELK Stack — Elasticsearch (search), Logstash (transform), Kibana (UI) — collects every log centrally, indexes it, and makes it searchable in seconds.

Components

  • Filebeat: lightweight log shipper running on every server
  • Logstash: parse, filter, enrich (optional — Filebeat can ship straight to Elastic)
  • Elasticsearch: index, search, storage
  • Kibana: UI — search, dashboards, alerts

Docker Compose Installation

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=true
      - ELASTIC_PASSWORD=changeme
      - ES_JAVA_OPTS=-Xms1g -Xmx1g
    volumes:
      - es-data:/usr/share/elasticsearch/data
    ports: ['9200:9200']

  kibana:
    image: docker.elastic.co/kibana/kibana:8.11.0
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
      - ELASTICSEARCH_USERNAME=kibana_system
      # kibana_system's password — it must also be set inside Elasticsearch via the security API
      - ELASTICSEARCH_PASSWORD=changeme
    ports: ['5601:5601']
    depends_on: [elasticsearch]

volumes:
  es-data:
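
With security enabled, the kibana_system password cannot be set through Compose environment variables alone; it has to be set inside Elasticsearch once the node is up. A sketch using the security API (assuming the defaults above and port 9200 published on localhost):

```sh
# set the password Kibana authenticates with (must match ELASTICSEARCH_PASSWORD above)
curl -u elastic:changeme \
  -X POST "http://localhost:9200/_security/user/kibana_system/_password" \
  -H 'Content-Type: application/json' \
  -d '{"password": "changeme"}'

# sanity check: the cluster should report green or yellow
curl -u elastic:changeme "http://localhost:9200/_cluster/health?pretty"
```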

Log Shipping with Filebeat

# /etc/filebeat/filebeat.yml (on each application server)
filebeat.inputs:
  - type: filestream
    id: nginx-access
    paths:
      - /var/log/nginx/access.log
    parsers:
      - ndjson: {}  # assumes nginx is configured to emit JSON log lines
    fields:
      service: nginx
      type: access

  - type: filestream
    id: app-log
    paths:
      - /var/log/myapp/*.log
    fields:
      service: myapp

  - type: journald
    id: systemd
    include_matches.match:
      - _SYSTEMD_UNIT=keydal.service

processors:
  - add_host_metadata: {}
  - add_cloud_metadata: {}

output.elasticsearch:
  hosts: ['https://es.example.com:9200']
  username: elastic
  password: changeme
  # a custom index name also requires setup.template.name, setup.template.pattern
  # and setup.ilm.enabled: false elsewhere in filebeat.yml
  index: "logs-%{[fields.service]}-%{+yyyy.MM.dd}"

Validate the configuration, then enable the service:

filebeat test config
filebeat test output
systemctl enable --now filebeat
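
The ndjson parser in the nginx input above only works if nginx itself writes JSON lines. A sketch of a matching nginx log_format — the format name and field names here are illustrative, not a standard:

```nginx
# /etc/nginx/nginx.conf — inside the http { } block
log_format json_combined escape=json
  '{"time":"$time_iso8601","remote_addr":"$remote_addr",'
  '"request":"$request","status":$status,'
  '"body_bytes_sent":$body_bytes_sent,'
  '"http_referer":"$http_referer","http_user_agent":"$http_user_agent"}';

access_log /var/log/nginx/access.log json_combined;
```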

Transforming Logs with Logstash

Write a Logstash pipeline to parse and enrich raw logs. To route logs through it, switch Filebeat from output.elasticsearch to output.logstash with hosts: ['logstash:5044'].

# /etc/logstash/conf.d/nginx.conf
input {
  beats {
    port => 5044
  }
}

filter {
  # Filebeat nests custom fields under [fields] unless fields_under_root: true is set
  if [fields][service] == "nginx" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
    geoip {
      source => "clientip"
    }
    useragent {
      source => "agent"
      target => "ua"
    }
    mutate {
      convert => { "response" => "integer" "bytes" => "integer" }
    }
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "nginx-%{+YYYY.MM.dd}"
  }
}
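
To see roughly what the COMBINEDAPACHELOG grok pattern extracts, here is a simplified Python equivalent. The real grok pattern is more permissive, so treat this as an illustration rather than a drop-in replacement:

```python
import re

# Simplified stand-in for Logstash's COMBINEDAPACHELOG grok pattern
COMBINED = re.compile(
    r'(?P<clientip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) \S+" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

line = ('1.2.3.4 - - [10/Oct/2024:13:55:36 +0000] '
        '"GET /wp-admin HTTP/1.1" 404 162 "-" "curl/8.4.0"')

m = COMBINED.match(line)
fields = m.groupdict()
# mimic the mutate/convert step: the status code arrives as a string
fields["response"] = int(fields["response"])
print(fields["clientip"], fields["response"], fields["request"])
```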

Searching in Kibana

# KQL (Kibana Query Language)
service: "nginx" AND response >= 500
service: "nginx" AND response: (500 or 502 or 503 or 504)
request: "*wp-admin*"
clientip: "1.2.3.4"
NOT response: (200 or 304)
response_time > 1000

# Lucene syntax (supports regex and fuzzy queries, which KQL does not)
message:error OR level:ERROR
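
The same 5xx search can also be issued as query DSL from Kibana Dev Tools, which is handy for scripting. Field names depend on your pipeline; this sketch assumes the Filebeat index naming and the fields.service field from the config above:

```
GET logs-nginx-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "match": { "fields.service": "nginx" } },
        { "range": { "response": { "gte": 500 } } }
      ]
    }
  }
}
```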

Index Lifecycle Management (ILM)

Log indices grow without bound, and old logs are rarely searched. ILM policies automate rollover, migration to cheaper tiers (hot → warm → cold) and eventual deletion.

PUT _ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "hot":   { "actions": { "rollover": { "max_primary_shard_size": "50gb", "max_age": "7d" } } },
      "warm":  { "min_age": "7d",  "actions": { "shrink": { "number_of_shards": 1 }, "forcemerge": { "max_num_segments": 1 } } },
      "cold":  { "min_age": "30d", "actions": { "allocate": { "number_of_replicas": 0 } } },
      "delete":{ "min_age": "90d", "actions": { "delete": {} } }
    }
  }
}
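
A policy does nothing on its own — indices have to reference it, usually through an index template. A minimal sketch (the template and alias names here are illustrative):

```
PUT _index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1,
      "index.lifecycle.name": "logs-policy",
      "index.lifecycle.rollover_alias": "logs"
    }
  }
}
```

Rollover additionally needs a bootstrap index carrying the write alias (e.g. creating logs-000001 with is_write_index: true on the logs alias).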

Alerting

In Kibana, go to Stack Management → Rules to define rules like "notify Slack when the 5xx rate in the last 5 minutes exceeds 5%". Threshold rules, Elasticsearch query rules and ML-based anomaly detection are all supported.

Performance Notes

  • Give Elasticsearch plenty of RAM — JVM heap half of total RAM, max 31 GB
  • SSD is mandatory
  • Shard size: 20-40 GB is the sweet spot
  • On a large cluster, separate master/data/ingest nodes
  • Use index templates to tune shards/replicas

Alternatives

  • Grafana Loki: far lighter than ELK — indexes only labels, not the full log text. Integrates natively with Grafana
  • OpenSearch: the AWS-led fork of Elasticsearch — useful if Elastic's licensing is a concern
  • Datadog, New Relic, Splunk: managed SaaS — zero setup, but costs climb quickly with volume

Conclusion

ELK is the reference solution for centralized logging. It feels like overkill at first, but once you have 10+ servers, making log search 100x faster fundamentally changes production triage. If resources are tight, look at Loki, but ELK's feature set is unmatched.

Reach out to KEYDAL for ELK Stack or Loki setup, log parsing and alerting.