How to Configure Fluentd
Fluentd is an open-source data collector designed to unify logging and monitoring across diverse systems. With its lightweight architecture, plugin-based extensibility, and support for over 800 data sources and destinations, Fluentd has become a cornerstone in modern cloud-native and hybrid infrastructure environments. Whether you're managing microservices on Kubernetes, scaling applications across hybrid clouds, or centralizing logs from legacy systems, Fluentd provides a reliable, scalable, and flexible solution for log aggregation and forwarding.
Configuring Fluentd correctly is critical to ensuring data integrity, minimizing latency, and maintaining system performance. A misconfigured Fluentd instance can lead to log loss, excessive resource consumption, or even service outages. This comprehensive guide walks you through every step of configuring Fluentd, from initial installation to advanced tuning, equipping you with the knowledge to deploy Fluentd confidently in production environments.
Step-by-Step Guide
Step 1: Understand Fluentd's Architecture
Before configuring Fluentd, it's essential to understand its core components. Fluentd operates on a simple yet powerful model: input → filter → output. Data flows through these stages:
- Input: Sources that collect data (e.g., files, syslog, HTTP, Docker containers).
- Filter: Optional transformations applied to log records (e.g., parsing JSON, masking sensitive fields, adding metadata).
- Output: Destinations where data is sent (e.g., Elasticsearch, S3, Kafka, CloudWatch).
Fluentd also supports buffering, which temporarily stores logs during network outages or destination unavailability. This feature ensures no data is lost during transient failures.
Fluentd's configuration file, typically named fluentd.conf, defines how these components are chained together. Understanding this flow is the foundation of effective configuration.
Step 2: Install Fluentd
Fluentd can be installed on Linux, macOS, Windows, and within containerized environments. Below are the most common installation methods.
On Ubuntu/Debian
Use the official Fluentd repository to ensure you receive the latest stable version with security updates.
curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-focal-td-agent4.sh | sh
This script installs td-agent, the official Fluentd distribution maintained by Treasure Data, which includes bundled plugins and system service integration.
After installation, verify it's working:
sudo systemctl status td-agent
On CentOS/RHEL
curl -L https://toolbelt.treasuredata.com/sh/install-redhat-8-td-agent4.sh | sh
sudo systemctl status td-agent
Using Docker
For containerized deployments, use the official Fluentd image:
docker run -d --name fluentd -p 24224:24224 -v $(pwd)/fluentd.conf:/etc/fluent/fluent.conf fluent/fluentd:latest
Ensure your configuration file (fluentd.conf) is mounted correctly. This method is ideal for Kubernetes and Docker Compose environments.
Using Ruby Gem (Advanced)
If you need full control over plugin versions or are developing custom plugins, install Fluentd via RubyGems:
gem install fluentd
Then start Fluentd manually:
fluentd -c /path/to/fluentd.conf
Use this method only if you're experienced with Ruby environments and dependency management.
Step 3: Create a Basic Configuration File
Fluentd's configuration file uses a simple, human-readable syntax. Below is a minimal working configuration that reads from a file and outputs to stdout.
<source>
@type tail
path /var/log/app.log
pos_file /var/log/fluentd-app.pos
tag app.log
format none
</source>
<match **>
@type stdout
</match>
Let's break this down:
- <source> defines the input. @type tail monitors a file for new lines, similar to the Unix tail -f command.
- path specifies the log file to monitor.
- pos_file tracks the last read position to avoid duplicate logs after restarts.
- tag labels the data stream. Tags are used for routing in Fluentd.
- format none means no parsing is applied; each line is treated as raw text.
- <match **> captures all tagged data and sends it to @type stdout, which prints to the console.
Save this as /etc/td-agent/td-agent.conf (td-agent's default configuration path) and restart Fluentd:
sudo systemctl restart td-agent
Generate test log entries:
echo "2024-06-10T10:00:00Z INFO User logged in" >> /var/log/app.log
Check the Fluentd logs to confirm output:
sudo tail -f /var/log/td-agent/td-agent.log
You should see the log line printed in the Fluentd log output.
Step 4: Parse Structured Logs
Most modern applications output logs in structured formats like JSON. Fluentd can parse these to extract fields for better querying and analysis.
Update your source block to parse JSON:
<source>
@type tail
path /var/log/app.log
pos_file /var/log/fluentd-app.pos
tag app.log
format json
time_key timestamp
time_format %Y-%m-%dT%H:%M:%S.%NZ
</source>
Now, if your log file contains:
{"timestamp":"2024-06-10T10:00:00.123Z","level":"INFO","message":"User logged in","user_id":12345}
Fluentd will extract timestamp, level, message, and user_id as individual fields. These become available for filtering and routing.
Important: Ensure your JSON logs are valid and consistent. Invalid JSON will cause Fluentd to drop the record. Use tools like jq to validate logs before ingestion.
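The same pre-ingestion check can be scripted. A minimal sketch in Python (the helper name is ours, not part of Fluentd) that splits log lines into records Fluentd could parse and lines it would drop:

```python
import json

def validate_json_lines(lines):
    """Split log lines into parseable JSON records and lines Fluentd would drop."""
    valid, invalid = [], []
    for line in lines:
        try:
            valid.append(json.loads(line))
        except json.JSONDecodeError:
            invalid.append(line)
    return valid, invalid

lines = [
    '{"timestamp":"2024-06-10T10:00:00.123Z","level":"INFO","message":"User logged in"}',
    'not json at all',
]
valid, invalid = validate_json_lines(lines)
print(len(valid), len(invalid))  # 1 1
```

Running a check like this in your log-producing service's CI catches format regressions before they reach the pipeline.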
Step 5: Use Filters to Transform Data
Filters modify log records before they reach output. Common use cases include adding hostnames, redacting sensitive data, or enriching logs with metadata.
Example: Add server hostname and mask email addresses.
<filter app.log>
@type record_transformer
<record>
hostname "#{Socket.gethostname}"
</record>
</filter>
<filter app.log>
@type grep
<exclude>
key message
pattern \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b
</exclude>
</filter>
The first filter adds a hostname field using the system's hostname. The second uses grep with an <exclude> rule to drop any record whose message contains an email address. Note that grep can only keep or discard whole records; to mask emails while keeping the record, use record_transformer with a regex substitution.
For masking emails safely:
<filter app.log>
@type record_transformer
enable_ruby true
<record>
message ${record["message"].gsub(/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/, "[REDACTED_EMAIL]")}
</record>
</filter>
Filters are processed in order. Place them logically: parse first, then enrich, then sanitize.
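The redaction pattern above is worth sanity-checking outside Fluentd before deploying it. An essentially equivalent sketch in Python (using [A-Za-z] for the top-level-domain class; the helper is illustrative, not a Fluentd API):

```python
import re

# Same email pattern as the record_transformer substitution above.
EMAIL_RE = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')

def redact_emails(message):
    """Replace every email address in a log message with a placeholder."""
    return EMAIL_RE.sub('[REDACTED_EMAIL]', message)

print(redact_emails('Login failed for alice@example.com from 10.0.0.5'))
# Login failed for [REDACTED_EMAIL] from 10.0.0.5
```

Testing the regex against real sample messages guards against both missed emails and false positives on things like version strings.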
Step 6: Configure Multiple Outputs
Fluentd can send the same log data to multiple destinations simultaneously. This is useful for redundancy, compliance, or analytics.
<match app.log>
@type copy
<store>
@type elasticsearch
host localhost
port 9200
index_name fluentd-app
type_name _doc
flush_interval 5s
</store>
<store>
@type s3
aws_key_id YOUR_AWS_KEY
aws_sec_key YOUR_AWS_SECRET
s3_bucket your-logs-bucket
path logs/app/
s3_region us-east-1
buffer_path /var/log/fluentd-s3
time_slice_format %Y%m%d
time_slice_wait 10m
buffer_chunk_limit 256m
</store>
</match>
Here, logs are sent to both Elasticsearch (for real-time search) and S3 (for long-term archival). The @type copy directive enables multi-output routing.
For high availability, use @type forward to send logs to multiple Fluentd instances:
<match app.log>
@type forward
<server>
host fluentd-primary.example.com
port 24224
</server>
<server>
host fluentd-backup.example.com
port 24224
</server>
heartbeat_type tcp
heartbeat_interval 10s
</match>
Fluentd will automatically fail over if the primary server becomes unreachable.
Step 7: Configure Buffering for Reliability
Buffering is Fluentd's safety net. It ensures logs aren't lost during network issues or destination downtime.
Every output plugin supports buffering. Here's a robust buffer configuration for production:
<match app.log>
@type elasticsearch
host elasticsearch.example.com
port 9200
index_name fluentd-app-${tag}
flush_interval 10s
buffer_type file
buffer_path /var/log/fluentd-buffers/app
buffer_queue_limit 256
buffer_chunk_limit 8m
flush_thread_count 4
retry_max_times 10
retry_wait 10s
max_retry_wait 60s
disable_retry_limit false
</match>
Key parameters:
- buffer_type file: Stores data on disk (recommended for production).
- buffer_queue_limit: Maximum number of chunks queued; once full, Fluentd rejects or blocks new events.
- buffer_chunk_limit: Max size per chunk (8MB is safe for most systems).
- flush_thread_count: Number of threads that flush buffers (increase for high throughput).
- retry_max_times and retry_wait: Control how often Fluentd retries failed deliveries.
Monitor buffer usage:
curl http://localhost:24220/api/plugins.json
This API endpoint returns real-time buffer metrics, including queue depth and retry counts.
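A monitoring script can poll that endpoint for back-pressure. A sketch against a trimmed sample payload (the field names follow the monitor_agent plugin's output, but verify them against your Fluentd version; the threshold and helper name are ours):

```python
import json

# Trimmed sample of what /api/plugins.json returns.
sample = json.loads("""
{"plugins": [
  {"plugin_id": "out_es", "type": "elasticsearch",
   "buffer_queue_length": 3, "buffer_total_queued_size": 1048576,
   "retry_count": 0}
]}
""")

def buffers_over_limit(payload, max_queue=200):
    """Return ids of output plugins whose buffer queue exceeds a threshold."""
    return [p["plugin_id"] for p in payload["plugins"]
            if p.get("buffer_queue_length", 0) > max_queue]

print(buffers_over_limit(sample))                 # []
print(buffers_over_limit(sample, max_queue=2))    # ['out_es']
```

Wire the non-empty case into an alert: a queue that keeps growing means the destination cannot keep up with ingestion.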
Step 8: Secure Fluentd with Authentication and TLS
Never expose Fluentd to the public internet. Use TLS and authentication for internal communication.
Enable TLS for Forward Input
Configure Fluentd to accept encrypted connections:
<source>
@type forward
port 24224
bind 0.0.0.0
<transport tls>
cert_path /etc/fluent/cert.pem
private_key_path /etc/fluent/key.pem
ca_path /etc/fluent/ca-cert.pem
client_cert_auth true
</transport>
</source>
Generate certificates using OpenSSL:
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout key.pem -out cert.pem
On the client side (e.g., another Fluentd instance), configure the output to use TLS:
<match app.log>
@type forward
<server>
host fluentd-server.example.com
port 24224
<transport tls>
cert_path /etc/fluent/client-cert.pem
private_key_path /etc/fluent/client-key.pem
ca_cert_path /etc/fluent/ca-cert.pem
</transport>
</server>
</match>
Use Authentication (Optional)
For added security, enable the forward protocol's shared-key authentication:
<source>
@type forward
port 24224
<transport tls>
cert_path /etc/fluent/cert.pem
private_key_path /etc/fluent/key.pem
</transport>
<security>
self_hostname fluentd-server.example.com
shared_key your-super-secret-password
</security>
</source>
The client must be configured with the same shared key:
<match app.log>
@type forward
<server>
host fluentd-server.example.com
port 24224
<transport tls>
cert_path /etc/fluent/client-cert.pem
private_key_path /etc/fluent/client-key.pem
</transport>
</server>
<security>
self_hostname fluentd-client.example.com
shared_key your-super-secret-password
</security>
</match>
Step 9: Monitor and Log Fluentd's Own Health
Fluentd should monitor itself. Enable internal metrics and expose them via HTTP.
<system>
log_level info
root_dir /var/lib/td-agent
</system>
<source>
@type monitor_agent
bind 0.0.0.0
port 24220
</source>
Access metrics at:
http://your-fluentd-host:24220/api/plugins.json
This endpoint returns JSON with buffer usage, throughput, error rates, and plugin status. Integrate this into your monitoring stack (e.g., Prometheus + Grafana) using the fluent-plugin-prometheus plugin.
Step 10: Restart and Validate Configuration
After making changes, always validate the configuration before restarting:
sudo td-agent --dry-run -c /etc/td-agent/td-agent.conf
If the dry run completes without errors, proceed to restart:
sudo systemctl restart td-agent
Monitor logs for errors:
sudo journalctl -u td-agent -f
Test data flow with real logs and verify output destinations are receiving data.
Best Practices
1. Use Tags to Organize Log Streams
Tags are Fluentd's routing keys. Structure them hierarchically: app.service.component. For example:
- web.nginx.access
- api.auth.service
- db.postgresql.log
This enables precise filtering, routing, and indexing in downstream systems like Elasticsearch or BigQuery.
2. Avoid Using Wildcard Matches in Output
While <match **> captures everything, it makes debugging and routing difficult. Always use explicit tags or regex patterns like <match app.*> to ensure predictable behavior.
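The difference between * and ** trips people up: * matches exactly one dot-delimited tag part, while ** matches zero or more. A rough Python model of that behavior (a sketch only; it ignores Fluentd's {a,b} alternation syntax):

```python
import re

def tag_matches(pattern, tag):
    """Rough model of Fluentd <match> semantics: '*' matches one
    dot-delimited tag part, '**' matches zero or more parts."""
    regex = re.escape(pattern)
    regex = regex.replace(r'\.\*\*', r'(\..*)?')  # 'app.**' also matches bare 'app'
    regex = regex.replace(r'\*\*', r'.*')
    regex = regex.replace(r'\*', r'[^.]+')
    return re.fullmatch(regex, tag) is not None

print(tag_matches('app.*', 'app.web'))         # True
print(tag_matches('app.*', 'app.web.access'))  # False
print(tag_matches('app.**', 'app'))            # True
```

This is why <match app.*> will silently miss deeper tags like app.web.access; use app.** when you intend to capture a whole subtree.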
3. Separate Logs by Severity or Type
Route error logs to a high-priority destination (e.g., PagerDuty-integrated system), and info/debug logs to archival storage. Use filters to classify logs by level:
<filter app.log>
@type record_transformer
enable_ruby true
<record>
severity ${record["level"].upcase}
</record>
</filter>
To route by severity, rewrite the tag first (this requires the fluent-plugin-rewrite-tag-filter plugin), then match each tag separately:
<match app.log>
@type rewrite_tag_filter
<rule>
key severity
pattern /^ERROR$/
tag app.errors
</rule>
<rule>
key severity
pattern /.+/
tag app.other
</rule>
</match>
<match app.errors>
@type elasticsearch
index_name fluentd-errors
<buffer>
@type file
path /var/log/fluentd-buffers/errors
</buffer>
</match>
<match app.other>
@type s3
s3_bucket your-logs-bucket
path logs/info/
</match>
Note: Rules are evaluated in order, and records matching no rule are dropped, so keep a catch-all rule like the second one above.
4. Optimize Buffer Settings for Your Workload
High-throughput environments (e.g., 10K+ logs/sec) require larger buffers and more flush threads. Monitor buffer queue depth and adjust:
- Set buffer_chunk_limit to 8-16MB.
- Use buffer_type file (not memory) for persistence.
- Set flush_thread_count to 4-8 on multi-core systems.
- Use retry_wait with exponential backoff (e.g., 10s, 20s, 40s).
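The backoff schedule those retry parameters produce can be computed directly. A sketch (approximate: Fluentd also randomizes waits slightly when retry_randomize is enabled):

```python
def retry_schedule(retry_wait=10, max_retry_wait=60, retry_max_times=10):
    """Model exponential backoff: the wait doubles on each retry,
    capped at max_retry_wait, for at most retry_max_times attempts."""
    waits, wait = [], retry_wait
    for _ in range(retry_max_times):
        waits.append(min(wait, max_retry_wait))
        wait *= 2
    return waits

print(retry_schedule(retry_max_times=6))  # [10, 20, 40, 60, 60, 60]
```

Summing the schedule tells you how long an outage your retries cover before chunks are discarded, which is the number to compare against your destination's worst-case downtime.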
5. Use External Configuration Management
Manage Fluentd configurations via tools like Ansible, Puppet, or GitOps (FluxCD). Store templates in version control and deploy using automated pipelines. This ensures consistency across hundreds of nodes.
6. Limit Plugin Usage to What's Necessary
Each plugin consumes memory and CPU. Avoid installing plugins you don't use. For example, if you're not sending logs to Splunk, don't install the fluent-plugin-splunk gem.
7. Watch Disk Usage in Buffer Directories
Buffer chunks are deleted automatically after a successful flush, so do not rotate them with logrotate: rewriting an active chunk corrupts it. Instead, monitor disk usage under /var/log/fluentd-buffers/ and alert on stale chunks. Do use logrotate for Fluentd's own log file:
/var/log/td-agent/td-agent.log {
daily
rotate 7
compress
missingok
notifempty
create 0644 td-agent td-agent
}
8. Test Configuration Changes in Staging First
Always validate configuration changes in a non-production environment. Use tools like fluentd -c config.conf --dry-run and simulate traffic with curl or fluent-cat:
echo '{"message":"test"}' | fluent-cat app.log
9. Document Your Fluentd Setup
Create a runbook including:
- Configuration file structure
- Tagging conventions
- Buffer thresholds and alerting rules
- How to restart Fluentd without downtime
- Common error codes and resolutions
10. Integrate with Observability Tools
Connect Fluentd to Prometheus for metrics, Grafana for dashboards, and alerting systems like Alertmanager. Use the fluent-plugin-prometheus plugin to expose internal metrics:
<source>
@type prometheus
port 24231
</source>
<source>
@type prometheus_output_monitor
</source>
Then scrape metrics from http://fluentd-host:24231/metrics.
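Those metrics use the Prometheus text exposition format, which is easy to spot-check by hand. A tiny illustrative parser (the metric names in the sample follow fluent-plugin-prometheus conventions; confirm them against your own /metrics output):

```python
def parse_prom_metrics(text):
    """Minimal parser for Prometheus text format: maps each metric name
    to its list of values, ignoring labels and comment lines."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        name_part, value = line.rsplit(' ', 1)
        name = name_part.split('{', 1)[0]
        metrics.setdefault(name, []).append(float(value))
    return metrics

sample = """
# TYPE fluentd_output_status_buffer_queue_length gauge
fluentd_output_status_buffer_queue_length{plugin_id="out_es"} 3.0
fluentd_output_status_retry_count{plugin_id="out_es"} 0.0
"""
print(parse_prom_metrics(sample)['fluentd_output_status_buffer_queue_length'])  # [3.0]
```

In practice Prometheus itself does the scraping; a script like this is only useful for quick curl-and-inspect debugging.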
Tools and Resources
Official Documentation
The official Fluentd documentation at https://docs.fluentd.org is the most authoritative source for configuration syntax, plugin references, and architecture guides.
Fluentd Plugin Registry
Explore over 800 plugins at https://rubygems.org/search?query=fluentd. Popular plugins include:
- fluent-plugin-elasticsearch: Send logs to Elasticsearch/OpenSearch
- fluent-plugin-s3: Archive logs to AWS S3
- fluent-plugin-kafka: Stream logs to Apache Kafka
- fluent-plugin-docker_metadata_filter: Extract Docker container metadata
- fluent-plugin-prometheus: Expose metrics for monitoring
- fluent-plugin-aws-cloudwatch-logs: Send logs to AWS CloudWatch
Fluent Bit (Lightweight Alternative)
For resource-constrained environments (e.g., edge devices, IoT), consider Fluent Bit, a faster, memory-efficient cousin of Fluentd. It ships with a smaller plugin set of its own but integrates seamlessly with Fluentd via the forward protocol.
Containerized Deployments
For Kubernetes, use the official Fluentd DaemonSet template. It automatically collects logs from Docker and containerd runtimes.
Validation and Debugging Tools
- fluent-cat: Send test messages to Fluentd
- fluentd -c config.conf --dry-run: Validate syntax
- curl http://localhost:24220/api/plugins.json: Monitor buffer and plugin status
- jq: Parse and validate JSON logs
- tail -f /var/log/td-agent/td-agent.log: Monitor Fluentd's own logs
Community and Support
Join the Fluentd Slack community and GitHub discussions. The Fluentd project is actively maintained by the Cloud Native Computing Foundation (CNCF) and has a vibrant contributor base.
Monitoring and Alerting
Integrate Fluentd with:
- Prometheus + Grafana: For metrics visualization
- ELK Stack: For log search and analysis
- Datadog: For unified observability
- Sumo Logic: For cloud-native log analytics
Real Examples
Example 1: Centralized Logging for a Microservice Architecture
Scenario: You have 15 microservices running in Kubernetes, each outputting JSON logs to stdout. You want to collect, parse, enrich, and send them to Elasticsearch and S3.
Configuration:
<source>
@type tail
path /var/log/containers/*.log
pos_file /var/log/fluentd-containers.log.pos
tag kubernetes.*
format json
time_key time
time_format %Y-%m-%dT%H:%M:%S.%NZ
read_from_head true
</source>
<filter kubernetes.**>
@type kubernetes_metadata
</filter>
<filter kubernetes.**>
@type record_transformer
<record>
service_name ${record["kubernetes"]["labels"]["app"]}
namespace ${record["kubernetes"]["namespace_name"]}
</record>
</filter>
<match kubernetes.**>
@type copy
<store>
@type elasticsearch
host elasticsearch.logging.svc.cluster.local
port 9200
index_name k8s-logs-${record["service_name"]}
type_name _doc
flush_interval 10s
buffer_type file
buffer_path /var/log/fluentd-buffers/k8s
buffer_chunk_limit 8m
buffer_queue_limit 128
retry_max_times 10
retry_wait 10s
</store>
<store>
@type s3
aws_key_id YOUR_KEY
aws_sec_key YOUR_SECRET
s3_bucket your-k8s-logs-bucket
path logs/k8s/${record["namespace_name"]}/${record["service_name"]}/
s3_region us-east-1
buffer_path /var/log/fluentd-buffers/s3
time_slice_format %Y/%m/%d/%H
time_slice_wait 10m
buffer_chunk_limit 256m
</store>
</match>
This configuration automatically detects container logs, enriches them with Kubernetes metadata, tags them by service and namespace, and routes them to both Elasticsearch (for real-time search) and S3 (for compliance).
Example 2: Legacy System Log Forwarding
Scenario: You have an old Linux server running a proprietary application that writes logs to /var/log/legacy/app.log in a custom format: TIMESTAMP [LEVEL] MESSAGE.
Configuration:
<source>
@type tail
path /var/log/legacy/app.log
pos_file /var/log/fluentd-legacy.pos
tag legacy.app
format /^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(?<level>[A-Z]+)\] (?<message>.*)$/
time_format %Y-%m-%d %H:%M:%S
</source>
<filter legacy.app>
@type record_transformer
<record>
source "legacy-server-01"
</record>
</filter>
<match legacy.app>
@type forward
<server>
host fluentd-central.example.com
port 24224
<transport tls>
cert_path /etc/fluent/client-cert.pem
private_key_path /etc/fluent/client-key.pem
ca_cert_path /etc/fluent/ca-cert.pem
</transport>
</server>
</match>
This uses a regex parser to extract timestamp, level, and message from unstructured logs, adds source metadata, and forwards securely to a central Fluentd collector.
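Before deploying, the parser regex is worth exercising against sample lines. Fluentd's regexp parser uses Ruby-style (?<name>...) named captures, which Python writes as (?P<name>...); the same pattern in Python:

```python
import re

# The legacy-log pattern with named captures, in Python syntax.
LEGACY_RE = re.compile(
    r'^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) '
    r'\[(?P<level>[A-Z]+)\] (?P<message>.*)$'
)

m = LEGACY_RE.match('2024-06-10 10:00:00 [ERROR] Disk quota exceeded')
print(m.group('level'), '-', m.group('message'))  # ERROR - Disk quota exceeded
```

Running a few real log lines through the pattern catches mismatches (extra whitespace, bracketed timestamps, multi-word levels) before Fluentd silently fails to parse them.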
Example 3: Docker Container Logging with Fluentd
Scenario: You're running Docker containers and want to collect logs using Fluentd instead of Docker's default JSON-file driver.
Run containers with Fluentd log driver:
docker run -d \
--name myapp \
--log-driver=fluentd \
--log-opt fluentd-address=localhost:24224 \
--log-opt tag=docker.myapp \
my-image
Fluentd configuration:
<source>
@type forward
port 24224
</source>
<match docker.**>
@type elasticsearch
host elasticsearch
port 9200
index_name docker-logs
type_name _doc
flush_interval 5s
</match>
Fluentd automatically receives logs from Docker and forwards them to Elasticsearch. The tag docker.myapp enables routing by container name.
FAQs
1. What's the difference between Fluentd and Fluent Bit?
Fluentd is a full-featured, Ruby-based log collector with extensive plugin support and complex routing. Fluent Bit is a lightweight, C-based alternative optimized for performance and low memory usage. Use Fluent Bit for edge devices or Kubernetes nodes; use Fluentd for centralized aggregation and advanced processing.
2. How do I prevent log loss in Fluentd?
Use file-based buffering, set appropriate buffer_queue_limit and buffer_chunk_limit, enable retry logic, and monitor buffer metrics. Never use memory-only buffering in production.
3. Can Fluentd handle high-throughput logging (10K+ logs/sec)?
Yes. With proper tuning (multiple flush threads, larger buffer chunks, and optimized output plugins), Fluentd can handle tens of thousands of events per second on modern hardware.
4. How do I parse non-JSON logs in Fluentd?
Use the regexp format type with a custom pattern containing named captures, for example: format /^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(?<level>[A-Z]+)\] (?<message>.*)$/ (see Example 2 above).
5. Why are my logs not appearing in Elasticsearch?
Check: (1) Fluentds own logs for errors, (2) Elasticsearch connectivity, (3) buffer status via curl http://localhost:24220/api/plugins.json, (4) index permissions, and (5) whether the tag matches your <match> directive.
6. How do I update Fluentd plugins without downtime?
Fluentd does not support hot-reloading. Plan maintenance windows. Use a rolling update strategy in Kubernetes: deploy new Fluentd pods with updated configs, drain old ones, then terminate.
7. Is Fluentd secure by default?
No. Fluentd listens on unencrypted ports by default. Always enable TLS for network inputs and use authentication in multi-tenant environments.
8. How do I test my Fluentd configuration without affecting production?
Use fluentd -c config.conf --dry-run to validate syntax. Use fluent-cat to inject test logs. Deploy to a staging environment with identical infrastructure before rolling out.
9. Can Fluentd forward logs to multiple cloud providers?
Yes. Use the @type copy directive to send the same logs to AWS CloudWatch, Google Cloud Logging, and Azure Monitor simultaneously.
10. What should I do if Fluentd consumes too much memory?
Reduce buffer_queue_limit, decrease flush_thread_count, disable unused plugins, and monitor buffer usage. Consider switching to Fluent Bit for high-density deployments.
Conclusion
Configuring Fluentd is not merely a technical task; it's a strategic decision that impacts the reliability, scalability, and observability of your entire infrastructure. From parsing unstructured logs to securely forwarding data across hybrid clouds, Fluentd provides the flexibility to meet virtually any logging requirement.
This guide has walked you through the full lifecycle of Fluentd configuration: from installation and basic syntax to advanced buffering, security, and real-world use cases. You've learned how to structure logs with tags, transform data with filters, ensure durability with buffers, and integrate with modern observability tools.
Remember: Fluentd's power lies in its simplicity and extensibility. Start small: collect logs from one service, validate the flow, then scale. Document every change. Monitor relentlessly. Test before you deploy.
As cloud-native architectures continue to evolve, Fluentd remains a foundational tool for centralized logging. Whether youre managing a dozen containers or thousands of microservices, a well-configured Fluentd instance is your key to visibility, control, and resilience.
Now that you understand how to configure Fluentd, take the next step: automate your deployment, integrate it with your CI/CD pipeline, and make logging a first-class citizen in your DevOps workflow.