How to Configure Fluentd
Fluentd is an open-source data collector designed to unify logging and monitoring across diverse systems. With its lightweight architecture, plugin-based extensibility, and support for over 800 data sources and destinations, Fluentd has become a cornerstone in modern cloud-native and hybrid infrastructure environments. Whether you're managing microservices on Kubernetes, scaling applications across hybrid clouds, or centralizing logs from legacy systems, Fluentd provides a reliable, scalable, and flexible solution for log aggregation and forwarding.
Configuring Fluentd correctly is critical to ensuring data integrity, minimizing latency, and maintaining system performance. A misconfigured Fluentd instance can lead to log loss, excessive resource consumption, or even service outages. This comprehensive guide walks you through every step of configuring Fluentd, from initial installation to advanced tuning, equipping you with the knowledge to deploy Fluentd confidently in production environments.
Step-by-Step Guide
Step 1: Understand Fluentd's Architecture
Before configuring Fluentd, it's essential to understand its core components. Fluentd operates on a simple yet powerful model: input → filter → output. Data flows through these stages:
- Input: Sources that collect data (e.g., files, syslog, HTTP, Docker containers).
- Filter: Optional transformations applied to log records (e.g., parsing JSON, masking sensitive fields, adding metadata).
- Output: Destinations where data is sent (e.g., Elasticsearch, S3, Kafka, CloudWatch).
Fluentd also supports buffering, which temporarily stores logs during network outages or destination unavailability. This feature ensures no data is lost during transient failures.
Fluentd's configuration file, typically named fluentd.conf, defines how these components are chained together. Understanding this flow is the foundation of effective configuration.
Step 2: Install Fluentd
Fluentd can be installed on Linux, macOS, Windows, and within containerized environments. Below are the most common installation methods.
On Ubuntu/Debian
Use the official Fluentd repository to ensure you receive the latest stable version with security updates.
curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-focal-td-agent4.sh | sh
This script installs td-agent, the official Fluentd distribution maintained by Treasure Data, which includes bundled plugins and system service integration.
After installation, verify it's working:
sudo systemctl status td-agent
On CentOS/RHEL
curl -L https://toolbelt.treasuredata.com/sh/install-redhat-8-td-agent4.sh | sh
sudo systemctl status td-agent
Using Docker
For containerized deployments, use the official Fluentd image:
docker run -d --name fluentd -p 24224:24224 -v $(pwd)/fluentd.conf:/etc/fluent/fluent.conf fluent/fluentd:latest
Ensure your configuration file (fluentd.conf) is mounted correctly. This method is ideal for Kubernetes and Docker Compose environments.
Using Ruby Gem (Advanced)
If you need full control over plugin versions or are developing custom plugins, install Fluentd via RubyGems:
gem install fluentd
Then start Fluentd manually:
fluentd -c /path/to/fluentd.conf
Use this method only if you're experienced with Ruby environments and dependency management.
Step 3: Create a Basic Configuration File
Fluentd's configuration file uses a simple, human-readable syntax. Below is a minimal working configuration that reads from a file and outputs to stdout.
<source>
@type tail
path /var/log/app.log
pos_file /var/log/fluentd-app.pos
tag app.log
format none
</source>
<match **>
@type stdout
</match>
Let's break this down:
- <source> defines the input. @type tail monitors a file for new lines, similar to the Unix tail -f command.
- path specifies the log file to monitor.
- pos_file tracks the last read position to avoid duplicate logs after restarts.
- tag labels the data stream. Tags are used for routing in Fluentd.
- format none means no parsing is applied; each line is treated as raw text.
- <match **> captures all tagged data and sends it to @type stdout, which prints to the console.
Save this as /etc/td-agent/td-agent.conf (td-agent's default configuration path) and restart Fluentd:
sudo systemctl restart td-agent
Generate test log entries:
echo "2024-06-10T10:00:00Z INFO User logged in" >> /var/log/app.log
Check the Fluentd logs to confirm output:
sudo tail -f /var/log/td-agent/td-agent.log
You should see the log line printed in the Fluentd log output.
Step 4: Parse Structured Logs
Most modern applications output logs in structured formats like JSON. Fluentd can parse these to extract fields for better querying and analysis.
Update your source block to parse JSON:
<source>
@type tail
path /var/log/app.log
pos_file /var/log/fluentd-app.pos
tag app.log
format json
time_key timestamp
time_format %Y-%m-%dT%H:%M:%S.%NZ
</source>
Now, if your log file contains:
{"timestamp":"2024-06-10T10:00:00.123Z","level":"INFO","message":"User logged in","user_id":12345}
Fluentd will extract timestamp, level, message, and user_id as individual fields. These become available for filtering and routing.
Important: Ensure your JSON logs are valid and consistent. Invalid JSON will cause Fluentd to drop the record. Use tools like jq to validate logs before ingestion.
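The same pre-ingestion check can be scripted. A minimal sketch in Python (the helper name is ours, not part of Fluentd) that splits log lines into records Fluentd could parse and lines it would drop:

```python
import json

def validate_json_lines(lines):
    """Split log lines into parseable JSON records and lines Fluentd would drop."""
    valid, invalid = [], []
    for line in lines:
        try:
            valid.append(json.loads(line))
        except json.JSONDecodeError:
            invalid.append(line)
    return valid, invalid

lines = [
    '{"timestamp":"2024-06-10T10:00:00.123Z","level":"INFO","message":"User logged in"}',
    'not json at all',
]
valid, invalid = validate_json_lines(lines)
print(len(valid), len(invalid))  # 1 1
```

Running a check like this in your log-producing service's CI catches format regressions before they reach the pipeline.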
Step 5: Use Filters to Transform Data
Filters modify log records before they reach output. Common use cases include adding hostnames, redacting sensitive data, or enriching logs with metadata.
Example: Add server hostname and mask email addresses.
<filter app.log>
@type record_transformer
<record>
hostname "#{Socket.gethostname}"
</record>
</filter>
<filter app.log>
@type grep
<exclude>
key message
pattern \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b
</exclude>
</filter>
The first filter adds a hostname field using the system's hostname. The second uses grep with an <exclude> rule to drop any record whose message contains an email address. Note that grep can only keep or discard whole records; to mask emails while keeping the record, use record_transformer with a regex substitution.
For masking emails safely:
<filter app.log>
@type record_transformer
enable_ruby true
<record>
message ${record["message"].gsub(/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/, "[REDACTED_EMAIL]")}
</record>
</filter>
Filters are processed in order. Place them logically: parse first, then enrich, then sanitize.
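The redaction pattern above is worth sanity-checking outside Fluentd before deploying it. An essentially equivalent sketch in Python (using [A-Za-z] for the top-level-domain class; the helper is illustrative, not a Fluentd API):

```python
import re

# Same email pattern as the record_transformer substitution above.
EMAIL_RE = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')

def redact_emails(message):
    """Replace every email address in a log message with a placeholder."""
    return EMAIL_RE.sub('[REDACTED_EMAIL]', message)

print(redact_emails('Login failed for alice@example.com from 10.0.0.5'))
# Login failed for [REDACTED_EMAIL] from 10.0.0.5
```

Testing the regex against real sample messages guards against both missed emails and false positives on things like version strings.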
Step 6: Configure Multiple Outputs
Fluentd can send the same log data to multiple destinations simultaneously. This is useful for redundancy, compliance, or analytics.
<match app.log>
@type copy
<store>
@type elasticsearch
host localhost
port 9200
index_name fluentd-app
type_name _doc
flush_interval 5s
</store>
<store>
@type s3
aws_key_id YOUR_AWS_KEY
aws_sec_key YOUR_AWS_SECRET
s3_bucket your-logs-bucket
path logs/app/
s3_region us-east-1
buffer_path /var/log/fluentd-s3
time_slice_format %Y%m%d
time_slice_wait 10m
buffer_chunk_limit 256m
</store>
</match>
Here, logs are sent to both Elasticsearch (for real-time search) and S3 (for long-term archival). The @type copy directive enables multi-output routing.
For high availability, use @type forward to send logs to multiple Fluentd instances:
<match app.log>
@type forward
<server>
host fluentd-primary.example.com
port 24224
</server>
<server>
host fluentd-backup.example.com
port 24224
</server>
heartbeat_type tcp
heartbeat_interval 10s
</match>
Fluentd will automatically fail over if the primary server becomes unreachable.
Step 7: Configure Buffering for Reliability
Buffering is Fluentd's safety net. It ensures logs aren't lost during network issues or destination downtime.
Every output plugin supports buffering. Here's a robust buffer configuration for production:
<match app.log>
@type elasticsearch
host elasticsearch.example.com
port 9200
index_name fluentd-app-${tag}
flush_interval 10s
buffer_type file
buffer_path /var/log/fluentd-buffers/app
buffer_queue_limit 256
buffer_chunk_limit 8m
flush_thread_count 4
retry_max_times 10
retry_wait 10s
max_retry_wait 60s
disable_retry_limit false
</match>
Key parameters:
- buffer_type file: Stores data on disk (recommended for production).
- buffer_queue_limit: Maximum number of chunks queued; once full, Fluentd rejects or blocks new events.
- buffer_chunk_limit: Max size per chunk (8MB is safe for most systems).
- flush_thread_count: Number of threads that flush buffers (increase for high throughput).
- retry_max_times and retry_wait: Control how often Fluentd retries failed deliveries.
Monitor buffer usage:
curl http://localhost:24220/api/plugins.json
This API endpoint returns real-time buffer metrics, including queue depth and retry counts.
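A monitoring script can poll that endpoint for back-pressure. A sketch against a trimmed sample payload (the field names follow the monitor_agent plugin's output, but verify them against your Fluentd version; the threshold and helper name are ours):

```python
import json

# Trimmed sample of what /api/plugins.json returns.
sample = json.loads("""
{"plugins": [
  {"plugin_id": "out_es", "type": "elasticsearch",
   "buffer_queue_length": 3, "buffer_total_queued_size": 1048576,
   "retry_count": 0}
]}
""")

def buffers_over_limit(payload, max_queue=200):
    """Return ids of output plugins whose buffer queue exceeds a threshold."""
    return [p["plugin_id"] for p in payload["plugins"]
            if p.get("buffer_queue_length", 0) > max_queue]

print(buffers_over_limit(sample))                 # []
print(buffers_over_limit(sample, max_queue=2))    # ['out_es']
```

Wire the non-empty case into an alert: a queue that keeps growing means the destination cannot keep up with ingestion.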
Step 8: Secure Fluentd with Authentication and TLS
Never expose Fluentd to the public internet. Use TLS and authentication for internal communication.
Enable TLS for Forward Input
Configure Fluentd to accept encrypted connections:
<source>
@type forward
port 24224
bind 0.0.0.0
<transport tls>
cert_path /etc/fluent/cert.pem
private_key_path /etc/fluent/key.pem
ca_path /etc/fluent/ca-cert.pem
client_cert_auth true
</transport>
</source>
Generate certificates using OpenSSL:
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout key.pem -out cert.pem
On the client side (e.g., another Fluentd instance), configure the output to use TLS:
<match app.log>
@type forward
<server>
host fluentd-server.example.com
port 24224
<transport tls>
cert_path /etc/fluent/client-cert.pem
private_key_path /etc/fluent/client-key.pem
ca_cert_path /etc/fluent/ca-cert.pem
</transport>
</server>
</match>
Use Authentication (Optional)
For added security, enable the forward protocol's shared-key authentication:
<source>
@type forward
port 24224
<transport tls>
cert_path /etc/fluent/cert.pem
private_key_path /etc/fluent/key.pem
</transport>
<security>
self_hostname fluentd-server.example.com
shared_key your-super-secret-password
</security>
</source>
The client must be configured with the same shared key:
<match app.log>
@type forward
<server>
host fluentd-server.example.com
port 24224
<transport tls>
cert_path /etc/fluent/client-cert.pem
private_key_path /etc/fluent/client-key.pem
</transport>
</server>
<security>
self_hostname fluentd-client.example.com
shared_key your-super-secret-password
</security>
</match>
Step 9: Monitor and Log Fluentd's Own Health
Fluentd should monitor itself. Enable internal metrics and expose them via HTTP.
<system>
log_level info
root_dir /var/lib/td-agent
</system>
<source>
@type monitor_agent
bind 0.0.0.0
port 24220
</source>
Access metrics at:
http://your-fluentd-host:24220/api/plugins.json
This endpoint returns JSON with buffer usage, throughput, error rates, and plugin status. Integrate this into your monitoring stack (e.g., Prometheus + Grafana) using the fluent-plugin-prometheus plugin.
Step 10: Restart and Validate Configuration
After making changes, always validate the configuration before restarting:
sudo td-agent --dry-run -c /etc/td-agent/td-agent.conf
If the dry run completes without errors, proceed to restart:
sudo systemctl restart td-agent
Monitor logs for errors:
sudo journalctl -u td-agent -f
Test data flow with real logs and verify output destinations are receiving data.
Best Practices
1. Use Tags to Organize Log Streams
Tags are Fluentd's routing keys. Structure them hierarchically: app.service.component. For example:
- web.nginx.access
- api.auth.service
- db.postgresql.log
This enables precise filtering, routing, and indexing in downstream systems like Elasticsearch or BigQuery.
2. Avoid Using Wildcard Matches in Output
While <match **> captures everything, it makes debugging and routing difficult. Always use explicit tags or regex patterns like <match app.*> to ensure predictable behavior.
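The difference between * and ** trips people up: * matches exactly one dot-delimited tag part, while ** matches zero or more. A rough Python model of that behavior (a sketch only; it ignores Fluentd's {a,b} alternation syntax):

```python
import re

def tag_matches(pattern, tag):
    """Rough model of Fluentd <match> semantics: '*' matches one
    dot-delimited tag part, '**' matches zero or more parts."""
    regex = re.escape(pattern)
    regex = regex.replace(r'\.\*\*', r'(\..*)?')  # 'app.**' also matches bare 'app'
    regex = regex.replace(r'\*\*', r'.*')
    regex = regex.replace(r'\*', r'[^.]+')
    return re.fullmatch(regex, tag) is not None

print(tag_matches('app.*', 'app.web'))         # True
print(tag_matches('app.*', 'app.web.access'))  # False
print(tag_matches('app.**', 'app'))            # True
```

This is why <match app.*> will silently miss deeper tags like app.web.access; use app.** when you intend to capture a whole subtree.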
3. Separate Logs by Severity or Type
Route error logs to a high-priority destination (e.g., PagerDuty-integrated system), and info/debug logs to archival storage. Use filters to classify logs by level:
<filter app.log>
@type record_transformer
enable_ruby true
<record>
severity ${record["level"].upcase}
</record>
</filter>
To route by severity, rewrite the tag first (this requires the fluent-plugin-rewrite-tag-filter plugin), then match each tag separately:
<match app.log>
@type rewrite_tag_filter
<rule>
key severity
pattern /^ERROR$/
tag app.errors
</rule>
<rule>
key severity
pattern /.+/
tag app.other
</rule>
</match>
<match app.errors>
@type elasticsearch
index_name fluentd-errors
<buffer>
@type file
path /var/log/fluentd-buffers/errors
</buffer>
</match>
<match app.other>
@type s3
s3_bucket your-logs-bucket
path logs/info/
</match>
Note: Rules are evaluated in order, and records matching no rule are dropped, so keep a catch-all rule like the second one above.
4. Optimize Buffer Settings for Your Workload
High-throughput environments (e.g., 10K+ logs/sec) require larger buffers and more flush threads. Monitor buffer queue depth and adjust:
- Set buffer_chunk_limit to 8-16MB.
- Use buffer_type file (not memory) for persistence.
- Set flush_thread_count to 4-8 on multi-core systems.
- Use retry_wait with exponential backoff (e.g., 10s, 20s, 40s).
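The backoff schedule those retry parameters produce can be computed directly. A sketch (approximate: Fluentd also randomizes waits slightly when retry_randomize is enabled):

```python
def retry_schedule(retry_wait=10, max_retry_wait=60, retry_max_times=10):
    """Model exponential backoff: the wait doubles on each retry,
    capped at max_retry_wait, for at most retry_max_times attempts."""
    waits, wait = [], retry_wait
    for _ in range(retry_max_times):
        waits.append(min(wait, max_retry_wait))
        wait *= 2
    return waits

print(retry_schedule(retry_max_times=6))  # [10, 20, 40, 60, 60, 60]
```

Summing the schedule tells you how long an outage your retries cover before chunks are discarded, which is the number to compare against your destination's worst-case downtime.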
5. Use External Configuration Management
Manage Fluentd configurations via tools like Ansible, Puppet, or GitOps (FluxCD). Store templates in version control and deploy using automated pipelines. This ensures consistency across hundreds of nodes.
6. Limit Plugin Usage to What's Necessary
Each plugin consumes memory and CPU. Avoid installing plugins you don't use. For example, if you're not sending logs to Splunk, don't install the fluent-plugin-splunk gem.
7. Watch Disk Usage in Buffer Directories
Buffer chunks are deleted automatically after a successful flush, so do not rotate them with logrotate: rewriting an active chunk corrupts it. Instead, monitor disk usage under /var/log/fluentd-buffers/ and alert on stale chunks. Do use logrotate for Fluentd's own log file:
/var/log/td-agent/td-agent.log {
daily
rotate 7
compress
missingok
notifempty
create 0644 td-agent td-agent
}
8. Test Configuration Changes in Staging First
Always validate configuration changes in a non-production environment. Use tools like fluentd -c config.conf --dry-run and simulate traffic with curl or fluent-cat:
echo '{"message":"test"}' | fluent-cat app.log
9. Document Your Fluentd Setup
Create a runbook including:
- Configuration file structure
- Tagging conventions
- Buffer thresholds and alerting rules
- How to restart Fluentd without downtime
- Common error codes and resolutions
10. Integrate with Observability Tools
Connect Fluentd to Prometheus for metrics, Grafana for dashboards, and alerting systems like Alertmanager. Use the fluent-plugin-prometheus plugin to expose internal metrics:
<source>
@type prometheus
port 24231
</source>
<source>
@type prometheus_output_monitor
</source>
Then scrape metrics from http://fluentd-host:24231/metrics.
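Those metrics use the Prometheus text exposition format, which is easy to spot-check by hand. A tiny illustrative parser (the metric names in the sample follow fluent-plugin-prometheus conventions; confirm them against your own /metrics output):

```python
def parse_prom_metrics(text):
    """Minimal parser for Prometheus text format: maps each metric name
    to its list of values, ignoring labels and comment lines."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        name_part, value = line.rsplit(' ', 1)
        name = name_part.split('{', 1)[0]
        metrics.setdefault(name, []).append(float(value))
    return metrics

sample = """
# TYPE fluentd_output_status_buffer_queue_length gauge
fluentd_output_status_buffer_queue_length{plugin_id="out_es"} 3.0
fluentd_output_status_retry_count{plugin_id="out_es"} 0.0
"""
print(parse_prom_metrics(sample)['fluentd_output_status_buffer_queue_length'])  # [3.0]
```

In practice Prometheus itself does the scraping; a script like this is only useful for quick curl-and-inspect debugging.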
Tools and Resources
Official Documentation
The official Fluentd documentation at https://docs.fluentd.org is the most authoritative source for configuration syntax, plugin references, and architecture guides.
Fluentd Plugin Registry
Explore over 800 plugins at https://rubygems.org/search?query=fluentd. Popular plugins include:
- fluent-plugin-elasticsearch: Send logs to Elasticsearch/OpenSearch
- fluent-plugin-s3: Archive logs to AWS S3
- fluent-plugin-kafka: Stream logs to Apache Kafka
- fluent-plugin-docker_metadata_filter: Extract Docker container metadata
- fluent-plugin-prometheus: Expose metrics for monitoring
- fluent-plugin-aws-cloudwatch-logs: Send logs to AWS CloudWatch
Fluent Bit (Lightweight Alternative)
For resource-constrained environments (e.g., edge devices, IoT), consider Fluent Bit, a faster, memory-efficient cousin of Fluentd. It ships with a smaller plugin set of its own but integrates seamlessly with Fluentd via the forward protocol.
Containerized Deployments
For Kubernetes, use the official Fluentd DaemonSet template. It automatically collects logs from Docker and containerd runtimes.
Validation and Debugging Tools
- fluent-cat: Send test messages to Fluentd
- fluentd -c config.conf --dry-run: Validate syntax
- curl http://localhost:24220/api/plugins.json: Monitor buffer and plugin status
- jq: Parse and validate JSON logs
- tail -f /var/log/td-agent/td-agent.log: Monitor Fluentd's own logs
Community and Support
Join the Fluentd Slack community and GitHub discussions. The Fluentd project is actively maintained by the Cloud Native Computing Foundation (CNCF) and has a vibrant contributor base.
Monitoring and Alerting
Integrate Fluentd with:
- Prometheus + Grafana: For metrics visualization
- ELK Stack: For log search and analysis
- Datadog: For unified observability
- Sumo Logic: For cloud-native log analytics
Real Examples
Example 1: Centralized Logging for a Microservice Architecture
Scenario: You have 15 microservices running in Kubernetes, each outputting JSON logs to stdout. You want to collect, parse, enrich, and send them to Elasticsearch and S3.
Configuration:
<source>
@type tail
path /var/log/containers/*.log
pos_file /var/log/fluentd-containers.log.pos
tag kubernetes.*
format json
time_key time
time_format %Y-%m-%dT%H:%M:%S.%NZ
read_from_head true
</source>
<filter kubernetes.**>
@type kubernetes_metadata
</filter>
<filter kubernetes.**>
@type record_transformer
<record>
service_name ${record["kubernetes"]["labels"]["app"]}
namespace ${record["kubernetes"]["namespace_name"]}
</record>
</filter>
<match kubernetes.**>
@type copy
<store>
@type elasticsearch
host elasticsearch.logging.svc.cluster.local
port 9200
index_name k8s-logs-${record["service_name"]}
type_name _doc
flush_interval 10s
buffer_type file
buffer_path /var/log/fluentd-buffers/k8s
buffer_chunk_limit 8m
buffer_queue_limit 128
retry_max_times 10
retry_wait 10s
</store>
<store>
@type s3
aws_key_id YOUR_KEY
aws_sec_key YOUR_SECRET
s3_bucket your-k8s-logs-bucket
path logs/k8s/${record["namespace_name"]}/${record["service_name"]}/
s3_region us-east-1
buffer_path /var/log/fluentd-buffers/s3
time_slice_format %Y/%m/%d/%H
time_slice_wait 10m
buffer_chunk_limit 256m
</store>
</match>
This configuration automatically detects container logs, enriches them with Kubernetes metadata, tags them by service and namespace, and routes them to both Elasticsearch (for real-time search) and S3 (for compliance).
Example 2: Legacy System Log Forwarding
Scenario: You have an old Linux server running a proprietary application that writes logs to /var/log/legacy/app.log in a custom format: TIMESTAMP [LEVEL] MESSAGE.
Configuration:
<source>
@type tail
path /var/log/legacy/app.log
pos_file /var/log/fluentd-legacy.pos
tag legacy.app
format /^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(?<level>[A-Z]+)\] (?<message>.*)$/
time_format %Y-%m-%d %H:%M:%S
</source>
<filter legacy.app>
@type record_transformer
<record>
source "legacy-server-01"
</record>
</filter>
<match legacy.app>
@type forward
<server>
host fluentd-central.example.com
port 24224
<transport tls>
cert_path /etc/fluent/client-cert.pem
private_key_path /etc/fluent/client-key.pem
ca_cert_path /etc/fluent/ca-cert.pem
</transport>
</server>
</match>
This uses a regex parser to extract timestamp, level, and message from unstructured logs, adds source metadata, and forwards securely to a central Fluentd collector.
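Before deploying, the parser regex is worth exercising against sample lines. Fluentd's regexp parser uses Ruby-style (?<name>...) named captures, which Python writes as (?P<name>...); the same pattern in Python:

```python
import re

# The legacy-log pattern with named captures, in Python syntax.
LEGACY_RE = re.compile(
    r'^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) '
    r'\[(?P<level>[A-Z]+)\] (?P<message>.*)$'
)

m = LEGACY_RE.match('2024-06-10 10:00:00 [ERROR] Disk quota exceeded')
print(m.group('level'), '-', m.group('message'))  # ERROR - Disk quota exceeded
```

Running a few real log lines through the pattern catches mismatches (extra whitespace, bracketed timestamps, multi-word levels) before Fluentd silently fails to parse them.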
Example 3: Docker Container Logging with Fluentd
Scenario: You're running Docker containers and want to collect logs using Fluentd instead of Docker's default JSON-file driver.
Run containers with Fluentd log driver:
docker run -d \
--name myapp \
--log-driver=fluentd \
--log-opt fluentd-address=localhost:24224 \
--log-opt tag=docker.myapp \
my-image
Fluentd configuration:
<source>
@type forward
port 24224
</source>
<match docker.**>
@type elasticsearch
host elasticsearch
port 9200
index_name docker-logs
type_name _doc
flush_interval 5s
</match>
Fluentd automatically receives logs from Docker and forwards them to Elasticsearch. The tag docker.myapp enables routing by container name.
FAQs
1. What's the difference between Fluentd and Fluent Bit?
Fluentd is a full-featured, Ruby-based log collector with extensive plugin support and complex routing. Fluent Bit is a lightweight, C-based alternative optimized for performance and low memory usage. Use Fluent Bit for edge devices or Kubernetes nodes; use Fluentd for centralized aggregation and advanced processing.
2. How do I prevent log loss in Fluentd?
Use file-based buffering, set appropriate buffer_queue_limit and buffer_chunk_limit, enable retry logic, and monitor buffer metrics. Never use memory-only buffering in production.
3. Can Fluentd handle high-throughput logging (10K+ logs/sec)?
Yes. With proper tuning (multiple flush threads, larger buffer chunks, and optimized output plugins), Fluentd can handle tens of thousands of events per second on modern hardware.
4. How do I parse non-JSON logs in Fluentd?
Use the regexp format type with a custom pattern containing named captures, for example: format /^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(?<level>[A-Z]+)\] (?<message>.*)$/ (see Example 2 above).
5. Why are my logs not appearing in Elasticsearch?
Check: (1) Fluentds own logs for errors, (2) Elasticsearch connectivity, (3) buffer status via curl http://localhost:24220/api/plugins.json, (4) index permissions, and (5) whether the tag matches your <match> directive.
6. How do I update Fluentd plugins without downtime?
Fluentd does not support hot-reloading. Plan maintenance windows. Use a rolling update strategy in Kubernetes: deploy new Fluentd pods with updated configs, drain old ones, then terminate.
7. Is Fluentd secure by default?
No. Fluentd listens on unencrypted ports by default. Always enable TLS for network inputs and use authentication in multi-tenant environments.
8. How do I test my Fluentd configuration without affecting production?
Use fluentd -c config.conf --dry-run to validate syntax. Use fluent-cat to inject test logs. Deploy to a staging environment with identical infrastructure before rolling out.
9. Can Fluentd forward logs to multiple cloud providers?
Yes. Use the @type copy directive to send the same logs to AWS CloudWatch, Google Cloud Logging, and Azure Monitor simultaneously.
10. What should I do if Fluentd consumes too much memory?
Reduce buffer_queue_limit, decrease flush_thread_count, disable unused plugins, and monitor buffer usage. Consider switching to Fluent Bit for high-density deployments.
Conclusion
Configuring Fluentd is not merely a technical task; it's a strategic decision that impacts the reliability, scalability, and observability of your entire infrastructure. From parsing unstructured logs to securely forwarding data across hybrid clouds, Fluentd provides the flexibility to meet virtually any logging requirement.
This guide has walked you through the full lifecycle of Fluentd configuration: from installation and basic syntax to advanced buffering, security, and real-world use cases. You've learned how to structure logs with tags, transform data with filters, ensure durability with buffers, and integrate with modern observability tools.
Remember: Fluentd's power lies in its simplicity and extensibility. Start small: collect logs from one service, validate the flow, then scale. Document every change. Monitor relentlessly. Test before you deploy.
As cloud-native architectures continue to evolve, Fluentd remains a foundational tool for centralized logging. Whether youre managing a dozen containers or thousands of microservices, a well-configured Fluentd instance is your key to visibility, control, and resilience.
Now that you understand how to configure Fluentd, take the next step: automate your deployment, integrate it with your CI/CD pipeline, and make logging a first-class citizen in your DevOps workflow.