Part 4: Fluentd Deployment Settings

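1. Logging

This article describes Fluentd's logging mechanism.
Fluentd has two logging layers: global and per-plugin. Different log levels can be set for global logging and for plugin-level logging.

1.1. Log Level

The supported values are listed below in order of increasing verbosity: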

fatal
error
warn
info
debug
trace

The default log level is info, so by default Fluentd outputs info, warn, error and fatal logs.
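The threshold rule can be sketched in a few lines of Python. This is only an illustration of the ordering described above, not Fluentd's own implementation; the names LEVELS and emitted_levels are made up for this example.

```python
# Log levels ordered from most verbose to least verbose
# (the reverse of the list above).
LEVELS = ["trace", "debug", "info", "warn", "error", "fatal"]


def emitted_levels(configured="info"):
    """Return the levels that are written when the log level is `configured`."""
    threshold = LEVELS.index(configured)
    return LEVELS[threshold:]


if __name__ == "__main__":
    for level in ("info", "debug", "error"):
        print(level, "->", ", ".join(emitted_levels(level)))
```

With the default level, the function returns info, warn, error and fatal, which matches the default behavior stated above.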

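1.2. Global Logs

The global log is used by the Fluentd core and by plugins that do not set their own log level. The global log level can be adjusted up or down.

Via command-line options

Increasing verbosity

The -v option sets the verbosity to debug.

The -vv option sets the verbosity to trace.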

$ fluentd -v ... # debug level
$ fluentd -vv ... # trace level

These options are useful for debugging.

Decreasing verbosity

The -q option sets the verbosity to warn.

The -qq option sets the verbosity to error.

$ fluentd -q ... # warn level
$ fluentd -qq ... # error level

Via the configuration file

You can also change the log level with a <system> section in the configuration file, as shown below.

<system>
 # equal to -qq option
 log_level error
</system>

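1.3. Per-Plugin Logs

The log_level option (written as @log_level in the configuration) sets a different log level for each plugin. It is set in each plugin's section of the configuration file.

For example, to debug in_tail while suppressing everything except fatal log messages from in_http, set their @log_level options as follows: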

<source>
 @type tail
 @log_level debug
 path /var/log/data.log
 ...
</source>
<source>
 @type http
 @log_level fatal
</source>

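If the log_level parameter is not specified, the plugin uses the global log level (info by default).

Some plugins do not yet support per-plugin logging. The logging section of the "Plugin Development" article explains how to update such plugins to support the new log level system.

1.4. Suppressing Repeated Stack Traces

Fluentd can suppress identical stack traces with --suppress-repeated-stacktrace. For example, if you pass --suppress-repeated-stacktrace to fluentd, a log like the following: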

2013-12-04 15:05:53 +0900 [warn]: fluent/engine.rb:154:rescue in emit_stream: emit transaction failed error_class = RuntimeError error = #<RuntimeError: syslog>
 2013-12-04 15:05:53 +0900 [warn]: fluent/engine.rb:140:emit_stream: /Users/repeatedly/devel/fluent/fluentd/lib/fluent/plugin/out_stdout.rb:43:in `emit'
 [snip]
 2013-12-04 15:05:53 +0900 [warn]: fluent/engine.rb:140:emit_stream: /Users/repeatedly/devel/fluent/fluentd/lib/fluent/plugin/in_object_space.rb:63:in `run'
2013-12-04 15:05:53 +0900 [error]: plugin/in_object_space.rb:113:rescue in on_timer: object space failed to emit error = "foo.bar" error_class = "RuntimeError" tag = "foo" record = "{ ...}"
2013-12-04 15:05:55 +0900 [warn]: fluent/engine.rb:154:rescue in emit_stream: emit transaction failed error_class = RuntimeError error = #<RuntimeError: syslog>
2013-12-04 15:05:53 +0900 [warn]: fluent/engine.rb:140:emit_stream: /Users/repeatedly/devel/fluent/fluentd/lib/fluent/plugin/o/2.0.0/gems/cool.io-1.1.1/lib/cool.io/loop.rb:96:in `run'
 [snip]

changes to:

2013-12-04 15:05:53 +0900 [warn]: fluent/engine.rb:154:rescue in emit_stream: emit transaction failed error_class = RuntimeError error = #<RuntimeError: syslog>
 2013-12-04 15:05:53 +0900 [warn]: fluent/engine.rb:140:emit_stream: /Users/repeatedly/devel/fluent/fluentd/lib/fluent/plugin/o/2.0.0/gems/cool.io-1.1.1/lib/cool.io/loop.rb:96:in `run'
 [snip]
 2013-12-04 15:05:53 +0900 [warn]: fluent/engine.rb:140:emit_stream: /Users/repeatedly/devel/fluent/fluentd/lib/fluent/plugin/in_object_space.rb:63:in `run'
2013-12-04 15:05:53 +0900 [error]: plugin/in_object_space.rb:113:rescue in on_timer: object space failed to emit error = "foo.bar" error_class = "RuntimeError" tag = "foo" record = "{ ...}"
2013-12-04 15:05:55 +0900 [warn]: fluent/engine.rb:154:rescue in emit_stream: emit transaction failed error_class = RuntimeError error = #<RuntimeError: syslog>
 2013-12-04 15:05:55 +0900 [warn]: plugin/in_object_space.rb:111:on_timer: suppressed same stacktrace

Identical stack traces are replaced with a "suppressed same stacktrace" message until a different stack trace is received.
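The suppression rule can be illustrated with a small Python sketch. It is a simplified model of the behavior described above, not Fluentd's code; the function name render and the event format (a message plus a list of stack frames) are assumptions made for this example.

```python
def render(events, suppress_repeated=True):
    """Render (message, stacktrace) pairs as log lines.

    `stacktrace` is a list of frame strings (empty if the event has none).
    When `suppress_repeated` is set and a stack trace is identical to the
    previously printed one, its frames are replaced by a single marker line.
    """
    lines = []
    last_trace = None
    for message, trace in events:
        lines.append(message)
        if not trace:
            continue  # events without a stack trace do not reset the memory
        if suppress_repeated and trace == last_trace:
            lines.append("  suppressed same stacktrace")
        else:
            lines.extend("  " + frame for frame in trace)
            last_trace = trace
    return lines


if __name__ == "__main__":
    trace = ["out_stdout.rb:43:in `emit'", "in_object_space.rb:63:in `run'"]
    events = [
        ("[warn] emit transaction failed", trace),
        ("[error] object space failed to emit", []),
        ("[warn] emit transaction failed", trace),
    ]
    print("\n".join(render(events)))
```

The second warning prints only the marker line because its stack trace equals the one printed before it.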

1.5. Output to a Log File

By default, Fluentd writes its logs to STDOUT. To write to a file instead, specify the -o option.

$ fluentd -o /path/to/log_file

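1.6. Capturing Fluentd Logs

Fluentd tags its own logs with the fluent tag. You can process Fluentd's logs with <match fluent.**> or <match **> (the latter, of course, also captures other logs). If you define <match fluent.**> in your configuration, Fluentd sends its own logs to that match destination. This is useful for monitoring Fluentd's logs.

For example, if you have the following <match fluent.**>: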

# omit other source / match
<match fluent.**>
 @type stdout
</match>

Fluentd then outputs fluent.info logs to stdout, as shown below:

2014-02-27 00:00:00 +0900 [info]: shutting down fluentd
2014-02-27 00:00:01 +0900 fluent.info: { "message":"shutting down fluentd"} # by <match fluent.**>
2014-02-27 00:00:01 +0900 [info]: process finished code = 0

Case 1: Send Fluentd logs to a monitoring service

You can send Fluentd's logs to a monitoring service such as Datadog, Sentry or IRC by using a plugin.

# Add hostname for identifying the server
<filter fluent.**>
 @type record_transformer
 <record>
 host "#{Socket.gethostname}"
 </record>
</filter>

<match fluent.**>
 @type monitoring_plugin
 # parameters...
</match>

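Case 2: Use an aggregation/monitoring server

You can use out_forward to send Fluentd's logs to a monitoring server. The monitoring server can then filter the logs and send them to a notification system such as chat or IRC.

Leaf server example: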

# Add hostname for identifying the server and tag to filter by log level
<filter fluent.**>
 @type record_transformer
 <record>
 host "#{Socket.gethostname}"
 original_tag ${tag}
 </record>
</filter>

<match fluent.**>
 @type forward
 <server>
 # Monitoring server parameters
 </server>
</match>

Monitoring server example:

<source>
 @type forward
 @label @FLUENTD_INTERNAL_LOG
</source>

<label @FLUENTD_INTERNAL_LOG>
 # Ignore trace, debug and info log
 <filter fluent.**>
 @type grep
 regexp1 original_tag fluent.(warn|error|fatal)
 </filter>

 <match fluent.**>
 # your notification setup. This example uses irc plugin
 @type irc
 host irc.domain
 channel notify
 message notice: %s [%s] @%s %s
 out_keys original_tag,time,host,message
 </match>
</label>

If an error occurs, you will receive a notification message in your IRC notification channel.

01:01 fluentd: [11:10:24] notice: fluent.warn [2014/02/27 01:00:00] @leaf.server.domain detached forwarding server 'server.name'

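2. Monitoring

This section mainly covers monitoring the Fluentd daemon.

2.1. Monitoring Agent

Fluentd has a monitoring agent that exposes internal metrics as JSON over HTTP. Add the following lines to your configuration file.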

<source>
 @type monitor_agent
 bind 0.0.0.0
 port 24220
</source>

Next, restart the agent and fetch the metrics over HTTP.

$ curl host:24220/api/plugins.json
{
 "plugins":[
 {
 "plugin_id":"object:3fec669d6ac4",
 "type":"forward",
 "output_plugin":false,
 "config":{
 "type":"forward"
 }
 },
 {
 "plugin_id":"object:3fec669dfa48",
 "type":"monitor_agent",
 "output_plugin":false,
 "config":{
 "type":"monitor_agent",
 "port":"24220"
 }
 },
 {
 "plugin_id":"object:3fec66aead48",
 "type":"forward",
 "output_plugin":true,
 "buffer_queue_length":0,
 "buffer_total_queued_size":0,
 "retry_count":0,
 "config":{
 "type":"forward",
 "host":"192.168.0.11"
 }
 }
 ]
}
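A monitoring script might consume this JSON along the following lines. This is a minimal sketch: the function name unhealthy_outputs and the thresholds are hypothetical and should be chosen for your own deployment, and the sample payload is a shortened copy of the response above rather than a live request.

```python
import json

# Shortened copy of the /api/plugins.json response shown above.
SAMPLE = """
{"plugins": [
  {"plugin_id": "object:3fec669d6ac4", "type": "forward", "output_plugin": false,
   "config": {"type": "forward"}},
  {"plugin_id": "object:3fec66aead48", "type": "forward", "output_plugin": true,
   "buffer_queue_length": 0, "buffer_total_queued_size": 0, "retry_count": 0,
   "config": {"type": "forward", "host": "192.168.0.11"}}
]}
"""


def unhealthy_outputs(payload, max_queue_length=10, max_retries=0):
    """Return the plugin_ids of output plugins whose metrics exceed the limits.

    The default limits are arbitrary example values, not recommendations.
    """
    flagged = []
    for plugin in payload.get("plugins", []):
        if not plugin.get("output_plugin"):
            continue  # input plugins report no buffer metrics
        queue = plugin.get("buffer_queue_length") or 0
        retries = plugin.get("retry_count") or 0
        if queue > max_queue_length or retries > max_retries:
            flagged.append(plugin["plugin_id"])
    return flagged


if __name__ == "__main__":
    print(unhealthy_outputs(json.loads(SAMPLE)))
```

In a real check, the payload would come from the monitor_agent endpoint instead of the embedded sample.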

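Reusing plugins

Since v0.12.17, the monitor_agent plugin has a tag parameter. If you set the tag to monitor.metrics, the monitor_agent plugin emits its internal metrics with the monitor.metrics tag. Here is an example of the stdout output.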

2015-09-16 20:28:19 +0900 monitor.metrics: {"plugin_id":"object:3fc62f0e5d64","plugin_category":"input","type":"monitor_agent","output_plugin":false,"retry_count":null}
2015-09-16 20:28:19 +0900 monitor.metrics: {"plugin_id":"object:3fc62f0e9c84","plugin_category":"output","type":"stdout","output_plugin":true,"retry_count":null}

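2.2. Process Monitoring

Two ruby processes (parent and child) are executed. Make sure that both processes are running. An example for td-agent is shown below.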

/opt/td-agent/embedded/bin/ruby /usr/sbin/td-agent
 --daemon /var/run/td-agent/td-agent.pid
 --log /var/log/td-agent/td-agent.log

For td-agent on Linux, you can check the process status with the following command. If everything is fine, two processes should be displayed.

$ ps w -C ruby -C td-agent --no-heading
32342 ? Sl 0:00 /opt/td-agent/embedded/bin/ruby /usr/sbin/td-agent --daemon /var/run/td-agent/td-agent.pid --log /var/log/td-agent/td-agent.log
32345 ? Sl 0:01 /opt/td-agent/embedded/bin/ruby /usr/sbin/td-agent --daemon /var/run/td-agent/td-agent.pid --log /var/log/td-agent/td-agent.log

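2.3. Port Monitoring

Fluentd opens several ports according to its configuration file. We recommend checking the availability of these ports. The default port settings are shown below: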

TCP 0.0.0.0 9880 (HTTP by default)
TCP 0.0.0.0 24224 (Forward by default)
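One simple way to check these ports is to attempt a TCP connection. The sketch below uses only the Python standard library; the helper name tcp_port_open and the choice of 127.0.0.1 as the target host are assumptions for illustration.

```python
import socket


def tcp_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # The two default ports listed above.
    for port in (9880, 24224):
        state = "open" if tcp_port_open("127.0.0.1", port) else "closed"
        print(f"port {port}: {state}")
```

A successful connection only shows that something is listening on the port; it does not verify that the listener is Fluentd.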

Debug port

A debug port for local communication is recommended for troubleshooting. Note that it requires the following configuration.

<source>
 @type debug_agent
 bind 127.0.0.1
 port 24230
</source>

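You can attach to the process with the fluent-debug command through dRuby.

2.4. Datadog (dd-agent) Integration

Datadog is a cloud monitoring service, and its monitoring agent dd-agent has native integration with Fluentd.

See the Datadog integration documentation for more details.

3. Fluentd Signal Handling

This article explains how Fluentd handles UNIX signals.

3.1. Process Model

When you start Fluentd, it creates two processes: a supervisor and a worker. The supervisor process controls the life cycle of the worker process. Make sure that you send any signals to the supervisor process.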

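3.2. Signals

SIGINT or SIGTERM

Stops the daemon gracefully.
Fluentd tries to flush the entire memory buffer once, but does not retry if the flush fails. Fluentd does not flush the file buffer; the logs are kept on disk by default.

SIGUSR1

Forces the buffered messages to be flushed and reopens Fluentd's log.
Fluentd immediately flushes the current buffers (both memory and file) and then keeps flushing at flush_interval.

SIGHUP

Reloads the configuration file by gracefully restarting the worker process.
Fluentd tries to flush the entire memory buffer once, but does not retry if the flush fails. Fluentd does not flush the file buffer; the logs are kept on disk by default.

SIGCONT

Calls sigdump to dump Fluentd's internal state. See the troubleshooting article.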
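Sending one of these signals from a script could look like the sketch below. It assumes a Unix system and that the PID file written by --daemon holds the supervisor's PID; the action names and the helper functions are invented for this example.

```python
import os
import signal

# Hypothetical action names mapped to the signals described above.
ACTIONS = {
    "stop": signal.SIGTERM,
    "flush": signal.SIGUSR1,
    "reload": signal.SIGHUP,
    "dump": signal.SIGCONT,
}


def read_pid(pid_file):
    """Read the process ID stored in a PID file."""
    with open(pid_file) as f:
        return int(f.read().strip())


def send_action(action, pid_file):
    """Send the signal for `action` to the process named in `pid_file`."""
    pid = read_pid(pid_file)
    os.kill(pid, ACTIONS[action])
    return pid


if __name__ == "__main__":
    # Example path taken from the td-agent command line shown earlier.
    path = "/var/run/td-agent/td-agent.pid"
    if os.path.exists(path):
        print("flushed buffers of pid", send_action("flush", path))
    else:
        print("no PID file at", path)
```

Sending a signal normally requires the same user as the Fluentd process or root privileges.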

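4. Fluentd HTTP RPC

4.1. Overview

HTTP RPC is one way to manage a Fluentd instance. Several of the provided RPCs are replacements for signals. The response body is in JSON format.

In environments where signals are not supported, such as Windows, you can use RPC instead of signals.

4.2. Configuration

RPC is disabled by default. To enable RPC, set rpc_endpoint in the <system> section.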

<system>
 rpc_endpoint 127.0.0.1:24444
</system>

After that, you can access the RPC as shown below.

$ curl 127.0.0.1:24444/api/plugins.flushBuffers
{"ok":true}

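4.3. RPCs

/api/processes.interruptWorkers

Replacement for the SIGINT signal. Stops the daemon.

/api/processes.killWorkers

Replacement for the SIGTERM signal. Stops the daemon.

/api/plugins.flushBuffers

Replacement for the SIGUSR1 signal. Flushes the buffered messages.

/api/config.reload

Replacement for the SIGHUP signal. Reloads the configuration.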
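The same calls can be made from a script instead of curl. The following sketch uses only the Python standard library and assumes that rpc_endpoint is set to 127.0.0.1:24444 as in the configuration example; the helper names rpc_url and call_rpc are made up for illustration.

```python
import json
import urllib.request


def rpc_url(endpoint, method):
    """Build the URL for an RPC such as 'plugins.flushBuffers'."""
    return f"http://{endpoint}/api/{method}"


def call_rpc(endpoint, method, timeout=5.0):
    """Call the RPC and return the decoded JSON response body."""
    with urllib.request.urlopen(rpc_url(endpoint, method), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        print(call_rpc("127.0.0.1:24444", "plugins.flushBuffers"))
    except OSError as exc:
        # Raised when no Fluentd RPC endpoint is listening at that address.
        print("RPC call failed:", exc)
```

Because the endpoint accepts unauthenticated requests, binding it to 127.0.0.1 as in the example keeps it reachable only from the local machine.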

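5. Fluentd High Availability Configuration

For high-traffic websites, we recommend using a high availability configuration of Fluentd.

5.1. Message Delivery Semantics

Fluentd is designed primarily for event-log delivery systems.

In such systems, several delivery guarantees are possible:

At most once: Messages are transferred immediately. If the transfer succeeds, the message is never sent again. However, many failure scenarios can cause lost messages (e.g. no more write capacity).
At least once: Each message is delivered at least once. In failure cases, messages may be delivered twice.
Exactly once: Each message is delivered once and only once. This is what people want.

If the system "can't lose a single event" and must also transfer "exactly once", then the system must stop ingesting events when it runs out of write capacity. The proper approach would be to use synchronous logging and return errors when an event cannot be accepted.

That is why Fluentd provides "at most once" and "at least once" transfers. To collect massive amounts of data without impacting application performance, a data logger must transfer data asynchronously. This improves performance at the cost of potential delivery failures.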
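The difference between the first two guarantees can be made concrete with a toy simulation. This is a sketch under simplifying assumptions (one sender, one receiver, a scripted list of link outcomes), not a model of Fluentd's forwarding protocol; the function name deliver and the outcome labels are invented here.

```python
def deliver(messages, outcomes, retry):
    """Simulate sending `messages` over an unreliable link.

    `outcomes` supplies one entry per send attempt:
      "ok"       - the message was received and acknowledged
      "lost"     - the message never reached the receiver
      "ack_lost" - the message was received, but the acknowledgement was lost
    retry=False: the sender never resends (at most once).
    retry=True:  the sender resends until it sees an acknowledgement
                 (at least once).
    Returns the list of messages as seen by the receiver.
    """
    outcomes = iter(outcomes)
    received = []
    for msg in messages:
        while True:
            outcome = next(outcomes)
            if outcome != "lost":
                received.append(msg)
            if outcome == "ok" or not retry:
                break
    return received


if __name__ == "__main__":
    msgs = ["a", "b", "c"]
    print("at most once: ", deliver(msgs, ["ok", "lost", "ack_lost"], retry=False))
    print("at least once:", deliver(msgs, ["ok", "lost", "ok", "ack_lost", "ok"], retry=True))
```

Without retries, message "b" is lost; with retries nothing is lost, but "c" arrives twice because the sender cannot tell a lost acknowledgement from a lost message.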

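However, most failure scenarios are preventable. The following sections describe how to set up Fluentd's topology for high availability.

5.2. Network Topology

To configure Fluentd for high availability, we assume that your network consists of "log forwarders" and "log aggregators".

"Log forwarders" are typically installed on every node to receive local events. Once an event is received, they forward it to the "log aggregators" over the network.

"Log aggregators" are daemons that continuously receive events from the log forwarders. They buffer the events and periodically upload the data to the cloud.

Fluentd can act as either a log forwarder or a log aggregator, depending on its configuration. The next sections describe the respective setups. We assume that the active log aggregator has the IP address 192.168.0.1 and that the backup has the IP address 192.168.0.2.

5.3. Log Forwarder Configuration

Add the following lines to the configuration file of your log forwarders. This configures the log forwarders to transfer logs to the log aggregators.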

# TCP input
<source>
 @type forward
 port 24224
</source>

# HTTP input
<source>
 @type http
 port 8888
</source>

# Log Forwarding
<match mytag.**>
 @type forward

 # primary host
 <server>
 host 192.168.0.1
 port 24224
 </server>
 # use secondary host
 <server>
 host 192.168.0.2
 port 24224
 standby
 </server>

 # use longer flush_interval to reduce CPU usage.
 # note that this is a trade-off against latency.
 flush_interval 60s
</match>

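When the active aggregator (192.168.0.1) dies, the logs are sent to the backup aggregator (192.168.0.2) instead. If both servers die, the logs are buffered on disk on the corresponding forwarder nodes.

5.4. Log Aggregator Configuration

Add the following lines to the configuration file of your log aggregators. The input source for the log transfer is TCP.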

# Input
<source>
 @type forward
 port 24224
</source>

# Output
<match mytag.**>
 ...
</match>

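The incoming logs are buffered and then periodically uploaded to the cloud. If an upload fails, the logs are stored on the local disk until the retransmission succeeds.

5.5. Failure Case Scenarios

Forwarder Failure

When a log forwarder receives events from applications, the events are first written to a disk buffer (specified by buffer_path). After every flush_interval, the buffered data is forwarded to the aggregators.

This process is inherently robust against data loss. If a log forwarder's Fluentd process dies, the buffered data is properly transferred to its aggregator after it restarts. If the network between forwarders and aggregators breaks, the data transfer is automatically retried.

However, message loss is still possible in the following scenarios:

The process dies immediately after receiving the events, but before writing them to the buffer.
The forwarder's disk is broken, and the file buffer is lost.

Aggregator Failure

When a log aggregator receives events from log forwarders, the events are first written to a disk buffer (specified by buffer_path). After every flush_interval, the buffered data is uploaded to the cloud.

This process is inherently robust against data loss. If a log aggregator's Fluentd process dies, the data from the log forwarders is properly retransmitted after it restarts. If the network between aggregators and the cloud breaks, the data transfer is automatically retried.

However, message loss is still possible in the following scenarios:

The process dies immediately after receiving the events, but before writing them to the buffer.
The aggregator's disk is broken, and the file buffer is lost.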

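5.6. Troubleshooting

"no nodes are available"

Make sure that you can communicate over both TCP and UDP on port 24224. The following commands help check the network configuration.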

$ telnet host 24224
$ nmap -p 24224 -sU host

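Note that there is a known issue where VMware occasionally loses the UDP packets used for heartbeats.

6. Failure Scenarios

This article lists various Fluentd failure scenarios. We assume that you have configured Fluentd for high availability, so that each application node has its local forwarder and all logs are aggregated into multiple aggregators.

6.1. Applications Cannot Post Records to the Forwarder

In this failure scenario, the application sometimes cannot post records to its local Fluentd instance when using the logger libraries of various languages. Depending on the maturity of each logger library, some clever mechanisms have been implemented to prevent data loss.

1) Memory Buffering (available for Ruby, Java, Python and Perl)

If the destination Fluentd instance dies, some logger implementations use extra memory to hold the incoming logs. When Fluentd comes back, these loggers automatically send the buffered logs to Fluentd. Once the maximum buffer memory size is reached, most current implementations write the data to disk or throw away the logs.

2) Exponential Backoff (available for Ruby, Java)

When trying to resend logs to the local forwarder, some implementations use exponential backoff to prevent excessive reconnection requests.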
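As an illustration of exponential backoff, the sketch below computes the wait time before each reconnection attempt. The base delay, growth factor and cap are arbitrary example values and do not reflect the settings of any particular logger library.

```python
def backoff_delays(base=0.5, factor=2.0, max_delay=60.0, attempts=8):
    """Return the wait time in seconds before each reconnection attempt.

    The delay starts at `base`, is multiplied by `factor` after every
    failed attempt, and is capped at `max_delay`.
    """
    delays = []
    delay = base
    for _ in range(attempts):
        delays.append(min(delay, max_delay))
        delay *= factor
    return delays


if __name__ == "__main__":
    print(backoff_delays())
```

Because the delay grows geometrically, a forwarder that stays down for a long time receives only a handful of reconnection attempts instead of a constant stream of them.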

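6.2. Forwarder or Aggregator Fluentd Goes Down

What happens when a Fluentd process dies for some reason? It depends on your buffer configuration.

buf_memory

If you use buf_memory, the buffered data is completely lost. This is a trade-off for higher performance. Lowering flush_interval reduces the probability of losing data, but increases the number of transfers between forwarders and aggregators.

buf_file

If you use buf_file, the buffered data is stored on disk. After Fluentd recovers, it tries to send the buffered data to the destination again.

Note that the data is lost if the buffer file is broken by an I/O error. The data is also lost if the disk is full, since there is nowhere to store the data on disk.

6.3. Storage Destination Goes Down

If the storage destination (e.g. Amazon S3, MongoDB, HDFS) goes down, Fluentd keeps trying to resend the buffered data. The retry logic depends on the plugin implementation.

If you use buf_memory, the aggregators stop accepting new logs once they reach their buffer limits. If you use buf_file, the aggregators continue accepting logs until they run out of disk space.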

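7. Performance Tuning

7.1. Check Server Conditions with the top Command

If Fluentd does not perform as well as you expected, check the top command first. You need to identify which part of the system is the bottleneck (CPU, memory, disk I/O, etc.).

7.2. Avoid Extra Computations

This is more of a general recommendation, but it is best not to have extra computations inside Fluentd. Fluentd is flexible enough to do quite a few things internally, but adding too much logic to Fluentd's configuration file makes it difficult to read and maintain, and also less robust. The configuration file should be as simple as possible.

7.3. Use the num_threads Parameter

If the destination of your logs is a remote storage or service, adding the num_threads option parallelizes your outputs (the default is 1). This parameter is available for all output plugins.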

<match test>
 @type output_plugin
 num_threads 8
 ...
</match>

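Note that this option does not improve processing performance, e.g. numerical computation or mutating records.

7.4. Use an External gzip Command for S3/TD

Ruby has a GIL (Global Interpreter Lock), which allows only one thread to execute at a time. While I/O tasks can be multiplexed, CPU-intensive tasks block other jobs. One of the CPU-intensive tasks in Fluentd is compression.

Newer versions of the S3 and Treasure Data plugins allow compression to be performed outside of the Fluentd process, using gzip. This frees up the Ruby interpreter while allowing Fluentd to process other tasks.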

# S3
<match ...>
 @type s3
 store_as gzip_command
 num_threads 8
 ...
</match>

# Treasure Data
<match ...>
 @type tdlog
 use_gzip_command
 num_threads 8
 ...
</match>

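While this does not make perfect use of multiple CPU cores, it is effective for most Fluentd deployments. As before, you can combine it with the num_threads option.

7.5. Reduce Memory Usage

Ruby has several GC parameters for tuning GC performance, and you can configure them through environment variables (see Ruby's list of GC parameters). To reduce memory usage, set RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR to a lower value. RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR is used as the trigger for full GC, and its default is 2.0. Quoting the documentation: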

Do full GC when the number of old objects is more than R * N
 where R is this factor and
 N is the number of old objects just after last full GC.

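So the default GC behavior does not invoke a full GC until the number of old objects reaches 2.0 times the number just after the last full GC. This improves throughput but increases the total memory usage. This setting is not well suited to low-resource environments, e.g. a small container. For such cases, try RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=0.9 or RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=1.2.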
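The effect of the factor can be checked with a little arithmetic. The sketch below simply evaluates R * N from the quoted rule for a few factors; the old-object count of 100,000 is an arbitrary example figure, and the function name is made up for this illustration.

```python
def full_gc_threshold(old_objects_after_last_gc, factor=2.0):
    """Old-object count above which the next full GC is triggered (R * N)."""
    return factor * old_objects_after_last_gc


if __name__ == "__main__":
    n = 100_000  # example: old objects remaining after the last full GC
    for factor in (2.0, 1.2, 0.9):
        print(f"factor {factor}: full GC above {full_gc_threshold(n, factor):,.0f} old objects")
```

A lower factor makes full GC run after less growth in old objects, trading some throughput for a smaller memory footprint.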

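See the articles "Ruby 2.1 Garbage Collection: ready for production" and "Watching and Understanding the Ruby 2.1 Garbage Collector at Work" for more details.

7.6. Multi-Process Plugin

The CPU is often the bottleneck for Fluentd instances that handle billions of incoming records. To utilize multiple CPU cores, we recommend using the in_multiprocess plugin.

in_multiprocess

8. Plugin Management

This article explains how to manage Fluentd plugins, including adding third-party plugins.

8.1. fluent-gem

The fluent-gem command is used to install Fluentd plugins. It is a wrapper around the gem command.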

fluent-gem install fluent-plugin-grep

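Ruby does not guarantee C extension API compatibility between its major versions. If you update Fluentd's Ruby version, you should reinstall the plugins that depend on C extensions.

If Using td-agent, Use /usr/sbin/td-agent-gem

If you are using td-agent, make sure to use td-agent's td-agent-gem command. Otherwise (e.g. if you use the command belonging to the system, rvm, etc.), you will not be able to find your "installed" plugins.
See the FAQ for more information.

8.2. Gem and Native Extensions

Some plugins depend on native extension libraries. This means that you need to install development packages to build them, e.g. gcc, make, autoconf. If you see a log like the following, install the development packages before installing the plugin.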

Building native extensions. This could take a while...
ERROR: Error installing fluent-plugin-twitter:
 ERROR: Failed to build gem native extension.

 /opt/td-agent/embedded/bin/ruby extconf.rb

checking for rb_str_scrub()... yes
creating Makefile

make "DESTDIR = " clean
sh: 1: make: not found

make "DESTDIR = "
sh: 1: make: not found

make failed, exit code 127

Gem files will remain installed in /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/string-scrub-0.0.3 for inspection.
Results logged to /opt/td-agent/embedded/lib/ruby/gems/2.1.0/extensions/x86_64-linux/2.1.0/string-scrub-0.0.3/gem_make.out

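8.3. The "-p" Option

Fluentd's -p option adds an extra plugin directory to the load path. For example, if you put the out_foo.rb plugin into /path/to/plugin, you can load out_foo.rb by specifying the -p option as shown below.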

fluentd -p /path/to/plugin

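You can specify the -p option more than once.

8.4. Adding Plugins via /etc/fluent/plugin

By default, Fluentd adds the /etc/fluent/plugin directory to its load path. Therefore, any additional plugins placed in /etc/fluent/plugin are loaded automatically.
For example, if /etc/fluent/plugin/out_foo.rb exists, you can use @type foo in a <match> section.

If Using td-agent, Use /etc/td-agent/plugin

If you are using td-agent, Fluentd uses the /etc/td-agent/plugin directory instead of /etc/fluent/plugin. Put your plugins there.

8.5. Plugin Version Management

Fluentd and its plugins keep evolving, so you may hit unexpected errors with the latest versions, e.g. regressions from new features, removal of deprecated parameters, or changes in library dependencies. If you want to update Fluentd or its plugins, check the behavior in a test environment first. For example, td-agent pins the versions of Fluentd and its plugins in each release.
Fluentd plugins are rubygems, and rubygems installs the latest version by default. We therefore do not recommend running the following commands in production: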

gem install fluentd
gem install fluent-plugin-elasticsearch
gem update # This is very dangerous. Update all existing gems

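Another problem: if you install a plugin that depends on Fluentd v0.14, gem installs Fluentd v0.14 alongside it even if Fluentd v0.12 is already installed. This is an unexpected result for Fluentd v0.12 users.

You should specify the target version with the -v option.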

gem install fluentd -v 0.12.34
gem install fluent-plugin-elasticsearch -v 1.9.3

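The same applies to /usr/sbin/td-agent-gem, because /usr/sbin/td-agent-gem uses the gem command internally.

8.6. The "--gemfile" Option

A Ruby application manages its gem dependencies with a Gemfile and Bundler. Fluentd's --gemfile option takes the same approach and is useful for managing plugin versions separately from the shared gems.
For example, if you have the following Gemfile at /etc/fluent/Gemfile: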

source 'https://rubygems.org'

gem 'fluentd', '0.12.34'
gem 'fluent-plugin-elasticsearch', '1.9.3'

You can pass this Gemfile to Fluentd with the --gemfile option.

fluentd --gemfile /etc/fluent/Gemfile

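When the --gemfile option is specified, Fluentd tries to install the listed gems using Bundler. Fluentd loads only the listed gems, separately from the shared gems, which also prevents unexpected plugin updates.

In addition, if you update Fluentd's Ruby version, Bundler reinstalls the listed gems for the new Ruby version. This avoids the C extension API compatibility problem.

9. Fluentd Troubleshooting

9.1. Look at the Logs

If things are not happening as expected, look at your logs first. For td-agent (rpm/deb), the logs are located at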

/var/log/td-agent/td-agent.log

9.2. Turn on Verbose Logging

If you turn on verbose logging, you can get more information in the logs. Follow the steps below.

rpm

edit /etc/init.d/td-agent
add -vv to TD_AGENT_ARGS
restart td-agent

# at /etc/init.d/td-agent
...
TD_AGENT_ARGS="... -vv"
...

deb

edit /etc/init.d/td-agent
add -vv to DAEMON_ARGS
restart td-agent

# at /etc/init.d/td-agent
...
DAEMON_ARGS="... -vv"
...

gem

Please add -vv to your command line.

$ fluentd .. -vv

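9.3. Dump Fluentd Internal Information

Fluentd uses sigdump to dump its internal information, such as thread dumps and object allocations, to a local file. If you have a problem with Fluentd, send SIGCONT to the Fluentd parent and child processes.

9.4. High CPU Usage Problems

If Fluentd suddenly shows unexpectedly high CPU usage, there are several possible causes:

A plugin has a race condition or a similar bug
A dependent gem has a bug
A regular expression is applied to broken data
A system call has a bug, e.g. inotify with a large number of files

In such cases, you can investigate the problem with the perf tool on recent Linux systems. See the Linux perf Examples page.
If you want to know which method call causes the problem, pid2line.rb is useful.

9.5. Check Uncaught Logs

Sometimes Fluentd shuts down unexpectedly with a non-zero exit status, as shown below.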

2016-01-01 00:00:00 +0800 [info]: starting fluentd-0.12.28
2016-01-01 00:00:00 +0800 [info]: reading config file path="/etc/td-agent/td-agent.conf"
[...snip...]
2016-01-01 00:00:02 +0800 [info]: process finished code=6

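If a problem occurs inside Ruby, e.g. a segmentation fault or a C extension bug, you cannot get the complete log while the fluentd process is daemonized.

For example, td-agent launches fluentd with the --daemon option. In the td-agent case, you can get the complete log with the following command, which simulates /etc/init.d/td-agent start without daemonizing.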

$ sudo LD_PRELOAD=/opt/td-agent/embedded/lib/libjemalloc.so /usr/sbin/td-agent -c /etc/td-agent/td-agent.conf --user td-agent --group td-agent

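10. Forwarding Data over SSL

This is a quick tutorial on how to use the secure forward plugin to enable SSL for Fluentd-to-Fluentd data transport.

It is intended as a quick introduction. For comprehensive documentation, including parameter definitions, check out out_secure_forward and in_secure_forward.

10.1. Setup: Receiver

First, install the secure forward plugin.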

Fluentd: fluent-gem install fluent-plugin-secure-forward
td-agent v2: /usr/sbin/td-agent-gem install fluent-plugin-secure-forward
td-agent v1: /usr/lib/fluent/ruby/bin/fluent-gem install fluent-plugin-secure-forward

Then, set up the configuration file as follows:

<source>
 @type secure_forward
 shared_key YOUR_SHARED_KEY
 self_hostname server.fqdn.local
 cert_auto_generate yes
</source>

<match secure.**>
 @type stdout
</match>

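The <match> section uses out_stdout to print the forwarded messages to STDOUT
(which is written to /var/log/td-agent/td-agent.log for td-agent).

10.2. Setup: Sender

First, install the secure forward plugin.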

Fluentd: fluent-gem install fluent-plugin-secure-forward
td-agent v2: /usr/sbin/td-agent-gem install fluent-plugin-secure-forward
td-agent v1: /usr/lib/fluent/ruby/bin/fluent-gem install fluent-plugin-secure-forward

Then, set up the configuration file as follows:

 <source>
 @type forward
 </source>
 <match secure.**>
 @type secure_forward
 shared_key YOUR_SHARED_KEY
 self_hostname "#{Socket.gethostname}"
 <server>
 host RECEIVER_IP
 port 24284
 </server>
</match>

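The <source> section is there to feed test data into Fluentd through in_forward (https://docs.fluentd.org/v0.12/articles/in_forward). Make sure that your shared_key is the same as on the receiver.

10.3. Confirm: Send an Event over SSL

On the sender machine, run the following command using fluent-cat.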

Fluentd: echo '{"message":"testing the SSL forwarding"}' | fluent-cat --json secure.test
td-agent v2: echo '{"message":"testing the SSL forwarding"}' | /opt/td-agent/embedded/bin/fluent-cat --json secure.test
td-agent v1: echo '{"message":"testing the SSL forwarding"}' | /usr/lib/fluent/ruby/bin/fluent-cat --json secure.test

Now, check the receiver's Fluentd log (for td-agent, this is /var/log/td-agent/td-agent.log). There should be a line like this:

2014-10-21 18:18:26 -0400 secure.test: {"message":"testing the SSL forwarding"}

10.4. Resources

in_secure_forward
out_secure_forward
the secure forward plugin’s GitHub repo

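11. Fluentd UI

fluentd-ui is a browser-based manager for Fluentd and td-agent that supports the following operations.

Install, uninstall and upgrade Fluentd plugins
Start, stop and restart the fluentd process
Configure Fluentd settings such as the configuration file content and the pid file path
View the Fluentd log with a simple error viewer

11.1. Getting Started

If you have installed td-agent, you can start it with td-agent-ui start as follows: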

$ sudo /usr/sbin/td-agent-ui start
Puma 2.9.2 starting...
* Min threads: 0, max threads: 16
* Environment: production
* Listening on tcp://0.0.0.0:9292

Or, if you use the fluentd gem, first install fluentd-ui with the gem command.

$ gem install -V fluentd-ui
$ fluentd-ui start
Puma 2.9.2 starting...
* Min threads: 0, max threads: 16
* Environment: production
* Listening on tcp://0.0.0.0:9292

Then, open localhost:9292 in your browser.
The default account is username "admin" with password "changeme".

11.2. Screenshots

Dashboard

setting

in_tail setting

Plugin

12. Command-Line Options

This article describes the built-in commands and their options.

12.1. fluentd

Invokes fluentd. The supported options are:

Usage: fluentd [options]
 -s, --setup [DIR=/etc/fluent] install sample configuration file to the directory
 -c, --config PATH config file path (default: /etc/fluent/fluent.conf)
 --dry-run Check fluentd setup is correct or not
 -p, --plugin DIR add plugin directory
 -I PATH add library path
 -r NAME load library
 -d, --daemon PIDFILE daemonize fluent process
 --no-supervisor run without fluent supervisor
 --user USER change user
 --group GROUP change group
 -o, --log PATH log file path
 -i CONFIG_STRING, inline config which is appended to the config file on-fly
 --inline-config
 --emit-error-log-interval SECONDS
 suppress interval seconds of emit error logs
 --suppress-repeated-stacktrace [VALUE]
 suppress repeated stacktrace
 --without-source invoke a fluentd without input plugins
 --use-v1-config Use v1 configuration format (default)
 --use-v0-config Use v0 configuration format
 -v, --verbose increase verbose level (-v: debug, -vv: trace)
 -q, --quiet decrease verbose level (-q: warn, -qq: error)
 --suppress-config-dump suppress config dumping when fluentd starts
 -g, --gemfile GEMFILE Gemfile path
 -G, --gem-path GEM_INSTALL_PATH Gemfile install path (default: $(dirname $gemfile)/vendor/bundle)

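Important options

--suppress-config-dump

Fluentd starts without dumping its configuration. This option is useful if you do not want the configuration to appear in the Fluentd log, e.g. to avoid showing private keys.

--suppress-repeated-stacktrace

If set to true, repeated stack traces are suppressed in the Fluentd log. Since v0.12, this option is true by default.

--without-source

Fluentd starts without input plugins. This option is useful for flushing buffers without accepting new incoming events.

-i, --inline-config

This option is useful if you run Fluentd on an XaaS platform that does not support persistent disks.

--no-supervisor

If you want to use your own supervisor tools, this option avoids double supervision.

Set via the configuration file

Several options can also be set via the configuration file. See the configuration file article.

12.2. fluent-cat

Sends an event to fluentd's in_forward/in_unix plugin. This is useful for testing.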

Usage: fluent-cat [options] <tag>
 -p, --port PORT fluent tcp port (default: 24224)
 -h, --host HOST fluent host (default: 127.0.0.1)
 -u, --unix use unix socket instead of tcp
 -s, --socket PATH unix socket path (default: /var/run/fluent/fluent.sock)
 -f, --format FORMAT input format (default: json)
 --json same as: -f json
 --msgpack same as: -f msgpack
 --none same as: -f none
 --message-key KEY key field for none format (default: message)

Examples

Send a JSON message with the debug.log tag to the local fluentd:

% echo '{"message":"hello"}' | fluent-cat debug.log

Send to another machine:

% echo '{"message":"hello"}' | fluent-cat debug.log --host testserver --port 24225
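For environments where fluent-cat is not installed, a rough equivalent can be sketched in Python. This example assumes that in_forward accepts an event encoded as the JSON array [tag, time, record] in addition to MessagePack; treat that as an assumption to verify against your Fluentd version. The function names build_event and send_event are made up for this illustration.

```python
import json
import socket
import time


def build_event(tag, record, timestamp=None):
    """Encode one event as the JSON array [tag, time, record]."""
    if timestamp is None:
        timestamp = int(time.time())
    return json.dumps([tag, timestamp, record]).encode("utf-8")


def send_event(tag, record, host="127.0.0.1", port=24224, timeout=3.0):
    """Send a single event to an in_forward listener over TCP."""
    payload = build_event(tag, record)
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)


if __name__ == "__main__":
    try:
        send_event("debug.log", {"message": "hello"})
        print("event sent")
    except OSError as exc:
        # Raised when nothing is listening on the target host and port.
        print("could not send event:", exc)
```

The defaults for host and port mirror the fluent-cat defaults listed in the usage text above.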