lumberjack
o/~ I'm a lumberjack and I'm ok! I sleep when idle, then I ship logs all day! I parse your logs, I eat the JVM agent for lunch! o/~
Questions and support
If you have questions and cannot find answers, please join the #logstash irc
channel on freenode irc or ask on the logstash-users@googlegroups.com mailing
list.
What is this?
A tool to collect logs locally in preparation for processing elsewhere!
Problem: logstash jar releases are too fat for constrained systems. Until we can comfortably promise logstash executing with less resource usage...
Solution: lumberjack
Configuring
lumberjack is configured with a json file you specify with thei -config flag:
lumberjack -config yourstuff.json
Here's a sample, with comments in-line to describe the settings. Please please
please keep in mind that comments are technically invalid in JSON, so you can't
include them in your config.:
{
# The network section covers network configuration :)
"network": {
# A list of downstream servers listening for our messages.
# lumberjack will pick one at random and only switch if
# the selected one appears to be dead or unresponsive
"servers": [ "localhost:5043" ],
# The path to your client ssl certificate (optional)
"ssl certificate": "./lumberjack.crt",
# The path to your client ssl key (optional)
"ssl key": "./lumberjack.key",
# The path to your trusted ssl CA file. This is used
# to authenticate your downstream server.
"ssl ca": "./lumberjack_ca.crt",
# Network timeout in seconds. This is most important for lumberjack
# determining whether to stop waiting for an acknowledgement from the
# downstream server. If an timeout is reached, lumberjack will assume
# the connection or server is bad and will connect to a server chosen
# at random from the servers list.
"timeout": 15
},
# The list of files configurations
"files": [
# An array of hashes. Each hash tells what paths to watch and
# what fields to annotate on events from those paths.
{
"paths": [
# single paths are fine
"/var/log/messages",
# globs are fine too, they will be periodically evaluated
# to see if any new files match the wildcard.
"/var/log/*.log"
],
# A dictionary of fields to annotate on each event.
"fields": { "type": "syslog" }
}, {
# A path of "-" means stdin.
"paths": [ "-" ],
"fields": { "type": "stdin" }
}, {
"paths": [
"/var/log/apache/httpd-*.log"
],
"fields": { "type:" "apache" }
}
]
}
Goals
- Minimize resource usage where possible (CPU, memory, network).
- Secure transmission of logs.
- Configurable event data.
- Easy to deploy with minimal moving parts.
- Simple inputs only:
- Follows files and respects rename/truncation conditions.
- Accepts
STDIN
, useful for things like varnishlog | lumberjack...
.
Building it
-
Install FPM
$ sudo gem install fpm
-
Install go
-
Compile lumberjack
$ git clone git://github.com/jordansissel/lumberjack.git
$ cd lumberback
$ make
-
Make packages, either:
$ make rpm
Or:
$ make deb
Installing it
Packages install to /opt/lumberjack
. Lumberjack builds all necessary
dependencies itself, so there should be no run-time dependencies you
need.
Running it
Generally:
$ lumberjack.sh --host somehost --port 12345 /var/log/messages
See lumberjack.sh --help
for all the flags
Key points
- You'll need an SSL CA to verify the server (host) with.
- You can specify custom fields with the
--field foo=bar
. Any number of these
may be specified. I use them to set fields like type
and other custom
attributes relevant to each log.
- Any non-flag argument after is considered a file path. You can watch any
number of files.
Use with logstash
In logstash, you'll want to use the lumberjack input, something like:
input {
lumberjack {
# The port to listen on
port => 12345
# The paths to your ssl cert and key
ssl_certificate => "path/to/ssl.crt"
ssl_key => "path/to/ssl.key"
# Set this to whatever you want.
type => "somelogs"
}
}
Implementation details
Below is valid as of 2012/09/19
Minimize resource usage
- Sets small resource limits (memory, open files) on start up based on the
number of files being watched.
- CPU: sleeps when there is nothing to do.
- Network/CPU: sleeps if there is a network failure.
- Network: uses zlib for compression.
Secure transmission
- Uses OpenSSL to verify the server certificates (so you know who you
are sending to).
- Uses OpenSSL to transport logs.
Configurable event data
- The protocol lumberjack uses supports sending a
string:string
map.
- The lumberjack tool lets you specify arbitrary extra data with
--field name=value
.
Easy deployment
- All dependencies are built at compile-time (OpenSSL, jemalloc, etc) because many os distributions lack these dependencies.
- The
make deb
or make rpm
commands will package everything into a
single DEB or RPM.
- The
bin/lumberjack.sh
script makes sure the dependencies are found
when run in production.
Future functional features
- Re-evaluate globs periodically to look for new log files.
- Track position of in the log.
Future protocol discussion
I would love to not have a custom protocol, but nothing I've found implements
what I need, which is: encrypted, trusted, compressed, latency-resilient, and
reliable transport of events.
- Redis development refuses to accept encryption support, would likely reject
compression as well.
- ZeroMQ lacks authentication, encryption, and compression.
- Thrift also lacks authentication, encryption, and compression, and also is an
RPC framework, not a streaming system.
- Websockets don't do authentication or compression, but support encrypted
channels with SSL. Websockets also require XORing the entire payload of all
messages - wasted energy.
- SPDY is still changing too frequently and is also RPC. Streaming requires
custom framing.
- HTTP is RPC and very high overhead for small events (uncompressable headers,
etc). Streaming requires custom framing.
License
See LICENSE file.