lumberjack

v0.2.0
Published: Aug 16, 2013 License: Apache-2.0 Imports: 21 Imported by: 0

o/~ I'm a lumberjack and I'm ok! I sleep when idle, then I ship logs all day! I parse your logs, I eat the JVM agent for lunch! o/~

Questions and support

If you have questions and cannot find answers, please join the #logstash IRC channel on freenode or ask on the logstash-users@googlegroups.com mailing list.

What is this?

A tool to collect logs locally in preparation for processing elsewhere!

Problem: logstash jar releases are too fat for constrained systems. Until we can comfortably promise logstash executing with less resource usage...

Solution: lumberjack

Configuring

lumberjack is configured with a JSON file you specify with the -config flag:

lumberjack -config yourstuff.json

Here's a sample, with comments in-line to describe the settings. Please please please keep in mind that comments are invalid in JSON, so you can't include them in your config:

{
  # The network section covers network configuration :)
  "network": {
    # A list of downstream servers listening for our messages.
    # lumberjack will pick one at random and only switch if
    # the selected one appears to be dead or unresponsive
    "servers": [ "localhost:5043" ],

    # The path to your client ssl certificate (optional)
    "ssl certificate": "./lumberjack.crt",
    # The path to your client ssl key (optional)
    "ssl key": "./lumberjack.key",

    # The path to your trusted ssl CA file. This is used
    # to authenticate your downstream server.
    "ssl ca": "./lumberjack_ca.crt",

    # Network timeout in seconds. This is most important for lumberjack
    # determining whether to stop waiting for an acknowledgement from the
    # downstream server. If a timeout is reached, lumberjack will assume
    # the connection or server is bad and will connect to a server chosen
    # at random from the servers list.
    "timeout": 15
  },

  # The list of file configurations
  "files": [
    # An array of hashes. Each hash tells what paths to watch and
    # what fields to annotate on events from those paths.
    {
      "paths": [ 
        # single paths are fine
        "/var/log/messages",
        # globs are fine too, they will be periodically evaluated
        # to see if any new files match the wildcard.
        "/var/log/*.log"
      ],

      # A dictionary of fields to annotate on each event.
      "fields": { "type": "syslog" }
    }, {
      # A path of "-" means stdin.
      "paths": [ "-" ],
      "fields": { "type": "stdin" }
    }, {
      "paths": [
        "/var/log/apache/httpd-*.log"
      ],
      "fields": { "type": "apache" }
    }
  ]
}
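
For illustration, here is a minimal Go sketch of how a config with this shape could be decoded. The type names, the parseConfig helper, and the "at least one server" check are assumptions made for this example, not lumberjack's actual source. The sample constant is a comment-free version of the configuration above, since the JSON decoder rejects comments.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config mirrors the sample file. All type and field names here are
// assumptions made for this sketch.
type Config struct {
	Network NetworkConfig `json:"network"`
	Files   []FileConfig  `json:"files"`
}

// NetworkConfig holds the "network" section.
type NetworkConfig struct {
	Servers        []string `json:"servers"`
	SSLCertificate string   `json:"ssl certificate"`
	SSLKey         string   `json:"ssl key"`
	SSLCA          string   `json:"ssl ca"`
	Timeout        int      `json:"timeout"` // seconds
}

// FileConfig is one entry of the "files" array.
type FileConfig struct {
	Paths  []string          `json:"paths"`
	Fields map[string]string `json:"fields"`
}

// parseConfig decodes the JSON and rejects a config without servers.
func parseConfig(data []byte) (*Config, error) {
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("parsing config: %w", err)
	}
	if len(cfg.Network.Servers) == 0 {
		return nil, fmt.Errorf("config: network.servers is empty")
	}
	return &cfg, nil
}

// sample is a comment-free version of the sample configuration.
const sample = `{
  "network": {
    "servers": [ "localhost:5043" ],
    "ssl certificate": "./lumberjack.crt",
    "ssl key": "./lumberjack.key",
    "ssl ca": "./lumberjack_ca.crt",
    "timeout": 15
  },
  "files": [
    { "paths": [ "/var/log/messages", "/var/log/*.log" ], "fields": { "type": "syslog" } },
    { "paths": [ "-" ], "fields": { "type": "stdin" } },
    { "paths": [ "/var/log/apache/httpd-*.log" ], "fields": { "type": "apache" } }
  ]
}`

func main() {
	cfg, err := parseConfig([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, f := range cfg.Files {
		fmt.Printf("paths=%v fields=%v\n", f.Paths, f.Fields)
	}
}
```

Feeding the commented sample to parseConfig returns a decode error, which is a quick way to catch a config that still has comments in it.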

Goals
  • Minimize resource usage where possible (CPU, memory, network).
  • Secure transmission of logs.
  • Configurable event data.
  • Easy to deploy with minimal moving parts.
  • Simple inputs only:
    • Follows files and respects rename/truncation conditions.
    • Accepts STDIN, useful for things like varnishlog | lumberjack....

Building it

  1. Install FPM

     $ sudo gem install fpm
    
  2. Install go

  3. Compile lumberjack

     $ git clone git://github.com/jordansissel/lumberjack.git
     $ cd lumberjack
     $ make
    
  4. Make packages, either:

     $ make rpm
    

    Or:

     $ make deb
    

Installing it

Packages install to /opt/lumberjack. Lumberjack builds all necessary dependencies itself, so there should be no run-time dependencies to install.

Running it

Generally:

$ lumberjack.sh --host somehost --port 12345 /var/log/messages

See lumberjack.sh --help for all the flags.

Key points
  • You'll need an SSL CA to verify the server (host) with.
  • You can specify custom fields with the --field foo=bar flag. Any number of these may be specified. I use them to set fields like type and other custom attributes relevant to each log.
  • Any non-flag argument after the flags is considered a file path. You can watch any number of files.
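
As an illustration of the repeated --field name=value convention, here is a small Go sketch using the standard flag package. It is not how lumberjack.sh parses its arguments; the fieldFlags type and parseArgs helper are hypothetical, and the env=prod field is just made-up sample data.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// fieldFlags collects repeated --field name=value flags into a map.
// It implements flag.Value so the flag may appear any number of times.
type fieldFlags map[string]string

func (f fieldFlags) String() string { return fmt.Sprint(map[string]string(f)) }

func (f fieldFlags) Set(s string) error {
	name, value, ok := strings.Cut(s, "=")
	if !ok || name == "" {
		return fmt.Errorf("expected name=value, got %q", s)
	}
	f[name] = value
	return nil
}

// parseArgs returns the custom fields and the remaining non-flag
// arguments, which are treated as file paths to watch.
func parseArgs(args []string) (fieldFlags, []string, error) {
	fields := fieldFlags{}
	fs := flag.NewFlagSet("lumberjack", flag.ContinueOnError)
	fs.Var(fields, "field", "custom field as name=value (repeatable)")
	if err := fs.Parse(args); err != nil {
		return nil, nil, err
	}
	return fields, fs.Args(), nil
}

func main() {
	args := []string{"--field", "type=syslog", "--field", "env=prod", "/var/log/messages"}
	fields, paths, err := parseArgs(args)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("fields:", fields)
	fmt.Println("paths:", paths)
}
```

The resulting map lines up with the string:string map that the lumberjack protocol can carry alongside each event.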

Use with logstash

In logstash, you'll want to use the lumberjack input, something like:

input {
  lumberjack {
    # The port to listen on
    port => 12345

    # The paths to your ssl cert and key
    ssl_certificate => "path/to/ssl.crt"
    ssl_key => "path/to/ssl.key"

    # Set this to whatever you want.
    type => "somelogs"
  }
}

Implementation details

Below is valid as of 2012/09/19

Minimize resource usage
  • Sets small resource limits (memory, open files) on start up based on the number of files being watched.
  • CPU: sleeps when there is nothing to do.
  • Network/CPU: sleeps if there is a network failure.
  • Network: uses zlib for compression.
Secure transmission
  • Uses OpenSSL to verify the server certificates (so you know who you are sending to).
  • Uses OpenSSL to transport logs.
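
The same idea (trust only your own CA when verifying the server) can be sketched in Go with crypto/tls instead of OpenSSL. This is an illustration under that substitution, not the implementation described above. newTLSConfig and selfSignedCAPEM are hypothetical helpers; the latter only exists so the example runs without certificate files on disk.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// newTLSConfig trusts only the CAs in caPEM when verifying the server.
func newTLSConfig(caPEM []byte, serverName string) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no usable CA certificate in PEM data")
	}
	return &tls.Config{
		RootCAs:    pool,       // used instead of the system trust store
		ServerName: serverName, // name checked against the server certificate
		MinVersion: tls.VersionTLS12,
	}, nil
}

// selfSignedCAPEM creates a throwaway CA certificate for the demo.
// A real deployment would read the CA file from disk instead.
func selfSignedCAPEM() ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "example CA"},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

func main() {
	caPEM, err := selfSignedCAPEM()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	cfg, err := newTLSConfig(caPEM, "localhost")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("will verify the server as:", cfg.ServerName)
	// A client would then connect with tls.Dial("tcp", "localhost:5043", cfg).
}
```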
Configurable event data
  • The protocol lumberjack uses supports sending a string:string map.
  • The lumberjack tool lets you specify arbitrary extra data with --field name=value.
Easy deployment
  • All dependencies are built at compile-time (OpenSSL, jemalloc, etc) because many OS distributions lack these dependencies.
  • The make deb or make rpm commands will package everything into a single DEB or RPM.
  • The bin/lumberjack.sh script makes sure the dependencies are found when run in production.
Future functional features
  • Re-evaluate globs periodically to look for new log files.
  • Track position in the log.
Future protocol discussion

I would love to not have a custom protocol, but nothing I've found implements what I need, which is: encrypted, trusted, compressed, latency-resilient, and reliable transport of events.

  • Redis development refuses to accept encryption support, would likely reject compression as well.
  • ZeroMQ lacks authentication, encryption, and compression.
  • Thrift also lacks authentication, encryption, and compression, and also is an RPC framework, not a streaming system.
  • Websockets don't do authentication or compression, but support encrypted channels with SSL. Websockets also require XORing the entire payload of all messages - wasted energy.
  • SPDY is still changing too frequently and is also RPC. Streaming requires custom framing.
  • HTTP is RPC and very high overhead for small events (uncompressable headers, etc). Streaming requires custom framing.

License

See LICENSE file.
