weblog

package
v0.0.0-...-6ade924 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Oct 8, 2022 License: GPL-3.0 Imports: 13 Imported by: 0

README

Web log (Apache, NGINX) monitoring with Netdata

This module parses Apache and NGINX web servers logs.

Metrics

All metrics have "web_log." prefix.

Metric Scope Dimensions Units
requests global requests requests/s
excluded_requests global unmatched requests/s
type_requests global success, bad, redirect, error requests/s
status_code_class_responses global 1xx, 2xx, 3xx, 4xx, 5xx responses/s
status_code_class_1xx_responses global a dimension per 1xx code responses/s
status_code_class_2xx_responses global a dimension per 2xx code responses/s
status_code_class_3xx_responses global a dimension per 3xx code responses/s
status_code_class_4xx_responses global a dimension per 4xx code responses/s
status_code_class_5xx_responses global a dimension per 5xx code responses/s
bandwidth global received, sent kilobits/s
request_processing_time global min, max, avg milliseconds
requests_processing_time_histogram global a dimension per bucket requests/s
upstream_response_time global min, max, avg milliseconds
upstream_responses_time_histogram global a dimension per bucket requests/s
current_poll_uniq_clients global ipv4, ipv6 clients
vhost_requests global a dimension per vhost requests/s
port_requests global a dimension per port requests/s
scheme_requests global http, https requests/s
http_method_requests global a dimension per HTTP method requests/s
http_version_requests global a dimension per HTTP version requests/s
ip_proto_requests global ipv4, ipv6 requests/s
ssl_proto_requests global a dimension per SSL protocol requests/s
ssl_cipher_suite_requests global a dimension per SSL cipher suite requests/s
url_pattern_requests global a dimension per URL pattern requests/s
custom_field_pattern_requests global a dimension per custom field pattern requests/s
custom_time_field_summary custom time field min, max, avg milliseconds
custom_time_field_histogram custom time field a dimension per bucket observations
url_pattern_status_code_responses URL pattern a dimension per pattern responses/s
url_pattern_http_method_requests URL pattern a dimension per HTTP method requests/s
url_pattern_bandwidth URL pattern received, sent kilobits/s
url_pattern_request_processing_time URL pattern min, max, avg milliseconds

Log Parsers

Weblog supports 4 different log parsers:

Try to avoid using RegExp because it's much slower than the other parsers. Prefer to use LTSV or CSV parser.

There is an example job for every log parser.

jobs:
  - name: csv_parser_example
    path: /path/to/file.log
    log_type: csv
    csv_config:
      format: 'FORMAT'
      fields_per_record: -1
      delimiter: ' '
      trim_leading_space: no

  - name: json_parser_example
    path: /path/to/file.log
    log_type: json
    json_config:
      mapping:
        label1: field1
        label2: field2

  - name: ltsv_parser_example
    path: /path/to/file.log
    log_type: ltsv
    ltsv_config:
      field_delimiter: ' '
      value_delimiter: ':'
      mapping:
        label1: field1
        label2: field2

  - name: regexp_parser_example
    path: /path/to/file.log
    log_type: regexp
    regexp_config:
      pattern: 'PATTERN'

Log Parser Auto-Detection

If log_type parameter set to auto (which is default), weblog will try to auto-detect appropriate log parser and log format using the last line of the log file.

  • checks if format is CSV (using regexp).
  • checks if format is JSON (using regexp).
  • assumes format is CSV and tries to find appropriate CSV log format using predefind list of formats. It tries to parse the line using each of them in the following order:
$host:$server_port $remote_addr - - [$time_local] "$request" $status $body_bytes_sent - - $request_length $request_time $upstream_response_time
$host:$server_port $remote_addr - - [$time_local] "$request" $status $body_bytes_sent - - $request_length $request_time
$host:$server_port $remote_addr - - [$time_local] "$request" $status $body_bytes_sent     $request_length $request_time $upstream_response_time
$host:$server_port $remote_addr - - [$time_local] "$request" $status $body_bytes_sent     $request_length $request_time
$host:$server_port $remote_addr - - [$time_local] "$request" $status $body_bytes_sent
                   $remote_addr - - [$time_local] "$request" $status $body_bytes_sent - - $request_length $request_time $upstream_response_time
                   $remote_addr - - [$time_local] "$request" $status $body_bytes_sent - - $request_length $request_time
                   $remote_addr - - [$time_local] "$request" $status $body_bytes_sent     $request_length $request_time $upstream_response_time
                   $remote_addr - - [$time_local] "$request" $status $body_bytes_sent     $request_length $request_time
                   $remote_addr - - [$time_local] "$request" $status $body_bytes_sent

The first one matches is used later. If you use default Apache/NGINX log format auto-detect will do for you. If it doesn't work you need to set format manually.

Known Fields

These are NGINX and Apache log format variables.

Weblog is aware how to parse and interpret the fields:

nginx apache description
$host ($http_host) %v Name of the server which accepted a request.
$server_port %p Port of the server which accepted a request.
$scheme - Request scheme. "http" or "https".
$remote_addr %a (%h) Client address.
$request %r Full original request line. The line is "$request_method $request_uri $server_protocol".
$request_method %m Request method. Usually "GET" or "POST".
$request_uri %U Full original request URI.
$server_protocol %H Request protocol. Usually "HTTP/1.0", "HTTP/1.1", or "HTTP/2.0".
$status %s (%>s) Response status code.
$request_length %I Bytes received from a client, including request and headers.
$bytes_sent %O Bytes sent to a client, including request and headers.
$body_bytes_sent %B (%b) Bytes sent to a client, not counting the response header.
$request_time %D Request processing time.
$upstream_response_time - Time spent on receiving the response from the upstream server.
$ssl_protocol - Protocol of an established SSL connection.
$ssl_cipher - String of ciphers used for an established SSL connection.

In addition to that weblog understands user defined fields.

Notes:

  • Apache %h logs the IP address if HostnameLookups is Off. The web log collector counts hostnames as IPv4 addresses. We recommend either to disable HostnameLookups or use %a instead of %h.
  • Since httpd 2.0, unlike 1.3, the %b and %B format strings do not represent the number of bytes sent to the client, but simply the size in bytes of the HTTP response. It will differ, for instance, if the connection is aborted, or if SSL is used. The %O format provided by mod_logio will log the actual number of bytes sent over the network.
  • To get %I and %O working you need to enable mod_logio on Apache.
  • NGINX logs URI with query parameters, Apache doesnt.
  • $request is parsed into $request_method, $request_uri and $server_protocol. If you have $request in your log format, there is no sense to have others.
  • Don't use both $bytes_sent and $body_bytes_sent (%O and %B or %b). The module does not distinguish between these parameters.

Custom Log Format

Custom log format is easy. Use known fields to construct your log format.

  • If using CSV parser

Since weblog understands NGINX and Apache variables all you need is to copy your log format and... that is it! If there is a field that is not known by the weblog it's not a problem. It will skip it during parsing. We suggest replace all unknown fields with - for optimization purposes.

Let's take as an example some non default format.

# apache
LogFormat "\"%{Referer}i\" \"%{User-agent}i\" %h %l %u %t \"%r\" %>s %b" custom

# nginx
log_format custom '"$http_referer" "$http_user_agent" '
                  '$remote_addr - $remote_user [$time_local] '
                  '"$request" $status $body_bytes_sent'

To get it working we need to copy the format without any changes (make it a line for nginx). Replacing unknown fields is optional but recommended.

Special case:

Both %t and $time_local fields represent time in Common Log Format. It is a special case because it's in fact 2 fields after csv parse (ex.: [22/Mar/2009:09:30:31 +0100]). Weblog understands it, and you don't need to replace it with - (if we want to do it we need to make it - -).

jobs:
  - name: apache_csv_custom_format_example
    path: /path/to/file.log
    log_type: csv
    csv_config:
      format: '- - %h - - %t \"%r\" %>s %b'

  - name: nginx_csv_custom_format_example
    path: /path/to/file.log
    log_type: csv
    csv_config:
      format: '- - $remote_addr - - [$time_local] "$request" $status $body_bytes_sent'
  • If using JSON parser

Provide fields mapping if needed. Don't use $ and % prefixes for mapped field names. They are only needed in CSV format.

  • If using LTSV parser

Provide fields mapping if needed. Don't use $ and % prefixes for mapped field names. They are only needed in CSV format.

  • If using RegExp parser

Use pattern with subexpressions names. These names should be known by weblog.

Custom Fields Feature

Weblog is able to extract user defined fields and count patterns matches against these fields.

This feature needs:

  • custom log format with user defined fields
  • list of patterns to match against appropriate fields

Pattern syntax: matcher.

There is an example with 2 custom fields - $http_referer and $http_user_agent. Weblog is unaware of these fields, but we still can get some info from them.

  - name: nginx_csv_custom_fields_example
    path: /path/to/file.log
    log_type: csv
    csv_config:
      format: '- - $remote_addr - - [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'
    custom_fields:
      - name: http_referer     # same name as in 'format' without $
        patterns:
          - name: cacti
            match: '~ cacti'
          - name: observium
            match: '~ observium'
      - name: http_user_agent  # same name as in 'format' without $
        patterns:
          - name: android
            match: '~ Android'
          - name: iphone
            match: '~ iPhone'
          - name: other
            match: '* *'

Custom time fields feature

The web log collector is also able to extract user defined time fields and could count min/avg/max + histogram against these fields.

This feature needs:

  • A custom log format with user-defined time fields.
  • A histogram to show response time in seconds, which is optional.

As an example, Apache mod_logio adds a ^FB logging directive. This value shows a delay in microseconds between when the request arrived, and the first byte of the response headers are written.

As with the custom fields feature, Netdata's web log collector is unaware of these fields, but we can still get some info from them.

  - name: apache_csv_custom_fields_example
    path: /path/to/file.log
    log_type: csv
    csv_config:
      format: '%v %a %p %m %H \"%U\" %t %>s %O %I %D %^FB \"%{Referer}i\" \"%{User-Agent}i\" \"%r\"'
    custom_time_fields:
      - name: '^FB'
        histogram: [.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10] # optional field

Configuration

Edit the go.d/web_log.conf configuration file using edit-config from the Netdata config directory, which is typically at /etc/netdata.

cd /etc/netdata # Replace this path with your Netdata config directory
sudo ./edit-config go.d/web_log.conf

This module needs only path to log file. If it fails to auto-detect your log format you need to set it manually.

jobs:
  - name: nginx
    path: /var/log/nginx/access.log

  - name: apache
    path: /var/log/apache2/access.log
    log_type: csv
    csv_config:
      format: '- - %h - - %t \"%r\" %>s %b'

For all available options, please see the module configuration file.

Troubleshooting

To troubleshoot issues with the web_log collector, run the go.d.plugin with the debug option enabled. The output should give you clues as to why the collector isn't working.

  • Navigate to the plugins.d directory, usually at /usr/libexec/netdata/plugins.d/. If that's not the case on your system, open netdata.conf and look for the plugins setting under [directories].

    cd /usr/libexec/netdata/plugins.d/
    
  • Switch to the netdata user.

    sudo -u netdata -s
    
  • Run the go.d.plugin to debug the collector:

    ./go.d.plugin -d -m web_log
    

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Chart

type Chart = module.Chart

type Charts

type Charts = module.Charts

type Config

type Config struct {
	Parser           logs.ParserConfig `yaml:",inline"`
	Path             string            `yaml:"path"`
	ExcludePath      string            `yaml:"exclude_path"`
	URLPatterns      []userPattern     `yaml:"url_patterns"`
	CustomFields     []customField     `yaml:"custom_fields"`
	CustomTimeFields []customTimeField `yaml:"custom_time_fields"`
	Histogram        []float64         `yaml:"histogram"`
	GroupRespCodes   bool              `yaml:"group_response_codes"`
}

type Dim

type Dim = module.Dim

type Dims

type Dims = module.Dims

type WebLog

type WebLog struct {
	module.Base
	Config `yaml:",inline"`
	// contains filtered or unexported fields
}

func New

func New() *WebLog

func (*WebLog) Charts

func (w *WebLog) Charts() *module.Charts

func (*WebLog) Check

func (w *WebLog) Check() bool

func (*WebLog) Cleanup

func (w *WebLog) Cleanup()

func (*WebLog) Collect

func (w *WebLog) Collect() map[string]int64

func (*WebLog) Init

func (w *WebLog) Init() bool

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL