parser

Published: Mar 19, 2024 License: MIT Imports: 17 Imported by: 1

README

access-log-parser

Simple access log parser utilities written in Go

Features

  • Flexible serialization of log lines
  • Streaming processing support
  • Line filtering by filter expressions like size < 100, method == GET, and remote_host =~ ^192.168.
  • Display column selection by field name
  • Line skipping by line number
  • Customization by handler functions
  • Various preset constructors for well-known log formats
  • LTSV format support
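
The filter expressions in the feature list can be read as "field operator value" triples. The following is only a rough sketch of how such an expression might be evaluated against one parsed record. It is not the package's implementation: the function name matches, the map-based record, and the choice to support only the three operators shown above are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// matches evaluates a "field op value" expression against a record.
// Hypothetical helper: only the operators <, == and =~ are handled here.
func matches(expr string, record map[string]string) (bool, error) {
	parts := strings.Fields(expr)
	if len(parts) != 3 {
		return false, fmt.Errorf("invalid expression %q", expr)
	}
	field, op, want := parts[0], parts[1], parts[2]
	got, ok := record[field]
	if !ok {
		return false, fmt.Errorf("unknown field %q", field)
	}
	switch op {
	case "==": // exact string comparison
		return got == want, nil
	case "=~": // regular expression match
		return regexp.MatchString(want, got)
	case "<": // numeric comparison
		g, err := strconv.ParseFloat(got, 64)
		if err != nil {
			return false, err
		}
		w, err := strconv.ParseFloat(want, 64)
		if err != nil {
			return false, err
		}
		return g < w, nil
	}
	return false, fmt.Errorf("unsupported operator %q", op)
}

func main() {
	record := map[string]string{"size": "42", "method": "GET", "remote_host": "192.168.1.10"}
	for _, expr := range []string{"size < 100", "method == GET", "remote_host =~ ^192.168."} {
		ok, err := matches(expr, record)
		fmt.Println(expr, ok, err)
	}
}
```

Note that in a pattern such as ^192.168. the dots are regular expression wildcards unless they are escaped.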

Usage

Example

Output format

Parsed log lines are written sequentially to the Writer. After parsing finishes, the overall result is output.

// Result encapsulates the outcomes of parsing operations, detailing matched, unmatched, excluded,
// and skipped line counts, along with processing time and source information.
type Result struct {
	Total       int           `json:"total"`                // Total number of processed lines.
	Matched     int           `json:"matched"`              // Count of lines that matched the patterns.
	Unmatched   int           `json:"unmatched"`            // Count of lines that did not match any patterns.
	Excluded    int           `json:"excluded"`             // Count of lines excluded based on keyword search.
	Skipped     int           `json:"skipped"`              // Count of lines skipped explicitly.
	ElapsedTime time.Duration `json:"elapsedTime"`          // Processing time for the log data.
	Source      string        `json:"source"`               // Source of the log data.
	ZipEntries  []string      `json:"zipEntries,omitempty"` // List of processed zip entries, if applicable.
	Errors      []Errors      `json:"errors"`               // Collection of errors encountered during parsing.
	inputType   inputType     `json:"-"`                    // Type of input being processed.
}

// Errors stores information about log lines that couldn't be parsed
// according to the provided patterns. This helps in tracking and analyzing
// log lines that do not conform to expected formats.
type Errors struct {
	Entry      string `json:"entry,omitempty"` // Optional entry name if the log came from a zip file.
	LineNumber int    `json:"lineNumber"`      // Line number of the problematic log entry.
	Line       string `json:"line"`            // Content of the problematic log line.
}
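
Because both structs carry JSON tags, a result can also be serialized with encoding/json. The sketch below redeclares the exported fields under the local names summary and lineError (hypothetical names, used only so the example compiles without the package) to show the JSON shape the tags produce. Note that a time.Duration field is encoded as an integer number of nanoseconds.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// lineError mirrors the exported fields of Errors shown above.
type lineError struct {
	Entry      string `json:"entry,omitempty"`
	LineNumber int    `json:"lineNumber"`
	Line       string `json:"line"`
}

// summary mirrors the exported fields of Result shown above.
type summary struct {
	Total       int           `json:"total"`
	Matched     int           `json:"matched"`
	Unmatched   int           `json:"unmatched"`
	Excluded    int           `json:"excluded"`
	Skipped     int           `json:"skipped"`
	ElapsedTime time.Duration `json:"elapsedTime"`
	Source      string        `json:"source"`
	ZipEntries  []string      `json:"zipEntries,omitempty"`
	Errors      []lineError   `json:"errors"`
}

// encode marshals a summary to compact JSON.
func encode(s summary) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	s := summary{
		Total: 5, Matched: 4, Unmatched: 1,
		ElapsedTime: 1163750 * time.Nanosecond, // 1.16375ms
		Source:      "sample_s3_contains_unmatch.log",
		Errors:      []lineError{{LineNumber: 4, Line: "raw line"}},
	}
	out, err := encode(s)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```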

The struct Result implements fmt.Stringer as follows:

/* SUMMARY */

+-------+---------+-----------+----------+---------+-------------+--------------------------------+
| Total | Matched | Unmatched | Excluded | Skipped | ElapsedTime | Source                         |
+-------+---------+-----------+----------+---------+-------------+--------------------------------+
|     5 |       4 |         1 |        0 |       0 | 1.16375ms   | sample_s3_contains_unmatch.log |
+-------+---------+-----------+----------+---------+-------------+--------------------------------+

Total     : Total number of log line processed
Matched   : Number of log line that successfully matched pattern
Unmatched : Number of log line that did not match any pattern
Excluded  : Number of log line that did not extract by filter expressions
Skipped   : Number of log line that skipped by line number

/* UNMATCH LINES */

+------------+------------------------------------------------------------------------------------------------------+
| LineNumber | Line                                                                                                 |
+------------+------------------------------------------------------------------------------------------------------+
|          4 | d45e67fa89b012c3a45678901b234c56d78a90f12b3456789a012345c6789d01 awsrandombucket89 [03/Feb/2019:03:5 |
|            | 4:33 +0000] 192.0.2.76 d45e67fa89b012c3a45678901b234c56d78a90f12b3456789a012345c6789d01 7B4A0FABBEXA |
|            | MPLE REST.GET.VERSIONING - "GET /awsrandombucket89?versioning HTTP/1.1" 200 - 113 - 33 - "-" "S3Cons |
|            | ole/0.4"                                                                                             |
+------------+------------------------------------------------------------------------------------------------------+

LineNumber : Line number of the log that did not match any pattern
Line       : Raw log line that did not match any pattern

Customize

The processing of each matched row can be overridden.

p := parser.NewRegexParser(ctx, os.Stdout, parser.Option{
	LineHandler: yourCustomLineHandler,
})

A custom handler must have the following function type:

// LineHandler is a function type that processes each matched line.
type LineHandler func(labels, values []string, isFirst bool) (string, error)

[!NOTE] Slices are used instead of a map because, once the overhead of preserving field order is taken into account, the measured performance was almost identical. (This was not a rigorous benchmark.)
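
As an illustration of the LineHandler signature, here is a minimal sketch of a custom handler that emits CSV. The name csvLineHandler is hypothetical, and the type is redeclared locally so the example compiles on its own. Two assumptions are baked in: isFirst is true only for the first output line (so it is used to emit a header row), and the parser adds the line terminator itself (so the trailing newline is trimmed).

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"strings"
)

// LineHandler is redeclared here with the documented signature.
type LineHandler func(labels, values []string, isFirst bool) (string, error)

// csvLineHandler is a hypothetical handler that renders one record as CSV,
// preceded by a header row when isFirst is true.
func csvLineHandler(labels, values []string, isFirst bool) (string, error) {
	if len(labels) != len(values) {
		return "", fmt.Errorf("labels/values length mismatch: %d != %d", len(labels), len(values))
	}
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if isFirst {
		if err := w.Write(labels); err != nil {
			return "", err
		}
	}
	if err := w.Write(values); err != nil {
		return "", err
	}
	w.Flush()
	if err := w.Error(); err != nil {
		return "", err
	}
	// Assumption: the caller appends the newline, so drop the final one.
	return strings.TrimSuffix(buf.String(), "\n"), nil
}

func main() {
	var h LineHandler = csvLineHandler
	out, err := h([]string{"method", "status"}, []string{"GET", "200"}, true)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

A function like this would be passed as Option.LineHandler in place of yourCustomLineHandler.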

The following handlers are preset:

  • JSON (default): JSONLineHandler
  • Pretty JSON: PrettyJSONLineHandler
  • key=value pair: KeyValuePairLineHandler
  • LTSV: LTSVLineHandler
  • TSV: TSVLineHandler

Preset Constructors

Functions are provided by default to instantiate the following parsers:

  • Apache common/combined log format: NewApacheCLFRegexParser()
  • Apache common/combined log format with virtual host: NewApacheCLFWithVHostRegexParser()
  • Amazon S3 access log format: NewS3RegexParser()
  • Amazon CloudFront access log format: NewCFRegexParser()
  • AWS Application Load Balancer access log format: NewALBRegexParser()
  • AWS Network Load Balancer access log format: NewNLBRegexParser()
  • AWS Classic Load Balancer access log format: NewCLBRegexParser()

Sample

alpen is an application for parsing and encoding various access logs.

Todo

  • Support for time in filter expressions like: time < 1710141640
  • Refine the specification to allow KeyValuePairLineHandler to be used as logfmt

Author

nekrassov01

License

MIT

Documentation

Overview

Package parser provides utilities to read and parse logs from various inputs (standard input, plain text, gzip, zip) and convert them to structured formats such as JSON and LTSV. It supports selecting fields by label, extracting rows with filter expressions, aggregating results, and applying custom conversion functions.

Index

Constants

const Version = "0.0.21"

Version of access-log-parser.

Functions

func JSONLineHandler added in v0.0.11

func JSONLineHandler(labels, values []string, _ bool) (string, error)

JSONLineHandler serializes log lines into JSON (NDJSON) format. It includes the line number if specified. Labels and values are combined into key-value pairs, and the result is a single JSON object.

func KeyValuePairLineHandler added in v0.0.11

func KeyValuePairLineHandler(labels, values []string, _ bool) (string, error)

KeyValuePairLineHandler converts log lines into a space-separated string of key-value pairs.

func LTSVLineHandler added in v0.0.11

func LTSVLineHandler(labels, values []string, _ bool) (string, error)

LTSVLineHandler formats log lines as LTSV (Labeled Tab-separated Values).
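
For readers unfamiliar with LTSV, each field is written as label:value and fields are separated by tabs. The sketch below shows that layout with a hypothetical toLTSV helper. It is not the package's LTSVLineHandler and ignores details such as line-number prefixes.

```go
package main

import (
	"fmt"
	"strings"
)

// toLTSV joins label:value pairs with tab characters.
func toLTSV(labels, values []string) (string, error) {
	if len(labels) != len(values) {
		return "", fmt.Errorf("labels/values length mismatch: %d != %d", len(labels), len(values))
	}
	fields := make([]string, len(labels))
	for i, label := range labels {
		fields[i] = label + ":" + values[i]
	}
	return strings.Join(fields, "\t"), nil
}

func main() {
	line, err := toLTSV([]string{"method", "status"}, []string{"GET", "200"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", line)
}
```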

func PrettyJSONLineHandler added in v0.0.11

func PrettyJSONLineHandler(labels, values []string, _ bool) (string, error)

PrettyJSONLineHandler enhances JSONLineHandler by formatting the output for readability. It uses indentation and new lines.

func TSVLineHandler added in v0.0.15

func TSVLineHandler(labels, values []string, isFirst bool) (string, error)

TSVLineHandler formats log lines as TSV (Tab-separated Values).

Types

type Errors added in v0.0.18

type Errors struct {
	Entry      string `json:"entry,omitempty"` // Optional entry name if the log came from a zip file.
	LineNumber int    `json:"lineNumber"`      // Line number of the problematic log entry.
	Line       string `json:"line"`            // Content of the problematic log line.
}

Errors stores information about log lines that couldn't be parsed according to the provided patterns. This helps in tracking and analyzing log lines that do not conform to expected formats.

type LTSVParser added in v0.0.11

type LTSVParser struct {
	// contains filtered or unexported fields
}

LTSVParser implements the Parser interface for parsing logs in LTSV (Labeled Tab-separated Values) format. It allows customization of line handling for LTSV formatted data.

func NewLTSVParser added in v0.0.11

func NewLTSVParser(ctx context.Context, w io.Writer, opt Option) *LTSVParser

NewLTSVParser initializes a new LTSVParser with default handlers for line decoding and line handling. This parser is specifically tailored for LTSV formatted log data.

func (*LTSVParser) Parse added in v0.0.11

func (p *LTSVParser) Parse(reader io.Reader) (*Result, error)

Parse processes log data from an io.Reader, applying the configured line handlers. This method supports context cancellation, prefixing of lines, and exclusion of specific lines.

func (*LTSVParser) ParseFile added in v0.0.11

func (p *LTSVParser) ParseFile(filePath string) (*Result, error)

ParseFile reads and parses log data from a file, leveraging the configured patterns and handlers. This method simplifies file-based LTSV log parsing with automatic line processing.

func (*LTSVParser) ParseGzip added in v0.0.11

func (p *LTSVParser) ParseGzip(gzipPath string) (*Result, error)

ParseGzip processes gzip-compressed log data, extending the parser's capabilities to compressed LTSV logs. It applies skip lines and line number handling as configured for gzip-compressed files.
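
Reading gzip-compressed logs line by line needs only the standard library. The following sketch shows the general technique with compress/gzip and bufio.Scanner. The helpers gzipLines and compress are hypothetical, and how the package itself reads gzip files is not documented here.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzipLines decompresses r and returns its content split into lines.
func gzipLines(r io.Reader) ([]string, error) {
	zr, err := gzip.NewReader(r)
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	var lines []string
	sc := bufio.NewScanner(zr)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	return lines, sc.Err()
}

// compress gzips a string in memory; it exists only to feed the demo.
func compress(s string) (*bytes.Buffer, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write([]byte(s)); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return &buf, nil
}

func main() {
	buf, err := compress("line 1\nline 2\n")
	if err != nil {
		panic(err)
	}
	lines, err := gzipLines(buf)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(lines), lines)
}
```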

func (*LTSVParser) ParseString added in v0.0.11

func (p *LTSVParser) ParseString(s string) (*Result, error)

ParseString processes a log string directly, applying configured skip lines and line number handling. It's designed for quick parsing of a single LTSV formatted log string.

func (*LTSVParser) ParseZipEntries added in v0.0.11

func (p *LTSVParser) ParseZipEntries(zipPath, globPattern string) (*Result, error)

ParseZipEntries processes log data within zip archive entries, applying skip lines, line number handling, and optional glob pattern matching. This method is ideal for batch processing of LTSV logs in zip files.

type LineHandler

type LineHandler func(labels, values []string, isFirst bool) (string, error)

LineHandler is a function type that processes each matched line.

type Option added in v0.0.6

type Option struct {
	Labels       []string    // specify fields to output by label name
	Filters      []string    // conditional expression for output log lines
	SkipLines    []int       // line numbers to exclude from output (not index)
	Prefix       bool        // whether to prefix the output lines or not
	UnmatchLines bool        // whether to output unmatched lines as raw logs or not
	LineNumber   bool        // whether to add line numbers or not
	LineHandler  LineHandler // handler function to convert log lines
}

Option defines the parser settings. Each field is used to customize the output.
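
The SkipLines comment stresses that the values are line numbers, not indexes. A minimal sketch of that distinction, using a hypothetical keepLines helper rather than the package's own logic:

```go
package main

import "fmt"

// keepLines returns the lines whose 1-based line number is not listed in skip.
func keepLines(lines []string, skip []int) []string {
	skipSet := make(map[int]struct{}, len(skip))
	for _, n := range skip {
		skipSet[n] = struct{}{}
	}
	var kept []string
	for i, line := range lines {
		// i is a 0-based index; i+1 is the line number a user would specify.
		if _, ok := skipSet[i+1]; ok {
			continue
		}
		kept = append(kept, line)
	}
	return kept
}

func main() {
	lines := []string{"# header", "line A", "line B"}
	fmt.Println(keepLines(lines, []int{1})) // drops "# header"
}
```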

type Parser

type Parser interface {
	Parse(reader io.Reader) (*Result, error)
	ParseString(s string) (*Result, error)
	ParseFile(filePath string) (*Result, error)
	ParseGzip(gzipPath string) (*Result, error)
	ParseZipEntries(zipPath, globPattern string) (*Result, error)
}

Parser interface defines methods for parsing log data from various sources. It is mainly used internally as the contract implemented by RegexParser and LTSVParser.

type RegexParser added in v0.0.11

type RegexParser struct {
	// contains filtered or unexported fields
}

RegexParser implements the Parser interface using regular expressions to parse log data. It allows customization of line handling as well as pattern matching.

func NewALBRegexParser added in v0.0.11

func NewALBRegexParser(ctx context.Context, w io.Writer, opt Option) *RegexParser

NewALBRegexParser initializes a new RegexParser for parsing AWS Application Load Balancer (ALB) access logs. It comes preconfigured with patterns designed to parse ALB logs, making it easier to extract useful data from ALB logs.

func NewApacheCLFRegexParser added in v0.0.11

func NewApacheCLFRegexParser(ctx context.Context, w io.Writer, opt Option) *RegexParser

NewApacheCLFRegexParser initializes a new RegexParser specifically for parsing Apache Common Log Format (CLF) logs. It preconfigures the parser with regular expression patterns that match the Apache CLF log format.

func NewApacheCLFWithVHostRegexParser added in v0.0.11

func NewApacheCLFWithVHostRegexParser(ctx context.Context, w io.Writer, opt Option) *RegexParser

NewApacheCLFWithVHostRegexParser initializes a new RegexParser for parsing Apache logs with Virtual Host information. It extends the Apache CLF parser to include patterns that also capture the virtual host of each log entry.

func NewCFRegexParser added in v0.0.11

func NewCFRegexParser(ctx context.Context, w io.Writer, opt Option) *RegexParser

NewCFRegexParser initializes a new RegexParser for parsing Amazon CloudFront logs. It includes patterns tailored to the CloudFront log format, simplifying the parsing of CloudFront access logs.

func NewCLBRegexParser added in v0.0.11

func NewCLBRegexParser(ctx context.Context, w io.Writer, opt Option) *RegexParser

NewCLBRegexParser initializes a new RegexParser for parsing AWS Classic Load Balancer (CLB) access logs. It provides patterns that are tailored to the CLB log format, enabling efficient parsing of CLB logs.

func NewNLBRegexParser added in v0.0.11

func NewNLBRegexParser(ctx context.Context, w io.Writer, opt Option) *RegexParser

NewNLBRegexParser initializes a new RegexParser for parsing AWS Network Load Balancer (NLB) access logs. This parser is equipped with patterns that are specifically designed for the NLB log format.

func NewRegexParser added in v0.0.11

func NewRegexParser(ctx context.Context, w io.Writer, opt Option) *RegexParser

NewRegexParser initializes a new RegexParser with default handlers for line decoding and line handling. It is ready to use once patterns have been added.

func NewS3RegexParser added in v0.0.11

func NewS3RegexParser(ctx context.Context, w io.Writer, opt Option) *RegexParser

NewS3RegexParser initializes a new RegexParser for parsing Amazon S3 access logs. It is preconfigured with patterns that match the S3 access log format, facilitating easy parsing of S3 logs.

func (*RegexParser) AddPattern added in v0.0.11

func (p *RegexParser) AddPattern(pattern string) error

AddPattern adds a new regular expression pattern to the parser's pattern list. It validates the pattern to ensure it has named capture groups for structured parsing.
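
Named capture groups are what turn a regular expression match into labels and values. The sketch below uses only the standard regexp package. The helpers compileNamed and extract are hypothetical, and the rule "at least one named group" is an assumption about what the validation might check, not a description of AddPattern's actual behavior.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// compileNamed compiles pattern and rejects it unless at least one
// capture group is named (assumed validation rule).
func compileNamed(pattern string) (*regexp.Regexp, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	for _, name := range re.SubexpNames() {
		if name != "" {
			return re, nil
		}
	}
	return nil, errors.New("pattern has no named capture groups")
}

// extract returns the named groups of a match as parallel slices,
// in the order the groups appear in the pattern.
func extract(re *regexp.Regexp, line string) (labels, values []string, ok bool) {
	m := re.FindStringSubmatch(line)
	if m == nil {
		return nil, nil, false
	}
	for i, name := range re.SubexpNames() {
		if name == "" { // index 0 (whole match) and unnamed groups
			continue
		}
		labels = append(labels, name)
		values = append(values, m[i])
	}
	return labels, values, true
}

func main() {
	re, err := compileNamed(`^(?P<method>[A-Z]+) (?P<path>\S+)$`)
	if err != nil {
		panic(err)
	}
	labels, values, ok := extract(re, "GET /index.html")
	fmt.Println(labels, values, ok)
}
```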

func (*RegexParser) AddPatterns added in v0.0.11

func (p *RegexParser) AddPatterns(patterns []string) error

AddPatterns adds multiple regular expression patterns to the parser's list. It leverages AddPattern for individual pattern validation and addition.

func (*RegexParser) Parse added in v0.0.11

func (p *RegexParser) Parse(reader io.Reader) (*Result, error)

Parse processes log data from an io.Reader, applying configured patterns and handlers. It supports context cancellation, prefixing, and exclusion of lines.
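
Context cancellation in a streaming loop is usually implemented by checking ctx.Done() between lines. The following sketch shows that general pattern. The functions countLines and demo are hypothetical, and the package's internal loop may differ.

```go
package main

import (
	"bufio"
	"context"
	"fmt"
	"io"
	"strings"
)

// countLines scans r line by line and stops early once ctx is cancelled.
func countLines(ctx context.Context, r io.Reader) (int, error) {
	n := 0
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		select {
		case <-ctx.Done():
			return n, ctx.Err() // report how far we got and why we stopped
		default:
		}
		n++
	}
	return n, sc.Err()
}

// demo runs countLines on input, optionally cancelling the context first.
func demo(input string, cancelFirst bool) (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelFirst {
		cancel()
	}
	return countLines(ctx, strings.NewReader(input))
}

func main() {
	fmt.Println(demo("a\nb\nc\n", false))
	fmt.Println(demo("a\nb\nc\n", true))
}
```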

func (*RegexParser) ParseFile added in v0.0.11

func (p *RegexParser) ParseFile(filePath string) (*Result, error)

ParseFile processes log data from a file, applying skip lines and line number handling. It leverages the parser's configured patterns and handlers for file-based log parsing.

func (*RegexParser) ParseGzip added in v0.0.11

func (p *RegexParser) ParseGzip(gzipPath string) (*Result, error)

ParseGzip processes gzip-compressed log data, applying skip lines and line number handling. It utilizes the parser's configurations for compressed log parsing.

func (*RegexParser) ParseString added in v0.0.11

func (p *RegexParser) ParseString(s string) (*Result, error)

ParseString processes a single log string, applying skip lines and line number handling. It's a convenience method for quick string parsing with the configured parser instance.

func (*RegexParser) ParseZipEntries added in v0.0.11

func (p *RegexParser) ParseZipEntries(zipPath, globPattern string) (*Result, error)

ParseZipEntries processes log data within zip archive entries, applying skip lines, line number handling, and glob pattern matching. It extends the parser's capabilities to zip-compressed logs.
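
Selecting zip entries by glob pattern can be sketched with archive/zip and path.Match. This is an illustration only: matchEntries and buildZip are hypothetical helpers, and whether the package uses path.Match or a different matcher for globPattern is an assumption.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"path"
)

// matchEntries returns the names of archive entries that match globPattern,
// in archive order.
func matchEntries(data []byte, globPattern string) ([]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	var names []string
	for _, f := range zr.File {
		ok, err := path.Match(globPattern, f.Name)
		if err != nil {
			return nil, err
		}
		if ok {
			names = append(names, f.Name)
		}
	}
	return names, nil
}

// buildZip creates an in-memory archive with one small entry per name;
// it exists only to feed the demo.
func buildZip(names ...string) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, name := range names {
		w, err := zw.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := w.Write([]byte("sample line\n")); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := buildZip("a.log", "b.txt", "c.log")
	if err != nil {
		panic(err)
	}
	names, err := matchEntries(data, "*.log")
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```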

func (*RegexParser) Patterns added in v0.0.18

func (p *RegexParser) Patterns() []*regexp.Regexp

Patterns returns the list of regular expression patterns currently configured in the parser.

type Result

type Result struct {
	Total       int           `json:"total"`                // Total number of processed lines.
	Matched     int           `json:"matched"`              // Count of lines that matched the patterns.
	Unmatched   int           `json:"unmatched"`            // Count of lines that did not match any patterns.
	Excluded    int           `json:"excluded"`             // Count of lines excluded based on keyword search.
	Skipped     int           `json:"skipped"`              // Count of lines skipped explicitly.
	ElapsedTime time.Duration `json:"elapsedTime"`          // Processing time for the log data.
	Source      string        `json:"source"`               // Source of the log data.
	ZipEntries  []string      `json:"zipEntries,omitempty"` // List of processed zip entries, if applicable.
	Errors      []Errors      `json:"errors"`               // Collection of errors encountered during parsing.
	// contains filtered or unexported fields
}

Result encapsulates the outcomes of parsing operations, detailing matched, unmatched, excluded, and skipped line counts, along with processing time and source information.

func (*Result) String added in v0.0.18

func (r *Result) String() string

String generates a summary report of the parsing process, including a table of unmatched lines and a summary of counts.
