harvest

v0.16.0 | Published: Oct 18, 2019 | License: MIT

Repository

github.com/k1LoW/harvest

Harvest

Portable log aggregation tool for middle-scale system operation/troubleshooting.

Harvest provides the `hrv` command with the following features:

- Agentless.
- Portable.
- Only 1 config file.
- Fetch various remote/local log data via SSH/exec/Kubernetes API (`hrv fetch`).
- Output all fetched logs in timestamp order (`hrv cat`).
- Stream various remote/local logs via SSH/exec/Kubernetes API (`hrv stream`).
- Copy remote/local raw logs via SSH/exec (`hrv cp`).

Quick Start ( for Kubernetes )

$ hrv generate-k8s-config > cluster.yml
$ hrv stream -c cluster.yml --tag='kube_apiserver or coredns' --with-path --with-timestamp

Usage

🪲 Fetch and output remote/local log data

1. Set log sources (and log type) in config.yml

---
targetSets:
  -
    description: webproxy syslog
    type: syslog
    sources:
      - 'ssh://webproxy.example.com/var/log/syslog*'
    tags:
      - webproxy
      - syslog
  -
    description: webproxy NGINX access log
    type: combinedLog
    sources:
      - 'ssh://webproxy.example.com/var/log/nginx/access_log*'
    tags:
      - webproxy
      - nginx
  -
    description: app log
    type: regexp
    regexp: 'time:([^\t]+)'
    timeFormat: 'Jan 02 15:04:05' # Go time layout, or 'unixtime'
    timeZone: '+0900'
    sources:
      - 'ssh://app-1.example.com/var/log/ltsv.log*'
      - 'ssh://app-2.example.com/var/log/ltsv.log*'
      - 'ssh://app-3.example.com/var/log/ltsv.log*'
    tags:
      - app
  -
    description: db dump log
    type: regexp
    regexp: '"ts":"([^"]+)"'
    timeFormat: '2006-01-02T15:04:05.999-0700'
    sources:
      - 'ssh://db.example.com/var/log/tcpdp/eth0/dump*'
    tags:
      - db
      - query
  -
    description: PostgreSQL log
    type: regexp
    regexp: '^\[?(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \w{3})'
    timeFormat: '2006-01-02 15:04:05 MST'
    multiLine: true
    sources:
      - 'ssh://db.example.com/var/log/postgresql/postgresql*'
    tags:
      - db
      - postgresql
  -
    description: local Apache access log
    type: combinedLog
    sources:
      - 'file:///path/to/httpd/access.log'
    tags:
      - httpd
  -
    description: api on Kubernetes
    type: k8s
    sources:
      - 'k8s://context-name/namespace/pod-name*'
    tags:
      - api
      - k8s
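
Each entry under `sources` in the config above is written as a URL whose scheme (`ssh`, `file`, `k8s`) selects how the log is reached. As a rough illustration only, the sketch below splits such strings with Go's standard `net/url` package. The helper name `describeSource` is hypothetical, and nothing here is taken from harvest's own code, which may parse sources differently.

```go
package main

import (
	"fmt"
	"net/url"
)

// describeSource is a hypothetical helper (not part of harvest). It splits a
// source string from config.yml into scheme, host (or Kubernetes context),
// and path pattern using the standard library URL parser.
func describeSource(src string) (scheme, host, path string, err error) {
	u, err := url.Parse(src)
	if err != nil {
		return "", "", "", err
	}
	return u.Scheme, u.Host, u.Path, nil
}

func main() {
	// Source strings copied from the example config.
	for _, src := range []string{
		"ssh://webproxy.example.com/var/log/syslog*",
		"file:///path/to/httpd/access.log",
		"k8s://context-name/namespace/pod-name*",
	} {
		scheme, host, path, err := describeSource(src)
		if err != nil {
			fmt.Println("invalid source:", err)
			continue
		}
		fmt.Printf("scheme=%s host=%q path=%s\n", scheme, host, path)
	}
}
```

For a `file://` source the host part is empty, and the trailing `*` stays in the path as a literal character for whatever later expands it.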

Run `hrv configtest` to validate the config file.

$ hrv configtest -c config.yml
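
For `type: regexp` target sets, `regexp` captures the timestamp text and `timeFormat` is a Go time layout used to parse it. The sketch below shows how that pair could work together, using the `regexp` and `timeFormat` values from the "db dump log" target set. The function `parseTimestamp` and the sample log line are assumptions made for illustration, not harvest's implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// parseTimestamp is a hypothetical helper (not part of harvest). It takes the
// first capture group of pattern as the timestamp text, parses it with a Go
// time layout, and returns the result as RFC 3339 in UTC.
func parseTimestamp(line, pattern, layout string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", err
	}
	m := re.FindStringSubmatch(line)
	if len(m) < 2 {
		return "", fmt.Errorf("no timestamp in line %q", line)
	}
	ts, err := time.Parse(layout, m[1])
	if err != nil {
		return "", err
	}
	return ts.UTC().Format(time.RFC3339Nano), nil
}

func main() {
	// regexp and timeFormat come from the "db dump log" target set;
	// the log line itself is made up.
	line := `{"ts":"2019-09-24T08:01:02.345+0900","query":"SELECT 1"}`
	out, err := parseTimestamp(line, `"ts":"([^"]+)"`, "2006-01-02T15:04:05.999-0700")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(out) // 2019-09-23T23:01:02.345Z
}
```

Layouts without a zone offset, such as `'Jan 02 15:04:05'`, carry no zone information of their own, which is presumably why the config also offers `timeZone`.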

2. Fetch target log data via SSH/exec/Kubernetes API ( `hrv fetch` )

$ hrv fetch -c config.yml --tag=webproxy,db

3. Output log data ( `hrv cat` )

$ hrv cat harvest-20181215T2338+900.db --with-timestamp --with-host --with-path | less -R
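
`hrv cat` prints the fetched logs from all hosts in timestamp order. The idea can be sketched in memory as below. The `entry` type and `mergeByTimestamp` function are hypothetical, and harvest itself reads from the `harvest-*.db` file rather than from Go slices, so treat this only as an illustration of the ordering.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical in-memory form of one fetched log line.
type entry struct {
	UnixNano int64  // parsed timestamp, nanoseconds since the epoch
	Host     string // host the line came from
	Line     string // raw log content
}

// mergeByTimestamp concatenates per-host slices and orders them by timestamp.
// The stable sort keeps lines with equal timestamps in their input order.
func mergeByTimestamp(perHost ...[]entry) []entry {
	var all []entry
	for _, es := range perHost {
		all = append(all, es...)
	}
	sort.SliceStable(all, func(i, j int) bool {
		return all[i].UnixNano < all[j].UnixNano
	})
	return all
}

func main() {
	app1 := []entry{{UnixNano: 10, Host: "app-1", Line: "start"}, {UnixNano: 30, Host: "app-1", Line: "done"}}
	app2 := []entry{{UnixNano: 20, Host: "app-2", Line: "request"}}
	for _, e := range mergeByTimestamp(app1, app2) {
		fmt.Printf("%d %s %s\n", e.UnixNano, e.Host, e.Line)
	}
}
```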

4. Count log data ( `hrv count` )

$ hrv count harvest-20191015T2338+900.db -g minute -g webproxy -b db
ts      webproxy db
2019-09-24 08:01:00     9618    5910
2019-09-24 08:02:00     9767    5672
2019-09-24 08:03:00     10815   7394
2019-09-24 08:04:00     11782   7109
2019-09-24 08:05:00     9896    6346
[...]
2019-09-24 08:24:00     11619   5646
2019-09-24 08:25:00     10541   6097
2019-09-24 08:26:00     11336   5264
2019-09-24 08:27:00     1102    5261
2019-09-24 08:28:00     1318    6660
2019-09-24 08:29:00     10362   5663
2019-09-24 08:30:00     11136   5373
2019-09-24 08:31:00     1748    1340
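
The table above has one row per minute and one column per tag. A minimal sketch of that kind of bucketing follows, assuming each log line has already been reduced to a Unix timestamp and a tag. The `record` type and `countPerMinute` function are hypothetical, and harvest's own counting may be done quite differently.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// record is a hypothetical reduced form of one log line.
type record struct {
	Unix int64  // timestamp in seconds since the epoch (assumed non-negative)
	Tag  string // tag of the target set the line came from
}

// countPerMinute groups records into one-minute buckets keyed by the Unix
// time of the start of the minute, then counts lines per tag in each bucket.
func countPerMinute(records []record) map[int64]map[string]int {
	counts := make(map[int64]map[string]int)
	for _, r := range records {
		minute := r.Unix - r.Unix%60 // truncate to the start of the minute
		if counts[minute] == nil {
			counts[minute] = make(map[string]int)
		}
		counts[minute][r.Tag]++
	}
	return counts
}

func main() {
	base := time.Date(2019, 9, 24, 8, 1, 0, 0, time.UTC).Unix()
	records := []record{
		{Unix: base + 5, Tag: "webproxy"}, {Unix: base + 30, Tag: "db"},
		{Unix: base + 59, Tag: "webproxy"}, {Unix: base + 60, Tag: "webproxy"},
		{Unix: base + 61, Tag: "db"},
	}
	counts := countPerMinute(records)

	// Map iteration order is random, so sort the bucket keys before printing.
	minutes := make([]int64, 0, len(counts))
	for m := range counts {
		minutes = append(minutes, m)
	}
	sort.Slice(minutes, func(i, j int) bool { return minutes[i] < minutes[j] })

	fmt.Println("ts\twebproxy\tdb")
	for _, m := range minutes {
		ts := time.Unix(m, 0).UTC().Format("2006-01-02 15:04:05")
		fmt.Printf("%s\t%d\t%d\n", ts, counts[m]["webproxy"], counts[m]["db"])
	}
}
```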

🪲 Stream remote/local logs

1. Set config.yml

2. Stream target logs via SSH/exec/Kubernetes API ( `hrv stream` )

$ hrv stream -c config.yml --with-timestamp --with-host --with-path --with-tag

🪲 Copy remote/local raw logs

1. Set config.yml

2. Copy remote/local raw logs to local directory via SSH/exec ( `hrv cp` )

$ hrv cp -c config.yml

--tag filter operators

The following operators can be used to filter targets:

not, and, or, !, &&, ||

$ hrv stream -c config.yml --tag='webproxy or db' --with-timestamp --with-host --with-path

`,` is converted to `or`

$ hrv stream -c config.yml --tag='webproxy,db'

is converted to

$ hrv stream -c config.yml --tag='webproxy or db'
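
To make the operator rules concrete, here is a small sketch that rewrites `,`, `&&`, `||` and `!` into the word operators and then checks an expression against a target's tags. The functions `normalize` and `matches` are hypothetical. The precedence used here (`not` binds tighter than `and`, which binds tighter than `or`) and the lack of parentheses are assumptions, not something this README states, so harvest's real evaluation may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize rewrites symbolic operators and commas into word operators and
// collapses runs of whitespace. Hypothetical helper, not harvest's code.
func normalize(expr string) string {
	r := strings.NewReplacer(",", " or ", "&&", " and ", "||", " or ", "!", " not ")
	return strings.Join(strings.Fields(r.Replace(expr)), " ")
}

// matches reports whether a target carrying tags satisfies expr.
// Assumed precedence: not > and > or. Parentheses are not handled.
func matches(expr string, tags []string) bool {
	has := make(map[string]bool, len(tags))
	for _, t := range tags {
		has[t] = true
	}
	for _, orTerm := range strings.Split(normalize(expr), " or ") {
		ok := true
		for _, andTerm := range strings.Split(orTerm, " and ") {
			f := strings.Fields(andTerm)
			negate := false
			for len(f) > 1 && f[0] == "not" {
				negate = !negate
				f = f[1:]
			}
			// A malformed term, or a tag whose presence contradicts the
			// (possibly negated) requirement, fails this "and" group.
			if len(f) != 1 || has[f[0]] == negate {
				ok = false
				break
			}
		}
		if ok {
			return true
		}
	}
	return false
}

func main() {
	tags := []string{"webproxy", "nginx"}
	for _, expr := range []string{"webproxy,db", "webproxy && !nginx", "not db and nginx"} {
		fmt.Printf("%q -> %q: %v\n", expr, normalize(expr), matches(expr, tags))
	}
}
```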

--source filter

Filter targets by matching their source against a regular expression.

$ hrv fetch -c config.yml --source='app-[0-9].example'
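
The sketch below shows which sources from the example config the pattern `app-[0-9].example` would select if it is applied as an unanchored Go regular expression to each source string. That matching behavior is an assumption, and `filterSources` is a hypothetical helper, not harvest's code.

```go
package main

import (
	"fmt"
	"regexp"
)

// filterSources returns the sources that match pattern anywhere in the
// string. Hypothetical helper illustrating an unanchored regexp filter.
func filterSources(pattern string, sources []string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var matched []string
	for _, s := range sources {
		if re.MatchString(s) {
			matched = append(matched, s)
		}
	}
	return matched, nil
}

func main() {
	sources := []string{
		"ssh://webproxy.example.com/var/log/syslog*",
		"ssh://app-1.example.com/var/log/ltsv.log*",
		"ssh://app-2.example.com/var/log/ltsv.log*",
		"ssh://app-3.example.com/var/log/ltsv.log*",
		"ssh://db.example.com/var/log/tcpdp/eth0/dump*",
	}
	matched, err := filterSources("app-[0-9].example", sources)
	if err != nil {
		fmt.Println("invalid pattern:", err)
		return
	}
	for _, s := range matched {
		fmt.Println(s)
	}
}
```

Note that the unescaped `.` in the pattern matches any single character, not only a literal dot. Write `\.` if a stricter match is needed.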

Architecture

`hrv fetch` and `hrv cat`

`hrv stream`

Installation

$ brew install k1LoW/tap/harvest

or

$ go get github.com/k1LoW/harvest/cmd/hrv

What is "middle-scale system"?

< 50 instances
< 1 million logs per hrv fetch

What if you are operating a large-scale/super-large-scale/hyper-large-scale system?

Let's consider an agent-based log collector/platform, a service mesh, or a distributed tracing platform!

Internal

harvest-*.db database schema

Requirements

- UNIX commands
  - date
  - find
  - grep
  - head
  - ls
  - tail
  - xargs
  - zcat
- sudo
- SQLite

WANT

tag DAG
Viewer / Visualizer

References

- Hayabusa: A Simple and Fast Full-Text Search Engine for Massive System Log Data
  - Make simple with a combination of commands.
  - Full-Text Search Engine using SQLite FTS.
- stern: ⎈ Multi pod and container log tailing for Kubernetes
  - Multiple Kubernetes log streaming architecture.

Directories

client
k8s
cmd
hrv
hrv/cmd
collector
config
db
logger
parser
stdout
version
