warcscan

command
v1.0.3
Published: Apr 3, 2023 License: Apache-2.0 Imports: 6 Imported by: 0

Documentation

Overview

Warcscan is a simple script for searching WARC files and retrieving individual WARC records. It was built to generate test files with interesting characteristics, e.g. the presence of continuations or Content-Encoding.

This is a very basic implementation that just scans for "WARC/" to split WARC files into records. Because it uses bufio.Scanner, very large WARC records cannot be retrieved this way (a limit of 60kb is defined to prevent panics).

Once built, use e.g. `warcscan -s termone,termtwo -a -o output.warc input.warc`. The -s flag takes a comma-separated list of search terms. The -a flag requires all terms to match (otherwise it is an OR search). The -o flag names the output file (defaults to out.warc).
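
To make the difference between the default OR search and the -a (all terms) search concrete, here is a minimal sketch of how such matching could work. It is not taken from warcscan's source: the function name matches is hypothetical, and plain case-sensitive substring matching is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// matches (hypothetical helper) reports whether record satisfies the terms.
// With all == true every term must occur (AND); otherwise one hit is enough (OR).
// Assumption: terms are matched as case-sensitive substrings.
func matches(record string, terms []string, all bool) bool {
	for _, t := range terms {
		found := strings.Contains(record, t)
		if all && !found {
			return false // AND search: one missing term rejects the record
		}
		if !all && found {
			return true // OR search: one matching term accepts the record
		}
	}
	// AND: no term was missing. OR: no term matched.
	return all
}

func main() {
	// The value of -s is a comma-separated list, as in the usage line above.
	terms := strings.Split("termone,termtwo", ",")
	record := "WARC/1.0\r\nWARC-Type: response\r\n\r\ncontains termone only"

	fmt.Println("OR search matches:", matches(record, terms, false))
	fmt.Println("AND search matches:", matches(record, terms, true))
}
```

In this sketch an empty term list matches every record under AND and no record under OR. How the real tool treats an empty -s value is not stated here.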
