Documentation
Overview
Warcscan is a simple script for searching WARC files and retrieving individual WARC records. It was built to generate test files with interesting characteristics, e.g. the presence of continuations or of a Content-Encoding header.
This is a very basic implementation that simply scans for the string "WARC/" to split a WARC file into records. Because it uses bufio.Scanner, it cannot retrieve very large WARC records (a limit of 60 KB is defined to prevent panics).
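The splitting approach can be sketched roughly as follows. This is a minimal illustration, not warcscan's actual source: the names `splitWARC`, `scanRecords`, `scanString` and `maxRecord` are hypothetical, and the cap is assumed to be 60 * 1024 bytes. It uses a custom `bufio.SplitFunc` that cuts the input just before each "WARC/" marker.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// maxRecord is an assumed cap on the size of a single record (60 KB).
const maxRecord = 60 * 1024

// marker is the byte sequence used to detect the start of a record.
var marker = []byte("WARC/")

// splitWARC is a bufio.SplitFunc that emits one token per chunk of input
// ending just before the next "WARC/" marker. It is naive: a "WARC/" string
// inside a payload also causes a split.
func splitWARC(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if len(data) == 0 {
		return 0, nil, nil
	}
	// Search from the second byte so the marker that opens the current
	// record is not mistaken for the start of the next one.
	if i := bytes.Index(data[1:], marker); i >= 0 {
		return i + 1, data[:i+1], nil
	}
	if atEOF {
		// No further marker: the rest of the input is the last record.
		return len(data), data, nil
	}
	// No marker yet: ask the Scanner to read more data.
	return 0, nil, nil
}

// scanRecords reads r and returns each record as a string. A record larger
// than maxRecord stops the scan and is reported as bufio.ErrTooLong.
func scanRecords(r io.Reader) ([]string, error) {
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 4096), maxRecord)
	sc.Split(splitWARC)
	var records []string
	for sc.Scan() {
		records = append(records, sc.Text())
	}
	return records, sc.Err()
}

// scanString is a convenience wrapper for in-memory input.
func scanString(s string) ([]string, error) {
	return scanRecords(strings.NewReader(s))
}

func main() {
	input := "WARC/1.0\r\nWARC-Type: request\r\n\r\nGET /\r\n\r\n" +
		"WARC/1.0\r\nWARC-Type: response\r\n\r\nbody\r\n\r\n"
	records, err := scanString(input)
	if err != nil {
		fmt.Println("scan error:", err)
		return
	}
	fmt.Println("records found:", len(records))
}
```

In this sketch, any bytes that precede the first marker come out as a separate leading token, and an oversized record ends the scan with an error rather than being truncated.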
Once built, use e.g. `warcscan -s termone,termtwo -a -o output.warc input.warc`. The -s flag takes a comma-separated list of search terms. The -a flag requires all terms to match (without it, the search is an OR search). The -o flag names the output file (defaults to out.warc).
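The AND/OR behaviour of the -s and -a flags can be illustrated with a small sketch. The helper names `parseTerms` and `matches` are hypothetical, and the details (case-sensitive substring matching, trimming whitespace around terms, no match when the term list is empty) are assumptions rather than confirmed behaviour of warcscan.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// parseTerms splits a comma-separated list such as "termone,termtwo"
// into individual terms. Trimming spaces and dropping empty entries
// is an assumption made for this sketch.
func parseTerms(list string) []string {
	var terms []string
	for _, t := range strings.Split(list, ",") {
		if t = strings.TrimSpace(t); t != "" {
			terms = append(terms, t)
		}
	}
	return terms
}

// matches reports whether record satisfies the search terms.
// With all set (the -a case) every term must occur in the record;
// otherwise a single occurring term is enough (OR search).
func matches(record []byte, terms []string, all bool) bool {
	if len(terms) == 0 {
		return false
	}
	for _, t := range terms {
		found := bytes.Contains(record, []byte(t))
		if all && !found {
			return false // AND search: one missing term rules the record out
		}
		if !all && found {
			return true // OR search: one hit is sufficient
		}
	}
	// Reaching this point means every term was found (AND search)
	// or no term was found (OR search).
	return all
}

func main() {
	record := []byte("WARC/1.0\r\nWARC-Type: response\r\n\r\nContent-Encoding: gzip\r\n")
	terms := parseTerms("response,continuation")
	fmt.Println("OR match: ", matches(record, terms, false))
	fmt.Println("AND match:", matches(record, terms, true))
}
```

Here the record contains "response" but not "continuation", so it is selected by the OR search and rejected once every term is required.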