warc

Version: v0.0.0-...-a50edd1
Published: Aug 6, 2015 License: CC0-1.0 Imports: 10 Imported by: 5

README

warc

warc provides primitives for reading and writing WARC files in Go. This version is based on edsu's warc library, but many changes were made:

- This package works with WARC files in plain text, GZip compression and BZip2 compression out of the box.
- The record content is exposed via io.Reader interfaces.
- Types and functions were renamed to follow Go's naming conventions.
- All external dependencies were removed.
- A Writer was added.
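Support for several input formats is commonly built by peeking at the first bytes of the stream and picking a decompressor from them. The sketch below shows that general technique using only the standard library. The helpers detectCompression and openAny are hypothetical names, and nothing here is confirmed to match how warc selects a decompressor internally.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/bzip2"
	"compress/gzip"
	"fmt"
	"io"
)

// detectCompression is a hypothetical helper that classifies a stream by its
// leading bytes: gzip starts with 0x1f 0x8b, bzip2 starts with "BZh".
func detectCompression(magic []byte) string {
	switch {
	case bytes.HasPrefix(magic, []byte{0x1f, 0x8b}):
		return "gzip"
	case bytes.HasPrefix(magic, []byte("BZh")):
		return "bzip2"
	default:
		return "plain"
	}
}

// openAny is a hypothetical helper that wraps r in the matching decompressor.
// Peek does not consume input, so the decompressor still sees the magic bytes.
func openAny(r io.Reader) (io.Reader, error) {
	br := bufio.NewReader(r)
	magic, err := br.Peek(3)
	if err != nil && err != io.EOF {
		return nil, err
	}
	switch detectCompression(magic) {
	case "gzip":
		zr, err := gzip.NewReader(br)
		if err != nil {
			return nil, err
		}
		return zr, nil
	case "bzip2":
		return bzip2.NewReader(br), nil
	}
	return br, nil
}

func main() {
	// Build a small gzip stream in memory to feed into openAny.
	var compressed bytes.Buffer
	zw := gzip.NewWriter(&compressed)
	if _, err := zw.Write([]byte("WARC/1.0\r\n")); err != nil {
		panic(err)
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}

	r, err := openAny(&compressed)
	if err != nil {
		panic(err)
	}
	data, err := io.ReadAll(r)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", data)
}
```

With warc itself none of this is needed, since the reader is documented to handle all three formats out of the box.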

Example

The following example reads a WARC file from stdin and prints the header values of each record to stdout.

reader, err := warc.NewReader(os.Stdin)
if err != nil {
	panic(err)
}
defer reader.Close()

for {
	// Stop at the first error, which includes reaching the end of the input.
	record, err := reader.ReadRecord()
	if err != nil {
		break
	}
	fmt.Println("Record:")
	// Print every header field of the record.
	for key, value := range record.Header {
		fmt.Printf("%v = %v\n", key, value)
	}
}

The next example writes a WARC record to stdout.

writer := warc.NewWriter(os.Stdout)
record := warc.NewRecord()
record.Header.Set("warc-type", "resource")
// The MIME type for plain text is text/plain.
record.Header.Set("content-type", "text/plain")
record.Content = strings.NewReader("Hello, World!")
if _, err := writer.WriteRecord(record); err != nil {
	panic(err)
}

Performance

Parsing itself adds little overhead. Most of the time is spent in the underlying decompression algorithms, so if you need to parse the same file several times, consider decompressing it first.
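Decompressing a file once up front can be done with the standard library alone. The following is a minimal sketch for the gzip case. The helpers decompressFile, writeSample and demo, as well as the sample file names, are made up for illustration and are not part of warc.

```go
package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// decompressFile writes the gunzipped contents of srcPath to dstPath.
// gzip.Reader reads concatenated gzip members as one stream by default.
func decompressFile(srcPath, dstPath string) error {
	src, err := os.Open(srcPath)
	if err != nil {
		return err
	}
	defer src.Close()

	zr, err := gzip.NewReader(src)
	if err != nil {
		return err
	}
	defer zr.Close()

	dst, err := os.Create(dstPath)
	if err != nil {
		return err
	}
	if _, err := io.Copy(dst, zr); err != nil {
		dst.Close()
		return err
	}
	return dst.Close()
}

// writeSample creates a small gzip file so the sketch has something to read.
func writeSample(path, content string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	zw := gzip.NewWriter(f)
	if _, err := zw.Write([]byte(content)); err != nil {
		f.Close()
		return err
	}
	if err := zw.Close(); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// demo compresses content into a temporary file, decompresses it again and
// returns what ended up in the plain file.
func demo(content string) (string, error) {
	dir, err := os.MkdirTemp("", "warc-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	gz := filepath.Join(dir, "sample.warc.gz")
	plain := filepath.Join(dir, "sample.warc")
	if err := writeSample(gz, content); err != nil {
		return "", err
	}
	if err := decompressFile(gz, plain); err != nil {
		return "", err
	}
	data, err := os.ReadFile(plain)
	return string(data), err
}

func main() {
	out, err := demo("WARC/1.0\r\n")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", out)
}
```

The plain file can then be opened and passed to warc.NewReader on every subsequent run, so the decompression cost is paid only once.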

License

warc is released under the CC0 license. You can find a copy of the CC0 license in the LICENSE file.

Documentation

Overview

Package warc provides primitives for reading and writing WARC files.

Types

type Header

type Header map[string]string

Header provides information about the WARC record. It stores WARC record field names and their values. Since WARC field names are case-insensitive, the Header methods are case-insensitive as well.
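To illustrate what case-insensitive methods on a map type look like, here is a minimal sketch that normalizes keys to lower case. The type header is a hypothetical stand-in with the same method set as Header. Whether warc normalizes keys in exactly this way is an assumption, not something this page states.

```go
package main

import (
	"fmt"
	"strings"
)

// header is a hypothetical stand-in for warc.Header. It stores every key in
// lower case so that lookups ignore the case used by the caller.
type header map[string]string

// Set stores value under the lower-cased key.
func (h header) Set(key, value string) { h[strings.ToLower(key)] = value }

// Get returns the value for key, or "" if the key is absent.
func (h header) Get(key string) string { return h[strings.ToLower(key)] }

// Del removes the entry for key, if any.
func (h header) Del(key string) { delete(h, strings.ToLower(key)) }

func main() {
	h := header{}
	h.Set("WARC-Type", "resource")

	// The lookup succeeds although the case differs from the one used in Set.
	fmt.Println(h.Get("warc-type"))

	h.Del("WARC-TYPE")
	fmt.Printf("%q\n", h.Get("WARC-Type"))
}
```

Accessing the map directly with an index expression bypasses any such normalization, which is a reason to prefer the Get, Set and Del methods.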

func NewHeader

func NewHeader() Header

NewHeader creates a new WARC header.

func (Header) Del

func (h Header) Del(key string)

Del deletes the value associated with key.

func (Header) Get

func (h Header) Get(key string) string

Get returns the value associated with the given key. If there is no value associated with the key, Get returns "".

func (Header) Set

func (h Header) Set(key, value string)

Set sets the header field associated with key to value.

type Mode

type Mode int

Mode defines the way Reader will generate Records.

const (
	// SequentialMode means Records have to be consumed one by one and a call to
	// ReadRecord() invalidates the previous record. The benefit is that
	// Records have almost no overhead since they wrap around
	// the underlying Reader.
	SequentialMode Mode = iota
	// AsynchronousMode means calls to ReadRecord don't affect previously
	// returned Records. This mode copies the Record's content into
	// separate memory, thus bears memory overhead.
	AsynchronousMode
	// DefaultMode defines the reading mode used in NewReader().
	DefaultMode = AsynchronousMode
)
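The trade-off between the two modes can be illustrated without warc at all. In the sketch below, recordSource is a hypothetical stand-in for a reader that hands out fixed-size "records": in one mode each record is a thin view on the shared stream, in the other each record gets its own copy. It models the documented behavior only and is not the package's implementation.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// recordSource is a hypothetical stand-in for a WARC reader.
type recordSource struct {
	src      io.Reader
	copyMode bool              // true: copy content, false: wrap the stream
	prev     *io.LimitedReader // last record handed out in wrapping mode
}

// next returns the content of the next n-byte record.
func (s *recordSource) next(n int64) (io.Reader, error) {
	if s.copyMode {
		// Copy the content into separate memory. Costs memory, but the
		// record stays valid no matter what is read afterwards.
		buf := new(bytes.Buffer)
		if _, err := io.CopyN(buf, s.src, n); err != nil {
			return nil, err
		}
		return buf, nil
	}
	// Skip whatever the caller left unread in the previous record. This is
	// what invalidates it: afterwards it has nothing left to return.
	if s.prev != nil {
		if _, err := io.Copy(io.Discard, s.prev); err != nil {
			return nil, err
		}
	}
	s.prev = &io.LimitedReader{R: s.src, N: n}
	return s.prev, nil
}

// readTwo fetches two records first and only then reads their contents.
func readTwo(copyMode bool) (string, string, error) {
	s := &recordSource{src: strings.NewReader("AAAABBBB"), copyMode: copyMode}
	a, err := s.next(4)
	if err != nil {
		return "", "", err
	}
	b, err := s.next(4)
	if err != nil {
		return "", "", err
	}
	second, err := io.ReadAll(b)
	if err != nil {
		return "", "", err
	}
	first, err := io.ReadAll(a)
	if err != nil {
		return "", "", err
	}
	return string(first), string(second), nil
}

func main() {
	for _, copyMode := range []bool{false, true} {
		first, second, err := readTwo(copyMode)
		if err != nil {
			panic(err)
		}
		fmt.Printf("copyMode=%v first=%q second=%q\n", copyMode, first, second)
	}
}
```

In wrapping mode the first record comes back empty because fetching the second one consumed it, while in copy mode both records remain readable in any order.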

func (Mode) String

func (m Mode) String() string

type Reader

type Reader struct {
	// contains filtered or unexported fields
}

Reader reads WARC records from WARC files.

Example
package main

import (
	"fmt"
	"os"

	"github.com/slyrz/warc"
)

func main() {
	reader, err := warc.NewReader(os.Stdin)
	if err != nil {
		panic(err)
	}
	defer reader.Close()

	for {
		// Stop at the first error, which includes reaching the end of the input.
		record, err := reader.ReadRecord()
		if err != nil {
			break
		}
		fmt.Println("Record:")
		for key, value := range record.Header {
			fmt.Printf("%v = %v\n", key, value)
		}
	}
}

func NewReader

func NewReader(reader io.Reader) (*Reader, error)

NewReader creates a new WARC reader.

func NewReaderMode

func NewReaderMode(reader io.Reader, mode Mode) (*Reader, error)

NewReaderMode is like NewReader, but specifies the mode instead of assuming DefaultMode.

func (*Reader) Close

func (r *Reader) Close()

Close closes the reader.

func (*Reader) Mode

func (r *Reader) Mode() Mode

Mode returns the reader mode.

func (*Reader) ReadRecord

func (r *Reader) ReadRecord() (*Record, error)

ReadRecord reads the next record from the opened WARC file.

type Record

type Record struct {
	Header  Header
	Content io.Reader
}

Record represents a WARC record.

func NewRecord

func NewRecord() *Record

NewRecord creates a new WARC record.

type Writer

type Writer struct {
	// contains filtered or unexported fields
}

Writer writes WARC records to WARC files.

Example
package main

import (
	"os"
	"strings"

	"github.com/slyrz/warc"
)

func main() {
	writer := warc.NewWriter(os.Stdout)
	record := warc.NewRecord()
	record.Header.Set("warc-type", "resource")
	record.Header.Set("content-type", "text/plain")
	record.Content = strings.NewReader("Hello, World!")
	if _, err := writer.WriteRecord(record); err != nil {
		panic(err)
	}
}

func NewWriter

func NewWriter(writer io.Writer) *Writer

NewWriter creates a new WARC writer.

func (*Writer) WriteRecord

func (w *Writer) WriteRecord(r *Record) (int, error)

WriteRecord writes a record to the underlying WARC file.
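For orientation, a WARC record on disk consists of a version line, named header fields, an empty line, the content block, and two trailing CRLF pairs. The sketch below serializes that general layout by hand. The function formatRecord is hypothetical, it omits fields a complete record would carry (such as WARC-Record-ID and WARC-Date), and the exact bytes WriteRecord emits may differ.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// formatRecord is a hypothetical helper that lays out one record in the
// general WARC format. Field names are sorted only to make the output
// deterministic; Content-Length is derived from the content.
func formatRecord(fields map[string]string, content string) string {
	all := make(map[string]string, len(fields)+1)
	for name, value := range fields {
		all[name] = value
	}
	all["Content-Length"] = strconv.Itoa(len(content))

	names := make([]string, 0, len(all))
	for name := range all {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	b.WriteString("WARC/1.0\r\n") // version line
	for _, name := range names {
		b.WriteString(name + ": " + all[name] + "\r\n") // named fields
	}
	b.WriteString("\r\n")     // empty line ends the header
	b.WriteString(content)    // content block
	b.WriteString("\r\n\r\n") // two CRLFs end the record
	return b.String()
}

func main() {
	rec := formatRecord(map[string]string{
		"WARC-Type":    "resource",
		"Content-Type": "text/plain",
	}, "Hello, World!")
	fmt.Printf("%q\n", rec)
}
```

Printing with %q makes the CRLF line endings visible, which helps when comparing the layout against real output from a Writer.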
