warc

v1.0.1
Published: Apr 5, 2020 | License: MIT | Imports: 7 | Imported by: 0

README

WARC
===

A web archiver that bundles a web page and its resources into a single file.

WARC is a Go package that archives a web page and its resources into a single bolt database file.

It is still in development but should be stable enough to use. The bolt database used by this project is also stable, both in its API and in its file format. Unfortunately, WARC currently disables JavaScript when archiving a page, so it does not yet work on SPA sites such as Twitter or Reddit.

Installation

To install this package, run go get:

go get -u -v gitea.com/huiyifyj/warc

Licenses

WARC is distributed under the MIT license, which means you can use and modify it however you want. If you make an enhancement, please consider sending a pull request.

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewArchive

func NewArchive(req ArchivalRequest, dstPath string) error

NewArchive creates a new archive based on the submitted request, then saves it to the specified path.

Types

type ArchivalRequest

type ArchivalRequest struct {
	URL         string
	Reader      io.Reader
	ContentType string
	UserAgent   string
	LogEnabled  bool
}

ArchivalRequest is a request for archiving a web page, either from a URL or from an io.Reader.
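The two ways of filling in a request can be sketched as follows. This is a minimal illustration, not the package's own code: the struct is a local copy of the documented fields so the example compiles without the warc package, the helpers `requestFromURL` and `requestFromReader` are hypothetical, and the URL, user agent, and destination file name are placeholders. How the package treats a request that sets both URL and Reader is not stated in the documentation, so the comments only describe the intent.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// ArchivalRequest is a local copy of the documented struct so that this
// sketch is self-contained. In real code, use warc.ArchivalRequest instead.
type ArchivalRequest struct {
	URL         string
	Reader      io.Reader
	ContentType string
	UserAgent   string
	LogEnabled  bool
}

// requestFromURL (hypothetical helper) builds a request that names only the
// page URL and leaves Reader nil.
func requestFromURL(url, userAgent string) ArchivalRequest {
	return ArchivalRequest{URL: url, UserAgent: userAgent}
}

// requestFromReader (hypothetical helper) builds a request for content that
// the caller already holds, together with its content type.
func requestFromReader(url string, r io.Reader, contentType string) ArchivalRequest {
	return ArchivalRequest{URL: url, Reader: r, ContentType: contentType}
}

func main() {
	fromURL := requestFromURL("https://example.com/", "my-archiver/1.0")
	fromReader := requestFromReader(
		"https://example.com/",
		strings.NewReader("<html><body>hello</body></html>"),
		"text/html",
	)

	fmt.Println("URL request has reader:", fromURL.Reader != nil)
	fmt.Println("Reader request content type:", fromReader.ContentType)

	// With the real package, either request would then be passed on, e.g.:
	//   err := warc.NewArchive(req, "example.archive")
}
```

Keeping the request construction in small helpers like these makes it easy to set UserAgent or LogEnabled in one place before calling NewArchive.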

type Archive

type Archive struct {
	// contains filtered or unexported fields
}

Archive is the storage for archiving the web page.

func Open

func Open(path string) (*Archive, error)

Open opens the archive from the specified path.

func (*Archive) Close

func (arc *Archive) Close()

Close closes the storage.

func (*Archive) HasResource

func (arc *Archive) HasResource(name string) bool

HasResource checks if the resource exists in the archive.

func (*Archive) Read

func (arc *Archive) Read(name string) ([]byte, string, error)

Read fetches the resource with the specified name from the archive.
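One way HasResource and Read might be combined is to serve archived resources over HTTP. The sketch below rests on several assumptions: the `resourceStore` interface simply restates the two documented method signatures, `memStore` is a hypothetical in-memory stand-in so the example runs without an archive file, the string returned by Read is assumed to be the resource's content type (the documentation does not say), and mapping the request path directly to a resource name is an illustrative choice.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// resourceStore restates the two documented *Archive methods used here.
// Going by the signatures above, *warc.Archive would satisfy it.
type resourceStore interface {
	HasResource(name string) bool
	Read(name string) ([]byte, string, error)
}

// resource and memStore are hypothetical stand-ins for a real archive.
type resource struct {
	body        []byte
	contentType string
}

type memStore map[string]resource

func (m memStore) HasResource(name string) bool {
	_, ok := m[name]
	return ok
}

func (m memStore) Read(name string) ([]byte, string, error) {
	r, ok := m[name]
	if !ok {
		return nil, "", fmt.Errorf("resource %q not found", name)
	}
	return r.body, r.contentType, nil
}

// serveArchive answers each request with the stored resource whose name
// equals the request path without its leading slash (an assumption).
func serveArchive(store resourceStore) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		name := strings.TrimPrefix(r.URL.Path, "/")
		if !store.HasResource(name) {
			http.NotFound(w, r)
			return
		}
		body, contentType, err := store.Read(name)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		if contentType != "" {
			// Assumes the second return value of Read is a content type.
			w.Header().Set("Content-Type", contentType)
		}
		w.Write(body)
	})
}

// fetch issues an in-process GET and returns status, content type, and body.
func fetch(h http.Handler, path string) (int, string, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Header().Get("Content-Type"), rec.Body.String()
}

func main() {
	store := memStore{
		"index.html": {body: []byte("<h1>archived</h1>"), contentType: "text/html"},
	}
	h := serveArchive(store)

	code, ct, body := fetch(h, "/index.html")
	fmt.Println(code, ct, body)

	code, _, _ = fetch(h, "/missing.css")
	fmt.Println(code)
}
```

With the real package, the value returned by Open would presumably be passed to serveArchive in place of memStore, with Close deferred until the server shuts down.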

Directories

Path            Synopsis
internal/dom
