zim

v0.0.0-...-e7c5cff
Published: Aug 30, 2018 License: MIT Imports: 16 Imported by: 2

README

gozim

A Go native implementation for ZIM files. See http://akhenakh.github.io/gozim

ZIM files are mainly used as offline copies of Wikipedia.

See http://openzim.org/wiki/ZIM_file_format and http://openzim.org/wiki/ZIM_File_Example
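
As a rough illustration of the file format linked above, the sketch below checks the magic number of a ZIM header and reads the article count from it. It relies on the header layout as described in the ZIM file format page (a little-endian uint32 magic number 72173914 at offset 0 and a uint32 article count at offset 24). The names readArticleCount and fakeHeader are hypothetical helpers for this example and are not part of the zim package.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// zimMagic is the magic number given in the ZIM file format description.
// On disk it appears as the little-endian bytes 'Z', 'I', 'M', 0x04.
const zimMagic uint32 = 72173914

// readArticleCount is a hypothetical helper (not part of the zim package).
// It validates the magic number and returns the article count stored at
// offset 24 of the header.
func readArticleCount(r io.ReaderAt) (uint32, error) {
	var hdr [28]byte
	if _, err := r.ReadAt(hdr[:], 0); err != nil {
		return 0, err
	}
	if binary.LittleEndian.Uint32(hdr[0:4]) != zimMagic {
		return 0, errors.New("not a ZIM file: bad magic number")
	}
	return binary.LittleEndian.Uint32(hdr[24:28]), nil
}

// fakeHeader builds a zeroed 80-byte header in memory for demonstration,
// filling in only the magic number and the article count.
func fakeHeader(magic, articleCount uint32) io.ReaderAt {
	hdr := make([]byte, 80)
	binary.LittleEndian.PutUint32(hdr[0:], magic)
	binary.LittleEndian.PutUint32(hdr[24:], articleCount)
	return bytes.NewReader(hdr)
}

func main() {
	n, err := readArticleCount(fakeHeader(zimMagic, 42))
	fmt.Println(n, err)
}
```

Because the helper only needs an io.ReaderAt, an *os.File opened on a real ZIM file could be passed in the same way.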

Wikipedia/Wikinews/... ZIMs can be downloaded from http://download.kiwix.org/zim/

build and installation

On Ubuntu/Debian you need these packages to compile gozim:

apt-get install git liblzma-dev mercurial build-essential

For the bleve indexer to work properly, it is recommended to use leveldb as storage:

go get -u -v -tags all github.com/blevesearch/bleve/...

The gozim HTTP server uses go.rice to embed HTML/CSS in the binary. Install the rice command:

go get github.com/GeertJohan/go.rice
go get github.com/GeertJohan/go.rice/rice
go install github.com/GeertJohan/go.rice
go install github.com/GeertJohan/go.rice/rice

Get and build the gozim executables

go get github.com/akhenakh/gozim/...
cd $GOPATH/src/github.com/akhenakh/gozim
go build github.com/akhenakh/gozim/cmd/gozimhttpd
go build github.com/akhenakh/gozim/cmd/gozimindex

After building gozimhttpd, run the following command to embed the files:

rice append --exec gozimhttpd

cross-compilation

For easy cross-compilation, the !cgo build uses a pure Go library for lzma parsing. The pure Go library is around 2.5x slower in benchmarks, so compile on your target OS if performance is important.

running

Optionally, build an index file: gozimindex -path=yourzimfile.zim -indexPath=yourzimfile.idx

Start the gozim server: gozimhttpd -path=yourzimfile.zim [-index=yourzimfile.idx]

TODO

Mmap 1st 2GB on 32 bits
Selective Gzip encode response based on content type
func rather than if for getBytes

Documentation

Constants

const (
	RedirectEntry   uint16 = 0xffff
	LinkTargetEntry        = 0xfffe
	DeletedEntry           = 0xfffd
)

Variables

This section is empty.

Functions

func AnalyzerConstructorEn

func AnalyzerConstructorEn(config map[string]interface{}, cache *registry.Cache) (*analysis.Analyzer, error)

func AnalyzerConstructorFr

func AnalyzerConstructorFr(config map[string]interface{}, cache *registry.Cache) (*analysis.Analyzer, error)

Types

type Article

type Article struct {
	// EntryType is a RedirectEntry/LinkTargetEntry/DeletedEntry or an idx
	// pointing to ZimReader.mimeTypeList
	EntryType uint16
	Title     string
	URLPtr    uint64
	Namespace byte
	// contains filtered or unexported fields
}
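
The EntryType comment above says the field holds either one of the three special constants or an index into the MIME type list. A minimal sketch of that dispatch follows. The constants are copied locally so the example compiles on its own, and describeEntry is a hypothetical helper, not part of the zim package.

```go
package main

import "fmt"

// Local copies of the package constants so the sketch is self-contained.
const (
	RedirectEntry   uint16 = 0xffff
	LinkTargetEntry        = 0xfffe
	DeletedEntry           = 0xfffd
)

// describeEntry is a hypothetical helper (not part of the zim package).
// It names the special entry types and otherwise treats entryType as an
// index into mimeTypes, such as the list returned by ZimReader.MimeTypes.
func describeEntry(entryType uint16, mimeTypes []string) string {
	switch entryType {
	case RedirectEntry:
		return "redirect"
	case LinkTargetEntry:
		return "link target"
	case DeletedEntry:
		return "deleted"
	}
	if int(entryType) < len(mimeTypes) {
		return mimeTypes[entryType]
	}
	return "unknown"
}

func main() {
	mimes := []string{"text/html", "image/png"}
	fmt.Println(describeEntry(0, mimes))
	fmt.Println(describeEntry(RedirectEntry, mimes))
}
```

Checking the special values first matters because they sit at the top of the uint16 range and would otherwise be read as out-of-range MIME indexes.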

func (*Article) Data

func (a *Article) Data() ([]byte, error)

Data returns the uncompressed data associated with this article.

func (*Article) FullURL

func (a *Article) FullURL() string

FullURL returns the URL prefixed by the namespace.

func (*Article) MimeType

func (a *Article) MimeType() string

func (*Article) RedirectIndex

func (a *Article) RedirectIndex() (uint32, error)

RedirectIndex returns the redirect index of a RedirectEntry article. It returns an error if the article is not a redirect entry.
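
A caller that wants the final target of a redirect would typically feed the redirect index back into a lookup by URL index and repeat until a non-redirect entry is reached. The sketch below shows that loop with a hop limit on an in-memory model. The entry type and the resolve function are hypothetical stand-ins for illustration and do not use the zim package itself.

```go
package main

import (
	"errors"
	"fmt"
)

// entry is a hypothetical in-memory stand-in for a directory entry.
type entry struct {
	title    string
	redirect bool   // true when the entry is a redirect
	target   uint32 // URL index of the redirect target
}

// resolve follows redirects starting at idx and returns the URL index of
// the first non-redirect entry. It gives up if no such entry is reached
// within maxHops hops, which also protects against redirect loops.
func resolve(entries []entry, idx uint32, maxHops int) (uint32, error) {
	for hops := 0; hops <= maxHops; hops++ {
		if int(idx) >= len(entries) {
			return 0, errors.New("index out of range")
		}
		if !entries[idx].redirect {
			return idx, nil
		}
		idx = entries[idx].target
	}
	return 0, errors.New("too many redirects")
}

func main() {
	entries := []entry{
		{title: "A", redirect: true, target: 1},
		{title: "B", redirect: true, target: 2},
		{title: "C"},
	}
	idx, err := resolve(entries, 0, 5)
	fmt.Println(idx, err)
}
```

With the real package, the same loop would presumably call RedirectIndex and ArticleAtURLIdx in place of the slice accesses.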

func (*Article) String

func (a *Article) String() string

type XZReader

type XZReader struct {
	*xz.Decompressor
}

func NewXZReader

func NewXZReader(r io.Reader) (*XZReader, error)

type ZimReader

type ZimReader struct {
	ArticleCount uint32
	// contains filtered or unexported fields
}

ZimReader keeps track of everything related to ZIM reading.

func NewReader

func NewReader(f io.ReaderAt) (*ZimReader, error)

NewReader creates a new ZIM reader.

func (*ZimReader) ArticleAt

func (z *ZimReader) ArticleAt(offset uint64) (*Article, error)

ArticleAt returns the article (directory entry) pointed to by the offset found in URLpos or Titlepos.

func (*ZimReader) ArticleAtURLIdx

func (z *ZimReader) ArticleAtURLIdx(idx uint32) (*Article, error)

ArticleAtURLIdx is a convenience method that returns the Article at URL index idx.

func (*ZimReader) FillArticleAt

func (z *ZimReader) FillArticleAt(a *Article, offset uint64) error

FillArticleAt fills an article with the data found at offset.

func (*ZimReader) GetPageNoIndex

func (z *ZimReader) GetPageNoIndex(url string) (*Article, error)

GetPageNoIndex returns the article at the exact URL without using any index.

func (*ZimReader) ListArticles

func (z *ZimReader) ListArticles() <-chan *Article

ListArticles lists all articles contained in a ZIM file, using the URL index. Note that this is a slow implementation; a real iterator is faster. You are not supposed to use this method on big ZIM files; use indexes instead.

func (*ZimReader) ListTitlesPtr

func (z *ZimReader) ListTitlesPtr() <-chan uint32

ListTitlesPtr lists all title pointers (titles by position) contained in a ZIM file. Title pointers point into the URLpos index and are useful for indexing because they are small to store (uint32). Note that this is a slow implementation; a real iterator is faster. You are not supposed to use this method on big ZIM files; prefer ListTitlesPtrIterator to build your index.

func (*ZimReader) ListTitlesPtrIterator

func (z *ZimReader) ListTitlesPtrIterator(cb func(uint32))

ListTitlesPtrIterator lists all title pointers (titles by position) contained in a ZIM file, passing each one to cb. Title pointers point into the URLpos index and are useful for indexing because they are small to store (uint32).
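
A minimal sketch of consuming the callback form follows. The titleLister interface is a hypothetical stand-in that mirrors the documented ListTitlesPtrIterator signature, so a *ZimReader should satisfy it. fakeReader exists only so the example runs without a ZIM file. The sketch assumes the callback is invoked synchronously, before the method returns.

```go
package main

import "fmt"

// titleLister mirrors the documented ListTitlesPtrIterator signature.
// It is a hypothetical interface for this sketch, not a zim package type.
type titleLister interface {
	ListTitlesPtrIterator(cb func(uint32))
}

// collectTitlePtrs gathers every title pointer into a slice.
// It assumes cb is called synchronously before the method returns.
func collectTitlePtrs(z titleLister) []uint32 {
	var ptrs []uint32
	z.ListTitlesPtrIterator(func(idx uint32) {
		ptrs = append(ptrs, idx)
	})
	return ptrs
}

// fakeReader is an in-memory stand-in used only for demonstration.
type fakeReader []uint32

func (f fakeReader) ListTitlesPtrIterator(cb func(uint32)) {
	for _, idx := range f {
		cb(idx)
	}
}

func main() {
	fmt.Println(collectTitlePtrs(fakeReader{2, 0, 1}))
}
```

Each collected value is a URL index, so it could then be passed to ArticleAtURLIdx to load the corresponding article.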

func (*ZimReader) MainPage

func (z *ZimReader) MainPage() (*Article, error)

MainPage returns the main page article if it exists.

func (*ZimReader) MimeTypes

func (z *ZimReader) MimeTypes() []string

MimeTypes returns an ordered list of the MIME types present in the ZIM file.

func (*ZimReader) OffsetAtURLIdx

func (z *ZimReader) OffsetAtURLIdx(idx uint32) (uint64, error)

OffsetAtURLIdx returns the offset pointing to the Article at position idx in the URL index.
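
To make the lookup concrete, the sketch below reads one entry from a pointer list, assuming the layout given in the ZIM file format description: the URL pointer list is an array of 8-byte little-endian file offsets. offsetAt and packPointers are hypothetical helpers for this example. This is not the package's actual implementation.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// offsetAt is a hypothetical helper (not part of the zim package).
// It reads the idx-th 8-byte little-endian pointer from a pointer list
// that starts at byte position listPos.
func offsetAt(r io.ReaderAt, listPos uint64, idx uint32) (uint64, error) {
	var buf [8]byte
	if _, err := r.ReadAt(buf[:], int64(listPos)+int64(idx)*8); err != nil {
		return 0, err
	}
	return binary.LittleEndian.Uint64(buf[:]), nil
}

// packPointers builds an in-memory reader holding listPos bytes of padding
// followed by the given pointers. It is used only for demonstration.
func packPointers(listPos uint64, ptrs ...uint64) io.ReaderAt {
	data := make([]byte, int(listPos)+8*len(ptrs))
	for i, p := range ptrs {
		binary.LittleEndian.PutUint64(data[int(listPos)+8*i:], p)
	}
	return bytes.NewReader(data)
}

func main() {
	r := packPointers(4, 0x100, 0x200)
	off, err := offsetAt(r, 4, 1)
	fmt.Println(off, err)
}
```

The returned offset is the kind of value that ArticleAt accepts.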

Directories

Path Synopsis
cmd
