nuvi

Version: v0.0.0-...-a06ed9b (latest) | Published: Aug 24, 2016 | License: MIT | Imports: 10 | Imported by: 0

README ¶

nuvi

A web scraper for zip files. This utility inspects an HTML page and downloads any file that is a zip archive. Note that it considers a file to be a zip archive if its anchor hyperlink contains the .zip extension.

Prerequisites

You should have the following dependencies installed and configured:

Golang 1.6
Redis

Installation

go get github.com/svett/nuvi/cmd/nuvi

Usage

The nuvi binary can be executed with the following arguments:

url: the address of the page that will be inspected for zip files (required)
redis-addr: the address of the Redis server (optional)
redis-password: the password of the Redis server that the app connects to (optional)
max-parallel-download-conn: the number of files downloaded in parallel (optional)

$ nuvi -url=http_url_to_desired_page \
       -redis-addr=redis_server_host_and_port \
       -redis-password=redis_server_password \
       -max-parallel-download-conn=5
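
The arguments above map naturally onto Go's standard flag package. The following is only a sketch of how such parsing could look, not the code in cmd/nuvi: the config struct, the parseArgs helper, and both default values (localhost:6379 and 5) are assumptions made for illustration.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// config is a hypothetical container for the documented arguments.
type config struct {
	URL           string
	RedisAddr     string
	RedisPassword string
	MaxConn       int
}

// parseArgs parses the four documented flags and rejects a missing -url.
// The defaults below are assumptions, not values taken from nuvi.
func parseArgs(args []string) (config, error) {
	var cfg config
	fs := flag.NewFlagSet("nuvi", flag.ContinueOnError)
	fs.StringVar(&cfg.URL, "url", "", "page address to inspect for zip files (required)")
	fs.StringVar(&cfg.RedisAddr, "redis-addr", "localhost:6379", "address of the redis server")
	fs.StringVar(&cfg.RedisPassword, "redis-password", "", "password of the redis server")
	fs.IntVar(&cfg.MaxConn, "max-parallel-download-conn", 5, "number of files downloaded in parallel")
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}
	if cfg.URL == "" {
		return cfg, errors.New("the -url argument is required")
	}
	return cfg, nil
}

func main() {
	cfg, err := parseArgs([]string{"-url=http://example.com/feed/", "-max-parallel-download-conn=3"})
	fmt.Println(cfg, err)
}
```

Using a flag.FlagSet rather than the global flag functions keeps the parser easy to call from tests with an explicit argument slice.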

Example

$ nuvi -url=http://feed.omgili.com/5Rh5AMTrc4Pv/mainstream/posts/

Contribution

Get the sources and all dependencies with the following git commands:

$ git clone https://github.com/svett/nuvi
$ git submodule update --init --recursive

To start contributing to the project, install the ginkgo and gomega packages, which are used in the unit and integration tests:

$ go get github.com/onsi/ginkgo/ginkgo
$ go get github.com/onsi/gomega

You can run all unit and integration tests by executing the following script:

$ ./scripts/run_tests.sh

Note that you need redis-server installed. Every integration test starts and stops the server, so you should not have it running as a daemon.

By default, redis-server runs on port 6379. If your instance is configured to run on a different port, set the environment variable REDIS_SERVER_PORT before you execute the tests.
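
How the test suite reads REDIS_SERVER_PORT is not shown on this page. One plausible approach, sketched here under that caveat, is to fall back to 6379 when the variable is unset. The redisServerPort helper and its getenv parameter are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
)

// defaultRedisPort is the port mentioned in the README.
const defaultRedisPort = "6379"

// redisServerPort returns the value of REDIS_SERVER_PORT, or the default
// port when the variable is empty or unset. The lookup function is passed
// in so the behaviour can be exercised without touching the environment.
func redisServerPort(getenv func(string) string) string {
	if port := getenv("REDIS_SERVER_PORT"); port != "" {
		return port
	}
	return defaultRedisPort
}

func main() {
	fmt.Println("redis port:", redisServerPort(os.Getenv))
}
```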

You can also use the ginkgo binary directly to execute the tests:

# Running the integration tests
$ ginkgo integration/
# Running the unit tests
$ ginkgo .

Presently the test coverage is 91.7%.

License

MIT License

Documentation ¶

Index ¶

type ArchiveWalker
type ArchiveWalkerFunc
type Cacher
type Downloader
type Extractor
type HTTPDownloader
- func (downloader HTTPDownloader) Download(url string) (io.ReadCloser, error)
type LinkExtractor
- func (extractor *LinkExtractor) Extract(reader io.Reader) ([]string, error)
type Logger
type RedisCacher
- func (cacher *RedisCacher) Cache(reader io.Reader)
type RedisClient
type Scraper
- func (scraper *Scraper) Scrape(url string) error
type ZIPWalker
- func (walker *ZIPWalker) Walk(reader io.Reader, walk ArchiveWalkerFunc)

Types ¶

type ArchiveWalker ¶

type ArchiveWalker interface {
	// Walk walks through the content of the io.Reader
	Walk(reader io.Reader, walker ArchiveWalkerFunc)
}

ArchiveWalker unarchives zip archives

type ArchiveWalkerFunc ¶

type ArchiveWalkerFunc func(io.Reader)

ArchiveWalkerFunc is the callback function passed to Walk

type Cacher ¶

type Cacher interface {
	// Cache caches the content provided by the reader
	Cache(reader io.Reader)
}

Cacher caches any content

type Downloader ¶

type Downloader interface {
	// Download downloads the content provided by url
	Download(url string) (io.ReadCloser, error)
}

Downloader downloads content from a URL

type Extractor ¶

type Extractor interface {
	// Extract extracts links/anchors from an io.Reader
	Extract(reader io.Reader) ([]string, error)
}

Extractor extracts content from a page

type HTTPDownloader ¶

type HTTPDownloader func(string) (*http.Response, error)

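HTTPDownloader is a function type that lets any function with the signature func(string) (*http.Response, error), such as http.Get, satisfy the Downloader interface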

func (HTTPDownloader) Download ¶

func (downloader HTTPDownloader) Download(url string) (io.ReadCloser, error)
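
The body of Download is not reproduced on this page. The sketch below shows one way a function-type adapter like this could work: call the wrapped function and hand back the response body. The status-code check and the fakeGet and readAll helpers are assumptions added for illustration, and io.NopCloser needs Go 1.16 or newer.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// HTTPDownloader mirrors the documented type.
type HTTPDownloader func(string) (*http.Response, error)

// Download calls the wrapped function and returns the response body.
// Rejecting non-200 responses is an assumption of this sketch.
func (downloader HTTPDownloader) Download(url string) (io.ReadCloser, error) {
	resp, err := downloader(url)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		resp.Body.Close()
		return nil, fmt.Errorf("unexpected status %s for %s", resp.Status, url)
	}
	return resp.Body, nil
}

// fakeGet is a hypothetical stand-in for http.Get that serves a canned body.
func fakeGet(status int, body string) HTTPDownloader {
	return func(url string) (*http.Response, error) {
		return &http.Response{
			StatusCode: status,
			Status:     fmt.Sprintf("%d %s", status, http.StatusText(status)),
			Body:       io.NopCloser(strings.NewReader(body)),
		}, nil
	}
}

// readAll downloads url and returns the whole body as a string.
func readAll(d HTTPDownloader, url string) (string, error) {
	rc, err := d.Download(url)
	if err != nil {
		return "", err
	}
	defer rc.Close()
	var buf bytes.Buffer
	_, err = buf.ReadFrom(rc)
	return buf.String(), err
}

func main() {
	body, err := readAll(fakeGet(http.StatusOK, "zip bytes"), "http://example.com/a.zip")
	fmt.Println(body, err)
	// Against a real server the adapter would wrap http.Get:
	// HTTPDownloader(http.Get).Download(url)
}
```

Because the type wraps a plain function, tests can substitute a fake without any network access, which is likely why the package models the downloader this way.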

type LinkExtractor ¶

type LinkExtractor struct {
	// FileExt is the file extension
	FileExt string
	// Logger logs information
	Logger Logger
}

LinkExtractor extracts <a href="*.zip"> links

func (*LinkExtractor) Extract ¶

func (extractor *LinkExtractor) Extract(reader io.Reader) ([]string, error)

Extract extracts HTML anchor links
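
To illustrate the idea of collecting anchor links that contain a given extension, here is a standard-library-only sketch. It uses a regular expression as a rough approximation and is not the package's implementation, which may well use a real HTML parser. The extractLinks and extractFromString helpers are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"regexp"
	"strings"
)

// hrefPattern captures the href value of <a ...> tags. A regular expression
// is only an approximation of HTML parsing and will miss unusual markup.
var hrefPattern = regexp.MustCompile(`(?i)<a\s[^>]*href\s*=\s*["']([^"']+)["']`)

// extractLinks returns every anchor href in the input that contains fileExt.
func extractLinks(reader io.Reader, fileExt string) ([]string, error) {
	var buf bytes.Buffer
	if _, err := buf.ReadFrom(reader); err != nil {
		return nil, err
	}
	var links []string
	for _, match := range hrefPattern.FindAllStringSubmatch(buf.String(), -1) {
		if strings.Contains(match[1], fileExt) {
			links = append(links, match[1])
		}
	}
	return links, nil
}

// extractFromString is a convenience wrapper for string input.
func extractFromString(html, fileExt string) []string {
	links, _ := extractLinks(strings.NewReader(html), fileExt)
	return links
}

func main() {
	page := `<a href="1.zip">one</a> <a href="about.html">about</a>`
	fmt.Println(extractFromString(page, ".zip"))
}
```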

type Logger ¶

type Logger interface {
	Println(v ...interface{})
	Printf(format string, v ...interface{})
}

Logger logs messages

type RedisCacher ¶

type RedisCacher struct {
	Key    string
	Client RedisClient
	Logger Logger
}

RedisCacher caches content into redis

func (*RedisCacher) Cache ¶

func (cacher *RedisCacher) Cache(reader io.Reader)

Cache caches the content of io.Reader

type RedisClient ¶

type RedisClient interface {
	LPush(key string, values ...interface{}) *redis.IntCmd
	LIndex(key string, index int64) *redis.StringCmd
	LLen(key string) *redis.IntCmd
}

RedisClient connects to Redis

type Scraper ¶

type Scraper struct {
	Downloader    Downloader
	Extractor     Extractor
	ArchiveWalker ArchiveWalker
	Cacher        Cacher
	MaxConn       int
	Logger        Logger
}

Scraper scrapes web content

func (*Scraper) Scrape ¶

func (scraper *Scraper) Scrape(url string) error

Scrape scrapes a web page
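
The MaxConn field suggests that Scrape bounds the number of simultaneous downloads. How it does so is not documented here; a common Go idiom for this, shown as a sketch, is a buffered channel used as a semaphore. The forEachLimited and peakConcurrency helpers are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// forEachLimited calls fn once per item, with at most maxConn calls
// running at the same time.
func forEachLimited(items []string, maxConn int, fn func(string)) {
	if maxConn < 1 {
		maxConn = 1
	}
	sem := make(chan struct{}, maxConn)
	var wg sync.WaitGroup
	for _, item := range items {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxConn calls are in flight
		go func(item string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			fn(item)
		}(item)
	}
	wg.Wait()
}

// peakConcurrency runs n no-op jobs and reports the highest number of
// simultaneous calls observed and the total number of calls.
func peakConcurrency(n, maxConn int) (peak, calls int) {
	var mu sync.Mutex
	active := 0
	forEachLimited(make([]string, n), maxConn, func(string) {
		mu.Lock()
		active++
		calls++
		if active > peak {
			peak = active
		}
		mu.Unlock()

		mu.Lock()
		active--
		mu.Unlock()
	})
	return peak, calls
}

func main() {
	peak, calls := peakConcurrency(20, 5)
	fmt.Println("calls:", calls, "peak within limit:", peak <= 5)
}
```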

type ZIPWalker ¶

type ZIPWalker struct {
	// FileExt specifies the file extension
	FileExt string
	// Logger logs information
	Logger Logger
}

ZIPWalker unzips *.zip files

func (*ZIPWalker) Walk ¶

func (walker *ZIPWalker) Walk(reader io.Reader, walk ArchiveWalkerFunc)

Walk unzips *.zip files

The original implementation can be found on my blog: http://blog.ralch.com/tutorial/golang-working-with-zip/

The ZIP format relies on random access, so unfortunately we need to read the whole file before we can unzip it
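
The random-access constraint shows up directly in the standard library: zip.NewReader takes an io.ReaderAt plus a size, so a plain io.Reader has to be buffered first. The sketch below illustrates that pattern. It is not the ZIPWalker source: walkZip returns an error (the documented Walk does not), and buildZip and collect are hypothetical helpers.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"path"
)

// walkZip buffers the whole archive in memory, then calls walk for every
// entry whose extension equals fileExt (an empty fileExt matches all).
func walkZip(reader io.Reader, fileExt string, walk func(io.Reader)) error {
	var buf bytes.Buffer
	if _, err := buf.ReadFrom(reader); err != nil {
		return err
	}
	data := buf.Bytes()
	archive, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return err
	}
	for _, file := range archive.File {
		if fileExt != "" && path.Ext(file.Name) != fileExt {
			continue
		}
		entry, err := file.Open()
		if err != nil {
			return err
		}
		walk(entry)
		entry.Close()
	}
	return nil
}

// buildZip creates an in-memory archive from {name, content} pairs.
func buildZip(files [][2]string) ([]byte, error) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	for _, f := range files {
		entry, err := w.Create(f[0])
		if err != nil {
			return nil, err
		}
		if _, err := io.WriteString(entry, f[1]); err != nil {
			return nil, err
		}
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// collect returns the contents of all entries matching fileExt.
func collect(archive []byte, fileExt string) ([]string, error) {
	var contents []string
	err := walkZip(bytes.NewReader(archive), fileExt, func(r io.Reader) {
		var b bytes.Buffer
		b.ReadFrom(r)
		contents = append(contents, b.String())
	})
	return contents, err
}

func main() {
	archive, err := buildZip([][2]string{{"post.xml", "<post/>"}, {"notes.txt", "skip me"}})
	if err != nil {
		fmt.Println("build failed:", err)
		return
	}
	contents, err := collect(archive, ".xml")
	fmt.Println(contents, err)
}
```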

Directories ¶

Path	Synopsis
cmd
nuvi
fakes	This file was generated by counterfeiter
integration
utils
