sites

package
v0.1.9 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 26, 2018 License: MIT Imports: 5 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Site

type Site struct {
	URL    *url.URL
	Parent *url.URL
	Depth  int
}

Site represents what travels: an URL which may have a Parent URL, and a Depth.

type SiteWaiter

type SiteWaiter interface {
	Add(delta int)
	Done()
	Wait()
}

SiteWaiter - as implemented by `*sync.WaitGroup` - attends Flapdoors and keeps counting who enters and who leaves.

Use SiteDoneWait to learn about when the facilities are closed.

Note: You may also use Your provided `*sync.WaitGroup.Wait()` to know when to close the facilities. Just: SiteDoneWait is more convenient as it also closes the primary channel for You.

Just make sure to have _all_ entrances and exits attended, and `Wait()` only *after* You've started flooding the facilities.

type Traffic

type Traffic struct {
	// contains filtered or unexported fields
}

Traffic goes around inside a circular Site pipe network, e. g. a crawling Crawler.

func New

func New() (t *Traffic)

New returns a new and operational Traffic processor.

func (*Traffic) Done

func (t *Traffic) Done() (done <-chan struct{})

Done returns a channel which will be signalled and closed when traffic has subsided and nothing is left to be processed and consequently all goroutines have terminated.

Note: Done() here is a convenience method.
It is well known from the "context" package.
Just: no need to use it as `Processor` returns same.

func (*Traffic) Feed

func (t *Traffic) Feed(urls []*url.URL, parent *url.URL, depth int)

Feed registers new entries.

func (*Traffic) Processor

func (t *Traffic) Processor(crawl func(s Site), parallel int) (done <-chan struct{})

Processor builds the site traffic processing network; it is cirular if crawl uses Feed to provide feedback.

returned is a channel which will be signalled and closed when traffic has subsided and nothing is left to be processed and consequently all goroutines have terminated - as is from `Done()`.

func (*Traffic) SiteChan

func (my *Traffic) SiteChan(inp ...Site) (out <-chan Site)

SiteChan returns a channel to receive all inputs before close.

func (*Traffic) SiteChanFuncErr

func (my *Traffic) SiteChanFuncErr(gen func() (Site, error)) (out <-chan Site)

SiteChanFuncErr returns a channel to receive all results of generator `gen` until `err != nil` before close.

func (*Traffic) SiteChanFuncNok

func (my *Traffic) SiteChanFuncNok(gen func() (Site, bool)) (out <-chan Site)

SiteChanFuncNok returns a channel to receive all results of generator `gen` until `!ok` before close.

func (*Traffic) SiteChanSlice

func (my *Traffic) SiteChanSlice(inp ...[]Site) (out <-chan Site)

SiteChanSlice returns a channel to receive all inputs before close.

func (*Traffic) SiteDone

func (my *Traffic) SiteDone(inp <-chan Site) (done <-chan struct{})

SiteDone returns a channel to receive one signal before close after `inp` has been drained.

func (*Traffic) SiteDoneFunc

func (my *Traffic) SiteDoneFunc(inp <-chan Site, act func(a Site)) (done <-chan struct{})

SiteDoneFunc returns a channel to receive one signal after `act` has been applied to every `inp` before close.

func (*Traffic) SiteDoneLeave

func (my *Traffic) SiteDoneLeave(inp <-chan Site, wg SiteWaiter) (done <-chan struct{})

SiteDoneLeave returns a channel to receive one signal after all throughput on `inp` has been registered as departure on the given `sync.WaitGroup` before close.

func (*Traffic) SiteDoneSlice

func (my *Traffic) SiteDoneSlice(inp <-chan Site) (done <-chan []Site)

SiteDoneSlice returns a channel to receive a slice with every Site received on `inp` before close.

Note: Unlike SiteDone, DoneSiteSlice sends the fully accumulated slice, not just an event, once upon close of inp.

func (*Traffic) SiteDoneWait

func (my *Traffic) SiteDoneWait(inp chan<- Site, wg SiteWaiter) (done <-chan struct{})

SiteDoneWait returns a channel to receive one signal after wg.Wait() has returned and inp has been closed before close.

Note: Use only *after* You've started flooding the facilities.

func (*Traffic) SiteFanIn2

func (my *Traffic) SiteFanIn2(inp1, inp2 <-chan Site) (out <-chan Site)

SiteFanIn2 returns a channel to receive all to receive all from both `inp1` and `inp2` before close.

func (*Traffic) SiteFini

func (my *Traffic) SiteFini() func(inp <-chan Site) (done <-chan struct{})

SiteFini returns a closure around `SiteDone(_)`.

func (*Traffic) SiteFiniFunc

func (my *Traffic) SiteFiniFunc(act func(a Site)) func(inp <-chan Site) (done <-chan struct{})

SiteFiniFunc returns a closure around `SiteDoneFunc(_, act)`.

func (*Traffic) SiteFiniLeave

func (my *Traffic) SiteFiniLeave(wg SiteWaiter) func(inp <-chan Site) (done <-chan struct{})

SiteFiniLeave returns a closure around `SiteDoneLeave(_, wg)` registering throughput as departure on the given `sync.WaitGroup`.

func (*Traffic) SiteFiniSlice

func (my *Traffic) SiteFiniSlice() func(inp <-chan Site) (done <-chan []Site)

SiteFiniSlice returns a closure around `SiteDoneSlice(_)`.

func (*Traffic) SiteFiniWait

func (my *Traffic) SiteFiniWait(wg SiteWaiter) func(inp chan<- Site) (done <-chan struct{})

SiteFiniWait returns a closure around `DoneSiteWait(_, wg)`.

func (*Traffic) SiteFork

func (my *Traffic) SiteFork(inp <-chan Site) (out1, out2 <-chan Site)

SiteFork returns two channels either of which is to receive every result of inp before close.

func (*Traffic) SiteForkSeen

func (my *Traffic) SiteForkSeen(inp <-chan Site) (new, old <-chan Site)

SiteForkSeen returns two channels, `new` and `old`, where `new` is to receive all `inp` not been seen before and `old` all `inp` seen before (internally growing a `sync.Map` to discriminate) until close.

func (*Traffic) SiteForkSeenAttr

func (my *Traffic) SiteForkSeenAttr(inp <-chan Site, attr func(a Site) interface{}) (new, old <-chan Site)

SiteForkSeenAttr returns two channels, `new` and `old`, where `new` is to receive all `inp` whose attribute `attr` has not been seen before and `old` all `inp` seen before (internally growing a `sync.Map` to discriminate) until close.

func (*Traffic) SiteMakeChan

func (my *Traffic) SiteMakeChan() (out chan Site)

SiteMakeChan returns a new open channel (simply a 'chan Site' that is). Note: No 'Site-producer' is launched here yet! (as is in all the other functions).

This is useful to easily create corresponding variables such as:

var mySitePipelineStartsHere := SiteMakeChan() // ... lot's of code to design and build Your favourite "mySiteWorkflowPipeline"

// ...
// ... *before* You start pouring data into it, e.g. simply via:
for drop := range water {

mySitePipelineStartsHere <- drop

}

close(mySitePipelineStartsHere)

Hint: especially helpful, if Your piping library operates on some hidden (non-exported) type
(or on a type imported from elsewhere - and You don't want/need or should(!) have to care.)

Note: as always (except for SitePipeBuffer) the channel is unbuffered.

func (*Traffic) SitePair

func (my *Traffic) SitePair(inp <-chan Site) (out1, out2 <-chan Site)

SitePair returns a pair of channels to receive every result of inp before close.

Note: Yes, it is a VERY simple fanout - but sometimes all You need.

func (*Traffic) SitePipeAdjust

func (my *Traffic) SitePipeAdjust(inp <-chan Site, sizes ...int) (out <-chan Site)

SitePipeAdjust returns a channel to receive all `inp` buffered by a SiteSendProxy process before close.

func (*Traffic) SitePipeEnter

func (my *Traffic) SitePipeEnter(inp <-chan Site, wg SiteWaiter) (out <-chan Site)

SitePipeEnter returns a channel to receive all `inp` and registers throughput as arrival on the given `sync.WaitGroup` until close.

func (*Traffic) SitePipeFunc

func (my *Traffic) SitePipeFunc(inp <-chan Site, act func(a Site) Site) (out <-chan Site)

SitePipeFunc returns a channel to receive every result of action `act` applied to `inp` before close. Note: it 'could' be PipeSiteMap for functional people, but 'map' has a very different meaning in go lang.

func (*Traffic) SitePipeLeave

func (my *Traffic) SitePipeLeave(inp <-chan Site, wg SiteWaiter) (out <-chan Site)

SitePipeLeave returns a channel to receive all `inp` and registers throughput as departure on the given `sync.WaitGroup` until close.

func (*Traffic) SitePipeSeen

func (my *Traffic) SitePipeSeen(inp <-chan Site) (out <-chan Site)

SitePipeSeen returns a channel to receive all `inp` not been seen before while silently dropping everything seen before (internally growing a `sync.Map` to discriminate) until close. Note: SitePipeFilterNotSeenYet might be a better name, but is fairly long.

func (*Traffic) SitePipeSeenAttr

func (my *Traffic) SitePipeSeenAttr(inp <-chan Site, attr func(a Site) interface{}) (out <-chan Site)

SitePipeSeenAttr returns a channel to receive all `inp` whose attribute `attr` has not been seen before while silently dropping everything seen before (internally growing a `sync.Map` to discriminate) until close. Note: SitePipeFilterAttrNotSeenYet might be a better name, but is fairly long.

func (*Traffic) SiteSendProxy

func (my *Traffic) SiteSendProxy(out chan<- Site, sizes ...int) chan<- Site

SiteSendProxy returns a channel to serve as a sending proxy to 'out'. Uses a goroutine to receive values from 'out' and store them in an expanding buffer, so that sending to 'out' never blocks.

Note: the expanding buffer is implemented via "container/ring"

Note: SiteSendProxy is kept for the Sieve example and other dynamic use to be discovered even so it does not fit the pipe tube pattern as SitePipeAdjust does.

func (*Traffic) SiteStrew

func (my *Traffic) SiteStrew(inp <-chan Site, size int) (outS [](<-chan Site))

SiteStrew returns a slice (of size = size) of channels one of which shall receive each inp before close.

func (*Traffic) SiteTubeAdjust

func (my *Traffic) SiteTubeAdjust(sizes ...int) (tube func(inp <-chan Site) (out <-chan Site))

SiteTubeAdjust returns a closure around SitePipeAdjust (_, sizes ...int).

func (*Traffic) SiteTubeEnter

func (my *Traffic) SiteTubeEnter(wg SiteWaiter) (tube func(inp <-chan Site) (out <-chan Site))

SiteTubeEnter returns a closure around SitePipeEnter (_, wg) registering throughput as arrival on the given `sync.WaitGroup`.

func (*Traffic) SiteTubeFunc

func (my *Traffic) SiteTubeFunc(act func(a Site) Site) (tube func(inp <-chan Site) (out <-chan Site))

SiteTubeFunc returns a closure around PipeSiteFunc (_, act).

func (*Traffic) SiteTubeLeave

func (my *Traffic) SiteTubeLeave(wg SiteWaiter) (tube func(inp <-chan Site) (out <-chan Site))

SiteTubeLeave returns a closure around SitePipeLeave (_, wg) registering throughput as departure on the given `sync.WaitGroup`.

func (*Traffic) SiteTubeSeen

func (my *Traffic) SiteTubeSeen() (tube func(inp <-chan Site) (out <-chan Site))

SiteTubeSeen returns a closure around SitePipeSeen() (silently dropping every Site seen before).

func (*Traffic) SiteTubeSeenAttr

func (my *Traffic) SiteTubeSeenAttr(attr func(a Site) interface{}) (tube func(inp <-chan Site) (out <-chan Site))

SiteTubeSeenAttr returns a closure around SitePipeSeenAttr() (silently dropping every Site whose attribute `attr` was seen before).

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL