iterator

package
Version: v0.6.0
Published: Dec 16, 2024 License: BSD-3-Clause Imports: 14 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func ItNextOrNil

func ItNextOrNil(it Iterator) (element.Element, error)

Moves to the next item and returns it, or nil if we reached the end.

func ItPreviousOrNil

func ItPreviousOrNil(it Iterator) (element.Element, error)

Moves to the previous item and returns it, or nil if we reached the beginning.
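The two helpers above can be sketched as thin wrappers over the Iterator interface documented further down. This is a minimal illustration, not the package's source: `Element` is a stand-in for element.Element, and `nextOrNil`, `previousOrNil`, `collect` and `sliceIterator` are hypothetical names introduced only for this example.

```go
package main

import "fmt"

// Element is a stand-in for element.Element (assumption: the real type is richer).
type Element interface{}

// Iterator mirrors the method set documented for this package's Iterator.
type Iterator interface {
	HasNext() (bool, error)
	Next() Element
	HasPrevious() (bool, error)
	Previous() Element
}

// nextOrNil is a hypothetical equivalent of ItNextOrNil: it advances the
// iterator and returns the element, or nil once the end is reached.
func nextOrNil(it Iterator) (Element, error) {
	ok, err := it.HasNext()
	if err != nil || !ok {
		return nil, err
	}
	return it.Next(), nil
}

// previousOrNil is the backward counterpart, mirroring ItPreviousOrNil.
func previousOrNil(it Iterator) (Element, error) {
	ok, err := it.HasPrevious()
	if err != nil || !ok {
		return nil, err
	}
	return it.Previous(), nil
}

// sliceIterator is a toy Iterator over strings, used only to drive the helpers.
type sliceIterator struct {
	items []string
	pos   int // index of the element that Next would return
}

func (s *sliceIterator) HasNext() (bool, error)     { return s.pos < len(s.items), nil }
func (s *sliceIterator) Next() Element              { e := s.items[s.pos]; s.pos++; return e }
func (s *sliceIterator) HasPrevious() (bool, error) { return s.pos > 0, nil }
func (s *sliceIterator) Previous() Element          { s.pos--; return s.items[s.pos] }

// collect drains the iterator forward until nextOrNil reports the end.
func collect(it Iterator) ([]Element, error) {
	var out []Element
	for {
		el, err := nextOrNil(it)
		if err != nil || el == nil {
			return out, err
		}
		out = append(out, el)
	}
}

func main() {
	it := &sliceIterator{items: []string{"a", "b", "c"}}
	els, _ := collect(it)
	fmt.Println(els) // [a b c]
	back, _ := previousOrNil(it)
	fmt.Println(back) // c
}
```

The nil return lets a caller write a single loop condition instead of pairing every HasNext with a Next.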

func TraverseNode

func TraverseNode(visitor NodeVisitor, root *html.Node)

Starts a depth-first traversal of the root and all of its descendants. The implementation is iterative rather than recursive, so a deep DOM does not risk overflowing the stack. Adapted from JSoup: https://github.com/jhy/jsoup/blob/1762412a28fa7b08ccf71d93fc4c98dc73086e03/src/main/java/org/jsoup/select/NodeTraversor.java#L20

NOTE: Unlike the JSoup implementation, any implementor of NodeVisitor is expected to be read-only, because this simplifies the implementation.

Types

type Direction

type Direction int8
const Backward Direction = -1
const Foward Direction = 1

func (Direction) Delta

func (d Direction) Delta() int

Delta converts the direction to an int by casting it.
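As a small illustration of why the cast is useful, the sketch below redeclares Direction locally and uses Delta to step an index through a bounded list. The `step` helper and the bounds check are assumptions made for the example; the package's own use of Delta may differ.

```go
package main

import "fmt"

// Direction is redeclared here so the example is self-contained.
type Direction int8

const (
	Backward Direction = -1
	Foward   Direction = 1 // spelled as in the package
)

// Delta converts the direction to an int by casting it.
func (d Direction) Delta() int { return int(d) }

// step is a hypothetical helper: it moves index one position in direction d
// and reports whether the move stayed within [0, length).
func step(index int, d Direction, length int) (int, bool) {
	next := index + d.Delta()
	if next < 0 || next >= length {
		return index, false
	}
	return next, true
}

func main() {
	idx, ok := step(0, Foward, 3)
	fmt.Println(idx, ok) // 1 true
	idx, ok = step(idx, Backward, 3)
	fmt.Println(idx, ok) // 0 true
	idx, ok = step(idx, Backward, 3)
	fmt.Println(idx, ok) // 0 false
}
```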

type ElementInDirection

type ElementInDirection struct {
	El  element.Element
	Dir Direction
}

An [Element] loaded by [hasPrevious] or [hasNext], together with the direction of the move that produced it.

type ElementWithDelta

type ElementWithDelta struct {
	El    element.Element
	Delta int
}

An [Element] loaded by [hasPrevious] or [hasNext], together with the delta of the move that produced it.

type HTMLContentIterator

type HTMLContentIterator struct {
	BeforeMaxLength int // Locators will contain a `before` context of up to this amount of characters.
	// contains filtered or unexported fields
}

func NewHTML

func NewHTML(resource fetcher.Resource, locator manifest.Locator) *HTMLContentIterator

Iterates an HTML [resource], starting from the given [locator]. If you want to start mid-resource, the [locator] must contain a `cssSelector` key in its [Locator.Locations] object. If you want to start from the end of the resource, the [locator] must have a `progression` of 1.0.

func (*HTMLContentIterator) HasNext

func (it *HTMLContentIterator) HasNext() (bool, error)

func (*HTMLContentIterator) HasPrevious

func (it *HTMLContentIterator) HasPrevious() (bool, error)

func (*HTMLContentIterator) Next

func (it *HTMLContentIterator) Next() element.Element

func (*HTMLContentIterator) Previous

func (it *HTMLContentIterator) Previous() element.Element

type HTMLConverter

type HTMLConverter struct {
	// contains filtered or unexported fields
}

HTMLConverter is based on JSoup's NodeVisitor and NodeTraversor classes: https://jsoup.org/apidocs/org/jsoup/select/NodeVisitor.html https://jsoup.org/apidocs/org/jsoup/select/NodeTraversor.html

func (*HTMLConverter) Head

func (c *HTMLConverter) Head(n *html.Node, depth int)

Implements NodeVisitor

func (*HTMLConverter) Result

func (c *HTMLConverter) Result() ParsedElements

func (*HTMLConverter) Tail

func (c *HTMLConverter) Tail(n *html.Node, depth int)

Implements NodeVisitor

type IndexedIterator

type IndexedIterator struct {
	// contains filtered or unexported fields
}

Iterator for a resource, associated with its [index] in the reading order.

func (*IndexedIterator) NextContentIn

func (it *IndexedIterator) NextContentIn(direction Direction) (element.Element, error)

type Iterator

type Iterator interface {
	HasNext() (bool, error)     // Returns true if the iterator has a next element
	Next() element.Element      // Retrieves the element computed by a preceding call to [hasNext]. Panics if [hasNext] was not invoked.
	HasPrevious() (bool, error) // Returns true if the iterator has a previous element
	Previous() element.Element  // Retrieves the element computed by a preceding call to [hasPrevious]. Panics if [hasPrevious] was not invoked.
}

Iterates through a list of [Element] items asynchronously. [hasNext] and [hasPrevious] refer to the last element computed by a previous call to either method.

TODO: This is based on a Kotlin iterator; maybe it can be made more idiomatic for Go?
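The contract described above, where HasNext or HasPrevious computes an element and Next or Previous merely hands it back, can be sketched with a slice-backed implementation. Everything here is illustrative: `Element` stands in for element.Element, and `sliceIterator`, `loaded` and `load` are hypothetical names, not the package's internals.

```go
package main

import "fmt"

// Element is a stand-in for element.Element.
type Element interface{}

// Iterator mirrors the interface documented above.
type Iterator interface {
	HasNext() (bool, error)
	Next() Element
	HasPrevious() (bool, error)
	Previous() Element
}

// loaded pairs an element with the delta of the move that produced it.
type loaded struct {
	el    Element
	delta int
}

// sliceIterator is a toy implementation over a slice of strings.
type sliceIterator struct {
	items  []string
	index  int     // index of the last loaded element; -1 before the first
	loaded *loaded // result of the latest HasNext or HasPrevious call
}

var _ Iterator = (*sliceIterator)(nil)

// load tries to move by delta; the position only changes on success.
func (s *sliceIterator) load(delta int) bool {
	i := s.index + delta
	if i < 0 || i >= len(s.items) {
		s.loaded = nil
		return false
	}
	s.index = i
	s.loaded = &loaded{el: s.items[i], delta: delta}
	return true
}

func (s *sliceIterator) HasNext() (bool, error)     { return s.load(1), nil }
func (s *sliceIterator) HasPrevious() (bool, error) { return s.load(-1), nil }

// Next returns the element computed by HasNext and panics otherwise.
func (s *sliceIterator) Next() Element {
	if s.loaded == nil || s.loaded.delta != 1 {
		panic("Next called without a preceding successful HasNext")
	}
	return s.loaded.el
}

// Previous returns the element computed by HasPrevious and panics otherwise.
func (s *sliceIterator) Previous() Element {
	if s.loaded == nil || s.loaded.delta != -1 {
		panic("Previous called without a preceding successful HasPrevious")
	}
	return s.loaded.el
}

func main() {
	it := &sliceIterator{items: []string{"a", "b", "c"}, index: -1}
	for {
		ok, err := it.HasNext()
		if err != nil || !ok {
			break
		}
		fmt.Println(it.Next()) // a, then b, then c
	}
}
```

In this sketch each successful HasNext call moves the position, so calling it twice in a row skips an element; callers pair every check with exactly one retrieval.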

type NodeVisitor

type NodeVisitor interface {
	Head(n *html.Node, depth int) // Callback for when a node is first visited.
	Tail(n *html.Node, depth int) // Callback for when a node is last visited, after all of its descendants have been visited.
}

type ParsedElements

type ParsedElements struct {
	Elements   []element.Element
	StartIndex int
}

Holds the result of parsing the HTML resource into a list of element.Element. The [startIndex] will be calculated from the element matched by the base [locator], if possible. Defaults to 0.

type PublicationContentIterator

type PublicationContentIterator struct {
	// contains filtered or unexported fields
}

func NewPublicationContent

func NewPublicationContent(manifest manifest.Manifest, fetcher fetcher.Fetcher, startLocator *manifest.Locator, resourceContentIteratorFactories []ResourceContentIteratorFactory) *PublicationContentIterator

TODO maybe wrap manifest/fetcher in something that doesn't depend on pub package

func (*PublicationContentIterator) HasNext

func (it *PublicationContentIterator) HasNext() (bool, error)

func (*PublicationContentIterator) HasPrevious

func (it *PublicationContentIterator) HasPrevious() (bool, error)

func (*PublicationContentIterator) Next

func (*PublicationContentIterator) Previous

type ResourceContentIteratorFactory

type ResourceContentIteratorFactory = func(fetcher.Resource, manifest.Locator) Iterator
