parse

package
v0.0.0-...-8ee748d Latest
Warning

This package is not in the latest version of its module.

Published: Apr 18, 2024 License: MIT Imports: 5 Imported by: 0

Documentation


Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type MediaWikiParser

type MediaWikiParser struct{}

func (MediaWikiParser) ParsePage

func (_ MediaWikiParser) ParsePage(res scrape.Response) (*Page, error)

ParsePage parses the raw HTML of a MediaWikiResponse and returns a parse.Page instance.

Can error when:

  • The content in the response is not valid HTML

TODO: Add table parsing support

func (MediaWikiParser) ParseSection

func (_ MediaWikiParser) ParseSection(res scrape.Response, heading string) (*Page, error)

ParseSection parses the raw HTML of a MediaWikiResponse and searches for a section that contains the heading given by the heading argument. If the section is found, it returns a parse.Page containing only that section (the heading and its corresponding body text).

Can error when:

  • The content in the response is not valid HTML

  • The heading is not found.

FIX: Add support for querying the intro section, either by taking a flag argument in this function or by checking at the start whether the intro is what the caller wants. This could cause problems if "Introduction" clashes with the name of an existing section.

type Page

type Page struct {
	Title    string
	Document *goquery.Document
	Sections []*Section
}

Page represents a wiki- and backend-agnostic container for storing the content of a wiki page.

type Parser

type Parser interface {
	ParsePage(res scrape.Response) (*Page, error)
	// ParseSection should still return the page the section belongs to, just with a single section
	ParseSection(res scrape.Response, heading string) (*Page, error)
}

Parser denotes the methods one should implement on a parser struct for a specific wiki in order to handle parsing for different formats.
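As a rough sketch of what implementing this interface for another format might look like, the example below defines a toy parser. Everything beyond the interface shape is an assumption: Response stands in for scrape.Response (its Body field is invented here), Page omits the Document field, and lineParser with its "# heading" line format is purely hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Response is a stand-in for scrape.Response, whose fields are not
// shown in this documentation. Body is an assumed field.
type Response struct {
	Body string
}

type Section struct {
	Heading string
	Index   int
	Content string
}

// Page omits the Document field to avoid the goquery dependency.
type Page struct {
	Title    string
	Sections []*Section
}

// Parser has the documented shape, using the stand-in Response.
type Parser interface {
	ParsePage(res Response) (*Page, error)
	ParseSection(res Response, heading string) (*Page, error)
}

// lineParser is a hypothetical parser for a toy format in which a
// line starting with "# " opens a new section.
type lineParser struct{}

// Compile-time check that lineParser satisfies Parser.
var _ Parser = lineParser{}

func (lineParser) ParsePage(res Response) (*Page, error) {
	page := &Page{}
	var cur *Section
	for _, line := range strings.Split(res.Body, "\n") {
		if strings.HasPrefix(line, "# ") {
			cur = &Section{
				Heading: strings.TrimPrefix(line, "# "),
				Index:   len(page.Sections),
			}
			page.Sections = append(page.Sections, cur)
		} else if cur != nil && line != "" {
			cur.Content += line + "\n"
		}
	}
	return page, nil
}

// ParseSection parses the whole page, then keeps only the match.
func (p lineParser) ParseSection(res Response, heading string) (*Page, error) {
	page, err := p.ParsePage(res)
	if err != nil {
		return nil, err
	}
	for _, s := range page.Sections {
		if s.Heading == heading {
			page.Sections = []*Section{s}
			return page, nil
		}
	}
	return nil, fmt.Errorf("heading %q not found", heading)
}

func main() {
	var p Parser = lineParser{}
	res := Response{Body: "# Intro\nhello\n# Usage\nrun it\n"}
	page, err := p.ParseSection(res, "Usage")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%d section(s): %s\n", len(page.Sections), page.Sections[0].Heading)
}
```

Callers that hold a Parser value do not need to know which wiki format sits behind it, which appears to be the point of the interface.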

type ParsingError

type ParsingError struct {
	Code string
	Info string
}

ParsingError represents an error encountered during the parsing of a goquery document generated from a scrape.Response body.

func (*ParsingError) Error

func (e *ParsingError) Error() string

Error returns a formatted ParsingError message that includes the code and the additional information.

type Section

type Section struct {
	Heading string
	Index   int
	Content string
}

Section represents a wiki- and backend-agnostic container for storing the contents of a single section of a wiki page.
