Documentation
¶
Overview ¶
htmltable enables structured data extraction from HTML tables and URLs
Index ¶
- Variables
- func NewSlice[T any](ctx context.Context, r io.Reader) ([]T, error)
- func NewSliceFromPage[T any](p *Page) ([]T, error)
- func NewSliceFromResponse[T any](resp *http.Response) ([]T, error)
- func NewSliceFromString[T any](in string) ([]T, error)
- func NewSliceFromURL[T any](url string) ([]T, error)
- type Page
- type Table
Examples ¶
Constants ¶
This section is empty.
Variables ¶
var Logger func(_ context.Context, msg string, fields ...any)
Logger is a very simplistic structured logger, than should be overriden by integrations.
Functions ¶
func NewSliceFromPage ¶ added in v0.4.0
NewSliceFromPage finds a table matching the slice and returns the slice
func NewSliceFromResponse ¶
NewSliceFromString is same as NewSlice(context.Context, io.Reader), but takes just an http.Response
func NewSliceFromString ¶
NewSliceFromString is same as NewSlice(context.Context, io.Reader), but takes just a string.
func NewSliceFromURL ¶
NewSliceFromString is same as NewSlice(context.Context, io.Reader), but takes just an URL.
Example (RowspansAndColspans) ¶
type AM4 struct { Model string `header:"Model"` ReleaseDate string `header:"Release date"` PCIeSupport string `header:"PCIesupport[a]"` MultiGpuCrossFire bool `header:"Multi-GPU CrossFire"` MultiGpuSLI bool `header:"Multi-GPU SLI"` USBSupport string `header:"USBsupport[b]"` SATAPorts int `header:"Storage features SATAports"` RAID string `header:"Storage features RAID"` AMDStoreMI bool `header:"Storage features AMD StoreMI"` Overclocking string `header:"Processoroverclocking"` TDP string `header:"TDP"` SupportExcavator string `header:"CPU support Excavator"` SupportZen string `header:"CPU support Zen"` SupportZenPlus string `header:"CPU support Zen+"` SupportZen2 string `header:"CPU support Zen 2"` SupportZen3 string `header:"CPU support Zen 3"` Architecture string `header:"Architecture"` } am4Chipsets, _ := htmltable.NewSliceFromURL[AM4]("https://en.wikipedia.org/wiki/List_of_AMD_chipsets") fmt.Println(am4Chipsets[2].Model) fmt.Println(am4Chipsets[2].SupportZen2)
Output: X370 Varies[c]
Types ¶
type Page ¶ added in v0.2.0
type Page struct { Tables []*Table // contains filtered or unexported fields }
Page is the container for all tables parseable
func NewFromResponse ¶
NewFromResponse is same as New(ctx.Context, io.Reader), but from http.Response.
In case of failure, returns `ResponseError`, that could be further inspected.
func NewFromString ¶
NewFromString is same as New(ctx.Context, io.Reader), but from string
Example ¶
page, _ := htmltable.NewFromString(`<body> <h1>foo</h2> <table> <tr><td>a</td><td>b</td></tr> <tr><td> 1 </td><td>2</td></tr> <tr><td>3 </td><td>4 </td></tr> </table> <h1>bar</h2> <table> <tr><th>b</th><th>c</th><th>d</th></tr> <tr><td>1</td><td>2</td><td>5</td></tr> <tr><td>3</td><td>4</td><td>6</td></tr> </table> </body>`) fmt.Printf("found %d tables\n", page.Len()) _ = page.Each2("c", "d", func(c, d string) error { fmt.Printf("c:%s d:%s\n", c, d) return nil })
Output: found 2 tables c:2 d:5 c:4 d:6
func NewFromURL ¶
NewFromURL is same as New(ctx.Context, io.Reader), but from URL.
In case of failure, returns `ResponseError`, that could be further inspected.
Example ¶
page, _ := htmltable.NewFromURL("https://en.wikipedia.org/wiki/List_of_S%26P_500_companies") _, err := page.FindWithColumns("invalid", "column", "names") fmt.Println(err)
Output: cannot find table with columns: invalid, column, names
func (*Page) Each ¶ added in v0.2.0
Each row would call func with the value of the table cell from the column specified in the first argument.
Returns an error if table has no matching column name.
func (*Page) Each2 ¶ added in v0.2.0
Each2 will get two columns specified in the first two arguments and call the func with those values for every row in the table.
Returns an error if table has no matching column names.
func (*Page) Each3 ¶ added in v0.2.0
Each3 will get three columns specified in the first three arguments and call the func with those values for every row in the table.
Returns an error if table has no matching column names.
func (*Page) FindWithColumns ¶ added in v0.2.0
FindWithColumns performs fuzzy matching of tables by given header column names