rss

package module
v1.0.5
Warning: This package is not in the latest version of its module.

Published: Sep 20, 2022 License: BSD-3-Clause Imports: 14 Imported by: 84

README

rss

RSS is a small library for simplifying the parsing of RSS and Atom feeds. The package could do with more testing, but it conforms to the RSS 1.0, 2.0, and Atom 1.0 specifications, to the best of my ability. I've tested it with about 15 different feeds, and it seems to work fine with them.

If anyone has any problems with feeds being parsed incorrectly, please let me know so that I can debug and improve the package.

Dependencies:

go get github.com/axgle/mahonia

Example usage:

package main

import (
	"log"

	"github.com/SlyMarbo/rss"
)

func main() {
	feed, err := rss.Fetch("http://example.com/rss")
	if err != nil {
		// Handle the error; feed is nil here, so it must not be used.
		log.Fatal(err)
	}

	// ... Some time later ...

	err = feed.Update()
	if err != nil {
		// Handle the error.
		log.Fatal(err)
	}
}

The output structure is pretty much as you'd expect:

type Feed struct {
	Nickname    string              // This is not set by the package, but could be helpful.
	Title       string
	Description string
	Link        string              // Link to the creator's website.
	UpdateURL   string              // URL of the feed itself.
	Image       *Image              // Feed icon.
	Items       []*Item
	ItemMap     map[string]struct{} // Used in checking whether an item has been seen before.
	Refresh     time.Time           // Earliest time this feed should next be checked.
	Unread      uint32              // Number of unread items. Used by aggregators.
}

type Item struct {
	Title     string
	Summary   string
	Content   string
	Link      string
	Date      time.Time
	DateValid bool
	ID        string
	Read      bool
}

type Image struct {
	Title   string
	URL     string
	Height  uint32
	Width   uint32
}

The library does its best to follow the appropriate specifications and not to set the Refresh time too soon. It currently supports all of the update-scheduling mechanisms in the RSS 1.0, 2.0, and Atom 1.0 specifications. If a feed does not specify an update interval, the library defaults to 12-hour intervals (see DefaultRefreshInterval). If you are having issues with feed providers dropping connections, please let me know and I can increase this default, or you can increase the Refresh time manually. The Feed.Update method respects this Refresh time, so if Update seems to return very quickly with no new items, it is most likely not making a request at all because the feed's Refresh time has not been reached yet.

The project is not proactively maintained, but I'll respond to issues and PRs as soon as I can.

Documentation

Overview

Package rss is a small library for simplifying the parsing of RSS and Atom feeds.

Constants

const DATE = "15:04:05 MST 02/01/2006"

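DATE is a date-time layout string in Go's reference-time notation: the time and zone come first (15:04:05 MST), followed by the date in day/month/year order (02/01/2006).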

Variables

var DefaultFetchFunc = func(url string) (resp *http.Response, err error) {
	client := http.DefaultClient
	return client.Get(url)
}

DefaultFetchFunc uses http.DefaultClient to fetch a feed.

var DefaultRefreshInterval = 12 * time.Hour

DefaultRefreshInterval is the minimum wait until the next refresh, used when the feed does not provide its own interval.

Setting this too high will delay the arrival of new items; setting it too low will put excessive load on the feed hosts.

The default value is 12 hours.

var TimeLayouts = []string{
	"Mon, 2 Jan 2006 15:04:05 Z",
	"Mon, 2 Jan 2006 15:04:05",
	"Mon, 2 Jan 2006 15:04:05 -0700",
	"Mon, 2 Jan 06 15:04:05 -0700",
	"Mon, 2 Jan 06 15:04:05",
	"2 Jan 2006 15:04:05 -0700",
	"2 Jan 2006 15:04:05",
	"2 Jan 06 15:04:05 -0700",
	"2006-01-02 15:04:05 -0700",
	"2006-01-02 15:04:05",
	time.ANSIC,
	time.UnixDate,
	time.RubyDate,
	time.RFC822Z,
	time.RFC1123Z,
	time.RFC3339,
	time.RFC3339Nano,

	"2 Jan 2006 15:04:05 -0700 MST",
	"2 Jan 2006 15:04:05 MST -0700",
	"Mon, 2 Jan 2006 15:04:05 MST -0700",
	"Mon, 2 Jan 2006 15:04:05 -0700 MST",
	"2 Jan 06 15:04:05 -0700 MST",
	"2 Jan 06 15:04:05 MST -0700",
	"Jan 2, 2006 15:04 PM -0700 MST",
	"Jan 2, 2006 15:04 PM MST -0700",
	"Jan 2, 06 15:04 PM MST -0700",
	"Jan 2, 06 15:04 PM -0700 MST",
}

TimeLayouts contains a list of time.Parse() layouts that are used to convert item.Date and item.PubDate strings to time.Time values. The layouts are tried in order until time.Parse() succeeds or every layout has been attempted.
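The first-match strategy described above can be sketched as follows. `layouts` is a three-entry excerpt of TimeLayouts and `parseFirst` is a hypothetical helper; the package's own parseTime also consults TimeLayoutsLoadLocation and may differ in other details.

```go
package main

import (
	"fmt"
	"time"
)

// layouts is a short excerpt of rss.TimeLayouts, copied here so that the
// sketch compiles on its own.
var layouts = []string{
	"Mon, 2 Jan 2006 15:04:05 -0700",
	"2006-01-02 15:04:05",
	time.RFC3339,
}

// parseFirst is a hypothetical helper that tries each layout in order and
// returns the result of the first one that parses without error.
func parseFirst(s string) (time.Time, error) {
	for _, layout := range layouts {
		if t, err := time.Parse(layout, s); err == nil {
			return t, nil
		}
	}
	return time.Time{}, fmt.Errorf("no layout matched %q", s)
}

func main() {
	for _, s := range []string{
		"Tue, 20 Sep 2022 13:05:00 +0200", // matches the first layout
		"2022-09-20T13:05:00Z",            // falls through to time.RFC3339
		"20/09/2022",                      // matches nothing
	} {
		t, err := parseFirst(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(t.UTC().Format(time.RFC3339))
	}
}
```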

var TimeLayoutsLoadLocation = []string{
	"Mon, 2 Jan 2006 15:04:05 MST",
	"Mon, 2 Jan 06 15:04:05 MST",
	"2 Jan 2006 15:04:05 MST",
	"2 Jan 06 15:04:05 MST",
	"Jan 2, 2006 15:04 PM MST",
	"Jan 2, 06 15:04 PM MST",

	time.RFC1123,
	time.RFC850,
	time.RFC822,
}

TimeLayoutsLoadLocation lists time layouts that identify the zone by abbreviation (MST) rather than by a fixed numeric offset (-0700). Go's time.Parse does not resolve an unfamiliar zone abbreviation to its real offset, so parseTime calls `time.LoadLocation(t.Location().String())` and then applies the offset of the loaded location to the result.

Functions

This section is empty.

Types

type Enclosure

type Enclosure struct {
	URL    string `json:"url"`
	Type   string `json:"type"`
	Length uint   `json:"length"`
}

Enclosure maps an enclosure.

func (*Enclosure) Get

func (e *Enclosure) Get() (io.ReadCloser, error)

Get uses http.Get to fetch an enclosure.

type Feed

type Feed struct {
	Nickname    string              `json:"nickname"` // This is not set by the package, but could be helpful.
	Title       string              `json:"title"`
	Language    string              `json:"language"`
	Author      string              `json:"author"`
	Description string              `json:"description"`
	Link        string              `json:"link"`      // Link to the creator's website.
	UpdateURL   string              `json:"updateurl"` // URL of the feed itself.
	Image       *Image              `json:"image"`     // Feed icon.
	Categories  []string            `json:"categories"`
	Items       []*Item             `json:"items"`
	ItemMap     map[string]struct{} `json:"itemmap"` // Used in checking whether an item has been seen before.
	Refresh     time.Time           `json:"refresh"` // Earliest time this feed should next be checked.
	Unread      uint32              `json:"unread"`  // Number of unread items. Used by aggregators.
	FetchFunc   FetchFunc           `json:"-"`
}

Feed is the top-level structure.

func Fetch

func Fetch(url string) (*Feed, error)

Fetch downloads and parses the RSS feed at the given URL.

func FetchByClient

func FetchByClient(url string, client *http.Client) (*Feed, error)

FetchByClient uses the given http.Client to fetch and parse the feed at url.

func FetchByFunc

func FetchByFunc(fetchFunc FetchFunc, url string) (*Feed, error)

FetchByFunc uses fetchFunc to fetch the feed at url and then parses it.

func Parse

func Parse(data []byte) (*Feed, error)

Parse RSS or Atom data.
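Parse accepts either format, so the input has to be classified somehow. A common approach is to look at the root element: `<rss>` for RSS 2.0 (and 0.91/0.92), `<rdf:RDF>` for RSS 1.0, and `<feed>` for Atom 1.0. The sketch below shows that approach with encoding/xml; `feedFormat` is hypothetical and is not necessarily how Parse makes the decision.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

// feedFormat is a hypothetical helper that classifies a feed by the local
// name of its root element: "rss", "rss1" (RDF) or "atom".
func feedFormat(data []byte) (string, error) {
	dec := xml.NewDecoder(bytes.NewReader(data))
	for {
		tok, err := dec.Token()
		if err != nil {
			return "", err // includes io.EOF for input with no elements
		}
		start, ok := tok.(xml.StartElement)
		if !ok {
			continue // skip the XML declaration, comments, directives and whitespace
		}
		switch start.Name.Local {
		case "rss":
			return "rss", nil
		case "RDF":
			return "rss1", nil
		case "feed":
			return "atom", nil
		default:
			return "", fmt.Errorf("unrecognised root element <%s>", start.Name.Local)
		}
	}
}

func main() {
	docs := []string{
		`<?xml version="1.0"?><rss version="2.0"><channel></channel></rss>`,
		`<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/"></rdf:RDF>`,
		`<feed xmlns="http://www.w3.org/2005/Atom"></feed>`,
	}
	for _, d := range docs {
		kind, err := feedFormat([]byte(d))
		fmt.Println(kind, err)
	}
}
```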

func (*Feed) String

func (f *Feed) String() string

func (*Feed) Update

func (f *Feed) Update() error

Update fetches any new items and updates f.
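The ItemMap and Unread fields suggest how an update can avoid re-adding stories it has already seen. The sketch below is a hypothetical illustration, not the package's code: `feed`, `item` and `mergeNew` are trimmed-down stand-ins, and it assumes for illustration that ItemMap is keyed by item ID and that new items are appended.

```go
package main

import "fmt"

// item and feed are trimmed-down, hypothetical stand-ins for rss.Item and
// rss.Feed.
type item struct {
	ID    string
	Title string
	Read  bool
}

type feed struct {
	Items   []*item
	ItemMap map[string]struct{} // IDs already seen (keying by ID is an assumption)
	Unread  uint32
}

// mergeNew appends only items whose ID has not been seen before, counts
// each of them as unread, and returns how many were added.
func (f *feed) mergeNew(fetched []*item) int {
	if f.ItemMap == nil {
		f.ItemMap = make(map[string]struct{})
	}
	added := 0
	for _, it := range fetched {
		if _, seen := f.ItemMap[it.ID]; seen {
			continue
		}
		f.ItemMap[it.ID] = struct{}{}
		f.Items = append(f.Items, it)
		f.Unread++
		added++
	}
	return added
}

func main() {
	f := &feed{}
	f.mergeNew([]*item{{ID: "a", Title: "First"}, {ID: "b", Title: "Second"}})
	n := f.mergeNew([]*item{{ID: "b", Title: "Second"}, {ID: "c", Title: "Third"}})
	fmt.Println(n, len(f.Items), f.Unread) // 1 3 3
}
```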

func (*Feed) UpdateByFunc

func (f *Feed) UpdateByFunc(fetchFunc FetchFunc) error

UpdateByFunc is like Update, but uses fetchFunc to fetch the feed before updating f.

type FetchFunc

type FetchFunc func(url string) (resp *http.Response, err error)

A FetchFunc is a function that fetches a feed for the given URL.

type Image

type Image struct {
	Title  string `json:"title"`
	Href   string `json:"href"`
	URL    string `json:"url"`
	Height uint32 `json:"height"`
	Width  uint32 `json:"width"`
}

Image maps an image.

func (*Image) Get

func (i *Image) Get() (io.ReadCloser, error)

Get uses http.Get to fetch an image.

func (*Image) String

func (i *Image) String() string

type Item

type Item struct {
	Title      string    `json:"title"`
	Summary    string    `json:"summary"`
	Content    string    `json:"content"`
	Categories []string  `json:"category"`
	Link       string    `json:"link"`
	Date       time.Time `json:"date"`
	Image      *Image    `json:"image"`
	DateValid  bool
	ID         string       `json:"id"`
	Enclosures []*Enclosure `json:"enclosures"`
	Read       bool         `json:"read"`
}

Item represents a single story.

func (*Item) Format

func (i *Item) Format(indent int) string

Format formats an item using tabs.

func (*Item) String

func (i *Item) String() string

type RAWContent

type RAWContent struct {
	RAWContent string `xml:",innerxml"`
}
