gofeed

package module
v1.3.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 1, 2024 License: MIT Imports: 18 Imported by: 1,053

README

gofeed

Build Status Coverage Status Go Report Card License

Gofeed: A Robust Feed Parser for Golang



gofeed is a powerful and flexible library designed for parsing RSS, Atom, and JSON feeds across various formats and versions. It effectively manages non-standard elements and known extensions, and demonstrates resilience against common feed issues.

Table of Contents

Features

Comprehensive Feed Support
  • RSS (0.90 to 2.0)
  • Atom (0.3, 1.0)
  • JSON (1.0, 1.1)
Handling Invalid Feeds

gofeed takes a best-effort approach to deal with broken or invalid XML feeds, capable of handling issues like:

  • Unescaped markup
  • Undeclared namespace prefixes
  • Missing or illegal tags
  • Incorrect date formats
  • ...and more.
Extension Support

gofeed treats elements outside the feed's default namespace as extensions, storing them in tree-like structures under Feed.Extensions and Item.Extensions. This feature allows you to access custom extension elements easily.

Built-In Support for Popular Extensions For added convenience, gofeed includes native support for parsing certain well-known extensions into dedicated structs. Currently, it supports:

  • Dublin Core: Accessible via Feed.DublinCoreExt and Item.DublinCoreExt
  • Apple iTunes: Accessible via Feed.ITunesExt and Item.ITunesExt

Overview

In gofeed, you have two primary choices for feed parsing: a universal parser for handling multiple feed types seamlessly, and specialized parsers for more granular control over individual feed types.

Universal Feed Parser

The universal gofeed.Parser is designed to make it easy to work with various types of feeds—RSS, Atom, JSON—by converting them into a unified gofeed.Feed model. This is especially useful when you're dealing with multiple feed formats and you want to treat them the same way.

The universal parser uses built-in translators like DefaultRSSTranslator, DefaultAtomTranslator, and DefaultJSONTranslator to convert between the specific feed types and the universal feed. Not happy with the defaults? Implement your own gofeed.Translator to tailor the translation process to your needs.

Specialized Feed Parsers: RSS, Atom, JSON

Alternatively, if your focus is on a single feed type, then using a specialized parser offers advantages in terms of performance and granularity. For example, if you're interested solely in RSS feeds, you can use rss.Parser directly. These feed-specific parsers map fields to their corresponding models, ensuring names and structures that match the feed type exactly.

Basic Usage

Universal Feed Parser

Here's how to parse feeds using gofeed.Parser:

From a URL
fp := gofeed.NewParser()
feed, _ := fp.ParseURL("http://feeds.twit.tv/twit.xml")
fmt.Println(feed.Title)
From a String
feedData := `<rss version="2.0">
<channel>
<title>Sample Feed</title>
</channel>
</rss>`
fp := gofeed.NewParser()
feed, _ := fp.ParseString(feedData)
fmt.Println(feed.Title)
From an io.Reader
file, _ := os.Open("/path/to/a/file.xml")
defer file.Close()
fp := gofeed.NewParser()
feed, _ := fp.Parse(file)
fmt.Println(feed.Title)
From a URL with a 60s Timeout
ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
defer cancel()
fp := gofeed.NewParser()
feed, _ := fp.ParseURLWithContext("http://feeds.twit.tv/twit.xml", ctx)
fmt.Println(feed.Title)
From a URL with a Custom User-Agent
fp := gofeed.NewParser()
fp.UserAgent = "MyCustomAgent 1.0"
feed, _ := fp.ParseURL("http://feeds.twit.tv/twit.xml")
fmt.Println(feed.Title)
Feed Specific Parsers

If you have a usage scenario that requires a specialized parser:

RSS Feed
feedData := `<rss version="2.0">
<channel>
<webMaster>example@site.com (Example Name)</webMaster>
</channel>
</rss>`
fp := rss.Parser{}
rssFeed, _ := fp.Parse(strings.NewReader(feedData))
fmt.Println(rssFeed.WebMaster)
Atom Feed
feedData := `<feed xmlns="http://www.w3.org/2005/Atom">
<subtitle>Example Atom</subtitle>
</feed>`
fp := atom.Parser{}
atomFeed, _ := fp.Parse(strings.NewReader(feedData))
fmt.Println(atomFeed.Subtitle)
JSON Feed
feedData := `{"version":"1.0", "home_page_url": "https://daringfireball.net"}`
fp := json.Parser{}
jsonFeed, _ := fp.Parse(strings.NewReader(feedData))
fmt.Println(jsonFeed.HomePageURL)

Advanced Usage

With Basic Authentication
fp := gofeed.NewParser()
fp.AuthConfig = &gofeed.Auth{
  Username: "foo",
  Password: "bar",
}
Using Custom Translators for Advanced Parsing

If you need more control over how fields are parsed and prioritized, you can specify your own custom translator. Below is an example that shows how to create a custom translator to give the /rss/channel/itunes:author field higher precedence than the /rss/channel/managingEditor field in RSS feeds.

Step 1: Define Your Custom Translator

First, we'll create a new type that embeds the default RSS translator provided by the library. We'll override its Translate method to implement our custom logic.

type MyCustomTranslator struct {
  defaultTranslator *gofeed.DefaultRSSTranslator
}

func NewMyCustomTranslator() *MyCustomTranslator {
  t := &MyCustomTranslator{}
  t.defaultTranslator = &gofeed.DefaultRSSTranslator{}
  return t
}

func (ct *MyCustomTranslator) Translate(feed interface{}) (*gofeed.Feed, error) {
  rss, found := feed.(*rss.Feed)
  if !found {
    return nil, fmt.Errorf("Feed did not match expected type of *rss.Feed")
  }

  f, err := ct.defaultTranslator.Translate(rss)
  if err != nil {
    return nil, err
  }

  // Custom logic to prioritize iTunes Author over Managing Editor
  if rss.ITunesExt != nil && rss.ITunesExt.Author != "" {
    f.Author = rss.ITunesExt.Author
  } else {
    f.Author = rss.ManagingEditor
  }
  
  return f, nil
}
Step 2: Use Your Custom Translator

Once your custom translator is defined, you can tell gofeed.Parser to use it instead of the default one.

feedData := `<rss version="2.0">
<channel>
<managingEditor>Ender Wiggin</managingEditor>
<itunes:author>Valentine Wiggin</itunes:author>
</channel>
</rss>`

fp := gofeed.NewParser()
fp.RSSTranslator = NewMyCustomTranslator()
feed, _ := fp.ParseString(feedData)
fmt.Println(feed.Author) // Valentine Wiggin

Dependencies

License

This project is licensed under the MIT License

Credits

Documentation

Index

Examples

Constants

This section is empty.

Variables

View Source
var ErrFeedTypeNotDetected = errors.New("Failed to detect feed type")

ErrFeedTypeNotDetected is returned when the detection system can not figure out the Feed format

Functions

This section is empty.

Types

type Auth added in v1.2.0

type Auth struct {
	Username string
	Password string
}

Auth is a structure allowing to use the BasicAuth during the HTTP request It must be instantiated with your new Parser

type DefaultAtomTranslator

type DefaultAtomTranslator struct{}

DefaultAtomTranslator converts an atom.Feed struct into the generic Feed struct.

This default implementation defines a set of mapping rules between atom.Feed -> Feed for each of the fields in Feed.

func (*DefaultAtomTranslator) Translate

func (t *DefaultAtomTranslator) Translate(feed interface{}) (*Feed, error)

Translate converts an Atom feed into the universal feed type.

type DefaultJSONTranslator added in v1.1.0

type DefaultJSONTranslator struct{}

DefaultJSONTranslator converts an json.Feed struct into the generic Feed struct.

This default implementation defines a set of mapping rules between json.Feed -> Feed for each of the fields in Feed.

func (*DefaultJSONTranslator) Translate added in v1.1.0

func (t *DefaultJSONTranslator) Translate(feed interface{}) (*Feed, error)

Translate converts an JSON feed into the universal feed type.

type DefaultRSSTranslator

type DefaultRSSTranslator struct{}

DefaultRSSTranslator converts an rss.Feed struct into the generic Feed struct.

This default implementation defines a set of mapping rules between rss.Feed -> Feed for each of the fields in Feed.

func (*DefaultRSSTranslator) Translate

func (t *DefaultRSSTranslator) Translate(feed interface{}) (*Feed, error)

Translate converts an RSS feed into the universal feed type.

type Enclosure

type Enclosure struct {
	URL    string `json:"url,omitempty"`
	Length string `json:"length,omitempty"`
	Type   string `json:"type,omitempty"`
}

Enclosure is a file associated with a given Item.

type Feed

type Feed struct {
	Title           string                   `json:"title,omitempty"`
	Description     string                   `json:"description,omitempty"`
	Link            string                   `json:"link,omitempty"`
	FeedLink        string                   `json:"feedLink,omitempty"`
	Links           []string                 `json:"links,omitempty"`
	Updated         string                   `json:"updated,omitempty"`
	UpdatedParsed   *time.Time               `json:"updatedParsed,omitempty"`
	Published       string                   `json:"published,omitempty"`
	PublishedParsed *time.Time               `json:"publishedParsed,omitempty"`
	Author          *Person                  `json:"author,omitempty"` // Deprecated: Use feed.Authors instead
	Authors         []*Person                `json:"authors,omitempty"`
	Language        string                   `json:"language,omitempty"`
	Image           *Image                   `json:"image,omitempty"`
	Copyright       string                   `json:"copyright,omitempty"`
	Generator       string                   `json:"generator,omitempty"`
	Categories      []string                 `json:"categories,omitempty"`
	DublinCoreExt   *ext.DublinCoreExtension `json:"dcExt,omitempty"`
	ITunesExt       *ext.ITunesFeedExtension `json:"itunesExt,omitempty"`
	Extensions      ext.Extensions           `json:"extensions,omitempty"`
	Custom          map[string]string        `json:"custom,omitempty"`
	Items           []*Item                  `json:"items"`
	FeedType        string                   `json:"feedType"`
	FeedVersion     string                   `json:"feedVersion"`
}

Feed is the universal Feed type that atom.Feed and rss.Feed gets translated to. It represents a web feed. Sorting with sort.Sort will order the Items by oldest to newest publish time.

func (Feed) Len

func (f Feed) Len() int

Len returns the length of Items.

func (Feed) Less

func (f Feed) Less(i, k int) bool

Less compares PublishedParsed of Items[i], Items[k] and returns true if Items[i] is less than Items[k].

func (Feed) String

func (f Feed) String() string

func (Feed) Swap

func (f Feed) Swap(i, k int)

Swap swaps Items[i] and Items[k].

type FeedType

type FeedType int

FeedType represents one of the possible feed types that we can detect.

const (
	// FeedTypeUnknown represents a feed that could not have its
	// type determiend.
	FeedTypeUnknown FeedType = iota
	// FeedTypeAtom repesents an Atom feed
	FeedTypeAtom
	// FeedTypeRSS represents an RSS feed
	FeedTypeRSS
	// FeedTypeJSON represents a JSON feed
	FeedTypeJSON
)

func DetectFeedType

func DetectFeedType(feed io.Reader) FeedType

DetectFeedType attempts to determine the type of feed by looking for specific xml elements unique to the various feed types.

Example
package main

import (
	"fmt"
	"strings"

	"github.com/mmcdole/gofeed"
)

func main() {
	feedData := `<rss version="2.0">
<channel>
<title>Sample Feed</title>
</channel>
</rss>`
	feedType := gofeed.DetectFeedType(strings.NewReader(feedData))
	if feedType == gofeed.FeedTypeRSS {
		fmt.Println("Wow! This is an RSS feed!")
	}
}
Output:

type HTTPError

type HTTPError struct {
	StatusCode int
	Status     string
}

HTTPError represents an HTTP error returned by a server.

func (HTTPError) Error

func (err HTTPError) Error() string

type Image

type Image struct {
	URL   string `json:"url,omitempty"`
	Title string `json:"title,omitempty"`
}

Image is an image that is the artwork for a given feed or item.

type Item

type Item struct {
	Title           string                   `json:"title,omitempty"`
	Description     string                   `json:"description,omitempty"`
	Content         string                   `json:"content,omitempty"`
	Link            string                   `json:"link,omitempty"`
	Links           []string                 `json:"links,omitempty"`
	Updated         string                   `json:"updated,omitempty"`
	UpdatedParsed   *time.Time               `json:"updatedParsed,omitempty"`
	Published       string                   `json:"published,omitempty"`
	PublishedParsed *time.Time               `json:"publishedParsed,omitempty"`
	Author          *Person                  `json:"author,omitempty"` // Deprecated: Use item.Authors instead
	Authors         []*Person                `json:"authors,omitempty"`
	GUID            string                   `json:"guid,omitempty"`
	Image           *Image                   `json:"image,omitempty"`
	Categories      []string                 `json:"categories,omitempty"`
	Enclosures      []*Enclosure             `json:"enclosures,omitempty"`
	DublinCoreExt   *ext.DublinCoreExtension `json:"dcExt,omitempty"`
	ITunesExt       *ext.ITunesItemExtension `json:"itunesExt,omitempty"`
	Extensions      ext.Extensions           `json:"extensions,omitempty"`
	Custom          map[string]string        `json:"custom,omitempty"`
}

Item is the universal Item type that atom.Entry and rss.Item gets translated to. It represents a single entry in a given feed.

type Parser

type Parser struct {
	AtomTranslator Translator
	RSSTranslator  Translator
	JSONTranslator Translator
	UserAgent      string
	AuthConfig     *Auth
	Client         *http.Client
	// contains filtered or unexported fields
}

Parser is a universal feed parser that detects a given feed type, parsers it, and translates it to the universal feed type.

func NewParser

func NewParser() *Parser

NewParser creates a universal feed parser.

func (*Parser) Parse

func (f *Parser) Parse(feed io.Reader) (*Feed, error)

Parse parses a RSS or Atom or JSON feed into the universal gofeed.Feed. It takes an io.Reader which should return the xml/json content.

Example
package main

import (
	"fmt"
	"strings"

	"github.com/mmcdole/gofeed"
)

func main() {
	feedData := `<rss version="2.0">
<channel>
<title>Sample Feed</title>
</channel>
</rss>`
	fp := gofeed.NewParser()
	feed, err := fp.Parse(strings.NewReader(feedData))
	if err != nil {
		panic(err)
	}
	fmt.Println(feed.Title)
}
Output:

func (*Parser) ParseString

func (f *Parser) ParseString(feed string) (*Feed, error)

ParseString parses a feed XML string and into the universal feed type.

Example
package main

import (
	"fmt"

	"github.com/mmcdole/gofeed"
)

func main() {
	feedData := `<rss version="2.0">
<channel>
<title>Sample Feed</title>
</channel>
</rss>`
	fp := gofeed.NewParser()
	feed, err := fp.ParseString(feedData)
	if err != nil {
		panic(err)
	}
	fmt.Println(feed.Title)
}
Output:

func (*Parser) ParseURL

func (f *Parser) ParseURL(feedURL string) (feed *Feed, err error)

ParseURL fetches the contents of a given url and attempts to parse the response into the universal feed type.

Example
package main

import (
	"fmt"

	"github.com/mmcdole/gofeed"
)

func main() {
	fp := gofeed.NewParser()
	feed, err := fp.ParseURL("http://feeds.twit.tv/twit.xml")
	if err != nil {
		panic(err)
	}
	fmt.Println(feed.Title)
}
Output:

func (*Parser) ParseURLWithContext

func (f *Parser) ParseURLWithContext(feedURL string, ctx context.Context) (feed *Feed, err error)

ParseURLWithContext fetches contents of a given url and attempts to parse the response into the universal feed type. You can instantiate the Auth structure with your Username and Password to use the BasicAuth during the HTTP call. It will be automatically added to the header of the request Request could be canceled or timeout via given context

type Person

type Person struct {
	Name  string `json:"name,omitempty"`
	Email string `json:"email,omitempty"`
}

Person is an individual specified in a feed (e.g. an author)

type Translator

type Translator interface {
	Translate(feed interface{}) (*Feed, error)
}

Translator converts a particular feed (atom.Feed or rss.Feed of json.Feed) into the generic Feed struct

Directories

Path Synopsis
cmd
internal

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL