feedpushr
A simple feed aggregator daemon with sugar on top.
Features
- Single executable with an embedded database.
- Manage feed subscriptions.
- Import/Export feed subscriptions with OPML files.
- Aggressive and tunable aggregation process.
- Manage feed aggregation individually.
- Apply modifications on articles with a pluggable filter system.
- Push new articles to a pluggable output system (STDOUT, HTTP endpoint, ...).
- Use tags to customize the pipeline.
- Support for PubSubHubbub, the open, simple, web-scale and
decentralized pubsub protocol.
- REST API with complete OpenAPI documentation.
- Full-featured Web UI and CLI to interact with the daemon's API.
- Metrics production for monitoring.
Installation
Run the following command:
$ go get -v github.com/ncarlier/feedpushr
Or download the binary for your architecture:
$ sudo curl -s https://raw.githubusercontent.com/ncarlier/feedpushr/master/install.sh | bash
Or use Docker:
$ docker run -d --name=feedpushr ncarlier/feedpushr
Configuration
You can configure the daemon by setting environment variables:
| Variable | Default | Description |
|----------|---------|-------------|
| APP_ADDR | :8080 | HTTP server address |
| APP_PUBLIC_URL | none | Public URL used by PubSubHubbub hubs. PSHB is disabled if not set. |
| APP_STORE | boltdb://data.db | Data store location (BoltDB file) |
| APP_FILTERS | none | Filter chain (ex: foo://,fetch://) |
| APP_OUTPUTS | stdout:// | Output destinations (stdout://,http://example.org) |
| APP_DELAY | 1m | Delay between aggregations (ex: 30s, 2m, 1h, ...) |
| APP_TIMEOUT | 5s | Aggregation timeout (ex: 2s, 30s, ...) |
| APP_CACHE_RETENTION | 72h | Cache retention duration (ex: 24h, 48h, ...) |
| APP_LOG_LEVEL | info | Logging level (debug, info, warn or error) |
| APP_LOG_PRETTY | false | Plain text log output format if true (JSON otherwise) |
| APP_LOG_OUTPUT | stdout | Log output target (stdout or file://sample.log) |
You can override these settings with program parameters.
Type feedpushr --help to see those parameters.
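To illustrate how an environment variable falls back to its documented default, here is a minimal Go sketch. The helper names envOr and parseDelay are hypothetical, and using time.ParseDuration is an assumption based on the duration examples above (30s, 2m, 1h); it is not a description of how feedpushr itself reads its configuration.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// envOr returns the value of the environment variable key,
// or def when the variable is unset or empty.
// (Hypothetical helper, not part of feedpushr.)
func envOr(key, def string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return def
}

// parseDelay converts a value such as "30s", "2m" or "1h" into a duration.
// Assumption: the values follow Go's duration syntax.
func parseDelay(value string) (time.Duration, error) {
	return time.ParseDuration(value)
}

func main() {
	addr := envOr("APP_ADDR", ":8080")
	delay, err := parseDelay(envOr("APP_DELAY", "1m"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid APP_DELAY:", err)
		os.Exit(1)
	}
	fmt.Printf("addr=%s delay=%s\n", addr, delay)
}
```

Running it with APP_DELAY=20s exported prints the overridden delay; without it, the default of one minute is used.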
Filters
Before being sent, articles can be modified by a filter chain.
A filter is declared as a URL. The scheme of the URL is the filter name.
Other parts of the URL configure the filter.
The query parameters are the filter properties and the URL fragment configures the filter tags.
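The URL layout described above can be taken apart with Go's standard net/url package. The following is a minimal sketch, not feedpushr's own parser: the FilterSpec type and the parseFilter function are hypothetical names, and the sample URL carries a comma-separated tag list in its fragment.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// FilterSpec is a hypothetical container for the parts of a filter URL.
type FilterSpec struct {
	Name  string            // URL scheme, e.g. "title"
	Props map[string]string // query parameters, e.g. prefix=Sample:
	Tags  []string          // comma-separated URL fragment, e.g. foo,bar
}

// parseFilter splits a filter declaration into name, properties and tags.
func parseFilter(raw string) (FilterSpec, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return FilterSpec{}, err
	}
	spec := FilterSpec{Name: u.Scheme, Props: map[string]string{}}
	for key, values := range u.Query() {
		if len(values) > 0 {
			spec.Props[key] = values[0] // keep the first value of each property
		}
	}
	for _, tag := range strings.Split(u.Fragment, ",") {
		if tag = strings.TrimSpace(tag); tag != "" {
			spec.Tags = append(spec.Tags, tag)
		}
	}
	return spec, nil
}

func main() {
	spec, err := parseFilter("title://?prefix=Sample:#foo,bar")
	if err != nil {
		panic(err)
	}
	fmt.Printf("name=%s prefix=%q tags=%v\n", spec.Name, spec.Props["prefix"], spec.Tags)
}
```

A declaration without query or fragment, such as fetch://, yields a spec with only the name set.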
The built-in filters are:
title://?prefix=Feedpushr:
: This filter prefixes the title of the article with a given value.
fetch://
: This filter attempts to extract the content of the article from the source
URL.
minify://
: This filter minifies the HTML content of the article.
You can chain all the filters you need.
Filters can be extended using plugins.
Tags
Tags are used to customize the pipeline.
You can define tags on feeds using the Web UI or the API:
$ curl -XPOST "http://localhost:8080/v1/feeds?url=http://www.hashicorp.com/feed.xml&tags=foo,bar"
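When the same call is made from a program, the query values should be percent-encoded so that a feed URL containing its own "?" or "&" is not split into extra parameters. Here is a minimal Go sketch of building that request URL. The function name addFeedURL is hypothetical; the endpoint and the url and tags parameters are the ones from the command above.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// addFeedURL builds the request URL used to create a feed with tags.
// Query values are percent-encoded so that characters such as "&" or "?"
// inside the feed URL cannot be mistaken for parameters of the API call.
func addFeedURL(base, feedURL string, tags []string) string {
	params := url.Values{}
	params.Set("url", feedURL)
	if len(tags) > 0 {
		params.Set("tags", strings.Join(tags, ","))
	}
	return base + "/v1/feeds?" + params.Encode()
}

func main() {
	target := addFeedURL("http://localhost:8080", "http://www.hashicorp.com/feed.xml", []string{"foo", "bar"})
	fmt.Println(target)
	// The request itself could then be sent with http.Post(target, "", nil).
}
```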
Tags can also be imported and exported in OPML format. In OPML, tags are stored in the category attribute, which is a string of comma-separated, slash-delimited category strings.
For example, the OPML attribute <category>/test,foo,/bar/bar</category> is converted to the following tag list: test, foo, bar_bar.
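The conversion in this example can be sketched in a few lines of Go. The rule used here (drop surrounding slashes, replace inner slashes with underscores) is inferred from the single example above, and the function name categoryToTags is hypothetical; the actual implementation may handle edge cases differently.

```go
package main

import (
	"fmt"
	"strings"
)

// categoryToTags converts an OPML category string into a flat tag list:
// entries are separated by commas, surrounding slashes are dropped and
// the remaining slashes are replaced by underscores.
// Assumption: this rule is inferred from the documented example only.
func categoryToTags(category string) []string {
	var tags []string
	for _, entry := range strings.Split(category, ",") {
		entry = strings.Trim(strings.TrimSpace(entry), "/")
		if entry == "" {
			continue // skip empty entries such as "," or "/"
		}
		tags = append(tags, strings.ReplaceAll(entry, "/", "_"))
	}
	return tags
}

func main() {
	fmt.Println(strings.Join(categoryToTags("/test,foo,/bar/bar"), ", "))
	// Prints: test, foo, bar_bar
}
```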
Once feeds are configured with tags, each new article will inherit these tags and be pushed out with them.
Tags are also used by filters and outputs to manage their activation.
If you start the daemon with a filter or an output using tags, only articles corresponding to these tags will be processed by this filter or output.
Example:
$ feedpushr --filter "title://?prefix=Sample:#foo,bar"
In this example, only new articles with tags foo and bar will have their title modified with a prefix.
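The activation rule can be pictured as a simple set check. This Go sketch assumes that an article must carry every tag declared on the filter or output, and that a filter or output declared without tags accepts all articles; both points are a reading of the text above rather than confirmed behavior, and the function name matches is hypothetical.

```go
package main

import "fmt"

// matches reports whether an article carries every required tag.
// Assumption: a filter or output declared without tags accepts all articles.
func matches(articleTags, requiredTags []string) bool {
	have := make(map[string]bool, len(articleTags))
	for _, tag := range articleTags {
		have[tag] = true
	}
	for _, tag := range requiredTags {
		if !have[tag] {
			return false
		}
	}
	return true
}

func main() {
	required := []string{"foo", "bar"} // from "title://?prefix=Sample:#foo,bar"
	fmt.Println(matches([]string{"foo", "bar", "baz"}, required)) // true
	fmt.Println(matches([]string{"foo"}, required))               // false
	fmt.Println(matches([]string{"foo"}, nil))                    // true
}
```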
Outputs
New articles are sent to outputs.
An output is declared as a URL. The scheme of the URL is the output provider name.
Other parts of the URL configure the output provider.
Currently, there are two built-in output providers:
stdout://
: New articles are sent as JSON documents to the standard output of the
process.
This is useful if you want to pipe the command into another shell command,
for example to store the output in a file, forward the stream via Netcat,
or feed an ETL tool such as Logstash.
http://<URL>
: New articles are sent as JSON documents to an HTTP endpoint (POST).
Outputs can be extended using plugins.
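On the receiving side of the http:// output, any endpoint that accepts a JSON POST will do. Here is a minimal Go sketch of such a receiver. It assumes one article per request and reads only the title, description and link fields, which are the ones used by the jq example in this README; the Article type, the function names and the sample payload are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Article holds the fields this sketch reads from the posted JSON document.
// The real payload may contain more fields; unknown ones are ignored.
type Article struct {
	Title       string `json:"title"`
	Description string `json:"description"`
	Link        string `json:"link"`
}

// decodeArticle parses one JSON document into an Article.
func decodeArticle(body []byte) (Article, error) {
	var article Article
	err := json.Unmarshal(body, &article)
	return article, err
}

// handleArticle accepts POST requests carrying one article each (assumption).
func handleArticle(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "POST only", http.StatusMethodNotAllowed)
		return
	}
	body, err := io.ReadAll(io.LimitReader(r.Body, 1<<20)) // read at most 1 MiB
	if err != nil {
		http.Error(w, "cannot read body", http.StatusBadRequest)
		return
	}
	article, err := decodeArticle(body)
	if err != nil {
		http.Error(w, "invalid JSON", http.StatusBadRequest)
		return
	}
	log.Printf("received %q (%s)", article.Title, article.Link)
	w.WriteHeader(http.StatusNoContent)
}

func main() {
	// httptest starts a throwaway local server so the sketch runs on its own.
	server := httptest.NewServer(http.HandlerFunc(handleArticle))
	defer server.Close()

	payload := `{"title":"Hello","link":"http://example.org/1"}`
	resp, err := http.Post(server.URL, "application/json", strings.NewReader(payload))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.StatusCode)
}
```

To receive real articles, the handler could instead be served with http.ListenAndServe on a port of your choice, with that address passed to the daemon through the --output parameter.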
Plugins
You can easily extend the application by adding plugins.
A plugin is a compiled library file that must be loaded when the application
starts.
To load a plugin, use the --plugin parameter. Example:
$ feedpushr --plugin ./feedpushr-twitter-linux-amd64.so
You can find some external plugins (such as one for Twitter) in this
repository.
UI
You can access the Web UI at http://localhost:8080/ui
Use cases
Start the daemon
$ # Start the daemon with default configuration:
$ feedpushr
$ # Start the daemon and send new articles to an HTTP endpoint:
$ feedpushr --output https://requestb.in/t4gdzct4
$ # Start the daemon with a database initialized
$ # with subscriptions from an OPML file:
$ feedpushr --import ./my-subscriptions.xml
$ # Start the daemon with custom configuration:
$ export APP_OUTPUTS="https://requestb.in/t4gdzct4"
$ export APP_STORE="boltdb:///var/opt/feedpushr.db"
$ export APP_DELAY=20s
$ export APP_LOG_LEVEL=warn
$ feedpushr
Add feeds
$ # Add feed with the CLI
$ feedpushr-ctl create feed --url http://www.hashicorp.com/feed.xml
$ # Add feed with cURL
$ curl -XPOST http://localhost:8080/v1/feeds?url=http://www.hashicorp.com/feed.xml
$ # Import feeds from an OPML file
$ curl -XPOST http://localhost:8080/v1/opml -F"file=@subscriptions.opml"
Manage feeds
$ # List feeds
$ feedpushr-ctl list feed
$ # Get a feed
$ feedpushr-ctl get feed --id=9090dfac0ccede1cfcee186826d0cc0d
$ # Remove a feed
$ feedpushr-ctl delete feed --id=9090dfac0ccede1cfcee186826d0cc0d
$ # Stop aggregation of a feed
$ feedpushr-ctl stop feed --id=9090dfac0ccede1cfcee186826d0cc0d
$ # Start aggregation of a feed
$ feedpushr-ctl start feed --id=9090dfac0ccede1cfcee186826d0cc0d
Misc
$ # Get OpenAPI JSON
$ curl http://localhost:8080/swagger.json
$ # Get runtime vars
$ curl http://localhost:8080/v1/vars
$ # Here is a quick ETL shell pipeline:
$ # Send transformed articles to HTTP endpoint using shell tools (jq and httpie)
$ feedpushr \
| jq -c "select(.title) | {title:.title, content:.description, origin: .link}" \
| while read next; do echo "$next" | http http://postb.in/b/i1J32KdO; done
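The transformation step of this pipeline can also be written as a small Go program instead of jq. The sketch below reshapes each JSON document the same way (title, description as content, link as origin) and skips documents without a title. The type and function names are hypothetical, the input is an in-memory sample, and it approximates rather than reproduces jq's select semantics.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strings"
)

// input mirrors the fields the jq filter reads; Title is a pointer so that
// a missing or null title can be told apart from an empty one.
type input struct {
	Title       *string `json:"title"`
	Description string  `json:"description"`
	Link        string  `json:"link"`
}

// output mirrors the object built by the jq filter.
type output struct {
	Title   string `json:"title"`
	Content string `json:"content"`
	Origin  string `json:"origin"`
}

// transform reshapes one article; ok is false when the document is not
// valid JSON for these fields or has no title (the select(.title) step).
func transform(raw []byte) (result string, ok bool) {
	var in input
	if err := json.Unmarshal(raw, &in); err != nil || in.Title == nil {
		return "", false
	}
	encoded, err := json.Marshal(output{Title: *in.Title, Content: in.Description, Origin: in.Link})
	if err != nil {
		return "", false
	}
	return string(encoded), true
}

// run reads a stream of JSON documents and writes the reshaped ones, one per line.
func run(in io.Reader, out io.Writer) error {
	decoder := json.NewDecoder(in)
	for {
		var raw json.RawMessage
		err := decoder.Decode(&raw)
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		if line, ok := transform(raw); ok {
			fmt.Fprintln(out, line)
		}
	}
}

func main() {
	// Sample stream; the second document has no title and is skipped.
	sample := `{"title":"Hello","description":"World","link":"http://example.org/1"}
{"description":"no title"}`
	if err := run(strings.NewReader(sample), os.Stdout); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
}
```

Passing os.Stdin to run instead of the sample reader lets the program sit behind the daemon in a pipe, in place of the jq step.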
For development
To be able to build the project you will need to:
Then you can build the project using make:
$ make
Type make help to see other possibilities.