krip

package module

Version: v1.0.5 (latest), Published: Jan 9, 2024, License: MIT, Imports: 4, Imported by: 0

Repository

github.com/borschtapp/krip

README

Krip* 🇺🇦

A Go library for fast, comprehensive and generalised scraping of culinary recipes from any website or HTML file.

Krip is a Ukrainian word for dill. The bud of the dill looks like web pages connected by a stem (a reference to the web and graphs).

I started this project because I wanted to build my own recipe keeper and found that there is only one library everyone uses for scraping recipes: recipe-scrapers, written in Python. The library is great, but it contains many scrapers that seem redundant, and it still could not scrape the recipes I wanted. I sent some pull requests to it, but the language I chose for my project is Go, so I decided to rewrite it in Go.

This library contains completely rewritten parsers that are loosely inspired by the Python library. From the beginning, I focused on speed and flexibility, aiming to cover most of the possible schemas and websites and to retrieve a rich model. It also supports per-domain customisation for websites that do not use a schema.
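
To illustrate what schema-based scraping means in practice, here is a minimal sketch of pulling a recipe name out of a JSON-LD block using only the standard library. It is not krip's implementation: findRecipeName and the regular expression are hypothetical, and the sketch only handles the simple case where "@type" is the plain string "Recipe" (no "@graph", no type arrays).

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// ldJSON matches <script type="application/ld+json"> blocks (hypothetical helper).
var ldJSON = regexp.MustCompile(`(?is)<script[^>]*type=["']application/ld\+json["'][^>]*>(.*?)</script>`)

// findRecipeName returns the name of the first JSON-LD node whose @type is "Recipe".
func findRecipeName(html string) (string, bool) {
	for _, m := range ldJSON.FindAllStringSubmatch(html, -1) {
		var node struct {
			Type string `json:"@type"`
			Name string `json:"name"`
		}
		// Blocks that do not decode into this simple shape are skipped.
		if err := json.Unmarshal([]byte(m[1]), &node); err != nil {
			continue
		}
		if node.Type == "Recipe" {
			return node.Name, true
		}
	}
	return "", false
}

func main() {
	page := `<html><head>
<script type="application/ld+json">{"@type":"WebSite","name":"Example"}</script>
<script type="application/ld+json">{"@type":"Recipe","name":"Original Plum Torte"}</script>
</head></html>`
	name, ok := findRecipeName(page)
	fmt.Println(name, ok)
}
```

A real parser would use an HTML tokenizer rather than a regular expression; the regular expression keeps the sketch self-contained.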

Note: WIP, I am still learning how to use Go. Any hint or advice is welcome.

Install

go get -u github.com/borschtapp/krip

Features

Parses Microdata, Open Graph and JSON-LD schemas
Corrects erroneous JSON in the source code of websites (e.g. JSONC with comments or new lines)
The resulting Recipe struct (object) is compatible with the https://schema.org/Recipe schema (see comments)
Includes custom parsers for specific websites (domains) that do not use any known recipe schema
Removes empty and duplicate values and performs some normalization
Command-line tool to scrape recipes from the internet
Fast and efficient, thanks to Go :)

Contributing

Contributions are welcome! Please open an issue or a PR if you have any ideas or have found a bug.
The most common way to contribute is to add a custom parser for a specific website (if you have trouble, please open an issue and I will help you).

Implementing custom scrapers

All you need to do is implement the Scraper interface and register it via krip.RegisterScraper().
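
The general shape of per-hostname registration can be sketched with the standard library alone. Everything in this example is a simplified stand-in chosen for illustration: the Recipe and Scraper types, registerScraper, scrape and exampleScraper are hypothetical, and krip's real model.Scraper and krip.RegisterScraper may have different signatures and behaviour.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// Recipe and Scraper are simplified stand-ins, not krip's model types.
type Recipe struct {
	Name string
}

type Scraper func(html string, r *Recipe) error

// scrapers maps a normalized hostname to its site-specific scraper.
var scrapers = map[string]Scraper{}

func normalizeHost(host string) string {
	return strings.TrimPrefix(strings.ToLower(host), "www.")
}

// registerScraper associates a hostname with a custom scraper.
func registerScraper(hostname string, fn Scraper) {
	scrapers[normalizeHost(hostname)] = fn
}

// scrape looks up the scraper registered for the URL's host and runs it.
func scrape(rawURL, html string) (*Recipe, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil, err
	}
	fn, ok := scrapers[normalizeHost(u.Hostname())]
	if !ok {
		return nil, errors.New("no custom scraper registered for " + u.Hostname())
	}
	r := &Recipe{}
	if err := fn(html, r); err != nil {
		return nil, err
	}
	return r, nil
}

// exampleScraper takes the recipe name from the first <h1> element.
func exampleScraper(html string, r *Recipe) error {
	start := strings.Index(html, "<h1>")
	end := strings.Index(html, "</h1>")
	if start < 0 || end < start {
		return errors.New("recipe title not found")
	}
	r.Name = strings.TrimSpace(html[start+len("<h1>") : end])
	return nil
}

func main() {
	registerScraper("www.example.com", exampleScraper)
	r, err := scrape("https://example.com/recipes/1", "<html><h1> Plum Torte </h1></html>")
	if err != nil {
		panic(err)
	}
	fmt.Println(r.Name)
}
```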

Take a look at the already implemented custom scrapers in the repository for examples.

To-Do List

more custom parsers, implementing all of those from the Python library
allergens support (missing in recipe schema)
parsing of ingredients (missing in recipe schema)
parsing of recipes from text
validation and normalisation of units, constants, etc.

Usage

Command-line tool

go install github.com/borschtapp/krip/cmd/krip@latest
krip --help
krip https://cooking.nytimes.com/recipes/3783-original-plum-torte

Scrape recipe from web

recipe, err := krip.ScrapeUrl("https://cooking.nytimes.com/recipes/3783-original-plum-torte")
if err != nil {
  // handle err
}

// Retrieve the recipe data
name := recipe.Name
ingredients := recipe.Ingredients
instructions := recipe.Instructions

// Print the recipe as JSON
fmt.Println(recipe)

{
  "@id": "https://cooking.nytimes.com/recipes/3783-original-plum-torte",
  "name": "Original Plum Torte",
  "thumbnailUrl": "https://static01.nyt.com/images/2019/09/07/dining/plumtorte/plumtorte-articleLarge-v4.jpg",
  "author": {
    "name": "Marian Burros"
  },
  "publisher": {
    "name": "NYT Cooking",
    "url": "https://cooking.nytimes.com"
  },
  "inLanguage": "en-US",
  "description": "The Times published Marian Burros’s recipe for Plum Torte every September from 1983 until 1989, when the editors determined that enough was enough. The recipe was to be printed for the last time that year. “To counter anticipated protests,” Ms. Burros wrote a few years later, “the recipe was printed in larger type than usual with a broken-line border around it to encourage clipping.” It didn’t help. The paper was flooded with angry letters. “The appearance of the recipe, like the torte itself, is bittersweet,” wrote a reader in Tarrytown, N.Y. “Summer is leaving, fall is coming. That's what your annual recipe is all about. Don't be grumpy about it.” We are not! And we pledge that every year, as summer gives way to fall, we will make sure that the recipe is easily available to one and all. The original 1983 recipe called for 1 cup sugar; the 1989 version reduced that to 3/4 cup. We give both options below. Here are \u003ca href=\" http://www.nytimes.com/interactive/2016/09/14/dining/marian-burros-plum-torte-recipe-variations.html\"\u003efive ways to adapt the torte\u003c/a\u003e.",
  "totalTime": 75,
  "recipeCategory": [
    "breakfast",
    "brunch",
    "easy",
    "weekday",
    "times classics",
    "dessert"
  ],
  "keywords": [
    "flour",
    "plum",
    "unsalted butter",
    "nut-free",
    "vegetarian"
  ],
  "recipeYield": 8,
  "recipeIngredient": [
    "3/4 to 1 cup sugar",
    "1/2 cup unsalted butter, softened",
    "1 cup unbleached flour, sifted",
    "1 teaspoon baking powder",
    "Pinch of salt (optional)",
    "2 eggs",
    "24 halves pitted purple plums",
    "Sugar, lemon juice and cinnamon, for topping"
  ],
  "recipeInstructions": [
    {
      "text": "Heat oven to 350 degrees."
    },
    {
      "text": "Cream the sugar and butter in a bowl. Add the flour, baking powder, salt and eggs and beat well."
    },
    {
      "text": "Spoon the batter into a springform pan of 8, 9 or 10 inches. Place the plum halves skin side up on top of the batter. Sprinkle lightly with sugar and lemon juice, depending on the sweetness of the fruit. Sprinkle with about 1 teaspoon of cinnamon, depending on how much you like cinnamon."
    },
    {
      "text": "Bake 1 hour, approximately. Remove and cool; refrigerate or freeze if desired. Or cool to lukewarm and serve plain or with whipped cream. (To serve a torte that was frozen, defrost and reheat it briefly at 300 degrees.)"
    }
  ],
  "nutrition": {
    "calories": "350",
    "carbohydrateContent": "57 grams",
    "fatContent": "13 grams",
    "fiberContent": "3 grams",
    "proteinContent": "4 grams",
    "saturatedFatContent": "8 grams",
    "sodiumContent": "63 milligrams",
    "sugarContent": "42 grams",
    "transFatContent": "0 grams",
    "unsaturatedFatContent": "4 grams"
  },
  "aggregateRating": {
    "ratingCount": 8717,
    "ratingValue": 5
  }
}
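
The JSON above can also be consumed by other programs. The following is a minimal sketch that decodes a few of the keys visible in the sample output. The recipeSummary type and parseSummary function are hypothetical and are not krip's model.Recipe; treating totalTime as a number of minutes is an assumption based on the sample value.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// recipeSummary covers a handful of keys from the sample output.
// It is a hypothetical type, not krip's model.Recipe.
type recipeSummary struct {
	Name         string   `json:"name"`
	TotalTime    int      `json:"totalTime"` // assumed to be minutes
	Yield        int      `json:"recipeYield"`
	Ingredients  []string `json:"recipeIngredient"`
	Instructions []struct {
		Text string `json:"text"`
	} `json:"recipeInstructions"`
}

// parseSummary decodes a JSON document into a recipeSummary.
func parseSummary(data []byte) (*recipeSummary, error) {
	var s recipeSummary
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	return &s, nil
}

func main() {
	sample := []byte(`{
  "name": "Original Plum Torte",
  "totalTime": 75,
  "recipeYield": 8,
  "recipeIngredient": ["3/4 to 1 cup sugar", "2 eggs"],
  "recipeInstructions": [{"text": "Heat oven to 350 degrees."}]
}`)
	s, err := parseSummary(sample)
	if err != nil {
		panic(err)
	}
	total := time.Duration(s.TotalTime) * time.Minute
	fmt.Printf("%s: %v, serves %d, %d ingredients, %d steps\n",
		s.Name, total, s.Yield, len(s.Ingredients), len(s.Instructions))
}
```

Keys that the struct does not declare, such as nutrition or aggregateRating, are ignored by encoding/json, so the sketch also works on the full output.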

Tested on

For each tested source, the scraper includes a test and was able to extract all the important fields, including but not limited to:

url
name
inLanguage
thumbnailUrl
recipeIngredient
recipeInstructions
publisher (including name and url)

For the following websites

Documentation

Index

func RegisterScraper(hostname string, fn model.Scraper)
func Scrape(input *model.DataInput) (*model.Recipe, error)
func ScrapeFile(fileName string) (*model.Recipe, error)
func ScrapeUrl(url string) (*model.Recipe, error)
type AggregateRating
type DataInput
type HowToSection
type HowToStep
type ImageObject
type InputOptions
type NutritionInformation
type Organization
type Person
type Recipe
type Scraper
type VideoObject

Functions

func RegisterScraper

func RegisterScraper(hostname string, fn model.Scraper)

func Scrape

func Scrape(input *model.DataInput) (*model.Recipe, error)

func ScrapeFile

func ScrapeFile(fileName string) (*model.Recipe, error)

ScrapeFile reads content and scrapes a recipe from the file.

func ScrapeUrl

func ScrapeUrl(url string) (*model.Recipe, error)

ScrapeUrl retrieves and scrapes a recipe from the URL.

Types

type AggregateRating (added in v0.1.1)

type AggregateRating = model.AggregateRating

type DataInput (added in v0.1.1)

type DataInput = model.DataInput

type HowToSection (added in v0.1.1)

type HowToSection = model.HowToSection

type HowToStep (added in v0.1.1)

type HowToStep = model.HowToStep

type ImageObject (added in v0.1.1)

type ImageObject = model.ImageObject

type InputOptions (added in v0.1.1)

type InputOptions = model.InputOptions

type NutritionInformation (added in v0.1.1)

type NutritionInformation = model.NutritionInformation

type Organization (added in v0.1.1)

type Organization = model.Organization

type Person (added in v0.1.1)

type Person = model.Person

type Recipe (added in v0.1.1)

type Recipe = model.Recipe

type Scraper (added in v0.1.1)

type Scraper = model.Scraper

type VideoObject (added in v0.1.1)

type VideoObject = model.VideoObject

Source Files

krip.go

Directories

Path	Synopsis
cmd
krip
model
scraper
common
opengraph
schema
website
utils
web
