jsongenius

command module
v0.0.0-...-3b80bd6
Published: Oct 11, 2023 License: Apache-2.0 Imports: 10 Imported by: 0

JsonGenius

JsonGenius is a self-hosted scraping API that extracts structured data described by a JSON Schema. Provide any URL and a desired JSON Schema, and JsonGenius will return the structured data from the website.

Prerequisites

  • Docker Compose
  • OPEN_AI_KEY - An API key for OpenAI, which you can get from OpenAI. It must be set as an environment variable.

Usage

git clone https://github.com/semanser/jsongenius
cd jsongenius
export OPEN_AI_KEY=<your key here>
docker compose up

The API will be available at http://localhost:3001. You can change the port by editing the docker-compose.yml file.

Compile from source

git clone https://github.com/semanser/jsongenius
cd jsongenius
export OPEN_AI_KEY=<your key here>
go build .
./jsongenius

API

POST /lookup

This endpoint accepts a JSON body with the following fields:

  • url: The URL of the website to scrape
  • schema: The JSON Schema that describes the data to extract from the website. It must be a valid JSON Schema object. See json-schema.org for more about JSON Schema.

Example

curl -X POST -H "Content-Type: application/json" -d '{
  "url": "https://www.amazon.com/s?k=gaming+headsets",
  "schema": {
    "type": "object",
    "properties": {
      "products": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string",
              "description": "The product name"
            },
            "price": {
              "type": "number",
              "description": "The price of the product in USD"
            }
          }
        }
      }
    }
  }
}' http://localhost:3001/lookup
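
The same request can be sent from Go. The following is a minimal sketch, not part of JsonGenius itself: the names lookupRequest and newLookupRequest are hypothetical, the two-minute timeout is an assumption (rendering a page and calling OpenAI can be slow), and the response is printed as raw text because the response format is not documented here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// lookupRequest mirrors the two documented body fields of POST /lookup.
// The type name is hypothetical; only the JSON field names come from the README.
type lookupRequest struct {
	URL    string          `json:"url"`
	Schema json.RawMessage `json:"schema"`
}

// newLookupRequest builds a POST /lookup request for the given page and schema.
// It only checks that the schema is syntactically valid JSON; it does not
// validate it against the JSON Schema specification.
func newLookupRequest(baseURL, pageURL string, schema []byte) (*http.Request, error) {
	if !json.Valid(schema) {
		return nil, fmt.Errorf("schema is not valid JSON")
	}
	body, err := json.Marshal(lookupRequest{URL: pageURL, Schema: schema})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/lookup", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// A shortened version of the schema from the curl example above.
	schema := []byte(`{
	  "type": "object",
	  "properties": {
	    "products": {
	      "type": "array",
	      "items": {
	        "type": "object",
	        "properties": {
	          "name":  {"type": "string", "description": "The product name"},
	          "price": {"type": "number", "description": "The price of the product in USD"}
	        }
	      }
	    }
	  }
	}`)

	req, err := newLookupRequest("http://localhost:3001", "https://www.amazon.com/s?k=gaming+headsets", schema)
	if err != nil {
		fmt.Fprintln(os.Stderr, "building request failed:", err)
		return
	}

	// Assumed timeout; adjust to how long your lookups actually take.
	client := &http.Client{Timeout: 2 * time.Minute}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		return
	}
	defer resp.Body.Close()

	out, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading response failed:", err)
		return
	}
	// Print the status and the raw body without assuming a response shape.
	fmt.Printf("%s\n%s\n", resp.Status, out)
}
```

Keeping the schema as json.RawMessage lets the client pass any schema through unchanged instead of modelling JSON Schema in Go types.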

FAQ

  • Does it work with JavaScript-heavy websites? Yes! JsonGenius uses Chromium to render the page, so it can handle any website that a normal browser can.
  • Can I bring my own Chromium instance? Yes! Set the WS_URL environment variable to a Chrome DevTools Protocol endpoint, and JsonGenius will use that instead of spinning up its own Chromium instance.
