API fun with Jeopardy! Access >300k Jeopardy clues scraped from j-archive via a simple api.
API
the api is a simple go web server built with gin that exposes a handful of endpoints to access jeopardy data
the shape of the data returned from the api aligns with the db schema; this is accomplished via struct tags on the type definitions
for example, the Clue type is defined as follows:
type Clue struct {
	ClueID     int64  `db:"clue_id" json:"clueId" example:"804002032"`
	GameID     int64  `db:"game_id" json:"gameId" example:"8040"`
	CategoryID int64  `db:"category_id" json:"categoryId" example:"804092001"`
	Question   string `db:"question" json:"question" example:"This is the question."`
	Answer     string `db:"answer" json:"answer" example:"This is the answer."`
}
the db struct tag is used by the sqlx library to map the db columns to the struct fields
the json struct tag is used by the gin library to map the struct fields to the json response
the example struct tag is used by the swaggo library to generate example responses for the swagger docs
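as a rough sketch (not the actual handler code), a gin handler might pull these tags together like this; the route, table name, and query are assumptions:

package main

import (
	"net/http"

	"github.com/gin-gonic/gin"
	"github.com/jmoiron/sqlx"
)

// getClue is an illustrative handler; the route, table name, and query are
// assumptions, not the project's actual implementation
func getClue(db *sqlx.DB) gin.HandlerFunc {
	return func(c *gin.Context) {
		var clue Clue
		// sqlx maps the selected columns onto Clue via the db tags
		err := db.Get(&clue,
			"SELECT clue_id, game_id, category_id, question, answer FROM clue WHERE clue_id = ?",
			c.Param("clueId"))
		if err != nil {
			c.JSON(http.StatusNotFound, gin.H{"error": "clue not found"})
			return
		}
		// gin serializes the struct to the response body using the json tags
		c.JSON(http.StatusOK, clue)
	}
}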
Frontend / UI
the ui is served from the / endpoint and is a simple html page that displays the swagger docs and some other info
the embedded swagger ui provides runnable request / response examples and type references
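for reference, serving that page from gin could look roughly like this; the template name and page data are assumptions:

package main

import (
	"net/http"

	"github.com/gin-gonic/gin"
)

// serveUI is a sketch only; the template name and page data are assumptions
func serveUI(r *gin.Engine) {
	r.LoadHTMLGlob("templates/*")
	r.GET("/", func(c *gin.Context) {
		// the rendered template embeds the swagger ui and the generated docs
		c.HTML(http.StatusOK, "index.html", gin.H{"title": "jeopardy api"})
	})
}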
Swagger Docs
swagger docs are generated with swaggo and embedded in the ui page at / as part of the html template
the --parseVendor flag (passed to swag init) was helpful here for generating the full swagger.json file that the ui can use in standalone mode
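swaggo builds the docs from comment annotations on the handlers; a sketch of what one might look like here (the route and descriptions are illustrative, not taken from the actual code):

package main

import "github.com/gin-gonic/gin"

// GetClue godoc
// @Summary  get a single clue by id
// @Produce  json
// @Param    clueId  path  int  true  "clue id"
// @Success  200  {object}  Clue
// @Router   /clues/{clueId} [get]
func GetClue(c *gin.Context) {
	// handler body omitted; see the sketch in the API section above
}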
DB
getting the data into the database and cleaning it up after has been a manual process for the most part
for local development i set the DB_HOST, DB_USER, DB_PASS, and DB_NAME environment variables to target a mariadb/mysql server running in my home lab (i also experimented with defining the db service and build params in a docker compose file)
so personally i play with that local copy of the data, but for the public api i use a mysql db hosted on digital ocean
to populate this db i first created a backup of my local db and then restored it to the digital ocean db through an adminer ui running in my home lab
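as a sketch, the connection from those environment variables might be built like this; the port and parseTime option are assumptions:

package main

import (
	"fmt"
	"os"

	_ "github.com/go-sql-driver/mysql"
	"github.com/jmoiron/sqlx"
)

// connect builds a mysql dsn from the environment variables mentioned above;
// the port and the parseTime option are assumptions
func connect() (*sqlx.DB, error) {
	dsn := fmt.Sprintf("%s:%s@tcp(%s:3306)/%s?parseTime=true",
		os.Getenv("DB_USER"), os.Getenv("DB_PASS"),
		os.Getenv("DB_HOST"), os.Getenv("DB_NAME"))
	return sqlx.Connect("mysql", dsn)
}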
Data Scraping
the scrape package contains the code to scrape j-archive
and write the data to a mysql database
colly is used to scrape the data and sqlx is used to write the data to the db
the scraping happened in a few passes to get all the data
first pass was to get all the seasons and populate the seasons table
this scrape targeted the season summary page on j-archive and pulled the season number, start date, and end date for each season
second pass was to get all the games for each season and populate the game table
this scrape targeted the individual season show pages on j-archive and pulled the game number, air date, and taped date for each game
third pass was to get all the clues for each game in each season and populate the category and clue tables
this scrape targeted the individual game pages on j-archive and pulled the clue data from the tables on the page
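as a rough sketch of the first pass (the url and css selector are guesses at j-archive's layout, not the exact ones the scrape package uses):

package main

import (
	"fmt"

	"github.com/gocolly/colly/v2"
)

// scrapeSeasons walks the season list page and prints each season link;
// in the real first pass each row would be written to the seasons table via sqlx
func scrapeSeasons() error {
	c := colly.NewCollector()
	// each season on the list page links to its season show page
	c.OnHTML("a[href^='showseason.php']", func(e *colly.HTMLElement) {
		fmt.Println(e.Attr("href"), e.Text)
	})
	return c.Visit("https://j-archive.com/listseasons.php")
}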
Demo
this is an example of a simple web app that uses a local copy of the database and a simple
web ui to display the data