Documentation ¶
Index ¶
- Variables
- func OpenDatabase(filename string) (*sqlx.DB, error)
- func SliceContains(ss []string, v string) bool
- type Entry
- type ErrorMessage
- type FetchGroup
- type Fetcher
- type Map
- type Pinger
- type Response
- type Server
- type Snippet
- type SqliteFetcher
- type StopWatch
- func (s *StopWatch) Elapsed() time.Duration
- func (s *StopWatch) Entries() []*Entry
- func (s *StopWatch) LogTable()
- func (s *StopWatch) Record(msg string)
- func (s *StopWatch) Recordf(msg string, vs ...interface{})
- func (s *StopWatch) Reset()
- func (s *StopWatch) SetEnabled(enabled bool)
- func (s *StopWatch) Table() string
Constants ¶
This section is empty.
Variables ¶
var (
	// ErrBlobNotFound can be used for unfetchable blobs.
	ErrBlobNotFound   = errors.New("blob not found")
	ErrBackendsFailed = errors.New("all backends failed")
)
Functions ¶
func OpenDatabase ¶
OpenDatabase first ensures that the file actually exists, then creates a read-only connection.
func SliceContains ¶
SliceContains returns true if a string slice contains a given value.
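A minimal sketch of how such a membership check can be written as a linear scan. The lowercase name sliceContains and the sample values are illustrative assumptions, not the package's source.

```go
package main

import "fmt"

// sliceContains reports whether ss contains v, using a linear scan.
// Illustrative stand-in only; the package's own implementation may differ.
func sliceContains(ss []string, v string) bool {
	for _, s := range ss {
		if s == v {
			return true
		}
	}
	return false
}

func main() {
	// Sample values, chosen for illustration.
	institutions := []string{"DE-14", "DE-15"}
	fmt.Println(sliceContains(institutions, "DE-14"))
	fmt.Println(sliceContains(institutions, "DE-99"))
}
```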
Types ¶
type ErrorMessage ¶
ErrorMessage from failed requests.
type FetchGroup ¶
type FetchGroup struct {
Backends []Fetcher
}
FetchGroup runs an index data fetch operation in a cascade over several backends. The result from the first database that contains a value for a given id is returned. Lookups are currently sequential, but could possibly be made parallel.
func (*FetchGroup) Fetch ¶
func (g *FetchGroup) Fetch(id string) ([]byte, error)
Fetch queries the backends in order and returns the blob from the first backend that contains a value for the given id.
func (*FetchGroup) FromFiles ¶
func (g *FetchGroup) FromFiles(files ...string) error
FromFiles sets up a fetch group from a list of sqlite3 database filenames.
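The cascade described for FetchGroup could be sketched roughly as follows. The fetcher interface is an assumption modeled on the Fetch signature above, mapFetcher is a hypothetical in-memory backend standing in for a sqlite3 database, and the error handling is a guess rather than the package's behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel error for this sketch.
var errNotFound = errors.New("blob not found")

// fetcher is an assumed interface, modeled on the Fetch signature above.
type fetcher interface {
	Fetch(id string) ([]byte, error)
}

// mapFetcher is a hypothetical in-memory backend standing in for a database.
type mapFetcher map[string]string

// Fetch returns the stored blob or errNotFound.
func (m mapFetcher) Fetch(id string) ([]byte, error) {
	if v, ok := m[id]; ok {
		return []byte(v), nil
	}
	return nil, errNotFound
}

// fetchFirst tries each backend in order and returns the first hit.
func fetchFirst(backends []fetcher, id string) ([]byte, error) {
	for _, b := range backends {
		blob, err := b.Fetch(id)
		if err == nil {
			return blob, nil
		}
	}
	return nil, errNotFound
}

func main() {
	backends := []fetcher{
		mapFetcher{"id-1": `{"id":"id-1"}`},
		mapFetcher{"id-2": `{"id":"id-2"}`},
	}
	blob, err := fetchFirst(backends, "id-2")
	fmt.Println(string(blob), err)
}
```

Because backends are tried in order, an id present in several backends resolves to the earliest one in the list.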
type Map ¶
Map is a generic lookup table. We use it together with sqlite3. This corresponds to the format generated by the makta command line tool: https://github.com/miku/labe/tree/main/go/ckit#makta.
type Pinger ¶
type Pinger interface {
Ping() error
}
Pinger allows performing a simple health check.
type Response ¶
type Response struct {
	ID        string            `json:"id,omitempty"`
	DOI       string            `json:"doi,omitempty"`
	Citing    []json.RawMessage `json:"citing,omitempty"`
	Cited     []json.RawMessage `json:"cited,omitempty"`
	Unmatched struct {
		Citing []json.RawMessage `json:"citing,omitempty"`
		Cited  []json.RawMessage `json:"cited,omitempty"`
	} `json:"unmatched,omitempty"`
	Extra struct {
		UnmatchedCitingCount int     `json:"unmatched_citing_count"`
		UnmatchedCitedCount  int     `json:"unmatched_cited_count"`
		CitingCount          int     `json:"citing_count"`
		CitedCount           int     `json:"cited_count"`
		Cached               bool    `json:"cached"`
		Took                 float64 `json:"took"` // seconds
		// Institution is set optionally (e.g. to "DE-14"), if the response has
		// been tailored towards the holdings of a given institution.
		Institution string `json:"institution,omitempty"`
	} `json:"extra,omitempty"`
}
Response contains a subset of index data fused with citation data. Citing and cited documents are kept unparsed for flexibility and performance; we expect JSON. For unmatched docs, we may only transmit the DOI, e.g. {"doi_str_mv": "10.12/34"}.
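To illustrate what keeping documents unparsed means in practice, here is a small sketch that decodes a payload while leaving the nested documents as raw bytes. The response type is a trimmed-down, hypothetical mirror of a few Response fields, and the payload is invented sample data.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// response is a trimmed-down, hypothetical mirror of a few Response fields.
type response struct {
	ID        string            `json:"id,omitempty"`
	DOI       string            `json:"doi,omitempty"`
	Citing    []json.RawMessage `json:"citing,omitempty"`
	Unmatched struct {
		Citing []json.RawMessage `json:"citing,omitempty"`
	} `json:"unmatched,omitempty"`
}

// decode parses the envelope; nested documents stay as raw JSON bytes.
func decode(payload []byte) (*response, error) {
	var r response
	if err := json.Unmarshal(payload, &r); err != nil {
		return nil, err
	}
	return &r, nil
}

func main() {
	// Invented sample payload for illustration.
	payload := []byte(`{"id":"0-1238201","doi":"10.12/34",` +
		`"citing":[{"id":"x","title":"t"}],` +
		`"unmatched":{"citing":[{"doi_str_mv":"10.12/34"}]}}`)
	r, err := decode(payload)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(r.ID, len(r.Citing), string(r.Unmatched.Citing[0]))
}
```

A consumer only pays the parsing cost for the documents it actually inspects; the rest can be passed through as-is.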
type Server ¶
type Server struct {
	// IdentifierDatabase maps local ids to DOI. The expected schema is
	// documented here: https://github.com/miku/labe/tree/main/go/ckit#makta
	//
	// 0-025152688 10.1007/978-3-476-03951-4
	// 0-025351737 10.13109/9783666551536
	// 0-024312134 10.1007/978-1-4612-1116-7
	// 0-025217100 10.1007/978-3-322-96667-4
	// ...
	IdentifierDatabase *sqlx.DB
	// OciDatabase contains DOI to DOI mappings representing a citation
	// relationship. The expected schema is documented here:
	// https://github.com/miku/labe/tree/main/go/ckit#makta
	//
	// 10.1002/9781119393351.ch1 10.1109/icelmach.2012.6350005
	// 10.1002/9781119393351.ch1 10.1115/detc2011-48151
	// 10.1002/9781119393351.ch1 10.1109/ical.2009.5262972
	// 10.1002/9781119393351.ch1 10.1109/cdc.2013.6760196
	// ...
	OciDatabase *sqlx.DB
	// IndexData allows to fetch a metadata blob for an identifier. This is
	// an interface that in the past has been implemented by types wrapping
	// microblob, SOLR and sqlite3, as well as a FetchGroup, that allows to
	// query multiple backends. We settled on sqlite3 and FetchGroup, the other
	// implementations are now gone.
	//
	// dswarm-126-ZnR0aG9zdHdlc3RsaX... {"id":"dswarm-126-ZnR0aG9zdHdlc3RsaXBwZ...
	// dswarm-126-ZnR0aG9zdHdlc3RsaX... {"id":"dswarm-126-ZnR0aG9zdHdlc3RsaXBwZ...
	// dswarm-126-ZnR0dW11ZW5jaGVuOm... {"id":"dswarm-126-ZnR0dW11ZW5jaGVuOm9ha...
	// dswarm-126-ZnR0dW11ZW5jaGVuOm... {"id":"dswarm-126-ZnR0dW11ZW5jaGVuOm9ha...
	// ...
	IndexData Fetcher
	// Router to register routes on.
	Router *mux.Router
	// StopWatchEnabled enables the stopwatch, a builtin, simplistic request tracer.
	StopWatchEnabled bool
	// Cache for expensive items.
	Cache *cache.Cache
	// CacheTriggerDuration determines which items to cache.
	CacheTriggerDuration time.Duration
	// Stats, like request counts and status codes.
	Stats *stats.Stats
}
Server wraps three data sources required for index and citation data fusion. The IdentifierDatabase maps a local identifier (e.g. 0-1238201) to a DOI, the OciDatabase contains citing and cited relationships from the OCI/COCI citation corpus, and IndexData allows fetching a metadata blob from a backing store.
A performance data point: on an 8-core machine with 16 GB RAM, we can sustain a load of about 12K SQL qps and 150 MB/s reads off disk. The total size of the databases involved is about 220 GB plus 10 GB cache (i.e. at most 6% of the data could be held in memory at any given time).
Requesting the most costly (and large) 150K docs under load, the server will hover at around 10% (of 16GB) RAM.
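The fusion of the three data sources might look roughly like the sketch below, shown for inbound (citing) edges only. Plain maps stand in for the sqlite3 databases, all identifiers and DOIs are invented sample data, and the function name fuse is hypothetical; the actual server logic is not shown in this documentation.

```go
package main

import "fmt"

// Hypothetical sample data; maps and slices stand in for the three data sources.
var (
	// idToDOI plays the role of the identifier database.
	idToDOI = map[string]string{"id-1": "10.1/a", "id-2": "10.1/b"}
	// edges plays the role of the citation database: (citing DOI, cited DOI).
	edges = [][2]string{{"10.1/b", "10.1/a"}, {"10.1/c", "10.1/a"}}
	// blobs plays the role of the index data store.
	blobs = map[string]string{"id-1": `{"id":"id-1"}`, "id-2": `{"id":"id-2"}`}
)

// fuse returns the index blobs of documents citing the given id, plus the
// DOIs of citing documents that have no local id (unmatched).
func fuse(id string) (citing []string, unmatched []string, err error) {
	doi, ok := idToDOI[id]
	if !ok {
		return nil, nil, fmt.Errorf("no doi for id %q", id)
	}
	// Invert the id-to-DOI mapping to map DOIs back to local ids.
	doiToID := make(map[string]string, len(idToDOI))
	for k, v := range idToDOI {
		doiToID[v] = k
	}
	for _, e := range edges {
		if e[1] != doi {
			continue // not an inbound edge for this document
		}
		if localID, ok := doiToID[e[0]]; ok {
			citing = append(citing, blobs[localID])
		} else {
			unmatched = append(unmatched, e[0])
		}
	}
	return citing, unmatched, nil
}

func main() {
	citing, unmatched, err := fuse("id-1")
	fmt.Println(citing, unmatched, err)
}
```

Outbound (cited) edges would follow the same pattern with the edge direction reversed.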
type Snippet ¶
type Snippet struct {
Institutions []string `json:"institution"`
}
Snippet is a small piece of index metadata used for institution filtering.
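Institution filtering with such a snippet could be sketched as follows: only the institution field of an index blob is decoded and then checked for a given label. The helper name heldBy and the sample blob are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// snippet mirrors the Snippet struct shown above.
type snippet struct {
	Institutions []string `json:"institution"`
}

// heldBy reports whether the index blob lists the given institution.
// Fields other than "institution" are ignored during decoding.
func heldBy(blob []byte, institution string) (bool, error) {
	var s snippet
	if err := json.Unmarshal(blob, &s); err != nil {
		return false, err
	}
	for _, v := range s.Institutions {
		if v == institution {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Invented sample blob for illustration.
	blob := []byte(`{"id":"x","institution":["DE-14","DE-15"]}`)
	ok, err := heldBy(blob, "DE-14")
	fmt.Println(ok, err)
}
```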
type SqliteFetcher ¶
SqliteFetcher serves index documents from a sqlite database with a fixed schema, as generated by the makta tool.
type StopWatch ¶
StopWatch allows recording events over time and rendering them in a pretty table; it is thread-safe. Example log output (via stopwatch.LogTable()):
2021/09/29 17:22:40 timings for XVlB
> XVlB    0    0s              0.00    started query for: ai-49-aHR0cDovL2R4LmRvaS5vcmcvMTAuMTIxMC9qYy4yMDExLTAzODU
> XVlB    1    134.532µs       0.00    found doi for id: 10.1210/jc.2011-0385
> XVlB    2    67.918529ms     0.24    found 0 outbound and 4628 inbound edges
> XVlB    3    32.293723ms     0.12    mapped 4628 dois back to ids
> XVlB    4    3.358704ms      0.01    recorded unmatched ids
> XVlB    5    68.636671ms     0.25    fetched 2567 blob from index data store
> XVlB    6    105.771005ms    0.38    encoded JSON
> XVlB    -    -               -       -
> XVlB    S    278.113164ms    1.00    total
func (*StopWatch) LogTable ¶
func (s *StopWatch) LogTable()
LogTable writes a table using standard library log facilities.
func (*StopWatch) SetEnabled ¶
SetEnabled enables or disables the stopwatch. If disabled, any call will be a no-op.
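A thread-safe event recorder with an enable switch could be structured along these lines. This is a minimal sketch under the assumption that a mutex guards a slice of timestamped entries; the type and field names are hypothetical, and the package's StopWatch may be implemented differently.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// entry is a hypothetical record of a single event.
type entry struct {
	T   time.Time
	Msg string
}

// stopWatch is a minimal, mutex-guarded event recorder; a sketch only.
type stopWatch struct {
	mu      sync.Mutex
	enabled bool
	entries []entry
}

// Record appends a timestamped message; it is a no-op when disabled.
func (s *stopWatch) Record(msg string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if !s.enabled {
		return
	}
	s.entries = append(s.entries, entry{T: time.Now(), Msg: msg})
}

// Recordf is like Record, with fmt-style formatting.
func (s *stopWatch) Recordf(format string, vs ...interface{}) {
	s.Record(fmt.Sprintf(format, vs...))
}

// Elapsed returns the time between the first and the last recorded event.
func (s *stopWatch) Elapsed() time.Duration {
	s.mu.Lock()
	defer s.mu.Unlock()
	if len(s.entries) == 0 {
		return 0
	}
	return s.entries[len(s.entries)-1].T.Sub(s.entries[0].T)
}

func main() {
	sw := &stopWatch{enabled: true}
	sw.Record("started query")
	sw.Recordf("found %d inbound edges", 4628)
	fmt.Println(len(sw.entries), sw.Elapsed() >= 0)
}
```

Checking the enabled flag inside Record keeps call sites free of conditionals, which matches the no-op behavior described for SetEnabled.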
Directories ¶
Path | Synopsis |
---|---|
 | Package cache implements caching helpers, e.g. |
cmd | |
doisniffer | Sniff out DOI from JSON document, optionally update docs with found DOI. |
makta | makta takes a two column TSV file and turns it into an indexed sqlite3 database. |
 | Package doi helps to find DOI in JSON documents. |
 | Package xflag adds an additional flag type Array for repeated string flags. |