Documentation ¶
Overview ¶
Package cache implements caching helpers, e.g. an sqlite3 based cache. Caching is important for the most cited items, these can take seconds to assemble; so we cache the serialized JSON in sqlite3 and serve subsequent requests from there.
Without cache:
| Rank | Links | T | |---------+-------+------| | ~5000 | 2999 | 2.8s | | ~10000 | 2108 | 3.5s | | ~50000 | 937 | 1.2s | | ~100000 | 659 | 0.8s | | ~150000 | 538 | 0.6s |
A data point: The hundert most expensive ids take 175s to request (in parallel). After caching, this time reduces to 2.78s. Individual requests from cache are in the 1-10ms range.
Another data point: Warming the cache with the most expensive 150K DOI takes less than 2h.
$ time zstd -qcd -T0 /usr/share/labe/data/OpenCitationsRanked/current | \ awk '{ print $2 }' | head -n 150000 | shuf | \ parallel -j 32 -I {} 'curl -sL "http://localhost:8000/doi/{}"' > /dev/null real 103m36.376s user 21m57.202s sys 18m15.376s
The cache database (with zstd compressed values) is about 8GB in size.
Index ¶
Constants ¶
This section is empty.
Variables ¶
Functions ¶
This section is empty.
Types ¶
type Cache ¶
type Cache struct { Path string MaxFileSize int64 // Lock applies to both, db and readOnly. sync.Mutex // contains filtered or unexported fields }
Cache is a minimalistic cache based on sqlite. In the future, values could be transparently compressed as well.