mobsql

package module
v0.5.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 29, 2024 License: GPL-2.0 Imports: 7 Imported by: 4

README

Mobsql

builds.sr.ht status

Mobsql is a Go library and commandline application which facilitates loading one or multiple Mobility Database source GTFS archives into a SQLite database. Its internal SQLite schema mirrors GTFS's spec but adds a source field to each table (thus allowing multiple sources to be loaded to the database simulatenously). Mobsql's API is designed to be as simple as possible: the user defines a filter specification to determine which source(s) should be addressed, and from there can download, compute, purge, and query the status(es) of the specified one or more Mobility DB sources with a single command or library call.

While primarily developed to be used by Mobroute, the minimal GTFS router and related project, mobsql itself is a fully independent tool and can be used as a standalone general-purpose GTFS-to-SQLite ETL & import utility (either via CLI or as a Go library).

Supported functionality:

  • Downloads & imports GTFS ZIP archives into a local SQLite database with an additional 'source' field in all GTFS tables correlating to mdb_source_id; thus allowing multiple sources to be stored without conflict.
  • Supports bulk import (e.g. 1-insert-for-multiple rows) functionality to decrease load time.
  • Supports a caching system, storing a checksum of the imported GTFS table(s) such that successive imports on the same data nops effectively if there would be no net change.
  • In addition to downloading functionality, can be used to compute contrived data (similar to materialized views), purge, and query status information about sources.
  • Features specification of a filter to determine a 'sourceset' of Mobility Database sources to import rather then making the user manually look up mdbids.
  • Simply models the database schema including: GTFS specification import rules, Mobility Database catalog (CSV) imported as table, internal tracking tables, and SQLite views in a single file - see schema datastructure.
  • Implements conversion logic building atop the GTFS schema for fields which can be stored more efficiently (such as stop_times's departure_time/arrival_time as integers) rather then as colon time strings.
  • Allows the creation of SQL views (currently used for internal tracking).
  • Implements automatic index creation based on schema specification.
  • Implements automatic creation of computed tables based on view logic (e.g. similar to the concept of materialized views).
  • Implements loading of non-Mobility Database GTFS (e.g. local) ZIP archive sources via passing custom Mobility Database mock CSV (which allowing loading file:// URIs as ZIP archive locations for GTFS).

Planned functionality per roadmap (but not-yet implemented):

  • Implement cleanup logic to gracefully handle partial / interrupted loads (e.g. killing process during GTFS zip extraction & SQL import / operations).

Installation

Mobsql functions as both a commandline and as a Go library.

  • Installation for usage a CLI application:
    • Clone repo: git clone https://git.sr.ht/~mil/mobsql
    • Build executable: go build -tags=sqlite_math_functions cli/mobsql.go
    • Run via: ./mobsql
    • Optionally install to $PATH etc.: cp mobsql /usr/local/bin
    • See Commandline Documentation for further information on usage
  • Installing as Go module to use in your Go project as a library:
    • Change directory to your go project's root directory: cd foo
    • Add via go get: go get git.sr.ht/~mil/mobsql
    • See Go Module Documentation for information on Go library / module API

Request Configuration

The API commands for Load, Compute, Purge, and Status work based on on a configuration object. See the Go documentation for information on encodable fields.

Souceset Filter Functionality

Mobsql's commands (purge, load, compute, status) are built around the idea of addressing a (subset) of the Mobility Database Catalog with a single command. Rather then manually specifying the specific sources you want to address by the identifier in the Mobility Database (mdbid) in cases where you want to work with multiple sources, you instead define a 'sourceset filter' which is translated to a 'sourceset' of mdbids.

The basic logic behind the sourceset filter concept is:

  • {country: FR, municipality: Paris} (filter) -> [1026 1069 1159 1283 1291 1314](mdbids)

By default (with a sourceset filter specified as {}); the entire catalog is addressed (e.g. all ~1400 sources). As such, in almost all cases you will want to specify particular properties on the filter to more specifically address whichever sources you wish to perform the requested command / operation on. Some examples might be that maybe you want might want to load all GTFS sources for France ({country: FR}), 2 specific mdbids (mdbid: [510, 515]), or address only sources you've already loaded previously in the database ({loaded: true}) so that you can update these sources.

In the CLI the sourceset filter is expected to be passed as a YAML object (see below); and meanwhile for the Go API, the sourceset filter is passed as a struct for the Load, Compute, Status, and Purge commands.

YAML Sourceset Filter Specification:

Filter is defined as a YAML object. Note all string properties use a substring incasesensitive match policy. The following properties are valid and will be use in a combinatory fashion (e.g. properties translate to AND conditions):

  • country: References the country per location.country_code in schema (string)
  • municipality: References the municipality per location.municipality in schema (string)
  • subdivision: References the subdivision per location.subdivision_name in schema (string)
  • mdbid: References the unique mobility database source id per mdb_source_id in schema ([]int)
  • provider: References the unique mobility database provider provider in schema (string)
  • name: References the unique mobility database provider name in schema (string)
  • glob: Wildcard match {country,municipality,subdivision,provider,name} (sring)
  • coords: Bounding box match ensuring contains coordinates ([][2]float)
  • maxkm: Bounding box match ensuring bbox area is less then specification (float)

YAML Sourceset Filter Examples:

  • {mdbid: [510, 516]}: Mdbid is 510 or 516
  • {country: BE}: BE (Belgium) country matches
  • {municipality: paris}: Paris municipality matches
  • {subdivision: ontario}: Ontario subdivsion matches
  • {glob: foo}: Matches for {country,municipality,subdivision,provider,name} containing subtring foo
  • {coords: [[40.512764, -74.251961]]}: Matches where coordinate is within source bounding box
  • {maxkm: 20}: Matches with bounding box maximum size of 20km (kilometers)
  • {loaded: true}: Matches where sources are in DB/ previously downloaded (useful for updates, querying etc.)
  • {loaded: true, country: FR}: Matches where source previously loaded & country matches FR
  • {}: All sources / entire catalog

Commandline Documentation

Commandline documentation is available via running mobsql -h; for reference the generated documentation is available below:

Usage:
  ./mobsql [FLAGS]

Flags:
  -c string
        Command: should be one of {status,load,compute,purge}; note, you must pass -f when using -c.
        ---
        load: Load downloads the associated GTFS zip archives from the MobilityDB's
        URL; and then performs an import, pulling each GTFS Zip CSV file into
        the local SQLite database. Note that this operation is cached (e.g.
        subsequent requests with the same source will skip tables already imported
        if underlying GTFS CSV file checksums are unchanged).

        compute: Compute recomputes the application's computed tables (contrived from the
        source loaded GTFS tables) specified in application's configuration. Each
        computed table is based on a underlying view (see config). The concept of
        a computed table itself is similar to concept of a SQL materialized view;
        however 'recomputation' occurs on a per-source level. So you don't need
        to 'rematerialize' an entire table.

        status: Status queries the database for sources that match the given sourceset.
        This is a read-only operation which can be used to determine the effected
        sources for subsequent requests with the same sourceset.

        purge: Purge removes (deletes) all associated data from underlying GTFS tables
        in the SQLite database for the given sourceset.

  -dprof string
        Write debug pprof CPU profile to file
  -f string
        YAML configuration specification to apply to -c command.
        Pass as a YAML object - see Go documentation for encodable fields.
        The 'filter' property nested within can containing optional properties:
          {mdbid, country, municipality, subdivision, provider, name, glob, bbox, maxkm, loaded}
        Example YAML configuration specifications:
          -f '{}'
          -f '{filter: {mdbid: [510, 516]}}'
          -f '{filter: {country: BE}}'
          -f '{filter: {municipality: paris}}'
          -f '{filter: {subdivision: ontario}}'
          -f '{filter: {glob: foo}}'
          -f '{filter: {coords: [[40.512764, -74.251961]]}}'
          -f '{filter: {maxkm: 20}}'
          -f '{filter: {loaded: true}}'
          -f '{filter: {glob: foo, country: US}}'

  -h    Display help information
  -v    Display version information
Examples:
  mobsql -f '{filter: {}}' -c status (View all sources status)
  mobsql -f '{filter: {loaded: true}}' -c status (View all loaded sources status)
  mobsql -f '{filter: {glob: nyc}}' -c status (View all sources matching glob of nyc)
  mobsql -f '{filter: {country: FR}}' -c load (Load all souces matching France country)
  mobsql -f '{filter: {country: FR}}' -c purge (Purge database of all sources matching France country)
  mobsql -f '{filter: {}}' -c purge (Purge database of all sources)
  mobsql -h (Display help text)
  mobsql -v (Display version)

Go Module Documentation

Documentation for the Go module is available through pkg.go.dev. Note that the major functionalities of the go package (Load, Compute, Purge, and Status) are also available through the commandline application.

https://pkg.go.dev/git.sr.ht/~mil/mobsql

Documentation

Overview

Package mobsql is an interface for downloading and loading one or multiple GTFS archives (pulled from the Mobility Database catalog) into a SQLite database.

The database (seeded through mobsql) mirrors GTFS's specification (e.g. there is a transfers, stop_times, stops, agency tables etc.); however each GTFS table also has an additional 'source' column. The source column refers to Mobility Database mdb_id field. Besides that, GTFS schema is imported as 1-to-1 for GTFS schedule specification sans several exceptions (such as stop_times conversion of departure_time to int). Exceptions can be seen by examining the config package's Schema variable (and noting Conversion fields set on LoadColumn specs).

The only package endconsumers of this API should import and use directly is the top level git.sr.ht/~mil/mobsql; everything is aliased from there. Subpackages are internal implementations & may change between versions. See aliased package implementations for documentation on each function and type.

There are 6 primary functions exposed by mobsql's API:

  • InitializeRuntime: Initializes a Mobsql 'runtime' which represents a a connection to the SQLite DB & configuration params which all other API operations depend on.

  • FilterToSourceset: Converts a 'filter specification' to a set of MDBIDs which all Sourceset* operations operate based on. Filter specification is a broad way of 'searching' the Mobility Database for arbitrary GTFS sources to operate on.

  • Sourceset{Load,Compute,Purge,Status}: The main logic of the application. Allows load (downloading & importing GTFS to the DB), status (checking the GTFS sources that have been loaded), compute (converting SQL views into materialized tables based on the origin GTFS tables), and purge (removing GTFS data from the SQLite database).

The general usage pattern for using Mobsql as an API is:

  1. Create a runtime via InitializeRuntime()

  2. Determine which MDBID sources you will use either by using FilterToSourceset() OR by manually referencing the Mobility Database (https://database.mobilitydata.org/) for the MDBID needed

  3. Run Sourceset{Load,Compute,Status,Purge} by passing the runtime and the MDBIDs

The CLI (cli package) implements all the above functionality (1-3) and should be understood as a good reference implementation for using Mobsql as a Go library in practice.

Index

Constants

This section is empty.

Variables

View Source
var FilterToSourceset = apifilter.FilterToSourceset
View Source
var InitializeRuntime = apiinit.InitializeRuntime
View Source
var SourcesetCompute = apicompute.SourcesetCompute
View Source
var SourcesetLoad = apiload.SourcesetLoad
View Source
var SourcesetPurge = apipurge.SourcesetPurge
View Source
var SourcesetStatus = apistatus.SourcesetStatus

Functions

This section is empty.

Types

type ComputedTable added in v0.4.0

type ComputedTable = apptypes.ComputedTable

type LoadColumn added in v0.4.0

type LoadColumn = apptypes.LoadColumn

type MobsqlRuntime added in v0.4.0

type MobsqlRuntime = apptypes.MobsqlRuntime

type PurgeTablesOption added in v0.5.0

type PurgeTablesOption = apipurge.PurgeTablesOption
var PurgeTablesOptionAll PurgeTablesOption = apipurge.PurgeTablesOptionAll
var PurgeTablesOptionComputed PurgeTablesOption = apipurge.PurgeTablesOptionComputed
var PurgeTablesOptionGTFS PurgeTablesOption = apipurge.PurgeTablesOptionGTFS

type RuntimeConfig added in v0.4.0

type RuntimeConfig = apptypes.RuntimeConfig

type SchemaExtra added in v0.4.0

type SchemaExtra = apptypes.SchemaExtra

type SourceStatus

type SourceStatus = apistatus.SourceStatus

type SourcesetFilter

type SourcesetFilter = apifilter.SourcesetFilter

type SourcesetOpResult

type SourcesetOpResult = apptypes.SourcesetOpResult

type TableSpec added in v0.4.0

type TableSpec = apptypes.TableSpec

type View added in v0.4.0

type View = apptypes.View

Directories

Path Synopsis
api
Package main is the CLI interface for mobsql consuming the mobsql API.
Package main is the CLI interface for mobsql consuming the mobsql API.
Package config contains the data structures used to represent the database schema for seeding and loading
Package config contains the data structures used to represent the database schema for seeding and loading
util
utildownload
Package utildownload contains some internal download functions utilized by the primary mobsql app logic.
Package utildownload contains some internal download functions utilized by the primary mobsql app logic.
utilfiles
Package utildownload contains internal helper functions related to file IO utilized by the primary mobsql app logic.
Package utildownload contains internal helper functions related to file IO utilized by the primary mobsql app logic.
utilfuncs
Package utildownload contains internal helper functions utilized by the primary mobsql app logic.
Package utildownload contains internal helper functions utilized by the primary mobsql app logic.
utillog
Package utillog is a simple logging interface with three different types of log messages that can be enabled via the 3 variables LogInfo, LogWarn, and LogDebug.
Package utillog is a simple logging interface with three different types of log messages that can be enabled via the 3 variables LogInfo, LogWarn, and LogDebug.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL