Scripts
A list of helpful scripts to load data for use in the Search API.
A list of scripts
Retrieve CMD Datasets
This script retrieves the list of datasets stored in a MongoDB instance and checks that the URL of each dataset resource on the ONS website exists before writing the data to a CSV file.
You can run either of the following commands:
If you do not set the flags or environment variables for the MongoDB bind address and the filename, the script uses the defaults localhost:27017 and cmd-datasets.csv respectively.
Load Datasets
This script reads a CSV file (set by flag, environment variable or default value) and stores the dataset data in Elasticsearch. The CSV must contain certain headers, though they can appear in any order.
You can use the Retrieve CMD Datasets script to generate a new CSV file, or use the pre-generated one stored as cmd-datasets.csv.
- Use the Makefile
  - Set the dataset_index, filename and/or elasticsearch_url environment variables with:
    export dataset_index=<elasticsearch index>
    export filename=<file name and location>
    export elasticsearch_url=<elasticsearch bind address>
  - Optionally set the dimensions_filename environment variable (the value should end with .json):
    export dimensions_filename=<filename and location>
  - Optionally set the taxonomy_filename environment variable (the value should end with .json):
    export taxonomy_filename=<filename and location>
- Use the go run command, with or without the -dataset-index, -filename, -dimensions-filename, -taxonomy-filename and/or -elasticsearch_url flags set:
  go run upload-datasets/main.go -dataset-index=<elasticsearch index> -filename=<file name and location> -dimensions-filename=<dimensions file and location> -taxonomy-filename=<taxonomy file name and location> -elasticsearch_url=<elasticsearch bind address>
The taxonomy and dimensions are each stored in a JSON file that the dataset search API reads into memory on startup. These file names and locations must therefore match the API's TAXONOMY_FILENAME and DIMENSIONS_FILENAME environment configuration respectively. For ease of use, run the make commands without editing the flags or setting environment variables for these values.
Retrieve Dataset Taxonomy
This script scrapes the ONS website and builds the taxonomy hierarchy by iterating through its pages.
You can run either of the following commands:
If you do not set the flag or environment variable for the filename, the script uses the default ../taxonomy/taxonomy.json.