git

package module
v2.1.2 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Nov 1, 2022 License: BSD-3-Clause Imports: 13 Imported by: 26

README

go-whosonfirst-iterate-git

Go package implementing go-whosonfirst-iterate/emitter functionality for Git repositories.

Important

Go Reference

Example

package main

import (
	"context"
	"flag"
	"fmt"
	_ "github.com/whosonfirst/go-whosonfirst-iterate-git/v2"
	"github.com/whosonfirst/go-whosonfirst-iterate/v2/emitter"
	"github.com/whosonfirst/go-whosonfirst-iterate/v2/iterator"
	"io"
	"log"
	"os"
	"strings"
	"sync/atomic"
)

func main() {

	valid_schemes := strings.Join(emitter.Schemes(), ",")
	emitter_desc := fmt.Sprintf("A valid whosonfirst/go-whosonfirst-iterate/v2 URI. Supported emitter URI schemes are: %s", valid_schemes)

	var emitter_uri = flag.String("emitter-uri", "git://", emitter_desc)

	flag.Usage = func() {
		fmt.Fprintf(os.Stderr, "Count files in one or more whosonfirst/go-whosonfirst-iterate/v2 sources.\n")
		fmt.Fprintf(os.Stderr, "Usage:\n\t %s [options] uri(N) uri(N)\n", os.Args[0])
		fmt.Fprintf(os.Stderr, "Valid options are:\n\n")
		flag.PrintDefaults()
	}

	flag.Parse()

	var count int64
	count = 0

	emitter_cb := func(ctx context.Context, path string, fh io.ReadSeeker, args ...interface{}) error {
		atomic.AddInt64(&count, 1)
		return nil
	}

	ctx := context.Background()

	iter, _ := iterator.NewIterator(ctx, *emitter_uri, emitter_cb)

	uris := flag.Args()
	iter.IterateURIs(ctx, uris...)

	log.Printf("Counted %d records (saw %d records)\n", count, iter.Seen)
}

Error handling omitted for the sake of brevity.

Tools

$> make cli
go build -mod vendor -o bin/count cmd/count/main.go
go build -mod vendor -o bin/emit cmd/emit/main.go
count

Count files in one or more whosonfirst/go-whosonfirst-iterate/emitter sources.

$> ./bin/count -h
Count files in one or more whosonfirst/go-whosonfirst-iterate/emitter sources.
Usage:
	 ./bin/count [options] uri(N) uri(N)
Valid options are:

  -emitter-uri string
    	A valid whosonfirst/go-whosonfirst-iterate/emitter URI. Supported emitter URI schemes are: directory://,featurecollection://,file://,filelist://,geojsonl://,git://,repo:// (default "git://")
$> ./bin/count \
	https://github.com/sfomuseum-data/sfomuseum-data-architecture.git

2021/02/17 15:54:32 time to index paths (1) 26.076332877s
2021/02/17 15:54:32 Counted 857 records (indexed 857 records)
$> ./bin/count \
	-emitter-uri 'git://?include=properties.mz:is_current=1&include=properties.sfomuseum:placetype=gate' \
	https://github.com/sfomuseum-data/sfomuseum-data-architecture.git

2021/02/17 16:00:17 time to index paths (1) 24.470490474s
2021/02/17 16:00:17 Counted 120 records (indexed 120 records)

By default go-whosonfirst-iterate-git clones Git repositories in to memory. If your emitter URI contains a path then repositories will be cloned in that path:

$> bin/count \
	-emitter-uri 'git:///tmp/data' \
	git@github.com:whosonfirst-data/whosonfirst-data-admin-is.git

2021/02/17 15:56:54 time to index paths (1) 3.742559429s
2021/02/17 15:56:54 Counted 436 records (indexed 436 records)

By default repositories cloned in to a path are removed. If you want to preserve the cloned repository include a ?preserve=1 query parameter in your URI string:

$> bin/count \
	-emitter-uri 'git:///tmp/data?preserve=1' \
	git@github.com:whosonfirst-data/whosonfirst-data-admin-is.git

2021/02/17 15:57:49 time to index paths (1) 3.465746865s
2021/02/17 15:57:49 Counted 436 records (indexed 436 records)

In this example the clone repository will be store in /tmp/data/whosonfirst-data-admin-is.git.

emit

Publish features from one or more whosonfirst/go-whosonfirst-iterate sources.

> ./bin/emit -h
Publish features from one or more whosonfirst/go-whosonfirst-iterate/emitter sources.
Usage:
	 ./bin/emit [options] uri(N) uri(N)
Valid options are:

  -emitter-uri string
    	A valid whosonfirst/go-whosonfirst-iterate/emitter URI. Supported emitter URI schemes are: directory://,featurecollection://,file://,filelist://,geojsonl://,git://,repo:// (default "git://")
  -geojson
    	Emit features as a well-formed GeoJSON FeatureCollection record.
  -json
    	Emit features as a well-formed JSON array.
  -null
    	Publish features to /dev/null
  -stdout
    	Publish features to STDOUT. (default true)

For example:

$> ./bin/emit \
	-geojson \
	-emitter-uri 'git://?include=properties.mz:is_current=1&include=properties.sfomuseum:placetype=gate' \
	https://github.com/sfomuseum-data/sfomuseum-data-architecture.git \

| jq '.features[]["properties"]["wof:label"]'

"C45 (2019)"
"C42A (2019)"
"C48A (2019)"
"F77 (2019)"
"F84D (2019)"
"F84C (2019)"
"F84B (2019)"
"F70A (2019)"
"F84A (2019)
...and so on

See also

Documentation

Overview

Package git implements the `whosonfirst/go-whosonfirst-iterate/v2` interfaces for iterating (Who's On First) documents stored in a Git repository.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewGitEmitter

func NewGitEmitter(ctx context.Context, uri string) (emitter.Emitter, error)

NewGitEmitter() returns a new `GitEmitter` instance configured by 'uri' in the form of:

git://{PATH}?{PARAMETERS}

Where {PATH} is an optional path on disk where a repository will be clone to (default is to clone repository in memory) and {PARAMETERS} may be: * `?include=` Zero or more `aaronland/go-json-query` query strings containing rules that must match for a document to be considered for further processing. * `?exclude=` Zero or more `aaronland/go-json-query` query strings containing rules that if matched will prevent a document from being considered for further processing. * `?include_mode=` A valid `aaronland/go-json-query` query mode string for testing inclusion rules. * `?exclude_mode=` A valid `aaronland/go-json-query` query mode string for testing exclusion rules. * `?preserve=` A boolean value indicating whether a Git repository (cloned to disk) should not be removed after processing.

Types

type GitEmitter

type GitEmitter struct {
	emitter.Emitter
	// contains filtered or unexported fields
}

GitEmitter implements the `Emitter` interface for crawling records in a Git repository.

func (*GitEmitter) WalkURI

func (em *GitEmitter) WalkURI(ctx context.Context, index_cb emitter.EmitterCallbackFunc, uri string) error

WalkURI() walks (crawls) the Git repository identified by 'uri' and for each file (not excluded by any filters specified when `idx` was created) invokes 'index_cb'.

Directories

Path Synopsis
cmd

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL