Documentation ¶
Overview ¶
Package span implements common functions.
Copyright 2015 by Leipzig University Library, http://ub.uni-leipzig.de
The Finc Authors, http://finc.info
Martin Czygan, <martin.czygan@uni-leipzig.de>
This file is part of some open source application.
Some open source application is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
Some open source application is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with some open source application. If not, see <http://www.gnu.org/licenses/>.
@license GPL-3.0+ <http://spdx.org/licenses/GPL-3.0+>
Index ¶
- Constants
- Variables
- func DetectLang3(text string) (string, error)
- func LanguageIdentifier(s string) string
- func LoadSet(r io.Reader, m map[string]struct{}) error
- func ReadLines(filename string) (lines []string, err error)
- func UnescapeTrim(s string) string
- type FileReader
- type LinkReader
- type SavedLink
- type SavedReaders
- type Skip
- type SkipReader
- type WriteCounter
- type ZipContentReader
- type ZipOrPlainLinkReader
Constants ¶
const (
	// AppVersion of span package. Commandline tools will show this on -v.
	AppVersion = "0.1.193"

	// KeyLengthLimit is a limit imposed by memcached protocol, which is used
	// for blob storage as of June 2015. If we change the key value store,
	// this limit might become obsolete.
	KeyLengthLimit = 250
)
Variables ¶
var ISO639BibliographicToThree = map[string]string{
"alb": "sqi",
"arm": "hye",
"baq": "eus",
"bur": "mya",
"chi": "zho",
"cze": "ces",
"dut": "nld",
"fre": "fra",
"geo": "kat",
"ger": "deu",
"gre": "ell",
"ice": "isl",
"mac": "mkd",
"mao": "mri",
"may": "msa",
"per": "fas",
"rum": "ron",
"slo": "slk",
"tib": "bod",
"wel": "cym",
}
ISO639BibliographicToThree maps ISO 639-2 bibliographic (B) identifiers to the corresponding three-letter ISO 639-3 identifiers.
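As an illustration of how such a mapping might be applied, the sketch below normalizes a language code by looking it up in a small excerpt of the table above. The lowercase names `bibliographicToThree` and `normalize` are hypothetical and not part of the package; passing unknown codes through unchanged is an assumption made for this example.

```go
package main

import "fmt"

// bibliographicToThree is a small excerpt of ISO639BibliographicToThree,
// copied here so the example is self-contained.
var bibliographicToThree = map[string]string{
	"chi": "zho",
	"fre": "fra",
	"ger": "deu",
}

// normalize is a hypothetical helper. It returns the mapped code for a
// known bibliographic code and the input unchanged otherwise (assumption).
func normalize(code string) string {
	if three, ok := bibliographicToThree[code]; ok {
		return three
	}
	return code
}

func main() {
	for _, code := range []string{"ger", "fre", "eng"} {
		fmt.Printf("%s -> %s\n", code, normalize(code))
	}
}
```

Codes such as "eng" have no separate bibliographic form, so they pass through as-is in this sketch.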
var ISO639NameToThree = map[string]string{}/* 7849 elements not displayed */
ISO639NameToThree maps a language name to a three-letter identifier.
var ISO639NameToThreeLower = map[string]string{}/* 7850 elements not displayed */
var ISO639OneToThree = map[string]string{}/* 184 elements not displayed */
ISO639OneToThree maps a two-letter 639-1 identifier, where one exists, to a three-letter 639-3 identifier.
var ISSNPattern = regexp.MustCompile(`[0-9]{4,4}-[0-9]{3,3}[0-9X]`)
ISSNPattern is a regular expression matching standard ISSN.
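A short sketch of how the expression could be used to pull ISSN candidates out of free text. The pattern is copied from the declaration above; `issnPattern` and `findISSNs` are names made up for this example. Note that the expression is unanchored and does not verify the check digit, so matches are candidates rather than validated ISSNs.

```go
package main

import (
	"fmt"
	"regexp"
)

// issnPattern repeats the expression shown for ISSNPattern.
var issnPattern = regexp.MustCompile(`[0-9]{4,4}-[0-9]{3,3}[0-9X]`)

// findISSNs is a hypothetical helper returning every substring of s
// that has the shape of an ISSN.
func findISSNs(s string) []string {
	return issnPattern.FindAllString(s, -1)
}

func main() {
	text := "print 0028-0836, online 1476-4687, other 2049-363X"
	for _, issn := range findISSNs(text) {
		fmt.Println(issn)
	}
}
```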
Functions ¶
func DetectLang3 ¶
DetectLang3 returns the best guess 3-letter language code for a given text.
func LanguageIdentifier ¶ added in v0.1.130
LanguageIdentifier returns the three-letter identifier for any string. All data is from http://www-01.sil.org/iso639-3/codes.asp.
func LoadSet ¶ added in v0.1.130
LoadSet reads the content from a reader and adds each line to the given set.
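The following is a minimal sketch of a function with the same signature, not the package's actual implementation. `loadSet` and `setFromString` are hypothetical names; trimming whitespace and ignoring blank lines are assumptions, since the documentation does not say how such lines are treated.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// loadSet is a hypothetical stand-in with the signature of LoadSet.
// Assumption: lines are trimmed and blank lines are ignored.
func loadSet(r io.Reader, m map[string]struct{}) error {
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue
		}
		m[line] = struct{}{}
	}
	return scanner.Err()
}

// setFromString is a convenience wrapper used only in this example.
func setFromString(s string) (map[string]struct{}, error) {
	m := make(map[string]struct{})
	err := loadSet(strings.NewReader(s), m)
	return m, err
}

func main() {
	issns, err := setFromString("0028-0836\n1476-4687\n\n0028-0836\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	_, ok := issns["1476-4687"]
	fmt.Println(len(issns), ok)
}
```

Using `map[string]struct{}` as the set type means membership tests are a plain map lookup and duplicate lines collapse into one entry.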
func ReadLines ¶ added in v0.1.130
ReadLines returns a list of trimmed lines in a file. Empty lines are skipped.
func UnescapeTrim ¶
UnescapeTrim unescapes HTML character references and trims the space of a given string.
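One plausible way to get this behavior with the standard library is to combine `html.UnescapeString` and `strings.TrimSpace`. The sketch below is an assumption about the approach, not the package's source; `unescapeTrim` is a hypothetical name, and unescaping before trimming is a choice made for this example.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// unescapeTrim is a hypothetical stand-in for UnescapeTrim. It first
// resolves HTML character references, then strips surrounding whitespace.
func unescapeTrim(s string) string {
	return strings.TrimSpace(html.UnescapeString(s))
}

func main() {
	fmt.Printf("%q\n", unescapeTrim("  Tom &amp; Jerry &lt;1940&gt; \n"))
}
```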
Types ¶
type FileReader ¶ added in v0.1.130
type FileReader struct {
	Filename string
	// contains filtered or unexported fields
}
FileReader creates a ReadCloser from a filename. It postpones error handling until the first read. TODO: Throw this out.
func (*FileReader) Close ¶ added in v0.1.130
func (r *FileReader) Close() (err error)
Close closes the file.
type LinkReader ¶ added in v0.1.130
type LinkReader struct {
	Link string
	// contains filtered or unexported fields
}
LinkReader implements io.Reader for a URL.
type SavedLink ¶ added in v0.1.130
type SavedLink struct {
	Link string
	// contains filtered or unexported fields
}
SavedLink saves the content of a URL to a file.
type SavedReaders ¶ added in v0.1.130
SavedReaders takes a list of readers and persists their content in a temporary file.
func (*SavedReaders) Remove ¶ added in v0.1.130
func (r *SavedReaders) Remove()
Remove removes any leftover temporary file.
func (*SavedReaders) Save ¶ added in v0.1.130
func (r *SavedReaders) Save() (filename string, err error)
Save saves all readers to a temporary file and returns the filename.
type SkipReader ¶ added in v0.1.130
type SkipReader struct {
	CommentPrefixes []string
	// contains filtered or unexported fields
}
SkipReader skips empty lines and lines with comments.
func NewSkipReader ¶ added in v0.1.130
func NewSkipReader(r *bufio.Reader) *SkipReader
NewSkipReader creates a new SkipReader.
func (SkipReader) ReadString ¶ added in v0.1.130
func (r SkipReader) ReadString(delim byte) (s string, err error)
ReadString will return only non-empty lines and lines not starting with a comment prefix.
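The filtering described above can be sketched as a thin wrapper around `bufio.Reader`. This is a hypothetical reimplementation for illustration only: `skipReader`, `skip`, and `collect` are invented names, and using "#" as the comment prefix is an assumption, since the documentation does not list any default prefixes.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// skipReader is a hypothetical stand-in for SkipReader.
type skipReader struct {
	CommentPrefixes []string
	r               *bufio.Reader
}

// skip reports whether a line is blank or starts with a comment prefix.
func (s skipReader) skip(line string) bool {
	trimmed := strings.TrimSpace(line)
	if trimmed == "" {
		return true
	}
	for _, p := range s.CommentPrefixes {
		if strings.HasPrefix(trimmed, p) {
			return true
		}
	}
	return false
}

// ReadString returns the next line that is neither blank nor a comment.
// As with bufio.Reader, a final line may be returned together with io.EOF.
func (s skipReader) ReadString(delim byte) (string, error) {
	for {
		line, err := s.r.ReadString(delim)
		if !s.skip(line) {
			return line, err
		}
		if err != nil {
			return "", err
		}
	}
}

// collect reads all remaining lines from input; used only in this example.
func collect(input string, prefixes []string) ([]string, error) {
	sr := skipReader{CommentPrefixes: prefixes, r: bufio.NewReader(strings.NewReader(input))}
	var lines []string
	for {
		line, err := sr.ReadString('\n')
		if line != "" {
			lines = append(lines, strings.TrimSpace(line))
		}
		if err == io.EOF {
			return lines, nil
		}
		if err != nil {
			return lines, err
		}
	}
}

func main() {
	lines, err := collect("# header\n\nfirst\n   \n# note\nsecond", []string{"#"})
	fmt.Println(lines, err)
}
```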
type WriteCounter ¶ added in v0.1.130
type WriteCounter struct {
// contains filtered or unexported fields
}
WriteCounter counts the number of bytes written through it.
func (*WriteCounter) Count ¶ added in v0.1.130
func (w *WriteCounter) Count() uint64
Count returns the number of bytes written.
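A counter like this is typically an `io.Writer` that records the length of each write. The sketch below assumes that design; `writeCounter` is a hypothetical name, and the real type may differ (for example, it might update its counter atomically for concurrent use).

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// writeCounter is a hypothetical stand-in for WriteCounter.
// It is not safe for concurrent use.
type writeCounter struct {
	count uint64
}

// Write records the number of bytes and discards the data.
func (w *writeCounter) Write(p []byte) (int, error) {
	w.count += uint64(len(p))
	return len(p), nil
}

// Count returns the number of bytes written so far.
func (w *writeCounter) Count() uint64 {
	return w.count
}

func main() {
	var buf bytes.Buffer
	wc := &writeCounter{}
	// MultiWriter sends every write to both the buffer and the counter.
	if _, err := io.Copy(io.MultiWriter(&buf, wc), strings.NewReader("hello, span")); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(wc.Count(), buf.String())
}
```

Pairing the counter with `io.MultiWriter` lets the byte count be taken without changing the code that produces the output.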
type ZipContentReader ¶ added in v0.1.130
type ZipContentReader struct {
	Filename string
	// contains filtered or unexported fields
}
ZipContentReader returns the concatenated content of all files in a zip archive given by its filename. All content is temporarily stored in memory, so this type should only be used with smaller archives.
type ZipOrPlainLinkReader ¶ added in v0.1.130
type ZipOrPlainLinkReader struct {
	Link string
	// contains filtered or unexported fields
}
ZipOrPlainLinkReader is a reader that transparently handles zipped and uncompressed content, given a URL as string.
Directories ¶
Path | Synopsis |
---|---|
cmd | |
span-check | span-check runs quality checks on input data |
span-crossref-snapshot | Given a single file with crossref works API messages, create a potentially smaller file, which contains only the most recent version of each document. |
span-export | span-export creates various destination formats, mostly for SOLR. |
span-import | span-reshape is a dumbed down span-import. |
span-join-assets | span-join-assets combines a directory of JSON or single-column TSV configurations into a single file. |
span-oa-filter | span-oa-filter will set x.oa to true, if the ISSN of the record is contained in a given ISSN list. |
span-redact | redact intermediate schema |
span-tag | span-tag takes an intermediate schema file and a configuration forest of filters for various tags and runs all filters on every record of the input to produce a stream of tagged records. |
span-update-labels | span-update-labels takes a TSV of IDs and ISILs and updates the x.labels field of an intermediate schema record accordingly. |
| | Package sets implements basic set types. |
encoding | |
csv | Package csv implements a decoder that supports CSV decoding. |
formeta | Package formeta implements marshaling for formeta (metafacture internal format). |
tsv | Package tsv implements a decoder for tab separated data. |
| | Package filter implements flexible ISIL attachments with expression trees[1], serialized as JSON. |
formats | |
doaj | Package doaj maps DOAJ metadata to intermediate schema. |
dummy | Package dummy is just a minimal example. |
elsevier | TODO. |
jstor | TODO. |
| | Package licensing implements support for KBART and ISIL attachments. |
kbart | Package kbart implements support for KBART (Knowledge Bases And Related Tools working group, http://www.uksg.org/kbart/) holding files (http://www.uksg.org/kbart/s5/guidelines/data_format). |
| | Package parallel implements helpers for fast processing of line oriented inputs. |
| | Package quality implements quality checks. |
| | Package sift implements various filters that help with the record labelling (ISIL) process. |