Documentation ¶
Overview ¶
Package gnparser implements the main use case of the project: parsing scientific names. It provides methods to parse one name at a time, a slice of names, or a stream of names. By default, all methods return results in the same order as the input. Because parsing runs concurrently, the original order is restored after the parsing step completes.
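The order-restoring idea described above can be illustrated with a minimal sketch. It is not gnparser's implementation: the `indexed` type, the `processOrdered` function, and the use of `strings.ToUpper` as a stand-in for parsing are all assumptions made for illustration. Each input keeps its position, and every worker writes its result into the slot that matches that position.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// indexed pairs an input string with its position in the input slice.
// It is a hypothetical helper type for this sketch.
type indexed struct {
	idx  int
	name string
}

// processOrdered applies fn to every name using several goroutines and
// returns the results in the same order as the input.
func processOrdered(names []string, workers int, fn func(string) string) []string {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan indexed)
	out := make([]string, len(names))

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				// Each index is written by exactly one goroutine,
				// so no extra locking is needed here.
				out[j.idx] = fn(j.name)
			}
		}()
	}

	for i, n := range names {
		jobs <- indexed{idx: i, name: n}
	}
	close(jobs)
	wg.Wait()
	return out
}

func main() {
	names := []string{"Pardosa moesta Banks, 1892", "Bubo bubo"}
	// strings.ToUpper stands in for the real parsing step.
	for _, r := range processOrdered(names, 4, strings.ToUpper) {
		fmt.Println(r)
	}
}
```

Writing into a preallocated slice by index avoids a separate sorting pass, at the cost of holding all results in memory until the batch is complete.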
Example ¶
    package main

    import (
        "fmt"

        "github.com/gnames/gnparser"
        "github.com/gnames/gnparser/ent/parsed"
    )

    func main() {
        names := []string{"Pardosa moesta Banks, 1892", "Bubo bubo"}
        cfg := gnparser.NewConfig()
        gnp := gnparser.New(cfg)
        res := gnp.ParseNames(names)
        fmt.Println(res[0].Authorship.Verbatim)
        fmt.Println(res[1].Canonical.Simple)
        fmt.Println(parsed.HeaderCSV(gnp.Format()))
        fmt.Println(res[0].Output(gnp.Format()))
    }
Output:

    Banks, 1892
    Bubo bubo
    Id,Verbatim,Cardinality,CanonicalStem,CanonicalSimple,CanonicalFull,Authorship,Year,Quality
    e2fdf10b-6a36-5cc7-b6ca-be4d3b34b21f,"Pardosa moesta Banks, 1892",2,Pardosa moest,Pardosa moesta,Pardosa moesta,"Banks, 1892",1892,1
Index ¶
- Variables
- func NewPool(cfg Config, size int) chan GNparser
- type Config
- type GNparser
- type Option
- func OptBatchSize(i int) Option
- func OptDebug(b bool) Option
- func OptFormat(s string) Option
- func OptIgnoreHTMLTags(b bool) Option
- func OptIsTest(b bool) Option
- func OptJobsNum(i int) Option
- func OptPort(i int) Option
- func OptWithCapitaliation(b bool) Option
- func OptWithCultivars(b bool) Option
- func OptWithDetails(b bool) Option
- func OptWithNoOrder(b bool) Option
- func OptWithPreserveDiaereses(b bool) Option
- func OptWithSpeciesGroupCut(b bool) Option
- func OptWithStream(b bool) Option
- func OptWithWebLogs(b bool) Option
Examples ¶
Constants ¶
This section is empty.
Variables ¶
    var (
        // Version is the version of the gnparser package. When Makefile is
        // used, the version is calculated out of Git tags.
        Version = "v1.10.4"

        // Build is a timestamp of when Makefile was used to compile
        // the gnparser code. If go build was used, Build stays empty.
        Build string
    )
Functions ¶
Types ¶
type Config ¶ added in v1.0.5
    type Config struct {
        // BatchSize sets the maximum number of elements in names-strings slice.
        BatchSize int

        // Debug sets a "debug" state for parsing. The debug state forces the
        // output format to show the parsed AST tree.
        Debug bool

        // Format sets the output format for CLI and Web interfaces.
        // There are 3 formats available: 'CSV', 'CompactJSON' and
        // 'PrettyJSON'.
        Format gnfmt.Format

        // IgnoreHTMLTags can be set to true when it is desirable to clean up names
        // from a few HTML tags often present in names-strings that were planned to
        // be presented via an HTML page.
        IgnoreHTMLTags bool

        // IsTest can be set to true when parsing functionality is used for tests.
        // In such cases the `ParserVersion` field is presented as `test_version`
        // instead of displaying the actual version of `gnparser`.
        IsTest bool

        // JobsNum sets a level of parallelism used during parsing of
        // a stream of name-strings.
        JobsNum int

        // Port to run web-service.
        Port int

        // WithCapitalization flag, when true, the first letter of a name-string
        // is capitalized, if appropriate.
        WithCapitalization bool

        // WithCultivars flag, when true, cultivar names will be parsed and
        // modify cardinality, normalized and canonical output.
        WithCultivars bool

        // WithDetails can be set to true when a simplified output is not sufficient
        // for obtaining the required information.
        WithDetails bool

        // WithNoOrder flag, when true, output and input are in different order.
        WithNoOrder bool

        // WithPreserveDiaereses flag, when true, diaereses will not be transliterated.
        WithPreserveDiaereses bool

        // WithStream changes from parsing a batch by batch, to parsing one name
        // at a time. When WithStream is true, BatchSize setting is ignored.
        WithStream bool

        // WithWebLogs flag enables logs when running web-service. This flag is
        // ignored if `Port` value is not set.
        WithWebLogs bool

        // WithSpeciesGroupCut flag means that stemmed version of autonyms (ICN) and
        // species group names (ICZN) will be truncated to species. It helps to
        // simplify matching names like `Aus bus` and `Aus bus bus`.
        WithSpeciesGroupCut bool
    }
Config keeps settings that might affect how parsing is done, or change the parsing output.
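The BatchSize setting caps how many name-strings are handled together. The following is only a sketch of what such batching could look like. The `batches` function is hypothetical and is not taken from gnparser's source.

```go
package main

import "fmt"

// batches splits names into consecutive slices of at most size elements.
// It is a hypothetical helper; a size below 1 is treated as 1.
func batches(names []string, size int) [][]string {
	if size < 1 {
		size = 1
	}
	var res [][]string
	for start := 0; start < len(names); start += size {
		end := start + size
		if end > len(names) {
			end = len(names)
		}
		// The sub-slices share memory with names; nothing is copied.
		res = append(res, names[start:end])
	}
	return res
}

func main() {
	names := []string{"Aus bus", "Aus bus bus", "Bubo bubo", "Pardosa moesta", "Homo sapiens"}
	for i, b := range batches(names, 2) {
		fmt.Println(i, len(b), b)
	}
}
```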
type GNparser ¶ added in v1.0.3
    type GNparser interface {
        // ChangeConfig allows to modify settings of GNparser. Changing settings
        // might modify parsing process, and the final output of results.
        ChangeConfig(opts ...Option) GNparser

        // Debug parses a string and outputs raw AST tree from PEG engine.
        Debug(s string) []byte

        // Format returns the currently chosen output format (JSON or CSV).
        Format() gnfmt.Format

        // GetVersion provides a version and a build timestamp of gnparser.
        GetVersion() gnvers.Version

        // ParseName takes a name-string, and returns parsed results for the name.
        ParseName(string) parsed.Parsed

        // ParseNameStream takes a context, an input channel that supplies
        // name-strings with their positions in the input, and an output
        // channel for parsed results. Results come in the same order as
        // the input.
        ParseNameStream(context.Context, <-chan nameidx.NameIdx, chan<- parsed.Parsed)

        // ParseNames takes a slice of name-strings, and returns a slice of
        // parsed results in the same order as the input.
        ParseNames([]string) []parsed.Parsed

        // WebLogs reports whether the web-service logs should be shown.
        WebLogs() bool
    }
GNparser is the main use-case interface. It provides methods required for parsing scientific names.
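For a stream, results cannot be placed into a preallocated slice, so keeping the input order requires some form of reordering buffer. The sketch below shows one common way to do this with channels. It is an assumption about the general technique, not a copy of how ParseNameStream works internally, and the `result` type and `reorder` function are hypothetical.

```go
package main

import "fmt"

// result is a hypothetical stand-in for a parsed record that remembers
// the position of its name in the input stream.
type result struct {
	idx   int
	value string
}

// reorder reads results that may arrive in any order and emits them
// in ascending idx order, starting from 0. It closes out when in is drained.
func reorder(in <-chan result, out chan<- result) {
	defer close(out)
	pending := make(map[int]result)
	next := 0
	for r := range in {
		pending[r.idx] = r
		// Flush every buffered result that is now contiguous
		// with what has already been sent.
		for {
			p, ok := pending[next]
			if !ok {
				break
			}
			delete(pending, next)
			out <- p
			next++
		}
	}
}

func main() {
	in := make(chan result)
	out := make(chan result)
	go reorder(in, out)
	go func() {
		defer close(in)
		// Results arrive out of order, as they might from concurrent workers.
		for _, r := range []result{{2, "c"}, {0, "a"}, {1, "b"}} {
			in <- r
		}
	}()
	for r := range out {
		fmt.Println(r.idx, r.value)
	}
}
```

The buffer only grows while an earlier result is still missing, so memory use stays small when workers finish at roughly the same pace.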
type Option ¶ added in v1.0.5
type Option func(*Config)
Option is a type that has to be returned by all Option functions. Such functions are able to modify the settings of a Config object.
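The `func(*Config)` shape is the usual Go functional-options pattern. A self-contained sketch of the pattern follows. The `settings` type, the lower-case option functions, and the default value are hypothetical stand-ins chosen for illustration, not gnparser's own definitions.

```go
package main

import "fmt"

// settings is a hypothetical, reduced stand-in for Config.
type settings struct {
	BatchSize   int
	WithDetails bool
}

// option mirrors the shape of Option: a function that mutates settings.
type option func(*settings)

// optBatchSize returns an option that sets the BatchSize field.
func optBatchSize(i int) option {
	return func(s *settings) { s.BatchSize = i }
}

// optWithDetails returns an option that sets the WithDetails field.
func optWithDetails(b bool) option {
	return func(s *settings) { s.WithDetails = b }
}

// newSettings starts from defaults and applies the options in order,
// so a later option overrides an earlier one for the same field.
func newSettings(opts ...option) settings {
	s := settings{BatchSize: 1000} // arbitrary default chosen for this sketch
	for _, opt := range opts {
		opt(&s)
	}
	return s
}

func main() {
	s := newSettings(optBatchSize(10), optWithDetails(true))
	fmt.Printf("%+v\n", s)
}
```

The pattern keeps the constructor signature stable: new settings can be added as new option functions without breaking existing callers.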
func OptBatchSize ¶ added in v1.0.5
OptBatchSize sets the max number of names in a batch.
func OptFormat ¶ added in v1.0.5
OptFormat takes a string (one of 'csv', 'compact', 'pretty') to set the output format for the CLI or Web presentation. If any other string is given, the default 'CSV' format is set and a warning is issued.
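The fallback behaviour described for OptFormat can be pictured with a small sketch. The `format` type, its constants, and `parseFormat` are hypothetical, and exact, case-sensitive matching is an assumption. The real option works with gnfmt.Format values.

```go
package main

import (
	"fmt"
	"log"
)

// format is a hypothetical stand-in for an output-format enumeration.
type format int

const (
	csv format = iota
	compactJSON
	prettyJSON
)

// parseFormat maps a string to a format. Unknown strings fall back
// to csv and produce a warning in the log.
func parseFormat(s string) format {
	switch s {
	case "csv":
		return csv
	case "compact":
		return compactJSON
	case "pretty":
		return prettyJSON
	default:
		log.Printf("unknown format %q, falling back to CSV", s)
		return csv
	}
}

func main() {
	fmt.Println(parseFormat("pretty") == prettyJSON)
	fmt.Println(parseFormat("xml") == csv)
}
```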
func OptIgnoreHTMLTags ¶ added in v1.0.5
OptIgnoreHTMLTags sets the IgnoreHTMLTags field. Set it to true when a few HTML tags often present in name-strings should be cleaned up before parsing.
func OptWithCapitaliation ¶ added in v1.2.0
OptWithCapitaliation sets the WithCapitalization field.
func OptWithCultivars ¶ added in v1.3.0
OptWithCultivars sets the WithCultivars field.
func OptWithDetails ¶ added in v1.0.5
OptWithDetails sets the WithDetails field.
func OptWithNoOrder ¶ added in v1.0.9
OptWithNoOrder sets the WithNoOrder field.
func OptWithPreserveDiaereses ¶ added in v1.5.6
OptWithPreserveDiaereses sets the WithPreserveDiaereses field.
func OptWithSpeciesGroupCut ¶ added in v1.9.0
OptWithSpeciesGroupCut sets the WithSpeciesGroupCut field.
func OptWithStream ¶ added in v1.0.5
OptWithStream sets the WithStream field.
func OptWithWebLogs ¶ added in v1.6.0
OptWithWebLogs sets the WithWebLogs field.
Source Files ¶
Directories ¶
Path | Synopsis |
---|---|
 | Package main provides C-binding functionality to use parser in other languages. |
ent | |
internal/preprocess | Package preprocess performs preparsing filtering and modification of a scientific-name. |
nameidx | Package nameidx provides a structure that preserves original position of a name-string in an input slice. |
parsed | Package parsed provides a user-friendly output of parsing result, as well as functions to convert the result to CSV or JSON-encoded strings. |
parser | Package parser provides entities and methods to perform Parsing Expression Grammar parsing on scientific names. |
stemmer | http://snowballstem.org/otherapps/schinke/ http://caio.ueberalles.net/a_stemming_algorithm_for_latin_text_databases-schinke_et_al.pdf |
str | Package str provides functions for manipulating scientific name-strings. |
cmd | Package cmd creates a command line application for parsing scientific names. |
io | |
dict | Package dict provides lookup data for gnparser. |
web | Package web provides RESTful API service and a website for gnparser. |