Documentation
Overview ¶
Package gnparser implements the main use-case of the project: parsing scientific names. It provides methods to parse one name at a time, a slice of names, or a stream of names. All methods return results in the same order as the input. Because names are parsed concurrently, the original order is restored after parsing completes.
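The order-restoring idea can be illustrated without the library. The following is a minimal sketch, not gnparser's actual implementation: `indexed` and `processOrdered` are hypothetical names, and the "parsing" step is replaced by an arbitrary function. Each input is tagged with its position before being handed to concurrent workers, and that position decides where the result is written.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// indexed pairs a value with its position in the input (hypothetical type).
type indexed struct {
	idx int
	val string
}

// processOrdered applies fn to every input using the given number of
// concurrent workers and returns the results in input order.
func processOrdered(inputs []string, workers int, fn func(string) string) []string {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan indexed)
	results := make(chan indexed)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				results <- indexed{idx: j.idx, val: fn(j.val)}
			}
		}()
	}

	// Feed the jobs, then close results once every worker has finished.
	go func() {
		for i, s := range inputs {
			jobs <- indexed{idx: i, val: s}
		}
		close(jobs)
	}()
	go func() {
		wg.Wait()
		close(results)
	}()

	// Results arrive in arbitrary order; the index puts each one back.
	out := make([]string, len(inputs))
	for r := range results {
		out[r.idx] = r.val
	}
	return out
}

func main() {
	names := []string{"pardosa moesta", "bubo bubo", "homo sapiens"}
	fmt.Println(processOrdered(names, 2, strings.ToUpper))
}
```

Writing each result to `out[r.idx]` keeps the output aligned with the input regardless of which worker finishes first.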
Example ¶
```go
package main

import (
	"fmt"

	"github.com/gnames/gnparser"
	"github.com/gnames/gnparser/ent/parsed"
)

func main() {
	names := []string{"Pardosa moesta Banks, 1892", "Bubo bubo"}
	cfg := gnparser.NewConfig()
	gnp := gnparser.New(cfg)
	res := gnp.ParseNames(names)
	fmt.Println(res[0].Authorship.Verbatim)
	fmt.Println(res[1].Canonical.Simple)
	fmt.Println(parsed.HeaderCSV(gnp.Format()))
	fmt.Println(res[0].Output(gnp.Format()))
}
```

Output:

```
Banks, 1892
Bubo bubo
Id,Verbatim,Cardinality,CanonicalStem,CanonicalSimple,CanonicalFull,Authorship,Year,Quality
e2fdf10b-6a36-5cc7-b6ca-be4d3b34b21f,"Pardosa moesta Banks, 1892",2,Pardosa moest,Pardosa moesta,Pardosa moesta,"Banks, 1892",1892,1
```
Index ¶
- Variables
- func NewPool(cfg Config, size int) chan GNparser
- type Config
- type GNparser
- type Option
- func OptBatchSize(i int) Option
- func OptCode(c nomcode.Code) Option
- func OptDebug(b bool) Option
- func OptFormat(f gnfmt.Format) Option
- func OptIgnoreHTMLTags(b bool) Option
- func OptIsTest(b bool) Option
- func OptJobsNum(i int) Option
- func OptPort(i int) Option
- func OptWithCapitaliation(b bool) Option
- func OptWithDetails(b bool) Option
- func OptWithNoOrder(b bool) Option
- func OptWithPreserveDiaereses(b bool) Option
- func OptWithSpeciesGroupCut(b bool) Option
- func OptWithStream(b bool) Option
- func OptWithWebLogs(b bool) Option
Examples ¶
Constants ¶
This section is empty.
Variables ¶
```go
var (
	// Version is the version of the gnparser package. When Makefile is
	// used, the version is calculated out of Git tags.
	Version = "v1.11.1"
	// Build is a timestamp of when Makefile was used to compile
	// the gnparser code. If go build was used, Build stays empty.
	Build string
)
```
Functions ¶
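The index lists `NewPool(cfg Config, size int) chan GNparser`, which returns a channel of parsers. A buffered channel is a common Go idiom for sharing a fixed number of reusable objects between goroutines. The sketch below shows that idiom with a hypothetical `worker` type standing in for a parser. It illustrates how such a channel might be used and is not a description of how `NewPool` is implemented.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// worker is a hypothetical stand-in for a parser instance.
type worker struct{ id int }

// run does some placeholder work on a string.
func (w worker) run(s string) string { return strings.TrimSpace(s) }

// newPool returns a buffered channel pre-filled with size workers.
func newPool(size int) chan worker {
	pool := make(chan worker, size)
	for i := 0; i < size; i++ {
		pool <- worker{id: i}
	}
	return pool
}

// runAll processes inputs concurrently. At most cap(pool) workers are
// active at any moment because each goroutine must borrow one first.
func runAll(pool chan worker, inputs []string) []string {
	out := make([]string, len(inputs))
	var wg sync.WaitGroup
	for i, s := range inputs {
		wg.Add(1)
		go func(i int, s string) {
			defer wg.Done()
			w := <-pool                  // borrow a worker from the pool
			defer func() { pool <- w }() // hand it back when done
			out[i] = w.run(s)
		}(i, s)
	}
	wg.Wait()
	return out
}

func main() {
	pool := newPool(2)
	fmt.Printf("%q\n", runAll(pool, []string{" Bubo bubo ", "Pardosa moesta "}))
}
```

Receiving from the channel blocks while all workers are busy, so the channel's capacity caps the level of concurrency.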
Types ¶
type Config ¶ added in v1.0.5
```go
type Config struct {
	// BatchSize sets the maximum number of elements in a slice of
	// name-strings.
	BatchSize int

	// Code contains optional nomenclatural code value. This option is
	// useful to solve ambiguous parsing cases and to add cultivar botanical
	// rules.
	nomcode.Code

	// Debug sets a "debug" state for parsing. The debug state forces the
	// output to show the parsed AST tree.
	Debug bool

	// Format sets the output format for CLI and Web interfaces.
	// There are 3 formats available: 'CSV', 'CompactJSON' and
	// 'PrettyJSON'.
	Format gnfmt.Format

	// IgnoreHTMLTags can be set to true when it is desirable to clean up names
	// from a few HTML tags often present in name-strings that were planned to
	// be presented via an HTML page.
	IgnoreHTMLTags bool

	// IsTest can be set to true when parsing functionality is used for tests.
	// In such cases the `ParserVersion` field is presented as `test_version`
	// instead of displaying the actual version of `gnparser`.
	IsTest bool

	// JobsNum sets the level of parallelism used during parsing of
	// a stream of name-strings.
	JobsNum int

	// Port to run the web-service.
	Port int

	// WithCapitalization flag: when true, the first letter of a name-string
	// is capitalized, if appropriate.
	WithCapitalization bool

	// WithDetails can be set to true when the simplified output is not
	// sufficient to obtain the required information.
	WithDetails bool

	// WithNoOrder flag: when true, output and input are in different order.
	WithNoOrder bool

	// WithPreserveDiaereses flag: when true, diaereses will not be
	// transliterated.
	WithPreserveDiaereses bool

	// WithStream changes from parsing batch by batch to parsing one name
	// at a time. When WithStream is true, the BatchSize setting is ignored.
	WithStream bool

	// WithWebLogs flag enables logs when running the web-service. This flag
	// is ignored if the `Port` value is not set.
	WithWebLogs bool

	// WithSpeciesGroupCut flag means that the stemmed version of autonyms
	// (ICN) and species group names (ICZN) will be truncated to species. It
	// helps to simplify matching names like `Aus bus` and `Aus bus bus`.
	WithSpeciesGroupCut bool
}
```
Config keeps settings that might affect how parsing is done, or change the parsing output.
type GNparser ¶ added in v1.0.3
```go
type GNparser interface {
	// ChangeConfig allows modifying the settings of GNparser. Changing
	// settings might modify the parsing process and the final output of
	// results.
	ChangeConfig(opts ...Option) GNparser

	// Debug parses a string and outputs the raw AST tree from the PEG engine.
	Debug(s string) []byte

	// Format returns the currently selected output format (CSV or one of
	// the JSON formats).
	Format() gnfmt.Format

	// GetVersion provides a version and a build timestamp of gnparser.
	GetVersion() gnvers.Version

	// ParseName takes a name-string, and returns parsed results for the name.
	ParseName(string) parsed.Parsed

	// ParseNameStream takes a context, an input channel that supplies
	// name-strings together with their positions in the input, and an
	// output channel for parsed results. Results come in the same order
	// as the input.
	ParseNameStream(context.Context, <-chan nameidx.NameIdx, chan<- parsed.Parsed)

	// ParseNames takes a slice of name-strings, and returns a slice of
	// parsed results in the same order as the input.
	ParseNames([]string) []parsed.Parsed

	// WebLogs reports whether the web-service logs should be shown.
	WebLogs() bool
}
```
GNparser is the main use-case interface. It provides methods required for parsing scientific names.
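ParseNameStream has a channel-in, channel-out shape with ordered results. One way to get ordered output from concurrent workers on a stream is a reorder buffer that holds early results until their turn comes. The sketch below shows that technique with stdlib types only. `nameIdx`, `result`, `stream`, and `streamAll` are hypothetical names, upper-casing stands in for parsing, and nothing here is taken from gnparser's implementation.

```go
package main

import (
	"context"
	"fmt"
	"strings"
	"sync"
)

// nameIdx pairs a name-string with its position in the input (hypothetical).
type nameIdx struct {
	index int
	name  string
}

// result carries a processed value together with its input position.
type result struct {
	index int
	value string
}

// stream processes items from in with several workers and sends the results
// to out in input order. It assumes indexes start at 0 and increase by 1,
// and it closes out when the input is exhausted or ctx is cancelled.
func stream(ctx context.Context, workers int, in <-chan nameIdx, out chan<- string) {
	defer close(out)
	if workers < 1 {
		workers = 1
	}
	unordered := make(chan result)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range in {
				r := result{index: n.index, value: strings.ToUpper(n.name)}
				select {
				case unordered <- r:
				case <-ctx.Done():
					return
				}
			}
		}()
	}
	go func() {
		wg.Wait()
		close(unordered)
	}()

	// Reorder buffer: keep early arrivals until the next index shows up.
	pending := make(map[int]string)
	next := 0
	for r := range unordered {
		pending[r.index] = r.value
		for {
			v, ok := pending[next]
			if !ok {
				break
			}
			select {
			case out <- v:
			case <-ctx.Done():
				return
			}
			delete(pending, next)
			next++
		}
	}
}

// streamAll wires a producer, the stream stage, and a consumer together.
func streamAll(names []string, workers int) []string {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	in := make(chan nameIdx)
	out := make(chan string)
	go func() {
		defer close(in)
		for i, n := range names {
			select {
			case in <- nameIdx{index: i, name: n}:
			case <-ctx.Done():
				return
			}
		}
	}()
	go stream(ctx, workers, in, out)
	var res []string
	for v := range out {
		res = append(res, v)
	}
	return res
}

func main() {
	fmt.Println(streamAll([]string{"bubo bubo", "pardosa moesta"}, 2))
}
```

Unlike collecting everything into a slice, the reorder buffer lets the consumer start receiving results before the whole input has been read.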
type Option ¶ added in v1.0.5
type Option func(*Config)
Option is a type that has to be returned by all Option functions. Such functions are able to modify the settings of a Config object.
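`type Option func(*Config)` follows the functional options pattern. Here is a minimal, self-contained sketch of that pattern. The lower-case `config`, `option`, `optBatchSize`, `optWithDetails`, and `newConfig` names are hypothetical stand-ins, and the default batch size is an arbitrary value chosen for the illustration, not gnparser's default.

```go
package main

import "fmt"

// config is a hypothetical stand-in for a settings struct.
type config struct {
	batchSize   int
	withDetails bool
}

// option modifies a config in place.
type option func(*config)

// optBatchSize returns an option that sets batchSize.
func optBatchSize(i int) option {
	return func(c *config) { c.batchSize = i }
}

// optWithDetails returns an option that sets withDetails.
func optWithDetails(b bool) option {
	return func(c *config) { c.withDetails = b }
}

// newConfig starts from defaults and applies the options in order.
func newConfig(opts ...option) config {
	c := config{batchSize: 1000} // arbitrary default for this sketch
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	cfg := newConfig(optWithDetails(true), optBatchSize(100))
	fmt.Printf("%+v\n", cfg) // prints {batchSize:100 withDetails:true}
}
```

The pattern keeps the constructor signature stable: new settings need only a new option function, and callers that do not care about a setting simply omit it.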
func OptBatchSize ¶ added in v1.0.5
OptBatchSize sets the max number of names in a batch.
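To make "max number of names in a batch" concrete, the following sketch splits a slice into consecutive batches of at most a given size. `batches` is a hypothetical helper written for this illustration; it does not show how gnparser itself groups names.

```go
package main

import "fmt"

// batches splits names into consecutive sub-slices of at most size elements.
// The sub-slices share memory with names; nothing is copied.
func batches(names []string, size int) [][]string {
	if size < 1 {
		size = 1
	}
	var out [][]string
	for start := 0; start < len(names); start += size {
		end := start + size
		if end > len(names) {
			end = len(names)
		}
		out = append(out, names[start:end])
	}
	return out
}

func main() {
	names := []string{"Aus bus", "Bubo bubo", "Homo sapiens", "Pardosa moesta", "Zea mays"}
	for i, b := range batches(names, 2) {
		fmt.Println(i, b)
	}
}
```

With five names and a batch size of 2, the last batch holds the single remaining name.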
func OptFormat ¶ added in v1.0.5
OptFormat sets the formatting option for CLI or Web presentation. It accepts a gnfmt.Format value to control the output format.
func OptIgnoreHTMLTags ¶ added in v1.0.5
OptIgnoreHTMLTags sets the IgnoreHTMLTags field. Set it to true to clean a few common HTML tags out of name-strings before parsing. Leave it false when names with HTML tags should not be parsed, or when such tags are absent from the input data.
func OptWithCapitaliation ¶ added in v1.2.0
OptWithCapitaliation sets the WithCapitalization field.
func OptWithDetails ¶ added in v1.0.5
OptWithDetails sets the WithDetails field.
func OptWithNoOrder ¶ added in v1.0.9
OptWithNoOrder sets the WithNoOrder field.
func OptWithPreserveDiaereses ¶ added in v1.5.6
OptWithPreserveDiaereses sets the WithPreserveDiaereses field.
func OptWithSpeciesGroupCut ¶ added in v1.9.0
OptWithSpeciesGroupCut sets the WithSpeciesGroupCut field.
func OptWithStream ¶ added in v1.0.5
OptWithStream sets the WithStream field.
func OptWithWebLogs ¶ added in v1.6.0
OptWithWebLogs sets the WithWebLogs field.
Source Files
Directories
| Path | Synopsis |
|---|---|
| | Package main provides C-binding functionality to use parser in other languages. |
| ent | |
| internal/preprocess | Package preprocess performs preparsing filtering and modification of a scientific-name. |
| nameidx | Package nameidx provides a structure that preserves original position of a name-string in an input slice. |
| parsed | Package parsed provides a user-friendly output of parsing result, as well as functions to convert the result to CSV or JSON-encoded strings. |
| parser | Package parser provides entities and methods to perform Parsing Expression Grammar parsing on scientific names. |
| stemmer | http://snowballstem.org/otherapps/schinke/ http://caio.ueberalles.net/a_stemming_algorithm_for_latin_text_databases-schinke_et_al.pdf |
| str | Package str provides functions for manipulating scientific name-strings. |
| cmd | Package cmd creates a command line application for parsing scientific names. |
| io | |
| dict | Package dict provides lookup data for gnparser. |
| web | Package web provides RESTful API service and a website for gnparser. |