Documentation ¶
Overview ¶
Package parsed provides a user-friendly output of parsing results, as well as functions to convert the results to CSV or JSON-encoded strings.
Index ¶
- Variables
- func HeaderCSV(f gnfmt.Format) string
- func NormalizeByType(wrd string, wt WordType) string
- type Annotation
- type Approximation
- type AuthGroup
- type Authors
- type Authorship
- type Canonical
- type Comparison
- type Details
- type DetailsApproximation
- type DetailsComparison
- type DetailsGraftChimeraFormula
- type DetailsHybridFormula
- type DetailsInfraspecies
- type DetailsSpecies
- type DetailsUninomial
- type Infraspecies
- type InfraspeciesElem
- type Parsed
- type ParsedWithIdx
- type QualityWarning
- type Species
- type Uninomial
- type Warning
- type Word
- type WordType
- type Year
Constants ¶
This section is empty.
Variables ¶
var WarningQualityMap = map[Warning]int{
	TailWarn: 4,
	ApostrOtherWarn: 3,
	AuthAmbiguousFiliusWarn: 2,
	AuthDoubleParensWarn: 4,
	AuthEmendWarn: 2,
	AuthEmendWithoutDotWarn: 3,
	AuthExWarn: 2,
	AuthInWarn: 2,
	AuthExWithDotWarn: 3,
	AuthInWithDotWarn: 3,
	AuthMissingOneParensWarn: 4,
	AuthQuestionWarn: 4,
	AuthShortWarn: 3,
	AuthUnknownWarn: 2,
	AuthUpperCaseWarn: 2,
	BacteriaMaybeWarn: 1,
	BotanyAuthorNotSubgenWarn: 2,
	CandidatusName: 2,
	CanonicalApostropheWarn: 3,
	CapWordQuestionWarn: 4,
	CharBadWarn: 2,
	ContainsIgnoredAnnotation: 3,
	CultivarEpithetWarn: 2,
	DashOtherWarn: 2,
	DotEpithetWarn: 3,
	GenusAbbrWarn: 4,
	GenusUpperCharAfterDash: 2,
	GraftChimeraCharNoSpaceWarn: 3,
	GraftChimeraFormulaIncompleteWarn: 4,
	GraftChimeraFormulaProbIncompleteWarn: 2,
	GraftChimeraFormulaWarn: 2,
	GraftChimeraNamedWarn: 2,
	GreekLetterInRank: 2,
	HTMLTagsEntitiesWarn: 3,
	HybridCharNoSpaceWarn: 3,
	HybridFormulaIncompleteWarn: 4,
	HybridFormulaProbIncompleteWarn: 2,
	HybridFormulaWarn: 2,
	HybridNamedWarn: 2,
	LowCaseWarn: 4,
	NameApproxWarn: 4,
	NameComparisonWarn: 4,
	RankUncommonWarn: 3,
	SpaceNonStandardWarn: 2,
	SpanishAndAsSeparator: 2,
	SpeciesNumericWarn: 3,
	SubgenusAbbrWarn: 2,
	SuperspeciesWarn: 2,
	UTF8ConvBadWarn: 4,
	UninomialComboWarn: 2,
	UninomialWithRank: 2,
	WhiteSpaceTrailWarn: 2,
	YearCharWarn: 2,
	YearDotWarn: 2,
	YearMisplacedWarn: 3,
	YearOrigMisplacedWarn: 3,
	YearPageWarn: 2,
	YearParensWarn: 2,
	YearQuestionWarn: 2,
	YearRangeWarn: 3,
	YearSqBracketsWarn: 3,
}
WarningQualityMap assigns quality of parsing for each warning type.
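The quality rules stated in the ParseQuality field of Parsed (0 when parsing failed, 1 when there are no problems, otherwise the quality of the most severe warning) can be sketched against a few entries copied from this map. The `warning` type, its three constants, and `parseQuality` are hypothetical stand-ins, not the package's exported API.

```go
package main

import "fmt"

// warning is a hypothetical stand-in for parsed.Warning.
type warning int

const (
	tailWarn warning = iota
	apostrOtherWarn
	bacteriaMaybeWarn
)

// qualityMap copies three entries from WarningQualityMap.
var qualityMap = map[warning]int{
	tailWarn:          4,
	apostrOtherWarn:   3,
	bacteriaMaybeWarn: 1,
}

// parseQuality returns 0 when parsing failed, 1 when parsing succeeded
// without warnings, and otherwise the quality of the most severe warning.
func parseQuality(parsed bool, ws []warning) int {
	if !parsed {
		return 0
	}
	q := 1
	for _, w := range ws {
		if wq := qualityMap[w]; wq > q {
			q = wq
		}
	}
	return q
}

func main() {
	fmt.Println(parseQuality(false, nil))                                 // 0
	fmt.Println(parseQuality(true, nil))                                  // 1
	fmt.Println(parseQuality(true, []warning{bacteriaMaybeWarn}))         // 1
	fmt.Println(parseQuality(true, []warning{apostrOtherWarn, tailWarn})) // 4
}
```

Note that BacteriaMaybeWarn maps to 1, so a name whose only warning is that one still reports the best quality.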
Functions ¶
func NormalizeByType ¶ added in v1.5.4
func NormalizeByType(wrd string, wt WordType) string
NormalizeByType is useful when searching for a single word. In such cases, specific epithets match better when stemmed, authors and genera when lower-cased, and authors when their periods are stripped.
The wrd argument is supposed to be taken from the `Words` field of the `Parsed` output.
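A sketch of the type-dependent dispatch described above, limited to the lower-casing and period-stripping rules. The `wordType` constants and `normalizeByType` are hypothetical; stemming of specific epithets is deliberately left out because the package's stemmer is not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
)

// wordType is a hypothetical stand-in for parsed.WordType.
type wordType int

const (
	genusType wordType = iota
	authorType
	specEpithetType
)

// normalizeByType lower-cases genera and authors and strips periods from
// authors. Specific epithets are returned unchanged because this sketch
// does not implement stemming.
func normalizeByType(wrd string, wt wordType) string {
	switch wt {
	case genusType:
		return strings.ToLower(wrd)
	case authorType:
		return strings.ToLower(strings.ReplaceAll(wrd, ".", ""))
	default:
		return wrd
	}
}

func main() {
	fmt.Println(normalizeByType("Homo", genusType))          // homo
	fmt.Println(normalizeByType("L.", authorType))           // l
	fmt.Println(normalizeByType("sapiens", specEpithetType)) // sapiens
}
```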
Types ¶
type Annotation ¶
type Annotation int
Annotations are additional descriptions of a name type.
const (
	// NoAnnot is absence of additional descriptions.
	NoAnnot Annotation = iota
	// SurrogateAnnot is a miscellaneous informal name.
	SurrogateAnnot
	// ComparisonAnnot is a name with a comparison marker (cf.).
	ComparisonAnnot
	// ApproximationAnnot is a name with an approximation annotation (sp., spp. etc.).
	ApproximationAnnot
	// BOLDAnnot is a surrogate name created by the BOLD project.
	BOLDAnnot
	// HybridAnnot is a miscellaneous hybrid name.
	HybridAnnot
	// NamedHybridAnnot is a stable hybrid in botany with a registered name.
	NamedHybridAnnot
	// HybridFormulaAnnot is a hybrid created by a combination of 2 or more names.
	HybridFormulaAnnot
	// NothoHybridAnnot is a hybrid with notho- 'ranks'.
	NothoHybridAnnot
	// GraftChimeraAnnot is a miscellaneous graft-chimera name.
	GraftChimeraAnnot
	// GraftChimeraFormulaAnnot is a graft-chimera created by the combination of 2 or more names.
	GraftChimeraFormulaAnnot
	// NamedGraftChimeraAnnot is a stable graft-chimera in botany with a registered name.
	NamedGraftChimeraAnnot
)
func (Annotation) MarshalJSON ¶
func (a Annotation) MarshalJSON() ([]byte, error)
MarshalJSON implements json.Marshaler. It will encode null if this Int is null.
func (Annotation) String ¶
func (a Annotation) String() string
String implements the fmt.Stringer interface.
func (*Annotation) UnmarshalJSON ¶
func (a *Annotation) UnmarshalJSON(bs []byte) error
UnmarshalJSON implements json.Unmarshaler.
type Approximation ¶
type Approximation struct {
	// Genus is the genus of a name.
	Genus string `json:"genus"`
	// Species is a specific epithet of a name.
	Species string `json:"species,omitempty"`
	// Cultivar is a value of a cultivar of a binomial.
	Cultivar string `json:"cultivar,omitempty"`
	// SpeciesAuthorship is the authorship of Species.
	SpeciesAuthorship *Authorship `json:"authorship,omitempty"`
	// ApproxMarker describes what kind of approximation it is (sp., spp. etc.).
	ApproxMarker string `json:"approximationMarker,omitempty"`
	// Ignored is the part of a name after ApproxMarker.
	Ignored string `json:"ignored,omitempty"`
}
Approximation contains details for a surrogate approximation name.
type AuthGroup ¶
type AuthGroup struct {
	// Authors is a slice of strings containing found authors.
	Authors []string `json:"authors"`
	// Year is provided only if "with_details=true". It is the year of the
	// original publication. If a range of years is provided, the start year
	// is kept, with the isApproximate flag set to true.
	Year *Year `json:"year,omitempty"`
	// ExAuthors is provided only if "with_details=true". It is a "special"
	// group of authors that sometimes appears in scientific names after the
	// "ex" qualifier.
	ExAuthors *Authors `json:"exAuthors,omitempty"`
	// InAuthors is provided only if "with_details=true". It is a "special"
	// group of authors that sometimes appears in scientific names after the
	// "in" qualifier.
	InAuthors *Authors `json:"inAuthors,omitempty"`
	// EmendAuthors is provided only if "with_details=true". It is a "special"
	// group of authors that sometimes appears in scientific names after the
	// "emend." qualifier.
	EmendAuthors *Authors `json:"emendAuthors,omitempty"`
}
AuthGroup is provided only if config.WithDetails is true. It is a group of authors belonging to a particular nomenclatural event. We distinguish two possible situations when AuthGroup is used:
- original - authors of the original description of a name
- combination - authors of a new combination, rank etc.
type Authors ¶
type Authors struct {
	// Authors is a slice of strings containing found authors of an AuthGroup.
	Authors []string `json:"authors"`
	// Year of publication by the AuthGroup.
	Year *Year `json:"year,omitempty"`
}
Authors contains information about authors and a year of publication.
type Authorship ¶
type Authorship struct {
	// Verbatim is an authorship string without modifications.
	Verbatim string `json:"verbatim"`
	// Normalized is a normalized value of the authorship.
	Normalized string `json:"normalized"`
	// Year is a string representing a year of original description of the name.
	// The year number is surrounded by parentheses "(1758)", in cases when a
	// year is approximate.
	Year string `json:"year,omitempty"`
	// Authors is a slice containing each author as an element.
	Authors []string `json:"authors,omitempty"`
	// Original is an AuthGroup that contains authors of the original
	// description of a name.
	Original *AuthGroup `json:"originalAuth,omitempty"`
	// Combination is an AuthGroup that contains authors of new combination,
	// rank etc.
	Combination *AuthGroup `json:"combinationAuth,omitempty"`
}
Authorship describes the provided metainformation about the authors of a name. Sometimes authorship is provided for several elements of a name, for example in "Agalinis purpurea (L.) Briton var. borealis (Berg.) Peterson 1987".
The authorship provided outside of the "details" section belongs to the most fine-grained element of a name ("var. borealis" for the example above).
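The original/combination distinction can be illustrated on the species-level part of the example, "(L.) Briton": the parenthesized author described the name originally, and the trailing author made the new combination. The sketch below uses a regular expression that handles only the simple "(Original) Combination" shape; `authGroups` and `splitAuthorship` are hypothetical helpers, and the package is not claimed to parse authorship this way.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// authGroups is a hypothetical, simplified view of the Original and
// Combination fields of Authorship.
type authGroups struct {
	Original    string
	Combination string
}

// parenAuth matches a parenthesized author group followed by an optional
// trailing author group, e.g. "(L.) Briton".
var parenAuth = regexp.MustCompile(`^\(([^)]*)\)\s*(.*)$`)

// splitAuthorship treats authors in parentheses as original authors and
// the rest as combination authors. Without parentheses, all authors are
// treated as original authors.
func splitAuthorship(s string) authGroups {
	s = strings.TrimSpace(s)
	if m := parenAuth.FindStringSubmatch(s); m != nil {
		return authGroups{
			Original:    strings.TrimSpace(m[1]),
			Combination: strings.TrimSpace(m[2]),
		}
	}
	return authGroups{Original: s}
}

func main() {
	fmt.Printf("%+v\n", splitAuthorship("(L.) Briton")) // {Original:L. Combination:Briton}
	fmt.Printf("%+v\n", splitAuthorship("L."))          // {Original:L. Combination:}
}
```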
type Canonical ¶
type Canonical struct {
	// Stemmed is the most "normalized" and simplified version of the name.
	// Species epithets are stripped of suffixes, "j" character converted to "i",
	// "v" character converted to "u" according to "Schinke R, Greengrass M,
	// Robertson AM and Willett P (1996)"
	//
	// It is most useful to match names when a variability in suffixes is
	// possible.
	Stemmed string `json:"stemmed"`
	// Simple is a simplified version of a name where some elements like ranks,
	// or hybrid signs "×" are omitted (hybrid signs are present for hybrid
	// formulas).
	//
	// It is most useful to match names in general.
	Simple string `json:"simple"`
	// Full is a canonical form that keeps hybrid signs "×" for named
	// hybrids and shows infra-specific ranks.
	//
	// It is most useful for detection of the best matches from
	// multiple results. It is also recommended for displaying
	// canonical forms of botanical names.
	Full string `json:"full"`
}
Canonical are simplified forms of a name-string more suitable for matching and comparing name-strings than the verbatim version.
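The field comments suggest a matching strategy: Full to pick the best of several results, Simple for general matching, and Stemmed when suffixes may vary. A hypothetical `matchLevel` helper that tries the forms from strictest to loosest might look like this; the canonical strings in the example are hand-written for illustration, not produced by the parser.

```go
package main

import "fmt"

// canonical mirrors the three fields of parsed.Canonical.
type canonical struct {
	Stemmed, Simple, Full string
}

// matchLevel reports the strictest canonical form on which a and b agree,
// or "" if they agree on none.
func matchLevel(a, b canonical) string {
	switch {
	case a.Full != "" && a.Full == b.Full:
		return "full"
	case a.Simple != "" && a.Simple == b.Simple:
		return "simple"
	case a.Stemmed != "" && a.Stemmed == b.Stemmed:
		return "stemmed"
	default:
		return ""
	}
}

func main() {
	// Hand-written forms; the stemmed values are illustrative only.
	a := canonical{Stemmed: "Aus bu cu", Simple: "Aus bus cus", Full: "Aus bus var. cus"}
	b := canonical{Stemmed: "Aus bu cu", Simple: "Aus bus cus", Full: "Aus bus subsp. cus"}
	c := canonical{Stemmed: "Aus bu cu", Simple: "Aus bum cum", Full: "Aus bum cum"}
	fmt.Println(matchLevel(a, b)) // simple
	fmt.Println(matchLevel(a, c)) // stemmed
}
```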
type Comparison ¶
type Comparison struct {
	// Genus is used if no species information is given.
	Genus string `json:"genus,omitempty"`
	// Species are details for the binomial part of a name.
	*Species
	// InfraSpecies is an infraspecific epithet of a name.
	InfraSpecies *InfraspeciesElem `json:"infraspecies,omitempty"`
	// CompMarker is the comparison marker, usually "cf.".
	CompMarker string `json:"comparisonMarker"`
}
Comparison contains details for a surrogate comparison name.
type Details ¶
type Details interface {
// contains filtered or unexported methods
}
Details is a placeholder interface that unifies the details of various name types.
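Because Details exposes only unexported methods, code outside the package cannot implement it directly, and a consumer can branch on the concrete type with a type switch. The sketch reproduces that pattern with local types; the marker method `isDetails` and the trimmed-down structs are hypothetical, and only the type names and their roles come from this page.

```go
package main

import (
	"fmt"
	"strings"
)

// details mimics a sealed interface: types from other packages cannot
// declare the unexported isDetails method.
type details interface{ isDetails() }

type detailsUninomial struct{ Uninomial string }
type detailsSpecies struct{ Genus, Species string }
type detailsHybridFormula struct{ HybridFormula []details }

func (detailsUninomial) isDetails()     {}
func (detailsSpecies) isDetails()       {}
func (detailsHybridFormula) isDetails() {}

// describe renders details with a type switch, recursing into the
// elements of a hybrid formula.
func describe(d details) string {
	switch v := d.(type) {
	case detailsUninomial:
		return v.Uninomial
	case detailsSpecies:
		return v.Genus + " " + v.Species
	case detailsHybridFormula:
		parts := make([]string, 0, len(v.HybridFormula))
		for _, e := range v.HybridFormula {
			parts = append(parts, describe(e))
		}
		return strings.Join(parts, " × ")
	default:
		return "unknown"
	}
}

func main() {
	f := detailsHybridFormula{HybridFormula: []details{
		detailsSpecies{Genus: "Aus", Species: "bus"},
		detailsSpecies{Genus: "Aus", Species: "cus"},
	}}
	fmt.Println(describe(f)) // Aus bus × Aus cus
}
```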
type DetailsApproximation ¶
type DetailsApproximation struct {
	// Approximation details.
	Approximation Approximation `json:"approximation"`
}
DetailsApproximation contains details for approximation surrogate names.
type DetailsComparison ¶
type DetailsComparison struct {
	// Comparison details.
	Comparison Comparison `json:"comparison"`
}
DetailsComparison contains details for comparison surrogate names.
type DetailsGraftChimeraFormula ¶ added in v1.5.0
type DetailsGraftChimeraFormula struct {
GraftChimeraFormula []Details `json:"graftChimeraFormula"`
}
DetailsGraftChimeraFormula contains details for graft-chimera formula names.
type DetailsHybridFormula ¶
type DetailsHybridFormula struct {
HybridFormula []Details `json:"hybridFormula"`
}
DetailsHybridFormula contains details for hybrid formula names.
type DetailsInfraspecies ¶
type DetailsInfraspecies struct {
	// Infraspecies details.
	Infraspecies Infraspecies `json:"infraspecies"`
}
DetailsInfraspecies contains multinomial details.
type DetailsSpecies ¶
type DetailsSpecies struct {
	// Species contains details for binomial names.
	Species Species `json:"species"`
}
DetailsSpecies contains binomial details.
type DetailsUninomial ¶
type DetailsUninomial struct {
	// Uninomial details.
	Uninomial Uninomial `json:"uninomial"`
}
DetailsUninomial contains uninomial details.
type Infraspecies ¶
type Infraspecies struct {
	// Species are details for the binomial part of a name.
	Species
	// Infraspecies is a slice of infraspecific epithets of a name.
	Infraspecies []InfraspeciesElem `json:"infraspecies,omitempty"`
}
Infraspecies contains details for names with cardinality higher than 2.
type InfraspeciesElem ¶
type InfraspeciesElem struct {
	// Value of an infraspecific epithet.
	Value string `json:"value"`
	// Rank of the infraspecific epithet.
	Rank string `json:"rank,omitempty"`
	// Authorship of the infraspecific epithet.
	Authorship *Authorship `json:"authorship,omitempty"`
}
InfraspeciesElem contains details for an infraspecific epithet of an Infraspecies name.
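Cardinality counts the elements of a canonical form. For the nested Species/Infraspecies structs it can be sketched as genus plus specific epithet plus one per infraspecific epithet; the subgenus is not counted, consistent with Species (which has a Subgenus field) being described as cardinality 2. The trimmed-down types and the `cardinality` helper are hypothetical illustrations.

```go
package main

import "fmt"

// species and infraspecies are trimmed-down, hypothetical copies of the
// Species and Infraspecies structs documented on this page.
type species struct {
	Genus, Subgenus, Species string
}

type infraspeciesElem struct {
	Value, Rank string
}

type infraspecies struct {
	species
	Infraspecies []infraspeciesElem
}

// cardinality counts genus and specific epithet (2) plus one element per
// infraspecific epithet. Subgenus and ranks are not counted.
func cardinality(i infraspecies) int {
	return 2 + len(i.Infraspecies)
}

func main() {
	n := infraspecies{
		species:      species{Genus: "Agalinis", Species: "purpurea"},
		Infraspecies: []infraspeciesElem{{Value: "borealis", Rank: "var."}},
	}
	fmt.Println(cardinality(n)) // 3
}
```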
type Parsed ¶
type Parsed struct {
	// Parsed is false if parsing did not succeed.
	Parsed bool `json:"parsed"`
	// NomCode modifies parsing rules according to the provided nomenclatural code.
	NomCode string `json:"nomenclaturalCode,omitempty"`
	// ParseQuality is a number that represents the quality of the
	// parsing.
	//
	//  0 - name-string is not parseable
	//  1 - no parsing problems encountered
	//  2 - small parsing problems
	//  3 - serious parsing problems
	//  4 - severe problems, name could not be parsed completely
	//
	// The ParseQuality is equal to the quality of the most
	// severe warning (see qualityWarnings). If no problems
	// are encountered, and the parsing succeeded, the parseQuality
	// is set to 1. If parsing failed, the parseQuality is 0.
	ParseQuality int `json:"quality"`
	// QualityWarnings contains encountered parsing problems.
	QualityWarnings []QualityWarning `json:"qualityWarnings,omitempty"`
	// Verbatim is the input name-string without modifications.
	Verbatim string `json:"verbatim"`
	// Normalized is a normalized version of the input name-string.
	Normalized string `json:"normalized,omitempty"`
	// Canonical are simplified versions of a name-string more suitable for
	// matching and comparing name-strings than the verbatim version.
	Canonical *Canonical `json:"canonical,omitempty"`
	// Cardinality allows sorting and partitioning names according to the
	// number of elements in their canonical forms.
	//
	//  0 - cardinality cannot be calculated
	//  1 - uninomial
	//  2 - binomial
	//  3 - trinomial
	//  4 - quadrinomial
	Cardinality int `json:"cardinality"`
	// Rank provides information about the rank of the name. It is not
	// always possible to infer rank correctly, so this field will be
	// omitted when the data for it does not exist.
	Rank string `json:"rank,omitempty"`
	// Authorship describes provided metainformation about authors of a name.
	// This authorship provided outside of Details belongs to
	// the most fine-grained element of a name.
	Authorship *Authorship `json:"authorship,omitempty"`
	// Bacteria is not nil if the input name has a genus
	// that is registered as bacterial. Possible values are
	// "maybe", if the genus has homonyms in other groups, and
	// "yes", if the GNparser dictionary does not detect any homonyms.
	//
	// The bacterial names often contain strain information, which is
	// not parseable and is placed into the "tail" field.
	Bacteria *tb.Tribool `json:"bacteria,omitempty"`
	// Candidatus indicates that the parsed string is a candidatus bacterial name.
	Candidatus bool `json:"candidatus,omitempty"`
	// Virus is set to true in case the name is not parsed, and probably
	// belongs to a wide variety of sub-cellular entities like
	//
	//  - viruses
	//  - plasmids
	//  - prions
	//  - RNA
	//  - DNA
	//
	// Viruses are the vast majority in this group of names,
	// and as a result they gave the field its (very imprecise) name.
	//
	// We do plan to create a parser for viruses at some point,
	// which will expand this group into more precise categories.
	Virus bool `json:"virus,omitempty"`
	// Cultivar is true if a name was parsed as a cultivar.
	Cultivar bool `json:"cultivar,omitempty"`
	// DaggerChar is true if a name-string includes the '†' rune.
	// This rune might mean a fossil, or be an indication of the clade extinction.
	DaggerChar bool `json:"daggerChar,omitempty"`
	// Hybrid is not nil if a name is detected as one of the hybrids
	//
	//  - a non-categorized hybrid
	//  - named hybrid
	//  - notho- hybrid
	//  - hybrid formula
	Hybrid *Annotation `json:"hybrid,omitempty"`
	// GraftChimera is not nil if a name is detected as one of the graft chimeras
	//
	//  - a non-categorized graft chimera
	//  - named graft chimera
	//  - graft chimera formula
	GraftChimera *Annotation `json:"graftchimera,omitempty"`
	// Surrogate is not nil if a name is detected as one of the surrogates
	//
	//  - a non-categorized surrogate
	//  - surrogate names from the BOLD project
	//  - comparisons (Homo cf. sapiens)
	//  - approximations (names for specimens that are not fully identified)
	Surrogate *Annotation `json:"surrogate,omitempty"`
	// Tail is an unparseable tail of a name. It might contain "junk",
	// annotations, malformed parts of a scientific name, taxonomic concept
	// indications, bacterial strains etc. If there is an unparseable tail, the
	// quality of the name-parsing is set to the worst category.
	Tail string `json:"tail,omitempty"`
	// Details contain more fine-grained information about the parsed name.
	Details Details `json:"details,omitempty"`
	// Words contain a description of every parsed word of a name.
	Words []Word `json:"words,omitempty"`
	// VerbatimID is a UUID v5 generated from the verbatim value of the
	// input name-string. Every unique string always generates the same
	// UUID.
	VerbatimID string `json:"id"`
	// ParserVersion is the version number of the GNparser.
	ParserVersion string `json:"parserVersion"`
}
Parsed is the result of a scientific name-string parsing. It can be converted into JSON or CSV formats.
func (*Parsed) RestoreAmbiguous ¶ added in v1.5.1
The RestoreAmbiguous method is used for cases where specific or infraspecific epithets had to be changed to be parsed successfully. Such a situation arises when an epithet is the same as a word that is also an annotation, a prefix or suffix of an author name, etc.
type ParsedWithIdx ¶
The ParsedWithIdx structure contains the parsing output, its position in the slice, and an unexpected error, if one happened during parsing.
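Carrying the index alongside each result lets concurrent workers finish in any order while the caller still reassembles outputs in input order. The sketch shows that idea with a hypothetical `resultWithIdx` type and a stand-in `parse` function that only upper-cases strings; it is not taken from the package's implementation.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// resultWithIdx is a hypothetical analogue of ParsedWithIdx.
type resultWithIdx struct {
	idx int
	res string
	err error
}

// parse is a stand-in for real name parsing.
func parse(s string) (string, error) {
	if s == "" {
		return "", fmt.Errorf("empty input")
	}
	return strings.ToUpper(s), nil
}

// parseAll parses inputs concurrently and restores input order using
// the index carried by every result.
func parseAll(inputs []string, workers int) ([]string, []error) {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan int)
	results := make(chan resultWithIdx)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				r, err := parse(inputs[i])
				results <- resultWithIdx{idx: i, res: r, err: err}
			}
		}()
	}
	go func() {
		for i := range inputs {
			jobs <- i
		}
		close(jobs)
	}()
	go func() {
		wg.Wait()
		close(results)
	}()

	out := make([]string, len(inputs))
	errs := make([]error, len(inputs))
	for r := range results {
		out[r.idx], errs[r.idx] = r.res, r.err
	}
	return out, errs
}

func main() {
	out, errs := parseAll([]string{"homo sapiens", "", "pardosa moesta"}, 3)
	fmt.Println(out[0], out[2]) // HOMO SAPIENS PARDOSA MOESTA
	fmt.Println(errs[1])        // empty input
}
```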
func (ParsedWithIdx) Index ¶
func (pr ParsedWithIdx) Index() int
func (ParsedWithIdx) Unpack ¶
func (pr ParsedWithIdx) Unpack(v interface{}) error
type QualityWarning ¶
QualityWarning is an object that contains the warning and its corresponding quality.
func Map ¶
func Map(ws []Warning) []QualityWarning
Map converts a slice of warnings to a slice of QualityWarning structures.
type Species ¶
type Species struct {
	// Genus is a value of a genus of a binomial.
	Genus string `json:"genus"`
	// Subgenus is a value of a subgenus of a binomial.
	Subgenus string `json:"subgenus,omitempty"`
	// Species is a value of a specific epithet.
	Species string `json:"species"`
	// Cultivar is a value of a cultivar of a binomial.
	Cultivar string `json:"cultivar,omitempty"`
	// Authorship of the binomial.
	Authorship *Authorship `json:"authorship,omitempty"`
}
Species contains details for binomial names with cardinality 2.
type Uninomial ¶
type Uninomial struct {
	// Value is the uninomial name.
	Value string `json:"uninomial"`
	// Rank of the uninomial in a combination name, for example
	// "Pereskia subg. Maihuenia Philippi ex F.A.C.Weber, 1898"
	Rank string `json:"rank,omitempty"`
	// Cultivar is a value of a cultivar of a uninomial.
	Cultivar string `json:"cultivar,omitempty"`
	// Parent of a uninomial in a combination name.
	Parent string `json:"parent,omitempty"`
	// Authorship of the uninomial.
	Authorship *Authorship `json:"authorship,omitempty"`
}
Uninomial contains details for names with cardinality 1.
type Warning ¶
type Warning int
Warning is a type to represent warnings found during parsing of a scientific name.
const (
	TailWarn Warning = iota
	ApostrOtherWarn
	AuthAmbiguousFiliusWarn
	AuthDoubleParensWarn
	AuthEmendWarn
	AuthEmendWithoutDotWarn
	AuthExWarn
	AuthInWarn
	AuthExWithDotWarn
	AuthInWithDotWarn
	AuthMissingOneParensWarn
	AuthQuestionWarn
	AuthShortWarn
	AuthUnknownWarn
	AuthUpperCaseWarn
	BacteriaMaybeWarn
	BotanyAuthorNotSubgenWarn
	CandidatusName
	CanonicalApostropheWarn
	CapWordQuestionWarn
	CharBadWarn
	ContainsIgnoredAnnotation
	CultivarEpithetWarn
	DashOtherWarn
	DotEpithetWarn
	GenusAbbrWarn
	GenusUpperCharAfterDash
	GraftChimeraCharNoSpaceWarn
	GraftChimeraFormulaIncompleteWarn
	GraftChimeraFormulaProbIncompleteWarn
	GraftChimeraFormulaWarn
	GraftChimeraNamedWarn
	GreekLetterInRank
	HTMLTagsEntitiesWarn
	HybridCharNoSpaceWarn
	HybridFormulaIncompleteWarn
	HybridFormulaProbIncompleteWarn
	HybridFormulaWarn
	HybridNamedWarn
	LowCaseWarn
	NameApproxWarn
	NameComparisonWarn
	RankUncommonWarn
	SpaceNonStandardWarn
	SpanishAndAsSeparator
	SpeciesNumericWarn
	SubgenusAbbrWarn
	SuperspeciesWarn
	UTF8ConvBadWarn
	UninomialComboWarn
	UninomialWithRank
	WhiteSpaceTrailWarn
	YearCharWarn
	YearDotWarn
	YearMisplacedWarn
	YearOrigMisplacedWarn
	YearPageWarn
	YearParensWarn
	YearQuestionWarn
	YearRangeWarn
	YearSqBracketsWarn
)
func (Warning) MarshalJSON ¶
MarshalJSON implements json.Marshaler. It will encode null if this Int is null.
func (Warning) NewQualityWarning ¶
func (w Warning) NewQualityWarning() QualityWarning
NewQualityWarning creates a new QualityWarning object.
func (Warning) Quality ¶
Quality returns the parsing quality number that corresponds to a particular warning.
func (*Warning) UnmarshalJSON ¶
UnmarshalJSON implements json.Unmarshaler.
type Word ¶
type Word struct {
	// Verbatim is unmodified value of a word.
	Verbatim string `json:"verbatim"`
	// Normalized is normalized value of a word.
	Normalized string `json:"normalized"`
	// Type is a semantic meaning of a word.
	Type WordType `json:"wordType"`
	// Start is the index of the first letter of a word.
	Start int `json:"start"`
	// End is the index of the end of a word.
	End int `json:"end"`
}
Word represents a parsed word and its meaning in the name-string.
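Start and End make it possible to map each word back onto the input name-string. The sketch assumes they are rune (character) offsets into the verbatim string with End exclusive; this page does not say whether the offsets count runes or bytes, so verify that before relying on it with non-ASCII input such as "×". The `word` type and `textOf` helper are hypothetical.

```go
package main

import "fmt"

// word is a trimmed-down stand-in for parsed.Word.
type word struct {
	Verbatim   string
	Start, End int
}

// textOf returns the part of the input covered by w, assuming Start and
// End are rune offsets with End exclusive. Out-of-range offsets yield "".
func textOf(input string, w word) string {
	rs := []rune(input)
	if w.Start < 0 || w.End > len(rs) || w.Start > w.End {
		return ""
	}
	return string(rs[w.Start:w.End])
}

func main() {
	input := "Homo sapiens L."
	words := []word{{"Homo", 0, 4}, {"sapiens", 5, 12}, {"L.", 13, 15}}
	for _, w := range words {
		fmt.Printf("%q -> %q\n", w.Verbatim, textOf(input, w))
	}
}
```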
type WordType ¶
type WordType int
WordType designates the semantic meaning of a word.
func (WordType) MarshalJSON ¶
MarshalJSON implements json.Marshaler.
func (*WordType) UnmarshalJSON ¶
UnmarshalJSON implements json.Unmarshaler.
type Year ¶
type Year struct {
	// Value is a string value of a year.
	Value string `json:"year"`
	// IsApproximate indicates whether the year was written as approximate.
	// An approximate year might be represented by a range of years, by
	// a question mark "188?", or by parentheses "(1888)".
	IsApproximate bool `json:"isApproximate,omitempty"`
}
Year is provided only if "with_details=true". It is the year of the original publication. If a range of years is provided, the start year is kept, with the isApproximate flag set to true.
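The approximation rules above (a range keeps its start year; "188?" and "(1888)" count as approximate) can be sketched as a small normalizer. `parseYear` and the `year` struct are hypothetical, the sketch covers only these three shapes, and the exact Value the package stores for "188?" or "(1888)" is an assumption here.

```go
package main

import (
	"fmt"
	"strings"
)

// year mirrors the fields of parsed.Year.
type year struct {
	Value         string
	IsApproximate bool
}

// parseYear flags parentheses, ranges, and question marks as approximate.
// For a range such as "1888-1890" only the start year is kept. Stripping
// parentheses and keeping "188?" unchanged are assumptions of this sketch.
func parseYear(s string) year {
	s = strings.TrimSpace(s)
	switch {
	case strings.HasPrefix(s, "(") && strings.HasSuffix(s, ")"):
		return year{Value: strings.Trim(s, "()"), IsApproximate: true}
	case strings.Contains(s, "-"):
		start, _, _ := strings.Cut(s, "-")
		return year{Value: strings.TrimSpace(start), IsApproximate: true}
	case strings.HasSuffix(s, "?"):
		return year{Value: s, IsApproximate: true}
	default:
		return year{Value: s}
	}
}

func main() {
	for _, s := range []string{"1758", "(1888)", "1888-1890", "188?"} {
		fmt.Printf("%q -> %+v\n", s, parseYear(s))
	}
}
```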