parsed

package
v1.9.2 Latest
Published: May 1, 2024 License: MIT Imports: 7 Imported by: 4

Documentation

Overview

Package parsed provides user-friendly output of parsing results, as well as functions to convert the results to CSV or JSON-encoded strings.

Constants

This section is empty.

Variables

WarningQualityMap assigns quality of parsing for each warning type.

Functions

func HeaderCSV

func HeaderCSV(f gnfmt.Format) string

HeaderCSV returns the CSV header for parsing output.

func NormalizeByType added in v1.5.4

func NormalizeByType(wrd string, wt WordType) string

NormalizeByType is useful when searching for a single word. In such cases specific epithets match better when stemmed, authors and genera when lower-cased, and authors additionally when stripped of periods.

The wrd argument is supposed to be taken from the `Words` field of the `Parsed` output.

Types

type Annotation

type Annotation int

Annotations are additional descriptions of a name type.

const (
	// NoAnnot is absence of additional descriptions.
	NoAnnot Annotation = iota
	// SurrogateAnnot is a miscellaneous informal name.
	SurrogateAnnot
	// ComparisonAnnot name with comparison marker (cf.).
	ComparisonAnnot
	// ApproximationAnnot is a name with approximation annotation (sp., spp etc.)
	ApproximationAnnot
	// BOLDAnnot is a surrogate name created by BOLD project.
	BOLDAnnot
	// HybridAnnot is a miscellaneous hybrid name.
	HybridAnnot
	// NamedHybridAnnot is a stable hybrid in botany with registered name.
	NamedHybridAnnot
	// HybridFormulaAnnot is a hybrid created by combination of 2 or more names.
	HybridFormulaAnnot
	// NothoHybridAnnot is a hybrid with notho- 'ranks'.
	NothoHybridAnnot
	// GraftChimeraAnnot is a miscellaneous graft-chimera name.
	GraftChimeraAnnot
	// GraftChimeraFormulaAnnot is a graft-chimera created by the combination of 2 or more names.
	GraftChimeraFormulaAnnot
	// NamedGraftChimeraAnnot is a stable graft-chimera in botany with registered name.
	NamedGraftChimeraAnnot
)

func (Annotation) MarshalJSON

func (a Annotation) MarshalJSON() ([]byte, error)

MarshalJSON implements json.Marshaler.

func (Annotation) String

func (a Annotation) String() string

String is an implementation of fmt.Stringer interface.

func (*Annotation) UnmarshalJSON

func (a *Annotation) UnmarshalJSON(bs []byte) error

UnmarshalJSON implements json.Unmarshaler.
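The combination of fmt.Stringer, json.Marshaler and json.Unmarshaler on an iota-based type can be sketched as follows. This is a minimal illustration on a local type `Annot`. The string spellings in `annotNames` and the helpers `encode` and `decode` are assumptions made for the example, not the package's actual values or API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Annot is a hypothetical stand-in for an iota-based annotation type.
type Annot int

const (
	NoAnnot Annot = iota
	SurrogateAnnot
	ComparisonAnnot
)

// annotNames holds illustrative string forms; the real package may spell
// them differently.
var annotNames = map[Annot]string{
	NoAnnot:         "NO_ANNOT",
	SurrogateAnnot:  "SURROGATE",
	ComparisonAnnot: "COMPARISON",
}

// String implements fmt.Stringer.
func (a Annot) String() string { return annotNames[a] }

// MarshalJSON implements json.Marshaler by encoding the string form.
func (a Annot) MarshalJSON() ([]byte, error) {
	return json.Marshal(a.String())
}

// UnmarshalJSON implements json.Unmarshaler by looking up the string form.
func (a *Annot) UnmarshalJSON(bs []byte) error {
	var s string
	if err := json.Unmarshal(bs, &s); err != nil {
		return err
	}
	for k, v := range annotNames {
		if v == s {
			*a = k
			return nil
		}
	}
	return fmt.Errorf("unknown annotation %q", s)
}

// encode returns the JSON text for an annotation.
func encode(a Annot) (string, error) {
	bs, err := json.Marshal(a)
	return string(bs), err
}

// decode parses JSON text back into an annotation.
func decode(s string) (Annot, error) {
	var a Annot
	err := json.Unmarshal([]byte(s), &a)
	return a, err
}

func main() {
	s, err := encode(ComparisonAnnot)
	if err != nil {
		panic(err)
	}
	a, err := decode(s)
	if err != nil {
		panic(err)
	}
	fmt.Println(s, a == ComparisonAnnot)
}
```

Encoding the string form rather than the integer keeps the JSON stable if constants are later reordered.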

type Approximation

type Approximation struct {
	// Genus is the genus of a name.
	Genus string `json:"genus"`
	// Species is a specific epithet of a name.
	Species string `json:"species,omitempty"`
	// Cultivar is a value of a cultivar of a binomial.
	Cultivar string `json:"cultivar,omitempty"`
	// SpeciesAuthorship the authorship of Species.
	SpeciesAuthorship *Authorship `json:"authorship,omitempty"`
	// ApproxMarker describes what kind of approximation it is (sp., spp. etc.).
	ApproxMarker string `json:"approximationMarker,omitempty"`
	// Part of a name after ApproxMarker.
	Ignored string `json:"ignored,omitempty"`
}

Approximation are details for a surrogate approximation name.

type AuthGroup

type AuthGroup struct {
	// Authors is a slice of strings containing found authors
	Authors []string `json:"authors"`
	// Year is provided only if "with_details=true". It is the year of the
	// original publication. If a range of years is provided, the start year
	// is kept, with the isApproximate flag set to true.
	Year *Year `json:"year,omitempty"`
	// ExAuthors is provided only if "with_details=true". It is a "special"
	// group of authors that sometimes appears in scientific names after the
	// "ex" qualifier.
	ExAuthors *Authors `json:"exAuthors,omitempty"`
	// EmendAuthors is provided only if "with_details=true". It is a "special"
	// group of authors that sometimes appears in scientific names after the
	// "emend." qualifier.
	EmendAuthors *Authors `json:"emendAuthors,omitempty"`
}

AuthGroup is provided only if config.WithDetails is true. It is a group of authors belonging to a particular nomenclatural event. We distinguish two possible situations when AuthGroup is used:

- original - authors of the original description of a name
- combination - authors of a new combination, rank etc.

type Authors

type Authors struct {
	// Authors is a slice of strings containing found authors of an AuthGroup
	Authors []string `json:"authors"`
	// Year of publication by the AuthGroup.
	Year *Year `json:"year,omitempty"`
}

Authors contains information about authors and a year of publication.

type Authorship

type Authorship struct {
	// Verbatim is an authorship string without modifications.
	Verbatim string `json:"verbatim"`
	// Normalized is a normalized value of the authorship.
	Normalized string `json:"normalized"`
	// Year is a string representing a year of original description of the name.
	// The year number is surrounded by parentheses "(1758)", in cases when a
	// year is approximate.
	Year string `json:"year,omitempty"`
	// Authors is a slice containing each author as an element.
	Authors []string `json:"authors,omitempty"`
	// Original is an AuthGroup that contains authors of the original
	// description of a name.
	Original *AuthGroup `json:"originalAuth,omitempty"`
	// Combination is an AuthGroup that contains authors of new combination,
	// rank etc.
	Combination *AuthGroup `json:"combinationAuth,omitempty"`
}

Authorship describes provided metainformation about authors of a name. Sometimes authorship is provided for several elements of a name, for example in "Agalinis purpurea (L.) Briton var. borealis (Berg.) Peterson 1987"

The authorship provided outside of "details" section belongs to the most fine-grained element of a name ("var. borealis" for the example above).
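To make the nesting of Authorship, AuthGroup and Year concrete, here is a minimal sketch that decodes an authorship object with encoding/json. The structs are local mirrors of the JSON tags documented above. The `sample` document is hand-written for illustration and is not actual parser output, and `decodeAuthorship` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local mirrors of the documented structs, reduced to what the sketch needs.
type Year struct {
	Value         string `json:"year"`
	IsApproximate bool   `json:"isApproximate,omitempty"`
}

type AuthGroup struct {
	Authors []string `json:"authors"`
	Year    *Year    `json:"year,omitempty"`
}

type Authorship struct {
	Verbatim    string     `json:"verbatim"`
	Normalized  string     `json:"normalized"`
	Year        string     `json:"year,omitempty"`
	Authors     []string   `json:"authors,omitempty"`
	Original    *AuthGroup `json:"originalAuth,omitempty"`
	Combination *AuthGroup `json:"combinationAuth,omitempty"`
}

// sample is a hand-written illustration, not real parser output.
const sample = `{
  "verbatim": "(Berg.) Peterson 1987",
  "normalized": "(Berg.) Peterson 1987",
  "year": "1987",
  "authors": ["Berg.", "Peterson"],
  "originalAuth": {"authors": ["Berg."]},
  "combinationAuth": {"authors": ["Peterson"], "year": {"year": "1987"}}
}`

// decodeAuthorship parses a JSON authorship object.
func decodeAuthorship(js string) (Authorship, error) {
	var a Authorship
	err := json.Unmarshal([]byte(js), &a)
	return a, err
}

func main() {
	a, err := decodeAuthorship(sample)
	if err != nil {
		panic(err)
	}
	// Pointer fields are nil when the corresponding group is absent.
	if a.Original != nil {
		fmt.Println("original:", a.Original.Authors)
	}
	if a.Combination != nil && a.Combination.Year != nil {
		fmt.Println("combination year:", a.Combination.Year.Value)
	}
}
```

Because the groups and years are pointers tagged `omitempty`, callers should check for nil before dereferencing them.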

type Canonical

type Canonical struct {
	// Stemmed is the most "normalized" and simplified version of the name.
	// Species epithets are stripped of suffixes, "j" character converted to "i",
	// "v" character converted to "u" according to "Schinke R, Greengrass M,
	// Robertson AM and Willett P (1996)"
	//
	// It is most useful to match names when a variability in suffixes is
	// possible.
	Stemmed string `json:"stemmed"`
	// Simple is a simplified version of a name where some elements like ranks,
	// or hybrid signs "×" are omitted (hybrid signs are present for hybrid
	// formulas).
	//
	// It is most useful to match names in general.
	Simple string `json:"simple"`
	// Full is a canonical form that keeps hybrid signs "×" for named
	// hybrids and shows infra-specific ranks.
	//
	// It is most useful for detection of the best matches from
	// multiple results. It is also recommended for displaying
	// canonical forms of botanical names.
	Full string `json:"full"`
}

Canonical are simplified forms of a name-string more suitable for matching and comparing name-strings than the verbatim version.
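One way to use the three canonical forms together is to compare two names from the strictest form to the loosest. The sketch below assumes a local mirror of the Canonical struct and a hypothetical helper `matchLevel`. The name values are hand-written and are not actual parser output.

```go
package main

import "fmt"

// Canonical is a local mirror of the documented struct, for illustration.
type Canonical struct {
	Stemmed string `json:"stemmed"`
	Simple  string `json:"simple"`
	Full    string `json:"full"`
}

// matchLevel reports the strictest canonical form on which a and b agree:
// "full", "simple", "stemmed", or "" when nothing matches or either is nil.
func matchLevel(a, b *Canonical) string {
	switch {
	case a == nil || b == nil:
		return ""
	case a.Full == b.Full:
		return "full"
	case a.Simple == b.Simple:
		return "simple"
	case a.Stemmed == b.Stemmed:
		return "stemmed"
	}
	return ""
}

func main() {
	// Hand-written values that differ only in the epithet suffix.
	a := &Canonical{Stemmed: "Aus alb", Simple: "Aus alba", Full: "Aus alba"}
	b := &Canonical{Stemmed: "Aus alb", Simple: "Aus albus", Full: "Aus albus"}
	fmt.Println(matchLevel(a, b))
}
```

The nil check matters because the Canonical field of a Parsed value is a pointer with `omitempty` and may be absent.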

type Comparison

type Comparison struct {
	// Genus is used if no species information is given
	Genus string `json:"genus,omitempty"`

	// Species are details for the binomial part of a name.
	*Species

	// InfraSpecies is an infraspecific epithet of a name.
	InfraSpecies *InfraspeciesElem `json:"infraspecies,omitempty"`

	// CompMarker, usually "cf.".
	CompMarker string `json:"comparisonMarker"`
}

Comparison are details for a surrogate comparison name.

type Details

type Details interface {
	// contains filtered or unexported methods
}

Details is a placeholder interface that makes it possible to unify details of various name types.

type DetailsApproximation

type DetailsApproximation struct {
	// Approximation details.
	Approximation Approximation `json:"approximation"`
}

DetailsApproximation are details for approximation surrogate names.

type DetailsComparison

type DetailsComparison struct {
	// Comparison details.
	Comparison Comparison `json:"comparison"`
}

DetailsComparison are details for comparison surrogate names.

type DetailsGraftChimeraFormula added in v1.5.0

type DetailsGraftChimeraFormula struct {
	GraftChimeraFormula []Details `json:"graftChimeraFormula"`
}

DetailsGraftChimeraFormula are details for graft-chimera formula names.

type DetailsHybridFormula

type DetailsHybridFormula struct {
	HybridFormula []Details `json:"hybridFormula"`
}

DetailsHybridFormula are details for hybrid formula names.

type DetailsInfraspecies

type DetailsInfraspecies struct {
	// Infraspecies details.
	Infraspecies Infraspecies `json:"infraspecies"`
}

DetailsInfraspecies are multinomial details.

type DetailsSpecies

type DetailsSpecies struct {
	// Species is details for binomial names.
	Species Species `json:"species"`
}

DetailsSpecies are binomial details.

type DetailsUninomial

type DetailsUninomial struct {
	// Uninomial details.
	Uninomial Uninomial `json:"uninomial"`
}

DetailsUninomial are Uninomial details.
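Since Details is an interface with unexported methods, consumers typically inspect it with a type switch. The sketch below shows that pattern on simplified local types. The marker method `isDetails`, the reduced field sets, and the helper `elements` are assumptions made for the example. Counting one element per infraspecific epithet is likewise an inference from the cardinality descriptions in this document, not confirmed behavior.

```go
package main

import "fmt"

// Details is a local stand-in for the sealed interface.
type Details interface{ isDetails() }

// Simplified mirrors of three documented variants.
type DetailsUninomial struct{ Uninomial string }

type DetailsSpecies struct{ Genus, Species string }

type DetailsInfraspecies struct {
	Genus, Species string
	Infraspecies   []string
}

func (DetailsUninomial) isDetails()    {}
func (DetailsSpecies) isDetails()      {}
func (DetailsInfraspecies) isDetails() {}

// elements counts name elements for the variants handled here and
// returns 0 for anything else, including nil.
func elements(d Details) int {
	switch v := d.(type) {
	case DetailsUninomial:
		return 1
	case DetailsSpecies:
		return 2
	case DetailsInfraspecies:
		return 2 + len(v.Infraspecies)
	default:
		return 0
	}
}

func main() {
	names := []Details{
		DetailsUninomial{Uninomial: "Homo"},
		DetailsSpecies{Genus: "Homo", Species: "sapiens"},
		DetailsInfraspecies{Genus: "Aus", Species: "bus", Infraspecies: []string{"cus"}},
	}
	for _, d := range names {
		fmt.Printf("%T has %d element(s)\n", d, elements(d))
	}
}
```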

type Infraspecies

type Infraspecies struct {
	// Species are details for the binomial part of a name.
	Species
	// Infraspecies is a slice of infraspecific epithets of a name.
	Infraspecies []InfraspeciesElem `json:"infraspecies,omitempty"`
}

Infraspecies are details for names with cardinality higher than 2.

type InfraspeciesElem

type InfraspeciesElem struct {
	// Value of an infraspecific epithet.
	Value string `json:"value"`
	// Rank of the infraspecific epithet.
	Rank string `json:"rank,omitempty"`
	// Authorship of the infraspecific epithet.
	Authorship *Authorship `json:"authorship,omitempty"`
}

InfraspeciesElem are details for an infraspecific epithet of an Infraspecies name.

type Parsed

type Parsed struct {
	// Parsed is false if parsing did not succeed.
	Parsed bool `json:"parsed"`

	// ParseQuality is a number that represents the quality of the
	// parsing.
	//
	//  0 - name-string is not parseable
	//  1 - no parsing problems encountered
	//  2 - small parsing problems
	//  3 - serious parsing problems
	//  4 - severe problems, name could not be parsed completely
	//
	// The ParseQuality is equal to the quality of the most
	// severe warning (see qualityWarnings). If no problems
	// are encountered, and the parsing succeeded, the parseQuality
	// is set to 1. If parsing failed, the parseQuality is 0.
	ParseQuality int `json:"quality"`

	// QualityWarnings contains encountered parsing problems.
	QualityWarnings []QualityWarning `json:"qualityWarnings,omitempty"`

	// Verbatim is input name-string without modifications.
	Verbatim string `json:"verbatim"`

	// Normalized is a normalized version of the input name-string.
	Normalized string `json:"normalized,omitempty"`

	// Canonical are simplified versions of a name-string more suitable for
	// matching and comparing name-strings than the verbatim version.
	Canonical *Canonical `json:"canonical,omitempty"`

	// Cardinality allows sorting and partitioning names according to the
	// number of elements in their canonical forms.
	//
	// 0 - cardinality cannot be calculated
	// 1 - uninomial
	// 2 - binomial
	// 3 - trinomial
	// 4 - quadrinomial
	Cardinality int `json:"cardinality"`

	// Authorship describes provided metainformation about authors of a name.
	// This authorship provided outside of Details belongs to
	// the most fine-grained element of a name.
	Authorship *Authorship `json:"authorship,omitempty"`

	// Bacteria is not nil if the input name has a genus
	// that is registered as bacterial. Possible
	// values are "maybe" - if the genus has homonyms in other groups
	// and "yes" if GNparser dictionary does not detect any homonyms
	//
	// The bacterial names often contain strain information which are
	// not parseable and are placed into the "tail" field.
	Bacteria *tb.Tribool `json:"bacteria,omitempty"`

	// Virus is set to true if a name is not parsed and probably
	// belongs to a wide variety of sub-cellular entities like
	//
	// - viruses
	// - plasmids
	// - prions
	// - RNA
	// - DNA
	//
	// Viruses are the vast majority in this group of names,
	// and as a result they gave their (very imprecise) name to
	// the field.
	//
	// We do plan to create a parser for viruses at some point,
	// which will expand this group into more precise categories.
	Virus bool `json:"virus,omitempty"`

	// DaggerChar is true if a name-string includes '†' rune.
	// This rune might mean a fossil, or be indication of the clade extinction.
	DaggerChar bool `json:"daggerChar,omitempty"`

	// Hybrid is not nil if a name is detected as one of the hybrids
	//
	// - a non-categorized hybrid
	// - named hybrid
	// - notho- hybrid
	// - hybrid formula
	Hybrid *Annotation `json:"hybrid,omitempty"`

	// GraftChimera is not nil if a name is detected as one of the graft chimeras
	//
	// - a non-categorized graft chimera
	// - named graft chimera
	// - graft chimera formula
	GraftChimera *Annotation `json:"graftchimera,omitempty"`

	// Surrogate is not nil if a name is detected as one of the surrogates
	//
	// - a non-categorized surrogate
	// - surrogate names from BOLD project
	// - comparisons (Homo cf. sapiens)
	// - approximations (names for specimens that are not fully identified)
	Surrogate *Annotation `json:"surrogate,omitempty"`

	// Tail is an unparseable tail of a name. It might contain "junk",
	// annotations, malformed parts of a scientific name, taxonomic concept
	// indications, bacterial strains etc.  If there is an unparseable tail, the
	// quality of the name-parsing is set to the worst category.
	Tail string `json:"tail,omitempty"`

	// Details contain more fine-grained information about parsed name.
	Details Details `json:"details,omitempty"`

	// Words contain description of every parsed word of a name.
	Words []Word `json:"words,omitempty"`

	// VerbatimID is a UUID v5 generated from the verbatim value of the
	// input name-string. Every unique string always generates the same
	// UUID.
	VerbatimID string `json:"id"`

	// ParserVersion is the version number of the GNparser.
	ParserVersion string `json:"parserVersion"`
}

Parsed is the result of a scientific name-string parsing. It can be converted into JSON or CSV formats.
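The VerbatimID field is described as a UUID v5 of the verbatim string, which is why identical input always yields the same ID. Below is a minimal standard-library sketch of UUID v5 generation as defined in RFC 4122. The namespace used here, a v5 UUID derived from the DNS name "globalnames.org", is an assumption and may differ from what the parser uses, so the IDs it prints should not be expected to match real output. The helpers `uuidV5`, `formatUUID` and `nameID` are hypothetical.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// namespaceDNS is the DNS namespace UUID defined in RFC 4122.
var namespaceDNS = [16]byte{
	0x6b, 0xa7, 0xb8, 0x10, 0x9d, 0xad, 0x11, 0xd1,
	0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8,
}

// uuidV5 hashes the namespace followed by the name with SHA-1, keeps the
// first 16 bytes, and sets the version and variant bits.
func uuidV5(ns [16]byte, name string) [16]byte {
	h := sha1.New()
	h.Write(ns[:])
	h.Write([]byte(name))
	var u [16]byte
	copy(u[:], h.Sum(nil))
	u[6] = (u[6] & 0x0f) | 0x50 // version 5
	u[8] = (u[8] & 0x3f) | 0x80 // RFC 4122 variant
	return u
}

// formatUUID renders the canonical 8-4-4-4-12 hexadecimal form.
func formatUUID(u [16]byte) string {
	return fmt.Sprintf("%x-%x-%x-%x-%x", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}

// nameID derives an ID for a verbatim name-string. The namespace is an
// assumption made for this sketch.
func nameID(verbatim string) string {
	ns := uuidV5(namespaceDNS, "globalnames.org")
	return formatUUID(uuidV5(ns, verbatim))
}

func main() {
	fmt.Println(nameID("Homo sapiens"))
	fmt.Println(nameID("Homo sapiens") == nameID("Homo sapiens"))
}
```

Because the ID depends on the verbatim bytes, even a trailing space produces a different UUID.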

func (Parsed) Output

func (p Parsed) Output(f gnfmt.Format) string

Output creates a JSON or CSV representation of Parsed results.

func (*Parsed) RestoreAmbiguous added in v1.5.1

func (p *Parsed) RestoreAmbiguous(epithet, subst string)

RestoreAmbiguous method is used for cases where specific or infra-specific epithets had to be changed to be parsed successfully. Such a situation arises when an epithet is the same as some word that is also an annotation, a prefix/suffix of an author name etc.

type ParsedWithIdx

type ParsedWithIdx struct {
	Idx    int
	Parsed Parsed
	Error  error
}

ParsedWithIdx structure contains parsing output, its place in the slice, and an unexpected error, if it happened during the parsing.

func (ParsedWithIdx) Index

func (pr ParsedWithIdx) Index() int

func (ParsedWithIdx) Unpack

func (pr ParsedWithIdx) Unpack(v interface{}) error

type QualityWarning

type QualityWarning struct {
	Quality int     `json:"quality"`
	Warning Warning `json:"warning"`
}

QualityWarning is an object that contains the warning and its corresponding quality.

func Map

func Map(ws []Warning) []QualityWarning

Map converts a slice of warnings to a slice of QualityWarning structures.
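The relationship between warnings and the overall ParseQuality (0 if parsing failed, 1 if there are no warnings, otherwise the quality of the most severe warning) can be sketched as below. The types are local stand-ins. The numeric qualities in `warningQuality` are hypothetical except that a tail warning is placed in the worst category, as the Tail field description states. `mapWarnings` and `parseQuality` are illustrative helpers, not the package's functions.

```go
package main

import "fmt"

// Warning is a local stand-in with a few illustrative members.
type Warning int

const (
	TailWarn Warning = iota
	YearParensWarn
	CharBadWarn
)

// warningQuality is a hypothetical stand-in for WarningQualityMap.
// Only TailWarn's worst-category value follows from the documentation.
var warningQuality = map[Warning]int{
	TailWarn:       4,
	YearParensWarn: 2,
	CharBadWarn:    3,
}

// QualityWarning pairs a warning with its quality.
type QualityWarning struct {
	Quality int
	Warning Warning
}

// mapWarnings attaches a quality to every warning, keeping input order.
func mapWarnings(ws []Warning) []QualityWarning {
	res := make([]QualityWarning, len(ws))
	for i, w := range ws {
		res[i] = QualityWarning{Quality: warningQuality[w], Warning: w}
	}
	return res
}

// parseQuality applies the documented rule: 0 when parsing failed, 1 when
// there are no warnings, otherwise the highest warning quality.
func parseQuality(parsed bool, qws []QualityWarning) int {
	if !parsed {
		return 0
	}
	q := 1
	for _, qw := range qws {
		if qw.Quality > q {
			q = qw.Quality
		}
	}
	return q
}

func main() {
	qws := mapWarnings([]Warning{YearParensWarn, CharBadWarn})
	fmt.Println(parseQuality(true, qws))
}
```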

type Species

type Species struct {
	// Genus is a value of a genus of a binomial.
	Genus string `json:"genus"`
	// Subgenus is a value of subgenus of binomial.
	Subgenus string `json:"subgenus,omitempty"`
	// Species is a value of a specific epithet.
	Species string `json:"species"`
	// Cultivar is a value of a cultivar of a binomial.
	Cultivar string `json:"cultivar,omitempty"`
	// Authorship of the binomial.
	Authorship *Authorship `json:"authorship,omitempty"`
}

Species are details for binomial names with cardinality 2.

type Uninomial

type Uninomial struct {
	// Value is the uninomial name.
	Value string `json:"uninomial"`
	// Rank of the uninomial in a combination name, for example
	// "Pereskia subg. Maihuenia Philippi ex F.A.C.Weber, 1898"
	Rank string `json:"rank,omitempty"`
	// Cultivar is a value of a cultivar of a uninomial.
	Cultivar string `json:"cultivar,omitempty"`
	// Parent of a uninomial in a combination name.
	Parent string `json:"parent,omitempty"`
	// Authorship of the uninomial.
	Authorship *Authorship `json:"authorship,omitempty"`
}

Uninomial are details for names with cardinality 1.

type Warning

type Warning int

Warning is a type to represent warnings found during parsing of a scientific name.

const (
	TailWarn Warning = iota
	ApostrOtherWarn
	AuthAmbiguousFiliusWarn
	AuthDoubleParensWarn
	AuthEmendWarn
	AuthEmendWithoutDotWarn
	AuthExWarn
	AuthExWithDotWarn
	AuthMissingOneParensWarn
	AuthQuestionWarn
	AuthShortWarn
	AuthUnknownWarn
	AuthUpperCaseWarn
	BacteriaMaybeWarn
	BotanyAuthorNotSubgenWarn
	CandidatusName
	CanonicalApostropheWarn
	CapWordQuestionWarn
	CharBadWarn
	ContainsIgnoredAnnotation
	CultivarEpithetWarn
	DashOtherWarn
	DotEpithetWarn
	GenusAbbrWarn
	GenusUpperCharAfterDash
	GraftChimeraCharNoSpaceWarn
	GraftChimeraFormulaIncompleteWarn
	GraftChimeraFormulaProbIncompleteWarn
	GraftChimeraFormulaWarn
	GraftChimeraNamedWarn
	GreekLetterInRank
	HTMLTagsEntitiesWarn
	HybridCharNoSpaceWarn
	HybridFormulaIncompleteWarn
	HybridFormulaProbIncompleteWarn
	HybridFormulaWarn
	HybridNamedWarn
	LowCaseWarn
	NameApproxWarn
	NameComparisonWarn
	RankUncommonWarn
	SpaceNonStandardWarn
	SpanishAndAsSeparator
	SpeciesNumericWarn
	SubgenusAbbrWarn
	SuperspeciesWarn
	UTF8ConvBadWarn
	UninomialComboWarn
	WhiteSpaceTrailWarn
	YearCharWarn
	YearDotWarn
	YearMisplacedWarn
	YearOrigMisplacedWarn
	YearPageWarn
	YearParensWarn
	YearQuestionWarn
	YearRangeWarn
	YearSqBracketsWarn
)

func (Warning) MarshalJSON

func (w Warning) MarshalJSON() ([]byte, error)

MarshalJSON implements json.Marshaler.

func (Warning) NewQualityWarning

func (w Warning) NewQualityWarning() QualityWarning

NewQualityWarning creates new QualityWarning object.

func (Warning) Quality

func (w Warning) Quality() int

Quality returns parsing quality number that corresponds to a particular warning.

func (Warning) String

func (w Warning) String() string

String implements fmt.Stringer interface.

func (*Warning) UnmarshalJSON

func (w *Warning) UnmarshalJSON(bs []byte) error

UnmarshalJSON implements json.Unmarshaler.

type Word

type Word struct {
	// Verbatim is unmodified value of a word.
	Verbatim string `json:"verbatim"`
	// Normalized is normalized value of a word.
	Normalized string `json:"normalized"`
	// Type is a semantic meaning of a word.
	Type WordType `json:"wordType"`
	// Start is the index of the first letter of a word.
	Start int `json:"start"`
	// End is the index of the end of a word.
	End int `json:"end"`
}

Word represents a parsed word and its meaning in the name-string.
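The Start and End offsets make it possible to cut a word back out of the name-string. The sketch below assumes that the offsets count runes (not bytes) and that End is exclusive. Neither assumption is stated in this documentation, so verify both against real output before relying on them. The `Word` mirror, the helper `wordText` and the sample offsets are hand-written for illustration.

```go
package main

import "fmt"

// Word is a reduced local mirror of the documented struct.
type Word struct {
	Verbatim string
	Start    int
	End      int
}

// wordText returns the substring of name between start and end, treating
// both as rune offsets with an exclusive end (an assumption). The boolean
// is false when the offsets do not fit the string.
func wordText(name string, start, end int) (string, bool) {
	rs := []rune(name)
	if start < 0 || end > len(rs) || start > end {
		return "", false
	}
	return string(rs[start:end]), true
}

func main() {
	name := "Homo sapiens L."
	// Hand-written offsets, not actual parser output.
	words := []Word{
		{Verbatim: "Homo", Start: 0, End: 4},
		{Verbatim: "sapiens", Start: 5, End: 12},
		{Verbatim: "L.", Start: 13, End: 15},
	}
	for _, w := range words {
		txt, ok := wordText(name, w.Start, w.End)
		fmt.Println(txt, ok && txt == w.Verbatim)
	}
}
```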

type WordType

type WordType int

WordType designates semantic meaning of a word.

const (
	UnknownType WordType = iota
	ComparisonMarkerType
	CultivarType
	ApproxMarkerType
	AuthorWordType
	AuthorWordFiliusType
	CandidatusType
	GenusType
	InfraspEpithetType
	HybridCharType
	GraftChimeraCharType
	RankType
	SpEpithetType
	SubgenusType
	SuperspType
	UninomialType
	YearApproximateType
	YearType
)

func (WordType) MarshalJSON

func (wt WordType) MarshalJSON() ([]byte, error)

MarshalJSON implements json.Marshaler.

func (WordType) String

func (wt WordType) String() string

String is an implementation of fmt.Stringer interface.

func (*WordType) UnmarshalJSON

func (wt *WordType) UnmarshalJSON(bs []byte) error

UnmarshalJSON implements json.Unmarshaler.

type Year

type Year struct {
	// Value is a string value of a year.
	Value string `json:"year"`
	// IsApproximate is indication if the year was written as approximate.
	// Approximate year might be represented by a range of years, by
	// a question mark "188?", by parentheses "(1888)".
	IsApproximate bool `json:"isApproximate,omitempty"`
}

Year is provided only if "with_details=true". It is the year of the original publication. If a range of years is provided, the start year is kept, with the isApproximate flag set to true.
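The Authorship.Year comment earlier in this document says an approximate year is shown in parentheses, for example "(1758)". A minimal sketch of deriving that string from a Year value follows. The local `Year` mirror and the helper `yearString` are illustrative and are not part of the package.

```go
package main

import "fmt"

// Year is a local mirror of the documented struct.
type Year struct {
	Value         string `json:"year"`
	IsApproximate bool   `json:"isApproximate,omitempty"`
}

// yearString renders a year for display: empty for nil, parenthesized
// when the year is approximate, and unchanged otherwise.
func yearString(y *Year) string {
	switch {
	case y == nil:
		return ""
	case y.IsApproximate:
		return "(" + y.Value + ")"
	default:
		return y.Value
	}
}

func main() {
	fmt.Println(yearString(&Year{Value: "1758"}))
	fmt.Println(yearString(&Year{Value: "1888", IsApproximate: true}))
}
```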
