Documentation
Index ¶
- type Config
- type Option
- func OptBayesOddsThreshold(f float64) Option
- func OptDataSources(is []int) Option
- func OptFormat(f gnfmt.Format) Option
- func OptIncludeInputText(b bool) Option
- func OptInputTextOnly(b bool) Option
- func OptLanguage(l lang.Language) Option
- func OptTikaURL(s string) Option
- func OptTokensAround(i int) Option
- func OptVerifierURL(s string) Option
- func OptWithAllMatches(b bool) Option
- func OptWithAmbiguousNames(b bool) Option
- func OptWithBayes(b bool) Option
- func OptWithBayesOddsDetails(b bool) Option
- func OptWithOddsAdjustment(b bool) Option
- func OptWithPlainInput(b bool) Option
- func OptWithPositonInBytes(b bool) Option
- func OptWithUniqueNames(b bool) Option
- func OptWithVerification(b bool) Option
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Config ¶
type Config struct {
	// BayesOddsThreshold sets the limit of posterior odds. Everything higher
	// than this limit will be classified as a name.
	BayesOddsThreshold float64

	// Format sets the output format for finding results. Possible formats are
	//   csv     - CSV output
	//   compact - JSON in one line
	//   pretty  - JSON with new lines and indentations.
	Format gnfmt.Format

	// IncludeInputText can be set to true if the user wants to get back the text
	// used for name-finding. This feature is especially useful if the original
	// file was a PDF, MS Word, HTML etc. and the user wants to use OffsetStart
	// and OffsetEnd indices to find names in the text.
	IncludeInputText bool

	// InputTextOnly can be set to true if the user wants only the UTF8-encoded
	// text of the file without name-finding. If this option is true, then most
	// of the other options are ignored.
	InputTextOnly bool

	// Language that is prevalent in the text. This setting helps to get
	// a better result for NLP name-finding, because languages differ in their
	// training patterns.
	// Currently only the following languages are supported:
	//
	//   eng - English
	//   deu - German
	Language lang.Language

	// LanguageDetected is the code of a language that was detected in the text.
	// It is an empty string if language detection is not set.
	LanguageDetected string

	// DataSources is a list of data-source IDs used for the
	// name-verification. These data-sources will always be matched with the
	// verified names. You can find the list of all data-sources at
	// https://verifier.globalnames.org/api/v1/data_sources
	DataSources []int

	// TikaURL contains the URL of the Apache Tika service. This service is used
	// for extraction of UTF8-encoded texts from a variety of file formats.
	TikaURL string

	// TokensAround sets the number of tokens (words) before and after each
	// name-candidate. These words will be returned with the output.
	TokensAround int

	// VerifierURL contains the URL of a name-verification service.
	VerifierURL string

	// WithAllMatches sets verification to return all found matches.
	WithAllMatches bool

	// WithAmbiguousNames shows ambiguous uninomial names when true.
	WithAmbiguousNames bool

	// WithBayes determines if both heuristic and Naive Bayes algorithms run
	// during the name-finding.
	//   false - only heuristic algorithms run
	//   true  - both heuristic and Naive Bayes algorithms run.
	WithBayes bool

	// WithBayesOddsDetails shows in detail how odds are calculated.
	WithBayesOddsDetails bool

	// WithOddsAdjustment can be set to true to adjust calculated odds using the
	// ratio of scientific names found in text to the number of capitalized
	// words.
	WithOddsAdjustment bool

	// WithPlainInput flag can be set to true if the input is a plain
	// UTF8-encoded text. In this case the file is read directly instead of
	// going through file type and encoding checking.
	WithPlainInput bool

	// WithPositionInBytes can be set to true to receive offsets in number of
	// bytes instead of UTF-8 characters.
	WithPositionInBytes bool

	// WithUniqueNames can be set to true to get a unique list of names.
	WithUniqueNames bool

	// WithVerification is true if names should be verified.
	WithVerification bool

	// APIDoc
	APIDoc string
}
Config is responsible for name-finding operations.
type Option ¶
type Option func(*Config)
Option type for changing GNfinder settings.
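Because Option is declared as func(*Config), settings are changed by passing option values to code that applies them to a Config in turn. The sketch below illustrates that pattern in isolation. The Config here is a trimmed-down stand-in with three of the documented fields, the option bodies are assumptions about how such setters are typically written, and NewConfig with its default threshold is a hypothetical helper, not part of the documented API.

```go
package main

import "fmt"

// Config is a reduced, hypothetical stand-in for the documented Config;
// only three of its fields are reproduced.
type Config struct {
	BayesOddsThreshold float64
	TokensAround       int
	WithVerification   bool
}

// Option has the documented shape: a function that mutates a Config.
type Option func(*Config)

// OptTokensAround imitates the documented option; the body is an assumption.
func OptTokensAround(i int) Option {
	return func(c *Config) { c.TokensAround = i }
}

// OptWithVerification imitates the documented option; the body is an assumption.
func OptWithVerification(b bool) Option {
	return func(c *Config) { c.WithVerification = b }
}

// NewConfig is a hypothetical constructor: it starts from placeholder
// defaults and applies every option in the order given.
func NewConfig(opts ...Option) Config {
	c := Config{BayesOddsThreshold: 80} // placeholder default, not the library's
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	cfg := NewConfig(OptTokensAround(3), OptWithVerification(true))
	fmt.Printf("%+v\n", cfg)
}
```

Options are applied in order, so when the same setting is passed twice the later value wins.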
func OptBayesOddsThreshold ¶
OptBayesOddsThreshold is an option for name-finding that sets a new threshold for results from the Bayes name-finding. All name candidates whose posterior odds are higher than the threshold will appear in the resulting names output.
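The effect of the threshold can be pictured as a simple filter over scored candidates. This is only a sketch of the idea: the candidate type, its fields, and the aboveThreshold helper are hypothetical, and the odds values are made up for illustration.

```go
package main

import "fmt"

// candidate is a hypothetical record pairing a name candidate with the
// posterior odds assigned to it by a Bayes classifier.
type candidate struct {
	Name string
	Odds float64
}

// aboveThreshold keeps only candidates whose odds are strictly higher
// than the threshold.
func aboveThreshold(cs []candidate, threshold float64) []candidate {
	var kept []candidate
	for _, c := range cs {
		if c.Odds > threshold {
			kept = append(kept, c)
		}
	}
	return kept
}

func main() {
	cs := []candidate{
		{Name: "Pardosa moesta", Odds: 1500}, // illustrative odds
		{Name: "The", Odds: 0.2},
		{Name: "Bubo", Odds: 80},
	}
	for _, c := range aboveThreshold(cs, 80) {
		fmt.Println(c.Name, c.Odds)
	}
}
```

Raising the threshold trades recall for precision: fewer candidates pass, but those that do carry stronger evidence.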
func OptDataSources ¶
OptDataSources sets the data sources that will always be checked during the verification process.
func OptIncludeInputText ¶
OptIncludeInputText indicates whether to return the original UTF8-encoded input text along with the results.
func OptInputTextOnly ¶
OptInputTextOnly indicates whether to return only the UTF8-encoded input text, without name-finding.
func OptTikaURL ¶
OptTikaURL sets the URL of the UTF8 text extraction service (Apache Tika).
func OptTokensAround ¶
OptTokensAround sets the number of tokens remembered on the left and right side of a name-candidate.
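To make the "tokens around" idea concrete, the sketch below collects up to n tokens on each side of a candidate at a given index. The around helper and the whitespace tokenization are assumptions made for illustration; the package's actual tokenizer is not described here.

```go
package main

import (
	"fmt"
	"strings"
)

// around returns up to n tokens before and up to n tokens after tokens[i].
// i must be a valid index into tokens. Fewer than n tokens are returned
// on a side when the candidate sits near the start or end of the text.
func around(tokens []string, i, n int) (before, after []string) {
	start := i - n
	if start < 0 {
		start = 0
	}
	end := i + 1 + n
	if end > len(tokens) {
		end = len(tokens)
	}
	return tokens[start:i], tokens[i+1 : end]
}

func main() {
	// Whitespace splitting is a simplification used only for this sketch.
	tokens := strings.Fields("Spiders of genus Pardosa are common in meadows")
	before, after := around(tokens, 3, 2) // index 3 is "Pardosa"
	fmt.Println(before, after)
}
```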
func OptVerifierURL ¶
OptVerifierURL sets the URL of the verification service.
func OptWithAllMatches ¶
OptWithAllMatches sets WithAllMatches option to return all matches found by verification.
func OptWithAmbiguousNames ¶
OptWithAmbiguousNames sets WithAmbiguousNames option to show ambiguous uninomials and genera.
func OptWithBayes ¶
OptWithBayes is an option that forces running Bayes name-finding even when the language is not supported by training sets.
func OptWithBayesOddsDetails ¶
OptWithBayesOddsDetails is an option to show details of odds calculations.
func OptWithOddsAdjustment ¶
OptWithOddsAdjustment is an option that triggers recalculation of prior odds using number of found names divided by number of all name candidates.
func OptWithPlainInput ¶
OptWithPlainInput sets WithPlainInput option indicating there is no need to check file type and encoding, and the file can be read directly.
func OptWithPositonInBytes ¶
OptWithPositonInBytes is an option that reports offsets in number of bytes instead of number of UTF-8 characters.
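The two kinds of offset differ as soon as the text contains non-ASCII characters, because a single UTF-8 character may occupy several bytes. The sketch below computes both for the same substring using only the standard library; the offsets helper is hypothetical and is not how the package itself locates names.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// offsets returns the start of sub within text twice: once counted in
// bytes and once counted in UTF-8 characters (runes). It returns -1, -1
// when sub does not occur in text.
func offsets(text, sub string) (byteOff, runeOff int) {
	byteOff = strings.Index(text, sub)
	if byteOff < 0 {
		return -1, -1
	}
	return byteOff, utf8.RuneCountInString(text[:byteOff])
}

func main() {
	// "Größe von Pardosa moesta": ö (U+00F6) and ß (U+00DF) take two bytes each.
	text := "Gr\u00f6\u00dfe von Pardosa moesta"
	b, r := offsets(text, "Pardosa")
	fmt.Println(b, r) // prints "12 10"
}
```

Byte offsets are convenient for slicing a Go string directly, while character offsets match what a reader sees when counting positions in the text.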
func OptWithUniqueNames ¶
OptWithUniqueNames indicates whether to return a unique list of names instead of all occurrences of names in the text.
func OptWithVerification ¶
OptWithVerification indicates whether to run the verification process after name-finding.