corpus

package
v0.1.9 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 15, 2024 License: GPL-3.0 Imports: 9 Imported by: 0

Documentation


Constants

This section is empty.

Variables

var (
	ErrResourceNotFound = errors.New("resource not found")
)

Functions

This section is empty.

Types

type CorporaSetup

type CorporaSetup struct {

	// RegistryDir specifies a root directory where all the
	// required corpora registry (= configuration) files are
	// located.
	RegistryDir string `json:"registryDir"`

	// MaximumRecords specifies max. number of records returned
	// in a "searchRetrieve" search. In case of MQuery, this is
	// also limited by its internals to `MaxRecordsInternalLimit`
	MaximumRecords int `json:"maximumRecords"`

	// MaximumContext specifies max. number of tokens left/right from hit
	MaximumContext int `json:"maximumContext"`

	// Resources is a description of configured corpora/resources
	Resources SrchResources `json:"resources"`
}

CorporaSetup defines the part of the MQuery application configuration that relates to corpora.
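The JSON tags on the struct suggest how a configuration file maps to these fields. The following is a minimal sketch under stated assumptions: the types are trimmed-down local mirrors of the documented ones, `parseCorporaSetup` is a hypothetical helper that is not part of the package, and all values in `sampleConf` are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed-down local mirrors of the documented types, so the sketch
// compiles without importing the real package.
type CorpusSetup struct {
	ID  string `json:"id"`
	PID string `json:"pid"`
}

type CorporaSetup struct {
	RegistryDir    string         `json:"registryDir"`
	MaximumRecords int            `json:"maximumRecords"`
	MaximumContext int            `json:"maximumContext"`
	Resources      []*CorpusSetup `json:"resources"`
}

// Illustrative values only; the path and identifiers are assumptions.
const sampleConf = `{
  "registryDir": "/path/to/registry",
  "maximumRecords": 50,
  "maximumContext": 10,
  "resources": [
    {"id": "corpus_a", "pid": "pid:corpus_a"}
  ]
}`

// parseCorporaSetup is a hypothetical helper (not part of the package)
// that decodes the JSON form of the configuration.
func parseCorporaSetup(data []byte) (*CorporaSetup, error) {
	var cs CorporaSetup
	if err := json.Unmarshal(data, &cs); err != nil {
		return nil, fmt.Errorf("failed to parse corpora setup: %w", err)
	}
	return &cs, nil
}

func main() {
	cs, err := parseCorporaSetup([]byte(sampleConf))
	if err != nil {
		panic(err)
	}
	fmt.Println(cs.RegistryDir, cs.MaximumRecords, len(cs.Resources))
}
```

With the real package, a call to ValidateAndDefaults would presumably follow the decoding step; its exact behavior is not described here.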

func (*CorporaSetup) GetRegistryPath

func (cs *CorporaSetup) GetRegistryPath(corpusID string) string

func (*CorporaSetup) ValidateAndDefaults

func (cs *CorporaSetup) ValidateAndDefaults(confContext string) error

type CorpusSetup

type CorpusSetup struct {
	ID  string `json:"id"`
	PID string `json:"pid"`

	// language mappings
	FullName    map[string]string `json:"fullName"`    // section required, "en" required
	Description map[string]string `json:"description"` // section optional, "en" required

	// languages used in resource - ISO 639-3 three letter language codes
	Languages []string `json:"languages"`

	URI              string           `json:"uri"`
	PosAttrs         []PosAttr        `json:"posAttrs"`
	StructureMapping StructureMapping `json:"structureMapping"`

	// ViewContextStruct is a structure used to specify "units"
	// for KWIC left and right context. Typically, this is
	// a structure representing a sentence or a speech.
	ViewContextStruct string `json:"viewContextStruct"`

	KontextBacklinkRootURL string `json:"kontextBacklinkRootURL"`
}

CorpusSetup is a complete corpus configuration (it is part of MQuery-SRU configuration)

func (*CorpusSetup) GetBasicSearchAttrs

func (cs *CorpusSetup) GetBasicSearchAttrs() []string

GetBasicSearchAttrs provides all the basic search attrs

func (*CorpusSetup) GetDefinedLayers

func (cs *CorpusSetup) GetDefinedLayers() *collections.Set[LayerType]

GetDefinedLayers returns all the layers defined for the corpus

func (*CorpusSetup) GetDefinedLayersAsRefString

func (cs *CorpusSetup) GetDefinedLayersAsRefString() string

GetDefinedLayersAsRefString provides all the layers defined for the corpus formatted as a single string (this is required in SRU XML)

func (*CorpusSetup) GetLayerDefault

func (cs *CorpusSetup) GetLayerDefault(ln LayerType) PosAttr

GetLayerDefault provides default positional attribute for a specified layer.

func (*CorpusSetup) Validate

func (ls *CorpusSetup) Validate(confContext string) error

Validate validates corpus setup. This should be run as part of server startup (i.e. before any requests start)

type LayerType

type LayerType string

LayerType is a layer above positional attributes that groups similar attribute types (e.g. annotation). Manatee itself defines no such concept; MQuery-SRU supports it through corpus configuration, where each positional attribute belongs to a specific layer.

const (
	LayerTypeText     LayerType = "text"
	LayerTypeLemma    LayerType = "lemma"
	LayerTypePOS      LayerType = "pos"
	LayerTypeOrth     LayerType = "orth"
	LayerTypeNorm     LayerType = "norm"
	LayerTypePhonetic LayerType = "phonetic"

	DefaultLayerType = LayerTypeText

	// ExplainOpNumberOfRecords is a value we currently don't understand
	// well...
	// TODO what is this value for in the "explain" operation?
	ExplainOpNumberOfRecords = 25
)

func (LayerType) GetResultID

func (name LayerType) GetResultID() string

func (LayerType) Validate

func (name LayerType) Validate() error

type PosAttr

type PosAttr struct {
	ID   string `json:"id"`
	Name string `json:"name"`

	// Layer defines a layer the attribute is attached to
	// (this is not supported directly by Manatee so it
	// is configured and supported in MQuery-SRU)
	Layer LayerType `json:"layer"`

	// IsBasicSearchAttr defines whether the attribute is
	// used as a search attr in basic query
	IsBasicSearchAttr bool `json:"isBasicSearchAttr"`

	// IsLayerDefault defines whether the attribute is
	// used as a default one when querying its layer.
	// (e.g. the `word` attribute is typically set as
	// the default for the `text` layer)
	IsLayerDefault bool `json:"isLayerDefault"`
}

PosAttr represents a corpus positional attribute
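The `Layer` and `IsLayerDefault` fields together determine which attribute stands in for a layer. As a rough sketch of that idea, assuming the first flagged attribute wins: `layerDefault` and `sampleAttrs` below are hypothetical and are not the package's GetLayerDefault, whose actual behavior (for example when no default is configured) is not documented here.

```go
package main

import "fmt"

// Local mirrors of the documented types.
type LayerType string

const (
	LayerTypeText  LayerType = "text"
	LayerTypeLemma LayerType = "lemma"
	LayerTypePOS   LayerType = "pos"
)

type PosAttr struct {
	ID                string    `json:"id"`
	Name              string    `json:"name"`
	Layer             LayerType `json:"layer"`
	IsBasicSearchAttr bool      `json:"isBasicSearchAttr"`
	IsLayerDefault    bool      `json:"isLayerDefault"`
}

// sampleAttrs is made-up data for illustration.
var sampleAttrs = []PosAttr{
	{ID: "a1", Name: "lc", Layer: LayerTypeText},
	{ID: "a2", Name: "word", Layer: LayerTypeText, IsBasicSearchAttr: true, IsLayerDefault: true},
	{ID: "a3", Name: "lemma", Layer: LayerTypeLemma, IsLayerDefault: true},
}

// layerDefault is a hypothetical helper: it returns the first attribute
// of the given layer that is flagged as the layer default. The boolean
// result reports whether such an attribute was found.
func layerDefault(attrs []PosAttr, layer LayerType) (PosAttr, bool) {
	for _, a := range attrs {
		if a.Layer == layer && a.IsLayerDefault {
			return a, true
		}
	}
	return PosAttr{}, false
}

func main() {
	if a, ok := layerDefault(sampleAttrs, LayerTypeText); ok {
		fmt.Println("default for text layer:", a.Name)
	}
	if _, ok := layerDefault(sampleAttrs, LayerTypePOS); !ok {
		fmt.Println("no default configured for pos layer")
	}
}
```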

type SrchResources

type SrchResources []*CorpusSetup

SrchResources is a configuration of all the enabled corpora.

func (SrchResources) GetCommonLayers

func (sr SrchResources) GetCommonLayers() []LayerType

func (SrchResources) GetCommonPosAttrNames

func (sr SrchResources) GetCommonPosAttrNames(corpusName ...string) ([]string, error)

GetCommonPosAttrNames is the same as GetCommonPosAttrs but it returns just a list of attribute names.

func (SrchResources) GetCommonPosAttrs

func (sr SrchResources) GetCommonPosAttrs(corpusNames ...string) ([]PosAttr, error)

GetCommonPosAttrs returns the positional attributes common to the provided corpora. The default attribute of the text layer is always listed first; the remaining attributes are sorted alphabetically.
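The ordering rule can be sketched as follows. This is an illustration under assumptions, not the package's implementation: `commonPosAttrs` is a hypothetical function, attributes are matched by `Name` only, and the flags of the first corpus are used for the result.

```go
package main

import (
	"fmt"
	"sort"
)

// Local mirrors of the documented types, reduced to the fields used here.
type LayerType string

const LayerTypeText LayerType = "text"

type PosAttr struct {
	Name           string
	Layer          LayerType
	IsLayerDefault bool
}

// commonPosAttrs is a hypothetical sketch: it keeps the attributes whose
// name occurs in every corpus, then puts the text layer default first
// and sorts the rest alphabetically by name.
func commonPosAttrs(corpora ...[]PosAttr) []PosAttr {
	common := make([]PosAttr, 0)
	if len(corpora) == 0 {
		return common
	}
	// Count in how many corpora each attribute name occurs.
	counts := make(map[string]int)
	for _, attrs := range corpora {
		seen := make(map[string]bool)
		for _, a := range attrs {
			if !seen[a.Name] {
				seen[a.Name] = true
				counts[a.Name]++
			}
		}
	}
	for _, a := range corpora[0] {
		if counts[a.Name] == len(corpora) {
			common = append(common, a)
			delete(counts, a.Name) // guards against duplicate names
		}
	}
	isTextDefault := func(a PosAttr) bool {
		return a.Layer == LayerTypeText && a.IsLayerDefault
	}
	sort.SliceStable(common, func(i, j int) bool {
		di, dj := isTextDefault(common[i]), isTextDefault(common[j])
		if di != dj {
			return di
		}
		return common[i].Name < common[j].Name
	})
	return common
}

func main() {
	corpusA := []PosAttr{
		{Name: "tag", Layer: "pos"},
		{Name: "word", Layer: LayerTypeText, IsLayerDefault: true},
		{Name: "lemma", Layer: "lemma"},
	}
	corpusB := []PosAttr{
		{Name: "lemma", Layer: "lemma"},
		{Name: "extra", Layer: "norm"},
		{Name: "word", Layer: LayerTypeText, IsLayerDefault: true},
		{Name: "tag", Layer: "pos"},
	}
	for _, a := range commonPosAttrs(corpusA, corpusB) {
		fmt.Println(a.Name)
	}
}
```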

func (SrchResources) GetCommonPosAttrs2 added in v0.0.9

func (sr SrchResources) GetCommonPosAttrs2() []PosAttr

GetCommonPosAttrs2 returns the positional attributes common to all defined corpora. Unlike GetCommonPosAttrs, it cannot return an error.

func (SrchResources) GetCorpora

func (sr SrchResources) GetCorpora() []string

func (SrchResources) GetResource added in v0.0.9

func (sr SrchResources) GetResource(ID string) (*CorpusSetup, error)

func (SrchResources) GetResourceByPID added in v0.1.5

func (sr SrchResources) GetResourceByPID(PID string) (*CorpusSetup, error)

GetResourceByPID returns the resource with the given PID. If no such resource exists, ErrResourceNotFound is returned.

func (SrchResources) Validate

func (sr SrchResources) Validate(confContext string) error

Validate validates all the corpora configurations. This should be run during server startup.

type StructureMapping

type StructureMapping struct {
	SentenceStruct  string `json:"sentenceStruct"`
	UtteranceStruct string `json:"utteranceStruct"`
	ParagraphStruct string `json:"paragraphStruct"`
	TurnStruct      string `json:"turnStruct"`
	TextStruct      string `json:"textStruct"`
	SessionStruct   string `json:"sessionStruct"`
}

StructureMapping provides mapping between custom corpus structures and FCS-QL generic structures (paragraph, sentence, utterance,...)
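One way to picture the mapping is a lookup from a generic structure name to the corpus-specific one. This is only a sketch: `corpusStruct` is a hypothetical helper, the generic names used as keys ("sentence", "paragraph", and so on) are assumed from the field names, and the structure names in `main` are made up.

```go
package main

import "fmt"

// Local mirror of the documented type.
type StructureMapping struct {
	SentenceStruct  string `json:"sentenceStruct"`
	UtteranceStruct string `json:"utteranceStruct"`
	ParagraphStruct string `json:"paragraphStruct"`
	TurnStruct      string `json:"turnStruct"`
	TextStruct      string `json:"textStruct"`
	SessionStruct   string `json:"sessionStruct"`
}

// corpusStruct is a hypothetical helper: it translates a generic
// structure name into the structure name configured for the corpus.
// The boolean result is false when the name is unknown or unmapped.
func corpusStruct(m StructureMapping, generic string) (string, bool) {
	var name string
	switch generic {
	case "sentence":
		name = m.SentenceStruct
	case "utterance":
		name = m.UtteranceStruct
	case "paragraph":
		name = m.ParagraphStruct
	case "turn":
		name = m.TurnStruct
	case "text":
		name = m.TextStruct
	case "session":
		name = m.SessionStruct
	}
	return name, name != ""
}

func main() {
	// Made-up corpus structure names.
	m := StructureMapping{SentenceStruct: "s", ParagraphStruct: "p", TextStruct: "doc"}
	for _, g := range []string{"sentence", "paragraph", "turn"} {
		if name, ok := corpusStruct(m, g); ok {
			fmt.Printf("%s -> %s\n", g, name)
		} else {
			fmt.Printf("%s is not mapped\n", g)
		}
	}
}
```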
