Documentation ¶
Overview ¶
Package conllx provides readers and a writer for the CoNLL-X format.
More information about CoNLL-X can be found at: http://ilk.uvt.nl/conll/
Index ¶
- type Features
- type FoldSet
- type Reader
- type Sentence
- type SentenceReader
- type SentenceWriter
- type SplittingReader
- type Token
- func (t *Token) CoarsePosTag() (string, bool)
- func (t *Token) Features() (*Features, bool)
- func (t *Token) Form() (string, bool)
- func (t *Token) Head() (uint, bool)
- func (t *Token) HeadRel() (string, bool)
- func (t *Token) Lemma() (string, bool)
- func (t *Token) PHead() (uint, bool)
- func (t *Token) PHeadRel() (string, bool)
- func (t *Token) PosTag() (string, bool)
- func (t *Token) SetCoarsePosTag(coarsePosTag string) *Token
- func (t *Token) SetFeatures(features map[string]string) *Token
- func (t *Token) SetForm(form string) *Token
- func (t *Token) SetHead(head uint) *Token
- func (t *Token) SetHeadRel(rel string) *Token
- func (t *Token) SetLemma(lemma string) *Token
- func (t *Token) SetPHead(head uint) *Token
- func (t *Token) SetPHeadRel(rel string) *Token
- func (t *Token) SetPosTag(posTag string) *Token
- func (t Token) String() string
- type Writer
Examples ¶
- NewToken
- Token.Features
- Token.SetFeatures
- Writer
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Features ¶
type Features struct {
// contains filtered or unexported fields
}
Features from the CoNLL-X features field.
func (*Features) FeaturesMap ¶
FeaturesMap returns the token features as a key-value mapping. Features that do not follow the expected format are skipped.
The feature map is lazily initialized on the first call to this method; no parsing of the feature field is done if this method is never called.
func (*Features) FeaturesString ¶
FeaturesString returns the token features as a string. This gives the features in exactly the same format as the original CoNLL-X data.
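As a minimal sketch in the same in-package style as the examples below, the following reads a token whose tab-separated feature field mixes well-formed key:value pairs with a bare entry ("oops") that presumably does not follow the expected format and is therefore skipped in the map, while FeaturesString preserves the raw field:

input := "1\ttest\t_\t_\t_\tcase:nom|oops|num:sg"

reader := NewReader(bufio.NewReader(strings.NewReader(input)))

sent, err := reader.ReadSentence()
if err != nil {
	log.Fatal(err)
}

features, ok := sent[0].Features()
if !ok {
	log.Fatal("Token should have features")
}

// "oops" has no key:value structure, so it should be absent from the
// map, while the raw feature string is returned verbatim.
fmt.Println(len(features.FeaturesMap()))
fmt.Println(features.FeaturesString())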
type FoldSet ¶
type FoldSet map[int]interface{}
A FoldSet contains fold numbers. This type is used with a SplittingReader to indicate from which folds sentences should be returned.
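Since a FoldSet is an ordinary map keyed by fold number, it can be built as a map literal; the sketch below assumes that only the keys matter and uses nil values as placeholders:

// Select folds 0 and 2; nil values are used because only the keys
// appear to carry information.
folds := FoldSet{0: nil, 2: nil}
fmt.Println(len(folds))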
type Reader ¶
type Reader struct {
// contains filtered or unexported fields
}
A Reader for CoNLL-X files.
func NewReader ¶
NewReader creates a new CoNLL-X reader from a buffered I/O reader. The caller is responsible for closing the provided reader.
func (*Reader) ReadSentence ¶
ReadSentence returns the next sentence. If there is no more data that can be read, io.EOF is returned as the error.
The returned Sentence slice is only valid until the next call of ReadSentence. If you need to retain a sentence across calls, it is safe to make a copy.
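A minimal sketch of a read loop that stops at io.EOF and copies every sentence before retaining it (the two-sentence, tab-separated input string is made up for illustration):

input := "1\tHello\t_\t_\texpr\t_\t_\t_\t_\t_\n" +
	"2\tworld\t_\t_\tnoun\t_\t_\t_\t_\t_\n" +
	"\n" +
	"1\tGo\t_\t_\tname\t_\t_\t_\t_\t_\n"

reader := NewReader(bufio.NewReader(strings.NewReader(input)))

var corpus []Sentence
for {
	sent, err := reader.ReadSentence()
	if err == io.EOF {
		break
	}
	if err != nil {
		log.Fatal(err)
	}

	// Copy the sentence: the slice returned by ReadSentence is only
	// valid until the next call.
	retained := make(Sentence, len(sent))
	copy(retained, sent)
	corpus = append(corpus, retained)
}

fmt.Println(len(corpus))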
type SentenceReader ¶
A SentenceReader reads CoNLL-X sentences.
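Code that only consumes sentences can be written against this interface, so that it works with both Reader and SplittingReader. The sketch below assumes that SentenceReader declares the ReadSentence() (Sentence, error) method shown on both concrete readers; countTokens is a made-up helper name:

// countTokens reads from any SentenceReader until io.EOF and counts
// the tokens it has seen.
func countTokens(r SentenceReader) (int, error) {
	nTokens := 0

	for {
		sentence, err := r.ReadSentence()
		if err == io.EOF {
			return nTokens, nil
		}
		if err != nil {
			return nTokens, err
		}

		nTokens += len(sentence)
	}
}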
type SentenceWriter ¶
A SentenceWriter writes CoNLL-X sentences.
type SplittingReader ¶
type SplittingReader struct {
// contains filtered or unexported fields
}
SplittingReader is a wrapper around a (CoNLL-X) Reader that splits the corpus into folds.
func NewSplittingReader ¶
func NewSplittingReader(reader *Reader, nFolds int, folds FoldSet) (*SplittingReader, error)
NewSplittingReader creates a SplittingReader that splits the data into 'nFolds' folds. The reader returns only the sentences that are in 'folds'.
func (*SplittingReader) ReadSentence ¶
func (r *SplittingReader) ReadSentence() (sentence Sentence, err error)
ReadSentence returns the next sentence that is in one of the folds requested from the SplittingReader.
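As a hedged sketch of how the pieces fit together (trainSentences is a made-up helper; how sentences are distributed over folds is left entirely to the SplittingReader):

// trainSentences wraps a reader in a ten-fold SplittingReader and
// collects all sentences from the requested folds, copying each
// sentence so that it can be retained across reads.
func trainSentences(reader *Reader, folds FoldSet) ([]Sentence, error) {
	splitReader, err := NewSplittingReader(reader, 10, folds)
	if err != nil {
		return nil, err
	}

	var train []Sentence
	for {
		sentence, err := splitReader.ReadSentence()
		if err == io.EOF {
			return train, nil
		}
		if err != nil {
			return train, err
		}

		retained := make(Sentence, len(sentence))
		copy(retained, sentence)
		train = append(train, retained)
	}
}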
type Token ¶
type Token struct {
// contains filtered or unexported fields
}
Token stores a token with the CoNLL-X annotation layers.
func NewToken ¶
func NewToken() *Token
NewToken creates a new Token with all layers set to absent.
Note that although the Sentence type used by readers and writers is a slice of Token values, this constructor returns a pointer. This is intentional: returning a pointer permits constructing tokens via the builder pattern (method chaining).
Example ¶
// Construct a token using the builder-pattern.
token := NewToken().SetForm("apples").SetLemma("apple").SetPosTag("noun")

// Append the token to a sentence.
var sent Sentence
sent = append(sent, *token)

fmt.Println(sent)
Output: 1 apples apple _ noun _ _ _ _ _
func (*Token) CoarsePosTag ¶
CoarsePosTag returns the coarse-grained POS tag of the token; the second tuple element is false when there is no coarse-grained tag stored in this token.
func (*Token) Features ¶
Features returns the features field; the second tuple element is false when there are no features stored in this token.
Example ¶
input := "1\ttest\t_\t_\t_\tf1:v1|f2:v2"

strReader := strings.NewReader(input)
reader := NewReader(bufio.NewReader(strReader))

sent, err := reader.ReadSentence()
if err != nil {
	log.Fatal("Error reading sentence")
}

features, ok := sent[0].Features()
if !ok {
	log.Fatal("Token should have features")
}

fmt.Println(features.FeaturesMap()["f1"])
fmt.Println(features.FeaturesMap()["f2"])
Output:
v1
v2
func (*Token) Form ¶
Form returns the form (the actual token); the second tuple element is false when there is no form stored in this token.
func (*Token) Head ¶
Head returns the head of the token; the second tuple element is false when there is no head stored in this token.
func (*Token) HeadRel ¶
HeadRel returns the relation of the token to its head; the second tuple element is false when there is no head relation stored in this token.
func (*Token) Lemma ¶
Lemma returns the lemma of the token; the second tuple element is false when there is no lemma stored in this token.
func (*Token) PHead ¶
PHead returns the projective head of the token; the second tuple element is false when there is no projective head stored in this token.
func (*Token) PHeadRel ¶
PHeadRel returns the relation of the token to its projective head; the second tuple element is false when there is no projective head relation stored in this token.
func (*Token) PosTag ¶
PosTag returns the fine-grained POS tag of the token; the second tuple element is false when there is no fine-grained tag stored in this token.
func (*Token) SetCoarsePosTag ¶
SetCoarsePosTag sets the coarse-grained POS tag for this token. The token itself is returned to allow method chaining.
func (*Token) SetFeatures ¶
SetFeatures sets the features for this token. The token itself is returned to allow method chaining.
Example ¶
token := NewToken().SetFeatures(map[string]string{
	"num":   "sg",
	"tense": "past",
})

features, _ := token.Features()

fmt.Println(features.FeaturesMap()["num"])
fmt.Println(features.FeaturesMap()["tense"])
Output:
sg
past
func (*Token) SetForm ¶
SetForm sets the form for this token. The token itself is returned to allow method chaining.
func (*Token) SetHead ¶
SetHead sets the head of this token. The token itself is returned to allow method chaining.
func (*Token) SetHeadRel ¶
SetHeadRel sets the relation to the head of this token. The token itself is returned to allow method chaining.
func (*Token) SetLemma ¶
SetLemma sets the lemma for this token. The token itself is returned to allow method chaining.
func (*Token) SetPHead ¶
SetPHead sets the projective head of this token. The token itself is returned to allow method chaining.
func (*Token) SetPHeadRel ¶
SetPHeadRel sets the relation to the projective head of this token. The token itself is returned to allow method chaining.
type Writer ¶
type Writer struct {
// contains filtered or unexported fields
}
Writer writes sentences in CoNLL-X format.
Example ¶
var buf bytes.Buffer

writer := NewWriter(&buf)
writer.WriteSentence(Sentence{
	*NewToken().SetForm("Hello").SetPosTag("expr"),
	*NewToken().SetForm("world").SetPosTag("noun")})
writer.WriteSentence(Sentence{
	*NewToken().SetForm("Go").SetPosTag("name"),
	*NewToken().SetForm("rocks").SetPosTag("verb")})

fmt.Println(buf.String())
Output:
1 Hello _ _ expr _ _ _ _ _
2 world _ _ noun _ _ _ _ _

1 Go _ _ name _ _ _ _ _
2 rocks _ _ verb _ _ _ _ _
func (*Writer) WriteSentence ¶
WriteSentence writes a sentence in CoNLL-X format. For annotation layers that are absent in a token, underscores (_) are written.
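A sketch of a read-write round trip in the same style as the examples above (the tab-separated input string is made up for illustration): sentences are streamed from a Reader straight into a Writer.

input := "1\tHello\t_\t_\texpr\t_\t_\t_\t_\t_\n" +
	"\n" +
	"1\tGo\t_\t_\tname\t_\t_\t_\t_\t_\n"

reader := NewReader(bufio.NewReader(strings.NewReader(input)))

var buf bytes.Buffer
writer := NewWriter(&buf)

// Stream the corpus sentence by sentence; annotation layers that are
// absent in a token are written out as underscores.
for {
	sentence, err := reader.ReadSentence()
	if err == io.EOF {
		break
	}
	if err != nil {
		log.Fatal(err)
	}

	writer.WriteSentence(sentence)
}

fmt.Println(buf.String())

Because absent layers come out as underscores, the column layout stays intact even for sparsely annotated tokens.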