conllx

Version: v1.0.0
Published: Aug 29, 2018 License: BSD-3-Clause Imports: 7 Imported by: 5

README

Introduction

This is a package for reading and writing CoNLL-X files in Go.

Installation

This package can be installed with the go command:

go get gopkg.in/danieldk/conllx.v1

The package documentation is available at: https://godoc.org/gopkg.in/danieldk/conllx.v1

Documentation

Overview

Package conllx provides readers and a writer for the CoNLL-X format.

More information about CoNLL-X can be found at: http://ilk.uvt.nl/conll/

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Features

type Features struct {
	// contains filtered or unexported fields
}

Features from the CoNLL-X features field.

func (*Features) FeaturesMap

func (f *Features) FeaturesMap() map[string]string

FeaturesMap returns the token features as a key-value mapping. Features that do not follow the expected format are skipped.

The feature map is lazily initialized on its first call. No feature field parsing is done if this method is not called.
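
The lazy initialization described above can be sketched roughly as follows. This is not the package's implementation: the type lazyFeatures and its method asMap are hypothetical, and the "key:value" pairs separated by "|" are an assumption taken from the sample input in the Features example on this page.

```go
package main

import (
	"fmt"
	"strings"
)

// lazyFeatures is a hypothetical stand-in for the package's Features type.
type lazyFeatures struct {
	raw    string            // feature field exactly as it was read
	parsed map[string]string // nil until asMap is called for the first time
}

// asMap parses the raw field on first use and caches the result.
// Assumption: features look like "key:value" and are separated by "|".
// Items that do not match that shape are skipped.
func (f *lazyFeatures) asMap() map[string]string {
	if f.parsed != nil {
		return f.parsed // already parsed, no work to do
	}
	f.parsed = make(map[string]string)
	for _, item := range strings.Split(f.raw, "|") {
		parts := strings.SplitN(item, ":", 2)
		if len(parts) != 2 || parts[0] == "" {
			continue // malformed feature
		}
		f.parsed[parts[0]] = parts[1]
	}
	return f.parsed
}

func main() {
	f := &lazyFeatures{raw: "f1:v1|oops|f2:v2"}
	m := f.asMap()
	fmt.Println(m["f1"], m["f2"], len(m)) // prints "v1 v2 2"
}
```

Callers that only need the original string never pay for the parse, which is the point of deferring it.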

func (*Features) FeaturesString

func (f *Features) FeaturesString() string

FeaturesString returns the token features as a string, in exactly the same format as the original CoNLL-X data.

type FoldSet

type FoldSet map[int]interface{}

A FoldSet contains fold numbers. This type is used with a SplittingReader to indicate from which folds sentences should be returned.

type Reader

type Reader struct {
	// contains filtered or unexported fields
}

A Reader for CoNLL-X files.

func NewReader

func NewReader(r *bufio.Reader) *Reader

NewReader creates a new CoNLL-X reader from a buffered I/O reader. The caller is responsible for closing the provided reader.

func (*Reader) ReadSentence

func (r *Reader) ReadSentence() (sentence Sentence, err error)

ReadSentence returns the next sentence. If there is no more data that can be read, io.EOF is returned as the error.

The returned Sentence slice is only valid until the next call of ReadSentence. If you need to retain a sentence across calls, make a copy.
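
A sketch of the read loop and the copy it calls for. The types token, sentence and reusingReader are simplified, hypothetical stand-ins. The stand-in reader deliberately reuses one backing slice to show why the copy matters; how the real Reader manages its memory is not specified here.

```go
package main

import (
	"fmt"
	"io"
)

// token and sentence are simplified stand-ins for Token and Sentence.
type token struct{ form string }
type sentence []token

// reusingReader is a hypothetical reader that reuses one backing slice
// for every sentence it returns.
type reusingReader struct {
	data [][]string
	buf  sentence
}

func (r *reusingReader) ReadSentence() (sentence, error) {
	if len(r.data) == 0 {
		return nil, io.EOF
	}
	r.buf = r.buf[:0] // overwrite the previous sentence in place
	for _, form := range r.data[0] {
		r.buf = append(r.buf, token{form: form})
	}
	r.data = r.data[1:]
	return r.buf, nil
}

// readAll reads until io.EOF and copies each sentence before storing it.
func readAll(r *reusingReader) ([]sentence, error) {
	var all []sentence
	for {
		sent, err := r.ReadSentence()
		if err == io.EOF {
			return all, nil
		}
		if err != nil {
			return nil, err
		}
		all = append(all, append(sentence(nil), sent...))
	}
}

func main() {
	r := &reusingReader{data: [][]string{{"Hello", "world"}, {"Go", "rocks"}}}
	all, err := readAll(r)
	if err != nil {
		panic(err)
	}
	fmt.Println(all[0][0].form, all[1][0].form) // prints "Hello Go"
}
```

Without the copy in readAll, both stored sentences would share the reader's buffer and the first one would be overwritten by the second.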

type Sentence

type Sentence []Token

A Sentence is a slice of Tokens.

func (Sentence) String

func (s Sentence) String() string

type SentenceReader

type SentenceReader interface {
	ReadSentence() (sentence Sentence, err error)
}

A SentenceReader reads CoNLL-X sentences.

type SentenceWriter

type SentenceWriter interface {
	WriteSentence(sentence Sentence) error
}

A SentenceWriter writes CoNLL-X sentences.

type SplittingReader

type SplittingReader struct {
	// contains filtered or unexported fields
}

SplittingReader is a wrapper around a (CoNLL-X) Reader that splits the corpus into folds.

func NewSplittingReader

func NewSplittingReader(reader *Reader, nFolds int, folds FoldSet) (*SplittingReader, error)

NewSplittingReader creates a SplittingReader that splits the data into 'nFolds' folds. The reader returns only the sentences that are in 'folds'.

func (*SplittingReader) ReadSentence

func (r *SplittingReader) ReadSentence() (sentence Sentence, err error)

ReadSentence returns the next sentence that is in one of the folds requested from the SplittingReader.
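
To make the fold selection concrete, here is a small sketch. The type foldSet mirrors the shape of FoldSet (only the keys matter), and the helper inFolds is hypothetical. Assigning sentences to folds round-robin by index is an assumption of this sketch; the documentation above does not state how the SplittingReader assigns them.

```go
package main

import "fmt"

// foldSet mirrors the shape of FoldSet: the keys are the selected folds,
// the values are ignored.
type foldSet map[int]interface{}

// inFolds reports whether the sentence at index idx is selected.
// Assumption: sentence idx belongs to fold idx % nFolds.
func inFolds(idx, nFolds int, folds foldSet) bool {
	_, ok := folds[idx%nFolds]
	return ok
}

func main() {
	train := foldSet{0: nil, 1: nil, 2: nil} // three of four folds
	held := foldSet{3: nil}                  // the remaining fold

	for idx := 0; idx < 8; idx++ {
		fmt.Println(idx, inFolds(idx, 4, train), inFolds(idx, 4, held))
	}
}
```

Two readers over the same corpus with complementary fold sets, as with train and held here, yield disjoint training and evaluation data.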

type Token

type Token struct {
	// contains filtered or unexported fields
}

Token stores a token with the CoNLL-X annotation layers.

func NewToken

func NewToken() *Token

NewToken creates a new Token with all layers set to absent.

Note that although the Sentence type used by readers and writers is a slice of Token as a value type, this constructor returns a pointer. This is intentional: the token constructor returns a pointer to permit token construction via the builder pattern.

Example
// Construct a token using the builder-pattern
token := NewToken().SetForm("apples").SetLemma("apple").SetPosTag("noun")

// Append the token to a sentence.
var sent Sentence
sent = append(sent, *token)

fmt.Println(sent)
Output:

1	apples	apple	_	noun	_	_	_	_	_

func (*Token) CoarsePosTag

func (t *Token) CoarsePosTag() (string, bool)

CoarsePosTag returns the coarse-grained POS tag of the token. The second return value is false when there is no coarse-grained tag stored in this token.

func (*Token) Features

func (t *Token) Features() (*Features, bool)

Features returns the features field. The second return value is false when there are no features stored in this token.

Example
input := `1	test	_	_	_	f1:v1|f2:v2`
strReader := strings.NewReader(input)
reader := NewReader(bufio.NewReader(strReader))

sent, err := reader.ReadSentence()
if err != nil {
	log.Fatal("Error reading sentence")
}

features, ok := sent[0].Features()
if !ok {
	log.Fatal("Token should have features")
}

fmt.Println(features.FeaturesMap()["f1"])
fmt.Println(features.FeaturesMap()["f2"])
Output:

v1
v2

func (*Token) Form

func (t *Token) Form() (string, bool)

Form returns the form (the actual token). The second return value is false when there is no form stored in this token.

func (*Token) Head

func (t *Token) Head() (uint, bool)

Head returns the head of the token. The second return value is false when there is no head stored in this token.

func (*Token) HeadRel

func (t *Token) HeadRel() (string, bool)

HeadRel returns the relation of the token to its head. The second return value is false when there is no head relation stored in this token.

func (*Token) Lemma

func (t *Token) Lemma() (string, bool)

Lemma returns the lemma of the token. The second return value is false when there is no lemma stored in this token.

func (*Token) PHead

func (t *Token) PHead() (uint, bool)

PHead returns the projective head of the token. The second return value is false when there is no projective head stored in this token.

func (*Token) PHeadRel

func (t *Token) PHeadRel() (string, bool)

PHeadRel returns the relation of the token to its projective head. The second return value is false when there is no projective head relation stored in this token.

func (*Token) PosTag

func (t *Token) PosTag() (string, bool)

PosTag returns the fine-grained POS tag of the token. The second return value is false when there is no fine-grained tag stored in this token.

func (*Token) SetCoarsePosTag

func (t *Token) SetCoarsePosTag(coarsePosTag string) *Token

SetCoarsePosTag sets the coarse-grained POS tag for this token. The token itself is returned to allow method chaining.

func (*Token) SetFeatures

func (t *Token) SetFeatures(features map[string]string) *Token

SetFeatures sets the features for this token. The token itself is returned to allow method chaining.

Example
token := NewToken().SetFeatures(map[string]string{
	"num":   "sg",
	"tense": "past",
})

features, _ := token.Features()

fmt.Println(features.FeaturesMap()["num"])
fmt.Println(features.FeaturesMap()["tense"])
Output:

sg
past

func (*Token) SetForm

func (t *Token) SetForm(form string) *Token

SetForm sets the form for this token. The token itself is returned to allow method chaining.

func (*Token) SetHead

func (t *Token) SetHead(head uint) *Token

SetHead sets the head of this token. The token itself is returned to allow method chaining.

func (*Token) SetHeadRel

func (t *Token) SetHeadRel(rel string) *Token

SetHeadRel sets the relation to the head of this token. The token itself is returned to allow method chaining.

func (*Token) SetLemma

func (t *Token) SetLemma(lemma string) *Token

SetLemma sets the lemma for this token. The token itself is returned to allow method chaining.

func (*Token) SetPHead

func (t *Token) SetPHead(head uint) *Token

SetPHead sets the projective head of this token. The token itself is returned to allow method chaining.

func (*Token) SetPHeadRel

func (t *Token) SetPHeadRel(rel string) *Token

SetPHeadRel sets the relation to the projective head of this token. The token itself is returned to allow method chaining.

func (*Token) SetPosTag

func (t *Token) SetPosTag(posTag string) *Token

SetPosTag sets the fine-grained POS tag for this token. The token itself is returned to allow method chaining.

func (Token) String

func (t Token) String() string

type Writer

type Writer struct {
	// contains filtered or unexported fields
}

Writer writes sentences in CoNLL-X format.

Example
var buf bytes.Buffer
writer := NewWriter(&buf)
writer.WriteSentence(
	Sentence{
		*NewToken().SetForm("Hello").SetPosTag("expr"),
		*NewToken().SetForm("world").SetPosTag("noun")})
writer.WriteSentence(
	Sentence{
		*NewToken().SetForm("Go").SetPosTag("name"),
		*NewToken().SetForm("rocks").SetPosTag("verb")})

fmt.Println(buf.String())
Output:

1	Hello	_	_	expr	_	_	_	_	_
2	world	_	_	noun	_	_	_	_	_

1	Go	_	_	name	_	_	_	_	_
2	rocks	_	_	verb	_	_	_	_	_

func NewWriter

func NewWriter(w io.Writer) *Writer

NewWriter creates a new Writer that writes CoNLL-X sentences to w.

func (*Writer) WriteSentence

func (w *Writer) WriteSentence(sentence Sentence) error

WriteSentence writes a sentence in CoNLL-X format. For annotation layers that are absent in a token, underscores (_) are written.
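
The underscore convention can be illustrated with a small sketch. The function formatRow is hypothetical and is not how the Writer is implemented. The layout of a position followed by nine tab-separated layer columns is taken from the example output on this page.

```go
package main

import (
	"fmt"
	"strings"
)

// formatRow renders one token line: a 1-based position followed by nine
// layer columns. A nil entry stands for an absent layer and is written
// as an underscore.
func formatRow(position int, layers [9]*string) string {
	cols := make([]string, 0, 10)
	cols = append(cols, fmt.Sprint(position))
	for _, l := range layers {
		if l == nil {
			cols = append(cols, "_")
		} else {
			cols = append(cols, *l)
		}
	}
	return strings.Join(cols, "\t")
}

func main() {
	form, pos := "Hello", "expr"

	var layers [9]*string
	layers[0] = &form // form column
	layers[3] = &pos  // fine-grained POS tag column

	fmt.Println(formatRow(1, layers))
}
```

With only the form and fine-grained POS tag set, this produces the same line as the first row of the Writer example: the position, "Hello", two underscores, "expr", and five more underscores.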
