Documentation ¶
Overview ¶
Package ggl provides an implementation of lex.Lexer backed by cloud.google.com/natural-language/. GLexer builds a graph of tokens from the API response and exposes it through the Lexer interface.
Index ¶
- Constants
- Variables
- func NewDocument(source []byte) (doc *lex.Document, err error)
- type GLexer
- func (t *GLexer) GetDocument() *lex.Document
- func (t *GLexer) GetExecTime() time.Duration
- func (t *GLexer) Init(ctx context.Context, source *lex.Document) error
- func (t *GLexer) InitWithClient(ctx context.Context, client *gnl.Client, source *lex.Document) error
- func (t *GLexer) Next() (*lex.Token, error)
Constants ¶
const MaxTextSizeInBytes = 1000000
MaxTextSizeInBytes is the maximum size, in bytes, of text that can be passed to the GNL API.
Variables ¶
var ByteOrderMark = []byte{0xEF, 0xBB, 0xBF} //nolint
ByteOrderMark is used to detect a BOM in the source text, which isn't allowed by the GNL API.
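A minimal sketch of the pre-flight checks these two values enable before calling the API. `stripBOM` and `fitsAPI` are illustrative helpers, not part of the package, and the constant and variable are redeclared locally so the sketch is self-contained:

```go
package main

import (
	"bytes"
	"fmt"
)

// Redeclared locally from the package documentation above.
const MaxTextSizeInBytes = 1000000

var ByteOrderMark = []byte{0xEF, 0xBB, 0xBF}

// stripBOM removes a leading UTF-8 BOM, which the GNL API rejects.
func stripBOM(src []byte) []byte {
	return bytes.TrimPrefix(src, ByteOrderMark)
}

// fitsAPI reports whether the text is within the GNL API size limit.
func fitsAPI(src []byte) bool {
	return len(src) <= MaxTextSizeInBytes
}

func main() {
	src := append(append([]byte{}, ByteOrderMark...), []byte("hello")...)
	clean := stripBOM(src)
	fmt.Printf("%q fits=%v\n", clean, fitsAPI(clean))
}
```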
var SupportedEncodingTypes = map[string]gnlpb.EncodingType{
	"UTF-8":    gnlpb.EncodingType_UTF8,
	"UTF-16":   gnlpb.EncodingType_UTF16,
	"UTF-16BE": gnlpb.EncodingType_UTF16,
	"UTF-16LE": gnlpb.EncodingType_UTF16,
	"UTF-32":   gnlpb.EncodingType_UTF32,
	"UTF-32BE": gnlpb.EncodingType_UTF32,
	"UTF-32LE": gnlpb.EncodingType_UTF32,
}
SupportedEncodingTypes describes the encodings permitted by the GNL API. It maps from the IANA-detected charset name to the corresponding protobuf encoding enum value.
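A sketch of how a caller might check a detected charset against this map. The key set below mirrors SupportedEncodingTypes (the real map yields gnlpb.EncodingType enum values rather than bool), and the uppercase normalisation is an assumption, since charset detectors may report lowercase names:

```go
package main

import (
	"fmt"
	"strings"
)

// supported mirrors the key set of SupportedEncodingTypes.
var supported = map[string]bool{
	"UTF-8": true, "UTF-16": true, "UTF-16BE": true, "UTF-16LE": true,
	"UTF-32": true, "UTF-32BE": true, "UTF-32LE": true,
}

// isSupported normalises an IANA charset name before looking it up.
func isSupported(charset string) bool {
	return supported[strings.ToUpper(charset)]
}

func main() {
	fmt.Println(isSupported("utf-8"), isSupported("ISO-8859-1"))
}
```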
Functions ¶
Types ¶
type GLexer ¶
type GLexer struct {
// contains filtered or unexported fields
}
GLexer implements Lexer using cloud.google.com/natural-language/.
func NewInitialisedGLexer ¶
NewInitialisedGLexer is a helper factory that creates and initialises a new GLexer, ready for a call to Next(). It calls the remote API cloud.google.com/natural-language/ and will block for a period that depends on the length of the source text. A new context for the request is created from the given timeout.
func (*GLexer) GetDocument ¶
GetDocument implements lex.Lexer
func (*GLexer) GetExecTime ¶
GetExecTime implements lex.Lexer
func (*GLexer) Init ¶
Init calls cloud.google.com/natural-language/ using the source text to load data ready for Lexer.Next. Because Init makes a network call and then builds and walks a token graph, it may be slow to return.
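A sketch of the documented Init-then-Next consumption loop, using a stub in place of GLexer so the example is self-contained. The `token` type, the `lexer` interface, and io.EOF as the end-of-stream sentinel are all assumptions inferred from the method set listed above, not the package's actual definitions:

```go
package main

import (
	"fmt"
	"io"
)

// token stands in for *lex.Token in this sketch.
type token struct{ Text string }

// lexer captures the Next method documented on GLexer.
type lexer interface {
	Next() (*token, error)
}

// stubLexer replays a fixed stream, standing in for a GLexer whose
// Init call has already loaded the token graph from the GNL API.
type stubLexer struct {
	toks []token
	i    int
}

func (s *stubLexer) Next() (*token, error) {
	if s.i >= len(s.toks) {
		return nil, io.EOF // end-of-stream sentinel (an assumption)
	}
	t := &s.toks[s.i]
	s.i++
	return t, nil
}

// drain consumes tokens until the lexer reports an error.
func drain(l lexer) []string {
	var out []string
	for {
		tok, err := l.Next()
		if err != nil {
			return out
		}
		out = append(out, tok.Text)
	}
}

func main() {
	fmt.Println(drain(&stubLexer{toks: []token{{"Hello"}, {"world"}}}))
}
```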
func (*GLexer) InitWithClient ¶
func (t *GLexer) InitWithClient(ctx context.Context, client *gnl.Client, source *lex.Document) error
InitWithClient initialises the lexer with a caller-supplied GCP client, giving the caller the opportunity to optimise connection management, e.g. when running in Google Cloud Functions.
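A sketch of the connection-reuse pattern InitWithClient enables in Cloud Functions: hold the client at package level so it survives warm invocations instead of being rebuilt per request. The `client` type here stands in for *gnl.Client; with the real package you would create it once at cold start and pass it to InitWithClient on each invocation:

```go
package main

import "fmt"

// client stands in for *gnl.Client in this sketch.
type client struct{}

// shared lives at package level so it survives across warm Cloud
// Function invocations, avoiding a new connection per request.
var shared = &client{}

// handle models one invocation reusing the shared client; with the
// real package it would call lexer.InitWithClient(ctx, shared, doc).
func handle() *client { return shared }

func main() {
	fmt.Println(handle() == handle())
}
```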