Documentation
Overview
Package sentences implements Unicode sentence boundaries, as specified in https://unicode.org/reports/tr29/#Sentence_Boundaries
Constants
This section is empty.
Variables
This section is empty.
Functions
func NewScanner (added in v1.0.3)
NewScanner returns a Scanner that tokenizes sentences per https://unicode.org/reports/tr29/#Sentence_Boundaries. Iterate through the sentences by calling Scan() until it returns false, then check Err(). See also the bufio.Scanner docs.
Example
package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/clipperhouse/uax29/sentences"
)

func main() {
	text := "Hello, 世界. “Nice dog! 👍🐶”, they said."
	reader := strings.NewReader(text)

	scanner := sentences.NewScanner(reader)

	// Scan returns true until error or EOF
	for scanner.Scan() {
		fmt.Printf("%q\n", scanner.Text())
	}

	// Gotta check the error!
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}
Output:

"Hello, 世界. "
"“Nice dog! "
"👍🐶”, they said."
func NewSegmenter (added in v1.7.0)
NewSegmenter returns a Segmenter, which is an iterator over the source text. Iterate while Next() returns true, access each segmented sentence via Bytes(), and check Err() once iteration stops.
Example
package main

import (
	"fmt"
	"log"

	"github.com/clipperhouse/uax29/sentences"
)

func main() {
	text := []byte("Hello, 世界. “Nice dog! 👍🐶”, they said.")

	segments := sentences.NewSegmenter(text)

	// Next returns true until error or end of text
	for segments.Next() {
		fmt.Printf("%q\n", segments.Bytes())
	}

	// Gotta check the error!
	if err := segments.Err(); err != nil {
		log.Fatal(err)
	}
}
Output:

"Hello, 世界. "
"“Nice dog! "
"👍🐶”, they said."
func SegmentAll (added in v1.7.0)
SegmentAll iterates through all tokens and collects them into a [][]byte. It is a convenience function: if you would allocate such a slice anyway, it saves you some code. The downside is that the allocation is unbounded, O(n) in the number of tokens. Use a Segmenter when you need bounded memory usage.
Example
package main

import (
	"fmt"

	"github.com/clipperhouse/uax29/sentences"
)

func main() {
	text := []byte("Hello, 世界. “Nice dog! 👍🐶”, they said.")

	segments := sentences.SegmentAll(text)
	fmt.Printf("%q\n", segments)
}
Output: ["Hello, 世界. " "“Nice dog! " "👍🐶”, they said."]
Types
This section is empty.