html-text-chunker

Version: v1.0.1 | Published: May 14, 2019 | License: GPL-3.0

Repository

github.com/Staffbase/html-text-chunker

Description

When you use HTML as rich text, you may run into APIs that accept only a limited number of characters per request.

This project therefore splits the text into fragments (chunks) that are small enough to be used with such APIs.

This project tries to fulfill the following requirements:

every chunk contains less than CHUNK_SIZE characters
we can modify and reassemble all chunks to valid HTML
the linguistic context of sentences should not be destroyed if possible
should work with HTML strings
should also work with non-HTML strings
(gracefully ignore broken HTML)

Future considerations:

extract alt attributes (image alt text) separately if requested

Commands

Run benchmarks:

go test -bench=. -benchtime=10s ./chunk/

Run tests:

go test -v ./chunk/

Add this package to your own project:

go get -t github.com/Staffbase/html-text-chunker

package main

import (
	"fmt"
	"strings"

	richText "github.com/Staffbase/html-text-chunker/chunk"
)

const CHUNK_SIZE = 1337

func foo(text string) {
	chunker := richText.NewChunkedRichText(text, CHUNK_SIZE, false)
	chunker.MakeChunks()

	// do some meaningful stuff
	for idx, part := range chunker.TextParts {
		// each part.Text is at most CHUNK_SIZE long;
		// you can modify part.Text and replace the DOM node
		fmt.Printf("Processing text part %d\n", idx)
		part.Text = bar(part.Text)
	}

	// reassemble the (modified) parts into HTML
	newText := chunker.Finish()
	fmt.Print(newText)
}

// bar stands in for a call to a size-limited API; here it only upper-cases the input.
func bar(input string) string {
	if len(input) > CHUNK_SIZE {
		panic("The string is too long!")
	}
	return strings.ToUpper(input)
}

// main calls foo with a small sample so the example builds as a program.
func main() {
	foo("<p>Hello <b>world</b>. This is a test.</p>")
}

After foo() has run, newText contains the given HTML markup with the text parts modified by bar().

Update the dependency:

go get -u=patch github.com/Staffbase/html-text-chunker
