html-text-chunker

command module
v1.0.1
Published: May 14, 2019 License: GPL-3.0 Imports: 4 Imported by: 0

Description

When you use HTML as rich text, an API you want to call may accept only a limited number of characters per request.

This project therefore tries to split the text into fragments (chunks) that are small enough to be used with such APIs.

This project tries to fulfill the following requirements:

  • every chunk contains less than CHUNK_SIZE characters
  • we can modify and reassemble all chunks to valid HTML
  • the linguistic context of sentences should not be destroyed if possible
  • should work with HTML strings
  • should also work with non-HTML strings
  • (gracefully ignore broken HTML)

Future considerations:

  • extract alt attributes separately if requested

Commands

Run benchmarks:

go test -bench=. -benchtime=10s ./chunk/

Run tests:

go test -v ./chunk/

Use this package in your own project:

go get -t github.com/Staffbase/html-text-chunker

package main

import (
	"fmt"
	"strings"

	richText "github.com/Staffbase/html-text-chunker/chunk"
)

// CHUNK_SIZE is the maximum length of a single chunk.
const CHUNK_SIZE = 1337

func foo(text string) {
	chunker := richText.NewChunkedRichText(text, CHUNK_SIZE, false)
	chunker.MakeChunks()

	// do some meaningful stuff
	for idx, part := range chunker.TextParts {
		// each part.Text has the maximum length of CHUNK_SIZE
		// you can modify part.Text and replace the dom node
		fmt.Printf("Processing text part %d\n", idx)
		part.Text = bar(part.Text)
	}

	newText := chunker.Finish()
	fmt.Print(newText)
}

// bar stands in for a call to a length-limited API.
func bar(input string) string {
	if len(input) > CHUNK_SIZE {
		panic("The string is too long!")
	}
	return strings.ToUpper(input)
}

func main() {
	foo("<p>Hello <b>world</b>. This is rich text.</p>")
}

After this run, newText contains the given HTML markup with the text parts replaced by the output of bar().

Update the dependency:

go get -u=patch github.com/Staffbase/html-text-chunker
