diffmatchpatch

package
v0.14.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Nov 29, 2024 License: Apache-2.0, MIT Imports: 13 Imported by: 0

Documentation

Overview

Package diffmatchpatch offers robust algorithms to perform the operations required for synchronizing plain text.

Index

Constants

View Source
const (
	// Sep1 signifies the start of a conflict.
	Sep1 = "<<<<<<<"
	// Sep2 signifies the middle of a conflict.
	Sep2 = "======="
	// Sep3 signifies the end of a conflict.
	Sep3 = ">>>>>>>"
)
View Source
const FOUR_BYTE_BITS = 21
View Source
const ONE_BYTE_BITS = 7

These constants define the number of bits representable in 1,2,3,4 byte utf8 sequences, respectively.

View Source
const THREE_BYTE_BITS = 16
View Source
const TWO_BYTE_BITS = 11
View Source
const UNICODE_INVALID_RANGE_DELTA = UNICODE_INVALID_RANGE_END - UNICODE_INVALID_RANGE_START + 1
View Source
const UNICODE_INVALID_RANGE_END = 0xDFFF
View Source
const UNICODE_INVALID_RANGE_START = 0xD800
View Source
const UNICODE_RANGE_MAX = 0x10FFFF

Variables

This section is empty.

Functions

func Merge

func Merge(ctx context.Context, o, a, b string, labelO, labelA, labelB string) (string, bool, error)

Merge: Built-in text merging implementation. TODO: ignore CRLF --> LF ???

Types

type Diff

type Diff struct {
	Type Operation
	Text string
}

Diff represents one diff operation

type DiffMatchPatch

type DiffMatchPatch struct {
	// Number of seconds to map a diff before giving up (0 for infinity).
	DiffTimeout time.Duration
	// Cost of an empty edit operation in terms of edit characters.
	DiffEditCost int
	// How far to search for a match (0 = exact location, 1000+ = broad match). A match this many characters away from the expected location will add 1.0 to the score (0.0 is a perfect match).
	MatchDistance int
	// When deleting a large block of text (over ~64 characters), how close do the contents have to be to match the expected contents. (0.0 = perfection, 1.0 = very loose).  Note that MatchThreshold controls how closely the end points of a delete need to match.
	PatchDeleteThreshold float64
	// Chunk size for context length.
	PatchMargin int
	// The number of bits in an int.
	MatchMaxBits int
	// At what point is no match declared (0.0 = perfection, 1.0 = very loose).
	MatchThreshold float64
}

DiffMatchPatch holds the configuration for diff-match-patch operations.

func New

func New() *DiffMatchPatch

New creates a new DiffMatchPatch object with default parameters.

func (*DiffMatchPatch) DiffBisect

func (dmp *DiffMatchPatch) DiffBisect(text1, text2 string, deadline time.Time) []Diff

DiffBisect finds the 'middle snake' of a diff, split the problem in two and return the recursively constructed diff. If an invalid UTF-8 sequence is encountered, it will be replaced by the Unicode replacement character. See Myers 1986 paper: An O(ND) Difference Algorithm and Its Variations.

func (*DiffMatchPatch) DiffCharsToLines

func (dmp *DiffMatchPatch) DiffCharsToLines(diffs []Diff, lineMap LineMap) []Diff

DiffCharsToLines rehydrates the text in a diff from a string of line hashes to real lines of text.

func (*DiffMatchPatch) DiffCleanupEfficiency

func (dmp *DiffMatchPatch) DiffCleanupEfficiency(diffs []Diff) []Diff

DiffCleanupEfficiency reduces the number of edits by eliminating operationally trivial equalities.

func (*DiffMatchPatch) DiffCleanupMerge

func (dmp *DiffMatchPatch) DiffCleanupMerge(diffs []Diff) []Diff

DiffCleanupMerge reorders and merges like edit sections. Merge equalities. Any edit section can move as long as it doesn't cross an equality.

func (*DiffMatchPatch) DiffCleanupSemantic

func (dmp *DiffMatchPatch) DiffCleanupSemantic(diffs []Diff) []Diff

DiffCleanupSemantic reduces the number of edits by eliminating semantically trivial equalities.

func (*DiffMatchPatch) DiffCleanupSemanticLossless

func (dmp *DiffMatchPatch) DiffCleanupSemanticLossless(diffs []Diff) []Diff

DiffCleanupSemanticLossless looks for single edits surrounded on both sides by equalities which can be shifted sideways to align the edit to a word boundary. E.g: The c<ins>at c</ins>ame. -> The <ins>cat </ins>came.

func (*DiffMatchPatch) DiffCommonOverlap

func (dmp *DiffMatchPatch) DiffCommonOverlap(text1 string, text2 string) int

DiffCommonOverlap determines if the suffix of one string is the prefix of another.

func (*DiffMatchPatch) DiffCommonPrefix

func (dmp *DiffMatchPatch) DiffCommonPrefix(text1, text2 string) int

DiffCommonPrefix determines the common prefix length of two strings.

func (*DiffMatchPatch) DiffCommonSuffix

func (dmp *DiffMatchPatch) DiffCommonSuffix(text1, text2 string) int

DiffCommonSuffix determines the common suffix length of two strings.

func (*DiffMatchPatch) DiffFromDelta

func (dmp *DiffMatchPatch) DiffFromDelta(text1 string, delta string) (diffs []Diff, err error)

DiffFromDelta given the original text1, and an encoded string which describes the operations required to transform text1 into text2, comAdde the full diff.

func (*DiffMatchPatch) DiffHalfMatch

func (dmp *DiffMatchPatch) DiffHalfMatch(text1, text2 string) []string

DiffHalfMatch checks whether the two texts share a substring which is at least half the length of the longer text. This speedup can produce non-minimal diffs.

func (*DiffMatchPatch) DiffLevenshtein

func (dmp *DiffMatchPatch) DiffLevenshtein(diffs []Diff) int

DiffLevenshtein computes the Levenshtein distance that is the number of inserted, deleted or substituted characters.

func (*DiffMatchPatch) DiffLinesToChars

func (dmp *DiffMatchPatch) DiffLinesToChars(text1, text2 string) (string, string, LineMap)

DiffLinesToChars splits two texts into a list of strings, and educes the texts to a string of hashes where each Unicode character represents one line. It's slightly faster to call DiffLinesToRunes first, followed by DiffMainRunes.

Note: since we hash lines to runes, there is an upper limit to the number of unique lines this algorithm can handle. That limit is 1,112,063 unique lines.

func (*DiffMatchPatch) DiffLinesToRunes

func (dmp *DiffMatchPatch) DiffLinesToRunes(text1, text2 string) ([]rune, []rune, LineMap)

DiffLinesToRunes splits two texts into a list of runes.

Note: since we hash lines to runes, there is an upper limit to the number of unique lines this algorithm can handle. That limit is 1,112,063 unique lines.

func (*DiffMatchPatch) DiffMain

func (dmp *DiffMatchPatch) DiffMain(text1, text2 string, checklines bool) []Diff

DiffMain finds the differences between two texts. If an invalid UTF-8 sequence is encountered, it will be replaced by the Unicode replacement character.

Note: if checklines is true, the limitation noted in DiffLinesToChars applies

func (*DiffMatchPatch) DiffMainRunes

func (dmp *DiffMatchPatch) DiffMainRunes(text1, text2 []rune, checklines bool) []Diff

DiffMainRunes finds the differences between two rune sequences. If an invalid UTF-8 sequence is encountered, it will be replaced by the Unicode replacement character.

Note: if checklines is true, the limitation noted in DiffLinesToRunes applies

func (*DiffMatchPatch) DiffPrettyHtml

func (dmp *DiffMatchPatch) DiffPrettyHtml(diffs []Diff) string

DiffPrettyHtml converts a []Diff into a pretty HTML report. It is intended as an example from which to write one's own display functions.

func (*DiffMatchPatch) DiffPrettyText

func (dmp *DiffMatchPatch) DiffPrettyText(diffs []Diff) string

DiffPrettyText converts a []Diff into a colored text report.

func (*DiffMatchPatch) DiffText1

func (dmp *DiffMatchPatch) DiffText1(diffs []Diff) string

DiffText1 computes and returns the source text (all equalities and deletions).

func (*DiffMatchPatch) DiffText2

func (dmp *DiffMatchPatch) DiffText2(diffs []Diff) string

DiffText2 computes and returns the destination text (all equalities and insertions).

func (*DiffMatchPatch) DiffToDelta

func (dmp *DiffMatchPatch) DiffToDelta(diffs []Diff) string

DiffToDelta crushes the diff into an encoded string which describes the operations required to transform text1 into text2. E.g. =3\t-2\t+ing -> Keep 3 chars, delete 2 chars, insert 'ing'. Operations are tab-separated. Inserted text is escaped using %xx notation.

func (*DiffMatchPatch) DiffXIndex

func (dmp *DiffMatchPatch) DiffXIndex(diffs []Diff, loc int) int

DiffXIndex returns the equivalent location in s2.

func (*DiffMatchPatch) MatchAlphabet

func (dmp *DiffMatchPatch) MatchAlphabet(pattern string) map[byte]int

MatchAlphabet initialises the alphabet for the Bitap algorithm.

func (*DiffMatchPatch) MatchBitap

func (dmp *DiffMatchPatch) MatchBitap(text, pattern string, loc int) int

MatchBitap locates the best instance of 'pattern' in 'text' near 'loc' using the Bitap algorithm. Returns -1 if no match was found.

func (*DiffMatchPatch) MatchMain

func (dmp *DiffMatchPatch) MatchMain(text, pattern string, loc int) int

MatchMain locates the best instance of 'pattern' in 'text' near 'loc'. Returns -1 if no match found.

func (*DiffMatchPatch) Merge

func (dmp *DiffMatchPatch) Merge(textO, textA, textB string, labelA, labelB string) (string, bool, error)

Merge: Run a three-way text merge

	Link: https://www.cis.upenn.edu/~bcpierce/papers/diff3-short.pdf

 input:
 A = [1, 4, 5, 2, 3, 6]
 O = [1, 2, 3, 4, 5, 6]
 B = [1, 2, 4, 5, 3, 6]

 maximum matches:

 .---.---.---.---.---.---.---.---.---.
 | A | 1 | 4 | 5 | 2 | 3 |   |   | 6 |
 :---+---+---+---+---+---+---+---+---:
 | O | 1 |   |   | 2 | 3 | 4 | 5 | 6 |
 '---'---'---'---'---'---'---'---'---'

 .---.---.---.---.---.---.---.---.
 | A | 1 | 2 | 3 | 4 | 5 |   | 6 |
 :---+---+---+---+---+---+---+---:
 | O | 1 | 2 |   | 4 | 5 | 3 | 6 |
 '---'---'---'---'---'---'---'---'

 diff3 parse:
 .---.---.-----.---.-------.----.
 | A | 1 | 4,5 | 2 |     3 |  6 |
 :---+---+-----+---+-------+----:
 | O | 1 |     | 2 | 3,4,5 | ,5 |
 :---+---+-----+---+-------+----:
 | B | 1 |     | 2 | 4,5,3 |  6 |
 '---'---'-----'---'-------'----'

 calculated output:
 .----.---.------.---.-------.---.
 | A' | 1 |  4,5 | 2 |     3 | 6 |
 :----+---+------+---+-------+---:
 | O' | 1 |  4,5 | 2 | 3,4,5 | 6 |
 :----+---+------+---+-------+---:
 | B' | 1 |  4,5 | 2 | 4,5,3 | 6 |
 '----'---'------'---'-------'---'

 printed output:
 1
 4
 5
 2
 <<<<<<< A
 3
 ||||||| O
 3
 4
 5
 =======
 4
 5
 3
 >>>>>>> B
 6

func (*DiffMatchPatch) MergeLinesToRunes

func (dmp *DiffMatchPatch) MergeLinesToRunes(textO, textA, textB string) *MergeRunes

func (*DiffMatchPatch) PatchAddContext

func (dmp *DiffMatchPatch) PatchAddContext(patch Patch, text string) Patch

PatchAddContext increases the context until it is unique, but doesn't let the pattern expand beyond MatchMaxBits.

func (*DiffMatchPatch) PatchAddPadding

func (dmp *DiffMatchPatch) PatchAddPadding(patches []Patch) string

PatchAddPadding adds some padding on text start and end so that edges can match something. Intended to be called only from within patchApply.

func (*DiffMatchPatch) PatchApply

func (dmp *DiffMatchPatch) PatchApply(patches []Patch, text string) (string, []bool)

PatchApply merges a set of patches onto the text. Returns a patched text, as well as an array of true/false values indicating which patches were applied.

func (*DiffMatchPatch) PatchDeepCopy

func (dmp *DiffMatchPatch) PatchDeepCopy(patches []Patch) []Patch

PatchDeepCopy returns an array that is identical to a given an array of patches.

func (*DiffMatchPatch) PatchFromText

func (dmp *DiffMatchPatch) PatchFromText(textline string) ([]Patch, error)

PatchFromText parses a textual representation of patches and returns a List of Patch objects.

func (*DiffMatchPatch) PatchMake

func (dmp *DiffMatchPatch) PatchMake(opt ...any) []Patch

PatchMake computes a list of patches.

func (*DiffMatchPatch) PatchSplitMax

func (dmp *DiffMatchPatch) PatchSplitMax(patches []Patch) []Patch

PatchSplitMax looks through the patches and breaks up any which are longer than the maximum limit of the match algorithm. Intended to be called only from within patchApply.

func (*DiffMatchPatch) PatchToText

func (dmp *DiffMatchPatch) PatchToText(patches []Patch) string

PatchToText takes a list of patches and returns a textual representation.

type LineMap

type LineMap map[rune]string

LineMap is a mapping from a line hash to its text.

type MergeRunes

type MergeRunes struct {
	Lines          LineMap
	StrIndexArrayO []rune
	StrIndexArrayA []rune
	StrIndexArrayB []rune
}

type Operation

type Operation int8

Operation defines the operation of a diff item.

const (
	// DiffDelete item represents a delete diff.
	DiffDelete Operation = -1
	// DiffInsert item represents an insert diff.
	DiffInsert Operation = 1
	// DiffEqual item represents an equal diff.
	DiffEqual Operation = 0
)

func (Operation) String

func (i Operation) String() string

type Patch

type Patch struct {
	Start1  int
	Start2  int
	Length1 int
	Length2 int
	// contains filtered or unexported fields
}

Patch represents one patch operation.

func (*Patch) String

func (p *Patch) String() string

String emulates GNU diff's format. Header: @@ -382,8 +481,9 @@ Indices are printed as 1-based, not 0-based.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL