phonetic

Module version: v1.4.2
Published: Aug 1, 2023 License: MIT


Phonetic

A set of implementations of different phonetic encoders.

Installation

To install:

$ go get -v github.com/f1monkey/phonetic

Usage

Soundex

The fastest algorithm in this library. Soundex encodes a word into a short phonetic code so that similar-sounding words with different spellings can be matched. It was developed for indexing English-language names. Wiki page.

Code example:

package main

import (
	"fmt"

	"github.com/f1monkey/phonetic/soundex"
)

func main() {
	e := soundex.NewEncoder()
	result := e.Encode("orange")
	fmt.Println(result)
	// prints: O652
}
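
To make the encoding step concrete, here is a minimal standalone sketch of the classic American Soundex rules, using only the standard library. It is not this package's implementation, and the names soundexDigit and soundexSketch are hypothetical; it only illustrates why "orange" maps to O652 under the textbook rules.

```go
package main

import (
	"fmt"
	"strings"
)

// soundexDigit returns the Soundex digit for an upper-case ASCII letter,
// or 0 for letters that carry no code (vowels, H, W, Y).
func soundexDigit(c byte) byte {
	switch c {
	case 'B', 'F', 'P', 'V':
		return '1'
	case 'C', 'G', 'J', 'K', 'Q', 'S', 'X', 'Z':
		return '2'
	case 'D', 'T':
		return '3'
	case 'L':
		return '4'
	case 'M', 'N':
		return '5'
	case 'R':
		return '6'
	}
	return 0
}

// soundexSketch is a hypothetical helper that encodes an ASCII word with the
// textbook American Soundex rules: first letter plus up to three digits.
func soundexSketch(word string) string {
	word = strings.ToUpper(word)
	code := make([]byte, 0, 4)
	var prev byte // digit of the last letter that can block a repeat, 0 if none
	for i := 0; i < len(word) && len(code) < 4; i++ {
		c := word[i]
		if c < 'A' || c > 'Z' {
			continue // skip anything that is not an ASCII letter
		}
		d := soundexDigit(c)
		if len(code) == 0 {
			code = append(code, c) // the first letter is kept as-is
			prev = d
			continue
		}
		if d != 0 && d != prev {
			code = append(code, d)
		}
		// H and W do not separate letters with the same digit; vowels do.
		if c != 'H' && c != 'W' {
			prev = d
		}
	}
	for len(code) > 0 && len(code) < 4 {
		code = append(code, '0') // pad short codes with zeros
	}
	return string(code)
}

func main() {
	fmt.Println(soundexSketch("orange"))
	// prints: O652
}
```

Under these rules "Robert" and "Rupert" both encode to R163, which is the kind of match Soundex is meant to produce.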
Metaphone

The Metaphone encoder converts a word into a phonetic code that represents its pronunciation, so that words can be compared by how they sound rather than by how they are spelled. It was designed for English. Wiki page

Code example:

package main

import (
	"fmt"

	"github.com/f1monkey/phonetic/metaphone"
)

func main() {
	e := metaphone.NewEncoder()
	result := e.Encode("orange")
	fmt.Println(result)
	// prints: ORNJ
}
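
Comparing words by sound usually means grouping them by their phonetic code. The sketch below shows that pattern with the standard library only. The Encoder interface is an assumption modelled on the Encode calls in the examples above (the package may not declare such an interface), and firstLetterEncoder is a toy stand-in, not a phonetic algorithm; in real use you would pass an encoder from this library instead.

```go
package main

import (
	"fmt"
	"strings"
)

// Encoder is a hypothetical interface: it assumes the encoders expose an
// Encode(string) string method, as the examples above suggest.
type Encoder interface {
	Encode(word string) string
}

// firstLetterEncoder is a stand-in that keeps this sketch runnable without
// external dependencies. It is NOT a real phonetic algorithm.
type firstLetterEncoder struct{}

// Encode returns the upper-cased first byte of an ASCII word.
func (firstLetterEncoder) Encode(word string) string {
	if word == "" {
		return ""
	}
	return strings.ToUpper(word[:1])
}

// groupByCode buckets words by their code, so words that the encoder
// considers similar end up in the same slice, in input order.
func groupByCode(e Encoder, words []string) map[string][]string {
	groups := make(map[string][]string)
	for _, w := range words {
		code := e.Encode(w)
		groups[code] = append(groups[code], w)
	}
	return groups
}

func main() {
	groups := groupByCode(firstLetterEncoder{}, []string{"orange", "Orang", "apple"})
	fmt.Println(groups["O"])
	fmt.Println(groups["A"])
}
```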
Cologne phonetics

Cologne phonetics (Kölner Phonetik) is a phonetic algorithm for indexing German words by their sound, which allows name and word matching in German-language databases. Wiki page

Code example:

package main

import (
	"fmt"

	"github.com/f1monkey/phonetic/cologne"
)

func main() {
	e := cologne.NewEncoder()
	result := e.Encode("Großtraktor")
	fmt.Println(result)
	// prints: 47827427
}
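
For readers who want to see where the digits come from, the following standalone sketch applies the commonly published Kölner Phonetik letter table. It is an illustration under that assumption, not this package's implementation; cologneSketch and contains are hypothetical names, and edge cases may differ from the library.

```go
package main

import (
	"fmt"
	"strings"
)

// contains reports whether b is one of the letters in set (0 means "no letter").
func contains(set string, b byte) bool {
	return b != 0 && strings.IndexByte(set, b) >= 0
}

// cologneSketch is a hypothetical helper that encodes a word using the
// commonly published Kölner Phonetik table.
func cologneSketch(word string) string {
	// Fold umlauts and ß, upper-case, and keep only the letters A-Z.
	folded := strings.NewReplacer(
		"ä", "a", "ö", "o", "ü", "u", "ß", "s",
		"Ä", "A", "Ö", "O", "Ü", "U",
	).Replace(word)
	var letters []byte
	for _, r := range strings.ToUpper(folded) {
		if r >= 'A' && r <= 'Z' {
			letters = append(letters, byte(r))
		}
	}

	// Map every letter to its digit(s), looking at the neighbouring letters.
	var raw []byte
	for i, c := range letters {
		var prev, next byte
		if i > 0 {
			prev = letters[i-1]
		}
		if i+1 < len(letters) {
			next = letters[i+1]
		}
		code := ""
		switch c {
		case 'A', 'E', 'I', 'J', 'O', 'U', 'Y':
			code = "0"
		case 'B':
			code = "1"
		case 'P':
			code = "1"
			if next == 'H' {
				code = "3"
			}
		case 'D', 'T':
			code = "2"
			if contains("CSZ", next) {
				code = "8"
			}
		case 'F', 'V', 'W':
			code = "3"
		case 'G', 'K', 'Q':
			code = "4"
		case 'C':
			code = "8"
			if i == 0 && contains("AHKLOQRUX", next) {
				code = "4"
			} else if i > 0 && !contains("SZ", prev) && contains("AHKOQUX", next) {
				code = "4"
			}
		case 'X':
			code = "48"
			if contains("CKQ", prev) {
				code = "8"
			}
		case 'L':
			code = "5"
		case 'M', 'N':
			code = "6"
		case 'R':
			code = "7"
		case 'S', 'Z':
			code = "8"
		}
		// H has no case above and therefore contributes nothing.
		raw = append(raw, code...)
	}

	// Collapse runs of equal digits, then drop every 0 that is not in first position.
	var out []byte
	for i, d := range raw {
		if i > 0 && d == raw[i-1] {
			continue
		}
		if d == '0' && len(out) > 0 {
			continue
		}
		out = append(out, d)
	}
	return string(out)
}

func main() {
	fmt.Println(cologneSketch("Großtraktor"))
	// prints: 47827427
}
```

The context rules matter: T is normally 2 but becomes 8 before C, S or Z, and C depends on both its position and its neighbours.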
Caverphone2

Caverphone2 is a phonetic algorithm for indexing and matching names, particularly English names as pronounced in New Zealand, where it was developed. Wiki page

Code example:

package main

import (
	"fmt"

	"github.com/f1monkey/phonetic/caverphone2"
)

func main() {
	e := caverphone2.NewEncoder()
	result := e.Encode("orange")
	fmt.Println(result)
	// prints: ARNK111111
}
Beider-Morse

Beider-Morse Phonetic Matching (BMPM) is a phonetic algorithm for indexing and matching names in multiple languages. This package is a Go port of the original PHP library. It contains a large number of rules for transforming a word into its phonetic representation. The current implementation is relatively slow.

To reduce the size of the resulting binary, the three rulesets are split into separate packages:

  • github.com/f1monkey/phonetic/beidermorse - generic rules (for general usage)
  • github.com/f1monkey/phonetic/beidermorse/beidermorseash - Ashkenazi rules
  • github.com/f1monkey/phonetic/beidermorse/beidermorsesep - Sephardic rules

Each package contains exact and approx (default) rulesets. To use the exact ruleset, pass the WithAccuracy option to the encoder, as in the examples below.

Code examples:

  • generic ruleset with approx accuracy
    package main
    
    import (
    	"fmt"
    	"github.com/f1monkey/phonetic/beidermorse"
    )
    
    func main() {
    	encoder, _ := beidermorse.NewEncoder()
    	result := encoder.Encode("orange")
    	fmt.Println(result)
    	// prints: [orangi oragi orongi orogi orYngi Yrangi Yrongi YrYngi oranxi oronxi orani oroni oranii oronii oranzi oronzi urangi urongi]
    }
    
  • generic ruleset with exact accuracy
    package main
    
    import (
    	"fmt"
    	"github.com/f1monkey/phonetic/beidermorse"
    )
    
    func main() {
    	encoder, _ := beidermorse.NewEncoder(beidermorse.WithAccuracy(beidermorse.Exact))
    	result := encoder.Encode("orange")
    	fmt.Println(result)
    	// prints: [orange oranxe oranhe oranje oranZe orandZe]
    
    }
    
  • generic ruleset with exact accuracy, English language, and buffer reuse (to reduce GC pressure)
    package main
    
    import (
    	"fmt"
    	"github.com/f1monkey/phonetic/beidermorse"
    )
    
    func main() {
    	encoder, err := beidermorse.NewEncoder(
    		beidermorse.WithAccuracy(beidermorse.Exact),
    		beidermorse.WithLang(beidermorse.English),
    		beidermorse.WithBufferReuse(true),
    	)
    	if err != nil {
    		panic(err) // stop if the encoder could not be created
    	}
    	result := encoder.Encode("orange")
    	fmt.Println(result)
    	// prints: [orenk orenge orendS orendZe oronk oronge orondS orondZe orank orange orandS orandZe arenk arenge arendS arendZe aronk aronge arondS arondZe arank arange arandS arandZe]
    }
    
  • Ashkenazi ruleset with approx accuracy
    package main
    
    import (
    	"fmt"
    	"github.com/f1monkey/phonetic/beidermorseash"
    )
    
    func main() {
    	encoder, _ := beidermorseash.NewEncoder()
    	result := encoder.Encode("orange")
    	fmt.Println(result)
    	// prints: [orangi orongi orYngi Yrangi Yrongi YrYngi oranzi oronzi orani oroni oranxi oronxi urangi urongi]
    }
    
  • Sephardic ruleset with approx accuracy
    package main
    
    import (
    	"fmt"
    	"github.com/f1monkey/phonetic/beidermorsesep"
    )
    
    func main() {
    	encoder, _ := beidermorsesep.NewEncoder()
    	result := encoder.Encode("orange")
    	fmt.Println(result)
    	// prints: [uranzi uranz uranS uranzi uranz uranhi uranh]
    }
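
Because a Beider-Morse encoder yields several candidate codes per word, matching two words typically means checking whether their code lists overlap. The sketch below shows one way to do that with the standard library. It assumes Encode returns a []string (which the bracketed output above suggests); sharesCode is a hypothetical helper, and the query lists are made up for illustration.

```go
package main

import "fmt"

// sharesCode is a hypothetical helper that reports whether two lists of
// phonetic codes have at least one code in common.
func sharesCode(a, b []string) bool {
	seen := make(map[string]struct{}, len(a))
	for _, c := range a {
		seen[c] = struct{}{}
	}
	for _, c := range b {
		if _, ok := seen[c]; ok {
			return true
		}
	}
	return false
}

func main() {
	// Codes copied from the "generic ruleset with exact accuracy" output for "orange".
	indexed := []string{"orange", "oranxe", "oranhe", "oranje", "oranZe", "orandZe"}

	// Hypothetical code lists for two query words; not produced by the library.
	queryA := []string{"oranje", "oranie"}
	queryB := []string{"banana"}

	fmt.Println(sharesCode(indexed, queryA)) // prints: true
	fmt.Println(sharesCode(indexed, queryB)) // prints: false
}
```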
    

Benchmarks

  • Soundex
    goos: linux
    goarch: amd64
    pkg: github.com/f1monkey/phonetic/soundex
    cpu: AMD Ryzen 9 6900HX with Radeon Graphics
    Benchmark_Encoder_Encode-16    	14173989	        99.21 ns/op	       8 B/op	       1 allocs/op
    PASS
    ok  	github.com/f1monkey/phonetic/soundex	1.497s
    
  • Metaphone
    goos: linux
    goarch: amd64
    pkg: github.com/f1monkey/phonetic/metaphone
    cpu: AMD Ryzen 9 6900HX with Radeon Graphics        
    Benchmark_Encoder_Encode-16    	 6451292	       267.1 ns/op	      48 B/op	       3 allocs/op
    PASS
    ok  	github.com/f1monkey/phonetic/metaphone	1.916s
    
  • Cologne phonetics
    goos: linux
    goarch: amd64
    pkg: github.com/f1monkey/phonetic/cologne
    cpu: AMD Ryzen 9 6900HX with Radeon Graphics
    Benchmark_Encoder_Encode-16    	 3737944	       374.8 ns/op	     104 B/op	       3 allocs/op
    PASS
    ok  	github.com/f1monkey/phonetic/cologne	1.729s
    
  • Caverphone2
    goos: linux
    goarch: amd64
    pkg: github.com/f1monkey/phonetic/caverphone2
    cpu: AMD Ryzen 9 6900HX with Radeon Graphics
    Benchmark_Encoder_Encode-16    	 1864532	       641.7 ns/op	      40 B/op	       3 allocs/op
    PASS
    ok  	github.com/f1monkey/phonetic/caverphone2	1.851s
    
  • Beider-Morse
    goos: linux
    goarch: amd64
    pkg: github.com/f1monkey/phonetic/beidermorse
    cpu: AMD Ryzen 9 6900HX with Radeon Graphics
    Benchmark_Encoder_Encode_En_Approx-16                	    5769	    219152 ns/op	   21264 B/op	     146 allocs/op
    Benchmark_Encoder_Encode_En_Exact-16                 	   13203	     82072 ns/op	    9199 B/op	      84 allocs/op
    Benchmark_Encoder_Encode_Ru_Approx-16                	   30060	     54323 ns/op	    6093 B/op	      48 allocs/op
    Benchmark_Encoder_Encode_Ru_Exact-16                 	   37522	     28353 ns/op	    2657 B/op	      26 allocs/op
    
    With buffer reuse:
    goos: linux
    goarch: amd64
    pkg: github.com/f1monkey/phonetic/beidermorse
    cpu: AMD Ryzen 9 6900HX with Radeon Graphics
    Benchmark_Encoder_Encode_BufferReuse_En_Approx-16    	   10000	    129346 ns/op	    6126 B/op	     130 allocs/op
    Benchmark_Encoder_Encode_BufferReuse_En_Exact-16     	   23198	     48813 ns/op	    2297 B/op	      72 allocs/op
    Benchmark_Encoder_Encode_BufferReuse_Ru_Approx-16    	   48902	     29909 ns/op	    1297 B/op	      41 allocs/op
    Benchmark_Encoder_Encode_BufferReuse_Ru_Exact-16     	   65834	     16260 ns/op	     485 B/op	      22 allocs/op
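
To read the two Beider-Morse tables side by side, the small program below computes the relative savings from buffer reuse. The figures are copied from the benchmark output above; the reduction helper is a hypothetical name, and the results describe this one machine only.

```go
package main

import "fmt"

// reduction returns the percentage by which after is smaller than before.
func reduction(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	// ns/op and B/op without and with buffer reuse, copied from the tables above.
	rows := []struct {
		name                 string
		nsBefore, nsAfter    float64
		bytesBefore, bytesAfter float64
	}{
		{"En_Approx", 219152, 129346, 21264, 6126},
		{"En_Exact", 82072, 48813, 9199, 2297},
		{"Ru_Approx", 54323, 29909, 6093, 1297},
		{"Ru_Exact", 28353, 16260, 2657, 485},
	}
	for _, r := range rows {
		fmt.Printf("%-10s time -%.1f%%  memory -%.1f%%\n",
			r.name,
			reduction(r.nsBefore, r.nsAfter),
			reduction(r.bytesBefore, r.bytesAfter))
	}
}
```

On these numbers, buffer reuse cuts time per operation by roughly 40 to 45 percent and bytes allocated per operation by roughly 70 to 80 percent.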
    

Directories

  • beidermorseash
  • beidermorsesep
  • internal

The rule packages contain code generated by beidermorse/generate.go.
