Graduation project for Algorithms course held at Projector in 2019.
About
This project aims to implement Burrow-Wheeler transform in Golang using various approaches and do some research on performance.
Results
Burrow-Wheeler transform is based on sorting all existing string suffixes[1] in lexicographical order.
A range of approaches exists in implemented in this repo.
Naive approach - create all suffix in a loop and sort them. Take last symbol of rotational shift and obtain BWT result. N^2 logN if taking string comparison into consideration. O(N^2) memory
Memory efficient naive approach[1]. We don't actually need to create every possible suffix, we can use pointers(indexes). Same complexity, but O(N) in memory.
SuffixTree - I've taken naive Suffix tree construction from RosettaCode, which is NlogN^2. Than, I've sorted suffix tree using build-in Go sorter, which is likely NlogN. Normally you can use Radix sort here in O(n). Faster algorytmhs exist for suffix tree construction also exist, but out of scope here.
SuffixArray - there's a build-in SuffixArray in GoLang, operating in linear time using [6][7].
I've benchmarked BZIP BWT subroutine and added as comparison.
Data
Benchmark results are in report/data.csv.
Benchmarks
Build-in Suffix Array is faster than BZIP implementation ... explain blocks and other nice findings here.
TODO
Implement naive approach for compression
Benchmark naive approach and create tests and benchmarks
Benchmark approach from reference 2 - suffix array
Use suffixarray construction routine from golang std lib