gofft

package module
v1.1.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 23, 2019 License: MIT Imports: 3 Imported by: 2

README

gofft GoDoc Build Status Report Card

A better radix-2 fast Fourier transform in Go.

Package gofft provides an efficient radix-2 fast discrete Fourier transformation algorithm in pure Go.

This code is much faster than existing FFT implementations and uses no additional memory.

The algorithm is non-recursive, works in-place overwriting the input array, and requires O(1) additional space.

What

I took an existing FFT implementation in Go, cleaned and improved the code API and performance, and replaced the permutation step with an algorithm that works with no temp array.

Performance was more than doubled over the original code, and is consistently the fastest Go FFT library (see benchmarks below)

Added convolution functions Convolve(x, y), FastConvolve(x, y), MultiConvolve(x...), FastMultiConvolve(X), which implement the discrete convolution and a new hierarchical convolution algorithm that has utility in a number of CS problems. This computes the convolution of many arrays in O(n*ln(n)2) run time, and in the case of FastMultiConvolve O(1) additional space.

Also included new utility functions: IsPow2, NextPow2, ZeroPad, ZeroPadToNextPow2, Float64ToComplex128Array, Complex128ToFloat64Array, and RoundFloat64Array

Why

Most existing FFT libraries in Go allocate temporary arrays with O(N) additional space. This is less-than-ideal when you have arrays of length of 225 or more, where you quickly end up allocating gigabytes of data and dragging down the FFT calculation to a halt.

This code is much faster than major existing fft libraries while reducing memory usage and remaining in pure Go.

Additionally, the new convolution functions have significant utility for projects I've written or am planning.

One downside is that the FFT is not multithreaded (like go-dsp is), so for large vector size FFTs on a multi-core machine it will be slower than it could be. FFTs can be run in parallel, however, so in the case of many FFT calls it will be faster.

How

package main

import (
	"fmt"
	"github.com/argusdusty/gofft"
)

func main() {
	// Do an FFT and IFFT and get the same result
	testArray := gofft.Float64ToComplex128Array([]float64{1, 2, 3, 4, 5, 6, 7, 8})
	err := gofft.FFT(testArray)
	if err != nil {
		panic(err)
	}
	err = gofft.IFFT(testArray)
	if err != nil {
		panic(err)
	}
	result := gofft.Complex128ToFloat64Array(testArray)
	gofft.RoundFloat64Array(result)
	fmt.Println(result)

	// Do a discrete convolution of the testArray with itself
	testArray, err = gofft.Convolve(testArray, testArray)
	if err != nil {
		panic(err)
	}
	result = gofft.Complex128ToFloat64Array(testArray)
	gofft.RoundFloat64Array(result)
	fmt.Println(result)
}

Outputs:

[1 2 3 4 5 6 7 8]
[1 4 10 20 35 56 84 120 147 164 170 164 145 112 64]
Benchmarks
github.com\argusdusty\gofft>go test -bench=. -benchmem -cpu=1,4
goos: windows
goarch: amd64
pkg: github.com/argusdusty/gofft
BenchmarkSlowFFT/Tiny_(4)                5000000               259 ns/op         246.56 MB/s          64 B/op          1 allocs/op
BenchmarkSlowFFT/Tiny_(4)-4             10000000               249 ns/op         256.19 MB/s          64 B/op          1 allocs/op
BenchmarkSlowFFT/Small_(128)                3000            355391 ns/op           5.76 MB/s        2048 B/op          1 allocs/op
BenchmarkSlowFFT/Small_(128)-4              5000            355787 ns/op           5.76 MB/s        2048 B/op          1 allocs/op
BenchmarkSlowFFT/Medium_(4096)                 3         349252400 ns/op           0.19 MB/s       65536 B/op          1 allocs/op
BenchmarkSlowFFT/Medium_(4096)-4               3         347247300 ns/op           0.19 MB/s       65536 B/op          1 allocs/op
BenchmarkKtyeFFT/Tiny_(4)               10000000               131 ns/op         486.48 MB/s          64 B/op          1 allocs/op
BenchmarkKtyeFFT/Tiny_(4)-4             20000000               116 ns/op         551.20 MB/s          64 B/op          1 allocs/op
BenchmarkKtyeFFT/Small_(128)              300000              4297 ns/op         476.56 MB/s        2048 B/op          1 allocs/op
BenchmarkKtyeFFT/Small_(128)-4            300000              4214 ns/op         485.94 MB/s        2048 B/op          1 allocs/op
BenchmarkKtyeFFT/Medium_(4096)             10000            180892 ns/op         362.29 MB/s       65536 B/op          1 allocs/op
BenchmarkKtyeFFT/Medium_(4096)-4           10000            171514 ns/op         382.10 MB/s       65536 B/op          1 allocs/op
BenchmarkKtyeFFT/Large_(131072)              100          10477746 ns/op         200.15 MB/s     2097152 B/op          1 allocs/op
BenchmarkKtyeFFT/Large_(131072)-4            100          10248000 ns/op         204.64 MB/s     2097154 B/op          1 allocs/op
BenchmarkGoDSPFFT/Tiny_(4)                300000              3497 ns/op          18.30 MB/s         568 B/op         13 allocs/op
BenchmarkGoDSPFFT/Tiny_(4)-4              500000              3408 ns/op          18.78 MB/s         504 B/op         14 allocs/op
BenchmarkGoDSPFFT/Small_(128)             200000             10960 ns/op         186.85 MB/s        5600 B/op         18 allocs/op
BenchmarkGoDSPFFT/Small_(128)-4           100000             19320 ns/op         106.00 MB/s        5785 B/op         34 allocs/op
BenchmarkGoDSPFFT/Medium_(4096)            10000            170892 ns/op         383.49 MB/s      164380 B/op         23 allocs/op
BenchmarkGoDSPFFT/Medium_(4096)-4          10000            195829 ns/op         334.66 MB/s      164831 B/op         54 allocs/op
BenchmarkGoDSPFFT/Large_(131072)             200           7961076 ns/op         263.43 MB/s     5243448 B/op         28 allocs/op
BenchmarkGoDSPFFT/Large_(131072)-4           300           5679089 ns/op         369.28 MB/s     5244217 B/op         74 allocs/op
BenchmarkGoDSPFFT/Huge_(4194304)               2         719410200 ns/op          93.28 MB/s   167772816 B/op         33 allocs/op
BenchmarkGoDSPFFT/Huge_(4194304)-4             3         371179166 ns/op         180.80 MB/s   167774186 B/op         98 allocs/op
BenchmarkFFT/Tiny_(4)                   30000000              42.1 ns/op        1521.16 MB/s           0 B/op          0 allocs/op
BenchmarkFFT/Tiny_(4)-4                 30000000              42.1 ns/op        1520.90 MB/s           0 B/op          0 allocs/op
BenchmarkFFT/Small_(128)                 1000000              1304 ns/op        1570.10 MB/s           0 B/op          0 allocs/op
BenchmarkFFT/Small_(128)-4               1000000              1273 ns/op        1608.04 MB/s           0 B/op          0 allocs/op
BenchmarkFFT/Medium_(4096)                 20000             70867 ns/op         924.77 MB/s           0 B/op          0 allocs/op
BenchmarkFFT/Medium_(4096)-4               20000             70953 ns/op         923.65 MB/s           0 B/op          0 allocs/op
BenchmarkFFT/Large_(131072)                  300           4475215 ns/op         468.61 MB/s           0 B/op          0 allocs/op
BenchmarkFFT/Large_(131072)-4                300           4474826 ns/op         468.66 MB/s           0 B/op          0 allocs/op
BenchmarkFFT/Huge_(4194304)                    3         453284966 ns/op         148.05 MB/s           0 B/op          0 allocs/op
BenchmarkFFT/Huge_(4194304)-4                  3         456955133 ns/op         146.86 MB/s           0 B/op          0 allocs/op
BenchmarkFFTParallel/Tiny_(4)           30000000              38.2 ns/op        1676.16 MB/s           0 B/op          0 allocs/op
BenchmarkFFTParallel/Tiny_(4)-4        100000000              13.4 ns/op        4767.56 MB/s           0 B/op          0 allocs/op
BenchmarkFFTParallel/Small_(128)         1000000              1307 ns/op        1565.95 MB/s           0 B/op          0 allocs/op
BenchmarkFFTParallel/Small_(128)-4       3000000               478 ns/op        4281.00 MB/s           0 B/op          0 allocs/op
BenchmarkFFTParallel/Medium_(4096)         20000             73033 ns/op         897.34 MB/s           0 B/op          0 allocs/op
BenchmarkFFTParallel/Medium_(4096)-4       50000             24109 ns/op        2718.32 MB/s           0 B/op          0 allocs/op
BenchmarkFFTParallel/Large_(131072)          300           4449412 ns/op         471.33 MB/s           0 B/op          0 allocs/op
BenchmarkFFTParallel/Large_(131072)-4       1000           1914525 ns/op        1095.39 MB/s           0 B/op          0 allocs/op
BenchmarkFFTParallel/Huge_(4194304)            3         447525700 ns/op         149.96 MB/s          37 B/op          1 allocs/op
BenchmarkFFTParallel/Huge_(4194304)-4          5         270185780 ns/op         248.38 MB/s          41 B/op          1 allocs/op

Documentation

Overview

Package gofft provides a fast discrete Fourier transformation algorithm.

Implemented is the 1-dimensional DFT of complex input data for with input lengths which are powers of 2.

The algorithm is non-recursive, works in-place overwriting the input array, and requires O(1) additional space.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Complex128ToFloat64Array

func Complex128ToFloat64Array(x []complex128) []float64

Complex128ToFloat64Array converts a complex128 array to the equivalent float64 array taking only the real part.

func Convolve

func Convolve(x, y []complex128) ([]complex128, error)

Convolve computes the discrete convolution of x and y using FFT. Pads x and y to the next power of 2 from len(x)+len(y)-1

func FFT

func FFT(x []complex128) error

FFT implements the fast Fourier transform. This is done in-place (modifying the input array). Requires O(1) additional memory. len(x) must be a perfect power of 2, otherwise this will return an error.

func FastConvolve

func FastConvolve(x, y []complex128) error

FastConvolve computes the discrete convolution of x and y using FFT and stores the result in x, while erasing y (setting it to 0s). Since this does no allocations, x and y are assumed to already be 0-padded for at least half their length.

func FastMultiConvolve

func FastMultiConvolve(X []complex128, n int, multithread bool) error

FastMultiConvolve computes the discrete convolution of many arrays using a hierarchical FFT algorithm, and stores the result in the first section of the input, writing 0s to the remainder of the input This does no allocations, so the arrays must first be 0-padded out to the next power of 2 from sum of the lengths of the longest two arrays. Additionally, the number of arrays must be a power of 2 X is the concatenated array of arrays, of length N (n*m) n is the length of the 0-padded arrays. multithread tells the algorithm to use goroutines, which can slow things down for small N. Takes O(N*log(N)^2) run time and O(1) additional space.

func Float64ToComplex128Array

func Float64ToComplex128Array(x []float64) []complex128

Float64ToComplex128Array converts a float64 array to the equivalent complex128 array using an imaginary part of 0.

func IFFT

func IFFT(x []complex128) error

IFFT implements the inverse fast Fourier transform. This is done in-place (modifying the input array). Requires O(1) additional memory. len(x) must be a perfect power of 2, otherwise this will return an error.

func IsPow2

func IsPow2(N int) bool

IsPow2 returns true if N is a perfect power of 2 (1, 2, 4, 8, ...) and false otherwise. Only works up to 2^30 Algorithm from: https://graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2

func MultiConvolve

func MultiConvolve(X ...[]complex128) ([]complex128, error)

MultiConvolve computes the discrete convolution of many arrays using a hierarchical FFT algorithm that successfully builds up larger convolutions. This requires allocating up to 4*N extra memory for appropriate 0-padding where N=sum(len(x) for x in X). Takes O(N*log(N)^2) run time and O(N) additional space.

This is much slower and takes many more allocations than FastMultiConvolve below, but has a smart planner that handles disproportionate array sizes very well. If all your arrays are the same length, FastMultiConvolve will be much faster.

func NextPow2

func NextPow2(N int) int

NextPow2 returns the smallest power of 2 >= N. Only works up to 2^30 Algorithm from: https://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2

func Prepare

func Prepare(N int) error

Prepare precomputes values used for FFT on a vector of length N. N must be a perfect power of 2, otherwise this will return an error.

func RoundFloat64Array

func RoundFloat64Array(x []float64)

RoundFloat64Array calls math.Round on each entry in x, changing the array in-place

func ZeroPad

func ZeroPad(x []complex128, N int) []complex128

ZeroPad pads x with 0s at the end into a new array of length N. This does not alter x, and creates an entirely new array. This should only be used as a convience function, and isn't meant for performance. You should call this as few times as possible since it does potentially large allocations.

func ZeroPadToNextPow2

func ZeroPadToNextPow2(x []complex128) []complex128

ZeroPadToNextPow2 pads x with 0s at the end into a new array of length 2^N >= len(x) This does not alter x, and creates an entirely new array. This should only be used as a convience function, and isn't meant for performance. You should call this as few times as possible since it does potentially large allocations.

Types

This section is empty.

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL