All SIMD functions are generated from C++. Most of the C++ is portable code that the compiler auto-vectorizes. To add a new function:
1. Compile the C++ source code to AT&T assembly with Clang, using the flags
   -Ofast -mfma -mavx2 -funroll-loops -fomit-frame-pointer -std=c++17
2. Convert the output to avo instructions with
   python asm/asm2avo.py --suffix AVX2 input.s --out output.go
3. Fix potential issues with the output. Data sections need to be added manually, and aligned moves need to be changed to their unaligned forms.
4. Add the avo section to asm/gen.go, then generate the Go assembly and stubs with
   go generate asm/gen.go
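As an illustration of the kind of portable C++ the compile step above takes as input, here is a minimal sketch. The function names `add_f64` and `sum_f64` and their signatures are hypothetical and not taken from this repository. The sketch assumes Clang (or another compiler that accepts the `__restrict` extension) and relies on the optimizer, not intrinsics, to produce AVX2 code.

```cpp
#include <cstddef>

// Hypothetical kernels for illustration only; names and signatures are
// assumptions, not part of this project.

// extern "C" keeps the symbol names unmangled, which makes the functions
// easier to locate in the generated assembly.
extern "C" {

// In-place element-wise addition: x[i] += y[i].
// __restrict (a compiler extension) promises that x and y do not overlap,
// so the compiler can vectorize without emitting runtime alias checks.
void add_f64(double* __restrict x, const double* __restrict y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        x[i] += y[i];
    }
}

// Sum of all elements. A floating-point reduction like this is only
// vectorized when the compiler may reorder the additions, which the
// fast-math mode implied by -Ofast permits.
double sum_f64(const double* x, std::size_t n) {
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        total += x[i];
    }
    return total;
}

}  // extern "C"
```

Plain loops over pointer-and-length arguments tend to vectorize cleanly. Because fast-math allows reordering, the result of a reduction such as `sum_f64` may differ in the last bits from a strictly sequential sum.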
The C++ code was mostly compiled and analyzed with godbolt. Assembly produced by other means may need
additional cleanup. To compile code that uses vcl, run
godbolt locally and add -I/path/to/vcl to the compiler
flags.