Documentation ¶
Index ¶
Constants ¶
const Arith = `` /* 1147-byte string literal not displayed */
const Asm = `
import "golang.org/x/sys/cpu"
var (
supportAdx = cpu.X86.HasADX && cpu.X86.HasBMI2
_ = supportAdx
)
`
Asm ...
const AsmNoAdx = `` /* 230-byte string literal not displayed */
AsmNoAdx ...
const Base = `` /* 17030-byte string literal not displayed */
const BigNum = `` /* 1349-byte string literal not displayed */
const Conv = `` /* 10051-byte string literal not displayed */
const Doc = `` /* 1140-byte string literal not displayed */
const Exp = `` /* 566-byte string literal not displayed */
const FixedExp = `` /* 1756-byte string literal not displayed */
const Inverse = `` /* 11275-byte string literal not displayed */
const InverseTests = `` /* 11516-byte string literal not displayed */
const MulCIOS = `` /* 2803-byte string literal not displayed */
MulCIOS text book CIOS works for all modulus.
There are couple of variations to the multiplication (and squaring) algorithms.
All versions are derived from the Montgomery CIOS algorithm: see section 2.3.2 of Tolga Acar's thesis https://www.microsoft.com/en-us/research/wp-content/uploads/1998/06/97Acar.pdf
For 1-word modulus, the generator will call mul_cios_one_limb (standard REDC)
For 13-word+ modulus, the generator will output a unoptimized textbook CIOS code, in plain Go.
For all other moduli, we look at the available bits in the last limb. If they are none (like secp256k1) we generate a unoptimized textbook CIOS code, in plain Go, for all architectures. If there is at least one we can omit a carry propagation in the CIOS algorithm. If there is at least two we can use the same technique for the CIOS Squaring. See appendix in https://eprint.iacr.org/2022/1400.pdf for the exact condition.
In practice, we have 3 different targets in mind: x86(amd64), arm64 and wasm.
For amd64, we can leverage (when available) the BMI2 and ADX instructions to have 2-carry-chains in parallel. This make the use of assembly worth it as it results in a significant perf improvement; most CPUs since 2016 support these instructions, and we assume it to be the "default path"; in case the CPU has no support, we fall back to a slow, unoptimized version.
On amd64, the Squaring algorithm always call the Multiplication (assembly) implementation.
For arm64, we unroll the loops in the CIOS (+nocarry optimization) algorithm, such that the instructions generated by the Go compiler closely match what we would hand-write. Hence, there is no assembly needed for arm64 target.
Additionally, if 2-bits+ are available on the last limb, we have a template to generate a dedicated Squaring algorithm This is not activated by default, to minimize the codebase size. On M1, AWS Graviton3 it results in a 5-10% speedup. On some mobile devices, speed up observed was more important (~20%).
The same (arm64) unrolled Go code produce satisfying performance for WASM (compiled using TinyGo).
const MulDoc = `` /* 1630-byte string literal not displayed */
const MulNoCarry = `` /* 6220-byte string literal not displayed */
MulNoCarry see https://eprint.iacr.org/2022/1400.pdf annex for more info on the algorithm Note that these templates are optimized for arm64 target, since x86 benefits from assembly impl.
const OpsAMD64 = `` /* 807-byte string literal not displayed */
OpsAMD64 is included with AMD64 builds (regardless of architecture or if F.ASM is set)
const OpsNoAsm = `` /* 1916-byte string literal not displayed */
const Reduce = `` /* 422-byte string literal not displayed */
const Sqrt = `` /* 3421-byte string literal not displayed */
const Test = `` /* 37965-byte string literal not displayed */
const TestVector = `` /* 1249-byte string literal not displayed */
const Vector = `` /* 5864-byte string literal not displayed */
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
This section is empty.