Falcon-512/1024
Fast Fourier Lattice-based Compact Signatures over NTRU
Overview
Falcon is a post-quantum digital signature algorithm selected by NIST for standardization. It produces the smallest signatures among NIST PQC signature candidates while maintaining strong security guarantees. Falcon is based on the theoretical framework of Gentry, Peikert, and Vaikuntanathan (GPV) applied to NTRU lattices, using fast Fourier sampling (ffsampling) over the reals to produce short lattice vectors as signatures.
Falcon is the showcase algorithm in the MetaMUI suite, with the most extensive SIMD optimizations, cross-language verification, and hardware acceleration support.
How it works:
- KeyGen — Generate an NTRU key pair
(f, g, F, G)satisfyingfG - gF = q mod X^n+1, then derive the public keyh = g/f mod q. - Sign — Hash the message to a point in
Z_q[X]/(X^n+1), then use ffsampling with the secret key’s LDL tree to find a short vector(s1, s2)such thats1 + s2*h = hash(msg) mod q. - Verify — Recompute
s1 = c - s2*h mod qfrom the signature’ss2component, then check that||(s1, s2)||^2is below the acceptance bound.
Specifications
| Parameter Set | NIST Level | Public Key (bytes) | Secret Key (bytes) | Signature (bytes) | Degree n |
|---|---|---|---|---|---|
| Falcon-512 | 1 | 897 | 1281 | ~666 (compressed) | 512 |
| Falcon-1024 | 5 | 1793 | 2305 | ~1280 (compressed) | 1024 |
NIST encoding headers:
| Component | Header Byte | Falcon-512 | Falcon-1024 |
|---|---|---|---|
| Public Key | 0x00 \| logn |
0x09 |
0x0A |
| Secret Key | 0x50 \| logn |
0x59 |
0x5A |
| Signature | 0x30 \| logn |
0x39 |
0x3A |
Signatures use compressed encoding: the s2 polynomial coefficients are Huffman-coded, yielding variable-length signatures that average ~666 bytes for Falcon-512 and ~1280 bytes for Falcon-1024. The signer retries with fresh randomness if the compressed signature exceeds the maximum size.
Security
- Security notion: EUF-CMA (existential unforgeability under chosen-message attack)
- Hardness assumptions: NTRU lattice problem + Short Integer Solution (SIS) over NTRU
- Signing method: GPV framework with fast Fourier sampling — discrete Gaussian sampling over the NTRU lattice using the LDL tree decomposition of the Gram matrix
- Constant-time: All SIMD paths are constant-time with branchless masking (audited). See
metamui-crypto-rust/metamui-falcon512/src/constant_time.rs - Floating-point precision: Falcon requires high-precision floating-point arithmetic during signing. The implementation uses 64-bit IEEE 754 doubles throughout, with careful handling of rounding to ensure security of the discrete Gaussian sampler
Hardware Acceleration
Falcon has the most extensive hardware acceleration in the MetaMUI suite, covering NTT, FFT, and batch operations across multiple instruction sets and GPU compute.
SIMD Dispatch Order
The runtime selects the best available backend: Metal GPU > AVX-512 > AVX-2 > NEON > Portable
Instruction Set Details
| Acceleration | Parallelism | Operations | Implementation |
|---|---|---|---|
| AVX-512 | 32-way butterfly | NTT forward/inverse, FFT | metamui-crypto-rust/metamui-falcon512/src/simd/avx512_ntt.rs, avx512_fft.rs |
| AVX-2 | 16-way butterfly | NTT forward/inverse, FFT | metamui-crypto-rust/metamui-falcon512/src/simd/avx2_ntt.rs, avx2_fft.rs |
| NEON | 8-way butterfly | NTT forward/inverse, FFT | metamui-crypto-rust/metamui-falcon512/src/simd/neon_ntt.rs, neon_fft.rs |
| Apple Metal | GPU parallel | NTT forward/inverse, pointwise multiply, batch verify | metamui-crypto-rust/metamui-falcon512/src/metal/ |
| Portable | Scalar | All operations (fallback) | metamui-crypto-rust/metamui-falcon512/src/simd/portable.rs, portable_fft.rs |
Fused Operations
Fused multiply-add and multiply-subtract operations reduce memory traffic and improve throughput for the polynomial arithmetic that dominates signing:
poly_muladd_fft(a, b, c)— computesa*b + cin FFT domain in a single passpoly_mulsub_fft(a, b, c)— computesa*b - cin FFT domain in a single pass
These are implemented in portable, NEON, AVX-2, and AVX-512 variants.
ExpandedKey Caching
The ExpandedKey structure pre-computes the LDL tree decomposition from the secret key once, then reuses it across all signing retries. This avoids redundant tree computation on each attempt, which is significant because Falcon’s sign-with-retry loop may execute multiple iterations before producing a signature within the norm bound.
Apple Metal GPU
Metal compute shaders provide GPU-accelerated batch operations:
- Forward NTT / Inverse NTT — polynomial transform on GPU
- Pointwise multiply — coefficient-wise multiplication in NTT domain
- Batch verify — CPU pre-computes
s0 = c - NTT(s1 * h), GPU performs parallel norm checking across multiple signatures
Shader source: metamui-crypto-rust/metamui-falcon512/src/metal/shaders/
Constant-Time Audit
All SIMD paths (AVX-512, AVX-2, NEON, portable) use branchless masking throughout and have been audited for constant-time behavior. No secret-dependent branches or memory access patterns exist in the signing or verification paths.
Platform Support
Falcon is implemented across all 10 platforms with cross-language interoperability verified:
| Platform | Language | Implementation Path | Cross-Lang Verified |
|---|---|---|---|
| Native | C | metamui-crypto-c/ |
Yes (gold standard) |
| Systems | Rust | metamui-crypto-rust/metamui-falcon512/ |
Yes |
| Backend | Go | metamui-crypto-go/falcon512/, falcon1024/ |
Yes |
| Data Science | Python | metamui-crypto-python/metamui_crypto/falcon512/, falcon1024/ |
Yes |
| JVM | Java | metamui-crypto-java/ |
Yes |
| JVM/Android | Kotlin | metamui-crypto-kotlin/ |
Yes |
| .NET | C# | metamui-crypto-csharp/ |
Yes |
| Apple | Swift | metamui-crypto-swift/ |
Yes |
| Web | TypeScript | metamui-crypto-typescript/packages/falcon/ |
Format check only |
| Browser/Edge | WASM | metamui-crypto-wasm/ |
Format check only |
Cross-language verification chain: C (generates vectors) > Go > Rust > Python > Java > Kotlin > C# > Swift — all produce and verify signatures interoperably using shared test vectors.
WASM note: The TypeScript/WASM implementation uses a non-NIST encoding format (896-byte public key, 666-byte fixed-length signature). Cryptographic correctness has been verified via the Rust backend.
API Example
use metamui_falcon512::{Falcon512, KeyPair};
// Key generation
let keypair = Falcon512::keygen(&mut rng);
// Signing
let message = b"Document to be signed";
let signature = Falcon512::sign(&keypair.secret_key, message);
// Verification
let is_valid = Falcon512::verify(&keypair.public_key, message, &signature);
assert!(is_valid);
// Using ExpandedKey for repeated signing (amortizes LDL tree cost)
let expanded = Falcon512::expand_key(&keypair.secret_key);
let sig1 = Falcon512::sign_with_expanded(&expanded, b"First message");
let sig2 = Falcon512::sign_with_expanded(&expanded, b"Second message");
Test Vectors
Falcon uses both NIST-format test vectors and cross-language interoperability vectors:
- NIST vectors:
test-vectors/falcon512_vectors.json - Cross-language (512):
test-vectors/falcon512_cross_lang.json— 5 messages, signed by C reference, verified by all languages - Cross-language (1024):
test-vectors/falcon1024_cross_lang.json— 5 messages, signed by C reference, verified by all languages - Hash-to-point golden vectors:
test-vectors/hash_to_point_falcon512_golden.json,test-vectors/hash_to_point_falcon1024_golden.json
References
- Falcon Specification — Fouque, P.-A., Hoffstein, J., Kirchner, P., et al. Falcon: Fast-Fourier Lattice-based Compact Signatures over NTRU. https://falcon-sign.info/
- NIST PQC Round 3 — Falcon was selected for standardization in the NIST Post-Quantum Cryptography project. https://csrc.nist.gov/projects/post-quantum-cryptography
- GPV Framework — Gentry, C., Peikert, C., Vaikuntanathan, V. Trapdoors for Hard Lattices and New Cryptographic Constructions. STOC 2008.