ARM SVE2 (Experimental)
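Scalable Vector Extension 2 (SVE2) is ARM’s next-generation SIMD architecture. Its vector length is chosen by each hardware implementation and ranges from 128 to 2048 bits in 128-bit increments. Unlike NEON’s fixed 128-bit registers, SVE2 code written in a vector-length-agnostic style queries the vector width at runtime, so the same binary runs unchanged on hardware of any supported width.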
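To make the vector-length-agnostic idea concrete, here is a minimal sketch of an element-wise add written with the ACLE SVE intrinsics from `arm_sve.h`. The function name `add_u32` is hypothetical and is not part of this project. The loop uses only base SVE intrinsics, which SVE2 hardware also supports. When the compiler is not targeting SVE, the sketch falls back to a plain scalar loop so that it still builds on any machine.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] (wrapping) for i in [0, n). */
void add_u32(const uint32_t *a, const uint32_t *b, uint32_t *out, size_t n)
{
#if defined(__ARM_FEATURE_SVE)
    /* svcntw() returns the number of 32-bit lanes per vector on the CPU
     * the code is running on, so the stride is not fixed at compile time. */
    for (uint64_t i = 0; i < n; i += svcntw()) {
        /* Lane k of the predicate is active only while i + k < n, so the
         * final partial vector needs no separate scalar tail loop. */
        svbool_t pg = svwhilelt_b32(i, (uint64_t)n);
        svuint32_t va = svld1_u32(pg, a + i);
        svuint32_t vb = svld1_u32(pg, b + i);
        svst1_u32(pg, out + i, svadd_u32_x(pg, va, vb));
    }
#else
    /* Portable fallback used when the build does not enable SVE. */
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
#endif
}
```

With GCC or Clang, the SVE path is compiled when the target enables the extension, for example with `-march=armv8-a+sve2`. Nothing in the loop assumes a particular register width, which is the property that lets one binary serve 128-bit and wider implementations alike.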
Status
SVE2 support is experimental. The current implementation provides:
- BLAKE3 comparison tests between NEON and SVE2 paths
- QEMU-based test infrastructure for development without SVE2 hardware
Production workloads should use the NEON backend on AArch64. The SVE2 backend will be promoted from experimental when SVE2 hardware is widely available in server and consumer devices.
Hardware Availability
SVE2 is currently available on:
- ARM Neoverse V2 (AWS Graviton 4, some server platforms); note that Neoverse V1 (AWS Graviton 3) implements SVE but not SVE2
- ARM Cortex-X2/X3/X4 (mobile SoCs with SVE2 support)
Consumer availability remains limited compared to the universal NEON baseline.
Apple Silicon (M1-M4) does not implement SVE2.
QEMU Test Infrastructure
A Docker-based QEMU environment is provided for SVE2 development and testing without access to SVE2 hardware:
tools/qemu-sve2/
    Dockerfile            # Ubuntu + QEMU + aarch64 cross-compiler
    build-docker.sh       # Build the Docker image
    compile-sve2.sh       # Cross-compile SVE2 code for aarch64
    run-sve2.sh           # Run SVE2 binaries under QEMU emulation
    test-sve2.sh          # Execute SVE2 test suite
    test_sve2_simple.c    # Basic SVE2 intrinsics test
BLAKE3 Comparison Tests
A comparison test validates NEON vs SVE2 BLAKE3 output consistency:
metamui-crypto-c/blake3/compare_neon_sve2.c
The test passes only if the SVE2 path produces hash outputs identical to those of the proven NEON implementation.
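The general shape of such a comparison can be sketched as a small differential harness. This is not the contents of `compare_neon_sve2.c`. Every name below (`hash_fn`, `count_mismatches`, and the `toy_*` functions) is hypothetical, and the toy digests are simple stand-ins rather than BLAKE3. The sketch sweeps every input length up to a limit, because SIMD paths most often diverge from a reference on partial final blocks.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIGEST_LEN 32

/* Hypothetical common signature for two backends under comparison. */
typedef void (*hash_fn)(const uint8_t *in, size_t len, uint8_t out[DIGEST_LEN]);

/* Count the input lengths in [0, max_len] on which the backends disagree.
 * max_len is clamped to the size of the internal test buffer. */
size_t count_mismatches(hash_fn reference, hash_fn candidate, size_t max_len)
{
    static uint8_t input[4096];
    uint8_t ref_out[DIGEST_LEN], cand_out[DIGEST_LEN];
    size_t mismatches = 0;

    if (max_len > sizeof input) max_len = sizeof input;
    for (size_t i = 0; i < sizeof input; i++) {
        input[i] = (uint8_t)(i * 31u + 7u);   /* deterministic test pattern */
    }
    for (size_t len = 0; len <= max_len; len++) {
        reference(input, len, ref_out);
        candidate(input, len, cand_out);
        if (memcmp(ref_out, cand_out, DIGEST_LEN) != 0) mismatches++;
    }
    return mismatches;
}

/* ---- Toy stand-ins (NOT BLAKE3), only to exercise the harness ---- */

static void absorb(uint8_t out[DIGEST_LEN], const uint8_t *in,
                   size_t start, size_t end)
{
    for (size_t i = start; i < end; i++) {
        out[i % DIGEST_LEN] = (uint8_t)(out[i % DIGEST_LEN] * 3u + in[i]);
    }
}

/* Byte-at-a-time reference. */
void toy_reference(const uint8_t *in, size_t len, uint8_t out[DIGEST_LEN])
{
    memset(out, 0, DIGEST_LEN);
    absorb(out, in, 0, len);
    out[0] ^= (uint8_t)len;
}

/* Processes 4-byte chunks, then the tail: same result as the reference. */
void toy_chunked(const uint8_t *in, size_t len, uint8_t out[DIGEST_LEN])
{
    size_t full = len - (len % 4);
    memset(out, 0, DIGEST_LEN);
    for (size_t i = 0; i < full; i += 4) absorb(out, in, i, i + 4);
    absorb(out, in, full, len);
    out[0] ^= (uint8_t)len;
}

/* Deliberately wrong: drops the tail bytes, a typical vectorization bug. */
void toy_buggy_tail(const uint8_t *in, size_t len, uint8_t out[DIGEST_LEN])
{
    size_t full = len - (len % 4);
    memset(out, 0, DIGEST_LEN);
    for (size_t i = 0; i < full; i += 4) absorb(out, in, i, i + 4);
    out[0] ^= (uint8_t)len;
}
```

Calling `count_mismatches(toy_reference, toy_chunked, 300)` reports no disagreements, while substituting `toy_buggy_tail` reports at least one. A real comparison would pass the NEON and SVE2 entry points in place of the toy functions.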
Future Plans
When SVE2 hardware becomes widely available, the plan is to:
- Implement full SVE2 backends for Falcon NTT/FFT operations
- Add SVE2 to the runtime dispatch hierarchy (between AVX-512 and NEON in priority)
- Leverage vector-length-agnostic programming for automatic scaling across different SVE2 implementations
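As a rough illustration of how SVE2 could slot into runtime dispatch on AArch64, the sketch below detects SVE2 on Linux through `getauxval(AT_HWCAP2)` and prefers it over NEON. The names `backend_t`, `cpu_has_sve2`, and `select_backend` are hypothetical and do not describe the project's actual dispatch code. Only the AArch64 side of the hierarchy is shown, and on non-Linux or non-AArch64 builds detection simply reports false.

```c
#include <stdbool.h>
#include <stddef.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#ifndef HWCAP2_SVE2
/* Bit value from the Linux arm64 uapi header asm/hwcap.h; defined here
 * only in case the libc headers in use do not expose it. */
#define HWCAP2_SVE2 (1UL << 1)
#endif
#endif

/* Hypothetical backend identifiers, lowest to highest priority. */
typedef enum { BACKEND_PORTABLE, BACKEND_NEON, BACKEND_SVE2 } backend_t;

/* True when the kernel reports SVE2 for the current process. */
bool cpu_has_sve2(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP2) & HWCAP2_SVE2) != 0;
#else
    return false;   /* assumption: no detection path on other platforms */
#endif
}

/* Pick the highest-priority backend the CPU supports. */
backend_t select_backend(bool has_sve2, bool has_neon)
{
    if (has_sve2) return BACKEND_SVE2;
    if (has_neon) return BACKEND_NEON;
    return BACKEND_PORTABLE;
}
```

Keeping detection (`cpu_has_sve2`) separate from policy (`select_backend`) lets the priority order be unit-tested on machines without SVE2, which fits a workflow where most development happens under emulation.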