ARM SVE2 (Experimental)

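Scalable Vector Extension 2 (SVE2) is ARM’s next-generation SIMD architecture. Its vector length is not fixed by the instruction set: each hardware implementation chooses a width between 128 and 2048 bits. Unlike NEON code, which targets fixed 128-bit registers, SVE2 code written in a vector-length-agnostic style adapts to the hardware’s vector width at runtime without recompilation.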
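
As a rough illustration of what vector-length-agnostic code looks like, the sketch below adds two arrays with the ACLE SVE intrinsics from arm_sve.h. The helper name add_u32 is hypothetical and not part of this project. The loop never hard-codes a vector width: svcntw() reports how many 32-bit lanes the current CPU provides, and the predicate from svwhilelt_b32_u64 masks off the tail. A scalar fallback is included so the file also builds where SVE is not enabled.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical helper (not from this project): out[i] = a[i] + b[i]. */
static void add_u32(uint32_t *out, const uint32_t *a, const uint32_t *b,
                    size_t n)
{
#if defined(__ARM_FEATURE_SVE)
    /* svcntw() is the number of 32-bit lanes per vector on this CPU,
     * so the same binary steps by 4 lanes at 128 bits, 16 at 512 bits. */
    for (size_t i = 0; i < n; i += svcntw()) {
        /* Lanes at index >= n are inactive, which handles the tail. */
        svbool_t pg = svwhilelt_b32_u64((uint64_t)i, (uint64_t)n);
        svuint32_t va = svld1_u32(pg, a + i);
        svuint32_t vb = svld1_u32(pg, b + i);
        svst1_u32(pg, out + i, svadd_u32_x(pg, va, vb));
    }
#else
    /* Scalar fallback when the compiler is not targeting SVE. */
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
#endif
}
```

Calling add_u32(out, a, b, 37) gives the same result at any vector width, including lengths that are not a multiple of the lane count.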

Status

SVE2 support is experimental. The current implementation provides:

BLAKE3 comparison tests between NEON and SVE2 paths
QEMU-based test infrastructure for development without SVE2 hardware

Production workloads should use the NEON backend on AArch64. The SVE2 backend will be promoted from experimental when SVE2 hardware is widely available in server and consumer devices.

Hardware Availability

SVE2 is currently available on:

ARM Neoverse V2 (AWS Graviton 4, some server platforms); note that Neoverse V1 (AWS Graviton 3) implements SVE but not SVE2
ARM Cortex-X2/X3/X4 (mobile SoCs with SVE2 support)

Consumer availability remains limited compared to the universal NEON baseline.

Apple Silicon (M1-M4) does not implement SVE2.

QEMU Test Infrastructure

A Docker-based QEMU environment is provided for SVE2 development and testing without access to SVE2 hardware:

tools/qemu-sve2/
    Dockerfile          # Ubuntu + QEMU + aarch64 cross-compiler
    build-docker.sh     # Build the Docker image
    compile-sve2.sh     # Cross-compile SVE2 code for aarch64
    run-sve2.sh         # Run SVE2 binaries under QEMU emulation
    test-sve2.sh        # Execute SVE2 test suite
    test_sve2_simple.c  # Basic SVE2 intrinsics test
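
For orientation, a very small probe of the kind that is useful to run first under emulation might look like the sketch below. It is not the contents of test_sve2_simple.c; the helper names are hypothetical. It reports whether the file was compiled with SVE2 enabled and what vector length the (emulated) CPU exposes, which depends on how QEMU is configured.

```c
#include <stdio.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical helper: SVE vector length in bits, or 0 when this
 * translation unit was not compiled with SVE enabled. */
static unsigned sve_vector_bits(void)
{
#if defined(__ARM_FEATURE_SVE)
    return (unsigned)(svcntb() * 8);   /* svcntb() = bytes per vector */
#else
    return 0;
#endif
}

/* Hypothetical helper: 1 if the compiler was targeting SVE2, else 0. */
static int built_with_sve2(void)
{
#if defined(__ARM_FEATURE_SVE2)
    return 1;
#else
    return 0;
#endif
}

/* Print a one-line summary, handy as the first line of a test log. */
static void print_sve_info(void)
{
    printf("built with SVE2: %s, vector length: %u bits\n",
           built_with_sve2() ? "yes" : "no", sve_vector_bits());
}
```

Running the same binary with different emulated vector lengths is a cheap way to confirm that code is genuinely vector-length-agnostic.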

BLAKE3 Comparison Tests

A comparison test validates NEON vs SVE2 BLAKE3 output consistency:

metamui-crypto-c/blake3/compare_neon_sve2.c

The test checks that, for the inputs it covers, the SVE2 path produces hash outputs identical to those of the proven NEON implementation.
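
The general shape of such a cross-backend check can be sketched as follows. This is not the code in compare_neon_sve2.c: the hash_fn signature, count_mismatches, and the two toy digest functions are hypothetical stand-ins (the toy functions are not BLAKE3) used only to show the technique of hashing the same inputs through a reference and a candidate path and comparing digests byte for byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIGEST_LEN 32   /* BLAKE3 default output size in bytes */

/* Hypothetical signature shared by the reference and candidate paths. */
typedef void (*hash_fn)(const uint8_t *in, size_t len,
                        uint8_t out[DIGEST_LEN]);

/* Number of input lengths in [0, max_len] where the two paths disagree.
 * A result of 0 means identical output for every length tried. */
static size_t count_mismatches(hash_fn ref, hash_fn cand, size_t max_len)
{
    static uint8_t input[4096];
    uint8_t d_ref[DIGEST_LEN], d_cand[DIGEST_LEN];
    size_t mismatches = 0;

    if (max_len > sizeof input)
        max_len = sizeof input;
    for (size_t i = 0; i < sizeof input; i++)
        input[i] = (uint8_t)(i % 251);   /* deterministic pattern */

    for (size_t len = 0; len <= max_len; len++) {
        ref(input, len, d_ref);
        cand(input, len, d_cand);
        if (memcmp(d_ref, d_cand, DIGEST_LEN) != 0)
            mismatches++;
    }
    return mismatches;
}

/* Toy stand-in for the reference path (NOT a real hash). */
static void toy_digest(const uint8_t *in, size_t len,
                       uint8_t out[DIGEST_LEN])
{
    memset(out, 0, DIGEST_LEN);
    for (size_t i = 0; i < len; i++)
        out[i % DIGEST_LEN] ^= (uint8_t)(in[i] + 1);
}

/* Toy stand-in with a deliberate bug: drops the last byte whenever the
 * length is a non-zero multiple of 64, mimicking a block-boundary error. */
static void toy_digest_buggy(const uint8_t *in, size_t len,
                             uint8_t out[DIGEST_LEN])
{
    if (len > 0 && len % 64 == 0)
        len--;
    toy_digest(in, len, out);
}
```

Sweeping every length up to a few blocks, rather than a handful of fixed sizes, matters because SIMD bugs tend to sit at block and vector-width boundaries; here the buggy stand-in only diverges at lengths 64, 128, 192 and 256.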

Future Plans

When SVE2 hardware becomes widely available, the plan is to:

Implement full SVE2 backends for Falcon NTT/FFT operations
Add SVE2 to the runtime dispatch hierarchy (between AVX-512 and NEON in the overall backend priority order, so that on AArch64 it is preferred over NEON when both are present)
Leverage vector-length-agnostic programming for automatic scaling across different SVE2 implementations
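
The AArch64 side of the planned dispatch could be sketched as below. This is an assumption about how it might be done, not the project's dispatcher: backend_t, pick_backend and detect_backend are hypothetical names. On Linux, the kernel advertises CPU features through getauxval(); SVE2 is reported in AT_HWCAP2 as HWCAP2_SVE2 and NEON (Advanced SIMD) in AT_HWCAP as HWCAP_ASIMD. On other platforms the sketch simply falls back to portable code.

```c
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>    /* getauxval, AT_HWCAP, AT_HWCAP2 */
#include <asm/hwcap.h>   /* HWCAP_ASIMD, HWCAP2_SVE2 */
#endif

/* Hypothetical backend identifiers, lowest priority first. */
typedef enum {
    BACKEND_PORTABLE = 0,
    BACKEND_NEON,
    BACKEND_SVE2
} backend_t;

/* Pure priority rule: prefer SVE2, then NEON, then portable C. */
static backend_t pick_backend(int has_sve2, int has_neon)
{
    if (has_sve2)
        return BACKEND_SVE2;
    if (has_neon)
        return BACKEND_NEON;
    return BACKEND_PORTABLE;
}

/* Query the running CPU (Linux/AArch64 only) and apply the rule above. */
static backend_t detect_backend(void)
{
#if defined(__linux__) && defined(__aarch64__)
    unsigned long hw1 = getauxval(AT_HWCAP);
    unsigned long hw2 = getauxval(AT_HWCAP2);
    int has_neon = (hw1 & HWCAP_ASIMD) != 0;
    int has_sve2 = 0;
    (void)hw2;   /* silences a warning if HWCAP2_SVE2 is unavailable */
#ifdef HWCAP2_SVE2
    has_sve2 = (hw2 & HWCAP2_SVE2) != 0;
#endif
    return pick_backend(has_sve2, has_neon);
#else
    /* Non-Linux or non-AArch64 build: no detection in this sketch. */
    return pick_backend(0, 0);
#endif
}
```

Keeping the priority rule in a pure function separate from the hardware query makes the ordering easy to unit-test on any host, while detect_backend() would typically be called once at startup and its result cached.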