BLAKE3
High-Speed Parallel Cryptographic Hash Function
Overview
BLAKE3 is a cryptographic hash function designed by Jack O’Connor, Jean-Philippe Aumasson, Samuel Neves, and Zooko Wilcox-O’Hearn. It is the successor to BLAKE2 and achieves significantly higher throughput by arranging the input in a Merkle tree, which enables parallel computation at every level: SIMD instructions within a single core, multi-threaded hashing across cores, and GPU batch processing.
BLAKE3 supports three modes of operation:
- Hash — Standard cryptographic hash with arbitrary-length output
- Keyed hash — MAC (message authentication code) using a 256-bit key
- KDF — Key derivation function using a context string
The core compression function processes 64-byte blocks using a 16-word state with 7 rounds (compared to 10 in BLAKE2s, from which it is derived). The tree structure splits the input into 1024-byte chunks, compresses each chunk independently, then merges the chunk results pairwise in a binary tree. Chunks do not depend on one another, so they can be processed in parallel.
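The compression step described above can be sketched in portable Rust. The constants (IV, message permutation, rotation amounts) follow the published BLAKE3 reference implementation; the function names are illustrative and are not necessarily the ones used in metamui-crypto-rust.

```rust
/// Initialization constants (the same words as the SHA-256 / BLAKE2s IV).
pub const IV: [u32; 8] = [
    0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A,
    0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19,
];

/// Message-word permutation applied between rounds.
pub const MSG_PERMUTATION: [usize; 16] =
    [2, 6, 3, 10, 7, 0, 4, 13, 1, 11, 12, 5, 9, 14, 15, 8];

/// Quarter-round: mixes four state words with two message words.
fn g(s: &mut [u32; 16], a: usize, b: usize, c: usize, d: usize, mx: u32, my: u32) {
    s[a] = s[a].wrapping_add(s[b]).wrapping_add(mx);
    s[d] = (s[d] ^ s[a]).rotate_right(16);
    s[c] = s[c].wrapping_add(s[d]);
    s[b] = (s[b] ^ s[c]).rotate_right(12);
    s[a] = s[a].wrapping_add(s[b]).wrapping_add(my);
    s[d] = (s[d] ^ s[a]).rotate_right(8);
    s[c] = s[c].wrapping_add(s[d]);
    s[b] = (s[b] ^ s[c]).rotate_right(7);
}

/// One full round: four column mixes, then four diagonal mixes.
fn round(s: &mut [u32; 16], m: &[u32; 16]) {
    g(s, 0, 4, 8, 12, m[0], m[1]);
    g(s, 1, 5, 9, 13, m[2], m[3]);
    g(s, 2, 6, 10, 14, m[4], m[5]);
    g(s, 3, 7, 11, 15, m[6], m[7]);
    g(s, 0, 5, 10, 15, m[8], m[9]);
    g(s, 1, 6, 11, 12, m[10], m[11]);
    g(s, 2, 7, 8, 13, m[12], m[13]);
    g(s, 3, 4, 9, 14, m[14], m[15]);
}

/// Compresses one 64-byte block (given as 16 little-endian words).
/// `cv` is the 8-word input chaining value; the first 8 output words
/// form the next chaining value.
pub fn compress(
    cv: &[u32; 8],
    block: &[u32; 16],
    counter: u64,
    block_len: u32,
    flags: u32,
) -> [u32; 16] {
    let mut s = [
        cv[0], cv[1], cv[2], cv[3], cv[4], cv[5], cv[6], cv[7],
        IV[0], IV[1], IV[2], IV[3],
        counter as u32, (counter >> 32) as u32, block_len, flags,
    ];
    let mut m = *block;
    for r in 0..7 {
        round(&mut s, &m);
        if r < 6 {
            // Permute the message words before the next round.
            let mut p = [0u32; 16];
            for i in 0..16 {
                p[i] = m[MSG_PERMUTATION[i]];
            }
            m = p;
        }
    }
    // Feed-forward: fold the two halves together and mix the input CV back in.
    for i in 0..8 {
        s[i] ^= s[i + 8];
        s[i + 8] ^= cv[i];
    }
    s
}
```

This scalar version processes one block at a time; SIMD back ends run the same arithmetic on several blocks in parallel lanes.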
Specifications
| Property | Value |
|---|---|
| Output size | 256 bits default (arbitrary via XOF) |
| Block size | 64 bytes |
| Chunk size | 1024 bytes (16 blocks) |
| Rounds | 7 per compression |
| State words | 16 x 32-bit |
| Key size (keyed mode) | 256 bits |
| Internal security | 128 bits (birthday bound for 256-bit output) |
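The block and chunk sizes in the table determine how an input is laid out. The following sketch works through the arithmetic; `layout` and `parent_nodes` are hypothetical helper names introduced here for illustration.

```rust
const BLOCK_LEN: usize = 64; // bytes per compression block
const CHUNK_LEN: usize = 1024; // bytes per chunk (16 blocks)

/// Returns (number of chunks, number of blocks in the final chunk).
/// An empty input still occupies one chunk containing one empty block.
pub fn layout(input_len: usize) -> (usize, usize) {
    if input_len == 0 {
        return (1, 1);
    }
    let chunks = (input_len + CHUNK_LEN - 1) / CHUNK_LEN;
    let last_chunk_len = input_len - (chunks - 1) * CHUNK_LEN;
    let last_blocks = (last_chunk_len + BLOCK_LEN - 1) / BLOCK_LEN;
    (chunks, last_blocks)
}

/// A binary tree with `chunks` leaves has `chunks - 1` parent nodes,
/// each costing one extra compression.
pub fn parent_nodes(chunks: usize) -> usize {
    chunks - 1
}
```

For example, a 1 MiB input gives `layout(1 << 20) == (1024, 16)`: 1024 full chunks of 16 blocks each, merged by 1023 parent compressions.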
Modes:
| Mode | Input | Description |
|---|---|---|
| hash(data) | Arbitrary bytes | Standard cryptographic hash |
| keyed_hash(key, data) | 256-bit key + arbitrary bytes | MAC construction |
| derive_key(context, material) | Context string + key material | KDF via context separation |
Security
- Collision resistance: 128 bits (birthday bound on 256-bit output)
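- Preimage resistance: 256 bits against generic attacks on a 256-bit output (the BLAKE3 specification itself targets 128-bit security for every goal)
- Second preimage resistance: 256 bits against generic attacks, with the same 128-bit design target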
- PRF security: Keyed mode provides PRF security under the assumption that the compression function is a PRF
- Domain separation: The three modes (hash, keyed hash, KDF) pass distinct flag bits to every compression call, and the keyed and KDF modes replace the default IV with key words, preventing cross-mode attacks
- No length extension: Unlike SHA-256, BLAKE3’s tree structure and finalization prevent length extension attacks
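Domain separation and the length-extension defence both rely on flag bits that are fed into each compression. The sketch below lists the flag values from the BLAKE3 specification; the `Mode` enum and helper functions are hypothetical names used only for illustration.

```rust
// Flag bits as defined in the BLAKE3 specification.
pub const CHUNK_START: u32 = 1 << 0;
pub const CHUNK_END: u32 = 1 << 1;
pub const PARENT: u32 = 1 << 2;
pub const ROOT: u32 = 1 << 3;
pub const KEYED_HASH: u32 = 1 << 4;
pub const DERIVE_KEY_CONTEXT: u32 = 1 << 5;
pub const DERIVE_KEY_MATERIAL: u32 = 1 << 6;

/// Hypothetical mode selector. The KDF runs in two phases: the context
/// string is hashed first, then the key material is hashed under the result.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    Hash,
    KeyedHash,
    DeriveKeyContext,
    DeriveKeyMaterial,
}

/// Flag bit that a mode adds to every compression it performs.
pub fn mode_flag(mode: Mode) -> u32 {
    match mode {
        Mode::Hash => 0,
        Mode::KeyedHash => KEYED_HASH,
        Mode::DeriveKeyContext => DERIVE_KEY_CONTEXT,
        Mode::DeriveKeyMaterial => DERIVE_KEY_MATERIAL,
    }
}

/// Flags for a message that fits in one block: that block is the first
/// and last of its chunk, and the chunk is also the root of the tree.
pub fn single_block_root_flags(mode: Mode) -> u32 {
    mode_flag(mode) | CHUNK_START | CHUNK_END | ROOT
}
```

Because `ROOT` is set only on the final compression, the published digest differs from every internal chaining value, so an attacker cannot resume hashing from it.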
Hardware Acceleration
BLAKE3 has the broadest hardware acceleration coverage in the MetaMUI suite, spanning CPU SIMD, GPU compute, and WebAssembly SIMD.
CPU SIMD
| Acceleration | Parallelism | Description |
|---|---|---|
| AVX-512 | 16-block batch | Compresses 16 blocks simultaneously using 512-bit registers |
| AVX-2 | 8-block batch | Compresses 8 blocks simultaneously using 256-bit registers |
| NEON | Block compression | SIMD-accelerated compression function on ARM |
| WASM SIMD128 | Block compression | 128-bit SIMD in WebAssembly environments |
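A library that ships several CPU back ends typically picks one at runtime. The following is a minimal sketch of such dispatch using the standard library's feature-detection macros; the `Backend` enum and `detect_backend` function are assumptions for illustration and do not describe the actual MetaMUI dispatch code. The batch widths mirror the table above.

```rust
/// Hypothetical set of CPU back ends, mirroring the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Avx512,
    Avx2,
    Neon,
    Portable,
}

impl Backend {
    /// Number of blocks compressed per call, per the table above.
    pub fn batch_width(self) -> usize {
        match self {
            Backend::Avx512 => 16,
            Backend::Avx2 => 8,
            Backend::Neon | Backend::Portable => 1,
        }
    }
}

/// Picks the widest back end the running CPU supports.
pub fn detect_backend() -> Backend {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx512f")
            && std::is_x86_feature_detected!("avx512vl")
        {
            return Backend::Avx512;
        }
        if std::is_x86_feature_detected!("avx2") {
            return Backend::Avx2;
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return Backend::Neon;
        }
    }
    // Fallback that works on every target.
    Backend::Portable
}
```

Detecting once and caching the result keeps the check off the hot path; WebAssembly SIMD128 is normally selected at compile time instead, since it cannot be probed at runtime in the same way.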
GPU Compute
| Acceleration | Target | Operations |
|---|---|---|
| Apple Metal | macOS/iOS GPU | compress_blocks_simd — batch block compression; process_chunks_tile — parallel chunk processing; tree_merge — Merkle tree merge |
| CUDA | NVIDIA GPU | Batch hashing with multiple CUDA streams for throughput |
Parallelism Model
BLAKE3’s tree structure enables parallelism at three levels:
- Intra-block: SIMD instructions parallelize the quarter-round operations within a single compression
- Inter-chunk: Independent 1024-byte chunks can be compressed on separate threads or GPU work items
- Tree merge: Parent nodes at the same level of the Merkle tree are independent of one another and can be computed in parallel; each level depends only on the level below it
This makes BLAKE3 particularly well-suited for hashing large inputs (files, streams) and for batch hashing many small inputs (e.g., verifying a set of transaction hashes).
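The inter-chunk and tree-merge levels can be illustrated with a toy model. The sketch below hashes chunks on separate threads and merges them using the BLAKE3 tree shape, where the left subtree always holds the largest power of two of chunks that is strictly smaller than the total. `toy_leaf` and `toy_parent` are placeholders built on the standard library's non-cryptographic `DefaultHasher`; they stand in for the real chunk and parent compressions and produce no BLAKE3 output.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
use std::thread;

const CHUNK_LEN: usize = 1024;

/// Placeholder for chunk compression (NOT BLAKE3, not cryptographic).
fn toy_leaf(chunk: &[u8], index: u64) -> u64 {
    let mut h = DefaultHasher::new();
    h.write_u64(index);
    h.write(chunk);
    h.finish()
}

/// Placeholder for parent-node compression (NOT BLAKE3).
fn toy_parent(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    h.write_u64(left);
    h.write_u64(right);
    h.finish()
}

/// Splits the input into 1024-byte chunks; empty input is one empty chunk.
fn split_chunks(input: &[u8]) -> Vec<&[u8]> {
    if input.is_empty() {
        vec![input]
    } else {
        input.chunks(CHUNK_LEN).collect()
    }
}

/// Leaves in the left subtree: the largest power of two strictly below `n`.
pub fn left_leaves(n: usize) -> usize {
    debug_assert!(n >= 2);
    1usize << (usize::BITS - 1 - (n - 1).leading_zeros())
}

/// Recursively merges leaf values into a single root value.
fn merge(leaves: &[u64]) -> u64 {
    if leaves.len() == 1 {
        return leaves[0];
    }
    let (left, right) = leaves.split_at(left_leaves(leaves.len()));
    toy_parent(merge(left), merge(right))
}

pub fn root_sequential(input: &[u8]) -> u64 {
    let leaves: Vec<u64> = split_chunks(input)
        .into_iter()
        .enumerate()
        .map(|(i, c)| toy_leaf(c, i as u64))
        .collect();
    merge(&leaves)
}

/// Same result, but each chunk is hashed on its own scoped thread.
/// One thread per chunk is for illustration only; a real implementation
/// would use a thread pool and SIMD batches.
pub fn root_parallel(input: &[u8]) -> u64 {
    let leaves: Vec<u64> = thread::scope(|s| {
        let handles: Vec<_> = split_chunks(input)
            .into_iter()
            .enumerate()
            .map(|(i, c)| s.spawn(move || toy_leaf(c, i as u64)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    merge(&leaves)
}
```

Both functions return the same root because each leaf depends only on its own chunk and index. The merge here runs sequentially; the two recursive calls are independent and could be parallelized in the same way.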
Platform Support
BLAKE3 is implemented across all 10 platforms with SIMD acceleration where available:
| Platform | Language | SIMD Support | Implementation Path |
|---|---|---|---|
| Native | C | AVX-512, AVX-2 | metamui-crypto-c/ |
| Systems | Rust | AVX-512, AVX-2, NEON | metamui-crypto-rust/ |
| Backend | Go | Portable + assembly | metamui-crypto-go/ |
| Data Science | Python | Via C bindings | metamui-crypto-python/ |
| JVM | Java | Portable | metamui-crypto-java/ |
| JVM/Android | Kotlin | Portable | metamui-crypto-kotlin/ |
| .NET | C# | Portable | metamui-crypto-csharp/ |
| Apple | Swift | NEON, Metal | metamui-crypto-swift/ |
| Web | TypeScript | WASM SIMD128 | metamui-crypto-typescript/ |
| Browser/Edge | WASM | SIMD128 | metamui-crypto-wasm/ |
API Example
fn main() {
    // Standard hash
    let hash = blake3::hash(b"input data");
    // Keyed hash (MAC)
    let key: [u8; 32] = [0x42; 32]; // placeholder only; use a secret, random 256-bit key
    let mac = blake3::keyed_hash(&key, b"authenticated data");
    // Key derivation
    let key_material: &[u8] = b"example input key material"; // placeholder only
    let derived_key = blake3::derive_key("metamui-crypto 2024 session key", key_material);
    // Incremental hashing (streaming)
    let mut hasher = blake3::Hasher::new();
    hasher.update(b"first chunk");
    hasher.update(b"second chunk");
    let hash = hasher.finalize();
    // Extended output (XOF): read 64 bytes instead of the default 32
    let mut output = [0u8; 64];
    hasher.finalize_xof().fill(&mut output);
}
Test Vectors
- Location: test-vectors/blake3_vectors.json
- Coverage: Hash, keyed hash, and KDF modes with various input lengths
- Source: Official BLAKE3 test vectors from the reference implementation
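The upstream reference vectors do not store their inputs; each case records only an input length, and the input is regenerated as a repeating byte pattern. The sketch below reproduces that pattern together with the fixed key and context string used upstream. Whether blake3_vectors.json keeps the same layout is an assumption, and the names `test_input`, `TEST_KEY`, and `TEST_CONTEXT` are illustrative.

```rust
/// Upstream test inputs: byte `i` of the input is `i mod 251`.
pub fn test_input(len: usize) -> Vec<u8> {
    (0..len).map(|i| (i % 251) as u8).collect()
}

/// 32-byte key used by the upstream keyed-hash vectors.
pub const TEST_KEY: &[u8; 32] = b"whats the Elvish word for friend";

/// Context string used by the upstream key-derivation vectors.
pub const TEST_CONTEXT: &str = "BLAKE3 2019-12-27 16:29:52 test vectors context";
```

A harness would call `test_input(n)` for each recorded length and compare the digest from each mode against the stored hex value.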
References
- BLAKE3 Specification — O’Connor, J., Aumasson, J.-P., Neves, S., Wilcox-O’Hearn, Z. BLAKE3: One function, fast everywhere. https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf
- BLAKE3 Reference Implementation — https://github.com/BLAKE3-team/BLAKE3
- blake3.io — Official website. https://blake3.io/