fractalyze · ZKX compiler

ZKX + PrimeIR: compiler stack for real-time proving.

ZKX is our proof compiler, and PrimeIR is the optimization layer beneath it. Instead of hand-tuning each prover backend, this stack compiles proving workloads into optimized primitive kernels across schemes and hardware targets.

The benchmark sections below show the kernel-level gains (FFT, IFFT, MSM, SMCS, LOGUP GKR) and the application-level outcomes (Groth16 prove, zkVM block proof). The payment gateway demo shows what this means in product terms: proof generation fast enough for real user-facing flows.

Frontends: circom (arithmetic circuit) · zkVM (RISC-V trace) · custom circuit (user-defined)

ZKX (open-zkx)

PrimeIR (MLIR dialect): algebraic rewrite · kernel fusion · layout assignment · lowering

Backends: CPU (AVX-512) · GPU (CUDA) · ZK ASIC (planned)

How it works

An XLA for ZK.

ZK proving today looks like ML did around 2015 — every scheme has its own hand-tuned prover, and every new (scheme × hardware) combination is a fresh manual optimization round.

ML solved this with XLA. Instead of relying only on hand-tuned CUDA kernels, TensorFlow, JAX, and PyTorch (via PyTorch/XLA) can emit an IR that XLA lowers to whatever hardware target you bring. XLA-compiled code on a GPU can beat hand-written CUDA C++ because the compiler sees the whole computation graph, while the manual writer sees only one kernel at a time.

ZKX takes the same path for ZK — and it's not a loose analogy. We build on the same IR infrastructure (StableHLO + HLO) and add a ZK-specific dialect (PrimeIR) on top. Given any proving scheme, ZKX produces optimized proof generation through just-in-time compilation and cluster-level optimization, targeting CPU, GPU, and ASIC from a single codebase.

Both domains hit the same wall: ZK proving and ML training are both memory-bound on modern GPUs. The compiler optimizations that won for XLA — fusion, layout, scheduling — are the same ones that move the needle here.
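To make the fusion argument concrete, here is a minimal sketch in plain Python. It is not ZKX or PrimeIR code: the modulus, the operation chain, and the function names are all assumptions chosen for illustration. It counts element-sized memory accesses for a chain of elementwise field operations run as three separate kernels versus one fused kernel.

```python
# Toy model of kernel fusion over a prime field.
# Everything here (modulus, op chain, names) is hypothetical, not ZKX's API.
P = 2**31 - 1  # arbitrary prime modulus, chosen only for illustration


def unfused(a, b, c):
    """Compute (a*b + c)^2 mod P as three passes, each materializing a full array."""
    n = len(a)
    traffic = 0
    t1 = [(x * y) % P for x, y in zip(a, b)]   # kernel 1: read a, b; write t1
    traffic += 3 * n
    t2 = [(x + z) % P for x, z in zip(t1, c)]  # kernel 2: read t1, c; write t2
    traffic += 3 * n
    out = [(x * x) % P for x in t2]            # kernel 3: read t2; write out
    traffic += 2 * n
    return out, traffic


def fused(a, b, c):
    """Same computation in one pass; intermediates never touch main memory."""
    out = [pow((x * y + z) % P, 2, P) for x, y, z in zip(a, b, c)]
    traffic = 4 * len(a)                       # read a, b, c; write out
    return out, traffic


n = 1 << 10
a = [(3 * i + 1) % P for i in range(n)]
b = [(5 * i + 2) % P for i in range(n)]
c = [(7 * i + 3) % P for i in range(n)]

out_u, traffic_u = unfused(a, b, c)
out_f, traffic_f = fused(a, b, c)
print(out_u == out_f, traffic_u, traffic_f)
```

Under this simplified model the fused version produces the same result with half the memory accesses (4n instead of 8n). On a memory-bound workload that ratio, rather than the arithmetic count, is what bounds the achievable speedup.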

Aspect	ZK today	ZK with ZKX
frontends	Per scheme, isolated: Groth16, zkVM, circom each ship their own prover	Unified ingest: Groth16, zkVM, circom feed the same pipeline
IR	None: each prover has its own internals	PrimeIR · StableHLO · HLO
compiler	None: kernels are written by hand	ZKX: whole-graph fusion, layout, scheduling
optimization cost	N × M hand-tuned implementations (per scheme × hardware)	N + M: add a frontend or backend, the cross is automated
backends	Per-prover (Gnark → CPU, ICICLE → GPU, SP1 → GPU, …)	CPU · GPU · ASIC (one compiler, any target)
memory bound	Tuned by hand, kernel by kernel	Tuned by the compiler, end-to-end

ML hit the same pattern around 2017 — XLA solved it there.
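The N × M versus N + M claim is simple arithmetic, sketched below. The frontend and backend names come from the comparison above; the `cost` helper is a hypothetical illustration, not part of ZKX.

```python
# Count what has to be built by hand with and without a shared compiler.
frontends = ["Groth16", "zkVM", "circom"]
backends = ["CPU", "GPU", "ASIC"]


def cost(n_frontends, n_backends):
    """Return (hand-tuned implementations, compiler components)."""
    return n_frontends * n_backends, n_frontends + n_backends


today, with_compiler = cost(len(frontends), len(backends))
print(today, with_compiler)

# Adding one more backend: every frontend needs a new hand-tuned port today,
# while a shared compiler needs only the one new backend.
today_next, with_compiler_next = cost(len(frontends), len(backends) + 1)
print(today_next - today, with_compiler_next - with_compiler)
```

With three frontends and three backends that is 9 hand-tuned implementations against 6 components, and the gap widens with every target added.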

Performance

Primitive kernel benchmarks

Backend kernel improvements from ZKX + PrimeIR, measured on FFT, IFFT, MSM, SMCS, and LOGUP GKR.

Best zkx result · vs Gnark baseline

rabbitsnark-py · bn254 · gpu · degree 22

Gnark baseline: 28.334 ms

zkx: 1.997 ms

14.2× faster

Scheme comparison

Primitive kernels

5 schemes listed

Family	Setup	Baseline	zkx	Speedup
FFT	bn254 · gpu · d22 · Gnark	28.334 ms	1.997 ms	14.2×
IFFT	bn254 · gpu · d22 · Gnark	28.835 ms	2.066 ms	14.0×
MSM	bn254 · gpu · d22 · Gnark	56.380 ms	29.835 ms	1.9×
SMCS	koalabear · gpu · d20 · SP1	1.736 ms	1.321 ms	1.3×
LOGUP GKR	koalabear · gpu · d22 · SP1	108.394 ms	38.438 ms	2.8×
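The speedup column is the baseline time divided by the zkx time, rounded to one decimal place. A short check using the figures from the table above (the `rows` layout and the `speedup` helper are just one way to write it):

```python
# (family, baseline ms, zkx ms, reported speedup), copied from the table.
rows = [
    ("FFT", 28.334, 1.997, 14.2),
    ("IFFT", 28.835, 2.066, 14.0),
    ("MSM", 56.380, 29.835, 1.9),
    ("SMCS", 1.736, 1.321, 1.3),
    ("LOGUP GKR", 108.394, 38.438, 2.8),
]


def speedup(baseline_ms, zkx_ms):
    """Speedup factor: how many times faster zkx is than the baseline."""
    return baseline_ms / zkx_ms


computed = {name: round(speedup(base, zkx), 1) for name, base, zkx, _ in rows}
reported = {name: rep for name, _, _, rep in rows}
print(computed == reported)
```

All five reported speedups agree with the timings in the table.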

Applications

We beat the industry's fastest provers.

Two end-to-end workloads drive most production GPU prover cost: Groth16 STARK→SNARK wrapping and zkVM block proving. ZKX is faster than the prover currently regarded as state of the art in each category, on the same workload and the same hardware, with no protocol changes.

Groth16 prove

vs Gnark + ICICLE

~2×

faster

Production GPU Groth16 stack — Gnark (Consensys) wrapping Ingonyama's ICICLE GPU primitives. What every SP1 mainnet deployment uses today for the STARK→SNARK wrapping step.

Gnark + ICICLE: 4.90 s

zkx: 2.49 s

↳ 2.41 s saved

SP1 STARK verifier in Groth16 · ~5–6M constraints · BN254 · RTX 5090

"ICICLE-snark is now the fastest Groth16 prover implementation"
— Ingonyama, Mar 2025

zkVM block proof

vs SP1 Hypercube

~1.5×

faster

Succinct Labs' current production prover for SP1 — the published frontier for end-to-end zkVM block proof latency. What every Succinct mainnet deployment runs today to generate Ethereum block proofs.

SP1 Hypercube: 10.30 s

zkx: 7.00 s

↳ 3.30 s saved

Ethereum mainnet block proving · target block 22,309,250 · RTX 5090

Live demo