fractalyze · ZKX compiler

ZKX + PrimeIR: compiler stack for real-time proving.

ZKX is our proof compiler, and PrimeIR is the optimization layer beneath it. Instead of hand-tuning each prover backend, this stack compiles proving workloads into optimized primitive kernels across schemes and hardware targets.

The benchmark sections below show the kernel-level gains (FFT, IFFT, MSM, SMCS, LOGUP GKR) and the application-level outcomes (Groth16 prove, zkVM block proof). The payment gateway demo shows what this means in product terms: proof generation fast enough for real user-facing flows.

frontends: circom (arithmetic circuit) · zkVM (RISC-V trace) · custom circuit (user-defined)
ZKX (open-zkx)
PrimeIR · MLIR dialect: algebraic rewrite · kernel fusion · layout assignment · lowering
backends: CPU (AVX-512) · GPU (CUDA) · ZK ASIC (planned)
How it works

An XLA for ZK.

ZK proving today looks like ML did around 2015 — every scheme has its own hand-tuned prover, and every new (scheme × hardware) combination is a fresh manual optimization round.

ML solved this with XLA. TensorFlow, PyTorch, and JAX stopped emitting hand-tuned CUDA kernels and started emitting an IR that XLA lowers to whatever hardware target you bring. PyTorch + XLA on a GPU often beats hand-written CUDA C++ — the compiler sees the whole computation graph; the manual writer only sees one kernel at a time.

ZKX takes the same path for ZK — and it's not a loose analogy. We build on the same IR infrastructure (StableHLO + HLO) and add a ZK-specific dialect (PrimeIR) on top. Given any proving scheme, ZKX produces optimized proof generation through just-in-time compilation and cluster-level optimization, targeting CPU, GPU, and ASIC from a single codebase.

Both domains hit the same wall: ZK proving and ML training are both memory-bound on modern GPUs. The compiler optimizations that won for XLA — fusion, layout, scheduling — are the same ones that move the needle here.
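To make the fusion point concrete, here is a minimal sketch of why fusing two element-wise field operations reduces memory traffic. It is an illustrative model only, not ZKX's or PrimeIR's actual fusion pass: the function names, the traffic counter, and the choice of the KoalaBear prime as modulus are assumptions made for the example.

```python
# Toy model: count element reads/writes for a multiply-then-add over a prime field.
# P is assumed to be the KoalaBear prime (2^31 - 2^24 + 1); any prime would do here.
P = 2130706433


def unfused(a, b, c, p=P):
    """Two separate kernels: tmp = a * b, then out = tmp + c."""
    traffic = 0
    tmp = []
    for x, y in zip(a, b):          # kernel 1: read a, read b, write tmp
        tmp.append(x * y % p)
        traffic += 3
    out = []
    for t, z in zip(tmp, c):        # kernel 2: read tmp, read c, write out
        out.append((t + z) % p)
        traffic += 3
    return out, traffic


def fused(a, b, c, p=P):
    """One fused kernel: out = a * b + c, with no intermediate buffer."""
    traffic = 0
    out = []
    for x, y, z in zip(a, b, c):    # read a, read b, read c, write out
        out.append((x * y + z) % p)
        traffic += 4
    return out, traffic


if __name__ == "__main__":
    a, b, c = [1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]
    print(unfused(a, b, c))
    print(fused(a, b, c))
```

Both versions compute the same values, but the fused one moves 4 elements per output instead of 6 because the intermediate buffer never touches memory. On a memory-bound workload that ratio, rather than the arithmetic count, is what drives runtime.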

ZK today vs. ZK with ZKX

frontends
  today: Per scheme, isolated. Groth16, zkVM, circom each ship their own prover.
  with ZKX: Unified ingest. Groth16, zkVM, circom feed the same pipeline.
IR
  today: None. Each prover has its own internals.
  with ZKX: PrimeIR · StableHLO · HLO
compiler
  today: None. Kernels are written by hand.
  with ZKX: ZKX, with whole-graph fusion, layout, scheduling.
optimization cost
  today: N × M hand-tuned implementations (per scheme × hardware)
  with ZKX: N + M. Add a frontend or backend, the cross is automated.
backends
  today: Per-prover (Gnark → CPU, ICICLE → GPU, SP1 → GPU, …)
  with ZKX: CPU · GPU · ASIC (one compiler, any target)
memory bound
  today: tuned by hand, kernel by kernel
  with ZKX: tuned by the compiler, end-to-end
ML hit the same pattern around 2017 — XLA solved it there.
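The N × M versus N + M claim is simple arithmetic, sketched below. The frontend and backend names come from the comparison above; the function names are hypothetical and exist only for this illustration.

```python
from itertools import product

# Names taken from the comparison above; the lists themselves are illustrative.
frontends = ["Groth16", "zkVM", "circom"]
backends = ["CPU", "GPU", "ASIC"]


def hand_tuned_cost(fronts, backs):
    """One hand-tuned implementation per (scheme, hardware) pair: N * M."""
    return len(list(product(fronts, backs)))


def compiler_cost(fronts, backs):
    """One frontend per scheme plus one backend per target: N + M."""
    return len(fronts) + len(backs)


if __name__ == "__main__":
    print(hand_tuned_cost(frontends, backends), compiler_cost(frontends, backends))
    # Adding one more scheme costs M new implementations by hand, but 1 with a shared IR.
    more = frontends + ["custom circuit"]
    print(hand_tuned_cost(more, backends), compiler_cost(more, backends))
```

With three schemes and three targets the gap is 9 versus 6. It widens with every scheme or target added, since the hand-tuned cost grows multiplicatively.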
Performance

Primitive kernel benchmarks

Backend kernel improvements from ZKX + PrimeIR, measured on FFT, IFFT, MSM, SMCS, and LOGUP GKR.
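For readers unfamiliar with the FFT and IFFT primitives in a proving context, the following is a minimal reference sketch of a radix-2 number-theoretic transform over a prime field. It is a textbook recursive Cooley-Tukey formulation, not the ZKX kernel. The modulus is assumed to be the KoalaBear prime (2^31 - 2^24 + 1), and all function names are hypothetical.

```python
# Reference NTT (FFT over a prime field). Requires Python 3.8+ for pow(x, -1, p).
P = 2130706433  # assumed KoalaBear prime: 2^31 - 2^24 + 1


def root_of_unity(n, p=P):
    """Find a primitive n-th root of unity mod p, for n a power of two dividing p - 1."""
    assert n >= 1 and n & (n - 1) == 0 and (p - 1) % n == 0
    if n == 1:
        return 1
    for x in range(2, p):
        w = pow(x, (p - 1) // n, p)      # w^n == 1 by Fermat's little theorem
        if pow(w, n // 2, p) != 1:       # order does not divide n/2, so it is exactly n
            return w
    raise ValueError("no primitive root found")


def ntt(a, w, p=P):
    """Evaluate the polynomial with coefficients a at powers of w (len(a) a power of two)."""
    n = len(a)
    if n == 1:
        return list(a)
    w2 = w * w % p
    even = ntt(a[0::2], w2, p)
    odd = ntt(a[1::2], w2, p)
    out = [0] * n
    t = 1
    for k in range(n // 2):              # butterfly step
        v = t * odd[k] % p
        out[k] = (even[k] + v) % p
        out[k + n // 2] = (even[k] - v) % p
        t = t * w % p
    return out


def intt(a, w, p=P):
    """Inverse transform: run the NTT with w^-1 and scale by n^-1."""
    n_inv = pow(len(a), -1, p)
    return [x * n_inv % p for x in ntt(a, pow(w, -1, p), p)]


if __name__ == "__main__":
    coeffs = [1, 2, 3, 4, 5, 6, 7, 8]
    w = root_of_unity(len(coeffs))
    evals = ntt(coeffs, w)
    print(intt(evals, w) == coeffs)
```

The benchmarks measure transforms of degree 2^20 to 2^22, where the strided accesses between butterfly stages dominate cost. That is why layout and fusion decisions matter more than the modular arithmetic itself.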

Primitive kernels

Primitive · vs Gnark baseline

rabbitsnark-py · bn254 · gpu · degree 22

Gnark baseline: 28.334 ms
zkx: 1.997 ms
14.2× faster
Scheme comparison
Primitive kernels

Family      Setup                          Baseline      zkx          Speedup
FFT         bn254 · gpu · d22 · Gnark      28.334 ms     1.997 ms     14.2×
IFFT        bn254 · gpu · d22 · Gnark      28.835 ms     2.066 ms     14.0×
MSM         bn254 · gpu · d22 · Gnark      56.380 ms     29.835 ms    1.9×
SMCS        koalabear · gpu · d20 · SP1    1.736 ms      1.321 ms     1.3×
LOGUP GKR   koalabear · gpu · d22 · SP1    108.394 ms    38.438 ms    2.8×
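The speedup column is the baseline time divided by the zkx time. The short script below recomputes it from the timings listed above. The timings are copied from the table; the variable and function names are just for this sketch.

```python
# (family, baseline_ms, zkx_ms) copied from the primitive kernel table.
ROWS = [
    ("FFT", 28.334, 1.997),
    ("IFFT", 28.835, 2.066),
    ("MSM", 56.380, 29.835),
    ("SMCS", 1.736, 1.321),
    ("LOGUP GKR", 108.394, 38.438),
]


def speedup(baseline_ms, zkx_ms):
    """Speedup factor: how many times faster zkx is than the baseline."""
    return baseline_ms / zkx_ms


if __name__ == "__main__":
    for family, baseline_ms, zkx_ms in ROWS:
        print(f"{family:<10} {speedup(baseline_ms, zkx_ms):.1f}x")
```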
Applications

We beat the industry's fastest provers.

Two end-to-end workloads drive most production GPU prover cost: Groth16 STARK→SNARK wrapping and zkVM block proving. ZKX is faster than the prover the ZK industry currently calls fastest in each category, on the same workload and the same hardware, with no protocol changes.

Groth16 prove · vs Gnark + ICICLE
~2× faster

Production GPU Groth16 stack: Gnark (Consensys) wrapping Ingonyama's ICICLE GPU primitives. Every SP1 mainnet deployment uses it today for the STARK→SNARK wrapping step.

Gnark + ICICLE: 4.90 s
zkx: 2.49 s
2.41 s saved
SP1 STARK verifier in Groth16 · ~5–6M constraints · BN254 · RTX 5090
"ICICLE-snark is now the fastest Groth16 prover implementation"
zkVM block proof · vs SP1 Hypercube
~1.5× faster

Succinct Labs' current production prover for SP1, and the published frontier for end-to-end zkVM block proof latency. Every Succinct mainnet deployment runs it today to generate Ethereum block proofs.

SP1 Hypercube: 10.30 s
zkx: 7.00 s
3.30 s saved
Ethereum mainnet block proving · target block 22,309,250 · RTX 5090
Live demo

Bridge real-world events on-chain in real time.

Check how fast a real-world activity can land on-chain. Star a GitHub repo → ZKX generates the proof → Solana settles the payout, all in seconds.

Try it · live on devnet · POST /api/claim

A Solana program that gates payments on a ZK proof of an off-chain attestation. AI-agent payouts, verified on-chain.

Flow (off-chain → on-chain):
1. You star the repo (fractalyze/zkx-live).
2. ZKX makes a proof of your action.
3. The proof goes on-chain and the gateway verifies it.
4. You get paid 0.01 SOL.
Certified by

Backed by top-tier programs.

Ethereum Foundation
Ethereum Foundation grant recipient
NVIDIA Inception
NVIDIA Inception Program member