June 20, 2026 (updated June 21, 2026)

NeuroCUDA Launch: Inside the Benchmarks Behind a One-Line Neuromorphic Compiler

A two-line PyTorch script and a 94.49% accuracy number do not, by themselves, prove a compiler works. Here is the benchmark methodology behind NeuroCUDA's launch numbers, what each result actually measures, and why the way they were measured matters as much as the numbers themselves.

The pitch for NeuroCUDA fits in two lines:

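```python
import torch
import neurocuda
model = torch.load("resnet18_cifar10.pth", weights_only=False)  # full pickled nn.Module; PyTorch 2.6+ defaults to weights_only=True
snn = neurocuda.compile(model, target="loihi3")                  # convert the trained ANN into a spiking network for the target
```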

A two-line claim like that is easy to make and easy to disbelieve. Compiler claims fail in the gap between "it ran" and "it ran correctly," and neuromorphic compilation has an unusually wide gap. Converting a standard artificial neural network (ANN) into a spiking neural network (SNN) is a lossy transformation by default: continuous activations become discrete spike trains, and some information is typically lost in that translation. The interesting questions for any ANN-to-SNN compiler are not whether it runs, but how much accuracy survives the conversion, and whether the generated model can be checked against an independent reference. This post walks through how each of NeuroCUDA's launch benchmarks was actually measured.

The conversion gap: 95.44% to 94.49%

The baseline model is a ResNet-18 trained conventionally on CIFAR-10, reaching 95.44% test accuracy as a standard ANN. After NeuroCUDA converts it to a spiking equivalent (integrate-and-fire neurons replace the activation functions, and threshold voltages and refractory periods are configured automatically), the SNN reaches 94.49% accuracy at 32 simulation time steps.
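How the thresholds are "configured automatically" is not spelled out here. One common approach in the ANN-to-SNN conversion literature is data-based threshold balancing: record each layer's ReLU activations on calibration data, then set that layer's firing threshold to a high percentile of them. The sketch below shows that technique under that assumption only; it is not necessarily what NeuroCUDA does, and `balance_threshold` is a hypothetical helper.

```python
import math


def balance_threshold(activations: list[float], percentile: float = 99.9) -> float:
    """Choose a firing threshold for one layer from its recorded ANN activations.

    Hypothetical helper: uses a nearest-rank percentile of the positive (post-ReLU)
    activations instead of the raw maximum, so a few outliers do not inflate it.
    """
    positive = sorted(a for a in activations if a > 0)
    if not positive:
        return 1.0  # layer never activated on the calibration data; fall back to 1.0
    # Nearest-rank percentile; round() guards against float noise such as 999.0000000001.
    rank = max(1, math.ceil(round(percentile * len(positive) / 100, 9)))
    return positive[rank - 1]


# Calibration example: 999 ordinary activations plus one large outlier.
calibration = [0.5] * 999 + [50.0]
print(balance_threshold(calibration, percentile=100.0))  # the raw maximum, 50.0
print(balance_threshold(calibration))                    # 99.9th percentile, outlier ignored
```

Using a percentile instead of the maximum trades a little clipping of rare large activations for finer effective resolution on typical ones, which matters more when only 32 time steps are available to represent each activation.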

That is a conversion gap of 0.95 percentage points. For context, ANN-to-SNN conversion research on residual architectures like ResNet has historically reported gaps in the 1-3 percentage point range, with some approaches losing significantly more accuracy on deeper networks, where activations are harder to discretize without loss. A sub-1-point gap on a residual architecture, measured at a relatively low 32 time steps rather than the 100+ steps some conversion methods require to close the gap, is the headline technical result.
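Why the number of time steps matters can be seen in a minimal sketch, assuming rate coding with integrate-and-fire neurons that reset by subtraction and a threshold of 1.0. It omits refractory periods and is not NeuroCUDA's implementation. Driven by a constant input, such a neuron's firing rate over T steps approximates a clipped ReLU, with a per-neuron quantization error below 1/T, which is one source of the conversion gap.

```python
def if_firing_rate(x: float, threshold: float = 1.0, timesteps: int = 32) -> float:
    """Firing rate of one integrate-and-fire neuron driven by a constant input x.

    Assumes reset-by-subtraction and no refractory period.
    """
    v = 0.0      # membrane potential
    spikes = 0
    for _ in range(timesteps):
        v += x                  # integrate the constant input
        if v >= threshold:      # fire once the threshold is reached...
            spikes += 1
            v -= threshold      # ...and subtract the threshold rather than resetting to zero
    return spikes / timesteps


def clipped_relu(x: float, threshold: float = 1.0) -> float:
    """The ANN activation the firing rate approximates: clip(ReLU(x) / threshold, 0, 1)."""
    return min(max(x / threshold, 0.0), 1.0)


inputs = [k / 256 for k in range(-64, 320)]
for T in (8, 32, 128):
    worst = max(abs(if_firing_rate(x, timesteps=T) - clipped_relu(x)) for x in inputs)
    print(f"T={T:4d}  worst-case rate error={worst:.4f}  bound 1/T={1 / T:.4f}")
```

The worst-case error shrinks roughly in proportion to 1/T, which is why conversion methods that rely purely on spike counts tend to need many steps, and why a small gap at 32 steps is notable.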

What matters isn't the 94.49% on its own, but that it was achieved without needing hundreds of simulation time steps.

The real differentiator: a NIR executor that handles residual graphs

An accuracy number shows the conversion math is sound. It does not show that NeuroCUDA's export path does something other tools in this space cannot. NIR (Neuromorphic Intermediate Representation) is the published, vendor-neutral graph format the field is consolidating around, and NeuroCUDA exports to it directly rather than inventing a competing format. But the reference NIR execution path is built for simple feed-forward graphs, not for graphs with residual connections, where multiple inputs sum into a single node, as ResNet's skip connections do.

NeuroCUDA's executor builds its own execution plan, using Kahn's algorithm for topological sorting, and explicitly sums the inputs of multi-input nodes when it encounters them. Run end to end on a full ResNet-18 (write the NIR graph, read it back, execute it, and compare the outputs against the original model), it produces a maximum absolute difference of 0.000000: a bit-exact round trip on a residual architecture. That capability, not a hardware-vendor validation claim, is what can be verified directly in nir_executor.py in the public repository.
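The scheduling idea can be sketched in plain Python: order the nodes with Kahn's algorithm, then run each node once, summing the outputs of all its predecessors when it has more than one. Scalars stand in for tensors, and the node names, `kahn_order`, and `execute` are hypothetical; this is not the code in nir_executor.py or the API of any NIR package.

```python
from collections import deque
from typing import Callable


def kahn_order(nodes: list[str], edges: list[tuple[str, str]]) -> list[str]:
    """Topologically sort a DAG with Kahn's algorithm; raise if the graph has a cycle."""
    indegree = {n: 0 for n in nodes}
    successors: dict[str, list[str]] = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    ready = deque(n for n in nodes if indegree[n] == 0)  # nodes with no pending inputs
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:  # every input of nxt is now scheduled
                ready.append(nxt)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order


def execute(ops: dict[str, Callable[[float], float]], edges: list[tuple[str, str]],
            inputs: dict[str, float]) -> dict[str, float]:
    """Run every node once in topological order, summing inputs of multi-input nodes."""
    predecessors: dict[str, list[str]] = {n: [] for n in ops}
    for src, dst in edges:
        predecessors[dst].append(src)
    outputs: dict[str, float] = {}
    for node in kahn_order(list(ops), edges):
        preds = predecessors[node]
        x = sum(outputs[p] for p in preds) if preds else inputs[node]
        outputs[node] = ops[node](x)
    return outputs


# One residual block, deliberately listed out of order: output = relu(2 * x) + x
ops = {
    "add": lambda x: x,             # sum node: its inputs are summed before this op runs
    "output": lambda x: x,
    "relu": lambda x: max(x, 0.0),
    "conv": lambda x: 2.0 * x,      # stand-in for a convolution
    "input": lambda x: x,
}
edges = [("input", "conv"), ("conv", "relu"), ("relu", "add"), ("input", "add"), ("add", "output")]
print(execute(ops, edges, {"input": 3.0})["output"])  # → 9.0
```

Because a node is scheduled only after all of its inputs are available, the out-of-order listing still runs correctly; a naive executor that walked `ops` in insertion order would reach "add" before "relu" had produced a value.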

Separately, NeuroCUDA's integrate-and-fire neuron math was checked against Loihi 2's published neuron equations, which were reimplemented from the spec rather than run through Intel's Lava SDK or on physical silicon. Across 100,000+ synthetic comparisons, there were zero deviations. That result shows the neuron math matches the published spec. It is not a hardware or vendor-SDK validation, and the site won't describe it as one.
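The general shape of such a spec-conformance check is differential testing: run a reference transcribed from the spec and the implementation under test on the same randomly generated parameters and inputs, and count any step where they disagree. Loihi 2's actual equations are not reproduced here; the fixed-point current-based LIF update below, `DECAY_SCALE`, and all function names are hypothetical, chosen only to illustrate the method.

```python
import random

DECAY_SCALE = 4096  # hypothetical 12-bit fixed-point decay scale


def reference_step(u: int, v: int, inp: int, du: int, dv: int,
                   bias: int, vth: int) -> tuple[int, int, bool]:
    """One time step of a hypothetical current-based LIF neuron, written to mirror a spec."""
    u = u * (DECAY_SCALE - du) // DECAY_SCALE + inp       # decay synaptic current, add input
    v = v * (DECAY_SCALE - dv) // DECAY_SCALE + u + bias  # decay voltage, integrate current
    spiked = v > vth
    if spiked:
        v = 0                                             # hard reset after a spike
    return u, v, spiked


def candidate_run(inputs: list[int], du: int, dv: int, bias: int,
                  vth: int) -> list[tuple[int, int, bool]]:
    """Implementation under test: same model over a whole trace, using arithmetic shifts."""
    u = v = 0
    trace = []
    for inp in inputs:
        u = ((u * (DECAY_SCALE - du)) >> 12) + inp
        v = ((v * (DECAY_SCALE - dv)) >> 12) + u + bias
        spiked = v > vth
        if spiked:
            v = 0
        trace.append((u, v, spiked))
    return trace


def differential_test(n_cases: int, steps: int = 32, seed: int = 0) -> tuple[int, int]:
    """Return (comparisons, deviations) over random neuron parameters and input traces."""
    rng = random.Random(seed)
    comparisons = deviations = 0
    for _ in range(n_cases):
        du, dv = rng.randrange(DECAY_SCALE), rng.randrange(DECAY_SCALE)
        bias, vth = rng.randrange(-64, 64), rng.randrange(1, DECAY_SCALE)
        inputs = [rng.randrange(-256, 256) for _ in range(steps)]
        u = v = 0
        for inp, got in zip(inputs, candidate_run(inputs, du, dv, bias, vth)):
            u, v, spiked = reference_step(u, v, inp, du, dv, bias, vth)
            comparisons += 1
            deviations += got != (u, v, spiked)  # compare state and spike at every step
    return comparisons, deviations


print(differential_test(n_cases=1_000))  # → (32000, 0)
```

The two versions are written differently on purpose (floor division versus arithmetic shift, per-step versus whole-trace) so the test has something real to compare. Replacing `>> 12` with truncating division such as `int(u * (DECAY_SCALE - du) / DECAY_SCALE)` rounds negative values toward zero instead of down, and a harness like this should then report deviations.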

The quantization result that looked like a bug

8-bit quantization is normally a tradeoff: smaller, faster, less accurate. NeuroCUDA's quantized model went the other direction, improving from 92.89% to 94.49% test accuracy after 8-bit quantization, a 1.6 percentage point gain, not loss.
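What 8-bit precision takes away can be seen in a minimal sketch, assuming symmetric per-tensor int8 quantization; NeuroCUDA's actual scheme is not described here, and `quantize_int8` and `dequantize` are hypothetical helpers. The sketch only shows the precision limit; the regularization effect described next depends on the model and data and is not demonstrated by it.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: w ≈ q * scale, with q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]  # snap to the nearest level
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Map int8 levels back to floats."""
    return [v * scale for v in q]


weights = [0.8123, -0.0407, 0.3333, -0.7999, 0.0012]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)
print(max(abs(a - b) for a, b in zip(weights, restored)), "<=", scale / 2)
```

Each weight lands on one of 255 evenly spaced levels, so the round-trip error is at most half a step, and small weights such as 0.0012 here round to exactly 0.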

This looks like an error until you consider why it happens: reduced numerical precision acts as a regularizer, similar to how dropout or weight decay reduce overfitting by limiting how precisely a model can memorize training-set noise. The 92.89% figure was measured on a model that had begun overfitting the training set; quantization clipped some of that overfit precision away, and test-set accuracy improved as a side effect. The result is documented in low-precision neural network research, but it is counterintuitive enough that it was worth verifying twice before it became a launch number.

The number that explains why any of this matters: 93.67% sparsity

Accuracy and code-correctness numbers prove the compiler works. They do not prove neuromorphic deployment is worth doing. The number that answers "why bother" is activation sparsity, measured using NeuroBench, the community-developed benchmark suite for neuromorphic workloads: 93.67% sparsity, meaning only 6.33% of neuron activations are non-zero (that is, spikes) over a typical inference pass.
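The quantity being reported can be sketched as the fraction of zero entries across all recorded spike tensors. This approximates the idea behind NeuroBench's activation-sparsity metric but does not use NeuroBench's API, NeuroBench's exact aggregation may differ, and `activation_sparsity` is a hypothetical helper.

```python
def activation_sparsity(layers: list[list[list[int]]]) -> float:
    """Fraction of zero entries across all recorded spike tensors.

    `layers` holds one [time_step][neuron] list of 0/1 spikes per layer.
    """
    total = nonzero = 0
    for layer in layers:
        for step in layer:
            total += len(step)
            nonzero += sum(1 for s in step if s != 0)
    return 1.0 - nonzero / total if total else 0.0


# Two layers, 4 time steps x 8 neurons each, with 2 spikes per layer.
layer_a = [[0] * 8 for _ in range(4)]
layer_a[0][3] = 1
layer_a[2][5] = 1
layer_b = [[0] * 8 for _ in range(4)]
layer_b[1][0] = 1
layer_b[3][7] = 1
print(activation_sparsity([layer_a, layer_b]))  # → 0.9375
```

The counted events are neuron-time-step pairs, so a sparsity of 93.67% means about 6.33% of those events carry a spike, not that a fixed 6.33% of neurons are active throughout.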

On event-driven hardware, spiking neural network energy consumption scales mainly with how many spikes are actually emitted, not with the size of the network. At 93.67% sparsity, the network does roughly 16x less compute-equivalent work per inference than it would if every neuron fired at every step (1 / 0.0633 ≈ 15.8), and that ratio is the foundation of the 100-1000x energy efficiency claims made about neuromorphic hardware generally. Without this number, the accuracy and correctness results would describe a compiler that works, but for a hardware category with no demonstrated advantage. For the broader energy argument this number feeds into, see The AI Energy Crisis Explained.
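The "roughly 16x" arithmetic can be reproduced with a back-of-the-envelope sketch under an assumed cost model: each spike triggers one accumulate per outgoing synapse, and the baseline is the same layer with every input spiking at every step. `synaptic_ops` is a hypothetical helper, and the model ignores memory traffic, neuron-state updates, and hardware overheads.

```python
def synaptic_ops(spikes: list[list[int]], fan_out: int) -> tuple[int, int]:
    """Compare work for one layer: every input active every step vs. event-driven.

    `spikes` is a [time_step][input_neuron] list of 0/1 values; each spike is assumed
    to trigger one accumulate per outgoing synapse (`fan_out`).
    """
    steps, n_in = len(spikes), len(spikes[0])
    dense_ops = steps * n_in * fan_out            # every input contributes at every step
    event_ops = sum(map(sum, spikes)) * fan_out   # only actual spikes do work
    return dense_ops, event_ops


# 32 steps x 100 inputs = 3,200 possible spikes; 200 actual spikes, i.e. 93.75% sparsity.
spikes = [[1 if (t * 100 + i) % 16 == 0 else 0 for i in range(100)] for t in range(32)]
dense, event = synaptic_ops(spikes, fan_out=10)
print(dense, event, dense / event)  # → 32000 2000 16.0
```

In general the ratio is 1 / (1 - sparsity), so the post's 93.67% gives about 15.8. The comparison is against the same spiking network run densely, not against a single ANN forward pass.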

| Metric | Result | What it proves |
|---|---|---|
| SNN accuracy (ResNet-18, CIFAR-10) | 94.49% | Conversion math is sound |
| ANN baseline / conversion gap | 95.44% / 0.95 percentage points | Gap is competitive at low time steps |
| NIR round trip on ResNet-18 (residual graph) | 0.000000 max absolute difference | Custom executor handles branches the reference path skips |
| Loihi 2 neuron math vs. published spec | 0 deviations in 100,000+ comparisons | Neuron math matches spec (not a vendor-SDK or hardware test) |
| 8-bit quantization effect | +1.6 percentage points accuracy | Regularization, not just compression |
| NeuroBench activation sparsity | 93.67% | The actual energy-efficiency case |
| Cross-backend max deviation | ≤1.2% | GPU, CPU, Loihi, and FPGA backends agree |

Why this was built solo

The proximate cause was personal frustration: months spent trying to deploy the same model to both Loihi and SpiNNaker, only to discover there was no shared path between them, just two unrelated SDKs and a full rewrite in between. The deeper reason is covered in more historical detail in Why Neuromorphic Computing Needs Its CUDA Moment, but the short version is this: compiler infrastructure for a new hardware category has historically come from hardware vendors or large engineering teams, not from individuals working alone. NeuroCUDA is a bet that a working open-source implementation, backed by a benchmark methodology rigorous enough to survive scrutiny, does not require that starting point. More on how NeuroCUDA fits alongside QuantaraCore's other products is in Building Memoria, FinInsight, and NeuroCUDA.

Open source, and what is verifiable today

NeuroCUDA is MIT licensed and pip installable, with the full implementation at github.com/Krishnav1/neurocuda. The benchmark methodology described above is documented in the accompanying paper, submitted to arXiv in June 2026. Every number in this post is reproducible from the public repository, not just quoted from a press release.

Sources & further reading

  1. NeuroCUDA research paper (submitted to arXiv, June 2026), via github.com/Krishnav1/neurocuda
  2. NeuroBench benchmark suite documentation, used for the activation sparsity measurement
  3. NIR (Neuromorphic Intermediate Representation) specification, arXiv:2311.14641, the export format NeuroCUDA targets
  4. Intel Loihi 2 published neuron-model equations, used as the reference for the neuron-math comparison (not Intel's Lava SDK or physical hardware)
  5. CIFAR-10 dataset and standard ResNet-18 training methodology, used for the baseline ANN accuracy figure

Frequently asked questions

What is the NeuroCUDA conversion gap and why is 0.95% significant?

The conversion gap is the accuracy difference between a standard ANN and its converted spiking neural network equivalent. NeuroCUDA's ResNet-18 on CIFAR-10 converts from a 95.44% ANN baseline to 94.49% SNN accuracy, a gap of 0.95 percentage points, which is competitive with state-of-the-art ANN-to-SNN conversion research on residual architectures.

What makes NeuroCUDA's NIR export different from the reference implementation?

NIR is a published, vendor-neutral graph format for spiking networks, and NeuroCUDA exports to it directly. The reference NIR execution path handles simple feed-forward graphs but not residual connections, where multiple inputs sum into one node. NeuroCUDA's executor uses Kahn's algorithm for topological sort and explicitly sums multi-input nodes, verified bit-exact (0.000000 max difference) on a full ResNet-18 round trip.

Has NeuroCUDA been validated on real Loihi 2 hardware or Intel's Lava SDK?

Not yet. NeuroCUDA's integrate-and-fire neuron math was checked against Loihi 2's published neuron equations across 100,000+ synthetic comparisons with zero deviations, which confirms the math matches the published spec. That is a spec-conformance check, not a run against Intel's Lava SDK or physical Loihi 2 silicon, and the site labels it accordingly.

Why did 8-bit quantization improve accuracy instead of reducing it?

Quantization typically trades accuracy for efficiency, but here reduced precision acted as a regularizer, reducing overfitting. Test accuracy improved from 92.89% to 94.49% after 8-bit quantization, an unusual but documented effect in low-precision neural network research.

Where can I find the NeuroCUDA source code and paper?

NeuroCUDA is open source under the MIT license at github.com/Krishnav1/neurocuda. The benchmark methodology and results were submitted as a research paper to arXiv in June 2026.