NeuroCUDA
A PyTorch-to-spiking-neural-network compiler with honest, verified numbers
You train a normal PyTorch model with ReLU activations. NeuroCUDA compiles it into a spiking neural network - binary spikes, stateful membrane potentials, temporal integration - and runs it on GPU, CPU, or a Loihi 2 neuron simulator, with NIR export for cross-platform deployment. Direct ReLU-to-spike replacement destroys accuracy; NeuroCUDA's two-stage pipeline of QCFS calibration followed by BPTT fine-tuning keeps the conversion gap under 1 percentage point on the benchmarks reported below.
neurocuda.convert(model, train_loader)
The problem NeuroCUDA solves
ReLU activations and IF (integrate-and-fire) spiking neurons are fundamentally different transfer functions. Swap one for the other directly and a 99% ANN can drop to around 20% accuracy - the binary spike has no way to approximate a continuous activation without adaptation. This is the core obstacle standing between a trained PyTorch model and a working spiking neural network.
NeuroCUDA's PyTorch to spiking neural network compiler solves this with a two-stage pipeline: QCFS calibration learns per-channel thresholds that match each layer's activation distribution, then BPTT fine-tuning with surrogate gradients adapts the weights to binary spike dynamics. The result is a real spiking network - not a quantized approximation - that beats the original ANN on N-MNIST and stays within 0.95 percentage points of it on ResNet-18/CIFAR-10, exportable to NIR for deployment beyond a single backend.
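To make the transfer-function mismatch concrete, here is a minimal pure-Python sketch, not NeuroCUDA code. It drives a single IF neuron with a constant input and compares its threshold-scaled firing rate with ReLU. The soft reset (subtracting the threshold after a spike), the constant-input drive, and the names `if_neuron_rate`, `threshold`, and `timesteps` are assumptions made for illustration.

```python
def relu(x):
    """Continuous ANN activation."""
    return max(0.0, x)


def if_neuron_rate(x, threshold=1.0, timesteps=4):
    """Feed a constant input x to one integrate-and-fire neuron.

    Returns threshold * (spike count / timesteps), i.e. the value the
    spike train encodes under rate coding.
    """
    v = 0.0          # membrane potential
    spikes = 0
    for _ in range(timesteps):
        v += x                   # integrate
        if v >= threshold:       # fire
            spikes += 1
            v -= threshold       # soft reset (assumed; reset-by-subtraction)
    return threshold * spikes / timesteps


# Compare the two transfer functions on a few inputs.
for x in (-0.5, 0.375, 1.5):
    print(f"x={x:+.3f}  relu={relu(x):.3f}  if_rate={if_neuron_rate(x):.3f}")
```

With only a few timesteps the IF neuron can emit just a handful of distinct rates, and it saturates at the threshold, so untouched ReLU weights see a coarser, clipped activation than the one they were trained with.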
How the PyTorch to spiking neural network compiler works
QCFS calibration
ReLU activations are replaced with QCFS (Quantization Clip-Floor-Shift) activations, which have learnable per-channel thresholds. The thresholds are fitted by ordinary gradient-based training, and the model stays accurate at this stage because QCFS outputs are still graded (multi-level), not yet binary.
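As a rough illustration, the QCFS activation from the published QCFS literature can be written as threshold * clip(floor(x * L / threshold + 0.5) / L, 0, 1). The sketch below is not taken from NeuroCUDA's source: the number of levels `L`, the shift of 0.5, and the function names are assumptions based on the published formulation.

```python
import math


def qcfs(x, threshold, levels):
    """Quantization clip-floor-shift activation for a single value.

    Assumed form: threshold * clip(floor(x * L / threshold + 0.5) / L, 0, 1)
    """
    q = math.floor(x * levels / threshold + 0.5) / levels
    return threshold * min(1.0, max(0.0, q))


def qcfs_per_channel(values, thresholds, levels):
    """Apply QCFS with one (learnable) threshold per channel."""
    return [qcfs(v, t, levels) for v, t in zip(values, thresholds)]


# One activation per channel, each channel with its own threshold.
activations = [0.3, 1.0, 1.5, -0.5]
thresholds = [1.0, 2.0, 1.0, 1.0]
print(qcfs_per_channel(activations, thresholds, levels=4))
```

The output takes one of `levels + 1` evenly spaced values between 0 and the channel threshold, which is why the network is still graded at this stage rather than binary.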
IF replacement and BPTT fine-tuning
BatchNorm is folded into the convolution weights, QCFS is swapped for binary IF (integrate-and-fire) neurons using the learned thresholds, and the model is fine-tuned with backpropagation through time using a surrogate gradient (atan) so the weights adapt to spiking dynamics.
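Two pieces of this step can be sketched in plain Python: the standard BatchNorm folding arithmetic and the derivative of an arctangent surrogate. This is a hedged illustration, not NeuroCUDA's implementation. The flat per-channel weight layout, the function names, and the surrogate form (1/pi) * atan(pi/2 * alpha * x) + 1/2 with `alpha=2.0` are assumptions.

```python
import math


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm statistics into per-output-channel weights and biases.

    weights[c] is the flattened kernel of output channel c.
    """
    folded_w, folded_b = [], []
    for c in range(len(weights)):
        scale = gamma[c] / math.sqrt(var[c] + eps)
        folded_w.append([w * scale for w in weights[c]])
        folded_b.append((bias[c] - mean[c]) * scale + beta[c])
    return folded_w, folded_b


def atan_surrogate_grad(x, alpha=2.0):
    """Derivative of (1/pi) * atan(pi/2 * alpha * x) + 1/2.

    x is (membrane potential - threshold); used in the backward pass in
    place of the zero-almost-everywhere derivative of the spike step.
    """
    return (alpha / 2.0) / (1.0 + (math.pi / 2.0 * alpha * x) ** 2)


# Check folding on one output channel and one input patch (made-up values).
weights, bias = [[0.5, -1.0, 2.0]], [0.1]
gamma, beta, mean, var = [1.5], [0.2], [0.3], [4.0]
patch = [1.0, 2.0, 3.0]

pre_bn = sum(w * x for w, x in zip(weights[0], patch)) + bias[0]
bn_out = gamma[0] * (pre_bn - mean[0]) / math.sqrt(var[0] + 1e-5) + beta[0]

fw, fb = fold_batchnorm(weights, bias, gamma, beta, mean, var)
folded_out = sum(w * x for w, x in zip(fw[0], patch)) + fb[0]
print(abs(bn_out - folded_out))
```

Folding before the IF swap means each spiking layer integrates a single affine input, with no separate normalization step to carry across timesteps.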
NIR export
The converted spiking network is serialized into NIR (Neuromorphic Intermediate Representation) - a hardware-agnostic format that plays the same role for spiking networks that ONNX plays for conventional deep learning models. NeuroCUDA's NIR executor handles multi-input residual nodes (ResNet-style skip connections) via topological sorting, verified bit-exact on a full ResNet-18 round-trip.
Backend execution
The compiled spiking network runs on a GPU or CPU PyTorch backend for training and inference, or on a Loihi 2 IF-neuron simulator validated against Intel's published Loihi neuron equations. FPGA deployment generates HLS C++ as a proof of concept; NIR export makes the model format-compatible with SpiNNaker and other NIR-supporting tools.
Verified results for the PyTorch to spiking neural network compiler
Accuracy numbers below are measured on full test sets with 3 or more seeds and reported as mean ± standard deviation - never single runs or subsampled test sets. The bit-exactness rows report deviation counts or maximum absolute differences instead.
| Benchmark | Result |
|---|---|
| N-MNIST (3-layer CNN) | SNN 99.88% ± 0.02% vs. ANN 99.70% ± 0.00% - the SNN beats the ANN by 0.18 percentage points |
| ResNet-18 on CIFAR-10 | SNN 94.61% ± 0.14% vs. ANN 95.56% ± 0.11% - a 0.95 percentage point conversion gap |
| NIR round-trip on ResNet-18 (residual graph) | 0.000000 max absolute difference - bit-exact |
| CPU vs. GPU backend | 0 deviations across 256,000 spikes - bit-exact |
| Loihi 2 IF neuron math vs. published spec | 0 deviations across 100,000+ comparisons (NumPy reimplementation, not the Lava SDK, not physical silicon) |
| CartPole-v1 (direct SNN training) | 100% solved, 68.5% sparsity |
| CartPole-v1 (ANN-to-SNN conversion) | 100% solved on the best seed, but highly seed-dependent - 19% ± 26% across 5 seeds |
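For readers reproducing the table, the reporting convention comes down to a few lines of arithmetic. The sketch below uses the reported means to recompute the conversion gaps in percentage points. The per-seed list is made up purely to show the mean ± standard deviation helper, and the use of the sample standard deviation is an assumption.

```python
import statistics


def summarize(per_seed_accuracies):
    """Return (mean, sample standard deviation) across seeds."""
    return (statistics.mean(per_seed_accuracies),
            statistics.stdev(per_seed_accuracies))


def gap_in_points(ann_mean, snn_mean):
    """Conversion gap in percentage points (positive: the SNN is ahead)."""
    return round(snn_mean - ann_mean, 2)


# Gaps recomputed from the reported means.
print(gap_in_points(99.70, 99.88))   # N-MNIST
print(gap_in_points(95.56, 94.61))   # ResNet-18 on CIFAR-10

# Hypothetical per-seed accuracies, NOT measured results.
example_mean, example_std = summarize([94.5, 94.6, 94.7])
print(f"{example_mean:.2f}% ± {example_std:.2f}%")
```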
The real differentiator is the NIR executor: the reference NIR tooling round-trips simple feed-forward graphs but doesn't handle ResNet-style skip connections. NeuroCUDA's executor uses Kahn's topological sort with explicit multi-input summation to handle residual graphs correctly, verified bit-exact on a full ResNet-18 round-trip. No vendor SDK or physical hardware validation has been performed - the Loihi 2 row validates neuron math against Intel's published equations, not Intel's Lava SDK.
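The idea behind that executor can be sketched in a few lines. This is an illustrative sketch, not NeuroCUDA's executor or the NIR API: the graph is a plain dict of callables, the names are hypothetical, and nodes operate on scalars instead of tensors. It also only handles acyclic graphs.

```python
from collections import deque


def run_graph(nodes, edges, inputs):
    """Execute a DAG in Kahn topological order, summing multi-input nodes.

    nodes:  dict name -> callable taking one value
    edges:  list of (src, dst) pairs
    inputs: dict name -> value for nodes with no predecessors
    """
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    predecessors = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        predecessors[dst].append(src)
        indegree[dst] += 1

    ready = deque(n for n in nodes if indegree[n] == 0)
    outputs, order = {}, []
    while ready:
        n = ready.popleft()
        order.append(n)
        if predecessors[n]:
            # Explicit multi-input summation (the residual "add").
            x = sum(outputs[p] for p in predecessors[n])
        else:
            x = inputs[n]
        outputs[n] = nodes[n](x)
        for s in successors[n]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)

    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return outputs, order


# A toy residual block: the input feeds both the main path and the skip.
nodes = {
    "input": lambda x: x,
    "conv1": lambda x: 2 * x,
    "conv2": lambda x: x + 1,
    "add": lambda x: x,
    "output": lambda x: x,
}
edges = [("input", "conv1"), ("conv1", "conv2"), ("conv2", "add"),
         ("input", "add"), ("add", "output")]
outputs, order = run_graph(nodes, edges, {"input": 3.0})
print(order, outputs["output"])
```

The `add` node only becomes ready once both the main path and the skip path have produced their outputs, which is exactly the ordering guarantee a purely sequential executor lacks.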
Supported backends
NeuroCUDA ships three backends directly (GPU, CPU, and a Loihi 2 IF-neuron simulator), plus NIR export for broader compatibility. Each backend is labeled by its validation status - no backend is claimed to run on physical neuromorphic silicon.
NeuroCUDA vs. other PyTorch-based SNN tools
| Tool | What it does | What it doesn't do |
|---|---|---|
| NIR | Vendor-neutral graph format for spiking networks - one model, multiple simulators | Doesn't train, convert, or validate - it's a format, not a pipeline |
| SNNToolBox | ANN-to-SNN conversion from Keras/PyTorch, export to PyNN/Brian2/SpiNNaker/Loihi | No NeuroBench reporting, no bit-level validation against vendor SDKs |
| snnTorch | Direct SNN training via surrogate gradient BPTT | No ANN-to-SNN conversion, no multi-backend deployment |
| NeuroCUDA | QCFS-to-IF conversion + BPTT fine-tuning + NIR export + multi-backend compile + NeuroBench reporting in one pipeline | Doesn't reinvent IR or conversion theory - builds on published methods |
Run it on a robot: NeuroCUDA ROS2
NeuroCUDA ships a ROS2 package - neurocuda_ros2 and neurocuda_msgs - that wraps this exact compiler in standard ROS2 nodes and topics. A camera or event-camera feed goes in; class predictions, spike statistics, and control commands come out, with no spiking-neuron code to write.
ros2 launch neurocuda_ros2 infer.launch.py model:=vgg5_cifar10
See the full ROS2 architecture, message types, and Docker image →
Why it matters: the CUDA parallel
Before CUDA arrived in 2007, general-purpose GPU programming meant recasting computation as graphics operations through shading languages and graphics APIs. CUDA changed that with a single thesis: one general-purpose programming model for all NVIDIA GPUs. It did not make GPUs faster - it made them accessible, and that accessibility is what triggered the GPU computing boom that underlies almost all of modern AI.
Neuromorphic computing is at an earlier, messier stage of the same problem. Every chip ships with its own SDK, and Intel's Lava - the most widely used one - is now archived with no public successor yet announced. NeuroCUDA does not claim to solve every chip; it solves the part that's tractable today: PyTorch in, a verified spiking network out, running on GPU, CPU, or a Loihi 2 simulator, with NIR export so the model isn't locked to one tool. Read the full argument for why neuromorphic computing needs its CUDA moment →
Open source & installation
NeuroCUDA is released under the MIT license and is pip-installable. The full source, benchmark scripts, and backend implementations are public on GitHub.
pip install neurocuda
For NIR export, NeuroBench reporting, and the CartPole reinforcement learning demo: pip install neurocuda[all]
Contributions are open, particularly around extending validated hardware support beyond the current GPU, CPU, and Loihi 2 simulator backends.
Frequently asked questions
What is NeuroCUDA?
NeuroCUDA is an open-source, pip-installable compiler that converts trained PyTorch models into spiking neural networks and deploys them to GPU, CPU, and a Loihi 2 IF-neuron simulator. It uses QCFS calibration plus BPTT fine-tuning to keep the ANN-to-SNN conversion accurate, and exports to NIR for cross-platform deployment.
What accuracy does NeuroCUDA achieve when converting ANNs to SNNs?
On a 3-layer CNN trained on N-MNIST, the converted spiking network reaches 99.88% ± 0.02% accuracy versus a 99.70% ANN baseline - the SNN slightly beats the ANN. On ResNet-18 trained on CIFAR-10, the converted SNN reaches 94.61% ± 0.14% against a 95.56% ANN baseline, a gap of 0.95 percentage points. All numbers are means over 3 or more seeds on full test sets.
Which hardware can NeuroCUDA deploy to?
GPU and CPU directly via the PyTorch backend, and a Loihi 2 IF-neuron simulator validated against Intel's published Loihi neuron equations - not the Lava SDK, not physical silicon. NIR export makes models format-compatible with SpiNNaker and other NIR-supporting tools, and FPGA deployment generates HLS C++ as a proof of concept, not yet synthesized to a bitstream.
Is NeuroCUDA free and open source?
Yes - MIT licensed, installable via pip install neurocuda, with full source available at github.com/Krishnav1/neurocuda.
What is NIR and how does NeuroCUDA use it?
NIR (Neuromorphic Intermediate Representation) is a hardware-agnostic graph format for spiking neural networks - the neuromorphic equivalent of ONNX. NeuroCUDA exports converted SNNs to NIR and runs the exported graphs with a custom executor that handles multi-input residual connections (ResNet-style skip connections) via topological sorting, verified bit-exact on a full ResNet-18 round-trip.
Does NeuroCUDA work with ROS2?
Yes - neurocuda_ros2 and neurocuda_msgs wrap this compiler in standard ROS2 nodes and topics. See the NeuroCUDA ROS2 page for the full architecture, message types, pre-trained models, and Docker image.