Open Source · pip install neurocuda · MIT License

NeuroCUDA

A PyTorch-to-spiking-neural-network compiler with honest, verified numbers

You train a normal PyTorch model with ReLU activations. NeuroCUDA compiles it into a spiking neural network - binary spikes, stateful membrane potentials, temporal integration - and runs it on GPU, CPU, or a Loihi 2 neuron simulator, with NIR export for cross-platform deployment. Direct ReLU-to-spike replacement destroys accuracy; NeuroCUDA's two-stage pipeline of QCFS calibration followed by BPTT fine-tuning keeps the converted network within about one percentage point of the original ANN on the benchmarks reported below.

pip install neurocuda
import neurocuda
neurocuda.convert(model, train_loader)

The problem NeuroCUDA solves

ReLU activations and IF (integrate-and-fire) spiking neurons are fundamentally different transfer functions. Swap one for the other directly and a 99% ANN can drop to around 20% accuracy - the binary spike has no way to approximate a continuous activation without adaptation. This is the core obstacle standing between a trained PyTorch model and a working spiking neural network.
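
The mismatch is easy to reproduce in a few lines. The sketch below is plain Python, not NeuroCUDA code: it drives a single IF neuron with a constant input for a handful of timesteps and compares the resulting firing rate with ReLU. The threshold of 1.0, the subtractive reset, the four timesteps, and the function names `relu` and `if_rate` are all assumptions made for illustration.

```python
def relu(x: float) -> float:
    return max(0.0, x)


def if_rate(x: float, timesteps: int = 4, threshold: float = 1.0) -> float:
    """Firing rate of one IF neuron driven by a constant input x."""
    v = 0.0      # membrane potential, starts at rest
    spikes = 0
    for _ in range(timesteps):
        v += x                   # integrate the input
        if v >= threshold:       # fire: at most one binary spike per step
            spikes += 1
            v -= threshold       # subtractive ("soft") reset
    return spikes / timesteps


for x in (0.125, 0.5, 2.5):
    print(f"x={x}: relu={relu(x)}, if_rate={if_rate(x)}")
```

With only four timesteps the small input (0.125) never reaches threshold and the large one (2.5) saturates at one spike per step, so the rate agrees with ReLU only for 0.5. Errors of this kind can compound from layer to layer, which is the gap the calibration stage is meant to close.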

NeuroCUDA solves this with a two-stage pipeline: QCFS calibration learns per-channel thresholds that match each layer's activation distribution, then BPTT fine-tuning with surrogate gradients adapts the weights to binary spike dynamics. The result is a real spiking network - not a quantized approximation - that slightly beats the original ANN on N-MNIST and stays within one percentage point of it on ResNet-18/CIFAR-10, and that can be exported to NIR for deployment beyond a single backend.

How the PyTorch to spiking neural network compiler works

QCFS calibration

ReLU activations are replaced with QCFS (quantization clip-floor-shift) activations, which have learnable per-channel thresholds. Calibrating those thresholds is an ordinary gradient-based optimization problem - the model stays accurate because QCFS outputs are still graded, not yet binary.
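
For readers who want to see the function itself, here is a minimal sketch of a QCFS activation for a single value, written in plain Python. It follows the published clip-floor-shift form; the parameter names, the choice of 4 levels, and the two example thresholds are assumptions for illustration and are not taken from NeuroCUDA's source.

```python
import math


def qcfs(x: float, threshold: float, levels: int = 4, shift: float = 0.5) -> float:
    """Quantization clip-floor-shift activation for a single value.

    threshold: learnable upper bound (one per channel in the per-channel case)
    levels:    number of quantization steps
    shift:     offset added before the floor (0.5 rounds to the nearest step)
    """
    step = math.floor(x * levels / threshold + shift) / levels
    return threshold * min(1.0, max(0.0, step))


# Hypothetical per-channel thresholds, as calibration might learn them.
channel_thresholds = [1.0, 2.0]
inputs = [-0.5, 0.25, 0.75, 3.0]

for theta in channel_thresholds:
    print(theta, [qcfs(x, theta) for x in inputs])
```

The output takes one of `levels + 1` graded values between 0 and the threshold. For a constant input, an IF neuron with subtractive reset and an initial potential of half the threshold, run for `levels` timesteps, fires at a rate that scales to exactly this value, which is why thresholds learned at this stage can be handed to the spiking neurons later.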

IF replacement and BPTT fine-tuning

BatchNorm is folded into the convolution weights, QCFS is swapped for binary IF (integrate-and-fire) neurons using the learned thresholds, and the model is fine-tuned with backpropagation through time using a surrogate gradient (atan) so the weights adapt to spiking dynamics.
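
A spike is a step function whose derivative is zero almost everywhere, so BPTT needs a substitute gradient. The sketch below shows a common arctangent surrogate in plain Python: the forward pass stays binary, and the backward pass uses the derivative of a smooth arctangent curve. The sharpness parameter `alpha` and its value of 2.0 are assumptions, and the exact form NeuroCUDA uses is not stated here. In PyTorch this forward/backward pair would typically be wrapped in a custom `torch.autograd.Function`.

```python
import math


def spike(v: float, threshold: float) -> float:
    """Forward pass: binary Heaviside spike, 1.0 once the membrane reaches threshold."""
    return 1.0 if v >= threshold else 0.0


def atan_surrogate(x: float, alpha: float = 2.0) -> float:
    """Smooth stand-in for the spike: (1/pi) * atan(pi/2 * alpha * x) + 1/2."""
    return math.atan(math.pi / 2 * alpha * x) / math.pi + 0.5


def atan_surrogate_grad(x: float, alpha: float = 2.0) -> float:
    """Backward pass: derivative of atan_surrogate, used in place of the
    Heaviside derivative."""
    return (alpha / 2) / (1 + (math.pi / 2 * alpha * x) ** 2)


threshold = 1.0
for v in (0.5, 1.0, 1.5):
    x = v - threshold  # signed distance from the threshold
    print(f"v={v}: spike={spike(v, threshold)}, grad={atan_surrogate_grad(x):.4f}")
```

The gradient peaks at the threshold and decays symmetrically on either side, so neurons that were close to firing (or close to staying silent) receive the largest weight updates.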

NIR export

The converted spiking network is serialized into NIR (Neuromorphic Intermediate Representation) - a hardware-agnostic format that plays the same role for spiking networks that ONNX plays for conventional deep learning models. NeuroCUDA's NIR executor handles multi-input residual nodes (ResNet-style skip connections) via topological sorting, verified bit-exact on a full ResNet-18 round-trip.

Backend execution

The compiled spiking network runs on a GPU or CPU PyTorch backend for training and inference, or on a Loihi 2 IF-neuron simulator validated against Intel's published Loihi neuron equations. FPGA deployment generates HLS C++ as a proof of concept; NIR export makes the model format-compatible with SpiNNaker and other NIR-supporting tools.

Verified results for the PyTorch to spiking neural network compiler

Accuracy numbers below are measured on full test sets with 3 or more seeds and reported as mean ± standard deviation - never single runs or subsampled test sets. Bit-exactness rows report deviation counts instead.

Benchmark	Result
N-MNIST (3-layer CNN)	SNN 99.88% ± 0.02% vs. ANN 99.70% ± 0.00% - the SNN beats the ANN by 0.18 percentage points
ResNet-18 on CIFAR-10	SNN 94.61% ± 0.14% vs. ANN 95.56% ± 0.11% - a 0.95 percentage-point conversion gap
NIR round-trip on ResNet-18 (residual graph)	0.000000 max absolute difference - bit-exact
CPU vs. GPU backend	0 deviations across 256,000 spikes - bit-exact
Loihi 2 IF neuron math vs. published spec	0 deviations across 100,000+ comparisons (NumPy reimplementation, not the Lava SDK, not physical silicon)
CartPole-v1 (direct SNN training)	100% solved, 68.5% sparsity
CartPole-v1 (ANN-to-SNN conversion)	100% solved on best seed, but stochastic - 19% ± 26% across 5 seeds
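
The "mean ± standard deviation" figures in the table can be reproduced from per-seed scores with the standard library. The sketch below assumes the sample standard deviation (`statistics.stdev`); the source does not say whether sample or population deviation is used, and the per-seed values are made up purely to show the arithmetic.

```python
from statistics import mean, stdev

# Made-up per-seed scores (percent), purely for illustration.
stable_runs = [90.0, 91.0, 92.0]
unstable_runs = [100.0, 10.0, 0.0, 0.0, 0.0]


def summarize(scores: list[float]) -> str:
    """Format per-seed scores as 'mean ± sample standard deviation'."""
    return f"{mean(scores):.2f}% ± {stdev(scores):.2f}% (n={len(scores)})"


print(summarize(stable_runs))
print(summarize(unstable_runs))
```

When one seed succeeds and the rest fail, the standard deviation can exceed the mean, which is the pattern the stochastic CartPole conversion row reports.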

The real differentiator is the NIR executor: the reference NIR tooling round-trips simple feed-forward graphs but doesn't handle ResNet-style skip connections. NeuroCUDA's executor uses Kahn's topological sort with explicit multi-input summation to handle residual graphs correctly, verified bit-exact on a full ResNet-18 round-trip. No vendor SDK or physical hardware validation has been performed - the Loihi 2 row validates neuron math against Intel's published equations, not Intel's Lava SDK.
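
As a rough illustration of the idea, and not NeuroCUDA's actual executor, the sketch below runs Kahn's algorithm over a toy residual block and sums the inputs of any node that has more than one incoming edge. The node names, the lambda functions standing in for layers, and the `topological_order` and `execute` helpers are all hypothetical.

```python
from collections import deque
from typing import Callable

# A toy residual block: input -> f -> g -> add, plus a skip edge input -> add.
nodes: dict[str, Callable[[float], float]] = {
    "input": lambda x: x,
    "f": lambda x: 2.0 * x,
    "g": lambda x: x + 1.0,
    "add": lambda x: x,  # receives the SUM of its incoming edges
}
edges = [("input", "f"), ("f", "g"), ("g", "add"), ("input", "add")]


def topological_order(nodes, edges) -> list[str]:
    """Kahn's algorithm: repeatedly emit nodes whose inputs are all resolved."""
    indegree = {name: 0 for name in nodes}
    successors = {name: [] for name in nodes}
    for src, dst in edges:
        indegree[dst] += 1
        successors[src].append(dst)
    ready = deque(name for name, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for nxt in successors[name]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order


def execute(nodes, edges, x: float) -> float:
    """Run the graph in topological order, summing multi-input nodes."""
    predecessors = {name: [] for name in nodes}
    for src, dst in edges:
        predecessors[dst].append(src)
    outputs = {}
    order = topological_order(nodes, edges)
    for name in order:
        preds = predecessors[name]
        total = sum(outputs[p] for p in preds) if preds else x
        outputs[name] = nodes[name](total)
    return outputs[order[-1]]  # assumes the last node in the order is the output


print(topological_order(nodes, edges))
print(execute(nodes, edges, 3.0))
```

The "add" node is only scheduled once both the main path and the skip connection have produced their outputs, which is the property a purely sequential executor lacks.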

Supported backends

NeuroCUDA ships three backends directly, plus NIR export for broader compatibility. Backends are honestly labeled by validation status - no backend is claimed to run on physical neuromorphic silicon.

NVIDIA GPU (shipped) · CPU x86 / ARM (shipped) · Loihi 2 simulator (shipped) · NIR export (format-compatible) · FPGA via HLS C++ (proof of concept)

NeuroCUDA vs. other PyTorch-based SNN tools

Tool	What it does	What it doesn't do
NIR	Vendor-neutral graph format for spiking networks - one model, multiple simulators	Doesn't train, convert, or validate - it's a format, not a pipeline
SNN Toolbox	ANN-to-SNN conversion from Keras/PyTorch, export to PyNN/Brian2/SpiNNaker/Loihi	No NeuroBench reporting, no bit-exact cross-backend validation
snnTorch	Direct SNN training via surrogate gradient BPTT	No ANN-to-SNN conversion, no multi-backend deployment
NeuroCUDA	QCFS-to-IF conversion + BPTT fine-tuning + NIR export + multi-backend compile + NeuroBench reporting in one pipeline	Doesn't reinvent IR or conversion theory - builds on published methods

Run it on a robot: NeuroCUDA ROS2

NeuroCUDA ships a ROS2 package - neurocuda_ros2 and neurocuda_msgs - that wraps this exact compiler in standard ROS2 nodes and topics. A camera or event-camera feed goes in; class predictions, spike statistics, and control commands come out, with no spiking-neuron code to write.

pip install neurocuda
ros2 launch neurocuda_ros2 infer.launch.py model:=vgg5_cifar10

See the full ROS2 architecture, message types, and Docker image →

Why it matters: the CUDA parallel

Before CUDA arrived in 2007, general-purpose GPU programming meant recasting computations as graphics work through shading languages and graphics APIs. CUDA changed that with a single thesis: one language, all NVIDIA GPUs. It did not make GPUs faster - it made them accessible, and that accessibility is what triggered the GPU computing boom that underlies much of modern AI.

Neuromorphic computing is at an earlier, messier stage of the same problem. Every chip ships with its own SDK, and Intel's Lava - the most widely used one - is now archived with no public successor yet announced. NeuroCUDA does not claim to solve every chip; it solves the part that's tractable today: PyTorch in, a verified spiking network out, running on GPU, CPU, or a Loihi 2 simulator, with NIR export so the model isn't locked to one tool. Read the full argument for why neuromorphic computing needs its CUDA moment →

Open source & installation

NeuroCUDA is released under the MIT license and is pip-installable. The full source, benchmark scripts, and backend implementations are public on GitHub.

pip install neurocuda

For NIR export, NeuroBench reporting, and the CartPole reinforcement learning demo: pip install neurocuda[all]

Contributions are open, particularly around extending validated hardware support beyond the current GPU, CPU, and Loihi 2 simulator backends.

github.com/Krishnav1/neurocuda pypi.org/project/neurocuda

Frequently asked questions

What is NeuroCUDA?

NeuroCUDA is an open-source, pip-installable compiler that converts trained PyTorch models into spiking neural networks and deploys them to GPU, CPU, and a Loihi 2 IF-neuron simulator. It uses QCFS calibration plus BPTT fine-tuning to keep the ANN-to-SNN conversion accurate, and exports to NIR for cross-platform deployment.

What accuracy does NeuroCUDA achieve when converting ANNs to SNNs?

On a 3-layer CNN trained on N-MNIST, the converted spiking network reaches 99.88% ± 0.02% accuracy versus a 99.70% ANN baseline - the SNN slightly beats the ANN. On ResNet-18 trained on CIFAR-10, the converted SNN reaches 94.61% ± 0.14% against a 95.56% ANN baseline, a gap of 0.95 percentage points. All numbers are means over 3+ seeds on full test sets.

Which hardware can NeuroCUDA deploy to?

GPU and CPU directly via the PyTorch backend, and a Loihi 2 IF-neuron simulator validated against Intel's published Loihi neuron equations - not the Lava SDK, not physical silicon. NIR export makes models format-compatible with SpiNNaker and other NIR-supporting tools, and FPGA deployment generates HLS C++ as a proof of concept, not yet synthesized to a bitstream.

Is NeuroCUDA free and open source?

Yes - MIT licensed, installable via pip install neurocuda, with full source available at github.com/Krishnav1/neurocuda.

What is NIR and how does NeuroCUDA use it?

NIR (Neuromorphic Intermediate Representation) is a hardware-agnostic graph format for spiking neural networks - the neuromorphic equivalent of ONNX. NeuroCUDA exports converted SNNs to NIR and runs the exported graphs through a custom executor that handles multi-input residual connections (ResNet-style skip connections) via topological sorting, verified bit-exact on a full ResNet-18 round-trip.

Does NeuroCUDA work with ROS2?

Yes - neurocuda_ros2 and neurocuda_msgs wrap this compiler in standard ROS2 nodes and topics. See the NeuroCUDA ROS2 page for the full architecture, message types, pre-trained models, and Docker image.