June 20, 2026 (updated June 21, 2026)

Why Neuromorphic Computing Needs Its CUDA Moment

Before 2007, using a GPU for general-purpose computation meant recasting the problem as graphics work in a shading language such as Cg, HLSL, or GLSL. CUDA replaced that with a single thesis: one language, all NVIDIA GPUs. Neuromorphic computing is in a very similar pre-CUDA moment today, and the historical pattern suggests that infrastructure, not silicon, decides who wins next.

In 2006, if you wanted to use a GPU for anything beyond rendering graphics, you had to learn a graphics shading language such as NVIDIA's Cg, Microsoft's HLSL, or OpenGL's GLSL, and then disguise your computation as operations on textures and pixels. There was no widely adopted general-purpose programming model, and what portability existed came from the graphics APIs rather than from anything designed for computation. GPUs were extremely fast at parallel arithmetic, but that speed was locked behind a graphics-specific programming interface that most scientists and engineers had no reason to learn.

NVIDIA shipped CUDA in 2007 with a deceptively simple thesis: one language (C, with extensions), all NVIDIA GPUs. A programmer no longer needed to understand the register layout or pipeline architecture of a specific chip generation. They wrote kernels in C, and the compiler handled the translation to hardware. By 2010, CUDA had made GPU computing accessible to any scientist or engineer with a parallel workload, regardless of graphics background. That accessibility, not a faster chip, is what triggered the GPU computing era that nearly all of modern AI now depends on.

CUDA didn't make GPUs faster; it made them programmable by people who weren't GPU specialists, and that accessibility is what actually mattered.

Neuromorphic computing today, mapped onto 2006

The parallels between neuromorphic computing today and GPU computing in 2006 are close to exact, point for point:

Fragmented toolchains. Intel ships Lava for Loihi. The University of Manchester and TU Dresden ship sPyNNaker for SpiNNaker. BrainChip ships MetaTF for Akida. Each is a fundamentally different API, with different abstractions, different data formats, and a different deployment pipeline end to end.
Hard vendor lock-in. A model built in Lava runs on Loihi. Moving it to SpiNNaker means rewriting it from scratch in sPyNNaker, not porting it. Researchers effectively lock in their hardware choice at the start of a project, long before they know which chip will perform best for their workload.
No shared compiler abstraction. Pre-CUDA GPUs had no intermediate representation standing between the programmer and the silicon, so you programmed close to the metal. Every neuromorphic SDK today has the same gap: hardware-level primitives are exposed directly, with no compiler-level abstraction sitting in between.
The hardware is ready; the software is not. Loihi 2, SpiNNaker-2, and Akida are all shipping, working silicon with demonstrated efficiency advantages, not laboratory curiosities. The software stack around them is roughly fifteen years behind where the GPU computing software ecosystem stood at the same hardware maturity point. For the chip-by-chip detail behind this claim, see our field guide to neuromorphic hardware.

NIR: the intermediate representation, not the compiler

The first piece of the solution already exists: a shared data format. NIR (Neuromorphic Intermediate Representation) defines a standard graph representation for both spiking and non-spiking neural networks, the same functional role that LLVM IR plays for general-purpose compilers, or that ONNX plays for conventional deep learning interchange. A model expressed in NIR can, in principle, be consumed by any backend that implements support for the format.

NIR emerged from the neuromorphic research community and is gaining adoption across groups at Intel, SynSense, and several academic labs. It is important to be precise about what NIR is and is not: it is a format, not a compiler. Having a shared interchange format is necessary but not sufficient, the same way ONNX alone does not deploy a model to a GPU. Something still has to read NIR, perform any required hardware-specific transformations, and generate the actual code that runs on a given chip.
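To make the format-versus-compiler distinction concrete, here is a minimal sketch in plain Python. The `Node` and `Graph` classes, the JSON layout, and the two `emit_*` functions are all invented for illustration; they are not the NIR schema or any vendor's API. The only point is that the graph is inert data until some backend reads it and produces target-specific output.

```python
import json
from dataclasses import dataclass, asdict
from typing import Dict, List, Tuple


# Toy stand-in for an interchange format. This is NOT the real NIR schema.
@dataclass(frozen=True)
class Node:
    name: str
    kind: str          # e.g. "input", "linear", "lif", "output"
    params: Dict


@dataclass(frozen=True)
class Graph:
    nodes: List[Node]
    edges: List[Tuple[str, str]]   # (source_name, destination_name)


def to_json(graph: Graph) -> str:
    """Serialize the graph: this is all a *format* gives you."""
    return json.dumps(asdict(graph), sort_keys=True)


def from_json(text: str) -> Graph:
    """Load a graph back from its serialized form."""
    raw = json.loads(text)
    nodes = [Node(**n) for n in raw["nodes"]]
    edges = [tuple(e) for e in raw["edges"]]
    return Graph(nodes=nodes, edges=edges)


# Two hypothetical backends. Each one reads the same graph and emits
# something different, which is the work a compiler has to do.
def emit_config_lines(graph: Graph) -> List[str]:
    return [
        f"{n.kind.upper()} {n.name} " + json.dumps(n.params, sort_keys=True)
        for n in graph.nodes
    ]


def emit_routes(graph: Graph) -> List[str]:
    return [f"route {src} -> {dst}" for src, dst in graph.edges]


BACKENDS = {"chip_a": emit_config_lines, "chip_b": emit_routes}

graph = Graph(
    nodes=[
        Node("in", "input", {"size": 4}),
        Node("fc", "linear", {"in_features": 4, "out_features": 2}),
        Node("spike", "lif", {"threshold": 1.0}),
        Node("out", "output", {"size": 2}),
    ],
    edges=[("in", "fc"), ("fc", "spike"), ("spike", "out")],
)

# The same serialized graph can be handed to either backend.
restored = from_json(to_json(graph))
for target, emit in BACKENDS.items():
    print(target, emit(restored))
```

Supporting a third target in this sketch means adding one more function to `BACKENDS`; the graph and the code that builds it stay untouched.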

The compiler layer: where NeuroCUDA fits

That compiler layer is what NeuroCUDA is built to be. It takes a standard PyTorch model, converts it to a spiking neural network where the target requires it, quantizes it, represents it in NIR, and generates backend-specific code for each supported target. The key design decisions:

One API surface: neurocuda.compile(model, target="loihi3"). That single call is the entire user-facing interface.
Multi-target by design: Loihi, SpiNNaker, Akida, FPGA, GPU, and CPU from one frontend, with backend-specific code generators behind a shared interface.
Automated ANN-to-SNN conversion: Standard PyTorch models are converted to spiking equivalents automatically, with configurable time steps and threshold tuning rather than a manual, model-by-model conversion process.
Backend abstraction that scales: Adding support for a new chip means writing a new backend code generator, not modifying how users write their models.
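This post does not spell out which conversion method NeuroCUDA uses. One widely used approach to ANN-to-SNN conversion is rate coding, where the firing rate of an integrate-and-fire neuron over a window of time steps approximates a ReLU activation. The sketch below assumes that approach; `if_rate` is a hypothetical helper, not part of NeuroCUDA or any real library, and it shows why both the number of time steps and the threshold are worth exposing as settings.

```python
def relu(x: float) -> float:
    """Reference ANN activation that the spiking neuron should approximate."""
    return max(0.0, x)


def if_rate(input_current: float, threshold: float = 1.0, time_steps: int = 100) -> float:
    """Drive one integrate-and-fire neuron with a constant input and
    return its spike rate scaled by the threshold.

    Hypothetical helper for illustration only.
    """
    membrane = 0.0
    spikes = 0
    for _ in range(time_steps):
        membrane += input_current      # integrate the input
        if membrane >= threshold:      # fire when the threshold is reached
            spikes += 1
            membrane -= threshold      # reset by subtraction keeps the remainder
    return spikes * threshold / time_steps


# More time steps give a finer estimate of the same activation.
for steps in (10, 100, 1000):
    estimate = if_rate(0.3, threshold=1.0, time_steps=steps)
    print(f"T={steps:4d}  relu={relu(0.3):.3f}  spiking estimate={estimate:.3f}")
```

With reset by subtraction, the estimate is off by at most `threshold / time_steps` for inputs between 0 and the threshold, so the number of time steps trades latency for accuracy. Inputs above the threshold saturate at one spike per step, which is why the threshold has to be tuned against the range of activations in the original model.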

Why infrastructure wins, historically

The pattern is consistent across multiple computing eras, not unique to GPUs. x86 became the dominant general-purpose instruction set architecture despite credible arguments that competing ISAs were architecturally cleaner, because of the software ecosystem built around it, not because it was the fastest design on paper. CUDA-era NVIDIA GPUs won the AI compute market in large part because CUDA made parallel programming accessible to a far larger pool of engineers than competing hardware, not solely because NVIDIA's silicon was unambiguously the fastest at every point along the way. TensorFlow and PyTorch became dominant deep learning frameworks by abstracting hardware details away from researchers, not by exposing more of them.

The pattern, stated plainly: a new hardware category appears, fragmentation follows immediately, someone builds the unifying compiler layer, and only then does the category scale by orders of magnitude. Neuromorphic computing today sits between the second and third steps. The hardware exists and works. The fragmentation is real and well documented. The compiler layer is being built by more than one group, including this project.

What comes next

NeuroCUDA is open source under the MIT license, with the implementation and benchmark code public on GitHub and the accompanying research paper submitted to arXiv. It is currently the work of one person rather than a larger team, which shapes the pace but not the rigor: every accuracy and deviation number in the benchmark suite is reproducible from the public repository. The premise behind building it this way is that the CUDA pattern is recognizable enough, and the need for it acute enough, that a working open-source implementation can attract contribution and adoption on its own merits. For more on how this fits alongside QuantaraCore's other products, see Building Memoria, FinInsight, and NeuroCUDA.

Neuromorphic computing will have its CUDA moment eventually; the underlying economics of inference energy efficiency make that close to inevitable, as covered in The AI Energy Crisis Explained. The open question is only which compiler becomes the standard, and where it comes from.

Sources & further reading

NVIDIA CUDA history and architecture documentation - background on the 2007 CUDA launch and its programming model
NIR (Neuromorphic Intermediate Representation) specification - community-maintained format documentation
Intel Lava, sPyNNaker, and BrainChip MetaTF SDK documentation - referenced for current vendor-specific toolchains
NeuroCUDA research paper (submitted to arXiv, June 2026), via github.com/Krishnav1/neurocuda

Frequently asked questions

What did CUDA actually solve for GPU computing?

Before CUDA shipped in 2007, general-purpose work on a GPU had to be written in a graphics shading language such as NVIDIA's Cg or Microsoft's HLSL and expressed as graphics operations. CUDA introduced a single C-based language that compiles for any CUDA-capable NVIDIA GPU, making general-purpose GPU programming accessible without a graphics background.

Why is neuromorphic computing compared to pre-CUDA GPUs?

Each neuromorphic chip vendor ships its own incompatible SDK today, such as Intel's Lava for Loihi, sPyNNaker for SpiNNaker, and MetaTF for Akida, meaning a model built for one chip cannot run on another without a full rewrite, mirroring the GPU landscape before CUDA.

What is NIR in neuromorphic computing?

NIR is a standard graph format for spiking and non-spiking neural networks, functioning like LLVM IR or ONNX for neuromorphic models. It defines the format; a compiler is still required to read it and generate hardware-specific code.

Why does infrastructure usually win over the best individual hardware?

Historically, the winning platform in a new computing category is rarely the fastest chip alone. x86 and CUDA-era GPUs both won significant market share partly through software accessibility, not raw silicon performance alone. The consistent pattern: hardware appears, fragmentation follows, then a compiler layer determines which platform scales.