June 22, 2026

CUDA for Neuromorphic Computing: Where the Analogy Breaks

Calling for "a CUDA for neuromorphic computing" is a useful shorthand for the fragmentation problem. It is not a precise description of what's achievable today, and it's worth being specific about why.

We've made the historical case ourselves, in our earlier piece on neuromorphic computing's CUDA moment: before CUDA, GPU programming meant learning a vendor-specific shading language with no portability across hardware, and neuromorphic computing looks similar today, with each chip vendor shipping its own incompatible SDK. That parallel is genuinely useful for explaining the problem. It is less useful as a roadmap for what a compiler can deliver right now, and conflating the two leads to overclaiming.

Where the analogy holds

Where the analogy breaks

CUDA targeted one vendor's hardware family. Neuromorphic doesn't have that.

CUDA's job was tractable for one key reason: it only ever had to target NVIDIA's own GPUs, which share a common SIMT execution model across generations. A neuromorphic compiler that wants the same reach has to span chips with fundamentally different neuron models, memory architectures, and timing semantics, built by different vendors with no shared execution model to standardize against. NIR, the Neuromorphic Intermediate Representation, addresses the format side of this, but turning a NIR graph into validated, hardware-specific execution is still separate work per chip.
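To make "separate work per chip" concrete, here is a minimal sketch of one vendor-neutral graph being lowered by two backends. Everything in it is hypothetical: the `Node` type is only loosely NIR-like, `chip_a` and `chip_b` are invented targets, and the emitted strings stand in for vendor instructions. It does not reflect the NIR library's API or any real SDK.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Node:
    """One vendor-neutral graph node (hypothetical, loosely NIR-like)."""
    name: str
    kind: str      # e.g. "affine" or "lif"
    params: dict


# Hypothetical per-chip lowering functions. Each one encodes that chip's
# own assumptions; nothing here mirrors a real vendor toolchain.
def lower_for_chip_a(node: Node) -> str:
    if node.kind == "lif":
        # Assumption: chip A stores the leak as a 12-bit integer decay factor.
        decay = round(4096 / node.params["tau"])
        return f"A.neuron(decay={decay}, vth={node.params['v_threshold']})"
    if node.kind == "affine":
        return f"A.synapses(n={node.params['n']})"
    raise NotImplementedError(f"chip A cannot lower {node.kind!r}")


def lower_for_chip_b(node: Node) -> str:
    if node.kind == "lif":
        # Assumption: chip B takes the time constant directly, in timesteps.
        return f"B.lif(tau_steps={node.params['tau']})"
    # Chip B has no lowering for affine layers in this sketch.
    raise NotImplementedError(f"chip B cannot lower {node.kind!r}")


BACKENDS: Dict[str, Callable[[Node], str]] = {
    "chip_a": lower_for_chip_a,
    "chip_b": lower_for_chip_b,
}


def compile_graph(graph: List[Node], target: str) -> List[str]:
    """Lower every node with the chosen backend, failing on unsupported nodes."""
    lower = BACKENDS[target]
    return [lower(node) for node in graph]


graph = [
    Node("fc1", "affine", {"n": 128}),
    Node("lif1", "lif", {"tau": 16, "v_threshold": 64}),
]

print(compile_graph(graph, "chip_a"))
```

The shared graph is the easy part. Each `lower_for_*` function is where chip-specific semantics live, and a backend that cannot express a node has to fail loudly rather than approximate silently.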

CUDA had NVIDIA's hardware roadmap behind it. Neuromorphic compilers don't control the hardware.

CUDA could evolve in lockstep with NVIDIA's own GPU architecture roadmap. A third-party neuromorphic compiler has no equivalent leverage - it has to track whatever vendors decide to ship, support, or archive, as covered in our piece on Intel's Lava framework being archived. That dependency on vendor decisions outside the compiler's control is a structural difference, not a temporary gap.

CUDA could assume one vendor's hardware roadmap. A neuromorphic compiler has to survive several vendors' roadmaps changing independently.

"Validated" means something narrower today than it eventually will.

Take NeuroCUDA as a concrete example, since it's the project behind this comparison: its Loihi 2 backend is a simulator checked against Intel's own published neuron equations across more than 100,000 comparisons with zero deviations. That is a real, specific, useful validation - and it is explicitly not the same claim as "runs on physical Loihi 2 silicon" or "matches Intel's Lava SDK." CUDA's early validation, by contrast, ran on actual GPU silicon from day one, because it only had to target hardware its own creator controlled.
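As a rough illustration of what equation-level validation involves, the sketch below steps a simulated neuron and a reference update side by side and counts mismatches. The update rule is a generic integer leaky integrate-and-fire neuron chosen for the example. It is not Intel's published Loihi 2 equation and not NeuroCUDA's actual test harness, and the names `reference_lif_step`, `SimulatedNeuron`, and `count_deviations` are hypothetical.

```python
import random


def reference_lif_step(v, i_in, decay, v_th):
    """Reference update, written directly from the assumed equation:
    v' = floor(v * (4096 - decay) / 4096) + i_in; spike and reset to 0 if v' >= v_th."""
    v_new = (v * (4096 - decay)) // 4096 + i_in
    if v_new >= v_th:
        return 0, 1
    return v_new, 0


class SimulatedNeuron:
    """Stand-in for a backend simulator's neuron (hypothetical)."""

    def __init__(self, decay, v_th):
        self.decay = decay
        self.v_th = v_th
        self.v = 0

    def step(self, i_in):
        # Same arithmetic as the reference, expressed as a 12-bit right shift.
        self.v = ((self.v * (4096 - self.decay)) >> 12) + i_in
        if self.v >= self.v_th:
            self.v = 0
            return 1
        return 0


def count_deviations(make_neuron, n_comparisons, seed=0):
    """Drive both implementations with identical inputs and count mismatches."""
    rng = random.Random(seed)
    decay, v_th = 256, 1000
    sim = make_neuron(decay, v_th)
    v_ref = 0
    deviations = 0
    for _ in range(n_comparisons):
        i_in = rng.randint(-50, 200)
        v_ref, spike_ref = reference_lif_step(v_ref, i_in, decay, v_th)
        spike_sim = sim.step(i_in)
        if (v_ref, spike_ref) != (sim.v, spike_sim):
            deviations += 1
    return deviations


print(count_deviations(SimulatedNeuron, 100_000))
```

A check like this shows that the simulator agrees with the written equations. It says nothing about whether physical silicon, or a vendor SDK, behaves the same way.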

Property                     | CUDA (2007)                  | Neuromorphic compilers (2026)
Hardware targets             | One vendor's GPU family      | Multiple vendors, divergent architectures
Hardware roadmap control     | Same company as the compiler | Compiler has no control over vendor roadmaps
Physical hardware validation | Native, from launch          | Simulator-validated for most open tooling
Shared intermediate format   | Not needed (single vendor)   | NIR (8 simulators, 5 hardware platforms)

What this means in practice

None of this makes the CUDA analogy useless, and it doesn't make the underlying problem smaller. It means the honest framing is "a compiler that solves the part of this that's tractable today," not "the CUDA of neuromorphic computing" as a finished claim. NeuroCUDA's actual, measured scope - PyTorch model in, validated spiking network out, running on GPU, CPU, and a Loihi 2 simulator, with NIR export for portability - is real progress on the fragmentation problem without overstating what a single open-source project, working without control over any vendor's hardware roadmap, can currently guarantee.

Sources & further reading

  1. NIR specification, neuroir.org, arXiv:2311.14641
  2. lava-nc/lava GitHub repository, archived status observed June 2026
  3. NeuroCUDA source and verified benchmark results, github.com/Krishnav1/neurocuda

Frequently asked questions

Is there a CUDA equivalent for neuromorphic chips?

Not yet, in the full sense of a single toolchain that runs unmodified across every neuromorphic chip the way CUDA runs across NVIDIA GPUs. What exists today is a vendor-neutral graph format, NIR, plus individual compilers like NeuroCUDA that convert PyTorch models for specific backends such as GPU, CPU, and a Loihi 2 simulator.

Why can't one compiler target every neuromorphic chip the way CUDA targets NVIDIA GPUs?

Unlike NVIDIA's GPUs, which share a common SIMT execution model across generations, neuromorphic chips vary much more in their underlying neuron models, memory architectures, and timing semantics. A compiler can target a shared intermediate format like NIR, but turning that into chip-specific, hardware-validated code still requires per-chip work.

What can NeuroCUDA actually deliver today versus what CUDA delivered?

NeuroCUDA converts trained PyTorch models into spiking neural networks and validates them on GPU, CPU, and a Loihi 2 simulator backend, with NIR export for portability. It does not yet provide validated deployment to physical neuromorphic silicon across multiple vendors the way CUDA provided a single validated path across NVIDIA's GPU hardware lineup.