June 22, 2026

NeuroCUDA vs. NIRTorch: Comparing NIR Conversion

NIRTorch and NeuroCUDA both move models between PyTorch and NIR, but they're built for different jobs - one is a general translation helper, the other is a validated conversion-and-deployment pipeline.

NIRTorch is the closest existing tool to NeuroCUDA's NIR support. It is maintained under the Open Neuromorphic community's neuromorphs GitHub organization, the same community that defines the NIR specification itself. It provides two functions, nir_to_torch and torch_to_nir, that map between PyTorch modules and NIR graphs, using torch.fx symbolic tracing to inspect a model's computational graph. That makes it a useful, general-purpose translation layer.

The comparison that matters here isn't "which tool is better" in the abstract. NIRTorch and NeuroCUDA aren't competing for the same job. NIRTorch is a translation primitive: it exists so that any project in the NIR ecosystem has a documented way to move between PyTorch and NIR without writing its own graph-walking code. NeuroCUDA's NIR executor is one specific, narrowly scoped piece of a larger conversion-and-deployment pipeline, built to solve one named correctness problem, executing residual connections correctly, and verified against a specific, reproducible number. Understanding the difference matters if you're choosing a tool, because picking based on name recognition rather than scope leads to using a general translation helper for a job that needs a validated pipeline, or vice versa.

Two different jobs, both touching NIR

NIRTorch: Trained PyTorch model → torch.fx symbolic trace → General NIR graph (developer maps node types) → Equivalent PyTorch module, or onward NIR use

NeuroCUDA: Trained PyTorch model → convert + train as SNN → NIR executor: Kahn's algorithm + multi-input summation → Bit-exact round-trip: 0.000000 max abs diff (ResNet-18)

What NIRTorch does

NIRTorch's job is translation: given a PyTorch module, trace its computational graph with torch.fx and produce the equivalent NIR graph, or given a NIR graph, construct the equivalent PyTorch module. The developer's responsibility is to supply a mapping between NIR node types and PyTorch modules; NIRTorch handles building the graph from that mapping. This is a general capability, not narrowly scoped to any one model architecture, and it's the natural choice if you're working directly with NIR graphs across the broader NIR-compatible tool ecosystem.

torch.fx is PyTorch's own symbolic tracing toolkit: it runs a model's forward pass with proxy objects standing in for tensors to record the sequence of operations performed, then builds an intermediate representation (an fx.Graph) of that sequence. NIRTorch sits on top of this, walking the traced graph and re-expressing each node as the equivalent NIR primitive, or doing the reverse when going from NIR back into PyTorch. Because it relies on tracing rather than parsing the model's source code, it captures the operations the forward pass actually performs, skip connections included. Control flow that doesn't depend on tensor values is resolved once, at trace time; control flow that branches on tensor values can't be followed symbolically and has to be restructured or excluded from the trace. That is why tracing-based approaches in general carry the caveat covered below: a trace records the one path that was taken, not every possible path through the model.
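To make the idea of proxy-based tracing concrete, here is a toy tracer in plain Python. It is an illustrative sketch only, not torch.fx's or NIRTorch's implementation: the names Graph, Proxy, call, trace and residual_block are all hypothetical. It shows how running a function on a stand-in object records a graph, and how a skip connection shows up as a node with two inputs.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    # Each recorded node is a tuple: (name, op, tuple of input names).
    nodes: list = field(default_factory=list)

    def add(self, op, inputs):
        name = f"n{len(self.nodes)}"
        self.nodes.append((name, op, tuple(inputs)))
        return Proxy(self, name)


class Proxy:
    """Stand-in for a tensor: records operations instead of computing them."""

    def __init__(self, graph, name):
        self.graph = graph
        self.name = name

    def __add__(self, other):
        # Addition of two proxies becomes a two-input node in the graph.
        return self.graph.add("add", [self.name, other.name])

    def __bool__(self):
        # A proxy has no concrete value, so value-dependent branching
        # cannot be recorded symbolically.
        raise TypeError("cannot branch on a symbolic value")


def call(op, x):
    """Record a single-input operation, such as a layer call."""
    return x.graph.add(op, [x.name])


def residual_block(x):
    y = call("conv", x)
    y = call("relu", y)
    return y + x  # skip connection: the add node has two inputs


def trace(fn):
    """Run fn once on a placeholder proxy and return the recorded nodes."""
    graph = Graph()
    placeholder = graph.add("input", [])
    fn(placeholder)
    return graph.nodes


for node in trace(residual_block):
    print(node)
```

The last recorded node is the addition, with both the main-path output and the original input listed as its inputs. That merge point is what any downstream executor has to handle correctly.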

This design makes NIRTorch genuinely useful as connective tissue. A research group building a new NIR-native simulator, a hardware team writing a NIR-to-silicon backend, or a tool author who wants PyTorch interoperability without writing graph-translation code from scratch can all depend on NIRTorch as the documented, community-maintained bridge. It isn't trying to validate accuracy, manage backend execution, or guarantee bit-exactness on any particular architecture; those simply aren't its stated job.

What NeuroCUDA's NIR executor does differently

NeuroCUDA takes a narrower, purpose-built approach to one specific part of this problem: executing a NIR graph correctly when it contains residual connections, the skip-connection pattern found in architectures like ResNet where a node has multiple incoming branches that must be summed correctly during execution. NeuroCUDA's NIR executor (nir_executor.py) uses Kahn's algorithm for topological sorting with explicit multi-input summation at branch nodes. The result is verified bit-exact on a full ResNet-18 round-trip: 0.000000 maximum absolute difference between the original model's output and the NIR-round-tripped version's output.

Concretely, the executor's job is to walk a NIR graph in an order that respects dependencies: an operation can't run before all of its inputs are ready, and a residual addition node specifically needs both the skip-connection branch and the main-path branch resolved before it can sum them. Kahn's algorithm does this by repeatedly processing nodes whose dependencies are already satisfied, which naturally handles branching and merging structure without needing special-cased logic for every architecture shape. The "explicit multi-input summation" part matters because a naive executor that assumes one input per node will silently produce wrong results, or simply crash, the moment it hits a node with two incoming edges. NeuroCUDA's executor is written to expect and correctly handle that case.
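The approach described above can be sketched in a few lines of plain Python. This is a minimal illustration of Kahn's algorithm with summation at merge points, not the code in nir_executor.py: the execute function, the dictionary-based graph format and the toy node names are assumptions made for the example.

```python
from collections import deque


def execute(nodes, edges, inputs):
    """Run a DAG in topological order, summing inputs where branches merge.

    nodes:  dict mapping node name -> single-argument callable
    edges:  list of (source, destination) name pairs
    inputs: dict mapping name -> value, for nodes with no incoming edges
    """
    preds = {name: [] for name in nodes}
    succs = {name: [] for name in nodes}
    for src, dst in edges:
        preds[dst].append(src)
        succs[src].append(dst)

    # Kahn's algorithm: start from nodes that have no unmet dependencies.
    indegree = {name: len(p) for name, p in preds.items()}
    ready = deque(name for name, d in indegree.items() if d == 0)
    values = {}

    while ready:
        name = ready.popleft()
        if preds[name]:
            # Explicit multi-input summation: add up every incoming branch.
            arg = sum(values[p] for p in preds[name])
        else:
            arg = inputs[name]
        values[name] = nodes[name](arg)
        for nxt in succs[name]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(values) != len(nodes):
        raise ValueError("graph contains a cycle")
    return values


# Toy residual pattern: the main path (scale, then shift) and the skip
# connection (straight from input) both feed the merge node.
nodes = {
    "input": lambda x: x,
    "scale": lambda x: 2.0 * x,
    "shift": lambda x: x + 1.0,
    "merge": lambda x: x,  # identity; its argument is the summed branches
}
edges = [
    ("input", "scale"),
    ("scale", "shift"),
    ("shift", "merge"),
    ("input", "merge"),
]
result = execute(nodes, edges, {"input": 3.0})
print(result["merge"])  # 10.0: main path gives 7.0, skip branch gives 3.0
```

An executor that passed only the first incoming edge to "merge" would return 7.0 here without raising any error, which is the silent failure mode the summation step exists to prevent.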

The reason this is verified rather than assumed is that NeuroCUDA's test suite runs a real ResNet-18, the same architecture used for its published CIFAR-10 accuracy numbers, through a full convert-to-NIR-and-back round trip, then diffs the output of the original model against the output of the reconstructed model on the same input. A maximum absolute difference of 0.000000 means there is no detectable numerical drift anywhere in that round trip, not an approximation, not "close enough," but bit-exact equivalence on the exact architecture where residual handling is the hard part.
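The check itself is simple to express. The sketch below shows one way to compute a maximum absolute difference between two output vectors in plain Python; the function name max_abs_diff and the sample values are placeholders for illustration, not NeuroCUDA's test code or its measured outputs.

```python
def max_abs_diff(reference, candidate):
    """Largest element-wise absolute difference between two sequences."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same length")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


# Placeholder values standing in for the original model's output and the
# output of the model rebuilt from the NIR round trip.
original_output = [0.125, -1.5, 3.0]
round_tripped_output = [0.125, -1.5, 3.0]

diff = max_abs_diff(original_output, round_tripped_output)
print(f"max abs diff: {diff:.6f}")  # max abs diff: 0.000000

# Six printed decimals would also round a tiny nonzero drift down to
# 0.000000, so an exact comparison is the stricter form of the check.
assert diff == 0.0
```

When running a comparison like this on your own architecture, asserting exact equality is a tighter test than reading a formatted number.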

This isn't a claim that NIRTorch fails on residual networks. It's a claim that NeuroCUDA's executor is independently verified to handle them correctly, with a specific, reproducible number behind it.

Being precise about what's actually documented

It would be easy to overstate this comparison by claiming NIRTorch specifically fails on ResNet-style architectures. That claim doesn't have a citable source - NIRTorch's own documentation doesn't make a specific statement either way about residual network support. What is documented, in a general discussion of torch.fx-style symbolic tracing (the technique NIRTorch is built on), is that determining skip-connection structure at the nn.Module level is a known hard problem in PyTorch generally: because PyTorch uses dynamic graph execution, the only reliable way to recover the full computational graph, including which branches feed into which summation points, is to trace actual execution rather than inspect the module definition statically.

That's a real, general challenge for any tracing-based approach, including NIRTorch's. It is not the same as a confirmed report that NIRTorch itself breaks on a specific architecture. The honest comparison is: NeuroCUDA's executor has a specific, independently verifiable result on this exact case (0.000000 max abs diff on ResNet-18), and that result is worth knowing about regardless of how any other tool handles the same case.

Why hold this line so carefully? Because the SNN tooling space is small enough that an unsourced claim about a specific project spreads fast, gets repeated without the original caveat, and ends up treated as settled fact. Anyone evaluating NIRTorch for their own project deserves a comparison built on what's actually written down, not on an inference dressed up as a finding. If NIRTorch's maintainers publish a specific residual-handling benchmark in the future, that's the number to compare against directly. Until then, the only fair statement is the general one about tracing-based translation, paired with NeuroCUDA's specific, verified number.

Why the distinction between general tooling and verified pipelines matters

This isn't a NIRTorch-specific issue. It's a pattern worth recognizing across the whole NIR ecosystem and, more broadly, across any tool that translates between two representations. A translation layer's job is to be correct for the cases its maintainers have tested and to be honest about the cases they haven't. A deployment pipeline's job is to be correct for the specific path it claims to support, with numbers to back that claim. Neither role is lesser than the other, but conflating them (assuming a general translation tool has been validated the way a narrow, purpose-built pipeline has) is where evaluation mistakes happen.

The same pattern shows up elsewhere in the broader ANN-to-SNN conversion tooling landscape: some tools are general frameworks that leave correctness verification to the user, others ship specific, reproducible benchmark numbers for specific architectures. Reading documentation carefully enough to tell which category a given tool falls into is, in practice, the single highest-leverage step in evaluating any conversion or translation tool in this space.

Property	NIRTorch	NeuroCUDA's NIR executor
Scope	General PyTorch ↔ NIR translation	Purpose-built graph execution, incl. residual nodes
Underlying technique	torch.fx symbolic tracing	Kahn's algorithm topological sort + explicit multi-input summation
Residual/ResNet handling	Not specifically documented	Verified bit-exact, ResNet-18, 0.000000 max abs diff
Maintained by	Open Neuromorphic community (neuromorphs org)	QuantaraCore Technologies LLP

When to use which

If the task is general-purpose PyTorch-to-NIR translation across arbitrary architectures within the broader NIR ecosystem, NIRTorch is a maintained, community-backed tool built exactly for that. If the task is converting a trained PyTorch model, including ones with residual connections, into a validated spiking network with measured accuracy and a verified NIR round-trip, NeuroCUDA's pipeline - covered in our PyTorch SNN tutorial - is built specifically around that workflow, from conversion through validation through export.

A few concrete scenarios make the split clearer. If you're building a new tool that needs to read or write NIR graphs and you just need a dependable PyTorch bridge, depend on NIRTorch rather than writing your own torch.fx walker; that's exactly the kind of duplicated effort NIRTorch exists to prevent. If you're a researcher comparing simulators that all speak NIR, NIRTorch is likely already in your dependency chain indirectly through the tools you're using. If, on the other hand, you have a trained image classifier, possibly with residual blocks, and your goal is to deploy it as a spiking network on a GPU, CPU, or the Loihi 2 simulator with accuracy numbers you can defend, that's the scenario NeuroCUDA's pipeline is built and verified for end to end.

The two aren't mutually exclusive either. Because both tools speak NIR, a model converted and validated through NeuroCUDA can in principle be exported as a NIR graph and handed to NIRTorch-based tooling downstream, or a NIR graph produced elsewhere via NIRTorch could be executed through NeuroCUDA's verified executor if it contains the kind of branching structure that executor was built to handle correctly. NIR's whole purpose as an intermediate representation is to make this kind of interoperability possible rather than locking a model into one tool's internal format.

A practical checklist before relying on either tool

Whichever tool fits your scenario, the same few questions are worth asking before depending on it for production work:

What does the documentation actually claim, not imply? NIRTorch's README describes its two functions and their general behavior; it does not publish architecture-specific benchmark numbers. NeuroCUDA's documentation publishes specific numbers (accuracy, bit-exactness, deviation counts) tied to specific architectures and datasets. Knowing which kind of documentation you're reading prevents over- or under-trusting either one.
Has the specific case you care about been tested, by anyone, with a published number? If your architecture has residual connections and bit-exact round-tripping matters to your use case, NeuroCUDA's ResNet-18 result is a directly applicable, citable number. If your architecture is something neither tool has published numbers for, that's a gap worth testing yourself before deploying.
Who maintains it, and what's their incentive to keep it correct? NIRTorch is maintained by the Open Neuromorphic community as shared infrastructure; no single commercial interest depends on it being right for any one architecture. NeuroCUDA is maintained by QuantaraCore specifically because its accuracy and bit-exactness claims are the product's value proposition, which is a different but equally legitimate kind of incentive to get the numbers right.

Sources & further reading

NIRTorch repository, github.com/neuromorphs/NIRTorch
NIR specification, neuroir.org, arXiv:2311.14641
NeuroCUDA source and NIR executor implementation, github.com/Krishnav1/neurocuda

Frequently asked questions

What is NIRTorch?

NIRTorch is a PyTorch helper package, maintained under the Open Neuromorphic community's neuromorphs GitHub organization, that provides two functions, nir_to_torch and torch_to_nir, to convert between PyTorch modules and NIR graphs. It uses torch.fx symbolic tracing to inspect a model's computational graph.

How does NIRTorch differ from NeuroCUDA's NIR support?

NIRTorch is a general-purpose PyTorch-to-NIR translation helper built on torch.fx symbolic tracing. NeuroCUDA includes a purpose-built NIR executor using Kahn's algorithm for topological sorting with explicit multi-input summation at branch nodes, verified bit-exact on a full ResNet-18 round-trip with 0.000000 maximum absolute difference.

Does NIRTorch support residual or skip-connection architectures like ResNet?

NIRTorch's own documentation does not make a specific claim either way about ResNet support. What is documented generally about torch.fx-style symbolic tracing is that determining skip-connection structure at the nn.Module level is a known hard problem, since PyTorch's dynamic graph execution means the full computational graph can only be recovered by tracing it.