June 28, 2026

NeuroBench Report Format Explained: Algorithm Track vs System Track

NeuroBench is the closest thing neuromorphic computing has to a standard benchmark format. Most papers that cite it still mix up what its two tracks actually require. Here's the format, explained precisely enough to use.

Direct answer

NeuroBench (Yik et al., Nature Communications, 2025) splits evaluation into two tracks. The Algorithm Track is hardware-independent: accuracy, sparsity, and footprint/operation-count metrics, where modeled or estimated energy is acceptable as a lower-fidelity proxy. The System Track is hardware-dependent: it requires actual measured latency and power on physical hardware, via on-chip instrumentation or external multimeters - a modeled energy number does not qualify as System Track evidence, no matter how carefully it was calculated. A correctly formatted report states which track each number belongs to and never lets a modeled figure pass as a measured one.

Why a standard format matters here specifically

Neuromorphic computing results are unusually easy to misrepresent by accident, because the field mixes software simulators, hardware-validated models, and physical silicon under similar-sounding language. A number computed from a Loihi 2 software simulator and a number measured with a multimeter on physical Loihi 2 silicon can both get described as "Loihi 2 energy" if the report isn't careful - and they are not the same kind of evidence. NeuroBench's two-track structure exists specifically to force that distinction into the open before comparison happens.

Algorithm Track: what's required, what's acceptable

| Metric category | Requirement | Modeled energy OK? |
| --- | --- | --- |
| Accuracy | Full test set, hardware-independent | N/A |
| Sparsity / activation rate | Measured on the actual model | N/A |
| Footprint (param count, memory) | Computed directly from model spec | N/A |
| Energy / operation count | Can be modeled from published per-op constants | Yes - if labeled "modeled" |

This is the track most academic SNN papers actually report under, even when they don't say so explicitly. If your energy number came from multiplying a spike count by a published per-spike energy constant (see our Loihi 2 vs GPU energy breakdown for exactly this calculation), you are reporting Algorithm Track-grade evidence. That's legitimate and useful - it just needs the word "modeled" attached.
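The multiplication described above can be sketched in a few lines of Python. This is a minimal illustration, not NeuroBench's own tooling: `model_energy`, `ModeledEnergy`, and the operation count of 800,000 are hypothetical, and the default constant is the approximate 0.9 pJ per synaptic operation quoted in this article's sample report. Whether the count is spikes or synaptic operations has to match the unit of the constant you cite.

```python
from dataclasses import dataclass

# Approximate per-op figure quoted in this article; swap in the constant you cite.
PJ_PER_SYNAPTIC_OP = 0.9


@dataclass(frozen=True)
class ModeledEnergy:
    microjoules: float
    source: str  # the constant the figure was derived from

    def label(self) -> str:
        # The word "modeled" travels with the number wherever it is printed.
        return f"Energy (modeled): {self.microjoules:.2f} µJ/inference, {self.source}"


def model_energy(op_count: int, pj_per_op: float = PJ_PER_SYNAPTIC_OP) -> ModeledEnergy:
    """Multiply an operation count by a per-op constant; nothing is measured."""
    if op_count < 0:
        raise ValueError("operation count cannot be negative")
    microjoules = op_count * pj_per_op * 1e-6  # 1 pJ = 1e-6 µJ
    return ModeledEnergy(microjoules, f"modeled from ~{pj_per_op} pJ/synaptic-op")


# Hypothetical operation count for one inference, for illustration only.
estimate = model_energy(op_count=800_000)
print(estimate.label())
```

Keeping the label inside the returned object, rather than adding it by hand at write-up time, makes it harder for the figure to lose its "modeled" tag on the way into a table.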

System Track: what's required, what's not acceptable

| Metric category | Requirement | Modeled energy OK? |
| --- | --- | --- |
| Latency | Measured wall-clock on physical hardware | No |
| Power / energy | On-chip instrumentation or external multimeter | No - must be measured |
| Throughput | Measured on the actual deployed system | No |

This is the track that requires a physical chip on a physical bench, drawing physical current, with an instrument reading the result. There is no shortcut from Algorithm Track evidence into System Track evidence - a more careful model, a better-published constant, or a larger spike-count sample doesn't change the fact that nothing was actually measured on hardware.

The two tracks aren't a quality ladder where System Track is "the better version" of Algorithm Track. They're answering different questions: what does the algorithm theoretically cost, versus what did this specific deployed system actually draw.
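One way to make the "no shortcut" rule hard to break by accident is to carry provenance with every metric and refuse modeled values at the System Track boundary. The sketch below is an assumption about how a report pipeline might do this; `Provenance`, `Metric`, and `system_track` are hypothetical names, not part of any NeuroBench API.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    MODELED = "modeled"    # computed from constants or a simulator
    MEASURED = "measured"  # read from an instrument on physical hardware


@dataclass(frozen=True)
class Metric:
    name: str
    value: float
    unit: str
    provenance: Provenance


def system_track(metrics: list[Metric]) -> list[Metric]:
    """Return the metrics unchanged if all were measured; otherwise refuse."""
    modeled = [m.name for m in metrics if m.provenance is not Provenance.MEASURED]
    if modeled:
        raise ValueError(f"not System Track evidence (modeled): {', '.join(modeled)}")
    return metrics


# Illustrative value only: a modeled figure offered as System Track evidence.
modeled_energy = Metric("energy", 0.73, "µJ/inference", Provenance.MODELED)
try:
    system_track([modeled_energy])
    accepted = True
except ValueError as err:
    accepted = False
    print(err)
```

The check deliberately ignores how careful the model was: the only thing that admits a metric to the System Track here is its provenance.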

A correctly formatted report, side by side

## NeuroBench Report — ResNet-18 / CIFAR-10 SNN

### Algorithm Track (hardware-independent)
Accuracy:        94.61% ± 0.14%  (3 seeds, full 10,000-image test set)
Sparsity:         78.2% ± 0.6%   (mean across all layers)
Param count:      11.2M
Energy (modeled): 0.73 µJ/inference
  — modeled from Intel's published ~0.9pJ/synaptic-op figure
  × measured spike count; NOT measured on physical silicon.

### System Track (hardware-dependent)
Status: NOT REPORTED.
Reason: no physical Loihi 2 hardware access; CPU/GPU timing
available but does not constitute System Track neuromorphic evidence.

Notice the System Track section. A correctly formatted NeuroBench-style report is allowed to say "not reported" for System Track metrics. That's not a weakness in the report - it's the honest alternative to quietly filling the gap with a modeled number relabeled as measured.
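A report generator can make "not reported" the default rather than something the author has to remember. The following is a minimal sketch under that assumption; `render_system_track` and its arguments are hypothetical and only mirror the layout of the sample report above.

```python
from typing import Optional


def render_system_track(measured: Optional[dict[str, str]], reason: str = "") -> str:
    """Render the System Track section, falling back to an explicit NOT REPORTED."""
    lines = ["### System Track (hardware-dependent)"]
    if not measured:
        # No hardware measurements: say so instead of leaving the section out.
        lines.append("Status: NOT REPORTED.")
        if reason:
            lines.append(f"Reason: {reason}")
    else:
        width = max(len(name) for name in measured) + 2  # room for ":" and a space
        for name, value in measured.items():
            label = name + ":"
            lines.append(f"{label:<{width}}{value} (measured)")
    return "\n".join(lines)


section = render_system_track(None, reason="no physical Loihi 2 hardware access")
print(section)
```

Because the function never accepts a modeled value, the only two outputs it can produce are measured figures or an explicit gap.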

A checklist before you publish a NeuroBench-style report

  1. Label every energy number "modeled" or "measured" explicitly. Never let the track be implied by context - state it next to the number.
  2. Report accuracy on the full test set, 3+ seeds, as mean ± std. A single-seed or partial-test-set number is not a NeuroBench-grade Algorithm Track result.
  3. If you have no physical hardware, write "System Track: not reported". Don't substitute a CPU/GPU timing number for a missing neuromorphic hardware measurement.
  4. Name the exact constants used for any modeled energy figure. "Modeled from Intel's published ~0.9pJ/synaptic-op figure" is verifiable. "Estimated energy savings" is not.

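The four checklist items are mechanical enough to automate. The sketch below assumes a hypothetical dictionary layout for a draft report (the keys `energy`, `accuracy`, and `system_track` are invented for this example, not a NeuroBench schema) and returns the items the draft fails.

```python
def check_report(report: dict) -> list[str]:
    """Return a list of checklist violations; an empty list means none found."""
    problems = []
    for fig in report.get("energy", []):
        # Item 1: every energy number carries an explicit label.
        if fig.get("provenance") not in ("modeled", "measured"):
            problems.append("energy figure lacks a modeled/measured label")
        # Item 4: modeled figures name the constant they were derived from.
        elif fig["provenance"] == "modeled" and not fig.get("constant"):
            problems.append("modeled energy figure does not name its constant")
    # Item 2: accuracy uses the full test set and three or more seeds.
    acc = report.get("accuracy", {})
    if acc.get("seeds", 0) < 3:
        problems.append("accuracy reported on fewer than 3 seeds")
    if not acc.get("full_test_set", False):
        problems.append("accuracy not reported on the full test set")
    # Item 3: System Track is measured on hardware or explicitly not reported.
    if report.get("system_track") not in ("measured", "not reported"):
        problems.append('System Track must be measured or say "not reported"')
    return problems


# Hypothetical draft that breaks items 2, 3, and 4.
draft = {
    "energy": [{"value_uj": 0.73, "provenance": "modeled", "constant": ""}],
    "accuracy": {"seeds": 1, "full_test_set": True},
    "system_track": "cpu timing",
}
for problem in check_report(draft):
    print("FAIL:", problem)
```

A check like this catches omissions, not misstatements: it cannot tell whether a figure labeled "measured" really came from an instrument.
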
Where NeuroBench is explicit about what it doesn't cover yet

NeuroBench's own published methodology scopes embodied and robotics-style benchmarking as future work rather than something the current standard fully covers - a real-world ROS2 robot pipeline with camera input and closed-loop control sits outside both tracks as currently defined. That's a genuine gap in the standard, not something to paper over; reports describing robotics deployments should say so plainly rather than forcing the result into an Algorithm Track or System Track framing that wasn't designed for it.

Sources & further reading

  1. Yik et al., "The NeuroBench framework for benchmarking neuromorphic computing algorithms and systems," Nature Communications, 2025, DOI 10.1038/s41467-025-56739-4
  2. NeuroCUDA NeuroBench-format report generator, github.com/Krishnav1/neurocuda

Frequently asked questions

What is NeuroBench?

NeuroBench is a standard benchmark framework for neuromorphic computing, published by a multi-institution collaboration in Nature Communications in 2025. It defines a common methodology and metric set so that results from different neuromorphic hardware, algorithms, and research groups can be compared on equal terms, split into a hardware-independent Algorithm Track and a hardware-dependent System Track.

What is the difference between NeuroBench's Algorithm Track and System Track?

The Algorithm Track is hardware-independent and evaluates a model's accuracy, sparsity, and modeled energy/operation counts without requiring physical hardware - estimated or proxy energy figures are acceptable here. The System Track is hardware-dependent and requires actual measured power and latency on real physical hardware via on-chip instrumentation or external multimeters - modeled or estimated energy numbers do not qualify as System Track evidence.

Does NeuroBench require multiple random seeds?

NeuroBench's methodology emphasizes reproducibility and statistical reliability. Standard scientific practice for any benchmark report, which NeuroBench's own reference implementations follow, is to report results as mean and standard deviation across multiple random seeds (commonly three or more) rather than from a single run. This matters most for any result intended for comparison against other methods.
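Producing the mean ± std line takes only Python's standard library. The sketch below uses placeholder per-seed accuracies, not results from any real run, and it assumes the sample standard deviation (`statistics.stdev`); the article does not say whether sample or population deviation is intended, so state which one you used.

```python
from statistics import mean, stdev


def summarize(per_seed: list[float]) -> str:
    """Format per-seed results as mean ± sample standard deviation."""
    if len(per_seed) < 3:
        raise ValueError("need results from at least 3 seeds")
    return f"{mean(per_seed):.2f}% ± {stdev(per_seed):.2f}%  ({len(per_seed)} seeds)"


# Placeholder per-seed accuracies for illustration only.
summary = summarize([94.47, 94.61, 94.75])
print(summary)  # 94.61% ± 0.14%  (3 seeds)
```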