The GPU is not the only answer anymore
GPUs dominate AI compute today. But as inference becomes a major cost center and energy constraints tighten, three alternative paradigms are competing for the next generation of AI hardware.
Each uses different physics. Each has different tradeoffs. And each is at a very different stage of maturity.
The four paradigms at a glance
| Paradigm | Core idea | Best AI workloads | Energy profile | Maturity |
|---|---|---|---|---|
| Classical (GPU/TPU) | Deterministic logic, massive parallelism | Training + inference, all architectures | High, scaling with model size | Dominant, decades of tooling |
| Quantum | Exploits quantum superposition and entanglement | Optimization, chemistry, specific sampling tasks | Low per operation, high for cooling | Narrow, error-prone, improving |
| Neuromorphic | Event-driven, spike-based, brain-inspired | Edge inference, sensory processing, sparse workloads | Very low | Niche, commercially available but limited ecosystem |
| Thermodynamic | Uses thermal noise as computational resource | Probabilistic inference, generative AI, uncertainty | Potentially very low for inference | Early prototypes, first chips taping out |
Classical: dominant but power-bound
GPUs (NVIDIA H100/B200, AMD MI300X) and TPUs (Google) are the workhorses. The ecosystem is deep: CUDA, PyTorch, JAX, optimized compilers, massive cloud availability.
The problem is energy. Training a frontier model costs millions in compute, and inference at scale is a growing operational expense: every token generated carries a cost in electricity, cooling, and hardware depreciation.
| Metric | Current state |
|---|---|
| Training cost for frontier models | $100M+ for the largest runs |
| Inference cost per 1M tokens | $0.03 (Mistral Small) to $25 (GPT-4.1 input) |
| Energy per inference | Scaling with model size and context length |
| Tooling maturity | Decades of optimization, massive ecosystem |
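The per-1M-token prices above translate directly into operating cost. A quick sketch (the daily token volume is an illustrative assumption; only the per-1M-token prices come from the table):

```python
def monthly_inference_cost(tokens_per_day: float, price_per_million: float) -> float:
    """Estimate monthly spend from daily token volume and a $/1M-token price."""
    return tokens_per_day / 1_000_000 * price_per_million * 30

# Hypothetical volume: 500M tokens/day
low = monthly_inference_cost(500_000_000, 0.03)   # Mistral Small-class pricing
high = monthly_inference_cost(500_000_000, 25.0)  # GPT-4.1-class input pricing
print(f"${low:,.0f} vs ${high:,.0f} per month")   # → $450 vs $375,000 per month
```

The three-orders-of-magnitude spread at identical volume is why model choice, not just hardware choice, dominates inference economics today.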
Classical compute will remain dominant for years. But the economics are pushing teams toward alternatives for specific workloads.
Quantum: promising but narrow
Quantum computing uses superposition and entanglement to process information in ways classical systems cannot efficiently simulate. For certain problems (optimization, chemistry simulation, specific sampling), quantum has theoretical advantages.
| Strength | Constraint |
|---|---|
| Exponential speedup for specific algorithms | Decoherence limits computation time |
| Active investment from Google, IBM, Microsoft, startups | Error correction requires massive qubit overhead |
| Quantum advantage demonstrated for narrow tasks | Cryogenic cooling at near absolute zero |
| | Programming model fundamentally different from classical |
For AI specifically, quantum is not yet competitive for training or general inference. The most realistic near-term applications are in optimization, drug discovery, and materials science, not in running transformers.
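Superposition and entanglement can be illustrated with a tiny state-vector simulation, sketched here in plain Python (no quantum SDK assumed):

```python
import math

# Two-qubit state vector over the basis |00>, |01>, |10>, |11>
state = [1.0, 0.0, 0.0, 0.0]  # start in |00>

# Hadamard on the first qubit: |00> -> (|00> + |10>)/sqrt(2)  (superposition)
h = 1 / math.sqrt(2)
state = [h * (state[0] + state[2]), h * (state[1] + state[3]),
         h * (state[0] - state[2]), h * (state[1] - state[3])]

# CNOT with the first qubit as control: swaps |10> <-> |11>  (entanglement)
state = [state[0], state[1], state[3], state[2]]

# Result is the Bell state (|00> + |11>)/sqrt(2): measuring one qubit
# fixes the other, a correlation no classical bit-pair reproduces.
probs = [round(a * a, 3) for a in state]
print(probs)  # → [0.5, 0.0, 0.0, 0.5]
```

Note the catch: simulating n qubits classically takes a 2^n-entry state vector, which is exactly why classical systems cannot efficiently simulate large quantum computations.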
Neuromorphic: efficient but niche
Neuromorphic chips (Intel Loihi 2, IBM NorthPole, SynSense, BrainChip) mimic brain-like computation: event-driven, spike-based, inherently parallel at low power.
| Strength | Constraint |
|---|---|
| Extremely low power consumption | Limited software ecosystem |
| Good for always-on sensory processing | Not competitive for large-model inference |
| Handles sparse, temporal data well | Programming model unfamiliar to most engineers |
| Commercially available (BrainChip Akida, Intel Loihi) | Niche adoption, small community |
Neuromorphic works well at the edge: hearing aids, drones, autonomous sensors, anomaly detection. It is not a replacement for datacenter GPU workloads.
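The event-driven model can be sketched with a leaky integrate-and-fire neuron, the basic unit most spiking chips implement (illustrative constants, not any vendor's actual parameters):

```python
def lif_run(inputs, leak=0.9, threshold=1.0):
    """Leaky integrate-and-fire: the membrane potential decays each step,
    accumulates input current, and emits a spike when it crosses threshold."""
    v, spikes = 0.0, []
    for current in inputs:
        v = v * leak + current   # leak, then integrate
        if v >= threshold:
            spikes.append(1)     # event: the only time real work happens
            v = 0.0              # reset after spiking
        else:
            spikes.append(0)     # no event -> near-zero energy on chip
    return spikes

# Sparse input: the neuron is idle except when events arrive
print(lif_run([0.0, 0.6, 0.6, 0.0, 0.0, 1.2]))  # → [0, 0, 1, 0, 0, 1]
```

Because energy is spent only on spikes, sparse sensory streams (most of the time, nothing happens) are where this architecture wins.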
Thermodynamic: the newest contender
Thermodynamic computing uses thermal noise as a computational resource rather than fighting it. The hardware naturally samples from probability distributions, making it potentially ideal for probabilistic AI workloads.
| Development | Status | Source |
|---|---|---|
| 8-cell proof-of-concept on PCB | Demonstrated Gaussian sampling, matrix inversion, ML primitives | Nature Communications, 2025 |
| Extropic thermodynamic sampling unit | In development; company co-founded by Guillaume Verdon (ex-Google Quantum AI) | WIRED |
| Normal Computing CN101 chip | Taped out August 2025. Targets multimodal diffusion GenAI inference. $85M+ raised | Fortune |
| Berkeley Lab training research | Training required 96 GPUs on Perlmutter, but inference promises very low energy use | Berkeley Lab |
The key tradeoff: expensive digital training for cheap physical inference. Same economic pattern as GPUs, different physics.
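The sampling mechanism can be sketched in software as overdamped Langevin dynamics, where injected noise plays the role the hardware gets for free from thermal fluctuations (a toy 1-D sketch under that analogy, not any vendor's actual design):

```python
import math
import random

def langevin_gaussian(mu, sigma, steps=20000, dt=0.01, seed=0):
    """Sample ~N(mu, sigma^2) via overdamped Langevin dynamics: x drifts down
    the potential U(x) = (x - mu)^2 / (2 sigma^2) while noise kicks it around;
    the stationary distribution of the trajectory is the target Gaussian."""
    rng = random.Random(seed)
    x, samples = mu, []
    for _ in range(steps):
        drift = -(x - mu) / sigma ** 2                 # -dU/dx
        x += drift * dt + math.sqrt(2 * dt) * rng.gauss(0, 1)
        samples.append(x)
    return samples

xs = langevin_gaussian(mu=3.0, sigma=0.5)
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print(f"mean={mean:.2f} var={var:.2f}")  # should land near 3.0 and 0.25
```

On a thermodynamic chip the noisy update happens physically, at near-zero marginal energy per sample, which is the source of the claimed inference efficiency for probabilistic workloads.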
Which paradigm for which workload?
| Workload | Best paradigm today | Why |
|---|---|---|
| Frontier model training | Classical (GPU/TPU) | No alternative has the scale, tooling, or performance |
| High-volume inference | Classical, with thermodynamic potential | GPUs dominate, but energy costs are pushing alternatives |
| Probabilistic/generative inference | Classical, thermodynamic emerging | Thermodynamic hardware natively samples distributions |
| Optimization problems | Classical, quantum emerging | Quantum has theoretical advantages for specific problems |
| Edge/sensor processing | Neuromorphic | Lowest power, always-on, event-driven |
| Drug discovery/materials science | Quantum | Quantum simulation of molecular systems |
Timeline reality check
| Paradigm | When it matters for mainstream AI |
|---|---|
| Classical GPU/TPU | Now and for the foreseeable future |
| Quantum | 2028-2030+ for narrow AI applications |
| Neuromorphic | Available now for edge, unlikely to impact datacenter AI |
| Thermodynamic | 2027-2029 for first commercial niches, 2030+ for broader use |
What this means for engineering teams
If you run AI workloads today, GPUs are the answer. Period.
But if you plan infrastructure for 3-5 years out:
| Action | Why |
|---|---|
| Track thermodynamic computing progress | Biggest potential impact on inference cost |
| Monitor quantum for specific optimization tasks | Not general AI, but valuable for certain domains |
| Consider neuromorphic for edge deployments | Proven technology for low-power always-on |
| Do not bet everything on one paradigm | The computing landscape is diversifying, not converging |
The GPU era is not ending. But the question “what hardware runs inference?” is about to have more than one answer.