The physical layer of the AI revolution

Training a frontier large language model is no longer a software problem — it is a construction problem. The leading edge of AI in 2026 lives inside purpose-built warehouses humming at hundreds of megawatts, filled with rack-scale "GPUs" the size of refrigerators, plumbed with liquid cooling, and connected by optical fiber that moves data faster than most national networks. Roughly $660–725 billion in combined hyperscaler capital expenditure is flowing into this physical layer in 2026 alone, nearly double 2025's record and approaching the capital intensity of the 1990s telecom buildout. The binding constraints have shifted from chips to power, packaging, and grid interconnects, with transformer lead times now stretching past three years and over half of planned 2026 US data-center capacity reportedly delayed by electrical-equipment shortages. This primer walks through the stack — silicon, networks, buildings, electricity, supply chains, and geopolitics — to give a working vocabulary for how LLMs actually get trained and served.

The accelerators that do the actual math

A modern LLM is, at heart, a very long sequence of matrix multiplications. CPUs, with their handful of complex cores optimized for branchy serial code, are badly suited to this. GPUs and similar accelerators fit because they have thousands of simple arithmetic units doing dense linear algebra in parallel. Within a GPU, Tensor Cores (small dedicated matrix-multiply engines first introduced in NVIDIA's Volta generation in 2017) perform fused multiply-accumulates on 4×4 or 8×8 tiles per cycle, delivering 10–30× more matrix throughput than general-purpose CUDA cores. Each new chip generation has also added narrower number formats (FP16, then FP8 with Hopper in 2022, now FP4 with Blackwell in 2024), and each halving of bit width roughly doubles throughput at little accuracy cost for LLM workloads.
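To make the tile idea concrete, here is a minimal NumPy sketch (not how a GPU actually schedules work) that computes a matrix product as a grid of 4×4 tile multiply-accumulates, C += A_tile · B_tile, with FP16 inputs accumulated in FP32 as mixed-precision Tensor Core paths commonly do. The function name `tiled_matmul` and the fixed tile size are illustrative assumptions; real kernels use larger, hardware-specific tile shapes and run thousands of these tiles in parallel.

```python
import numpy as np

TILE = 4  # illustrative tile edge; Volta-era Tensor Cores operate on 4x4 tiles


def tiled_matmul(a: np.ndarray, b: np.ndarray, tile: int = TILE) -> np.ndarray:
    """Compute a @ b as a grid of small tile multiply-accumulates."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    assert m % tile == 0 and n % tile == 0 and k % tile == 0, "pad to a multiple of the tile size"
    # Accumulate in FP32 even though the inputs are low precision
    c = np.zeros((m, n), dtype=np.float32)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                a_t = a[i:i + tile, p:p + tile].astype(np.float32)
                b_t = b[p:p + tile, j:j + tile].astype(np.float32)
                # One fused multiply-accumulate on a tile: C_tile += A_tile @ B_tile
                c[i:i + tile, j:j + tile] += a_t @ b_t
    return c


rng = np.random.default_rng(0)
a = rng.standard_normal((8, 12)).astype(np.float16)   # FP16 inputs
b = rng.standard_normal((12, 16)).astype(np.float16)
print(np.abs(tiled_matmul(a, b) - a.astype(np.float32) @ b.astype(np.float32)).max())
```

The result matches an ordinary FP32 matmul up to rounding; only the order of the additions differs.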

NVIDIA dominates this market with roughly 80–86% of data-center AI accelerator revenue and ~78% gross margins. The Hopper generation — H100 (80 GB HBM3, 3.35 TB/s memory bandwidth, 700 W, ~$25–40K per unit) and H200 (141 GB HBM3e at 4.8 TB/s) — is still the workhorse, but the Blackwell generation is now the flagship. A B200 package is actually two reticle-limit dies fused into one 208-billion-transistor chip with 192 GB of HBM3e at 8 TB/s. NVIDIA packages two Blackwells with a Grace ARM CPU into a GB200 superchip, then puts 36 of those into a single liquid-cooled, ~120 kW rack called the GB200 NVL72 — 72 GPUs that behave as one giant accelerator with 13.4 TB of shared memory and 130 TB/s of internal bandwidth, marketed (and priced near $3–5M per rack) as "one big GPU for trillion-parameter models." The Blackwell Ultra (GB300) refresh shipping through 2025–26 lifts HBM to 288 GB per GPU. The next generation, Vera Rubin, is on track for the second half of 2026, with Rubin Ultra "Kyber" NVL576 racks projected at ~600 kW each arriving in late 2027, and a Feynman generation pushing toward megawatt-class racks after that.

What makes these chips special compared to a gaming GPU is the combination of those tensor cores, dense interconnect (covered below), and especially HBM, or High Bandwidth Memory. HBM is 3D-stacked DRAM sitting on a silicon interposer right next to the GPU die, with each stack talking to it through a 1,024-bit-wide interface (versus 64 bits for an ordinary DDR5 DIMM). Each generation raises bandwidth sharply: HBM2e at ~2 TB/s on the A100 80GB, HBM3 at 3.35 TB/s on H100, HBM3e at 4.8–8 TB/s on H200/B200, HBM4 at 22 TB/s per Rubin socket. For the token-by-token decoding phase of LLM inference, memory bandwidth, not compute, is the real bottleneck, because generating each token requires streaming the entire model's weights from memory through the math units. This is why an H200 with the same FLOPs as an H100 but 43% more bandwidth delivers 20–45% faster inference, and why every new generation prioritizes HBM upgrades.
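A back-of-envelope bound shows why bandwidth dominates: at batch size 1, each generated token must read every weight once, so tokens per second cannot exceed bandwidth divided by model size in bytes. The sketch below assumes a hypothetical 70B-parameter dense model stored in FP8 (1 byte per parameter) and ignores KV-cache reads and kernel overheads, so real numbers come in lower. The function name is illustrative.

```python
def max_decode_tokens_per_s(params: float, bytes_per_param: float, bandwidth_tb_s: float) -> float:
    """Upper bound on single-stream decode speed when every token streams all weights once.

    Ignores KV-cache reads, kernel overheads, and batching, so real numbers are lower.
    """
    weight_bytes = params * bytes_per_param
    return bandwidth_tb_s * 1e12 / weight_bytes


MODEL_PARAMS = 70e9  # hypothetical 70B-parameter dense model
FP8 = 1.0            # bytes per parameter at FP8

for name, bw in [("H100", 3.35), ("H200", 4.8), ("B200", 8.0)]:
    print(f"{name}: <= {max_decode_tokens_per_s(MODEL_PARAMS, FP8, bw):.0f} tokens/s")
```

The H200/H100 ratio of this bound is exactly the 43% bandwidth gap; measured gains land in a range around it because capacity, batching, and software also change.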

The alternatives are real but trail in ecosystem and deployed volume. Google's TPUs are the only credible non-NVIDIA training platform at frontier scale: Trillium (v6) is the production workhorse, Ironwood (v7) went generally available in November 2025 with 192 GB HBM3e and 9,216-chip "superpods" using optical circuit switches, and Anthropic in October 2025 signed a deal for up to 1 million TPUs and over 1 GW of capacity. AMD has finally become a serious second source with MI300X (192 GB HBM3, adopted by Microsoft, Meta, Oracle), MI325X, the MI350 series (288 GB HBM3e, with AMD claiming competitive or better performance per dollar than B200), and the MI400 "Helios" rack for late 2026, anchored by Meta committing roughly $100 billion to 6 GW of MI400 and OpenAI signing a parallel 6 GW deal. AWS Trainium2 and Trainium3 power Anthropic's Project Rainier, which by late 2025 reportedly crossed 1 million Trainium chips, with the Amazon-Anthropic relationship expanded in April 2026 to a multi-year 5 GW arrangement. Meta's MTIA, Microsoft's Maia 100/200, and OpenAI's Broadcom-codesigned chip (10 GW deployment over 2026–2029) round out a clear industry trend: every hyperscaler now wants its own silicon, primarily for inference, to escape NVIDIA's pricing.

NVIDIA's real moat, though, is CUDA: nineteen years of compounding software including cuDNN (deep-learning kernels), NCCL (multi-GPU collective communication), and TensorRT-LLM (inference optimization), tightly integrated with PyTorch and JAX and surrounded by ecosystem tools such as OpenAI's Triton kernel language, which matured on NVIDIA hardware first. AMD's ROCm has reached rough parity for many common workloads, but specialized kernels like FlashAttention-3 and DeepSeek's FP8 paths still ship CUDA-first. Combined with NVLink/NVSwitch networking inside the rack, the CUDA stack creates compounding lock-in that raw FLOPs cannot easily dislodge.

Networking, or why a cluster has to act like one machine

The reason a 100,000-GPU cluster is interesting is not that it has a hundred thousand independent computers; it is that they all act as one synchronized computer. In standard data-parallel training, every GPU computes gradients on its slice of a batch, then all GPUs exchange and average those gradients in an all-reduce operation before the next step. This happens at every optimizer step, so a single training run repeats it tens of thousands of times or more. If the network can't move data fast enough, all those expensive accelerators sit idle waiting. Tail latency is the central enemy: the slowest GPU on the slowest link drags every other GPU to its pace. The Ultra Ethernet 1.0 specification mentions "tail latency" 47 times for a reason.
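The all-reduce itself can be implemented several ways; a ring is one classic algorithm (NCCL, for instance, uses ring and tree variants). Below is a minimal single-process NumPy simulation, assuming equal-length gradients and a hypothetical function name `ring_allreduce_mean`. A reduce-scatter phase passes partial sums around the ring, then an all-gather phase circulates the finished chunks, after which every worker holds the same average. Each worker sends about 2(N-1)/N times the gradient size regardless of N, which is why per-link bandwidth, not worker count, sets the cost.

```python
import numpy as np


def ring_allreduce_mean(grads: list[np.ndarray]) -> list[np.ndarray]:
    """Average equal-length gradients across workers with a simulated ring all-reduce."""
    n = len(grads)
    # Each worker splits its (copied) gradient into n chunks
    chunks = [np.array_split(g.astype(np.float64), n) for g in grads]

    # Reduce-scatter: at step s, worker r sends chunk (r - s) % n to worker r + 1,
    # which adds it. After n - 1 steps, worker r owns the full sum of chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot all sends first so every transfer in a step happens "simultaneously"
        sends = [(r, (r - step) % n, chunks[r][(r - step) % n].copy()) for r in range(n)]
        for r, idx, data in sends:
            chunks[(r + 1) % n][idx] += data

    # All-gather: circulate completed chunks so every worker ends with all of them
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n, chunks[r][(r + 1 - step) % n].copy()) for r in range(n)]
        for r, idx, data in sends:
            chunks[(r + 1) % n][idx] = data

    # Divide the global sum by the worker count to get the average gradient
    return [np.concatenate(c) / n for c in chunks]


rng = np.random.default_rng(1)
grads = [rng.standard_normal(10) for _ in range(4)]  # 4 hypothetical workers
averaged = ring_allreduce_mean(grads)
print(np.allclose(averaged[0], np.mean(grads, axis=0)))
```

Because every worker must complete each of the 2(N-1) steps before the ring can advance, a single slow link stalls the whole exchange, which is the tail-latency problem in miniature.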

The architecture splits into two layers. Scale-up networking connects GPUs within a tightly coupled domain at near-memory speeds; NVIDIA's NVLink does this at 1.8 TB/s per Blackwell GPU, and NVSwitch crossbar chips fan it out so every GPU in the domain can talk to every other GPU at full rate. Scale-out networking connects domains across the data center using InfiniBand (NVIDIA Quantum-X800 at 800 Gb/s per port) or specialized Ethernet (NVIDIA Spectrum-X, or the open Ultra Ethernet Consortium standard ratified in June 2025 and championed by AMD, Broadcom, Meta, and Microsoft). Per GPU, the scale-up link is roughly 9–18× faster than the scale-out link: NVLink's 1.8 TB/s is a bidirectional total (900 GB/s each way), while a 400–800 Gb/s NIC moves 50–100 GB/s each way. The dominant architectural trend is enlarging the scale-up domain to swallow more of the cluster. Eight years ago a scale-up domain meant 8 GPUs on one server board. The GB200 NVL72 made it 72. Rubin Ultra's Kyber rack targets 576 GPU dies in a single domain in 2027. Bigger domains directly enable larger tensor-parallel groups, bigger MoE expert pools, and longer-context inference with the KV cache living in shared memory.
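To see what the scale-up/scale-out gap means in time, the sketch below applies the bandwidth-only cost of a ring all-reduce (each GPU sends and receives 2(N-1)/N of the payload) to a hypothetical 70B-parameter model's BF16 gradients across 72 GPUs. The per-direction rates are assumptions: 900 GB/s each way for NVLink 5 (half its 1.8 TB/s bidirectional figure) and 100 GB/s each way for one 800 Gb/s NIC per GPU. Latency, switch hops, and overlap with compute are ignored, and the names are illustrative.

```python
def ring_allreduce_seconds(payload_bytes: float, n_gpus: int, bytes_per_s_each_way: float) -> float:
    """Bandwidth-only ring all-reduce time: each GPU sends (and receives) 2*(n-1)/n of the payload."""
    return 2 * (n_gpus - 1) / n_gpus * payload_bytes / bytes_per_s_each_way


# Per-direction link rates (assumptions, see lead-in)
NVLINK5_EACH_WAY = 900e9        # bytes/s: half of the 1.8 TB/s bidirectional figure
NIC_800G_EACH_WAY = 800e9 / 8   # bytes/s: one 800 Gb/s port per GPU

grad_bytes = 70e9 * 2  # hypothetical 70B-parameter model, BF16 gradients (2 bytes each)
for name, bw in [("NVLink 5", NVLINK5_EACH_WAY), ("800G NIC", NIC_800G_EACH_WAY)]:
    print(f"{name}: ~{ring_allreduce_seconds(grad_bytes, 72, bw):.2f} s per full-gradient all-reduce")
```

The roughly 9× difference between the two outputs is the per-direction bandwidth ratio; real training hides much of this communication behind compute, but only while the network keeps pace.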

Beyond a few racks, signals must go optical. Today that means pluggable transceivers in every switch port, but at million-GPU scale they become an enormous power and reliability problem (Jensen Huang has claimed a million-GPU cluster would burn ~180 MW just running its optical transceivers). The fix is co-packaged optics (CPO), where silicon photonics engines bond directly onto the switch ASIC. NVIDIA's Quantum-X Photonics (InfiniBand) was slated to ship in late 2025, with Spectrum-X Photonics Ethernet following in 2026. Lightmatter (now valued at $4.4 billion), Ayar Labs, and Marvell's $3.25 billion acquisition of Celestial AI in February 2026 signal that photonic interconnects are the next architectural revolution after rack-scale GPUs.
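The ~180 MW figure is simple multiplication. The inputs below (about six pluggable transceivers per GPU at roughly 30 W each) are assumptions chosen to reproduce that claim, not measured values, and the function name is illustrative. The point is that optics power scales linearly with GPU count, so any per-port saving from co-packaged optics scales the same way.

```python
def optics_power_mw(n_gpus: int, transceivers_per_gpu: float, watts_each: float) -> float:
    """Total power drawn by pluggable optical transceivers, in megawatts."""
    return n_gpus * transceivers_per_gpu * watts_each / 1e6


# Assumed inputs: ~6 transceivers per GPU at ~30 W each
print(f"{optics_power_mw(1_000_000, 6, 30):.0f} MW")
```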

How the rooms have changed

A traditional enterprise data center was built around racks dissipating 5–15 kilowatts, cooled by air pushed through a raised floor. Even pre-AI hyperscaler racks topped out around 30 kW. A GB200 NVL72 rack draws ~120 kW. A Rubin Ultra Kyber rack will draw ~600 kW. This is not an incremental change — it requires a fundamentally different building.

Air physically cannot remove that much heat from a single 24-inch rack footprint: per unit volume, water carries more than 3,000× as much heat as air (its thermal conductivity is only about 25× higher; the decisive gap is heat capacity). Above about 50–70 kW per rack, direct-to-chip liquid cooling becomes mandatory: cold plates bolted onto GPUs and CPUs, fed by warm water (30–45 °C) circulating through coolant distribution units. Rear-door heat exchangers handle the residual heat the cold plates miss (memory, voltage regulators, NICs). Single-phase immersion (dunking servers in dielectric oil) is growing but niche; two-phase immersion and embedded microfluidics etched into the chip itself are emerging for the post-Rubin generation. The industry consensus, summarized in a Build.inc note that has become a meme in the sector, is that "developers who are not designing for direct liquid cooling are building legacy product."
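The liquid-versus-air gap follows from the heat-balance relation Q = ρ·V̇·c_p·ΔT (heat carried equals density times volumetric flow times specific heat times temperature rise). The sketch below uses round textbook properties for water and air (assumed values) and a hypothetical 10 K coolant temperature rise to size the flow needed for a 120 kW rack.

```python
# Approximate room-temperature properties (assumed round values)
WATER_RHO, WATER_CP = 997.0, 4186.0   # kg/m^3, J/(kg*K)
AIR_RHO, AIR_CP = 1.2, 1005.0         # kg/m^3, J/(kg*K)


def flow_m3_per_s(heat_w: float, rho: float, cp: float, delta_t_k: float) -> float:
    """Volumetric flow needed to carry heat_w with a coolant temperature rise of delta_t_k."""
    return heat_w / (rho * cp * delta_t_k)


RACK_W = 120e3   # GB200 NVL72-class rack
DELTA_T = 10.0   # hypothetical inlet-to-outlet temperature rise, kelvin

water = flow_m3_per_s(RACK_W, WATER_RHO, WATER_CP, DELTA_T)
air = flow_m3_per_s(RACK_W, AIR_RHO, AIR_CP, DELTA_T)
print(f"water: {water * 60_000:.0f} L/min")   # m^3/s converted to litres per minute
print(f"air:   {air:.1f} m^3/s")
print(f"volume ratio air/water: {air / water:.0f}x")
```

Roughly 170 liters of water per minute does the work of nearly 10 cubic meters of air per second; the ratio (about 3,460×) is simply the ratio of the two fluids' volumetric heat capacities.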

That is why old data centers cannot simply be retrofitted for frontier AI. Their floors are not rated for 3,000-lb racks. Their power feeds were sized for 5–10 kW per rack at 208 VAC, not 120 kW on a 50 V DC busbar. They have no liquid plumbing. Their substations were sized for tens of MW, not the gigawatts a frontier campus requires. The industry is responding with greenfield AI campuses designed from first principles around the rack spec: Meta tore down the H-shaped buildings it had already started at Hyperion in Louisiana and rebuilt with long rectangular halls flanked by gas turbines, while xAI retrofitted a Memphis appliance factory in 122 days using "tented" prefab construction that put speed ahead of elegance.
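The power-feed problem is mostly about current. The sketch below compares a legacy 10 kW rack on 208 VAC three-phase (assuming unity power factor, a simplification) with a 120 kW rack on a 50 V DC busbar; the function names are illustrative. Because resistive loss grows with the square of current, thousands of amps call for heavy copper busbars rather than ordinary rack power cords.

```python
import math


def dc_current_a(power_w: float, volts: float) -> float:
    """Current drawn from a DC bus: I = P / V."""
    return power_w / volts


def three_phase_line_current_a(power_w: float, line_volts: float, power_factor: float = 1.0) -> float:
    """Per-phase line current for a balanced three-phase load: I = P / (sqrt(3) * V * PF)."""
    return power_w / (math.sqrt(3) * line_volts * power_factor)


print(f"legacy 10 kW rack, 208 VAC 3-phase: {three_phase_line_current_a(10e3, 208):.1f} A per phase")
print(f"120 kW rack on a 50 V DC busbar: {dc_current_a(120e3, 50):.0f} A")
```

Going from under 30 A per phase to 2,400 A is why rack power delivery had to be redesigned rather than just upsized.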

Water is the next pressure point. Most cooling still uses evaporative towers, with water-use effectiveness (WUE) around 1.5–3 liters per kWh. Google consumed 24 billion liters of water in 2023; a hyperscale AI site can evaporate over a million liters a day. Microsoft has pledged zero-water cooling on all new builds from late 2027, and Singapore and the EU now mandate WUE reporting. Closed-loop liquid cooling, which fills its internal water loop once and recirculates it while rejecting heat through air-cooled chillers outside, is becoming the standard answer in water-stressed regions.
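WUE converts directly into daily volumes: multiply the IT energy drawn per day by liters per kWh. The sketch assumes a hypothetical site with a constant 100 MW IT load and the 1.5–3 L/kWh range quoted above; the function name is illustrative.

```python
def daily_water_liters(it_load_mw: float, wue_l_per_kwh: float) -> float:
    """Water consumed per day = IT energy per day (kWh) x WUE (L/kWh)."""
    kwh_per_day = it_load_mw * 1_000 * 24
    return kwh_per_day * wue_l_per_kwh


for wue in (1.5, 3.0):
    print(f"100 MW at WUE {wue}: {daily_water_liters(100, wue) / 1e6:.1f} million L/day")
```

Even at the low end that is several million liters a day, consistent with the "over a million liters" figure for a hyperscale AI site.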

Power is the binding constraint

Five years ago a "big" data center drew 30–100 MW. The frontier today is 1–2 GW. Meta's Hyperion is designed to scale to 5 GW on a footprint nearly the size of Manhattan. The arithmetic of AI scaling has effectively become a question of who can get electricity built and delivered.

This is why behind-the-meter generation has become a defining feature of the era. xAI's Colossus famously launched with only an 8 MW grid interconnect and bridged the gap with around 35 mobile natural-gas turbines burning methane from a 16-inch pipeline — eventually emitting an estimated 1,200–2,000 tons of NOx per year and triggering an NAACP Clean Air Act suit and an EPA rule change. Behind-the-meter gas is faster than waiting for grid interconnect, which now takes 4–7 years in major markets and longer in Northern Virginia, where Dominion Energy has a 25 GW queue plus another 75 GW with no energization date.

The grid stress is showing up in PJM capacity auctions: the 2025/26 auction cleared at ~9× the prior year's price, and the 2027/28 auction (cleared December 2025) hit the FERC-imposed price cap with PJM falling 6,623 MW short of its reliability target — the first time the entire RTO has undercleared. Data centers are reportedly responsible for roughly 40% of those new capacity costs. In Texas, ERCOT's large-load interconnection queue exploded from 63 GW to 226 GW in a year (~73% data centers), prompting Senate Bill 6 to require remote disconnect capability on new loads.

Nuclear has therefore become a strategic theme:

  • Microsoft + Constellation signed a 20-year PPA in September 2024 to restart Three Mile Island Unit 1 (835 MW, online targeted 2028) — the most concrete near-term nuclear addition in the AI buildout.
  • Amazon + Talen Energy sought 960 MW co-located at the Susquehanna nuclear plant; FERC rejected the amended interconnection agreement in late 2024 and again in early 2025, limiting AWS to its existing 300 MW.
  • Google + Kairos Power, Amazon + X-energy (up to 5 GW by 2039), and Meta's 6.6 GW December 2025 RFP (won by Vistra, Oklo, and TerraPower) are all betting on Small Modular Reactors (SMRs) — but their realistic deployment window is the early 2030s, very likely too late for the 2025–2028 frontier compute race.

Meanwhile transformer lead times have stretched to 128 weeks or more, with the largest units quoted at three years and beyond, and Bloomberg reported in April 2026 that more than half of planned US 2026 data-center capacity (~12 GW) has been delayed or canceled by transformer, switchgear, and battery shortages.

The named buildouts shaping 2024–2026

The frontier buildouts have specific addresses, contracts, and partners worth knowing by name.

OpenAI's Stargate, announced January 2025 as a $500 billion, four-year, ~10 GW commitment with Oracle, SoftBank, and MGX, is anchored at a 1.2 GW Crusoe-built site in Abilene, Texas on a Lancium clean campus, leased through Oracle. The first building came online in September 2025; by Q3 2025 the project announced five additional sites (Shackelford and Milam Counties in Texas, Doña Ana in New Mexico, Lordstown in Ohio, plus a Wisconsin site via Vantage) totaling roughly 7 GW. Stargate UAE with G42 (1 GW initial, inside a planned 5 GW US–UAE AI campus) and Stargate Argentina ($25 billion, 500 MW in Patagonia) followed. A March 2026 setback: OpenAI and Oracle scrapped the planned 1.2→2 GW Abilene expansion, with NVIDIA reportedly placing a $150M deposit to broker the remaining 600 MW to Meta.

xAI's Colossus, in a former Electrolux factory in Memphis, was built in 122 days in 2024 and has grown to roughly 230,000 GPUs (a mix of H100, H200, and GB200) drawing ~300 MW. Construction of Colossus 2 in Whitehaven (and a sister site in Southaven, Mississippi) began in March 2025, targeting 550,000 Blackwell chips and reportedly the first gigawatt-scale single AI cluster. In May 2026, Anthropic agreed to rent the entire Colossus 1 capacity, a remarkable cross-rival deal.

Meta's Hyperion in Richland Parish, Louisiana is a $27 billion JV with Blue Owl Capital targeting 2 GW initially and 5 GW by 2030; its sister site Prometheus in New Albany, Ohio brings ~1 GW online in 2026. Meta's 2026 capex guidance jumped to $115–145 billion, up from $72 billion actual in 2025, and Zuckerberg pledged the company would end 2025 with over 1.3 million GPUs, layered on top of a $100 billion / 6 GW AMD deal and a $21 billion seven-year CoreWeave contract for Rubin capacity.

Anthropic has the most explicitly multi-vendor strategy: AWS's Project Rainier (1M+ Trainium2 chips, anchored by an $11 billion Indiana site, expanded April 2026 to a 5 GW / ~$25 billion arrangement), Google's October 2025 deal for up to 1 million TPUs and over 1 GW (first phase ~400K Ironwood units delivered through Broadcom), and the May 2026 rental of all of Colossus 1. Google itself spent ~$91 billion in 2025 and guided $175–185 billion for 2026 — roughly doubling — and Anthropic is now its largest cloud customer.

Microsoft spent roughly $80 billion on capex in FY2025 and sits on a reported $80 billion backlog of unfilled Azure orders, gated by power rather than chips. The Microsoft-OpenAI relationship was reset in stages: in January 2025 Microsoft gave up its exclusive-provider status in exchange for a right of first refusal on new compute capacity, and the October 2025 restructuring removed that right while keeping Azure exclusivity for OpenAI's API. Together these changes freed OpenAI to contract with Oracle ($300 billion / 5 years), AWS ($50 billion / 2 GW), AMD (6 GW), and Broadcom (10 GW). In parallel Microsoft launched in-house MAI models and integrated Anthropic's Claude into Microsoft 365 Copilot, the first major Copilot diversification away from OpenAI. Oracle, once an afterthought in AI infrastructure, has emerged as Stargate's lead operator and the most aggressive new-entrant hyperscaler.

The supply chain bottlenecks behind every chip

If NVIDIA, AMD, and Google's TPU team all want to ship more silicon, they ultimately all ask the same foundry: TSMC, headquartered in Hsinchu, Taiwan. It fabricates virtually every leading-edge AI chip: NVIDIA Hopper/Blackwell/Rubin, AMD Instinct, Google TPUs, Apple silicon, Broadcom's custom ASICs for Meta and OpenAI. North America was 75% of TSMC's 2025 revenue; HPC alone was 58%. The concentration is the system's single biggest geopolitical risk.

Within TSMC, the tighter chokepoint is not the transistor fabrication itself but CoWoS advanced packaging — the silicon-interposer technology that bonds HBM stacks next to GPU dies. CoWoS capacity is being scaled from roughly 35,000 wafers per month in late 2024 to 130,000 by end of 2026, and NVIDIA has reportedly pre-booked 50–60%+ of TSMC's 2026 CoWoS capacity. TSMC management has openly called CoWoS "oversubscribed through mid-2026," and the TSMC Arizona Fab 21 ramp, accelerated to begin N3 volume production in the second half of 2027, is the most-watched US industrial-policy project in a decade.

HBM is the other rationed component. SK Hynix holds roughly 53–62% market share, supplies the bulk of NVIDIA's needs, was first to ship 12-Hi HBM4 samples, and overtook Samsung as the world's #1 DRAM maker by revenue for the first time in more than three decades. Samsung lost share badly because it repeatedly failed NVIDIA's HBM3e qualification through 2024 and most of 2025, finally passing in September 2025; it remains a strong contender for Rubin-generation HBM4. Micron, the US-based outlier, climbed from a 4% share in early 2024 to roughly 21% by late 2025 and is strategically important for export-controlled deployments. All three vendors are sold out of HBM through 2026, and the memory crunch has spilled over into consumer markets: NVIDIA cut RTX 50-series production 30–40% in early 2026 because GDDR7 memory was being reallocated to AI.

Upstream of TSMC is ASML, the sole supplier of EUV lithography machines, and the only company that can make High-NA EUV (priced at ~$380M per system). EUV machines have never shipped to China, and Dutch DUV-immersion controls tightened in 2024–25; Chinese customers still drove ~42% of ASML's 2025 sales by stockpiling older tools.

Geopolitics, export controls, and the China track

US export controls on advanced AI chips began in October 2022 (banning H100/A100 sales to China), tightened in October 2023 (closing the H800/A800 loophole NVIDIA had designed in response), expanded to include HBM in December 2024, and culminated in Biden's AI Diffusion Rule in January 2025, which sorted the world into three tiers of access. The Trump administration rescinded the Diffusion Rule on May 13, 2025, two days before it would have taken effect, calling it "divorced from commercial reality." The H20 — NVIDIA's China-compliant Hopper variant — was banned in April 2025 (costing NVIDIA a $4.5 billion writedown and ~$8 billion in lost Q2 revenue), then re-allowed for export in July 2025 as part of a US-China rare-earths deal. A subsequent August 2025 arrangement requires NVIDIA and AMD to remit 15% of China AI-chip revenue to the US government in exchange for licenses — politically announced but operationally still being defined.

China's domestic response has accelerated. Huawei's Ascend 910C (a dual-die chiplet on SMIC's 7nm-class process) delivers roughly 60% of an H100's inference performance per chip; the CloudMatrix 384 rack-scale system of 384 Ascend chips, according to SemiAnalysis, beats a GB200 NVL72 on aggregate BF16 FLOPs and HBM capacity at much higher power. SMIC is producing 7nm-class logic using DUV multi-patterning (about 34 lithography steps versus 9 with EUV), with yields reportedly improving but margins compressing. The big symbolic moment was DeepSeek, whose V3 model (671B parameters, only 37B active per token via Mixture-of-Experts) trained on 2,048 H800 GPUs for ~2.79M GPU-hours, roughly $5–6M for the final run and about one-eleventh of the GPU-hours Meta reported for Llama 3 405B. Its R1 reasoning model then triggered a January 27, 2025 selloff that erased ~17% of NVIDIA's market cap in a single day, at the time the largest one-day loss of market value for any US company.
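The DeepSeek cost figure is GPU-hours times an assumed rental rate. The sketch below uses the 2.788M H800 GPU-hours and $2 per GPU-hour that the DeepSeek-V3 technical report itself used, plus the 30.84M H100 GPU-hours that Meta's Llama 3.1 405B model card lists. H800 and H100 hours are not perfectly comparable, and the figure covers only the final run, not research, ablations, or hardware ownership.

```python
DEEPSEEK_V3_GPU_HOURS = 2.788e6   # H800 GPU-hours, DeepSeek-V3 technical report
DEEPSEEK_V3_GPUS = 2_048
LLAMA31_405B_GPU_HOURS = 30.84e6  # H100 GPU-hours, Llama 3.1 405B model card
RENTAL_USD_PER_GPU_HOUR = 2.0     # rental rate assumed in the DeepSeek-V3 report

cost_usd = DEEPSEEK_V3_GPU_HOURS * RENTAL_USD_PER_GPU_HOUR
ratio = LLAMA31_405B_GPU_HOURS / DEEPSEEK_V3_GPU_HOURS
wall_clock_days = DEEPSEEK_V3_GPU_HOURS / DEEPSEEK_V3_GPUS / 24  # if all GPUs ran the whole time

print(f"final-run cost: ${cost_usd / 1e6:.2f}M")
print(f"Llama 3.1 405B used {ratio:.1f}x the GPU-hours")
print(f"~{wall_clock_days:.0f} days on {DEEPSEEK_V3_GPUS} GPUs")
```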

The Middle East is the third pole of the buildout. Trump's May 2025 Gulf trip produced Stargate UAE (1 GW in Abu Dhabi, inside a planned 5 GW US-UAE campus with G42, OpenAI, Oracle, and NVIDIA, paired with a UAE commitment of up to 500,000 advanced chips per year), and Saudi Arabia's Humain — a PIF-owned national champion launched May 12, 2025 that immediately signed for >18,000 GB300s with NVIDIA, a $10 billion / 500 MW AMD partnership, and $5 billion AWS and Qualcomm partnerships. Europe is pursuing a less concentrated but more numerous strategy with 19 EuroHPC AI Factories selected by October 2025 and an InvestAI €20 billion fund for up to five "AI Gigafactories." The UK committed £1 billion to scale the AI Research Resource 20× by 2030; India's IndiaAI Mission has deployed over 38,000 GPUs via Yotta, Tata Communications, Jio, and others; Japan quadrupled its FY2026 chips/AI budget to ¥1.23 trillion ($7.9 billion); and South Korea announced a >260,000 Blackwell GPU rollout at the APEC summit in October 2025, spread across Samsung, SK, Hyundai, NAVER, and the government's National AI Computing Center. The unifying pattern is sovereign compute: every serious country now treats AI hardware as it once treated steel mills.

Why training and inference need different machines

Training and inference look similar on a slide but make very different demands on hardware. Training is one huge job: 10,000 to 200,000+ accelerators executing in lockstep for weeks or months, dominated by synchronous all-reduce collectives where the slowest GPU sets the pace, tolerant of restarts and checkpointing but absolutely intolerant of bandwidth shortfalls. It is fundamentally capex-heavy: you build a billion-dollar cluster and amortize it over a few training runs. The all-in cost of a 100k-H100 training cluster runs $3–5 billion, and the final-run compute cost of a 2025–26 frontier model lands in the $200–500 million range, with Dario Amodei and others projecting $1–3 billion training runs by 2027 and $10 billion-class runs by the end of the decade — though the final training run is only about 10–22% of total R&D compute spend at the frontier.
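A common rule of thumb from the scaling-law literature (not from this primer's sources) puts training compute at about 6·N·D FLOPs, for N parameters active per token and D training tokens; dividing by cluster throughput gives wall-clock time. Every input below is a hypothetical assumption: a 405B dense model, 15T tokens, 100,000 H100s at ~989 TFLOP/s dense BF16 peak, and 40% model FLOPs utilization (MFU). The function name is illustrative.

```python
def training_days(active_params: float, tokens: float, n_gpus: int,
                  peak_flops_per_gpu: float, mfu: float) -> float:
    """Wall-clock days from the ~6*N*D rule of thumb for transformer training FLOPs."""
    total_flops = 6 * active_params * tokens
    cluster_flops_per_s = n_gpus * peak_flops_per_gpu * mfu
    return total_flops / cluster_flops_per_s / 86_400


H100_BF16_DENSE = 989e12  # approximate dense BF16 peak for an H100 SXM, FLOP/s
N_GPUS = 100_000

days = training_days(405e9, 15e12, N_GPUS, H100_BF16_DENSE, 0.40)
gpu_hours = days * 24 * N_GPUS
print(f"~{days:.1f} days, ~{gpu_hours / 1e6:.1f}M GPU-hours")
```

Under these assumptions the run fits in under two weeks on the full cluster; real frontier runs typically take longer because of larger models or datasets, lower utilization at scale, and restarts from checkpoints after failures.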

Inference is millions of small jobs: each request is independent, each user expects sub-second time-to-first-token, and the workload is memory-bandwidth bound during decoding rather than compute-bound, because every generated token must stream the model's weights through the math units. Inference is opex-heavy — costs scale linearly with query volume — and increasingly runs on geographically distributed clusters placed near users, on a different chip mix (often older H100/H200, AWS Trainium2, Google TPU v5e/Trillium, or specialty chips like Groq's SRAM-first LPUs that NVIDIA itself licensed in a $20 billion deal in December 2025).
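Batching is the main lever against the bandwidth wall, and a roofline model shows why. Decode costs roughly 2 FLOPs per parameter per sequence, and a batch of B sequences reuses each weight byte B times, so arithmetic intensity grows with batch size until it crosses the chip's ridge point (peak FLOP/s divided by bytes/s). The sketch assumes BF16 weights (2 bytes), approximate H100 figures, and hypothetical function names, and it ignores KV-cache traffic, which grows with context length and pushes real workloads back toward memory-bound.

```python
def ridge_point(peak_flops: float, bandwidth_bytes_per_s: float) -> float:
    """FLOPs per byte needed before a kernel stops being memory-bound (roofline model)."""
    return peak_flops / bandwidth_bytes_per_s


def decode_intensity(batch: int, bytes_per_param: float) -> float:
    """Approximate FLOPs per weight byte during decode: 2 FLOPs per parameter per sequence.

    KV-cache reads are ignored, which flatters long-context decode.
    """
    return 2 * batch / bytes_per_param


H100_PEAK_BF16 = 989e12   # approximate dense FLOP/s
H100_HBM = 3.35e12        # bytes/s

ridge = ridge_point(H100_PEAK_BF16, H100_HBM)
for batch in (1, 32, 256, 512):
    intensity = decode_intensity(batch, 2)
    bound = "memory" if intensity < ridge else "compute"
    print(f"batch {batch:>3}: {intensity:.0f} FLOP/B vs ridge {ridge:.0f} -> {bound}-bound")
```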

Two trends have made the training/inference distinction matter more than it used to. First, reasoning models (OpenAI's o1/o3, DeepSeek R1, Claude with extended thinking) generate 10–100× more tokens per query than one-shot chat. Epoch AI estimates ~30× cost-per-useful-output for o1 versus GPT-4o, and Jensen Huang has escalated his forecast for inference compute growth from "100× more" to "a billion times" (directional rhetoric, but the trend is real). Epoch AI's careful estimate is that inference will be 50–80% of total AI compute by 2030, dwarfing training. Second, Mixture-of-Experts models like DeepSeek V3/R1 keep all 671B parameters resident in HBM but activate only ~37B per token, which makes them memory-capacity-hungry rather than compute-hungry, and shifts hardware design toward chips with more, faster HBM. NVIDIA's response is the Rubin CPX, a purpose-built chip for the compute-heavy prefill phase of long-context inference that uses cheaper GDDR7 memory and is paired with conventional Rubin GPUs handling decode. It formalizes the prefill/decode disaggregation explored by research systems such as Splitwise and DistServe; Splitwise reported 1.4× higher throughput at 20% lower cost.
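The MoE trade-off is easy to quantify with DeepSeek's published parameter counts. The sketch assumes FP8 weights (1 byte per parameter) and counts weights only, so KV cache, activations, and replication for throughput would all raise the real GPU count.

```python
import math

TOTAL_PARAMS = 671e9    # DeepSeek-V3/R1 total parameters
ACTIVE_PARAMS = 37e9    # parameters activated per token
BYTES_PER_PARAM = 1.0   # assumes FP8 weights; KV cache and activations not counted

weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9
for name, hbm_gb in [("H100", 80), ("H200", 141), ("B200", 192)]:
    print(f"{name}: at least {math.ceil(weights_gb / hbm_gb)} GPUs just to hold the weights")

# Per-token compute tracks active parameters (~2 FLOPs per active parameter)
print(f"FLOPs/token: MoE {2 * ACTIVE_PARAMS:.2e} vs dense-equivalent {2 * TOTAL_PARAMS:.2e}")
print(f"share of weights used per token: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
```

Holding the model takes several GPUs' worth of HBM, yet each token exercises only about 5.5% of those weights, which is why MoE serving leans on memory capacity and bandwidth rather than raw FLOPs.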

The bottlenecks worth watching

The forward-looking story is dominated by three races and a handful of physical constraints.

The first race is for the gigawatt cluster. Colossus 2 claims first-to-1-GW in early 2026 (though Tom's Hardware analysis of cooling capacity suggests a phased ramp closer to 350 MW). Stargate Abilene, Hyperion, Prometheus, and Microsoft's Fairwater all follow. Beyond that, hyperscalers are experimenting with multi-datacenter training — Google trained Gemini Ultra across multiple sites linked by optical circuit switches; OpenAI's "MegaScaler" extends synchronous training across campuses; the industry is increasingly looking at asynchronous gradient descent to break the speed-of-light synchronization barrier across long-haul fiber.
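The synchronization barrier between sites is set by physics: light in optical fiber travels at roughly c/1.47, about 5 µs per kilometer one way. The sketch below (fiber index assumed, switching and queuing ignored, function name illustrative) computes round-trip propagation delay, which every synchronous step that crosses sites must pay at least once.

```python
C_VACUUM_KM_S = 299_792.458
FIBER_INDEX = 1.47  # typical group index of single-mode fiber (assumed)


def fiber_rtt_ms(route_km: float) -> float:
    """Round-trip propagation delay over fiber, ignoring switching and serialization."""
    one_way_s = route_km / (C_VACUUM_KM_S / FIBER_INDEX)
    return 2 * one_way_s * 1e3


for km in (1, 100, 1_000):
    print(f"{km:>5} km: {fiber_rtt_ms(km):.3f} ms round trip")
```

Around 10 ms per 1,000 km round trip can become a meaningful slice of a training step that exchanges data repeatedly, which is why multi-site training favors fewer, larger synchronization points or asynchronous updates.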

The second race is for inference-optimized silicon: Rubin CPX, Trainium3, Maia 200, the Broadcom-OpenAI accelerator, and the Groq licensing deal all reflect a bifurcation where the optimal training chip and the optimal inference chip diverge sharply.

The third race is for optical interconnect at chip scale, where co-packaged optics from NVIDIA, Lightmatter, Ayar Labs, and Marvell/Celestial AI promise to remove the power and reliability tax of pluggable transceivers at million-GPU scale.

The constraints surrounding all three races are mostly physical and slow-moving:

  • Grid interconnect queues of 4–7 years in major markets and over 2,100 GW of pending interconnections nationally
  • Transformer and switchgear lead times of 36–48 months, plus waits of up to 7 years for large gas turbines, delaying an estimated half of planned 2026 US data-center capacity
  • HBM and CoWoS allocation sold out through 2026, with NVIDIA having pre-booked over half of TSMC's 2026 packaging capacity

Hyperscaler combined capex of ~$220 billion in 2024, ~$400 billion in 2025, and a guided $660–725 billion in 2026 is now consuming roughly 2.2% of US GDP — a capital-intensity level that approaches the 1990s telecom buildout and exceeds anything in modern tech. Capital intensity at the Big Five has jumped from under 20% of revenue historically to 45–57%, financed by record bond issuance (>$100 billion in 2025 alone).

What this means

The most important reframing in this primer is simple: AI is no longer software running on commodity infrastructure; it is a co-designed physical system in which the building, the power plant, the cooling loop, the network fabric, and the chip are now a single product. NVIDIA's GB200 NVL72 is the clearest expression — a 120 kW rack that is sold, conceptually, as one GPU. Rubin Ultra's 600 kW Kyber rack extends the principle. The constraint binding the next leg of AI scaling is not algorithmic cleverness or even chip design — it is electricity, transformers, packaging capacity, and skilled construction labor, none of which scale on Moore's-law timelines.

A few specific implications follow. NVIDIA's moat is wider than market share alone suggests, because CUDA, NVLink, and ecosystem lock-in compound; AMD, Google, and the custom hyperscaler chips will steal share at the margin, especially for inference, but a full ecosystem displacement is unlikely this decade. The DeepSeek moment was real but partially misread: efficiency gains compress the cost of training a given capability, but Jevons-paradox dynamics around reasoning models are pushing total inference demand vastly higher, which is exactly why hyperscaler capex accelerated rather than slowed after January 2025. The geopolitical center of gravity is shifting: the US-China binary of 2022 has become a multi-polar geometry in which Gulf sovereign wealth, EU industrial policy, Korean industrial conglomerates, and Japanese state subsidy are all serious players, and where export controls have shown themselves to be a leaky, frequently-renegotiated instrument rather than a wall.

Most fundamentally, the bottleneck has migrated. In 2022 it was design. In 2023 it was wafers. In 2024 it was HBM and CoWoS. In 2025 it was power. In 2026 it is electrical equipment — transformers, switchgear, busbars — and the multi-year grid interconnect queue. The frontier AI race is now, to a first approximation, a race for hundred-MW substations and thousand-MW power purchase agreements. Whoever wins the next leg of the LLM race will likely be whoever first solves the boring industrial problems of moving electrons reliably into a building full of liquid-cooled silicon — and then teaching that silicon to think in lockstep.