NVIDIA SXM
1. SXM Overview
SXM stands for Server PCI Express Module, which is NVIDIA’s proprietary high-bandwidth GPU socket/connector solution designed for mounting data-center-class GPU accelerators directly onto server motherboards.
Core Design Philosophy
- Proprietary: SXM is NVIDIA’s closed proprietary interface standard with undisclosed specifications (requires NDA, Non-Disclosure Agreement), giving NVIDIA complete design freedom
- High Bandwidth: Direct GPU-to-GPU interconnection via NVLink, with bandwidth far exceeding PCIe
- High Power: Not limited by the PCIe standard 75W/300W limits; directly powered through the socket up to 700W-1400W+
- High Density: A single HGX baseboard (NVLink Switch, power delivery, and cooling base) can accommodate 4 or 8 GPUs
- Modularity: GPUs are mounted horizontally as mezzanine cards for easy integration
Why SXM?
Traditional PCIe slots were designed for general-purpose expansion cards (network cards, storage cards, GPUs, etc.) and have the following bottlenecks:
- Power Limitations: PCIe slot standard power delivery is only 75W; even with auxiliary power cables, the PCIe specification restricts the overall power solution
- Bandwidth Limitations: PCIe x16 bandwidth is far lower than NVLink, unable to meet the demands of large-scale multi-GPU parallel training
- Topology Limitations: Under PCIe tree topology, GPU-to-GPU traffic must traverse a PCIe switch or the CPU root complex, resulting in higher latency and limited bandwidth
- Density Limitations: Standard PCIe cards are mounted vertically, occupying significant space, with complex cooling and power delivery design
By breaking these limitations, SXM enables 8 GPUs in DGX/HGX systems to work together as a single giant GPU.
2. SXM Generational Evolution
2.1 SXM1 (Pascal P100, 2016)
SXM was first introduced with the Tesla P100 (GP100 core) in the DGX-1 system.
| Item | Specification |
|---|---|
| Corresponding GPU | Tesla P100 (GP100) |
| Architecture | Pascal |
| Memory | 16GB HBM2 |
| Memory Bandwidth | 720 GB/s |
| TDP | 300W |
| NVLink Version | NVLink 1.0 |
| NVLink Bandwidth | 160 GB/s (4 links, 40 GB/s per link) |
| Launch Product | DGX-1 |
| Process Node | TSMC 16FF+ |
| Transistor Count | 15.3 billion |
P100 was the first GPU equipped with NVLink; the 8 GPUs in DGX-1 were interconnected via NVLink in a hybrid cube mesh topology, with each GPU using its 4 links to reach 4 peers directly.
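The hybrid cube mesh can be sketched as a small graph. The wiring below is an assumption based on the commonly published DGX-1 layout (eight GPUs, two fully connected groups of four, each GPU also linked to its counterpart in the other group); it is not taken from an NVIDIA specification, and the function names are illustrative.

```python
from collections import deque
from itertools import combinations


def hybrid_cube_mesh(gpus_per_group: int = 4) -> dict[int, set[int]]:
    """Build the adjacency of an assumed DGX-1-style hybrid cube mesh."""
    adj: dict[int, set[int]] = {g: set() for g in range(2 * gpus_per_group)}
    # Each group of four is fully connected internally.
    for base in (0, gpus_per_group):
        for a, b in combinations(range(base, base + gpus_per_group), 2):
            adj[a].add(b)
            adj[b].add(a)
    # Each GPU also links to its counterpart in the other group.
    for g in range(gpus_per_group):
        adj[g].add(g + gpus_per_group)
        adj[g + gpus_per_group].add(g)
    return adj


def max_hops(adj: dict[int, set[int]]) -> int:
    """Return the largest shortest-path length between any two GPUs (BFS)."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        worst = max(worst, max(dist.values()))
    return worst


mesh = hybrid_cube_mesh()
links_per_gpu = {gpu: len(peers) for gpu, peers in mesh.items()}
print(links_per_gpu)   # each GPU uses 4 links, matching NVLink 1.0
print(max_hops(mesh))  # any GPU reaches any other in at most 2 hops
```

Under this assumed wiring, every GPU consumes exactly the four NVLink 1.0 links listed in the table, and traffic between non-adjacent GPUs needs one intermediate GPU.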
2.2 SXM2 (Volta V100 16GB, 2017)
| Item | Specification |
|---|---|
| Corresponding GPU | Tesla V100 16GB (GV100) |
| Architecture | Volta |
| Memory | 16GB HBM2 |
| Memory Bandwidth | 900 GB/s |
| TDP | 300W |
| NVLink Version | NVLink 2.0 |
| NVLink Bandwidth | 300 GB/s (6 links, 50 GB/s per link) |
| Connector | Amphenol MEG-Array 400-pin |
| Launch Product | DGX-1 V100 |
| Process Node | TSMC 12FFN |
| Transistor Count | 21.1 billion |
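V100 introduced Tensor Cores, which NVIDIA credited with up to a 12x improvement in peak deep-learning training throughput compared to P100. SXM2 used a 400-pin Amphenol MEG-Array connector. DGX-1 V100 kept the hybrid cube mesh topology, using the two extra links per GPU to double up selected connections; a fully connected topology arrived with NVSwitch in DGX-2.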
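A quick way to see why topology depends on link count: a direct, switchless full mesh among n GPUs needs n - 1 links on every GPU. The sketch below applies that rule to the link counts in the tables above (4 for NVLink 1.0, 6 for NVLink 2.0); the helper name is illustrative.

```python
def can_direct_full_mesh(num_gpus: int, links_per_gpu: int) -> bool:
    """A switchless full mesh needs one link to each of the other GPUs."""
    return links_per_gpu >= num_gpus - 1


LINKS_PER_GPU = {"NVLink 1.0 (P100)": 4, "NVLink 2.0 (V100)": 6}

for name, links in LINKS_PER_GPU.items():
    for num_gpus in (4, 8):
        verdict = "possible" if can_direct_full_mesh(num_gpus, links) else "not possible"
        print(f"{name}: direct full mesh of {num_gpus} GPUs is {verdict}")
```

With 6 links per GPU, four GPUs can be fully meshed directly, but eight cannot (7 links would be needed), which is the gap a switch chip such as NVSwitch fills.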
2.3 SXM3 (Volta V100 32GB, 2018)
| Item | Specification |
|---|---|
| Corresponding GPU | Tesla V100 32GB (GV100) |
| Architecture | Volta (same core as V100) |
| Memory | 32GB HBM2 |
| Memory Bandwidth | 900 GB/s |
| TDP | 350W |
| NVLink Version | NVLink 2.0 |
| NVLink Bandwidth | 300 GB/s |
| Power Architecture | 48V input (different from 12V) |
| Connector | Updated Amphenol MEG-Array (more robust) |
| Launch Product | DGX-2 |
Key changes in SXM3:
- Connector Upgrade: While physically similar to SXM2, it used a more robust MEG-Array connector (different pin configuration)
- 48V Power Delivery: This was the biggest architectural change — switching from traditional 12V to 48V power architecture, significantly reducing current losses
- Vicor Modules: Introduced Vicor MCM/MCD (Modular Current Multiplier / Modular Current Driver) power modules, which became standard design in subsequent SXM4/5/6
- TDP Increase: Raised from 300W to 350W, laying the foundation for higher-power GPUs
SXM3 was also the first time NVIDIA shipped standardized HGX baseboards to OEMs. OEMs could directly purchase pre-assembled 4-GPU baseboards, greatly reducing integration complexity.
A note on SXM numbering: sources are inconsistent about the P100. Its module is variously described as plain “SXM” (no suffix) and as “SXM2”; in DGX-1 the P100 socket was referred to as the SXM2 specification because the system could be upgraded to V100 SXM2 modules. Wikipedia, for example, lists the socket used by P100 as SXM while listing the P100 module itself as SXM2. A simpler way to read the generations is: P100 is SXM1 (first generation), V100 16GB is SXM2, and V100 32GB is SXM3.
2.4 SXM4 (Ampere A100, 2020)
| Item | Specification |
|---|---|
| Corresponding GPU | A100 40GB / 80GB (GA100) |
| Architecture | Ampere |
| Memory | 40GB HBM2 / 80GB HBM2e |
| Memory Bandwidth | 1.6 TB/s (HBM2) / 2.0 TB/s (HBM2e) |
| TDP | 400W |
| NVLink Version | NVLink 3.0 |
| NVLink Bandwidth | 600 GB/s (12 links, 50 GB/s per link) |
| Launch Product | DGX A100 |
| Process Node | TSMC N7 |
| Transistor Count | 54.2 billion |
| MIG | Supported (up to 7 instances) |
| NVSwitch | 2nd Gen (SHARP in-network reduction arrived with the 3rd Gen on Hopper) |
A100 was the first GPU to support MIG (Multi-Instance GPU), capable of partitioning a single GPU into up to 7 independent instances. NVIDIA offered two HGX A100 baseboards for SXM4: a 4-GPU design (code-named Redstone) that links its GPUs directly over NVLink, and an 8-GPU design (code-named Delta) that adds NVSwitch chips. DGX A100 uses the single 8-GPU baseboard.
NVLink 3.0 provided 600 GB/s bidirectional bandwidth per GPU, working with NVSwitch to achieve a fully-interconnected topology for 8 GPUs.
2.5 SXM5 (Hopper H100/H200, 2022/2023)
| Item | Specification |
|---|---|
| Corresponding GPU | H100 (GH100) / H200 |
| Architecture | Hopper |
| Memory | 80GB HBM3 (H100) / 141GB HBM3e (H200) |
| Memory Bandwidth | 3.35 TB/s (H100) / 4.8 TB/s (H200) |
| TDP | 700W |
| NVLink Version | NVLink 4.0 |
| NVLink Bandwidth | 900 GB/s (18 links, 50 GB/s per link) |
| Launch Product | DGX H100 |
| Process Node | TSMC 4N |
| Transistor Count | 80 billion |
| New Features | Transformer Engine, FP8, DPX instruction set |
SXM5 is the most widely deployed SXM specification to date. Key breakthroughs of H100 SXM5:
- NVLink 4.0 Bandwidth: 900 GB/s bidirectional, about 7 times the bidirectional bandwidth of PCIe 5.0 x16 (roughly 64 GB/s per direction, 128 GB/s both ways)
- 900W Power Delivery Capability: Although nominal TDP is 700W, the SXM5 socket’s power delivery capability can reach 900W
- Transformer Engine: Mixed-precision matrix operation unit designed specifically for large language models (LLMs)
- Third-Generation NVSwitch: 4 NVSwitch chips fully interconnect 8 GPUs, with aggregate bidirectional bandwidth of 7.2 TB/s (8 × 900 GB/s)
- H200 Update: Same SXM5 socket, but memory upgraded to 141GB HBM3e, bandwidth 4.8 TB/s
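The per-GPU NVLink figures in the generation tables are simply link count times per-link bandwidth, and the PCIe comparison depends on whether one or both directions are counted. The sketch below reproduces both; the PCIe 5.0 figure is derived from its 32 GT/s lane rate and 128b/130b encoding, and all names are illustrative.

```python
# (links per GPU, bidirectional GB/s per link) as listed in the generation tables
NVLINK_GENERATIONS = {
    "NVLink 1.0 (P100)": (4, 40),
    "NVLink 2.0 (V100)": (6, 50),
    "NVLink 3.0 (A100)": (12, 50),
    "NVLink 4.0 (H100)": (18, 50),
}


def nvlink_total_gbs(links: int, per_link_gbs: int) -> int:
    """Aggregate bidirectional NVLink bandwidth per GPU."""
    return links * per_link_gbs


def pcie_x16_gbs_per_direction(gt_per_s: float = 32.0, lanes: int = 16) -> float:
    """Usable PCIe bandwidth in one direction: lane rate x lanes x 128b/130b / 8 bits."""
    return gt_per_s * lanes * (128 / 130) / 8


totals = {name: nvlink_total_gbs(*spec) for name, spec in NVLINK_GENERATIONS.items()}
h100_gbs = totals["NVLink 4.0 (H100)"]
pcie_one_way = pcie_x16_gbs_per_direction()

ratio_vs_one_direction = h100_gbs / pcie_one_way           # roughly 14x
ratio_vs_both_directions = h100_gbs / (2 * pcie_one_way)   # roughly 7x
dgx_h100_aggregate_tbs = 8 * h100_gbs / 1000               # eight GPUs on one baseboard

for name, total in totals.items():
    print(f"{name}: {total} GB/s")
print(f"H100 NVLink vs PCIe 5.0 x16, both directions: {ratio_vs_both_directions:.1f}x")
print(f"8-GPU aggregate: {dgx_h100_aggregate_tbs} TB/s")
```

Comparing NVLink's bidirectional 900 GB/s against a single PCIe direction gives about 14x; comparing like with like (both directions) gives about 7x.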
SXM5 power delivery remains based on Vicor’s 48V architecture, using Vicor MCM/MCD modules to convert 48V to GPU core voltage.
2.6 SXM6 (Blackwell B200/B300, 2024/2025)
| Item | Specification |
|---|---|
| Corresponding GPU | B200 / B300 (GB100/GB300) |
| Architecture | Blackwell / Blackwell Ultra |
| Memory | 192GB HBM3e (B200) / 288GB HBM3e (B300) |
| Memory Bandwidth | 8 TB/s (B200) / 8 TB/s (B300) |
| TDP | 1000W-1200W (B200) / 1400W (B300) |
| NVLink Version | NVLink 5.0 |
| NVLink Bandwidth | 1.8 TB/s |
| Process Node | TSMC 4NP |
| Transistor Count | 208 billion (dual-die design) |
| Launch Product | DGX B200 / GB200 NVL72 |
SXM6 represents the biggest leap in SXM history:
- Dual-Die Packaging: B200 is composed of two GB100 dies joined by NVIDIA’s NV-HBI die-to-die interface (10 TB/s) in a CoWoS-L package; a single GPU is equivalent to two 104-billion-transistor chips
- Power Explosion: 1000W (air-cooled) / 1200W (liquid-cooled) — the power consumption of a single B200 GPU exceeds the total system power of many home PCs
- B300 Goes Further: 1400W TDP, 288GB HBM3e (12-layer stack), 15 PFLOPS FP4
- NVLink 5.0: Bidirectional 1.8 TB/s, supporting SHARP v4 in-network reduction
- Liquid Cooling Becomes Mandatory: B200 at 1000W TDP can still be air-cooled, but B300 at 1400W essentially requires liquid cooling
- Socketed Design Rumors: In October 2024, TrendForce reported that B300 may be the first to adopt a socketed design, allowing the GPU module to be user-replaceable (no longer soldered to the baseboard)
B200 vs B300 Difference: B200 (dual-die GB100, 192GB HBM3e, 1000W) shipped in H2 2024. B300 (Blackwell Ultra, 288GB HBM3e, 1400W) ships in H2 2025.
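The capacities quoted for B200 and B300 are consistent with simple stack arithmetic. The sketch below assumes eight HBM3e stacks per GPU, 24 Gb (3 GB) DRAM dies, and 8-high stacks on B200; those three figures are assumptions not stated in this document, which gives only the 12-layer stack for B300.

```python
def hbm_capacity_gb(stacks: int, layers_per_stack: int, die_gb: int = 3) -> int:
    """Total HBM capacity = stacks x DRAM layers per stack x capacity per die."""
    return stacks * layers_per_stack * die_gb


ASSUMED_STACKS = 8  # assumption: eight HBM3e stacks per GPU package

b200_gb = hbm_capacity_gb(ASSUMED_STACKS, layers_per_stack=8)    # assumed 8-high
b300_gb = hbm_capacity_gb(ASSUMED_STACKS, layers_per_stack=12)   # 12-high, per the text

print(f"B200: {b200_gb} GB, B300: {b300_gb} GB")
```

Under these assumptions the 50% capacity increase from B200 to B300 comes entirely from taller stacks, with the stack count unchanged.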
2.7 SXM7 (Rubin R100, 2026)
| Item | Specification |
|---|---|
| Corresponding GPU | R100 (GR100) |
| Architecture | Rubin |
| Memory | 288GB HBM4 |
| Memory Bandwidth | To be announced |
| TDP | Expected 1500W+ |
| NVLink Version | NVLink 6.0 |
| NVLink Bandwidth | Over 2 TB/s |
| Process Node | To be announced (TSMC N3 series) |
| Launch Product | Vera Rubin NVL144 |
Rubin is NVIDIA’s next-generation GPU architecture after Blackwell, named after astronomer Vera Rubin:
- Two reticle-sized GR100 dies, assembled together with HBM memory on a single interposer, in one SXM7 socket
- First GPU to use HBM4 (JEDEC HBM4 standard)
- Launched alongside NVLink 6.0, with GPU-to-GPU bandwidth exceeding 2 TB/s
- Vera Rubin NVL144 platform fully interconnects 144 Rubin GPU dies (72 dual-die packages) via NVLink 6
2.8 SXM8 (Rubin Ultra, 2027)
| Item | Specification |
|---|---|
| Corresponding GPU | VR200 or Rubin Ultra |
| Memory | 1TB HBM4/HBM4e |
| FP4 Performance | 100 PFLOPS |
| TDP | Expected 2000W+ |
| NVLink Version | NVLink 6 / 7 |
| Launch Product | Vera Rubin Ultra NVL576 |
Rubin Ultra places 4 reticle-sized GPU dies into a single socket, targeting 100 PFLOPS FP4 and 1TB of memory. The NVL576 platform will connect 576 GPU dies (144 four-die packages).
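The platform names are easier to compare once dies and packages are separated. The sketch below assumes the number in each NVL name counts GPU dies rather than packages, with 2 dies per Rubin package and 4 per Rubin Ultra package; that counting convention is an assumption here, and the dictionary layout is illustrative.

```python
PLATFORMS = {
    "Vera Rubin NVL144": {"gpu_dies": 144, "dies_per_package": 2},
    "Vera Rubin Ultra NVL576": {"gpu_dies": 576, "dies_per_package": 4},
}


def package_count(platform: str) -> int:
    """Number of GPU packages (sockets) implied by the die count."""
    spec = PLATFORMS[platform]
    packages, remainder = divmod(spec["gpu_dies"], spec["dies_per_package"])
    if remainder:
        raise ValueError(f"{platform}: die count is not a whole number of packages")
    return packages


for name in PLATFORMS:
    print(f"{name}: {package_count(name)} packages")

# Per-die share of the Rubin Ultra package targets quoted above.
fp4_pflops_per_die = 100 / PLATFORMS["Vera Rubin Ultra NVL576"]["dies_per_package"]
print(f"Rubin Ultra FP4 per die: {fp4_pflops_per_die} PFLOPS")
```

Read this way, the jump from NVL144 to NVL576 doubles the package count and doubles the dies per package.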
2.9 Feynman (2028+)
NVIDIA first placed the Feynman architecture (named after physicist Richard Feynman) on its roadmap at GTC 2025 and, according to the coverage listed in the references, detailed it at GTC 2026. It will be the next major evolution of SXM:
- 3D Stacking: First adoption of vertical GPU die stacking design, breaking through single-die area limitations
- Custom HBM (C-HBM4E): Custom high-bandwidth memory, exceeding 1TB per GPU
- Optical NVLink: On-die integrated optical interconnect, eliminating copper cables in data centers
- Paired with Rosa CPU: Feynman GPU paired with Rosa CPU (successor to Vera) forming a superchip
- Integrated Groq LPU: Groq LP40 engine will join the NVLink port
It remains undetermined what SXM designation Feynman will use (SXM9 or a next-generation interconnect solution), but it is expected to push data center interconnects from copper toward optical.
3. Physical Interface & Electrical Specifications
3.1 Connector: Amphenol MEG-Array
All SXM modules use Amphenol MEG-Array mezzanine connectors. This is the core component of the SXM physical layer.
| Characteristic | Description |
|---|---|
| Manufacturer | Amphenol Communications Solutions |
| Series | MEG-Array (Mezzanine Grid Array) |
| Pin Pitch | 1.27mm × 1.27mm array |
| Signal Speed | Over 10 Gb/s |
| Soldering Method | Surface Mount (SMT) |
| Key Features | High density, high speed, reliability superior to PCIe gold fingers |
MEG-Array is a dual-sided array connector; the GPU mezzanine card and baseboard each have one half. Features include:
- Flexible ground distribution design to optimize signal integrity
- Supports high-speed signal transmission exceeding 10 Gb/s
- Large-scale array structure providing high-density connections
- Standard surface mount process reduces manufacturing costs
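The 1.27mm pitch translates directly into contact density. The sketch below is a rough estimate that treats the connector as a uniform square grid and ignores the housing and keep-out area, so the real footprint is larger; the function names are illustrative.

```python
PITCH_MM = 1.27  # contact pitch in both directions, from the table above


def contact_field_area_mm2(contacts: int, pitch_mm: float = PITCH_MM) -> float:
    """Area occupied by the contact grid alone (one pitch-squared cell per contact)."""
    return contacts * pitch_mm ** 2


def contacts_per_cm2(pitch_mm: float = PITCH_MM) -> float:
    """Contact density of a square grid at the given pitch."""
    return 100.0 / pitch_mm ** 2


area_400 = contact_field_area_mm2(400)
density = contacts_per_cm2()
print(f"400 contacts occupy about {area_400:.0f} mm^2 of grid")
print(f"Density: about {density:.0f} contacts per cm^2")
```

A 400-contact field therefore fits in roughly 6.5 cm² of grid area, which is what makes a mezzanine layout practical for a module carrying many high-speed lanes.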
Pin Count by Generation:
- SXM2: 400-pin Amphenol MEG-Array
- SXM3/SXM4/SXM5: 400-600 pins (exact count is NVIDIA confidential), dual high-density arrays
Regarding Specification Confidentiality: The precise pin definitions, dimensions, and signal assignments of SXM connectors are NVIDIA’s trade secrets. As one engineer stated on the NVIDIA Developer Forums: “The specifications for SXM2, SXM3, SXM4, SXM5 connectors seem to be a trade secret, and you cannot discuss it unless all involved parties are PCI-SIG members.”
3.2 Power Architecture
The core evolutionary path of SXM power architecture is from 12V to 48V, along with the introduction of Vicor high-efficiency power modules.
48V Power Topology (SXM3+)
SXM2 and earlier GPUs used standard 12V input. Starting from SXM3 (V100 32GB), NVIDIA switched to a 48V power architecture:
DC-DC Conversion Chain:
48V PSU → Vicor BCM (Bus Converter Module) → 48V→12V (MCD) → 12V→Core Voltage (MCM)
Vicor Module Deep Dive:
- BCM (Bus Converter Module): Responsible for efficiently converting the PSU’s 48V to an intermediate bus voltage
- MCM (Modular Current Multiplier): GPU core voltage regulation module, converting 12V to the low-voltage, high-current supply required by the GPU core
- MCD (Modular Current Driver): Driver module working in conjunction with the MCM
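Because the stages are cascaded, their efficiencies multiply. The sketch below illustrates that with three placeholder values; the per-stage efficiencies are hypothetical numbers chosen for illustration, not Vicor datasheet figures, and the stage labels are informal.

```python
import math

# Hypothetical per-stage efficiencies (placeholders, not measured values)
STAGE_EFFICIENCY = {
    "bus converter": 0.98,
    "driver stage": 0.97,
    "current multiplier": 0.96,
}


def chain_efficiency(stage_efficiencies) -> float:
    """End-to-end efficiency of cascaded converters is the product of the stages."""
    return math.prod(stage_efficiencies)


def input_power_w(load_w: float, efficiency: float) -> float:
    """Power drawn from the supply to deliver load_w at the given efficiency."""
    return load_w / efficiency


efficiency = chain_efficiency(STAGE_EFFICIENCY.values())
supply_w = input_power_w(700, efficiency)  # 700W load, as for an H100-class GPU
print(f"End-to-end efficiency: {efficiency:.1%}")
print(f"Supply power for a 700 W load: {supply_w:.0f} W ({supply_w - 700:.0f} W lost)")
```

Even with every stage well above 95%, the end-to-end figure is noticeably lower than any single stage, which is why each conversion step in a high-power module matters.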
Advantages of this architecture:
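- 48V Transmission Reduces Current: At the same power, 48V draws only 1/4 the current of 12V; because resistive loss scales with the square of current, copper losses in the same conductor fall to roughly 1/16, significantly reducing PCB heat generation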
- Higher Power Efficiency: Vicor module conversion efficiency is typically above 95%
- Modular Design: Standardized power modules facilitate scaling to high-power GPUs
Power Evolution by Generation
| Generation | GPU | TDP | Power Architecture | Typical Power Solution |
|---|---|---|---|---|
| SXM1 | P100 | 300W | 12V | Standard VRM |
| SXM2 | V100 16GB | 300W | 12V | Standard VRM |
| SXM3 | V100 32GB | 350W | 48V Vicor | Vicor BCM + MCM/MCD |
| SXM4 | A100 | 400W | 48V Vicor | Vicor MCM/MCD |
| SXM5 | H100 | 700W | 48V Vicor | Vicor Enhanced MCM/MCD |
| SXM5 | H200 | 700W | 48V Vicor | Same as above |
| SXM6 | B200 | 1000-1200W | 48V Vicor + Liquid Cooling | High-Power Vicor Modules |
| SXM6 | B300 | 1400W | 48V Vicor + Mandatory Liquid Cooling | Enhanced Power Delivery |
| SXM7 | R100 | 1500W+ | To be announced | Higher Density Power Delivery |
Taking the 1000W TDP B200 as an example, under the 48V power architecture, only about 21A of current is required; if using traditional 12V, 83A would be needed — this would require extremely thick power cables and massive PCB copper layers.
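The same arithmetic can be applied to every generation in the table. The sketch below divides TDP by bus voltage, ignoring conversion losses, and compares resistive loss in a fixed conductor; the 1 mΩ path resistance is a hypothetical value used only to form the ratio.

```python
# TDP values taken from the power evolution table
TDP_W = {
    "P100": 300,
    "V100 32GB": 350,
    "A100": 400,
    "H100": 700,
    "B200": 1000,
    "B300": 1400,
}

PATH_OHMS = 0.001  # hypothetical delivery-path resistance (1 milliohm)


def input_current_a(power_w: float, bus_v: float) -> float:
    """Current drawn at the bus voltage, ignoring conversion losses."""
    return power_w / bus_v


def resistive_loss_w(power_w: float, bus_v: float, path_ohms: float = PATH_OHMS) -> float:
    """I^2 * R loss in the delivery path for a given load and bus voltage."""
    current = input_current_a(power_w, bus_v)
    return current ** 2 * path_ohms


for gpu, tdp in TDP_W.items():
    i48 = input_current_a(tdp, 48)
    i12 = input_current_a(tdp, 12)
    print(f"{gpu:10s} {tdp:5d} W   48V: {i48:5.1f} A   12V: {i12:6.1f} A")

loss_ratio = resistive_loss_w(1000, 12) / resistive_loss_w(1000, 48)
print(f"12V loss relative to 48V in the same conductor: {loss_ratio:.0f}x")
```

The loss ratio is independent of the resistance chosen: a 4x higher voltage always means 1/4 the current and 1/16 the resistive loss for the same conductor.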
4. Complete SXM Specification Table
| Specification | SXM1 (P100) | SXM2 (V100 16G) | SXM3 (V100 32G) | SXM4 (A100) | SXM5 (H100) | SXM6 (B200) | SXM6 (B300) | SXM7 (R100) |
|---|---|---|---|---|---|---|---|---|
| Release Date | Q2 2016 | Q3 2017 | Q3 2018 | Q2 2020 | Q3 2022 | Q4 2024 | H2 2025 | 2026 |
| Architecture | Pascal | Volta | Volta | Ampere | Hopper | Blackwell | Blackwell Ultra | Rubin |
| GPU Core | GP100 | GV100 | GV100 | GA100 | GH100 | GB100×2 | GB300×2 | GR100×2 |
| Transistors | 15.3B | 21.1B | 21.1B | 54.2B | 80B | 208B | - | - |
| Process Node | 16FF+ | 12FFN | 12FFN | N7 | 4N | 4NP | 4NP | N3? |
| HBM Type | HBM2 | HBM2 | HBM2 | HBM2/HBM2e | HBM3 | HBM3e | HBM3e | HBM4 |
| Memory Capacity | 16GB | 16GB | 32GB | 40/80GB | 80GB | 192GB | 288GB | ~288GB |
| Memory Bandwidth | 720GB/s | 900GB/s | 900GB/s | 1.6-2.0TB/s | 3.35TB/s | 8TB/s | 8TB/s | - |
| TDP | 300W | 300W | 350W | 400W | 700W | 1000-1200W | 1400W | 1500W+ |
| NVLink Version | 1.0 | 2.0 | 2.0 | 3.0 | 4.0 | 5.0 | 5.0 | 6.0 |
| NVLink Bandwidth | 160GB/s | 300GB/s | 300GB/s | 600GB/s | 900GB/s | 1.8TB/s | 1.8TB/s | >2TB/s |
| Power Architecture | 12V | 12V | 48V Vicor | 48V Vicor | 48V Vicor | 48V Vicor | 48V Vicor | - |
| Cooling | Air | Air | Air | Air | Air/Liquid | Air/Liquid | Liquid Only | Liquid Only |
| FP64 (TFLOPS) | 5.3 | 7.8 | 7.8 | 9.7 | 34 | 40 | - | - |
| FP32 (TFLOPS) | 10.6 | 15.7 | 15.7 | 19.5 | 67 | - | - | - |
| FP16 Tensor (TFLOPS) | - | 125 | 125 | 312 | 990 | - | - | - |
| FP8 (TFLOPS) | - | - | - | - | 1,979 | 4,500 | 7,000 | - |
| FP4 (TFLOPS) | - | - | - | - | - | 9,000 | 15,000 | - |
| MIG | ❌ | ❌ | ❌ | ✅ (7 instances) | ✅ (7 instances) | ✅ | ✅ | ✅ |
| Connector | - | 400-pin MEG-Array | Updated MEG-Array | MEG-Array | MEG-Array | MEG-Array | MEG-Array | - |
| Launch Platform | DGX-1 | DGX-1 V100 | DGX-2 | DGX A100 | DGX H100 | DGX B200 | GB300 NVL72 | VR NVL144 |
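The table makes it possible to compare how fast each resource has grown. The sketch below copies a few rows (using the 80GB A100 and the 1000W B200 figures) and computes generation-to-generation ratios; the dictionary keys and helper names are illustrative.

```python
# Selected rows from the specification table above
SPECS = {
    "P100": {"tdp_w": 300, "mem_bw_gbs": 720, "nvlink_gbs": 160},
    "V100": {"tdp_w": 300, "mem_bw_gbs": 900, "nvlink_gbs": 300},
    "A100": {"tdp_w": 400, "mem_bw_gbs": 2000, "nvlink_gbs": 600},
    "H100": {"tdp_w": 700, "mem_bw_gbs": 3350, "nvlink_gbs": 900},
    "B200": {"tdp_w": 1000, "mem_bw_gbs": 8000, "nvlink_gbs": 1800},
}


def growth(metric: str, old: str, new: str) -> float:
    """Ratio of a metric between two GPUs."""
    return SPECS[new][metric] / SPECS[old][metric]


def mem_bw_per_watt(gpu: str) -> float:
    """Memory bandwidth (GB/s) available per watt of TDP."""
    return SPECS[gpu]["mem_bw_gbs"] / SPECS[gpu]["tdp_w"]


for metric in ("tdp_w", "mem_bw_gbs", "nvlink_gbs"):
    print(f"{metric}: P100 -> B200 grew {growth(metric, 'P100', 'B200'):.2f}x")

for gpu in SPECS:
    print(f"{gpu}: {mem_bw_per_watt(gpu):.1f} GB/s of memory bandwidth per watt")
```

From P100 to B200, memory and NVLink bandwidth each grew by roughly 11x while TDP grew by about 3.3x, so bandwidth per watt improved even as absolute power climbed.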
5. Future Roadmap
2026: Rubin / Vera
Rubin R100 (SXM7)
├─ Dual GR100 dies
├─ HBM4 288GB
├─ NVLink 6.0 >2TB/s
└─ Vera Rubin NVL144 (144 GPU fully interconnected)
The Vera Rubin platform is NVIDIA’s first to tightly integrate the Vera CPU, the successor to the Grace CPU, with the Rubin GPU. Vera communicates with Rubin over the NVLink Chip-to-Chip (C2C) protocol at 1.8 TB/s.
2027: Rubin Ultra
Rubin Ultra (SXM8)
├─ Four GR100 dies (four reticle-sized dies in one package)
├─ 1TB HBM4/HBM4e
├─ 100 PFLOPS FP4
└─ Vera Rubin Ultra NVL576 (576 GPU)
Rubin Ultra’s 4-die package extends performance scaling beyond what a single reticle-limited die allows as Moore’s Law slows. With 576 GPU dies, the NVL576 would be the largest single-domain GPU cluster to date.
2028: Feynman + Rosa
Feynman (SXM9? / New Interconnect Solution)
├─ 3D stacked GPU dies
├─ C-HBM4E custom memory >1TB
├─ Optical NVLink (on-die integrated optical engine)
├─ Paired with Rosa CPU
└─ Integrated Groq LP40 (NVLink + NVFP4)
Feynman’s Optical NVLink will be a revolutionary change in data center interconnects — no longer using copper cables for high-speed signal transmission, but instead integrating optical engines directly into the GPU package. This will solve the signal integrity, power consumption, and distance limitations of copper cables at ultra-high bandwidths.
6. References
- Wikipedia - SXM (socket): https://en.wikipedia.org/wiki/SXM_(socket)
- Grokipedia - SXM (socket): https://grokipedia.com/page/SXM_(socket)
- NVIDIA HGX Platform: https://www.nvidia.com/en-us/data-center/hgx/
- NVIDIA DGX Systems: https://www.nvidia.com/en-us/data-center/dgx-b200/
- Exxact Corp - SXM vs PCIe: https://www.exxactcorp.com/blog/deep-learning/sxm-vs-pcie-gpus-best-for-training-llms-like-gpt-4
- l4rz - Running SXM GPUs in Consumer PCs: https://github.com/l4rz/running-nvidia-sxm-gpus-in-consumer-pcs
- Amphenol MEG-Array: https://www.amphenol-cs.com/product-series/meg-array.html
- Reverse Engineering SXM2: https://bbenchoff.github.io/pages/SXM2PCIe.html
- NVIDIA Developer Forums - SXM Specs: https://forums.developer.nvidia.com/t/s-x-m-specifications/238960
- Tom’s Hardware - B300 Socket: https://www.tomshardware.com/tech-industry/artificial-intelligence/nvidia-reportedly-mulls-socketed-design-for-blackwell-b300-ai-gpus
- The Next Platform - NVIDIA Roadmap: https://www.nextplatform.com/compute/2025/03/19/nvidia-draws-gpu-system-roadmap-out-to-2028/1653528
- Tom’s Hardware - Vera Rubin: https://www.tomshardware.com/pc-components/gpus/nvidias-vera-rubin-platform-in-depth
- TrendForce - NVIDIA Socket Design: https://www.trendforce.com/news/2024/10/11/news-nvidia-rumored-to-switch-to-gpu-socket-design-with-300-series
- Wikipedia - Feynman microarchitecture: https://en.wikipedia.org/wiki/Feynman_(microarchitecture)
- Tom’s Hardware - Feynman Details: https://www.tomshardware.com/pc-components/gpus/nvidia-updates-data-center-roadmap-with-rosa-cpu-and-stacked-feynman-gpus
- Lenovo Press - HGX B200 Guide: https://lenovopress.lenovo.com/lp2226-thinksystem-nvidia-b200-180gb-1000w-gpu
- FiberMall - HGX B200 Cooling: https://www.fibermall.com/blog/nvidia-hgx-b200-cooling-solution.htm
- NVIDIA Technical Blog - HGX H100: https://developer.nvidia.com/blog/introducing-nvidia-hgx-h100-an-accelerated-server-platform-for-ai-and-high-performance-computing/
- NVIDIA Datasheet - DGX B200: https://resources.nvidia.com/en-us-dgx-systems/dgx-b200-datasheet
- NVIDIA - NVLink & NVSwitch: https://www.nvidia.com/en-us/data-center/nvlink/