NVIDIA SXM

1. SXM Overview

SXM (Server PCI Express Module) is NVIDIA’s proprietary high-bandwidth GPU socket, designed for mounting data-center-class GPU accelerators directly onto server baseboards.

Core Design Philosophy

  • Proprietary: SXM is NVIDIA’s closed proprietary interface standard with undisclosed specifications (requires NDA, Non-Disclosure Agreement), giving NVIDIA complete design freedom

  • High Bandwidth: Direct GPU-to-GPU interconnection via NVLink, with bandwidth far exceeding PCIe

  • High Power: Not limited by the PCIe standard 75W/300W limits; directly powered through the socket up to 700W-1400W+

  • High Density: A single HGX baseboard (NVLink Switch, power delivery, and cooling base) can accommodate 4 or 8 GPUs

  • Modularity: GPUs are mounted horizontally as mezzanine cards for easy integration

Why SXM?

Traditional PCIe slots were designed for general-purpose expansion cards (network cards, storage cards, GPUs, etc.) and have the following bottlenecks:

  • Power Limitations: PCIe slot standard power delivery is only 75W; even with auxiliary power cables, the PCIe specification restricts the overall power solution
  • Bandwidth Limitations: PCIe x16 bandwidth is far lower than NVLink, unable to meet the demands of large-scale multi-GPU parallel training
  • Topology Limitations: Under PCIe tree topology, GPU-to-GPU traffic must traverse a PCIe switch or the CPU root complex, resulting in higher latency and limited bandwidth
  • Density Limitations: Standard PCIe cards are mounted vertically, occupying significant space, with complex cooling and power delivery design

By breaking these limitations, SXM enables 8 GPUs in DGX/HGX systems to work together as a single giant GPU.


2. SXM Generational Evolution

2.1 SXM1 (Pascal P100, 2016)

SXM was first introduced with the Tesla P100 (GP100 core) in the DGX-1 system.

Item                Specification
Corresponding GPU   Tesla P100 (GP100)
Architecture        Pascal
Memory              16GB HBM2
Memory Bandwidth    720 GB/s
TDP                 300W
NVLink Version      NVLink 1.0
NVLink Bandwidth    160 GB/s (4 links, 40 GB/s per link)
Launch Product      DGX-1
Process Node        TSMC 16FF+
Transistor Count    15.3 billion

P100 was the first GPU equipped with NVLink; in DGX-1, the 8 P100 modules were interconnected via NVLink in a hybrid cube-mesh topology.

2.2 SXM2 (Volta V100 16GB, 2017)

Item                Specification
Corresponding GPU   Tesla V100 16GB (GV100)
Architecture        Volta
Memory              16GB HBM2
Memory Bandwidth    900 GB/s
TDP                 300W
NVLink Version      NVLink 2.0
NVLink Bandwidth    300 GB/s (6 links, 50 GB/s per link)
Connector           Amphenol MEG-Array 400-pin
Launch Product      DGX-1 V100
Process Node        TSMC 12FFN
Transistor Count    21.1 billion

V100 introduced Tensor Cores, which NVIDIA rated at up to 12x the peak deep-learning training throughput of P100. SXM2 used a 400-pin Amphenol MEG-Array connector. With 6 NVLink links per GPU instead of 4, DGX-1 V100 kept the hybrid cube-mesh topology but with more bandwidth between GPUs; a fully connected topology arrived with NVSwitch in DGX-2.

2.3 SXM3 (Volta V100 32GB, 2018)

Item                Specification
Corresponding GPU   Tesla V100 32GB (GV100)
Architecture        Volta (same core as V100)
Memory              32GB HBM2
Memory Bandwidth    900 GB/s
TDP                 350W
NVLink Version      NVLink 2.0
NVLink Bandwidth    300 GB/s
Power Architecture  48V input (different from 12V)
Connector           Updated Amphenol MEG-Array (more robust)
Launch Product      DGX-2

Key changes in SXM3:

  • Connector Upgrade: While physically similar to SXM2, it used a more robust MEG-Array connector (different pin configuration)
  • 48V Power Delivery: This was the biggest architectural change — switching from traditional 12V to 48V power architecture, significantly reducing current losses
  • Vicor Modules: Introduced Vicor MCM/MCD (Modular Current Multiplier / Modular Current Driver) power modules, which became the standard design in the subsequent SXM4/5/6
  • TDP Increase: Raised from 300W to 350W, laying the foundation for higher-power GPUs

SXM3 was also the first time NVIDIA shipped standardized HGX baseboards to OEMs. OEMs could directly purchase pre-assembled 4-GPU baseboards, greatly reducing integration complexity.

A note on SXM numbering: sources disagree on what to call the P100’s form factor. Wikipedia lists the socket used by the P100 as SXM, with no suffix, while the P100 module itself is sometimes labeled SXM2, because the P100 board in DGX-1 followed the SXM2 specification and could be upgraded to V100 SXM2 modules. This document uses the generational reading: P100 is SXM1 (first generation), V100 16GB is SXM2, and V100 32GB is SXM3.

2.4 SXM4 (Ampere A100, 2020)

Item                Specification
Corresponding GPU   A100 40GB / 80GB (GA100)
Architecture        Ampere
Memory              40GB HBM2 / 80GB HBM2e
Memory Bandwidth    1.6 TB/s (HBM2) / 2.0 TB/s (HBM2e)
TDP                 400W
NVLink Version      NVLink 3.0
NVLink Bandwidth    600 GB/s (12 links, 50 GB/s per link)
Launch Product      DGX A100
Process Node        TSMC N7
Transistor Count    54.2 billion
MIG                 Supported (up to 7 instances)
NVSwitch            2nd Gen (SHARP in-network reduction arrived with 3rd Gen)

A100 was the first GPU to support MIG (Multi-Instance GPU), which partitions a single GPU into up to 7 independent instances. HGX A100 baseboards came in two designs: a 4-GPU board (codenamed Redstone) and an 8-GPU board with NVSwitch (codenamed Delta). DGX A100 uses the 8-GPU design.

NVLink 3.0 provided 600 GB/s bidirectional bandwidth per GPU, working with NVSwitch to achieve a fully-interconnected topology for 8 GPUs.
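
The per-GPU NVLink figures quoted so far are link count times per-link rate. The following is a minimal Python sketch of that arithmetic, using only the numbers from the tables above. The `NVLinkGen` class and its method name are illustrative and not part of any NVIDIA tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NVLinkGen:
    """Link count and per-link bidirectional rate, as quoted in this section."""
    name: str
    links: int
    gb_per_s_per_link: int

    def aggregate_gb_per_s(self) -> int:
        # Per-GPU bandwidth is the sum over all links on that GPU.
        return self.links * self.gb_per_s_per_link


GENERATIONS = [
    NVLinkGen("NVLink 1.0 (P100)", links=4, gb_per_s_per_link=40),
    NVLinkGen("NVLink 2.0 (V100)", links=6, gb_per_s_per_link=50),
    NVLinkGen("NVLink 3.0 (A100)", links=12, gb_per_s_per_link=50),
]

for gen in GENERATIONS:
    print(f"{gen.name}: {gen.aggregate_gb_per_s()} GB/s")
```

In these tables the per-link rate stays at 50 GB/s from NVLink 2.0 to 3.0, so the doubling to 600 GB/s comes entirely from doubling the link count.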

2.5 SXM5 (Hopper H100/H200, 2022/2023)

Item                Specification
Corresponding GPU   H100 (GH100) / H200
Architecture        Hopper
Memory              80GB HBM3 (H100) / 141GB HBM3e (H200)
Memory Bandwidth    3.35 TB/s (H100) / 4.8 TB/s (H200)
TDP                 700W
NVLink Version      NVLink 4.0
NVLink Bandwidth    900 GB/s (18 links, 50 GB/s per link)
Launch Product      DGX H100
Process Node        TSMC 4N
Transistor Count    80 billion
New Features        Transformer Engine, FP8, DPX instruction set

SXM5 is the most widely deployed SXM specification to date. Key breakthroughs of H100 SXM5:

  • NVLink 4.0 Bandwidth: 900 GB/s bidirectional, about 7 times the 128 GB/s bidirectional bandwidth of PCIe 5.0 x16 (64 GB/s per direction)
  • 900W Power Delivery Capability: Although nominal TDP is 700W, the SXM5 socket’s power delivery capability can reach 900W
  • Transformer Engine: Mixed-precision matrix operation unit designed specifically for large language models (LLMs)
  • NVSwitch (3rd Gen): 4 NVSwitch chips fully interconnect 8 GPUs, for 7.2 TB/s of aggregate bidirectional GPU bandwidth (8 × 900 GB/s)
  • H200 Update: Same SXM5 socket, but memory upgraded to 141GB HBM3e, bandwidth 4.8 TB/s

SXM5 power delivery remains based on Vicor’s 48V architecture, using Vicor MCM/MCD modules to convert 48V to GPU core voltage.
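
Whether NVLink 4.0 comes out at roughly 14x or roughly 7x PCIe 5.0 x16 depends on whether the PCIe figure is counted per direction or bidirectionally. The sketch below works through both ratios and the 8-GPU aggregate, using only figures quoted in this section. The constant names are illustrative.

```python
NVLINK4_BIDIR_GB_S = 18 * 50        # 18 links x 50 GB/s per link, from the table above
PCIE5_X16_PER_DIR_GB_S = 64         # nominal PCIe 5.0 x16 rate, one direction
PCIE5_X16_BIDIR_GB_S = 2 * PCIE5_X16_PER_DIR_GB_S
GPUS_PER_BASEBOARD = 8


def ratio(a: float, b: float) -> float:
    """Ratio a/b rounded to one decimal place."""
    return round(a / b, 1)


# Bidirectional NVLink against one direction of PCIe (apples to oranges).
mixed = ratio(NVLINK4_BIDIR_GB_S, PCIE5_X16_PER_DIR_GB_S)
# Bidirectional against bidirectional (like for like).
like_for_like = ratio(NVLINK4_BIDIR_GB_S, PCIE5_X16_BIDIR_GB_S)
# Sum of per-GPU NVLink bandwidth across one 8-GPU baseboard, in TB/s.
aggregate_tb_s = GPUS_PER_BASEBOARD * NVLINK4_BIDIR_GB_S / 1000

print(f"mixed comparison: {mixed}x")
print(f"like for like:    {like_for_like}x")
print(f"8-GPU aggregate:  {aggregate_tb_s} TB/s")
```

The like-for-like ratio is the fairer one, since the 900 GB/s NVLink figure already counts both directions.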

2.6 SXM6 (Blackwell B200/B300, 2024/2025)

Item                Specification
Corresponding GPU   B200 / B300 (GB100/GB300)
Architecture        Blackwell / Blackwell Ultra
Memory              192GB HBM3e (B200) / 288GB HBM3e (B300)
Memory Bandwidth    8 TB/s (B200) / 8 TB/s (B300)
TDP                 1000W-1200W (B200) / 1400W (B300)
NVLink Version      NVLink 5.0
NVLink Bandwidth    1.8 TB/s
Process Node        TSMC 4NP
Transistor Count    208 billion (dual-die design)
Launch Product      DGX B200 / GB200 NVL72

SXM6 represents the biggest leap in SXM history:

  • Dual-Die Packaging: B200 is composed of two GB100 dies joined by NVIDIA’s 10 TB/s NV-HBI die-to-die link in a CoWoS-L package; each die holds about 104 billion transistors
  • Power Explosion: 1000W (air-cooled) / 1200W (liquid-cooled) — the power consumption of a single B200 GPU exceeds the total system power of many home PCs
  • B300 Goes Further: 1400W TDP, 288GB HBM3e (12-layer stack), 15 PFLOPS FP4
  • NVLink 5.0: Bidirectional 1.8 TB/s, supporting SHARP v4 in-network reduction
  • Liquid Cooling Becomes Mandatory: B200 at 1000W TDP can still be air-cooled, but B300 at 1400W essentially requires liquid cooling
  • Socketed Design Rumors: In October 2024, TrendForce reported that B300 may be the first to adopt a socketed design, allowing the GPU module to be user-replaceable (no longer soldered to the baseboard)

B200 vs B300 Difference: B200 (dual-die GB100, 192GB HBM3e, 1000W) shipped in H2 2024. B300 (Blackwell Ultra, 288GB HBM3e, 1400W) ships in H2 2025.
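
To put the per-GPU TDP figures in system terms, the sketch below multiplies them by the 8 GPUs of a full HGX baseboard. It covers GPU power only. CPUs, NVSwitch chips, fans and conversion losses are deliberately left out, so real system power is higher. The function name is illustrative.

```python
GPUS_PER_HGX_BASEBOARD = 8  # the larger of the two HGX sizes mentioned earlier

# Per-GPU TDP in watts, taken from the SXM6 table above.
TDP_W = {
    "B200 (air-cooled)": 1000,
    "B200 (liquid-cooled)": 1200,
    "B300": 1400,
}


def baseboard_gpu_power_kw(tdp_w: int, gpus: int = GPUS_PER_HGX_BASEBOARD) -> float:
    """GPU-only power for one baseboard, ignoring every other component."""
    return tdp_w * gpus / 1000


for name, tdp in TDP_W.items():
    print(f"{name}: {baseboard_gpu_power_kw(tdp):.1f} kW for {GPUS_PER_HGX_BASEBOARD} GPUs")
```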

2.7 SXM7 (Rubin R100, 2026)

Item                Specification
Corresponding GPU   R100 (GR100)
Architecture        Rubin
Memory              288GB HBM4
Memory Bandwidth    To be announced
TDP                 Expected 1500W+
NVLink Version      NVLink 6.0
NVLink Bandwidth    Over 2 TB/s
Process Node        To be announced (TSMC N3 series)
Launch Product      Vera Rubin NVL144

Rubin is NVIDIA’s next-generation GPU architecture after Blackwell, named after astronomer Vera Rubin:

  • Two GR100 dies in one SXM7 socket, assembled together with the HBM stacks on a single interposer using an RDL (redistribution layer)
  • First GPU to use HBM4 (JEDEC HBM4 standard)
  • Launched alongside NVLink 6.0, with GPU-to-GPU bandwidth exceeding 2 TB/s
  • Vera Rubin NVL144 platform fully interconnects 144 Rubin GPUs via NVLink 6

2.8 SXM8 (Rubin Ultra, 2027)

Item                Specification
Corresponding GPU   VR200 or Rubin Ultra
Memory              1TB HBM4/HBM4e
FP4 Performance     100 PFLOPS
TDP                 Expected 2000W+
NVLink Version      NVLink 6 / 7
Launch Product      Vera Rubin Ultra NVL576

Rubin Ultra places 4 GPU dies, joined on an RDL interposer, into a single socket, achieving 100 PFLOPS FP4 and 1TB of memory. The NVL576 platform will connect 576 GPUs.
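
The NVL144 and NVL576 names can be related to the die counts above under one assumption: that the NVL number counts GPU dies rather than sockets. That assumption is not stated in this document and should be checked against NVIDIA’s current naming. A minimal sketch with hypothetical function and variable names:

```python
def sockets_needed(gpu_dies_in_domain: int, dies_per_socket: int) -> int:
    """Number of GPU sockets, ASSUMING the NVL number counts dies, not sockets."""
    if gpu_dies_in_domain % dies_per_socket:
        raise ValueError("die count must be a multiple of dies per socket")
    return gpu_dies_in_domain // dies_per_socket


PLATFORMS = {
    # name: (NVL number, dies per socket as described in sections 2.7 and 2.8)
    "Vera Rubin NVL144": (144, 2),
    "Vera Rubin Ultra NVL576": (576, 4),
}

for name, (dies, per_socket) in PLATFORMS.items():
    print(f"{name}: {sockets_needed(dies, per_socket)} sockets x {per_socket} dies")
```

If the NVL number counts sockets instead, the platforms would hold 144 and 576 sockets and this division does not apply.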

2.9 Feynman (2028+)

NVIDIA announced the Feynman architecture (named after physicist Richard Feynman) at GTC 2026, which will be the next major evolution of SXM:

  • 3D Stacking: First adoption of vertical GPU die stacking design, breaking through single-die area limitations
  • Custom HBM (C-HBM4E): Custom high-bandwidth memory, exceeding 1TB per GPU
  • Optical NVLink: On-die integrated optical interconnect, eliminating copper cables in data centers
  • Paired with Rosa CPU: Feynman GPU paired with Rosa CPU (successor to Vera) forming a superchip
  • Integrated Groq LPU: Groq LP40 engine will join the NVLink port

It remains undetermined what SXM designation Feynman will use (SXM9 or a next-generation interconnect solution), but it is expected to push data center interconnects from copper toward optical.


3. Physical Interface & Electrical Specifications

3.1 Connector: Amphenol MEG-Array

All SXM modules use Amphenol MEG-Array mezzanine connectors. This is the core component of the SXM physical layer.

Characteristic      Description
Manufacturer        Amphenol Communications Solutions
Series              MEG-Array (Mezzanine Grid Array)
Pin Pitch           1.27 mm × 1.27 mm array
Signal Speed        Over 10 Gb/s
Soldering Method    Surface Mount (SMT)
Key Features        High density, high speed, reliability superior to PCIe gold fingers

MEG-Array is a dual-sided array connector; the GPU mezzanine card and baseboard each have one half. Features include:

  • Flexible ground distribution design to optimize signal integrity
  • Supports high-speed signal transmission exceeding 10 Gb/s
  • Large-scale array structure providing high-density connections
  • Standard surface mount process reduces manufacturing costs

Pin Count by Generation:

  • SXM2: 400-pin Amphenol MEG-Array
  • SXM3/SXM4/SXM5: 400-600 pins (exact count is NVIDIA confidential), dual high-density arrays

Regarding Specification Confidentiality: The precise pin definitions, dimensions, and signal assignments of SXM connectors are NVIDIA’s trade secrets. As one engineer stated on the NVIDIA Developer Forums: “The specifications for SXM2, SXM3, SXM4, SXM5 connectors seem to be a trade secret, and you cannot discuss it unless all involved parties are PCI-SIG members.”

3.2 Power Architecture

The core evolutionary path of SXM power architecture is from 12V to 48V, along with the introduction of Vicor high-efficiency power modules.

48V Power Topology (SXM3+)

SXM2 and earlier GPUs used standard 12V input. Starting from SXM3 (V100 32GB), NVIDIA switched to a 48V power architecture:

DC-DC Conversion Chain:
48V PSU → Vicor BCM (Bus Converter Module) → 48V→12V (MCD) → 12V→Core Voltage (MCM)

Vicor Module Deep Dive:

  • BCM (Bus Converter Module): Responsible for efficiently converting the PSU’s 48V to an intermediate bus voltage
  • MCM (Modular Current Multiplier): GPU core voltage stage, converting 12V to the low voltage and high current required by the GPU core
  • MCD (Modular Current Driver): Driver module working in conjunction with the MCM

Advantages of this architecture:

  • 48V Transmission Reduces Current: At the same power, a 48V bus carries only 1/4 the current of a 12V bus. Resistive loss scales with the square of current, so copper loss in the same path falls to about 1/16, significantly reducing PCB heating
  • Higher Power Efficiency: Vicor module conversion efficiency is typically above 95%
  • Modular Design: Standardized power modules facilitate scaling to high-power GPUs
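
Because the conversion stages sit in series, their efficiencies multiply. The sketch below shows that arithmetic for a three-stage chain like the one described above. The per-stage values are illustrative assumptions chosen near the 95% figure quoted in this section. They are not Vicor datasheet numbers.

```python
from math import prod


def chain_efficiency(stage_efficiencies: list[float]) -> float:
    """End-to-end efficiency of converters in series: the product of the stages."""
    return prod(stage_efficiencies)


def input_power_w(load_w: float, stage_efficiencies: list[float]) -> float:
    """Power the 48V source must supply to deliver load_w at the GPU."""
    return load_w / chain_efficiency(stage_efficiencies)


# ASSUMPTION: made-up per-stage efficiencies for illustration only.
stages = [0.97, 0.97, 0.95]

eff = chain_efficiency(stages)
print(f"chain efficiency: {eff:.3f}")
print(f"input power for a 700 W load: {input_power_w(700, stages):.0f} W")
```

Even when every stage is at or above 95%, the end-to-end figure is lower than any single stage, which is why the number of conversion steps matters at these power levels.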

Power Evolution by Generation

Generation  GPU        TDP         Power Architecture                    Typical Power Solution
SXM1        P100       300W        12V                                   Standard VRM
SXM2        V100 16GB  300W        12V                                   Standard VRM
SXM3        V100 32GB  350W        48V Vicor                             Vicor BCM + MCM/MCD
SXM4        A100       400W        48V Vicor                             Vicor MCM/MCD
SXM5        H100       700W        48V Vicor                             Vicor Enhanced MCM/MCD
SXM5        H200       700W        48V Vicor                             Same as above
SXM6        B200       1000-1200W  48V Vicor + Liquid Cooling            High-Power Vicor Modules
SXM6        B300       1400W       48V Vicor + Mandatory Liquid Cooling  Enhanced Power Delivery
SXM7        R100       1500W+      To be announced                       Higher Density Power Delivery

Taking the 1000W TDP B200 as an example: at 48V it draws only about 21A, whereas a traditional 12V bus would need about 83A, which would require extremely thick power cables and massive PCB copper layers.
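
The same I = P / V arithmetic can be applied to every generation in the power table. The sketch below also computes the relative copper loss, which follows from resistive loss being proportional to the square of current. The function names are illustrative, and where the table gives a TDP range the upper bound is used.

```python
# TDP per GPU in watts, from the power table above (upper bound where a range is given).
TDP_W = {
    "P100": 300, "V100 16GB": 300, "V100 32GB": 350, "A100": 400,
    "H100": 700, "B200": 1200, "B300": 1400,
}


def bus_current_a(power_w: float, bus_v: float) -> float:
    """Current drawn from a DC bus: I = P / V."""
    return power_w / bus_v


def relative_copper_loss(bus_v: float, reference_v: float = 12.0) -> float:
    """I^2 * R loss relative to the reference bus, for equal power and resistance."""
    return (reference_v / bus_v) ** 2


for gpu, tdp in TDP_W.items():
    i12 = bus_current_a(tdp, 12)
    i48 = bus_current_a(tdp, 48)
    print(f"{gpu:10s} {tdp:5d} W   12V: {i12:6.1f} A   48V: {i48:5.1f} A")

print(f"48V copper loss relative to 12V: {relative_copper_loss(48):.4f}")
```

This is a simplified view: it treats the whole TDP as flowing through a single bus and ignores conversion losses.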

4. Complete SXM Specification Table

Specification         SXM1 (P100)  SXM2 (V100 16G)    SXM3 (V100 32G)    SXM4 (A100)        SXM5 (H100)        SXM6 (B200)  SXM6 (B300)      SXM7 (R100)
Release Date          Q2 2016      Q3 2017            Q3 2018            Q1 2020            Q3 2022            Q4 2024      H2 2025          2026
Architecture          Pascal       Volta              Volta              Ampere             Hopper             Blackwell    Blackwell Ultra  Rubin
GPU Core              GP100        GV100              GV100              GA100              GH100              GB100×2      GB300×2          GR100×2
Transistors           15.3B        21.1B              21.1B              54.2B              80B                208B         -                -
Process Node          16FF+        12FFN              12FFN              N7                 4N                 4NP          4NP              N3?
HBM Type              HBM2         HBM2               HBM2               HBM2/HBM2e         HBM3               HBM3e        HBM3e            HBM4
Memory Capacity       16GB         16GB               32GB               40/80GB            80GB               192GB        288GB            ~288GB
Memory Bandwidth      720GB/s      900GB/s            900GB/s            1.6-2.0TB/s        3.35TB/s           8TB/s        8TB/s            -
TDP                   300W         300W               350W               400W               700W               1000-1200W   1400W            1500W+
NVLink Version        1.0          2.0                2.0                3.0                4.0                5.0          5.0              6.0
NVLink Bandwidth      160GB/s      300GB/s            300GB/s            600GB/s            900GB/s            1.8TB/s      1.8TB/s          >2TB/s
Power Architecture    12V          12V                48V Vicor          48V Vicor          48V Vicor          48V Vicor    48V Vicor        -
Cooling               Air          Air                Air                Air                Air/Liquid         Air/Liquid   Liquid Only      Liquid Only
FP64 (TFLOPS)         5.3          7.8                7.8                9.7                34                 40           -                -
FP32 (TFLOPS)         10.6         15.7               15.7               19.5               67                 -            -                -
FP16 Tensor (TFLOPS)  -            125                125                312                990                -            -                -
FP8 (TFLOPS)          -            -                  -                  -                  1,979              4,500        7,000            -
FP4 (TFLOPS)          -            -                  -                  -                  -                  9,000        15,000           -
MIG                   -            -                  -                  Yes (7 instances)  Yes (7 instances)
Connector             -            400-pin MEG-Array  Updated MEG-Array  MEG-Array          MEG-Array          MEG-Array    MEG-Array        -
Launch Platform       DGX-1        DGX-1 V100         DGX-2              DGX A100           DGX H100           DGX B200     GB300 NVL72      VR NVL144
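
One way to read the table is to compare how fast each quantity has grown. The sketch below encodes a few rows as data and computes the growth from SXM1 to SXM6 (B200). The values are copied from the table, with the upper bound used where a range is given. The `SxmGen` class and `growth` helper are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SxmGen:
    """A subset of one column of the specification table."""
    name: str
    year: int
    tdp_w: int
    mem_bw_gb_s: int
    nvlink_gb_s: int


GENS = [
    SxmGen("SXM1 (P100)", 2016, 300, 720, 160),
    SxmGen("SXM2 (V100 16G)", 2017, 300, 900, 300),
    SxmGen("SXM3 (V100 32G)", 2018, 350, 900, 300),
    SxmGen("SXM4 (A100)", 2020, 400, 2000, 600),
    SxmGen("SXM5 (H100)", 2022, 700, 3350, 900),
    SxmGen("SXM6 (B200)", 2024, 1200, 8000, 1800),
]


def growth(first: SxmGen, last: SxmGen, field: str) -> float:
    """Ratio last/first for one numeric field."""
    return getattr(last, field) / getattr(first, field)


first, last = GENS[0], GENS[-1]
for field in ("tdp_w", "mem_bw_gb_s", "nvlink_gb_s"):
    print(f"{field}: {growth(first, last, field):.1f}x from {first.name} to {last.name}")
```

On these figures, memory and NVLink bandwidth each grew by roughly 11x over eight years while TDP grew by 4x.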

5. Future Roadmap

2026: Rubin / Vera

Rubin R100 (SXM7)
  ├─ Dual GR100 dies
  ├─ HBM4 288GB
  ├─ NVLink 6.0 >2TB/s
  └─ Vera Rubin NVL144 (144 GPU fully interconnected)

The Vera Rubin platform pairs the Rubin GPU with the Vera CPU, the successor to the Grace CPU. The two communicate over the NVLink Chip-to-Chip (C2C) protocol at 1.8 TB/s.

2027: Rubin Ultra

Rubin Ultra (SXM8)
  ├─ Four GR100 dies (4 RDL package)
  ├─ 1TB HBM4/HBM4e
  ├─ 100 PFLOPS FP4
  └─ Vera Rubin Ultra NVL576 (576 GPU)

Rubin Ultra’s 4-die package is a further attempt to keep performance scaling as Moore’s Law slows. The NVL576, with 576 GPUs, will be the largest single-domain GPU cluster to date.

2028: Feynman + Rosa

Feynman (SXM9? / New Interconnect Solution)
  ├─ 3D stacked GPU dies
  ├─ C-HBM4E custom memory >1TB
  ├─ Optical NVLink (on-die integrated optical engine)
  ├─ Paired with Rosa CPU
  └─ Integrated Groq LP40 (NVLink + NVFP4)

Feynman’s Optical NVLink will be a revolutionary change in data center interconnects — no longer using copper cables for high-speed signal transmission, but instead integrating optical engines directly into the GPU package. This will solve the signal integrity, power consumption, and distance limitations of copper cables at ultra-high bandwidths.


6. References

  1. Wikipedia - SXM (socket): https://en.wikipedia.org/wiki/SXM_(socket)
  2. Grokipedia - SXM (socket): https://grokipedia.com/page/SXM_(socket)
  3. NVIDIA HGX Platform: https://www.nvidia.com/en-us/data-center/hgx/
  4. NVIDIA DGX Systems: https://www.nvidia.com/en-us/data-center/dgx-b200/
  5. Exxact Corp - SXM vs PCIe: https://www.exxactcorp.com/blog/deep-learning/sxm-vs-pcie-gpus-best-for-training-llms-like-gpt-4
  6. l4rz - Running SXM GPUs in Consumer PCs: https://github.com/l4rz/running-nvidia-sxm-gpus-in-consumer-pcs
  7. Amphenol MEG-Array: https://www.amphenol-cs.com/product-series/meg-array.html
  8. Reverse Engineering SXM2: https://bbenchoff.github.io/pages/SXM2PCIe.html
  9. NVIDIA Developer Forums - SXM Specs: https://forums.developer.nvidia.com/t/s-x-m-specifications/238960
  10. Tom’s Hardware - B300 Socket: https://www.tomshardware.com/tech-industry/artificial-intelligence/nvidia-reportedly-mulls-socketed-design-for-blackwell-b300-ai-gpus
  11. The Next Platform - NVIDIA Roadmap: https://www.nextplatform.com/compute/2025/03/19/nvidia-draws-gpu-system-roadmap-out-to-2028/1653528
  12. Tom’s Hardware - Vera Rubin: https://www.tomshardware.com/pc-components/gpus/nvidias-vera-rubin-platform-in-depth
  13. TrendForce - NVIDIA Socket Design: https://www.trendforce.com/news/2024/10/11/news-nvidia-rumored-to-switch-to-gpu-socket-design-with-300-series
  14. Wikipedia - Feynman microarchitecture: https://en.wikipedia.org/wiki/Feynman_(microarchitecture)
  15. Tom’s Hardware - Feynman Details: https://www.tomshardware.com/pc-components/gpus/nvidia-updates-data-center-roadmap-with-rosa-cpu-and-stacked-feynman-gpus
  16. Lenovo Press - HGX B200 Guide: https://lenovopress.lenovo.com/lp2226-thinksystem-nvidia-b200-180gb-1000w-gpu
  17. FiberMall - HGX B200 Cooling: https://www.fibermall.com/blog/nvidia-hgx-b200-cooling-solution.htm
  18. NVIDIA Technical Blog - HGX H100: https://developer.nvidia.com/blog/introducing-nvidia-hgx-h100-an-accelerated-server-platform-for-ai-and-high-performance-computing/
  19. NVIDIA Datasheet - DGX B200: https://resources.nvidia.com/en-us-dgx-systems/dgx-b200-datasheet
  20. NVIDIA - NVLink & NVSwitch: https://www.nvidia.com/en-us/data-center/nvlink/