NVIDIA SXM

1. SXM Overview

SXM stands for Server PCI Express Module, which is NVIDIA’s proprietary high-bandwidth GPU socket/connector solution designed for mounting data-center-class GPU accelerators directly onto server motherboards.

Core Design Philosophy

Proprietary: SXM is NVIDIA’s closed proprietary interface standard with undisclosed specifications (requires NDA, Non-Disclosure Agreement), giving NVIDIA complete design freedom
High Bandwidth: Direct GPU-to-GPU interconnection via NVLink, with bandwidth far exceeding PCIe
High Power: Not limited by the PCIe standard 75W/300W limits; directly powered through the socket up to 700W-1400W+
High Density: A single HGX baseboard (NVLink Switch, power delivery, and cooling base) can accommodate 4 or 8 GPUs
Modularity: GPUs are mounted horizontally as mezzanine cards for easy integration

Why SXM?

Traditional PCIe slots were designed for general-purpose expansion cards (network cards, storage cards, GPUs, etc.) and have the following bottlenecks:

Power Limitations: PCIe slot standard power delivery is only 75W; even with auxiliary power cables, the PCIe specification restricts the overall power solution
Bandwidth Limitations: PCIe x16 bandwidth is far lower than NVLink, unable to meet the demands of large-scale multi-GPU parallel training
Topology Limitations: Under PCIe tree topology, GPU-to-GPU traffic must traverse PCIe switches or the CPU root complex, resulting in higher latency and limited bandwidth
Density Limitations: Standard PCIe cards are mounted vertically, occupying significant space, with complex cooling and power delivery design

By breaking these limitations, SXM enables 8 GPUs in DGX/HGX systems to work together as a single giant GPU.

2. SXM Generational Evolution

2.1 SXM1 (Pascal P100, 2016)

SXM was first introduced with the Tesla P100 (GP100 core) in the DGX-1 system.

Item	Specification
Corresponding GPU	Tesla P100 (GP100)
Architecture	Pascal
Memory	16GB HBM2
Memory Bandwidth	720 GB/s
TDP	300W
NVLink Version	NVLink 1.0
NVLink Bandwidth	160 GB/s (4 links, 40 GB/s per link)
Launch Product	DGX-1
Process Node	TSMC 16FF+
Transistor Count	15.3 billion

P100 was the first GPU equipped with NVLink; the 8 GPUs in DGX-1 were interconnected via NVLink in a hybrid cube-mesh topology.

2.2 SXM2 (Volta V100 16GB, 2017)

Item	Specification
Corresponding GPU	Tesla V100 16GB (GV100)
Architecture	Volta
Memory	16GB HBM2
Memory Bandwidth	900 GB/s
TDP	300W
NVLink Version	NVLink 2.0
NVLink Bandwidth	300 GB/s (6 links, 50 GB/s per link)
Connector	Amphenol MEG-Array 400-pin
Launch Product	DGX-1 V100
Process Node	TSMC 12FFN
Transistor Count	21.1 billion

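V100 introduced Tensor Cores, delivering up to a 12x improvement in peak deep-learning training throughput compared to P100 (NVIDIA’s figure). SXM2 used a 400-pin Amphenol MEG-Array connector and raised the NVLink count from 4 to 6 links per GPU. DGX-1 V100 retained the hybrid cube-mesh topology; all-to-all connectivity arrived with NVSwitch in DGX-2.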

2.3 SXM3 (Volta V100 32GB, 2018)

Item	Specification
Corresponding GPU	Tesla V100 32GB (GV100)
Architecture	Volta (same core as V100)
Memory	32GB HBM2
Memory Bandwidth	900 GB/s
TDP	350W
NVLink Version	NVLink 2.0
NVLink Bandwidth	300 GB/s
Power Architecture	48V input (different from 12V)
Connector	Updated Amphenol MEG-Array (more robust)
Launch Product	DGX-2

Key changes in SXM3:

Connector Upgrade: While physically similar to SXM2, it used a more robust MEG-Array connector (different pin configuration)
48V Power Delivery: The biggest architectural change was the switch from traditional 12V to a 48V power architecture, which lowers current and significantly reduces resistive losses
Vicor Modules: Introduced Vicor MCM/MCD (Modular Current Multiplier / Modular Current Driver) power modules, which became the standard design in the subsequent SXM4/5/6
TDP Increase: Raised from 300W to 350W, laying the foundation for higher-power GPUs

SXM3 was also the first time NVIDIA shipped standardized HGX baseboards to OEMs. OEMs could directly purchase pre-assembled 4-GPU baseboards, greatly reducing integration complexity.

A historical note on SXM numbering: sources disagree on what to call the first generation. The P100 module itself carried no suffix number, yet it was referred to as “SXM2” specification, and the P100 socket in DGX-1 was described as SXM-2 specification (which is why it could be upgraded to V100’s SXM2 module). Sources like Wikipedia mark the socket used by P100 as SXM, while marking the P100 module itself as SXM2. For consistency, this document numbers by generation: P100 is SXM1 (first generation), V100 16GB is SXM2, and V100 32GB is SXM3.

2.4 SXM4 (Ampere A100, 2020)

Item	Specification
Corresponding GPU	A100 40GB / 80GB (GA100)
Architecture	Ampere
Memory	40GB HBM2 / 80GB HBM2e
Memory Bandwidth	1.6 TB/s (HBM2) / 2.0 TB/s (HBM2e)
TDP	400W
NVLink Version	NVLink 3.0
NVLink Bandwidth	600 GB/s (12 links, 50 GB/s per link)
Launch Product	DGX A100
Process Node	TSMC N7
Transistor Count	54.2 billion
MIG	Supported (up to 7 instances)
NVSwitch	2nd Gen

A100 was the first GPU to support MIG (Multi-Instance GPU), capable of partitioning a single GPU into up to 7 independent instances. HGX A100 baseboards came in a 4-GPU design (codenamed Redstone) and an 8-GPU design with NVSwitch (codenamed Delta); DGX A100 uses the 8-GPU baseboard.

NVLink 3.0 provided 600 GB/s bidirectional bandwidth per GPU, working with NVSwitch to achieve a fully-interconnected topology for 8 GPUs.
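To illustrate why a switch matters, the following minimal sketch compares the bandwidth available between one pair of GPUs when links are wired point-to-point versus routed through a non-blocking switch. It assumes links are split evenly among peers in the direct case and that a switch lets any pair use all of a GPU's links; both are simplifications, and the function names are illustrative rather than part of any NVIDIA tool.

```python
def direct_pair_bandwidth(num_gpus: int, links_per_gpu: int, gb_per_link: float) -> float:
    """Bandwidth between one GPU pair when links are wired point-to-point.

    Assumes links are spread evenly over the other GPUs. Returns 0.0 when
    there are too few links to reach every peer directly.
    """
    peers = num_gpus - 1
    links_per_peer = links_per_gpu // peers
    return links_per_peer * gb_per_link


def switched_pair_bandwidth(links_per_gpu: int, gb_per_link: float) -> float:
    """Bandwidth between one GPU pair when every link goes to a non-blocking switch."""
    return links_per_gpu * gb_per_link


# A100 figures from the SXM4 table: 12 links at 50 GB/s each, 8 GPUs per system.
a100_direct = direct_pair_bandwidth(8, 12, 50.0)
a100_switched = switched_pair_bandwidth(12, 50.0)
print(f"A100 direct all-to-all: {a100_direct} GB/s per pair")
print(f"A100 via NVSwitch:      {a100_switched} GB/s per pair")

# P100 figures from the SXM1 table: 4 links cannot reach 7 peers directly.
print(f"P100 direct all-to-all: {direct_pair_bandwidth(8, 4, 40.0)} GB/s per pair")
```

Under these assumptions, a direct all-to-all wiring of 8 A100s leaves a single link per pair, while the switched design lets one pair use the full per-GPU bandwidth when the other GPUs are idle.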

2.5 SXM5 (Hopper H100/H200, 2022/2023)

Item	Specification
Corresponding GPU	H100 (GH100) / H200
Architecture	Hopper
Memory	80GB HBM3 (H100) / 141GB HBM3e (H200)
Memory Bandwidth	3.35 TB/s (H100) / 4.8 TB/s (H200)
TDP	700W
NVLink Version	NVLink 4.0
NVLink Bandwidth	900 GB/s (18 links, 50 GB/s per link)
Launch Product	DGX H100
Process Node	TSMC 4N
Transistor Count	80 billion
New Features	Transformer Engine, FP8, DPX instruction set

SXM5 is the most widely deployed SXM specification to date. Key breakthroughs of H100 SXM5:

NVLink 4.0 Bandwidth: 900 GB/s bidirectional, about 7 times that of PCIe 5.0 x16 (64 GB/s per direction, 128 GB/s bidirectional)
900W Power Delivery Capability: Although nominal TDP is 700W, the SXM5 socket’s power delivery capability can reach 900W
Transformer Engine: Mixed-precision matrix operation unit designed specifically for large language models (LLMs)
Third-Generation NVSwitch: 4 NVSwitch chips fully interconnect 8 GPUs, with aggregate bidirectional bandwidth of 7.2 TB/s (8 × 900 GB/s)
H200 Update: Same SXM5 socket, but memory upgraded to 141GB HBM3e, bandwidth 4.8 TB/s
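The bandwidth figures in this section follow from simple arithmetic on link counts. The sketch below recomputes the per-GPU NVLink totals from the generation tables, then compares the H100 figure with PCIe 5.0 x16 counted both per direction and bidirectionally, since the ratio depends on which PCIe figure is used. The 64 GB/s value is the approximate per-direction figure quoted in this document; the dictionary and function names are illustrative.

```python
# (links per GPU, GB/s per link, bidirectional) as listed in the generation tables.
NVLINK_GENERATIONS = {
    "NVLink 1.0 (P100)": (4, 40),
    "NVLink 2.0 (V100)": (6, 50),
    "NVLink 3.0 (A100)": (12, 50),
    "NVLink 4.0 (H100)": (18, 50),
}

PCIE5_X16_PER_DIRECTION_GBS = 64  # approximate; bidirectional is twice this


def nvlink_total_gbs(links: int, gbs_per_link: int) -> int:
    """Total bidirectional NVLink bandwidth per GPU in GB/s."""
    return links * gbs_per_link


totals = {name: nvlink_total_gbs(*spec) for name, spec in NVLINK_GENERATIONS.items()}
for name, total in totals.items():
    print(f"{name}: {total} GB/s")

h100 = totals["NVLink 4.0 (H100)"]
ratio_vs_one_direction = h100 / PCIE5_X16_PER_DIRECTION_GBS
ratio_vs_bidirectional = h100 / (2 * PCIE5_X16_PER_DIRECTION_GBS)
aggregate_8_gpu_tbs = 8 * h100 / 1000  # all 8 GPUs driving their links at once

print(f"H100 NVLink vs PCIe 5.0 x16 (one direction): {ratio_vs_one_direction:.1f}x")
print(f"H100 NVLink vs PCIe 5.0 x16 (bidirectional): {ratio_vs_bidirectional:.1f}x")
print(f"8-GPU aggregate NVLink bandwidth: {aggregate_8_gpu_tbs:.1f} TB/s")
```

Comparing like with like (bidirectional against bidirectional) gives the smaller ratio; the larger one mixes a bidirectional NVLink figure with a one-direction PCIe figure.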

SXM5 power delivery remains based on Vicor’s 48V architecture, using Vicor MCM/MCD modules to convert 48V to GPU core voltage.

2.6 SXM6 (Blackwell B200/B300, 2024/2025)

Item	Specification
Corresponding GPU	B200 / B300 (GB100/GB300)
Architecture	Blackwell / Blackwell Ultra
Memory	192GB HBM3e (B200) / 288GB HBM3e (B300)
Memory Bandwidth	8 TB/s (B200) / 8 TB/s (B300)
TDP	1000W-1200W (B200) / 1400W (B300)
NVLink Version	NVLink 5.0
NVLink Bandwidth	1.8 TB/s
Process Node	TSMC 4NP
Transistor Count	208 billion (dual-die design)
Launch Product	DGX B200 / GB200 NVL72

SXM6 represents the biggest leap in SXM history:

Dual-Die Packaging: B200 is composed of two GB100 dies connected by NVIDIA’s NV-HBI die-to-die interface (10 TB/s) in a CoWoS-L package; each die contributes 104 billion of the 208 billion transistors
Power Explosion: 1000W (air-cooled) / 1200W (liquid-cooled) — the power consumption of a single B200 GPU exceeds the total system power of many home PCs
B300 Goes Further: 1400W TDP, 288GB HBM3e (12-layer stack), 15 PFLOPS FP4
NVLink 5.0: Bidirectional 1.8 TB/s, supporting SHARP v4 in-network reduction
Liquid Cooling Becomes Mandatory: B200 at 1000W TDP can still be air-cooled, but B300 at 1400W essentially requires liquid cooling
Socketed Design Rumors: In October 2024, TrendForce reported that B300 may be the first to adopt a socketed design, allowing the GPU module to be user-replaceable (no longer soldered to the baseboard)

B200 vs B300 Difference: B200 (dual-die GB100, 192GB HBM3e, 1000W) shipped in H2 2024. B300 (Blackwell Ultra, 288GB HBM3e, 1400W) ships in H2 2025.

2.7 SXM7 (Rubin R100, 2026)

Item	Specification
Corresponding GPU	R100 (GR100)
Architecture	Rubin
Memory	288GB HBM4
Memory Bandwidth	To be announced
TDP	Expected 1500W+
NVLink Version	NVLink 6.0
NVLink Bandwidth	Over 2 TB/s
Process Node	To be announced (TSMC N3 series)
Launch Product	Vera Rubin NVL144

Rubin is NVIDIA’s next-generation GPU architecture after Blackwell, named after astronomer Vera Rubin:

Two GR100 dies in one SXM7 socket, joined through an RDL (Re-Distribution Layer) that assembles the GPU dies and HBM memory on a single interposer substrate
First GPU to use HBM4 (JEDEC HBM4 standard)
Launched alongside NVLink 6.0, with GPU-to-GPU bandwidth exceeding 2 TB/s
Vera Rubin NVL144 platform fully interconnects 144 Rubin GPUs via NVLink 6

2.8 SXM8 (Rubin Ultra, 2027)

Item	Specification
Corresponding GPU	VR200 or Rubin Ultra
Memory	1TB HBM4/HBM4e
FP4 Performance	100 PFLOPS
TDP	Expected 2000W+
NVLink Version	NVLink 6 / 7
Launch Product	Vera Rubin Ultra NVL576

Rubin Ultra places 4 GPU dies, joined through an RDL, into a single socket, achieving 100 PFLOPS FP4 and 1TB of memory. The NVL576 platform will connect 576 GPUs.

2.9 Feynman (2028+)

NVIDIA announced the Feynman architecture (named after physicist Richard Feynman) at GTC 2026, which will be the next major evolution of SXM:

3D Stacking: First adoption of vertical GPU die stacking design, breaking through single-die area limitations
Custom HBM (C-HBM4E): Custom high-bandwidth memory, exceeding 1TB per GPU
Optical NVLink: On-die integrated optical interconnect, eliminating copper cables in data centers
Paired with Rosa CPU: Feynman GPU paired with Rosa CPU (successor to Vera) forming a superchip
Integrated Groq LPU: Groq LP40 engine will join the NVLink port

It remains undetermined what SXM designation Feynman will use (SXM9 or a next-generation interconnect solution), but it is expected to push data center interconnects from copper toward optical.

3. Physical Interface & Electrical Specifications

3.1 Connector: Amphenol MEG-Array

All SXM modules use Amphenol MEG-Array mezzanine connectors. This is the core component of the SXM physical layer.

Characteristic	Description
Manufacturer	Amphenol Communications Solutions
Series	MEG-Array (Mezzanine Grid Array)
Pin Pitch	1.27mm × 1.27mm array
Signal Speed	Over 10 Gb/s
Soldering Method	Surface Mount (SMT)
Key Features	High density, high speed, reliability superior to PCIe gold fingers

MEG-Array is a dual-sided array connector; the GPU mezzanine card and baseboard each have one half. Features include:

Flexible ground distribution design to optimize signal integrity
Supports high-speed signal transmission exceeding 10 Gb/s
Large-scale array structure providing high-density connections
Standard surface mount process reduces manufacturing costs

Pin Count by Generation:

SXM2: 400-pin Amphenol MEG-Array
SXM3/SXM4/SXM5: 400-600 pins (exact count is NVIDIA confidential), dual high-density arrays

Regarding Specification Confidentiality: The precise pin definitions, dimensions, and signal assignments of SXM connectors are NVIDIA’s trade secrets. As one engineer stated on the NVIDIA Developer Forums: “The specifications for SXM2, SXM3, SXM4, SXM5 connectors seem to be a trade secret, and you cannot discuss it unless all involved parties are PCI-SIG members.”

3.2 Power Architecture

The core evolutionary path of SXM power architecture is from 12V to 48V, along with the introduction of Vicor high-efficiency power modules.

48V Power Topology (SXM3+)

SXM2 and earlier GPUs used standard 12V input. Starting from SXM3 (V100 32GB), NVIDIA switched to a 48V power architecture:

DC-DC Conversion Chain:
48V PSU → Vicor BCM (48V → 12V intermediate bus) → MCD (drives and regulates the MCM) → MCM (12V → GPU core voltage)

Vicor Module Deep Dive:

BCM (Bus Converter Module): Responsible for efficiently converting the PSU’s 48V to an intermediate bus voltage
MCM (Modular Current Multiplier): GPU core voltage regulation module, converting 12V to the low-voltage, high-current supply required by the GPU core
MCD (Modular Current Driver): Driver module working in conjunction with the MCM

Advantages of this architecture:

48V Transmission Reduces Current: At the same power, 48V current is only 1/4 that of 12V, so resistive (I²R) losses in the same conductors fall to roughly 1/16, significantly reducing PCB copper losses and heat generation
Higher Power Efficiency: Vicor module conversion efficiency is typically above 95%
Modular Design: Standardized power modules facilitate scaling to high-power GPUs
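The first two advantages can be put into numbers. The sketch below computes the relative current and relative resistive loss when moving from 12V to 48V at constant power and constant conductor resistance, and the end-to-end efficiency of a cascaded converter chain. The two-stage chain and the 0.95 per-stage value are assumptions for illustration (the text only says efficiency is typically above 95%), not measured Vicor figures.

```python
def relative_current(v_low: float, v_high: float) -> float:
    """Current at v_high relative to current at v_low for the same power (I = P / V)."""
    return v_low / v_high


def relative_copper_loss(v_low: float, v_high: float) -> float:
    """I^2 * R loss at v_high relative to v_low, for the same power and resistance."""
    return relative_current(v_low, v_high) ** 2


def chain_efficiency(*stage_efficiencies: float) -> float:
    """End-to-end efficiency of converter stages connected in series."""
    total = 1.0
    for efficiency in stage_efficiencies:
        total *= efficiency
    return total


print(f"Current at 48V vs 12V:     {relative_current(12, 48):.4f}")
print(f"Copper loss at 48V vs 12V: {relative_copper_loss(12, 48):.4f}")

# Assumed: two stages at 95% each (hypothetical values).
print(f"Two-stage chain efficiency: {chain_efficiency(0.95, 0.95):.4f}")
```

Because losses scale with the square of current, the reduction in copper loss is larger than the reduction in current itself; cascading stages, on the other hand, multiplies their efficiencies, which is one reason to keep the conversion chain short.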

Power Evolution by Generation

Generation	GPU	TDP	Power Architecture	Typical Power Solution
SXM1	P100	300W	12V	Standard VRM
SXM2	V100 16GB	300W	12V	Standard VRM
SXM3	V100 32GB	350W	48V Vicor	Vicor BCM + MCM/MCD
SXM4	A100	400W	48V Vicor	Vicor MCM/MCD
SXM5	H100	700W	48V Vicor	Vicor Enhanced MCM/MCD
SXM5	H200	700W	48V Vicor	Same as above
SXM6	B200	1000-1200W	48V Vicor + Liquid Cooling	High-Power Vicor Modules
SXM6	B300	1400W	48V Vicor + Mandatory Liquid Cooling	Enhanced Power Delivery
SXM7	R100	1500W+	To be announced	Higher Density Power Delivery

Taking the 1000W TDP B200 as an example, under the 48V power architecture, only about 21A of current is required; if using traditional 12V, 83A would be needed — this would require extremely thick power cables and massive PCB copper layers.
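The same calculation can be repeated for each generation in the table above. This is a minimal sketch that treats TDP as the input power and ignores conversion losses, so real input currents would be somewhat higher; the names are illustrative.

```python
# TDP values (watts) from the "Power Evolution by Generation" table.
TDP_WATTS = {
    "P100": 300,
    "V100 32GB": 350,
    "A100": 400,
    "H100": 700,
    "B200": 1000,
    "B300": 1400,
}


def input_current_amps(power_w: float, bus_voltage: float) -> float:
    """Input current for a given power and bus voltage (I = P / V), losses ignored."""
    return power_w / bus_voltage


for gpu, tdp in TDP_WATTS.items():
    amps_12v = input_current_amps(tdp, 12)
    amps_48v = input_current_amps(tdp, 48)
    print(f"{gpu:10s} {tdp:5d} W   12V: {amps_12v:6.1f} A   48V: {amps_48v:5.1f} A")
```

For the 1000W B200 this reproduces the figures quoted above: roughly 21A at 48V versus roughly 83A at 12V.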

4. Complete SXM Specification Table

Specification	SXM1 (P100)	SXM2 (V100 16G)	SXM3 (V100 32G)	SXM4 (A100)	SXM5 (H100)	SXM6 (B200)	SXM6 (B300)	SXM7 (R100)
Release Date	Q2 2016	Q3 2017	Q3 2018	Q1 2020	Q3 2022	Q4 2024	H2 2025	2026
Architecture	Pascal	Volta	Volta	Ampere	Hopper	Blackwell	Blackwell Ultra	Rubin
GPU Core	GP100	GV100	GV100	GA100	GH100	GB100×2	GB300×2	GR100×2
Transistors	15.3B	21.1B	21.1B	54.2B	80B	208B	-	-
Process Node	16FF+	12FFN	12FFN	N7	4N	4NP	4NP	N3?
HBM Type	HBM2	HBM2	HBM2	HBM2/HBM2e	HBM3	HBM3e	HBM3e	HBM4
Memory Capacity	16GB	16GB	32GB	40/80GB	80GB	192GB	288GB	~288GB
Memory Bandwidth	720GB/s	900GB/s	900GB/s	1.6-2.0TB/s	3.35TB/s	8TB/s	8TB/s	-
TDP	300W	300W	350W	400W	700W	1000-1200W	1400W	1500W+
NVLink Version	1.0	2.0	2.0	3.0	4.0	5.0	5.0	6.0
NVLink Bandwidth	160GB/s	300GB/s	300GB/s	600GB/s	900GB/s	1.8TB/s	1.8TB/s	>2TB/s
Power Architecture	12V	12V	48V Vicor	48V Vicor	48V Vicor	48V Vicor	48V Vicor	-
Cooling	Air	Air	Air	Air	Air/Liquid	Air/Liquid	Liquid Only	Liquid Only
FP64 (TFLOPS)	5.3	7.8	7.8	9.7	34	40	-	-
FP32 (TFLOPS)	10.6	15.7	15.7	19.5	67	-	-	-
FP16 Tensor (TFLOPS)	-	125	125	312	990	-	-	-
FP8 (TFLOPS)	-	-	-	-	1,979	4,500	7,000	-
FP4 (TFLOPS)	-	-	-	-	-	9,000	15,000	-
MIG	❌	❌	❌	✅ (7 instances)	✅ (7 instances)	✅	✅	✅
Connector	-	400-pin MEG-Array	Updated MEG-Array	MEG-Array	MEG-Array	MEG-Array	MEG-Array	-
Launch Platform	DGX-1	DGX-1 V100	DGX-2	DGX A100	DGX H100	DGX B200	GB300 NVL72	VR NVL144
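One way to read the table is throughput per watt rather than raw throughput. The sketch below divides the FP8 figures by TDP for the generations that list both. It uses the table values as given (with the 1000W air-cooled figure for B200), so the results inherit any uncertainty in those entries; the names are illustrative.

```python
# (FP8 TFLOPS, TDP in watts) taken from the specification table above.
FP8_AND_TDP = {
    "H100": (1979, 700),
    "B200": (4500, 1000),  # assumes the 1000W air-cooled TDP
    "B300": (7000, 1400),
}


def tflops_per_watt(tflops: float, watts: float) -> float:
    """Peak throughput divided by TDP; a rough efficiency indicator only."""
    return tflops / watts


for gpu, (fp8_tflops, tdp_w) in FP8_AND_TDP.items():
    print(f"{gpu}: {tflops_per_watt(fp8_tflops, tdp_w):.2f} FP8 TFLOPS/W")
```

Peak figures divided by TDP are only a coarse indicator, but they show that throughput has grown faster than power across these three parts.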

5. Future Roadmap

2026: Rubin / Vera

Rubin R100 (SXM7)
  ├─ Dual GR100 dies
  ├─ HBM4 288GB
  ├─ NVLink 6.0 >2TB/s
  └─ Vera Rubin NVL144 (144 GPU fully interconnected)

The Vera Rubin platform tightly integrates the Vera CPU, the successor to the Grace CPU, with the Rubin GPU. The Vera CPU communicates with the Rubin GPU over the NVLink Chip-to-Chip (C2C) protocol, with 1.8 TB/s of bandwidth.

2027: Rubin Ultra

Rubin Ultra (SXM8)
  ├─ Four GR100 dies (4 RDL package)
  ├─ 1TB HBM4/HBM4e
  ├─ 100 PFLOPS FP4
  └─ Vera Rubin Ultra NVL576 (576 GPU)

Rubin Ultra’s 4-die package extends the multi-die approach to keep performance scaling as the gains from process shrinks (Moore’s Law) slow. The NVL576, with 576 GPUs, will be the largest single-domain GPU cluster to date.

2028: Feynman + Rosa

Feynman (SXM9? / New Interconnect Solution)
  ├─ 3D stacked GPU dies
  ├─ C-HBM4E custom memory >1TB
  ├─ Optical NVLink (on-die integrated optical engine)
  ├─ Paired with Rosa CPU
  └─ Integrated Groq LP40 (NVLink + NVFP4)

Feynman’s Optical NVLink would be a major change in data center interconnects: instead of using copper cables for high-speed signal transmission, optical engines are integrated directly into the GPU package. This is intended to address the signal integrity, power consumption, and distance limitations of copper cables at ultra-high bandwidths.

6. References

Wikipedia - SXM (socket): https://en.wikipedia.org/wiki/SXM_(socket)
Grokipedia - SXM (socket): https://grokipedia.com/page/SXM_(socket)
NVIDIA HGX Platform: https://www.nvidia.com/en-us/data-center/hgx/
NVIDIA DGX Systems: https://www.nvidia.com/en-us/data-center/dgx-b200/
Exxact Corp - SXM vs PCIe: https://www.exxactcorp.com/blog/deep-learning/sxm-vs-pcie-gpus-best-for-training-llms-like-gpt-4
l4rz - Running SXM GPUs in Consumer PCs: https://github.com/l4rz/running-nvidia-sxm-gpus-in-consumer-pcs
Amphenol MEG-Array: https://www.amphenol-cs.com/product-series/meg-array.html
Reverse Engineering SXM2: https://bbenchoff.github.io/pages/SXM2PCIe.html
NVIDIA Developer Forums - SXM Specs: https://forums.developer.nvidia.com/t/s-x-m-specifications/238960
Tom’s Hardware - B300 Socket: https://www.tomshardware.com/tech-industry/artificial-intelligence/nvidia-reportedly-mulls-socketed-design-for-blackwell-b300-ai-gpus
The Next Platform - NVIDIA Roadmap: https://www.nextplatform.com/compute/2025/03/19/nvidia-draws-gpu-system-roadmap-out-to-2028/1653528
Tom’s Hardware - Vera Rubin: https://www.tomshardware.com/pc-components/gpus/nvidias-vera-rubin-platform-in-depth
TrendForce - NVIDIA Socket Design: https://www.trendforce.com/news/2024/10/11/news-nvidia-rumored-to-switch-to-gpu-socket-design-with-300-series
Wikipedia - Feynman microarchitecture: https://en.wikipedia.org/wiki/Feynman_(microarchitecture)
Tom’s Hardware - Feynman Details: https://www.tomshardware.com/pc-components/gpus/nvidia-updates-data-center-roadmap-with-rosa-cpu-and-stacked-feynman-gpus
Lenovo Press - HGX B200 Guide: https://lenovopress.lenovo.com/lp2226-thinksystem-nvidia-b200-180gb-1000w-gpu
FiberMall - HGX B200 Cooling: https://www.fibermall.com/blog/nvidia-hgx-b200-cooling-solution.htm
NVIDIA Technical Blog - HGX H100: https://developer.nvidia.com/blog/introducing-nvidia-hgx-h100-an-accelerated-server-platform-for-ai-and-high-performance-computing/
NVIDIA Datasheet - DGX B200: https://resources.nvidia.com/en-us-dgx-systems/dgx-b200-datasheet
NVIDIA - NVLink & NVSwitch: https://www.nvidia.com/en-us/data-center/nvlink/
