NVIDIA GeForce RTX 4090 Max-Q vs NVIDIA H200 SXM 141 GB

Contents:

Memory ML Performance Compute Power Architecture & Compatibility ML Software Support Clocks & Performance Power Consumption Rendering Benchmarks Additional

Memory

Memory Size

16 GB

+781% 141 ГБ

Memory Type

GDDR6 HBM3e

Memory Bandwidth

576.0 GB/s

4.89 TB/s

Memory Bus Width

256 бит 6,144 бит

ML Performance

FP16 (Half Precision)

28.31 TFLOPS

+845% 267.6 TFLOPS

BF16 (Brain Float)

No No

TF32 (TensorFloat)

No No

Compute Power

FP32 (Single Precision)

28.31 TFLOPS

+136% 66.91 TFLOPS

FP64 (Double Precision)

0.4423 TFLOPS

+7,463% 33.45 TFLOPS

CUDA Cores

9,728

+74% 16,896

RT Cores

Architecture & Compatibility

GPU Architecture

Ada Lovelace Hopper

SM (Streaming Multiprocessor)

+74% 132

PCIe Version

PCIe 4.0 x16 PCIe 5.0 x16

ML Software Support

CUDA Version

8.9

9.0

CUDA Toolkit (first supported)

v11 v12

CUDA Toolkit status

Supported Supported

Clocks & Performance

Base Clock

930

+61% 1,500

Boost Clock

1,455

+36% 1,980

Memory Clock

+41% 2,250

1,593

Power Consumption

Recommended PSU

No 1100 W

Power Connector

None 8-pin EPS

TDP/TGP

-89% 80 W

700 W

Rendering

Texture Units (TMU)

304

+74% 528

ROP

L2 Cache

+28% 64 MB

50 MB

Benchmarks

MLPerf, llama2-70b-99.9 (UNSET)

— 3 534 tokens/s

MLPerf, llama2-70b-99.9 (fp16)

— 3 553 tokens/s

MLPerf, llama2-70b-99.9 (fp8)

— 2 444 tokens/s

MLPerf, llama3.1-405b (fp16)

— 40.8 tokens/s

MLPerf, llama3.1-405b (fp8)

— 25.3 tokens/s

MLPerf, llama3.1-8b (fp8)

— 5 161 tokens/s

MLPerf, deepseek-r1 (fp8)

— 1 113 tokens/s

MLPerf, mixtral-8x7b (fp8)

— 7 132 tokens/s

Additional

Slots

IGP

SXM Module

Release Date

Jan. 3, 2023 Nov. 18, 2024

Display Outputs

Portable Device Dependent

No outputs

NVIDIA GeForce RTX 4090 Max-Q vs NVIDIA H200 SXM 141 GB

Comparison NVIDIA GeForce RTX 4090 Max-Q with 16 GB GDDR6 and 9,728 cores vs NVIDIA H200 SXM 141 GB with 141 GB HBM3e and 16,896 cores.

H200 (141GB)