AI Server #5 - NVIDIA DGX B200 AI Appliance

The sixth-generation DGX datacentre AI appliance is built around the Blackwell architecture and the flagship B200 accelerator, providing unprecedented training and inferencing performance in a single system. The DGX B200 includes 400Gb/s ConnectX-7 SmartNICs and BlueField-3 DPUs for connecting to external storage.

What was the goal

The NVIDIA DGX B200 AI Appliance is designed as a comprehensive platform for businesses aiming to enhance their artificial intelligence (AI) capabilities. Its primary goal is to facilitate the development and deployment of AI applications. The system is strongest at AI inferencing, while also offering excellent performance for data preparation and training.

Overall evaluation

The NVIDIA DGX B200 AI Appliance excels in delivering high-performance computing tailored for modern AI workloads. Its architectural advancements, combined with significant compute power and energy efficiency, position it favorably against competitors.

Figure 1 - NVIDIA DGX B200 AI Appliance

Specification

Architecture: Blackwell
GPUs: 8x NVIDIA Blackwell GPUs
GPU memory: 1,440GB total; 64TB/s HBM3e bandwidth
Performance: 72 petaFLOPS FP8 training and 144 petaFLOPS FP4 inference
NVIDIA NVSwitch: 2x
System power usage: 14.3kW max
CPU: 2x Intel® Xeon® Platinum 8570 processors; 112 cores total; 2.1GHz base, 4GHz max boost
System memory: 2TB, configurable to 4TB
Networking: 4x OSFP ports serving 8x single-port NVIDIA ConnectX-7 VPI, up to 400Gb/s InfiniBand/Ethernet; 2x dual-port QSFP112 NVIDIA BlueField-3 DPUs, up to 400Gb/s InfiniBand/Ethernet
Management network: 10Gb/s onboard NIC with RJ45; 100Gb/s Ethernet NIC; host baseboard management controller (BMC) with RJ45
Storage: 2x 1.92TB NVMe M.2 (OS); 8x 3.84TB NVMe U.2 (internal)
OS: Ubuntu Linux
Dimensions: height 17.5in (444mm); width 19.0in (482.2mm); length 35.3in (897.1mm)
Operating temperature: 5–30°C (41–86°F)
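
As a quick way to verify the GPU inventory above from inside the appliance, the sketch below enumerates the visible devices and sums their memory. It is a minimal illustration, assuming PyTorch is installed (as it is in NVIDIA's NGC containers); the exact device names reported will depend on the driver and hardware revision.

    import torch

    # Enumerate the GPUs the driver exposes and add up their memory.
    # On a DGX B200 this should report eight Blackwell GPUs totalling
    # roughly 1,440GB.
    total_bytes = 0
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        total_bytes += props.total_memory
        print(f"GPU {i}: {props.name}, {props.total_memory / 1e9:.0f}GB")

    print(f"{torch.cuda.device_count()} GPUs, "
          f"{total_bytes / 1e12:.2f}TB aggregate GPU memory")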

Key takeaways

  • High performance: The DGX B200 delivers 72 petaFLOPS of FP8 training and 144 petaFLOPS of FP4 inference compute from its eight Blackwell GPUs, a significant step up in training and inference capability over its predecessor, the DGX H100, which remains impressive but does not match the B200's advancements.
  • Physical size: This GPU appliance is relatively compact for the compute it delivers and has favorable power consumption; the Cerebras CS-3, by contrast, requires a larger physical footprint.
  • Memory: The DGX B200 features 1,440GB (roughly 1.4TB) of aggregated GPU memory, which supports large-scale model training and inference efficiently. This is crucial for handling complex AI workloads: GPU memory of this size lets you work with the largest AI models, including Llama 405B or NEAR Protocol's 1.4T-parameter model (see the back-of-the-envelope check after this list).
  • Architecture: The B200 utilizes the NVIDIA Blackwell architecture, which includes advanced features such as the second-generation Transformer Engine and enhanced Tensor Cores. This architecture is what enables the system's 144 petaFLOPS of FP4 performance, making it highly efficient for language-model inference. (Note that the related GB200 superchip combines two B200 GPUs with a Grace CPU, promising up to 30 times the performance for specific workloads while significantly reducing energy consumption.)
  • Memory link: The DGX B200 employs fifth-generation NVLink and NVIDIA Quantum InfiniBand, enabling high-speed data transfer and improved communication between GPUs. This is essential for maximizing performance in multi-GPU setups; the bandwidth sketch after this list shows one way to exercise the interconnect.
  • Power consumption: The DGX B200 is designed with power efficiency in mind, consuming approximately 14.3 kW for its eight-GPU configuration. This is competitive when considering its high performance output.
  • Cost consideration: NVIDIA's pricing suggests that while the B200 offers cutting-edge technology, costs could be substantial, with estimates around $40,000 per GPU. However, this investment may be justified by the performance gains and operational efficiencies it provides. NVIDIA reports that the total cost of ownership (TCO) for Blackwell-based systems can be reduced by up to 25 times compared with older H100-based systems, owing to improved efficiency and performance.
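
The memory claim above is easy to sanity-check with back-of-the-envelope arithmetic. The sketch below compares weights-only memory for the two models mentioned against the system's 1,440GB of aggregate GPU memory; the parameter counts are taken from the bullet above, and note that KV cache, activations, and optimizer state add substantial overhead on top of the weights.

    # Weights-only memory for a model: parameters x bytes per parameter.
    PARAMS = {"Llama 405B": 405e9, "NEAR 1.4T": 1.4e12}
    BYTES_PER_PARAM = {"FP16": 2, "FP8": 1, "FP4": 0.5}
    TOTAL_GPU_MEM_GB = 1440  # DGX B200 aggregate GPU memory

    for model, n in PARAMS.items():
        for precision, b in BYTES_PER_PARAM.items():
            gb = n * b / 1e9
            verdict = "fits" if gb <= TOTAL_GPU_MEM_GB else "does not fit"
            print(f"{model} @ {precision}: {gb:,.0f}GB of weights -> {verdict}")

For example, Llama 405B needs about 810GB at FP16, comfortably inside the 1,440GB budget, while a 1.4T-parameter model only fits once quantized to FP8 or below.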

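To see the NVLink/NVSwitch fabric in action, the sketch below times an NCCL all-reduce across all eight GPUs using PyTorch's torch.distributed package, which routes intra-node collectives over NVLink. It is a minimal illustration, not a rigorous benchmark; launch it with torchrun --nproc_per_node=8 and the filename you save it under.

    import os
    import torch
    import torch.distributed as dist

    def main():
        # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        # A 1GiB FP32 tensor per GPU; on a DGX the all-reduce traffic
        # rides the NVLink/NVSwitch fabric rather than PCIe.
        x = torch.ones(256 * 1024 * 1024, dtype=torch.float32, device="cuda")

        for _ in range(5):  # warm-up iterations
            dist.all_reduce(x)
        torch.cuda.synchronize()

        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        iters = 20
        start.record()
        for _ in range(iters):
            dist.all_reduce(x)
        end.record()
        torch.cuda.synchronize()

        if dist.get_rank() == 0:
            ms = start.elapsed_time(end) / iters
            print(f"all-reduce of 1GiB averaged {ms:.2f} ms per iteration")
        dist.destroy_process_group()

    if __name__ == "__main__":
        main()
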
More information