When nVidia announced at Computex in Taipei that it had stitched 256 Grace Hopper Superchips together to make the DGX GH200, the announcement was driven by one thing: the increasing demands of AI.
It's been pretty clear for a while that nVidia's focus of late has been on AI. It dominates datacenter-scale AI by such a large margin that it's easy to forget there are any other companies providing products in that space. That also explains why nVidia doesn't seem particularly worried about the new AMD and Intel GPUs, even though they're gaining market share.
Several years ago, nVidia acquired Mellanox, a company that makes high-performance networking and interconnect solutions. It combined that technology with a revolutionary datacenter GPU, the A100 AI accelerator, so powerful that nVidia pitched a single rack module as a replacement for an entire datacenter's worth of CPU servers.
Its successor, the Hopper-based H100, was nearly as big a boost.
Both, however, shared a limitation common to all GPUs: memory capacity. A GPU with 48GB or even 96GB of memory sounds capacious, but that hasn't come close to keeping up with the growth in the size of AI datasets.
To understand why this matters, and why the company felt the need to create the DGX GH200, first we need a basic understanding of how AI works.
AI training 101
AI is designed to function the way humans do. That's precisely why many current inference engines are based on neural networks: they're modelled after the way our own brains work. Our brains are networks of neurons that interact with each other via chemical signals, and knowledge is encoded in the way those neurons are connected. Training a human (or, say, a cat) involves creating new connections between neurons. The connections are not binary; they have weights that reflect the strength of the connection. So each virtual neuron carries a vector of weights, one for its connection to every other neuron in the network, and a connection that doesn't exist is simply modelled as a weight of zero.
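That weighted-connection idea fits in a few lines of Python. This is an illustrative toy with made-up numbers, not how any production framework stores its weights:

```python
import math

# Three upstream neurons feeding one downstream neuron.
# Each weight is the strength of a connection; 0.0 means "no connection".
inputs = [0.9, 0.2, 0.5]      # activations of the upstream neurons
weights = [0.8, 0.0, -0.4]    # learned connection strengths

# The neuron fires based on the weighted sum of its inputs,
# squashed through an activation function (a sigmoid here).
weighted_sum = sum(i * w for i, w in zip(inputs, weights))
activation = 1.0 / (1.0 + math.exp(-weighted_sum))
print(round(activation, 3))   # prints 0.627
```

Note that the middle weight of 0.0 makes the second input contribute nothing, exactly as if that connection didn't exist.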
This is the basis for how artificial neural networks work as well. The two biggest obstacles to building neural networks have been computing power and memory. Adding cores to CPUs has certainly helped, but even the likes of Ampere's Altra top out at 128 cores. For a web server that's huge; for a neural net it's nothing, since even simple models require tens of thousands of virtual neurons. GPUs, however, can include thousands of compute engines. That made GPUs a natural basis for nVidia's AI accelerators, especially when combined with dedicated tensor hardware designed to accelerate the matrix math used so heavily in neural networks.
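To see why that matrix math parallelizes so well, here's a toy layer in pure Python (illustrative numbers again): each output neuron's value is an independent dot product, which is exactly the kind of work thousands of GPU cores, and tensor units especially, can do simultaneously.

```python
# A whole layer of neurons is just a matrix-vector product:
# each row of W holds one neuron's connection weights.
W = [
    [0.8, 0.0, -0.4],
    [0.1, 0.5,  0.2],
    [0.0, -0.3, 0.9],
]
x = [0.9, 0.2, 0.5]   # activations coming in from the previous layer

# Every output element is an independent dot product, so nothing stops
# a GPU from computing all of them at once.
layer_out = [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in W]
print(layer_out)
```

A real model stacks many such layers with millions of rows, which is why both the compute and the memory to hold W blow up so quickly.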
In spite of all its advancements in raw power, nVidia's GPUs continued to share one major limitation: memory. Training an AI requires data and feedback. It needs the information in order to learn, just like we do, and the feedback to determine whether or not it's learning correctly. Think of it like training a cat: when the cat does something you like, you give it a treat, and the cat will habituate to doing that thing to get more treats. That doesn't mean any of us understands what the cat is truly thinking. Knowing cats, it probably believes it's the one doing the training.
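The data-plus-feedback loop can be sketched in miniature. This toy trains a single connection weight toward a target output; the "treat" is the error signal that tells the weight which way to move (numbers are invented for illustration):

```python
# A minimal feedback loop: nudge one weight toward whatever
# reduces the error between the prediction and the desired answer.
weight = 0.0
target = 0.8          # the output we want for an input of 1.0
learning_rate = 0.5

for step in range(20):
    prediction = weight * 1.0
    error = prediction - target       # the feedback signal
    weight -= learning_rate * error   # adjust the connection strength

print(round(weight, 3))   # converges toward 0.8
```

Real training does this for billions of weights over terabytes of examples, which is where both the compute and the memory demands come from.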
As the training datasets have grown, the quality of the AI has grown with them. This is easy to see simply by comparing images generated by MidJourney from version to version.
As the training datasets reach into the terabytes, however, the GPU memory limitation becomes a huge bottleneck, so training systems rely heavily on CPUs for access to more memory to work with. Relying on CPUs introduces a computing bottleneck in turn: they have far fewer compute units, and most don't yet have dedicated AI-optimized hardware at all, never mind anything as mature as nVidia's Tensor cores have become.
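A back-of-the-envelope sketch shows why this hurts. With a dataset far larger than GPU memory, every training pass has to stream the data through the accelerator in chunks, paying a CPU-to-GPU transfer cost each time (the capacities below are hypothetical round numbers, not any specific product's spec):

```python
# Toy sketch: a dataset that fits in big, slow host memory but not in
# small, fast accelerator memory must be streamed in chunks, and every
# chunk pays a transfer cost over the CPU-GPU link.
DATASET_GB = 2000      # a terabyte-scale training set (hypothetical)
GPU_MEMORY_GB = 96     # a typical accelerator capacity (hypothetical)

def chunks_needed(dataset_gb, device_gb):
    # ceiling division: round trips over the link per training pass
    return -(-dataset_gb // device_gb)

print(chunks_needed(DATASET_GB, GPU_MEMORY_GB))   # prints 21
```

Multiply those round trips by the hundreds of passes a training run makes over its data, and the interconnect becomes the limiting factor.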
Some supercomputing basics
To understand the magnitude and importance of nVidia's solution to this problem we need a quick aside to understand basic supercomputing architecture.
There are two general approaches to supercomputing architecture: the cluster, and shared memory. Clusters are basically ordinary computers connected by high-speed networks, possibly with some Network Attached Storage (NAS). They're great for tasks that can be divided into small, independent chunks of data and processed individually, which makes them ideal for renderfarms but terrible for things like large-scale atmospheric models, where the data can't be split into discrete units. For the latter, shared-memory architectures work far better: any processor can access memory anywhere in the system, no matter which CPU it's local to, so the data never needs to be split up at all.
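Why can't an atmospheric model be split up? Because updating any one cell of the simulation needs its neighbors' values, so cutting the grid across cluster nodes forces constant cross-node communication, while a shared-memory machine sees the whole grid at once. A one-dimensional toy version of such a neighbor-dependent update (invented numbers, purely illustrative):

```python
# A tiny 1-D "atmosphere": each interior cell relaxes toward the
# average of itself and its two neighbors. Split this list across two
# machines and the cells at the cut need data from the other machine
# on every single step.
grid = [10.0, 12.0, 11.0, 15.0, 14.0]

def diffuse(cells):
    # boundary cells held fixed; interior cells need both neighbors
    return [cells[0]] + [
        (cells[i - 1] + cells[i] + cells[i + 1]) / 3.0
        for i in range(1, len(cells) - 1)
    ] + [cells[-1]]

smoothed = diffuse(grid)
print(smoothed)
```

In a renderfarm job, by contrast, each frame is independent, so no such cross-talk exists and a cluster works beautifully.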
Enter the DGX GH200
nVidia has been shipping the Grace CPU Superchip for some time now. It pairs two 72-core ARM processors in a single package with up to 480GB of on-package LPDDR5X (low-power DDR) memory. Now in production is Grace Hopper, which teams one 72-core Grace CPU with a Hopper GPU carrying 96GB of HBM3 (High Bandwidth Memory), connected by a cache-coherent NVLink-C2C link with 900GB/second of bandwidth that lets the GPU access the CPU's 480GB of LPDDR5X directly. A hierarchical NVLink switch fabric three levels deep connects 256 Grace Hopper Superchips together to form a single shared-memory supercomputer with 144 TERABYTES of system memory and a full exaflop of AI computing power.
Because the NVLink connections are cache coherent, every CPU and every GPU has full access to every address in the gigantic 144 terabytes of memory available in the supercomputer. From a programmer's point of view, it acts like one enormous GPU rather than a group of them. There's no need to spend time figuring out how to divide data across hundreds of GPUs, as long as the dataset fits in the system's 144 terabytes.
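The programming-model difference is easy to caricature. With discrete GPUs, software has to track which device owns which slice of the data; with one coherent address space, it just indexes. A toy model of that bookkeeping (hypothetical sizes, not real GPU code):

```python
# With per-GPU memory, software must map every access to the device
# that owns it; with a coherent fabric, the whole dataset behaves like
# one flat address space.
SHARD_SIZE = 4

shards = [[0, 1, 2, 3], [4, 5, 6, 7]]   # data split across two "devices"

def sharded_read(index):
    # manual bookkeeping: pick the device, then the local offset
    return shards[index // SHARD_SIZE][index % SHARD_SIZE]

unified = [0, 1, 2, 3, 4, 5, 6, 7]      # one flat, coherent view

print(sharded_read(6), unified[6])      # same data, far less bookkeeping
```

The real win isn't the indexing arithmetic, of course; it's that no data ever has to be copied between devices to satisfy an access.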
In other words, it's now possible to build a training dataset that is 140TB and execute enormous neural nets with dedicated hardware, accelerating AI training to unprecedented levels.
One of nVidia's demos used the Omniverse Avatar Cloud Engine (ACE) to showcase the power of its AI. The virtual character, a non-player character in a game, has a back story and a personality rather than a script. In other words, it writes its own script, responding to the player as if it were another player in the game. The limiting factors in training an AI are the volume of data available, the quality of that data, and the time it takes for the AI to take an input, generate an output, and incorporate the feedback. The DGX GH200 enormously increases the maximum size of the training datasets, and blows the roof off the speed of the training feedback loop.
The growth of AI
The power of AI has been growing at an unprecedented rate. While Moore's Surprisingly Long Lived Observation is petering out, AI is growing more quickly than any prior technology in human history. The training datasets are growing by one or two orders of magnitude annually, the neural nets are growing in size at a mind-boggling rate, and the sheer power of the systems behind them is growing exponentially.
Programmers tell the story in numbers, but we're not programmers here. We're storytellers. So here are some stories.
StackOverflow, a well-known forum where software developers have shared their expertise, knowledge, and experience for years, has seen its traffic drop by over a third in the last six months, because ChatGPT can deliver much of that same advice immediately rather than whenever someone with the right knowledge happens to notice the question.
As nVidia's AI in the “I Am AI” intro says, “The wisdom of a million programmers.” ChatGPT can consume vast quantities of programmers' contributions to the internet, their code, their code profiling data... so much that it would take a human programmer decades, but for ChatGPT it's six months. In another year, it will take ChatGPT six weeks.
Just months ago, MidJourney generated humans with hands that looked like H.R. Giger had designed them. Only a few months later, a photographer used MidJourney to create images that he submitted to a well-respected international photography competition, to see if the judges would catch on that they were AI-generated art rather than photographs.
Not only did the judges fail to realize the images were AI generated; he won the contest.
He turned down the award, but he also made his point: AI is here to stay and we need to learn to live with it.
Futurist Ray Kurzweil predicted that AI would exceed human intelligence by 2029. It seems like nVidia is seeking to prove him right.