AI Server #1 - Dual Nvidia RTX 3090 GPUs
What was the goal
We built this AI server to run the Llama 3.1 70B parameter model locally for AI chat, the Qwen 2.5 model for coding, and the Flux model for AI image generation. This AI server also answers VoIP phone calls and e-mails, and conducts WhatsApp chats.
Overall evaluation
This setup is excellent for small organizations with fewer than 100 users. Such a server can run most AI models and power great automated services.
Hardware configuration
| Component | Part |
|---|---|
| CPU | Intel Core i9 14900K |
| RAM | 192GB DDR5 6000MHz |
| Storage | 2x 4TB NVMe SSD (Samsung 990 Pro) |
| CPU cooler | ARCTIC Liquid Freezer III 360 |
| GPU cooling | Air cooled (1 slot gap between GPUs) |
| GPU | 2x Nvidia RTX 3090 Founders Edition, 24GB VRAM each |
| Case | Antec Performance 1 FT White full tower (8 expansion slots!) |
| Motherboard | Asus ROG Maximus Z790 Dark Hero |
| PSU | Corsair AX1500i |
| Operating system | Windows 11 Pro |
What we learnt while building this server
- CPU: The Intel Core i9 14900K is essentially the same CPU as the Intel Core i9 13900K; only the name changed, and every parameter and the performance are the same. Although we ended up using the 14900K, we picked the 13900K for our other builds. Originally we purchased the Intel Core i9 14900KF, which we had to replace with the Intel Core i9 14900K. The difference between the two is that the 14900KF has no built-in GPU. This was a problem, because driving the computer screen reduced the amount of GPU RAM available for AI models. By plugging the monitor into the motherboard's HDMI port, which is served by the GPU built into the 14900K, all of the VRAM of the Nvidia video cards became available for AI execution.
- CPU cooling: Air cooling was not sufficient for the CPU. We had to replace the original air cooler with a water cooler, because the CPU kept shutting down under high load.
- RAM: We populated all 4 RAM slots in this system, and we discovered that this setup is slower than using only 2. A system with 2x 48GB DDR5 modules achieves higher RAM speed, because with two modules the memory can be overclocked to the higher speeds offered by the XMP profiles in the BIOS. We ended up keeping the 4 modules because we did some memory-intensive work (we were analyzing LLM files around 70GB in size, which had to fit into RAM twice). Unless you do RAM-intensive work, you don't need 4x 48GB RAM; most of the work is done by the GPU, so system memory is rarely used. In our other builds we went for 2x 48GB instead of 4x 48GB.
- SSD: We used RAID0 in this system. The RAID0 configuration in the BIOS gave us a single 8TB drive (the capacities of the two 4TB SSDs added together), and loading large models was faster. Windows installation was a bit more difficult, because a storage driver had to be loaded during setup. However, the RAID0 array lost its contents during a BIOS reset and we had to reinstall the system. In the following builds we used a single 4TB SSD and did not set up RAID0.
- Case: We had to select a full tower case with 8 expansion slots in the back. It was difficult to find a suitable one, as most PC cases only have 7 slots, which is not enough for two air-cooled GPUs. The case we selected is beautiful, but also very heavy because of the glass panels and the thicker steel framing. Although it is difficult to carry, we like this case very much.
- GPU: We tested this system with 2 Nvidia RTX 4090 and 2 Nvidia RTX 3090 GPUs. The 2 RTX 3090s offered nearly the same speed as the 2 RTX 4090s when we ran AI models on them, so in our other builds our customers selected the lower-cost RTX 3090. We also learned that it is much better to have 1 GPU with large VRAM than 2 GPUs: an Nvidia RTX A6000 with 48GB VRAM is a better choice than 2 RTX 3090s with 2x 24GB. A single GPU consumes less power, is easier to cool, makes selecting a motherboard and case easier, and the number of PCIe lanes in the i9 14900K only allows 1 GPU to run at its full potential.
- GPU cooling: Each Nvidia RTX 3090 FE takes up 3 slots; 1 slot is needed between the cards and 1 below the second one for airflow. We also learnt that air cooling is sufficient for this setup. Water cooling is more complicated, more expensive, and a pain when you want to replace the GPUs.
- Motherboard: It is important to pick a motherboard whose two GPU PCIe slots are exactly 4 slots apart, so the two GPUs fit with one slot of cooling space between them. The speed of the PCIe slots must also be investigated before choosing a motherboard. The board we picked for this setup (Asus ROG Maximus Z790 Dark Hero) might not be the best choice: it was far more expensive than similar offerings, and when we put an NVMe SSD into the first M.2 slot, the speed of the second PCIe slot (used by the second GPU) degraded greatly. It is also worth mentioning that replacement WiFi 7 antennas are very hard to get for this motherboard because it uses a proprietary antenna connector. In other builds we used the "MSI MAG Z790 TOMAHAWK WiFi LGA 1700 ATX", which gave us similar performance with less pain.
- PSU: The Corsair AX1500i PSU was sufficient for us. It is quiet and has a great USB interface with a Windows app that lets us monitor power consumption on all ports. We have also used the Corsair AX1600i in similar setups, which gives us more headroom, and the EVGA SuperNOVA G+ 2000W in other builds, which we did not like much: it has no management port and the fan is very noisy.
- Case cooling: We had 3 fans on top for the water cooler, 3 in the front of the case, and 1 in the back. This was sufficient, and the fan profiles could be adjusted in the BIOS to keep the system quiet.
- OS: Originally we installed Windows 11 Home edition and learnt that it can only handle 128GB RAM. We had to upgrade to Windows 11 Professional to use all 192GB and to access the server remotely through Remote Desktop.
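The CPU note above (moving the monitor to the iGPU's HDMI port) is easy to verify with nvidia-smi. A minimal Python sketch; the `parse_vram_csv` helper is our own, only the nvidia-smi query flags are standard:

```python
import subprocess

def parse_vram_csv(csv_text):
    """Parse 'nvidia-smi ... --format=csv,noheader,nounits' output into
    (name, used_mib, total_mib) tuples, one per GPU."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, used, total = (field.strip() for field in line.split(","))
        gpus.append((name, int(used), int(total)))
    return gpus

def query_vram():
    """Ask the Nvidia driver for per-GPU memory usage (needs nvidia-smi)."""
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=name,memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_vram_csv(out)

# With the monitor plugged into the iGPU, 'used' should be near 0 MiB
# at idle on both RTX 3090s:
#   for name, used, total in query_vram():
#       print(f"{name}: {used} / {total} MiB used")
```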
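The RAM sizing rule from the notes above (the ~70GB model file had to fit into RAM twice) can be sanity-checked in a few lines; the helper below and its 16GB OS-overhead figure are our own illustrative assumptions:

```python
def min_ram_kit_gb(model_file_gb, copies=2, os_overhead_gb=16):
    """Smallest common DDR5 kit size (GB) that holds `copies` of the
    model file plus OS overhead. Returns None if nothing fits."""
    needed = model_file_gb * copies + os_overhead_gb
    for kit in (64, 96, 128, 192, 256):
        if kit >= needed:
            return kit
    return None

# A 70GB LLM file held twice plus overhead pushes past 128GB,
# which is why this build uses 4x 48GB = 192GB:
print(min_ram_kit_gb(70))  # -> 192
```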
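On the GPU note: the reason 48GB of combined VRAM is enough for a 70B model is that the weights take roughly params × bytes-per-weight, so only a quantized copy fits. A back-of-the-envelope sketch (real runtimes add KV-cache and other overhead, so treat these numbers as lower bounds):

```python
def weight_vram_gb(params_billions, bits_per_weight):
    """Approximate VRAM needed for the model weights alone, in GB:
    billions of parameters * bits per weight / 8."""
    return params_billions * bits_per_weight / 8

for bits, label in ((16, "FP16"), (8, "Q8"), (4, "Q4")):
    gb = weight_vram_gb(70, bits)
    verdict = "fits" if gb < 48 else "does not fit"
    print(f"70B @ {label}: ~{gb:.0f} GB weights, {verdict} in 2x 24GB")
```

Only the 4-bit quantization (~35GB of weights) leaves room in 2x 24GB, which matches how 70B models are typically run on this class of hardware.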
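On the PCIe-lane point in the GPU and motherboard notes: the i9 14900K exposes only 16 PCIe 5.0 lanes from the CPU for graphics, so two GPUs typically run as x8/x8, and the RTX 3090 negotiates PCIe 4.0 in any case. A sketch of what the split costs in bandwidth; the per-lane figures are the usual approximate effective rates, and which lanes a given board shares with M.2 slots must be checked in its manual:

```python
# Approximate one-direction effective bandwidth per PCIe lane, in GB/s.
PER_LANE_GBPS = {"3.0": 0.985, "4.0": 1.969, "5.0": 3.938}

def link_bandwidth_gbps(gen, lanes):
    """One-direction bandwidth of a PCIe link (gen as '3.0'/'4.0'/'5.0')."""
    return PER_LANE_GBPS[gen] * lanes

# The RTX 3090 is a PCIe 4.0 card, so faster 5.0 slots don't help it:
full = link_bandwidth_gbps("4.0", 16)  # one GPU in x16
half = link_bandwidth_gbps("4.0", 8)   # two GPUs sharing, x8/x8
print(f"x16: ~{full:.1f} GB/s per GPU, x8/x8: ~{half:.1f} GB/s per GPU")
```

Halved host-to-GPU bandwidth mostly slows model loading rather than inference, but it is one more reason a single large-VRAM card is the simpler design.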
Key takeaway
This system offers 48GB of GPU RAM and enough speed to run high-quality AI models. We strongly recommend this setup as a first AI server.
More information
- AI server - 2 nvidia RTX 3090
- AI server 2 - 4 Nvidia RTX 3090
- AI server 3
- AI server NVIDIA DGX H100
- NVIDIA DGX B200 AI Appliance