AI: Running Large Language Models: System Specs
I have recently become very interested in running AI large language models (LLMs).
With a view to furthering my research in this area, I have been planning a build for a machine dedicated to AI inference.
My goal is to be able to run 70b models at Q6 or even Q8 quants at 3-6 tk/s or better and, hopefully, a 120b model at Q5 or higher at 1 tk/s or better.
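To get a feel for what these targets mean in memory terms, here is a rough sketch that estimates the size of the weights alone for each case. It assumes the nominal bit width in the quant name (8, 6 and 5 bits per weight); real quant formats store extra scale data, so actual files are somewhat larger, and the context cache needs room on top of this. The function name `weight_size_gb` is just an illustration.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes).

    Assumption: every weight is stored at the nominal bit width,
    ignoring per-block scales and any layers kept at higher precision.
    """
    return params_billion * bits_per_weight / 8


VRAM_GB = 48  # 2 x 24 GB cards

targets = [("70b @ Q6", 70, 6), ("70b @ Q8", 70, 8), ("120b @ Q5", 120, 5)]

for name, params, bits in targets:
    size = weight_size_gb(params, bits)
    spill = max(0.0, size - VRAM_GB)  # portion that cannot fit in VRAM
    print(f"{name}: ~{size:.1f} GB of weights, ~{spill:.1f} GB beyond {VRAM_GB} GB VRAM")
```

Under these assumptions none of the three targets fits entirely in 48 GB of VRAM, so part of each model would have to sit in system RAM, which is why RAM capacity and bandwidth matter for this build.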
The spec that I arrived at is:
2× Asus TUF Gaming 3090 GPUs | These have 24 GB of VRAM each, for a total of 48 GB, and they have just two PCIe power connectors, not three, making them easier to power. |
Threadripper PRO 3955WX | I went with the PRO Threadripper because of its support for more than 256 GB of RAM and its 128 PCIe lanes. I could probably have gone with the 3945WX, since the clock speeds are similar, and the extra 4 cores (16 vs 12) of the 3955WX probably won’t make much difference for inference. |
256 GB 3200 MHz DDR4 RAM | 3200 MHz DDR4 is not the fastest, but it’s the fastest speed that the 3955WX supports, and I don’t think that overclocking 8×32 GB sticks is going to work. I need 8 sticks because I want to use 8-channel memory. Memory bandwidth is very important for LLMs, and 8-channel memory has about 200 GB/s of bandwidth, vs about 100 GB/s for quad channel. |
WRX80-E SAGE Motherboard | This actually cost more than the CPU, but it has 7 PCIe x16 slots, which I will need if I add more GPUs in the future, and it supports 8-channel memory. |
Corsair HX1500 | A 1500 watt PSU should be OK for two 3090s, maybe even three if I underclock the cards. If I get any more in the future I will have to get another PSU and connect them together. |
2 TB M.2 SSD | |
Noctua Cooler | |
Mining Case | I went with an open-air mining rig because it is the only setup that would allow me to add more than 2 GPUs. |
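The 200 GB/s and 100 GB/s figures in the RAM row can be checked with some quick arithmetic. The sketch below assumes the usual 64-bit (8-byte) bus per DDR4 channel and treats "3200 MHz" as 3200 million transfers per second; these are theoretical peaks, and sustained bandwidth in practice will be lower. The second function is only a rule-of-thumb ceiling, assuming every weight held in system RAM is read once per generated token; the function names and the 22 GB figure are illustrative, not measurements.

```python
def ddr_bandwidth_gbs(mt_per_s: int, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s.

    transfers per second x bytes per transfer x number of channels.
    Assumption: 8-byte (64-bit) bus per channel.
    """
    return mt_per_s * bus_bytes * channels / 1000


def tokens_per_s_ceiling(bandwidth_gbs: float, gb_read_per_token: float) -> float:
    """Rough upper bound on tk/s if generation is limited only by RAM reads."""
    return bandwidth_gbs / gb_read_per_token


octa = ddr_bandwidth_gbs(3200, channels=8)
quad = ddr_bandwidth_gbs(3200, channels=4)
print(f"8-channel DDR4-3200: {octa:.1f} GB/s")
print(f"4-channel DDR4-3200: {quad:.1f} GB/s")

# Hypothetical case: 70 GB of weights with 48 GB held in VRAM leaves
# about 22 GB in system RAM to be read for each token.
ram_resident_gb = 22
print(f"Ceiling for the RAM-resident part: "
      f"{tokens_per_s_ceiling(octa, ram_resident_gb):.1f} tk/s")
```

The ceiling ignores the time the GPUs spend on their share of the model and any PCIe transfers, so real throughput should be expected to come in below it.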
It will be some time before I get all of the parts, because most of them are used, and shipping will take time.