September 15 2024

AI: Running Large Language Models: System Specs

I have recently become very interested in running AI Large Language Models (LLMs).

With a view to furthering my research in this area, I have been planning a build for a machine dedicated to AI inference.

My goal is to be able to run 70B models at Q6 or even Q8 quants at a minimum of 3-6 tk/s and, hopefully, a 120B model at Q5 or better at 1 tk/s or more.
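
As a rough check on how much memory those targets need, here is a minimal sketch that estimates the size of the weights alone. The bits-per-weight figures are approximate values I am assuming for llama.cpp-style quants (real files vary with the exact quant mix), and `weight_gb` is just a hypothetical helper name. The KV cache and runtime overhead are ignored, so actual usage will be higher.

```python
# Assumed, approximate bits per weight for common GGUF-style quants.
# These are illustrative values, not exact figures for any given file.
APPROX_BITS_PER_WEIGHT = {"Q5": 5.5, "Q6": 6.56, "Q8": 8.5}


def weight_gb(params_billion, bits_per_weight):
    """Approximate size of the model weights in GB (10^9 bytes)."""
    # params_billion * 1e9 weights * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    return params_billion * bits_per_weight / 8


for params, quant in [(70, "Q6"), (70, "Q8"), (120, "Q5")]:
    size = weight_gb(params, APPROX_BITS_PER_WEIGHT[quant])
    print(f"{params}B at {quant}: about {size:.0f} GB of weights")
```

Under these assumptions, all three targets come out larger than 48 GB (the combined VRAM of two 24 GB cards), so part of each model would have to sit in system RAM.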

The spec that I arrived at is:

2×3090 Asus TUF Gaming GPUs	These have 24 GB of VRAM each, for a total of 48 GB, and they have just two PCIe power connectors, not three, which makes them easier to power.
Threadripper PRO 3955WX	I went with the PRO Threadripper because of its support for more than 256 GB of RAM and its 128 PCIe lanes. I could probably have gone with the 3945WX, since the clock speeds are similar, and the extra 4 cores (16 vs 12) of the 3955WX probably won’t make much difference for inference.
256 GB 3200 MHz DDR4 RAM	3200 MHz DDR4 is not the fastest, but it’s the fastest speed that the 3955WX supports, and I don’t think that overclocking 8×32 GB sticks is going to work. I need 8 sticks because I want to use 8-channel memory. Memory bandwidth is very important for LLMs, and 8-channel memory has about 200 GB/s of theoretical bandwidth, vs about 100 GB/s for quad channel.
WRX80-E SAGE Motherboard	This actually cost more than the CPU, but it has 7 PCIe x16 slots, which I will need if I add more GPUs in the future, and it supports 8-channel memory.
Corsair HX1500	A 1500 watt PSU should be OK for two 3090s, maybe even three if I underclock the cards. If I get any more in the future I will have to get another PSU and connect them together.
2 TB M.2 SSD
Noctua Cooler
Mining Case	I went with an open-air mining rig because it is the only setup that would allow me to add more than 2 GPUs.
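
The memory bandwidth figures in the RAM entry above can be reproduced with a little arithmetic. This is a minimal sketch, assuming a 64-bit (8-byte) data bus per DDR4 channel and theoretical peak transfer rates; the function names are my own. The second function applies a common rule of thumb, that generating each token means reading all of the weights once, to get an upper bound on tokens per second for whatever part of a model sits in system RAM. The 40 GB figure is purely a hypothetical example, and real-world speeds will be lower.

```python
def ddr_bandwidth_gbs(mega_transfers_per_s, channels, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth in GB/s (10^9 bytes per second)."""
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


def max_tokens_per_s(weights_gb, bandwidth_gbs):
    """Rule-of-thumb upper bound: one full read of the weights per token."""
    return bandwidth_gbs / weights_gb


eight_channel = ddr_bandwidth_gbs(3200, channels=8)
quad_channel = ddr_bandwidth_gbs(3200, channels=4)
print(f"8-channel DDR4-3200: {eight_channel:.1f} GB/s")
print(f"4-channel DDR4-3200: {quad_channel:.1f} GB/s")

# Hypothetical example: 40 GB of weights offloaded to system RAM.
bound = max_tokens_per_s(40, eight_channel)
print(f"Upper bound with 40 GB in RAM: {bound:.1f} tk/s")
```

This bound only covers the RAM-resident part of a model; layers held in VRAM are read much faster, and compute, the KV cache, and PCIe transfers all add overhead that the sketch leaves out.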

It will be some time before I get all of the parts, because most of them are used, and shipping will take time.
