{"id":10366,"date":"2024-09-15T19:19:36","date_gmt":"2024-09-15T18:19:36","guid":{"rendered":"https:\/\/phoenixgamedevelopment.com\/blog\/?p=10366"},"modified":"2024-10-02T21:08:07","modified_gmt":"2024-10-02T20:08:07","slug":"running-large-language-models-system-specs","status":"publish","type":"post","link":"https:\/\/phoenixgamedevelopment.com\/blog\/running-large-language-models-system-specs\/","title":{"rendered":"AI: Running Large Language Models: System Specs"},"content":{"rendered":"\n<p>I have recently become very interested in running AI Large Language Models (LLMs).<\/p>\n\n\n\n<p>With a view to furthering my research in this area, I have been planning a build for a machine dedicated to AI inference.<\/p>\n\n\n\n<p>My goal is to be able to run 70B models at Q6 or even Q8 quants at 3-6 tk\/s or better and, hopefully, a 120B model at a Q5 quant or higher at 1 tk\/s or better.<\/p>\n\n\n\n<p>The spec that I arrived at is:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>2&#215;3090 Asus TUF Gaming GPUs<\/td><td>These have 24 GB of VRAM each, for a total of 48 GB, and they have just two PCIe power connectors, not three, making it easier to power them.<\/td><\/tr><tr><td>Threadripper PRO 3955WX<\/td><td>I went with the Threadripper PRO because of its support for more than 256 GB of RAM, and its 128 PCIe lanes.
I could probably have gone with the 3945WX, since the clock speeds are similar, and the extra 4 cores (16 vs 12) of the 3955WX probably won&#8217;t make that much difference for inference.<\/td><\/tr><tr><td>256 GB 3200 MHz DDR4 RAM<\/td><td>3200 MHz DDR4 is not the fastest memory, but it&#8217;s the fastest speed that the 3955WX supports, and I don&#8217;t think that overclocking 8&#215;32 GB sticks is going to work.<br>I need 8 sticks because I want to use 8-channel memory. Memory bandwidth is very important for LLMs, and 8-channel DDR4-3200 has a theoretical bandwidth of about 200 GB\/s (8 channels &#215; 25.6 GB\/s per channel), vs about 100 GB\/s for quad channel.<\/td><\/tr><tr><td>WRX80-E SAGE Motherboard<\/td><td>This actually cost more than the CPU, but it has 7 PCIe x16 slots, which I will need in the future if I intend to add more GPUs, and it has 8-channel memory support.<\/td><\/tr><tr><td>Corsair HX1500<\/td><td>A 1500 watt PSU should be OK for two 3090s, maybe even three if I underclock the cards. If I get any more in the future I will have to get another PSU and connect them together.<\/td><\/tr><tr><td>2 TB M.2 SSD<\/td><td><\/td><\/tr><tr><td>Noctua Cooler<\/td><td><\/td><\/tr><tr><td>Mining Case<\/td><td>I went with an open-air mining rig because it is the only setup that would allow me to add more than 2 GPUs.<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>It will be some time before I get all of the parts, because most of them are used, and shipping will take time.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I have recently become very interested in running AI Large Language Models (LLMs). With a view to furthering my research in this area, I have been planning a build for a machine dedicated to AI inference.
My goal is to be able to run 70b models at Q6 or even Q8 quants with a tk\/s [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[37,12],"tags":[],"class_list":["post-10366","post","type-post","status-publish","format-standard","hentry","category-ai","category-tutorials","has-post-title","has-post-date","has-post-category","has-post-tag","has-post-comment","has-post-author",""],"aioseo_notices":[],"builder_content":"","_links":{"self":[{"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/posts\/10366"}],"collection":[{"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/comments?post=10366"}],"version-history":[{"count":3,"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/posts\/10366\/revisions"}],"predecessor-version":[{"id":10392,"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/posts\/10366\/revisions\/10392"}],"wp:attachment":[{"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/media?parent=10366"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/categories?post=10366"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/tags?post=10366"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}