{"id":10703,"date":"2024-11-03T17:26:09","date_gmt":"2024-11-03T17:26:09","guid":{"rendered":"https:\/\/phoenixgamedevelopment.com\/blog\/?p=10703"},"modified":"2024-11-14T13:40:33","modified_gmt":"2024-11-14T13:40:33","slug":"ai-assembly-and-setup-of-ai-inference-rig","status":"publish","type":"post","link":"https:\/\/phoenixgamedevelopment.com\/blog\/ai-assembly-and-setup-of-ai-inference-rig\/","title":{"rendered":"AI: Assembly and setup of AI Inference Rig"},"content":{"rendered":"\n<p>I have finally finished the construction of my AI inference rig!<\/p>\n\n\n\n<p>The system specs are:<\/p>\n\n\n\n<p>Threadripper PRO 3975 CPU,<br>2x ASUS TUF 3090 GPUs (these have two PCI-E power connectors instead of three, which is important!),<br>256 GB of DDR4 3200 MHz RAM,<br>Corsair 1500 watt PSU.<\/p>\n\n\n\n<p>I had quite a few more issues than I was expecting when setting up the AI rig.<\/p>\n\n\n\n<p>The main lesson that I learned is that Threadripper PRO CPUs are very sensitive to mounting pressure!<\/p>\n\n\n\n<p>After I had assembled the system, it failed to boot. It seemed like the issues were CPU related.<\/p>\n\n\n\n<p>After reseating the CPU and reapplying the thermal paste, everything seemed fine, except for one RAM stick not showing up, which I fixed by simply reseating it too.<\/p>\n\n\n\n<p>Other than the CPU issues, the build went more or less OK.<\/p>\n\n\n\n<p>I was planning to use Linux Mint for the software side, but I had issues getting this to work (mostly driver and CUDA issues).<\/p>\n\n\n\n<p>I eventually went with Pop!_OS (another Ubuntu-based Linux distro) and this worked a lot better. I prefer Mint&#8217;s interface (it is more similar to Windows), but Pop!_OS seems to have better NVIDIA\/CUDA support, which I need for this build.<\/p>\n\n\n\n<p>I spent some time setting up the software side of the system.<\/p>\n\n\n\n<p>Eventually, I got the CUDA 12 build of KoboldCPP working properly. 
I had to update some drivers and the CUDA toolkit for this to work, because Pop!_OS comes with version 11.5 by default.<\/p>\n\n\n\n<p>I had no issues enabling network access to KoboldCPP, but I did have some problems customising RDP\/XRDP for remote administration of the server. This took some time to fix.<\/p>\n\n\n\n<p>I also installed Stable Diffusion on the system, but I will probably use it in CPU-only mode, to save VRAM for LLM inference.<\/p>\n\n\n\n<p>I have not fully tested or optimised the system yet, but initial results are good.<\/p>\n\n\n\n<p>First, the RAM bandwidth, which is something I was worried about, seems to be close to 150 GB\/s. This is extremely good!<\/p>\n\n\n\n<p>Based on my post <a href=\"https:\/\/phoenixgamedevelopment.com\/blog\/ai-estimated-token-generation-rates-for-selected-model-sizes-and-algorithm-to-calculate-same\/\" title=\"\">HERE<\/a>, I was estimating that the memory bandwidth of the CPU would be around 140 GB\/s at most. In actual fact, it is around 146-148 GB\/s, which is even higher than I had hoped.<\/p>\n\n\n\n<p>This means that it was worth it to get the more expensive 3975 CPU, rather than the 3945 (due to the memory bandwidth limitations mentioned in the post above).<\/p>\n\n\n\n<p>I have tested just a few models and quants so far, and the results have been decent, but not spectacular.<\/p>\n\n\n\n<p>Before building this system, I was hoping for between 1 and 2 tk\/s when running a 120b model at Q8 with full context (32k for most models).<\/p>\n\n\n\n<p>It seems, based on very preliminary testing, that I am getting about 1.23 tk\/s. 
This is a little slower than I was hoping, but it is a total value that includes context and prompt processing (which were not included in the post linked above, where I estimated token generation rates).<\/p>\n\n\n\n<p>In practice, these values are actually quite usable, and I am satisfied with the results.<\/p>\n\n\n\n<p>It does seem, however, that as context fills up, token generation rates (and, in particular, prompt processing rates) drop substantially. I will need to do more testing on this, but I suspect that I will want to add a third 3090 to the system at some point in the future. However, I shouldn&#8217;t need any more upgrades beyond that.<\/p>\n\n\n\n<p>Running a 70b Q5 on my current dev machine (Ryzen 9 5950X and a single 3090, with 128 GB DDR4 RAM) gets me about 0.94 tokens per second. On my new system, that same model gets at least 5.3 tokens per second, more than five times as fast.<\/p>\n\n\n\n<p>I intend to do a lot more testing with the AI rig in the future, and hopefully post my results here. The power of this system should allow me to explore some very interesting concepts.<\/p>\n\n\n\n<p>I could potentially even run Falcon 180b at slow, but reasonable, speeds!<\/p>\n\n\n<!-- wp:themify-builder\/canvas \/-->","protected":false},"excerpt":{"rendered":"<p>I have finally finished the construction of my AI inference rig! The system specs are: Threadripper PRO 3975 CPU, 2x ASUS TUF 3090 GPUs (these have two PCI-E power connectors instead of three, which is important!), 256 GB of DDR4 3200 MHz RAM, Corsair 1500 watt PSU. 
I had quite a few more issues than I was expecting [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":10731,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[37,12],"tags":[],"class_list":["post-10703","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","category-tutorials","has-post-title","has-post-date","has-post-category","has-post-tag","has-post-comment","has-post-author",""],"aioseo_notices":[],"builder_content":"","_links":{"self":[{"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/posts\/10703"}],"collection":[{"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/comments?post=10703"}],"version-history":[{"count":7,"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/posts\/10703\/revisions"}],"predecessor-version":[{"id":10734,"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/posts\/10703\/revisions\/10734"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/media\/10731"}],"wp:attachment":[{"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/media?parent=10703"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/categories?post=10703"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/tags?post=10703"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}