{"id":10391,"date":"2024-10-02T21:43:00","date_gmt":"2024-10-02T20:43:00","guid":{"rendered":"https:\/\/phoenixgamedevelopment.com\/blog\/?p=10391"},"modified":"2024-10-07T01:16:53","modified_gmt":"2024-10-07T00:16:53","slug":"ai-memory-bandwidth-comparision-for-selected-ddr4-cpus","status":"publish","type":"post","link":"https:\/\/phoenixgamedevelopment.com\/blog\/ai-memory-bandwidth-comparision-for-selected-ddr4-cpus\/","title":{"rendered":"AI: Memory Bandwidth comparison for selected DDR4 CPUs"},"content":{"rendered":"\n<p><strong>Introduction<\/strong><\/p>\n\n\n\n<p>As I mentioned in my previous post (<a href=\"https:\/\/phoenixgamedevelopment.com\/blog\/?p=10366\">HERE<\/a>), I have purchased a Threadripper PRO 3955WX CPU for the purpose of building an LLM inference machine.<\/p>\n\n\n\n<p>However, I have since discovered that there is a serious issue with using some Threadripper and Epyc CPUs (including the 3955wx) for this purpose.<\/p>\n\n\n\n<p>The issue is that these CPUs use only 2 CCDs (Core Chiplet Dies), which substantially reduces the memory bandwidth.<\/p>\n\n\n\n<p>The issue is discussed here:<\/p>\n\n\n\n<p><a href=\"https:\/\/www.servethehome.com\/amd-epyc-7002-rome-cpus-with-half-memory-bandwidth\/\">AMD Epyc 7002 Rome CPUs with Half Memory Bandwidth.<\/a> (Serve the Home)<\/p>\n\n\n\n<p>And here:<\/p>\n\n\n\n<p><a href=\"https:\/\/www.reddit.com\/r\/threadripper\/comments\/1azmkvg\/comparing_threadripper_7000_memory_bandwidth_for\/\">Comparing Threadripper 7000 memory bandwidth for all models.<\/a> (Reddit, r\/Threadripper)<\/p>\n\n\n\n<p>There are also many other sources confirming the results for various generations and models of chips.<\/p>\n\n\n\n<p>The fundamental issue is that in order to reach the maximum bandwidth of 8-channel RAM (about 200 GB\/s), it is necessary to have not just 8 channels supported, but 8 CCDs as well.<\/p>\n\n\n\n<p>Only extremely expensive CPUs have 8 
CCDs.<\/p>\n\n\n\n<p>The cheaper CPUs have only 2, which effectively limits their RAM bandwidth to quad-channel speeds (around 80-100 GB\/s) or slightly more.<\/p>\n\n\n\n<p>Ordinarily, this would not be a problem, since very few use cases come close to saturating the memory bandwidth like that.<\/p>\n\n\n\n<p>The issue is that Large Language Model inference is one of those use cases (Computational Fluid Dynamics is another one). Not only that, but for LLMs, VRAM\/RAM bandwidth is the single most important factor determining the performance of the system.<\/p>\n\n\n\n<p>This means that the 3955wx which I bought is going to be extremely sub-optimal for this purpose, and needs to be replaced.<\/p>\n\n\n\n<p>The problem is that it is extremely difficult to find actual, real-world values for RAM bandwidth online.<\/p>\n\n\n\n<p>Most sources only quote the maximum bandwidth value, which is about 200 GB\/s in 8-channel mode. They make no reference to the CCD bandwidth issue, which can cause users to make bad purchasing decisions.<\/p>\n\n\n\n<p>After spending some time researching this issue, I have decided to prepare a small graph showing my results, with sources, so that hopefully other users will be more informed than I was.<\/p>\n\n\n\n<p><strong>Data<\/strong><\/p>\n\n\n\n<p>I am focusing entirely on DDR4 CPUs here, since DDR5 is out of my budget at this point.<\/p>\n\n\n\n<p>I am open to expanding this graph if I can come across more data in the future.<\/p>\n\n\n\n<p>The HTML Graph is quite squashed, so I have uploaded an image instead (Click for full image):<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/imgur.com\/RtlqAIm\"><img decoding=\"async\" src=\"https:\/\/i.imgur.com\/RtlqAIml.jpg\" alt=\"\" title=\"source: imgur.com\"\/><\/a><\/figure>\n\n\n\n<p>HTML Graph:<\/p>\n\n\n\n<figure class=\"wp-block-table is-style-regular\"><table><tbody><tr><td class=\"has-text-align-left\" 
data-align=\"left\"><strong>CPU<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>Read Speed<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>Write Speed<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>Copy Speed<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>CCDs<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>RAM Type<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>RAM Channels<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>RAM Speed (MT\/s)<\/strong><\/td><td><strong>Sources<\/strong><\/td><\/tr><tr><td class=\"has-text-align-left\" data-align=\"left\">Ryzen 9 5950x<\/td><td class=\"has-text-align-center\" data-align=\"center\">54 GB\/s<\/td><td class=\"has-text-align-center\" data-align=\"center\">54 GB\/s<\/td><td class=\"has-text-align-center\" data-align=\"center\">&#8211;<\/td><td class=\"has-text-align-center\" data-align=\"center\">2<\/td><td class=\"has-text-align-center\" data-align=\"center\">DDR4<\/td><td class=\"has-text-align-center\" data-align=\"center\">2<\/td><td class=\"has-text-align-center\" data-align=\"center\">3600<\/td><td><a href=\"https:\/\/www.kitguru.net\/components\/cpu\/luke-hill\/amd-threadripper-pro-5000-wx-series-three-cpus-tested\/7\/\">Link<\/a><\/td><\/tr><tr><td class=\"has-text-align-left\" data-align=\"left\">Threadripper 3960x<\/td><td class=\"has-text-align-center\" data-align=\"center\">95 GB\/s<\/td><td class=\"has-text-align-center\" data-align=\"center\">93 GB\/s<\/td><td class=\"has-text-align-center\" data-align=\"center\">101 GB\/s<\/td><td class=\"has-text-align-center\" data-align=\"center\">4<\/td><td class=\"has-text-align-center\" data-align=\"center\">DDR4<\/td><td class=\"has-text-align-center\" data-align=\"center\">4<\/td><td class=\"has-text-align-center\" 
data-align=\"center\">3200<\/td><td><a href=\"https:\/\/www.servethehome.com\/amd-ryzen-threadripper-3970x-review-32-cores-of-madness\/2\/\">Link<\/a><\/td><\/tr><tr><td class=\"has-text-align-left\" data-align=\"left\">Threadripper 3970x<\/td><td class=\"has-text-align-center\" data-align=\"center\">96 GB\/s<\/td><td class=\"has-text-align-center\" data-align=\"center\">98 GB\/s<\/td><td class=\"has-text-align-center\" data-align=\"center\">102 GB\/s<\/td><td class=\"has-text-align-center\" data-align=\"center\">4<\/td><td class=\"has-text-align-center\" data-align=\"center\">DDR4<\/td><td class=\"has-text-align-center\" data-align=\"center\">4<\/td><td class=\"has-text-align-center\" data-align=\"center\">3200<\/td><td><a href=\"https:\/\/www.servethehome.com\/amd-ryzen-threadripper-3970x-review-32-cores-of-madness\/2\/\">Link<\/a><\/td><\/tr><tr><td class=\"has-text-align-left\" data-align=\"left\">Threadripper Pro 3955wx<\/td><td class=\"has-text-align-center\" data-align=\"center\">82 GB\/s<\/td><td class=\"has-text-align-center\" data-align=\"center\">51 GB\/s<\/td><td class=\"has-text-align-center\" data-align=\"center\">94 GB\/s<\/td><td class=\"has-text-align-center\" data-align=\"center\">2<\/td><td class=\"has-text-align-center\" data-align=\"center\">DDR4<\/td><td class=\"has-text-align-center\" data-align=\"center\">8<\/td><td class=\"has-text-align-center\" data-align=\"center\">3200<\/td><td><a href=\"https:\/\/www.chiphell.com\/thread-2532086-1-1.html\">Link<\/a><\/td><\/tr><tr><td class=\"has-text-align-left\" data-align=\"left\">Threadripper Pro 3975wx<\/td><td class=\"has-text-align-center\" data-align=\"center\">137-139 GB\/s<\/td><td class=\"has-text-align-center\" data-align=\"center\">102 GB\/s<\/td><td class=\"has-text-align-center\" data-align=\"center\">137 GB\/s<\/td><td class=\"has-text-align-center\" data-align=\"center\">4<\/td><td class=\"has-text-align-center\" data-align=\"center\">DDR4<\/td><td class=\"has-text-align-center\" 
data-align=\"center\">8<\/td><td class=\"has-text-align-center\" data-align=\"center\">3200<\/td><td><a href=\"https:\/\/www.youtube.com\/watch?v=sTo6AFygo14\">Link<\/a><br><a href=\"https:\/\/www.anandtech.com\/show\/16805\/amd-threadripper-pro-review-an-upgrade-over-regular-threadripper\/8\">Link<\/a><br><a href=\"https:\/\/www.kitguru.net\/desktop-pc\/base-unit\/luke-hill\/lenovo-p620-threadripper-pro-3975wx-review\/3\/\">Link<\/a><\/td><\/tr><tr><td class=\"has-text-align-left\" data-align=\"left\">Threadripper Pro 3995wx (64C\/128T)<\/td><td class=\"has-text-align-center\" data-align=\"center\">&#8211;<\/td><td class=\"has-text-align-center\" data-align=\"center\">149 GB\/s<\/td><td class=\"has-text-align-center\" data-align=\"center\">&#8211;<\/td><td class=\"has-text-align-center\" data-align=\"center\">8<\/td><td class=\"has-text-align-center\" data-align=\"center\">DDR4<\/td><td class=\"has-text-align-center\" data-align=\"center\">8<\/td><td class=\"has-text-align-center\" data-align=\"center\">3200<\/td><td><a href=\"https:\/\/www.anandtech.com\/show\/16805\/amd-threadripper-pro-review-an-upgrade-over-regular-threadripper\/8\">Link<\/a><\/td><\/tr><tr><td class=\"has-text-align-left\" data-align=\"left\">Epyc 7302p<\/td><td class=\"has-text-align-center\" data-align=\"center\">115 GB\/s<\/td><td class=\"has-text-align-center\" data-align=\"center\">85 GB\/s<\/td><td class=\"has-text-align-center\" data-align=\"center\">128 GB\/s<\/td><td class=\"has-text-align-center\" data-align=\"center\">4<\/td><td class=\"has-text-align-center\" data-align=\"center\">DDR4<\/td><td class=\"has-text-align-center\" data-align=\"center\">8<\/td><td class=\"has-text-align-center\" data-align=\"center\">2933<\/td><td><a href=\"https:\/\/www.reddit.com\/r\/Amd\/comments\/db1vza\/epyc_rome_7302p_benchmarks\/\">Link<\/a><\/td><\/tr><tr><td class=\"has-text-align-left\" data-align=\"left\">Epyc 7443<\/td><td class=\"has-text-align-center\" data-align=\"center\">136 
GB\/s<\/td><td class=\"has-text-align-center\" data-align=\"center\">&#8211;<\/td><td class=\"has-text-align-center\" data-align=\"center\">&#8211;<\/td><td class=\"has-text-align-center\" data-align=\"center\">4<\/td><td class=\"has-text-align-center\" data-align=\"center\">DDR4<\/td><td class=\"has-text-align-center\" data-align=\"center\">8<\/td><td class=\"has-text-align-center\" data-align=\"center\">3200<\/td><td><a href=\"https:\/\/www.reddit.com\/r\/LocalLLaMA\/comments\/1ak2f1v\/ram_memory_bandwidth_measurement_numbers_for_both\/\">Link<\/a><\/td><\/tr><tr><td class=\"has-text-align-left\" data-align=\"left\">Epyc 7551p<\/td><td class=\"has-text-align-center\" data-align=\"center\">154 GB\/s<\/td><td class=\"has-text-align-center\" data-align=\"center\">156 GB\/s<\/td><td class=\"has-text-align-center\" data-align=\"center\">145 GB\/s<\/td><td class=\"has-text-align-center\" data-align=\"center\">&#8211;<\/td><td class=\"has-text-align-center\" data-align=\"center\">DDR4<\/td><td class=\"has-text-align-center\" data-align=\"center\">8<\/td><td class=\"has-text-align-center\" data-align=\"center\">2600<\/td><td><a href=\"https:\/\/forums.servethehome.com\/index.php?threads\/amd-epyc-performance-impact-of-runing-4-channel-memory-instead-of-8-channels.21095\/\">Link<\/a><\/td><\/tr><tr><td class=\"has-text-align-left\" data-align=\"left\">2x Epyc 7302p (Dual CPU)<\/td><td class=\"has-text-align-center\" data-align=\"center\">219 GB\/s<\/td><td class=\"has-text-align-center\" data-align=\"center\">&#8211;<\/td><td class=\"has-text-align-center\" data-align=\"center\">&#8211;<\/td><td class=\"has-text-align-center\" data-align=\"center\">4&#215;2<\/td><td class=\"has-text-align-center\" data-align=\"center\">DDR4<\/td><td class=\"has-text-align-center\" data-align=\"center\">8&#215;2<\/td><td class=\"has-text-align-center\" data-align=\"center\">2400<\/td><td><a 
href=\"https:\/\/www.reddit.com\/r\/LocalLLaMA\/comments\/1ak2f1v\/ram_memory_bandwidth_measurement_numbers_for_both\/\">Link<\/a><\/td><\/tr><tr><td class=\"has-text-align-left\" data-align=\"left\">Threadripper Pro 5995wx<\/td><td class=\"has-text-align-center\" data-align=\"center\">160 GB\/s<\/td><td class=\"has-text-align-center\" data-align=\"center\">171 GB\/s<\/td><td class=\"has-text-align-center\" data-align=\"center\">&#8211;<\/td><td class=\"has-text-align-center\" data-align=\"center\">8<\/td><td class=\"has-text-align-center\" data-align=\"center\">DDR4<\/td><td class=\"has-text-align-center\" data-align=\"center\">8<\/td><td class=\"has-text-align-center\" data-align=\"center\">3200<\/td><td><a href=\"https:\/\/www.reddit.com\/r\/threadripper\/comments\/1aghm2c\/8channel_memory_bandwidth_benchmark_results_of\/\">Link<\/a><br><a href=\"https:\/\/www.kitguru.net\/components\/cpu\/luke-hill\/amd-threadripper-pro-5000-wx-series-three-cpus-tested\/7\/\">Link<\/a><\/td><\/tr><tr><td class=\"has-text-align-left\" data-align=\"left\">Threadripper Pro 5975wx<\/td><td class=\"has-text-align-center\" data-align=\"center\">147 GB\/s<\/td><td class=\"has-text-align-center\" data-align=\"center\">171 GB\/s<\/td><td class=\"has-text-align-center\" data-align=\"center\">&#8211;<\/td><td class=\"has-text-align-center\" data-align=\"center\">4<\/td><td class=\"has-text-align-center\" data-align=\"center\">DDR4<\/td><td class=\"has-text-align-center\" data-align=\"center\">8<\/td><td class=\"has-text-align-center\" data-align=\"center\">3200<\/td><td><a href=\"https:\/\/www.kitguru.net\/components\/cpu\/luke-hill\/amd-threadripper-pro-5000-wx-series-three-cpus-tested\/7\/\">Link<\/a><\/td><\/tr><tr><td class=\"has-text-align-left\" data-align=\"left\">Threadripper Pro 5965wx<\/td><td class=\"has-text-align-center\" data-align=\"center\">147 GB\/s<\/td><td class=\"has-text-align-center\" data-align=\"center\">172 GB\/s<\/td><td 
class=\"has-text-align-center\" data-align=\"center\">&#8211;<\/td><td class=\"has-text-align-center\" data-align=\"center\">4<\/td><td class=\"has-text-align-center\" data-align=\"center\">DDR4<\/td><td class=\"has-text-align-center\" data-align=\"center\">8<\/td><td class=\"has-text-align-center\" data-align=\"center\">3200<\/td><td><a href=\"https:\/\/www.kitguru.net\/components\/cpu\/luke-hill\/amd-threadripper-pro-5000-wx-series-three-cpus-tested\/7\/\">Link<\/a><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><strong>Analysis<\/strong><\/p>\n\n\n\n<p>At my price point, it seems that 4 CCDs with 8 channels is basically the best that can be achieved. This means that real-world speeds are in the region of 150 GB\/s read and 100 GB\/s write.<\/p>\n\n\n\n<p>It seems, based on my research, that there is no inherent difference between Threadripper\/Pro and Epyc CPUs that have the same number of CCDs.<\/p>\n\n\n\n<p>I.e., a 4-CCD Threadripper Pro will have similar read\/write speeds to a 4-CCD Epyc.<\/p>\n\n\n\n<p>This conflicts with some users who claim that the Epyc series has superior RAM bandwidth even with the same number of CCDs. I have come across multiple sources, for example, that indicate that the 7302p, with 4 CCDs, can reach 200 GB\/s RAM bandwidth. Based on my research, I believe that this is incorrect.<\/p>\n\n\n\n<p>For my budget, a CPU like the Epyc 7302p probably would have been a better choice. 
<\/p>\n\n\n\n<p>It is (substantially) less expensive than the Threadripper, but has similar RAM bandwidth, due to its 4 CCDs.<\/p>\n\n\n\n<p>However, I eventually chose to go with a Threadripper PRO 3975WX.<\/p>\n\n\n\n<p>There were several reasons for this.<\/p>\n\n\n\n<p>Firstly, the Threadripper PRO is a substantially more powerful CPU.<\/p>\n\n\n\n<p>It has a higher clock speed, a higher boost clock, and twice as many cores (32 vs 16), and it comfortably outperforms the Epyc in both single-core and multi-core applications:<\/p>\n\n\n\n<p><a href=\"https:\/\/www.cpubenchmark.net\/compare\/3650vs3851\/AMD-EPYC-7302-vs-AMD-Ryzen-Threadripper-PRO-3975WX\">Comparison between Threadripper PRO 3975WX and Epyc 7302.<\/a><\/p>\n\n\n\n<p>CPU speed and core count are not priorities for LLM inference; however, they do make a difference, and more\/faster cores are still preferable.<\/p>\n\n\n\n<p>The motherboards for the Threadripper PRO are also better suited to my use case. They have more PCI-E x16 slots, and seem to have better support for peripherals, including graphics cards, etc. 
They also seem to be cheaper, at least in my location.<\/p>\n\n\n\n<p>Secondly, I did not want to have to return my motherboard in addition to my CPU, since unlike the CPU, this is a bulky item.<\/p>\n\n\n\n<p>The Threadripper PRO 3975 is quite a bit more expensive than the 3955 and (especially) the 7302, but it was just about within budget.<\/p>\n\n\n\n<p><strong>Conclusion<\/strong><\/p>\n\n\n\n<p>It is vitally important that users buying hardware for LLMs and other memory-bandwidth-intensive applications be aware of the bandwidth limits imposed by a low CCD count; otherwise, the performance of CPU-based inference can be substantially lower than expected.<\/p>\n\n\n\n<p>In my case, for example, the 3955wx would have had a read speed of 82 GB\/s, compared with the 137 GB\/s of the 3975!<\/p>\n\n\n\n<p>Of course, if the inference is mainly happening on the GPU, the CPU is far less important, but I intend to run large models (70b, 120b, etc.), which means that these memory bandwidth limitations will have a substantial effect on the performance of the system for its intended use case.<\/p>\n\n\n\n<p><\/p>\n\n\n<!-- wp:themify-builder\/canvas \/-->","protected":false},"excerpt":{"rendered":"<p>Introduction As I mentioned in my previous post (HERE), I have purchased a Threadripper PRO 3955WX CPU for the purpose of building an LLM inference machine. However, I have since discovered that there is a serious issue with using some Threadripper and Epyc CPUs (including the 3955wx) for this purpose. 
The issue is that these [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":10551,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[37,12],"tags":[],"class_list":["post-10391","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","category-tutorials","has-post-title","has-post-date","has-post-category","has-post-tag","has-post-comment","has-post-author",""],"aioseo_notices":[],"builder_content":"","_links":{"self":[{"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/posts\/10391"}],"collection":[{"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/comments?post=10391"}],"version-history":[{"count":34,"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/posts\/10391\/revisions"}],"predecessor-version":[{"id":10567,"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/posts\/10391\/revisions\/10567"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/media\/10551"}],"wp:attachment":[{"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/media?parent=10391"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/categories?post=10391"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/phoenixgamedevelopment.com\/blog\/wp-json\/wp\/v2\/tags?post=10391"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}