# GPU Optimization for GameDev

### By Author
- Emil Persson [@_Humus_](https://twitter.com/_Humus_)
  - [Blog](http://www.humus.name/)
  - <2013> [Low-Level Thinking in High-Level Shading Languages](https://www.gdcvault.com/play/1018182/Low-Level-Thinking-in-High)
  - <2014> [Low-Level Shader Optimization for Next-Gen and DX11](http://www.humus.name/Articles/Persson_LowlevelShaderOptimization.pptx)
  - <2018> [Rule of optimization](https://twitter.com/_Humus_/status/1011964081069330432)
- Matt Pettineo [@mynameismjp](https://twitter.com/mynameismjp)
  - [Blog](https://therealmjp.github.io/)
  - <2018> [Breaking Down Barriers](https://therealmjp.github.io/posts/breaking-down-barriers-part-1-whats-a-barrier/)
  - <2021> [The Shader Permutation Problem](https://therealmjp.github.io/posts/shader-permutations-part1/)
- Louis Bavoil [@louisbavoil](https://twitter.com/louisbavoil)
  - <2018> [The Peak-Performance-Percentage Analysis Method for Optimizing Any GPU Workload](https://devblogs.nvidia.com/the-peak-performance-analysis-method-for-optimizing-any-gpu-workload/)
  - <2018> [Fixing the Hyperdrive: Maximizing Rendering Performance on NVIDIA GPUs (Presented by NVIDIA)](https://www.gdcvault.com/play/1024810/Fixing-the-Hyperdrive-Maximizing-Rendering)
  - <2019> [Optimizing DX12/DXR GPU Workloads using Nsight Graphics: GPU Trace and the Peak-Performance-Percentage (P3) Method (Presented by NVIDIA)](https://www.gdcvault.com/browse/gdc-19/play/1026202)
  - <2020> [Optimizing Compute Shaders for L2 Locality using Thread-Group ID Swizzling](https://developer.nvidia.com/blog/optimizing-compute-shaders-for-l2-locality-using-thread-group-id-swizzling/)
  - [D3D11 Vendor Hacks](https://docs.google.com/spreadsheets/d/1J_HIRVlYK8iI4u6AJrCeb66L5W36UDkd9ExSCku9s_o/edit#gid=0)
- Rys Sommefeldt [@ryszu](https://twitter.com/ryszu)
  - [Blog](https://rys.sommefeldt.com/)
  - <2018> [Understanding GPU context rolls](https://gpuopen.com/understanding-gpu-context-rolls/)
- Michal Drobot [@michaldrobot](https://twitter.com/michaldrobot)
  - [Blog](https://michaldrobot.com/)
  - <2014> [Low Level Optimizations for GCN – Digital Dragons 2014](https://michaldrobot.com/2014/05/12/low-level-optimizations-for-gcn-digital-dragons-2014-slides/)
  - <2014> [GCN Execution Patterns in Full Screen Passes](https://michaldrobot.com/2014/04/01/gcn-execution-patterns-in-full-screen-passes/)
- Kostas Anagnostou [@KostasAAA](https://twitter.com/KostasAAA)
  - [Blog](https://interplayoflight.wordpress.com/)
  - <2018> [DD2018: Kostas Anagnostou - Experiments in GPU occlusion culling](https://www.youtube.com/watch?v=U20dIA3SLTs)
  - <2020> [GPU ARCHITECTURE RESOURCES](https://interplayoflight.wordpress.com/2020/05/09/gpu-architecture-resources/)
  - <2020> [GPU ARCHITECTURE RESOURCES (twitter thread)](https://twitter.com/KostasAAA/status/1259153226043179011)
  - <2020> [WHAT IS SHADER OCCUPANCY AND WHY DO WE CARE ABOUT IT?](https://interplayoflight.wordpress.com/2020/11/11/what-is-shader-occupancy-and-why-do-we-care-about-it/)
  - <2020> [TO Z-PREPASS OR NOT TO Z-PREPASS](https://interplayoflight.wordpress.com/2020/12/21/to-z-prepass-or-not-to-z-prepass/)
- Matthäus G. Chajdas [@NIV_Anteru](https://twitter.com/niv_anteru)
  - [Blog](https://anteru.net/)
  - <2018> [Introduction to compute shaders](https://anteru.net/blog/2018/intro-to-compute-shaders/)
  - <2018> [More compute shaders](https://anteru.net/blog/2018/more-compute-shaders/)
  - <2018> [Even more compute shaders](https://anteru.net/blog/2018/even-more-compute-shaders/)
  - [GPU database](https://db.thegpu.guru/)
- Matthijs De Smedt [@anji_nl](https://twitter.com/anji_nl)
  - <2016> [PC GPU Performance Hot Spots](https://developer.nvidia.com/pc-gpu-performance-hot-spots)
- Maurizio Cerrato [@speedwago](https://twitter.com/speedwago)
  - <2019> [GPU Architectures](https://drive.google.com/file/d/12ahbqGXNfY3V-1Gj5cvne2AH4BFWZHGD/view)
- Sebastian Aaltonen [@SebAaltonen](https://twitter.com/SebAaltonen)
  - [Blog](https://www.secondorder.com/)
  - <2017> [Optimizing GPU occupancy and resource usage with large thread groups](https://gpuopen.com/learn/optimizing-gpu-occupancy-resource-usage-large-thread-groups/)
  - <2018> [DD2018: Sebastian Aaltonen - GPU based clay simulation and ray tracing tech in Claybook](https://www.youtube.com/watch?v=Xpf7Ua3UqOA)
  - <2018> [This is how I managed to port Claybook from consoles to ~4x slower handheld](https://threadreaderapp.com/thread/1076765876148490240.html)
- Layla Mah [@MissQuickstep](https://twitter.com/missquickstep)
  - <2013> [The AMD GCN Architecture - A Crash Course](https://www.slideshare.net/DevCentralAMD/gs4106-the-amd-gcn-architecture-a-crash-course-by-layla-mah)
  - <2013> [Powering the Next Generation of Graphics: The AMD GCN Architecture](https://www.gdcvault.com/play/1019294/Powering-the-Next-Generation-of)
- Sven Andersson [@andsve](https://twitter.com/andsve)
  - [Blog](http://svenandersson.se/)
  - <2014> [Real-time Rendering Blogs](http://svenandersson.se/2014/realtime-rendering-blogs.html)
- Fabian Giesen [@rygorous](https://twitter.com/rygorous)
  - [Blog](https://fgiesen.wordpress.com/)
  - <2011> [A trip through the Graphics Pipeline 2011](https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-graphics-pipeline-2011-index/)
- Timothy Lottes
  - <2016> [GCN Memory Coalescing](https://gpuopen.com/gcn-memory-coalescing/)
  - <2017> [ADVANCED SHADER PROGRAMMING ON GCN](http://32ipi028l5q82yhj72224m8j.wpengine.netdna-cdn.com/wp-content/uploads/2017/03/GDC2017-Advanced-Shader-Programming-On-GCN.pdf)
  - <2018> [Engine Optimization Hot Lap](https://32ipi028l5q82yhj72224m8j-wpengine.netdna-ssl.com/wp-content/uploads/2018/05/gdc_2018_sponsored_engine_optimization_hot_lap.pptx)
- Robert Menzel [@renderpipeline](https://twitter.com/renderpipeline)
  - [Blog](http://renderingpipeline.com)
  - <2012> [Low-Level GPU Documentation](http://renderingpipeline.com/graphics-literature/low-level-gpu-documentation/)
- Stephanie Hurlburt [@sehurlburt](http://stephaniehurlburt.com/blog)
  - <2016> [Casual Introduction to Low-Level Graphics Programming](http://stephaniehurlburt.com/blog/2016/10/28/casual-introduction-to-low-level-graphics-programming)
- RasterGrid [@rastergrid](https://twitter.com/rastergrid)
  - [Blog](https://rastergrid.com/blog/)
  - <2021> [Understanding GPU caches](https://rastergrid.com/blog/gpu-tech/2021/01/understanding-gpu-caches/)

### By Organization
- GDC
  - Search "Advanced Graphics" in [GDC Vault](https://gdcvault.com/) or in [GDC VAULT EXPLORER](https://yankooliveira.com/gdcvault/)
  - <2014> [Vertex Shader Tricks](https://www.slideshare.net/DevCentralAMD/vertex-shader-tricks-bill-bilodeau)
  - <2016> [Optimizing the Graphics Pipeline With Compute](https://www.gdcvault.com/play/1023109/Optimizing-the-Graphics-Pipeline-With)
  - <2016> [High-Performance, Low-Overhead Rendering with OpenGL and Vulkan](https://www.gdcvault.com/play/1023516/High-performance-Low-Overhead-Rendering)
  - <2016> [Practical DirectX 12](https://developer.nvidia.com/sites/default/files/akamai/gameworks/blog/GDC16/GDC16_gthomas_adunn_Practical_DX12.pdf)
  - <2017> [Wave Programming in D3D12 and Vulkan](http://32ipi028l5q82yhj72224m8j.wpengine.netdna-cdn.com/wp-content/uploads/2017/07/GDC2017-Wave-Programming-D3D12-Vulkan.pdf)
  - <2017> [D3D12 and Vulkan Done Right](http://32ipi028l5q82yhj72224m8j.wpengine.netdna-cdn.com/wp-content/uploads/2017/03/GDC2017-D3D12-And-Vulkan-Done-Right.pdf)
  - <2017> [Deep Dive: Asynchronous Compute](http://32ipi028l5q82yhj72224m8j.wpengine.netdna-cdn.com/wp-content/uploads/2017/03/GDC2017-Asynchronous-Compute-Deep-Dive.pdf)
  - <2019> [DirectX 12 Optimization Techniques in Capcom’s RE ENGINE](https://gpuopen.com/gdc-presentations/2019/gdc-2019-s4-optimization-techniques-re2-dmc5.pdf)
  - <2019> [A BLEND OF GCN OPTIMIZATION AND COLOR PROCESSING](https://gpuopen.com/gdc-presentations/2019/gdc-2019-s5-blend-of-gcn-optimization-and-color-processing.pdf)
  - <2019> [AMD GPU Performance Revealed](https://gpuopen.com/gdc-presentations/2019/gdc-2019-s6-gpu-performance-revealed.pdf)
- Siggraph
  - <2020> [LOW-LEVEL OPTIMIZATIONS IN THE LAST OF US PART II](https://www.naughtydog.com/blog/naughty_dog_at_siggraph_2020)
- AMD
  - [GPU Open](https://gpuopen.com/)
  - [Events Presentations](https://gpuopen.com/events/)
  - <2016> [Leveraging asynchronous queues for concurrent execution](https://gpuopen.com/concurrent-execution-asynchronous-queues/)
  - <2018> [Optimize your engine using compute @ 4C Prague 2018](https://gpuopen.com/wp-content/uploads/2018/11/4C-Prague-Compute-Shaders.pptx) | [(Youtube)](https://www.youtube.com/watch?v=0DLOJPSxJEg)
  - <2018> [Optimization with Radeon GPU Profiler - A Vulkan Case Study](https://gpuopen.com/wp-content/uploads/2018/01/Optimization-with-Radeon-GPU-Profiler.pptx)
  - <2019> [Triangles Are Precious](https://gpuopen.com/presentations/2019/nordic-game-2019-triangles-are-precious.pdf)
  - <2020> [Let’s build](https://gpuopen.com/lets-build/)
    - AMD Ryzen™ Processor Software Optimization
    - Optimizing for the Radeon™ RDNA Architecture
    - From Source to ISA: A Trip Down the Shader Compiler Pipeline
    - A Review of GPUOpen Effects
    - Curing Amnesia and Other GPU Maladies With AMD Developer Tools
    - Radeon™ ProRender Full Spectrum Rendering 2.0: The Universal Rendering API
  - <2020> [CONCURRENCY MODEL IN EXPLICIT GRAPHICS APIS](https://gpuopen.com/wp-content/uploads/2020/06/GPUOpen_Concurrency_vTUM.pdf)
  - <2020> [All the Pipelines - Journey through the GPU](https://gpuopen.com/videos/graphics-pipeline/)
  - GCN
    - [AMD GCN3 ISA Architecture Manual](https://gpuopen.com/compute-product/amd-gcn3-isa-architecture-manual/)
    - [AMD-FirePro/SDK on Github](https://github.com/AMD-FirePro/SDK/tree/master/documentation)
    - [GPUOpen-Drivers/pal on Github](https://github.com/GPUOpen-Drivers/pal)
    - <2017> ["Vega" Instruction Set Architecture](https://developer.amd.com/wp-content/resources/Vega_Shader_ISA_28July2017.pdf)
    - <2019> [AMD GCN ISA: a first dive](https://giordi91.github.io/post/vegaisa/)
  - RDNA
    - <2019> [INTRODUCING RDNA ARCHITECTURE](https://www.amd.com/system/files/documents/rdna-whitepaper.pdf)
    - <2019> [RDNA Architecture](https://gpuopen.com/wp-content/uploads/2019/08/RDNA_Architecture_public.pdf)
    - <2020> ["RDNA 1.0" Instruction Set Architecture](https://gpuopen.com/wp-content/uploads/2019/08/RDNA_Shader_ISA_5August2019.pdf)
    - <2020> [RDNA Performance Guide](https://gpuopen.com/performance/)
    - <2020> ["RDNA 2" Instruction Set Architecture](https://developer.amd.com/wp-content/resources/RDNA2_Shader_ISA_November2020.pdf)
  - OpenCL
    - [AMD Accelerated Parallel Processing OpenCL Programming Guide](http://developer.amd.com/wordpress/media/2013/07/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide-rev-2.7.pdf)
  - RADEON GPU ANALYZER
    - [USING RADEON™ GPU ANALYZER WITH DIRECTX®12 GRAPHICS](https://gpuopen.com/learn/radeon-gpu-analyzer-2-3-direct3d-12-graphics/)
    - [USING RADEON™ GPU ANALYZER WITH DIRECT3D®12 COMPUTE](https://gpuopen.com/learn/radeon-gpu-analyzer-2-2-direct3d12-compute/)
- Nvidia
  - [Developer Blog](https://developer.nvidia.com/blog)
  - <2015> [Constant Buffers without Constant Pain](https://developer.nvidia.com/content/constant-buffers-without-constant-pain-0)
  - <2016> [Reading Between The Threads: Shader Intrinsics](https://developer.nvidia.com/reading-between-threads-shader-intrinsics)
  - <2016> [DX12 Do's And Don'ts](https://developer.nvidia.com/dx12-dos-and-donts)
  - <2020> [Best Practices: Using NVIDIA RTX Ray Tracing](https://developer.nvidia.com/blog/best-practices-using-nvidia-rtx-ray-tracing/)
  - <2021> [Advanced API Performance](https://developer.nvidia.com/blog/tag/advanced-api-performance/)
  - Pascal
    - <2016> [NVIDIA GeForce GTX 1080 Whitepaper](http://international.download.nvidia.com/geforce-com/international/pdfs/GeForce_GTX_1080_Whitepaper_FINAL.pdf)
  - Turing
    - <2018> [NVIDIA Turing Architecture In-Depth](https://devblogs.nvidia.com/nvidia-turing-architecture-in-depth/)
    - <2018> [NVIDIA TURING GPU ARCHITECTURE](https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf)
  - CUDA
    - <2014> [CUDA Pro Tip: Optimized Filtering with Warp-Aggregated Atomics](https://devblogs.nvidia.com/cuda-pro-tip-optimized-filtering-warp-aggregated-atomics/)
  - GTC
    - <2012> [GPU Performance Analysis and Optimization](http://on-demand.gputechconf.com/gtc/2012/presentations/S0514-GTC2012-GPU-Performance-Analysis.pdf)
- Intel
  - [Gamedev](https://software.intel.com/en-us/gamedev)
  - [Intel® Processor Graphics: Architecture & Programming](https://doc.lagout.org/electronics/Intel-Graphics-Architecture-ISA-and-microarchitecture.pdf)
- Microsoft
  - [DirectX-Specs](https://microsoft.github.io/DirectX-Specs/)
  - <2019> [New in D3D12 – background shader optimizations](https://devblogs.microsoft.com/directx/background-shader-optimizations/)
- Arm
  - [Mali GPU Best Practices](https://developer.arm.com/solutions/graphics/developer-guides/mali-gpu-best-practices)
  - [Best Practices for Mobile Game Art Assets](https://developer.arm.com/solutions/graphics/developer-guides/best-practices-for-mobile-game-art-assets-1)
  - [Principles of High Performance](https://developer.arm.com/solutions/graphics/developer-guides/principles-of-high-performance)
  - [Accelerating 2D Applications](https://developer.arm.com/solutions/graphics/developer-guides/accelerating-2d-applications)
  - [Arm Vulkan Guides](https://developer.arm.com/solutions/graphics/apis/vulkan)
- Khronos Group
  - [Vulkan Samples](https://github.com/KhronosGroup/Vulkan-Samples)
- CMU
  - <2017> [Parallel Computer Architecture and Programming](http://15418.courses.cs.cmu.edu/tsinghua2017/home)
- Misc
  - <2009> [From Shader Code to a Teraflop: How Shader Cores Work](https://web.archive.org/web/20181008131455/http://s09.idav.ucdavis.edu/talks/02_kayvonf_gpuArchTalk09.pdf)
  - <2016> [JP] [GPU最適化入門 (Introduction to GPU Optimization)](https://www.slideshare.net/ssuser2e676d/gpu-65502505)
  - <2017> [Demystifying Asynchronous Compute](https://www.reddit.com/r/nvidia/comments/50dqd5/demystifying_asynchronous_compute/)
  - <2019> [Unity GPU culling experiments](https://www.mpc-rnd.com/unity-gpu-culling-experiments/)
  - <2019> [What's up with my branch on GPU?](https://aschrein.github.io/jekyll/update/2019/06/13/whatsup-with-my-branches-on-gpu.html)

### Pipeline Overview
- <2011> [A trip through the Graphics Pipeline 2011](https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-graphics-pipeline-2011-index/)
- <2015> [Life of a triangle - NVIDIA's logical pipeline](https://developer.nvidia.com/content/life-triangle-nvidias-logical-pipeline)
- <2015> [Render Hell 2.0](https://simonschreibt.de/gat/renderhell/)
- <2016> [How bad are small triangles on GPU and why?](http://www.g-truc.net/post-0662.html)
- <2017> [GPU Performance for Game Artists](http://fragmentbuffer.com/gpu-performance-for-game-artists/)
- <2019> [Understanding the anatomy of GPUs using Pokémon](https://www.ovh.com/blog/understanding-the-anatomy-of-gpus-using-pokemon/)

### Graphics Study
- <2020> [Graphics Studies Compilation](http://www.adriancourreges.com/blog/2020/12/29/graphics-studies-compilation/)

### For Artist
- [WIP] [Unreal Art Optimization](https://unrealartoptimization.github.io/book/pipelines/pixel/)

### Database
- [GPU shader memory operation performance test](https://github.com/sebbbi/perftest)
- [GPUInfo](https://www.gpuinfo.org/) for Vulkan, OpenGL, OpenGL ES
- [JP] [GPU Spec Database by HYPERでんち](https://dench.flatlib.jp/start#hardware)

### Tools
- [Shader Playground](http://shader-playground.timjones.io/)
- [NVIDIA Nsight Graphics](https://developer.nvidia.com/nsight-graphics)
- [Radeon GPU Analyzer](https://gpuopen.com/rga/)
- [Intel Graphics Performance Analyzers](https://software.intel.com/content/www/us/en/develop/tools/graphics-performance-analyzers.html)

Thanks to JoseEmilio-ARM for the Arm part.