- 2011 - A trip through the Graphics Pipeline 2011
 - 2015 - Life of a triangle - NVIDIA's logical pipeline
 - 2015 - Render Hell 2.0
 - 2016 - How bad are small triangles on GPU and why?
 - 2017 - GPU Performance for Game Artists
 - 2019 - Understanding the anatomy of GPUs using Pokémon
 - 2020 - GPU ARCHITECTURE RESOURCES
 - 2020 - All the pipelines - journey through the GPU
 
- Emil Persson @Humus
 - Matt Pettineo @mynameismjp
 - Louis Bavoil @louisbavoil
- D3D11 Vendor Hacks
 - 2018 - The Peak-Performance-Percentage Analysis Method for Optimizing Any GPU Workload
 - 2018 - Fixing the Hyperdrive: Maximizing Rendering Performance on NVIDIA GPUs (Presented by NVIDIA)
 - 2019 - Optimizing DX12/DXR GPU Workloads using Nsight Graphics: GPU Trace and the Peak-Performance-Percentage (P3) Method (Presented by NVIDIA)
 - 2020 - Optimizing Compute Shaders for L2 Locality using Thread-Group ID Swizzling
 - 2021 - Dana Elifaz - The Next Level of Optimization Advice with Nsight Graphics: GPU Trace
 - 2022 - Optimizing Ray Tracing GPU Workloads using Nsight Graphics: GPU Trace and Nsight Systems
 
 - Rys Sommefeldt @ryszu
 - Michal Drobot @michaldrobot
 - Kostas Anagnostou @KostasAAA
- Blog
 - 2018 - DD2018: Kostas Anagnostou - Experiments in GPU occlusion culling
 - 2020 - GPU architecture resources
 - 2020 - GPU architecture resources (twitter thread)
 - 2020 - What is shader occupancy and why do we care about it?
 - 2020 - To z-prepass or not to z-prepass
 - 2022 - Shader tips and tricks
 - 2023 - Low-level thinking in high-level shading languages 2023
 - 2025 - The hidden cost of shader instructions
 - 2025 - Async compute all the things
 
 - Matthäus G. Chajdas @NIV_Anteru
- Blog
 - 2018 - Introduction to compute shaders
 - 2018 - More compute shaders
 - 2018 - Even more compute shaders
 
 - Matthijs De Smedt @anji_nl
- 2016 - PC GPU Performance Hot Spots
 
 - Maurizio Cerrato @speedwago
- 2019 - GPU Architectures
 
 - Sebastian Aaltonen @SebAaltonen
 - Layla Mah @MissQuickstep
 - Sven Andersson @andsve
- Blog
 - 2014 - Real-time Rendering Blogs
 
 - Fabian Giesen @rygorous
 - Timothy Lottes @NOTimothyLottes
- 2016 - Understanding Memory Coalescing on GCN
 - 2017 - ADVANCED SHADER PROGRAMMING ON GCN
 - 2018 - Engine Optimization Hot Lap
 - 2024 - Fixing The GPU
 
 - Robert Menzel @renderpipeline
- Blog
 - 2012 - Low-Level GPU Documentation
 
 - RasterGrid @rastergrid
- Blog
 - 2021 - Understanding GPU caches
 
 - Adam Sawicki @Reg__
 - Matías N. Goldberg @matiasgoldberg
 - Francesco Cifariello Ciardi @FCifaCiar
- Blog
 - 2018 - INTRO TO GPU SCALARIZATION
 
 - Sébastien Lagarde @SebLagarde
 - Bart Wronski @BartWronsk
 - Elizabeth Baumel @Icetigris
 - Anton Schreiner @antonschrein
 - Jendrik Illner @jendrikillner
- Blog
 - Graphics Programming Weekly Article Database Not specifically on optimization. Has a search bar.
 
 - Hans-Kristian @Themaister
 - Graham Wihlidal @gwihlidal
 - Alexandre Sabourin @AlexSneezeKing
 - Chips and Cheese @chipsandcheese9
 
- AMD
- GPU Open
- Events Presentations
 - AMD GPU architecture programming documentation (Instruction Set Architecture)
 - 2014 - Vertex Shader Tricks
 - 2016 - Getting the Most Out of Delta Color Compression
 - 2016 - Leveraging asynchronous queues for concurrent execution
 - 2016 - AMD GCN Assembly: Cross-Lane Operations
 - 2017 - Wave Programming in D3D12 and Vulkan
 - 2017 - D3D12 and Vulkan Done Right
 - 2017 - Deep Dive: Asynchronous Compute
 - 2018 - Optimize your engine using compute @ 4C Prague 2018 | (YouTube)
 - 2018 - Optimization with Radeon GPU Profiler - A Vulkan Case Study
 - 2019 - DirectX 12 Optimization Techniques in Capcom’s RE ENGINE
 - 2019 - A BLEND OF GCN OPTIMIZATION AND COLOR PROCESSING
 - 2019 - AMD GPU Performance Revealed
 - 2019 - Triangles Are Precious
 - 2020 - Let’s build
 - 2020 - CONCURRENCY MODEL IN EXPLICIT GRAPHICS APIS
 - 2021 - Understanding Graphs in Radeon GPU Profiler and GPUView
 - 2022 - Let's talk about (GPU) crashes
 - 2022 - Compute Shaders @ GIC
 - 2023 - Occupancy explained
 - 2024 - Occupancy explained through the AMD RDNA architecture
 - 2024 - Mesh shaders: optimization and best practices
 
 - GCN
- 2013 - GCN3 Instruction Set Architecture
 - 2019 - AMD GCN ISA: a first dive
 - 2020 - Understanding AMD GPU ISA Video
 
 - RDNA
- 2019 - INTRODUCING RDNA ARCHITECTURE
 - 2019 - RDNA Architecture
 - 2020 - "RDNA 1.0" Instruction Set Architecture
 - 2020 - "RDNA 2" Instruction Set Architecture
 - 2022 - "RDNA3" Instruction Set Architecture
 - 2024 - "RDNA3.5" Instruction Set Architecture
 - 2025 - "RDNA4" Instruction Set Architecture
 - RDNA Performance Guide
 
 - OpenCL
 - Radeon GPU Analyzer / Radeon Raytracing Analyzer
- 2017 - Live VGPR Analysis with Radeon™ GPU Analyzer
 - 2019 - USING RADEON™ GPU ANALYZER WITH DIRECTX®12 GRAPHICS
 - 2019 - USING RADEON™ GPU ANALYZER WITH DIRECT3D®12 COMPUTE
 - 2022 - Visualizing VGPR Pressure with Radeon™ GPU Analyzer 2.6
 - 2022 - Improving raytracing performance with the Radeon™ Raytracing Analyzer (RRA)
 
 - Driver Stack
 
 - Nvidia
- Developer Blog and Talks
- Advanced API Performance on various topics
 - 2012 - GPU Performance Analysis and Optimization
 - 2015 - Constant Buffers without Constant Pain
 - 2016 - Practical DirectX 12
 - 2016 - Reading Between The Threads: Shader Intrinsics
 - 2016 - DX12 Do's And Don'ts
 - 2016 - High-Performance, Low-Overhead Rendering with OpenGL and Vulkan
 - 2019 - Tips and Tricks: Ray Tracing Best Practices
 - 2020 - Optimizing Graphics Applications using Nsight Systems and Nsight Graphics
 - 2020 - RTX Ray Tracing Best Practices
 - 2021 - Advanced API Performance
 - 2022 - Best Practices for Using NVIDIA RTX Ray Tracing (Updated)
 - 2023 - Practical Tips for Optimizing Ray Tracing
 - 2023 - Avoiding Stalls and Hitches in DirectX 12
 - 2023 - How to Improve Shader Performance by Resolving LDC Divergence
 - 2024 - Shader Debugging Made Easy with NVIDIA Nsight Graphics
 
 - Pascal
 - Turing
 - Ampere
 - Ada
 - Blackwell
 - CUDA
- 2014 - CUDA Pro Tip: Optimized Filtering with Warp-Aggregated Atomics
 - 2017 - CUDA kernel-level experiments in NVIDIA Nsight on Issue Efficiency, Memory Statistics, Pipe Utilization, etc.
 
 - Driver Stack
 - Misc
 
 - Apple
 - Intel
 - Arm
 - Microsoft
 - Khronos Group
 - GDC
- Advanced Graphics Summit Not specifically on optimization
 
 - Digital Dragon
- Video Not specifically on optimization
 
 - Graphics Programming Conference
- Video Not specifically on optimization
 
 - (JP) CEDEC
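- 2016 - GPU最適化入門 (Introduction to GPU Optimization)
- (Book) マンガとイラストでわかる! GPU最適化入門 (Introduction to GPU Optimization, Explained with Manga and Illustrations)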
 
 
 - SIGGRAPH
- Advances in Real-Time Rendering in Games Not specifically on optimization
 - 2009 - From Shader Code to a Teraflop: How Shader Cores Work
 - 2020 - LOW-LEVEL OPTIMIZATIONS IN THE LAST OF US PART II
 
 - CMU
 
- 2018 - Aftermath: Advances in GPU Crash Debugging
 - 2020 - (JP) Device Removal の処方箋 (A Prescription for Device Removal), supplementary material
 - 2023 - GPU Crash Debugging in Unreal Engine: Tools, Techniques, and Best Practices | Unreal Fest 2023
 
- GPU Specs Database by techpowerup
 - GPU database by Matthäus G. Chajdas
 - GPUInfo by Sascha Willems For Vulkan, OpenGL, OpenGL ES
 - D3d12infoDB by Dmytro Bulatov Database based on D3d12info in Tools section below
 - (JP) GPU Spec Database by HYPERでんち
 
- Online Shader Compiler
- Compiler Explorer (godbolt) Supports DXC, AMD RGA
 - Shader Playground Supports DXC, FXC, glslang, hlsl2glsl, hlslparser, IntelShaderAnalyzer, AMD RGA, slang, XShaderCompiler
 
 - Microsoft
 - Nvidia
 - AMD
- Radeon Developer Tool Suite
- Radeon GPU Profiler (RGP) Low-level optimization tool
 - Radeon Memory Visualizer (RMV)
 - Radeon Developer Panel (RDP)
- Driver Experiments Low-level control of the AMD Adrenalin driver
 
 - Radeon GPU Analyzer (RGA) Offline compiler and performance analysis tool
 - Radeon Raytracing Analyzer (RRA)
 - Radeon GPU Detective (RGD) Post-mortem analysis of GPU crashes
 - 2024 - Game Optimization with The Radeon Developer Tool Suite
 
 - GPU Reshape On-the-fly instrumentation of GPU operations with instruction level validation of potentially undefined behavior
- 2024 - Introducing GPU Reshape - Video
 
 
 - Intel
 - Other related tools
- RenderDoc Graphics debugger that allows quick and easy single-frame capture and detailed introspection
 - APITrace Traces OpenGL, Direct3D, and DirectDraw API calls to a file and replays them
 - PerfTest A simple GPU shader memory operation performance test tool. Results on a wide range of GPUs are already available
 - D3d12info by Adam Sawicki Get GPU information through DXGI and Direct3D 12 (D3D12) + AMD AGS, NVAPI, WinAPI, and some other sources
 
 
Thanks to JoseEmilio-ARM for the ARM part.