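In general, check `crt/host_config.h` (under `$CUDA_ROOT/include`) to find out which host compiler versions a CUDA release supports.
Sometimes it is possible to hack the version checks there to get some newer compilers working, too :)
Since CUDA 10.2, `nvcc -allow-unsupported-compiler` skips these checks without editing the header (at your own risk).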

The Thrust version shipped with a toolkit can be found in `$CUDA_ROOT/include/thrust/version.h`.
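`thrust/version.h` encodes the version as one integer, `THRUST_VERSION`, and derives `THRUST_MAJOR_VERSION`, `THRUST_MINOR_VERSION` and `THRUST_SUBMINOR_VERSION` from it as `major * 100000 + minor * 100 + subminor` (so 1.10.0 is `101000`). A minimal sketch that decodes such a value, assuming that encoding; `ThrustVersion`, `decode_thrust_version` and `print_thrust_version` are hypothetical names, and the header is only included when it is on the include path:

```cpp
#include <cstdio>

// Pull in the real header only if it is reachable (e.g. -I$CUDA_ROOT/include).
#if defined(__has_include)
#  if __has_include(<thrust/version.h>)
#    include <thrust/version.h>
#  endif
#endif

// Hypothetical helper type holding a decoded Thrust version.
struct ThrustVersion {
    int major;
    int minor;
    int subminor;
};

// Decode major * 100000 + minor * 100 + subminor, mirroring how
// thrust/version.h derives THRUST_MAJOR/MINOR/SUBMINOR_VERSION.
constexpr ThrustVersion decode_thrust_version(int v) {
    return ThrustVersion{v / 100000, v / 100 % 1000, v % 100};
}

// Print the Thrust version seen by this translation unit, if any.
inline void print_thrust_version() {
#if defined(THRUST_VERSION)
    const ThrustVersion tv = decode_thrust_version(THRUST_VERSION);
    std::printf("Thrust %d.%d.%d\n", tv.major, tv.minor, tv.subminor);
#else
    std::printf("thrust/version.h is not on the include path\n");
#endif
}
```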

Download Archives: https://developer.nvidia.com/cuda-toolkit-archive

Release notes for CUDA Toolkit (CTK):
- 11.5:     https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
- 11.4.2:   https://docs.nvidia.com/cuda/archive/11.4.2/
- 11.4.1:   https://docs.nvidia.com/cuda/archive/11.4.1/
- 11.4.0:   https://docs.nvidia.com/cuda/archive/11.4.0/
- 11.3:     https://docs.nvidia.com/cuda/archive/11.3.0/index.html
- 11.2:     https://docs.nvidia.com/cuda/archive/11.2.2/index.html
- 11.1:     https://docs.nvidia.com/cuda/archive/11.1.1/index.html
- 11.0:     https://docs.nvidia.com/cuda/archive/11.0/cuda-toolkit-release-notes/index.html
- 10.2:     https://developer.download.nvidia.com/compute/cuda/10.2/Prod/docs/sidebar/CUDA_Toolkit_Release_Notes.pdf
- 10.1:     https://developer.download.nvidia.com/compute/cuda/10.1/Prod/docs/sidebar/CUDA_Toolkit_Release_Notes.pdf
- 10.0:     https://developer.download.nvidia.com/compute/cuda/10.0/Prod/docs/sidebar/CUDA_Toolkit_Release_Notes.pdf
- 9.2:      https://developer.download.nvidia.com/compute/cuda/9.2/Prod/docs/sidebar/CUDA_Toolkit_Release_Notes.pdf
- 9.1:      https://developer.download.nvidia.com/compute/cuda/9.1/Prod/docs/sidebar/CUDA_Toolkit_Release_Notes.pdf
- 9.0:      https://developer.download.nvidia.com/compute/cuda/9.0/Prod/docs/sidebar/CUDA_Toolkit_Release_Notes.pdf
- 8.0:      https://developer.nvidia.com/compute/cuda/8.0/Prod2/docs/sidebar/CUDA_Toolkit_Release_Notes-pdf
- 7.5:      http://developer.download.nvidia.com/compute/cuda/7.5/Prod/docs/sidebar/CUDA_Toolkit_Release_Notes.pdf
- 7.0:      http://developer.download.nvidia.com/compute/cuda/7_0/Prod/doc/CUDA_Toolkit_Release_Notes.pdf
- 6.5:      http://developer.download.nvidia.com/compute/cuda/6_5/rel/docs/CUDA_Toolkit_Release_Notes.pdf
- 6.0:      http://developer.download.nvidia.com/compute/cuda/6_0/rel/docs/CUDA_Toolkit_Release_Notes.pdf
- 5.5:      http://developer.download.nvidia.com/compute/cuda/5_5/rel/docs/CUDA_Toolkit_Release_Notes.pdf

Release notes for the NVIDIA HPC SDK:
- https://docs.nvidia.com/hpc-sdk/hpc-sdk-release-notes/index.html

# Compatibility Guarantees

Quoted from the [CUDA Compatibility documentation](https://docs.nvidia.com/deploy/cuda-compatibility/index.html):

- CUDA 10.0: First introduced in CUDA 10, the CUDA Forward Compatible Upgrade is designed to allow users to get access to new CUDA features and run applications built with new CUDA releases on systems with older installations of the NVIDIA datacenter GPU driver.
- CUDA 11.1: First introduced in CUDA 11.1, CUDA Enhanced Compatibility provides two benefits:
  - By leveraging semantic versioning across components in the CUDA Toolkit, an application can be built for one CUDA minor release (such as 11.1) and work across all future minor releases within the major family (such as 11.x).
  - CUDA has relaxed the minimum driver version check and thus no longer requires a driver upgrade with minor releases of the CUDA Toolkit.
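To see which side of these guarantees a system is on, the CUDA runtime API reports both versions as integers: `cudaRuntimeGetVersion()` and `cudaDriverGetVersion()` return `1000 * major + 10 * minor` (e.g. `11020` for 11.2). The sketch below decodes them and applies a deliberately simplified reading of the two bullets above. `CudaVersion`, `decode_cuda_version`, `likely_compatible` and `print_cuda_versions` are hypothetical names, and `likely_compatible` is an assumption-laden heuristic, not NVIDIA's official check (it ignores, for example, the forward compatibility package and features that genuinely need a newer driver):

```cpp
#include <cstdio>
#if defined(__CUDACC__)
#  include <cuda_runtime_api.h>
#endif

// Hypothetical helper type for a decoded CUDA version.
struct CudaVersion {
    int major;
    int minor;
};

// cudaRuntimeGetVersion()/cudaDriverGetVersion() encode versions as
// 1000 * major + 10 * minor, e.g. 11020 for CUDA 11.2.
constexpr CudaVersion decode_cuda_version(int v) {
    return CudaVersion{v / 1000, (v % 1000) / 10};
}

// Simplified heuristic (assumption, not an official rule):
// - a driver at least as new as the runtime is fine;
// - otherwise, with Enhanced Compatibility (CUDA 11.x), a runtime may run on
//   an older driver of the same major release.
constexpr bool likely_compatible(int runtime, int driver) {
    return driver >= runtime ||
           (decode_cuda_version(runtime).major >= 11 &&
            decode_cuda_version(runtime).major == decode_cuda_version(driver).major);
}

#if defined(__CUDACC__)
// Query and print the versions on the current machine (nvcc builds only).
inline void print_cuda_versions() {
    int runtime = 0;
    int driver = 0;  // stays 0 if no driver is installed
    cudaRuntimeGetVersion(&runtime);
    cudaDriverGetVersion(&driver);
    std::printf("runtime %d, driver %d, likely compatible: %s\n",
                runtime, driver, likely_compatible(runtime, driver) ? "yes" : "no");
}
#endif
```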


# nvcc

Latest official compiler requirements: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

| CUDA version | SM Arch | g++   | icpc | pgc++  | xlC    | MSVC   | clang++ | Linux driver | thrust | note       |
| ------------ | ------- | ----- | ---- | ------ | ------ | ------ | ------- |  ------ | ------ | ---------- |
|    1.0       | 1.0-1.1 | ?     | ?    |    ?   |        |        |         |         |        |            |
|    1.1       | 1.0-1.1 | ?     | ?    |    ?   |        |        |         |         |        |            |
|    2.0       | 1.0-1.1 | ?     | ?    |    ?   |        |        |         |         |        |            |
|    2.1       | 1.0-1.3 | ?     | ?    |    ?   |        |        |         |         |        |            |
|    2.3.1     | 1.0-1.3 | ?     | ?    |    ?   |        |        |         |         |        |            |
|    3.0       | 1.0-2.0 | ?     | ?    |    ?   |        |        |         |         |        |            |
|    3.1       | 1.0-2.0 | ?     | ?    |    ?   |        |        |         |         |        |            |
|    3.2       | 1.0-2.1 | ?     | 11.1 |    ?   |        |        |         |        |        |            |
|    4.0       | 1.0-2.1 |  <=4.4 | 11.1 |    ?   |        |        |         |        |        |            |
|    4.1       | 1.0-2.1 |  <=4.5 | 11.1 |    ?   |        |        |         |        |        |            |
|    4.2       | 1.0-2.1 |  <=4.6 | 11.1 |    ?   |        |        |         |        |        |            |
|    5.0       | 1.0-3.? |  <=4.6 | 11.1 |    ?   |        |        |         |   ?    | 1.5.3  |            |
|    5.5       | 1.0-3.? |  <=4.8 | 12.1 |    ?   |        |        |         |   ?     | 1.7.0 | C++11 on *host side* supported; ICC fixed to build `20110811` |
|    6.0       | 1.0-5.0 |  <=4.8 | 13.1 |    ?   |        |        |         | 331.62 | 1.7.1  |            |
|    6.5       | 1.1-5.X |  <=4.8 | 14.0 |    ?   |        |   ?    |         |   ?    | 1.7.2  | experimental *device side* C++11 support; including this version, `<thrust/sort.h>` screws up `__CUDA_ARCH__` (must be undefined on host); deprecation of SM 1.1-1.3 (1.0 removed) |
| 7.0.17 (RC)  | see below |  <=4.9 | 15.0 | >=14.9 | 13.1.1 |   ?    |         | 346.29 | 1.8.0  | first *official* PGI support, first xlC string found; little-endian powerpc64 supported |
|    7.0.27    | 2.0-5.X |  <=4.9 | 15.0 | >=14.9 | 13.1.1 | 2010-13 |         | 346.46 | 1.8.1 | official C++11 support on *device side*           |
|    7.5       |         |  <=4.9 | 15.0 |  15.4  | 13.1   | 2010-13 | 3.5-3.6 | 352.41?| 1.8.2 | clang (host) on linux supported, `__CUDACC_VER__` macros added |
|    7.5.18    | 2.0-5.X |  <=4.9 | 15.0 |  15.4  | 13.1   | 2010-13 |         | 352.39 | 1.8.2 |            |
|    8.0.44    | 2.0-6.X |  <=5.3 | 15.0(.4)-16.0 | 16(.3)+  | 13.1(.2)| 2012-15| 3.8-3.9 | 367.48 | 1.8.3-patch2 | sm_60 (Pascal) support added |
|    8.0.61    | 2.0-6.X |  <=5.3 | 15.0(.4)-17.0 | 16(.3)+  | 13.1(.2)| 2012-15| 3.8-3.9 | 375.26 | 1.8.3-patch2 | nvcc 8 is incompatible with `std::tuple` in gcc 5.4+ |
| 9.0.69 (RC)  | 3.0-7.0 | <=5.5 (<=6) | 15.0(.4)-17.0 | 17  | 13.1(.2)| 2012-17| 3.8-3.9  | ???.?? | 1.9.0-patch4 | *device-side* C++14; `__CUDACC_VER__` deprecated for `__CUDACC_VER_MAJOR/MINOR/BUILD__` |
| 9.0.103 (RC) | 3.0-7.0 | <=5.5 (<=6) | 15.0(.4)-17.0 | 17  | 13.1(.2)| 2012-17| 3.8-3.9  | 384.59 | 1.9.0-patch4 | same as above, `__CUDACC_VER__` defined as `#error` rendering it fully broken |
| 9.0.176      | 3.0-7.0 | <=5.5 (<=6) | (15.0-)17.0 | 17.1  | 13.1(.5)| 2012-17| (3.8-)3.9  | 384.81 | 1.9.0-patch5 | same as above |
| 9.1.85       | 3.0-7.2 | <=5.5 (<=6) | (15.0-)17.0 | 17.X  | 13.1(.6)| 2012-17| (3.8-)4.0  | 390.46 | 1.9.1-patch2 | `math_functions.hpp` moved to `crt/` |
| 9.1.85.1     |         |      |              |      |          |       |            |        |              | cuBLAS 9.1.128: Volta GEMM kernels optimized |
| 9.1.85.2     |         |      |              |      |          |       |            |        |              | ptxas: fix address calculations using large immediate operands |
| 9.1.85.3     |         |      |              |      |          |       |            |        |              | cuBLAS: fixes to GEMM optimizations for convolutional sequence to sequence (seq2seq) models. |
| 9.0-9.1      |         |      |             |         |               |        |           |        |              | nvcc 9.0-9.1 is incompatible with `std::tuple` in gcc 6+ |
| 9.2.88       | 3.0-7.2 | <=7.3.0 (<=7) | (15.0-)17.0 | 17-18.X | 13.1(.6),16.1 | 2012-17| (3.8-)5.0 | 396.26 | 1.9.2        | CUTLASS 1.0 added; `std::tuple` fixed (prior GCC 6 issues) |
| 9.2.148      |         |      |             |         |               |        |           | 396.37 | 1.9.2        |  |
| 10.0.130     | 3.0-7.5 |  <=7 | (15.0-)18.0 | 17-18.X | 13.1, 16.1    | 2013-17| (3.8-)6.0 | 410.48 | 1.9.3        | [CUDA Forward Compatible Upgrade](https://docs.nvidia.com/deploy/cuda-compatibility/index.html)  |
| 10.1.105     | 3.0-7.5 |  <=8 | (15.0-)19.0 | 17-19.X |               | 2013-19| (3.8-)7.0 | 418.39 | 1.9.4        |  |
| 10.1.168     |         |      |             |         |               |        | (3.8-)8.0 | 418.67 |              | 10.1 "Update 1" |
| 10.1.243     |         |      |             |         |               |        |           | 418.87 |              | 10.1 "Update 2" |
| 10.2.89      | 3.0-7.5 |  <=8 | (15.0-)19.0 | 18-19.X | 13.1, 16.1    | 2015-19| (3.3-)8.* | 440.33.01 | 1.9.7     | sm_30,35,37,50 deprecated; `nvcc`: `-allow-unsupported-compiler` |
| 11.0.1 (RC) NVCC:11.0.167 | 3.5-8.0 | (5-)6-9.* | (15.0-)19.1 | 18-20.1 | 13.1, 16.1    | 2015-19| 3.2-9.0.0 | 450.36.06 | 1.9.9      | macOS dropped; libs drop pre-C++11, deprecate pre-C++14 (GCC < 5, Clang < 6, and MSVC < 2017); Arm C/C++ 19.2 support |
| 11.0.2-1 NVCC:11.0.194 |         |     |             |         |               |        | (3.3/)6-9.0.0 | 450.51.05 |            | `nvcc`: `--Wext-lambda-captures-this` |
| 11.0.3 NVCC:11.0.221 | ?       | ?    |  ?          |  ?      | ?             |  ?     | ? | 450.51.06 | ?            | 11.0 "Update 1"; `nvcc`: `--forward-unknown-to-host-compiler`, `--forward-unknown-to-host-linker` flags |
| 11.1.0 NVCC:11.1.74 | 3.5-8.6 | (5-)6-10.0    | (15.0-)19.1 | 18-20.1 | 13.1, 16.1 | 2017-19 | (3.3/)6-10.0.0 | 455.23.05 | 1.9.10-1             | Ubuntu@ppc64le deprecated; [CUDA Enhanced Compatibility](https://docs.nvidia.com/deploy/cuda-compatibility/index.html) |
| 11.1.1 NVCC:11.1.?   |         |      |             |         |               |        |   |        ?  | ?            | ? |
| 11.2.0 NVCC:11.2.67  |         |      |             |         |               |        |   | 460.27.04 | 1.10.0            |   |
| 11.2.1 NVCC:.......  |         |      |             |         |               |        |   | 460.32.03 | ?            | "Update 1"  |
| 11.2.2 NVCC:.......  |         |      |             |         |               |        |   | 460.32.03 | ?            | "Update 2"  |
| 11.3.0 NVCC:....  |         |      |             |         |               |        |   | 465.19.01 | ?            | `cu++flt` added, Python Driver/RT bindings, `alloca()` |
| 11.4.0 NVCC:11.4.48  |         | 6.0-...  |             |         |               |        |   | 470.42.01 | ?            | sm30,32 and Ubuntu 16.04 dropped, C++11 stdlib for math |
| 11.4.1 NVCC:11.4.100 |         | 6.0-11.0  |             |         |               |        | ...-12.0 | 470.57.02 | ?            | 11.4 "Update 1", fix g++ 10 issues with chrono headers of libstdc++; Ubuntu 16.04 dropped (x86) |
| 11.4.2 NVCC:... |         | ...  |             |         |               |        |    | ... | ?            | ... |
| 11.5.0 NVCC:... |         | 6.0-11.0  |             |         |               |        | ...-12.0  | 495.29.05 | ?            | ... |
| **CUDA version** | **SM Arch** | **g++**   | **icpc** | **pgc++**  | **xlC**    | **MSVC**   | **clang++** | **Linux driver** | **thrust** | **note**       |


Note: empty cells generally mean "same as above" for readability.
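The table notes that 7.5 added `__CUDACC_VER__`, that 9.0 deprecated it in favor of `__CUDACC_VER_MAJOR__`, `__CUDACC_VER_MINOR__` and `__CUDACC_VER_BUILD__`, and that 9.0.103 broke it entirely. A portable guard can therefore prefer the split macros and only fall back to the old one. This sketch assumes the old macro's `major * 10000 + minor * 100 + build` encoding (e.g. `80044` for 8.0.44); `MY_CUDACC_VER` and `encode_cudacc_ver` are hypothetical names:

```cpp
// Combined nvcc version usable in #if, or 0 when not compiled by nvcc.
// The split macros are checked first, so the broken __CUDACC_VER__ of
// CUDA 9.x is never expanded.
#if defined(__CUDACC_VER_MAJOR__)
#  define MY_CUDACC_VER \
     (__CUDACC_VER_MAJOR__ * 10000 + __CUDACC_VER_MINOR__ * 100 + __CUDACC_VER_BUILD__)
#elif defined(__CUDACC_VER__)
#  define MY_CUDACC_VER __CUDACC_VER__
#else
#  define MY_CUDACC_VER 0
#endif

// Same encoding as a constexpr function, handy for static_assert comparisons.
constexpr int encode_cudacc_ver(int major, int minor, int build) {
    return major * 10000 + minor * 100 + build;
}

#if MY_CUDACC_VER != 0 && MY_CUDACC_VER < 90200
// Example guard: workarounds needed only for nvcc older than 9.2
// (e.g. the std::tuple problems listed in the table) would go here.
#endif
```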

macOS: As of 7.0, clang seems to be the only supported host compiler on macOS (but no version check was found).
CUDA 10.1.243 adds support for Xcode 10.2. CUDA 11.0 dropped macOS support.

Host compilers such as pgc++, icpc and xlC are only supported on Linux (x86 or little-endian ppc64le).

Dynamic parallelism was added with `sm_35` and CUDA `5.0`.
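`__CUDA_ARCH__` is defined only while compiling device code, as `major * 100 + minor * 10` of the target SM (e.g. `350` for `sm_35`), and must stay undefined in the host pass (which is what the CUDA 6.5 Thrust issue in the table broke). A minimal sketch, assuming nvcc; `MY_HD`, `compiled_arch` and `arch_supports_dynamic_parallelism` are hypothetical names, and the code also compiles with a plain C++ compiler:

```cpp
// Hypothetical portability macro: __host__ __device__ when a CUDA compiler
// defines __CUDACC__, nothing for a plain C++ compiler.
#if defined(__CUDACC__)
#  define MY_HD __host__ __device__
#else
#  define MY_HD
#endif

// SM architecture of the current compilation pass, e.g. 350 for sm_35,
// or 0 in the host pass / with a plain host compiler.
MY_HD inline int compiled_arch() {
#if defined(__CUDA_ARCH__)
    return __CUDA_ARCH__;
#else
    return 0;
#endif
}

// Dynamic parallelism (device-side kernel launches) needs sm_35 or newer.
MY_HD constexpr bool arch_supports_dynamic_parallelism(int cuda_arch) {
    return cuda_arch >= 350;
}
```

A kernel that launches child kernels would guard that code with `#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 350` and be built with relocatable device code, e.g. `nvcc -arch=sm_35 -rdc=true ... -lcudadevrt`.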

Newer CUDA releases have a per-release support matrix for compilers, which also lists supported kernel and glibc versions: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#system-requirements
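For a quick sanity check in a build, the g++ column of the table above can be transcribed into a lookup. This sketch only covers rows with a clear upper bound (9.2 to 11.1, and 11.5) and compares GCC *major* versions, so finer limits such as 9.2's `<=7.3.0` are not captured; `max_gcc_major` and `this_gcc_within` are hypothetical names, and `crt/host_config.h` plus the installation guide stay authoritative:

```cpp
// Newest supported GCC major version per CUDA major.minor, transcribed from
// the g++ column of the nvcc table. Returns -1 where there is no clear bound.
constexpr int max_gcc_major(int cuda_major, int cuda_minor) {
    return (cuda_major == 9  && cuda_minor == 2) ? 7  :
           (cuda_major == 10 && cuda_minor == 0) ? 7  :
           (cuda_major == 10 && cuda_minor == 1) ? 8  :
           (cuda_major == 10 && cuda_minor == 2) ? 8  :
           (cuda_major == 11 && cuda_minor == 0) ? 9  :
           (cuda_major == 11 && cuda_minor == 1) ? 10 :
           (cuda_major == 11 && cuda_minor == 5) ? 11 :
           -1;
}

// True if the GCC compiling this file is within the bound; a non-GCC host
// compiler or an unknown bound yields false.
constexpr bool this_gcc_within(int cuda_major, int cuda_minor) {
#if defined(__GNUC__) && !defined(__clang__)
    return max_gcc_major(cuda_major, cuda_minor) >= __GNUC__;
#else
    return false;
#endif
}
```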

# clang++ -x cuda

clang++ can compile CUDA C++ to PTX as well.
Give it a whirl!

| clang++ | supported CUDA release | supported SMs |
| ------- | ---------------------- | ------------- |
| 3.9-5.0 | 7.0-8.0                | 2.0-(5.0)6.0  |
| 6.0     | [7.0-9.0](https://github.com/llvm-mirror/clang/blob/release_60/include/clang/Basic/Cuda.h) | [(2.0)3.0-7.0](https://github.com/llvm-mirror/clang/blob/release_60/lib/Basic/Targets/NVPTX.cpp#L163-L188) |
| 7.0     | [7.0-9.2](https://github.com/llvm-mirror/clang/blob/release_70/include/clang/Basic/Cuda.h) | [(2.0)3.0-7.2](https://github.com/llvm-mirror/clang/blob/release_70/lib/Basic/Targets/NVPTX.cpp#L196-L223) |
| 8.0     | [7.0-10.0](https://github.com/llvm-mirror/clang/blob/release_80/include/clang/Basic/Cuda.h) | [(2.0)3.0-7.5](https://github.com/llvm-mirror/clang/blob/release_80/lib/Basic/Targets/NVPTX.cpp#L199-L228) |
| 9.0     | [7.0-10.1](https://github.com/llvm-mirror/clang/blob/release_90/include/clang/Basic/Cuda.h) | [(2.0)3.0-7.5](https://github.com/llvm-mirror/clang/blob/release_90/lib/Basic/Targets/NVPTX.cpp#L204-L233) |
| 10.0    | [7.0-10.1](https://github.com/llvm/llvm-project/blob/llvmorg-10.0.0/clang/include/clang/Basic/Cuda.h)               | [(2.0)3.0-7.5](https://github.com/llvm/llvm-project/blob/llvmorg-10.0.0/clang/lib/Basic/Targets/NVPTX.cpp#L204-L233)  |
| 11.0    | [7.0-11.0](https://github.com/llvm/llvm-project/blob/llvmorg-11.0.0/clang/include/clang/Basic/Cuda.h)               | [(2.0)3.0-8.0](https://github.com/llvm/llvm-project/blob/llvmorg-11.0.0/clang/lib/Basic/Targets/NVPTX.cpp#L209-L240)  |
| 12.0rc5 | [7.0-11.0](https://github.com/llvm/llvm-project/blob/llvmorg-12.0.0-rc5/clang/include/clang/Basic/Cuda.h#L22-L33)               | [(2.0)3.0-8.0](https://github.com/llvm/llvm-project/blob/llvmorg-12.0.0-rc5/clang/lib/Basic/Targets/NVPTX.cpp#L217-L248)  |
| main    | [7.0-11.2](https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Basic/Cuda.h) | [(2.0)3.0-8.6](https://github.com/llvm/llvm-project/blob/main/clang/lib/Basic/Targets/NVPTX.cpp#L220-L253) |

https://llvm.org/docs/CompileCudaWithLLVM.html
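Code that must build with both nvcc and `clang++ -x cuda` can tell them apart with predefined macros: clang defines `__CUDA__` when compiling CUDA, nvcc defines `__NVCC__`, and both define `__CUDA_ARCH__` only in device passes. A minimal sketch; `cuda_frontend` is a hypothetical name:

```cpp
// Report which compiler and which compilation pass is processing this file.
constexpr const char* cuda_frontend() {
#if defined(__clang__) && defined(__CUDA__)
#  if defined(__CUDA_ARCH__)
    return "clang (device pass)";
#  else
    return "clang (host pass)";
#  endif
#elif defined(__NVCC__)
#  if defined(__CUDA_ARCH__)
    return "nvcc (device pass)";
#  else
    return "nvcc (host pass)";
#  endif
#else
    return "plain host compiler";
#endif
}
```

With clang, such a file is typically built along the lines of `clang++ axpy.cu -o axpy --cuda-gpu-arch=sm_70 -L$CUDA_ROOT/lib64 -lcudart_static -ldl -lrt -pthread`, adding `--cuda-path=...` for a toolkit in a non-default location.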

# Device-Side C++ Standard Support

C++ core language features:

| compiler      | supported C++ standard | notes             |
| ------------- | ---------------------- | ----------------- |
| nvcc <=6.0    | c++03                  |                   |
| nvcc 6.5      | c++03, experimental c++11 | undocumented   |
| nvcc 7.0-8.0  | c++03,11               | only c++11 switch |
| nvcc 9.0-10.2 | c++03,11,14            | 10.2 adds `libcu++` (atomics); open repository: https://github.com/NVIDIA/libcudacxx/releases |
| nvcc 11.0.167+| c++03,11,14,17         | C++11 host compiler needed for math libs; ships C++11-compatible backport of the [C++20 synchronization library](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1135r5.html); device LTO added; starting with CUDA Toolkit 11.0.1, `nvcc` and CUDA Toolkit version numbers are no longer identical |
| clang 5+      | c++03,11,14,17         |                   |
| clang 6+      | c++03,11,14,17,2a      |                   |
| clang 10+     | c++03,11,14,17,20      |                   |
| clang trunk   | c++03,11,14,17,20      | [status](https://clang.llvm.org/cxx_status.html) |
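The standard a translation unit is actually compiled with can be checked through `__cplusplus` (`199711L`, `201103L`, `201402L`, `201703L`, `202002L`), which follows the `-std=` flag given to nvcc or clang. A minimal sketch; `cxx_standard` and `this_tu_standard` are hypothetical names, draft modes such as `-std=c++2a` (which report intermediate values like `201707L`) round down to the previous standard here, and MSVC reports `199711L` unless `/Zc:__cplusplus` is passed:

```cpp
// Map a __cplusplus value to the two-digit year of the standard it denotes
// (3 stands for C++98/03). Draft values round down to the previous standard.
constexpr int cxx_standard(long value) {
    return value >= 202002L ? 20 :
           value >= 201703L ? 17 :
           value >= 201402L ? 14 :
           value >= 201103L ? 11 :
                              3;
}

// Standard of the current compilation pass.
constexpr int this_tu_standard = cxx_standard(__cplusplus);
```

A line such as `static_assert(this_tu_standard >= 14, "needs C++14");` then turns a silent dialect mismatch into a clear compile error.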

CUDA-enabled C++ standard library [`libcu++`](https://github.com/NVIDIA/libcudacxx), based on LLVM's `libc++` ([docs](https://nvidia.github.io/libcudacxx/)):

| release       | introduced components                | notes                     |
| ------------- | ------------------------------------ | ------------------------- |
| CUDA 10.2     | `<atomic>` (SM6.0+), `<type_traits>` | introduction of `libcu++` |
| CUDA 11.0     | `atomic<T>::wait/notify`, `<barrier>`, `<latch>`, `<counting_semaphore>` (SM7.0+), `<chrono>`, `<ratio>`, `<functional>` without `function` | anticipated in the [SC19 talk](https://on-demand.gputechconf.com/supercomputing/2019/video/sc1942-the-cuda-c++-standard-library/) |
| CUDA 11.2     | `cuda::std::tuple`,`pair` | [notes](https://github.com/NVIDIA/libcudacxx/releases/tag/1.3.0) |
| CUDA next     | `cuda::std::complex`, backports: `chrono`, `type_traits` | [notes](https://github.com/NVIDIA/libcudacxx/releases/tag/1.4.0) |
| newer         | see the [release notes](https://github.com/NVIDIA/libcudacxx/releases) and [api docs](https://nvidia.github.io/libcudacxx/api.html) | all open source now |
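libcu++ lives in the `cuda::std::` namespace under `<cuda/std/...>` headers, next to (not replacing) the host `std::`. A minimal sketch that uses `cuda::std::atomic` when the header is reachable and falls back to `std::atomic` otherwise; `MY_HAVE_LIBCUDACXX`, `my_std` and `count_above` are hypothetical names:

```cpp
// Prefer libcu++ (CUDA 10.2+, on the include path via -I$CUDA_ROOT/include),
// otherwise fall back to the host standard library.
#if defined(__has_include)
#  if __has_include(<cuda/std/atomic>)
#    include <cuda/std/atomic>
#    define MY_HAVE_LIBCUDACXX 1
#  endif
#endif
#if !defined(MY_HAVE_LIBCUDACXX)
#  include <atomic>
#endif

// Hypothetical alias namespace so the rest of the code names one atomic type.
namespace my_std {
#if defined(MY_HAVE_LIBCUDACXX)
using cuda::std::atomic;
#else
using std::atomic;
#endif
}

// Count entries above a threshold with an atomic counter. This host version
// is sequential; the same counter type is what device code would share
// (device-side <atomic> needs SM 6.0+ according to the table).
inline int count_above(const int* data, int n, int threshold) {
    my_std::atomic<int> counter{0};
    for (int i = 0; i < n; ++i) {
        if (data[i] > threshold) {
            counter.fetch_add(1);
        }
    }
    return counter.load();
}
```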

[Incremental `libcu++` release goals (GTC 2020):](https://developer.download.nvidia.com/video/gputechconf/gtc/2020/presentations/cwe21284.pdf)
- Version 1 (CUDA 10.2): `<atomic>` (SM6.0+), `<type_traits>`.
- Version 2 (CUDA next): `atomic<T>::wait/notify`, `<barrier>`, `<latch>`, `<counting_semaphore>` (SM7.0+), `<chrono>`, `<ratio>`, `<functional>` minus `function`.
- Future priorities: `atomic_ref<T>`, `<complex>`, `<tuple>`, `<array>`, `<utility>`, `<cmath>`, string processing, ...

## NVC++

NVC++ is a unified C++ compiler and GPU-accelerated STL for the CUDA platform.
It also seems to support [OpenACC](https://twitter.com/matcolgrove/status/1263531645312745473).
NVC++ does not [currently](https://twitter.com/blelbach/status/1261455345353809920) support the CUDA C++ language.

| compiler      | supported C++ standard | notes             |
| ------------- | ---------------------- | ----------------- |
| nvc++  11.0   | ...,c++17              | initial release, ships C++11-compatible backport of the [C++20 synchronization library](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1135r5.html) |
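The "GPU-accelerated STL" refers to C++17 parallel algorithms: with `nvc++ -stdpar`, algorithm calls that take an execution policy such as `std::execution::par_unseq` can be offloaded to the GPU, while other compilers run the same code on the CPU. A minimal SAXPY sketch; `saxpy` is a hypothetical name, and GCC's `<execution>` may need TBB installed for the parallel policies:

```cpp
#include <algorithm>
#include <execution>
#include <vector>

// SAXPY (y = a * x + y) as a C++17 parallel algorithm. Assumption: nvc++'s
// managed-memory stdpar mode only offloads heap data, hence std::vector.
inline void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    std::transform(std::execution::par_unseq,
                   x.begin(), x.end(),   // first input range
                   y.begin(),            // second input range
                   y.begin(),            // output, written in place
                   [a](float xi, float yi) { return a * xi + yi; });
}
```

The same source compiled with `g++ -std=c++17` runs on the host, which makes it easy to compare results before offloading.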

All GPU compilers are [cheese](https://twitter.com/blelbach/status/1261247268713160704).