- OS - High Sierra 10.13
- TensorFlow - 1.4
- Xcode Command Line Tools - 7.2 (download from Xcode - Support - Apple Developer)
  - Switch to this clang version: sudo xcode-select --switch /Library/Developer/CommandLineTools
  - Check the version: clang -v
- CMake - 3.7
- Bazel - 0.7.0
- CUDA - 9
- cuDNN - 7
- sudo pip install six numpy wheel
- brew install coreutils
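To confirm that the installed toolchain matches the versions listed above, a small Python sketch like the following may help. It only covers CMake, Bazel and CUDA (nvcc). The regular expressions assume the usual output formats of cmake --version, bazel version and nvcc --version, and the script assumes those binaries are on your PATH. CHECKS, parse_version, matches and run are hypothetical helper names used only for this sketch. It avoids f-strings so it also runs under the system Python 2.7.

```python
from __future__ import print_function

import re
import subprocess

# (tool, command, regex capturing the version, expected version prefix from the list above).
# The output formats these regexes expect are assumptions of this sketch.
CHECKS = [
    ("cmake", ["cmake", "--version"], r"cmake version (\d+(?:\.\d+)*)", "3.7"),
    ("bazel", ["bazel", "version"], r"Build label: (\d+(?:\.\d+)*)", "0.7.0"),
    ("nvcc", ["nvcc", "--version"], r"release (\d+(?:\.\d+)*)", "9"),
]


def parse_version(output, pattern):
    """Return the first version string captured by pattern, or None if absent."""
    match = re.search(pattern, output)
    return match.group(1) if match else None


def matches(found, expected):
    """True if found begins with expected's dotted components (3.7.2 matches 3.7)."""
    want = expected.split(".")
    return found is not None and found.split(".")[:len(want)] == want


def run(cmd):
    """Run cmd and return its combined stdout/stderr as text ("" if not installed)."""
    try:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    except OSError:  # binary missing or not on PATH
        return ""
    out, _ = proc.communicate()
    return out.decode("utf-8", "replace")


if __name__ == "__main__":
    for tool, cmd, pattern, expected in CHECKS:
        found = parse_version(run(cmd), pattern)
        status = "ok" if matches(found, expected) else "MISMATCH"
        print("%s: expected %s, found %s [%s]" % (tool, expected, found, status))
```

A prefix match is used so that a patch release such as CMake 3.7.2 still counts as 3.7. The Xcode Command Line Tools and cuDNN are left out because their versions are not reported in a simple, matching form.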
Remove __align__(sizeof(T)) from the extern __shared__ declarations in these files:
- tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
- tensorflow/core/kernels/split_lib_gpu.cu.cc
- tensorflow/core/kernels/concat_lib_gpu.impl.cu.cc
For example, change:

    extern __shared__ __align__(sizeof(T)) unsigned char smem[];

to:

    extern __shared__ unsigned char smem[];
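Editing the three files by hand works, but a small Python sketch like this can apply the same change from the TensorFlow source root. It assumes each extern __shared__ declaration sits on a single line, and it deliberately touches only lines containing extern __shared__, leaving any other use of __align__ alone. strip_align and patch_tree are hypothetical helper names, not part of TensorFlow.

```python
from __future__ import print_function

import io
import os
import re
import sys

# The three kernel files listed above, relative to the TensorFlow source root.
FILES = [
    "tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc",
    "tensorflow/core/kernels/split_lib_gpu.cu.cc",
    "tensorflow/core/kernels/concat_lib_gpu.impl.cu.cc",
]

# The alignment attribute plus any spaces/tabs after it (never a newline).
ALIGN_RE = re.compile(r"__align__\(sizeof\(T\)\)[ \t]*")


def strip_align(source):
    """Drop __align__(sizeof(T)) from extern __shared__ lines; keep everything else."""
    out = []
    for line in source.splitlines(True):  # True keeps the line endings
        if "extern __shared__" in line:
            line = ALIGN_RE.sub("", line)
        out.append(line)
    return "".join(out)


def patch_tree(root):
    """Rewrite each listed file under root in place if it needs the change."""
    for rel in FILES:
        path = os.path.join(root, rel)
        with io.open(path, encoding="utf-8") as f:
            original = f.read()
        patched = strip_align(original)
        if patched != original:
            with io.open(path, "w", encoding="utf-8") as f:
                f.write(patched)
            print("patched", rel)
        else:
            print("unchanged", rel)


if __name__ == "__main__":
    # Pass the TensorFlow source root, or run from inside it.
    patch_tree(sys.argv[1] if len(sys.argv) > 1 else ".")
```

Save it under any name (for example strip_align.py) and run python strip_align.py /path/to/tensorflow. Running it a second time is harmless, because already-patched lines no longer contain the attribute.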