Forked from Willian-Zhang/tensorflow_1_8_high_sierra_gpu.md
Created August 3, 2021 18:31
Revisions
Willian-Zhang revised this gist
May 11, 2018 . 1 changed file with 1 addition and 1 deletion.

@@ -1,4 +1,4 @@
# Tensorflow 1.8 with CUDA on macOS High Sierra 10.13.4

Largely based on the [Tensorflow 1.6 gist](https://gist.github.com/mattiasarro/1f3498a26ad111a8d99199eaf64551be), and [Tensorflow 1.7 gist for xcode](https://gist.github.com/pavelmalik/d51036d508c8753c86aed1f3ff1e6967)
        
Willian-Zhang revised this gist
May 11, 2018 . 1 changed file with 33 additions and 33 deletions.

@@ -1,7 +1,8 @@
# Tensorflow 1.8 with CUDA on macOS High Sierra 10.13.4 for eGPU

Largely based on the [Tensorflow 1.6 gist](https://gist.github.com/mattiasarro/1f3498a26ad111a8d99199eaf64551be), and [Tensorflow 1.7 gist for xcode](https://gist.github.com/pavelmalik/d51036d508c8753c86aed1f3ff1e6967) and [Tensorflow 1.7 gist for eGPU](https://gist.github.com/Willian-Zhang/088e017774536880bd425178b46b8c17), this should hopefully simplify things a bit.

## Requirements

@@ -196,11 +197,11 @@ git checkout v1.8.0
```

#### Apply Patch
Apply the following [patch](https://gist.github.com/Willian-Zhang/a3bd10da2d8b343875f3862b2a62eb3b#file-xtensorflow18macos-patch) to fix a couple of build issues:

``` bash
wget https://gist.github.com/Willian-Zhang/a3bd10da2d8b343875f3862b2a62eb3b/raw/xtensorflow18macos.patch
git apply xtensorflow18macos.patch
```

@@ -362,64 +363,63 @@ wget https://gist.github.com/Willian-Zhang/290dceb96679c8f413e42491c9
python mnist_cnn.py
```
```
/usr/local/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
2018-05-11 04:51:10.335377: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] OS X does not support NUMA - returning NUMA node zero
2018-05-11 04:51:10.336052: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.645
pciBusID: 0000:c4:00.0
totalMemory: 11.00GiB freeMemory: 9.37GiB
2018-05-11 04:51:10.336075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-11 04:51:11.063831: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-11 04:51:11.063856: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-05-11 04:51:11.063864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-05-11 04:51:11.064768: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9065 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:c4:00.0, compute capability: 6.1)
2018-05-11 04:51:11.534095: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
2018-05-11 04:51:11.579370: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
2018-05-11 04:51:11.644835: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
59264/60000 [============================>.] - ETA: 0s - loss: 0.2604 - acc: 0.9208
2018-05-11 04:51:19.228205: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
60000/60000 [==============================] - 10s 159us/step - loss: 0.2588 - acc: 0.9213 - val_loss: 0.0561 - val_acc: 0.9829
Epoch 2/12
60000/60000 [==============================] - 4s 66us/step - loss: 0.0875 - acc: 0.9742 - val_loss: 0.0427 - val_acc: 0.9857
Epoch 3/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0662 - acc: 0.9803 - val_loss: 0.0356 - val_acc: 0.9875
Epoch 4/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0549 - acc: 0.9839 - val_loss: 0.0325 - val_acc: 0.9896
Epoch 5/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0471 - acc: 0.9859 - val_loss: 0.0309 - val_acc: 0.9901
Epoch 6/12
60000/60000 [==============================] - 4s 68us/step - loss: 0.0421 - acc: 0.9873 - val_loss: 0.0297 - val_acc: 0.9903
Epoch 7/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0377 - acc: 0.9884 - val_loss: 0.0259 - val_acc: 0.9908
Epoch 8/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0357 - acc: 0.9883 - val_loss: 0.0285 - val_acc: 0.9908
Epoch 9/12
60000/60000 [==============================] - 4s 68us/step - loss: 0.0315 - acc: 0.9904 - val_loss: 0.0327 - val_acc: 0.9901
Epoch 10/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0288 - acc: 0.9910 - val_loss: 0.0272 - val_acc: 0.9911
Epoch 11/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0282 - acc: 0.9912 - val_loss: 0.0248 - val_acc: 0.9920
Epoch 12/12
60000/60000 [==============================] - 4s 66us/step - loss: 0.0255 - acc: 0.9923 - val_loss: 0.0283 - val_acc: 0.9912
Test loss: 0.028254894825743667
Test accuracy: 0.9912
```

You can use [cuda-smi](https://github.com/phvu/cuda-smi) to watch the GPU memory usage.

In the case of the Keras MNIST example, you should see the free memory drop to maybe 2% and the fans spin up. I am not sure what the `grappler/clusters/utils.cc:127` warning means, however.

```
$ cuda-smi
Device 0 [PCIe 0:196:0.0]: GeForce GTX 1080 Ti (CC 6.1): 10350 of 11264 MB (i.e. 91.9%) Free
# when GPU
$ cuda-smi
Device 0 [PCIe 0:196:0.0]: GeForce GTX 1080 Ti (CC 6.1): 1181.1 of 11264 MB (i.e. 10.5%) Free
```
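To track the memory drop described above without reading the numbers by eye, the `cuda-smi` status line can be parsed. The following is a minimal sketch only: the regular expression is an assumption derived from the two sample lines shown in this gist, not from any documented `cuda-smi` output format, and the helper name `parse_cuda_smi_line` is hypothetical.

```python
import re

# Assumed line layout, taken only from the sample output in this gist:
#   Device 0 [PCIe 0:196:0.0]: GeForce GTX 1080 Ti (CC 6.1): 1181.1 of 11264 MB (i.e. 10.5%) Free
_STATUS = re.compile(
    r"Device (?P<index>\d+) \[.*?\]: (?P<name>.+?) \(CC (?P<cc>[\d.]+)\): "
    r"(?P<free>[\d.]+) of (?P<total>[\d.]+) MB"
)


def parse_cuda_smi_line(line):
    """Return a dict describing one cuda-smi status line, or None if it does not match."""
    match = _STATUS.search(line)
    if match is None:
        return None
    free_mb = float(match.group("free"))
    total_mb = float(match.group("total"))
    return {
        "device": int(match.group("index")),
        "name": match.group("name"),
        "compute_capability": match.group("cc"),
        "free_mb": free_mb,
        "total_mb": total_mb,
        # Share of memory in use, rounded to one decimal place.
        "used_pct": round(100.0 * (1.0 - free_mb / total_mb), 1),
    }


if __name__ == "__main__":
    sample = ("Device 0 [PCIe 0:196:0.0]: GeForce GTX 1080 Ti (CC 6.1): "
              "1181.1 of 11264 MB (i.e. 10.5%) Free")
    print(parse_cuda_smi_line(sample))
```

Feeding the function the two sample lines gives a quick before/after comparison of `used_pct` while the MNIST script trains.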
        
Willian-Zhang revised this gist
May 11, 2018 . 2 changed files with 157 additions and 43 deletions.

@@ -28,9 +28,9 @@ The remaining steps are the same as for a normal GPU setup.
#### Check and use pre-compilation (Optional, Risky, Please Skip if you don't understand)
If, like me, you are using a MacBook Pro (15-inch, 2016) running 10.13.4 (17E199) with an eGPU, an NVIDIA GeForce GTX 1080 Ti 11 GiB (or any 6.1-compatible card listed on the [nvidia page](https://developer.nvidia.com/cuda-gpus)), you could, at your own risk, skip the `Prepare` and `Compile` steps below, [download .whl from here](https://github.com/Willian-Zhang/tensorflow-precompile/raw/r1.8/tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl) and install it:
``` bash
pip install tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl
```
And be sure to test after installation.

@@ -39,8 +39,8 @@ But remember this is **not safe**.

#### Install Homebrew (Optional)
For package management. Ignore this step if you have your own `python` and `wget`, or if you want to download everything manually.
``` bash
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
brew install wget
```

#### NVIDIA Graphics driver

@@ -63,14 +63,14 @@ Unarchive and rename `XCode.app` to `Xcode8.2.app` in case you want to build and

#### Install Bazel
If you have Homebrew installed
``` bash
brew install bazel
```
or download the binary [here](https://github.com/bazelbuild/bazel/releases/download/0.10.0/bazel-0.10.0-installer-darwin-x86_64.sh)
```bash
chmod 755 bazel-0.10.0-installer-darwin-x86_64.sh
./bazel-0.10.0-installer-darwin-x86_64.sh
```

@@ -82,7 +82,15 @@ It should be something along the lines of cuda_9.1.128_mac.dmg

#### Install NCCL
Download `NCCL 2.1.15 O/S agnostic and CUDA 9` from [NVIDIA](https://developer.nvidia.com/nccl/nccl-download).

Unarchive it and move it to a permanent place, e.g. `/usr/local/nccl`.

``` bash
sudo mkdir -p /usr/local/nccl
cd nccl_2.1.15-1+cuda9.1_x86_64
sudo mv * /usr/local/nccl
sudo mkdir -p /usr/local/include/third_party/nccl
sudo ln -s /usr/local/nccl/include/nccl.h /usr/local/include/third_party/nccl
```

#### Set up your env paths

@@ -98,12 +106,13 @@ export PATH=$DYLD_LIBRARY_PATH:$PATH:/Developer/NVIDIA/CUDA-9.1/bin

#### Compile Samples
We want to compile a CUDA sample to check that the GPU is correctly recognized and supported.
``` bash
cd /Developer/NVIDIA/CUDA-9.1/samples
chown -R $(whoami) *
make -C 1_Utilities/deviceQuery
./bin/x86_64/darwin/release/deviceQuery
```
```
CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

@@ -153,10 +162,10 @@ Download [cuDNN 7.0.5](https://developer.nvidia.com/compute/machine-learning/cud

Change into your download directory and follow the post installation steps.
``` bash
tar -xzvf cudnn-9.1-osx-x64-v7-ga.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib/libcudnn* /usr/local/cuda/lib
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib/libcudnn*
```

@@ -171,39 +180,40 @@ $ which pip
Or download [get-pip](https://bootstrap.pypa.io/get-pip.py) and run it in python. More info [here](https://pip.pypa.io/en/stable/installing/)
``` bash
python get-pip.py
```
pip will automatically install the tensorflow dependencies (wheel, six etc). If it doesn't, you can install them manually.

## Compile

#### Clone TensorFlow from Repository
``` bash
cd /tmp
git clone https://github.com/tensorflow/tensorflow
cd tensorflow
git checkout v1.8.0
```

#### Apply Patch
Apply the following [patch](https://gist.github.com/Willian-Zhang/a3bd10da2d8b343875f3862b2a62eb3b#file-xtensorflow18macos.patch) to fix a couple of build issues:

``` bash
wget https://gist.github.com/Willian-Zhang/a3bd10da2d8b343875f3862b2a62eb3b/raw/xtensorflow18macos.patch
git apply xtensorflow17macos.patch
```

#### Configure Build
Except for *CUDA support*, *CUDA SDK version* and *Cuda compute capabilities*, I left the other settings untouched.

Pay attention to `Cuda compute capabilities`; you might want to look up your own according to the guide.

``` bash
./configure
```
```
You have bazel 0.10.0 installed.
Please specify the location of python. [Default is /usr/bin/python]:

@@ -282,26 +292,27 @@ Configuration finished
Takes about 47 minutes on my machine.

``` bash
bazel clean
bazel build --config=cuda --config=opt --action_env PATH --action_env LD_LIBRARY_PATH --action_env DYLD_LIBRARY_PATH //tensorflow/tools/pip_package:build_pip_package
```

#### Create wheel file and install it
``` bash
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
ls /tmp/tensorflow_pkg
tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl
```

If you want to use virtualenv or something, now is the time.
Or just:
``` bash
pip install /tmp/tensorflow_pkg/tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl
```

#### Backup your wheel if nothing goes wrong (Optional)
Files in `/tmp` are cleaned after a reboot.
``` bash
cp /tmp/tensorflow_pkg/*.whl ~/
```

@@ -310,8 +321,10 @@ It's useful to leave the .whl file lying around in case you want to install it f
#### Test Installation
See if everything got linked correctly
``` bash
cd ~
python
```
``` python
>>> import tensorflow as tf
>>> tf.Session()
2018-04-08 03:25:15.740635: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] OS X does not support NUMA - returning NUMA node zero

@@ -329,7 +342,7 @@ totalMemory: 11.00GiB freeMemory: 10.18GiB
##### Try out new Tensorflow feature (Optional)
``` bash
python
```
``` python
import tensorflow as tf

@@ -343,10 +356,12 @@ print("hello, {}".format(m)) # => "hello, [[4.]]"
#### Test GPU Acceleration
```bash
pip install keras
wget https://gist.github.com/Willian-Zhang/290dceb96679c8f413e42491c92722b0/raw/mnist-cnn.py
python mnist_cnn.py
```
```
/usr/local/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
@@ -0,0 +1,99 @@
diff --git a/tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc b/tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc
index 0f7adaf24a..934ccbada6 100644
--- a/tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc
+++ b/tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc
@@ -69,7 +69,7 @@ __global__ void concat_variable_kernel(
   IntType num_inputs = input_ptr_data.size;
 
   // verbose declaration needed due to template
-  extern __shared__ __align__(sizeof(T)) unsigned char smem[];
+  extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char smem[];
   IntType* smem_col_scan = reinterpret_cast<IntType*>(smem);
 
   if (useSmem) {
diff --git a/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc b/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
index 94989089ec..1d26d4bacb 100644
--- a/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
+++ b/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
@@ -172,7 +172,7 @@ __global__ __launch_bounds__(1024, 2) void DepthwiseConv2dGPUKernelNHWCSmall(
     const DepthwiseArgs args, const T* input, const T* filter, T* output) {
   assert(CanLaunchDepthwiseConv2dGPUSmall(args));
   // Holds block plus halo and filter data for blockDim.x depths.
-  extern __shared__ __align__(sizeof(T)) unsigned char shared_memory[];
+  extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char shared_memory[];
   T* const shared_data = reinterpret_cast<T*>(shared_memory);
 
   const int num_batches = args.batch;
@@ -452,7 +452,7 @@ __global__ __launch_bounds__(1024, 2) void DepthwiseConv2dGPUKernelNCHWSmall(
     const DepthwiseArgs args, const T* input, const T* filter, T* output) {
   assert(CanLaunchDepthwiseConv2dGPUSmall(args));
   // Holds block plus halo and filter data for blockDim.z depths.
-  extern __shared__ __align__(sizeof(T)) unsigned char shared_memory[];
+  extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char shared_memory[];
   T* const shared_data = reinterpret_cast<T*>(shared_memory);
 
   const int num_batches = args.batch;
@@ -1118,7 +1118,7 @@ __launch_bounds__(1024, 2) void DepthwiseConv2dBackpropFilterGPUKernelNHWCSmall(
     const DepthwiseArgs args, const T* output, const T* input, T* filter) {
   assert(CanLaunchDepthwiseConv2dBackpropFilterGPUSmall(args, blockDim.z));
   // Holds block plus halo and filter data for blockDim.x depths.
-  extern __shared__ __align__(sizeof(T)) unsigned char shared_memory[];
+  extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char shared_memory[];
   T* const shared_data = reinterpret_cast<T*>(shared_memory);
 
   const int num_batches = args.batch;
@@ -1388,7 +1388,7 @@ __launch_bounds__(1024, 2) void DepthwiseConv2dBackpropFilterGPUKernelNCHWSmall(
     const DepthwiseArgs args, const T* output, const T* input, T* filter) {
   assert(CanLaunchDepthwiseConv2dBackpropFilterGPUSmall(args, blockDim.x));
   // Holds block plus halo and filter data for blockDim.z depths.
-  extern __shared__ __align__(sizeof(T)) unsigned char shared_memory[];
+  extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char shared_memory[];
   T* const shared_data = reinterpret_cast<T*>(shared_memory);
 
   const int num_batches = args.batch;
diff --git a/tensorflow/core/kernels/split_lib_gpu.cu.cc b/tensorflow/core/kernels/split_lib_gpu.cu.cc
index 393818730b..58a1294005 100644
--- a/tensorflow/core/kernels/split_lib_gpu.cu.cc
+++ b/tensorflow/core/kernels/split_lib_gpu.cu.cc
@@ -121,7 +121,7 @@ __global__ void split_v_kernel(const T* input_ptr,
   int num_outputs = output_ptr_data.size;
 
   // verbose declaration needed due to template
-  extern __shared__ __align__(sizeof(T)) unsigned char smem[];
+  extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char smem[];
   IntType* smem_col_scan = reinterpret_cast<IntType*>(smem);
 
   if (useSmem) {
diff --git a/tensorflow/workspace.bzl b/tensorflow/workspace.bzl
index 0ce5cda517..d4dc2235ac 100644
--- a/tensorflow/workspace.bzl
+++ b/tensorflow/workspace.bzl
@@ -361,11 +361,11 @@ def tf_workspace(path_prefix="", tf_repo_name=""):
   tf_http_archive(
       name = "protobuf_archive",
       urls = [
-          "https://mirror.bazel.build/github.com/google/protobuf/archive/396336eb961b75f03b25824fe86cf6490fb75e3a.tar.gz",
-          "https://github.com/google/protobuf/archive/396336eb961b75f03b25824fe86cf6490fb75e3a.tar.gz",
+          "https://mirror.bazel.build/github.com/dtrebbien/protobuf/archive/50f552646ba1de79e07562b41f3999fe036b4fd0.tar.gz",
+          "https://github.com/dtrebbien/protobuf/archive/50f552646ba1de79e07562b41f3999fe036b4fd0.tar.gz",
       ],
-      sha256 = "846d907acf472ae233ec0882ef3a2d24edbbe834b80c305e867ac65a1f2c59e3",
-      strip_prefix = "protobuf-396336eb961b75f03b25824fe86cf6490fb75e3a",
+      sha256 = "eb16b33431b91fe8cee479575cee8de202f3626aaf00d9bf1783c6e62b4ffbc7",
+      strip_prefix = "protobuf-50f552646ba1de79e07562b41f3999fe036b4fd0",
   )
 
   # We need to import the protobuf library under the names com_google_protobuf
diff --git a/third_party/gpus/cuda/BUILD.tpl b/third_party/gpus/cuda/BUILD.tpl
index 2a37c65bc7..43446dd99b 100644
--- a/third_party/gpus/cuda/BUILD.tpl
+++ b/third_party/gpus/cuda/BUILD.tpl
@@ -110,7 +110,7 @@ cc_library(
         ".",
         "cuda/include",
     ],
-    linkopts = ["-lgomp"],
+    #linkopts = ["-lgomp"],
     linkstatic = 1,
     visibility = ["//visibility:public"],
 )
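Every kernel hunk in the patch makes the same substitution: `__align__(sizeof(T))` becomes `__align__(sizeof(T) > 16 ? sizeof(T) : 16)`, so the shared-memory buffer is declared with an alignment of at least 16 bytes. As a rough illustration, the expression can be mirrored in Python. The function name `patched_alignment` is hypothetical, and the byte sizes in the example are the usual sizes of those C types, assumed here rather than taken from the patch.

```python
def patched_alignment(sizeof_t):
    """Mirror the patched C++ expression: sizeof(T) > 16 ? sizeof(T) : 16."""
    return sizeof_t if sizeof_t > 16 else 16


if __name__ == "__main__":
    # Assumed element sizes in bytes, for illustration only.
    for type_name, size in [("float", 4), ("double", 8), ("32-byte struct", 32)]:
        print("{}: before patch {}, after patch {}".format(
            type_name, size, patched_alignment(size)))
```

For small element types the requested alignment is raised to 16, while types larger than 16 bytes keep their own size, exactly as before the patch.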
        
Willian-Zhang revised this gist
May 10, 2018 . 1 changed file with 5 additions and 5 deletions.

@@ -1,4 +1,4 @@
# Tensorflow 1.8 with CUDA on macOS High Sierra 10.13.4 for eGPU

Largely based on the [Tensorflow 1.6 gist](https://gist.github.com/mattiasarro/1f3498a26ad111a8d99199eaf64551be), and [Tensorflow 1.7 gist for xcode](https://gist.github.com/pavelmalik/d51036d508c8753c86aed1f3ff1e6967), this should hopefully simplify things a bit.

@@ -43,7 +43,6 @@ $ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/inst
$ brew install wget
```

#### NVIDIA Graphics driver
Download and install from http://www.nvidia.com/download/driverResults.aspx/130460/en-us

@@ -61,7 +60,7 @@ Or Find `XCode 8.2` on https://developer.apple.com/download/more/
Unarchive and rename `XCode.app` to `Xcode8.2.app` in case you want to build and use it next time.

#### Install Bazel
If you have Homebrew installed
```

@@ -75,14 +74,15 @@ $ ./bazel-0.10.0-installer-darwin-x86_64.sh
```

#### Install CUDA Toolkit 9.1
[Download CUDA-9.1](https://developer.nvidia.com/cuda-downloads?target_os=MacOSX&target_arch=x86_64&target_version=1013&target_type=dmglocal)

It should be something along the lines of cuda_9.1.128_mac.dmg

#### Install NCCL
Download `NCCL 2.1.15 O/S agnostic and CUDA 9` from [NVIDIA](https://developer.nvidia.com/nccl/nccl-download).

Unarchive it and move the corresponding files to `/usr/local/cuda`.

#### Set up your env paths
        
Willian-Zhang created this gist
May 10, 2018 .

@@ -0,0 +1,411 @@
# Tensorflow 1.7 with CUDA on macOS High Sierra 10.13.4 for eGPU

Largely based on the [Tensorflow 1.6 gist](https://gist.github.com/mattiasarro/1f3498a26ad111a8d99199eaf64551be), and [Tensorflow 1.7 gist for xcode](https://gist.github.com/pavelmalik/d51036d508c8753c86aed1f3ff1e6967), this should hopefully simplify things a bit.

## Requirements

* NVIDIA Web-Drivers 387.10.10.10.30.106 for 10.13.4 (17E199) __(w/o Security Update)__
* CUDA-Drivers 387.128
* CUDA 9.1 Toolkit
* cuDNN 7.0.5 __(latest for macOS)__
* NCCL 2.1.15 __(latest for macOS)__
* Python 2.7
* XCode 8.2
* bazel stable 0.13.0 __(latest on HomeBrew)__
* Tensorflow 1.8 Source Code

## eGPU Only
#### Check out eGPU setup before install (required for eGPU, ignore if other)
If you don't know how to set up an eGPU on a Mac, check out [these steps](https://egpu.io/forums/mac-setup/script-enable-egpu-on-tb1-2-macs-on-macos-10-13-4/paged/6/#post-33535).

Make sure you have the eGPU working before installation. (You should see your specific graphics card name in Apple > About this Mac > System Report ... > Graphics/Displays)

The remaining steps are the same as for a normal GPU setup.

## Prepare
#### Check and use pre-compilation (Optional, Risky, Please Skip if you don't understand)
If, like me, you are using a MacBook Pro (15-inch, 2016) running 10.13.4 (17E199) with an eGPU, an NVIDIA GeForce GTX 1080 Ti 11 GiB (or any 6.1-compatible card listed on the [nvidia page](https://developer.nvidia.com/cuda-gpus)), you could, at your own risk, skip the `Prepare` and `Compile` steps below, [download .whl from here](https://github.com/Willian-Zhang/tensorflow-precompile/raw/r1.7/tensorflow-1.7.0-cp36-cp36m-macosx_10_7_x86_64.whl) and install it:
``` bash
$ pip install tensorflow-1.7.0-cp36-cp36m-macosx_10_7_x86_64.whl
```
And be sure to test after installation.

But remember this is **not safe**.

#### Install Homebrew (Optional)
For package management. Ignore this step if you have your own `python` and `wget`, or if you want to download everything manually.
``` bash
$ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
$ brew install wget
```

#### NVIDIA Graphics driver
Download and install from http://www.nvidia.com/download/driverResults.aspx/130460/en-us

#### NVIDIA Cuda driver
Download and install from http://www.nvidia.com/object/macosx-cuda-387.178-driver.html

#### Install XCode 8.2
Download from [XCode_8.2.xip](https://download.developer.apple.com/Developer_Tools/Xcode_8.2/Xcode_8.2.xip).

Or find `XCode 8.2` on https://developer.apple.com/download/more/

Unarchive and rename `XCode.app` to `Xcode8.2.app` in case you want to build and use it next time.

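Before continuing, it can help to see which of the command-line tools this guide relies on are already on `PATH`. The sketch below uses only the standard library. The tool list is an assumption assembled from commands that appear elsewhere in this gist, and the helper name `find_tools` is hypothetical. Tools installed in later steps (such as `bazel`) are expected to be reported as missing at this point.

```python
import shutil


def find_tools(names):
    """Map each tool name to its absolute path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Assumed list: commands invoked at some point in this guide.
    tools = ["python", "pip", "git", "wget", "brew", "bazel"]
    for tool, path in find_tools(tools).items():
        print("{:8s} {}".format(tool, path or "NOT FOUND"))
```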
#### Install Bazel 0.10 If you have Homebrew installed ``` $ brew install bazel ``` or Download the binary [here](https://github.com/bazelbuild/bazel/releases/download/0.10.0/bazel-0.10.0-installer-darwin-x86_64.sh) ``` $ chmod 755 bazel-0.10.0-installer-darwin-x86_64.sh $ ./bazel-0.10.0-installer-darwin-x86_64.sh ``` #### Install CUDA Toolkit 9.1 [Download CUDA-9.1](https://developer.nvidia.com/cuda-downloads?target_os=MacOSX&target_arch=x86_64&target_version=1013&target_type=dmglocal) It should be something along the lines of cuda_9.1.128_mac.dmg #### Set up your env paths Edit `~/.bash_profile` and add the following: ``` export CUDA_HOME=/usr/local/cuda export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:/usr/local/cuda/extras/CUPTI/lib export LD_LIBRARY_PATH=$DYLD_LIBRARY_PATH export PATH=$DYLD_LIBRARY_PATH:$PATH:/Developer/NVIDIA/CUDA-9.1/bin ``` #### Compile Samples We want to compile some CUDA sample to check if the GPU is correctly recognized and supported. ``` $ cd /Developer/NVIDIA/CUDA-9.1/samples $ chown -R $(whoami) * $ make -C 1_Utilities/deviceQuery $ ./bin/x86_64/darwin/release/deviceQuery CUDA Device Query (Runtime API) version (CUDART static linking) Detected 1 CUDA Capable device(s) Device 0: "GeForce GTX 1080 Ti" CUDA Driver Version / Runtime Version 9.1 / 9.1 CUDA Capability Major/Minor version number: 6.1 Total amount of global memory: 11264 MBytes (11810963456 bytes) (28) Multiprocessors, (128) CUDA Cores/MP: 3584 CUDA Cores GPU Max Clock rate: 1645 MHz (1.64 GHz) Memory Clock rate: 5505 Mhz Memory Bus Width: 352-bit L2 Cache Size: 2883584 bytes Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384) Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 49152 bytes Total number of registers available per block: 65536 Warp size: 32 
Maximum number of threads per multiprocessor: 2048 Maximum number of threads per block: 1024 Max dimension size of a thread block (x,y,z): (1024, 1024, 64) Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535) Maximum memory pitch: 2147483647 bytes Texture alignment: 512 bytes Concurrent copy and kernel execution: Yes with 2 copy engine(s) Run time limit on kernels: Yes Integrated GPU sharing Host Memory: No Support host page-locked memory mapping: Yes Alignment requirement for Surfaces: Yes Device has ECC support: Disabled Device supports Unified Addressing (UVA): Yes Supports Cooperative Kernel Launch: Yes Supports MultiDevice Co-op Kernel Launch: No Device PCI Domain ID / Bus ID / location ID: 0 / 196 / 0 Compute Mode: < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) > deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 9.1, NumDevs = 1 Result = PASS ``` #### NVIDIA cuDNN - Deep Learning Primitives If not already done, register at [https://developer.nvidia.com/cudnn](https://developer.nvidia.com/cudnn) Download [cuDNN 7.0.5](https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v7.0.5/prod/9.1_20171129/cudnn-9.1-osx-x64-v7-ga) Change into your download directory and follow the post installation steps. ``` bash $ tar -xzvf cudnn-9.1-osx-x64-v7-ga.tgz $ sudo cp cuda/include/cudnn.h /usr/local/cuda/include $ sudo cp cuda/lib/libcudnn* /usr/local/cuda/lib $ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib/libcudnn* ``` #### Install pip for python 2.7 (Optional) Skip if you have your own idea of which python/pip to use: ``` bash $ which python /usr/local/bin/python $ which pip /usr/local/bin/pip ``` Or Download [get-pip](https://bootstrap.pypa.io/get-pip.py) and run it in python. 
More info [here](https://pip.pypa.io/en/stable/installing/) ``` python get-pip.py ``` pip will automatically install the tensorflow dependencies (wheel, six etc), if don't you could install them manually. ## Compile #### Clone TensorFlow from Repository ``` $ cd /tmp $ git clone https://github.com/tensorflow/tensorflow $ cd tensorflow $ git checkout v1.7.0 ``` #### Apply Patch Apply the following [patch](https://gist.github.com/Willian-Zhang/088e017774536880bd425178b46b8c17#file-xtensorflow17macos-patch) to fix a couple build issues: ``` $ wget https://gist.github.com/Willian-Zhang/088e017774536880bd425178b46b8c17/raw/xtensorflow17macos.patch $ git apply xtensorflow17macos.patch ``` #### Configure Build Except *CUDA support*, *CUDA SDK version* and *Cuda compute capabilities*, I left the other settings untouched. Pay attension to `Cuda compute capabilities`, you might want to find your own according to guide. ``` bash $ ./configure You have bazel 0.10.0 installed. Please specify the location of python. [Default is /usr/bin/python]: Found possible Python library paths: /Library/Python/2.7/site-packages Please input the desired Python library path to use. Default is [/Library/Python/2.7/site-packages] Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: No Google Cloud Platform support will be enabled for TensorFlow. Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: No Hadoop File System support will be enabled for TensorFlow. Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: No Amazon S3 File System support will be enabled for TensorFlow. Do you wish to build TensorFlow with Apache Kafka Platform support? [y/N]: No Apache Kafka Platform support will be enabled for TensorFlow. Do you wish to build TensorFlow with XLA JIT support? [y/N]: No XLA JIT support will be enabled for TensorFlow. Do you wish to build TensorFlow with GDR support? [y/N]: No GDR support will be enabled for TensorFlow. 
Do you wish to build TensorFlow with VERBS support? [y/N]: No VERBS support will be enabled for TensorFlow. Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: No OpenCL SYCL support will be enabled for TensorFlow. Do you wish to build TensorFlow with CUDA support? [y/N]: y CUDA support will be enabled for TensorFlow. Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.1 Please specify the location where CUDA 9.1 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: Please specify a list of comma-separated Cuda compute capabilities you want to build with. You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus. Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2] (type your own, check on https://developer.nvidia.com/cuda-gpus, mine is 6.1 for GTX 1080 Ti) Do you want to use clang as CUDA compiler? [y/N]: nvcc will be used as CUDA compiler. Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: Do you wish to build TensorFlow with MPI support? [y/N]: No MPI support will be enabled for TensorFlow. Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: Not configuring the WORKSPACE for Android builds. Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details. --config=mkl # Build with MKL support. 
    --config=monolithic     # Config for mostly static monolithic build.
Configuration finished
```

#### Build Process
Takes about 47 minutes on my machine.
``` bash
$ bazel build --config=cuda --config=opt --action_env PATH --action_env LD_LIBRARY_PATH --action_env DYLD_LIBRARY_PATH //tensorflow/tools/pip_package:build_pip_package
```

#### Create wheel file and install it
``` bash
$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
$ ls /tmp/tensorflow_pkg
tensorflow-1.7.0-cp36-cp36m-macosx_10_7_x86_64.whl
```

The file name carries the version of the tag you built, so with `v1.8.0` checked out expect `1.8.0` where this listing shows `1.7.0`.

If you want to install into a virtualenv or something similar, now is the time to activate it. Or just:
``` bash
$ pip install /tmp/tensorflow_pkg/tensorflow-*.whl
```

#### Back up your wheel if nothing went wrong (Optional)
Files in `/tmp` are deleted on reboot.
```
cp /tmp/tensorflow_pkg/*.whl ~/
```

It's useful to keep the `.whl` file around in case you want to install it into another environment.

#### Test Installation
Check that everything got linked correctly:
``` bash
$ cd ~
$ python
>>> import tensorflow as tf
>>> tf.Session()
2018-04-08 03:25:15.740635: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] OS X does not support NUMA - returning NUMA node zero
2018-04-08 03:25:15.741260: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.645
pciBusID: 0000:c4:00.0
totalMemory: 11.00GiB freeMemory: 10.18GiB
2018-04-08 03:25:15.741288: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-08 03:25:16.157590: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-08 03:25:16.157614: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0
2018-04-08 03:25:16.157620: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N
2018-04-08 03:25:16.157753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9849 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:c4:00.0, compute capability: 6.1)
<tensorflow.python.client.session.Session object at 0x10968ef60>
```

##### Try out new Tensorflow feature (Optional)
``` bash
$ python
```
``` python
import tensorflow as tf
tf.enable_eager_execution()
tf.executing_eagerly()        # => True

x = [[2.]]
m = tf.matmul(x, x)
print("hello, {}".format(m))  # => "hello, [[4.]]"
```

#### Test GPU Acceleration
```
$ pip install keras
$ wget https://gist.github.com/Willian-Zhang/290dceb96679c8f413e42491c92722b0/raw/mnist-cnn.py
$ python mnist-cnn.py
/usr/local/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
2018-04-08 03:29:00.155517: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] OS X does not support NUMA - returning NUMA node zero
2018-04-08 03:29:00.155661: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.645
pciBusID: 0000:c4:00.0
totalMemory: 11.00GiB freeMemory: 10.11GiB
2018-04-08 03:29:00.155677: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-08 03:29:00.562343: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-08 03:29:00.562373: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0
2018-04-08 03:29:00.562403: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N
2018-04-08 03:29:00.562536: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9781 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:c4:00.0, compute capability: 6.1)
2018-04-08 03:29:00.563022: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 9.55G (10256140800 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-04-08 03:29:00.868307: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
2018-04-08 03:29:00.906005: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
2018-04-08 03:29:00.973462: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
59904/60000 [============================>.] - ETA: 0s - loss: 0.2624 - acc: 0.9202
2018-04-08 03:29:07.381067: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
60000/60000 [==============================] - 8s 129us/step - loss: 0.2620 - acc: 0.9203 - val_loss: 0.0587 - val_acc: 0.9825
Epoch 2/12
60000/60000 [==============================] - 4s 66us/step - loss: 0.0891 - acc: 0.9733 - val_loss: 0.0437 - val_acc: 0.9850
Epoch 3/12
60000/60000 [==============================] - 4s 66us/step - loss: 0.0681 - acc: 0.9789 - val_loss: 0.0341 - val_acc: 0.9881
Epoch 4/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0569 - acc: 0.9829 - val_loss: 0.0398 - val_acc: 0.9859
Epoch 5/12
60000/60000 [==============================] - 4s 70us/step - loss: 0.0480 - acc: 0.9856 - val_loss: 0.0303 - val_acc: 0.9898
Epoch 6/12
60000/60000 [==============================] - 4s 66us/step - loss: 0.0438 - acc: 0.9869 - val_loss: 0.0288 - val_acc: 0.9897
Epoch 7/12
60000/60000 [==============================] - 4s 66us/step - loss: 0.0379 - acc: 0.9881 - val_loss: 0.0287 - val_acc: 0.9905
Epoch 8/12
60000/60000 [==============================] - 4s 66us/step - loss: 0.0357 - acc: 0.9892 - val_loss: 0.0277 - val_acc: 0.9915
Epoch 9/12
60000/60000 [==============================] - 4s 65us/step - loss: 0.0329 - acc: 0.9898 - val_loss: 0.0268 - val_acc: 0.9906
Epoch 10/12
60000/60000 [==============================] - 4s 66us/step - loss: 0.0312 - acc: 0.9903 - val_loss: 0.0295 - val_acc: 0.9911
Epoch 11/12
60000/60000 [==============================] - 4s 66us/step - loss: 0.0281 - acc: 0.9908 - val_loss: 0.0292 - val_acc: 0.9908
Epoch 12/12
60000/60000 [==============================] - 4s 65us/step - loss: 0.0277 - acc: 0.9917 - val_loss: 0.0260 - val_acc: 0.9919
Test loss: 0.02598250026818114
Test accuracy: 0.9919
```

You can use [cuda-smi](https://github.com/phvu/cuda-smi) to watch the GPU memory usage.
In the case of the MNIST example in Keras, you should see the free memory drop to a small fraction of the total (10.5% in the sample below) and the fans spin up. I'm not sure what the `grappler/clusters/utils.cc:127` message means, however.

```
$ ./cuda-smi.dms
Device 0 [PCIe 0:196:0.0]: GeForce GTX 1080 Ti (CC 6.1): 10350 of 11264 MB (i.e. 91.9%) Free

# when GPU
$ ./cuda-smi.dms
Device 0 [PCIe 0:196:0.0]: GeForce GTX 1080 Ti (CC 6.1): 1181.1 of 11264 MB (i.e. 10.5%) Free
```

Tested on a MacBook Pro (15-inch, 2016), macOS 10.13.4 (17E199), 2.7 GHz Intel Core i7, with an NVIDIA GeForce GTX 1080 Ti (11 GiB).
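
If you'd rather compare the two `cuda-smi` readings above in a script than by eye, here is a minimal sketch that parses one output line and reports how much memory is in use. The names `parse_cuda_smi_line` and `GpuMemory` are made up for this example, and the regular expression assumes the line format seen in the sample output above, not a documented `cuda-smi` format, so treat it as a starting point.

``` python
import re
from typing import NamedTuple, Optional

# Assumed line format, taken from the sample output above:
# Device 0 [PCIe 0:196:0.0]: GeForce GTX 1080 Ti (CC 6.1): 10350 of 11264 MB (i.e. 91.9%) Free
_LINE = re.compile(
    r"Device (?P<index>\d+) \[[^\]]*\]: (?P<name>.+?) \(CC (?P<cc>[\d.]+)\): "
    r"(?P<free>[\d.]+) of (?P<total>[\d.]+) MB"
)


class GpuMemory(NamedTuple):
    """Hypothetical container for one parsed cuda-smi line."""
    index: int
    name: str
    free_mb: float
    total_mb: float

    @property
    def used_fraction(self) -> float:
        # cuda-smi reports *free* memory, so usage is the complement.
        return 1.0 - self.free_mb / self.total_mb


def parse_cuda_smi_line(line: str) -> Optional[GpuMemory]:
    """Return the parsed reading, or None if the line does not match."""
    match = _LINE.search(line)
    if match is None:
        return None
    return GpuMemory(
        index=int(match.group("index")),
        name=match.group("name"),
        free_mb=float(match.group("free")),
        total_mb=float(match.group("total")),
    )


if __name__ == "__main__":
    sample = ("Device 0 [PCIe 0:196:0.0]: GeForce GTX 1080 Ti (CC 6.1): "
              "1181.1 of 11264 MB (i.e. 10.5%) Free")
    reading = parse_cuda_smi_line(sample)
    if reading is not None:
        print("{}: {:.1%} of memory in use".format(reading.name, reading.used_fraction))
```

Feeding it the output of `./cuda-smi.dms` before and during training should show the used fraction jump once TensorFlow grabs the GPU.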