
Revisions

  1. @Willian-Zhang revised this gist May 11, 2018. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion tensorflow_1_8_high_sierra_gpu.md
    @@ -1,4 +1,4 @@
    -# Tensorflow 1.8 with CUDA on macOS High Sierra 10.13.4 for eGPU
    +# Tensorflow 1.8 with CUDA on macOS High Sierra 10.13.4

    Largely based on the [Tensorflow 1.6 gist](https://gist.github.com/mattiasarro/1f3498a26ad111a8d99199eaf64551be),
    and [Tensorflow 1.7 gist for xcode](https://gist.github.com/pavelmalik/d51036d508c8753c86aed1f3ff1e6967)
  2. @Willian-Zhang revised this gist May 11, 2018. 1 changed file with 33 additions and 33 deletions.
    66 changes: 33 additions & 33 deletions tensorflow_1_8_high_sierra_gpu.md
    @@ -1,7 +1,8 @@
    # Tensorflow 1.8 with CUDA on macOS High Sierra 10.13.4 for eGPU

    Largely based on the [Tensorflow 1.6 gist](https://gist.github.com/mattiasarro/1f3498a26ad111a8d99199eaf64551be),
    -and [Tensorflow 1.7 gist for xcode](https://gist.github.com/pavelmalik/d51036d508c8753c86aed1f3ff1e6967), this should hopefully simplify things a bit.
    +and [Tensorflow 1.7 gist for xcode](https://gist.github.com/pavelmalik/d51036d508c8753c86aed1f3ff1e6967)
    +and [Tensorflow 1.7 gist for eGPU](https://gist.github.com/Willian-Zhang/088e017774536880bd425178b46b8c17), this should hopefully simplify things a bit.

    ## Requirements

    @@ -196,11 +197,11 @@ git checkout v1.8.0
    ```

    #### Apply Patch
    -Apply the following [patch](https://gist.github.com/Willian-Zhang/a3bd10da2d8b343875f3862b2a62eb3b#file-xtensorflow18macos.patch) to fix a couple build issues:
    +Apply the following [patch](https://gist.github.com/Willian-Zhang/a3bd10da2d8b343875f3862b2a62eb3b#file-xtensorflow18macos-patch) to fix a couple build issues:

    ``` bash
    wget https://gist.github.com/Willian-Zhang/a3bd10da2d8b343875f3862b2a62eb3b/raw/xtensorflow18macos.patch
    -git apply xtensorflow17macos.patch
    +git apply xtensorflow18macos.patch
    ```


    @@ -362,64 +363,63 @@ wget https://gist.github.com/Willian-Zhang/290dceb96679c8f413e42491c9
    python mnist_cnn.py
    ```
    ```
    -/usr/local/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
    +/usr/local/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
    from ._conv import register_converters as _register_converters
    Using TensorFlow backend.
    x_train shape: (60000, 28, 28, 1)
    60000 train samples
    10000 test samples
    Train on 60000 samples, validate on 10000 samples
    Epoch 1/12
    -2018-04-08 03:29:00.155517: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] OS X does not support NUMA - returning NUMA node zero
    -2018-04-08 03:29:00.155661: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
    +2018-05-11 04:51:10.335377: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] OS X does not support NUMA - returning NUMA node zero
    +2018-05-11 04:51:10.336052: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.645
    pciBusID: 0000:c4:00.0
    -totalMemory: 11.00GiB freeMemory: 10.11GiB
    -2018-04-08 03:29:00.155677: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
    -2018-04-08 03:29:00.562343: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
    -2018-04-08 03:29:00.562373: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0
    -2018-04-08 03:29:00.562403: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N
    -2018-04-08 03:29:00.562536: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9781 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:c4:00.0, compute capability: 6.1)
    -2018-04-08 03:29:00.563022: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 9.55G (10256140800 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
    -2018-04-08 03:29:00.868307: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
    -2018-04-08 03:29:00.906005: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
    -2018-04-08 03:29:00.973462: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
    -59904/60000 [============================>.] - ETA: 0s - loss: 0.2624 - acc: 0.92022018-04-08 03:29:07.381067: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
    -60000/60000 [==============================] - 8s 129us/step - loss: 0.2620 - acc: 0.9203 - val_loss: 0.0587 - val_acc: 0.9825
    +totalMemory: 11.00GiB freeMemory: 9.37GiB
    +2018-05-11 04:51:10.336075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
    +2018-05-11 04:51:11.063831: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
    +2018-05-11 04:51:11.063856: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
    +2018-05-11 04:51:11.063864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
    +2018-05-11 04:51:11.064768: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9065 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:c4:00.0, compute capability: 6.1)
    +2018-05-11 04:51:11.534095: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
    +2018-05-11 04:51:11.579370: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
    +2018-05-11 04:51:11.644835: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
    +59264/60000 [============================>.] - ETA: 0s - loss: 0.2604 - acc: 0.92082018-05-11 04:51:19.228205: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
    +60000/60000 [==============================] - 10s 159us/step - loss: 0.2588 - acc: 0.9213 - val_loss: 0.0561 - val_acc: 0.9829
    Epoch 2/12
    -60000/60000 [==============================] - 4s 66us/step - loss: 0.0891 - acc: 0.9733 - val_loss: 0.0437 - val_acc: 0.9850
    +60000/60000 [==============================] - 4s 66us/step - loss: 0.0875 - acc: 0.9742 - val_loss: 0.0427 - val_acc: 0.9857
    Epoch 3/12
    -60000/60000 [==============================] - 4s 66us/step - loss: 0.0681 - acc: 0.9789 - val_loss: 0.0341 - val_acc: 0.9881
    +60000/60000 [==============================] - 4s 67us/step - loss: 0.0662 - acc: 0.9803 - val_loss: 0.0356 - val_acc: 0.9875
    Epoch 4/12
    -60000/60000 [==============================] - 4s 67us/step - loss: 0.0569 - acc: 0.9829 - val_loss: 0.0398 - val_acc: 0.9859
    +60000/60000 [==============================] - 4s 67us/step - loss: 0.0549 - acc: 0.9839 - val_loss: 0.0325 - val_acc: 0.9896
    Epoch 5/12
    -60000/60000 [==============================] - 4s 70us/step - loss: 0.0480 - acc: 0.9856 - val_loss: 0.0303 - val_acc: 0.9898
    +60000/60000 [==============================] - 4s 67us/step - loss: 0.0471 - acc: 0.9859 - val_loss: 0.0309 - val_acc: 0.9901
    Epoch 6/12
    -60000/60000 [==============================] - 4s 66us/step - loss: 0.0438 - acc: 0.9869 - val_loss: 0.0288 - val_acc: 0.9897
    +60000/60000 [==============================] - 4s 68us/step - loss: 0.0421 - acc: 0.9873 - val_loss: 0.0297 - val_acc: 0.9903
    Epoch 7/12
    -60000/60000 [==============================] - 4s 66us/step - loss: 0.0379 - acc: 0.9881 - val_loss: 0.0287 - val_acc: 0.9905
    +60000/60000 [==============================] - 4s 67us/step - loss: 0.0377 - acc: 0.9884 - val_loss: 0.0259 - val_acc: 0.9908
    Epoch 8/12
    -60000/60000 [==============================] - 4s 66us/step - loss: 0.0357 - acc: 0.9892 - val_loss: 0.0277 - val_acc: 0.9915
    +60000/60000 [==============================] - 4s 67us/step - loss: 0.0357 - acc: 0.9883 - val_loss: 0.0285 - val_acc: 0.9908
    Epoch 9/12
    -60000/60000 [==============================] - 4s 65us/step - loss: 0.0329 - acc: 0.9898 - val_loss: 0.0268 - val_acc: 0.9906
    +60000/60000 [==============================] - 4s 68us/step - loss: 0.0315 - acc: 0.9904 - val_loss: 0.0327 - val_acc: 0.9901
    Epoch 10/12
    -60000/60000 [==============================] - 4s 66us/step - loss: 0.0312 - acc: 0.9903 - val_loss: 0.0295 - val_acc: 0.9911
    +60000/60000 [==============================] - 4s 67us/step - loss: 0.0288 - acc: 0.9910 - val_loss: 0.0272 - val_acc: 0.9911
    Epoch 11/12
    -60000/60000 [==============================] - 4s 66us/step - loss: 0.0281 - acc: 0.9908 - val_loss: 0.0292 - val_acc: 0.9908
    +60000/60000 [==============================] - 4s 67us/step - loss: 0.0282 - acc: 0.9912 - val_loss: 0.0248 - val_acc: 0.9920
    Epoch 12/12
    -60000/60000 [==============================] - 4s 65us/step - loss: 0.0277 - acc: 0.9917 - val_loss: 0.0260 - val_acc: 0.9919
    -Test loss: 0.02598250026818114
    -Test accuracy: 0.9919
    +60000/60000 [==============================] - 4s 66us/step - loss: 0.0255 - acc: 0.9923 - val_loss: 0.0283 - val_acc: 0.9912
    +Test loss: 0.028254894825743667
    +Test accuracy: 0.9912
    ```


    You can use [cuda-smi](https://github.com/phvu/cuda-smi) to watch the GPU memory usage. In the case of the mnist example in Keras, you should see the free memory drop to maybe 2% and the fans spin up. Not quite sure what the grappler/clusters/utils.cc:127 warning is, however.

    ```
    -$ ./cuda-smi.dms
    +$ cuda-smi
    Device 0 [PCIe 0:196:0.0]: GeForce GTX 1080 Ti (CC 6.1): 10350 of 11264 MB (i.e. 91.9%) Free
    # when GPU
    -$ ./cuda-smi.dms
    +$ cuda-smi
    Device 0 [PCIe 0:196:0.0]: GeForce GTX 1080 Ti (CC 6.1): 1181.1 of 11264 MB (i.e. 10.5%) Free
    ```

  3. @Willian-Zhang revised this gist May 11, 2018. 2 changed files with 157 additions and 43 deletions.
    101 changes: 58 additions & 43 deletions tensorflow_1_8_high_sierra_gpu.md
    @@ -28,9 +28,9 @@ The rest steps are the same as normal GPU setup.
    #### Check and use pre-compilation (Optional, Risky, Please Skip if you don't understand)
    If you are like me using a MacBook Pro (15-inch, 2016) running 10.13.4 (17E199) and eGPU: NVIDIA GeForce GTX 1080 Ti 11 GiB (or any 6.1-compatible card listed on the [nvidia page](https://developer.nvidia.com/cuda-gpus)).
    You could, at your own risk, skip the `Prepare` and `Compile` steps below,
    -[download .whl from here](https://github.com/Willian-Zhang/tensorflow-precompile/raw/r1.7/tensorflow-1.7.0-cp36-cp36m-macosx_10_7_x86_64.whl) and install it:
    +[download .whl from here](https://github.com/Willian-Zhang/tensorflow-precompile/raw/r1.8/tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl) and install it:
    ``` bash
    -$ pip install tensorflow-1.7.0-cp36-cp36m-macosx_10_7_x86_64.whl
    +pip install tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl
    ```

    And be sure to test after installation.
    @@ -39,8 +39,8 @@ But remember this is **not safe**.
    #### Install Homebrew (Optional)
    For package management, ignore if you have your own `python`, `wget` or you want to download manually.
    ``` bash
    -$ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
    -$ brew install wget
    +/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
    +brew install wget
    ```

    #### NVIDIA Graphics driver
    @@ -63,14 +63,14 @@ Unarchive and rename `XCode.app` to `Xcode8.2.app` in case you want to build and
    #### Install Bazel

    If you have Homebrew installed
    -```
    -$ brew install bazel
    +``` bash
    +brew install bazel
    ```
    or Download the binary [here](https://github.com/bazelbuild/bazel/releases/download/0.10.0/bazel-0.10.0-installer-darwin-x86_64.sh)

    -```
    -$ chmod 755 bazel-0.10.0-installer-darwin-x86_64.sh
    -$ ./bazel-0.10.0-installer-darwin-x86_64.sh
    +```bash
    +chmod 755 bazel-0.10.0-installer-darwin-x86_64.sh
    +./bazel-0.10.0-installer-darwin-x86_64.sh
    ```


    @@ -82,7 +82,15 @@ It should be something along the lines of cuda_9.1.128_mac.dmg
    #### Install NCCL
    Download `NCCL 2.1.15 O/S agnostic and CUDA 9` from [NVIDIA](https://developer.nvidia.com/nccl/nccl-download).

    -Unarchive it and move the corresponding files to `/usr/local/cuda`.
    +Unarchive it and move it to a permanent place, e.g. `/usr/local/nccl`.
    +``` bash
    +sudo mkdir -p /usr/local/nccl
    +cd nccl_2.1.15-1+cuda9.1_x86_64
    +sudo mv * /usr/local/nccl
    +sudo mkdir -p /usr/local/include/third_party/nccl
    +sudo ln -s /usr/local/nccl/include/nccl.h /usr/local/include/third_party/nccl
    +```
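
    To confirm the symlinked header ends up where the build will look for it, a quick check (paths as set up above):
    ``` bash
    # both should list nccl.h without a "No such file or directory" error
    ls -l /usr/local/nccl/include/nccl.h
    ls -l /usr/local/include/third_party/nccl/nccl.h
    ```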


    #### Set up your env paths

    @@ -98,12 +106,13 @@ export PATH=$DYLD_LIBRARY_PATH:$PATH:/Developer/NVIDIA/CUDA-9.1/bin

    #### Compile Samples
    We want to compile a CUDA sample to check that the GPU is correctly recognized and supported.
    +``` bash
    +cd /Developer/NVIDIA/CUDA-9.1/samples
    +chown -R $(whoami) *
    +make -C 1_Utilities/deviceQuery
    +./bin/x86_64/darwin/release/deviceQuery
    +```
    -```
    -$ cd /Developer/NVIDIA/CUDA-9.1/samples
    -$ chown -R $(whoami) *
    -$ make -C 1_Utilities/deviceQuery
    -$ ./bin/x86_64/darwin/release/deviceQuery
    CUDA Device Query (Runtime API) version (CUDART static linking)
    Detected 1 CUDA Capable device(s)
    @@ -153,10 +162,10 @@ Download [cuDNN 7.0.5](https://developer.nvidia.com/compute/machine-learning/cud

    Change into your download directory and follow the post installation steps.
    ``` bash
    -$ tar -xzvf cudnn-9.1-osx-x64-v7-ga.tgz
    -$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
    -$ sudo cp cuda/lib/libcudnn* /usr/local/cuda/lib
    -$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib/libcudnn*
    +tar -xzvf cudnn-9.1-osx-x64-v7-ga.tgz
    +sudo cp cuda/include/cudnn.h /usr/local/cuda/include
    +sudo cp cuda/lib/libcudnn* /usr/local/cuda/lib
    +sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib/libcudnn*
    ```


    @@ -171,39 +180,40 @@ $ which pip

    Or download [get-pip](https://bootstrap.pypa.io/get-pip.py) and run it with python. More info [here](https://pip.pypa.io/en/stable/installing/)
    -```
    +``` bash
    python get-pip.py
    ```
    pip will automatically install the TensorFlow dependencies (wheel, six, etc.); if it doesn't, you can install them manually.


    ## Compile
    #### Clone TensorFlow from Repository
    -```
    -$ cd /tmp
    -$ git clone https://github.com/tensorflow/tensorflow
    -$ cd tensorflow
    -$ git checkout v1.7.0
    +``` bash
    +cd /tmp
    +git clone https://github.com/tensorflow/tensorflow
    +cd tensorflow
    +git checkout v1.8.0
    ```

    #### Apply Patch
    -Apply the following [patch](https://gist.github.com/Willian-Zhang/088e017774536880bd425178b46b8c17#file-xtensorflow17macos-patch) to fix a couple build issues:
    +Apply the following [patch](https://gist.github.com/Willian-Zhang/a3bd10da2d8b343875f3862b2a62eb3b#file-xtensorflow18macos.patch) to fix a couple build issues:

    -```
    -$ wget https://gist.github.com/Willian-Zhang/088e017774536880bd425178b46b8c17/raw/xtensorflow17macos.patch
    -$ git apply xtensorflow17macos.patch
    +``` bash
    +wget https://gist.github.com/Willian-Zhang/a3bd10da2d8b343875f3862b2a62eb3b/raw/xtensorflow18macos.patch
    +git apply xtensorflow17macos.patch
    ```



    #### Configure Build
    Except for *CUDA support*, *CUDA SDK version* and *Cuda compute capabilities*, I left the other settings untouched.

    Pay attention to `Cuda compute capabilities`; you might want to look up your own card's value at https://developer.nvidia.com/cuda-gpus.


    ``` bash
    -$ ./configure
    +./configure
    +```
    +```
    You have bazel 0.10.0 installed.
    Please specify the location of python. [Default is /usr/bin/python]:
    @@ -282,26 +292,27 @@ Configuration finished
    Takes about 47 minutes on my machine.

    ``` bash
    -$ bazel build --config=cuda --config=opt --action_env PATH --action_env LD_LIBRARY_PATH --action_env DYLD_LIBRARY_PATH //tensorflow/tools/pip_package:build_pip_package
    +bazel clean
    +bazel build --config=cuda --config=opt --action_env PATH --action_env LD_LIBRARY_PATH --action_env DYLD_LIBRARY_PATH //tensorflow/tools/pip_package:build_pip_package
    ```

    #### Create wheel file and install it

    ``` bash
    -$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
    -$ ls ls /tmp/tensorflow_pkg
    -tensorflow-1.7.0-cp36-cp36m-macosx_10_7_x86_64.whl
    +bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
    +ls /tmp/tensorflow_pkg
    +tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl
    ```

    If you want to use virtualenv or something, now is the time. Or just:
    ``` bash
    -$ pip install /tmp/tensorflow_pkg/tensorflow-1.7.0-cp36-cp36m-macosx_10_7_x86_64.whl
    +pip install /tmp/tensorflow_pkg/tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl
    ```

    #### Backup your wheel if nothing goes wrong (Optional)

    Files in `/tmp` will be cleaned after reboot.
    -```
    +``` bash
    cp /tmp/tensorflow_pkg/*.whl ~/
    ```

    @@ -310,8 +321,10 @@ It's useful to leave the .whl file lying around in case you want to install it f
    #### Test Installation
    See if everything got linked correctly
    ``` bash
    -$ cd ~
    -$ python
    +cd ~
    +python
    +```
    +``` python
    >>> import tensorflow as tf
    >>> tf.Session()
    2018-04-08 03:25:15.740635: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] OS X does not support NUMA - returning NUMA node zero
    @@ -329,7 +342,7 @@ totalMemory: 11.00GiB freeMemory: 10.18GiB

    ##### Try out new Tensorflow feature (Optional)
    ``` bash
    -$ python
    +python
    ```
    ``` python
    import tensorflow as tf
    @@ -343,10 +356,12 @@ print("hello, {}".format(m)) # => "hello, [[4.]]"

    #### Test GPU Acceleration

    +```bash
    +pip install keras
    +wget https://gist.github.com/Willian-Zhang/290dceb96679c8f413e42491c92722b0/raw/mnist-cnn.py
    +python mnist_cnn.py
    +```
    +```
    -$ pip install keras
    -$ wget https://gist.github.com/Willian-Zhang/290dceb96679c8f413e42491c92722b0/raw/mnist-cnn.py
    -$ python mnist_cnn.py
    /usr/local/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
    from ._conv import register_converters as _register_converters
    Using TensorFlow backend.
    99 changes: 99 additions & 0 deletions xtensorflow18macos.patch
    @@ -0,0 +1,99 @@
    diff --git a/tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc b/tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc
    index 0f7adaf24a..934ccbada6 100644
    --- a/tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc
    +++ b/tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc
    @@ -69,7 +69,7 @@ __global__ void concat_variable_kernel(
    IntType num_inputs = input_ptr_data.size;

    // verbose declaration needed due to template
    - extern __shared__ __align__(sizeof(T)) unsigned char smem[];
    + extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char smem[];
    IntType* smem_col_scan = reinterpret_cast<IntType*>(smem);

    if (useSmem) {
    diff --git a/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc b/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
    index 94989089ec..1d26d4bacb 100644
    --- a/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
    +++ b/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
    @@ -172,7 +172,7 @@ __global__ __launch_bounds__(1024, 2) void DepthwiseConv2dGPUKernelNHWCSmall(
    const DepthwiseArgs args, const T* input, const T* filter, T* output) {
    assert(CanLaunchDepthwiseConv2dGPUSmall(args));
    // Holds block plus halo and filter data for blockDim.x depths.
    - extern __shared__ __align__(sizeof(T)) unsigned char shared_memory[];
    + extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char shared_memory[];
    T* const shared_data = reinterpret_cast<T*>(shared_memory);

    const int num_batches = args.batch;
    @@ -452,7 +452,7 @@ __global__ __launch_bounds__(1024, 2) void DepthwiseConv2dGPUKernelNCHWSmall(
    const DepthwiseArgs args, const T* input, const T* filter, T* output) {
    assert(CanLaunchDepthwiseConv2dGPUSmall(args));
    // Holds block plus halo and filter data for blockDim.z depths.
    - extern __shared__ __align__(sizeof(T)) unsigned char shared_memory[];
    + extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char shared_memory[];
    T* const shared_data = reinterpret_cast<T*>(shared_memory);

    const int num_batches = args.batch;
    @@ -1118,7 +1118,7 @@ __launch_bounds__(1024, 2) void DepthwiseConv2dBackpropFilterGPUKernelNHWCSmall(
    const DepthwiseArgs args, const T* output, const T* input, T* filter) {
    assert(CanLaunchDepthwiseConv2dBackpropFilterGPUSmall(args, blockDim.z));
    // Holds block plus halo and filter data for blockDim.x depths.
    - extern __shared__ __align__(sizeof(T)) unsigned char shared_memory[];
    + extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char shared_memory[];
    T* const shared_data = reinterpret_cast<T*>(shared_memory);

    const int num_batches = args.batch;
    @@ -1388,7 +1388,7 @@ __launch_bounds__(1024, 2) void DepthwiseConv2dBackpropFilterGPUKernelNCHWSmall(
    const DepthwiseArgs args, const T* output, const T* input, T* filter) {
    assert(CanLaunchDepthwiseConv2dBackpropFilterGPUSmall(args, blockDim.x));
    // Holds block plus halo and filter data for blockDim.z depths.
    - extern __shared__ __align__(sizeof(T)) unsigned char shared_memory[];
    + extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char shared_memory[];
    T* const shared_data = reinterpret_cast<T*>(shared_memory);

    const int num_batches = args.batch;
    diff --git a/tensorflow/core/kernels/split_lib_gpu.cu.cc b/tensorflow/core/kernels/split_lib_gpu.cu.cc
    index 393818730b..58a1294005 100644
    --- a/tensorflow/core/kernels/split_lib_gpu.cu.cc
    +++ b/tensorflow/core/kernels/split_lib_gpu.cu.cc
    @@ -121,7 +121,7 @@ __global__ void split_v_kernel(const T* input_ptr,
    int num_outputs = output_ptr_data.size;

    // verbose declaration needed due to template
    - extern __shared__ __align__(sizeof(T)) unsigned char smem[];
    + extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char smem[];
    IntType* smem_col_scan = reinterpret_cast<IntType*>(smem);

    if (useSmem) {
    diff --git a/tensorflow/workspace.bzl b/tensorflow/workspace.bzl
    index 0ce5cda517..d4dc2235ac 100644
    --- a/tensorflow/workspace.bzl
    +++ b/tensorflow/workspace.bzl
    @@ -361,11 +361,11 @@ def tf_workspace(path_prefix="", tf_repo_name=""):
    tf_http_archive(
    name = "protobuf_archive",
    urls = [
    - "https://mirror.bazel.build/github.com/google/protobuf/archive/396336eb961b75f03b25824fe86cf6490fb75e3a.tar.gz",
    - "https://github.com/google/protobuf/archive/396336eb961b75f03b25824fe86cf6490fb75e3a.tar.gz",
    + "https://mirror.bazel.build/github.com/dtrebbien/protobuf/archive/50f552646ba1de79e07562b41f3999fe036b4fd0.tar.gz",
    + "https://github.com/dtrebbien/protobuf/archive/50f552646ba1de79e07562b41f3999fe036b4fd0.tar.gz",
    ],
    - sha256 = "846d907acf472ae233ec0882ef3a2d24edbbe834b80c305e867ac65a1f2c59e3",
    - strip_prefix = "protobuf-396336eb961b75f03b25824fe86cf6490fb75e3a",
    + sha256 = "eb16b33431b91fe8cee479575cee8de202f3626aaf00d9bf1783c6e62b4ffbc7",
    + strip_prefix = "protobuf-50f552646ba1de79e07562b41f3999fe036b4fd0",
    )

    # We need to import the protobuf library under the names com_google_protobuf
    diff --git a/third_party/gpus/cuda/BUILD.tpl b/third_party/gpus/cuda/BUILD.tpl
    index 2a37c65bc7..43446dd99b 100644
    --- a/third_party/gpus/cuda/BUILD.tpl
    +++ b/third_party/gpus/cuda/BUILD.tpl
    @@ -110,7 +110,7 @@ cc_library(
    ".",
    "cuda/include",
    ],
    - linkopts = ["-lgomp"],
    + #linkopts = ["-lgomp"],
    linkstatic = 1,
    visibility = ["//visibility:public"],
    )
  4. @Willian-Zhang revised this gist May 10, 2018. 1 changed file with 5 additions and 5 deletions.
    10 changes: 5 additions & 5 deletions tensorflow_1_8_high_sierra_gpu.md
    @@ -1,4 +1,4 @@
    -# Tensorflow 1.7 with CUDA on macOS High Sierra 10.13.4 for eGPU
    +# Tensorflow 1.8 with CUDA on macOS High Sierra 10.13.4 for eGPU

    Largely based on the [Tensorflow 1.6 gist](https://gist.github.com/mattiasarro/1f3498a26ad111a8d99199eaf64551be),
    and [Tensorflow 1.7 gist for xcode](https://gist.github.com/pavelmalik/d51036d508c8753c86aed1f3ff1e6967), this should hopefully simplify things a bit.
    @@ -43,7 +43,6 @@ $ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/inst
    $ brew install wget
    ```


    #### NVIDIA Graphics driver

    Download and install from http://www.nvidia.com/download/driverResults.aspx/130460/en-us
    @@ -61,7 +60,7 @@ Or Find `XCode 8.2` on https://developer.apple.com/download/more/
    Unarchive and rename `XCode.app` to `Xcode8.2.app` in case you want to build and use it next time.


    -#### Install Bazel 0.10
    +#### Install Bazel

    If you have Homebrew installed
    ```
    @@ -75,14 +74,15 @@ $ ./bazel-0.10.0-installer-darwin-x86_64.sh
    ```




    #### Install CUDA Toolkit 9.1
    [Download CUDA-9.1](https://developer.nvidia.com/cuda-downloads?target_os=MacOSX&target_arch=x86_64&target_version=1013&target_type=dmglocal)

    It should be something along the lines of cuda_9.1.128_mac.dmg

    +#### Install NCCL
    +Download `NCCL 2.1.15 O/S agnostic and CUDA 9` from [NVIDIA](https://developer.nvidia.com/nccl/nccl-download).
    +
    +Unarchive it and move the corresponding files to `/usr/local/cuda`.

    #### Set up your env paths

  5. @Willian-Zhang created this gist May 10, 2018.
    411 changes: 411 additions & 0 deletions tensorflow_1_8_high_sierra_gpu.md
    @@ -0,0 +1,411 @@
    # Tensorflow 1.7 with CUDA on macOS High Sierra 10.13.4 for eGPU

    Largely based on the [Tensorflow 1.6 gist](https://gist.github.com/mattiasarro/1f3498a26ad111a8d99199eaf64551be),
    and [Tensorflow 1.7 gist for xcode](https://gist.github.com/pavelmalik/d51036d508c8753c86aed1f3ff1e6967), this should hopefully simplify things a bit.

    ## Requirements

    * NVIDIA Web-Drivers 387.10.10.10.30.106 for 10.13.4 (17E199) __(w/o Security Update)__
    * CUDA-Drivers 387.128
    * CUDA 9.1 Toolkit
    * cuDNN 7.0.5 __(latest for macOS)__
    * NCCL 2.1.15 __(latest for macOS)__
    * Python 2.7
    * XCode 8.2
    * bazel stable 0.13.0 __(latest on HomeBrew)__
    * Tensorflow 1.8 Source Code


    ## eGPU Only
    #### Checkout eGPU setup before install (required for eGPU, ignore otherwise)
    If you don't know how to set up an eGPU on a Mac, check out [these steps](https://egpu.io/forums/mac-setup/script-enable-egpu-on-tb1-2-macs-on-macos-10-13-4/paged/6/#post-33535).
    Make sure you have the eGPU working before installation.
    (You should see your specific graphics card name in Apple > About this Mac > System Report ... > Graphics/Displays)
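
    On the command line, `system_profiler` reports the same information; the grep below is just one way to filter it (the pattern assumes an NVIDIA card):
    ``` bash
    # list attached displays/GPUs; the eGPU should appear by name
    system_profiler SPDisplaysDataType | grep -A 3 -i "nvidia"
    ```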

    The remaining steps are the same as a normal GPU setup.

    ## Prepare
    #### Check and use pre-compilation (Optional, Risky, Please Skip if you don't understand)
    If you are like me using a MacBook Pro (15-inch, 2016) running 10.13.4 (17E199) and eGPU: NVIDIA GeForce GTX 1080 Ti 11 GiB (or any 6.1-compatible card listed on the [nvidia page](https://developer.nvidia.com/cuda-gpus)).
    You could, at your own risk, skip the `Prepare` and `Compile` steps below,
    [download .whl from here](https://github.com/Willian-Zhang/tensorflow-precompile/raw/r1.7/tensorflow-1.7.0-cp36-cp36m-macosx_10_7_x86_64.whl) and install it:
    ``` bash
    $ pip install tensorflow-1.7.0-cp36-cp36m-macosx_10_7_x86_64.whl
    ```

    And be sure to test after installation.
    But remember this is **not safe**.

    #### Install Homebrew (Optional)
    For package management, ignore if you have your own `python`, `wget` or you want to download manually.
    ``` bash
    $ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
    $ brew install wget
    ```


    #### NVIDIA Graphics driver

    Download and install from http://www.nvidia.com/download/driverResults.aspx/130460/en-us

    #### NVIDIA Cuda driver

    Download and install from http://www.nvidia.com/object/macosx-cuda-387.178-driver.html


    #### Install XCode 8.2

    Download [XCode_8.2.xip](https://download.developer.apple.com/Developer_Tools/Xcode_8.2/Xcode_8.2.xip).
    Or find `XCode 8.2` at https://developer.apple.com/download/more/

    Unarchive and rename `XCode.app` to `Xcode8.2.app` in case you want to build and use it next time.


    #### Install Bazel 0.10

    If you have Homebrew installed
    ```
    $ brew install bazel
    ```
    or Download the binary [here](https://github.com/bazelbuild/bazel/releases/download/0.10.0/bazel-0.10.0-installer-darwin-x86_64.sh)

    ```
    $ chmod 755 bazel-0.10.0-installer-darwin-x86_64.sh
    $ ./bazel-0.10.0-installer-darwin-x86_64.sh
    ```




    #### Install CUDA Toolkit 9.1
    [Download CUDA-9.1](https://developer.nvidia.com/cuda-downloads?target_os=MacOSX&target_arch=x86_64&target_version=1013&target_type=dmglocal)

    It should be something along the lines of cuda_9.1.128_mac.dmg



    #### Set up your env paths

    Edit `~/.bash_profile` and add the following:

    ```
    export CUDA_HOME=/usr/local/cuda
    export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:/usr/local/cuda/extras/CUPTI/lib
    export LD_LIBRARY_PATH=$DYLD_LIBRARY_PATH
    export PATH=$DYLD_LIBRARY_PATH:$PATH:/Developer/NVIDIA/CUDA-9.1/bin
    ```
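
    To make the new variables take effect in the current shell and sanity-check that the toolkit is found (assuming CUDA installed to the default locations above):
    ``` bash
    source ~/.bash_profile
    nvcc --version            # should report the CUDA 9.1 toolchain
    echo $DYLD_LIBRARY_PATH   # should include /usr/local/cuda/lib
    ```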


    #### Compile Samples
    We want to compile a CUDA sample to check that the GPU is correctly recognized and supported.
    ```
    $ cd /Developer/NVIDIA/CUDA-9.1/samples
    $ chown -R $(whoami) *
    $ make -C 1_Utilities/deviceQuery
    $ ./bin/x86_64/darwin/release/deviceQuery
    CUDA Device Query (Runtime API) version (CUDART static linking)
    Detected 1 CUDA Capable device(s)
    Device 0: "GeForce GTX 1080 Ti"
    CUDA Driver Version / Runtime Version 9.1 / 9.1
    CUDA Capability Major/Minor version number: 6.1
    Total amount of global memory: 11264 MBytes (11810963456 bytes)
    (28) Multiprocessors, (128) CUDA Cores/MP: 3584 CUDA Cores
    GPU Max Clock rate: 1645 MHz (1.64 GHz)
    Memory Clock rate: 5505 Mhz
    Memory Bus Width: 352-bit
    L2 Cache Size: 2883584 bytes
    Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
    Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
    Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
    Total amount of constant memory: 65536 bytes
    Total amount of shared memory per block: 49152 bytes
    Total number of registers available per block: 65536
    Warp size: 32
    Maximum number of threads per multiprocessor: 2048
    Maximum number of threads per block: 1024
    Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
    Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
    Maximum memory pitch: 2147483647 bytes
    Texture alignment: 512 bytes
    Concurrent copy and kernel execution: Yes with 2 copy engine(s)
    Run time limit on kernels: Yes
    Integrated GPU sharing Host Memory: No
    Support host page-locked memory mapping: Yes
    Alignment requirement for Surfaces: Yes
    Device has ECC support: Disabled
    Device supports Unified Addressing (UVA): Yes
    Supports Cooperative Kernel Launch: Yes
    Supports MultiDevice Co-op Kernel Launch: No
    Device PCI Domain ID / Bus ID / location ID: 0 / 196 / 0
    Compute Mode:
    < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 9.1, NumDevs = 1
    Result = PASS
    ```

    #### NVIDIA cuDNN - Deep Learning Primitives
    If not already done, register at [https://developer.nvidia.com/cudnn](https://developer.nvidia.com/cudnn)
    Download [cuDNN 7.0.5](https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v7.0.5/prod/9.1_20171129/cudnn-9.1-osx-x64-v7-ga)

    Change into your download directory and follow the post installation steps.
    ``` bash
    $ tar -xzvf cudnn-9.1-osx-x64-v7-ga.tgz
    $ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
    $ sudo cp cuda/lib/libcudnn* /usr/local/cuda/lib
    $ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib/libcudnn*
    ```
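
    A quick way to confirm the copies landed and are readable:
    ``` bash
    # the header and the dylibs should all be listed with read permissions
    ls -l /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib/libcudnn*
    ```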


    #### Install pip for python 2.7 (Optional)
    Skip if you have your own idea of which python/pip to use:
    ``` bash
    $ which python
    /usr/local/bin/python
    $ which pip
    /usr/local/bin/pip
    ```

    Or download [get-pip](https://bootstrap.pypa.io/get-pip.py) and run it with python. More info [here](https://pip.pypa.io/en/stable/installing/)
    ```
    python get-pip.py
    ```
    pip will automatically install the TensorFlow dependencies (wheel, six, etc.); if it doesn't, you can install them manually.
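
    If pip doesn't pull them in for some reason, a manual install of the packages mentioned above is a sketch like:
    ``` bash
    # add any other packages the tensorflow install later complains about
    pip install wheel six
    ```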


    ## Compile
    #### Clone TensorFlow from Repository
    ```
    $ cd /tmp
    $ git clone https://github.com/tensorflow/tensorflow
    $ cd tensorflow
    $ git checkout v1.7.0
    ```

    #### Apply Patch
    Apply the following [patch](https://gist.github.com/Willian-Zhang/088e017774536880bd425178b46b8c17#file-xtensorflow17macos-patch) to fix a couple build issues:

    ```
    $ wget https://gist.github.com/Willian-Zhang/088e017774536880bd425178b46b8c17/raw/xtensorflow17macos.patch
    $ git apply xtensorflow17macos.patch
    ```
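
    git can dry-run a patch before applying it; `--check` reports problems without touching the tree, which catches checkout/patch version mismatches early:
    ``` bash
    git apply --check xtensorflow17macos.patch && echo "patch applies cleanly"
    ```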



    #### Configure Build
    Except for *CUDA support*, *CUDA SDK version* and *Cuda compute capabilities*, I left the other settings untouched.

    Pay attention to `Cuda compute capabilities`; you might want to look up your own card's value at https://developer.nvidia.com/cuda-gpus.
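
    If you built the deviceQuery sample earlier, it already printed the value to enter; one way to pull it back out rather than consulting the web page:
    ``` bash
    # look for "CUDA Capability Major/Minor version number: 6.1" -> answer 6.1 below
    /Developer/NVIDIA/CUDA-9.1/samples/bin/x86_64/darwin/release/deviceQuery | grep "Capability"
    ```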


    ``` bash
    $ ./configure
    You have bazel 0.10.0 installed.
    Please specify the location of python. [Default is /usr/bin/python]:


    Found possible Python library paths:
    /Library/Python/2.7/site-packages
    Please input the desired Python library path to use. Default is [/Library/Python/2.7/site-packages]

    Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]:
    No Google Cloud Platform support will be enabled for TensorFlow.

    Do you wish to build TensorFlow with Hadoop File System support? [Y/n]:
    No Hadoop File System support will be enabled for TensorFlow.

    Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]:
    No Amazon S3 File System support will be enabled for TensorFlow.

    Do you wish to build TensorFlow with Apache Kafka Platform support? [y/N]:
    No Apache Kafka Platform support will be enabled for TensorFlow.

    Do you wish to build TensorFlow with XLA JIT support? [y/N]:
    No XLA JIT support will be enabled for TensorFlow.

    Do you wish to build TensorFlow with GDR support? [y/N]:
    No GDR support will be enabled for TensorFlow.

    Do you wish to build TensorFlow with VERBS support? [y/N]:
    No VERBS support will be enabled for TensorFlow.

    Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]:
    No OpenCL SYCL support will be enabled for TensorFlow.

    Do you wish to build TensorFlow with CUDA support? [y/N]: y
    CUDA support will be enabled for TensorFlow.

    Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.1


    Please specify the location where CUDA 9.1 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:


    Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]:


    Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:


    Please specify a list of comma-separated Cuda compute capabilities you want to build with.
    You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
    Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2] (type your own, check on https://developer.nvidia.com/cuda-gpus, mine is 6.1 for GTX 1080 Ti)


    Do you want to use clang as CUDA compiler? [y/N]:
    nvcc will be used as CUDA compiler.

    Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:


    Do you wish to build TensorFlow with MPI support? [y/N]:
    No MPI support will be enabled for TensorFlow.

    Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:


    Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
    Not configuring the WORKSPACE for Android builds.

    Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
    --config=mkl # Build with MKL support.
    --config=monolithic # Config for mostly static monolithic build.
    Configuration finished

    ```

    #### Build Process
    Takes about 47 minutes on my machine.

    ``` bash
    $ bazel build --config=cuda --config=opt --action_env PATH --action_env LD_LIBRARY_PATH --action_env DYLD_LIBRARY_PATH //tensorflow/tools/pip_package:build_pip_package
    ```
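
    If the build starves your machine, Bazel's generic `--jobs` flag (not specific to this project) caps parallel compile actions at the cost of wall-clock time, e.g.:
    ``` bash
    # cap at 4 concurrent actions; pick a number that suits your RAM/cores
    bazel build --jobs 4 --config=cuda --config=opt --action_env PATH --action_env LD_LIBRARY_PATH --action_env DYLD_LIBRARY_PATH //tensorflow/tools/pip_package:build_pip_package
    ```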

    #### Create wheel file and install it

    ``` bash
    $ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
    $ ls ls /tmp/tensorflow_pkg
    tensorflow-1.7.0-cp36-cp36m-macosx_10_7_x86_64.whl
    ```

    If you want to use virtualenv or something, now is the time. Or just:
    ``` bash
    $ pip install /tmp/tensorflow_pkg/tensorflow-1.7.0-cp36-cp36m-macosx_10_7_x86_64.whl
    ```

    #### Backup your wheel if nothing goes wrong (Optional)

    Files in `/tmp` will be cleaned after reboot.
    ```
    cp /tmp/tensorflow_pkg/*.whl ~/
    ```

    It's useful to leave the .whl file lying around in case you want to install it for another environment.

    #### Test Installation
    See if everything got linked correctly
    ``` bash
    $ cd ~
    $ python
    >>> import tensorflow as tf
    >>> tf.Session()
    2018-04-08 03:25:15.740635: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] OS X does not support NUMA - returning NUMA node zero
    2018-04-08 03:25:15.741260: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.645
    pciBusID: 0000:c4:00.0
    totalMemory: 11.00GiB freeMemory: 10.18GiB
    2018-04-08 03:25:15.741288: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
    2018-04-08 03:25:16.157590: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
    2018-04-08 03:25:16.157614: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0
    2018-04-08 03:25:16.157620: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N
    2018-04-08 03:25:16.157753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9849 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:c4:00.0, compute capability: 6.1)
    <tensorflow.python.client.session.Session object at 0x10968ef60>
    ```
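
    Another quick check that works on TF 1.x is listing the devices the runtime actually registered; the GPU should show up alongside the CPU:
    ``` bash
    python -c "from tensorflow.python.client import device_lib; print([d.name for d in device_lib.list_local_devices()])"
    # expect something like ['/device:CPU:0', '/device:GPU:0']
    ```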

    ##### Try out new Tensorflow feature (Optional)
    ``` bash
    $ python
    ```
    ``` python
    import tensorflow as tf
    tf.enable_eager_execution()
    tf.executing_eagerly() # => True

    x = [[2.]]
    m = tf.matmul(x, x)
    print("hello, {}".format(m)) # => "hello, [[4.]]"
    ```

    #### Test GPU Acceleration

    ```
    $ pip install keras
    $ wget https://gist.github.com/Willian-Zhang/290dceb96679c8f413e42491c92722b0/raw/mnist-cnn.py
    $ python mnist_cnn.py
    /usr/local/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
    from ._conv import register_converters as _register_converters
    Using TensorFlow backend.
    x_train shape: (60000, 28, 28, 1)
    60000 train samples
    10000 test samples
    Train on 60000 samples, validate on 10000 samples
    Epoch 1/12
    2018-04-08 03:29:00.155517: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] OS X does not support NUMA - returning NUMA node zero
    2018-04-08 03:29:00.155661: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.645
    pciBusID: 0000:c4:00.0
    totalMemory: 11.00GiB freeMemory: 10.11GiB
    2018-04-08 03:29:00.155677: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
    2018-04-08 03:29:00.562343: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
    2018-04-08 03:29:00.562373: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0
    2018-04-08 03:29:00.562403: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N
    2018-04-08 03:29:00.562536: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9781 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:c4:00.0, compute capability: 6.1)
    2018-04-08 03:29:00.563022: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 9.55G (10256140800 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
    2018-04-08 03:29:00.868307: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
    2018-04-08 03:29:00.906005: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
    2018-04-08 03:29:00.973462: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
    59904/60000 [============================>.] - ETA: 0s - loss: 0.2624 - acc: 0.92022018-04-08 03:29:07.381067: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
    60000/60000 [==============================] - 8s 129us/step - loss: 0.2620 - acc: 0.9203 - val_loss: 0.0587 - val_acc: 0.9825
    Epoch 2/12
    60000/60000 [==============================] - 4s 66us/step - loss: 0.0891 - acc: 0.9733 - val_loss: 0.0437 - val_acc: 0.9850
    Epoch 3/12
    60000/60000 [==============================] - 4s 66us/step - loss: 0.0681 - acc: 0.9789 - val_loss: 0.0341 - val_acc: 0.9881
    Epoch 4/12
    60000/60000 [==============================] - 4s 67us/step - loss: 0.0569 - acc: 0.9829 - val_loss: 0.0398 - val_acc: 0.9859
    Epoch 5/12
    60000/60000 [==============================] - 4s 70us/step - loss: 0.0480 - acc: 0.9856 - val_loss: 0.0303 - val_acc: 0.9898
    Epoch 6/12
    60000/60000 [==============================] - 4s 66us/step - loss: 0.0438 - acc: 0.9869 - val_loss: 0.0288 - val_acc: 0.9897
    Epoch 7/12
    60000/60000 [==============================] - 4s 66us/step - loss: 0.0379 - acc: 0.9881 - val_loss: 0.0287 - val_acc: 0.9905
    Epoch 8/12
    60000/60000 [==============================] - 4s 66us/step - loss: 0.0357 - acc: 0.9892 - val_loss: 0.0277 - val_acc: 0.9915
    Epoch 9/12
    60000/60000 [==============================] - 4s 65us/step - loss: 0.0329 - acc: 0.9898 - val_loss: 0.0268 - val_acc: 0.9906
    Epoch 10/12
    60000/60000 [==============================] - 4s 66us/step - loss: 0.0312 - acc: 0.9903 - val_loss: 0.0295 - val_acc: 0.9911
    Epoch 11/12
    60000/60000 [==============================] - 4s 66us/step - loss: 0.0281 - acc: 0.9908 - val_loss: 0.0292 - val_acc: 0.9908
    Epoch 12/12
    60000/60000 [==============================] - 4s 65us/step - loss: 0.0277 - acc: 0.9917 - val_loss: 0.0260 - val_acc: 0.9919
    Test loss: 0.02598250026818114
    Test accuracy: 0.9919
    ```


    You can use [cuda-smi](https://github.com/phvu/cuda-smi) to watch the GPU memory usage. In the case of the mnist example in Keras, you should see the free memory drop to maybe 2% and the fans spin up. Not quite sure what the grappler/clusters/utils.cc:127 warning is, however.

    ```
    $ ./cuda-smi.dms
    Device 0 [PCIe 0:196:0.0]: GeForce GTX 1080 Ti (CC 6.1): 10350 of 11264 MB (i.e. 91.9%) Free
    # when GPU
    $ ./cuda-smi.dms
    Device 0 [PCIe 0:196:0.0]: GeForce GTX 1080 Ti (CC 6.1): 1181.1 of 11264 MB (i.e. 10.5%) Free
    ```
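
    The `CUDA_ERROR_OUT_OF_MEMORY` line in the training log above shows TensorFlow trying to grab nearly all free GPU memory up front. If that's a problem (e.g. when the GPU also drives a display), TF 1.x sessions can be told to allocate on demand instead; a minimal sketch:
    ``` bash
    python <<'EOF'
    import tensorflow as tf

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True  # grow allocations as needed instead of reserving ~all memory
    sess = tf.Session(config=config)
    print(sess)  # the device-creation log should show a much smaller initial allocation
    EOF
    ```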

    Tested on a MacBook Pro (15-inch, 2016) 10.13.4 (17E199) 2.7 GHz Intel Core i7 and NVIDIA GeForce GTX 1080 Ti 11 GiB