Skip to content

Instantly share code, notes, and snippets.

@mrjoshuak
Forked from tleyden/gist:9051a82f7bd124817995
Last active August 29, 2015 14:11
Show Gist options
  • Select an option

  • Save mrjoshuak/c0125cf3b9f79a3d0a39 to your computer and use it in GitHub Desktop.

Select an option

Save mrjoshuak/c0125cf3b9f79a3d0a39 to your computer and use it in GitHub Desktop.

Launch CoreOS on AWS GPU instance

  • Go to Launch CoreOS on AWS and find the HVM AMI you want to use, eg: ami-d878c3b0

  • Go to AWS control panel under EC2 instances

  • Launch a new instance

  • Under "Community AMIs", search for ami-d878c3b0

  • Select the GPU Instance: ami-d878c3b0

  • Increase root EBS store from 8 GB -> 20 GB to give yourself some breathing room

Run docker container in privileged mode

$ docker run --privileged=true -i -t ubuntu:12.04 /bin/bash

After the above command, you should be inside a shell in your docker container. The rest of the steps will assume that you are running them from inside your docker container.

Install build tools + other required packages

$ apt-get install build-essential wget git

Upgrade GCC to 4.7

Install

$ apt-get install software-properties-common python-software-properties 
$ add-apt-repository ppa:ubuntu-toolchain-r/test
$ apt-get update
$ apt-get install gcc-4.7 g++-4.7

Add GCC 4.7

$ update-alternatives --remove gcc /usr/bin/gcc-4.6
$ update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.7 60 --slave /usr/bin/g++ g++ /usr/bin/g++-4.7
$ update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.6 40 --slave /usr/bin/g++ g++ /usr/bin/g++-4.6

Make sure GCC 4.7 is the default alternative

$ update-alternatives --config gcc
...
* 0 /usr/bin/gcc-4.7 60 auto mode
...

It should have gcc-4.7 as the default choice with the asterisk next to it.

Get kernel source

$ mkdir -p /usr/src/kernels
$ cd /usr/src/kernels
$ git clone https://github.com/coreos/linux.git

Find the tag that corresponds to your CoreOS version. For example, currently the latest stable version is the v3.8 tag (I think).

$ git checkout v3.8

Compile kernel source -- Take 1

If you don't do this step, you'll get an error about not having a /usr/src/kernels/linux/include/linux/version.h file when you run NVIDIA-Linux-x86_64-340.29.run.

$ https://raw.githubusercontent.com/coreos/coreos-overlay/f0de215426eede4d479c732ea1f575d99b978f3a/sys-kernel/coreos-kernel/files/x86_64_defconfig-3.16.2
$ mv x86_64_defconfig-3.16.2 .config
$ make

Error

root@373c6d823a5b:/usr/src/kernels/linux# make
  HOSTCC  scripts/kconfig/conf.o
  ...
scripts/kconfig/conf --silentoldconfig Kconfig
.config:2620:warning: symbol value 'm' invalid for USB_OHCI_HCD_PCI
.config:2622:warning: symbol value 'm' invalid for USB_OHCI_HCD_PLATFORM
.config:2984:warning: symbol value 'm' invalid for XEN_TMEM
warning: (BLK_DEV_RBD && CEPH_FS) selects CEPH_LIB which has unmet direct dependencies (NET && INET && EXPERIMENTAL)
*
* Restart config...
*
*
* General setup
*
Prompt for development and/or incomplete code/drivers (EXPERIMENTAL) [N/y/?] (NEW) n

I gave up on trying to use the kernel config from coreos-overlay, and tried a different direction.

Compile kernel source -- Take 2

If you don't do this step, you'll get an error about not having a /usr/src/kernels/linux/include/linux/version.h file when you run NVIDIA-Linux-x86_64-340.29.run.

$ apt-get install libncurses5-dev
$ make menuconfig

When the screen pops up, just choose "Exit" and hit "Yes" when it asks you to save.

$ make

This works, and the kernel will compile. But it will end up in an error below when trying to run NVIDIA-Linux-x86_64-340.29.run.

Install nvidia driver

Download

$ mkdir -p /opt/nvidia
$ cd /opt/nvidia
$ wget http://developer.download.nvidia.com/compute/cuda/6_5/rel/installers/cuda_6.5.14_linux_64.run

Unpack

$ chmod +x cuda_6.5.14_linux_64.run
$ mkdir nvidia_installers
$ ./cuda_6.5.14_linux_64.run -extract=`pwd`/nvidia_installers

Install

$ cd nvidia_installers
$ ./NVIDIA-Linux-x86_64-340.29.run --kernel-source-path=/usr/src/kernels/linux/

Error

Using the "Take 2" method of compiling kernel source above, I got the following error when installing the kernel:

ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a
         version of gcc that differs from the one used to build the target kernel, or if a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module
         from obtaining ownership of the NVIDIA graphics device(s), or no NVIDIA GPU installed in this system is supported by this NVIDIA Linux graphics driver release.

         Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.

Verify

TODO

References

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment