rdegraci · June 29, 2017 13:36
diff --git a/MLDB Howto b/MLDB Howto

 apt-get update
 apt-get install -y \
  git \
  autoconf \
  build-essential \
  language-pack-en \
  libarchive-dev \
  libblas-dev \
  libboost-all-dev \
  libcap-dev \
  libcrypto++-dev \
  libcurl4-openssl-dev \
  libffi-dev \
  libmagic-dev \
  libfreetype6-dev \
  libgoogle-perftools-dev \
  liblapack-dev \
  liblzma-dev \
  libpng12-dev \
  libpq-dev \
  libpython-dev \
  libsasl2-dev \
  libssh2-1-dev \
  libtool \
  libyaml-cpp-dev \
  python-virtualenv \
  unzip \
  valgrind \
  uuid-dev \
  libxml++2.6-dev


 ssh-keygen -N "" -f /home/vagrant/.ssh/id_rsa


 # Docker
 apt-get update
 apt-get install -y \
    linux-image-extra-$(uname -r) \
    linux-image-extra-virtual

 sudo apt-get update
 apt-get install -y \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common

 curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

 apt-key fingerprint 0EBFCD88

 apt-get update
 add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"

 apt-get update
 apt-get install -y docker-ce
 apt-cache madison docker-ce


 docker run hello-world

 ###### Then





 # You will first need a GitHub account with SSH keys set up, because the repo uses SSH paths in its submodule configuration. You can test that your keys are correctly set up by running the following command and checking that the output contains "successfully authenticated":

 ssh -T git@github.com
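
 # A minimal sketch of automating that check. To my understanding, ssh -T against GitHub can exit with a non-zero status even when authentication succeeds, so this tests the message text instead of the exit code. The function name github_auth_ok is made up for illustration.

```shell
# Sketch: succeed only if the text on stdin contains GitHub's success message.
# github_auth_ok is an illustrative name, not part of git, ssh, or MLDB.
github_auth_ok() {
    grep -q "successfully authenticated"
}

# Intended use (needs network access and a configured key):
#   ssh -T git@github.com 2>&1 | github_auth_ok && echo "SSH key OK"
```
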
 # Note: the master branch is bleeding edge, and the demos or documentation may be slightly out of sync with the code at any given time. To avoid this, build the Community Edition from the latest tagged release, which is tracked by the release_latest branch.



 # NOTE: Occasionally, build-ordering issues creep in that don't affect the viability of the build but may cause make to fail. In that case, repeat the make -k compile step, which may complete successfully on a second pass. (The build order is regression tested, but those tests are run less frequently than the others.)

 # NOTE: Occasionally, tests fail spuriously, especially under high machine load when running time-sensitive tests, or because of network issues when accessing external resources. Repeating the make -k test step may allow them to pass. It is OK to use MLDB if not all tests pass; all code tagged for release has passed regression tests in the stable testing environment.
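
 # The "repeat the step" advice above can be sketched as a small retry wrapper. retry is a hypothetical helper, not something make or MLDB provides, and the attempt count of 2 in the usage line is an arbitrary choice.

```shell
# Sketch: run a command up to N times, stopping at the first success.
# retry is a hypothetical helper, not an MLDB or make feature.
retry() {
    attempts=$1
    shift
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $n of $attempts failed" >&2
        n=$((n + 1))
    done
    return 1
}

# Intended use, from the top of the mldb checkout:
#   retry 2 make -k compile
#   retry 2 make -k test
```

 # If the same failure shows up on every attempt, it is probably a real problem rather than a build-ordering or load issue.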

 # Build output lands in the build directory and there is no make clean target: you can just rm -rf build. You can speed up recompilation after deleting your build directory by using ccache, which can be installed with apt-get install ccache. You can then create a file at the top of the repo directory called local.mk with the following contents:

 COMPILER_CACHE:=ccache

 # N.B. To get the most out of ccache, set the cache size to something like 10 GB (if you have the disk space) with ccache -M 10G.
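
 # A minimal sketch of creating that file from the shell. write_local_mk is a made-up helper name; note that it overwrites any existing local.mk in the target directory.

```shell
# Sketch: write local.mk into a given directory (default: current directory).
# write_local_mk is a hypothetical helper name. It overwrites an existing file.
write_local_mk() {
    repo_dir=${1:-.}
    printf 'COMPILER_CACHE:=ccache\n' > "$repo_dir/local.mk"
}

# Intended use, from the top of the mldb checkout:
#   write_local_mk .
#   ccache -M 10G    # optional: raise the cache size limit, as noted above
```
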

 # To avoid building MLDB for all supported architectures and save time, check sample.local.mk

 # To have a faster build, you can use clang instead of gcc. Simply add toolchain=clang at the end of your make command.

 # To run a single test, specify its name as the make target. For Python and JavaScript tests, include the extension (.py or .js). For C++ tests, omit it.


 # Then clone and build. If you want a local.mk, create it at the top of the mldb directory after cloning and before running make:

 git clone git@github.com:mldbai/mldb.git
 cd mldb
 git checkout release_latest
 git submodule update --init --recursive
 make dependencies
 make -k compile
 make -k test
 # To speed things up, consider using the -j option in make to leverage multiple cores: make -j8 compile.
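
 # Rather than hard-coding -j8, the job count can be derived from the machine. This is a sketch only: cpu_count is a made-up name, and using one job per core is a common rule of thumb, not an MLDB recommendation.

```shell
# Sketch: print the number of online CPU cores, falling back to 1.
# cpu_count is a hypothetical helper name.
cpu_count() {
    n=$(nproc 2>/dev/null) || n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    echo "${n:-1}"
}

# Intended use, from the top of the mldb checkout:
#   make -j"$(cpu_count)" -k compile
```
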



 # Building a Docker image

 # Add your user to the docker group; otherwise you will need sudo to build the Docker image. The new group membership takes effect the next time you log in:

 sudo usermod -a -G docker `whoami`

 # To build a development Docker image just run the following command from the top level of this repo:

 nice make -j16 -k docker_mldb DOCKER_ALLOW_DIRTY=1

 # The final lines of output will give you a docker hash for this image, and the image is also tagged as <username>_latest where <username> is your Unix username on the box.

 # To run a development Docker image you just built, follow the Docker instructions from http://mldb.ai/doc/#builtin/Running.md.html, substituting <username>_latest for the latest tag and something unique to you for the mldb container name (<username> is a good candidate).

 # Docker images built this way will show the internal/experimental entities in the documentation. For external releases, the flag RUN_STRIP=-s is passed, which, as a side effect, hides the internal entities in the documentation.

 ##### THEN

 Step 1 - Launch an MLDB container with a mapped directory

 Note: the following procedure is meant to be run as a regular user; running the MLDB container as root is not recommended. See the official Docker documentation for more information about running containers from regular user accounts.

 First, create an empty directory on the host machine by running the following command, where </absolute/path/to/mldb_data> needs to be replaced by the absolute path on your local machine where you want your MLDB working directory to be:

 mkdir </absolute/path/to/mldb_data>

 You can now execute the following command, where <mldbport> is a port of your choice to be used in the next section (e.g. 8080).

 docker run --rm=true \
 -v </absolute/path/to/mldb_data>:/mldb_data \
 -e MLDB_IDS="`id`" \
 -p 127.0.0.1:<mldbport>:80 \
 quay.io/mldb/mldb:latest

 Once the container is booted, the path /mldb_data inside the container is mapped to </absolute/path/to/mldb_data> on the host machine, so MLDB can access the host file </absolute/path/to/mldb_data>/file.ext via the URL file:///mldb_data/file.ext. See the MLDB documentation for more about URLs.
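
 The mapping can be sketched as a small path translation. This assumes the container was started with the -v option shown above, so that the host directory is mounted at /mldb_data; the helper name host_path_to_mldb_url is made up for illustration.

```shell
# Sketch: translate a host path under the mapped directory into the
# file:// URL MLDB would use inside the container.
# host_path_to_mldb_url is a hypothetical helper name.
host_path_to_mldb_url() {
    host_dir=${1%/}      # mapped host directory, trailing slash removed
    host_path=$2         # absolute path of a file under host_dir
    case $host_path in
        "$host_dir"/*)
            rel=${host_path#"$host_dir"/}
            echo "file:///mldb_data/$rel"
            ;;
        *)
            echo "error: $host_path is not under $host_dir" >&2
            return 1
            ;;
    esac
}

# Example:
#   host_path_to_mldb_url /srv/mldb_data /srv/mldb_data/file.ext
#   prints file:///mldb_data/file.ext
```

 Files outside the mapped directory are not visible to the container, which is why the helper rejects them.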



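 # To run without a tunnel (a security risk, because this publishes the port on all of the host's network interfaces instead of only 127.0.0.1), use the following and connect to port 8080 on the host: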

 docker run --rm=true \
 -v </absolute/path/to/mldb_data>:/mldb_data \
 -e MLDB_IDS="`id`" \
 -p 8080:80 \
 quay.io/mldb/mldb:latest
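
 # The two docker run variants above differ only in the address given to -p. A sketch that prints (but does not run) the command for a chosen bind address makes that explicit. mldb_docker_cmd is a made-up helper name, the loopback default is this sketch's choice, and paths containing spaces are not handled.

```shell
# Sketch: print (not run) the docker command for a given data directory,
# host port, and optional bind address.
# mldb_docker_cmd is a hypothetical helper name.
mldb_docker_cmd() {
    data_dir=$1
    port=$2
    bind_addr=${3:-127.0.0.1}    # loopback only unless overridden
    case $data_dir in
        /*) ;;                   # absolute path: fine
        *)
            echo "error: data directory must be an absolute path" >&2
            return 1
            ;;
    esac
    echo "docker run --rm=true -v $data_dir:/mldb_data -e MLDB_IDS=\"\$(id)\" -p $bind_addr:$port:80 quay.io/mldb/mldb:latest"
}

# Examples:
#   mldb_docker_cmd /srv/mldb_data 8080            # local connections only
#   mldb_docker_cmd /srv/mldb_data 8080 0.0.0.0    # reachable from other hosts
```

 # Review the printed command before pasting it into a shell, in particular the bind address.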



 # But if you want security:

 Step 2 - Establish a tunnel (for remote servers)

 For security reasons, the instructions above will cause MLDB to only accept connections local to the host it was launched on. If you are not running MLDB on your workstation, you need to establish an SSH tunnel which forwards <localport> (e.g. 8080 again) from your workstation to <mldbport> on the remote host.

 This command will do this in a terminal on OSX and Linux, or on Windows using Git Bash, MinGW or Cygwin:

 ssh -f -o ExitOnForwardFailure=yes <user>@<remotehost> -L <localport>:127.0.0.1:<mldbport> -N

 To do this with PuTTY on Windows, see the PuTTY documentation and tutorial.
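
 As a sketch, the placeholders in the ssh command can be filled in by a small helper that prints the finished command. tunnel_cmd is a made-up name, and the values in the example (alice, example.com, port 8080) are placeholders, not real hosts or accounts.

```shell
# Sketch: print the ssh tunnel command for the given values.
# tunnel_cmd is a hypothetical helper name.
tunnel_cmd() {
    user=$1
    remote_host=$2
    local_port=$3
    mldb_port=$4
    # Reject empty or non-numeric ports before building the command.
    case $local_port in ''|*[!0-9]*) echo "error: bad local port" >&2; return 1 ;; esac
    case $mldb_port in ''|*[!0-9]*) echo "error: bad MLDB port" >&2; return 1 ;; esac
    echo "ssh -f -o ExitOnForwardFailure=yes $user@$remote_host -L $local_port:127.0.0.1:$mldb_port -N"
}

# Example:
#   tunnel_cmd alice example.com 8080 8080
```
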

 Step 3 - Activate MLDB

 When the line "MLDB Ready" appears in the console output, point your browser to http://localhost:<localport>/ and follow the instructions shown there.
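
 Waiting for that line can be scripted. The sketch below reads console output from stdin and returns as soon as "MLDB Ready" appears. wait_for_ready is a made-up name, and the usage line assumes the container was started with a name you chose (the docker run commands above do not set one).

```shell
# Sketch: read log lines from stdin and succeed once "MLDB Ready" is seen.
# Returns 1 if the input ends without that line.
# wait_for_ready is a hypothetical helper name.
wait_for_ready() {
    while IFS= read -r line; do
        case $line in
            *"MLDB Ready"*) return 0 ;;
        esac
    done
    return 1
}

# Intended use, assuming a container name of your choosing:
#   docker logs -f <container> 2>&1 | wait_for_ready && echo "MLDB is up"
```
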
