Last active
May 8, 2019 23:12
Revisions
Sean Smith revised this gist
May 8, 2019 . 1 changed file with 1 addition and 0 deletions.
    @@ -180,6 +180,7 @@ cat $filename

    Build and push that dockerfile with

    ```bash
    $ $(aws ecr get-login --no-include-email --region us-east-1) # login w/ ecr
    $ make push
    ```
Sean Smith revised this gist
Apr 19, 2019 . 1 changed file with 10 additions and 4 deletions.
    @@ -89,9 +89,10 @@ uri=[URI from ECR console]

    build:
    	docker build -f $(distro)/Dockerfile -t pcluster-$(distro) .
    	docker build -t $(uri) .

    tag:
    	docker tag $(uri) $(uri):$(distro)

    push: build tag
    	docker push $(uri):$(distro)

    @@ -114,7 +115,7 @@ Add the `AmazonEC2ContainerRegistryFullAccess` IAM Policy to the Master EC2 inst

    Now, create a `Dockerfile` with the following contents:

    ```dockerfile
    FROM pcluster-alinux:latest

    # Set the working directory to /work

    @@ -182,7 +183,7 @@ Build and push that dockerfile with

    $ make push
    ```

    Now you can submit an HPCG run like:

    ```bash
    $ awsbsub -e CASE_CORES=36 -n 2 -jn hpcg /work/run.s

    @@ -195,6 +196,11 @@ $ watch awsbstat

    ...
    jobId                                 jobName    status    startedAt    stoppedAt    exitCode
    ------------------------------------  ---------  --------  -----------  -----------  ----------
    222e21bb-a955-42c8-a45a-6d195db740b6  hpcg       RUNNABLE  -            -            -
    ```

    And get the output, after it transitions to `RUNNING`, with:

    ```bash
    $ awsbout 222e21bb-a955-42c8-a45a-6d195db740b6
    ```
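As a rough sketch of how the wait for `RUNNING` could be scripted, the helper below pulls the `status` column for one job out of `awsbstat`-style table text read on stdin. The name `job_status` is hypothetical and not part of ParallelCluster; it assumes the table layout shown in this revision (job ID in the first column, status in the third), and the sample text is canned rather than live output.

```bash
#!/bin/bash
# Hypothetical helper: print the status column for one job ID.
# Assumes awsbstat-style rows: jobId, jobName, status, ...
job_status() {
    awk -v id="$1" '$1 == id { print $3 }'
}

# Canned sample copied from the table in this revision (not live output).
sample='jobId                                 jobName    status
------------------------------------  ---------  --------
222e21bb-a955-42c8-a45a-6d195db740b6  hpcg       RUNNABLE'

printf '%s\n' "$sample" | job_status 222e21bb-a955-42c8-a45a-6d195db740b6
# prints: RUNNABLE
```

On a real cluster the same function would presumably be fed from `awsbstat | job_status <jobId>`, with `awsbout <jobId>` run once it prints `RUNNING`.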
Sean Smith revised this gist
Apr 18, 2019 . 1 changed file with 4 additions and 0 deletions.
    @@ -69,6 +69,10 @@ i-07148c539c09ae9b8 c5n.18xlarge 10.0.1.171 -

    You can see there's one `c5n.18xlarge` instance running. This is because we set `min_vcpus = 72`; had we set `min_vcpus = 0`, there would be no hosts running.

    Now let's run through a basic hello world example to demonstrate how it works:

    https://aws-parallelcluster.readthedocs.io/en/latest/tutorials/03_batch_mpi.html

    Now, on the master instance clone the parallelcluster repo:

    ```bash
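The link between `min_vcpus` and the number of running hosts comes down to ceiling division by the instance size. The sketch below works that out for a `c5n.18xlarge`, which has 72 vCPUs, using the `min_vcpus = 72` and `max_vcpus = 288` values from the cluster config. The function name `instances_for` is hypothetical, and the real scaling decisions are made by AWS Batch, so treat this as arithmetic only.

```bash
#!/bin/bash
# c5n.18xlarge exposes 72 vCPUs per instance.
vcpus_per_instance=72

# Hypothetical helper: instances needed to cover a vCPU count (ceiling division).
instances_for() {
    echo $(( ($1 + vcpus_per_instance - 1) / vcpus_per_instance ))
}

for v in 0 72 288; do
    echo "$v vCPUs -> $(instances_for "$v") instance(s)"
done
# prints:
# 0 vCPUs -> 0 instance(s)
# 72 vCPUs -> 1 instance(s)
# 288 vCPUs -> 4 instance(s)
```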
Sean Smith revised this gist
Apr 18, 2019 . 1 changed file with 86 additions and 2 deletions.
    @@ -1,6 +1,6 @@
    # AWS ParallelCluster + AWS Batch

    Today I'm going to demonstrate running High Performance Conjugate Gradients (HPCG) in a containerized workload. This takes advantage of AWS ParallelCluster, AWS Batch, and OpenMPI.

    First install `aws-parallelcluster`:

    @@ -16,7 +16,7 @@ $ vim ~/.parallelcluster/config

    Add the following to this file. You'll need a public and private subnet; see [Public Private Networking](https://github.com/aws/aws-parallelcluster/wiki/Public-Private-Networking) for instructions on how to set that up.

    ```ini
    [global]
    update_check = true
    sanity_check = true

    @@ -108,5 +108,89 @@ $ sudo service docker start

    Add the `AmazonEC2ContainerRegistryFullAccess` IAM Policy to the Master EC2 instance:

    Now, create a `Dockerfile` with the following contents:

    ```docker
    FROM pcluster-alinux:latest

    # Set the working directory to /work
    WORKDIR /work

    # Copy the current directory contents into the container at /work
    COPY . /work

    ENV PATH=$PATH:/usr/lib64/openmpi/bin/

    # Install the build tools and OpenMPI
    RUN yum -y install awscli wget unzip gzip tar gcc gcc-c++ make
    RUN yum -y install openmpi openmpi-devel
    RUN yum groupinstall "Development Tools" -y

    # Download and build HPCG
    RUN wget https://github.com/hpcg-benchmark/hpcg/archive/master.zip
    RUN unzip master.zip
    RUN hpcg-master/configure Linux_MPI
    RUN make
    RUN chmod 755 /work/run.s

    # Define environment variables
    ENV INSTANCETYPE c5n.18xlarge
    ENV CASE_CORES 36
    ENV CASE_NAME run1
    ENV CASE_SIZE 16
    ENV CASE_TIME 20

    ENTRYPOINT ["/parallelcluster/bin/entrypoint.sh"]
    ```

    And a file `run.s` with the following contents:

    ```bash
    #!/bin/sh
    echo "case time, size and cores"
    echo "CASE_NAME, $CASE_NAME"
    echo "CASE_TIME, $CASE_TIME"
    echo "CASE_SIZE, $CASE_SIZE"
    echo "CASE_CORES, $CASE_CORES"
    export PATH=.:$PATH
    export OMPI_MCA_btl_vader_single_copy_mechanism=none

    /usr/lib64/openmpi/bin/mpirun --allow-run-as-root -np $CASE_CORES -hostfile ${HOME}/hostfile /work/bin/xhpcg --nx=$CASE_SIZE --ny=$CASE_SIZE --nz=$CASE_SIZE --rt=$CASE_TIME

    rating_string=$( grep "with a GFLOP/s rating" HPCG*)
    length=${#rating_string}
    rating=$(echo $rating_string | cut -c62-$length )
    echo "rating=, $rating"
    middle="_"
    filename=$CASE_NAME$middle$CASE_CORES$middle$CASE_SIZE
    echo "$CASE_NAME, $CASE_CORES, $CASE_SIZE, $CASE_TIME, $rating" > $filename
    echo $filename
    cat $filename
    ```

    Build and push that dockerfile with

    ```bash
    $ make push
    ```

    Now you can submit an HPCG run

    ```bash
    $ awsbsub -e CASE_CORES=36 -n 2 -jn hpcg /work/run.s
    ```

    Watch the job to see when it transitions into running:

    ```bash
    $ watch awsbstat
    ...
    jobId                                 jobName    status    startedAt    stoppedAt    exitCode
    ------------------------------------  ---------  --------  -----------  -----------  ----------
    222e21bb-a955-42c8-a45a-6d195db740b6  hello      RUNNABLE  -            -            -
    ```
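The `cut -c62-` step in `run.s` depends on the rating starting at exactly character 62 of the matched line, which breaks if `grep` prefixes a filename or the wording shifts. A possible alternative, sketched below, takes whatever follows the last `=` on the first matching line. The function name `extract_rating` is hypothetical, the sample line and its number are made up for illustration, and the approach assumes the rating is the final `=`-separated field on that line.

```bash
#!/bin/bash
# Hypothetical helper: print the text after the last "=" on the first
# line that mentions the GFLOP/s rating. Reads the files given as
# arguments, or stdin when none are given.
extract_rating() {
    grep -h "with a GFLOP/s rating" "$@" | head -n 1 | sed 's/.*=//'
}

# Illustrative line only; the value is invented, not a measured result.
sample='Final Summary::HPCG result is VALID with a GFLOP/s rating of=42.1234'
printf '%s\n' "$sample" | extract_rating
# prints: 42.1234
```

In `run.s` this would stand in for the three `rating_string`, `length` and `rating` lines, called as `rating=$(extract_rating HPCG*)`.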
Sean Smith revised this gist
Apr 18, 2019 . 1 changed file with 25 additions and 6 deletions.
    @@ -69,11 +69,12 @@ i-07148c539c09ae9b8 c5n.18xlarge 10.0.1.171 -

    You can see there's one `c5n.18xlarge` instance running. This is because we set `min_vcpus = 72`; had we set `min_vcpus = 0`, there would be no hosts running.

    Now, on the master instance clone the parallelcluster repo:

    ```bash
    $ git clone https://github.com/aws/aws-parallelcluster.git
    $ cd aws-parallelcluster/cli/pcluster/resources/batch/docker/
    ```

    Create a Makefile with the following contents:

    @@ -83,11 +84,29 @@ distro=alinux

    uri=[URI from ECR console]

    build:
    	docker build -f $(distro)/Dockerfile -t pcluster-$(distro) .

    tag:
    	docker tag pcluster-$(distro) $(uri):$(distro)

    push: build tag
    	docker push $(uri):$(distro)
    ```

    To get that URI, go to the [ECR Console](https://console.aws.amazon.com/ecr/repositories) and find a repository with a name similar to `paral-docke-t6ayh0ia49nm` (you can sort by latest created).

    Grab that URI. It should look like:

    `112850485306.dkr.ecr.us-east-1.amazonaws.com/paral-docke-t6ajh0ia39nm`

    Install docker

    ```bash
    $ sudo yum install -y docker
    $ sudo service docker start
    ```

    Add the `AmazonEC2ContainerRegistryFullAccess` IAM Policy to the Master EC2 instance:
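An ECR URI of the shape shown here packs the account ID, region and repository name into one string, so the pieces can be recovered with plain string handling instead of being typed twice. The sketch below does that for the example URI from this revision. The helper names `ecr_region` and `ecr_registry` are hypothetical, and it assumes the URI follows the `<account>.dkr.ecr.<region>.amazonaws.com/<repository>` pattern.

```bash
#!/bin/bash
# Hypothetical helpers for splitting an ECR repository URI of the form
# <account>.dkr.ecr.<region>.amazonaws.com/<repository>.

# The region is the fourth dot-separated field.
ecr_region() {
    printf '%s\n' "$1" | cut -d. -f4
}

# The registry host is everything before the first "/".
ecr_registry() {
    printf '%s\n' "${1%%/*}"
}

uri='112850485306.dkr.ecr.us-east-1.amazonaws.com/paral-docke-t6ajh0ia39nm'
ecr_region "$uri"
# prints: us-east-1
ecr_registry "$uri"
# prints: 112850485306.dkr.ecr.us-east-1.amazonaws.com
```

The region printed here is the value any ECR login command would need to be pointed at for this registry.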
Sean Smith revised this gist
Apr 18, 2019 . 1 changed file with 86 additions and 4 deletions.
    @@ -1,11 +1,93 @@
    # AWS ParallelCluster + AWS Batch

    Today I'm going to demonstrate running High Performance Conjugate Gradients (HPCG) in a containerized workload. This takes advantage of AWS ParallelCluster, AWS Batch, Spack and OpenMPI.

    First install `aws-parallelcluster`:

    ```bash
    $ pip install aws-parallelcluster
    ```

    Edit the file to include the awsbatch cluster configuration:

    ```bash
    $ vim ~/.parallelcluster/config
    ```

    Add the following to this file. You'll need a public and private subnet; see [Public Private Networking](https://github.com/aws/aws-parallelcluster/wiki/Public-Private-Networking) for instructions on how to set that up.

    ```
    [global]
    update_check = true
    sanity_check = true
    cluster_template = awsbatch

    [aws]
    aws_region_name = us-east-1

    [cluster awsbatch]
    scheduler = awsbatch
    key_name = [your key]
    min_vcpus = 72
    desired_vcpus = 72
    max_vcpus = 288
    vpc_settings = public-private
    master_instance_type = c5.xlarge
    compute_instance_type = c5n.18xlarge

    [vpc public-private]
    vpc_id = vpc-00d2e489741609bc2
    master_subnet_id = subnet-0152608e422c75189
    compute_subnet_id = subnet-0baadf9781f59a6a1
    ```

    Now, create the cluster:

    ```bash
    $ pcluster create awsbatch-cluster
    Creating stack named: parallelcluster-hpcg
    Status: parallelcluster-hpcg - CREATE_COMPLETE
    ClusterUser: ec2-user
    MasterPublicIP: 54.35.249.0
    MasterPrivateIP: 10.0.0.35
    ```

    Once that's completed, ssh in. You may have to specify the keypath with the `-i` flag if you're not using a default key.

    ```bash
    $ pcluster ssh awsbatch -i ~/.ssh/id_rsa
    ```

    Running `awsbhosts` shows you the hosts that are running:

    ```bash
    [ec2-user@ip-10-0-0-182 ~]$ awsbhosts
    ec2InstanceId        instanceType    privateIpAddress    publicIpAddress    runningJobs
    -------------------  --------------  ------------------  -----------------  -------------
    i-07148c539c09ae9b8  c5n.18xlarge    10.0.1.171          -                  0
    ```

    You can see there's one `c5n.18xlarge` instance running. This is because we set `min_vcpus = 72`; had we set `min_vcpus = 0`, there would be no hosts running.

    Go to the [ECR Console](https://console.aws.amazon.com/ecr/repositories) and find a repository with a name similar to `paral-docke-t6ayh0ia49nm` (you can sort by latest created).

    Grab that URI. It should look like:

    `112850485306.dkr.ecr.us-east-1.amazonaws.com/paral-docke-t6ajh0ia39nm`

    Create a Makefile with the following contents:

    ```make
    # Makefile
    distro=alinux
    uri=[URI from ECR console]

    build:
    	docker build -f alinux/Dockerfile -t pcluster-$(distro) .

    tag:
    	docker tag pcluster-$(distro) $(uri):$(distro)

    push: build tag
    	docker push $(uri):$(distro)
    ```
Sean Smith created this gist
Apr 18, 2019
    @@ -0,0 +1,11 @@
    ### AWS ParallelCluster + AWS Batch

    Today I'm going to demonstrate running High Performance Conjugate Gradient (HPCG) in a containerized workload. This takes advantage of AWS ParallelCluster, AWS Batch, Spack and OpenMPI.

    First install `aws-parallelcluster`:

    $ pip install aws-parallelcluster

    Edit the file to include the awsbatch cluster configuration: