@sean-smith
Last active May 8, 2019 23:12

Revisions

  1. Sean Smith revised this gist May 8, 2019. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions hpcg.md
    Original file line number Diff line number Diff line change
    @@ -180,6 +180,7 @@ cat $filename
    Build and push that Dockerfile with:

    ```bash
    $ $(aws ecr get-login --no-include-email --region us-east-1) # login w/ ecr
    $ make push
    ```

  2. Sean Smith revised this gist Apr 19, 2019. 1 changed file with 10 additions and 4 deletions.
    14 changes: 10 additions & 4 deletions hpcg.md
    @@ -89,9 +89,10 @@ uri=[URI from ECR console]

    build:
    	docker build -f $(distro)/Dockerfile -t pcluster-$(distro) .
    	docker build -t $(uri) .

    tag:
    	docker tag pcluster-$(distro) $(uri):$(distro)
    	docker tag $(uri) $(uri):$(distro)

    push: build tag
    	docker push $(uri):$(distro)
    @@ -114,7 +115,7 @@ Add the `AmazonEC2ContainerRegistryFullAccess` IAM Policy to the Master EC2 inst

    Now, create a `Dockerfile` with the following contents:

    ```docker
    ```dockerfile
    FROM pcluster-alinux:latest

    # Set the working directory to /work
    @@ -182,7 +183,7 @@ Build and push that dockerfile with
    $ make push
    ```

    Now you can submit an HPCG run
    Now you can submit an HPCG run like:

    ```bash
    $ awsbsub -e CASE_CORES=36 -n 2 -jn hpcg /work/run.s
    @@ -195,6 +196,11 @@ $ watch awsbstat
    ...
    jobId                                 jobName    status    startedAt    stoppedAt    exitCode
    ------------------------------------  ---------  --------  -----------  -----------  ----------
    222e21bb-a955-42c8-a45a-6d195db740b6  hello      RUNNABLE  -            -            -
    222e21bb-a955-42c8-a45a-6d195db740b6  hpcg       RUNNABLE  -            -            -
    ```

    And get the output, after it transitions to `RUNNING`, with:

    ```bash
    $ awsbout 222e21bb-a955-42c8-a45a-6d195db740b6
    ```
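
    As a rough sketch of how that check could be scripted, the function below pulls the status column for one job out of `awsbstat`-style output. The name `job_status` is hypothetical, and it assumes the whitespace-separated column order shown above (`jobId`, `jobName`, `status`, ...).

    ```bash
    # Hypothetical helper: print the status of one job.
    # Reads awsbstat-style rows on standard input and assumes the
    # column order jobId, jobName, status, ... (an assumption based
    # on the sample output, not a documented format).
    job_status() {
        awk -v id="$1" '$1 == id { print $3 }'
    }
    ```

    Piping the table through it, for example `awsbstat | job_status 222e21bb-a955-42c8-a45a-6d195db740b6`, would print only the status for that job, so `awsbout` can be run once it reads `RUNNING`.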
  3. Sean Smith revised this gist Apr 18, 2019. 1 changed file with 4 additions and 0 deletions.
    4 changes: 4 additions & 0 deletions hpcg.md
    @@ -69,6 +69,10 @@ i-07148c539c09ae9b8 c5n.18xlarge 10.0.1.171 -

    You can see there's one `c5n.18xlarge` instance running. This is because we set `min_vcpus = 72` and a `c5n.18xlarge` has 72 vCPUs; had we set `min_vcpus = 0`, there would be no hosts running.

    Now let's run through a basic hello world example to demonstrate how it works:

    https://aws-parallelcluster.readthedocs.io/en/latest/tutorials/03_batch_mpi.html

    Now, on the master instance, clone the parallelcluster repo:

    ```bash
  4. Sean Smith revised this gist Apr 18, 2019. 1 changed file with 86 additions and 2 deletions.
    88 changes: 86 additions & 2 deletions hpcg.md
    @@ -1,6 +1,6 @@
    # AWS ParallelCluster + AWS Batch

    Today I'm going to demonstrate running High Performance Conjugate Gradients (HPCG) in a containerized workload. This takes advantage of AWS ParallelCluster, AWS Batch, Spack and OpenMPI.
    Today I'm going to demonstrate running High Performance Conjugate Gradients (HPCG) in a containerized workload. This takes advantage of AWS ParallelCluster, AWS Batch, and OpenMPI.

    First install `aws-parallelcluster`:

    @@ -16,7 +16,7 @@ $ vim ~/.parallelcluster/config

    Add the following to this file. You'll need a public and a private subnet; see [Public Private Networking](https://github.com/aws/aws-parallelcluster/wiki/Public-Private-Networking) for instructions on how to set that up.

    ```
    ```ini
    [global]
    update_check = true
    sanity_check = true
    @@ -108,5 +108,89 @@ $ sudo service docker start

    Add the `AmazonEC2ContainerRegistryFullAccess` IAM policy to the master EC2 instance's IAM role.

    Now, create a `Dockerfile` with the following contents:

    ```docker
    FROM pcluster-alinux:latest
    # Set the working directory to /work
    WORKDIR /work
    # Copy the current directory contents into the container at /work
    COPY . /work
    ENV PATH=$PATH:/usr/lib64/openmpi/bin/
    # Install the build tools and OpenMPI
    RUN yum -y install awscli wget unzip gzip tar gcc gcc-c++ make
    RUN yum -y install openmpi openmpi-devel
    RUN yum groupinstall "Development Tools" -y
    # Download HPCG and build it with the Linux_MPI configuration
    RUN wget https://github.com/hpcg-benchmark/hpcg/archive/master.zip
    RUN unzip master.zip
    RUN hpcg-master/configure Linux_MPI
    RUN make
    RUN chmod 755 /work/run.s
    # Define the default case parameters
    ENV INSTANCETYPE c5n.18xlarge
    ENV CASE_CORES 36
    ENV CASE_NAME run1
    ENV CASE_SIZE 16
    ENV CASE_TIME 20
    ENTRYPOINT ["/parallelcluster/bin/entrypoint.sh"]
    ```

    And a file `run.s` with the following contents:

    ```bash
    #!/bin/sh

    echo "case time, size and cores"
    echo "CASE_NAME, $CASE_NAME"
    echo "CASE_TIME, $CASE_TIME"
    echo "CASE_SIZE, $CASE_SIZE"
    echo "CASE_CORES, $CASE_CORES"

    export PATH=.:$PATH
    export OMPI_MCA_btl_vader_single_copy_mechanism=none

    /usr/lib64/openmpi/bin/mpirun --allow-run-as-root -np $CASE_CORES -hostfile ${HOME}/hostfile /work/bin/xhpcg --nx=$CASE_SIZE --ny=$CASE_SIZE --nz=$CASE_SIZE --rt=$CASE_TIME

    rating_string=$( grep "with a GFLOP/s rating" HPCG*)

    length=${#rating_string}
    rating=$(echo $rating_string | cut -c62-$length )

    echo "rating=, $rating"
    middle="_"
    filename=$CASE_NAME$middle$CASE_CORES$middle$CASE_SIZE
    echo "$CASE_NAME, $CASE_CORES, $CASE_SIZE, $CASE_TIME, $rating" > $filename
    echo $filename
    cat $filename
    ```
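
    The `cut -c62-` step above depends on the exact character position of the rating within the matched line. A minimal alternative sketch, assuming the summary line contains `with a GFLOP/s rating of=<number>`, is to capture the number with `sed` instead. The function name `extract_rating` is hypothetical and not part of HPCG or the script above.

    ```bash
    # Hypothetical helper: print the number that follows "rating of="
    # on an HPCG summary line read from standard input.
    # Assumes the line contains "with a GFLOP/s rating of=<number>";
    # lines without that text produce no output.
    extract_rating() {
        sed -n 's/.*with a GFLOP\/s rating of=\([0-9.][0-9.]*\).*/\1/p'
    }
    ```

    In `run.s` this could be used as `rating=$(grep -h "with a GFLOP/s rating" HPCG* | extract_rating)`, which no longer depends on the length of any prefix in front of the summary text.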

    Build and push that Dockerfile with:

    ```bash
    $ make push
    ```

    Now you can submit an HPCG run

    ```bash
    $ awsbsub -e CASE_CORES=36 -n 2 -jn hpcg /work/run.s
    ```

    Watch the job to see when it transitions into running:

    ```bash
    $ watch awsbstat
    ...
    jobId                                 jobName    status    startedAt    stoppedAt    exitCode
    ------------------------------------  ---------  --------  -----------  -----------  ----------
    222e21bb-a955-42c8-a45a-6d195db740b6  hello      RUNNABLE  -            -            -
    ```

  5. Sean Smith revised this gist Apr 18, 2019. 1 changed file with 25 additions and 6 deletions.
    31 changes: 25 additions & 6 deletions hpcg.md
    @@ -69,11 +69,12 @@ i-07148c539c09ae9b8 c5n.18xlarge 10.0.1.171 -

    You can see there's one `c5n.18xlarge` instance running. This is because we set `min_vcpus = 72` and a `c5n.18xlarge` has 72 vCPUs; had we set `min_vcpus = 0`, there would be no hosts running.

    Go to the [ECR Console](https://console.aws.amazon.com/ecr/repositories) and find the repository with a name similar to `paral-docke-t6ayh0ia49nm` (you can sort by latest created).
    Now, on the master instance, clone the parallelcluster repo:

    ![image](https://user-images.githubusercontent.com/5545980/55993618-6b686700-5c64-11e9-85ee-a1ab267cce3b.png)

    Grab that URI; it should look like: `112850485306.dkr.ecr.us-east-1.amazonaws.com/paral-docke-t6ajh0ia39nm`
    ```bash
    $ git clone https://github.com/aws/aws-parallelcluster.git
    $ cd aws-parallelcluster/cli/pcluster/resources/batch/docker/
    ```

    Create a Makefile with the following contents:

    @@ -83,11 +84,29 @@ distro=alinux
    uri=[URI from ECR console]

    build:
    	docker build -f alinux/Dockerfile -t pcluster-$(distro) .
    	docker build -f $(distro)/Dockerfile -t pcluster-$(distro) .

    tag:
    	docker tag pcluster-$(distro) $(uri):$(distro)

    push: build tag
    	docker push $(uri):$(distro)
    ```
    ```

    To get that URI, go to the [ECR Console](https://console.aws.amazon.com/ecr/repositories) and find the repository with a name similar to `paral-docke-t6ayh0ia49nm` (you can sort by latest created).

    ![image](https://user-images.githubusercontent.com/5545980/55993618-6b686700-5c64-11e9-85ee-a1ab267cce3b.png)

    Grab that URI; it should look like: `112850485306.dkr.ecr.us-east-1.amazonaws.com/paral-docke-t6ajh0ia39nm`

    Install Docker:

    ```bash
    $ sudo yum install -y docker
    $ sudo service docker start
    ```

    Add the `AmazonEC2ContainerRegistryFullAccess` IAM policy to the master EC2 instance's IAM role.



  6. Sean Smith revised this gist Apr 18, 2019. 1 changed file with 86 additions and 4 deletions.
    90 changes: 86 additions & 4 deletions hpcg.md
    @@ -1,11 +1,93 @@
    ### AWS ParallelCluster + AWS Batch
    # AWS ParallelCluster + AWS Batch

    Today I'm going to demonstrate running High Performance Conjugate Gradient (HPCG) in a containerized workload. This takes advantage of AWS ParallelCluster, AWS Batch, Spack and OpenMPI.
    Today I'm going to demonstrate running High Performance Conjugate Gradients (HPCG) in a containerized workload. This takes advantage of AWS ParallelCluster, AWS Batch, Spack and OpenMPI.

    First install `aws-parallelcluster`:

    $ pip install aws-parallelcluster
    ```bash
    $ pip install aws-parallelcluster
    ```

    Edit the ParallelCluster config file to include the awsbatch cluster configuration:


    ```bash
    $ vim ~/.parallelcluster/config
    ```

    Add the following to this file. You'll need a public and a private subnet; see [Public Private Networking](https://github.com/aws/aws-parallelcluster/wiki/Public-Private-Networking) for instructions on how to set that up.

    ```
    [global]
    update_check = true
    sanity_check = true
    cluster_template = awsbatch
    [aws]
    aws_region_name = us-east-1
    [cluster awsbatch]
    scheduler = awsbatch
    key_name = [your key]
    min_vcpus = 72
    desired_vcpus = 72
    max_vcpus = 288
    vpc_settings = public-private
    master_instance_type = c5.xlarge
    compute_instance_type = c5n.18xlarge
    [vpc public-private]
    vpc_id = vpc-00d2e489741609bc2
    master_subnet_id = subnet-0152608e422c75189
    compute_subnet_id = subnet-0baadf9781f59a6a1
    ```

    Now, create the cluster:

    ```bash
    $ pcluster create hpcg
    Creating stack named: parallelcluster-hpcg
    Status: parallelcluster-hpcg - CREATE_COMPLETE
    ClusterUser: ec2-user
    MasterPublicIP: 54.35.249.0
    MasterPrivateIP: 10.0.0.35
    ```

    Once that's completed, SSH in. You may have to specify the key path with the `-i` flag if you're not using a default key.

    ```bash
    $ pcluster ssh hpcg -i ~/.ssh/id_rsa
    ```

    Running `awsbhosts` shows you the hosts that are running:

    ```bash
    [ec2-user@ip-10-0-0-182 ~]$ awsbhosts
    ec2InstanceId        instanceType    privateIpAddress    publicIpAddress      runningJobs
    -------------------  --------------  ------------------  -----------------  -------------
    i-07148c539c09ae9b8  c5n.18xlarge    10.0.1.171          -                              0
    ```

    You can see there's one `c5n.18xlarge` instance running. This is because we set `min_vcpus = 72` and a `c5n.18xlarge` has 72 vCPUs; had we set `min_vcpus = 0`, there would be no hosts running.
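
    The arithmetic behind that can be sketched in a few lines of shell. This is only an illustration: `instances_for_vcpus` is a hypothetical name, the default of 72 vCPUs per instance is the size of a `c5n.18xlarge`, and the real scaling decisions are made by AWS Batch, not by this calculation.

    ```bash
    # Hypothetical helper: how many instances are needed to cover a
    # given vCPU count, using ceiling division.
    # $1 = vCPUs requested, $2 = vCPUs per instance (defaults to 72,
    # the size of a c5n.18xlarge).
    instances_for_vcpus() {
        vcpus=$1
        per_instance=${2:-72}
        echo $(( (vcpus + per_instance - 1) / per_instance ))
    }
    ```

    With the config above, `instances_for_vcpus 72` gives 1 (the single running host), `instances_for_vcpus 0` gives 0, and `instances_for_vcpus 288` gives 4, the most `max_vcpus = 288` allows.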

    Go to the [ECR Console](https://console.aws.amazon.com/ecr/repositories) and find the repository with a name similar to `paral-docke-t6ayh0ia49nm` (you can sort by latest created).

    ![image](https://user-images.githubusercontent.com/5545980/55993618-6b686700-5c64-11e9-85ee-a1ab267cce3b.png)

    Grab that URI; it should look like: `112850485306.dkr.ecr.us-east-1.amazonaws.com/paral-docke-t6ajh0ia39nm`
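
    The URI also carries the region the repository lives in, which is useful when a later command needs a `--region` value. A small sketch, assuming the URI has the form `<account>.dkr.ecr.<region>.amazonaws.com/<repository>` as in the example above (`ecr_region` is a hypothetical name):

    ```bash
    # Hypothetical helper: print the region embedded in an ECR
    # repository URI of the assumed form
    # <account>.dkr.ecr.<region>.amazonaws.com/<repository>.
    # Prints nothing if the argument does not match that form.
    ecr_region() {
        echo "$1" | sed -n 's/^[0-9]*\.dkr\.ecr\.\([a-z0-9-]*\)\.amazonaws\.com\/.*$/\1/p'
    }
    ```

    For the example URI, `ecr_region 112850485306.dkr.ecr.us-east-1.amazonaws.com/paral-docke-t6ajh0ia39nm` prints `us-east-1`.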

    Create a Makefile with the following contents:

    ```make
    # Makefile
    distro=alinux
    uri=[URI from ECR console]

    build:
    	docker build -f alinux/Dockerfile -t pcluster-$(distro) .

    tag:
    	docker tag pcluster-$(distro) $(uri):$(distro)

    push: build tag
    	docker push $(uri):$(distro)
    ```
  7. Sean Smith created this gist Apr 18, 2019.
    11 changes: 11 additions & 0 deletions hpcg.md
    @@ -0,0 +1,11 @@
    ### AWS ParallelCluster + AWS Batch

    Today I'm going to demonstrate running High Performance Conjugate Gradient (HPCG) in a containerized workload. This takes advantage of AWS ParallelCluster, AWS Batch, Spack and OpenMPI.

    First install `aws-parallelcluster`:

    $ pip install aws-parallelcluster

    Edit the ParallelCluster config file to include the awsbatch cluster configuration: