@matsen
Last active October 1, 2015 12:19

Revisions

  1. matsen revised this gist Oct 1, 2015. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion response-to-exascale-hpc-rfi.md
    @@ -38,7 +38,7 @@ NSF cash resulting in an excellent product. Their Agave API

    ### Parallel architectures require parallel algorithms.
    This is self-explanatory. In my field of Bayesian phylogenetics, for example,
    - all algorithms in common use use Markov chain Monte Carlo, which is an
    + all algorithms in common use utilize Markov chain Monte Carlo, which is an
    inherently serial algorithm. If we are to use large scale architecture we are
    going to need algorithms appropriate for that architecture.

  2. matsen revised this gist Sep 30, 2015. 1 changed file with 10 additions and 10 deletions.
    20 changes: 10 additions & 10 deletions response-to-exascale-hpc-rfi.md
    @@ -21,7 +21,7 @@ There is a clear antidote to this problem, which are software
    containers. These are lightweight virtual machines which can run on a
    variety of platforms. Docker is the most well known, and the community
    (including Docker) is coalescing under a standard. See
    - https://www.opencontainers.org/ for details. Using containers, all
    + <https://www.opencontainers.org/> for details. Using containers, all
    dependencies are cached and the pipeline can be run reliably into the
    future.

    @@ -31,10 +31,10 @@ an entirely trivial technical consideration.

    ### iPlant collaborative is already doing a visionary job.
    As you are no doubt aware, the iPlant Collaborative environment
    - (http://www.iplantcollaborative.org/) and the TACC under the leadership
    + (<http://www.iplantcollaborative.org/>) and the TACC under the leadership
    of Dan Stanzione is a remarkable example of smart people given a chunk of
    NSF cash resulting in an excellent product. Their Agave API
    - (http://agaveapi.co/) points the way to the future.
    + (<http://agaveapi.co/>) points the way to the future.

    ### Parallel architectures require parallel algorithms.
    This is self-explanatory. In my field of Bayesian phylogenetics, for example,
    @@ -43,10 +43,10 @@ inherently serial algorithm. If we are to use large scale architecture we are
    going to need algorithms appropriate for that architecture.

    ### Algorithms can give > 100 fold improvement without additional infrastructure.
    - As we have seen recently with the development of *kallisto* by Bray et al
    - (http://arxiv.org/abs/1505.02710) algorithms can change problems from requiring
    - a cluster to being quite do-able on a laptop. [Note: I understand that
    - *kallisto* isn't the same as doing a full analysis with cufflinks, etc, but for
    - many common applications it appears to do a fine job.] Thus, I hope that novel
    - algorithm development for core computational problems will be part of any
    - investment in computing infrastructure.
    + As we have seen recently with the development of [*kallisto* by Bray et
    + al](http://arxiv.org/abs/1505.02710) algorithms can change problems from
    + requiring a cluster to being quite do-able on a laptop. [Note: I understand
    + that running *kallisto* isn't the same as doing a full analysis with cufflinks,
    + etc, but for many common applications it appears to do a fine job.] Thus,
    + I hope that novel algorithm development for core computational problems will be
    + part of any investment in computing infrastructure.
  3. matsen revised this gist Sep 30, 2015. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion response-to-exascale-hpc-rfi.md
    @@ -40,7 +40,7 @@ NSF cash resulting in an excellent product. Their Agave API
    This is self-explanatory. In my field of Bayesian phylogenetics, for example,
    all algorithms in common use use Markov chain Monte Carlo, which is an
    inherently serial algorithm. If we are to use large scale architecture we are
    - going to need corresponding algorithms.
    + going to need algorithms appropriate for that architecture.

    ### Algorithms can give > 100 fold improvement without additional infrastructure.
    As we have seen recently with the development of *kallisto* by Bray et al
  4. matsen revised this gist Sep 30, 2015. 1 changed file with 5 additions and 4 deletions.
    9 changes: 5 additions & 4 deletions response-to-exascale-hpc-rfi.md
    @@ -3,9 +3,9 @@
    ### We can piggyback on the coding development community.
    Many good things are happening in open source and industry, and we face many of
    the same issues that they do. For example, GitHub has provided enormous value
    - to science, both through just filling a need we have and by direct engagement.
    - It has gotten almost unbelievably popular in the computational life sciences.
    - However, other tools such as continuous integration, for example by [Travis
    + to science, both through filling a need and by direct engagement. It has gotten
    + almost unbelievably popular in the computational life sciences. However, other
    + tools such as continuous integration, for example by [Travis
    CI](http://travis-ci.org/), or containers, for example by
    [Docker](http://docker.com), have gotten less traction despite the
    contributions they could offer to the scientific community.
    @@ -39,7 +39,8 @@ NSF cash resulting in an excellent product. Their Agave API
    ### Parallel architectures require parallel algorithms.
    This is self-explanatory. In my field of Bayesian phylogenetics, for example,
    all algorithms in common use use Markov chain Monte Carlo, which is an
    - inherently serial algorithm.
    + inherently serial algorithm. If we are to use large scale architecture we are
    + going to need corresponding algorithms.

    ### Algorithms can give > 100 fold improvement without additional infrastructure.
    As we have seen recently with the development of *kallisto* by Bray et al
  5. matsen revised this gist Sep 30, 2015. 1 changed file with 0 additions and 3 deletions.
    3 changes: 0 additions & 3 deletions response-to-exascale-hpc-rfi.md
    @@ -29,21 +29,18 @@ future.
    expansion of computing can run software containers.* This isn't
    an entirely trivial technical consideration.


    ### iPlant collaborative is already doing a visionary job.
    As you are no doubt aware, the iPlant Collaborative environment
    (http://www.iplantcollaborative.org/) and the TACC under the leadership
    of Dan Stanzione is a remarkable example of smart people given a chunk of
    NSF cash resulting in an excellent product. Their Agave API
    (http://agaveapi.co/) points the way to the future.


    ### Parallel architectures require parallel algorithms.
    This is self-explanatory. In my field of Bayesian phylogenetics, for example,
    all algorithms in common use use Markov chain Monte Carlo, which is an
    inherently serial algorithm.


    ### Algorithms can give > 100 fold improvement without additional infrastructure.
    As we have seen recently with the development of *kallisto* by Bray et al
    (http://arxiv.org/abs/1505.02710) algorithms can change problems from requiring
  6. matsen revised this gist Sep 30, 2015. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions response-to-exascale-hpc-rfi.md
    @@ -50,5 +50,5 @@ As we have seen recently with the development of *kallisto* by Bray et al
    a cluster to being quite do-able on a laptop. [Note: I understand that
    *kallisto* isn't the same as doing a full analysis with cufflinks, etc, but for
    many common applications it appears to do a fine job.] Thus, I hope that novel
    - algorithm development for core computational problems will be part of any investment in
    - computing infrastructure.
    + algorithm development for core computational problems will be part of any
    + investment in computing infrastructure.
  7. matsen revised this gist Sep 30, 2015. No changes.
  8. matsen revised this gist Sep 30, 2015. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion response-to-exascale-hpc-rfi.md
    @@ -50,5 +50,5 @@ As we have seen recently with the development of *kallisto* by Bray et al
    a cluster to being quite do-able on a laptop. [Note: I understand that
    *kallisto* isn't the same as doing a full analysis with cufflinks, etc, but for
    many common applications it appears to do a fine job.] Thus, I hope that novel
    - algorithm development for core problems will be part of any investment in
    + algorithm development for core computational problems will be part of any investment in
    computing infrastructure.
  9. matsen revised this gist Sep 30, 2015. 1 changed file with 7 additions and 6 deletions.
    13 changes: 7 additions & 6 deletions response-to-exascale-hpc-rfi.md
    @@ -1,13 +1,16 @@
    ## Response to [*Science Drivers Requiring Capable Exascale High Performance Computing* RFI](http://grants.nih.gov/grants/guide/notice-files/NOT-GM-15-122.html)

    ### We can piggyback on the coding development community.
    - Many good things are happening in open source and industry, and we face many of the same issues that they do.
    - For example, GitHub has provided enormous value to science, both through just filling a need we have and by direct engagement.
    + Many good things are happening in open source and industry, and we face many of
    + the same issues that they do. For example, GitHub has provided enormous value
    + to science, both through just filling a need we have and by direct engagement.
    It has gotten almost unbelievably popular in the computational life sciences.
    - However, other tools such as continuous integration, for example by [Travis CI](http://travis-ci.org/), or containers, for example by [Docker](http://docker.com), have gotten less traction despite the contributions they could offer to the scientific community.
    + However, other tools such as continuous integration, for example by [Travis
    + CI](http://travis-ci.org/), or containers, for example by
    + [Docker](http://docker.com), have gotten less traction despite the
    + contributions they could offer to the scientific community.

    ### We need a strategy to fight bit-rot.

    "Bit-rot" refers to software/pipelines that become unusable because the
    underlying dependencies have changed. Sometimes the old versions
    disappear completely, meaning that old pipelines cannot be reconstructed
    @@ -28,7 +31,6 @@ an entirely trivial technical consideration.


    ### iPlant collaborative is already doing a visionary job.

    As you are no doubt aware, the iPlant Collaborative environment
    (http://www.iplantcollaborative.org/) and the TACC under the leadership
    of Dan Stanzione is a remarkable example of smart people given a chunk of
    @@ -43,7 +45,6 @@ inherently serial algorithm.


    ### Algorithms can give > 100 fold improvement without additional infrastructure.

    As we have seen recently with the development of *kallisto* by Bray et al
    (http://arxiv.org/abs/1505.02710) algorithms can change problems from requiring
    a cluster to being quite do-able on a laptop. [Note: I understand that
  10. matsen revised this gist Sep 30, 2015. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion response-to-exascale-hpc-rfi.md
    @@ -31,7 +31,7 @@ an entirely trivial technical consideration.

    As you are no doubt aware, the iPlant Collaborative environment
    (http://www.iplantcollaborative.org/) and the TACC under the leadership
    - of Dan Stanzione is a remarkble example of smart people given a chunk of
    + of Dan Stanzione is a remarkable example of smart people given a chunk of
    NSF cash resulting in an excellent product. Their Agave API
    (http://agaveapi.co/) points the way to the future.

  11. matsen revised this gist Sep 30, 2015. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion response-to-exascale-hpc-rfi.md
    @@ -24,7 +24,7 @@ future.

    *For NIH computational strategies, we need to make sure that any proposed
    expansion of computing can run software containers.* This isn't
    - an entirely trivial consideration.
    + an entirely trivial technical consideration.


    ### iPlant collaborative is already doing a visionary job.
  12. matsen revised this gist Sep 30, 2015. 1 changed file with 6 additions and 0 deletions.
    6 changes: 6 additions & 0 deletions response-to-exascale-hpc-rfi.md
    @@ -1,5 +1,11 @@
    ## Response to [*Science Drivers Requiring Capable Exascale High Performance Computing* RFI](http://grants.nih.gov/grants/guide/notice-files/NOT-GM-15-122.html)

    + ### We can piggyback on the coding development community.
    + Many good things are happening in open source and industry, and we face many of the same issues that they do.
    + For example, GitHub has provided enormous value to science, both through just filling a need we have and by direct engagement.
    + It has gotten almost unbelievably popular in the computational life sciences.
    + However, other tools such as continuous integration, for example by [Travis CI](http://travis-ci.org/), or containers, for example by [Docker](http://docker.com), have gotten less traction despite the contributions they could offer to the scientific community.

    ### We need a strategy to fight bit-rot.

    "Bit-rot" refers to software/pipelines that become unusable because the
  13. matsen revised this gist Sep 30, 2015. No changes.
  14. matsen renamed this gist Sep 30, 2015. 1 changed file with 0 additions and 0 deletions.
    File renamed without changes.
  15. matsen revised this gist Sep 30, 2015. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions test.md
    @@ -1,3 +1,5 @@
    + ## Response to [*Science Drivers Requiring Capable Exascale High Performance Computing* RFI](http://grants.nih.gov/grants/guide/notice-files/NOT-GM-15-122.html)

    ### We need a strategy to fight bit-rot.

    "Bit-rot" refers to software/pipelines that become unusable because the
  16. matsen created this gist Sep 30, 2015.
    45 changes: 45 additions & 0 deletions test.md
    @@ -0,0 +1,45 @@
    ### We need a strategy to fight bit-rot.

    "Bit-rot" refers to software/pipelines that become unusable because the
    underlying dependencies have changed. Sometimes the old versions
    disappear completely, meaning that old pipelines cannot be reconstructed
    without digging through the internet archive. Reproducibility is
    fundamental to science, and thus this problem is acute.

    There is a clear antidote to this problem, which are software
    containers. These are lightweight virtual machines which can run on a
    variety of platforms. Docker is the most well known, and the community
    (including Docker) is coalescing under a standard. See
    https://www.opencontainers.org/ for details. Using containers, all
    dependencies are cached and the pipeline can be run reliably into the
    future.

    *For NIH computational strategies, we need to make sure that any proposed
    expansion of computing can run software containers.* This isn't
    an entirely trivial consideration.


    ### iPlant collaborative is already doing a visionary job.

    As you are no doubt aware, the iPlant Collaborative environment
    (http://www.iplantcollaborative.org/) and the TACC under the leadership
    of Dan Stanzione is a remarkble example of smart people given a chunk of
    NSF cash resulting in an excellent product. Their Agave API
    (http://agaveapi.co/) points the way to the future.


    ### Parallel architectures require parallel algorithms.
    This is self-explanatory. In my field of Bayesian phylogenetics, for example,
    all algorithms in common use use Markov chain Monte Carlo, which is an
    inherently serial algorithm.


    ### Algorithms can give > 100 fold improvement without additional infrastructure.

    As we have seen recently with the development of *kallisto* by Bray et al
    (http://arxiv.org/abs/1505.02710) algorithms can change problems from requiring
    a cluster to being quite do-able on a laptop. [Note: I understand that
    *kallisto* isn't the same as doing a full analysis with cufflinks, etc, but for
    many common applications it appears to do a fine job.] Thus, I hope that novel
    algorithm development for core problems will be part of any investment in
    computing infrastructure.
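
The revision history above repeatedly sharpens one technical claim: that Markov chain Monte Carlo is inherently serial. A minimal random-walk Metropolis sketch in Python (illustrative only; the standard-normal target and all function names are assumptions, not from the gist) makes the dependency explicit: each proposal is centered on the state produced by the previous iteration, so the steps of a single chain cannot run concurrently.

```python
import math
import random

def metropolis_sample(log_target, x0, n_steps, step_size=1.0, seed=0):
    """Random-walk Metropolis sampler for an unnormalized log density.

    The serial dependency is in the loop: the proposal at step t is drawn
    around the state accepted at step t-1, so iterations cannot be
    parallelized within a single chain.
    """
    rng = random.Random(seed)
    x, log_p = x0, log_target(x0)
    chain = [x]
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)  # depends on current x
        log_p_new = log_target(proposal)
        # Metropolis rule: always accept uphill moves; accept downhill
        # moves with probability exp(log_p_new - log_p).
        if log_p_new >= log_p or rng.random() < math.exp(log_p_new - log_p):
            x, log_p = proposal, log_p_new
        chain.append(x)
    return chain

# Toy target: a standard normal, with log density known up to a constant.
chain = metropolis_sample(lambda x: -0.5 * x * x, x0=0.0, n_steps=20000)
mean = sum(chain) / len(chain)
var = sum((c - mean) ** 2 for c in chain) / len(chain)
```

Running many independent chains, or Metropolis-coupled MCMC as used in Bayesian phylogenetics, recovers some parallelism across chains, but the per-chain dependency sketched here remains, which is why parallel architectures call for genuinely parallel algorithm designs.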