Last active October 1, 2015 12:19
Revisions
matsen revised this gist
Oct 1, 2015. 1 changed file with 1 addition and 1 deletion.
@@ -38,7 +38,7 @@ NSF cash resulting in an excellent product. Their Agave API ### Parallel architectures require parallel algorithms. This is self-explanatory. In my field of Bayesian phylogenetics, for example, all algorithms in common use utilize Markov chain Monte Carlo, which is an inherently serial algorithm. If we are to use large scale architecture we are going to need algorithms appropriate for that architecture.
matsen revised this gist
Sep 30, 2015. 1 changed file with 10 additions and 10 deletions.
@@ -21,7 +21,7 @@ There is a clear antidote to this problem, which are software containers. These are lightweight virtual machines which can run on a variety of platforms. Docker is the most well known, and the community (including Docker) is coalescing under a standard. See <https://www.opencontainers.org/> for details. Using containers, all dependencies are cached and the pipeline can be run reliably into the future.
@@ -31,10 +31,10 @@ an entirely trivial technical consideration. ### iPlant collaborative is already doing a visionary job. As you are no doubt aware, the iPlant Collaborative environment (<http://www.iplantcollaborative.org/>) and the TACC under the leadership of Dan Stanzione is a remarkable example of smart people given a chunk of NSF cash resulting in an excellent product. Their Agave API (<http://agaveapi.co/>) points the way to the future. ### Parallel architectures require parallel algorithms. This is self-explanatory. In my field of Bayesian phylogenetics, for example,
@@ -43,10 +43,10 @@ inherently serial algorithm. If we are to use large scale architecture we are going to need algorithms appropriate for that architecture. ### Algorithms can give > 100 fold improvement without additional infrastructure. As we have seen recently with the development of [*kallisto* by Bray et al](http://arxiv.org/abs/1505.02710) algorithms can change problems from requiring a cluster to being quite do-able on a laptop. [Note: I understand that running *kallisto* isn't the same as doing a full analysis with cufflinks, etc, but for many common applications it appears to do a fine job.]
Thus, I hope that novel algorithm development for core computational problems will be part of any investment in computing infrastructure.
matsen revised this gist
Sep 30, 2015. 1 changed file with 1 addition and 1 deletion.
@@ -40,7 +40,7 @@ NSF cash resulting in an excellent product. Their Agave API This is self-explanatory. In my field of Bayesian phylogenetics, for example, all algorithms in common use use Markov chain Monte Carlo, which is an inherently serial algorithm. If we are to use large scale architecture we are going to need algorithms appropriate for that architecture. ### Algorithms can give > 100 fold improvement without additional infrastructure. As we have seen recently with the development of *kallisto* by Bray et al
matsen revised this gist
Sep 30, 2015. 1 changed file with 5 additions and 4 deletions.
@@ -3,9 +3,9 @@ ### We can piggyback on the coding development community. Many good things are happening in open source and industry, and we face many of the same issues that they do. For example, GitHub has provided enormous value to science, both through filling a need and by direct engagement. It has gotten almost unbelievably popular in the computational life sciences. However, other tools such as continuous integration, for example by [Travis CI](http://travis-ci.org/), or containers, for example by [Docker](http://docker.com), have gotten less traction despite the contributions they could offer to the scientific community.
@@ -39,7 +39,8 @@ NSF cash resulting in an excellent product. Their Agave API ### Parallel architectures require parallel algorithms. This is self-explanatory. In my field of Bayesian phylogenetics, for example, all algorithms in common use use Markov chain Monte Carlo, which is an inherently serial algorithm. If we are to use large scale architecture we are going to need corresponding algorithms. ### Algorithms can give > 100 fold improvement without additional infrastructure. As we have seen recently with the development of *kallisto* by Bray et al
matsen revised this gist
Sep 30, 2015. 1 changed file with 0 additions and 3 deletions.
@@ -29,21 +29,18 @@ future. expansion of computing can run software containers.* This isn't an entirely trivial technical consideration. ### iPlant collaborative is already doing a visionary job. As you are no doubt aware, the iPlant Collaborative environment (http://www.iplantcollaborative.org/) and the TACC under the leadership of Dan Stanzione is a remarkable example of smart people given a chunk of NSF cash resulting in an excellent product. Their Agave API (http://agaveapi.co/) points the way to the future. ### Parallel architectures require parallel algorithms. This is self-explanatory. In my field of Bayesian phylogenetics, for example, all algorithms in common use use Markov chain Monte Carlo, which is an inherently serial algorithm. ### Algorithms can give > 100 fold improvement without additional infrastructure. As we have seen recently with the development of *kallisto* by Bray et al (http://arxiv.org/abs/1505.02710) algorithms can change problems from requiring
matsen revised this gist
Sep 30, 2015. 1 changed file with 2 additions and 2 deletions.
@@ -50,5 +50,5 @@ As we have seen recently with the development of *kallisto* by Bray et al a cluster to being quite do-able on a laptop. [Note: I understand that *kallisto* isn't the same as doing a full analysis with cufflinks, etc, but for many common applications it appears to do a fine job.] Thus, I hope that novel algorithm development for core computational problems will be part of any investment in computing infrastructure.
matsen revised this gist
Sep 30, 2015. No changes.
matsen revised this gist
Sep 30, 2015. 1 changed file with 1 addition and 1 deletion.
@@ -50,5 +50,5 @@ As we have seen recently with the development of *kallisto* by Bray et al a cluster to being quite do-able on a laptop. [Note: I understand that *kallisto* isn't the same as doing a full analysis with cufflinks, etc, but for many common applications it appears to do a fine job.] Thus, I hope that novel algorithm development for core computational problems will be part of any investment in computing infrastructure.
matsen revised this gist
Sep 30, 2015. 1 changed file with 7 additions and 6 deletions.
@@ -1,13 +1,16 @@ ## Response to [*Science Drivers Requiring Capable Exascale High Performance Computing* RFI](http://grants.nih.gov/grants/guide/notice-files/NOT-GM-15-122.html) ### We can piggyback on the coding development community. Many good things are happening in open source and industry, and we face many of the same issues that they do. For example, GitHub has provided enormous value to science, both through just filling a need we have and by direct engagement. It has gotten almost unbelievably popular in the computational life sciences. However, other tools such as continuous integration, for example by [Travis CI](http://travis-ci.org/), or containers, for example by [Docker](http://docker.com), have gotten less traction despite the contributions they could offer to the scientific community. ### We need a strategy to fight bit-rot. "Bit-rot" refers to software/pipelines that become unusable because the underlying dependencies have changed. Sometimes the old versions disappear completely, meaning that old pipelines cannot be reconstructed
@@ -28,7 +31,6 @@ an entirely trivial technical consideration. ### iPlant collaborative is already doing a visionary job. As you are no doubt aware, the iPlant Collaborative environment (http://www.iplantcollaborative.org/) and the TACC under the leadership of Dan Stanzione is a remarkable example of smart people given a chunk of
@@ -43,7 +45,6 @@ inherently serial algorithm. ### Algorithms can give > 100 fold improvement without additional infrastructure. As we have seen recently with the development of *kallisto* by Bray et al (http://arxiv.org/abs/1505.02710) algorithms can change problems from requiring a cluster to being quite do-able on a laptop. [Note: I understand that
matsen revised this gist
Sep 30, 2015. 1 changed file with 1 addition and 1 deletion.
@@ -31,7 +31,7 @@ an entirely trivial technical consideration. As you are no doubt aware, the iPlant Collaborative environment (http://www.iplantcollaborative.org/) and the TACC under the leadership of Dan Stanzione is a remarkable example of smart people given a chunk of NSF cash resulting in an excellent product. Their Agave API (http://agaveapi.co/) points the way to the future.
matsen revised this gist
Sep 30, 2015. 1 changed file with 1 addition and 1 deletion.
@@ -24,7 +24,7 @@ future. *For NIH computational strategies, we need to make sure that any proposed expansion of computing can run software containers.* This isn't an entirely trivial technical consideration. ### iPlant collaborative is already doing a visionary job.
matsen revised this gist
Sep 30, 2015. 1 changed file with 6 additions and 0 deletions.
@@ -1,5 +1,11 @@ ## Response to [*Science Drivers Requiring Capable Exascale High Performance Computing* RFI](http://grants.nih.gov/grants/guide/notice-files/NOT-GM-15-122.html) ### We can piggyback on the coding development community. Many good things are happening in open source and industry, and we face many of the same issues that they do. For example, GitHub has provided enormous value to science, both through just filling a need we have and by direct engagement. It has gotten almost unbelievably popular in the computational life sciences. However, other tools such as continuous integration, for example by [Travis CI](http://travis-ci.org/), or containers, for example by [Docker](http://docker.com), have gotten less traction despite the contributions they could offer to the scientific community. ### We need a strategy to fight bit-rot. "Bit-rot" refers to software/pipelines that become unusable because the
matsen revised this gist
Sep 30, 2015. No changes.
matsen renamed this gist
Sep 30, 2015. 1 changed file with 0 additions and 0 deletions.
File renamed without changes.
matsen revised this gist
Sep 30, 2015. 1 changed file with 2 additions and 0 deletions.
@@ -1,3 +1,5 @@ ## Response to [*Science Drivers Requiring Capable Exascale High Performance Computing* RFI](http://grants.nih.gov/grants/guide/notice-files/NOT-GM-15-122.html) ### We need a strategy to fight bit-rot. "Bit-rot" refers to software/pipelines that become unusable because the
matsen created this gist
Sep 30, 2015.
@@ -0,0 +1,45 @@

### We need a strategy to fight bit-rot.

"Bit-rot" refers to software/pipelines that become unusable because the underlying dependencies have changed. Sometimes the old versions disappear completely, meaning that old pipelines cannot be reconstructed without digging through the internet archive. Reproducibility is fundamental to science, and thus this problem is acute.

There is a clear antidote to this problem, which are software containers. These are lightweight virtual machines which can run on a variety of platforms. Docker is the most well known, and the community (including Docker) is coalescing under a standard. See https://www.opencontainers.org/ for details. Using containers, all dependencies are cached and the pipeline can be run reliably into the future.

*For NIH computational strategies, we need to make sure that any proposed expansion of computing can run software containers.* This isn't an entirely trivial consideration.

### iPlant collaborative is already doing a visionary job.

As you are no doubt aware, the iPlant Collaborative environment (http://www.iplantcollaborative.org/) and the TACC under the leadership of Dan Stanzione is a remarkble example of smart people given a chunk of NSF cash resulting in an excellent product. Their Agave API (http://agaveapi.co/) points the way to the future.

### Parallel architectures require parallel algorithms.

This is self-explanatory. In my field of Bayesian phylogenetics, for example, all algorithms in common use use Markov chain Monte Carlo, which is an inherently serial algorithm.

### Algorithms can give > 100 fold improvement without additional infrastructure.

As we have seen recently with the development of *kallisto* by Bray et al (http://arxiv.org/abs/1505.02710) algorithms can change problems from requiring a cluster to being quite do-able on a laptop. [Note: I understand that *kallisto* isn't the same as doing a full analysis with cufflinks, etc, but for many common applications it appears to do a fine job.]

Thus, I hope that novel algorithm development for core problems will be part of any investment in computing infrastructure.
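The "Parallel architectures require parallel algorithms" section says that Markov chain Monte Carlo is inherently serial. The sketch below shows where that serial dependency comes from. It assumes a generic random-walk Metropolis-Hastings sampler on a toy one-dimensional standard-normal target. It is not a phylogenetic sampler, and the function names `log_target` and `metropolis_hastings` are hypothetical. Each proposal is built from the current state, so iteration t+1 cannot start until iteration t has finished.

```python
import math
import random


def log_target(x: float) -> float:
    """Unnormalised log density of a standard normal (a toy stand-in target)."""
    return -0.5 * x * x


def metropolis_hastings(n_steps: int, step_size: float = 1.0,
                        x0: float = 0.0, seed: int = 1) -> list[float]:
    """Random-walk Metropolis-Hastings; returns the chain, including x0."""
    rng = random.Random(seed)
    x = x0
    chain = [x]
    for _ in range(n_steps):
        # The proposal is centred on the current state x, so this iteration
        # cannot begin until the previous iteration has produced x.
        proposal = x + rng.gauss(0.0, step_size)
        # The Gaussian proposal is symmetric, so the acceptance probability
        # reduces to the ratio of target densities (capped at 1).
        log_ratio = log_target(proposal) - log_target(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        chain.append(x)
    return chain


if __name__ == "__main__":
    chain = metropolis_hastings(10_000)
    # Sample mean; expected to be close to 0 for this target.
    print(sum(chain) / len(chain))
```

Several independent chains with different seeds could run on separate processors. Each chain, however, still advances one step at a time, and this is why large parallel machines call for algorithms designed for them.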