@konrad
Created January 7, 2016 20:34

Revisions

  1. konrad created this gist Jan 7, 2016.
    9 changes: 9 additions & 0 deletions get_SRA_file_URL_for_lib_GEO_accession.sh
    @@ -0,0 +1,9 @@
    # Problem: You have an NCBI GEO accession and would like to get the URL of the SRA file that contains the sequencing data.
    # The sed command that removes the last character of the string is essential, as there is an invisible character (likely a
    # carriage return) that otherwise breaks the downstream steps.

    GEO_ACCESSION="GSM1655353" # set your GEO accession here
    SRA_FTP_URL=$(curl "http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=${GEO_ACCESSION}&targ=self&form=text&view=brief" 2>/dev/null | grep ftp-trace.ncbi.nlm.nih.gov | cut -c 32- | sed 's/.$//')
    FTP_SUB_FOLDER=$(ncftpls "${SRA_FTP_URL}/")
    SRA_FILE=$(ncftpls "${SRA_FTP_URL}/${FTP_SUB_FOLDER}/")
    echo "$GEO_ACCESSION" "${SRA_FTP_URL}/${FTP_SUB_FOLDER}/${SRA_FILE}"
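
The extraction step above depends on two fixed positions: `cut -c 32-` assumes the URL starts at column 32 (which matches a line beginning with `!Sample_supplementary_file_1 = `), and `sed 's/.$//'` drops whatever the last character happens to be. A minimal sketch of a less position-dependent variant follows. It assumes the invisible character is a carriage return and that the URL contains no whitespace; the function name `extract_sra_ftp_url` and the sample path are hypothetical and used only for illustration.

```shell
# Reads GEO "form=text" output on stdin and prints the first SRA FTP URL found.
# Assumption: the stray trailing character is a carriage return (\r).
extract_sra_ftp_url() {
    tr -d '\r' \
        | grep -o 'ftp://ftp-trace\.ncbi\.nlm\.nih\.gov[^[:space:]]*' \
        | head -n 1
}

# Made-up sample line with a CRLF line ending; the path is not a real record.
printf '!Sample_supplementary_file_1 = ftp://ftp-trace.ncbi.nlm.nih.gov/example/path/SRX000000\r\n' \
    | extract_sra_ftp_url
```

In the script above, the function could take the place of the `grep | cut | sed` stage, with the `curl` output piped into it. Matching on the URL itself means the result does not change if the label in front of it gets longer or shorter.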