@keepanote
Forked from yantze/webdl.sh
Created October 1, 2019 06:05
Revisions

  1. @yantze yantze revised this gist May 3, 2017. 1 changed file with 2 additions and 1 deletion.
    3 changes: 2 additions & 1 deletion webdl.sh
    @@ -12,10 +12,11 @@
     # --tries=3 // maximum number of attempts; the default is 20
     # --user-agent=agent // user agent string, e.g. 'mozilla'
     # --random-wait // This option causes the time between requests to vary between 0.5 and 1.5 * wait seconds
    +# -e robots=off // ignore the website's robots.txt
     #
     # --restrict-file-names=windows // restrict file name characters to those allowed by the specified OS
     # --domains wordpress.org // list of accepted domains; links to domains outside this list are not followed
     # codex.wordpress.org // start URL: the host or directory under the site's domain to download


    -wget --recursive --page-requisites --html-extension --convert-links --no-parent --level=0 --user-agent='mozilla' --random-wait --adjust-extension --tries=3 --restrict-file-names=windows --domains wordpress.org codex.wordpress.org
    +wget --recursive --page-requisites --html-extension --convert-links --no-parent --level=0 -e robots=off --user-agent='mozilla' --random-wait --adjust-extension --tries=3 --restrict-file-names=windows --domains wordpress.org codex.wordpress.org
  2. @yantze yantze revised this gist May 3, 2017. 1 changed file with 3 additions and 1 deletion.
    4 changes: 3 additions & 1 deletion webdl.sh
    @@ -10,10 +10,12 @@
     # --level=0 // maximum recursion depth (0 means unlimited)
     # --adjust-extension // A URL like http://site.com/article.cgi?25 will be saved as article.cgi?25.html.
     # --tries=3 // maximum number of attempts; the default is 20
    +# --user-agent=agent // user agent string, e.g. 'mozilla'
    +# --random-wait // This option causes the time between requests to vary between 0.5 and 1.5 * wait seconds
     #
     # --restrict-file-names=windows // restrict file name characters to those allowed by the specified OS
     # --domains wordpress.org // list of accepted domains; links to domains outside this list are not followed
     # codex.wordpress.org // start URL: the host or directory under the site's domain to download


    -wget --recursive --page-requisites --html-extension --convert-links --no-parent --level=0 --adjust-extension --tries=3 --restrict-file-names=windows --domains wordpress.org codex.wordpress.org
    +wget --recursive --page-requisites --html-extension --convert-links --no-parent --level=0 --user-agent='mozilla' --random-wait --adjust-extension --tries=3 --restrict-file-names=windows --domains wordpress.org codex.wordpress.org
  3. @yantze yantze revised this gist Apr 19, 2017. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion webdl.sh
    @@ -16,4 +16,4 @@
     # codex.wordpress.org // start URL: the host or directory under the site's domain to download


    -wget --recursive --no-clobber --page-requisites --html-extension --convert-links --no-parent --level=0 --adjust-extension --tries=3 --restrict-file-names=windows --domains wordpress.org codex.wordpress.org
    +wget --recursive --page-requisites --html-extension --convert-links --no-parent --level=0 --adjust-extension --tries=3 --restrict-file-names=windows --domains wordpress.org codex.wordpress.org
  4. @yantze yantze created this gist Apr 18, 2017.
    19 changes: 19 additions & 0 deletions webdl.sh
    @@ -0,0 +1,19 @@
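    +# Downloading an entire website with wget, explained
    +# link: https://www.douban.com/note/536265958
    +# wget
    +# --recursive // recurse, i.e. include all subdirectories and the files in them
    +# --no-clobber // do not overwrite existing files, and do not write new copies by appending .# (# is a number) to the file name
    +# --page-requisites // download all files needed to display a page completely, such as images
    +# --html-extension // save all text/html documents with an .html extension
    +# --convert-links // convert non-relative links to relative links
    +# --no-parent // do not ascend to the parent directory
    +# --level=0 // maximum recursion depth (0 means unlimited)
    +# --adjust-extension // A URL like http://site.com/article.cgi?25 will be saved as article.cgi?25.html.
    +# --tries=3 // maximum number of attempts; the default is 20
    +#
    +# --restrict-file-names=windows // restrict file name characters to those allowed by the specified OS
    +# --domains wordpress.org // list of accepted domains; links to domains outside this list are not followed
    +# codex.wordpress.org // start URL: the host or directory under the site's domain to download
    +
    +
    +wget --recursive --no-clobber --page-requisites --html-extension --convert-links --no-parent --level=0 --adjust-extension --tries=3 --restrict-file-names=windows --domains wordpress.org codex.wordpress.org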
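
The command from the latest revision can be wrapped in a small shell function so that the accepted domain and the start URL become parameters. The following is only a sketch under a few assumptions that do not come from the gist: the function name `mirror_site` and the `DRY_RUN` variable are hypothetical; `--html-extension` is left out because newer wget releases document it as the older name of `--adjust-extension`; and `--wait=1` is added because `--random-wait` scales the `--wait` interval, so it appears to have no effect when no wait is set.

```shell
#!/bin/sh
# Sketch only: mirror_site and DRY_RUN are hypothetical names, not part of the gist.
# Usage: mirror_site <accepted-domain> <start-url>
mirror_site() {
    domain=$1
    start_url=$2

    # Rebuild the positional parameters as the wget argument list.
    # Flags follow the latest revision of webdl.sh, with two assumed changes:
    #   --html-extension is dropped (older name of --adjust-extension)
    #   --wait=1 is added so that --random-wait has an interval to vary
    set -- \
        --recursive \
        --page-requisites \
        --adjust-extension \
        --convert-links \
        --no-parent \
        --level=0 \
        -e robots=off \
        --user-agent=mozilla \
        --wait=1 \
        --random-wait \
        --tries=3 \
        --restrict-file-names=windows \
        --domains "$domain" \
        "$start_url"

    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Print the command one word per line instead of running it.
        printf '%s\n' wget "$@"
    else
        wget "$@"
    fi
}

# Example (commented out so that sourcing this file downloads nothing):
# mirror_site wordpress.org codex.wordpress.org
```

Running `DRY_RUN=1 mirror_site wordpress.org codex.wordpress.org` first is a cheap way to check the argument list before starting a long recursive download.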