# The best way to download a website for offline use, using wget
There are two ways. The first is a single command that runs plainly in the foreground; the second runs in the background in a separate "shell", so you can log out of your ssh session and the download will keep going.
First make a folder to download the websites to, then begin your downloading. (Note: if you download www.steviehoward.com, you will end up with a folder like this: /websitedl/www.steviehoward.com/ )
## (STEP 1)
mkdir /websitedl/
cd /websitedl/
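A small optional variant of STEP 1 (my own, not from the original steps): -p stops mkdir from complaining if the folder already exists, and && only changes into it if mkdir succeeded.

```bash
# same as STEP 1, but safe to re-run if /websitedl/ is already there
mkdir -p /websitedl/ && cd /websitedl/
```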
## (STEP 2)

### 1st way:
wget --limit-rate=200k --no-clobber --convert-links --random-wait -r -p -E -e robots=off -U mozilla http://www.kossboss.com
### 2nd way:
##### IN THE BACKGROUND: PUT NOHUP IN FRONT AND & AT THE END
nohup wget --limit-rate=200k --no-clobber --convert-links --random-wait -r -p -E -e robots=off -U mozilla http://www.kossboss.com &

##### THEN TO VIEW THE OUTPUT (it will put a nohup.out file where you ran the command):
tail -f nohup.out
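If you do this often, the 2nd way can be wrapped in a tiny script. This is only a sketch of the same idea - the script name, the dated log-file name and the default URL are my own additions, not part of the original instructions:

```bash
#!/bin/sh
# mirror-in-background.sh (hypothetical name): start the download with nohup,
# send output to a dated log file instead of nohup.out, and print the PID.
SITE="${1:-http://www.kossboss.com}"
LOG="wget-$(date +%F).log"
cd /websitedl/ || exit 1
nohup wget --limit-rate=200k --no-clobber --convert-links --random-wait \
      -r -p -E -e robots=off -U mozilla "$SITE" > "$LOG" 2>&1 &
echo "wget running as PID $! - follow it with: tail -f /websitedl/$LOG"
```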
#### WHAT DO ALL THE SWITCHES MEAN:
--limit-rate=200k limit the download speed to 200 KB/sec
--no-clobber don't overwrite any existing files (used in case the download is interrupted and resumed)
--convert-links convert links so that they work locally, off-line, instead of pointing to the website online
--random-wait wait a random amount of time between requests - websites don't like having their whole site downloaded in one steady stream
-r recursive - downloads the full website
-p downloads everything, even pictures (same as --page-requisites: downloads the images, CSS and so on)
-E saves files with the right extension - without it, most HTML and other files end up with no extension
-e robots=off ignore robots.txt - act like we are not a robot/crawler - websites don't like robots/crawlers unless they are Google or another famous search engine
-U mozilla pretend to be a Mozilla browser looking at the page, instead of a crawler like wget
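For reference, here is the same download command with each short switch spelled out as its long wget option name (my own rewrite for readability; the behaviour is the same - note that --adjust-extension is the long name for -E on wget 1.12 and newer, older versions call it --html-extension):

```bash
wget --limit-rate=200k \
     --no-clobber \
     --convert-links \
     --random-wait \
     --recursive \
     --page-requisites \
     --adjust-extension \
     --execute robots=off \
     --user-agent=mozilla \
     http://www.kossboss.com
```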
#### PURPOSELY DIDN'T INCLUDE THE FOLLOWING:
-o /websitedl/wget1.txt log everything to wget1.txt - didn't do this because it gave me no output on the screen, and I don't like that; I'd rather use nohup and & and tail -f the output from nohup.out
-b because it runs in the background and you can't see progress; I like "nohup ... &" better
--domains=steviehoward.com didn't include it because this site is hosted by Google, so wget might need to step into Google's domains
--restrict-file-names=windows modify filenames so that they will work in Windows as well. Seems to work okay without it.
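Once the download finishes, the converted links mean you can open the site's index page under /websitedl/www.kossboss.com/ straight from a browser. If you would rather browse the mirror over localhost (purely my own suggestion, not part of the original steps), Python's built-in web server is a quick way to do it:

```bash
# serve the downloaded copy on http://localhost:8080/
cd /websitedl/www.kossboss.com/
python3 -m http.server 8080
```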