Posted November 13, 2009 by Spyros in Linux Tips
 
 

How to Download a Whole Website Using Wget

I remember that back when I started learning about the various Linux/Unix tools, I was fascinated by wget, the great HTTP downloader. Believe me, you will end up using it quite a lot, whether just to run a cron job that updates a website or for more interesting things, like downloading a whole website.

You have certainly been in the position where you wanted to download a website because of its wealth of information. In this short post, I will tell you how to do that easily, without tools like Teleport Pro on Windows, using just the awesome wget. Note that while you can download a whole website made of static pages, mirroring websites built with server-side languages like PHP is much more limited. wget only receives the HTML that the server generates for each URL it follows, never the PHP source itself, so pages that depend on user interaction (forms, logins, searches) will not work offline. Such sites can be handled to a degree, but not with the simple approach shown here.

However, most websites that people want to download are those with lots of static content, such as videos and images. wget handles these well. The command to download a whole website is pretty simple:


wget -mk www.website.com

This creates a mirror (-m) of the website we specify and converts its links to local links (-k) to make offline browsing easy. Sometimes this will not work, because some websites check the User-Agent header of our connection to detect whether we are using wget, and refuse to serve us if so. By default, wget identifies itself with a user agent of the form “Wget/version”, whichever switches we use. Therefore, if we want to present another user agent, like Mozilla, we should download the website as follows:


wget -r -k -U Mozilla www.website.com

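We specify -r here to download the website recursively. The -m switch does this too: it is shorthand for -r -N -l inf --no-remove-listing, so it also turns on timestamping and removes the recursion depth limit, whereas plain -r stops at a depth of 5 by default. Moreover, using the -U switch, we specify the user agent as “Mozilla” instead of wget's default.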
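
The -U switch can be combined with -m as well as with -r. As a minimal sketch, the two ideas can be wrapped in a small shell function. The function name mirror_site and the WGET override variable are hypothetical, introduced here only for illustration; they are not part of wget.

```shell
# Hypothetical helper: mirror a site while sending a custom user agent.
# Set WGET to another command name to preview or test without touching the network.
mirror_site() {
    url="$1"
    agent="${2:-Mozilla}"   # assumption: default to the "Mozilla" string used above
    # -m: mirror the site (recursive, unlimited depth, timestamping)
    # -k: convert links so the local copy can be browsed offline
    # -U: user agent string to send instead of wget's default
    "${WGET:-wget}" -m -k -U "$agent" "$url"
}
```

Calling mirror_site www.website.com would then run wget -m -k -U Mozilla www.website.com, and a second argument replaces the user agent string.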

Bonus Tip – Continue Download In Case of Interruption

For some reason, wget might be interrupted while you are downloading a website. This can be very frustrating, especially if you are downloading big chunks of data like videos. The good thing is that you do not have to download the same files all over again. With the -c switch, wget continues partially downloaded files from the spot where they stopped, so you do not lose any progress. Doing that is easy: if your download ever gets interrupted, just run the same command you used in the first place, adding the -c switch after wget, like:


wget -c -r -k -U Mozilla www.website.com
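
If interruptions happen often, the rerun can be automated. The following is only a sketch under stated assumptions: the function name resume_download and the WGET and RETRY_DELAY variables are hypothetical, and it relies on wget returning a non-zero exit status when a run does not finish cleanly.

```shell
# Hypothetical helper: rerun the -c command until wget exits successfully
# or the attempt limit is reached.
resume_download() {
    url="$1"
    max_tries="${2:-5}"     # assumption: give up after 5 attempts by default
    try=1
    while [ "$try" -le "$max_tries" ]; do
        # -c continues partially downloaded files instead of restarting them
        if "${WGET:-wget}" -c -r -k -U Mozilla "$url"; then
            return 0
        fi
        try=$((try + 1))
        # pause before the next attempt, but not after the last one
        if [ "$try" -le "$max_tries" ]; then
            sleep "${RETRY_DELAY:-10}"
        fi
    done
    return 1
}
```

Running resume_download www.website.com tries up to five times, waiting ten seconds between attempts. Note that wget can also return a non-zero status when only some pages fail (for example on a broken link), in which case the loop simply runs out of attempts.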


Spyros