Offline copy of Wikipedia with images?

BigToque

Lifer
Oct 10, 1999
11,700
0
76
My school cuts off internet access during school hours, but we're supposed to still be able to get to Wikipedia. The IT staff here can't seem to figure out why Wikipedia loads but none of the images do, even though they've supposedly allowed wikipedia.org and wikimedia.org through whatever filter they have set up.

I've wasted enough time waiting for these people to fix the issue and would just like a local copy of Wikipedia. I know I can download a copy of the database, and I found the torrent that's linked from the Wikipedia website, but it only has the text (at least that's the impression I got from one of the comments left on the torrent).

How do I get a copy of Wikipedia that includes all the images?
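For what it's worth, the database dump that torrent mirrors comes from dumps.wikimedia.org, and it is indeed text only. A sketch of pulling the latest English-language articles dump directly, assuming the standard filename pattern still applies:

# Sketch: fetch the current English Wikipedia articles dump (text only).
# The "latest" symlink and filename pattern are the usual layout on
# dumps.wikimedia.org; check the dump index for the exact file.
# --continue resumes a partial download, useful for a file this large.
wget --continue https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2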
 

lxskllr

No Lifer
Nov 30, 2004
59,668
10,179
126
wget?

robots.txt said:
# Sorry, wget in its recursive mode is a frequent problem.
# Please read the man page and use it properly; there is a
# --wait option you can use to set the delay between hits,
# for instance.
#
User-agent: wget
Disallow: /
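Worth noting: that block targets wget's default user-agent string in recursive mode, and wget does have the politeness knobs the comment mentions. A minimal sketch for grabbing a single article plus its images (standard wget flags only; the URL is a placeholder, and the images live on a separate wikimedia.org host, hence --span-hosts):

# Sketch: one page plus the files needed to render it, with a delay
# between requests. --page-requisites pulls in images/CSS/JS, and
# --span-hosts/--domains allow following image links to upload.wikimedia.org.
wget --wait=2 --random-wait \
     --page-requisites --convert-links \
     --span-hosts --domains=wikipedia.org,wikimedia.org \
     https://en.wikipedia.org/wiki/Example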
 

lxskllr

No Lifer
Nov 30, 2004
59,668
10,179
126

robots.txt said:
#
# robots.txt for http://www.wikipedia.org/ and friends
#
# Please note: There are a lot of pages on this site, and there are
# some misbehaved spiders out there that go _way_ too fast. If you're
# irresponsible, your access to the site may be blocked.
#

# advertising-related bots:
User-agent: Mediapartners-Google*
Disallow: /

# Wikipedia work bots:
User-agent: IsraBot
Disallow:

User-agent: Orthogaffe
Disallow:

# Crawlers that are kind enough to obey, but which we'd rather not have
# unless they're feeding search engines.
User-agent: UbiCrawler
Disallow: /

User-agent: DOC
Disallow: /

User-agent: Zao
Disallow: /

# Some bots are known to be trouble, particularly those designed to copy
# entire sites. Please obey robots.txt.
User-agent: sitecheck.internetseer.com
Disallow: /

User-agent: Zealbot
Disallow: /

User-agent: MSIECrawler
Disallow: /

User-agent: SiteSnagger
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: Fetch
Disallow: /

User-agent: Offline Explorer
Disallow: /

User-agent: Teleport
Disallow: /

User-agent: TeleportPro
Disallow: /

User-agent: WebZIP
Disallow: /

User-agent: linko
Disallow: /

User-agent: HTTrack
Disallow: /


User-agent: Microsoft.URL.Control
Disallow: /

User-agent: Xenu
Disallow: /

User-agent: larbin
Disallow: /

User-agent: libwww
Disallow: /

User-agent: ZyBORG
Disallow: /

User-agent: Download Ninja
Disallow: /

#
# Sorry, wget in its recursive mode is a frequent problem.
# Please read the man page and use it properly; there is a
# --wait option you can use to set the delay between hits,
# for instance.
#
User-agent: wget
Disallow: /

#
# The 'grub' distributed client has been *very* poorly behaved.
#
User-agent: grub-client
Disallow: /

#
# Doesn't follow robots.txt anyway, but...
#
User-agent: k2spider
Disallow: /

#
# Hits many times per second, not acceptable
# http://www.nameprotect.com/botinfo.html
User-agent: NPBot
Disallow: /

# A capture bot, downloads gazillions of pages with no public benefit
# http://www.webreaper.net/
User-agent: WebReaper
Disallow: /

# Don't allow the wayback-maschine to index user-pages
#User-agent: ia_archiver
#Disallow: /wiki/User
#Disallow: /wiki/Benutzer

#
# Friendly, low-speed bots are welcome viewing article pages, but not
# dynamically-generated pages please.
#
# Inktomi's "Slurp" can read a minimum delay between hits; if your
# bot supports such a thing using the 'Crawl-delay' or another
# instruction, please let us know.
#
User-agent: *
Disallow: /w/
Disallow: /trap/
Disallow: /wiki/Especial:Search
Disallow: /wiki/Especial%3ASearch
Disallow: /wiki/Special:Collection
Disallow: /wiki/Spezial:Sammlung
Disallow: /wiki/Special:Random
Disallow: /wiki/Special%3ARandom
Disallow: /wiki/Special:Search
Disallow: /wiki/Special%3ASearch
Disallow: /wiki/Spesial:Search
Disallow: /wiki/Spesial%3ASearch
Disallow: /wiki/Spezial:Search
Disallow: /wiki/Spezial%3ASearch
Disallow: /wiki/Specjalna:Search
Disallow: /wiki/Specjalna%3ASearch
Disallow: /wiki/Speciaal:Search
Disallow: /wiki/Speciaal%3ASearch
Disallow: /wiki/Speciaal:Random
Disallow: /wiki/Speciaal%3ARandom
Disallow: /wiki/Speciel:Search
Disallow: /wiki/Speciel%3ASearch
Disallow: /wiki/Speciale:Search
Disallow: /wiki/Speciale%3ASearch
Disallow: /wiki/Istimewa:Search
Disallow: /wiki/Istimewa%3ASearch
Disallow: /wiki/Toiminnot:Search
Disallow: /wiki/Toiminnot%3ASearch
#
...

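The catch-all section at the end is the operative part: slow, friendly crawlers are welcome on the /wiki/ article pages, but the dynamic /w/ and /trap/ paths are off limits. A sketch of a recursive fetch scoped to match (standard wget flags; the depth and start URL are placeholders). Note that wget obeys robots.txt on recursive downloads by default, and the file above disallows wget's user agent outright, so in practice the database dumps are the sanctioned route:

# Sketch: shallow, slow, and scoped per the rules above.
# -X/--exclude-directories skips the disallowed dynamic paths.
wget --recursive --level=1 --wait=2 --random-wait \
     --exclude-directories=/w,/trap \
     --page-requisites --convert-links \
     --span-hosts --domains=wikipedia.org,wikimedia.org \
     https://en.wikipedia.org/wiki/Main_Page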
 

rivan

Diamond Member
Jul 8, 2003
9,677
3
81
Just buy a copy of the whole internet from the Elders.

[attached image: internet.jpg]
 

AstroManLuca

Lifer
Jun 24, 2004
15,628
5
81
Wow, when I was in school they would tell us to not use Wikipedia as a primary source, just as a jumping-off point. Now they're blocking everything EXCEPT Wikipedia so it's impossible to verify any of the information stored there?

Whoever is making the decision there is brain dead.
 

alangrift

Senior member
May 21, 2013
434
0
0
AstroManLuca said:
Wow, when I was in school they would tell us to not use Wikipedia as a primary source, just as a jumping-off point. Now they're blocking everything EXCEPT Wikipedia so it's impossible to verify any of the information stored there?

Whoever is making the decision there is brain dead.

NEVER use Wikipedia or blogs as primary sources (unless they did their own research).

I think people just copy the citations off Wikipedia anyway.
 

GlacierFreeze

Golden Member
May 23, 2005
1,125
1
0
BigToque said:
My school cuts off internet access during school hours, but we're supposed to still be able to get to Wikipedia. The IT staff here can't seem to figure out why Wikipedia loads but none of the images do, even though they've supposedly allowed wikipedia.org and wikimedia.org through whatever filter they have set up.

I've wasted enough time waiting for these people to fix the issue and would just like a local copy of Wikipedia. I know I can download a copy of the database, and I found the torrent that's linked from the Wikipedia website, but it only has the text (at least that's the impression I got from one of the comments left on the torrent).

How do I get a copy of Wikipedia that includes all the images?

Where do you go to school?

Sounds like a dumb school. Decided it was a good idea to block Internet access? Then allowed only Wiki? IT staff doesn't know why Wiki images won't load? wtf