Getting a list of directories/files from a remote webserver...

MIDIman

Diamond Member
Jan 14, 2000
3,594
0
0
Hi all - I've googled everywhere for this, and can't find an answer.

I'm developing an app in ColdFusion - however I'm not tied to it. There are a lot of ways to connect CF with other languages. We actually have cfdirectory in CF, but in the installation I'm using, it's limited to the local server only.

What I need to do is get a recursive list of directories and files for a given remote server path (i.e. http://someserver.com/list/of/folders/).

Requirements:
- Language used is not very important (Java, CF, PHP...whatever - CF can practically talk to anything)
- I don't care how the list comes in - comma delimited, object, array, structure, txt file, xml...CF can deal with just about anything.
- The local server where the app is located cannot have anything new installed (i.e. custom extensions). It is Windows, and has most of the usual web languages available.
- The remote server could be anything (Windows, Unix, IIS, Apache), and may have nothing more than a standard webserver (i.e. no languages available on the remote side).
- Folder permissions on the remote server are controllable. I could give it chmod 777 temporarily.
- Has to be an automated script of some kind, and the script has to exist on the local server...doing it manually would defeat the purpose here!


So I guess the question is - is there a language out there that can get a remote list of directories as a standard method of that language?

...I could've sworn javascript could do this...
 

Thyme

Platinum Member
Nov 30, 2000
2,330
0
0
I've seen things that brute-force lists of subdirectories on a remote server, but I don't think there's any way to deterministically get a list.
 

KB

Diamond Member
Nov 8, 1999
5,406
389
126
The HTTP protocol doesn't support this (FTP does). Most web servers hide this information for security reasons.

You can get those HTML-generated lists from the webserver if it allows directory browsing on the requested directory. You could then use regexps to find the filenames. But if a default document is configured for the directory, then you won't even get a list.
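A rough sketch of that regexp approach in Python (since the language doesn't matter much here). The sample HTML below is made up to look like an Apache-style auto-generated index page; real servers vary in their listing markup, so the pattern would need tuning per server.

```python
import re

# Made-up HTML imitating an Apache-style directory index page.
SAMPLE_INDEX = """
<html><head><title>Index of /list/of/folders</title></head><body>
<h1>Index of /list/of/folders</h1>
<a href="?C=N;O=D">Name</a> <a href="?C=M;O=A">Last modified</a>
<a href="/list/of/">Parent Directory</a>
<a href="photos/">photos/</a>
<a href="readme.txt">readme.txt</a>
<a href="banner.jpg">banner.jpg</a>
</body></html>
"""

def scrape_listing(html):
    """Return (directories, files) linked from a directory-index page."""
    # First char of the href must not be '?' (column-sort links) or '/'
    # (absolute paths like the Parent Directory link).
    names = re.findall(r'href="([^"?/][^"]*)"', html)
    dirs = [n for n in names if n.endswith('/')]
    files = [n for n in names if not n.endswith('/')]
    return dirs, files

dirs, files = scrape_listing(SAMPLE_INDEX)
print(dirs)   # ['photos/']
print(files)  # ['readme.txt', 'banner.jpg']
```

The same regexp idea works from ColdFusion via cfhttp plus REFind - the fragile part is the pattern, not the language.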
 

kamper

Diamond Member
Mar 18, 2003
5,513
0
0
KB's got it right. There is normally no way to do this over http (and the idea doesn't really make sense) so you have to have a program on the remote server that will get the data and send it back. As he mentioned, webservers frequently do this but it would only work if every single directory had no default pages (very unlikely) and you'd have to scrape the info from a human-oriented html page (difficult to do and easy to break).

Alternatively you could write your own script/program on the server side to return the data in a reliable fashion (probably as an xml document). The problem with that is that the concept of what exists on a webserver is somewhat virtual. For instance, an asp.net web app has a bunch of content just sitting there that will never, ever be accessible via http, and how are you going to figure out what's what without a huge amount of work? What if the webserver has a virtual directory such that the content resides on some other part of the filesystem? Are you going to examine the webserver configuration (given that the remote server could be anything) and track down all the virtual content? What if the remote webserver is a j2ee server? That's a whole different ball game, kinda like asp.net.
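The server-side script idea could look something like this Python sketch: a small program that lives on the *remote* server, walks a directory tree on disk, and emits XML for the ColdFusion side to parse. The element names and layout here are invented for illustration - and note this sidesteps none of the virtual-directory problems above, since it only sees the real filesystem.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def tree_to_xml(root_path):
    """Walk root_path recursively and return an XML string of dirs/files."""
    root = ET.Element('dir', name=os.path.basename(root_path) or root_path)
    nodes = {root_path: root}
    # os.walk is top-down, so a directory's parent node always exists
    # in `nodes` before we visit its children.
    for dirpath, dirnames, filenames in os.walk(root_path):
        parent = nodes[dirpath]
        for d in sorted(dirnames):
            nodes[os.path.join(dirpath, d)] = ET.SubElement(parent, 'dir', name=d)
        for f in sorted(filenames):
            ET.SubElement(parent, 'file', name=f)
    return ET.tostring(root, encoding='unicode')

# Tiny demo on a throwaway directory:
demo = tempfile.mkdtemp()
os.makedirs(os.path.join(demo, 'sub'))
open(os.path.join(demo, 'a.jpg'), 'w').close()
open(os.path.join(demo, 'sub', 'b.txt'), 'w').close()
xml = tree_to_xml(demo)
print(xml)
```

The catch, as noted, is that the OP's requirements rule out installing anything on the remote server.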

I'd say, given your requirements, this is a very bad idea if you need it to be reliable. The only thing that I can think of is a sort of search engine-like crawler that starts somewhere and just parses html, looking for linked content on the same server. That's feasible (and there's already lots of packages out there that can help you with it) but it would only give you stuff that is actually linked to. You'd also have to run it in the background and cache the results for when you need them. If you need to find content that is not linked to, you can always resort to a dictionary attack :p
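A toy version of that crawler idea, sketched in Python. The `fetch` argument is a stand-in for whatever HTTP call you'd really use (cfhttp, urllib, ...) - it's passed in here, mapping a URL to its HTML (or None for non-HTML content), so the crawl logic itself is easy to test. The example site is entirely made up.

```python
import re
from urllib.parse import urljoin, urlparse

def crawl(start_url, fetch):
    """Breadth-first crawl from start_url, following only same-host links."""
    host = urlparse(start_url).netloc
    seen, queue = set(), [start_url]
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        html = fetch(url)
        if html is None:
            continue  # non-HTML content (images, etc.) - nothing to parse
        for href in re.findall(r'href="([^"]+)"', html):
            link = urljoin(url, href)
            if urlparse(link).netloc == host and link not in seen:
                queue.append(link)
    return seen

# Demo with a canned fake site: off-host links are ignored,
# and b.jpg is discovered but not parsed.
pages = {
    'http://example.com/': '<a href="a.html">a</a> <a href="sub/">sub</a> '
                           '<a href="http://other.com/x">x</a>',
    'http://example.com/a.html': '',
    'http://example.com/sub/': '<a href="b.jpg">b</a>',
}
found = crawl('http://example.com/', pages.get)
print(sorted(found))
```

As kamper says, this only ever finds content something links to - unlinked files stay invisible.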
 

MIDIman

Diamond Member
Jan 14, 2000
3,594
0
0
Originally posted by: kamper
webservers frequently do this but it would only work if every single directory had no default pages (very unlikely) and you'd have to scrape the info from a human-oriented html page (difficult to do and easy to break).

Assuming that there are no default pages in the said folders, and all of them are set with the appropriate permissions to view the contents of a folder, do you have an example of this? I think that might do the trick.

Thyme - like everything else I've been running into, I don't think this script will get a list of folders on a remote webserver - only on the server where the script is located.
 

kamper

Diamond Member
Mar 18, 2003
5,513
0
0
Originally posted by: MIDIman
Originally posted by: kamper
webservers frequently do this but it would only work if every single directory had no default pages (very unlikely) and you'd have to scrape the info from a human-oriented html page (difficult to do and easy to break).

Assuming that there are no default pages in the said folders, and all of them are set with the appropriate permissions to view the contents of a folder, do you have an example of this? I think that might do the trick.
http://www.uoguelph.ca/~jhuiskam/empty/

Notice how, if you go up to the parent directory, you don't get the listing, because I have an index.html in there to keep people from snooping. This is also the most plain directory listing you're likely to see. I think it's some form of apache running on solaris. Every web server will have a different listing format for you to scrape, although they probably all put links to the content so you can just find all the links in those pages and use them.
 

Thyme

Platinum Member
Nov 30, 2000
2,330
0
0
Originally posted by: MIDIman
Thyme - like everything else I've been running into, I don't think this script will get a list of folders on a remote webserver - only on the server where the script is located.

If you are on the server and you have access to the file system, you can use many methods to get what you want. Scripts I've seen for file systems you don't have access to will work on any server, but it's a terrible method and I wouldn't suggest doing it.
 

MIDIman

Diamond Member
Jan 14, 2000
3,594
0
0
Thanks everyone - I ended up using cfhttp, and then I search through the FileContent key of the returned structure. It's very clunky, but it works!

The only requirements, as far as I can tell, are that the folder(s) in question: 1) contain filenames of a specified type (in my case .jpg), 2) do not have default files (i.e. index.html), and 3) all have public read permission so the server will generate a listing.

I know that's quite a bit, but for now, it'll do.
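For anyone following along, roughly what this amounts to, sketched in Python: the string below stands in for what cfhttp's FileContent would hold after fetching the directory's index page, and a case-insensitive regexp pulls out the .jpg names. The sample content is made up for illustration.

```python
import re

# Made-up stand-in for cfhttp.FileContent after fetching an index page.
FILE_CONTENT = '''
<h1>Index of /photos</h1>
<a href="?C=N;O=D">Name</a>
<a href="IMG_001.jpg">IMG_001.jpg</a>
<a href="IMG_002.JPG">IMG_002.JPG</a>
<a href="notes.txt">notes.txt</a>
'''

def jpg_names(html):
    """Return every href value ending in .jpg (case-insensitive)."""
    return re.findall(r'href="([^"]+\.jpg)"', html, re.IGNORECASE)

print(jpg_names(FILE_CONTENT))  # ['IMG_001.jpg', 'IMG_002.JPG']
```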

If anyone's interested in the CF code, just pm me. Thanks again.
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
Originally posted by: MIDIman
So I guess the question is - is there a language out there that can get a remote list of directories as a standard method of that language?

Sure, if the remote server is extremely misconfigured.