|
|
 |
12-05-2012, 01:00 AM
|
#1
|
|
Diamond Member
Join Date: Jun 2000
Posts: 6,997
|
How to list directory contents of a web folder via linux command?
This seems like it would be simple, but Google is failing me. How can I retrieve a listing of the files contained within a directory on a website? I know about wget and curl, but those only seem to work when you know the exact filename. Is there a way to simply retrieve a listing of the files in a particular directory?
Last edited by Special K; 12-05-2012 at 09:15 AM.
|
|
|
12-05-2012, 01:45 AM
|
#2
|
|
Super Moderator Elite Member
Join Date: Oct 1999
Posts: 26,653
|
If we're talking about doing this over HTTP, you can't. Unless the server is set up to show the directory listing, that information is intentionally hidden and cannot be made to show itself.
Now if you have access to the box, then a simple ls in the directory you want more information on will suffice.
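For the case where the server is set up to show a listing, the page that comes back is ordinary HTML, so the names can be pulled out of its links. A minimal sketch, assuming an auto-generated index that uses plain double-quoted href attributes; list_links is a hypothetical helper name, and site.com is the placeholder host used elsewhere in this thread:

```shell
# Hypothetical helper: print every href target found in an HTML page read from stdin.
# Assumes links are written as href="..." with double quotes.
list_links() {
  grep -o 'href="[^"]*"' | sed -e 's/^href="//' -e 's/"$//'
}

# Possible usage (placeholder URL, not a real endpoint):
# curl -s http://site.com/podcasts/ | list_links
```

If the server returns an index page or an error instead of a listing, this prints whatever links that page contains, which is not a directory listing.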
__________________
ViRGE
Team Anandtech: Assimilating a computer near you!
GameStop - An upscale specialized pawnshop that happens to sell new games on the side
Todd the Wraith: On Fruit Bowls - I hope they prove [to be] as delicious as the farmers who grew them
|
|
|
12-05-2012, 01:52 AM
|
#3
|
|
Diamond Member
Join Date: Jun 2000
Posts: 6,997
|
Yes, this is over HTTP. I found a site with a daily podcast I want to check out, and I want to set up a script to automatically download the latest one each night. The podcast filenames have a regular format, except for an underscore at the end followed by what appear to be random characters, or at least characters without any discernible pattern. The filenames do have the current date in them.
UPDATE: I have discovered that the "random" characters are actually a 4-digit code that specifies the time the podcast was uploaded. For example, if the podcast was uploaded at 3:13 AM, the podcast file would end with _0313a. Nevertheless, this still doesn't help me, as the podcasts are uploaded at more or less random times. All I can be sure of is that there will be one AM and one PM podcast each day.
How can I download the latest one of these automatically if I don't know the exact complete filename? If I could do directory listings it would be simple.
Last edited by Special K; 12-05-2012 at 09:13 AM.
|
|
|
12-05-2012, 02:55 AM
|
#4
|
|
Member
Join Date: Sep 2012
Posts: 59
|
Read wget's manpage. It supports recursive retrieval of web content and spidering. Try something like:
wget --spider -np -r -l 1 -A 'podcast_*' http://site.com/podcasts/
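Once a list of candidate filenames is available, one per line, the newest for a given day still has to be picked. A rough sketch, assuming names shaped like podcast_<date>_<HHMM><a|p>.mp3; that layout and the pick_latest helper are assumptions for illustration, based only on the date and time-code description earlier in the thread:

```shell
# pick_latest DATE: read filenames on stdin, keep those containing DATE,
# and print the PM one if present, otherwise the AM one.
# Assumes at most one AM and one PM file per day, and that the third
# underscore-separated field is <HHMM><a|p>.<ext>.
pick_latest() {
  # Sort on the 5th character of field 3 (the a/p marker); "p" sorts last.
  grep -F "$1" | sort -t_ -k3.5,3.5 | tail -n 1
}

# Possible usage (names.txt is a hypothetical file of candidate names):
# pick_latest "$(date +%Y%m%d)" < names.txt
```

The date format passed in has to match whatever the site actually uses in its filenames.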
|
|
|
12-05-2012, 09:14 AM
|
#5
|
|
Diamond Member
Join Date: Jun 2000
Posts: 6,997
|
Quote:
Originally Posted by jimmybgood9
Read wget's manpage. It supports recursive retrieval of web content and spidering. Try something like:
wget --spider -np -r -l 1 -A 'podcast_*' http://site.com/podcasts/
|
Thanks jimmybgood9, I'll have to investigate that.
|
|
|
12-05-2012, 11:09 AM
|
#6
|
|
Super Moderator Elite Member
Join Date: Oct 1999
Posts: 26,653
|
Quote:
Originally Posted by Special K
How can I download the latest one of these automatically if I don't know the exact complete filename? If I could do directory listings it would be simple.
|
Use their RSS feed to get the URL. This scenario is precisely what it's for.
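A minimal command-line sketch of that idea, assuming the feed lists the newest episode first and carries the audio file in an enclosure tag with a double-quoted url attribute. first_enclosure is a hypothetical helper and the feed URL below is a placeholder; a real XML parser would be more robust than grep and sed:

```shell
# Hypothetical helper: print the first enclosure URL found in an RSS feed read from stdin.
# Assumes <enclosure ... url="..."> with double quotes and newest-first item order.
first_enclosure() {
  grep -o '<enclosure[^>]*url="[^"]*"' | sed -e 's/.*url="//' -e 's/"$//' | head -n 1
}

# Possible usage (placeholder feed URL, not taken from the site):
# wget "$(curl -s http://site.com/podcasts/feed.xml | first_enclosure)"
```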
__________________
ViRGE
|
|
|
12-05-2012, 11:48 AM
|
#7
|
|
Diamond Member
Join Date: Jun 2000
Posts: 6,997
|
Quote:
Originally Posted by ViRGE
Use their RSS feed to get the URL. This scenario is precisely what it's for.
|
How would I do that via command line in linux?
|
|
|
12-05-2012, 11:52 AM
|
#8
|
|
Super Moderator Elite Member
Join Date: Oct 1999
Posts: 26,653
|
It sounds like Podracer will do the trick for you.
__________________
ViRGE
|
|
|
12-06-2012, 10:59 PM
|
#9
|
|
Diamond Member
Join Date: Jun 2000
Posts: 6,997
|
Quote:
Originally Posted by ViRGE
If we're talking about doing this over HTTP, you can't. Unless the server is set up to show the directory listing, that information is intentionally hidden and cannot be made to show itself.
Now if you have access to the box, then a simple ls in the directory you want more information on will suffice.
|
Is listing directory contents over HTTP a security risk or something? Why wouldn't they allow a user to do that?
They can always restrict what directories a user can view, but if all the files in a directory are downloadable by anyone, then I don't see the harm in allowing users to do an ls in that directory.
|
|
|
12-06-2012, 11:29 PM
|
#10
|
|
Super Moderator Elite Member
Join Date: Oct 1999
Posts: 26,653
|
Quote:
Originally Posted by Special K
Is listing directory contents over HTTP a security risk or something? Why wouldn't they allow a user to do that?
|
That's exactly it. They don't want every document visible, especially if there are settings files located in that directory or if it means server-side scripts that should be executed get transmitted to the user instead. Furthermore, a lot of sites like to have directories jump to an index of some kind (ex: forums.anandtech.com really goes to forums.anandtech.com/index.php), which requires that the directory redirect to that file. There's no convention in HTTP for both doing that and showing the contents of a directory; it's one or the other.
Honestly this is why FTP was invented. However that's all but gone by the wayside for most download services now.
__________________
ViRGE
|
|
|
All times are GMT -5.
|