Old 12-05-2012, 01:00 AM   #1
Special K
Diamond Member
 
Join Date: Jun 2000
Posts: 7,026
How to list directory contents of a web folder via Linux command?

This seems like it would be simple, but Google is failing me. How can I retrieve a listing of the files contained within a directory on a website? I know about wget and curl, but those only seem to work when you know the exact filename. Is there a way to simply retrieve a listing of the files in a particular directory?

Last edited by Special K; 12-05-2012 at 09:15 AM.
Old 12-05-2012, 01:45 AM   #2
ViRGE
Super Moderator
Elite Member
 
Join Date: Oct 1999
Posts: 29,274

If we're talking about HTTP, you can't. Unless the server is set up to show the directory listing, that information is intentionally hidden and cannot be made to show itself.

Now, if you have access to the box, a simple ls in the directory you want more information on will suffice.
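
For example, if you can SSH into the server (a minimal sketch; /var/www/podcasts is a made-up path, so substitute the real document root):

# Sort newest-first so the latest upload shows at the top of the listing.
ls -lt /var/www/podcasts | head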
Old 12-05-2012, 01:52 AM   #3
Special K
Diamond Member
 
Join Date: Jun 2000
Posts: 7,026

Yes, this is over HTTP. I found a site with a daily podcast I want to check out, and I want to set up a script to automatically download the latest one each night. The podcast filenames have a regular format, except for an underscore at the end followed by what appears to be random characters, or at least characters without any discernible pattern. The filenames do have the current date in them.

UPDATE: I have discovered that the "random" characters are actually a 4-digit code that specifies the time the podcast was uploaded. For example, if the podcast was uploaded at 1:31 AM, the file would end with _0131a. Nevertheless, this still doesn't help me, as the podcasts are uploaded at more or less random times. All I can be sure of is that there will be one AM and one PM podcast each day.

How can I download the latest one of these automatically if I don't know the exact complete filename? If I could do directory listings it would be simple.

Last edited by Special K; 12-05-2012 at 09:13 AM.
Old 12-05-2012, 02:55 AM   #4
jimmybgood9
Member
 
Join Date: Sep 2012
Posts: 59

Read wget's manpage. It supports recursive retrieval of web content and spidering. Try something like:

wget --spider -np -r -l 1 -A 'podcast_*' http://site.com/podcasts/
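
Building on that idea, a rough sketch of a nightly fetch (the URL is the placeholder above, and the YYYYMMDD date format is an assumption; adjust the accept pattern to whatever the real filenames actually use):

# Accept only files whose names contain today's date; -nd saves them
# flat in the current directory instead of recreating the site's tree.
today=$(date +%Y%m%d)
wget -np -r -l 1 -nd -A "*${today}*" http://site.com/podcasts/

Note that recursive retrieval only works if the server exposes an index page (or a directory listing) that actually links to the files; otherwise wget has nothing to crawl.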
Old 12-05-2012, 09:14 AM   #5
Special K
Diamond Member
 
Join Date: Jun 2000
Posts: 7,026

Quote:
Originally Posted by jimmybgood9 View Post
Read wget's manpage. It supports recursive retrieval of web content and spidering. Try something like:

wget --spider -np -r -l 1 -A 'podcast_*' http://site.com/podcasts/
Thanks jimmybgood9, I'll have to investigate that.
Old 12-05-2012, 11:09 AM   #6
ViRGE
Super Moderator
Elite Member
 
Join Date: Oct 1999
Posts: 29,274

Quote:
Originally Posted by Special K View Post
How can I download the latest one of these automatically if I don't know the exact complete filename? If I could do directory listings it would be simple.
Use their RSS feed to get the URL. This scenario is precisely what it's for.
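
For example, a minimal command-line sketch (the feed URL is made up, it assumes the newest episode is listed first, and it assumes the feed writes its enclosures as enclosure url="..."; a real script would use a proper RSS/XML parser rather than grep):

# Pull the first enclosure URL out of the feed, then download that file.
url=$(curl -s http://site.com/podcast.rss | grep -o 'enclosure url="[^"]*"' | head -n 1 | cut -d'"' -f2)
wget "$url"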
Old 12-05-2012, 11:48 AM   #7
Special K
Diamond Member
 
Join Date: Jun 2000
Posts: 7,026

Quote:
Originally Posted by ViRGE View Post
Use their RSS feed to get the URL. This scenario is precisely what it's for.
How would I do that from the command line in Linux?
Old 12-05-2012, 11:52 AM   #8
ViRGE
Super Moderator
Elite Member
 
Join Date: Oct 1999
Posts: 29,274

It sounds like Podracer will do the trick for you.
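
If you go that route, the nightly part is just a cron entry (a sketch; it assumes podracer is installed and already pointed at the feed per its own documentation):

# crontab -e, then run the podcatcher every night at 4:30 AM.
30 4 * * * podracer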
Old 12-06-2012, 10:59 PM   #9
Special K
Diamond Member
 
Join Date: Jun 2000
Posts: 7,026

Quote:
Originally Posted by ViRGE View Post
If we're talking about HTTP, you can't. Unless the server is set up to show the directory listing, that information is intentionally hidden and cannot be made to show itself.

Now, if you have access to the box, a simple ls in the directory you want more information on will suffice.
Is listing directory contents over HTTP a security risk or something? Why wouldn't they allow a user to do that?

They can always restrict what directories a user can view, but if all the files in a directory are downloadable by anyone, then I don't see the harm in allowing users to do an ls in that directory.
Old 12-06-2012, 11:29 PM   #10
ViRGE
Super Moderator
Elite Member
 
Join Date: Oct 1999
Posts: 29,274

Quote:
Originally Posted by Special K View Post
Is listing directory contents over HTTP a security risk or something? Why wouldn't they allow a user to do that?
That's exactly it. They don't want every document visible, especially if there are settings files located in that directory, or if it means server-side scripts that should be executed get transmitted to the user as source instead. Furthermore, a lot of sites like to have directories jump to an index of some kind (e.g., forums.anandtech.com really goes to forums.anandtech.com/index.php), which requires that the directory redirect to that file. There's no convention in HTTP for both doing that and showing the contents of a directory; it's one or the other.

Honestly, this is why FTP was invented. However, that's all but gone by the wayside for most download services now.
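
To make that concrete, directory listings are a per-directory setting in the web server configuration. A hedged Apache example (the directives are standard Apache ones; the file names are just illustrative):

# In httpd.conf or in a .htaccess file for the directory in question.
# Refuse to auto-generate a directory listing (the usual choice):
Options -Indexes
# If index.php exists in the directory, serve it instead of a listing:
DirectoryIndex index.php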