How to list directory contents of a web folder via linux command?

Discussion in 'Operating Systems' started by Special K, Dec 5, 2012.

  1. Special K

    Special K Diamond Member

    Joined:
    Jun 18, 2000
    Messages:
    7,101
    Likes Received:
    0
     This seems like it would be simple, but Google is failing me. How can I retrieve a listing of the files contained within a directory on a website? I know about wget and curl, but those only seem to work when you know the exact filename. Is there a way to simply retrieve a listing of the files in a particular directory?
     
    #1 Special K, Dec 5, 2012
    Last edited: Dec 5, 2012

  3. ViRGE

    ViRGE Elite Member, Moderator Emeritus

    Joined:
    Oct 9, 1999
    Messages:
    31,349
    Likes Received:
    91
     If we're talking about over HTTP, you can't. Unless the server is set up to show a directory listing, that information is intentionally hidden and there's no way to force it to appear.

    Now if you have access to the box, then a simple ls in the directory you want more information on will suffice.
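     A quick way to check whether the server exposes a listing at all is to just request the directory URL and see what comes back (the URL below is only an example):

     # If listings are enabled you'll usually get an auto-generated "Index of /..." page;
     # otherwise you'll see the site's index page, a 403, or a 404.
     curl -s -o /dev/null -w "%{http_code}\n" http://site.com/podcasts/
     curl -s http://site.com/podcasts/ | grep -i "index of"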
     
  4. Special K

    Special K Diamond Member

    Joined:
    Jun 18, 2000
    Messages:
    7,101
    Likes Received:
    0
     Yes, this is over HTTP. I found a site with a daily podcast I want to check out, and I want to set up a script to automatically download the latest one each night. The podcast filenames have a regular format, except for an underscore at the end followed by what appear to be random characters, or at least characters without any discernible pattern. The filenames do include the current date.

    UPDATE: I have discovered that the "random" characters are actually a 4-digit code that specifies the time the podcast was uploaded. For example, if the podcast was uploaded at 3:13 AM, the podcast file would end with _0131a. Nevertheless, this still doesn't help me, as the podcasts are uploaded at more or less random times. All I can be sure of is that there will be one AM and one PM podcast each day.

    How can I download the latest one of these automatically if I don't know the exact complete filename? If I could do directory listings it would be simple.
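     The only fallback I can think of is brute-forcing the time code, roughly like this (the base URL, prefix, and extension below are placeholders, and I'm assuming the code is just HHMM in 12-hour time plus a/p):

     #!/bin/sh
     # Crude brute force: try every possible 4-digit time code for today's date
     # until one of the downloads succeeds. Up to ~1440 requests, so not pretty.
     base="http://site.com/podcasts"      # placeholder
     day=$(date +%Y%m%d)                  # assumes the date appears as YYYYMMDD
     for half in a p; do
         for h in $(seq -w 1 12); do
             for m in $(seq -w 0 59); do
                 file="podcast_${day}_${h}${m}${half}.mp3"   # placeholder pattern
                 if wget -q "${base}/${file}"; then
                     echo "Downloaded ${file}"
                     break 3              # stop at the first hit
                 fi
             done
         done
     done

     Is there a cleaner way than hammering the server with hundreds of requests?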
     
    #3 Special K, Dec 5, 2012
    Last edited: Dec 5, 2012
  5. jimmybgood9

    jimmybgood9 Member

    Joined:
    Sep 6, 2012
    Messages:
    59
    Likes Received:
    0
     Read wget's man page. It supports recursive retrieval of web content and spidering. Try something like:

     wget --spider -np -r -l 1 -A 'podcast_*' http://site.com/podcasts/
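     Keep in mind that only works if the server actually generates an index page with links to the files; wget can only follow links it can see, it can't conjure up a listing the server refuses to send. If there is an index page, you can also skip the spider step and let wget grab just today's files directly (the URL and filename pattern here are guesses, adjust to match the real site):

     wget -np -r -l 1 -nd -A "podcast_$(date +%Y%m%d)*" http://site.com/podcasts/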
     
  6. Special K

    Special K Diamond Member

    Joined:
    Jun 18, 2000
    Messages:
    7,101
    Likes Received:
    0
    Thanks jimmybgood9, I'll have to investigate that.
     
  7. ViRGE

    ViRGE Elite Member, Moderator Emeritus

    Joined:
    Oct 9, 1999
    Messages:
    31,349
    Likes Received:
    91
    Use their RSS feed to get the URL. This scenario is precisely what it's for.
     
  8. Special K

    Special K Diamond Member

    Joined:
    Jun 18, 2000
    Messages:
    7,101
    Likes Received:
    0
    How would I do that via command line in linux?
     
  9. ViRGE

    ViRGE Elite Member, Moderator Emeritus

    Joined:
    Oct 9, 1999
    Messages:
    31,349
    Likes Received:
    91
    It sounds like Podracer will do the trick for you.
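     If you'd rather not install anything, the same thing can be hacked together with curl and grep. A rough sketch (the feed URL is a placeholder, and it assumes the newest episode is listed first; a real XML parser would be more robust):

     # Grab the first <enclosure url="..."> from the feed and download it.
     url=$(curl -s http://site.com/podcast.rss \
           | grep -o 'enclosure url="[^"]*"' \
           | head -n 1 \
           | sed 's/enclosure url="//; s/"$//')
     wget "$url"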
     
  10. Special K

    Special K Diamond Member

    Joined:
    Jun 18, 2000
    Messages:
    7,101
    Likes Received:
    0
    Is listing directory contents over HTTP a security risk or something? Why wouldn't they allow a user to do that?

    They can always restrict what directories a user can view, but if all the files in a directory are downloadable by anyone, then I don't see the harm in allowing users to do an ls in that directory.
     
  11. ViRGE

    ViRGE Elite Member, Moderator Emeritus

    Joined:
    Oct 9, 1999
    Messages:
    31,349
    Likes Received:
    91
     That's exactly it. They don't want every document visible, especially if there are settings files in that directory, or if it means server-side scripts that should be executed get transmitted to the user instead. Furthermore, a lot of sites like to have directories jump to an index of some kind (ex: forums.anandtech.com really goes to forums.anandtech.com/index.php), which requires that the directory redirect straight to that file. There's no convention in HTTP for both doing that and showing the contents of a directory; it's one or the other.

     Honestly, this is why FTP was invented. However, it has all but fallen by the wayside for most download services now.