- Nov 25, 2008
Hello all-
I'm currently working on a personal "project" where I...
1. take an RSS feed URL (for example, http://www.tntradioempire.com/rss/?type=podcasts&format=rss&path=/AUDIO/podcast)
2. download the entire page (with links) via wget: $ wget "http://url.link.here..." -O log.txt
3. filter the log with: grep -w http log.txt > download.txt (download.txt is a second file)
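A minimal offline sketch of steps 2 and 3, assuming a small hypothetical log.txt in place of a real wget download. The `<link>` line and its example.com URL are made up for illustration; the enclosure line is the one quoted in this post. One detail worth checking: grep treats every argument after the pattern as an input file, so download.txt is only written if the output is redirected with `>`.

```shell
# Hypothetical stand-in for the RSS page that wget would save in step 2.
cat > log.txt <<'EOF'
<rss version="2.0">
<channel>
<link>http://www.example.com/podcast</link>
<item>
<enclosure url="http://media.journalinteractive.com/audio/07010930.mp3" length="10323342" type="audio/mpeg"/>
</item>
</channel>
</rss>
EOF

# Step 3: -w keeps lines where "http" appears as a whole word.
# The > redirect sends the matching lines into download.txt.
grep -w http log.txt > download.txt
cat download.txt
```

Note that `-w http` keeps every line containing a link, not just the mp3 enclosures, so download.txt may hold more than the lines you actually want.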
Right now the second file (download.txt) has many lines, each of the form:
<enclosure url="http://media.journalinteractive.com/audio/07010930.mp3" length="10323342" type="audio/mpeg"/>
(NOTE: each ########.mp3 file name is a timestamp of when that audio clip aired; the feed belongs to a radio station in the Midwest.)
I'm trying to parse the second file (download.txt) further to get the exact web address of each mp3 file. So far I've tried some basic regular expressions (I think I tried s/^http/) to extract the http link, but I just get the same lines of text back unchanged.
Which grep, sed, awk, or other Linux command could I use to extract the http link from each line and write the links to another file?
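The likely reason `s/^http/` changed nothing: `^` anchors the match to the start of the line, and every line here starts with `<enclosure`, so the pattern never matches, and sed prints unmatched lines unchanged by default. A sketch of one sed approach, assuming each line has the enclosure format above; `mp3links.txt` is a hypothetical output file name.

```shell
# Recreate a one-line download.txt matching the format quoted in the post.
printf '%s\n' '<enclosure url="http://media.journalinteractive.com/audio/07010930.mp3" length="10323342" type="audio/mpeg"/>' > download.txt

# -n turns off sed's default "print every line"; the trailing p prints
# only lines where the substitution succeeded. The group \(http[^"]*\)
# captures from "http" up to (not including) the next double quote,
# and \1 replaces the whole line with that captured URL.
sed -n 's/.*url="\(http[^"]*\)".*/\1/p' download.txt > mp3links.txt
cat mp3links.txt
```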
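Two more possible approaches, sketched under the same assumption that each line looks like the enclosure line above (`links_grep.txt` and `links_awk.txt` are hypothetical names). `grep -o` is supported by GNU grep, which is standard on Linux, but it is not part of POSIX grep.

```shell
# Recreate a one-line download.txt matching the format quoted in the post.
printf '%s\n' '<enclosure url="http://media.journalinteractive.com/audio/07010930.mp3" length="10323342" type="audio/mpeg"/>' > download.txt

# grep -o prints only the matched text, one match per output line.
# -E enables extended regex; [^"]+ stops the match before a double quote.
grep -oE 'http://[^"]+\.mp3' download.txt > links_grep.txt

# awk alternative: split each line on double quotes. In the enclosure
# format shown, field 2 is the value of the url attribute.
awk -F'"' '/<enclosure/ { print $2 }' download.txt > links_awk.txt

cat links_grep.txt links_awk.txt
```

Either command can also be run directly on log.txt, which would make the intermediate download.txt step unnecessary.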
Thank you in advance for your suggestions and have a great 4th of July!
EDIT: forgot a specific detail that might further assist the reader.