
OK, *nix scripting aficionados, can you help me out here?

clockhar

Senior member
I am trying to write a script that edits a simple XML file, changing only one tag. The tag contains a date in the form YYYY-MM-DD (e.g. <Date>2002-07-10</Date>). I plan on running this script daily using cron. How do I even get started?
That is, how do I parse the file to find the tag? And then, how do I insert the current date into the field?

Any help in getting started is soooooo much appreciated.

 
Depends on what tools you have available. What version of Unix is this (Solaris, Tru64, Linux, FreeBSD, etc.)? What shells are there (bash, zsh, sh, csh, etc.)? Are any scripting languages like Perl or Python available?
 
Originally posted by: Nothinman
Depends on what tools you have available. What version of Unix is this (Solaris, Tru64, Linux, FreeBSD, etc.)? What shells are there (bash, zsh, sh, csh, etc.)? Are any scripting languages like Perl or Python available?

Unix = HP-UX, shells = ksh, csh and sh. Perl is available, but I prefer not to use that (since I'm not familiar with it). On the other hand, I guess this would be the best way to learn perl. Does this help?

Any more info needed?
 
I can tell you a bit of a hackish way to get it done; I use it to create an index of files on my website.

http://www.mattvanmater.com/indexgenerator.tar.gz

Basically, I have a header and a footer for the file, and then something that runs a pattern matcher to insert what, in your case, would be the date inside the tags. I use cat to put the header into a new file, then the replaced text, then the footer. I used sed (the stream editor) to match and replace the pattern, which was the tricky part. It's kind of a crappy solution, and using Perl or some other language might be better, but this works just fine for now 🙂
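For what it's worth, here's a rough Python sketch of the same "cat the pieces together" idea, assuming Python is available and the header and footer are already plain strings. The function name `assemble` and the `<Config>` wrapper in the usage line are made up for illustration; this isn't how the linked script does it, just the general shape.

```python
import time
from typing import Optional


def assemble(header: str, footer: str, date: Optional[str] = None) -> str:
    """Glue header + a <Date> line + footer, like cat'ing three pieces into one file.

    `assemble` is a hypothetical helper, not something from the linked script.
    """
    if date is None:
        # Default to today's local date in YYYY-MM-DD form
        date = time.strftime("%Y-%m-%d")
    body = "<Date>" + date + "</Date>\n"
    return header + body + footer


# Usage with a made-up <Config> wrapper; prints three lines:
# <Config> / <Date>2002-07-11</Date> / </Config>
print(assemble("<Config>\n", "</Config>\n", "2002-07-11"), end="")
```

The nice part of this approach is that you never have to find the tag at all; the downside is that the header and footer have to be kept in sync with whatever the real file is supposed to contain.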
 
The substitution part is trivial with sed:

sed 's/<Date>....-..-..<\/Date>/<Date>2002-07-11<\/Date>/' yourfile.xml > yournewfile.xml

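This will replace the first date tag in the form <Date>YYYY-MM-DD</Date> on each line with <Date>2002-07-11</Date>; add a g after the final / to replace every occurrence on a line.
The regex for the date could probably be better, since . matches any character, not just digits.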
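If you want a tighter date pattern, here's roughly the same substitution sketched with Python's re module (assuming Python is around; `replace_date` is just an illustrative name). [0-9]{4} asks for exactly four digits where sed's .... accepts any four characters, and count=1 mirrors sed's default of replacing only the first match on a line.

```python
import re

# Exactly YYYY-MM-DD digits, unlike "....-..-.." which accepts any characters
DATE_TAG = re.compile(r"<Date>[0-9]{4}-[0-9]{2}-[0-9]{2}</Date>")


def replace_date(line: str, new_date: str) -> str:
    """Replace the first <Date>YYYY-MM-DD</Date> on a line, like sed's s/// without g."""
    # A function replacement keeps re from treating backslashes in new_date as escapes
    return DATE_TAG.sub(lambda m: "<Date>" + new_date + "</Date>", line, count=1)


print(replace_date("<Date>2002-07-10</Date>", "2002-07-11"))  # <Date>2002-07-11</Date>
print(replace_date("<Date>abcd-ef-gh</Date>", "2002-07-11"))  # <Date>abcd-ef-gh</Date> (unchanged)
```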

Getting today's date in there is a bit trickier; you'll probably have to wrap this sed command in another script that builds the date and substitutes it in.
Here's how I would do it in Python. bash should be even simpler if I knew bash scripting 😀

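#!/usr/bin/python

import os
import shlex
import sys
import time

# Today's local date as YYYY-MM-DD
today = time.strftime('%Y-%m-%d')
# Raw strings keep sed's \/ escapes from being treated as Python escapes
status = os.system(r"sed 's/<Date>....-..-..<\/Date>/<Date>" + today + r"<\/Date>/' " + shlex.quote(sys.argv[1]) + " > junk.xml")
# Only replace the original file if sed succeeded
if status == 0:
    os.rename('junk.xml', sys.argv[1])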



Of course, you could just use python to open the file and make the substitution by reading through all the lines.
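Something like this, for instance: a minimal sketch that reads the file, rewrites every <Date>YYYY-MM-DD</Date> tag with today's date, and writes it back. `update_date_tag` is a made-up name, and writing straight back over the file isn't atomic; if the script dies halfway through the write, you could end up with a truncated file.

```python
import re
import time
from typing import Optional

DATE_TAG = re.compile(r"<Date>[0-9]{4}-[0-9]{2}-[0-9]{2}</Date>")


def update_date_tag(path: str, new_date: Optional[str] = None) -> None:
    """Rewrite every <Date>YYYY-MM-DD</Date> tag in the file at `path` (hypothetical helper)."""
    if new_date is None:
        new_date = time.strftime("%Y-%m-%d")  # today's local date
    replacement = "<Date>" + new_date + "</Date>"
    with open(path) as f:
        lines = f.readlines()
    # Substitute line by line (every match on each line, like sed's g flag)
    new_lines = [DATE_TAG.sub(replacement, line) for line in lines]
    # Not atomic: opening with "w" truncates the file before the new contents are written
    with open(path, "w") as f:
        f.writelines(new_lines)


# Usage (e.g. from a cron job): update_date_tag("yourfile.xml")
```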

 
Use sed:


sed -e "s+<Date>.*</Date>+<Date>`date +%Y-%m-%d`</Date>+"

This will dump the result to stdout, so you can do something like this:

sed -e .... infile > $$ && mv $$ infile

This writes the output to a temporary file named after the shell's process ID ($$ expands to the PID of the current shell), then, if that succeeds, moves the temp file back over infile.
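The same write-to-a-temp-file-then-rename trick looks roughly like this in Python, if you go that route. It's a sketch: `rewrite_via_temp` is a hypothetical helper, and it uses tempfile.mkstemp instead of $$ to pick a unique name.

```python
import os
import shutil
import tempfile
from typing import Callable


def rewrite_via_temp(path: str, transform: Callable[[str], str]) -> None:
    """Write transform(old contents) to a temp file, then rename it over `path`.

    Mirrors `sed ... infile > $$ && mv $$ infile`: the original file is only
    replaced if the new version was written successfully.
    """
    with open(path) as f:
        old_text = f.read()
    # Same directory as the target, so the final rename stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(transform(old_text))
        shutil.copymode(path, tmp_path)  # mkstemp creates mode 0600; keep the original's bits
        os.replace(tmp_path, path)       # the "&& mv $$ infile" step
    except BaseException:
        os.remove(tmp_path)  # don't leave the temp file lying around on failure
        raise
```

Usage would be something like rewrite_via_temp("yourfile.xml", lambda text: text.replace("2002-07-10", "2002-07-11")). On POSIX systems the rename is atomic when both paths are on the same filesystem, which is why the temp file goes next to the target rather than in /tmp.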

 
Originally posted by: mgpaulus
Use sed:


sed -e "s+<Date>.*</Date>+<Date>`date +%Y-%m-%d`</Date>+"

He should give us a sample of the file containing the <Date>...</Date> tag just to make sure, because what if he encounters two <Date>...</Date> tags on the same line?

ex.

.
.
.
<Host name=myhost port=777>
<Logger level="1" name="error_"><Home>"/var/log/hello"</Home> <Date>2002-07-08</Date></Logger> <Logger level="3" name="access_"><Home>"/var/log/hello"</Home><Date>2002-07-08</Date></Logger> </Host>

.
.
.

Then the above command would return:
.
.
.
<Host name=myhost port=777>
<Logger level="1" name="error_"><Home>"/var/log/hello"</Home> <Date>2002-07-09</Date></Logger></Host>

.
.
.

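And wipe out the second <Logger> element by mistake, because .* is greedy: it matches everything from the first <Date> to the last </Date> on the line.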

Just a thought.
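To see the greedy behavior without touching a real file, here's a quick Python check using the sample line above. Python's re treats .* greedily the same way sed does for this pattern, and count=1 mimics sed without the g flag.

```python
import re

# Sample line from the post above (split across two string literals)
line = ('<Logger level="1" name="error_"><Home>"/var/log/hello"</Home> <Date>2002-07-08</Date></Logger> '
        '<Logger level="3" name="access_"><Home>"/var/log/hello"</Home><Date>2002-07-08</Date></Logger> </Host>')

# .* is greedy, so the match runs from the first <Date> to the LAST </Date>
result = re.sub(r"<Date>.*</Date>", "<Date>2002-07-09</Date>", line, count=1)
print(result)
print("access_" in result)  # False: the second <Logger> element is gone
```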

L8
 
I suppose you are right. So, the better option would be:


sed -e "s+<Date>[^<]*</Date>+<Date>`date +%Y-%m-%d`</Date>+"
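A quick Python comparison on the same sample line from earlier in the thread (Python's re behaves like sed for this pattern). One thing to watch: without the g flag, sed only touches the first match on each line, so with two <Date> tags on one line you'd want s+...+...+g to update both.

```python
import re

line = ('<Logger level="1" name="error_"><Home>"/var/log/hello"</Home> <Date>2002-07-08</Date></Logger> '
        '<Logger level="3" name="access_"><Home>"/var/log/hello"</Home><Date>2002-07-08</Date></Logger> </Host>')

FIXED = r"<Date>[^<]*</Date>"  # [^<]* can't run past the next tag
new = "<Date>2002-07-09</Date>"

first_only = re.sub(FIXED, new, line, count=1)  # like sed without g
every = re.sub(FIXED, new, line)                # like sed with g (count defaults to "all")

print(first_only.count("<Logger"), every.count("<Logger"))        # 2 2: both elements survive
print(first_only.count("2002-07-09"), every.count("2002-07-09"))  # 1 2
```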

 
Originally posted by: mgpaulus
I suppose you are right. So, the better option would be:


sed -e "s+<Date>[^<]*</Date>+<Date>`date +%Y-%m-%d`</Date>+"


Just an update: The above script works well for my needs. Just one question: What does [^<]* do?

EDIT: Here's another challenge: how would I search a file (the same XML file) for another date field (e.g. <date2>YYYY-MM-DD</date2>) and store that date in a variable?

I assume I would use a command like awk. But beyond that, I don't have a clue how to store the varying date between the date2 tags.
 
What does [^<]* do?

The first place to find out stuff like this would be the man pages (you are familiar with 'man sed', right?). However, the short answer is: [...] is used to indicate a set of characters. You might see something like [a-zA-Z] to indicate any letter, or [0-9.e+-] to indicate the characters that can make up a number (the - goes last so it isn't read as a range). If you place a ^ as the first character within the [], that negates the set. So, [^<] tells sed to look for any character that is not a <, and the * tells it to look for 0 or more occurrences of the preceding item. Since the match can't cross a <, <Date>[^<]*</Date> stops at the first </Date> instead of running on to the last one.
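If you want to poke at these without editing a file, Python's re module handles simple bracket expressions like these the same way sed does, so a few quick checks look like this (`matches` is just a throwaway helper):

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None


print(matches(r"[a-zA-Z]", "q"))          # True: one letter
print(matches(r"[^<]", "<"))              # False: ^ negates the set, so '<' is excluded
print(matches(r"[^<]*", "2002-07-10"))    # True: zero or more non-'<' characters
print(matches(r"[^<]*", ""))              # True: * also allows zero occurrences
print(matches(r"[^<]*", "07</Date>"))     # False: contains a '<'
print(matches(r"[0-9.e+-]*", "1.5e-3"))   # True: every character is in the set
```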

EDIT: Here's another challenge: how would I search a file (the same XML file) for another date field (e.g. <date2>YYYY-MM-DD</date2>) and store that date in a variable?

I assume I would use a command like awk. But beyond that, I don't have a clue how to store the varying date between the date2 tags.


Actually, you can use sed again. There are a couple of ways to do it, but probably the easiest for me would be to use a subexpression (a slightly more advanced sed feature):
myVar=`sed -n -e 's+.*<date2>\([^<]*\)</date2>.*+\1+p' yourfile.xml`

-n tells sed not to print anything unless the p flag is used to explicitly print it.
The leading .* and trailing .* gobble up everything on the line outside the part we want, so we don't get any spurious output.
The \( \) capture a subexpression, creating a sort of buffer that can be used later.
The \1 references the buffer we created earlier.
The trailing p tells sed to print the result of the substitution we just made.

So, we read in each line. Every time we run into a <date2></date2> pair, we grab what's between the tags and print only that; all other output is suppressed. (If a line has more than one pair, the greedy leading .* means you get the last one.)
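And if the rest of the job ends up in Python anyway, the equivalent extraction is a capture group, where group(1) plays the role of sed's \1. This is only a sketch: `read_date2` is a made-up name, and unlike the sed line (which prints the last pair on every matching line), it returns just the first pair in the whole text.

```python
import re
from typing import Optional

DATE2 = re.compile(r"<date2>([^<]*)</date2>")


def read_date2(text: str) -> Optional[str]:
    """Return what's between the first <date2>...</date2> pair, or None if there isn't one."""
    m = DATE2.search(text)
    # group(1) is the captured subexpression, like \1 in the sed command
    return m.group(1) if m else None


print(read_date2("<Config><date2>2002-07-15</date2></Config>"))  # 2002-07-15
```

To read from the actual file you'd pass it the file contents, e.g. with open("yourfile.xml") as f: my_var = read_date2(f.read()).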

HTH....

BTW, if you are really serious about playing with *nix, and you have to do it for your job, then it would behoove you to convince your boss to send you to an Advanced Unix class, where they teach you about some of the more useful tools like awk, sed, etc. The basic class is usually all stuff you can get from a half-assed book and/or your friends, but some of the advanced stuff is nice to have from a class, because sometimes you don't know what to look for until you've seen it (chicken/egg problem...) Just my $0.02, tho.



 
Hey, thanks for the info. That also works just fine.

Personally, I would not mind attending an advanced UNIX class (instead of scouring the 'net for basic info). However, I am merely a summer intern, so the company does not see any benefit (for the company) in sending me to a class. So instead, as part of my project, I'm stuck searching through tons of man pages, websites, and any other piece of literature to learn more about UNIX stuff.

Oh well, I figure the more I practice using it (at home or at work), the more prepared I'll be once I get a "real" job.

 