Help with C++ programming

Unax

Junior Member
Jun 16, 2001
Hi there,
I have to write code that does the following: remove HTML tags. Basically, I have to turn a .html document into a .txt file.

E.g., turn:
<html>
<h1>The Dandy Warhols Discography</h1>
</html>

Into:
The Dandy Warhols Discography

How do I do that? Don't worry about file manipulation; I just want to know how to
remove those tags.

Thanks in advance,

Unax

 

BlackOmen

Senior member
Aug 23, 2001
Hmm, just from thinking about it for a few seconds, what I would do is:

-open the .html file and the .txt file
-upon reading a '<', I would push it onto a stack
I would then use ins.ignore() to skip all characters while ins.peek() != '>'
-when the loop ends, I would ignore the '>' and pop the '<'
-any other text would be streamed into the text file

Like I said, that's what I came up with after thinking about it for a few seconds. You can choose any other method to handle < and >, but if you were so inclined to check, you could verify the html file was formatted properly by checking whether or not the stack is empty. Hope it helps.
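
For what it's worth, the steps above could be sketched roughly like this. It is only a minimal sketch: the function name strip_tags and the idea of returning the "stack is empty" check as a bool are my own assumptions, and it treats every '<' as the start of a tag, so it does not handle comments, scripts, or a '>' inside an attribute value.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <stack>

// Hypothetical helper (name and signature are assumptions, not from the thread).
// Copies text from `ins` to `outs`, skipping everything between '<' and '>'.
// Returns true if every '<' was closed by a '>' (the stack ended up empty).
bool strip_tags(std::istream& ins, std::ostream& outs) {
    std::stack<char> open_brackets;
    const int end_of_input = std::istream::traits_type::eof();
    char c;
    while (ins.get(c)) {
        if (c == '<') {
            open_brackets.push(c);
            // Skip characters until the next one is '>' or the input runs out.
            int next;
            while ((next = ins.peek()) != '>' && next != end_of_input) {
                ins.ignore();
            }
            if (next == '>') {
                ins.ignore();          // consume the '>'
                open_brackets.pop();   // this '<' is now matched
            }
        } else {
            outs.put(c);               // ordinary text goes to the output
        }
    }
    return open_brackets.empty();
}
```

With real files you would pass a std::ifstream and a std::ofstream instead of string streams. Note that the line breaks between tags are copied through unchanged, so the output may contain blank lines.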
 

EagleKeeper

Discussion Club Moderator, Elite Member
Staff member
Oct 30, 2000
I second what BlackOmen stated.

However, rather than using a stack:
Scan for the right angle bracket >.
Dump any and all characters after that point to your output until you detect the left angle bracket <.
Then repeat the sequence.
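
A rough sketch of that scan-and-dump loop, assuming the whole document has already been read into a std::string. The name strip_tags_scan is made up for illustration, and I have assumed that any text before the first tag should be kept as well, and that an unterminated tag at the end is simply dropped.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (name is an assumption, not from the thread).
// Returns the text of `html` with everything between '<' and '>' removed.
std::string strip_tags_scan(const std::string& html) {
    std::string text;
    std::size_t pos = 0;
    while (pos < html.size()) {
        // Find the next '<' and copy everything before it to the output.
        std::size_t lt = html.find('<', pos);
        if (lt == std::string::npos) {
            text.append(html, pos, std::string::npos);  // no more tags
            break;
        }
        text.append(html, pos, lt - pos);
        // Find the matching '>' and resume just after it.
        std::size_t gt = html.find('>', lt);
        if (gt == std::string::npos) {
            break;  // unterminated tag: drop the rest
        }
        pos = gt + 1;
    }
    return text;
}
```

No stack is needed here because the loop only ever tracks one thing: the position just past the last '>'.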