More C++ questions

NTB

Diamond Member
Mar 26, 2001
5,179
0
0
I'm working on another program for my C++ class, an extension on the word parser that I posted about here a few weeks ago. This time I'm dealing with the C++ string class rather than the older c-strings, but other than that, the base of the program is still the same. It still parses input for individual words.

Here is what I have to do: Read an input file and parse the individual words out of it (words will be returned in all lower-case and stripped of any punctuation, to make a later part of the program easier). Then I have to go through and count - up to a maximum of 100 - the # of unique words within the file, and display them in the DOS/command window at the end of the program.

The first part of the project is done; the file can be read, the words are parsed out and returned in lower-case, and all the punctuation is stripped out. What I'm having trouble with is figuring out how to implement the unique word count. What I am thinking is to put the first word read into an array. After that, compare each consequent word to each element in the array. If the word is found, do nothing; if the word is not found, add it to the array. For each element added to the array, increment a counter for the # of unique words by 1. This sounds easy, but I've been sitting here fiddling with my code all night, to no avail. I cannot get the comparison and counter to work at all. Any suggestions?

PS - sorry for so many questions, but my proffessor stinks at teaching (class oppinion, not just mine :p) and the book I've got only tells me so much. I'm stumped.

Nate
 

Heifetz

Golden Member
Oct 9, 1999
1,398
0
0
If performance does not matter, you can do a really repetitive and inefficient method of just getting one word, and going through your entire array, and counting the number of instances that word appears. Otherwise, I suggest you look for info on the c++ stl, and use some kind of a dictionary container that maps a key to value. I think the STL should also have algorithms that let you hash the word and find if it exists in the dictionary. I haven't done a lot of work with the STL so i'm not completely sure about that.


Heifetz
 

NTB

Diamond Member
Mar 26, 2001
5,179
0
0
From the way the professor was talking, the way that I outlined in my first post (brute force, essentially) is the way he wanted us to do it.

Edit: performance really isn't an issue; this is just a little console program. The file being read in consists of just a couple lines of text, not Steven King's The Stand :p.

Nate
 

Mucman

Diamond Member
Oct 10, 1999
7,246
1
0
If you aren't marked on efficieny, then simply do it the simplest most fool proof way. When I was taking programming courses I would always try to
optimize my code and end up spending many hours digging myself out of a hole that my 'optimizations' caused ;)

My solution would be to create a LinkedList of words. Create an add to list method call that doesn't insert duplicates. Then the number of unique
words is simply the length of this list. Crude, but it should work.

 

AlexWade

Member
Sep 27, 2003
89
0
0
Can you make a tree data structure? Tree are much more complex, but their efficiency can't be beat. Here is how you can do it:

typedef struct treeNode *treePtr;
struct treeNode {
String word;
int count;
treePtr left, right;
}

Now the first word you encounter can be the root of the tree. The next word is either before or after the word. If it is before, add to the right, if it is after, to the left. If it is equal, increment the count.

In your constructor:
treePtr root = NULL;

some add word method:

treePtr current = root;
treePtr parent = NULL;

while(current != NULL) {
if (strcmp(current -> word, myCurrentWord) {
current -> count += 1;
current = NULL;
return;
}

parent = current;
if (myCurrentWord > current -> word) current = current -> left;
else current = current -> right;
}

// At the bottom, add here
current = new treePtr;
if (myCurrentWord > current -> word) parent -> left = current;
else parent -> right = current;

current -> count = 1;
strcpy(current -> word, myCurrentWord);
current -> left = current -> right = NULL;
current = NULL;


Of course, I just made this up in 10 minutes. You can modify it to make it more efficient or error-free. But the idea is like this. If, for some strange reason, all words after the first begin after or before the first, then the tree basically becomes a linked-list. Under normal conditions, you don't have to compare or sort hundreds or thousands of words, just some of them.
 

NTB

Diamond Member
Mar 26, 2001
5,179
0
0
Almost have it figured out; I just need a little help figuring out how to construct what I'm trying to do here. First of all, I have an array:

string unique_words[100] //100 words allowed, max

then, after I read in the file, I put the first word parsed out into the array, since I know that it, at least, is unique. Also, I increment a counter for the # of unique words in the file:

unique_words[0] = word //word is a variable containing whatever is returned from the parsing part of the program
uword_count++;

After this, for each word that I return from the parser, I have to compare that word to the ones already in the array. If I find a match, I don't do anything; if I don't find a match, it must be another unique word. so I add it to the array at unique_word[uword_count-1], and increment the counter again to account for the new word.

That was the part I'm having trouble with. The rest is easy: display the total # of unique words, and display the contents of the unique_word array. This I already have worked out.

Here is the source code that I'm working with; right now all it does is count the TOTAL number of words in the file:

int main(int argc, char *argv[]){

system("CLS");
ifstream in_stream;
ofstream out_stream;

string input_line; // input sentence
string word, unique_words[100]; // holds parsed words
int word_count,total_count=0; // counter for words
int uword_count=0 ;
Text_Parser input_parser; // an object of class Text_Parser

in_stream.open(argv[1]);
if(argv[1]==""){
cout<<"\n Error! no input file specified. please restart the program & try again."<<endl;
cout<<"Bye...";
system("PAUSE");
exit(1);
}//end no argv[1] error catcher

if(in_stream.fail()){
cout<<"\nError! Failed to open input file. Please restart the program & try again."<<endl;
cout<<"Bye...";
system("PAUSE");
exit(2);
}//end input file failure error catcher

while(!in_stream.eof()){

getline(in_stream, input_line); // read input sentence
input_parser.txt_string(input_line); // initialize Text_Parser object
word_count = 0;
word = input_parser.next_word(); // get first word from sentence

while (!word.empty()){ // if not end of sentence ...

word_count ++; // count this word
word = input_parser.next_word(); // get next word from sentence

}//end word.empty() loop



input_parser.reset(); // reset the parser to count words again
input_parser.display(); // display the paragraph

total_count = total_count+word_count;
}//end file-read loop
cout<<"\nthere are "<<total_count<<" unique words total in this file.\n"<<endl;

for(int j=0; j<uword_count; j++){
cout<<unique_words[j]<<" ";
if((i+1)%5==0){
cout<<"\n"<<endl;
}//end if
}//end loop

in_stream.close();
cout << "\n\n Bye ...";
system("PAUSE");
return 0;

}//end main

EDIT: stupid text tags... Oh, and sorry it looks so bad; it's all indented in my real .cpp file.

Nate
 

NTB

Diamond Member
Mar 26, 2001
5,179
0
0
OK, getting a little further... I've got it to the point now where it will find the first couple of unique words. But once it hits the first duplicate, the program stops looking & jumps down to the next paragraph in the text. There it finds the next unique word (first word of the paragraph) and the next word is also a duplicate. At this point, since there are only two paragraphs in the text, it stops and goes to the end of the program - the pause before it actually exits.

Nate
 

kalster

Diamond Member
Jul 23, 2002
7,355
6
81
wondering if your done with this?

you use the map STL object to easily make <string, count> pairs, once you have the the whole input string, just find substring after substring with space delimiter

and keep adding it to the map

you coud do something like
typedef string_count_pair
{
string s;
int count;
}string_count_pair_type;


list<string_count_pair_type> my_list;

int& string_count(string &s)
{
for(i=0;i<my_list.count;i++)
if(my_list.name==s)
return my_list.count;

my_list.push_end(Pair(s,0);


return my_list[my_list.end()-1].count;

}


after this all you need to do is iterate thru the whole string and find substrings (space seperated0


while(there are still substrings)

string_count()++;


}



edit : just checked that you only want unique words (just modify my function to add a new word if not already present in the list;

 

GiLtY

Golden Member
Sep 10, 2000
1,487
1
0
I just skimmed your question, does C++ have any string comparing functions? I know in C there is strcmp, so maybe you can look into that.

--GiLtY
 

kalster

Diamond Member
Jul 23, 2002
7,355
6
81
Originally posted by: GiLtY
I just skimmed your question, does C++ have any string comparing functions? I know in C there is strcmp, so maybe you can look into that.

--GiLtY

string STL object in C++ has many functions , comparison operator == is overloaded as well
 

DaveSimmons

Elite Member
Aug 12, 2001
40,730
670
126
Check the dates guys, if NTB hasn't turned in the assignment by now he's not getting a very good grade on it ;)