First, I just want to say that I don't want the actual code for solving this. I'm simply looking for other ideas on HOW to solve it, as in, talked through in plain English. It's a minor programming exercise I have to do for a potential job. I'll give my current idea (after thinking about it for only about 5 minutes) after the instructions. Here they are:
Code:
You get a document and a lexicon (part of speech dictionary).
The lexicon contains a part-of-speech label for each word it defines.
There are many words that are not defined. The file is named 'lexicon'.
There is one definition per line, the format is "WORD<TAB>PartOfSpeech"
The task is to process the document, adding the part of speech
whenever it's defined. No re-formatting should be done.
For example, if the lexicon is
THE article
BROWN adjective
BAG noun
and the document is
The small brown bag ripped.
Then your output should be
The/article small brown/adjective bag/noun ripped.
This is going to be solved in Java.
What I'm thinking is to read the 'lexicon' file into a HashMap, reading the file line by line and splitting each line (String.split, most likely) on the tab delimiter between the two strings. The KEY and OBJECT would then be "THE" and "article" for the first line of the lexicon above.
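For anyone who wants to see that lexicon step spelled out, here is a minimal sketch of it. The class name `LexiconLoader` and the method `fromLines` are made up for illustration. It also assumes two things the task statement does not say outright: keys are stored upper-cased (the sample lexicon is all caps), and lines without a tab are silently skipped.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; the class and method names are illustrative only.
class LexiconLoader {

    // Turns "WORD<TAB>PartOfSpeech" lines into a map from word to part of speech.
    static Map<String, String> fromLines(Iterable<String> lines) {
        Map<String, String> lexicon = new HashMap<>();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (tab <= 0) {
                continue; // assumption: skip blank lines and lines with no word before the tab
            }
            // Assumption: normalize keys to upper case so later lookups can ignore case.
            String word = line.substring(0, tab).toUpperCase(Locale.ROOT);
            String partOfSpeech = line.substring(tab + 1).trim();
            lexicon.put(word, partOfSpeech);
        }
        return lexicon;
    }

    public static void main(String[] args) {
        Map<String, String> lexicon =
                fromLines(Arrays.asList("THE\tarticle", "BROWN\tadjective", "BAG\tnoun"));
        System.out.println(lexicon.get("THE")); // prints: article
    }
}
```

For the real file, the lines could come from `Files.readAllLines(Paths.get("lexicon"))` instead of the in-memory list used here.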
After that, I'd simply parse the 'document' file line by line, where a 'word' is whatever sits between 2 space characters. Then, going word by word, I'd check whether the KEY exists in the HashMap, and if so, get the OBJECT, append "/<OBJECT>" to the word, and output it. I know I didn't explain the exact formatting of the output, but that part isn't really a concern to me.
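A rough sketch of that word-by-word pass, under some assumptions: `SimpleTagger` and `tagLine` are hypothetical names, the lexicon is already in a map with upper-cased keys, and lookups ignore case because the sample lexicon has "THE" while the sample document has "The". Splitting with a limit of -1 keeps empty tokens, so runs of spaces come back out unchanged ("No re-formatting should be done").

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; names are illustrative only.
class SimpleTagger {

    // Tags every space-delimited token that appears in the lexicon.
    static String tagLine(String line, Map<String, String> lexicon) {
        String[] tokens = line.split(" ", -1); // -1 keeps empty tokens between repeated spaces
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                out.append(' '); // put back the single space that split() removed
            }
            out.append(tokens[i]);
            // Assumption: lexicon keys are upper case, so upper-case the token for lookup.
            String partOfSpeech = lexicon.get(tokens[i].toUpperCase(Locale.ROOT));
            if (partOfSpeech != null) {
                out.append('/').append(partOfSpeech);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> lexicon = new HashMap<>();
        lexicon.put("THE", "article");
        lexicon.put("BROWN", "adjective");
        lexicon.put("BAG", "noun");
        System.out.println(tagLine("The small brown bag ripped.", lexicon));
        // prints: The/article small brown/adjective bag/noun ripped.
    }
}
```

Note that this version treats "bag." as one token, so a lexicon word followed directly by punctuation is left untagged.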
I will also have to special-case a period at the end of a sentence. For instance, if the 'document' was...
I hate the brown bag.
I believe the output should be
I hate the/article brown/adjective bag/noun.
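One possible way to get that output without a special case just for periods is to scan for runs of letters and copy everything else through untouched. This is only a sketch under assumptions: `PunctuationAwareTagger` is a hypothetical name, a "word" is assumed to be a maximal run of letters (so "don't" would be looked up as "don" and "t"), and the lexicon map is assumed to use upper-cased keys.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
class PunctuationAwareTagger {

    // Assumption: a word is a maximal run of Unicode letters.
    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    static String tagLine(String line, Map<String, String> lexicon) {
        Matcher matcher = WORD.matcher(line);
        StringBuilder out = new StringBuilder();
        int copiedUpTo = 0;
        while (matcher.find()) {
            // Copy the untouched text before the word, plus the word itself.
            out.append(line, copiedUpTo, matcher.end());
            String partOfSpeech = lexicon.get(matcher.group().toUpperCase(Locale.ROOT));
            if (partOfSpeech != null) {
                out.append('/').append(partOfSpeech);
            }
            copiedUpTo = matcher.end();
        }
        // Copy whatever follows the last word (for example the final period).
        out.append(line, copiedUpTo, line.length());
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> lexicon = new HashMap<>();
        lexicon.put("THE", "article");
        lexicon.put("BROWN", "adjective");
        lexicon.put("BAG", "noun");
        System.out.println(tagLine("I hate the brown bag.", lexicon));
        // prints: I hate the/article brown/adjective bag/noun.
    }
}
```

Because everything that is not a letter is copied through as-is, commas, quotes, and repeated spaces are handled the same way as the trailing period.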
But what I'm wondering is whether anyone has other suggestions that may be more efficient or more out of the box than this one. My solution just seems very straightforward. But again, I've only thought about this for a few minutes.