I need Scanner to read but not iterate

Danimal1209

Senior member
Nov 9, 2011
For a class assignment, I need to read an entire document, but I need to read it as n-grams, where n is user defined.

So, for a 3-gram, at the start of the file I need to read the first three words, then put that three-word String into a hash table. Next, I need to read words 2, 3, and 4, then 3, 4, and 5, and so on.

My problem is that I have no idea how to go about this. If I read the second word in the file to add it to my String, then I lose it for the start of the next String.

Any ideas?
 

BrightCandle

Diamond Member
Mar 15, 2007
Depending on the language, you may be able to use a PushbackInputStream, which lets you unread some of what you have read. You can certainly simulate it yourself with a decorator class that provides a pushback method and stores pushed-back words in an in-memory buffer that is read first.
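A minimal sketch of that decorator idea, assuming the words come from a java.util.Scanner. The class name PushbackWordReader and its methods unread and readNGrams are hypothetical, not part of any library. After reading an n-gram, it pushes words 2..n back so the next read starts one word later:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Scanner;

// Hypothetical decorator around java.util.Scanner that lets callers "unread" words.
// Pushed-back words sit in an in-memory buffer that is always read before the Scanner.
final class PushbackWordReader {
    private final Scanner scanner;
    private final Deque<String> buffer = new ArrayDeque<>();

    PushbackWordReader(Scanner scanner) {
        this.scanner = scanner;
    }

    boolean hasNext() {
        return !buffer.isEmpty() || scanner.hasNext();
    }

    // Returns the most recently pushed-back word if there is one, else the next word from the Scanner.
    String next() {
        return buffer.isEmpty() ? scanner.next() : buffer.pop();
    }

    // Puts a word back; the next call to next() returns it.
    void unread(String word) {
        buffer.push(word);
    }

    // Reads n words as one n-gram, then unreads words 2..n
    // so the next n-gram starts one word later.
    static List<String> readNGrams(Scanner in, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        PushbackWordReader reader = new PushbackWordReader(in);
        List<String> grams = new ArrayList<>();
        while (true) {
            List<String> window = new ArrayList<>();
            while (window.size() < n && reader.hasNext()) {
                window.add(reader.next());
            }
            if (window.size() < n) {
                break; // fewer than n words remain
            }
            grams.add(String.join(" ", window));
            // Push back in reverse order so the words come out again as 2, 3, ..., n.
            for (int i = n - 1; i >= 1; i--) {
                reader.unread(window.get(i));
            }
        }
        return grams;
    }
}
```

For example, `PushbackWordReader.readNGrams(new Scanner("the quick brown fox"), 3)` yields "the quick brown" and "quick brown fox". Java's own PushbackInputStream pushes back bytes rather than words, which is why this sketch buffers whole words instead.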
 

Danimal1209

Senior member
Nov 9, 2011
I'm using Java, and I can't use the approach you mentioned because it's not something we learned in class. I emailed my teacher, but he hasn't replied in two days.
 

LumbergTech

Diamond Member
Sep 15, 2005
I don't quite understand your problem. Can't you get the old words back out of the hash table and combine them with the new ones?
 

Danimal1209

Senior member
Nov 9, 2011
I would need to take out the previous n-gram, remove the first word, and then append the result of next(). Is there a method for removing the first word of a String?
 

LumbergTech

Diamond Member
Sep 15, 2005
Danimal1209 said: "I would need to take out the previous n-gram, remove the first word and then append on the next(). Is there a method for removing the first word of a String?"

Use the split method of the String class:


Code:
String myStr = "Hello There Man";
String[] words = myStr.split(" ");
String firstWord = words[0]; // "Hello"
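For the original question (drop the first word of the previous n-gram, then append the next word), String.indexOf and String.substring can do it without splitting the whole string. This is a sketch that assumes words in the stored n-gram are separated by single spaces; the class GramSlider and its method slide are hypothetical names:

```java
// Hypothetical helper: slides an n-gram forward by one word.
// Assumes the words inside gram are separated by single spaces.
final class GramSlider {
    static String slide(String gram, String nextWord) {
        int space = gram.indexOf(' ');
        if (space < 0) {
            return nextWord; // a 1-gram: the whole gram is the first word
        }
        // Everything after the first space, plus the new word at the end.
        return gram.substring(space + 1) + " " + nextWord;
    }
}
```

For example, `GramSlider.slide("Hello There Man", "Again")` gives "There Man Again".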
 

esun

Platinum Member
Nov 12, 2001
You don't "lose" anything; you're thinking too much about the implementation and not about what you're doing conceptually. The things you read don't just disappear. You can store them if you need them.

I'd say the easiest way to go about this would be to have a FIFO queue of words of length n. You populate that with the first n words in the file. Then you iterate over the rest of the words in the file one at a time, each time removing the first word from the list and adding the new word to the end of the list.

Conceptually, you're just buffering a stream into a FIFO of length n and returning (or rather, adding to a hash table) the n items in the FIFO at each iteration.
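A minimal sketch of that FIFO approach, using java.util.ArrayDeque as the queue. It assumes the "hash table" should map each n-gram to the number of times it occurs; the class name FifoNGrams is hypothetical. Adding the new word before removing the oldest one is equivalent to the remove-then-add order described above:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

final class FifoNGrams {
    // Counts every n-gram from the Scanner using a FIFO window of at most n words.
    static Map<String, Integer> count(Scanner in, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        Deque<String> window = new ArrayDeque<>();
        Map<String, Integer> counts = new HashMap<>();
        while (in.hasNext()) {
            window.addLast(in.next());      // newest word joins the back of the queue
            if (window.size() > n) {
                window.removeFirst();       // oldest word leaves the front
            }
            if (window.size() == n) {
                // ArrayDeque iterates front to back, so the words stay in reading order.
                counts.merge(String.join(" ", window), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Only n words are ever held in memory, so this works the same for a short string or a large file passed in as `new Scanner(new File(...))`.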
 

piasabird

Lifer
Feb 6, 2002
Duh, just use a counter. Java is very clumsy when it comes to splitting up words. Also, there are many different ways to read text from files. How do we know which specific methods you are allowed to use?

This is what being a programmer is all about. It is about thinking up new ways to do things with the tools that you have. This is not something you can memorize. You are supposed to use your brain to solve your problems. I don't really see the point of what you described, but there must be one. You could go about this in different ways.

You could just read it all into an index, read in one letter at a time, count the spaces, and record the length of the words. Then you can rearrange it. You have to be creative with the tools you have learned. Or you could make a collection of text fields and move one letter at a time until you get the right amount. Did I mention counting and using math logic? Java is sometimes harder to manipulate text in than even COBOL.
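One possible reading of the "read it all into an index and use a counter" suggestion, sketched at the word level rather than letter by letter: load every word into a list first, then let a counter i mark where each n-gram starts. The class name IndexNGrams is hypothetical, and the approach assumes the whole document fits in memory:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class IndexNGrams {
    // Reads every word first, then uses a counter i as the start index of each n-gram.
    static List<String> nGrams(Scanner in, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        List<String> words = new ArrayList<>();
        while (in.hasNext()) {
            words.add(in.next());
        }
        List<String> grams = new ArrayList<>();
        // i stops at words.size() - n, so every n-gram fits inside the list.
        for (int i = 0; i + n <= words.size(); i++) {
            grams.add(String.join(" ", words.subList(i, i + n)));
        }
        return grams;
    }
}
```

Because every word stays in the list, nothing is "lost" after it is read; the trade-off is memory proportional to the document size.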
 

piasabird

Lifer
Feb 6, 2002
I don't remember much Java, but one time we had to write a game of Minesweeper using a two-level table. The user was supposed to be able to pick one of three sizes for the game, and we could use only one two-level table to make it work. It was kind of fun.
 

beginner99

Diamond Member
Jun 2, 2009
http://docs.oracle.com/javase/6/docs/api/java/util/Scanner.html

Disclaimer: Untested. May contain syntax and logic errors...
Code:
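import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class NGramReader {
	// Returns every run of n consecutive words in the file, in order.
	static List<String> nGrams(String document, int n) throws FileNotFoundException {
		List<String> tokens = new ArrayList<String>(); // the last n words read
		List<String> grams = new ArrayList<String>();

		try (Scanner scanner = new Scanner(new File(document))) {
			while (scanner.hasNext()) {
				tokens.add(scanner.next());

				if (tokens.size() >= n) {
					// join the n buffered words, separated by spaces
					StringBuilder gram = new StringBuilder();
					for (int i = 0; i < n; i++) {
						if (i > 0) gram.append(' ');
						gram.append(tokens.get(i));
					}
					grams.add(gram.toString());
					tokens.remove(0); // drop the oldest word so the window slides by one
				}
			}
		}
		return grams;
	}
}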