• We’re currently investigating an issue related to the forum theme and styling that is impacting page layout and visual formatting. The problem has been identified, and we are actively working on a resolution. There is no impact to user data or functionality, this is strictly a front-end display issue. We’ll post an update once the fix has been deployed. Thanks for your patience while we get this sorted.

Do something similar to SQL 'Group By' in java?

wantedSpidy

Senior member
Hey guys,

I need to write a method in Java, that does something similar to SQL's group by.

Here's an example input file:
1, 2, 3, 4
1, 2, 5, 7
1, 2, 5, 8
1, 2, 5, 9
1, 2, 3, 4

And I want to group by column 3,
so my output file should be:
1, 2, 3, 4
1, 2, 3, 4
1, 2, 5, 7
1, 2, 5, 8
1, 2, 5, 9

Whats the best way to do this? Just sort based on values in that column? What if I want to group by more than 1 column?

Thanks for the help in advance!!
 
How is that output in any way related to a 'group by' action? As far as I can tell, all you did was shuffle rows around.
 
If your input will always be the same structure, I would create a Class that mimics it. In this case, 4 integers and way to store the sort information; an integer that records the column to sort perhaps. In that class overload the compareTo(...) function. For each row in the input, create a new instance of your class and add them to a standard array. Then just use the built in java array sort function, something like java.util.Arrays.Sort(YourClass[] items), then just spit them back out to the output.
 
Originally posted by: kamper
How is that output in any way related to a 'group by' action? As far as I can tell, all you did was shuffle rows around.

It's not.... I just assumed he meant a sort instead.
 
Actually I do mean group by, I will eventually perform some aggregate function on the groups -

For example, when I group by column 3 in the above example, lets say I calculate the average value in column 4 for each group.
Then I would get,

1,2,3,[4]
1,2,5,[8]
 
how would you do it for more than 2 group by's?

For example I say group by column 2 and 3, and for those groups calculate the SUM of values in column 4.

Input:
1, 2, 3, 4
2, 2, 3, 5
3, 2, 4, 7
4, 3, 4, 8

Output:
4+5 = 9 [from (2,3)]
7 = 7 [from (2,4)]
8 = 8 [from (3,4)]
5
 
The easy way would be to sort the results. Lets assume that each row is an object FourTuple with (int) properties a,b,c and d.

Object[] data; // your array
Arrays.sort(data, new Compartor() {
public int compare(Object o1, Object o2){
FourTuple first = (FourTuple) o1;
FourTuple second = (FourTuple) o2;

if (first.a > second.a) {
return 1;
}
if (first.a < second.a) {
return -1;
}
// this mean the first column is tied in thee two rows
if (first.b > second.b) {
return 1;
}
if (first.b < second.b) {
return -1;
}
if (first.c > second.c) {
return 1;
}
if (first.c < second.c) {
return -1;
}
if (first.d > second.d) {
return 1;
}
if (first.d < second.d) {
return -1;
}
// at this point all a.b,c,d are equal... so return "equal"
return 0;
}
public boolean equals(Object that) {
return this.a == that.a && this.b == that.b && this.c == that.c && this.d == that.d;
}
});

Aggregating the results is much easier now since the list is sorted.
 
Thanks, I used the same idea to write a generic group by for n number of values in each row.

I used break's in the inner loops to get the same effect as you got though those if else's above.

Thanks a million, really appreciate it.
 
Originally posted by: wantedSpidy
Actually I do mean group by, I will eventually perform some aggregate function on the groups -

For example, when I group by column 3 in the above example, lets say I calculate the average value in column 4 for each group.
Then I would get,

1,2,3,[4]
1,2,5,[8]
You need to apply an aggregate to every column that you are not grouping by. Otherwise what would you have done if the first two columns hadn't all had identical values?

The easiest way I can see to do a group by on two columns is to treat the two columns together as a key so let's say you've got the following input:
1, 2, 3, 4
1, 2, 5, 7
1, 3, 5, 8
1, 3, 5, 10
1, 2, 5, 9
1, 2, 3, 4

and you want to group by columns 2 and 3 and display the avg of columns 1 and 4. Then you'd get:
1, 2, 3, (4+4)/2=4
1, 2, 5, (7+9)/2=8
1, 3, 5, (8+10)/2=9

If you wanted to first group by one column and then by a second column in the result, you'd have to apply new aggregates to all the other columns. You could think of it as a new query on the first result. At least that's what I think, I haven't though too in depth about it.

Coincidentally, I actually did implement a rough sql subset the other week at work for mining some log files which had tabular data in them. The interesting thing about group by is that, unlike other operators like 'where' which take a table and return a table, it takes a table and returns a dictionary of tables, keyed on the value of the column that you're grouping by. That's why you have to apply aggregates to every other column to get a single table back. That was a fun (and easier than expected) experience.
 
"and you want to group by columns 2 and 3 and display the avg of columns 1 and 4. Then you'd get:
1, 2, 3, (4+4)/2=4
1, 2, 5, (7+9)/2=8
1, 3, 5, (8+10)/2=9 "
That is what I wanted to do. And was successful last night 😀

"The easiest way I can see to do a group by on two columns is to treat the two columns together as a key"
yup, I ended up using the same idea.

Thanks
 
Cool. Out of curiosity, how are you representing your tables in memory? I used a dictionary (Map) per row with column names as keys and cell values obviously as values and a table was just a list of such dictionaries. It was convenient to parse into that format but if I were to do it over I'd probably abstract the idea of a table and have a tuple (array) of column titles as well as a list of tuples, one for each row and rely on the order to get the correct values.
 
kill me but I'm using ArrayLists as my temp solution.

I'll revamp the project during christmas break when I have more time on my hands 😀
 
Originally posted by: wantedSpidy
Thanks, I used the same idea to write a generic group by for n number of values in each row.

I used break's in the inner loops to get the same effect as you got though those if else's above.

Thanks a million, really appreciate it.

np... btw how much data are you processing? If your operations become more complex, you might be better off using sqlite and creating in-memory databases. There's a JDBC wrapper for sqlite out there, i've never used it though.
 
right now, I just have some static files with about maybe 1000 tuples.

Need to make it more dynamic, do some polling maybe?

I'll make another thread, to get all your suggestions 😀
 
Back
Top