SQL help with duplicates

rh71

No Lifer
Aug 28, 2001
52,844
1,049
126
should be a simple resolution but I'm just not getting it...

I have the following [simplified] query which generates duplicate rows that I need to get rid of. I'm mainly concerned with MY_ID being duped... other fields don't matter if they're duped but I still need the other fields there in the result set.

SELECT MY_ID, DESCRIPTION, NAME
FROM MYTABLE


When I insert the distinct command after select:
SELECT DISTINCT MY_ID, DESCRIPTION, NAME
FROM MYTABLE


DB2 errors with:
SQL0134N Improper use of a string column, host variable, constant, or
function "DESCRIPTION". SQLSTATE=42907

The datatype of DESCRIPTION is LONG VARCHAR and again - that cannot change nor the need to query that column. It seems this is preventing distinct from working. It will work without DESCRIPTION being pulled, of course, but again - I need that column.

SELECT DISTINCT(MY_ID), DESCRIPTION, NAME
FROM MYTABLE
.... spat out the same error thinking it's still distinct on all.

How do I use SQL to remove the duplicates I am getting since distinct seemingly cannot be used in this scenario ? Can a "WHERE" clause somehow help ?
 

bunker

Lifer
Apr 23, 2001
10,572
0
71
Hmmm...In MS SQL I would type
SELECT DISTINCT MY_ID,DESCRIPTION,NAME...

No parens in MS SQL, maybe try that?
 

rh71

No Lifer
Aug 28, 2001
52,844
1,049
126
Originally posted by: bunker
Hmmm...In MS SQL I would type
SELECT DISTINCT MY_ID,DESCRIPTION,NAME...

No parens in MS SQL, maybe try that?
this is what I tried the 1st time... I should've made that clear. Think it may have something to do with a character limit on distinct... DESCRIPTION is 1048 long varchar... maybe a size limit on distinct... ?
 

Firus

Senior member
Nov 16, 2001
525
0
0
can't you make a new table with the MY_ID as a primary key and then select all the data into the new table from the old table, that will tell you the dupes :) Why is My_ID not primary key anyways?

Edit: otherwise, try this...


select distinct(my_id), count(my_id) as numrows from mytable
group by my_id
having count(my_id) > 1
 

rh71

No Lifer
Aug 28, 2001
52,844
1,049
126
Originally posted by: Firus
can't you make a new table with the MY_ID as a primary key and then select all the data into the new table from the old table, that will tell you the dupes :) Why is My_ID not primary key anyways?

Edit: otherwise, try this...


select distinct(my_id), count(my_id) as numrows from mytable
group by my_id
having count(my_id) > 1
Just read-only against the db so I can't create/manipulate any other tables. The query is actually more complicated with joins to 2 other tables, but I just wanted to simplify the problem. (in reality, DESCRIPTION is the only column out of 8 others I'm pulling that is causing the distinct to fail).

With your suggestion, would I simply add the other columns in there before the FROM statement and it will produce those columns within the result, while the rest of the query acts as the distinct on my_id ? (see below for results)
 

rh71

No Lifer
Aug 28, 2001
52,844
1,049
126
I add the other columns into your SELECT statement (because I need them) and also into the GROUP BY clause because of this error:

SQL0122N A SELECT statement with no GROUP BY clause contains a column name and a column function in the SELECT clause, or a column name is contained in the SELECT clause but not in the GROUP BY clause. SQLSTATE=42803

making my query to be:
select distinct(my_id), count(my_id) as numrows, description, name from mytable
group by my_id, description, name
having count(my_id) > 1


And it's giving me the same error:

SQL0134N Improper use of a string column, host variable, constant, or
function "DESCRIPTION". SQLSTATE=42907

Looks like it doesn't like DESCRIPTION in there for GROUP BY either. Also, removing DESCRIPTION from this query worked, but generated 0 rows.
 

torpid

Lifer
Sep 14, 2003
11,631
11
76
Originally posted by: rh71
I add the other columns into your SELECT statement (because I need them) and also into the GROUP BY clause because of this error:

SQL0122N A SELECT statement with no GROUP BY clause contains a column name and a column function in the SELECT clause, or a column name is contained in the SELECT clause but not in the GROUP BY clause. SQLSTATE=42803

making my query to be:
select distinct(my_id), count(my_id) as numrows, description, name from mytable
group by my_id, description, name
having count(my_id) > 1


And it's giving me the same error:

SQL0134N Improper use of a string column, host variable, constant, or
function "DESCRIPTION". SQLSTATE=42907

Looks like it doesn't like DESCRIPTION in there for GROUP BY either. Also, removing DESCRIPTION from this query worked, but generated 0 rows.

I'm not sure about the table design, if you exclude description, does it return the right data without description? That is, if you just got my_id and name, would it have the right rows, but you need the description for those rows? If that is the case, try a subquery maybe?

SELECT SUBTABLE.MY_ID, SUBTABLE.NAME, MAINTABLE.DESCRIPTION
FROM (SELECT DISTINCT MY_ID, NAME FROM MYTABLE) SUBTABLE INNER JOIN MYTABLE MAINTABLE ON SUBTABLE.MY_ID = MAINTABLE.MY_ID

something like that anyway. If a particular my_id and name might have multiple descriptions, then it's going to be a troublesome query.
 

Firus

Senior member
Nov 16, 2001
525
0
0
I think this is your answer...the SQL Message reference. If description is 1048 bytes long....

An expression resulting in a string data type with a maximum length greater than 254 bytes is not permitted in:

* A SELECT DISTINCT statement
* A GROUP BY clause
* An ORDER BY clause
* A column function with DISTINCT
* A SELECT or VALUES statement of a set operator other than UNION ALL.

An expression resulting in a LONG VARCHAR or LONG VARGRAPHIC data type is not permitted in:

* A predicate other than EXISTS or NULL
* A column function
* The SELECT clause of a subquery of a predicate other than EXISTS or NULL
* The SELECT clause of a subselect in an INSERT statement
* The value expression of a SET clause in an UPDATE statement unless the expression is a LONG VARCHAR or LONG VARGRAPHIC host variable
* A SELECT statement of a set operator (except UNION ALL)
* VARGRAPHIC scalar function.
 

statik213

Golden Member
Oct 31, 2004
1,654
0
0
Try this

SELECT COUNT(MY_ID) AS 'C_MY_ID', MY_ID
FROM table1
GROUP BY C_MY_ID
HAVING C_MY_ID > 1


This will select all MY_IDs that are repeated... you can throw in other fields but group them accordingly so that the count doesn't get messed up....
 

Firus

Senior member
Nov 16, 2001
525
0
0
Originally posted by: statik213
Try this

SELECT COUNT(MY_ID) AS 'C_MY_ID', MY_ID
FROM table1
GROUP BY C_MY_ID
HAVING C_MY_ID > 1


This will select all MY_IDs that are repeated... you can throw in other fields but group them accordingly so that the count doesn't get messed up....

this one won't work as you can't group by 'COUNT'd fields (IIRC)...but this should work (basically, my previous one, but without the distinct)


SELECT COUNT(MY_ID) AS my, MY_ID
FROM mytable
GROUP BY MY_ID
HAVING count(MY_ID) > 1
 

statik213

Golden Member
Oct 31, 2004
1,654
0
0
Same in principle, eh?

Originally posted by: Firus
Originally posted by: statik213
Try this

SELECT COUNT(MY_ID) AS 'C_MY_ID', MY_ID
FROM table1
GROUP BY C_MY_ID
HAVING C_MY_ID > 1


This will select all MY_IDs that are repeated... you can throw in other fields but group them accordingly so that the count doesn't get messed up....

this one won't work as you can't group by 'COUNT'd fields (IIRC)...but this should work (basically, my previous one, but without the distinct)


SELECT COUNT(MY_ID) AS my, MY_ID
FROM mytable
GROUP BY MY_ID
HAVING count(MY_ID) > 1