SQL Inner Join question (fetching newest record only)

TechBoyJK

Lifer
Oct 17, 2002
16,699
60
91
Hi Guys,

I have a sql statement that i need modded, and I'm not quite sure the best way to get the results I need.

SELECT tmm.message_id, tmm.content_id, tmm.sender_id, tmm.new, tmm.replied, tmm.flagged, tmm.sent_date, tmm.viewed_date, tmmc.subject, tmmc.category, tmmc.body, tmcc.first_name, tmcc.middle_name, tmcc.last_name

FROM tbl_member_messages tmm

INNER JOIN tbl_member_messages_content tmmc on tmm.content_id = tmmc.content_id

INNER JOIN members.dbo.tbl_member_core_contact tmcc on tmm.sender_id = tmcc.member_id

WHERE (tmm.recipient_id = <cfqueryparam value="#session.myid#">) and (recipient_deleted <> 1) and (recipient_purged <> 1) and (spam <> 1)


I'm writing a small messaging app, and this is the sql statement for the 'inbox' and it fetches all messages that apply. It works as right now, each new message from another member would get it's own row, but I only want to fetch the 'newest' message.

For instance, say user 'johndoe' send me 'techboyjk' 5 messages. The sql statement above would list each message in the inbox. However, I only want to fetch the newest message from each user, and order the result set by date.

Thoughts?
 

Homerboy

Lifer
Mar 1, 2000
30,890
5,001
126
Couldn't you do a
"SELECT TOP 1 tmm.message_id, tmm.content_id, ...
...WHERE (tmm.recipient_id = <cfqueryparam value="#session.myid#">) and (recipient_deleted <> 1) and (recipient_purged <> 1) and (spam <> 1)

ORDER BY tmm.sent_date"
 

Homerboy

Lifer
Mar 1, 2000
30,890
5,001
126
Oh wait no... that wouldn't give you each latest msg from each sender. That'd just give you the latest msg.

Could you do a GROUP BY tmm.sender_id
and an ORDER BY tmm.sent_date DESC?
 

TechBoyJK

Lifer
Oct 17, 2002
16,699
60
91
Yea.. not sure. I've been throwing a few things at it and it either gives the wrong results, or breaks it.
 

BoberFett

Lifer
Oct 9, 1999
37,562
9
81
What database? In SQL Server this could be done with PARTITION.

Without that sort of functionality in whatever database you're using, you could cobble something together by using a subquery to get the list of only those messages you want.

select * from messages where messageid in ((select max(messageid) from messages group by senderid)
 

Evadman

Administrator Emeritus<br>Elite Member
Feb 18, 2001
30,990
5
81
Code:
SELECT 
    tmm.message_id, 
    tmm.content_id, 
    tmm.sender_id, 
    tmm.new, 
    tmm.replied, 
    tmm.flagged, 
    tmm.sent_date, 
    tmm.viewed_date, 
    tmmc.subject, 
    tmmc.category, 
    tmmc.body, 
    tmcc.first_name, 
    tmcc.middle_name, 
    tmcc.last_name
FROM 
    tbl_member_messages tmm
    INNER JOIN tbl_member_messages_content tmmc on tmm.content_id = tmmc.content_id
    INNER JOIN members.dbo.tbl_member_core_contact tmcc on tmm.sender_id = tmcc.member_id
WHERE
    tmm.recipient_id = <cfqueryparam value="#session.myid#">
    and recipient_deleted <> 1
    and recipient_purged <> 1
    and spam <> 1
    AND tmm.message_id IN 
    (
        SELECT message_id FROM
            (
            SELECT 
                MAX(tmm2.message_id) as message_id
            FROM 
                tbl_member_messages tmm2
                INNER JOIN tbl_member_messages_content tmmc2 on tmm2.content_id = tmmc2.content_id
                INNER JOIN members.dbo.tbl_member_core_contact tmcc on tmm2.sender_id = tmmc2.member_id
            WHERE
                tmm2.recipient_id = <cfqueryparam value="#session.myid#">
                and recipient_deleted <> 1
                and recipient_purged <> 1
                and spam <> 1
            GROUP BY
                tmm2.sender_id
            ) as whoreallycares
    )
ORDER BY
    tmm.sent_date
I didn't really feel like asking lots of questions to see if other ways of making this work would run better, but this should do what you want. Feel free to look up the LIKE keyword, and check your table structure for other ways of getting this done that will be faster. There's an extra subquery in here, but I'm too lazy to remove it.
 
Last edited:

BoberFett

Lifer
Oct 9, 1999
37,562
9
81
I didn't really feel like asking lots of questions to see if other ways of making this work would run better, but this should do what you want. Feel free to look up the LIKE keyword, and check your table structure for other ways of getting this done that will be faster. There's an extra subquery in here, but I'm too lazy to remove it.

Evadman did the same thing I suggested, but just did your homework for you. :p
 

KIAman

Diamond Member
Mar 7, 2001
3,342
23
81
This is a topic that I find programmers are generally very weak.

A programmer's brain has a hard time making the leap to set math and relational algrebra. A programmer's brain wants to write minimal amount of code with as much efficiency and readability as possible. I often see SQL written much like a method/procedure/function in a programming language.

I can't count the number of times I've seen erroneous uses of cursors and loops because it makes the statement more programmer-logic-oriented.

In transact SQL, you can't think like a programmer. TechBoyJK, start thinking of SQL like sets of stuff with a bunch of properties and you have unlimmited virtual space. Don't be afraid to find your own sets and subsets to get the information you need. And in most cases, forming your query like this will be faster because relational DBMS are designed for set operations.

You want to find the most recent single message for each user, here is the thought process you can take to find your solution.

1. Identify the columns you want to return
2. Identify the tables those columns live
3. Identify the relationships between those tables
4. Identify the filters and apply them
5. (this is the tricky part) find your special condition (for example, only want 1 msg per from-user)
6. Check if any table has a column to help with the special condition
7. If not, create your own inline query that will form a result set that gives you the special condition result.
8. Repeat steps 5 through 7 for each special condition
9. Apply steps 2 and 3 for the new result set
10. Optimize statement by joining the tables in order of smaller to bigger table
11. Optimize statement by applying filters from most restrictive to least restrictive

I would have done something similar to Evadman but I would make the "select max(msg_id)" statement as a subquery as the first table to be joined rather than an IN clause.

Just keep in mind, think wide and bandwidth, unlimmited virtual space with very fast and optimized key pointers.
 

BoberFett

Lifer
Oct 9, 1999
37,562
9
81
I would have done something similar to Evadman but I would make the "select max(msg_id)" statement as a subquery as the first table to be joined rather than an IN clause.

I think if you look at the output in a query analyzer, you may find that the IN and a JOIN with similar criteria end up being the same thing.
 

Evadman

Administrator Emeritus<br>Elite Member
Feb 18, 2001
30,990
5
81
I think if you look at the output in a query analyzer, you may find that the IN and a JOIN with similar criteria end up being the same thing.

Exactly. The IN clause is easier to read (to me anyway), and the query engine (for MS anyway, and probably all RDBM's) will rewrite it as an inner join.

Also, please keep in mind that I am a lazy bastard, so as long as I can get the same result, at the same speed, without an effect on portability or documentation, I'm going to do it in the quickest way possible. Expecially when it is an exercise on a forum and not a production system.
 

BoberFett

Lifer
Oct 9, 1999
37,562
9
81
Exactly. The IN clause is easier to read (to me anyway), and the query engine (for MS anyway, and probably all RDBM's) will rewrite it as an inner join.

Also, please keep in mind that I am a lazy bastard, so as long as I can get the same result, at the same speed, without an effect on portability or documentation, I'm going to do it in the quickest way possible. Expecially when it is an exercise on a forum and not a production system.

I've written some extremely complex queries and it frequently seemed that even after refactoring completely in an attempt to streamline things a bit, the execution plan wouldn't change much. Modern query engines are pretty incredible at what they do and what they let the query writer get away with. :)
 

KIAman

Diamond Member
Mar 7, 2001
3,342
23
81
Exactly. The IN clause is easier to read (to me anyway), and the query engine (for MS anyway, and probably all RDBM's) will rewrite it as an inner join.

Also, please keep in mind that I am a lazy bastard, so as long as I can get the same result, at the same speed, without an effect on portability or documentation, I'm going to do it in the quickest way possible. Expecially when it is an exercise on a forum and not a production system.

Please don't take the impression my post was criticism of your suggestion, in any way. I think your query is great. And you and boberfett are absolutely right that in a lot of cases (especially in MS SQL 2005, 2008), the IN and inner join will run the same execution plan. Furthermore, I think you went above and beyond what I would have expected to help the OP.

I've done a lot of mentoring with my co-workers on how to align their thought process to write transact SQL. My post was a very condensed version of what I have preaching at my workplace.

In summary, I am humbled by your SQL skills. My post was directed to the OP, just using your example to let me be lazy and not posting code.
 

BoberFett

Lifer
Oct 9, 1999
37,562
9
81
And let me say, I think your critique of the way many programmers think is accurate as well. Rather than thinking in terms of how to merge large datasets, some programmers get too wrapped up in iterative processes on small chunks of data. I have seen T-SQL get used as a crutch, just as you have. I used to work with a guy who was a really great young programmer, but after looking at some crazy cursored up T-SQL of his for a while, I showed him how to do the same thing using straight SQL and 10x as fast. He looked kind of shocked. I've never put it down formally as you have, but your 11 step process is basically how I write queries.
 

Evadman

Administrator Emeritus<br>Elite Member
Feb 18, 2001
30,990
5
81
Please don't take the impression my post was criticism of your suggestion, in any way. I think your query is great.

I probably worded my response poorly; I wasn't taking it as criticism. I was just doing some more explaining on why I did what I did. Besides, criticism is a good thing (especially constructive like yours) as it leads to better programmers, and thus programs.

You are also right on when explaining some of the issues a programmer will have when they start working with a set based engine (DB's) instead of a one at a time engine (.net, c#, etc). I can usually get a pretty good feel for how much experience a T-SQL programmer has by counting how many cursors they use. Same thing with goto's and transactions. I swear, some of the stuff I see looks like the DBA took some DAO code and, line for line, turned it into SQL. I wanted to smack that guy.
 
Last edited:

TechBoyJK

Lifer
Oct 17, 2002
16,699
60
91
Ok, that pretty much worked. Thanks Evadman!

Question. I'd also like to count the number of messages each user has.

So that it's only going to list the newest message, but I'd also like to know how many new messages there are.

For instance, JoeBlow sends me 5 messages. My inbox will only show his latest one, but I'd also like a column in the result set to show (5).
 

TechBoyJK

Lifer
Oct 17, 2002
16,699
60
91
Ok.. besides the question above, I have another sql problem.

When the user goes to view the messages, I only want the 5 newest to be displayed at first. 5 will be the default, but they will be able to select to view more.

So I will just use 'Select TOP 5'

Problem is, since the older messages are first, Select TOP 5 grabs the oldest 5 messages. If I use 'order by sent_date' desc it does give me the 5 newest, but the sorting is backwards.

I'm wanting the newest record to be at the top, and as the list goes down, it grows older.
 

Evadman

Administrator Emeritus<br>Elite Member
Feb 18, 2001
30,990
5
81
Problem is, since the older messages are first, Select TOP 5 grabs the oldest 5 messages. If I use 'order by sent_date' desc it does give me the 5 newest, but the sorting is backwards.

You have a choice of doing that in the DB or in the program. Personally, that sounds like a good fit for doing in the application. If you really want the DB to do it, then you need to wrap the SQL statement in another statement to order again. I'm too lazy to use your actual fields, but you basicly need to do this:

Code:
Select * from (Select top 5 * from lotsofmessagestable order by age desc) as omgasubquery order by age asc

By the way, if you use * to select fields in a production system I will smack you.

I'd also like to count the number of messages each user has.

Use a subquery that gets the count of messages per user, then join it to the main query using an inner join and alias.
 

TechBoyJK

Lifer
Oct 17, 2002
16,699
60
91
Thanks.. I only use Select * when I'm trying to get something done. Once I get everything working I'll go back and figure out which columns were needed.

I figured I'd just have to requery and re-order but I wasn't sure if there was a better way.
 

BoberFett

Lifer
Oct 9, 1999
37,562
9
81
If you need more data from that grouped query, you might want to think about a temp table. Pull out that inner subquery, and create a temp table including the max(messageid) and count(messageid). Then you can use the temp table to join to the outer query and use the count value, and then in place of the full query in the subquery just query the temp table for the max values. Keep in mind that a subquery for an IN statement can only have a single column returned.
 

TechBoyJK

Lifer
Oct 17, 2002
16,699
60
91
SELECT
tmm.message_id,
tmm.content_id,
tmm.sender_id,
tmm.new,
tmm.replied,
tmm.flagged,
tmm.sent_date,
tmm.viewed_date,
tmmc.category,
tmmc.body,
tmcc.first_name,
tmcc.middle_name,
tmcc.last_name,
tmpt.thumbnail_url,
tmpt.thumbnail_file
FROM
tbl_member_messages tmm
INNER JOIN tbl_member_messages_content tmmc on tmm.content_id = tmmc.content_id
INNER JOIN details_at_members.dbo.tbl_member_core_contact tmcc on tmm.sender_id = tmcc.member_id
INNER JOIN details_at_members.dbo.tbl_member_profile_thumbnail tmpt on tmm.sender_id = tmpt.member_id
WHERE
tmm.recipient_id = <cfqueryparam value="#session.myid#">
and recipient_archived <> 1
and recipient_purged <> 1
and spam <> 1
AND tmm.message_id IN
(
SELECT message_id FROM
(
SELECT
MAX(tmm2.message_id) as message_id
FROM
tbl_member_messages tmm2
INNER JOIN tbl_member_messages_content tmmc2 on tmm2.content_id = tmmc2.content_id
INNER JOIN details_at_members.dbo.tbl_member_core_contact tmcc on tmm2.sender_id = tmcc.member_id
WHERE
tmm2.recipient_id = <cfqueryparam value="#session.myid#">
and recipient_archived <> 1
and recipient_purged <> 1
and spam <> 1
GROUP BY
tmm2.sender_id
) as whoreallycares
)
ORDER BY
tmm.sent_date




Ok, this code works for fetching inbox messages. It only displays the newest message as well.

Now.. I allow users to archive their messages. Just as there is an 'Inbox' and a 'sent' box, there is also an 'archive' folder.

If you look at the inbox query above, it's based around the 'sender' since the sender is who sent you the message, so you want to display the name of the sender.

For the sent box, I simply replaced 'sender' with recipient and that then shows all messages I sent out, since I'm the recipient.

With the archive, it's both. Problem is, I want the 'other' party to be shown, regardless if they were the sender or receiver. Best way to go about this? Run two identical queries (inbox + sent) and change the code so that the archive bit must be positive? Then union them?

Edit: Or... could I just grab the contact info for both sender and receiver and do an if/then statement. If sender eq ME show Recipient else show sender
 
Last edited:

TechBoyJK

Lifer
Oct 17, 2002
16,699
60
91
Ok, I'm going to just pull the info for both sender/recipient.

I'm using an if/then to determine whether the user is the sender or recipient. If the user is the sender, it shows the recipient of the message, and vice versa.

However, I still need to integrate the 'INTO' select statement component that consolidates multiple messages into 1 line showing only the newest.

Here's what I have so far.

SELECT
tmm.message_id,
tmm.content_id,
tmm.sender_id,
tmm.recipient_id,
tmm.new,
tmm.replied,
tmm.flagged,
tmm.sent_date,
tmm.viewed_date,
tmmc.category,
tmmc.body,
tmccs.first_name as sender_first_name,
tmccs.middle_name as sender_middle_name,
tmccs.last_name as sender_last_name,
tmccr.first_name as recipient_first_name,
tmccr.middle_name as recipient_middle_name,
tmccr.last_name as recipient_last_name,
tmpts.thumbnail_url as sender_thumbnail_url,
tmpts.thumbnail_file as sender_thumbnail_file,
tmptr.thumbnail_url as recipient_thumbnail_url,
tmptr.thumbnail_file as recipient_thumbnail_file
FROM
tbl_member_messages tmm
INNER JOIN tbl_member_messages_content tmmc on tmm.content_id = tmmc.content_id
INNER JOIN details_at_members.dbo.tbl_member_core_contact tmccs on tmm.sender_id = tmccs.member_id
INNER JOIN details_at_members.dbo.tbl_member_core_contact tmccr on tmm.recipient_id = tmccr.member_id
INNER JOIN details_at_members.dbo.tbl_member_profile_thumbnail tmpts on tmm.sender_id = tmpts.member_id
INNER JOIN details_at_members.dbo.tbl_member_profile_thumbnail tmptr on tmm.recipient_id = tmptr.member_id
WHERE
(tmm.recipient_id = <cfqueryparam value="#session.myid#">
and recipient_archived = 1
and recipient_purged <> 1)
or
(tmm.sender_id = <cfqueryparam value="#session.myid#">
and sender_archived = 1
and sender_purged <> 1)
and spam <> 1