• We’re currently investigating an issue related to the forum theme and styling that is impacting page layout and visual formatting. The problem has been identified, and we are actively working on a resolution. There is no impact to user data or functionality, this is strictly a front-end display issue. We’ll post an update once the fix has been deployed. Thanks for your patience while we get this sorted.

Social website "friends lists". How is the data handled? Table design...

How would you think that busy sites like facebook and myspace manage each users' friends list. With the sites having millions if not hundreds of millions of users, and some users have thousands of friends, one giant table with all friend connections seems like a bad idea.

Each user get it's own table? Then you could potentially end up with millions of tables. That doesn't seem like good design either.

Granted, a table would only really need to be a few columns.

record id (bigINT), member_id(int), friend_id(int),

If you had 100million users and on average 100 friends per user, that's 10billion records. But it'd be 10billion small, tiny records with only integer based columns. While that'd be a huge record set, you could probably do a lot with caching, and creating different views.

Or, just do a table for each letter of the alphabet. If your last name begins with T, you go int the T table for friends. Then that table only contains the friend records of members who's last name begins with T. That would only require 26 tables, and could then cut 10billion records into tables with only 390million records.

DBA's, how would you approach this?
 
A join table w/ the proper indexes.

Yeah, I can't think of a better way than that. I have no idea how many friends the average Facebook user has, but I bet it is in the mid-double digits. Assuming 500m users, if it were only 10 average friends per user you'd have 5b rows. That's not a small table in anyone's world, so I'm assuming they do some sort of horizontal partitioning.
 
One large table as you described above.

Index on member_id.

Relational databases can create binary trees to find the data quickily and will do the "table for each letter of the alphabet" stuff internally with it's leafs and nodes and branches and roots, etc.
 
I know SQL 2005+ can do database table partitions, but I'm no sure if performance is a reason to use them or not. Just thought I'd mention it.
 
I would personally make 1 table with index on each column. Let the efficiency come from the queries and let the RDMS handle load balance.
 
A join table w/ the proper indexes.

If you do anything else other than this... you'll run into problems coding it.

With the proper setup.. caching, indexes, etc. You'll never know the table was that large.

Also you have to realize.. the scripting could be caching your friends list as well. As you'd be the only one to modify it, and then it would modify your cache, then database.

Edit: With the example of facebook you could end up with fragmentation based on region (country code), etc. But again their coding framework could involve all of the checks for this, and adjust accordingly.
 
Last edited:
Back
Top