Best way to structure relationships in a no-SQL database?

fuzzybabybunny · Aug 23, 2014

I'm using MongoDB. Let's not get into the whole argument of "MongoDB sucks don't use it."

I need to use MongoDB. Let's just leave it at that for now.

I know that MongoDB isn't relational but information sometimes is. So what's the most efficient way to reference these kinds of relationships to lessen database load?

*** Tinder "matches" ***

There are many users in a Users collection. They get matched to each other.

So I'm thinking:

Code:

Document 1:

{
	_id: "d3fg45wr4f343",
	firstName: "Bob",
	lastName: "Lee",
	matches: [
	  "ferh823u9WURF",
	  "8Y283DUFH3FI2",
	  "KJSDH298U2F8",
	  "shdfy2988U2Ywf"
	]
}

Document 2:

{
	_id: "d3fg45wr4f343",
	firstName: "Cindy",
	lastName: "Doe",
	matches: [
	  "d3fg45wr4f343"
	]
}

Would this work OK if there were, say, 10,000 users and you were on Bob's profile page and you wanted to display the firstName of all of his matches?

Any alternative structures that would work better?

*** Online Forum ***

Maybe you've got the following collections?

Users
Topics

Code:

Users Collection:

{
  _id: "d3fg45wr4f343",
  userName: "aircon",
  avatar: "234232.jpg"
}

{
  _id: "23qdf3a3fq3fq3",
  userName: "spider",
  avatar: "986754.jpg"
}

Topics Collection Version 1

One example document in the Topics Collection:

{
title: "A spider just popped out of the AC",
dateTimeSubmitted: 201408201200,
category: 5,
posts: [
	{
		message: "I'm going to use a gun.",
		dateTimeSubmitted: 201408201200,
		author: "d3fg45wr4f343"
	},
	{
		message: "I don't think this would work.",
		dateTimeSubmitted: 201408201201,
		author: "23qdf3a3fq3fq3"
	},
	{
		message: "It will totally work.",
		dateTimeSubmitted: 201408201202,
		author: "d3fg45wr4f343"
	},
	{
		message: "ur dumb",
		dateTimeSubmitted: 201408201203,
		author: "23qdf3a3fq3fq3"
	}
]
}

Topics Collection Version 2

One example document in the Topics Collection. The author's avatar and userName are now imbedded in the document. I know that:

1. This is not DRY.
2. If the author changes their avatar and userName, these change would need to be updated in the Topics Collection and in all of the post documents that are in it.
3. BUT it saves the system from querying for all the avatars and userNames via the authors' ID every single time this thread is viewed on the client.

{
title: "A spider just popped out of the AC",
dateTimeSubmitted: 201408201200,
category: 5,
posts: [
	{
		message: "I'm going to use a gun.",
		dateTimeSubmitted: 201408201200,
		author: "d3fg45wr4f343",
		userName: "aircon",
		avatar: "234232.jpg"
	},
	{
		message: "I don't think this would work.",
		dateTimeSubmitted: 201408201201,
		author: "23qdf3a3fq3fq3",
	  userName: "spider",
	  avatar: "986754.jpg"
	},
	{
		message: "It will totally work.",
		dateTimeSubmitted: 201408201202,
		author: "d3fg45wr4f343",
		userName: "aircon",
		avatar: "234232.jpg"
	},
	{
		message: "ur dumb",
		dateTimeSubmitted: 201408201203,
		author: "23qdf3a3fq3fq3",
	  userName: "spider",
	  avatar: "986754.jpg"
	}
]
}

So yeah, I'm not sure which is best...

fuzzybabybunny · Aug 24, 2014

Damn, does my question suck that bad?

BrightCandle · Aug 25, 2014

Its a trickier question than perhaps you realise because it all depends on usage. One of the problems with some of the NoSQL implementations is its not really possible to link documents together like you would link rows in a relational database. So instead what happens is that you normally denormalise your data to be suitable for the usage, and then write code for updating that ensures all the places where the data is used gets updated.

But how you go about doing this, and to what extent you do the copying really depends on all your use cases. You definitely want to avoid on a single use case a 1+n load scenario but at the same time sometimes that wont be a problem. But its a good idea to stop thinking quite so relational in your data, some duplication is going to be necessary to get efficient use out of NoSQL via Mongo.

fuzzybabybunny · Aug 25, 2014

BrightCandle said:
Its a trickier question than perhaps you realise because it all depends on usage. One of the problems with some of the NoSQL implementations is its not really possible to link documents together like you would link rows in a relational database. So instead what happens is that you normally denormalise your data to be suitable for the usage, and then write code for updating that ensures all the places where the data is used gets updated.

But how you go about doing this, and to what extent you do the copying really depends on all your use cases. You definitely want to avoid on a single use case a 1+n load scenario but at the same time sometimes that wont be a problem. But its a good idea to stop thinking quite so relational in your data, some duplication is going to be necessary to get efficient use out of NoSQL via Mongo.

Ohhhh... I see.

I guess I'm kind of missing the fundamental differences between the two database types.

Correct me if I'm wrong:

There aren't any hard links between documents in a No-SQL database. If you wanted a "relationship," you'd have to embed some kind of unique key (an ID) into a parent document and then use that unique key to do a lookup for that other document that has that ID. So as far as queries go, you do a query for the parent document, find the ID that's embedded, and then you do another query for that other document.

So a minimum of two queries.

When you make that "relationship" you have to have code that embeds that ID into that document.

When you destroy that "relationship" you have to also be diligent to create code to delete that ID that was embedded in that document on creation.

For an SQL database, the relationships are hard relationships. You also embed the ID of one entity into the fields of a parent entity, but then you define the actual relationship.

And then, when you do a query, it's all just one query. Really fast.

And I guess that on creation and destruction, all of the relationship creating/destroying is handled for you automatically as well?

purbeast0 · Aug 25, 2014

in relational db, all of the "relations" in the creation/destroying you still have to set up with the ORM that you are using. i've done hibernate, and you still have to make sure you have all of that stuff set up properly or you will get orphans or just errors when trying to delete stuff.

also, doing a query of 1 query in a relational database that has joins isn't automatically "really fast". you can have some very nasty joins that make things far from "really fast" and are actually really slow.

deleting the relationship between the 2 things in mongo isn't a big deal either - just do an update that removes the id from the "matches" object and that is it. you don't have to do anything on the other item because the other item doesn't know or care about the relationship.

Cerb · Aug 28, 2014

In a relational database, all information is in relations (tables with named and typed columns, in which each row, and the table as a whole, is a set). Those relations then can have relationships implemented.

If a user has modified a document, you might for example have:
user (ref_user)
doc (ref_doc)
doc_ver (ref_doc=doc(ref_doc), user=user(ref_user), modtime)
And really, you'd add an SK there, too:
doc_ver (ref_doc=doc(ref_doc), user=user(ref_user), modtime, doc_ver), where doc_ver is itself unique

What makes an RDB special is that all such information is laid out in a flat manner, with all the referenced values in the "child" tables, and constraints put on the values; whereas a graph database (which most OOP programs are themselves analogous to) would have system-specific links to the referring data.

SQL DBMSes without all the constraints will fly, as will popular NoSQL DBMSes, given many computers to run them on (MySQL as a node-local store, FI, was popular prior to the rise/resurgence of "NoSQL" document and graph DBMSes). Maintaining those relationships is part of what slows them down over time, and makes scaling them out a bit tricky. If you need an object store more than a data manager, SQL gets in the way (unless what you want is an ACID CSV file, in which case it's perfect). If you're stuck with an ORM that only works well with Active Record, exponentially so.

MongoDB appears to support use of data references, that you should be able to use to make ad-hoc graphs with, giving you relationships:
http://docs.mongodb.org/manual/tuto...-one-to-many-relationships-between-documents/

fuzzybabybunny said:
For an SQL database, the relationships are hard relationships. You also embed the ID of one entity into the fields of a parent entity, but then you define the actual relationship.

You may define an ID, but that is optional (though often a good idea), unless you're using annoying frameworks to handle your ORM, like RoR. The logical key (which must be unique) is what makes the values identifiable and linkable.

And I guess that on creation and destruction, all of the relationship creating/destroying is handled for you automatically as well?

If it's an SQL database used as a backing store, because it's the easiest option, it doesn't matter one iota. There may not even be relationships to create or destroy; if there are, you don't much care (the ORM is doing it for you anyway). If it's a relational database, those are handled slowly and carefully.

Search

Best way to structure relationships in a no-SQL database?

fuzzybabybunny

Moderator<br>Digital & Video Cameras

fuzzybabybunny

Moderator<br>Digital & Video Cameras

BrightCandle

Diamond Member

fuzzybabybunny

Moderator<br>Digital & Video Cameras

purbeast0

No Lifer

Cerb

Elite Member

TRENDING THREADS