
Simple MySQL join / schema question (combine a fixed list with a variable list)

scootermaster

Platinum Member
I have a number of lists I need to iterate through. There'll be one list (call it MAIN), and then each specific instance will have its own [smaller] list (SPECIFIC1, SPECIFIC2, etc.).


First question: Do I:

a). Store this all in one table, as (primary key, TYPE (MAIN, SPECIFIC1, SPECIFIC2, etc), data)
b). Have one table for MAIN (PK, data), and then one for ALL the others (PK, TYPE, data)
c). One for main and one table for EACH specific (PK, data)?

Second question:

What would a SQL query look like that returns all the data from MAIN plus any one SPECIFIC list, given its number (1, 2, 3, ...)? I assume it's a simple join, but I haven't done anything more complicated than a SELECT * in ages.

Thanks so much!
 
Assuming you don't care about order, something like this would be best:

Id, Data, ParentId(Nullable)

If ParentId is NULL, you know it's the parent list. With this design, you'll have to use a recursive CTE (see http://technet.microsoft.com/en-us/library/ms186243(v=sql.105).aspx) to retrieve a parent and its children. Something like the below will work (note MySQL 8.0+ spells it WITH RECURSIVE):

WITH CTE AS (
    SELECT Id, Data, ParentId FROM List WHERE Id = 1
    UNION ALL
    SELECT l.Id, l.Data, l.ParentId
    FROM List l
    JOIN CTE ON l.ParentId = CTE.Id
)
SELECT * FROM CTE
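For a concrete, runnable sketch of the same pattern, here's Python's bundled sqlite3 standing in for the DBMS (SQLite, like MySQL 8.0+, uses the WITH RECURSIVE keyword; the table and values are just placeholders):

```python
import sqlite3

# Toy List table: one parent row (ParentId NULL) and two children.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE List (Id INTEGER PRIMARY KEY, Data TEXT, ParentId INTEGER)")
con.executemany("INSERT INTO List VALUES (?, ?, ?)",
                [(1, "main", None), (2, "child-a", 1), (3, "child-b", 1)])

# Anchor on the parent (Id = 1), then repeatedly pull in rows whose
# ParentId matches an Id already collected in the CTE.
rows = con.execute("""
    WITH RECURSIVE CTE AS (
        SELECT Id, Data, ParentId FROM List WHERE Id = 1
        UNION ALL
        SELECT l.Id, l.Data, l.ParentId
        FROM List AS l JOIN CTE ON l.ParentId = CTE.Id
    )
    SELECT Id, Data FROM CTE
""").fetchall()
print(rows)  # parent row plus both children
```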
 
There'll be one list (call it MAIN) and then each specific instance will have its own [smaller] list (SPECIFIC1, SPECIFIC2, etc.).
What are the relationships between MAIN and the sublists? Is there a single list per MAIN entry? Are the sublists all of the same data type (including value constraints)? Can any MAIN entries point to the same sublist entries as other MAIN entries? Does order matter? If so, what needs to be ordered, and how?

a). Store this all in one table, as (primary key, TYPE (MAIN, SPECIFIC1, SPECIFIC2, etc), data)
If every entry in MAIN will have all SPECIFIC columns filled, you could. Probably not the best idea, even so.
b). Have one table for MAIN (PK, data), and then one for ALL the others (PK, TYPE, data)
If the TYPE column is purely metadata, yes, but you'll also need a FK.
c). One for main and one table for EACH specific (PK, data)?
If the SPECIFIC columns are each different data types, yes, but you'll want a FK.
 

Thanks for responding.

I thought I specified the problem, but I guess not. It's really not that complicated.

No ordering. All same [and simple] datatype.

There are no relationships between the two lists. The only thing I guess I didn't mention clearly is that each context will have ONLY the main list and its specific contextual list. So, if these were keywords, the main list might be "red", "white" and "blue"; Context A might have "orange" and "pink" (in addition to red, white and blue); and Context B might have "purple" and "black" (in addition to red, white and blue).

That's it.

So I guess I could store it like
TABLE MAIN
PK, red, main
PK, white, main
PK, blue, main
PK, orange, A
PK, pink, A
PK, purple, B
PK, black, B

or

TABLE MAIN
PK, red
PK, white
PK, blue

TABLE CONTEXT
PK, orange, A
PK, pink, A
PK, purple, B
PK, black, B

or

TABLE MAIN
PK, red
PK, white
PK, blue

TABLE A
PK, orange
PK, pink

TABLE B
PK, purple
PK, black

I will always know what context I'm in, and I'm only in one context at a time. So, to check the keywords, I'd need a join that returns MAIN and the keywords in context X, for whichever context I happen to be in.

Make sense?

Thanks again for the help!
 
Honestly, it reads like dealing with an SQL DBMS alone is making this more complicated than it really needs to be.

However, anything you might do joins on does have a relationship, and whatever that relationship is more or less guides how you'd best arrange things. Except for the usually-useless full join, the other joins operate on the relationships between data in the base tables.

Are the arrangements like so?

TABLE MAIN {int auto_increment, typeA color, typeB context}

or

TABLE MAIN {int auto_increment, typeA color}
TABLE CONTEXT {int auto_increment, typeA color, typeB context}

or

TABLE MAIN {int auto_increment, typeA color}
TABLE A {int auto_increment, typeA color}
TABLE B {int auto_increment, typeA color}

The last one works, but it's typically only done for performance reasons, when single tables get too big and are constantly thrashing in and out of RAM (selecting from two or more identical tables is done most easily with a UNION, like in Hmonkeysauce's post). The middle one is redundant, since there is an implied "main" context.
 
Thanks for all the feedback, guys. Maybe I meant a union, not a join.

And yes, those table schemas are identical to the one I set up.

And you're right, there's no reason this needs to be done via a DB. In fact, right now, it's not. It's reading the data from files. I can easily iterate through the main list, figure out my context, and then pick the appropriate file and iterate through that list. And I can easily do the same (select * from MAIN, followed by select * from APPROPRIATE_CONTEXT) but I assumed there was a way of doing it with one query.
 
With separate tables, you would use a union, like:
Code:
SELECT * FROM MAIN
UNION ALL -- ALL is optional; it keeps duplicates (bag semantics)
SELECT * FROM A
But, if that's simple, it's probably not worth over-thinking either way. While SQL is always annoying, as a language, an SQL DBMS is absolutely great when you have to change data formats, or look at the same data several different ways.
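If it helps to see it run, here's a quick sketch with Python's bundled sqlite3 as a stand-in DBMS (MAIN and A are the placeholder tables from the thread, with made-up data):

```python
import sqlite3

# Placeholder tables standing in for MAIN and one context list, A.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE MAIN (keyword TEXT)")
con.execute("CREATE TABLE A (keyword TEXT)")
con.executemany("INSERT INTO MAIN VALUES (?)", [("red",), ("white",), ("blue",)])
con.executemany("INSERT INTO A VALUES (?)", [("orange",), ("pink",)])

# UNION ALL keeps duplicates; plain UNION would remove them.
keywords = [row[0] for row in con.execute(
    "SELECT keyword FROM MAIN UNION ALL SELECT keyword FROM A")]
print(keywords)  # all five keywords, both lists combined
```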

When all you really need is a key-value store, and you want to use Access or something to make a nice GUI, it probably doesn't matter how you lay it out, as long as you take care to set the right constraints. Backups take care of the major problems of flat files in that case, since DBMSes tend to either be resistant to corruption or offer features to help make them resistant to it. And if using a SQL DBMS and some front end is convenient because it's already set up (common in business environments), then as long as you're not filling row after row with nulls or very long lists of columns, laying it out the same as a big flat file will probably be OK, provided there aren't lots of other users needing to access it concurrently.
 

Perfect. Thanks for the feedback! I really appreciate it.
 
Also, I should probably have added: with the first option, an OTLT (One True Lookup Table; generally very bad form, but it reads like it really won't matter here, and would be easy to normalize later; normally this would be blasphemy), it would be something like:
Code:
SELECT * FROM MAIN WHERE CONTEXT='main' OR CONTEXT='A'
or
Code:
SELECT * FROM MAIN WHERE CONTEXT IN('main','A')
 

So, I guess that's a "bad" hammer to have in my bag, but just out of curiosity...if I did have something like that (i.e. an OTLT), with the table like this (the numbers are the primary key):

0, keywordA, main
1, keywordB, main
2, keywordC, context1
3, keywordD, context2

Is there a way of returning the entire thing, such that the results are in a PHP associative array ordered by context?

I.e.
Code:
['main'] => Array
    (
        [0] => Array
            (
                [pk] => 0
                [keyword] => keywordA
            )
        [1] => Array
            (
                [pk] => 1
                [keyword] => keywordB
            )
    )

['context1'] => Array
    (
        [0] => Array
            (
                [pk] => 2
                [keyword] => keywordC
            )
    )

['context2'] => Array
    (
        [0] => Array
            (
                [pk] => 3
                [keyword] => keywordD
            )
    )

I guess, in other words, if I wasn't going to stick these things -- or something like this -- in different tables, what's the best way of grouping the results by a certain column, so I can iterate through them all?
 
So, I guess that's a "bad" hammer to have in my bag
If the system starts growing more complex, or slows down due to waiting on access to that one big table, learn you some relational database theory. But as long as it's fairly simple, and not too big, there's little to gain by making it more complicated. As long as the data stays sufficiently simple, it will also be fairly easy to convert into a proper relational DB (or a non-relational big object store), should the need arise. When value types start relying on values from other columns, or you need dozens of columns in a single table, that's usually when you're getting into the flat-file death spiral 🙂.

For example, you'll likely have many duplicate context entries. Lookups can often be sped up by making an enum table listing the contexts, so the contexts are stored as 1-, 2-, or 4-byte ints rather than varchars. The result is a smaller, and usually faster, main table, at the cost of a separate table to keep up with and join to.

Code:
CREATE TABLE contexts (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    context VARCHAR(255) NOT NULL UNIQUE
);

CREATE TABLE main (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    id_context INT NOT NULL,
    keyword VARCHAR(255),
    FOREIGN KEY (id_context) REFERENCES contexts(id)
);
Code:
SELECT main.id, contexts.context, main.keyword
FROM main
INNER JOIN contexts ON main.id_context = contexts.id
Here the join is really just filling the context's name back in for that query. Lookups on particular contexts can also be faster: the DBMS resolves the context names to their integer ids, then searches the main table for those ints (even though you gave it strings), rather than doing string matching, which takes more time.

But, so long as you don't do any bad flat file stuff, like changing column value types based on other data, that kind of thing can be relatively easily done later, if needed (you could even write triggers to automatically manage it, with new base tables, so that users--including existing application code--won't have to even know that's going on).
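To see the whole lookup-table arrangement run end to end, here's a sketch with Python's sqlite3 standing in for MySQL (SQLite's column types and auto-increment behavior differ slightly; the data values are made up):

```python
import sqlite3

# Lookup-table layout: contexts holds the names once; main stores cheap ints.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE contexts (
        id INTEGER PRIMARY KEY,
        context TEXT NOT NULL UNIQUE
    );
    CREATE TABLE main (
        id INTEGER PRIMARY KEY,
        id_context INTEGER NOT NULL REFERENCES contexts(id),
        keyword TEXT
    );
    INSERT INTO contexts (id, context) VALUES (1, 'main'), (2, 'A');
    INSERT INTO main (id_context, keyword)
        VALUES (1, 'red'), (1, 'white'), (2, 'orange');
""")

# The join only fills the context name back in for display.
rows = con.execute("""
    SELECT main.id, contexts.context, main.keyword
    FROM main JOIN contexts ON main.id_context = contexts.id
    ORDER BY main.id
""").fetchall()
print(rows)  # [(1, 'main', 'red'), (2, 'main', 'white'), (3, 'A', 'orange')]
```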

0, keywordA, main
1, keywordB, main
2, keywordC, context1
3, keywordD, context2

Is there a way of returning the entire thing, such that the results are in a PHP associative array ordered by context?
For iterating over all of them, you'll probably just want to order the results and loop through them with a current-context state variable, or an inner loop that you break out of (to re-enter with the new context value):
Code:
SELECT * FROM MAIN ORDER BY CONTEXT, PK ASC
For that matter, the same ordering can be applied to a smaller listing of them, too:
Code:
SELECT * FROM MAIN
WHERE CONTEXT IN('main', 'context1')
ORDER BY CONTEXT, PK ASC
Then, you can either treat the result as flat, or loop over it to add it to a set of arrays. A straightforward two-query approach, where the first query builds the first level of container arrays, would be something like this combo:
Code:
SELECT DISTINCT CONTEXT FROM MAIN ORDER BY CONTEXT ASC;
SELECT PK, KEYWORD, CONTEXT FROM MAIN
ORDER BY CONTEXT, PK ASC;
Then do a for loop over the list of contexts, to put them in arrays named for each context.
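Since the question mentioned PHP, here's the same grouping loop sketched in Python with sqlite3 (a dict of lists stands in for the PHP associative array; the data is copied from the example table above):

```python
import sqlite3

# One-table OTLT layout; group rows into a dict keyed by context.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE MAIN (pk INTEGER PRIMARY KEY, keyword TEXT, context TEXT)")
con.executemany("INSERT INTO MAIN VALUES (?, ?, ?)", [
    (0, "keywordA", "main"), (1, "keywordB", "main"),
    (2, "keywordC", "context1"), (3, "keywordD", "context2"),
])

grouped = {}
for pk, keyword, context in con.execute(
        "SELECT pk, keyword, context FROM MAIN ORDER BY context, pk"):
    grouped.setdefault(context, []).append({"pk": pk, "keyword": keyword})

print(grouped["main"])  # [{'pk': 0, 'keyword': 'keywordA'}, {'pk': 1, 'keyword': 'keywordB'}]
```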

Alternatively, though it may take more work than is obvious, depending on the DBMS, you could use aggregate functions. However, the concatenating ones vary by implementation, including their limits (MySQL's GROUP_CONCAT, for instance, usually needs its limit raised, since group_concat_max_len defaults to 1024 bytes per result row).
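As a sketch of the aggregate route, SQLite's group_concat (run via Python here) plays the role of MySQL's GROUP_CONCAT; limits and ordering guarantees differ between implementations:

```python
import sqlite3

# Same OTLT table; collapse each context's keywords into one CSV string.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE MAIN (pk INTEGER PRIMARY KEY, keyword TEXT, context TEXT)")
con.executemany("INSERT INTO MAIN VALUES (?, ?, ?)", [
    (0, "keywordA", "main"), (1, "keywordB", "main"), (2, "keywordC", "context1"),
])

rows = con.execute("""
    SELECT context, group_concat(keyword)
    FROM MAIN GROUP BY context ORDER BY context
""").fetchall()
print(rows)  # one row per context, keywords joined with commas
```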
 

It's funny; before I asked this, I totally thought of the "save state, and monitor the transitions" solution, but somehow I figured there must be a pure PHP/SQL solution that's more elegant. I run into that problem a lot, actually. I often worry about performance in these highly abstracted languages, or worry about too many DB accesses, etc., not realizing that a). computers are damn fast, and b). that's what these systems are meant to do.

Suffice it to say, I did it that way (get them all, track transitions) and it works just fine.

But I appreciate all your help and advice! It's been super beneficial!
 
If it gets slow on account of the DB, or grows in complexity (more columns, more tables, and more uses), it can be made better. Also, if the DBMS is on the same computer as the application, sending tens of thousands of rows to PHP is no big deal. Toward that end, you want to avoid columns with data-dependent meaning (i.e., if column A is "f" then column B's string means Y, but if A is "k" then B means Z) at all costs; use more columns or tables instead. As long as you avoid that trap, you'll be able to convert the data into a better, normalized form with relative ease. Normalization can be the difference between queries being lightning-fast and always waiting on the disk because the individual tables just have too much crap and are way too big.

SQL itself is simply not elegant. It was made at a time when the prevailing thought was that programming languages would become like human languages. Languages that did that ended up with the unnecessary complexity, and common ambiguity, of human languages, so that was a bad idea. But SQL survived, and that's what we're stuck with. The concepts that make the underlying DBMSes as powerful as they are are elegant; SQL is not.
 