Database schema questions

scootermaster · Apr 9, 2008

So it's been about 11 years since my DB class, and I don't remember thing one about it (except, of course, that there's something called "the relational calculus". A lot of good that's going to do me, eh?).

Anyway, I have a couple of DB schema questions:

I have this "source" db, with about 15,000 rows (but only 6ish columns), that I am using to populate a "new" database. I'm writing the parser that'll do all that. So two issues:

1). I want to link the row(s) of the new database to the row(s) in the old database they came from. Does it make sense to just add the 5 or so fields to the new db row, or use a foreign key? I can't see needing the old information all that often, but I do want to keep it around.

2). I'm exploding the old db to a bunch of different "types". Right now I'm using about 7ish tinyints (bits) as flags for which "type" the row is (the old db has none of this). There will be many different entities (mostly "names") which could have rows of many different types, and it would be common to want all the entries for a given name, regardless of type. So I figured it made sense to have one table, and use the bits to determine type, so with one query you could get all the entries of a given name, no join needed. Is is better/cleaner/more efficient to use different tables for each types and then use foreign keys? I can't use the primary key of name, because a given name may have multiple entries for a given name.

Thanks for your thoughts!

KB · Apr 9, 2008

Perhaps you could draw the schema so we can better understand it, but here are my first impressions:

I have this "source" db, with about 15,000 rows (but only 6ish columns), that I am using to populate a "new" database. I'm writing the parser that'll do all that. So two issues:

1). I want to link the row(s) of the new database to the row(s) in the old database they came from. Does it make sense to just add the 5 or so fields to the new db row, or use a foreign key? I can't see needing the old information all that often, but I do want to keep it around.

If you can, insert them into the new database with the same ID they had in the old. If this isn't possible then adding a new foriegn key column would make sense. Adding the additional 5 fields to the new table doesn't make a lot of sense if only a small fraction of the rows will make use of these columns.

2). I'm exploding the old db to a bunch of different "types". Right now I'm using about 7ish tinyints (bits) as flags for which "type" the row is (the old db has none of this). There will be many different entities (mostly "names") which could have rows of many different types, and it would be common to want all the entries for a given name, regardless of type. So I figured it made sense to have one table, and use the bits to determine type, so with one query you could get all the entries of a given name, no join needed. Is is better/cleaner/more efficient to use different tables for each types and then use foreign keys? I can't use the primary key of name, because a given name may have multiple entries for a given name.

This violates third normal form, which is usually a bad idea. It would be better to have a TypeFK column that joins the row to a Table of type values. What this allows you to do is to add new types easily without having to add/alter your database schema. So for example adding new types is something a user could do, while issuing an ALTER TABLE ADD NewType DML statement is not something you want users doing. If a row can have multiple types then you would add a third table to link the row with the type in a many-to-many relationship.

scootermaster · Apr 9, 2008

Originally posted by: KB
Perhaps you could draw the schema so we can better understand it, but here are my first impressions:

I have this "source" db, with about 15,000 rows (but only 6ish columns), that I am using to populate a "new" database. I'm writing the parser that'll do all that. So two issues:

1). I want to link the row(s) of the new database to the row(s) in the old database they came from. Does it make sense to just add the 5 or so fields to the new db row, or use a foreign key? I can't see needing the old information all that often, but I do want to keep it around.

If you can, insert them into the new database with the same ID they had in the old. If this isn't possible then adding a new foriegn key column would make sense. Adding the additional 5 fields to the new table doesn't make a lot of sense if only a small fraction of the rows will make use of these columns.

2). I'm exploding the old db to a bunch of different "types". Right now I'm using about 7ish tinyints (bits) as flags for which "type" the row is (the old db has none of this). There will be many different entities (mostly "names") which could have rows of many different types, and it would be common to want all the entries for a given name, regardless of type. So I figured it made sense to have one table, and use the bits to determine type, so with one query you could get all the entries of a given name, no join needed. Is is better/cleaner/more efficient to use different tables for each types and then use foreign keys? I can't use the primary key of name, because a given name may have multiple entries for a given name.

This violates third normal form, which is usually a bad idea. It would be better to have a TypeFK column that joins the row to a Table of type values. What this allows you to do is to add new types easily without having to add/alter your database schema. So for example adding new types is something a user could do, while issuing an ALTER TABLE ADD NewType DML statement is not something you want users doing. If a row can have multiple types then you would add a third table to link the row with the type in a many-to-many relationship.

VERY good info.

I hadn't thought about the type addition issue, and it didn't occur to me that you could extract the types to different tables, not the entries of a given type.

As a general rule, I'm lazy, so if I can simplify queries, I usually do that, with the understanding that, for the most part, "speed" isn't going to be a huge concern (there aren't going to be millions of queries against this database per minute, and we're not dealing with hundreds of thousands of rows). So I figure I can cheat a bit.

But it's probably good to keep good practices.

So a query for [all entries of] a specific name will also have to include a query that will return the type, right? Not the end of the world, I guess.

What if there are fields that only apply to one type? As I mentioned earlier (or tried to!) I pretty much have one row contain all possible fields across all types. There are maybe a few (one to three?) fields which might "disappear" (be unnecessary) for particular types. Where's the best place to put these?

Thanks so much! I knew there was a reason I took that class in the first place!

scootermaster · Apr 10, 2008

No more thoughts?

Whitedog · Apr 11, 2008

Originally posted by: scootermasterWhat if there are fields that only apply to one type? As I mentioned earlier (or tried to!) I pretty much have one row contain all possible fields across all types. There are maybe a few (one to three?) fields which might "disappear" (be unnecessary) for particular types. Where's the best place to put these? !

This would be a bad design. Do as suggested and have a "type lookup" table. Joining a lookup table to a "data" table adds absolutely zero overhead to your query. It's a one to many.

You're much better off keeping your data table simple and use lookup tables for your "types". Remember, one to many relationships add zero overhead to your query.

Of course, it would help to know a little bit about your data. What do you mean by many different "types" anyway? Different "data types"? Can you list your data columns?

[edit] and about you saying "I'm lazy".... joining a lookup table is one line of code..

Inner Join MyLookupTable on a.type_id = b.type_id and a.type = "My Type"

Yep. You're lazy. 😀
[/edit]

Crusty · Apr 11, 2008

Just remember that if your tables and relationships are set correctly, then your queries will be much simple then if it was half assed. Maybe not simpler in terms of pure syntax, but simpler in the fact that it will be a lot easier for you and anyone else who might come across the queries to understand what exactly it's doing without having to know too much about the actual data inside of the table. If you all of the sudden store a field that determines the type of row in the database now the person writing queries HAS to know what's inside of the table before they can start writing the query. So that means extra documentation and meta data stored and written about the database. In the end proper design wins over any sort of small speed benefit that being lazy gives you...

Of course, if this is just a quick hack for a one time data cleansing or merging a using database then who cares, get the job done quickly. Bad design will come back to bite you in the ass.

Kntx · Apr 14, 2008

To expand on KB's excellent advice. To ease your query cocerns first flatten the table into a view then query on that rather than performing the join every time.

scootermaster · Apr 14, 2008

Okay, so we've established that I'm going to make use of foreign keys, and make another table for the "types" (i.e. a table that has, let's say, "republican", "democrat" and "independent" in it). My issue above is that let's say, something of a republican type needs fields for "age", "sex", "number of slaves owned" and "dollars given to oil companies", whereas democrat has fields like "age", sex", and "dollars given to greenpeace" whereas independent has fields "age", "sex" and "number of times mentioned Ron Paul on an Internet forum".

I could theoretically create four fields in my "main" table for "#of slaves, "$ given to oil", "$ given to greenpeace" and "# of Ron Paul references". But obviously they'd be blank for many entries, since an entity can't be more than one type.

So while I totally get off-loading the type field, instead of using what amounts to a bitvector, how do I handle the fields that are associated with a given type?

Forgive these questions, by the way. I'm pretty much a database scheme newb.

Crusty · Apr 14, 2008

Instead of offloading the "type" to a different table, make the type be the table name and just have different tables for each different type.

Database schema questions

scootermaster

Platinum Member

KB

Diamond Member

scootermaster

Platinum Member

scootermaster

Platinum Member

Whitedog

Diamond Member

Crusty

Lifer

Kntx

Platinum Member

scootermaster

Platinum Member

Crusty

Lifer

TRENDING THREADS