Need database design suggestions...

GeekDrew · Aug 16, 2008

I?m working on a project, and want some suggestions as to how I should create a database schema.

Existing tables:

Group (group_id, group_name...)
Person (person_id, name_last, name_first, address1...)
PersonGroup (persongroup_id, person_id, group_id, name_last, name_first, address1...)

The redundancy of fields between Person and PersonGroup is because PersonGroup must reflect the data as of when the person *joined* the group (never changes after insert), whereas the Person table must reflect the most current data (changes).

What I am trying to decide is how I should store data that applies to all of the people on a per-group basis. For example, group ?A? might require additional fields: date_of_birth, group_membership_number, favorite_color, forum_username, spouse_name, etc. Group ?B? might require additional fields: diet_type, favorite_food, computer_manufacturer, etc. That is to say, each group requires fields that are not required/optional/available for other groups. Because of that, I don?t want to add columns to ?PersonGroup?, and just leave columns unused for the appropriate groups (table ?Person? only requires information that is common to all groups). That would lead to a HUGE number of fields in the ?PersonGroup? table that are left null for various groups.

There are hundreds of groups. The number of people per group could be anywhere from 10 to 1,000.

There?s the entity-attribute-value way of approaching it:

GroupAddtnl (group_id, field_id, question)
PersonGroupAddtnl (persongroup_id, field_id, response)

And there?s the table creation way:

GroupAddtnl (group_id, field_id, question)
PersonGroupAddtnl_# (persongroup_id, field_1, field_2, field_3, field_4...)

And there are various ways to accomplish it that are somewhere in between. What do you recommend I consider, and why?

FP · Aug 17, 2008

I really don't like generic data models simply because they don't scale. I have seen generic models take down enterprise apps before and they are a beast to fix.

This sounds like an ORM problem and depending on the DB there are a couple ways to solve it. Oracle has some features that let you store "objects" which would solve your problem. If you are looking for a purely RDMS solution I would do the following...

Simply treat the specialized groups as "subclasses" and create type specific tables (yes hundreds) that FK back to the Person table. I am not quite sure why you want to associate these additional fields with the PersonGroup record as they are attributes of the Person right?

For example, if you have two groups... TEACHERS and STUDENTS you would create the appropriate tables (along with the associated attributes) and FK from those tables back to the PERSON table.

The downside to the subclass solution is there isn't an easy way to say "give me all attributes of Person ABC" or "tell me all of the groups that Person ABC belongs to." You could denormalize the model a bit and keep your PersonGroup table around (and FK to that instead of PERSON) to make the latter easier to find but I don't like that idea much as it would be hard to manage. It really depends on how you will be accessing the data normally. You probably won't be able to get all the data you need in one query but that is the trade-off when deciding not to use generic models.

I am also not a fan of the redundant data in PersonGroup. Do you have an audit table of records? If so, couldn't you join to that for the historical info that was present when the record was created? For example... PersonGroup could contain a PersonID and PersonRID column that uniquely identifies a historical person record.

GeekDrew · Aug 17, 2008

When finished, this database needs to be able to be powered by either MySQL or MSSQL... so purely RDMS.

This is going to be a meeting/event management product for a small company. "Person" will sign up for a meeting on a website, and the website lets them sign up for meetings hosted by various groups, but only if that Person is previously a member of that group, which is why membership_id and membership_username are required in the PersonGroup table.

As for the redundant data in PersonGroup... I don't see a way for me to get away from that. With some groups, the Person is known by name X, whereas with other groups, they are known as Person Y (for example, maiden name vs married name). Also, some persons use their office address as their home address, only with some groups, and not with others. For example, a law enforcement officer might use his office address as his home address (he purposefully wants it to be known as his home address) while signing up for a meeting with a group that is not a law enforcement personnel group; he might use his real home address to sign up for meetings with groups that are related to law enforcement personnel matters. The vast majority of the time, the data will be redundant between Person and PersonGroup, but there are many "extenuating circumstance" cases where the data will not be the same between the two tables, and I have to provide a mechanism for dealing with that.

I'm not really sure what I was thinking when I previously listed the example attributes related to the PersonGroup table... they would be attributes of the relationship, and wouldn't necessarily make any sense without the Group being part of the parent. A better list of examples for data I might need to collect are "first date of attendance at meeting", "membership ID", "membership username" (that person's username on the group's website, not this website), "member since date", "referred to group by", "certification", etc.

Most of the data access will be done through a PHP website, though it's also critical that my coworkers be able to easily write and run reports based upon this data in Crystal Reports and MS Access. That would be fairly easy to do, I think, if I use a generic data model... I could programatically create views on a per-group basis that serve as a pivot table, with the PersonGroup_ID being grouped as rows and Field_ID being used for the columns. I imagine that would be slower than having a separate table for each group's persons and their applicable values, but I think that storing the data as multiple rows in a single table would be easier than separate tables per group. Also, each group will have relatively few additional attributes that aren't stored in the PersonGroup table -- so far the most I've seen is 6.

Last night, I did an experiment on a group that has roughly 500 persons in it, and 6 attributes. I created PersonGroupAddtnl (person_id, addtnl_id, value) and GroupAddtnlMeta (group_id, addtnl_id, description), and populated PersonGroupAddtnl with sample data for for those 500 persons, and then wrote a pivot view such that the persons were rows and each addtnl_id was a column. The execution time was 0.0045 seconds... which I know means nothing right now, but it was faster than I thought it would be.

I know that the time spent will increase the more rows there are in the table, but if there is an index on persongroup_id and addtnl_id, and the query is always restricted to only those persongroup_ids that are part of group X (by using an inner join with PersonGroup), I don't think that the pivot view would slow down *that* much. Speed when viewing all of the PersonGroup entries and their associated additional fields isn't a top priority... that would be done fairly infrequently (compared to the number of read/write operations on a single PersonGroup row and its affiliated additional rows).

Hrmmmmmmmm.

Search

Need database design suggestions...

GeekDrew

Diamond Member

FP

Diamond Member

GeekDrew

Diamond Member

TRENDING THREADS