MS Access Database Multiple Data Items in One Field Problem

Aug 12, 2004
106
0
76
Hello.

I am the sys admin of the EMR for a division at my hospital.

(Note, I am not a professional database expert. My IT area of expertise is hard ware leaning towards biomed more than anything, but I enough about access and am willing to teach myself what I don't given time to deal with most of these problems.)

Essentially this application exports its data, periodically to an access database.

One of the tasks I have to do is adding custom reports and such to the database, over and above the default ones built into it.

Part of the problem is that the people who set this up prior to me, between the company and the people who work for my organization, not only did not account for multiple doctors working on one patient for one procedure, but they did not build in controls on proper name spelling, proper enumeration in various data items etc. Over time I wish to fix that.

However what I have is a very unwieldy data base, which for example, if I were to run the report of procedures by doctor or by RN might return say 200 separate entries, due to common name misspelling.

So, I made a copy of the export database and have begun creating various queries to clean up this mess.

My first major problem is basically that in the field where the physician or the RN or the anesthesiologist are entered, occasionally, more than one person is entered.

To give you an idea of how this database is structured I will use an airplane/airport analogy, since due to legal restrictions I don't want to be specific about the database itself.

There are two main tables (and a couple of smaller tables which the databases automatic import functions, which I don't use and have disabled, uses). Each tables entry has a two part primary key.

This primary key is the airplanes serial number + the time the flight lands.

One of the fields in this table (analogous to Doctors) is pilot.

Now one thought that I have is to make a unique primary key field, replacing the existing split one, and then splitting each record into multiples, with differing items such as pilots and or navigators now being different in each record. Given that less than 10% of the records, probably closer to 2% or 4% have this problem, and that even 10 years from now, there will be at most say 30,000 total primary key records under the old system, I would basically be taking advantage of small size plus modern computing power to work around this issue.

Another idea is to make another table and segregate this data into that table, use the same primary key setup and let that table contain multiple pilots for each trip, so to speak.

But what I was wondering, which would grossly simplify things is if it is possible to, in the correction query, not to override the whole record but only the part with the misspelling. I know how to use wild cards in my searches so it is perfectly fine in the short term, to keep more than one pilot in the pilot field.
 

EagleKeeper

Discussion Club Moderator<br>Elite Member
Staff member
Oct 30, 2000
42,589
5
0
Using the airport analogy:
Create a field called crew_by_tail.
That is a unique number, only used once within the table.
Create a new table with the same name with three folds.
First field is the crew_by_tail. This is not unique.
Second field is ID of crew member, not unique
Third field is type of crew member, not unique

The combination of all three fields creates an unique instance, however any one of the fields can be duplicated within another record.

Now you have info on the entire crew for that tail.
Because that tail may be on different flight #s, you now need to identify flight and date.

As to data scrubbing, good luck. I did that 15 years ago for State of Indiana, Child services division. Multiple free form text files. Not all fields consistent even within same file.

Best that can be done is to enforce a selection list for fields.

Scrub best you can by name vs number and ask for help from people responsible for original lists
 

Emulex

Diamond Member
Jan 28, 2001
9,759
1
71
sounds like a typical EAV,metadta database with a constraint placed on it:

FIELDNAME , VALUE (unique)

PILOT, AIRPLANESERIAL123578345

Code pairs are very commonly used this way. It's pretty sick to have to deal with but you can write SP's or views to abstract it.

Now what modern big-data does is use JSON or XML or whatever to stuff more data in the fields (overload them)

so PILOT , airplane:{timestamp: '20120301000000',serial:'12356B',name:'Mr Pilot',task:'Test',result:0}

Google up NOSQL and you'll see what I mean.

I'm oldschool and have less data so I prefer relational database design and not using one table for more than one purpose.

Back in the days, you were limited to say 255 open files on a netware server. So if your database had 10 tables in 10 files, that would be 25 people max. Not counting indexes etc. So They would use one table for many data types and get a lot more users in that 255 limit.

You have to remember they didn't have row locking back then, so say if you had file locking, one person grabbed a table file and locked it, you'd be screwed if you had 255 people doing that at once. so spreading out across multiple files made it more reasonable.

It was dark times back then.

Sad thing is folks still have software that uses this old method to this date, they've even just stuffed the databases into sql and called it "client-server" but in fact its the exact same structure just with sql server licensing to be paid lol.
 
Aug 12, 2004
106
0
76
Its not the key the database structure that bothers me so much. Again, I am not a high end developer, but I know enough to develop the reports people want. It would just be smoother with the database scrubbed.

What I decided to do for now for that pilot field is use a series of update queries to eliminate all but the pilots last names.

So for example,

John Smith,J_Smith, Pilot Smith, etc all become Smith.

Bill Jones, BillJones, Pilot Jones, etc all become Jones.

In those rare entries with 2, they will be reduced to Smith/Jones.

When that's done, I will go through again and replace the Smith with John Smith, and the Jones with Bill Jones. It then becomes easily search and sortable on that field.

Given that I also control and will be customizing the EMR itself, I will be able to, by the end of the year, prevent misspellings, wrong data types from being entered in some cases, etc.

Using the pilot analogy, moving the airplane crew so to speak to another table might ultimately be for the best.