
Access/SQL Noob needs some advice

Netopia

Diamond Member
I regularly have to deal with data sets that have multiple records for a single person or entity. When there are multiple records, each one holds a different value for something that unique person/entity (I'll just say entity from here forward) has done.

The values could be a date, or a product purchased, or a code of any sort. I'm tasked with taking these files and flattening them so that there is only one record per entity, but with that entity having as many fields as necessary to accommodate all of the values from all of the records for that entity. I currently do this manually in Excel, but I'm hoping that there would be some quicker way I could do it using Access or SQL Server (I know very little about either) where I won't have to constantly reinvent the wheel. I'll give a description of what I currently do with some screenshots as examples.
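The transformation described above (many records per entity in, one wide record per entity out) can be sketched in a few lines of plain Python. This is just an illustration of the idea, not the poster's actual workflow; the function name and the sample data are made up, and it assumes each input row is simply (entity, value):

```python
def flatten(rows, key_index=0, value_index=1):
    """Collect every value seen for an entity into a single wide row.
    rows: list of [entity, value] pairs; returns one row per entity."""
    grouped = {}
    order = []  # remember first-seen order of entities
    for row in rows:
        key = row[key_index]
        if key not in grouped:
            grouped[key] = []
            order.append(key)
        grouped[key].append(row[value_index])
    # One output row per entity: the entity followed by all of its values.
    return [[key] + grouped[key] for key in order]

# Made-up sample: company123 appears twice, so it gets two value columns.
rows = [
    ["company123", "2021-03-01"],
    ["companyabc", "2021-04-15"],
    ["company123", "2021-06-20"],
]
print(flatten(rows))
# → [['company123', '2021-03-01', '2021-06-20'], ['companyabc', '2021-04-15']]
```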

Thanks to everyone who has even read to this point. I appreciate anything those of you with knowledge can give to this beggar of your skills! 😕

==================================
After I get the files, I open them in a piece of software that works with de-duplication. I just mark the recurrences of the records so that I can group them and see how many iterations there are of each.

I then export that file and bring it into Excel. I sort by the entity, or whatever the common value is, and then by dupe # (it MUST be in dupe order after the common value). In this case, the common value is company.

begin_excel.jpg
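The mark-and-sort step above (assign each record a running "Dupe #" within its group, then order by the common value and that number) can also be done in code. A minimal sketch in Python, with made-up sample data; the function name is my own:

```python
from itertools import groupby
from operator import itemgetter

def add_dupe_numbers(rows, key_index=0):
    """Sort by the common value, then number each record within its
    group (1, 2, 3, ...), the same 'Dupe #' the de-dupe software marks."""
    rows = sorted(rows, key=itemgetter(key_index))
    numbered = []
    for _, group in groupby(rows, key=itemgetter(key_index)):
        for n, row in enumerate(group, start=1):
            numbered.append(list(row) + [n])  # Dupe # as a new last column
    return numbered

print(add_dupe_numbers([
    ["companyabc", "widget"],
    ["company123", "gadget"],
    ["company123", "gizmo"],
]))
# → [['company123', 'gadget', 1], ['company123', 'gizmo', 2],
#    ['companyabc', 'widget', 1]]
```

This is the same idea as SQL's `ROW_NUMBER() OVER (PARTITION BY company ...)`, which comes up later in the thread.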


Then, in the header row, I build logic that pulls the other values into a single record. I repeat that across as many additional columns as the max dupe count minus 1 (e.g. if the highest dupe number is 5, I'll build the expression in 4 additional columns).

Formulas displayed to make it easier to understand:
excel_formulas.jpg


Once those formulas are created, they are dragged down to fill all the rows needed:
flattened_full_data.jpg


And then I filter on Dupe #1 which gives me one record for each entity, with all the values for that entity on a single line.
final_flattened.jpg


After that, I just copy/paste the filtered cells into a new Excel file, save that second file, and I've got the data in the format I need. But it's a PITA. I know there must be a simpler way, but I don't have the knowledge to get there.
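If the data can be exported to CSV, the whole manual sequence above (sort, pull values sideways, keep one row per entity, save a new file) could be collapsed into one small script. This is a sketch under assumptions, not the poster's actual layout: it assumes a header row, the company in a column named "Company", and the remaining columns holding the per-record values; all names here are made up:

```python
import csv

def flatten_rows(rows, key_index=0):
    """One wide row per key, values padded with '' to equal width."""
    grouped = {}
    order = []
    for row in rows:
        key = row[key_index]
        if key not in grouped:
            grouped[key] = []
            order.append(key)
        # keep every column except the key itself
        grouped[key].extend(row[:key_index] + row[key_index + 1:])
    width = max((len(v) for v in grouped.values()), default=0)
    return [[k] + grouped[k] + [""] * (width - len(grouped[k])) for k in order]

def flatten_csv(src, dst, key_col="Company"):
    """Read src, flatten to one record per key_col, write dst."""
    with open(src, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        out = flatten_rows(list(reader), header.index(key_col))
    with open(dst, "w", newline="") as f:
        writer = csv.writer(f)
        width = len(out[0]) - 1 if out else 0
        writer.writerow([key_col] + [f"Value{i}" for i in range(1, width + 1)])
        writer.writerows(out)
```

Once it works for one file, re-running it on the next file is the "not reinventing the wheel" part: no formulas to rebuild.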

If you've actually read ALL THE ABOVE, you're a saint and I truly am grateful for your time and effort!

Joe
 
Access has (or had) a normalization function.
It will look through the database and identify areas where a key is duplicated.

If you have the raw file in a CSV-type format, email it to me and I will try to look at it over the next week with respect to normalization.
 
Forgot the address. I added another query that gets me all the records for dupe 1. I then joined it to the crosstab query to add the address. See screenshots below.

POIAKgl.jpg

JiInOfS.jpg

EanoRuI.jpg
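For readers on SQL Server rather than Access: there is no crosstab query there, but a window function plus conditional aggregation gives the same one-row-per-company result, without marking dupes by hand first. A sketch using Python's bundled sqlite3 module (table and column names are made up; assumes SQLite 3.25+ for window function support):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE purchases (company TEXT, value TEXT)")
con.executemany("INSERT INTO purchases VALUES (?, ?)", [
    ("company123", "2021-03-01"),
    ("companyabc", "2021-04-15"),
    ("company123", "2021-06-20"),
])

# ROW_NUMBER() plays the role of the manually marked Dupe #; the
# MAX(CASE ...) columns pull each numbered value sideways into its
# own column, like the crosstab query does.
rows = con.execute("""
    WITH numbered AS (
        SELECT company, value,
               ROW_NUMBER() OVER (PARTITION BY company ORDER BY value) AS dupe
        FROM purchases
    )
    SELECT company,
           MAX(CASE WHEN dupe = 1 THEN value END) AS value1,
           MAX(CASE WHEN dupe = 2 THEN value END) AS value2
    FROM numbered
    GROUP BY company
    ORDER BY company
""").fetchall()
print(rows)
# → [('company123', '2021-03-01', '2021-06-20'),
#    ('companyabc', '2021-04-15', None)]
```

One caveat: the `value1`, `value2`, ... column list has to be written out to the maximum dupe count, so you still need to know (or over-estimate) that number.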
 
And then he can sort by the Dupe number and he should be done.

I will try this on his data this weekend.
 
WOW. I didn't have time to check the forums over the weekend! Thanks guys.

So KLin, would there be a way to do this without even marking dupes ahead of time?

Thanks again guys.
 
You can normalize by any field: make a separate table for that field, then a field in the overall record that links to that table's index.

In your example above, you have company123 and companyabc.
You would create a table called Company with fields for a unique index, name of company, location, contact number, etc.

Then in the overall data record, instead of the company as text, you have a field that contains the index of the company in the Company table.

For users, create a table of users, then a new table that contains user index and company index; this becomes your cross-reference between multiple users for a company and multiple companies for a user.

This would be the base for the data; getting it displayed as shown in the last spreadsheet becomes a pain.
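The schema sketched above (a Company table, a User table, and a cross-reference table linking them) might look like this in SQL. All table and column names are my own illustration, shown here via Python's bundled sqlite3:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE company (
        company_id INTEGER PRIMARY KEY,  -- the unique index
        name       TEXT UNIQUE NOT NULL,
        location   TEXT,
        contact    TEXT
    );
    CREATE TABLE user (
        user_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    -- Cross-reference table: allows many users per company AND
    -- many companies per user (a many-to-many relationship).
    CREATE TABLE user_company (
        user_id    INTEGER REFERENCES user(user_id),
        company_id INTEGER REFERENCES company(company_id),
        PRIMARY KEY (user_id, company_id)
    );
""")
```

With this in place, "all values for an entity" is a join against the cross-reference table rather than a set of extra columns, which is why redisplaying it as one wide spreadsheet row takes extra work (a crosstab or pivot).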
 
Can you post a screenshot of what the data looks like with duplicates?
 