:( Some people use SQL Server like "Giant excel spread sheets"

Cogman · Jul 29, 2010

Wow, this is really FRUSTRATING. I have the lovely task of setting up a new database system. The problem is, it must
A. Take data from the old system and
B. Maintain the old system.

The problem? The old system was set up by a retarded monkey (I swear). It is a RELATIONAL database, and they treated it as unrelational as possible. Most tables have no private key. Data is duplicated in several places. And whats worse, it looks like some tables are just plain ignored.

Here's an example. There is one table named "Companies" It as a short 3 letter abbreviation and the companies name. the state, the manager, ect. Well, instead of giving that table a private key and referencing that private key in other tables, The other tables will just put in the 3 letter abbreviation. Now, that wouldn't be so bad, if they kept a consistent naming scheme! Seriously, they did different abbreviations all over the place that have nothing to do with the company table. Effectively making the company table worthless. One company was name "Cordova" with a 3 letter abbr. of "CRD" but instead of using that abbr. the table has "CORDOVA".

My goodness, how can people be this dense. In that example above, I've seen at least 3 different tables that reference the same company in 3 different ways...

Cogman · Jul 29, 2010

Here's a better example of the retardedness in action. There is one table with two columns Employees and Who. The ONLY difference I can tell between those two columns is that one column contains the employees username in caps, and the other is lowercase.... I'm not joking.

And a quick query confirms....

GeekDrew · Jul 29, 2010

Why does this surprise you? I see this more often than not.

Cogman · Jul 29, 2010

GeekDrew said:
Why does this surprise you? I see this more often than not.

Oh but it makes things hard to work with. From just about any stand point.

The worst is columns like
Mo Tu We Th Fr Sa Su

Which serve, what purpose? Do people REALLY care about how many hours you worked in all the monday's of the month? It serves no purpose.

Kirby · Jul 29, 2010

Cogman said:
Here's a better example of the retardedness in action. There is one table with two columns Employees and Who. The ONLY difference I can tell between those two columns is that one column contains the employees username in caps, and the other is lowercase.... I'm not joking.

And a quick query confirms....

Replace all the Who entries with lyrics from The Who.

Cogman · Jul 29, 2010

nkgreen said:
Replace all the Who entries with lyrics from The Who.

I would, but I seriously don't know how it is being used in older applications. For all I know, half look up who and the other half look up employees... Damn legacy software

KIAman · Jul 29, 2010

Cogman said:
Oh but it makes things hard to work with. From just about any stand point.

The worst is columns like
Mo Tu We Th Fr Sa Su

Which serve, what purpose? Do people REALLY care about how many hours you worked in all the monday's of the month? It serves no purpose.

Nobody has to justify why they want data and how they use it. I set up several hundred OLAP cubes for a BI data warehouse system and I was surprised at how many different ways people want to see data, but in the end, all you have to do is make it available and keep integrity.

If you already reached the step where you can identify many of the issues, you could proactively clean things up. If this is MSSQL, then you can do plenty of modifications without disrupting production systems.

Are there applications that rely on this schema or is the database a means for storage?

troytime · Jul 29, 2010

that sucks.

i once inherited a website and mysql db that had a column called 'yes'
it was an enum with two possibilities... 'yes' and 'no'

swear to christ.

Cogman · Jul 29, 2010

KIAman said:
Nobody has to justify why they want data and how they use it. I set up several hundred OLAP cubes for a BI data warehouse system and I was surprised at how many different ways people want to see data, but in the end, all you have to do is make it available and keep integrity.

If you already reached the step where you can identify many of the issues, you could proactively clean things up. If this is MSSQL, then you can do plenty of modifications without disrupting production systems.

Are there applications that rely on this schema or is the database a means for storage?

Unfortunately there are applications that rely on the schema. So just dumping the old setup is, sadly, out of the question. I know the reason they have the Mo-Su Columns.. It is because when entering information they are able to do it a week at a time. So instead of calculating what day each entry is, they do a straight insert into the DB.

This DB is split off from the production DB, (I'll keep it that way until I'm satisfied I won't break anything) but my thinking is to use triggers to get a new database communicating with the old one.

airdata · Jul 29, 2010

Cogman : that's why you're making the big bucks 🙂

It does make things difficult though when you need to work with or around poorly setup structure.

Cogman · Jul 29, 2010

airdata said:
Cogman : that's why you're making the big bucks 🙂

It does make things difficult though when you need to work with or around poorly setup structure.

🙁 I'm not making big bucks 🙁

brandonb · Jul 29, 2010

I've had people at my company complain that something was "too normalized" because they had to use 4-5 joins in a SQL statement to link accounts to people to whatever it was they were trying to get. I'm like 4-5 joins big whoop. The thing that irks me the most is when you have a decent system, and some (not really) hotshot developer thinks its too complicated and rewrites it to be like a giant excel spread sheet.

Cogman · Jul 29, 2010

brandonb said:
I've had people at my company complain that something was "too normalized" because they had to use 4-5 joins in a SQL statement to link accounts to people to whatever it was they were trying to get. I'm like 4-5 joins big whoop. The thing that irks me the most is when you have a decent system, and some (not really) hotshot developer thinks its too complicated and rewrites it to be like a giant excel spread sheet.

Man, that does suck. I'm a big fan of "Store data once, in one place" not "Duplicate it wherever you feel like".

It makes things so much clearer to be able to just be able to say "I need information from this table, this table, and this table" because if you ever have to change something, it is as simple as editing one table, not every table.

KIAman · Jul 29, 2010

brandonb said:
I've had people at my company complain that something was "too normalized" because they had to use 4-5 joins in a SQL statement to link accounts to people to whatever it was they were trying to get. I'm like 4-5 joins big whoop. The thing that irks me the most is when you have a decent system, and some (not really) hotshot developer thinks its too complicated and rewrites it to be like a giant excel spread sheet.

It is all situational and there are lots compromises between a highly normalized structure vs denormalized structure but being too "complicated" should never be justification, LOL.

It makes things so much clearer to be able to just be able to say "I need information from this table, this table, and this table" because if you ever have to change something, it is as simple as editing one table, not every table.

Ahh, you are thinking like a developer here. Sometimes the tradeoff is better to have complicated schema editing to satisfy a bigger requirement, like performance, for example.

Ken g6 · Jul 29, 2010

Then again, at an insurance company I worked at for awhile, the boss wanted all the tables to contain field names in one column and values in another. Basically, he wanted a NoSQL database, but it didn't exist (or at least wasn't in common use) back then.

Markbnj · Jul 29, 2010

KIAman said:
It is all situational and there are lots compromises between a highly normalized structure vs denormalized structure but being too "complicated" should never be justification, LOL.

Ahh, you are thinking like a developer here. Sometimes the tradeoff is better to have complicated schema editing to satisfy a bigger requirement, like performance, for example.

The database should always be as normalized as possible within performance constraints, imo. Have people forgotten what views were invented for? 🙂.

I'm in a similar situation this week, Cogman. I work on the back-end content management for a very large digital music archive (1 million+ tracks). We routinely absorb content into the repository from labels large and small. The largest ones send us feeds almost daily in standardized formats with xml metadata and its all handled automatically. But then we get these one-off dumps that we have to process outside the pipe.

They decided to pick up this big chunk of Indian content. The publisher sent a bunch of folders with subdirectories for content and art, and a spreadsheet for each folder to tie the metadata in with the content.

Which would be fine except: a) the spreadsheets are all in different formats: different number of columns, different column ordering, different data formatting ("244" vs. "4:04" for duration, for example, switched right in the middle of a column for no reason); b) the artist, album, and track names in the spreadsheets almost never match the filenames on disk: in one case they preserved spaces, in another subbed in dashes, or they're misspelled. It's ridiculous. We already have two full days in massaging code to suck this stupid crap up.

/sigh

Cogman · Jul 29, 2010

Markbnj said:
The database should always be as normalized as possible within performance constraints, imo. Have people forgotten what views were invented for? 🙂.

I'm in a similar situation this week, Cogman. I work on the back-end content management for a very large digital music archive (1 million+ tracks). We routinely absorb content into the repository from labels large and small. The largest ones send us feeds almost daily in standardized formats with xml metadata and its all handled automatically. But then we get these one-off dumps that we have to process outside the pipe.

They decided to pick up this big chunk of Indian content. The publisher sent a bunch of folders with subdirectories for content and art, and a spreadsheet for each folder to tie the metadata in with the content.

Which would be fine except: a) the spreadsheets are all in different formats: different number of columns, different column ordering, different data formatting ("244" vs. "4:04" for duration, for example, switched right in the middle of a column for no reason); b) the artist, album, and track names in the spreadsheets almost never match the filenames on disk: in one case they preserved spaces, in another subbed in dashes, or they're misspelled. It's ridiculous. We already have two full days in massaging code to suck this stupid crap up.

/sigh

🙂 I hear yah. I talked with one of my coworkers about this (I'm the only one working on it.) And he said "Well, just don't judge the database too harshly, it has gone through several database servers and been managed by several programmers. It has just sort of grown organically".

To which my response was "I wish I had some roundup to kill that organic growth."

I really wish I could just start from scratch. For the system I'm fixing, it would be easier. It is all the interdependencies that really kills me though. Good luck with your project Mark. Screwing around with strange data sucks.

degibson · Jul 29, 2010

Cogman and Markbnj:

Code:

SELECT *
FROM omg INNER JOIN bloody_sucks WHERE 
  omg.wtf = bloody_sucks.craptastic

I feel very sorry for you both.

lozina · Jul 29, 2010

troytime said:
that sucks.

i once inherited a website and mysql db that had a column called 'yes'
it was an enum with two possibilities... 'yes' and 'no'

swear to christ.

haha that's funny. The query must look great:

SELECT * FROM STUPID_TABLE WHERE YES = 'NO'

Evadman · Jul 30, 2010

select * from users where clue > 0
-------
(0) rows returned

tatteredpotato · Jul 30, 2010

I recently got a project that involves a C program that runs on an embedded device. What I got initially was ALL the code in one file and heavy usage of global variables which I have been trying to add structure to without breaking other things (which is easy to do when you have so many globals). The desktop app to interface with it had over 30k lines in one file too.

Ancalagon44 · Jul 30, 2010

If anyone with more than 2 years of development experience insists on the Excel style database format, they should be euthanized. If you need to denormalize your tables that much for performance reasons, you need faster hardware.

@Cogman

Starting a new schema, appealing though it may be, is a huge task that should ideally involve requirements elicitation, analysis, and design. If you arent paid that well, dont do it, unless you think it would be less work than maintaining what they have.

I remember reading an opinion piece written I think in 2007 or 2008 by a guy who claimed that classic ASP is superior to ASP.Net. I lolled.

Cogman · Jul 30, 2010

ObscureCaucasian said:
I recently got a project that involves a C program that runs on an embedded device. What I got initially was ALL the code in one file and heavy usage of global variables which I have been trying to add structure to without breaking other things (which is easy to do when you have so many globals). The desktop app to interface with it had over 30k lines in one file too.

See, I could handle that 🙂 At least changing the structure of a program doesn't really break things (if done right).

Apathetic · Jul 30, 2010

I really wish we could tell you that what you're running into is rare but, sadly, it's far too common. This is why I keep telling people "I'm not jaded, I'm experienced".

Dave

tatteredpotato · Jul 30, 2010

Cogman said:
See, I could handle that 🙂 At least changing the structure of a program doesn't really break things (if done right).

It's a pain but at least I can have a somewhat elegant solution at the end (no external interfaces even need preserved). I hate working on something knowing it's being done in a terrible way.

:( Some people use SQL Server like "Giant excel spread sheets"

Lifer

Lifer

Diamond Member

Lifer

Lifer

Lifer

Diamond Member

Golden Member

Lifer

Diamond Member

Lifer

Diamond Member

Lifer

Diamond Member

Programming Moderator, Elite Member

Elite Member <br>Moderator Emeritus

Lifer

Golden Member

Lifer

Administrator Emeritus<br>Elite Member

Diamond Member

Diamond Member

Lifer

Platinum Member

Diamond Member