Postgresql database column type question

Maximilian

Lifer
Feb 8, 2004
12,604
15
81
I have this government created excel sheet of food nutrition:

UuNXsCI.png


Most of it is decimals however some cell values are "Tr" and "N". Tr means trace amounts and N means no data. The problem is I cant put those values into a column which has a numeric type, its either a number or null. I thought up two solutions, maybe theres a better one but I dunno. Would be good to hear what you guys think:

Solution 1: Just use "character varying" type for all the currently numeric columns. Im leaning towards this because the results get turned into a JSON string anyway and then the JavaScript on the client side does what it wants with them.

Solution 2: Have a specific decimal number to represent trace amounts, something like "0.0001". This seems a bit hackey and crap tbh but it could work.

Up until now ive just replaced all "Tr" values with "N" and treated it as null but I dont think thats a good long term plan because a trace amount is still a trace amount and should be represented as such.
 

Crusty

Lifer
Sep 30, 2001
12,684
2
81
You could use two different columns per actual real column, one decimal and one boolean. If the boolean is true then you have 'trace' amounts. If the decimal is NULL then you have 'no data'.
 

Maximilian

Lifer
Feb 8, 2004
12,604
15
81
Hmm yeah that could work. Theres already 50 columns but apparently postgresql supports ~400 or so.
 

KB

Diamond Member
Nov 8, 1999
5,406
389
126
It might confuse later developers, but I would use 0 as trace and NULL as 0.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,501
4,349
75
But there are three "0" values: 0, Trace, and "No data".

Perhaps 0, NULL, and a negative number? You might see if -0 works.

Edit: On second thought, though, those columns should probably be fixed-point if possible, shouldn't they? So -0 wouldn't work.
 
Last edited:

Cogman

Lifer
Sep 19, 2000
10,283
135
106
But there are three "0" values: 0, Trace, and "No data".

Perhaps 0, NULL, and a negative number? You might see if -0 works.

Edit: On second thought, though, those columns should probably be fixed-point if possible, shouldn't they? So -0 wouldn't work.

Depends on what you are doing with them. floating point is generally pretty good for scientific data. You usually don't care about the least significant digits. (If it represents money, however, then it really should be fixed point.)

Here is what I would do with the data as presented.

I would first make a new table for the measurements (probably named something like foodNutrition or something similar), In that table I would have 4 columns. A column to link to the food table (foodId), a column linking to the nutrition type (nutritionId), a column for the value, a column for the unit type (Kj, MV, kg, whatever) like crusty recommended.

I would make the clustered index be the foodId then the nutritionId and put a constraint that those must be unique. When it is trace measurement, I would insert a row but set the amount to null. When it isn't measured, I wouldn't insert a row for that nutritional value. For everything else, I would just insert the value as it is specified.

In the Json, I would represent that using either the key is set to null or the key is undefined. I would say that undefined == not measured and null == trace.

You could also do this with crusty's approach of having an isTrace column. If you go that way I would suggest putting a constraint on it that the amount column either needs to be 0 xor null for isTrace to be set (depending on your preferences).

But either way, I look at this data and see multiple tables. Whenever you have either repeating string values or lots of sparsely populated columns, you should think about creating new tables.
 

Cogman

Lifer
Sep 19, 2000
10,283
135
106
Another approach if you are never going to do any sort of filtering/sorting/searching on those measurements, another solution is to just throw them all into a json column and do the "undefined == unknown, null == trace" thing. However, I wouldn't start off doing that.
 

Merad

Platinum Member
May 31, 2010
2,586
19
81
Instead of having a bunch of different columns for trace or no data, you could assign the data columns indicies and have two bitfield columns, one holding flags for trace and another for data available.

Another option would bebto define a minimum threshold for real data and sat that any nonzero value less than it is a trace amount. Say some thing like, N >= 1e-8 is real data; 1e-8 > N > 0 is trace; N = 0 is "real" 0; N = null is no data.