C++ : largest and smallest float

Carlis · Jun 21, 2012

I am writing some software to compute partition functions from Monte Carlo simulations. The issue is that I have to make sure I don't obtain nonsense due to the limited range of a float.

I am worried about the following piece of code:

----------------------
// one term of the sum over k
arg = (BETA[k]-BETA[j])*Eis + Z[k]-Z[j];
den += exp(arg);
----------------------
Now, arg can be rather large. So how can I find out what is the largest/smallest argument that exp can operate on? Also, I suppose that I can declare 'den' as a long double, but that will not help, since exp returns only a double, right?

I am doing this with gcc (g++) on a 64-bit Intel Mac.

Best
Carlis

Schmide · Jun 21, 2012

They should all be declared in <values.h>, <limits.h>, or <float.h>:

MAXFLOAT, MAXDOUBLE (<values.h>)
FLT_MAX, DBL_MAX (<float.h>)
etc.

You'll probably run into computational pitfalls before you overflow the numbers.

Cogman · Jun 21, 2012

I agree with Schmide; floating-point rounding errors are likely going to bite you pretty quickly.

If you are looking for something that has high precision and is fairly quick, I recommend using GMP (the GNU Multiple Precision Arithmetic Library). It supports arbitrary-precision floating point while being pretty darn nippy.

iCyborg · Jun 21, 2012

I did some work involving MC, and I normally operated with probabilities. The standard way of treating them mathematically is not to look at them directly but to operate with log probabilities. This has several benefits: if the variables are IID you usually end up with products of exp(x_i), so besides dealing with a smaller range of values (and thus avoiding overflows and other numerical issues), the identity log(a·b) = log(a) + log(b) replaces the product with a sum of logs, which improves performance too.

Now, I don't know the problem domain well enough from your description, and it's a bit unusual to see exps being summed, so the above may well not apply. Still, it may be worth checking whether it makes sense to translate quantities into their logs. If not, you'll have lots of problems if arg can vary a lot. You'll have to be careful not just about overflows but also about how you sum the terms, because adding a small double to a much larger one loses most or all of the small one's contribution, and with exps even reordering the operations may not help much. In that case, something like the above-mentioned GMP might be worthwhile.
