float on 64-bit machines: denormal numbers become zeroes

Joined: Thu May 10, 2007 2:37 am
Posts: 3
Location: LPNHE Paris
Hi,
I am a happy user of Yorick, and use it for solving massive inverse problems in astronomy. We moved to 64-bit machines not long ago, and while investigating a crash I found a change in behavior between yorick compiled on 32-bit machines and yorick compiled on 64-bit machines.

On a 32 bit machine:
> f = float(1.17549e-38)
> f
1.17549e-38
> float(f/2)
5.87745e-39

On a 64 bit machine:
> f = float(1.17549e-38)
> f
1.17549e-38
> float(f/2)
0

I checked, and float(f/2) == 0 returns true on my 64-bit machines.
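One way to see what is happening, outside yorick, is to look at the IEEE-754 single-precision bit fields. The sketch below is plain Python using only the standard `struct` module; the helper names `to_float32`, `float32_fields` and `classify` are made up for this illustration. It also shows a detail worth knowing: 1.17549e-38 is slightly below FLT_MIN = 2^-126 ≈ 1.17549435e-38, so `f` is itself already a subnormal float32, not only `f/2`.

```python
import struct

FLT_MIN = 2.0 ** -126  # smallest positive *normal* float32, about 1.17549435e-38


def to_float32(x):
    """Round a Python float (an IEEE double) to the nearest float32 and return it."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float32_fields(x):
    """Return the (sign, biased exponent, fraction) bit fields of x stored as float32."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def classify(x):
    """Exponent field 0 means zero (fraction 0) or subnormal (fraction != 0)."""
    _, exponent, fraction = float32_fields(x)
    if exponent == 0:
        return "zero" if fraction == 0 else "subnormal"
    return "inf/nan" if exponent == 0xFF else "normal"


f = to_float32(1.17549e-38)
half = to_float32(f / 2)
for name, value in (("f", f), ("f/2", half)):
    print(f"{name:>3} = {value:.6g}  exponent field = {float32_fields(value)[1]}  ({classify(value)})")
```

In an ordinary Python process (no flush-to-zero), both lines report an exponent field of 0, i.e. subnormal; a machine running with FTZ would store 0 instead.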

This bug ("feature"? ;) ) happens on Linux SL4 and on Intel Macs running OS X.

I am not an expert in numerical precision, denormal/subnormal numbers, etc., so I could not tell whether this is really a bug or one of IEEE's evil features.

Nevertheless, this seemed to me to have enough nuisance potential to post it as a bug report.

And of course, I am also very interested in how I could get the old behavior back.

Thanks a lot
Have a nice day
Seb "Ze Frog"

_________________
--------------------------------------------------------------------
perl -e '$a="a";for($i=1;$i<=10654908;$i++){$a++;if($i==252){$b=$a;}
if($i==(6454)){$c=$a;}if($i==(12975)){$d=$a;}if($i==(344426)){$e=$a;


Fri May 15, 2009 5:47 am
Yorick Guru

Joined: Sat Jan 22, 2005 2:44 pm
Posts: 86
Location: Pasadena, CA
As expected, different 64-bit systems produce different results, as they are free to do in this case. I checked:
(1) linux x86_64
Code:
     
f = float(1.17549e-38);f;
0

(2) linux ia64
Code:

f = float(1.17549e-38);f/2;
5.87745e-39


Sun May 17, 2009 10:28 pm

Joined: Thu May 10, 2007 2:37 am
Posts: 3
Location: LPNHE Paris
Hi,
If it is due to the system, shouldn't all programs behave the same?

If I do the same test (on the same architecture) in Python, declaring numbers as float32:

> numpy.float32(1.7e-38)
1.69999994563e-38
> numpy.float32(1.7e-38/2)
8.5000004288e-39
> numpy.float32(1.7e-38/20)
8.4999962249e-40
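Note that in these Python lines the division (e.g. 1.7e-38/2) is done in double precision, and only the result is rounded to float32 by `numpy.float32`, so this is not quite the same test as dividing a float in yorick. The output also shows the price of gradual underflow: subnormals keep fewer significant bits. A minimal standard-library sketch (the helper `float32_significant_bits` is hypothetical, written only for this illustration):

```python
import struct


def to_float32(x):
    """Round a Python float (an IEEE double) to the nearest float32."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float32_significant_bits(x):
    """Significant bits float32 keeps for x: 24 for normals, fewer for subnormals."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    exponent, fraction = (bits >> 23) & 0xFF, bits & 0x7FFFFF
    if exponent != 0:
        return 24                  # implicit leading 1 plus 23 stored fraction bits
    return fraction.bit_length()   # subnormal: leading zero bits are lost precision


for divisor in (1, 2, 20, 20000):
    exact = 1.7e-38 / divisor      # computed in double, as in the numpy lines above
    stored = to_float32(exact)
    rel_err = abs(stored - exact) / exact
    print(f"1.7e-38/{divisor:<6} -> {stored:.10g}  bits={float32_significant_bits(stored):2d}  rel.err={rel_err:.1e}")
```

The relative rounding error grows as the value sinks deeper into the subnormal range, which is why results that depend on denormals can be much less accurate than they look.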

Does it mean that denormal numbers are such a gray zone that each architecture and program can treat them however it decides?

And more to the point, is this variable behavior in yorick a feature or a bug?



Tue May 19, 2009 2:32 am
Yorick Guru

Joined: Sat Jan 22, 2005 2:44 pm
Posts: 86
Location: Pasadena, CA
Seb, I am no expert in this (in fact you probably know more than me), but the language reference (C) probably does not specify what should be done in this case. So a prudent approach is: don't do it. There should be plenty of dynamic range for a "normalized" data analysis procedure (if not, use doubles).
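To illustrate the "use doubles" advice: the value in question, about 5.88e-39, is far above the smallest normal double (about 2.2e-308), so in double precision it is an ordinary normal number and FTZ never touches it. A short Python check, relying on Python floats being IEEE doubles (true on all common platforms); `double_exponent_field` is a made-up helper name:

```python
import struct
import sys

DBL_MIN = sys.float_info.min   # smallest positive normal double, about 2.2250738585072014e-308
FLT_MIN = 2.0 ** -126          # smallest positive normal float32, about 1.17549435e-38


def double_exponent_field(x):
    """Biased 11-bit exponent field of x as an IEEE double (0 means zero or subnormal)."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return (bits >> 52) & 0x7FF


x = 1.17549e-38 / 2            # the value that underflowed in single precision
print(f"x = {x:.6g}")
print("below FLT_MIN (subnormal if stored as float32):", x < FLT_MIN)
print("normal as a double:", x >= DBL_MIN and double_exponent_field(x) != 0)
```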


Tue May 19, 2009 9:35 am
Yorick Master

Joined: Mon Nov 22, 2004 9:43 am
Posts: 354
Location: Livermore, CA, USA
This subject is way too large to describe completely here. It stems from the very flawed IEEE-754 floating point standard, so if you google IEEE 754 you will find an infinite literature on the subject. Yorick is extremely unusual in that I strive to make it stop on floating point errors, and one flaw is that there is no standard way to do this. The level of confusion on the issue is so great that hardware manufacturers and kernel authors are constantly changing the details of how IEEE 754 issues are handled.

In addition to SIGFPE delivery, denormal arithmetic -- essentially invented and defined in the IEEE 754 standard -- is the most serious obstacle to high performance computing. Denormal numbers, like 1.e-39f in the example, have a completely different binary format, which is not handled by any modern floating point hardware. Therefore, all modern platforms interrupt and handle every single denormal result in software in the kernel. This is roughly 1000 times slower than the floating point unit, when you take into account that the interrupt wipes out the vector pipeline, forcing it to be regenerated. If you do any floating point calculation in which more than one part in 1000 of the results are denormals, you will lose more than a factor of two in compute speed on all modern processors. (Many FFT and linear algebra operations routinely approach this rate of denormal usage, even in problems that have no small input numbers.)
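For reference, here is what the "different binary format" amounts to for single precision. A normal float32 with biased exponent field e (1 to 254) and 23-bit fraction m encodes (-1)^s × (1 + m/2^23) × 2^(e-127); when e = 0 the implicit leading 1 disappears and the value is (-1)^s × (m/2^23) × 2^-126. The Python sketch below (the function name `decode_float32` is only illustrative) decodes a bit pattern by hand with those two rules and compares the result with `struct`:

```python
import math
import struct


def decode_float32(bits):
    """Decode a finite IEEE-754 single-precision bit pattern by hand."""
    sign = -1.0 if bits >> 31 else 1.0
    exponent = (bits >> 23) & 0xFF          # biased exponent field
    fraction = bits & 0x7FFFFF              # 23 stored fraction bits
    if exponent == 0xFF:
        raise ValueError("Inf/NaN patterns are not handled in this sketch")
    if exponent == 0:
        # Subnormal (or zero): no implicit leading 1, exponent fixed at -126.
        # value = fraction * 2**-23 * 2**-126 = fraction * 2**-149
        return sign * math.ldexp(fraction, -149)
    # Normal: implicit leading 1, significand = (2**23 + fraction) * 2**-23.
    # value = (2**23 + fraction) * 2**(exponent - 127 - 23)
    return sign * math.ldexp((1 << 23) + fraction, exponent - 150)


def reference(bits):
    """Let struct reinterpret the same 32 bits as a float32, for comparison."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# FLT_MIN (smallest normal), a subnormal, the smallest subnormal, and 1.0
for bits in (0x00800000, 0x00400000, 0x00000001, 0x3F800000):
    print(f"0x{bits:08X}: by hand {decode_float32(bits)!r}, struct {reference(bits)!r}")
```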

The correct solution to this performance problem is called flush-to-zero (FTZ), which causes a floating point unit to simply store zero instead of interrupting in order to compute a denormal. A related FPU flag is called denormals-are-zero (DNZ here; Intel's documentation calls it DAZ), which causes the fetch operation to put zero in a register instead of a denormal. For performance reasons, I always try as hard as I can to set both FTZ and DNZ on all FPUs which support them. As I noted, both hardware and kernel software are constantly evolving to bypass all these attempts.
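The difference between the two flags is where the zero appears: FTZ acts on results, DAZ on operands. The following is only a software emulation of those semantics in Python (real FPUs do this in hardware, and the helper names `ftz`, `daz` and `multiply_f32` are invented for this sketch); it rounds each product to float32 and then applies whichever flags are selected:

```python
import math
import struct


def to_float32(x):
    """Round a Python float (an IEEE double) to the nearest float32."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def is_subnormal_f32(x):
    """True if x (already a float32 value) is nonzero but below FLT_MIN = 2**-126."""
    return x != 0.0 and abs(x) < 2.0 ** -126


def ftz(result):
    """Flush-to-zero: a subnormal *result* is replaced by a zero of the same sign."""
    return math.copysign(0.0, result) if is_subnormal_f32(result) else result


def daz(operand):
    """Denormals-are-zero: a subnormal *input* is read as a zero of the same sign."""
    return math.copysign(0.0, operand) if is_subnormal_f32(operand) else operand


def multiply_f32(a, b, use_ftz=False, use_daz=False):
    """Emulate one float32 multiply with optional FTZ/DAZ behaviour."""
    if use_daz:
        a, b = daz(a), daz(b)
    # The double product of two float32 values is exact, so rounding once is correct.
    result = to_float32(a * b)
    return ftz(result) if use_ftz else result


small_normal = to_float32(2.0 ** -120)   # a normal float32
sub = to_float32(2.0 ** -130)            # a subnormal float32
print(multiply_f32(small_normal, 2.0 ** -10))                # subnormal result kept (gradual underflow)
print(multiply_f32(small_normal, 2.0 ** -10, use_ftz=True))  # result flushed to 0.0
print(multiply_f32(sub, 2.0 ** 20))                          # subnormal input used as is
print(multiply_f32(sub, 2.0 ** 20, use_daz=True))            # input read as 0.0
```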

For your specific case, if you are used to Intel x86 hardware, there is an interesting history: the original x87 FPU did not lose any performance dealing with denormals, so there is no FTZ or DNZ flag for those 32-bit x86 instructions. Partly because of this "feature", that FPU has very poor performance by modern standards. Until a couple of years ago, nevertheless, the gcc compiler typically used only the x87 instructions in x86 code, so you would typically have gotten denormals on old x86 hardware running yorick, as you report. However, gcc and most other compilers now use the superior SSE FPU even on 32-bit x86 chips. I try to set FTZ and DNZ in the SSE status and control register, so hopefully you will get FTZ behavior more often. In the 64-bit x86 instruction set, there is no FPU which does not require FTZ and DNZ in order to get full performance, so you will (I hope) always see FTZ on x86-64 hardware.
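On the SSE unit both flags live in the MXCSR control/status register: FTZ is bit 15 and DAZ is bit 6, and the reset value 0x1F80 masks all six exceptions with round-to-nearest and both flags off. In C the usual route is the intrinsics `_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON)` from `<xmmintrin.h>` and `_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON)` from `<pmmintrin.h>`; which mechanism yorick itself uses is not shown here. The bit arithmetic alone, sketched in Python (helper names are illustrative):

```python
# MXCSR layout (x86 SSE), low 16 bits.
MXCSR_EXCEPTION_MASKS = 0x1F80  # bits 7-12: invalid, denormal, zero-divide, overflow, underflow, precision
MXCSR_DAZ = 1 << 6              # denormals-are-zero (0x0040)
MXCSR_FTZ = 1 << 15             # flush-to-zero (0x8000)
MXCSR_DEFAULT = MXCSR_EXCEPTION_MASKS  # reset value: all masked, round-to-nearest, FTZ/DAZ off


def with_ftz_daz(mxcsr):
    """Return an MXCSR value with both FTZ and DAZ turned on, other bits unchanged."""
    return mxcsr | MXCSR_FTZ | MXCSR_DAZ


def describe(mxcsr):
    """Human-readable summary of the two denormal-related flags."""
    ftz = "on" if mxcsr & MXCSR_FTZ else "off"
    daz = "on" if mxcsr & MXCSR_DAZ else "off"
    return f"0x{mxcsr:04X}: FTZ={ftz}, DAZ={daz}"


print(describe(MXCSR_DEFAULT))                # 0x1F80: FTZ=off, DAZ=off
print(describe(with_ftz_daz(MXCSR_DEFAULT)))  # 0x9FC0: FTZ=on, DAZ=on
```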

The PowerPC, in the Power 5 and later versions, has again flip-flopped, and I am no longer able to get FTZ and DNZ behavior. In fact, no one at LLNL, nor anyone we have been able to contact at IBM, knows how to turn on FTZ and DNZ on those chips. Some of our biggest machines (Blue Gene/L for example) use this hardware, and the high performance computing people at LLNL have recently figured out that it is costing us a 10-20% slowdown in many of our important simulation codes. If anyone knows how to fix this, please let me know.

In any event, your preference for denormals is wrong. I urge you to embrace FTZ. This is an important feature of yorick, and I consider it a bug NOT to have FTZ on any hardware (such as x86-64 or PPC) where running without FTZ cripples the floating point performance.


Tue May 19, 2009 8:03 pm

Joined: Thu May 10, 2007 2:37 am
Posts: 3
Location: LPNHE Paris
Hi,
thanks a lot for your answers. It seems that I was right, at least, not to be sure whether this new behavior was a bug ;)

To summarize: the fact that yorick did not flush to zero before was the bug, and the new architecture we now have access to gives the correct behavior.

I don't have a preference for denormals; they were just handled so transparently that I was not aware I was using them until it blew up in my face.

All this has been very positive:
1) we switched to float64
2) I learned about denormals and FTZ, and began to spread the Truth around me
3) our new architecture behaves correctly

As far as the forum is concerned, maybe this thread belongs in Discussion and Support instead of Bug Report.

Have a nice day
Seb



Wed May 20, 2009 4:45 am
Yorick Master

Joined: Mon Nov 22, 2004 9:43 am
Posts: 354
Location: Livermore, CA, USA
Good summary.

I forgot to mention that because yorick (or any other programming environment) cannot be consistent in its treatment of denormals, any section of code which depends on the presence or absence of denormals will not be portable. Hence, you should regard any such dependence as a bug in your program.
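In practice this means a test like `float(f/2) == 0` from the original report gives machine-dependent answers. One portable alternative, sketched in Python under the assumption that anything below FLT_MIN may be treated as "zero" for your purposes (the threshold and the helper name `is_effectively_zero` are illustrative choices, not a yorick feature), is to compare against the smallest normal number instead of against zero:

```python
import struct

FLT_MIN = 2.0 ** -126   # smallest positive normal float32


def to_float32(x):
    """Round a Python float (an IEEE double) to the nearest float32."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def is_effectively_zero(x, tiny=FLT_MIN):
    """Treat zero and subnormals alike, so the answer does not depend on FTZ/DAZ."""
    return abs(x) < tiny


f = to_float32(1.17549e-38)
with_denormals = to_float32(f / 2)   # what a machine with gradual underflow stores
flushed = 0.0                        # what a machine with FTZ stores

print(with_denormals == 0, flushed == 0)                                  # the two machines disagree
print(is_effectively_zero(with_denormals), is_effectively_zero(flushed))  # the two machines agree
```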

Another gotcha is that the behavior of libm functions (like exp, cos, log, etc.) may, or may not, respect the FPU control register settings of the caller (yorick in this case). Hence, even when I have turned on SIGFPE trapping, on some platforms these functions will return Inf or NaN. Yorick tries to catch this annoying behavior in simple places, but it does fail on some platforms. Less seriously, but still annoying, even if I do succeed in turning on FTZ, on some platforms the libm functions will continue delivering denormal results. That is one reason DAZ is good to turn on for platforms which support it.
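As an example of the libm case: exp(-88) is about 6.05e-39, below FLT_MIN, so a single-precision exp that ignores the caller's FTZ setting could hand back a subnormal. FTZ does nothing about a value that arrives this way, but DAZ makes the next arithmetic operation read it as zero. A Python sketch of that sequence (emulated; `daz` is an invented helper, and Python's own `math.exp` works in double precision and is used here only to produce the value):

```python
import math
import struct

FLT_MIN = 2.0 ** -126


def to_float32(x):
    """Round a Python float (an IEEE double) to the nearest float32."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def daz(x):
    """Emulated denormals-are-zero: subnormal operands are read as signed zero."""
    return math.copysign(0.0, x) if x != 0.0 and abs(x) < FLT_MIN else x


from_libm = to_float32(math.exp(-88.0))   # subnormal float32, as if "returned by libm"
print(f"{from_libm:.4g} subnormal: {0 < from_libm < FLT_MIN}")
print("next multiply, no DAZ:", to_float32(from_libm * 2.0))
print("next multiply, DAZ:   ", to_float32(daz(from_libm) * 2.0))
```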

Again, on the most recent PPC Macs, based on Power 5 or 6 chips, yorick fails to get FTZ set, and you will get (very slowly) denormals there. If anyone knows how to fix this, please tell me.


Thu May 21, 2009 7:12 pm