Files too big for i86_primitives 
Yorick Master

Joined: Wed Jun 01, 2005 11:34 am
Posts: 112
Post Files too big for i86_primitives
Consider this interactive session, run on a 64-bit platform:
Code:
> sizeof(long)
8
> test = indgen(540000000)
> name = "test"
> f = createb("test.pbd", i86_primitives)
> save, f, test
> save, f, name
> close, f
> f = openb("test.pbd")
> show, f
2 non-record variables
     name     test
> f.name
(nil)
> f.test(0)
540000000
> close, f
> f = createb("test.pbd", i86_primitives)
> save, f, name
> save, f, test
> close, f
> f = openb("test.pbd")
> show, f
2 non-record variables
     name     test
> f.name
"test"
> f.test(0)
540000000
>


i86_primitives is targeted at 32-bit systems and has sizeof(long) == 4. An array of 540,000,000 32-bit longs occupies 2,160,000,000 bytes, which is more than 2^31 bytes. When I save the array first, saving another variable afterwards apparently doesn't work, presumably because that variable lands past the largest offset a 32-bit file should be able to address. Yet the array itself should also be too big for the file, and it loads back in just fine.
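
For reference, the size arithmetic behind that claim can be checked with a few lines. This is plain Python rather than Yorick, used only as a calculator; the constant names are made up for illustration.

```python
# Size arithmetic for the session above (plain Python, not Yorick).
N_ELEMENTS = 540_000_000      # length of indgen(540000000)
BYTES_PER_LONG = 4            # sizeof(long) under i86_primitives
LIMIT = 2**31                 # first offset a signed 32-bit integer cannot hold

array_bytes = N_ELEMENTS * BYTES_PER_LONG
print(array_bytes)            # 2160000000
print(array_bytes > LIMIT)    # True
print(array_bytes - LIMIT)    # 12516352 bytes past the signed 32-bit range
```

So anything written after the array starts at an offset that no longer fits in a signed 4-byte integer.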

I no longer have a 32-bit system easily available to test on, but I wish I did because I'd be interested in seeing how it'd handle the files. I'm pretty sure I couldn't create that array on a 32-bit Yorick.

I would expect an error from any of the above. I shouldn't be able to write that array out to a 32-bit file at all. But since I can and since it can be loaded back in properly, I don't understand why subsequent variables do not also work properly. Is there a bug here or is this working as intended?


Tue Mar 20, 2012 1:29 pm
Yorick Master

Joined: Mon Nov 22, 2004 9:43 am
Posts: 354
Location: Livermore, CA, USA
Post Re: Files too big for i86_primitives
The i86_primitives function only affects the binary format of the individual array elements you have stored in the file. It has nothing to do with individual array dimensions, or with the total number of bytes in the file. If you took that file to a 32-bit machine (or even built a 32-bit version of yorick with -m32 on your 64-bit machine), you would find that you couldn't read back your array, or seek to data beyond 2GB -- in fact, since the symbol table is probably beyond 2GB, openb would be unable to seek there and would fail to open the file in the first place. However, on a 64-bit machine, everything should work exactly as it did. If you'd made the array values larger than 2G, you'd see that the array was the right shape, but that the values had been truncated, as you requested by writing with i86_primitives.
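
A rough illustration of that value truncation, written in Python rather than Yorick. It assumes that truncation means keeping the low-order 4 bytes of a little-endian 8-byte integer; that is an assumption about the conversion, not something confirmed by the yorick source, and `truncate_to_i32` is a made-up helper name.

```python
import struct

def truncate_to_i32(value):
    """Keep the low 4 bytes of a little-endian 64-bit integer, read as signed 32-bit.

    Assumption: this models what writing an 8-byte long as a 4-byte long does.
    """
    low4 = struct.pack("<q", value)[:4]
    return struct.unpack("<i", low4)[0]

print(truncate_to_i32(540_000_000))      # 540000000, fits, so it is unchanged
print(truncate_to_i32(3_000_000_000))    # -1294967296, does not fit
```

The values in the original session top out at 540,000,000, which is why they all read back correctly.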

I'll try to look into the bug you found with data written after the big array. It should have worked. Does it work if you don't specify i86_primitives? (I'd almost expect it not to, since I don't think the i86_primitives should have any effect at all...)


Fri Mar 23, 2012 10:07 pm
Yorick Master

Joined: Mon Nov 22, 2004 9:43 am
Posts: 354
Location: Livermore, CA, USA
Post Re: Files too big for i86_primitives
Okay, I can confirm this "feature". It's not a bug (I declare), but rather a limitation of the indirect string and pointer data types. I should also point out that the PDB data file format is not at fault -- I had to extend it in order to handle yorick pointers and strings.

The basic problem is that an array of strings (hence a scalar string, which I never implement as a special case in yorick) is an array of pointers to objects of indeterminate size. How do you store such a thing in a file? The simplest answer is that you store an array of byte addresses in the file, with the data itself stored at the address pointed to in the address array. What is the data type of the address array? The biggest integer supported by the hardware, which is a long. (All modern 32-bit machines support the even longer long long, but that was not true when I wrote this code over 20 years ago... The new implementation of the binary file package, which I've been promising for years, will at least partially fix this.)

When you tell yorick to use i86_primitives, it writes the byte addresses as 32-bit numbers, which can be an even bigger disaster than what happened to you. Your address was negative when truncated, hence recognized as illegal. It could just as well have been positive, which could have crashed yorick, for the following reason. For a pointee, the address (the pointer) is the byte address of a header that describes the pointee array, followed by the array itself. For a string, the address (the pointer) points to the string length stored as a long, followed by the text. Thus, if you point to a random byte in the middle of the file, you could hit 4 bytes that, when interpreted as a string length, specify a 2 billion byte string, which yorick would happily read...
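
The two outcomes described above (a truncated address that comes out negative and is rejected, versus one that comes out positive and is followed blindly) can be mimicked with a toy model. This is a Python sketch, not yorick's actual reader: `to_i32`, `read_string`, and the in-memory `buf` are hypothetical, and the record layout is only the "4-byte length followed by text" description from this post.

```python
import struct

def to_i32(address):
    """Truncate an address to 32 bits and reinterpret it as a signed integer."""
    low = address & 0xFFFFFFFF
    return low - 2**32 if low >= 2**31 else low

def read_string(buf, address):
    """Toy reader: a 4-byte length at `address`, followed by the text."""
    if address < 0 or address + 4 > len(buf):
        return None                       # illegal address, nothing is read
    (length,) = struct.unpack_from("<i", buf, address)
    return buf[address + 4 : address + 4 + length].decode("ascii", "replace")

# Toy "file": 8 filler bytes, then an unrelated string record at offset 8.
buf = b"\x00" * 8 + struct.pack("<i", 5) + b"other"

# An address just past the big array truncates to a negative number: rejected.
print(to_i32(2_160_000_000))                      # -2134967296
print(read_string(buf, to_i32(2_160_000_000)))    # None

# An address past 2^32 wraps to a small positive number: wrong data is read.
print(to_i32(2**32 + 8))                          # 8
print(read_string(buf, to_i32(2**32 + 8)))        # other
```

In the second case nothing signals an error; the reader simply trusts whatever bytes sit at the wrapped address.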

Thus, although you didn't notice it, there is a second "failure" mode, which is that if you write a string with strlen greater than 2GB into a file created with 4-byte longs, yorick will also fail to be able to read that back, even on the 64-bit machine where you wrote it.

Thus, the real problem here has to do with the string (or pointer) data type in files bigger than 2GB when you have specified 4-byte longs. Any non-indirect data types will work fine. If you want to make your files more standard -- that is, easier to interpret if you had to go in and read the symbol table (it's text at the end of the pdb file) and recover your data by hand -- you should never write either string or pointer types. For strings, you should write them as arrays of char, which you can now do pretty easily with the strchar function -- something like "save,f,var=strchar(var)" to write and "var=strchar(f.var)" to read, assuming you don't care about multidimensional string arrays or the distinction between empty strings and nil strings. Writing strings and pointers directly into a file is the yorick equivalent of using impenetrable .doc or .xls format for your data -- you are restricting yourself to yorick only when you create such a file. (Of course, people outside of LLNL don't usually have tools that can interpret the pdb symbol table other than yorick either -- nevertheless, if you write a short pdb file and open it in emacs, you'll see that your non-string, non-pointer data is extremely easy to dig out by hand.)
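
To show why a flat char array sidesteps the address problem, here is a small Python sketch of the general idea: each string is stored inline with a trailing NUL byte, so no byte addresses are written at all. It is not a reimplementation of strchar, whose exact behavior should be checked in the yorick documentation; `strings_to_chars` and `chars_to_strings` are hypothetical names.

```python
def strings_to_chars(strings):
    """Flatten a 1-D list of strings into one NUL-terminated byte sequence."""
    return b"".join(s.encode("ascii") + b"\x00" for s in strings)

def chars_to_strings(chars):
    """Inverse of strings_to_chars: split on the NUL terminators."""
    parts = chars.split(b"\x00")
    return [p.decode("ascii") for p in parts[:-1]]   # drop the piece after the last NUL

flat = strings_to_chars(["test", "run1"])
print(flat)                      # b'test\x00run1\x00'
print(chars_to_strings(flat))    # ['test', 'run1']
```

As with the strchar approach described above, this flat form keeps only a 1-D list and cannot tell an empty string from a missing one.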

Incidentally, it is a perfectly reasonable programming technique to define a set of file primitives with 4-byte double and 4-byte long in order to save disk space, even when the file will only be read or written on 64-bit hardware. So I'm not eager to insert error detection that would prevent people from doing that.


Sat Mar 24, 2012 10:42 am
Yorick Master

Joined: Wed Jun 01, 2005 11:34 am
Posts: 112
Post Re: Files too big for i86_primitives
munro wrote:
Does it work if you don't specify i86_primitives? (I'd almost expect it not to, since I don't think the i86_primitives should have any effect at all...)


Yes, this works for me if I do not specify i86_primitives.

Thank you for looking into this and providing an explanation. I have to use the string type for backwards compatibility, but it seems like as long as I save it to file first it should work safely (and these strings are always small so the 2GB string issue wouldn't be encountered). But it's good to know to try to avoid strings in PDBs in the future.

I am still uncomfortable with the fact that it will silently allow for the writing of data that then cannot be read back in. I can see why you don't want to prohibit 2GB+ file sizes (and agree with the reasoning). But if an attempt is made to write something that requires an address larger than the pointer type supported, shouldn't that throw an error? Why shouldn't an illegal/invalid pointer throw an error? I would much rather discover the problem at write-time so that I do not risk losing data.
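
To make the suggestion concrete, the kind of write-time check I have in mind would look roughly like this. It is a Python sketch of the idea only, not a patch against yorick; `check_address_fits` and its parameters are hypothetical.

```python
def check_address_fits(address, pointer_bytes):
    """Return `address` if it fits in a signed integer of `pointer_bytes` bytes.

    Raises OverflowError otherwise, so the failure shows up when writing
    instead of when reading the file back.
    """
    limit = 2 ** (8 * pointer_bytes - 1)
    if not 0 <= address < limit:
        raise OverflowError(
            "address %d does not fit in a %d-byte pointer" % (address, pointer_bytes)
        )
    return address

print(check_address_fits(2_160_000_000, 8))   # 2160000000, fine with 8-byte longs
try:
    check_address_fits(2_160_000_000, 4)      # too large for 4-byte longs
except OverflowError as err:
    print("write refused:", err)
```

A check like this would only apply to the indirect types (strings and pointers), so it would not get in the way of writing large files with 4-byte longs.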


Mon Mar 26, 2012 12:07 pm