end of line and stream buffer overflow 
Author Message
Yorick Guru

Joined: Wed Nov 24, 2004 12:51 pm
Posts: 97
Location: Observatoire de Lyon (France)
Post end of line and stream buffer overflow
There are two annoying bugs in the Yorick i/o system.

First, DOS-style end-of-line (CR-LF) in Yorick code (*.i files) yields errors in the parser (include or require functions). I have not tested this with the (old?) Mac OS convention (CR). This is annoying for portability. The functions in 'i/textload.i' solve this problem when reading a text file, but I think it would be nice to have the parser do this automatically for Yorick scripts.

Second, there is a buffer overflow (or limit), for instance when printing the contents of a large array:
Code:
> x = indgen(1000*1000);
> x
which stops (on my machine) at the 66536-th element (that's 393112 printed bytes). You have the same behaviour with shell redirection:
Code:
echo 'indgen(1000*1000);' | yorick > /tmp/out
You can argue that this is a feature because you do not want to print such a large array on the standard output. But this is really a problem when interfacing Yorick with other programs, for instance via the spawn function. To overcome the problem we had to interleave the sending of data to the spawned process with calls to the pause function. However, I consider this a dirty trick, since the parameters (how much data can be sent at a time, how long to wait for the buffers to flush, etc.) are probably system, machine, and load dependent. Another way to overcome the problem could be to restrict the i/o with the spawned process to small lines of text and to write large data to be sent/received to a temporary file. But again, I think it is a real drawback to have such limitations and not to be sure that what you send on one side of the "pipe" will be received on the other side.

To fix the problem, the read/write functions should be in blocking mode (or emulate this mode). Unfortunately I did not have time to figure out where to fix the code. I am not even sure whether all Yorick i/o (such as in the YpParse function) uses the Play interface to (virtual) files.
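To make "emulate blocking mode" concrete, here is a minimal POSIX-only sketch. It is not taken from Yorick or Play: write_all is a hypothetical helper name, and whether partial writes are really what loses data in spawn has not been established in this thread. The idea is simply to retry after a partial write and to wait with poll() when a non-blocking descriptor reports that it is full.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not part of yorick or play): write all len bytes of
 * buf to fd, even if fd is in non-blocking mode.
 * Returns 0 on success, -1 on error (errno is set by the failing call). */
int write_all(int fd, const char *buf, size_t len)
{
  while (len > 0) {
    ssize_t n = write(fd, buf, len);
    if (n > 0) {
      /* partial or complete write: advance past what was accepted */
      buf += n;
      len -= (size_t)n;
    } else if (n < 0 && errno == EINTR) {
      continue;                       /* interrupted by a signal: retry */
    } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
      /* descriptor is full: sleep until it becomes writable again */
      struct pollfd p;
      p.fd = fd;
      p.events = POLLOUT;
      p.revents = 0;
      if (poll(&p, 1, -1) < 0 && errno != EINTR) return -1;
    } else if (n < 0) {
      return -1;                      /* genuine error */
    }
    /* n == 0: nothing was written, simply try again */
  }
  return 0;
}
```

With such a loop the sender would no longer need to guess how much data can be sent at a time or how long to pause; the kernel tells it when the other side has caught up.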


Wed Dec 09, 2009 1:26 am
Yorick Guru

Joined: Wed Nov 24, 2004 12:51 pm
Posts: 97
Location: Observatoire de Lyon (France)
Post 
The second bug may have already been noticed by Thibaut Paumard: http://yorick.sourceforge.net/phpBB2/viewtopic.php?t=232


Wed Dec 09, 2009 1:31 am
Yorick Master

Joined: Mon Nov 22, 2004 9:43 am
Posts: 354
Location: Livermore, CA, USA
Post 
Both these problems have to do with the extremely obsolete UNIX termios interface. Yes, yorick uses (or is supposed to use) only the functions in play/unix/files.c for all its I/O. [Actually, I just noticed that the yorick print function seems to use puts(), rather than the play wrapper. I'll check that when I get a chance.]

In particular, all text input is via the fgets ANSI C function, which is what cannot handle CRLF. I originally went for extremely simple, extremely portable, strictly ANSI C functions for all yorick I/O. This is a huge benefit for portability and maintainability, but it has the drawback that whenever the ANSI C functions fail, there is no equally attractive solution. I created textload.i, which uses binary read functions to circumvent the problem. No doubt I should just give up on fgets, but this will cause problems because the binary input functions will interact badly with non-file input streams. But the short story is that I (or someone) has to write a robust replacement for fgets that handles DOS EOLs -- and don't forget Mac EOLs (CRLF, CR, and LF all have to be handled).
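For illustration only, a replacement along these lines could look like the sketch below. The name any_eol_gets and its return convention are made up for this example; it is not the play interface. It uses plain getc/ungetc, so it is the simple (and slow) variant, and the look-ahead after a CR will wait for the next character on an interactive stream, which is one of the ways such code can interact badly with non-file input.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not part of yorick or play): read one line from fp
 * into buf (at most size-1 characters), accepting LF, CRLF, or CR as the
 * end-of-line marker.  The marker is consumed but not stored.
 * Returns the line length, or -1 at end of file with nothing read.
 * A line longer than the buffer is returned in several pieces. */
long any_eol_gets(char *buf, size_t size, FILE *fp)
{
  size_t n = 0;
  int c;

  if (size < 2) return -1;          /* need room for one char plus '\0' */
  while ((c = getc(fp)) != EOF) {
    if (c == '\n') break;           /* UNIX end of line */
    if (c == '\r') {                /* DOS (CRLF) or old Mac (CR) */
      int next = getc(fp);          /* look ahead one character */
      if (next != '\n' && next != EOF) ungetc(next, fp);
      break;
    }
    if (n + 1 >= size) {            /* buffer full: keep c for the next call */
      ungetc(c, fp);
      break;
    }
    buf[n++] = (char)c;
  }
  buf[n] = '\0';
  if (c == EOF && n == 0) return -1;
  return (long)n;
}
```

A caller would loop until the function returns -1, exactly as it would loop until fgets returns NULL.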

The output problem I do not understand. The write function uses p_fputs(), while the print function uses (a bug, it seems) puts. There is one call per line of output, and yorick does not mess with the default stream modes, which should block whenever the destination (whether file or pipe) pauses in its consumption. This is worth some exploration with trivial C programs on the offending platforms. Does the problem happen both with yorick's write function, as well as with its print function? If it only happens for print, fixing print to use p_fputs should solve the problem. But in either case, this seems like a bug in the OS when puts() or fputs() is called too many times too quickly. Does it happen on wildly different OSes (e.g.- both Linux and MacOS)?


Wed Dec 09, 2009 9:56 pm
Yorick Guru

Joined: Sat Jan 22, 2005 2:44 pm
Posts: 86
Location: Pasadena, CA
Post 
Stops at 66536 on three unix/linux flavors here (Sun, Irix, Altix (Suse)).

Cheers

PS: Eric, thank you for the yeti upgrade.


Thu Dec 10, 2009 2:55 pm
Yorick Master

Joined: Mon Nov 22, 2004 9:43 am
Posts: 354
Location: Livermore, CA, USA
Post 
Okay, I had time to take a closer look at the output problem; the source code is in yorick/yio.c. It's an appalling mess -- one of the oldest things in yorick, and I apologize.

The problem is indeed that I am "protecting" you from yourself. There is a variable maxPrintLines which is set to 5000 lines, and the output from a single print command will never exceed this number of lines. You will find that 66536 is the last number in the 5000th line in your example. Incidentally, I was wrong about the print function using puts -- it does use the p_fputs function as it should.
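To show the mechanism, here is a toy model of a line-capped printer. It is not the code in yio.c: count_printed is a hypothetical name, and the layout rule (an opening "[", each element followed by a comma, a new line whenever the next element would not fit) is an assumption, so it is not guaranteed to reproduce the exact 66536 figure.

```c
#include <stdio.h>

/* Hypothetical model, not yorick's actual code: lay out 1,2,...,n as
 * "[1,2,3,..." in lines of at most width characters and stop once
 * max_lines lines have been filled.
 * Returns how many elements were laid out before the cap was hit. */
long count_printed(long n, int width, long max_lines)
{
  char tok[32];
  long lines = 1, j;
  int col = 1;                        /* the opening '[' */

  if (max_lines < 1) return 0;
  for (j = 1; j <= n; ++j) {
    int len = sprintf(tok, "%ld,", j);
    if (col > 0 && col + len > width) {
      /* the element does not fit: a new line is needed */
      if (lines == max_lines) return j - 1;   /* cap reached, stop here */
      ++lines;
      col = 0;
    }
    col += len;
  }
  return n;                           /* everything fit under the cap */
}
```

Calling count_printed(1000000, 80, 5000) gives the element at which this model stops; comparing it with 66536 is a quick way to check how close the assumed layout is to the real one.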

It looks like I had planned to allow you to change the value of maxPrintLines, but somehow never did so. It would go into the print_format function, and I will go ahead and modify that function to allow you to change both the single line length limit (which you can already change) and the maximum number of lines limit (which you currently cannot) with the print_format function.

However, I have to say I think it is a bad idea to set maxPrintLines any larger. The reason is that print is yorick's default function. There are a great many times when you are working with very large arrays, and typing "x" is very painful even when only 5000 lines are output. It can be difficult to hit ctrl-c fast enough to stop the deluge, and on Windows and in many UNIX terminal programs, the terminal itself can easily get hung up for a long time trying to cope with huge amounts of output. For that reason, I do want to leave the 5000 line limit as the default behavior.

Note that it is only the print function which has this limit; the write function will happily fill your universe with text. The print function is designed so that you can cut and paste its output into yorick source code, and otherwise present arrays in a semi-human-readable format. Things which are over 5000 lines cannot very easily be used in this manner. I don't quite understand why you need to do

print, indgen(1000000);

as opposed to

write, indgen(1000000);

Nevertheless, I will go ahead and modify print_format to give you the option of changing the default print line limit to your own taste.

Thanks for pointing out this issue.


Thu Dec 10, 2009 9:48 pm
Yorick Guru

Joined: Wed Nov 24, 2004 12:51 pm
Posts: 97
Location: Observatoire de Lyon (France)
Post 
For the first problem (end-of-line), I think that the fix can involve quite a lot of work (unless you use the slow getc/ungetc). To be able to read text lines with '\n', '\r\n' or '\r' end-of-line markers, I see no other solution than writing our own buffered i/o functions. The difficulties are: the buffer management (pending bytes, flushing, etc.), the look-ahead character (to figure out whether there is a '\n' after a '\r'), and dealing with blocking/non-blocking mode.

This interface may be layered on top of fread/fwrite functions or on top of (unbuffered?) read/write functions (do these exist for Windows?).

I can see other advantages (besides fixing the eol bug): the same interface could be used to wrap other sequential sources of data (sockets, pipes, etc.) for both text and binary i/o.

Other open-source software can be a source of inspiration. For instance, the i/o code in Tcl/Tk is portable and handles many issues such as partial read/write, buffering by line, end-of-line translation, blocking/non-blocking mode, etc.

Perhaps an easier (partial) fix is to have only the Yorick parser cope with different end-of-line styles. Instead of using f_gets, it can use f_read and manage its own input buffer to figure out where the end-of-line markers are. I have some tentative code in that direction. I can try to implement this fix if you want.
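The core of such a parser-side buffer could be a scanner like the sketch below. It is only an illustration: find_line and its arguments are hypothetical names, and the caller is assumed to refill the buffer itself with a binary read function. The only delicate case is a CR that is the last byte in the buffer, where more input is needed to tell CR from CRLF.

```c
#include <stddef.h>

/* Hypothetical helper: scan data[0..len-1] for the first end-of-line marker
 * (LF, CRLF, or CR).  On success, store the line length (without marker) in
 * *line_len and return the number of bytes consumed (line plus marker).
 * Return 0 if no complete line is available yet: either no marker was found,
 * or the buffer ends in CR and more input is needed to tell CR from CRLF.
 * Pass at_eof != 0 once the source is exhausted to flush what remains. */
size_t find_line(const char *data, size_t len, int at_eof, size_t *line_len)
{
  size_t i;

  for (i = 0; i < len; ++i) {
    if (data[i] == '\n') {            /* UNIX end of line */
      *line_len = i;
      return i + 1;
    }
    if (data[i] == '\r') {
      if (i + 1 < len) {              /* look-ahead byte is available */
        *line_len = i;
        return (data[i + 1] == '\n') ? i + 2 : i + 1;
      }
      if (at_eof) {                   /* lone CR at the very end of input */
        *line_len = i;
        return i + 1;
      }
      return 0;                       /* CR at end of buffer: need more data */
    }
  }
  if (at_eof && len > 0) {            /* final line without any marker */
    *line_len = len;
    return len;
  }
  return 0;
}
```

The caller keeps the unconsumed bytes at the start of its buffer, reads more data after them whenever the function returns 0, and hands each completed line to the parser.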

For the second problem, I checked with the small C program below that the Linux output functions fputs and puts have no "overflow" problem (compile with -UUSE_PUTS or -DUSE_PUTS):
Code:
#include <stdio.h>
#include <string.h>

#ifdef USE_PUTS
# define PUTS(str, len)    puts(str)
#else
/* append the newline ourselves, then write the line with fputs */
# define PUTS(str, len)    do {                       \
                             (str)[len] = '\n';       \
                             (str)[(len) + 1] = '\0'; \
                             fputs(str, stdout);      \
                           } while (0)
#endif

int main(int argc, char *argv[])
{
  const long N = 1000*1000;
  long j, off;
  char buf[128];

  off = 0;
  for (j = 1; j <= N; ++j) {
    sprintf(buf + off, "%ld,", j);
    off += strlen(buf + off);
    if (off >= 80) {
      PUTS(buf, off);
      off = 0;
    }
  }
  if (off > 0) {
    PUTS(buf, off);
  }
  return 0;
}

My guess is therefore that the problem is elsewhere (i.e. in Yorick). This is supported by the fact that (if I remember correctly) I had the same "overflow" problem with Yorick under Windows.

OK, this confirms what you've just posted.


Thu Dec 10, 2009 11:29 pm
Yorick Guru

Joined: Wed Nov 24, 2004 12:51 pm
Posts: 97
Location: Observatoire de Lyon (France)
Post some tests about the overflow in spawn
I've done some tests to check the behaviour of spawn. I used the following code:
Code:
local _overflow_process, _overflow_counter;

func overflow_start(argv)
{
  extern _overflow_process, _overflow_counter;
  if (is_void(argv)) argv = "/bin/cat";
  _overflow_process = spawn(argv, _overflow_on_ouput);
  _overflow_counter = 0;
}

func overflow_stop
{
  extern _overflow_process;
  _overflow_process = []; /* trigger end of process if any running */
}

func overflow_send(data)
{
  _overflow_process, data;
}
func _overflow_on_ouput(txt)
{
  extern _overflow_process, _overflow_counter;
  if (txt) {
    write, format="[%d] %d bytes: %s\n",
      ++_overflow_counter, strlen(txt), txt;
  } else {
    /* child died */
    write, format="[%d] 0 byte: <FINISH>\n",
      ++_overflow_counter;
    _overflow_process = [];
    _overflow_counter = 0;
  }
}

First, I've done:
Code:
overflow_start, "/bin/cat";
then
Code:
overflow_send, "";
gives an error (why? It would be better if it were a no-op). Sending 100 lines yields a single output (hence the output is not buffered by lines):
Code:
overflow_send, sum(swrite(format="hello %d\n", indgen(100)));
Sending lots of data yields a single *truncated* output (cut exactly after 2048 bytes; the rest is lost even if you send something more):
Code:
overflow_send, sum(swrite(format="%d,", indgen(1000)));
overflow_send, "something more";
overflow_stop;

Finally, I tried to "cat" a text file larger than 2048 bytes, e.g.:
Code:
overflow_start,["/bin/cat", "/home/eric/.emacs"];
and the results are:
1. the process terminates before any output;
2. then comes the output(s): sometimes just the first 2048 bytes, sometimes in several pieces (2048 bytes each, except the last one, which is smaller) and not in order (as far as I was able to check, there were no missing parts).

I can understand the first point: the "cat" process terminates before Yorick goes idle and executes the callbacks of the spawned process. This can be an issue (which, at least, needs to be documented) since the writer of the callback may not be aware that some additional data may arrive soon after the process has died. After all, we are dealing with asynchronous i/o, so...

The second point is much more problematic. I cannot figure out whether the problem comes from the input, the output, or both. Again the results depend on the speed of the connection and splitting the sending of data in small parts interleaved with "pause" helps. For instance, the following works:
Code:
for (i=0;i<100;++i) { overflow_send, sum(swrite(format="%d,",indgen(0:99)+100*i)); pause, 1; }

But if you omit the pause, you get several pieces, not in order (more or less in reverse order, but not totally), and fortunately with no missing parts. Try, for instance, this small script in batch mode:
Code:
include,"overflow.i",1;
overflow_start;
for (i=0;i<100;++i) overflow_send, sum(swrite(format="%d,",indgen(0:99)+100*i));
quit;
which I saved as "overflow_test.i" while the first block of code above is saved into "overflow.i", then:
Code:
yorick -batch overflow_test.i > overflow_output
and have a look at "overflow_output"...


Fri Dec 11, 2009 2:41 am
Yorick Master

Joined: Mon Nov 22, 2004 9:43 am
Posts: 354
Location: Livermore, CA, USA
Post 
Okay, let's move the spawn discussion to a new topic.

I just committed a fix for the print output 5000 line limit: The print_format function now allows you to change that limit (from 10 lines max to billions). The default value is still 5000, which I regard as prudent to prevent accidental terminal lockups.

For the input issue, there are many workarounds, and I do not think that attempting to fix it immediately is worth the effort. I agree that it is a worthy long term goal. In fact, I think you should start a movement to get the POSIX standard for the fgets function changed to require that it accept all three EOL conventions, since it's hard to imagine anyone who wouldn't benefit.

If you need a workaround,

include, [text_lines("any_platform.i")];

will parse yorick source no matter what the EOL convention. This is only a partial workaround, since the yorick debugger will not be able to find line numbers, and help will not be able to print DOCUMENT comments, for functions defined in any_platform.i. But it does guarantee that automated scripts will work.


Sat Dec 12, 2009 9:07 am