bug in streplace 
Yorick Guru

Joined: Wed Nov 24, 2004 12:51 pm
Posts: 97
Location: Observatoire de Lyon (France)
There is a bug in streplace: if the input string array contains some nil (null) strings, then the elements after the first nil are skipped. I've looked at the code but could not figure out where the bug was... The following example illustrates the problem for both the out-of-place and the in-place operation:
Code:
> s = ["hello", "world", string(), "and", "everybody"];
> sel = strfind("o",s);
> sel
[[4,5],[1,2],[0,-1],[3,-1],[6,7]]
> strpart(s,sel)
["o","o",(nil),(nil),"o"]
> streplace(s,sel,"#");
["hell#","w#rld",(nil),(nil),(nil)]
> streplace, s, sel, "#";
> s;
["hell#","w#rld",(nil),"and","everybody"]


Mon Oct 06, 2008 11:15 pm
Yorick Padawan

Joined: Fri Nov 19, 2004 9:47 am
Posts: 20
Location: Bonn, Germany
Hello,

This might not help, but we had the same kind of problem when coding replacement functions. Let's say we have the following function definition:
Code:
func my_old_str_to_double(str)
{
    a = array(0.0, dimsof(str));
    sread, str, a;
    return a;
}   

calling
Code:
> my_old_str_to_double(["1","43.1242","ww","3495","349872.22342"])

returns
Code:
[1,43.1242,0,0,0]

As you can see, a non-numeric element here produces zeros for all the remaining elements of the array. Maybe it is the same kind of problem you have in your case?
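The number returned by sread shows exactly where the conversion stopped: it counts the values converted before the first non-numeric element. A minimal sketch with the same input (the variable names are only illustrative):

```yorick
a = array(0.0, 5);
n = sread(["1", "43.1242", "ww", "3495", "349872.22342"], a);
// n counts the values converted before "ww" stopped the read: here 2,
// so only a(1:2) were filled and a(3:5) keep their initial value 0.0
```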

For information, the solution we found is the following:
Code:
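func my_new_str_to_double(str)
{
    a = array(0.0, dimsof(str));
    sread, str, a;            // stops at the first non-numeric element

    if(anyof(a==0))
    {
        // re-read, one at a time, only the elements that are still 0
        w = where(a==0);
        for(k=1;k<=numberof(w);k++)
        {
            x = 0.0;          // reset, so a failed read leaves 0
            sread, str(w(k)), x;
            a(w(k)) = x;
        }
    }

    return a;
}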

This is ugly code, but it works as long as the array of strings is not too long.
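The a==0 test above cannot tell a failed conversion from an element that really is "0". A possible variant, sketched below under the same one-value-per-element assumption, reads every element separately and checks the count returned by sread instead. The function name str_to_double_each and its nan= keyword are hypothetical. Because each element is read on its own, any trailing text is ignored, so "3D" would give 3.

```yorick
// Hypothetical variant: read each element on its own and use the count
// returned by sread (1 on success, 0 on failure) to detect non-numbers,
// storing the nan= value for those instead of an ambiguous 0.
func str_to_double_each(str, nan=)
{
  if (is_void(nan)) nan = 0.0;
  a = array(double(nan), dimsof(str));
  for (k = 1; k <= numberof(str); k++) {
    x = 0.0;                              // fresh scalar for each element
    if (sread(str(k), x) == 1) a(k) = x;  // only the leading number is read
  }
  return a;
}

a = str_to_double_each(["1", "43.1242", "ww", "0", "3495"], nan=-1.0);
```

With nan=-1.0, the "ww" element comes back as -1 while the genuine "0" stays 0.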


Tue Oct 14, 2008 5:33 am
Yorick Master

Joined: Mon Nov 22, 2004 9:43 am
Posts: 354
Location: Livermore, CA, USA
This streplace bug is fixed in version 1.4 of yorick/ystr.c, just committed to CVS.
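For copies of yorick/ystr.c older than version 1.4, one possible workaround is to call streplace only on the elements that are not nil. The sketch below assumes a 1-D input and a scalar replacement string, as in the first post; the helper name streplace_skipnil is hypothetical. It relies on strlen returning 0 for nil strings, so empty strings are also passed through unchanged.

```yorick
// Hypothetical helper (sketch): apply streplace only to elements of s
// that are neither nil nor empty, so a nil element cannot make the
// buggy streplace skip the elements after it.
// Assumes s is 1-D and `to` is a scalar string.
func streplace_skipnil(s, sel, to)
{
  r = s;                        // assignment copies; nil and "" stay as is
  list = where(strlen(s) > 0);  // strlen is 0 for nil and empty strings
  if (numberof(list))
    r(list) = streplace(s(list), sel(,list), to);
  return r;
}

s = ["hello", "world", string(), "and", "everybody"];
r = streplace_skipnil(s, strfind("o", s), "#");
```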

The behavior of sread is a feature, not a bug. The sread function is identical to read, except that the lines of input from a terminal or a file are replaced by consecutive elements of a string array. When any read encounters a "matching failure" (see the man page for the ANSI C library function scanf), the read stops. Hence, when sread is looking for a number with an implied "%g" format (the default format for reading a double) and it encounters "ww", the read operation stops, and sread returns the number of items it read. Another example you might want to consider is

Code:
x=array(0.,5);
sread, ["1.2", "3.4", "5.6 7.8 9.10", "-2.1", "-4.3"], x;


This sread succeeds, but the values returned are [1.2, 3.4, 5.6, 7.8, 9.10], not [1.2, 3.4, 5.6, -2.1, -4.3]. Yorick does not have a function that converts an array of strings into an array of doubles on the assumption of one value per element of the string array. If I had to write such an interface, I would include some way to specify what double value is returned for array elements which are not numbers. You also have to decide what constitutes a valid number. (For example, "three dimensional" is clearly not a number, but should "3D" return the number 3? How about "3.14 mm" or "299.79 mm/ns"?) This is one of those things that seems easy at first, but actually turns out to be an exercise in artificial intelligence. Here is a compromise I might be willing to make:

Code:
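func strtod(s, nan=)
{
  if (is_void(nan)) nan = -1.e99;   // value returned for non-numbers
  d = array(double(nan), dimsof(s));
  s = strtok(s)(1,..);              // keep only the first token of each element
  i = array([0,1], dimsof(s));      // [start,end] pairs selecting the first character
  s1 = strpart(s, i);
  // step past an optional leading sign...
  list = where((s1=="-") | (s1=="+"));
  if (numberof(list)) s1(list) = strpart(s(list), ++i(,list));
  // ...and an optional leading decimal point
  list = where(s1==".");
  if (numberof(list)) s1(list) = strpart(s(list), ++i(,list));
  // tokens whose current character is a digit are read as numbers
  list = where(strlen(s1) & (s1>="0") & (s1<="9"));
  if (numberof(list)) {
    dd = array(0., numberof(list));
    // sread stops at a matching failure, so a short count means some
    // token (such as "3D") did not read cleanly as a single number
    if (sread(s(list), dd) != numberof(dd))
      error, "first token ambiguous, cannot decide if it is a number";
    d(list) = dd;
  }
  return d;
}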


This would not accept "3D", but would accept "3 D" as a number, and "D3" would return the nan= value, instead of blowing up with the ambiguous error like "3D".

I do not include a strtod function in the yorick distribution. Should I? Is the above one good enough, or do people have other ideas about the level of artificial intelligence that is appropriate? Note that it is a very large amount of work to figure out in advance which strings sread will accept as complete numbers, so most practical suggestions will involve generating errors the way this strtod does. I feel it is probably a good thing to generate some kind of error for whatever you consider "ambiguous", because I can easily imagine cases where silently returning the nan value is not a good idea.

Finally, I note that in the reverse direction, swrite does the job: swrite(format="%g",x) has the dimensions of x, with the values converted to strings.
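For instance, converting a 2x2 array keeps its shape, and because each resulting string holds exactly one number, sread can read the strings back. A short sketch; the values are arbitrary:

```yorick
x = [[1.5, 2.0], [3.25, -4.0]];
s = swrite(format="%g", x);   // string array with the dimensions of x
// s(1,1) is "1.5" and s(2,1) is "2", since %g drops trailing zeros
y = array(0.0, dimsof(s));
n = sread(s, y);              // one value per element, so all 4 are read
```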


Sun Oct 19, 2008 3:22 pm