Reply to topic  [ 9 posts ] 
ERROR (_after_work) (BUG?) bad link in _after_func 
Yorick Master

Joined: Wed Jun 01, 2005 11:34 am
Posts: 112
Post ERROR (_after_work) (BUG?) bad link in _after_func
I've recently started using the after functionality for a process that runs periodically in the background to keep certain data items in sync between Yorick and Tcl. (We run in a hybrid "Ytk" environment and Yorick can communicate to Tcl via a pipe.)

Occasionally, this error pops up while Yorick is idle:

Code:
ERROR (_after_work) (BUG?) bad link in _after_func
  LINE: 5491  FILE: /opt/alps/yorick/i0/std.i
To enter debug mode, type <RETURN> now (then dbexit to get out)


Looking at std.i at the specified line:

Code:
extern _after_func;  /* work functions for after */
func _after_work { _after_func; }


In the Yorick source code, the error is coming from Y__after_func in task.c around line 366:

Code:
  if (ym_after_i<0 || ym_after_i>=ym_after_n)
    y_error("(BUG?) bad link in _after_func");


In case it helps, the code we're using for this is wrapped up in an oxy group. Here is the function that handles the after stuff:

Code:
func tksync_background(void) {
  after, -, tksync, background;
  tksync, check;
  after, tksync.interval, tksync, background;
}


The middle line, "tksync,check", checks for variables that have changed and then sends changed values over to Tcl. The value "tksync.interval" is 0.25. I cancel the after statement in the first line to prevent having a bunch of them running concurrently (in case someone manually types "tksync,background" a bunch of times).
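The guarantee the cancel-then-reschedule idiom is meant to provide can be modelled in a few lines of C, since C is where the after queue lives. This is a hypothetical sketch, not how task.c is organized. The AfterEntry type, the fixed-size array, and the after_cancel/after_schedule/after_count names are invented here, and plain strings stand in for the object and member (tksync, background).

```c
#include <string.h>

/* Hypothetical model of an "after" queue; not the layout used in task.c. */
#define MAX_AFTER 16

typedef struct {
  const char *obj;     /* object part of the call, e.g. "tksync" */
  const char *member;  /* member part of the call, e.g. "background" */
  double when;         /* absolute time at which the call is due */
  int live;            /* 1 if this slot holds a pending call */
} AfterEntry;

static AfterEntry queue[MAX_AFTER];

/* Analogous to "after, -, obj, member": drop every pending call with this key. */
static void after_cancel(const char *obj, const char *member) {
  for (int s = 0; s < MAX_AFTER; s++)
    if (queue[s].live && strcmp(queue[s].obj, obj) == 0 &&
        strcmp(queue[s].member, member) == 0)
      queue[s].live = 0;
}

/* Analogous to "after, delay, obj, member": returns the slot used, or -1 if full. */
static int after_schedule(double now, double delay,
                          const char *obj, const char *member) {
  for (int s = 0; s < MAX_AFTER; s++)
    if (!queue[s].live) {
      queue[s] = (AfterEntry){obj, member, now + delay, 1};
      return s;
    }
  return -1;
}

/* Number of pending calls currently in the queue. */
static int after_count(void) {
  int c = 0;
  for (int s = 0; s < MAX_AFTER; s++) c += queue[s].live;
  return c;
}

/* Mirrors tksync_background: cancel own pending call, work, reschedule. */
static void background(double now) {
  after_cancel("tksync", "background");
  /* ... the "tksync, check" step would run here ... */
  (void)after_schedule(now, 0.25, "tksync", "background");
}
```

In this model, any number of calls to background(), including extra manual ones, leaves after_count() at 1. That single-entry invariant is exactly what the logs in the later posts show being broken.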

I'm not sure how to go about troubleshooting this, especially since it's intermittent. Any advice?


Wed Apr 03, 2013 10:43 am
Yorick Master

Joined: Mon Nov 22, 2004 9:43 am
Posts: 354
Location: Livermore, CA, USA
Post Re: ERROR (_after_work) (BUG?) bad link in _after_func
The logic and data structures for this in yorick/task.c are too complicated for me to remember exactly what I was trying to do. If I had to guess, I'd say that there is something wrong in the linked list maintenance in task.c:yexec_after, probably in the section at the top which handles cancellation. It definitely sounds like a serious bug. No doubt it's another race condition, owing to the lack of clarity about the precise state of the queue of after functions. You could add a logging function to task.c which kept the state of the after stack. If there are too many events to log (if you're adding and cancelling every fraction of a second), you could keep the last hundred events and add an interpreted function to print them on demand after you hit the error.
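The suggested logger might look roughly like the following ring buffer, which keeps only the most recent 100 events so that a high event rate does not matter. This is a hypothetical sketch: after_log_event and after_log_dump are invented names, and the four integers are meant to be snapshots of ym_after_n, ym_after_i, ym_after_j, and ym_after_k taken at each call site. Exposing the dump to the interpreter as a builtin is not shown.

```c
#include <stdio.h>

/* Hypothetical debugging aid; these names are not part of Yorick. */
#define AFTER_LOG_SIZE 100

typedef struct {
  const char *where;  /* function name, e.g. "yexec_after" */
  int n, i, j, k;     /* snapshot of ym_after_n, _i, _j, _k */
} AfterLogEntry;

static AfterLogEntry after_log[AFTER_LOG_SIZE];
static long after_log_count = 0;  /* total events ever recorded */

/* Record one event, overwriting the oldest once the buffer is full. */
static void after_log_event(const char *where, int n, int i, int j, int k) {
  AfterLogEntry *e = &after_log[after_log_count % AFTER_LOG_SIZE];
  e->where = where;
  e->n = n; e->i = i; e->j = j; e->k = k;
  after_log_count++;
}

/* Print the retained events, oldest first; returns how many were printed. */
static int after_log_dump(FILE *out) {
  long first = after_log_count > AFTER_LOG_SIZE
                   ? after_log_count - AFTER_LOG_SIZE : 0;
  int printed = 0;
  for (long s = first; s < after_log_count; s++, printed++) {
    const AfterLogEntry *e = &after_log[s % AFTER_LOG_SIZE];
    fprintf(out, "%s\n  n,i,j,k: %d,%d,%d,%d\n",
            e->where, e->n, e->i, e->j, e->k);
  }
  return printed;
}
```

Each instrumented function would call something like after_log_event("yexec_after", ym_after_n, ym_after_i, ym_after_j, ym_after_k) on entry and exit; when the error fires, dumping the buffer shows the last 100 transitions leading up to it.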

Again, this is a brutally difficult thing you are trying to do, devising a rapid fire exchange between two processes. Surely there must be an easier way... If it helps, yorick can create an X window as a child of the window of another process (one of the more obscure features of the window command). To use this, you may have to defeat modern X server security features aimed at preventing this sort of thing. My recollection is that's not too bad as long as they are running on the same machine.


Sun Apr 07, 2013 8:15 am
Yorick Master

Joined: Wed Jun 01, 2005 11:34 am
Posts: 112
Post Re: ERROR (_after_work) (BUG?) bad link in _after_func
Okay, if the issue continues to happen often enough to be a nuisance, I'll try adding logging and will let you know if I find anything.

We're already using the obscure window feature you mention; we commonly embed Yorick windows in Tcl GUIs. (And it works quite well, I'd add.) That doesn't really address this need at all, though. We're trying to keep the values of variables in sync, not plotted content. Also, it's not (usually) a rapid fire exchange. On each call (several times per second), the Yorick function checks to see if the variables on its list have changed. If they have, it sends a message to Tcl to update it. So the only time messages get sent is after variables change values. The variables being monitored do not change often enough to result in rapid fire.
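The change-detection step described above amounts to remembering the value last sent for each watched variable and sending only when it differs. The real check is interpreted Yorick inside the oxy group; this C sketch is only illustrative, and the Watch type, sync_check, and the "set name value" message format are assumptions.

```c
#include <stdio.h>

/* Hypothetical model of the "tksync, check" step. */
typedef struct {
  const char *name;     /* Tcl-side variable name */
  const double *value;  /* current Yorick-side value being watched */
  double last_sent;     /* value most recently sent to Tcl */
  int ever_sent;        /* 0 until the first send */
} Watch;

/* Write one update line per changed variable; returns the number sent. */
static int sync_check(Watch *w, int nw, FILE *pipe_to_tcl) {
  int sent = 0;
  for (int m = 0; m < nw; m++) {
    double v = *w[m].value;
    if (!w[m].ever_sent || v != w[m].last_sent) {
      fprintf(pipe_to_tcl, "set %s %.17g\n", w[m].name, v);
      w[m].last_sent = v;
      w[m].ever_sent = 1;
      sent++;
    }
  }
  fflush(pipe_to_tcl);
  return sent;
}
```

The comparison is exact, so a NaN value would be treated as changed on every poll; when nothing has changed, a poll writes nothing to the pipe.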


Mon Apr 08, 2013 5:55 am
Yorick Master

Joined: Wed Jun 01, 2005 11:34 am
Posts: 112
Post Re: ERROR (_after_work) (BUG?) bad link in _after_func
I added some logging. There are four state values of interest: ym_after_n, ym_after_i, ym_after_j, and ym_after_k. For brevity, they're hereafter referred to simply as n, i, j, and k. Specific logging added:

Y__after_func
Printing function name and n,i,j,k at function start.

yexec_after
Printing function name, fndx, dndx, and n,i,j,k at function start.
Printing "returning early" and n,i,j,k if we return early (there's only one point where we return early)
Printing n,i,j,k at end of function.

ym_after
Printing function name, its local i, and n,i,j,k at function start.
Printing "cancelled" if it's cancelled and returns early.
Printing n,i,j,k at end of function.

The only after function I have running is tksync,background. As mentioned elsewhere, this function opens by canceling any existing instances of itself (after, -, tksync, background), then does its work, then schedules a new instance (after, 0.25, tksync, background).

After kicking off the tksync,background loop, the debug output quickly settles into printing the following block repeatedly:

Code:
yexec_after: fndx: -1  dndx: 1583
  n,i,j,k: 4,-1,0,-1
  returning early
  n,i,j,k: 4,-1,0,-1
yexec_after: fndx: -1  dndx: 1583
  n,i,j,k: 4,-1,0,-1
  n,i,j,k: 4,-1,1,-1
ym_after: i: 0
  n,i,j,k: 4,-1,1,-1
  n,i,j,k: 4,0,1,0
Y__after_func
  n,i,j,k: 4,0,1,0


(The first yexec_after is triggered by the cancellation and the second yexec_after is triggered by the rescheduling.)

Theoretically, calling tksync,background manually shouldn't change the stable looping sequence. I could see it altering briefly, but it should then settle back into its normal loop, since there's still only ever a single item in the after queue at a time.

However, that's not the case. If I type "tksync, background" manually, I get an alternate sequence. Following is the debug output, plus some comments I'm adding with // marks.

Code:
// This block repeats in a stable loop as before
yexec_after: fndx: -1  dndx: 1583
  n,i,j,k: 4,-1,0,-1
  returning early
  n,i,j,k: 4,-1,0,-1
yexec_after: fndx: -1  dndx: 1583
  n,i,j,k: 4,-1,0,-1
  n,i,j,k: 4,-1,1,-1
ym_after: i: 0
  n,i,j,k: 4,-1,1,-1
  n,i,j,k: 4,0,1,0
Y__after_func
  n,i,j,k: 4,0,1,0

// The block starts to repeat but it gets interrupted
yexec_after: fndx: -1  dndx: 1583
  n,i,j,k: 4,-1,0,-1
  returning early
  n,i,j,k: 4,-1,0,-1
yexec_after: fndx: -1  dndx: 1583
  n,i,j,k: 4,-1,0,-1
  n,i,j,k: 4,-1,1,-1

// tksync, background gets called

// Now, the following two blocks repeat
yexec_after: fndx: -1  dndx: 1583
  n,i,j,k: 4,-1,1,-1
  returning early
  n,i,j,k: 4,-1,1,-1
yexec_after: fndx: -1  dndx: 1583
  n,i,j,k: 4,-1,1,-1
  n,i,j,k: 4,-1,2,-1
ym_after: i: 0
  n,i,j,k: 4,-1,2,-1
  n,i,j,k: 4,0,2,0
Y__after_func
  n,i,j,k: 4,0,2,0

yexec_after: fndx: -1  dndx: 1583
  n,i,j,k: 4,-1,0,-1
  returning early
  n,i,j,k: 4,-1,0,-1
yexec_after: fndx: -1  dndx: 1583
  n,i,j,k: 4,-1,0,-1
  n,i,j,k: 4,-1,2,-1
ym_after: i: 1
  n,i,j,k: 4,-1,2,-1
  n,i,j,k: 4,1,2,1
Y__after_func
  n,i,j,k: 4,1,2,1


So those last two blocks repeat, alternating back and forth. It's a stable loop, but over two items instead of one. It appears that it thinks there are two entries to alternate between.

It's possible to get even more instances going. Poking around, I even got n to double to 8 once. Given that there should only ever be one after event going, that should never happen.
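One way to catch the duplication at the moment it happens, rather than when a bad link finally trips the error, is to walk the queue after every operation and validate each link. The sketch below assumes a simple index-linked list terminated by -1. The Node type and check_links are hypothetical, and the real queue in task.c uses several indices (i, j, k) whose roles this thread does not spell out.

```c
/* Hypothetical index-linked list; not the actual structure in task.c. */
typedef struct {
  int next;  /* index of the next node, or -1 at the end of the list */
} Node;

/* Walk the list from head.  Returns the number of reachable nodes,
   -1 if a link falls outside [0, n) (the "bad link" condition),
   or -2 if the walk takes more steps than there are slots (a cycle). */
static int check_links(const Node *nodes, int n, int head) {
  int count = 0;
  for (int p = head; p != -1; p = nodes[p].next) {
    if (p < 0 || p >= n) return -1;  /* same test as the y_error check */
    if (++count > n) return -2;      /* revisiting a slot: must be a cycle */
  }
  return count;
}
```

Called alongside the logging, a count above the expected number of pending after calls (1 in this case), or a negative code, would pinpoint the first operation that corrupted the list.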


Mon Apr 08, 2013 10:38 am
Yorick Master

Joined: Wed Jun 01, 2005 11:34 am
Posts: 112
Post Re: ERROR (_after_work) (BUG?) bad link in _after_func
Okay, I think I just narrowed down the scope of the bug. It doesn't appear that canceling oxy-type calls works properly.

For example, if I run this:
Code:
> func run {
cont> after, 1, test, run;
cont> write, "test";
cont> }
> test = save(run);
> test, run;
> // wait a while, watching "test" get printed
> after, -, test, run;
> // wait a while, watching "test" still get printed - bug!
> after, -;
> // no more printing happens

I see "test" get printed once per second as expected while I'm waiting. However, "after, -, test, run" doesn't cancel it. It continues to print once per second after that. I have to use "after, -;" to make it actually stop.

I'm not sure how this links to the original bug, or if it even does.
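Until the cancellation bug is understood, a generation token is a generic defensive technique that makes stale calls harmless. Every new chain gets a fresh token, and a callback whose token is no longer current returns without doing work or rescheduling, so the stale chain dies out on its own. This C sketch uses hypothetical names (sync_start, sync_callback). It does not fix the underlying queue bug, and in Yorick the token would have to be stored somewhere the callback can read it, for example as a member of the oxy object.

```c
/* Hypothetical sketch of a generation-token guard for periodic callbacks. */
static unsigned long sync_generation = 0;
static int sync_runs = 0;  /* counts callbacks that actually did work */

/* Start a new periodic chain; the returned token travels with each call. */
static unsigned long sync_start(void) {
  return ++sync_generation;
}

/* The scheduled callback: does work only if its token is still current. */
static int sync_callback(unsigned long token) {
  if (token != sync_generation) return 0;  /* stale chain: stop, no reschedule */
  sync_runs++;
  /* ... check variables here, then reschedule with the same token ... */
  return 1;
}
```

With this guard, typing the start command several times no longer multiplies the work: only the newest chain keeps running.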


Mon Apr 08, 2013 10:46 am
Yorick Master

Joined: Tue Mar 07, 2006 10:31 pm
Posts: 125
Location: Meudon, France
Post Re: ERROR (_after_work) (BUG?) bad link in _after_func
For the record, the yorick-svipc plug-in can be used for interprocess communication instead of stdout. I've never done it myself, but I know this works for GUI applications written in Python using the sibling python-svipc module. The basic idea is to create your state variables in shared memory and use semaphores to signal changes.

yorick-svipc is based on the very standard (some would say old) System V IPC stack. It's of course not ubiquitous, but it's possible that it exists on all the systems you are interested in (it does on Linux and Mac OS X). Not sure whether Tcl bindings exist. It's packaged at least for Debian (Wheezy) and derivatives, Archlinux and macports.

See: https://github.com/mdcb/yp-svipc
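The shared-memory-plus-semaphore idea can be illustrated with the raw System V calls that yorick-svipc builds on. This is not yorick-svipc's own API. The SharedState layout and the channel_* helpers are hypothetical, and both sides run in one process for brevity. In real use the second process would attach with shmat and wait with semop, using the shmid and semid passed to it (for example on the command line). The version counter gives the reader a cheap way to tell whether anything changed.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/sem.h>

/* glibc requires the caller to declare union semun for semctl. */
#if defined(_SEM_SEMUN_UNDEFINED)
union semun { int val; struct semid_ds *buf; unsigned short *array; };
#endif

/* Hypothetical shared layout for one watched variable. */
typedef struct {
  double value;
  unsigned long version;  /* bumped on every write */
} SharedState;

typedef struct { int shmid, semid; SharedState *st; } Channel;

/* Create the segment and a semaphore initialised to 0 (error cleanup omitted). */
static int channel_open(Channel *ch) {
  ch->shmid = shmget(IPC_PRIVATE, sizeof(SharedState), IPC_CREAT | 0600);
  if (ch->shmid < 0) return -1;
  ch->st = shmat(ch->shmid, NULL, 0);
  if (ch->st == (void *)-1) return -1;
  ch->semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
  if (ch->semid < 0) return -1;
  union semun arg;
  arg.val = 0;
  if (semctl(ch->semid, 0, SETVAL, arg) < 0) return -1;
  ch->st->value = 0.0;
  ch->st->version = 0;
  return 0;
}

/* Writer side: store the new value, then signal the reader. */
static int channel_publish(Channel *ch, double v) {
  ch->st->value = v;
  ch->st->version++;
  struct sembuf up = {.sem_num = 0, .sem_op = 1, .sem_flg = 0};
  return semop(ch->semid, &up, 1);
}

/* Reader side: block until signalled, then read the value. */
static int channel_wait(Channel *ch, double *out) {
  struct sembuf down = {.sem_num = 0, .sem_op = -1, .sem_flg = 0};
  if (semop(ch->semid, &down, 1) < 0) return -1;
  *out = ch->st->value;
  return 0;
}

/* Detach and remove both IPC objects. */
static void channel_close(Channel *ch) {
  shmdt(ch->st);
  shmctl(ch->shmid, IPC_RMID, NULL);
  semctl(ch->semid, 0, IPC_RMID);
}
```

Because the semaphore counts, several publishes before a wait wake the reader several times; comparing version against the last one seen lets it skip redundant reads.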


Tue Apr 09, 2013 12:03 am
Yorick Master

Joined: Wed Jun 01, 2005 11:34 am
Posts: 112
Post Re: ERROR (_after_work) (BUG?) bad link in _after_func
Thanks for the information about yorick-svipc. I took a look into it and it looks like a pretty useful package, but I don't know that it really fits our needs. It looks like I'd have to re-implement python-svipc in Tcl for it to be useful, since yorick-svipc builds a layer on top of System V IPC. And while it looks like Yorick array values can be directly bound into shared memory using shm_var, I don't see anything similar for Python and I don't think Tcl would support such a linkage either. So I'd still have the problem of detecting changes. Are you aware of any Python GUIs that use this? Maybe taking a look at how they use it would help me understand how to apply it to Tcl.


Tue Apr 09, 2013 1:25 pm
Yorick Master

Joined: Tue Mar 07, 2006 10:31 pm
Posts: 125
Location: Meudon, France
Post Re: ERROR (_after_work) (BUG?) bad link in _after_func
Hi,

I don't know of any public code that does it, and the private code I know of is huge and messy. But it works. I've asked a colleague to contact you about it.

Regards, Thibaut.


Tue Apr 16, 2013 1:10 am

Joined: Tue Apr 15, 2008 3:26 am
Posts: 11
Post Re: ERROR (_after_work) (BUG?) bad link in _after_func
Hi all,

So we are using yorick-svipc to update or get values of a pyGTK GUI.
With pyk.i, we had some issues when we tried to update a value and get the new value back quickly, or when we tried to update a lot of widgets during initialization.

What we built is a system based on coins and a listener in Python. Currently, we still use pyk to launch Yorick functions from Python.

If you are still interested, I can post some explanations and code.

Cheers,

Arnaud


Thu Apr 18, 2013 12:10 am