mpy2 arrives 
Yorick Master

Joined: Mon Nov 22, 2004 9:43 am
Posts: 354
Location: Livermore, CA, USA
Post mpy2 arrives
Today I committed an all-new version of the yorick MPI package, mpy. Mpy is part of the main yorick distribution, found in the mpy subdirectory.

The new mpy is incompatible with the old. The old API had serious defects which made it nearly impossible to write correct programs. If you are using it, you should convert your mpy programs to the new API. The old mpy is available as the yorick-mpy package from CVS. Do not confuse this with the recommended mpy package, which is part of the main distribution!

The mpy/README file contains installation and usage details. There are a few trivial sample parallel programs in mpy/testmp.i.


Sun Feb 28, 2010 3:59 pm
Yorick Guru

Joined: Sat Jan 22, 2005 2:44 pm
Posts: 86
Location: Pasadena, CA
Post 
Thanks Dave. Tests in testmp.i are ok with:
mpirun (Open MPI) 1.4.1
Altix Linux 2.6.16.54-0.2.3-default #1 SMP ia64 GNU/Linux

I do have a small issue with "mpirun -n N mpy" not returning
an interactive prompt... looking into it.


Thu Mar 04, 2010 1:50 pm
Yorick Master

Joined: Mon Nov 22, 2004 9:43 am
Posts: 354
Location: Livermore, CA, USA
Post 
The absence of a prompt (actually incorrect ordering of the prompt) is standard behavior on many MPI platforms. For example, under SLURM, the starting command is "srun" and in order to get the prompt correctly, the command line is:

srun -i0 -u -n N mpy

The -i0 says that stdin is to be delivered only to rank 0 (the default is to deliver stdin to all ranks), and the -u says you want unbuffered output (the default is to buffer stdout from all the ranks in some unspecified way, which is where the prompt gets lost). The insane defaults are typical of many if not most MPI environments, which appear to be specifically designed to make it impossible to work interactively. We have complained about this and many other user-unfriendly misfeatures of MPI (the absence of any way to deliver SIGINT is the worst) for literally twenty years with zero effect. So please add your voice any way you know how...

So look up the features (misfeatures) of your particular brand of mpirun and see whether there are any options you can give it to get sane behavior of prompts... (By the way, the standard installation of the mpich package on Ubuntu does deliver prompts in the expected order -- that is, before you type -- if you start your programs with mpiexec, so not all defaults are insane on all platforms...)
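
For anyone scripting their launches, here is a rough Python sketch of a helper that assembles the command line for the two launchers mentioned so far. The function name mpy_launch_argv and its arguments are invented for illustration. The only switches it uses are the ones quoted in this thread (-i0, -u and -n for srun) plus the -n switch of mpiexec. Your own MPI environment may well need different or additional switches, so treat this as a starting point rather than a recipe.

```python
def mpy_launch_argv(nranks, launcher="mpiexec", program="mpy", program_args=()):
    """Return an argv list that starts `program` on `nranks` ranks.

    Hypothetical helper: it only knows the two launchers discussed in
    this thread and only the switches quoted there.
    """
    if nranks < 1:
        raise ValueError("nranks must be a positive integer")
    if launcher == "srun":
        # SLURM: -i0 delivers stdin to rank 0 only, -u asks for unbuffered output.
        head = ["srun", "-i0", "-u", "-n", str(nranks)]
    elif launcher == "mpiexec":
        head = ["mpiexec", "-n", str(nranks)]
    else:
        raise ValueError("unknown launcher: %r" % (launcher,))
    return head + [program] + list(program_args)
```

Passing the returned list to subprocess.run (without shell=True) avoids quoting problems; mpy_launch_argv(4, launcher="srun") yields the srun line shown above with N replaced by 4.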


Sat Mar 06, 2010 8:31 am
Yorick Guru

Joined: Sat Jan 22, 2005 2:44 pm
Posts: 86
Location: Pasadena, CA
Post OpenMPI & mpy
Thank you for the reply. Yes, I had tried a few things, like making sure that stdin is delivered to rank 0 (-stdin 0). I have also tried several other things, like using "mpiexec" instead of "mpirun", redirecting rank 0 stdin to a new xterm, starting from a fresh yorick CVS install, or varying the number of processes requested: "mpirun -np N {-stdin 0 ; -xterm 0} mpy" always starts N processes on N CPUs for N>1. For N==1, the failure message is:
"ERROR (VM idle or lost) MPI initialization sequence failed"
The interactive prompt is always delivered for N==2, sometimes for N==3, in even fewer instances for N==4, and so on.

When a new xterm is requested, the terminal is always started, and correctly titled "Rank 0." With N==2, the yorick prompt appears with mpy correctly initialized. When the prompt is not returned (N>2) there is no prompt until I do a ^C interrupt, which sends me into mpy debugging mode somewhere in the mpy initialization (usually in "graph.i", maybe elsewhere sometimes? ....)
^C
(BUG) lost function produced following error:
WARNING 1 ranks report fault, parallel task halted
ERROR (VM idle or lost) Keyboard interrupt received (SIGINT)
ERROR (*main*) non-numeric data type in unary -
WARNING detailed line number information unavailable
now at pc= 1 (of 22), failed at pc= 5
LINE: 1772 FILE: /u/uav1/trm/Yorick/yorick-cvs/mpy/../relocate/i0/graph.i
To enter debug mode, type <RETURN> now (then dbexit to get out)
dbug> mp_size
[]
dbug> mp_rank
[]

I also noticed that when the prompt is returned, the rank 0 process is idle (CPU==0), while the accumulated CPU time is non-zero and steadily increasing for all other rank processes. When the prompt is not returned, all processes are accumulating CPU, rank 0 included.

I will try to install MPICH.


Tue Mar 09, 2010 10:34 am
Yorick Master

Joined: Mon Nov 22, 2004 9:43 am
Posts: 354
Location: Livermore, CA, USA
Post 
I just fixed a bug in mpy.c that caused deadlock during startup with some message arrival sequences. So you might want to try getting a fresh version from CVS. It sounds like one of your problems might be fixed.

When the number of processes equals 1, mpy will always fail -- that's a feature, not a bug. Obviously you should just run the serial code.

The fact that mp_rank and mp_size are both [] is interesting. There are some things that are not initialized properly until mpy.i is included, which would happen long after graph.i. Anyway, there is an outside chance that the bug I just fixed will make all your problems go away.

I'm still having intermittent problems with the error recovery (testmp4).


Tue Mar 09, 2010 1:50 pm
Yorick Guru

Joined: Sat Jan 22, 2005 2:44 pm
Posts: 86
Location: Pasadena, CA
Post 
Thank you for the fix. I switched to MPICH and all tests in "testmp.i" pass, and I do get an interactive prompt.

testmp4 seems to work as planned too:
testmp4(N) -> dbug@N, for N>1
usual debug for N==0

I can start thinking about using mpy....


Tue Mar 09, 2010 4:41 pm
Yorick Master

Joined: Mon Nov 22, 2004 9:43 am
Posts: 354
Location: Livermore, CA, USA
Post 
Yesterday's bug fix was not quite correct -- the fix left error recovery broken with a race-like condition that caused intermittent fatalities during error recovery. I think the version as of last night (maybe eight hours ago) fixes all the problems with mpy_get_next blocking logic.

Thierry, I'm very interested to know if this bug fix makes mpy run under your original (OpenMPI?) MPI installation. If not, there are still some serious issues with the code which I'd like to document even if I can't fix them immediately. Just because MPICH delivers an interactive prompt and the other environment doesn't is no reason mpy should not work there -- it just means they're blowing off attempts to use their MPI interactively, which is the prevalent attitude in the MPI developer community. But having an mpy that is usable in batch mode under all MPI environments is an important goal. Even more importantly, if mpy doesn't run under some MPI environment, it almost certainly means there is a bug in mpy. That means it is merely an accident that mpy runs in other environments. But it can easily happen that the bug is so intermittent in the working environments that it is impossible to track it down there... So the MPI environment where it breaks is actually a very valuable environment for code development! If you can rebuild the new mpy in your old environment and check that it works there, I'd really appreciate it! And if it doesn't work -- and I don't mean just the prompts not being delivered, which we know is a feature of that environment -- please give me specifics about that MPI so I can try to track it down.


Wed Mar 10, 2010 8:55 am
Yorick Guru

Joined: Sat Jan 22, 2005 2:44 pm
Posts: 86
Location: Pasadena, CA
Post 
The latest CVS passes all tests and gives a prompt on this machine:
mpirun (Open MPI) 1.4.1
Linux 2.6.16.54-0.2.3-default #1 SMP ia64 GNU/Linux (Suse/SGI)
Intel icc (ICC) 11.0 20090131

I will always be glad to test stuff. I might even do it right if there is
not too much thinking involved ;)


Wed Mar 10, 2010 10:35 am
Yorick Master

Joined: Mon Nov 22, 2004 9:43 am
Posts: 354
Location: Livermore, CA, USA
Post 
More growing pains. There have been a few more minor bug fixes, but the big recent change is the addition of mpool.i, a new (hopefully improved) pool-of-tasks manager. The new mpool.i should be easier to use than the mpy1 pool manager. Its new features include automatically collecting timing statistics to help decide how many slaves you ought to use, plus support for nested pools and multiple simultaneous pools. This continues the theme of better support for extremely large numbers of processors.
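
To illustrate the general idea of a pool-of-tasks manager that collects timing statistics, here is a minimal Python sketch. It is not the mpool.i API and does not use MPI at all: Python threads stand in for the worker ranks, and the names timed, run_pool and the keys of the stats dictionary are made up for this example. The point is only that comparing the time spent inside tasks against the elapsed wall time gives a number you can use to judge whether adding workers is paying off.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed(func, arg):
    """Run func(arg) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(arg)
    return result, time.perf_counter() - start


def run_pool(func, tasks, nworkers):
    """Hand tasks to nworkers workers; return (results in task order, stats)."""
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        pairs = list(pool.map(lambda task: timed(func, task), tasks))
    wall = time.perf_counter() - wall_start
    busy = sum(elapsed for _, elapsed in pairs)
    stats = {
        "ntasks": len(pairs),
        "wall": wall,   # elapsed time for the whole pool
        "busy": busy,   # total time spent inside tasks
        # Fraction of the workers' available time spent inside tasks.
        "efficiency": busy / (wall * nworkers) if wall > 0 else 0.0,
    }
    return [result for result, _ in pairs], stats
```

If the efficiency figure drops as nworkers goes up, the extra workers are mostly sitting idle, which is the kind of signal such statistics are meant to provide.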

Also, there have been some questions about how to use mpy1.i. The answer to most questions is that you need to modify most mpy1 .i files by adding a single line at the top of the file:

require, "mpy1.i";

This is especially true for startup .i files belonging to compiled packages that use mpy. Note also (as it says in the README) that you may need to put other require statements at the top of such files to force load anything which used to be autoloaded under mpy1. Read mpy/README for more.


Sun Mar 14, 2010 10:41 am
Yorick Master

Joined: Sun Sep 26, 2004 10:33 am
Posts: 150
Location: Australia
Post 
I seem to have issues making it work.
Just let me summarize to check I haven't made any mistake. It's not super clear from the doc just how this thing should be invoked.

1) build:
1.1) yorick installed as relocatable (yorick cvs tuesday april 6 2010)
1.2) using mpich2 (version 1.2.1???)
1.3) made sure path is adjusted to use this yorick
1.4) in mpy, ./configure --mpicc=/opt/mpich2/bin/mpicc
1.5) make && make install complete without warnings or errors.

2) run: the README mentions "mpy -j somefile.i" but doesn't really say up front how to start it. This thread mentions running "mpirun -n N mpy".
I guess the second version is the correct one, and what you meant with the -j was:
mpirun -n N mpy -j somefile.i

ok, so I had a number of issues:

I have a race condition. Starting:
$ mpirun -n 2 mpy
one of my CPUs maxes out. I get the prompt, no error on stdout, and I can type commands. With -n 4 both my CPUs max out.
Also, there is an error when I try to quit:
Quote:
poliahu:yorick-2.1 $ mpirun -n 2 mpy
Copyright (c) 2005. The Regents of the University of California.
All rights reserved. Yorick 2.1.05x ready. For help type 'help'
> mp_rank
0
> mp_size
2
> quit
WARNING 1 ranks report fault, parallel task halted
ERROR (vunpack[74]) first argument must be byte stream from vpack to unpackType <RETURN> now to debug on rank 1
>
LINE: 2400 FILE: /home/frigaut/temp/yorick-2.1/relocate/i0/std.i

Entering dbug mode on rank 1, type dbexit to exit
dbug@1> quit
> quit
poliahu:yorick-2.1 $


The thing quits properly, I don't have to ctrl-C or anything.

Now this particular error also shows when I run testmp:
Quote:
poliahu:yorick-2.1 $ mpirun -n 2 mpy -j mpy/testmp.i
WARNING 1 ranks report fault, parallel task halted
ERROR (vunpack[74]) first argument must be byte stream from vpack to unpackType <RETURN> now to debug on rank 1
>
LINE: 2400 FILE: /home/frigaut/temp/yorick-2.1/relocate/i0/std.i
quit
poliahu:yorick-2.1 $


OK, so here is some more information:

Quote:
poliahu:yorick-2.1 $ mpich2version
MPICH2 Version: 1.2.1p1
MPICH2 Release date: Unknown, built on Tue Apr 6 08:35:12 CLT 2010
MPICH2 Device: ch3:nemesis
MPICH2 configure: --prefix=/opt/mpich2 --enable-sharedlibs=gcc --with-pm=gforker:mpd --with-python=python
MPICH2 CC: gcc -march=x86-64 -mtune=generic -O2 -pipe -O2
MPICH2 CXX: c++ -march=x86-64 -mtune=generic -O2 -pipe -O2
MPICH2 F77:
MPICH2 F90:
poliahu:yorick-2.1 $ gcc --version
gcc (GCC) 4.4.3 20100316 (prerelease)
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

poliahu:yorick-2.1 $ uname -a
Linux poliahu 2.6.32-ARCH #1 SMP PREEMPT Mon Mar 15 20:44:03 CET 2010 x86_64 Intel(R) Core(TM)2 Duo CPU T9550 @ 2.66GHz GenuineIntel GNU/Linux
poliahu:yorick-2.1 $


I'm going to try out openmpi and report.

edit:
It does exactly the same thing with openmpi:
Quote:
ERROR (vunpack[74]) first argument must be byte stream from vpack to unpack

hum... gcc related?


Last edited by francois on Tue Apr 06, 2010 6:54 am, edited 1 time in total.



Tue Apr 06, 2010 6:33 am
Yorick Master

Joined: Tue Mar 07, 2006 10:31 pm
Posts: 125
Location: Meudon, France
Post 
Two comments:

- MPY is now part of Debian (unstable). Two versions are built: for OpenMPI and for MPICH2.

- last time I checked, testmp failed under LAM. I don't care much since LAM is being obsoleted by OpenMPI.

Regards, Thibaut.


Tue Apr 06, 2010 6:47 am
Yorick Master

Joined: Sun Sep 26, 2004 10:33 am
Posts: 150
Location: Australia
Post 
OK, the previous post was on archlinux btw. I just now tried on ubuntu (Lucid beta1 updated to current snapshot).

It's a little bit different on lucid: I don't get any error and the testmp() runs without error, but I still have the cpu maxed out (from what I assumed earlier to be a race condition, but I'm not sure anymore...).

Quote:
frigaut@frigaut-laptop:~/yorick-2.1/mpy$ mpirun -n 2 mpy -j testmp.i
Copyright (c) 2005. The Regents of the University of California.
All rights reserved. Yorick 2.1.05x ready. For help type 'help'
> testmp
testmp2 passed on all 2 ranks
testmp3 passed on all 2 ranks
begin testing mpool
mpool finished (vpack) nerrors=0
mpool self=1 finished (vsave) nerrors=0
mpool list= finished (vpack) nerrors=0
> quit
frigaut@frigaut-laptop:~/yorick-2.1/mpy$


System and package info:
Quote:
frigaut@frigaut-laptop:~/yorick-2.1/mpy$ mpich2version
MPICH2 Version: 1.2.1p1
MPICH2 Release date: Unknown, built on Fri Apr 2 05:16:59 UTC 2010
MPICH2 Device: ch3:nemesis
MPICH2 configure: --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --libexecdir=${prefix}/lib/mpich2 --disable-maintainer-mode --disable-dependency-tracking --disable-silent-rules --srcdir=. --enable-sharedlibs=gcc --prefix=/usr --enable-f90 --sysconfdir=/etc/mpich2 --includedir=/usr/include/mpich2 --docdir=/usr/share/doc/mpich2
MPICH2 CC: gcc -g -O2 -g -Wall -O2 -O2
MPICH2 CXX: c++ -g -O2 -g -Wall -O2 -O2
MPICH2 F77: gfortran -g -O2 -O2
MPICH2 F90: f95 -O2
frigaut@frigaut-laptop:~/yorick-2.1/mpy$ gcc --version
gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

frigaut@frigaut-laptop:~/yorick-2.1/mpy$ uname -a
Linux frigaut-laptop 2.6.32-19-generic #28-Ubuntu SMP Thu Apr 1 10:39:41 UTC 2010 x86_64 GNU/Linux
frigaut@frigaut-laptop:~/yorick-2.1/mpy$


This is on an (almost) fresh install of Ubuntu Lucid in a virtual machine.


Tue Apr 06, 2010 7:42 am
Yorick Master

Joined: Mon Nov 22, 2004 9:43 am
Posts: 354
Location: Livermore, CA, USA
Post 
As the document says, there is no standard way to invoke an MPI program. The recommendation in the MPI2 document is:

mpiexec -n #cpus program program_args

This seems to work under most standard unix distro mpich and openmpi installations, but you may very well need numerous additional switches to make any MPI program run correctly interactively -- it's part of the MPI policy of user hostility. For example, under SLURM, you need

srun -n #cpus -u -i0 program program_args

As Thibaut says, what used to be called "LAM" is now called "OpenMPI", and I've run on Ubuntu under both that and MPICH2. I didn't notice the CPUs running when it's not doing anything, but I may not have noticed if they were. You should find another MPI program to be sure this is not a feature of MPI rather than mpy. MPI is not designed to run on timesharing systems, so they may implement the blocking I/O during message passing by a simple polling loop. I do not know of any other interactive MPI programs, although I'm pretty sure python has an MPI wrapper. If you try that, post back here and let me know how it compares.
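
To make the polling-loop point concrete, here is a small Python experiment that has nothing to do with MPI or mpy internals. It measures how much CPU time a process burns while waiting 0.2 seconds for an event, once by spinning in a loop and once by blocking in the operating system. The function names (cpu_while_waiting, polling_wait, blocking_wait) are invented for this sketch, and whether any particular MPI library actually waits by polling is an assumption you would have to check for your installation.

```python
import threading
import time


def cpu_while_waiting(wait, delay=0.2):
    """Return the CPU seconds this process uses while wait(event) waits
    for an event that a timer sets after `delay` seconds."""
    event = threading.Event()
    timer = threading.Timer(delay, event.set)
    start = time.process_time()
    timer.start()
    wait(event)
    used = time.process_time() - start
    timer.join()
    return used


def polling_wait(event):
    # Spin until the "message" arrives: keeps a core busy the whole time.
    while not event.is_set():
        pass


def blocking_wait(event):
    # Sleep in the OS until the "message" arrives: uses almost no CPU.
    event.wait()
```

Calling cpu_while_waiting(polling_wait) should report CPU time close to the full delay, while cpu_while_waiting(blocking_wait) should report close to zero. A maxed-out CPU on idle ranks is what you would expect to see from the first style of waiting.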

Generally, people have been having a very hard time coaxing mpy to build and run, so you're not alone. By the way, MPI has been like this for 20 years and shows no signs of improving with age.


Wed Apr 14, 2010 6:15 pm