Problem executing in Parallel

Posted: Fri May 05, 2006 11:15 am
by Franky
Hello everybody!

I am running vasp on a 64-bit Linux cluster (2 Opteron CPUs per node). The compiler is pgf90 5.2-4, the MPI version is MPICH2-1.0.3, and the vasp version is 4.6.28.

The serial version of vasp runs fine.

The parallel version compiles and links against MPICH2, which I compiled myself using the configure options suggested in the vasp Makefile. The header mpif.h is copied to the vasp build directory. (Do I need to convert it to F90? Where is the tool "Convert"? It seems to work this way, though.) For testing purposes, however, I turned optimization off (vasp and MPICH2 with -O0). Compiling with -O3 makes no difference, though.

Makefile setting:
FC=pgf90
FCL=mpif90
SCA=

After booting the MPI environment (mpdboot -f hosts) with just the local server (= 2 CPUs), I try to start vasp in parallel (mpiexec -np 2 ./vasp) with the following INCAR:
------------------------------------
System = Bulk-Au (fcc)
LPLANE = .TRUE.
NPAR = 2
LSCALU = .FALSE.
NSIM = 2

IALGO = 48
NBANDS = 8
ISMEAR = 1
SIGMA = 0.40
NELM = 5
-----------------------------------
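
The INCAR above sets NPAR = 2 for a run on 2 MPI ranks. Below is a minimal sketch for sanity-checking that combination before launching. It assumes that NPAR is expected to divide the number of ranks evenly, which is an assumption about how vasp distributes bands and is not stated in this thread. `parse_incar` and `check_npar` are hypothetical helper names, and the parser only handles one "KEY = value" pair per line.

```python
INCAR_TEXT = """\
System = Bulk-Au (fcc)
LPLANE = .TRUE.
NPAR = 2
LSCALU = .FALSE.
NSIM = 2

IALGO = 48
NBANDS = 8
ISMEAR = 1
SIGMA = 0.40
NELM = 5
"""


def parse_incar(text):
    """Parse simple 'KEY = value' lines, skipping blanks and !/# comments."""
    tags = {}
    for line in text.splitlines():
        # Drop trailing comments, then surrounding whitespace.
        line = line.split("!")[0].split("#")[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        tags[key.strip().upper()] = value.strip()
    return tags


def check_npar(tags, n_ranks):
    """Return True if NPAR is absent or divides n_ranks evenly (assumed rule)."""
    if "NPAR" not in tags:
        return True  # nothing to check
    npar = int(tags["NPAR"])
    return npar >= 1 and n_ranks % npar == 0


tags = parse_incar(INCAR_TEXT)
npar_ok_for_two_ranks = check_npar(tags, 2)
```

For the settings in this post the check passes, so under that assumption the NPAR value itself is not an obvious culprit.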

That produces the following output on stdout:

[cli_0]: aborting job:
Fatal error in MPI_Cart_sub: Invalid communicator, error stack:
MPI_Cart_sub(194): MPI_Cart_sub(MPI_COMM_NULL, remain_dims=0xadda80, comm_new=0xd0f300) failed
MPI_Cart_sub(76).: Null communicator
[cli_1]: aborting job:
Fatal error in MPI_Cart_sub: Invalid communicator, error stack:
MPI_Cart_sub(194): MPI_Cart_sub(MPI_COMM_NULL, remain_dims=0xadda80, comm_new=0xd0f300) failed
MPI_Cart_sub(76).: Null communicator
rank 1 in job 7 rzcluster.rz.uni-kiel.de_43986 caused collective abort of all ranks
exit status of rank 1: return code 13

Does anybody know what went wrong?
I appreciate your help.
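
The error stack above shows MPI_Cart_sub being handed MPI_COMM_NULL on both ranks. MPI_Cart_sub expects a communicator with a Cartesian topology, normally the result of an earlier MPI_Cart_create, and the MPI standard lets MPI_Cart_create return MPI_COMM_NULL to processes that do not fit into the requested grid. The sketch below models only that one rule in plain Python (no MPI calls are made). `ranks_left_out` is a hypothetical helper name, and whether this rule is what triggered the abort here is not established in this thread.

```python
from math import prod


def ranks_left_out(n_ranks, dims):
    """Number of processes that would get MPI_COMM_NULL from MPI_Cart_create.

    n_ranks: size of the communicator passed in.
    dims:    requested extent of the grid in each dimension.
    """
    grid_size = prod(dims)
    if grid_size > n_ranks:
        # The MPI standard treats a grid larger than the group as erroneous.
        raise ValueError("grid is larger than the communicator")
    # Processes beyond the grid size do not take part in the new communicator.
    return n_ranks - grid_size
```

For a 2-rank job with a 2 x 1 or 1 x 2 grid the result is 0, so this rule by itself would not leave both ranks with a null communicator. That hints that the grid dimensions or the communicator handle reaching the MPI library differ from what was intended, but this is an inference only.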

Posted: Mon May 08, 2006 10:09 am
by alex
Hi Franky,

Have you got a 64-bit MPICH executable?

Hth
Alex

Posted: Mon May 08, 2006 12:39 pm
by Franky
Hi alex,

compilation parameters are:
F77='pgf77 -Mx,119,0x200000'
F90='pgf90 -Mx,119,0x200000'
FFLAGS='-O0 -tp k8-64 -i8'
F90FLAGS='-O0 -tp k8-64 -i8'
-> export F77 F90 FFLAGS F90FLAGS
-> ./configure --prefix=... --without-romio --without-mpe
-> make && make install

I guess this should give me a 64-bit MPICH executable.
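
Rather than guessing, the word size of a linked binary can be read from its ELF header: byte 4 (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects. The following is a minimal sketch of that check; `elf_class` is a hypothetical helper name. It applies to ELF files such as the linked ./vasp executable, not to static .a archives, which use a different container format.

```python
def elf_class(path):
    """Return 32 or 64 for an ELF file, or None if the file is not ELF."""
    with open(path, "rb") as f:
        header = f.read(5)
    # Every ELF file starts with the magic bytes 0x7f 'E' 'L' 'F'.
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    # EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64; anything else maps to None.
    return {1: 32, 2: 64}.get(header[4])
```

Calling `elf_class("./vasp")` in the build directory would then report whether the parallel executable is a 64-bit object.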

Franky

Posted: Mon May 15, 2006 2:57 am
by lahaye
Hello,

I ran into the same or similar MPI errors on a 32-bit Linux
cluster. I then tried to compile everything against
MPICH-1, which seemed to work fine.
Therefore I never tried to find a solution for the errors
with MPICH2.

Rob.

Posted: Thu May 18, 2006 11:29 am
by Franky
Hi,
Which version of MPICH1 did you use? I also tried this and got an error concerning 'MAPSET'.
Could you give me your compiler options for MPICH1 and the way you started the MPI environment?
Thank you.