parallel vasp.4.6 on Opteron

Questions regarding the compilation of VASP on various platforms: hardware, compilers and libraries, etc.

guest

#1 Post by guest » Tue Dec 28, 2004 4:59 am

I am trying to install VASP 4.6 on an Opteron cluster using PGI 5.2 and the libgoto library. The serial version compiles and runs fine.
However, the parallel version (using MPICH 1.2.1 or 1.2.6) fails to run with the following error:

running on 1 nodes
0 - MPI_CART_CREATE : Invalid topology
[0] Aborting program !
[0] Aborting program!
p0_4921: p4_error: : 10

Has anyone been successful compiling parallel VASP on an Opteron cluster?
Thanks, Mark

matlgen
Newbie
Posts: 4
Joined: Mon Nov 22, 2004 11:30 pm

#2 Post by matlgen » Mon Jun 27, 2005 5:52 pm

I had exactly the same issue: the serial version of VASP compiled and ran fine on our Opteron cluster, but the parallel version gave the same error as yours. I experimented with the Makefile, and the change that seemed to work for me was to FFLAGS. Our Opterons are running in 32-bit mode right now (not our choice); are yours?

Anyway, the solution seems to be changing the FFLAGS to be:
FFLAGS=-Mfree -tp k8-32 -i4
I did this in both makefiles (i.e., the one in vasp.4.lib and the one in vasp.4.6). The -tp k8-32 flag targets the Opteron in 32-bit mode, and -i4 makes default INTEGERs 4 bytes long. I suspect the problem is that MPICH was compiled with 4-byte integers.
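If it helps to see the two choices separately: the FFLAGS line above combines the Opteron target mode (-tp k8-32 or -tp k8-64) with the default INTEGER size (-i4 or -i8), and the INTEGER size should match whatever MPICH was built with. Here is a minimal Python sketch of that; the helper name pgi_fflags is made up for illustration and is not part of VASP's build system.

```python
# Hypothetical helper: assembles the PGI FFLAGS line discussed in this thread.
TARGETS = {32: "k8-32", 64: "k8-64"}  # PGI -tp values for an Opteron
INT_FLAGS = {4: "-i4", 8: "-i8"}      # default Fortran INTEGER size


def pgi_fflags(mode_bits: int, int_bytes: int) -> str:
    """Return FFLAGS for free-form VASP sources on an Opteron.

    mode_bits: 32 or 64, the mode the OS runs the Opteron in.
    int_bytes: INTEGER size the MPI library was built with (4 or 8).
    """
    if mode_bits not in TARGETS:
        raise ValueError(f"unsupported mode: {mode_bits}-bit")
    if int_bytes not in INT_FLAGS:
        raise ValueError(f"unsupported INTEGER size: {int_bytes} bytes")
    return f"-Mfree -tp {TARGETS[mode_bits]} {INT_FLAGS[int_bytes]}"


if __name__ == "__main__":
    # 32-bit Opteron, MPICH with 4-byte INTEGERs (the setup described above)
    print("FFLAGS=" + pgi_fflags(32, 4))  # FFLAGS=-Mfree -tp k8-32 -i4
    # 64-bit OS, MPICH still built with 4-byte INTEGERs
    print("FFLAGS=" + pgi_fflags(64, 4))  # FFLAGS=-Mfree -tp k8-64 -i4
```

Whatever line you end up with, put the same one in both makefiles (vasp.4.lib and vasp.4.6), otherwise the library and the main code disagree on INTEGER size.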

Let us know if that solves your problem, or if you solved it another way.

David

saurabh
Newbie
Posts: 7
Joined: Thu Apr 07, 2005 6:05 am
Location: Calcutta .India

#3 Post by saurabh » Thu Jul 07, 2005 5:20 am

Hello,
I am also facing the same problem on Opteron machines. I have already installed a 64-bit OS on all the machines. I also changed the Makefile as Dave suggested, but then I get errors at link time. Can you suggest a way to run VASP in parallel on our Opteron cluster using the 64-bit OS itself? I am using the PGI Cluster Development Kit 5.2, the libgoto-opt64 library and fftw-3.0.1. What changes do I have to make in the Makefile to compile VASP in parallel on the 64-bit OS? Please help.

Regards,

Saurabh

saurabh
Newbie
Posts: 7
Joined: Thu Apr 07, 2005 6:05 am
Location: Calcutta .India

#4 Post by saurabh » Thu Jul 07, 2005 6:37 am

Hello all,

One more thing: I added only -i4, keeping the other switches in FFLAGS unchanged, and made the same change in vasp.4.lib. The previous error no longer appears. The error it shows now is:

*****************************************
POSCAR, INCAR and KPOINTS ok, starting setup
FFT: planning ... 1
p0_27854: p4_error: interrupt SIGSEGV: 11
Killed by signal 2.
/usr/pgi/linux86-64/5.2/bin/mpirun: line 1: 27854 Broken pipe /home/iacs/vasp/vasp.4.6/vasp -p4pg /home/iacs/TEST/test2/mailtoby/tial2/PI27770 -p4wd /home/iacs/TEST/test2/mailtoby/tial2

******************************************

Please tell me what is going wrong here. Thanks in advance.

Regards

Saurabh

cheathturner
Newbie
Posts: 1
Joined: Tue Jan 24, 2006 2:03 pm

#5 Post by cheathturner » Tue Jan 24, 2006 2:29 pm

Hello,

Has anyone been able to resolve this issue? I am having the same difficulties. The serial version compiles fine (PGF90 compiler v5.2, AMD Opteron, 64-bit SuSE Linux). However, after compiling with MPI (version 1.2.5.2) and running on 2 processors, I get the same message:

0 - MPI_CART_CREATE : Invalid topology
[0] Aborting program !
[0] Aborting program!
p0_2981: p4_error: : 10
Killed by signal 2.

If I add the '-i4' flag, I get a message similar to saurabh's:

running on 2 nodes
distr: one band on 1 nodes, 2 groups
vasp.4.6.28 25Jul05 complex
POSCAR found : 1 types and 50 ions
p0_5651: p4_error: interrupt SIGSEGV: 11
Killed by signal 2.

Any suggestions?

Heath

tjf
Full Member
Posts: 107
Joined: Wed Aug 10, 2005 1:30 pm
Location: Leiden, Netherlands

#6 Post by tjf » Tue Jan 24, 2006 7:51 pm

I would suggest getting another MPI library (such as OpenMPI) and building that on your machine, then rebuilding VASP. Or, indeed, rebuild MPICH yourself (I assume you're using a prebuilt library) so that you're sure what's been built and what it's been built with.

applelinux

#7 Post by applelinux » Thu Jun 08, 2006 11:25 pm

I had the same problem with parallel VASP. It was solved by changing FFLAGS=-Mfree -tp k8-64 -i8 to FFLAGS=-Mfree -tp k8-64 -i4. It looks like the default integer size is 4 bytes.

admin
Administrator
Posts: 2922
Joined: Tue Aug 03, 2004 8:18 am
License Nr.: 458

#8 Post by admin » Tue Jun 13, 2006 6:50 am

The problem seems to be the length of the integers in the MPICH installation. Please check the byte-length of integer numbers in your MPICH installation. VASP then has to be compiled with the same byte-length for integer numbers.
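One way to make this check routine is sketched below in Python (the function names are hypothetical, not part of VASP). It reads the last plain FFLAGS= line from a makefile, works out the INTEGER size it implies (PGI uses 4 bytes unless -i8 is given; if both -i4 and -i8 appear, this sketch simply takes the last one) and compares it with the MPI library's INTEGER size, which you still have to look up for your own installation.

```python
import re

# Matches a plain "FFLAGS = ..." assignment (not "+=", not commented out).
FFLAGS_RE = re.compile(r"^\s*FFLAGS\s*=\s*(.*)$")


def read_fflags(makefile_text: str) -> str:
    """Return the value of the last plain FFLAGS= line (line continuations not handled)."""
    value = ""
    for line in makefile_text.splitlines():
        m = FFLAGS_RE.match(line)
        if m:
            value = m.group(1).strip()
    return value


def fortran_int_bytes(fflags: str) -> int:
    """INTEGER size implied by PGI flags: 4 by default, last -i4/-i8 wins here."""
    size = 4
    for token in fflags.split():
        if token == "-i8":
            size = 8
        elif token == "-i4":
            size = 4
    return size


def integers_match(makefile_text: str, mpi_int_bytes: int) -> bool:
    """True if the makefile's INTEGER size equals the MPI library's."""
    vasp_bytes = fortran_int_bytes(read_fflags(makefile_text))
    if vasp_bytes != mpi_int_bytes:
        print(f"mismatch: VASP uses {vasp_bytes}-byte INTEGERs, "
              f"MPI uses {mpi_int_bytes}-byte INTEGERs")
        return False
    return True


if __name__ == "__main__":
    # Example makefile text with -i8 against an MPI built with 4-byte INTEGERs
    makefile = "FC=mpif90\nFFLAGS=-Mfree -tp k8-64 -i8\n"
    print(integers_match(makefile, 4))
```

Run it against both makefiles (vasp.4.lib and vasp.4.6); with the example text above it reports the same -i8 versus 4-byte mismatch described earlier in the thread.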

gollum

#9 Post by gollum » Wed Jul 05, 2006 1:18 pm

I have the same problem. I have tried both "FFLAGS=-Mfree -tp k8-64 -i8" and "FFLAGS=-Mfree -tp k8-64 -i4", but neither solves it. How can I check the byte length of integers in my MPI installation?

DavidCGreen

#10 Post by DavidCGreen » Sat Aug 05, 2006 5:20 am

Hi Gollum

I think you just need to look in the right header file(s) for your MPI installation.
(you'll probably need the LAM/MPICH/OpenMPI "development" package installed before you can find it)

For LAM, use the grep command and look for LAM_SIZEOF_ lines in the lam_config.h header file.

On Fedora Core 3 (64 bit) the file is /usr/include/lam_config.h
On Ubuntu (32 bit Dapper) the file is /usr/lib/lam/include/lam_config.h
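The grep above (for example `grep LAM_SIZEOF_ /usr/include/lam_config.h`) can also be scripted. Below is a minimal Python sketch, assuming the header uses plain `#define LAM_SIZEOF_<NAME> <bytes>` lines; the two candidate paths are the ones listed above, and the function names are made up for this example. The entry that matters for VASP is the one for the Fortran INTEGER type; its exact macro name may differ between LAM versions, so check your own header.

```python
import re
from pathlib import Path

# Assumed format: "#define LAM_SIZEOF_<NAME> <bytes>", possibly with spaces after '#'.
SIZEOF_RE = re.compile(r"^\s*#\s*define\s+(LAM_SIZEOF_\w+)\s+(\d+)")

# The two locations mentioned in this post (Fedora Core 3 and Ubuntu Dapper).
CANDIDATES = ("/usr/include/lam_config.h", "/usr/lib/lam/include/lam_config.h")


def lam_sizeofs(header_text: str) -> dict:
    """Collect every LAM_SIZEOF_* macro and its byte size from the header text."""
    sizes = {}
    for line in header_text.splitlines():
        m = SIZEOF_RE.match(line)
        if m:
            sizes[m.group(1)] = int(m.group(2))
    return sizes


def find_lam_config(candidates=CANDIDATES):
    """Return the first existing lam_config.h path, or None."""
    for path in candidates:
        p = Path(path)
        if p.is_file():
            return p
    return None


if __name__ == "__main__":
    header = find_lam_config()
    if header is None:
        print("lam_config.h not found; is the LAM development package installed?")
    else:
        for name, size in sorted(lam_sizeofs(header.read_text()).items()):
            print(f"{name} = {size}")
```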

I'm looking to install VASP on a Sun Microsystems V20z (Opteron) cluster, and using it in parallel mode via MPI will be essential.

Cheers
David
