Report on NEB VASP installation

Questions regarding the compilation of VASP on various platforms: hardware, compilers and libraries, etc.

pavel
Newbie
Posts: 20
Joined: Tue Oct 17, 2006 2:08 pm
License Nr.: FA513901
Location: Karlsruhe, Germany

Report on NEB VASP installation

#1 Post by pavel » Wed Jun 27, 2007 12:10 pm

There is a contradiction in the VASP manual:
(i) p.23 section 3.5.16 MPI_CHAIN says: "In this case each image must run on one and only one node ..."
(ii) p.81 section 6.54 Elastic band method says "The number of nodes must be dividable by the number of images (...). VASP divides the nodes in groups, and each group then works on one "image"."

After finding this contradiction I was not sure which precompiler flag should be selected: NGXhalf (serial version) or NGZhalf (parallel version). By trial and error I found that the NGZhalf flag should be used.
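
For reference, the relevant CPP line of the makefile then looks roughly like this (a sketch only; "MyHost" and the trailing flags stand for the platform-specific settings of the makefile, with -DMPI replaced by -DMPI_CHAIN for the chain version):

Code: Select all

CPP    = $(CPP_) -DMPI -DNGZhalf -DHOST=\"MyHost\" \
         ... (remaining platform-specific flags unchanged)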

Dear VASP admin, it would be useful to make appropriate changes in the documentation.

The following problems were encountered during preprocessing and compilation:
1) The file "symbol.inc" contains the following macro

Code: Select all

#define STOP CALL M_exit; STOP
which is defined in the #elif defined(MPI_CHAIN) section.
This macro expands recursively, since the STOP in its replacement text is expanded again and again.
The solution is simple and is already used in the other sections of "symbol.inc":
#define STOP CALL M_exit; stop
The preprocessor appears to be case sensitive and does not expand the lowercase stop again.
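
To illustrate (my own sketch, not taken from the VASP sources), a plain STOP statement in the Fortran source is then expanded as follows:

Code: Select all

! original statement in the Fortran source:
      STOP
! one preprocessing pass with "#define STOP CALL M_exit; stop" yields:
      CALL M_exit; stop
! the lowercase "stop" does not match the macro name STOP, so the expansion terminates;
! the Fortran compiler is case insensitive and still reads it as a STOP statement.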

2) Compilation errors occurred in pardens.f. The reason was that some parts of the code, which should be compiled when the MPI flag is defined, were skipped when only the MPI_CHAIN flag was defined.
My solution was not very elegant: I replaced blocks of the form

Code: Select all

 #ifdef MPI
...
#endif
with

Code: Select all

 #ifdef MPI
...
#elif defined(MPI_CHAIN)
... (here the same block of lines repeated as above)
#endif
In this way the declarations and initializations of some variables are included for both the MPI and the MPI_CHAIN flag.
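
A less repetitive alternative (just a sketch, assuming the preprocessor supports the usual defined() operator) would be to guard each block once for both flags:

Code: Select all

#if defined(MPI) || defined(MPI_CHAIN)
...
#endif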
Last edited by pavel on Wed Jun 27, 2007 12:10 pm, edited 1 time in total.

admin
Administrator
Posts: 2922
Joined: Tue Aug 03, 2004 8:18 am
License Nr.: 458

Report on NEB VASP installation

#2 Post by admin » Thu Jun 28, 2007 9:36 am

The usage of NGXhalf and NGZhalf is clearly defined: the NEB method requires the parallel executable, hence NGZhalf has to be used.
The usage of MPI_CHAIN is also straightforward: if you intend to use exactly as many processors as IMAGES in the INCAR, vasp runs faster if you compile with -DMPI_CHAIN. If you intend to use more than one processor per image (i.e. the number of processors is an integer multiple of the number of images), -DMPI has to be used.
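
As a concrete example (a sketch only; the launch command depends on your MPI installation, and the executable name vasp is assumed), with IMAGES = 4 set in the INCAR:

Code: Select all

mpirun -np 4 vasp    # one processor per image: the -DMPI_CHAIN executable is sufficient (and faster)
mpirun -np 8 vasp    # two processors per image: the -DMPI executable has to be used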
Last edited by admin on Thu Jun 28, 2007 9:36 am, edited 1 time in total.

pavel
Newbie
Posts: 20
Joined: Tue Oct 17, 2006 2:08 pm
License Nr.: FA513901
Location: Karlsruhe, Germany

Report on NEB VASP installation

#3 Post by pavel » Tue Jul 03, 2007 2:52 pm

Thanks for your clarification. If it were written this way in the manual, there would be no problems with understanding it.
Now another problem has been discovered: the executable compiled with the MPI_CHAIN flag failed with the following errors:
* 253 Invalid operation PROG=m_alltoallv_z ELN=1047(40005df04)
* 252 Floating-point zero divide PROG=fexcg_ ELN=357(4001e2e80)
...
**** 99 Execution suspended PROG=fexcg_ ELN=357(4001e362c)
Called from xcgrad.fexcg ELN=188(4001deec8)
Called from pot.potlok ELN=269(4002f5d18)
Called from elmin ELN=352(40052a404)
Called from vamp ELN=2337(4000208b0)

It looks like there is a division by zero in fexcg_. Do you have any suggestions on how to fix it?

On the other hand, the executable compiled with NGZhalf and MPI runs the same NEB job without errors. Only ISYM=0 had to be set, to avoid problems with one of the configurations. However, this version is extremely slow in comparison with a simple relaxation job:
The starting and ending points for the NEB were relaxed using 4 CPUs in 1.15 h (convergence reached). This means that on 1 CPU the relaxation would require about 4.6 h. Taking interprocess communication into account, one can estimate the time for the NEB job with 4 intermediate configurations (one configuration per CPU) to be about 6-7 h.
The task has now been running for 3 days and is not finished yet. The intermediate results look OK, however, and the calculation is converging. The problem is that each relaxation step now takes about 10 times more time (500 s x 4 CPUs = 2000 CPU-seconds per ionic relaxation step, compared with 16000 s per relaxation step for the NEB job).
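
A rough side-by-side comparison of the numbers above (my own estimate):

Code: Select all

simple relaxation : 500 s/step x 4 CPUs =  2000 CPU-s per ionic step
NEB, 4 images     :                       16000 s per ionic step
ratio             : 16000 / 2000 = 8, i.e. roughly an order of magnitude slower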
Is this normal behavior? Would you expect the MPI_CHAIN executable to be faster?
Last edited by pavel on Tue Jul 03, 2007 2:52 pm, edited 1 time in total.
