Hi VASP developers,
I hope you are doing well.
I have installed the GPU version of VASP and get a 12x speedup over the CPU-only version for my 82-atom system. I would like to tune the settings to make full use of my cluster, especially as I plan to scale to larger systems (100 Pt atoms).
I have access to 4 nodes, each with 2 GPUs and 40 CPU cores. I am currently running a geometry optimization on a single node using both GPUs, with NTASKS=2 and 16 CPUs per task. I use k-point parallelization with KPAR=2 and have left NPAR at its default. Given this setup, is there a way to further increase per-node performance with the available resources? Which parameters (e.g., NCORE, NPAR, or others) should I adjust first for benchmarking and better scaling?
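For concreteness, the single-node setup described above would correspond roughly to a SLURM job script like the one below. This is only a sketch, not the actual script: the use of SLURM, the `--gres` GPU request, the `srun` launcher, the `vasp_std` executable name, and the use of the 16 CPUs per task as OpenMP threads are all assumptions.

```bash
#!/bin/bash
# Sketch of the single-node run described above (assumed SLURM syntax).
#SBATCH --nodes=1
#SBATCH --ntasks=2            # NTASKS=2, i.e. one MPI rank per GPU
#SBATCH --cpus-per-task=16    # 2 x 16 = 32 of the node's 40 cores
#SBATCH --gres=gpu:2          # assumption: GPUs are requested via --gres

# Assumption: the CPUs assigned to each rank are used as OpenMP threads.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Assumption: the GPU build is installed as vasp_std and launched with srun.
srun vasp_std

# Relevant INCAR setting for this run:
#   KPAR = 2      (NPAR not set, so the default applies)
```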
Thank you for your guidance. I am happy to provide more information if needed.