Keeping output files small for long (10 ns) MLMD runs

jelle_lagerweij
Newbie
Posts: 15
Joined: Fri Oct 20, 2023 1:13 pm

Keeping output files small for long (10 ns) MLMD runs

#1 Post by jelle_lagerweij » Tue Mar 19, 2024 9:29 am

Dear all,
I have a query regarding VASP input parameters for my molecular dynamics (MD) simulations. While utilizing machine learning-enhanced MD (MLMD) has accelerated my simulations, I'm facing performance limitations due to excessive write actions on my cluster and large file sizes which slow down post processing.

For post-processing, I primarily focus on extracting data from the vaspout.h5 file: atomic positions, atomic forces [1], system stresses, system energies (kinetic, potential, total), and temperatures. I need these properties only about every 10 time steps, as sampling them more often is excessive.

In exploring the NWRITE and NBLOCK options in the VASP documentation, I found that these primarily address interaction with OUTCAR or XDATCAR files, which are not directly related to the vaspout.h5 file. However, adjusting the NBLOCK parameter does indeed impact the vaspout.h5 file, controlling the frequency of data recording.

Despite setting NBLOCK, I noticed that forces are still stored at every time step, so the NBLOCK setting does not affect force recording. This behaviour agrees with the VASP documentation, so I do not consider it a bug. However, for file-size management and for reducing the number of write actions, it would be valuable if forces could be thinned out in the same way as positions.

In conclusion, I seek clarification on how to retrieve these properties with the desired spacing using additional INCAR tags. My focus lies on positions and forces, while the others are either less computationally demanding or required at every time step.

Kind regards,
Jelle Lagerweij

[1] Utilizing the force radial distribution function (RDF) by Borgis et al. and others can significantly improve statistical accuracy, particularly for systems with limited simulation time. Although this may not directly impact MLMD, it serves as a benchmark for comparison between different simulation methods. For further details, refer to: https://doi.org/10.1080/00268976.2013.838316 or https://doi.org/10.1103/PhysRevLett.120.218001. Implementing this RDF calculation method in VASP itself, writing to PCDAT on the fly, is not too difficult and might be of great value for people doing expensive and short AIMD simulations.

jelle_lagerweij
Newbie
Posts: 15
Joined: Fri Oct 20, 2023 1:13 pm

Re: Keeping output files small for long (10 ns) MLMD runs

#2 Post by jelle_lagerweij » Tue Mar 19, 2024 9:51 am

Please disregard this post entirely.
I see that I was looking in the wrong place: for this, I should use ML_OUTBLOCK and ML_OUTPUT_MODE.
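For my own reference, something along these lines should do it (a minimal fragment; the same values I end up using in my INCAR below):

Code: Select all

ML_OUTBLOCK = 10       ! write MD output only every 10th step
ML_OUTPUT_MODE = 0     ! reduce/disable some of the remaining ML output (see the wiki)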
Kind regards,
Jelle

PS: It might be good to mention these two tags more prominently on the NBLOCK and NWRITE documentation pages. I believe that it would result in fewer questions on the forum.

Again, my bad for this useless question.

jelle_lagerweij
Newbie
Posts: 15
Joined: Fri Oct 20, 2023 1:13 pm

Re: Keeping output files small for long (10 ns) MLMD runs

#3 Post by jelle_lagerweij » Tue Mar 19, 2024 1:03 pm

Oh, and in the end I still found some strange behavior while using ML_OUTBLOCK and ML_OUTPUT_MODE.
When I use the INCAR shown in the code block at the end of this post (it's a simple one), I intend to get my data every 10 time steps (ML_OUTBLOCK = 10) in a simulation of 10000 time steps (NSW = 10000). I looked into my vaspout.h5 data (I can do this with py4vasp, h5py, or, as in this case, with the HDFView 3.3.1 program to show it visually for you :)). What I see is a bit weird: the position array has a size of (NSW/10, N_atoms, 3), as expected. However, my forces array still has the size (NSW, N_atoms, 3), and my energies array has the size (NSW, 6) (6 for all the energy types plus the temperature). I took a screenshot of the bottom rightmost part of these arrays, see the attachment. So while the position array size is adjusted for ML_OUTBLOCK, the forces and energies arrays still hold all the time steps, but are only filled every 10 time steps (the rest contains zeros). I believe that both approaches are fine from a VASP performance standpoint, and that the HDF5 compression handles the zeros very efficiently. However, when post-processing these files I need to load all the forces, and I prefer to do this at once so that they are available in my RAM. I create a needlessly large array when I use the following code block:

Code: Select all

from py4vasp import Calculation   # py4vasp interface to vaspout.h5

path = 'path/to/run'              # placeholder: directory containing vaspout.h5
df = Calculation.from_file(path + '/vaspout.h5')
data_s = df.structure[:].to_dict()   # positions of all stored steps
data_f = df.force[:].to_dict()       # forces of all stored steps
I believe that I can solve this by slicing my forces smartly, as I need every 10th item of this list, starting with item 10 (Python starts counting at 0, hence the 9):

Code: Select all

df.force[9::10].to_dict()   # every 10th stored step, starting at step index 9
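And something similar should work for the energies; a sketch, assuming py4vasp's energy interface accepts the same kind of step slice:

Code: Select all

df.energy[9::10].to_dict()   # every 10th step of the energies (if step slicing is supported)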
However, I must note that this difference in behavior is confusing from a user perspective. I think it would be good to make ML_OUTBLOCK behave consistently for all properties. For completeness, the INCAR I used is:

Code: Select all

# Comment in INCAR

# Setting the type of calculation (0 = (AI)MD, 1 2 3 = relaxation method)
IBRION = 0             ! set to MD
ISYM = 0               ! no symmetry
LCHARG = .FALSE.        ! do not write CHGCAR/CHG, to save disk space
LWAVE = .FALSE.         ! do not write WAVECAR, to save disk space

#############################################################
# Setting the MD properties
POTIM = 0.5               ! timestep size in fs
NSW = 10000             ! number of timesteps set

# Setting the thermostat
MDALGO = 2           ! Nosé-Hoover
SMASS = 5            ! Nosé-Hoover thermostat mass

TEBEG = 325             ! slightly elevated temperature
TEEND = 325             ! slightly elevated temperature
ISIF = 2                ! Keep the cell shape and volume fixed, however update particle positions
#############################################################

#############################################################
# Turn on machine learning to see what happens
ML_LMLFF  = .TRUE.
ML_MODE = run
ML_OUTPUT_MODE = 0  # reduce output for long runs
ML_OUTBLOCK = 10  # reduce output for long runs

#############################################################
# Additional settings for vasp efficiencies
# Additional settings for vasp efficiencies while reselecting
# NCORE = 4
# LREAL= Auto
KSPACING = 200

jelle_lagerweij
Newbie
Posts: 15
Joined: Fri Oct 20, 2023 1:13 pm

Re: Keeping output files small for long (10 ns) MLMD runs

#4 Post by jelle_lagerweij » Tue Mar 19, 2024 1:16 pm

I forgot to add the screenshot...
Now it is added.
screenshot of hdf5 file.PNG

jelle_lagerweij
Newbie
Posts: 15
Joined: Fri Oct 20, 2023 1:13 pm

Re: Keeping output files small for long (10 ns) MLMD runs

#5 Post by jelle_lagerweij » Wed Mar 20, 2024 12:02 pm

Just to showcase what goes wrong when the force data is stored the way it currently is.
When I unpack the positions and then the forces, things go wrong with my memory. Decompressing takes additional memory; for the positions this is fine. In the attached graph the memory allocation is indicated in red, and you can clearly see that after loading the position array the allocation drops a bit again. Note that I show the memory allocation in %, and the total amount of memory on this machine is 32 GB.

Now, if the vaspout.h5 positions and forces arrays were stored consistently, we would expect the same memory overhead and the same memory increase when unpacking the forces array. However, what actually happens is that the zeros in the array are decompressed and stored as well during the unpacking. My Python code then slices the right time steps out of it, but by then it is already too late. When the memory usage reaches 70%, swap is activated and part of my RAM allocation is moved to my hard disk (in this case a fast 1 TB NVMe drive). We can see this as the Disk Write Bytes/sec (in green) moves up and disappears off the screen. I checked, and at that point the running Python instance had allocated over 40 GB in combined RAM+swap before I cancelled it. This is not surprising, as reading the position array alone allocated approximately 7 GB of RAM (x10 would result in 70 GB, but I cancelled the calculation before it got that far).

Again, the code I used was:

Code: Select all

import h5py
from py4vasp import Calculation

path = 'path/to/vaspout.h5'
self.df = Calculation.from_file(path)
skips = h5py.File(path)['/input/incar/ML_OUTBLOCK'][()]  # read ML_OUTBLOCK automatically (probably also possible with py4vasp, but I used h5py here)
data_s = self.df.structure[:].to_dict()            # read the position array
data_f = self.df.force[skips-1::skips].to_dict()   # read the forces array, slicing directly
I assume that what goes wrong is that the slicing only happens after the array has been fully decompressed. Is there some h5py approach where I read only one chunk, slice it, and then read the next chunk, until I have the full forces array? And/or would it be possible to change the VASP source code so that the forces (and then immediately the other properties as well) are written more consistently?
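Something along the lines of the sketch below is what I have in mind (untested, and I am assuming the forces dataset sits under /intermediate/ion_dynamics/forces; the exact path may differ):

Code: Select all

import h5py

path = 'path/to/vaspout.h5'
with h5py.File(path, 'r') as f:
    skips = int(f['/input/incar/ML_OUTBLOCK'][()])
    # Slicing the h5py dataset itself (not a NumPy array) returns only the
    # selected steps, so the zero-filled intermediate steps are never kept
    # in memory; the compressed chunks are decompressed on the fly as needed.
    # Note: dataset path assumed here, check e.g. with HDFView.
    forces = f['/intermediate/ion_dynamics/forces'][skips - 1::skips]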
Kind regards,
Jelle

PS: my memory allocation graph over time.
memory_too_much.gif

andreas.singraber
Global Moderator
Posts: 235
Joined: Mon Apr 26, 2021 7:40 am

Re: Keeping output files small for long (10 ns) MLMD runs

#6 Post by andreas.singraber » Wed Mar 20, 2024 12:52 pm

Dear Jelle Lagerweij,

thank you for your very detailed report! I am sorry it was not clear from the Wiki that ML_OUTBLOCK is preferable to NBLOCK in the case of MLMD-only runs. I will add a note on the Wiki page of NBLOCK to make this information more visible.

Also, you are 100% correct that there were still issues with the ML_OUTBLOCK tag up until VASP 6.4.2. Indeed, the "sparsification" of per-time-step data was not applied to all output files. We discovered this recently and it should be fixed in the latest version (6.4.3, which was actually released yesterday ;D)! I hope this little overview of the effects of the tags NBLOCK and ML_OUTBLOCK (in 6.4.2 vs. 6.4.3) makes it clearer:

Code: Select all

| output property          |   NBLOCK    |   ML_OUTBLOCK   |
|                          |             | 6.4.2  | 6.4.3  |
|--------------------------|-------------|-----------------|
|   screen output          |   no        |  yes   |  yes   |
|   OSZICAR                |   no        |  yes   |  yes   |
|   OUTCAR                 |   no        |  no    |  yes   |
|   XDATCAR                |   yes       |  yes   |  yes   |
|   PCDAT              (1) |   yes       |  yes   |  yes   |
|   REPORT                 |   yes       |  yes   |  yes   |
|   ML_LOGFILE             |   no        |  yes   |  yes   |
|   ML_HEAT                |   no        |  no    |  yes   |
|   ML_EATOM               |   no        |  no    |  yes   |
|   vasprun.xml            |             |                 |
|   - structure        (1) |   no        |  no    |  yes   |
|   - forces           (1) |   no        |  no    |  yes   |
|   - stress           (1) |   no        |  no    |  yes   |
|   - time                 |   no        |  no    |  yes   |
|   - energy               |   no        |  no    |  yes   |
|   vaspout.h5             |             |                 |
|   - energies             |   no        |  no*   |  yes   |
|   - forces               |   no        |  no*   |  yes   |
|   - ion_velocities       |   yes       |  yes   |  yes   |
|   - lattice_vectors      |   yes       |  yes   |  yes   |
|   - position_ions        |   yes       |  yes   |  yes   |
|   - stress               |   no        |  no*   |  yes   |
|   - pair_correlation (1) |   yes       |  yes   |  yes   |

yes ... TAG controls output frequency
no .... Output is written every step
* ..... Zeros are written for intermediate steps
(1) ML_OUTPUT_MODE = 0 can disable output completely

I am very sorry for the inconvenience caused by this buggy INCAR tag and I hope that we really covered all issues in the latest release. It would be great if you could try with VASP 6.4.3 and verify that the memory issues are gone. Please let us know if you still find some erroneous behavior, thank you!

All the best,
Andreas Singraber

jelle_lagerweij
Newbie
Posts: 15
Joined: Fri Oct 20, 2023 1:13 pm

Re: Keeping output files small for long (10 ns) MLMD runs

#7 Post by jelle_lagerweij » Thu Mar 21, 2024 8:40 am

Hi Andreas,
Thanks a lot. I have not had a bug report where the reply was “Solved yesterday”, that is pretty awesome. I am at a conference today and will see if I can get 6.4.3 compiled and running. Otherwise, I will check this tomorrow.
Regards,
Jelle

PS: I have limited storage space on my cluster (even on scratch). At the moment I delete the .xml immediately after running, and I imagine others do the same with the .h5 file. Would it be possible to create a tag which turns certain output files on/off? That would save writing (and compression) time and effort, as well as reduce the storage requirements during the simulations.

PPS: I love the overview. It would be perfect if you showed it on the VASP wiki pages for both NBLOCK and ML_OUTBLOCK. I think that this would help quite a lot of future users.

jelle_lagerweij
Newbie
Posts: 15
Joined: Fri Oct 20, 2023 1:13 pm

Re: Keeping output files small for long (10 ns) MLMD runs

#8 Post by jelle_lagerweij » Fri Mar 22, 2024 8:44 am

Dear Andreas,
I found out that my research group has no license for VASP 6.4.3 (although we have one for 6.4.2). I find buying a new license for what is (in my case) mostly a bugfix a bit over the top, but I will ask my promotor to help out and see if he can update the group license.
Regards,
Jelle

andreas.singraber
Global Moderator
Posts: 235
Joined: Mon Apr 26, 2021 7:40 am

Re: Keeping output files small for long (10 ns) MLMD runs

#9 Post by andreas.singraber » Tue Mar 26, 2024 3:55 pm

Dear Jelle,

I hope you can get access to the latest VASP version eventually; it contains many bug fixes and new features, not just the ML_OUTBLOCK fix.

However, to get this up and running for the moment, I suggest the following mini-fix: in your VASP 6.4.2 source code, go to lines 3464, 3465 and 3849 of main.F, replace NSTEP by NSTEP/LBLOCK_HELP, and recompile the code. This will get rid of the annoying zeros for energies, stresses and forces at all the intermediate steps in the vaspout.h5 file. Of course, all the other output in the table will remain unfixed, but at least it may allow you to process the HDF5 file without running out of memory. So you will still need to turn off output with ML_OUTPUT_MODE = 0 and cope with large OUTCAR files. Please note that I only made a very crude test to see if this works... I highly recommend updating to the latest VASP version.

All the best,
Andreas Singraber
