FEAP User Forum

FEAP => Parallel FEAP => Topic started by: JStorm on May 23, 2019, 07:37:49 AM

Title: Main memory at ParFEAP 8.4
Post by: JStorm on May 23, 2019, 07:37:49 AM
I have observed that the allocated main memory for each process increases (perfectly) linearly during the simulation.

Moreover, if I use the restart feature to continue the simulation after half of the time steps, the required main memory seems to start again at the initial amount. It then increases linearly, with the same slope as before. Thus, less memory is required overall.

It seems to be a memory leak. Has anyone observed this behaviour too?

Increasing memory is very disruptive in HPC simulations. It would be really nice to identify the reason and to overcome this.

So far I have performed simulations with a user material and a user macro. Neither allocates dynamic arrays. I will soon create and provide a benchmark that uses standard ParFEAP features only.

Title: Re: Main memory at ParFEAP 8.4
Post by: JStorm on May 23, 2019, 07:51:28 AM
I have further plotted the size of the dynamic FEAP arrays via "show dict". Nothing changes during the simulation.

Maybe PETSc's memory consumption is increasing? Is there a simple way to get some memory information out of PETSc?
Title: Re: Main memory at ParFEAP 8.4
Post by: Prof. S. Govindjee on May 23, 2019, 05:02:37 PM
There are some command-line options to see memory information in PETSc; see the PETSc manual. There are probably also some subroutines one can call inside the code to get PETSc memory information. Perhaps we can set up a user macro to call these and then figure out where the memory is getting bumped. Lastly, valgrind may reveal the issue.

When you get your pure FEAP case demonstrating the problem, we will also test on our end.
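
For orientation, a query routine of that sort might look roughly like the sketch below. This is a minimal sketch, not FEAP code: it assumes the PETSc Fortran bindings PetscMemoryGetCurrentUsage/PetscMallocGetCurrentUsage, and the include path depends on the PETSc version. (Command-line options such as -malloc_dump, which reports unfreed PetscMalloc memory at finalize, can also help.)

      subroutine pmeminfo()
c     Sketch only: report memory usage as seen by PETSc.  The
c     include path below depends on the PETSc version in use.
#include "petsc/finclude/petscsys.h"
      implicit none
      PetscLogDouble  rss, mal
      PetscErrorCode  ierr
c     Resident set size of the calling process, in bytes
      call PetscMemoryGetCurrentUsage(rss, ierr)
c     Bytes currently obtained through PetscMalloc
      call PetscMallocGetCurrentUsage(mal, ierr)
      write(*,*) ' RSS =', rss, '  PetscMalloc =', mal
      end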
Title: Re: Main memory at ParFEAP 8.4
Post by: JStorm on May 27, 2019, 04:43:32 AM
I have figured out that the memory leak only occurs for simulations with many iterations per time increment (>50). For a usual number of iterations (<10) I could not measure a significant increase in memory. The memory is not released at the end of the time increment, but grows over the whole simulation.
Title: Re: Main memory at ParFEAP 8.4
Post by: Prof. S. Govindjee on May 27, 2019, 12:50:29 PM
Thanks for the files. I'll give them a try. Are the parameters set to produce the error? At what time step will it start? At what iteration?
Also, what are you monitoring that is showing you the memory growth?
Title: Re: Main memory at ParFEAP 8.4
Post by: JStorm on May 27, 2019, 10:59:06 PM
The parameters as given have shown the memory leak for me. The memory increases linearly and continuously over the simulation. I guess that the memory grows slightly with each iteration; thus, with many iterations the effect becomes visible.

I have studied the resources with the profiling feature of my HPC system (based on the SLURM job manager). In this way, I can analyse the main memory usage of each MPI process. I can also provide the profile file (*.H5), if this is helpful for you.

Thank you for your help.
Title: Re: Main memory at ParFEAP 8.4
Post by: Prof. S. Govindjee on May 29, 2019, 03:40:18 PM
I agree there is a growing resident memory footprint.
I have determined that the growth is not due to memory mallocs from PETSc;
rather, it is somewhere else in the Fortran memory system.

PETSc's malloc space is constant at 3307888 bytes, but the total grows from 45436928 bytes to 107515904 bytes. [Note: I changed ns=100 to ns=500 to have 410 steps.]

One comment about the input files that you posted: the Iplate file has a tie command in it. You should first run OUTMesh to generate a flat file and then use that to generate your partitioned input files.

The question now is why this is occurring. Does it have to do with Fortran not performing memory cleanup when it should? Or is it because we have let something get lost? I'll try to dig some more.

I am attaching a UMACro command that you can call to have the data printed to the output file. The command is called PMEMory.
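
Purely for orientation (the attached file is the authoritative version), a macro of this kind might look roughly like the following hypothetical sketch. The umacrN argument list differs between FEAP versions, and the PETSc calls and include path are the same assumptions as in the query-routine sketch earlier in the thread:

      subroutine umacr9(lct,ctl)
c     Hypothetical sketch of a PMEMory-style user macro: prints
c     the memory usage reported by PETSc to the output file (iow).
#include "petsc/finclude/petscsys.h"
      implicit   none
      include   'iofile.h'
      include   'umac1.h'
      character  lct*15
      real*8     ctl(3)
      logical    pcomp
      PetscLogDouble  rss, mal
      PetscErrorCode  ierr
      if(pcomp(uct,'mac9',4)) then
c       Register the command name used in the solution script
        uct = 'pmem'
      else
c       Report resident set size and PetscMalloc usage in bytes
        call PetscMemoryGetCurrentUsage(rss, ierr)
        call PetscMallocGetCurrentUsage(mal, ierr)
        write(iow,*) ' RSS =', rss, '  PetscMalloc =', mal
      endif
      end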
Title: Re: Main memory at ParFEAP 8.4
Post by: Prof. S. Govindjee on May 30, 2019, 11:58:59 AM
I have a fix that works with OPENMPI (but not yet with MPICH).

In psetb.F make the following changes:

1. Redimension req and add reqcnt:

      integer    usolve_msg, req(ntasks), reqcnt

2. Before the first loop:

      req( : ) = 0
      reqcnt = 0

3. Change the send:

          reqcnt = reqcnt + 1
          call MPI_Isend( sdatabuf(soff+1), sbuf, MPI_DOUBLE_PRECISION,
     &                    i-1, usolve_msg, MPI_COMM_WORLD, req(reqcnt),
     &                    ierr)

4. Right before the call to MPI_Barrier, add:

      call MPI_Waitall(reqcnt, req, MPI_STATUSES_IGNORE, ierr)
Title: Re: Main memory at ParFEAP 8.4
Post by: Prof. S. Govindjee on May 30, 2019, 01:16:05 PM
A small cleanup, if you ever decide to use umacr49.F: this version calls MPI_Allreduce instead of our home-grown version.
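
For reference, a generic MPI_Allreduce call in Fortran looks like this (a minimal standalone example, not the contents of umacr49.F):

      program allred
      implicit none
      include 'mpif.h'
      integer ierr, rank
      real*8  vloc, vglob
      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      vloc = dble(rank+1)
c     Sum the local value over all ranks; every rank gets the total
      call MPI_Allreduce(vloc, vglob, 1, MPI_DOUBLE_PRECISION,
     &                   MPI_SUM, MPI_COMM_WORLD, ierr)
      if(rank.eq.0) write(*,*) ' global sum =', vglob
      call MPI_Finalize(ierr)
      end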
Title: Re: Main memory at ParFEAP 8.4
Post by: JStorm on May 31, 2019, 02:18:03 PM
To be honest, I do not understand your solution (I am not experienced with MPI). I will try it out next week. Do you think the memory leak is due to MPI objects that should be destroyed after usage?
Title: Re: Main memory at ParFEAP 8.4
Post by: Prof. S. Govindjee on May 31, 2019, 06:29:05 PM
Yes, that is the main issue. When using MPI_Isend( ) one also needs to use an MPI_Wait( )/MPI_Waitall( ).
Other parts of PETSc are creating some memory issues, but the dominant one is this. Note that MPICH
is not playing nicely on my machine with respect to this issue, but OPENMPI seems to respect the MPI standard here.
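
To illustrate the rule with a minimal standalone sketch (illustrative only, not FEAP code): each MPI_Isend returns a request handle, and the library resources behind that handle are only released once the request is completed by a wait call.

      program isendwait
      implicit none
      include 'mpif.h'
      integer ierr, rank, ntasks, i, reqcnt
      integer, allocatable :: req(:)
      real*8  buf(10)
      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, ntasks, ierr)
      allocate(req(ntasks))
      buf(:) = dble(rank)
      reqcnt = 0
      if(rank.eq.0) then
c       Post one non-blocking send to every other rank
        do i = 2,ntasks
          reqcnt = reqcnt + 1
          call MPI_Isend(buf, 10, MPI_DOUBLE_PRECISION, i-1,
     &                   99, MPI_COMM_WORLD, req(reqcnt), ierr)
        end do
      else
        call MPI_Recv(buf, 10, MPI_DOUBLE_PRECISION, 0, 99,
     &                MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
      endif
c     Completing the requests releases their internal MPI objects;
c     omitting this wait is what lets request objects accumulate
      call MPI_Waitall(reqcnt, req, MPI_STATUSES_IGNORE, ierr)
      call MPI_Barrier(MPI_COMM_WORLD, ierr)
      call MPI_Finalize(ierr)
      end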
Title: Re: Main memory at ParFEAP 8.4
Post by: Prof. S. Govindjee on June 06, 2019, 01:57:55 PM
If you use the patch above for psetb.F also in pfeapsr.F, then the memory leak is fixed for both OPENMPI and MPICH :)
Title: Re: Main memory at ParFEAP 8.4
Post by: JStorm on June 18, 2019, 07:51:05 AM
Quote from: Prof. S. Govindjee on June 06, 2019, 01:57:55 PM
If you use the patch above for psetb.F also in pfeapsr.F, then the memory leak is fixed for both OPENMPI and MPICH :)

Dear Prof. Govindjee,

I have tried your changes to psetb.F and pfeapsr.F with MPICH on the HPC system. Now the memory size is perfectly constant. Many thanks for that. Your fix helps me a lot: I can now run studies on the HPC system without wasting resources or frequently restarting just to keep the memory footprint small.

With kind regards
Title: Re: Main memory at ParFEAP 8.4
Post by: Prof. S. Govindjee on June 18, 2019, 09:01:04 PM
If you search the files, there are a few other MPI_Barrier calls before which you should insert MPI_Waitall calls; this will eliminate all the memory growth issues.

These will all be fixed in the next release, which will hopefully come out later this summer or early in the fall.
Title: Re: Main memory at ParFEAP 8.4
Post by: JStorm on June 18, 2019, 10:43:44 PM
Thank you for your support. I have applied your fix to pfeapmi.F, pfeapsr.F, psubsp.F, psetb.F, and scalev.F.