FEAP User Forum
FEAP => Parallel FEAP => Topic started by: JStorm on July 29, 2020, 02:09:40 AM
-
Is it possible to determine how long a process spends waiting for the others?
I would like to use this to identify slow nodes on an HPC cluster that hold up the whole solution.
-
This is tricky, I think. It will depend a lot on the particular allocation of nodes that the job scheduler is giving you.
PETSc does give a number of options for timing and they do help identify code that is unbalanced; see https://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf#chapter.13 .
With regard to particular nodes, you could try placing calls to PetscTime in your code (this returns the current time in seconds from some reference, typically the epoch). If you print this together with the value of rank, you will see at what time each process arrived at the print statement, which will tell you which nodes are slower than the others.
I don't know if there is a Fortran wrapper for PetscTime, so you will just have to try. I also do not know whether it is synchronized across all processes. If not, you can directly use the MPI_Wtime() function, which returns a real*8 time in seconds since a fixed reference (such as the epoch). The function MPI_Wtick() gives you the clock resolution. I believe the MPI clocks are not guaranteed to be synchronized across processes; see the value of the attribute MPI_WTIME_IS_GLOBAL.
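If you want to check this directly, here is a minimal standalone sketch (not FEAP code; the program and variable names are just illustrative, and it assumes the Fortran mpi module):

program clock_check
  use mpi
  implicit none
  integer :: ierr, rank
  integer (kind=MPI_ADDRESS_KIND) :: wtime_is_global
  logical :: flag
  real (kind=8) :: t

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  ! Wall-clock time and clock resolution on this rank
  t = MPI_Wtime()
  write(*,*) 'rank', rank, 'time', t, 'tick', MPI_Wtick()

  ! Query whether MPI_Wtime is synchronized across all ranks
  call MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_WTIME_IS_GLOBAL, &
                         wtime_is_global, flag, ierr)
  if (flag) write(*,*) 'rank', rank, 'MPI_WTIME_IS_GLOBAL =', wtime_is_global

  call MPI_Finalize(ierr)
end program clock_check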
-
One option for dealing with the sync problem is to do something like:
use mpi
implicit none
real (kind=8) :: myt
! setups.h provides FEAP's parallel setup data, including 'rank' used below
#include "setups.h"
! Only the elapsed time on the local clock is used, so the processes'
! clocks do not need to be synchronized.
myt = MPI_Wtime()
! a bunch of code
myt = MPI_Wtime() - myt
write(*,*) rank,myt
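If you run on many processes and the per-rank output becomes hard to scan, one possible way to summarize it is to reduce the elapsed times. This is only a sketch, not FEAP code: tmin, tmax, and ierr are illustrative variables, and it assumes myt and rank as above with MPI already initialized.

real (kind=8) :: tmin, tmax
integer :: ierr
! Smallest and largest elapsed times over all processes; a large
! spread points to load imbalance or a slow node.
call MPI_Reduce(myt, tmin, 1, MPI_DOUBLE_PRECISION, MPI_MIN, 0, MPI_COMM_WORLD, ierr)
call MPI_Reduce(myt, tmax, 1, MPI_DOUBLE_PRECISION, MPI_MAX, 0, MPI_COMM_WORLD, ierr)
if (rank .eq. 0) write(*,*) 'min/max elapsed time:', tmin, tmax

Because only elapsed times are compared, this still does not require the ranks' clocks to be synchronized.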
-
Thank you Prof. Govindjee, I will give it a try.