Topic: partition error in parFEAP

Shuai Wang

  • Jr. Member
  • Posts: 23
partition error in parFEAP
« on: April 14, 2016, 03:26:16 AM »
Dear all,

I am trying to build parFEAP on an HPC cluster with PETSc. After building the parfeap program, I ran a simple test with "parfeap -iItest", and the following message appears:

Quote
Note: The following floating-point exceptions are signalling: IEEE_DIVIDE_BY_ZERO

When I run it with
Quote
mpirun -np 4 parfeap -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package {mumps,pastix,superlu_dist}
the following error appears:

         Are filenames correct?( y or n; r = redefine all, s = stop) :
 *ERROR* FILNAM: Reinput data
^C[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.
[0]PETSC ERROR: --------------------- Error Message ------------------------------------
[0]PETSC ERROR: Signal received!
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Petsc Release Version 3.3.0, Patch 7, Sat May 11 22:15:24 CDT 2013
[0]PETSC ERROR: See docs/changes/index.html for recent updates.
[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[0]PETSC ERROR: See docs/index.html for manual pages.
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: parfeap on a linux-gnu named hla0001 by sw33dypa Thu Apr 14 12:14:47 2016
[0]PETSC ERROR: Libraries linked from /home/sw33dypa/software/petsc-3.3-p7/linux-gnu-c-opt/lib
[0]PETSC ERROR: Configure run at Thu Apr 14 11:42:40 2016
[0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-blas-lib=/shared/apps/intel/2016u2/mkl/lib/intel64/libmkl_rt.so --with-lapack-lib=/shared/apps/intel/2016u2/mkl/lib/intel64/libmkl_rt.so --download-spooles --download-parmetis --download-superlu_dist --download-prometheus --download-mpich --download-ml --download-hypre --download-metis --download-mumps --download-scalapack --download-blacs --with-debugging=0
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: User provided function() line 0 in unknown directory unknown file

application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
[unset]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0



FEAP version: 8.4.1g; PETSc version: 3.3-p7; OS: CentOS Linux release 7.2.1511 (Core).


I don't know how to fix this problem. I've tried several input files, and they all show the same problem.


FEAP_Admin

  • Administrator
  • FEAP Guru
  • Posts: 993
Re: partition error in parFEAP
« Reply #1 on: April 14, 2016, 08:39:22 PM »
Before running the actual job, you need to partition the problem. I suggest doing this first in interactive mode:

$FEAPHOME8_4/parfeap/feap

followed by selecting the correct input file (Itest), doing the graph partition, and generating the domain files.

Then try issuing the mpirun command.

Shuai Wang

  • Jr. Member
  • Posts: 23
Re: partition error in parFEAP
« Reply #2 on: April 21, 2016, 03:00:43 AM »
Thanks for your response.

But I don't think that is the problem. I have already done the graph partition and generated Itest_0001 to Itest_0004 with the command "$FEAPHOME8_4/parfeap/feap":
Code: [Select]
...
Mesh output:  Filename = Itest_0004
  Number partition nodes =         5701  Number total nodes =       22801
  Start COORD: TIME =   1.72641695   
  Start COORD: TIME =   1.72643304   
  End   COORD: TIME =   1.74428797   
  Start ELEMT: TIME =   1.74432194               0
  End   ELEMT: TIME =   1.75281799               0        5701
  Ghost  Stress Elements =            0  Partn Stress Elements =         5547
  Matrix Store: Diagonal =     1260075  Off-Diagonal =       11050
 
  Total Ghost Stress Elements =         329
  Total Partn Stress Elements =       22171
  Total all   Stress Elements =       22500
Note: The following floating-point exceptions are signalling: IEEE_DIVIDE_BY_ZERO

It seems that after partitioning, parfeap cannot read the partitioned input files. The folder contains "Itest Itest_0001 Itest_0002 Itest_0003 Itest_0004 solve.test", but parfeap reads Itest instead of Itest_000x.
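For orientation, the listing above shows the naming scheme: the flat file Itest is split into Itest_0001 to Itest_0004, with a four-digit, zero-padded, 1-based suffix. The minimal Python sketch below reproduces that mapping. The rule that MPI rank r reads partition r + 1 is an assumption (it is consistent with rank 0 producing the feap_0001 output mentioned later in this thread), and the helper names are hypothetical, not part of FEAP.

```python
def partition_file(base: str, rank: int) -> str:
    """Hypothetical helper: partition file read by MPI rank `rank`.

    Assumes rank r reads partition r + 1 and that the suffix is four digits,
    zero-padded and 1-based (as in Itest_0001 ... Itest_0004).
    """
    if rank < 0:
        raise ValueError("rank must be non-negative")
    return f"{base}_{rank + 1:04d}"


def expected_files(base: str, nprocs: int) -> list:
    """Partition files that should exist before running `mpirun -np nprocs`."""
    return [partition_file(base, r) for r in range(nprocs)]


if __name__ == "__main__":
    print(expected_files("Itest", 4))
    # ['Itest_0001', 'Itest_0002', 'Itest_0003', 'Itest_0004']
```

Under this assumption, a process that believes it is rank 0 always opens Itest_0001, so the partition count passed to `mpirun -np` must match the number of files generated.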


FEAP_Admin

  • Administrator
  • FEAP Guru
  • Posts: 993
Re: partition error in parFEAP
« Reply #3 on: April 21, 2016, 09:45:24 AM »
If parfeap tries to read Itest when you start it, type 'n' and press 'return'. FEAP will then prompt you for an input file name. Type Itest_0001 and press 'return'. It will then prompt you for further file names; simply press 'return' to accept the defaults. Once it has all the file names, it will ask you to confirm once more. Type 'y' and press 'return'.

Normally parfeap is smarter than this but it can get confused occasionally.

Alternatively, you can edit the file feapname and append _0001 to each of the file names listed there.

Shuai Wang

  • Jr. Member
  • Posts: 23
Re: partition error in parFEAP
« Reply #4 on: April 22, 2016, 06:05:13 AM »
Thanks for your answer.

Yes, it works with your suggestion. It now reads the input file successfully, but it only computes the first partition, Itest_0001; the other partitions are not computed.
The commands I tried are
Code: [Select]
mpirun -np 4 parfeap
and
Code: [Select]
mpirun -np 4 parfeap -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package {mumps,pastix,superlu_dist}
My FEAP version is 8.4 and my PETSc version is 3.3-p7.
PETSc itself works fine:
Code: [Select]
xxx@xxx:/home/wang/Feap/ver84/software/petsc-3.3-p7$ make test
Running test examples to verify correct installation
Using PETSC_DIR=/home/wang/Feap/ver84/software/petsc-3.3-p7 and PETSC_ARCH=linux-gnu-c-opt
C/C++ example src/snes/examples/tutorials/ex19 run successfully with 1 MPI process
C/C++ example src/snes/examples/tutorials/ex19 run successfully with 2 MPI processes
Fortran example src/snes/examples/tutorials/ex5f run successfully with 1 MPI process
Completed test examples



FEAP_Admin

  • Administrator
  • FEAP Guru
  • Posts: 993
Re: partition error in parFEAP
« Reply #5 on: April 22, 2016, 08:06:34 AM »
Why do you think it only computed partition 1?  Are there no output files for the other partitions?  Were there error messages?

Shuai Wang

  • Jr. Member
  • Posts: 23
Re: partition error in parFEAP
« Reply #6 on: April 22, 2016, 11:07:11 AM »
Yes, there are error messages:

Code: [Select]
[0]PETSC ERROR: --------------------- Error Message ------------------------------------
[0]PETSC ERROR: Argument out of range!
[0]PETSC ERROR: New nonzero at (0,23590) caused a malloc!
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Petsc Release Version 3.3.0, Patch 7, Sat May 11 22:15:24 CDT 2013
[0]PETSC ERROR: See docs/changes/index.html for recent updates.
[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[0]PETSC ERROR: See docs/index.html for manual pages.
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: parfeap on a linux-gnu named mfm162 by wang Fri Apr 22 14:59:36 2016
[0]PETSC ERROR: Libraries linked from /home/wang/Feap/ver84/software/petsc-3.3-p7/linux-gnu-c-opt/lib
[0]PETSC ERROR: Configure run at Thu Apr 21 14:05:24 2016
[0]PETSC ERROR: Configure options --download-spooles --download-parmetis --download-superlu_dist --download-prometheus --download-mpich --download-ml --download-hypre --download-metis --download-mumps --download-scalapack --download-blacs --with-blas-lib=/home/wang/software/BLAS/libblas.a --with-lapack-lib=/home/wang/software/LAPACK/liblapack.a --with-debugging=0
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: MatSetValues_SeqAIJ() line 345 in src/mat/impls/aij/seq/aij.c
[0]PETSC ERROR: MatSetValues() line 1025 in src/mat/interface/matrix.c
...

I know it really computes only partition 0001 because my code makes FEAP write a VTK file at every step, and this time it only writes feap_0001.vtk.0000001, feap_0001.vtk.0000002, feap_0001.vtk.0000003, ... with no feap_0002.vtk.xxxxx files.

FEAP_Admin

  • Administrator
  • FEAP Guru
  • Posts: 993
Re: partition error in parFEAP
« Reply #7 on: April 22, 2016, 03:53:02 PM »
(1) Are you sure that your VTK output routines are implemented correctly?
(2) The "New nonzero at (0,23590)..." message indicates that your input files are not correct. Did you start your partitioning from a flat file? FEAP's parallel implementation computes exactly the memory PETSc needs, so there should never be a need to malloc for an unexpected nonzero.
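The exact-memory point can be illustrated with a small Python sketch of how per-row nonzero counts are typically computed for PETSc's MPIAIJ preallocation (MatMPIAIJSetPreallocation takes, for each locally owned row, a count for the diagonal block and one for the off-diagonal block). This is not FEAP's code: it assumes one unknown per node, a contiguous block of owned rows per process, and a hypothetical function name. The "Matrix Store: Diagonal / Off-Diagonal" totals in the partition output quoted earlier appear to be sums of counts of this kind.

```python
from itertools import product


def count_nonzeros(elements, rstart, rend):
    """Per-row nonzero counts for locally owned rows [rstart, rend).

    Column j of owned row i counts toward the diagonal block (d_nnz) when
    rstart <= j < rend, otherwise toward the off-diagonal block (o_nnz).
    Assumes one unknown per node; `elements` holds global node numbers.
    """
    cols = {i: set() for i in range(rstart, rend)}
    for conn in elements:
        # every pair of nodes in the same element couples in the matrix
        for i, j in product(conn, conn):
            if rstart <= i < rend:
                cols[i].add(j)
    d_nnz = [sum(rstart <= j < rend for j in cols[i]) for i in range(rstart, rend)]
    o_nnz = [len(cols[i]) - d for i, d in zip(range(rstart, rend), d_nnz)]
    return d_nnz, o_nnz


if __name__ == "__main__":
    chain = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]  # six nodes, 2-node elements
    print(count_nonzeros(chain, 0, 3))  # ([2, 3, 2], [0, 0, 1])
```

If assembly later inserts an entry (i, j) that was not counted, for example because the partition data does not match the process that reads it, there is no preallocated slot for it and PETSc can stop with a "New nonzero ... caused a malloc" error like the one above.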

Shuai Wang

  • Jr. Member
  • Posts: 23
Re: partition error in parFEAP
« Reply #8 on: April 25, 2016, 05:17:43 AM »
The VTK output is correct; it has been tested in a serial run.

I found where the error comes from: there are two mpirun executables on the system, the default one and the one built by PETSc. I should use ${PETSC_DIR}/${PETSC_ARCH}/bin/mpirun, and with it everything works fine. But I don't know the exact reason behind this.

Thanks for your kind reply.
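The symptoms in this thread (every process reporting as [0] and only the first partition being computed) are consistent with a common pitfall: launching a program linked against one MPI library with a different implementation's mpirun, in which case each process may start as an independent single-process job of rank 0. That is a plausible explanation, not one confirmed in this thread. The Python sketch below checks whether the first mpirun on PATH is the one under PETSC_DIR/PETSC_ARCH/bin (the location given above, installed by --download-mpich) and builds a command line that calls that launcher explicitly; the helper names are hypothetical.

```python
import os
import shutil
from pathlib import Path


def petsc_mpirun() -> Path:
    """mpirun installed by PETSc's --download-mpich (requires PETSC_DIR, PETSC_ARCH)."""
    return Path(os.environ["PETSC_DIR"]) / os.environ["PETSC_ARCH"] / "bin" / "mpirun"


def default_is_petsc_mpirun(path=None) -> bool:
    """True if the first `mpirun` found on PATH (or on `path`) is PETSc's launcher."""
    found = shutil.which("mpirun", path=path)
    return found is not None and Path(found).resolve() == petsc_mpirun().resolve()


def parfeap_command(nprocs: int, *petsc_options: str) -> list:
    """Command line that runs parfeap through PETSc's mpirun explicitly."""
    return [str(petsc_mpirun()), "-np", str(nprocs), "parfeap", *petsc_options]


if __name__ == "__main__":
    print("default mpirun is PETSc's:", default_is_petsc_mpirun())
    print(" ".join(parfeap_command(4, "-ksp_type", "preonly", "-pc_type", "lu")))
```

Calling the launcher by its full path, rather than reordering PATH, keeps the choice explicit in job scripts.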