Author Topic: integer overflow in feap8.5 integer8 (Read 9651 times)

blackbird · « **on:** March 15, 2018, 03:56:23 AM »

Dear all,

I am using a recent version of (par)FEAP 8.5 with PETSc 3.8.3 on a Linux Bull HPC cluster.

I would like to simulate a problem with a very large number of finite elements. With the input given below, standard FEAP commands are used to create a flat input file for a subsequent parFEAP run; however, the problem already arises when creating the mesh itself. I am aiming for a fine discretization with 3000x3000x1200 elements, yet already for the "coarsest" mesh with 125x125x50 elements, given by

Code: [Select]

FEAP * * mesh
*,*,1,3,3,8

MATErial 1                                                     
  SOLId
  ELAStic ISOTropic 30e9 0.18

! CART  125  125   50 1 1 1 10
! CART  150  150   60 1 1 1 10
! CART  250  250  100 1 1 1 10
! CART  375  375  150 1 1 1 10
! CART  750  750  300 1 1 1 10
! CART 1500 1500  600 1 1 1 10
! CART 3000 3000 1200 1 1 1 10


BLOCk
CART 125 125 50 1 1 1 10
1 0.00 0.00 0.00
2 0.75 0.00 0.00 
3 0.75 0.75 0.00
4 0.00 0.75 0.00
5 0.00 0.00 0.30
6 0.75 0.00 0.30
7 0.75 0.75 0.30
8 0.00 0.75 0.30

END

BATCH
  OUTM
END
 
STOP

the output gives a negative value for both the number of tangent terms and the average column height:

Code: [Select]

     E l e m e n t   S i z e   V a l u e s
          h-minimum =  6.0000E-03
          h-maximum =  1.0392E-02

     P a r t i t i o n   1

     E q u a t i o n / P r o b l e m   S u m m a r y:

       Mesh dimension  (ndm) =         3 :  Number nodes     =          809676
       Number dof/node (ndf) =         3 :  Number elements  =          781250
       Element eqs.   (nadd) =         0 :  Number materials =               1
       Number rigid bodies   =         0 :  Number equations =         2429028
       Number joints         =         0 :  Number tang terms=     -1622254014
       Transient integrator  = Static    :  Average col. ht. =            -666

The problem gets worse as the number of elements increases: for 375x375x150 my computer is no longer able to finish the mesh output. The O-file ends in the middle of an element record and the process keeps running without doing anything. Although the number of nodes and elements in this case is far below the limit of integer8, I still suspect an integer overflow as the possible cause of at least the negative number of tangent terms. Do you agree? Or is there another possible explanation for the problem? Is there any way to make FEAP handle such large meshes properly?

Best
Christian

FEAP_Admin · « **Reply #1 on:** March 15, 2018, 11:55:20 AM »

The negative numbers have to do with the maximum value that can be stored in a signed 32-bit integer. The code itself is probably OK.
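As a minimal sketch of that wraparound: a signed 32-bit integer holds at most 2,147,483,647, and on typical two's-complement hardware a larger count reappears as a negative number. The helper `wrap_int32` and the value `candidate` below are illustrative only. `candidate` is simply the smallest positive count that wraps to the printed -1622254014; the true number of tangent terms is not known from the output.

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647


def wrap_int32(value: int) -> int:
    """Return what a signed 32-bit integer would hold after wraparound."""
    return (value + 2**31) % 2**32 - 2**31


# Hypothetical true count, chosen only because it wraps to the printed value.
candidate = 2_672_713_282

print(candidate > INT32_MAX)   # True
print(wrap_int32(candidate))   # -1622254014
```

Any count that differs from `candidate` by a multiple of 2**32 would print the same negative number, so the output alone cannot tell how far the limit was exceeded.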

In fact, I added some BCs and solved your problem with the iterative solver without difficulty.

The most important thing is that no single array dimension exceeds 2,147,483,647; with this problem you are probably OK.

With your ultimate problem size you already have 10,816,207,201 nodes. To solve such large problems you will need to promote the default integer size to 64 bit. This means using the compiler flag that converts integer :: data types to integer (kind=8) ::. With gfortran use -fdefault-integer-8; with ifort use -integer-size 64. Also, in feap85.f you will need to change ipr to ipr = 1. Then recompile everything!
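The node count quoted above can be checked for every refinement stage listed in the input file. The sketch below assumes a regular block of 8-node bricks with 3 dof per node and no boundary conditions; the function name `block_counts` is illustrative and not part of FEAP.

```python
INT32_MAX = 2**31 - 1

# Element subdivisions taken from the CART lines of the input file.
stages = [
    (125, 125, 50),
    (150, 150, 60),
    (250, 250, 100),
    (375, 375, 150),
    (750, 750, 300),
    (1500, 1500, 600),
    (3000, 3000, 1200),
]


def block_counts(nx, ny, nz, ndf=3):
    """Nodes, elements and unconstrained dofs of an nx*ny*nz brick block."""
    nodes = (nx + 1) * (ny + 1) * (nz + 1)
    elements = nx * ny * nz
    return nodes, elements, nodes * ndf


for nx, ny, nz in stages:
    nodes, elements, dofs = block_counts(nx, ny, nz)
    note = "exceeds 32-bit range" if dofs > INT32_MAX else "fits in 32 bits"
    print(f"{nx}x{ny}x{nz}: nodes={nodes:,} elements={elements:,} "
          f"dofs={dofs:,} ({note})")
```

This only compares the counts themselves with the 32-bit limit. Derived quantities such as the number of tangent terms or the length of a single array can exceed the limit much earlier, as the 125x125x50 output shows.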

For the problem with 375*375*150, it seems unlikely that you exceed an array size limit, so I am not sure what is going on there. Please try running in the debugger to provide us with more information, as it worked just fine for me (albeit producing a 3GB+ file).

blackbird · « **Reply #2 on:** March 19, 2018, 01:48:54 AM »

Dear Feap Admin,

In the file makefile.in in the FEAP main folder, I have added

Code: [Select]

#------------------------------------------------------------------------
# Other options to be used by the compiler (generally this is blank)

   FOPTIONS = -fdefault-integer-8
   COPTIONS =

and in ./main/feap85.f, at line 131 ff., there is

Code: [Select]

!-----[--.----+----.----+----.-----------------------------------------]
!     Set ratio for real to integer variables: Set ipr = 1 or 2
                          ! ipr = 1 for equal  length real to integers
      ipr = 1             ! ipr = 2 for double length real to integers

However, this results in the output of

Code: [Select]

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7f3ce9279fcd in ???
#1  0x7f3ce9279203 in ???
#2  0x7f3ce89afcaf in ???
#3  0x42cab6 in pzeroi_
	at /home/cs/FEAP85/ver85/program/pzeroi.f:29
#4  0x881205 in setmem_
	at /home/cs/FEAP85/ver85/unix/memory/setmem.f:116
#5  0x77a4b3 in usetmem_
	at /home/cs/FEAP85/ver85/program/usetmem.f:55
#6  0x69d745 in ualloc_
	at /home/cs/FEAP85/ver85/user/ualloc.f:58
#7  0x4c334c in palloc_
	at /home/cs/FEAP85/ver85/program/palloc.f:836
#8  0x41bcd6 in pnewprob_
	at /home/cs/FEAP85/ver85/program/pnewprob.f:620
#9  0x40866b in pcontr_
	at /home/cs/FEAP85/ver85/program/pcontr.f:1009
#10  0x402b15 in feap
	at /home/cs/FEAP85/ver85/main/feap85.f:185
#11  0x402b4c in main
	at /home/cs/FEAP85/ver85/main/feap85.f:191

regardless of the input file given. I am using gfortran 4.8.4.

blackbird · « **Reply #3 on:** March 19, 2018, 01:57:07 AM »

Sorry, I just noticed that I had manually set the gfortran compiler with

Code: [Select]

FF = /usr/bin/gfortran-7

for which the option -v prints

Code: [Select]

Using built-in specs.
COLLECT_GCC=/usr/bin/gfortran-7
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.2.0-1ubuntu1~14.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=gcc4-compatible --disable-libstdcxx-dual-abi --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.2.0 (Ubuntu 7.2.0-1ubuntu1~14.04)

Prof. S. Govindjee · « **Reply #4 on:** March 19, 2018, 04:24:28 AM »

v7.2.0 is far better than v4.8.4.

Are you sure you built the code from scratch? Deleted all *.o and *.a files? feap85.o? feap?

Also please double check "man gfortran" for version 7.2.0 to make sure that -fdefault-integer-8 is the correct command line argument.

blackbird · « **Reply #5 on:** March 19, 2018, 04:53:54 AM »

Yes, every *.o and *.a file was deleted, as was the executable itself. The code comes directly from the repository; nothing was changed.

man gfortran says that -fdefault-integer-8 is valid; however, there is also -finteger-4-integer-8, which changes every integer(kind=4) into kind=8. I tried each option alone and both together, and the problem remains the same.

blackbird · « **Reply #6 on:** March 19, 2018, 06:09:37 AM »

I investigated a little further and found that the number stored in np(34) is causing the problem. np(34) changes randomly between runs with the same input. For the smallest input given above, I obtained np(34) =
2598619
584411
2815195
...

However, the subsequent access to mr(np(34)) in ./ver85/unix/memory/setmem.f:116 results in the segmentation fault, as mr(104368) seems to be the largest index allowed. I am not sure how these values of np(34) are obtained or what the correct value should be. Could you help me find the place where this number is calculated and set?

Prof. R.L. Taylor · « **Reply #7 on:** March 19, 2018, 09:12:33 AM »

The values in mr(np(34)) store the LD array, which is used to map element degrees of freedom to global ones. In parFEAP there are several maps set; follow the "ld(" to see how it is set. DO NOT CHANGE ANY ASSIGNMENTS.

In your coding never use integer (kind=4), as this cannot be changed on the compile line. Instead always use integer ::, as this will normally be 32 bit and can be changed by compile-line options.

Hopefully, we have removed all the integer (kind=4) from the release version.

blackbird · « **Reply #8 on:** March 19, 2018, 09:16:38 AM »

One problem here is that the FOPTIONS in makefile.in are not passed to the compiler. Therefore, I used

Code: [Select]

  FF = /usr/bin/gfortran-7  -finteger-4-integer-8 -fdefault-integer-8

which enables the solution of the stage 4 problem, i.e. the 375x375x150 element discretization, and indeed results in a 3GB+ mesh file.

However, the next stage (750x750x300) yields a new error:

Code: [Select]

...
         R U N N I N G    F E A P    P R O B L E M    N O W

          --> Please report errors by e-mail to:
              feap@berkeley.edu 

Memory allocation error
 CALLOC() returns NULL pointer
 *ERROR* Insufficient memory for array IX   . Need  3206250000 integer words.
 --> ERRORS OCCURRED: For details see file: Osize

while the information in Osize is limited to

Code: [Select]

 FEAP * * shb                                                                   

     Solution date: Mon Mar 19 17:09:39 2018

              Release 8.5.2h      
              23 January 2018     

     Input Data Filename: Isize                                                                                                                           

     Number of Nodal Points  - - - - - - :169764301
     Number of Elements  - - - - - - - - :168750000

     Spatial Dimension of Mesh - - - - - :        3
     Degrees-of-Freedom/Node (Maximum) - :        3
     Equations/Element       (Maximum) - :        0
     Number Element Nodes    (Maximum) - :        8

     Number of Material Sets - - - - - - :        1
     Number Parameters/Set   (Program) - :      300
     Number Parameters/Set   (Users  ) - :      150
 *ERROR* Insufficient memory for array IX   . Need  3206250000 integer words.

By the way, I used the cmem.c and cmemck.c from folder ./unix/largemem

blackbird · « **Reply #9 on:** March 19, 2018, 09:19:16 AM »

I did not find any integer*4 or integer(kind=4) declarations in the code.
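Such a search can be automated. The following is a minimal sketch of a scanner for fixed-kind integer declarations; the regular expression, the function names and the list of file suffixes are assumptions made for illustration, and the pattern only catches the spellings integer*4, integer(4) and integer(kind=4) at the start of a statement.

```python
import re
from pathlib import Path

# Matches integer*4, integer(4) and integer(kind=4) at the start of a statement.
FIXED_KIND = re.compile(
    r"^\s*integer\s*(\*\s*4(?!\d)|\(\s*(kind\s*=\s*)?4\s*\))",
    re.IGNORECASE,
)


def find_fixed_kind(lines):
    """Return (line_number, text) for every line declaring a 4-byte integer."""
    return [
        (number, line.rstrip())
        for number, line in enumerate(lines, start=1)
        if FIXED_KIND.match(line)
    ]


def scan_tree(root):
    """Scan Fortran sources below root; the suffix list is an assumption."""
    hits = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in {".f", ".f90", ".for"}:
            with open(path, errors="replace") as handle:
                found = find_fixed_kind(handle)
            if found:
                hits[str(path)] = found
    return hits


sample = [
    "      integer   i, j",
    "      integer*4 n",
    "      integer (kind=4) :: m",
    "!     integer*4 commented_out",
    "      integer (kind=8) :: p",
]
print(find_fixed_kind(sample))
```

For the sample lines only the second and third are reported; the commented line and the default-kind and kind=8 declarations are skipped. Calling `scan_tree` on a source directory returns a dictionary of file names and offending lines, which is empty when nothing is found.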

Prof. S. Govindjee · « **Reply #10 on:** March 19, 2018, 09:28:05 AM »

You only need to use -fdefault-integer-8 when compiling the code.

The flag that gets propagated is FFOPTFLAG; thanks for pointing that out.

The error is telling you that the code is trying to allocate 3.2x10^9 integer words. Each of your integers is 8 bytes in size.
Thus at this stage the IX array alone is asking for more than 24GB of memory.
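The arithmetic behind this estimate can be sketched as follows, using only the numbers printed in the error message and in Osize. The helper `ix_memory` is illustrative; the factor of 19 words per element is inferred from those two numbers and is not taken from the FEAP source.

```python
INT32_MAX = 2**31 - 1


def ix_memory(ix_words, elements, bytes_per_word=8):
    """Return words per element, the division remainder and total bytes."""
    words_per_element, remainder = divmod(ix_words, elements)
    return words_per_element, remainder, ix_words * bytes_per_word


ix_words = 3_206_250_000   # from the *ERROR* line
elements = 168_750_000     # "Number of Elements" in Osize

words_per_element, remainder, total_bytes = ix_memory(ix_words, elements)

print(words_per_element, remainder)  # 19 0
print(total_bytes / 1e9)             # 25.65 (decimal GB)
print(ix_words > INT32_MAX)          # True
```

The last line shows that the requested length itself no longer fits in a signed 32-bit integer, so this array size is only expressible at all once the default integer has been promoted to 64 bit.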

How much RAM do you have on your computer?

Prof. S. Govindjee · « **Reply #11 on:** March 19, 2018, 09:37:52 AM »

One other comment. Once the problem gets too large, using FEAP to make a flat file will get difficult, as will partitioning.
There is a helper program in parfeap/partition but it still assumes you have a flat file and can read it.

You may be able to get around this by constructing your own partitioned files directly with a separate piece of code hardwired for your problem. The details of what goes into the partitioned files are given in the parFEAP manual. If your problem is simple, this is perhaps feasible. Otherwise it will be challenging.
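For a regular block like the one in this thread, the mesh entities of any slab can be generated on the fly without ever holding the whole mesh in memory. The sketch below illustrates only that idea. It does not write the parFEAP partitioned file format, and the node numbering (x running fastest, 1-based) and all function names are assumptions that would have to be matched against the manual.

```python
# Offsets of the 8 brick corners: bottom face counter-clockwise, then top face,
# mirroring the corner order of the BLOCk coordinates in the input file.
CORNERS = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
           (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]


def node_id(i, j, k, nx, ny):
    """1-based node number with x running fastest (assumed convention)."""
    return 1 + i + (nx + 1) * (j + (ny + 1) * k)


def slab_nodes(nx, ny, nz, size, k_start, k_stop):
    """Yield (id, x, y, z) for node layers k_start..k_stop inclusive."""
    lx, ly, lz = size
    for k in range(k_start, k_stop + 1):
        for j in range(ny + 1):
            for i in range(nx + 1):
                yield (node_id(i, j, k, nx, ny),
                       i * lx / nx, j * ly / ny, k * lz / nz)


def slab_elements(nx, ny, k_start, k_stop):
    """Yield the 8 node numbers of each brick in layers k_start..k_stop-1."""
    for k in range(k_start, k_stop):
        for j in range(ny):
            for i in range(nx):
                yield tuple(node_id(i + di, j + dj, k + dk, nx, ny)
                            for di, dj, dk in CORNERS)


# First two element layers of the 125x125x50 block (0.75 x 0.75 x 0.30).
count = sum(1 for _ in slab_elements(125, 125, 0, 2))
print(count)  # 31250
```

Because both functions are generators, memory use stays constant regardless of mesh size; each partition writer would iterate over its own range of layers and renumber nodes locally as the partitioned format requires.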

blackbird · « **Reply #12 on:** March 19, 2018, 09:43:09 AM »

I have a total of 16GB RAM on my local machine. However, I assumed that anything exceeding this would be swapped out to the SSD. While this will definitely slow down the process, I hoped to still get the mesh. As stated, the serial partitioning of the problem is the bottleneck here, because the partitions will have a much smaller number of nodes and elements. However, to obtain them, all the meshing needs to be done first.

So you are saying all I need is to run it on a VM with more RAM?

blackbird · « **Reply #13 on:** March 19, 2018, 09:47:50 AM »

Yes, obtaining the flat input file is the objective here. I really hope to be able to use FEAP for this, as the input given here is only a simple proof of concept for this huge number of nodes and elements. Of course, later on I would like to use FEAP commands to set up boundary conditions and loading ...

FEAP_Admin · « **Reply #14 on:** March 19, 2018, 10:07:33 AM »

Your swap size is probably not large enough. Notwithstanding, you could wait an eternity if it goes out to swap. You will need to run on a truly large memory machine to perform the partition.

FEAP User Forum
