Author Topic: Convergence problems with v8.6 (very slow compared to v8.5)  (Read 32011 times)

arktik

  • Jr. Member
  • **
  • Posts: 46
Re: Convergence problems with v8.6 (very slow compared to v8.5)
« Reply #15 on: June 13, 2021, 08:57:29 AM »
Dear FEAP Admin,

I have tested the new update 8.6.1j. The slow convergence problems reported above (in comparison with v8.5) seem to be resolved. Thanks for the support.

However, there seems to be a problem with the log files in parallel. Except for the first one, Lxxx_0001, the log files from the other CPUs (e.g. Lxxx_0002 ... Lxxx_000n) have no output (the logs for the solution steps are missing).

Prof. S. Govindjee

  • Administrator
  • FEAP Guru
  • *****
  • Posts: 1164
« Reply #16 on: June 13, 2021, 03:22:26 PM »
Thanks.  I have reproduced the 'bug'; though I will note the files should be the same -- except for maybe the timing values on the end of each line.

Prof. S. Govindjee

« Reply #17 on: June 13, 2021, 03:28:49 PM »
Actually this is a 'feature' to save disk I/O since the information was the same in all the files.  In pmacr2.f, there is now a check that rank.eq.0 before writing to the Log file.
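The pattern described here, where only one process writes the shared log, can be sketched as follows. This is an illustration in Python, not FEAP's actual Fortran code in pmacr2.f; the helper name `write_log_line` and its arguments are hypothetical, and in a real run the rank would come from the MPI library.

```python
import os
import tempfile


def write_log_line(rank: int, path: str, line: str) -> bool:
    """Append `line` to the log at `path` only on rank 0.

    Returns True if the line was written, False if it was skipped.
    Hypothetical helper: it mirrors the idea of a `rank.eq.0` guard,
    not the actual FEAP implementation.
    """
    if rank != 0:
        return False  # other ranks skip the write to save disk I/O
    with open(path, "a", encoding="utf-8") as log:
        log.write(line + "\n")
    return True


if __name__ == "__main__":
    # Simulate four ranks trying to log the same solution-step line.
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "Lxxx_0001")
        written = [write_log_line(r, log_path, "step 1 done") for r in range(4)]
        print(written)
```

With a guard like this, the log files of the other ranks may still be created elsewhere but receive no solution-step lines, which matches what was observed.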

arktik

« Reply #18 on: June 14, 2021, 03:00:35 AM »
I totally agree that the other L-files need not be fully printed. Their size gets really large e.g. in case of transient explicit problems over 100s of CPUs. Did not know that this is now an intended feature. Thanks!

arktik

« Reply #19 on: June 16, 2021, 03:40:14 AM »
In the meantime, I also checked the new version 8.6.1j with a more complicated problem (as mentioned in the first post in this topic). This problem has ~2000 material tags (each one an orthotropic crystal with a unique orientation) and ~6 million DOFs. Unfortunately, the latest update does not converge either. As I said, the previous version (v8.5.2i) works correctly. The following error message is shown:
Code: [Select]
NO CONVERGENCE REASON:  Iterations exceeded
Only FEAP standard features are used to solve this problem. I suspect the simple checks performed previously were not enough to trigger a possible bug.

Prof. S. Govindjee

« Reply #20 on: June 16, 2021, 12:23:38 PM »
Hmmm...this is going to be challenging to debug with 6M dof.

As I understand, you now have 8.5.2i and 8.6.1j using the same version of openmpi and petsc.

(1) Can you (re)post your petsc logs and the L files from both versions with your 6M dof test case.

(2) Also which FEAP material model are you using?

(3) Do you know if the results are the same up to the time step before the problem arises?

arktik

« Reply #21 on: June 17, 2021, 03:55:52 AM »
8.5.2i and 8.6.1j are not installed with the same version of openmpi and petsc (as mentioned in the first post). It's hard to say if that plays any role, since the 8.6.1j installation exhibits no other issues when compared to the 8.5.2i installation. Regarding the current problem:
  • With v8.5, the KSP residual norm reaches tolerance within the maximum number of iterations (petsc default 10000). With v8.6, the KSP residual norm does not reach tolerance by the end of 10000 iterations.
  • To make it easier for v8.6, the default petsc tolerances were increased (loosened), e.g. 1e-16 for energy and 1e-8 for residual. Even then v8.6 doesn't converge.
  • Each grain is supposed to be orthotropic elastoplastic (Hill plasticity). However, for comparison I reduced the complexity to elastic isotropic. That means we are ultimately solving a linear elastic domain with ~2000 repetitive material tags.
  • Since v8.6 shows no convergence right from the start, the L-file is empty. However, I have attached the O-file from the first processor and the petsc log for both versions. I am only performing a single time increment for testing.

Prof. S. Govindjee

« Reply #22 on: June 17, 2021, 04:37:21 PM »
Looking at the information you have provided, I see that the solution is getting off to a bad start without even trying a solution.

You can see that the very first residual that you are computing between the two programs is different and this should not be the case -- though I will point out that the technical details of how those residuals are computed differ between 8.5 and 8.6.  Most likely the tangents are different too.
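One way to make this comparison concrete is to pull the residual norms out of both PETSc logs and compare the first entries. The sketch below assumes the logs contain lines in the format printed by PETSc's `-ksp_monitor` option (for example `  0 KSP Residual norm 1.234e+01`); the function names are illustrative, and the sample log text in the usage part is made up rather than taken from the attached files.

```python
import re

# Matches lines such as "  0 KSP Residual norm 1.234000000000e+01"
KSP_LINE = re.compile(r"^\s*(\d+)\s+KSP Residual norm\s+([0-9.eE+-]+)")


def ksp_residuals(log_text: str) -> list:
    """Return the KSP residual norms in the order they appear in the log."""
    norms = []
    for line in log_text.splitlines():
        match = KSP_LINE.match(line)
        if match:
            norms.append(float(match.group(2)))
    return norms


def first_residuals_agree(log_a: str, log_b: str, rel_tol: float = 1e-8) -> bool:
    """Compare the initial (first reported) residual norms of two logs."""
    a, b = ksp_residuals(log_a), ksp_residuals(log_b)
    if not a or not b:
        raise ValueError("no KSP residual lines found in one of the logs")
    return abs(a[0] - b[0]) <= rel_tol * max(abs(a[0]), abs(b[0]))


if __name__ == "__main__":
    # Made-up sample logs, for illustration only.
    log_v85 = "  0 KSP Residual norm 1.000000000000e+02\n  1 KSP Residual norm 5.0e+01\n"
    log_v86 = "  0 KSP Residual norm 2.000000000000e+02\n"
    print(first_residuals_agree(log_v85, log_v86))
```

If the first residuals already disagree, the difference arises before the linear solver does any work, which points at the assembly of the residual (or the partitioned input) rather than at the KSP settings.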

To help focus in, it will be helpful to know:

(1)  the exact lines you are using to partition your equations

(2)  what happens if you just use one material and not 2000.

(3)  can you start serial FEAP on your problem and run FORM to get the expected residual?  Note this expected residual will be the residual that one expects in parallel if you have output the parallel files using OUTD,AIJ and not OUTD,AIJ,1




arktik

« Reply #23 on: June 21, 2021, 01:58:19 PM »
My recent troubleshooting didn't give any conclusive results. To your points,

1. Partition is done with
Code: [Select]
BATCh
 GRAPh NODE ncpu
 OUTDomain aij 1 3
END
I used non-flat as well as flat file. Both give diverged solution.

2. I created a block mesh (roughly the same number of DOFs) with one material tag; the solution converges and yields the correct solution! The original problem uses the INCLude statement to get the discretized geometry (coordinates and elements) of 2000 grains. Maybe something is happening here in the new release?

3. The problem is so large that serial FEAP throws a memory error (also with DIREct SPARse).


Prof. R.L. Taylor

  • Administrator
  • FEAP Guru
  • *****
  • Posts: 2649
« Reply #24 on: June 21, 2021, 02:14:14 PM »
In your material specification, are all the materials different, or only their parameters?  Do they use history variables? It seems that changes in 8.5 to 8.6 may have something to do with the number of material sets.

Did you try CG with the serial version?

arktik

« Reply #25 on: June 22, 2021, 05:07:27 AM »
All materials are the same (e.g. testing was done with elastic isotropic) with different material constants. However, the simplified version (linear elastic isotropic, no history) does not work either for a problem with ca. 2000 'elastic isotropic' grains having 2x10^ DOF.

I also ran serial FEAP (v8.6.1j) with ITERATIOn BPCG. It helped a bit. Instead of the **Diverged due to Indefinite Matrix** or *NO Convergence: ITERATIONS exceeded* messages, BPCG shows convergence, although it is very slow (e.g. 10 Newton iterations achieve R_norm ~ 1E-5).

I found out that another BVP which uses just one material tag (a user element) with ca. 10x10^6 DOF does not converge with v8.6.1j. The same BVP and user element works without trouble with v8.5.2i.

Prof. R.L. Taylor

« Reply #26 on: June 22, 2021, 07:23:30 AM »
Just out of curiosity, what happens if you run the command CHECk?
The mystery is still why 8.5 works and 8.6 does not?

arktik

« Reply #27 on: June 22, 2021, 08:18:23 AM »
CHECk shows no red flags. For your information, here is the output of the test problem
Code: [Select]
Restrained Degree-of-Freedom Check
               DOF   Fixed
          ----------------
                 1  816827
                 2  807531
                 3  808627
          ----------------
                On 2000376 Nodes

A brief summary of findings so far:
  • The new release v8.6.1j definitely improved convergence in parallel FEAP compared to v8.6.1h.
  • That conclusion is based on testing simple problems using the standard library with DOF < 1E4.
  • More complex problems (DOF > 1E6 from INCLude files (mesh) and/or material tags > 1000) fail to converge.
  • However, problems with a mesh generated from BLOCK and DOF > 1E6 run without problems.
Overall, I am suspecting some issue with using INCLude statement to import coordinates and elements (created for FEAP from third-party programs). But I may be totally wrong!
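For reference, the CHECk output in this post can be turned into counts of restrained and free DOFs. The sketch below assumes 3 DOFs per node, as the three DOF rows in the output suggest; the function and variable names are illustrative only.

```python
def dof_summary(fixed_per_dof: dict, nodes: int) -> tuple:
    """Return (total, fixed, free) DOF counts from a CHECk-style table."""
    total = len(fixed_per_dof) * nodes  # assumes every node carries every DOF
    fixed = sum(fixed_per_dof.values())
    return total, fixed, total - fixed


if __name__ == "__main__":
    # Values copied from the CHECk output above.
    fixed_per_dof = {1: 816827, 2: 807531, 3: 808627}
    total, fixed, free = dof_summary(fixed_per_dof, nodes=2000376)
    print(f"total={total} fixed={fixed} free={free} ({fixed / total:.1%} restrained)")
```

Under that assumption the model has 6001128 DOFs in total, of which 2432985 are restrained and 3568143 are free, consistent with the ~6 million DOFs quoted earlier in the thread.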

Prof. R.L. Taylor

« Reply #28 on: June 22, 2021, 09:56:43 AM »
The only problem with import is if there are too many digits for numbers (more than 15).
When you did CHECk, were you using a FEAP element or one of yours?  I think CHECk would catch problems caused by an include.

arktik

« Reply #29 on: June 22, 2021, 10:16:46 AM »
The CHECk was performed with the standard library element. The digits for numbers in imported mesh are less than 15.
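A check of this kind can be scripted for a large mesh file. The sketch below is only an assumption about how one might do it: it treats the limit as the character length of each numeric field (counting only the digits would be a variation), it splits fields on blanks and commas, and the function name `long_numeric_fields` and the sample text are hypothetical.

```python
def long_numeric_fields(text: str, max_len: int = 15) -> list:
    """Return (line_number, token) pairs for numeric tokens longer than max_len."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for token in line.replace(",", " ").split():
            try:
                # Accept Fortran-style exponents such as 1.0d0 as numbers.
                float(token.lower().replace("d", "e"))
            except ValueError:
                continue  # skip keywords such as COORdinates or ELEMents
            if len(token) > max_len:
                hits.append((lineno, token))
    return hits


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = (
        "COORdinates\n"
        " 1 0 0.0 1.0 2.5\n"
        " 2 0 0.12345678901234567 1.0 2.5\n"
    )
    for lineno, token in long_numeric_fields(sample):
        print(f"line {lineno}: {token} ({len(token)} characters)")
```

To scan an actual include file, read its contents into a string and pass it to `long_numeric_fields`; an empty result means no field exceeded the assumed limit.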