My ongoing troubleshooting with parallel v8.6.1j has shed some new light on the possible origins of the bug. I haven't yet incorporated debugging with valgrind. So far it's more of a mechanistic debugging

1. Two identical BVPs (DOF>1E6) are tested: one where mesh is generated with FEAP
BLOCk and second where mesh is imported with
INCLude using 3rd party programs. Both work correctly with standard library as well as user element. Therefore, a potential bug in
INCLUde can be ruled out.
2. Partitioning done with
v8.5.2 is slightly different from
v8.6.1. E.g.
v8.5.2 prints
EREGions in partitioned files which is missing in
v8.6.1.
3. Series of identical BVPs with increasing number of grains (=material tags) are performed. When grains (material tags) > 999, user element (solid3d with ndf=4) throws the following error
Material Number1000: Element Type: user : ELMT =***
Element Material Set = 1
*ERROR* ELMLIB: Element: 0, type number*** input, isw = 1
RANK = 0
Feap standard library (3d elastic orthotropic) does not throw this error but simply stops converging.
4. For grains (material tags) <=999 standard library and user elements work correctly doesn't matter how many DOF.
Somehow the argument
jel to
program/elmlib.f seems to be corrupted when material tags > 999. One possible explanation as to how this corruption leads to divergence is that elements with material tags > 999 have garbage properties. This I haven't tested yet.