I am not sure what you mean by kernels, but if you are referring to processors or compute nodes (multiple processors with dedicated memory on a card), then there is really no limitation. You should, however, pay attention to the solver you are using: parfeap requires PETSc or some other parallel solver library, and these can have their own limitations.
Personally, I run simulations with PETSc and have not hit any hard limit. That said, depending on the size of my problem, I may switch from a direct to an iterative solver, and sometimes use a preconditioner such as multigrid.
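
To make the direct-versus-iterative choice concrete, here is a minimal sketch of how one might pick PETSc runtime options from the problem size. The helper name `petsc_solver_options` and the `direct_limit` threshold are made up for illustration and are not part of FEAP or PETSc; the option strings themselves (`-ksp_type`, `-pc_type`, `preonly`, `lu`, `cg`, `gmres`, `gamg`) are standard PETSc options. The right cutoff depends on your memory and mesh, so treat the number as a placeholder.

```python
def petsc_solver_options(n_equations, direct_limit=200_000, symmetric=True):
    """Return PETSc runtime options for a direct or an iterative solve.

    direct_limit is an illustrative threshold (an assumption), not a
    PETSc or FEAP constant; tune it to the memory available per node.
    """
    if n_equations <= direct_limit:
        # Direct solve: no Krylov iterations, just apply an LU factorization.
        # In parallel, LU usually needs an external package such as MUMPS
        # or SuperLU_DIST to be built into PETSc.
        return ["-ksp_type", "preonly", "-pc_type", "lu"]
    # Iterative solve: CG for symmetric positive definite systems, GMRES
    # otherwise, preconditioned with PETSc's algebraic multigrid (GAMG).
    ksp = "cg" if symmetric else "gmres"
    return ["-ksp_type", ksp, "-pc_type", "gamg"]


if __name__ == "__main__":
    print(" ".join(petsc_solver_options(50_000)))     # small problem
    print(" ".join(petsc_solver_options(5_000_000)))  # large problem
```

The resulting strings are ordinary PETSc options; how you hand them to your parfeap run (command line or options file) depends on your installation, so check that against your own setup.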