Author Topic: handicraft cluster  (Read 6352 times)

blackbird

  • Full Member
  • ***
  • Posts: 100
handicraft cluster
« on: November 06, 2014, 01:19:09 AM »
Dear FEAP Admin,

While the HPC system of our university is down, I have been tinkering with running parFEAP on a collection of several decommissioned single machines. I am using FEAP 8.3 on Ubuntu 14.04. The distribution of the tasks is handled through the machinefile (IP:number of cores). Additionally, the following is needed (a sketch of these files follows the list):
- enable XDMCP (/etc/lightdm/user.conf)
- exchange the SSH public keys (ssh-keygen, ssh-copy-id)
- each participating computer needs to know the other computers by name instead of by IP (/etc/hosts)
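For reference, a minimal sketch of what this looks like (host names, core counts, and the user name are only placeholders, not my actual setup):

  # machinefile: one line per machine, IP or host name, then number of cores
  node01:4
  node02:4

  # /etc/hosts on every participating machine (IPs shortened as above)
  #.#.1.#   node01
  #.#.1.#   node02

  # exchange the SSH public key once from the launching machine
  ssh-keygen
  ssh-copy-id user@node01
  ssh-copy-id user@node02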

As long as these machines live in a separate IP range (#.#.1.#), everything is fine and they work well together. Now I have tried to connect another machine to my "cluster". Unfortunately, this one comes into the system from the outside with an IP of the form #.#.0.#, and every time I try to call it I get the error:

bash: $PETSC_DIR/$PETSC_ARCH/bin/hydra_pmi_proxy: No such file or directory

Actually, I am not quite sure whether it is really the IP that is causing the trouble, but it is the only thing left:
- the OSes on the machines are clones with adjusted IP, host name, and SSH key
- the file hydra_pmi_proxy exists (a quick look inside suggests it is a binary, not human-readable); a check across all nodes is sketched below
- the machines can ping each other
- the problem occurs whether the parallel calculation is started from #.#.1.# or from #.#.0.#
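As far as I understand, mpiexec starts hydra_pmi_proxy on the remote machines over SSH using the same path as on the launching machine, so the proxy should exist at the identical absolute path everywhere. One way to check this on all nodes at once (host names are placeholders):

  # confirm hydra_pmi_proxy sits at the same absolute path on every node
  for h in node01 node02 node03; do
      ssh $h "ls -l $PETSC_DIR/$PETSC_ARCH/bin/hydra_pmi_proxy"
  done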

Did you ever encounter that error? Do you know why this is not working? Can you suggest a workaround?

Greetings, Christian

Prof. S. Govindjee

  • Administrator
  • FEAP Guru
  • *****
  • Posts: 1165
Re: handicraft cluster
« Reply #1 on: November 06, 2014, 08:22:44 AM »
I have not seen this error but it looks to me to be a problem with your message passing interface.  You could try using a different implementation, for example, openmpi.

blackbird

  • Full Member
  • ***
  • Posts: 100
Re: handicraft cluster
« Reply #2 on: November 07, 2014, 05:29:28 AM »
I did not know there was a way to switch to OpenMPI in parFEAP. Do I need additional source files? Do you have documentation on how to set up parFEAP to use OpenMPI?

Prof. S. Govindjee

  • Administrator
  • FEAP Guru
  • *****
  • Posts: 1165
Re: handicraft cluster
« Reply #3 on: November 07, 2014, 07:36:00 AM »
FEAP does not care which MPI implementation you use.  This is selected when you build your version of PETSc.  In other words, try building your PETSc with a different MPI system (see the PETSc installation pages).  Then completely rebuild your parFEAP (delete all object files and archives in the parfeap directory, then rebuild).
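A rough sketch of how that could look (the configure options, paths, and make target shown here are only an example and will depend on your installation):

  # reconfigure PETSc to download and build OpenMPI
  cd $PETSC_DIR
  ./configure --download-openmpi --download-fblaslapack
  make all

  # then force a clean rebuild of parFEAP
  cd parfeap      # inside your FEAP 8.3 source tree
  rm -f *.o *.a
  make            # or the install target your parfeap makefile provides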

blackbird

  • Full Member
  • ***
  • Posts: 100
Re: handicraft cluster
« Reply #4 on: November 11, 2014, 04:24:05 AM »
Dear Prof. S. Govindjee,

I found the solution to my problem. It had nothing to do with the MPI version. Instead, it was a matter of setting the paths up correctly. Since all the machines in #.#.0.# were clones of each other, their FEAP executable was in the same place and could be reached via the same export inside a shell script. Unfortunately, the machine in #.#.1.# had its FEAP executable in a slightly different folder, so whenever this machine was included in a calculation the executable was not found. I fixed the directory name and now everything is working fine.
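In case it helps someone else, a small check along these lines (host names and the path are placeholders) would have caught the mismatch right away:

  # verify the parallel FEAP executable sits at the same path on every machine
  FEAPRUN=$HOME/feap83/parfeap/feap    # adjust to your install
  for h in node01 node02 node03; do
      ssh $h "test -x $FEAPRUN && echo $h: ok || echo $h: missing"
  done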

Thanks, Christian