Dear FEAP Admin,
While the HPC system of our university is down, I have been trying to run parFEAP on a small network of several retired standalone machines. I am using FEAP 8.3 on Ubuntu 14.04. The tasks are distributed through the machinefile, with one entry per machine in the form IP:number of cores. In addition, the following was needed:
- XDMCP enabled (/etc/lightdm/user.conf)
- SSH public keys exchanged between the machines (ssh-keygen, ssh-copy-id)
- each participating computer needs to know the other computers by name instead of by IP (/etc/hosts)
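For reference, the setup described above could be written down roughly as follows. This is only a sketch: the addresses, host names (node1, node2), user name and core counts are placeholders and not the real values, and the files are written to a temporary directory instead of their real locations.

```shell
#!/bin/sh
# Sketch only: all addresses, names and core counts below are placeholders.
workdir=$(mktemp -d)

# machinefile: one line per machine, in the form IP:number of cores
cat > "$workdir/machinefile" <<'EOF'
192.168.1.10:4
192.168.1.11:4
EOF

# Lines to add to /etc/hosts on every machine, so that the
# machines know each other by name instead of by IP
cat > "$workdir/hosts.fragment" <<'EOF'
192.168.1.10 node1
192.168.1.11 node2
EOF

# Key exchange, run once per user on each machine
# (left as comments here because it needs the network):
#   ssh-keygen -t rsa
#   ssh-copy-id user@node2

cat "$workdir/machinefile"
```

After the key exchange, a plain "ssh node2" from node1 should log in without asking for a password.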
As long as these machines live in a separate IP range (#.#.1.#), everything is fine and they work well together. Now I have tried to connect another machine to my "cluster". Unfortunately, this one joins from outside with the IP #.#.0.#, and every time I try to call it I get the error:
bash: $PETSC_DIR/$PETSC_ARCH/bin/hydra_pmi_proxy: No such file or directory
I am not quite sure whether the IP is really what causes the trouble, but it is the only difference left:
- the OS installations on the machines are clones with adjusted IP, name and SSH key
- the file hydra_pmi_proxy exists (from a quick look inside, it does not seem to be human-readable)
- the machines can ping each other
- the problem occurs regardless of whether the parallel calculation is started from #.#.1.# or from #.#.0.#
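The checks above can be repeated with a small script along the following lines. It is only a sketch: the function names proxy_ok and check_node are made up for illustration, the node name passed to check_node would be a placeholder, and it assumes that PETSC_DIR and PETSC_ARCH are set on every machine. The remote check is defined but not called, since it needs the network.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; PETSC_DIR and PETSC_ARCH
# are assumed to be set on each machine.

# Return 0 if the given path is an existing, executable regular file.
proxy_ok() {
    [ -f "$1" ] && [ -x "$1" ]
}

# Check one remote machine: reachable by ping, and the proxy is
# executable there. Defined only; call it as: check_node node2
check_node() {
    node=$1
    ping -c 1 "$node" >/dev/null 2>&1 || { echo "$node: no ping"; return 1; }
    # Single quotes: the variables are expanded by the remote,
    # non-interactive shell, not by the local one.
    ssh "$node" 'test -x "$PETSC_DIR/$PETSC_ARCH/bin/hydra_pmi_proxy"' \
        || { echo "$node: hydra_pmi_proxy not found"; return 1; }
    echo "$node: ok"
}

# Local check on the machine that starts the run
if proxy_ok "$PETSC_DIR/$PETSC_ARCH/bin/hydra_pmi_proxy"; then
    echo "local: ok"
else
    echo "local: hydra_pmi_proxy not found"
fi
```

The remote test runs in the same kind of non-interactive shell that ssh starts for a remote command, so it also shows whether the two variables are defined there.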
Did you ever encounter that error? Do you know why this is not working? Can you suggest a workaround?
Greetings, Christian