Author Topic: handicraft cluster  (Read 6352 times)

blackbird

  • Full Member
  • ***
  • Posts: 100
handicraft cluster
« on: November 06, 2014, 01:19:09 AM »
Dear FEAP Admin,

While the HPC system of our university is down, I have been tinkering with running parFEAP on a collection of several decommissioned single machines. I am using FEAP 8.3 on Ubuntu 14.04. The distribution of the tasks is handled through the machinefile (IP:number of cores). Additionally, the following is needed (a sketch of these files follows the list):
- enable XDMCP (/etc/lightdm/user.conf)
- exchange the SSH public keys (ssh-keygen, ssh-copy-id)
- each participating computer needs to know the other computers by name instead of by IP (/etc/hosts)
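For reference, a minimal sketch of what this looks like (host names, core counts, and the user name are only placeholders, not my actual setup):

  # machinefile: one line per machine, IP or host name, then number of cores
  node01:4
  node02:4

  # /etc/hosts on every participating machine (IPs shortened as above)
  #.#.1.#   node01
  #.#.1.#   node02

  # exchange the SSH public key once from the launching machine
  ssh-keygen
  ssh-copy-id user@node01
  ssh-copy-id user@node02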

As long as these machines live in a separate IP range (#.#.1.#), everything is fine and they work well together. Now I have tried to connect another machine to my "cluster". Unfortunately, this one comes into the system from the outside with an IP of the form #.#.0.#, and every time I try to call it I get the error:

bash: $PETSC_DIR/$PETSC_ARCH/bin/hydra_pmi_proxy: No such file or directory

Actually, I am not quite sure whether it is really the IP that is causing the trouble, but it is the only thing left:
- the OSes on the machines are clones with adjusted IP, host name, and SSH key
- the file hydra_pmi_proxy exists (a quick look inside suggests it is a binary, not human-readable); a check across all nodes is sketched below
- the machines can ping each other
- the problem occurs whether the parallel calculation is started from #.#.1.# or from #.#.0.#
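As far as I understand, mpiexec starts hydra_pmi_proxy on the remote machines over SSH using the same path as on the launching machine, so the proxy should exist at the identical absolute path everywhere. One way to check this on all nodes at once (host names are placeholders):

  # confirm hydra_pmi_proxy sits at the same absolute path on every node
  for h in node01 node02 node03; do
      ssh $h "ls -l $PETSC_DIR/$PETSC_ARCH/bin/hydra_pmi_proxy"
  done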

Did you ever encounter that error? Do you know why this is not working? Can you suggest a workaround?

Greetings, Christian

Prof. S. Govindjee

  • Administrator
  • FEAP Guru
  • *****
  • Posts: 1165
Re: handicraft cluster
« Reply #1 on: November 06, 2014, 08:22:44 AM »
I have not seen this error but it looks to me to be a problem with your message passing interface.  You could try using a different implementation, for example, openmpi.

blackbird

  • Full Member
  • ***
  • Posts: 100
Re: handicraft cluster
« Reply #2 on: November 07, 2014, 05:29:28 AM »
I did not know there was a way to switch to OpenMPI in parFEAP. Do I need additional source files? Do you have documentation on how to set up parFEAP to use OpenMPI?

Prof. S. Govindjee

  • Administrator
  • FEAP Guru
  • *****
  • Posts: 1165
Re: handicraft cluster
« Reply #3 on: November 07, 2014, 07:36:00 AM »
FEAP does not care which MPI implementation you use.  This is selected when you build your version of PETSc.  In other words, try building your PETSc with a different MPI system (see the PETSc installation pages).  Then completely rebuild your parFEAP (delete all object files and archives in the parfeap directory, then rebuild).
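A rough sketch of how that could look (the configure options, paths, and make target shown here are only an example and will depend on your installation):

  # reconfigure PETSc to download and build OpenMPI
  cd $PETSC_DIR
  ./configure --download-openmpi --download-fblaslapack
  make all

  # then force a clean rebuild of parFEAP
  cd parfeap      # inside your FEAP 8.3 source tree
  rm -f *.o *.a
  make            # or the install target your parfeap makefile provides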

blackbird

  • Full Member
  • ***
  • Posts: 100
Re: handicraft cluster
« Reply #4 on: November 11, 2014, 04:24:05 AM »
Dear Prof. S. Govindjee,

I found the solution to my problem. It had nothing to do with the MPI version. Instead, it was a matter of setting the paths up correctly. Since all the machines in #.#.0.# were clones of each other, their FEAP executable was in the same place and could be reached via the same export inside a shell script. Unfortunately, the machine in #.#.1.# had its FEAP executable in a slightly different folder, so whenever this machine was included in a calculation the executable was not found. I fixed the directory name and now everything is working fine.
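In case it helps someone else, a small check along these lines (host names and the path are placeholders) would have caught the mismatch right away:

  # verify the parallel FEAP executable sits at the same path on every machine
  FEAPRUN=$HOME/feap83/parfeap/feap    # adjust to your install
  for h in node01 node02 node03; do
      ssh $h "test -x $FEAPRUN && echo $h: ok || echo $h: missing"
  done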

Thanks, Christian