6 Common Errors When Building a Raspberry Pi Cluster

Though the Internet is a vast resource for getting a Pi cluster up and running, there is no complete list of errors to consult when you’re stuck. The same error on two different clusters can often have two totally different meanings, which makes problems even harder to solve. These issues are becoming more relevant as the clusters grow in popularity around the world.

Raspberry Pis are affordable and can be clustered to build a supercomputer or a cloud computing cluster. They can also be used for many other applications, such as desk automation (see “Raspberry Pi: Make a Bench automation computer”). Bobo Cloud, for example, is an open-source cloud service for students built on Raspberry Pis.

As these clusters become more common, there will be an increasing need for documentation and other resources. This post is intended as a resource for troubleshooting. It describes six common types of errors that beginners working with Raspberry Pi clusters may encounter, along with possible solutions.

Power Issues

Errors:

  • First boot password change is not allowed or password change on first boot hangs the Pi each time.
  • Whenever the Raspberry Pi is powered up, only two of its ports work. For example, the two USB ports might work while the HDMI port connected to a terminal and the Ethernet port don’t. Or the HDMI port and one USB port work while the rest of the ports on the device are disabled.
  • Midway into operating, the Raspberry Pi restarts and continues to restart repeatedly at irregular intervals.
  • All ports work fine but the Pi stops responding to key presses.

Problem and solution:

These problems are caused by insufficient power to the Pi. The Raspberry Pi is designed to run on low power, but when the supply drops well below what is required, the Pi still runs but not at full capacity.

An ideal power supply for the Raspberry Pi Model B is 5 V, 2 A. Mobile phone adapters are often used to power the Pi, but most of the time they have a much lower power rating, which leads to the above problems.

Also, if the adapter delivers power through a micro USB cord, a low-quality adapter loses power in transmission, so the full 5 V is not delivered.

You might be able to work around some of the above errors. For example, undetected key presses can be addressed by changing the configured speed of USB transfers. Doing so merely delays the onset of the other errors, though. Instead, get a good-quality adapter to fix these errors.
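To confirm that power is the culprit before buying a new adapter, recent Raspberry Pi firmware can report under-voltage through the `vcgencmd get_throttled` command. The sketch below only decodes that command's output, a line such as `throttled=0x50005`. The function names are my own, and it assumes `vcgencmd` is installed, which may not be the case on older images.

```python
import subprocess
from typing import List

# Bit positions reported by `vcgencmd get_throttled`.
THROTTLE_FLAGS = {
    0: "under-voltage detected now",
    1: "ARM frequency capped now",
    2: "currently throttled",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
}


def decode_throttled(output: str) -> List[str]:
    """Turn a line such as 'throttled=0x50005' into readable flag names."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [name for bit, name in sorted(THROTTLE_FLAGS.items())
            if value & (1 << bit)]


def read_throttled() -> List[str]:
    """Query the firmware. Only works on a Pi that has vcgencmd installed."""
    result = subprocess.run(["vcgencmd", "get_throttled"],
                            capture_output=True, text=True, check=True)
    return decode_throttled(result.stdout)
```

Calling `read_throttled()` on a node and getting back any of the under-voltage entries is a strong hint that the adapter or cord is too weak. An empty list means the firmware has not seen a power problem since the last boot.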

Overlapping MPI Problems

Errors:

  • mpiexec crash
  • ssh error: error passing parameters

Problem and solution:

These errors occur generally due to the overlapping of multiple MPI distributions. The most commonly used and ideal distributions are OpenMPI and MPICH. Linux generally uses MPICH.

When you install packages directly onto a system that already has one of the MPI distributions, overlapping can occur, corrupting mpiexec and changing the behavior of mpicc. For example, directly installing Python packages that run with MPI, or installing the same MPI distribution more than once (shared and unshared), leads to clashes. This happens because a direct installation sometimes doesn’t check full compatibility. It only checks the dependencies, and if that check passes, the packages are downloaded and installed. These packages might silently install an MPI distribution of their own even though you have already installed one. They might also change the system path for MPI. Either can corrupt the entire MPI installation.

The solution is to build each package manually. Installation guides specific to each MPI distribution are typically available. If they are not, or if a manual build is not possible, make sure you create a restore point before the installation so that any corruption can be rolled back.
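One quick way to spot overlapping installations is to look for more than one copy of mpiexec or mpicc on the search path. Here is a minimal sketch of that idea. The helper name `find_on_path` is hypothetical, and the check only sees executables reachable through PATH, so it will miss copies installed elsewhere.

```python
import os
from typing import List, Optional


def find_on_path(name: str, path: Optional[str] = None) -> List[str]:
    """Return every distinct executable called `name` found on the search path."""
    if path is None:
        path = os.environ.get("PATH", "")
    found: List[str] = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            # Resolve symlinks so that two links to one binary count once.
            real = os.path.realpath(candidate)
            if real not in found:
                found.append(real)
    return found
```

Running `find_on_path("mpiexec")` and `find_on_path("mpicc")` on each node and getting more than one result for either is a sign that two MPI installations are competing.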

Hostname Issues

Errors:

  • hostname not resolved
  • $pi@(none):

Problem and solution:

This error was one of the most confusing to me. Why? Because a node wouldn’t know its hostname and would give this error, yet it would still perform the task given to it about 60 percent of the time. This occurs only if the hostname has been changed from its default to something the user chooses in order to tell the nodes apart.

The “hostname not resolved” error can be cleared by changing the hostnames in two places:

  • sudo nano /etc/hostname
  • sudo nano /etc/hosts

Run the above commands, each of which opens a file. In that file, change the default hostname to the hostname that is required.

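Finally, the pi@(none) or (none) hostname errors occur when the given hostname contains a ‘-’ (hyphen) or any other symbol. The solution is to edit both files mentioned above and replace the illegal symbol in the hostname with an ‘_’ (underscore). That is the fix that worked here. Note, however, that the Internet hostname rules (RFC 1123) allow only letters, digits, and hyphens, so if other tools reject the underscore, a name made of letters and digits alone is the safest choice.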

Below are two images of the file /etc/hosts before and after changing the hostname from “akshay” to “akshay_001.”

[Images: Rpi1 (before) and Rpi2 (after)]


The same has to be done for the other file. Save and reboot for the changes to be applied successfully.
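Because the name has to match in both files, it can help to check them against each other before rebooting. The following is a small sketch of such a check. The function names are hypothetical, and it works on the text of the two files so that it can be tried without touching a real node.

```python
from typing import List


def names_in_hosts(hosts_text: str) -> List[str]:
    """Collect every hostname and alias listed in /etc/hosts-style text."""
    names: List[str] = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        names.extend(fields[1:])  # fields[0] is the IP address
    return names


def hostname_is_consistent(hostname_text: str, hosts_text: str) -> bool:
    """True when the name in /etc/hostname also appears in /etc/hosts."""
    hostname = hostname_text.strip()
    return bool(hostname) and hostname in names_in_hosts(hosts_text)
```

On a node, you would pass in the contents of the two files, for example `hostname_is_consistent(open("/etc/hostname").read(), open("/etc/hosts").read())`. A result of `False` means one of the two files still carries the old name.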

HDMI Port Problems

Errors:

  • insufficient ports for connecting a display terminal
  • choosing the right display

Problem and solution:

More than an error, this is a difficulty that many beginners face. It happens when your terminal or monitor has no HDMI port, or when you want to connect a different display to the Raspberry Pi.

The solution depends on the materials you have available. The options for connecting a display are:

  • Use a display with an HDMI port.
  • Use an HDMI to VGA converter and connect it to the display.
  • Use an Ethernet cable to connect to a laptop’s display (this is recommended if you don’t need the Ethernet port for a LAN connection).
  • Use any computer or laptop’s display wirelessly over SSH. Once SSH has been configured on both the laptop and the Raspberry Pis, you can connect remotely, which means none of the ports on the device are required.

Repeat Login Problems

Errors:

  • Connection to another node through SSH fails.
  • Login credentials required for each remote login to other nodes.
  • Error: RSA key not safe.
  • Warning: unprotected private key file.

Problem and solution:

The main aim of using SSH is what many computer experts call “secure gateway without login.” This means you provide login credentials only the first time you log in to another node. If you are asked for credentials every time you log in to a node, something is wrong, and on a cluster of around 64 nodes that quickly becomes a disaster.

This error generally occurs when the file containing the private key has been copied to another location or its access permissions have been tampered with. Another possible cause is a hostname that is not configured correctly, in which case SSH cannot be sure whether that node exists on its network. SSH then assumes that the expected setup no longer holds, concludes that the connection is not safe, and reports these errors.

I fixed these errors by changing the access permissions on the private key file so that only the owner can read and execute it, with no permissions for the group or for others. If the hostnames are not configured properly, use the solution given for Hostname Issues above.
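The permission fix can be sketched in a few lines. This is only an illustration: the function names are mine, and the mode 600 (owner read and write) used here is a common convention rather than the exact setting described above. What SSH objects to is any permission at all for the group or for others.

```python
import os
import stat


def key_is_private(path: str) -> bool:
    """True when neither group nor others have any permission on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & 0o077) == 0


def restrict_to_owner(path: str) -> None:
    """Owner may read and write; group and others get nothing (mode 600)."""
    os.chmod(path, 0o600)
```

On a node you would point these at the private key, which for RSA keys is usually `~/.ssh/id_rsa` (expand it with `os.path.expanduser` first). If `key_is_private` returns `False`, calling `restrict_to_owner` on the same path should clear the “unprotected private key file” warning.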

Shared Libraries Problem

Errors:

  • Build failed: mpicc not found.
  • --enable-shared option not recognized.

Problem and solution:

We experienced these errors for two reasons. First, we had already built mpich2 without shared libraries and then tried to build a shared version alongside it. In theory this shouldn’t be a problem, and many published papers suggest that mpich2 can be built with shared libraries alongside a version without them. It turns out, however, that this does not work with newer versions of mpich.

Second, for some users only the --enable-shared option might not be recognized. This happens because the build script cannot locate mpicc or some other dependency.

The solution is to build only the mpich version with shared libraries. It is advisable to build it in the /usr/local directory of the Raspbian OS, which removes any ambiguity in the PATH information and leads to a successful build. Note that shared libraries matter only when you need a dynamic library (.so) rather than a static library (.a). So if libraries need to be loaded at runtime, mpich should be configured with shared libraries.
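After the build, one simple check is to look for .so files under the install prefix. The sketch below does just that. The function name is hypothetical, and the `libmpi*` pattern is my assumption about how the MPI libraries are named, which can vary between mpich versions.

```python
import glob
import os
from typing import List


def shared_mpi_libraries(prefix: str = "/usr/local") -> List[str]:
    """List shared objects whose names start with libmpi under <prefix>/lib."""
    pattern = os.path.join(prefix, "lib", "libmpi*.so*")
    return sorted(glob.glob(pattern))
```

If `shared_mpi_libraries()` returns an empty list while static .a files are present in the same directory, the build most likely ran without shared libraries enabled.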
