Manipulating image pixels with python scikit image – color schemes

Scikit-image is an image processing library in Python. It is a huge collection of algorithms and is available free of cost. This library can perform a variety of complex image processing tasks like denoising, color scheme changing, image stabilization, perspective correction, edge and contour detection, etc. There are many powerful techniques that can be used to manipulate image pixels. As the first article in the series of image processing articles with scikit-image, this tutorial starts with the basics of changing the color spaces of images.

Color space indicates the color scheme used by an image. Most images are in the RGB (Red, Green, Blue) color space, as in the image shown below.

[Image: the Lena test image]

Interesting fact: this image, known as Lena, is one of the standard test images used in image processing since 1973. It is a picture of Lena Söderberg, shot by photographer Dwight Hooker, and appeared in the November 1972 issue of Playboy magazine.
Source: Wikipedia

We will use this as the test image for the rest of the tutorial. All the color schemes or scales belong to scikit-image's color package. Before we get to the color schemes, we'll define a way to read an image with scikit-image.

1. Read, write and show images using scikit

Images in scikit are in the form of numpy arrays.  You can use scikit to read an image as a numpy array, apply the algorithms and write the arrays back as an image. Use the following code to read/ write an image.

from skimage.io import imread
from skimage.io import imsave

# read the image
inp_image = imread("/home/akshay/lena.png")
# replace the path above with the absolute path of the image you want to read

#write image back to file

#parameter 1: path where the image has to be saved.
#parameter 2: the array of the image.
imsave("/home/akshay/new_lena.png",inp_image)

If the image is not in the form of integer values, it cannot be saved. This happens when you apply some thresholding or manipulating functions. In such a case, you will have to first convert the image values to integer and then save it. The following code will help you do just that:

from skimage import img_as_uint

imsave("/home/akshay/new_lena.png",img_as_uint(inp_image)) #img_as_uint is the secret to correctly save images that cannot be saved directly!!

You can also see the image that you have read or are just about to write by calling the imshow method as shown below:

from skimage.io import imshow, show

imshow(inp_image,'matplotlib')
show()
# 'matplotlib' tells the viewer to use the matplotlib plugin while plotting the image.

Now that we know how to read, write and see images programmatically, we can start with the techniques to change color scales.

2. RGB to Gray (or Grey)

RGB is a 3 channel color scheme with Red, Green and Blue channels, whereas gray (or grey) is a single channel scheme.

from skimage.io import imread, imsave
from skimage.color import rgb2gray, rgb2grey

inp_image = imread("/home/akshay/lena.png")

img_gray = rgb2gray(inp_image) # rgb2grey(inp_image) can also be used

imsave("/home/akshay/lena_gray.png",img_gray)

The images below show the image before and after conversion.

[Image: Lena before and after grayscale conversion]

This technique is used as a pre-processing step in many image processing techniques like thresholding/binarization. It is very useful when you do not need a 3 channel pixel like (120, 140, 30) for R, G and B respectively, but only a single value like (133) that represents the gray intensity.
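A quick way to see this difference is to compare the array shapes before and after the conversion. Below is a minimal sketch, reusing the same placeholder path as earlier:

from skimage.io import imread
from skimage.color import rgb2gray

inp_image = imread("/home/akshay/lena.png")  # placeholder path, replace with your own
img_gray = rgb2gray(inp_image)

print(inp_image.shape)  # e.g. (512, 512, 3) -> three values (R, G, B) per pixel
print(img_gray.shape)   # e.g. (512, 512)    -> one gray value per pixel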

3. Gray (or grey) to RGB

There are cases when you have a single channel grayscale image and you want to convert it into a 3 channel image; this can be done using the gray2rgb method of scikit-image. What you must understand is that this does not convert a grayscale image to color: a black and white image remains a black and white image after conversion. The only thing that changes is that, before the conversion, each pixel is represented as a single value, for example (120), and after conversion it is represented in 3 channels, for example (120, 120, 120). Thus, you will have an RGB image.

from skimage.io import imread, imsave
from skimage.color import gray2rgb

inp_img2 = imread("/home/akshay/lena_gray.png")
rgb_img = gray2rgb(inp_img2)
imsave("/home/akshay/lena_gray2color.png",rgb_img)

The images below are before and after applying the conversion. As you can see, there is no visual change.

[Image: grayscale Lena before and after gray2rgb conversion]


4. RGB to HSV

The HSV color space stands for Hue, Saturation and Value. It is closely related to the HSI and HSB color spaces, where 'I' indicates Intensity and 'B' indicates Brightness. The HSV color space is used a lot in the artist and design communities to understand and visually see the lighting intensity or the hue present in a given image. Some image processing tasks that rely on intensities need the image to be in the HSV color space. Here is how you can convert an RGB image to HSV.

from skimage.io import imread, imsave
from skimage.color import rgb2hsv

inp_image = imread("/home/akshay/lena.png")
hsv_img = rgb2hsv(inp_image)
imsave("/home/akshay/lena_hsv.png",hsv_img)

The images below show the image before and after applying the conversion:

[Image: Lena before and after RGB to HSV conversion]

5. Binarization or Thresholding

Binarization, a.k.a. thresholding, refers to converting an image of any type to a binary image. Each pixel in a binary image can be either black (0) or white (1). It is important to note that for binarizing an image, the image should be in grayscale first. So we start with an RGB image and convert it to grayscale before applying the thresholding to it.

from skimage.color import rgb2gray
from skimage.io import imread, imsave
from skimage.filters import threshold_otsu
from skimage import img_as_uint

inp_image = imread("/home/akshay/lena.png")
img_gray = rgb2gray(inp_image)

thresh = threshold_otsu(img_gray)
binary_thresh_img = img_gray > thresh

imsave("/home/akshay/lena_thresh.png", img_as_uint(binary_thresh_img))

You might have observed that while saving the image, we are using the img_as_uint method on the thresholded image. This is done because, once the image has been binarized, its values are no longer integers (they are booleans) and thus it cannot be saved directly. By using img_as_uint we convert the image values to unsigned integers, which can then be saved by scikit-image.
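To see why this conversion is needed, you can inspect the dtype of a binary array before and after img_as_uint. Here is a small, self-contained sketch that uses a tiny hand-made boolean array standing in for binary_thresh_img:

import numpy as np
from skimage import img_as_uint

# a tiny boolean "image", standing in for the thresholded array above
binary = np.array([[True, False], [False, True]])

print(binary.dtype)               # bool -- imsave cannot write this directly
print(img_as_uint(binary).dtype)  # uint16 -- safe to pass to imsave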

The following image shows the effect of applying Otsu thresholding on the image. Thresholding is used in many image-based solutions like Optical Character Recognition, template recognition, etc.

[Image: Lena after Otsu thresholding]


Django, Python web framework installation

Django is a Python web framework. It helps you rapidly build high performance and efficient web applications. It's very much liked by the developer community because of some of its amazing features like its template system, URL design, etc. Django supports both Python 2.7.x and Python 3.x. Some of the famous web applications built using Django are:

  • Instagram – A photo sharing app for Android and iOS.
  • Matplotlib – A powerful Python 2D plotting library.
  • Pinterest – A virtual pin board to share things you find on the web.
  • Mozilla – Creators of the Firefox browser and OS.

And many, many more. This encouraged me to start learning Django and try building my own web application. But when I started searching for resources, I found it difficult as a beginner to find and install what I needed to get it up and running.

I was confused because of the variety of choices that were there to install and set it up. But after a lot of searching and experimenting, I found one straightforward method which is good enough for a beginner.

It’s important you know how to program in python so that it helps you to build awesome applications quickly. There are a lot of amazing free courses to learn python that you can make use of.

So, Here are the steps to install it.

1 Installing Python

The first step is to install Python. Generally, most Linux distributions have Python 2.7 installed by default. To check if it exists, use the following command:


python --version

you may get an output similar to

Python 2.7.6

or any other version installed. If not, then, it can be downloaded from HERE.

2 Installing a database system (SQLite)

Since most web applications need a database and querying has to be done on it, it's better to have a database set up on your system. Django supports database engines like PostgreSQL, MySQL, SQLite and Oracle. It's very simple to learn to use a DB with Python, and knowing this gives you an added advantage in landing your next high paying job.

SQLite is a lightweight database and it's good enough to begin with. For any simple web application that you develop, you can use SQLite itself and later upgrade it to suit your needs. So, to install SQLite, use the following command:


sudo apt-get install sqlite

Please do note that on some Linux systems SQLite comes preinstalled along with Python; in such cases, the above command can be ignored.

3 Installing pip and easy_install

Any previously installed versions of Django have to be removed. But if you use pip or easy_install for the installation then you don't have to worry about removing the previous versions, because pip or easy_install will do it for you. So, install both of them using the commands:


sudo apt-get install python-setuptools

The above command installs the required Python setup tools along with easy_install. In most cases, pip is preinstalled. If it isn't, install pip as described in the official documentation HERE.

Before proceeding, confirm that Python, SQLite, pip and easy_install have been installed. To do so, run the corresponding version-check commands one after another and make sure each produces a sensible output.

4 Installing a virtual environment

In this step, we install a virtual environment. After a lot of searching and testing, I found that Django can be run very easily in a virtual environment. A virtual environment is created to encapsulate all the data and resources required to run Django in one place, so that all the changes made remain in that environment itself. Another important benefit of the virtual environment is that it supports the lightweight web server provided by Django by default. This allows the installation and integration of an Apache server to be avoided.

One of the easiest ways to install a virtual environment on Linux is by using the easy_install command. This script comes with a package called python-setuptools, which we installed in a previous step. So now, we can install the environment using the following command:


sudo easy_install virtualenv

Be patient, as it may take some time depending on the speed of the internet. When finished, the terminal output should be similar to the image below.

5 Creating and setting up the virtual environment

Now we create a folder using virtualenv so that the folder can act as the virtual environment to contain Django. Type the following command in the terminal:

virtualenv --no-site-packages django-user

Here django-user is the folder that will be created and used as the environment. It will be created under the directory you are currently in. Now to start the environment use the command:


source django-user/bin/activate

Now if you see your folder name

(django-user)

at the beginning of the prompt, it means that the environment has started. Refer to the image below.

Navigate to the folder django-user using the command.


cd django-user

Upon listing the items in the folder using the ls command, you will be able to see directories like bin, lib, include and local. What this virtual environment does is ensure that any command or operation performed in the environment will not affect anything outside it. The changes are isolated, and this allows us to easily create as many environments as we want and test many things.
6 Installing the Django framework
The final step is installing Django within the environment we created in the previous step. Remember that you still have to be in the virtual environment, in the django-user folder, else Django will be installed outside the environment and cannot be used. To install Django use the command:

easy_install django

As a reference, view the following image. Note that the beginning of the prompt says (django-user), which means that you are currently in the virtual environment; before installing Django, you should be within the django-user directory. This is very important.

That's it! Django is installed on your system with all the required functionality for beginners to develop and learn the framework. Now you can go ahead and try out the Django tutorial to learn the different functionalities and run your first web app. You can find the tutorial in the official Django documentation HERE.

Learn from the best in the market. Use this offer to get all courses for $10 on Udemy.

 

9 Free Udemy courses to learn python

Python is a great programming language and I just love it. It’s easy to learn and it’s widely used in various fields of technology.

Right from the web to backend applications, Python has its place. It is also at the forefront of the machine learning field, taking artificial intelligence and data analytics to a whole new level. Now is the right time to learn Python, so I decided to gather all the free Udemy courses to help you get started.

Once you check these courses out, leave a comment sharing the course you have picked to start with and the reason for it.

If the courses that you want aren’t free, then don’t worry. We have a Sitewide-10dollars offer for a limited time.

NOTE: As most of the sale is over, these courses are NO longer free. You can still go ahead and purchase them as they are some of the best Python courses available online.

So, here are the top 9 free Udemy courses to learn Python:

I) Python course for beginners. From scratch to expert

  1. The Python Bible Everything You Need to Program in Python


II) Beginner’s course with examples and small projects

  1. Python Training for Beginners – Learn Python with Exercises


  2. Learn Python, it’s CAKE (Beginners)


III) Python With Networking and network Programming

  1. Python Network Programming – Part 1: Build 7 Python Apps


IV) Python on the web

  1. Python Web Programming


V) Introduction to python  and basics for Beginners

  1. Python Tutorial for Absolute Beginners


  2. Introduction To Python Programming – Introduction To Python for beginners


  3. Learn Python for Beginners!


  4. Python 1000: The Python Primer


Are you an intermediate or advanced Python developer?

The courses above, though free, are mostly aimed at beginners and structured that way. If you are serious about getting better at Python then these courses will not help much; they will teach you what you already know. So, here are some paid courses that you can invest in to get the best new knowledge in Python programming.

  1. Intermediate course for python 1:
  2. Intermediate course for python 2 :

Now that you have got a chance to view all the free python courses, comment on the one which you have picked and do share the reason for selecting it.

Blog series Idea : Image processing in python

I am planning a series on image processing with Python. Image processing is helpful in machine learning tasks such as computer vision. It can also be used as a pre-processing step in applications like Optical Character Recognition.

Following are the ideas I have. Please share the ones  you think are important or useful in the comments below.

  1. Image processing using scikit-image: This library has a lot of rich algorithms that help to process images. This involves tutorials on blurring, smoothing, de-noising, enhancing, histogram and perspective corrections and many more.
  2. Image processing using OpenCV: This would involve things like background and foreground separation, identifying faces in an image, etc.
  3. Using numpy with a Java image processing library (Lucene Image Retrieval) for image feature manipulation and representation. Various algorithms that work at global as well as local levels of an image can be applied to extract image features.
  4. Using PIL for pixel extraction: PIL (or Pillow) is used for performing various operations on images. RGB value extraction and other features like editing and resizing can be performed using PIL.
  5. ImageMagick: A Linux tool used to convert images from one format to another. APIs to generate logos, resize/scale images and perform other image manipulations are also provided by ImageMagick.

 

Apart from these, if you are interested in learning any other library or certain specific topics, comment and let us know and we will include it in our plan to publish articles related to that.

Build your own steganography tool with python

Steganography is the process of hiding text or files like images, documents, etc. within another file like an image, audio file or some other text. This technique has been used by many groups of people to hide and send a secret message so that only the person it is delivered to knows what it contains. To extract the message from the hidden files, many different tools can be used.

Here is an example of steganography. Person A hides his personal details within a message using a steganography tool. Only he knows that an image has some text hidden within it. Anyone else who gets that image can only see the image but will not even have a clue that it contains some data. Now person A can get back his data using the same steganographic tool in reverse order.

Steganography is different from encryption in the sense that in encryption no data is hidden, it is only converted or transformed into some other form depending on the algorithm used. Here we actually hide the data. It's up to the user whether to encrypt it or not before hiding it.

I have built a simple tool; you can either use it and improve it, or use the idea to build something better. So read further to learn more.

I wanted to try out steganography, just like I do other projects in my free time, so I started with a project to create a steganography tool that would hide messages within some data. This tool is completely built in Python.
Here I call the data to be hidden the "message" and the data in which the message is hidden the "base data". The base data that I chose is the lorem ipsum text.

Lorem ipsum is generally used in the typesetting industry to check the layout and how fonts look. It is also used to randomly fill in templates with data. So these sentences, though widely used, do not have any meaning. The reason I chose lorem ipsum as my base data is that, if configured right, I would be able to send messages within a template-like format, so no one would suspect that it contains hidden messages. For a layman, it's just some lorem ipsum data put up to see how a website looks when it gets real data.

So finally, when my project was finished, it could accept some data, generate lorem ipsum sentences and hide the data within them. It also generates a key which helps the user extract the message from the base data. This key is different for each message, and I also encrypted the key. The tool provides the option to store the key in the same file that contains the base data, or the user can make a note of the key and send it to the recipient by some other means. Thus, without that key, the data cannot be extracted.
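The actual implementation lives in the repository linked below; as a rough illustration of the general idea only (insert the message characters at random positions inside filler text and keep the positions as the key), here is a minimal, hypothetical sketch that is not the code from the repository:

import random

LOREM_WORDS = ("lorem ipsum dolor sit amet consectetur adipiscing elit "
               "sed do eiusmod tempor incididunt ut labore et dolore").split()

def hide(message):
    # build some lorem ipsum cover text, then insert each message character
    # at a random position, remembering the positions as the key
    words = [random.choice(LOREM_WORDS) for _ in range(len(message) * 5)]
    cover = list(" ".join(words))
    key = []
    for ch in message:
        pos = random.randrange(len(cover))
        cover.insert(pos, ch)
        key.append(pos)
    return "".join(cover), key

def reveal(cover, key):
    # undo the insertions in reverse order so earlier positions stay valid
    cover = list(cover)
    chars = []
    for pos in reversed(key):
        chars.append(cover.pop(pos))
    return "".join(reversed(chars))

text, key = hide("may the force be with you")
print(reveal(text, key))  # -> may the force be with you

The real tool additionally encrypts the key and can embed it in the same file as the base data, as described above.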

The source code can be found on GitHub: https://github.com/akshaypai/Stegano

Here is an example where I hide the message “may the force be with you” 😛 within the base data.

[Screenshot: hiding the message and the generated key]

As you can see, it generates a key; you have to decide whether to store it somewhere manually or store it in the file with the base data.
Below is a screenshot showing the base data in which the message is hidden.

[Screenshot: the lorem ipsum base data containing the hidden message]

So within that paragraph lies the hidden message.

I also created a visualization to demonstrate how the data is stored. Here I have used 2 colors alternately to differentiate between words: if the letters of the first word are shown in green then the letters of the second word will be in blue, and so on it alternates.

[Screenshot: visualization with alternating colors for the words]

Find synonyms and hyponyms using Python nltk and WordNet​

What are Wordnet, Hyponyms, and synonyms?

Wordnet is a large collection of words and vocabulary from the English language that are related to each other and  are grouped in some way. That’s the reason WordNet is also called a lexical database.

WordNet groups nouns, adjectives and verbs which are similar and calls these groups synsets, i.e. sets of synonyms. A synset might belong to some other synset. For example, the synsets "brick" and "concrete" belong to the synset "construction materials", and the synset "brick" also belongs to another synset called "brickwork". In the example given, brick and concrete are called hyponyms of the synset construction materials, and the synsets construction materials and brickwork are called synonyms.

You can imagine wordnet as a tree, where synonyms are nodes on the same level and hyponyms are nodes lower than the current node.

What is nltk ?

The Natural Language Toolkit (NLTK) is a Python library to process human language. Not only does it have various features to help with natural language processing, it also comes with a lot of data and corpora that can be used. WordNet is one such corpus provided by the NLTK data.

How to install nltk and Wordnet  ?

To install nltk on Linux and Mac, just run the following command :


sudo pip install nltk

For full installation details and installation on other platforms visit their official installation page.

Once nltk is downloaded, you can download wordnet using the nltk data interface. Follow the instructions given here.

How do you find all the synonyms and hyponyms of a given word ?

We can use the  downloaded data along with nltk API to fetch the synonyms of a given word directly. To fetch all the hyponyms of a word, we would have to recursively  navigate to each node  and its synonyms in the wordnet hierarchy.  Here is a python script to do that.

  • Get all synonyms or Thesaurus  for a given word

    from nltk.corpus import wordnet as wn

    input_word = raw_input("Enter word to get different meanings: ")

    for i, j in enumerate(wn.synsets(input_word)):
        print "Meaning", i, "NLTK ID:", j.name()
        print "Definition:", j.definition()
        print

    The following example finds the synonyms/synsets for the word car:

    [Screenshot: synsets and definitions for the word car]

  • Get all the hyponyms and hypernyms for a given word

    
    from nltk.corpus import wordnet as wn
    from itertools import chain

    input_word = raw_input("Enter word to get hyponyms and hypernyms: ")

    for i, j in enumerate(wn.synsets(input_word)):
        print "Meaning", i, "NLTK ID:", j.name()
        print "Hypernyms:", ", ".join(list(chain(*[l.lemma_names() for l in j.hypernyms()])))
        print "Hyponyms:", ", ".join(list(chain(*[l.lemma_names() for l in j.hyponyms()])))
        print
    
    

    Hypernyms are nothing but synsets above a given word. Getting all the hyponyms and hypernyms is also called the ontology of a word. In the following example, the ontology for the word car is extracted.

    [Screenshot: hypernyms and hyponyms (ontology) for the word car]

  • Get all Hyponyms with synsetID

    Each synset has an ID, which is nothing but the offset of that particular word in the list of all words. If you know the ID of a synset and want to find the IDs of all its hyponyms instead of meanings and definitions, you can do this:

    
    from nltk.corpus import wordnet as wn

    X = []

    id = int(raw_input("enter synset ID: "))
    wr = wn._synset_from_pos_and_offset('n', id)

    def traverse(wr):
        if len(wr.hyponyms()) == 0:
            X.append(wr.offset())
        else:
            list_hypo = wr.hyponyms()
            for each_hypo in list_hypo:
                traverse(each_hypo)

    traverse(wr)
    print X
    
    

    [Screenshot: list of hyponym synset IDs for the given synset ID]

Source : StackOverflow

Hitchhiker’s guide to learning python

This blog post will be a guide to Python resources, right from where you can start learning this amazing language to finding resources to solve complex problems in the fields of computer vision, big data, natural language processing, etc.

My aim is to refine this post to make it better each week, add more resources, add more information and eventually create a path for people to choose from. But it's going to take time to get there. Till then, I hope this continues to help you.

Please feel free to add your suggestions in the comments.

This post will be forever growing, so come back each week, to find more resources :

1) Where to begin learning Python ?

There are tons of resources out there but very few that teach you to use this language in the right way while showing you the power it has. Here is a list that has resonated well with me:

  • Head First Python: For absolute beginners who want to have a taste of all the things Python can do, this is the right book. The advantages of learning from this book are:
    • It uses a project based approach, so you can see your progress visually with what you’ve achieved so far.
    • It dives straight to programming from the first chapter and has exercises in between chapters to help you make sure you understand concepts.
    • It covers various fields like standalone applications, web applications, mobile applications. So , you know the capability  of python and where it can be used.
    • By the end of this book, you will be able to build applications on your own with very little help.

The disadvantage of this is that it's not for people who prefer an in-depth and detailed explanation of each concept.

If this is the right book for you, you can purchase it here :

  • Think Python (How to think like a computer scientist): This is a free e-book designed for people who like to master the core concepts, the syntax and the features available in Python. It focuses on introducing you to different programming concepts and how they can be effectively implemented. This book gets into each aspect of programming, be it recursion or inheritance, etc., in much more detail than the "Head First Python" book.

There is a hardcover book (link below); the latest edition has more in-depth explanations and resources, and is updated with many more examples and real use case scenarios.

  Advantages : 

  1. Teaches in depth, the concepts and efficient programming paradigms
  2. Uses a scientific approach by providing resources to algorithms and efficient data structures implementations
  3. Provides a guide to tools and libraries for  mathematical computations, and also insights to data-analysis

You can buy the paperback  from here :

 

 

 

Distributed parallel programming in Python : MPI4PY

1 Introduction

MPI stands for Message Passing Interface. An implementation of MPI, such as MPICH or OpenMPI, is used to create a platform to write parallel programs on a distributed system such as a Linux cluster with distributed memory. Generally, the platform built allows programming in C using the MPI standard. So, in order to run parallel programs in this environment with Python, we make use of a module called mpi4py, which means "MPI for Python". This module provides standard functions to do tasks such as getting the rank of processes and sending and receiving messages/data between the nodes of the cluster. It allows the program to be executed in parallel, with messages being passed between nodes. It is important that MPICH2 and mpi4py are installed on your system. If you haven't installed mpi4py, the following are 2 guides to refer to for installing, building and testing a sample program with mpi4py.

https://seethesource.wordpress.com/2015/01/05/raspberypi-hacks-part1/
https://seethesource.wordpress.com/2015/01/14/raspberypi-hacks-part2/

Once mpi4py is installed, you can start programming with it. This tutorial covers the various important functions provided by mpi4py, like sending and receiving messages, scattering and gathering data, and broadcasting messages, and shows how they can be used through examples. Using this information, it is possible to build scalable, efficient, distributed parallel programs in Python. So, let's begin.

2 Sending and receiving Messages

Communication in mpi4py is done using the send() and recv() methods. As the names suggest, they are used to send and receive messages between nodes respectively.

2.1 Introduction to send()

The general syntax of this function is: comm.send(data,dest)

here “data” can be any data/message which has to be sent to another node and “dest” indicates the process rank of node(s) to send it to.

Example: comm.send((rank+1)*5,dest=1).
This sends the message “(rank+1)*5” to the node with process rank=1. So only that node can receive it.

2.2 Introduction to recv()

The general syntax of this function is: comm.recv(source)

This tells a particular process to receive data/message only from the process with rank mentioned in “source” parameter.

Example: comm.recv(source=1)
This receives the message only from a process with rank=1.

2.3 Example with simple send() and recv()

from mpi4py import MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# rank 0 sends a value to rank 1, which receives and prints it
if rank == 0:
    data = (rank+1)*5
    comm.send(data, dest=1)
if rank == 1:
    data = comm.recv(source=0)
    print data

(For full implementation program refer to Example1.py)

[Download Example1.py]

2.4 Notes

  • When a node is running the recv() method, it waits till it receives some data from the expected source. Once it receives some data, it continues with the rest of the program.
  • Here, the “dest” parameter in send() and “source” parameter in recv() need not have just a constant value (or rank), it can be an expression.
  • The "size" member of the "comm" object is a good way to conditionalize the send() and recv() calls, and this lets us send and receive messages dynamically.

2.5 Sending and receiving dynamically

Dynamic transfer of data is far more useful, as it allows data to be sent and received by multiple nodes at once, and the decision to transfer can be made depending on the particular situation; this increases flexibility dramatically.

2.6 Example of dynamic sending and receiving of data

comm.send(data_shared,dest=(rank*2)%size)
comm.recv(source=(rank-3)%size)

The above two statements are dynamic because the data to be sent, and who it has to be sent to, depend on the values of rank and size, which are determined dynamically; this eliminates the need for hard-coding the values. The recv() method, however, receives only one message even though it is qualified to receive many of them; it services only the first message it receives and then continues to the next statement in the program.

(For the full implementation refer to Example2.py; a rough sketch of such a program is shown below.)

[Download Example2.py]
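Example2.py itself is not reproduced in this post, so here is a minimal, illustrative sketch of the dynamic pattern (not the downloadable file): every rank sends one small message to a partner computed from its rank and the cluster size, and receives one from another. For small messages like these, the blocking send() usually completes immediately, so this ring-style exchange works; larger data would need non-blocking calls.

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# each rank sends to its right neighbour and receives from its left one;
# the partners are computed from rank and size instead of being hard-coded
data_shared = (rank + 1) * 10
comm.send(data_shared, dest=(rank + 1) % size)
received = comm.recv(source=(rank - 1) % size)
print "rank", rank, "received", received, "from rank", (rank - 1) % size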

3 Tagged send() and recv() functions

When we tag the send() and recv() calls, we can guarantee the order in which messages are received; thus we can be sure that one message will be delivered before another.

During dynamic transfer of data, situations arise where, we need a particular send() to match a particular recv() to achieve a kind of synchronization. This can be done using the “tag” parameter in both send() and recv().

For example, a send() can look like comm.send(shared_data, dest=2, tag=1), and a matching recv() for the above statement (on rank 2, assuming the sender has rank 1) would look like comm.recv(source=1, tag=1).

So, this structure forces a match, leading to synchronization of data transfers. The advantage of tagging is that a recv() can be made to wait till it receives data from a corresponding send() with the expected tag. But, this has to be used with extreme care as it can lead to a deadlock state.

3.1 Example

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    shared_data1 = 23
    comm.send(shared_data1, dest=3, tag=1)
    shared_data2 = 34
    comm.send(shared_data2, dest=3, tag=2)
if rank == 3:
    recv_data1 = comm.recv(source=0, tag=2)
    print recv_data1
    recv_data2 = comm.recv(source=0, tag=1)
    print recv_data2

The output of this would look like:

34
23

Thus, we can see that even though shared_data1 was sent first, the first recv() waited for the send() with tag=2, received that data and printed it, before moving on to the next recv() call.

(For full implementations refer to Example3.py)

[Download Example3.py]

To view the full post visit here.

Python for Pi cluster Part 2: testing mpi4py and running MPI programs with python

The previous post demonstrates how to build mpi4py so that we can write and run Python programs using MPICH. Once mpi4py is built and installed, it has to be tested.

Here, it is assumed that you have a machinefile that stores the IP addresses of all the nodes in the network. This will be used by MPICH to communicate and send/receive messages between the various nodes.
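If you have not seen one before, a machinefile is simply a plain text file with one node address per line; the addresses below are just placeholders for your own nodes' IPs:

192.168.1.101
192.168.1.102
192.168.1.103
192.168.1.104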

Inside the extracted mpi4py folder is another folder named demo. The demo folder has many Python programs that can be run to test that mpi4py works.

A good initial test program is helloworld.py. The procedure to run it is:

cd ~/mpi4py/demo

mpiexec -np 4 -machinefile ~/mpitest/machinefile python helloworld.py

 

output:

[Screenshot: output of running helloworld.py across the cluster]

So if the output looks similar to the above and all the nodes have been included, then it works.

Please note that ~/mpi4py/demo is the path to mpi4py on my system and to be replaced with the one in yours. Same is the case with path to the machinefile.

There are other programs in the demo folder that can be used. For example,

 

There are some benchmark programs created by the Ohio State University. They are :

  • osu_bw.py : This program calculates bandwidth: the master node sends out a series of fixed size messages to other nodes, and the receiver sends a reply only after all the messages are received. The master node then calculates the bandwidth based on the time elapsed and the bytes sent.
  • osu_bibw.py : This program is similar to the above one, but both nodes are involved in sending and receiving a series of messages.
  • osu_latency.py : This program, when run, sends messages to various nodes and waits for a reply from them. This is repeated a number of times and the latency is calculated.

There are many other programs in the demo folder that can be tested. All of these programs can be run in a similar way to helloworld.py.

[Screenshot: output of one of the benchmark programs]

 

Once the testing is done, programs compatible with MPI can be written in Python. The way MPI programs work is that all the nodes in the cluster have the same program. Every processor runs the program, but depending on conditions it executes only a part of it, and this is what allows parallel execution.

This also means that we can write 2 different programs, give them the same name, store each of them on different nodes and run them. This can be used to create a server program stored on the master node, while another program is written as the client program and stored with the same name on the worker nodes.


A sample MPI program:

[Screenshot: sample MPI program]
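The screenshot of the sample program is not available in this copy, so here is a small reconstruction based on the features described below (comm.rank, comm.size, the processor name, send()/recv() and the %size wrap-around); treat it as an illustrative sketch rather than the original code:

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.rank                      # rank of this process
size = comm.size                      # number of processes in the cluster
name = MPI.Get_processor_name()       # name of the node this process runs on

# every process sends a small greeting to the "next" process; the %size
# wrap-around keeps the destination valid even for the last rank
comm.send("hello from %s (rank %d)" % (name, rank), dest=(rank + 1) % size)
message = comm.recv(source=(rank - 1) % size)
print "rank", rank, "of", size, "received:", message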

 

 

So the above program has a communicator that contains all kinds of methods and process information, and it's called MPI.COMM_WORLD. Its various features are:

  • comm.rank : It gives the rank of the process running on that processor or node.
  • comm.size : It provides the number of nodes/processes in the cluster.
  • MPI.Get_processor_name() : It gives the name of the processor/node on which a particular process is running.
  • comm.send() : is used to send data to a node, indicated by the dest parameter.
  • comm.recv() : is used to receive data from the node indicated by the source parameter.

These are the basic functions, but many others are available that can be utilised to create an MPI-compliant Python program.

 

One thing to note is that if edge conditions are not taken care of and the number of processes requested is greater than the number of nodes, the execution of the program fails. To avoid this, the destination ranks can be wrapped around using the %size operation, as shown in the above example, so that the communication wraps back around to the first process.

 

Raspberry Pi Hacks – Part 1: building MPI for python on a Raspberry Pi cluster

This article assumes that a Raspberry Pi cluster is running the latest Raspbian OS and that the MPICH2 interface is built and operational.
(If you haven't built a cluster and want to, do comment here with your email id or some contact on social media and I can provide the resources and our procedure sheet.)
Now, the conventional way to install MPI for Python (which is called mpi4py) will not work. That is, using the command:

 sudo apt-get install python-mpi4py

will install mpi4py, but when it's run, it fails or crashes. This will be observed only by developers who have installed the MPICH2 interface on their cluster. The reason it crashes is that, unknowingly, the command above also installs instances of OpenMPI. OpenMPI is a different interface that clashes with the one already installed, MPICH2. A system is usually designed to run only one interface, and when there are multiple instances running it leads to failure.

To avoid this failure, and the tedious task of restoring the operating system back to its previous state, a workaround exists. The workaround is to build mpi4py manually on each node of the cluster.

The following are the steps to build it:

1) Download the mpi4py package.

      curl -k -O https://mpi4py.googlecode.com/files/mpi4py-1.3.1.tar.gz

      We can use wget instead of curl, but I couldn't find an option that bypasses the certificate issue that hasn't been resolved by the website maintenance team.

2) Unpack it and change to that folder.

       tar -zxf mpi4py-1.3.1.tar.gz
       cd mpi4py-1.3.1

3) Before the build is started, it is important to make sure that all the Python development tools are available.

This ensures that many important header files, like Python.h, are present and can be used by the build.

(This step can be skipped if the python development tools are already installed)

         sudo apt-get update --fix-missing

         sudo apt-get install python-dev

4) Now, we can build the package.

           cd mpi4py-1.3.1

           sudo python setup.py build --mpicc=/usr/local/mpich2/bin/mpicc

    A few things have to be noted here:

  •        The option --mpicc is used to provide the build with the location of the MPI compiler.
  •        The option --mpicc has to be used only if the location of that compiler doesn't already exist in the system path.
  •        The path /usr/local/mpich2/bin/mpicc is the location on my node where MPICH2 is built. It might not be the same for everyone, so it has to be replaced with the path where mpicc is located on your system.

The only thing now left to do is to install the build. To install, change the working directory to mpi4py:

cd mpi4py

After shifting to this directory, run the command :

sudo python setup.py install

Once this is done, repeat the process on every other node in the cluster. Then the demo program helloworld.py can be run to test that mpi4py is installed successfully on all the nodes and is running correctly.

If the nodes of the cluster aren't already built, then the easier way would be to perform the above procedure on one node, read the entire image of the OS and write it to the SD cards of each of the other nodes. This eliminates building the mpi4py package on each node individually.