
Category Archives: code

HPC (High Performance Compute) Cluster with MPI and ArchLinux

The following is a simple guide to setting up a cluster server and nodes using Arch Linux. The advantage of this approach is the flexibility of building a machine capable of high-speed parallel computation from commodity hardware.
The procedure will be broadly similar for most Unix-based systems. The preference for Arch is driven by its keep-it-simple philosophy: 'simple' is defined from a technical standpoint, not a usability standpoint. It is better to be technically elegant with a steeper learning curve than to be easy to use and technically poor. For a base system that should be as lean and fast as possible, the minimalist Arch base install is perfect for the task at hand.

Open MPI

The Open MPI Project is an open source MPI-2 implementation that is developed and maintained by a consortium of academic, research, and industry partners. Open MPI is therefore able to combine the expertise, technologies, and resources from all across the High Performance Computing community in order to build the best MPI library available.

Machine setup

This guide assumes:

  • all the machines have been formatted and Arch base system installed according to the guide
  • the machines are connected via a TCP/IP network, with the IP addresses and hostnames noted down as they will be required in later steps
  • each machine has a common login account (in this case baloo)
  • all machines are using the same processor architecture i686 or x86_64

It's always a good idea to have the latest, up-to-date Arch system, so a quick:
pacman -Syu

SSH setup

Open MPI communicates between the server and the nodes over a secure connection provided by the OpenSSH secure shell. The full details of OpenSSH options can be found on the Arch wiki or the main OpenSSH site. Here the bare minimum is given to get a cluster up and running.

Installing openssh

Accomplished by calling:
pacman -S openssh
The default configuration for sshd (the server daemon) is enough for our needs. Inspect /etc/ssh/sshd_config, make sure all options are sane, then continue.

Generating ssh-keys

To allow the server to communicate with the nodes without a password being requested at every instance, we shall use SSH keys to enable seamless logon. Run ssh-keygen and accept the defaults as given. No passphrase is selected; although inherently less secure than with one, this precludes the need to set up key management via a keyring.

Copying Keys to the server

Start the SSH daemon with rc.d start sshd on both the server and the slave nodes, and copy the public key from each node to the server. These will all end up in the home directory of our common user baloo, i.e. /home/baloo/.ssh/
The server public key (id_rsa.pub) and each of the public keys copied over from the nodes are then appended to the authorized_keys file at ~/.ssh/authorized_keys on the server. To enable two-way communication, this file can then be copied back to all the nodes.
IMPORTANT: make sure the permissions for the following allow reading and writing only by the owner:
chmod 700 ~/
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys

Logging into the remote machines via ssh should no longer require a password.

NFS setup

Open MPI requires the programs to be run to reside in a common location. Instead of copying the program executable over and over to the slave nodes, we set up a simple NFS share, with the actual folder on the server, from which all the nodes will mirror the contents.

Server Configuration

Create the directory that will be shared (/parallel in this instance) and edit /etc/exports to have the directory exported to the remote nodes:
/parallel *(rw,sync)
Change the ownership of the shared directory to nobody:
chown -R nobody:nobody /parallel
Then edit /etc/conf.d/nfs-common.conf:
STATD_OPTS="--no-notify"

Client Configuration

Edit /etc/fstab to include the following line so the clients can mount the shared /parallel directory:
192.168.2.103:/parallel /parallel nfs defaults 0 0

Daemons Configuration

Setting the appropriate daemons to launch on start-up simply requires the modification of /etc/rc.conf and adding the appropriate entries.

Server

#
DAEMONS=(…….sshd rpcbind  nfs-common nfs-server ……)
#

Nodes

#
DAEMONS=(…….sshd rpcbind  nfs-common ……)
#

OpenMPI setup

With the preliminary setup out of the way we can now install the openmpi package. It comes with built-in wrappers for C, C++ and Fortran; the Python wrappers can also be installed. It should be installed on both the server and the nodes:
pacman -S openmpi python-mpi4py python2-mpi4py
*The Python wrappers are only needed if you want to implement the parallel programs in MPI for Python.

OpenMPI Configuration

To let Open MPI know which machines to run your programs on, create a hostfile in the default user home directory. If /etc/hosts was set up you can use the hostnames here; otherwise the IP addresses of the machines work just as well. ~/mhosts:

#The master node is a dual-processor machine, hence slots=2
#
localhost slots=2
#
#The slave node is a quad-core machine, hence slots=4
#
Or1oN slots=4

Running Programs on the cluster

To run myprogram on the cluster, issue the following command from the /parallel directory:

$ mpirun -n 4 --hostfile ~/mhosts ./myprogram
$ mpirun -n 4 --hostfile ~/mhosts python myprogram.py

or

$ mpiexec -n 4 --hostfile ~/mhosts ./myprogram
$ mpiexec -n 4 --hostfile ~/mhosts python myprogram.py
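A typical MPI program decides its share of the work from its rank and the communicator size; the hostfile only controls where those ranks are placed. That partitioning logic can be sketched in plain Python (no MPI runtime needed to follow along; with mpi4py the rank and size would come from MPI.COMM_WORLD):

```python
# Sketch of how each MPI rank claims its slice of a shared workload.
# With mpi4py these two values come from the runtime instead:
#   from mpi4py import MPI
#   rank, size = MPI.COMM_WORLD.Get_rank(), MPI.COMM_WORLD.Get_size()

def my_slice(rank, size, n_items):
    """Return the (start, stop) range of items owned by `rank`.

    Items are dealt out as evenly as possible: the first
    n_items % size ranks each take one extra item.
    """
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

# Simulate the 6 slots defined in ~/mhosts (2 on the master, 4 on Or1oN)
# splitting 100 work items between them:
for rank in range(6):
    print(rank, my_slice(rank, 6, 100))
```

Each rank then loops only over its own range, and the partial results are combined with a collective operation such as gather or reduce.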

 

Posted by on April 18, 2012 in code, Hardware, Processors, Uncategorized

 


DANG on a Cloud

That's Django, ArchLinux, Nginx and Gunicorn on an Amazon EC2 instance. Getting a server to play around with, root access included, courtesy of Amazon AWS is pretty easy: just sign up, provide your credit card details and you are almost ready to go. First things first, setting up the server. There are plenty of ready-made AMI images floating around out there, so one is really spoilt for choice; personally my preference is for Arch Linux, and having run the rolling-release distro as a primary dev machine for the better part of a decade, it seemed right to give it a run as a production server. Loading the instance was pretty straightforward (instructions available on AWS blogs and forums). As with all good Arch systems, getting it up to the latest version is as easy as:

pacman -Syu
Once the changes have been made to the system and confirmed, it's time to set up the environment. The app will be deployed within a virtualenv; this has the advantage of keeping the Python installation of the app sand-boxed from the one running on the system. With a simple:

pacman -S virtualenv
then executing
virtualenv /path/to/apps -p /usr/bin/python2
source /path/to/apps/bin/activate
Once created, install the source control system of choice (assuming the app is being managed by one), or good old sftp with FileZilla can work just as well. Once the app is unpacked into its deployment folder,
installing the project requirements is a case of:
pip install -r requirements.txt
and
pip install gunicorn
As the gunicorn server will be running embedded from within the Django app, it's a matter of adding it to the settings.py file; running it will then be:

INSTALLED_APPS = [
    ...
    "south",
    "rosetta",
    "mptt",
    "gunicorn",
    ...
]
python manage.py run_gunicorn
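run_gunicorn picks its options up from settings.py, but the same options can also live in a standalone Gunicorn config file, which is itself plain Python. A minimal sketch (the file name and values here are illustrative, not from this deployment):

```python
# gunicorn.conf.py -- hypothetical example configuration.
# Gunicorn evaluates this file as Python, so plain assignments suffice.

bind = "127.0.0.1:8000"  # must match the upstream block in nginx.conf
workers = 3              # common rule of thumb: 2 * CPU cores + 1
timeout = 30             # seconds before an unresponsive worker is killed
accesslog = "-"          # "-" streams the access log to stdout
```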
The final part of the puzzle is the nginx server, lighter than the more resource-hungry Apache and far simpler to configure: editing one file, nginx.conf, is enough to serve the static assets from the project.
upstream app_server {
    server localhost:8000 fail_timeout=0;
}

server {
    listen 80 default;
    client_max_body_size 4G;
    server_name _;

    keepalive_timeout 5;

    location /static/ {
        autoindex on;
        alias /home/path/to/site_media/static/;
    }

    location / {
        # checks for static file, if not found proxy to app
        try_files $uri @proxy_to_app;
    }

    location @proxy_to_app {
        proxy_pass_header Server;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_redirect off;

        proxy_pass http://app_server;
    }

    error_page 500 502 503 504 /500.html;
}

The gunicorn server will execute the Python threads on a port that nginx listens to, serving your great new web-app, hosted on the Amazon cloud, to the world.
 

Posted by on February 11, 2012 in code, Internet & networks, Python

 


Human Target in less than 90 lines of Code (Python+OpenCV)

Using Python and the versatile OpenCV library, it's possible to get a human detection system in less than 90 lines of code. This assumes a basic knowledge of Python and some working knowledge of the OpenCV computer vision library.

Initialization

The initialization selects whether input comes from a camera or a captured video file on disk. With the built-in functions, as many cameras as needed can be connected, starting from CAM(0) and going up incrementally.

    def __init__(self, typ, fpath):
        if typ == 1:
            self.capture = cv.CaptureFromCAM(0)
        elif typ == 2:
            self.capture = cv.CaptureFromFile(fpath)
        cv.NamedWindow("Target", 1)

Pre-Processing

Before a human can be detected, we have to clear up artifacts that may reduce the efficiency of the detection algorithms. This involves Gaussian smoothing to eliminate false positives.

The capture is converted to grayscale, as this color space can be processed faster than a full color space, and re-sized to a smaller image, also for added speed in processing.

# Smooth to get rid of false positives
cv.Smooth(color_image, color_image, cv.CV_GAUSSIAN, 3, 0)

# Convert the image to grayscale
cv.CvtColor(color_image, grey_image, cv.CV_RGB2GRAY)

# Scale input image down for faster processing
cv.Resize(grey_image, small_img, cv.CV_INTER_LINEAR)

cv.EqualizeHist(small_img, small_img)
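The two speed tricks above, grayscale conversion and down-scaling, are easy to see outside of OpenCV. Here is a pure-Python sketch of the same idea on a nested-list "image" of (R, G, B) tuples, using the standard luma weights (the helper names are mine, not OpenCV's):

```python
def to_gray(rgb_image):
    """Collapse (R, G, B) pixels to one luminance value per pixel,
    using the ITU-R 601 luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]

def downscale2(gray_image):
    """Halve width and height by averaging each 2x2 block --
    a quarter of the pixels for the later stages to look at."""
    h, w = len(gray_image), len(gray_image[0])
    return [[(gray_image[y][x] + gray_image[y][x + 1] +
              gray_image[y + 1][x] + gray_image[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

# A 2x2 image: pure red, green, blue and white pixels.
img = [[(255, 0, 0), (0, 255, 0)],
       [(0, 0, 255), (255, 255, 255)]]
gray = to_gray(img)
small = downscale2(gray)
```

Grayscale cuts three channels down to one, and the 2x2 averaging leaves a quarter of the pixels, which is why the cascade stage that follows runs noticeably faster on the small image.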

Detection

Detection is carried out using a Haar cascade based on Viola-Jones. After training to detect a specific object of interest, the cascade file (an XML) is loaded into the HaarDetectObjects function, which stores the detected objects as a tuple; iterating over it yields each detected object's position.

cascade = cv.Load("data/HS.xml")

faces = cv.HaarDetectObjects(small_img, cascade, cv.CreateMemStorage(0),
                             haar_scale, min_neighbors, cv.CV_HAAR_DO_CANNY_PRUNING, min_size)

A central point is computed and a rectangle with a bull's-eye is drawn around each detection. The VideoWriter provides an easy way to capture a recording of the whole program run.

writer = cv.CreateVideoWriter("output.avi", cv.CV_FOURCC('M', 'J', 'P', 'G'), 15, frame_size)

cv.WriteFrame(writer, color_image)
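The "central point" step is simple geometry on the (x, y, w, h) rectangles that HaarDetectObjects returns; the bull's-eye is drawn around it. A sketch of that calculation in plain Python (the helper name and sample values are mine):

```python
def centre_of(rect):
    """Centre of a detection rectangle given as (x, y, w, h) --
    the point the bull's-eye is drawn around."""
    x, y, w, h = rect
    return (x + w // 2, y + h // 2)

# HaarDetectObjects yields ((x, y, w, h), neighbours) pairs, e.g.:
detections = [((40, 30, 80, 200), 12), ((300, 10, 60, 150), 7)]
centres = [centre_of(rect) for rect, _neighbours in detections]
print(centres)
```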

humans found
 

Posted by on February 7, 2012 in Computer Vision, Python

 


Intro to GIS using MapXtreme Java

Part of a small project I worked on during the summer using NetBeans and MapXtreme Java. With Google Maps, Yahoo Maps and OpenStreetMap you are generally spoilt for choice for mapping applications to build on. However, for the more customizable map applications you can roll your own maps and have a suitable interface built using JavaBeans in no time at all.

Requirements

Netbeans
MapXtreme Java
Prepared Map (mdf format)

Rolling the code

The MapXtreme beans are based on Swing, so integrating them into any Java project isn't that much of a hassle. The main ones to be aware of, and the ones that will be used, are:

  • VisualMapJ Bean
  • Map ToolBar Bean
  • Tool Beans
  • LayerControl Bean

The VisualMapJ bean displays the map within its content pane; the beans are built on the MapJ base object from which they all inherit. The MapXtreme components come with a built-in webserver (Tomcat) from which the map is loaded and served via JSP. After loading the map, check that it displays OK in the browser. The next step is serving the data from the webserver to a desktop client.

Loading Map into tomcat webserver


and displaying the map in the browser:

Webview of Map


Finally, putting the desktop app into shape.

Imports added to the top of the app:

import com.mapinfo.beans.tools.MapTool;
import com.mapinfo.beans.tools.MapToolBar;
import com.mapinfo.beans.tools.RectangleSelectionMapTool;
import com.mapinfo.beans.vmapj.RendererParams;
import com.mapinfo.beans.vmapj.VisualMapJ;
import com.mapinfo.mapdefcontainer.FileMapDefContainer;
import com.mapinfo.xmlprot.mxtj.ImageRequestComposer;

preparing the colors for the display panel and the map:

int maxColors = ImageRequestComposer.MAX_COLORS_TRUECOLOR;
String mimeType = "image/gif";
String filename = "working.mdf";

introducing the beans to hold the map and the control panel:

VisualMapJ vMapJ = new VisualMapJ();
MapToolBar mapToolBar1 = new MapToolBar();

specifying the address of the running server and the container that will receive the displayed map:

String m_mapxtremeURL = "http://Cass1opeA:8090/mapxtreme482/servlet/mapxtreme";
FileMapDefContainer fmdc = new FileMapDefContainer("/home/wakwanza/Cod/Maps/data/");

and finally putting it all together in a simple try catch to get our app running:

try
{
    this.getContentPane().add(buttonPanel, BorderLayout.WEST);
    this.getContentPane().add(mapPanel, BorderLayout.CENTER);
    RendererParams myRendererParams = new RendererParams(m_mapxtremeURL, mimeType);
    vMapJ.setRendererParams(myRendererParams);
    fmdc.load(filename);
    vMapJ.setStartupMapDefinition(new com.mapinfo.beans.vmapj.MapDefParams(fmdc, filename));
    vMapJ.setBackground(Color.BLUE);
    vMapJ.setSize(new Dimension(550, 550));
    mapPanel.setVisible(true);
}
catch(Exception e)
{
    e.printStackTrace();
}

JavaApp pulling map from MapXtreme


 

Posted by on November 11, 2011 in code, Java, MapXtreme

 

Futures in Neuromorphic Computing

Which chip will emerge the victor in the new race to beat Moore's law and finally give us the intelligent machines we've been told are in our future? Whether one is out-competed or they form a marriage of convenience, it's still far too early to tell. But first, some background, as I may be running off with the premise of this piece even before the starting gun.

BACKGROUND

The state of transistor tech that has sustained the electronics and computer industry for the past 20-plus years has grown by leaps and bounds (thank you, Moore's Law), enabling massive computational devices to proliferate at a fraction of the cost they would have commanded in preceding years. Even from the earliest times, when a PC took up an entire room and drew as much power as a small town, the dream of AI was slowly gaining traction. However, it was realised early on that the positronic brains we so desire for our robots would not be realised by the hardware at hand. Fast forward to the present, where the problem still persists: no matter how many processor cores one throws at it, the crop of supercomputers built to simulate an artificial intelligence still holds to that same principle of a large roomful of boxes drawing enough power for a small town (the more things change). A fundamental difference from those early efforts in AI research, however, is that with advances in neuroscience we know better how the functioning of the brain might be simulated by artificial means.

The hardware side of AI research has shown that a fundamental flaw lies in the underlying model: the von Neumann architecture.

"von Neumann architecture is a design model for a stored-program digital computer that uses a central processing unit (CPU) and a single separate storage structure ("memory") to hold both instructions and data. The separation between the CPU and memory leads to the von Neumann bottleneck, the limited throughput (data transfer rate) between the CPU and memory compared to the amount of memory. In most modern computers, throughput is much smaller than the rate at which the CPU can work. This seriously limits the effective processing speed when the CPU is required to perform minimal processing on large amounts of data. The CPU is continuously forced to wait for needed data to be transferred to or from memory." - Wikipedia

This is functionally different from the way a brain organises its information, let alone processes it:

"A biological brain is able to quickly execute this massive simultaneous information orgy—and do it in a small package—because it has evolved a number of stupendous shortcuts. Here's what happens in a brain: Neuron 1 spits out an impulse, and the resultant information is sent down the axon to the synapse of its target, Neuron 2. The synapse of Neuron 2, having stored its own state locally, evaluates the importance of the information coming from Neuron 1 by integrating it with its own previous state and the strength of its connection to Neuron 1. Then, these two pieces of information—the information from Neuron 1 and the state of Neuron 2's synapse—flow toward the body of Neuron 2 over the dendrites. And here is the important part: By the time that information reaches the body of Neuron 2, there is only a single value—all processing has already taken place during the information transfer. There is never any need for the brain to take information out of one neuron, spend time processing it, and then return it to a different set of neurons. Instead, in the mammalian brain, storage and processing happen at the same time and in the same place." - IEEE Spectrum

This brings us to the first of the next generation processing elements based on memristor technology.

MEMRISTORS

From the ground up, a memristor (whose existence was theorised in the 70s and actualised by HP Labs) is, in application, like an FPGA: realising functions that would need several transistors in a CMOS circuit, with the added advantages of non-volatile memory (no power required for state refreshing) and a structure that is remarkably defect-tolerant.

The memristor layer interacts with the CMOS logic layer of the hybrid chip and, according to the circuit configuration, is able to realise any number of logic gate structures. The process of creating the hybrid chip leaves the underlying CMOS layer untouched, and the redundant data paths of the crossbar architecture allow routing around defective areas. In neuromorphic computing applications, with memristors as synapses and transistors as the neurones, unsupervised learning becomes an actual possibility. A current work in progress at Boston University, MoNETA, aims to realise a general-purpose AI able to adapt to solving a problem without prior training (which essentially boils down to a brute-force technique with little room for creative problem solving), using hundreds of normal PE cores sandwiched in a memristor layer, where memory is localised to a super-cache that is immediately accessible and relies on very little power to maintain the information.

The software in this case, for modelling the neurological topology, is being handled by Cog Ex Machina, a special-purpose OS.

CHAOGATES

The next contender to step up to the plate of a neuromorphic chip is the chaogate. I must confess I'm particularly attached to this one, and not just because of butterflies. Partial differential equations and the way their solutions arise bring out some of the most beautiful patterns, and I like to think brains work similarly, if we could only see. As far as chip construction is concerned, a new type of gate has recently been developed that is able to reconfigure itself to provide different logic gates (hence 'chaogate'). Different from FPGAs, where reconfiguration is achieved by switching between RCLGs, chaogates morph via the pattern inherent in their constitutive nonlinear element. Modern computers depend on Boolean logic, of which any logical operation can be realised by NOR and NAND gates. The chaotic processor is taken as a 1D system whose state is represented by x and whose dynamics are given by a nonlinear map f(x); if the necessary and sufficient conditions are simultaneously satisfied by f(x), it is able to implement the full set of logical operations.

It also becomes possible to implement combinational logic directly: case in point, the half adder, involving two AND gates (for the carry) and an XOR (which sums the first digit), is implementable with one 1D chaotic element. A full adder requires three iterations of the single chaotic element, giving us efficient computational modules without cascading.
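The morphing idea can be made concrete with a toy model. The sketch below is my own illustrative construction, not ChaoLogix's actual circuit: one iteration of the logistic map f(x) = 4x(1-x) acts as the nonlinear element, the two logic inputs shift the initial state, and a threshold reads out the output bit. Changing only the control parameters (x*, delta, threshold) morphs the same element between NAND, NOR, AND, OR and even XOR, which would otherwise need a cascade of gates:

```python
def logistic(x):
    # The 1D chaotic element: one iteration of f(x) = 4x(1 - x).
    return 4.0 * x * (1.0 - x)

def chaogate(i1, i2, x_star, delta, threshold):
    """Toy chaogate: each high input shifts the initial state x* by
    delta; the output bit is read by thresholding one map iteration."""
    x0 = x_star + delta * (i1 + i2)
    return 1 if logistic(x0) > threshold else 0

# One element, five gates -- only the control parameters change.
#            (x*,   delta, threshold)
GATES = {
    "NAND": (0.50, 0.25, 0.5),
    "NOR":  (0.50, 0.25, 0.9),
    "AND":  (0.00, 0.25, 0.9),
    "OR":   (0.00, 0.25, 0.5),
    "XOR":  (0.10, 0.40, 0.5),
}

for name, params in GATES.items():
    table = [chaogate(a, b, *params) for a, b in
             [(0, 0), (0, 1), (1, 0), (1, 1)]]
    print(name, table)
```

Note how XOR falls out of a single iteration with the right parameters, echoing the half-adder point above, whereas transistor logic would need several NAND gates to build it.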

Development by ChaoLogix using standard CMOS techniques has led to an array with a morphing ALU providing higher functions (multiplier and adder) in less than one clock cycle, and communication hardware morphing between two different communication protocols (synchronous serial data link or serial computer bus) in less than one clock cycle. Arrays can conceivably be programmed on the run, with threshold values sent from an external program to optimise for the task at hand.

Current efforts are aimed at optimising the design of a chaogate down to sizes similar to or smaller than NAND gates. As a caveat, the developers add that programming the chaogates will require the development of a new hardware description language; in its absence, ideas from evolutionary algorithms are being considered as viable alternatives for achieving optimal array configurations.

CONCLUSIONS

While this piece has focused on the hardware advances of recent months, on the software side of things Numenta deserves a nod for its work on recreating a workable model of the human neocortex using its HTM approach. On the open-source side, dust seems to be gathering, with the last activity on projects like OpenAI being about four years ago.

With recent advancements tackling the whole problem of AI from a new perspective, it's high time a proper open stack was available to enable the faintest vestiges of consciousness to be breathed into our computers. So say we all.

 

Image credits: "Positronic Brain" by Fernando Laub [http://j.mp/gUu06E]
"Optical Micrograph of CMOS chip with memristor" [Nano Lett., 2009, 9 (10), pp 3640-3645. DOI: 10.1021/nl901874j]
"Chaogate Element" - American Institute of Physics [doi:10.1063/1.3489889]
 

Posted by on December 26, 2010 in code, Hardware, Processors

 

Simple Qt4 and PostgreSQL book catalog application

For this current project we shall create a basic book storage application using the Qt toolkit and PostgreSQL. Download and install Qt (http://qt.nokia.com/products/developer-tools/); it comes with its own IDE, Qt Creator, which we shall use for the actual work, although any text editor could be used (though that would unnecessarily prolong the project).

Download and install the PostgreSQL database and configure it for use; http://www.yolinux.com/TUTORIAL/LinuxTutorialPostgreSQL.html gives a good rundown if you are working on the Linux platform.

Step 1. Select the creation of a new project. In the project selection choose Qt4 Gui Application. Select the location and name of the project, and in the required modules include the QtSql module.

Step 2. Select Forms from the nav menu and open mainwindow.ui for editing. Design the layout of your app using the available widgets. In this case we select LineEdit widgets for the input; change their object names to something meaningful. Select pushbuttons and drag them onto the workspace, giving them useful names.

Step 3. The design phase is now over and we proceed to coding the internals. Switch to Signal/Slot edit mode (F4). Direct the signals from the buttons to the main layout. In the configure-connection pop-up select edit and add three new slots, one each for clear, add and search. Link the clicked() action to our newly created slots.

Step 4. Select the headers, and in mainwindow.h add private slots addpub(), clearpub() and searchpub(). In the protected section we add an auxiliary function openDB() for establishing the database connection. In the private section we add a QSqlDatabase object db.

private slots:
void addpub();
void clearpub();
void searchpub();

protected:
void changeEvent(QEvent *e);
bool openDB();

private:
Ui::MainWindow *ui;
QSqlDatabase db;

Step 5. We define the functions from Step 4, establishing the connection and inserting records with addpub(), searching for a particular record with searchpub(), and clearing the form entries using clearpub().

-openDB function to connect to the database:

bool MainWindow::openDB()
{
    db = QSqlDatabase::addDatabase("QPSQL");
    db.setHostName("localhost");
    db.setDatabaseName("storagebox");
    db.setUserName("noob");
    bool ret = db.open();
    return ret;
}

-addpub function to insert from the form to the database

void MainWindow::addpub()
{
    QString args = "INSERT INTO shelf VALUES ('";
    bool ret = false;
    bool lck = db.isOpen();

    args.append(ui->titleline->text());     args.append("','");
    args.append(ui->authorline->text());    args.append("','");
    args.append(ui->publisherline->text()); args.append("','");
    args.append(ui->isbnline->text());      args.append("','");
    args.append(ui->genreline->text());     args.append("');");

    // Open the connection only if it is not already open.
    if(!lck)
    {
        lck = MainWindow::openDB();
    }
    ui->resultline->setText("Pub insertion starting");
    if(lck)
    {
        ui->resultline->setText("Pub is being inserted");
        QSqlQuery query;
        ret = query.exec(args);

        if(ret)
        {
            qDebug() << "Pub has been inserted";
            ui->resultline->setText("Pub has been inserted");
        }
    }
}
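One caveat with addpub(): building the INSERT by string concatenation means a stray quote in any form field breaks the statement (and invites SQL injection). Qt's QSqlQuery offers prepare() and bindValue() placeholders for exactly this. The same idea, sketched here in Python with the stdlib sqlite3 module so it can be run standalone (the table name shelf matches the post; the in-memory database and sample values are just for illustration):

```python
import sqlite3

# In-memory stand-in for the PostgreSQL "storagebox" database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shelf (title, author, publisher, isbn, genre)")

def addpub(title, author, publisher, isbn, genre):
    # Placeholders let the driver quote the values -- a publisher like
    # "O'Reilly" can no longer break the statement.
    db.execute("INSERT INTO shelf VALUES (?, ?, ?, ?, ?)",
               (title, author, publisher, isbn, genre))

addpub("Programmer's Guide", "A. Uthor", "O'Reilly", "978-0", "code")
row = db.execute("SELECT title, publisher FROM shelf").fetchone()
```

With Qt the equivalent is query.prepare("INSERT INTO shelf VALUES (?, ?, ?, ?, ?)") followed by one bindValue() per field.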

-searchpub function to search for a stored record:

void MainWindow::searchpub()
{
    bool ret = db.isOpen();

    if(!ret)
    {
        ret = MainWindow::openDB();
    }

    QString outLine;
    QString findr;
    QSqlQuery que;
    findr = "select * from shelf where ";
    findr += ui->srcq->currentText();
    findr += " like '%";
    findr += ui->searchline->text();
    findr += "%';";
    if(ret)
    {
        que.exec(findr);

        if(que.next())
        {
            outLine = que.value(0).toString();        outLine += "\n";
            outLine.append(que.value(1).toString());  outLine += "\n";
            outLine.append(que.value(2).toString());  outLine += "\n";
            outLine.append(que.value(3).toString());  outLine += "\n";
            outLine.append(que.value(4).toString());  outLine += "\n";

            ui->textEdit->setText(outLine);
        }
        else
        {
            ui->textEdit->setText("No result found");
        }
    }
}

Step 6. At this point you are ready to compile and run your application.

To download the project files used in this example  https://bitbucket.org/ar119/dbproject/get/f1dbf7bca3cd.zip

 

Posted by on September 8, 2010 in code

 
