Saturday 14 May 2011

REST, HTML, Ajax and PHP

What is REST?

REST is an architectural style for the Web. The term was coined by Roy Fielding in his PhD thesis. Roy was one of the architects of HTTP/1.1, so he knew what he was talking about. His thesis was rather abstract, so his ideas were ignored by the many who went chasing after SOAP, WSDL and UDDI in the hope that salvation lay that way. It didn't, of course, and I have written about that in A Critique of Web Services.

REST as applied to the Web is based around the HTTP verbs of GET, PUT, DELETE and POST. Requests from a user-agent (such as a browser or web client application) signal what type of action they expect from a Web server by the choice of verb in the HTTP request. In a way, this is reminiscent of O/O programming where you get (GET) or set (PUT) a field of an object, delete (DELETE) an object or perform some arbitrary action (POST). I have explored this in An Overview of REST.

REST has become more popular as an approach to designing web applications, and has influenced many well-known systems such as Twitter and Google Maps. Ruby on Rails is said to have adopted REST as a backend architecture. But the problem with an architecture is that it doesn't tell you how to do things, just what they should look like.

So in this blog, I'm going to look at building a simple web application from the front end to the back end. It's a simple one: a database of people where each person has a name and ID, and you can query the database, add to it or delete from it.

From the REST viewpoint, a client will make these queries of the backend. The meanings of these are essentially given by the HTTP specification:

HTTP method   URI           Result
GET           /people       GETs a collection of people
POST          /people       POST: does something to the collection (using data in the POST request)
GET           /people/jan   GETs jan
PUT           /people/jan   PUTs a new version or an updated version of jan (using data in the PUT request)
DELETE        /people/jan   DELETEs jan
POST          /people/jan   POST: does something to "jan" (using data in the POST request)

The client will make these requests to an HTTP server. The server will perform appropriate actions and return a result. The HTTP specification also specifies the status codes for the results:

HTTP method   Status
GET           200 - found resource
              404 - not found
POST          200 - resource returned
              204 - no resource to return
              201 - new resource created
PUT           201 - resource created
              200 or 204 - resource modified
              4XX, etc - error occurred
DELETE        200 - deleted and resource returned
              204 - deleted but no resource returned
              202 - will be deleted
              4XX, etc - won't be deleted
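
To make the status table concrete, here is a minimal Python 3 sketch of a lookup from method and status code to meaning. The names STATUS_MEANINGS and describe are my own for illustration and are not part of any library; the entries just restate the table above.

```python
# Status codes from the table above, keyed by HTTP method.
# STATUS_MEANINGS and describe() are illustrative names, not a library API.
STATUS_MEANINGS = {
    "GET": {200: "found resource", 404: "not found"},
    "POST": {200: "resource returned",
             204: "no resource to return",
             201: "new resource created"},
    "PUT": {201: "resource created",
            200: "resource modified",
            204: "resource modified"},
    "DELETE": {200: "deleted and resource returned",
               204: "deleted but no resource returned",
               202: "will be deleted"},
}


def describe(method, status):
    """Return the table's meaning for a status code, or a generic fallback."""
    meaning = STATUS_MEANINGS.get(method, {}).get(status)
    if meaning is not None:
        return meaning
    if 400 <= status < 500:
        return "error occurred"
    return "unexpected status"


print(describe("PUT", 201))
print(describe("DELETE", 404))
```

A client could use something like this to turn a raw status code into a message instead of scattering if-tests through the code.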

Python client
 
First I will give Python code to send these requests and interpret the responses. It is just a simple command-line interface, but you could wrap it in a GUI such as PythonCard.

#!/usr/bin/python

import httplib

baseURL = "/boxhill/ict329/webservices/people/"

def connect(method, url, data):
    connection = httplib.HTTPConnection("localhost")
    connection.request(method, url, data)
    response = connection.getresponse()
    status = response.status
    body = response.read()
    return (status, body)


def get(name):
    (status, body) = connect("GET", 
                              baseURL + name, 
                             "")
    if (status == 404):
        print name + ": no such person"
    else:
        print body

def put(name, ID):
    # The request body must be a string, so convert a numeric ID first
    (status, body) = connect("PUT",
                             baseURL + name,
                             str(ID))
    if (status == 201):
        print name + ": created"
    else:
        print name + ": modified"

def delete(name):
    (status, body) = connect("DELETE",
                             baseURL + name,
                             "")
    if (status == 404):
        print name + ": couldn't delete"
    else:
        print name + ": deleted"
        print body

def getAll():
    (status, body) = connect("GET",
            baseURL,
            "")
    if (status == 404):
        print "No people"
    else:
        print body
 
You can run this with requests such as

get("Peter")
put("Fred", 21)
delete("Paul")

The backend will act on these requests and return suitable responses.
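
If you want to try out the request/response pattern before the real backend exists, an in-memory stand-in is enough. The following Python 3 sketch is only an assumption about how such a stand-in might look: the class name PeopleStore and its handle method are hypothetical, and it is not the PHP/MySQL backend described later. It returns 204 for a successful DELETE, as in the status table.

```python
# Hypothetical in-memory stand-in for the backend (not the PHP/MySQL code).
class PeopleStore:
    def __init__(self):
        self.people = {}  # maps name -> integer ID

    def handle(self, method, name="", body=""):
        """Return (status, body) for one request on people/<name>."""
        if method == "GET":
            if name == "":
                # GET on the collection: list everyone, or 404 if empty
                if not self.people:
                    return (404, "")
                lines = ["%s\n%d\n" % (n, i)
                         for n, i in sorted(self.people.items())]
                return (200, "".join(lines))
            if name not in self.people:
                return (404, "")
            return (200, "%s\n%d\n" % (name, self.people[name]))
        if method == "PUT":
            # 201 if the person is new, 204 if we are replacing them
            status = 204 if name in self.people else 201
            self.people[name] = int(body)
            return (status, "")
        if method == "DELETE":
            if name not in self.people:
                return (404, "")
            del self.people[name]
            return (204, "")
        return (405, "")  # method not allowed


store = PeopleStore()
print(store.handle("PUT", "Fred", "21"))
print(store.handle("GET", "Fred"))
print(store.handle("DELETE", "Paul"))
```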

HTML and REST

The Web isn't just HTTP of course. HTTP is there to carry content, and most of the content is HTML (I'm talking value of content here, not size of content as in PowerPoint slides :-). How does HTML co-operate with REST?

HTML allows you to create documents with links, using tags such as <a href="...">. In the case of http: links, all browsers use an HTTP GET, although I can't find this specified anywhere.

HTML also allows you to create forms where the content can be submitted to a server. The type of submission is controlled by the "method" attribute of the form tag, and can be either POST or GET. This is true of HTML 4 and also currently true of the HTML 5 draft. I do have a copy of a 2009 draft which did allow PUT and DELETE, but now in 2011 those options have disappeared.

So HTML offers no support for PUT or DELETE, and so cannot be considered supportive of REST. Using POST to stand in for GET, PUT and DELETE as well is not what I would consider support.

JavaScript and REST

With JavaScript you can load() a document. This uses HTTP GET. You can submit a form by form.submit(). This uses the form's method attribute, which is either POST or GET.

"Standard" JavaScript does not have support for REST.

Ajax and REST

You have to turn to Ajax to get proper support for REST. That means you have to make Ajax calls using XMLHttpRequest if you want a browser to make the correct REST calls, whether you like it or not.

I'll use jQuery as that is a little easier than straight JavaScript and takes care of browser differences. There is a function $.ajax which takes a dictionary of settings. These can be used to specify the calling method (GET/PUT/...) as well as the URL, and the action to take when a result is returned. For our purposes here, the HTTP return codes are 2XX Okay codes or 4XX error codes. These can be handled by the $.ajax success: and error: settings respectively.

A function to GET a value and display the results or an error in an Alert box is

      function doGet() {
        // Declare with var so that name does not become a global
        var name = document.getElementById("get_name").value;
        $.ajax({
           type: "GET",
           url: "people/" + name,
           success: function(data, status, jqXHR) {
                      alert(data);
                },
           error: function(jqXHR, textStatus, errorThrown) {
                      alert(name + ": no such person");
                },
           // No trailing comma after the last entry: older IE rejects it
           async: false
        });
      }
which extracts the name from a textbox with id "get_name" in a form and makes a synchronous GET request, and shows the appropriate alert box when the call returns.

A similar function can be used for the DELETE call:

      function doDelete() {
        // Declare with var so that name does not become a global
        var name = document.getElementById("delete_name").value;
        $.ajax({
           type: "DELETE",
           url: "people/" + name,
           error: function() {
                         alert(name + ": couldn't delete");
                  },
           success: function() {
                         alert(name + ": deleted");
                  },
           // No trailing comma after the last entry: older IE rejects it
           async: false
        });
      }

For the PUT method, we want to distinguish between a response of 201 (the resource was created) and 204 (the resource was modified). Both of these are success codes, but they mean different things. From jQuery 1.5 onwards, we can attach a handler to each status code of the HTTP response:

      function doPut() {
        // Declare with var so that name and ID do not become globals
        var name = document.getElementById("put_name").value;
        var ID = document.getElementById("put_id").value;
        $.ajax({
           type: "PUT",
           url: "people/" + name,
           data: ID,
           // Handlers for success codes get the same arguments as success:
           statusCode: {
                201: function(data, textStatus, jqXHR) {
                         alert(name + ": created");
                     },
                204: function(data, textStatus, jqXHR) {
                         alert(name + ": modified");
                     }
           },
           async: false
        });
      }
Finally, we link these calls to forms:
<ul>
  <li>
    <p>
      Get info about one person
    </p>
    <p>
      <form>
        Name <input type="text" id="get_name"/>
        <br/>
        <input type="button" value="Submit this"
                    onClick="doGet()"/>
       </form>
    </p>
  </li>

  <li>
    <p>
      Create a person
    </p>
    <p>
      <form>
        Name <input type="text" id="put_name"/>
        <br/>
        ID <input type="text" id="put_id"/>
        <br/>
        <input type="button" value="Submit this"
                    onClick="doPut()"/>
      </form>
    </p>
  </li>

  <li>
    <p>
      Delete a person
    </p>
    <p>
      <form>
        Name <input type="text" id="delete_name"/>
        <br/>
        <input type="button" value="Submit this"
                    onClick="doDelete()"/>
      </form>
    </p>
  </li>
</ul>

Apache and REST

That's enough of the client side - now for the server! Most of the world uses Apache and I do too (except for Lighttpd on my hacked MyBook World server).

Besides the HTTP verbs, the other major component to REST is resources. We have already been using them as "http:people/jan", "http:people/fred" where the URL labels the resources "Jan" and "Fred". REST assumes that URLs are identifying resources, which may be documents, people, items on a shopping cart, etc.

So why didn't we use URLs such as "http:people/get.php?jan" when we want to do a GET? Simple: REST URLs label resources, not actions on these resources. get.php is an action on some parameter rather than a label. The value of using resources for URLs is that we can perform different actions on the same resource, as in "GET people/jan", "DELETE people/jan" etc, and each time we refer to the same resource. This would not be clear if we had "GET people/get.php?jan" and "PUT people/put.php?name=jan&id=1".

In addition, just for maintenance reasons, we would want to separate implementation from data: suppose we decided to move away from PHP and use Ruby on Rails. We wouldn't want to revise all of our implementation-specific URLs - that should be an action behind the scenes in how to handle our resource URLs.

That said, how do we go about mapping resource URLs into actions on those resources? There are several methods. I've chosen Apache's Rewrite rules. These are specified in Apache configuration files, such as /etc/apache2/sites-enabled/000-default on an Ubuntu system. A rule consists of two components: a condition to be satisfied and then a rewrite of the URL. The rewrite rule uses regular expressions as in Perl, where "^(.*)$" matches all the characters between the beginning and end of a string and captures them as $1.

I have rules that apply only to the directory on my server box where I keep the "people" files. The Apache directive is

        <Directory /home/httpd/html/boxhill/ict329/webservices/people/>
                RewriteEngine on

                RewriteCond %{REQUEST_METHOD} =GET
                RewriteRule ^(.*)$ get_request.php?$1 [QSA]

                RewriteCond %{REQUEST_METHOD} =PUT
                RewriteRule ^(.*)$ put_request.php?$1 [QSA]

                RewriteCond %{REQUEST_METHOD} =DELETE
                RewriteRule ^(.*)$ delete_request.php?$1 [QSA]
        </Directory>

which calls get/put/delete_request.php with the person's name extracted from the URL and appended as a parameter to the PHP call. After you have made changes like this to the Apache configuration files, don't forget to restart or reload Apache (kill -HUP `cat /var/run/apache2.pid`).
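
To see what the three rules do to a request, here is a rough Python 3 model of them. It is only a sketch under simplifying assumptions: the names SCRIPTS and rewrite are mine, the path is taken to be relative to the people directory, and it ignores the [QSA] flag and anything else mod_rewrite does after the first substitution.

```python
import re

# Hypothetical mirror of the three rewrite rules: HTTP method -> PHP script.
SCRIPTS = {
    "GET": "get_request.php",
    "PUT": "put_request.php",
    "DELETE": "delete_request.php",
}


def rewrite(method, path):
    """Model one rule: return the rewritten target, or None if no rule applies."""
    script = SCRIPTS.get(method)
    if script is None:
        return None  # e.g. POST: no condition matches, URL is left alone
    match = re.match(r"^(.*)$", path)
    if match is None:
        return None
    # $1 in the Apache rule corresponds to group 1 here
    return "%s?%s" % (script, match.group(1))


print(rewrite("GET", "jan"))
print(rewrite("DELETE", "fred"))
print(rewrite("POST", "jan"))
```

The point to notice is that the same resource path ends up at a different script purely because of the HTTP method.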

My database

For this blog, I use a MySQL table with two fields, name (text) and id (integer).

PHP

The PHP code is fairly straightforward. It just extracts the parameter information from the $_REQUEST, accesses the database and returns the results using the appropriate HTTP return codes.

The get_request.php program is

<?php
$user="*****";
$password="*****";
$database="*****";
$localhost="localhost";
mysql_connect($localhost,$user,$password);
@mysql_select_db($database) or die( "Unable to select database");

$req=$_REQUEST;
$keys=array_keys($req);
$name=mysql_real_escape_string($keys[1]);
if ($name=="") {
   $query="SELECT * FROM people";
} else {
  $query="SELECT * FROM people WHERE name=\"$name\"";
}

$result=mysql_query($query);
mysql_close();

$num=mysql_num_rows($result);
if ($num == 0) {
   header("HTTP/1.1 404 Not Found");
   exit();
}

$n=0;
while ($n < $num) {
      $id=mysql_result($result,$n,"id");
      $name=mysql_result($result,$n,"name");

      echo "$name\n";    
      echo "$id\n";

      $n++;
}
?>
The delete_request.php is slightly more complex, as we have to determine whether the DELETE has succeeded by checking whether the number of rows affected is zero or not.

<?php
$user="*****";
$password="******";
$database="******";
$localhost="localhost";
mysql_connect($localhost,$user,$password);
@mysql_select_db($database) or die( "Unable to select database");

$req=$_REQUEST;
#print_r($req);
$keys=array_keys($req);
$name=mysql_real_escape_string($keys[1]);

$query="DELETE FROM people WHERE name=\"$name\"";

$result=mysql_query($query);
$num=mysql_affected_rows();
mysql_close();

if ($num == 0) {
   header("HTTP/1.1 404 Not Found");
} else {
  header("HTTP/1.1 204 No Content");
}
?>

The put_request.php is also complicated by needing to tell if we are creating (INSERT) or modifying a row (exists, DELETE, INSERT). It is


<?php
$user="*****";
$password="******";
$database="*****";
$localhost="localhost";
mysql_connect($localhost,$user,$password);
@mysql_select_db($database) or die( "Unable to select database");

$req=$_REQUEST;
$keys=array_keys($req);
$name=mysql_real_escape_string($keys[1]);

$putdata = fopen("php://input", "r");
$ID = intval(fread($putdata, 1024));
fclose($putdata);

# are we updating?
$query="SELECT * FROM people WHERE name=\"$name\"";    
$result=mysql_query($query);

$num=mysql_num_rows($result);
if ($num == 0) {
   header("HTTP/1.1 201 Created"); # creating
} else {
   header("HTTP/1.1 204 No Content"); # updating
   # delete current row first
   $query="DELETE FROM people WHERE name=\"$name\"";

   $result=mysql_query($query);
}

# now add new person
$query="INSERT INTO people VALUES (\"$name\", $ID)";

$result=mysql_query($query);
mysql_close();

?>
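
The create-or-modify decision is the heart of PUT, so here is the same logic as a small Python 3 sketch. Note the swaps: it uses the standard sqlite3 module with an in-memory database instead of MySQL, an UPDATE instead of DELETE followed by INSERT, and placeholders instead of string escaping. The function name put_person is my own.

```python
import sqlite3


def put_person(db, name, person_id):
    """Create or replace a person; return 201 if created, 204 if modified."""
    row = db.execute("SELECT 1 FROM people WHERE name = ?",
                     (name,)).fetchone()
    if row is None:
        # No such person yet: this PUT creates the resource
        db.execute("INSERT INTO people (name, id) VALUES (?, ?)",
                   (name, person_id))
        status = 201
    else:
        # Person exists: this PUT replaces the stored ID
        db.execute("UPDATE people SET id = ? WHERE name = ?",
                   (person_id, name))
        status = 204
    db.commit()
    return status


# Same two-column table as in the blog, but held in memory by sqlite3
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE people (name TEXT, id INTEGER)")
print(put_person(db, "Fred", 21))
print(put_person(db, "Fred", 22))
```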

Summary

This blog has walked through a web application using REST from beginning to end. Hope it is useful!


Wednesday 4 May 2011

Ubuntu 11.04

I've got a laptop running Ubuntu 10.10, and a netbook running 10.04 with the 10.10 netbook remix as an alternative boot. I also have a rather odd system: I'm using an HP thin client as a media centre in my lounge, connected to a 58 inch Panasonic TV via DVI to HDMI. It streams from a MyBook World NAS that I hacked into to become a general server.

The TV screen is about 3 metres away from where I sit, so I want a "10 foot GUI". Perhaps surprisingly, some of the netbook distros do very well at that: while they are designed for small screens, they also do well for big screens a long way away. Currently I am using xPud, which I have customised to my environment.

But I have to keep rebuilding my version of xPud to keep up, so I am always trying other distros with an easier management cycle. So of course Ubuntu 11.04 looks like a nice candidate.

It's worse than a disappointment: it doesn't display a usable screen at all. Bits of icons are smeared across the screen, and the left tool bar only shows when you click on the right-hand side (!) of the screen.

It runs okay on my netbook, but I'm loath to upgrade my main laptop while there are such problems with other systems. I guess I will wait till 11.10 for Unity to stabilise before doing a general upgrade to Ubuntu 11.

Sunday 3 April 2011

Shell scripts to handle filenames with spaces

Posix/Unix/Linux was not designed to handle filenames with spaces in them. However, Linux and Windows filesystems allow them and also many other "funny" characters. This has been brewing as a topic in Linux Journal recently, and Dave Taylor has just written an article on it in the February, 2011 issue. He spots files with spaces in them by the shell pattern "*\ *" and then mucks around changing spaces into other things. It's good stuff, but overkill for some cases.


For a long time now, I've been writing scripts that handle filenames both with and without spaces. You've got to know your shell and how Posix commands work! Commonly, I want to list files in a directory and do things to them whether or not they have spaces in them.

The shell breaks unquoted strings, such as the output of "ls" substituted into a command line, into "words" based on whitespace (spaces, tabs, newlines). This stuffs up a filename if it has spaces in it, since the name then gets split into separate words. But commands such as "ls" (when not directed to a terminal) list each filename on a separate line. So if you have something that distinguishes between spaces/tabs and newlines then you can get complete filenames with or without spaces.

The shell command "read" reads a line and breaks it into words, so
 read a b c
with input
 a line of text
will assign
 a="a"
 b="line"
 c="of text"
Just
  read line
will read all of the line into the variable. It stops reading at end-of-line, so it makes exactly the distinction between spaces and newlines that I need.

But how to use it? Well, the shell while loop is just a command, and as such can have its I/O redirected. So I do this:
 ls |
 while IFS= read -r filename
 do
    # IFS= keeps leading/trailing spaces, -r keeps backslashes
    # process filename e.g.
    cp "$filename" ~/backups
 done
This works for all files, with or without spaces. Just don't forget the quotes while processing the file! 
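
The difference between splitting on any whitespace and splitting on newlines only can be seen outside the shell too. This is a small Python 3 illustration, not shell code; the three filenames are made up for the example.

```python
# What `ls` might print for three files, one name per line (made-up names).
listing = "notes.txt\nmy holiday photo.jpg\nto do.txt\n"

# Splitting on any whitespace breaks names apart, like unquoted shell words.
by_words = listing.split()

# Splitting on newlines only keeps each name whole, like `while read filename`.
by_lines = listing.splitlines()

# Like `read a b c`: the last variable takes the rest of the line.
a, b, c = "a line of text".split(None, 2)

print(by_words)
print(by_lines)
print(a, "|", b, "|", c)
```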

Of course, this doesn't work for all uses, such as filenames that contain newlines: note the find and xargs combination that Dave also commented on:
 find . -print0 | xargs -0 ...

Saturday 2 April 2011

HTML 5 has a serious flaw

HTML 5 is long overdue, after the WWW Consortium's failed attempt at convincing us to use XHTML. It has many useful features, but one glaring fault: it has discarded version control. I've been writing and designing distributed systems for over twenty years, and one thing has become very clear: if you don't include version numbers in your protocol then you are asking for trouble.

The document type has been simplified. It used to have horrible things like
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" 
"http://www.w3.org/TR/html4/loose.dtd">

But now it has been simplified overmuch to just

<!DOCTYPE html>
There are some people who like this (e.g. John Resig). But I don't. HTML will continue to evolve - there will be new tags and attributes, and the existing behaviour will be clarified or changed. But without a version number, how will a browser (or any user agent) be able to work out which version it is dealing with? And how can a content generator signal which version it is creating? Already there is considerable confusion about which bits of HTML 5 are supported by different browsers.
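
To show what a user agent loses, here is a rough Python 3 sketch that tries to read a version number out of a DOCTYPE. The function html_version and its regular expression are my own simplification, not how any real browser decides anything; it only looks for a number after "DTD HTML" or "DTD XHTML".

```python
import re


def html_version(doctype):
    """Return the HTML version named in a DOCTYPE, or None if it names none."""
    match = re.search(r"DTD\s+X?HTML\s+(\d+(?:\.\d+)?)", doctype,
                      re.IGNORECASE)
    return match.group(1) if match else None


html4 = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
         '"http://www.w3.org/TR/html4/loose.dtd">')
html5 = "<!DOCTYPE html>"

print(html_version(html4))
print(html_version(html5))
```

With the old DOCTYPE there is something to find; with the new one there is nothing, whatever version of HTML the author had in mind.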

The simple answer is perhaps that this allows vendors free rein to do what they want - and we saw what a mess that caused before HTML 4 put a standard in the ground. There is still time for the WWW Consortium to fix at least this one error before it is too late.