Details of the Experiments

This section describes the details of the experiments we conducted. The results are summarised in "Summary of results and conclusions", so on a first reading you may want to skip ahead to that section.

All times are averaged over 1000 invocations unless otherwise stated. The second figure given in brackets for each experiment was obtained with the client and server on different machines.

Experiment 1

The client invokes the Echo CGI program and measures the time taken to return the result. This took 202 milliseconds (201 milliseconds).
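
As an illustration, a per-call average of this kind can be obtained with a loop such as the following minimal sketch. This is not the original client code: invoke_echo() is a hypothetical stand-in for the libwww-based Echo invocation.

#include <stdio.h>
#include <sys/time.h>

extern void invoke_echo(void);          /* hypothetical: performs one Echo CGI call */

int main(void)
{
    struct timeval t0, t1;
    long total_ms = 0;
    int i, n = 1000;

    for (i = 0; i < n; i++) {
        gettimeofday(&t0, NULL);        /* start of call */
        invoke_echo();
        gettimeofday(&t1, NULL);        /* result returned */
        total_ms += (t1.tv_sec - t0.tv_sec) * 1000L
                  + (t1.tv_usec - t0.tv_usec) / 1000L;
    }
    printf("average time per call: %ld ms\n", total_ms / n);
    return 0;
}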

Experiment 2

We wrote a new CGI program which returns two longs and a string (the latter being the echo payload). Part of the IDL skeleton for this service, generated by the stub compiler, was linked with the HTTP server. The code of the server was changed so that if it launched a CGI program it would wait for the CGI program to terminate, ignore the results, and use the IDL skeleton to return two longs and a string to the client. The two longs marshalled as the results were HT_start (the start time in milliseconds) and HT_stop (the stop time in milliseconds).
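
The shape of the modified request path is sketched below. This is an assumption about the form of the change, not the actual CERN HTTPD code: handle_request() and reply_two_longs_and_string() are hypothetical names, the latter standing in for the generated IDL skeleton's marshalling routine, and the CGI program path is assumed.

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern long HT_start, HT_stop;           /* start and stop times in milliseconds */
extern void reply_two_longs_and_string(long start, long stop, const char *payload);

static void handle_request(const char *payload)
{
    pid_t pid = fork();
    if (pid == 0) {
        execl("/cgi-bin/Echo2", "Echo2", (char *)NULL);   /* child: run the CGI program (path assumed) */
        _exit(1);
    }
    waitpid(pid, NULL, 0);               /* wait for the CGI program to terminate */
    /* ignore whatever the CGI program produced; reply via the IDL skeleton */
    reply_two_longs_and_string(HT_start, HT_stop, payload);
}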

We timed from the point that the forked server starts until the point immediately before it sends the reply data down the TCP/IP connection to the client (but after it has sent the headers). This took 183 milliseconds (185 milliseconds). This implies that about 10 milliseconds are taken up by round-trip TCP traffic (including slow start and establishing the connection) and by the client setting up the call and decoding the results (including MIME parsing). (The total difference is 202 - 183 = 19 milliseconds, from which we deduct the roughly 10 milliseconds taken to fork an HTTP process to deal with the request.)

Experiment 3

We timed from the call to fork the CGI process to the same point as above. This took 140 milliseconds (148 milliseconds). This implies that it takes 43 milliseconds (183 - 140) to parse the headers and decode an incoming request.

The headers of the incoming request are shown below:

POST /cgi-bin/Echo2 HTTP/1.0
Accept: application/x-corba
Accept: */*; q=0.300
Accept: application/octet-stream; q=0.100
Accept: text/plain
Accept: text/html
From: nje@socrates.ansa.co.uk
User-Agent: Echo client/0 libwww/2.17
Content-length: 542
Content-type: application/x-www-form-urlencoded

The server parses the headers by doing a string comparison on each line. Note that a normal browser would include much more extensive header information than this, because it can cope with many more types of media (e.g. Mosaic(TM) typically generates about 40 lines of headers).
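
A parser in this style might look something like the sketch below. This is an assumed example, not the CERN HTTPD source; parse_header_line() is a hypothetical name and only a few of the headers shown above are handled.

#include <stdio.h>
#include <stdlib.h>
#include <strings.h>

static void parse_header_line(const char *line)
{
    if (strncasecmp(line, "Content-length:", 15) == 0)
        printf("body length = %ld\n", atol(line + 15));
    else if (strncasecmp(line, "Content-type:", 13) == 0)
        printf("body type   =%s\n", line + 13);
    else if (strncasecmp(line, "Accept:", 7) == 0) {
        /* record another acceptable media type */
    }
    /* other header lines are compared (and possibly ignored) in the same way */
}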

In CERN HTTPD the headers are parsed in HTParseRequest(). This routine reads the headers in one line at a time from the socket. However, measurements suggest that this does not block (or at least does not block significantly): we measured the time to read in each line at about 60 microseconds.

Experiment 4

We timed from the call to fork the CGI process to the point where the HTTP daemon has finished writing data down the pipe to the CGI process. This was an average of 133 milliseconds (137 milliseconds) per call. So it would seem that the CGI process takes about 7 milliseconds (140 - 133) to unmarshal the data, marshal the results and write them back down the pipe. This is not surprising given that all data in the body of a request is url-encoded, so each character needs checking to see if it needs decoding (or unescaping).

There is no need to url-encode all the data in the body of the POST: since the data is not included in the headers, it cannot confuse the parsing machinery. However, this emulates current browsers, which url-encode all data regardless of whether they are invoking GET or POST from a form. (The encoding is necessary for GET, since the form data is appended to the request URL rather than carried in a separate body.)
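
The cost of unescaping comes from having to inspect every byte of the body. A minimal sketch of such a decoder follows; this is an assumed example (url_decode() is a hypothetical name), not the code used in the experiments.

#include <ctype.h>
#include <stdlib.h>

/* Decode a url-encoded string in place: each %XX escape collapses to one
 * byte and '+' becomes a space, so every character must be examined. */
static void url_decode(char *s)
{
    char *out = s;
    while (*s) {
        if (*s == '%' && isxdigit((unsigned char)s[1]) && isxdigit((unsigned char)s[2])) {
            char hex[3] = { s[1], s[2], '\0' };
            *out++ = (char)strtol(hex, NULL, 16);
            s += 3;
        } else if (*s == '+') {
            *out++ = ' ';
            s++;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}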

Experiment 5

We started the timer at the point where the HTTP daemon starts writing down the pipe and stopped it when the daemon finished writing down the pipe. This took 124 milliseconds (131 milliseconds), which is astonishing considering that it is writing only 542 bytes.

Note that if we subtract 124 milliseconds from the 133 milliseconds obtained in Experiment 4, we get 9 milliseconds, which confirms the time to fork a process measured independently in section 1.1.

Experiment 6

Examining the source of the HTTP daemon, we noted that it first reads in the HTTP headers and processes them. It is not until it has forked the child that it reads the body of the request using HTInputSocket_getBlock(). So we timed the length of execution of this routine. The result was 122 milliseconds (129 milliseconds). This is clearly one of the major performance bottlenecks.

Experiment 7

We placed calls to gettimeofday() on either side of NETREAD() in HTInputSocket_getBlock() in HTFormat.c in libwww. On HPUX, NETREAD() is a call to read(). It reads in the body of the request from the socket. We found that on a lightly loaded PA-RISC machine (even with the client and server "niced" to run at the top priority of 0) the call to NETREAD() still took about 110 milliseconds (averaged over 10 invocations). It seems likely that the kernel must be doing some scheduling here and that a context switch occurs, blocking the server for a period of time.
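
The instrumentation amounted to something like the following sketch (the shape is assumed and timed_read() is a hypothetical name; this is not the libwww source itself).

#include <stdio.h>
#include <sys/types.h>
#include <sys/time.h>
#include <unistd.h>

/* Time a single read() of the request body from the socket, reporting the
 * elapsed time in milliseconds. On HPUX, NETREAD() is a call to read(). */
static ssize_t timed_read(int fd, void *buf, size_t len)
{
    struct timeval t0, t1;
    ssize_t n;

    gettimeofday(&t0, NULL);
    n = read(fd, buf, len);
    gettimeofday(&t1, NULL);

    fprintf(stderr, "read of %ld bytes took %ld ms\n", (long)n,
            (t1.tv_sec - t0.tv_sec) * 1000L + (t1.tv_usec - t0.tv_usec) / 1000L);
    return n;
}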

When a client does a "POST" using libwww, it sends the headers and then immediately makes a separate, second call to write() on the socket to send the body of the request. If the client and server are on the same machine, one possible scenario is that a context switch occurs after the client has sent the headers, and the server then runs until it blocks trying to read in the body of the request. The client would then be rescheduled to send the body of the request.
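
In outline, the client side of the POST follows the pattern sketched below (an assumed sketch; send_post() is a hypothetical name, not a libwww routine).

#include <string.h>
#include <unistd.h>

/* The request line and headers go down the socket in one write(); the
 * url-encoded body follows immediately in a second, separate write(). */
static void send_post(int sock, const char *headers, const char *body)
{
    write(sock, headers, strlen(headers));
    write(sock, body, strlen(body));
}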

However, we do not believe that this context switching is the major performance overhead: if it were, we would expect to see a significant speed-up when the client and server were executed on different machines, and we see none.

It seems most likely that what is causing this blocking of the server is some interaction with the underlying sockets and TCP library (probably during the reading of the headers). We have not pursued this any further.



Nigel Edwards <nje@ansa.co.uk>, Owen Rees <rtor@ansa.co.uk>