af83

The pragma strikes back (and how bonjour(something).fr could enter the nodejs code base)

Last month, Ori was debugging an IE / Flex URLRequest / SSL bug, and the solution to his problem was a Pragma. Today I was debugging on nodejs, and the origin of my problem was a pragma…

Well, to be precise, the pragma was at the origin of the problem as I encountered it, but it could have been any header field. First things first, let's start by the beginning:

  • I'm writing an application using nodejs, the über asynchronous JS framework by Ryan Dahl.
  • For some reasons, this application collects URLs people post in different places, to have them in one place. Because nodejs make it easy, the application checks the URLs posted are valid and if the server answer by a redirect, keep the result of the redirection.
  • Yesterday's night, Shakaman (the author of ShakaCSS) posted an URL on IRC and here is another corresponding one: http://bonjourlechat.fr , but the link didn't appeared in the collected URLs.
  • I first thought it was the server fault, so I came back home. But this morning, I checked the answer from the server, just to be sure, and it was a regular 301 reply. There was a bug somewhere.

After reproducing the bug I found out that nodejs headers had no "location" header field but a "pragmalocation" one. I first thought "WTF, is that some kind of new header I haven't heard about?!", then looked up on the Internet… To find nothing.

So the "pragmalocation" header field does not exist, but shows up in nodejs… Firebug doesn't tell me anything about any pragma header, but Chromium does: it seems there is a prama header field with an empty value. wget gave me the exact order in which header fields were given:

HTTP/1.0 301 Moved Permanently
Date: Thu, 03 Jun 2010 09:56:32 GMT
Server: Apache/2.2.3 (Red Hat)
X-Tumblr-Perf: "ch:0/ cm:0/ ce:0/ c:0/0 d:0/0 e:0/0"
P3P: CP="ALL ADM DEV PSAi COM OUR OTRo STP IND ONL"
Cache-Control: public
Pragma: 
Location: http://www.bonjourlechat.fr/
Vary: Accept-Encoding
X-Tumblr-Usec: D=47851
Content-Length: 0
Content-Type: text/html; charset=UTF-8
X-Cache: MISS from rack1.tumblr.com
X-Cache-Lookup: MISS from rack1.tumblr.com:80
Via: 1.0 rack1.tumblr.com:80 (squid/2.6.STABLE6)
Connection: keep-alive

From now it was clear what the cause of the bug was: no value in a header field would provoke the field name to be concatenated to the next one… I was able to reproduce the bug with other header field names.

After further investigation, it was clear that the problem came from the http-parser used by nodejs (http://github.com/ry/http-parser). This parser is a finite state machine designed to be fast and triggering some callbacks along the parsing. So the state machine is basically triggering two callbacks:

  • on_header_field: called when some pieces of header field names is parsed;
  • on_header_value: called when some pieces of header field values is parsed.

The problem with such approach, is that only using the two provided callbacks, it is not possible to detect the case where a header field has no value… It also makes it a bit tricky to use.

So I proposed a solution, which is just to add another callback to be called once the parser finish to parse a header field. The patch is pretty small (3 lines in the parser), but is makes it easier to use and solve our problem. As any bug need a test, I added a test reproducing the bug. The test making a reference to bonjourlechat.fr, this is how this website could enter the nodejs code base!

To illustrate how simple it makes the code using the parser, here is how could be rewritten the http-parser.js :

Before:

  ...

  parser.onMessageBegin = function () {
    parser.incoming = new IncomingMessage(parser.socket);
    parser.field = null;
    parser.value = null;
  };

  ...

  parser.onHeaderField = function (b, start, len) {
    var slice = b.toString('ascii', start, start+len).toLowerCase();
    if (parser.value) {
      parser.incoming._addHeaderLine(parser.field, parser.value);
      parser.field = null;
      parser.value = null;
    }
    if (parser.field) {
      parser.field += slice;
    } else {
      parser.field = slice;
    }
  };

  parser.onHeaderValue = function (b, start, len) {
    var slice = b.toString('ascii', start, start+len);
    if (parser.value) {
      parser.value += slice;
    } else {
      parser.value = slice;
    }
  };
  
  ...

After:

  ...

  parser.onMessageBegin = function () {
    parser.incoming = new IncomingMessage(parser.socket);
    parser.field = "";
    parser.value = "";
  };

  ...

  parser.onHeaderField = function (b, start, len) {
    parser.field += b.toString('ascii', start, start+len).toLowerCase();
  };

  parser.onHeaderValue = function (b, start, len) {
    parser.value += b.toString('ascii', start, start+len);
  };

  parser.onHeaderValueComplete = function () {
    parser.incoming._addHeaderLine(parser.field, parser.value);
    parser.field = "";
    parser.value = "";
  };

  ...

This is much simpler! I like the story of this bug, this is why I posted here. Along the debugging process, I had a look at the way JS and C++ communicate in nodejs, and also probably learned some C tricks reading Ryan code, so thanks Ryan!

Pierre

Update: The patch has not been accepted, Ryan preferring to keep the API as it. But the test case has been kept (cf. commit).

blog comments powered by Disqus