Friday, May 30, 2014

Is a script to run a command within a virtualenv missing from virtualenv?


Contents of script run_with_env.sh:

#!/bin/sh
# Usage: run_with_env.sh <virtualenv-dir> <working-dir> <command> [args...]

envdir=$1
shift

curdir=$1
shift

# Activate the virtualenv, switch to the working directory, and replace
# this shell with the requested command and its arguments.
. "$envdir/bin/activate" && cd "$curdir" && exec "$@"

This runs a specified command (plus arguments) in a particular directory within a particular virtualenv.

Example use:

$ crontab -l
...
30 0 * * * /home/trawick/myhg/apache/bin/run_with_env.sh git/edurepo/envs/edurepo git/edurepo/src/edurepo python teachers/pretend_teacher.py
...

July 4 update:

See also vex.

Monday, May 26, 2014

If you replace the hard drive in your MacBook...


Be sure to check the startup disk setting once you are successfully booting from the new drive.

We just replaced the original drive in a Core 2 Duo White MacBook (2008, 4.1) with a 256GB Crucial M500 SSD, following some instructions on Apple Support Communities (search for Second way) for moving the data*. Our power-button-to-productivity benchmark of bootup+login+Chrome-window-appears+Word-window-appears was 2:36 with the original drive and 1:08 with the SSD drive. But that 1:08 included a surprising 25-30 second delay after the screen lit up on power-on but before the Apple logo appeared. There are multiple causes described in this article, but the simple issue for us was selection of the startup disk in System Preferences. After selecting the Crucial SSD as the startup disk, the annoying delay at power up was gone. Thus our power-button-to-productivity time is more like 0:40, down from the original 2:36.

*Nuance with moving the data via Disk Utility Restore: When initially booting from the original drive over USB I selected Macintosh HD instead of Recovery HD. Disk Utility wouldn't let me restore from the original drive to the new drive. I had to choose Recovery HD instead of Macintosh HD from the boot menu. That led to a simple menu (including Disk Utility) instead of my normal OS X environment, and then I was able to restore from the original drive.

Thursday, May 22, 2014

Which Apache httpd module failed the request?


I was reminded today of a module I wrote a while back for Apache httpd 2.4 when I was debugging a config snippet from a customer and saw this:

[core:trace3] ... request.c(311): fixups hook gave 400: /

Whatever module set 400 (HTTP_BAD_REQUEST) didn't log anything. If you have Apache httpd 2.4 and build it yourself, this type of issue can be solved with mod_hook_ar. This message from mod_hook_ar significantly shrank the search area:

[hook_ar:error] mod_rewrite.c fixups -> 400

Unfortunately, mod_hook_ar doesn't currently have its own web page. You can download the code and information about building it from http://emptyhammock.com/downloads/, and you can read about it starting at slide 46 in this presentation:

Tuesday, May 20, 2014

Recent fun with mod_whatkilledus and mod_backtrace


After Apache httpd 2.4.10 is released, I'll push out a new release of these modules.

The most critical issue resolved in the next release is a problem mod_whatkilledus has with tracking the active request when using the Event MPM. It doesn't notice when the Event MPM maps the request to a different thread while processing the response, so the request which triggered the crash could be misidentified. By using a new web server API I added to httpd 2.4.10, mod_whatkilledus now notices when request processing moves across threads.

Other useful changes in the next release:

  • Windows: Support 64-bit builds, and add the option of disabling the system pop-up error handling dialog after a child process crash, which allows the server to automatically recover and handle new clients. (An alternative to this httpd-specific setting is a Windows registry setting; it has more capabilities, such as enabling a user dump, but it affects all applications.)
  • Linux, FreeBSD, OS X: Optionally use libunwind, for better backtraces in some circumstances.

Strangely, libunwind's unw_step() returns no stack frames in 32-bit builds on OS X, returning 0 on the first call. I didn't encounter any issues with 64-bit builds. (This is with the libunwind APIs in the system library in both Lion and Mavericks, not with libunwind built from source.)
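
For reference, here is a minimal sketch of the kind of local stack walk libunwind provides (the generic unw_getcontext()/unw_init_local()/unw_step() sequence). This is only an illustration of the library's API, not the actual mod_backtrace code, and the output format is made up:

#define UNW_LOCAL_ONLY
#include <libunwind.h>
#include <stdio.h>

/* Walk the current thread's stack and print one line per frame. */
static void print_backtrace(void)
{
    unw_context_t ctx;
    unw_cursor_t cursor;
    unw_word_t ip, off;
    char name[256];

    unw_getcontext(&ctx);
    unw_init_local(&cursor, &ctx);

    /* unw_step() returns > 0 while frames remain; the 32-bit OS X
     * oddity mentioned above shows up as 0 on the very first call. */
    while (unw_step(&cursor) > 0) {
        unw_get_reg(&cursor, UNW_REG_IP, &ip);
        if (unw_get_proc_name(&cursor, name, sizeof(name), &off) == 0)
            printf("%#lx %s+%#lx\n", (unsigned long)ip, name,
                   (unsigned long)off);
        else
            printf("%#lx (unknown)\n", (unsigned long)ip);
    }
}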

Monday, May 5, 2014

Comparing performance of SCGI and uwsgi protocols


I've been playing with httpd and nginx using different protocols to route to a Python application, and one of the questions that has arisen is whether I should try uwsgi (the protocol, not the uWSGI application server) with httpd. The third-party module mod_proxy_uwsgi needs some debugging first, so it isn't as simple as modifying one of my existing .conf snippets and seeing how fast it goes.

For the purposes of a reverse proxy behind httpd or nginx, uwsgi is essentially SCGI with a more efficient encoding of the lengths of the parameters passed over: each string is sent with a binary length instead of as a bare string that the backend then has to traverse to find the terminating binary zero. I don't think the work in the web server is significantly different either way, though if anything SCGI is cheaper for the web server: when copying over HTTP headers and other data it does a strlen() on the var name and value, then a memcpy() for each, including the terminating '\0' so that the backend knows the extent. uwsgi requires copying both strings (without the '\0') and building a two-byte binary length for each in a particular endian order (i.e., shifts and masks). This extra work for uwsgi more than cancels out its savings on a couple of other lengths, for which SCGI requires building a printable string.
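
To make the encoding difference concrete, here is a rough sketch of how one name/value pair might be appended to each protocol's header block. This is an illustration only, not httpd's actual code: the helper names are made up, the buffer is assumed to be large enough, and the outer framing (SCGI's netstring length prefix and trailing comma, uwsgi's 4-byte packet header) is left out.

#include <stddef.h>
#include <string.h>

/* SCGI: copy name and value verbatim, including the trailing '\0' bytes,
 * so the backend can find the string boundaries by scanning. */
static char *scgi_put_param(char *buf, const char *name, const char *value)
{
    size_t nlen = strlen(name), vlen = strlen(value);
    memcpy(buf, name, nlen + 1);   buf += nlen + 1;
    memcpy(buf, value, vlen + 1);  buf += vlen + 1;
    return buf;
}

/* uwsgi: each string is preceded by a 16-bit little-endian length and the
 * '\0' terminators are dropped, so the backend never has to scan. */
static char *uwsgi_put_param(char *buf, const char *name, const char *value)
{
    size_t nlen = strlen(name), vlen = strlen(value);
    *buf++ = (char)(nlen & 0xff);
    *buf++ = (char)((nlen >> 8) & 0xff);
    memcpy(buf, name, nlen);       buf += nlen;
    *buf++ = (char)(vlen & 0xff);
    *buf++ = (char)((vlen >> 8) & 0xff);
    memcpy(buf, value, vlen);      buf += vlen;
    return buf;
}

The shifts and masks in uwsgi_put_param are the extra work described above; the '\0'-included memcpy() calls in scgi_put_param are what the backend pays for later by scanning for the terminators.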

So how can I approximate the speed-up without actually debugging the uwsgi support for httpd? First, it seems worthwhile to look at nginx. Here are some numbers with nginx, where each request is a POST with a 71,453-byte body sent to a very simple WSGI application running under uWSGI, which simply echoes it back. (8 runs each with ab, 10,000 requests, concurrency 100; throw out the high and low runs.)

Scenario                            Requests/sec
nginx, SCGI over a Unix socket             4,332
nginx, uwsgi over a Unix socket            4,363

So over a lot of runs and a lot of requests per run, uwsgi overall (web server processing + uWSGI processing) has an edge of a little less than one percent.

I came up with a lame experiment with mod_proxy_scgi to see how an improvement in the efficiency of dealing with the parameters might help. In the following patch, I simply remove the second pass of touching (and copying) the parameters in mod_proxy_scgi. Of course this only works if all requests are exactly the same, as in my little benchmark :)

--- mod_proxy_scgi.c 2014-05-05 13:21:39.253193636 -0400
+++ no-header-building 2014-05-05 13:15:33.785184159 -0400
@@ -247,6 +247,8 @@
     return OK;
 }
 
+static char *saved_headers;
+static apr_size_t saved_headers_size;
 
 /*
  * Send SCGI header block
@@ -292,6 +294,11 @@
     ns_len = apr_psprintf(r->pool, "%" APR_SIZE_T_FMT ":", headerlen);
     len = strlen(ns_len);
     headerlen += len + 1; /* 1 == , */
+
+    if (getenv("foo") && saved_headers) {
+        return sendall(conn, saved_headers, saved_headers_size, r);
+    }
+
     cp = buf = apr_palloc(r->pool, headerlen);
     memcpy(cp, ns_len, len);
     cp += len;
@@ -320,6 +327,15 @@
     }
     *cp++ = ',';
 
+    if (getenv("foo") && !saved_headers) {
+        char *tmp = malloc(headerlen);
+        memcpy(tmp, buf, headerlen);
+        saved_headers_size = headerlen;
+        saved_headers = tmp;
+        ap_log_rerror(APLOG_MARK, APLOG_WARNING, 0, r,
+                      "Saved headers...");
+    }
+
     return sendall(conn, buf, headerlen, r);
 }

It doesn't exactly match the extra work uWSGI has to do with SCGI, but it does knock out some processing in the web server.

Scenario                                           Requests/sec
httpd, SCGI over a Unix socket, no optimization           9,997
httpd, SCGI over a Unix socket, optimization             10,014

(Again, these are the averages of multiple runs after throwing out highs and lows.)

Hmmm... Even getting rid of a lot of strlen() and memcpy() calls (my rough attempt to trade cycles in httpd for the cycles that would have been saved in uWSGI if we used uwsgi) resulted in much less than one percent improvement. I think I'll stick with SCGI for now, and I don't even think it is worthwhile to change httpd's SCGI implementation to build the header in a single pass, which would get back only some of the cycles saved by the benchmark-specific optimization shown above. (And I don't think httpd is suffering by not having a bundled, reliable implementation of uwsgi.)

Friday, May 2, 2014

Lingering close


One of the early pieces of code I tackled in httpd was APR-izing lingering_close(). I recall dean gaudet ensuring that I didn't screw it up. (See an early part of the conversation.)

Perhaps a year or two later, some colleagues in z/OS TCP/IP and SNA, where I had worked before joining the IBM team working on httpd, let me know that a file transfer program I had written long before had stopped working reliably when transferring to or from z/OS after some updates in z/OS TCP/IP. (Why does one reinvent file transfer? I decided to learn sockets programming but got tired of all the ifdefs to support Windows and OS/2 and VM/CMS and MVS, so I did what everyone else did and wrote a high-level library. A file transfer program comes next, right? Anyway, it was used by a number of colleagues for higher level features like printing to a workstation printer from VM/CMS, XEDIT macros that interacted with your PC clipboard, and other fun stuff.) At any rate, it was good that I had learned about lingering close via httpd because otherwise I would have been shocked at the reason behind the intermittent failures in the file transfer program; every indication was that both client and server modes were doing exactly what they needed to do, but one of the peers could get ECONNRESET before it had finished reading the response. IIRC, the lingering close logic was then implemented and some small amount of happiness ensued, but I didn't have time to rework the build for the then-available tools on VM/CMS, and a big use-case died. Sorry, folks!

Fast forward through most of the life of the web... I've been playing recently with httpd and nginx in front of uWSGI and writing up my notes in this in-progress tutorial. I initially ran into a bug where uWSGI doesn't consume the empty FCGI_STDIN record when speaking FastCGI; now that that is fixed, I'm left with a familiar scenario: the server (uWSGI in this case) writes the entire response to the client (the FastCGI gateway) and then calls close(), and sometimes (more often over a slow network) the gateway gets ECONNRESET before it can read everything.

What is the authoritative text on the subject? I don't know. Most discussions on the Internet about getting ECONNRESET from a read-type call don't mention an RST jumping ahead of data the server has already copied to the TCP send buffers. Some raise the issue of the client trying to send data after the server has closed. (The lingering-close logic as normally implemented in TCP servers helps a lot in those cases.) This ancient Apache httpd 1.3 comment is as succinct a description of the problem as any I know of:

 * in a nutshell -- if we don't make this effort we risk causing
 * TCP RST packets to be sent which can tear down a connection before
 * all the response data has been sent to the client.

Here is a blog page that covers this more disturbing scenario very clearly:

So can I trigger the issue at will with a simple client-or-server program? At first I couldn't, but after quite a bit of experimentation the answer is definitely yes. It isn't hard to trigger with httpd's mod_proxy_fcgi, a particular application-level flow, and uWSGI configured to run the simple application on multiple processes and threads, but it wasn't so easy with this simple program.

The final methodology to see the error:

server:
  $ ./a.out server
  (no trick there)

client:
  Run this in 8 or preferably more different terminal sessions:
  $ while ./a.out client 192.168.1.211; do echo ok; done

human:
  # while no-error-on-any-client-terminal-session; do Go clean some room in your house; done

Client and server are on two different machines connected by a relatively slow Wi-Fi.

As long as you see that the server displays no socket call errors but at least one client displays something like read: Connection reset by peer, you've encountered the error. Maybe errors will occur only with more clients in your environment. After seeing a few failures in my setup with around twelve client loops, I went for a long walk and found after my return that seven were still happy (many successful transactions with zero errors).

Can SO_LINGER help at all, at least on recent Ubuntu talking to recent Ubuntu? I'll try that, though I think the tried-and-true server logic (shut down the write side, then wait for a timeout or for the client to finish reading) is the safest solution.
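
For reference, here is a rough sketch of that classic lingering-close sequence, assuming fd is a connected socket the server has finished writing to. The timeout and iteration values are arbitrary, and this is not the httpd implementation:

#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Rough sketch of lingering close: the caller has finished writing the
 * response on the connected socket "fd". */
static void lingering_close_sketch(int fd)
{
    char junk[512];
    fd_set rfds;
    struct timeval tv;
    int tries;

    /* Send our FIN; it lines up behind any response data still queued
     * in the TCP send buffer. */
    shutdown(fd, SHUT_WR);

    /* Drain whatever the peer still sends until it closes its side
     * (read returns 0), an error occurs, or we give up.  This is what
     * keeps close() from turning unread input into an RST that could
     * destroy response data the peer has not read yet. */
    for (tries = 0; tries < 15; tries++) {    /* cap: ~15 * 2 seconds */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = 2;
        tv.tv_usec = 0;
        if (select(fd + 1, &rfds, NULL, NULL, &tv) <= 0)
            break;                            /* timed out or error */
        if (read(fd, junk, sizeof(junk)) <= 0)
            break;                            /* peer closed or error */
    }

    close(fd);
}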

Massive fail

Early results were incorrect; I had forgotten to set the listen backlog high enough to keep TCP from resetting connections. More results to be posted later...

The moral of the story: if you get ECONNRESET on a read-type call before having read any data, check first for a listen backlog problem; in that case the server never even processed the connection before it was reset.
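
For illustration, a hypothetical fragment showing the kind of fix involved in my test server; the backlog value shown is arbitrary:

#include <sys/socket.h>

/* If the test server had used something like listen(fd, 5), bursts of
 * connections from many client loops can overflow the accept queue and
 * the kernel may reset the excess connections.  That produces the same
 * ECONNRESET-before-any-data symptom even though the code that handles
 * accepted connections is fine. */
static int start_listening(int listen_fd)
{
    return listen(listen_fd, 511);  /* generous backlog for the test */
}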