Monday, April 28, 2014

sysdig really is strace plus lsof

What does that mean, anyway? This morning it meant that I could see helpful internal file table information in the context of syscalls, so I could see more than just the file descriptors shown by strace.

The problem

A simple WSGI application which is managed by uWSGI is sitting behind either httpd or nginx, and I've been trying out different protocols, socket types, and both httpd and nginx with it. With httpd and FastCGI over TCP connecting to uWSGI, I get several I/O failures out of a thousand. With httpd and FastCGI over AF_UNIX I get perhaps 40 or 50. I don't see any errors with nginx, and I don't see any errors with HTTP or SCGI. What is going on?

The AF_UNIX scenario has the most frequent I/O errors, so I started debugging that. mod_proxy_fcgi sees EPIPE intermittently when writing 8 bytes to the application (after sending the FastCGI begin and parameters data).

I configured httpd (Event MPM) to use a single child process to serve requests, and configured uWSGI to do the same, to make data analysis easier.

I figured out a relatively short (i.e., limited data collection) ab invocation to trigger the failure (50 requests, concurrency 5 IIRC), and tried it several times until it failed. Each time I ran sudo sysdig -w trace.scap to capture data during the run, hoping for a failure. (The capture was started anew for every run to minimize the data collected.) After a short time, I hit 1 EPIPE scenario and had a matching capture. Now what?

22000 was the httpd child pid and 20881 was the uWSGI pid. Here are a few sysdig invocations used to start getting acquainted with the data. (No, I don't necessarily recommend egrep over sysdig expressions :) ).

$ sudo sysdig -r trace.scap " or" |less
$ sudo sysdig -r trace.scap | egrep '(connect|writev|close|read|PIPE)' |less
$ sudo sysdig -r trace.scap "( or and fd.type=unix" |less
(and more)

The view with both processes and just the operations on AF_UNIX sockets was the most helpful. Searching for EPIPE yields this information:

89254 07:12:54.259984241 6 httpd (22014) > writev fd=26(<u>ffff880108bcc440->ffff880108bcba80 /tmp/helloFastCGI.s) size=8 
89285 07:12:54.260023713 6 httpd (22014) < writev res=-32(EPIPE) data=........

The long hex addresses (ffff880108bcc440 and ffff880108bcba80) are the key to seeing what happens inside both processes, since the file descriptors won't match between processes.

$ sysdig -r trace.scap "( or and fd.type=unix"  | egrep '(ffff880108bcc440|EPIPE)'
87796 07:12:54.244173691 6 httpd (22014) < connect res=0 tuple=ffff880108bcc440->ffff880108bcba80 /tmp/helloFastCGI.s 
87797 07:12:54.244176698 6 httpd (22014) > fcntl fd=26(<u>ffff880108bcc440->ffff880108bcba80 /tmp/helloFastCGI.s) cmd=4(F_GETFL) 
87800 07:12:54.244178198 6 httpd (22014) > fcntl fd=26(<u>ffff880108bcc440->ffff880108bcba80 /tmp/helloFastCGI.s) cmd=5(F_SETFL) 
87802 07:12:54.244180563 6 httpd (22014) > writev fd=26(<u>ffff880108bcc440->ffff880108bcba80 /tmp/helloFastCGI.s) size=16 
88619 07:12:54.259281618 7 uwsgi (20882) < accept fd=10(<u>ffff880108bcc440->ffff880108bcba80 /tmp/helloFastCGI.s) tuple=ffff880108bcc440->ffff880108bcba80 /tmp/helloFastCGI.s queuepct=0 
88627 07:12:54.259287565 7 uwsgi (20882) > read fd=10(<u>ffff880108bcc440->ffff880108bcba80 /tmp/helloFastCGI.s) size=4096 
89021 07:12:54.259721201 6 httpd (22014) > writev fd=26(<u>ffff880108bcc440->ffff880108bcba80 /tmp/helloFastCGI.s) size=699 
89040 07:12:54.259743588 6 uwsgi (20882) > read fd=10(<u>ffff880108bcc440->ffff880108bcba80 /tmp/helloFastCGI.s) size=4096 
89083 07:12:54.259786716 6 httpd (22014) > writev fd=26(<u>ffff880108bcc440->ffff880108bcba80 /tmp/helloFastCGI.s) size=8 
89097 07:12:54.259802288 6 uwsgi (20882) > read fd=10(<u>ffff880108bcc440->ffff880108bcba80 /tmp/helloFastCGI.s) size=4096 
89148 07:12:54.259862936 6 uwsgi (20882) > write fd=10(<u>ffff880108bcc440->ffff880108bcba80 /tmp/helloFastCGI.s) size=8 
89158 07:12:54.259872030 6 uwsgi (20882) > write fd=10(<u>ffff880108bcc440->ffff880108bcba80 /tmp/helloFastCGI.s) size=43 
89161 07:12:54.259874552 6 uwsgi (20882) > write fd=10(<u>ffff880108bcc440->ffff880108bcba80 /tmp/helloFastCGI.s) size=8 
89165 07:12:54.259877021 6 uwsgi (20882) > write fd=10(<u>ffff880108bcc440->ffff880108bcba80 /tmp/helloFastCGI.s) size=11 
89184 07:12:54.259892476 6 uwsgi (20882) > write fd=10(<u>ffff880108bcc440->ffff880108bcba80 /tmp/helloFastCGI.s) size=24 
89188 07:12:54.259895312 6 uwsgi (20882) > close fd=10(<u>ffff880108bcc440->ffff880108bcba80 /tmp/helloFastCGI.s) 
89254 07:12:54.259984241 6 httpd (22014) > writev fd=26(<u>ffff880108bcc440->ffff880108bcba80 /tmp/helloFastCGI.s) size=8 
89285 07:12:54.260023713 6 httpd (22014) < writev res=-32(EPIPE) data=........ 
89314 07:12:54.260054390 6 httpd (22014) > close fd=26(<u>ffff880108bcc440->ffff880108bcba80 /tmp/helloFastCGI.s)

The buffer at ffff880108bcc440 was also used for other socket connections, though not concurrently. I omitted the data for connections that didn't have the EPIPE error.

So httpd writes 16 then 699 then 8 then dies trying to write 8 more. uWSGI reads several times with a 4096-byte buffer, but we have to dig into the trace to find out how much it really read (following the "< read" entry). uWSGI then writes 8 then 43 then 8 then 11 then 24 bytes, then closes before httpd tried to write the last 8.

The missing data showing how much uWSGI actually received when asking for 4K is

88633 07:12:54.259291681 7 uwsgi (20882) < read res=16 data=................
89045 07:12:54.259748834 6 uwsgi (20882) < read res=699 data=..........HTTP_HOST127.0.0.1..HTTP_USER_AGENTApacheBench/2.3..HTTP_ACCEPT*/*.<PA
89104 07:12:54.259806481 6 uwsgi (20882) < read res=8 data=........

So uWSGI read the 16, the 699, and the 8 but never tried to read again before closing.

Looking at the mod_proxy_fcgi source code I see that the final 8, the one that occasionally fails with EPIPE, is for telling the application that there are 0 bytes of FastCGI STDIN (i.e., no request body). So it seems that if mod_proxy_fcgi can get those 8 bytes in the socket buffer before uWSGI closes the socket it won't get a write, and usually it is able to. Next I added a sleep for 15 milliseconds just before that last write. With that it fails every time, since those 8 bytes are never in the socket buffer before uWSGI closes.

So why does it work reliably with nginx? It has all the data (begin request plus params/envvars + end of stdin) in a single buffer, so it doesn't get caught. uWSGI doesn't read it anyway, but there's no EPIPE error.

I looked a bit further at uWSGI, in particular its FastCGI protocol handler. Some higher-level code calls into the protocol handler to perform different work; one call is to read the request body. I found by setting breakpoints in gdb that the function to read the request body is never called. For some protocols that makes sense, since you can tell from the request header fields for this request that there's no body. So perhaps the FastCGI protocol handler needs to be smarter and slurp up the last 8 byte record earlier in this scenario. More investigation is required, presumably with gdb, to understand the flows between the generic code and the FastCGI protocol handler. (It seems very simple, but initial testing with a POST request didn't show what I expected.)

While playing with this I created a workaround in mod_proxy_fcgi:

             rv = send_data(conn, vec, nvec, &len, 0);
             if (rv != APR_SUCCESS) {
-                break;
+                if (last_stdin && writebuflen == 0) {
+                    ap_log_rerror(APLOG_MARK, APLOG_ERR, rv, r, "ignoring error sending empty stdin");
+                }
+                else {
+                    break;
+                }

It isn't harmful, but it still looks like uWSGI needs a fix.

Sometimes you win, sometimes you lose. Recently I was trying to find whether a file was opened before or after a change in identity, but AFAICT sysdig doesn't have a filter for the syscall, and so it continues to report (if you ask for it in a -p format) the original uid/gid.

Tuesday, April 22, 2014

Busy day...

Over the last couple of days I committed my implementation of Certificate Transparency features for httpd (to trunk). The doc is finally in reasonable shape (here). With any luck, someone will actually try it out and provide feedback before I forget everything.

github/trawick/ct-httpd still has a few useful nuggets. I'll put up instructions there soon for using the CT features with httpd 2.4.

I've been thinking about how to pervert the talks I gave at ApacheCon US 2014 for Budapest in the fall (assuming that I can save up enough quarters for aviation fuel). As part of that I queried the local Python user group for information about how they used Apache httpd to front end Python web apps. Apparently Apache httpd isn't very popular, except perhaps to keep ancient mod_rewrite-heavy recipes running. nginx has some very accessible documentation in this space, both in the official documentation as well as within the documentation for various Python web-related projects.

I started writing some notes today, possibly as my attempt at documentation, possibly as a rough draft for a future talk. In it I plan to show httpd and nginx recipes side by side, without preferring one over the other. A lot of work remains.

Monday, April 21, 2014

mod_qos and QS_EventKBytesPerSecLimit

Previously I noted in the mod_qos discussion group some odd testing results for this feature. I didn't receive any advice.

As I noted in a recent follow-up (still stuck in moderation???), I've worked up an alternate implementation for limiting output bandwidth for requests matching a certain event. That can be found at github/trawick/tweak-qos. I have of course solicited advice from the author and offered my changes under the same license as the original.

Saturday, April 12, 2014

Where did that .pod end up? (sysdig)

So I RTFM yesterday and, not having any OpenSSL muscle memory, spelled it STACKOF in my program just like in the man page. Fail.

The .pod file was easy enough to fix, but does it format properly? What happens to it and where does it go?

$ sudo sysdig evt.type=open and contains cert_chain
(wait for command in other terminal)
2901943 13:35:42.114428066 7 sh (24543) < open fd=3(/home/trawick/git/work-on-openssl-master/doc/ssl/SSL_get_peer_cert_chain.pod) name=doc/ssl/SSL_get_peer_cert_chain.pod(/home/trawick/git/work-on-openssl-master/doc/ssl/SSL_get_peer_cert_chain.pod) flags=1(O_RDONLY) mode=0 
2902384 13:35:42.117743930 7 sh (24544) < open fd=3(/home/trawick/inst/omaster/ssl/man/man3/SSL_get_peer_cert_chain.3) name=/home/trawick/inst/omaster/ssl/man/man3/SSL_get_peer_cert_chain.3 flags=262(O_TRUNC|O_CREAT|O_WRONLY) mode=0 
2906809 13:35:42.172268013 5 perl (24547) < open fd=4(/home/trawick/git/work-on-openssl-master/doc/ssl/SSL_get_peer_cert_chain.pod) name=SSL_get_peer_cert_chain.pod(/home/trawick/git/work-on-openssl-master/doc/ssl/SSL_get_peer_cert_chain.pod) flags=1(O_RDONLY) mode=0 
2907040 13:35:42.176303490 7 sh (24548) < open fd=3(/home/trawick/git/work-on-openssl-master/doc/ssl/SSL_get_peer_cert_chain.pod) name=doc/ssl/SSL_get_peer_cert_chain.pod(/home/trawick/git/work-on-openssl-master/doc/ssl/SSL_get_peer_cert_chain.pod) flags=1(O_RDONLY) mode=0
$ make install_docs

That's more satisfying than strace since I don't have to run make install_docs under strace and decide how to separate their outputs. Now to use sysdig as often as possible for a while so I can get used to it...

Friday, April 11, 2014

My httpd-related slides from ApacheCon NA 2014, earlier this week

Apache HTTP Server and CVE-2014-0160, the so called Heartbleed Bug

SSL/TLS-enabled configurations of Apache HTTP Server with OpenSSL 1.0.1 through 1.0.1f are vulnerable to CVE-2014-0160, the so called Heartbleed Bug. (OpenSSL versions smaller than 1.0.1, such as 0.9.8 and 1.0.0, are safe.)

No Apache HTTP Server fix is needed to resolve this; no Apache HTTP Server configuration change besides disabling SSL/TLS completely can resolve this. (SSLv2 and SSLv3 are not vulnerable to CVE-2014-0160, but limiting the configuration to one or both of those protocols is not recommended because of other security concerns.) Instead, a patch to OpenSSL, a rebuild of OpenSSL with the TLS Heartbeat extension disabled, or an upgrade of OpenSSL to 1.0.1g or later is required.

In some cases OpenSSL may be statically linked with mod_ssl. httpd and mod_ssl must be rebuilt with the updated OpenSSL when OpenSSL is statically linked with mod_ssl. Note: The build of mod_spdy may rebuild mod_ssl in this manner.

If you build OpenSSL yourself, refer to the OpenSSL project for further information, including the advisory at

Checking for mod_ssl statically linked with OpenSSL


$ nm /path/to/httpd/modules/ | grep X509_STORE_CTX_free
                 U X509_STORE_CTX_free...

The U in the output means that the relevant OpenSSL library is not statically linked with mod_ssl.

Other platforms

Similar checks for symbols work elsewhere. ldd is another way on many Unix-like platforms to determine where mod_ssl finds OpenSSL. depends can be used on Windows to see if relies on OpenSSL.

When to consult with vendors or other third parties

Contact a third-party supplier in the following situations to determine the applicability of CVE-2014-0160 to your server:

  • You obtain OpenSSL in binary form with or without Apache HTTP Server
  • You are using a commercial product based on Apache HTTP Server
  • You are otherwise using mod_ssl or a replacement for it from a third party
  • Build or installation of a third-party feature (e.g., mod_spdy) rebuilds mod_ssl


Correct the affected versions: 1.0.1 (without the a) is the earliest affected version. Mention explicitly that older versions are not affected. Suggested by Rainer Jung