Thursday, October 18, 2012

Chasing after fd 5


On FreeBSD 9, the mod_cgid daemon appears to hold an odd open file; in my current configuration it is file descriptor 5. procstat displays it as:

26889 httpd               5 ? - ---------   2       0 -

The question mark is the result of an "unknown" file type being translated through multiple namespaces; I stopped tracing it backwards through the code when I saw libprocstat copying foo_UNKNOWN to bar_UNKNOWN.

The file descriptor isn't really in use, since a CGI request will result in accept() returning 5 in that process.
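That reasoning leans on the usual Unix behavior that a call which allocates a descriptor (open(), accept(), dup(), ...) hands back the lowest-numbered descriptor that is free in the process. A minimal Python sketch of that rule follows; it uses /dev/null instead of a socket, the function name is made up for illustration, and it assumes nothing else in the process opens a file between the two calls:

```python
import os


def reopen_after_close(path=os.devnull):
    """Open a file, close it, open it again; return both descriptor numbers."""
    first = os.open(path, os.O_RDONLY)
    os.close(first)                      # the slot is free again
    second = os.open(path, os.O_RDONLY)  # expected to land in the same slot
    os.close(second)
    return first, second


if __name__ == "__main__":
    first, second = reopen_after_close()
    print("first open:", first, "second open:", second)
```

If slot 5 were genuinely occupied in the mod_cgid daemon, accept() would have to return some other number.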

I used DTrace to print the syscall name and pid for every syscall issued by httpd with either a first argument of 5 or a return value of 5. The output shows that fd 5 is a listening socket created in the initial httpd process and closed by the mod_cgid daemon, presumably via a call to ap_close_listeners().

There may be an interesting story/bug behind this, but I don't think it is in httpd-land, and time is flying.

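#!/usr/sbin/dtrace -s

/* On entry, arg0 is the first syscall argument: match calls passed 5. */
syscall:::entry
/execname == "httpd" && arg0 == 5/
{
  printf("%s %d %d\n", probefunc, arg0, pid);
}

/* On return, arg0 is the return value: match calls that returned 5. */
syscall:::return
/execname == "httpd" && arg0 == 5/
{
  printf("%s %d %d\n", probefunc, arg0, pid);
}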

(No, ustack() isn't working for me, but I didn't make world with the suggested flags.)

Saturday, October 13, 2012

lsof process group selection troubles on OS X too?


lsof exits with status 1 and no output on Snow Leopard when selecting either a Dropbox process group or an httpd process group. (I should get a download code for Lion in the next few days. I wonder if that works better.)

Anyway, the script mentioned previously which shows file descriptors by process group is available here as pgfiles.py.

Later

The problem symptom didn't change after installing Lion (lsof 4.84 built by Apple a couple of months ago), but the workaround of supplying lsof with a list of the pids in the group of interest appears to work fine.

The fix to lsof was trivial. I sent it to the author, so hopefully it will be in the next release.

FreeBSD lsof process group duplication????


A few days ago I was looking over some new code for an Apache httpd module, and I was afraid that the design would lead to daemons like the mod_cgid daemon inheriting the module's pipe and inadvertently keeping the write end open, thereby breaking part of what the module used the pipe for. That led to a desire to summarize Apache httpd file descriptors by which processes had them open. This morning I set out to write a script for that, but with too little sleep+caffeine I stared at a mostly-empty Emacs buffer long enough that I decided to set a timer for one hour to force the issue.

The timer went off and I was still wading through far too much output; the same process id was listed multiple times for a given descriptor, and I couldn't find the reason in the code. I wasted a bit of time messing with the code but finally went back to a normal lsof display in the shell and discovered what was going on: When using -g NNN to select via process group id, lsof is displaying the same process multiple times, as in this snippet from the repeated displays of all the fds for one of the httpd processes:

$  lsof -P -g 38239 -a -d ^txt,^rtd,^cwd,^mem,^DEL | grep '38243.*4u'
lsof: WARNING: compiled for FreeBSD release 9.0-RC2; this is 9.0-RELEASE.
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)

I don't see the same behavior on Linux. I ended up adding filtering to deal with the repetition. My script is now able to display a more manageable summary of files by process:

$ apfds.py 38239
fd 0 type VCHR name /dev/null
  38239 38240 38241 38242 38243
fd 1 type VCHR name /dev/null
  38239 38240 38241 38242 38243
fd 2 type VREG name /usr/home/trawick/inst/24-64/logs/error_log
  38239 38240 38241 38242 38243
fd 3 type IPv6 dev 0xfffffe002706c000 name *:8080
  38239 38241 38242 38243
fd 4 type IPv4 dev 0xfffffe00271a57a0 name *:*
  38239 38240 38241 38242 38243
fd 5 type IPv6 dev 0xfffffe0027099b70 name *:10080
  38239 38241 38242 38243
fd 6 type IPv4 dev 0xfffffe00271a73d0 name *:*
  38239 38240 38241 38242 38243
fd 7 type PIPE dev 0xfffffe0002729000 name ->0xfffffe0002729158
  38239 38240 38241 38242 38243
fd 8 type PIPE dev 0xfffffe0002729158 name ->0xfffffe0002729000
  38239 38240 38241 38242 38243
fd 9 type VREG name /usr/home/trawick/inst/24-64/logs/access_log
  38239 38240 38241 38242 38243
fd 10 type VREG name /usr/home/trawick/inst/24-64/logs/rewrite-map.38239
  38239 38241 38242 38243
fd 3 type unix dev 0xfffffe00273ad2a8 name /home/trawick/inst/24-64/logs/cgisock.38239
  38240
fd 5 name 0xfffffe0027171960 file struct, ty=0, op=0xffffffff81079180
  38240
fd 11 type VREG name /usr/home/trawick/inst/24-64/logs/rewrite-map.38239
  38241 38242 38243
fd 12 type KQUEUE dev 0xfffffe00131d4000 name count=0, state=0x2
  38241
fd 12 type KQUEUE dev 0xfffffe0018b9c600 name count=0, state=0x2
  38242
fd 12 type KQUEUE dev 0xfffffe002775e300 name count=0, state=0x2
  38243

(In some cases it may not be correct to require the fds to match in order for two files to be the same; OTOH, the heuristics might be unmanageable, and it may help to see the separate listings for distinct fds anyway.)
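For illustration, here is a rough Python sketch of that kind of summary. It is not the actual apfds.py: it assumes the input is lsof field output (the -F option) with the p (pid), f (fd), t (type), d (device) and n (name) fields, and the function name is made up. Collecting pids in a set is what absorbs the repeated process listings, and, as in the note above, two files are treated as the same only when fd, type, device and name all match.

```python
def summarize_fds(field_output):
    """Map (fd, type, device, name) -> sorted list of unique pids.

    field_output is text in lsof -F style: one field per line, with the
    first character identifying the field (p, f, t, d, n are used here).
    """
    summary = {}
    pid = None
    current = None  # fields of the file entry being read, if any

    def flush():
        # Record the file entry collected so far under the current pid.
        if pid is not None and current is not None:
            key = (current.get("f"), current.get("t"),
                   current.get("d"), current.get("n"))
            summary.setdefault(key, set()).add(pid)

    for line in field_output.splitlines():
        if not line:
            continue
        tag, value = line[0], line[1:]
        if tag == "p":        # a new process set begins
            flush()
            pid = int(value)
            current = None
        elif tag == "f":      # a new file set begins
            flush()
            current = {"f": value}
        elif current is not None:
            current[tag] = value
    flush()
    return {key: sorted(pids) for key, pids in summary.items()}


if __name__ == "__main__":
    sample = "p38243\nf4\ntIPv4\nd0xfffffe00271a57a0\nn*:*\n" * 3
    for (fd, ftype, dev, name), pids in summarize_fds(sample).items():
        print("fd", fd, "type", ftype, "dev", dev, "name", name)
        print(" ", " ".join(str(p) for p in pids))
```

The text to feed in could come from something like lsof -g PGID -F pftdn; that invocation is an assumption for this sketch, not necessarily what the script runs.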

Later

More FreeBSD fun: Given the Mac OS X issue, I reimplemented the lsof group selection to use -p pid1,pid2,pid3, with the list built internally via ps. But that doesn't work at all on FreeBSD. Instead, it lists files for only the last pid in the list and exits with status 1. So the latest version of the script uses -g pgid (along with code to filter out the over-reporting) on FreeBSD and -p pid1,pid2,pid3 elsewhere. (I dare not try it on Solaris today.)
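The per-platform choice can be sketched roughly as below. This is an illustration rather than the script's actual code: the function names are made up, and the pid list is assumed to come from two-column "pid pgid" text such as the output of ps -A -o pid= -o pgid=.

```python
import platform


def pids_in_group(ps_output, pgid):
    """Return the pids whose process group id equals pgid.

    ps_output is expected to hold one "pid pgid" pair per line.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1] == str(pgid):
            pids.append(int(fields[0]))
    return pids


def lsof_selection_args(pgid, pids, system=None):
    """Choose the lsof selection option for the given platform."""
    system = system or platform.system()
    if system == "FreeBSD":
        # -p with a list misbehaves here, so select by group and
        # filter the repeated output afterwards.
        return ["-g", str(pgid)]
    # Elsewhere, pass an explicit comma-separated pid list.
    return ["-p", ",".join(str(p) for p in pids)]


if __name__ == "__main__":
    ps_text = "38239 38239\n38240 38239\n  501   501\n"
    group = pids_in_group(ps_text, 38239)
    print(lsof_selection_args(38239, group))
```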

Later still...

I built lsof for Solaris 10 and didn't see any glitches with -pLIST or -gPGID. Also, I was able to create a fix for the FreeBSD glitches and send it to the lsof author.

Friday, October 12, 2012

Lion


As noted elsewhere, OS X Lion can still be purchased via a call to 1-800-MY-APPLE. $19.99 + tax.

As far as I know, this article is still correct on the likely but undocumented policy of security fixes. Another reason I want to upgrade is to be able to test software on a later OS X level.

Later

Throw in another $50.00 for the latest VMware Fusion (I previously had 2.0.6, which doesn't support Lion) and a lot of wasted time fiddling with Xcode and MacPorts to restore a working build environment: the upgrade wiped out symlinks which were part of the old Xcode, as well as the guts of Mercurial; the new Xcode doesn't provide autoconf; and so on. I guess everybody else went through this a year ago.

Monday, October 8, 2012

Recruiter spam


How can I get a phone call today about an Android development position as well as an e-mail about Hadoop work? I have no experience in either of these.

I can imagine some sort of procedure that takes a list of names and looks for interesting data for those search terms. I can sort of see a connection with Hadoop, as google("trawick hadoop") shows a few interesting hits:

  • From a conference schedule, where a Hadoop talk is listed after mine:
    Jeff Trawick. Big Data. Hadoop
    
  • From an apache.org server-status page:
    .. GET /dist/hadoop/common/hadoop-0.20.2/hadoop-0.20.2/src/docs/cn ..... 119.63.88.205, www.apache.org, GET /~trawick/apache-2-on-os390.html HTTP/1.0 ...
    
  • Does this make it appear that I'm a Hadoop developer?
    hive/trunk/common/src/java/org/apache/hadoop - SVNSearch
    svnsearch.org/svnsearch/repos/.../search?rev.../hadoop/...Share1370910 08.08.2012 21:37:27, by trawick. grab r1370907 from trunk: ... M /hadoop/common/branches/branch-2/hadoop-hdfs-project/had
    

and a bunch of less interesting pages totaling 19,500.

I didn't see anything interesting at the top of google("trawick android").

google("trawick resume keyword") and google("trawick experience keyword") yield only a few hits.

I guess the contacts were purely to rifle through *my* contacts. I know I shouldn't take it so seriously, but I respect the problem they're trying to solve. I have lost hope that this simple message would help avoid wasting the time of busy people, not just my own.