Saturday, October 13, 2012

FreeBSD lsof process group duplication????


A few days ago I was looking over some new code for an Apache httpd module and I was afraid that the design would lead to daemons like the mod_cgid daemon inheriting the module's pipe and inadvertently keeping the write end open, thereby breaking part of what the module used the pipe for. That led to a desire to summarize Apache httpd file descriptors by which processes had them open. This morning I set out to write a script for that, but with too little sleep+caffeine I stared at a mostly-empty Emacs buffer long enough that I decided to set a timer for one hour to force the issue.

The timer went off and I was still wading through far too much output; the same process id was listed multiple times for a given descriptor, and I couldn't find the reason in the code. I wasted a bit of time messing with the code but finally went back to a normal lsof display in the shell and discovered what was going on: when -g NNN is used to select by process group id, lsof displays the same process multiple times, as in this snippet from the repeated displays of all the fds for one of the httpd processes:

$  lsof -P -g 38239 -a -d ^txt,^rtd,^cwd,^mem,^DEL | grep '38243.*4u'
lsof: WARNING: compiled for FreeBSD release 9.0-RC2; this is 9.0-RELEASE.
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)
httpd   38243 38239 trawick    4u    IPv4 0xfffffe00271a57a0      0t0    TCP *:* (CLOSED)

I don't see the same behavior on Linux. I ended up adding filtering to deal with the repetition. My script is now able to display a more manageable summary of files by process:

$ apfds.py 38239
fd 0 type VCHR name /dev/null
  38239 38240 38241 38242 38243
fd 1 type VCHR name /dev/null
  38239 38240 38241 38242 38243
fd 2 type VREG name /usr/home/trawick/inst/24-64/logs/error_log
  38239 38240 38241 38242 38243
fd 3 type IPv6 dev 0xfffffe002706c000 name *:8080
  38239 38241 38242 38243
fd 4 type IPv4 dev 0xfffffe00271a57a0 name *:*
  38239 38240 38241 38242 38243
fd 5 type IPv6 dev 0xfffffe0027099b70 name *:10080
  38239 38241 38242 38243
fd 6 type IPv4 dev 0xfffffe00271a73d0 name *:*
  38239 38240 38241 38242 38243
fd 7 type PIPE dev 0xfffffe0002729000 name ->0xfffffe0002729158
  38239 38240 38241 38242 38243
fd 8 type PIPE dev 0xfffffe0002729158 name ->0xfffffe0002729000
  38239 38240 38241 38242 38243
fd 9 type VREG name /usr/home/trawick/inst/24-64/logs/access_log
  38239 38240 38241 38242 38243
fd 10 type VREG name /usr/home/trawick/inst/24-64/logs/rewrite-map.38239
  38239 38241 38242 38243
fd 3 type unix dev 0xfffffe00273ad2a8 name /home/trawick/inst/24-64/logs/cgisock.38239
  38240
fd 5 name 0xfffffe0027171960 file struct, ty=0, op=0xffffffff81079180
  38240
fd 11 type VREG name /usr/home/trawick/inst/24-64/logs/rewrite-map.38239
  38241 38242 38243
fd 12 type KQUEUE dev 0xfffffe00131d4000 name count=0, state=0x2
  38241
fd 12 type KQUEUE dev 0xfffffe0018b9c600 name count=0, state=0x2
  38242
fd 12 type KQUEUE dev 0xfffffe002775e300 name count=0, state=0x2
  38243

(In some cases it may not be correct to require the fds to match in order for two files to be the same; OTOH, the heuristics might be unmanageable, and it may help to see the separate listings for distinct fds anyway.)
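
For illustration, the filtering and grouping could look roughly like the sketch below. This is not apfds.py itself; the function name summarize and the choice of key are assumptions made for the example. It takes lines in the default lsof -g layout (COMMAND PID PGID USER FD TYPE DEVICE SIZE/OFF NODE NAME), treats two entries as the same file when fd, type, device and name all match, and records each pid only once, so repeated rows collapse on their own. Rows with blank columns would be misparsed by the simple split used here.

```python
from collections import defaultdict


def summarize(lsof_lines):
    """Group lsof -g output by open file, listing each pid once.

    Hypothetical helper: keys are (fd, type, device, name) tuples and
    values are sorted lists of the pids that hold that file open.
    """
    holders = defaultdict(set)
    for line in lsof_lines:
        # The last column (NAME) may contain spaces, so split at most 9 times.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[1].isdigit():
            continue  # header, warning line, or a row we cannot parse
        pid = int(fields[1])
        key = (fields[4], fields[5], fields[6], fields[9].rstrip())
        # A set absorbs the repeated rows that FreeBSD lsof prints with -g.
        holders[key].add(pid)
    return {key: sorted(pids) for key, pids in holders.items()}
```

Keeping the fd in the key gives the behavior described above: the same file open on different fds shows up as separate listings.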

Later

More FreeBSD fun: Given the Mac OS X issue, I reimplemented the lsof group selection to use -p pid1,pid2,pid3, with the list built internally via ps. But that doesn't work at all on FreeBSD. Instead, it lists files for only the last pid in the list and exits with status 1. So the latest version of the script uses -g pgid (along with code to filter out the over-reporting) on FreeBSD and -p pid1,pid2,pid3 elsewhere. (I dare not try it on Solaris today.)
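
The per-platform choice might be sketched like this. The function names are made up for the example, and the ps invocation mentioned in the docstring (ps -A -o pid= -o pgid=) is an assumption about how the pid list could be collected, not necessarily what the script runs.

```python
import sys


def pids_in_group(ps_output, pgid):
    """Return the pids (as strings) whose process group id is pgid.

    ps_output is assumed to be the text printed by
    'ps -A -o pid= -o pgid=': one "pid pgid" pair per line.
    """
    pids = []
    for line in ps_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == str(pgid):
            pids.append(parts[0])
    return pids


def lsof_selection_args(pgid, ps_output, platform=sys.platform):
    """Pick the lsof selection arguments for the current platform."""
    if platform.startswith("freebsd"):
        # -p with a list misbehaves here, so select by group and
        # filter out the repeated rows afterwards.
        return ["-g", str(pgid)]
    return ["-p", ",".join(pids_in_group(ps_output, pgid))]
```

A caller would want to check for an empty pid list before running lsof, since -p with nothing after it is not a useful selection.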

Later still...

I built lsof for Solaris 10 and didn't see any glitches with -pLIST or -gPGID. Also, I was able to create a fix for the FreeBSD glitches and send it to the lsof author.
