13

I have a Java process (Glassfish) that is leaking file descriptors. I know this because I get the helpful java.io.IOException: Too many open files exception. I can look in /proc/PID/fd and see all the open file descriptors. When I use lsof I get a very large number of entries like this:

java 18510 root 8811u sock 0,4 1576079 can't identify protocol
java 18510 root 8812u sock 0,4 1576111 can't identify protocol
java 18510 root 8813u sock 0,4 1576150 can't identify protocol

I see 12 new ones created per minute. What options can I use on lsof or what other tools are available to me to help track down socket file descriptors where the protocol can't be identified?

7ochem
cclark

3 Answers

7

To see the 20 processes that hold the most open file handles:

for pid in $(ps -e -o pid=); do echo "$(ls /proc/$pid/fd 2>/dev/null | wc -l) $pid $(cat /proc/$pid/cmdline 2>/dev/null)"; done | sort -n -r | head -n 20

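Each output line has the format: file handle count, PID, command line of the process. Arguments in /proc/PID/cmdline are separated by NUL bytes, so they appear run together with no spaces.

Example output: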

701 1216 /sbin/rsyslogd-n-c5
169 11835 postgres: spaceuser spaceschema [local] idle
164 13621 postgres: spaceuser spaceschema [local] idle
161 13622 postgres: spaceuser spaceschema [local] idle
161 13618 postgres: spaceuser spaceschema [local] idle
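
Once the leaking PID is known, the same /proc data can show which kind of descriptor is growing. The following is a minimal sketch, not part of the original answer: fd_summary is a hypothetical helper name, and it assumes a Linux /proc layout where each entry in /proc/PID/fd is a symlink such as socket:[1576079] or a file path.

```shell
# Summarize what a process's descriptors point at, grouped by type.
# Usage: fd_summary /proc/18510/fd   (18510 is the PID from the question)
fd_summary() {
    fd_dir=$1
    for fd in "$fd_dir"/*; do
        # Each entry is a symlink, e.g. "socket:[1576079]" or "/var/log/x".
        target=$(readlink "$fd") || continue
        case $target in
            socket:*)     echo socket ;;
            pipe:*)       echo pipe ;;
            anon_inode:*) echo anon_inode ;;
            *)            echo file ;;
        esac
    done | sort | uniq -c | sort -rn
}
```

Running it once a minute and comparing the counts should show whether the growth is in sockets, as the lsof output in the question suggests. The number inside socket:[...] is the socket's inode, which should match the number lsof prints before "can't identify protocol".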
4

Become familiar with the strace command. It monitors system calls. I recently used it to track down file descriptor leaks that were causing our snmpd daemon to crash repeatedly. It takes some getting used to, but it's a powerful tool.

You can attach strace to a running process with -p PID (don't forget the -f flag to follow child processes).
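
As a rough illustration of that approach (a sketch, not the commands used for the snmpd case): attach with -f, which also follows the threads of a multi-threaded process such as a JVM, restrict the trace to socket and close calls, and write the result to a log. The helper below, unclosed_sockets, is a hypothetical name; it assumes the log was written with -f and -o so that every line starts with a PID, and it ignores calls that strace splits into "unfinished" and "resumed" lines.

```shell
# Attach to the running JVM and log socket creation and close calls.
# 18510 is the PID from the question; stop with Ctrl-C after a few minutes.
#   strace -f -p 18510 -e trace=socket,close -o /tmp/fd-trace.log

# Print the socket() calls whose returned descriptor was never closed
# later in the same log. Usage: unclosed_sockets /tmp/fd-trace.log
unclosed_sockets() {
    awk '
        # Successful socket(): remember the line, keyed by the returned fd.
        / socket\(.*\) += [0-9]+$/ { open[$NF] = $0 }
        # Successful close(fd): forget that fd.
        / close\([0-9]+\) += 0$/ {
            fd = $0
            sub(/.*close\(/, "", fd)
            sub(/\).*/, "", fd)
            delete open[fd]
        }
        END { for (fd in open) print open[fd] }
    ' "$1"
}
```

The surviving lines show which thread ID created each leaked socket and with what arguments, which can then be matched against a Java thread dump. Sockets obtained through other calls (accept, for instance) would need those calls added to the -e trace= list and to the first pattern.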

1

What exactly are you trying to track down? The remote IP address(es) associated with the leaked FDs, the defective code, or something else?

As you've already identified that there is a leak, contacting the engineers responsible for this Java process seems like a reasonable next step.

An̲̳̳drew