Friday, December 26, 2008

Difference between close_wait and time_wait

If we take a snapshot of netstat (netstat -nP tcp on Solaris), the most common states we see are ESTABLISHED, TIME_WAIT and CLOSE_WAIT.
I will try to explain what they mean.
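To turn such a snapshot into numbers, a small script can tally the state column of the netstat output. This is a minimal Python sketch. It assumes the state is the last whitespace-separated field on each connection line, as in Solaris netstat -nP tcp output. Linux netstat -ant also prints the state last, with slightly different spellings such as FIN_WAIT2. The count_states name and the sample snapshot are made up for illustration.

```python
from collections import Counter

# State names as printed by netstat. Spellings differ slightly between
# Solaris (FIN_WAIT_2, SYN_RCVD) and Linux (FIN_WAIT2, SYN_RECV).
TCP_STATES = {
    "ESTABLISHED", "SYN_SENT", "SYN_RCVD", "SYN_RECV",
    "FIN_WAIT_1", "FIN_WAIT_2", "FIN_WAIT1", "FIN_WAIT2",
    "CLOSE_WAIT", "CLOSING", "LAST_ACK", "TIME_WAIT",
    "LISTEN", "CLOSED", "IDLE", "BOUND",
}

def count_states(netstat_text):
    """Count connections per TCP state, skipping header and separator lines."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumption: the state is the last column of each connection line.
        if fields and fields[-1] in TCP_STATES:
            counts[fields[-1]] += 1
    return counts

# Hypothetical snapshot, shaped like Solaris `netstat -nP tcp` output.
SAMPLE = """\
TCP: IPv4
   Local Address        Remote Address    Swind Send-Q Rwind Recv-Q    State
-------------------- -------------------- ----- ------ ----- ------ -----------
192.0.2.10.80        198.51.100.7.51234   64240      0 49640      0 ESTABLISHED
192.0.2.10.80        198.51.100.8.40112   64240      0 49640      0 TIME_WAIT
192.0.2.10.80        198.51.100.9.40113   64240      0 49640      0 TIME_WAIT
192.0.2.10.39001     192.0.2.20.1521      64240      0 49640      0 CLOSE_WAIT
"""

if __name__ == "__main__":
    for state, n in count_states(SAMPLE).most_common():
        print(f"{state:12} {n}")
```

On a live system the text could come from subprocess.run(["netstat", "-nP", "tcp"], capture_output=True, text=True).stdout instead of the hard-coded sample.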

Before we go into TIME_WAIT and CLOSE_WAIT, let's take a close look at the sequence of steps involved in closing a socket.
A socket connection is always between two peers (a browser and a webserver, a Java client and a webserver, a webserver and a DB server, one webserver and another, etc.).

Say there is a socket connection established between webserver1 and webserver2. Once the data transfer is done, this is the closing sequence (from the TCP sequence diagram):

Here I am assuming webserver1 initiates the close of the connection.
1) The socket on webserver1 sends a TCP segment with the FIN bit set (in the TCP header) and goes into the FIN_WAIT_1 state.

2) The socket on webserver2 receives the FIN, responds with an ACK to acknowledge it, and goes into the CLOSE_WAIT state.
It stays in CLOSE_WAIT until the application calls close() on this socket.

3) The socket on webserver1 receives the ACK and changes to FIN_WAIT_2.

4) Once the application calls close(), the socket on webserver2 sends its own FIN to the peer and changes its state to LAST_ACK.

5) The socket on webserver1 receives the FIN, sends back an ACK and enters the TIME_WAIT state.
At this point the socket implementation on webserver1 starts a timer to handle the scenario where the last ACK is lost and webserver2 resends its FIN.
The socket waits for 2 * MSL (Maximum Segment Lifetime). The default wait is 4 minutes on Solaris and Windows, which corresponds to an MSL of 2 minutes.

6) The socket on webserver2 receives the ACK and moves to the CLOSED state.

7) Once the TIME_WAIT period has elapsed, the connection is closed on webserver1 as well.
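The seven steps can be replayed as a small state machine. This is a minimal Python sketch that models only the close path described in the steps (no lost segments, no simultaneous close). The event names such as app_close and timeout_2msl are hypothetical labels for this example, not part of any real TCP API.

```python
# (current_state, event) -> next_state, covering only the close path
# described in the post (assumption: no lost segments, no simultaneous close).
TRANSITIONS = {
    ("ESTABLISHED", "app_close"): "FIN_WAIT_1",  # send FIN
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",   # send ACK, wait for close()
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",
    ("CLOSE_WAIT", "app_close"): "LAST_ACK",     # send FIN
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",     # send ACK, start 2*MSL timer
    ("LAST_ACK", "recv_ack"): "CLOSED",
    ("TIME_WAIT", "timeout_2msl"): "CLOSED",
}

def step(state, event):
    """Return the next state, or raise if the event is not valid here."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state} on {event}") from None

# Steps 1-7 as (peer, event) pairs.
SEQUENCE = [
    ("webserver1", "app_close"),     # 1) FIN sent
    ("webserver2", "recv_fin"),      # 2) FIN received, ACK sent
    ("webserver1", "recv_ack"),      # 3) ACK received
    ("webserver2", "app_close"),     # 4) application calls close(), FIN sent
    ("webserver1", "recv_fin"),      # 5) FIN received, ACK sent
    ("webserver2", "recv_ack"),      # 6) final ACK received
    ("webserver1", "timeout_2msl"),  # 7) TIME_WAIT timer expires
]

def run():
    states = {"webserver1": "ESTABLISHED", "webserver2": "ESTABLISHED"}
    history = []
    for peer, event in SEQUENCE:
        states[peer] = step(states[peer], event)
        history.append((peer, event, states[peer]))
    return states, history

if __name__ == "__main__":
    final, history = run()
    for i, (peer, event, new_state) in enumerate(history, 1):
        print(f"{i}) {peer}: {event} -> {new_state}")
```

If webserver2's application never calls close(), its app_close event never happens and that peer simply stays in CLOSE_WAIT, which is the pile-up scenario described for CLOSE_WAIT.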

These multiple levels of acknowledgments and retransmissions are needed because TCP, unlike basic UDP, is a reliable protocol.
Here is what the three states mean:

ESTABLISHED:
This is pretty self-explanatory: the two ends are in a state where data transfer can occur, or is occurring, in both directions. (A TCP connection is full duplex, i.e. both ends can send and receive data on the same connection at the same time.)
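Full duplex can be seen on a single loopback connection. In this minimal Python sketch (Python 3.8+ for socket.create_server), both ends send before either one reads, and each still receives the other's message. The helper names recv_exactly and full_duplex_demo are made up for this example.

```python
import socket

def recv_exactly(sock, n):
    """Read exactly n bytes; a single recv() may return fewer."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf += chunk
    return buf

def full_duplex_demo():
    # Listener on an ephemeral loopback port (port 0 lets the OS pick one).
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        with socket.create_connection(("127.0.0.1", port)) as client:
            conn, _ = listener.accept()
            with conn:
                # Both ends write before either reads: each direction is an
                # independent byte stream with its own buffers.
                client.sendall(b"from client")
                conn.sendall(b"from server")
                at_client = recv_exactly(client, len(b"from server"))
                at_server = recv_exactly(conn, len(b"from client"))
                return at_client, at_server

if __name__ == "__main__":
    print(full_duplex_demo())
```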

CLOSE_WAIT:
This is the state where the socket is waiting for the application to execute close().
CLOSE_WAIT is not something that can be configured, whereas TIME_WAIT can be set through tcp_time_wait_interval. (The attribute tcp_close_wait_interval has nothing to do with the CLOSE_WAIT state; it was renamed to tcp_time_wait_interval starting from Solaris 7.)
A socket can stay in the CLOSE_WAIT state indefinitely, until the application closes it.
Typical faulty scenarios are a file descriptor leak or a server that never executes close() on the socket, leading to a pile-up of CLOSE_WAIT sockets. (At the Java level, this manifests as a "Too many open files" error.)
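Since CLOSE_WAIT only ends when the application calls close(), the fix for a pile-up is in the application: once a read reports end-of-stream, the code must close its own socket on every path, including error paths. Here is a minimal Python sketch on a loopback connection (Python 3.8+). The client plays webserver1 and closes first; the accepted socket plays webserver2. In Java the same discipline means calling close() in a finally block. The function name serve_one_request is hypothetical.

```python
import socket

def serve_one_request():
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        with socket.create_connection(("127.0.0.1", port)) as client:
            conn, _ = listener.accept()
            with conn:  # conn.close() runs even if the handling below raises
                client.sendall(b"last request")
                client.close()  # webserver1 closes first: its FIN goes out
                received = b""
                while True:
                    chunk = conn.recv(1024)
                    if not chunk:
                        # b"" means the peer's FIN arrived: conn is now in
                        # CLOSE_WAIT and stays there until close() is called.
                        break
                    received += chunk
            # Leaving the with-block called conn.close(), which sends our FIN
            # (CLOSE_WAIT -> LAST_ACK), so no CLOSE_WAIT socket is left behind.
    return received

if __name__ == "__main__":
    print(serve_one_request())
```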

TIME_WAIT:
This is just a time-based wait on the socket before the connection is closed down permanently.
Under most circumstances, sockets in TIME_WAIT are nothing to worry about.
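The TIME_WAIT figures come down to simple arithmetic. This rough Python sketch assumes the 2-minute MSL suggested by RFC 793 and a constant rate of actively closed connections. The function names and the rate of 50 closes per second are made up for illustration. Solaris expresses tcp_time_wait_interval in milliseconds, hence the conversion.

```python
# Assumption: MSL of 2 minutes, as suggested by RFC 793. Real stacks differ.
MSL_SECONDS = 120

def time_wait_seconds(msl_seconds=MSL_SECONDS):
    """TIME_WAIT lasts twice the Maximum Segment Lifetime."""
    return 2 * msl_seconds

def as_tcp_time_wait_interval(seconds):
    """Solaris tcp_time_wait_interval takes milliseconds."""
    return int(seconds * 1000)

def steady_state_time_wait(closes_per_second, wait_seconds):
    """Rough number of sockets sitting in TIME_WAIT when connections are
    actively closed at a constant rate: rate x duration (Little's law)."""
    return closes_per_second * wait_seconds

if __name__ == "__main__":
    tw = time_wait_seconds()
    print(tw)                              # 240
    print(as_tcp_time_wait_interval(tw))   # 240000
    print(steady_state_time_wait(50, tw))  # 12000 (hypothetical 50 closes/s)
```

Even thousands of TIME_WAIT entries are usually harmless. The application has already called close(), so they hold no application file descriptors. The arithmetic does explain why a busy server can show so many of them.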

7 comments:

Aditya Tripathi said...

nicely explained.

Eranada Banadara said...

Thanks, perfectly addressed my issue

Anonymous said...

Very good information; presented in a very easy to understand way.
Good work.
- RJoshi

Micha said...

In theory that is all correct; however, there are also cases where sockets stay in the CLOSE_WAIT state even though the application called close() on the socket.
I am not sure if those cases are due to JDK bugs. It seems that the CLOSE_WAIT state is usually left only when the application has called close() AND the garbage collector has freed the resources.
I have seen the same application behave perfectly correctly on one system but pile up CLOSE_WAITs on another.

Ankur Agarwal said...

Hi Micha,
Yes, there could be cases where sockets stay in the CLOSE_WAIT state even though an application called close() on the socket. As explained in the TCP sequence flow, the socket on webserver2 closes the connection once the application calls close(). So there could be a scenario where the application called close() but the web server never executed close() on the socket, or a file descriptor was leaked.
Please correct me if my understanding is wrong.

-Ankur Agarwal

Vinay D said...

very helpful... thanks a lot for the information

Anonymous said...

Hi,

How do we resolve this issue?
How can I resolve the close_wait issue and prevent this from happening again?

Kind regards
Maphefo