Friday, December 26, 2008
Difference between close_wait and time_wait
I will try explaining what they mean.
Before we go into TIME_WAIT and   CLOSE_WAIT, lets take close look at sequence of steps for socket closing.
Socket connection is essentially between two peers (Browser to webserver, a java client to webserver, webserver to DB server, a webserver to another webserver etc )
Say there is a socket connection established between webserver1 and webserver2. This would be the closing sequence, once the data transfer is done: (From TCP sequence diagram)
Here I am assuming webserver1 initiates the close of connection.
1)   Socket on webserver1 sends a TCP segment with FIN bit (in TCP header) and the socket goes into FIN_WAIT_1 state.
2)  Socket on webserver2 receives the FIN and responds back with ACK to acknowledge the FIN and the socket goes to CLOSE_WAIT state.
Now until the application calls the close() on this socket this is going to be in CLOSE_WAIT state.
3)  Socket on webserver1 receives the ACK and changes to FIN_WAIT_2
4)  Socket on webserver2 closes the connection(once the application calls close()) and sends back FIN to its peer to close the connection and changes its state to Last Ack
5)  Socket on webserver1 receives the FIN and sends back ACK.
At this point the socket implementation on webserver1 would start a timer (TIME_WAIT) to handle the scenario where last ACK has been lost and server resends FIN.
Now the socket would wait for 2* MSL (Maximum segment lifetime- default is 4mins for solaris & windows)
6)  Socket on webserver2 receives the ACK and it moves the connection to closed state
7)  After TIME_WAIT is elapsed socket/connection will be closed on webserver1.
These multiple levels of acknowledgments & retransmits are needed since TCP is a reliable protocol unlike basic UDP
Here is what the three states mean:
ESTABLISHED:
This is pretty explanatory which basically means the two ends are in a state where data transfer can occur or occurring in both directions. (tcp socket is full duplex, i.e data can be received and responded to on same channel)
CLOSE_WAIT:
This is a state where socket is waiting for the application to execute close()
CLOSE_WAIT is not something that can be configured where as TIME_WAIT can be set through tcp_time_wait_interval (The attribute tcp_close_wait_interval has nothing to do with close_wait state and this was renamed to tcp_time_wait_interval starting from Solaris 7)
A socket can be in CLOSE_WAIT state indefinitely until the application closes it.
Faulty scenarios would be like filedescriptor leak, server not being execute close() on socket leading to pile up of close_wait sockets. (At java level, this manifests as "Too many open files" error)
TIME_WAIT:
This is just a time based wait on socket before closing down the connection permanently.
Under most circumstances, sockets in TIME_WAIT is nothing to worry about.
References:
http://www.port80software.com/200ok/archive/2004/12/07/205.aspx
Monday, December 22, 2008
Managing keystores, keys and certificates
In a typical infrastructure, we deal with different entities like loadbalancers, webservers like Apache, java web/app servers like Tomcat, weblogic,Oracle app server.
Each of these support limited set of formats/extensions.
For eg:
Weblogic & Tomcat would only understand java keystores (jks), Oracle understands only PKCS12 format through its Wallet Manager (repository for certs analogous to java keystore).
We have command line utilities like java keytool to deal with java keystores, openSSL (commonly used with Apache servers).
We also have GUI tools to deal with cert repositories.
KeyTool IUI:
This is GUI for java keytool utility. The requirement is that you would need JDK1.6.0 and above.
This supports most of the prominent keystore formats like JKS (.jks,.ks), JCEKS (.jce), PKCS12(.p12,.pfx) and most certificate formats like DER(.crt,.cer) ,PKCS7(.p7b) ,PEM(.pem) ,PKCS10(.p10)
This tool pretty much does most of format conversions.
For eg:
We can convert keypair in PKCS12 to JKS and viceversa, sign a file with private key, verify a signed file etc
Portecle
Portecle is another user friendly GUI application for creating, managing and examining keystores, keys, certificates, certificate requests, certificate revocation lists and more.
Its pretty capable of performing similar set of tasks though somewhat lesser functionality to what KeyTool IUI provides.
Reference http://www.startux.de/index.php/java/44-dealing-with-java-keystores
Friday, December 19, 2008
Wednesday, December 17, 2008
Friday, December 12, 2008
Mapping java thread to pstack and pmap
Starting from solaris9, we have a one-one threading model.
Solaris library libthread.so schedules the solaris threads to run on LWPs.
LWPs are mapped 1-1 with kernel threads.
The kernel threads are the ones that are scheduled on processor's dispatch queue.
With no debugger handy, if we need comprehensive information related to a thread or set of threads, we would need:
1) java thread dumps(kill -3 pid)
2) pstack (pstack pid)
3) pmap (pmap
Here is an example on how we can related all the three outputs:
1) Java thread dump:
This is single thread stack trace taken from java thread dump:
The below thread has a java level thread identifier "0x00651988" that is passed as argument for the native thread call.
It also has native thread id as nid=0x2c6 that corresponds to a LWP.
"Thread-646" daemon prio=10 tid=0x00651988 nid=0x2c6 runnable [0x629ff000..0x629ff8f0]
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.net.SocketInputStream.read(SocketInputStream.java:182)
at com.novell.ldap.asn1.ASN1Identifier.
2) Pstack Output
The corresponding native thread for the above java thread is shown below:
(obtained from executing #pstack pid) nid=0x2c6, the corresponding decimal representation for 0x2c6 is 710, so we got to look for thread# 710 from pstack
----------------- lwp# 710 / thread# 710 --------------------
ff31f4a8 read (32, 629fec98, 1)
fead346c JVM_Read (32, 629fec98, 1, 8460, 651988, ff0148f8) + b8
6f89c1a4 Java_java_net_SocketInputStream_socketRead0 (651a44, 629ff500, 32, 629ff4f8, 0, 1) + 1fc
f960fe6c ???????? (d1265768, d12653b0, d1301f58, 0, 1, 0)
fa6946e4 ???????? (ffffffff, d1265308, 1, 1, 651988, ff150000)
f904e644 ???????? (d1265768, b6, 0, 8388, 80000000, 1)
f90058b8 ???????? (d1265768, b6, 0, f9015d28, 74f6f258, 629ff640)
f90058b8 ???????? (d1301f40, b7, 0, f9015ce0, d1379800, 629ff6c8)
f9005764 ???????? (d1265830, 7082c430, 0, f9015f40, 1, 629ff790)
f9005c2c ???????? (0, ff013928, 0, f9016230, 7ce8, 629ff808)
f9000218 ???????? (629ff8f0, 629ffaf0, a, 7082d668, f900a840, 629ffa08)
…………..
feadb2e4 __1cKJavaThreadDrun6M_v_ (651988, ff01882c, 6f7c, 0, 6c00, 8c00) + 2b0
fee71a00 __1cG_start6Fpv_0_ (651988, 732b, fefc0000, 0, 4ff4, 4c00) + 208
ff395370 _lwp_start (0, 0, 0, 0, 0, 0)
A pstack is native representation of the java thread stack trace. There are 6 arguments for each function that are loaded into the registers. This is Solaris standard register allocation model (there are 6 registers for parameter passing) In the above example we see 6 arguments as:
fee71a00 __1cG_start6Fpv_0_ (651988, 732b, fefc0000, 0, 4ff4, 4c00) + 208
fee71a00 is the program counter. (Virtual address space where the library is loaded)
651988 is the first arguments thats pushed to register0 and note this is the tid from java thread dump (tid=0x00651988)
3) pmap output (pmap pid)
We can trace the library loaded based on the native stack with help of pmap.
Refering the same example:
ff395370 _lwp_start (0, 0, 0, 0, 0, 0)
Check the PC value "ff395370" and see where it fits in pmap:
pmap output indicates as below: (We know ff395370 fits in this range)
FF380000 96K r-x-- /usr/lib/libthread.so.1
FF3A8000 8K rwx-- /usr/lib/libthread.so.1
This tells _lwp_start function is from libthread.so library
Similarly:
6f89c1a4 Java_java_net_SocketInputStream_socketRead0 (651a44, 629ff500, 32, 629ff4f8, 0, 1) + 1fc
Based on pmap output, "6f89c1a4" fits in:
6F890000 72K r-x-- /WebLogic/utils/bea922/jdk150_10/jre/lib/sparc/libnet.so
6F8B0000 8K rwx-- /WebLogic/utils/bea922/jdk150_10/jre/lib/sparc/libnet.so
This tells the Java_java_net_SocketInputStream_socketRead function was from libnet.so
Finally,
fead346c JVM_Read (32, 629fec98, 1, 8460, 651988, ff0148f8) + b8
Based on pmapt output "fead346c" fits in:
FE800000 7936K r-x-- /WebLogic/utils/bea922/jdk150_10/jre/lib/sparc/server/libjvm.so
FEFC0000 400K rwx-- /WebLogic/utils/bea922/jdk150_10/jre/lib/sparc/server/libjvm.so
This tells JVM_Read function is from libjvm.so
Sunday, December 7, 2008
How SSL works under covers
SSL essentially uses hybrid crypto system (combination of symmetric & asymmetric algorithms)
i) Exchange of "symmetric key/s" is done through asymmetric algorithms (client encrypts the "symmetric key" with public key and server decrypts with its private key)
ii) Exchange of data is done with by symmetric key algorithm.
The sequence of events in one-way SSL communication are:
1) Client sends a SSL request
2) Socket level TCP handshake occurs.
3) Client sends ClientHello message. This includes information from client on its supported SSLversion, Ciphers,message authentication function
This is message from SSL debug enabled weblogic server accepting SSL browser request:
Nov 25, 2008 10:07:15 AM CST Debug SecuritySSL AdminServer
[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)' HANDSHAKEMESSAGE: ClientHello
4) Server sends back ClientHello message and this includes:
The SSL or TLS version that will be used for the SSL session, cipher that will be used for the SSL session, session ID for the SSL session, server digital certificate (or chain of certificates)
At this point client and server agree on the ssl version,ciphers to be used, message authentication function to be used.
These are messages from SSL debug enabled weblogic server accepting/responding SSL browser request:
Nov 25, 2008 10:07:15 AM CST Debug SecuritySSL AdminServer [ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)' 14297843 SSL3/TLS MAC
Nov 25, 2008 10:07:15 AM CST Debug SecuritySSL AdminServer [ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)' Using JCE Cipher: SunJCE version 1.6 for algorithm RC4
Nov 25, 2008 10:07:15 AM CST Debug SecuritySSL AdminServer [ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)' Will use default Mac for algorithm SSL3MAC_MD5
5) Client validates the server certificate by checking the certificate authority and other details like
domain name and expiration date. All browsers come preloaded with the certificates of common CAs. For Eg, in Internet Explorer, these can be viewed from: Tools-> Internet Options --> Content Tab-->Certificates -->Trusted Root CAs
After the validation client generates the MASTER_KEY and encrypts it with server's
public key(available from certificate) and sends back to server. Server decrypts the MASTER_KEY with its private key.
6) Client as well as the server use the MASTER_KEY to generate the session keys, which are symmetric keys used to encrypt and decrypt information exchanged during the SSL session.
7) From this point both client and server encrypt data with their own session keys.
Monday, December 1, 2008
What to look for in java thread dumps
More specifically it’s a snapshot of all the java threads at that point in time.
A typical thread dump comprises of JVM spawned and application spawned threads. Some of the JVM spawned threads are (Sun JVM1.5.x)
Low Memory Detector (Daemon thread to detect and report low memory conditions)
CompilerThreadn (Daemon thread/s to compile & optimize byte code)
Signal Dispatcher (Daemon thread to respond OS signals)
Finalizer (Daemon thread to handle objects in finalizer queue)
Reference Handler (Daemon thread to handle objects in reference queue)
VM Thread (Main thread from JVM)
GC task thread#n (Thread/s to perform garbage collection)
Application threads are spawned from application (can be a combination of application server/web server and java application deployed).
A typical thread dump snapshot would have most of the threads in one of the following states:
R - Runnable ; MW - monitor wait; CW - conditional wait. (Terminology might differ depending on JVM implementation)
CW - conditional wait:
A thread in conditional wait would mean its waiting on a certain condition to be fullfilled. Typical Scenarios would include:
i) An idle thread waiting for work to be assigned.
ii) Thread waiting to get an EJB instance from pool
iii) Thread waiting to get a connection from connection pool.
Eg1:
"ExecuteThread: '16' for queue: 'weblogic.kernel.Default'" daemon prio=5 tid=0x010dc720 nid=0x29 in Object.wait() [a1a7f000..a1a7fc30]
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:429)
at weblogic.kernel.ExecuteThread.waitForRequest(ExecuteThread.java:154) - locked <0xb1593fa0> (a weblogic.kernel.ExecuteThread)
at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:174)
The above thread is an idle thread in weblogic server and is in conditional wait within ExecuteThread.waitForRequest() i.e waiting on a condition to fullfill before it starts executing. Condition being, socket reader thread would accept the request and delegate the call to notify() on this waiting thread which then resumes the request execution.
Eg2:
http-8080-Processor25" daemon prio=1 tid=0x09298688 nid=0x2781 in Object.wait() [0xa82f8000..0xa82f8fc0]
at java.lang.Object.wait(Native Method)- waiting on <0xab436a68> (a org.apache.commons.pool.impl.GenericObjectPool)
at java.lang.Object.wait(Object.java:474)
at org.apache.commons.pool.impl.GenericObjectPool.bor rowObject(GenericObjectPool.java:810)- locked <0xab436a68> (a org.apache.commons.pool.impl.GenericObjectPool)
at org.apache.commons.dbcp.PoolingDataSource.getConne ction(PoolingDataSource.java:95)
at org.apache.commons.dbcp.BasicDataSource.getConnect ion(BasicDataSource.java:540)
at org.springframework.orm.hibernate3.LocalDataSource ConnectionProvider.getConnection(LocalDataSourceCo nnectionProvider.java:81)
at org.hibernate.jdbc.ConnectionManager.openConnectio n(ConnectionManager.java:417)
at org.hibernate.jdbc.ConnectionManager.getConnection (ConnectionManager.java:144
This is from tomcat/hibernate framework.
The thread is waiting on pool object, but looks like all the pool objects are in use. When one of the executing threads is done using the poolobject, it issues a notify() on this thread so that it can resume and pick the poolobject (JDBC connection) Any time a thread goes to wait(), it releases the current lock.
MW - Waiting for Monitor entry
MW is waiting for monitor entry i.e the thread is blocked and waiting to enter a synchronized block/method to get a java monitor. From the thread dump we can determine the monitor that is blocking the thread. If too many threads are in MW state across multiple snapshots, we know there is a contention for the lock.
Eg:
TP-Processor633" daemon prio=5 tid=0x004921d0 nid=0x5ba waiting for monitor entry [872fe000..872ffc28]
at oracle.jdbc.pool.OracleConnectionCacheImpl.closeSingleConnection(OracleConnectionCacheImpl.java:677) - waiting to lock <0xbf4e98f8> (a oracle.jdbc.pool.OracleConnectionCacheImpl) at oracle.jdbc.pool.OracleConnectionCacheImpl.reusePo oledConnection(OracleConnectionCacheImpl.java:592)
at oracle.jdbc.pool.OracleConnectionEventListener.connectionClosed(OracleConnectionEventListener.java:130) - locked <0xbf4e9e60> (a oracle.jdbc.pool.OracleConnectionEventListener)
at oracle.jdbc.pool.OraclePooledConnection.callListener(OraclePooledConnection.java:482)
at oracle.jdbc.pool.OraclePooledConnection.logicalClose(OraclePooledConnection.java:445) - locked <0xcc9af820> (a oracle.jdbc.pool.OraclePooledConnection)
at oracle.jdbc.driver.OracleConnection.logicalClose(OracleConnection.java:2936) - locked <0xcc9af8c8> (a oracle.jdbc.driver.OracleConnection)
at oracle.jdbc.driver.OracleConnection.close(OracleConnection.java:1454) - locked <0xcc9af8c8> (a oracle.jdbc.driver.OracleConnection)
..........
In the example above, the thread is waiting to lock "0xbf4e98f8" (object ID for "oracle.jdbc.pool.OracleConnectionCacheImpl") Note this thread is already holding locks for "0xbf4e9e60"(OracleConnectionEventListener),
"0xcc9af820" (OraclePooledConnection) and "0xcc9af8c8" (OracleConnection)
Having several threads in same state (waiting for same object), indicates there is a contention for the object.
In the current example, thread dump would reveal the thread holding the lock (Look for the thread with entry: - locked <0xbf4e98f8>)
R- Runnable
This state would mean thread is executing/utilizing the CPU or OS ready state for utilizing the CPU.
Eg:
"[ACTIVE] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x005af448 nid=0x20 runnable [0x6847d000..0x6847f8f0]
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at oracle.net.ns.Packet.receive(Unknown Source)
at oracle.net.ns.DataPacket.receive(Unknown Source)
........................
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3316)
at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3361) - locked <0xe225f140> (a oracle.jdbc.driver.T4CPreparedStatement) - locked <0xe221b390> (a oracle.jdbc.driver.T4CConnection)
at weblogic.jdbc.wrapper.PreparedStatement.executeQuery(PreparedStatement.java:100)
at com.determine.log.FQLogListImpl.collect(FQLogListImpl.java:80)
at com.determine.persistence.DBFacade.collect(DBFacade.java:75)
......
The above thread is connected to DB (opened socket on server and reading response from DB server). Now if the thread is in same state across multiple thread dumps, we know either the query is taking long time to execute or there is some issue with DB server that’s causing the DB to respond slowly.
There are good tools that parse and provide graphical representation of threads dumps like Samurai , Thread Dump Analyzer.