Monday, December 1, 2008

What to look for in java thread dumps

A java thread dump is snapshot of JVM at that point in time.
More specifically it’s a snapshot of all the java threads at that point in time.
A typical thread dump comprises of JVM spawned and application spawned threads. Some of the JVM spawned threads are (Sun JVM1.5.x)
Low Memory Detector (Daemon thread to detect and report low memory conditions)
CompilerThreadn (Daemon thread/s to compile & optimize byte code)
Signal Dispatcher (Daemon thread to respond OS signals)
Finalizer (Daemon thread to handle objects in finalizer queue)
Reference Handler (Daemon thread to handle objects in reference queue)
VM Thread (Main thread from JVM)
GC task thread#n (Thread/s to perform garbage collection)

Application threads are spawned from application (can be a combination of application server/web server and java application deployed).
A typical thread dump snapshot would have most of the threads in one of the following states:
R - Runnable ; MW - monitor wait; CW - conditional wait.
(Terminology might differ depending on JVM implementation)
CW - conditional wait:
A thread in conditional wait would mean its waiting on a certain condition to be fullfilled. Typical Scenarios would include:
i) An idle thread waiting for work to be assigned.
ii) Thread waiting to get an EJB instance from pool
iii) Thread waiting to get a connection from connection pool.
Eg1:
"ExecuteThread: '16' for queue: 'weblogic.kernel.Default'" daemon prio=5 tid=0x010dc720 nid=0x29 in Object.wait() [a1a7f000..a1a7fc30]
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:429)
at weblogic.kernel.ExecuteThread.waitForRequest(ExecuteThread.java:154) - locked <0xb1593fa0> (a weblogic.kernel.ExecuteThread)
at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:174)

The above thread is an idle thread in weblogic server and is in conditional wait within ExecuteThread.waitForRequest() i.e waiting on a condition to fullfill before it starts executing. Condition being, socket reader thread would accept the request and delegate the call to notify() on this waiting thread which then resumes the request execution.

Eg2:
http-8080-Processor25" daemon prio=1 tid=0x09298688 nid=0x2781 in Object.wait() [0xa82f8000..0xa82f8fc0]
at java.lang.Object.wait(Native Method)- waiting on <0xab436a68> (a org.apache.commons.pool.impl.GenericObjectPool)

at java.lang.Object.wait(Object.java:474)

at org.apache.commons.pool.impl.GenericObjectPool.bor rowObject(GenericObjectPool.java:810)- locked <0xab436a68> (a org.apache.commons.pool.impl.GenericObjectPool)

at org.apache.commons.dbcp.PoolingDataSource.getConne ction(PoolingDataSource.java:95)

at org.apache.commons.dbcp.BasicDataSource.getConnect ion(BasicDataSource.java:540)

at org.springframework.orm.hibernate3.LocalDataSource ConnectionProvider.getConnection(LocalDataSourceCo nnectionProvider.java:81)
at org.hibernate.jdbc.ConnectionManager.openConnectio n(ConnectionManager.java:417)

at org.hibernate.jdbc.ConnectionManager.getConnection (ConnectionManager.java:144

This is from tomcat/hibernate framework.
The thread is waiting on pool object, but looks like all the pool objects are in use. When one of the executing threads is done using the poolobject, it issues a notify() on this thread so that it can resume and pick the poolobject (JDBC connection) Any time a thread goes to wait(), it releases the current lock.


MW - Waiting for Monitor entry
MW is waiting for monitor entry i.e the thread is blocked and waiting to enter a synchronized block/method to get a java monitor. From the thread dump we can determine the monitor that is blocking the thread. If too many threads are in MW state across multiple snapshots, we know there is a contention for the lock.

Eg:
TP-Processor633" daemon prio=5 tid=0x004921d0 nid=0x5ba waiting for monitor entry [872fe000..872ffc28]
at oracle.jdbc.pool.OracleConnectionCacheImpl.closeSingleConnection(OracleConnectionCacheImpl.java:677) - waiting to lock <0xbf4e98f8> (a oracle.jdbc.pool.OracleConnectionCacheImpl) at oracle.jdbc.pool.OracleConnectionCacheImpl.reusePo oledConnection(OracleConnectionCacheImpl.java:592)
at oracle.jdbc.pool.OracleConnectionEventListener.connectionClosed(OracleConnectionEventListener.java:130) - locked <0xbf4e9e60> (a oracle.jdbc.pool.OracleConnectionEventListener)
at oracle.jdbc.pool.OraclePooledConnection.callListener(OraclePooledConnection.java:482)
at oracle.jdbc.pool.OraclePooledConnection.logicalClose(OraclePooledConnection.java:445) - locked <0xcc9af820> (a oracle.jdbc.pool.OraclePooledConnection)
at oracle.jdbc.driver.OracleConnection.logicalClose(OracleConnection.java:2936) - locked <0xcc9af8c8> (a oracle.jdbc.driver.OracleConnection)
at oracle.jdbc.driver.OracleConnection.close(OracleConnection.java:1454) - locked <0xcc9af8c8> (a oracle.jdbc.driver.OracleConnection)
..........
In the example above, the thread is waiting to lock "0xbf4e98f8" (object ID for "oracle.jdbc.pool.OracleConnectionCacheImpl") Note this thread is already holding locks for "0xbf4e9e60"(OracleConnectionEventListener),
"0xcc9af820" (OraclePooledConnection) and "0xcc9af8c8" (OracleConnection)
Having several threads in same state (waiting for same object), indicates there is a contention for the object.
In the current example, thread dump would reveal the thread holding the lock (Look for the thread with entry: - locked <0xbf4e98f8>)

R- Runnable
This state would mean thread is executing/utilizing the CPU or OS ready state for utilizing the CPU.

Eg:
"[ACTIVE] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x005af448 nid=0x20 runnable [0x6847d000..0x6847f8f0]
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at oracle.net.ns.Packet.receive(Unknown Source)
at oracle.net.ns.DataPacket.receive(Unknown Source)
........................

at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3316)
at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3361) - locked <0xe225f140> (a oracle.jdbc.driver.T4CPreparedStatement) - locked <0xe221b390> (a oracle.jdbc.driver.T4CConnection)
at weblogic.jdbc.wrapper.PreparedStatement.executeQuery(PreparedStatement.java:100)
at com.determine.log.FQLogListImpl.collect(FQLogListImpl.java:80)
at com.determine.persistence.DBFacade.collect(DBFacade.java:75)
......
The above thread is connected to DB (opened socket on server and reading response from DB server). Now if the thread is in same state across multiple thread dumps, we know either the query is taking long time to execute or there is some issue with DB server that’s causing the DB to respond slowly.

There are good tools that parse and provide graphical representation of threads dumps like
Samurai , Thread Dump Analyzer.

4 comments:

Anonymous said...

Thanks for the post. Wondering if you could blog on how to use the Samurai tool and what things to look for. There isn't much info on the Samurai site besides look for RED. Thank you!

vijay poluri said...

yes the tools only indicate a likely thread contention.
I will try to post a continuation to this providing more specfics on analyzing thread dumps

Anonymous said...

Nice explanation, thanks!

Anonymous said...

Nice post...