Without any testing beyond seeing that it compiles, the following code snippet should allow embedding periodic thread time usage analysis into an application.

[code lang="java"]
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Thread time usage analyzer
 */
final class Analyzer extends Thread {
    /**
     * Accuracy of measurement; coarse timing is enough, only the large scale trend is interesting
     */
    public static final int PERIOD = 5 * 1000;

    /**
     * Statistics for one thread
     */
    static final class Statistics {
        /**
         * Map of (class.method, period-ticks)
         */
        private final Map<String, AtomicInteger> mTicks = new HashMap<String, AtomicInteger>();

        /**
         * Update stats for thread
         */
        void update(StackTraceElement[] pStack) {
            if (pStack.length == 0) {
                // some threads (native or idle) may report an empty stack
                return;
            }
            StackTraceElement top = pStack[0];

            AtomicInteger ticks;
            {
                String key = top.getClassName() + "." + top.getMethodName();
                ticks = mTicks.get(key);
                if (ticks == null) {
                    ticks = new AtomicInteger();
                    mTicks.put(key, ticks);
                }
            }
            ticks.incrementAndGet();
        }

        /**
         * Dump stats for thread
         *
         * TODO KI sort keys
         * TODO KI calc percentages
         * TODO KI dump only top 10 methods
         */
        void dump(StringBuilder sb) {
            for (String key : mTicks.keySet()) {
                AtomicInteger ticks = mTicks.get(key);
                long secs = (PERIOD * ticks.intValue()) / 1000;
                sb.append(key);
                sb.append(' ');
                sb.append(secs);
                sb.append('s');
                sb.append('\n');
            }
        }
    }

    /**
     * Map of (Thread, Statistics-for-thread)
     */
    private final Map<Thread, Statistics> mStats = new IdentityHashMap<Thread, Statistics>();
    private long mLastDump = -1;

    public void run() {
        boolean alive = true;
        while (alive) {
            try {
                Thread.sleep(PERIOD);

                update();
                if (dumpIfNeeded()) {
                    clean();
                }
            } catch (InterruptedException e) {
                // immediate exit
                alive = false;
            }
        }
        mStats.clear();
    }

    /**
     * Update stats
     */
    void update() {
        Map<Thread, StackTraceElement[]> stacks = Thread.getAllStackTraces();
        for (Thread thread : stacks.keySet()) {
            Statistics stats = mStats.get(thread);
            if (stats == null) {
                stats = new Statistics();
                mStats.put(thread, stats);
            }
            stats.update(stacks.get(thread));
        }
    }

    /**
     * Clean away dead threads
     */
    void clean() {
        Iterator<Thread> aliveIter = mStats.keySet().iterator();
        while (aliveIter.hasNext()) {
            Thread thread = aliveIter.next();
            if (!thread.isAlive()) {
                aliveIter.remove();
            }
        }
    }

    /**
     * @return true if dump was done
     */
    boolean dumpIfNeeded() {
        long now = System.currentTimeMillis();
        if (now - mLastDump > 60 * 1000) {
            dump();
            mLastDump = now;
        }
        return now == mLastDump;
    }

    /**
     * Dump stats for all threads
     *
     * TODO KI define output medium; stdout is bad
     */
    void dump() {
        StringBuilder sb = new StringBuilder(100000);
        for (Thread thread : mStats.keySet()) {
            Statistics stats = mStats.get(thread);

            sb.append(thread.getName());
            sb.append(':');
            sb.append('\n');
            stats.dump(sb);
            sb.append('\n');
        }

        System.out.println(sb.toString());
    }
}
[/code]

Enjoy…

PS. Notice the good pattern of avoiding long methods and encapsulating logic into small, maintainable chunks (sure, I've seen far too many times how all of the logic ends up embedded in a single thread main loop, which makes the code 100% unmaintainable). One invisible benefit of small methods is that they keep the visibility scope of variables smaller, which effectively means three things: (a) escape analysis can work more effectively, (b) variables are released for gc() early, and (c) the JVM aggressively inlines short methods (Inlining). Notice also the collections-API-style documentation convention in the comments, and the fact that documenting non-public methods/classes is important.
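
To make the point concrete, here is a contrived sketch (the names Worker, pollWork, handle and pause are purely illustrative, not from the Analyzer code): the per-iteration work lives in small methods instead of the main loop, so temporaries stay method-local and the JIT gets small, inlinable targets.

[code lang="java"]
/**
 * Sketch only: the main loop stays tiny; each step is a small method,
 * so short-lived objects stay local and the JIT can inline the calls.
 */
final class Worker extends Thread {

    public void run() {
        while (!isInterrupted()) {
            String item = pollWork();
            if (item != null) {
                handle(item);   // temporaries inside handle() die per call
            } else {
                pause();        // back off when there is nothing to do
            }
        }
    }

    private String pollWork() {
        // fetch the next work item, or null if none is available
        return null;
    }

    private void handle(String pItem) {
        // short-lived builder: its scope ends with the method, helping escape analysis and early gc()
        StringBuilder sb = new StringBuilder(pItem);
        sb.append(" handled");
        System.out.println(sb);
    }

    private void pause() {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            interrupt();        // restore the flag so the loop exits
        }
    }
}
[/code]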

Update 1:
Added working sample code:
Analyzer.java
Profiled.java

Download the files and run this in a shell:
[code]
javac *.java
java Profiled
[/code]
After 20 seconds, the statistics are dumped.

Output from the test program looks something like this (every method has its total time, with own time in parentheses):
[code]
Finalizer:
java.lang.Object.wait 240s (240s)
java.lang.ref.Finalizer$FinalizerThread.run 240s (0s)
java.lang.ref.ReferenceQueue.remove 480s (0s)

Reference Handler:
java.lang.Object.wait 480s (240s)
java.lang.ref.Reference$ReferenceHandler.run 240s (0s)

Signal Dispatcher:

main:
Profiled.main 240s (0s)
java.lang.Thread.sleep 240s (240s)

test-0:
Profiled$1.bar 9s (3s)
Profiled$1.foo 231s (0s)
Profiled$1.run 240s (0s)
java.lang.AbstractStringBuilder.<init> 1s (1s)
java.lang.AbstractStringBuilder.append 3s (0s)
java.lang.Integer.getChars 3s (3s)
java.lang.StringBuilder.<init> 1s (0s)
java.lang.StringBuilder.append 3s (0s)
java.lang.Thread.sleep 231s (231s)
java.util.HashMap.put 2s (2s)

test-1:
Profiled$1.bar 10s (2s)
Profiled$1.foo 230s (0s)
Profiled$1.run 240s (0s)
java.lang.AbstractStringBuilder.append 6s (0s)
java.lang.Integer.getChars 6s (6s)
java.lang.StringBuilder.append 6s (0s)
java.lang.Thread.sleep 230s (230s)
java.util.HashMap.put 2s (2s)

test-2:
Profiled$1.bar 9s (2s)
Profiled$1.foo 231s (0s)
Profiled$1.run 240s (0s)
java.lang.AbstractStringBuilder.append 2s (0s)
java.lang.Integer.getChars 2s (2s)
java.lang.StringBuilder.append 2s (0s)
java.lang.Thread.sleep 231s (231s)
java.util.HashMap.put 5s (5s)

test-3:
Profiled$1.bar 7s (1s)
Profiled$1.foo 233s (0s)
Profiled$1.run 240s (0s)
java.lang.AbstractStringBuilder.append 4s (0s)
java.lang.Integer.getChars 4s (4s)
java.lang.StringBuilder.append 4s (0s)
java.lang.Thread.sleep 233s (233s)
java.util.HashMap.put 2s (2s)
[/code]

It's easy to see that this simplistic approach isn't yet producing good enough results. A simple map for collecting statistics loses important information about the call hierarchy, which in real life is required.

For example, it's not possible to tell from this output why foo() dominates time usage over bar() (i.e. foo() appears to be the hotspot).

However, since the method stats I changed already keep two separately counted values (own time vs. time spent in called methods), some ideas for managing the call hierarchy came up. The trick, I guess, is how to maintain the data structure without consuming too much memory and CPU just for storing the info (since that would make this profiling useless in any real application).
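
A minimal sketch of one way that bookkeeping could look (CallerStats and its helpers are names I made up for illustration, not the actual v3 code): on every sample, walk the whole stack once, count ticks per method, and remember under which immediate caller each method was observed.

[code lang="java"]
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sketch only: per-method tick counts plus per-caller tick counts,
 * built from one sampled stack trace. Not the actual v3 implementation.
 */
final class CallerStats {

    /** Map of (class.method, total ticks; counting only pStack[0] would give "own" ticks) */
    private final Map<String, AtomicInteger> mTicks = new HashMap<String, AtomicInteger>();

    /** Map of (class.method, Map of (caller class.method, ticks observed under that caller)) */
    private final Map<String, Map<String, AtomicInteger>> mCallers =
            new HashMap<String, Map<String, AtomicInteger>>();

    void update(StackTraceElement[] pStack) {
        for (int i = 0; i < pStack.length; i++) {
            String method = key(pStack[i]);
            inc(mTicks, method);

            Map<String, AtomicInteger> callers = mCallers.get(method);
            if (callers == null) {
                callers = new HashMap<String, AtomicInteger>();
                mCallers.put(method, callers);
            }
            if (i + 1 < pStack.length) {
                // the immediate caller is the next frame down the stack
                inc(callers, key(pStack[i + 1]));
            }
        }
    }

    private static String key(StackTraceElement pElem) {
        return pElem.getClassName() + "." + pElem.getMethodName();
    }

    private static void inc(Map<String, AtomicInteger> pMap, String pKey) {
        AtomicInteger ticks = pMap.get(pKey);
        if (ticks == null) {
            ticks = new AtomicInteger();
            pMap.put(pKey, ticks);
        }
        ticks.incrementAndGet();
    }
}
[/code]

Even this much doubles the map bookkeeping per sample, which is exactly the memory/CPU trade-off mentioned above.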

On the other hand, it could be sensible to leave such advanced profiling for actual profilers.

Update 2:
The temptation was too high, so here is v3 of the Analyzer.

New Analyzer version:
Analyzer.java (v3)
Profiled.java (v3)

Output looks something like this now:
[code]
Reference Handler:
java.lang.Object.wait 80s (40s) [java.lang.Object.wait:40, java.lang.ref.Reference$ReferenceHandler.run:40]
java.lang.ref.Reference$ReferenceHandler.run 40s (0s) []

Signal Dispatcher:

main:
java.lang.Thread.sleep 40s (40s) [Profiled.main:40]
Profiled.main 40s (0s) []

test-0:
java.lang.Thread.sleep 37s (37s) [Profiled$1.foo:37]
java.util.HashMap.put 1s (1s) [Profiled$1.bar:1]
java.util.Arrays.copyOfRange 1s (1s) [java.lang.String.<init>:1]
java.lang.Integer.getChars 1s (1s) [java.lang.AbstractStringBuilder.append:1]
Profiled$1.run 40s (0s) []
Profiled$1.foo 37s (0s) [Profiled$1.run:37]
Profiled$1.bar 3s (0s) [Profiled$1.run:3]
java.lang.StringBuilder.toString 1s (0s) [Profiled$1.bar:1]
java.lang.StringBuilder.append 1s (0s) [Profiled$1.bar:1]
java.lang.String.<init> 1s (0s) [java.lang.StringBuilder.toString:1]
java.lang.AbstractStringBuilder.append 1s (0s) [java.lang.StringBuilder.append:1]

test-1:
java.lang.Thread.sleep 38s (38s) [Profiled$1.foo:38]
java.util.HashMap.put 1s (1s) [Profiled$1.bar:1]
java.lang.Integer.getChars 1s (1s) [java.lang.AbstractStringBuilder.append:1]
Profiled$1.run 40s (0s) []
Profiled$1.foo 38s (0s) [Profiled$1.run:38]
Profiled$1.bar 2s (0s) [Profiled$1.run:2]
java.lang.StringBuilder.append 1s (0s) [Profiled$1.bar:1]
java.lang.AbstractStringBuilder.append 1s (0s) [java.lang.StringBuilder.append:1]

test-2:
java.lang.Thread.sleep 37s (37s) [Profiled$1.foo:37]
Profiled$1.bar 3s (1s) [Profiled$1.run:3]
java.lang.Integer.getChars 1s (1s) [java.lang.AbstractStringBuilder.append:1]
java.lang.AbstractStringBuilder.<init> 1s (1s) [java.lang.StringBuilder.<init>:1]
Profiled$1.run 40s (0s) []
Profiled$1.foo 37s (0s) [Profiled$1.run:37]
java.lang.StringBuilder.append 1s (0s) [Profiled$1.bar:1]
java.lang.StringBuilder.<init> 1s (0s) [Profiled$1.bar:1]
java.lang.AbstractStringBuilder.append 1s (0s) [java.lang.StringBuilder.append:1]

test-3:
java.lang.Thread.sleep 38s (38s) [Profiled$1.foo:38]
java.util.Arrays.copyOfRange 1s (1s) [java.lang.String.<init>:1]
java.lang.Integer.getChars 1s (1s) [java.lang.AbstractStringBuilder.append:1]
Profiled$1.run 40s (0s) []
Profiled$1.foo 38s (0s) [Profiled$1.run:38]
Profiled$1.bar 2s (0s) [Profiled$1.run:2]
java.lang.StringBuilder.toString 1s (0s) [Profiled$1.bar:1]
java.lang.StringBuilder.append 1s (0s) [Profiled$1.bar:1]
java.lang.String.<init> 1s (0s) [java.lang.StringBuilder.toString:1]
java.lang.AbstractStringBuilder.append 1s (0s) [java.lang.StringBuilder.append:1]
[/code]

For every method, the list of its parents is dumped, indicating how many times each parent has invoked the method. From this output it is rather trivial to see why foo() is the dominator: it's due to the Thread.sleep() it is calling, which means that foo() isn't really consuming CPU. So bar() is the actual hotspot, since foo() really is just idling.

However, the results are still rather flawed, since the statistics collection frequency is rather low.

Update 3: 3.6.2011
After profiling this analyzer, I can state that it's not usable for any real application, only for very small test programs. YourKit results show very quickly that the actual CPU usage of this Analyzer isn't too bad, but memory usage is rather bad already with a very simple test program.

Update 4: 3.6.2011
New Analyzer version, with a few optimizations and notes:
Analyzer.java
Profiled.java
trove.jar

Output looks something like this now:
[code]
DUMP Fri Jun 03 22:08:54 EEST 2011 - 31101 ms:

main:
Profiled.main 240000 ms
java.lang.Thread.sleep 240000 ms (240000 ms)

test-0:
Profiled$1.run 240000 ms
Profiled$1.bar 11610 ms
java.lang.Thread.sleep 11610 ms (11610 ms)
Profiled$1.foo 228390 ms
java.lang.Thread.sleep 228390 ms (228390 ms)

[/code]
Notice how the period has drifted: it was supposed to be exactly 30 s, but it has drifted to 31.101 s.

An important observation is that the attempt to time the sampling period via Thread.sleep() starts to fail miserably when CPU usage grows too high (i.e. lots of threads); the polling period then drifts heavily, making the results completely incorrect. The underlying fact, though, is that a strictly exact polling cycle is not achievable at all. So the next attempt would be to enhance the logic with some approximation: collect times as ticks up to a certain tick count, then check how much wall-clock time was actually consumed, average the ticks over that period, and save them as millisecond values. Then repeat the cycle.
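
A rough sketch of that approximation (the class, the method names and the constants are made up for illustration; this is not the posted Analyzer code): count ticks for a batch of samples, then measure the actual elapsed wall-clock time and convert the batch's ticks to milliseconds using it.

[code lang="java"]
/**
 * Sketch only: instead of trusting Thread.sleep(PERIOD) to be exact,
 * measure how long a batch of samples actually took and scale the ticks.
 */
final class DriftAwareSampler {

    private static final int PERIOD = 50;  // nominal sampling interval, ms
    private static final int BATCH = 100;  // re-sync against the wall clock every N samples

    void loop() throws InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            long batchStart = System.currentTimeMillis();
            for (int i = 0; i < BATCH; i++) {
                Thread.sleep(PERIOD);
                sample();                                  // one tick per sampled method
            }
            long elapsed = System.currentTimeMillis() - batchStart;
            double msPerTick = (double) elapsed / BATCH;   // what one tick really cost
            commit(msPerTick);                             // fold the batch's ticks into ms totals
        }
    }

    void sample() {
        // take Thread.getAllStackTraces() and increment per-method tick counters
    }

    void commit(double pMsPerTick) {
        // multiply the accumulated ticks by pMsPerTick and add them to the ms statistics
    }
}
[/code]

The longer the batch, the better the averaging, but the slower the statistics react to sudden load changes.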

The good news, however, is that both CPU and memory usage look reasonable. Results may differ with a real, large application that actually has long call chains (and thousands of methods invoked in a single thread).

Update 5: 3.6.2011
An attempt to control the drifting of the results:
Analyzer.java

Tip: It seems that the YourKit 10 EA profiler is available.

Update 6: 5.6.2011
Following that tip: for profiling real applications, I would suggest embedding the libyjpagent.so YourKit profiling agent (and the redistributable YourKit profiling API, yjp-controller-api-redist.jar) into the application and using it to gather profiling statistics. The results will be more accurate, and the YourKit profiler has nice tools for analyzing them.

Of course, the point of this exercise was more about learning what can be done, and custom profiling of one's own can still have some use (well, it's almost better than the JDK's JVisualVM, which seems to be 100% useless even with trivial programs, and freezes completely with large applications). The use case that comes to my mind is custom watchdog logic, which follows thread time usage and notifies if some thread appears to be stuck.

For example, a watchdog could monitor how long the EDT thread (very important in Swing programming) has been stuck, or how much time is spent in remote RMI calls or database access, etc. For such watchdog logic, the monitoring accuracy can be lowered into the range of 50 ms…100 ms, since high precision is not needed for detecting stuck threads; the interesting times are in the range of seconds (not milliseconds).
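
A minimal sketch of such a watchdog for the Swing EDT (the class name, constants and output are illustrative only; a real version would use a proper output medium and could cover RMI/database threads the same way): post a heartbeat onto the EDT and complain if it hasn't been processed within the allowed window.

[code lang="java"]
import java.util.concurrent.atomic.AtomicLong;
import javax.swing.SwingUtilities;

/**
 * Sketch only: a watchdog that notices when the Swing EDT appears stuck.
 * Coarse ~100 ms checking is enough, since interesting stalls are seconds long.
 */
final class EdtWatchdog extends Thread {

    private static final long CHECK_PERIOD = 100;      // ms between checks
    private static final long STUCK_LIMIT = 2 * 1000;  // report if the EDT is silent this long

    private final AtomicLong mLastBeat = new AtomicLong(System.currentTimeMillis());

    public void run() {
        while (!isInterrupted()) {
            // heartbeat: executes almost immediately unless the EDT is blocked
            SwingUtilities.invokeLater(new Runnable() {
                public void run() {
                    mLastBeat.set(System.currentTimeMillis());
                }
            });
            try {
                Thread.sleep(CHECK_PERIOD);
            } catch (InterruptedException e) {
                return;
            }
            long silentFor = System.currentTimeMillis() - mLastBeat.get();
            if (silentFor > STUCK_LIMIT) {
                System.err.println("EDT stuck for " + silentFor + " ms"); // TODO define output medium
            }
        }
    }
}
[/code]

The same pattern would work for RMI or database threads by having the monitored code bump its own heartbeat at suitable points.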

References:
Got the idea from Statistical profiling in Java: easy, fast, and useful
Java HotSpot VM Options: -XX:MaxInlineSize=35
