NetKernel News Volume 1 Issue 19

March 12th 2010

What's new this week?

  • Performance tuning yields 30+% speed up.
  • New scalability profiling tools and 8-core server comparative results.

With the release of the NKMark10 performance index last week, we've done a little tuning. Over the course of the week the updates listed below have yielded a 30+% performance boost, measured with NKMark10 on a range of systems. If the internal processing cost of the NKMark10 endpoints themselves is taken into account, the net system optimization is considerably larger than 30%.

Repository Updates

kernel: Fixed a potential NPE. Rationalized response metadata. Optimized data structures.

layer0: Removed a contention point on NKF async requests by replacing a synchronized block with a Java 5 CountDownLatch.
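To illustrate the kind of change described above, here is a minimal sketch of a one-shot response handle built on java.util.concurrent.CountDownLatch. The class name LatchedResponse and its methods are hypothetical and are not the NKF implementation; the sketch only shows how a latch lets a caller wait for an asynchronous result without sharing a monitor lock with the producer.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical one-shot async response handle (illustration only, not NKF code). */
final class LatchedResponse<T> {
    // A latch with a count of 1 opens exactly once, when the response arrives.
    private final CountDownLatch done = new CountDownLatch(1);
    private T value;

    /** Called once by the thread that produced the response. */
    void complete(T result) {
        value = result;      // write happens-before countDown()
        done.countDown();    // releases every waiting caller
    }

    /** Waits for the response, giving up after timeoutMillis. */
    T join(long timeoutMillis) {
        try {
            if (!done.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("timed out waiting for response");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return value;        // visible because await() returned after countDown()
    }
}
```

Once the latch has opened, later calls to join() return immediately without contending for a lock, which is the property that makes a latch attractive here.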

The NKF async request handle now avoids constructing new proxy classes each time it is used. This reduces unnecessary memory churn in permgen.

Grammar matching has been optimized to fail fast by first matching static strings at the start of an identifier. This is a very common pattern for identifiers, so this enhancement often completely eliminates the need for full grammar matching during resolution.
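The fail-fast idea can be sketched as follows. This is an illustration only: NetKernel grammars are not plain regular expressions, and the class PrefixGuardedMatcher, its constructor arguments and the example identifiers are all assumptions made for the sketch. A regular expression stands in for the full grammar match.

```java
import java.util.regex.Pattern;

/** Hypothetical matcher: cheap static-prefix test before the expensive full match. */
final class PrefixGuardedMatcher {
    private final String staticPrefix; // literal text every matching identifier starts with
    private final Pattern full;        // stand-in for the full grammar

    PrefixGuardedMatcher(String staticPrefix, String regex) {
        this.staticPrefix = staticPrefix;
        this.full = Pattern.compile(regex);
    }

    boolean matches(String identifier) {
        // Fail fast: most non-matching identifiers are rejected by this comparison alone.
        if (!identifier.startsWith(staticPrefix)) {
            return false;
        }
        // Only identifiers with the right prefix pay for the full match.
        return full.matcher(identifier).matches();
    }
}
```

During resolution a request is typically tested against many endpoints that cannot match it, so the saving comes mostly from the rejections rather than from the successful match.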

standard-module: Optimized the post-commission cycle to speed up live operational performance. Optimized the private-filter endpoint, which now has the same proxy class optimization as layer0.

system-core: Fixed idle worker thread detection. Added garbage collection introspection to netkernel data services.

nkse-dev-tools: With the system-core update we now show GC stats in the system information detail view.

nkse-doc-contents: Jeff Rogers has quite rightly pointed out that the documentation covering caching was pretty disgraceful. So we've added a new section to the docs which discusses caching. An online copy is available here...

We'll be going into more depth about how to analyse and design for caching optimization soon.


Thanks to all of you who sent in your numbers. It turns out that the NKMark10 algorithm was giving a pretty fair qualitative measure of platforms. So it remains unchanged and is now the official stable release.

So far the best score is 31.5 on an 8-core Xeon 2.5GHz box.

Scalability Tools

A new suite of tools for measuring your system's scalability is now included in the nkperf package. These measure throughput with increasing concurrency, showing both synchronous and asynchronous request scaling.
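For readers who want a feel for what "throughput with increasing concurrency" means in practice, here is a small stand-alone sketch. It is not the nkperf code and does not issue NetKernel requests; the class ScalingProbe, its method names and the synthetic CPU-bound workload are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical throughput probe (illustration only, not nkperf). */
final class ScalingProbe {
    /** Synthetic CPU-bound "request"; the result is returned so the loop is not optimized away. */
    static long busyWork(int iterations) {
        long acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc += (acc ^ i) * 31 + i;
        }
        return acc;
    }

    /** Runs `requests` tasks on `concurrency` threads and returns completed requests per second. */
    static double throughput(int concurrency, int requests, final int iterationsPerRequest) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            List<Callable<Long>> tasks = new ArrayList<Callable<Long>>();
            for (int i = 0; i < requests; i++) {
                tasks.add(() -> busyWork(iterationsPerRequest));
            }
            long start = System.nanoTime();
            for (Future<Long> f : pool.invokeAll(tasks)) {
                f.get(); // surfaces any task failure
            }
            long elapsed = Math.max(1L, System.nanoTime() - start);
            return requests * 1e9 / elapsed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during measurement", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling throughput(c, ...) for c = 1, 2, 4, 8, 16 and plotting the results gives a rough scaling curve for the synthetic workload. On a stack that scales well you would expect the curve to rise with the number of cores and then flatten once concurrency exceeds them.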

Comparative results for 8-core Xeon and Opteron servers are shown here

These results demonstrate how NetKernel scales linearly with CPU cores and has an ideal load line when concurrency exceeds cores. The thing we want to spread the word on is that NetKernel's linear scaling comes for free, with no need to learn concurrent, asynchronous, multi-threaded coding techniques.

[Thanks to Chris Cormack and DeltaXML for running the tests on their Xeon box].

Install or update nkperf from Apposite to run the tests on your own machine. The tests are very processor intensive and will take some time, but after the first run the results are persisted and can be viewed at any time without rerunning the full suite. You can also run any individual test as needed if you're using them for iterative system tuning. When installed, the tools are located here...


[Note the graphs are dynamically generated SVG, so don't use IE, since it's the last browser holding out against the rising tide of vector graphics.]

As you'll see in the AMD/Linux 8-core results, the stack on which you run NetKernel can affect the scaling profile. If you see non-linearities on your stack then you can start to play with tuning your hardware, OS and JVM configurations to optimize the system.

You can also decide if using the NetKernel throttle overlay to "lock down the loadline" would be beneficial (see 8-core results discussion for explanation of why throttle is a good thing even when you have a linear stack).

With the throttle in your application you will force the variability of the underlying stack to sit at a fixed point and so operate with an effective ideal scaling profile.

You might also want to consider multi-tiered throttles to provide fan-in profile shaping. So, for example, you might handle very large concurrency at the edge of your application but fan in the load shape around the processor intensive parts of your application so that it is ideally matched to your multi-core capability. [The NKEE endpoint profiler will show you where your hotspots are].
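The throttling idea in the paragraphs above can be sketched conceptually with a counting semaphore. In NetKernel the throttle is an overlay configured in your module, not a class you write yourself, so the Throttle class below, its methods and the permit counts in the usage note are assumptions for illustration only.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical concurrency throttle (conceptual sketch, not the NetKernel overlay). */
final class Throttle {
    private final Semaphore permits;

    Throttle(int maxConcurrent) {
        // Fair semaphore: queued callers are admitted in arrival order.
        this.permits = new Semaphore(maxConcurrent, true);
    }

    /** Runs the work once a permit is free; excess callers queue here instead of loading the CPU. */
    <T> T call(Supplier<T> work) {
        permits.acquireUninterruptibly();
        try {
            return work.get();
        } finally {
            permits.release(); // always hand the permit back, even if the work throws
        }
    }

    /** Permits currently free, useful for checking that none have leaked. */
    int available() {
        return permits.availablePermits();
    }
}
```

A two-tier arrangement might use a wide throttle at the edge, for example new Throttle(64), and a narrow one around the processor-intensive section sized to Runtime.getRuntime().availableProcessors(), with the edge work calling into the core throttle. Both numbers are placeholders; the right values depend on the scaling profile you measure on your own stack.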

We'd love to hear how these tools work for you (please send us screenshots of the charts, as we find that saving SVGs doesn't work well in most browsers). Bragging rights are still up for grabs for the best NKMark10 score!

Have a great weekend.


Please feel free to comment on the NetKernel Forum

Follow on Twitter:

@pjr1060 for day-to-day NK/ROC updates
@netkernel for announcements
@tab1060 for the hard-core stuff

NetKernel, ROC, Resource Oriented Computing are registered trademarks of 1060 Research

© 2008-2011, 1060 Research Limited