NetKernel News Volume 3 Issue 25
June 1st 2012
What's new this week?
- Repository Updates
- NKEE Cache optimisation
- Module Discovery Daemon
- Jena RDF Library Update
- How to assert your will over a dumb client: Explicitly Excluding Dependencies
Catch up on last week's news here
Repository Updates
The following updates are available in the NKEE and NKSE repositories...
- kernel-1.23.1
- fix to stop IndexOutOfBoundsException when HEADER_EXCLUDE_DEPENDENCIES is used in some circumstances
- fix to stop rare race condition that could cause "Scheduler unstable- endpoint block in wrong state" message
- layer0-1.85.1
- enhancement to pass-by-value to inhibit caching of response
- added proper equals and hashcode methods to ParsedIdentifierImpl to enable EE cache optimisations
- precompute deep hash of pass-by-value spaces to improve equality performance
- modules directory loading monitoring from Tom Geudens
- prevent potential irrelevant but scary ArrayIndex-OOB error message in Boot Order Optimizer
- nkse-dev-tools-1.46.1
- collisions view on representation cache for EE
- modules directory from Tom Geudens (edit in kernel properties tools)
- nkse-visualizer-1.18.1
- fix to stop NPE when a request header had a null value
- rdf-jena-1.8.1
- Updated to use latest Jena-2.7.1-SNAPSHOT
- rdf-rdfa-1.4.1
- Updated to work with rdf-jena update
- system-core-0.31.1
- modules directory loading in init from Tom Geudens
- xml-core-2.5.1
- added reference docs for the XML unit-test assertion library.
The following updates are available in the NKEE repository...
- cache-ee-1.2.1
- optimisation for increased throughput with highly homogeneous workloads (see below)
- nkee-dev-tools-0.25.1
- Visualizer cache comparison plug-in NPE fix.
NKEE Cache optimisation
We identified that in some situations, where NetKernel operates with a homogeneous workload (i.e. many requests with the same identifiers, either external or internal, but with slightly differing request scopes), the performance of the cache could be reduced. This was the result of many hash collisions within the underlying hash map used by the representation cache. Today's optimisation works by detecting collisions during each cull cycle and inhibiting caching of those representations that would cause large numbers of collisions.
It was noticed during this work that some internal operations in the documentation system and management console caused these collisions too, so this change will have a positive, though small, benefit on most operations. This change also results in the cache filling up more slowly, as more non-reusable "junk" is not cached in the first place.
It is unlikely that there will be any detrimental effect from this optimisation. However, as an advanced feature, we have added an additional view to the representation cache view tool to show these collisions. It shows each resource identifier and the space it is being requested in, along with the number of collisions it has recently seen.
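To make the collision problem concrete, here is a minimal toy sketch in plain Java. It is not the NKEE cache code: ScopedKey, CollisionCounter and the threshold parameter are all hypothetical names invented for illustration. It assumes a key whose hash depends only on the identifier while equality also compares the request scope, so many same-identifier requests land in the same hash bucket.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/** Hypothetical cache key: hashCode() uses only the identifier, equals() also compares scope. */
final class ScopedKey {
    final String identifier;
    final String scope;

    ScopedKey(String identifier, String scope) {
        this.identifier = identifier;
        this.scope = scope;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ScopedKey)) return false;
        ScopedKey k = (ScopedKey) o;
        return identifier.equals(k.identifier) && scope.equals(k.scope);
    }

    @Override
    public int hashCode() {
        // Scope is deliberately ignored, so keys with the same identifier collide.
        return identifier.hashCode();
    }
}

/** Hypothetical stand-in for a per-cull-cycle collision check. */
final class CollisionCounter {
    /** Returns identifier -> number of extra distinct keys sharing that identifier. */
    static Map<String, Integer> countCollisions(Collection<ScopedKey> cachedKeys) {
        Map<String, Integer> perIdentifier = new HashMap<>();
        // De-duplicate first so that genuinely equal keys are not counted as collisions.
        for (ScopedKey key : new HashSet<>(cachedKeys)) {
            perIdentifier.merge(key.identifier, 1, Integer::sum);
        }
        Map<String, Integer> collisions = new HashMap<>();
        for (Map.Entry<String, Integer> e : perIdentifier.entrySet()) {
            if (e.getValue() > 1) collisions.put(e.getKey(), e.getValue() - 1);
        }
        return collisions;
    }

    /** True when an identifier has collided at least 'threshold' times (threshold is an assumption). */
    static boolean shouldInhibitCaching(Map<String, Integer> collisions, String identifier, int threshold) {
        return collisions.getOrDefault(identifier, 0) >= threshold;
    }

    public static void main(String[] args) {
        List<ScopedKey> keys = Arrays.asList(
            new ScopedKey("res:/data", "scope-1"),
            new ScopedKey("res:/data", "scope-2"),
            new ScopedKey("res:/data", "scope-3"),
            new ScopedKey("res:/other", "scope-1"));
        Map<String, Integer> collisions = countCollisions(keys);
        System.out.println("inhibit res:/data: " + shouldInhibitCaching(collisions, "res:/data", 2));
    }
}
```

The per-identifier count is roughly the kind of figure the new collisions view reports; how the real cache decides what counts as "large" is not described here, so the threshold is purely illustrative.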
Tom's Blog
Tom's blog entry does a good job of pulling the rug from under my own news article (see below). Plus he's contributed a new daemon-like module discovery mechanism (also see below). The guy is on fire... what would have happened if he'd been able to read his own NK book too?
Like many of you, so I gather, I'm still waiting to get the first physical copy of the book in my hands. The author seems to be the last to know about these things, so let me know when somebody does get his/her copy!
In the meantime here's a small technical caching tip.
http://practical-netkernel.blogspot.be/2012/06/cache-makes-netkernel-go-round.html
As always, your feedback and ideas for future posts are most welcome at practical<dot>netkernel<at>gmail<dot>com.
Module Discovery Daemon
This feature request came up during some recent training, and I know it has been on the wish list of many of you for a while. Tom, being at a loose end waiting by the post-box each day for his book, has taken matters into his own hands. The headline is: you can now drop files referencing modules into [install]/etc/modules.d/ and they are automagically commissioned, with changes tracked.
Here's his technical change report...
Purpose
Proposed change to the Module Manager. Provide dynamic Unix-style daemon discovery of modules in a monitored configuration directory. Tom Geudens - 20120531
Detail
Technical Explanation
Changes have been made to:
- BootUtil.java in layer0: method getInitModules() is replaced by getKernelURIProperty(), a general purpose method that returns the URI form of a given kernel.property value.
- ModuleManager.java in layer0: method Run() has been adapted and will use both the netkernel.init.modules property (location of modules.xml) and the netkernel.init.modulesdir property (location of a directory containing other XML files in the modules.xml format) to determine the modules.
- InitEndpoint.java in ext.system: method initBoot() has been adapted in the same way as method Run() in ModuleManager.java. Care has been taken to make both pieces of code look alike as much as possible.
Enduser Explanation
The final authority (truth) on what runs in a NetKernel instance is the contents of [install]/etc/modules.xml. This is comparable to, for example, /etc/sudoers on a Linux system, which is the final authority on what certain users are allowed to do as other users.
Both modules.xml and sudoers suffer from the same problem though. Changing them is potentially dangerous to your system and multiple ways of changing them are possible. On the other hand both files must be changed for the system to be able to evolve.
On Linux, the first attempts to solve the problem relied on permissions and on providing a safe edit tool (visudo). While this approach had benefits, it was easily circumvented and the basic problem remained: multiple ways to change a single file.
Nowadays you'll find a /etc/sudoers.d/ directory on Linux systems, next to the /etc/sudoers file. The idea is this: basic non-changing stuff goes into /etc/sudoers. More volatile changes go into multiple files (with the same format as /etc/sudoers) inside the directory. The "single truth" is the combination of all of them: /etc/sudoers and whatever files are found in the directory.
The same is now possible on NetKernel. The [install]/etc/modules.xml file will remain in place. It will remain the file that is used by 1060 and where Apposite does its thing too. If you are happy with that (and/or use Apposite for your own stuff too) ... you need change nothing.
However, you can add a new property to [install]/etc/kernel.properties :
netkernel.init.modulesdir=etc/modules.d/
- the directory location is customizable, just make sure you create it and allow the NetKernel user to access the files inside it
This allows you to "drop" XML files in the same format as modules.xml inside the directory ([install]/etc/modules.d/ is the default, or whatever location you set in the above property) and the NetKernel system will pick up the changes (as if they were done to modules.xml). Here, for example, is my jquery.xml:
<modules>
<module runlevel="7">../project-modules/urn.org.netkernel.jquery.server-1.7.2/</module>
</modules>
You choose the granularity yourself. One file per module, one file for all non-1060 modules, one file for ...
This contribution has been through rigorous testing and has made it into today's repository update release of layer0. To start using it, just add a modules.d/ directory beneath your etc/ path and drop some XML files containing module lists there. Easy.
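The "single truth is the combination of all" idea can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions, not the actual ModuleManager code: ModuleListMerger and its methods are hypothetical names, and the file-name ordering of the drop-in files is my own choice rather than documented behaviour. It reads every <module> entry from modules.xml and then from each XML file in the drop-in directory.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical helper: merges modules.xml with the files found in a modules.d/ directory. */
final class ModuleListMerger {

    /** Returns the text of every <module> element in one modules.xml-format file. */
    static List<String> readModules(File xmlFile) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xmlFile);
        NodeList nodes = doc.getElementsByTagName("module");
        List<String> modules = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            modules.add(nodes.item(i).getTextContent().trim());
        }
        return modules;
    }

    /** Combines modules.xml with every *.xml drop-in file, taken in file-name order (an assumption). */
    static List<String> effectiveModules(File modulesXml, File modulesDir) throws Exception {
        List<String> all = new ArrayList<>(readModules(modulesXml));
        File[] dropIns = modulesDir.listFiles((dir, name) -> name.endsWith(".xml"));
        if (dropIns != null) { // null when the directory does not exist or is unreadable
            Arrays.sort(dropIns);
            for (File dropIn : dropIns) {
                all.addAll(readModules(dropIn));
            }
        }
        return all;
    }
}
```

Calling effectiveModules(new File("etc/modules.xml"), new File("etc/modules.d")) would return the modules.xml entries followed by those from jquery.xml and any other drop-in file. The real implementation also tracks changes to the directory, which this sketch does not attempt.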
A couple of things to be aware of
- Apposite will continue to write its state to etc/modules.xml - so this is still the "system of record for the system".
- The Deployment Editor tool (http://localhost:1060/tools/deployment/) has not been updated - and is unaware of the possible ways module listings can be located. If you use this then only expect it to behave well for modules in modules.xml. We will sort this out - but we thought this new feature was such a quality of life enhancer we didn't want to hold it back.
Jena RDF Library Update
Brian Sletten has prepared a mind-blowing demo for SemTech 2012 next week in San Francisco. To facilitate some of the cool features he's going to reveal, we worked together to update the core RDF jena module to use the latest 2.7.1-SNAPSHOT build. Jena is moving (or has moved) to Apache, which caused the RDFa parser module to need a small update to point to some relocated packages. These two updates are provided in the repos.
If you're at SemTech don't miss Brian's full-day tutorial on ROC/SemTech and get there early cos they'll be queuing round the block to see this year's rabbit-out-of-a-hat-trick in his main talk!
How to assert your will over a dumb client: Explicitly Excluding Dependencies
Tom sent me a link to his blog first thing this morning so that I could put a link in the news. When I saw the subject I immediately got on skype with "Damn you Geudens". Unwittingly, Tom had pre-empted the very subject I had intended to discuss this week. Not to worry though: his is a practical example, whereas I want to explain some of the details and perhaps some of the engineering principles that are in play...
The best way to think about caching is to not think about it at all - just let NK take care of it and let the defaults support NK's multi-dimensional dependency model.
However when you're dealing with the edges of a system (usually outbound client-requests), you often need to think about the life-cycle of the representations you request and/or generate.
You'll probably be aware that you can take full control of the caching of a response which you issue from your endpoints by calling the setExpiry() method and choosing one of the variations of dependency, time and user-functions. If your endpoint is at the very edge (i.e. it goes outside for its representation state) then this is all the control you need.
But there's a more subtle scenario. More often than not - you are not the edge endpoint - you're an endpoint (or even a logical endpoint via a mapping) that makes a request to a tool that is the edge-client. For example you're calling active:httpGet or making a request to a JMS bus or whatever. In these cases you get what you're given.
Perhaps, like the HTTP-client, the endpoint you invoke will do its best to set up a suitable expiry function on its response - which, since you're in the middle-layers will be inherited into the default dependency model.
But what if it doesn't? Or what if you know better than the default of the external tools? Or what if they don't give you any clue about the potential longevity of the resource state? In these cases you need to consider the engineering.
More often than not you can eliminate redundant or expensive out-bound client requests by holding onto a copy of the representation for more than one request. In short, you can impose your own overridden expiry model onto the state. But if you're going to assert your dominance over the universe you have to give NetKernel a clue about this.
You have to tell it that the client you are requesting is going to give back an expiry which should be ignored. In short, the request you construct can tell NetKernel to stop tracking dependencies, because we're going to take over at this layer and they won't be used in our response.
The technical reference can be found here, but the summary is this...
INKFRequest request = context.createRequest(...);
request.setHeader(INKFRequest.HEADER_EXCLUDE_DEPENDENCIES, true);
By setting the EXCLUDE_DEPENDENCIES header you are saying - I care about the resource I'm requesting but my response (out of this intermediate calling layer) will assert some new expiry model and so I don't want the request resource to influence the dependency model.
Often when taking control like this, you'll want to assert a time-based expiry. But also you may want to include the partial dependency relations of other resources (like scripts, config, passed state etc etc) which have influenced this request - but explicitly not the edge-client's response since we have excluded it.
This latter scenario can be done by setting the expiry of the response to one of EXPIRY_MIN_CONSTANT_DEPENDENT or EXPIRY_MAX_CONSTANT_DEPENDENT - ie to use time and the dependencies (less any excluded requests which were ignored by setting the EXCLUDE_DEPENDENCIES request header). The MIN and MAX here indicate whether the time element should trigger expiry eagerly or lazily respectively.
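The interplay of excluded dependencies with the MIN and MAX variants can be pictured with a small toy model in plain Java. This is not the NKF API: ExpiryModel, Mode and recordSubRequest are hypothetical names, dependencies are reduced to simple timestamps, and the reading of MIN as "expire as soon as either the time or a dependency expires" and MAX as "expire only once both have" is my interpretation of "eagerly" and "lazily" above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a response whose expiry combines a time limit with inherited dependencies. */
final class ExpiryModel {
    /** Hypothetical stand-ins for the MIN/MAX constant-dependent expiry variants. */
    enum Mode { MIN_CONSTANT_DEPENDENT, MAX_CONSTANT_DEPENDENT }

    /** Expiry times (millis) inherited from sub-requests that were not excluded. */
    private final List<Long> dependencyExpiries = new ArrayList<>();

    /** Records a sub-request's expiry unless the caller excluded its dependencies. */
    void recordSubRequest(long expiresAt, boolean excludeDependencies) {
        if (!excludeDependencies) {
            dependencyExpiries.add(expiresAt);
        }
    }

    /** Evaluates expiry at time 'now' against the time limit and the tracked dependencies. */
    boolean isExpired(Mode mode, long timeLimit, long now) {
        boolean timeUp = now >= timeLimit;
        boolean dependencyExpired = false;
        for (long expiresAt : dependencyExpiries) {
            if (now >= expiresAt) dependencyExpired = true;
        }
        if (mode == Mode.MIN_CONSTANT_DEPENDENT) {
            return timeUp || dependencyExpired; // eager: whichever comes first
        }
        return timeUp && dependencyExpired;     // lazy: only when both have happened
    }

    public static void main(String[] args) {
        ExpiryModel response = new ExpiryModel();
        response.recordSubRequest(0L, true);      // edge client says "already expired", but is excluded
        response.recordSubRequest(5000L, false);  // a config resource we still depend on
        System.out.println(response.isExpired(Mode.MIN_CONSTANT_DEPENDENT, 10000L, 1000L));
    }
}
```

In this model the edge client's always-expired response no longer forces the calling layer to expire immediately, while the config resource still does its job. Real NetKernel dependencies are expiry functions rather than timestamps, so treat this only as a picture of the decision logic.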
Engineering Compromise
There are no fixed rules about caching and dependencies. By default the system will maintain internal consistency through the dependency model - this means NK caches in all dimensions of a solution and tries to work out the best normalized state of the system in real time. When you go outside you have to take on the engineering responsibility - you have to decide how "good enough" an approximation to reality you want your system to provide.
If you must absolutely know - then you should expire out-bound client responses. But more often than not you can make good approximations that massively improve your overall solution with no loss of end-user coherence.
NK provides the levers - you just use them to tune your application to give a balanced solution.
Have a great weekend.