NetKernel News Volume 4 Issue 19

August 1st 2013

Repository Updates
Dependency Caching and Control of State
NetKernel Architects Group

Repository Updates

The following updates are available in the NKSE and NKEE repositories...

database-relational 1.15.1
- Added equality test on TransactedConnection so that scope comparisons during cache evaluation match. Increases the opportunity to cache query results when using the TransactionOverlay. Thanks to Gary Sole at Findlaw.com for helping discover and test this.
layer1 1.47.1
- File scheme accessor for file:/// resources updated to accommodate platform differences between Windows and Unix.
nkse-search 1.20.1
- Fix to Lucene indexer to ensure that, in the event of an error in the indexing of an item, the index gets closed and its lock removed. Thanks to Tom Geudens for discovering this.

Dependency Caching and Control of State

I was planning to continue the series On Empiricism, but something came to light this week which I thought was worth covering while it's still fresh in my mind (and, anyway, as they say: "History can wait").

One of these days I'm going to have to ask Gary Sole at Findlaw.com to spill the beans on his Findlaw-on-Rails architecture.

The basic gist of it is that a primary system of record is abstracted behind REST services comprising composite resources. The goal is to abstract the DB and to model it as resources and, as a bonus side-effect, to provide efficient compositional cache re-use of common resources.

The design is very elegant in that to add new services, all that is needed is a declarative module of resources (such as SQL statements, scripts etc).

The declarative resource "bucket" is automatically discovered and corresponding logical endpoints are created.

Each logical endpoint is dynamically mapped to a runtime which does the necessary routing, query, transformation and composition.

This architecture is a variant of very similar patterns we use inside the NetKernel tooling. For example the documentation system has this pattern and so too do the search and indexing and the metadata driven Space Explorer tools. In fact so does this wiki application we're using now.

One of the advanced areas that Gary's solution is exploiting is the ability to dynamically manage and manipulate spatial scoping - but we'll save that for another day...

This week I spent some time with Gary reviewing and optimizing some of the caching in the system.

As ever, there's nothing like the real world to make you re-assess your assumptions. It was clear that the team were close to a balanced system but there was a little uncertainty about where their golden threads should be applied.

This question is really a symptom of a deeper underlying requirement: knowledge of the NetKernel dependency caching model.

I've been guilty of assuming that "dependency caching" was broadly understood, or at least inferred. But of course it's a brand new concept and, although it has a simple and predictable model, we have never really described it explicitly in detail.

So today let's put right this error of omission. Here are the core details of what you need to know about NetKernel's dependency caching model and how to use it to apply arbitrarily complex control over composite resource state...

Dependency Model

The default caching model within NetKernel is to use dependency caching. This means that the representation that is returned for a request will be cached so long as all the resources that it depends upon are themselves not expired.

A resource acquires a dependency when a sub-request is made from an endpoint while it is currently evaluating a request. So, for example, any time you construct and issue a request inside your own endpoint's code then you are creating a dependency on the sub-requested resource.

It's easy to spot these dependencies in our own code. But when you use a runtime, it too will generally issue sub-requests in order to compute the resource you have requested, so all of those compositional resources become dependencies of the runtime's response.

For example, if you're using active:xrl2 then every xrl:include resource is a dependency of the composite resource computed by the XRL runtime. So too with references in xslt, trl, xquery etc.

So that's how the cache dependency tree grows. And, since in any system there are fewer top-level resources than middle tier resources, most resources end up being the result of a cascading tree of composite sub-requests.

It follows that, by default, most dependency-cached resources have an emergently complex tree of dependencies associated with them.

It's at this point that the use of the word tree is not that helpful. Please click the "Tree" image above to see a different perspective...
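To make the mechanics concrete, here is a toy sketch in Python. It is emphatically not the NetKernel API: the ToyCache and Entry classes, the res:/ identifiers and the evaluations log are all invented for illustration. It only assumes the two rules described above: a sub-request issued while an endpoint is evaluating becomes a dependency, and a cached representation is valid only while it and everything it depends on are unexpired.

```python
class Entry:
    """A cached representation plus the cached entries it was built from."""

    def __init__(self, ident, value, deps):
        self.ident = ident
        self.value = value
        self.deps = deps          # Entry objects captured during evaluation
        self.expired = False

    def is_valid(self):
        # Valid only while this entry and every dependency are unexpired.
        return not self.expired and all(d.is_valid() for d in self.deps)


class ToyCache:
    """Hypothetical miniature of dependency caching (not the NetKernel API)."""

    def __init__(self, endpoints):
        self.endpoints = endpoints   # identifier -> function(request) -> value
        self.entries = {}            # identifier -> latest Entry
        self.stack = []              # dependency lists of in-flight evaluations
        self.evaluations = []        # log of identifiers actually computed

    def request(self, ident):
        entry = self.entries.get(ident)
        if entry is None or not entry.is_valid():
            self.stack.append([])
            try:
                value = self.endpoints[ident](self.request)
            finally:
                deps = self.stack.pop()
            entry = Entry(ident, value, deps)
            self.entries[ident] = entry
            self.evaluations.append(ident)
        # A request issued while another is evaluating becomes its dependency.
        if self.stack:
            self.stack[-1].append(entry)
        return entry.value

    def expire(self, ident):
        self.entries[ident].expired = True


endpoints = {
    "res:/header": lambda request: "<h1>News</h1>",
    "res:/body": lambda request: "<p>Hello</p>",
    "res:/page": lambda request: request("res:/header") + request("res:/body"),
}
cache = ToyCache(endpoints)
cache.request("res:/page")   # computes header, body, then page
cache.request("res:/page")   # served from cache, nothing computed
cache.expire("res:/body")    # kill one dependency
cache.request("res:/page")   # recomputes body and page, reuses header
```

After the last request the evaluations log shows that the header was computed exactly once, which is the compositional re-use the model is designed to give you.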

Tree or Root?

We say "dependency-tree", since, formally, that is the structural form of the dependency hierarchy. However for a practical understanding of the dependency model it is probably a better idea to invert our perspective and instead think of it as a root structure.

A composite resource can be thought of as a plant that has grown from the emergently complex set of root tendril-resources. But don't take this metaphor too literally since, unlike a real plant, the composite representation will wither and die if any tendril-resource dies.

To see the different ways in which a composite representation can be killed by its dependencies, please click on the image below (stand back, Walt Disney's got nothing on this baby)...

Watch the animation for a few cycles.

You can see that if you shoot the composite resource in the head then obviously it dies. If you shoot it in its middle, it dies. Even if you shoot it in a toe - it dies.

On first impressions you might start to think that it is a very sensitive flower.

However, notice that while the composite flower as a whole is delicately balanced, each root branch is quite robust. Shoot the head and all the roots are untouched. Similarly, invalidate any one sub-branch and all the remaining sub-branches are untouched.

And this is the point. Resources in the real world have different statistical life-cycles and distributions. Any given composite resource might depend on fast changing resources and slowly changing resources. The dependency cache model of NetKernel allows you to build flowers that are as strong as they need to be and no stronger. And if you have to grow another composite flower, only the resources that are not already known need to be computed.
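The "shoot it anywhere" behaviour can be sketched as a walk over a dependency graph. This is only an illustration in Python: the graph below is made up (the D-x-y labels echo the animation, but the exact shape and the name "flower" are assumptions), and killed_by is a hypothetical helper, not something NetKernel exposes.

```python
# Hypothetical dependency graph: each resource maps to what it depends on.
DEPS = {
    "flower": ["D-1-1", "D-1-2", "D-1-3"],
    "D-1-1": ["D-2-1", "D-2-2"],
    "D-1-2": ["D-2-3"],
    "D-1-3": ["D-2-4"],
    "D-2-1": [],
    "D-2-2": [],
    "D-2-3": [],
    "D-2-4": [],
}


def killed_by(shot, deps):
    """Return every resource whose cached representation dies when `shot` expires."""
    dead = {shot}
    changed = True
    while changed:
        changed = False
        for resource, children in deps.items():
            if resource not in dead and any(c in dead for c in children):
                dead.add(resource)
                changed = True
    return dead


def survivors(shot, deps):
    """Everything still cached and reusable after `shot` expires."""
    return set(deps) - killed_by(shot, deps)


head = killed_by("flower", DEPS)   # only the flower itself dies
middle = killed_by("D-1-1", DEPS)  # the branch root and the flower die
toe = killed_by("D-2-4", DEPS)     # the leaf, its parent and the flower die
reusable = survivors("D-2-4", DEPS)
```

In each case the set returned by survivors is what the next composite can be regrown from without recomputation.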

How do resources expire?

Composite resources tend to use the dependency caching model. However, toe-resources (or, in formal tree language, leaf-resources) have more choices available.

For example, if a leaf resource is linked to a file on the filesystem (as module resources are), then we will (under the covers) ensure that if the file changes, any previous use of that file will expire (and so kill all representations that have it in their dependency hierarchy).
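One plausible way to picture file-linked expiry is a representation stamped with the file's modification time. The sketch below is a guess at the general idea in Python, not a description of what NetKernel does under the covers; FileBackedEntry, demo and the module.xml file name are all invented for the example.

```python
import os
import tempfile


class FileBackedEntry:
    """Cached representation of a file, stamped with the file's mtime."""

    def __init__(self, path):
        self.path = path
        self.mtime_ns = os.stat(path).st_mtime_ns
        with open(path, "r", encoding="utf-8") as handle:
            self.value = handle.read()

    def is_expired(self):
        try:
            return os.stat(self.path).st_mtime_ns != self.mtime_ns
        except FileNotFoundError:
            return True  # a deleted file can no longer back the representation


def demo():
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "module.xml")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("<module/>")
        entry = FileBackedEntry(path)
        fresh = entry.is_expired()
        # Simulate an edit by pushing the modification time forward 5 seconds.
        stat = os.stat(path)
        os.utime(path, ns=(stat.st_atime_ns, stat.st_mtime_ns + 5_000_000_000))
        stale = entry.is_expired()
        os.remove(path)
        gone = entry.is_expired()
    return fresh, stale, gone
```

Calling demo() reports whether the entry is expired when freshly read, after the file is touched, and after the file is deleted.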

Equally you might, like HTTP, decide to use time as the control variable. For any resource we can express validity in terms of absolute or relative time since it was computed. When time passes beyond the expiration limit then it will expire (and so will all composite resources that depend on it).
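As a sketch of the time-based option (in Python, with invented names such as TimedEntry, expire_at, expire_after and compose, and with plain numbers standing in for clock readings): a leaf carries an absolute expiry time, a relative limit is just "computed-at plus N seconds", and a composite inherits the earliest expiry among its parts.

```python
class TimedEntry:
    """A representation that is valid until an absolute point in time."""

    def __init__(self, value, expires_at):
        self.value = value
        self.expires_at = expires_at  # absolute time, in seconds

    def is_expired(self, now):
        return now >= self.expires_at


def expire_at(value, timestamp):
    """Absolute expiry: valid until the given timestamp."""
    return TimedEntry(value, timestamp)


def expire_after(value, seconds, computed_at):
    """Relative expiry: valid for `seconds` after it was computed."""
    return TimedEntry(value, computed_at + seconds)


def compose(value, parts):
    """A composite is valid only while every part is, so it inherits the earliest expiry."""
    return TimedEntry(value, min(part.expires_at for part in parts))


prices = expire_after("prices", seconds=30, computed_at=1000)
banner = expire_at("banner", timestamp=5000)
page = compose("page", [prices, banner])
```

Here the page dies at time 1030 along with the prices, while the banner stays cached and can be reused when the page is regrown.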

We can even assign our own expiry functions and have arbitrarily complex algorithms for determining if representation state is expired. The important caveats are that your function should be side-effect-free and very fast. As you can see, determining expiration is a key operation for the cache, and if an expiration function is not efficient it will impose a cache-lookup burden on the requestor of any higher-order composite resource that depends on your resource. (Fortunately this is not a global burden since the cache is concurrent and atomic.)
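Here is a rough Python sketch of why the expiry function has to be cheap. Everything in it is hypothetical (Entry, CallCounter, the stock dictionary); the point is only that a validity check on a high-level composite walks down to the leaf and calls its function on every lookup. The CallCounter wrapper is instrumentation for the demonstration, the expiry function itself stays side-effect-free.

```python
class Entry:
    """Cached representation with an optional custom expiry function."""

    def __init__(self, value, is_expired=None, deps=()):
        self.value = value
        self.is_expired = is_expired or (lambda: False)
        self.deps = tuple(deps)

    def valid(self):
        return not self.is_expired() and all(d.valid() for d in self.deps)


class CallCounter:
    """Demonstration-only wrapper that counts how often a function is called."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self):
        self.calls += 1
        return self.fn()


stock = {"version": 1}  # stand-in for some fast, in-memory state


def stock_entry():
    seen = stock["version"]
    # Side-effect-free and fast: a single in-memory comparison.
    return Entry("stock v%d" % seen, is_expired=lambda: stock["version"] != seen)


leaf = stock_entry()
leaf.is_expired = CallCounter(leaf.is_expired)
middle = Entry("middle", deps=[leaf])
top = Entry("top", deps=[middle])

first = top.valid()    # walks down to the leaf: one call
second = top.valid()   # and again: two calls
stock["version"] = 2   # the underlying state moves on
third = top.valid()    # now expired: three calls
```

Three lookups of the top-level resource cost three calls to the leaf's function, so any slowness there is paid by everything above it.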

For good measure, we can also mix all of the above direct controls with the dependency model too. So, if we look at the image above again, resource D-1-3 depends on D-2-4 but also, because in the animation it seems to spontaneously die, it might also have a time dependency.

With NetKernel's cache model we have a lot more power than is available in REST.

The ultimate power is that we can allow our resources to depend not just on physically computable resources (like files, runtime computations, or time) but even on imaginary/virtual resources...

Virtual Resources - The Golden Thread Pattern

If a composite resource has a dependency hierarchy then it is valid so long as all its dependencies are valid. What if one (or more) of the dependencies is imaginary?

We can mix in imaginary resource dependencies at any point simply by making a request to the active:attachGoldenThread accessor.

In the diagram below we can see that the endpoints that computed D-2-1 and D-2-3 also made a sub-request to the golden thread accessor and ensured that they were dependent on GT-1.

In doing so they provide a new way for us to control the validity of the dependency branch that they belong to.

Click the diagram to see what happens when we shoot GT-1

How do we expire a golden thread resource? Well, any damn way we choose - we're masters of the state universe. We can conspire to ensure that changes in state in one or more entirely unconnected parts of our architecture will cause dependency trees to be expired in one or more other parts of the architecture simply by calling the active:cutGoldenThread accessor with the identity of the virtual resource.

The beauty here is that the implementer of the golden thread expiration has no need to know anything about the dependency model or the resources that have made themselves dependent on the virtual golden thread resources. It's sufficient to know that every resource that depends on the virtual resource will atomically expire.
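A toy model of the pattern, in Python, might look like the following. The GoldenThreads and Cache classes are invented stand-ins for the attach and cut accessors, and the shape of the graph is assumed (only the GT-1, D-2-1 and D-2-3 names come from the diagram). The trick used here is a generation counter per thread: attaching captures the current generation, cutting bumps it, and composites inherit the tokens of whatever they depend on.

```python
class GoldenThreads:
    """Virtual resources that exist only so that they can be cut."""

    def __init__(self):
        self._generation = {}

    def attach(self, thread_id):
        """Return a token that stays intact until thread_id is next cut."""
        return (thread_id, self._generation.setdefault(thread_id, 0))

    def cut(self, thread_id):
        self._generation[thread_id] = self._generation.get(thread_id, 0) + 1

    def intact(self, token):
        thread_id, generation = token
        return self._generation.get(thread_id, 0) == generation


class Cache:
    """Toy cache in which golden threads are the only source of expiry."""

    def __init__(self, threads):
        self.threads = threads
        self.entries = {}  # key -> (value, tokens)

    def put(self, key, value, attach=(), deps=()):
        tokens = [self.threads.attach(t) for t in attach]
        for dep in deps:
            # A composite inherits every thread its dependencies hang from.
            tokens.extend(self.entries[dep][1])
        self.entries[key] = (value, tokens)

    def valid(self, key):
        entry = self.entries.get(key)
        return entry is not None and all(self.threads.intact(t) for t in entry[1])


threads = GoldenThreads()
cache = Cache(threads)
cache.put("D-2-1", "row 1", attach=["GT-1"])
cache.put("D-2-2", "row 2")
cache.put("D-2-3", "row 3", attach=["GT-1"])
cache.put("D-1-1", "a", deps=["D-2-1", "D-2-2"])
cache.put("D-1-2", "b", deps=["D-2-3"])
cache.put("top", "page", deps=["D-1-1", "D-1-2"])

before = {key: cache.valid(key) for key in cache.entries}
threads.cut("GT-1")  # the caller knows nothing about who depends on GT-1
after = {key: cache.valid(key) for key in cache.entries}
```

Note that the code calling cut holds nothing but the thread's identity, and D-2-2, which never attached to GT-1, survives the cut.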

We start to get into real power when we also see that we are free to make any representation depend on as many virtual resources as is necessary. We can have a range of control from broad-brush to ultra-fine grained.

Therefore we start to see that rather than have "fast lookup" expiration functions written in code at the "toe-level" - we can instead simply use a golden thread virtual resource and have external events cause expiration. No need to write low-level code and worry about side-effects and efficiency of the implementation - just have triggered dependency expiration.

So, for example, the crude time-based controls (which replicate the HTTP model) could be simply replaced with a golden thread dependency and use of the cron transport. Set up a repeating or one-shot cronjob and make it issue a request to expire the golden thread at a time of your choice. Or have synchronized expiry on a rolling cycle...
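To illustrate the scheduling idea without a running NetKernel, here is a small Python simulation using the standard library's sched module with a fake clock. It stands in for the cron transport only loosely: in a real system the scheduled job would issue a request to active:cutGoldenThread, whereas here the cut function, the gt: identifiers and the helper names are all invented and the cut is just recorded in a list.

```python
import sched


class FakeClock:
    """Deterministic clock so the example runs instantly."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def schedule_one_shot_cut(scheduler, cut, thread_id, delay):
    """Cut the thread once, `delay` seconds from now."""
    scheduler.enter(delay, 1, cut, argument=(thread_id,))


def schedule_rolling_cuts(scheduler, cut, thread_id, period, cycles):
    """Cut the thread every `period` seconds, for a bounded number of cycles."""
    for n in range(1, cycles + 1):
        scheduler.enter(period * n, 1, cut, argument=(thread_id,))


clock = FakeClock()
scheduler = sched.scheduler(clock.time, clock.sleep)
cuts = []


def cut(thread_id):
    # Stand-in for issuing the cut request; records when each cut happened.
    cuts.append((clock.time(), thread_id))


schedule_one_shot_cut(scheduler, cut, "gt:prices", delay=90)
schedule_rolling_cuts(scheduler, cut, "gt:headlines", period=60, cycles=3)
scheduler.run()
```

The resources themselves carry no time logic at all in this arrangement: they attach to a thread, and the schedule lives entirely outside them.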

It's then just a short step to consider a cache control transport pattern. That is, we can readily construct a Transport endpoint whose job is to receive external events (you decide what those are - could be voltage on a sensor, could be an HTTP message, could be a trigger in a database). But the core pattern is that the cache-control transport should have an algorithm that maps the external event into a named virtual golden thread resource and when the event occurs it cuts that (or those) golden thread(s). Therefore, that subset of the state of the system that had the golden thread in its dependency tree gets shot.
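The core of such a transport is the mapping from event to thread names, which can be sketched in a few lines of Python. None of this is NetKernel code: CacheControlTransport, the rule functions, the event dictionaries and the gt: naming scheme are assumptions made up for the example, and the cut callable is whatever would actually cut the golden thread in your system.

```python
class CacheControlTransport:
    """Receives external events and cuts the golden threads they map to."""

    def __init__(self, cut):
        self.cut = cut      # callable taking a golden thread identifier
        self.rules = []     # functions: event dict -> list of thread ids

    def add_rule(self, rule):
        self.rules.append(rule)

    def on_event(self, event):
        cut_ids = []
        for rule in self.rules:
            for thread_id in rule(event):
                if thread_id not in cut_ids:
                    self.cut(thread_id)
                    cut_ids.append(thread_id)
        return cut_ids


def table_changed(event):
    """Hypothetical rule: a database trigger cuts the thread for that table."""
    if event.get("type") == "db-trigger":
        return ["gt:table:" + event["table"]]
    return []


def sensor_tripped(event):
    """Hypothetical rule: a high sensor reading cuts two threads at once."""
    if event.get("type") == "sensor" and event["volts"] > 3.0:
        return ["gt:sensor:" + event["name"], "gt:dashboard"]
    return []


cut_log = []
transport = CacheControlTransport(cut_log.append)
transport.add_rule(table_changed)
transport.add_rule(sensor_tripped)

first = transport.on_event({"type": "db-trigger", "table": "orders"})
second = transport.on_event({"type": "sensor", "name": "door", "volts": 4.2})
third = transport.on_event({"type": "sensor", "name": "door", "volts": 1.0})
```

Keeping the rules as small pure functions means the event-to-thread mapping can be changed without touching the resources that attached themselves to those threads.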

Where do we place our Golden Threads?

Unfortunately there is no simple answer, since really what you're asking is "where do I want to control and manage the state of my resources?"

Well that's engineering. What are the balances and trade-offs you want in your system so that its state is a good-enough approximation to reality and such that when things cause state to change then that change propagates atomically through the representation state of the system?

However, clearly it's generally a good thing to have resources that are computed from external systems of record be controlled without elaborate coupling to that system of record. So when you go outside for state (like a DB query or REST service request) consider adding one or more golden threads. [We can imagine this is the case with D-2-3 and D-2-1 in the diagram above].

Sometimes you want to control in a middle tier (like D-1-3). Sometimes you might even want direct control of the top level composite resources - but this is a rare pattern since you usually have many choices of ways to kill the flower by killing a tendril-root-resource, as we've seen.

Summary

NetKernel's dependency caching model is very powerful. It leads to very efficient self-balancing self-tuning systems. It is as close as we can get to being very simple to use - but when you start to build powerful systems you do eventually have to have some sense of how it operates.

If you need help and advice with tuning or designing the engineering balances of your system then get in touch. There's nothing more satisfying than knowing that at the end of the day we've built and are running a "normal solution". One that is sitting in a computational energy minimum.

NetKernel Architects Group

We were recently joined by Tom Mueck as a member of the team. Tom's job is to sort out our business development, improve our visibility and generally drive the next phase of the NetKernel story. He's an MBA, but let's not hold that against him - he did after all independently discover NetKernel and grok the potential of ROC.

I guess over time I'll be introducing Tom to some of you directly. However, in the meantime, Tom is starting to breathe life into the entirely neglected NetKernel Architects Group on LinkedIn. If you're not a member please consider joining to connect with like-minded ROC professionals. Our hope is that this group grows to be a strong professional network (it is absolutely not a direct marketing channel).

Have a great weekend. Enjoy the summer.


NetKernel, ROC, Resource Oriented Computing are registered trademarks of 1060 Research