NetKernel News Volume 3 Issue 2

November 4th 2011

What's new this week?

Catch up on last week's news here

Repository Updates

The following NetKernel 5.1.1 updates are available in the NKSE and NKEE repositories...

  • layer0 1.74.1
    • Change to expand any jars contained in a lib/ directory of a jarred module to the lib/expanded/ directory with the module name and version as a prefix to avoid any possibility of namespace collisions.
  • lang-dpml 1.17.1
    • Typo fix in exception handling doc.
  • lang-ncode 1.6.1
    • Restored import of text:search so that inline documentation for endpoints can be discovered.
  • nkse-search 1.16.1
    • Regression fix in move to Lucene 3.x so that user-supplied analyser in config is used for both index and search.

The following NK 5.1.1 updates are available in the NKEE repository...

  • nkee-apposite 1.29.1
    • System health check status is now performed asynchronously to eliminate potential for UI lag.
  • nkee-encrypted-module-factory 1.11.1
    • Updated to use the layer0 change to jarred lib expansion.

NetKernel 5.1.1 - Status

A few minor tweaks were released as updates in the repos this week (see above). We're still reporting zero downtime on our primary servers. After a few weeks' continuous operation we're consistently seeing a 25-35% smaller baseline on the heap memory charts.

We migrated our nk4um over to NKEE 5.1.1 this week - this site gets a very heavy load of pseudo-random traffic due to its legacy of having lots of bot-indexed content over many years. Again we're seeing a similar reduction in operating heap size - heap baseline is consistently around 25MB with a full cache (yeah really that's MB).

Cache Tuning Tip

One thing we did notice after some time in production was that applications with lots of pseudo-static content were showing a somewhat reduced cache-hit rate. We realised that this is a property of the new cache cost-threshold parameter added in NKEE. By default this was 500 (microseconds) on our production systems - but many of the resources being requested were much cheaper than this and so never crossed the cost threshold.

However, for this type of application profile, holding these cheap resources in cache is still a good thing, since the cached representation's hashcode is served as the ETag in the HTTP transport.

We lowered the cache "cost threshold" to zero using the kernel config tool...

http://localhost:1066/tools/kernelconfig

...and the cache hit rate returned to its usual (for our apps) 45% rate. We're also consequently getting high levels of 304 responses in the HTTP front-end.

The lesson here is that this lever is there for you to use to optimize a production system. For a system doing lots of ongoing computation, a non-zero cost threshold will eliminate caching of trivially recomputable state. But for long-lived pseudo-static systems it may pay to reduce the threshold to zero and let the cache work out what's useful based upon hits and load.
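To make the admission rule concrete, here's a minimal sketch (this is an illustration of the idea, not NetKernel's actual cache implementation) of a cache that only admits representations whose computation cost exceeds a configurable threshold...

```python
# Illustrative sketch (not NetKernel's actual cache code): a cache that
# only admits representations whose computation cost exceeds a threshold.
class CostThresholdCache:
    def __init__(self, cost_threshold_us=500):
        self.cost_threshold_us = cost_threshold_us  # microseconds
        self.store = {}

    def put(self, identifier, representation, cost_us):
        # Cheap-to-recompute state is not worth caching when the
        # threshold is non-zero; a threshold of zero admits everything.
        if cost_us >= self.cost_threshold_us:
            self.store[identifier] = representation

    def get(self, identifier):
        return self.store.get(identifier)

# With the default threshold a 20us pseudo-static resource is rejected...
default_cache = CostThresholdCache(cost_threshold_us=500)
default_cache.put("res:/static/page", "<html>...</html>", cost_us=20)
assert default_cache.get("res:/static/page") is None

# ...but with the threshold lowered to zero it is cached, so its
# hashcode can be served as an ETag by the HTTP transport.
tuned_cache = CostThresholdCache(cost_threshold_us=0)
tuned_cache.put("res:/static/page", "<html>...</html>", cost_us=20)
assert tuned_cache.get("res:/static/page") == "<html>...</html>"
```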

NoSQL and ROC

For some time I've been getting friendly prompting by Arun Batchu at BestBuy to look into providing some tools for accessing NoSQL stores from NK. In particular I was guided to look at Riak, a distributed key-value persistence store with circular node topology and redundant read/write capability. The Riak design intent is to trade strict consistency (settling for eventual consistency) in return for replication and redundancy.

So this week I set up a 4-node cluster and knocked up an initial implementation of some ROC tools to work with it. So far I have a transreptor for configuration to support either HTTP or Protocol Buffers client pools with unlimited nodes, and a complete set of CRUD operations on Riak's bucket/key values.
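For readers unfamiliar with Riak's key/value interface, here's a rough sketch (independent of the NK module; the bucket/key names and base URL are hypothetical) of how CRUD operations map onto request shapes in Riak's HTTP API...

```python
# Rough sketch of how CRUD operations map onto Riak's HTTP API.
# Bucket/key names and the base URL are hypothetical; request shapes
# follow Riak's documented /riak/<bucket>/<key> scheme.
def riak_request(operation, bucket, key, value=None,
                 base="http://localhost:8098"):
    url = "%s/riak/%s/%s" % (base, bucket, key)
    if operation in ("create", "update"):
        # PUT stores the value under bucket/key
        return ("PUT", url, {"Content-Type": "text/plain"}, value)
    elif operation == "read":
        return ("GET", url, {}, None)
    elif operation == "delete":
        return ("DELETE", url, {}, None)
    raise ValueError("unknown operation: %s" % operation)

method, url, headers, body = riak_request("create", "demo", "key1", "hello")
assert method == "PUT" and url.endswith("/riak/demo/key1")
```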

This is only a first cut, so it's not something I'd feel comfortable putting in the repositories, but I'd welcome feedback and suggestions. So here's the nosql:riak module and its unit tests...

These are modules, so you'll need to download and add entries for them in modules.xml to install them. After installation there's some reasonably comprehensive documentation on the capabilities so far, available here...

http://localhost:1060/book/view/book:urn:org:netkernel:nosql:riak/

This release constitutes a minimal feature set and, so far, the tools are pretty solid and are passing some extensive unit tests. However this is still a work in progress. For example, I've implemented a set of index/search endpoints which are included in the jar but commented out, as I've discovered that Riak requires quite a lot of knowledge of the backend storage architecture, and indexes are not supported by the default Bitcask store.

Experience so far is that Protocol Buffers is about twice as fast as HTTP. But I'm generally a little less than overwhelmed by the performance in general. I have my test cluster running on four virtual hosts on an 8-core box, but the best response for a small test read is still 9ms (with the PB client; 18ms over HTTP).

I'll give Riak the benefit of the doubt for now; as I said, there seem to be some dark arts in actually setting up the cluster optimally. But by comparison, the same host server runs our development Postgres and MySQL DBs on an equivalent virtual host, and these give us sub-millisecond access to BLOBs.

If you have experience with Riak and can tell me if this is par for the course or needs some tuning please let me know.

Of course, as Arun keeps telling me, the point is not whether you have a little read latency on the data from Riak: once it's in the ROC domain in NK it will be cached, used in composite resource architectures and cached again in derivative resources. So the latency will be amortized away anyway. To paraphrase Arun, "NK makes an ideal middleware for NoSQL".

Actually the reason for my disappointment was not the first-order use of the key-value store for user data. When I worked out how it hangs together I had visions of a distributed golden thread pattern where the thread state was stored (in an eventually consistent manner) in Riak. Now that would have been really cool. But golden thread expiries are evaluated every time a cache key is examined - so think what multiple 9ms latency would do to the overall system performance!
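To make that performance concern concrete, here's a back-of-envelope sketch (the request rate and keys-checked-per-request figures are assumed purely for illustration) of what a networked expiry check would cost if every cache-key examination paid the measured 9ms round trip...

```python
# Back-of-envelope sketch: cost of evaluating a golden-thread expiry
# over the network on every cache-key examination. The request rate and
# keys-checked-per-request figures are assumed for illustration.
round_trip_ms = 9               # measured best-case Riak read (PB client)
keys_checked_per_request = 50   # hypothetical cache-key examinations
requests_per_second = 100       # hypothetical load

# Serialized expiry checks alone would cost this much time per second:
expiry_cost_ms_per_s = round_trip_ms * keys_checked_per_request * requests_per_second
assert expiry_cost_ms_per_s == 45000  # 45s of latency per wall-clock second
# i.e. the system could never keep up - which is why expiry functions
# must be in-process or sub-millisecond (as with NKP on a fast network).
```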

This idea of networked expiry functions isn't just out there in fantasy land though; it's viable and used already. For example, resources requested over the NKP protocol acquire a distributed expiry function which, on a Gigabit network backbone, shows negligible latency and so makes the NKP distributed dependency model transparent.

Anyway, the world is awash with NoSQL products. So undoubtedly over time we'll explore further and look at tooling for the popular models. Again, your insight and feedback would really help here.

Incidentally, Tom has his finger on the NoSQL pulse and was way ahead of me: he has a chapter dedicated to integrating and using MongoDB with NK in his book (see below)...

Tom's Book: Draft 8 and NK5 Ready

Here's a note I received last week from Tom Geudens. I should have posted this then but it got overlooked - sorry Tom...


The 'Practical NetKernel Book', also known as 'Hello NetKernel' and 'NetKernel in Action' is still in its 8th iteration ... but has been updated completely to the new and shiny NK5 release.

Chapter 6 on DPML is also well on the way, progress is steady.

You can download iteration 8 - NK5 from

http://www.netkernelbook.org/serving/pdf/hello_netkernel_nk5-0.8.pdf

or if you want to follow along and have the latest 'build'

http://www.netkernelbook.org/serving/pdf/hello_netkernel_nk5.pdf

As always, your feedback and input is highly appreciated. You can send it to tom(dot)geudens(at)hush(dot)ai

Round And About Silicon Roundabout

I've decided I need to get out more (you knew that already). Next Wednesday I'm going to be round and about the Silicon Roundabout area of London. If you'd like to meet up just drop me a line or ping me on twitter (@pjr1060).

For those unfamiliar with Si-Roundabout, this is emerging as a self-selecting hub for dot-com businesses in the UK. The name owes a lot to British tongue-in-cheek humour - this area of London is unremarkable and its most notable feature is an ugly roundabout.

Having worked in Si-Valley and being a technology entrepreneur I fully appreciate the virtues of this arrangement. Drab surroundings, no sunlight and 300 days of overcast rain are a very good thing and make being indoors hacking away seem the better option - you can take your palm trees, vineyards, citrus orchards, good food and sunshine... oh wait, better stop myself there... (see next item)...

LAWebSpeed NetKernel Workshop: 9th November, Santa Monica, CA

Our friends in California (you know, the place with the sunshine, vineyards etc etc) are having a NetKernel meetup in the LA area...

http://www.meetup.com/LAWebSpeed/events/36673312/

Joe Devon is the instigator, and Brian Sletten the presenter. Both kick-ass NK ninjas. As it says on the web-announcement...

"Fasten your seat belts. Brian will be doing a hands-on Workshop, showing us a thing or two about NetKernel."

My bet is that they won't be happy just showing "a thing or two" - probably "three, four or several". Looks like only a few places left, so sign-up quick.

On Scope

Last week I gave a short tribute to John McCarthy and teased with a suggestion that LISP and ROC were related. I think the broader points...

  • constrained and uniform data structures as representation models
  • the philosophy of building small units and composing into composite solutions

...are somewhat self-evident but that...

  • code is a resource processable within the process
  • the architectural power of dynamic scope

...deserve some elucidation. Today let's think about scope...

What's its name? No. Where's its name?

So what is scope? In programming language terms, scope is the mechanism by which the citation of a variable name in a program is mapped to the instance value of that variable.

Classically there are two models for implementing the scoping mechanism. Dynamic scope and lexical scope.

With dynamic scope, the value associated with a name is dynamically built up as the function call stack is traversed: the most recent mapping of a name to a value is used when a variable name is referenced. As we'll see, dynamic scope, whilst consistent to the computational system, can be conceptually ambiguous for a developer writing a program, and so over time dynamic scope has not featured in mainstream languages.

Lexical scope is conceptually simple for a developer: it behaves strictly and exactly as written in the code (lexical: of or pertaining to the words of a language). The nearest enclosing definition of a variable name in the source text masks any other use of that name.

The classic example of the difference between the two models is illustrated with this pseudo-code...

int x = 0;

function int f()
{
    return x;
}

function int g()
{
    int x = 1;
    return f();
}

What do you get if you invoke g()? We're all more familiar with lexical scope, so with a lexically scoped language we quickly see that g() will invoke f(), f() will discover that the nearest lexically scoped reference to x is the outer global one, and so g() returns 0.

However in a language with dynamic scope the following happens: x=0 is placed in the stack in the global context; g() is invoked and places a second value for x in the stack; f() is invoked. f() now resolves x through the call stack, finds the most recent value of x is 1, and returns 1. So g() returns 1.

Same code, opposite answers.
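The pseudo-code above can be made concrete. Python is lexically scoped, so a direct translation returns 0; a small explicit stack of bindings (a toy emulation, not how any real dynamically scoped interpreter is implemented) then demonstrates the dynamic-scope answer of 1...

```python
# --- Lexical scope: Python's native behaviour ---
x = 0

def f():
    return x          # resolves to the global (outer lexical) x

def g():
    x = 1             # local x; invisible to f() under lexical scope
    return f()

assert g() == 0

# --- Dynamic scope, emulated with an explicit stack of bindings ---
stack = [{"x": 0}]    # global frame

def lookup(name):
    # Search the call stack top-down for the most recent binding.
    for frame in reversed(stack):
        if name in frame:
            return frame[name]
    raise NameError(name)

def f_dyn():
    return lookup("x")

def g_dyn():
    stack.append({"x": 1})   # g's frame binds x = 1
    try:
        return f_dyn()       # f sees g's binding: dynamic scope
    finally:
        stack.pop()

assert g_dyn() == 1          # same code shape, opposite answer
```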

It's important to emphasise that nothing is wrong here; both computations are valid. But what we are seeing is that how we name things, and how those names map to their values, is a fundamental property of any computational system.

The reason that lexical scope has come to dominate language design is that it's simpler for developers to understand. But it also allows libraries of functions to be written in which the names used within the library are isolated from any possibility of collision with the variable names in use by any program code that calls the library.

All very good, but there's one pretty significant problem: you cannot normalize computational state with lexical scope. NetKernel's systemic memoization must be doing something different then...

LISP, Scope and the Art of the Interpreter

John McCarthy's first version of LISP had dynamic scope. Later variants introduced lexical scope and later standards formalized on lexical scope.

Steele and Sussman wrote a beautiful paper, The Art of the Interpreter, in 1978, which progressively tells the story of scope and its implications for language design. The details are hard going (especially if, like me, you're on very remote terms with LISP) but, stepping above the details, you'll get a good picture of the evolution of thinking in this area in the 1970s.

Maybe the reason I particularly enjoyed reading this was that Tony and I went through a similar (but different) progression in working out NetKernel 4's (and 5's) general ROC abstraction - although we weren't reinventing the wheel: in ROC we are definitely doing something new with identifiers, scope and resolution.

Scope in ROC

It's a somewhat strange thing to try to take the classical definition of scope and apply it to ROC, since in ROC the identity of state exists outside of language.

That other ROC system, the world wide web, dodges the bullet by having one address space - a single global scope.

In NetKernel we have no such luxury: we have given ourselves the freedom of multi-dimensional address spaces, but at the price of needing to define an equivalent to scope for ROC.

In ROC we call the mechanism by which an identifier is resolved to an endpoint 'resolution'. Way back in 2009 Tony wrote an article on this...

http://durablescope.blogspot.com/2009/10/roc-dynamic-resolution.html

It's also extensively documented in NK, and we provide a set of video tutorials covering resolution in great detail (linked in the docs).

Taking it as read that you've looked at Tony's article and/or read/watched the docs and tutorials etc, we can offer the following definition of scope in ROC: scope is the state representing the accumulated spatial structure acquired during both the resolution *and* the execution phase of a resource request.

You might see that this is really very different to a programming language's concept of scope - not least because, if you think about it for a moment, you see that in ROC there is no such thing as a variable. There are resource identifiers, but whenever they are referenced they are resolved to an endpoint. (Imagine a language in which there are no variables, only functions and aliases for functions.) This means that from a set-theoretic perspective everything in ROC's logical domain is a set; there are no "atoms" as in LISP.
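A toy model may help here (this is a deliberately crude illustration, nothing like NetKernel's actual resolver): identifiers are never variables - every reference is resolved, through a chain of spaces, to an endpoint, which is then invoked...

```python
# Toy model (nothing like NetKernel's actual resolver) of the idea that
# identifiers are never variables: every reference is resolved, through
# a chain of spaces, to an endpoint (a function), which is then invoked.
class Space:
    def __init__(self, endpoints, imports=()):
        self.endpoints = endpoints   # identifier -> endpoint function
        self.imports = imports       # other Spaces searched next

    def resolve(self, identifier):
        if identifier in self.endpoints:
            return self.endpoints[identifier]
        for space in self.imports:
            endpoint = space.resolve(identifier)
            if endpoint is not None:
                return endpoint
        return None

library = Space({"res:/x": lambda: 0})
app = Space({"res:/x": lambda: 1}, imports=(library,))

# Resolution walks the requestor's spatial structure, not a call stack:
assert app.resolve("res:/x")() == 1      # app's own endpoint masks the import
assert library.resolve("res:/x")() == 0  # resolved in the library space alone
```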

I am probably losing you. I'm probably losing myself. Time for something concrete...

demo-scope

As I was reading up on some history and thinking about this article, I decided to play around with a set of patterns to show the scope-like nature of the ROC address space. It seemed like a good idea to take the classical g()/f() double-scope pseudo-code (shown above) and implement the equivalent in the ROC domain. For good measure, because we have a lot of freedom in the ROC domain, I tried to provide an exhaustive set of examples, including pass-by-reference and pass-by-value examples for f(x) too.

You can install the demo from Apposite as package demo-scope. After installation the documentation is located here...

http://localhost:1060/book/view/book:urn:org:netkernel:demo:scope/

Why not take a look and play around with the examples? You'll see that by default NetKernel resolution exhibits the equivalent of lexical scope (although, since resolution is dynamic, the scope is always dynamic!).

You'll also see that if you reference a function as an import within a space (something that is not easy to do in a programming language) then that pattern will exhibit something akin to classical dynamic scope.

You'll also see that pass-by-value and pass-by-reference exhibit lexical scope. Next time I'll explain how that is possible - remember that with pass-by-value we are not passing state; we are modifying spatial scope and passing by reference. Which could only possibly work if we imposed dynamic scope... or perhaps not... ;-)


Have a great weekend,

Comments

Please feel free to comment on the NetKernel Forum

Follow on Twitter:

@pjr1060 for day-to-day NK/ROC updates
@netkernel for announcements
@tab1060 for the hard-core stuff

To subscribe for news and alerts

Join the NetKernel Portal to get news, announcements and extra features.

NetKernel will ROC your world

Download now
NetKernel, ROC, Resource Oriented Computing are registered trademarks of 1060 Research

© 2008-2011, 1060 Research Limited