
NetKernel News Volume 3 Issue 4

November 18th 2011

What's new this week?

Catch up on last week's news here

Repository Updates

The following NetKernel 5.1.1 updates are available in the NKSE and NKEE repositories...

  • http-server-2.10.1
    • Added a default root servlet context to satisfy libraries that expect to be running within a regular servlet engine.
    • Added the ability to override all http headers. If you set a header value in your response it will be used in preference to any automatically assigned header that might have been set by the HTTP bridge.
  • nkse-dev-tools-1.41.1.nkp.jar
    • Prevented exposure of internal pds:/ resources from this module.
  • nkse-visualizer-1.15.1.nkp.jar
    • Fixed javascript event binding so that paginated traces are bound to their menu when the list of request traces updates. Thanks to Glenn Proctor for reporting this.

The following update is available in the NKEE repository...

  • nkee-http-server-2.7.1.nkp.jar
    • Added a default root servlet context to satisfy libraries that expect to be running within a regular servlet engine.

RESTOverlay Specification

Now that we're out the other side of the NK5 release, we have some bandwidth to give attention to value-added features. One of the most common discussions we have with NK users is about the interplay between the HTTP domain and the ROC domain within NK, in particular when implementing REST services.

As you know, the design philosophy of NK is the same as Unix - create specific components and combine as necessary for higher-order solutions. The NK HTTP stack is a case in point.

The HTTPTransport is nothing but an "HTTP event handler" that constructs and issues a request into its fulcrum space. This request provides the low-level HTTPServletRequest and HTTPServletResponse as arguments.

Therefore, technically, it's possible to provide your own handler layer after the HTTPTransport which would couple in any arbitrary Java REST library and provide it with these primary HTTP request/response objects.

But in the out-of-the-box HTTP fulcrums we combine the HTTPTransport with an ROC-focused abstraction over the HTTP primitives - the so-called "HTTPBridge". The design goal of this tool is to expose the low-level state of the HTTP request/response as true ROC-domain resources with SOURCE/SINK identifiers. It also endeavours to "just take care of" some of the domain-specific protocol machinations of HTTP. For example, it automates ETag generation, 304 responses for cached items, etc.

Also, controversially to some, it always issues its request into the ROC-domain with a SOURCE verb. I won't re-open this can of worms, other than to say that, to NK, "HTTP is outside" and, to the ROC-domain, an HTTP request is just an external event. The transport and bridge are inside the ROC-domain and they SOURCE a resource in order to work out how to satisfy the event.

However, whilst the Transport/Bridge combo goes so far, it is not and never was intended as the final answer (that wouldn't be very Unix-like). As with the layering of the HTTPBridge over the bare HTTPTransport, it is perfectly reasonable to consider adding another higher-order layer above the HTTPBridge.

So, let's open a new can of worms...

RESTOverlay

We've been kicking around some ideas and I want to open it up and start a discussion on what a new RESTOverlay could provide.

Earlier in the summer we added a cool feature with an eye on this moment. We made it possible for any endpoint to declare arbitrary user-specified metadata, using a declarative <meta> tag on its declaration. This gives NK the power of arbitrarily rich annotation of endpoints.

JSR-311 uses annotations to implement HTTP-method binding. Essentially this is a routing-pattern whereby a particular method is selected for a given HTTP request based upon a certain selection algorithm.

  • What if we had a RESTOverlay that would provide a similar routing?
  • What if we came up with a user-defined metadata resource specification that you could declare on your endpoints and which would provide the context for the RESTOverlay routing decision?
  • What if the RESTOverlay could route the request to the endpoint based upon the HTTP method (aka verb)?
  • What if the RESTOverlay could examine the Accept headers of the HTTP requests and take an intelligent decision about which resource best satisfied the expressed desire of the HTTP client? In short, what about dynamic content negotiation?
  • What if the RESTOverlay could route based upon supported languages?
  • What if the RESTOverlay could route based upon user-agent?
  • What if the RESTOverlay could automatically bounce an HTTP client to the SSL channel on https:// if the request came in on http:// ?
  • What if the RESTOverlay could examine the encoding preference of the client and automatically compress the representation if the called endpoint indicated that its representation is compressible?
  • What if the RESTOverlay could dynamically generate MD5 or SHA ETag hashcodes and allow the HTTPBridge to deal with 304 responses?
  • What if.... well that's why this is a discussion... What should its capabilities be? What would your dream REST service layer do?

Design Plan

Setting aside the feature set of this hypothetical magic layer, how would it be implemented? Well, here we can be more definite.

Basically the design will comprise an overlay which will enclose a space in which the candidate endpoints would be instantiated (either directly, by mapper, by imports, or by dynamic imports).

The overlay would have a configuration resource. Initially we think the only config it would need would be a base path to provide the context for the REST services to live below.

Here's an example, which sets up a RESTOverlay for res:/basepath/*. The space inside contains an import of the endpoint implementations...

<overlay>
  <prototype>RESTOverlay</prototype>
  <config>
    <path>/basepath/</path>
  </config>
  <space>
    <import>
      <uri>urn:expt:rest:impl</uri>
    </import>
  </space>
</overlay>

Here's how you might go about declaring REST metadata for your endpoints to get them noticed by the RESTOverlay. Notice that <simple> is exposing the URL endpoint identifier which will be prefixed by the RESTOverlay path. So these example endpoints will both match an external request for res:/basepath/foo - but the RESTOverlay will select the appropriate one based upon its negotiated routing algorithm.

<config>
  <!--GET foo-->
  <endpoint>
    <meta>
      <rest>
        <simple>foo</simple>
        <verb>GET</verb>
        <produces>text/xml</produces>
      </rest>
    </meta>
    <grammar>
      <active>
        <identifier>active:fooGET-text/xml-en</identifier>
      </active>
    </grammar>
    <request>
      <identifier>active:groovy</identifier>
      <argument name="operator">res:/resources/fooGET.gy</argument>
    </request>
  </endpoint>
  <!--POST foo-->
  <endpoint>
    <meta>
      <rest>
        <simple>foo</simple>
        <verb>POST</verb>
        <produces>text/xml</produces>
        <consumes>text/xml</consumes>
      </rest>
    </meta>
    <grammar>active:fooPOST-text/xml-en</grammar>
    <request>
      <identifier>active:groovy</identifier>
      <argument name="operator">res:/resources/fooPOST.gy</argument>
    </request>
  </endpoint>
</config>

One requirement is that each implementation endpoint must have a unique grammar. The RESTOverlay has to be able to differentiate them and, of course, we're not actually trying to request the endpoint - remember, we're in the ROC-domain and the RESTOverlay is trying to request a resource. So each resource in the impl space has to be differentiated with a different identifier. (Actually this is no different from JSR-311 - each method has to have a different signature even if the URL is the same.)
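To make the uniqueness requirement concrete, here is a minimal sketch of the kind of lookup the overlay might perform. It is only a toy model of the draft: RestMeta and RestRouteTable are hypothetical names, not NetKernel classes, and the real routing would also weigh the other metadata (produces, lang and so on).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of one endpoint's <rest> metadata; not a NetKernel class. */
final class RestMeta {
    final String simple;      // content of <simple>, relative to the overlay base path
    final String verb;        // content of <verb>
    final String identifier;  // the unique grammar identifier of the endpoint

    RestMeta(String simple, String verb, String identifier) {
        this.simple = simple;
        this.verb = verb;
        this.identifier = identifier;
    }
}

/** Toy route table: maps (external path, HTTP method) to a unique internal identifier. */
final class RestRouteTable {
    private final String basePath;
    private final List<RestMeta> endpoints = new ArrayList<>();

    RestRouteTable(String basePath) {
        this.basePath = basePath;
    }

    void register(RestMeta meta) {
        endpoints.add(meta);
    }

    /** Returns the identifier to request, or null when no endpoint matches. */
    String resolve(String path, String httpMethod) {
        for (RestMeta meta : endpoints) {
            boolean samePath = path.equals(basePath + meta.simple);
            boolean sameVerb = meta.verb.equalsIgnoreCase(httpMethod);
            if (samePath && sameVerb) {
                return meta.identifier;
            }
        }
        return null;
    }
}
```

Registering the two foo endpoints from the example and resolving res:/basepath/foo with GET or POST yields two different identifiers, which is why the grammars cannot be shared.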

One feature we want to leverage is NK's grammars. So we can use the new simple grammar and have the RESTOverlay construct logical endpoints which combine the base path with the <simple> content - to construct full grammars. If the grammar has argument fields then the implementation endpoint should have an interface that accepts those argument names...

<config>
  <!-- GET USA/New York/New York -->
  <endpoint>
    <meta>
      <rest>
        <simple>{country}/{state}/{city}</simple>
        <verb>GET</verb>
        <produces>text/json</produces>
      </rest>
    </meta>
    <grammar>
      <active>
        <identifier>active:countryStateCity</identifier>
        <argument name="country" />
        <argument name="state" />
        <argument name="city" />
      </active>
    </grammar>
    <request>
      <identifier>active:groovy</identifier>
      <argument name="operator">res:/resources/cscGET.gy</argument>
      <varargs />
    </request>
  </endpoint>
</config>
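As a rough illustration of how the fields in <simple> could become named arguments, the sketch below turns a template into a regular expression and extracts the values. SimpleTemplate is a hypothetical helper, not NetKernel's grammar engine, and it assumes that each {field} matches exactly one path segment (no "/" inside a value).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy matcher for templates such as {country}/{state}/{city}. */
final class SimpleTemplate {
    private static final Pattern FIELD = Pattern.compile("\\{(\\w+)\\}");

    private final List<String> names = new ArrayList<>();
    private final Pattern regex;

    SimpleTemplate(String template) {
        StringBuilder sb = new StringBuilder();
        Matcher m = FIELD.matcher(template);
        int last = 0;
        while (m.find()) {
            // Quote the literal text between fields, then capture one path segment.
            sb.append(Pattern.quote(template.substring(last, m.start())));
            sb.append("([^/]+)");
            names.add(m.group(1));
            last = m.end();
        }
        sb.append(Pattern.quote(template.substring(last)));
        regex = Pattern.compile(sb.toString());
    }

    /** Returns argument name to value, or null when the path does not match. */
    Map<String, String> match(String path) {
        Matcher m = regex.matcher(path);
        if (!m.matches()) {
            return null;
        }
        Map<String, String> args = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            args.put(names.get(i), m.group(i + 1));
        }
        return args;
    }
}
```

Matching "USA/New York/New York" against {country}/{state}/{city} gives the three arguments that the active:countryStateCity grammar above declares.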

As with the HTTPBridge, the RESTOverlay will also "do the right thing" with respect to 406 and even potentially 300 (multiple choices) responses. It will also, wherever possible, ensure that all the state generated inside the inner implementation space is cacheable and so allow the HTTPBridge to deal with 304s etc.
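For a feel of what the 406 decision involves, here is a simplified sketch of weighted Accept-header matching. AcceptNegotiator is a hypothetical name, the parsing is deliberately naive (it assumes well-formed media types and q values), and it is not how the HTTPBridge or any future RESTOverlay is implemented.

```java
import java.util.List;

/** Toy Accept-header negotiation; a simplification of the HTTP rules. */
final class AcceptNegotiator {

    /** Returns the best produced type, or null, which a caller could map to a 406 response. */
    static String choose(String acceptHeader, List<String> produced) {
        String best = null;
        double bestQ = 0.0;
        for (String candidate : produced) {
            double q = qualityFor(acceptHeader, candidate);
            if (q > bestQ) {
                bestQ = q;
                best = candidate;
            }
        }
        return best;
    }

    /** Quality the client assigns to a type; the most specific matching range wins. */
    static double qualityFor(String acceptHeader, String type) {
        String major = type.substring(0, type.indexOf('/'));
        double quality = 0.0;
        int bestSpecificity = -1;
        for (String part : acceptHeader.split(",")) {
            String[] fields = part.trim().split(";");
            String range = fields[0].trim();
            double q = 1.0; // default weight when no q parameter is given
            for (int i = 1; i < fields.length; i++) {
                String param = fields[i].trim();
                if (param.startsWith("q=")) {
                    q = Double.parseDouble(param.substring(2));
                }
            }
            int specificity;
            if (range.equalsIgnoreCase(type)) {
                specificity = 2;            // exact match, e.g. text/xml
            } else if (range.equalsIgnoreCase(major + "/*")) {
                specificity = 1;            // subtype wildcard, e.g. text/*
            } else if (range.equals("*/*")) {
                specificity = 0;            // full wildcard
            } else {
                continue;                   // this range does not cover the type
            }
            if (specificity > bestSpecificity) {
                bestSpecificity = specificity;
                quality = q;
            }
        }
        return quality;
    }
}
```

With an Accept header of "text/html;q=0.8, text/xml" and endpoints producing text/html and text/xml, the sketch picks text/xml; when nothing is acceptable it returns null, which is the 406 case.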

We think a RESTOverlay could go a long way to finally putting to REST (intended pun) the RESTless debate that has caused so much unREST.

The meta-data feature list so far looks like this...

  • <simple>foo</simple> - full support of simple grammars in resolution (with basepath prefix).
  • <verb>GET</verb> - select endpoint based on HTTP request verb
  • <produces>text/plain</produces> - weighted mimetype match on GET/POST request Accept header.
  • <consumes>text/xml</consumes> - mimetype match on POST/PUT request body type.
  • <lang>en</lang> - weighted I18N language routing.
  • <compression>gzip</compression> - Post-process compression negotiation
  • <userAgentPattern>.*Explorer.*</userAgentPattern> - match on user agent
  • <mustBeSSL/> - Automatic 302 redirect to https:// if protocol is http://
  • <mustBeAuthenticated/>
  • <ETag>md5</ETag> - generate streamed checksum ETag header
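As an illustration of the last item, the sketch below computes an MD5-based ETag and compares it with a client's If-None-Match value. EtagSketch is a hypothetical name; for brevity it digests a complete byte array rather than a stream, it ignores weak validators, and in the design above the 304 handling itself would be left to the HTTPBridge.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Toy ETag helper based on an MD5 digest of the representation bytes. */
final class EtagSketch {

    /** Returns a quoted, hex-encoded MD5 digest of the representation. */
    static String md5Etag(byte[] representation) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(representation);
            StringBuilder sb = new StringBuilder("\"");
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.append('"').toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    /** True when the If-None-Match value lists this ETag (or "*"), i.e. a 304 could be sent. */
    static boolean notModified(String ifNoneMatch, String etag) {
        if (ifNoneMatch == null) {
            return false;
        }
        for (String candidate : ifNoneMatch.split(",")) {
            String value = candidate.trim();
            if (value.equals("*") || value.equals(etag)) {
                return true;
            }
        }
        return false;
    }
}
```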

Next Steps

As far as NK is concerned, all the machinery to put together this working specification of the RESTOverlay is in place. In fact it's only about a day's work to turn this draft into a real tool. But this is a stone with which we want to kill as many birds as possible. What do you think? What would make this the killer REST plumbing you lot deserve?

I've started a topic on the forums where we can discuss this...

http://www.1060.org/nk4um/topic/858/

...meanwhile I'll go back under my ROC.

Tip: Java Memory PermGen Setting

One of the tuning parameters that it's worth knowing about is the Perm Gen memory size of your JVM. Perm Gen is the semi-static allocation of memory for storing classes, method metadata etc. An explanation of Perm Gen and why a separate area of heap is used for classes is provided in this article.

The reason it's important to know about this is that, in a dynamic modular architecture like NetKernel, with its large diversity of available languages and ROC-libraries, the range and variety of classes that the system could potentially load is significant. Quite possibly more significant than the defaults assumed by the Java engineers when they decided the default ratios for heap and Perm Gen.

The upshot is that if your Perm Gen gets full, even with plenty of regular heap, you will see the JVM start to thrash (100% CPU use by the GC) and may eventually see an Out of Memory exception. So it's advisable to keep an eye on the Perm Gen allocation, which you can do with the detailed system status...

http://localhost:1060/tools/sysinfo

The memory bars labeled "Perm Gen" will show you the current use and maximum capacity for your JVM instance.

Perm Gen does get GC'd but the nature of classes is that they tend to hang around once they've been loaded. So ensuring you have a reasonable headroom on Perm Gen is good engineering practice.
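If you'd rather check the headroom from code than from the sysinfo page, the standard java.lang.management API exposes the same memory pools. The sketch below is only an illustration: NonHeapReport is a hypothetical class name, and the pool names (for example one containing "Perm Gen") vary by JVM vendor and version, so treat the names in the output as JVM-specific.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Lists the JVM's non-heap memory pools with their current use and maximum. */
final class NonHeapReport {

    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.NON_HEAP) {
                continue; // skip the regular heap pools
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            long max = usage.getMax(); // -1 when the JVM reports no defined maximum
            String limit = (max < 0) ? "no limit reported" : (max / (1024 * 1024)) + " MB max";
            long usedMb = usage.getUsed() / (1024 * 1024);
            lines.add(pool.getName() + ": " + usedMb + " MB used, " + limit);
        }
        return lines;
    }
}
```

Printing each line of NonHeapReport.describe() gives a quick text version of the memory bars described above.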

For example, recently I added yet another language runtime module, last week's EXI library and a few other things to my development system, and when running our total set of unit tests I maxed out my JVM's default 128MB of Perm Gen - even though my regular heap was less than 30% full.

If you think your default Perm Gen is too tight (the default varies by JVM and by generation of JVM, so you'll have to look at the sysinfo report for your instance), you can add and tune the following JVM option in the jvmsettings.cnf file in the bin/ directory of your NK installation...

-XX:MaxPermSize=256m

For more details about memory tuning see Sun's FAQ.

Semantic Markup at BestBuy

Congratulations to Jay Myers, Lead Web Development Engineer at BestBuy.com, for his cutting edge work augmenting the site with machine readable formats - which, to the non-technical person, translates as "selling more stuff".

Also for his impeccable taste in "transformation mechanisms", about which, I am sworn to secrecy...

http://semanticweb.com/schema-org-microdata-rdfa-and-black-friday-at-bestbuy_b24643

Good luck for Black Friday, BBY team.

On Scope - Part 2

At the start of the month, prompted by thoughts of Lisp and its evolutionary progression through dynamic and lexical scoping models, we started an exploration of ROC and scope. We discovered that in ROC we have to recast the idea of scope to be extrinsic, outside of the domain of language, and spatial.

With the demo-scope examples we showed that fundamentally NetKernel has a dynamic scope model, but yet to the developer, the scope feels as though it behaves the same as lexical scope. If you're trying to maintain your classical perspective of scope, as pertaining to variables within a language, this might well seem paradoxical.

Before digging into how this trick is pulled off, we need to understand why adopting dynamic scope is important.

It is a long-established practice of computation that looking up a computational value from a set of pre-computed values is more efficient and less error prone than performing the computation algorithm directly. Often the computational cost of lookup for a stored value is cheap, typically constant to logarithmic in time, whereas direct computation is rarely as efficient and typically is linear or worse. A clear demonstration of this simple model for efficient computation was given by Napier with the publication in 1614 of the first table of logarithms in Mirifici Logarithmorum Canonis Descriptio.

Log tables are a resource oriented computing system!

In languages it is possible to introduce memoization (either explicitly coded into a function, or implicitly provided by language extensions) to hold onto the return values of functions in order to reuse them for subsequent invocations. But this has limited applicability, since it demands that any function implementing memoization must have referential transparency - that is, it cannot use any external state other than that which it receives as its arguments. Or put another way, it must not have side-effects; if it did, its returned state would be ambiguous.
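For readers who haven't met language-level memoization, here is a minimal sketch of the explicit form in Java. Memoizer is a hypothetical helper; it is only safe to wrap a function that is referentially transparent, which is exactly the limitation described above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Wraps a pure function with a lookup table of previously computed results. */
final class Memoizer<A, R> implements Function<A, R> {
    private final Function<A, R> pure;
    private final Map<A, R> table = new ConcurrentHashMap<>();

    Memoizer(Function<A, R> pure) {
        this.pure = pure;
    }

    @Override
    public R apply(A argument) {
        // Compute on the first request only; afterwards the stored value is returned.
        return table.computeIfAbsent(argument, pure);
    }
}
```

Wrapping something like x -> Math.log10(x) gives a small, lazily filled log table: the second request for the same argument is a lookup rather than a computation.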

Our conjecture is that you can minimize total computational state - across the entire computation system.

But, clearly lexical scope does not provide sufficient information to disambiguate state across an entire system, since it innately insists that identity is localized (within the lexical reach of the definition of a stateful entity).

To achieve systemic minimisation of state, you must be able to identify state non-locally. Therefore, you can see that the only way you can do this is if you allow scope to accumulate as you perform a computation - to have a dynamic model of scope.

In fact you might also see that, once you think about the fundamental nature of state, scope actually reveals its real nature. Scope is actually just a dimension of identity.

Within NetKernel, an identity of a resource consists of both the resource identifier *and* the dynamic scope within which that identifier was resolved.
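One way to picture this is as a cache key with two parts. The sketch below is a toy model under that assumption, not NetKernel's actual cache: ScopedIdentity and ScopedCache are hypothetical names, and the scope is reduced to an ordered list of made-up space identifiers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Toy key: a resource identifier plus the ordered spaces it was resolved in. */
final class ScopedIdentity {
    final String identifier;
    final List<String> scope;

    ScopedIdentity(String identifier, String... spaces) {
        this.identifier = identifier;
        this.scope = new ArrayList<>(Arrays.asList(spaces));
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof ScopedIdentity)) {
            return false;
        }
        ScopedIdentity that = (ScopedIdentity) other;
        return identifier.equals(that.identifier) && scope.equals(that.scope);
    }

    @Override
    public int hashCode() {
        return Objects.hash(identifier, scope);
    }
}

/** Toy cache in which the same identifier in two scopes is two distinct entries. */
final class ScopedCache {
    private final Map<ScopedIdentity, Object> representations = new HashMap<>();

    void put(ScopedIdentity id, Object representation) {
        representations.put(id, representation);
    }

    Object get(ScopedIdentity id) {
        return representations.get(id);
    }
}
```

Storing res:/config once under scope (urn:space:a, urn:space:root) and once under (urn:space:b, urn:space:root) keeps both values, whereas a key made from the identifier alone would let one overwrite the other.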

It's not easy to picture, but way back when we were working out the General-ROC abstraction I played around with a notation that gives a little insight into what's going on. The gist of it is this: each space is itself identified. The identity doesn't really matter - it's relative - but it is consistent and sufficient to disambiguate all state within the system...

This is one of a series of diagrams which actually depict the entire scope+identity state of a very early prototype of the Wink wiki application. You can see this is historical (circa 2007) - I was still using NK3's ffcpl:/ and had hypothetical endpoints like the "rewrite overlay" which we eventually implemented as the mapper overlay.

One of the surprising discoveries we made was that it's actually very valuable to keep two forms of dynamic scope. The first is the resolution scope, which is the scope prior to resolution of a request. The second is the resolved scope, which is the scope available to an endpoint and which is used should it go on to issue further requests.

By maintaining these two dynamic scopes, we are able to unambiguously track all state in the system and so achieve systemic memoization. In essence, NetKernel is a system where it is not possible to have side-effects, since all state is identified and all interactions with state are known.

Loose Ends

Before I forget, I still need to resolve the apparent paradox of lexical behaviour but dynamic scope.

Hopefully you can see in the scope demos that the pass-by-reference examples rely on the fact that the name of the resource to compute is part of the identifier of the requested resource. Therefore the invoked endpoint can simply issue a request for the named resource. The accumulated dynamic scope may include more than one endpoint for that same resource identifier, but the request will essentially be resolved in lexical order.

The subtlety comes up with pass-by-value. You'll recall that with pass-by-value we are actually passing-by-reference but with the state held in a new transiently injected space. We are manipulating scope on the fly.

But, we also pull another trick. Every pass-by-value resource is given a unique identifier. So when the requested endpoint wishes to use that state it can request its identifier and will only ever get that unique resource. This is really weird when you think about it classically - because what I just said, would translate in the language domain, as "every variable has a unique name".

How the heck do you ever actually do anything then? Well, this is where the beauty of arguments comes in. An argument is a local alias for a resource identifier. If you write your endpoint to work with the argument name, you never have to care what the real identifier of the resource is. So pass-by-value is turned into pass-by-reference with a manipulated scope and a unique resource identifier. And when you do this, you can see that we maintain a totally extrinsic model for all state in the system.
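The aliasing can be mimicked in a few lines. This is a toy model only: ToyRequest is a hypothetical class, the "pbv:" prefix plus a UUID is an invented identifier scheme for illustration, and a plain map stands in for the transiently injected space.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Toy model of pass-by-value: state lives in a transient space under a unique identifier. */
final class ToyRequest {
    private final Map<String, Object> transientSpace = new HashMap<>(); // identifier -> state
    private final Map<String, String> arguments = new HashMap<>();      // argument name -> identifier

    /** Stores the value under a fresh identifier and aliases it with the argument name. */
    void addArgumentByValue(String name, Object value) {
        String identifier = "pbv:" + UUID.randomUUID(); // hypothetical identifier scheme
        transientSpace.put(identifier, value);
        arguments.put(name, identifier);
    }

    /** The real, unique identifier hidden behind an argument name. */
    String identifierOf(String name) {
        return arguments.get(name);
    }

    /** What an endpoint does: ask for the argument by name, never by its real identifier. */
    Object sourceArgument(String name) {
        return transientSpace.get(arguments.get(name));
    }
}
```

Two requests can both carry an argument called operand; the endpoint code is identical for each, yet the underlying identifiers, and therefore the state, never collide.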

What are the consequences for language design?

What are the consequences for language design? This is one of those areas that I'm really looking forward to seeing emerge. Once you take ROC for granted as the computational fabric, it gives you a whole raft of new opportunities for language design. Fundamentally, maintaining state and correlation of identity is no longer the first job of a language. The first job of a language is simply to be a state-machine coordinating requests for state and, where necessary, constructing new spaces for transient state.

Actually there already is a language which starts with the assumption of the existence of the ROC abstraction. It's DPML.

DPML is a language with no built-in functions, variables or "atoms". Everything is a resource, all state is obtained by resource requests. If DPML is told to assign state to a named variable (the @assignment attribute) it does the same thing as pass-by-value. It actually gives it a unique name and constructs a real ROC space for it. When that state is referenced in a DPML request - the scope includes the DPML variable state. So the invoked endpoint, when it wants to use that state, actually makes a request back into the operational context of the DPML language execution!!

It might seem bizarre, but when you step outside of language and consider identity and state as the first order concerns of computation, you are liberated beyond your wildest dreams. Anything you consider a feature of a language is something that is just a pattern in ROC: procedural versus functional design, aspect orientation, memoization, currying, etc.

You may say I'm a dreamer, but I'm not the only one. One day I hope you'll join me...


Have a great weekend,

Comments

Please feel free to comment on the NetKernel Forum

Follow on Twitter:

@pjr1060 for day-to-day NK/ROC updates
@netkernel for announcements
@tab1060 for the hard-core stuff

NetKernel, ROC, Resource Oriented Computing are registered trademarks of 1060 Research


© 2008-2011, 1060 Research Limited