NetKernel News Volume 2 Issue 17

February 18th 2011

What's new this week?

Repository Updates
Infinite Monkeys
Book Review: QI Book of General Ignorance

Catch up on last week's news here

Repository Updates

The following update is available in both the NKEE and NKSE repositories...

http-client 2.3.1
- Added <expectContinue> configuration option to disable "Expect: 100-continue" negotiation on entity-bearing methods with legacy HTTP 1.0 servers/proxies. When working with legacy servers this can eliminate an unnecessary fallback delay. The default is unchanged and the expect-continue optimisation is used. Thanks to Jeff Rogers at Findlaw.com for suggesting this.

Infinite Monkeys

To be or not to be? That is the question at the heart of ROC.

There's no need for me to tell you about the Infinite Monkey Theorem. Given enough time, or conversely enough Monkeys, it is proven that one of them will eventually type a copy of Hamlet.

Ay, but here's the rub... As Hamlet was only too well aware - we inhabit the finite envelope of reality...

Consider an ideal physical scenario: the largest possible set of monkeys, furnished with the most possible time. Could they reify a representation of Hamlet?

If we had as many monkeys as there are observable* particles in the universe (10⁴⁰), and each one typed 1000 characters per second for the life of the universe (10²⁰ seconds), the probability of "real" monkeys producing Hamlet would be 1 in 10^129,938†, which is a close approximation to zero.

Are we condemned to suffer the slings and arrows of [this] outrageous fortune? After all it is not zero, just very very very close to zero.

Unfortunately we monkeys have a very limited conception of large numbers. Any number with 129,000-odd zeros after it is really really big. So the reciprocal is near as damn it zero. It's still hard to get your head around - so consider this: you definitely would get a copy if you had 10^129,938 monkey universes working for you.

But it gets worse...

Notwithstanding that we've used up all the atoms in the universe just on monkeys and have nothing to write down the answer on, there's the problem of signal-to-noise. How would we ever find the one good representation if it ever did crop up? Like Tony's Death Star paradox, the information problem of measuring the monkeys kills us.

Computing a representation is one thing. But not knowing where it is when you have? That's just plain careless.

* "observable" gets us off the hook of the hypothetical "dark-matter monkeys"
† This is my number. The number quoted on Wikipedia is even less likely; it seemed wrong so I didn't believe them. This is fantasy monkey world so you can make your own number up. A basic calculation goes: the chance of one monkey getting Hamlet at random is 1 in 26 for each of the 130,000 characters in the play (discounting capitals, punctuation and whitespace - we've had enough of whitespace errors), which is 1 in 2.6x10^130,001. Add more monkeys, even astronomically large numbers, with astronomically large amounts of time and you still barely dent this. (Divide the single monkey worst case by 10⁶³ = monkeys x age of universe x typing rate.)
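
If you do want to make your own number up, here is a minimal sketch of the arithmetic in Java. It works in log10, since the raw values overflow any numeric type. The class and method names are illustrative, and it keeps the footnote's simplifications (26 letters, 130,000 characters, and the whole 10⁶³ keystroke budget simply divided into the single-monkey odds). Run with those inputs, the exponent comes out nearer 184,000 than 130,000, which is close to the Wikipedia figure, so the number quoted above is, if anything, generous to the monkeys. Either way it is still a close approximation to zero.

```java
// Illustrative only: names and structure are assumptions, not part of any library.
final class MonkeyOdds {

    /** log10 of the odds against typing `length` characters correctly in one attempt. */
    static double log10OddsSingleAttempt(int alphabetSize, int length) {
        // log10(alphabetSize ^ length) = length * log10(alphabetSize)
        return length * Math.log10(alphabetSize);
    }

    /** Same odds after dividing by a budget of 10^budgetLog10 attempts. */
    static double log10OddsWithBudget(int alphabetSize, int length, int budgetLog10) {
        return log10OddsSingleAttempt(alphabetSize, length) - budgetLog10;
    }

    public static void main(String[] args) {
        // 10^40 monkeys x 10^20 seconds x 10^3 characters per second
        int budgetLog10 = 40 + 20 + 3;
        System.out.printf("one monkey:  1 in 10^%.0f%n",
                log10OddsSingleAttempt(26, 130_000));
        System.out.printf("all monkeys: 1 in 10^%.0f%n",
                log10OddsWithBudget(26, 130_000, budgetLog10));
    }
}
```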

Lucy's Carelessness

Strange as it may seem, the infinite monkey problem is closely related to ROC. In Hamlet's terms, ROC endeavours to take arms against this sea of troubles...

In classical software, computation occurs in a context-free system. That is, the information that is computed is used once but generally it is very difficult to determine if it could be used again.

Here's a typical example. Did you ever look at the Java Pet Store demo? Here are the first couple of lines of index.jsp, presumably the most visited page of the demo...

<%
try {
    CatalogFacade cf = (CatalogFacade)config.getServletContext().getAttribute("CatalogFacade");
    List<Tag> tags=cf.getTagsInChunk(0, 12);
    // since top 20 come from database or desending refCount order, need to reorder by tag name
    Collections.sort(tags, new Comparator() {
        public int compare(Object one, Object two) {
             int cc=((Tag)two).getTag().compareTo(((Tag)one).getTag());
             return (cc < 0 ? 1 : cc > 0 ? -1 : 0);
        }
    });    
%>
<html>
...etc etc etc...

The details are irrelevant; the thing to notice is: Collections.sort(...).

Yes, this implementation sorts the same pseudo-static set of tags for every end-user page request...

Monkey produces Hamlet.
Hamlet is read once.
Hamlet is shredded.
Over and over and over and over...

Take a look through any other part of the pet store demo and you'll see this kind of thing repeated throughout. Recall this demo was the grounds for fervent competition on performance between Sun and Microsoft when .Net was launched. Presumably, this is not throw-away code? This is apparently the best it gets.

Now I'm being cruel to be kind. It's easy to throw stones, and if you have a classical code-based view of the world you wouldn't even conceive that there might be an alternative. I'm picking on the PetStore demo since it's a definitive reference architecture and so it is the Lucy species archetype for conventional software.

The endemic problem with classical software models is that you have very little choice but to forget which monkey hit pay dirt.

The trouble is that a context-free computation system cannot differentiate the signal from the noise; it is careless by default. The Death Star problem is endemic to classical software.

Out of the trees, on to the plane

OK, so my monkey metaphor is getting stretched a little too far. No doubt you're ahead of me and saying "yeah but with *real* systems we're not dealing with random data sources". Excellent! You have fallen straight into my cunning trap!

Indeed, reality is a very interesting thing. You will be pleased to learn that I shall not attempt to define it. But I will point out some key features.

When we interact with the world, we constrain information problems with layer upon layer of context.

Shakespeare was not a random monkey; the accumulated state of his brain provided the context within which Hamlet could be created. The Hamlet soliloquy alone demonstrates that Shakespeare had thought long and hard about the nature of existence. I don't think it controversial to suggest that he drew on a deep and rich contextual foundation.

Similarly when an organism reproduces, the hard-won context of its evolutionary state to suit an environmental niche, represented in its DNA, is not discarded but is transferred, with slight variation, to the next generation. On face value, it seems like a tautology to say "Monkeys breed more monkeys", and yet that's a direct illustration of how real information systems behave. A 'proof-by-absurdity' is that "Monkeys give birth to Parrots" is patently nonsense.

In short, in the real world, accumulated state has a strong bearing on the future state of a system. Which you can put another way: the real world is not random, it obeys macroscopic statistical distributions.

Natural systems, like monkeys, commonly exhibit normal distributions. But constrained systems, like Web apps, or order processing systems, or CRMs, or warehouse management systems, or [you name your business IT system here], have much tighter constraints and so have much tighter statistical distributions (power-law distributions etc).

As a concrete illustration, every single person hitting the pet store requires the same information present in the sorted list of tags. This is a delta-function distribution.

Admittedly, sometimes you can spot reusable state and you can refactor and introduce a cache into a layer of your classical software architecture. But it is far from easy, especially if your starting point for an information system is code.
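
To make the hand-rolled version concrete, here is a minimal sketch of that kind of refactoring in plain Java. `TagCache` and its loader are hypothetical stand-ins (the real Pet Store works with `CatalogFacade` and `Tag`). The point is only that the sort runs once and every later request reuses the result, and that you had to spot the opportunity, write the plumbing, and remember to invalidate it yourself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper: sorts the tag list on first use and reuses it afterwards.
final class TagCache {
    private final Supplier<List<String>> loader; // stands in for cf.getTagsInChunk(0, 12)
    private List<String> sorted;                  // null until the first request
    private int sortCount;                        // how many times the sort actually ran

    TagCache(Supplier<List<String>> loader) {
        this.loader = loader;
    }

    /** Returns the sorted tags, computing them only if nothing is cached. */
    synchronized List<String> sortedTags() {
        if (sorted == null) {
            List<String> tags = new ArrayList<>(loader.get());
            Collections.sort(tags); // ascending by name, as in the JSP comparator
            sorted = Collections.unmodifiableList(tags);
            sortCount++;
        }
        return sorted;
    }

    /** Must be called by whoever changes the tags; nothing in the code enforces that. */
    synchronized void invalidate() {
        sorted = null;
    }

    synchronized int sortCount() {
        return sortCount;
    }
}
```

A thousand calls to `sortedTags()` perform one sort. The catch is in `invalidate()`: the cache has no idea what its contents depend on, so correctness rests on every writer remembering to call it.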

Resource moNKey

NetKernel takes a new and general approach. NetKernel considers all state as the outpourings of a sea of monkeys (I'm not being disparaging about any given set of applications - it's just that down inside the kernel, the abstraction has zero interest in what it is processing, it is non-discriminating. It's all monkeys to NK).

Having delivered some resource state (either externally via any transport, or internally to another endpoint), rather than shred it, NetKernel labels it (with its resource identifier) and holds on to it...

Information is precious
Reality is statistical
"someone might want this again", it thinks.

It does something else too. It remembers the context in which a representation was created. It remembers the multi-dimensional spatial relationship between the requestor and the creator of the information.

On face value this might seem irrelevant. After all, didn't we say that we know its identity (label) - surely that's enough? Not so. A resource is only truly disambiguated when it has identity and context; a location in space.
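
A minimal sketch of why the label alone is not enough. This is not NetKernel's internal representation: `ScopedKey`, `ScopedCache`, the space names and the identifiers used with them are all invented for illustration. The same identifier requested from two different spaces can legitimately resolve to different representations, so a cache keyed on the identifier only would hand back the wrong one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical cache key: the identifier plus the context (space) it was resolved in.
final class ScopedKey {
    final String space;
    final String identifier;

    ScopedKey(String space, String identifier) {
        this.space = space;
        this.identifier = identifier;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ScopedKey)) return false;
        ScopedKey other = (ScopedKey) o;
        return space.equals(other.space) && identifier.equals(other.identifier);
    }

    @Override
    public int hashCode() {
        return Objects.hash(space, identifier);
    }
}

// Hypothetical cache that disambiguates entries by identity and context together.
final class ScopedCache {
    private final Map<ScopedKey, String> entries = new HashMap<>();

    void put(String space, String identifier, String representation) {
        entries.put(new ScopedKey(space, identifier), representation);
    }

    /** Returns the representation for this identifier in this space, or null if absent. */
    String get(String space, String identifier) {
        return entries.get(new ScopedKey(space, identifier));
    }
}
```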

If you're familiar with REST this is new territory. REST and the Web are mono-contextual - they only consider a single DNS-backed global space. So URL identity alone is sufficient for Web caches.

ROC is a multi-dimensional generalisation of REST. We must confront the general situation: identity is relative. (In fact information is relative - but that's a can of worms we don't have time for now).

So NetKernel remembers the context of a given resource, which leads to a pretty amazing result...

We figured out in NetKernel 4 that often, two different requestors will have very different request contexts, but they are seeking the same information. NetKernel 4 is able to determine the point at which contexts overlap and can disregard the irrelevant contextual divergence. Which means NetKernel automatically normalizes the stored state. (This is very very nifty and bears further consideration to see how novel it really is.)

All of which is great - but let's not let "the brains we had, go to our heads" (Oasis' best lyric?)...

The signal-to-noise problem is still there!

If you hang on to everything, eventually nothing has any value (the revenge of the Death Star?). Or put another way, how do you spot the valuable monkeys? How do you stop the unbounded growth of the useless state? How do you boost the signal and suppress the noise?

Well, it turns out that the ROC abstraction comes to our rescue again. One of the implications of separating logical resource requests from physical execution is that we automatically, and for free, measure the computational cost of every representation. We literally know the CPU time that any thread took when executing a given endpoint. So we know precisely how expensive a piece of information was to compute.

We also, trivially, know how often a given resource is requested and the time a given representation has been in the cache without being requested.

With these simple metrics, which are provided for free by the abstraction, we can work out a dimensionless cacheability index - a monkey value rating!

We use the old trick of taking an "inverse scale" - the closer to zero you are the more valuable something is. So the index is proportional to time and inversely proportional to a weighted combination of computation cost and number of requests.

Periodically, and/or based upon heap usage triggers, NetKernel looks at the stuff it is holding on to and culls the dead or dying monkeys.
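
The description above pins down the shape of the index but not the formula, so the following is a sketch under assumptions and not NetKernel's actual calculation. The weights, the field names and the decision to cull a fixed number of entries are all invented. Only the proportionalities (up with idle time, down with compute cost and request count) come from the text.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical cache entry carrying the three metrics mentioned in the text.
final class Entry {
    final String identifier;
    final double cpuMillis;  // cost of computing the representation
    final long requests;     // how often it has been requested
    final double idleMillis; // time in cache since it was last requested

    Entry(String identifier, double cpuMillis, long requests, double idleMillis) {
        this.identifier = identifier;
        this.cpuMillis = cpuMillis;
        this.requests = requests;
        this.idleMillis = idleMillis;
    }

    /** Inverse-scale index: closer to zero is more valuable. Weights are assumptions. */
    double index(double costWeight, double requestWeight) {
        double worth = costWeight * cpuMillis + requestWeight * requests;
        return idleMillis / Math.max(worth, Double.MIN_VALUE); // guard against zero worth
    }
}

final class Culler {
    /** Returns the survivors, most valuable first, after dropping the `count` worst entries. */
    static List<Entry> cull(List<Entry> entries, int count,
                            double costWeight, double requestWeight) {
        List<Entry> ranked = new ArrayList<>(entries);
        ranked.sort(Comparator.comparingDouble(
                (Entry e) -> e.index(costWeight, requestWeight)));
        int keep = Math.max(0, ranked.size() - count);
        return new ArrayList<>(ranked.subList(0, keep));
    }
}
```

With equal weights, an expensive, frequently requested, recently used entry scores close to zero and survives, while a cheap entry that has sat untouched for a long time scores high and is the first to go.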

Tracking Reality

In practice NetKernel accumulates operational context; it discovers the statistical distributions of real-world information. It automatically and continuously tunes the system so that it always sits in the sweet spot. (Thinking of the whole system thermodynamically, we're finding and tracking a minimum in the computational energy surface. Plus, since transreption is an innate property of the system, we're also finding and minimizing the entropy in realtime too.)

But there's one more trick up our sleeve...

NetKernel tracks the dependency relations innate to information resources. This is definitely a whole other topic, but it turns out that invariably the majority of information in an ROC system is derived from other information resources. (This shouldn't be too much of a surprise - it's a feature of reality, cf. Shakespeare and living organisms, above.)

For atomic consistency, we only give someone a resource from cache if all its resource dependencies are valid.

NetKernel doesn't care what the dependencies are. Just that they are valid.

It follows that when any given resource expires, all its dependents automatically and atomically expire too. In brutal monkey terms: if a monkey is not fit for purpose, it and all its progeny get efficiently discarded irrespective of their cacheability index. In a very real sense, the NetKernel cache is truly survival of the fittest.

One of the beautiful consequences of ROC's innate computational minimisation is that you can create patterns in which external information may be modelled within the system as resources and can, with no code linkage, co-ordinate its cacheability by attaching imaginary dependencies. We call this the "Golden Thread Pattern" - you can install a demo from the repositories; look in Apposite.
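
A minimal sketch of the dependency rule, and of a golden thread as an "imaginary" dependency. None of this is NetKernel's API: the `Node` class and the identifiers are made up for illustration. A cached item is valid only while it and everything it depends on are unexpired, so expiring one node implicitly expires all of its dependents, and a golden thread is just a node with no content of its own that exists to be cut.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical cached resource that is valid only while all its dependencies are valid.
final class Node {
    final String identifier;
    private final List<Node> dependencies = new ArrayList<>();
    private boolean expired;

    Node(String identifier, Node... dependsOn) {
        this.identifier = identifier;
        dependencies.addAll(Arrays.asList(dependsOn));
    }

    /** Marks this node expired; dependents find out on their next validity check. */
    void expire() {
        expired = true;
    }

    /** Valid only if this node and, recursively, every dependency is unexpired. */
    boolean isValid() {
        if (expired) return false;
        for (Node dependency : dependencies) {
            if (!dependency.isValid()) return false;
        }
        return true;
    }
}

final class GoldenThreadDemo {
    public static void main(String[] args) {
        Node thread = new Node("gt:customer-db");            // imaginary resource, no content
        Node customers = new Node("res:/customers", thread); // modelled external data
        Node report = new Node("res:/report", customers);    // derived from customers

        System.out.println(report.isValid()); // true
        thread.expire();                      // the external system changed: cut the thread
        System.out.println(report.isValid()); // false
    }
}
```

Nothing in `report` knows about the external database. It only knows that one of its ancestors is no longer valid, which is all it needs to know.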

Whether or not this article made sense, you can at this point happily disregard it*. All of this happens under the covers and does not need you to consider it. Take it as read: NetKernel self-optimizes.

If Hamlet had had ROC, things might have been so different:

To be or not to be, that is a question that doesn't worry me...

* Yes you just spent 10 minutes grappling with concepts and the ramblings of an evangelical ROC nutter, to be told you don't have to think about it. I wish that it were not so. In fact talking about the performance facets of ROC is a diversion - it's a valuable side-effect that ROC is faster and more efficient - but that's not the point. The point of ROC is to make information systems less fragile, more evolvable and of a larger and more sophisticated scale. I hope that if I talk in depth about how we've thought through the tangible physical system properties, that you also gain the confidence to trust me when I say that the intangible properties of ROC systems are also true.

The undiscover'd country

In these articles, I am obviously trying to persuade you that ROC is something you should explore.

I know it's not easy to step back and perceive the world beyond classical code. If that's where you've come from (who hasn't?), then it can feel unsettling to step into the ROC world. But it's actually not difficult, just different.

Hamlet had similar concerns...

But that the dread of something [new],
The undiscover'd country...makes us rather bear those ills we have.

...but then he hadn't heard about ROC.

Book Review: QI Book of General Ignorance

The infinite monkey theory is one of those ideas that has been absorbed into popular culture. Unfortunately, it mostly gets stated as a positive assertion about the nature of randomness and the emergence of information. As usual things are more subtle and more interesting than the headline.

Anyway, it stands as an example of one of those remarkable "universal truths", common facts we believe are true but which are wrong. The fact that these are then absorbed into our culture is often the result of some fascinating journey.

In the UK we're very lucky to have the BBC programme QI - QI standing for Quite Interesting. This masterpiece of factual entertainment was devised by John Lloyd, after some kind of mid-life crisis and a dawning awareness that everything he thought he knew was maybe not true (I draw no allusions - I trust, dear ROCer, you are able to fill in this perfect NK marketing opportunity yourselves).

Anyway, the programme is fun and "quite interesting". But they obviously put in a ton of research for each question, most of which doesn't get seen due to the narrow format necessary for TV. Fortunately all this back material doesn't go to waste; much of it gets repurposed and published in the QI Book of General Ignorance.

I don't actually recall if infinite monkeys are mentioned in it (I made all that stuff up), though I'm sure it must have come up at some point, but thinking about the monkey theory reminded me that I read the QI book over the holidays. I wouldn't be mentioning this at all if I didn't highly recommend it. It'd be perfect for a flight. Plus even if you've seen the TV show and you remember some of the "anti-facts" there's still a lot of extra stuff with each item.

Incidentally, the subtitle is: Everything you think you know is wrong. Which is a refreshing thought.

NetKernel West 2011

The conference will be quite interesting too. Find out all about it here...

http://www.1060research.com/conference/NKWest2011/

Have a great weekend.

NetKernel, ROC, Resource Oriented Computing are registered trademarks of 1060 Research