NetKernel News Volume 1 Issue 48
October 1st 2010
What's new this week?
Catch up on last week's news here
Repository Updates
There are no package updates this week. Steady as she goes.
New Packages
- rdf-opencalais 1.1.1 *NEW*
- An opencalais document analyzer library (see below)
New: OpenCalais Library
The OpenCalais Library provides tools to work with the OpenCalais services operated by Thomson Reuters. OpenCalais is a document analysis service to which documents may be submitted and analyzed using a range of engines. Currently the library provides:
- active:openCalaisAnalyze - analyzes a resource with the OpenCalais service. It takes an argument for the resource to be analyzed, together with a flexible configuration operator providing a range of options, including the required output format. Output formats include xml/rdf and JSON.
OpenCalais is a free-to-use service, but has certain usage restrictions and requires users to obtain a license key.
Install
The library is available from NKEE and NKSE multiverse repositories as package rdf-opencalais.
Example Usage
Here is a very simple example analysing a literal string with a basic paramsXML specification and returning xml/rdf...
message="""Friends, Romans and Countrymen, lend me your ears. Now is the winter of our discontent. What's in a name? That which we call a rose, by any other name would smell as sweet. The rain in Spain falls mainly on the plane. France is the home of smelly cheese. Albert Einstein discovered that E=mc^2 George Washington has a state named after him."""
params="""
<c:params xmlns:c="http://s.opencalais.com/1/pred/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <c:processingDirectives c:contentType="text/raw" c:outputFormat="xml/rdf" />
</c:params>"""
req=context.createRequest("active:openCalaisAnalyze")
req.addArgumentByValue("operand", message)
req.addArgumentByValue("operator", params)
req.addArgument("licenseID", "h6jnqxxxxxxxxxxxxxxxxxxx")  //Your opencalais license here
resp=context.issueRequestForResponse(req)  //RequestForResponse to keep mimetype set by opencalais
context.createResponseFrom(resp)
which produces the following xml/rdf representation...
<!--Use of the Calais Web Service is governed by the Terms of Service located at http://www.opencalais.com. By using this service or the results of the service you agree to these terms of service.-->
<!--Relations: Country: France, Spain NaturalFeature: Spain falls Person: Albert Einstein, George Washington-->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:c="http://s.opencalais.com/1/pred/"> ...Many details omitted...
<rdf:Description rdf:about="http://d.opencalais.com/er/geo/country/ralg-geo1/e165d4f2-174b-66a7-d1a9-5cb204d296eb">
<rdf:type rdf:resource="http://s.opencalais.com/1/type/er/Geo/Country" />
<c:docId rdf:resource="http://d.opencalais.com/dochash-1/e7466e29-8034-3c2e-988f-b814f452ddf5" />
<!--France-->
<c:subject rdf:resource="http://d.opencalais.com/genericHasher-1/e1fd0a20-f464-39be-a88f-25038cc7f50c" />
<c:name>France</c:name>
<c:latitude>46.0</c:latitude>
<c:longitude>2.0</c:longitude>
</rdf:Description>
</rdf:RDF>
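As a rough illustration of what can be done with this representation once you have it, here is a minimal Python sketch that pulls the country entities out of the xml/rdf. It runs outside NetKernel against a trimmed copy of the response above; the function name `countries` and the trimmed sample are my own, and a real response contains many more descriptions than this.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
C_NS = "http://s.opencalais.com/1/pred/"
COUNTRY_TYPE = "http://s.opencalais.com/1/type/er/Geo/Country"

# Trimmed copy of the response shown above (most details omitted).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                     xmlns:c="http://s.opencalais.com/1/pred/">
  <rdf:Description rdf:about="http://d.opencalais.com/er/geo/country/ralg-geo1/e165d4f2-174b-66a7-d1a9-5cb204d296eb">
    <rdf:type rdf:resource="http://s.opencalais.com/1/type/er/Geo/Country" />
    <c:name>France</c:name>
    <c:latitude>46.0</c:latitude>
    <c:longitude>2.0</c:longitude>
  </rdf:Description>
</rdf:RDF>"""


def countries(rdf_xml):
    """Return (name, latitude, longitude) for each Country description."""
    root = ET.fromstring(rdf_xml)
    found = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        rdf_type = desc.find(f"{{{RDF_NS}}}type")
        if rdf_type is None:
            continue
        if rdf_type.get(f"{{{RDF_NS}}}resource") != COUNTRY_TYPE:
            continue  # some other kind of entity
        found.append((
            desc.findtext(f"{{{C_NS}}}name"),
            float(desc.findtext(f"{{{C_NS}}}latitude")),
            float(desc.findtext(f"{{{C_NS}}}longitude")),
        ))
    return found


result = countries(SAMPLE)
print(result)
```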
ROC Architecture: Anatomy of a Fulcrum
When you start playing with NetKernel you quickly come across the term Fulcrum. There's a glossary entry that describes it as...
The reason it is known as a Fulcrum is that invariably the location of a transport becomes the pivot-point of an application architecture.
...which, when you're first finding your way in the ROC landscape, is not entirely enlightening.
As with Transreptor and Accessor and other concepts in ROC, we were breaking new ground, and as we started to structure our systems with architectural patterns we had to invent names. You may or may not like Fulcrum, but it's kind of stuck.
Hopefully what follows will present a picture of why this particular pattern has this name, but more importantly, will give you a sense of the everyday composable architecture that can be applied in ROC engineering.
SSHd Fulcrum
There are several transports provided with NetKernel, including, out of the box, two HTTP transports running in spaces called the "Frontend" and "Backend" Fulcrums. I don't want to dive into those as examples since they are by necessity somewhat more advanced than the core essence of the Fulcrum pattern. My aim is to highlight the general pattern, not specific instance-details.
So instead we'll consider the SSHd Fulcrum. As we'll see, it is a good example of the Fulcrum pattern, and also features a fair selection of other architectural tools and techniques, which I hope will show how simple tools combine to provide robust engineering infrastructure.
If you missed it, there was a detailed introduction to the NetKernel SSH daemon back in early August, including a SliNKi presentation. A very brief summary is: The SSH transport allows you to use an ssh client to open a shell into NetKernel and issue resource requests directly into the space hosting the transport - you can also issue requests using ssh's remote command execution and scp file transfers.
NetKernel's SSH infrastructure comes with a pre-configured module which has a ready-to-use SSH Fulcrum space and, just as with the HTTP fulcrums, you use a dynamic import hook to expose your application to requests originating from the SSH transport. We're going to explore the details of how this is architected.
SSHd Fulcrum Spacial Architecture
The diagram below was generated using the current development of the architecture explorer tool. It shows the SSH Fulcrum space, its spacial import relations and, in this instance, a single dynamically imported application space (Demo1).
If you're tracking the progress of the tool, you'll see that it is going well, and can now group-by and show the physical module (tabbed bounding boxes) where spaces are defined (if you want to try it yourself here are the details). But let's not worry about that; let's use the diagram to discuss the detailed architecture of the SSH fulcrum. Each blue circled number shows a particular feature and has a corresponding paragraph below...
1. SSH Transport Instance
The SSHTransport is the first endpoint instantiated in the SSHFulcrum space. The red arrow indicates that this is a transport and that external events will initiate ROC requests into the space at this point. In this case, those events are ssh shell commands entered at the client-side console.
The diagram is showing that there is a real physical instance of an SSHTransport endpoint in the space at this point. However, as with most transports, it is instantiated using NetKernel's prototype mechanism, whereby an instance of the endpoint is resolved from the SSH Transport Module via an import.
2. SSH Transport Import Space
The SSH Transport Module is imported into the SSH Fulcrum (seen as the second endpoint declaration in the SSHFulcrum space). We see that the SSH Transport Module actually contains two spaces. The first, public space, contains a single endpoint PrivateFilterEndpoint. This is an overlay through which all requests to the wrapped inner space must pass.
The PrivateFilterEndpoint is a constraint which guarantees that any endpoints in the wrapped space that are marked as <private> cannot be accessed by external requestors on the public interface.
This pattern is an in-built construct of the Standard Module. The wrapper space and the filter are automatically created whenever you declare an endpoint as <private>...
<rootspace>
    <endpoint>
        <private />   -- Because of this, the rootspace is auto-wrapped by the PrivateFilter overlay
        ...
    </endpoint>
    ...
</rootspace>
The PrivateFilterEndpoint mediates access to the inner SSH Transport Module implementation space. In it we see there are various implementation endpoints including the SSHTransport endpoint from which prototypes are instantiated.
Also present in the library are some utility services that allow the SSHTransport to use an existing SSH authorized_keys public keys file to authenticate an SSH connection using PKI. The SSHTransport supports pluggable declarative requests for authentication and other operations but we don't need to worry about these details now.
In summary, we can see that the SSHFulcrum is receiving all its required SSH infrastructure services from the SSH Transport Module via the import and that only public endpoints are accessible to the SSHFulcrum space.
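To make the private filter idea concrete, here is a toy Python model of an overlay guarding a wrapped space. It is only a sketch of the behaviour described above, not NetKernel's implementation: the class, the method names and the two endpoint names are hypothetical, and treating a private endpoint as "not found" for outsiders is my assumption.

```python
class PrivateFilterOverlay:
    """Toy stand-in for an overlay that guards a wrapped inner space."""

    def __init__(self, endpoints):
        # endpoints: name -> (is_private, handler)
        self._endpoints = endpoints

    def request_from_outside(self, name):
        """Resolve a request arriving on the public interface."""
        entry = self._endpoints.get(name)
        if entry is None or entry[0]:
            # Assumption: a private endpoint is simply unresolvable
            # from outside, the same as one that does not exist.
            raise LookupError(f"no public endpoint named {name!r}")
        return entry[1]()

    def request_from_inside(self, name):
        """Requests originating inside the space see every endpoint."""
        _, handler = self._endpoints[name]
        return handler()


# Hypothetical endpoint names, for illustration only.
inner_space = {
    "PublicService": (False, lambda: "public result"),
    "InternalHelper": (True, lambda: "internal result"),
}
overlay = PrivateFilterOverlay(inner_space)

public_result = overlay.request_from_outside("PublicService")
try:
    overlay.request_from_outside("InternalHelper")
    private_blocked = False
except LookupError:
    private_blocked = True
internal_result = overlay.request_from_inside("InternalHelper")
```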
3. Throttle
Returning to the SSHFulcrum space, we see that the third endpoint declaration is a ThrottleOverlay.
The throttle is a transparent overlay that provides managed access to a wrapped space. We see here that the wrapped space is called Throttle C2Q500 - Dynamic Imports.
Requests to the space wrapped by a throttle are managed such that only a maximum number of concurrent requests are admitted. If the concurrency limit is exceeded, other requests are made to queue. If the throttle is queuing requests, then when an inner request returns, one of the queued requests is allowed to enter the managed space.
The throttle has a non-blocking asynchronous design so, although a logical request may be queued, the underlying physical threads can be assigned useful work in the meantime.
In this case, the throttle has a concurrency of two and a queue of five hundred (you may have guessed this from my cryptic name for the wrapped space "C2Q500").
The architectural reason we've introduced a throttle here is to manage any possible denial of service attack and to generally provide a "blow-off" preventer in the architecture to ensure our inner applications can never be overloaded by malicious or accidentally excessive numbers of externally originating requests.
The settings for the throttle are specified by a declarative configuration resource. If needed, they could be provided from an independent dynamic control application or sourced from a database, but here we're just using a static resource defining a fairly narrow throughput.
In summary, the Throttle C2Q500 - Dynamic Imports space is protected from overload and is guaranteed to never receive more than two concurrent requests. Through the throttle, the throttled space will receive the SSH requests originating from the transport.
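The admission rule is easy to model. The Python sketch below is a bookkeeping-only illustration of a concurrency-2, queue-500 throttle; it is not how NetKernel's ThrottleOverlay is written. Two things are my assumptions because the text above does not specify them: queued requests are released in first-in-first-out order, and a request arriving when the queue is full is rejected.

```python
from collections import deque


class Throttle:
    """Admit at most `concurrency` requests; queue up to `queue_size` more."""

    def __init__(self, concurrency, queue_size):
        self.concurrency = concurrency
        self.queue_size = queue_size
        self.active = set()    # requests currently inside the wrapped space
        self.queued = deque()  # requests waiting for a free slot

    def submit(self, request_id):
        """Return 'admitted', 'queued' or 'rejected' for a new request."""
        if len(self.active) < self.concurrency:
            self.active.add(request_id)
            return "admitted"
        if len(self.queued) < self.queue_size:
            self.queued.append(request_id)
            return "queued"
        return "rejected"  # assumption: overflow is refused

    def complete(self, request_id):
        """An inner request returned; release the next queued one, if any."""
        self.active.remove(request_id)
        if self.queued:
            released = self.queued.popleft()  # assumption: FIFO order
            self.active.add(released)
            return released
        return None


throttle = Throttle(concurrency=2, queue_size=500)  # "C2Q500"
outcomes = [throttle.submit(f"req{i}") for i in range(4)]
released = throttle.complete("req0")
print(outcomes, released)
```

Because the model only tracks which logical requests are admitted or waiting, no thread is ever blocked on a queued request, which is the property the non-blocking design is after.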
4. Throttle / Dynamic Import Space
Within the managed Throttle C2Q500 - Dynamic Imports space we first see a matched pair of endpoints, SimpleImportDiscoveryAccessor and DynamicImportEndpoint.
The DynamicImportEndpoint is relatively simple to describe: it imports all spaces specified by a declarative configuration resource (the details are available here).
So you can think of it as being like a multi-valued version of the Standard Module's built-in <import>. Just like the import, it transparently provides a route from its origin space to the other spaces that it imports.
The difference with DynamicImportEndpoint is that it is Dynamic. If the configuration resource that defines the spaces to import is changed, then the DynamicImportEndpoint will reconfigure itself to import the newly specified imports.
If you wished, just as we did with the Throttle, you could configure a DynamicImportEndpoint with a static resource defining a set of permanent imports. However, things get interesting when you instantiate it so that its configuration resource is provided from a dynamic service.
This brings us back to the first endpoint in the space, the SimpleImportDiscoveryAccessor (SIDA). Again, this is relatively simple to describe. It is an accessor which asks the Kernel for the list of all public addressable spaces in the running system. For each public space, the accessor issues a request to that space for the resource res:/etc/system/SimpleDynamicImportHook.xml. If that resource is present, and it has a value which matches the SIDA's type, then that space is added to a list (for details of the type argument see the reference).
The resulting representation returned by the SIDA is a list of all spaces that declare a simple dynamic import hook that matches this SIDA's type argument. In this case it is the list of all modules that are 'tweeting' "IMPORT ME TO THE SSH FULCRUM SPACE" (metaphorically not literally).
Unsurprisingly, the representational form of the SIDA's discovered import list is exactly what is required by the DynamicImportEndpoint. So, to close the loop, in this instance the DynamicImportEndpoint is configured to request its configuration from the SIDA endpoint located next to it.
The net result is that these two very simple endpoints combine as a pair to automatically find and dynamically import all spaces in the system which want to be exposed to the SSH transport requests.
If, at some point, a new space is added to the system (say a package is hot-installed), the SIDA's discovered import list becomes expired causing the DynamicImportEndpoint to re-request an up-to-date import list, causing the SIDA to rediscover spaces that wish to be imported, including (if appropriate) any of the new spaces. This all happens automatically and is a natural re-annealing of the architecture, enabled (with no technical considerations) by the ROC abstraction.
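The discover-then-import loop can be sketched in a few lines of Python. This is a toy model rather than NetKernel code: spaces are plain dictionaries of resources, the hook is reduced to a dictionary with a `type` key, the space names other than Demo1 and the type `SomeOtherFulcrum` are hypothetical, and the explicit `refresh()` call stands in for the automatic expiry and re-request described above.

```python
HOOK = "res:/etc/system/SimpleDynamicImportHook.xml"


def discover_imports(spaces, wanted_type):
    """Model of the SIDA: list the spaces whose hook matches wanted_type."""
    found = []
    for name, resources in spaces.items():
        hook = resources.get(HOOK)  # None when the space declares no hook
        if hook is not None and hook.get("type") == wanted_type:
            found.append(name)
    return found


class DynamicImport:
    """Model of the DynamicImportEndpoint: imports whatever its config lists."""

    def __init__(self, config_source):
        self._config_source = config_source  # callable returning space names
        self.imports = []
        self.refresh()

    def refresh(self):
        # In NetKernel this is triggered by expiry of the import list;
        # here it is called by hand.
        self.imports = self._config_source()


# A tiny "running system": each space maps identifiers to representations.
spaces = {
    "Demo1": {HOOK: {"type": "SSHFulcrum"}, "res:/demo1/hello": "hello"},
    "OtherApp": {HOOK: {"type": "SomeOtherFulcrum"}},
    "NoHookSpace": {"res:/misc": "misc"},
}

importer = DynamicImport(lambda: discover_imports(spaces, "SSHFulcrum"))
before = list(importer.imports)

# Hot-install a new space that also asks to be imported, then rediscover.
spaces["Demo2"] = {HOOK: {"type": "SSHFulcrum"}}
importer.refresh()
after = list(importer.imports)
print(before, after)
```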
5. Dynamic Import Hook
To aid clarity, in the system under discussion, we've only got one space which is declaring res:/etc/system/SimpleDynamicImportHook.xml for the SSH Fulcrum. It's the Demo1 space and it has a static resource representation for its SIDA-Hook, which has this form...
<connection>
    <type>SSHFulcrum</type>
</connection>
At this point, we don't need to go into the details of what the Demo1 space is providing. If you've watched my recent presentation you'll have seen it's just some basic ROC tricks.
However, with this one declarative resource, we can see in the diagram that the Demo1 space is imported into the main throttled Dynamic Import space and so is able to receive all requests from the SSH transport that are resolvable to it.
You can also see that you may have any number of other 'application spaces' each declare a matching import hook and get sucked in and exposed to the SSH transport in the same way. (This is what we're doing with the HTTP fulcrums and part of the reason I didn't show their diagrams is that they dynamically import lots and lots of stuff using this same pattern - which would be too much detail for the discussion of the Fulcrum pattern).
One last thought: with ROC it's turtles all the way down. There's nothing that says res:/etc/system/SimpleDynamicImportHook.xml has to be a static resource. Within your application space you could easily map it to an implementation driven from dynamic code or sourced from a database. So, for example, you could have a space that only allows itself to be imported into another space during a fixed time window, or when some other state is valid, and so on.
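As a hedged illustration of the time-window idea, here is a minimal Python sketch of the decision a dynamic hook implementation would make. The function name and the 09:00 to 17:00 window are hypothetical, and the hook body assumes the `connection`/`type` form of the static hook. A real endpoint would presumably also need to expire its response when the window opens or closes so that the import list is rediscovered; that is not modelled here.

```python
from datetime import time

# Assumed hook representation, following the form of the static hook.
HOOK_FOR_SSH = "<connection><type>SSHFulcrum</type></connection>"


def import_hook(now, opens=time(9, 0), closes=time(17, 0)):
    """Return the hook inside the window, or None (resource absent) outside it."""
    if opens <= now < closes:
        return HOOK_FOR_SSH
    return None


during_window = import_hook(time(10, 30))
outside_window = import_hook(time(22, 0))
```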
6. Masking Mapper
At this point in the discussion we've completed the architecture we require. We have an SSH Fulcrum which is able to issue requests to a freely and dynamically configured arrangement of applications. In essence, this is the architectural picture that we mean to convey whenever you hear the short-hand term Fulcrum.
You can also see why we named this architectural pattern a Fulcrum, since the space containing the transport which originates ROC requests is invariably the balance point above an array of application spaces that wish to be exposed to those requests.
However there's still one further pattern that's worth highlighting.
Notice that after the DynamicImportEndpoint we have a MapperOverlay which is providing a mapping to a wrapped space called Filter Limiter Jail. Also, below the mapper declaration, in the throttled dynamic import space there is one further import. That import is for the layer1 library (not shown for clarity).
Layer1 provides a host of useful infrastructure pieces including the implementations of both the SimpleImportDiscoveryAccessor and the DynamicImportEndpoint. So we need to import layer1 in order to be able to instantiate the first two endpoints in the space.
However layer1 has several other tools, many of which you wouldn't want to allow arbitrary SSH originated requests to reach. For example it contains an endpoint for file:/ requests to the host file system. It also has active:exec which can invoke arbitrary native process executions.
We don't trust whoever's logged in with SSH, and we need to keep a tight sandbox around the space and only allow ssh requests to reach the applications that want to be exposed to them.
By default a basic standard-module <import> is non-discriminating and in this case it provides access to all of layer1's tools and services. Therefore we have placed the mapper ahead of the layer1 import (in resolution order) to mask it.
The mapper is configured to map (redirect) all potentially malicious requests that might reach layer1, to a new request with identifier GoDirectlyToJail. The GoDirectlyToJail request is issued into the mapper's wrapped space Filter Limiter Jail.
7. Limiter
If we look inside the Filter Limiter Jail space we see it contains a single endpoint, the LimiterEndpoint. This endpoint has been configured with a grammar that matches GoDirectlyToJail. So any jailed malicious requests will resolve to it. The Limiter, as its name suggests, terminates any requests it receives in a safe and ROC-consistent way. Therefore all GoDirectlyToJail requests are gracefully terminated.
In summary, the Mapper is providing a filter that is deliberately masking the address space that follows it. The Limiter is a very simple tool to terminate requests. When combined together, the two tools provide an elegant sandbox for our address space.
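The masking effect comes purely from resolution order, which the following Python sketch tries to show. It is a toy model, not the actual module configuration: the two masked patterns are my guess at what "potentially malicious" covers, based on the file:/ and active:exec examples above, and the example identifiers are made up.

```python
import re

JAIL = "GoDirectlyToJail"

# Assumed mask: the real mapper configuration may cover more (or everything).
MASKED = [re.compile(r"file:/"), re.compile(r"active:exec")]


def limiter(identifier):
    """Model of the LimiterEndpoint: terminate anything sent to jail."""
    if identifier == JAIL:
        return "terminated"
    raise LookupError(identifier)


def resolve(identifier, application_resources):
    """Walk the throttled space's endpoints in declaration order."""
    # 1. Dynamically imported application spaces get first refusal.
    if identifier in application_resources:
        return ("application", application_resources[identifier])
    # 2. The mapper sits ahead of layer1 and redirects masked requests.
    if any(pattern.match(identifier) for pattern in MASKED):
        return ("jail", limiter(JAIL))
    # 3. Only requests that get past the mapper can reach layer1.
    return ("layer1", identifier)


apps = {"res:/demo1/hello": "hello from Demo1"}  # hypothetical resource

app_hit = resolve("res:/demo1/hello", apps)
file_attempt = resolve("file:/etc/passwd", apps)
exec_attempt = resolve("active:exec+command@ls", apps)
print(app_hit, file_attempt, exec_attempt)
```

If the mapper were declared after the layer1 import instead, step 3 would run before step 2 and the file:/ request would resolve in layer1, which is exactly the exposure the ordering is there to prevent.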
Summary
It's taken me a lot more time to describe this ROC architectural structure than it did to actually design and implement it. This particular engineering solution took about 15 minutes from start to finish and could, if required, be reconfigured just as easily. There was no code at all required to do this (you are rightfully sceptical, so here's the complete SSH Fulcrum's declarative module.xml).
The advantage of the Fulcrum pattern is that it allows an external transport to be set up and managed as a stable architectural feature whilst being independently and dynamically impedance matched to a set of applications.
It is due to the Fulcrum pattern that you are able to hot-install applications and control panel tools to NetKernel without having to modify any configuration. ROC eliminates brittleness, separates architecture from code and runs faster too - please retweet.
Have a great weekend.
Comments
Please feel free to comment on the NetKernel Forum
Follow on Twitter:
@pjr1060 for day-to-day NK/ROC updates
@netkernel for announcements
@tab1060 for the hard-core stuff
To subscribe for news and alerts
Join the NetKernel Portal to get news, announcements and extra features.