|
NetKernel News Volume 5 Issue 14
November 21st 2014
Repository Updates
The following updates are available in the NKSE and NKEE repositories...
- layer0-1.109.1
- fix to FastXPath to stop matching on bad expressions
- fix to stop NPE in legacy HDS equals method
- hds-g2-1.4.1
- Adds the active:fragmentHDS accessor (see below)
- mod-ldap-1.2.1
- NEW: First release (see below)
LDAP Resurfaces
A couple of weeks ago Dave McCormack of Boston College, asked us what had happened to the LDAP module between NK3 and NK5? The answer, was "Wow somehow it had got lost!".
To spare our blushes it took about an hour to sort this out and Dave kindly verified that the accessors do what they always used to.
So, we're pleased to say the LDAP tools are now back and can be installed from Apposite as mod-ldap. If (like the rest of us) you can't remember what is provided the docs are available here...
http://docs.netkernel.org/book/view/book:urn:org:netkernel:mod:ldap/
Thanks for keeping us honest Dave!
On HDS - Generation II
Back in May we released HDS G2, a new and fully revised version of our Hierarchical Data Structure (HDS). Today we've posted an update, adding the active:fragmentHDS accessor, which supports extraction of a sub-structure from HDS based upon an XPath expression.
Through the power of transreption, the second generation model is seamlessly compatible with the original HDS representation model provided in layer0. So the new model can be progressively introduced into existing systems without any side-effects.
For the record, HDS has long been our "House Red" representation type.
That is, when we're building stuff, if there isn't a compelling reason to use another representation model, then you can be pretty sure we'll use HDS. For example, all the backend NetKernel system services and tools, such as the visualizer, use HDS.
All well and good. But why should you care and how do you use it and what's so good that justifies a second generation? Time to explain...
What's the best structure for a Representation?
We all know by now that a Resource is intangible and abstract. A Representation can only provide a concrete snapshot of the state of the abstract Resource. We also know that there is no single Representation for any given resource's state. If there were then Transreption wouldn't have any purpose.
As usual I've left you with philosophical castles in the air. When we do things for real, we need to get off the fence. We need to ask the question:
What's the best structure for a Representation?
Ha! Best! Yeah right, good luck answering that one...
...actually I think there is a way to answer this question.
Back in early 2012 I wrote an article ROC and Data Structures that attempts to outline how we can determine the "goodness" of a given representation structure.
If you've not read it before take a look. The concise summary is:
Given R = f(x) we require that when x changes we do not have to recode f() in order to obtain R. That is, we require code stability irrespective of whether x is modified (strictly: expanded).
It turns out that tree structured data is special in that it enables this very stable solution.
To understand a little more, we can see that trees are elegant in that we can uniquely identify any sub-part of a tree using extremely stable "vectors" (in HDS we call them XPaths). Provided we code by using these vectors, our code will remain functional even if large parts of the tree are altered, so long as the original structure is still present as a sub-set of the new structure.
It turns out that the evolution of representation models by expansion (preserving existing sub-structures) is a very very common occurrence in real world information systems.
For example, I am doing it now. Whenever we write something we are expanding a tree structure whilst retaining the existing sub-structure. In fact pretty much whenever you communicate you are exchanging (and expanding) tree structures - this is why the software in our brains can cope with the tens of thousands of brand-new sentences we encounter every day. (See the reference to Pinker's "The Language Instinct" in the earlier article).
Maybe you prefer a more technical example. Well its not coincidence that HTML, XML and now JSON are tree structured data models. The representation for the exchange of state for Web systems has also discovered that trees are good.
Our next question might then be:
So what's with HDS? Why not use XML or JSON?
This is a good question - indeed if you are exchanging state with a REST service then using these representations is nearly obligatory. However, in ROC we have more freedom, most notably because we are not constrained by whatever language/data model the browser imposes. In ROC our endpoints are pure and located within the ROC domain, therefore we can use the best possible representation model...
Desirable Properties for ROC Representations
Having established that trees are wonderful representations from a system perspective we must also acknowledge that one size does not fit all. There are also challenges associated with trees.
The first issue is that when we write code (our f(x) functions over the tree state), programming languages are naturally oriented to iteration. That is, sequences of operations on identically structured states. It is therefore desirable that any representation model should be able to rapidly yield Lists for efficient iterative processing.
The second issue is that we don't always know where in the structure the information is located. We can search for it (the initial requirement when interacting with graphs) but searching imposes a significant processing overhead. The best thing would be if any given value in the structure had a unique name. Indeed this is precisely what a Map provides - a key-to-value structure that has constant time usage cost. Trees don't seem much like Maps do they?
Finally, sometimes one tree is not enough. Sometimes its useful to be able represent a set of trees: this structure is called a Forest. Neither XML nor JSON support forests.
Interestingly, a forest of single root trees is actually a List. A forest of linearly branched trees is an array, and carrying on branching we can construct multi-dimensional arrays. All highly iterable.
Lastly we also have some requirements that are important for an ROC system.
It is very desirable that a representation model be immutable - the reasons are well understood and are the same for any highly concurrent system: immutability ensures that parallel processing is side-effect free and requires no locking.
A resource can have any number of representations so it is very important that any representation structure does not impose any restriction upon the type of representation that it may hold. While somewhat flexible, XML and JSON have a limited set of supported types which is over-constraining on our representation choices.
With all of these considerations we start to see that neither XML nor JSON stand-up. They are indeed mostly fine for REST services but with what we know about Resource Oriented state they are unsatisfactory for ROC solutions.
It was with this context, that we developed HDS.
Properties of HDS
The HDS representation model is very simple:
- HDS consists of named nodes.
- Each node may have a value and this may be an arbitrary object.
- Each node may have any number of child nodes.
Supported Structures
- An HDS with a single node is just a named value object.
- An HDS with a single node with any number of child nodes (with or without further children) is a tree structure.
- An HDS with several nodes and no children is a list.
- An HDS with several nodes with or without children is a forest.
XPath Vector Locations
Since every node in an HDS has a name, then every location has associated with it a unique vector (the step-wise path that can be followed to reach the node).
- HDS structures can be very rapidly navigated by expressing an XPath (vector) for a node.
- HDS structures can rapidly yield lists by expressing an XPath for a set of nodes.
- HDS structures behave exactly like maps in that every XPath vector *is* the distinct name of a given node (or nodes).
- HDS caches XPaths which, as with a map, yields constant time lookup for repeated operations
- HDS supports inversion - that is finding nodes by their values. Keys (a named XPath vector for a set of node values) may be specified and a node (or nodes) in the key-set found from its value.
Immutability
- HDS is immutable
- HDS can be evaluated using a Reader providing threadsafe concurrent access to the structure
- HDS can be created/modified using a Mutator
- Obtaining a mutator from an existing HDS yields a new mutable clone and the existing HDS is untouched.
- Mutators support iterative construction (step wise addition of nodes with push/pop navigation of the structure).
- Mutators support XPath vector construction of structures (instant tree - just add values!)
- Mutators permit restructuring operations - nodes can be moved, nodes can be copied, nodes can be deleted
- Mutating restructure operations are expressed using XPath vectors without the need to navigate.
- Mutators support an underlying cursor which provides relative context for operations - mutators therefore support either global (rooted XPath vector) or local (relative XPath vector) operations.
Structural Operations
- Both Readers and Mutators support full XPath indexing and value based selection.
- XPaths supports a rich set of functions evaluated over the node names or values
- XPath statements on an HDS may also yield aggregate/meta values from the HDS - for example, you can retrieve a count of all the nodes named "foo"
- XPath supports named contextual axes: children, parent, ancestor-of etc
- XPath supports logical set concatenation including negative membership ( not(...) )
Representations
- HDS can be serialized for wire-transfer between systems
- HDS can be serialized to an XML represetation
- HDS can be constructed from XML
- HDS to and from JSON is coming soon!
In summary, HDS is the distillation of all our experience in creating ROC solutions to provide a representation model with the flexibility, ease of use and change invariant characteristics that the ROC abstraction makes possible.
HDS by Example: Getting Started
Here is an example of the interplay between the primary classes in the HDS model. First we construct a new document, notice that the constructor immediately provides a Mutator since what use is an empty document?
import org.netkernel.mod.hds.* //Create a new document and get its IHDSMutator IHDSMutator m = HDSFactory.newDocument() //Add some nodes m.pushNode("a").pushNode("b","B") //Obtain the immutable IHDSDocument from the mutator IHDSDocument d=m.toDocument(false) //Obtain a reader for the IHDSDocument IHDSReader r=d.getReader() //Read the value of the b child node of a. String value=r.getFirstValue("/a/b") //Return the value (which should be "B") context.createResponseFrom(value)
Secondly we add some nodes by using the pushNode() method. At this point our HDS looks like this (this is an example of the result of a toString() on an HDS - the null: shows that there in every HDS there is an anonymous root node - sort of a hidden root which can never be accessed)
null: a: b: B
Thirdly we "fix" the mutable HDS structure into an immutable IHDSDocument by calling toDocument(). The boolean flag determines if the fixation should in addition clone the HDS. It can be useful to split off an independent immutable copy before continuing further mutation operations.
Lastly we obtain a threadsafe IHDSReader from the IHDSDocument. The Reader allows us to access the structure to interrogate values and/or evaluate structural XPath expressions. Notice that the HDS cannot be modified and so lock-free concurrency is supported.
Compositing HDS Structures
In ROC we frequently find we want to construct a representation as the composition of one or more other resources (representations). HDS is very easy to "slap together" (to quote the technical term favoured by Mr B Sletten).
Here's an example that combines two HDS structures...
import org.netkernel.mod.hds.* //Create d1 m = HDSFactory.newDocument() //Add some nodes IHDSDocument d1=m.pushNode("a").pushNode("b","B").toDocument(false) //Create d2 m = HDSFactory.newDocument() IHDSDocument d2=m.pushNode("c").pushNode("d","D").toDocument(false) //Mash d2 into d1.. //Get the root "c" node of d2 c=d2.getReader().getFirstNode("/c") //Get a mutator for d1 m=d1.getMutableClone() //Select a context node /a/b and append c (and its children) m.getFirstNode("/a/b").append(c) //Return the value of /a/b/c/d - should be "D" context.createResponseFrom(m.getFirstValue("/a/b/c/d"))
Here we construct d1 that looks like
null: a: b: B
We construct d2 that looks like
null: c: d: D
We then obtain a mutatable clone of d1, obtain a reference to the root /c node of d2. Finally we use append to attach the c-sub-tree onto the /a/b branch. The result is the expanded tree...
null: a: b: B c: d: D
Sometimes we don't want to compose a tree but just need to group them together in a Forest...
import org.netkernel.mod.hds.* //Create d1 m = HDSFactory.newDocument() //Add some nodes IHDSDocument d1=m.pushNode("a").pushNode("b","B").toDocument(false) //Create d2 m = HDSFactory.newDocument() IHDSDocument d2=m.pushNode("c").pushNode("d","D").toDocument(false) //Get a mutator for d1 m=d1.getMutableClone() //Get the root "c" node of d2 c=d2.getReader().getFirstNode("/c") //Append the c tree to the root of d1 m.getFirstNode("/").append(c) context.createResponseFrom(m.toDocument(false))
This results in the following structure...
null: a: b: B c: d: D
Finding leaf nodes
Typically we find that all the really interesting and useful state is held in leaf nodes. We can use XPath to find and iterate over any set of nodes of interest. This example finds a set of leaf nodes and attaches them to a new HDS structure...
import org.netkernel.mod.hds.* //Create d1 IHDSDocument d1=HDSFactory.newDocument() .pushNode("a") .addNode("b","B") .toDocument(false) //Create d2 IHDSDocument d2=HDSFactory.newDocument() .pushNode("c") .addNode("d","1") .addNode("d","2") .addNode("d","3") .addNode("d","4") .toDocument(false) //Append all "d" nodes as children of "a" in d1.. //Get a mutator for d1 m=d1.getMutableClone() //Select /a as the place to append the nodes m=m.getFirstNode("/a") //Append all the d nodes... d2.getReader().getNodes("/c/d").each { d -> m.append(d).popNode() //Append onto m and then popNode to step back to parent again } //Return the count of d nodes where the value is greater than 2 (there should be 2)... context.createResponseFrom(m.getFirstValue("count(/a/d[ . > 2])").toString())
Notice that we are finding the leaf nodes of interest using the IHDSReader.getNodes(). Notice also that when we append each found node the mutator cursor is moved to the newly appended node. To ensure our nodes are attached in sequence rather than as a deeply nested hierarchy, we call popNode() on the mutator to nudge the cursor context back to "/a"
The resulting HDS looks like this...
null: a: b: B d: 1 d: 2 d: 3 d: 4
Finally, notice that we are returning this response...
m.getFirstValue("count(/a/d[ . > 2])")
This is using getFirstValue() to evaluate an XPath expression, in this case we are counting all the /a/d nodes whose value is greater than 2. So the response is 2.
Inversion using Keys
Another common pattern in ROC is to know a value but not know its location in the structure. For this we can use keys...
import org.netkernel.mod.hds.* IHDSDocument d1=HDSFactory.newDocument() .pushNode("a") .addNode("b","1") .addNode("b","2") .pushNode("c") .addNode("d", "X") .popNode() .addNode("b","3") .addNode("b","4") .pushNode("c") .addNode("d", "Y") .popNode() .toDocument(false) //Append all "d" nodes as children of "a" in d1.. //Get a reader for d1 and declare a key "DSearch" on all c nodes with child d r=d1.getReader() r.declareKey("DSearch", "//c", "d") //Find a c node whose d node has the given value "X" c=r.getFirstNode("key('DSearch', 'X')") //Return the c node context.createResponseFrom(c)
The HDS structure we create as d1 looks like this...
null: a: b: 1 b: 2 c: d: X b: 3 b: 4 c: d: Y
We get a reader and declare a key with name "DSearch"...
r.declareKey("DSearch", "//c", "d")
this key indexes the first child node d of all c nodes no matter what depth in the tree they occur.
With the key declared we can use it by name and find all relevant nodes that have a given value...
c=r.getFirstNode("key('DSearch', 'X')")
This matches on the value of d, but because the key was declared on c nodes, the result is not the d node, but the full structure of the relevant c node (whose d has value "X")...
c: d: X
The index associated with the key is cached so we have effectively turned the document structure into a constant time inverted map.
Summary
Flexible representation structures that support vector access enable very long-lived, stable and high performance solutions. HDS is what we use when we have the choice (and often even when we don't!). The second generation API has a great deal of power and elegance - if you've not tried it then its definitely worth getting familiar with.
hds-g2 can be installed from Apposite and used as a library by importing
<uri>urn:org:netkernel:mod:hds</uri>
</import>
After installation its a good idea to generate the javadoc and then you can view the full HDS G2 API here.
Please let us know how you get on...
Have a great weekend!
Comments
Please feel free to comment on the NetKernel Forum
Follow on Twitter:
@pjr1060 for day-to-day NK/ROC updates
@netkernel for announcements
@tab1060 for the hard-core stuff
To subscribe for news and alerts
Join the NetKernel Portal to get news, announcements and extra features.