Agile Renaissance


Using FitNesse to Validate Nested Business Structures

Last year while working at a client site I helped a team validate the work they were doing on one of the company's products. The product has a business structure that is nested in nature and looks like this:


The E, B, and W structures are similar in that each has a definition part and then contains one or more nested structures. There can be as many as twenty-five B objects nested inside an E object and up to a total of one hundred W objects nested inside the different B structures. The H structure is actually one of over one hundred different types, and one or more of them, usually four to eight, are contained in a W structure.

By the way, this nested business structure pre-dates computers and is well understood by the organization's staff.

The team I was working with was developing a new component that performed a core business process using information from all levels of this business structure. The development team was creating and validating its work using unit tests. However, they needed component-level tests as well. These would not only be written by members of the team but also needed to be written and executed by other members of the organization over the lifetime of the component. As the organization already had positive experience using FitNesse, it was decided that it would be used for the component-level testing. The team started by creating tables of test facts for definition values: EDef, BDef, WDef and one of the H structures, h1. Simultaneously, I created the associated fixtures, which were based on ColumnFixture. A typical table would look like this (note that it is preceded by a table that defines the import that all subsequent tables will use):

|import|
|com.client.lob.product1.edeffacts|

|edeffacts|
|val1|val2|val3|=create()|
|1|2|3|edefval1|
|4|5|6|edefval2|

Once all of the definition tables had been created, the next step was to assemble a test scenario. Doing this required developing intermediate objects, which made it a multi-step process. These objects are created in reverse order, starting with W, then B and then E. The following six tables show this process; assume that the FitNesse symbols wdefval1, bdefval1, bdefval2, h1val1 and h1val2 exist. Construction of an E, B or W object requires two tables: the first creates the object and binds in its definition, and the second binds in its nested structures. I did it this way, as opposed to having one table composed of 27 columns, because for most of the tests most of the columns would have been empty, which bothers me, and because there was no guarantee that the current limit of 25 would be kept. In the following example, the table creates two W objects, w1 and w2:

|w|
|def=|=create()|
|wdefval1|w1|
|wdefval1|w2|

The next table populates each of these W objects with an H1 object:

|w|
|w=|h=|add()|
|w1|h1val1|true|
|w2|h1val2|true|

Now to create the two B objects and then populate them with the created W objects:

|b|
|def=|=create()|
|bdefval1|b1|
|bdefval1|b2|

|b|
|b=|w=|add()|
|b1|w1|true|
|b2|w2|true|

Now to create an E object and then populate it with the created B objects:

|e|
|def=|=create()|
|edefval1|e1|

|e|
|e=|b=|add()|
|e1|b1|true|
|e1|b2|true|

Note how in the last table the e1 object appears in two rows; this is how the two B objects are added to it. The result of all of this composition is that the E object, e1, contains two B objects, b1 and b2, which in turn contain w1 and w2 respectively, which in turn contain h1val1 and h1val2 respectively. With the test scenario object created, it is now time to test that it was processed correctly. The first set of values to be validated occurs at the E level, and the test table to do this looks like this:

|validateElevel|
|e=|process()|eLevelVal1()|eLevelVal2()|eLevelVal3()|
|e1|true|e1xxx|e1yyy|e1zzz|

The processing also created values at the B and H levels, which also need to be validated. The following shows how H level values are validated:

|validateHlevel|
|e=|b=|w=|h=|wLevelVal1()|wLevelVal2()|wLevelVal3()|
|e1|b1|w1|h1val1|e1b1w1xxx|e1b1w1yyy|e1b1w1zzz|
|e1|b2|w2|h1val2|e2b2w1xxx|e2b2w1yyy|e2b2w1zzz|

Notice that to test the values computed for the H objects, h1val1 and h1val2, all of the E, B and W level objects they are nested in need to be specified. This is because H level objects can be attached to more than one W object, and this full nested specification is the only way to correctly identify the H object whose values are to be validated.
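
To make the point about full paths concrete, here is a minimal Python sketch. It is not the team's fixture code; the nested dictionary and the helper `paths_to` are hypothetical, and the scenario is a variation of the one above in which the same H test fact, h1val1, has been attached to both W objects.

```python
def paths_to(tree, target, prefix=()):
    """Yield every full path from the top of `tree` down to `target`."""
    for symbol, children in tree.items():
        path = prefix + (symbol,)
        if symbol == target:
            yield path
        # Keep descending: the same test fact may be nested elsewhere too.
        yield from paths_to(children, target, path)


# Assumed variation of the scenario: h1val1 is attached to both w1 and w2.
scenario = {
    "e1": {
        "b1": {"w1": {"h1val1": {}}},
        "b2": {"w2": {"h1val1": {}}},
    }
}

paths = list(paths_to(scenario, "h1val1"))
for path in paths:
    print(" / ".join(path))
```

Because the bare symbol h1val1 resolves to two different places in the structure, only the full E, B, W, H path says which processed values a validation row refers to.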

The team successfully used the above mechanism to validate their first external release. However, they found it clumsy to compose test scenarios, and when a defect occurred they found it confusing to trace back to the test fact involved. So they asked if there was an easier way to compose tests, one that did not involve intermediate tables. I said I didn't know, but that first we should figure out an easier way to specify a test scenario. After some dialog we came up with the following:

|e2|
|e1|edefval1|bdefval1|wdefval1|h1defval1|
|e2|edefval2|bdefval2|wdefval2|h1defval2|h2defval1|
|e3|edefval3|bdefval1|wdefval1|h1defval1|h2defval1|
|+| | |wdefval2|h1defval2|h2defval2|
|+| |bdefval2|wdefval3|h1defval3|h2defval3|h3defval1|

If the first cell of a row contains a symbol, then it starts the definition of a test scenario, which is then attached to that symbol. In the example table, the first two rows create test scenarios and attach them to the FitNesse symbols e1 and e2. The third test scenario, symbol e3, which starts in row three, is different because of the fourth row, which starts with a plus sign (+). The plus sign indicates that the test scenario, e3, continues in this row, and in fact it also continues into the fifth row. The plus sign is a continuation or concatenation operator. It was introduced so that large test scenarios, some of which contain two or more B level objects plus multiple W and H level ones, could be formatted in a way that makes them easier to read. The team did not want to have to scroll to the left to read a full test scenario, especially given that columns on the left would go out of view. All other cells in the rows contain symbols that reference test facts, each of which is to be added to the parent object.

There are two other points to draw your attention to. The first is that empty cells are permitted and are just skipped over. The second is that the type of object referenced determines how the nesting occurs. A subordinate object, for example a B object after an E, causes a new nested level to be opened. An object of the parent type, for example a W object after an H object, causes the current nested level to be closed and a peer of its parent to be opened. An ancestor element, for example a B object after an H, causes one or more levels of nesting (in this case two) to be closed and a new peer at that ancestor level to be opened. This fluid format allows the team to compose nested test scenarios without having to specify intermediate objects like B or W.

The one downside to this approach is that it is not self-describing, unlike most other fixtures, such as ColumnFixture with its column headers. This did not turn out to be a problem, for two reasons. First, once the grammar is explained to someone who understands the business object, they can easily read and create test scenarios using it. Second, the names of the symbols matched the business domain, and most people were able to deduce the structure from that.
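
The nesting rules can be sketched in a few lines of Python. This is an illustration of the grammar only, not the fixture the team used: `level_of` is a hypothetical stand-in that guesses the object type from the symbol's name, whereas a real fixture would look up the test fact the symbol refers to.

```python
LEVELS = {"E": 0, "B": 1, "W": 2, "H": 3}


def level_of(symbol):
    """Hypothetical lookup: infer the object type from the symbol's name."""
    for prefix, kind in (("edef", "E"), ("bdef", "B"), ("wdef", "W"), ("h", "H")):
        if symbol.startswith(prefix):
            return LEVELS[kind]
    raise ValueError(f"unknown test fact: {symbol}")


def parse_scenarios(rows):
    """Turn table rows into {scenario symbol: (fact, [children])} trees."""
    scenarios = {}
    name = None
    stack = []  # stack[i] is the currently open node at nesting level i
    for row in rows:
        first = row[0].strip()
        if first != "+":
            # A symbol in the first cell starts a new test scenario.
            name, stack = first, []
        elif name is None:
            raise ValueError("continuation row before any scenario")
        for cell in row[1:]:
            fact = cell.strip()
            if not fact:
                continue  # empty cells are permitted and skipped
            level = level_of(fact)
            node = (fact, [])
            del stack[level:]  # close this level and anything nested deeper
            if level == 0:
                scenarios[name] = node
            elif len(stack) < level:
                raise ValueError(f"{fact} has no open parent")
            else:
                stack[-1][1].append(node)  # stack[-1] is now the parent
            stack.append(node)
    return scenarios


rows = [
    ["e3", "edefval3", "bdefval1", "wdefval1", "h1defval1", "h2defval1"],
    ["+", "", "", "wdefval2", "h1defval2", "h2defval2"],
    ["+", "", "bdefval2", "wdefval3", "h1defval3", "h2defval3", "h3defval1"],
]
scenarios = parse_scenarios(rows)
```

The single `del stack[level:]` covers all three cases described above: for a subordinate object it removes nothing, for a peer or parent type it closes one level, and for an ancestor type it closes several.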

With the desired format specified, the next step was to see if a fixture that supported it could be built. To do this I examined the fixture types supplied with FitNesse and also looked around on the web to see if there was anything like it. I found nothing, so I set out to create one. The base class used by all fixtures is called, unsurprisingly, Fixture, and it is found in the Fit system. Upon examination I found that this class provides all of the functionality needed to create the new fixture, and that to build it all I had to do was override its public method doTable. Once the new test scenario assembly fixture was created, the team re-wrote their existing tests using it in about 45 minutes, and in doing so eliminated three sets of tables. After making sure that these tests still ran correctly, they were ready for more. They started by pointing out that only 1 of the H objects had been implemented and that the remaining 99 were not exactly like the first one, as many of them had other test value types. This meant that using the existing approach would require the addition of more columns and that the validation rows would contain many empty cells, which did not appeal to them. They also wanted to use the more fluid approach developed in the test scenario assembly fixture to specify the validation conditions.

Before I describe the solution we came up with, I want to point out that validation can occur at all levels and to do it requires solving three problems:

  1. Specifying the object to be tested. For example, an H object is located in a W object, which is located in a B object, which is located within the E object.
  2. Specifying which property of the object is to be validated.
  3. Specifying the expected value.

After another design session with the team we came up with the following solution which merges the validation into the test scenario composition (I have added green and red cell formatting to give you a flavour of what it looks like when the FitNesse test has been executed):

|validateE|
|e1|eLevelVal1 == e1xxx|eLevelVal2 == e1yyy|eLevelVal3 == e1zzz|
|+|bdefval1|wdefval1|wLevelVal1 == e1b1w1kkk|
|+| | |h1defval1|hLevelVal1 == e1b1w1h1xxx|hLevelVal2 == e1b1w1h1yyy|hLevelVal3 == e1b1w1h1zzz|
| |
|e2|eLevelVal1 == e2xxx|eLevelVal2 == e2yyy|eLevelVal3 == e2zzz|
|+|bdefval2|wdefval2|wLevelVal1 == e2b2w2kkk|
|+| | |h1defval2|hLevelVal1 == e2b1w1h1xxx|hLevelVal2 == e2b1w1h1yyy|hLevelVal3 == e2b1w1h1zzz|
|+| | |h2defval1|hLevelVal1 == e2b1w1h2xxx|hLevelVal7 == e2b1w1h2yyy|
| |
|e3|eLevelVal1 == e3xxx|eLevelVal2 == e3yyy|eLevelVal3 == e3zzz|
|+|bdefval1|wdefval1|wLevelVal1 == e3b1w1kkk|
|+| | |h1defval1|hLevelVal1 == e3b1w1h1xxx|hLevelVal2 == e3b1w1h1yyy|hLevelVal3 == e3b1w1h1zzz|
|+| | |h2defval1|hLevelVal1 == e3b1w1h2xxx|hLevelVal7 == e3b1w1h2yyy|
|+| |wdefval2|wLevelVal1 == e3b1w2kkk|
|+| | |h1defval2|hLevelVal1 == e3b1w2h1xxx|hLevelVal2 == e3b1w2h1yyy|hLevelVal3 == e3b2w1h1zzz|
|+| | |h2defval2|hLevelVal1 == e3b1w2h1xxx|hLevelVal2 == e3b1w2h1yyy|
|+|bdefval2|wdefval3|wLevelVal1 == e3b2w3kkk|
|+| | |h1defval3|hLevelVal1 == e3b2w3h1xxx|hLevelVal2 == e3b2w3h1yyy|hLevelVal3 == e3b2w3h1zzz|
|+| | |h2defval3|hLevelVal1 == e3b2w3h2xxx|hLevelVal7 == e3b2w3h2yyy|
|+| | |h3defval1|hLevelVal1 == e3b2w3h3xxx|hLevelVal8 == e3b2w3h3yyy|hLevelVal27 == e3b2w3h3zzz|

There is a lot going on in this table:

  1. The fluid test scenario composition style was kept in this expanded form.
  2. The first cell of a test scenario, the one containing the symbol to which the test scenario is bound, is marked green. This indicates that the processing of the scenario occurred correctly, i.e. that connection to the system-under-test was successful, that transmission and reception of messages was successful and that no process execution errors were flagged. If there had been an error in any of these steps, the cell would have been marked red.
  3. Cells that contain the pattern 'string == string' specify an equality test. They are associated with the object that they follow, be it an E, B, W or H one. This association makes it easy to see which property on which object is being validated. The left-hand string is the name of the property associated with the current object, whose value is to be validated. The right-hand string specifies the expected value. Note that when a test fails, the received value is also written into the cell so that the user can see the difference between the two values. The property string is dynamically evaluated by the fixture. As it turns out, this reduces the amount of maintenance programming needed to support the fixture, as new H objects and their associated properties can be added to the system-under-test without the need to do any new fixture programming.
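
The dynamic evaluation of a 'string == string' cell can be sketched as follows. This is a Python illustration under assumptions, not the fixture's actual code: the function name `check_cell`, the use of attribute lookup by name, and the wording of the failure text are all hypothetical.

```python
from types import SimpleNamespace


def check_cell(obj, cell):
    """Evaluate one 'property == expected' cell against the current object.

    Returns (passed, text), where text is what the cell would then display.
    """
    prop, sep, expected = (part.strip() for part in cell.partition("=="))
    if not sep:
        raise ValueError(f"not a validation cell: {cell!r}")
    # The property name is resolved at run time, so a new property on the
    # system-under-test needs no new fixture code.
    actual = str(getattr(obj, prop))
    if actual == expected:
        return True, cell
    return False, f"{cell} (received: {actual})"


# Stand-in for a processed H object returned by the system-under-test.
h_object = SimpleNamespace(hLevelVal1="e1b1w1h1xxx", hLevelVal2="something else")

passed = check_cell(h_object, "hLevelVal1 == e1b1w1h1xxx")
failed = check_cell(h_object, "hLevelVal2 == e1b1w1h1yyy")
```

A passing cell would be marked green and left unchanged; a failing cell would be marked red and have the received value appended.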

This fluid composition style for handling their nested business structure was well received by the team. I thank them for their co-operation and support and for their insistence that there must be a better way to validate the nested business structure; this was an interesting challenge and I like the solution we came up with. So, gentle reader, I am interested to hear how others have tackled this type of problem with FitNesse or with some other testing tool, and whether this technique is one that others might use or how it can be improved. I would also like to thank the two developers who helped me implement these fixtures.

Posted by Norbert Winklareth at 4/30/2009 10:13 PM
TDD Coaches Should Not Pair with their Coachees

This summer I had the opportunity to coach some C++ developers in how to do Test-Driven Development (TDD). The workspace assigned to these developers made pair programming almost impossible. Rather than fight the physical limitations, I decided to coach them without using the keyboard. To my surprise, coaching this way was both easier and more effective than I thought it would be. Coaching TDD is about form: is the coachee following the methodology; where are they having difficulties; how is their rhythm; are they partitioning the problem as finely as they need to; are they refactoring; and so on. When you don't pair, when you don't have the ability to just grab the keyboard, you are free to focus entirely on the coachee's form.

I also noticed that the coachee sets the pace of the work. The coachee is doing it all and so they learn it at their own pace. There is also no expert magic involved in the outcomes. Every person that I coached extensively (there were some with whom, due to scheduling, I only had one session) reached a point of puzzlement and amazement. During a review of their work, they would look at their code and their tests in a form of disbelief, coupled with knowing that they had written all of it and that the coach had not made the code magically appear. When this happened, they knew the value of TDD.

I also discussed this coaching style with a number of the coachees, and the majority stated that they preferred it and found it more effective than the pairing approach. So the next time you find yourself coaching someone in TDD, consider not grabbing the keyboard; focus on the coachee's form and let them learn TDD directly and at their own pace.

Posted by Norbert Winklareth at 11/25/2007 5:02 PM
Links for 2007-07-11 Part 2

Posted by Norbert Winklareth at 7/11/2007 12:57 PM
Links for 2007-07-11 Part 1

I let doing this on a regular basis slip, so here comes a little rush of links I liked or found interesting, enjoy!

Posted by Norbert Winklareth at 7/11/2007 12:02 PM
Links for 2007-05-27
  • Challenging why (not if) Scrum works
    Alan Shalloway argues that the structure provided by Scrum, which is really just applying Little's Law, is sufficient to improve an organization's productivity. He does not argue that it will provide maximal benefits which you would get if you would also apply the humanistic aspects of Agile/Lean philosophy. I agree with him.
  • Design verification capability
  • 7 faces of leadership
    I like it when people try to distill an open-ended, broad activity down to a few principles. It gives you a view of the person's biases and belief systems.
  • PMI Community Post
    The title is misleading as it refers to what inspired the post. The post is about accounting for and handling variance in projects, which is in my opinion the least understood of the 4 fundamental underlying issues of product development or project development.

Posted by Norbert Winklareth at 5/27/2007 3:20 PM
Links for 2007-05-25

Posted by Norbert Winklareth at 5/25/2007 8:44 AM
Links for 2007-05-23

Posted by Norbert Winklareth at 5/23/2007 4:36 PM
Links for 2007-05-22

Posted by Norbert Winklareth at 5/22/2007 1:45 PM
Links for 2007-05-12
Posted by Norbert Winklareth at 5/12/2007 4:15 PM
On the Semantic Distance between Input & Output, Problem Partitioning & Complexity

I first used the term Semantic Distance as a way of discussing the distance between the execution capabilities of a programming language and the execution requirements of the solution. Reg Braithwaite took this concept and expanded on it by looking at "how well programs are written for human consumption"; this is yet another in a long list of excellent posts from Reg (just call me fanboy). His observation linking comprehensibility of the program to "the form of the code matches the data the code generates" is the same as how my colleagues and I at OmniMark Technologies used to judge how good an OmniMark program was: by how much its structure matched the structure of the solution. There is, however, more to Semantic Distance, in particular its relationship to input and output, problem partitioning and complexity. In this post, I want to explore this by recounting how I first learned about Semantic Distance and how to overcome it, and then briefly explore its implications for dealing with complexity.

Back in the fall of 1990, my friend Sam Wilmott used OmniMark to convert a document, an Intergraph document (if I remember correctly), to one based on a specific SGML DTD. It took him about 2 weeks to write the conversion program. Afterwards, he was unhappy with his solution, so he decided to re-write it, this time as two programs coupled together using the OmniMark co-routine mechanism context-translate. I had naively thought that it would take him another 2 weeks, but he was done in 3 1/2 days. Boy were he and I surprised. A little while later, I again observed this non-linear decrease in the amount of development time when writing a solution for a second time using the co-routining model. Then it happened again and again, and it was also observed by some of my other colleagues and some of our customers. This phenomenon delighted me. It also bothered me, because I could not find an explanation for why it was happening. So I started trying to work one out myself.

My first attempt was to attribute this non-linear decrease in development time to the reduced learning curve when you solve a problem for a second time. However, at best, this explanation would only account for some percentage of the time difference and in some cases the programmers were already knowledgeable about the input and output formats which meant that the learning curve time difference between the two solutions was essentially zero. My first clue towards a better explanation was the observation that this decrease in development time happened independent of the source and target languages but that the amount of the decrease in development time did vary.

Specifically, I noticed a difference in time decreases between different translations into the same CALS DTD. In other words, the target language was the same but the source languages were different, and the time taken to develop their respective solutions was also different. To capture this observation, I started drawing two points in space and labeled the left dot L1, the source language, and the other dot L2, the target language. I then drew a straight line between them and said that this was the syntactic distance between the lexical units of the two languages. This model was too simplistic. It could not explain the non-linear decrease in time caused by introducing an intermediate language, L1i, between L1 and L2. Another issue with it was that I had observed in one case that it took more time to write a program for two lexically similar languages than it took for a program between two lexically very dissimilar languages. This was clue number two.

When I looked at the two sets of languages, I observed that the two lexically dissimilar languages in fact had similar semantics and the two lexically similar languages had very different semantics. To capture this notion of semantic difference, I drew a curve over top of the line connecting the two dots. I then reasoned that the area under the curve was where the semantics lay. Assume for now that the curve is a 180-degree arc of a circle; the area under the curve is then

(PI * ((L2 - L1)/2)**2) / 2

When you introduce an intermediate language between L1 and L2, you end up with two smaller semi-circles whose combined area is less than the area of the initial semi-circle. This difference in area is due to the use of the power function, square, in the formula for the area of a circle. This model shows the non-linear decrease in time that was being observed. However, the magnitude of the difference between the areas is not as large as what we were observing. So, on the suggestion of Mark Baker, I now use spheres when presenting this model. The volume of a sphere is calculated with the power function, cubed, which more closely approximates the observed decreases in time.
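
As a quick check of the arithmetic, here is a small Python sketch that compares the direct cost with the cost after one intermediate language. The function names and the `split` parameter (where the intermediate language sits between L1 and L2, as a fraction of the distance) are my own illustrative choices and not part of the original model.

```python
import math


def semicircle_area(distance):
    """Area under a 180-degree arc whose diameter is the distance L2 - L1."""
    return math.pi * (distance / 2) ** 2 / 2


def sphere_volume(distance):
    """Volume of a sphere whose diameter is the distance L2 - L1."""
    return 4 / 3 * math.pi * (distance / 2) ** 3


def partition_ratio(measure, total, split=0.5):
    """Cost with one intermediate language, as a fraction of the direct cost."""
    first, second = total * split, total * (1 - split)
    return (measure(first) + measure(second)) / measure(total)


area_ratio = partition_ratio(semicircle_area, 10.0)
volume_ratio = partition_ratio(sphere_volume, 10.0)
print(f"semi-circles: {area_ratio:.2f} of the direct cost")
print(f"spheres:      {volume_ratio:.2f} of the direct cost")
```

With the intermediate language placed at the midpoint, the two semi-circles together cover half of the original area, while the two spheres together hold a quarter of the original volume, which is why the spherical version predicts the larger saving.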

Implications

  1. Semantic Distance is actually more "volumetric" than scalar in nature.
  2. Breaking up the Semantic Distance between the Languages L1 and L2, into two smaller distances by introducing an intermediate language is problem partitioning. The effect of this partitioning is non-linear in nature.
  3. The amount of complexity associated with translating between input and output is proportional to the Semantic Distance between input and output.

Problems with the Model

Here are the major issues with this model that I have identified:

  1. The first weakness is a variation of Zeno's Paradox about Achilles and a tortoise. The model predicts that if partitioning were recursively applied, then there should be no work to be done, which is equivalent to the argument that Achilles can never pass the tortoise. Clearly, this is not the case. In fact, there are aspects of the Semantic Distance that cannot be sub-divided; they are atomic in nature. These atomic "aspects" are sometimes completely orthogonal to any other aspects of the Semantic Distance between the input and output languages. This means that Semantic Distance is in fact n-dimensional in nature. And yes Reg, this means that there cannot be one language which has the shortest Semantic Distance between its capabilities and those needed to handle the n-dimensional nature of the Semantic Distance between any possible pair of input and output languages.
  2. As my friend Jacques Legare pointed out, the model does not show or prove that there is a relationship between Semantic Distance, complexity and implementation effort. To me there has to be; it is the only thing that makes sense to me. But showing it, proving it, is an open question. I am hoping that some reader will be able to help answer this question, one way or the other.
  3. As nice as I find the explanation, and while it seems correct to me, I wonder if there is a better or more correct and accurate explanation?

Two Bits of Advice on Writing Conversion Programs I Learned from Sam Wilmott

When translating from one language to another, create the first intermediate language close to the input rather than the output language. Many times it is enough to just normalize the input, that is, to convert the raw and messy input into the format that a perfect author would have created. This often, sometimes with a little bit of design, eliminates most of the corner cases and makes translating it into the output straightforward. The first output to create when using this technique should be a stylized human-readable version, which captures as much information from the input as possible. This readable version provides you with three things: visibility or feedback, easily testable output and easier maintenance. Readability and testability help to make sure that you are translating to the normalized form correctly and quickly, and the same applies if maintenance is required. Once you have the conversion completed, you can convert the readable format into the required output format. If conversion to and from the readable output becomes a bottleneck, you just have to replace the output statement in the first program.
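
Here is a minimal sketch of this advice using Python generators as a loose stand-in for two coupled programs. Everything in it is hypothetical and chosen only for illustration: the messy "key: value" input, the normalized form and the tagged output are not the formats Sam worked with.

```python
def normalize(lines):
    """Stage 1: rewrite messy input as a 'perfect author' would have typed it."""
    for raw in lines:
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # drop blank or malformed lines here, once
        # Readable intermediate form: lower-case key, single spaces in value.
        yield f"{key.strip().lower()}: {' '.join(value.split())}"


def to_output(lines):
    """Stage 2: convert the normalized form; no corner cases are left."""
    for line in lines:
        key, value = line.split(": ", 1)
        yield f"<{key}>{value}</{key}>"


messy = ["  Title :  Semantic   Distance ", "", "AUTHOR:A. Writer"]

normalized = list(normalize(messy))      # human-readable and easy to test
converted = list(to_output(normalized))  # final format
print("\n".join(normalized))
print("\n".join(converted))
```

The intermediate list can be inspected and tested on its own, and the second stage stays trivial because all of the irregularities were dealt with in the first.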

To eliminate problematic complexity between different Semantic Distance axes, introduce intermediate languages which each deal with changing only one of the Semantic axes at a time.

Summary

To me, the neat things about this journey were learning to use problem partitioning to reduce the complexity and time needed to overcome the Semantic Distance between input and output, and coming to understand just how powerful and wise the age-old advice is that you conquer large problems and tasks by breaking them down into smaller ones.

Posted by Norbert Winklareth at 5/10/2007 4:56 PM