William C. Wake's Blog

June 2005 Archives


Fit code, part 8 of 8 - RowFixture

Posted by wwake on June 25, 2005 at 09:08 AM | Permalink | Comments (0)

A RowFixture is used to test that a set of items is as expected. The fixture flags surplus or missing items. They look like this:
MyRowFixture
first     | last         | status()
Alexander | the Great    | ok
Alexander | the Mediocre | unknown
Winnie    | the Pooh     | lagging
Each row represents a domain object of some sort. The columns have inputs and outputs, as for ColumnFixtures.

Data and Abstract Methods

First, I'll note that this fixture extends ColumnFixture. This lets it pick up bind() and check(). The former handles the "header" row; the latter makes sure execute() is called. But due to the way the overrides happen, that method is called under different circumstances than for ColumnFixture. I don't see anything that will call reset() on a per-row basis.

The fixture holds three bits of data: an array containing the results of the query, a list of surplus items, and a list of missing items. From showing usages, I see that the list of surplus items is a list of domain objects; the missing list is a list of Parses.

The first abstract method is query(). It is responsible for producing an array of the "actual" results. The second abstract method is getTargetClass(). It returns the class object representing the type of the row. It's abstract for an interesting reason: the parent class ColumnFixture defines that method to return the type of the fixture itself. That would just lead to weird errors. By making it abstract, it forces the user to override it.

This is an interesting twist - usually my abstract methods are at the top of the hierarchy, and may get filled in along the way down. In this case, the method is becoming abstract in the middle.
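Java does allow this: a subclass can re-declare an inherited concrete method as abstract. Here is a minimal sketch of the shape. The three class names are stand-ins I made up for illustration; they are not Fit's classes, and the String row type is arbitrary.

```java
// Stand-in classes, not Fit's: they only illustrate the language feature.
class BaseFixtureSketch {
    // Default answer: the row type is the fixture's own class.
    public Class<?> getTargetClass() {
        return getClass();
    }
}

abstract class RowStyleSketch extends BaseFixtureSketch {
    // Re-declared abstract: the inherited default is hidden,
    // so every concrete subclass has to supply its own answer.
    @Override
    public abstract Class<?> getTargetClass();
}

class PersonRowsSketch extends RowStyleSketch {
    @Override
    public Class<?> getTargetClass() {
        return String.class; // hypothetical row type
    }
}

class AbstractInTheMiddleDemo {
    public static void main(String[] args) {
        System.out.println(new BaseFixtureSketch().getTargetClass().getSimpleName());
        System.out.println(new PersonRowsSketch().getTargetClass().getSimpleName());
    }
}
```

Leaving getTargetClass() out of PersonRowsSketch would be a compile error, which is exactly the forcing effect described above.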

In a sense, that happens because RowFixture and ColumnFixture have a slightly strained relationship. Maybe I'm just not getting why the latter is an example of the former; it feels like the inheritance is more for implementation than anything.

doRows() - The Overall Algorithm

The main algorithm is in doRows(): bind the columns (à la ColumnFixture), run the query() to get a list of actuals, run a match(), then add rows for any surplus or missing items.

Along the way, this method calls two overloaded list() methods: one for making a list of Parses, the other for making a list of objects. This parallel structure (methods for each of the two main data structures) continues across the class.

Method doRows() calls buildRows() to add in new rows for the surplus values. This method works by building up a "fake" head of a parse chain, then adding each item to the last one in the list. In the end, it throws away the head and returns the interesting part of the list, which gets attached to the end of the table. This seems like a little pattern worth remembering if I ever need to add rows to a fixture.
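The "fake head" trick is easy to forget, so here is a small sketch of it. RowNodeSketch is a hypothetical stand-in for a row in a parse chain (its "more" field plays the role of the link to the next row); none of this is Fit's actual code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a row in a parse chain; "more" is the next row.
class RowNodeSketch {
    final String text;
    RowNodeSketch more;

    RowNodeSketch(String text) {
        this.text = text;
    }
}

class DummyHeadDemo {
    // Builds a chain from the items and returns its first real node
    // (null when there are no items).
    static RowNodeSketch buildRows(List<String> items) {
        RowNodeSketch head = new RowNodeSketch("fake head"); // never returned
        RowNodeSketch last = head;
        for (String item : items) {
            last.more = new RowNodeSketch(item);
            last = last.more;
        }
        return head.more; // throw away the fake head
    }

    public static void main(String[] args) {
        RowNodeSketch row = buildRows(Arrays.asList("a", "b", "c"));
        for (; row != null; row = row.more) {
            System.out.println(row.text);
        }
    }
}
```

The fake head means the loop never needs a special case for "is this the first item?".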

match() - The Heart

The match() routine is a recursive algorithm. Given the list of expected items, the list of computed items, and a column to start looking in, it figures out what matches and what's missing or surplus.

Since this algorithm looks complicated, I'm going to start by just looking around. First, what are the places that call it? By doRows() certainly, since that's how we got here. Then it's called recursively at two places inside match() itself. The good news is we don't have some sort of pair of mutually recursive methods.

Recursive algorithms have a base case and a recursion case. The recursive case here is just incrementing column, and passing along lists. The column is always incrementing, and the first if says that if we've exceeded the number of columns, we should just do a check on the lists. That makes it look like we'll always terminate: we either increment column, in which case we'll eventually stop, or we do something that doesn't recurse, which will stop as well. (Or rather, if it doesn't, it won't be because of the recursion here.)

The other thing to look at on these recursive calls is the lists. We know the column gets bigger - do the lists get smaller? One case passes on the originals, so we know it's no bigger. The other is trickier - I see things that seem to indicate that the lists will shrink (tests for 0 or 1 item), but it's not obvious that it must be so without a little digging.

So, from the top of match(): the first case we mentioned before - if we're past the number of columns, do a check() on the current lists. (We'll come back to that method later.) The second case says that if the current column binding is null, move on to the next column. The third and final case is where the meat is: we're in a column in the middle, trying to match.

So, we build up two maps: one for the expected, one for the computed ("actual"). Each map goes from a value to the list of items that have that value at the chosen column. We pull out the keys from both maps, and work our way through them. Here, there are four interesting cases:

  1. The expected list is empty - we have a value found only in the computed list, so add it to the surplus list.
  2. The computed list is empty - we have a value found only in the expected list, so add it to the missing list.
  3. There is only one value in each list, and they have the same key (by how we got here) - check this row (expected vs. computed).
  4. Finally, both lists have more than one item with the same key value - recurse, but only on the list of items with this same key value. (Now I can see that I'm recursing on a list that's no bigger. It could be the same size if all the keys are the same.)
I'm left to wonder - does this work as a set or a multi-set? That is, can we have completely duplicate rows if the "same" object is in the list twice? I'll come back to this.

eSort() and cSort()

There are two "sort" routines, one for expected and one for computed. They're pretty similar, so I'll describe them together. (I don't get why they're a sort of any sort, though.)

Each routine produces a Map, from key values to a List of Objects (either domain objects or Parses). The bin() method takes care of putting items in the map. That method expands an Array into a List; the RowFixtureTest mentions a bug in that neighborhood and I suspect this is to address that.

The sort() methods handle exceptions and rows with no value in a particular cell.

Back to the Algorithm

I think I understand what's going on enough to put it into words now. To make a match, we start in a given column. Each list gets divided into buckets, based on the value of the cell at that column. If buckets have 0 or 1 item, then we have a good enough match. Otherwise, we'll look at more columns for those items. Eventually, we'll run out of items or columns.
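To check my understanding, here is a sketch of that bucketing idea. It is not Fit's code: rows are plain String arrays, both sides use the same representation, null column bindings are left out, and the pairing step is a loop rather than a recursion. All the names (MatchSketch, bin, compared) are mine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: rows are String arrays, one cell per column.
class MatchSketch {
    final List<String[]> missing = new ArrayList<>(); // expected, never computed
    final List<String[]> surplus = new ArrayList<>(); // computed, never expected
    final List<String> compared = new ArrayList<>();  // pairs to check cell by cell

    void match(List<String[]> expected, List<String[]> computed, int col, int columns) {
        if (col >= columns) { // out of columns: pair up whatever is left
            check(expected, computed);
            return;
        }
        Map<String, List<String[]>> e = bin(expected, col);
        Map<String, List<String[]>> c = bin(computed, col);
        Set<String> keys = new TreeSet<>(e.keySet());
        keys.addAll(c.keySet());
        for (String key : keys) {
            List<String[]> es = e.get(key);
            List<String[]> cs = c.get(key);
            if (es == null) {
                surplus.addAll(cs);              // value only in computed
            } else if (cs == null) {
                missing.addAll(es);              // value only in expected
            } else if (es.size() == 1 && cs.size() == 1) {
                check(es, cs);                   // unique on both sides
            } else {
                match(es, cs, col + 1, columns); // still ambiguous: next column
            }
        }
    }

    // Pairs rows in order; leftovers on either side are missing or surplus.
    void check(List<String[]> expected, List<String[]> computed) {
        int n = Math.min(expected.size(), computed.size());
        for (int i = 0; i < n; i++) {
            compared.add(String.join(" ", expected.get(i))
                    + " vs " + String.join(" ", computed.get(i)));
        }
        missing.addAll(expected.subList(n, expected.size()));
        surplus.addAll(computed.subList(n, computed.size()));
    }

    // Groups rows by the value they hold in the given column.
    static Map<String, List<String[]>> bin(List<String[]> rows, int col) {
        Map<String, List<String[]>> map = new HashMap<>();
        for (String[] row : rows) {
            map.computeIfAbsent(row[col], k -> new ArrayList<>()).add(row);
        }
        return map;
    }

    static MatchSketch demo() {
        List<String[]> expected = Arrays.asList(
                new String[] {"Alexander", "the Great"},
                new String[] {"Alexander", "the Mediocre"},
                new String[] {"Winnie", "the Pooh"});
        List<String[]> computed = Arrays.asList(
                new String[] {"Alexander", "the Great"},
                new String[] {"Winnie", "the Bear"},
                new String[] {"Tigger", "the Tiger"});
        MatchSketch m = new MatchSketch();
        m.match(expected, computed, 0, 2);
        return m;
    }

    public static void main(String[] args) {
        MatchSketch m = demo();
        System.out.println("compared: " + m.compared);
        System.out.println("missing: " + m.missing.size() + ", surplus: " + m.surplus.size());
    }
}
```

In the demo data, "Winnie" is unique in the first column on both sides, so the two Winnie rows get paired for checking even though their second cells differ. "Alexander" is ambiguous, so only those rows go on to the second column.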

check()

The last interesting bit is around the check() routine. It goes through and checks the columns one at a time, using the normal TypeAdapter facilities. The routine is recursive: it peels off the front of each list, recursing until one or both lists is empty.

Leftovers and Learnings

I had a question about whether it acted like a multi-set or a set. It looks like it's basically multi-set-like, from a simple test with a list of integers.

The other big thing for me to wonder is how I'd have done a similar fixture. I think I'd have expected a simple set. The problem is, that's fine for the query() side, but not so good for the "expected" side: how would you get from those values to construct the objects to compare as sets? (Knowing the contents of an object's fields and return values from its methods doesn't tell you how to construct it.)

Another alternative would be to get the query values, and match each one up against the rows. The naive algorithm for this is a little slow (n^2). It might be a bit simpler. I suspect its report wouldn't be as nice.

The current algorithm is able to take advantage of partial matches - if enough data cells make it a unique match, it can then know it has the "right" element even though some of the fields/methods are wrong.

Closing Out...

That concludes my tour of Fit. I focused on the main fit package, skipping a couple more minor classes. The code reading was a good exercise for me - I have a better sense of some of the tradeoffs in the code, and of the dynamics in the Fit community.

Fit code, part 7 - ColumnFixture

Posted by wwake on June 24, 2005 at 12:28 PM | Permalink | Comments (0)

ColumnFixture is an easy fixture to understand from the user's point of view: each row is a test case, with some columns being inputs, and others being outputs:
MyCalculatorFixture
x | y | plus()
0 | 2 | 2
1 | 1 | 2

doRows() - Capture the Header Row

Method doRows() calls bind() to peel off the header row, then processes the rest of the table. Bind() creates an array of TypeAdapters, one per column in the table. If the column header cell is empty, the ColumnBinding is set to null. (An opportunity for a Null Object? Later, we check for null.)

If the cell ends in "()", it's a method call, and set via bindMethod(). Peeking inside there, it camel cases the name (so "shirt size()" becomes "shirtSize()") and uses the TypeAdapter.on() method to create the adapter.

Otherwise, the cell is assumed to be a field name, set via bindField(). That helper method also camel-cases the name and uses the "field" version of TypeAdapter.on().

Any problem in the header parsing marks the cell with an exception.

doRow() - Handling the Basics

DoRow() is fairly simple. It calls a stubbed-out reset() method before handling anything in the row. This lets a fixture reset anything on a per-test basis. It lets the normal row-processing take place, then it checks if execute() has been called; if not, it calls it.

Execute() is to be called before processing the first column that represents a method name: you can have a bunch of inputs, use execute() to make things happen, then check the outputs. If you don't override execute(), then you either have to know which column is first and make it kick things off (which is brittle), or you let each output column compute "from scratch". (Reading it here makes me wonder if I've been diligent about this in all my ColumnFixtures:)

If there are any unhandled exceptions, they're attached to the first leaf cell of the table.

doCell() - Per-Cell Handling

This boils down to four cases: empty text, null TypeAdapter (mentioned earlier), field, and method. For empty text, we call execute() and move to the superclass' handling of the cell, which marks the cell as "info". This seems a little odd to me - why should a missing value trigger that?

I'm going to write a test for it, but it took a couple minutes to figure out how to do so. I think what I'll do is create a table with a default value for x and y, leave x blank, and have execute() print the value of y. If it's called when x is processed, it'll print the default value rather than the one that was set.

(Pause)

OK - I'm back, and it does act like I expected from the code - execute() is called for a blank cell. I guess that makes sense to do if the blank cell were a call cell (like plus()); I'm not clear on the value for an input cell.

Back to doCell(): if the TypeAdapter is null, it ignores the cell. If it's a field, it parses the text and sets the field. If it's a method, it calls check().

check() - Calling execute()

Method calls go through the (overridden) check() routine. This is really here to make sure the execute() method gets called if it hasn't yet. Then it just defers to the superclass version, which calls the method and compares the result.
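Here is a sketch of that "call execute() at most once per row" arrangement. LazyExecuteSketch is a made-up stand-in with hard-wired x, y and sum fields; the hasExecuted flag and ensureExecuted() helper are my guesses at the shape, not Fit's source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a column-style fixture; not Fit's actual class.
class LazyExecuteSketch {
    final List<String> log = new ArrayList<>();
    private boolean hasExecuted;
    int x, y, sum;

    void reset() {
        log.add("reset");
    }

    void execute() {
        sum = x + y;
        log.add("execute");
    }

    // Processes one row: the inputs first, then zero or more output checks.
    void doRow(int xValue, int yValue, int... expectedSums) {
        hasExecuted = false;
        reset();
        x = xValue;
        y = yValue;
        for (int expected : expectedSums) {
            check(expected);
        }
        ensureExecuted(); // covers rows that had no output columns
    }

    // An output cell: make sure execute() has run, then compare.
    void check(int expected) {
        ensureExecuted();
        log.add(sum == expected ? "right" : "wrong");
    }

    private void ensureExecuted() {
        if (!hasExecuted) {
            hasExecuted = true;
            execute();
        }
    }

    public static void main(String[] args) {
        LazyExecuteSketch fixture = new LazyExecuteSketch();
        fixture.doRow(0, 2, 2, 3);
        System.out.println(fixture.log);
    }
}
```

However many output columns a row has, execute() runs once, after the inputs are set and before the first output is compared.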

What I've learned

  • Don't forget about reset() and execute()
  • Execute() is a little weird in the face of blank input cells.


Martin Fowler's article on Language Workbench

Posted by wwake on June 24, 2005 at 04:42 AM | Permalink | Comments (0)

Martin Fowler posted a good article on the idea of Language Workbenches, followed up by some nice links and more reading: http://www.martinfowler.com/articles/languageWorkbench.html and http://martinfowler.com/bliki/LanguageWorkbenchReadings.html

National Games Week

Posted by wwake on June 16, 2005 at 07:01 PM | Permalink | Comments (0)

National Games Week is Nov. 20-26, 2005. See www.NationalGamesWeek.net. Yes, it's an event focused on non-electronic games, but those still have a lot to teach us about interaction and play.

JUnit 4 for JDK 1.5

Posted by wwake on June 15, 2005 at 04:23 AM | Permalink | Comments (2)

JUnit 4 is out for JDK 1.5. Gunjan Doshi summarizes the changes here. It uses the JDK 1.5 annotation feature, so you label tests with "@Test" rather than following the convention of naming them "testSomething()".

Fit code, part 6 - TypeAdapter

Posted by wwake on June 14, 2005 at 07:38 PM | Permalink | Comments (0)

C# Fit

I've gotten some mail letting me know that the C# Fit has forked a bit - there's a newer version that's the regular Fit distribution, and an older/modified version that's part of Fitnesse. I was having trouble extending the Fitnesse version. There's an effort to do some unification work this summer; that should help.

TypeAdapter

TypeAdapter exists to give a common interface to types, so they can all have setters, getters, and parsers. There are three factory methods, all named on(): one takes a fixture and a class, another a fixture and a field, and a third takes a fixture and a method.

This gives the unification of fields and methods. In Fit, you can have a ColumnFixture with a field, and it has an implicit setter ("name") or getter ("name()"). Or you can have a method (also "name()"). For most purposes, we don't care what it is, we just want to treat it as a setter or getter.

The TypeAdapter has five fields:

  • target - the fixture this adapter is "on" (set for a field or method, but not a type)
  • fixture - the fixture this adapter is "on" (always set)
  • field - the Field (only set for field references)
  • method - the Method (only set for method references)
  • type - the type, always set
I'm struck by the combinations of "this field set / that one not" - would a couple helper classes be an improvement?

Methods

The get() method tries to do a field access if the adapter is a field, or a method invocation if it is a method.

The set() method does a field set. (Could we extend this to call "setter methods" with a signature like setFoo(Foo value)?)

The invoke() method assumes that a method is set, and calls it with no parameters.
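A sketch of how one adapter can hide the field/method difference behind get() and set(). The names onField(), onMethod() and CalculatorSketch are hypothetical, and the bodies are my assumptions built on plain java.lang.reflect; the real TypeAdapter has more going on (parsing, types, the fixture).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Hypothetical sketch: the idea echoes the post, the bodies are assumptions.
class AdapterSketch {
    private final Object target;
    private final Field field;   // set only for field references
    private final Method method; // set only for method references

    private AdapterSketch(Object target, Field field, Method method) {
        this.target = target;
        this.field = field;
        this.method = method;
    }

    static AdapterSketch onField(Object target, String name) {
        try {
            return new AdapterSketch(target, target.getClass().getField(name), null);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static AdapterSketch onMethod(Object target, String name) {
        try {
            return new AdapterSketch(target, null, target.getClass().getMethod(name));
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Getter: a field read or a no-argument method call, whichever is bound.
    Object get() {
        try {
            return field != null ? field.get(target) : method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Setter: only field-bound adapters can be set in this sketch.
    void set(Object value) {
        try {
            field.set(target, value);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}

class CalculatorSketch {
    public int x;
    public int y;

    public int plus() {
        return x + y;
    }
}

class AdapterDemo {
    public static void main(String[] args) {
        CalculatorSketch calc = new CalculatorSketch();
        AdapterSketch.onField(calc, "x").set(2);
        AdapterSketch.onField(calc, "y").set(3);
        System.out.println(AdapterSketch.onMethod(calc, "plus").get());
    }
}
```

The caller of get() never needs to know whether "x" or "plus()" was behind the adapter, which is the unification the post is about.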

The parse() method asks the fixture to parse the string according to type. In C#, parse() is something each type (even primitives) define. I'm sure that simplifies some of this code.

So let's say we have a ColumnFixture where phone() is of type PhoneNumber. How do we make that get parsed naturally? It looks like it works its way back to Fixture, which has a parse() method. So the ColumnFixture overrides it, and checks for an attempt to parse a class it knows about.

It seems like we could do some fanciness here, too, pushing an attempt to parse onto the domain classes. (So let Fixture.parse() take a look for "type.getMethod("parse")"; if we don't want that we could subclass and override to avoid it.)

Primitives and Their Classes

The rest of this file is a whole bunch of subclasses of TypeAdapter: one for each primitive type, and one for each corresponding Class. Most of these are the same: the primitive type's adapter is a subclass of the Class one. The primitive's adapter defines set(), and the Class one defines parse().

The last one is the only exception: Arrays. There, the parse() method tokenizes it by looking for commas. Like so many other places in Fit, it trims spaces. Each element of the array is given its own TypeAdapter. The toString() method puts the commas in when printing it out.
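The comma handling can be sketched in a few lines. This version is hard-wired to int elements and uses made-up names (ArrayCellSketch, format); the real adapter delegates each element to its own TypeAdapter, which I'm not reproducing here.

```java
import java.util.StringJoiner;

// Hypothetical sketch of comma-separated array cells, int elements only.
class ArrayCellSketch {
    // Splits on commas and trims each token before parsing it.
    // Assumes a non-empty cell; an empty token would fail to parse.
    static int[] parse(String cell) {
        String[] tokens = cell.split(",");
        int[] result = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            result[i] = Integer.parseInt(tokens[i].trim());
        }
        return result;
    }

    // Puts the commas back when printing.
    static String format(int[] values) {
        StringJoiner joiner = new StringJoiner(", ");
        for (int value : values) {
            joiner.add(Integer.toString(value));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(format(parse(" 1, 2 ,3")));
    }
}
```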

Reflection

The big surprise here is the idea of unifying methods and fields. I'm not sure how I'd have come to the realization that they're the same at a level we care about. (That insight is of a piece with the whole framework - I've understood reflection for years, but I've used it more for plugin-style work than anything like Fit that uses test data to drive the reflection.)

One-button games

Posted by wwake on June 13, 2005 at 06:02 AM | Permalink | Comments (0)

I love when somebody just digs in and shows the possibilities. Berbank Green has an article at gamasutra showing what you can do with just one button: http://www.gamasutra.com/features/20050602/green_01.shtml

Just think of the possibilities if you have two buttons! :)

Fit code, part 5 - ActionFixture

Posted by wwake on June 13, 2005 at 05:47 AM | Permalink | Comments (2)

Wow - this one is a lot cleaner than I expected. I had tried overriding the C# version and had all kinds of grief. This version is straightforward and extensible.

Fields

The class has three fields:

  • cells - a Parse
  • actor - a Fixture
  • empty - an array of Class
Cells holds the list of cells for this row. It's used by the action methods (such as enter()) to pull out data from the row.

Actor holds the object created by a start() action. Notice that it is static - that is what lets separate ActionFixture tables keep working with the same object without repeating start in every table.

Empty is the easiest - it's just an empty list so that things that want a list of argument types can have one. I marked it final, since it's never changed.

Methods: doCells()

The first method is doCells(). It saves off the cells, so other methods have access to this row's Parse. Then it looks up the method in the first cell, and invokes it. (This method will be one of "start," "press," "enter," or "check.")

The fixture invokes the method on itself by "getClass().getMethod()" - looking for the method on itself. This is a place where the Java version is nicer than the C# one. The C# version hard-coded that line to the equivalent of "ActionFixture.class.getMethod()". That meant that a subclass of ActionFixture would only have access to the four methods ("start" etc.) originally planned. The Java version lets you extend this vocabulary easily.
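The difference between the two lookups is small in code and large in effect. A sketch, with made-up class names and a hypothetical doubleClick() action added by a subclass:

```java
import java.lang.reflect.Method;

// Hypothetical sketch: contrasts a runtime-class lookup with a hard-coded one.
class ActionSketch {
    final StringBuilder log = new StringBuilder();

    // Looks the action up on the runtime class, so subclass methods are visible.
    void doAction(String name) {
        invoke(getClass(), name);
    }

    // Looks the action up on one fixed class, so subclass additions are invisible.
    void doActionHardCoded(String name) {
        invoke(ActionSketch.class, name);
    }

    private void invoke(Class<?> type, String name) {
        try {
            Method action = type.getMethod(name);
            action.invoke(this);
        } catch (ReflectiveOperationException e) {
            log.append("unknown:").append(name).append(' ');
        }
    }

    public void press() {
        log.append("press ");
    }
}

class ExtendedActionSketch extends ActionSketch {
    public void doubleClick() {
        log.append("doubleClick ");
    }
}

class ActionDemo {
    public static void main(String[] args) {
        ActionSketch fixture = new ExtendedActionSketch();
        fixture.doAction("doubleClick");          // found on the subclass
        fixture.doActionHardCoded("doubleClick"); // not found on ActionSketch
        fixture.doAction("Press");                // the name must match exactly
        System.out.println(fixture.log);
    }
}
```

The last call also shows the spelling issue: getMethod() is case-sensitive, so without camel-casing the cell text has to match the method name exactly.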

Another thing to notice is that the fixture calls getMethod() on cells.text(), not camel(cells.text()). That's a pity - my extended vocabulary has to be spelled exactly. (I don't think the rules for camel casing are consistent throughout. I'm probably getting hung up from Fitnesse experience - I think it may have slightly different rules.)

Methods: Actions

Start() is straightforward. It creates an object of the named type, and stashes it in the actor field. I note that it doesn't camel-case its argument, so "start MyFrameObject" is different from "start my frame object". (The latter won't work.)

Enter() looks on the actor for a one-argument method it can use as a setter. It creates a TypeAdapter, which knows how to parse objects, passing it the cell text. Then it invokes the setter.

Press() invokes the named 0-argument method on the actor.

Check() assumes its 0-argument method is a getter, fetches the result, and passes it to Fixture's check() routine, which does the comparison and cell coloring.

Methods: method()

The two variants of method() try to find an n-argument method on the actor. The simple form camel-cases, so the fields on the start object can have the more user-friendly form. ("start MyFrameObject // press the rightmost button".) It double-checks that there is only one possible method. (So, if "firstName(int)" and "firstName(String)" both exist, it will report that it doesn't know which to use.)
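The "only one possible method" check might look something like this sketch. It matches on name and argument count only; the class names, the find() helper and the exception messages are all mine, not Fit's.

```java
import java.lang.reflect.Method;

// Hypothetical sketch: find the single public method with this name and arity.
class MethodLookupSketch {
    static Method find(Class<?> type, String name, int argCount) {
        Method result = null;
        for (Method candidate : type.getMethods()) {
            boolean matches = candidate.getName().equals(name)
                    && candidate.getParameterTypes().length == argCount;
            if (!matches) {
                continue;
            }
            if (result != null) {
                throw new IllegalArgumentException("more than one candidate for " + name);
            }
            result = candidate;
        }
        if (result == null) {
            throw new IllegalArgumentException("no method named " + name);
        }
        return result;
    }
}

// Hypothetical actor with one ambiguous and one unambiguous setter.
class ActorSketch {
    public void firstName(int value) { }
    public void firstName(String value) { }
    public void lastName(String value) { }
}

class MethodLookupDemo {
    public static void main(String[] args) {
        System.out.println(MethodLookupSketch.find(ActorSketch.class, "lastName", 1));
        try {
            MethodLookupSketch.find(ActorSketch.class, "firstName", 1);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```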

Next time, I'll take up TypeAdapter.

Fit code, part 4 - Fixture

Posted by wwake on June 03, 2005 at 02:02 PM | Permalink | Comments (0)

Fixture: Fields and Two Helper Classes

There's a Map summary that accumulates things like the "run date." I don't know why the top-level Fixture has this, but it does. The fixture fit.Summary walks through this table and gives summary statistics.

There's a field counts that has counts of tests passed, failed, and exceptions/errors. The Counts class is just a data bag for these things. When a fixture calls wrong(), for example, the count is incremented.

The last field is args, which has the arguments from the first row of the fixture. The method getArgs() returns a String[] and lets a fixture use them. I don't think this is in the C# version yet but we definitely use that sort of thing there.

There's an internal class RunTime. It takes a snapshot of the current time. Right now, the only use of this is to put it in the summary, under the key "run elapsed time". Presumably some fixtures pull the RunTime object back out, and use toString() to display the elapsed time. But nothing in the standard distribution appears to use it directly. (The fit.Summary fixture will display the elapsed time when it dumps the summary table.)

Starting Fixtures

Now we come to doTables(), the top-level method. (It's called by FileRunner, passing in a Parse for each table.) This method first looks at the name in the first cell of the first row of the table. Then it tries to create the fixture, then use it via interpretTables(). Along the way I note that this routine is using a couple null checks; I wonder if those are necessary? If the first table's fixture fails to be created and run, it runs the remaining fixtures via interpretFollowingTables().

Method getLinkedFixtureWithArgs() tries to load the fixture named in header.text(), then it sets up the arguments (for getArgs()).

The method loadFixture() takes the name of the fixture, and attempts to "new up" the named fixture via reflection. Between the last method and this, I'm worried by what I don't see: what routine uses the camel method? That suggests a test: let's load "fit.ActionFixture", "fit.Action Fixture", and "fit.Action fixture" and see what happens. From what I understood going in, all three should be ok. From what I'm seeing here, I don't see what would make that work.

Why did I expect this? Because ColumnFixture does it for column names. It turns out that's not a good enough reason. The test shows that only "fit.ActionFixture" loads.

Up to interpretTables() again. It does getArgsForTable() - again. There's even a comment to that effect. I don't see why it should be necessary, though. Actually - it's all a little subtle, and I'd say the comment is misleading. The comment says, "// get them again for the new fixture object". But really, that's what we did in getLinkedFixtureWithArgs(). Now we're getting the arguments for the original fixture.

It works like this: when FileRunner starts, it runs doTables() on a new Fixture object. That's the object that tries to pull fixtures from tables and run them. When the first table is seen, its arguments are pulled out and given to the corresponding fixture. But then they're also copied back to the initial fixture as well. I imagine they're actually rarely needed there.

At any rate, interpretTables() then calls doTable(), which does a straightforward job of working its way into doCell(). Finally, it calls interpretFollowingTables().

InterpretFollowingTables()

By the time we're here, we run through a loop, looking up fixtures and then interpreting them with doTable(). For these, we don't change the arguments on the fixture that started it all. Why not? I can only guess it has something to do with the way DoFixture wants to work - treating the first table specially.

All this work seems a little off - it seems the Fixture class is paying for interpretation that a particular table wants. I'm a long way off from looking at DoFixture, but if that's the table that should be first, it seems to me like it should pay for this complexity. I know I'm second-guessing here...

Check()

The other routines are either straightforward, or I've looked at them already. The exception is a largish routine at the bottom: check(). This is a helper method, used by some subclasses. It deals with blank cells, null adapters, "error" expected (to deal with expected exceptions), and text that should match. In each case, this method puts the output in the cell, colored appropriately.

Up next...

I think I want to look into ActionFixture next. I had an unhappy session trying to extend the C# version (which appears to be older). Then I want to dig into how TypeAdapters work.

Fit code, part 3 - Parse and Fixture

Posted by wwake on June 02, 2005 at 07:32 AM | Permalink | Comments (0)

Parse

First I want to chase down a couple oddities in what I saw last time. It boils down to these two tests:
// This test shows offset isn't applied the way I expected
public void testOffset() throws Exception {
    int offsetToData = 2;
    Parse p = new Parse(
            "xx
data
", Parse.tags, 0, offsetToData);
    assertEquals("", p.leader);
}
and:
// This test shows cells with embedded tables "go away"
public void testInnerTables() throws Exception {
    Parse p1 = new Parse(
"
stuff plus
");
    Parse p2 = new Parse(
"
stuff " + "
inner
plus
");
    Parse cell = p1.at(0,0,0);
    assertFalse(cell.body.equals(""));
    assertEquals("stuff plus", cell.text());
    cell = p2.at(0,0,0);
    assertFalse(cell.body.equals(""));
    assertEquals("stuff inner plus", cell.text());
}
(I sent email to the maintainers; these may just be demonstrating my ignorance of how it's intended to work.)

FixtureTest

It won't take long to look at this: it only has one test!
  assertEquals("     ", Fixture.escape("     "));
The method basically handles converting plain text to have HTML entities.

But there's a little more going on in Fixture...

Fixture

I've only got a few minutes, so this is a quick overview. From the FileRunner, I can see that doTables() is an important entry point. Last time I looked in the Java version, it was simpler than it is now. There are comments showing code added for DoFixture and for fitnesse, and they've made it all a little trickier. This is all to make an improvement at the user level.

The core of this is: doTables() calls getLinkedFixtureWithArgs() and interpretTables(), which (eventually) calls doTable(), which calls doRows(), which calls doRow() once for each row, which calls doCells(), which calls doCell() once for each cell. In Fixture, doCell() calls ignore(), which marks the cell gray. ColumnFixture, for example, overrides doCell() to do something interesting (like look for expected results).
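Stripped of everything else, that chain is a template method: the base class owns the traversal, and a subclass only changes what happens at a cell. A sketch, using a String[][] in place of a Parse and made-up class names (it leaves out the table-loading part and any header-row handling):

```java
// Hypothetical sketch of the traversal; a String[][] stands in for the Parse.
class TraversalSketch {
    final StringBuilder out = new StringBuilder();

    void doTable(String[][] rows) {
        doRows(rows);
    }

    void doRows(String[][] rows) {
        for (String[] row : rows) {
            doRow(row);
        }
    }

    void doRow(String[] cells) {
        doCells(cells);
    }

    void doCells(String[] cells) {
        for (String cell : cells) {
            doCell(cell);
        }
    }

    // Default behavior: note the cell as ignored.
    void doCell(String cell) {
        out.append("ignore(").append(cell).append(") ");
    }
}

// A subclass changes only the per-cell step.
class UpperCaseCellsSketch extends TraversalSketch {
    @Override
    void doCell(String cell) {
        out.append(cell.toUpperCase()).append(' ');
    }
}

class TraversalDemo {
    public static void main(String[] args) {
        String[][] table = {{"a", "b"}, {"c"}};
        TraversalSketch plain = new TraversalSketch();
        plain.doTable(table);
        System.out.println(plain.out);
        TraversalSketch upper = new UpperCaseCellsSketch();
        upper.doTable(table);
        System.out.println(upper.out);
    }
}
```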

I'm going to skip over how the first table gets loaded and interpreted - it looks interesting (i.e., tricky:) Instead, I'll peek down to a section labeled "Annotation". This area contains methods that can mark and color cells: right(), wrong(), info(), ignore(), error(), and exception(). I've seen these called in several of the standard fixtures before.

I see where exception() puts the stack trace into the cell. That gets SO ugly when something goes wrong. (It makes the cell huge, full of scary content useful only to programmers.) Maybe someday I'll take a whack at a more readable version.

Utilities

The final section is Utilities. The lightly tested escape() method is there. There's also a method to put words into camel case. I'll add a test:
assertEquals("twoWords", Fixture.camel("two words"));
assertEquals("MiXedCAsE", Fixture.camel("MiXed cAsE"));
assertEquals("aFewWordsTogether", Fixture.camel("aFewWordsTogether"));
assertEquals(
    "acronymsLikeHTMLStillUppercase", 
    Fixture.camel("acronyms like HTML still uppercase"));
Hmm. This is perhaps not the pattern I'd have chosen. I have the impression some of the other languages do it differently.
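For reference, here is one implementation that is consistent with those four assertions. It's my own sketch (CamelSketch is a made-up name), not a copy of Fixture.camel(): leave the first word alone, upper-case only the first letter of each later word, and leave everything else as typed.

```java
// Hypothetical sketch of a camel-casing rule that satisfies the tests above.
class CamelSketch {
    static String camel(String name) {
        String[] words = name.trim().split("\\s+");
        StringBuilder result = new StringBuilder(words[0]); // first word untouched
        for (int i = 1; i < words.length; i++) {
            result.append(Character.toUpperCase(words[i].charAt(0)));
            result.append(words[i].substring(1)); // rest of the word as typed
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(camel("two words"));
        System.out.println(camel("MiXed cAsE"));
        System.out.println(camel("acronyms like HTML still uppercase"));
    }
}
```

Since nothing is ever lower-cased, "MiXed cAsE" keeps its odd capitals and "HTML" stays upper-case, which is what the tests show.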

There's a parse() method that appears to handle only Strings, Dates, and ScientificDoubles. I know that C# works a little differently, since parse() is more integrated.

There's a check() method that looks too complicated to understand in 30 seconds. It uses a TypeAdapter, which is another class I want to look at soon.

Finally, there's a method to get the arguments from a Fixture. This is new - it used to be that the first row had the fixture only. (Rather, you had to parse any arguments out yourself.) Now that's built in, accessed via getArgs(), which returns a String array.

That's it for today. Next time, I want to dig into how fixtures get started (since this has changed some), and into the check() method.

Fit code, part 2

Posted by wwake on June 01, 2005 at 06:24 AM | Permalink | Comments (0)

Inside Parse

I spent last time on tests only - this time I want to go inside the Parse class.

The top of the class reveals strings for leader, tag, body, end, and trailer, as expected. There are also parts and more, which are Parses. A skim through the class, looking for big routines, shows that the constructor, findMatchingEndTag(), removeNonBreakTags(), print(), and footnote() methods seem to be the biggest and most complex.

Footnote? What's that? The tests didn't mention it! Looks odd - it's not referenced inside fit anywhere; rather, it's used by some clients, typically after a call to wrong(). It appears to create a file Reports/footnote/n.html, and prints the parse to it.

My strategy today is to chew off the routines that are small and/or simple, then go back and figure out the big routines. I have two things I'm trying to understand: "What happens with nested tables?" and "How do I insert stuff into the middle of a Parse?" (I need the latter for fixtures that want to report a little more nicely.) I guess I have a third question too - "how are spaces handled?" This arises because I saw a note on the mailing list that says there are differences in the various fit implementations.

Small Fry

There are some small and simple recursive routines: size(), last(), leaf(), and at(). There are a bunch of little routines for escaping characters and dealing with HTML; I'll come back to those.

There's a little helper routine addToBody() that just appends text to the body. That doesn't sound like much - and it's a one line routine, basically "body = body + text", but a search for usages shows that this is what fixtures use to get their info into the output. (If a fixture wants to show a cell's expected value, it uses this method to append some HTML text to the cell's Parse.) That answers one of my questions. I'll have to play with it to learn it better.

The print() routine is longer than these one-line methods, but looks straightforward. It writes the Parse out: leader, tag, then either body or parts, the end tag, and either the more or the trailer. I knew body and parts were mutually exclusive; I hadn't realized that more and trailer are exclusive as well. I wonder if body and trailer appear together, and parts and more appear together? If so, I wonder about splitting Parse up so subclasses can deal with that difference. It's not a huge class; may not be worth it.

Constructors

That leads me up to the first constructor - Parse(String tag, String body, Parse parts, Parse more). Note that it has parameters for both body and parts. So much for my theory of a paragraph ago. But it's close - I did a search and found 15 places that called this constructor. All but three used either body or parts exclusively.

One of the ones that didn't is in fat.Table. It is using this constructor to copy an existing Parse. That looks misplaced - if we need to copy these, then we can put a method on Parse to do so. A second place is fat.FixtureNameFixture. The GenerateRowParses() method passes in a string for "body" and a Parse for "more". (So we have an example where "parts" and "more" don't go together.) I can't tell why it does this on a quick look. The final place is eg.AllFiles.td(), which also uses "body" and "more" together.

The first constructor passes in all the pieces separately. Then there are a few constructors that default the tags and so on, delegating to the main constructor that actually parses some HTML. That constructor looks for several key positions in the input: the start of the target tag, the end of the target tag, the start and end of the corresponding end tag, and the start of the rest of the text.

I see that the first search starts at the beginning of the string, rather than at "offset". That seems odd.

We'll have to double-check how findMatchingEndTag() works, but the rest of the constructor looks straightforward: if there are more tag levels, turn the body into a new Parse (and set body to null). If there's a nested table, parse the table and set the body to "". (That seems odd also, like it's throwing away any non-table stuff. I'm not sure what the "" body accomplishes either.) Finally, if there are more tags at the current level, null out the trailer and parse the remaining tags into "more".

FindMatchingEndTag() looks like an implementation of the parenthesis-balancing rule - add 1 every time you see a left parenthesis, subtract 1 every time you see a right parenthesis. If you're balanced, you'll have a net of 0.
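The counting rule, applied to tags instead of parentheses, can be sketched like this. It is a simplification under my own assumptions: the text is already lower-cased, the search starts just after an opening tag (so the depth starts at 1), and tags are matched by simple prefix. The class name and the -1 return for "never balances" are my choices.

```java
// Hypothetical sketch of tag balancing by counting; expects lower-cased text.
class TagBalanceSketch {
    // Returns the index of the end tag that matches an already-consumed
    // start tag, or -1 if the tags never balance.
    static int findMatchingEndTag(String lc, int from, String tag) {
        String open = "<" + tag;
        String close = "</" + tag;
        int depth = 1; // the start tag before "from" is already open
        int pos = from;
        while (true) {
            int nextOpen = lc.indexOf(open, pos);
            int nextClose = lc.indexOf(close, pos);
            if (nextClose < 0) {
                return -1; // ran out of end tags
            }
            if (nextOpen >= 0 && nextOpen < nextClose) {
                depth++; // a nested start tag comes first
                pos = nextOpen + open.length();
            } else {
                depth--; // an end tag comes first
                if (depth == 0) {
                    return nextClose;
                }
                pos = nextClose + close.length();
            }
        }
    }

    public static void main(String[] args) {
        String html = "<table><tr><td><table></table></td></tr></table>";
        int start = "<table>".length();
        System.out.println(findMatchingEndTag(html, start, "table"));
    }
}
```

With a nested table, the inner end tag brings the count back to 1, not 0, so the search carries on to the outer end tag.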

So I have an answer about nested tables: it's trying to handle them. I'm seeing a little weirdness that makes it look like a nested table is the only thing retained inside a cell. But at least I know it's trying. I'll make some tests to fill in what I'm seeing. I only have a few minutes left, so I want to move on to the htmlToText() part of the code.

Html to Text

The htmlToText() routine has four steps: normalize line breaks, remove non-break tags, condense white space, and unescape. Normalizing line breaks turns <br> and strings of <p> tags into <br />.

Removing non-break tags is a little tricky-looking, but it basically squeezes out tags other than the normalized break tags we just produced. The method "looks forward" to see an end-of-tag; if it's there, it trims out the tag and looks at the rest of the string.

Condensing white space applies the rule: convert multiple blanks to a single blank, convert a "160" to a space, and convert &nbsp; to a space. Character 160 (0xA0) is the non-breaking space in ISO 8859-1 and Unicode.
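Those three rules fit in one short method. This is a sketch under my own assumptions rather than the Fit source: I treat "blanks" as any run of ordinary whitespace, and I collapse runs first so that each non-breaking space still turns into its own space afterwards.

```java
// Hypothetical sketch of the white-space rule described above.
class WhitespaceSketch {
    private static final char NON_BREAKING_SPACE = (char) 160;

    static String condense(String s) {
        String result = s.replaceAll("\\s+", " ");        // runs of blanks become one
        result = result.replace(NON_BREAKING_SPACE, ' '); // each 160 becomes a space
        result = result.replace("&nbsp;", " ");           // each entity becomes a space
        return result;
    }

    public static void main(String[] args) {
        System.out.println("[" + condense("a   b") + "]");
        System.out.println("[" + condense("a&nbsp;&nbsp;b") + "]");
    }
}
```

The order matters: doing the conversions before the collapse would squeeze a row of non-breaking spaces down to a single space.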

Unescaping is simple too: br tags are converted to newlines, standard entities such as &lt; are converted to their simple character, and smart quotes are converted to " or ' as appropriate.

The result of all this is that text() produces the Parse in straight text form - no tags. This is what fixtures will want when they compare expected values.

Summary

I had three questions:
  • What happens with nested tables? They are apparently handled, although it looks like only the nested table is retained, not anything surrounding it.
  • How do I put stuff inside a Parse? Use the addToBody() method.
  • What happens with spaces? Multiple spaces get converted into one, and non-breaking spaces get converted into one space each.
I'm left with a little bit of question in my mind about why the Parse constructor doesn't use the offset when it's looking for the first tag, and about the details of nested tables. But that's ok; I learned a lot today.


