Jody Garnett's Blog
July 2005 Archives

Open-Source Factors of Success
Posted by jive on July 31, 2005 at 07:31 PM | Permalink | Comments (1)

Open source projects live or die by contributions. There is the occasional project like Open Office that can get by on financial backing, but most of us have to survive on collaboration and communication.

What Makes a Successful Open Source Project
Most Open Source projects simply have to follow a basic formula:
I am tempted to list an Open-Development process as essential as well (it seems to be for making a community, but is not always required for a project). I am going to put it down as a tool for building trust, which goes towards making the decision to contribute easier for people.

Do Something Worthwhile
Now from where I am sitting the first one is always easy. Either you are doing it or you are not. Not very helpful; let me try again.

Since you wrote the software, chances are it does something useful for you and hopefully others (with the exception of the anti-pattern of Resume Ware). If you are having trouble:
We all have plenty of fascinating ideas; actual need is generally the focus needed to make something worthwhile. (You can also try to find someone with the actual need and get them to pay you money.)

Accepting Contributions
This second point is actually surprisingly difficult. It is one thing to give out access to the source code and it is another to survive in the face of changes.

For me this comes down to two points:
At a Design & Architecture level the concept of a plug-in based system has proven itself again and again as the way to do things. The success of Firefox with respect to Mozilla is one small example. The strength of the plug-in model with open source projects rests on two benefits.

At a community/people level a plug-in based system provides a sense of ownership and responsibility over a section of code. It also keeps developers from tripping over each other quite so much. At a technical level it does its job: allowing a system to be extended.

This magic combination of an architecture matching both the technical needs and the social needs of an open source community means I look a bit sideways at any open source project that is not plug-in based (often they are the result of a single developer). If a project (like GeoTools below) combines plug-ins with well-known hoops to jump through for inclusion, things start to look pretty good. The range of responses to these forces is different for different projects:
Making Contributions Easy
There are technical aspects to this problem, but mostly it is a social issue that must be thought through. Are people willing to put up with the license? Can someone learn enough to contribute to the project? This is all about the people.

Often it is easier to maintain an external fork than to put up with an unappealing license, or with too much process. GeoTools has only recently escaped this by being active enough that maintaining a fork is more expensive than feeding the changes back to the community. (This is about the only benefit to API churn that I can imagine. Ever.)

The technical issues are there all right, and one can get them wrong: using an unfamiliar version control system, not providing source code downloads, not supporting the IDE used by the majority of those interested in contributing, requiring unit tests that take 15 minutes to run before accepting a commit. Heck, I have made each and every one of these mistakes (most in the last week). But they all fade in comparison to the Learning Curve. What is the learning curve like: will they lose motivation before being able to fix the thing they wanted?

Documentation and Learning Curve
Out of the projects mentioned above only two make use of an external framework (GeoServer and UDIG). In terms of learning, what would the advantage of doing so be?

There is a chance (if you choose the right technology) that contributors will already be familiar with the ins & outs and will have an easier go of it. This is of course the ideal. The risk is that if you choose wrong, people will just have two things to learn. I chose wrong with STRUTS for GeoServer (most contributors simply have to learn two things). The other downside to STRUTS, or our inexperienced use of it, is a lack of support for the plug-in style of contribution. Branch and merge simply takes longer than plug-in based alternatives. You can witness the success of Spring with open source projects as indicative of this tradeoff.
I suspect I am choosing right with uDig and RCP – now if only I can help people learn it.

One thing all these projects have in common is the use of industry standards. While we may talk about standards being wonderful and trumpet the idea of interoperability, one of their main practical advantages to an open source project is in terms of documentation. (There is still room to argue over the standards, but they often give everyone a common language to debate with.) The ability to print out and read a 100-page document about what is going on should not be underrated. This is a problem with ISO based standards (where it is pay to play), and it is one of the driving factors in the adoption of OGC standards by the Java GIS Community.

One benefit to the use of both standards and frameworks is the possibility that one can buy or find existing documentation. Open source projects are suckers for even bad documentation, either in the form of books or websites; anything that helps will be of use.
A Reading List for UDIG
I have been putting together a Reading List in the hopes that uDig will do a bit better than we did for GeoServer/STRUTS. I found something out as I assembled the list: I don't use all these tutorials or books. Don't get me wrong, they each saved days of work, but their usefulness came in at a specific point of the learning curve.
Please check out the above reading list; feedback is welcome. Taking my own advice (this being a form of contribution), please add a comment at the bottom of the page.
XML Standards as Object-Oriented Code Part I
Posted by jive on July 24, 2005 at 02:44 PM | Permalink | Comments (0)

In the open-spatial Java community we have a problem: XML. I know that does not sound like much of a problem, as long as you are an XML god. Spatial data has a habit of being very large and breaking existing XML tools and assumptions. This blog post is not about that. There are three projects I am going to talk about today:
But wait, that is only two? The other is a quick hack I am doing, trying to combine their ideas of what a Filter is.

What is a Filter? The quick answer is this specification, Filter Encoding 1.0, which talks mostly about Features, and this one, Filter 1.1, which expands the domain of discourse to include metadata. (You can see why I am looking into Filter before trying my mad metadata plan.)

What does all that mean? A filter basically gives you a "pass/fail" for an Object (Features are objects in the GIS world). Basically this is a test of set membership. If you know SQL, this is the SQL WHERE clause. The specification is mostly worth knowing about so you can talk to Web Feature Servers and get real data (rather than those pictures you see on Google Maps).

Three Takes on the Problem
The standard is defined in terms of XML. And we have two sets of interfaces to play off of:
The Geotools interfaces are mutable, and the GeoAPI ones are not.

Traversal: XSLT vs Visitor vs Iterator
Both Geotools and GeoAPI define an accepts( Visitor ) method allowing these constructs to be traversed. This is in marked contrast to the XML approach where, due to a common model, a declarative language (XSLT) can be used to target and transform Filter expressions no matter where they are in a document. The other object oriented way to do things is using an Iterator. Iterator is best known these days for its role in the Java Collections API:
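A minimal sketch of that idiom, using only java.util. The class and method names here are illustrative assumptions and do not come from Geotools or GeoAPI:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IteratorSketch {
    /** Returns the upper-cased form of every name, visiting each element once. */
    public static List<String> upperCaseAll(Iterable<String> names) {
        List<String> result = new ArrayList<>();
        // The caller drives the traversal; the collection only hands out
        // one element at a time and never exposes how it stores them.
        for (Iterator<String> i = names.iterator(); i.hasNext();) {
            String name = i.next();
            result.add(name.toUpperCase());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(upperCaseAll(Arrays.asList("road", "river")));
    }
}
```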
This construct does provide traversal without revealing the internals of the object. Depending on the iterator you may not even reveal internal order. A Visitor acts exactly like the inside of that above for loop:
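As a rough sketch of the same work done with a visitor: the loop body moves into an object, and the data structure runs the traversal itself. All of the types below (NameVisitor, Names, UpperCaseCollector) are hypothetical and only illustrate the pattern; they are not the Geotools or GeoAPI interfaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class VisitorSketch {
    /** Hypothetical visitor: the body of the loop, packaged as an object. */
    public interface NameVisitor {
        void visit(String name);
    }

    /** Hypothetical data structure that owns traversal and hides its internals. */
    public static class Names {
        private final List<String> names;

        public Names(String... names) {
            this.names = Arrays.asList(names);
        }

        /** The structure decides how (and in what order) to walk itself. */
        public void accepts(NameVisitor visitor) {
            for (String name : names) {
                visitor.visit(name);
            }
        }
    }

    /** A reusable visitor that acts as its own collector. */
    public static class UpperCaseCollector implements NameVisitor {
        public final List<String> result = new ArrayList<>();

        @Override
        public void visit(String name) {
            result.add(name.toUpperCase());
        }
    }

    public static void main(String[] args) {
        UpperCaseCollector collector = new UpperCaseCollector();
        new Names("road", "river").accepts(collector);
        System.out.println(collector.result);
    }
}
```

Because accepts owns the loop, an implementation is free to change how it walks its contents without any change to the visitor.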
The advantage is that you can reuse your visitor. An implementation can also cheat and run the visitor across cached copies, and is generally free to hack a little bit more. What it boils down to for GeoAPI:
Where: Accepts a visitor. Subclasses must implement with a method whose content is the following: return visitor.visit(this, extraData);

The description is clearly wrong: the definition left out the traverse part. Their visitor also takes an Object along for the traversal ride. Normally this is a Collector object, much like you see in JUnit tests grabbing all the passes and fails. The combination is powerful and lets implementers actually forgo traversal and feed in cached results if they are known. As I said, visitor allows for more hacking.

Let's see how GeoTools does it. The implementation is a normal accepts method. Usually the visitor acts as its own Collection object. So why should we have FilterVisitor at all ...

The Point of Being Object-Oriented?
One of the points of having a Filter and Expression API (that is, having these as Objects that can be implemented) is to allow client code to do its own thing. That is, people will:
Earlier the question was asked: why have FilterVisitor at all? This is built into the motivation for the visitor pattern. Visitor lets people write a data structure and hide the implementation details while still offering traversal. With the existing GeoAPI FilterVisitor and ExpressionVisitor this is not possible, and hence there is no point to having the methods.

As long as the set of Filters and Expressions is closed you can just define a "Walker" interface that knows how to traverse, and it can take a visitor along for the ride. For known, closed domains, this is actually a good thing. As an example see the geotools graph module, where the computer science construct of a graph of relationships can be built (Builder Pattern), based on a relationship (Strategy Pattern), traversed by Walkers using common algorithms (like shortest path) and visited in the manner described above.

Filter and Expression are not a closed system, so we need to indicate that at the GeoAPI interface level:
I am sure this is just an oversight. The current interfaces have a visit method for each and every one of the specific subclasses of Filter defined by the library. If we do not have these methods you "close the door" on people implementing custom work. Not all filters can or should be written in terms of the basic operations defined by the standard.

Yes, that means that you will not be able to traverse every Filter/Expression ever made and generate SQL for it. Some filters/expressions will require post processing by actual code. Visitor lets a custom implementation provide traversal of its internal data structure, not all nodes of which will be available to a Filter or Expression visitor. It is something the actual implementation needs to do. This is a lot harder to accomplish in an Object-Oriented system.

Transformation and Traversal
As mentioned above, if everything was available as an XML document there are known ways to accomplish a transformation in the XML world. Let's consider a similar transformation in Objectland (which is like the best book ever).

This also brings us to our first real problem. How can we allow transformations in one of these object oriented systems, such as reprojection of a literal Geometry in a Filter/Expression structure?

In geotools this was/is easy. The expressions can be modified as part of their interface. A visitor could just change things as it wandered around the data structure. Mutability has its own set of consequences (especially if the filter is just on "loan" from client code). Often this means that any code calling a visitor has made a copy (to prevent random damage), or can be broken by a badly constructed visitor.

In the GeoAPI interfaces we don't offer modification, which raises the question of how transformations should actually happen. My understanding was that the motivation for the Object returned by the GeoAPI visitor was that it was going to *be* the Filter/Expression resulting from a transformation.
And if no change was required, the immutable interface would let us safely return a branch without the need to copy. So the visitor would be used to "Build" the transformation as it is walked across the object structure.

So what is the problem with that?
Two problems actually:
In order to smoothly apply a transformation FilterVisitor/ExpressionVisitor to a higher order object like WFS Query we may need to formalize what is expected of a transformation. We can see the problem in the small even with the API we have defined already. If we had:
Matched with a method such as:

This would be defined as a clone of the filter logic, with the transformation applied to the contained Expressions. Perhaps this is best illustrated with an example:

After an Expression transform:

Note that no change is required to the Expression interfaces to allow transformation to happen (this is good for future extensibility, as Filter is used in all manner of places these days – from FeatureType models to Styling).

The Payoff
Those of you that play with XML may be in horror by now – yes, XML buys you a lot. The visitor pattern above lets us hide the idea of traversal:

I will also point out that this technique has worked well over the years for turning Filter expressions into SQL. The power of each DB is different, and many can do a subset of the Filter standard if not all. By splitting your filter into two (that which can be transformed into SQL and that which cannot) you can smoothly support the entire specification on top of a wide range of databases. Anything that cannot be encoded as SQL can be applied after the fact to the result set.

Where to next? Well, how about the follow-up article: Supporting Hacks and Versions

Mad Metadata Plan
Posted by jive on July 15, 2005 at 04:21 PM | Permalink | Comments (0)

The use of the Extensible-Interface pattern for an original take on the metadata problem plaguing the spatial world (see EOGEO for background). Thanks to those at OSG'05 for the inspiration. Now if only someone will pay me to solve this problem :-)

A couple of people have asked about my mad metadata plan (tm). Since it is a Friday, and other mad plans are flowing around the email lists, I thought I should play.... Briefly:
Now in the XML world metadata is relatively easy:
In the object oriented world metadata has given us a little bit of grief:
As for specification we have a few:
want to play with (ISO 19115 and ISO 23950 (Z39.50)).

So here is the start of the mad metadata plan:
The solution is to use the *Extensible Interface* pattern...

Q: What is the Extensible Interface Pattern?
AKA: Extensible Object, IAdaptable (Eclipse), IResolve (udig)
Intent: "Anticipate that an object's interface needs to be extended in the future. Extension Object lets you add interface to a class and lets clients query whether an object has a particular extension."

Q: How is that done?
Where null is returned if the requested interface is not available
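A minimal sketch of the pattern in Java. The names Extensible, Titled and MetadataRecord are hypothetical, chosen only for illustration; Eclipse's IAdaptable and uDig's IResolve have their own method names and signatures.

```java
import java.util.HashMap;
import java.util.Map;

public class ExtensibleSketch {
    /** Hypothetical form of the pattern: ask an object for another interface. */
    public interface Extensible {
        /** Returns the object viewed as the requested type, or null if unavailable. */
        <T> T getExtension(Class<T> type);
    }

    /** Hypothetical extension interface a client might ask for. */
    public interface Titled {
        String getTitle();
    }

    /** Hypothetical object whose set of extensions can grow after it is written. */
    public static class MetadataRecord implements Extensible {
        private final Map<Class<?>, Object> extensions = new HashMap<>();

        /** Teaches the object a new trick. */
        public <T> void register(Class<T> type, T implementation) {
            extensions.put(type, implementation);
        }

        @Override
        public <T> T getExtension(Class<T> type) {
            // Class.cast passes null through, so an unknown type yields null.
            return type.cast(extensions.get(type));
        }
    }

    public static void main(String[] args) {
        MetadataRecord record = new MetadataRecord();
        record.register(Titled.class, () -> "Roads");

        Titled titled = record.getExtension(Titled.class);
        System.out.println(titled.getTitle());
        System.out.println(record.getExtension(Runnable.class));
    }
}
```

In this sketch the extensions live in a plain map held by the object; a lookup that delegates to some external registry would keep the same client-facing method.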
Q: Why is that cool?
Because of the part I did not tell you yet: the implementation of getExtension should be backed by a *Factory*, and not just any factory, but one that can be extended by a plug-in system. In geotools an example is the use of FactoryFinder and DataStoreFactory. This lets us teach an old object new tricks.

This lets us accomplish something very cool: it lets us have our metadata Object *in the format it arrived in* (backed by a DOM, or a JDOM, or by a cluster of Objects), and it gives us a method to call to ask for that data in an interface known to us. Better yet, the people capturing the information do not have to worry about ISO 19115 or ISO 19119 or ebRIM; such mappings can be handled by someone else.

What would these mappings look like? Well, if it was a simple XML problem we would provide some of this with XPath expressions. If we use JXPath (the Apache project) we can use XPath for both DOM and clusters of Objects/Collections (aka POJO). If we play our cards right we can make the mappings completely orthogonal to the metadata storage facility. Basically it should only matter that we know how to go from ISO 19115 to DublinCore, not whether we are using the geoapi ISO 19115 interfaces or an XML document with the ISO 19115 schema. Very cool.

Okay, a few more bits of the puzzle.

Q: So what is an implementor to do?
Sounds good to me, our Metadata object should *be* a Feature, the FeatureType can capture the available *slots* as attributes. Usually these are a superset of those defined by DublinCore. Putting the bits together:
We get a system that can pass extra data through, can be used with OGC Filter, allows the use of our existing Metadata interfaces for ISO 19115, and can be taught new tricks.

Jody

Java Open Source Community
Posted by jive on July 14, 2005 at 11:25 PM | Permalink | Comments (0)

I recently had the privilege of teaching a training course to Centro Internacional De la Papa (part of CGIAR). One of the things we covered was the Java Open Source GIS community. I thought I would share this presentation with everyone:

If you want a bit of history please check out James Macgill's recent presentation that was part of the EOGEO section of OSG'2005. James himself has provided a bit of a write-up about the experience.

Where 2.0 Conference
Posted by jive on July 06, 2005 at 06:45 PM | Permalink | Comments (0)

A bit more about Where 2.0.

So how do you find out location?
At least one of the GPS providers is hiding and retreating

Idea: The charge per location request has hamstrung this industry, I

What does this mean for Java?
It means that there is no interest by those selling phones in providing us with a standard API? I know the OGC standards body is up to something, but the earlier this gets done the more cool stuff can get done.

Idea: I wonder if the carrier providers would provide free location pings for us hackers? We are more likely to make them cool toys to play with if debugging does not involve paying money. Actually I guess we would make an API and test against that, but it is worth a shot.

Do they get Open Standards?
They say yes - Yahoo is taking the "open standards road" by this they

This actually caused people to run to find their friends and drag them back to see. The bright side is that the web hackers love XML and are not scared off by a get capabilities file. They jump for joy when they see an SLD document ... they like small bits of GML, but cannot handle the idea of a 600 page schema (but really, who can?).
In short the hackers show no loyalty to Google and can be trained, and made excited about the open stuff as well. Free data is free data, all the better if it looks cool. Still almost made me feel like a librarian when the computer science types divided up the world into

I needed to get DM solutions to give Microsoft a kamap demo before Google understands what the open standards are, they are however

In short trade your data for the ability to publish it and see it on

Uh-ohs
It was fun seeing ESRI and Microsoft showing the same data, and
(Unlike the OOPSLA experience - where people walked out, only the hackers felt the pain for this guy).

Ku-dos
In fact the hackers were all on freenode#where2.0 IRC channel during

Nat was on the channel as well and we asked him to intervene, move bits of furniture around so we could see the stage, or pass on a few questions. I am sure O'Reilly is sifting through that log, and will consult it for months to come. It was a priceless slice of community building.

Talks
All this and I did not talk about the speakers. They were good, they

The speakers were a great talking point at what was a high

Where 2.0 Art Thou
Posted by jive on July 02, 2005 at 06:21 PM | Permalink | Comments (1)

Where 2.0
I went low profile the first day by wearing a suit (so I could blend in). The second day was a t-shirt so I could meet the "hackers".

Google released an AJAX API
The commitment of a major web player to an API at this level is actually a big thing. It will be remembered long after maps are so absorbed by the internet that they lack interest.

Ideas: Make a Kamap Flickr demo

Ideas: If this had been done as a calendar app it would still be significant - and the conference would be called When 2.0.

Concern: It is not at all clear that anything that does not go through AJAX is welcome on web 2.0.

Where 2.0 is the Question?
Address is the answer

Where 2.0 is hackers: one of the few people to get applause when they stepped up to the podium was the hacker that combined craigslist with Google Maps. This is *why* everyone showed up at a conference about location technologies. In short, the fact that hackers "had an itch to scratch" and did something about it is exactly the thing that everyone was excited about. The hackers were excited about it because it was possible. The spawning industry is excited because it means people will make up their own super specific services; long-tail was mentioned so many times it turned into a drinking game.

Where 2.0 is data miners: one of the only things that you "discover" when you do data mining is location. Seriously. Location is often the strongest correlation of any data you can find. The companies "giving away data" claim they don't know how they can recover costs. It is very obvious that they do in fact know, and the answer is golden. Location represents the keys to the data mining kingdom.

In the large, if location is known, you can adjust your sales figures for population and spot trends. GNat (the conference organizer) almost wept with joy when he finally realized that he could make sense of book sales. It took James Macgill to finally explain this to him.

In the small, location represents the key to knowing darn well most things about you (there is a reason it is on all the register-your-product cards). For your buying patterns they found that people were willing to trade airmiles. For location they found people are willing to trade driving directions. And they all hope the hackers find the next thing people are willing to trade. Or if they don't, they will create a swarm of services leading people towards the same AJAX interface.

Ideas: Why wasn't OGC here? If there was ever a place they needed to be it was "where". Several holdovers from the OSG2005 conference put in a good showing.
Ideas: I explained what a Capabilities file was to each of the big three; the deal is that the first engine to crawl this stuff gets a community of GIS types behind them.

Ideas: We had to call off our Java BOF (chances were that Java developers were elsewhere as Java One was going on - to the point that billboards were up advertising to Java developers *everywhere*). What was needed was an open standards talk. We did fairly well, but what the location community could not understand was that there was no standard for *address*. They got a bit demoralized when they realized what a hard problem it actually is; however, VCard does have structured location information in it.

Ideas: Make a WFS that serves up VCard as a format

Ideas: Hook up uDig to an address locator (have done - the code is in my laptop).

The perception of how much data should cost is going down
GIS customers for data are not willing to pay as much, and this has the people who drive roads scared. We all know that the value is in the attributes (apparently 50 groups are driving the US roads collecting different bodies of attribution). This is what turns something from a pretty picture into something we can reason with. But Google being able to "give" away the information (or at least the appearance of the information) is causing a reset that we should feel even in the GIS community.