Eitan Suez's Blog

April 2005 Archives


community docs update..

Posted by eitan on April 22, 2005 at 08:54 AM | Permalink | Comments (0)

in a java.net article i wrote on ashkelon a few months ago, i'd mentioned the idea of resurrecting a community docs web site based on ashkelon. the original site was at dbdoc.org, whose name always felt a little inadequate. so i checked on openjavadocs.org and lo and behold, it was available.

so i've been thinking about doing this very thing: bringing the site back, under a new (more appropriate) domain name.

then i realized that i originally wrote ashkelon primarily to serve my personal javadoc api referencing needs. i mean, i didn't design it to support large numbers of users, or for the population and update of apis to be so frequent as to become a significant overhead. i personally use it to document the dozen or so apis that i happen to be using over a given period of time. when i come across another api that i want to leverage, i'll put it in my local copy of ashkelon.

the process used to go like this: download the source code, update the sourcepath, create an api xml file for it, invoke the ashkelon 'add' command to populate the db, and restart the web app. the whole process could take me 5-10 minutes at the most. for 100 apis, that's a couple of days! (ouch).

so i spent a little time trying to improve the situation. i haven't yet cut a new version of ashkelon but i've committed a fairly significant number of changes and improvements. here are some of them:

1. sourcepath is no longer required, even discouraged. instead, ashkelon is now aware of source code repositories (svn or cvs). in the api xml file, i now specify where the repository lives. when i do an 'add' now, ashkelon will simply fetch the code for me.

2. i've had this small "apixml" script that would generate the skeleton xml for me from a javadoc package-list. that's a nice little time saver. i also keep these xml files around in ashkelon's source code repository so that others could use them.

3. i've slightly improved the post-population algorithm. adding j2se14 on my old powerbook used to take me 7 minutes. now it takes 4.

4. i've improved the web app's caching mechanism from being totally dumb to being just a little dumb. i don't technically need to restart the web app after an 'add' anymore.


so now the process is a little more streamlined: create the api xml file (using the apixml tool), invoke the 'add' command. this takes me maybe 3-4 minutes now for your average api.

the dilemma is that we always generate more great ideas than we have time for. i have lots of plans for this site but who knows if i'll ever get to them. maybe as a community we can. but there are a few things that i have to do to allow openjavadocs.org to truly be a community-enabled system: giving you control over managing apis.

here's a partial to-do list, sorted by degree of importance (descending):

1. set up a system where adding or updating apis on openjavadocs.org is distributed. the api management overhead could be made minute if it were shouldered by the entire community. anyone wishing to add (or update) an api would simply complete an online form. the request would be placed on a queue and processed into ashkelon shortly thereafter, or maybe that night (i have begun working on this).

2. i wrote ashkelon back in the days before the ascent of o/rm's (hibernate et al). it's high time i totally replace direct jdbc calls with hql. for one thing it'll take care of caching for me.

3. let go of ashkelon's api xml format altogether and rely on stronger maven integration. the maven project.xml file could become the de facto source of all the information ashkelon needs to do its javadoc'ing. (actually i already have a maven project.xml adapter. but i haven't yet changed my personal practice of using the original xml format.)

4. i want to take the ui's markup to a higher semantic level (yeah, i'm a ui freak). it's already highly semantic but i want to be able to make ashkelon into an ashkelon zen garden.
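the request queue from item 1 could look something like the sketch below. this is only an illustration under my own assumptions: `ApiRequestQueue`, `ApiRequest`, and their fields are hypothetical names, the repository url is a placeholder, and the real processing step (invoking ashkelon's 'add') is left as a comment rather than guessed at.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class ApiRequestQueue {
    // hypothetical request: what someone might enter in the online form
    static class ApiRequest {
        final String apiName;
        final String repositoryUrl;

        ApiRequest(String apiName, String repositoryUrl) {
            this.apiName = apiName;
            this.repositoryUrl = repositoryUrl;
        }
    }

    private final Queue<ApiRequest> pending = new ArrayDeque<>();

    // called when the form is submitted; nothing is processed yet
    void submit(ApiRequest request) {
        pending.add(request);
    }

    // drains the queue in submission order, e.g. from a nightly job;
    // a real job would hand each request to ashkelon's 'add' at this point
    List<String> processAll() {
        List<String> processed = new ArrayList<>();
        ApiRequest next;
        while ((next = pending.poll()) != null) {
            processed.add(next.apiName);
        }
        return processed;
    }

    public static void main(String[] args) {
        ApiRequestQueue queue = new ApiRequestQueue();
        queue.submit(new ApiRequest("jdom", "svn://example.org/jdom"));
        queue.submit(new ApiRequest("hibernate", "svn://example.org/hibernate"));
        System.out.println(queue.processAll()); // prints [jdom, hibernate]
    }
}
```

the point of the queue is only to decouple "someone asked for an api" from "the api got populated", so the slow part can run whenever it is convenient.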

there you have it. an update on where i am. i know that david walend had blogged on the topic of community javadocs for java.net. i also realize the response out there was very positive. anyone interested in helping out please let me know. as with most things open source, i won't commit to a specific timeline. but now at least you know where i'm trying to go with this. thanks. / eitan



on languages and apis

Posted by eitan on April 22, 2005 at 08:53 AM | Permalink | Comments (7)

i've recently been thinking about two related topics:
- on programming languages vs programming apis
- on the relationship between human languages and programming languages

the sources of these thoughts have been coming from two separate corners:

1. discussions we sometimes have at nfjs symposia on the ever-popular question "which language should i program in" or "which language is best;"

2. my incidental knowledge of french, hebrew, and english and their bearing on my thoughts about software designs

everyone has an opinion on the first issue of 'which language is best.' my personal opinion is that different languages suit different people. that is, my answer is that there is no such thing as one language that fits all people (or purposes, or both). i believe our liking of a programming language is somewhat related to our character and our intellectual capabilities. most _really_ smart people i know love languages like LISP, and in general languages that are extremely terse. my explanation (to myself) for this is that the terseness is not a problem for really smart people. once you learn a language it becomes natural, second nature. then what really matters is how terse it is. so a language like perl is very terse and you have really smart people who absolutely love perl and some not so smart people (like me) who like it less.

but what does smart mean anyway? my wife tells me i'm smart and that makes me happy. but i know that i'm not the kind of person who will solve a difficult problem in real-time. i also know that in real-time conversations i'm a bore. where i do ok is when i take my own time to think about something. i'll resurface and by then i will have thought about the problem, brooded over it, understood it a little better, and so my proposed solution is usually a little better than what others propose. at least that seems to be my own subjective analysis of myself. so in the last paragraph, by "smart" i mean people who can think quickly in real-time. in that respect i'm somewhat of a disappointment. :-)

i'm not a programming language expert and i believe that i know fewer programming languages compared to many other software developers. i've coded a little in c (way too long ago). i know javascript and perl and java. i did some php years ago. i probably know a couple of other languages and forgot that i did (i forget things a lot and i'm sure it's not going to get better with age :-) ). i also read dave thomas' excellent book on ruby though i have little practice with it. and very recently i read through a book on objective c.

i'm not so interested in the question "which language is best" anyway. that was just a trigger that led me to thinking about programming languages vs programming apis.

the reason i don't like the discussion of "which programming language is best" is because to me they look all the same. i mean, the differences between languages are cosmetic and insignificant. i believe all the action is elsewhere: in the api. when i look at javascript vs perl or vs java i see different syntax for a for-loop, for iteration, using one kind of brace as opposed to another. but they all basically have the same set of constructs: conditions, looping, perhaps threading, scope, closures, etc.. some are terse. some require less typing. and that's good. some are more strongly typed. and many have oo concepts built-in, which is good. some day maybe aop will also be a concept that's built-in to the language. but then if this turns out to be useful or popular then probably most languages will follow suit and adopt these notions. and that's exactly why we have c++ and objective c today.

to me, one language can be replaced with another. the essence is elsewhere: in the api. here's an example: pdflib (see http://www.pdflib.org/). pdflib is a library for working with pdf documents. here are a few facts about pdflib:

1. The PDFlib core is written in the ANSI C language.

2. PDFlib supports language bindings for all common programming environments:
http://pdflib.org/products/pdflib/languages.html

that is, pdflib clients can be written in:
c, php, java, .NET (.WHAT?), Perl, Python, Cobol, and more!

the striking thing is that the way you interface to the api in perl compared to java is essentially the same. pdflib is indifferent to the language used by its clients.

i think the same is true for other types of apis. take interfacing to a database as an example. all languages have a jdbc-like api for interfacing to a db. you submit sql queries and get back resultsets and you can iterate over the resultset and extract field values, etc.. if you do this in perl or in java the code is pretty much the same (albeit the perl version is more terse or obfuscated depending on how you look at it). so what defines the way the program is going to be written is the database api and not as much the language.

so my point is this: the api is a language. it's a domain-specific language. how you design an api has direct impact on whether it feels natural (e.g. jdom) or not (e.g. w3c dom). and to some extent, that seems to be independent of [a] the programming "language" you use to write the api or [b] the programming language you use to write the client.
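to make the "feel" of an api concrete, here is a small sketch that builds a two-element document with the w3c dom that ships with the jdk. the class name `DomFeel` and the element names are made up for the example. notice how every node has to be created through the document factory and attached in a separate step; as far as i recall, jdom lets you say roughly the same thing with `new Element("name").setText("ashkelon")`, which is the difference in feel i mean.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomFeel {
    // builds <api><name>ashkelon</name></api> using only the w3c dom api
    static Document build() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element api = doc.createElement("api");   // create...
            doc.appendChild(api);                     // ...then attach
            Element name = doc.createElement("name");
            name.appendChild(doc.createTextNode("ashkelon"));
            api.appendChild(name);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element root = build().getDocumentElement();
        // prints: api / ashkelon
        System.out.println(root.getTagName() + " / " + root.getFirstChild().getTextContent());
    }
}
```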

so it turns out that we're developing languages every day in our programming work but we may not really be conscious of this fact (maybe we should call ourselves computer linguists instead of computer programmers).

let me turn now to the other thoughts in my head: the thoughts about languages. i'm most fluent today in english. this is no surprise since i've been living in the usa for the last 20+ years. but i'm in a somewhat unique position. i look at code i've written a number of years ago and think to myself: "did i write this code? it's terrible! it shouldn't be designed this way." i take this as a sign that i've become a better software designer (or maybe just more pompous) in the last few years. finding the right design is much more natural to me today than it was, say, 5-6 years ago. the principal design tools i believe one needs are: [1] refactoring + [2] design catalogs.

my current understanding of software design is shedding light on the design of human languages. i look at english. english is _the_ international common language. it's spoken almost everywhere. i believe english's design is not so ideal, not so great. other languages have better designs. i believe that java is to programming languages what english is to human languages. i don't see a rush for english speakers to find another language to express themselves in because of "inadequacies of expression" that exist in the english language (inadequacies that may not exist in other languages).

i look at hebrew specifically. what really strikes me about hebrew is that it looks like a language that was designed, not one that emerged; almost a "cleanroom" design. but the same goes for evolution: this world looks like it was designed (in the hitchhiker's guide you actually get to meet the designer) although it really evolved. so maybe the reason hebrew looks like a better-designed language is that it's older than english and has had more time to evolve (this reasoning is flawed though because hebrew was a dead language for 2,000 years and was revived only in the last century; the elements of the hebrew language i describe below were present in the language as far back as 1200 BC during the exodus).

let me try to describe to you the design of the hebrew language. it's beautiful. there is a notion of a "root" which most of the time is composed of three consonants. i don't even know how to describe it. you can say that almost the entire vocabulary of the hebrew language is a set of roots. then from each root come various derivatives. there is a consistent way to "transform" a root for conjugation, to create a verb form, noun form, to imply possession, and much more.

like everything else, an illustration is more revealing than a description. take for example the root "YLD" which means "child" or is the concept that is related to the word "child." yes, a Y is a consonant in the hebrew alphabet. that's because a consonant in hebrew is defined as "that thing that goes in between two vowels." take for example the word "maya" -- the "y" goes in between two vowels. by the way, since the latin alphabet is a derivative of the hebrew alphabet (or maybe they have a common ancestor), the Y ("yud") corresponds to the letter "J" in latin, which i believe in german is pronounced "Y" but in english for some reason has morphed into the current "Jay" sound. so jonathan for example was never called jonathan, but yonatan. but i digress.

so there is this model in hebrew that all writing is basically an alternating pattern of consonants and vowels. as in "pomade" - 3x(consonant followed by vowel). to handle the exceptional case of two consonants back to back, there is the notion of a null vowel: it's called the "shva." and there is a null consonant too: that's the letter "aleph" (which somehow morphed into the letter "a" in latin). it turns out that written hebrew doesn't even have vowels: it's all consonants. i mean they're there but they've become implicit in the language. there are no letters for vowels. people have discovered they can read and write faster if they just omit the vowels altogether. but again i digress.

ok, so back to YLD. "yeled" means boy. "yalda" means girl. any word that is somewhat related to the concept of childhood is derived from that root. take birthday: that's two words: birth and day. the "birth" in birthday is "holedet." ok, the y has disappeared but take my word for it: the 'l' and 'd' come from the root of yld. "toldot" means "the telling of the generations of" or "recounting the genealogy of" which occurs frequently in the bible (as in "these are the generations of jacob (israel)"). "toldot" again derives from yld. "yaldut" means childhood. "lehivaled" means "to be born." "yelid" means "native." so in english we have ten words that have no phonetic relationship to one another: boy, girl, birth, genealogy, child, etc.. (only a relationship in meaning). in hebrew they're all related both in meaning and phonetically. i realize that latin and greek have that concept as well. "gene" in genealogy is essentially a root. and there are many words in english that derive from the "gene" root such as "generation", "genes", "gender", etc.. that's why people recommend learning latin and greek: because so many words in the english language derive from latin and greek roots.
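for the programmers in the audience, the root-plus-pattern idea can be sketched as a tiny function. this is a toy and not real hebrew grammar: the pattern notation (digits 1, 2, 3 standing for the root's three consonants) is something i made up for the example, the patterns are just reverse-engineered from the transliterations above, and it doesn't handle words like "holedet" where the y drops out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RootPatterns {
    // replaces the digits 1, 2, 3 in a pattern with the root's three consonants;
    // every other character in the pattern is copied as-is
    static String derive(String root, String pattern) {
        StringBuilder word = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c >= '1' && c <= '3') {
                word.append(Character.toLowerCase(root.charAt(c - '1')));
            } else {
                word.append(c);
            }
        }
        return word.toString();
    }

    public static void main(String[] args) {
        // made-up pattern notation -> meaning, taken from the words in the text
        Map<String, String> patterns = new LinkedHashMap<>();
        patterns.put("1e2e3", "boy");
        patterns.put("1a23a", "girl");
        patterns.put("1a23ut", "childhood");
        patterns.put("1e2i3", "native");

        for (Map.Entry<String, String> p : patterns.entrySet()) {
            // prints yeled, yalda, yaldut, yelid with their meanings
            System.out.println(derive("YLD", p.getKey()) + " = " + p.getValue());
        }
    }
}
```

the root is the parameter and the patterns are the reusable part, which is the property i find so appealing.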

now that i think of it, i think hebrew could use a javadoc tool that will tell you all words that derive from a specified root. kind of like a "descendants" cross-reference in the java almanac (or ashkelon). maybe there's already one out there, who knows.

what's even more important is that the rules for derivation of words from their root do not change from one root to another. so if you learn the rules once, you now can generate almost the entire hebrew language from its set of roots. it's like the root being a machine and the user interface for that machine being the set of rules you apply to generate sentences. you learn how to operate one machine, and now you know them all. that reminds me of naked objects: naked objects does that for user interfaces: it generates a _standard_ ui for any set of types you desire. if you know how to work with one type of object, you've just learned the user interface for all of them. (that's why i believe that hebrew is an easy language to learn: it's less redundant: it's well refactored.) that's powerful stuff. the type can be parameterized. just like in hebrew the root can be parameterized. that's what the java reflection api gives us: programming model constructs are elevated to types, which can, among other things, be used as parameters. the type can be treated in a standard way. that's also the reason for the command pattern: elevating a method to a type.
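here is a minimal sketch of what "treating the type in a standard way" can mean with the reflection api. it is not how naked objects is implemented; `StandardView` and the `Api` class are hypothetical names for the example. the one `describe` method works for any object: it finds the public no-argument getters and collects their values, so you learn the "ui" once.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

class StandardView {
    // hypothetical domain type; the names and values are made up for this sketch
    public static class Api {
        public String getName() { return "j2se"; }
        public String getVersion() { return "1.4"; }
    }

    // one standard "view" for any type: every public no-arg getter and its value,
    // keyed by property name and sorted alphabetically
    static Map<String, Object> describe(Object target) {
        Map<String, Object> view = new TreeMap<>();
        for (Method m : target.getClass().getMethods()) {
            boolean getter = m.getName().startsWith("get")
                    && m.getParameterTypes().length == 0
                    && !m.getName().equals("getClass");
            if (!getter) {
                continue;
            }
            try {
                view.put(m.getName().substring(3), m.invoke(target));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
        return view;
    }

    public static void main(String[] args) {
        System.out.println(describe(new Api())); // prints {Name=j2se, Version=1.4}
    }
}
```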

so back to java. java is a fairly new language. i think we're slowly witnessing the evolution of a programming language. i think java has been evolving remarkably well. we wanted assertions, we got them. we wanted complementary apis that were not in j2se, we got them from apache and sourceforge and all kinds of places. we wanted a simpler way of doing iteration: we got it in java 5. we wanted parameterized types / generics: we got them. we also got annotations. we'll see which will survive and which will not. they might remain in the dna.. i mean in j2se. but they might not be used as much (e.g.: the preferences api). we are also experimenting with other languages that are byte-code compatible with java, groovy being an example.
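as a reminder of what the java 5 changes bought us, here is the same loop written both ways. the class and method names are just for the example. the first version is the pre-java-5 idiom (raw list, explicit iterator, a cast per element); the second uses generics and the enhanced for loop.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class Java5Loops {
    // pre-java-5 style: raw types, explicit iterator, cast on every element
    @SuppressWarnings("rawtypes")
    static int totalLengthOld(List names) {
        int total = 0;
        for (Iterator it = names.iterator(); it.hasNext();) {
            String name = (String) it.next();
            total += name.length();
        }
        return total;
    }

    // java 5 style: the element type is declared once and the loop reads like prose
    static int totalLengthNew(List<String> names) {
        int total = 0;
        for (String name : names) {
            total += name.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add("ashkelon");
        names.add("jdom");
        System.out.println(totalLengthOld(names)); // prints 12
        System.out.println(totalLengthNew(names)); // prints 12
    }
}
```

(the `<>` in `main` is a later convenience than java 5; in java 5 itself you would spell out `new ArrayList<String>()`.)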

i know there are a few things in j2se that haven't exactly evolved very much. the issue of ownership can also sometimes get in the way of evolution. maybe evolution deals with that by being slow. people die and then no one is left to fight for ownership .. of ideas, at least.

well, if you made it this far the only thing i can say is: thanks!




