Spell Checking algorithms

Posted by daniel on September 23, 2004 at 9:19 AM PDT

What word did you mean to use?

Remember when spell checkers first started appearing in consumer
applications? We all had a lot of fun with the wacky suggestions
we would get. You would leave out the space in "a business" and the
suggestion would come up: "Did you mean 'abysmal'?"

How would you implement a spell checker? How would you create
lists of words that the author may have meant to use? In Can't
beat Jazzy (http://www-106.ibm.com/developerworks/java/library/j-jazzy/),
Tom White discusses two types of algorithms: phonetic-based and
string-similarity-based. He describes how the Aspell algorithm
combines the two techniques into the one that is used in Jazzy.
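
To make the phonetic idea concrete, here is a minimal sketch of Soundex, one
well-known phonetic code for English. It is an illustration chosen for this
post, not necessarily the code that Aspell or Jazzy uses, and the names
SoundexSketch and encode are made up for the example. Words that end up with
the same four-character key are treated as sounding alike.

```java
import java.util.Locale;

/** Simplified American Soundex: a letter followed by three digits. */
final class SoundexSketch {
    // Digit for each letter a..z; '0' marks letters that are not coded
    // (the vowels plus h, w, and y).
    private static final String CODES = "01230120022455012623010202";

    static String encode(String word) {
        // Keep only the letters A-Z; anything else is ignored in this sketch.
        String s = word.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        out.append(s.charAt(0));                      // first letter is kept as-is
        char prev = CODES.charAt(s.charAt(0) - 'A');  // its code still suppresses repeats
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            if (c == 'H' || c == 'W') {
                continue;                             // h and w do not separate equal codes
            }
            char code = CODES.charAt(c - 'A');
            if (code != '0' && code != prev) {
                out.append(code);
            }
            prev = code;                              // a vowel resets prev to '0'
        }
        while (out.length() < 4) {
            out.append('0');                          // pad short keys with zeros
        }
        return out.toString();
    }
}
```

With this sketch, encode("Robert") and encode("Rupert") both give R163, and
the misspelling "buisness" gets the same key as "business" (B252), so each
would show up as a candidate for the other.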

These algorithms are built around Western alphabets and, in the
cases given in the article, on English pronunciations. Nevertheless,
it's really interesting to see how you might tag words as candidates
for sounding alike. The second method, string similarity, is based on
how many changes you have to make to turn one word into another. This
provides another measure of distance. Combining the two techniques and
tuning the result allows you to locate more likely replacement words.
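
The "how many changes" measure is commonly implemented as Levenshtein edit
distance: the minimum number of single-character insertions, deletions, and
substitutions needed to turn one word into another. The sketch below assumes
that definition (this post does not say exactly which variant the article or
Jazzy uses), and EditDistanceSketch, levenshtein, and suggest are
hypothetical names. The suggest method simply keeps the dictionary words
within a maximum distance and orders them closest first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class EditDistanceSketch {

    /** Levenshtein distance computed with two rolling rows of the DP table. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;                       // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;                       // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            int[] tmp = prev;                  // reuse the arrays for the next row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Dictionary words within maxDistance of word, closest first. */
    static List<String> suggest(String word, List<String> dictionary, int maxDistance) {
        List<String> hits = new ArrayList<>();
        for (String candidate : dictionary) {
            if (levenshtein(word, candidate) <= maxDistance) {
                hits.add(candidate);
            }
        }
        // The sort is stable, so ties keep their dictionary order.
        hits.sort(Comparator.comparingInt(c -> levenshtein(word, c)));
        return hits;
    }
}
```

For example, turning "kitten" into "sitting" takes 3 edits. A real checker
would tune the distance threshold and mix in a phonetic score rather than
rely on edit distance alone.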

Also in Java Today, Leon Messerschmidt's JavaWorld article on
Coefficient (http://www.javaworld.com/javaworld/jw-09-2004/jw-0920-coefficient.html)
describes "an extensible Java platform
for online collaboration software [that] can be run either in an
EJB (Enterprise JavaBeans) server or as a standalone servlet."
After describing how to get started with Coefficient, the author
shows you how to add a custom wiki module. More useful modules
will need to be tied to a database. Coefficient handles the
database communication through Hibernate. Coefficient is targeted
at those who want to roll their own collaboration tools within a
framework.


Jack Shirazi writes that he has been a Java developer "for 9
years, and written dozens of personal Java apps. And enjoy it all
the time." In today's Weblogs (http://weblogs.java.net/), Jack writes
about Java and coolness and reports on a discussion with a guy who says that
he only writes in Java "to pay the bills. Never under any
circumstances have I written a personal application in Java. I
feel I fall into the category of one who thinks Java is woefully
uncool, and knows intimately why."

Joshua Marinacci continues his exploration of tiny applications in
New MiniApp:Storm Drain. He reports "While playing around some more with this
miniapp idea, I came across geographer Tyler Mitchell's weblog post
about hurricane tracking using Web Map Service urls. I thought this
would make an interesting MiniApp and give me a good opportunity to
play with a few webservices. Starting from his base (and with some
greatly appreciated clarification emails from Tyler), I've created
StormDrain, a simple program that loads WMS data and displays it
graphically."
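
For a rough idea of what a Web Map Service URL looks like, here is a sketch
that builds a WMS 1.1.1 GetMap request. This is not StormDrain's code: the
class WmsUrlSketch, the server address, and the layer name are placeholders,
and a real server's capabilities document lists the layers and formats it
actually supports.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class WmsUrlSketch {

    /**
     * Builds a GetMap URL for a bounding box in longitude/latitude (EPSG:4326).
     * Assumes baseUrl has no query string of its own.
     */
    static String getMapUrl(String baseUrl, String layers,
                            double minX, double minY, double maxX, double maxY,
                            int width, int height) {
        String encodedLayers;
        try {
            encodedLayers = URLEncoder.encode(layers, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
        return baseUrl
                + "?SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap"
                + "&LAYERS=" + encodedLayers
                + "&STYLES="                      // empty means the server's default style
                + "&SRS=EPSG:4326"
                + "&BBOX=" + minX + "," + minY + "," + maxX + "," + maxY
                + "&WIDTH=" + width + "&HEIGHT=" + height
                + "&FORMAT=image/png";
    }
}
```

The resulting URL could then be handed to something like
javax.imageio.ImageIO.read(new java.net.URL(url)) to fetch the map image
for display.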


In Projects and Communities, the JavaPedia (http://wiki.java.net/bin/view/Javapedia) page
Applications (http://wiki.java.net/bin/view/Javapedia/Applications)
has recently been refactored. Add your production-quality Java
application. Subsections include desktop, internet, multimedia,
development, and GUI apps.

The Java Communications community project OpenIM
(https://openim.dev.java.net/) is developing "a fast,
simple, and highly efficient instant messager server" using the Jabber
protocol. It works with multiple Jabber clients, including GAIM.


The discussion of Hackers and Painters
(http://forums.java.net/jive/thread.jspa?messageID=672&tstart=0#672)
returns in today's Forums. John Mitchell asks "How much does taste
really matter in the software industry? Do you evaluate your
code in terms of taste (or 'smell' or 'style')? [..] Graham
ended with: 'The recipe for great work is: very exacting
taste, plus the ability to gratify it.' Is there any great
work being done in software?"

Yishai adds to the discussion on forking and releasing the TCK,
saying "What you can do is put a 'You cannot use the Java brand or
name in your marketing without express permission from Sun' and then
tie the permission to passing the TCK and whatever other non-free
conditions Sun would like. So you could fork it, but you couldn't call
it Java, or claim that it runs like Java which would immediately give
it a marketing problem."


In today's java.net News Headlines:

Registered users can submit news items for the java.net News Page
(http://today.java.net/today/news/) using our news submission form.
All submissions go through an editorial review before being posted
to the site. You can also subscribe to the java.net News RSS feed
(http://today.java.net/pub/q/news_rss?x-ver=1.0).


Current and upcoming Java Events:

  • September 23, 2004: Chat with Sun's Chief Web Services Strategist
    (https://see.sun.com/Apps/DCS/mcp?q=ST4hlKTFvkJhYA)
  • September 23, 2004: Compuware OJX (http://www.CompuwareOJX.com/)
  • September 24-26, 2004: Michigan Java Software Symposium
    (http://www.nofluffjuststuff.com/2004-09-detroit/)
  • September 29-October 1, 2004: OSCOM (http://oscom.org/events/oscom4)

Registered users can submit event listings for the java.net Events Page
(http://www.java.net/events) using our events submission form
(http://today.java.net/cs/user/create/e). All submissions go through an
editorial review before being posted to the site.


Archives and Subscriptions: This blog is delivered
weekdays as the Java Today RSS feed. Also, once this page is no longer
featured as the front page of java.net it will be archived along with
other past issues in the java.net Archive
(http://today.java.net/today/archive/).