The Source for Java Technology Collaboration



Daniel Brookshier's Blog

Databases Archives


Project Spotlight: Zemberek - Turkish NLP and Turkish OpenOffice Spellchecker

Posted by turbogeek on May 14, 2005 at 11:47 PM | Permalink | Comments (0)

Project Spotlight Interview with Ahmet Akin of project Zemberek.

Project Name: Zemberek
Summary: Turkish NLP library and Open Office Turkish Spellchecker Plugin
Owner Names: Ahmet Akin & Mehmet Dundar AKIN
City: Hato Rey, San Juan
Country: Puerto Rico

Tell us a little about yourself. I am 31 years old and originally an Electronics and Communications engineer. I have worked in very different areas, from embedded system design to editing a technology newspaper. I graduated and completed my master's degree in the Electronics and Communications department of Yildiz Technical University in Turkey. I always had a soft spot for software, so I changed my focus to higher-level software two years ago. Currently I am involved in Java development. My current employer is Softek Inc., where I work as a developer.

What schools/universities did you attend? Yildiz Technical University in Turkey and Istanbul Technical University in Turkey

Are you a member of any Java user groups? I moved to Puerto Rico two years ago. Java is not as popular here as I would like, and there is no local JUG. But I consider myself lucky because I'm involved with Java at work. I meet really great developers, and my supervisor (Victor Salaman) is a real Java guru; I have learnt a lot from him. I still consider myself a Java apprentice.

Tell us a little about the project and why you started it. Well, about 5-6 years ago, I was interested in the Mozilla project and thought it would be cool to implement a real-time spell checker for the Turkish language in it. Then I started to think about how it would work and noticed that making a spell checker for Turkish is extremely hard. The truth is that nothing like that was available in the open source area.

After researching the subject, I became more interested in Natural Language Processing. I started a C++ project for the spell checker, and the prototypes worked well. After 3 years and several changes in my life, with the help of my brother, I decided to revive the project. But this time we decided to rewrite the whole project in Java. It was a real breeze after C++. Seriously, the difference in ease of development and deployment is huge, without sacrificing performance. We started a project on Java.net under the name TSpell (also the original name of the C++ project).

Our scope was broader: we wanted to build a base for all kinds of Turkish-related computing and NLP problems. After almost one year, the project was able to do Turkish spell checking, morphological extraction of the roots and affixes of words, word suggestion for misspelled words, and deasciifying of texts written without Turkish-specific characters. Then we changed the project name to Zemberek (which means the mainspring of a clock) because "TSpell" was not Turkish and users did not like it. Now we also provide the first open source Turkish spell checker for the Open Office project, and it works successfully. Zemberek is the only open source project in its area and we are proud of it. It became a part of the national Linux project, Pardus ( http://uludag.org.tr/projeler/masaustu/zemberek-pardus/index.html ).
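
To make "deasciifying" concrete, here is a minimal sketch of the idea. It is not Zemberek's code or API. The class name DeasciifySketch, the candidates method and the three-word dictionary are all hypothetical, made up for illustration. It assumes lowercase input and simply tries every combination of ASCII letters that could stand in for a Turkish-specific letter, keeping the spellings found in the dictionary. A real library would rely on morphological analysis rather than a flat word list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of Zemberek.
class DeasciifySketch {
    // ASCII letters that may have been typed in place of a Turkish-specific letter.
    // Unicode escapes are used so the file compiles regardless of source encoding.
    private static final Map<Character, Character> TURKISH = Map.of(
        'c', '\u00e7',  // c with cedilla
        'g', '\u011f',  // soft g
        'i', '\u0131',  // dotless i
        'o', '\u00f6',  // o with diaeresis
        's', '\u015f',  // s with cedilla
        'u', '\u00fc'); // u with diaeresis

    // Returns every spelling of the ASCII word that exists in the dictionary.
    static List<String> candidates(String ascii, Set<String> dictionary) {
        List<String> found = new ArrayList<>();
        expand(ascii.toCharArray(), 0, dictionary, found);
        return found;
    }

    // Depth-first search: at each ambiguous position, try the ASCII letter
    // first and then its Turkish counterpart.
    private static void expand(char[] word, int pos,
                               Set<String> dictionary, List<String> found) {
        if (pos == word.length) {
            String candidate = new String(word);
            if (dictionary.contains(candidate)) {
                found.add(candidate);
            }
            return;
        }
        char original = word[pos];
        expand(word, pos + 1, dictionary, found);
        Character alternative = TURKISH.get(original);
        if (alternative != null) {
            word[pos] = alternative;
            expand(word, pos + 1, dictionary, found);
            word[pos] = original; // restore before backtracking
        }
    }

    public static void main(String[] args) {
        // Toy dictionary (assumption): the Turkish words for flower, tree and water.
        Set<String> dictionary = Set.of("\u00e7i\u00e7ek", "a\u011fa\u00e7", "su");
        System.out.println(candidates("cicek", dictionary));
        System.out.println(candidates("agac", dictionary));
    }
}
```

The search is exponential in the number of ambiguous letters, which is acceptable for single words in a sketch but is one reason a production tool would prune candidates as it goes.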

What is the project's current status and plans for the future? Although I still think the project is in its infancy, it is very active and usable for real-life applications; the Open Office plug-in is proof of that. We have also started developing a server project based on the core library. The server will hopefully provide language-related services to applications written in other languages, such as Mozilla and KDE. However, there is still a lot of work for us to do. Honestly, right now Zemberek is still not doing serious "NLP" jobs. I would say it has a relatively simple structure, and the parsing mechanism is not really difficult.

After stabilizing the spell checker, we will hopefully move on to more complicated and interesting subjects, such as creating an open source wordnet for Turkish, sentence analysis, grammar checking, statistical analysis, maybe voice applications (TTS and recognition, with the help of the FreeTTS and Sphinx4 libraries), translation, SQL with natural language, shell commands with natural language, etc. Subjects in NLP are endless, and when it comes to Turkish there is very limited work available. (We know that several universities in Turkey have advanced work on the subject, but not many implementations are available, especially in Java.)

What kind of help are you looking for on this project? Of course, like all other projects, we are looking for developers. Currently three people are actively developing, and that is really not enough. Unfortunately, we cannot receive much help from international Java developers because of the nature of the project.

We are hoping that more help will come from Turkish Java developers. Domain knowledge is also crucial, and other project members are helping with it. Turkish Linux communities helped a lot when we introduced the Open Office plug-in. We also need linguists and experts in the Turkish language and NLP.

Suggestions for GELC or Java.net? It is great. I mean, I really wish java.net had started earlier. The services have improved nicely and the projects in GELC are interesting. I know some NLP projects exist, but since our main interest is Turkish, I couldn't examine them in detail.

As a suggestion, you should make yourselves more visible in educational environments. In schools, MS is trying hard to lure the students. I think java.net, and Sun in general, should be doing this too, because Java's potential is greater.

Thanks, Ahmet!

If you have a project on Java.net and could deal with a little extra press, please contact me for a spotlight interview - Daniel Brookshier



JavaOne Community Corner

Posted by turbogeek on May 13, 2005 at 04:37 PM | Permalink | Comments (0)

Have you ever wished there was real focus on open source at JavaOne? Well, we have heard the call and are putting open source on stage all week long at JavaOne. Every day in the JavaOne Pavilion we are running mini talks on projects in the java.net community. The talks will be about 20 minutes long and run throughout the day. We will have a plasma display and seating for the audience in what is called the Java.net Community Corner.

Community Corner is our way of helping everyone at JavaOne learn about Java.net and many of the great people and projects hosted there. As a leader in the Global Education and Learning Community, I'll be there to talk about my part, as well as a little about the JXTA community, where I am a board member.

My guess is that we will have a lot to see all week long. We also have the advantage of being in the Pavilion rather than in the more formal sessions, which means it will be easy to interact with anyone interesting hanging out at the Community Corner.

Now to the subjects. The talks are open to project owners of Java.net. That means if you or your buddies run a project on Java.net or move a project to Java.net real soon, you can probably get to talk about it. All you need to do is pick a time and propose it in the Community Corner wiki (I'll post it for you if you are unfamiliar with the java.net wiki). The process is that the community that your project is hosted in will approve your project and you are all set to go.

Scattered through the day, the community leaders of java.net will be giving talks too. This is so that you can see what is going on in our many Java.net specialties. We will also include info on how to create your own open source projects and how to do it successfully.

In addition to the talks, which you can either give yourself or just listen to, this will be the best place to talk to your community leaders and Java.net management.

Since we are a community here at Java.net, the most important things are sharing and working together. At JavaOne this year, we'll be doing it all week long!

Sign your project up now for the community corner at JavaOne. Session times are limited!



New Open Source Projects in Global Education and Learning Community (GELC)

Posted by turbogeek on April 29, 2005 at 04:42 PM | Permalink | Comments (4)

We have another crop of new projects in the Global Education and Learning Community (GELC). This week we have a system to publish theses, a Bliki written in Java, a Java exam simulator, a master's thesis investigating an agent-based privacy directive, a recorder that watches you work and listens to your speech, a compression database, a language parser, and a purchase calculator. Take a look at these great projects and lend a hand to help them get started.

liber-theses - Publish and catalog theses online
jrapido - Simple Java Bliki (Blog+Wiki)
JMockgen - Online Mock Simulator
Information-Agents - Investigating needed software for an active privacy directive
Copycat - A notepad that records writing and speaking
Compression - Compression of files to a database
CNLangParser - CN Language parser
Checkboxes - Display currency

Read on for more details on these projects





