
Article: Lucene Intro
Subject:  Skip new tags using JavaCC for HTMLParser.jj
Date:  2004-05-11 04:32:25
From:  karthik_net


Guys,
If you run into malformed or unexpected tags that break tag stripping while indexing with Lucene, the problem lies in the file "HTMLParser.java", since this is the file that strips the tags out of the HTML content.
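To make the idea of "stripping tags" concrete, here is a minimal sketch of what a tag stripper does. It is not Lucene's HTMLParser (which is generated by JavaCC from a grammar); the class `TagStripper` and its `strip` method are hypothetical names used only for illustration. It copies text outside `<...>`, drops everything inside, and treats quoted attribute values as opaque. Unlike a strict grammar, it simply ignores a stray `=` inside a tag instead of aborting.

```java
import java.util.Objects;

/**
 * Hypothetical, minimal tag stripper (NOT Lucene's HTMLParser): copies text
 * outside <...> and drops everything inside, treating quoted attribute values
 * as opaque so that a '>' inside quotes does not end the tag.
 */
public class TagStripper {

    public static String strip(String html) {
        Objects.requireNonNull(html, "html");
        StringBuilder out = new StringBuilder(html.length());
        boolean inTag = false;
        char quote = 0; // the open quote character inside a tag, or 0 if none
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (!inTag) {
                if (c == '<') {
                    inTag = true;     // start of a tag: stop copying
                } else {
                    out.append(c);    // ordinary text is kept
                }
            } else if (quote != 0) {
                if (c == quote) {
                    quote = 0;        // closing quote of an attribute value
                }
            } else if (c == '"' || c == '\'') {
                quote = c;            // opening quote of an attribute value
            } else if (c == '>') {
                inTag = false;        // end of the tag: resume copying
            }
            // Any other character inside a tag (names, '=', stray symbols) is dropped.
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p class=\"a>b\" =oops>Hello, <b>world</b>!</p>";
        System.out.println(strip(html)); // prints "Hello, world!"
    }
}
```

It deliberately does not handle comments, `<script>` bodies or character entities, so it illustrates the idea rather than replacing a real parser.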

Sometimes parsing fails on such tags, and indexing aborts with output like this:
=====================================
Example:
Parse Aborted: Encountered "=" at line x, column y.
Was expecting one of:
<ArgName> ...
<TagEnd> ...
=====================================
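When the parser aborts, the reported line and column are the quickest way to find the tag it choked on. A small helper that prints that line with a caret under the reported column could look like the sketch below. `ErrorContext.show` is a hypothetical helper; it assumes 1-based line and column numbers and input without tab characters (JavaCC's generated character streams count a tab as advancing to the next tab stop, so columns after a tab would not line up).

```java
/**
 * Hypothetical debugging helper: given the document text and the line/column
 * from a "Parse Aborted: Encountered ... at line x, column y" message, return
 * the offending line with a caret under the reported column.
 * Assumes 1-based line/column numbers and no tab characters in the input.
 */
public class ErrorContext {

    public static String show(String text, int line, int column) {
        // Split on any common line ending; -1 keeps trailing empty lines.
        String[] lines = text.split("\r\n|\r|\n", -1);
        if (line < 1 || line > lines.length) {
            throw new IllegalArgumentException("line out of range: " + line);
        }
        String target = lines[line - 1];
        // Clamp the caret into the line so a bad column cannot throw.
        int caretPos = Math.max(0, Math.min(column - 1, target.length()));
        return target + "\n" + " ".repeat(caretPos) + "^"; // String.repeat needs Java 11+
    }

    public static void main(String[] args) {
        String html = "<html>\n<body bgcolor==\"white\">\n</html>";
        // Suppose the parser reported: Encountered "=" at line 2, column 15
        System.out.println(show(html, 2, 15));
    }
}
```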

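In such a case, adjust the grammar file HTMLParser.jj so it skips the new tags, then use JavaCC to recompile it. JavaCC regenerates HTMLParser.java from the grammar, and the token definitions, including the escape syntax, are written to a separate generated file, HTMLParserConstants.java.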
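If regenerating the parser is not an option, a different workaround (not the approach described in this post) is to delete the offending tags before the HTML reaches the parser. The sketch below does that with a regular expression. `TagFilter.removeTags` is a hypothetical helper, and the text between an opening and a closing tag is kept so the content still gets indexed.

```java
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical pre-filter, an alternative to editing HTMLParser.jj: deletes
 * opening, closing and self-closing occurrences of the given tag names while
 * keeping the text between them. Regex-based, so it does not understand
 * comments, CDATA sections or '>' inside quoted attribute values.
 */
public class TagFilter {

    public static String removeTags(String html, List<String> tagNames) {
        String result = html;
        for (String name : tagNames) {
            // Matches <name ...>, </name> and <name .../>, case-insensitively.
            // The word boundary stops "o" from also matching "option".
            Pattern p = Pattern.compile("</?" + Pattern.quote(name) + "\\b[^>]*>",
                                        Pattern.CASE_INSENSITIVE);
            result = p.matcher(result).replaceAll("");
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<p>Hi <o:p class=x>there</o:p></p>";
        System.out.println(removeTags(html, List.of("o:p"))); // prints "<p>Hi there</p>"
    }
}
```

Because it only pattern-matches, editing the grammar as described above remains the more robust fix.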

Resource I worked from: "Javacc-tutorial.pdf"

Enjoy indexing... :)
