Guys
If you run into nonsense (malformed) tags while stripping tags with Lucene, the problem lies in the file "HTMLParser.java",
since this is the file that strips the HTML tags out of the content.
Sometimes it does not work, and errors like the one below show up in the output during indexing.
=====================================
Example:
Parse Aborted: Encountered "=" at line x , column y.
Was expecting one of:
<ArgName> ...
<TagEnd> ...
=====================================
In such a case, use JavaCC to recompile the HTMLParser.jj file. This replaces the HTMLParser.java file, with the escape syntax written out appropriately in another file, HTMLParserConstants.java.
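While the grammar is being regenerated, a stopgap can keep the indexer from aborting on a bad page. The sketch below is only an illustration and is not part of Lucene: the class TagStripFallback and its strip() method are made-up names, and the sample input is an invented malformed tag. It removes anything that looks like a tag with a plain regex, so it is much cruder than HTMLParser (no entity decoding, no handling of scripts or comments). It could be called from a catch block around the real parser.

```java
import java.util.regex.Pattern;

// Hypothetical fallback, not a Lucene class: crude tag removal for pages
// that the generated HTMLParser refuses to parse.
public class TagStripFallback {
    // Anything between '<' and the next '>' is treated as a tag,
    // including malformed ones such as <td width=="10">.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    public static String strip(String html) {
        // Replace tags with a space so adjacent words do not run together.
        String noTags = TAG.matcher(html).replaceAll(" ");
        // Collapse the whitespace left behind and trim the ends.
        return SPACES.matcher(noTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // Invented sample input with a stray "=" inside a tag.
        System.out.println(strip("<td width==\"10\">Hello <b>world</b></td>"));
    }
}
```

The text it returns can be indexed in place of the parser output for that one document, so the rest of the indexing run continues.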
Resource I experimented with: "Javacc-tutorial.pdf"
Enjoy indexing! :)