
Removing elements from Swing HTMLDocument

Posted by g_s_m on June 18, 2007 at 8:36 AM PDT





Suppose you have an HTML document, like this:

<ul>
  <li id="LI1">Item 1</li>
  <li id="LI2">Item 2</li>
  <li id="LI3">Item 3</li>
</ul>
         

And you want to programmatically remove one of the list items (say,
the second one).

To do this, you load the document into Swing’s JEditorPane component:

JEditorPane p = new JEditorPane();
p.setContentType("text/html");
p.setText("HTML text from the above example");
   

Because you know the HTML element’s ID, you can easily obtain
a reference to the corresponding Element object:

HTMLDocument d = (HTMLDocument) p.getDocument();
Element e = d.getElement("LI2");
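
To try the lookup end to end, a minimal sketch along these lines may help. The class name FindElementSketch and the helpers parse and describe are hypothetical, and the sketch assumes that parsing the markup with an HTMLEditorKit directly (instead of through a JEditorPane) is acceptable, so that it can run without a display:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class, for illustration only.
final class FindElementSketch {
    static final String HTML =
        "<html><body><ul>"
        + "<li id=\"LI1\">Item 1</li>"
        + "<li id=\"LI2\">Item 2</li>"
        + "<li id=\"LI3\">Item 3</li>"
        + "</ul></body></html>";

    // Parses the markup into a fresh HTMLDocument (no JEditorPane involved).
    static HTMLDocument parse(String html) {
        try {
            HTMLEditorKit kit = new HTMLEditorKit();
            HTMLDocument d = (HTMLDocument) kit.createDefaultDocument();
            kit.read(new StringReader(html), d, 0);
            return d;
        } catch (IOException | BadLocationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Returns "<element name>: <trimmed text>" for the element with the
    // given id, or null if the document has no such element.
    static String describe(HTMLDocument d, String id) {
        Element e = d.getElement(id);
        if (e == null) {
            return null;
        }
        try {
            int start = e.getStartOffset();
            // The last element's end offset can lie past getLength(), so clamp it.
            int end = Math.min(e.getEndOffset(), d.getLength());
            return e.getName() + ": " + d.getText(start, end - start).trim();
        } catch (BadLocationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(parse(HTML), "LI2"));
    }
}
```

The start and end offsets of the Element delimit the text that belongs to the list item, which is also the text that disappears when the element is removed.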
   

But how do you remove this element from the document? A quick
glance through the HTMLDocument API
reveals a natural candidate for the task: the setOuterHTML method, which lets you
replace an arbitrary element with new content. So you think,
“if I specify an empty string as the new content, the
element will simply be removed from the document, as there’s no
replacement.” A pretty simple and elegant solution,
really.

Except that it doesn’t work.

The setOuterHTML method is
implemented so that if there’s no content in the
replacement string, the target document won’t be altered at
all. The method call is just a no-op in this case.

So how do you actually remove an element from the document? Quite
surprisingly, until recently there was no easy way of doing this.

Starting with JDK 7 build 10, the Swing text subsystem provides a new
public method in the DefaultStyledDocument
class: the removeElement method. It
takes the element to remove as its sole parameter and removes the
element from the document tree, along with the corresponding text
from the document content. So in order to remove the element, you
just invoke this method:

d.removeElement(e);
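
Put together, the whole sequence might look like the sketch below. It assumes JDK 7 or later; the class name RemoveElementSketch and the helper removeById are hypothetical, and the markup is parsed with an HTMLEditorKit directly rather than through a JEditorPane so that the sketch can run without a display:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class, for illustration only.
final class RemoveElementSketch {
    static final String HTML =
        "<html><body><ul>"
        + "<li id=\"LI1\">Item 1</li>"
        + "<li id=\"LI2\">Item 2</li>"
        + "<li id=\"LI3\">Item 3</li>"
        + "</ul></body></html>";

    // Parses the markup, removes the element with the given id, and
    // returns the plain text that remains in the document.
    static String removeById(String html, String id) {
        try {
            HTMLEditorKit kit = new HTMLEditorKit();
            HTMLDocument d = (HTMLDocument) kit.createDefaultDocument();
            kit.read(new StringReader(html), d, 0);

            Element e = d.getElement(id);
            if (e == null) {
                throw new IllegalArgumentException("No element with id " + id);
            }
            // Removes the element from the tree and its text from the content.
            d.removeElement(e);
            return d.getText(0, d.getLength());
        } catch (IOException | BadLocationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(removeById(HTML, "LI2"));
    }
}
```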
   

Because of how the default Element and Content implementations work, there are some
caveats:

  • Empty branch elements are not allowed in the default
    implementation; so if you remove the last child of some element,
    the element itself will be removed as well, recursively (this
    means that if you need to replace the sole child, you
    should add the new child first and then remove
    the old one, not vice-versa).
  • Element-less documents are not allowed in the default
    implementation; so you’ll get an exception if you try to
    remove the last leaf element in the document (because this will
    eventually require removal of the document root element).
  • The default Content
    implementation requires the presence of a trailing newline
    character in the document; so if you remove the leaf element
    containing the trailing newline, the newline will be added to
    the preceding leaf element (unless that element already ends
    with a newline).
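
The first caveat can be illustrated with a sketch like the one below. It is not a definitive recipe: the class name ReplaceSoleChildSketch, the ids LIST and ONLY, and the trailing paragraph are all made up for the example, and it assumes that HTMLDocument’s insertBeforeEnd method is an acceptable way to add the new child before the old one is removed.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class, for illustration only.
final class ReplaceSoleChildSketch {
    static final String HTML =
        "<html><body>"
        + "<ul id=\"LIST\"><li id=\"ONLY\">Only item</li></ul>"
        + "<p>After the list</p>"
        + "</body></html>";

    static HTMLDocument parse(String html) {
        try {
            HTMLEditorKit kit = new HTMLEditorKit();
            HTMLDocument d = (HTMLDocument) kit.createDefaultDocument();
            kit.read(new StringReader(html), d, 0);
            return d;
        } catch (IOException | BadLocationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Removing the sole child first: the now-empty <ul> is removed as well.
    static HTMLDocument removeOnly(String html) {
        HTMLDocument d = parse(html);
        d.removeElement(d.getElement("ONLY"));
        return d;
    }

    // Adding the new child first keeps the <ul> alive when the old one goes.
    static HTMLDocument replaceOnly(String html, String newItemHtml) {
        try {
            HTMLDocument d = parse(html);
            d.insertBeforeEnd(d.getElement("LIST"), newItemHtml);
            // Look the old item up again, after the structure has changed.
            d.removeElement(d.getElement("ONLY"));
            return d;
        } catch (IOException | BadLocationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        HTMLDocument naive = removeOnly(HTML);
        System.out.println("list kept after plain removal: "
            + (naive.getElement("LIST") != null));
        HTMLDocument replaced = replaceOnly(HTML, "<li id=\"NEW\">New item</li>");
        System.out.println("list kept after add-then-remove: "
            + (replaced.getElement("LIST") != null));
    }
}
```

The paragraph after the list is there only to keep the example clear of the other two caveats: the list item is then neither the last leaf in the document nor the holder of the trailing newline.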

