Skip to main content

Simple database maintenance for databases embedded in NetBeans RCP applications

Posted by fabriziogiudici on May 29, 2008 at 9:41 AM EDT
JPA is a very good tool for simplifying the task of performing queries and updates to a relational database while keeping the Java code as much object oriented as possibile; but one of the tasks that are not covered is the standard maintenance of the database, such adding or dropping columns from an existing table, or renaming a table or a column because something has been changed in the application code.

Indeed, all existing implementations of JPA that I know have a limited capability in this area: the capability of generating the SQL schema out of the annotations if the database doesn't exist (or if you want to create it from scratch). For instance, let's suppose to have a persistent class Entity1 as follows:

@Entity
@Table(name="ENTITY1")
public class Entity1
  {
    @Id
    @Column(name="OID")
    private String id;
  
    @Column(name="STRING1")
    private String string1;
  
    @Column(name="STRING3")
    private String string3;
  
    @Column(name="INT1")
    private int int1;  
  }

By proper configuration it will be possible to instruct JPA to create for us a table with the following characteristics:
CREATE TABLE ENTITY1 (OID VARCHAR(10) PRIMARY KEY, INT1 INT, STRING1 VARCHAR(10), STRING3 VARCHAR(10))
This holds for every new entity that we could introduce in our application - but JPA won't be able to discover changes in the Entity1 structure and reflect them to the database schema if the related table already exists. While this is not usually a problem for enterprise applications (which usually have specific personnel to do the administration tasks when they are planned), things change radically in the desktop application perspective. Here you might have an embedded database that would need changes at some point in the release history of the application and for the very nature of a desktop application it will be run by end users on their computers, thus requiring unattended and automatic upgrade of the database.

Up to now I didn't addressed this task in blueMarine with an explicit infrastructrure; but this drove me to the introduction of some very poor code with some very bad characteristics:
  1. It is centralized in a single module and takes care for all the entities in the application, by assigning a version number to the whole database. This is bad, since blueMarine is very modular now, with each module being completely autonomous for what concerns JPA tasks (especially after some fixes to the JPA configuration mechanism). Indeed, cleaner code should be used instead, and should be implemented only in the relevant module.
  2. It jeopardizes JPA's capability of creating the schema in the properly fashion. For instance, when I have to add a new column in a table I must explicitly write a small SQL script that performs the needed changes to the existing table; but this information is already present in the JPA annotations - a DRY (Don't Repeat Yourself) violation.
For this reason, I've just started coding a new specific infrastructure. The basic concepts are:
  1. Every table has its own version number.
  2. Every table is maintained by a specific small class which can upgrade the managed table from version n to n+1; multiple classes can be run in sequence to upgrade a table starting from an arbitrary original version number.
  3. No SQL is needed if possible and let's always delegate to JPA the responsibility of creating the schema.
The maintenance classes extend the base class MaintainerTask and are registered by means of the META-INF/services facility.

Let's have a look at an example. We have previously listed how the version #0 of the table (and the related entity) looks like. The first change is for version #1, in which we discover that we don't need the string3 attribute. At this point, the new Entity1 class becomes:

@Entity
@Table(name="ENTITY1")
public class Entity1
  {
    @Id
    @Column(name="OID")
    private String id;
   
    @Column(name="STRING1")
    private String string1;
   
    @Column(name="INT1")
    private int int1;
  }

To tell the maintainer that we want to drop the column STRING3 we just need to implement this class:

public class ENTITY1_v0_v1 extends MaintainerTask
  {
    public ENTITY1_v0_v1()
      {
        super(Entity1.class, "ENTITY1", 0, 1);
      }

    @Override
    public void run()
      {
        updateColumns(drop("STRING3"));
      }
  }
The name of the class is not relevant - I'm just using the a coding convention of using the table name (hence the uppercase letters) with the version transition as a suffix. The constructor is declaring to the maintainer that the table name is still ENTITY1 and we're upgrading from version 0 to version 1. The run() method then tells the maintainer to update the columns in the table, by dropping STRING3.

Please note that declaring ENTITY1 and STRING3 does not violate the DRY intent, as you might suppose: we're giving to the system the historic information of the names at the moment of the transition from version 0 to version 1; if in future some names will change, the JPA annotations will carry only the names for the latest version, not for the historic ones.

Next step: let's suppose want to create two new attributes, int2 and string2:
@Entity
@Table(name="NEW_ENTITY1")
public class Entity1
{
@Id
@Column(name="ID")
private String id;

@Column(name="STRING1")
private String string1;

@Column(name="STRING2")
private String string2;

@Column(name="INT1")
private int int1;

@Column(name="INT2")
private int int2;
}
and we want to initialize int2 with the value 4, String2 with the value 'foobar'. Here's what we need:
public class ENTITY1_v1_v2 extends MaintainerTask
  {
    public ENTITY1_v1_v2()
      {
        super(Entity2.class, "OLD_ENTITY1", 1, 2);
      }

    @Override
    public void run()
      {
        updateColumns(create("INT2", "INTEGER", 4),
                      create("STRING2", "VARCHAR(10)", "foobar"));
      }
  }
The constructor is declaring that the table name is still the same, the run() method asks to create two new columns with the specified type and initial value.

Actually, there is a DRY violation here, as I'm repeating the column type and initial value here. I think/hope that I'll be able to get rid of this as the facility evolves.

The third step consist in a change of the table name from ENTITY1 to NEW_ENTITY1 and a change of the name of the primary key from OID to ID:
@Entity
@Table(name="NEW_ENTITY1")
public class Entity1
{
@Id
@Column(name="OID")
private String id;

@Column(name="STRING1")
private String string1;

@Column(name="STRING2")
private String string2;

@Column(name="INT1")
private int int1;

@Column(name="INT2")
private int int2;
}

This is accomplished by a slightly more complex MaintainerTask:
public class NEW_ENTITY1_v0_v1 extends MaintainerTask
  {
    public NEW_ENTITY1_v0_v1()
      {
        super(Entity1.class, "NEW_ENTITY1", 0, 1);
      }

    @Override
    public void run()
      throws SQLException
      {
        if (existsTable("ENTITY1"))
          {
            importFromTable("ENTITY1");
          }

        updateColumns(rename("OID", "ID"));
      }
  }

First, you should note that we are resetting the version number. In fact, it refers to a table name, not a Java entity, and we have basically a new table name NEW_ENTITY1 now whose initial version is 0 (doesn't exist) and being upgraded to 1. We are explicitly coding in the run() method that we want to import data from the old ENTITY1 table, if it exists, and that we want to rename a column. The facility will drop the old table after the import has been performard.

Enough for today. In a future post I'll tell you how this is being implemented.
Related Topics >>

Comments

Thank you for addressing the issue of schema evolution. It is an important issue that is too often overlooked. We use a VERY similar technique!

@sandeepspatil123, as code consolidates I'll publish it in the blog, stay tuned.

Thanks for touching upon this topic. This is exactly what I was looking for. Would be of great help if you can also post the implementation details.

Check out "Refactoring Databases: Evolutionary Database Design" by Scott Ambler and Pramod Sadalage. The accompanying website contains descriptions of all the refactorings explained in the book: http://databaserefactoring.com/

Thank you for sharing this information - if others are doing similar things, that means that I'm not completely off track ;-)