Database Refactoring with Liquibase

Introduction

The world of software development has changed radically.

In past years, prior to deploying an application, significant time and effort was expended in gathering requirements to produce a design. The design was completed up front, before any code was written or database schema created. Furthermore, the design was comprehensive and static – it was expected to be viable for years without requiring significant modifications. When changes eventually were needed, they became a major redesign effort that often impacted multiple facets of the application and database. None of this was completed quickly, or without risk.

Fast-forward to the present. Much has changed in the world of code development since that prior time. Now, it’s all about deployment speed and agility. Whoever gets there first, wins. That is the new mentality. So over the years, new practices and tools were developed to accelerate and simplify code development and deployment processes. Instead of developing comprehensive applications, the new methods emphasized developing what was most needed now, without too much concern for future requirements. Just-in-time development, in a sense.

This mindset spawned additional processes and practices that radically accelerated application deployments. Agile development and DevOps promoted continuous integration and continuous deployment practices. With effective tools, the latest version of application code could be organized and deployed automatically, quickly and reliably. With careful planning, applications could even be deployed in real-time, incrementally transitioning the workload in short order. Just as important, this process could be backed out in real-time, without concern. Deployment risks were greatly diminished.

While these modern approaches have been used very successfully to deploy new code capabilities, traditional database deployment practices have negated much of that benefit, because data and database structures, along with their embedded relationships – especially in large databases – are not easy to modify quickly. Database modifications became a bottleneck that constrained deployment productivity, and database refactoring emerged as a solution to this problem.

In ways that are similar to code refactoring, database refactoring is a process that methodically evolves the structure and content of a database by making small, structured, and incremental changes over time, synchronized and versioned with application code. Database refactoring is implicit in the concept of evolutionary database design, a practice which is based on the premise that the database, much like application code, will undergo modifications as business requirements evolve. Instead of producing a static, comprehensive database design as typically done using the waterfall model, the evolutionary model produces a design that is both expedient and expected to change.

While database refactoring is typically used in agile development environments that exhibit more dynamic and frequent deployments, it can always be used with traditional database deployments. The productivity benefits are especially well-suited for development, test and Q/A environments.

So even if your company is not yet fully immersed in agile practices and DevOps, DB refactoring methodologies may still bring benefits.

 

A closer look at database refactoring

What types of modifications are included in database refactoring? Well, just about anything and everything. The point is to evolve a design as requirements dictate, and that includes all types of changes, starting with the initial creation of the database and all intermediate changes throughout its lifetime.

All types of objects are subject to refactoring, such as tables, indexes, views, sequences, triggers, etc. Simple refactorings include changing constraints, nullability and default values, or adding columns. More complex changes include splitting or merging columns and tables, or migrating columns from one table to another.

It’s important to understand that refactorings apply incremental changes to an object as opposed to replacing an object by simply recreating it. Incremental changes reduce risk and facilitate back-out contingencies, and support transitioning through application code versions more effectively. Multiple versions of the application can be supported by a properly refactored database until fully migrated.

The collection of refactorings for a database can be viewed as a file or script of all changes, so that repeating the refactorings sequentially produce an identical schema up to that point in time.

Let’s call this file a changelog. A changelog-based approach provides several advantages, including:

  • Auditing
    An audit trail of all actions taken on the database. Changes are not made manually on the fly, but are first incorporated into the changelog and validated against a test database before further deployment. There is never any doubt about whether or what type of changes were applied to a database – the changelog keeps a record of all modifications.
  • Documentation
    The changelog documents all changes made and includes the ability to add commentary as needed to explain or justify the changes.
  • Replication
    A database can be precisely replicated from one platform or environment to another. This facilitates continuity among testing databases and the production database (and any replicas).
  • Contingency
    The changelog provides the means for sequentially undoing any modifications to the database, migrating the schema to any previous state. Well-constructed changelogs always contain the inverse refactoring operation, such as a DROP TABLE operation for a CREATE TABLE refactoring.
  • Automation, code synchronization and versioning
    The changelog is the basis for automating database refactorings. Furthermore, it can be incorporated into other DevOps toolsets for synchronization and versioning with application code.

 

A refactoring example

Let’s begin with a simple example and see how database refactoring may be used to evolve the schema over time. Each of the steps below would be defined in the changelog and executed sequentially.

The example consists of a sequence generator object (not shown, but would be used in an actual implementation) and one initial database table, PERSON, which contains: a person identifier, first and last names, and an email address.

The initial version of this table contains a single column for the person’s email address – there is no provision for multiple email addresses, and this is can be very limiting for certain applications.

To improve this design, we will evolve, or refactor, the database into two tables to support multiple email addresses, as shown below, which enables a one-to-many relationship of person to email addresses. This design change is much more flexible and accommodating for people using multiple email addresses.

To execute this refactoring, the column, EMAIL_ADDRESS is migrated from the PERSON table to the new EMAIL_ADDRESSES table, and supplemented with additional columns, as needed. You can see these columns in the new table, EMAIL_ADDRESSES.

In addition to defining the column in the new table, the column data must also migrate. Operations such as this can be integrated into the changelog, but if they are overly complex, they may need to be performed outside the changelog. In this example, the email addresses could be propagated to the new table using a simple INSERT-SELECT SQL statement, included in the changelog.

Although not shown in the above example, the EMAIL_ADDRESS column would remain in the PERSON table until such a time it is no longer needed. If the migration were to fail, or must be backed-out for any reason, having the column remain in the original table simplifies contingencies. The column can be renamed at a later time so users do not access it (or suppressed via a SQL view) until it can be removed.

The changelog commands for the preceding operations can be represented by the following SQL statements:

— Initial DB creation

CREATE SEQUENCE sq_1 …

CREATE TABLE person …

— Refactoring for email addresses

CREATE TABLE email_addresses …

INSERT INTO email_addresses

SELECT person_id, sq_1.NEXTVAL, email_address, ‘Y’

FROM person …

ALTER TABLE person ALTER COLUMN email_address RENAME TO obsolete_1 …

 

Deployment options

At this point, there are several options available to deploy these changes. In the simplest migration, the database refactorings are integrated with the new application code and deployed as they are. Once the database refactoring completes, the new version of the application can use the newly modified database.

For greater resiliency however, the email addresses could be maintained in both tables and synchronized for an indefinite period of time. This approach allows both old and new versions of the application to execute concurrently and extends the time for the code migration to be completed. It reduces application availability risk, should problems appear with the new code, and greatly simplifies contingency planning for backing-out changes.

Synchronizing column changes between tables such as in this example are typically carried out using database triggers defined on each table. For example, whenever the application makes a change to the email address in the PERSON table a trigger would propagate that change to the email address in the EMAIL_ADDRESSES table, and vice versa. Appropriate actions would be defined using additional triggers for all change types (update, insert, delete) that could occur on either table.

For this type of deployment, the changelog would include these additional operations:

— Add triggers to sync email addresses

CREATE TRIGGER insert_person … ON person …

CREATE TRIGGER update_person … ON person …

CREATE TRIGGER delete_person … ON person …

— Similar triggers for EMAIL_ADDRESSES table

Once the application is completely migrated to the new code version, the database triggers can be removed and the old email address column deprecated.

— Remove support for refactoring

DROP TRIGGERs …

ALTER TABLE person ALTER COLUMN email_address RENAME TO obsolete_1 …

Some customers refine the deployment process further by executing the refactoring steps in real-time, to eliminate application downtime during the migration. This is not always feasible, but if considered, it must be tested carefully to insure there is no excessive disruption to application performance.

 

An introduction to Liquibase

Liquibase (https://www.liquibase.org) is one of the most common database refactoring tools. It is available as a free, open source Java jar file, which means it can be deployed on Linux, Windows, and even directly on the HPE NonStop system.

The open source version of Liquibase contains over 30 built-in database refactorings. A commercial version, Liquibase Enterprise, extends the core capabilities with additional features.

Database refactorings are described in Liquibase changelogs using a choice of language formats: XML, YAML, JSON or SQL. Changelogs can be nested to accommodate complex environments. All changes applied to a database are recorded in a Liquibase changelog table (created on the host database system), and concurrency control is managed using a Liquibase lock table. The changelog table is used both for control purposes as well as being a means for the user to validate the state of database refactorings.

Liquibase can be executed on the command line, integrated with tools such as Apache Maven, Apache Ant and the Spring framework, or embedded within Java application code to initiate database refactorings as part of application startup.

The intent of Liquibase is to provide a means for executing database refactorings in a generic manner, without the user requiring deep SQL language skills. This point is illustrated below, with a snippet from a changelog file, expressed in XML, to create a SQL table in the host database:

There are several points to note in this example:

  • Refactorings in Liquibase are implemented between tags called “changeSet”, with all operations between these tags applied atomically. A changelog file may contain many changesets, and each changeset may contain one or more operations.
    • Changesets contain two main parts: the “command” part, which implements an operation, such as “createTable”, and a “rollback” part, which undoes the command, such as “dropTable”.
  • The ability to provide extensive free-form comments that improve readability and describe intent.
  • A unique free-form identifier for each refactoring step (noted as “changeSet id”) that includes an author ID. This is recorded in the Liquibase changelog table.
  • A formal comment tag, which is also recorded in the Liquibase changelog table.
  • The use of predefined Liquibase refactoring commands, such as, “createTable”. With these commands, the user – who is often an application developer and not a DBA – can define database objects without knowing specific SQL syntax. Each built-in refactoring is documented with a set of associated attributes which are used to customize the object/operation. In this example, the table being created is defined with three columns, appropriate datatypes, and desired constraints, such as a primary key.

Furthermore, a changelog can use strategically placed tags to group together changesets. Note the tag named, “tagDatabase” in the example below:

These tags are useful for delineating logically complete groups of changesets and can be used to more easily rollback database changes to a specific state.

Instead of (or in addition to) using the built-in Liquibase commands, you can use native SQL database statements. This is useful when using changesets that are not supported in the open-source version, or when you need greater control for specific database customization.

The example below shows how custom data generation for producing test data can be described in the changeset using SQL:

Alternatively, Liquibase changesets can load and/or update CSV data from an external file, and execute system commands for custom control.

Other uses for SQL-based changesets include specifying table or partition management functions (e.g., split partition) or defining SQL triggers, which are not available as built-in commands in the open-source version.

There are a number of ways to control and customize Liquibase execution. These are used to set error handling behavior and run modes. Precondition attributes control the circumstances of how changesets are applied. The example below illustrates how the changeset can only be executed against a SQL/MX database, as user DEMO.LIQBASE.

When working in a multi-database environment, the “preConditions dbms” attribute is used to customize refactorings for specific databases, thereby allowing the user to maintain a single changelog used across multiple database platforms.

The use of changelogs and changesets is very flexible and can be combined in various ways to conform to different deployment preferences. For example, you can use a single changelog file to hold all changesets, or conversely, use a master changelog file that only contains references to subordinate changelog files, with each file containing a logical set of changesets. Whatever works best.

 

Executing refactorings

There are two basic commands in Liquibase to execute refactorings. The “update” command applies all outstanding changesets within the associated changelog file, while the “rollback” command reverses changesets back to a specified point. Rollbacks can specify a fixed number of refactorings or a database tag, which reverses all changes back to that point, accordingly. Here is an example of each.

./liquibase –changeLogFile=lb-master1.0.xml update

./liquibase –changeLogFile=lb-master1.0.xml rollback DB_version_2

Each command also includes a “SQL” mode which displays the SQL statement(s) that the changeset will execute, without actually executing the command. This mode is very helpful for validating the exact actions Liquibase will take when processing each refactoring.

 

NonStop support for Liquibase

NonStop SQL/MX has been enhanced to support Liquibase commands via a Liquibase extension jar file which is included with the system software[1].

The diagram below illustrates how Liquibase interacts with the other components:

As noted previously, Liquibase is simply a Java jar file that executes on Windows, Linux or the NonStop platform. It uses the SQL/MX JDBC driver to connect to the target database platform[2]. Liquibase creates two tables for its own use (one to record refactorings and the other to manage concurrent Liquibase executions).

Liquibase reads changesets from one or more changelog files, and using the SQL/MX extensions jar, transforms the generic changeset commands into the appropriate SQL statements. These are passed to the SQL/MX database for execution using the JDBC driver. Upon successful completion, the changeset operation is added to the list of completed operations in the Liquibase changelog database table.

 

Summary[3]

Changes in the application development life-cycle have fostered the use of many new technologies, all of which have the objective of producing and deploying code more quickly, with greater accuracy, and with better control and orchestration. While these tools and techniques have resulted in radical improvements for code development and deployment, applying changes to the database often remains a bottleneck that constrains overall productivity.

The concept of evolutionary database design introduces a new approach that incorporates the concepts of code refactoring into the process of database design maintenance. The popular open-source product, Liquibase, takes these concepts further by materializing them into an easy-to-use, effective productivity tool. Linux-based databases have benefitted from this and other similar tools for several years. And now, Liquibase can be used with NonStop SQL/MX databases just as easily as with Linux-based databases through enhancements and support from the SQL/MX extensions file.


1 For installation and other details, please consult the manual, Open Source Java Frameworks on NonStop User’s Guide (Spring 3.1.0, Hibernate 5.4.10, MyFaces 2.0.2, Axis2Java 1.5.2, Liquibase 3.8.0)

2 A database connection is established to the SQL/MX Communication Subsystem (MXCS), and requires an available datasource with free ports. For details, please refer to the manual, HPE NonStop SQL/MX Connectivity Service Manual for SQL/MX

3 This article is based on the presentation from the author during TBC 2020 (Tandem Boot Camp) and is available for download/viewing.

 

Author(s)

  • Paul Denzinger is a Distinguished Technologist in the NonStop Advanced Technology Center, focusing on innovative mission-critical solutions for HPE NonStop customers. He has provided architectural and technical consulting support to customers worldwide for applications as diverse as high-performance trading systems to very large database systems. In addition to his focus on relational database technology, he has worked with several big data products, including Vertica, Hadoop, and HBase, and with the emergent blockchain technology. Among other activities, Paul provides training, workshops and consultation services in the areas of database technology and performance.

Be the first to comment

Leave a Reply