Support for VARCHAR2 in NonStop SQL/MX

Introduction

Migrating from one SQL database management system to another is a task that needs careful planning and testing. One would expect, however, that the result of standard SELECT functions will return the same outcomes on different database platforms. Unfortunately, there are circumstances where the outcomes can be different.

When migrating from the Oracle RDBMS to other platforms, in general, one must change from the Oracle-specific VARCHAR2 data type to the ANSI VARCHAR data type. While these datatypes describe the same kind of data, variable length character arrays, there are differences in how NULLs are treated.

Like most SQL database management systems, HPE NonStop SQL/MX has implemented VARCHAR columns according to the ANSI standard. However, to facilitate porting from Oracle to SQL/MX it provides support of both VARCHAR and VARCHAR2 data types with their specific behavior. This support greatly reduces migration effort, as no source code changes are required to address the specific Oracle implementation.

Difference between Oracle and ANSI VARCHARs

Oracle users typically use the Oracle VARCHAR2 datatype in their data definitions. There is an important difference between how ANSI SQL and Oracle handle NULL values. This difference may lead to additional work when converting a database from Oracle to other databases.

A SQL column defined as VARCHAR can contain character data, letters, and numbers, up to a specified length. The space it occupies in the database depends on the actual number of characters in the column. The VARCHAR datatype is supported by probably all relational database management systems. Oracle supports the VARCHAR definition; however, it is treated the same as VARCHAR2.

ANSI SQL uses a three-valued logic: besides true and false, the result of logical expressions can also be unknown. SQL’s three-valued logic is a consequence of supporting NULL to mark absent data. If a null value affects the result of a logical expression, the result is neither true nor false but unknown.

In the ANSI SQL definition, a column that has not been given a value when a row was inserted is NULL, like an uninitialized variable in a program. Variable length columns can be empty, i.e., the length is 0, they can contain a string value, but they can also be NULL, i.e., never initialized or explicitly set to NULL.

In the Oracle RDBMS, an empty VARCHAR2 column, one with length of 0f bytes, is treated as NULL. This means that the same query executed on Oracle and other databases can return different results. When a database is migrated from Oracle to another database, not only the DDL must be changed from VARCHAR2 to VARCHAR; also code changes may be required to provide the same outcome of queries as seen in the original application.

HPE NonStop SQL/MX supports both ANSI VARCHAR and the VARCHAR2 datatype and treats NULLs in VARCHAR2 columns the same way as Oracle. This simplifies migrations to NonStop SQL/MX.

The next sections show examples that illustrate these differences.

Examples

To show the difference in behavior, the following SQL/MX table is used. It contains the same values in columns defined as VARCHAR and VARCHAR2 which makes it easy to see how query results can differ.

CREATE TABLE VARCHARS2 (
 ID    INTEGER PRIMARY KEY
,VC    VARCHAR  (10)  
,VC2   VARCHAR2 (10)
,DESCR CHAR(15)  NOT NULL
) NO PARTITION
;

INSERT INTO VARCHARS2 (ID, VC, VC2, DESCR) VALUES
 (1,'AAAAAAAAAA','AAAAAAAAAA','Upper AA')
,(2,'BBBBBBBBBB','BBBBBBBBBB','Upper BB')
,(3,'aaaaaaaaaa','aaaaaaaaaa','Lower aa')
,(4,'bbbbbbbbbb','bbbbbbbbbb','Lower bb')
,(5,'          ','          ','Ten spaces')
,(6,' ',' '  ,'One space')
,(7,NULL,NULL,'NULL')
,(8,NULL,NULL,'NULL')
,(9,'','','Empty str')
;

An additional row is inserted where the database provides the default (NULL) values for the missing columns.

INSERT INTO VARCHARS2 (ID, DESCR) VALUES (10,'Not provided');

SQL>select * from VARCHARS2;

ID          VC         VC2        DESCR
----------- ---------- ---------- ---------------
          1 AAAAAAAAAA AAAAAAAAAA Upper AA
          2 BBBBBBBBBB BBBBBBBBBB Upper BB
          3 aaaaaaaaaa aaaaaaaaaa Lower aa
          4 bbbbbbbbbb bbbbbbbbbb Lower bb
          5                       Ten spaces
          6                       One space
          7 NULL       NULL       NULL
          8 NULL       NULL       NULL
          9            NULL       Empty str
         10 NULL       NULL       Not provided

--- 10 row(s) selected.

As can be seen from the output of the SELECT statement, there are four rows with undetermined, or NULL, values in column VC2. However, in row 9, the value of VC is not NULL, because it does have a value in the ANSI sense: it is a string with 0 characters.

Migrating Oracle VARCHAR2 columns to ANSI VARCHAR columns may lead to different results from SQL queries that perform calculations across a set of values. For example, to calculate the average string length, NULL values are not included in the calculation. Because there are more non-NULL values for column VC, the average size is smaller than for VC2, even though the sum of the lengths is the same.

SQL>SELECT AVG(CHAR_LENGTH(VC)) AS LEN_VC, AVG(CHAR_LENGTH(VC2)) AS LEN_VC2
+>,      SUM(CHAR_LENGTH(VC)) AS SUM_VC, SUM(char_length(VC2)) AS SUM_VC2
+>FROM VARCHARS2;

LEN_VC             LEN_VC2              SUM_VC             SUM_VC2
------------------ -------------------- ------------------ -----------------
                 7                    8                 51                51

--- 1 row(s) selected.

Source code modifications

This section describes the migration efforts that are required when the target DBMS does not support the VARCHAR2 datatype when an application is moved from Oracle.

To mimic the way Oracle treats NULLs in VARCHAR2 columns in ANSI VARCHAR columns of other DBMSes, source code must be modified to check for empty strings. For example, by checking the length of the column for being 0. The SQL CASE statement can be used in these situations.

The two previous examples, when migrated to ordinary VARCHAR columns, will need the following CASE statement that returns a NULL value if the string length equals 0.

SQL>SELECT ID, CASE WHEN LENGTH(VC) = 0 THEN NULL ELSE VC END AS VC, VC2, DESCR  FROM VARCHARS2;

ID          VC         VC2        DESCR
----------- ---------- ---------- ---------------
          1 AAAAAAAAAA AAAAAAAAAA Upper AA
          2 BBBBBBBBBB BBBBBBBBBB Upper BB
          3 aaaaaaaaaa aaaaaaaaaa Lower aa
          4 bbbbbbbbbb bbbbbbbbbb Lower bb
          5                       Ten spaces
          6                       One space
          7 NULL       NULL       NULL
          8 NULL       NULL       NULL
          9 NULL       NULL       Empty str
         10 NULL       NULL       Not provided

--- 10 row(s) selected.

Beyond the simple SELECT output, there are many functions that are used with NULL values, such as arithmetic functions on strings such as AVG or MIN. Note that the SUM function is not affected by ignoring NULL values.

SQL>SELECT AVG(CHAR_LENGTH(
+>(CASE when length(VC) = 0 then NULL else VC end)
+>)) AS LEN_VC
+>, AVG(CHAR_LENGTH(VC2)) AS LEN_VC2 FROM VARCHARS2;

LEN_VC               LEN_VC2
-------------------- --------------------
                   8                    8

--- 1 row(s) selected.

The functions NVL, NVL2, COALESCE, the statements IS NULL and IS NOT NULL require such a CASE statement to return the same output.

String concatenations with NULL strings return a string in Oracle, but NULL in other databases, and, like Oracle, a string with SQL/MX VARCHAR2.

SQL>SELECT ID, VC||'***' VC , VC2||'***' VC2 FROM VARCHARS2;

ID          VC            VC2
----------- ------------- -------------
          1 AAAAAAAAAA*** AAAAAAAAAA***
          2 BBBBBBBBBB*** BBBBBBBBBB***
          3 aaaaaaaaaa*** aaaaaaaaaa***
          4 bbbbbbbbbb*** bbbbbbbbbb***
          5           ***           ***
          6  ***           ***
          7 NULL          ***
          8 NULL          ***
          9 ***           ***
         10 NULL          ***

--- 10 row(s) selected.

The COUNT function does not count NULL values, and this may affect COUNT (column_name) results.

The following example shows three different results when the COUNT function is used in the dataset that contains 10 rows. There are 7 rows that have a non-NULL value according to the ANSI VARCHAR rule, and there are only 6 rows that have non-NULL values according to the Oracle VARCHAR2 rule.

SQL>SELECT COUNT(VC) COUNT_VC,COUNT(VC2) COUNT_VC2,COUNT(*) COUNT_ALL
+>FROM VARCHARS2;

COUNT_VC             COUNT_VC2            COUNT_ALL
-------------------- -------------------- --------------------
                   7                    6                   10

--- 1 row(s) selected.

When ordering the output on the VARCHAR2 column, Oracle output will group empty strings and NULL values together, by default as highest values. In ANSI VARCHARs empty strings have the lowest value while NULL values are ordered highest¹.

SQL>SELECT * FROM VARCHARS2 ORDER BY VC2;

ID          VC         VC2        DESCR
----------- ---------- ---------- ---------------
          6                       One space
          5                       Ten spaces
          1 AAAAAAAAAA AAAAAAAAAA Upper AA
          2 BBBBBBBBBB BBBBBBBBBB Upper BB
          3 aaaaaaaaaa aaaaaaaaaa Lower aa
          4 bbbbbbbbbb bbbbbbbbbb Lower bb
          8 NULL       NULL       NULL
          7 NULL       NULL       NULL
          9            NULL       Empty str
         10 NULL       NULL       Not provided

--- 10 row(s) selected.
__________________________________

¹Oracle and other DBMSes support the ANSI option “NULLS {FIRST | LAST}” in the ORDER BY clause.

Conclusion

When migrating Oracle VARCHAR2 data to ANSI VARCHAR, developers usually expect that there will be no difference. In many cases, this will be true, however, as this article has shown, there are differences one needs to be aware of in the case of empty strings and NULL values. To be on the safe side, SQL statements must be changed when moving from VARCHAR2 to VARCHAR unless the data is guaranteed not to include empty strings. Because NonStop SQL/MX supports the use of the same VARCHAR2 datatype and its behavior this simplifies migration and testing.

Author

  • Frans Jongma

    Frans Jongma is a Master Technologist for the NonStop Advanced Technology Center (ATC) and is based The Netherlands. Frans has worked in several consulting positions for the NonStop Enterprise Division since 1989. His main areas of expertise are: NonStop SQL application design, performance analysis and high availability. Prior to joining Tandem, Frans has worked on the design and implementation of database management systems and developer productivity tools for UNIX and proprietary systems. As part of the ATC Frans has been involved in proving the concept of SQL/MX as a Service which resulted in the SQL/MX DBS product and an HTML-based prototype of a web-based user interface to SQL/MX DBS. This HTML version is the basis of WebDBS, the GUI interface for administrators and users of NonStop SQL/MX DBS.

Be the first to comment

Leave a Reply

Your email address will not be published.


*