Introduction
Migrating from one SQL database management system to another is a task that needs careful planning and testing. One would expect, however, that the result of standard SELECT functions will return the same outcomes on different database platforms. Unfortunately, there are circumstances where the outcomes can be different.
When migrating from the Oracle RDBMS to other platforms, in general, one must change from the Oracle-specific VARCHAR2 data type to the ANSI VARCHAR data type. While these datatypes describe the same kind of data, variable length character arrays, there are differences in how NULLs are treated.
Like most SQL database management systems, HPE NonStop SQL/MX has implemented VARCHAR columns according to the ANSI standard. However, to facilitate porting from Oracle to SQL/MX it provides support of both VARCHAR and VARCHAR2 data types with their specific behavior. This support greatly reduces migration effort, as no source code changes are required to address the specific Oracle implementation.
Difference between Oracle and ANSI VARCHARs
Oracle users typically use the Oracle VARCHAR2 datatype in their data definitions. There is an important difference between how ANSI SQL and Oracle handle NULL values. This difference may lead to additional work when converting a database from Oracle to other databases.
A SQL column defined as VARCHAR can contain character data, letters, and numbers, up to a specified length. The space it occupies in the database depends on the actual number of characters in the column. The VARCHAR datatype is supported by probably all relational database management systems. Oracle supports the VARCHAR definition; however, it is treated the same as VARCHAR2.
ANSI SQL uses a three-valued logic: besides true and false, the result of logical expressions can also be unknown. SQL’s three-valued logic is a consequence of supporting NULL to mark absent data. If a null value affects the result of a logical expression, the result is neither true nor false but unknown.
In the ANSI SQL definition, a column that has not been given a value when a row was inserted is NULL, like an uninitialized variable in a program. Variable length columns can be empty, i.e., the length is 0, they can contain a string value, but they can also be NULL, i.e., never initialized or explicitly set to NULL.
In the Oracle RDBMS, an empty VARCHAR2 column, one with length of 0f bytes, is treated as NULL. This means that the same query executed on Oracle and other databases can return different results. When a database is migrated from Oracle to another database, not only the DDL must be changed from VARCHAR2 to VARCHAR; also code changes may be required to provide the same outcome of queries as seen in the original application.
HPE NonStop SQL/MX supports both ANSI VARCHAR and the VARCHAR2 datatype and treats NULLs in VARCHAR2 columns the same way as Oracle. This simplifies migrations to NonStop SQL/MX.
The next sections show examples that illustrate these differences.
Examples
To show the difference in behavior, the following SQL/MX table is used. It contains the same values in columns defined as VARCHAR and VARCHAR2 which makes it easy to see how query results can differ.
CREATE TABLE VARCHARS2 ( ID INTEGER PRIMARY KEY ,VC VARCHAR (10) ,VC2 VARCHAR2 (10) ,DESCR CHAR(15) NOT NULL ) NO PARTITION ; INSERT INTO VARCHARS2 (ID, VC, VC2, DESCR) VALUES (1,'AAAAAAAAAA','AAAAAAAAAA','Upper AA') ,(2,'BBBBBBBBBB','BBBBBBBBBB','Upper BB') ,(3,'aaaaaaaaaa','aaaaaaaaaa','Lower aa') ,(4,'bbbbbbbbbb','bbbbbbbbbb','Lower bb') ,(5,' ',' ','Ten spaces') ,(6,' ',' ' ,'One space') ,(7,NULL,NULL,'NULL') ,(8,NULL,NULL,'NULL') ,(9,'','','Empty str') ;
An additional row is inserted where the database provides the default (NULL) values for the missing columns.
INSERT INTO VARCHARS2 (ID, DESCR) VALUES (10,'Not provided'); SQL>select * from VARCHARS2; ID VC VC2 DESCR ----------- ---------- ---------- --------------- 1 AAAAAAAAAA AAAAAAAAAA Upper AA 2 BBBBBBBBBB BBBBBBBBBB Upper BB 3 aaaaaaaaaa aaaaaaaaaa Lower aa 4 bbbbbbbbbb bbbbbbbbbb Lower bb 5 Ten spaces 6 One space 7 NULL NULL NULL 8 NULL NULL NULL 9 NULL Empty str 10 NULL NULL Not provided --- 10 row(s) selected.
As can be seen from the output of the SELECT statement, there are four rows with undetermined, or NULL, values in column VC2. However, in row 9, the value of VC is not NULL, because it does have a value in the ANSI sense: it is a string with 0 characters.
Migrating Oracle VARCHAR2 columns to ANSI VARCHAR columns may lead to different results from SQL queries that perform calculations across a set of values. For example, to calculate the average string length, NULL values are not included in the calculation. Because there are more non-NULL values for column VC, the average size is smaller than for VC2, even though the sum of the lengths is the same.
SQL>SELECT AVG(CHAR_LENGTH(VC)) AS LEN_VC, AVG(CHAR_LENGTH(VC2)) AS LEN_VC2 +>, SUM(CHAR_LENGTH(VC)) AS SUM_VC, SUM(char_length(VC2)) AS SUM_VC2 +>FROM VARCHARS2; LEN_VC LEN_VC2 SUM_VC SUM_VC2 ------------------ -------------------- ------------------ ----------------- 7 8 51 51 --- 1 row(s) selected.
Source code modifications
This section describes the migration efforts that are required when the target DBMS does not support the VARCHAR2 datatype when an application is moved from Oracle.
To mimic the way Oracle treats NULLs in VARCHAR2 columns in ANSI VARCHAR columns of other DBMSes, source code must be modified to check for empty strings. For example, by checking the length of the column for being 0. The SQL CASE statement can be used in these situations.
The two previous examples, when migrated to ordinary VARCHAR columns, will need the following CASE statement that returns a NULL value if the string length equals 0.
SQL>SELECT ID, CASE WHEN LENGTH(VC) = 0 THEN NULL ELSE VC END AS VC, VC2, DESCR FROM VARCHARS2; ID VC VC2 DESCR ----------- ---------- ---------- --------------- 1 AAAAAAAAAA AAAAAAAAAA Upper AA 2 BBBBBBBBBB BBBBBBBBBB Upper BB 3 aaaaaaaaaa aaaaaaaaaa Lower aa 4 bbbbbbbbbb bbbbbbbbbb Lower bb 5 Ten spaces 6 One space 7 NULL NULL NULL 8 NULL NULL NULL 9 NULL NULL Empty str 10 NULL NULL Not provided --- 10 row(s) selected.
Beyond the simple SELECT output, there are many functions that are used with NULL values, such as arithmetic functions on strings such as AVG or MIN. Note that the SUM function is not affected by ignoring NULL values.
SQL>SELECT AVG(CHAR_LENGTH( +>(CASE when length(VC) = 0 then NULL else VC end) +>)) AS LEN_VC +>, AVG(CHAR_LENGTH(VC2)) AS LEN_VC2 FROM VARCHARS2; LEN_VC LEN_VC2 -------------------- -------------------- 8 8 --- 1 row(s) selected.
The functions NVL, NVL2, COALESCE, the statements IS NULL and IS NOT NULL require such a CASE statement to return the same output.
String concatenations with NULL strings return a string in Oracle, but NULL in other databases, and, like Oracle, a string with SQL/MX VARCHAR2.
SQL>SELECT ID, VC||'***' VC , VC2||'***' VC2 FROM VARCHARS2; ID VC VC2 ----------- ------------- ------------- 1 AAAAAAAAAA*** AAAAAAAAAA*** 2 BBBBBBBBBB*** BBBBBBBBBB*** 3 aaaaaaaaaa*** aaaaaaaaaa*** 4 bbbbbbbbbb*** bbbbbbbbbb*** 5 *** *** 6 *** *** 7 NULL *** 8 NULL *** 9 *** *** 10 NULL *** --- 10 row(s) selected.
The COUNT function does not count NULL values, and this may affect COUNT (column_name) results.
The following example shows three different results when the COUNT function is used in the dataset that contains 10 rows. There are 7 rows that have a non-NULL value according to the ANSI VARCHAR rule, and there are only 6 rows that have non-NULL values according to the Oracle VARCHAR2 rule.
SQL>SELECT COUNT(VC) COUNT_VC,COUNT(VC2) COUNT_VC2,COUNT(*) COUNT_ALL +>FROM VARCHARS2; COUNT_VC COUNT_VC2 COUNT_ALL -------------------- -------------------- -------------------- 7 6 10 --- 1 row(s) selected.
When ordering the output on the VARCHAR2 column, Oracle output will group empty strings and NULL values together, by default as highest values. In ANSI VARCHARs empty strings have the lowest value while NULL values are ordered highest¹.
SQL>SELECT * FROM VARCHARS2 ORDER BY VC2; ID VC VC2 DESCR ----------- ---------- ---------- --------------- 6 One space 5 Ten spaces 1 AAAAAAAAAA AAAAAAAAAA Upper AA 2 BBBBBBBBBB BBBBBBBBBB Upper BB 3 aaaaaaaaaa aaaaaaaaaa Lower aa 4 bbbbbbbbbb bbbbbbbbbb Lower bb 8 NULL NULL NULL 7 NULL NULL NULL 9 NULL Empty str 10 NULL NULL Not provided --- 10 row(s) selected. __________________________________
¹Oracle and other DBMSes support the ANSI option “NULLS {FIRST | LAST}” in the ORDER BY clause.
Conclusion
When migrating Oracle VARCHAR2 data to ANSI VARCHAR, developers usually expect that there will be no difference. In many cases, this will be true, however, as this article has shown, there are differences one needs to be aware of in the case of empty strings and NULL values. To be on the safe side, SQL statements must be changed when moving from VARCHAR2 to VARCHAR unless the data is guaranteed not to include empty strings. Because NonStop SQL/MX supports the use of the same VARCHAR2 datatype and its behavior this simplifies migration and testing.
Be the first to comment