Native LOBs in SQL/MX Featured by Python on NonStop

Introduction

Native Large-Objects (LOBs) were introduced in NonStop SQL/MX release 3.8 and replace the previous implementation of LOBs for JDBC in which the driver manages the LOBs. That previous implementation is still available, however, migration to the native LOBs is recommended.

Support for the Python programming language

The use of Python on NonStop is growing as customers start appreciating the same qualities that made the language popular in the larger IT community. It is easy to learn, object-oriented, and can be used in a wide range of applications, from scripting to full-scale applications. The combination of scripting and database access makes it a powerful alternative to using regular shell programming in OSS and TACL programming in the Guardian execution environment.

Python programs access SQL/MX data using the pyodbc module, which is included with the Python Language for NonStop. Pyodbc is an open-source Python module that makes accessing ODBC databases simple. It implements the DB API 2.0 specification but is “packed with even more Pythonic convenience”[1].

This article describes how Python programs can configure pyodbc to recognize the LOB data types that are used by SQL/MX. Such configuration is required because there is no specific datatype defined for LOBs in the ODBC specification.

SQL/MX Driver support for Large Objects`

In addition to Python with pyodbc, native LOBs can be used by PL/MX (Procedural Language for SQL/MX), Java programs using the JDBC Type 4 driver, the mxci command interface for NonStop OS, and programs that use the NonStop OSS ODBC driver.

Support for the JDBC type 2 driver and the ODBC drivers for Linux and Windows Operating systems are not yet available.

Summary

With only a few lines of configuration, Binary Large Objects (BLOBs) and Character Large Objects (CLOBS) in NonStop SQL/MX can be manipulated using standard Python via the pyodbc module. The SQL/MX objects only need two simple output converters to convert data to Python datatypes as shown in this article. The best performance for inserting and updating LOBS is achieved by using a specific pyodbc connection attribute called maxwrite. Examples of Python code that inserts and updates a LOB are also included. Note, however, that these examples are provided for educational purposes and should not be considered as “production-ready” code.

The LOB implementation in SQL/MX

There are two types of large objects in SQL/MX: Character large objects (CLOBs) and Binary large objects (BLOBs). Both data types are treated similarly when they are used to store data into data tables. The actual LOB data is placed in a separate table, automatically created by SQL/MX, with as many rows as are needed to store the data in chunks of about 32K bytes, which is also the maximum row size in SQL/MX.  The base table contains just a reference to the primary key, called the record locator, of this separate LOB table. When LOB columns are used in SQL DML, the driver communicates with the executor to read and write the LOB data.

For example, an employee table that holds employee information and contains a photo of the employee may be created as follows

Figure 1: Source DDL of Employees table

The actual table definition shown by the SHOWDDL command reveals the additional LOB table named “LOB_TABLE_<number>”[2], where the large number in the name is used to make it unique. Note that the LOB tables are always hash partitioned on the record locator.

Figure 2: DDL generated by SQL/MX

The LOB table is a hash-partitioned table with a block size of 32K and contains three columns: the record locator and a sequence number which together form the primary key, and a character array of 32500 bytes which contains the actual LOB data. The rows are hashed based on the record locator column, such that all rows of a LOB reside together in a partition.  When LOB columns are manipulated, the JDBC or ODBC drivers will manage the data in the LOB_table.

The record-locator is also stored in the user-record. If the size of a LOB exceeds 32500 bytes, SQL/MX needs to add more rows to the LOB_table, incrementing the sequence number to be able to reconstruct the LOB on retrieval.

SQL/MX specific considerations for pyodbc

Pyodbc uses the database vendor’s ODBC driver and in theory, all features of a database that is offered via the ODBC driver should be available to Python programs. However, in some cases, a user may need to add specific information. In the case of NonStop SQL/MX, to use the large objects, the following must be considered.

Output Converters

Output Converter functions offer a flexible way to work with returned results that pyodbc does not natively support. For example, SQL/MX returns values from a BLOB column as SQL type 2004, which does not have native support in pyodbc (and many other ODBC libraries).[3]

SQL/MX LOBs have the same SQL type as used in JDBC, types 2004 and 2005. The output converters are configured on the connection object once a connection has been established.

For example, the utility routine that is used in the examples below also includes routines called CLOBS2varchar and BLOBS2binary.

Figure 3: example use of output converters

They are configured as follows on a connection object called conn:

conn.add_output_converter(2005, CLOBS2varchar)

conn.add_output_converter(2004, BLOBS2binary)

 

The record types 2005 and 2004 are the defined JDBC types for CLOBs and BLOBS. The above configuration allows vendor specific conversion. For SQL/MX the conversion is simple.

When needed, converters can also be defined and configured in user programs once a connection has been created.

Maxwrite connection attribute

To write large objects to the database, the connection attribute maxwrite must be set to a value high enough to contain a LOB in memory. More details are provided below in Maximum size for LOBs.

Use of autocommit

HPE recommends managing your transactions using explicit COMMIT statements instead of committing transactions after every single SQL statement. The default in many tools is to automatically commit a transaction after every SQL statement. A connection attribute, called autocommit, is used to control this and is usually set to ON by these tools. By default, when a LOB does not exceed 32500 bytes, AUTOCOMMIT ON can be used to insert a LOB together with the user table row.

Note: This default size of 32500 bytes is set by a control variable (CQD) called LOB_PARAM_SIZE. The default value is 32500 bytes and if altered, must by a multiple of this. This CQD determines the message flow between the driver and the SQLMX executor. When multiple messages are required, which happens if LOBs are greater than 32500 bytes, and autocommit is ON, error 8453 is returned and the statement fails.

Inserting rows with LOBs

Thanks to the features included in pyodbc, inserting LOB data into SQL/MX tables is as easy as adding data to regular columns.

The following code snippet shows how a row is inserted into the EMPLOYEES table.

Figure 4: example insert function

In this example, the Python tuple demoEmployee contains values that already exist for another employee, except for the value of the ID for the new row. This value, empID, is provided by the caller of insertEmployee. The last column has the value None, which is the Python equivalent of SQL NULL.

The string stmt contains the SQL statement that will be prepared and executed, and it contains the parameter list used by the cursor, one for each column.

The first statement in the try/catch block is used to turn off the automatic commit attribute for the connection, which means that all SQL until the next commit or rollback statement are treated as one atomic transaction. This is important, because the insert may involve multiple SQL statements for the EMPLOYEES and the LOB table.

The statement is executed by the cursor.execute command which passed the statement and the tuple containing the values. Next, conn.commit() completes the transaction.

  Figure 5: Execution plan of Insert      

Note: Because the example assumes that autocommit is ON by default, the attribute is reset in the finally: branch.

Figure 5 shows how inserts are done in the two tables. One insert into the EMPLOYEES table and zero or more inserts into the LOB table. The driver provides the data in chunks.

Instead of setting the value to NULL, an image can be included with the insert statement, however that image must be inserted as a parameter. This is explained next with the UPDATE statement.

Updating rows, adding LOB data

The following code shows how a photo is added to an existing row in the database using the UPDATE command. The statement is the same as when regular datatypes are modified. Filling the parameter with the actual image from disk is done in this example using a separate function.

 

Figure 6: example update function   

Figure 7: Update execution plan      

The update function is called with the ID of the employee and the name of the file that contains the photo-ID of the employee.

The contents of the file are read by convertToBinaryData which uses the Binary function from pyodbc to create a bytes object.

Just like in the insert example, autocommit is turned OFF, to allow for the multiple statements that are required to handle images of more than 32K bytes.

The execution plan shows the update of the EMPLOYEES table along with the deletion of the previous contents of the LOB and the insertion of the new data.

Maximum size for LOBs

Since LOBs in SQL/MX are stored in the LOB table in chunks, there is no limit to the size of an individual object other than the available disk space or a specific limit that has been used in the definition of the LOB.

The pyodbc module hides the complexity of binding parameters to statements and filling them with values. However, there is one specific connection attribute that is used by pyodbc called maxwrite.

Maxwrite attribute in the connection object

If needed, pyodbc uses multiple ODBC SQLPutData API calls to fill a parameter until all data is passed to the driver. The default value used by pyodbc for the SQL/MX driver is 254 bytes per call which is too small for large objects. However, if the value of maxwrite is larger than the size of the LOB, all data is transferred the driver memory, thus bypassing SQLPutData.

In the current implementation we have found that the SQL/MX driver may return the error ‘[22001] [HPE][ODBC/MX Driver] STRING DATA RIGHT TRUNCATION. (0) (SQLPutData)’.

For SQL/MX, to avoid calling SQLPutData, it is a requirement to set the value of maxwrite large enough to hold the values of LOBs that are used in an application. Alternatively, set the value of maxwrite to the actual size of the LOB before executing the insert.

Reading LOB data

When LOBs are fetched from the cursor, under the covers, two calls are made to the ODBC API SQLGetData. It is the way how the executor tells the driver the size of the LOB. The size of any LOB is recorded in the first chunk in the LOB table. This allows SQL/MX to return the length of a LOB, for example, in a statement like SELECT LENGTH(PHOTO)FROM EMPLOYEES, efficiently using the primary key of the LOB table, not having to access all LOB data.

Built-in functions: EMPTY_CLOB and EMPTY_BLOB

SQL/MX supports two commonly used functions in other databases to create large objects that are not NULL, but do not contain a value. The built-in functions are EMPTY_CLOB() and EMPTY_BLOB().

When these built-in functions are used, they should be part of the statement. For example:

stmt = “UPDATE employees SET photo = EMPTY_BLOB() WHERE employee_id = ?”

These functions will generate a record locator in the user table row and a corresponding row in the LOB table. Empty LOBs are NOT NULL. This is consistent with the three-value model that is used in ANSI SQL, where a CHAR or VARCHAR[4] column can have three values: Unspecified or NULL, empty, i.e., a length of zero bytes and a regular value. An empty LOB is a LOB that has a length of 0 bytes.

 

[1]From: https://github.com/mkleehammer/pyodbc. Documentation is at https://github.com/mkleehammer/pyodbc/wiki

[2] It is possible to define the name of the LOB tables. See the SQL/MX User Guide: Create Table.

[3] See https://github.com/mkleehammer/pyodbc/wiki/Using-an-Output-Converter-function

[4] SQL/MX supports the Oracle VARCHAR2 data type. With VARCHAR2, empty strings are NULL, just as in Oracle.

Author

  • Frans Jongma

    Frans Jongma is a Master Technologist for the NonStop Advanced Technology Center (ATC) and is based The Netherlands. Frans has worked in several consulting positions for the NonStop Enterprise Division since 1989. His main areas of expertise are: NonStop SQL application design, performance analysis and high availability. Prior to joining Tandem, Frans has worked on the design and implementation of database management systems and developer productivity tools for UNIX and proprietary systems. As part of the ATC Frans has been involved in proving the concept of SQL/MX as a Service which resulted in the SQL/MX DBS product and an HTML-based prototype of a web-based user interface to SQL/MX DBS. This HTML version is the basis of WebDBS, the GUI interface for administrators and users of NonStop SQL/MX DBS.

Be the first to comment

Leave a Reply

Your email address will not be published.


*