Ultra-High Performance SQL and PL/SQL in Batch Processing

 

Introduction

Processing large amounts of data using SQL and PL/SQL poses a unique set of challenges that require a different way of thinking about programming. Applying traditional programming techniques to large batch routines typically results in inadequate performance. When the same batch routines are turned over to Java programmers for development, the performance results may be even worse. As an example, after many months of development on a system with 14 million objects to be processed, the Java team achieved processing performance of one object per minute, meaning that it would take 26.5 years to execute the month-end routine. This same code refactored in PL/SQL (using exactly the same development algorithm as the Java code) ran significantly faster, but still would have required many days to execute. Even rewriting the routine using traditional PL/SQL techniques such as nested cursor FOR loops with embedded INSERT and UPDATE statements still resulted in unacceptable performance.

Intuitively, performing a bulk operation by copying large portions of a database to another location, manipulating the data, and moving it back does not seem like the best approach. However, ETL vendors successfully perform complex transformations using exactly this seemingly sub-optimal approach. Unless database developers apply a different style of programming, one specifically suited to batch development, they may be unable to outperform the available ETL tools.

This paper will use three different large batch routine case studies to discuss best practices in batch programming.

 

Case Study #1: Multi-Step Complex Transformation from Source to Target

This case study involves a classic data migration problem. The client organization had 14 million objects that needed to go through a complex set of transformations from source to target. As mentioned, traditional coding techniques in both Java and PL/SQL failed to yield satisfactory performance results. The following scenario simulates the Case Study #1 environment.

The test starts with a table containing only a few columns (3 character and 3 numeric) and loads the data into a similar table while performing some transformations. The new table will have a million records and be partitioned by the table ID (one of the numeric columns).
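For context, here is a minimal sketch of the simulated environment. The column and type names are taken from the transformation query and scripts shown below; the exact datatypes, sizes, and partitioning clause are assumptions (the actual test scripts are available on the Dulcian website).

-- Sketch of the source table, the object/collection types used by the later
-- bulk-collect examples, and the final target table. Datatypes and the
-- partitioning clause are assumptions.
create table testB1 (
    a1 number, a2 number, e number, ef number,
    bc varchar2(30), cd varchar2(30), de varchar2(30)
);

create or replace type testB1_t as object (
    a1 number, a2 number, e number, ef number,
    bc varchar2(30), cd varchar2(30), de varchar2(30)
);
/
create or replace type testB1_tt as table of testB1_t;
/
create or replace type testB2_t as object (
    a1 number, a12 number, bcde varchar2(60), decd varchar2(60),
    eef number, x_dt date, y number, z number
);
/
create or replace type testB2_tt as table of testB2_t;
/
-- scalar collection types used by the FORALL variant
create or replace type number_tt   as table of number;
/
create or replace type varchar2_tt as table of varchar2(4000);
/
create or replace type date_tt     as table of date;
/

-- final target, partitioned by one of the numeric columns (hash partitioning assumed)
create table testC (
    c1 number, c2 varchar2(120), c3 varchar2(10), c4 date, c5 number
)
partition by hash (c1) partitions 8;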

Three transformations of the data will be shown to simulate the actual complex routine. The transformations were all similar to the following:

 

select a1,
       a1*a2,
       bc||de,
       de||cd,
       e-ef,
       to_date('20050301','YYYYMMDD'),  -- option A;  sysdate for option B
       a1+a1/15,                        -- options A and B;  tan(a1) for option C
       abs(ef)
from   testB1

 

A. Complexity of Transformation Costs

The functions called in the transformation were purposely varied. Varying which functions were used created very significant differences in performance.

Regarding the effect of different functions on the transformation, doing simple things (add, divide, concatenate, etc.) had no effect on performance.  Using any number of simple functions in the code had no measurable impact.

Calls to sysdate are known to be cumulatively expensive. Calls to sysdate that are included in a loop can destroy performance.  However, calls to sysdate in a SQL statement are only executed once and have no impact on performance.

Calls to floating point operations in PL/SQL are very expensive.  Calls to TAN( ) or LN( ) take longer than inserting a record into the database.  Complex financial calculations involving a fractional exponent or trigonometric functions can be performance killers. This cost is independent of how records are processed.  Floating point operations in PL/SQL are just slow.
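The effect is easy to see with a small timing block. This is a sketch, not one of the original test scripts; it simply compares the same number of iterations of simple arithmetic against TAN( ) calls, using DBMS_UTILITY.GET_TIME (which reports hundredths of a second).

-- Timing sketch: simple arithmetic versus a floating point function call.
declare
    l_start pls_integer;
    l_x     number := 0;
begin
    l_start := dbms_utility.get_time;
    for i in 1 .. 1000000 loop
        l_x := i + i / 15;               -- simple arithmetic: cheap
    end loop;
    dbms_output.put_line('arithmetic: ' || (dbms_utility.get_time - l_start) || ' cs');

    l_start := dbms_utility.get_time;
    for i in 1 .. 1000000 loop
        l_x := tan(i);                   -- floating point function: far more expensive
    end loop;
    dbms_output.put_line('tan():      ' || (dbms_utility.get_time - l_start) || ' cs');
end;
/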

 

B. Methods of Transformation Costs

Various ways of moving the data were attempted.  The worst method was to loop through a cursor FOR loop and use individual INSERT statements.  Even the simplest case took about twice as long as the other methods, so some type of bulk operation was required. As a rule of thumb, assume that you can process about 10,000 records per second using the cursor FOR loop method.
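For reference, the baseline "worst method" looks roughly like the following. This is an assumed shape that collapses the two transformations into one row-by-row insert; the measured scripts on the Dulcian website are the authoritative versions.

-- Row-by-row cursor FOR loop with individual inserts (single commit at the end).
begin
    for r in (select a1, a2, bc, cd, de, e, ef from testB1) loop
        insert into testC (c1, c2, c3, c4, c5)
        values (r.a1 - r.a1 * r.a2,                           -- a1 - a12
                (r.bc || r.de) || (r.de || r.cd),             -- bcde || decd
                substr(r.bc || r.de, 1, 10),
                add_months(to_date('20050301','YYYYMMDD'), -20),
                (r.a1 + r.a1 / 15) * abs(r.ef));              -- y * z
    end loop;
    commit;
end;
/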

 

1. Create-table-as-select (CTAS)

One fairly fast mechanism is to use sequential CREATE-TABLE-AS-SELECT (CTAS) statements.  For each step in the algorithm, you create a global temporary table.  This may sound like a crazy idea, but the performance is very fast.   Even on a less powerful computer, this mechanism can process records at the rate of nearly 100,000 per second.  In this test, three sequential transformations were done and the results still beat the cursor FOR loop mechanism by 50%.  However, it should be noted that adding a call to a floating point operation drastically impacted performance. It took three times as long to calculate a TAN( ) or LN( ) as it did to move the data. 
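A sketch of the approach follows. Only one staging step is shown (the actual tests used two buffer temporary tables and three transformations), and the staging table name is hypothetical. ON COMMIT PRESERVE ROWS keeps the staged rows visible to the loading session.

-- Step 1: CTAS into a global temporary staging table (first transformation).
create global temporary table stage1 on commit preserve rows as
select a1,
       a1 * a2                          a12,
       bc || de                         bcde,
       de || cd                         decd,
       e - ef                           eef,
       to_date('20050301','YYYYMMDD')   x_dt,
       a1 + a1 / 15                     y,
       abs(ef)                          z
from   testB1;

-- Step 2: the next transformation reads the previous staging table.
insert /*+ append */ into testC
select a1 - a12,
       bcde || decd,
       substr(bcde, 1, 10),
       add_months(x_dt, -20),
       y * z
from   stage1;
commit;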

 

2. Bulk load into object collections

The next method tried was loading the data into memory (in either a nested table collection or a VARRAY) and manipulating the data there.  It was suspected that working on the data in memory had to be faster than writing the data to disk in a temporary table.  The problem was exceeding the memory capacity of the server.  Massive bulk collects are not well behaved: rather than just slowing the process down and swapping to disk, they will run out of memory and crash the session (ORA-600).

If the number of records processed is limited to 250,000 at a time, the routine completes, but performance is still not very good.  Another problem with very large data sets is that, if you can only process the data in subsets, you must make sure that the source data is partitioned so that each subset can be retrieved quickly; otherwise there is further degradation in performance. Even assuming no impact from partitioning the data, this method was still 60% slower than using CTAS.

 

3. Load data into object collection N records at a time

The next method was to use bulk collect but to limit the number of records retrieved using the following code:

 

declare
    b1 testB1_tt := testb1_tt();
    b2 testB2_tt := testb2_tt();
    cursor c1 is …;
BEGIN
    open c1;
    loop
        fetch c1 bulk collect into b1 limit 1000;
        exit when b1.count = 0;                  -- guard against an empty final fetch
        b2.delete;

        -- first transformation: object collection to object collection
        for i in b1.first .. b1.last loop
            b2.extend;
            b2(b2.last) := testB2_t(b1(i).a1,
                                    b1(i).a1*b1(i).a2,
                                    b1(i).bc||b1(i).de,
                                    b1(i).de||b1(i).cd,
                                    b1(i).e-b1(i).ef,
                                    to_date('20050301','YYYYMMDD'),  -- sysdate for option B
                                    b1(i).a1+b1(i).a1/15,            -- tan(a1) for option C
                                    abs(b1(i).ef));
        end loop;

        -- second transformation: insert from the collection cast as a table
        insert into testC
        select a1-a12 c1, bcde||decd c2, substr(bcde,1,10) c3,
               add_months(x_dt,-20) c4, y*z c5
        from   table(cast(b2 as testb2_tt));

        exit when c1%notfound;
    end loop;
    close c1;
END;
/

 

The first step fetches 1,000 records at a time. A simple loop performs the first transformation from one object collection to another, and the final INSERT performs the second transformation by selecting from the object collection cast as a table.

 

This method turned out to perform at about the same speed as CTAS.  To improve performance, the way the final table (“C”) is loaded was changed. The idea was to use FORALL, but in Oracle 9i Release 2, FORALL cannot work against object collections based on complex object types. For this reason, the first transformation was changed to populate object collections of simple scalar types (one for each column), as shown here:

 

declare
    b1 testB1_tt := testb1_tt();
    cursor c1 is …;

    -- one scalar collection per target column (simple types, so FORALL can be used)
    a1   number_tt   := number_tt();
    a12  number_tt   := number_tt();
    bcde varchar2_tt := varchar2_tt();
    decd varchar2_tt := varchar2_tt();
    eef  number_tt   := number_tt();
    x_dt date_tt     := date_tt();
    y    number_tt   := number_tt();
    z    number_tt   := number_tt();
BEGIN
    open c1;
    loop
        fetch c1 bulk collect into b1 limit 1000;
        exit when b1.count = 0;                  -- guard against an empty final fetch

        -- reset and repopulate the scalar collections for this batch
        a1.delete;  a12.delete;  bcde.delete;  decd.delete;
        eef.delete; x_dt.delete; y.delete;     z.delete;
        for i in b1.first .. b1.last loop
            a1.extend;    a1(i)   := b1(i).a1;
            a12.extend;   a12(i)  := b1(i).a1*b1(i).a2;
            bcde.extend;  bcde(i) := b1(i).bc||b1(i).de;
            decd.extend;  decd(i) := b1(i).de||b1(i).cd;
            eef.extend;   eef(i)  := b1(i).e-b1(i).ef;
            x_dt.extend;  x_dt(i) := to_date('20050301','YYYYMMDD');
            y.extend;     y(i)    := b1(i).a1+b1(i).a1/15;
            z.extend;     z(i)    := abs(b1(i).ef);
        end loop;

        forall i in a1.first .. a1.last
            insert into testC
            values (a1(i)-a12(i), bcde(i)||decd(i), substr(bcde(i),1,10),
                    add_months(x_dt(i),-20), y(i)*z(i));

        exit when c1%notfound;
    end loop;
    close c1;
END;
/

 

Using FORALL for the final load provided the best performance yet. Another 8 seconds were saved while processing 1 million records, bringing the total run time down to 42 seconds.

 

4. Load data into object collection 1 record at a time

The final method tried used a cursor FOR loop to load a collection one record at a time and then operated on the collection.  As before, memory capacity was exceeded unless the number of records processed was limited.  Even then, this method did not perform significantly faster than using a simple cursor FOR loop.
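A sketch of the cast variant of this method follows (an assumed shape; the cap of 250,000 source rows reflects the memory limit discussed above):

-- One-record-at-a-time load into an object collection, then a set-based insert
-- from the collection cast as a table.
declare
    b2 testB2_tt := testB2_tt();
begin
    for r in (select * from testB1 where rownum <= 250000) loop
        b2.extend;
        b2(b2.last) := testB2_t(r.a1,
                                r.a1 * r.a2,
                                r.bc || r.de,
                                r.de || r.cd,
                                r.e - r.ef,
                                to_date('20050301','YYYYMMDD'),
                                r.a1 + r.a1 / 15,
                                abs(r.ef));
    end loop;

    insert into testC
    select a1 - a12, bcde || decd, substr(bcde, 1, 10), add_months(x_dt, -20), y * z
    from   table(cast(b2 as testB2_tt));
    commit;
end;
/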

 

The results of all of the tests mentioned are summarized in Table 1.

 

Method | Extra | File | Data | A: simple | B: + sysdate | C: + sysdate + tan() | D: + sysdate + tan() + ln()
CTAS | 2 buffer temp tables | 1_ctas.sql | 1M | 51 | 51 | 137 | 202
Full bulk load into object collections | Cast result into table | 2_FullBulkLoad_cast.sql | 1M | out of memory | | |
 | | | 250K | 19 | 19 | 42 | 60
 | Process second step as FOR loop | 2_FullBulkLoad_loop.sql | 1M | out of memory | | |
 | | | 250K | 14 | 14 | 37 | 55
Load data into object collection N records at a time (first step is BULK COLLECT … LIMIT N) | 1000 rows per insert | 3_Bulk_1000_small_insert.sql | 1M | 54 | 54 | 126 | 219
 | 1000 rows per bulk, split into a set of scalar collections, FORALL | 3_Bulk_1000_FORALL.sql | 1M | 42 | 42 | 135 | 206
Load data into object collection 1 record at a time (first step is a regular loop) | Next transformation via loop (full spin) | 4_ForLoop_ForLoop.sql | 1M | out of memory | | |
 | | | 250K | 20 | 20 | 43 | 61
 | Next transformation via cast | 4_ForLoop_cast.sql | 1M | out of memory | | |
 | | | 250K | 24 | 24 | 47 | 65

Table 1: Summary of Case Study 1 Tests (times in seconds)

 

Case Study #2: Periodic Modification of a Few Columns in a Database Table with a Large Number of Columns

Case Study #2 involved a table with over 100 columns containing 60 million records. Each month, a small number of columns within these records needed to be updated.  The Informatica routine used by the client generated update statements that updated all 100 columns (even though only a few were changed).  The goal was to find out the impact of this sub-optimal code.

For this test, the source table had 126 columns, five of which represented changed data. One and two million records were processed. The target table being updated either had 5 columns or 126 columns.

As expected, the standard PL/SQL method of spinning through the cursor produced terrible performance.  Bulk updates are much faster using syntax like the following:

 

Update target t

Set (a,b,c,d,e) =(select a,b,c,d,e from source where oid = t.oid)

 

If only five columns were updated, updating a five-column table or a 126-column table did not produce very different performance (140 versus 210 seconds).  However, updating all of the columns (unnecessarily) on the 126-column table more than doubled the processing time (485 seconds).
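For reference, the bulk-update pattern can be sketched as follows. The WHERE EXISTS clause is an addition that is not part of the statement shown above; without it, the correlated UPDATE touches every target row and sets the columns of unmatched rows to NULL.

-- Bulk update of only the changed columns, restricted to rows that have a match.
update target t
set    (a, b, c, d, e) = (select s.a, s.b, s.c, s.d, s.e
                          from   source s
                          where  s.oid = t.oid)
where  exists (select 1
               from   source s
               where  s.oid = t.oid);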

The results of this test produced two very clear lessons:

1.        Use bulk update commands.

2.        Only update the necessary columns.

The results are summarized in Table 2.

Method | File | Data | Target has 5 cols | Target has 126 cols
Update only 5 columns: update target t set (a,b,c,d,e) = (select a,b,c,d,e from source where oid = t.oid) | 1_update_5.sql | 1M | 140 | 210
 | | 2M | 310 | 445
Update all columns: the same UPDATE, but with all 126 columns in the SET list and subquery | 2_update_all.sql | 1M | N/A | 485
Cursor spin: declare cursor c1 is select * from source; begin for c in c1 loop update target …; end loop; end; | 3_cursor.sql | 1M | 200 | 235
 | | 2M | 420 | 630

Table 2: Summary of Case Study 2 Tests (times in seconds)

  

Case Study #3: Loading New Objects Into a Database

For this case study, several million new objects needed to be read into the system on a periodic basis. The objects enter the system in a large 120-column table which, when processed, is divided up to create several parent-child object relationships in the system.

In this test, while reading from the source table, all three child tables were loaded at the same time.  It is interesting to note that this functionality is not possible with most ETL tools. Most ETL tools write to one table at a time.  Therefore, you would need to write to the parent table and then reread the parent table for each child table in order to know where to attach the child records.  At the 2005 Informatica conference, there was discussion of generating database code that might support this functionality, but it would require an architectural modification of the way that the maps are specified.

A description of the source and target tables for the case study follows:

Source table:

·         120 columns

o        40 number

o        40 varchar2(1)

o        40 varchar2(2000) with populated default values

o        OID column – primary key

 

Target tables:

·         Table A

o        ID

o        40 varchar2(2000) columns

·         Table B

o        ID

o        40 Number columns

o        child of table A

·         Table C

o        2 number columns, 2 varchar2 columns, 1 date column

o        child of table A

 

The traditional method of spinning through a cursor was expected to result in poor performance.  In addition, when run against the full data set, it generated an ORA-600 error.  Consequently, 250,000 records were used as the benchmark for this method.  As predicted, the results were worse than any other method tried.

Bulk collecting a limited number of records turned out to be the best approach.  What was surprising was that the best performance was achieved with a rather large limit (5000).  Conventional wisdom usually indicates that smaller limits are optimal. 

 

However, simply using bulk operations does not guarantee success.  The first method tried was to bulk collect the source data into an object collection, N rows at a time.  Then the primary key of table A was generated. Three inserts of N rows were performed by casting the collection. This method provided no better performance than the simple cursor FOR loop.

 

However, by altering the script to use FORALL … INSERT statements, the performance was much better, taking roughly half the time of the cursor FOR loop.
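A simplified sketch of this winning approach follows. The table, column, and sequence names are hypothetical (the real source table has 120 columns), but the shape is the one described: BULK COLLECT with a large LIMIT, generate the table A key once per row, then three FORALL inserts.

-- Sketch only: source_tab, table_a/b/c, a_seq, and the column names are assumptions.
declare
    cursor c_src is
        select s.oid, s.vc1, s.num1, s.num2, s.vc2, s.dt1   -- a handful of the 120 columns
        from   source_tab s;

    type t_num_tab is table of number         index by pls_integer;
    type t_vc_tab  is table of varchar2(2000) index by pls_integer;
    type t_dt_tab  is table of date           index by pls_integer;

    l_oid   t_num_tab;   l_a_oid t_num_tab;
    l_vc1   t_vc_tab;    l_num1  t_num_tab;
    l_num2  t_num_tab;   l_vc2   t_vc_tab;
    l_dt1   t_dt_tab;
begin
    open c_src;
    loop
        fetch c_src bulk collect into l_oid, l_vc1, l_num1, l_num2, l_vc2, l_dt1 limit 5000;
        exit when l_oid.count = 0;

        -- generate the parent key (table A primary key) once per source row;
        -- SELECT ... FROM dual keeps the sketch valid on 9i (later releases
        -- allow a_seq.nextval directly in PL/SQL expressions)
        for i in 1 .. l_oid.count loop
            select a_seq.nextval into l_a_oid(i) from dual;
        end loop;

        -- three FORALL inserts: the parent first, then the two children
        forall i in 1 .. l_oid.count
            insert into table_a (id, vc_col)          values (l_a_oid(i), l_vc1(i));
        forall i in 1 .. l_oid.count
            insert into table_b (id, a_id, num_col)   values (l_oid(i), l_a_oid(i), l_num1(i));
        forall i in 1 .. l_oid.count
            insert into table_c (a_id, num_col, vc_col, dt_col)
            values (l_a_oid(i), l_num2(i), l_vc2(i), l_dt1(i));

        commit;   -- commit once per batch in this sketch
    end loop;
    close c_src;
end;
/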

 

One idea was to use a “key table” to make lookups with the cursor FOR loop faster.  There was no performance benefit to that approach.

 

The Case Study #3 results are summarized in Table 3.

 

Method | File | Extra | Data | Timing
Loop through source table → 3 consecutive inserts (commit every 10,000 records) | 1_loop_with_inserts.sql | | 1M | ORA-600
 | | | 250K | 127 sec
Bulk collect source data into an object collection N rows at a time and generate A_OID (primary key of table A) → 3 inserts of N rows (cast the collection) | 2_BulkCollect_3Inserts.sql | 50 rows | 1M | 578 sec
 | | | 250K | 141 sec
 | | 100 rows | 1M | 558 sec
 | | | 250K | 137 sec
 | | 1000 rows | 1M | 522 sec
 | | | 250K | 130 sec
 | | 5000 rows | 1M | 503 sec
 | | | 250K | 124 sec
 | | 10000 rows | 1M | 512 sec
 | | | 250K | 126 sec
Bulk collect source data into a set of object collections (one per column) N rows at a time + generate A_OID (primary key of table A) → 3 inserts of N rows (FORALL … INSERT) | 3_BulkCollect_FORALL.sql | 50 rows | 1M | 344 sec
 | | | 250K | 84 sec
 | | 100 rows | 1M | 317 sec
 | | | 250K | 79 sec
 | | 1000 rows | 1M | 271 sec
 | | | 250K | 66 sec
 | | 5000 rows | 1M | 263 sec
 | | | 250K | 65 sec
 | | 10000 rows | 1M | 265 sec
 | | | 250K | 68 sec
Full insert while recording (Source_ID, A_OID) pairs in a PL/SQL table; subsequent steps query that table to identify the parent ID | 4_Buffer_PLSQL.sql | | 1M | 605 sec
 | | | 250K | 120 sec

Table 3: Summary of Case Study 3 Tests

 

 Conclusions

In general, by using “smart” PL/SQL, you can count on roughly doubling your processing speed.  Correct usage of BULK COLLECT with a high LIMIT (around 5,000) and FORALL … INSERT are the keys to fast data manipulation.  Do not update columns unnecessarily.

 

NOTE: You can find all of the scripts used to create the tests on the Dulcian website (www.dulcian.com).

 

About the Author

Dr. Paul Dorsey is the founder and president of Dulcian, Inc., an Oracle consulting firm specializing in business rules and web-based application development. He is the chief architect of Dulcian's Business Rules Information Manager (BRIM®) tool. Paul is the co-author of seven Oracle Press books on Designer, Database Design, Developer, and JDeveloper, which have been translated into nine languages, as well as the Wiley Press book PL/SQL for Dummies. Paul is an Oracle Fusion Regional Director. He is the President of the New York Oracle Users’ Group and a Contributing Editor of the International Oracle User Group’s SELECT Journal.  In 2003, Dr. Dorsey was honored by ODTUG as volunteer of the year, in 2001 by IOUG as volunteer of the year, and by Oracle as one of the six initial honorary Oracle 9i Certified Masters.  Paul is also the founder and Chairperson of the ODTUG Business Rules Symposium (now called the Best Practices Symposium), currently in its sixth year, and the J2EE SIG.