Ultra-High Performance SQL
Dr. Paul Dorsey, Dulcian, Inc.
Introduction
Processing large amounts of data using SQL and PL/SQL poses a unique set of challenges that require a different way of thinking about programming. Applying traditional programming techniques to large batch routines typically results in inadequate performance. When the same batch routines are turned over to Java programmers for development, the performance results may be even worse. As an example, after many months of development on a system with 14 million objects to be processed, the Java team achieved processing performance of one object per minute, meaning that it would take 26.5 years to execute the month-end routine. This same code refactored in PL/SQL (using exactly the same development algorithm as the Java code) ran significantly faster, but still would have required many days to execute. Even rewriting the routine using traditional PL/SQL techniques such as nested cursor FOR loops with embedded INSERT and UPDATE statements still resulted in unacceptable performance.
Intuitively, performing a bulk operation by copying large portions of a database to another location, manipulating the data, and moving it back, does not seem like the best approach. However, ETL vendors are specialists at performing complex transformations using this obviously sub-optimal algorithm very successfully. Unless the database developers apply a different style of programming, specifically suited to batch development, they may be unable to outperform the available ETL tools.
This paper will use three different large batch routine case studies to discuss best practices in batch programming.
Case Study #1: Multi-Step Complex Transformation from Source to Target
This case study involves a classic data migration problem. The client organization had 14 million objects that needed to go through a complex set of transformations from source to target. As mentioned, traditional coding techniques in both Java and PL/SQL failed to yield satisfactory performance results. The following scenario simulates the Case Study #1 environment.
The test starts with a table containing only a few columns (3 character and 3 numeric) and loads it into a similar table while performing some transformations on the data. The new table will have a million records and be partitioned by the table ID (one of the numeric columns).
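As a hedged illustration of that setup (the actual DDL is in the scripts mentioned at the end of this paper; the table name, column names, sizes, and the hash-partitioning choice below are assumptions, not the tested definitions), the partitioned target might be created as follows:

-- Sketch only: names, types, and partitioning scheme are illustrative.
create table target_demo (
  obj_id number not null,      -- the numeric "table ID" used as the partition key
  c1     number,
  c2     varchar2(200),
  c3     varchar2(200),
  c4     date,
  c5     number
)
partition by hash (obj_id) partitions 16;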
Three transformations of the data will be shown to simulate the actual complex routine. The transformations were all similar to the following:
select a1,
       a1*a2,
       bc||de,
       de||cd,
       e-ef,
       to_date('20050301','YYYYMMDD'),  -- option A; sysdate, -- option B
       a1+a1/15,                        -- options A and B; tan(a1), -- option C
       abs(ef)
  from testB1
A. Cost of Transformation Complexity
The functions called in the transformation were purposely varied, and the choice of function created very significant differences in cost.
Simple operations (add, divide, concatenate, etc.) had no effect on performance; using any number of simple functions in the code had no measurable impact.
Calls to sysdate are known to be cumulatively expensive. Calls to sysdate that are included in a loop can destroy performance. However, calls to sysdate in a SQL statement are only executed once and have no impact on performance.
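As a hedged sketch (not one of the paper's test scripts; testC_dates is a made-up table), the difference is between SYSDATE evaluated once per statement and SYSDATE called once per row inside a loop, where capturing the value in a variable avoids the repeated calls:

-- In a single SQL statement, SYSDATE is evaluated once per execution:
insert into testC_dates
select a1, sysdate from testB1;

-- In a row-by-row loop, capture SYSDATE once instead of calling it per row:
declare
  v_now date := sysdate;   -- one call, reused for every row
begin
  for r in (select a1 from testB1) loop
    insert into testC_dates values (r.a1, v_now);  -- avoids a per-row SYSDATE call
  end loop;
end;
/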
Calls to floating point operations in PL/SQL are very expensive. Calls to TAN( ) or LN( ) take longer than inserting a record into the database. Complex financial calculations involving a fractional exponent or trigonometric functions can be performance killers. This cost is independent of how records are processed. Floating point operations in PL/SQL are just slow.
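One way to observe this cost yourself (a sketch using DBMS_UTILITY.GET_TIME, not one of the paper's scripts) is to time a cheap built-in against a floating point function in a tight loop:

-- Compare ABS (cheap) with TAN (floating point) over one million iterations.
-- GET_TIME returns hundredths of a second.
declare
  v_start pls_integer;
  v_x     number;
begin
  v_start := dbms_utility.get_time;
  for i in 1 .. 1000000 loop
    v_x := abs(i);
  end loop;
  dbms_output.put_line('abs: ' || (dbms_utility.get_time - v_start) || ' cs');

  v_start := dbms_utility.get_time;
  for i in 1 .. 1000000 loop
    v_x := tan(i);
  end loop;
  dbms_output.put_line('tan: ' || (dbms_utility.get_time - v_start) || ' cs');
end;
/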
B. Cost of Transformation Methods
Various ways of moving the data were attempted. The worst method was to spin through a cursor FOR loop and issue individual INSERT statements. Even the simplest case took about twice as long as the other methods, so some type of bulk operation was required. As a rule of thumb, assume that you can process about 10,000 records per second with a cursor FOR loop.
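For reference, the row-by-row baseline looks roughly like the following sketch (it assumes a target table testB2 shaped like the testB2_t object used later in the paper):

begin
  for r in (select * from testB1) loop
    -- One INSERT per source row: simple, but roughly 10,000 rows/second at best.
    insert into testB2
    values (r.a1,
            r.a1 * r.a2,
            r.bc || r.de,
            r.de || r.cd,
            r.e - r.ef,
            to_date('20050301','YYYYMMDD'),
            r.a1 + r.a1 / 15,
            abs(r.ef));
  end loop;
  commit;
end;
/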
1. Create-Table-As-Select (CTAS)
One fairly fast mechanism is to use sequential CREATE-TABLE-AS-SELECT (CTAS) statements, materializing each transformation step into a buffer table before loading the final target.
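A hedged sketch of what the sequential CTAS approach can look like (the buffer table names are assumptions; per Table 1, the actual 1_ctas.sql test used two buffer temporary tables):

-- Step 1: first transformation materialized into a buffer table.
create table buffer1 as
select a1,
       a1*a2                          a12,
       bc||de                         bcde,
       de||cd                         decd,
       e-ef                           eef,
       to_date('20050301','YYYYMMDD') x_dt,
       a1+a1/15                       y,
       abs(ef)                        z
  from testB1;

-- Step 2: second transformation materialized the same way.
create table buffer2 as
select a1-a12 c1, bcde||decd c2, substr(bcde,1,10) c3,
       add_months(x_dt,-20) c4, y*z c5
  from buffer1;

-- Final step: load (or simply rename) the second buffer into the target table.
insert into testC select * from buffer2;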
2. Bulk load into object collections
The next mechanism is to bulk load the source data into object collections and insert from them. Loading the full 1 million records this way runs out of memory; if the number of records processed is limited to 250,000 at a time, the routine completes, but it does not give very good performance. Another problem with very large data sets is that if you can only process subsets of the data, you must make sure that the source data is partitioned for quick access; otherwise performance degrades further. Even assuming no impact from partitioning the data, this method was still 60% slower than using CTAS.
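A hedged sketch of this full bulk load (2_FullBulkLoad_cast.sql in Table 1); the constructor call, attribute order, and the subset predicate are assumptions, and note the absence of a LIMIT clause, which is what exhausts memory at 1 million rows:

declare
  b1 testB1_tt := testB1_tt();
  b2 testB2_tt := testB2_tt();
begin
  -- Pull the whole (sub)set into memory in a single fetch.
  select testB1_t(a1, a2, bc, cd, de, e, ef)   -- attribute order assumed
    bulk collect into b1
    from testB1
   where part_id = 1;                          -- hypothetical 250K-row subset

  -- First transformation: object collection to object collection.
  for i in 1 .. b1.count loop
    b2.extend;
    b2(b2.last) := testB2_t(b1(i).a1,
                            b1(i).a1*b1(i).a2,
                            b1(i).bc||b1(i).de,
                            b1(i).de||b1(i).cd,
                            b1(i).e-b1(i).ef,
                            to_date('20050301','YYYYMMDD'),
                            b1(i).a1+b1(i).a1/15,
                            abs(b1(i).ef));
  end loop;

  -- Second transformation: cast the collection to a table and bulk insert.
  insert into testC
  select a1-a12 c1, bcde||decd c2, substr(bcde,1,10) c3,
         add_months(x_dt,-20) c4, y*z c5
    from table(cast(b2 as testB2_tt));
  commit;
end;
/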
3. Load data into an object collection N records at a time (BULK COLLECT … LIMIT N)
The next method was to use BULK COLLECT, but to limit the number of records retrieved per fetch, using the following code:
declare
  b1 testB1_tt := testb1_tt();
  b2 testB2_tt := testb2_tt();
  cursor c1 is …;
BEGIN
  open c1;
  loop
    fetch c1 bulk collect into b1 limit 1000;
    exit when b1.count = 0;   -- guard: nothing left to process
    b2.delete;
    -- First transformation: testB1 rows into testB2_t objects
    for i in b1.first .. b1.last loop
      b2.extend;
      b2(b2.last) := testB2_t(b1(i).a1,
                              b1(i).a1*b1(i).a2,
                              b1(i).bc||b1(i).de,
                              b1(i).de||b1(i).cd,
                              b1(i).e-b1(i).ef,
                              to_date('20050301','YYYYMMDD'),  -- sysdate,
                              b1(i).a1+b1(i).a1/15,            -- tan(a1),
                              abs(b1(i).ef));
    end loop;
    -- Second transformation: cast the collection to a table and bulk insert
    insert into testC
    select a1-a12 c1, bcde||decd c2, substr(bcde,1,10) c3,
           add_months(x_dt,-20) c4, y*z c5
      from table(cast(b2 as testb2_tt));
    exit when c1%notfound;
  end loop;
  close c1;
END;
/
The first step fetches 1,000 records at a time. The simple loop performs the first transformation, from one object collection to another. The last step is the second transformation, selecting from the object collection cast as a table. This method turned out to perform at about the same speed as using CTAS. A further variation splits each fetched row into a set of scalar collections and uses FORALL … INSERT:
declare
  b1 testB1_tt := testb1_tt();
  cursor c1 is …;
  a1   number_tt   := number_tt();
  a12  number_tt   := number_tt();
  bcde varchar2_tt := varchar2_tt();
  decd varchar2_tt := varchar2_tt();
  eef  number_tt   := number_tt();
  x_dt date_tt     := date_tt();
  y    number_tt   := number_tt();
  z    number_tt   := number_tt();
BEGIN
  open c1;
  loop
    fetch c1 bulk collect into b1 limit 1000;
    exit when b1.count = 0;   -- guard: nothing left to process
    -- Split each fetched object into one scalar collection per column
    for i in b1.first .. b1.last loop
      a1.extend;
      a1(i) := b1(i).a1;
      …
    end loop;
    -- Bulk insert the transformed rows
    forall i in a1.first .. a1.last
      insert into testC
      values (a1(i)-a12(i), bcde(i)||decd(i), substr(bcde(i),1,10),
              add_months(x_dt(i),-20), y(i)*z(i));
    exit when c1%notfound;
  end loop;
  close c1;
END;
/
This method turned out to be about the same speed as CTAS. Then we tried to improve on that performance by changing the way we load the final table (“C”).
We tried using a bulk insert after casting the collection to a table, as in:
insert into testC
select a1-a12 c1, bcde||decd c2, substr(bcde,1,10) c3,
       add_months(x_dt,-20) c4, y*z c5
  from table(cast(b2 as testb2_tt));
This approach provided the best performance yet: another 8 seconds were saved when processing 1 million records, reducing the overall processing time to 42 seconds.
4. Load data into an object collection one record at a time (regular cursor FOR loop)
The final method used a cursor FOR loop to load an object collection and then operated on the collection. As before, memory capacity was exceeded unless the number of records processed was limited. Even then, this method did not perform significantly faster than a simple cursor FOR loop.
The results of all of the tests mentioned are summarized in Table 1.
Method | Extra | File | Data | A (simple) | B (+sysdate) | C (+sysdate, +tan) | D (+sysdate)
CTAS | 2 buffer temp tables | 1_ctas.sql | 1M | 51 | 51 | 137 | 202
Full bulk load into object collections | Cast result into table | 2_FullBulkLoad_cast.sql | 1M | Out of memory | | |
 | | | 250K | 19 | 19 | 42 | 60
 | Process second step as FOR loop | 2_FullBulkLoad_loop.sql | 1M | Out of memory | | |
 | | | 250K | 14 | 14 | 37 | 55
Load data into object collection N records at a time (first step is BULK COLLECT LIMIT N) | 1,000 rows per INSERT | 3_Bulk_1000_small_insert.sql | 1M | 54 | 54 | 126 | 219
 | 1,000 rows per bulk fetch; split into a set of collections; FORALL | 3_Bulk_1000_FORALL.sql | 1M | 42 | 42 | 135 | 206
Load data into object collection 1 record at a time (first step is regular loop) | Next transformation via loop (full spin) | 4_ForLoop_ForLoop.sql | 1M | Out of memory | | |
 | | | 250K | 20 | 20 | 43 | 61
 | Next transformation via cast | 4_ForLoop_cast.sql | 1M | Out of memory | | |
 | | | 250K | 24 | 24 | 47 | 65
Table 1: Summary of Case Study 1 Tests (timings in seconds)
Case Study #2: Periodic Modification of a Few Columns in a Database Table with a Large Number of Columns
Case Study #2 involved a table with over 100 columns containing 60 million records. Each month, a small number of columns within these records needed to be updated. The Informatica routine used by the client generated UPDATE statements that set all of the columns (even though only a few had changed). The goal was to measure the impact of this sub-optimal code.
For this test, the source table had 126 columns, five of which represented changed data. One and two million records were processed. The target table being updated either had 5 columns or 126 columns.
As expected, the standard PL/SQL method of spinning through the cursor produced terrible performance. Bulk updates are much faster using syntax like the following:
update target t
   set (a, b, c, d, e) = (select a, b, c, d, e
                            from source
                           where oid = t.oid);
If only five columns were updated, updating a five-column table or a 126-column table did not produce very different performance (140 versus 210 seconds). However, updating all of the columns (unnecessarily) on the 126-column table more than doubled the processing time (485 seconds).
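One hedged refinement, not taken from the paper's scripts: the correlated UPDATE shown above will set the target columns to NULL for any target row that has no matching source row, so production versions of this statement usually add an existence check while still updating only the changed columns:

update target t
   set (a, b, c, d, e) = (select s.a, s.b, s.c, s.d, s.e
                            from source s
                           where s.oid = t.oid)
 where exists (select 1
                 from source s
                where s.oid = t.oid);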
The results of this test produced two very clear lessons:
1) Use bulk update commands.
2) Only update the necessary columns.
The results are summarized in Table 2.
Method | File | Data | Target has 5 cols | Target has 126 cols
Update only the 5 changed columns: update target t set (a,b,c,d,e) = (select a,b,c,d,e from source where oid = t.oid) | 1_update_5.sql | 1M | 140 | 210
 | | 2M | 310 | 445
Update all 126 columns (same form of statement, with the full column list in the SET clause) | 2_update_all.sql | 1M | N/A | 485
Cursor spin: for c in (select * from source) loop update target …; end loop | 3_cursor.sql | 1M | 200 | 235
 | | 2M | 420 | 630
Table 2: Summary of Case Study 2 Tests (timings in seconds)
Case Study #3: Loading New Objects into Multiple Parent-Child Tables
For this case study, several million new objects needed to be read into the system on a periodic basis. The objects enter the system in a large 120-column table which, when processed, is divided up to create several parent-child object relationships in the system.
In this test, while reading from the source table, all three child tables were loaded at the same time. It is interesting to note that this functionality is not possible with most ETL tools. Most ETL tools write to one table at a time; therefore, you would need to write to the parent table and then reread the parent table for each child table in order to know where to attach the child records. At the 2005 Informatica conference, there was discussion of generating database code that might support this functionality, but it would require an architectural modification of the way that the maps are specified.
A description of the source and target tables for the case study follows:
Source table:
- 40 NUMBER columns
- 40 VARCHAR2(1) columns
- 40 VARCHAR2(2000) columns with populated default values
- an OID column (the primary key)
Target tables: a parent table (A, keyed by a generated A_OID) and its child tables, populated from the source in a single pass.
The traditional method of spinning through a cursor would undoubtedly result in poor performance; in this test it also generated an ORA-600 error at the full 1 million records, so 250,000 records were used as the benchmark for that method. As predicted, the results were worse than any other method tried.
Bulk collecting a limited number of records turned out to be the best approach. What was surprising was that the best performance was achieved with a rather large limit (5000). Conventional wisdom usually indicates that smaller limits are optimal.
However, simply using bulk operations does not guarantee success. The first method tried was to bulk collect the source data into an object collection, N rows at a time. Then the primary key of table A was generated. Three inserts of N rows were performed by casting the collection. This method provided no better performance than the simple cursor FOR loop.
However, altering the script to use FORALL … INSERT statements improved performance dramatically, taking roughly half the time of the cursor FOR loop.
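A hedged sketch of that pattern (the table names, column names, and sequence below are illustrative assumptions; the real script is 3_BulkCollect_FORALL.sql): fetch the source with BULK COLLECT … LIMIT 5000, generate A_OID for each fetched row, and load the parent and child tables with FORALL so the parent never has to be re-read:

declare
  -- One scalar collection per column used, as in 3_BulkCollect_FORALL.sql.
  type t_num is table of number;
  type t_chr is table of varchar2(2000);

  cursor c_src is select oid, n1, v1, v2 from source_tab;  -- hypothetical columns

  v_src_oid t_num;
  v_n1      t_num;
  v_v1      t_chr;
  v_v2      t_chr;
  v_a_oid   t_num := t_num();
begin
  open c_src;
  loop
    fetch c_src bulk collect into v_src_oid, v_n1, v_v1, v_v2 limit 5000;
    exit when v_src_oid.count = 0;

    -- Generate the parent key (A_OID) once per fetched row.
    v_a_oid.delete;
    v_a_oid.extend(v_src_oid.count);
    for i in 1 .. v_src_oid.count loop
      select a_seq.nextval into v_a_oid(i) from dual;   -- hypothetical sequence
    end loop;

    -- Parent table A and its children are loaded in the same pass;
    -- the children reuse v_a_oid, so table A never has to be re-read.
    forall i in 1 .. v_src_oid.count
      insert into a (a_oid, n1) values (v_a_oid(i), v_n1(i));

    forall i in 1 .. v_src_oid.count
      insert into a_child1 (a_oid, v1) values (v_a_oid(i), v_v1(i));

    forall i in 1 .. v_src_oid.count
      insert into a_child2 (a_oid, v2) values (v_a_oid(i), v_v2(i));
  end loop;
  close c_src;
  commit;
end;
/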
One idea was to use a “key table” to make lookups with the cursor FOR loop faster. There was no performance benefit to that approach.
The Case Study #3 results are summarized in Table 3.
Method | File | Extra | Data | Timing
Cursor FOR loop with single-row inserts | 1_loop_with_inserts.sql | | 1M | ORA-600
 | | | 250K | 127 sec
Bulk collect source data into an object collection N rows at a time and generate A_OID (primary key of table A) → 3 inserts of N rows (cast the collection) | 2_BulkCollect_3Inserts.sql | 50 rows | 1M | 578 sec
 | | | 250K | 141 sec
 | | 100 rows | 1M | 558 sec
 | | | 250K | 137 sec
 | | 1000 rows | 1M | 522 sec
 | | | 250K | 130 sec
 | | 5000 rows | 1M | 503 sec
 | | | 250K | 124 sec
 | | 10000 rows | 1M | 512 sec
 | | | 250K | 126 sec
Bulk collect source data into a set of object collections (one per column) N rows at a time and generate A_OID → 3 inserts of N rows (FORALL … INSERT) | 3_BulkCollect_FORALL.sql | 50 rows | 1M | 344 sec
 | | | 250K | 84 sec
 | | 100 rows | 1M | 317 sec
 | | | 250K | 79 sec
 | | 1000 rows | 1M | 271 sec
 | | | 250K | 66 sec
 | | 5000 rows | 1M | 263 sec
 | | | 250K | 65 sec
 | | 10000 rows | 1M | 265 sec
 | | | 250K | 68 sec
Full insert, recording (Source_ID, A_OID) pairs into a PL/SQL table; subsequent steps query that table to identify the parent ID | 4_Buffer_PLSQL.sql | | 1M | 605 sec
 | | | 250K | 120 sec
Table 3: Summary of Case Study 3 Tests
Conclusions
In general, by using “smart” PL/SQL, you can count on nearly doubling your processing speed. Correct use of BULK COLLECT with a relatively high LIMIT (around 5,000) and FORALL … INSERT is the key to fast data manipulation. Do not update columns unnecessarily.
NOTE: You can find all of the scripts used to create the tests on the Dulcian website (www.dulcian.com).