Ultra-High Performance SQL
Dr. Paul Dorsey, Dulcian, Inc.
Introduction
Processing large amounts of data using SQL and PL/SQL poses a unique set of challenges that require a different way of thinking about programming. Applying traditional programming techniques to large batch routines typically results in inadequate performance. When the same batch routines are turned over to Java programmers for development, the performance results may be even worse. As an example, after many months of development on a system with 14 million objects to be processed, the Java team achieved processing performance of one object per minute, meaning that it would take 26.5 years to execute the month-end routine. This same code refactored in PL/SQL (using exactly the same development algorithm as the Java code) ran significantly faster, but still would have required many days to execute. Even rewriting the routine using traditional PL/SQL techniques such as nested cursor FOR loops with embedded INSERT and UPDATE statements still resulted in unacceptable performance.
Intuitively, performing a bulk operation by copying large portions of a database to another location, manipulating the data, and moving it back, does not seem like the best approach. However, ETL vendors are specialists at performing complex transformations using this obviously sub-optimal algorithm very successfully. Unless the database developers apply a different style of programming, specifically suited to batch development, they may be unable to outperform the available ETL tools.
This paper will use three different large batch routine case studies to discuss best practices in batch programming.
Case Study #1: Multi-Step Complex Transformation from Source to Target
This case study involves a classic data migration problem. The client organization had 14 million objects that needed to go through a complex set of transformations from source to target. As mentioned, traditional coding techniques in both Java and PL/SQL failed to yield satisfactory performance results. The following scenario simulates the Case Study #1 environment.
The test starts with a table containing only a few columns (3 character and 3 numeric) and loads it into a similar table while performing some transformations on the data. The new table will have a million records and be partitioned by the table ID (one of the numeric columns).
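As a hedged illustration of that setup (the actual DDL is in the scripts mentioned at the end of this paper; the table name, column names, sizes, and the hash-partitioning choice below are assumptions, not the tested definitions), the partitioned target might be created as follows:

-- Sketch only: names, types, and partitioning scheme are illustrative.
create table target_demo (
  obj_id number not null,      -- the numeric "table ID" used as the partition key
  c1     number,
  c2     varchar2(200),
  c3     varchar2(200),
  c4     date,
  c5     number
)
partition by hash (obj_id) partitions 16;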
Three transformations of the data will be shown to simulate the actual complex routine. The transformations were all similar to the following:
select a1,
       a1*a2,
       bc||de,
       de||cd,
       e-ef,
       to_date('20050301','YYYYMMDD'),  -- option A; sysdate, -- option B
       a1+a1/15,                        -- options A and B; tan(a1), -- option C
       abs(ef)
  from testB1
A. Cost of Transformation Complexity
The functions called in the transformation were purposely varied, and the choice of function created very significant differences in cost.
Simple operations (add, divide, concatenate, etc.) had no effect on performance; using any number of simple functions in the code had no measurable impact.
Calls to sysdate are known to be cumulatively expensive. Calls to sysdate that are included in a loop can destroy performance. However, calls to sysdate in a SQL statement are only executed once and have no impact on performance.
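As a hedged sketch (not one of the paper's test scripts; testC_dates is a made-up table), the difference is between SYSDATE evaluated once per statement and SYSDATE called once per row inside a loop, where capturing the value in a variable avoids the repeated calls:

-- In a single SQL statement, SYSDATE is evaluated once per execution:
insert into testC_dates
select a1, sysdate from testB1;

-- In a row-by-row loop, capture SYSDATE once instead of calling it per row:
declare
  v_now date := sysdate;   -- one call, reused for every row
begin
  for r in (select a1 from testB1) loop
    insert into testC_dates values (r.a1, v_now);  -- avoids a per-row SYSDATE call
  end loop;
end;
/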
Calls to floating point operations in PL/SQL are very expensive. Calls to TAN( ) or LN( ) take longer than inserting a record into the database. Complex financial calculations involving a fractional exponent or trigonometric functions can be performance killers. This cost is independent of how records are processed. Floating point operations in PL/SQL are just slow.
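One way to observe this cost yourself (a sketch using DBMS_UTILITY.GET_TIME, not one of the paper's scripts) is to time a cheap built-in against a floating point function in a tight loop:

-- Compare ABS (cheap) with TAN (floating point) over one million iterations.
-- GET_TIME returns hundredths of a second.
declare
  v_start pls_integer;
  v_x     number;
begin
  v_start := dbms_utility.get_time;
  for i in 1 .. 1000000 loop
    v_x := abs(i);
  end loop;
  dbms_output.put_line('abs: ' || (dbms_utility.get_time - v_start) || ' cs');

  v_start := dbms_utility.get_time;
  for i in 1 .. 1000000 loop
    v_x := tan(i);
  end loop;
  dbms_output.put_line('tan: ' || (dbms_utility.get_time - v_start) || ' cs');
end;
/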
B. Cost of Transformation Methods
Various ways of moving the data were attempted. The worst method was to spin through a cursor FOR loop and issue individual INSERT statements. Even the simplest case took about twice as long as the other methods, so some type of bulk operation was required. As a rule of thumb, assume that you can process about 10,000 records per second with a cursor FOR loop.
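For reference, the row-by-row baseline looks roughly like the following sketch (it assumes a target table testB2 shaped like the testB2_t object used later in the paper):

begin
  for r in (select * from testB1) loop
    -- One INSERT per source row: simple, but roughly 10,000 rows/second at best.
    insert into testB2
    values (r.a1,
            r.a1 * r.a2,
            r.bc || r.de,
            r.de || r.cd,
            r.e - r.ef,
            to_date('20050301','YYYYMMDD'),
            r.a1 + r.a1 / 15,
            abs(r.ef));
  end loop;
  commit;
end;
/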
1. Create-Table-As-Select (CTAS)
One fairly fast mechanism is to use sequential CREATE-TABLE-AS-SELECT (CTAS) statements, materializing each transformation step into a buffer table before loading the final target.
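A hedged sketch of what the sequential CTAS approach can look like (the buffer table names are assumptions; per Table 1, the actual 1_ctas.sql test used two buffer temporary tables):

-- Step 1: first transformation materialized into a buffer table.
create table buffer1 as
select a1,
       a1*a2                          a12,
       bc||de                         bcde,
       de||cd                         decd,
       e-ef                           eef,
       to_date('20050301','YYYYMMDD') x_dt,
       a1+a1/15                       y,
       abs(ef)                        z
  from testB1;

-- Step 2: second transformation materialized the same way.
create table buffer2 as
select a1-a12 c1, bcde||decd c2, substr(bcde,1,10) c3,
       add_months(x_dt,-20) c4, y*z c5
  from buffer1;

-- Final step: load (or simply rename) the second buffer into the target table.
insert into testC select * from buffer2;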
2. Bulk load into object collections
The next mechanism is to bulk load the source data into object collections and insert from them. Loading the full 1 million records this way runs out of memory; if the number of records processed is limited to 250,000 at a time, the routine completes, but it does not give very good performance. Another problem with very large data sets is that if you can only process subsets of the data, you must make sure that the source data is partitioned for quick access; otherwise performance degrades further. Even assuming no impact from partitioning the data, this method was still 60% slower than using CTAS.
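A hedged sketch of this full bulk load (2_FullBulkLoad_cast.sql in Table 1); the constructor call, attribute order, and the subset predicate are assumptions, and note the absence of a LIMIT clause, which is what exhausts memory at 1 million rows:

declare
  b1 testB1_tt := testB1_tt();
  b2 testB2_tt := testB2_tt();
begin
  -- Pull the whole (sub)set into memory in a single fetch.
  select testB1_t(a1, a2, bc, cd, de, e, ef)   -- attribute order assumed
    bulk collect into b1
    from testB1
   where part_id = 1;                          -- hypothetical 250K-row subset

  -- First transformation: object collection to object collection.
  for i in 1 .. b1.count loop
    b2.extend;
    b2(b2.last) := testB2_t(b1(i).a1,
                            b1(i).a1*b1(i).a2,
                            b1(i).bc||b1(i).de,
                            b1(i).de||b1(i).cd,
                            b1(i).e-b1(i).ef,
                            to_date('20050301','YYYYMMDD'),
                            b1(i).a1+b1(i).a1/15,
                            abs(b1(i).ef));
  end loop;

  -- Second transformation: cast the collection to a table and bulk insert.
  insert into testC
  select a1-a12 c1, bcde||decd c2, substr(bcde,1,10) c3,
         add_months(x_dt,-20) c4, y*z c5
    from table(cast(b2 as testB2_tt));
  commit;
end;
/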
3. Load data into an object collection N records at a time (BULK COLLECT … LIMIT N)
The next method was to use BULK COLLECT, but to limit the number of records retrieved per fetch, using the following code:
declare
  b1 testB1_tt := testb1_tt();
  b2 testB2_tt := testb2_tt();
  cursor c1 is …;
BEGIN
  open c1;
  loop
    fetch c1 bulk collect into b1 limit 1000;
    exit when b1.count = 0;   -- guard: nothing left to process
    b2.delete;
    -- First transformation: testB1 rows into testB2_t objects
    for i in b1.first .. b1.last loop
      b2.extend;
      b2(b2.last) := testB2_t(b1(i).a1,
                              b1(i).a1*b1(i).a2,
                              b1(i).bc||b1(i).de,
                              b1(i).de||b1(i).cd,
                              b1(i).e-b1(i).ef,
                              to_date('20050301','YYYYMMDD'),  -- sysdate,
                              b1(i).a1+b1(i).a1/15,            -- tan(a1),
                              abs(b1(i).ef));
    end loop;
    -- Second transformation: cast the collection to a table and bulk insert
    insert into testC
    select a1-a12 c1, bcde||decd c2, substr(bcde,1,10) c3,
           add_months(x_dt,-20) c4, y*z c5
      from table(cast(b2 as testb2_tt));
    exit when c1%notfound;
  end loop;
  close c1;
END;
/
The first step fetches 1,000 records at a time. The simple loop performs the first transformation, from one object collection to another. The last step is the second transformation, selecting from the object collection cast as a table. This method turned out to perform at about the same speed as using CTAS. A further variation splits each fetched row into a set of scalar collections and uses FORALL … INSERT:
declare
  b1 testB1_tt := testb1_tt();
  cursor c1 is …;
  a1   number_tt   := number_tt();
  a12  number_tt   := number_tt();
  bcde varchar2_tt := varchar2_tt();
  decd varchar2_tt := varchar2_tt();
  eef  number_tt   := number_tt();
  x_dt date_tt     := date_tt();
  y    number_tt   := number_tt();
  z    number_tt   := number_tt();
BEGIN
  open c1;
  loop
    fetch c1 bulk collect into b1 limit 1000;
    exit when b1.count = 0;   -- guard: nothing left to process
    -- Split each fetched object into one scalar collection per column
    for i in b1.first .. b1.last loop
      a1.extend;
      a1(i) := b1(i).a1;
      …
    end loop;
    -- Bulk insert the transformed rows
    forall i in a1.first .. a1.last
      insert into testC
      values (a1(i)-a12(i), bcde(i)||decd(i), substr(bcde(i),1,10),
              add_months(x_dt(i),-20), y(i)*z(i));
    exit when c1%notfound;
  end loop;
  close c1;
END;
/
This method turned out to be about the same speed as CTAS. Then we tried to improve on that performance by changing the way we load the final table (“C”).
We tried using a bulk insert after casting the collection to a table, as in:
insert into testC
select a1-a12 c1, bcde||decd c2, substr(bcde,1,10) c3,
       add_months(x_dt,-20) c4, y*z c5
  from table(cast(b2 as testb2_tt));
This approach provided the best performance yet: another 8 seconds were saved when processing 1 million records, reducing the overall processing time to 42 seconds.
4. Load data into an object collection one record at a time (regular cursor FOR loop)
The final method used a cursor FOR loop to load an object collection and then operated on the collection. As before, memory capacity was exceeded unless the number of records processed was limited. Even then, this method did not perform significantly faster than a simple cursor FOR loop.
The results of all of the tests mentioned are summarized in Table 1.
Method | Extra | File | Data | A (simple) | B (+sysdate) | C (+sysdate, +tan) | D (+sysdate)
CTAS | 2 buffer temp tables | 1_ctas.sql | 1M | 51 | 51 | 137 | 202
Full bulk load into object collections | Cast result into table | 2_FullBulkLoad_cast.sql | 1M | Out of memory | | |
 | | | 250K | 19 | 19 | 42 | 60
 | Process second step as FOR loop | 2_FullBulkLoad_loop.sql | 1M | Out of memory | | |
 | | | 250K | 14 | 14 | 37 | 55
Load data into object collection N records at a time (first step is BULK COLLECT LIMIT N) | 1,000 rows per INSERT | 3_Bulk_1000_small_insert.sql | 1M | 54 | 54 | 126 | 219
 | 1,000 rows per bulk fetch; split into a set of collections; FORALL | 3_Bulk_1000_FORALL.sql | 1M | 42 | 42 | 135 | 206
Load data into object collection 1 record at a time (first step is regular loop) | Next transformation via loop (full spin) | 4_ForLoop_ForLoop.sql | 1M | Out of memory | | |
 | | | 250K | 20 | 20 | 43 | 61
 | Next transformation via cast | 4_ForLoop_cast.sql | 1M | Out of memory | | |
 | | | 250K | 24 | 24 | 47 | 65
Table 1: Summary of Case Study 1 Tests (timings in seconds)
Case Study #2: Periodic Modification of a Few Columns in a Database Table with a Large Number of Columns
Case Study #2 involved a table with over 100 columns containing 60 million records. Each month, a small number of columns within these records needed to be updated. The Informatica routine used by the client generated UPDATE statements that set all of the columns (even though only a few had changed). The goal was to measure the impact of this sub-optimal code.
For this test, the source table had 126 columns, five of which represented changed data. One and two million records were processed. The target table being updated either had 5 columns or 126 columns.
As expected, the standard PL/SQL method of spinning through the cursor produced terrible performance. Bulk updates are much faster using syntax like the following:
update target t
   set (a, b, c, d, e) = (select a, b, c, d, e
                            from source
                           where oid = t.oid);
If only five columns were updated, updating a five-column table or a 126-column table did not produce very different performance (140 versus 210 seconds). However, updating all of the columns (unnecessarily) on the 126-column table more than doubled the processing time (485 seconds).
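One hedged refinement, not taken from the paper's scripts: the correlated UPDATE shown above will set the target columns to NULL for any target row that has no matching source row, so production versions of this statement usually add an existence check while still updating only the changed columns:

update target t
   set (a, b, c, d, e) = (select s.a, s.b, s.c, s.d, s.e
                            from source s
                           where s.oid = t.oid)
 where exists (select 1
                 from source s
                where s.oid = t.oid);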
The results of this test produced two very clear lessons:
1) Use bulk update commands.
2) Only update the necessary columns.
The results are summarized in Table 2.
Method | File | Data | Target has 5 cols | Target has 126 cols
Update only the 5 changed columns: update target t set (a,b,c,d,e) = (select a,b,c,d,e from source where oid = t.oid) | 1_update_5.sql | 1M | 140 | 210
 | | 2M | 310 | 445
Update all 126 columns (same form of statement, with the full column list in the SET clause) | 2_update_all.sql | 1M | N/A | 485
Cursor spin: for c in (select * from source) loop update target …; end loop | 3_cursor.sql | 1M | 200 | 235
 | | 2M | 420 | 630
Table 2: Summary of Case Study 2 Tests (timings in seconds)
Case Study #3: Loading New Objects into Multiple Parent-Child Tables
For this case study, several million new objects needed to be read into the system on a periodic basis. The objects enter the system in a large 120-column table which, when processed, is divided up to create several parent-child object relationships in the system.
In this test, while reading from the source table, all three child tables were loaded at the same time. It is interesting to note that this functionality is not possible with most ETL tools. Most ETL tools write to one table at a time; therefore, you would need to write to the parent table and then reread the parent table for each child table in order to know where to attach the child records. At the 2005 Informatica conference, there was discussion of generating database code that might support this functionality, but it would require an architectural modification of the way that the maps are specified.
A description of the source and target tables for the case study follows:
Source table:
- 40 NUMBER columns
- 40 VARCHAR2(1) columns
- 40 VARCHAR2(2000) columns with populated default values
- an OID column (the primary key)
Target tables: a parent table (A, keyed by a generated A_OID) and its child tables, populated from the source in a single pass.
The traditional method of spinning through a cursor would undoubtedly result in poor performance; in this test it also generated an ORA-600 error at the full 1 million records, so 250,000 records were used as the benchmark for that method. As predicted, the results were worse than any other method tried.
Bulk collecting a limited number of records turned out to be the best approach. What was surprising was that the best performance was achieved with a rather large limit (5000). Conventional wisdom usually indicates that smaller limits are optimal.
However, simply using bulk operations does not guarantee success. The first method tried was to bulk collect the source data into an object collection, N rows at a time. Then the primary key of table A was generated. Three inserts of N rows were performed by casting the collection. This method provided no better performance than the simple cursor FOR loop.
However, altering the script to use FORALL … INSERT statements improved performance dramatically, taking roughly half the time of the cursor FOR loop.
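A hedged sketch of that pattern (the table names, column names, and sequence below are illustrative assumptions; the real script is 3_BulkCollect_FORALL.sql): fetch the source with BULK COLLECT … LIMIT 5000, generate A_OID for each fetched row, and load the parent and child tables with FORALL so the parent never has to be re-read:

declare
  -- One scalar collection per column used, as in 3_BulkCollect_FORALL.sql.
  type t_num is table of number;
  type t_chr is table of varchar2(2000);

  cursor c_src is select oid, n1, v1, v2 from source_tab;  -- hypothetical columns

  v_src_oid t_num;
  v_n1      t_num;
  v_v1      t_chr;
  v_v2      t_chr;
  v_a_oid   t_num := t_num();
begin
  open c_src;
  loop
    fetch c_src bulk collect into v_src_oid, v_n1, v_v1, v_v2 limit 5000;
    exit when v_src_oid.count = 0;

    -- Generate the parent key (A_OID) once per fetched row.
    v_a_oid.delete;
    v_a_oid.extend(v_src_oid.count);
    for i in 1 .. v_src_oid.count loop
      select a_seq.nextval into v_a_oid(i) from dual;   -- hypothetical sequence
    end loop;

    -- Parent table A and its children are loaded in the same pass;
    -- the children reuse v_a_oid, so table A never has to be re-read.
    forall i in 1 .. v_src_oid.count
      insert into a (a_oid, n1) values (v_a_oid(i), v_n1(i));

    forall i in 1 .. v_src_oid.count
      insert into a_child1 (a_oid, v1) values (v_a_oid(i), v_v1(i));

    forall i in 1 .. v_src_oid.count
      insert into a_child2 (a_oid, v2) values (v_a_oid(i), v_v2(i));
  end loop;
  close c_src;
  commit;
end;
/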
One idea was to use a “key table” to make lookups with the cursor FOR loop faster. There was no performance benefit to that approach.
The Case Study #3 results are summarized in Table 3.
Method | File | Extra | Data | Timing
Cursor FOR loop with single-row inserts | 1_loop_with_inserts.sql | | 1M | ORA-600
 | | | 250K | 127 sec
Bulk collect source data into an object collection N rows at a time and generate A_OID (primary key of table A) → 3 inserts of N rows (cast the collection) | 2_BulkCollect_3Inserts.sql | 50 rows | 1M | 578 sec
 | | | 250K | 141 sec
 | | 100 rows | 1M | 558 sec
 | | | 250K | 137 sec
 | | 1000 rows | 1M | 522 sec
 | | | 250K | 130 sec
 | | 5000 rows | 1M | 503 sec
 | | | 250K | 124 sec
 | | 10000 rows | 1M | 512 sec
 | | | 250K | 126 sec
Bulk collect source data into a set of object collections (one per column) N rows at a time and generate A_OID → 3 inserts of N rows (FORALL … INSERT) | 3_BulkCollect_FORALL.sql | 50 rows | 1M | 344 sec
 | | | 250K | 84 sec
 | | 100 rows | 1M | 317 sec
 | | | 250K | 79 sec
 | | 1000 rows | 1M | 271 sec
 | | | 250K | 66 sec
 | | 5000 rows | 1M | 263 sec
 | | | 250K | 65 sec
 | | 10000 rows | 1M | 265 sec
 | | | 250K | 68 sec
Full insert, recording (Source_ID, A_OID) pairs into a PL/SQL table; subsequent steps query that table to identify the parent ID | 4_Buffer_PLSQL.sql | | 1M | 605 sec
 | | | 250K | 120 sec
Table 3: Summary of Case Study 3 Tests
Conclusions
In general, by using “smart” PL/SQL, you can count on nearly doubling your processing speed. Correct use of BULK COLLECT with a relatively high LIMIT (around 5,000) and FORALL … INSERT is the key to fast data manipulation. Do not update columns unnecessarily.
NOTE: You can find all of the scripts used to create the tests on the Dulcian website (www.dulcian.com).