Ultra-High Performance SQL
Dr. Paul Dorsey, Dulcian, Inc.
Introduction
Processing large amounts of data using SQL and PL/SQL poses a
unique set of challenges that require a different way of thinking about
programming. Applying traditional programming techniques to large batch
routines typically results in inadequate performance. When the same batch
routines are turned over to Java programmers for development, the performance
results may be even worse. As an example, after many months of development on a
system with 14 million objects to be processed, the Java
team achieved processing performance of one object per minute, meaning that it
would take 26.5 years to execute the month-end routine. This same code
refactored in PL/SQL (using exactly the same development algorithm as the Java
code) ran significantly faster, but still would have required many days to
execute. Even rewriting the routine using traditional PL/SQL techniques
such as nested cursor FOR loops with embedded INSERT and UPDATE statements
still resulted in unacceptable performance.
Intuitively, performing a bulk operation by copying large portions of a database to another location, manipulating the data, and moving it back does not seem like the best approach. However, ETL vendors very successfully perform complex transformations using exactly this seemingly sub-optimal algorithm. Unless database developers apply a different style of programming, one specifically suited to batch development, they may be unable to outperform the available ETL tools.
This paper will use three different large batch routine case
studies to discuss best practices in batch programming.
Case Study #1: Multi-Step Complex Transformation from Source to
Target
This case study involves a classic data migration problem.
The client organization had 14 million objects that needed to go through a
complex set of transformations from source to target. As mentioned, traditional
coding techniques in both Java and PL/SQL failed to yield satisfactory
performance results. The following scenario simulates the Case Study #1
environment.
The test starts with a table containing only a few columns (3 character and 3 numeric) and loads it into a similar table while performing some transformations on the data. The new table will have a million records and be partitioned by the table ID (one of the numeric columns).
Three transformations of the data will be shown to simulate
the actual complex routine. The transformations were all similar to the
following:
select a1,
       a1 * a2,
       bc || de,
       de || cd,
       e - ef,
       to_date('20050301','YYYYMMDD'),  -- option A; sysdate for option B
       a1 + a1/15,                      -- options A and B; tan(a1) for option C
       abs(ef)
from   testB1
A. Complexity of Transformation Costs
The functions called in the transformation were purposely varied, and varying them produced very significant differences.
Simple operations (add, divide, concatenate, etc.) had no effect on performance; using any number of simple functions in the code had no measurable impact.
Calls to SYSDATE are known to be cumulatively expensive: calls to SYSDATE inside a loop can destroy performance. However, a call to SYSDATE in a SQL statement is executed only once per statement and has no impact on performance.
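The contrast can be seen in a minimal sketch such as the following (sysdate_demo is a hypothetical table used only for this illustration; it is not one of the test scripts):

create table sysdate_demo (a1 number, load_dt date);

begin
   -- Row-by-row: SYSDATE is re-executed on every iteration,
   -- and that cost grows with the number of rows.
   for r in (select a1 from testB1) loop
      insert into sysdate_demo values (r.a1, sysdate);
   end loop;
   rollback;

   -- Set-based: SYSDATE appears once in a single SQL statement and is
   -- evaluated once for the whole statement, so its cost is negligible.
   insert into sysdate_demo
   select a1, sysdate from testB1;
   rollback;
end;
/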
Floating point operations in PL/SQL are very expensive. A call to TAN( ) or LN( ) takes longer than inserting a record into the database, so complex financial calculations involving fractional exponents or trigonometric functions can be performance killers. This cost is independent of how the records are processed; floating point operations in PL/SQL are simply slow.
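This is easy to verify with a rough timing sketch of our own (not one of the paper's test scripts); DBMS_UTILITY.GET_TIME returns elapsed time in hundredths of a second:

declare
   v_start pls_integer;
   v_x     number;
begin
   v_start := dbms_utility.get_time;
   for i in 1 .. 1000000 loop
      v_x := i + i/15;                 -- simple arithmetic: cheap
   end loop;
   dbms_output.put_line('arithmetic: ' || (dbms_utility.get_time - v_start));

   v_start := dbms_utility.get_time;
   for i in 1 .. 1000000 loop
      v_x := tan(i);                   -- trigonometric call: far more expensive
   end loop;
   dbms_output.put_line('tan():      ' || (dbms_utility.get_time - v_start));
end;
/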
B. Methods of Transformation Costs
Various ways of moving the data were attempted. The worst method was to spin through a cursor FOR loop and issue row-by-row INSERT statements. Even the simplest case took about twice as long as the other methods, so some type of bulk operation was required. As a rule of thumb, assume that you can process about 10,000 records/second using a cursor FOR loop.
1. Create-table-as-select (CTAS)
One fairly fast mechanism is to use sequential CREATE-TABLE-AS-SELECT (CTAS) statements, materializing each transformation step in a buffer temporary table and building the target table from the last buffer.
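A minimal sketch of the pattern under option A follows; the actual test (1_ctas.sql) used two buffer tables, and the buffer table name here is illustrative:

-- Step 1: first transformation into a buffer table.
create table testB_buf as
select a1,
       a1 * a2                         a12,
       bc || de                        bcde,
       de || cd                        decd,
       e - ef                          eef,
       to_date('20050301','YYYYMMDD')  x_dt,
       a1 + a1/15                      y,
       abs(ef)                         z
from   testB1;

-- Step 2: second transformation from the buffer into the target table.
create table testC as
select a1 - a12               c1,
       bcde || decd           c2,
       substr(bcde, 1, 10)    c3,
       add_months(x_dt, -20)  c4,
       y * z                  c5
from   testB_buf;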
2. Bulk load into object collections
Loading the entire source into an object collection in one pass ran out of memory. If the number of records processed at a time is limited to 250,000, the routine completes, but it does not give very good performance. Another problem with very large data sets is that if you are forced to process the data in subsets, you must make sure that the source data is partitioned for quick access to each subset; otherwise there will be further degradation in performance. Even assuming no impact from partitioning the data, this method was still 60% slower than using CTAS.
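The load step of this method looks roughly like the following sketch (the testB1_t object type and its attribute order are assumptions based on the transformation columns; the full scripts are 2_FullBulkLoad_cast.sql and 2_FullBulkLoad_loop.sql):

declare
   b1 testB1_tt := testB1_tt();
begin
   -- Pull the whole set (or a 250,000-row slice of it) into memory in one fetch.
   select testB1_t(a1, a2, bc, cd, de, e, ef)
   bulk collect into b1
   from   testB1
   where  rownum <= 250000;

   -- The second step then transforms b1 and loads testC, either by casting
   -- the result collection into a table or by spinning through a FOR loop.
end;
/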
3. Load data into an object collection N records at a time
The next method was to use BULK COLLECT but to limit the number of records retrieved per fetch, using the following code:
declare
   b1 testB1_tt := testB1_tt();
   b2 testB2_tt := testB2_tt();
   cursor c1 is …;
BEGIN
   open c1;
   loop
      -- Fetch the next 1000 source rows into the first collection.
      fetch c1 bulk collect into b1 limit 1000;
      exit when b1.count = 0;
      -- First transformation: from one object collection to another.
      b2.delete;
      for i in 1 .. b1.count loop
         b2.extend;
         b2(b2.last) := testB2_t(b1(i).a1,
                                 b1(i).a1 * b1(i).a2,
                                 b1(i).bc || b1(i).de,
                                 b1(i).de || b1(i).cd,
                                 b1(i).e - b1(i).ef,
                                 to_date('20050301','YYYYMMDD'),  -- sysdate for option B
                                 b1(i).a1 + b1(i).a1/15,          -- tan(b1(i).a1) for option C
                                 abs(b1(i).ef));
      end loop;
      -- Second transformation: insert from the collection cast as a table.
      insert into testC
      select a1 - a12 c1, bcde || decd c2, substr(bcde,1,10) c3,
             add_months(x_dt,-20) c4, y * z c5
      from   table(cast(b2 as testB2_tt));
      exit when c1%notfound;
   end loop;
   close c1;
END;
/
The first step fetches 1000 records at a time. A simple loop was used for the first transformation, from one object collection to another. The last step performed the second transformation by inserting from the object collection cast as a table. This method turned out to perform at about the same speed as using CTAS. The next variation split the fetched data into a set of scalar collections (one per column) and loaded the target with FORALL … INSERT:
declare
   b1 testB1_tt := testB1_tt();
   cursor c1 is …;
   -- One scalar collection per column of the transformed row.
   a1   number_tt   := number_tt();
   a12  number_tt   := number_tt();
   bcde varchar2_tt := varchar2_tt();
   decd varchar2_tt := varchar2_tt();
   eef  number_tt   := number_tt();
   x_dt date_tt     := date_tt();
   y    number_tt   := number_tt();
   z    number_tt   := number_tt();
BEGIN
   open c1;
   loop
      fetch c1 bulk collect into b1 limit 1000;
      exit when b1.count = 0;
      -- Reset the scalar collections and split the fetched objects into them.
      a1.delete; a12.delete; bcde.delete; decd.delete;
      eef.delete; x_dt.delete; y.delete; z.delete;
      for i in 1 .. b1.count loop
         a1.extend;
         a1(i) := b1(i).a1;
         …
      end loop;
      -- One bulk insert per batch of 1000 rows.
      forall i in 1 .. a1.count
         insert into testC
         values (a1(i) - a12(i), bcde(i) || decd(i), substr(bcde(i),1,10),
                 add_months(x_dt(i),-20), y(i) * z(i));
      exit when c1%notfound;
   end loop;
   close c1;
END;
/
This method also turned out to be about the same speed as CTAS. We then tried to improve on that performance by changing the way the final table ("C") is loaded, using a single bulk insert after casting the collection to a table:

insert into testC
select a1 - a12 c1, bcde || decd c2, substr(bcde,1,10) c3,
       add_months(x_dt,-20) c4, y * z c5
from   table(cast(b2 as testB2_tt));

This approach provided the best performance yet. Another 8 seconds were saved when processing 1 million records, reducing the overall processing time to 42 seconds.
4. Load data into an object collection one record at a time
The final method used a cursor FOR loop to load an object collection one record at a time and then operated on the collection. As before, memory capacity was exceeded unless the number of records processed was limited. Even then, this method did not perform significantly faster than a simple cursor FOR loop.
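For reference, a compressed sketch of this variant (combining the record-at-a-time loading loop with the cast-based load of testC; the testB1_t attribute order is again an assumption):

declare
   b1 testB1_tt := testB1_tt();
   b2 testB2_tt := testB2_tt();
begin
   -- First step: fill the collection one record at a time (no BULK COLLECT).
   for r in (select * from testB1 where rownum <= 250000) loop
      b1.extend;
      b1(b1.last) := testB1_t(r.a1, r.a2, r.bc, r.cd, r.de, r.e, r.ef);
   end loop;

   -- Second step: transform and load exactly as before, either with another
   -- loop (4_ForLoop_ForLoop.sql) or by casting the collection (4_ForLoop_cast.sql).
   for i in 1 .. b1.count loop
      b2.extend;
      b2(b2.last) := testB2_t(b1(i).a1, b1(i).a1 * b1(i).a2,
                              b1(i).bc || b1(i).de, b1(i).de || b1(i).cd,
                              b1(i).e - b1(i).ef, to_date('20050301','YYYYMMDD'),
                              b1(i).a1 + b1(i).a1/15, abs(b1(i).ef));
   end loop;

   insert into testC
   select a1 - a12, bcde || decd, substr(bcde,1,10), add_months(x_dt,-20), y * z
   from   table(cast(b2 as testB2_tt));
end;
/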
The results of all of the tests mentioned are summarized in
Table 1.
Method | Extra | File | Data | A: simple | B: + sysdate | C: + sysdate + tan( ) | D: + sysdate
CTAS | 2 buffer temp tables | 1_ctas.sql | 1M | 51 | 51 | 137 | 202
Full bulk load into object collections | Cast result into table | 2_FullBulkLoad_cast.sql | 1M | Out of memory (all options)
 | | | 250K | 19 | 19 | 42 | 60
 | Process second step as FOR loop | 2_FullBulkLoad_loop.sql | 1M | Out of memory (all options)
 | | | 250K | 14 | 14 | 37 | 55
Load data into object collection N records at a time (first step is BULK COLLECT LIMIT N) | 1000 rows per insert | 3_Bulk_1000_small_insert.sql | 1M | 54 | 54 | 126 | 219
 | 1000 rows per bulk; split into a set of scalar collections, FORALL | 3_Bulk_1000_FORALL.sql | 1M | 42 | 42 | 135 | 206
Load data into object collection 1 record at a time (first step is regular loop) | Next transformation via loop (full spin) | 4_ForLoop_ForLoop.sql | 1M | Out of memory (all options)
 | | | 250K | 20 | 20 | 43 | 61
 | Next transformation via cast | 4_ForLoop_cast.sql | 1M | Out of memory (all options)
 | | | 250K | 24 | 24 | 47 | 65

Table 1: Summary of Case Study 1 tests (all timings in seconds)
Case Study #2: Periodic Modification of a Few Columns in a Database
Table with a Large Number of Columns
Case Study #2 involved a table with over 100 columns
containing 60 million records. Each month, a small number of columns within
these records needed to be updated. The Informatica routine used by the client generated update
statements that updated all 100 columns (even though only a few were
changed). The goal was to find out the
impact of this sub-optimal code.
For this test, the source table had 126 columns, five of
which represented changed data. One and two million records were processed. The
target table being updated either had 5 columns or 126 columns.
As expected, the standard PL/SQL method of spinning through
the cursor produced terrible performance.
Bulk updates are much faster, using syntax like the following:

update target t
set    (a, b, c, d, e) = (select a, b, c, d, e
                          from   source
                          where  oid = t.oid)
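One caveat worth noting (an addition to the original tests, not taken from them): a correlated update of this form touches every target row, and rows with no match in the source would have these columns set to NULL. A hedged variant that restricts the update to matching rows, using the same illustrative table and column names, is:

update target t
set    (a, b, c, d, e) = (select s.a, s.b, s.c, s.d, s.e
                          from   source s
                          where  s.oid = t.oid)
where  exists (select 1
               from   source s
               where  s.oid = t.oid);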
If only five columns were updated, updating a five-column
table or a 126-column table did not produce very different performance (140
versus 210 seconds). However,
updating all of the columns (unnecessarily) on the 126-column table more than
doubled the processing time (485 seconds).
The results of this test produced two very clear lessons:
1) Use bulk update commands.
2) Only update the necessary columns.
The results are summarized in Table 2.
Method | File | Data | Target has 5 cols | Target has 126 cols
Update only 5 columns: update target t set (a,b,c,d,e) = (select a,b,c,d,e from source where oid = t.oid) | 1_update_5.sql | 1M | 140 | 210
 | | 2M | 310 | 445
Update all columns: update target t set (a, b, c, d, e, …) = (select … from source where oid = t.oid) | 2_update_all.sql | 1M | N/A | 485
Cursor spin: declare cursor c1 is select * from source; begin for c in c1 loop update target …; end loop; end; | 3_cursor.sql | 1M | 200 | 235
 | | 2M | 420 | 630

Table 2: Summary of Case Study 2 tests (all timings in seconds)
Case Study #3: Loading a Wide Source Table into Multiple Parent-Child Tables
For this case study, several million new objects needed to
be read into the system on a periodic basis. The objects enter the system in a
large 120-column table which, when processed, is divided up to create several
parent-child object relationships in the system.
In this test, while reading from the source table, all three
child tables were loaded at the same time.
It is interesting to note that this functionality is not possible with
most ETL tools. Most ETL tools write to one table at a time. Therefore, you would need to write to the
parent table and then reread the parent table for each child table in order to
know where to attach the child records.
At the 2005 Informatica conference, there was
discussion of generating database code that might support this functionality,
but it would require an architectural modification of the way that the maps are
specified.
A description of the source and target tables for the case
study follows:
Source table: 120 columns
- 40 NUMBER columns
- 40 VARCHAR2(1) columns
- 40 VARCHAR2(2000) columns with populated default values
- OID column (primary key)
Target tables:
- Table A
  - ID
  - 40 VARCHAR2(2000) columns
- Table B
  - ID
  - 40 NUMBER columns
  - child of table A
- Table C
  - 2 NUMBER columns, 2 VARCHAR2 columns, 1 DATE column
  - child of table A
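A hedged DDL sketch of these structures, abbreviated to a handful of columns (the table names, column names, and two sequences are illustrative assumptions, reused by the load sketch later in this section):

create table source_tab (
   oid number primary key,
   n1  number,
   n2  number,                          -- ... through n40
   f1  varchar2(1),
   f2  varchar2(1),                     -- ... through f40
   t1  varchar2(2000) default 'none'    -- ... through t40, with default values
);

create table a_tab (
   id number primary key,
   t1 varchar2(2000)                    -- ... through t40
);

create table b_tab (
   id   number primary key,
   a_id number references a_tab(id),    -- child of table A
   n1   number                          -- ... through n40
);

create table c_tab (
   a_id number references a_tab(id),    -- child of table A
   n1   number,
   n2   number,
   v1   varchar2(2000),
   v2   varchar2(2000),
   d1   date
);

create sequence a_seq;                  -- generates IDs for table A
create sequence b_seq;                  -- generates IDs for table B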
The traditional method of spinning through a cursor would undoubtedly result in poor performance; in addition, it generated an ORA-600 error when run against the full 1 million records. Since the cursor FOR loop results were not of real interest, a 250,000-record run was used as its benchmark. As predicted, the results were worse than those of any other method tried.
Bulk collecting a limited number of records turned out to be the
best approach. What was surprising was
that the best performance was achieved with a rather large limit (5000). Conventional wisdom usually indicates that
smaller limits are optimal.
However, simply using bulk operations does not guarantee
success. The first method tried was to
bulk collect the source data into an object collection, N rows at a time. Then the primary key of table A was generated.
Three inserts of N rows were performed by casting the collection. This method provided
no better performance than the simple cursor FOR loop.
However, by altering the script and using bulk
ForAll…Inserts, the performance was much better, taking roughly half the time of
the cursor FOR loop.
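The winning pattern can be sketched roughly as follows, using the illustrative table, column, and sequence names from the DDL sketch above rather than the actual script (3_BulkCollect_FORALL.sql); only one or two columns per group are shown:

declare
   cursor c_src is
      select oid, t1, n1, n2, f1, f2 from source_tab;
   type t_num is table of number         index by pls_integer;
   type t_txt is table of varchar2(2000) index by pls_integer;
   l_oid  t_num;
   l_t1   t_txt;
   l_n1   t_num;
   l_n2   t_num;
   l_f1   t_txt;
   l_f2   t_txt;
   l_a_id t_num;
begin
   open c_src;
   loop
      fetch c_src bulk collect into l_oid, l_t1, l_n1, l_n2, l_f1, l_f2
         limit 5000;                     -- the limit that tested fastest
      exit when l_oid.count = 0;

      -- Generate the table A primary keys for this batch.
      for i in 1 .. l_oid.count loop
         select a_seq.nextval into l_a_id(i) from dual;
      end loop;

      -- One pass over the source, three bulk inserts: parent first, then children.
      forall i in 1 .. l_oid.count
         insert into a_tab (id, t1) values (l_a_id(i), l_t1(i));
      forall i in 1 .. l_oid.count
         insert into b_tab (id, a_id, n1) values (b_seq.nextval, l_a_id(i), l_n1(i));
      forall i in 1 .. l_oid.count
         insert into c_tab (a_id, n1, n2, v1, v2, d1)
         values (l_a_id(i), l_n1(i), l_n2(i), l_f1(i), l_f2(i), sysdate);  -- date value is an assumption
   end loop;
   close c_src;
   commit;
end;
/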
One idea was to use a “key table” to make lookups with the
cursor FOR loop faster. There was no
performance benefit to that approach.
The Case Study #3 results are summarized in Table 3.
Method | File | Extra | Data | Timing
Cursor FOR loop with row-by-row inserts | 1_loop_with_inserts.sql | | 1M | ORA-600
 | | | 250K | 127
Bulk collect source data into an object collection N rows at a time and generate A_OID (primary key of table A) → 3 inserts of N rows (cast the collection) | 2_BulkCollect_3Inserts.sql | 50 rows | 1M | 578
 | | 50 rows | 250K | 141
 | | 100 rows | 1M | 558
 | | 100 rows | 250K | 137
 | | 1000 rows | 1M | 522
 | | 1000 rows | 250K | 130
 | | 5000 rows | 1M | 503
 | | 5000 rows | 250K | 124
 | | 10000 rows | 1M | 512
 | | 10000 rows | 250K | 126
Bulk collect source data into a set of collections (one per column) N rows at a time, generate A_OID (primary key of table A) → 3 inserts of N rows (FORALL … INSERT) | 3_BulkCollect_FORALL.sql | 50 rows | 1M | 344
 | | 50 rows | 250K | 84
 | | 100 rows | 1M | 317
 | | 100 rows | 250K | 79
 | | 1000 rows | 1M | 271
 | | 1000 rows | 250K | 66
 | | 5000 rows | 1M | 263
 | | 5000 rows | 250K | 65
 | | 10000 rows | 1M | 265
 | | 10000 rows | 250K | 68
Full insert recording (Source_ID, A_OID) pairs in a PL/SQL table; subsequent steps query that table to identify the parent ID | 4_Buffer_PLSQL.sql | | 1M | 605
 | | | 250K | 120

Table 3: Summary of Case Study 3 tests (all timings in seconds)
Conclusions
In general, by using "smart" PL/SQL, you can expect to roughly double your performance. Correct use of BULK COLLECT with a high limit (about 5000) together with FORALL … INSERT is the key to fast data manipulation. Do not update columns unnecessarily.
NOTE: You can find all of the scripts used to create the tests on the Dulcian website (www.dulcian.com).
About the Author
Dr.