Business Rules: Theory & Implementation
(What and How)
Introduction of the What
Business Rules are the most essential part of our databases. It is imperative for the information science profession to deal with them in a structured way. Business Rules add semantic value to the data in the tables, by describing how the real world that we work in is modelled into the relational database tables. The biggest part of the time spent on designing a database should be spent on modelling the involved Business Rules: finding out, and documenting, what the rules are that will make the database a satisfactory fit to reality. Accurately, that is, unambiguously, documenting Business Rules can spell the difference between success and failure of an information system. Still today, documenting Business Rules is most widely done using natural language, which often produces a quick dive into ambiguity. This paper will show that mathematicians have a nice theory known as Set Theory that, combined with Predicate Calculus, lends itself excellently to capturing databases (data models) with their Business Rules (constraints) in a formal manner. By doing so, we will arrive at a classification scheme for constraints in a straightforward manner. This knowledge is essential for keeping a clear and professional view on the issues that play a role in almost any information systems project nowadays, where Business Rules, or more precisely the code that enforces them, are to be clearly distinguished (and implemented separately) from all other involved application code.
Introduction of the How
Similar to the amount of time spent on modelling Business Rules, the major part of the time spent on implementing information systems should be spent on implementing the involved Business Rules in a structured, and therefore well performing and maintainable, manner. Whenever transactions (DML-statements) change the contents of our database tables, we must perform checks to enforce the continued validity of our Business Rules. In order to end up with maintainable systems, it is essential that implementation standards and guidelines be developed in this area. This can be nicely achieved using a class-by-class approach. Having come up with a classification scheme for constraints, this paper will give implementation guidelines per constraint class. Implementing Business Rules is far from a trivial task. In addition, we have fundamentally different ways to approach it. There are those who say that since the arrival of the so-called Active Database System (systems that have some kind of implementation of the Event-Condition-Action paradigm, most notably through a database table trigger mechanism), all Business-Rule-checking code should be implemented using the Active Database System feature(s). We will refer to this way of implementation via table triggers as the post-DML[1] way to implement Business Rules. Others, however, say that triggers are not the answer to the question of how to implement Business Rules: this approach produces trigger code that can be very complicated indeed. They choose some way of checking the validity of a transaction (its DML-statements) before it is actually submitted to the database system. We will refer to this way of implementation as the pre-DML way. Indeed, as this paper will show, designing optimal (or better, minimal) checks to be performed whenever transactions are submitted to our database tables is far from a trivial task. The question is whether this is due to a trigger mechanism per se, or due to the (inherent) complexity of checking a Business Rule (given SQL as the vehicle that manipulates our data).
What are Business Rules?
By looking at how other people define the term Business Rule, it quickly becomes clear that no real consensus exists yet.
BR-Symposium 2002
In his introduction to last year's Business Rules Symposium, the following definition was offered:
“A precisely articulated requirement that is both machine readable & user readable.”
Chris Date
In his book “What Not How” [DAT00] Chris Date offers the following definition for a Business Rule:
“The term business rules includes all of the following:
1. Presentation Rules
2. Database Rules
3. Application Rules
In other words rules apply to all three aspects of applications: Presentation aspects, Database aspects, and aspects specific to the Business Function per se.”
Ron Ross
In his book “The Business Rule Book” [ROS94] Ron Ross offers the following definition for a Business Rule:
“A Rule may be defined as a constraint, or a test, exercised for the purpose of maintaining the integrity (i.e. correctness) of persistent data.”
Oracle Documentation
The Application Developer's Guide offers the following definition for a Business Rule:
“Business rules specify conditions and relationships that must always be true, or must always be false. Because each company defines its own policies about things like salaries, employee numbers, inventory tracking and so on, you can specify a different set of rules for each database table.”
In this paper
Before we proceed, let us first loosely[2] define what the term Business Rule means in the scope of this paper.
Business Rules ≡ “All assertions that constrain the possible values in a database state.”
This definition implies that if a rule (a statement about the real world) cannot be paraphrased in terms of the table values (values in a database state) that together implement the data model supporting our application, then it is not considered a Business Rule in this paper. Rather, it is some aspect of the Business that may, or may not, be implemented (elsewhere) in the application code. If such a rule is to be considered a Business Rule, then the data model should be extended (additional tables and/or attributes) such that the rule indeed can be described in terms of the data model, thereby constraining the possible values of the database state. Put otherwise, a Business Rule should always fully map to the database schema (skeleton) at hand.
Note the use of the word assertions, instead of, for instance, conditions. The assertion is meant to describe the What of the Business Rule. Actual Business Rule specifications frequently describe the How of the Rule in terms of conditionally coded logic. An assertion is meant to state the rule as it holds in the real world, without describing how it should be checked.
Formal database definition
Having set the scene, let us now proceed and demonstrate how Set Theory and Predicate Calculus enable us to get a clear and professional view on the Business Rules area.
A database state at any given point in time is essentially a set of tables, where each table has a valid value (i.e. one that complies with the business rules). In the course of time transactions occur which change database states into new database states. What we need to model is the set of all (valid) database states. This set is called the database universe.
Within a database universe (i.e. the set of admitted database states):
· Each database state is an admitted set of tables and
· Each table is an admitted set of tuples (records) and
· Each tuple is an admitted set of attributes and
· Each attribute is an admitted value from a given set of values[3].
Since all objects above are sets, they can all be defined mathematically using Set Theory and Predicate Calculus. Such formalisation has the big advantage of avoiding any possible ambiguity in the documentation of a data model and its constraints. At the same time, this will supply us with a straightforward classification for them (as will become clear shortly).
So how do we model, in a formal way, this set called the database universe?
This is done in a phased (layered) approach as follows. Firstly, we define what our vocabulary is: what are the things, and aspects of them, in the real world that we want to deal with in our database? In other words, what tables do we need and what attributes does each table need? This is formally defined in what is called a database skeleton. Once we have this, we then define for each attribute the set of admitted attribute values. These sets of admitted attribute values are then used as building blocks to construct (define) for each table the set of admitted record values. The sets of admitted record values are then used to build for each table the set of admitted table values. Finally, the sets of admitted table values are used to define the set of admitted database states, which is called the database universe. This technique of constructing a database universe was developed by Bert De Brock in the eighties. See [BRO89] and [BRO94] for a full explanation of this methodology to define databases in a formal way. An example of a database universe construction will be given in a later section of this paper.
In this bottom-up, solid construction of a database universe we explicitly allow only admitted sets of values at each of the levels described above. This means that at each level the set must conform to the data constraints (Business Rules). These constraints tell us which sets are valid ones. They condition the contents of the sets. This brings us directly to four classes of data constraints. They are (in increasing scope order):
Attribute constraints
These are assertions that values of attributes (single column values within a record) should comply with. They can be specified and checked without referring to values of other attributes within the tuple. For instance, in the employee table: 'Admitted values for attribute Job are “Manager”, “Clerk” or “Salesman”', or 'The value of the Allowed-vacation-days attribute is always greater than 10' (see the example database in the next section).
Tuple (inter-attribute) constraints
These are assertions that combinations of values regarding more than one attribute within a tuple (record) should comply with. They can be specified and checked without referring to other tuples. For instance, a constraint between attributes Job and Allowed-vacation-days: 'Managers should have more than 15 vacation days'.
Table (inter-tuple) constraints
These are assertions that combinations of different tuples within the same table should comply with. They can be specified and checked without referring to other tables. For instance inside a VAC table that holds employee vacation records, 'Vacation records of the same employee may not overlap' or 'Directly adjacent vacation records are not allowed'.
Database (inter-table) constraints
These are assertions that combinations of tables within the same database state should comply with. They can be specified and checked without referring to other database states. For instance, constraints between the employees and vacation records tables: 'Each vacation record belongs to an existing employee' or 'The total number of vacation days taken by an employee may not exceed the number of vacation days allowed for that employee'.
These four classes[4] of constraints enable us to accept or reject a given database state. They condition database states and are often referred to as static constraints: they can be validated within the context of a (static) database state. In the real world, there is also a need to constrain database state transitions, on grounds other than static constraints. Constraints that condition transitions of database states (transactions) are referred to as dynamic constraints, or transition constraints.
Dynamic (database state transition) constraints
These are assertions, other than static constraints, that database state transitions (transactions) should comply with. They can only be specified and checked while referring to both the database state in which the transaction originated and the database state which is to be the result of the transaction. Some examples: 'The allowed number of vacation days can only be increased, never decreased, for an employee' or 'Only the most recent vacation records are allowed to be updated or deleted'.
Since the above classification scheme is driven by the scope of data that a constraint talks about, it has the advantage of being closely related to implementation issues of constraints: obviously the type of effort involved in implementing different constraints within one such class is the same. In addition, the complexity involved in implementing constraints probably increases as the scope of data increases.
Example database definition
We will now demonstrate the database universe construction method described above by defining an example database universe, which is used in the remainder of this paper. More examples can be found in [KOP94-1], [KOP94-2], [KOP95-1], [KOP95-2], [KOP95-3] and [KOP96]. During this definition some clarification of the set-theoretic and predicate-calculus symbols used is given. Full treatment of the formalisms used can be found in books covering Discrete Mathematics, such as [SM77].
· The database holds employees and their vacation records.
· Each employee is identified by a number and has a name, a job and a maximum number of allowed vacation days.
· Each employee can have multiple vacation records.
· Each vacation record registers a period (first_day, last_day) during which the employee took (is to take) a vacation.
More English-language documentation of the constraints involved in this two-table database universe is embedded within the definition between comment markers (/* and */). Note that the major part of this (and any other) database definition is the definition of its data constraints.
Defining the set of admitted database states (the database universe) starts off by defining what the database skeleton for the universe is: it introduces the names of the involved tables (relations) in our universe and, per table, the names of the involved columns (attributes). A database skeleton is a set-valued function[5]: for each table this function yields the set of attributes (heading) of that table. Our database skeleton, say G, for the little Employee/Vacation example looks as follows:
Database skeleton G

/* Define our vocabulary: table names and attribute names */
G = {(EMP;              /* Employee */
      {EMPNO,           /* Employee number */
       ENAME,           /* Employee name */
       JOB,             /* Employee job */
       VACATION_DAYS}), /* Maximum number of allowed vacation days */
     (VAC;              /* Vacation */
      {EMPNO,           /* Employee number */
       FIRST_DAY,       /* First day of this vacation period */
       LAST_DAY})}      /* Last day of this vacation period */
Here we depict a function as a set of ordered pairs, where the first element of each pair belongs to the domain[6] of that function (in this case the set of table names) and the second element of each pair belongs to the range of that function (in this case sets of attribute names). Function G could also have been defined by first telling what its domain (Dom) is and then for each element in its domain, what its range element is:
Dom(G) = {EMP,VAC}
G(EMP) = {EMPNO,ENAME,JOB,VACATION_DAYS}
G(VAC) = {EMPNO,FIRST_DAY,LAST_DAY}
Next, we define for each table, for each attribute, what its admitted set of attribute values is. This is done by (again) defining two set-valued functions, say f-EMP and f-VAC, whose domains are G(EMP) and G(VAC) respectively. Functions f-EMP and f-VAC are called the object characterisations of EMP and VAC respectively.
Note that (for our convenience) the object characterisations given below use the base data types of the DBMS that will implement our data model. These data types are sets: NUMBER(p,s) is the set of all numbers with precision p and scale s, VARCHAR2(n) is the set of all variable-length strings no longer than n positions, and DATE is the set of all date values. The base data types of the DBMS are narrowed down by specifying the attribute constraints.
Object characterisations f-EMP and f-VAC
/* Define attribute constraints */
f-EMP =
{(EMPNO         ; { v ∈ NUMBER(4,0) | v > 0 }),
 (ENAME         ; VARCHAR2(20)),
 (JOB           ; { v ∈ VARCHAR2(8) | v ∈ {'MANAGER','CLERK','SALESMAN'} }),
 (VACATION_DAYS ; { v ∈ NUMBER(3,0) | v > 10 })}
The expression { v∈W | v > 0 } should be read as 'the set of all values v that are an element of the set W and for which v > 0 holds'. In other words, this is the subset of all W elements that are greater than zero.
For attributes FIRST_DAY and LAST_DAY of table VAC we first introduce an auxiliary data type that is based on the DATE data type:

WORKDATE := { trunc(d) | d ∈ DATE                          /* DATE values without time component */
              ∧ to_char(d,'Day') ∉ {'Saturday','Sunday'} } /* Not weekend days */
The symbol ∧ is the logical AND operator and ∉ means 'is not an element of'. Data type WORKDATE represents the set of all date values that have no time component and do not represent a Saturday or Sunday (here again, for our convenience, we are using two built-in functions of the DBMS that will implement our data model).
f-VAC =
{(EMPNO     ; { v ∈ NUMBER(4,0) | v > 0 }),
 (FIRST_DAY ; WORKDATE),
 (LAST_DAY  ; WORKDATE)}
Having defined for each attribute in each table what its admitted attribute values are, we can now proceed to the definition of tuple universes. Tuple universes represent the data type of tuple variables or, if you prefer, they define the admitted record values of record variables for each of our two tables EMP and VAC. The definition of each tuple universe, say t-EMP and t-VAC, is done by first constructing the set of all possible tuple values based on the admitted attribute values (object characterisations). In this set of tuple values, all previously defined attribute constraints will hold. We generate such a set by using the so-called generalized product function (symbol Π) applied to an object characterisation, such as f-EMP and f-VAC. We then narrow down the set of possible tuples to only admitted tuples by specifying the inter-attribute (tuple) constraints.
Tuple universes t-EMP and t-VAC
/* Define tuple constraints */
t-EMP =
{ t | t ∈ Π(f-EMP) ∧
      /* Managers have more than 15 vacation days */
      (t(JOB)='MANAGER' ⇒ t(VACATION_DAYS)>15)
}
The symbol ⇒ is the implication operator. The expression A ⇒ B, read 'A implies B', is logically equivalent to ¬A ∨ B, which can be read as 'not A, or B' (the symbol ∨ is the logical OR operator and ¬ the negation operator).
t-VAC =
{ t | t ∈ Π(f-VAC) ∧
      /* FIRST_DAY cannot be after LAST_DAY */
      t(FIRST_DAY) ≤ t(LAST_DAY) ∧
      /* A vacation cannot be longer than 3 weeks */
      (t(LAST_DAY) - t(FIRST_DAY) + 1) ≤ 21
}
Π(F), the 'generalized product' of set-valued function F, is formally defined as follows:

For F a set-valued function,
Π(F) = { f | f is a function where Dom(f)=Dom(F) ∧ (∀x∈Dom(f): f(x) ∈ F(x)) }
The "
symbol is the 'universal quantification' operator. An expression such as ("tÎT: Q(t)) should be read as follows:
'For all values t, where t is an element of set T, the Boolean expression Q(t)
holds'. Q(t) can be any valid Boolean expression over variable t.
Let us explain the generalized product by giving a small example: let F be the following set: {(A; {1,2}), (B; {3,4,5})}. This set holds two elements, each of which is an ordered pair. It attaches A to the set {1,2} and B to the set {3,4,5}. Note that F can be considered an object characterisation: compare with f-EMP and f-VAC, where A and B are attribute names and the sets {1,2} and {3,4,5} are the admissible attribute values. Given this F, Π(F), the generalized product of F, is the following set:
{ { (A;1), (B;3) }, { (A;2), (B;3) }, { (A;1), (B;4) }, { (A;2), (B;4) }, { (A;1), (B;5) }, { (A;2), (B;5) } }
This set holds six elements, where each element represents a possible tuple value given the object characterisation F.
The set Π(f-EMP) is the set of all possible tuple values (depicted as functions). For each attribute, an element of its corresponding set of admitted attribute values is picked. t-EMP then is the set of all admitted tuple values for the EMP table. t-VAC is the same for the VAC table.
In the definition of t-EMP the tuple constraint defines that employees with job 'MANAGER' always have more than 15 allowable vacation days. In the definition of t-VAC the first tuple constraint defines that FIRST_DAY cannot be after LAST_DAY. The second one restricts the maximum length of a vacation to 21 days.
With t-EMP and t-VAC, representing the data types of tuple variables for the EMP and VAC tables, we can continue the construction of our database universe. We do this by generating table universes, say w-EMP and w-VAC, for tables EMP and VAC. Just like the tuple universes, the table universes represent the data types for our table variables EMP and VAC: we again start with the sets that hold all possible table values. By specifying predicates representing table constraints, we narrow these sets down to only admitted table values. The set holding all possible table values is the set of all subsets of the corresponding tuple universe. In such a subset, all previously defined attribute and tuple constraints will hold.
Table universes w-EMP and w-VAC
/* Define table constraints */
w-EMP =
{ T | T ⊆ t-EMP ∧
      /* EMPNO uniquely identifies an employee tuple */
      (∀s,t∈T: s(EMPNO) = t(EMPNO) ⇒ s=t) ∧
      /* No CLERK has more vacation_days than a SALESMAN */
      ¬(∃c∈T: c(JOB) = 'CLERK' ∧
        (∃s∈T: s(JOB) = 'SALESMAN' ∧ s(VACATION_DAYS) < c(VACATION_DAYS))) ∧
      /* T can hold no, or only one, MANAGER */
      0 ≤ |{ t∈T | t(JOB) = 'MANAGER' }| ≤ 1 ∧
      /* Everybody has fewer vacation_days than the MANAGER */
      ((∃m∈T: m(JOB) = 'MANAGER') ⇒
        (∀e∈T: (e(JOB) = 'CLERK' ∨ e(JOB) = 'SALESMAN') ⇒
          e(VACATION_DAYS) < σ{ m(VACATION_DAYS) | m∈T ∧ m(JOB) = 'MANAGER' })
      )
}
The ⊆ symbol is the subset symbol: A ⊆ B should be read as 'A is a subset of B'. w-EMP starts from the set of all possible subsets of t-EMP, where every subset of t-EMP (i.e. a set of admitted tuple values) is a possible table value for EMP. The constraints then dismiss every such generated subset of t-EMP that does not comply with them, making w-EMP the set of admitted table values.
σ is the singleton operator: for S a set, σS picks an (arbitrary) element out of set S.
The ∃ symbol is the 'existential quantification' operator. An expression such as (∃t∈T: Q(t)) should be read as follows: 'There exists a value t, where t is an element of set T, for which the Boolean expression Q(t) holds'. Q(t) can be any valid Boolean expression over variable t.
The first table constraint in w-EMP defines the key for EMP: if two tuples are the same on their EMPNO-attribute value, then they must be the same tuple. The second constraint specifies that there cannot exist a CLERK that has more vacation_days than any SALESMAN. The third constraint states that any valid table value for EMP can hold at most one MANAGER. Finally, the last constraint specifies that if the MANAGER is present in the current table value for EMP, then all other employees must have fewer vacation_days than this MANAGER has.
Intermezzo 1
Let's briefly go back to the second table constraint of w-EMP: “No Clerk has more vacation days than a Salesman”. Using the English language as our vehicle for describing this Business Rule, other people may have specified it differently, for instance as follows: “All Clerks should have at most as many vacation days as Salesmen”. Transforming this rule to Predicate Calculus, one may have come up with the following (seemingly different) specification:
("cÎT: c(JOB) = ‘CLERK’ Þ
("sÎT: s(JOB) = ‘SALESMAN’ Þ c(VACATION_DAYS) £
s(VACATION_DAYS)
)
)
One of the great benefits of specifying rules using Predicate Calculus is that the full power of the calculus becomes available to us to analyse them. This can be used to answer the question whether this seemingly different predicate is indeed different, or in fact equivalent to the predicate of the same rule as specified within the w-EMP definition.
Using the following rewrite rules that are available within the Calculus, one can prove that this predicate is equivalent to the predicate within w-EMP:

A ⇒ B is equivalent to: ¬A ∨ B

and,

(∀t∈T: Q(t)) is equivalent to: ¬(∃t∈T: ¬Q(t)).
The exercise to actually prove this is left to the reader. As we will see later on, the ability to rewrite constraint predicates will also aid us when we need to find the (minimal) checks to be performed when transactions change our data.
End-Intermezzo 1
We continue with the table universe for the VAC table.
w-VAC =
{ T | T ⊆ t-VAC ∧
      /* EMPNO and FIRST_DAY (concatenated) uniquely identify a vacation tuple */
      (∀s,t∈T: (s(EMPNO) = t(EMPNO) ∧ s(FIRST_DAY) = t(FIRST_DAY)) ⇒ s=t) ∧
      /* No directly adjacent vacation records allowed */
      (∀s,t∈T: (s(EMPNO) = t(EMPNO) ∧ s(FIRST_DAY) < t(FIRST_DAY))
               ⇒ |{ d∈WORKDATE | s(LAST_DAY) < d < t(FIRST_DAY) }| > 0) ∧
      /* There may never be more than ten employees on vacation on the same day */
      ¬(∃d∈WORKDATE: |{ v(EMPNO) | v∈T ∧ v(FIRST_DAY) ≤ d ≤ v(LAST_DAY) }| > 10)
}
The first table constraint in w-VAC defines the key for VAC: if two tuples are the same on both their EMPNO and FIRST_DAY attribute values, then they must be the same tuple. The second constraint specifies that for every combination of two vacation records of the same employee where one precedes the other, there must be at least one workday in between those two vacation periods. The third constraint specifies that there may not exist a WORKDATE on which more than ten employees are on vacation.
Intermezzo 2
Typically in a table such as VAC we would find a table constraint such as: “Two vacation records of the same employee are not allowed to overlap.” A formal specification for this constraint (embedded in the definition of w-VAC) could be:
("s,tÎT:
(s(EMPNO)
= t(EMPNO) Ù s(FIRST_DAY) ¹ t(FIRST_DAY))
Þ
(s(LAST_DAY)<t(FIRST_DAY)
Ú s(FIRST_DAY)>t(LAST_DAY))
)
Again, using rewrite rules that are available within the Calculus, we can prove[7] that the predicate above is actually implied by the predicate that prohibits “directly adjacent vacation records” (the 2nd table constraint inside w-VAC). Since the “no-overlaps” rule above can be deduced from that second table constraint, we need not mention it.
End-Intermezzo 2
Having defined w-EMP and w-VAC as the sets of admitted values for our table-variables EMP and VAC, we are now one step away from the definition of our database universe.
Database universe UEX
We conclude by defining what the set of admitted database states is, which was our goal, viz. to define the database universe. Again, we first construct the set of all possible database states based on the table universes (by taking the generalized product of a set function HEX) and then narrow it down by specifying inter-table constraints.
Let HEX (also called the database characterisation) be a set-valued function as follows:

HEX = /* attach table universes to table aliases */
{(EMP ; w-EMP),
 (VAC ; w-VAC)}
Now a set can be defined as the generalized product of HEX: an element of this set is a database state, represented as a function whose domain consists of the table names. Per table (name), such a function gives an element of the respective table universe.
Π(HEX) can be depicted as follows:

{ {(EMP; E1), (VAC; V1)},   -- 1st possible db-state
  {(EMP; E2), (VAC; V2)},   -- 2nd possible db-state
  {(EMP; E3), (VAC; V3)},   -- 3rd possible db-state
  …                         /* and so on and so forth */
}
Where E1, E2, E3, … are elements of w-EMP, and V1, V2, V3, … are elements of w-VAC.
Π(HEX) will generate every possible combination of an element out of the w-EMP set and one out of the w-VAC set. Every such generated combination is an element of Π(HEX) and represents a possible database state in which all previously defined attribute, tuple and table constraints will hold. Only those states that comply with the database constraints are admitted in the UEX set (our database universe).
/* Define database constraints */
UEX =
{ v | v ∈ Π(HEX) ∧
      /* Each vacation record has a corresponding employee */
      { t(EMPNO) | t∈v(VAC) } ⊆ { t(EMPNO) | t∈v(EMP) } ∧
      /* The vacation taken may not exceed the maximum allowed */
      (∀t∈v(EMP):
        (Σs: s∈v(VAC) ∧ s(EMPNO)=t(EMPNO): |{ d∈WORKDATE | s(FIRST_DAY) ≤ d ≤ s(LAST_DAY) }|)
        ≤ t(VACATION_DAYS))
}
The symbol Σ is the summation operator. The expression (Σt: t∈T ∧ Q(t): R(t)) should be read as 'the sum of the R(t) values over all elements t in T for which Q(t) holds'.
The first database constraint makes sure that only those database states are allowed where every EMPNO-value present in the VAC table state is also present in the accompanying EMP table state: this is done by specifying that the EMPNO-values in VAC are a subset of the EMPNO-values in EMP. The second database constraint makes sure that only those database states are allowed where the VAC table state does not hold vacation records for an employee that represent more days (the sum over the cardinality of the set of WORKDATE values between FIRST_DAY and LAST_DAY) than the allowed VACATION_DAYS in the accompanying EMP table state for this employee. The cardinality of a set S is defined as the number of elements within set S and is depicted as |S|.
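To make the first database constraint concrete: once EMP and VAC are implemented as tables, a query along the following lines (a sketch, not part of the formal definition) would report every vacation record that violates it:

select v.empno, v.first_day
from   vac v
where  not exists (select 'x'
                   from   emp e
                   where  e.empno = v.empno);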
Having defined the database universe in a formal way, we now know exactly which database states are allowed (valid) for our EMP/VAC database: if such database state is an element of UEX then (and only then) it is a valid database state.
Valid database state transitions VEX
The minimum requirement for each transaction (causing a transition from one database state to another database state) is: if it starts out in a valid database state (an element of UEX), then it must result in a valid database state (another element of UEX). Given the definition of UEX, the set of possible transactions, say PEX, can very quickly be defined as follows:
PEX = { (v; w) | v∈UEX ∧ w∈UEX }
Here every possible transaction is depicted as an ordered pair (v; w), where v stands for the database state in which the transaction started and w for the database state in which the transaction results. We restrict the PEX set to a set, say VEX, that holds only valid transactions, by specifying the dynamic (also referred to as transition) constraints.
/* Define dynamic (transition) constraints */
VEX =
{ (v; w) | (v; w) ∈ PEX ∧
      /* Do not allow updates of uniquely identifying columns in EMP and VAC */
      (∀s∈v(EMP), t∈w(EMP): syskey(s)=syskey(t) ⇒ s(EMPNO)=t(EMPNO)) ∧
      (∀s∈v(VAC), t∈w(VAC): syskey(s)=syskey(t) ⇒ s(EMPNO)=t(EMPNO) ∧
                                                   s(FIRST_DAY)=t(FIRST_DAY)) ∧
      /* Allowed vacation days can only increase */
      (∀s∈v(EMP), t∈w(EMP): s(EMPNO)=t(EMPNO) ⇒ s(VACATION_DAYS)≤t(VACATION_DAYS)) ∧
      /* Only the most recent VAC record of an employee can be updated */
      (∀s∈v(VAC), t∈w(VAC): (s(EMPNO)=t(EMPNO) ∧
                             s(FIRST_DAY)=t(FIRST_DAY) ∧
                             s(LAST_DAY)≠t(LAST_DAY))                           /* updated */
             ⇒ ¬(∃u∈v(VAC): u(EMPNO)=s(EMPNO) ∧ u(FIRST_DAY)>s(FIRST_DAY))) ∧
      /* Only the most recent VAC record of an employee can be deleted */
      (∀s∈v(VAC): ¬(∃t∈w(VAC): t(EMPNO)=s(EMPNO) ∧ t(FIRST_DAY)=s(FIRST_DAY))   /* deleted */
             ⇒ ¬(∃u∈v(VAC): u(EMPNO)=s(EMPNO) ∧ u(FIRST_DAY)>s(FIRST_DAY))) ∧
      /* Newly inserted VAC records must be 'most recent' */
      (∀t∈w(VAC): ¬(∃s∈v(VAC): s(EMPNO)=t(EMPNO) ∧ s(FIRST_DAY)=t(FIRST_DAY))   /* inserted */
             ⇒ ¬(∃u∈w(VAC): u(EMPNO)=t(EMPNO) ∧ u(FIRST_DAY)>t(FIRST_DAY)))
}
The first transition constraint uses a syskey-function. This function can be applied to a tuple of any tuple type and renders a System Key that uniquely identifies the tuple in the DBMS that implements our database universe. We assume the availability of such a function in the DBMS to enable us to quickly verify whether two tuples are actually the same tuple. In Oracle, the syskey-function maps to the well-known rowid pseudocolumn. This constraint then states that if an EMP tuple was present in the state in which the transaction started, and is still present in the state in which the transaction results, then the EMPNO-value of the tuple may not have changed. The same is done for the VAC table. The second transition constraint reads: for every combination of an EMP-tuple available in the start-state of the transaction and an EMP-tuple available in the end-state of the transaction, if these two tuples have the same value for their EMPNO attribute, then the VACATION_DAYS attribute may not have decreased. The third transition constraint states that for every VAC tuple that has been updated on its LAST_DAY attribute, there may not have existed another VAC tuple of the same Employee that was more recent. The fourth transition constraint states that for every VAC tuple that existed in the start-state but no longer exists in the end-state (ergo, it has been deleted), there may not have existed another VAC tuple of the same Employee that was more recent. Finally, the last transition constraint states that for every VAC tuple that exists in the end-state, but did not exist in the start-state (ergo, it has been inserted), there may not exist another VAC tuple of the same Employee that is more recent.
This completes the definition of our example database universe. Formalisation of constraints (Business Rules) reveals the class (scope) of each data constraint beyond doubt. It also shows for each constraint which attributes (columns) play a role. We need to know this before we can design procedural checks to maintain the validity of the constraint.
Transactions and checking time of constraints
SQL transactions
In the previous section, the term transaction was loosely introduced. We did not give a formal specification of which types of changes (units of work), in terms of database state changes, can be implemented by a transaction; it is beyond the scope of this paper to investigate the formalisation of transactions. Note that in the real world changes of our table variables are implemented using the Data Manipulation Language (DML) part of SQL, more precisely using Insert, Update or Delete statements. With a single DML-statement only one table can be changed at a time, and in only one manner too (it is either an Insert, or an Update, or a Delete). Typically, transactions need to change more than one table, and possibly also in more than one manner. For this reason, transactions (in the SQL world) must be allowed to consist of multiple DML-statements that are executed serially, one after the other. This fact brings us to the following question: the validity of all Business Rules must be upheld at the end of the transaction (given they were upheld at the beginning of the transaction), but what about the validity of Business Rules in between the execution of two DML-statements inside the same transaction? Should the rules be valid at that point in time, or could they be (temporarily) violated as long as they become valid again at the end of the transaction? Before answering this question, let us first have a closer look at the implications of temporary rule violations inside a transaction.
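For instance, a transaction that hires a new employee and immediately registers a first vacation period must consist of two serially executed DML-statements (a sketch using the example tables; the values are illustrative):

insert into emp(empno, ename, job, vacation_days)
values (1002, 'SMITH', 'SALESMAN', 20);

insert into vac(empno, first_day, last_day)
values (1002, to_date('07-JUL-2003','DD-MON-YYYY')
            , to_date('11-JUL-2003','DD-MON-YYYY'));

commit;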
Semantic optimisation
There are implications in the area of query correctness and optimisation when we allow rules to be violated. Typically, when we formulate queries, as Select-statements or as where-clauses of Delete and Update statements, we may (perhaps unknowingly) use Business Rule knowledge to more efficiently word a particular query. For instance:
“Give the number of employees that are on vacation at June 23rd 2003.”
SELECT 'Number of employees on vacation at June 23rd 2003: '||to_char(count(*))
FROM   VAC v
WHERE  to_date('23-JUN-2003','DD-MON-YYYY') between v.first_day and v.last_day;
Note how this (very simple) query uses the knowledge of the “no-directly-adjacent vacation records” Business Rule (a table constraint of VAC). This rule implies that VAC-records of the same employee will never overlap and therefore enables us to specify a count(*), instead of a count(distinct v.empno), which would be more expensive to evaluate (it requires an extra SORT operation).
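Without that knowledge, the query would have to count distinct employees to be guaranteed correct (the same query reworded; this is the more expensive variant the rule allows us to avoid):

SELECT 'Number of employees on vacation at June 23rd 2003: '||to_char(count(distinct v.empno))
FROM   VAC v
WHERE  to_date('23-JUN-2003','DD-MON-YYYY') between v.first_day and v.last_day;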
Using Business Rules knowledge to rewrite queries such that they are still correct, yet often more efficient to execute, is referred to as Semantic Optimisation. Semantic query optimisation, a whole research area in its own right, is about using semantic knowledge (read: Business Rules) of your admitted database states to rewrite queries and thereby facilitate more efficient query evaluation. Optimisers of currently commercially available database systems (including Oracle) offer virtually no support yet for this type of optimisation. This is no surprise at all, since these database systems have virtually no semantic knowledge of the rules that govern the contents of the tables. They will first need to support a much wider range of declarative Business Rules (assertions) before they can start adding semantic optimisation features to their optimisers.
This closer look at the implications of temporary rule violations during a transaction has taught us the following:
· Semantic optimisation is not allowed for queries submitted by the application in between DML-statements during the transaction.
Moreover the implications need to be extended to include the following:
· Semantic optimisation is not allowed for the query part (where clause) of successive DML-statements during the transaction.
· The optimiser, should it support semantic optimisation, must be prevented from optimising with semantic knowledge during the transaction. Or better, it should know which rules are potentially violated and exclude those from the optimisation process.
Note that we are never allowed to semantically optimise statements that are part of (self-written, procedural) code sections whose explicit goal it is to (re)check the validity of Business Rules. By their nature, these code sections will be executed at times at which Business Rules may very well not be upheld. It seems sensible, though, to assume that all constraints involved in a transaction are validated in an 'increasing scope-of-data' order. However, an order of validation of different constraints of the same class may never be assumed[8]. Therefore, code that checks tuple constraints may assume the validity of all attribute constraints. Code that checks table constraints may assume the validity of all attribute and tuple constraints. Code that checks database constraints may assume the validity of all table, tuple and attribute constraints. Finally, dynamic constraints may assume the validity of all static constraints.
Deferred checking
Now we go back to our original question: do we allow temporary rule violations inside a transaction? Judging from the above elaboration it seems wise to never allow this and to recheck every involved rule whenever a DML-statement has made a change inside a table. Thankfully, today's Active Database Systems (including Oracle) offer mature Event-Condition-Action mechanisms that enable us to indeed implement such a statement-level (immediate) recheck of potentially violated Business Rules. Therefore, we are inclined to never allow violations during an SQL transaction. However…
There is a class of Business Rules that causes problems and will always need to be temporarily violated during the transaction and rechecked in a deferred way, at the end of the transaction. One of the best-known examples of such a rule is the mandatory child rule: this is the case of a parent and child table with a referential integrity (foreign key) rule (RULE1) from the child to the parent, but also with a rule (RULE2) that states that every parent tuple must have at least one child tuple. Since, with SQL DML, we cannot insert both the parent and the child at the same time, a transaction that creates a new parent/child combination will have to issue two inserts, and depending on the order of the inserts, either the foreign key rule or the at-least-one-child rule must be temporarily violated. This example gives us the first reason why, in our SQL world, we have to deal with temporary rule violations.
DML-statements always operate on a single table
Given our set of rules, certain valid database state transitions may well require a DML-statement on more than one table. Since rules in the database constraint class have a data scope of multiple tables, there is a good possibility that a database constraint violation will be caused by a first DML-statement on one of the involved tables and that this violation will be corrected by a second DML-statement on one of the other involved tables. Note the use of the word 'possibility'. Some database state transitions will always need to temporarily violate a database constraint, regardless of the order in which the DML-statements within that transaction are executed. There are many database constraints, however, that never need deferred checking. The data conditioned by them can be kept clean at all times by issuing the different DML-statements in a specific order implied by the constraint. This is the case with, for instance, a foreign-key-only situation, where there is never a need for temporary violations as long as DML-statements are executed in the right order. In our little example above, we must choose either RULE1 or RULE2 to be checked in a deferred way, since the orders of DML-statements implied by these two rules conflict.
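As an aside: for the foreign key half of such a conflict, Oracle offers declarative deferred checking. A sketch (assuming hypothetical PARENT and CHILD tables; the at-least-one-child rule itself has no declarative counterpart and would still need procedural code):

create table child
( id        number(4,0) not null primary key
, parent_id number(4,0) not null
, constraint child_parent_fk foreign key (parent_id) references parent(id)
    deferrable initially deferred  -- checked at commit time, not per DML-statement
);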
The second shortcoming of the SQL transaction mechanism, and also a reason for deferred checking, is:
DML-statements always operate in a single way
They are either of type Insert, or of type Delete, or of type Update. A valid transition of data values within one table might, however, take more than one type of DML-statement to achieve, and thus possibly give rise to the need to allow temporary violations for table constraints too. Take for example the following (not very realistic) constraint: 'In the EMP table the number of Salesmen plus twice the number of Clerks must equal either one hundred or zero'. Let's look at the example transaction “add a Clerk” and assume that the EMP table currently holds one hundred Salesmen only (rendering our table constraint true). As soon as we introduce the new Clerk, either by updating JOB of an existing record or by inserting a new one, we will always introduce a violation and need a second, different type of DML-statement to restore the truth of the table constraint.
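A sketch of such a transaction (values illustrative): the insert first violates the constraint (100 + 2×1 = 102), after which a delete of two Salesmen restores it (98 + 2×1 = 100):

insert into emp(empno, ename, job, vacation_days)
values (1001, 'JONES', 'CLERK', 16);

delete from emp
where  job = 'SALESMAN'
and    rownum <= 2;

commit;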
This brings us to a revised answer to our original question whether rules should be allowed to be temporarily violated inside a transaction: given the world in which we live, where SQL is the vehicle to implement transactions, it depends on the rule or combination of rules. If a rule, or combination thereof, is such that hops from one valid database state to another valid database state always require more than one type of DML-statement on the same table, or always require changes in more than one table (with conflicting execution order), then temporary violations must be allowed[9].
Alternative implementation strategies
Before we discuss some different Business Rule implementation strategies, we will first take a high-level look at the overall code-architecture of an application system. Given the definition of a Business Rule, earlier in this paper, we will now introduce two other definitions: Business-Rule-Checking code (BRC-code) and Application code (APP-code).
BRC-code ≡ “All code that has been produced specifically to check the (continued) validity of Business Rules when transactions make changes to our persistent data.”[10]

APP-code ≡ “All other code.”[11]
Here are a few observations regarding the implementation strategy for these two types of code.
· BRC-code can be implemented in a pre-DML, or a post-DML way.
· Post-DML implemented BRC-code always sits in the (active) database server tier tucked away in table triggers.
· Pre-DML implemented BRC-code can sit either in the database server (stored PL/SQL), or in the (fat) client.
· APP-code can be implemented in the database server (stored PL/SQL) or in the client.
Let us clarify these observations a bit more. At the end of the eighties / early nineties, when client-server was hot and SQL databases were the rising stars (though they did not yet offer Active features), we used to implement all BRC-code in conjunction with the APP-code in the client tier. We even used to do this in a very spaghetti-like way: not clearly separating the two types of code from each other. Then, a bit further on, in '93-'94, when Active databases became available, we started distributing the BRC-code across the client and database server tiers. Many immediately adopted the functionality of implementing (post-DML) declarative constraints (a minor subset of all Business Rules) inside the database tier. Few started implementing triggers to validate Business Rules (post-DML). Others started moving APP-code (including the embedded pre-DML BRC-code) from the client to the database using features such as stored procedures. We even started replicating (manually) BRC-code across the tiers. Still, we were not doing any of this in a structured way; on the contrary: BRC-code was being distributed everywhere, which made maintenance of Business Rules a real nightmare. During the course of the nineties, existing central databases were made available to a variety of new client tools, some of them capable of transacting, and frequently circumventing existing (non-post-DML) BRC-code when doing so. This, roughly, is where everybody ended up at the start of the new millennium. From this past decade we may have learned the following lessons:
· Clearly separate server-side pre-DML BRC-code from server-side APP-code
· Clearly separate client-side BRC-code from client-side APP-code
· Prefer building robust[12] databases to avoid execution of transactions that are not checked by BRC-code
· Develop (and enforce) implementation guidelines regarding BRC-code
There is a growing awareness nowadays that the ability to efficiently maintain and manage BRC-code is becoming crucial. Application developers should cease doing both at the same time: building APP-code and BRC-code. Rather, development teams should be split up, with one team responsible for implementing the robust database (BRC-code) and the other team responsible for producing the application (APP-code). The need for a Rule Repository to support the robust database development team, and from which BRC-code can be generated (or at least maintained), is rising on the wish list of the Application Development manager. And of course vendors are responding to these demands: there are already over a dozen commercially available so-called Rule Engines. Every one of them does a different job with regard to generating BRC-code, and every one of them does it with proprietary repositories, different environments, diverse technologies and its own (coding) architectures.
The author strongly favours the post-DML method, which is by nature the most robust Business Rule implementation strategy[13]. Building a robust database ensures that transactions will never be allowed to corrupt the integrity of the data, no matter from what APP-code these transactions originate. Everybody (every application) will be forced to use the data in the same way. Knowing that the database state always complies with the Business Rules enables us to write more efficient SQL (be it ad-hoc queries, or queries embedded within the APP-code of our applications). The rest of the paper will fully explore the post-DML strategy and produce implementation guidelines. By fully exploring this implementation strategy we learn about the true challenges that are involved in implementing BRC-code (in our SQL world). Only when we are aware of the complexity involved in building a robust (post-DML) database will we be able to judge the third-party add-ons (Rule Engines).
How to implement Business Rules
Having come up with a classification scheme of data constraints, we can use it as the set-up for the implementation guidelines[14]. The following sections will cover guidelines for the implementation of data constraints at the server side. Again we point out that this is done under the assumption that SQL DML-statements will be issued directly from the application front-end (APP-code software) and that the chosen implementation strategy for the BRC-code is post-DML. The alternative strategy, where all tables are shielded by a layer of stored procedures (implementing pre-DML BRC-code) and only calls to these procedures are issued from the APP-code software, will not be discussed in this paper.
Oracle offers two ways to implement checks for data constraints in a post-DML way: declaratively or procedurally. In the declarative way we specify the constraints at create-table time, using special SQL DDL language constructs available within the “create table” command. Considering system overhead (performance) and ease of use and maintenance, the declarative way to implement constraint checks is to be highly preferred over the procedural way. Note however that only a (very) limited set of constraint types can be handled declaratively (as we will see shortly). In the procedural way, we create database triggers and (packaged) stored procedures that are called by these triggers, using the PL/SQL language. Each database trigger is associated with a table and will automatically be executed by Oracle if a DML-statement changes rows inside that table. The trigger code can then check whether the new database state is admitted or not. Oracle offers up to twelve different types of triggers per table, each having its own triggering time:
· Four Insert-triggers: one that fires before the Insert-statement starts executing (before statement), one that fires right before each row is inserted (before row), one that fires after each row is inserted (after row) and one that fires after completion of the Insert-statement (after statement).
· Four Update-triggers: same four types, before-statement, before-row, after-row, after-statement.
· Four Delete-triggers: same four types.
Only row-level triggers (before and after) are able to access the column values of the current row being inserted, updated, or deleted (consult the Oracle manuals for more information on this). Statement-level triggers can only see the full snapshot of the table as it existed before the start of the DML-statement and the resulting snapshot of the table as it exists after completion of the DML-statement (they cannot detect which rows have been affected by the DML-statement).
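A minimal sketch of both mechanisms (the trigger names, and the row-level check shown, are illustrative):

-- Row-level trigger: sees the column values of the affected row via :new/:old.
create or replace trigger emp_bru
before update on emp
for each row
begin
  if :new.vacation_days < :old.vacation_days then
    raise_application_error(-20000, 'VACATION_DAYS may only increase');
  end if;
end;
/

-- Statement-level trigger: fires once per Update-statement, sees only table snapshots.
create or replace trigger emp_aus
after update on emp
begin
  null;  -- would typically call a (packaged) procedure that rechecks a constraint
end;
/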
Guidelines for attribute constraints
Almost all attribute constraints can, and therefore should, be implemented as declarative check clauses using the column_constraint syntax of the create table command (also called the 'in line' check clause). The mandatory (not null) constraint should be implemented separately using the not null keywords. In our example the attribute constraints of the Employee table would be implemented as follows (see the check clauses):
Create Table Emp(
  empno         number(4,0)  not null check(empno>0),
  ename         varchar2(20) not null,
  job           varchar2(8)  not null check(job in ('MANAGER','CLERK','SALESMAN')),
  vacation_days number(3,0)  not null check(vacation_days>10));
Note that,

· All check clauses can optionally be given a name (not shown here), which is stored in the data dictionary and used in the error message in case the check fails.

· If check conditions evaluate to unknown due to NULL values, Oracle will treat them as if they evaluated to TRUE. The opposite happens in PL/SQL, where a Boolean expression evaluating to unknown is handled as FALSE! The possible presence of NULL values must therefore be taken into account when implementing these constraints at the client side too (for instance in Oracle*Forms), where PL/SQL is used.

· Check clauses cannot contain references to the standard functions/pseudocolumns sysdate, uid, user, userenv, currval, nextval, level or rownum. Neither can they contain calls to stored Boolean functions created by us (even when these functions do not reference any database objects and can be considered DETERMINISTIC[15]). If such calls are needed, an insert and update row trigger should be created to handle the constraint checking. Since it seems good practice to try to trap constraint violations as early as possible, before-row triggers would be preferred in these cases, as sketched below.
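For example, a (hypothetical, purely illustrative) attribute rule 'FIRST_DAY of a vacation record may not lie in the past' references sysdate and therefore cannot be a check clause; a before-row trigger sketch:

create or replace trigger vac_first_day_brui
before insert or update of first_day on vac
for each row
begin
  if :new.first_day < trunc(sysdate) then
    raise_application_error(-20001, 'FIRST_DAY may not lie in the past');
  end if;
end;
/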
There should never be a need to implement attribute constraints in a deferred way: i.e. we never allow temporary violations for this class of constraints.
Guidelines for tuple constraints
Almost all tuple constraints can, and therefore should, be implemented as declarative check constraints using the table_constraint syntax of the create table command (also called the 'out of line' check clause). In our example the tuple constraint of the Employee table would be implemented as follows (see the last check clause):
Create Table Emp(
  empno         number(4,0)  not null <empno attribute constraints>,
  ename         varchar2(20) not null <ename attribute constraints>,
  job           varchar2(8)  not null <job attribute constraints>,
  vacation_days number(3,0)  not null <vacation_days attribute constraints>,
  check(job<>'MANAGER' or vacation_days>15));
Here we must transform (rewrite) the original tuple constraint specification, because the implication operator (⇒) cannot be implemented directly. The notes on check constraints mentioned in the previous section also apply here. Checks for tuple constraints should also never need a deferred implementation. If we had had more than one tuple constraint, we could have implemented all of them using only one check clause containing the conjunction (logical AND) of the check clauses for each separate tuple constraint. However, it seems better to allocate a separate check clause to each Boolean expression. By doing so (and if we also supply each check clause with a meaningful name), we enable Oracle to give a more detailed error message in case one of them fails validation[16].
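A sketch of supplying such a meaningful name (the constraint name is illustrative), here added after table creation:

alter table emp add constraint emp_mgr_vacation_days_ck
  check (job <> 'MANAGER' or vacation_days > 15);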
Guidelines for table constraints
So far, everything has been straightforward concerning the implementation of data constraints. It is here, at the table constraint level, that things rapidly become more complex. This is because only two types of table constraints can be implemented declaratively: uniqueness of attributes and simple subset requirements (here we are referring to a foreign key within one table, in which case the foreign key is a table constraint and not a database constraint). Since sets of tuples (queries) cannot be used as operands of check conditions, the declarative check clause mechanism cannot be used for table constraints. Thus, all other table constraints must be dealt with in a procedural way. In our example the table constraints of the Employee table that can be done declaratively would be implemented as follows:
Create Table Emp(
  empno         number(4,0)  not null <empno attribute constraints>,
  ename         varchar2(20) not null <ename attribute constraints>,
  job           varchar2(8)  not null <job attribute constraints>,
  vacation_days number(3,0)  not null <vacation_days attribute constraints>,
  <Emp tuple constraints>,
  primary key (empno));
We must implement procedural checks (using table triggers and PL/SQL) for all other types of table constraints, for instance for the table constraints of the VAC table: “no directly adjacent vacation records of the same employee allowed” and “never more than ten employees on vacation on the same day”. Our aim is to find the minimal checks needed to verify whether a new database state caused by a transaction is allowed or not. Finding and implementing this minimal check is a multi-step process. Let us investigate this in more detail for the “no directly adjacent vacation records” constraint.
Designing BRC-code for table constraints: step 1, identifying violating statements
First, we should find out for which types of changes constraint checks need to be implemented. For this we need to analyse every possible DML-statement that involves any (combination of) attribute(s) of the VAC table that play(s) a role in the table constraint. By scanning the constraint text inside the table universe definition w-VAC, we conclude that the EMPNO, FIRST_DAY and LAST_DAY attributes play a role. Given these three attributes, we can now generate the following list of all DML-statements that can possibly violate the table constraint and thus may require BRC-code to be executed[17].
· Insert of a new row (supplying values for all three mandatory attributes).
· Delete of an existing row.
· Several Updates of an existing row[18]:
1. Update that changes EMPNO attribute only
2. Update that changes FIRST_DAY attribute only
3. Update that changes LAST_DAY attribute only
4. Update that changes both EMPNO and FIRST_DAY attributes
5. Update that changes both EMPNO and LAST_DAY attributes
6. Update that changes both FIRST_DAY and LAST_DAY attributes
7. Update that changes all three attributes.
In order to find minimal checks, we will look at each different type of update statement generated in the above list.
Designing BRC-code for table constraints: step 2, deriving the minimal check
We can now proceed and formally derive the minimal check for each case of DML-statement identified. For instance, let's take the case of the Insert-statement: we can derive the minimal needed check by substituting the table value T inside the specification of the table constraint with the same table value T extended with the new tuple (say u). Next, we rewrite the expression according to the laws available to us within the Predicate Calculus, assuming the rule was upheld prior to the transaction, to deduce the minimal check.
"s,tÎTÈ{u}:
(s(EMPNO)=t(EMPNO)Ùs(FIRST_DAY)<t(FIRST_DAY))Þ½{dÎWORKDATE|
s(LAST_DAY)<d<t(FIRST_DAY)}½>0
º /* Splitting domains of the
universal quantification operator variables */
"s,tÎT:
(s(EMPNO)=t(EMPNO)Ùs(FIRST_DAY)<t(FIRST_DAY))Þ½{dÎWORKDATE|
s(LAST_DAY)<d<t(FIRST_DAY)}½>0
Ù
"s=u, tÎTÈ{u}:
(s(EMPNO)=t(EMPNO)Ùs(FIRST_DAY)<t(FIRST_DAY))Þ½{dÎWORKDATE|
s(LAST_DAY)<d<t(FIRST_DAY)}½>0
Ù
"sÎTÈ{u},
t=u: (s(EMPNO)=t(EMPNO)Ùs(FIRST_DAY)<t(FIRST_DAY))Þ½{dÎWORKDATE|
s(LAST_DAY)<d<t(FIRST_DAY)}½>0
Ù
"s=u, t=u: (s(EMPNO)=t(EMPNO)Ùs(FIRST_DAY)<t(FIRST_DAY))Þ½{dÎWORKDATE|
s(LAST_DAY)<d<t(FIRST_DAY)}½>0
º /* Substitution of free
variables by u (last 3 cases), table-constraint was valid prior to TX-start (1st
case) */
true
Ù
"tÎTÈ{u}:
(u(EMPNO)=t(EMPNO) Ù
u(FIRST_DAY)<t(FIRST_DAY)) Þ ½{dÎWORKDATE|
u(LAST_DAY) < d < t(FIRST_DAY)}½>0
Ù
"sÎTÈ{u}:
(s(EMPNO)=u(EMPNO) Ù
s(FIRST_DAY)<u(FIRST_DAY)) Þ ½{dÎWORKDATE|
s(LAST_DAY) < d < u(FIRST_DAY)}½>0
Ù
(u(EMPNO)=u(EMPNO) Ù
u(FIRST_DAY)<u(FIRST_DAY)) Þ ½{dÎWORKDATE|
u(LAST_DAY) < d < u(FIRST_DAY)}½>0
º /*
(u(FIRST_DAY)<u(FIRST_DAY))ºfalse, (falseÞX)ºtrue, (falseÙX)ºfalse, (trueÙX)ºX (twice) */
"tÎTÈ{u}:
(u(EMPNO)=t(EMPNO) Ù
u(FIRST_DAY)<t(FIRST_DAY)) Þ ½{dÎWORKDATE|
u(LAST_DAY) < d < t(FIRST_DAY)}½>0
Ù
"sÎTÈ{u}:
(s(EMPNO)=u(EMPNO) Ù
s(FIRST_DAY)<u(FIRST_DAY)) Þ ½{dÎWORKDATE|
s(LAST_DAY) < d < u(FIRST_DAY)}½>0
Let us examine the first part of this expression further:

∀t ∈ T∪{u}:
  (u(EMPNO)=t(EMPNO) ∧ u(FIRST_DAY)<t(FIRST_DAY)) ⇒
    |{d ∈ WORKDATE | u(LAST_DAY) < d < t(FIRST_DAY)}| > 0

≡ /* Splitting domains of the universal quantification operator variable */

∀t ∈ T:
  (u(EMPNO)=t(EMPNO) ∧ u(FIRST_DAY)<t(FIRST_DAY)) ⇒
    |{d ∈ WORKDATE | u(LAST_DAY) < d < t(FIRST_DAY)}| > 0
∧
∀t=u:
  (u(EMPNO)=t(EMPNO) ∧ u(FIRST_DAY)<t(FIRST_DAY)) ⇒
    |{d ∈ WORKDATE | u(LAST_DAY) < d < t(FIRST_DAY)}| > 0

≡ /* Substitution of the free variable by u: the 2nd case again evaluates to true */

∀t ∈ T:
  (u(EMPNO)=t(EMPNO) ∧ u(FIRST_DAY)<t(FIRST_DAY)) ⇒
    |{d ∈ WORKDATE | u(LAST_DAY) < d < t(FIRST_DAY)}| > 0

≡ /* Rewrite of universal to existential quantification, ¬(A⇒B) ≡ (A∧¬B) */

¬∃t ∈ T: u(EMPNO)=t(EMPNO) ∧ u(FIRST_DAY)<t(FIRST_DAY) ∧
  |{d ∈ WORKDATE | u(LAST_DAY) < d < t(FIRST_DAY)}| = 0

≡ /* Elementary set theory, special Add_WD-operator for the WORKDATE domain */

¬∃t ∈ T: u(EMPNO)=t(EMPNO) ∧ u(FIRST_DAY)<t(FIRST_DAY) ∧
  Add_WD(u(LAST_DAY),2) > t(FIRST_DAY)
In the expression above, the Add_WD operator is the special-case addition operator for the WORKDATE domain, which can be defined as follows:
For d a WORKDATE and x a non-negative integer (depicting a number of days):

  If x ≥ 5:      Add_WD(d,x) := Add_WD(d+7, x-5)
  If 1 ≤ x ≤ 4:  Add_WD(d,x) := Add_WD(d+1, x-1), if d is not a FRIDAY
                 Add_WD(d,x) := Add_WD(d+3, x-1), if d is a FRIDAY
  If x = 0:      Add_WD(d,x) := d
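For illustration, this definition translates almost one-to-one into a recursive PL/SQL function; a sketch, assuming WORKDATE maps onto the Oracle DATE type (weekends only; holidays ignored) and that d itself is a working day:

create or replace function Add_WD(d in date, x in number) return date
deterministic
is
begin
  if x >= 5 then
    return Add_WD(d + 7, x - 5);   -- five working days span seven calendar days
  elsif x >= 1 then
    if to_char(d, 'DY', 'nls_date_language=english') = 'FRI' then
      return Add_WD(d + 3, x - 1); -- from Friday, the next working day is Monday
    else
      return Add_WD(d + 1, x - 1);
    end if;
  else
    return d;                      -- x = 0: nothing to add
  end if;
end;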
In a similar way, we can rewrite the second part of the expression to:
¬∃s ∈ T: s(EMPNO)=u(EMPNO) ∧ s(FIRST_DAY)<u(FIRST_DAY) ∧ Add_WD(s(LAST_DAY),2) > u(FIRST_DAY)
The conjunction of these two expressions can easily be translated to SQL, given the new tuple u, as follows:
not exists(select 'violation at lastday-end of u'
             from vac t
            where t.empno = <empno of u>
              and t.first_day > <first_day of u>
              and Add_WD(<last_day of u>,2) > t.first_day)
and
not exists(select 'violation at firstday-end of u'
             from vac s
            where s.empno = <empno of u>
              and s.first_day < <first_day of u>
              and Add_WD(s.last_day,2) > <first_day of u>)
This minimal check can then be implemented in a before row-insert trigger on the VAC table. The trigger-body can either do a ‘select … from dual where <above SQL-expression>’ and raise an error in case no row is fetched, or declare two cursors (one for each exists-subquery), sequentially fetch them and if one of them actually returns a row, then raise an error. Before discussing this further in terms of actual implementation guidelines, let’s first examine the other cases (delete and 7 updates) that might violate our VAC table constraint.
Delete.
In a similar way as with the Insert-statement, we can formally derive that nothing needs to be checked for this table constraint whenever a row is deleted.
Once the checks for the Insert-statement and the Delete-statement have been derived, the checks for all update statements could be deduced by using the following technique: think about every Update-statement as a Delete-statement of the involved row, directly followed by an Insert-statement of the row as it would have existed after the Update-statement. The check necessary for the Update-statement is then the same as the check for the Delete-statement, followed by the check for the Insert-statement. However, in this case the constraint involved must be allowed to be temporarily violated, since what the original Update-statement might have (validly) effected using only one DML-statement, is now simulated using two DML-statements. In between these two statements, the constraint might very well be temporarily violated[19], so long as it is valid again after the Insert-statement. If constraints are such that nothing needs to be checked in the case of a Delete-statement then temporary violations never occur using this technique.
Update that changes EMPNO attribute.
This update changes the empno column only. It can be viewed as a delete of a VAC-record for one employee, followed by an insert of the same vacation period for another employee. Since the delete requires no check, this update needs the same check as derived for the Insert-statement.
Update that changes FIRST_DAY attribute.
Again, here the same check might apply as for the Insert-statement.
Update that changes LAST_DAY attribute.
Again, the same check applies as for the Insert-statement.
Intermezzo 3
The technique of regarding an Update as a Delete followed by an Insert introduces a flaw in designing the minimal BRC-code. Further optimisation of BRC-code may sometimes be achieved, in the case of Update-statements, by taking into account not only which columns have changed, but also how they have changed (i.e. by also comparing the database state as it existed prior to the transaction with the state in which the transaction has resulted[20]). Given the constraint at hand, we can deduce that if either first_day has been incremented, or last_day has been decremented (and empno has not changed), no check is necessary at all.
End-Intermezzo 3
Update that changes EMPNO and FIRST_DAY attributes.
Can be viewed as a delete followed by an insert, thus needing the same check as the Insert case.
Update that changes EMPNO and LAST_DAY attributes.
Can be viewed as a delete followed by an insert, thus needing the same check as the Insert case.
Update that changes FIRST_DAY and LAST_DAY attributes.
Applying the optimisation introduced in Intermezzo 3 above, we only need to check if either FIRST_DAY has been decreased (check only for violation at firstday-end), or LAST_DAY has been increased (check only for violation at lastday-end).
Update that changes all three attributes.
Can be viewed as a delete followed by an insert, thus needing the same check as the Insert case.
Designing BRC-code for table constraints: step 3, building the triggers
In summary, all cases need similar BRC-code. We will implement the execution of this code via the table triggers available in Oracle. These triggers will call a boolean stored function, say not_directly_adjacent, which we need to build. This function accepts as input parameters the new column values, as well as an indication whether expansion to the Left (decrement of first_day), Right (increment of last_day) or Both sides has occurred. The function returns true if no directly adjacent records are found, and false otherwise. It computes this return value by executing the minimal check derived above. The source of this function could be as follows (in some pseudo-code):
Function not_directly_adjacent(in empno, in first_day, in last_day, in indicator) out boolean
Begin
  if indicator=left
  then if not-exists(violation at firstday-end)
       then return true;
       else return false;
       end if;
  elsif indicator=right
  then if not-exists(violation at lastday-end)
       then return true;
       else return false;
       end if;
  elsif indicator=both
  then if not-exists(violation at lastday-end)
          and not-exists(violation at firstday-end)
       then return true;
       else return false;
       end if;
  end if;
End;
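By way of illustration, this pseudo-code could be realised in PL/SQL roughly as follows; a sketch assuming the Add_WD function shown earlier (note that, when called from a row trigger, it will still run into the mutating table problem discussed below):

create or replace function not_directly_adjacent(
  p_empno     in number,
  p_first_day in date,
  p_last_day  in date,
  p_indicator in varchar2)  -- 'LEFT', 'RIGHT' or 'BOTH'
return boolean
is
  l_dummy varchar2(1);
  --
  function violation_at_firstday_end return boolean is
  begin
    select 'x' into l_dummy
      from vac s
     where s.empno = p_empno
       and s.first_day < p_first_day
       and Add_WD(s.last_day, 2) > p_first_day
       and rownum = 1;
    return true;
  exception
    when no_data_found then return false;
  end;
  --
  function violation_at_lastday_end return boolean is
  begin
    select 'x' into l_dummy
      from vac t
     where t.empno = p_empno
       and t.first_day > p_first_day
       and Add_WD(p_last_day, 2) > t.first_day
       and rownum = 1;
    return true;
  exception
    when no_data_found then return false;
  end;
begin
  if p_indicator in ('LEFT', 'BOTH') and violation_at_firstday_end then
    return false;
  end if;
  if p_indicator in ('RIGHT', 'BOTH') and violation_at_lastday_end then
    return false;
  end if;
  return true;
end;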
The trigger ensuring execution of this check (in a post-DML way) in case of Insert- and Update-statements could be coded as follows (again in pseudo-code):
Trigger vac1 fires before insert or update on vac for each row
Begin
  if inserting or updating empno or (decrementing firstday and incrementing lastday)
  then if not not_directly_adjacent(empno,first_day,last_day,both)
       then raise error;
       end if;
  elsif decrementing firstday
  then if not not_directly_adjacent(empno,first_day,last_day,left)
       then raise error;
       end if;
  elsif incrementing lastday
  then if not not_directly_adjacent(empno,first_day,last_day,right)
       then raise error;
       end if;
  end if;
End;
Now we hit upon a big problem. The above implementation calls the not_directly_adjacent function in the row insert or update trigger of the VAC table, supplying it with the new values for EMPNO, FIRST_DAY and LAST_DAY and the expansion indicator as input parameter values. Since this function has to select from the VAC table[21] in order to determine its return value, and since the (contents of the) VAC table is mutating at that point in time, Oracle will issue the following error:
ORA-04091: table VAC is mutating, trigger/function may not see it.
This is a fundamental problem when implementing checks for table constraints, since these constraints refer to values of other tuples in the same table. The Boolean check simply cannot be done using for-each-row triggers because of the above-mentioned error (by the way, the error is very valid: reading a table while it is undergoing DML-statement execution gives rise to non-deterministic behaviour). Nor can we use the after-statement trigger, which does not have the mutating table problem, since that trigger no longer has access to the column values of the specific row(s) affected by the DML-statement (via :new.<column> and :old.<column> expressions). Using an after-statement trigger, we would have to resort to always running a much more elaborate check (in fact, the maximal check) for all records in the table, which is, of course, unacceptable.
The solution to this problem is to transfer the knowledge of which rows have been affected from row-trigger execution time to after-statement-trigger execution time. This can be achieved either by using a persistent packaged table variable, or by using a session temporary (global) table. Either one can be used to temporarily store the input parameter values for the Boolean function calls[22]. The constraint is then checked by a combination of a row-level trigger and a statement-level trigger: the row-level trigger fills the persistent table variable, or the session temporary table, with the parameters required by the Boolean check function; the statement-level trigger reads them back and executes the minimal check for the affected rows by calling the Boolean function. A sketch of this combination follows below.
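This sketch uses names of our own (vac_affected, vac_bri, vac_ais), always requests the Both-sides check, and ignores the update optimisations of Intermezzo 3:

create global temporary table vac_affected(
  empno number(4,0), first_day date, last_day date)
on commit delete rows;

create or replace trigger vac_bri
before insert or update on vac
for each row
begin
  -- record the affected row; reading VAC itself is not allowed here
  insert into vac_affected values (:new.empno, :new.first_day, :new.last_day);
end;

create or replace trigger vac_ais
after insert or update on vac
begin
  for r in (select empno, first_day, last_day from vac_affected) loop
    if not not_directly_adjacent(r.empno, r.first_day, r.last_day, 'BOTH')
    then raise_application_error(-20001,
           'Directly adjacent vacation records for employee '||r.empno);
    end if;
  end loop;
  delete from vac_affected;  -- clear the context for the next statement
end;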
Designing BRC-code for table constraints: step 4, catering for concurrency
When we procedurally check our constraints, we must not forget to also investigate which rows need to be locked once we have successfully checked a constraint but have not yet committed. We must do this rather complex investigation because multiple users can be accessing our table at the same time. The general rule here is that we should lock all data that we used inside our Boolean check function to compute the Boolean return value true or false. Nobody else should be allowed to change the data that has been used to check the constraint, until we have committed our current transaction. Note that in cases where the Boolean return value depends on the non-existence of certain rows, this general rule does not help us: for how can we lock data that does not exist? The not-directly-adjacent constraint is such a case: our BRC-code checks that rows do not exist; yet, other transactions may be inserting new rows (not yet committed and therefore invisible to us) at the same time. For instance: assume we only have one EMP record (EMPNO=1) and no VAC records yet.
1. User A issues 'insert into vac values(1,sysdate,sysdate+5)', which succeeds: it is the first record in VAC.
2. User B issues 'insert into vac values(1,sysdate+6,sysdate+10)', which succeeds: it too is the first record in VAC[23].
3. User A commits (always succeeds).
4. User B commits (always succeeds).
We now have directly adjacent vacation records for employee 1. User B should not have been allowed to maintain vacation records related to employee 1, until user A had committed his transaction. This case can be resolved by always locking the parent employee record when vacation records are maintained. Acquiring this (exclusive) lock should be the first step of the BRC-code. If such a lock cannot be acquired, we can choose to either allow a certain wait time (using the select…for update wait… construct of SQL), or directly generate an error (using for update nowait) indicating that another transaction is currently maintaining VAC-records for this employee.
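For illustration, acquiring that parent lock as the first step of the BRC-code could look as follows (a sketch using the nowait variant; l_dummy is a hypothetical local variable):

select e.empno
  into l_dummy
  from emp e
 where e.empno = <empno of affected VAC record>
   for update nowait;  -- fails immediately with ORA-00054 if another
                       -- transaction holds a lock on this employee record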
One closing remark on acquiring locks to fix the concurrency issue. Oracle provides us with only one type of rowlock: the exclusive rowlock. However, rowlocks to prevent others from changing data that has been used for a constraint check (the general rule above) need not be exclusive (write) rowlocks. Read rowlocks would suffice in this case and offer much better concurrency[24]: they allow multiple transactions to read this data in order to check a constraint (all acquiring read rowlocks on the same records), yet disallow any transaction to change this data (acquire an exclusive rowlock). Since read rowlocks are not available to us in Oracle, we must choose between obtaining write rowlocks and obtaining a table-level read lock to get the job done (the first option is clearly the lesser evil). The dbms_lock package, which enables us to create and use our own application locks (in various modes, amongst which also the row share mode), can solve this problem. We would like to call the dbms_lock.allocate_unique procedure from the table trigger to allocate a named lock[25]. However, dbms_lock.allocate_unique issues an implicit commit, which is not allowed within the context of trigger code. A way around this, of course, is to create a wrapper procedure around the allocate_unique call and define this procedure to be AUTONOMOUS (using the pragma autonomous_transaction); a sketch follows below. Note that using dbms_lock to address the issue of Oracle not supporting a read rowlock forces us to allocate a lock for each row ever involved in BRC-code queries. The author is still investigating whether dbms_lock will scale at all, since potentially very many application locks will be created over the course of time as transactions continue to change (different) rows. It would be much nicer if Oracle supported acquiring read rowlocks, through some new language construct, say consistent read, in the Select-statement as follows:
Select empno
From emp
Where empno = <bind variable>
For consistent read  -- acquires a read rowlock, as opposed to "for update",
                     -- which acquires an exclusive rowlock
Once this is available, the use of dbms_lock can be restricted to those transactions firing BRC-code that depends on the non-existence of data: by using it, we can serialize these transactions.
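Returning to the dbms_lock workaround mentioned above, the autonomous wrapper around allocate_unique could be sketched as follows (the procedure name is ours):

create or replace procedure allocate_lock_handle(
  p_lockname   in  varchar2,
  p_lockhandle out varchar2)
is
  pragma autonomous_transaction;
begin
  -- allocate_unique commits, but only inside this autonomous transaction,
  -- so the triggering transaction remains unaffected
  dbms_lock.allocate_unique(p_lockname, p_lockhandle);
end;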
Having extensively investigated the issues around implementing BRC-code for table constraints, we will now pick up some speed and continue with the database and dynamic constraints in a less elaborate mode.
Guidelines for database constraints
Of all possible database (multi-table) constraints, only simple subset requirements (foreign keys) between different tables can be handled declaratively. All other database constraints must be dealt with in a procedural way. In our example the database constraints that can be done declaratively would be implemented as follows:
Create Table Vac(
  empno number(4,0) not null <empno attribute constraints>,
  first_day date not null <first_day attribute constraints>,
  last_day date not null <last_day attribute constraints>,
  <vac tuple constraints>,
  <vac declarative table constraints>,
  foreign key (empno) references emp(empno));
The other database constraint (vacation taken may not exceed the maximum allowed) needs similar treatment to the table constraints. Here we already have 14 cases to investigate (2 columns of Emp and all columns of Vac play a role):
1. Insert of Vacation record.
2. Delete of Vacation record.
3. Update of Empno column in Vac.
4. Update of First_day column in Vac.
5. Update of Last_day column in Vac.
6. Update of Empno and First_day columns in Vac.
7. Update of Empno and Last_day columns in Vac.
8. Update of First_day and Last_day columns in Vac.
9. Update of all columns in Vac.
10. Insert of Employee record.
11. Deletion of Employee record.
12. Update of Empno column in Emp.
13. Update of Vacation_days column in Emp.
14. Update of Empno and Vacation_days columns in Emp.
Short descriptions of each case:
1. This increases the number of vacation days. Check the quota for this employee.
2. No check needed.
3. Consider this as a delete of vacation for :old.empno, followed by an insert of vacation for :new.empno. Check the quota for :new.empno.
Similar checking as for case 3 should be done for cases 6, 7 and 9 (they all update empno).
4. Only if first_day is decreased need we check.
5. Similar to 4, only if last_day is increased need we check.
8. Similar to cases 4 and 5, only if the number of days between first_day and last_day increases, need we check.
10. A newly inserted employee has no vacation records yet. Need not check.
11. We allow deletion of employee records only if no vacation records exist. In that case we need not check.
12. This one is tricky. Based on cases 10 and 11, plus viewing this case as a delete followed by an insert, we might at first deduce that no check is necessary. However, certain multi-row updates do cause situations where checks should be done. For instance: the Emp table holds 3 employees (say empno 10, 20 and 30) and vacation information is available only for employees 20 and 30. Issuing 'update emp set empno=empno+10' will not cause a foreign key violation (the first database constraint), but does create a situation where checks for the 'vacation quota' database constraint should be run.
13. Only if the quota is decreased, need we check.
14. Similar to case 12.
In a very similar way as was the case for table constraints, table triggers can be built, and concurrency needs to be investigated. By way of illustration, a sketch of the check for case 1 follows below.
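This sketch uses names of our own (check_vacation_quota, and a hypothetical helper function workdays_between that counts the WORKDATEs in a closed interval); it is to be called at statement-trigger time for each affected employee:

create or replace procedure check_vacation_quota(p_empno in number)
is
  l_quota emp.vacation_days%type;
  l_taken number;
begin
  select e.vacation_days into l_quota
    from emp e
   where e.empno = p_empno;
  select nvl(sum(workdays_between(v.first_day, v.last_day)), 0) into l_taken
    from vac v
   where v.empno = p_empno;
  if l_taken > l_quota then
    raise_application_error(-20002,
      'Vacation quota exceeded for employee '||p_empno);
  end if;
end;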
Guidelines for dynamic constraints
There are no declarative constructs for handling dynamic constraints. They must always be handled procedurally. The specification of dynamic constraints tends to be an area on which we spend too little time. If application designers specify them at all, they are rather simple, both in nature and in implementation. As mentioned earlier, row triggers can reference old and new state of rows through the :old and :new keywords. Looking at one of the example dynamic constraints given in our database, we can implement it by creating one pre-update row trigger:
create trigger dyn1 before update on emp for each row
when (new.empno = old.empno)
begin
  if :old.vacation_days > :new.vacation_days
  then raise_application_error(-20000,'Cannot decrease vacation quota.');
  end if;
end;
In a very similar way, the update of unique columns can be prevented too (see the sketch below). If, due to the mutating table issue, the check must be delayed to the statement trigger, then information about the affected rows must again be transferred to that statement trigger (using package variables, or session temporary tables). This is the case for the latter three transition constraints inside the definition of VEX.
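A minimal sketch of such a trigger, preventing updates of the (unique) empno column (the trigger name is ours):

create or replace trigger dyn2 before update of empno on emp for each row
begin
  if :new.empno <> :old.empno
  then raise_application_error(-20001,'Cannot update empno.');
  end if;
end;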
The author believes that in the area of dynamic constraints, a further sub classification is highly desirable to enable a more detailed (i.e. per subclass) set up for the implementation guidelines. Such useful sub classification remains to be investigated.
Guidelines for deferred checking
Earlier in this paper, we have concluded that since transactions (in our SQL world) will consist of multiple DML-statements, serially executed after each other, we may need to temporarily allow violations of constraints. We also concluded that if a DML-statement, say DML1, introduces a violation within a transaction, there must follow another DML-statement, say DML2, which corrects the violation (introduced by DML1) prior to the end of the transaction. Ideally, we would have liked to queue the execution of the BRC-code that fires for DML1, such that it will be (re)executed when the APP-code submits the commit. If the violation is detected for a second time, commit processing should be halted. This might have been possible with deferrable triggers. Unfortunately, Oracle does not offer us triggers that can be fired in a deferred way (right before commit time). Nevertheless, there is another magic bullet…
With the advent of Oracle8i, it has become possible to implement every constraint that needs deferred checking in a post-DML way. Let us elaborate on how this can be accomplished and reveal the magic bullet. Suppose we find that a certain (table or database) constraint, say C1, needs to be implemented in a deferred way. We will design BRC-code for C1 in the usual way following the process described in the “Guidelines for table constraints” section. However, the Boolean functions (in combination with the triggers) will now register the fact that a violation has occurred in a central table, say BR_violations, as opposed to raising an error when they detect such violation.
A generic constraint-violation table can be set up as follows (the deferred check clause reveals the magic bullet):

create global temporary table BR_violations(
  constraint_name varchar2(10) not null
    check(upper(constraint_name) = constraint_name),
  data_key varchar2(40) not null,
  constraint no_violations_at_commit check(0=1) deferrable initially deferred,
  primary key (constraint_name, data_key))
on commit preserve rows;
This table offers the ability to maintain multiple violations per constraint[26]. Since, throughout this paper, we have been seeking to implement minimal checks, we also want to register violations at the lowest possible level. With this we mean: instead of registering "Constraint so-and-so has been violated", we will want to register "Constraint so-and-so has been violated for case so-and-so". By doing so, we will be able to (after a recheck) remove the registered violations on a case-by-case basis. Registering this lower level is possible for every table and database constraint that can be specified by means of a universal quantification (∀-operator). Typically, the major part of all table and database constraints is formulated this way. If a constraint is not formulated that way, then we can only register the high level ("Constraint so-and-so has been violated"): a case does not apply then. The data_key column in the BR_violations table is meant to identify the case (or hold a default value when this does not apply). Note that the table is a global temporary table (every session has its own copy of such a table). Also, note the check clause. This check clause always evaluates to false (0=1). In addition, it is implemented as initially deferred. What this does is allow records to be inserted during a transaction, yet disallow any of those to remain present in the BR_violations table at commit-time! Let's go back to the BRC-code of our C1 constraint (which was to be deferred). We need to also design this BRC-code such that it deletes entries in the BR_violations table if it computes that the current DML-statement is valid. This ensures that successive DML-statements will remove a violation entry that was introduced by an earlier DML-statement, but is now fixed. Then, only if the transaction is such that the consecutive execution of its DML-statements renders the BR_violations table empty again, will a commit of the transaction succeed[27].
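For illustration, the registering and clearing logic inside such BRC-code could look as follows; a sketch, where rule_holds_for is a hypothetical Boolean function performing the minimal check for one case, and l_data_key identifies that case:

if not rule_holds_for(l_data_key) then
  begin
    insert into BR_violations(constraint_name, data_key)
    values ('C1', l_data_key);
  exception
    when dup_val_on_index then null;  -- this case was already registered
  end;
else
  -- the current DML-statement leaves this case valid: clear any earlier entry
  delete from BR_violations
   where constraint_name = 'C1' and data_key = l_data_key;
end if;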
A final remark on deferred checking
When BRC-code detects a constraint violation (for a non-deferred constraint) and generates a specific error (through a raise_application_error call), this error causes the triggering DML-statement that was executing to be rolled back. Previously executed DML-statements within the same transaction will not be rolled back: this is called statement-level rollback. However, when a commit detects that a deferred check is still violated, Oracle will roll back the entire transaction. This is a bit unpleasant, because we will also lose the entry in the BR_violations table that would have enabled us to report the specific error back to the APP-code. It would have been nicer if Oracle would just have the commit fail and not roll back anything.
Bringing it all together: an implementation framework
Having addressed various issues involved in designing and coding BRC-code for table, database and, potentially, dynamic constraints, we now gear our thoughts towards an implementation framework. With the lessons learned in the past decade regarding implementing Business Rules, this framework must be such that: a) the set of all rules must be manageable (easy to add, delete or even temporarily disable Rules) and b) BRC-code for a given Rule must be maintainable (easy to locate and all in one place, organised the same way). Note that the above two requirements are easily met for constraints that have a declarative implementation: these constraints are all documented inside the Data Dictionary, and Oracle provides us with various DDL-statements to maintain them. Our framework must specifically address these requirements for constraints whose BRC-code is procedural. A framework is currently under development. Its (high-level) architecture can be described as follows:
· For every table, database or dynamic constraint that requires a procedural implementation, all BRC-code is consolidated into one PL/SQL package[28] (per constraint).
· Such (BRC) package has one externalised procedure, say check_rule, whose task it is to perform the minimal check necessary, and depending on the nature of the constraint, raise an error or register a (temporary) violation when the resulting database state is not admitted.
· The check_rule procedure has no parameters: it must be able to determine the triggering DML-statement including the affected rows (and how they changed in case of updates) using some context that is available.
· Row triggers are responsible for creating the context that enables the check_rule procedure to determine the triggering DML-statement and affected rows. Such context is often called the transition effect of a triggering DML-statement. [WF90] provides us with a formal definition of the transition effect. In short, this context holds the tuples that have been inserted, deleted, or updated in the database. In a straightforward way we can develop row triggers to maintain the transition effect inside global temporary tables (see the sketch after this list).
· Statement triggers are responsible for firing the check_rule procedure of all potentially affected Rules (in some intelligent order). By always performing the checks at statement-trigger-time, we get around the infamous mutating table error.
· Preferably, row and statement triggers are designed such that Rules can be added, deleted, changed or disabled without requiring changes to the trigger code. In other words, regardless of the set of rules that have currently been specified and implemented, the trigger bodies should always contain the same (generic) code.
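For illustration, the transition effect of DML-statements against the VAC table could be maintained as follows; a sketch with names of our own (vac_transition_effect, vac_te):

create global temporary table vac_transition_effect(
  dml_type varchar2(1)  -- 'I'=insert, 'U'=update, 'D'=delete
, old_empno number(4,0), old_first_day date, old_last_day date
, new_empno number(4,0), new_first_day date, new_last_day date)
on commit delete rows;

create or replace trigger vac_te
before insert or update or delete on vac
for each row
begin
  -- record the full transition effect of the triggering DML-statement
  insert into vac_transition_effect values(
    case when inserting then 'I' when updating then 'U' else 'D' end,
    :old.empno, :old.first_day, :old.last_day,
    :new.empno, :new.first_day, :new.last_day);
end;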
We can use Oracle's Rule Engine (supplied packages DBMS_RULE and DBMS_RULE_ADM, plus related views) to act as both a dictionary and an execution environment for all of our procedurally implemented Business Rules. Using packaged procedure dbms_rule_adm.create_rule we can register the check_rule procedure of each rule. All registered rules can be viewed in the USER_RULES data dictionary view. To force a certain order of rule processing (execution of check_rule procedures), we create several rule sets[29] (via dbms_rule_adm.create_rule_set) per table and partition the involved rules of a table over these rule sets using dbms_rule_adm.add_rule. The (generic) body of the Insert, Update and Delete statement triggers for a given table would then consist of a predetermined order of dbms_rule.evaluate calls for these rule sets (such an evaluate call initiates the execution of each rule in the rule set). The actual order of execution of the rules within a rule set, however, remains undetermined.
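By way of illustration, registering a rule and grouping it into a rule set could look as follows; a sketch, assuming the DBMS_RULE_ADM administration package (the names are ours, and the condition is a mere placeholder since the real work is done by the rule's check_rule procedure):

begin
  dbms_rule_adm.create_rule(
    rule_name => 'VAC_NOT_DIRECTLY_ADJACENT',
    condition => '1=1');  -- placeholder; check_rule performs the minimal check
  dbms_rule_adm.create_rule_set(
    rule_set_name => 'VAC_TABLE_CONSTRAINTS');
  dbms_rule_adm.add_rule(
    rule_name     => 'VAC_NOT_DIRECTLY_ADJACENT',
    rule_set_name => 'VAC_TABLE_CONSTRAINTS');
end;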
Using Oracle's Rule Engine we can thus produce an architecture where:
· All rules are centrally stored in the Rule Repository (DD-views)
· All rules are managed by the Rule Engine (dbms_rule is responsible for execution)
· All triggers are generic (need to be developed only once)
· All BRC-code is separately implemented in packages (one per rule)
· BRC-coders need only worry about the essence of checking the rule
As mentioned, this framework is currently under development. By the time this paper is presented, the author expects to have finalised a first version of it. More information on the framework, and an implementation (BRC-code) of all constraints in our example database universe using this framework, will then be available at http://web.inter.nl.net/users/T.Koppelaars.
Conclusion
We need information systems with reliable data. Therefore, we must have solid data model specifications that give us a good and clear insight into the relevant constraints (Business Rules). Using a formal (unambiguous) specification of the Business Rules, we can come up with a meaningful classification of them and, from that, perform a reliable and robust implementation. This paper has clearly shown the distinct difference between the specification of a data constraint (a predicate) and the implementation of a data constraint (one or more declarative or procedural checks, coded in the server).
Implementing data constraints declaratively is very easy. Implementing data constraints procedurally is by far not a trivial task (case analysis, locking issues, mutating table, deferring). It is time consuming and will produce more lines of complex PL/SQL code than you might have expected (part of which can be generated, though). The current state of Database System technology, in products like Oracle, enables us to build robust databases. Still, few people actually do this: why? Probably because designing BRC-code (that is fully separated from APP-code) is indeed truly complex. Catering for every possible transaction makes this a much more difficult job than simply embedding the (evident) rule check within the APP-code, given the context of the transaction that the APP-code is performing. That approach, however, neglects the big picture: failing to fully detach BRC-code from APP-code implies not being able to maintain and manage the BRC-code. We should not underestimate the gains we receive once all BRC-code is independently implemented (preferably post-DML). Business Rules have then finally become manageable. Furthermore, strict separation of BRC-code from APP-code has the potential of much more rapid construction of applications. Maybe a practical way to start organising this is to split up development teams into BRC-coders and APP-coders[30]. Both teams could work independently from each other. It is probably advisable to put into place one (shared) BRC-coders team, from which the application construction teams can draw resources. The BRC-coders will have to be empowered with tools that assist them in doing their job efficiently. This is where Rule Engines enter the stage. Hopefully this paper has shown the reader what to look for in these products.
· How complete is the Rule syntax (compared to Predicate Calculus)?
· How are Rules classified (if at all)?
· How do they deal with deriving minimal checks?
· Do they exploit declarative features of the underlying database system?
· How do they solve the concurrency issue (the locking of data)?
· How do they deal with temporary violations / deferring checks?
· Do they apply some intelligent execution order of BRC-code?
· Do they really offer a robust implementation?
· And finally, do they enable us to both manage the rules (specification) and manage the code (implementation)?
There is a huge potential for Database Systems themselves to evolve into Rule Engines (rendering the third-party products superfluous). It is up to the scientific world to first come up with more meaningful subclasses of constraints. The Database System implementers, in their turn, should then provide us with new declarative constructs at the server level to easily implement these. Right now, a few enhancement requests for Oracle remain in the short term:
· The availability of programmable read row-level locks.
· A more elegant solution for deferred checking.
· The ability to call deterministic stored functions inside check-clauses.
And in the long term:
· Evolve into a true Rule Engine that is able to: accept rule predicates, derive minimal checks, ensure rule robustness with concurrent transactions, transparently defer checks, perform semantic optimization, etc.
Hey, dreaming is allowed, right?
References
[BRO89] De Grondslagen van Semantische Databases / E.O. De Brock; Academic Service, Schoonhoven (in Dutch).
[BRO94] The Foundations of Semantic Databases / E.O. De Brock; Prentice Hall, International Series in Computing Science, ISBN 013.327.099.8.
[CW90] Deriving Production Rules for Constraint Maintenance / S. Ceri & J. Widom, VLDB 1990 Conference Proceedings.
[DAT00] What Not How: The Business Rule Approach to Application Development / C.J. Date; Addison-Wesley 2000, ISBN 0201708507
[KOP94-1] Business Rules, Classification and Implementation / A.V. Koppelaars; EOUG 1994 Conference Proceedings, Volume 3 paper 169q.
[KOP94-2] Business Rules, Classification and Implementation / A.V. Koppelaars; IOUW 1994 Conference Proceedings, Volume 1 paper 10q.
[KOP95-1] Business Rules, Guidelines for Deferred Checking / A.V. Koppelaars; EOUG 1995 Conference Proceedings, Volume 4 paper 323q.
[KOP95-2] Business Rules, Specification and Classification / A.V. Koppelaars; IOUG SELECT Newsmagazine, Vol.2 No.2, January 1995, pages 35-38.
[KOP95-3] The Specification and Implementation of Business Rules / A.V. Koppelaars; ASPAC IOUW 1995 Conference Proceedingsq.
[KOP96] Business Rules: All You Need to Know When Implementing Them / A.V. Koppelaars; EOUG 1996 Conference Proceedingsq.
[ROS94] The Business Rule Book / R.G. Ross; Database Research Group ISBN 0941049027
[SKD95] Promises and Realities of Active Database Systems / Simon & Kotz-Dittrich, VLDB Conference proceedings 1995.
[SM77] Discrete Mathematics in Computer Science / Stanat & McAllister, ISBN 0132161508
[WF90] Set-Oriented Production Rules in Relational Database Systems / J. Widom & S. Finkelstein, Proceedings of 1990 ACM-SIGMOD Conference, pages 259-270.
The author can be contacted via t.koppelaars@inter.nl.net or t.koppelaars@centraal.boekhuis.nl.
Future updates of this paper, and sources of BRC-code to implement the constraints mentioned in this paper, will be posted at http://web.inter.nl.net/users/T.Koppelaars.
[1] Indicating that Business Rule checks are performed
automatically by the database system, directly after (post) the
application has submitted a DML-statement.
[2] In the course of this paper an accurate definition
will be given.
[3] Such given set is unfortunately often called the domain
for that attribute. In mathematics, domain has a specific meaning with
regards to functions. A function is a mapping from elements of one set
(called the domain of the function), to elements of another set (called
the range of the function). Throughout this paper if we use the word domain,
we will use it in this mathematical, set theoretic, meaning.
[4] Note that this classification implies that we cannot
classify a Business Rule (rambling) unless we have a database schema (skeleton)
at hand. Moreover, with a different database schema, the Rule (if still a
Rule…) could be classified differently.
[5] Any function can be represented in set theory as a
set of ordered pairs. Consider the Square function: it can be
represented as the following set of ordered pairs: { (0;0), (1;1), (2;4),
(3;9), (4;16), (5;25), …}. Given such representation for the Square function,
then expressions such as Square(3) are allowed (yielding 9 in this case).
[6] Domain used here in the set theoretic sense.
[7] The reader is again encouraged to try to produce this
proof.
[8] We might allow one exception to this rule. As we will
see shortly, rules can either be implemented in a declarative way using
some language construct available to us in the database system, or in a procedural
way using an imperative language available to us in the database system. The
exception to the rule would be that we might assume all declaratively
implemented rules to be checked prior to rules that have a procedural
implementation.
[9] There is actually a third reason (not fully caused by
SQL) for temporarily allowing rules to be violated inside transactions: see
[KOP95-1] for this.
[10] This encompasses both procedural code, as well as
declarative code.
[11] For instance, code that creates user interface, or
code that deals with user interface events and composes transactions.
[12] Implementing all BRC-code in the post-DML way is
considered robust. If all BRC-code is implemented the pre-DML way (in the
server tier), then direct DML from APP-code should be prohibited: only when
APP-code always executes RPC-calls that implement the transactions (DML) and
call the BRC-code layer, is the database considered robust.
[13] Actually, the author believes DML-statements
should be the standard interface for APP-code to manipulate data (and not
procedure-calls).
[14] We will use the Oracle sql-DBMS to demonstrate the
guidelines.
[15] A function is considered DETERMINISTIC if given the
same input values it will always return the same (output) value.
[16] A similar remark holds for all other constraint types that follow. If the predicate of a Business Rule, say Pred-R, can be rewritten (using the rewrite rules available in the Calculus) such that it takes the form of "Pred-Rx ∧ Pred-Ry", then in fact we have two Business Rules, and each one of them is to be implemented independently.
[17] A way to automatically derive this list, including some optimizations to minimise the list, is available in [CW90].
[18] Suppose the table constraint involves n (n>0) attributes; then we can generate 2ⁿ-1 different Update-statements.
[19] This implies that the BRC-code for the
Delete-statement should be such that it allows a violation. The case being
violated will have to be re-checked in a deferred way (see later section on
implementing deferred checking).
[20] Note: such comparison is always necessary for
BRC-code regarding transition-constraints.
[21] Every such designed function will have to do this,
since they involve checking table constraints: by their nature, these
can only be checked while referring to other tuples in the same table.
[22] The author strongly prefers the session temporary table to the packaged variables, since clearing the contents of session temporary tables is managed by Oracle (the table is cleared at commit as well as at rollback). In contrast, the contents of packaged variables need to be cleared at specific times by the BRC-code programmer.
[23] The BRC-code does not see the record just previously
inserted by user A since it has not yet been committed.
[24] If read locks could be acquired at the column level,
instead of row level, then concurrency could be even further increased.
[25] Part of the lockname would have to contain some
encoded string that identifies the primary key of the row involved, since we
need to be able to create a lock per row.
[26] We assume each constraint is given a name no longer
than 10 positions.
[27] It’s funny that Oracle supplies us with deferrable
check clauses (used to implement attribute and tuple constraints)
that enable us to implement deferrable table and database
constraints. Theoretically, there was never a need for deferrable check clauses
(in our SQL world).
[28] In earlier versions of a Framework (papers of the mid-nineties), BRC-code was distributed across trigger-body and pl/sql package-body, which created a situation that was not very manageable.
[29] For instance we could create three rule sets per
table: one for holding all table constraints of that table, one for holding all
database constraints in which the table is involved, and one for holding all
dynamic constraints that involve that table.
[30] However, beware that fully separating APP-coders from
BRC-coders may lead to The Iceberg Problem described in [SKD95].
q Also available at http://web.inter.nl.net/users/T.Koppelaars