Representing Structural Business Rules Using UML Class Diagrams
Introduction
For the last several years, I have been investigating the use of UML class diagrams to design databases. In the last year, I have been using this mechanism almost exclusively to design data models. Much to my surprise, UML has proven itself to be a superior tool to ERDs. UML is a standard environment for object-oriented design and development. The most commonly recognized parts of the UML modeling environment are class diagrams, which somewhat resemble ERDs. There is some debate within both the object-oriented and relational communities concerning the applicability of UML class diagrams for representing structural data business rules such as those traditionally articulated in ERDs.
Three years ago, I attempted to resolve this issue in conjunction with the writing of Oracle8 Design Using UML Object Modeling (Dorsey & Hudicka, Oracle Press, 1999). In the course of writing that book, I concluded that UML class diagrams had not been originally intended for designing data models but were suited to the task and, in some cases, superior to ERDs. Over the last year, we have completely abstracted our data models to be fully object-oriented generic data structures, very similar in style to the so-called “universal data model” described by proponents such as Ulka Rodgers and David Hay. At the same time, by embracing object-oriented thinking, we were trying to describe more and more complex business rules that simply could not be represented in a traditional relational database.
UML Class diagrams with the appropriate extensions represent a significant step forward in data modeling. Structural business rules can be represented more easily and completely using an extended UML syntax than was ever possible with ERDs. This paper will discuss some basic UML concepts, show the important extensions to UML and demonstrate the advantages of using UML for creating data models that represent the structural business rules of any system.
UML to the Rescue
A UML class diagram, although syntactically different from an ERD, can (with some exceptions) be read similarly. Boxes on UML diagrams (classes) correspond almost exactly to what relational modelers call entities.
The lines connecting classes are of several types. The most basic of these, the association, also corresponds to the ERD concept of association. The only difference is that, rather than the limited cardinalities available to most relational modelers, the cardinality of objects can be richer. For a full discussion of this topic, see Oracle8 Design Using UML Object Modeling (Dorsey & Hudicka, Oracle Press, 1999).
Currently, we use UML to create our data models and use a UML repository to drive a DML engine that enforces all of the rules in the repository and performs the insertion to the appropriate underlying tables. A full discussion of this architecture is beyond the scope of this paper. However, it was this architecture that led us to use and extend UML to represent data-related structural business rules.
The result of these extensions is quite remarkable. We can now say that this is unequivocally a better way of designing data models. UML and the appropriate extensions provide a richer vocabulary where more data rules can be expressed more easily. The business rules can be represented more accurately and in a way that better describes the relationships among the data.
Because of my focus on this business rules project, I have not used ERDs for any data models in the last year, so I had not had any opportunity to directly compare the two modeling methods. Then, I was called in to audit a medium-sized flexible product data model to support a new medical software.com system. This data model was created using Oracle Designer with traditional ERDs. In trying to audit and make suggestions for improvements, I found myself wishing that I had access to the extended UML class diagram vocabulary in order to redraw the data model. Modeling using ERDs after I had become accustomed to our extended UML felt like trying to swim wearing an overcoat. It is possible if you are very good, but it is not the easiest way to do it.
Creating Data Models
A data model is more than just a representation of the data-related business rules; it can also be (and is in our architecture) a representation of the physical database tables. Many structural rules in a relational database are not, or cannot easily be, represented using traditional ERDs. Using our own internally developed UML engine, we are able to handle almost any business rule that can be articulated. These rules can be enforced by the engine, independent of what can be easily enforced in a traditional database.
The data models described in this paper represent not only captured business rules but also physical database designs. The data model drives the physical database. Data is stored twice, once in the generic repository and a second time in the physicalized tables as shown in Figure 1.
Figure 1: System Architecture
In the actual physical implementation, the tables are not directly accessed by the developers. Single table views with INSTEAD OF triggers sit on top of tables. Applications make DML calls to the views, which intercept the calls and send data manipulation scripts to the UML engine for processing.
The UML engine validates whether or not the DML request violates any business rules and updates the generic and physical class tables. If rules are violated, the engine delivers the appropriate error message.
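The flow described above (application DML goes to a view, the view hands the request to the engine, and the engine validates before writing to both stores) can be sketched roughly as follows. This is an illustrative Python stand-in, not the actual engine: the names `UmlEngine`, `ClassView`, `RuleViolation` and `name_required` are hypothetical, and a real implementation would use database views with INSTEAD OF triggers rather than Python objects.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A rule inspects the proposed row and returns an error message, or None if satisfied.
Rule = Callable[[dict], Optional[str]]


class RuleViolation(Exception):
    """Raised when a DML request breaks a rule held in the repository (hypothetical)."""


@dataclass
class UmlEngine:
    rules: Dict[str, List[Rule]] = field(default_factory=dict)
    generic_repository: List[tuple] = field(default_factory=list)
    class_tables: Dict[str, List[dict]] = field(default_factory=dict)

    def insert(self, class_name: str, row: dict) -> None:
        # Validate first; nothing is written if any rule fails.
        for rule in self.rules.get(class_name, []):
            message = rule(row)
            if message is not None:
                raise RuleViolation(message)
        # Data is stored twice: generic repository and physical class table.
        self.generic_repository.append((class_name, dict(row)))
        self.class_tables.setdefault(class_name, []).append(dict(row))


class ClassView:
    """Stands in for a single-table view with an INSTEAD OF trigger."""

    def __init__(self, class_name: str, engine: UmlEngine) -> None:
        self.class_name = class_name
        self.engine = engine

    def insert(self, **row) -> None:
        # The view never writes to the table itself; it forwards the request.
        self.engine.insert(self.class_name, row)


def name_required(row: dict) -> Optional[str]:
    return None if row.get("last_name") else "Employee requires a last name"


engine = UmlEngine(rules={"EMP": [name_required]})
emp_view = ClassView("EMP", engine)
emp_view.insert(emp_id=1, last_name="Smith")
```

The point of the sketch is only the ordering: validation happens in one place, before either copy of the data is touched, so a rejected request leaves both stores unchanged.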
Structural and Process Business Rules
Business rules may be divided into two categories:
1. Structural business rules: Largely state and time independent and may include everything from “Vice Presidents report to Presidents” to “Employees work in Departments.”
2. Process business rules: Workflow-related, state-dependent rules.
A structural rule is one that is defined on the data structure independent of process. Rules indicating the types of information to be stored and how those information elements interrelate are structural. All information traditionally stored in a relational ERD is structural. An ERD lists the information groupings in the database (entities), the individual information elements (attributes) and, to a limited extent, how the information elements are related (relationships). This is a very limited “grammar” for business rules, but clearly is a type of business rule repository.
Of course, there are structural rules that are not easy to represent in an ERD. Even simple rules involving the relationship between two attributes, such as the standard constraint that a starting date must precede an ending date, usually cannot be represented. There is no way to store such information in a traditional ERD.
Both because of its status as an emerging standard and its ability to store more complex rules than ERDs, we have chosen to use UML class diagrams for our core structural information. Anything that can be represented in an ERD can be represented in UML, with the additional advantage that UML can reflect inheritance relationships, which cannot be represented in ERDs.
Thinking about state transition engines in terms of business events enables us to build systems that are more consistent with the way organizations think about their business. For example, business people do not think about the workflow associated with an asset; they think about workflow in terms of the acquisition or retirement of an asset. Similarly, the retirement of an employee, and not the employee him/herself, has a defined workflow.
This is a more natural approach to building business-related systems. However, if the system user chooses to define workflows on types of objects, they can define a default business event (one for each type of object) and the state transition engine can be used for core data objects.
A process rule, on the other hand, relates to process or workflow-related information. The rules associated with declaring the approval process for an invoice are clearly process rules.
It should be noted that this division of business rules into structural and process rules is purely artificial. Any business rule can be articulated as either a structural rule or a process rule. There are rules that are more easily represented as structural rules and there are rules that are more easily represented as process rules. The goal is to represent all of the rules in such a way that the cost of development is minimized.
To illustrate this point, consider the following two rules, one naturally structural, one naturally process related. As a structural rule, consider a simple 1-entity data model of an Employee as shown below:
There can be no clearer structural rule, yet even this rule could be represented using a process-related business rule repository. The structure of an employee record is only relevant when inserting or updating an employee. If we placed rule checks on the operations that insert or update employees, then the Employee structure could be controlled indirectly by controlling the process that manipulates employees. This would give us the ability to enforce the structure of the employee through its processes.
Similarly, consider any process flow. All process rules can be articulated using conditional logic associated with data fields. Even a complex state transition rule flow can be represented using data-based rules. For example: “If the status of the order is ‘OPEN’ and the current date is more than 90 days past the day the order was created, then set the status of the order to ‘CANCELED’.” You could insist upon calling this a “process” rule because of the setting of the status, but just as good a case could be made for calling it a structural rule. We are simply declaring a complex data rule that governs the automatic setting of an attribute based upon the values of some other attributes.
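To make the order example concrete, the rule can be written as a pure function of the order's attributes and the current date. The following is a minimal sketch; the function name `apply_stale_order_rule` and the dictionary layout of an order are assumptions made for illustration only.

```python
from datetime import date, timedelta


def apply_stale_order_rule(order: dict, today: date) -> dict:
    """Return a copy of the order with its status set according to the rule."""
    updated = dict(order)
    # "OPEN" and more than 90 days past the creation date => "CANCELED".
    if order["status"] == "OPEN" and today > order["created"] + timedelta(days=90):
        updated["status"] = "CANCELED"
    return updated
```

Written this way, nothing in the rule refers to a workflow step: it depends only on the values of `status`, `created` and the current date, which is why it can be read as a data rule just as easily as a process rule.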
If we control any business event that impacts a particular data structure by enforcing it in such a way that the structural rule is not violated, then we have effectively made our structural business rule into a process. Conversely, for business events that have an attribute called “State,” all process-related rules can be articulated by “If the value of the state attribute is ‘x’, then field ‘y’ must not be null.” Therefore, any process business rule can also be represented as a complex structural rule.
Additional structural rules that cannot be represented in UML (e.g. Start Date < End Date for a project) will be handled using PL/SQL-based class triggers.
Extensions & Modifications to Basic UML
We have taken advantage of UML’s extensibility to add features and create a richer data modeling tool. The associations between objects in the generic structure are physically stored in a SUPER ASSOCIATION table. To date, we have not found a compelling reason to use composition or aggregation other than for notational simplicity. Typically, we use the strong composition symbol to indicate a mandatory 1-to-many relationship and the weak composition (aggregation) symbol to denote an optional 1-to-many relationship between two classes.
On associations, we support relationships from n1..m1 to n2..m2, where n and m are non-negative integers or “*” (any number). We have not found any reason to support the more complex cardinalities such as “2, 4, 6” which are allowable in UML syntax. In the data models we have designed, only on rare occasions have we used cardinalities other than 0..1, 1, 1..* and *. However, these other cardinalities do come in handy for specifying the maximum size of a committee or similar situations.
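As an illustration of the supported cardinality forms, the sketch below parses a multiplicity string into a lower and upper bound and checks a count against it. The function names `parse_multiplicity` and `count_allowed` are hypothetical; the sketch deliberately rejects list-style cardinalities such as “2, 4, 6”, matching the restriction described above.

```python
from typing import Optional, Tuple


def parse_multiplicity(text: str) -> Tuple[int, Optional[int]]:
    """Parse '1', '*', '0..1', '1..*' or 'n..m' into (lower, upper).

    An upper bound of None means unbounded ("*").
    """
    text = text.strip()
    if text == "*":
        return (0, None)
    if ".." in text:
        low_text, high_text = text.split("..", 1)
    else:
        low_text = high_text = text
    lower = int(low_text)  # raises ValueError for unsupported forms
    upper = None if high_text == "*" else int(high_text)
    if lower < 0 or (upper is not None and upper < lower):
        raise ValueError(f"invalid multiplicity: {text!r}")
    return (lower, upper)


def count_allowed(multiplicity: str, count: int) -> bool:
    """Check whether a number of associated objects satisfies the multiplicity."""
    lower, upper = parse_multiplicity(multiplicity)
    return count >= lower and (upper is None or count <= upper)
```

A committee limited to ten members, for example, could carry a multiplicity of `0..10`, for which `count_allowed("0..10", 11)` is false.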
Generalizations are used extensively consistent with UML syntax. Association classes are explicitly supported in a way consistent with UML syntax.
Methods
We are still relatively early in our exploration of how to use methods in this system. Display methods are used to drive application development. In order to display an object from a particular class, there must be a way to show the object. For example, in an Employee class, we want to display the employee ID and name. On an invoice, we want to display the invoice number, date and customer. It is useful to have methods stored in the data repository for this purpose.
Triggers
We have extended the Oracle concept of table triggers. Since our attributes are stored separately from the object, there is no UPDATE OBJECT trigger. Instead, CREATE OBJECT, DELETE OBJECT, and SET ATTRIBUTE are particularly useful validation triggers.
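The three trigger events can be pictured as a small registry keyed by class and event. The sketch below is an assumption-laden illustration in Python (the paper's class triggers are PL/SQL-based); `TriggerRegistry`, `start_before_end` and the `PROJECT` class are hypothetical names. It uses the Start Date < End Date rule as the validation being enforced.

```python
from collections import defaultdict
from datetime import date
from typing import Callable, Dict, List, Tuple

# Only these events exist; there is no UPDATE OBJECT event.
EVENTS = ("CREATE OBJECT", "DELETE OBJECT", "SET ATTRIBUTE")

# A trigger receives the object's proposed state and raises on failure.
Trigger = Callable[[dict], None]


class TriggerRegistry:
    def __init__(self) -> None:
        self._triggers: Dict[Tuple[str, str], List[Trigger]] = defaultdict(list)

    def register(self, class_name: str, event: str, trigger: Trigger) -> None:
        if event not in EVENTS:
            raise ValueError(f"unsupported event: {event}")
        self._triggers[(class_name, event)].append(trigger)

    def fire(self, class_name: str, event: str, proposed: dict) -> None:
        for trigger in self._triggers[(class_name, event)]:
            trigger(proposed)


def start_before_end(proposed: dict) -> None:
    start, end = proposed.get("start_date"), proposed.get("end_date")
    if start is not None and end is not None and not start < end:
        raise ValueError("Start Date must precede End Date")


registry = TriggerRegistry()
registry.register("PROJECT", "CREATE OBJECT", start_before_end)
registry.register("PROJECT", "SET ATTRIBUTE", start_before_end)
```

Because attributes are set one at a time, the same check is registered on both CREATE OBJECT and SET ATTRIBUTE so that the rule holds however the dates arrive.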
Primary Keys
Object-oriented development proponents don’t think in terms of primary keys. This is a mistake. Understanding and articulating what uniquely identifies an object beyond its ID is critical to good database design. We have extended UML to allow for primary key specification. In particular, primary key components are attributes or associations, just as in relational models. We have added the ability to make generalization a component of the primary key of a class.
Unique Constraints
Any number of multi-component unique constraints may be set on a class. Allowable components of unique constraints are the same as for primary keys, namely attributes, associations and generalizations.
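A multi-component unique constraint amounts to requiring that no two objects of a class share the same combination of component values. The following sketch assumes each object is a dictionary in which attribute values, association targets and generalization parents all appear as plain hashable values; the function name `find_unique_violations` and the sample component names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def find_unique_violations(
    objects: Iterable[dict],
    constraints: Dict[str, Sequence[str]],
) -> List[Tuple[str, tuple]]:
    """Return (constraint name, duplicated key) pairs for every violation found."""
    object_list = list(objects)  # allow several passes, one per constraint
    violations = []
    for name, components in constraints.items():
        seen = set()
        for obj in object_list:
            key = tuple(obj.get(component) for component in components)
            if key in seen:
                violations.append((name, key))
            seen.add(key)
    return violations
```

For example, a constraint `{"UK_BADGE": ["dept_oid", "badge_no"]}` combines an association (`dept_oid`) with an attribute (`badge_no`), so two employees may share a badge number only if they are attached to different departments.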
Limitations
The only limitation placed on the UML modeler is that we do not allow recursion on abstract classes because the syntax is ambiguous and cannot be implemented. Specifically, a recursion on an abstract class might be represented by one of the diagrams shown in Figure 2.
Figure 2: Recursive relationships
Either the recursion is inherited by each subclass, or we must support any subclass object connecting to any other subclass object. It was not until we started trying to figure out how to create physical tables for business rule classes that the problem surfaced. If there are several (10-20) subclasses of A, the physical table becomes very difficult to manage. This is probably not what the designer intended. Instead of recursion on abstract classes, the legal associations must be specified at the subclass level.
History
There are two types of history: simple and complex.
1. Simple history
Specified by applying the keyword “HIST” or “History” to the class. Note that history can also be applied to an association, meaning that it is being applied to the automatically generated association class. With simple history, two additional attributes are automatically added: StartDate and EndDate. The PK of the object does not change.
As mentioned, simple history may also be applied to an association. This does not necessitate or imply the creation of an association class. As with objects, StartDate and EndDate are applied to the associations.
StartDate and EndDate are added as attributes to the class.
NOTE: StartDate and EndDate need not be specified as attributes in the UML diagrams. The only reason to include these would be to change the multiplicity or domain of the StartDate and EndDate. By default, start dates and end dates allow any valid date except where they violate the history constraints as described below. Both are optional.
Limitations of StartDate and EndDate values
Neither the start date nor the end date may be after the current system date. If you want objects with future effective dates, you should add additional attributes to the class for this purpose. The History keyword has a very precise meaning and does not support this functionality.
If StartDate and EndDate are both not null, StartDate must be less than EndDate. Records may exist without a StartDate when the start date is indeterminate. This is often the case for data migrations. However, you cannot have a logically deleted record without an EndDate, since deleted records are detected by a non-null end date.
NOTE: Record status is not additionally maintained as part of history. If desired, record status should be added and manipulated as another attribute on the class.
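The constraints on StartDate and EndDate stated above can be collected into a single check. The sketch below is an illustration only; the function names `history_date_errors` and `is_logically_deleted` are hypothetical, and null dates are represented by `None`.

```python
from datetime import date
from typing import List, Optional


def history_date_errors(
    start: Optional[date], end: Optional[date], today: date
) -> List[str]:
    """Return the history constraints violated by a StartDate/EndDate pair."""
    errors = []
    if start is not None and start > today:
        errors.append("StartDate may not be after the current system date")
    if end is not None and end > today:
        errors.append("EndDate may not be after the current system date")
    if start is not None and end is not None and not start < end:
        errors.append("StartDate must be less than EndDate")
    return errors


def is_logically_deleted(end: Optional[date]) -> bool:
    """A record counts as deleted exactly when its EndDate is non-null."""
    return end is not None
```

Both dates are optional, so a record with neither date set passes every check and is treated as active.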
Implementation Notes
When a class carrying the “History” keyword is brought into the repository, a start date and end date are expected to be added to the class automatically. If the start date or end date multiplicity or domains have been overridden by the diagram, these overrides should be passed to the repository.
When you create an object with history, a start date of SYS_DT is assigned to the start date attribute by default. StartDate is an optional parameter in the Create Object command and can be overridden by passing a start date parameter at object creation.
StartDate behaves like any other attribute and can be modified or made null using a Set Attribute command.
Applying a keyword of “History” to a class does not affect the behavior of its primary key. Deleting an object of such a class is equivalent to setting an active flag on a relational database record: the record still exists. If an Employee with an EmpID as a primary key exists and history is associated with the Employee class, then after the Employee is deleted, you cannot insert a new employee with the same ID.
To remove the Employee record from the database, use the overloaded DELETE-OBJECT function. You can either pass the delete date or the character string “remove” in the position of the optional parameter. If you pass “remove,” the record will be removed from the database.
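The behavior of create and delete on a class with simple history can be sketched as follows. This is not the engine's API: `HistoryStore` and its method signatures are hypothetical, the current date is passed in explicitly instead of being read from the system, and using the current date when no delete date is given is an assumption.

```python
from datetime import date
from typing import Dict, Optional, Union


class HistoryStore:
    """Objects of one class with simple history, keyed by primary key value."""

    def __init__(self) -> None:
        self.objects: Dict[str, dict] = {}

    def create_object(
        self, pk: str, today: date, start_date: Optional[date] = None
    ) -> None:
        if pk in self.objects:
            # Logically deleted objects still occupy their primary key.
            raise ValueError(f"primary key {pk!r} already exists")
        # StartDate defaults to the current date unless overridden.
        self.objects[pk] = {"StartDate": start_date or today, "EndDate": None}

    def delete_object(
        self, pk: str, today: date, option: Union[date, str, None] = None
    ) -> None:
        if option == "remove":
            # Physical removal: the primary key becomes available again.
            del self.objects[pk]
        elif option is None or isinstance(option, date):
            # Logical deletion: the record stays, marked by a non-null EndDate.
            self.objects[pk]["EndDate"] = option or today
        else:
            raise ValueError(f"unsupported delete option: {option!r}")
```

The contrast between the two delete paths is what the primary key behavior hinges on: a logical delete keeps the key occupied, while “remove” frees it.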
2. Complex History (Primary Key)
Complex history involves the tracking of history on the record and allows duplicate primary key values in the class.
Primary key history behaves in a similar way to regular history since start date and end date can be added. The commands used are the same and behave in the same way with the following exceptions:
· When you execute a Delete command, the associated record is removed from the PK table, enabling the primary key of the class to be reused. Because the logical primary key is no longer unique, the PK in the actual physical table is the logical primary key plus the OID.
· If you have multiple records with the same primary key, there is a further stipulation that the start dates and end dates may not overlap as illustrated in the following examples.
The following are examples of successful and unsuccessful insertions to an organization class.
| Case | Attribute | Original Object | New Object | Result | Note |
|------|-----------|-----------------|------------|--------|------|
| 1 | OrgID | 100 | 100 | | |
| 1 | StartDate | 1/Jan/2000 | 1/Jan/2001 | Succeed | |
| 1 | EndDate | 1/Jan/2001 | | | |
| 2 | OrgID | 100 | 100 | | |
| 2 | StartDate | Null | 1/Jan/2001 | Succeed | |
| 2 | EndDate | 1/Jan/2001 | | | |
| 3 | OrgID | 100 | 100 | | |
| 3 | StartDate | 1/Jan/2000 | 1/Jan/2001 | Fail | |
| 3 | EndDate | Null | Null | | |
| 4 | OrgID | 100 | 100 | | |
| 4 | StartDate | 1/Jan/2000 | 1/Feb/2000 | Fail | Not implemented |
| 4 | EndDate | 1/Jan/2001 | Null | | |
Effectively, we are relaxing the PK constraint so that the class may have duplicate primary keys as long as the StartDate and EndDate do not overlap. This constraint must be checked every time any primary key columns are changed, when the StartDate or EndDate attributes are set and whenever an object is created.
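One way to express the non-overlap test is to treat each StartDate/EndDate pair as a half-open period in which a null StartDate means "since the beginning" and a null EndDate means "still current". This reading is an assumption inferred from the four insertion examples (a new object starting on the day the original ends succeeds), not a statement of how the engine is coded; the name `periods_overlap` is hypothetical, and blank EndDate cells in the examples are read as null.

```python
from datetime import date
from typing import Optional, Tuple

# (StartDate, EndDate); None stands for a null, open-ended bound.
Period = Tuple[Optional[date], Optional[date]]


def periods_overlap(a: Period, b: Period) -> bool:
    """True when two half-open periods [start, end) share any point in time."""
    a_start, a_end = a
    b_start, b_end = b
    a_starts_before_b_ends = a_start is None or b_end is None or a_start < b_end
    b_starts_before_a_ends = b_start is None or a_end is None or b_start < a_end
    return a_starts_before_b_ends and b_starts_before_a_ends
```

Under this reading, an insertion with a duplicate primary key is accepted only if `periods_overlap` is false against every existing object that shares the key, which reproduces the Succeed and Fail outcomes of the four examples.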
The StartDate and EndDate are implemented in the PARTY_VAL table. The Activ_yn is in the object tables and as a column in the PARTY_VAL table.
You may also apply a “history” keyword to an association. This allows for logical deletion of associations. The Create Association and Delete Association commands behave the same as the Create Object and Delete Object commands on classes with history. You can pass a date on either creation or deletion of an association. On deletion, you can instead pass the keyword “remove” to actually remove the association from the database.
Associations with non-null end dates are automatically ignored in cardinality checks by the engine.
Impact of History on Associations
History has an impact on the cardinality of rule associations. In particular, sometimes you want the association to be considered part of the object and the history class. At other times, you do not. In Figure 3 and the example below, there are sales attached to customers and customers are placed together in customer groups with no more than ten customers in each group.
Figure 3: History on associations example
If the customer is no longer in business and needs to be deleted from the system, in the case of a sale, we do not want the deletion of the customer to orphan the sales records. Neither do we want to inactivate the association of the sale record since that would allow the sale objects to be attached to an additional customer.
However, in the customer group, the opposite situation exists. When the individual customer is inactivated or deleted, their positions in the customer group should be available to another customer. Sometimes the deletion of the object should also invalidate or delete its associations but not in every case.
In the engine, associations are stored independently of the objects that they connect. Cardinality rules are validated by counting valid records in the association table. Therefore, in the implementation of history, when a command is received to delete a history object, this triggers no validation of any cardinality rules since the engine cannot determine whether cardinality restrictions should or should not be observed in any given situation. When deleting history objects, the applications must determine (on a case by case basis) whether a particular association should be deleted, invalidated or ignored. The cardinality rules will be enforced when any attempt is made to invalidate or delete an existing association.
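Counting valid association records can be illustrated with the customer group example, where a group holds no more than ten customers. The sketch below is a simplification under stated assumptions: `Association`, `active_count` and `can_attach` are hypothetical names, and an association is considered valid exactly when its end date is null.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Association:
    parent_oid: int                   # e.g. the customer group
    child_oid: int                    # e.g. the customer
    end_date: Optional[date] = None   # non-null means logically deleted


def active_count(associations: List[Association], parent_oid: int) -> int:
    """Count only associations that have not been ended."""
    return sum(
        1 for a in associations
        if a.parent_oid == parent_oid and a.end_date is None
    )


def can_attach(
    associations: List[Association], parent_oid: int, max_children: int
) -> bool:
    """True if one more child may be attached without exceeding the maximum."""
    return active_count(associations, parent_oid) < max_children
```

Ending a customer's membership (setting `end_date`) frees a position in the group, whereas deleting the customer object alone changes nothing in the count; that decision is left to the application, as described above.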
UML extensions to support the creation of class tables
As mentioned above, each class in the UML data model is physically instantiated as a table. Each attribute in the class becomes a column. Foreign key relationships are handled using a reference to the master table OID rather than using the logical primary key.
Specialization classes include attributes along the generalization path. For example, in the EMP class, a Last Name and First Name column would be included if EMP is a specialization of PERSON. The UML engine is smart enough to handle an Insert or Update into EMP ASSOCIATED VIEW by inserting, updating or attaching to the appropriate person record.
We use keywords attached to UML objects to specify various physical column and class names. For example, to express a many-to-many relationship between two classes, you will need an association table between the two classes. If designers do not want to draw an association class, they can simply attach a keyword to declare what the name of the association table will be. Foreign key columns are named using the PARENT CLASS NAME_OID, but can be overridden via a keyword on the association.
Many-to-many associations between abstract classes become complex because an association table between any of the underlying subclasses may be needed as shown in Figure 4.
Figure 4: Many-to-many associations between abstract classes
In this case, the designer may want to have as many as twelve association classes because association classes are needed not only between all of the subclasses but also between the superclasses (an EMP-DEPT association class). How will these tables be named? As a keyword on the many-to-many association, the class pairings to be physically instantiated as tables are specified along with their names. It is beyond the scope of this paper to describe all of the notational nuances that we have used to extend UML.
Domains on Attributes
In addition to the traditional database (MIN/MAX) constraint, within the domain structure, we are also able to support reference tables (STATUS, TYPE) as well as domains validated using a function to support things like Social Security numbers or Canadian postal codes. We have also thought about including format masks but have not yet implemented this functionality.
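The three kinds of domain mentioned above (MIN/MAX ranges, reference tables, and function-validated domains) can all be modeled as predicates on a value. The sketch below is illustrative: the helper names are hypothetical, the status values are invented sample data, and the postal code check is a simplified letter-digit-letter, digit-letter-digit pattern, not a complete validation of Canadian postal codes.

```python
import re
from typing import Callable, Iterable

# A domain is a predicate: it accepts a value or rejects it.
Domain = Callable[[object], bool]


def range_domain(minimum, maximum) -> Domain:
    """Traditional MIN/MAX constraint."""
    return lambda value: minimum <= value <= maximum


def reference_domain(allowed: Iterable[str]) -> Domain:
    """Values must appear in a reference table such as STATUS or TYPE."""
    allowed_set = frozenset(allowed)
    return lambda value: value in allowed_set


def looks_like_canadian_postal_code(value: object) -> bool:
    """Function-validated domain (simplified format check only)."""
    return (
        isinstance(value, str)
        and re.fullmatch(r"[A-Z][0-9][A-Z] ?[0-9][A-Z][0-9]", value) is not None
    )


quantity_domain = range_domain(1, 999)
status_domain = reference_domain(["OPEN", "CLOSED", "CANCELED"])
postal_code_domain: Domain = looks_like_canadian_postal_code
```

Treating every domain as a predicate means the engine can validate an attribute the same way regardless of which kind of domain is attached to it.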
What Have We Accomplished?
The best ideas from UML, ERDs and Oracle Designer have been combined to take the process one step further, providing a much richer modeling environment that enables us to better encapsulate all of the relevant business rules associated with a system.
Because all of the validation is handled by the UML engine, the enforcing of complex business rules avoids any mutating table problems.
One example showing how this approach provides a better data model is shown in Figure 5. Note the “delegate” and “supercede” keywords on certain associations.
Figure 5: Example of UML data model showing “delegate” and “supercede” associations
The company keeps track of people at different levels. If an individual is simply a person, we may only know their name. If they are designated as a “clean person,” we also have contact information (address, phone). If the person is an employee, we also need their tax information, gender, birth date, etc.
In this model, people can have many different roles; however, you must at least be a “clean person” to be a customer and only employees can be department members. Attempting to model these relationships with an ERD would look like Figure 6.
Figure 6: ERD showing associations
This data model is not all that complex. It captures most of the business rules, with the notable exception of history. With the way in which history is implemented in UML, we can explicitly enforce that an individual cannot belong to the same department twice at the same time.
The major difference between the ERD and UML models is in the implementation. How can all of the subclasses be implemented with ERDs? There are many alternatives, none of which are particularly satisfying for this example. You will either end up with one very large PERSON table and a complex application or many small tables with numerous joins. The combined UML/ERD approach is better, not only in the slight improvement of the diagram, but mainly because the physical tables are easier to build against.
UML with some extensions makes for a superior modeling environment to traditional ERDs. You can represent more business rules, in a cleaner format. However, we recognize that much of our affection for using UML results from how we implement the diagrams as much as the diagrams themselves.
About the Author
Dr. Paul Dorsey is the founder and President of Dulcian, Inc. (www.dulcian.com), an Oracle consulting firm specializing in client/server and web custom application development. Paul is co-author with Peter Koletzke of The JDeveloper3 Handbook (2001), Oracle Developer: Advanced Forms & Reports (2000), and The Oracle Designer Handbook (1999), and with Joseph R. Hudicka of Oracle8 Design Using UML Object Modeling (1999), all from Oracle Press. Paul is the Executive Editor of SELECT Magazine. He is the President of the New York Oracle Users’ Group and a frequent speaker at Oracle conferences and user group meetings.