Using UML to Design Data Structures

Dr. Paul Dorsey, Dulcian, Inc.

Introduction

UML is a standard environment for object-oriented design and development. The most commonly recognized parts of the UML Modeling environment are class diagrams, which somewhat resemble ERDs. There is some debate within both the object-oriented and relational communities concerning the applicability of UML class diagrams for representing structural data business rules such as those traditionally articulated in ERDs.

Three years ago, I attempted to resolve this issue while writing Oracle8 Design Using UML Object Modeling (Dorsey & Hudicka, Oracle Press, 1999). In the course of writing that book, I concluded that although UML class diagrams had not originally been intended for designing data models, they were suited to the task and, in some cases, superior to ERDs. Over the last year, we have completely abstracted our data models to be fully object-oriented generic data structures, very similar in style to the so-called “universal data model” described by proponents such as Ulka Rodgers and David Hay. At the same time, by embracing object-oriented thinking, we were trying to describe more and more complex business rules that simply could not be represented in a traditional relational database.

UML to the Rescue

Currently, we use UML to create our data models and use a UML repository to drive a DML engine that enforces all of the rules in the repository and performs the insertion to the appropriate underlying tables. A full discussion of this architecture is beyond the scope of this paper. However, it was this architecture that led us to use and extend UML to represent data-related structural business rules.

The result of these extensions is quite remarkable. We can now say that this is unequivocally a better way of designing data models. UML and the appropriate extensions provide a richer vocabulary where more data rules can be expressed more easily. The business rules can be represented more accurately and in a way that better describes the relationships among the data.

Because of my focus on this business rules project, I have not used ERDs for any data models in the last year, so I had not had any opportunity to compare the two modeling methods directly. Then I was called in to audit a medium-sized flexible product data model to support a new medical software.com system. This data model was created using Oracle Designer with traditional ERDs. In trying to audit it and make suggestions for improvements, I found myself wishing that I had access to the extended UML class diagram vocabulary in order to redraw the data model. Modeling with ERDs after I had become accustomed to our extended UML felt like trying to swim wearing an overcoat. It is possible if you are very good, but it is not the easiest way to do it.

This paper will do the following:

·         Show the basic UML commands that we use

·         Show how we have extended UML to support both the data models and their physical implementations

·         Show examples to demonstrate the superiority of this approach

More details about the superiority of UML can be found in the paper “Making the Transition from ERDs to UML” available on the Dulcian website (http://www.dulcian.com/).

Creating Data Models

A data model is more than just a representation of the data-related business rules; it can also be (and in our architecture is) a representation of the physical database tables. Many structural rules in a relational database are not, or cannot easily be, implemented using traditional ERDs. Using our own internally developed UML engine, we are able to handle almost any business rule that can be articulated. These rules can be enforced by the engine, independent of what can be easily enforced in a traditional database.

The data models described in this paper represent not only captured business rules but also physical database designs. The data model drives the physical database. Data is stored twice, once in the generic repository and a second time in the physicalized tables as shown in Figure 1.

 

 

Figure 1: System Architecture

 

In the actual physical implementation, the tables are not directly accessed by the developers. Single table views with INSTEAD OF triggers sit on top of tables. Applications make DML calls to the views, which intercept the calls and send data manipulation scripts to the UML engine for processing.

The UML engine validates whether or not the DML request violates any business rules and updates the generic and physical class tables. If rules are violated, the engine delivers the appropriate error message.
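The request flow described above can be illustrated with a minimal sketch. This is not the actual engine: the class name UmlEngine, the rule signature, and the in-memory lists standing in for the generic repository and the physical class tables are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# A rule returns an error message, or None when the request is acceptable.
# (Hypothetical signature; the paper does not show the engine's interface.)
Rule = Callable[[str, dict], Optional[str]]


@dataclass
class UmlEngine:
    rules: list = field(default_factory=list)
    generic_repository: list = field(default_factory=list)  # stands in for the generic tables
    class_tables: dict = field(default_factory=dict)         # stands in for the physical class tables

    def process(self, class_name: str, row: dict) -> None:
        """Validate a DML request, then store the data in both places."""
        for rule in self.rules:
            error = rule(class_name, row)
            if error is not None:
                raise ValueError(error)  # nothing is written when a rule fails
        self.generic_repository.append((class_name, dict(row)))
        self.class_tables.setdefault(class_name, []).append(dict(row))


def name_required(class_name: str, row: dict) -> Optional[str]:
    # Example rule, invented for this sketch.
    if class_name == "Employee" and not row.get("name"):
        return "Employee.name is mandatory"
    return None


engine = UmlEngine(rules=[name_required])
engine.process("Employee", {"emp_id": 1, "name": "Smith"})
```

In the architecture described here, the call to process would be issued by the INSTEAD OF trigger on the view rather than by the application itself.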

 

The Unified Modeling Language (UML)

A UML class diagram, although syntactically different from an ERD, can (with some exceptions) be read similarly. Boxes on UML diagrams (classes) correspond almost exactly to what relational modelers call entities.

The lines connecting classes are of several types. The most basic of these, the association, corresponds to the ERD concept of a relationship. The only difference is that, rather than the limited cardinalities available to most relational modelers, the cardinality of associations can be richer. For a full discussion of this topic, see Oracle8 Design Using UML Object Modeling (Dorsey & Hudicka, Oracle Press, 1999).

 

Extensions & Modifications to basic UML

The associations between objects in the generic structure are physically stored in a SUPER ASSOCIATION table. To date, we have not found a compelling reason to use composition or aggregation other than for notational simplicity. Typically, we use the strong composition symbol to denote a mandatory 1-to-many relationship and the weak composition (aggregation) symbol to denote an optional 1-to-many relationship between two classes.

On associations, we support relationships from n1..m1 to n2..m2, where n and m are non-negative integers or “*” (any number). We have not found any reason to support the more complex cardinalities such as “2, 4, 6” which are allowable in UML syntax. In the data models we have designed, only on rare occasions have we used cardinalities other than 0..1, 1, 1..* and *. However, these other cardinalities do come in handy for specifying the maximum size of a committee or similar situations.
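As an illustration of the cardinality forms listed above, the following sketch parses a multiplicity string and checks a count against it. The function names are hypothetical, and list-style cardinalities such as “2, 4, 6” are deliberately rejected, in line with the decision not to support them.

```python
from typing import Optional, Tuple


def parse_multiplicity(text: str) -> Tuple[int, Optional[int]]:
    """Parse 'n', '*', 'n..m' or 'n..*' into (lower, upper); upper None means unbounded."""
    text = text.strip()
    if text == "*":
        return 0, None
    if ".." in text:
        low_text, high_text = text.split("..", 1)
        lower = int(low_text)
        upper = None if high_text.strip() == "*" else int(high_text)
    else:
        lower = upper = int(text)  # raises ValueError for forms such as "2, 4, 6"
    if lower < 0 or (upper is not None and upper < lower):
        raise ValueError(f"invalid multiplicity: {text!r}")
    return lower, upper


def allows(multiplicity: str, count: int) -> bool:
    """Return True when 'count' associated objects satisfy the multiplicity."""
    lower, upper = parse_multiplicity(multiplicity)
    return count >= lower and (upper is None or count <= upper)
```

For example, a committee limited to ten members could carry the multiplicity 0..10, for which allows("0..10", 11) is False.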

Generalizations are used extensively consistent with UML syntax. Association classes are explicitly supported in a way consistent with UML syntax.

 

Methods

We are still relatively early in our exploration of how to use methods in this system. Display methods are used to drive application development. In order to display an object from a particular class, there must be a way to show the object. For example, in an Employee class, we want to display the employee ID and name. On an invoice, we want to display the invoice number, date and customer. It is useful to have methods stored in the data repository for this purpose.

 

Triggers

We have extended the Oracle concept of table triggers. Since our attributes are stored separately from the object, there is no UPDATE OBJECT trigger. Instead, CREATE OBJECT, DELETE OBJECT, and SET ATTRIBUTE are used. SET ATTRIBUTE is a particularly useful validation trigger.
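The three trigger events can be pictured as a small registry of handlers. The sketch below is only an assumption about how such a mechanism might look; the decorator, the handler signature, and the salary rule are invented for illustration and objects are plain dictionaries.

```python
from collections import defaultdict

# The three events named in the text; there is deliberately no UPDATE OBJECT.
EVENTS = ("CREATE OBJECT", "DELETE OBJECT", "SET ATTRIBUTE")
_triggers = defaultdict(list)  # (class_name, event) -> list of handlers


def on(class_name, event):
    """Register a handler for one class and one event (hypothetical API)."""
    if event not in EVENTS:
        raise ValueError(f"unsupported trigger event: {event}")

    def register(handler):
        _triggers[(class_name, event)].append(handler)
        return handler

    return register


def set_attribute(obj, class_name, attribute, value):
    """Run SET ATTRIBUTE triggers, then store the value if none of them objects."""
    for handler in _triggers[(class_name, "SET ATTRIBUTE")]:
        handler(obj, attribute, value)  # a handler rejects the change by raising
    obj[attribute] = value


@on("Employee", "SET ATTRIBUTE")
def salary_not_negative(obj, attribute, value):
    if attribute == "salary" and value < 0:
        raise ValueError("salary must not be negative")
```

Because every attribute change passes through set_attribute, a validation written once as a SET ATTRIBUTE trigger applies no matter which application issues the change.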

 

Primary Keys

Object-oriented development proponents don’t think in terms of primary keys. This is a mistake. Understanding and articulating what uniquely identifies an object beyond its ID is critical to good database design. We have extended UML to allow for primary key specification. In particular, primary key components are attributes or associations, just as in relational models. We have added the ability to make generalization a component of the primary key of a class.

Unique Constraints

Any number of multi-component unique constraints may be set on a class. Allowable components of unique constraints are the same as for primary keys, namely attributes, associations and generalizations.

 

Limitations

The only limitation placed on the UML modeler is that we do not allow recursion on abstract classes, because the syntax is ambiguous and cannot be implemented. Specifically, a recursion on an abstract class might be represented by one of the diagrams shown in Figure 2.


Figure 2: Recursive relationships

Either the recursion is inherited by each subclass, or we must support any subclass object connecting to any other subclass object. It was not until we started trying to figure out how to create physical tables for business rule classes that the problem surfaced. If there are several (10-20) subclasses of A, the physical table becomes very difficult to manage. This is probably not what the designer intended. Instead of recursion on abstract classes, the legal associations must be specified at the subclass level.

 

History

There are two types of history: simple and complex.

1. Simple history

Simple history is specified by applying the keyword “HIST” or “History” to the class. With simple history, two additional attributes, StartDate and EndDate, are automatically added to the class. The PK of the object does not change.

Simple history may also be applied to an association, meaning that it is applied to the automatically generated association class. This does not necessitate or imply the creation of an association class in the diagram. As with objects, StartDate and EndDate are applied to the association.

NOTE: StartDate and EndDate need not be specified as attributes in the UML diagrams. The only reason to include these would be to change the multiplicity or domain of the StartDate and EndDate. By default, start dates and end dates allow any valid date except where they violate the history constraints as described below. Both are optional.

 

Limitations of StartDate and EndDate values

Neither the start date nor the end date may be after the current system date. If you want objects with future effective dates, you should add additional attributes to the class for this purpose. The History keyword has a very precise meaning and does not support this functionality.

If StartDate and EndDate are both not null, StartDate must be less than EndDate. Records may exist without a StartDate when the start date is indeterminate. This is often the case for data migrations. However, you cannot have a logically deleted record without an EndDate, since deleted records are detected by a non-null end date.
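The date rules in this section can be summarized in a short validation sketch. The function names and error messages are hypothetical; only the rules themselves come from the text above.

```python
from datetime import date
from typing import List, Optional


def history_date_errors(start: Optional[date], end: Optional[date], today: date) -> List[str]:
    """Return the history rules violated by a StartDate/EndDate pair (empty list if none)."""
    errors = []
    if start is not None and start > today:
        errors.append("StartDate may not be after the current system date")
    if end is not None and end > today:
        errors.append("EndDate may not be after the current system date")
    if start is not None and end is not None and not start < end:
        errors.append("StartDate must be less than EndDate")
    return errors


def is_logically_deleted(end: Optional[date]) -> bool:
    """Deleted records are detected by a non-null EndDate."""
    return end is not None
```

A migrated record with no StartDate and a past EndDate passes these checks, while a record whose StartDate lies in the future does not.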

NOTE: Record status is not additionally maintained as part of history. If desired, record status should be added and manipulated as another attribute on the class.

 

Implementation Notes

By placing a “History” keyword on a class, when that class is brought into the repository, it is expected that a start date and end date will automatically be added to the class. If the start date or end date multiplicity or domains have been overridden by the diagram, these overrides should be passed to the repository.

When you create an object with history, the start date attribute is set to SYS_DT by default. StartDate is an optional parameter of the Create Object command, so the default can be overridden by passing a start date at object creation.

StartDate behaves like any other attribute and can be modified or made null using a Set Attribute command.

Applying a keyword of “History” to a class does not affect the behavior of its primary key. Deleting an object with history is equivalent to setting an inactive flag on a relational database record: the record still exists. If an Employee with an EmpID as a primary key exists and history is associated with the Employee class, then after the Employee is deleted, you still cannot insert a new employee with the same ID.

To remove the Employee record from the database, use the overloaded DELETE-OBJECT function. You can either pass the delete date or the character string “remove” in the position of the optional parameter. If you pass “remove,” the record will be removed from the database.
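The difference between logical deletion and physical removal can be sketched as follows. The class and method names are hypothetical stand-ins for the Create Object and Delete Object commands, and the choice of the current date when no delete date is passed is an assumption.

```python
from datetime import date


class EmployeeStore:
    """Toy stand-in for a class carrying the History keyword (all names hypothetical)."""

    def __init__(self):
        self.rows = {}  # EmpID -> {"StartDate": ..., "EndDate": ...}

    def create_object(self, emp_id, start_date=None):
        # Logically deleted rows still occupy their primary key value.
        if emp_id in self.rows:
            raise ValueError(f"EmpID {emp_id} already exists")
        self.rows[emp_id] = {"StartDate": start_date or date.today(), "EndDate": None}

    def delete_object(self, emp_id, when=None):
        if when == "remove":
            del self.rows[emp_id]  # physical removal from the store
        else:
            # Assumption: with no argument, the current date becomes the EndDate.
            self.rows[emp_id]["EndDate"] = when or date.today()
```

After a dated delete_object call the row is still present with a non-null EndDate, so a second create_object with the same EmpID fails until the row is removed with “remove”.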

 

2. Complex History (Primary Key)

Complex history involves the tracking of history on the record and allows duplicate primary key values in the class.

 

Primary key history behaves in a similar way to regular history since start date and end date can be added. The commands used are the same and behave in the same way with the following exceptions:

 

The following are examples of successful and unsuccessful insertions to an organization class.

 

Org {Hist_PK}

OrgID {PK}

Attribute    Original Object    New Object    Result     Note

OrgID        100                100
StartDate    1/Jan/2000         1/Jan/2001    Succeed
EndDate      1/Jan/2001

OrgID        100                100
StartDate    Null               1/Jan/2001    Succeed
EndDate      1/Jan/2001

OrgID        100                100
StartDate    1/Jan/2000         1/Jan/2001    Fail
EndDate      Null               Null

OrgID        100                100
StartDate    1/Jan/2000         1/Feb/2000    Fail       Not implemented
EndDate      1/Jan/2001         Null

Effectively, we are relaxing the PK constraint so that the class may have duplicate primary keys as long as the StartDate and EndDate do not overlap. This constraint must be checked every time any primary key columns are changed, when the StartDate or EndDate attributes are set and whenever an object is created.
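The relaxed constraint can be sketched as an overlap test between date ranges. The sketch assumes that a null (or blank) StartDate means “since the beginning of time”, that a null EndDate means “still current”, and that a range ending on a given day does not overlap one starting on that same day; under these assumptions it reproduces the four outcomes in the examples above. The function names are hypothetical.

```python
from datetime import date
from typing import Iterable, Optional, Tuple

Period = Tuple[Optional[date], Optional[date]]  # (StartDate, EndDate); None means open-ended


def periods_overlap(a: Period, b: Period) -> bool:
    """Half-open comparison: an EndDate equal to the next StartDate is not an overlap."""
    a_start = date.min if a[0] is None else a[0]
    a_end = date.max if a[1] is None else a[1]
    b_start = date.min if b[0] is None else b[0]
    b_end = date.max if b[1] is None else b[1]
    return a_start < b_end and b_start < a_end


def can_insert(existing: Iterable[Period], new: Period) -> bool:
    """A duplicate primary key is allowed only if its period overlaps no existing one."""
    return not any(periods_overlap(old, new) for old in existing)
```

For instance, an original Org 100 running from 1/Jan/2000 to 1/Jan/2001 accepts a new Org 100 starting 1/Jan/2001, but rejects one starting 1/Feb/2000.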

 

StartDate and EndDate are implemented in the PARTY_VAL table. Activ_yn is stored in the object tables and also as a column in the PARTY_VAL table.

3. History on Associations

You may also apply a “history” keyword to an association. This allows for logical deletion of associations. The Create Association and Delete Association commands behave the same as the Create Object and Delete Object commands on classes with history. You can pass a date on either creation or deletion of an association. On deletion, you can instead pass the keyword “remove” to actually remove the association from the database.

 

Associations with non-null end dates are automatically ignored in cardinality checks by the engine.

 

Impact of History on Associations

History has an impact on the cardinality of rule associations. In particular, sometimes you want the association to be considered part of the object and the history class. At other times, you do not. In Figure 3 and the example below, there are sales attached to customers and customers are placed together in customer groups with no more than ten customers in each group.

 

 

 

Figure 3: History on associations example

 

If the customer is no longer in business and needs to be deleted from the system, in the case of a sale, we do not want the deletion of the customer to orphan the sales records. Neither do we want to inactivate the association of the sale record since that would allow the sale objects to be attached to an additional customer.

 

However, in the customer group, the opposite situation exists. When an individual customer is inactivated or deleted, that customer's position in the customer group should become available to another customer. Sometimes the deletion of an object should also invalidate or delete its associations, but not in every case.

 

In the engine, associations are stored independently of the objects that they connect. Cardinality rules are validated by counting valid records in the association table. Therefore, when a command is received to delete a history object, no cardinality rules are validated, since the engine cannot determine whether cardinality restrictions should be observed in any given situation. When deleting history objects, the application must determine (on a case-by-case basis) whether a particular association should be deleted, invalidated or ignored. The cardinality rules are enforced whenever an attempt is made to invalidate or delete an existing association.
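The customer group example can be sketched as a count over association records. The record layout and function names are assumptions; the only behavior taken from the text is that associations with a non-null end date are ignored when cardinality is checked, and that a group holds no more than ten customers.

```python
from datetime import date


def active_members(associations, group_id):
    """Association records for one group whose EndDate is still null."""
    return [a for a in associations
            if a["group_id"] == group_id and a["EndDate"] is None]


def can_add_member(associations, group_id, max_members=10):
    """Check the upper cardinality bound by counting only active associations."""
    return len(active_members(associations, group_id)) < max_members


# Ten active memberships fill the hypothetical group "G1".
associations = [{"group_id": "G1", "customer_id": n, "EndDate": None} for n in range(10)]
```

Setting an EndDate on one of these records (logical deletion of the association) frees a position in the group, whereas end-dating only the customer object would leave the count unchanged.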

 

UML extensions to support the creation of class tables

As mentioned above, each class in the UML data model is physically instantiated as a table. Each attribute in the class becomes a column. Foreign key relationships are handled using a reference to the master table OID rather than using the logical primary key.

Specialization classes include attributes along the generalization path. For example, the EMP class would include Last Name and First Name columns if EMP is a specialization of PERSON. The UML engine is smart enough to handle an insert or update against the view associated with EMP by inserting, updating or attaching to the appropriate PERSON record.
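The mapping from classes to table columns can be sketched as follows. The dictionary layout, the OID column name, the HIRE_DATE attribute and the DEPT reference are assumptions made for illustration; the rules shown (attributes become columns, foreign keys reference the master table OID and are named after the parent class, specializations include attributes along the generalization path) are the ones described in this section.

```python
def attribute_path(model, class_name):
    """Attributes of a class, with inherited ones first along the generalization path."""
    cls = model[class_name]
    inherited = attribute_path(model, cls["parent"]) if cls["parent"] else []
    return inherited + cls["attributes"]


def table_columns(model, class_name):
    """Column list for the class table: OID, attributes, then one <PARENT>_OID per reference."""
    fk_columns = [f"{master}_OID" for master in model[class_name]["references"]]
    return ["OID"] + attribute_path(model, class_name) + fk_columns


# Hypothetical model fragment: EMP specializes PERSON and references DEPT.
model = {
    "PERSON": {"parent": None, "attributes": ["LAST_NAME", "FIRST_NAME"], "references": []},
    "DEPT": {"parent": None, "attributes": ["DEPT_NAME"], "references": []},
    "EMP": {"parent": "PERSON", "attributes": ["HIRE_DATE"], "references": ["DEPT"]},
}
```

With this fragment, table_columns(model, "EMP") yields OID, LAST_NAME, FIRST_NAME, HIRE_DATE and DEPT_OID.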

We use keywords attached to UML objects to specify various physical column and class names. For example, to express a many-to-many relationship between two classes, you will need an association table between the two classes. If designers do not want to draw an association class, they can simply attach a keyword to declare what the name of the association table will be. Foreign key columns are named using the PARENT CLASS NAME_OID, but can be overridden via a keyword on the association.

Many-to-many associations between abstract classes become complex because an association table between any of the underlying subclasses may be needed as shown in Figure 4.

 

 

 

Figure 4: Many-to-many associations between abstract classes

 

In this case, the designer may want to have as many as twelve association classes, because association classes are needed not only between all of the subclasses but also between the superclasses (an EMP-DEPT association class). How will these tables be named? As a keyword on the many-to-many association, the class pairings to be physically instantiated as tables are specified along with their names. It is beyond the scope of this paper to describe all of the notational nuances that we have used to extend UML.

Domains on Attributes

In addition to the traditional database (MIN/MAX) constraint, within the domain structure, we are also able to support reference tables (STATUS, TYPE) as well as domains validated using a function to support things like Social Security numbers or Canadian postal codes. We have also thought about including format masks but have not yet implemented this functionality.
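The three kinds of domain mentioned above can be sketched as small validator functions. All names are hypothetical, and the postal code pattern is deliberately loose: it checks only the letter-digit-letter, digit-letter-digit shape and not which letters are actually in use.

```python
import re


def range_domain(minimum, maximum):
    """MIN/MAX domain: the value must lie between the two bounds, inclusive."""
    return lambda value: minimum <= value <= maximum


def reference_domain(allowed_values):
    """Reference-table domain (for example STATUS or TYPE): the value must be listed."""
    allowed = frozenset(allowed_values)
    return lambda value: value in allowed


_POSTAL_SHAPE = re.compile(r"[A-Z][0-9][A-Z] ?[0-9][A-Z][0-9]")


def canadian_postal_code(value):
    """Function-validated domain: format-level check only."""
    return _POSTAL_SHAPE.fullmatch(value.upper()) is not None
```

Each validator returns True or False, so an engine could attach any of them to an attribute and call it from a SET ATTRIBUTE style check.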


What Have We Accomplished?

The best ideas from UML, ERDs and Oracle Designer have been combined to take the process one step further, providing a much richer modeling environment and enabling us to better encapsulate all of the relevant business rules associated with a system.

Because all of the validation is handled by the UML engine, the enforcing of complex business rules avoids any mutating table problems.

One example showing how this approach provides a better data model is shown in Figure 5. Note the “delegate” and “supercede” keywords on certain associations.

 

 

Figure 5: Example of UML data model showing “delegate” and “supercede” associations

 

 

 

The company keeps track of people at different levels. If an individual is simply a person, we may only know their name. If they are designated as a “clean person,” we also have contact information (address, phone). If the person is an employee, we also need their tax information, gender, birth date, etc.

In this model, people can have many different roles; however, you must at least be a “clean person” to be a customer and only employees can be department members. Attempting to model these relationships with an ERD would look like Figure 6.

 

 

 

Figure 6: ERD showing associations

This data model is not all that complex. It captures most of the business rules, with the notable exception of the way in which history is implemented in UML. In the UML model, we can explicitly enforce that an individual cannot belong to the same department twice at the same time.

The major difference between the ERD and UML models is in the implementation. How can all of the subclasses be implemented with ERDs? There are many alternatives, none of which are particularly satisfying for this example. You will either end up with one very large PERSON table and a complex application or many small tables with numerous joins. The combined UML/ERD approach is better, not only in the slight improvement of the diagram, but mainly because the physical tables are easier to build against.

UML with some extensions makes for a superior modeling environment to traditional ERDs. You can represent more business rules, in a cleaner format. However, we recognize that much of our affection for using UML results from how we implement the diagrams as much as the diagrams themselves.

About the Author

Dr. Paul Dorsey is the founder and President of Dulcian, Inc. (www.dulcian.com), an Oracle consulting firm specializing in client/server and web custom application development. Paul is co-author with Peter Koletzke of The JDeveloper3 Handbook (2001), Oracle Developer: Advanced Forms & Reports (2000), and The Oracle Designer Handbook (1999), and with Joseph R. Hudicka of Oracle8 Design Using UML Object Modeling (1999), all from Oracle Press. Paul is the Executive Editor of SELECT Magazine. He is the President of the New York Oracle Users’ Group.

 

