Time and Time Again: Managing Time in Relational Databases, Part 17 - Physical Implementation of Version Pattern 2

Number of posts : 1850 Age : 61 Registration date : 2008-03-08

In this PDM segment, we will describe the physical implementation of Version Pattern 2. This pattern was described, from a logical point of view, in part 3 of this series. This pattern expands on pattern 1 by adding a logical delete capability.

This pattern is more semantically expressive than Version Pattern 1, just as each pattern described in this series is more expressive than the patterns prior to it. This pattern allows us to keep a record of deleted rows, and thus information about when they were first inserted, last updated and, finally, deleted. A delete request which is carried out as a physical delete loses all this information, removing all evidence, in the online table, that the row ever existed.

To support logical deletes, we need an additional column, either some kind of a delete indicator or else a delete date. A delete indicator would be set to "N" when the row is first inserted, and set to "Y" if and when the row is logically deleted. A delete date would be set to {null} when the row is first inserted, and set to the date of the logical deletion if and when the row is logically deleted.

Remember, in this series when we reference a column as “date” we normally would consider a “date/timestamp” to be more accurate and to provide more flexibility as to when we can change the state of a row. We use dates in our examples to help keep them simple.

We prefer to use a delete date because it provides more information than a delete indicator. If a delete indicator is used, we must decide to either leave the last update date alone, or to put the delete date in that column. If we choose the first option, we have lost the delete date. If we choose the second option, we have overwritten the date of the last update. But if we use a delete date instead of a delete indicator, we can preserve both dates. In addition, as we will show later, it becomes easier and more efficient to query when looking for rows that were active at a given point in the past. Also discussed later in this article is the option of using a special date value rather than {null}.

A variant of Version Pattern 2 would be to add a “delete reason” column. Together with a delete date, this would tell us both when and why rows were logically deleted. For example, a row in a Customer table might have been logically deleted because of no recent orders, bad credit, etc. And, of course, the combination of insert date and delete date tells us how long the row was logically present in the table. (This may not correspond to how long the customer was active, however. It will correspond only if the business chooses to treat row insertion and deletion dates as also representing customer effectivity dates.)

With rows logically rather than physically deleted, it is also possible to maintain referential integrity for rows in dependent tables. Otherwise, a physical deletion would force us either to set foreign keys to {null}, or to cascade delete those dependent rows. In certain types of queries, data from a logically deleted parent row might frequently be usefully joined to nondeleted child rows. This would provide data about the parent row related to those child rows, a relationship that was in effect between the later of the insert dates for the parent and the child rows, and the delete date of the parent row.

Foreign Key Constraints

Most popular database management systems (DBMSs) allow a “Set Null” delete option. With this option, if there are one or more rows in one or more dependent tables that have a foreign key to a row being deleted, then those foreign keys are set to {null}. As just mentioned, while this approach retains the dependent rows, it loses the link to the parent row. So a logical delete allows us to retain more information than a physical delete which uses the Set Null option.

These same DBMSs also provide for a “Cascade” delete option. With this option, if there are one or more rows in one or more dependent tables that have a foreign key to a row being deleted, then those dependent rows are also deleted. If other rows have foreign keys to these dependent rows, then those other rows will also be deleted. Usually, Cascade delete options are avoided in most financial and other mission-critical applications. A Cascade delete retains less information than a Set Null delete against the same set of rows because a Cascade delete removes the dependent rows while a Set Null delete retains the rows and merely sets their foreign keys to {null}. So a logical delete also allows us to retain more information than a physical delete with the Cascade option.

The third common delete option is called “No Action” or “Restrict.” No Action and Restrict are nearly identical, and in DBMSs that support both, the only difference is when the constraint is enforced – before or after other constraints. This constraint is used as a safety valve to prevent the inadvertent deletion of dependents when a parent row is physically deleted and the inadvertent deletion of a parent row that has dependents. Therefore, in this physical implementation of Version Pattern 2, we will use the “Delete No Action” option because no rows should ever be physically deleted, but if someone tries to physically delete a row, this option will prevent the deletion if that row has dependent rows.

This will not, however, prevent the physical deletion of rows that do not have dependents. So, if we really want to prevent the physical deletion of rows in a table, the safest approach would be to revoke the delete permission on the table from the users, or not grant delete authority in the first place. This will force the user to update the logical delete indicator (or logical delete date) in order to perform a business deletion.

A logical delete date or indicator column can help maintain referential integrity if we want to inactivate a parent table row while retaining the dependents’ relationship. For example, we may want to inactivate a customer but retain both the orders they placed and the foreign key relationship. This is helpful for several reasons, including the ability to find the customer name and address of an order that was shipped last year, even though the customer is no longer active.

This approach also gives us the ability to reactivate the customer if conditions change – without having to create a new customer entity occurrence – thus retaining the customer's previous history. But if we do this, we will lose the information that this customer was logically deleted and inactive for a period of time before its re-activation. To handle re-activation without loss of information required a more advanced version pattern.

Figure 1 shows how Version Pattern 2 will look in a physical database.

Figure 1: Version Pattern 2 (see PDF below)

Figure 2: Version 2 Setup (see PDF below)

Starting in this PDM segment we will include a setup section that will show the preparation of the data and chronology of SQL events leading up to the Scenarios.

Figure 3: Version 2 Scenario 2.1 (see PDF below)

Figure 4: Version 2 Follow-Up (see PDF below)

We will also perform some follow-up SQL that shows various ways the data can be queried.

Two Approaches to a Delete Date

In the physical implementation of our remaining versioning patterns, we will not use {null} in delete dates. Instead, we will use the special value “12/31/9999.” Two reasons to use this special value are to simplify queries and to improve performance.

For example, if a delete date is inclusive and nullable, the SQL that looks for active rows at a given point in time may be written as:

“… AND (crt_dt <= ‘01/01/2005’ AND (del_dt >= ‘01/01/2005’ OR del_dt is null)) ”

…using the table facsimile in Figure 5.

Figure 5: Version 2 Follow-Up (see PDF below)

This query will return policies 138 and 139 because they were both active on 01/01/2005.

However, if we used “12/31/9999” for the delete date, and it is inclusive, the SQL may be simply written as:

“… AND (:my-date between crt_dt AND del_dt ) ”

or another way:

“… AND (crt_dt <= :my-date AND del_dt >= :my-date) ”

where :my-date is ‘01/01/2005’ using the table facsimile in Figure 6.

Figure 6: (see PDF below)

This query will also return policies 138 and 139 because they were both active on 01/01/2005.

Using “12/31/9999” instead of {null}, the SQL is clearly simpler to write. In addition, performance will be better because we have removed the “OR” from the optimization criteria. Many DBMSs do not optimize ORs very well, and often will alter the access path to something less efficient compared to similar SQL without the OR. Using “12/31/9999,” we may also decide to put an index on the delete date, usually combined with some other columns such as client_nbr.

Wrap-Up

This article has shown how to implement Version Pattern 2, which builds on Version Pattern 1 by providing a logical delete to use in place of a physical delete. However, updates using this version pattern still lose a great deal of information. This pattern does not retain previous versions of an object, nor can it distinguish between re-activated objects and those which were never deleted. Also, we’ve only discussed one dimension of time, the system dates of the row's creation and deletion. Thus, Version Pattern 2 does not manage bi-temporality. Its dates are dates of database activity. Business meaningful dates – specifically the dates on which a version of an object became effective and ceased being in effect – are not explicitly represented in Version Pattern 2.

In the next several articles, we will proceed to demonstrate how to physically model the more advanced semantics of the higher version patterns. We will also present the SQL needed to maintain and query these tables when these more advanced patterns are used.

» Time and Time Again: Managing Time in Relational Databases, Part 18 - Physical Implementation of Version Pattern 3
» Time and Time Again: Managing Time in Relational Databases Part 16 - Physical Implementation of Version Pattern 1
» Time and Time Again: Managing Time in Relational Databases, Part 9: Completing Version Pattern 5
» Time and Time Again: Managing Time in Relational Databases, Part 8: Version Pattern 5
» Time and Time Again: Managing Time in Relational Databases, Part 12: Foreign Keys in Version Tables