A Commercial Temporal RDBMS

The current database technology that most businesses rely upon today is beginning to show its age as new types of transactional applications are required.  The design and formalization of a relational database management system (RDBMS) during the 1970s is widely credited to E. F. Codd, and is based on the relational model.  This model is strongly suited to maintaining the current state of entities and defining their relationships.  Database professionals will be familiar with the term ACID, which defines a set of rules for ensuring that a relational database always moves from one consistent state to another as transactions are applied.

However, as the business intelligence (BI) revolution marches forward, queries based on point-in-time state are becoming increasingly common.  Whenever data is modified in a traditional RDBMS the following logical operation occurs.  The record in question is deleted from the database and then replaced with a copy that includes the modified attributes.  The previous state is necessarily lost, making it impossible to ever peer into the past and ask questions about the data as it once was.

The need for temporal extensions to the ANSI SQL standard was apparent to some within the database community as early as 1992.  Started by Richard Snodgrass, a committee was formed to pursue the issue and provide guidance on its development.  The new language would to be called TSQL2 – not to be confused with Microsoft’s Transact-SQL language which is also referred to as “T-SQL”.  In October of 2004 the final language specification was made available along with a number of contributing commentaries.  At least two books have been published by groups at the forefront of this research (see The TSQL2 Temporal Query Language and Temporal Data and the Relational Model).  Still, there are no serious commercial products available and only Oracle has made any effort – a minimal effort to be sure – to implement the features described by the specification.  The features that it offers (starting with version 9i) include flashback queries and workspace manager.  These are non-standard features that were created to further distinguish Oracle from its competitors rather than to adhere to the principals of temporal queries.

Based on the work of Snodgrass and company, I envision a database engine and query language that inherently offer temporal abilities.  A database professional would follow current best practices for schema design as if he were working with a traditional RDBMS.  The temporal features would only come alive when, and if, they were needed.  Every record of every table (and in fact, the tables themselves) would be time stamped with a validity period from when the entity was created until a time when it was either deleted or replaced through a standard update statement.  DBAs may not even find it necessary to normalize a schema up to sixth normal form (6NF).

One simple modification to SQL would be the addition of a new clause that I will call AS OF.  Queries against current data state would be formed exactly as they are today.  The AS OF clause would optionally follow the FROM clause and would include a date and time parameter d.  Data would be returned from each of the relations in the FROM clause only where the records were in existence and had not expired on or before d.  Queries where the AS OF clause is omitted would be functionally equivalent to specifying AS OF GETDATE().  For queries where d is a moment in the future the database engine could respond as if GETDATE() were supplied, inferring that if no changes occur up until that point in time the state would be exactly as it is now.  Or, it could return an empty record set as if to mimic null behavior where the future state is currently unknown or not available.

In order to properly produce result sets the system catalog would also need to be temporally aware.  This will allow the AS OF clause to deal with columns as they are added or modified, and tables that are created in or removed from the schema.  I imagine this will be an area for debate as well.  Should the database engine report null for columns that no longer exist or should it be a run-time error?  If it is in fact an error then how can we expect an end user to have knowledge of the schema as it was several years prior?  Worse yet would be the queries against tables that did not exist in the past, joined to tables that currently do exist.  Should the entire result set be empty?

It seems clear that there is a fair amount of research work left to be done.  Still, I’m a little surprised that there aren’t at least a few commercial offerings available.  Almost all of the basic features I would expect of a temporal system could be implemented using current technology.  Without a proper query language and database support a developer would be required to create a corresponding history table for each of the other tables in a schema.  Disk space will have to be planned even more carefully, since the database will grow faster than a typical relational information store.  Thankfully the cost per gigabyte of online media continues to plummet so the financial implications are less than they would have been in 1992.  The corollary is that the value of business intelligence will only increase as more tools and techniques are developed.  Having a true, temporal relational database management system as a back end will contribute to this field immensely.

× Comments are closed.