The April 2010 edition of SQL Server Magazine has an article entitled “Efficient Data Management in SQL Server 2008, Part 1“. The article introduces a new way to save storage space by using sparse columns. As described by the author, Don Kiely, sparse columns allow SQL Server to manage storage of data more efficiently if a large number of the rows in the table will contain null entries.
I think it’s fair to assume that most people who consider themselves to be a database professional and not a hobbyist will understand that columns containing mostly null data are indicative of a badly designed database schema! Null, of course, is the concept where an attribute’s value is currently unknown, not available, or does not apply to the associated record. If a given schema has predominantly null values for a column or columns, shouldn’t these attributes be grouped into related sets and moved into separate tables? In fact this is the most efficient way to store null attributes; if the associated row doesn’t exist it will occupy exactly zero bytes. I don’t think you can do much better than that.
Kiely provides an example where an object-relational mapping tool places attributes for inherited classes in a single table. If you use the tool this way of course there is the possibility for many null entries. This isn’t the tool’s fault; it’s the fault of the person using it! He then continues on to say that you could use the tool to map to multiple tables (a-ha!) but that doing so would make the design more complex and harder to maintain. I’m a bit lost on how a horizontally expansive table riddled with null attributes is less confusing than narrow tables that store information related to sets of similar aspects of an entity.
I’ll give Kiely the benefit of the doubt that he wrote the article to explain a new feature of SQL Server 2008, and not to advocate denormalized schema design. The bigger issue here is that Microsoft needs to stop creating features that cater to bad database design. Sparse columns in SQL Server 2008 encourage uninformed users to enable the feature, chalk up the space savings and call it a day. Without an option like this database administrators would be forced to address the underlying problem. By using few, wide tables you are increasing contention for those objects. Increased contention for resources of course leads to lower levels of concurrency. There are many other reasons not to design your tables this way but I think this is one of the most obvious and it’s easy to see the impact in a live environment.