At work we’re using Class Table Inheritance to model the core data structures of our as yet nameless open source CRM. (actually it has a code name, but I don’t like it, so we’ll pretend it’s nameless)
This week as I learned both the name of this pattern, and the SQL to implement it efficiently in MySQL I thought I’d share some notes on what we’ve come up with.
Class Table InheritanceCTI is a pattern where your schema hews closely to the your class hierarchy — you have a table for each class, and object attributes are stored as columns in the table for that class. This is as opposed to Single Table Inheritance, which stores all attributes for every member of an inheritance tree in a single table, or Concrete Table Inheritance which duplicates the inherited fields to each table.
We’re using CTI for a number of reasons, some excellent, and some because that is how our brains work. One key benefit is we get the ability to operate on heterogenous collections of object. (i.e. display a sorted list of contact objects irrespective of whether they are individuals, organizations, etc.) The fact that something as fundamental as polymorphism can be difficult to accomplish using other O/R techniques hilights the general difficulty and impedance mismatch of object to relational mapping.
There are, however a few challenges involved.
The ModelBut quickly a trivial mock up of the data model to give us something to play with. In this model we have 3 classes, a base Contact class, and 2 types of contacts Individual and Organization which inherit from Contact. In the database we model this as a Contact table, with a primary id, and any fields which the Contact the parent class provides. Individual and Organization each get their own table, with class specific attributes, plus the primary id from Contact which is both the primary id, and a foreign key into Contact. For every contact stored in the database there is a record in the Contact table, and a record in either Individual or Organization.
Challenge 1:How do you retrieve your data now that it is broken across multiple tables? The obvious (and obviously wrong) solution would be to first select against the Contact table, examine the resulting data, and select against the appropriate children tables. (remember this example is a drastically simplified data model)
Solution 1:My current solution is to do a 3 way left outer join, and then instantiate the objects out of fields which aren’t null. It can look a bit ugly with large numbers of columns, but is actually pretty simple, and initial benchmarks against a test data set of 1.3 million records suggests it scales nicely.
SELECT c.id as id, c.contactType, i.firstName, i.lastName, o.name as orgName FROM Contact as c LEFT JOIN Individual as i ON i.id = c.id LEFT JOIN Organization as o ON o.id = c.id
(And it’s pretty easy to coax the necessary SQL out of our modified version of DB_DataObject. I’ll demonstrate that in a future entry)
Challenge 2:One of the good reasons (in fact the only concrete one I’ve given you) to go with CTI and its potential added complexity is the desire to sort a mixed list of contact types. But what do you do when you want to sort on an amorphous concept like “Name”? Individual might sort by “Lastname, Firstname”, while we just use “orgName” for an organization. Where is our polymorphism now?
One solution would be select all known records, and do sorting and slicing at the application level. (This is so wrong it makes my head hurt) Another solution would be store a field like “sortName” on Contact, which you could calculate and save on inserts and updates. This is on the right track.
Solution 2:The solution (which Carl clued me into while we waited to get into the zoo), is to use the SQL CASE statement to calculate “sortName” on the fly. Below is our SQL from below with the new logic in green.
SELECT c.id as id, c.contactType, i.firstName, i.lastName, o.name as orgName, CASE WHEN c.contactType = 'Individual' THEN CONCAT(i.lastName, i.firstName) WHEN c.contactType = 'Organization' THEN o.name END as sortName FROM Contact as c LEFT JOIN Individual as i ON i.id = c.id LEFT JOIN Organization as o ON o.id = c.id ORDER by sortName
This simply adds a switch statement to your field list conditionally setting the value of sortName.
There is nothing quite so satisfying as finding the right tool for the right job. I’m currently totally in love with the CASE statement, and think everyone should know about it. (hence this bit of evangelism)
Tip: don’t forget that ‘,’ after orgName, I keep forgetting it and wondering why my SQL isn’t working.
So concludes today’s edition of “Enterprise Development with MySQL” (“Kellan Learns SQL” didn’t sound as impressive). In the near future I hope to get a chance to write up some of the PHP libs we’ve been building/modifying to support object rich web development techniques, completing the picture. (OLamp anyone?)