MySQL, and the CASE for Class Table Inheritance

August 14th, 2004

At work we’re using Class Table Inheritance to model the core data structures of our as yet nameless open source CRM. (actually it has a code name, but I don’t like it, so we’ll pretend it’s nameless)

This week as I learned both the name of this pattern, and the SQL to implement it efficiently in MySQL I thought I’d share some notes on what we’ve come up with.

Class Table Inheritance

CTI is a pattern where your schema hews closely to the your class hierarchy — you have a table for each class, and object attributes are stored as columns in the table for that class. This is as opposed to Single Table Inheritance, which stores all attributes for every member of an inheritance tree in a single table, or Concrete Table Inheritance which duplicates the inherited fields to each table.

We’re using CTI for a number of reasons, some excellent, and some because that is how our brains work. One key benefit is we get the ability to operate on heterogenous collections of object. (i.e. display a sorted list of contact objects irrespective of whether they are individuals, organizations, etc.) The fact that something as fundamental as polymorphism can be difficult to accomplish using other O/R techniques hilights the general difficulty and impedance mismatch of object to relational mapping.

There are, however a few challenges involved.

The Model

But quickly a trivial mock up of the data model to give us something to play with. In this model we have 3 classes, a base Contact class, and 2 types of contacts Individual and Organization which inherit from Contact. In the database we model this as a Contact table, with a primary id, and any fields which the Contact the parent class provides. Individual and Organization each get their own table, with class specific attributes, plus the primary id from Contact which is both the primary id, and a foreign key into Contact. For every contact stored in the database there is a record in the Contact table, and a record in either Individual or Organization.

Challenge 1:

How do you retrieve your data now that it is broken across multiple tables? The obvious (and obviously wrong) solution would be to first select against the Contact table, examine the resulting data, and select against the appropriate children tables. (remember this example is a drastically simplified data model)

Solution 1:

My current solution is to do a 3 way left outer join, and then instantiate the objects out of fields which aren’t null. It can look a bit ugly with large numbers of columns, but is actually pretty simple, and initial benchmarks against a test data set of 1.3 million records suggests it scales nicely.

SELECT 
  c.id as id, c.contactType,
  i.firstName, i.lastName,
  o.name as orgName
FROM
  Contact as c
LEFT JOIN Individual as i ON i.id = c.id 
LEFT JOIN Organization as o ON o.id = c.id

(And it’s pretty easy to coax the necessary SQL out of our modified version of DB_DataObject. I’ll demonstrate that in a future entry)

Challenge 2:

One of the good reasons (in fact the only concrete one I’ve given you) to go with CTI and its potential added complexity is the desire to sort a mixed list of contact types. But what do you do when you want to sort on an amorphous concept like “Name”? Individual might sort by “Lastname, Firstname”, while we just use “orgName” for an organization. Where is our polymorphism now?

One solution would be select all known records, and do sorting and slicing at the application level. (This is so wrong it makes my head hurt) Another solution would be store a field like “sortName” on Contact, which you could calculate and save on inserts and updates. This is on the right track.

Solution 2:

The solution (which Carl clued me into while we waited to get into the zoo), is to use the SQL CASE statement to calculate “sortName” on the fly. Below is our SQL from below with the new logic in green.

SELECT 
  c.id as id, c.contactType,
  i.firstName, i.lastName,
  o.name as orgName,
CASE 
 WHEN c.contactType = 'Individual' THEN CONCAT(i.lastName, i.firstName)
 WHEN c.contactType = 'Organization' THEN o.name
END as sortName
FROM
  Contact as c
LEFT JOIN Individual as i ON i.id = c.id 
LEFT JOIN Organization as o ON o.id = c.id
ORDER by sortName

This simply adds a switch statement to your field list conditionally setting the value of sortName.

There is nothing quite so satisfying as finding the right tool for the right job. I’m currently totally in love with the CASE statement, and think everyone should know about it. (hence this bit of evangelism)

Tip: don’t forget that ‘,’ after orgName, I keep forgetting it and wondering why my SQL isn’t working.

So concludes today’s edition of “Enterprise Development with MySQL” (“Kellan Learns SQL” didn’t sound as impressive). In the near future I hope to get a chance to write up some of the PHP libs we’ve been building/modifying to support object rich web development techniques, completing the picture. (OLamp anyone?)

Mixing Metaphors

Single, and Concrete TI both have their places, and the good news is you can mix these patterns without much difficulty. Still I’m happy with both the flexibility and “OO-ness” of Concrete Table Inheritance, and once again impressed by the speed, and power of MySQL. (N.B. we’re targeting 4.1.x, but these above examples all work with 4.0.16 and 4.0.20)

Tagged: Uncategorized , , , , , , , , ,

7 Responses to “MySQL, and the CASE for Class Table Inheritance”

  1. Jordi B says:

    Just what I was looking for. Thanks!!

  2. [...] Edit: Well they are cool, but it seems that CakePHP’s paginate functionality doesn’t like my custom findAll() function and likes to spit out warnings and errors. Due to deadlines, I’m going to have to abandon this line of enquiry – but not forever! Once I pass my current milestone, it’ll be something I’ll look at again – maybe even look at Class Table Inheritance as another option. [...]

  3. Alex Barrett says:

    The contactType column in your base table is unnecessary, the same should be achievable with something similar to the following:

    SELECT c.id as id, c.contactType, i.firstName, i.lastName, o.name as orgName FROM Contact as c LEFT JOIN Individual as i ON i.id = c.id LEFT JOIN Organization as o ON o.id = c.id ORDER BY CASE WHEN i.id IS NOT NULL THEN CONCAT(i.lastName, i.firstName) WHEN o.id IS NOT NULL THEN o.name END

  4. kellan says:

    Hey Alex, thanks for the comment, but I’m not sure I see what you’re getting at. (though I do like the technique of moving the CASE statement to the order by for this example)

  5. Interesting article, thanks for sharing. I think Alex’ point is the ‘is not null’ part in the case. The select of ‘c.contactType’ would then indeed be unnecessary (but probably not unwanted).

  6. Thomas says:

    There is no big difference compared to the Single Table Inheritance attempt, is there? To get data out of the split tables you use, you join them back to a single table. What benefits does CTI give you having in mind, you have to join all those tables back to one?

    BTW: The Case-statement gives you a wrong order, sometimes. Image “Tom Roberts” and “Jim Robertson”

    Anyways, good article. Helped me sort my mind :-)

  7. Thanks for bringing me on the right path.

    I just found out, that there exists two other nice statements, which makes the SQL-Statement a bit shorter. IFNULL(probablyIsNull,shouldBeUsedInstead) IF(boolean,trueValue,falseValue)

    more about them: http://dev.mysql.com/doc/refman/5.5/en/control-flow-functions.html