Data Model Relationships – CakePHP’s HABTM

For today, let's dive back into some code, or at least some data modeling. When you set up an application that connects to a database, you need to understand the data the application will work with. This is the data that will be added, edited, read and even scrutinized in the application. When looking at the application data, one could easily put everything in one table and make it as flat as possible. We could also normalize it until the cows come home. What is the best choice? My vote is always to plan for what is best for the application and its future. When it comes to data, a more normalized layout generally cuts down on duplicated values and makes the model easier to extend and scale later, even if some queries end up needing more joins. In our little example application, we are going to model the data for an online movie rental inventory. We will use an example film, Gran Torino, to walk through the model.

The data we need for this application includes some basic information: movie title, genre(s), stars, directors, writers, story information, rating, release year and rent price. We could include a lot more data if we really needed to, but for the purpose of this example we will keep it simple. One possible way of modeling this data is to store all of this information in a single table. But then whenever we need to add something else, we have to add columns to that table. For example, suppose that in a few months the company decides to add related titles, sequels, sets, etc. Handling that would require refactoring both the data and the code. So let's split this out.
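
To make the pain of the single-table approach concrete, here is a small sketch using Python's built-in sqlite3 module rather than CakePHP itself. The table name `movies` and its column names are my own assumptions for illustration, not something the application dictates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical flat layout: everything about a movie lives in one row.
conn.execute("""
    CREATE TABLE movies (
        id INTEGER PRIMARY KEY,
        title TEXT,
        genres TEXT,      -- several values crammed into one column
        stars TEXT,
        directors TEXT,
        rating TEXT
    )
""")
conn.execute(
    "INSERT INTO movies (title, genres, stars, directors, rating) "
    "VALUES (?, ?, ?, ?, ?)",
    ("Gran Torino", "Drama, Action", "Clint Eastwood", "Clint Eastwood", "R"),
)

# Finding every Action title means pattern matching inside a text column.
action_titles = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM movies WHERE genres LIKE '%Action%'"
    )
]

# A new requirement (related titles) means changing the table itself.
conn.execute("ALTER TABLE movies ADD COLUMN related_titles TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(movies)")]
```

Notice that the same person's name is typed twice in one row, and that every new business request turns into an ALTER TABLE.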

In the image below, I divided the content into four groups: Title Data, Talent Data, Genre Data and Rating Data.

Starting the data model design

I now have the four main tables, but we need to figure out how they are related. First, let's tackle the Rating Data, as that will be a simple design. I am linking to the IMDB so you can look at the data. The ratings available in the United States, at least the ones we will include, are: G (General Audiences), PG (Parental Guidance suggested), PG-13 (Parents strongly cautioned) and R (Restricted, no one under 17 allowed without a parent, or as I call it, PG-17). Each rating will be housed in this table. We will need an identifier, the rating, the explanation, and some data to track creation and modification. We do not strictly need those last two, but it is good practice to include them if the data is ever going to be modified. Our example film is rated “R”. And since any film title object (Titles) will only ever have one rating (for the sake of this example), there is an easy relation here: each title has one rating. We need to add a foreign key to the Titles table and connect these. One note on CakePHP terms: because the foreign key lives in the Titles table, the association is declared on the Title model as belongsTo Rating; hasOne is for the case where the other table holds the key.

hasOne Relation to Ratings
Title hasOne Rating
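
As a rough sketch of the Ratings link, the snippet below builds the two tables with Python's built-in sqlite3 module instead of CakePHP. The column names (`rating`, `explanation`, `name`) are assumptions of mine; `rating_id`, `created` and `modified` follow the usual CakePHP naming conventions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE ratings (
        id INTEGER PRIMARY KEY,
        rating TEXT NOT NULL,
        explanation TEXT,
        created TEXT DEFAULT CURRENT_TIMESTAMP,
        modified TEXT DEFAULT CURRENT_TIMESTAMP
    );
    -- The foreign key sits on titles: each title points at one rating.
    CREATE TABLE titles (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        rating_id INTEGER REFERENCES ratings(id)
    );
""")

ratings = [
    ("G", "General Audiences"),
    ("PG", "Parental Guidance suggested"),
    ("PG-13", "Parents strongly cautioned"),
    ("R", "Restricted"),
]
conn.executemany(
    "INSERT INTO ratings (rating, explanation) VALUES (?, ?)", ratings
)

# Look up the id of the R rating and attach it to the example film.
r_id = conn.execute("SELECT id FROM ratings WHERE rating = 'R'").fetchone()[0]
conn.execute(
    "INSERT INTO titles (name, rating_id) VALUES (?, ?)", ("Gran Torino", r_id)
)

title_rating = conn.execute("""
    SELECT titles.name, ratings.rating
    FROM titles
    JOIN ratings ON ratings.id = titles.rating_id
""").fetchone()
```

The rating text is stored once in ratings, and any number of titles can point at it through `rating_id`.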

Those were easy to connect. Now we need to tackle the Genres. This is a little more complicated, but we can get through it. The Genres table will house the genres we need to display. This list can be as big or as small as needed. Our example movie is in the “Drama” genre according to IMDB. However, in our application, the business has decided the movie is classified as both Drama and Action. So now a title is going to have many genres. And a genre can belong to many titles: the “Drama” genre may belong to any number of them. So we cannot just add a new column to the Titles table, as that will not satisfy the requirements. We need to add a connecting table, and according to the naming convention of CakePHP, the connecting table is named after the two tables it joins, both plural, in alphabetical order and separated by an underscore.

So we need to add a table named “genres_titles”. It will have an ID, and foreign keys to both tables.

HABTM Genres
Connecting the Genre and Title tables
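
Here is a minimal sketch of that connecting table, again using Python's built-in sqlite3 module rather than CakePHP. The foreign key names `genre_id` and `title_id` follow the CakePHP convention; the `name` columns, the helper functions and the second title are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE titles (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE genres (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Connecting table: one row per (genre, title) pair.
    CREATE TABLE genres_titles (
        id INTEGER PRIMARY KEY,
        genre_id INTEGER NOT NULL REFERENCES genres(id),
        title_id INTEGER NOT NULL REFERENCES titles(id)
    );
    INSERT INTO titles (id, name)
        VALUES (1, 'Gran Torino'), (2, 'Second Example Title');
    INSERT INTO genres (id, name) VALUES (1, 'Drama'), (2, 'Action');
    INSERT INTO genres_titles (genre_id, title_id)
        VALUES (1, 1), (2, 1), (1, 2);
""")


def genres_for(title_name):
    """Return the genre names attached to one title, sorted by name."""
    rows = conn.execute("""
        SELECT genres.name
        FROM genres
        JOIN genres_titles ON genres_titles.genre_id = genres.id
        JOIN titles ON titles.id = genres_titles.title_id
        WHERE titles.name = ?
        ORDER BY genres.name
    """, (title_name,))
    return [row[0] for row in rows]


def titles_for(genre_name):
    """Return the title names attached to one genre, sorted by name."""
    rows = conn.execute("""
        SELECT titles.name
        FROM titles
        JOIN genres_titles ON genres_titles.title_id = titles.id
        JOIN genres ON genres.id = genres_titles.genre_id
        WHERE genres.name = ?
        ORDER BY titles.name
    """, (genre_name,))
    return [row[0] for row in rows]
```

Calling `genres_for("Gran Torino")` walks the relationship from the title side, and `titles_for("Drama")` walks the same rows from the genre side. That two-way traffic through one connecting table is the whole idea behind HABTM.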

Now we are almost done with the HABTM setup. We made it through one of them, and that was a good thing. See, it was not so difficult. Now we need to finish this up and connect the talent table to the title. Talent can be anything. Since the company wants to display the stars of the show, the directors and the writers, we need to be able to connect these. And again, this will require a HABTM relationship. An actor can be in many titles: Clint Eastwood, for one, was not just in Gran Torino, so he may be listed in many titles. And with this movie, he not only stars in it, he directed it. So now we not only need to match up this talent with the title, we have to identify each role correctly. This adds a little complexity, but we can do it.

As you know from the previous example, we need to create a connecting table. Following the same convention, the name would be “talents_titles”. But that still will not solve the issue of identifying Clint Eastwood as both an actor and a director in the title. For that, we can add a new table, “talent_types”. This will be a “lookup table” that houses Star, Director and Writer as values. We can then connect it to the connecting table with a foreign key. This relationship will be a hasMany from talent_types to talents_titles, as a single type such as Star may appear in many rows of the talents_titles table.

HABTM Talent and Title
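
The sketch below shows one way the talent side could look, using Python's built-in sqlite3 module rather than CakePHP. The table names `talents`, `talents_titles` and `talent_types` and the column `talent_type_id` are assumptions that follow the CakePHP naming conventions; the point is that the role is recorded on each row of the connecting table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE titles (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE talents (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Lookup table holding the kinds of roles.
    CREATE TABLE talent_types (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Connecting table, with an extra key saying which role the row is for.
    CREATE TABLE talents_titles (
        id INTEGER PRIMARY KEY,
        talent_id INTEGER NOT NULL REFERENCES talents(id),
        title_id INTEGER NOT NULL REFERENCES titles(id),
        talent_type_id INTEGER NOT NULL REFERENCES talent_types(id)
    );
    INSERT INTO titles (id, name) VALUES (1, 'Gran Torino');
    INSERT INTO talents (id, name) VALUES (1, 'Clint Eastwood');
    INSERT INTO talent_types (id, name)
        VALUES (1, 'Star'), (2, 'Director'), (3, 'Writer');
    -- Same person, same title, two rows: one per role.
    INSERT INTO talents_titles (talent_id, title_id, talent_type_id)
        VALUES (1, 1, 1), (1, 1, 2);
""")

roles = [
    row[0]
    for row in conn.execute("""
        SELECT talent_types.name
        FROM talents_titles
        JOIN talents ON talents.id = talents_titles.talent_id
        JOIN titles ON titles.id = talents_titles.title_id
        JOIN talent_types ON talent_types.id = talents_titles.talent_type_id
        WHERE talents.name = ? AND titles.name = ?
        ORDER BY talent_types.name
    """, ("Clint Eastwood", "Gran Torino"))
]
```

Clint Eastwood is stored once in talents, and his two roles on Gran Torino are simply two rows in the connecting table, each pointing at a different talent type.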

And that is the HABTM design. Using CakePHP's Bake console, you can now bake this up and set up the models. The thing to remember about HABTM is that it is not something to fear. Usually, if a connecting table is needed, you have a HABTM design. Remember to think in human terms when examining the data model: What does this belong to? What does it have? In this case, the Title will have many actors, directors, etc. And the actors, directors, etc. will belong to many titles.