Without them your app is a wafer in the wind: on data models and persistence
Hey, folks, it’s me again, with some thoughts on data models and why they are almost ridiculously important.
I’d like to begin with a tl;dr-style declaration followed by some elaboration: data models are what allow apps to communicate with themselves over time.
Learning programming vs. developing within a framework
My story in learning development has thus far followed a kind of two-step trajectory. I started out learning basic things involving the usual suspects: arrays and strings and mathematical operations and similar things in a handful of programming languages. This went fairly smoothly. I did lots of tutorials, bought some books, and felt pretty good about how it was all going.
But when it came time to learn how to work within actual web dev frameworks (Rails being my starting point, as I discussed a while back), I found my programming knowledge to be of far less help than I had expected.
Why? Because I had yet to grasp the conceptual side of development.
Even if you gain a solid mastery of a programming language (functions, arrays, module implementation, that kind of thing), there is still a lot that you need to learn before you can hope to produce something meaningful. Seeing development from a bird’s-eye level and understanding how the big pieces interlock is the key to crossing the chasm, so to speak, between “learning programming” and actually building something useful that you would be eager to share with the rest of the world.
For me, the single, prime culprit that kept me on one side of the learning-building chasm was the fact that I really, really got stuck on figuring out the role of data models.
The missing link in the trinity
I knew that models were important to web development because they were part of the model-view-controller trinity, which I had seen discussed in tons of books and tutorials.
Views and controllers were fairly intuitive to me, but I was still stuck with a recurring question in my brain: if you have a controller that dictates what goes into your views as well as when and how views are rendered, where is the gap in this logic that is filled by the model? Aren’t views and controllers enough to make it all hang together?
It wasn’t until the following question dawned on me that things began to click: what role do models play in things that I’m already familiar with?
The first example that popped into my mind ended up being a pretty apt one: Twitter. It’s not surprising that Twitter came to me so quickly, as it’s something that I love dearly and use every single day.
So what kind of data model undergirds Twitter? Clearly, I can’t say exactly, because I won’t be seeing Twitter’s source code any time soon. But I still think that we can construct a rough approximation of the model behind it.
Let’s try to imagine how an incredibly bare-bones version would work. The most fundamental data types at play would be users and tweets. Users sign on, add themselves to the pool of already existing users, produce tweets, and then access one another’s tweets. There’s much more to Twitter than this, but this is the basic schematic engine that makes it go.
Correspondingly, we could call the two data models at play User and Tweet (following the Ruby convention of capitalizing class names). But just having these models side by side isn’t enough. We need to establish a definitive relationship between them. Are they on an equal footing? Is there a hierarchy that needs to be maintained? Are they interdependent?
In this case, it turns out that there is indeed a hierarchy. Without users you have no tweets (unless our Twitter clone started issuing hyper-creepy anonymous tweets). Furthermore, a user may have many tweets, but each tweet has exactly one user. Tweets are thus subordinate to users: it’s a one-to-many relationship, with users on the “one” side.
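Before bringing any framework into it, here’s a rough plain-Ruby sketch of that hierarchy, with no database involved. The names PlainUser and PlainTweet are hypothetical ones I made up for illustration; the only point is that a user holds many tweets while each tweet points back to exactly one user.

```ruby
# Plain-Ruby sketch of a one-to-many relationship (no Rails, no database).
# PlainUser and PlainTweet are hypothetical names used only for illustration.
class PlainUser
  attr_reader :name, :tweets

  def initialize(name)
    @name = name
    @tweets = [] # a user may have many tweets
  end

  # Creating a tweet through the user keeps both sides of the link in sync.
  def tweet(text)
    new_tweet = PlainTweet.new(self, text)
    @tweets << new_tweet
    new_tweet
  end
end

class PlainTweet
  attr_reader :user, :text

  def initialize(user, text)
    @user = user # each tweet has exactly one user
    @text = text
  end
end

dave = PlainUser.new("dave")
dave.tweet("hello world")
dave.tweet("still here")

puts dave.tweets.size            # how many tweets dave owns
puts dave.tweets.first.user.name # walking back from a tweet to its owner
```

Notice that tweets only ever come into existence through a user here, which is one simple way of expressing “without users you have no tweets.”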
Making it more concrete: building our stripped-down Twitter in Rails
If we picked Rails as our platform for building Twitter, we would probably do something like this when specifying the classes that underlie our model:
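class User < ActiveRecord::Base
  has_many :tweets    # a user may have many tweets
end

class Tweet < ActiveRecord::Base
  belongs_to :user    # each tweet has exactly one user
end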
Here, we’re using ActiveRecord to work with these two data models and also specifying their relationship with one another. What would happen, then, if we wanted to add a new data type, re-tweets? We would do that by adding this class:
class Retweet < ActiveRecord::Base
  belongs_to :tweet   # each re-tweet points back to one tweet
end
Then, we would change our Tweet class to appropriately reflect this:
class Tweet < ActiveRecord::Base
  belongs_to :user
  has_many :retweets  # a tweet may be re-tweeted many times
end
We may also want to change our User class like this, so that we could properly subordinate re-tweets to users as well as normal tweets:
class User < ActiveRecord::Base
  has_many :tweets
  has_many :retweets, :through => :tweets  # re-tweets reach users only via tweets
end
This subordinates re-tweets to users but also ensures that re-tweets do not sit at the same level of the classificatory hierarchy as tweets. Users might have many re-tweets, but re-tweets are still tweets, though of a subordinate sort. They just have the distinguishing characteristic of being re-postings of already-existing tweets, and that difference needs to be reflected in our model. Why? Because we might want to construct, for example, a search function that allows us to search for re-tweets alone, or to figure out which users have posted the most re-tweets.
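To give a feel for why that separation pays off, here’s a small sketch of those two queries. It assumes in-memory stand-ins rather than real ActiveRecord models: SketchUser, SketchTweet, and SketchRetweet are hypothetical Structs, and a user’s re-tweets are reached by going through tweets, which is my reading of the `:through => :tweets` association.

```ruby
# Hypothetical in-memory stand-ins for the three models; no Rails involved.
SketchUser    = Struct.new(:name)
SketchTweet   = Struct.new(:user, :text)
SketchRetweet = Struct.new(:tweet) # a re-tweet points at the tweet it belongs to

dave  = SketchUser.new("dave")
pauly = SketchUser.new("pauly")

first_tweet = SketchTweet.new(dave, "first")
third_tweet = SketchTweet.new(pauly, "third")

retweets = [
  SketchRetweet.new(first_tweet),
  SketchRetweet.new(first_tweet),
  SketchRetweet.new(third_tweet)
]

# Searching re-tweets alone: we never have to sift through ordinary tweets.
matches = retweets.select { |rt| rt.tweet.text.include?("first") }

# Counting re-tweets per user by walking re-tweet -> tweet -> user.
counts = retweets.each_with_object(Hash.new(0)) do |rt, tally|
  tally[rt.tweet.user.name] += 1
end

top_user = counts.max_by { |_name, count| count }.first

puts matches.size
puts top_user
```

If re-tweets were lumped in with tweets, both queries would need some extra flag to tell the two apart; keeping them as their own type makes the distinction part of the model itself.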
No models, no complexity
Data models correspond directly to the principle of persistence. An app needs to be set up in such a way that data is produced in a consistent and predictable way that makes it useful to the app later on (and thereby useful to the user). There would be little use for an app that produced data objects haphazardly and then couldn’t make use of them later. It would probably lack even the minimal coherence that would allow it to merit the title “application.”
Models, be they explicit (as in my example) or implicit (as they might be if you were working in a language that did not use classes), are the source of this persistence. If you’re working in an MVC framework, the controller does a lot of stuff with data, too, from creating it to destroying it to communicating with views and telling them what to display. But controllers need to know what they’re dealing with, how it’s structured, and how it relates to the other types of data that they’re working with. A controller without models is simply flying blind.
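To make the persistence point a bit more tangible, here’s a minimal sketch using an implicit model: a plain hash whose agreed-upon keys do the job a class would otherwise do. The keys and the temp file are assumptions of mine for illustration; a real app would more likely lean on a database.

```ruby
require "json"
require "tempfile"

# An implicit model: nothing enforces these keys except our own consistency.
tweet = { "user" => "dave", "text" => "hello world" }

# "Now": write the data out in a predictable shape.
file = Tempfile.new(["tweets", ".json"])
file.write(JSON.generate([tweet]))
file.close

# "Later" (conceptually, a different run of the app): read it back.
loaded = JSON.parse(File.read(file.path))
file.unlink

# Because the shape was predictable, the later code knows where to look.
puts loaded.first["user"]
```

The round trip only works because both ends agree on what a tweet looks like, which is really all a data model is.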
In our Twitter example, if you forget to specify that tweets belong to users, tweets will be anonymous and thus far less interesting. If you’re on the real Twitter and someone tweets “Party at my house at 7:30! Be there!” then it matters a great deal if that person is your friend Dave or Pauly Shore.
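One way to guard against anonymous tweets is to refuse to build a tweet without a user at all. The OwnedTweet class below is a hypothetical plain-Ruby stand-in; in Rails itself this kind of rule would more typically live in a model validation, so treat this as a sketch of the idea rather than the way to do it.

```ruby
# Hypothetical plain-Ruby class: a tweet that cannot exist without a user.
class OwnedTweet
  attr_reader :user, :text

  def initialize(user:, text:)
    raise ArgumentError, "a tweet must belong to a user" if user.nil?

    @user = user
    @text = text
  end
end

signed = OwnedTweet.new(user: "dave", text: "Party at my house!")

begin
  OwnedTweet.new(user: nil, text: "Party at my house!")
rescue ArgumentError => e
  rejected_message = e.message # the anonymous tweet never gets created
end

puts signed.user
puts rejected_message
```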
Models and the application lifecycle
IMHO, the take-home point is this: at no point in the lifecycle of your app should you take the structure of your data models for granted. Your models and the inter-relationships between them may change, as they did for us here when we added the class Retweet. But if something ever goes awry at the model level (improperly specifying your class hierarchy, to give an example), then you can expect that problem to reverberate throughout your entire application, affecting your controller, your database connections, your rendering patterns, and everything else. Due diligence at this layer will save you mountains of trouble.
For developers who work only in the back end, this might not come across as a terribly bold claim. They most likely know already that data models end up determining what kind of database(s) will be optimal for use in conjunction with the app (relational, non-relational, hybrid, etc.), which querying methods are appropriate, which server frameworks are best suited to handling that data, and so on.
Models might seem like dry subject matter, but I think that we’d all be quite remiss to underestimate their importance, or to go too long without stepping back from our projects to make absolutely sure that our model structure is both parsimonious and logically seamless.