My Case for DTOs

In many of my posts about Grails and Flex integration, I take for granted that I use Data Transfer Objects to transfer data between my Grails backend and my Flex frontend. Put simply, Data Transfer Objects are pure data-holding classes, distinct from the domain entity classes used to store data in the backend. I take it for granted because I’m deeply convinced that it’s the best way to do things, and so far experience has never proved me wrong. But I often get this question in comments or by mail (this is for you, Martijn): why bother creating an entirely separate class structure and copying data from entities to DTOs and back instead of just using entities?

I’ve made my arguments a couple of times across various posts, but I thought it would be nice to sum things up here for future reference.

Where does it come from?

When I first started to work on enterprise applications and ORM-based architectures, it was with a Model-Driven Architecture framework called AndroMDA. AndroMDA was absolutely key in helping me get started with Spring and Hibernate, and I was especially inspired by one passage in their “getting started” tutorial, which I quote here:

Data Propagation Between Layers

In addition to the concepts discussed previously, it is important to understand how data propagates between various layers of an application. Follow along the diagram above as we start from the bottom up.

As you know, relational databases store data as records in tables. The data access layer fetches these records from the database and transforms them into objects that represent entities in the business domain. Hence, these objects are called business entities.

Going one level up, the data access layer passes the entities to the business layer where business logic is performed.

The last thing to discuss is the propagation of data between the business layer and the presentation layer, for which there are two schools of thought. Some people recommend that the presentation layer should be given direct access to business entities. Others recommend just the opposite, i.e. business entities should be off limits to the presentation layer and that the business layer should package necessary information into so-called “value objects” and transfer these value objects to the presentation layer. Let’s look at the pros and cons of these two approaches.

The first approach (entities only, no value objects) is simpler to implement. You do not have to create value objects or write any code to transfer information between entities and value objects. In fact, this approach will probably work well for simple, small applications where the presentation layer and the service layer run on the same machine. However, this approach does not scale well for larger and more complex applications. Here’s why:

  • Business logic is no longer contained in the business layer. It is tempting to freely manipulate entities in the presentation layer and thus spread the business logic in multiple places — definitely a maintenance nightmare. In case there are multiple front-ends to a service, business logic must be duplicated in all these front-ends. In addition, there is no protection against the presentation layer corrupting the entities – intentionally or unintentionally!
  • When the presentation layer is running on a different machine (as in the case of a rich client), it is very inefficient to serialize a whole network of entities and send it across the wire. Take the example of showing a list of orders to the user. In this scenario, you really don’t need to transfer the gory details of every order to the client application. All you need is perhaps the order number, order date and total amount for each order. If the user later wishes to see the details of a specific order, you can always serialize that entire order and send it across the wire.
  • Passing real entities to the client may pose a security risk. Do you want the client application to have access to the salary information inside the Employee object or your profit margins inside the Order object?

Value objects provide a solution for all these problems. Yes, they require you to write a little extra code; but in return, you get a bullet-proof business layer that communicates efficiently with the presentation layer. You can think of a value object as a controlled view into one or more entities relevant to your client application. Note that AndroMDA provides some basic support for translation between entities and value objects, as you will see in the tutorial.

Because of this passage, I started writing all my business services with only data transfer objects (what they call “value objects”) as input and output. And it worked great. Yes, it did require a little bit of coding, especially as I had not discovered Groovy yet, but it was worth the time, for all the following reasons.

The conceptual argument: presentation/storage impedance mismatch

Object-relational mapping is what Joel Spolsky calls a “Leaky Abstraction”. It’s supposed to hide away the fact that your business entities are in fact stored in a relational database, but it forces you to make all sorts of choices because of that very fact. You have to save data in a certain order so as not to break integrity constraints, certain patterns are to be avoided for better query performance, and so on. So whether we like it or not, our domain model is filled with “relational choices”.

Now the way data is presented involves a whole different set of constraints. Data is very often presented in a master/detail format, which means you first display a list of items with only a few fields for each item, and possibly some of those fields are calculated from data that is stored in the database. For example, you may store a country code in your database, but you will display the full country name in the list. And then when the user double-clicks an item, they can see all the fields for that item. This pattern is totally different from how you actually store the data.

So even though some of the fields in your DTOs will be mere copies of their counterparts in the entity, that’s only true for simple String-typed fields. As soon as you start dealing with dates, formatted floats or enum codes, there is some transformation involved, and doing all that transformation on the client side is not always the best option. That is especially true when you have several user interfaces on top of your backend (a Flex app and an iPhone app, for example), in which case you’re better off doing most of these transformations on the server.

In any case, if you change the way you store data, it should have little influence on the way you present that same data, and vice versa. This decoupling is very important to me.
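To make the country-code and date examples concrete, here is a minimal Java sketch of a server-side mapping from an entity to a list-row DTO. All the names (`Customer`, `CustomerRow`, `CustomerMapper`) and the `dd/MM/yyyy` display format are assumptions made up for illustration, and `Customer` is only a stand-in for a real persistent entity.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical stand-in for a persistent entity: data shaped for storage.
// (A real Hibernate entity would be a mutable mapped class.)
record Customer(long id, String name, String countryCode, LocalDate signupDate) {}

// Hypothetical DTO: data shaped for display, already formatted.
record CustomerRow(long id, String name, String country, String signedUp) {}

final class CustomerMapper {
    // Assumed display format; pick whatever your UI needs.
    private static final DateTimeFormatter DISPLAY_DATE =
            DateTimeFormatter.ofPattern("dd/MM/yyyy");

    static CustomerRow toRow(Customer c) {
        // Turn the stored ISO country code into a readable English name.
        String country = new Locale.Builder()
                .setRegion(c.countryCode())
                .build()
                .getDisplayCountry(Locale.ENGLISH);
        return new CustomerRow(c.id(), c.name(), country,
                c.signupDate().format(DISPLAY_DATE));
    }
}
```

Every client, whether Flex or anything else, then receives the same ready-to-display values, and the storage format can change without touching the clients.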

The bandwidth argument: load just the data you need

In the master/detail use case, when you display the list of items, you just need a subset of the fields from your entities, not all of them. And even if you’re using Hibernate on the backend with lazy loading enabled, the plain fields of each entity are still initialized and transferred over the wire. So if you use entity classes for data transfer, you will end up transferring a whole bunch of data that may never be used. That might not matter much for hundreds of records, but it starts being a problem with thousands of records, especially when there is some parsing involved. The less data you transfer, the better.
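As a rough sketch of what “just the data you need” can look like, the following Java example maps full orders to slim summaries for the list view. It borrows the order number, date and total from the AndroMDA quote; the class names and the other entity fields are hypothetical.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.List;

// Hypothetical stand-ins for persistent entities, with all their details.
record OrderLine(String product, int quantity, BigDecimal unitPrice) {}
record Order(String number, LocalDate date, String shippingAddress,
             String notes, List<OrderLine> lines) {}

// Hypothetical list-view DTO: only what one row of the master list shows.
record OrderSummary(String number, LocalDate date, BigDecimal total) {}

final class OrderSummaries {
    static OrderSummary toSummary(Order o) {
        // The total is computed on the server; the lines never leave it.
        BigDecimal total = o.lines().stream()
                .map(l -> l.unitPrice().multiply(BigDecimal.valueOf(l.quantity())))
                .reduce(BigDecimal.ZERO, BigDecimal::add);
        return new OrderSummary(o.number(), o.date(), total);
    }

    static List<OrderSummary> toSummaries(List<Order> orders) {
        return orders.stream().map(OrderSummaries::toSummary).toList();
    }
}
```

The address, notes and order lines stay on the server until the user actually opens one order, at which point a separate detail DTO can carry them.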

The security argument: show only the data you want to show

Let’s say you’re displaying a list of users, and in the database, each user has a credit card number. When you display that list, you probably don’t want everyone to see the credit card numbers. You might want to expose this data only in the detail view, and only to users with certain privileges. DTOs allow you to tailor your API to expose just the data you want to expose.
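One possible way to express this in code, sketched in Java with hypothetical names (`User`, `UserListItem`, `UserDetail`, and a `viewerMaySeePayment` flag standing in for whatever privilege check your application uses):

```java
// Hypothetical stand-in for a persistent entity holding sensitive data.
record User(long id, String name, String email, String creditCardNumber) {}

// List-view DTO: the credit card number is simply not part of its shape.
record UserListItem(long id, String name) {}

// Detail-view DTO: creditCardNumber is null when the viewer lacks the privilege.
record UserDetail(long id, String name, String email, String creditCardNumber) {}

final class UserViews {
    static UserListItem toListItem(User u) {
        return new UserListItem(u.id(), u.name());
    }

    static UserDetail toDetail(User u, boolean viewerMaySeePayment) {
        // The decision is made on the server, before anything is serialized.
        String card = viewerMaySeePayment ? u.creditCardNumber() : null;
        return new UserDetail(u.id(), u.name(), u.email(), card);
    }
}
```

Because the list DTO has no field for the card number at all, no amount of client-side inspection can reveal it.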

The error-prone argument: argh! Yet another LazyInitializationException!

Of course there are associations between your business entities, and by default, those associations are lazy-loaded, which means they are not initialized until you actually access them. So if you just load a bunch of instances from your entity manager and send them over to your client, the client might end up with null collections, or the serializer may touch an uninitialized association after the session is closed and trigger a LazyInitializationException. Of course you can always pay attention, or use some tricks to initialize associations up to a certain level before you send your data, but this process is not automatic and it’s very error-prone. As for using things like dpHibernate, I think it just adds too much complexity and uncontrolled server requests.

The laziness argument: Come on! It’s not that hard!

I think that most of the time, the real reason why people don’t want to use DTOs is that they’re lazy. Creating new classes, maintaining code that does “almost” the same as existing code, adding code to service implementations to copy data back and forth: all of that takes time and effort. But laziness has never been a good reason for ditching a design pattern altogether. Yes, sometimes best practices force us to do more work for the sake of the maintainability and robustness of our code, and for me the solution is certainly not to skip the whole practice, but to find the best tools to minimize the added work. With its property support and collection closures, Groovy makes creating, maintaining and populating DTOs as simple and fast as it can be. AndroMDA had converters. There are even DTO-mapping frameworks like Dozer to help you. No excuse for laziness.

For me, all the reasons above largely outweigh the added work of maintaining a parallel DTO structure.

Now of course, this is a very opinionated topic and you will probably have a different view. So all your comments are welcome, as long as they remain constructive and well argued.