My Case for DTO’s

In many of my posts about Grails and Flex integration, I take for granted that I use Data Transfer Objects to transfer data between my Grails backend and my Flex frontend. Put simply, Data Transfer Objects are pure data-container classes, distinct from the domain entity classes used to store data in the backend. I take it for granted because I’m deeply convinced that it’s the best way to do things and, so far, experience has never proved me wrong. But I often get this question in comments or by mail (this one is for you, Martijn): why bother creating an entirely separate class structure and copying data from entities to DTO’s and back, instead of just using entities?

I’ve expressed my arguments a couple of times across various posts, but I thought it would be nice to sum things up here for future reference.

Where does it come from?

When I first started working on enterprise applications and ORM-based architectures, it was with a Model-Driven Architecture framework called AndroMDA. AndroMDA was absolutely key in helping me get started with Spring and Hibernate, and I was especially inspired by one paragraph in its “getting started” tutorial, which I quote here:

Data Propagation Between Layers

In addition to the concepts discussed previously, it is important to understand how data propagates between various layers of an application. Follow along the diagram above as we start from the bottom up.

As you know, relational databases store data as records in tables. The data access layer fetches these records from the database and transforms them into objects that represent entities in the business domain. Hence, these objects are called business entities.

Going one level up, the data access layer passes the entities to the business layer where business logic is performed.

The last thing to discuss is the propagation of data between the business layer and the presentation layer, for which there are two schools of thought. Some people recommend that the presentation layer should be given direct access to business entities. Others recommend just the opposite, i.e. business entities should be off limits to the presentation layer and that the business layer should package necessary information into so-called “value objects” and transfer these value objects to the presentation layer. Let’s look at the pros and cons of these two approaches.

The first approach (entities only, no value objects) is simpler to implement. You do not have to create value objects or write any code to transfer information between entities and value objects. In fact, this approach will probably work well for simple, small applications where the presentation layer and the service layer run on the same machine. However, this approach does not scale well for larger and more complex applications. Here’s why:

  • Business logic is no longer contained in the business layer. It is tempting to freely manipulate entities in the presentation layer and thus spread the business logic in multiple places — definitely a maintenance nightmare. In case there are multiple front-ends to a service, business logic must be duplicated in all these front-ends. In addition, there is no protection against the presentation layer corrupting the entities – intentionally or unintentionally!
  • When the presentation layer is running on a different machine (as in the case of a rich client), it is very inefficient to serialize a whole network of entities and send it across the wire. Take the example of showing a list of orders to the user. In this scenario, you really don’t need to transfer the gory details of every order to the client application. All you need is perhaps the order number, order date and total amount for each order. If the user later wishes to see the details of a specific order, you can always serialize that entire order and send it across the wire.
  • Passing real entities to the client may pose a security risk. Do you want the client application to have access to the salary information inside the Employee object or your profit margins inside the Order object?

Value objects provide a solution for all these problems. Yes, they require you to write a little extra code; but in return, you get a bullet-proof business layer that communicates efficiently with the presentation layer. You can think of a value object as a controlled view into one or more entities relevant to your client application. Note that AndroMDA provides some basic support for translation between entities and value objects, as you will see in the tutorial.

Because of this paragraph, I started writing all my business services with only data transfer objects (what they call “value objects”) as input and output. And it worked great. Yes, it did require a little bit of coding, especially as I had not discovered Groovy yet, but it was worth the time, for all the following reasons.

The conceptual argument: presentation/storage impedance mismatch

Object-relational mapping is what Joel Spolsky calls a “Leaky Abstraction”. It’s supposed to hide away the fact that your business entities are in fact stored in a relational database, but it forces you to make all sorts of choices because of that very fact. You have to save data in a certain order so as not to break integrity constraints, certain patterns have to be avoided for better query performance, and so on and so forth. So whether we like it or not, our domain model is filled with “relational choices”.

Now, the way data is presented involves a whole different set of constraints. Data is very often presented in a master/detail format, which means you first display a list of items, with only a few fields for each item, and possibly some of those fields are calculated from data stored in the database. For example, you may store a country code in your database, but you will display the full country name in the list. Then, when the user double-clicks an item, he can see all the fields for that item. This pattern is totally different from how you actually store the data.
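To make the country-code example concrete, here is a minimal Java sketch of a list-item DTO whose display field is computed on the server. The Customer and CustomerListItem classes and their fields are hypothetical. Resolving the name through java.util.Locale is only one possible lookup; a real application might use its own reference table instead.

```java
import java.util.Locale;

// Hypothetical entity: the database only stores the ISO 3166 country code.
class Customer {
    final long id;
    final String name;
    final String countryCode; // e.g. "FR"
    final String email;       // detail-only field, not needed in the list

    Customer(long id, String name, String countryCode, String email) {
        this.id = id;
        this.name = name;
        this.countryCode = countryCode;
        this.email = email;
    }
}

// Hypothetical list-item DTO: only what the master list shows, plus a computed field.
class CustomerListItem {
    final long id;
    final String name;
    final String countryName; // derived on the server, never stored

    CustomerListItem(long id, String name, String countryName) {
        this.id = id;
        this.name = name;
        this.countryName = countryName;
    }

    // Copies the list fields and resolves the country code to an English display name.
    static CustomerListItem from(Customer c) {
        Locale region = new Locale.Builder().setRegion(c.countryCode).build();
        return new CustomerListItem(c.id, c.name, region.getDisplayCountry(Locale.ENGLISH));
    }
}
```

The detail view would use a different, wider DTO. The list DTO deliberately has no email field.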

So even though some of the fields in your DTO’s will be mere copies of their counterparts in the entity, that’s only true for simple String-typed fields. As soon as you start dealing with dates, formatted floats or enum codes, there is some transformation involved. Doing all that transformation on the client side is not always the best option, especially when you have several user interfaces on top of your backend (a Flex app and an iPhone app, for example). In that case, you’re better off doing most of these transformations on the server.
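Here is a minimal Java sketch of such server-side transformations. It assumes a hypothetical OrderSummary DTO and OrderStatus enum, neither of which comes from the post. The enum code becomes a label, the timestamp becomes an ISO-8601 date string (UTC is assumed), and the amount becomes a fixed two-decimal string. Every client therefore receives the same ready-to-display values.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.time.Instant;
import java.time.ZoneOffset;

// Hypothetical enum: the persisted code ("A", "S") is not what users should see.
enum OrderStatus {
    ACTIVE("A", "Active"),
    SUSPENDED("S", "Suspended");

    final String code;
    final String label;

    OrderStatus(String code, String label) {
        this.code = code;
        this.label = label;
    }

    static OrderStatus fromCode(String code) {
        for (OrderStatus s : values()) {
            if (s.code.equals(code)) {
                return s;
            }
        }
        throw new IllegalArgumentException("Unknown status code: " + code);
    }
}

// Hypothetical DTO whose values are already transformed for every client.
class OrderSummary {
    final String statusLabel; // enum code -> human-readable label
    final String orderDate;   // timestamp -> ISO-8601 date (UTC assumed)
    final String totalAmount; // decimal -> fixed two-decimal string

    OrderSummary(String statusLabel, String orderDate, String totalAmount) {
        this.statusLabel = statusLabel;
        this.orderDate = orderDate;
        this.totalAmount = totalAmount;
    }

    static OrderSummary from(String statusCode, Instant placedAt, BigDecimal total) {
        return new OrderSummary(
                OrderStatus.fromCode(statusCode).label,
                placedAt.atZone(ZoneOffset.UTC).toLocalDate().toString(),
                total.setScale(2, RoundingMode.HALF_UP).toPlainString());
    }
}
```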

In any case, if you change the way you store data, it should not have much influence on the way you present the same data, and vice versa. This decoupling is very important to me.

The bandwidth argument: load just the data you need

In the master/detail use case, when you display the list of items, you only need a subset of the fields from your entities, not all of them. And even if you’re using Hibernate on the backend with lazy loading enabled, the simple fields are still initialized and transferred over the wire. So if you use entity classes for data transfer, you will end up transferring a whole bunch of data that may never be used. It might not matter much for hundreds of records, but it starts to become a problem with thousands of records, especially when there is some parsing involved. The less data you transfer, the better.
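Here is a rough sketch of the bandwidth point. The hypothetical service below keeps full Order entities, with a Map standing in for the database. It hands the list screen only the three fields that the AndroMDA quote mentions: order number, date and total. All class and method names are illustrative assumptions.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical entity: far more data than a list screen needs.
class Order {
    final String number;
    final LocalDate date;
    final BigDecimal total;
    final String shippingAddress;
    final List<String> lines; // one entry per ordered product

    Order(String number, LocalDate date, BigDecimal total, String shippingAddress, List<String> lines) {
        this.number = number;
        this.date = date;
        this.total = total;
        this.shippingAddress = shippingAddress;
        this.lines = lines;
    }
}

// Hypothetical list-item DTO: the three fields the master list displays.
class OrderListItem {
    final String number;
    final LocalDate date;
    final BigDecimal total;

    OrderListItem(String number, LocalDate date, BigDecimal total) {
        this.number = number;
        this.date = date;
        this.total = total;
    }
}

// Hypothetical service: the list call ships only slim DTOs over the wire.
class OrderService {
    private final Map<String, Order> store; // stands in for the database

    OrderService(Map<String, Order> store) {
        this.store = store;
    }

    List<OrderListItem> listOrders() {
        return store.values().stream()
                .map(o -> new OrderListItem(o.number, o.date, o.total))
                .collect(Collectors.toList());
    }
}
```

When the user opens one order, a separate call can return the full details for that single record.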

The security argument: show only the data you want to show

Let’s say you’re displaying a list of users, and in the database, each user has a credit card number. Of course, when you display a list of users, you might not want everyone to see the credit card numbers. You might want to expose this data only in the detail view, and only to users with certain privileges. DTO’s allow you to tailor your API to expose just the data you need.
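Here is one possible shape for this, sketched in Java with hypothetical User, UserListItem and UserDetail classes. The list DTO simply has no card field. The detail DTO only fills that field in for privileged viewers. Masking the number down to its last four digits is an extra assumption of this sketch; the post does not prescribe it.

```java
// Hypothetical entity as stored in the database.
class User {
    final long id;
    final String name;
    final String creditCardNumber;

    User(long id, String name, String creditCardNumber) {
        this.id = id;
        this.name = name;
        this.creditCardNumber = creditCardNumber;
    }
}

// Hypothetical list DTO: the card number is simply not part of its shape.
class UserListItem {
    final long id;
    final String name;

    UserListItem(long id, String name) {
        this.id = id;
        this.name = name;
    }

    static UserListItem from(User u) {
        return new UserListItem(u.id, u.name);
    }
}

// Hypothetical detail DTO: the card is only filled in (masked) for privileged viewers.
class UserDetail {
    final long id;
    final String name;
    final String creditCard; // null when the viewer may not see it

    UserDetail(long id, String name, String creditCard) {
        this.id = id;
        this.name = name;
        this.creditCard = creditCard;
    }

    static UserDetail from(User u, boolean viewerIsPrivileged) {
        return new UserDetail(u.id, u.name, viewerIsPrivileged ? mask(u.creditCardNumber) : null);
    }

    // Keeps only the last four digits, e.g. "**** 1111".
    static String mask(String number) {
        String digits = number.replaceAll("\\D", "");
        return "**** " + digits.substring(Math.max(0, digits.length() - 4));
    }
}
```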

The error-prone argument: argh! Yet another LazyInitializationException!

Of course there are associations between your business entities, and by default, those associations are lazy-loaded, which means they are not initialized until you actually access them. So if you just load a bunch of instances from your entity manager and send them over to your client, the client might end up with null collections. Of course, you can always pay attention, or use some tricks to initialize associations up to a certain level before you send your data, but this process is not automatic and it’s very error-prone. As for using things like dpHibernate, I think it just adds too much complexity and too many uncontrolled server requests.
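The following is a self-contained simulation, not Hibernate code. LazyList and Session are hypothetical stand-ins that mimic a lazy association failing once the session is closed. The sketch illustrates the DTO remedy: the service copies every association the client needs into a plain collection before the session ends.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Simulation only: a stand-in for an ORM session that can be closed.
class Session {
    private boolean open = true;

    boolean isOpen() { return open; }

    void close() { open = false; }
}

// Simulation only: a collection loaded on first access, which fails once the session is closed.
class LazyList<T> {
    private final Session session;
    private final Supplier<List<T>> loader;
    private List<T> loaded;

    LazyList(Session session, Supplier<List<T>> loader) {
        this.session = session;
        this.loader = loader;
    }

    List<T> get() {
        if (loaded == null) {
            if (!session.isOpen()) {
                throw new IllegalStateException("lazy collection accessed after the session was closed");
            }
            loaded = loader.get();
        }
        return loaded;
    }
}

// Hypothetical entity with a lazy association.
class Department {
    final String name;
    final LazyList<String> employeeNames;

    Department(String name, LazyList<String> employeeNames) {
        this.name = name;
        this.employeeNames = employeeNames;
    }
}

// Hypothetical DTO holding a plain, fully populated copy of the association.
class DepartmentDto {
    final String name;
    final List<String> employeeNames;

    DepartmentDto(String name, List<String> employeeNames) {
        this.name = name;
        this.employeeNames = Collections.unmodifiableList(new ArrayList<>(employeeNames));
    }
}

final class DepartmentService {
    private DepartmentService() {}

    // Reads every association the client needs while the session is open, then closes it,
    // the way a transactional service method would.
    static DepartmentDto load(Session session, Department department) {
        try {
            return new DepartmentDto(department.name, department.employeeNames.get());
        } finally {
            session.close();
        }
    }
}
```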

The laziness argument: Come on! It’s not that hard!

I think that most of the time, the real reason why people don’t want to use DTO’s is that they’re lazy. Creating new classes, maintaining code that does “almost” the same as existing code, adding some code to service implementations to copy data back and forth: all of that takes time and effort. But laziness has never been a good reason for ditching a design pattern altogether. Yes, sometimes best practices force us to do more work for the sake of the maintainability and robustness of our code. For me, the solution is certainly not to shortcut the whole practice, but to find the best tools to minimize the added work. With its property support and collection closures, Groovy makes creating, maintaining and feeding DTO’s as simple and fast as it can be. AndroMDA had converters. There are even DTO-mapping frameworks like Dozer to help you. No excuse for laziness.
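To give an idea of what such tooling automates, here is a minimal reflection-based copier in Java. It is a hypothetical helper. It is not Dozer's API and not what AndroMDA generates. For each non-final, non-static field declared on the target, it copies the source field with the same name and a compatible type, so the DTO's shape alone decides what gets copied. Inherited fields, nested objects and type conversions are deliberately left out.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not Dozer's API: copies same-named, type-compatible fields
// from source to target, considering only fields declared directly on each class.
final class FieldCopier {
    private FieldCopier() {}

    static <T> T copy(Object source, T target) {
        Map<String, Field> sourceFields = new HashMap<>();
        for (Field f : source.getClass().getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers())) {
                sourceFields.put(f.getName(), f);
            }
        }
        try {
            for (Field dst : target.getClass().getDeclaredFields()) {
                int mods = dst.getModifiers();
                if (Modifier.isStatic(mods) || Modifier.isFinal(mods)) {
                    continue;
                }
                Field src = sourceFields.get(dst.getName());
                if (src == null || !dst.getType().isAssignableFrom(src.getType())) {
                    continue; // no compatible counterpart: leave the DTO field untouched
                }
                src.setAccessible(true);
                dst.setAccessible(true);
                dst.set(target, src.get(source));
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException("Field copy failed", e);
        }
        return target;
    }
}

// Hypothetical entity and DTO: the DTO's shape decides what gets copied.
class Employee {
    String name;
    int age;
    double salary;
}

class EmployeeListItem {
    String name;
    int age;
}
```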

For me, all the reasons above largely outweigh the added work of maintaining a parallel DTO structure.

Of course, this is a very opinionated topic and you will probably have a different view. So all your comments are welcome, as long as they remain constructive and well argued.

16 comments

  1. Thank you very much for elaborating on this subject. The focus these days seems to be entirely on the DRY principle. This argues against using DTO’s, because of the duplication in validation logic that DTO’s entail. There are a number of frameworks out there that support this position: Grails, Roo, Ruby on Rails (or even Oracle ADF, for that matter).
    Yes, it takes less time to get things running and, yes, there is less maintenance required – at least initially.
    And as I readily subscribe to the arguments that Sébastien makes, I can only wonder what the DRY camp has to say about this.

    1. For the validation code, in the case of Grails and Flex, you have to duplicate your validation anyway since ActionScript and Groovy are totally different. But I discovered recently that you can mark your DTO classes as @Validateable, use Grails validation on them, and still use those classes for Flash Builder client stub generation without any problem. The next step would be to have a good standard validation framework in ActionScript and to modify the DCD wizard to generate ActionScript validation metadata based on Grails metadata. I’d rather go that way (the “tooling” way) than the “DRY-only” way.

  2. Thanks for this article about DTO.

    Could you please explain a bit how Groovy makes it as simple and fast?

    My opinion:
    Presentation/storage impedance mismatch: UI frameworks manage transformation (e.g. JSF converters).
    Security: in my opinion, this is only valid for a public API.
    Bandwidth: in my opinion, this is only valid for a web service (WS). You could tune Hibernate to return only some values.
    LazyInitializationException: there are patterns to avoid it.

    With Java I find DTOs cumbersome and counterproductive, although I agree DTOs are useful in some use cases.

    1. The first thing is that Groovy supports properties. So no need to create a private field, a public getter and a public setter for each of the properties in a class. It makes DTO classes much smaller (exactly like entities are smaller) and easier to maintain.

      The second thing is that Groovy makes it much easier to copy data thanks to closures and methods like collect, as well as dynamic constructors. As an exercise, try to write the Java equivalent of the following code, which copies data from Customer entities into CustomerListItem DTO’s:

      return Customer.findAll().collect {
          new CustomerListItem(
              name: it.name,
              // dates are typically transformed because they are not handled the same way in all languages
              birthDate: it.birthDate.getTime()
          )
      }
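For comparison, here is a Java sketch of the same copy using Java 8 streams, which postdate this discussion. Customer and CustomerListItem are hypothetical plain classes. A list passed in as a parameter stands in for GORM's Customer.findAll(). The mapping itself is now a one-liner, but the getters and constructors that Groovy generates for free still have to be written by hand.

```java
import java.util.Date;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical entity standing in for the Grails domain class.
class Customer {
    private final String name;
    private final Date birthDate;

    Customer(String name, Date birthDate) {
        this.name = name;
        this.birthDate = birthDate;
    }

    String getName() { return name; }

    Date getBirthDate() { return birthDate; }
}

// Hypothetical DTO: the date travels as epoch milliseconds, as with birthDate.getTime().
class CustomerListItem {
    private final String name;
    private final long birthDate;

    CustomerListItem(String name, long birthDate) {
        this.name = name;
        this.birthDate = birthDate;
    }

    String getName() { return name; }

    long getBirthDate() { return birthDate; }
}

final class CustomerListing {
    private CustomerListing() {}

    // The stream pipeline plays the role of Groovy's collect closure.
    static List<CustomerListItem> toListItems(List<Customer> customers) {
        return customers.stream()
                .map(c -> new CustomerListItem(c.getName(), c.getBirthDate().getTime()))
                .collect(Collectors.toList());
    }
}
```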

      Regarding your comments:
      – How do UI frameworks handle data combination/separation? Let’s say you store a Java Date but you want to show it as separate date and time fields in your UI?
      – Every API is public as soon as you expose it on the Internet. Anyone can decompile your code/reverse engineer your network data to try and reproduce server calls.
      – Even if you tune Hibernate to return only some values, the fields are still there, even if they are not filled, so users of your API will expect to find data in them.
      – What pattern are you referring to? The infamous open-session-in-view pattern? Maintaining a Hibernate session over the wire? Now that’s what I call complex and cumbersome, don’t you think?
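Here is a minimal Java sketch of the server-side answer to the date/time question. It assumes a hypothetical AppointmentDto and assumes that the server knows the user's time zone. One stored Instant is split into separate date and time strings before it leaves the backend.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical DTO: one stored timestamp becomes two display fields.
class AppointmentDto {
    final String date; // ISO-8601 local date, e.g. "2010-11-22"
    final String time; // 24-hour local time, e.g. "14:05"

    AppointmentDto(String date, String time) {
        this.date = date;
        this.time = time;
    }

    // Splits the instant in the user's time zone (assumed to be known on the server).
    static AppointmentDto from(Instant storedAt, ZoneId userZone) {
        LocalDateTime local = LocalDateTime.ofInstant(storedAt, userZone);
        return new AppointmentDto(
                local.format(DateTimeFormatter.ISO_LOCAL_DATE),
                local.format(DateTimeFormatter.ofPattern("HH:mm")));
    }
}
```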

  3. Have you played with command objects as DTOs? It’s something I’ve been kicking around, where Grails might act as a broker to a different persistence layer (like a legacy EJB backend).

  4. Thank you for your explanation.

    >- How do UI frameworks handle data combination/separation? Let’s say you store a java Date but you want to show it as a separate Date and Time in your UI?

    With JSF you can do it with a converter that takes a date and time pattern (i.e. yyyyMMdd hh:mm). That only works for displaying the date and time, though. Input is different unless you have a specific date-time component.

    > – Every API is public as soon as you expose it on the Internet. Anyone can decompile your code/reverse engineer your network data to try and reproduce server calls.

    And you can have a private API used only inside one project. If someone uses it outside the project, that is their responsibility.

    > – Even if you tune Hibernate to return only some values, the fields are still there, even if they are not filled, so users of your API will expect to find data in them.

    I agree. But is it really a problem unless you have a table with 100 columns? (Hibernate is a leaky abstraction.)

    > – What pattern are you referring to? The infamous open-session-in-view pattern? Maintaining an Hibernate session over the wire? Now that’s what I call complex and cumbersome, don’t you think?

    Or the conversation pattern.

  5. Just my 2 cents on this debate: I agree that DTOs are great for separating the UI from the DB, and I also agree that DTOs are mostly a copy/paste of the business objects (especially for validation).

    Some technologies avoid using DTOs (like JSF) but others do not.
    Some technologies help with DTOs but others do not.

    So here is the idea I just had: what would be great is to derive different DTOs from a business object with annotations.

    @Entity
    @Dto
    public class Employee {
        @Id
        private String id;
        @NotEmpty
        @Size(max=20)
        private String name;
        @Dto(group="full") // hypothetical: only exported in the "full" DTO
        private float salary;
    }

    And then, I could write something like:

        Employee e = hibernate.get(Employee.class, id);
        return e.asDto();        // will return only id/name
        // or
        return e.asDto("full");  // will return id/name/salary

    I didn’t think this was possible until I saw this link on DZone this morning: http://blog.sanaulla.info/2010/11/22/project-lombok-now-write-less-boilerplate-code-in-java/
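The idea can be prototyped without Lombok, using runtime annotations and reflection. The sketch below is hypothetical throughout: DtoField, Dtos.asDto and this Employee do not come from any existing library. Unannotated fields go into every DTO. Fields tagged with a group only appear when that group is requested. As a simplification, the sketch returns a Map instead of generating a typed DTO class.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical annotation: restricts a field to one named DTO group.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface DtoField {
    String group();
}

final class Dtos {
    private Dtos() {}

    // Default view: only fields without a @DtoField group.
    static Map<String, Object> asDto(Object entity) {
        return asDto(entity, null);
    }

    // Unannotated fields are always included; annotated ones only when their group is requested.
    static Map<String, Object> asDto(Object entity, String group) {
        Map<String, Object> dto = new LinkedHashMap<>();
        try {
            for (Field f : entity.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                    continue;
                }
                DtoField tag = f.getAnnotation(DtoField.class);
                if (tag == null || tag.group().equals(group)) {
                    f.setAccessible(true);
                    dto.put(f.getName(), f.get(entity));
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException("Cannot read field", e);
        }
        return dto;
    }
}

// The commenter's Employee, with the hypothetical annotation in place of @Dto(group="full").
class Employee {
    private final String id;
    private final String name;
    @DtoField(group = "full")
    private final float salary;

    Employee(String id, String name, float salary) {
        this.id = id;
        this.name = name;
        this.salary = salary;
    }
}
```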

  6. I agree that in some designs it is better to use Value Objects. There are many reasons for this. But if you just copy objects when going from one layer to another, you achieve nearly the same thing at a far lower cost.

    I also want to add some points :
    – The UI controls the data it sends to the business layer, so it can mess it up anyway. You can implement many checks (some will argue you must), but Value Objects will not help here.

    – If you use master/detail, or things like that, this design often goes all the way down to the database request (or even to a view or a table). So basically, when you request the “master”, you show the master. You will definitely not compute the master from the details in the business layer, possibly loading millions of rows into main memory to do so. So the representation of your master in the data access layer is likely to be the same as its representation in the UI.

    For the bandwidth argument: hey, most of the time you convert to JSON or XML anyway before sending the data, which gives you a chance to avoid sending useless information. And if you are really concerned about bandwidth, you will not fetch that data from the database in the first place.

    Data transformation for different clients/implementations: different clients WILL need different data formats in the UI anyway. Even with the same client code, the format will change depending on the locale or user preferences. This formatting should be done on the client side. Do you really want your phone app to reload ALL the mails it has already cached because the user changed the language in the preferences, or just chose to see dates in 24-hour format instead of AM/PM? Keep one format for all communications, and let clients that want different formats do the conversion. They’ll have to anyway, and you will not implement 23 different business layers because you have 23 client formats.
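Here is the commenter's alternative, sketched in Java as client-side code. The server always sends one canonical value. This sketch assumes that value is an ISO-8601 UTC timestamp string. The client formats it according to the user's preferences, so switching between 12-hour and 24-hour display needs no new server call. ClientDateFormatter is a hypothetical name.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Client-side sketch: the payload carries one canonical ISO-8601 UTC timestamp,
// and only the rendering depends on the user's preferences.
final class ClientDateFormatter {
    private ClientDateFormatter() {}

    static String display(String isoInstant, ZoneId zone, boolean use24Hour) {
        String pattern = use24Hour ? "HH:mm" : "h:mm a";
        return DateTimeFormatter.ofPattern(pattern, Locale.US)
                .withZone(zone)
                .format(Instant.parse(isoInstant));
    }
}
```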

    Security and credit card numbers: you will not keep credit card numbers in the business layer, in your server memory, anyway. You will not request them from the database. In fact, you will use a very specific pipeline to deal with them. Not a good example.

    Lazy loading: why would you want to use that? It will not work anyway. The user clicks on a result in the browser (maybe 10 minutes later, because he went to the restroom). The transaction is long gone, and so are the value objects and the business objects. Maybe it is not even the same server that will serve the request, because you have clustering or because the other one failed, or because you are in the cloud and you don’t control which servers are up and when they are launched or shut down. The trend is toward stateless architecture. What the UI needs is the details of this item. And even when it does work, lazy loading (in the ORM sense) will tend to load more than you really want. In fact, the client will not tell you “I want the details of this item because, remember, I requested all the items with this property, paginated up to this page of results (with 10 per page), and the user clicked on the 5th row.” The UI will just request the item with ID XYZ.

  7. I stumbled across this post while grappling with whether to expose entity classes or DTOs to the clients of the service (no UI) that I’m creating. Exposing entity classes seemed inherently like a bad idea, as I felt I was exposing implementation details, which seemed, as you put it, a bit lazy on my part. Users may think: heck, if this guy is so lazy that he can’t create simple DTOs, what other corners has he been cutting?

    So, in my opinion, you hit the nail on the head, and “you had me at lazy”. As it turns out, I’m predominantly using Groovy, so I have even fewer excuses.

    Thanks for the post, now for the elbow grease, but first, a cuppa…

  8. None of these arguments have convinced me. I’m currently working on two projects: one with heavy DTO usage and textbook layer separation, and a second one following a Domain-Driven Design model with heavy usage of domain objects everywhere.

    Here is my point of view based on these projects.

    The DTO project is much nicer for a bunch of indifferent developers and other people (like testers). Everyone can find work in this project, because it involves maybe 5% creativity and 95% repetitive work, struggling with always the same problems, like tossing data between object types and layers. In terms of development pleasure, in a nutshell, I have a big smile working on the DDD project, and feel like throwing up sitting at the DTO one (I’m doing it for a foreign employer, because of a 1:4 exchange rate with the currency of my country). After a week of work on the DDD project, I can look back and see a lot of added value; on the DTO project, I can spend a week and nothing really changes in the application. From the point of view of the DDD project’s business owner, I can say for sure that if it had been built with the layered, so-called “well designed” DTO pattern, we would already have vanished from the market, because we couldn’t have kept up with quick changes and customizations for particular customers. At least not at a fair price and within a fair time.

    Now, on to the arguments you made (I know I’m repeating some thoughts of the guys above 😉 ).

    1. “The conceptual argument: presentation/storage impedance mismatch.”

    It depends on your conception. I just want my business objects to be stored in the DB without any additional code or conversion. The “persistence” of these objects is just one of their aspects. I don’t believe such a big separation is required here, and the quality, cost and responsiveness to change of the projects I was involved in bear this out.

    The “relational constraint” limitation you’ve written about is not really true if you have well-designed long-conversation management and you aren’t bound to persistent properties and collections only. You can easily extend your domain objects with different properties and behavior related to, e.g., the presentation layer (a DDD concept), and you simply avoid the tiring copying of existing properties between your domain, DTO and form objects, which is truly 80% of the work with this separation. Instead, you can just extend your domain object with the 20% of presentation properties and behavior and save 160% of unnecessary code 😉 (80×2: for the DTO and then for the form bean).

    Your example: “For example, you may store a country code in your database, but you will display the full country name in the list.” I see this as a “renderer object” attached to the persistent field, which you can reuse for other objects with the same country code, instead of introducing DTO/form object pairs for each such usage. So you can use other techniques and designs to achieve the goals you describe under this heading.

    2. “The bandwidth argument: load just the data you need”

    Not true. Just before sending, you usually convert the object to XML or JSON, and the conversion code should take care of the subset of properties you want to send outside the app.
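Here is a minimal sketch of that alternative: there is no DTO class, only a whitelist applied at serialization time. WhitelistJson is a hypothetical helper written with the standard library only. Its escaping handles only quotes and backslashes. A real application would rely on a JSON library's own filtering features instead.

```java
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: the serialization step, not a DTO, decides what leaves the app.
final class WhitelistJson {
    private WhitelistJson() {}

    // Serializes only whitelisted properties, in the map's iteration order.
    static String toJson(Map<String, Object> properties, Set<String> allowed) {
        return properties.entrySet().stream()
                .filter(e -> allowed.contains(e.getKey()))
                .map(e -> quote(e.getKey()) + ":" + render(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
    }

    private static String render(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof Number || value instanceof Boolean) {
            return value.toString();
        }
        return quote(value.toString());
    }

    // Minimal escaping: quotes and backslashes only.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```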

    3. “The security argument: show only the data you want to show”

    When you render the form in JSP (say, or with something else), isn’t the JSP the place where you decide which properties to show and which ones to hide?

    4. “The error-prone argument: argh! Yet another LazyInitializationException!”

    Not really true, because you need to solve exactly the same lazy-initialization problems in the persistence/service layer as in the presentation layer.

    5. “The laziness argument: Come on! It’s not that hard!”

    Not hard, but tiring and, most importantly, unnecessary if you can achieve the same result without it.

    “I think that most of the time, the real reason why people don’t want to use DTO’s is because they’re lazy.”

    Lazy? Yes, surely I’m lazy ;). But as a business owner, I also need everything to be cheap, quickly responsive to change and, at the same time, well designed. Currently, only the DDD design fulfills all these goals. With DTOs I can have the same, but I need 4x more manpower and 2x more time.

    “Creating new classes, maintaining code that does “almost” the same as existing code, adding some code to service implementation to copy data back and forth, all of that takes time and effort. But laziness has never been a good reason for ditching a design pattern altogether.”

    I don’t think laziness is the reason. The reason is different: you can achieve the same result in a fraction of the time, with a much smaller team, while staying quickly responsive to change. Your software can compete successfully with DTO-based competitors, who always quote high prices and long delivery times. This is just what I like about DDD (and this is a reason for me 😉 ).

    Sorry for the long post; I’ve just been very interested lately in why people use patterns that are 10 or more years old and write applications in which 70-80% of the code is unnecessary and could be avoided. Unfortunately, I always see the same arguments, as if they were copied from some book, and they can’t convince me.

  9. @l0co
    I definitely do not agree with you.
    Are you really doing DDD? Because if you expose your entity properties through getters and setters, you are definitely not really doing DDD.
    The domain model should contain behavior and often must run within a transactional context. If you are sending your entities directly to the presentation layer, well, you have a problem understanding what DDD is. I recommend you look into CQRS, and of course Udi Dahan’s and Greg Young’s comments on this subject.
