Data Model vs. Domain Model

ORM is a good servant but a bad master. The object-relational impedance mismatch is still there and as dangerous as ever.


Nowadays modern frameworks from the Java world such as Hibernate or Spring offer a lot of convenience for software development. One can easily build a web service, create and manage database schemas via Java code, or integrate with external systems within a few minutes.

However, such great benefits don't come without drawbacks: tools like this usually aim to solve purely technical problems. Architecture and system design are then a task left for us.

One of the most dangerous design flaws I still see a lot these days is (mis)using data models created with ORM (Object-relational mapping) tools as domain models.

This basically means that the structure of our database is visible all over the codebase. It represents a very hazardous leak of implementation details. And it's not the only issue with this approach...

The true gem of the Domain-driven design (DDD) is the strategic part. The tactical design is purely additional and today partly obsolete. Sadly, far too many developers still think that putting a "Service" suffix on the class name actually means doing DDD. They are wrong.

False Repositories

Unfortunately, even authors of popular tools such as Spring take similar shortcuts. Spring Data is a good example.

With Spring Data one can define queries for accessing data simply by composing method names following simple conventions:

interface PersonRepository extends CrudRepository {

  Optional findByGivenNameAndFamilyNameAndBirthdateBefore(String givenName, String familyName, ZonedDateTime birthdateBefore);
}

We should recall what a Repository means in the context of DDD:

A repository is a facade object that encapsulates a persistence technology and provides a domain-oriented API to the clients.

Looking back to the example above, does it fit the definition? Well, not really. It is a facade API that decouples the concrete persistence implementation from the client code, but it's hardly domain or even object-oriented.

First, it has no API that could be naturally understood by business clients. A genuine domain-driven repository would look like the following:

interface PersonRepository {
    
  Optional<Person> findAdultByFullName(String givenName, String familyName);
}

Such a repository provides much more convenience and safety to the client code. It is clear what the method does; the client is not bothered by having to solve the puzzle of how to set parameters correctly.

What we called a Repository in the first code snippet is a mere Data Access Object (DAO). To be frank, I really hesitate to call this set of procedures an object, a more appropriate name for this pattern would in my opinion simply be Data Accessor.

Of course, we can use a data accessor for implementing a repository; this fact just should not be visible to clients.

False Entities

Similar to entities. An Entity is an object that is distinguished by its identity, rather than its attributes. Take a typical ORM entity definition (Hibernate, in this case):

@Entity
class Person {

  @Id @GeneratedValue
  private Long id;

  @Column(unique = true)
  private String personalNumber;

  @Column
  private String givenName;

  @Column
  private String familyName;

  @Column
  private ZonedDateTime birthdate;
    
  // more code
}

Now, the identity of this entity is defined by a Long value that is generated and set by the database engine. Is the database ID a genuine identity of an object?

What if the Person is stored in a different database or came from an external system with a different sequence for ID distribution? In that case, the values of the ID attribute would be different, and the entities would not be recognized as identical. Bad luck!

We have let a purely technical unit such as a database define what it means to be the same thing. In fact, only the model could and should do that:

interface Person {

  PersonalNumber id();
    
  // more business methods
}

Despite the fact, that ORM entities could and usually do contain methods, by their nature, they are just data containers on steroids. They cannot really be called Objects in the true sense of the term as defined by OOP because they break one of the main features of objects: encapsulation (or more generally speaking information hiding). Therefore, they cannot be called Entities either because entities are in fact just kinds of objects.

ORM entities expose their internal attributes (usually via getters and setters) and conceptually are data structures, kind of boosted data structures, but still just data structures.

Pitfalls of Leaking Implementation Details

Exchanging a data model for a domain model is a big problem because data lack the abstraction necessary to express our business domain in code. The domain becomes implicit and invisible in the pollution of too many irrelevant details. This is a sure recipe for a big ball of mud.

Moreover, technical details hurt the purity of the model and easily lead to nasty pitfalls.

For example, you may set relational entity attributes to lazy fetching for performance reasons. Sharing such entities over the codebase is dangerous as loading those lazy attributes requires the caller code to be transactional.

Although working with domain entities outside a transaction shall be perfectly valid this will not work:

@Entity
class Person {

  @Id @GeneratedValue
  private Long id;

  @Column(unique = true)
  private String personalNumber;

  @ManyToOne(fetch = FetchType.LAZY)
  @JoinColumn(name = "parent_id")
  private Person parent;
  
  // more code
}

@RestController
class PersonRestController {

  private PersonRepository personRepository;

  @GetMapping("/{id}/parent)
  Optional<Person> getParent(@PathVariable String id) {
    return personRepository
      .findByPersonalNumber(id)
      .map(Person::getParent);  // Runtime Error!!!
  }

  // more code
}

Because the code above does not run in a transaction an exception is thrown in runtime:

org.hibernate.LazyInitializationException: 
  could not initialize proxy [Person#5] - no Session

Trying to fix this can easily end up in a hellish situation like this:

@EntityGraph(attributePaths = "parent")
@Query("select p from Person p where p.personalNumber = ?1")
Optional<Person> findByPersonalNumberWithParent(String personalNumber);

@EntityGraph(attributePaths = "parent.children")
@Query("select p from Person p where p.personalNumber = ?1")
Optional<Person> findByPersonalNumberWithParentChildren(String personalNumber);

Not only do we write a lot of boilerplate code for the sake of a technology that is aimed at saving us typing boilerplate code in the first place, but we also pollute our codebase with ugly difficult-to-maintain code, thus making it necessarily complex and no fun at all.

Or, even worse, we may feel the temptation to stretch the transactional boundaries to places where they should not be:

@RestController
class PersonRestController {

  @Transactional	// wrong place!!!
  @GetMapping("/{id}/parent)
  Optional<Person> getParent(@PathVariable String id) {
    // ...
  }

  // more code
}

Transactional boundaries should not exceed the service layer because of performance issues and separation of responsibility.

For all of these reasons, it is considered unsafe to exchange data for domain entities. A data model should always be re-mapped when leaving transactional boundaries and the boundaries should be kept non-extensive.

That is, let's make our code aware of transactional boundaries:

class PersonService {

  private PersonRepository repository;
  private PersonMapper mapper;

  @Transactional
  public Optional<PersonDto> parentById(String id) {
    return repository
      .findByPersonalNumber(id)
      .map(Person::getParent)
      .map(mapper::toDto);
  }

  // more code
}

Conclusion

ORM frameworks are great tools to deal with data using type-safe code. They might save you a lot of work when used correctly as a scaffolding utility for the persistence of the domain model.

ORM aims to describe data rather than a domain model. Don't let them rule the entire codebase.

Our domain model should be as expressive and pure as possible with no technical and implementation details leaking into the API.

Don’t let data be the domain model!

You can find the source code on GitHub.

Happy modeling!