Wednesday, March 10, 2010

DDD and the Repository

Repository vs. DAO vs. ORM

So there is all this debate about repositories and sizing the up to other accepted techniques, most common ones being DAOs and ORMs. Out of it have come some pretty heated debates. In all of it, much of it seems to be a battle of egos. Ok, that's not a nice thing to say, I'll take that back. Actually, I think it is a cultural thing. No, really. We have these more classical technologies that are beginning to mix with newer ones. The two cultures are generally quite different. The classicist guys have always done it a certain way and have ground themselves into a profound fondness of the techniques they adopt. Rightfully so, too! It has worked for decades! Then we have the new - let's call them contemporary developers – that are gung-ho and evangelistic about these new techniques and about how its going to solve the world. Well, admittedly, not all are like that thankfully, but you get the idea.

Folks, welcome to software engineering. Rather than take sides, I'm one to stand back and study the scene - look before you leap, right? At the same time, you have to choose something and go with it! Argh, I say! Stuck between a rock and a hard place! Its true that developers often get too feisty and often "protect" their view (too often, I do too), rather than staying humble and searching for a better truth, even if the truth "depends" and wont instantly come within a single conversation or even a few conversations. But once in a while, I meet a developer or two that can do this and are extremely humble yet productive. Sadly, most of them I "met" online and I never had the pleasure to work with them! I totally take my hat off to them and appreciate and respect them greatly. It is very honorable.

Anyway, until recently, I was a data access layer (DAL) kind of guy with total trust in ORM technologies as the "be all" solution. Well almost. Its just that my business logic layer (BLL) was more like a "facade with business logic" than a domain-centric layer. But then I began a new project. Now I have multiple data sources to deal with and a plan for a services layer atop all of this "stuff". It has grown large enough now that testability has become imperative, too.

Hello, repository (or whatever you are)

So here I am, in the DDD marketplace, and I'm going shopping. And I'm NOT buying the first deal I see! Onward. It has been taking me a while to appreciate what this "repository" really is about. I’ll be honest, at first, I just didn’t get it. I thought to myself (still in a data-centric mentality mind you), why on earth would I want to develop a repository when my ORM provides the same things? So I went and read the repository pattern definition by Martin Fowler again … then again … and then one more time. It still didn’t stand out anymore distinctly than a DAO pattern to me. And then the DDD community has a slightly different definition for it which mentions aggregate roots, entity objects, and value objects. This difference of accuracy and perhaps ambiguity in a pattern is what got me and I suspect others too. So some argue that the DAO serves the same essence as the repository and that it is overkill. Others say that the repository even replaces the DAO. And then, a many say that both are needed. Ironically though, everyone seems to agree on the notion of "business objects". However, their exact implementation seems to vary. So rather than arguing with fellow developers or just protecting a view, I've kept my lips sealed and doing a lot of reconnaissance work.

Rant: It often takes a certain amount of soak time for things to be really and deeply realized - I don't care how "smart" you are. Some get right to it and bang it out on the keyboard and beat it into shape, which isn't always a bad thing granted that [1] you already know what you are doing or [2] it is a sort of prototyping or experimental effort that will deliver some kind of surviving value – even if it was an intentional “lessons learned” trial.

What about ORMs? Don't these babies serve the purpose?

Object-relational Mappers (ORMs) have been treated as a godsend and they should be for what they provide. At the same time however, ORMs are still maturing and there are many different styles out there: Hibernate/NHibernate, LLBGenPro, WilsonORM, LinqToSql, ADO.NET Entities to name a few. Each one have different caveats and different feature sets, and every one of them have different interfaces. Some provide caching. Many (but not all) provide model shaping. Most have code generation support. I will point out though that the most common sought after thing is the notion of persistence ignorance - a sort of holy grail for component-oriented developers. Some but not all ORMs fully support this as their mapping scheme sometimes "pollutes" the interface. The Interface Segregation Principle (ISP) from the five SOLID principles introduced by Robert Martin which asserts that all interfaces should remain cleanly factored and define a discrete interface which targets a single theme/concept:

The ISP says that once an interface has gotten too 'fat' it needs to be split into smaller and more specific interfaces so that any clients of the interface will only know about the methods that pertain to them. In a nutshell, no client should be forced to depend on methods it does not use.

Another similar principle that it violates is the High Cohesion Principle (HCP) of the General Responsibility Assignment Software (GRASP) principles, defined as:

High Cohesion is an evaluative pattern that attempts to keep objects appropriately focused, manageable and understandable.

The most typical example of employing these principles is the desire for a clean POCO/POJO model to use across many ORMs or even as the basis for the business layer. Some ORMs allow decorating the class types in the model with attributes, thereby "polluting" the model with technology-specific elements. A few ORM model designer tools even encourage this via making it the default behavior. Until .NET 4.0, ADO.NET Entities worked much this way, though it has always supported external mappings.  NHibernate by default uses external mappings but has support for attribute-based mapping, too. This is perhaps why many still favor NHibernate in addition to its maturity. All said and done, however, the ORM is inherently data-centric. So until the day comes that an ORM evolves to a full fledged modeling framework (for which I would quickly assert that it would then supersede an ORM), using the ORM as a "silver bullet" I'm afraid is smelly for all but relatively simple applications. Ponder a few reasons:

  • Every time the underlying DB changes, the model is subject to change.
  • Violates the ISP of the SOLID patterns because the interface has become too "fat."
  • No out of the box support physically distributed tiers, where a DAL resides on one server, and the BLL resides on another server in the middleware.
  • Little or no support for multiple data sources for a given model.

So a repository is handy after all. What is the missing piece?

I digress. I've come to think of my repository as a "view" of the model. Simple as that. So there is a need for a controller in all this to make it work right. And I don't think the repository should be a controller AND a view. So in my scenario, I've come up with another type to facilitate this. Further, I actually think of my repository in terms of a "resource". I borrow this idea from how the System.Transactions API works in .NET as well as the "transactional programming" paradigm. Unless you've been under a rock, you've noticed all the work and research going on with this: STM.NET, apache.commons.transaction, COM+/Enterprise Services, among others. So for me, all this issue of DAO vs. repository becomes moot as I move much of that logic into a controller. So if I don't have or need a repository, but do have (which in my case I ALWAYS have) a DAO, my controller just uses that instead. Mind you, I use modern ORMs to provide DAO, ActiveRecord, DataMapper stuff. So my "DAO" is what object my ORM provides and in the way it provides it. (No more hand coding DAOs, thank goodness.) I've even gone a step further (in concept at least) to provide this notion of an "agent" over the top of my repository. This allows some degree of autonomy for my domain layer. It can self initialize. It can decide on different repositories (think: "resource") to consume based on parameters or the environment. Maybe I set a test/simulate flag on in a config. file. The agent detects it and can provide "mock" or proxy services for me. And so on.

Actually, if your business objects in your BLL stay proper and true to their purpose, they are inherently a “mock” of the underlying data [objects] anyway! So if you don’t “bind” or attach them to any underlying data source, you have truly transient objects that you can use for testing and simulation. Further, if you implement n-level undo/redo into your business objects, you now have the playback features that are also usable application features.

The other cool thing is that it could be physically distributed in a tiered scenario if needed. And of course, we MUST support SOA friendliness such that an entity-centric service could call into it easily without a whole lot of rework. And let’s not forget security, federation, and governance – very business-centric things. Anyone with practical experience in LOB application/system development know the value of these things. In a distributed environment, most of the old pros would think you’re absolutely insane to have anything otherwise.

Nonetheless, maintaining a separate business layer and adding a controller or agent, I can then reserve the right to evolve the "resources" as I see fit. If I need a transactional one, then I'll build a transactional repository. Or maybe I don't need/have a repository, and I instead have a ORM model, in which case it'd wire into the BeginTrans, CommitTrans, Rollback semantics, complete with transaction promotion. Whether you use an agent approach or controller approach, I think that is actually the missing piece. Cool enough, it fits in almost perfectly with workflow-style technologies, too.

Great, so where do we go from here?

Since all these overly-academic names like repository, unit of work, and such add more confusion than coolness, I like to find more "contemporary" representations that follow the lingua franca of the environment as my actual interfaces. Consider something like this text model:

IRepository 

IUnitOfwork 

IBusinessActivityScope : IUnitOfWork 

ITransactionalBusinessActivityScope 
: IBusinessActivityScope 

IResourceAgent 

BasicBusinessActivityScope: IBusinessActivityScope 

TransactionalBusinessActivityScope 
     : BasicBusinessActivityScope,
		   IBusinessActivityScope, 
			 ITransactionalBusinessActivityScope
			 
DomainResourceAgent : IResourceAgent

ResourceManager

Done this way, you can pick and choose which of those interfaces you actually need to commit to implementing. Then as your system evolves, you already have a bit of a strategy/vision on which way to go. "Model" before you commit to anything. Then determine interface candidates (read: candidate) from your modeling effort and commit to them. You can always add on later but never take out, so get it right.

As a use case, I would like to use something like a IBusinessActivityScope as my controller for the underlying repository. Internally, the controller would be boiled down into a few types: a manager, optional agent(s) or the concrete repository implementations themselves. This removes the unit of work concerns from your repository and now suddenly it becomes very clearly "just a view." Then your domain objects don't actually have to directly know about the repositories even, allowing you to opt-out of a repository in favor of a classic DAO, or ORM directly.

// through indirect injection, its internal
// repository is discovered 
using(var scope = new BusinessActivityScope()) { 
  // alternatively, explicitly inject a repository
  // directive via a one time call
  //scope.Initialize(settings); 
  
  //not the best way, but for the sake of example
  var orderRepo = scope.GetRepository<Order>(); 
  
  Order o = orderRepo.GetById("90210"); 
  o.Quantity += 1; 
  
  //not necessary, but ok to mark a save point
  //scope.Save(); 
  
  // signal to the controller that you are done
  scope.Complete(); 
  
  // Dispose called which internally calls 
  // Save and Complete
}
Figure 1

So with BusinessActivityScope acting as a controller, it handles the workload, resolution, and so forth, letting the repository be simply a view (with implied model constraints and/or optimizations). At that point, it wouldn't be too much of a stretch to generate a base domain model and appropriate repositories from a tool. All the work and coordination is in the controller, which knows about your base interfaces.

In sum, instead of coercing a design into a DAO or into a repository or rebut one or the other, the problem is solved with, yes, yet another pattern! If I see very clearly a model and view, where is the controller?! The answer: Build it!

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.