Monday 11 March 2019

Perils of opinionated frameworks, like Spring Boot. Inverting for opinionated code.

We developers like abstraction.  Without it, we could not build applications.  Our programming disciplines even require that we code to abstractions and avoid coupling our code to detailed implementations.

However, what are the right abstractions for your application?

Sadly, the choice of abstractions really comes from our choice of framework.  Frameworks are basically abstract solutions that we extend to solve our problem.

Unfortunately frameworks, like Spring Boot, come opinionated about the threading models you use, the interfaces you need to extend, possibly the data repositories applicable and various other assumptions about your problem space.  That's a lot of restrictions before I've even written my first line of code.

What we really want to do is explore the problem space first.  This is what test driven design is all about.   We write tests to define what is successful code.  Then we implement code to pass those tests.  As we go along writing tests to cover off requirements, we subsequently churn out working code for the application.  In time we get enough working code to release as the application.

So this leads me to ask, when do we test the choice of framework?

Opinionated frameworks force abstractions too early in the development process

Well, I guess we pay very experienced senior people to make this choice.  So this choice must be correct.  It would not be for reasons like:
  • I (or our company) only know this framework, so we are using it
  • New shiny with lots of buzz words, we must use it
  • My CV's a little old, let's try something new
  • This one is cheaper
  • The architects believed what it says on the tin
Regardless of the reason, the only way to test the framework choice is to build the application with it.  And just for those of you who like opinionated frameworks (like Spring Boot), please tell me you write the most risky aspects first.  This is so you can quickly discover if the framework's opinions match with your problem.

Sadly, even if you test with the most risky aspects, finding out the framework decision is wrong can lead to a lot of wasted code.  This arguably wastes a lot of money for the business and can lead to failing projects.

For example, say we choose Spring Reactive.  Yay, we can make concurrent asynchronous calls out to various micro-services.  We can also use the latest in NoSQL data stores.  This was all a great decision.  However, over time we realise we have a small amount of data where the integrity of the data is very important.  We find we want to use a relational database to solve this, and then incorporate JPA on this database for easier interaction.  However, our choice of Spring Reactive has disallowed this, because it requires all I/O to be asynchronous (JPA makes synchronous database calls).  Ok, yes, we can use Schedulers, but I seem to be continually doing workarounds for the lack of transactions.  The data consistency issues are starting to mount up and we're missing deadlines.  I'm now stuck deciding whether to throw out all the Reactive code, or to keep making workarounds hoping it might all hang together.  I definitely need to swap jobs before this hits production and we start supporting it.  In my next job, I've learnt to use Spring Servlets for this type of problem.
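
To make that concrete, the Scheduler workaround looks something like the sketch below (OrderHandler, OrderRepository and Order are hypothetical names, purely for illustration): every blocking JPA-style call has to be pushed onto a scheduler for blocking work before it can join the reactive pipeline.

  import java.util.Optional;
  import reactor.core.publisher.Mono;
  import reactor.core.scheduler.Schedulers;

  public class OrderHandler {

    // Hypothetical blocking repository (e.g. a Spring Data JPA repository)
    public interface OrderRepository {
      Optional<Order> findById(Long id);
    }

    public static class Order {
      public final Long id;
      public Order(Long id) {
        this.id = id;
      }
    }

    private final OrderRepository orderRepository;

    public OrderHandler(OrderRepository orderRepository) {
      this.orderRepository = orderRepository;
    }

    // Bridge the synchronous call into the reactive pipeline by running it on a
    // scheduler intended for blocking work
    public Mono<Order> findOrder(Long id) {
      return Mono.fromCallable(() -> this.orderRepository.findById(id).orElseThrow())
          .subscribeOn(Schedulers.boundedElastic());
    }
  }

Every blocking call needs this treatment, and the transaction still does not travel along the reactive chain - which is exactly where the workarounds start mounting up.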

The flip side of this could just as easily be the case.  We start out wanting Spring Servlet for JPA interaction with a database.  However, over time we realise the database interaction is mostly read-only.  What we really wanted was asynchronous I/O from Spring Reactive to collect data from multiple micro-services and data stores concurrently.  Unfortunately, with our up front Spring Servlet choice, the data collection is just too slow.  Our workaround is to use async Servlets and spawn threads to make concurrent requests.  This worked initially, but over time the load increased.  This significantly increased thread counts, resulting in thread scheduling starvation, which resulted in timeouts.  I've really got no way to fix this without significant rewrites of the application.  In my next job, I've learnt to use Spring Reactive for this type of problem.
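
That async Servlet workaround is roughly the following sketch (the servlet, pool size and backend calls are hypothetical): each fanned-out call still occupies a pooled thread, which is why thread counts climb as load increases.

  import java.util.concurrent.CompletableFuture;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import javax.servlet.AsyncContext;
  import javax.servlet.annotation.WebServlet;
  import javax.servlet.http.HttpServlet;
  import javax.servlet.http.HttpServletRequest;
  import javax.servlet.http.HttpServletResponse;

  @WebServlet(urlPatterns = "/aggregate", asyncSupported = true)
  public class AggregateServlet extends HttpServlet {

    // Every concurrent backend call consumes a thread from this pool
    private final ExecutorService executor = Executors.newFixedThreadPool(200);

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response) {
      AsyncContext async = request.startAsync();

      // Fan out the backend calls on spawned threads
      CompletableFuture<String> orders = CompletableFuture.supplyAsync(this::callOrderService, this.executor);
      CompletableFuture<String> customers = CompletableFuture.supplyAsync(this::callCustomerService, this.executor);

      // Write the combined result once both calls finish
      CompletableFuture.allOf(orders, customers).whenComplete((ignore, failure) -> {
        try {
          if (failure != null) {
            response.setStatus(500);
          } else {
            response.getWriter().write(orders.join() + customers.join());
          }
        } catch (Exception ex) {
          response.setStatus(500);
        } finally {
          async.complete();
        }
      });
    }

    private String callOrderService() {
      return "..."; // blocking HTTP call to a micro-service (stubbed)
    }

    private String callCustomerService() {
      return "..."; // blocking HTTP call to a micro-service (stubbed)
    }
  }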

So can we test the framework without having to throw out all our code?

Inverting framework control

Dependency Injection went a long way in inverting control.   When I write my Servlet handling method, I no longer need to pass in all my dependent objects.  I would define dependencies, via @Inject, to have the framework make them available.  The framework, subsequently, no longer dictates what objects my implementation can depend on.
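
For illustration (PaymentController and PaymentService are hypothetical names, not from any particular framework), the handler simply declares what it needs and the framework supplies it:

  import javax.inject.Inject;

  public class PaymentController {

    // Supplied by the framework via Dependency Injection, rather than being
    // passed in by whoever calls this controller
    @Inject
    private PaymentService paymentService;

    public String pay(String orderId) {
      return this.paymentService.charge(orderId);
    }

    // Hypothetical dependency
    public interface PaymentService {
      String charge(String orderId);
    }
  }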

However, there is a lot more to a framework than just the objects.  Frameworks will impose some threading model and require me to extend certain methods.  While dependency injection provides references to objects, the framework still has to call the methods on the objects to do anything useful.  For example, Spring goes a long way to make the methods flexible, but still couples you to Reactive or Servlet coding by the required return type from the method.
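
As a contrived illustration of that return-type coupling (the two handlers are shown side by side only for comparison, and Order is a hypothetical type):

  import org.springframework.web.bind.annotation.GetMapping;
  import org.springframework.web.bind.annotation.PathVariable;
  import org.springframework.web.bind.annotation.RestController;
  import reactor.core.publisher.Mono;

  @RestController
  public class OrderEndpoint {

    // Servlet stack (Spring MVC): the framework expects a plain value back
    @GetMapping("/mvc/orders/{id}")
    public Order findOrder(@PathVariable("id") Long id) {
      return new Order(id);
    }

    // Reactive stack (Spring WebFlux): the framework expects a Mono/Flux back
    @GetMapping("/reactive/orders/{id}")
    public Mono<Order> findOrderReactive(@PathVariable("id") Long id) {
      return Mono.just(new Order(id));
    }

    public static class Order {
      public final Long id;
      public Order(Long id) {
        this.id = id;
      }
    }
  }

The intent of both handlers is the same; the required return type is what ties each one to its stack.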

As I need the Spring framework to undertake Dependency Injection for my tests, I'm coupled to the particular Spring Servlet/Reactive abstractions before I even write my first line of code.  An upfront choice that could be quite costly to change if I get it wrong!

What I really want to do is:
  1. Write tests for my implementations (as we are always test driven, of course)
  2. Write my implementations
  3. Wire up my implementations together to become the application
Well the first two are very simple:
  1. Write tests calling a method passing in mock objects
  2. Write implementation of the method to pass the test
The last becomes very hard.  The reason it becomes very hard is that there is no consistent way to call every method.  Methods have different names, different parameters, different exceptions, possibly different threading requirements and different return types.  What we need is some facade over the methods to make them appear the same.

The Inversion of (Coupling) Control (IoC) provides this facade over the method via the ManagedFunction.  The ManagedFunction interface does not indicate what thread to use, what parameters/return types are required, nor what exceptions may be thrown.   This is all specified by the contained method implementation.  The coupling is inverted so the implementation specifies what it requires.
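
Conceptually, such a facade could be sketched as below.  Note this is only an illustration of the idea, not OfficeFloor's actual ManagedFunction API: the framework gets one uniform way to invoke anything, while the wrapped implementation decides the dependencies, parameters and exceptions involved.

  public interface ManagedFunction {

    void execute(Context context) throws Throwable;

    interface Context {
      Object getObject(String name);                  // dependencies the implementation asks for
      void doFlow(String flowName, Object parameter); // continue on to another function
    }
  }

  // An ordinary piece of application logic exposed through the facade
  // (OrderService and the dependency names are hypothetical)
  class CreateOrderFunction implements ManagedFunction {

    public interface OrderService {
      void create(String orderId);
    }

    @Override
    public void execute(Context context) throws Throwable {
      OrderService orders = (OrderService) context.getObject("orderService");
      orders.create((String) context.getObject("orderId"));
    }
  }

The framework only ever sees execute(Context); the threading, parameters and exceptions stay with the implementation.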

This inversion of coupling allows framework decisions to be deferred.  As I can have all my methods invoked in a consistent way, I can go ahead and start writing implementations.   These implementations may require Reactive coding to undertake asynchronous calls out to different micro-services.  Some of these implementations may require using JPA to write to relational databases.  I really should not care at the start of building the system.  I'm tackling the concrete problems to gain a better understanding of the real problem space.  I know my methods can be invoked by the framework via wrapping them in a ManagedFunction.  We can deal with determining the right framework later on, once we know more.

Actually, this is allowing the implementations to choose the appropriate abstractions to be provided by the framework. My implementations define what objects they require, what other methods they require calling and what thread models they will require. The implementations are, effectively, defining what abstractions are required from the framework.

Therefore, it is no longer the framework being opinionated.  It is your developer code that is allowed to be opinionated.

This then allows your implementations to be opinionated about the most appropriate framework to use.  No longer do you have to guess the framework based on vague understanding of the problem space.   You can see what abstractions your implementations require and make a more informed choice of framework.

In effect, IoC has deferred the choice of framework to much later in the development process.  This is so you can make the decision much more confidently.  And isn't this what Agile says: defer the commitment until the last responsible moment?

Summary

In summary, why be forced to make too many up front decisions about your application?  In choosing the framework, you are making some significant choices in solving your problem space.  As frameworks are opinionated, they impose a lot of coupling on your solution.

Rather, why can't I just start writing solutions to concrete problems and worry about how they fit together later on?  This allows me to make choices regarding the appropriate abstractions (and subsequently framework) when I know a lot more about the problem space.

Inversion of (Coupling) Control gives this ability to defer abstraction and framework choices to much later in the development process, when you are more informed to make the decision correctly.

Wednesday 6 March 2019

Inversion of Control (explained non-technically)

A design pattern for the Inversion of Control principle was presented in a paper published in 2015 (a free download is available here).  The premise of the paper was "can we learn something from how businesses organise themselves and translate this into software design improvements?" Basically, businesses have been around a lot longer than software systems.  So, how did businesses deal with problems, such as scale, before computers existed?

So running with this analogy, I looked at how businesses evolve. Businesses don't set out day one to be a Fortune 500. Typically, they start with you in your garage (maybe with a friend). Over time your business grows and you hire people, assign clearer functional responsibilities and start to scale up your business. Businesses have to do this, while also changing quickly to stay competitive.

Within software, we have moved from Waterfall to Agile. Waterfall can be considered the "set out to build a Fortune 500 company on day one" approach. Agile, on the other hand, says build only things of value and evolve your system over time.  Also, Agile focuses on being quicker to react to change. Therefore, Agile is a lot closer to how businesses grow, evolve and stay competitive.

Unfortunately, our software architectures have still stayed a "waterfall" top down approach. Architects will produce technology stack diagrams that indicate how the architectural layers of the system work (those technical pictures full of rectangles on top of each other).  The nature of these layers is always a bureaucratic control from the top layer to the bottom layer.  This is very similar to large companies with many layers of management.  So our software architectures force us to design the Fortune 500 company before developers even get to write the first line of code.

Inversion of Control is like empowering employees in a business.  Rather than the manager dictating exactly how the employees will work, the manager trusts the employees to undertake the goals of the business. In other words, the employees are in control of how the work gets done.  They will decide what help to get (Continuation Injection).  They will decide what business resources they require (Dependency Injection).  They may decide to escalate issues with the work (Continuation Injection).  They may even decide another employee may be better suited to do the work (Thread Injection).

By empowering the employee, we have inverted the control.  The employee is now in control of how the work gets done.  This is similar to the Inversion of Control in software.  The developer is now in control of how they write code.  They are not restricted by bureaucratic top down architecture controls from their managers.  This allows the developer to evolve the business's software quickly so it may grow and stay competitive.

Monday 4 March 2019

Reactive frameworks "rob Peter to pay Paul"! Avoiding the problem with Dependency Contexts

Reactive frameworks, typically, align to message driven architectures.

The problem with building message driven systems, is the message provides the context.  The nature of message driven architectures is that the handler of the message is disconnected from the producer of the message.  The only information connecting them is the message.  Hence, the message needs to provide all the necessary context and data for the handler to process the message.

This becomes complicated when downstream message handlers require more context (such as transactions).  How do you pass a transaction via a message?  Yes, we can use XA transactions to suspend and resume transactions, but this creates a lot of overhead, working against the performance and scale these Reactive frameworks are chasing.  Furthermore, Spring Data even considers transactions not a good fit for reactive.

So how can we create a Reactive framework that enables context for such things as transactions?

Dependency Contexts

Well the first problem is co-ordination.  Firing off lots of events serviced by different functions in different threads creates a lot of multi-threading co-ordination.  How does one know when to commit the transaction?  Or possibly decide to rollback the transaction?  Or what if an asynchronous I/O takes too long, causing the held transaction to time out?  We would be required to monitor all messages and co-ordinate their results to then make a decision on commit / rollback (along with handling further non-application events, such as transaction timeout events).  This puts a lot of coding decisions on the developer outside the normal application logic.  Gone are the days when the developer just threw an exception and the transaction was rolled back.

So let's bring back order to the event handling chaos.  Load the events to a queue and process events in the order they are added to the queue.  As requests on the system should be independent, we create a separate queue for each request to keep requests isolated from each other.

The resulting event loop would look as follows:


  import java.util.Deque;
  import java.util.LinkedList;
  import java.util.concurrent.BlockingQueue;
  import java.util.concurrent.LinkedBlockingQueue;

  // Event loop servicing each request as its own ordered sequence of functions
  // (class name is illustrative)
  public class RequestEventLoop {

    private final BlockingQueue<Function> functions = new LinkedBlockingQueue<>();

    public void startRequest(Function function) {
      this.functions.add(function);
    }

    public void eventLoop() throws InterruptedException {
      for (;;) {

        // Obtain next function to execute
        Function function = this.functions.take();

        // Determine if new request (no context)
        FunctionContext context = (function instanceof ContextualFunction)
            ? ((ContextualFunction) function).context
            : new FunctionContext();

        // Allow for multi-threading in executing
        synchronized (context) {
          function.doFunction(context);
        }

        // Register the next function to execute
        Function nextFunction = context.functions.pollFirst();
        if (nextFunction != null) {
          this.functions.add(nextFunction);
        }
      }
    }

    public interface Function {
      void doFunction(FunctionContext context);
    }

    public class FunctionContext {

      private Deque<Function> functions = new LinkedList<>();

      public void triggerFunction(Function function) {
        this.functions.add(new ContextualFunction(function, this));
      }
    }

    public class ContextualFunction implements Function {

      private final Function delegate;
      private final FunctionContext context;

      public ContextualFunction(Function delegate, FunctionContext context) {
        this.delegate = delegate;
        this.context = context;
      }

      @Override
      public void doFunction(FunctionContext context) {
        this.delegate.doFunction(context);
      }
    }
  }

Ok, a lot of code but it is now doing two things:
  • ordering all functions to be executed sequentially
  • providing a context that travels with the execution chain of functions
But how do we make use of the FunctionContext to begin and manage the transaction?

We incorporate the ManagedFunction of Inversion of Control and use a ServiceLocator to enable storing the dependencies within the context:

  public interface ServiceLocator {
    // Uses context for transitive dependencies
    Object locateService(String serviceName, FunctionContext context);
  }

  ServiceLocator serviceLocator = ...; // can, for example, wrap Spring BeanFactory

  public class FunctionContext {

    // ... triggerFunction logic omitted for brevity

    private Map<String, Object> contextDependencies = new HashMap<>();

    public Object getDependency(String dependencyName) {

      // Pull dependency from context cache
      Object dependency = this.contextDependencies.get(dependencyName);
      if (dependency == null) {

        // Not cached, so create new and cache in context
        dependency = serviceLocator.locateService(dependencyName, this);
        this.contextDependencies.put(dependencyName, dependency);
      }

      // Return dependency for Function to use
      return dependency;
    }
  }

The Functions are now able to use the getDependency("name") method to retrieve the same objects for a request.  As the dependency objects are cached within the context, the various Functions involved in servicing the request are able to retrieve the same object.

Therefore, a transaction can be managed across Functions.   The first Function retrieves the Connection and starts the transaction.  Further Functions execute, pulling in the same Connection (with transaction established).  The final Function in the chain then commits the transaction.
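
A sketch of that chain is below, assuming the Function and FunctionContext types from the earlier snippets are visible; the "connection" dependency name and the SQL are illustrative only.

  import java.sql.Connection;
  import java.sql.SQLException;

  public class TransferChain {

    public void start(FunctionContext context) {
      withConnection(context, connection -> connection.setAutoCommit(false)); // begin the transaction
      context.triggerFunction(this::update);
    }

    public void update(FunctionContext context) {
      withConnection(context, connection -> connection
          .prepareStatement("UPDATE ACCOUNT SET BALANCE = BALANCE - 10 WHERE ID = 1")
          .executeUpdate());
      context.triggerFunction(this::commit);
    }

    public void commit(FunctionContext context) {
      withConnection(context, Connection::commit); // same Connection, so same transaction
    }

    // Pull the shared Connection from the context; every Function in the chain
    // sees the same instance because the context caches it
    private void withConnection(FunctionContext context, SqlAction action) {
      Connection connection = (Connection) context.getDependency("connection");
      try {
        action.apply(connection);
      } catch (SQLException ex) {
        throw new IllegalStateException(ex);
      }
    }

    private interface SqlAction {
      void apply(Connection connection) throws SQLException;
    }
  }

Rollback on failure is then the job of the exception handlers described next.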

Should there be a failure, the injected Exception handlers of the ManagedFunction can rollback the transaction.  Within thread-per-request architectures, exceptions thrown by the developer's code typically roll back the transaction.  By having the ManagedFunction's injected handlers also roll back the transaction, we reproduce the ease of thread-per-request transaction management for exceptions.

Furthermore, the exception handlers would clear the FunctionContext queue of Functions.  As the transaction has been rolled back, there is no point in further executing the remaining Functions.  In typical thread-per-request handling, the remaining part of the try block would be skipped.  By clearing the FunctionContext queue, this mimics skipping the remaining logic and going straight to the catch block.  In the case of the ManagedFunction exception handler, this means triggering a new function to handle the failure.

But we've just reduced the Reactive framework to a single sequence of functions, losing the concurrency it can provide!

Well, beyond making it easier to code now that there is some order in function execution, we can introduce concurrency by spawning another sequence of functions.  As the FunctionContext ties a sequence of functions together, we just create a new FunctionContext to enable concurrency.  The following code shows this:

  public class FunctionContext {

    private Deque<Function> functions = new LinkedList<>();

    public void triggerFunction(Function function) {
      this.functions.add(new ContextualFunction(function, this));
    }

    public void spawnNewSequence(Function function) {
      this.functions.add(function); // event loop will create new FunctionContext
    }

    // ... getDependency(...) removed for brevity
  }

In actual fact, we have just created Threads running on an event loop.  The sequence of functions is executed in order, just like imperative statements are executed in order by a thread.  So we now have Threads without the overheads of a thread-per-request.  The dependencies are bound to the context, and subsequently to the Thread of execution - making them effectively ThreadLocal.  As ThreadLocals are thread safe, we now have safe multi-threading functional code.

As dependencies are effectively ThreadLocal to the sequence of Functions, they can be mutated for the next Function to pick up the change.  Yes, immutability better removes developer errors, however this should not be a reason to restrict the developer from doing it.  This is especially the case if you want to mutate objects via an ORM (e.g. JPA or Spring Repositories) to do updates in the database.

OfficeFloor (http://officefloor.net) implements Dependency Contexts in its threading models.  OfficeFloor actually makes this easier to develop by introducing the concept of Governance.  Governance does the above transaction management by configuration declaration (much like a @Transactional annotation on a method).  However, as OfficeFloor uses graphical configuration, rather than annotating the code directly, functions are graphically marked for transaction governance, giving better code re-use and flexibility in configuring applications.


Please try out OfficeFloor (tutorials) and we value all feedback.

Summary

So when Reactive frameworks are tooting their horn, they are actually doing this while restricting you further as a developer.  Because reactive frameworks can't handle the context problem, they push this problem onto you, the developer, by requiring you to avoid context.

Subsequently, reactive frameworks rob developers (Peter) of context to pay for the message passing problem in their frameworks (Paul).

By incorporating dependency contexts into event loops, you can regain the context that we have grown to love in thread-per-request architectures for managing things such as transactions.