Monday, January 23, 2023

Autofac and Lazy Dependency Injection

Autofac's support for lazy dependencies is, in a word, awesome. One issue I've always had with constructor injection is writing unit tests: if a class has more than a couple of dependencies to inject, you quickly run into situations where every unit test has to provide mocks for every dependency, even though a given test scenario only needs to configure one or two of them.

A pattern I had been using in the past when introducing Autofac into legacy applications that weren't built around dependency injection was to use Autofac's lifetime scope as a Service Locator, so that the only breaking change was injecting the Autofac container into the constructors, then using properties that resolve themselves on first access. The Service Locator could even be served by a static class if injecting it was problematic for teams, but I generally advise against that: a Service Locator used willy-nilly everywhere becomes largely untestable. What I did like about this pattern is that unit tests would always pass a mocked Service Locator that would throw on any request for a dependency, and would set the dependencies they needed via the internally visible property-based dependencies. In this way a class might have six dependencies but a constructor that takes just the Service Locator. A test that is expected to need two of those dependencies simply mocks those two and sets the properties, rather than injecting two useful mocks plus four extra unused ones.

An example of the Service Locator design:

private readonly ILifetimeScope _scope;

private ISomeDependency? _someDependency = null;
public ISomeDependency SomeDependency
{
    get => _someDependency ??= _scope.Resolve<ISomeDependency>()
        ?? throw new InvalidOperationException("The SomeDependency dependency could not be resolved.");
    set => _someDependency = value;
}

public SomeClass(IContainer container)
{
    if (container == null) throw new ArgumentNullException(nameof(container));
    _scope = container.BeginLifetimeScope();
}

public void Dispose()
{
    _scope.Dispose();
}

This pattern is a trade-off: it reduces the impact of refactoring code that wasn't built around dependency injection while still introducing an IoC design that makes the code testable. The issue to watch out for is the use of the _scope lifetime scope. The only place the _scope reference should ever be used is in the dependency properties, never in methods and the like. If I have several dependencies and write a test that will use SomeDependency, all tests construct the class under test with a common mock container that provides a mocked lifetime scope that throws on Resolve, then use the property setters to initialize the dependencies the test actually uses with mocks. If the code under test evolves to use an extra dependency, the associated tests fail because the Service Locator highlights the request for an unexpected dependency.
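To make that testing flow concrete, here is a minimal, self-contained sketch of the pattern. The interfaces below are simplified stand-ins for Autofac's IContainer and ILifetimeScope (in the real library, Resolve<T> is an extension method on IComponentContext), and ISomeDependency, SomeClass, and the fakes are all hypothetical:

```csharp
using System;

// Simplified stand-ins for Autofac's IContainer and ILifetimeScope,
// for illustration only.
public interface ILifetimeScope : IDisposable
{
    T Resolve<T>();
}

public interface IContainer
{
    ILifetimeScope BeginLifetimeScope();
}

// The guard scope every test passes in: any Resolve call means the code
// under test reached for a dependency the test did not provide.
public sealed class ThrowingScope : ILifetimeScope
{
    public T Resolve<T>() =>
        throw new InvalidOperationException($"Unexpected resolve of {typeof(T).Name}.");
    public void Dispose() { }
}

public sealed class GuardContainer : IContainer
{
    public ILifetimeScope BeginLifetimeScope() => new ThrowingScope();
}

public interface ISomeDependency
{
    int Next();
}

// A hand-rolled fake for the one dependency a given test scenario needs.
public sealed class FakeDependency : ISomeDependency
{
    public int Next() => 41;
}

// The class under test, using the property-based Service Locator pattern.
public sealed class SomeClass : IDisposable
{
    private readonly ILifetimeScope _scope;
    private ISomeDependency? _someDependency;

    public ISomeDependency SomeDependency
    {
        get => _someDependency ??= _scope.Resolve<ISomeDependency>();
        set => _someDependency = value;
    }

    public SomeClass(IContainer container)
    {
        if (container == null) throw new ArgumentNullException(nameof(container));
        _scope = container.BeginLifetimeScope();
    }

    public int DoWork() => SomeDependency.Next() + 1;

    public void Dispose() => _scope.Dispose();
}
```

Any test that forgets to set a property its scenario needs fails immediately with an "Unexpected resolve" error, which is exactly the safety net described above.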

The other advantage this gives you is performance, for instance with MVC controllers handling a request. A controller might have a number of actions that touch several dependencies over the course of user requests, but an individual request might only "touch" one or a few of them. Since many dependencies are scoped to a request, you can reduce server load by having the container resolve dependencies only when and if they are needed. Lazy dependencies.

Now I have always been interested in reproducing that testing flexibility, and in ensuring that dependencies are only resolved if and when they are needed, within a proper DI implementation. Enter lazy dependencies, and a new example:

private readonly Lazy<ISomeDependency>? _lazySomeDependency = null;
private ISomeDependency? _someDependency = null;
public ISomeDependency SomeDependency
{
    get => _someDependency ??= _lazySomeDependency?.Value
        ?? throw new InvalidOperationException("SomeDependency dependency was not provided.");
    set => _someDependency = value;
}

public SomeClass(Lazy<ISomeDependency>? someDependency = null)
{
    _lazySomeDependency = someDependency;
}

This looks similar to the Service Locator example above, except that it operates with a full Autofac DI integration. Autofac provides dependencies as Lazy references, which we default to null, allowing tests to construct our classes under test without dependencies. Test suites don't have a DI provider running, so they rely on the setters to initialize mocks for the dependencies a given test scenario needs. There is no need to set up Autofac to provide mocks, or to configure it at all as with the Service Locator pattern.
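Because Lazy<T> lives in the BCL, this version needs no container at all to unit test. A self-contained sketch, with ISomeDependency, SomeClass, and FakeDependency as hypothetical stand-ins (on the Autofac side a plain `builder.RegisterType<...>().As<ISomeDependency>()` registration is enough, since Autofac supports Lazy<T> relationships implicitly):

```csharp
using System;

public interface ISomeDependency
{
    string Describe();
}

public sealed class SomeClass
{
    private readonly Lazy<ISomeDependency>? _lazySomeDependency;
    private ISomeDependency? _someDependency;

    public ISomeDependency SomeDependency
    {
        get => _someDependency ??= _lazySomeDependency?.Value
            ?? throw new InvalidOperationException("SomeDependency was not provided.");
        set => _someDependency = value;
    }

    // In production Autofac injects the Lazy<ISomeDependency>; in tests the
    // parameter defaults to null and the setter supplies a fake instead.
    public SomeClass(Lazy<ISomeDependency>? someDependency = null)
    {
        _lazySomeDependency = someDependency;
    }

    public string Run() => SomeDependency.Describe();
}

// A hand-rolled fake used by tests in place of a mocking library.
public sealed class FakeDependency : ISomeDependency
{
    public string Describe() => "fake";
}
```

Note that the Lazy factory does not run until the property is first read, which is the "resolve only when needed" behaviour the post describes.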

In any case, this is something I've found to be extremely useful when setting up projects so I thought I'd throw up an example if anyone is searching for options with Autofac and lazy initialization.

Thursday, May 30, 2019

JavaScript: Crack for coders.

I've been cutting code for a long time. I was educated in Assembler, Pascal, and C/C++. I couldn't find local work doing that so I taught myself Access, Visual Basic, T-SQL, PL-SQL, Delphi, Java, Visual C++, and C# (WinForms, WebForms, MVC, Web API, WPF, and Silverlight). It wasn't until I really started to dive into JavaScript that programming started to feel "different".

With JavaScript you get this giggly-type high when you're sitting there in your editor, dev server running on the other screen, tinkering away while the code "just works". You can incrementally build up your application with "happy little accidents", just like a Bob Ross painting. WebPack and Babel wrangle the vast, conflicting standards for modules, ES versions, and the rest into something that actually runs on most browsers. React's virtual DOM makes screen updates snappy, and MobX manages your shared state without stringing together a chain of call-backs and Promises. And there is so much out there to play with and experiment with: tens of thousands of popular packages on npm waiting to be discovered.

But those highs are separated by often lingering sessions testing your Google-fu trying to find current, relevant clues on just why the hell your particular code isn't working, or how to get that one library you'd really like to use to import properly into your ES6 code without requiring you to eject your Create-React-App application. You get to grit your teeth with irritations like when someone managing moment.js decided that add(number, period) made more sense than add(period, number) while all of the examples you'd been using had it still the other way around. Individually, these problems seem trivial, but they really add up quick when you're just trying to get over that last little hurdle between here and your next high.


JavaScript development is quite literally Crack for coders.

Monday, May 20, 2019

YAGNI, KISS, and DRY. An uneasy relationship.

As a software developer, I am totally sold on S.O.L.I.D. principles when it comes to writing the best software. Developers can get pedantic about one or more of these principles, but in general following them is simple and it's pretty easy to justify their value in a code base. Where things get a bit more awkward is when discussing three other principles that like to hang out in coding circles:

YAGNI - You Ain't Gonna Need It
KISS - Keep It Stupidly Simple
DRY - Don't Repeat Yourself

YAGNI, he's the Inquisitor. In every project there are the requirements that the stakeholders want, and then there are the assumptions that developers and non-stakeholders come up with in anticipation of what they think stakeholders will want. YAGNI sees through these assumptions and expects every feature to justify its existence.

KISS is the lazy one in the relationship. KISS doesn't like things complicated, and prefers to do the simplest thing because it's less work, and when it has to come back to an area of a project later on, it wants to be sure that it's able to easily understand and pick the code back up.  KISS gets along smashingly with YAGNI because the simplest, easiest thing is not having to write anything at all.

DRY on the other hand is like the smart-assed, opinionated one in a relationship. DRY wants to make sure everything is as optimal as possible. DRY likes YAGNI, and doesn't get along with KISS and often creates friction in the relationship to push KISS away. The friction comes when KISS advocates doing the "simplest" thing and that results in duplication. DRY sees duplication and starts screaming bloody murder.

Now how do you deal with these three in a project? People have tried taking sides, asking which one trumps the other. The truth is that all three are equally important, but I feel that timing is the problem when it comes to dealing with KISS and DRY.  When teams introduce DRY too early in the project you get into fights with KISS, and can end up repeating your effort even if your intention wasn't to repeat code.

YAGNI needs to be involved in every stage of a project. Simply do not ever commit to building anything that doesn't need to be built. That's an easy one.

KISS needs to be involved in the early stages of a project or set of features. Let KISS work freely with YAGNI and aim to make code "work".  KISS helps you vet out ideas in the simplest way and ensure that the logic is easy to understand and easy to later optimize.

DRY should be introduced into the project no sooner than when features are proven and you're ready to optimize the code. DRY is an excellent voice for optimization, but I don't believe it is a good voice to listen to early in a project, because it leads to premature optimization.

Within a code base, the more code you have, the more work there is to maintain the code and the more places there are for bugs to hide. With this in mind, DRY would seem to be a pretty important element to listen to when writing code. While this is true there is one important distinction when it comes to code duplication. Code should be consolidated only where the behavior of that code is identical. Not similar, but identical.  This is where DRY trips things up too early in a project. In the early stages of a project, code is highly fluid as developers work out how best to meet requirements that are often still being fleshed out. There is often a lot of similar code that can be identified, or code that you "expect" to be commonly used so DRY is whispering to centralize it, don't repeat yourself!

The problem is that when you listen to DRY too early you can violate YAGNI unintentionally by composing structure you don't need yet, and make code more complex than it needs to be. If you optimize "similar" code too early, you either end up with conditional code to handle the cases where the behaviour is similar but not identical, or you work to break the functionality down so atomically that you can separate out the identical parts (think functional programming). This can make code far more complex than it needs to be, far too early in development, leading to a lot more effort as new functionality is introduced. You either struggle to fit the pre-defined pattern and its restrictions, or end up working around it entirely, violating DRY.

Another risk of listening to DRY too early is that it steers development heavily towards inheritance rather than composition. I've often seen projects that focus too heavily and too early on DRY start to architect systems around multiple levels of inheritance and generics to try to produce minimalist code. The code invariably proves complex and difficult to work with. Features become expensive to deliver because new use cases don't "fit" the preconceptions around which the shared code was written, leading to complex re-write efforts or work-arounds.

Some people argue that we should listen to DRY over KISS because DRY enforces the Single Responsibility Principle from S.O.L.I.D. I disagree with this in many situations, because DRY can violate SRP where consolidated code now has two or more reasons to change. Take generics for example. A generic class should represent identical functionality across multiple types. That is fine; however, you can commonly find a situation where a new type could benefit from the generic except...   And that "except" now poses a big problem. A generic class/method violates SRP because its reason for change is governed by every class that leverages it. So long as the needs of those classes are identical we can ignore the potential implications for SRP, but as soon as those needs diverge we either violate SRP, violate DRY with nearly duplicate code, or further complicate the code to try to keep every principle happy. Functional code conforms to SRP because it does one thing, and does it well. However, when you aren't working in a purely functional domain, consolidated code serving parts of multiple business cases now potentially has more than one reason to change. Clinging to DRY can, and will, make your development more costly as you struggle to make new requirements conform.

Personally, I listen to YAGNI and KISS in the early stages of developing features within a project, then start listening to DRY once the features are reasonably established. DRY and KISS are always going to butt heads, and I tend to take KISS's side in any argument if I feel like that area of the code may continue to get attention or that DRY will risk making the code too complex to conveniently improve or change down the track.  I'll choose simple code that serves one purpose and may be similar to other code serving another purpose over complex code that satisfies an urge not to see duplication, but serves multiple interests that can often diverge.


Wednesday, April 10, 2019

The stale danish


One Monday, while walking to the office you pass a little bakery with a window display. A beautiful looking danish catches your eye. The crust looks so light and fluffy, a generous dollop of jam in the middle, and the drizzle of icing sugar. So tempting, but you promised yourself to cut back on the snacks, and continue your way to the office.

Each morning, you walk past that same bakery and see a danish sitting there, just beckoning you... Finally, Friday morning comes around and you see that gorgeous danish waiting there. "I can't take it any more!" you scream to yourself and you give in, walk into the bakery and buy the danish. After the guilt melts away holding that beautiful danish, you take a bite... It's long since gone stale, having been sitting there since Monday.

The stale danish is how I view the Internet, and the marvelous source of information we, as software developers, have come to rely on it for. When a technology is new, the Internet provides a flurry of information and experiences that the global software development community collectively accumulates. However, much of that information isn't versioned, and often sources like blog posts and walk-throughs aren't even dated. As the technology matures, finding the information you need to solve a particular problem becomes a treasure hunt for a fresh danish in a sea of stale ones. Working with JavaScript frameworks like React and its supporting libraries like Redux in particular has proven extremely challenging time and time again. Every undated bit of information needs to be taken with a grain of salt, compounded by the fact that with JavaScript there are always a number of variations on what should be relatively standard stuff, all conforming to different interpretations of "best practices".

Even normal consumers of information are getting tricked by the stale danishes floating around the web. Technologies like Google street view in particular are marvelous, but they require constant updates because as the world around us changes, the maps and camera images quickly become out of date. Using these services to assist with landmarks can be fraught with danger as you're looking for an intersection after a car dealership or restaurant that relocated six months ago. At least Google had the common sense to put dates on their imagery.

The stale danish is the one key reason that I would much, much rather work in the back end with technologies like .Net, Entity Framework, and SQL, than working in the front end with JavaScript libraries. I love working in the front end, solving usability issues and presenting users with useful views to data and features they need. I enjoy MVC, WPF, and had enjoyed Silverlight largely because the pace of change was a lot slower, and there were a handful of sources to find directly relevant information about the technologies and how to effectively use them.

JavaScript, and open source in general, is far more volatile. While you may argue this makes it more responsive, so that problems get solved more quickly, the real trouble is that everyone's perception of a problem is different, so there are a lot of "solutions" out there screaming on the web for validation. As a software developer, projects need a completion date, so you do need to choose from the options available at the time and commit to them. Having relevant and accurate information at hand through the life of the project is essential for when challenges do crop up. Ripping out X to replace it with Y isn't an option, and often the breaking changes from X v0.8 to X v0.9 prohibit the idea of upgrading, so when searching for possible answers and work-arounds, it would be wonderful if more articles and posts were versioned, or at least had a "Baked On" date listed.


Friday, December 14, 2018

2 very good reasons why you should never pass entities to the client.

So you are building a web application with Entity Framework on the back end. You've got the code to load the entities; now you need to get that data to the client. Then the question enters your head: "I've already loaded this data, why not simply return the entity?" Why bother creating a view model when it's just going to contain many of the fields you've already got here? Well, here are 2 very good reasons why you shouldn't, and if you have, you should seriously consider changing it.

1. Security

If you are passing an entity to the client, and critically, if you are accepting that entity back from the client, attaching it to a context, and saving it, you have opened a massive, gaping security hole in your application.

As an example, let's say I have a system that a user logs into, displays a list of debts that user currently has, and allows them to do something innocent like updating a status or recording a new note. The debt entity might look something like:
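A hypothetical sketch of such an entity (all type and property names here are illustrative, not from the original system). Note how much more than the displayed amount rides along, internal notes, customer identity, and navigation properties included:

```csharp
using System;
using System.Collections.Generic;

// A hypothetical EF-style debt entity. Everything here travels with the
// entity if it is serialized to the client, including the children.
public class Debt
{
    public int DebtId { get; set; }
    public int CustomerId { get; set; }
    public decimal Amount { get; set; }
    public bool IsAccepted { get; set; }
    public DateTime CreatedDate { get; set; }
    public virtual Customer Customer { get; set; } = null!;
    public virtual ICollection<DebtNote> Notes { get; set; } = new List<DebtNote>();
}

public class Customer
{
    public int CustomerId { get; set; }
    public string Name { get; set; } = "";
}

public class DebtNote
{
    public int DebtNoteId { get; set; }
    public string Text { get; set; } = "";   // internal notes, not for client eyes
}
```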


We have a model which contains a list of debts and a view that lets a user tick off a debt and mark it as accepted. We might only display the debt amount and maybe a couple of extra details on the screen; however, look in the debugger at any point where the model is exposed, such as our action to accept the debt:

Since we passed the entity, and our serializer diligently serialized all properties and children, we expose all of that data to the client. So I might have notes that aren't meant to be seen by the client, yet here they are.

The action handler looks something like:
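A framework-free sketch of the kind of handler described (Debt, IDebtService, and DebtController are hypothetical; in MVC this would be an [HttpPost] action, with debt built by the model binder from whatever JSON the client chose to send):

```csharp
using System;

public class Debt
{
    public int DebtId { get; set; }
    public int CustomerId { get; set; }
    public decimal Amount { get; set; }
    public bool IsAccepted { get; set; }
}

public interface IDebtService
{
    void UpdateDebt(Debt debt);
}

// A capturing fake, handy for observing what actually reaches the service.
public class CapturingDebtService : IDebtService
{
    public Debt? Saved { get; private set; }
    public void UpdateDebt(Debt debt) => Saved = debt;
}

public class DebtController
{
    private readonly IDebtService _debtService;

    public DebtController(IDebtService debtService)
    {
        _debtService = debtService ?? throw new ArgumentNullException(nameof(debtService));
    }

    // The anti-pattern: the entity arrives straight from the client and is
    // handed to the update without verifying anything beyond what the view
    // happened to display.
    public void Accept(Debt debt)
    {
        debt.IsAccepted = true;
        _debtService.UpdateDebt(debt);
    }
}
```

Because nothing is verified server-side, a tampered Amount or CustomerId flows straight through to the update.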

Now this is a very basic example. In reality we could have a view that allows for a more meaningful edit against an entity.  When we click the Accept button the record gets updated as accepted, all seems good. Except...

I don't want to accept a debt of $5000. I want to make that $1.

Using the debug tools, with a break-point before that call to UpdateDebt, I can edit the debt payload to change the Amount from "5000" to "1" then resume the execution.

Suddenly my data shows that the debt was accepted, but now has an amount of $1?! 

Heck, I don't even want to pay $1. Let's make it someone else's problem.  Using the same break-point we can see that it's assigned to a Customer with ID = 1.  Let's change those Customer ID references to 2, maybe update that customer name to "Fred" and anything else that might identify me.   Presto, that debt is now someone else's problem. What happened?

Let's have a look at the UpdateDebt method:
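In spirit, the method amounts to "attach whatever arrived and save it". A sketch with an in-memory store standing in for the DbContext (all types hypothetical; with EF the body would be `context.Entry(debt).State = EntityState.Modified;` followed by `context.SaveChanges();`):

```csharp
using System;
using System.Collections.Generic;

public class Debt
{
    public int DebtId { get; set; }
    public int CustomerId { get; set; }
    public decimal Amount { get; set; }
    public bool IsAccepted { get; set; }
}

// Stands in for the DbContext and its table; the point is the overwrite
// semantics, not the persistence mechanics.
public class DebtStore
{
    private readonly Dictionary<int, Debt> _rows = new Dictionary<int, Debt>();

    public void Seed(Debt debt) => _rows[debt.DebtId] = debt;
    public Debt Find(int id) => _rows[id];

    // Equivalent in spirit to attaching the detached entity as Modified
    // and saving: every column is overwritten with the client's values,
    // including columns the view never displayed.
    public void UpdateDebt(Debt debt) => _rows[debt.DebtId] = debt;
}
```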

Part of the attraction, I fear, of reattaching entities with Entity Framework is that it saves an extra read from the database. Just attach it and save. It's also only 3 lines of code! Win. There are libraries such as GraphDiff that even help dive through related entities to attach them and mark them as modified. Doing this violates the cardinal rule of client-server software development: "Trust nothing that the client sends you!"

Now you might argue that it doesn't make sense to have a client update a single field like that; an "accept" action would simply take a debt ID and do the work behind the scenes. True, but any update action in a system that takes an entity and attaches it is vulnerable to this tampering, and importantly, so is any entity associated with that entity via its references.

Even using a view model is not immune to this form of tampering. If you use a tool like Automapper to map entities to view models, but then use Automapper to map view models back to entities, you expose yourself to the same unintended changes, though to a lesser degree provided those view models limit the data visible.

In any case, *never* trust the data coming back from the client. Even if it's an action that just accepts a debt ID, verify that the debt ID actually belongs to the currently authenticated user. *Always*. If it doesn't, log the user out immediately, and log the attempt to notify administrators. You either have a bug that should be fixed, or someone actively trying to circumvent or compromise the system.

2. Performance

Now point #1 should be enough, but if you've chosen to pass entities anyway, and are validating everything coming back from the client, the second reason to reconsider is that sending entities incurs a performance cost and is easily susceptible to serious performance issues. One issue that I commonly see tripping up Entity Framework users who pass entities to the client is lazy loading. When MVC / Web API goes to send entities back to the client, the serializer touches each property on the entity. Where that property is a lazy-loading proxy, that touch triggers a lazy-load call. Each of those loaded entities then has each of its properties touched for serialization, triggering more lazy loads. What is worse, where you might have loaded 20 rows for search results, the lazy loads for each of those 20 entities and their children are performed one by one, not using the criteria you used to load the 20 parent entities, as would have been the case if you had used .Include() to eager load them. The simple work-around is to eager load everything, but that means eager loading absolutely everything associated with the entity, whether you intend to display it or not.

Even in this case we are now faced with a situation where we are loading dozens of columns of data where we might only want to display a handful. That is extra work and memory for your database to provide those unnecessary columns, little to no optimization the database can do with indexes, extra data across the wire to your app server, then extra data across the wire to the client, and extra memory needed on both the app server and client. How much faster would it be responding if it just queried and returned that handful of columns? It's noticeable.

Addressing these issues

The simple answer to avoiding these issues? Use view models. A view model is a POCO (Plain Old C# Object) that contains just the data you want your view to see, plus the IDs you will need to perform any action from that view. Automapper now provides support for the IQueryable interface and Linq integration via the .ProjectTo() method, which passes the Select criteria through so the database sends back just what is needed for the view model. The client doesn't see anything it shouldn't, and performance is optimal because nothing unneeded is sent.
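Hand-rolled, the projection that ProjectTo produces is just a Select composed into the IQueryable, which is why EF can translate it into a narrow SQL statement. A sketch with hypothetical types (it behaves identically over in-memory data):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Debt
{
    public int DebtId { get; set; }
    public int CustomerId { get; set; }
    public decimal Amount { get; set; }
    public string InternalNotes { get; set; } = "";
}

// The view model carries only what the view needs: no notes, no FKs
// beyond the one ID required to act on the row.
public class DebtViewModel
{
    public int DebtId { get; set; }
    public decimal Amount { get; set; }
}

public static class DebtQueries
{
    // Equivalent in spirit to AutoMapper's .ProjectTo<DebtViewModel>():
    // the Select is composed into the IQueryable, so against EF only
    // DebtId and Amount would appear in the generated SQL.
    public static IQueryable<DebtViewModel> ToViewModels(this IQueryable<Debt> debts) =>
        debts.Select(d => new DebtViewModel { DebtId = d.DebtId, Amount = d.Amount });
}
```

The view model type simply has nowhere to carry InternalNotes, so the serializer can never leak it.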

When it comes to getting those view models back: Trust nothing. Every operation and update absolutely must be authenticated and have all applicable IDs resolved back to the user. Any deviation should be logged and generate an alert to administrators.

Conclusion


Hopefully this helps convince developers to reconsider passing entities between their controllers and views / API consumers. If you are working on a system that does pass entities around, at a minimum be sure to test your client with the debugger tools thoroughly to see exactly what data is being exposed, and ensure that every screen and action properly guards against JSON data tampering.

Wednesday, December 12, 2018

Maximizing Performance with Entity Framework & MVC / Web API

You've just started on a contract and the manager or lead developer comes up to you and says: "Ok, we were using Entity Framework for our data layer but it's too slow so we need to replace it with {Dapper/NHibernate/EF-Core/NoSQL/ADO+Sprocs/etc.}"

When I hear this (and I have heard it more than a couple of times) it sets the warning bells ringing, and presents an opportunity to save clients a whole lot of money. 99.5% of the time the issue isn't that EF is too slow; the issue is that they just don't know how to use it, and even after switching to another technology stack they will likely encounter the same or worse performance issues.

Whenever I face the task of improving performance with EF, there are always issues and smells to investigate and eliminate. In this article I'll outline some of the major ones, and what can be done about them. Some of these items can involve a fair amount of refactoring to implement, but the performance impact will be clear enough to justify the changes.

Culprit #1 - Code-First

I'll get it out there right now. I *hate* EF code-first. I mean I really hate code-first. It is on the list of technologies that I wish had never been invented, right up there with "remember password" and "hide file extensions". The road to Hell is paved with good intentions, and code-first accounts for more than a few bricks on that path. Code-first and Migrations now make up the majority of EF-related issues on Stack Overflow. They are great tools for whipping up Alpha 0.1 and getting something up and running fast, but beyond that they should simply be turned off. Code-first databases *can* be set up efficiently, but I'd say that every instance of a code-first database implementation that I've seen in a production system makes my inner DBA shrivel and die a little. I am not a DBA, but I have worked with enough of them to have picked up what makes for efficient and effective database design. Out of the box, code-first offers none of this. My recommendation: if you're using it and not experiencing performance issues, then you shouldn't be reading this article. If you are reading this article and using code-first, seriously consider turning it off unless you know this beast well enough to have addressed the following smells:

a) Inefficient or missing indexes. In EF systems, indexes should be added and maintained as the system matures. Looking at real-world usage and performance, you should be constantly evaluating and implementing indexes. This isn't something that should be set up "first" (premature optimization), but it also shouldn't be forgotten or ignored.
b) Poor schema design. Simple things like using GUIDs for PKs with NewId() instead of NewSequentialId(), or using meaningful string PKs/FKs, not to mention probably a dozen sins that a DBA will point out about generated schemas.
c) Time wasted attempting to resolve issues with code first generation and migrations, and the inevitable work-around/hacks to keep it happy. How many columns are left as inefficient data types, confusing names, or otherwise polluting your schema, indexes, and entities because you couldn't get EF migrations to clean them up properly without failures or side-effects?
d) Generation is no substitute for a good DBA. More than a couple of teams that I have joined have foregone a DBA, thinking that developers using code-first negate the need for one. (Frankly, if I were a DBA faced with a team planning to use code-first, I expect I'd either get frustrated and quit, or be forced out.) If you're using code-first without access to a DBA to tune the result, you are asking for trouble. If you do still have a DBA in the team, give them a hug for sticking around and putting up with you!

No, I don't like code-first at all, and it's responsible in part or whole for at least a few of the issues below. I will always recommend using a database first approach with EF and manage the schema manually, preferably with access to a DBA.

Culprit #2 - Schema related issues.

Even if code first wasn't responsible for the schema, there are a few schema related bugbears to investigate and potentially improve. These often aren't the primary performance bottleneck, but they are one of the quickest wins to squeeze a bit of performance out without code changes, and these steps will help identify bigger wins in re-factoring for further culprit items.

The first thing that every EF developer needs to be familiar with is a profiler. This can be an EF profiler, or any SQL profiler for your database of choice will work equally well. A common one I use for SQL Server is ExpressProfiler (https://github.com/OleksiiKovalov/expressprofiler) which is a quick and simple profiling tool. Using a profiler against an isolated database that you can test against will help reveal slow, expensive queries that you can copy and run against the database with an execution plan. This can highlight missing indexes for example.

The key things to check in the schema:
a) Correctly typed keys with efficient identity/defaults. For example, if you use UniqueIdentifier PKs, are you using NewSequentialId() instead of NewId()? Are you using composite PKs where a dedicated meaningless PK could be used? Are you using meaningful string PKs instead of meaningless numeric ones?
b) Are there missing indexes, or indexes that can be improved?
c) Do you have an index maintenance schedule?
d) Do you have Transaction Log maintenance scheduled as part of your backup strategy? (SQL Server)
e) In general, has a DBA gone over your database and maintenance scheduling? If not, get someone in to help look at it.

Culprit #3 - Exposing entities instead of View Models / DTOs.

In terms of performance impact on a system, this culprit is far and away the most significant issue to track down and eliminate. By now you should have a profiler installed, and the profiler is instrumental in demonstrating the pain that passing entities around can cause. The first problem you can encounter with returning entities, especially from MVC controllers, is unexpected lazy-load calls. These show up in a profiler when you trigger a call against the controller and then see a whole stream of database hits go through. If you put a breakpoint at the end of the call, run to that point, and then resume to exit the method, these extra DB calls will appear after exiting the method. What is happening is that ASP.Net is attempting to serialize the entities to return to the view. The serializer iterates over the properties and, by doing so, trips the lazy-loading proxy methods, triggering loads from the database.

What is worse is that when you're faced with returning a collection of entities, such as search results, each set of lazy-loaded dependencies will be queried against the database *one.. by.. one*. Ouch! So let's say I query a page of order results. Orders have a customer reference and a delivery address reference. Customers have an address and a contact details reference. If you returned a page of 20 orders, the lazy-load hits will include 20x "SELECT FROM tblCustomer WHERE CustomerID = x", plus 40x "SELECT FROM tblAddress WHERE AddressID = x" (20 for the order delivery address, and 20 for the customer address), plus 20x "SELECT FROM tblContact WHERE CustomerID = x", and so on for every reference of Order, and every reference of that reference.
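The arithmetic can be made concrete with a toy simulation: a query counter stands in for the database, with one method per access pattern (all types here are hypothetical and simplified to a single lazy reference per order):

```csharp
using System;
using System.Linq;

// Toy model of the N+1 behaviour: each lazy navigation property costs one
// query per parent row, while an eager .Include()-style load costs one
// query for the whole page.
public class QueryCountingDb
{
    public int QueryCount { get; private set; }

    // "SELECT ... FROM Orders" -- one query for the page of orders.
    public int[] LoadOrderIds(int pageSize)
    {
        QueryCount++;
        return Enumerable.Range(1, pageSize).ToArray();
    }

    // "SELECT ... FROM tblCustomer WHERE CustomerID = x" -- fired once per
    // order when the serializer trips the lazy-load proxy.
    public string LazyLoadCustomer(int orderId)
    {
        QueryCount++;
        return $"Customer-{orderId}";
    }

    // Eager equivalent: orders and customers joined in a single query.
    public (int OrderId, string Customer)[] LoadOrdersWithCustomers(int pageSize)
    {
        QueryCount++;
        return Enumerable.Range(1, pageSize)
            .Select(id => (id, $"Customer-{id}"))
            .ToArray();
    }
}
```

For a page of 20 orders with one lazy customer reference each, serialization costs 1 + 20 = 21 queries; the eager version costs 1. Add the address and contact references from the example above and the lazy total climbs to 101.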

These lazy loads can be mitigated by eager-loading the data using .Include() statements, however this still presents performance issues as the database server needs to retrieve all columns of all related tables for the applicable rows. This data then needs to be transmitted and allocated memory on the application server, before being serialized and sent across the wire to the client. That almost always amounts to a lot more data than is needed, and doesn't present opportunities to take full advantage of indexing.
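For comparison, eager loading the same page of orders collapses the per-row hits into a single joined query; a sketch using the same hypothetical entities:

```csharp
// One round trip with JOINs instead of ~80 lazy-load hits, but every column
// of Order, DeliveryAddress, Customer, Address, and Contact comes back:
var orders = context.Orders
    .Include(o => o.DeliveryAddress)
    .Include(o => o.Customer.Address)
    .Include(o => o.Customer.Contact)
    .OrderBy(o => o.OrderId)
    .Take(20)
    .ToList();
```

Fewer queries, but the payload problem remains.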

Most importantly, sending entities exposes systems to revealing more information than users should be able to access. Even if you don't add controls to your view for that data, a simple F12 and inspection in the browser's debugging tools can expose that information to clever viewers. Passing entities back to the server and being "clever" by attaching them to a context and saving is a cardinal sin. Those entities expose all FKs etc. and could have been modified in ways that your client doesn't support but that a clever user with debugging tools could achieve.

Lastly, dealing with detached entities adds complexity and potential errors within your server code. You need to check whether entities are already associated with a DbContext before reattaching them, and ensure that references are updated before being saved.

Using POCO view models you can improve performance considerably by letting EF query just the columns and rows you need from the database, and passing that smaller payload across the wire to the client. Tools like AutoMapper now support IQueryable extensions via .ProjectTo, which means you can relatively easily map an entity query and its related data into a view model that is optimized for the particular view.
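A sketch of projecting straight to a view model (the view model and its properties are hypothetical):

```csharp
// The generated SQL selects only these columns, across the necessary JOINs:
var results = context.Orders
    .Where(o => o.Status == OrderStatus.Open)
    .OrderBy(o => o.OrderId)
    .Take(20)
    .Select(o => new OrderSummaryViewModel
    {
        OrderId      = o.OrderId,
        CustomerName = o.Customer.Name,
        DeliveryCity = o.DeliveryAddress.City
    })
    .ToList();

// Or, equivalently, with AutoMapper's queryable extensions:
// var results = context.Orders
//     .ProjectTo<OrderSummaryViewModel>(mapperConfig)
//     .ToList();
```

No lazy loads can fire because no entities ever leave the query.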

I believe the biggest argument for passing entities (besides being lazy about defining a view model which looks similar to an entity) is to magically avoid having to re-read the data when performing the update. I call B.S. on that "optimization" on the grounds that systems generally do a lot more reads than writes, and updates typically involve fetching a relatively small amount of data by PK, which EF performs ridiculously fast. Time also passes between when that data is read and when it is written, so systems still need to handle the fact that the server data could have been updated in the interim; the code still needs to be prepared to re-load the entity anyway. Trading away security and performance to avoid defining an extra set of simple POCO classes and to skip a very fast read is a bad deal.

Culprit #4 - Async, async everywhere.

When developing large, complex systems, you need to know and understand async/await and how it works with EF. However, many developers are a bit mistaken about what it does, and where it should or should not be used. Generally, I find that once teams decide to use it, they decide to use it *everywhere* for consistency. This is a mistake.

Async does not make queries faster; it makes them slower. What async does do is make servers more responsive, in that you can kick off a fairly expensive query for one request, then free up that request thread to start processing another request. When that first query finishes, it resumes to finish off the first request on a new worker thread. When you implement async code you add overhead to effectively suspend the code and set up a continuation point for later resumption. That overhead comes in both server resources and thread synchronization. If you use async for big, expensive requests, those requests will not hold up server processing while waiting on their queries. However, if you use it for *all* queries, every operation, including the short & sweet ones, will be participating in that park & resume scheduling game. My recommendation when it comes to async is to start with everything running synchronously, then as you identify heavy queries, update those to use async operations. Typical candidates include:

  • Searches
  • Retrieving entity graphs for updates
  • Retrieving large graphs for reports or detail views.
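A sketch of that split, assuming EF6's async extensions and the hypothetical entities from earlier; the heavy search goes async while the trivial lookup stays synchronous:

```csharp
// Heavy search: worth freeing the request thread while SQL Server works.
public async Task<List<OrderSummaryViewModel>> SearchOrdersAsync(string term)
{
    return await _context.Orders
        .Where(o => o.Customer.Name.Contains(term))
        .Select(o => new OrderSummaryViewModel
        {
            OrderId      = o.OrderId,
            CustomerName = o.Customer.Name
        })
        .ToListAsync();
}

// Quick single-row fetch by PK: the async ceremony costs more than it saves.
public Order GetOrder(int id)
{
    return _context.Orders.Find(id);
}
```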

Removing unnecessary async code is a relatively easy refactor and something I suggest when I see systems where async/await has been lathered throughout.

Culprit #5 - Large DbContexts.

As systems grow larger, the number of tables and entities can get quite large. The larger a DbContext gets, the longer it takes to spin one up. IMO this is a major drawback of most Code-First implementations. Using bounded contexts (contexts geared towards specific areas of an application, with just those entities and their direct dependencies) can noticeably improve the performance of EF operations. Entities used as a read-only reference in one context can be greatly simplified so that they can be loaded and associated more efficiently. For example, a Customer may have 40+ columns, many non-nullable, and the act of creating an Order needs a reference to a customer, but rather than mapping the entire Customer table entity into an OrderContext, you can simplify it to just the relevant details a customer has as far as an order is concerned. Smaller entity definitions + fewer entity definitions = faster context spin-up and simpler read/write code. There is probably a way to get bounded contexts cooperating against a single DB schema with migrations, but frankly I find code-first is more hassle than it's worth.
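A sketch of what such a bounded context might look like. OrderCustomer is a hypothetical, slimmed-down mapping onto the same physical Customers table that a full Customer entity maps to elsewhere:

```csharp
// Read-mostly projection of a 40+ column Customers table:
public class OrderCustomer
{
    public int CustomerId { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}

// Bounded context for the ordering area: only the entities this area needs.
public class OrderContext : DbContext
{
    public DbSet<Order> Orders { get; set; }
    public DbSet<OrderCustomer> Customers { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        modelBuilder.Entity<OrderCustomer>()
            .HasKey(c => c.CustomerId)
            .ToTable("Customers"); // same table, far fewer mapped columns
    }
}
```

Writes to the full customer record stay with whatever context owns that area; the order screens only ever load the slim shape.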

Introducing bounded contexts with a DB-first configuration is a relatively simple task to approach on a screen-by-screen basis. This is often the approach I use to introduce EF into legacy systems, and a similar approach can be employed to move a set of pages from an existing global context to a new purpose-built context. An absolute must for this step is to ensure that Culprit #3 isn't rearing its head: to have this level of flexibility you cannot be passing entities around.

Culprit #6 - Entity Framework 4

This item deserves special mention. In more than a few instances I've seen criticisms of EF performance from teams that are still using Entity Framework 4. I've used EF since EF 4 was first introduced (technically EF v2), and unless I *had* to use it, I didn't; I'd previously been an NHibernate advocate. I'd tinkered with EF 5, but I didn't take EF seriously until EF 6 was released. From a performance perspective, EF 4 is a dog's breakfast, and any project still using it should immediately be looking at upgrading to EF 6. The migration path from 4 to 6 is pretty simple. Unfortunately, code written around EF 4 will not initially take advantage of many of the improvements introduced with 6, such as deferred execution, but those can be refactored in over time to net significant performance improvements. Simply replacing EF 4 with EF 6 would net 10-25% performance improvements without any other changes.

Conclusion

Hopefully these six culprits help you identify performance problems in EF-backed systems and give you some ideas on how to improve them, rather than giving in to the belief that EF needs to be replaced with something else. If you have any comments or questions about coaxing some performance out of EF, flick me a comment or post your question on StackOverflow tagged "entity-framework", as I tend to follow that tag.

Wednesday, August 28, 2013

On Agile Project Management

I've often seen a bit of confusion around how to manage projects that are using Agile approaches for software development. On the one hand, businesses are very receptive to continuous releases; on the other, they struggle to fit the project budget and timeline into more traditional project management molds.

The trouble stems from Agile projects being measured on a velocity basis, with the highest-priority features tackled first. From a project management perspective you have to keep an eye on what needs to be done, what has been done, how fast things are progressing, and whether they are heading in the right direction. Unfortunately, the instruction project managers are given is to somehow bolt this onto something like PRINCE2, when the two are pretty distinctly incompatible.

How I visualize managing Agile projects:

You have a large furnace. This furnace burns money (logs) and produces business value (steam). Developers, testers, and business analysts take the form of valves directing the steam toward practical goals. Your projects are funded by budgets, which form different stacks of logs. Adding up all of the valves tells you how fast the furnace burns logs; that rate is constant unless you add valves or increase their size.

Now as a project manager you have control over which logs get put in the furnace and where the steam is directed. The furnace burns at a constant rate based on the attached valves. Something is always being fed into the furnace, so you nominate the pile of logs to pull from; if you don't specify, the furnace loader will just start throwing in anything flammable, which could include the furniture.

A common problem that I'm faced with is when a project manager is trying to directly tie the output of one or more valves to the burning of a specific log. Between sprints, or sometimes even within a sprint they are tempted to fiddle with the valves to switch between different outputs while a log is burning. (This work needs to be billed against billing code X, while that work needs to be billed against Y.)

Some key problems with this:

  • Agile teams work most efficiently when they are not context switching between tasks. Each time you fiddle with a valve, steam is lost. Developers have to drop what they're doing, start something else, put it down, and go back to what they were working on.
  • Logs burn more smoothly than shards. Project managers stop feeding logs; instead they try to budget for pieces of work and hand-feed just enough money from a particular pile to produce a specific amount of steam. Agile projects estimate on *difficulty*, not cost. The measurement is how fast features are implemented, and when they're deemed good enough, not a prediction of how long something will take to complete. It might be tempting to pre-cut budgets to work in micro-iterations, but the reality is that staff and contractors are paid 9-5. You'll end up with 5 hours of "budget" spent to cover 8+ real hours of cost, and 3+ hours coming from "somewhere".

If you must context switch for budgeting purposes then the best approach I can recommend is to set it up as a completely separate furnace and dedicate valves to it. Avoid attempting to move valves back and forth frequently.

Does this mean that Agile teams cannot switch between priorities? Certainly not! They're ideally set up for dealing with shifting priorities, but what a project manager must tackle is how the logs are accounted for, and managing the valve changes as efficiently as possible.

1) Feed in the logs, and balance out the piles at the end of the sprint. At the beginning of a sprint, set the valves: for these two weeks these developers will be working on these stories, while the rest of the team continues with Y. At the end of the sprint you look at what was delivered in both projects, account for which piles the logs for that sprint came from, and decide what the valve settings need to be for the next sprint. This is a different frame of reference: at the beginning of the sprint you aren't allocating 3 logs from budget A and 7 from budget B, you are merely setting the valves and putting 10 logs in the furnace. What you get out at the end of the sprint determines which piles the logs were pulled from.

2) Look at the quality and type of valves available. Some valves leak more than others, especially when fiddled with. Valves can represent individual members of the team, or groups within the team. Fewer large valves will direct the steam more efficiently than lots of small ones. Overall, the less you fiddle with valves, the more efficiently the steam is materialized into product. Getting into the habit of grouping developers into larger valves is also beneficial when you look to grow a team to increase the furnace's burn rate. Adding 10 individual valves to 10 existing individual valves means you have 10 new, fairly leaky valves. However, if developers are grouped into teams, new developers can be merged into new teams with some or all of the developers from the existing teams to tackle new challenges, or added to existing teams. Developers reinforce each other's efforts, which helps mitigate leaks (or at least brings issues to light early to be addressed).

How does this fit with efforts to get budget approvals and set deliverable feature sets and delivery dates? I can't answer that, but I hope the perspective above gives some food for thought about how to better fit Agile into more traditional frames of reference, or convince the upper echelons to better fuel Agile projects and continue to see the benefits in the end deliverable.