Py's Sty: Linq2SQL Cache & Connectivity

One "gotcha" you will need to consider when using Linq2SQL is how to handle situations where a connection is lost at a moment when a SubmitChanges call is made. (Basically the period of time between when the last object reference is retrieved from the cache, and SubmitChanges is called.)

The SubmitChanges call will fail, expressing what is wrong, however the cached copies of the data will hold onto their changes until they are committed, or refreshed. This can have some truly confusing side-effects that can be a lot of fun to track down.

Case in point: I recently had to track down one of these issues in a simple barcode stock application. It basically consisted of orders and boxes. As boxes of stock come in, they are scanned with a barcode reader, looked up in the DB using Linq2SQL, have their statuses validated, and updated. As each box is scanned in, the order is also loaded to inspect whether or not all of its boxes have been scanned into stock, and when they all are, the order is updated to "complete". The bug was that every so often (once every several weeks or more) we got a situation where an order was found to be complete while one of its boxes was not recorded in the DB as "in stock". We eliminated every possibility, scoured to code for status change bugs, but there was nothing. Scanning exceptions were already brought up to the screen, but operators typically dealt with several scan issues (re-scanning boxes after being interrupted, etc.) so they didn't notice anything odd. After extending the logging support to see what was going in I found that the box was scanned, an exception occurred, but when re-scanned it said it was in-stock, and after all remaining boxes on the order were scanned, the order found all boxes to be in-stock, and diligently closed itself off.

The issue was Linq2SQL's caching behaviour. The exception was occurring when the box's change was submitted to the database. Any problem with the connection to the server (This was a client running in Victoria hitting a DB in Queensland) and the SubmitChanges would fail, but the updated objects remained in the cache. Since the same data context was used for all boxes scanned in during a receive stock session, further requests for that box retrieved the cached copy.

This is a bad scenario to be in. Your options are either to retry the submit changes until it either goes through, or you try to revert the changes until they revert, or you force the application to shut down if neither of those work. I was reporting back to the user that there was an error, and IMO if there is a problem then the user should retry what they were doing, I.e. re-scan the box, so I wanted to revert the changes. If the data source still wasn't available then the user should shut down the application and contact support.

The first solution that came to mind was to catch the exception from SubmitChanges and refresh the affected object graph. Since the box had child entities that were also updated, plus an order line item, this entire graph would need to be refreshed. Easy enough, as objects are queried and updated, they got added to a revertList should the SubmitChanges fail:

try
{
    DataContext.SubmitChanges();
}
catch
{ 
    DataContext.Refresh(RefreshMode.OverwriteCurrentValues, revertList);
    throw;
}

Unfortunately this doesn't work as the Refresh method will fail with:
System.InvalidOperationException: The operation cannot be performed during a call to SubmitChanges.
at System.Data.Linq.DataContext.CheckNotInSubmitChanges()
at System.Data.Linq.DataContext.Refresh(RefreshMode mode, IEnumerable entities) ...

So I needed to trigger the Refresh call outside of the SubmitChanges scope. The first thing to note is that while the system is processing, the scanner is turned off until the processing is completed. For minor errors such as scanning a box twice, etc. the scanner beeps out, displays a message, and reactivates. For serious application errors the scanner remains off until the application determines the exception can be corrected. The user knows that if the scanner beeps out and turns off, check the screen, and try restarting the application if it doesn't turn on within a minute or so.

In any case, the solution was to use an asynchronous operation to handle the scanner error and data refresh. This meant worker thread and locking off the scanner access until the refresh was successful, or timed itself out trying.

try
{
  DataContext.SubmitChanges();
}
catch
{ 
  revertDomainObjects(scanner, revertList);
  throw;
}

/// <summary>
/// This method will attempt to revert changes to objects to reflect the data
/// state. This will be called in situations where the data connectivity has 
/// been interrupted, where data state does not reflect cached state. This will
/// attempt to refresh cached copies of objects to match data state.
/// </summary>
/// <param name="revertList">List of domain objects to revert back to data state.</param>
private void revertDomainObjects(IScannerHandler scanner, List<object> revertList)
{
  ThreadPool.QueueUserWorkItem((e) => 
  {
    lock (_dataContext)
    { 
      var inscopeScanner = ((List<object>)e)[0] as IScannerHandler;
      var inscopeRevertlist = ((List<object>)e)[1] as List<object>;
      int attemptCounter = 0;
      bool isSuccessful = false;
      while (!isSuccessful && attemptCounter <= 4)
      {
        try
        {
          _dataContext.Refresh(RefreshMode.OverwriteCurrentValues, inscopeRevertlist);
          isSuccessful = true;
          inscopeScanner.RecoveredFromCriticalException();
          scanner.Status = Objects.ScannerStatus.Idle;
          scanner.StatusMessage = "Scanner has recovered from the previous error. Please retry operation and contact Licensys I.T. if the issue continues.";
        }
        catch
        { // If the rollback fails, sleep for half a second and retry up to 5 times.
          Thread.Sleep(500);
          if (attemptCounter++ > 4)
          throw;
        }
      }
    }
  }, 
  new List<object>() {scanner, revertList});
}

Probably a better solution would be to use a Task<> implementation if it's available in .Net 3.5, but this does the job and I don't need to be able to interrupt it.

Now in the case of an exception during SubmitChanges, the scanner will alert the user and be de-activated (triggered when the exception bubbles)and the worker process will kick off to try and refresh the affected data. if that is successful the scanner will be re-activated and the user notified that they can continue.

Caveats: This did require adding some thread-safety to an otherwise single-threaded application. I added an accessor property that attempted to lock the data context before returning it. Not ideal but in this case the means of triggering calls to the data context (scanner) was disabled until the operation was completed. This will be reviewed and refactored shortly as we move to support multiple simultaneous scanners.

Py's Sty

Wednesday, September 15, 2010

Linq2SQL Cache & Connectivity

No comments:

Post a Comment