Lambdas: An Example in Refactoring Code

Background: Lambdas and Why This Example is Important

Depending on your experience with C# or other programming languages, you may or may not be familiar with what a lambda is. If the word “lambda” is new and scary to you, don’t worry. Hopefully after reading this you’ll have a better idea of how you can use them. My definition of a lambda expression is an anonymous function that you can define right in local scope and pass as an argument, provided it matches the signature of the delegate that argument expects. It’s probably pretty obvious to you that you can pass object references and value types into all kinds of functions… But what about passing in a whole function as an argument? And what if you just want to declare a simple anonymous method right at the point where you provide it to a function? Lambdas.
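To make that definition concrete, here’s a minimal C# sketch. CountMatches is a hypothetical helper (it isn’t part of the example code later on). Its predicate parameter is a Func<int, bool> delegate, so the caller can write the logic inline as a lambda, and that lambda can even use local variables from the scope where it’s declared.

```csharp
using System;
using System.Collections.Generic;

public static class LambdaBasics
{
    // Hypothetical helper: counts the items for which the supplied predicate returns true.
    // Because the predicate parameter is a delegate type, a lambda can be passed for it.
    public static int CountMatches(IEnumerable<int> items, Func<int, bool> predicate)
    {
        int count = 0;
        foreach (int item in items)
        {
            if (predicate(item))
            {
                count++;
            }
        }

        return count;
    }
}
```

For instance, LambdaBasics.CountMatches(new[] { 1, 2, 3, 4 }, n => n % 2 == 0) returns 2. The expression n => n % 2 == 0 is the whole function being passed in, and it fits because it takes an int and returns a bool.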

So now you at least have a basic idea of what a lambda is. What’s this article all about? I want to discuss a real-world coding experience that helped demonstrate the value of lambdas to me. In my opinion, real-world programming problems are more beneficial to learn from than many of the “ideal” scenario examples and tutorials you end up reading on the Internet. We can debate whether certain things are better or worse in an ideal sense, but a real, practical example really helps drive the point home.

So for me, I love working with events. I’m very comfortable with the concept of delegation in C#. I can have one object that may notify anyone that’s interested that something is happening, and the other objects that do care are able to handle the event. Thus, actions can get delegated to those objects that care to be notified. One of my weaknesses at this point in my development experience is leveraging the concept of delegation outside of the realm of events. Delegation is powerful, but it’s certainly not limited to hooking onto events with event handlers.
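Here’s a small sketch of the two kinds of delegation side by side. DataSource is a hypothetical class made up for illustration: its ValueReceived event notifies whoever subscribed, while CountWhere accepts the decision logic itself as a Func<int, bool> parameter. Both accept lambdas, but only the first is tied to events.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class used only to contrast the two styles of delegation.
public class DataSource
{
    private readonly List<int> _values = new List<int>();

    // Event-based delegation: interested objects subscribe and get notified.
    public event Action<int> ValueReceived;

    public void Add(int value)
    {
        _values.Add(value);
        ValueReceived?.Invoke(value); // notify subscribers, if there are any
    }

    // Delegation outside of events: the caller hands over the logic to run.
    public int CountWhere(Func<int, bool> predicate)
    {
        int count = 0;
        foreach (int value in _values)
        {
            if (predicate(value))
            {
                count++;
            }
        }

        return count;
    }
}
```

Subscribing with source.ValueReceived += v => Console.WriteLine(v) is the familiar event pattern. Calling source.CountWhere(v => v > 10) is the same idea of handing code to another object, with no event involved.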

The particular example I want to illustrate parallels a real coding scenario. I was refactoring some code that used close to zero OOP practices, and I wanted to create a nice, extensible framework and class hierarchy to replace it. Once I was done, a few of my colleagues at Magnet Forensics picked up on a bit of a code smell. We all agreed the new framework and class hierarchy were better, but there seemed to be a lot of boilerplate code. We got into a discussion of how lambdas could eliminate many of the lightweight classes I had introduced. After taking their thoughts on board and refactoring my changes a little more, the benefits of lambdas were obvious to me.

So obvious, I had to write about it to share with all of you! Feel free to skip ahead to the downloads section to get the code and follow along with it. There are plenty of options for downloading.

The Scenario

I mentioned that this was a real world scenario. I’ve contrived a parallel example that hopefully demonstrates some of the real world issues while illustrating how lambdas are useful. Let’s imagine we have some big chunk of logic that does data processing. In my real-world scenario, this may have existed as one monolithic function. I would have one big function that, based on all the parameters I provide, can figure out how to process the data I feed it.

Problems:

  • Hard to test (You need to test the whole function even if you’re really just wanting to target a small part of it)
  • Error prone (Any small change to one part can potentially break an entire other part of the function as it grows in complexity)
  • Not extensible (As soon as you need to deviate a little bit from the structure that’s existed, suddenly things get really complicated)
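The original function isn’t shown in this article, so the following is only a hypothetical sketch of the shape being described. The name ProcessData and the string-based operation parameter are made up for illustration. Even at this size, exercising the greater-than logic means calling the whole method, and every new operation adds another branch to the same body.

```csharp
using System;
using System.Globalization;

public static class MonolithicProcessing
{
    // Hypothetical "do everything" method: every kind of processing lives in one body.
    public static bool ProcessData(string operation, object importantReference, object value, object input)
    {
        if (importantReference == null)
        {
            throw new ArgumentNullException(nameof(importantReference));
        }

        if (input == null)
        {
            return false;
        }

        switch (operation)
        {
            case "GreaterThan":
            case "LessThan":
                // Numeric branches share conversion code, tangled with the comparison.
                decimal expected = Convert.ToDecimal(value, CultureInfo.InvariantCulture);
                decimal actual;
                try
                {
                    actual = Convert.ToDecimal(input, CultureInfo.InvariantCulture);
                }
                catch (Exception)
                {
                    return false;
                }

                return operation == "GreaterThan" ? actual > expected : actual < expected;
            case "StringEqual":
                return Convert.ToString(input, CultureInfo.InvariantCulture) == (string)value;
            default:
                throw new NotSupportedException("Unknown operation: " + operation);
        }
    }
}
```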

By switching to more of an OOP approach, I can start to address all of the above problems. In this example, I’ll first illustrate what my initial refactoring would have looked like after introducing classes. Afterward, I’ll show what my second refactor may have looked like after taking lambdas into account. To stay true to some of the real-world problems you might encounter when performing a big refactor like this, I’ve opted to include a fictitious dependency. I refer to it as the “mandatory argument” or “important reference”. You’ll notice in the code that I don’t really use it to do any work, but it demonstrates having to pass other critical information down to my classes that the original function may have had easy access to.

Pre-Refactor: No Lambdas Here!

Let’s start with our new OOP layout. I want to have a factory that can create data processor instances for me. So let’s define what those look like.

First, we have the interface for our data processors:

using System;
using System.Collections.Generic;
using System.Text;

namespace LambdaRefactor.Processing
{
  public interface IProcessor
  {
    bool TryProcess(object input);
  }
}

And then a simple interface for a factory that can create the data processor instances for us:

using System;
using System.Collections.Generic;
using System.Text;

namespace LambdaRefactor.Processing
{
  public interface IProcessorFactory
  {
    IProcessor Create(ProcessorType type, object mandatoryArgument, object value);
  }
}

As you may have noticed, the factory interface I’ve provided above takes a ProcessorType enumeration. You may or may not agree that using an enumeration as an argument for the factory is good practice, but I’m using it to make my example simple. Here’s what our enumeration will look like:

using System;
using System.Collections.Generic;
using System.Text;

namespace LambdaRefactor.Processing
{
  public enum ProcessorType
  {
    GreaterThan,
    LessThan,
    NumericEqual,
    StringEqual,
    StringNotEqual,
    /* we could add countless more types of processors here. realistically,
     * an enum may not be the best option to accomplish this, but for
     * demonstration purposes it'll make things much easier.
     */
  }
}

And now all of the basic building blocks are defined. These will also be used later when we refactor, so I wanted to get them out of the way right at the beginning.

Right. So, let’s create an extensible IProcessor implementation. We can address some of our basic requirements (like our artificial dependency) and create something that can easily be built on top of. All of our child classes will just have to handle validating their constructor input and overriding a single method. Easy!

using System;
using System.Collections.Generic;
using System.Text;

namespace LambdaRefactor.Processing.PreRefactor
{
  public abstract class Processor : IProcessor
  {
    private readonly object _importantReference;

    public Processor(object mandatoryArgument)
    {
      if (mandatoryArgument == null)
      {
        throw new ArgumentNullException("mandatoryArgument");
      }

      _importantReference = mandatoryArgument;
    }

    public bool TryProcess(object input)
    {
      if (input == null)
      {
        return false;
      }

      return Process(_importantReference, input);
    }

    protected abstract bool Process(object importantReference, object input);
  }
}

And now let’s provide the factory that’s going to be making all of these instances for us. Please note that the factory is left incomplete on purpose. I’ll only be providing two actual processor implementations, and I’ll leave it up to you to try to fill in the rest!

using System;
using System.Collections.Generic;
using System.Text;

using LambdaRefactor.Processing.PreRefactor.Numeric;
using LambdaRefactor.Processing.PreRefactor.String;

namespace LambdaRefactor.Processing.PreRefactor
{
  public class ProcessorFactory : IProcessorFactory
  {
    public IProcessor Create(ProcessorType type, object mandatoryArgument, object value)
    {
      switch (type)
      {
        case ProcessorType.GreaterThan:
          return new GreaterProcessor(mandatoryArgument, value);
        case ProcessorType.StringEqual:
          return new StringEqualsProcessor(mandatoryArgument, value);
        /*
         * we still have to go implement all the other classes!
         */
        default:
          throw new NotImplementedException("The processor type '" + type + "' has not been implemented in this factory.");
      }
    }
  }
}

And now that we have a factory that can easily create our processors for us, let’s actually define some of our processor implementations.

We’ll start off with a simple processor for checking if some input is greater than a defined value. It should really only work with numeric values, but one of the challenges we need to work with is that our data is only provided to us as an object. As a result, we’ll have to do some type checking on our own.

using System;
using System.Collections.Generic;
using System.Text;
using System.Globalization;

namespace LambdaRefactor.Processing.PreRefactor.Numeric
{
  public class GreaterProcessor : Processor
  {
    private readonly decimal _value;

    public GreaterProcessor(object mandatoryArgument, object value)
      : base(mandatoryArgument)
    {
      if (value == null)
      {
        throw new ArgumentNullException("value");
      }

      _value = Convert.ToDecimal(value, CultureInfo.InvariantCulture); // will throw exception on mismatch
    }

    protected override bool Process(object importantReference, object input)
    {
      decimal numericInput;
      try
      {
        numericInput = Convert.ToDecimal(input, CultureInfo.InvariantCulture);
      }
      catch (Exception)
      {
        return false;
      }

      return numericInput > _value;
    }
  }
}
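Since our processors only receive object references, it’s worth knowing what Convert.ToDecimal(object, IFormatProvider) actually does with different inputs. The sketch below wraps it in a hypothetical TryToDecimal helper that mirrors the try/catch in GreaterProcessor.Process(), but catches the three exception types the conversion can throw instead of Exception. Numeric values and numeric strings convert; a non-numeric string throws FormatException; an object that doesn’t implement IConvertible throws InvalidCastException; and a value outside decimal’s range throws OverflowException.

```csharp
using System;
using System.Globalization;

public static class NumericConversion
{
    // Hypothetical helper mirroring the conversion in GreaterProcessor.Process().
    // Returns false instead of throwing when the input can't be treated as a decimal.
    public static bool TryToDecimal(object input, out decimal result)
    {
        result = 0m;
        if (input == null)
        {
            // Convert.ToDecimal(null, ...) would return 0, which is rarely what a processor wants.
            return false;
        }

        try
        {
            result = Convert.ToDecimal(input, CultureInfo.InvariantCulture);
            return true;
        }
        catch (FormatException)
        {
            return false; // e.g. "abc"
        }
        catch (InvalidCastException)
        {
            return false; // e.g. new object(), which doesn't implement IConvertible
        }
        catch (OverflowException)
        {
            return false; // e.g. double.MaxValue, which is outside decimal's range
        }
    }
}
```

One surprise worth knowing about: bool implements IConvertible, so TryToDecimal(true, out d) succeeds with d equal to 1. If that matters for your data, check the input’s type before converting. Catching the specific exceptions rather than Exception is a judgment call; it keeps genuinely unexpected errors visible.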

And to put a spin on things, let’s implement a processor that operates on string values only: the processor that checks if strings are equal. Like the GreaterProcessor, it only receives object references, so we’ll need to convert them to strings to work with them.

using System;
using System.Collections.Generic;
using System.Text;

namespace LambdaRefactor.Processing.PreRefactor.String
{
  public class StringEqualsProcessor : Processor
  {
    private readonly string _value;

    public StringEqualsProcessor(object mandatoryArgument, object value)
      : base(mandatoryArgument)
    {
      if (value == null)
      {
        throw new ArgumentNullException("value");
      }

      _value = (string)value; // will throw exception on mismatch
    }

    protected override bool Process(object importantReference, object input)
    {
      return Convert.ToString(input, System.Globalization.CultureInfo.InvariantCulture).Equals(_value);
    }
  }
}

Where can we go from here?

  • We can make simple inverse processors by subclassing an existing processor and inverting the return value of Process(). Want a StringDoesNotEqual processor? Just inherit from the StringEqualsProcessor, override Process() to negate the base result, and add the new class to our factory.
  • Adding other various types of processors is easy. We just have to extend our base class and add a couple of lines to our factory.
  • This code is much easier to test than one monolithic function that does all types of processing. We can now put a nice testing framework around this, and test each method on each class individually.
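The inverse processor from the first point above can be sketched like this. StringNotEqualsProcessor is a hypothetical name for the StringDoesNotEqual processor; the base class and StringEqualsProcessor are condensed copies of the code above (namespaces dropped) so the snippet compiles on its own. The new class only overrides Process() and negates the base result.

```csharp
using System;
using System.Globalization;

public interface IProcessor
{
    bool TryProcess(object input);
}

// Condensed copy of the pre-refactor base class.
public abstract class Processor : IProcessor
{
    private readonly object _importantReference;

    protected Processor(object mandatoryArgument)
    {
        _importantReference = mandatoryArgument ?? throw new ArgumentNullException(nameof(mandatoryArgument));
    }

    // Null input is rejected here, before any subclass logic runs.
    public bool TryProcess(object input)
    {
        return input != null && Process(_importantReference, input);
    }

    protected abstract bool Process(object importantReference, object input);
}

public class StringEqualsProcessor : Processor
{
    private readonly string _value;

    public StringEqualsProcessor(object mandatoryArgument, object value)
        : base(mandatoryArgument)
    {
        if (value == null)
        {
            throw new ArgumentNullException(nameof(value));
        }

        _value = (string)value; // throws InvalidCastException on mismatch
    }

    protected override bool Process(object importantReference, object input)
    {
        return Convert.ToString(input, CultureInfo.InvariantCulture).Equals(_value);
    }
}

// Hypothetical inverse processor: reuses everything and only flips the result.
public class StringNotEqualsProcessor : StringEqualsProcessor
{
    public StringNotEqualsProcessor(object mandatoryArgument, object value)
        : base(mandatoryArgument, value)
    {
    }

    protected override bool Process(object importantReference, object input)
    {
        return !base.Process(importantReference, input);
    }
}
```

Because the inversion happens in Process() rather than TryProcess(), a null input is still rejected: TryProcess(null) returns false for both the equal and not-equal processors. Each class can also be exercised on its own, just by constructing it and calling TryProcess(), which is the kind of granular testing the last point describes.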

Post-Refactor: All of the Lambdas!

So… Why don’t we stop here? Because we can do better.

I mentioned that to make a simple inverse processor, all I had to do was subclass an existing processor and invert the return value of Process(). That’s pretty easy to do… except I need an entire new class to do it. If I want to add more types of numeric processing, each new class needs the same type checking and conversion, so that code gets duplicated every time I add another simple class.

I also have my factory class responsible for creating my processor instances. The two are already fairly coupled, but I want developers to use my factory to construct instances of the processor interface without worrying about the specific implementations. So what if my factory had a bit more say in the construction of the processors? I could use lambdas to pass in the logic that’s unique to each type of processor, and keep each processor class pretty bare-bones. This would move more logic into the factory, but reduce the number of processor implementations I have to write.

So let’s do better!

Let’s start with our new IProcessor implementation. We’ll provide a delegate signature that will be the basis for the lambda expressions we pass in:

using System;
using System.Collections.Generic;
using System.Text;

namespace LambdaRefactor.Processing.PostRefactor
{
  public abstract class Processor : IProcessor
  {
    private readonly object _importantReference;

    public Processor(object mandatoryArgument)
    {
      if (mandatoryArgument == null)
      {
        throw new ArgumentNullException("mandatoryArgument");
      }

      _importantReference = mandatoryArgument;
    }

    public delegate bool ProcessDelegate<T>(object importantReference, T processorValue, T input);

    public bool TryProcess(object input)
    {
      if (input == null)
      {
        return false;
      }

      return Process(_importantReference, input);
    }

    protected abstract bool Process(object importantReference, object input);
  }
}
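The ProcessDelegate<T> declaration is what lets the factory hand a lambda to a processor. A three-parameter lambda returning bool converts to it directly, with the parameter types inferred from the delegate. In this sketch the delegate is redeclared at the top level (outside Processor) purely to keep it short, and DelegateDemo is a hypothetical class. It also shows that the built-in Func<object, T, T, bool> has the same shape; the two are still distinct delegate types, so bridging them takes a small wrapper lambda.

```csharp
using System;

// Same signature as Processor.ProcessDelegate<T>, declared at top level for brevity.
public delegate bool ProcessDelegate<T>(object importantReference, T processorValue, T input);

public static class DelegateDemo
{
    // Parameter types (object, decimal, decimal) are inferred from the delegate type.
    // True when the input is greater than the processor's configured value.
    public static readonly ProcessDelegate<decimal> GreaterThan =
        (_, processorValue, input) => input > processorValue;

    // The built-in Func has the same shape, but it's a different delegate type.
    public static readonly Func<object, decimal, decimal, bool> GreaterThanFunc =
        (_, processorValue, input) => input > processorValue;

    // Wrapping one delegate in a lambda bridges the two types when needed.
    public static ProcessDelegate<T> FromFunc<T>(Func<object, T, T, bool> func)
    {
        return (reference, processorValue, input) => func(reference, processorValue, input);
    }
}
```

The GreaterThan field is equivalent to the factory’s greater-than lambda, with the parameters given descriptive names: processorValue is the value the processor was created with, and input is what’s passed to TryProcess(). Using Func directly would also work, but a named delegate documents what each parameter means.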

From here, we can come up with some child classes that are generic enough to work with lambdas but still provide enough functionality to exist on their own. We can break our processors up based on the type of data they’ll be working with: a processor for numeric values and a processor for string values. These cover most of the functionality that would otherwise be duplicated if we kept creating new IProcessor implementations the way we did in the first refactor.

Let’s start with our NumericProcessor:

using System;
using System.Collections.Generic;
using System.Text;
using System.Globalization;

namespace LambdaRefactor.Processing.PostRefactor.Numeric
{
  public class NumericProcessor : Processor
  {
    private readonly decimal _value;
    private readonly ProcessDelegate<decimal> _processDelegate;

    public NumericProcessor(object mandatoryArgument, object value, ProcessDelegate<decimal> processDelegate)
      : base(mandatoryArgument)
    {
      if (value == null)
      {
        throw new ArgumentNullException("value");
      }

      if (processDelegate == null)
      {
        throw new ArgumentNullException("processDelegate");
      }

      _value = Convert.ToDecimal(value, CultureInfo.InvariantCulture); // will throw exception on mismatch
      _processDelegate = processDelegate;
    }

    protected override bool Process(object importantReference, object input)
    {
      decimal numericInput;
      try
      {
        numericInput = Convert.ToDecimal(input, CultureInfo.InvariantCulture);
      }
      catch (Exception)
      {
        return false;
      }

      return _processDelegate(importantReference, _value, numericInput);
    }
  }
}

And similarly, a StringProcessor:

using System;
using System.Collections.Generic;
using System.Text;

namespace LambdaRefactor.Processing.PostRefactor.String
{
  public class StringProcessor : Processor
  {
    private readonly string _value;
    private readonly ProcessDelegate<string> _processDelegate;

    public StringProcessor(object mandatoryArgument, object value, ProcessDelegate<string> processDelegate)
      : base(mandatoryArgument)
    {
      if (value == null)
      {
        throw new ArgumentNullException("value");
      }

      if (processDelegate == null)
      {
        throw new ArgumentNullException("processDelegate");
      }

      _value = (string)value; // will throw exception on mismatch
      _processDelegate = processDelegate;
    }

    protected override bool Process(object importantReference, object input)
    {
      return _processDelegate(importantReference, _value, Convert.ToString(input, System.Globalization.CultureInfo.InvariantCulture));
    }
  }
}

With these two basic child classes built upon our new IProcessor implementation, we can restructure a new IProcessorFactory implementation. As I mentioned, we can leverage lambdas to move some logic back into the factory class and keep the processor implementations relatively basic.

Here’s the new factory:

using System;
using System.Collections.Generic;
using System.Text;

using LambdaRefactor.Processing.PostRefactor.Numeric;
using LambdaRefactor.Processing.PostRefactor.String;

namespace LambdaRefactor.Processing.PostRefactor
{
  public class ProcessorFactory : IProcessorFactory
  {
    public IProcessor Create(ProcessorType type, object mandatoryArgument, object value)
    {
      switch (type)
      {
        case ProcessorType.GreaterThan:
          // x is the processor's value and y is the input, so this is true when input > value
          return new NumericProcessor(mandatoryArgument, value, (_, x, y) => x < y);
        case ProcessorType.StringEqual:
          return new StringProcessor(mandatoryArgument, value, (_, x, y) => x == y);
        /*
         * Look how easy it is to add new processors! Exercise for you:
         * implement the remaining processors in the enum!
         */
        default:
          throw new NotImplementedException("The processor type '" + type + "' has not been implemented in this factory.");
      }
    }
  }
}

As you can see, our new factory is just as simple as our first implementation. The major difference? We’re passing in very simple lambdas for logic that would otherwise have lived in a very lightweight child class. This lets us move away from having many bare-bones classes and minimizes duplicated boilerplate.
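Here’s one way you might complete the factory exercise, as a sketch rather than the definitive answer. The classes are condensed copies of the post-refactor code above, with namespaces and IProcessorFactory dropped so the snippet compiles on its own. Each remaining enum value costs one case and one lambda instead of a new class.

```csharp
using System;
using System.Globalization;

public interface IProcessor
{
    bool TryProcess(object input);
}

public enum ProcessorType { GreaterThan, LessThan, NumericEqual, StringEqual, StringNotEqual }

// Condensed copy of the post-refactor base class.
public abstract class Processor : IProcessor
{
    private readonly object _importantReference;

    protected Processor(object mandatoryArgument)
    {
        _importantReference = mandatoryArgument ?? throw new ArgumentNullException(nameof(mandatoryArgument));
    }

    public delegate bool ProcessDelegate<T>(object importantReference, T processorValue, T input);

    public bool TryProcess(object input) => input != null && Process(_importantReference, input);

    protected abstract bool Process(object importantReference, object input);
}

public class NumericProcessor : Processor
{
    private readonly decimal _value;
    private readonly ProcessDelegate<decimal> _processDelegate;

    public NumericProcessor(object mandatoryArgument, object value, ProcessDelegate<decimal> processDelegate)
        : base(mandatoryArgument)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        _processDelegate = processDelegate ?? throw new ArgumentNullException(nameof(processDelegate));
        _value = Convert.ToDecimal(value, CultureInfo.InvariantCulture); // throws on mismatch
    }

    protected override bool Process(object importantReference, object input)
    {
        decimal numericInput;
        try
        {
            numericInput = Convert.ToDecimal(input, CultureInfo.InvariantCulture);
        }
        catch (Exception)
        {
            return false; // input isn't numeric
        }

        return _processDelegate(importantReference, _value, numericInput);
    }
}

public class StringProcessor : Processor
{
    private readonly string _value;
    private readonly ProcessDelegate<string> _processDelegate;

    public StringProcessor(object mandatoryArgument, object value, ProcessDelegate<string> processDelegate)
        : base(mandatoryArgument)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        _processDelegate = processDelegate ?? throw new ArgumentNullException(nameof(processDelegate));
        _value = (string)value; // throws InvalidCastException on mismatch
    }

    protected override bool Process(object importantReference, object input) =>
        _processDelegate(importantReference, _value, Convert.ToString(input, CultureInfo.InvariantCulture));
}

// One possible completion of the exercise. In every lambda, x is the
// processor's configured value and y is the input being tested.
public class ProcessorFactory
{
    public IProcessor Create(ProcessorType type, object mandatoryArgument, object value)
    {
        switch (type)
        {
            case ProcessorType.GreaterThan:
                return new NumericProcessor(mandatoryArgument, value, (_, x, y) => x < y);
            case ProcessorType.LessThan:
                return new NumericProcessor(mandatoryArgument, value, (_, x, y) => x > y);
            case ProcessorType.NumericEqual:
                return new NumericProcessor(mandatoryArgument, value, (_, x, y) => x == y);
            case ProcessorType.StringEqual:
                return new StringProcessor(mandatoryArgument, value, (_, x, y) => x == y);
            case ProcessorType.StringNotEqual:
                return new StringProcessor(mandatoryArgument, value, (_, x, y) => x != y);
            default:
                throw new NotImplementedException("The processor type '" + type + "' has not been implemented in this factory.");
        }
    }
}
```

Keep the delegate’s argument order in mind: x is the processor’s configured value and y is the input, so GreaterThan is written x < y (the input is greater than the value) and LessThan is x > y. NumericEqual compares decimals, so a configured value of "10.0" matches an input of 10.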

Summary

I didn’t post it here, but the original code that this example parallels was a pain to deal with in real life. It was hard to test, brittle to modify and extend, and just downright unwieldy. It was obvious to me that switching to a refactored, object-oriented implementation would make this code easy to extend and easy to test.

The initial refactor posted in this example was a great step in the right direction. The code became easy to build upon by relying on simple OOP principles, and granular parts of the functionality became really easy to test. If I just wanted to test certain types of numeric processing, I didn’t have to set up a test for my entire massive “process” function. All I had to do was create an instance of the processor I wanted to test and call the methods I wanted to cover. Incredibly easy.

Lambdas took this to the next level, though. By leveraging lambdas, I could refactor even more common code into a base class. This meant the factory class became the required way to construct my processors correctly. It caused a paradigm shift: instead of making lots of lightweight child classes for additional processor implementations, I’d only need to add some logic to the factory. All of my existing processors could be refactored into a handful of generic processor classes, with the factory responsible for passing in the necessary lambdas.

Lambdas let you accomplish some pretty powerful things, and this refactoring example was one case where they made code much easier to manage. Hopefully you can find a good use for lambda expressions in your next upcoming programming task!

Code Downloads