Tag: Interface

What Makes Good Code? – Should Every Class Have An Interface? Pt 2

Should Every Class Have an Interface?

This is part two in the sub-series of “Should Every Class Have an Interface?”, and part of the bigger “What Makes Good Code?” series.

Other People’s Code

So in the last post, we made sure we could get an interface for every class we made. Okay, well that’s all fine and dandy (I say half sarcastically). But you and I are smart programmers, so we like to re-use other people’s code in our own projects. But wait just a second! It looks like Joe Shmoe didn’t use interfaces in the API he created! We refuse to pollute our beautiful interface-rich code with his! What can we do about it?

Wrap it.

That’s right! If we add a little bit of code, we can get all the same benefits as the example we walked through originally. It’s not going to completely fix “the problem”, but I’ll touch on that after. So, we all remember our good friend encapsulation, right?

Let’s pretend that Joe Shmoe wrote some cool code that does string lookups from an Excel file. We want to use it in our code, but Joe didn’t use the IStringLookup interface (because… it’s in OUR code, not his) and he didn’t even use ANY interfaces. The constructor for his class looks like:


public ExcelParser(string pathToExcelFile);

On this class, there are two methods. One method allows us to find the column index for a certain heading, and the other method allows us to get a cell’s value given a column and row index. The method signatures look like:


public int GetColumnIndex(string columnName);

public string GetCellValue(int columnIndex, int rowIndex);

We can wrap that class by creating a wrapper class that meets our interface, like so:


public sealed class ExcelStringLookup : IStringLookup
{
  // ugh... we have to reference the class directly!
  private readonly ExcelParser _excelParser;

  // ugh... we have to reference the class directly!
  public ExcelStringLookup(ExcelParser excelParser)
  {
    _excelParser = excelParser;
  }

  public string GetString(string name)
  {
    var columnIndex = _excelParser.GetColumnIndex(name);
    // assumes all of our strings will be under a column header
    var cellValue = _excelParser.GetCellValue(columnIndex, 1);
    return cellValue;
  }
}

And now this will plug right into the rest of our code that we defined originally.
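For reference, the IStringLookup interface itself isn’t reproduced in this post; assuming it’s the single-method lookup we defined in the original example, it would look something like this:

public interface IStringLookup
{
  string GetString(string name);
}

Because the wrapper implements that interface, the rest of our code only ever sees IStringLookup and never needs to know that Joe’s ExcelParser is doing the work underneath.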

This doesn’t totally eliminate “the problem” though, the problem being that some class doesn’t have an interface, which is what this post set out to address. There’s still a class we’re making use of that doesn’t have an interface, but we’ve reduced the exposure of that problem to JUST this wrapper class and the spot that constructs it. Are we okay with that?

Thoughts So Far…

Let’s do a little recap on what we’ve seen so far:

  • Having interfaces for our classes is a nice way to introduce a layer of abstraction
  • Interfaces are just *one* tool to get layers of abstraction introduced
  • If you want interfaces for all of the classes in your code and some third party didn’t use interfaces, that third-party code likely makes up a small portion of your code base (especially if you wrap it like I mentioned above). This may not always be true in your code base, but it’s likely the case.
  • The amount of work to wrap things can vary greatly. Some things are straightforward to wrap, but you need to add many methods/properties. Sometimes it’s the inverse: you only have a few things to wrap, but they’re not straightforward.
  • The number of classes you’d need to wrap to get to this state can vary greatly, since even built-in System classes aren’t all backed by interfaces!
  • There’s certainly a trade-off between the up-front work and maintenance of wrapping a class in an interface versus the benefits it provides.

Is that last point blasphemy?! So there may actually be times we DON’T want to have an interface for a class?

Watch this space for part 3 where we start to look at a counter-example!

 


ProjectXyz: Enforcing Interfaces (Part 2)

Enforcing Interfaces

This is my second installment of the series related to my small side project that I started. I mentioned in the first post that one of the things I wanted to try out with this project is coding by interfaces. There’s an article over at CodeProject that I once read (I’m struggling to dig it up now, arrrrrghh) that really gave me a different perspective about using interfaces when I program. Ever since then I’ve been a changed man. Seriously.

The main message behind the article was along the lines of: have your classes implement your interface, and to be certain nobody is going to come by and muck around with your class’s API, make sure they can’t directly create an instance of the class. One of the easiest ways to do this (and bear with me here, I’m not claiming this is right or wrong) is to have a hidden (private or protected) constructor on your class and static methods that let you create new instances of your class. The trick here is that the static method returns the interface your class implements, not the concrete type.

An example of this might look like the following:


public interface IMyInterface
{
    void Xyz();
}

public sealed class MyImplementation : IMyInterface
{
    // we hid the constructor from the outside!
    private MyImplementation()
    {
    }

    public static IMyInterface Create()
    {
        return new MyImplementation();
    }

    public void Xyz()
    {
        // do some awesome things here
    }
}

Interesting Benefits

I was pretty intrigued by this article on enforcing interfaces for a few reasons and if you can stick around long enough to read this whole post, I’ll hit the cons/considerations I’ve encountered from actually implementing things this way. These are obviously my opinion, and you can feel free to agree or disagree with me as much as you like.

  • (In theory) it keeps people from coming along and tacking on random methods to my classes. If I have an object hierarchy that I’m creating, having different child classes magically have random public APIs changing independently seems kind of scary. People have a harder time finding ways to abuse this because they aren’t concerned with the concrete implementation, just the interface.
  • Along with the first point, enforcing interfaces makes people think about what they’re doing when they need to change the public API. Now you need to go change the interface. Now you might be affecting X number of implementations. Are you sure?
  • Sets people up nicely to play with IoC and dependency injection. You’re already always working with interfaces because of this, so rolling out something like Moq or Autofac should be easier for you.
  • The static creation methods can be leveraged to do parameter checks BEFORE entering a constructor. Creating IDisposable implementations can be fun when your constructor fails and your disposable cleanup code was expecting things to be initialized (not a terribly strong argument, but I’ve had cases where this makes life easier for me when working with streams). There’s a small sketch of this after the list.
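To illustrate that last point, here’s a minimal sketch (the FileLookup class and its members are hypothetical, not from ProjectXyz; it assumes using System and System.IO) of a static creation method validating its arguments before the constructor ever runs, so a half-constructed disposable never exists:

public sealed class FileLookup : IDisposable
{
    private readonly Stream _stream;

    // the constructor only runs once the arguments are known to be good
    private FileLookup(Stream stream)
    {
        _stream = stream;
    }

    public static FileLookup Create(string path)
    {
        // validate BEFORE construction; no instance exists yet if this throws
        if (string.IsNullOrEmpty(path))
        {
            throw new ArgumentException("A file path is required.", "path");
        }

        return new FileLookup(File.OpenRead(path));
    }

    public void Dispose()
    {
        _stream.Dispose();
    }
}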

Enforcing Interfaces in ProjectXyz

I’ve only implemented a small portion of the back-end of ProjectXyz (relative to what I imagine the scope to be), but it’s enough that I have a couple of layers, some different class/interface hierarchies that interact with each other, and some tests to exercise the API. The following should help explain the current major hierarchies a bit better:

  • Stats are simple structures representing an ID and a value
  • Enchantments are simple structures representing some information about modifying particular stats (slightly more complex than stats)
  • Items are more complex structures that can contain enchantments
  • Actors are complex structures that:
    • Have collections of stats
    • Have collections of enchantments
    • Have collections of items

Okay, so that’s the high level. There’s obviously a bit more going on with the multi-layered architecture I’m trying out here too (since the hierarchies are repeated in a similar fashion in both layers). However, this is a small but reasonable amount of code to be trying this pattern on.

I have a good handful of classes and associated interfaces that back them. I’ve designed my classes to take in references to interfaces (which are, of course, backed by my other classes), and my classes are largely decoupled from implementations of other classes.

Now that I’ve had some time to play with this pattern, what are my initial thoughts? Well, it’s not pure sunshine and rainbows (which I expected), but there are definitely some cool pros I hadn’t considered and definitely some negative side effects that I hadn’t considered either. Stay tuned!

(The previous post in this series is here).


Yield! Reconsidering APIs with Collections


Yield: A Little Background

The yield keyword in C# is pretty cool. Used within an iterator, yield lets a method return an item to the caller along with control of execution, and then resume right where it left off the next time the iterator is advanced (there’s a quick illustration after the list below). Neat, right? MSDN documentation lists these limitations surrounding the use of the yield keyword:

  • Unsafe blocks are not allowed.
  • Parameters to the method, operator, or accessor cannot be ref or out.
  • A yield return statement cannot be located anywhere inside a try-catch block. It can be located in a try block if the try block is followed by a finally block.
  • A yield break statement may be located in a try block or a catch block but not a finally block.
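As a quick illustration of the basic mechanics (a contrived example of my own, not from the MSDN documentation), an iterator method just yields values one at a time and the compiler builds the state machine behind the scenes:

public static IEnumerable<int> CountTo(int max)
{
  for (int i = 1; i <= max; i++)
  {
    // execution pauses here and resumes on the next iteration
    yield return i;
  }
}

// prints 1, 2, 3, pulling one value at a time from the iterator
foreach (var number in CountTo(3))
{
  Console.WriteLine(number);
}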

So what does this have to do with API specifications?

A whole lot really, especially if you’re dealing with collections. I personally haven’t been a big user of the yield keyword, mostly because I’ve never really been forced to use it. After playing around with it for a bit, I saw a lot of potential. I’ve written before about what I think makes a good API. In my article, I was making a point to discuss two perspectives:

  • Who needs to implement your interface. You want it to be easy for them to implement.
  • Who needs to call your interface. You want it to be easy for them to use.

In my opinion, the IEnumerable<T> interface was a tricky thing to work with as a return value. You can essentially only iterate an IEnumerable, and at the time of calling a function, maybe that’s not what you want to do. The flip side is that for the person implementing the interface, IEnumerable<T> is a really easy interface to satisfy. However, the yield keyword has opened up some new doors.

In this article, I’d like to go over a couple of different approaches for an API and then explain why the yield keyword might be something you consider next time around. Disclaimer: I’m not claiming anything I’m about to present is the only way or the best way–I’m just sharing some of my own findings and perspective.

Interface For Returning Collections

The first type of API I’d like to look at is for returning collections. Based on my own API guidelines, I’d ideally choose to return an interface or class that provides a lot of information to the caller and is also easy for the implementer of my interface to create. The List<T> class is a great choice:

  • It’s easy to construct
  • It’s built-in to the .NET framework
  • It provides many handy functions (All of the IList<T> functionality as well as things like AddRange(), or functions that support delegates)

My next choice might be to have a return type of IList<T>, which would provide a little less ease of use to the caller, but make it even easier for the implementer of the interface. They could return arrays of type T, since an array implements the IList<T> interface, or their own custom list implementation that doesn’t inherit from the List<T> class. The differences between using IList<T> and List<T> are arguably pretty small.
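As a trivial sketch of that flexibility (the method and data here are made up for illustration), an implementer could satisfy an IList<T> return type with a plain array:

public IList<string> GetNames()
{
  // arrays implement IList<T>, albeit as a fixed-size list
  return new[] { "Alice", "Bob" };
}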

A third alternative, which I would have avoided in the past, is to return an IEnumerable<T>. My opinion used to be that this made the life of the interface implementer a bit easier compared to returning an IList<T>, but complicated the life of the caller for a couple of reasons:

  • The caller would have to use the results of the function in a foreach loop.
  • The caller would have to add the items to their own collection to be able to do much more with the items.

My naive implementations of being forced to return an IEnumerable<T> were… well… crap. I would construct a collection within the function, fill it up, and then return it as an IEnumerable<T>. Then as the caller of my function, I’d have to re-enumerate the results (or add them to another collection):

public static IEnumerable<T> GetItems<T>()
{
  var collection = new List<T>();
  // add all the items to a collection
  return collection;
}

private static void Main()
{
  var myCollection = new List<int>();
  myCollection.AddRange(GetItems<int>());
  // use myCollection...

  // or.....
  foreach (var item in GetItems<int>())
  {
    // use the items
  }
}

Seems like overkill to me with that implementation. However, we’ll examine how using yield can truly transform this into something… better. So to reiterate, a few potential implementations for an API involving collections might be:

  • Return a List<T> class
  • Return an IList<T> (or even an ICollection<T>) interface
  • Return an IEnumerable<T> interface

Constantly Creating Collections

My design decisions, in the past, were really driven by two guidelines:

  • Make it easier for the person implementing/extending the API
  • Make it easy for the person consuming the API

As I quickly illustrated in the first section, this meant that I would have a method where I would create a collection, fill it with items, and then return it. I could generally pick any concrete collection class and return it since I would usually pick a simple collection as the return type. Easy.

One thing that might be noticeable with this approach is that it looks pretty inefficient to keep creating new collections, fill them, and then return them. I’ll illustrate with a simple example. We’ll create a class that has a method on it called GetItems(). As per my reasoning presented earlier, we’ll have this method return a List<T> instance, and to make this example easier to work with, we’ll pass in an IEnumerable<T> instance. For what it’s worth, the input to this function is really just for demonstration purposes here; we’re really focusing on how we’re creating our return value.

public class CreateNewListApi<T>
{
  public List<T> GetItems(IEnumerable<T> input)
  {
    var newCollection = new List<T>();

    foreach (var item in input)
    {
      newCollection.Add(item);
    }

    return newCollection;
  }
}

And now that we have our simple class we can mock up a little test for performance… Just how inefficient is creating new lists every time?

internal class Program
{
  private static void Main(string[] args)
  {
    const int NUM_ITEMS = 100000000;
    var inputItems = new int[NUM_ITEMS];

    Console.WriteLine("API Creating New Collections");
    var api = new CreateNewListApi<int>();

    var watch = Stopwatch.StartNew();
    var results = api.GetItems(inputItems);

    foreach (var item in results)
    {
    }

    Console.WriteLine(watch.Elapsed);
    Console.WriteLine(Process.GetCurrentProcess().PrivateMemorySize64);
    Console.ReadLine();
  }
}

When I run this on my machine, I get an average of about 1.73 seconds. The memory printout I get when running is 1615908864 bytes. So is that slow? Is that a lot of memory usage? I think it’s pretty hard to say conclusively without being able to compare it against anything. So let’s keep this number in mind as we continue to investigate the alternatives.

Side Note: At this point, some readers may be saying “Well, if the input to our function was also a list (or if whatever our function has to work with was otherwise equivalent to our return value) then we wouldn’t have to go populate a new collection every time… We can just return the underlying collection”! And I would say you are absolutely correct. If your function has access to an instance of the same type as the return type, then you could always just return that instance. But what implications does this have? You’re now giving people access to your underlying internals, and they can go modify them as they please. So, if you need to control access to items being added or removed, then it might not make sense for you to expose your internal collections like this.
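To make that trade-off concrete, here’s a contrived sketch (not part of the benchmark code above, and the IReadOnlyList<T> interface assumes .NET 4.5 or newer) showing the difference between handing out the internal list directly and handing out a read-only view of it:

public class Inventory
{
  private readonly List<string> _items = new List<string>();

  // exposes the internal collection; callers can Add/Remove behind our back
  public List<string> GetItemsDirect()
  {
    return _items;
  }

  // read-only view; callers can enumerate but cannot modify our internals
  public IReadOnlyList<string> GetItemsSafe()
  {
    return _items.AsReadOnly();
  }
}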

Yield to Incoming API Alternatives

We’ve seen how my past implementations may have looked, so how might we tweak this? If we tweak our API a bit, we can make our method return an IEnumerable<T> instead. Let’s see what that might look like:

public class YieldingApi<T>
{
  public IEnumerable<T> GetItems(IEnumerable<T> input)
  {
    foreach (var item in input)
    {
      yield return item;
    }
  }
}

So in this API implementation, all we’ll be doing is iterating over some type of collection and then yielding each result. If we run it through the same type of test as our previous API implementation, what kind of results do we end up with?

internal class Program
{
  private static void Main(string[] args)
  {
    const int NUM_ITEMS = 100000000;
    var inputItems = new int[NUM_ITEMS];

    Console.WriteLine("API Yielding");
    var api = new YieldingApi<int>();

    var watch = Stopwatch.StartNew();
    var results = api.GetItems(inputItems);

    foreach (var item in results)
    {
    }

    Console.WriteLine(watch.Elapsed);
    Console.WriteLine(Process.GetCurrentProcess().PrivateMemorySize64);
    Console.ReadLine();
  }
}

When I run this on my machine, I get an average of about 2.80 seconds. The memory printout I get when running is 449409024 bytes. How does this relate back to our first implementation? Well, it’s certainly slower. It takes about 1.62x as long to enumerate using the yield implementation as it did with the first API we created. However, yield also uses less than 1/3 (about 27.8%, actually) of the memory footprint when compared to the first implementation. Pretty cool results!

Side Note: So yield was a bit slower according to our results, but what happens if we print the elapsed time before we run that foreach loop? Well, on my machine it averages at about one millisecond. Now that’s fast, right?! The cool thing about using yield with the IEnumerable<T> interface is that the work is deferred. That is, not until the program goes to actually run the enumeration do we take the performance hit. Try it out! Try moving the time printout from after the foreach loop to before the foreach loop. Try sticking breakpoints in on the line that yields. You’ll see what I mean.
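In code, that experiment looks something like the following tweak to the test above (a sketch; exact numbers will vary by machine):

var watch = Stopwatch.StartNew();
var results = api.GetItems(inputItems);

// nothing has been enumerated yet, so this prints a tiny elapsed time
Console.WriteLine("Before enumeration: " + watch.Elapsed);

foreach (var item in results)
{
}

// the real cost only shows up once the enumeration actually runs
Console.WriteLine("After enumeration: " + watch.Elapsed);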

Summary

In this article, I’ve explored two different ways of implementing an API (specifically focusing on the return value). We saw a brief performance analysis between the two and I highlighted some differences in both approaches. Let’s recap though:

  • Approach 1: Returning a List<T> and creating the collection ahead of time
    • Appeared to be overall a bit faster than yielding.
    • Consumed much more memory than yielding.
    • Callers can use the results immediately for enumeration, checking count, or as a collection to add more things to
    • The return type of List<T> is a bit more restrictive than an IEnumerable<T> like in the second API implementation
  • Approach 2: Return type of IEnumerable<T> and yielding results
    • Appeared to be overall a bit slower than the List<T> implementation
    • Lazy. We don’t actually execute any enumeration code until the caller actually enumerates
    • Consumed significantly less memory than the first approach using List<T>
    • Callers can enumerate the results immediately, but they need to add the results to a collection class to do much more than enumerate

So next time you’re designing an API for your interfaces and classes, try keeping these things in mind!

EDIT (December 30th, 2013):
As per some comments on Google+ by Dan Nemec, I figured I’d add a bit more here in the summary. IEnumerable<T> on its own is certainly not useless, especially if you’re leveraging LINQ or extension methods. My main beef in the past was that the consumer of an API with an IEnumerable<T> return value can only iterate over the results… and that’s just because iteration is all that IEnumerable<T> lets you do. Dan made a great point though: if you are leveraging extension methods or LINQ (which introduces tons of handy extension methods for working with IEnumerable<T>), then you get all of that functionality tacked on to IEnumerable<T>.

So if you’re not fortunate enough to be working with LINQ or extension methods (i.e. working with legacy code in old .NET framework versions… and yes I am familiar with the attribute you can add in to allow extension methods provided you have a compiler version high enough to support it), then IEnumerable<T> sometimes just plain sucks. I’d wager the majority of C# developers aren’t in this boat though, so I’d like to thank Dan again for his comments.
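To show what Dan means (a throwaway sketch reusing the YieldingApi<int> instance from the test above; it assumes using System.Linq), the IEnumerable<T> return value composes with LINQ without materializing anything until you ask for it:

var firstTenEvens = api.GetItems(inputItems)
  .Where(x => x % 2 == 0)
  .Take(10)
  .ToList();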


Lambdas: An Example in Refactoring Code

Lambdas: An Example in Refactoring Code

Background: Lambdas and Why This Example is Important

Based on your experience in C# or other programming languages, you may or may not be familiar with what a lambda is. If the word “lambda” is new and scary to you, don’t worry. Hopefully after reading this you’ll have a better idea of how you can use them. My definition of a lambda expression is a function that you can define inline, in local scope, and pass as an argument anywhere the delegate signature matches. It’s probably pretty obvious to you that you can pass object references and value types into all kinds of functions… But what about passing in a whole function as an argument? And what if you just want to declare a simple anonymous method right where you provide it to a function? Lambdas.
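For instance (a throwaway example of my own), a lambda can be assigned to a delegate type like Func<T, TResult> or passed straight into a method that expects a delegate:

// a lambda stored in a delegate variable
Func<int, int> square = x => x * x;
Console.WriteLine(square(5)); // prints 25

// a lambda passed directly as an argument; FindAll expects a Predicate<int>
var numbers = new List<int> { 1, 2, 3, 4 };
var evens = numbers.FindAll(n => n % 2 == 0);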

So now you at least have a basic idea of what a lambda is. What’s this article all about? I wanted to discuss a real-world coding experience that helped demonstrate the value of lambdas to me. In my opinion, having real-world programming topics to learn from is more beneficial than many of the “ideal” scenario examples/tutorials you end up reading on the Internet. We can argue and debate that certain things are better or worse in an ideal sense, but when you have a real practical example, it really helps to drive the point home.

So for me, I love working with events. I’m very comfortable with the concept of delegation in C#. I can have one object that may notify anyone that’s interested that something is happening, and the other objects that do care are able to handle the event. Thus, actions can get delegated to those objects that care to be notified. One of my weaknesses at this point in my development experience is leveraging the concept of delegation outside of the realm of events. Delegation is powerful, but it’s certainly not limited to hooking onto events with event handlers.

The particular example I want to illustrate is a parallel of a real coding scenario. I was refactoring some code that was leveraging close to zero OOP practices. I wanted to create a nice extensible framework and class hierarchy to replace it. Once I was done, a few colleagues of mine at Magnet Forensics picked up on a bit of a code smell. We all agreed the new framework and class hierarchy was better, but there seemed to be a lot of boilerplate code going on. We got into the discussion of how lambdas could reduce a lot of the light-weight classes I had introduced. After taking their thoughts and refactoring my changes just a little bit more, the benefits of the lambdas were obvious to me.

So obvious, I had to write about it to share with all of you! Feel free to skip ahead to the downloads section to get the code and follow along with it. There are plenty of options for downloading.

The Scenario

I mentioned that this was a real world scenario. I’ve contrived a parallel example that hopefully demonstrates some of the real world issues while illustrating how lambdas are useful. Let’s imagine we have some big chunk of logic that does data processing. In my real-world scenario, this may have existed as one monolithic function. I would have one big function that, based on all the parameters I provide, can figure out how to process the data I feed it.

Problems:

  • Hard to test (You need to test the whole function even if you’re really just wanting to target a small part of it)
  • Error prone (Any small change to one part can potentially break an entire other part of the function as it grows in complexity)
  • Not extensible (As soon as you need to deviate a little bit from the structure that’s existed, suddenly things get really complicated)

By switching to more of an OOP approach, I can start to address all of the above problems. So in this example, I’ll illustrate what my initial refactoring would have looked like by introducing classes. Afterward, I’ll show what my second refactor may have looked like after taking lambdas into account. In order to stay true to some of the real world problems you might encounter when performing a big refactor like this, I’ve opted to include a fictitious dependency. I refer to this as the “mandatory argument” or “important reference”. You’ll notice in the code that I don’t really use it to do any work, but it demonstrates having to pass down some other critical information to my classes that the original function may have had easy access to.

Pre-Refactor: No Lambdas Here!

Let’s start with our new OOP layout. I want to have a factory that can create data processor instances for me. So let’s define what those look like.

First, we have the interface for our data processors:

using System;
using System.Collections.Generic;
using System.Text;

namespace LambdaRefactor.Processing
{
  public interface IProcessor
  {
    bool TryProcess(object input);
  }
}

And then a simple interface for a factory that can create the data processor instances for us:

using System;
using System.Collections.Generic;
using System.Text;

namespace LambdaRefactor.Processing
{
  public interface IProcessorFactory
  {
    IProcessor Create(ProcessorType type, object mandatoryArgument, object value);
  }
}

As you may have noticed, the factory interface I’ve provided above takes a ProcessorType enumeration. You may or may not agree that using an enumeration as an argument for the factory is good practice, but I’m using it to make my example simple. Here’s what our enumeration will look like:

using System;
using System.Collections.Generic;
using System.Text;

namespace LambdaRefactor.Processing
{
  public enum ProcessorType
  {
    GreaterThan,
    LessThan,
    NumericEqual,
    StringEqual,
    StringNotEqual,
    /* we could add countless more types of processors here. realistically,
     * an enum may not be the best option to accomplish this, but for
     * demonstration purposes it'll make things much easier.
     */
  }
}

And now we have a definition for all of the basic building blocks defined. These will also be used later when we refactor, so I wanted to get them out of the way right in the beginning.

Right. So, let’s create an extensible IProcessor implementation. We can address some of our basic requirements (like our artificial dependency) and create something that can easily be built on top of. All of our child classes will just have to handle validating their constructor input and overriding a single method. Easy!

using System;
using System.Collections.Generic;
using System.Text;

namespace LambdaRefactor.Processing.PreRefactor
{
  public abstract class Processor : IProcessor
  {
    private readonly object _importantReference;

    public Processor(object mandatoryArgument)
    {
      if (mandatoryArgument == null)
      {
        throw new ArgumentNullException("mandatoryArgument");
      }

      _importantReference = mandatoryArgument;
    }

    public bool TryProcess(object input)
    {
      if (input == null)
      {
        return false;
      }

      return Process(_importantReference, input);
    }

    protected abstract bool Process(object importantReference, object input);
  }
}

And now let’s provide the factory that’s going to be making all of these instances for us. Please note that the factory is left incomplete on purpose. I’ll only be providing two actual processor implementations and I’ll leave it up to you to try and fill out the rest!

using System;
using System.Collections.Generic;
using System.Text;

using LambdaRefactor.Processing.PreRefactor.Numeric;
using LambdaRefactor.Processing.PreRefactor.String;

namespace LambdaRefactor.Processing.PreRefactor
{
  public class ProcessorFactory : IProcessorFactory
  {
    public IProcessor Create(ProcessorType type, object mandatoryArgument, object value)
    {
      switch (type)
      {
        case ProcessorType.GreaterThan:
          return new GreaterProcessor(mandatoryArgument, value);
        case ProcessorType.StringEqual:
          return new StringEqualsProcessor(mandatoryArgument, value);
        /*
         * we still have to go implement all the other classes!
         */
        default:
          throw new NotImplementedException("The processor type '" + type + "' has not been implemented in this factory.");
      }
    }
  }
}

And now that we have a factory that can easily create our processors for us, let’s actually define some of our processor implementations.

We’ll start off with a simple processor for checking if some input is greater than a defined value. It should really only work with numeric values, but one of the challenges we need to work with is that our data is only provided to us as an object. As a result, we’ll have to do some type checking on our own.

using System;
using System.Collections.Generic;
using System.Text;
using System.Globalization;

namespace LambdaRefactor.Processing.PreRefactor.Numeric
{
  public class GreaterProcessor : Processor
  {
    private readonly decimal _value;

    public GreaterProcessor(object mandatoryArgument, object value)
      : base(mandatoryArgument)
    {
      if (value == null)
      {
        throw new ArgumentNullException("value");
      }

      _value = Convert.ToDecimal(value, CultureInfo.InvariantCulture); // will throw exception on mismatch
    }

    protected override bool Process(object importantReference, object input)
    {
      decimal numericInput;
      try
      {
        numericInput = Convert.ToDecimal(input, CultureInfo.InvariantCulture);
      }
      catch (Exception)
      {
        return false;
      }

      return numericInput > _value;
    }
  }
}

And to put a spin on things, let’s implement a processor that operates on string values only. We’ll implement the processor that checks if strings are equal. Like the GreaterProcessor, we’re forced to get object references passed in. We’ll need to convert these to strings to work with them.

using System;
using System.Collections.Generic;
using System.Text;

namespace LambdaRefactor.Processing.PreRefactor.String
{
  public class StringEqualsProcessor : Processor
  {
    private readonly string _value;

    public StringEqualsProcessor(object mandatoryArgument, object value)
      : base(mandatoryArgument)
    {
      if (value == null)
      {
        throw new ArgumentNullException("value");
      }

      _value = (string)value; // will throw exception on mismatch
    }

    protected override bool Process(object importantReference, object input)
    {
      return Convert.ToString(input, System.Globalization.CultureInfo.InvariantCulture).Equals(_value);
    }
  }
}

Where can we go from here?

  • We can make simple inverse processors by overriding others and inverting the return value of the Process() function. Want a StringDoesNotEqual processor? It’s just as easy as inheriting from the StringEqualsProcessor and then modifying the return of Process() (see the sketch after this list). Then we add this to our factory.
  • Adding other various types of processors is easy. We just have to extend our base class and add a couple of lines to our factory.
  • This code is much easier to test than one monolithic function that does all types of processing. We can now put a nice testing framework around this, and test each method on each class individually.
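For example, that StringDoesNotEqual processor (sketched here for illustration; it isn’t part of the sample code) would cost us an entire class just to flip one boolean:

namespace LambdaRefactor.Processing.PreRefactor.String
{
  public class StringDoesNotEqualProcessor : StringEqualsProcessor
  {
    public StringDoesNotEqualProcessor(object mandatoryArgument, object value)
      : base(mandatoryArgument, value)
    {
    }

    protected override bool Process(object importantReference, object input)
    {
      // a whole new class just to invert the result of the parent
      return !base.Process(importantReference, input);
    }
  }
}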

Post-Refactor: All of the Lambdas!

So… Why don’t we stop here? Because we can do better.

I mentioned that to make a simple inverse processor, all I had to do was override a class and invert the return value of Process(). That’s pretty easy to do… Except I need an entire new class to do it. If I want to make more types of numeric processing, I need to provide similar type checking and conversion. This code gets duplicated every time I go to add another simple class.

I also have my factory class responsible for creating my processor instances. They’re relatively coupled already, but I want developers to have to use my factory to construct instances of the processor interface and not worry about the specific implementations. So what if my factory had a bit more say in the construction of the processors? I could use lambdas to pass in the logic that’s unique to each type of processor, and keep each type of processor pretty bare bones. This would move more logic into the factory, but reduce the number of processor implementations I have to make.

So let’s do better!

Let’s start with our new IProcessor implementation. We’ll provide a delegate signature that will be the basis for the lambda expressions we pass in:

using System;
using System.Collections.Generic;
using System.Text;

namespace LambdaRefactor.Processing.PostRefactor
{
  public abstract class Processor : IProcessor
  {
    private readonly object _importantReference;

    public Processor(object mandatoryArgument)
    {
      if (mandatoryArgument == null)
      {
        throw new ArgumentNullException("mandatoryArgument");
      }

      _importantReference = mandatoryArgument;
    }

    public delegate bool ProcessDelegate<T>(object importantReference, T processorValue, T input);

    public bool TryProcess(object input)
    {
      if (input == null)
      {
        return false;
      }

      return Process(_importantReference, input);
    }

    protected abstract bool Process(object importantReference, object input);
  }
}

From here, we can come up with some child classes that are generic enough for us to work with using lambdas while still providing enough functionality to exist on their own. We can break our processors up based on the type of data they’ll be working with. That is, we can have a processor for numeric values and a processor for string values. This will cover a lot of the duplicated functionality that exists in the current state of our refactor if we wanted to keep creating new IProcessor implementations.

Let’s start with our NumericProcessor:

using System;
using System.Collections.Generic;
using System.Text;
using System.Globalization;

namespace LambdaRefactor.Processing.PostRefactor.Numeric
{
  public class NumericProcessor : Processor
  {
    private readonly decimal _value;
    private readonly ProcessDelegate<decimal> _processDelegate;

    public NumericProcessor(object mandatoryArgument, object value, ProcessDelegate<decimal> processDelegate)
      : base(mandatoryArgument)
    {
      if (value == null)
      {
        throw new ArgumentNullException("value");
      }

      if (processDelegate == null)
      {
        throw new ArgumentNullException("processDelegate");
      }

      _value = Convert.ToDecimal(value, CultureInfo.InvariantCulture); // will throw exception on mismatch
      _processDelegate = processDelegate;
    }

    protected override bool Process(object importantReference, object input)
    {
      decimal numericInput;
      try
      {
        numericInput = Convert.ToDecimal(input, CultureInfo.InvariantCulture);
      }
      catch (Exception)
      {
        return false;
      }

      return _processDelegate(importantReference, _value, numericInput);
    }
  }
}

And similarly, a StringProcessor:

using System;
using System.Collections.Generic;
using System.Text;

namespace LambdaRefactor.Processing.PostRefactor.String
{
  public class StringProcessor : Processor
  {
    private readonly string _value;
    private readonly ProcessDelegate<string> _processDelegate;

    public StringProcessor(object mandatoryArgument, object value, ProcessDelegate<string> processDelegate)
      : base(mandatoryArgument)
    {
      if (value == null)
      {
        throw new ArgumentNullException("value");
      }

      if (processDelegate == null)
      {
        throw new ArgumentNullException("processDelegate");
      }

      _value = (string)value; // will throw exception on mismatch
      _processDelegate = processDelegate;
    }

    protected override bool Process(object importantReference, object input)
    {
      return _processDelegate(importantReference, _value, Convert.ToString(input, System.Globalization.CultureInfo.InvariantCulture));
    }
  }
}

With these two basic child classes built upon our new IProcessor implementation, we can restructure a new IProcessorFactory implementation. As I mentioned, we can leverage lambdas to move some logic back into the factory class and keep the processor implementations relatively basic.

Here’s the new factory:

using System;
using System.Collections.Generic;
using System.Text;

using LambdaRefactor.Processing.PostRefactor.Numeric;
using LambdaRefactor.Processing.PostRefactor.String;

namespace LambdaRefactor.Processing.PostRefactor
{
  public class ProcessorFactory : IProcessorFactory
  {
    public IProcessor Create(ProcessorType type, object mandatoryArgument, object value)
    {
      switch (type)
      {
        case ProcessorType.GreaterThan:
          return new NumericProcessor(mandatoryArgument, value, (_, x, y) => x < y);
        case ProcessorType.StringEqual:
          return new StringProcessor(mandatoryArgument, value, (_, x, y) => x == y);
        /*
         * Look how easy it is to add new processors! Exercise for you:
         * implement the remaining processors in the enum!
         */
        default:
          throw new NotImplementedException("The processor type '" + type + "' has not been implemented in this factory.");
      }
    }
  }
}

As you can see, our new factory is simple like our first implementation. The major difference? We’re passing very simple lambdas that would otherwise have been functionality defined in very light-weight child classes. This allows us to move away from having many potentially bare-bones classes and minimizes the amount of boilerplate duplication.
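To tie it all together, consuming the refactored API might look something like this (a usage sketch; the mandatory argument is just a stand-in object, as it is throughout the example):

IProcessorFactory factory = new ProcessorFactory();
IProcessor greaterThanTen = factory.Create(ProcessorType.GreaterThan, new object(), 10);

Console.WriteLine(greaterThanTen.TryProcess(15)); // True
Console.WriteLine(greaterThanTen.TryProcess(5));  // False
Console.WriteLine(greaterThanTen.TryProcess("not a number")); // False, conversion fails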

Summary

I didn’t post it here, but the original implementation that this example parallels in real life was a pain to deal with. It was hard to test, brittle to modify/extend, and just downright unwieldy. It was obvious to me that switching to a refactored object-oriented implementation was going to make this style of code easy to extend and easy to test.

The initial refactor posted in this example was a great step in the right direction. The code became easy to build upon by relying on simple OOP principles, and granular parts of the functionality became really easy to test. If I just wanted to test certain types of numeric processing, I didn’t have to set up a test for my entire massive “process” function. All I’d have to do is make an instance of the processor I want to test and call the methods I’d like to cover. Incredibly easy.

Lambdas took this to the next level though. By leveraging lambdas, I could refactor even more common code into a base class. This meant that in order to use my processors properly, developers essentially had to go through the final factory class implementation. It caused a paradigm shift: instead of making lots of light-weight child classes for additional processor implementations, I’d only need to implement some logic in the factory. All of my existing processors could be refactored into a handful of generic processor classes, and the factory would be responsible for passing in the necessary lambdas.

Lambdas let you accomplish some pretty powerful things, and this refactoring example was one case where they made code much easier to manage. Hopefully you can find a good use for lambda expressions in your next upcoming programming task!

Code Downloads


API: Don’t Forget About The Non-Public API!


Background

From an object-oriented programming perspective, an application programming interface (API) is often referred to as the way other developers can interact with the public members of your class(es) and interface(s). Of course, API can be used to describe how one interacts with a web service (or other types of services), but for this discussion I’m limiting the scope to that of interfaces and classes. Limiting the definition of API to public members (or the equivalent of C#’s “public” in other languages) omits one huge part of what it encompasses. The purpose of this post is to explain why I think forgetting about the non-public API can lead to bad framework and API designs.

API And The Audience

I’ve written before about what I think makes a good API, and I had some comments on Code Project on the same posting that led me to this topic: the audience. There’s a lot to consider when you’re writing what many people would consider good, clean code. It’s impossible to please everyone because everyone has some sort of best practice, guideline, or convention that they follow because they believe it’s the best. So, I’m not going to tell you that some way is the best… I just want you to be conscious of your audience so that you can make better decisions.

Okay, okay… So what do I mean by audience? I’m going to generalize your audience (the consumers of your API) into two distinct categories. The first category is the group of developers who will be using references to things that implement your interfaces and your concrete classes. They’ll use the instances of the things you define. For example, let’s consider something built into the .NET framework: the List<T> class. Your first group of API consumers are just going to use instances of this class exactly as provided by the framework. They’ll make new instances of them and pass them around to be used in functions, or make properties that return instances of List<T>, or declare variables of type List<T>, etc… They use it as is, as provided.

The second general group of API consumers are the ones who are going to be extending your interfaces and classes. A great example of this is the EventArgs class. This class is super basic; it doesn’t do anything! Anyone who wants to use the EventArgs class essentially has to make their own class that inherits from EventArgs. Another example of this is the Exception class. Again, this class is built for people to extend it with their own implementations. I suppose both of these examples are pretty primitive because the base classes don’t offer much functionality, but what if your class wanted to let child classes override the default behaviour? I’ll have some more concrete examples of this later on.

Intentions Of The Audience

With these two general types of consumers defined, it’s a bit easier to consider how people may want to use your API differently. The first class of consumer wants to be able to easily call the methods you’ve defined and easily create the objects/interfaces that are core to your API. This means that the input they need to provide should be extremely basic, using things like built-in interfaces or other classes that you’ve defined that are easy to create. These consumers also want information-rich return values and classes. Why? Because it makes their life easy! If they only need to provide a little bit of information and they get a lot back, they’re able to do a lot more with the data that they have access to. They don’t need to (and they certainly don’t want to) call 10 different things in intricate ways to get a little bit of data back.

The second class of consumer takes the exact opposite perspective. Here’s why. In the first case where the first group of consumers want to pass in only minimal information to methods, the second class of consumer wants lots of information passed in. This class of consumer is required to do some job or return some data, so the more information they are provided the easier it is for them to perform their job. Similarly, they want the return values of methods to be as simplistic as possible. Why? It makes their job easier! If they are responsible for returning some incredibly complex class from a method defined in your interface, and the input to that method is only minimal information, this makes the job of the second class of consumer quite difficult.

The best way to remember these similarities and differences is that the flow of data is opposite depending on what type of API consumer you’re talking about. The first type of consumer wants to provide a little and get a lot, and the second type of consumer wants to get a lot and provide a little. Makes sense right? Everyone wants to do the easy thing.

The take-away here is that depending on who you think your main audience is going to be for your API, it will affect how you structure it. And this is exactly why forgetting about the non-public API can be a huge mistake. Forgetting this part of the API makes the whole thing difficult to extend because your base classes cannot be built on top of as easily. You actually make the lives of the second class of API consumers very difficult if you ignore their needs.

Example: WinForms

If you’ve done desktop application development in C#, you’ve likely used (or at least heard of) WinForms. Some people new to desktop application development might have actually started with Windows Presentation Foundation (WPF), but the same concepts will apply here. In my opinion WinForms is a great example of an API that has both public and non-public components that were designed with both audience types in mind. Let’s start with the first class of API consumers.

If you’ve dabbled in WinForms, you’re likely familiar with the Windows Form Designer offered in Visual Studio. If you are… then congrats! You’re the first class of API consumer that I’ve described. By using the built in classes like Buttons, TextBoxes and Labels, you’re using the out-of-the-box components offered in the framework and consuming the public API offered by these controls. You’ll be using things like the Text property offered on these controls and interacting with them via their events (i.e. hooking onto click events or text changed events). You’ll be using the public API. Nothin’ wrong with that!

//
// MyButton
//
this.MyButton.Location = new System.Drawing.Point(146, 84);
this.MyButton.Name = "MyButton";
this.MyButton.Size = new System.Drawing.Size(75, 23);
this.MyButton.TabIndex = 0;
this.MyButton.Text = "Click Me!";
this.MyButton.UseVisualStyleBackColor = true;
this.MyButton.Click += new System.EventHandler(this.MyButton_Click);

private void MyButton_Click(object sender, EventArgs e)
{
    // do stuff!
}

Now, the creators of WinForms weren’t dummies. They knew that they weren’t going to be able to offer you every possible control you’d ever need. They made the API in WinForms have great support for the non-public members of their base classes! So, what do I mean by that?

Let’s pretend we want to have our own fancy button class. Because our button is fancy, we always want to tell the user just how fancy it is when they click it. Now if the framework you were using had a poorly designed API, you might be forced to code a button from scratch. That would be pretty lame considering the complexity of the built-in controls offered to you already. For this example, you could surely just create one button and hook an event handler onto the click event (using the public API), but what if you wanted to re-use this everywhere throughout your user interface? You’d want your own FancyButton class that has this behaviour built-in so you can easily reuse it. No problem.

public class FancyButton : Button
{
    protected override void OnClick(EventArgs e)
    {
        base.OnClick(e);
        this.Text = "The fancy button was clicked.";
    }
}

The non-public API in WinForms gives you access to built-in behaviour of the base classes. You don’t need to hook onto events to get the job done, you can actually override the OnClick method and prevent the click event from even firing! The focus on the non-public API allows developers to extend the built-in classes without having to design their own from the ground up.

Summary

It takes a lot of practice and experience to be able to write a good API. There’s also plenty of different opinions on what constitutes a well-designed API. In my opinion, you need to give a lot of thought as to how your API will be used. Consider the two general types of API consumers I’ve defined: The consumers that use your API with the public parts of the interfaces and classes you’ve defined, and the consumers that want to extend your defined classes to provide their own related implementations. These two types of consumers will want very different things, and in order to please the second class of consumer, you absolutely cannot forget about the non-public API.

Some food for thought:

  • How can I best guess how developers will use my API?
  • I’m providing base classes with my framework and API. Can people easily extend them through inheritance? Will people want to extend them?
  • What would make my public API easier for other developers to use?
  • What would make my non-public API easier for other developers to build on to my base classes?
