What Makes Good Code? – Should Every Class Have An Interface? Pt 2

Should Every Class Have an Interface?

This is part two in the sub-series of “Should Every Class Have an Interface?“, and part of the bigger “What Makes Good Code?” series.

Other People’s Code

So in the last post, we made sure we could get an interface for every class we made. Okay, well that’s all fine and dandy (I say half sarcastically). But you and I are smart programmers, so we like to re-use other people’s code in our own projects. But wait just a second! It looks like Joe Shmoe didn’t use interfaces in the API that he created! We refuse to pollute our beautiful interface-rich code with his! What can we do about it?

Wrap it.

That’s right! If we add a little bit of code, we can get the same benefits as in the example we walked through originally. It’s not going to completely fix “the problem”, but I’ll touch on that after. So, we all remember our good friend encapsulation, right?

Let’s pretend that Joe Shmoe wrote some cool code that does string lookups from an Excel file. We want to use it in our code, but Joe didn’t use the IStringLookup interface (because… it’s in OUR code, not his) and he didn’t even use ANY interfaces. The constructor for his class looks like:


public ExcelParser(string pathToExcelFile);

On this class, there are two methods. One method allows us to find the column index for a certain heading, and the other method allows us to get a cell’s value given a column and row index. The method signatures look like:


public int GetColumnIndex(string columnName);

public string GetCellValue(int columnIndex, int rowIndex);

We can wrap that class by creating a wrapper class that meets our interface, like so:


public sealed class ExcelStringLookup : IStringLookup
{
  // ugh... we have to reference the class directly!
  private readonly ExcelParser _excelParser;

  // ugh... we have to reference the class directly!
  public ExcelStringLookup(ExcelParser excelParser)
  {
    _excelParser = excelParser;
  }

  public string GetString(string name)
  {
    var columnIndex = _excelParser.GetColumnIndex(name);
    // assumes all of our strings will be under a column header
    var cellValue = _excelParser.GetCellValue(columnIndex, 1);
    return cellValue;
  }
}

And now this will plug right into the rest of our code that we defined originally.
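To make the “plug right in” part concrete, here’s a minimal sketch. A few things are assumptions on my part: the shape of IStringLookup (a single GetString method, guessed from how the wrapper is used, since part one isn’t reproduced here), the ExcelParser stub (a stand-in with canned data instead of Joe’s real file parsing), and the Greeter class (a made-up consumer). The wrapper is repeated so the sample compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the interface from part one (not shown in this post).
public interface IStringLookup
{
  string GetString(string name);
}

// Hypothetical stand-in for Joe Shmoe's class; the real one would read a file.
public class ExcelParser
{
  private readonly List<string> _headers = new List<string> { "Greeting", "Farewell" };
  private readonly List<string> _firstRow = new List<string> { "Hello", "Goodbye" };

  public ExcelParser(string pathToExcelFile) { }

  public int GetColumnIndex(string columnName)
  {
    return _headers.IndexOf(columnName);
  }

  public string GetCellValue(int columnIndex, int rowIndex)
  {
    // this stub only has one data row, so rowIndex is ignored
    return _firstRow[columnIndex];
  }
}

public sealed class ExcelStringLookup : IStringLookup
{
  private readonly ExcelParser _excelParser;

  public ExcelStringLookup(ExcelParser excelParser)
  {
    _excelParser = excelParser;
  }

  public string GetString(string name)
  {
    var columnIndex = _excelParser.GetColumnIndex(name);
    return _excelParser.GetCellValue(columnIndex, 1);
  }
}

// Made-up consumer: it only knows about the interface.
public sealed class Greeter
{
  private readonly IStringLookup _lookup;

  public Greeter(IStringLookup lookup)
  {
    _lookup = lookup;
  }

  public string Greet(string who)
  {
    return _lookup.GetString("Greeting") + ", " + who + "!";
  }
}

public static class WrapperDemo
{
  public static void Main()
  {
    // the only spot that touches ExcelParser directly
    IStringLookup lookup = new ExcelStringLookup(new ExcelParser("strings.xlsx"));
    Console.WriteLine(new Greeter(lookup).Greet("Joe")); // prints "Hello, Joe!"
  }
}
```

Notice that Greeter never mentions ExcelParser. Only the wrapper and the line that constructs it do.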

This doesn’t totally eliminate “the problem” though (the problem being that some class doesn’t have an interface, which is the question this series is trying to answer). There’s still a class we’re making use of that doesn’t have an interface, but it looks like we’ve reduced the exposure of that problem to JUST this wrapper class and the spot that constructs it. Are we okay with that?

Thoughts So Far…

Let’s do a little recap on what we’ve seen so far:

  • Having interfaces for our classes is a nice way to introduce a layer of abstraction
  • Interfaces are just *one* tool to get layers of abstraction introduced
  • If you wanted to have interfaces for all of the classes in your code and some third party didn’t use interfaces, that code is likely not as common in your code base (especially if you wrap it like I mentioned above). This may not always be true in your code base, but it’s likely the case.
  • The amount of work to wrap things can vary greatly. Some things are straightforward to wrap, but you need to add many methods/properties. Sometimes it’s the inverse and you only have a few things to wrap but they’re not straightforward.
  • The number of classes you’d need to wrap to get to this state can vary greatly, since even built-in System classes aren’t all backed by interfaces!
  • There’s certainly a trade-off between the original work + maintenance to wrap a class in an interface versus the benefits it provides.

Is that last point blasphemy?! So there may actually be times we DON’T want to have an interface for a class?

Watch this space for part 3 where we start to look at a counter-example!

 


Should My Method Do This? Should My Class?

Whose Job Is It?

I wanted to share an experience I had working on a recent project. If you’ve been programming for a while, you’ve definitely heard of the single responsibility principle. If you’re new to programming, maybe this is news. The principle states:

That every class should have responsibility over a single part of the functionality provided by the software, and that responsibility should be entirely encapsulated by the class

You could extend this concept to apply to not only classes, but methods as well. Should you have that one method that is entirely responsible for creating a database connection, connecting to a web service, downloading data, updating the database, uploading some data, and then doing some user interface rendering? What would you even call that?!

The idea is really this: break down your code into separate pieces of functionality.
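Here’s a tiny sketch of what that can look like. All of the names are hypothetical and the “report” is deliberately trivial. The point is only that parsing, computing, and formatting each get their own method instead of living in one do-everything method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Each method has one job. All names here are hypothetical.
public static class ReportSteps
{
  // Job 1: turn raw text into numbers.
  public static List<int> Parse(string raw)
  {
    return raw.Split(',').Select(part => int.Parse(part.Trim())).ToList();
  }

  // Job 2: compute the result.
  public static int Total(IEnumerable<int> values)
  {
    return values.Sum();
  }

  // Job 3: format the result for display.
  public static string Format(int total)
  {
    return "Total: " + total;
  }
}

public static class ReportDemo
{
  public static void Main()
  {
    var values = ReportSteps.Parse("1, 2, 3");
    Console.WriteLine(ReportSteps.Format(ReportSteps.Total(values))); // prints "Total: 6"
  }
}
```

Each step is now easy to name, which is usually a good sign that it has a single job.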

Easier Said Than Done… Or Is It?

The idea seems easy, right? Then why is it that people keep writing code that doesn’t follow this guideline? I’m guessing it’s because even though it’s an easy rule, it’s even easier to just… code what works.

The recent experience I wanted to share was my work on a project that had a pretty short time frame to prove it was feasible. It was starting something from scratch, so I had all the flexibility in the world to design code however I wanted to. I really made an effort to keep asking myself this one question: Whose job is it?

Every time I asked that question and found that it was not my current method’s responsibility, I would ask “Is this class really responsible for that?” I’d either go make myself a new method in my class or I’d just go immediately make a new class with a single method on it. It seemed like a bit of extra overhead each time I had to do it, but was it worth it in the end?

Absolutely. After the project had proven itself and development continued on, I was easily able to refactor code (where necessary) and mock out functionality in my coded tests. Instead of trying to write test setup code that required a whack of classes I needed to initialize, I could mock out a couple of interfaces and test with ease. It was also really obvious which pieces were responsible for what functionality.
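To illustrate the testing benefit, here’s a minimal sketch with a hand-rolled fake instead of a mocking library. IDataSource, WordCounter, and FakeDataSource are all made-up names for this example and are not from the actual project.

```csharp
using System;

// Hypothetical interface for "the thing that fetches data".
public interface IDataSource
{
  string Fetch();
}

// The class under test depends only on the interface.
public sealed class WordCounter
{
  private readonly IDataSource _source;

  public WordCounter(IDataSource source)
  {
    _source = source;
  }

  public int CountWords()
  {
    var text = _source.Fetch();
    return text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;
  }
}

// Hand-rolled fake for tests: no web service or database required.
public sealed class FakeDataSource : IDataSource
{
  private readonly string _text;

  public FakeDataSource(string text)
  {
    _text = text;
  }

  public string Fetch()
  {
    return _text;
  }
}

public static class MockingDemo
{
  public static void Main()
  {
    var counter = new WordCounter(new FakeDataSource("whose job is it"));
    Console.WriteLine(counter.CountWords()); // prints 4
  }
}
```

Because WordCounter only does the counting, the test setup is one line. Fetching the text is somebody else’s job.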

Final Thoughts

If you want to get better at following the single responsibility principle, I think it starts with one question: Whose job is it? Try it out!


Yield! Reconsidering APIs with Collections

Yield: A Little Background

The yield keyword in C# is pretty cool. Used within an iterator, yield lets a function hand an item back to the caller along with control of execution, and then resume right where it left off on the next iteration. Neat, right? MSDN documentation lists these limitations surrounding the use of the yield keyword:

  • Unsafe blocks are not allowed.
  • Parameters to the method, operator, or accessor cannot be ref or out.
  • A yield return statement cannot be located anywhere inside a try-catch block. It can be located in a try block if the try block is followed by a finally block.
  • A yield break statement may be located in a try block or a catch block but not a finally block.
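Here’s a small sketch of that back-and-forth. The class and method names are made up for the example. It records who ran when, and it puts the yield return inside a try block that is followed only by a finally block, which is the allowed case from the list above.

```csharp
using System;
using System.Collections.Generic;

public static class YieldBasics
{
  public static readonly List<string> Log = new List<string>();

  public static IEnumerable<int> CountToThree()
  {
    try
    {
      for (var i = 1; i <= 3; i++)
      {
        Log.Add("yielding " + i);
        yield return i; // control goes back to the caller here
      }
    }
    finally
    {
      // allowed: the try block above is followed only by a finally block
      Log.Add("done");
    }
  }

  public static void Main()
  {
    foreach (var number in CountToThree())
    {
      Log.Add("caller got " + number);
    }

    // The log shows the iterator and the caller taking turns:
    // yielding 1, caller got 1, yielding 2, caller got 2, yielding 3, caller got 3, done
    Console.WriteLine(string.Join(", ", Log));
  }
}
```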

So what does this have to do with API specifications?

A whole lot really, especially if you’re dealing with collections. I personally haven’t been a big user of the yield keyword, but I’ve never really been forced to use it. After playing around with it for a bit, I saw a lot of potential. I’ve written before about what I think makes a good API. In my article, I was making a point to discuss two perspectives:

  • Who needs to implement your interface. You want it to be easy for them to implement.
  • Who needs to call your interface. You want it to be easy for them to use.

In my opinion, the IEnumerable<T> interface was a tricky thing to work with as a return value. You can essentially only iterate an IEnumerable, and at the time of calling a function, maybe that’s not what you want to do. The flip side is that for the person implementing the interface, IEnumerable<T> is a really easy interface to satisfy. However, the yield keyword has opened up some new doors.

In this article, I’d like to go over a couple of different approaches for an API and then explain why the yield keyword might be something you consider next time around. Disclaimer: I’m not claiming anything I’m about to present is the only way or the best way–I’m just sharing some of my own findings and perspective.

Interface For Returning Collections

The first type of API I’d like to look at is for returning collections. Based on my own API guidelines, I’d ideally choose an interface or class to return that provides a lot of information to the caller and that is also easy for the implementer of my interface to create. The List<T> class is a great choice:

  • It’s easy to construct
  • It’s built into the .NET framework
  • It provides many handy functions (all of the IList<T> functionality as well as things like AddRange(), or functions that support delegates)

My next choice might be to have a return type of IList<T>, which would provide a little less ease of use to the caller, but make it even easier for the implementer of the interface. They could return arrays of type T, since an array implements the IList<T> interface, or their own custom list implementation that doesn’t inherit from the List<T> class. The differences between using IList<T> and List<T> are arguably pretty small.
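As a quick sketch of that flexibility, here are two implementers of the same IList<T>-returning interface, one backed by an array and one by a List<T>. IItemProvider and both provider classes are hypothetical names. One caveat worth seeing for yourself: an array is fixed size, so a caller that tries to Add() to it through IList<T> gets a NotSupportedException.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface with an IList<T> return type.
public interface IItemProvider
{
  IList<int> GetItems();
}

public sealed class ArrayBackedProvider : IItemProvider
{
  public IList<int> GetItems()
  {
    return new[] { 1, 2, 3 }; // int[] converts implicitly to IList<int>
  }
}

public sealed class ListBackedProvider : IItemProvider
{
  public IList<int> GetItems()
  {
    return new List<int> { 1, 2, 3 };
  }
}

public static class IListDemo
{
  public static void Main()
  {
    var providers = new IItemProvider[] { new ArrayBackedProvider(), new ListBackedProvider() };
    foreach (var provider in providers)
    {
      var items = provider.GetItems();
      // Count and indexing work the same for both implementations
      Console.WriteLine(items.Count + " items, first is " + items[0]);
    }

    try
    {
      new ArrayBackedProvider().GetItems().Add(4);
    }
    catch (NotSupportedException)
    {
      Console.WriteLine("arrays are fixed size, so Add() is not supported");
    }
  }
}
```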

A third alternative, which I would have avoided in the past, is to return an IEnumerable<T>. My opinion used to be that this made the life of the interface implementer a bit easier compared to returning an IList<T>, but complicated the life of the caller for a couple of reasons:

  • The caller would have to use the results of the function in a foreach loop.
  • The caller would have to add the items to their own collection to be able to do much more with the items.

My naive implementations of being forced to return an IEnumerable<T> were… well… crap. I would have constructed a collection within the function, filled it up, and then returned it as an IEnumerable<T>. Then as the caller of my function, I’d have to re-enumerate the results (or add them to another collection):

public static IEnumerable<T> GetItems<T>()
{
  var collection = new List<T>();
  // add all the items to a collection
  return collection;
}

private static void Main()
{
  // copy the results into our own collection...
  var myCollection = new List<int>();
  myCollection.AddRange(GetItems<int>());
  // use myCollection...

  // or.....
  foreach (var item in GetItems<int>())
  {
    // use the items
  }
}

Seems like overkill to me with that implementation. However, we’ll examine how using yield can truly transform this into something… better. So to reiterate, a few potential implementations for an API involving collections might be:

  • Return a List<T> class
  • Return an IList<T> (or even an ICollection<T>) interface
  • Return an IEnumerable<T> interface

Constantly Creating Collections

My design decisions, in the past, were really driven by two guidelines:

  • Make it easy for the person implementing/extending the API
  • Make it easy for the person consuming the API

As I quickly illustrated in the first section, this meant that I would have a method where I would create a collection, fill it with items, and then return it. I could generally pick any concrete collection class and return it since I would usually pick a simple collection as the return type. Easy.

One thing that might be noticeable with this approach is that it looks pretty inefficient to keep creating new collections, filling them, and then returning them. I’ll illustrate with a simple example. We’ll create a class that has a method on it called GetItems(). As per my reasoning presented earlier, we’ll have this method return a List<T> instance, and to make this example easier to work with, we’ll pass in an IEnumerable<T> instance. For what it’s worth, the input to this function is really just for demonstration purposes here. We’re really focusing on how we’re creating our return value.

using System.Collections.Generic;

public class CreateNewListApi<T>
{
  public List<T> GetItems(IEnumerable<T> input)
  {
    var newCollection = new List<T>();

    foreach (var item in input)
    {
      newCollection.Add(item);
    }

    return newCollection;
  }
}

And now that we have our simple class we can mock up a little test for performance… Just how inefficient is creating new lists every time?

using System;
using System.Diagnostics;

internal class Program
{
  private static void Main(string[] args)
  {
    const int NUM_ITEMS = 100000000;
    var inputItems = new int[NUM_ITEMS];

    Console.WriteLine("API Creating New Collections");
    var api = new CreateNewListApi<int>();

    var watch = Stopwatch.StartNew();
    var results = api.GetItems(inputItems);

    foreach (var item in results)
    {
    }

    Console.WriteLine(watch.Elapsed);
    Console.WriteLine(Process.GetCurrentProcess().PrivateMemorySize64);
    Console.ReadLine();
  }
}

When I run this on my machine, I get an average of about 1.73 seconds. The memory printout I get when running is 1615908864 bytes. So is that slow? Is that a lot of memory usage? I think it’s pretty hard to say conclusively without being able to compare it against anything. So let’s keep this number in mind as we continue to investigate the alternatives.

Side Note: At this point, some readers may be saying “Well, if the input to our function was also a list (or if whatever our function has to work with was otherwise equivalent to our return value) then we wouldn’t have to go populate a new collection every time… We can just return the underlying collection!” And I would say you are absolutely correct. If your function has access to an instance of the same type as the return type, then you could always just return that instance. But what implications does this have? You’re now giving people access to your underlying internals, and they can go modify them as they please. So, if you need to control access to items being added or removed, then it might not make sense for you to expose your internal collections like this.
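Here’s a quick sketch of that risk. The Inventory class is hypothetical. The read-only alternative uses List<T>.AsReadOnly(), which is just one option I’m showing for contrast and not something the rest of this article relies on.

```csharp
using System;
using System.Collections.Generic;

public sealed class Inventory
{
  private readonly List<string> _items = new List<string> { "apple", "pear" };

  public int Count
  {
    get { return _items.Count; }
  }

  // Risky: hands out the real list, so callers can change our state.
  public List<string> GetItemsDirect()
  {
    return _items;
  }

  // Safer: a read-only view over the same list, with no copying.
  public IList<string> GetItemsReadOnly()
  {
    return _items.AsReadOnly();
  }
}

public static class ExposureDemo
{
  public static void Main()
  {
    var inventory = new Inventory();

    try
    {
      inventory.GetItemsReadOnly().Add("plum");
    }
    catch (NotSupportedException)
    {
      Console.WriteLine("read-only view rejected the change");
    }

    inventory.GetItemsDirect().Clear();
    Console.WriteLine(inventory.Count); // prints 0: the caller emptied our internals
  }
}
```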

Yield to Incoming API Alternatives

We’ve seen how my past implementations may have looked, so how might we tweak this? If we tweak our API a bit, we can make our method return an IEnumerable<T> instead. Let’s see what that might look like:

using System.Collections.Generic;

public class YieldingApi<T>
{
  public IEnumerable<T> GetItems(IEnumerable<T> input)
  {
    foreach (var item in input)
    {
      yield return item;
    }
  }
}

So in this API implementation, all we’ll be doing is iterating over some type of collection and then yielding each result. If we run it through the same type of test as our previous API implementation, what kind of results do we end up with?

using System;
using System.Diagnostics;

internal class Program
{
  private static void Main(string[] args)
  {
    const int NUM_ITEMS = 100000000;
    var inputItems = new int[NUM_ITEMS];

    Console.WriteLine("API Yielding");
    var api = new YieldingApi<int>();

    var watch = Stopwatch.StartNew();
    var results = api.GetItems(inputItems);

    foreach (var item in results)
    {
    }

    Console.WriteLine(watch.Elapsed);
    Console.WriteLine(Process.GetCurrentProcess().PrivateMemorySize64);
    Console.ReadLine();
  }
}

When I run this on my machine, I get an average of about 2.80 seconds. The memory printout I get when running is 449409024 bytes. How does this relate back to our first implementation? Well, it’s certainly slower. It takes about 1.62x as long to enumerate using the yield implementation as it did with the first API we created. However, yield also uses less than 1/3 (about 27.8%, actually) of the memory footprint when compared to the first implementation. Pretty cool results!

Side Note: So yield was a bit slower according to our results, but what happens if we print the elapsed time before we run that foreach loop? Well, on my machine it averages at about one millisecond. Now that’s fast, right?! The cool thing about using yield with the IEnumerable<T> interface is that the work is deferred. That is, not until the program goes to actually run the enumeration do we get our performance hit. Try it out! Try moving the time printout from after the foreach loop to before the foreach loop. Try sticking breakpoints in on the line that yields. You’ll see what I mean.
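If you’d rather see the deferral without a debugger, here’s a small sketch. It’s the same shape as the yielding API above, with a made-up Produced counter added so we can observe when the iterator body actually runs.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredDemo
{
  // Hypothetical counter, only here so we can see when the work happens.
  public static int Produced;

  public static IEnumerable<int> GetItems(IEnumerable<int> input)
  {
    foreach (var item in input)
    {
      Produced++;
      yield return item;
    }
  }

  public static void Main()
  {
    Produced = 0;

    var results = GetItems(new[] { 10, 20, 30 });
    Console.WriteLine(Produced); // prints 0: calling GetItems() ran none of its body

    foreach (var item in results)
    {
    }

    Console.WriteLine(Produced); // prints 3: the work happened during enumeration
  }
}
```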

Summary

In this article, I’ve explored two different ways of implementing an API (specifically focusing on the return value). We saw a brief performance analysis between the two and I highlighted some differences in both approaches. Let’s recap though:

  • Approach 1: Returning a List<T> and creating the collection ahead of time
    • Appeared to be overall a bit faster than yielding.
    • Consumed much more memory than yielding.
    • Callers can use the results immediately for enumeration, checking count, or as a collection to add more things to
    • The return type of List<T> is a bit more restrictive than an IEnumerable<T> like in the second API implementation
  • Approach 2: Return type of IEnumerable<T> and yielding results
    • Appeared to be overall a bit slower than the List<T> implementation
    • Lazy. We don’t actually execute any enumeration code until the caller actually enumerates
    • Consumed significantly less memory than the first approach using List<T>
    • Callers can enumerate the results immediately, but they need to add the results to a collection class to do much more than enumerate

So next time you’re designing an API for your interfaces and classes, try keeping these things in mind!

EDIT (December 30th, 2013):
As per some comments on Google+ by Dan Nemec, I figured I’d add a bit more here in the summary. IEnumerable<T> on its own is certainly not useless, especially if you’re leveraging LINQ or extension methods. My main beef in the past was that the consumer of an API with an IEnumerable<T> return value can only iterate over the results… And that’s just because that’s all that IEnumerable<T> lets you do. Dan made a great point though: if you are leveraging things like extension methods, or LINQ (which introduces tons of handy extension methods for working with IEnumerable<T>), then you get all of that functionality tacked on to IEnumerable<T>.

So if you’re not fortunate enough to be working with LINQ or extension methods (i.e. working with legacy code in old .NET framework versions… and yes I am familiar with the attribute you can add in to allow extension methods provided you have a compiler version high enough to support it), then IEnumerable<T> sometimes just plain sucks. I’d wager the majority of C# developers aren’t in this boat though, so I’d like to thank Dan again for his comments.
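To show what Dan’s point looks like in practice, here’s a minimal sketch using the standard LINQ methods Take() and ToList(). The Numbers() iterator and the Produced counter are made up for the example. Because the iterator is lazy, asking for only the first few items means most of the sequence never gets produced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqDemo
{
  // Hypothetical counter so we can see how much of the sequence was produced.
  public static int Produced;

  public static IEnumerable<int> Numbers()
  {
    for (var i = 1; i <= 1000; i++)
    {
      Produced++;
      yield return i;
    }
  }

  public static void Main()
  {
    Produced = 0;

    // Take(3) stops asking for items early, so the iterator never reaches 1000.
    var firstThree = Numbers().Take(3).ToList();
    Console.WriteLine(string.Join(", ", firstThree)); // prints "1, 2, 3"
    Console.WriteLine(Produced < 1000);               // prints True

    // ToList() hands the caller a real List<int>: Count, indexing, Add(), and so on.
    firstThree.Add(4);
    Console.WriteLine(firstThree.Count);              // prints 4
  }
}
```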


  • Nick Cosentino

    I work as a team lead of software engineering at Magnet Forensics (http://www.magnetforensics.com). I'm into powerlifting, bodybuilding, and blogging about leadership/development topics over at http://www.devleader.ca.

  • Copyright © 1996-2010 Dev Leader. All rights reserved.