Guidelines to improve your software design skills with .NET (Part I)

Hi readers,

C# advent calendar

This year, I decided to participate in the C# advent calendar. You can find other posts for the C# advent calendar here.

I previously had the chance to take part in the F# advent calendar and was not aware that there was one for C# users. I feel this blog entry applies to multiple languages and paradigms, but today my code samples will only target C#. With that in mind, let us dive into the heart of the topic: software design and data structures.

In this article, we won’t be going over a small subject. To avoid making the blog entry longer than it should be, I will break it into multiple parts. In this first part, we will not go too deeply into data structures. That will be reserved for later. The focus of today is more on software design.

Care about data structures and algorithms

Software development goes beyond just knowing about programming language features and syntax. It also goes well beyond googling your problem, copy-pasting code, and hacking it together until it somehow works. As a software developer, you are responsible for every line of code that will be thought of, written, maintained, and deleted. Sometimes, a solution that just works will not cut it.

Data structures and algorithms are not an easy topic to comprehend. It takes time to understand them and build the experience to know when to best apply them. It might be frustrating in the beginning to get started and work out solutions with these tools. DO NOT FEEL DISCOURAGED! 

Walt Disney once said that “it is fun to do the impossible”. Take your time, take just one more step forward and try again. Rushing to understand something will do you no good. In fact, here is the time to slow down and absorb as much as you can. Learn as much as you can while you continue to improve.

“If you know the enemy and know yourself, you need not fear the result of a hundred battles. If you know yourself but not the enemy, for every victory gained you will also suffer a defeat. If you know neither the enemy nor yourself, you will succumb in every battle.” 

― Sun Tzu, The Art of War

By knowing yourself and your tools, you shall be capable of leading your software projects to success. At least, that is what I choose to understand from the quote. 🙂

Do not over-engineer a solution

There are times when the business logic is not made apparent in the documentation, and it becomes necessary to dive deep into the messy implementation to make some sense of it. In the code, important knowledge can be hidden or made implicit through multiple layers of abstraction and heavy use of object-oriented design patterns. This is an observation that I have made during my career.

There is a false sense of confidence in using overly complex components to implement your systems. Not only that, but the community leaders we look up to, the open-source tools we find on GitHub, and the references we recognize are embracing that complexity too. Complicated systems place a heavy cognitive load on software developers. You need to hold complex mental models in your mind to figure out whether something works as expected. That is a big ask if the code is not helping you perform that task.

I am a firm believer in not forcing a design. In the past few years, I have come to embrace functional programming, amongst other development tools. One thing that I enjoy about the paradigm is the fact that you need to separate behaviour from the data itself. 

These data structures can explicitly encapsulate small pieces of business logic. Since defining a type involves little ceremony in ML-like languages (e.g., F#), it is common to see a lot of types. Do not focus too much on what an ML programming language is. You can open a tab on the link I have provided and read on that later.

As I got more comfortable with F#, I noticed that I take the time to reflect and identify how a portion of the data should be shaped. Not only does the shape matter, but so do its characteristics and how it will be linked to other components. Because F# is a stricter language than C#, it forces me to spend more time thinking about and designing how the pieces will come together to achieve their goal(s). When I see a piece of data getting way too complex, I quickly break it down into smaller components so that going through the code is less taxing for the readers who need to understand the intent of the system/application.

This has a non-negligible impact on how you approach building software. This concept has helped me as I brought it back whenever I design software in C# or in another programming language. One thing that is also clear to me: exposing the relationships in the data will help you in numerous ways. The maintainer of the code will be capable of discerning the difference between building a relationship and actual business logic with more ease. Making the life of those who read your code better improves you as a developer. You will be demonstrating a trait of great developers: empathy.

Embrace simple solutions

Once your domain has been established, it becomes very fluid to let your data drive the design that you need to support your vision. At that moment, it becomes manageable to identify the data transformations that are mandatory to implement a working product.

While putting together your implementation, you will want to avoid buggy code. At any stage of your career, that is something that will be expected. It is impossible to think of every possible scenario, and this can lead to bugs. Even so, it is possible to keep most of those pesky problems out of our codebase without over-engineering a solution. A simple solution is a winning solution in my eyes.

In the software industry, we work in a fast-paced environment that is forever changing and somehow becomes more tortuous than ever before. The technical problems we face are rarely easy to solve. As developers gain seniority, it is easy to fall into the anti-pattern of believing that more complex solutions are better. I would argue that it takes energy and mastery in software development to make something complicated seem easy to understand.

A developer must understand the problem inside and out, and seek out logical pieces to help them build a working solution. A good solution does not come to life in the first implementation. It takes a dedicated, iterative approach that requires patience and care as the problem unfolds. Along the way, it also becomes clear how to extend the solution to meet ever-evolving requirements.

Figuring out your problem space

Initially, you should start by understanding the context as much as you can. If you can, ask clarifying questions to get you one step closer to solving the problem. After that, you need to reach into both your problem-solving abilities and your software toolbox. Try to see what you can solve by breaking the problem apart into smaller problems. By doing so, you can divide-and-conquer. 

This will also help to build your confidence as you are solving the matters at hand. Once you have reached the end of your solution, only then should you check for ways to improve. You can do this by doing the following:

  • Checking for edge-cases in your implementation.
  • Using advanced techniques such as a caching mechanism to reduce access to the disk.
  • Using better-suited data structures for your given problem. For example, there are many scenarios where a hash table is an excellent data structure to improve your solution.
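To make the last point concrete, here is a minimal sketch of a brute-force draft and its hash-based refinement. The problem (detecting a duplicate order id) and the names HasDuplicateBruteForce, HasDuplicateWithHashSet and orderIds are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;

var orderIds = new[] { 17, 4, 23, 4, 8 }; // Hypothetical sample data.

Console.WriteLine(HasDuplicateBruteForce(orderIds));  // True
Console.WriteLine(HasDuplicateWithHashSet(orderIds)); // True

// First draft: compare every pair of items. Simple, but the number of
// comparisons grows quadratically with the input size.
static bool HasDuplicateBruteForce(IReadOnlyList<int> items)
{
    for (var i = 0; i < items.Count; i++)
    {
        for (var j = i + 1; j < items.Count; j++)
        {
            if (items[i] == items[j])
            {
                return true;
            }
        }
    }

    return false;
}

// Refinement: remember what we have seen in a hash table, so each item
// is visited once.
static bool HasDuplicateWithHashSet(IEnumerable<int> items)
{
    var seen = new HashSet<int>();
    foreach (var item in items)
    {
        // HashSet<T>.Add returns false when the value is already present.
        if (!seen.Add(item))
        {
            return true;
        }
    }

    return false;
}
```

The second version trades a bit of extra memory for fewer comparisons, which is the kind of tradeoff to weigh once the first draft works.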

Whether you find yourself in an interview context, in an open-source project, or in a business context, gathering as much information as you can on the problem will help drive the solution. There are multiple factors that you need to be concerned about while implementing your solution. Here is a list:

  • You are not writing code for a machine, but a human. Trying to score cool points in your code can make it harder to read and understand. As more people are going to look at your code, you need to pay attention to make sure they can quickly grok what you are trying to achieve.
  • If you are not writing code for a hot path, then do not over-engineer a solution. Your piece may run once or twice in the life cycle of the application that you are building. If it is not a requirement to be fast, you do not need to make speed a top priority. That said, do not write poor-quality code just because it does not need to be fast.
  • Based on the input size and your user context, a brute-force solution can outperform an optimized solution. Figuring out how much data you need to handle or how fast you need to be will help you get a sense of what your code should look like.
  • Without any backing data, your code is just an opinion. To know how fast your code is in your context, you are going to need benchmarks. With these, you can compare competing solutions and choose what you need for your situation.
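As a rough illustration of the last two points, the sketch below times a plain linear scan against a HashSet<T> lookup on a tiny input. It relies on Stopwatch only to stay self-contained; it has no warm-up and no statistical analysis, so treat the numbers as a hint and reach for a real benchmarking library before drawing conclusions. The data and the names (smallInput, lookup, Iterations, Target) are assumptions made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

const int Iterations = 1_000_000; // How many lookups each approach performs.
const int Target = 81;            // The value we search for.

var smallInput = new[] { 3, 9, 27, 81 };
var lookup = new HashSet<int>(smallInput);

// Brute force: scan the array from the start on every lookup.
var linearHits = 0;
var linearTimer = Stopwatch.StartNew();
for (var i = 0; i < Iterations; i++)
{
    if (Array.IndexOf(smallInput, Target) >= 0)
    {
        linearHits++;
    }
}
linearTimer.Stop();

// "Optimized": hash the value and probe the set on every lookup.
var hashHits = 0;
var hashTimer = Stopwatch.StartNew();
for (var i = 0; i < Iterations; i++)
{
    if (lookup.Contains(Target))
    {
        hashHits++;
    }
}
hashTimer.Stop();

Console.WriteLine($"Linear scan: {linearTimer.Elapsed.TotalMilliseconds} ms ({linearHits} hits)");
Console.WriteLine($"HashSet:     {hashTimer.Elapsed.TotalMilliseconds} ms ({hashHits} hits)");
```

Which one wins depends on your machine, your runtime and your input size, and that is the point: measure in your own context.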

Start solving problems with a brute-force approach

You may feel amazing by trying to find the best solution as a first draft. Do not forget that as a software developer, like an author, you write better through editing and multiple iterations over your content. I sometimes make that mistake when I develop a system. As I am writing the code, I am also trying to refactor it to make it feel/look better. Start by writing the code completely, then you can focus on rewriting what feels off.

To that end, brute-forcing your way around the problem is a skill to acquire as quickly as you can. Other people may even call it a hack. It does not matter. It is actually a good thing, as long as you know how to make it better down the line.

Why consider writing a brute-force algorithm? 

Brute-force algorithms are a straightforward method of solving your problem through the sheer power of the machine executing them. They aren’t concerned with how quickly the problem is solved or how many resources are involved. You will find that you understand more about your problem once you have some working code to evaluate. It won’t matter that it might not look/feel good.

Aim to get something working for your situation. From there, you can start to see what the current gaps in the initial solution are. As your knowledge of data structures deepens, you will start seeing which one is better suited for a given problem space. Try to be as pragmatic as you can while maintaining high-quality code. This implies:

  • Making sure that the name of the function/method is easy to understand.
  • Documenting the portions of the algorithm that are harder to understand.
  • Making sure you have validated your arguments against edge-cases before manipulating them.
  • Having at least one test for your function/method. The more you have, the better it will be to validate that it works as expected.
  • Avoiding magic values in your implementations. Figure out a constant name that conveys the intent behind the value. Make sure others understand the context surrounding your code.
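Here is one way the checklist above could look on a brute-force first draft. It is only a sketch: the problem (finding two values that add up to a target) and the names FindPairWithSum, MinimumItemsForPair and prices are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// A pair needs at least two items; naming the value avoids a magic number.
const int MinimumItemsForPair = 2;

var prices = new[] { 12, 7, 30, 5 }; // Hypothetical sample data.
var pair = FindPairWithSum(prices, targetSum: 35);
Console.WriteLine(pair.HasValue
    ? $"{pair.Value.First} + {pair.Value.Second}"
    : "No pair found"); // 30 + 5

// Brute force: try every pair until one adds up to targetSum.
// Returns null when no such pair exists.
(int First, int Second)? FindPairWithSum(IReadOnlyList<int> values, int targetSum)
{
    // Validate the arguments before touching them.
    if (values is null)
    {
        throw new ArgumentNullException(nameof(values));
    }

    // Edge case: an empty or single-item input cannot contain a pair.
    if (values.Count < MinimumItemsForPair)
    {
        return null;
    }

    for (var i = 0; i < values.Count - 1; i++)
    {
        // Start at i + 1 so an item is never paired with itself.
        for (var j = i + 1; j < values.Count; j++)
        {
            if (values[i] + values[j] == targetSum)
            {
                return (values[i], values[j]);
            }
        }
    }

    return null;
}
```

It is not fast for large inputs, but it is easy to read, easy to test, and leaves an obvious place to swap in a better data structure later.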

Although flexible, List<T> will not always be the correct collection to use

Data structures are a core part of your career, and understanding them will help you improve continuously. You will always be expected to understand how things work under the hood, and to deeply understand the code and the systems that you are designing or maintaining. Focusing on that early on can make the difference between an algorithm that stays fast and one that slows to a crawl. As a developer, you want to avoid implementing systems that cannot handle larger inputs due to poor implementations.

Modern languages like C# make it painless for today’s developer to write poorly thought-out code. Somehow, it is not alarming to develop software with List<T> as the only collection holding the data in your program. Why is that a bad thing to do? That is an excellent question to ask, especially if your development environment is not alerting you that you could probably do better (more on that at the end of the post).

List<T>, or rather, dynamic arrays and arrays in general, have many different applications in the software world. In the wild, there are many more data structures than just arrays. Your software design skills would benefit from learning about the options that are available to you. Like anything in this world, there is no such thing as a free lunch. Every data structure that you encounter will have its strengths and shortcomings. It will be up to you, the software developer, to understand them thoroughly and know when to use them.

Without going into the specifics, let’s talk quickly about a situation where you would choose a dynamic array over an array. For instance, you are trying to build a todo list application. As the developer, you know it would be a bad design to limit the number of tasks that an end-user could want to organize in their planning. It would be a much better design to let them decide when to add and remove an item. With an array, you need to allocate a fixed length up front. If your user then tries to add one item more than that length allows, it could break your application. .NET makes these situations easier for developers with List<T>. You can keep adding items to that collection without concerning yourself with size limitations.
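A minimal sketch of that todo list scenario follows. The task names and the variables fixedTasks and tasks are made up for the example.

```csharp
using System;
using System.Collections.Generic;

// A fixed-length array: room for exactly three tasks.
var fixedTasks = new string[3];
fixedTasks[0] = "Buy groceries";
fixedTasks[1] = "Write blog post";
fixedTasks[2] = "Review pull request";

try
{
    // There is no fourth slot, so this throws at run time.
    fixedTasks[3] = "Call the bank";
}
catch (IndexOutOfRangeException)
{
    Console.WriteLine("The array is full; the task was rejected.");
}

// A List<T> grows on demand, so the user decides how many tasks they keep.
var tasks = new List<string> { "Buy groceries", "Write blog post", "Review pull request" };
tasks.Add("Call the bank");
tasks.Remove("Write blog post");

Console.WriteLine($"{tasks.Count} tasks remaining"); // 3 tasks remaining
```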

Capacity != size

There is something you should be careful about when you are using a dynamic array in C#. It is easy, too easy for my taste, to initialize an empty list and then carry on adding data to it until you are done. Do you truly understand what happens underneath while you’re adding data? Maybe not. A List<T> in C# is nothing magical. Behind the curtains, there is a fixed-size array. In current .NET implementations, a list created without an explicit capacity starts with an empty backing array, and the first item you add allocates room for four items. This means that your collection can take up to 4 items and stay quite fast. The issue starts when you hit the maximum capacity of your data collection.

The size of a data collection (Count, for a List<T>) is the number of items it currently holds. Its capacity is the number of items it can hold before it has to grow. A List<T> created with the parameterless constructor has a size of 0 items and, after the first Add, a capacity of 4 items. When you reach the capacity of your collection and you need to add just 1 more item, the list has to create a new backing array with double the previous capacity. In that situation, the backing array of 4 elements is replaced by a backing array of 8 elements.

Every item that was there before is copied over, and your new item is stored in the new backing array. This is important to understand: every time you reach the capacity of your collection while adding items, you allocate a new, larger array and pay for the copy, while the old array becomes garbage. If you know the capacity you need at the moment you initialize your collection, you are better off providing it.
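If you want to watch this happen on your own runtime, a small sketch like the one below prints the capacity each time it changes. The exact growth steps are an implementation detail of the runtime, so the code only reports what it observes; numbers, ItemsToAdd and lastCapacity are names picked for the example.

```csharp
using System;
using System.Collections.Generic;

const int ItemsToAdd = 9; // Enough items to trigger a few growth steps.

var numbers = new List<int>();
var lastCapacity = numbers.Capacity;
Console.WriteLine($"Count: {numbers.Count}, Capacity: {lastCapacity}");

for (var value = 1; value <= ItemsToAdd; value++)
{
    numbers.Add(value);

    // Capacity only changes when the backing array was replaced by a larger one.
    if (numbers.Capacity != lastCapacity)
    {
        Console.WriteLine(
            $"Adding item #{numbers.Count} grew the capacity from {lastCapacity} to {numbers.Capacity}");
        lastCapacity = numbers.Capacity;
    }
}
```

Each printed line corresponds to one allocation of a new backing array plus a copy of every existing item.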

Providing the capacity up front avoids a series of intermediate allocations and reserves the memory you need in a single step. For instance, say that you create a list out of an existing list, adding an item to the new list only if a certain condition is met. In that situation, you know an upper bound for the capacity of your new list: the size of the source collection. It may be that every item meets the requirement, or that just one falls short. Thinking about these situations will make a difference while manipulating your data.
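The filtering scenario above could be sketched as follows, assuming a hypothetical list of temperatures from which we keep the values above freezing. The names temperatures, aboveFreezing and FreezingPoint are illustrative only.

```csharp
using System;
using System.Collections.Generic;

const int FreezingPoint = 0; // Degrees Celsius.

var temperatures = new List<int> { -4, 12, 0, 25, -9, 18 };

// The filtered list can never hold more items than the source, so the
// source's Count is a safe capacity: the backing array is allocated once
// and never has to grow while we add items.
var aboveFreezing = new List<int>(temperatures.Count);

foreach (var temperature in temperatures)
{
    if (temperature > FreezingPoint)
    {
        aboveFreezing.Add(temperature);
    }
}

Console.WriteLine($"Count: {aboveFreezing.Count}, Capacity: {aboveFreezing.Capacity}");
// Count: 3, Capacity: 6
```

If only a few items usually pass the filter, the unused slots are wasted memory; calling TrimExcess() afterwards, or starting from a smaller estimate, are options worth measuring.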

Considering algorithm complexity and its input size

An algorithm is just a means to an end. According to Stanford, algorithms are the heart of computer science, and the subject has countless practical applications as well as intellectual depth. Techopedia says the following: “In its purest sense, an algorithm is a mathematical process to solve a problem using a finite number of steps. In the world of computers, an algorithm is the set of instructions that defines not just what needs to be done but how to do it.” As you have noticed, an algorithm is nothing magical; it is, in fact, a logical series of steps to follow to make something happen. You need to understand your problem space and the behaviour that you are trying to automate.

When we talk about the algorithm’s complexity, we are trying to assert and predict how fast the algorithm should run. In that context, we forget about our surrounding environments such as the programming language or hardware. Analyzing complexity also allows developers to quickly evaluate and compare competing solutions and determine which would fare best based on their given situation and tradeoffs.

One important aspect of complexity analysis is that it allows you to explain how your algorithm will react as your input size grows. As a software developer, you should have a clear idea of how your program will behave if your website got 100x more traffic. Could your server handle this kind of pressure with the current design?
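To get a feel for what "100x more" means, the sketch below counts the steps taken by a single pass over the input and by a pairwise comparison of every item, for two input sizes. The helper names CountLinearSteps and CountPairwiseSteps are hypothetical, and counting loop iterations is only a stand-in for real work.

```csharp
using System;

foreach (var size in new[] { 100, 10_000 })
{
    Console.WriteLine(
        $"n = {size}: linear = {CountLinearSteps(size)}, pairwise = {CountPairwiseSteps(size)}");
}
// n = 100: linear = 100, pairwise = 4950
// n = 10000: linear = 10000, pairwise = 49995000

// One step per item: the work grows in proportion to the input, O(n).
static long CountLinearSteps(int itemCount)
{
    long steps = 0;
    for (var i = 0; i < itemCount; i++)
    {
        steps++;
    }

    return steps;
}

// One step per pair of items: the work grows with the square of the input, O(n^2).
static long CountPairwiseSteps(int itemCount)
{
    long steps = 0;
    for (var i = 0; i < itemCount; i++)
    {
        for (var j = i + 1; j < itemCount; j++)
        {
            steps++;
        }
    }

    return steps;
}
```

With 100 times more input, the single pass does 100 times more work, while the pairwise approach does roughly 10,000 times more.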

There are several factors involved when you are trying to implement a solution to your problem. Being pragmatic with your time and resources is an important skill to gain in order to thrive in an Agile workplace. In the context of complexity analysis, we are looking for optimized solutions. 

When you practice your skills, focus on finding the best solution you can think of. When designing an algorithm, always keep in mind the constraints of your given problem space. For instance, knowing the smallest and largest size of your input will help drive a solution to your problem. If you are in a bit of a rush, implement a brute-force approach that meets your requirements and document how to refine your solution later.

Designing software with arrays and dynamic arrays (sequential data structures)

What is a sequential data structure? Any data structure that stores and retrieves items in a sequence. Arrays and dynamic arrays, the two covered in this section, are both sequential.

Arrays, for instance, are a simple data structure for storing a lot of similar items. They exist in pretty much all programming languages. They are in fact used as a foundation to create other data structures that you will use in your software development career.

One thing that you must be aware of with arrays is the fact that they have a fixed capacity. Once that capacity has been established, during the initialization of the collection, it cannot be changed. It might seem wasteful if you are not fully using the memory you have allocated, but it is a necessary evil.

var array = new int[10]; // This array has a capacity of ten items, and that capacity cannot be changed.

To try to fix this shortcoming, many programming languages, such as C#, Java and Python, offer a built-in dynamic array. Both the array and the dynamic array will allow you to access elements of your collection with random access. This means that you can provide an index value to access an item in your collection. The only thing to be careful of is to avoid providing a value outside of the bounds of your collection.

var array = new int[10]; // We have an array with a capacity of 10 items; every slot starts at 0.
array[1] = 2; array[2] = 4; // The first 3 of those 10 slots now hold 0, 2 and 4.
int valueOfArray = array[2]; // Here, valueOfArray is set to 4, which is the 3rd element of the array.
// Here we have a list, or rather, a dynamic array. We have set its initial capacity to 20 and it currently holds 5 elements.
// I say initial capacity due to the nature of the collection. If you add items until it holds 20 items, you are fine.
// If you add one more, the backing array of your list has to be doubled and the capacity of your list becomes 40.
var dynamicArray = new List<int>(20) { 0, 2, 6, -9, 1000 };
int valueOfDynamicArray = dynamicArray[4]; // Here, valueOfDynamicArray is set to 1000, which is the 5th element of the dynamic array.
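One way to guard against an out-of-bounds index is to check it before reading. The helper below, TryGetAt, is a hypothetical name and not part of .NET; it is a minimal sketch that works for both arrays and List<T> because both implement IReadOnlyList<T>.

```csharp
using System;
using System.Collections.Generic;

var readings = new List<int> { 0, 2, 6, -9, 1000 };

if (TryGetAt(readings, 4, out var fifth))
{
    Console.WriteLine(fifth); // 1000
}

if (!TryGetAt(readings, 5, out _))
{
    Console.WriteLine("Index 5 is out of bounds");
}

// Returns false instead of throwing when the index is not valid.
// Without such a check, an array indexer throws IndexOutOfRangeException
// and a List<T> indexer throws ArgumentOutOfRangeException.
static bool TryGetAt<T>(IReadOnlyList<T> items, int index, out T value)
{
    if (items is not null && index >= 0 && index < items.Count)
    {
        value = items[index];
        return true;
    }

    value = default!;
    return false;
}
```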

If your given situation does not force you to do one of the following in a specific way:

  • Accessing your items
  • Searching for your items
  • Inserting new items
  • Deleting existing items

You cannot go wrong when using either an array or a List<T>. 

Before moving to the next section, let’s quickly see common edge cases when using arrays:

  • An empty array
  • An array that contains either 1 or 2 elements
  • An array that contains duplicated elements
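To show how these edge cases shape an implementation, here is a small sketch that finds the second-largest distinct value of an array. The problem and the name FindSecondLargest are assumptions picked for illustration; the method returns null when the input has no second distinct value.

```csharp
using System;

Console.WriteLine(Describe(FindSecondLargest(new[] { 3, 9, 9, 1 }))); // 3
Console.WriteLine(Describe(FindSecondLargest(new int[0])));          // none (empty array)
Console.WriteLine(Describe(FindSecondLargest(new[] { 5 })));         // none (one element)
Console.WriteLine(Describe(FindSecondLargest(new[] { 5, 5 })));      // none (only duplicates)

static string Describe(int? value) => value?.ToString() ?? "none";

static int? FindSecondLargest(int[] values)
{
    if (values is null)
    {
        throw new ArgumentNullException(nameof(values));
    }

    int? largest = null;
    int? second = null;

    foreach (var value in values)
    {
        if (largest is null || value > largest)
        {
            // A new maximum pushes the previous one down to second place.
            second = largest;
            largest = value;
        }
        else if (value < largest && (second is null || value > second))
        {
            // Strictly smaller than the maximum, so duplicates of it are skipped.
            second = value;
        }
    }

    return second;
}
```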

Technical problems to practice sequential data structures

Recommendations for technical problem solving

  • You may be tempted to solve and memorize the answer to the technical problems you will face. Do not do this. The key is to understand your set of tools and see patterns emerge as you are solving more and more problems.
  • If you are currently training for a technical interview, you are going to need to become a fast problem solver. There will usually be some sort of time limit in which you need to solve at least one problem, if not more. I would strongly recommend practicing problem-solving at home with a timer.
  • Become more familiar with your language of choice. 
  • Learn to write clean and defensive code. Protect your implementation from common edge-cases surrounding your selected data structure(s) and algorithm(s).

Closing remarks

  • Software developers must understand everything related to the application that they are designing/maintaining.
  • While implementing your code, use empathy. The person fixing a bug ticket for a given repository could be yourself, but 3 years in the future. By then, you may not remember every little detail.
  • Try to figure out the simplest way to approach your problem and do not forget to document what it is that you are trying to achieve. I do not think there will be anyone saying that you are documenting too much as long as it remains relevant.
    • If you are adding two integers together, there is no need to add a comment for that.
  • Understand the data structures available to you thoroughly. Using them will help shape your algorithm as you are coming up with a solution.
  • Be sure to understand the tradeoffs in space and time while deciding which approach best suits a given problem. When in doubt, refer back to the requirements of your context. There will not be any silver bullets to save you. You have to pick your poison and live with it.
  • Do not over-engineer your feature/software architecture/system design. It can make future iterations on the software brittle and make it painful to provide any meaningful contributions.
  • An implementation without any benchmarks to accompany it is merely a developer’s opinion on its tangible performance. You cannot state that your code is fast without any backing facts. I would recommend using the BenchmarkDotNet library for developing your performance investigations.
  • When you are confronted with a problem, try to understand your problem space first before trying to come up with a solution. You are better off taking a few extra minutes to think of alternative approaches before picking your final decision. 
    • Never forget to identify your input size, constraints, and edge-cases.
    • Deconstruct the problem into its simplest form and build an initial solution around the simplest case before expanding.
    • Start solving your problem with a brute-force approach. Once you have something that works, then you can think about making it fast. Also be aware of how long you can keep the brute-force approach before it is no longer viable.
    • Understand the space and time complexities of alternative solutions to improve your coding solutions.
    • You are writing code for humans, not machines. An elegant solution might shave off a few extra milliseconds but is that gain worth the potential loss in code readability? Never underestimate that.
  • Outside the Agile methodology, when we are discussing solutions, we are trying to find the optimized solution for a given problem space. When many factors and considerations need to be taken into account, by all means, do. For instance, does it make sense to spend an extra month implementing or refactoring code when we are rushing to be the first application on the market?
  • Sequential data structures have their usefulness, but do not underestimate the value of other data structures for your design. Reaching for a list does not make sense when one requirement is to retain unique values. A HashSet<T> would be better here. It might feel less readable to you somehow, but developers come to expect these tools in a codebase. It describes your intent more clearly than a comment would.
  • It is possible to be guided by our development environment when it comes to using data structures. Earlier this year, I co-authored a Roslyn library that performs code analysis on your usage of LINQ APIs and data structures from System.Collections. It is currently in a prototype phase, but I think you should get a pretty good idea of what you can improve by using it in your C# projects. You can find it here.
  • Software development is an extremely satisfying and fun adventure. Remember what Walt Disney said about the impossible!
  • When you start to understand how data structures work and their intricacies, you will have already won many of the battles you will face in the future. Remember what Sun Tzu said about knowing yourself.

For all those who have stayed until the very end, congrats to you! This was the biggest post I have ever written on my blog up until now! Thank you for reading and I wish you a happy holiday season!

Stay safe!

Kevin
