When to Performance
One common piece of advice I've received (or read on Stack Overflow) in response to the question "how do I optimize xyz?" is: you should never optimize until you know that that particular piece of logic or code is the bottleneck.
This bugs me on a few different levels. Mainly, the asker's genuine interest in the subject gets brushed off with a common "motto," which defeats the point of understanding what you're working with.
My belief is that, while I agree you shouldn't optimize code that hasn't proven to be slow, you should understand the code you are writing and how it could affect performance. This is especially true if you're working for a business that is trying to deliver a product (even a minimum viable one) or meet internal goals.
I'll focus on C# for now, since that's what I've been working with for some time. The two biggest misuses I've found involve LINQ and tight loops.
Loops
Let's start with this basic test result: looping through an enumerable with a foreach is usually 10x slower than its equivalent for-loop using an index.
As a reminder, and to harp on the point above, this measures only the speed of the loop itself. Once you put real code inside the loop, the enumerator's overhead can quickly become the least of your worries. We also have to consider how the JIT compiler optimizes the loop, but we'll ignore that for now.
So when is this useful?
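var data = new List<int>(){...};

// foreach: every iteration goes through the list's enumerator
int total = 0;
foreach (var item in data)
{
    total += item;
}

// for: every iteration reads the list directly by index
int total2 = 0;
for (int i = 0; i < data.Count; ++i)
{
    total2 += data[i];
}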
In the example above, the loop body does almost nothing, so the enumerator's overhead is the slow part. This is the use case where the for-loop is significantly faster, but also more error-prone: it's easy to get the bounds wrong or index into the wrong collection.
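To see the gap on your own machine, here's a rough timing sketch, assuming .NET 5 or later (for top-level statements). The sample data and the helper names SumForeach, SumFor, and Time are my own for illustration, not from any library. A single Stopwatch measurement like this is noisy; a dedicated tool such as BenchmarkDotNet will give much more trustworthy numbers, and the ratio you see will depend on your runtime version and JIT, so I'm not claiming any particular result here.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical sample data: one million small integers in a List<int>.
List<int> data = Enumerable.Range(0, 1_000_000).Select(i => i % 100).ToList();

// Sum with foreach: goes through List<int>.Enumerator (MoveNext/Current per item).
int SumForeach(List<int> items)
{
    int total = 0;
    foreach (var item in items)
    {
        total += item;
    }
    return total;
}

// Sum with an index-based for-loop: direct indexer access.
int SumFor(List<int> items)
{
    int total = 0;
    for (int i = 0; i < items.Count; ++i)
    {
        total += items[i];
    }
    return total;
}

// Times a single call, after one warm-up call so JIT compilation is not measured.
(long Ticks, int Sum) Time(Func<List<int>, int> sum, List<int> items)
{
    sum(items); // warm-up
    var sw = Stopwatch.StartNew();
    int result = sum(items);
    sw.Stop();
    return (sw.ElapsedTicks, result);
}

var foreachResult = Time(SumForeach, data);
var forResult = Time(SumFor, data);
Console.WriteLine($"foreach: {foreachResult.Ticks} ticks (sum {foreachResult.Sum})");
Console.WriteLine($"for:     {forResult.Ticks} ticks (sum {forResult.Sum})");
```

Printing the sums is a quick sanity check that both loops do exactly the same work, so the only difference being timed is how the list is walked.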
I cannot stress enough that 99% of the time you'll want to keep using foreach loops. They're safer, plenty fast, and easier to read and write. However, it's important to understand their limitations, and the use cases where an index-based loop provides more value than risk. (The most common example is image processing, or other processing over arrays of primitives.)
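As an illustration of that primitive-array case, here's a minimal sketch of brightening an 8-bit grayscale image stored as a flat byte array. The Brighten helper and the image layout are hypothetical. Besides raw speed, the index matters for a practical reason: a plain foreach over an array can't write a pixel back, because its iteration variable is read-only.

```csharp
using System;

// Hypothetical helper: adds 'amount' to every pixel of an 8-bit grayscale
// image (one byte per pixel), clamping the result to the 0..255 range.
byte[] Brighten(byte[] pixels, int amount)
{
    // Index-based loop: the index lets us write each pixel back in place.
    for (int i = 0; i < pixels.Length; ++i)
    {
        int value = pixels[i] + amount;
        pixels[i] = (byte)Math.Clamp(value, 0, 255);
    }
    return pixels;
}

var image = new byte[] { 0, 100, 200, 250 };
Brighten(image, 10);
Console.WriteLine(string.Join(", ", image)); // 10, 110, 210, 255
```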
LINQ
With LINQ in C#, it's very easy to forget that query evaluation is deferred. As an example, take a look at the following code:
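var exp = myList.Where(x => SomeValidator(x)).Select(x => x.Val);
bool hasThree = exp.Any(x => x == 3); // first evaluation of Where/Select
var values = exp.ToList();            // second evaluation of Where/Select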
Anyone familiar with LINQ (or deferred execution in general) will quickly point out that this evaluates a potentially expensive query twice: once for the Any, and once for the ToList. Calling .ToList() or .ToArray() earlier in execution, and reusing the result, will often save computation later on. The JIT compiler is smart, but not smart enough to cache a query's results for you.
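Here's a minimal sketch of that fix applied to the snippet: materialize the query once with ToList, then run Any and anything else against the in-memory list. The tuple-based myList, the SomeValidator rule (keep positive values), and the sample values are hypothetical stand-ins, since the original snippet doesn't define them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins: each element has a Val, and the validator keeps positive values.
var myList = new List<(string Name, int Val)> { ("a", 1), ("b", 3), ("c", -2), ("d", 5) };
bool SomeValidator((string Name, int Val) x) => x.Val > 0;

// Materialize once: Where and Select run a single time, filling a List<int>.
List<int> values = myList.Where(x => SomeValidator(x)).Select(x => x.Val).ToList();

// Both of these work against the in-memory list; the query is not re-run.
bool hasThree = values.Any(x => x == 3);
int count = values.Count;

Console.WriteLine($"{hasThree}, {count}, [{string.Join(", ", values)}]"); // True, 3, [1, 3, 5]
```

The trade-off is memory: ToList allocates a list holding every result, which is usually worth it when you'll touch the results more than once.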
The underlying reason is that a LINQ query is just a description of work; nothing runs until you enumerate it. So each time you run an operation on exp, the .Where and .Select are evaluated again.
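One way to watch deferred execution happen is to count how many times the Where predicate runs. This sketch uses a hypothetical counting SomeValidator over a small array (odd numbers pass). Note that Any stops as soon as it finds a match, so the first pass may not touch every element, but the ToList call runs the whole pipeline again from the start.

```csharp
using System;
using System.Linq;

int validatorCalls = 0;

// Hypothetical validator that counts how often the Where predicate runs.
bool SomeValidator(int x)
{
    validatorCalls++;
    return x % 2 == 1;
}

int[] source = { 1, 2, 3, 4, 5 };
var exp = source.Where(x => SomeValidator(x)).Select(x => x * 10);

Console.WriteLine(validatorCalls); // 0: nothing has run yet (deferred)

bool hasThirty = exp.Any(x => x == 30); // runs the query until a match is found
Console.WriteLine(validatorCalls);      // 3: checked 1, 2, 3, then stopped

var list = exp.ToList();                // runs the whole query again
Console.WriteLine(validatorCalls);      // 8: 3 earlier calls + all 5 elements
```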
To Be Continued...
This is just the first of a short series I'm going to write about performance. I'd like to re-emphasize the importance of knowing how performance affects you and your application before going to great lengths to optimize, while still staying aware of the high-level ideas. A web application may not need quite as much optimization as a game :).
In future articles, I want to dig into classes versus structs (and with them, conversations around references, values, and byrefs), along with unsafe memory management and how it can help performance.
Either way, I believe it's a very important topic for any engineer or architect to understand.