Foreach Loops
I've mentioned before that foreach loops can be slow. This is largely because they must retrieve an enumerator, which for most sequences is a new object allocated on the heap, every time you iterate over a collection. I did some experimenting with this concept, ran some test code (below), and identified a way to improve the performance of areas that are driven by foreach loops.
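To see where that allocation comes from, here is a rough sketch of what the compiler turns a foreach over an iterator method into (the names and the exact shape are illustrative, not the compiler's literal output):

```csharp
using System;
using System.Collections.Generic;

class LoweringDemo
{
    static IEnumerable<int> Numbers()
    {
        yield return 1;
        yield return 2;
        yield return 3;
    }

    static void Main()
    {
        // foreach (var item in Numbers()) { total += item; }
        // is compiled into roughly the following. For an iterator
        // method, GetEnumerator() hands back a new heap-allocated
        // state-machine object on every call.
        int total = 0;
        using (IEnumerator<int> e = Numbers().GetEnumerator())
        {
            while (e.MoveNext())
            {
                total += e.Current;
            }
        }
        Console.WriteLine(total); // prints 6
    }
}
```

Run that loop millions of times and those enumerator objects add up, which is what the test below measures.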
The code
This is the code I used to test it:
using System;
using System.Collections.Generic;
using System.Diagnostics;

internal class SpeedTest
{
    private class TestClass
    {
        // Iterator method: every call to GetEnumerator() allocates a
        // new state-machine object on the heap.
        public IEnumerable<int> Yielder()
        {
            yield return 1;
            yield return 2;
            yield return 3;
            yield return 5;
            yield return 6;
        }

        // Callback variant: the caller supplies a delegate, so no
        // enumerator object is ever created.
        public void Callbacker(Action<int> callback)
        {
            callback(1);
            callback(2);
            callback(3);
            callback(5);
            callback(6);
        }
    }

    static void Main()
    {
        var sw = new Stopwatch();
        const int ITERATIONS = 10000000;
        int total = 0;
        var test = new TestClass();

        // Time the foreach/enumerator version.
        GC.Collect();
        sw.Start();
        for (int i = 0; i < ITERATIONS; ++i)
        {
            foreach (var item in test.Yielder())
            {
                total += item;
            }
        }
        sw.Stop();
        Console.WriteLine("Total {0}ms, taking {1}ms/each", sw.ElapsedMilliseconds, sw.ElapsedMilliseconds / (double)ITERATIONS);
        sw.Reset();
        Console.WriteLine(GC.GetTotalMemory(false));

        // Time the callback version.
        GC.Collect();
        sw.Start();
        for (int i = 0; i < ITERATIONS; ++i)
        {
            test.Callbacker(v =>
            {
                total += v;
            });
        }
        sw.Stop();
        Console.WriteLine("Total {0}ms, taking {1}ms/each", sw.ElapsedMilliseconds, sw.ElapsedMilliseconds / (double)ITERATIONS);
        sw.Reset();
        Console.WriteLine(GC.GetTotalMemory(false));
        GC.Collect();

        Console.WriteLine(total);
    }
}
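Worth noting (this is a general C# detail, not something the test above measures): the per-iteration allocation only bites when the enumerator lives on the heap. `List<T>` exposes a struct enumerator, so a foreach directly over a `List<T>` allocates nothing, while iterating the same list through its `IEnumerable<int>` interface boxes that struct on every call:

```csharp
using System;
using System.Collections.Generic;

class StructEnumeratorDemo
{
    static void Main()
    {
        var list = new List<int> { 1, 2, 3, 5, 6 };

        // foreach directly over List<T> binds to List<T>.Enumerator,
        // a struct, so no heap allocation happens per loop.
        int direct = 0;
        foreach (int n in list)
            direct += n;

        // The same list seen through IEnumerable<int> forces the
        // struct enumerator to be boxed onto the heap on each call
        // to GetEnumerator().
        IEnumerable<int> asInterface = list;
        int boxed = 0;
        foreach (int n in asInterface)
            boxed += n;

        Console.WriteLine(direct); // prints 17
        Console.WriteLine(boxed);  // prints 17
    }
}
```

So if you control the concrete type, looping over it directly can avoid the allocation without restructuring the code around callbacks.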
The Results
On Mono 3.0.x the callback approach was a significant improvement: the foreach loop took 6.0s on average, while the callback version took 2.9s. We also saw a small memory footprint improvement, with a handful of kilobytes less heap in use.
On the Windows runtime, the improvement was far larger. While I don't have the exact numbers at hand to quote, I saw roughly a 10x performance improvement, and a few megabytes of heap memory were spared. Profiling the test under Mono with a performance monitor shows that roughly 250MB of memory is allocated by the enumerator alone (over the course of several million iterations); I suspect Mono simply reclaims it in a more timely manner.
Disclaimer
As always, I will say there is a time and place for optimization. Foreach loops are easier to read, and I won't stop using them in areas of my code that are not performance critical (loading, scheduled tasks, etc.). What I am using this technique for is my tight loop of dynamic geometry rendering, which can easily be called thousands, if not tens of thousands, of times per second. From these changes alone, I see a 25% improvement in performance for critical lighting computations.