Tuesday, April 12, 2016

Be mindful of Dictionaries keyed with enums. Also, the power of per-frame GC analysis!

If you’re going to use a Dictionary keyed with an enum in Unity, or any value type for that matter*, ensure you provide a custom (ideally static) IEqualityComparer instance for it to use, instead of letting it use the default implementation. I’m not 100% sure about IL2CPP, but in Mono the default comparer (when used with value types) boxes the key, causing a separate heap allocation every time you call ContainsKey or use the Dictionary indexer (you should probably also ask yourself why you’re using ContainsKey/get_Item instead of TryGetValue). The size of each BoehmGC heap allocation, not factoring in things like alignment, is (sizeof(void*) * 2) + sizeof(enum-underlying-type); our underlying type was a byte, so on 32-bit machines each allocation was 9 bytes.
* With structs as keys, you have no choice in AOT environments like iOS.
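To make the problem concrete, here's a minimal sketch (the enum and names are hypothetical, not from our codebase). Under the default comparer on older Mono, every lookup boxes the enum key onto the GC heap, and the common ContainsKey-then-indexer pattern pays that cost twice where a single TryGetValue would do:

```csharp
using System.Collections.Generic;

enum Suit : byte { Hearts, Diamonds, Clubs, Spades }

static class LookupExample
{
    // No comparer supplied, so EqualityComparer<Suit>.Default is used; in
    // older Mono that falls back to a comparer which boxes the key on every
    // hash/equality check.
    static readonly Dictionary<Suit, int> s_counts = new Dictionary<Suit, int>();

    static int CountFor(Suit suit)
    {
        if (s_counts.ContainsKey(suit)) // first lookup: one boxed key
            return s_counts[suit];      // second lookup: another boxed key
        return 0;
    }

    static int CountForBetter(Suit suit)
    {
        int count;
        // A single lookup instead of two; still allocates under the default
        // comparer, but half as often.
        s_counts.TryGetValue(suit, out count);
        return count;
    }
}
```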
At my day job, on an unannounced project, we had a particular enum describing a set of hard-coded things (let's call the enum HardCodedThing). A certain set of our source game data (let's call the class ProtoSomething) had two dictionaries keyed by this enum for various reasons. In our game's sim we have a method, we'll call it UpdateSomethings(), which does a lot of HardCodedThing indexing. This code runs when a sim is deserialized or when some other state is derived from the sim.

For the past week or so, I've been adding per-frame memory analysis to the custom memory tools that I worked with Rich Geldreich to implement. I marked two custom events (strings which the game runtime sends to the memory profiler at specific points for later reference) as the start and end frames (or rather, the frames in which the events took place). I was interested, well, worried, as to why we had so many GC allocations in general when just idling in the game. It turns out our sanity checks for Animator components having parameters before trying to set them were generating garbage (detailed later). It also just so happened that the sim-state derivation I mentioned earlier was taking place shortly before the last marked custom event, so heap allocations for the HardCodedThing enum were bubbling up.

The fix to avoid all these allocations was to simply add a comparer that is then passed into our Dictionary<HardCodedThing, TValue> instances. The implementation, using the disguised typenames used in this article, can be found here.
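In case you can't follow the link, here's a minimal sketch of what such a comparer looks like (the class and member names are illustrative; only HardCodedThing comes from this article):

```csharp
using System.Collections.Generic;

enum HardCodedThing : byte { /* disguised members */ }

// Allocation-free comparer for HardCodedThing keys. A single shared instance
// is passed to every Dictionary<HardCodedThing, TValue> we construct.
sealed class HardCodedThingComparer : IEqualityComparer<HardCodedThing>
{
    public static readonly HardCodedThingComparer Instance =
        new HardCodedThingComparer();

    public bool Equals(HardCodedThing x, HardCodedThing y)
    {
        return x == y; // direct value comparison, no boxing
    }

    public int GetHashCode(HardCodedThing value)
    {
        return (int)value; // hash on the underlying byte value, no boxing
    }
}

// Usage: new Dictionary<HardCodedThing, int>(HardCodedThingComparer.Instance);
```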

Adding the per-frame analysis utilities to our tool has been EXTREMELY helpful (I found some other GC-heavy APIs specific to Unity that I'm sure we're not the only ones ignorant of...I should blog about them soon). We're no longer stuck scratching our heads over what's making the stock Unity Profiler show "GC Allocations per Frame: 9999 / 12 KB" or some similarly vague figure. We really don't even bother using Unity's memory profiler, the reason being that it's just a snapshot, and snapshots have limited use. It's also far easier and quicker to make a Standalone build than a device build, since their tool is IL2CPP-only. We instead have a transaction log of all Boehm/Mono memory-related operations from the start of the process until the very end. Coupled with our custom events setup, we can create near-deterministic transaction logs over multiple runs. In theory, we could probably even set up a BVT to ensure we're not leaking managed memory or churning GC memory more than we already expect to.

And when/if we do, we have a boatload of information at our disposal to diagnose problems. Part of the per-frame analysis was to first spit out a CSV file mapping the types seen across a set of frames to how many allocations and frees they were associated with, along with their average garbage (if alloc:free is near 1:1, you have garbage-heavy code using that type somewhere!). You can find an excerpt of the CSV report here. I mentioned earlier that we had code sanity-checking Animators for parameters before trying to set them, and here you can see a glimpse of how those checks were showing up in our reports.
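For context, that sanity check was shaped something like the sketch below (the helper is hypothetical). Assuming the garbage came from Unity's Animator.parameters property, which hands back a freshly allocated AnimatorControllerParameter[] on every access, a check like this allocates in proportion to the controller's parameter count every time it's called:

```csharp
using UnityEngine;

static class AnimatorSanityChecks
{
    // Verify a parameter exists before trying to set it.
    public static bool HasParameter(Animator animator, string name)
    {
        var parameters = animator.parameters; // allocates a new array each call
        for (int i = 0; i < parameters.Length; i++)
        {
            if (parameters[i].name == name)
                return true;
        }
        return false;
    }
}
```

Caching the parameter names (or their hashes) once per controller avoids paying that allocation on every check.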

With this broad report in hand, the next thing I added was a 'blame list' report. The blame list details, per type, the sets of backtraces which are spawning these allocations. We then sort and break down these backtraces by their allocation counts, so the biggest garbage offenders are at the top. You can find an excerpt of the blame list report here.

Perf matters. Memory usage matters. Tools matter. While the studio I work at probably won't open-source our memory tool's full pipeline anytime soon for various reasons (although our Mono modifications can be found here), I'm hoping to use this blog to speak publicly about good and bad patterns we learn from experience and verify using the tools we've developed.
