Monday, December 14, 2015

Seeing Sharp C++: Behind the IL2CPP magic of Metadata and Statics

To the average C# programmer, how the .NET runtime glues together reflection in the final code isn't something they really need to be familiar with. It Just Works. However, we're Unity developers using IL2CPP for our .NET runtime, so we're not the "average C# programmer". When it comes to buffing out every dip in our game's runtime performance, knowing the devil in the details is a requirement. So today we're going to look at how (as of Unity 5.3.0f4) the use of metadata/reflection and class statics may be poking at your C#-turned-C++ code with a fiery pitchfork.

Metadata and Reflection in IL2CPP

Since IL2CPP is not a VM interpreting generalized bytecode but instead produces compiled native code, it has to generate code that ensures metadata is prepared for use before it could possibly be needed. This is done via a sort of 'metadata initialization' prologue that only executes the very first time the method is run. Take for example the following IL2CPP code gen:


// C#: { (new SomeValueType(0)).GetType(); }

// Notice the extern on this TypeInfo*. There's only one TypeInfo* for every class (Int32, StringBuilder, etc)
extern TypeInfo* SomeValueType_t_il2cpp_TypeInfo_var;
extern "C" void IL2CPP_Method (Object_t * __this /* static, unused */, const MethodInfo* method)
{
 static bool s_Il2CppMethodIntialized;
 if (!s_Il2CppMethodIntialized)
 {
  // Will either get the cached TypeInfo*, or if this is the first time the type
  // is used, perform any lookups/etc needed to populate an internal cache
  SomeValueType_t_il2cpp_TypeInfo_var = il2cpp_codegen_type_info_from_index(910);
  s_Il2CppMethodIntialized = true;
 }
 Type_t * V_0 = {0};
 {
  SomeValueType_t  L_0 = {0};
  SomeValueType__ctor(&L_0, 0, /*hidden argument*/NULL);
  SomeValueType_t  L_1 = L_0;
  // GetType() is a method on the Object class. The struct
  // 'SomeValueType', having no reflection bits, needs to be boxed into an object
  // with the proper Il2CppObject header which has a TypeInfo* to the reflection 
  // bits needed for 'SomeValueType'
  Object_t * L_2 = Box(SomeValueType_t_il2cpp_TypeInfo_var, &L_1);
  NullCheck(L_2);
  Type_t * L_3 = Object_GetType(L_2, /*hidden argument*/NULL);
  ...

Here, all the C# code did was instantiate a struct named 'SomeValueType' and then call 'GetType()' on that instance. But the native transformation of that C# must ensure the TypeInfo for the struct is prepared by the runtime.
IL2CPP optimization idea: It should be possible for IL2CPP to detect cases where GetType() is called on structs or sealed classes. In these cases the code generator could instead output the same code it emits for the typeof() operator (I'll leave this as an exercise for the reader to investigate). In the case of structs this would avoid the extra memory allocation caused by boxing just to call Object.GetType.
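To make the cost difference concrete, here is a minimal C++ sketch of the two paths. It is not IL2CPP's actual code: `TypeInfo`, `BoxedSomeValueType`, `box`, `get_type_via_box`, `get_type_via_typeof` and the allocation counter are all hypothetical stand-ins, assumed only for illustration.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for runtime types; NOT the real IL2CPP definitions.
struct TypeInfo { const char* name; };
struct SomeValueType { int value; };

// One TypeInfo per type, mirroring the extern TypeInfo* in the generated code.
static TypeInfo g_someValueTypeInfo = { "SomeValueType" };

// Counts heap allocations made by boxing so the cost is visible.
static int g_boxAllocations = 0;

// A boxed value: an object header (the TypeInfo pointer) plus a copy of the struct.
struct BoxedSomeValueType {
    const TypeInfo* klass;
    SomeValueType value;
};

// Mimics Box(): allocates a heap object and copies the struct into it.
std::unique_ptr<BoxedSomeValueType> box(const TypeInfo* klass, const SomeValueType& v) {
    ++g_boxAllocations;
    return std::unique_ptr<BoxedSomeValueType>(new BoxedSomeValueType{ klass, v });
}

// Mimics value.GetType(): the struct has no header, so it is boxed first
// and the type is then read back out of the temporary object's header.
const TypeInfo* get_type_via_box(const SomeValueType& v) {
    std::unique_ptr<BoxedSomeValueType> boxed = box(&g_someValueTypeInfo, v);
    return boxed->klass;
}

// Mimics typeof(SomeValueType): the type is known statically, so no object is needed.
const TypeInfo* get_type_via_typeof() {
    return &g_someValueTypeInfo;
}
```

Both functions hand back the same TypeInfo, but only the first one pays for an allocation. In a managed runtime that allocation would also become garbage for the GC to clean up later.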
You may be thinking to yourself "My code doesn't make use of reflection so none of this magic should apply". Well, you could be right but also wrong. There's another case where you will see this prologue and more magic: statics and static constructors.

Class Statics

Static constructors (abbreviated as cctors; they're similar to 'dynamic initializers' in C++) are either written by hand or generated for you when you define static members with inline initialization (e.g., static int kMaxIterations = 4).

Unlike C++, these static values don't have a final memory location in a .bss or .data section of the executable (although it should be theoretically possible within IL2CPP). A buffer big enough to contain all the static fields is instead allocated in managed memory and then tracked via the class' TypeInfo. With IL2CPP this 'buffer' is actually a codegen'd C++ struct (so you can inspect it if you desire). This has the added benefit that static data is lazy loaded.
Statics background: This lazy loading behavior is part of the C# specification. However, struct cctors [Section 18.3.10] have slightly different requirements than class cctors [Section 17.11].
So if no code in your program ever used DateTime, then its statics should never be allocated and initialized. However, the drawback is that the runtime must ensure the TypeInfo is initialized (which is also where the statics buffer reference is kept) and that the cctor has run. With IL2CPP this is implemented by injecting metadata initialization prologues and IL2CPP_RUNTIME_CLASS_INIT macro calls.
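Conceptually, the lazy allocation plus 'run the cctor once' check looks something like the sketch below. This is a simplified illustration, not the real runtime: `TypeInfo`, `ExampleStaticFields`, `Example_cctor`, `runtime_class_init` and `read_kMaxIterations` are hypothetical names, and the sketch ignores thread safety and recursive initialization.

```cpp
#include <cassert>
#include <memory>

// Hypothetical statics buffer for a class declaring: static int kMaxIterations = 4;
struct ExampleStaticFields { int kMaxIterations; };

// Hypothetical, heavily simplified TypeInfo (not the real IL2CPP layout).
struct TypeInfo {
    bool cctorFinished = false;
    std::unique_ptr<ExampleStaticFields> staticFields; // allocated lazily
    int cctorRuns = 0;                                 // bookkeeping for the demo only
};

static TypeInfo g_exampleTypeInfo;

// Stand-in for the static constructor generated from the inline initializer.
void Example_cctor(TypeInfo& klass) {
    klass.staticFields->kMaxIterations = 4;
    ++klass.cctorRuns;
}

// Stand-in for the class-init check: allocate the statics buffer and run the
// cctor the first time through, then do nothing on every later call.
void runtime_class_init(TypeInfo& klass) {
    if (klass.cctorFinished) return;
    klass.staticFields.reset(new ExampleStaticFields{});
    Example_cctor(klass);
    klass.cctorFinished = true;
}

// Every read of the static field is preceded by the check.
int read_kMaxIterations() {
    runtime_class_init(g_exampleTypeInfo);
    return g_exampleTypeInfo.staticFields->kMaxIterations;
}
```

The statics buffer does not exist until the first read, and the cctor runs exactly once. The price is that the check sits in front of every access.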

We can refer to the IL2CPP code for Runtime::ClassInit to see documentation on when they intend to insert instances of the macro previously mentioned:


// From vm/Runtime.cpp:

// We currently call Runtime::ClassInit in 4 places:
// 1. Just after we allocate storage for a new object (Object::NewAllocSpecific)
// 2. Just before reading any static field
// 3. Just before calling any static method
// 4. Just before calling class instance constructor from a derived class instance constructor
void Runtime::ClassInit (TypeInfo *klass);

Now, let's see this documentation in action!

ValueTypeWithCctor is a simple C# struct with a static field (and thus a cctor). ValueTypeWithoutCctor is the same struct, except that the static field is replaced by a property that returns the exact same value. The IL2CPP transformations of both can be found here and here, respectively.

Again, the only difference between the two structs is the static 'One' member. In ValueTypeWithCctor it is a static field, and in ValueTypeWithoutCctor it is a static property that returns the result of the only user defined constructor. They both have a user defined addition operator that calls the static Add() method. While the logic for Add() doesn't make use of any static fields it still results in ValueTypeWithCctor::operator+ having a metadata initialization prologue and RUNTIME_CLASS_INIT call (keeping in line with the C# specs and bullet number 3 in the documentation above).

I then have two very similar methods (only differ in which struct they use) that exercise the user defined ctor, One static, operator+, and Add() method: Il2Cpp_ValueTypeWithCctor() and Il2Cpp_ValueTypeWithoutCctor().

In Il2Cpp_ValueTypeWithCctor's C++ you can see that it has a metadata initialization prologue and then a call to RUNTIME_CLASS_INIT. It requires this in order to access the 'One' static field (bullet #2). Even if you removed the access to 'One' (changing it to 'v1 = v0'), it would still have both calls (again, due to bullet #3, "Just before calling any static method"), although RUNTIME_CLASS_INIT would then occur just before the call to operator+. However, recall that operator+ already does this itself! Ouch, code bloat.

Now in Il2Cpp_ValueTypeWithoutCctor's C++ you can see there's no prologue or RUNTIME_CLASS_INIT call. In fact, if we trace the call to ValueTypeWithoutCctor::get_One we can see a likely candidate for code inlining! For simple value types, replacing static fields with getter properties can not only reduce code size at call sites, but can also trade potential cache misses for inline code. We no longer have to hit the TypeInfo (or the per-method init check) and then reach into the statics allocation just to copy the value of 'One'.
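The shape of the two access paths can be sketched in plain C++. The names below (`Value`, `TypeInfo`, `class_init`, `WithCctor_get_One`, `WithoutCctor_get_One`) are hypothetical and only approximate what the generated code does. The counter exists purely to show where the init check runs.

```cpp
#include <cassert>

struct Value { int x; };

// Hypothetical statics buffer and TypeInfo for the static-field flavor.
struct WithCctorStatics { Value One; };
struct TypeInfo { bool cctorFinished; WithCctorStatics* statics; };

static WithCctorStatics g_storage;
static TypeInfo g_withCctorInfo = { false, &g_storage };
static int g_initChecks = 0; // how many times a call site paid for the check

// Stand-in for the class-init check placed in front of static accesses.
inline void class_init(TypeInfo& klass) {
    ++g_initChecks;
    if (!klass.cctorFinished) {
        klass.statics->One = Value{1}; // body of the cctor
        klass.cctorFinished = true;
    }
}

// Static field flavor: check, follow TypeInfo to the statics buffer, copy the value.
Value WithCctor_get_One() {
    class_init(g_withCctorInfo);
    return g_withCctorInfo.statics->One;
}

// Getter property flavor: no metadata involved, trivially inlinable.
inline Value WithoutCctor_get_One() {
    return Value{1};
}
```

Both return the same value, but the getter flavor never touches `g_withCctorInfo`, so the C++ compiler is free to fold it into a constant at the call site.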
IL2CPP optimization idea: It could be beneficial to expose IL2CPP specific attributes, just like Il2CppSetOption, that say a given type should always be initialized at runtime so that the prologue and IL2CPP_RUNTIME_CLASS_INIT can be excluded from call sites.

Final Thoughts

Walking away from this post, keep these simple things in mind:
  • Metadata/Reflection are not free lunches. In fact, they could be leading to obese code where you least expect it!
  • Calling GetType() on a concrete type results in a call to Object.GetType(), which in the case of structs requires the value to be boxed to an object (needless memory allocation!).
  • Be mindful of how you use static fields and constructors in structs, as they too can be bloating your code. This is especially the case if you came from a C++ background, as cctors are not like the 'dynamic initializers' you're used to.
I didn't investigate bullets #1 and #4 from the Runtime::ClassInit code, primarily because I wasn't really focusing on reference types in this article. However, with #1 you can 'rest' assured that for every first instance of "var = new T()" in a method, there's a RUNTIME_CLASS_INIT call inserted immediately before it. Whether this really impacts your core game logic is an exercise for you to run.

I should also note that string literals ("Hello World!", etc.) are considered metadata too, so using them will result in similar prologues. However, their initialization isn't applied as liberally to the call sites of the method that actually uses them as it is for types containing static fields. I'd say the main time they'd be a concern is when you throw argument exceptions (naming the problem arg in a string) in core game logic. But if you're doing such sanity checks in release builds of performance critical code, you probably have bigger issues at hand than how many native instructions your method spills out to.

Bonus Round: IL2CPP Ideas

In the future of IL2CPP, I can see a handful of interesting optimizations that could be done: C++-like statics, constexpr mimicking, and (while not entirely related to this article) bitfields, to name a few. All of these should be doable without upsetting the Mono environment which our Editor builds still need to run in.

C++-like statics: Some performance critical types could have all or maybe just some statics marked as 'permanent' statics. By this I mean their storage exists in a .data section of the object code, instead of being allocated in strict managed memory (along with all the compiler generated checks we saw above). If these statics are pointerless (not containing managed references), then they should not even need to be registered with the GC. If they are pointerful (do contain managed references), it should just be a matter of instead creating a .roots section and using the GC's APIs for marking foreign memory ranges as candidates for root scanning.
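As a rough sketch of that idea in C++: pointerless statics are just plain globals, while pointerful ones are grouped so their address range can be handed to the collector as roots. Everything here is assumed for illustration, including `Object`, `PointerfulStatics`, `register_root_range` (a stand-in for whatever root-registration API the GC exposes) and `register_permanent_statics`.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Object; // opaque managed object, never dereferenced in this sketch

// Pointerless 'permanent' static: plain data in the executable's data
// segment. The GC never needs to know about it.
static int s_maxIterations = 4;
int max_iterations() { return s_maxIterations; }

// Pointerful 'permanent' statics: grouped into one struct so a single
// contiguous range can be registered for root scanning.
struct PointerfulStatics {
    Object* cachedInstance;
    Object* defaultName;
};
static PointerfulStatics s_pointerfulStatics = { nullptr, nullptr };

// Hypothetical root registry; a real build would call the GC's own API here.
static std::vector<std::pair<const void*, const void*>> g_rootRanges;
void register_root_range(const void* begin, const void* end) {
    g_rootRanges.emplace_back(begin, end);
}

// Would run once at startup, instead of a class-init check at every access.
void register_permanent_statics() {
    register_root_range(&s_pointerfulStatics, &s_pointerfulStatics + 1);
}
```

The trade is one registration at startup in exchange for dropping the per-access checks, at the cost of losing the lazy initialization described earlier.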

Constexpr - C++11 introduced constant expressions and I believe with IL2CPP we could have constant expressions for our C# too! You could have types with static data which are executed as part of the IL2CPP build step to generate readonly data. Imagine we could apply a (made up) ConstExprAttribute to Vector3.up (a static property getter). IL2CPP could sandbox the UnityEngine assembly or some driver and use the results of Vector3.up to initialize a C++-like static const field with its value (or as an actual constexpr if the C++ compiler used supports it!). It should even be possible to elevate this to arrays of simple/pointerless structs (e.g., an integer lookup table). Under the hood it could just use a std::array for storage, translating .Length and indexers to the matching C++ constructs (some care has to be taken here, since C++ would be using the unsigned size_t for the array length while Length is signed in C#).
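Here is a small sketch of what such generated C++ might look like. Nothing below is real IL2CPP output: `kVector3Up`, `kPowersOfTen`, `array_length` and `get_element` are made-up names, and the range check in `get_element` only approximates how a C# indexer behaves.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

struct Vector3 { float x, y, z; };

// What a (made up) [ConstExpr] on Vector3.up could lower to: the build step
// evaluates the getter once and bakes the result into read-only data.
constexpr Vector3 kVector3Up = { 0.0f, 1.0f, 0.0f };
static_assert(kVector3Up.y == 1.0f, "value is known at compile time");

// A pointerless lookup table stored in a std::array.
constexpr std::array<std::int32_t, 4> kPowersOfTen = {{ 1, 10, 100, 1000 }};

// C#'s Length is a signed 32-bit int while size() returns the unsigned
// size_t, so the translation needs an explicit narrowing conversion.
constexpr std::int32_t array_length() {
    return static_cast<std::int32_t>(kPowersOfTen.size());
}

// Mimics the C# indexer: a signed index with a range check before the access.
inline std::int32_t get_element(std::int32_t index) {
    if (index < 0 || index >= array_length()) {
        throw std::out_of_range("index outside the bounds of the array");
    }
    return kPowersOfTen[static_cast<std::size_t>(index)];
}
```

Checking the sign before casting matters: a negative C# index converted straight to size_t would wrap around to a huge positive value instead of failing cleanly.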

Bitfields - It may be possible to mark boolean fields as having their C++ transformations become "bool mSomeFlag : 1". If the type doesn't have a StructLayoutAttribute, IL2CPP could even re-order the bitfield booleans to exist sequentially in the C++ declaration. There could be some issues with this regarding reflection or usage as ref/out arguments, but if you're going to go as far as using bitfields then you should know of these problems or respect any compiler glue that may be generated to make them Just Work.
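A quick C++ sketch of both the payoff and the ref/out wrinkle. The struct and function names (`FlagsAsBools`, `FlagsAsBitfields`, `set_true`, `mark_dirty`) are invented for illustration, and the copy-out/copy-back in `mark_dirty` is just one guess at what the 'compiler glue' could look like.

```cpp
#include <cassert>

// Four C# bool fields translated the straightforward way.
struct FlagsAsBools {
    bool isVisible;
    bool isDirty;
    bool isStatic;
    bool isPooled;
};

// The same fields as sequential one-bit bitfields.
struct FlagsAsBitfields {
    bool isVisible : 1;
    bool isDirty   : 1;
    bool isStatic  : 1;
    bool isPooled  : 1;
};

// Exact sizes are implementation-defined, but packing should never grow the struct.
static_assert(sizeof(FlagsAsBitfields) <= sizeof(FlagsAsBools),
              "bitfields should not take more space than separate bools");

// Stands in for a C# method taking a 'ref bool' parameter.
void set_true(bool& flag) { flag = true; }

// C++ cannot bind a non-const reference to a bitfield, so passing one as
// ref/out needs glue: copy out to a temporary, make the call, copy back.
void mark_dirty(FlagsAsBitfields& flags) {
    bool tmp = flags.isDirty;
    set_true(tmp);
    flags.isDirty = tmp;
}
```

Writing `set_true(flags.isDirty)` directly would not compile, which is exactly the sort of problem the generated glue would have to paper over.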

Cherry On Top: History of Metadata Initialization Prologue

I want to point out that metadata init prologues weren't always generated. It wasn't until Unity 4.6.p3 and 5.2.1p2 that these two improvements were made:

  • IL2CPP: Optimize System.Reflection access to metadata.
  • IL2CPP: Reduce initialization time of IL2CPP scripting backend.

Up until then all TypeInfo was initialized at startup, instead of only when needed which is now ensured by injecting the prologue in user code. That initialization time was really taxing the iOS game I was working on at the time (of course it probably doesn't help how type heavy the game code is and that the game had to run on iPhone 4s).