Kornner Studios: 2015

Monday, December 14, 2015

Seeing Sharp C++: Behind the IL2CPP magic of Metadata and Statics

To the average C# programmer, how the .NET runtime glues together reflection in the final code isn't something they really need to be familiar with. It Just Works. However, we're Unity developers using IL2CPP for our .NET runtime so we're not the "average C# programmer". When it comes to buffing out every dip in our game's runtime performance, knowing the devil in the details is a requirement. So today we're going to look at how (as of Unity 5.3.0f4) the use of metadata/reflection and class statics may be poking at your C#-turned-C++ code with a fiery pitchfork.

Metadata and Reflection in IL2CPP

Since IL2CPP is not a VM using generalized byte code instructions but compiled native code, it has to generate code to always ensure metadata is prepared for use before it is possibly needed. This is done via a sort of 'metadata initialization' prologue that is flagged for execution the very first time the method is run. Take for example the following IL2CPP code gen:

// C#: { (new SomeValueType(0)).GetType(); }

// Notice the extern on this TypeInfo*. There's only one TypeInfo* for every class (Int32, StringBuilder, etc)
extern TypeInfo* SomeValueType_t_il2cpp_TypeInfo_var;
extern "C" void IL2CPP_Method (Object_t * __this /* static, unused */, const MethodInfo* method)
{
 static bool s_Il2CppMethodIntialized;
 if (!s_Il2CppMethodIntialized)
 {
  // Will either get the cached TypeInfo*, or if this is the first time the type
  // is used, perform any lookups/etc needed to populate an internal cache
  SomeValueType_t_il2cpp_TypeInfo_var = il2cpp_codegen_type_info_from_index(910);
  s_Il2CppMethodIntialized = true;
 }
 Type_t * V_0 = {0};
 {
  SomeValueType_t  L_0 = {0};
  SomeValueType__ctor(&L_0, 0, /*hidden argument*/NULL);
  SomeValueType_t  L_1 = L_0;
  // GetType() is a method on the Object class. The struct
  // 'SomeValueType', having no reflection bits, needs to be boxed into an object
  // with the proper Il2CppObject header which has a TypeInfo* to the reflection 
  // bits needed for 'SomeValueType'
  Object_t * L_2 = Box(SomeValueType_t_il2cpp_TypeInfo_var, &L_1);
  NullCheck(L_2);
  Type_t * L_3 = Object_GetType(L_2, /*hidden argument*/NULL);
  ...

Here, all the C# code did was instantiate a struct named 'SomeValueType' and then call 'GetType()' on that instance. But the native transformation of that C# must ensure the TypeInfo for the struct is prepared by the runtime.
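To see the prologue pattern in isolation, here's a stripped-down C++ sketch of the same idea. Every name in it (FakeTypeInfo, lookup_type_info, g_lookup_count, generated_method) is a made-up stand-in and not part of the IL2CPP runtime; it only illustrates that the lookup runs on the first call and later calls pay just for the flag test.

```cpp
#include <cstdint>

// Hypothetical stand-in for the runtime's per-type metadata record.
struct FakeTypeInfo { int32_t index; };

// Counts how often the (pretend) expensive metadata lookup actually runs.
static int g_lookup_count = 0;

// Hypothetical stand-in for il2cpp_codegen_type_info_from_index().
static FakeTypeInfo* lookup_type_info(int32_t index)
{
    static FakeTypeInfo s_table[1024] = {};
    ++g_lookup_count;
    s_table[index].index = index;
    return &s_table[index];
}

// Shared per-type pointer, like the extern TypeInfo* in the codegen.
static FakeTypeInfo* g_some_value_type_info = nullptr;

// Models a generated method: prologue first, real body afterwards.
static int32_t generated_method()
{
    static bool s_method_initialized = false;
    if (!s_method_initialized)
    {
        g_some_value_type_info = lookup_type_info(910);
        s_method_initialized = true;
    }
    // The "real" body can now rely on the metadata being ready.
    return g_some_value_type_info->index;
}
```

Note that this sketch is single-threaded; whatever the real runtime does to make the lookup safe across threads is not modeled here.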

IL2CPP optimization idea: It should be possible for IL2CPP to see cases where GetType() is called on structs or sealed classes. In these cases it should be possible for the code generator to instead output the same code as seen when using the typeof() operator (I'll leave this as an exercise for the reader to investigate). In the case of structs this would result in no extra memory allocations due to boxing just to call Object.GetType().
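As a rough illustration of what that idea would save, the following C++ sketch contrasts the two paths. All of the types and functions here (DemoTypeInfo, BoxedSomeValueType, get_type_by_boxing, get_type_statically) are hypothetical and only model the shape of the problem; the real runtime boxes into GC-managed memory rather than a unique_ptr.

```cpp
#include <cstdint>
#include <memory>

// Hypothetical stand-ins; none of these names come from the IL2CPP runtime.
struct DemoTypeInfo { const char* name; };
struct SomeValueType { int32_t value; };

static const DemoTypeInfo kSomeValueTypeInfo{ "SomeValueType" };
static int g_box_allocations = 0;

// A boxed struct: an object header pointing at the type, then the payload.
struct BoxedSomeValueType
{
    const DemoTypeInfo* klass;
    SomeValueType payload;
};

// Current path: box the value, then read the type from the object header.
static const DemoTypeInfo* get_type_by_boxing(const SomeValueType& value)
{
    std::unique_ptr<BoxedSomeValueType> boxed(
        new BoxedSomeValueType{ &kSomeValueTypeInfo, value });
    ++g_box_allocations;
    return boxed->klass;
}

// Proposed path: a struct's static type is already exact, so skip the box,
// much like typeof(SomeValueType) would.
static const DemoTypeInfo* get_type_statically()
{
    return &kSomeValueTypeInfo;
}
```

Both functions hand back the same type record; only the first one allocates to get there.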

You may be thinking to yourself "My code doesn't make use of reflection so none of this magic should apply". Well, you could be right but also wrong. There's another case where you will see this prologue and more magic: statics and static constructors.

Class Statics

Static constructors (abbreviated as cctors; they're similar to 'dynamic initializers' in C++) are generated either manually or when you define static members with inline initialization (e.g., static int kMaxIterations = 4).

Unlike C++, these static values don't have a final memory location in a .bss or .data section of the executable (although it should be theoretically possible within IL2CPP). A buffer big enough to contain all the static fields is instead allocated in managed memory and then tracked via the class' TypeInfo. With IL2CPP this 'buffer' is actually a codegen'd C++ struct (so you can inspect it if you desire). This has the added benefit that static data is lazy loaded.

Statics background: This lazy loading behavior is part of the C# specification. However, struct cctors [Section 18.3.10] have slightly different requirements than class cctors [Section 17.11].

So if no code in your program ever used DateTime, then its statics should never be allocated and initialized. However, the drawback is that the runtime must ensure the TypeInfo is initialized (also where the statics buffer reference is kept) and that the cctor is run. With IL2CPP this is implemented via injecting metadata initialization prologues and IL2CPP_RUNTIME_CLASS_INIT macro calls.
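Here's a minimal C++ model of that arrangement, assuming a single thread. DemoStatics, DemoTypeInfo, demo_cctor and runtime_class_init are names I made up for the sketch; they mimic the roles of the codegen'd statics struct, the TypeInfo, the cctor and the class init check, but they are not the runtime's actual definitions.

```cpp
#include <cstdint>
#include <memory>

// Hypothetical layout of the codegen'd statics struct for a type.
struct DemoStatics { int32_t kMaxIterations; };

// Hypothetical stand-in for TypeInfo: owns the statics buffer and init flag.
struct DemoTypeInfo
{
    std::unique_ptr<DemoStatics> static_fields; // allocated on first use
    bool cctor_finished = false;
    int cctor_runs = 0;
};

static DemoTypeInfo g_demo_type;

// Models the static constructor for: static int kMaxIterations = 4;
static void demo_cctor(DemoTypeInfo& klass)
{
    klass.static_fields->kMaxIterations = 4;
    ++klass.cctor_runs;
}

// Models the class init check inserted before static accesses.
static void runtime_class_init(DemoTypeInfo& klass)
{
    if (klass.cctor_finished)
        return; // the common, cheap path
    klass.static_fields.reset(new DemoStatics());
    klass.cctor_finished = true;
    demo_cctor(klass);
}

// Models a call site that reads the static field.
static int32_t read_max_iterations()
{
    runtime_class_init(g_demo_type);
    return g_demo_type.static_fields->kMaxIterations;
}
```

If nothing ever calls read_max_iterations(), the statics buffer is never allocated, which is the lazy loading benefit; the price is the check at every access.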

We can refer to the IL2CPP code for Runtime::ClassInit to see documentation on when they intend to insert instances of the macro previously mentioned:

// From vm/Runtime.cpp:

// We currently call Runtime::ClassInit in 4 places:
// 1. Just after we allocate storage for a new object (Object::NewAllocSpecific)
// 2. Just before reading any static field
// 3. Just before calling any static method
// 4. Just before calling class instance constructor from a derived class instance constructor
void Runtime::ClassInit (TypeInfo *klass);

Now, let's see this documentation in action!

ValueTypeWithCctor is a simple C# struct with a static field (and thus a cctor) and ValueTypeWithoutCctor is that same struct with instead a property that returns the same exact value of that static field. The IL2CPP transformations of both can be found here and here, respectively.

Again, the only difference between the two structs is the static 'One' member. In ValueTypeWithCctor it is a static field, and in ValueTypeWithoutCctor it is a static property that returns the result of the only user defined constructor. They both have a user defined addition operator that calls the static Add() method. While the logic for Add() doesn't make use of any static fields it still results in ValueTypeWithCctor::operator+ having a metadata initialization prologue and RUNTIME_CLASS_INIT call (keeping in line with the C# specs and bullet number 3 in the documentation above).

I then have two very similar methods (only differ in which struct they use) that exercise the user defined ctor, One static, operator+, and Add() method: Il2Cpp_ValueTypeWithCctor() and Il2Cpp_ValueTypeWithoutCctor().

In Il2Cpp_ValueTypeWithCctor's C++ you can see that it has a metadata initialization prologue and then a call to RUNTIME_CLASS_INIT. It requires this in order to access the 'One' static field (bullet #2). Even if you removed the access to 'One' (change 'v1 = v0'), it would still have both calls (again, due to bullet #3 "Just before calling any static method"). Albeit, RUNTIME_CLASS_INIT would occur before the call to operator+. However, recall that operator+ already does this! Ouch, code bloat.

Now in Il2Cpp_ValueTypeWithoutCctor's C++ you can see there's no prologue or RUNTIME_CLASS_INIT call. In fact, if we trace the call to ValueTypeWithoutCctor::get_One we can see a likely candidate for code inlining! For simple value types, replacing static fields with getter properties can not only have improvements to code size at call sites, but can trade potential cache misses for inline code. We're not having to hit the TypeInfo (or the per-method init check) to then get the memory of the statics allocation to copy the value of 'One'.
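The difference between the two variants can be modeled with a small C++ sketch that simply counts how many class init checks execute. The names below (WithCctor, WithoutCctor, g_init_checks and friends) are hypothetical and are not the codegen from the linked listings; the sketch assumes one check at the call site plus one inside the static method, as described above.

```cpp
#include <cstdint>

static int g_init_checks = 0; // how many class init checks executed

// --- Variant 1: 'One' is a static field, so the type has a cctor. ---
struct WithCctor { int32_t value; };

static bool g_with_cctor_ready = false;
static WithCctor g_with_cctor_one; // stands in for the statics buffer

static void with_cctor_class_init()
{
    ++g_init_checks;
    if (!g_with_cctor_ready)
    {
        g_with_cctor_one.value = 1; // the cctor body
        g_with_cctor_ready = true;
    }
}

// Static method: carries its own check even though it reads no statics.
static WithCctor with_cctor_add(WithCctor a, WithCctor b)
{
    with_cctor_class_init();
    return WithCctor{ a.value + b.value };
}

// Call site: checks before reading 'One', then the callee checks again.
static int32_t use_with_cctor()
{
    with_cctor_class_init();
    WithCctor v0{ 41 };
    WithCctor v1 = g_with_cctor_one;
    return with_cctor_add(v0, v1).value;
}

// --- Variant 2: 'One' is a getter, so there is nothing to initialize. ---
struct WithoutCctor
{
    int32_t value;
    static WithoutCctor get_One() { return WithoutCctor{ 1 }; }
};

static WithoutCctor without_cctor_add(WithoutCctor a, WithoutCctor b)
{
    return WithoutCctor{ a.value + b.value };
}

static int32_t use_without_cctor()
{
    WithoutCctor v0{ 41 };
    return without_cctor_add(v0, WithoutCctor::get_One()).value;
}
```

Both call sites compute the same result, but only the first one touches the init flag and the statics storage along the way.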

IL2CPP optimization idea: It could be beneficial to expose IL2CPP specific attributes, just like Il2CppSetOption, that say a given type should always be initialized at runtime so that the prologue and IL2CPP_RUNTIME_CLASS_INIT can be excluded from call sites.

Final Thoughts

Walking away from this blog you should keep in mind these simple things:

Metadata/Reflection are not free lunches. In fact, they could be leading to obese code where you least expect it!
Calling GetType() on a concrete type results in a call to Object.GetType(), which in the case of structs requires the value to be boxed to an object (needless memory allocation!).
Be mindful of how you use static fields and constructors in structs, as they too can be bloating your code. This is especially the case if you came from a C++ background, as cctors are not like the 'dynamic initializers' you're used to.

I didn't investigate bullets #1 and #4 from the Runtime::ClassInit code, primarily because I wasn't really focusing on reference types in the article. However, with #1 you can 'rest' assured that for the first instance of "var = new T()" in a method there's a RUNTIME_CLASS_INIT call inserted immediately before it. Whether this really impacts your core game logic is an exercise for you to run.

I should also note that string literals ("Hello World!", etc) are considered metadata too and so using them will result in similar prologues. However, their initialization isn't as liberally applied to call sites of the method which actually uses them like is the case with types containing static fields. I'd say the main time they'd be a concern is when you throw argument exceptions (naming the problem arg in a string) in core game logic. But if you're doing such sanity checks in release builds of performance critical code you probably have bigger issues at hand than how many native instructions your method spills out to.

Bonus Round: IL2CPP Ideas

In the future of IL2CPP, I can see a handful of interesting optimizations that could be done: C++-like statics, constexpr mimicking, and (while not entirely related to this article) bitfields to name a few. All of which should be doable without upsetting the Mono environment which our Editor builds still need to run in.

C++-like statics: Some performance critical types could have all or maybe just some statics marked as 'permanent' statics. By this I mean their storage exists in a .data section of the object code, instead of being allocated in strict managed memory (along with all the compiler generated checks we saw above). If these statics are pointerless (not containing managed references), then they should not even need to be registered with the GC. If they are pointerful (do contain managed references), it should just be a matter of instead creating a .roots section and using the GC's APIs for marking foreign memory ranges as candidates for root scanning.

Constexpr - C++11 introduced constant expressions and I believe with IL2CPP we could have constant expressions for our C# too! You could have types with static data which are executed as part of the IL2CPP build step to generate readonly data. Imagine we could apply a (made up) ConstExprAttribute to Vector3.up (a static property getter). IL2CPP could sandbox the UnityEngine assembly or some driver and use the results of Vector3.up to initialize a C++-like static const field with its value (or as an actual constexpr if the C++ compiler used supports it!). It should even be possible to elevate this to arrays of simple/pointerless structs (e.g., an integer lookup table). Under the hood it could just use a std::array for storage, translating .Length and indexers to the matching C++ constructs (some care has to be taken here, since C++ would be using the unsigned size_t for the array length while Length is signed in C#).
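To make the idea a bit more concrete, this is roughly what I'd picture the C++ side looking like. It's purely a sketch: the Vector3 layout, the constant names and the accessor functions are assumptions of mine, and the ConstExprAttribute that would drive it doesn't exist.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical C++ image of a simple, pointerless C# struct.
struct Vector3
{
    float x, y, z;
};

// What a (made up) ConstExprAttribute on Vector3.up could lower to.
constexpr Vector3 kVector3Up{ 0.0f, 1.0f, 0.0f };

// A read-only integer lookup table baked in at build time.
constexpr std::array<int32_t, 4> kPowersOfTen{ { 1, 10, 100, 1000 } };

// C#'s Length is a signed int while std::array::size() returns size_t,
// so a generated accessor would have to narrow explicitly.
constexpr int32_t length_of_powers_of_ten()
{
    return static_cast<int32_t>(kPowersOfTen.size());
}

// Indexer: at() keeps a bounds check so the sketch stays close to C# semantics.
inline int32_t power_of_ten(int32_t index)
{
    return kPowersOfTen.at(static_cast<std::size_t>(index));
}

static_assert(kVector3Up.y == 1.0f, "evaluated at compile time");
static_assert(length_of_powers_of_ten() == 4, "Length is known statically");
```

Since the data is a compile-time constant, there's no statics buffer to allocate and no class init check to run before reading it.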

Bitfields - It may be possible to mark boolean fields as having their C++ transformations become "bool mSomeFlag : 1". If the type doesn't have a StructLayoutAttribute, IL2CPP could even re-order the bitfield booleans to exist sequentially in the C++ declaration. There could be some issues with this regarding reflection or usage as ref/out arguments but if you're going to go as far as using bitfields then you should know of these problems or respect any compiler glue that may be generated to make them Just Work.
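A small C++ sketch of the bitfield idea, including the kind of glue a ref/out argument might need. The struct and function names are invented for illustration, and the copy-out/copy-back approach is just one way such glue could work, not something IL2CPP is known to generate.

```cpp
// Flags as ordinary bools: typically one byte each.
struct PlainFlags
{
    bool mIsVisible;
    bool mIsDirty;
    bool mIsStatic;
    bool mIsPooled;
};

// Same flags as 1-bit bitfields, declared sequentially so they can pack.
struct PackedFlags
{
    bool mIsVisible : 1;
    bool mIsDirty   : 1;
    bool mIsStatic  : 1;
    bool mIsPooled  : 1;
};

// Stands in for a C# method taking a 'ref bool' or 'out bool'.
inline void set_true(bool& target) { target = true; }

// A bitfield has no address, so it cannot bind to bool& directly.
// Possible glue: copy out to a temporary, pass that, then copy back.
inline void mark_dirty(PackedFlags& flags)
{
    bool temp = flags.mIsDirty; // copy out
    set_true(temp);             // callee sees a normal bool&
    flags.mIsDirty = temp;      // copy back
}
```

The exact packing is up to the compiler, but with the flags declared back to back the packed struct should be no larger than the plain one.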

Cherry On Top: History of Metadata Initialization Prologue

I want to point out that metadata init prologues weren't always generated. It wasn't until Unity 4.6.p3 and 5.2.1p2 that these two improvements were made:

IL2CPP: Optimize System.Reflection access to metadata.
IL2CPP: Reduce initialization time of IL2CPP scripting backend.

Up until then all TypeInfo was initialized at startup, instead of only when needed which is now ensured by injecting the prologue in user code. That initialization time was really taxing the iOS game I was working on at the time (of course it probably doesn't help how type heavy the game code is and that the game had to run on iPhone 4s).

Sunday, November 29, 2015

Seeing Sharp C++: The Il2CppSetOption attribute

The Mysterious Attribute

On November 5th Unity released patches 4.6.9p2 and 5.2.2p3 which included the following improvement:

"IL2CPP: Allow null checks, array bounds checks, and divide by zero checks to be selectively included or omitted from generated C++ code by using a C# attribute on type, method, or property definitions."

Note the lack of documentation of which C# attribute allows you to net these advanced controls. Perhaps that's the thing though, that Unity considers these fringe tools and that the people who truly need it will find the attribute on their own. Well, while working on another blog entry related to IL2CPP I recalled this patch note and assumed a few keywords in a search on Google would cure my curiosity. Alas, no answers were served and my curiosity grew.

UPDATE: Unity's Josh Peterson pointed out to me that they do already have existing documentation on the new attribute on the Unity forums (he also covers the divide-by-zero check there, which I don't). Information on these options will show up soon in the Unity manual (in the Scripting section).

Back in the Spring I was working on getting a project running with IL2CPP and I had to do my own digging into the IL2CPP.exe's guts and the Editor's pipeline. I remembered there being options it would set on the IL2CPP.exe command line like "--emit-null-checks" (it and the other options are not publicly documented either, since they're internally determined by the project settings), and figured I'd throw the .exe back into ILSpy again and see how it resolved this mysterious C# attribute.

So as I'm digging into the Unity program folder for the .exe, what do I immediately find? A lone C# file called Il2CppSetOptionAttribute.cs. Turns out the oil was closer to the surface than I thought. If you have the patches mentioned above or later, the path to this C# file should be: %UNITY_INSTALL_DIR%/Editor/Data/il2cpp/Il2CppSetOptionAttribute.cs.

I imagine that one day this attribute will become a more formal construct (maybe getting added to a default assembly like UnityEngine.dll) as IL2CPP matures and surpasses Mono as the primary runtime. I know Unity has been really pushing faster release cycles (kudos to them!), so documentation and mainline integration may take a back seat on such fringe tools.

However, I suppose I probably shouldn't stop here. Why not take a look at how these tools change the C++ code that was generated from IL-ized C#?

Using the Attribute

Using Unity 5.2.3p1 I created a blank project and added Il2CppSetOptionAttribute.cs to it so I could test drive the attribute. I changed the target platform to iOS and kept its stock build settings. From there I added the code found in [Listing1] and built the Xcode project. [Listing2] shows the IL which Mono generated and [Listing3] contains the C++ code that IL2CPP output (I removed uninteresting code, like the method initialization prologue).

Il2CppSetOption_ChecksEnabled is a method with some basic operations that mimic typical C# code.

It allocates an int[] with one element (to test NullChecks and ArrayBoundChecks later)
Multiplies the 1st (and only) element by one (no-op)
Returns a nonexistent element (array[2])

Il2CppSetOption_ChecksDisabled is the same exact code, but with Il2CppSetOption attributes applied so that NullChecks and ArrayBoundChecks are excluded from its C++.

Here's the C++ code, with comments mapping instructions back to the source C# and notes for which instructions are removed with Il2CppSetOption:

// System.Int32 SharpCpp.SharpCode::Il2CppSetOption_ChecksEnabled()
extern "C" int32_t SharpCode_Il2CppSetOption_ChecksEnabled_m_1098767581_0 (Object_t * __this /* static, unused */, const MethodInfo* method)
{
 // essentially creating a "int[]" pointer here
 Int32U5BU5D_t1872284309_0* V_0 = {0};
 {
  //var array = new int[1];
  V_0 = ((Int32U5BU5D_t1872284309_0*)SZArrayNew(Int32U5BU5D_t1872284309_0_il2cpp_TypeInfo_var, (uint32_t)1));

  Int32U5BU5D_t1872284309_0* L_0 = V_0;
  NullCheck(L_0); // NOTE: Removed with Il2CppSetOption
  IL2CPP_ARRAY_BOUNDS_CHECK(L_0, 0); // NOTE: Removed with Il2CppSetOption
  //address of array[0]
  int32_t* L_1 = ((int32_t*)(int32_t*)SZArrayLdElema(L_0, 0, sizeof(int32_t)));
  //array[0] *= 1;
  *((int32_t*)(L_1)) = (int32_t)((int32_t)((int32_t)(*((int32_t*)L_1))*(int32_t)1));

  //return array[2];
  Int32U5BU5D_t1872284309_0* L_2 = V_0;
  NullCheck(L_2); // NOTE: Removed with Il2CppSetOption
  IL2CPP_ARRAY_BOUNDS_CHECK(L_2, 2); // NOTE: Removed with Il2CppSetOption
  int32_t L_3 = 2;
  return (*(int32_t*)(int32_t*)SZArrayLdElema(L_2, L_3, sizeof(int32_t)));
 }
}

Looking at the code we can see that it's creating a lot of temporary variables to hold values which don't mutate. The Common Intermediate Language is stack based, so all results are pushed/popped to and from the stack (instead of choice registers like in x86 assembly). So when Unity performs IL->CPP transformations they end up tracing the stack operations modeled by the instructions to figure out what variables they need, even if they are really just aliases. A modern C++ compiler should be able to catch these needless copies and collapse them to the bare minimum when compiling to native assembly (at least in Release configurations). This means it becomes a non-problem for the IL2CPP team, while somewhat of an eyesore for readers of the codegen :-).
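For a feel of why the aliases are harmless, compare these two functions. Both are my own toy examples (not IL2CPP output): the first is written in the one-local-per-stack-slot style, the second is what a person would write, and an optimizing compiler should be able to treat them the same.

```cpp
#include <cstdint>

// Written the way a stack-tracing IL->C++ translator might emit it:
// every push/pop becomes its own local, even for unchanged values.
static int32_t scale_codegen_style(const int32_t* array)
{
    const int32_t* V_0 = array;
    const int32_t* L_0 = V_0;            // alias of V_0
    int32_t L_1 = 0;                     // the index constant
    int32_t L_2 = L_0[L_1];              // array[0]
    int32_t L_3 = 3;                     // the multiplier constant
    int32_t L_4 = (int32_t)(L_2 * L_3);  // array[0] * 3
    return L_4;
}

// The hand-written equivalent.
static int32_t scale_hand_written(const int32_t* array)
{
    return array[0] * 3;
}
```

If you want to confirm the collapse for your own toolchain, compare the optimized assembly of the two functions rather than taking it on faith.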

The lines with "NOTE: Removed with Il2CppSetOption" are absent from the Il2CppSetOption_ChecksDisabled method. There are only two constructs which are removed (NullCheck and IL2CPP_ARRAY_BOUNDS_CHECK), but due to the variable aliasing we end up with repeated (redundant) NullChecks.

Refer to [Listing4] to investigate these constructs and determine what their impact is.

Nothing too crazy is going on here. For the general case all these checks are fine (and I imagine equate to the same number of times the checks are run in the Mono runtime). However, you can imagine how verbose these can become when you're writing a tight loop that performs many reads and/or writes from an array. If you performed loop unrolling yourself, these checks would explode your loop's body's logic! Don't believe me? Just look at Il2CppSetOption_ChecksEnabledMegaLogic's C++!
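To put a number on it, here's a toy C++ model of a summing loop with and without the checks. Be aware that null_check and bounds_check below are hypothetical stand-ins I wrote for counting purposes; they are not the definitions of NullCheck or IL2CPP_ARRAY_BOUNDS_CHECK, and the IntArray struct is not how the runtime lays out a managed array.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for a managed int[]: a length plus elements.
struct IntArray
{
    int32_t length;
    const int32_t* items;
};

static int g_checks_executed = 0; // counts checks, for illustration only

// Hypothetical model of a null check: throw if the reference is null.
static void null_check(const IntArray* array)
{
    ++g_checks_executed;
    if (array == nullptr)
        throw std::runtime_error("NullReferenceException");
}

// Hypothetical model of a bounds check: throw if index is out of range.
static void bounds_check(const IntArray* array, int32_t index)
{
    ++g_checks_executed;
    if (static_cast<uint32_t>(index) >= static_cast<uint32_t>(array->length))
        throw std::out_of_range("IndexOutOfRangeException");
}

// Checks enabled: one check to read the length, then two per element read.
static int32_t sum_checked(const IntArray* array)
{
    null_check(array);
    const int32_t length = array->length;
    int32_t total = 0;
    for (int32_t i = 0; i < length; ++i)
    {
        null_check(array);
        bounds_check(array, i);
        total += array->items[i];
    }
    return total;
}

// Checks disabled: the loop body is just the load and the add.
static int32_t sum_unchecked(const IntArray* array)
{
    int32_t total = 0;
    for (int32_t i = 0; i < array->length; ++i)
        total += array->items[i];
    return total;
}
```

With a four element array the checked version executes nine checks for four useful loads, and unrolling the loop by hand would repeat that pair of checks for every unrolled access.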

If you have a low-level background, I bet you're already picturing the many, many possible branches these managed checks resolve to. However, since the branches are for error cases, they shouldn't be hit except in abnormal conditions so the branch predictor (found on modern CPUs, including the ARM in the iPhone 4s) shouldn't mispredict them during normal processing. But this leads to another factor: code size. The function requires more instructions to represent whatever branch-ful assembly the compiler comes up with, meaning you can fit fewer meaningful instructions in the processor's cache.

Some of you may be scoffing at why C# code should even need to worry about this, that if the process needs to be that performant then one should just do it all in C++! Sure, you could do that...but don't forget to account for the extra steps it takes to maintain a native library which needs to be hooked up to run on your target platforms and within the Unity editor. One of the awesome things about Unity is being able to write-once and deploy-to-many platforms, so the goal is to not venture out of this bubble if you can afford to.

However, maybe you already have a pre-existing 'native core' at your studio to where this link problem is already solved, or maintained by Mr Not-You down the hall. We have to keep in mind that for smaller studios or developers wanting to get their plugins up on the Unity Asset Store, this is not an option or a very unattractive one (especially to the consumers of said plugins).

Final Thoughts

Should the need ever arise, you now know how to disable these managed checks yourself in your own project. Hopefully this article has motivated you to investigate how your C# code is being transformed into C++ code with IL2CPP. You may find that you're producing non-optimal IL and in turn C++. Knowledge is power after all. Of course, with great power comes great responsibility, so always be sure to measure (profile) and if you're already this low-level, check the C++ compiler's output too.

You could even try building your project with Visual C# and seeing how the IL may differ from the very old Mono compiler that comes with Unity. As a reward for reading this far, you can check out [Listing5] for the IL generated by Visual C# 2013 in Release mode and [Listing6] for the resulting C++ code. You can see there are some curious differences (the starting NOP I believe is for IL code alignment...not sure about the seemingly pointless GOTO).

Also, if you haven't already, I suggest you check out Unity's own IL2CPP Internals blog series. I have some more IL2CPP things in the blog pipe and will be using the series as a reference point.