Epic Flash memory leak track down

February 25, 2012

Have you ever had a memory instantiation problem that was impossible to track down? Here is a post that might help you with that kind of thing!

Beware! This post is very long… but VERY instructive! If you want to learn some internal mechanism of Flash, I strongly suggest you read it from top to bottom without skipping parts!:)

After finding what it was, we can conclude that this is not a real leak but a irritating behaviour of the Flash VM. Still, you should all be aware of it!

The Context

Yesterday I was helping Luca (creator of Nape physic engine, made with haXe) to find what seemed to be a big memory leak. There was a couple framework used so it could be directly from Nape… or from Starling… or from debug tool running… or from a port error from haXe…

So I though it was going to be simple

I got TheMiner running and started profiling. It was pretty easy to find a LOT of memory allocation coming from Starling. The SWC version available is over 4 months old and back then there was a lot of useless instantiation coming from TouchProcessor.as
If you use Starling, I suggest you build from sources as these instantiation are gone now.
There are a couple left but no big deal.

We also saw that Nape was using a LOT of anonymous function call and some try catch. This create what’s called activation-object. And there was a lot of these. So by removing anon-calls and try catch we were able to remove a lot of instanciation.

And After removing the debugging tools, It was clear it was coming from haXe or Nape.

But even when recording all samples allocated by the VM I could not find the damn allocation. (we are talking of more than 1Mo per sec)

Then I wrote a Scala script to try to identify samples that could have been missed, but that didn’t help me much since that was only an override of constructor and constructorProp avm opcodes.

Simplifying the context

So I decided to look a bit further into the bytecode and found something very interesting.

After simplifying the problem multiple time, I got this very simple code to demonstrate the bug.
Because Nape has that very nice abstraction layer, it can support both standard DisplayObject as well as Starling 3D DisplayObject.

The leak happens when Nape try to set the DisplayObject properties like .x, .y, .rotation, etc.
Let’s create a fake DisplayObject class and name it DO

public class DO
{
	public var x:Number;
	public var y:Number;

	public function DO() { }
}

And now, a simple loop that set .x and .y values on that DisplayObject, and on a UnTyped object representing the abstraction layer.

public class MainBug extends Sprite {
	public var mAnonDO:*;
	public var mDO:DO;
	public function MainBug():void {
			mDO = new DO();
			mAnonDO = mDO;
			addEventListener(Event.ENTER_FRAME, update);
		}
		public function update(ev:Event):void {
			var j:Number = 234.3948;
			for (var i:int = 0; i < 1000; i++) {
			mDO.x     = j; // <-- memory is flat
			mAnonDO.y = j; // <-- causes memory to explode
		}
	}
}

So basicly we have the same DO Object in memory, and we access it via a Typed or an UnTyped variable.
When accessing it with the Typed on, nothing happens, The memory is not impacted at all.
But when accessing through the Anonymous Object, BAM, 1 Mo/sec. wtf?

The ByteCode:

So let’s take a look at the bytecode generated for this loop in the update() function:

while(int1 < 1000)
{
	    #16   label
	
	    #18   getlocal0
	    #19   getproperty  mDO
	    #20   getlocal2
	    #21   setproperty  x
	this.mDO.x = number1;
	
	    #23   getlocal0
	    #24   getproperty  mAnonDO
	    #25   getlocal2
	    #26   setproperty  y
	this.mAnonDO.y = number1;
	
	    #28   getlocal3
	    #29   increment_i
	    #30   convert_i
	    #31   setlocal3
	int1 = int(int1) + 1;
}

I don’t know if you see any difference between the Anonymous function and the strickly typed one, but for me they look pretty much the same! getlocal0, getProp, getlocal2, setProp.

Again, when we comment out “this.mDO.x = number1;“, there is a lot of memory allocation, and when we comment out the other one “this.mAnonDO.y = number1;” there is zero allocation.

A strange behaviour

So if it’s not in the opcodes… it’s deeper! For those who have read the AVM2 Architecture documentation, you might remember that after the opcodes there are still a big phase to go through: The JIT, including Intermediate representation, and Machine Code Assembly.

Intermediate representation (MIR)

As demonstrated in the graph, the intermediate representation is part of the JIT.
And it looks pretty much like this:

When the abc is ready to be processed, the JIT compiles it in the MIR, and then the result is process for your specific machine (Win32, MAX, Linux, etc.) in a Machine Dependant Code.

To find that damn memory leak, I had to output that MIR code.

Digging deeper (The MIR)!

So I decided to go back in time to an old post I made about tracing Assembly data generated by the JIT: Flash Assembler (not ByteCode) : And you thought it couldn’t get faster/

I don’t use this function a lot nowadays because it has been removed from all FlashPlayer build since Flash 10.1

But it is possible to output it in old version of the player!

Here is the output for the strictly typed part:


40:getlocal0
	@83 use   @16 [0]
		stack: MainBug?@83
		locals: MainBug?@83 flash.events::Event?@21 Number@60 int@67
41:getproperty mDO
	cse   @75
	@84 ucmp  @83 @75
	@85 jeq   @84 -> 0
	@86 ld    132(@83)
		stack: DO?@86
		locals: MainBug@83 flash.events::Event?@21 Number@60 int@67
43:getlocal2
	@87 fuse  @60 [2]
		stack: DO?@86 Number@87
		locals: MainBug@83 flash.events::Event?@21 Number@87 int@67
44:setproperty x
	cse   @75
	@88 ucmp  @86 @75
	@89 jeq   @88 -> 0
	@90 st    24(@86) <- @87
		stack:
		locals: MainBug@83 flash.events::Event?@21 Number@87 int@67

And here is the output for the not strictly typed part:


48:getlocal0
	 @94 use   @16 [0]
		 stack: MainBug@94
		 locals: MainBug@94 flash.events::Event?@21 Number@60 int@67
 49:getproperty mAnonDO
	 @95 ld    128(@94)
		 stack: *@95
		 locals: MainBug@94 flash.events::Event?@21 Number@60 int@67
 51:getlocal2
	 @96 fuse  @60 [2]
		 stack: *@95 Number@96
		 locals: MainBug@94 flash.events::Event?@21 Number@96 int@67
 52:setproperty {MainBug,public,MainBug.as$29}::y
	 save state
	 @97 def   @95
	 @98 fdef  @96
	 @99 cm    MethodEnv::nullcheck (@3, @95)
	 set code context
	 @101 st    43382752(0) <- @3
	 set dxns addr
	 @102 ldop  0(@3)
	 @103 ldop  20(@102)
	 @104 lea   4(@103)
	 @105 st    43382076(0) <- @104
	 @106 fuse  @98 [6]
	 cse   @78
	 @107 cmop  AvmCore::doubleToAtom_sse2 (@78, @106)
	 init multiname
	 @109 alloc 16
	 @110 imm   16
	 @111 st    0(@109) <- @110
	 @112 imm   43427168
	 @113 st    4(@109) <- @112
	 @114 imm   44102088
	 @115 st    8(@109) <- @114
	 cse   @102
	 @116 ldop  8(@102)
	 @117 use   @97 [5]
	 @118 cmop  Toplevel::toVTable (@116, @117)
	 cse   @102
	 cse   @116
	 @120 lea   0(@109)
	 save state
	 @121 cm    Toplevel::setproperty (@116, @117, @120, @107, @118)
		 stack:
		 locals: MainBug@16 flash.events::Event?@21 Number@60 int@67

Whoa!!
What the hell is that!

It’s the same setproperty opcode, but with a LOT of things underneath…

The difference is really that one variable is strictly typed, and the other not.

We can see it on the Stack:

stack: DO?@86 Number@87

stack: *@95 Number@96

No need to say that just by looking at the number of instructions, you KNOW it will be a lot slower.
But not only that.. you can see type validation, memory allocation ( @109 alloc 16 )
and even access to the VTable: @118 cmop Toplevel::toVTable (@116, @117)

So I though *bingo*, there is our allocation!

sight… It’s not over yet!

I knew that the memory growing was not a problem when using Int instead of Number.
So I tried to output the same MIR code for the Int test:

43:getlocal2
	@88 use   @62 [2]
		stack: *@87 int@88
		locals: Main@84 flash.events::Event?@21 int@88 int@69
44:setproperty {Main,.events:EventDispatcher}::x
	save state
	@89 def   @87
	@90 def   @88
	@91 cm    MethodEnv::nullcheck (@3, @87)
	set code context
	@93 st    39909344(0) &lt;- @3
	set dxns addr
	@94 ldop  0(@3)
	@95 ldop  20(@94)
	@96 lea   4(@95)
	@97 st    39908668(0) &lt;- @96
	@98 use   @90 [6]
	cse   @79
	@99 cmop  AvmCore::intToAtom (@79, @98)
	init multiname
	@101 alloc 16
	@102 imm   16
	@103 st    0(@101) &lt;- @102
	@104 imm   39953728
	@105 st    4(@101) &lt;- @104
	@106 imm   40632776
	@107 st    8(@101) &lt;- @106
	cse   @94
	@108 ldop  8(@94)
	@109 use   @89 [5]
	@110 cmop  Toplevel::toVTable (@108, @109)
	cse   @94
	cse   @108
	@112 lea   0(@101)

Again…. it looks pretty much the same 😦

..So it’s not the Alloc from the MIR code that create the memory.

What difference is there left? Well on one side you have:

@99 cmop  AvmCore::intToAtom (@79, @98)

and on the other side you have:

@107 cmop  AvmCore::doubleToAtom_sse2 (@78, @106)

Could it be directly from Tamarin? Let’s open some code!

Digging even Deeper (Tamarin Core)!

Let’s start with intToAtom:

Atom AvmCore::intToAtom(int n)
{
	// handle integer values w/out allocation
	int i29 = n &lt;&lt; 3;
	if ((i29&gt;&gt;3) == n)
	{
		return uint32(i29 | kIntegerType);;
	}
	else
	{
		return allocDouble(n);
	}
}

Interesting… I always like the feeling of viewing Alloc in a line of code.

So I remembered that Int atom was encoded with only 29 bits for the value. I changed my test and instead of

var j:int = 234;

I used

var j:int = int.MAX_VALUE;

and BAM AGAIN!
memory growing like crazy.

Ok so I guess we are getting a lot closer to the source now.

Here is the doubleToAtom_sse2 function:

doubleToAtom_sse2 is defined in the AvmCore.cpp class from Tamarin

Atom AvmCore::doubleToAtom_sse2(double n)
{
	 int id3;
	 _asm {
			 movsd xmm0,n
			 cvttsd2si ecx,xmm0
			 shl ecx,3               // id<<3
			 mov eax,ecx
			 sar ecx,3               // id>>3
			 cvtsi2sd xmm1,ecx
			 ucomisd xmm0,xmm1
			 jne d2a_alloc   // < or >
			 jp  d2a_alloc   // unordered
			 mov id3,eax
	}

	if (id3 != 0 || !MathUtils::isNegZero(n))
	{
		 return id3 | kIntegerType;
	}
	else
	{
		 _asm d2a_alloc:
		 return allocDouble(n);
	}
}

Again, you can see the allocDouble in there.
But you can also see the return integer Atom part!
Again I forgot about that! dynamic type switching from int to number.. but also from number to int!

I re-wrote the test to use

var j:Number = 234.3948;

and used

var j:Number = 234;

And… here we go again… zero instantiation!

I guess the only thing left to look at is the allocDouble() function right?

Here it is:

Atom allocDouble(double n)
{
	double *ptr = (double*)GetGC()-&gt;Alloc(sizeof(double), 0);
	*ptr = n;
	return kDoubleType | (uintptr)ptr;
}

Oh my! look at that! A call to allocate the memory for a double!

Conclusion

At runtime, when you use anonymous object, the JIT have no idea of what type it is, and when you need to set the property on one of these object, it need to validate the objects, and instantiate a new Atom to copy the values.
It’s not the case when you use strictly typed object because when it’s ready to set the value, it just does!

When putting all the elements together you get this ridiculous code:

var anon:*;
for (var i:int = 0; i &lt; 100000; i++) {
	anon = 3.3;
}

This will allocate memory like crazy. (100000 * sizeof(double) )

If you think this is a bug, vote for it

Solutions?

I didn’t find any great way to solve this problem except from using strictly typed object.

Going back to Nape, it means that it cannot use the abstraction layer like it was designed.
The only way is to expose a generic management for classic displayobject, and overridable functions to give the possibility to an external framework (like Starling) to do updates of properties in a strictly typed way.

References:

Frameworks:
http://starling-framework.org/
https://github.com/PrimaryFeather/Starling-Framework/blob/master/starling/src/starling/events/TouchProcessor.as
http://code.google.com/p/nape/
Tool:
http://code.google.com/p/apparat/
http://www.sociodox.com/theminer/
Docs:
http://www.adobe.com/content/dam/Adobe/en/devnet/actionscript/articles/avm2overview.pdf
http://onflex.org/ACDS/AS3TuningInsideAVM2JIT.pdf
http://zenit.senecac.on.ca/wiki/dxr/source.cgi/mozilla/js/tamarin/core/AvmCore.cpp
http://zenit.senecac.on.ca/wiki/dxr/source.cgi/mozilla/js/tamarin/core/AvmCore.h

From → actionscript, flash, general, optimization

27 Comments

focus permalink

Awesome post! I’m very glad my work on one of my projects led to finding this bug out! Thank you for helping us with hunting it, I think we wouldn’t not track it so deep without your help.

It’s interesting that this bug hunting have a little bit more complicated history – Body.graphic became untyped (*) because of my ask for this to Luca to provide Starling support in Nape, and now this untyped variable was the cause to start all this investigation, haha =)

Thanks once again for joining us and looking for it so deep, you rock!)
jpauclair permalink

Oh focus I forgot to include your name in there.
And I also forgot your early test on Starling with Nape!!
Sorry for that.

Thanks for the context clarification 😉
- focus permalink
  
  Haha, it’s OK, thank you! It’s more important you found a source of this bug and it is now in Adobe bugbase, thanks to you and Luca!
heitara permalink

Good catch ! Well done.
Damian permalink

Excellent work JP! I’ll have to take a look at where I use untyped variables now 😀
Philippe permalink

I guess this means all those tween engines, dynamically tweening properties, suffer from this problem too.

For Nape could we imagine to provide an “update” function which would then explicitly update DisplayObjects or Starling DisplayObjects?
- focus permalink
  
  Hey, Philippe! Nape already has updateGraphic callback – we discussed that with Luca tomorrow – there is one path to avoid this bug and keep Nape compatible with both native and Starling DL – kust to cast internally Body.graphic to the native Flash DisplayObject and move\rotate it, and don’t cast\move\rotate it if updateGraphic is set, assuming user will cast it to needed type and move\rotate it himself right inside the updateGraphic callback function.
- focus permalink
  
  Tweening engines will not suffer from it if they cast that untyped object internally to the needed type. Should check it in sources!)
  - Philippe permalink
    
    Well precisely tween engines don’t usually do that 🙂
    
    Maybe BetweenAS3 does that, which would explain why it has such a good memory management but a large number of classes.
Nicolas permalink

This is what you call “boxing” in VM design : because it’s not strictly typed, you need to know from its sole memory address/content what is the actual type. So you need to allocate a “memory box” (hence boxing) that will both store the value and help you know its actual type.

Most of the dynamic VMs perform boxing for most of the types. For instance the NekoVM uses 31-bits unboxed integers (last bit always 1 to differentiate between actual pointers and int values), and all floats are boxed using 4 bytes of header (store the type) + 8 bytes of data (the actual double).
Nicolas permalink

PS : it’s not an actual memory leak, but higher memory allocation will make the GC collect less often (if it has a dynamic threshold) which leads to more global memory usage
- jpauclair permalink
  
  I know. But to be very honest, “Leak” drive a lot more ppl to read the post.
  Désolé!
  - itamt permalink
    
    oh…
- jpauclair permalink
  
  I corrected the statement in the introduction. You are right that I should not mislead ppl on what it is in reality.
Jackson Dunstan permalink

Wow, great post! This should serve as yet-another reason to use strictly-typed variables!
Rahul Kamra permalink

Hey ,Awesome Post , i was wondering which flash player version you used . I tried to reproduce the leak but couldn’t .
PrimaryFeather permalink

Very interesting article, thanks for hunting this down! Wow, I didn’t even know that it’s possible to debug that deeply into the Flash runtime! 😉

Hm … unfortunately I can’t think on anything on the Starling side that would allow us to circumvent this. We’d need some kind of interface that is available to both libraries, and I don’t think there is something like that.

Any other ideas?
- ncannasse permalink
  
  you could create a specialized adapter for each class that would hold an instance of it and forward changes on x/y properties.
- focus permalink
  
  Luca already solved it internally in Nape, if you’ll set the Body.graphic and Body.graphicUpdate will be null, Nape will cast Body.graphic to the native DisplayObject and move\rotate it, but if graphicUpdate will be not null, Nape will not do with Body.graphic anything giving a dev to make it himself in the graphicUpdate callback. I think it’s pretty easy to use it now with Starling – almost as before, only thing you should consider now – is to cast Body.graphic to the Starling DisplayObject before moving\rotating it to avoid this memory exploding.
  - PrimaryFeather permalink
    
    Aha, that sounds just perfect! You’re right, that’s a great solution — thanks, Luca!
yonatan permalink

Very interesting read. You might want to talk to tamarin’s devs about this, they have an irc channel (irc://irc.mozilla.org/tamarin) and a mailing list. It’s usually more productive than going through adobe’s bugbase.
AllenShen permalink

This article is really amazing!!! i have been an as3er for more than 2 years and i always know that it’s not good to use non-strictly typed object, but i am never aware of the consequence. I am working on the memory leak problem of our project and i do think this post will help a lot.
Thanks again!
Manoel Neto permalink

I can confirm the same happens if you try to write to a property dynamically. The following code causes the memory graph to go haywire:

var foo : Rectangle = new Rectangle();
for (var i:int = 0; i < 100000; i++) {
foo["x"] = 3.3;
}

While this code does zero allocations:

var foo : Rectangle = new Rectangle();
for (var i:int = 0; i < 100000; i++) {
foo.x = 3.3;
}

This is a massive problem for me, since I developed an observer/inspector-based interface framework and use lots of dynamic property accesses to read from observed objects and write into UI components.
liangbowen permalink

Damn! I’m going to search all “:*” in my project.

Trackbacks & Pingbacks

Comments are closed.

Epic Flash memory leak track down

The Context

So I though it was going to be simple

Simplifying the context

The ByteCode:

A strange behaviour

Intermediate representation (MIR)

Digging deeper (The MIR)!

sight… It’s not over yet!

Digging even Deeper (Tamarin Core)!

Conclusion

Solutions?

References:

Trackbacks & Pingbacks

Search

Top Posts

Page view

Cloud

Blogroll

My Stuff

RSS

Email Subscription

Meta

Epic Flash memory leak track down

The Context

So I though it was going to be simple

Simplifying the context

The ByteCode:

A strange behaviour

Intermediate representation (MIR)

Digging deeper (The MIR)!

*sight*… It’s not over yet!

Digging even Deeper (Tamarin Core)!

Conclusion

Solutions?

References:

Sharing is good!

Related

Trackbacks & Pingbacks

Search

Top Posts

Page view

Cloud

Blogroll

My Stuff

RSS

Email Subscription

Meta

sight… It’s not over yet!