Optimize last_modified tracking #950
Ready to merge on my end!
almarklein
left a comment
Epic work!
Great! I am a firm believer in the approach of "traversing the scene graph before rendering and actively updating the scene's transform matrices."
I agree, I think that's where we are now.
Nice to hear that. I have had an idea for a while: during each frame's render (in the renderer's render() method), when we traverse the scene graph to update the world matrices of the nodes, could we completely bypass the finer-grained, flag_update()-based propagation mechanism of RecursiveTransform? We would simply start from the root node (the scene object) and recursively update the world matrix of each child node.

You might think that for static scenes this update is redundant (an unnecessary 4x4 matrix multiplication). However, for nearly any dynamic scene, this approach should be faster than one that has to track the update propagation state of the world transform. It incurs almost no additional overhead: it's just matrix multiplication. Since the update starts at the root node and proceeds downward, the number of multiplications is fixed at exactly one per node, and a 4x4 matrix multiplication is extremely fast.

I even suspect that a single 4x4 matrix multiplication may cost less than the overhead of RecursiveTransform automatically tracking whether the world matrix needs to be updated. If that's the case, then even for static scenes the check itself might cost enough to offset the multiplication it saves. It's like computing c = a + b: we can just perform the addition, without tracking whether a and b have changed or deciding whether c needs to be recomputed.

I will try to do some testing and verification when I have time.
In addition, if the user knows that the scene is static (or wants full control over when the transform matrices are updated), we could also provide an option to disable automatic updates of the world matrix.
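The root-down update proposed above can be sketched in a few lines. This is only an illustration under stated assumptions, not pygfx code: `Node`, `matmul4`, `translation` and `update_world_matrices` are hypothetical names, and plain nested lists stand in for the 4x4 matrices.

```python
# Hypothetical sketch: unconditional root-down world matrix update.
# None of these names come from pygfx.

IDENTITY = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]


def matmul4(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def translation(x, y, z):
    """Build a 4x4 translation matrix."""
    m = [row[:] for row in IDENTITY]
    m[0][3], m[1][3], m[2][3] = x, y, z
    return m


class Node:
    """Minimal scene-graph node with a local and a world matrix."""

    def __init__(self, local=None):
        self.local = local if local is not None else IDENTITY
        self.world = IDENTITY
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child


def update_world_matrices(root):
    """Recompute every world matrix; exactly one multiplication per node."""
    multiplications = 0
    stack = [(root, IDENTITY)]
    while stack:
        node, parent_world = stack.pop()
        node.world = matmul4(parent_world, node.local)
        multiplications += 1
        stack.extend((child, node.world) for child in node.children)
    return multiplications


scene = Node()
arm = scene.add(Node(translation(1, 0, 0)))
hand = arm.add(Node(translation(0, 2, 0)))
count = update_world_matrices(scene)
print(count, [row[3] for row in hand.world[:3]])
```

There is no bookkeeping at all here, which is the appeal of the proposal; the price is that a fully static scene still pays one multiplication per node per frame.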
I respectfully disagree. In this PR, the overhead of flag_update has been eliminated entirely: flag_update does not propagate anymore. It now works exactly as you describe; world matrices are only computed when a frame is rendered, by traversing the scene graph. And I have one more PR to go to make it even more efficient.
Yes, I'm aware of that. However, right now, when traversing the scene graph before rendering, we first need to determine whether the world matrix of each object needs to be updated, and only update those that actually need it. The problem is that checking whether the world matrix needs updating isn't as simple or lightweight as it seems, because you have to check whether the transforms of any of its ancestor nodes have changed. I suspect that this check may cost no less than simply performing a single 4x4 matrix multiplication. So my thought is that it may be better not to worry about whether the world matrix needs to be updated, and just update from the root node down; it might be simpler and more efficient.
I see. Well, I think that you have predicted my next move almost exactly, so I'll ask you to wait for the next PR to come. I will also measure the difference with your proposal!
Maybe I'm stating the obvious here, but IIUC the plan is that during the scene-graph traversal, you check the flag (without any propagation), and when it is dirty, you update that node's matrix and, from there, the matrices of all its children (and their children, etc.). So you kind of have the best of both worlds.
That's the plan. The tricky bit is working around the cache decorator in this scenario. But it's not impossible. :)
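The plan discussed above (check a local flag during traversal, and once a node is dirty, recompute it and everything below it) might look roughly like this. It is a sketch under assumptions, not the implementation from the follow-up PR: `Node`, `set_local` and `update_dirty` are hypothetical names, and the interaction with the cache decorator mentioned above is left out.

```python
# Hypothetical sketch: dirty flags are set locally and never propagated;
# the traversal carries "an ancestor changed" down the tree instead.

IDENTITY = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]


def matmul4(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def translation(x, y, z):
    """Build a 4x4 translation matrix."""
    m = [row[:] for row in IDENTITY]
    m[0][3], m[1][3], m[2][3] = x, y, z
    return m


class Node:
    def __init__(self, local=None):
        self.local = local if local is not None else IDENTITY
        self.world = IDENTITY
        self.dirty = True  # a new node needs its first update
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

    def set_local(self, matrix):
        self.local = matrix
        self.dirty = True  # flag only this node; no propagation


def update_dirty(root):
    """Recompute world matrices only for dirty nodes and their descendants."""
    recomputed = 0
    stack = [(root, IDENTITY, False)]
    while stack:
        node, parent_world, ancestor_changed = stack.pop()
        changed = ancestor_changed or node.dirty
        if changed:
            node.world = matmul4(parent_world, node.local)
            node.dirty = False
            recomputed += 1
        for child in node.children:
            stack.append((child, node.world, changed))
    return recomputed


scene = Node()
a = scene.add(Node(translation(1, 0, 0)))
a1 = a.add(Node(translation(0, 1, 0)))
b = scene.add(Node(translation(0, 0, 3)))

first = update_dirty(scene)         # every node is new, so all are computed
a.set_local(translation(5, 0, 0))
second = update_dirty(scene)        # only a and its child a1 are recomputed
third = update_dirty(scene)         # nothing changed, nothing recomputed
print(first, second, third)
```

The per-node check is a single boolean test, so a static scene costs one traversal and no multiplications, while a changed subtree costs one multiplication per affected node.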
Another massive performance boost! The skinning animation example runs at 180 fps on my machine now.
- on_update is removed from the Transform API
- last_modified has its own specialized implementation in AffineTransform, RecursiveTransform and Camera, to be as simple as it can be in each case
- flag_update no longer immediately propagates; instead, propagation happens lazily when a cache attribute is accessed, in RecursiveTransform.last_modified and Camera.last_modified (and not in AffineTransform)
- the cache decorator only accesses last_modified once instead of three times
- _world_last_modified flag, which is used to track if the buffers need an update when a frame is rendered

Next up: I am working on another improvement to eliminate redundant propagation of last_modified, but I will do that in a separate pull request.
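To make the lazy propagation and the single last_modified read concrete, here is a minimal sketch. It is an assumption-laden illustration and not the pygfx implementation: the `cached` decorator, the `Transform` class and the scalar `offset` (standing in for a matrix) are all hypothetical.

```python
# Hypothetical sketch: flag_update() only bumps a local stamp; a child's
# last_modified looks up the parent chain lazily, when a cached value is read.
import itertools

_clock = itertools.count(1)  # monotonically increasing modification stamps


def cached(compute):
    """Cache a computed attribute until the owner's last_modified changes."""
    name = compute.__name__

    def getter(self):
        stamp = self.last_modified  # read exactly once per access
        entry = self._cache.get(name)
        if entry is None or entry[0] != stamp:
            entry = (stamp, compute(self))
            self._cache[name] = entry
        return entry[1]

    return property(getter)


class Transform:
    def __init__(self, parent=None, offset=0.0):
        self.parent = parent
        self._offset = offset
        self._own_stamp = next(_clock)
        self._cache = {}
        self.compute_calls = 0  # instrumentation for the example only

    def flag_update(self):
        self._own_stamp = next(_clock)  # no propagation to children

    @property
    def last_modified(self):
        if self.parent is None:
            return self._own_stamp
        return max(self._own_stamp, self.parent.last_modified)

    @property
    def offset(self):
        return self._offset

    @offset.setter
    def offset(self, value):
        self._offset = value
        self.flag_update()

    @cached
    def world_offset(self):
        self.compute_calls += 1
        base = self.parent.world_offset if self.parent is not None else 0.0
        return base + self._offset


root = Transform(offset=1.0)
child = Transform(parent=root, offset=2.0)

before = child.world_offset   # computed once
again = child.world_offset    # served from the cache
root.offset = 10.0            # bumps only root's own stamp
after = child.world_offset    # child notices the change lazily and recomputes
print(before, again, after, child.compute_calls)
```

Note that in this sketch every read of last_modified still walks up to the root, which illustrates why redundant propagation of last_modified is worth eliminating in a follow-up.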
Cool detail: with this PR, the Transform API is finally no longer at the top of the cProfile report!
So RecursiveTransform.last_modified is called ~460k times in one full animation run, and 4M times if you include recursive calls! It's definitely worthwhile to keep optimizing this, I think; it may be pygfx's hottest code path!
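Call counts like these can be read from a cProfile run, which reports both primitive (non-recursive) calls and total calls including recursion. The sketch below shows one way to extract the two numbers for a recursive function; it is not the script used for the measurement above, and `Node`, `last_modified` and `call_counts` are hypothetical stand-ins.

```python
# Hypothetical sketch: count primitive vs. total calls with cProfile/pstats.
import cProfile
import pstats


class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.own_stamp = 0


def last_modified(node):
    """Recursive stand-in for a last_modified lookup up the parent chain."""
    if node.parent is None:
        return node.own_stamp
    return max(node.own_stamp, last_modified(node.parent))


def call_counts(profiler, func_name):
    """Return (primitive_calls, total_calls) for functions named func_name."""
    stats = pstats.Stats(profiler)
    primitive = total = 0
    for (_file, _line, name), (cc, nc, _tt, _ct, _callers) in stats.stats.items():
        if name == func_name:
            primitive += cc  # calls not made via recursion
            total += nc      # all calls, recursive ones included
    return primitive, total


root = Node()
middle = Node(root)
leaf = Node(middle)

profiler = cProfile.Profile()
profiler.enable()
for _ in range(100):
    last_modified(leaf)
profiler.disable()

primitive, total = call_counts(profiler, "last_modified")
print(primitive, total)
```

For this three-node chain, each outer call recurses twice, so 100 primitive calls correspond to 300 calls in total; the gap between the two numbers grows with the depth of the hierarchy.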