@Korijn
Collaborator

@Korijn Korijn commented Jan 22, 2025

🚀 Another massive performance boost! The skinning animation example now runs at 180 fps on my machine.

  • The callback mechanism and weakrefs are completely removed from the Transform API. This required refactoring in a few areas, notably lighting, but it is a big improvement: light buffers are now updated only just before rendering, instead of on every transform update
    • Note: I didn't look much further into the lights than needed to make the tests pass
    • Note 2: ⚠️ This is technically a breaking change since on_update is removed from the Transform API
  • last_modified has its own specialized implementation in AffineTransform, RecursiveTransform and Camera, to be as simple as it can be in each case
  • flag_update no longer propagates immediately; instead, propagation happens lazily, when a cached attribute is accessed, in RecursiveTransform.last_modified and Camera.last_modified (but not in AffineTransform)
  • The cache decorator only accesses last_modified once instead of three times
  • WorldObjects have a _world_last_modified flag which is used to track if the buffers need an update when a frame is rendered
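
To make the bullets above concrete, here is a minimal sketch of a cache decorator that reads last_modified only once per access, while flag_update merely bumps a counter. The `cached` decorator, the `Transform` class and all of its attributes are hypothetical stand-ins for illustration, not pygfx's actual implementation.

```python
import functools


def cached(func):
    """Cache a property's value until `last_modified` changes (hypothetical helper)."""
    key = "_cache_" + func.__name__

    @functools.wraps(func)
    def wrapper(self):
        stamp = self.last_modified  # read exactly once per access
        entry = self.__dict__.get(key)
        if entry is not None and entry[0] == stamp:
            return entry[1]  # cache hit: nothing changed since last computation
        value = func(self)
        self.__dict__[key] = (stamp, value)
        return value

    return property(wrapper)


class Transform:
    """Toy transform holding only a translation; not pygfx's Transform."""

    def __init__(self):
        self._position = (0.0, 0.0, 0.0)
        self._last_modified = 0
        self.computations = 0  # counts how often the matrix is really rebuilt

    @property
    def last_modified(self):
        return self._last_modified

    def flag_update(self):
        # Only bump a counter: no callbacks, no propagation, no recomputation.
        self._last_modified += 1

    @property
    def position(self):
        return self._position

    @position.setter
    def position(self, value):
        self._position = tuple(value)
        self.flag_update()

    @cached
    def matrix(self):
        self.computations += 1
        x, y, z = self._position
        return (
            (1.0, 0.0, 0.0, x),
            (0.0, 1.0, 0.0, y),
            (0.0, 0.0, 1.0, z),
            (0.0, 0.0, 0.0, 1.0),
        )
```

In this sketch, setting `position` does no matrix work at all; the matrix is rebuilt only on the next read of `matrix`, and repeated reads without changes return the cached tuple.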

⏭️ Next up: I am working on another improvement to eliminate redundant propagation of last_modified, but I will do that in a separate pull request.

🥳 Cool detail: with this PR, the Transform API is finally no longer at the top of the cProfile report!

[image: cProfile report]

ncalls: Total number of calls to the function. If there are two numbers, that means the function recursed and the first is the total number of calls and the second is the number of primitive (non-recursive) calls.

So RecursiveTransform.last_modified is called ~460k times in one full animation run, and 4m times if you include recursive calls! 😱 It's definitely worthwhile to keep optimizing this, I think 😁 it may be pygfx's hottest code path!
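
The two ncalls numbers can be reproduced with a small, self-contained profile run. The sketch below assumes a toy `Node` class whose last_modified recurses up a parent chain; it is not pygfx code. It reads the counts from the `stats` dictionary of `pstats.Stats`, whose tuple layout is a CPython implementation detail.

```python
import cProfile
import pstats


class Node:
    """Toy scene-graph node (hypothetical); only tracks a timestamp."""

    def __init__(self, parent=None):
        self.parent = parent
        self.stamp = 0

    def last_modified(self):
        # Recurse up the parent chain: one call per ancestor.
        if self.parent is None:
            return self.stamp
        return max(self.stamp, self.parent.last_modified())


def build_chain(depth):
    """Return the leaf of a parent chain with `depth` nodes."""
    node = Node()
    for _ in range(depth - 1):
        node = Node(parent=node)
    return node


leaf = build_chain(10)

profiler = cProfile.Profile()
profiler.enable()
for _ in range(100):
    leaf.last_modified()
profiler.disable()

stats = pstats.Stats(profiler)

# Each entry maps (filename, lineno, funcname) to
# (primitive calls, total calls, tottime, cumtime, callers).
primitive_calls = total_calls = None
for (_filename, _lineno, name), (cc, nc, _tt, _ct, _callers) in stats.stats.items():
    if name == "last_modified":
        primitive_calls, total_calls = cc, nc
```

With a chain of 10 nodes queried 100 times, the profiler records 1000 total calls against 100 primitive calls, which `print_stats` shows as 1000/100 in the ncalls column.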

@Korijn Korijn requested a review from almarklein January 23, 2025 22:07
@Korijn Korijn force-pushed the transform-profiling branch from 41ca27e to d19369b Compare January 23, 2025 22:08
@Korijn Korijn marked this pull request as ready for review January 23, 2025 22:10
@Korijn Korijn force-pushed the transform-profiling branch from 518f336 to 7643b68 Compare January 23, 2025 22:21
@Korijn
Collaborator Author

Korijn commented Jan 23, 2025

Ready to merge on my end!

@Korijn Korijn force-pushed the transform-profiling branch from 7643b68 to fd217e5 Compare January 24, 2025 08:41
Member

@almarklein almarklein left a comment

Epic work 😄 🚀

@Korijn Korijn merged commit 3f3b45d into main Jan 24, 2025
14 checks passed
@Korijn Korijn deleted the transform-profiling branch January 24, 2025 08:59
@panxinmiao
Contributor

Great! 🚀

I am a firm believer in the approach of "traversing the scene graph before rendering and actively updating the scene's Transform Matrix." 😄

@Korijn
Collaborator Author

Korijn commented Jan 24, 2025

> Great! 🚀
>
> I am a firm believer in the approach of "traversing the scene graph before rendering and actively updating the scene's Transform Matrix." 😄

I agree, I think that's where we are now 😁

@panxinmiao
Contributor

panxinmiao commented Jan 25, 2025

> ⏭️ Next up: I am working on another improvement to eliminate redundant propagation of last_modified, but I will do that in a separate pull request.

Nice to hear that 😄

I have always had an idea:

During each frame's render (in the renderer's render() method), when we traverse the scene graph to update the world matrices of nodes, is there a way to completely bypass the finer-grained propagation mechanism of RecursiveTransform based on flag_update()?

In this case, we only need to simply start from the root node (the scene object) and recursively update the world matrix for each child node.

You might think that for static scenes, updating the world matrix here is redundant (Unnecessary 4x4 matrix multiplication). However, for nearly any dynamic scene, this approach is undoubtedly faster than the one that requires tracking the update propagation state of the world transform.

This method incurs almost no additional overhead: it's just matrix multiplication. Furthermore, since the update starts from the root node and proceeds downward, the number of matrix multiplications is fixed. For each node in the scene, matrix multiplication occurs only once, and 4x4 matrix multiplication is extremely fast.

I even suspect that the cost of performing a single 4x4 matrix multiplication might be smaller than the extra performance overhead caused by RecursiveTransform automatically tracking "whether the world matrix needs to be updated." If that's the case, even for static scenes, where the world matrix doesn't need updating, the act of "checking whether the world matrix needs to be updated" might itself introduce enough performance overhead to counterbalance the cost of matrix multiplication.

If that's the case, it's like saying, "If we need to compute c = a + b, we only need to perform the addition to get the value of c, without needing to track whether a and b have changed, nor determining whether we need to recompute c."

I will try to do some testing and verification when I have time.
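
The unconditional top-down update proposed in this comment can be sketched as follows. This is only an illustration under stated assumptions: the `Node` class, `update_world`, and the pure-Python `matmul4` helper are hypothetical and are not part of pygfx, which uses its own transform classes.

```python
def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested tuples (row-major)."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4))
        for i in range(4)
    )


IDENTITY = tuple(
    tuple(1.0 if i == j else 0.0 for j in range(4)) for i in range(4)
)


def translation(x, y, z):
    """Build a 4x4 translation matrix."""
    return (
        (1.0, 0.0, 0.0, float(x)),
        (0.0, 1.0, 0.0, float(y)),
        (0.0, 0.0, 1.0, float(z)),
        (0.0, 0.0, 0.0, 1.0),
    )


class Node:
    """Toy scene-graph node with a local and a world matrix (hypothetical)."""

    def __init__(self, local, children=()):
        self.local = local
        self.world = IDENTITY
        self.children = list(children)


def update_world(node, parent_world=IDENTITY):
    """Recompute every world matrix, root first, with no change tracking.

    Returns the number of matrix multiplications, which always equals
    the number of nodes in the subtree.
    """
    node.world = matmul4(parent_world, node.local)
    count = 1
    for child in node.children:
        count += update_world(child, node.world)
    return count
```

Calling `update_world(root)` once per frame performs exactly one multiplication per node, regardless of which transforms changed since the previous frame.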

@panxinmiao
Contributor

> You might think that for static scenes, updating the world matrix here is redundant (Unnecessary 4x4 matrix multiplication). However, for nearly any dynamic scene, this approach is undoubtedly faster than the one that requires tracking the update propagation state of the world transform.

In addition, if the user determines that it is a static scene (or if the user wants to fully control the update timing of the Transform matrix), we can also provide an option to disable automatic updates of the world matrix.

@Korijn
Collaborator Author

Korijn commented Jan 25, 2025

I respectfully disagree. In this PR, the overhead of flag_update has been eliminated entirely: flag_update no longer propagates.

It's now working exactly as you describe, only when a frame is rendered are world matrices computed by traversing the scene graph.

And I have one more PR to go to make it even more efficient.

@panxinmiao
Contributor

panxinmiao commented Jan 25, 2025

> It's now working exactly as you describe, only when a frame is rendered are world matrices computed by traversing the scene graph.

Yes, I'm aware of that.

However, right now, when traversing the scene graph before rendering, we first need to determine whether the world matrix of each object needs to be updated, and only update those that actually need it. The problem is that checking whether the world matrix needs updating isn't as simple or lightweight as it seems (because you have to check whether the transformations of all its ancestor nodes have changed). I suspect that the cost of this check might not be smaller than the overhead of simply performing a single 4x4 matrix multiplication.

So, my thought is that maybe it's better not to worry too much about whether the world matrix needs to be updated, and just update from the root node down. It might be simpler and more efficient.

@Korijn
Collaborator Author

Korijn commented Jan 25, 2025

I see. Well, I think that you have predicted my next move almost exactly, so I'll ask you to wait for the next PR to come.

I will also measure the difference with your proposal!

@almarklein
Member

Maybe I'm stating the obvious here, but IIUC the plan is that during the scene-graph traversal, you check the flag (without any propagation), and when it is dirty, you update the matrix and, from that, the matrices of all its children (and their children, etc.). So you kind of have the best of both worlds.
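
A minimal sketch of that plan, assuming a hypothetical `Node` class with a purely local `dirty` flag (none of these names come from pygfx): the traversal recomputes a world matrix only when the node itself is dirty or one of its ancestors was recomputed during the same pass.

```python
def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested tuples (row-major)."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4))
        for i in range(4)
    )


IDENTITY = tuple(
    tuple(1.0 if i == j else 0.0 for j in range(4)) for i in range(4)
)


def translation(x, y, z):
    """Build a 4x4 translation matrix."""
    return (
        (1.0, 0.0, 0.0, float(x)),
        (0.0, 1.0, 0.0, float(y)),
        (0.0, 0.0, 1.0, float(z)),
        (0.0, 0.0, 0.0, 1.0),
    )


class Node:
    """Toy scene-graph node with a local dirty flag (hypothetical)."""

    def __init__(self, local, children=()):
        self.local = local
        self.world = IDENTITY
        self.dirty = True  # set locally; never pushed to children eagerly
        self.children = list(children)

    def set_local(self, local):
        self.local = local
        self.dirty = True  # O(1): no propagation at write time


def update_world(node, parent_world=IDENTITY, parent_changed=False):
    """Traverse top-down; recompute only dirty nodes and their descendants.

    Returns the number of world matrices that were recomputed.
    """
    changed = parent_changed or node.dirty
    recomputed = 0
    if changed:
        node.world = matmul4(parent_world, node.local)
        node.dirty = False
        recomputed = 1
    for child in node.children:
        recomputed += update_world(child, node.world, changed)
    return recomputed
```

The dirtiness is carried down as a function argument during the traversal itself, so a static scene costs one flag check per node and a moved node costs one multiplication for itself and each of its descendants.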

@Korijn
Collaborator Author

Korijn commented Jan 25, 2025

> Maybe I'm stating the obvious here, but IIUC the plan is that during the scene-graph traversal, you check the flag (without any propagation), and when it is dirty, you update the matrix and, from that, the matrices of all its children (and their children, etc.). So you kind of have the best of both worlds.

That's the plan.

The tricky bit is working around the cache decorator in this scenario. But it's not impossible. :)
