Properly implement Ueda and Yamashita's regularized Newton method #136

Merged
glen merged 2 commits from Vectornaut/dyna3:grad-norm-reg into main 2026-02-11 15:48:24 +00:00
Member

Goals

Primary

This pull request addresses issue #130.

Secondary

This pull request also:

  • Switches to regularizing the Hessian after we project out the frozen variables, instead of before. This is probably what we should've been doing all along.
  • Clarifies the comment about our handling of frozen entries. When we restrict the computation of the Newton step to the subspace where the velocities of the frozen entries are zero, the outcome shouldn't depend on any implicit choice of complement for that subspace. The new comment points this out.

Notes

Uniform regularization of Newton's method depends on a choice of metric on the configuration space. On the branch to be merged, we keep using the Euclidean metric associated with the computational basis. That means this pull request doesn't address issue #131.

I did test some versions of the branch to be merged that use other metrics, including completions of the Gram partial metric described in issue #131. I found that this branch was among the best in terms of realization quality and number of steps before convergence! I also realized that the computational metric is more geometrically meaningful than I'd appreciated at first, as explained in this comment.

## Goals ### Primary This pull request addresses issue #130. ### Secondary This pull request also: - Switches to regularizing the Hessian after we project out the frozen variables, instead of before. This is probably what we should've been doing all along. - Clarifies the comment about our handling of frozen entries. When we restrict the computation of the Newton step to the subspace where the velocities of the frozen entries are zero, the outcome shouldn't depend on any implicit choice of complement for that subspace. The new comment points this out. ## Notes Uniform regularization of Newton's method depends on a choice of metric on the configuration space. On the branch to be merged, we keep using the Euclidean metric associated with the computational basis. That means this pull request doesn't address issue #131. I did test some versions of the branch to be merged that use other metrics, including completions of the Gram partial metric described in issue #131. I found that this branch was among the best in terms of realization quality and number of steps before convergence! I also realized that the computational metric is more geometrically meaningful than I'd appreciated at first, as explained in [this comment](issues/131#issuecomment-3637).
In the process, switch to regularizing the Hessian after we project out
the frozen variables, instead of before. This is probably what we
should've been doing all along.
Clarify comment on frozen velocity restriction
All checks were successful
/ test (pull_request) Successful in 3m42s
7193f13601
Note that the restriction doesn't depend on a choice of complement for
the subspace we're restricting to.
Owner

The new version adds a gradient term even if the existing Hessian has all strictly positive eigenvalues, whereas the previous one doesn't change the Hessian if all its eigenvalues are strictly positive. That's intentional and not harmful to convergence in the favorable case when the Hessian starts out truly positive definite all by itself?

The new version adds a gradient term even if the existing Hessian has all strictly positive eigenvalues, whereas the previous one doesn't change the Hessian if all its eigenvalues are strictly positive. That's intentional and not harmful to convergence in the favorable case when the Hessian starts out truly positive definite all by itself?
Author
Member

The new version adds a gradient term even if the existing Hessian has all strictly positive eigenvalues […]. That's intentional and not harmful to convergence in the favorable case when the Hessian starts out truly positive definite all by itself?

Yes, I think Ueda and Yamashita intended for this to happen, based on my reading of Section 2 of their paper. My guess is that it helps with stability when the Hessian is positive-definite but has an extremely small eigenvalue. (I can explain that intuition more if you'd like.)

> The new version adds a gradient term even if the existing Hessian has all strictly positive eigenvalues […]. That's intentional and not harmful to convergence in the favorable case when the Hessian starts out truly positive definite all by itself? Yes, I think Ueda and Yamashita intended for this to happen, based on my reading of Section 2 of their paper. My guess is that it helps with stability when the Hessian is positive-definite but has an extremely small eigenvalue. (I can explain that intuition more if you'd like.)
Owner

Yes I get that. I guess I was expecting that when the minimum eigenvalue exceeded some threshold, we'd drop the gradient term?

Yes I get that. I guess I was expecting that when the minimum eigenvalue exceeded some threshold, we'd drop the gradient term?
Author
Member

Yes I get that. I guess I was expecting that when the minimum eigenvalue exceeded some threshold, we'd drop the gradient term?

With our current design, I don't think it would make a difference, because I don't think the minimum eigenvalue of the Hessian can be greater than zero (up to numerical error). This is because the loss stays constant when the conformal group acts on the assembly.

I've confirmed that the engine's behavior doesn't noticeably change when we multiply the gradient norm term by a simple cutoff that drops linearly from 1 to 0 as the minimum eigenvalue goes from 1 to 2.

Since using a cutoff adds parameters, makes the code more complex, and deviates from Ueda and Yamashita's paper, I'm not keen on doing it unless it has a demonstrable advantage.

> Yes I get that. I guess I was expecting that when the minimum eigenvalue exceeded some threshold, we'd drop the gradient term? With our current design, I don't think it would make a difference, because I don't think the minimum eigenvalue of the Hessian can be greater than zero (up to numerical error). This is because the loss stays constant when the conformal group acts on the assembly. I've confirmed that the engine's behavior doesn't noticeably change when we multiply the gradient norm term by a simple cutoff that drops linearly from 1 to 0 as the minimum eigenvalue goes from 1 to 2. Since using a cutoff adds parameters, makes the code more complex, and deviates from Ueda and Yamashita's paper, I'm not keen on doing it unless it has a demonstrable advantage.
Owner

OK, good enough. I will do final review of this PR.

OK, good enough. I will do final review of this PR.
@ -465,0 +478,4 @@
-reg_scale * min_eigval.min(0.0)
+ GRAD_REG_SCALE * grad_norm_squared.powf(0.25)
) * DMatrix::identity(total_dim, total_dim);
history.hess_eigvals.push(hess_eigvals);
Owner

Please just read this over once and see if you find the computations of the terms and pushing to the history a bit intermixed. I'd be inclined to compute the hess_eigvals and push them to the history, then compute the most_neg_eigval as hess_eigvals.min().min(0.0), then compute the sqrt_grad_norm as neg_grad_stacked.norm_squared().powf(0.25), and finally the hess_reg as &hess + DMatrix::identity(total_dim, total_dim) * (-reg_scale * most_neg_eigval + GRAD_REG_SCALE * sqrt_grad_norm)). No change is necessary, only modify if you agree some rearrangement would be clearer, then let me know either way. Thanks!

Please just read this over once and see if you find the computations of the terms and pushing to the history a bit intermixed. I'd be inclined to compute the hess_eigvals and push them to the history, then compute the `most_neg_eigval` as `hess_eigvals.min().min(0.0)`, then compute the `sqrt_grad_norm` as `neg_grad_stacked.norm_squared().powf(0.25)`, and finally the `hess_reg` as `&hess + DMatrix::identity(total_dim, total_dim) * (-reg_scale * most_neg_eigval + GRAD_REG_SCALE * sqrt_grad_norm))`. No change is necessary, only modify if you agree some rearrangement would be clearer, then let me know either way. Thanks!
Author
Member

So far, I've been putting each history push at the end of the relevant paragraph (except for the paragraph that ends with if state.loss < tol { break; }). It'd be nice to switch to a more thoughtful organization at some point, but I'd save that for a general engine cleanup pull request.

Note that pushing the eigenvalue list to the history moves its ownership, so pushing the eigenvalue list before computing the minimum eigenvalue isn't quite as straightforward as just reordering the lines.

So far, I've been putting each history push at the end of the relevant paragraph (except for the paragraph that ends with `if state.loss < tol { break; }`). It'd be nice to switch to a more thoughtful organization at some point, but I'd save that for a general engine cleanup pull request. Note that pushing the eigenvalue list to the history [moves](https://doc.rust-lang.org/rust-by-example/scope/move.html) its ownership, so pushing the eigenvalue list before computing the minimum eigenvalue isn't quite as straightforward as just reordering the lines.
Owner

OK, that sounds like you'd prefer to leave this whole block of code as it is. I will go ahead and merge this PR assuming it runs alright for me.

pushing the eigenvalue list before computing the minimum eigenvalue isn't quite as straightforward as just reordering the lines.

This is a truly unfortunate aspect of rust... if we weren't well steeped in it now, I'd be inclined to veto it on this basis. Being able easily to do trivial-seeming transformations on code to create the clearest organization/easiest reading of the code outweighs some level of airtight safety/performance considerations for the type of project we're writing, I would say.

OK, that sounds like you'd prefer to leave this whole block of code as it is. I will go ahead and merge this PR assuming it runs alright for me. > pushing the eigenvalue list before computing the minimum eigenvalue isn't quite as straightforward as just reordering the lines. This is a truly unfortunate aspect of rust... if we weren't well steeped in it now, I'd be inclined to veto it on this basis. Being able easily to do trivial-seeming transformations on code to create the clearest organization/easiest reading of the code outweighs some level of airtight safety/performance considerations for the type of project we're writing, I would say.
Owner

Wow, the dodecahedron example works much better! I wasn't able to get any of the tangencies to "invert" except by going through configurations where one became a plane. (I was also able to show dyna3 off to Steve Trettel, who was impressed.)

Quick question: if you make all of the spheres tangent in the dodecahedron example, what's the expected number of curvatures you can set before the configuration becomes rigid? is it 3?

Anyhow, really nice work! Merging.

Wow, the dodecahedron example works much better! I wasn't able to get any of the tangencies to "invert" except by going through configurations where one became a plane. (I was also able to show dyna3 off to Steve Trettel, who was impressed.) Quick question: if you make all of the spheres tangent in the dodecahedron example, what's the expected number of curvatures you can set before the configuration becomes rigid? is it 3? Anyhow, really nice work! Merging.
glen merged commit d178cf8d6a into main 2026-02-11 15:48:24 +00:00
Author
Member

Quick question: if you make all of the spheres tangent in the dodecahedron example, what's the expected number of curvatures you can set before the configuration becomes rigid? is it 3?

Yes, it should be three, because that's the dimension of the Möbius group [edit: minus the dimension of the rotation subgroup].

Given the usefulness of this problem so far, I suppose I should add an entry on circle packings to the test problem list. That would be a good place to mention the expected number of degrees of freedom.

> Quick question: if you make all of the spheres tangent in the dodecahedron example, what's the expected number of curvatures you can set before the configuration becomes rigid? is it 3? Yes, it should be three, because that's the dimension of the Möbius group [**[edit:](https://code.studioinfinity.org/StudioInfinity/dyna3/pulls/136#issuecomment-3692)** minus the dimension of the rotation subgroup]. Given the usefulness of this problem so far, I suppose I should add an entry on circle packings to the [test problem list](https://code.studioinfinity.org/StudioInfinity/dyna3/wiki/Test-problems). That would be a good place to mention the expected number of degrees of freedom.
Owner

@Vectornaut wrote in #136 (comment):

Given the usefulness of this problem so far, I suppose I should add an entry on circle packings to the test problem list. That would be a good place to mention the expected number of degrees of freedom.

Yes, I think that would be a good idea. And thanks for pointing out the connection between the packings and the Möbius group. I was also trying to count the dimension by a simplistic "degrees of freedom" argument: a circle on a sphere has 3DOF (location of center and radius). A tangency between circles removes 1DOF, and setting the radius of a circle removes 1DOF. So 12 circles minus 30 tangencies yields 6 DOF, two of which are just rotations of the entire sphere. If setting the radii of 3 spheres makes it now rigid up to rotations of the whole sphere, where did the other DOF go, so to speak?

@Vectornaut wrote in https://code.studioinfinity.org/StudioInfinity/dyna3/pulls/136#issuecomment-3690: > Given the usefulness of this problem so far, I suppose I should add an entry on circle packings to the [test problem list](https://code.studioinfinity.org/StudioInfinity/dyna3/wiki/Test-problems). That would be a good place to mention the expected number of degrees of freedom. Yes, I think that would be a good idea. And thanks for pointing out the connection between the packings and the Möbius group. I was also trying to count the dimension by a simplistic "degrees of freedom" argument: a circle on a sphere has 3DOF (location of center and radius). A tangency between circles removes 1DOF, and setting the radius of a circle removes 1DOF. So 12 circles minus 30 tangencies yields 6 DOF, two of which are just rotations of the entire sphere. If setting the radii of 3 spheres makes it now rigid up to rotations of the whole sphere, where did the other DOF go, so to speak?
Author
Member

So 12 circles minus 30 tangencies yields 6 DOF, two of which are just rotations of the entire sphere. If setting the radii of 3 spheres makes it now rigid up to rotations of the whole sphere, where did the other DOF go, so to speak?

Whoops, I misspoke! The Möbius group has dimension six, as suggested by your dimension-counting calculation. Fixing the radii of enough spheres cuts the packing's symmetry group down to the rotation group, which has dimension three (not two). This suggests that fixing the radii of three spheres should typically be enough to make the packing rigid.

> So 12 circles minus 30 tangencies yields 6 DOF, two of which are just rotations of the entire sphere. If setting the radii of 3 spheres makes it now rigid up to rotations of the whole sphere, where did the other DOF go, so to speak? Whoops, I misspoke! The Möbius group has dimension six, as suggested by your dimension-counting calculation. Fixing the radii of enough spheres cuts the packing's symmetry group down to the rotation group, which has dimension three (not two). This suggests that fixing the radii of three spheres should typically be enough to make the packing rigid.
Owner

Ah, I see -- if I think about four circles, it's clearer. Three of them completely determine the fourth. so if I set their three radii and that they are all four tangent, that seems like only losing 9DOF, but in fact the radius of the fourth is also determined, so it has no degrees of freedom left, i.e. we are down 10DOF, leaving only the two for rotating the whole sphere. Same thing going on with 12 circles. No need to respond further. Thanks!

Ah, I see -- if I think about four circles, it's clearer. Three of them completely determine the fourth. so if I set their three radii and that they are all four tangent, that seems like only losing 9DOF, but in fact the _radius_ of the fourth is also determined, so it has no degrees of freedom left, i.e. we are down 10DOF, leaving only the two for rotating the whole sphere. Same thing going on with 12 circles. No need to respond further. Thanks!
Owner

@glen wrote in #136 (comment):

Ah, I see -- if I think about four circles, it's clearer. Three of them completely determine the fourth. so if I set their three radii and that they are all four tangent, that seems like only losing 9DOF, but in fact the radius of the fourth is also determined, so it has no degrees of freedom left, i.e. we are down 10DOF, leaving only the two for rotating the whole sphere. Same thing going on with 12 circles. No need to respond further. Thanks!

Double oops -- I was just wrong about the dimensionality of the rotation group, thanks for setting me straight. So there is no "extra" DOF lost. I managed to convince myself of something false just by being so sure it had to be true because of miscounting the rotation group. A bit disappointed in myself ;-)

@glen wrote in https://code.studioinfinity.org/StudioInfinity/dyna3/pulls/136#issuecomment-3693: > Ah, I see -- if I think about four circles, it's clearer. Three of them completely determine the fourth. so if I set their three radii and that they are all four tangent, that seems like only losing 9DOF, but in fact the _radius_ of the fourth is also determined, so it has no degrees of freedom left, i.e. we are down 10DOF, leaving only the two for rotating the whole sphere. Same thing going on with 12 circles. No need to respond further. Thanks! Double oops -- I was just wrong about the dimensionality of the rotation group, thanks for setting me straight. So there is no "extra" DOF lost. I managed to convince myself of something false just by being so sure it had to be true because of miscounting the rotation group. A bit disappointed in myself ;-)
Sign in to join this conversation.
No description provided.