Properly implement Ueda and Yamashita's regularized Newton method #136
No reviewers
Labels
No labels
bug
design
duplicate
engine
enhancement
maintenance
prospective
question
regression
stub
todo
ui
wontfix
No milestone
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
StudioInfinity/dyna3!136
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "Vectornaut/dyna3:grad-norm-reg"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Goals
Primary
This pull request addresses issue #130.
Secondary
This pull request also:
Notes
Uniform regularization of Newton's method depends on a choice of metric on the configuration space. On the branch to be merged, we keep using the Euclidean metric associated with the computational basis. That means this pull request doesn't address issue #131.
I did test some versions of the branch to be merged that use other metrics, including completions of the Gram partial metric described in issue #131. I found that this branch was among the best in terms of realization quality and number of steps before convergence! I also realized that the computational metric is more geometrically meaningful than I'd appreciated at first, as explained in this comment.
The new version adds a gradient term even if the existing Hessian has all strictly positive eigenvalues, whereas the previous one doesn't change the Hessian if all its eigenvalues are strictly positive. That's intentional and not harmful to convergence in the favorable case when the Hessian starts out truly positive definite all by itself?
Yes, I think Ueda and Yamashita intended for this to happen, based on my reading of Section 2 of their paper. My guess is that it helps with stability when the Hessian is positive-definite but has an extremely small eigenvalue. (I can explain that intuition more if you'd like.)
Yes I get that. I guess I was expecting that when the minimum eigenvalue exceeded some threshold, we'd drop the gradient term?
With our current design, I don't think it would make a difference, because I don't think the minimum eigenvalue of the Hessian can be greater than zero (up to numerical error). This is because the loss stays constant when the conformal group acts on the assembly.
I've confirmed that the engine's behavior doesn't noticeably change when we multiply the gradient norm term by a simple cutoff that drops linearly from 1 to 0 as the minimum eigenvalue goes from 1 to 2.
Since using a cutoff adds parameters, makes the code more complex, and deviates from Ueda and Yamashita's paper, I'm not keen on doing it unless it has a demonstrable advantage.
OK, good enough. I will do final review of this PR.
@ -465,0 +478,4 @@-reg_scale * min_eigval.min(0.0)+ GRAD_REG_SCALE * grad_norm_squared.powf(0.25)) * DMatrix::identity(total_dim, total_dim);history.hess_eigvals.push(hess_eigvals);Please just read this over once and see if you find the computations of the terms and pushing to the history a bit intermixed. I'd be inclined to compute the hess_eigvals and push them to the history, then compute the
most_neg_eigvalashess_eigvals.min().min(0.0), then compute thesqrt_grad_normasneg_grad_stacked.norm_squared().powf(0.25), and finally thehess_regas&hess + DMatrix::identity(total_dim, total_dim) * (-reg_scale * most_neg_eigval + GRAD_REG_SCALE * sqrt_grad_norm)). No change is necessary, only modify if you agree some rearrangement would be clearer, then let me know either way. Thanks!So far, I've been putting each history push at the end of the relevant paragraph (except for the paragraph that ends with
if state.loss < tol { break; }). It'd be nice to switch to a more thoughtful organization at some point, but I'd save that for a general engine cleanup pull request.Note that pushing the eigenvalue list to the history moves its ownership, so pushing the eigenvalue list before computing the minimum eigenvalue isn't quite as straightforward as just reordering the lines.
OK, that sounds like you'd prefer to leave this whole block of code as it is. I will go ahead and merge this PR assuming it runs alright for me.
This is a truly unfortunate aspect of rust... if we weren't well steeped in it now, I'd be inclined to veto it on this basis. Being able easily to do trivial-seeming transformations on code to create the clearest organization/easiest reading of the code outweighs some level of airtight safety/performance considerations for the type of project we're writing, I would say.
Wow, the dodecahedron example works much better! I wasn't able to get any of the tangencies to "invert" except by going through configurations where one became a plane. (I was also able to show dyna3 off to Steve Trettel, who was impressed.)
Quick question: if you make all of the spheres tangent in the dodecahedron example, what's the expected number of curvatures you can set before the configuration becomes rigid? is it 3?
Anyhow, really nice work! Merging.
Yes, it should be three, because that's the dimension of the Möbius group [edit: minus the dimension of the rotation subgroup].
Given the usefulness of this problem so far, I suppose I should add an entry on circle packings to the test problem list. That would be a good place to mention the expected number of degrees of freedom.
@Vectornaut wrote in #136 (comment):
Yes, I think that would be a good idea. And thanks for pointing out the connection between the packings and the Möbius group. I was also trying to count the dimension by a simplistic "degrees of freedom" argument: a circle on a sphere has 3DOF (location of center and radius). A tangency between circles removes 1DOF, and setting the radius of a circle removes 1DOF. So 12 circles minus 30 tangencies yields 6 DOF, two of which are just rotations of the entire sphere. If setting the radii of 3 spheres makes it now rigid up to rotations of the whole sphere, where did the other DOF go, so to speak?
Whoops, I misspoke! The Möbius group has dimension six, as suggested by your dimension-counting calculation. Fixing the radii of enough spheres cuts the packing's symmetry group down to the rotation group, which has dimension three (not two). This suggests that fixing the radii of three spheres should typically be enough to make the packing rigid.
Ah, I see -- if I think about four circles, it's clearer. Three of them completely determine the fourth. so if I set their three radii and that they are all four tangent, that seems like only losing 9DOF, but in fact the radius of the fourth is also determined, so it has no degrees of freedom left, i.e. we are down 10DOF, leaving only the two for rotating the whole sphere. Same thing going on with 12 circles. No need to respond further. Thanks!
@glen wrote in #136 (comment):
Double oops -- I was just wrong about the dimensionality of the rotation group, thanks for setting me straight. So there is no "extra" DOF lost. I managed to convince myself of something false just by being so sure it had to be true because of miscounting the rotation group. A bit disappointed in myself ;-)