Integration test for simple example #45


Draft - wants to merge 11 commits into main
Conversation

willGraham01 (Collaborator)

No description provided.

@willGraham01 (Collaborator, Author)

Writing myself a comment here so I don't forget (may transfer to an issue).

A problem that we're now hitting in the "optimisation" step is that the randomness we introduce when drawing samples / computing expectations upsets gradient-based methods. This is coupled with the fact that jax apparently cannot do constrained optimisation (they recommend jaxopt, which is a pre-v1 package).

Our problem - at least analytically - is deterministic: find parameters $\Theta$ that maximise some causal estimand $\sigma(\Theta)$ subject to some constraints $\phi(\Theta)$ being within $\epsilon$-tolerance of some observed data $\hat{\phi}$. Note that this is NOT Maximum Likelihood Estimation (we don't want to find the $\Theta$ that maximises the chances of observing $\hat{\phi}$; we want to maximise $\sigma$ whilst keeping $\hat{\phi}$ an $\epsilon$-viable outcome). Note also that $\sigma$ and $\phi$ have well-defined $\Theta$-gradients (Jacobians) and Hessians, again at least analytically.
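Written out explicitly (my notation, assuming a norm-based tolerance on the constraints), the problem reads:

```latex
\max_{\Theta} \; \sigma(\Theta)
\quad \text{subject to} \quad
\left\| \phi(\Theta) - \hat{\phi} \right\| \leq \epsilon .
```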

The difficulty is that we currently have no way to evaluate $\sigma$ or $\phi$ analytically. Right now we rely on methods that draw samples from the underlying distributions and estimate quantities like the expectation and variance. This means that evaluating these functions at the same $\Theta$ can return different values, which then corrupts anything inferred from them, such as the gradient. As such, attempting to solve one of the minimisation problems more often than not results in non-convergence / nonsensical outputs, even when the solver is given the analytic answer as the starting point.

Not sure what's out there to help us combat this. If we had access to the CDFs of the distributions, I think our issues would be solved (or even the PDFs - we'd have to numerically integrate them, but that wouldn't be too bad). Basically, using random samples as the sole basis for evaluating expectations and the like is coming back to bite us.
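To make the failure mode concrete, here's a minimal JAX sketch (a toy estimator, not the package's own code) showing that a sample-based expectation is not a deterministic function of $\Theta$ when the draws differ between calls:

```python
import jax
import jax.numpy as jnp

def mc_expectation(theta, key, n=1000):
    # Monte Carlo estimate of E[X] for X ~ Normal(theta, 1).
    samples = theta + jax.random.normal(key, (n,))
    return jnp.mean(samples)

theta = 2.0
k1, k2 = jax.random.split(jax.random.key(0))
e1 = mc_expectation(theta, k1)
e2 = mc_expectation(theta, k2)
# Same theta, different draws: the two estimates differ by O(1/sqrt(n)),
# so any gradient inferred from separate evaluations is noise-dominated.
```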

@willGraham01 willGraham01 changed the base branch from main to wgraham/causalproblem-constraints-fn May 2, 2025 09:42

willGraham01 commented May 6, 2025

Further to the above point, swapping the un-commented lines below for their commented equivalents causes the optimisation to break, which at least confirms that it is the randomness (even fixed randomness) that is upsetting the optimiser.

```python
cp = CausalProblem(graph, label="CP")
cp.set_causal_estimand(
    expectation_with_n_samples(),
    # rvs_to_nodes={"rv": "y"},
    rvs_to_nodes={"rv": "mu"},
    graph_argument="g",
)
cp.set_constraints(
    expectation_with_n_samples(),
    # rvs_to_nodes={"rv": "x"},
    rvs_to_nodes={"rv": "mu"},
    graph_argument="g",
)
```

Edit 2: We can in fact have `rvs_to_nodes={"rv": "y"}` in the first call, but not `rvs_to_nodes={"rv": "x"}` in the second; having `rvs_to_nodes={"rv": "x"}` in the second always causes non-convergence.
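For what it's worth, a minimal sketch of the fix implied by the "fixed randomness" observation above: draw the noise once and re-use it across evaluations (common random numbers / the reparametrisation trick), so the estimator becomes a deterministic, smooth function of $\Theta$. Names here are illustrative, not the package's API:

```python
import jax
import jax.numpy as jnp

def expectation_fixed_noise(theta, eps):
    # Reparametrisation: X = theta + eps, with eps ~ N(0, 1) drawn once up front.
    # Re-using the same eps makes this deterministic and differentiable in theta.
    return jnp.mean(theta + eps)

eps = jax.random.normal(jax.random.key(0), (10_000,))
grad = jax.grad(expectation_fixed_noise)(1.5, eps)
# d/dtheta of mean(theta + eps) is exactly 1, regardless of the drawn eps.
```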

Base automatically changed from wgraham/causalproblem-constraints-fn to main May 7, 2025 10:03

willGraham01 commented May 7, 2025

Further to the above, some hacked-together testing of the effect of providing:

  • More samples
  • The Jacobian of the causal estimand, the constraints function, both, or neither

[Figure: 2025-05-07-11:02_results]

All experiments done at a fixed RNG key (jax.random.key(0)) for the 2-normal-distribution problem.

It should be noted that in all cases in this experiment, providing the Jacobian of only the objective function resulted in non-convergence, which I'm chalking up to the analytic Jacobian being at odds with the randomness introduced in the function evaluations and/or the constraint evaluations. The "no Jacobians" and "constraints Jacobian only" cases also always returned the initial guess, as can be seen from the left-hand plot. This means that only the "provide both Jacobians" method was actually doing anything useful.

Otherwise, beyond some beneficial RNG at certain sample sizes, it looks like we can expect the error to decay roughly as the square root of the number of samples, all other factors being equal. Computation time appears approximately linear, but this is likely only because the re-parametrisation trick with normal distributions relies solely on element-wise multiplication. For sample sizes above $10^8$ my laptop runs out of memory.
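The square-root decay is just the standard Monte Carlo rate; a quick sanity check (toy standard-normal distribution, not the 2-normal problem itself):

```python
import jax
import jax.numpy as jnp

# Absolute error of the Monte Carlo mean of N(0, 1) at increasing sample sizes.
key = jax.random.key(0)
errs = []
for n in (10**2, 10**4, 10**6):
    key, sub = jax.random.split(key)
    errs.append(float(jnp.abs(jnp.mean(jax.random.normal(sub, (n,))))))
# Each 100x increase in samples shrinks the error by roughly 10x, i.e. O(1/sqrt(n)).
```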

Most important takeaway: a reliable Jacobian evaluation for both the constraints and the objective function is pretty much a requirement.

Some Further Thoughts

  • Wondering what the effect of providing the Hessian would be.
  • Wondering if it is possible to vectorise the causal_estimand and constraint evaluations? (Beyond the scope of this PR, and has internal difficulties given the current handling of parameter values.)
  • Wondering if it is possible to encode the Jacobian and/or Hessian into the Distribution class, so they can potentially be "pre-constructed" by a CausalProblem instance when the causal_estimand is set? Related to Sampling via Parametrisations #50
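On the Jacobian/Hessian point above: once the estimand is written in reparametrised (fixed-noise) form, jax.jacobian and jax.hessian can already produce both objects, which is the sort of thing a CausalProblem instance could pre-construct when the causal_estimand is set. A toy sketch (my names and toy estimand, not the package's API):

```python
import jax
import jax.numpy as jnp

def estimand(theta, eps):
    # Toy estimand: Monte Carlo mean of a nonlinear transform,
    # reparametrised so the noise eps is fixed across evaluations.
    return jnp.mean((theta[0] + theta[1] * eps) ** 2)

eps = jax.random.normal(jax.random.key(0), (10_000,))
theta = jnp.array([1.0, 2.0])
jac = jax.jacobian(estimand)(theta, eps)   # shape (2,)
hess = jax.hessian(estimand)(theta, eps)   # shape (2, 2)
```

Since both are built from the same traced function, they stay consistent with whatever fixed noise the estimator uses.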
