Problems with convergence when 3 marginals sharing 1 column are used #22

Hi, I'm trying to understand why the following code doesn't find a good fit. The convergence rate is still high at the end, which means no good solution was found, and the run stops only because the change in `convergence_rate` drops below `rate_tolerance`.

The dummy use case: I have 3 tables spanning [education, age], [education, gender], and [education, children], and I want a joint distribution over [education, age, gender, children] that fits all the marginals of the 3 tables. My real use case doesn't allow expanding the distributions sequentially, because the data consists of the tables [age, municipality], [municipality, children], and [age, children], which connect in a triangle.

I've tested it purely on dummy data, but that luckily makes it easier to paste and reproduce.
The dummy data contains the full distribution over [education, age, gender, children]. I get the marginals by grouping on subsets of the columns, then pass the full distribution as the initial value (seed) and fit it to those marginals. Since the seed already matches every marginal exactly, this should converge after 1 step. It doesn't: the fit is very poor, with a convergence rate of 2.218433 at the final iteration step, where it stops because the change in the convergence rate is below `rate_tolerance`.
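To make "a perfect fit initially" concrete, the check below rebuilds the 48 weights as a dense NumPy array (no ipfn involved) and measures the seed against the three targets. It assumes the convergence rate is the largest relative deviation |current / target - 1| over all target cells, which is our reading of what ipfn reports; `max_relative_deviation` is a hypothetical helper. It also confirms that the three tables agree on the education totals, which any exact fit requires.

```python
import numpy as np

# Weights from the issue: the last 24 values repeat the first 24, and the
# 0.5 factor brings the total down to 100.
half = [1, 2, 1, 3, 5, 5, 6, 2, 2, 1, 7, 6,
        5, 4, 2, 5, 5, 5, 3, 8, 7, 2, 7, 6]
weight = 0.5 * np.array(half * 2, dtype=float)

# Row order in the DataFrame: children varies slowest, then gender, then
# education, then age (fastest), so a C-order reshape gives the axes
# (children, gender, education, age).
joint = weight.reshape(2, 2, 4, 3)

# The three 2-D targets, as (axes summed out, table) pairs.
targets = {
    "gender x education": ((0, 3), joint.sum(axis=(0, 3))),
    "education x age": ((0, 1), joint.sum(axis=(0, 1))),
    "education x children": ((1, 3), joint.sum(axis=(1, 3))),
}


def max_relative_deviation(m, targets):
    """Largest |current / target - 1| over every cell of every target."""
    return max(np.max(np.abs(m.sum(axis=summed) / table - 1))
               for summed, table in targets.values())


rate = max_relative_deviation(joint, targets)

# Education totals implied by each table; an exact fit needs them to agree.
edu_totals = [
    targets["gender x education"][1].sum(axis=0),
    targets["education x age"][1].sum(axis=1),
    targets["education x children"][1].sum(axis=0),
]
print(rate)           # → 0.0
print(edu_totals[0])  # → [15. 28. 28. 29.]
```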

Am I using the function wrong, or is there a convergence issue?

```python
# library imports
from ipfn import ipfn
import numpy as np
import pandas as pd

# Generate the full joint distribution
weight = np.array([1., 2., 1., 3., 5., 5., 6., 2., 2., 1., 7., 6.,
                   5., 4., 2., 5., 5., 5., 3., 8., 7., 2., 7., 6.,
                   1., 2., 1., 3., 5., 5., 6., 2., 2., 1., 7., 6.,
                   5., 4., 2., 5., 5., 5., 3., 8., 7., 2., 7., 6.])
weight = weight * 0.5  # the raw weights sum to 200; halve them so the total is 100

gender_l  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
             2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
             1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
             2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,]

education_l = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4,
               1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4,
               1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4,
               1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4,]

age_l  = ['20-25','30-35','40-45',
          '20-25','30-35','40-45',
          '20-25','30-35','40-45',
          '20-25','30-35','40-45',
          '20-25','30-35','40-45',
          '20-25','30-35','40-45',
          '20-25','30-35','40-45',
          '20-25','30-35','40-45',
          '20-25','30-35','40-45',
          '20-25','30-35','40-45',
          '20-25','30-35','40-45',
          '20-25','30-35','40-45',
          '20-25','30-35','40-45',
          '20-25','30-35','40-45',
          '20-25','30-35','40-45',
          '20-25','30-35','40-45',]

children_l = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
              0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
              1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
              1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,]

df = pd.DataFrame()
df['gender'] = gender_l
df['education'] = education_l
df['age'] = age_l
df['children'] = children_l
df['weight'] = weight

# Define the 2-D target marginals (pandas Series indexed by the grouped columns)
gender_education_conditional   = df.groupby(['gender', 'education'])['weight'].sum()
education_age_conditional      = df.groupby(['education', 'age'])['weight'].sum()
education_children_conditional = df.groupby(['education', 'children'])['weight'].sum()

# Perform the ipfn
aggregates = [gender_education_conditional, education_age_conditional, education_children_conditional]
dimensions = [['gender', 'education'], ['education', 'age'], ['education', 'children']]
# With verbose=2, iteration() returns three items, unpacked below
IPF = ipfn.ipfn(df, aggregates, dimensions, weight_col="weight", verbose=2)
ipf_out = IPF.iteration()
df = ipf_out[0]
flag = ipf_out[1]
convergence_rate = ipf_out[2]
# And print the results for evaluation
print(flag)
print(convergence_rate)
print(df.groupby(["education", "age"])["weight"].sum(), education_age_conditional)
print(df.groupby(["education", "children"])["weight"].sum(), education_children_conditional)
print(df.groupby(["gender", "education"])["weight"].sum(), gender_education_conditional)
```



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Problems with convergence when 3 marginals sharing 1 column is used #22

Generate the full joint distribution

Define the 2d target marginal distributions dataframes

Perform the ipfn

And print the results for evaluation

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Problems with convergence when 3 marginals sharing 1 column is used #22

Description

Generate the full joint distribution

Define the 2d target marginal distributions dataframes

Perform the ipfn

And print the results for evaluation

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions