Conversation

@Aniketsy
Contributor

Please let me know if my approach or fix needs any improvements. I’m open to feedback and happy to make changes based on suggestions.
Thank you!

cur_dir = os.path.dirname(os.path.abspath(__file__))
data = np.genfromtxt(os.path.join(cur_dir, "results", fname),
delimiter=" ")
df = pd.read_csv(os.path.join(cur_dir, "results", fname), delimiter=" ")
Member

AFAICS, this is missing header=None.

cur_dir = os.path.abspath(os.path.dirname(__file__))
fpath = os.path.join(cur_dir, "test_data.txt")
pet = np.genfromtxt(fpath)
pet = pd.read_csv(fpath, header=None).values
Member

It looks like this is missing delimiter=" ".
The file is space delimited.

@josef-pkt
Member

Thank you for the PR

Still several test failures.
I checked two cases and those are missing the correct option in pandas read_csv, AFAICS.
Based on reading code and files, not verified.
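
To illustrate the kind of mismatch described above, here is a minimal sketch. The in-memory text is a made-up stand-in for a space-delimited results file without a header row, not one of the actual statsmodels files.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for a space-delimited file with no header row.
text = "1.0 2.0 3.0\n4.0 5.0 6.0\n"

# What the old code produced.
expected = np.genfromtxt(io.StringIO(text), delimiter=" ")

# read_csv defaults: comma separator, and the first row becomes column names.
naive = pd.read_csv(io.StringIO(text))

# Matching options: explicit delimiter and no header row.
fixed = pd.read_csv(io.StringIO(text), delimiter=" ", header=None).to_numpy()

print(naive.shape)
print(fixed.shape)
```

With the defaults, the first data row is lost to the header and each remaining row is read as a single string column, so the options need to be set per file.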

@Aniketsy
Contributor Author

Thanks! I’m working on fixing these test failures and will update soon.

@Aniketsy
Contributor Author

@josef-pkt There are isort issues in several other files as well. Could you please confirm if I should run isort on all files and include those changes in this PR, or only fix the files directly related to this PR?

@Aniketsy
Contributor Author

@josef-pkt I’m currently getting three test failures. Could you please review my changes and guide me on how to fix them? I’m stuck at this point and would appreciate your help so I can proceed.

FAILED ../venv-test/lib/python3.12/site-packages/statsmodels/nonparametric/tests/test_kde.py::TestKDEWGauss::test_evaluate - ValueError: Length of values (1) does not match length of index (60)
  FAILED ../venv-test/lib/python3.12/site-packages/statsmodels/nonparametric/tests/test_kde.py::TestKDEWGauss::test_compare - ValueError: Length of values (1) does not match length of index (60)
  FAILED ../venv-test/lib/python3.12/site-packages/statsmodels/regression/tests/test_glsar_gretl.py::TestGLSARGretl::test_all - AssertionError: 
  Arrays are not almost equal to 3 decimals
  
  (shapes (203,), (202,) mismatch)
   ACTUAL: array(['-2.0742', '-16.829', '17.807', '14.214', '-24.269', '5.7992',
         '-20.886', '9.8798', '7.4863', '14.11', '-18.151', '2.6465',
         '-13.329', '2.0981', '-8.0123', '9.6415', '-5.0585', '-10.143',...
   DESIRED: array([ -2.074, -16.829,  17.807,  14.214, -24.269,   5.799, -20.886,
           9.88 ,   7.486,  14.11 , -18.151,   2.646, -13.329,   2.098,
          -8.012,   9.642,  -5.059, -10.143,   2.719, -12.585, -10.268,...

cur_dir = os.path.abspath(os.path.dirname(__file__))
fpath = os.path.join(cur_dir, "results/leverage_influence_ols_nostars.txt")
lev = np.genfromtxt(fpath, skip_header=3, skip_footer=1,
converters={0: lambda s: s})
Member

It looks like this needs either 3 or 4 skip_footer (4 if we count empty line)

lev = pd.read_csv(fpath, skiprows=3, skipfooter=3, engine="python", sep=r"\s+",
                  header=None, names=names)

Then the rest here is not needed anymore,
i.e. no pd.to_numeric, the dtypes are already float.

And the np.isnan part was numpy version compat, which is also not needed anymore

(I'm just checking the read_csv part without running the unit tests)
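
A small sketch of the suggested call, run against a made-up table with three header lines and three footer lines. The layout and the column names in `names` are assumptions for illustration; the real file may differ, and blank lines are deliberately left out here because how they are counted is exactly the open question above.

```python
import io

import pandas as pd

# Hypothetical layout: 3 header lines, a whitespace-aligned table, 3 footer lines.
text = (
    "Leverage and influence\n"
    "generated by some tool\n"
    "obs residual leverage\n"
    "1   -2.0742  0.012\n"
    "2  -16.829   0.034\n"
    "3   17.807   0.020\n"
    "footer line one\n"
    "footer line two\n"
    "footer line three\n"
)
names = ["obs", "residual", "leverage"]  # assumed column names

# skipfooter is only supported by the python engine.
lev = pd.read_csv(io.StringIO(text), skiprows=3, skipfooter=3,
                  engine="python", sep=r"\s+", header=None, names=names)

print(lev.dtypes)
```

Because the numeric columns are parsed directly, no later string-to-number conversion is needed in this sketch.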

cls.res1 = res1
fname = os.path.join(curdir, "results", "results_kde_weights.csv")
cls.res_density = np.genfromtxt(open(fname, "rb"), skip_header=1)
cls.res_density = pd.read_csv(fname, header=None, dtype=float).to_numpy().ravel()
Member

In my older pandas version, and on at least one failing test machine, this breaks if header=None.

header=0 works for me

Otherwise, I was not yet able to figure out where the shape mismatch comes from.
(there are too many ravel to read this quickly)
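
The difference between the two header settings can be sketched with a made-up single-column file that has a label row (the content is an assumption, not the real results_kde_weights.csv):

```python
import io

import numpy as np
import pandas as pd

# Hypothetical single-column file with a header row.
text = "density\n0.1\n0.2\n0.3\n"

# Old behavior: skip the label row, get a 1-d float array.
expected = np.genfromtxt(io.StringIO(text), skip_header=1)

# header=0 consumes the label row, so the values parse as floats.
density = pd.read_csv(io.StringIO(text), header=0, dtype=float).to_numpy().squeeze()

# header=None keeps the label as a data row: one extra row, values read as strings.
raw = pd.read_csv(io.StringIO(text), header=None)

print(density.shape, raw.shape)
print(repr(raw.iloc[0, 0]))
```

An extra row of strings like this would also be consistent with an off-by-one shape mismatch and a string-valued ACTUAL array in a test failure.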

Contributor Author

Thanks for correcting me! One test case is fixed now. I’ll work on the remaining two that are still failing.

@josef-pkt
Member

in general, squeeze should not be replaced by ravel.
squeeze only removes extra dimensions, while ravel converts a 2-d array to 1-d even if neither axis has shape=1.
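
A quick illustration of that difference with two small example arrays (the arrays are made up for this sketch):

```python
import numpy as np

col = np.arange(3.0).reshape(3, 1)   # has a length-1 axis
grid = np.arange(6.0).reshape(3, 2)  # no length-1 axis

# On a column vector both give a 1-d array of length 3.
print(col.squeeze().shape, col.ravel().shape)

# On a genuine 2-d array squeeze leaves the shape alone, ravel flattens it.
print(grid.squeeze().shape, grid.ravel().shape)
```

So ravel can silently hide a wrong 2-d shape that squeeze would leave visible to the test.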

@Aniketsy
Contributor Author

in general, squeeze should not be replaced by ravel. squeeze only removes extra dimensions, while ravel converts a 2-d array to 1-d even if neither axis has shape=1.

@josef-pkt Should I revert all the changes where I replaced squeeze with ravel?

@josef-pkt
Member

yes, switch back to squeeze.
At least in the kde tests. I did not look at other changes.

dta_path = os.path.join(current_path, "Matlab_results", "test_coint.csv")
with open(dta_path, "rb") as fd:
dta = np.genfromtxt(fd)
dta = pd.read_csv(dta_path, header=None).values
Member

this csv file is space delimited
this works for me:
dta = pd.read_csv(dta_path, header=None, delimiter=r"\s+").values
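
As a sketch of why a whitespace regex is the safer delimiter here: np.genfromtxt splits on any run of whitespace by default, while a single-space delimiter in read_csv treats every space as a field boundary. The text below is a made-up example with several spaces between columns, not the real test_coint.csv.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical file with runs of spaces between the two columns.
text = "1.0   2.0\n3.0   4.0\n"

# genfromtxt default: split on any whitespace.
expected = np.genfromtxt(io.StringIO(text))

# A raw-string regex delimiter matches one or more whitespace characters.
dta = pd.read_csv(io.StringIO(text), header=None, delimiter=r"\s+").to_numpy()

# A single-space delimiter splits on each space and yields extra empty columns.
single = pd.read_csv(io.StringIO(text), header=None, delimiter=" ")

print(dta.shape, single.shape)
```

The raw string also avoids the invalid-escape warning that a plain "\s+" literal triggers on recent Python versions.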

def test_density(self):
npt.assert_almost_equal(self.res1.density, self.res_density,
npt.assert_almost_equal(self.res1.density,
np.asarray(self.res_density).ravel(),
Member

AFAICS, in these cases self.res_density should already have the correct type and shape.
Check that the attribute is correctly set in setup_class.
Then, we can avoid having to do asarray and squeeze/ravel each time res_density is used.
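
One way this could look, as a hedged sketch: the class name, the in-memory file content, and the stand-in `density` attribute are all hypothetical and only show the pattern of normalizing the reference array once in setup_class.

```python
import io

import numpy as np
import pandas as pd


class TestDensitySketch:
    @classmethod
    def setup_class(cls):
        # Hypothetical reference file with a header row.
        text = "density\n0.1\n0.2\n0.3\n"
        ref = pd.read_csv(io.StringIO(text), header=0, dtype=float)
        # Normalize once: 1-d float array, so the tests need no asarray/ravel.
        cls.res_density = ref.to_numpy().squeeze()
        # Stand-in for the density of a fitted model.
        cls.density = np.array([0.1, 0.2, 0.3])

    def test_density(self):
        np.testing.assert_almost_equal(self.density, self.res_density, 6)
```

Each test can then compare against `self.res_density` directly.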


# read data set and drop rows with missing data
dta = np.genfromtxt("dftest3.data", dt_b, missing=".", usemask=True)
dta = pd.read_csv("dftest3.data", header=None, na_values=".").values
Member

this will not work. I guess the code will raise an exception.
This is old sandbox code, I guess, without unit tests.

We could leave this for after this PR.
My guess is that the code up to and including line 337 can be replaced by dta.dropna()

But it is likely not worth the effort to rescue this module.
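
For reference, a minimal sketch of the pandas equivalent of masked-array handling, assuming a whitespace-delimited file in which "." marks a missing value. The data and delimiter are made up, since the real dftest3.data file is not available.

```python
import io

import pandas as pd

# Hypothetical data: "." marks a missing value.
text = "1 2.5 0.3\n2 . 0.7\n3 4.1 .\n4 3.3 0.9\n"

# na_values="." turns the markers into NaN instead of a masked array.
dta = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None, na_values=".")

# Drop every row that has at least one missing value.
complete = dta.dropna()

print(complete)
```

Calling .values directly on `dta`, as in the diff above, would keep the NaN rows, so downstream code written for masked arrays would still need changes.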

("y", float),
]
)
dta = np.genfromtxt("dftest3.data", dt_b, missing=".", usemask=True)
Member

similar case of using masked array as in ols_anova_original.py

# from stata
# forecast = genfromtxt(open(cur_dir+"/arima111_forecasts.csv"),
# delimiter=",", skip_header=1, usecols=[1,2,3,4,5])
# forecast = pd.read_csv(open(cur_dir+"/arima111_forecasts.csv"))
Member

this is also likely wrong, but it's commented out code

I think this was added to show how the reference results can be loaded.

Member

for the next part to be correct it needs additionally
forecast = forecast.iloc[:, 1:].to_numpy()
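
A sketch of why the extra slicing step is needed: the old call skipped the header row and selected columns 1 to 5, so the read_csv result has to drop the first column to match. The CSV content below is a made-up stand-in for arima111_forecasts.csv.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV: a label column followed by five numeric columns.
text = (
    "date,f1,f2,f3,f4,f5\n"
    "2000q1,1.0,2.0,3.0,4.0,5.0\n"
    "2000q2,6.0,7.0,8.0,9.0,10.0\n"
)

# Old approach: skip the header, keep columns 1..5.
old = np.genfromtxt(io.StringIO(text), delimiter=",", skip_header=1,
                    usecols=[1, 2, 3, 4, 5])

# pandas approach: the header row is consumed by default, then drop column 0.
forecast = pd.read_csv(io.StringIO(text))
forecast = forecast.iloc[:, 1:].to_numpy()

print(forecast.shape)
```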

@josef-pkt
Member

to sandbox

try_ols_anova.py and ols_anova_original.py are currently already broken in the data handling.
missing data file and missing method in numpy.

So, we can ignore any changes there.

aside:
import of ols_anova_original fails with missing datafile
import of statsmodels.sandbox.regression.try_ols_anova still works and has helper functions that predate formulas and pandas categoricals, i.e. support for dummy variables using only numpy.
one function of it is used in anova_nistcertified.
(I can run the module anova_nistcertified.py after replacing local imports by global, absolute imports)

@Aniketsy
Contributor Author

@josef-pkt Thank you! I have made the suggested updates and reverted the changes in try_ols_anova.py and ols_anova_original.py. Please let me know if any further improvements are needed.

@josef-pkt
Member

josef-pkt commented Nov 17, 2025

commit 0dcf9f1 is the main commit for the import cleanup
more import cleanup is in commit 4604ded together with genfromtxt changes

@josef-pkt
Member

do you know how to use interactive rebase to squash some commits together?

It looks like it's almost ready to merge.
Before merge I would like to squash it into something like 3 commits, keeping the main import cleanup in a separate commit

Thanks for all this work.

@josef-pkt
Member

test run complains about 3 style violations

statsmodels/nonparametric/tests/test_lowess.py:146:45: E127 continuation line over-indented for visual indent
statsmodels/regression/tests/test_glsar_gretl.py:319:1: W293 blank line contains whitespace
statsmodels/tsa/tests/results/results_arma.py:22:1: E302 expected 2 blank lines, found 1

@Aniketsy
Contributor Author

I haven’t used interactive rebase for squashing commits before, but I can give it a try.

@josef-pkt
Member

Make a copy of the branch to experiment with squashing. It's not too difficult but rebase is always a bit dangerous.
It's useful to learn how to do the interactive rebase. However, I can also do it myself, if you prefer.

I will briefly skim the changes again, but I don't expect that there is anything to change once CI is green

@Aniketsy
Contributor Author

test run complains about 3 style violations

statsmodels/nonparametric/tests/test_lowess.py:146:45: E127 continuation line over-indented for visual indent
statsmodels/regression/tests/test_glsar_gretl.py:319:1: W293 blank line contains whitespace
statsmodels/tsa/tests/results/results_arma.py:22:1: E302 expected 2 blank lines, found 1

I’ve fixed these issues. Should I go ahead and squash the commits now, or wait until all checks pass first?

@josef-pkt
Member

wait until the checks pass, just in case there is something left to change.

@Aniketsy
Contributor Author

It's already 3 AM here, and I'm feeling a bit sleepy. I'll squash the commits in the morning. Hope that's okay.

@josef-pkt
Member

josef-pkt commented Nov 17, 2025

The pre-testing run, with development versions of some dependencies, fails with

        if weights is not None:
>           self.kernel.weights /= weights.sum()
E           ValueError: output array is read-only

I don't know where that comes from.
weights array or series is read-only.
Maybe we need to make a copy of weights, given that we make changes to it.
I have not looked at the details yet. This could also be a bug in the actual KDE code that just shows up now.

update
Yes, it's a bug in KDEUnivariate.fit
The code changes the user provided weights in-place.
This needs to use a copy of the array and not /=

The code in fit should be:

        if weights is not None:
            self.kernel.weights = weights / weights.sum()

statsmodels.nonparametric.kde.KDEUnivariate

Can you add this as a separate commit, "BUG: kde fit, avoid inplace modification of weights", not to be squashed with your genfromtxt commits?
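
The failure mode can be reproduced outside statsmodels with a plain NumPy array. This is only a sketch: the read-only flag is set by hand to emulate whatever read-only array or Series the newer dependency versions hand over.

```python
import numpy as np

weights = np.array([1.0, 2.0, 1.0])
weights.flags.writeable = False  # emulate a read-only array from the caller

try:
    # In-place division writes into the caller's array and fails if read-only.
    weights /= weights.sum()
except ValueError as exc:
    print("in-place update failed:", exc)

# Out-of-place division builds a new array and leaves the input untouched.
normalized = weights / weights.sum()

print(normalized)
```

Even when the input is writeable, the out-of-place form avoids changing the user's weights as a side effect of fitting.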

@josef-pkt
Member

It's already 3 AM here

no problem at all.
Instant responses are now exceptional events for statsmodels.
:)

All green except for the bug in pre testing.

@Aniketsy
Contributor Author

The code in fit should be:

        if weights is not None:
            self.kernel.weights = weights / weights.sum()

statsmodels.nonparametric.kde.KDEUnivariate

Can you add this as a separate commit, "BUG: kde fit, avoid inplace modification of weights", not to be squashed with your genfromtxt commits?

Done with this fix. Now I will squash the commits together.

@Aniketsy
Contributor Author

C:\Users\Aniket.DESKTOP-074O80J\statsmodel\statsmodels>git rebase --continue
[detached HEAD 09542e308] MAINT/TST: consolidate read_csv fixes and small cleanups
 Date: Tue Nov 18 12:26:51 2025 +0530
 12 files changed, 41 insertions(+), 35 deletions(-)
[detached HEAD 61adf2805] MAINT/TST: consolidate read_csv fixes and small cleanups
 Date: Tue Nov 18 12:26:51 2025 +0530
 12 files changed, 40 insertions(+), 35 deletions(-)
Successfully rebased and updated refs/heads/replace-9566.
C:\Users\Aniket.DESKTOP-074O80J\statsmodel\statsmodels>git log --oneline --decorate -n 10
61adf2805 (HEAD -> replace-9566) MAINT/TST: consolidate read_csv fixes and small cleanups
d72a21bf0 Fix pandas read_csv options: add header=None and delimiter
b1948e107 MAINT: main import cleanup
523ab7d44 Remove venv from version control and update .gitignore
a3de41a83 (upstream/main, upstream/HEAD, support/9384, main) Merge pull request #9668 from statsmodels/dependabot/github_actions/github/codeql-action-4
aa3559921 Merge pull request #9669 from statsmodels/dependabot/github_actions/pypa/cibuildwheel-3.2.1
f32b179b9 Bump pypa/cibuildwheel from 3.2.0 to 3.2.1
04f9fbc9f Bump github/codeql-action from 3 to 4
b9a1323a4 Merge pull request #9656 from bashtage/py-314-gh-actions
afccae6d2 CI: Add 3.14 in GH actions

Hi @josef-pkt
I tried squashing the commits, but I think I messed up the history during the interactive rebase. I haven’t pushed anything, but my local branch isn’t in the correct state anymore.
If it’s okay with you, I’d like to hand this over to you and would really appreciate it if you could take it from here.
Sorry for the inconvenience, and thank you for your guidance.

@josef-pkt
Member

no problem
I will do it in a few hours.

Thanks for the PR and going through this

@Aniketsy
Contributor Author

Hi @josef-pkt just checking in on this PR. Absolutely no hurry, just wanted to make sure it's still on your radar.
Please let me know if I can help with any updates.

Development

Successfully merging this pull request may close these issues.

Maint: replace np.genfromtxt with pandas read_csv

2 participants