ch08 - Summarizing Text Using Machine Learning #29

Open
amscosta opened this issue Feb 7, 2024 · 1 comment
Comments

@amscosta

amscosta commented Feb 7, 2024

Hello,

I am running the Jupyter notebook locally for the section "Summarizing Text Using Machine Learning". The code stops with an error when I try to apply the topN function:

    topN = lambda x: x <= np.ceil(compression_factor * x.max())
    train_df['summaryPost'] = train_df.groupby('ThreadID')['rank'].apply(topN)

(The code from sections 1.2 and 1.3 loaded successfully, including !python -m spacy download en_core_web_sm and !pip install textdistance.)

Execution ends with the following long error:

ValueError Traceback (most recent call last)
File D:\blueprints-text\ch09ev\lib\site-packages\pandas\core\frame.py:11610, in _reindex_for_setitem(value, index)
11609 try:

11610 reindexed_value = value.reindex(index)._values
11611 except ValueError as err:
11612 # raised in MultiIndex.from_tuples, see test_insert_error_msmgs

File D:\blueprints-text\ch09ev\lib\site-packages\pandas\core\series.py:4918, in Series.reindex(self, index, axis, method, copy, level, fill_value, limit, tolerance)
4901 @doc(
4902 NDFrame.reindex, # type: ignore[has-type]
4903 klass=_shared_doc_kwargs["klass"],
(...)
4916 tolerance=None,
4917 ) -> Series:
-> 4918 return super().reindex(
4919 index=index,
4920 method=method,
4921 copy=copy,
4922 level=level,
4923 fill_value=fill_value,
4924 limit=limit,
4925 tolerance=tolerance,
4926 )

File D:\blueprints-text\ch09ev\lib\site-packages\pandas\core\generic.py:5360, in NDFrame.reindex(self, labels, index, columns, axis, method, copy, level, fill_value, limit, tolerance)
5359 # perform the reindex on the axes
-> 5360 return self._reindex_axes(
5361 axes, level, limit, tolerance, method, fill_value, copy
5362 ).finalize(self, method="reindex")

File D:\blueprints-text\ch09ev\lib\site-packages\pandas\core\generic.py:5375, in NDFrame._reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
5374 ax = self._get_axis(a)
-> 5375 new_index, indexer = ax.reindex(
5376 labels, level=level, limit=limit, tolerance=tolerance, method=method
5377 )
5379 axis = self._get_axis_number(a)

File D:\blueprints-text\ch09ev\lib\site-packages\pandas\core\indexes\base.py:4279, in Index.reindex(self, target, method, level, limit, tolerance)
4277 indexer, _ = self.get_indexer_non_unique(target)
-> 4279 target = self._wrap_reindex_result(target, indexer, preserve_names)
4280 return target, indexer

File D:\blueprints-text\ch09ev\lib\site-packages\pandas\core\indexes\multi.py:2490, in MultiIndex._wrap_reindex_result(self, target, indexer, preserve_names)
2489 try:
-> 2490 target = MultiIndex.from_tuples(target)
2491 except TypeError:
2492 # not all tuples, see test_constructor_dict_multiindex_reindex_flat

File D:\blueprints-text\ch09ev\lib\site-packages\pandas\core\indexes\multi.py:211, in names_compat.<locals>.new_meth(self_or_cls, *args, **kwargs)
209 kwargs["names"] = kwargs.pop("name")
--> 211 return meth(self_or_cls, *args, **kwargs)

File D:\blueprints-text\ch09ev\lib\site-packages\pandas\core\indexes\multi.py:590, in MultiIndex.from_tuples(cls, tuples, sortorder, names)
588 tuples = np.asarray(tuples._values)
--> 590 arrays = list(lib.tuples_to_object_array(tuples).T)
591 elif isinstance(tuples, list):

File D:\blueprints-text\ch09ev\lib\site-packages\pandas\_libs\lib.pyx:2894, in pandas._libs.lib.tuples_to_object_array()

ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'

The above exception was the direct cause of the following exception:

TypeError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_9532\1469566075.py in ?()
----> 1 train_df['summaryPost'] = train_df.groupby('ThreadID')['rank'].apply(topN)

D:\blueprints-text\ch09ev\lib\site-packages\pandas\core\frame.py in ?(self, key, value)
3946 # Column to set is duplicated
3947 self._setitem_array([key], value)
3948 else:
3949 # set column
-> 3950 self._set_item(key, value)

D:\blueprints-text\ch09ev\lib\site-packages\pandas\core\frame.py in ?(self, key, value)
4139
4140 Series/TimeSeries will be conformed to the DataFrames index to
4141 ensure homogeneity.
4142 """
-> 4143 value = self._sanitize_column(value)
4144
4145 if (
4146 key in self.columns

D:\blueprints-text\ch09ev\lib\site-packages\pandas\core\frame.py in ?(self, value)
4863 # or through loc single_block_path
4864 if isinstance(value, DataFrame):
4865 return _reindex_for_setitem(value, self.index)
4866 elif is_dict_like(value):
-> 4867 return _reindex_for_setitem(Series(value), self.index)
4868
4869 if is_list_like(value):
4870 com.require_length_match(value, self.index)

D:\blueprints-text\ch09ev\lib\site-packages\pandas\core\frame.py in ?(value, index)
11613 if not value.index.is_unique:
11614 # duplicate axis
11615 raise err
11616

11617 raise TypeError(
11618 "incompatible index of inserted column with frame index"
11619 ) from err
11620 return reindexed_value

TypeError: incompatible index of inserted column with frame index
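A likely cause, assuming pandas 2.0 or later is installed in this environment: since pandas 2.0, SeriesGroupBy.apply prepends the group key to the result index even when the function returns a like-indexed Series. The result then carries a (ThreadID, original index) MultiIndex that cannot be aligned with the flat index of train_df, which is consistent with the final "incompatible index of inserted column with frame index" error above. The sketch below is not the book's code: it uses hypothetical toy data and an assumed compression_factor = 0.5 in place of the notebook's values, and shows two possible workarounds, transform() or group_keys=False.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the notebook's train_df:
# each row is a post, 'rank' is its rank within its thread (1 = best).
train_df = pd.DataFrame({
    "ThreadID": [1, 1, 1, 1, 2, 2, 2],
    "rank":     [1, 2, 3, 4, 1, 2, 3],
})
compression_factor = 0.5  # assumed value; the notebook defines its own

# Same predicate as in the issue: keep posts whose rank is within the
# top ceil(compression_factor * max_rank) of their thread.
topN = lambda x: x <= np.ceil(compression_factor * x.max())

# With pandas >= 2.0, apply() adds ThreadID as an extra index level,
# so the result no longer aligns with train_df.index.
applied = train_df.groupby("ThreadID")["rank"].apply(topN)
print(applied.index.nlevels)  # 2 on pandas >= 2.0

# Workaround 1: transform() returns a result aligned to the original index,
# so it can be assigned directly as a new column.
train_df["summaryPost"] = train_df.groupby("ThreadID")["rank"].transform(topN)

# Workaround 2: keep apply() but tell groupby not to prepend the group keys.
alt = train_df.groupby("ThreadID", group_keys=False)["rank"].apply(topN)

print(train_df)
```

With these toy values, thread 1 (max rank 4) keeps ranks 1 and 2, and thread 2 (max rank 3) keeps ranks 1 and 2 because ceil(1.5) is 2. transform() is the more direct fit here, since topN returns one value per input row.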

@amscosta
Author

I made a typo: the blueprint is from ch09, not ch08.
