Description
When I read a file with loadtxt() and the dtype is set to string, the contents are changed if the column contains only integers. I need to read IDs, which are usually integers but can also contain characters, so I need to read them as strings.
Namely, the last number (in this case 100000) loses one "0" and becomes 10000. Strangely, this only happens to the last number. Even weirder, it only happens if the list ends with a number ending in zero. It took me hours to track this issue down in my code. Do you have an idea why this happens?
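The symptom (a single value cut down to five characters) would be consistent with NumPy's fixed-width string dtypes. This is an assumption, not a confirmed diagnosis: if the string width were inferred only from the rows read first, a longer value arriving later would be silently truncated. The sketch below imitates that with two hand-made "chunks" (hypothetical data, not loadtxt internals) just to show the truncation behaviour itself.

```python
import numpy as np

# Hypothetical "chunks" of rows; illustrative only, not loadtxt internals.
first_chunk = ["49999", "50000"]
second_chunk = ["99999", "100000"]

# The width is inferred from the first chunk only: the longest string has
# 5 characters, so the array gets a fixed-width unicode dtype of length 5.
ids = np.array(first_chunk)

# Grow the array and copy the second chunk in. The dtype stays length 5,
# so "100000" is silently truncated on assignment instead of raising.
ids = np.resize(ids, len(first_chunk) + len(second_chunk))
ids[len(first_chunk):] = second_chunk

print(ids.dtype, ids.tolist())
```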
Reproducing code example:
>>> import numpy as np
>>> import pandas as pd
>>> liste = list(range(1,100001))
>>> df = pd.DataFrame(liste)
>>> df
            0
0           1
1           2
2           3
3           4
4           5
...       ...
99995   99996
99996   99997
99997   99998
99998   99999
99999  100000

[100000 rows x 1 columns]
>>> df.to_csv("testfile",header=False,index=False)
>>> liste2 = np.loadtxt("testfile",dtype="str",delimiter=",",skiprows=0,usecols=0)
>>> liste2[-1]
'10000'
>>> liste2 = np.loadtxt("testfile",dtype="int",delimiter=",",skiprows=0,usecols=0)
>>> liste2[-1]
100000
>>> liste2 = np.loadtxt("testfile",dtype="str",delimiter=",",skiprows=0,usecols=0)
>>> liste1str = list(map(str,liste))
>>> liste1str == liste2
array([ True,  True,  True, ...,  True,  True, False])
>>> liste2[99]
'100'
>>> liste2[999]
'1000'
>>> liste2[9999]
'10000'
>>> liste2[99999]
'10000'
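A possible workaround while the cause is unclear, sketched under an assumption: pass loadtxt an explicit string width such as "U20" instead of plain "str", so the width does not have to be inferred from the data. The 20-character limit is an assumed upper bound on the ID length; anything longer would still be cut. The test file is regenerated with plain Python so the snippet runs without pandas, and it reports how many entries differ from the original IDs.

```python
import os
import tempfile

import numpy as np

# Recreate the test file from the report: the integers 1..100000, one per line.
path = os.path.join(tempfile.mkdtemp(), "testfile")
with open(path, "w") as f:
    f.write("\n".join(str(i) for i in range(1, 100001)) + "\n")

# Assumption: no ID is longer than 20 characters. An explicit width means
# loadtxt does not need to guess how wide the strings are.
ids = np.loadtxt(path, dtype="U20", delimiter=",", usecols=0)

# Compare against the original IDs and collect the indices that differ.
expected = np.array([str(i) for i in range(1, 100001)])
mismatches = np.flatnonzero(ids != expected)
print(ids[-1], len(mismatches))
```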