Conversation

Askaholic
Contributor

Leaving this here in hopes that someone will tell me there's actually a really easy way of doing this!

On my system (x86_64 ubuntu 18) I've determined that the compiler lays out the BigInt structs like this:

offset item
-------------
0:    BigUInt
   0:    Vec
      0:    RawVec
         0:   Unique<T>
         8:   usize (capacity)
      16:   usize (len)
24:   Sign

So I can simply cast the &BigInt to a &Vec and get the capacity from it.
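As a rough sketch of what the measurement amounts to once the digit vector is reachable: the inline struct plus the heap buffer the vector has reserved. `MirrorBigInt`, `Sign`, and `owned_bytes` below are hypothetical stand-ins written for illustration, not the real num-bigint types, and the `u32` digit type is an assumption. The sketch reads the capacity through a normal field rather than through a pointer cast.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for the sign tag (not the real num-bigint type).
#[allow(dead_code)]
pub enum Sign {
    Minus,
    NoSign,
    Plus,
}

/// Hypothetical mirror of the layout in the table above:
/// a digit vector followed by a sign. Digits are assumed to be u32.
pub struct MirrorBigInt {
    pub data: Vec<u32>,
    pub sign: Sign,
}

/// Inline size of the struct plus the heap buffer reserved by the digits.
/// Capacity (not len) is used so unused reserved space is counted too.
pub fn owned_bytes(n: &MirrorBigInt) -> usize {
    size_of::<MirrorBigInt>() + n.data.capacity() * size_of::<u32>()
}
```

With an empty digit vector nothing is allocated, so `owned_bytes` reduces to the inline size of the struct.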

TODO: Test snippets

@coolreader18
Member

Do we have to account for unused extra capacity in the underlying Vec? If not, I'd say just use the bits method to find the size of the integer.

@Askaholic
Contributor Author

I'm not really sure where to look for the official spec, but from the python3 docs it sounds like it's supposed to return the amount of memory in bytes that the object is holding.

https://docs.python.org/3/library/sys.html#sys.getsizeof

@windelbouwman
Contributor

Are you aware that we implemented sys.getsizeof here? https://github.com/RustPython/RustPython/blob/master/vm/src/sysmodule.rs#L101

Maybe you could look at the mem::size_of_val function?

Maybe the __sizeof__ method should be implemented on the object type, so that all subtypes have a default implementation?

@windelbouwman
Contributor

A good test would be to check what cpython reports as the size of various integer values. Keep in mind, however, that this is probably implementation specific, and that we probably cannot make test cases for it in the tests/snippets directory.

@Askaholic
Contributor Author

I don't think mem::size_of_val helps here because we would still need a reference to the BigUInt data Vec. Unfortunately you can't really implement this function in general because you can't know how much memory an arbitrary object owns via a reference/pointer.
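To illustrate the point, here is a small sketch using a plain `Vec<u32>` in place of the hidden digit vector (the function names are made up for this example). `size_of_val` on the vector only sees the handle (pointer, capacity, length), so it returns the same number no matter how much heap memory the vector owns.

```rust
use std::mem::{size_of, size_of_val};

/// Shallow size of the Vec handle itself (pointer, capacity, length).
/// The parameter is deliberately &Vec and not &[u32]: size_of_val on a
/// slice would return len * 4 instead of the size of the handle.
pub fn shallow_bytes(v: &Vec<u32>) -> usize {
    size_of_val(v)
}

/// Heap bytes reserved by the Vec, which size_of_val does not see.
pub fn heap_bytes(v: &Vec<u32>) -> usize {
    v.capacity() * size_of::<u32>()
}
```

A one-element vector and a thousand-element vector give the same `shallow_bytes`, while `heap_bytes` differs; the second number is the part that needs access to the vector itself.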

The cpython implementation in longobject.c looks like this:

offsetof(PyLongObject, ob_digit) + Py_ABS(Py_SIZE(self))*sizeof(digit);

And in practice that translates to:

>>> (0).__sizeof__()
24
>>> (1).__sizeof__()
28
>>> (1000000000000000).__sizeof__()
32
>>> (10000000000000000000000000000000000000).__sizeof__()
44
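The numbers above can be reproduced with a short sketch of that formula. The constants are assumptions chosen to match the output shown (a 64-bit build with a 24-byte header, 4-byte digits, and 30 value bits per digit), and the function names are made up for this example.

```rust
/// Assumed value of offsetof(PyLongObject, ob_digit) on a 64-bit build.
const HEADER_BYTES: usize = 24;
/// Assumed sizeof(digit).
const DIGIT_BYTES: usize = 4;
/// Assumed number of value bits stored per digit.
const DIGIT_BITS: u32 = 30;

/// Number of 30-bit digits needed for a non-negative magnitude.
pub fn digits_for(magnitude: u128) -> i64 {
    // Bit length of the magnitude; zero has a bit length of 0.
    let bits = 128 - magnitude.leading_zeros();
    // Ceiling division by the bits per digit.
    ((bits + DIGIT_BITS - 1) / DIGIT_BITS) as i64
}

/// Mirrors the C expression: header + Py_ABS(ob_size) * sizeof(digit).
/// ob_size is signed because its sign carries the sign of the number.
pub fn long_sizeof(ob_size: i64) -> usize {
    HEADER_BYTES + (ob_size.unsigned_abs() as usize) * DIGIT_BYTES
}
```

Under these assumptions, `long_sizeof(digits_for(n))` gives 24, 28, 32, and 44 for the four values in the session above, and a negative `ob_size` gives the same result as its absolute value.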

And my implementation for RustPython:

>>>>> (0).__sizeof__()
32
>>>>> (1).__sizeof__()
36
>>>>> (1000000000000000).__sizeof__()
40
>>>>> (10000000000000000000000000000000000000).__sizeof__()
48

I would guess that the cpython ints are 4 bytes smaller because they combine the sign with the number of digits i.e. -4 digits means 4 digits, but the sign of the number is negative. That's why they use Py_ABS in the __sizeof__ implementation.

I think using bits we can basically get the vector's len in a safe way. It's still a little ugly because it will do a bunch of extra calculations that we need to undo, and also it gets the length and not the capacity.
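A minimal sketch of the bits-based estimate, assuming 32-bit digits and taking the bit length as a plain number so it does not depend on any crate. The function names and the `inline_bytes` parameter are hypothetical. As noted, this recovers the length of the digit vector, not its capacity.

```rust
/// Assumed width of one stored digit, in bits and in bytes.
const DIGIT_BITS: u64 = 32;
const DIGIT_BYTES: u64 = 4;

/// Number of 32-bit digits needed to hold a value with the given bit length.
pub fn digits_from_bits(bits: u64) -> u64 {
    // Ceiling division; a bit length of 0 (the value zero) needs no digits.
    (bits + DIGIT_BITS - 1) / DIGIT_BITS
}

/// Estimated size: the inline struct plus one u32 per digit in use.
/// This undercounts whenever the vector has spare capacity.
pub fn estimated_sizeof(bits: u64, inline_bytes: u64) -> u64 {
    inline_bytes + digits_from_bits(bits) * DIGIT_BYTES
}
```

With an inline size of 32 bytes, bit lengths of 0, 1, 50, and 123 give 32, 36, 40, and 48, which matches the RustPython output above for the same four values.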

@windelbouwman
Contributor

I think we should search in the direction of modifying PyObject or the like. This struct holds the custom payload for many types. Maybe we could ask in the rust forums on how to tackle this in a good way? I think the unsafe implementation, as well as the manual calculation with the bits, are a bit weird. There must be a proper way to solve this in a generic way I think. Can't you use size_of_val on the payload? You could add an extra method to the PyObject which will return the size of the payload in bytes?

@Askaholic
Contributor Author

Well, the reason why you can't write a function for this in general is the same reason why you can't write a hashing function that works for all objects: Some objects have pointers, and how can you tell which pointers are important and which ones aren't? You may be able to come up with something that works most of the time if the language has good support for introspection, which I have no experience with in Rust.

Really the only problem here is that we're using the num-bigint crate, and this crate completely hides away all implementation details from you. And unless we can get a reference to the BigUInt.data field we can't make use of size_of_val. https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=ceee481b5ba8180474a2492af32f999b

I see that in the code I've written here I could use size_of_val to replace the capacity * size_of(u32) calculation, and I should probably change my pointer cast to *const Vec<u32> instead.

As an aside, I've been writing a lot of Python, and of course in Python there is no public/private access control (the best you can do is use @property). So it's extremely frustrating to me, not being able to even get a read-only reference to a data member. Like, what is the language trying to protect me against by preventing me from reading valid data?

@windelbouwman
Contributor

Hmm. This is too bad. I suggest implementing the __sizeof__ method on the generic object type, and using size_of_val anyway. This will give some result, but it is incorrect. Please mark this with a TODO comment, and maybe log a warning. In the meantime, we could ask around on the rust forums how to solve this.

I would really like to avoid requiring a specific sizeof method for each type.

Does this sound reasonable to you?

@Askaholic
Contributor Author

What is it that you want me to mark with TODO?

I have a suspicion that implementing a generic __sizeof__ that works for ALL types of objects is going to be an even harder problem than implementing __sizeof__ for only one specific object. I might take a look at some other objects or methods though if I feel motivated.

I don't see a lot of point in incorrectly implementing __sizeof__ though. I think that will just cause the whats_left.sh script to give the illusion of the project being further along than it is.

BTW, a half-baked idea I had for a workaround for ints was to maybe store the data vec directly in PyInt and only convert it to a BigInt when it's actually needed for arithmetic. Doesn't seem particularly practical to me either though.

The other foolproof solution would be to write our own implementation of BigInt (or maybe to fork the num repo) and give data members sane access controls.

@windelbouwman
Contributor

Sorry, I accidentally closed-reopened this PR.

My reason for leaving the __sizeof__ method out of the whats_left.sh script is to keep new contributors from hitting this issue. Maybe we could have a whats_left_new_contributors.sh script indicating the more doable items.

I do not think forking the bigint crate is a good idea. @coolreader18 and @palaviv do you have any ideas on this? For the moment, I would like to park this method, and implement other modules first. I think we can postpone this method for some time.

@coolreader18
Member

Forking bigint does seem like it would come with a lot of unnecessary maintenance cost. Also, FWIW, I looked into this a bit ago and I came to the same conclusion; __sizeof__ would have to be implemented manually for each type, as it's not possible to tell what is and isn't significant.
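One way to picture the per-type approach is a trait with a shallow default that individual payloads override when they own heap memory. Everything in this sketch is hypothetical (`PayloadSize`, `FloatPayload`, `IntPayload`); it is not RustPython's API, only an illustration of where the manual work would go.

```rust
use std::mem::{size_of, size_of_val};

/// Hypothetical trait; the name and method are illustrative only.
pub trait PayloadSize {
    /// Default: shallow size only. Heap memory owned through pointers
    /// is not counted, so this is only right for flat payloads.
    fn payload_size(&self) -> usize {
        size_of_val(self)
    }
}

/// Hypothetical payload with no heap data; the default is accurate here.
pub struct FloatPayload(pub f64);

impl PayloadSize for FloatPayload {}

/// Hypothetical integer payload that owns a digit buffer.
pub struct IntPayload {
    pub digits: Vec<u32>,
}

impl PayloadSize for IntPayload {
    /// Manual override: add the heap buffer reserved by the digit vector.
    fn payload_size(&self) -> usize {
        size_of_val(self) + self.digits.capacity() * size_of::<u32>()
    }
}
```

The override for `IntPayload` only works because the digit vector is a visible field; with a payload that hides its storage, the same problem discussed above comes back.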

@windelbouwman
Contributor

Please see also this issue: rust-num/num-bigint#98

@windelbouwman
Contributor

I propose holding off on this issue until the bigint crate has support for querying memory. It appears that there might be an option to do this in the future.

@windelbouwman
Contributor

I filed an issue for this in the rust repo: rust-lang/rust#63073

Contributor

@windelbouwman windelbouwman left a comment

Let's hold off on this change for a while, until we might get a method from the BigInt crate.

@windelbouwman
Contributor

@Askaholic do you agree with closing this pull request and creating an issue for this topic?

@Askaholic
Contributor Author

Yea, good idea!

@windelbouwman
Contributor

Changed into issue #1250

@coolreader18 coolreader18 mentioned this pull request Oct 15, 2019
35 tasks