new serialisation format #805
base: master
roaringbitmap/src/test/java/org/roaringbitmap/longlong/RoaringBitmap64SerialisationTest.java
Any opinions on the serialisation?
About this change on the serialization format of
@weijietong As the initial contributor of the 64-bit ART implementation, do you have any opinion on this matter?
(Given the amount of changes in this PR and related matters, especially on the serialization format, I wonder if it would be simpler to go with an alternative implementation than to change the existing one. The alternative implementation might at some point replace the legacy one, with additional mechanisms to manage the change of serialization format.) (@mkeskells I guess it may feel overwhelming to push such changes into such an established library, which makes some changes quite laborious; hence this suggestion of an alternative implementation. Benchmarks would remain very useful to confirm the strengths or weaknesses of the various designs, and an additional implementation may make such benchmarks easier to conduct.)
I personally did not study the ART-based serialization format. The Map-based one is pretty straightforward. My view would be to look into other implementations (CRoaring, GoRoaring), check their ART implementations (if any), and try (...) to converge on a portable format. I would be very curious to know whether the Map-based format would induce a big penalty when read from or written into ART structures. Having a dedicated ART format seems very legitimate too, though it should not be too tied to the implementation (to prevent one big issue here: changing the format because the implementation changed). I would be reluctant to skip the specification effort, as it has been demonstrated that some users rely on serialization through libraries and have strong expectations in terms of stability. I understand that your own use cases, @mkeskells, involve serialization/deserialization, presumably in long-term scenarios, hence an expectation of stability on this matter (or at least of backward compatibility).
@mkeskells Is this legit?
I only added equals for the tests. These are not user-visible classes and, like most mutable collections, they don't have a sensible hashCode. Happy to leave it as it is, or to change it to use another method and adjust the test not to use equals.
It feels awkward to have
I do not get your point. I would feel better if
Agreed: the ART serialisation is very fragile.
The issue that I see is that the current map serialisation format is based on 32-bit roaring bitmaps, which would have to be constructed and deconstructed on the fly, and the structure for that is really just a sequence of (16-bit address, container). I think that for the 64-bit roaring bitmaps it may be better to consider an interchange format based on 16-bit containers. Effectively, that could make the 32-bit and 64-bit solutions conceptually similar: for the 32-bit one the address is 16 bits, and for the 64-bit one it is 48 bits. You could potentially add different prefix sizes in the future, or when Valhalla delivers. From a quick look at the code, I think this should be easy for both of the 64-bit implementations. It would add a bit of time to serialisation and deserialisation, but (I assume) in reality both time and space are dominated by the containers.
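A minimal sketch of the kind of interchange format described above, assuming a 48-bit prefix per container and storing every container as a sorted array of 16-bit values (real Roaring containers would also use bitmap and run encodings). The class name `Prefix48Format`, the magic number, and the byte layout are all hypothetical, chosen only for illustration; this is not the format proposed in this PR, nor any specified Roaring format.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

final class Prefix48Format {
  // Hypothetical magic number for this sketch ("RB64" in ASCII); not part of any spec.
  static final int MAGIC = 0x52423634;

  // Group values into 16-bit containers keyed by their high 48 bits.
  // x >>> 16 is never negative, so the TreeMap's signed order equals unsigned order.
  static TreeMap<Long, TreeSet<Integer>> group(long[] values) {
    TreeMap<Long, TreeSet<Integer>> containers = new TreeMap<>();
    for (long x : values) {
      containers.computeIfAbsent(x >>> 16, k -> new TreeSet<>()).add((int) (x & 0xFFFF));
    }
    return containers;
  }

  // Little-endian layout: magic, container count, then per container:
  // 48-bit key (uint16 high part + uint32 low part), cardinality - 1 as uint16,
  // then the sorted 16-bit low parts (an "array container").
  static byte[] serialize(long[] values) {
    TreeMap<Long, TreeSet<Integer>> containers = group(values);
    int size = 8;
    for (TreeSet<Integer> c : containers.values()) size += 6 + 2 + 2 * c.size();
    ByteBuffer buf = ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);
    buf.putInt(MAGIC).putInt(containers.size());
    for (Map.Entry<Long, TreeSet<Integer>> e : containers.entrySet()) {
      long key = e.getKey();
      buf.putShort((short) (key >>> 32)).putInt((int) key);
      buf.putShort((short) (e.getValue().size() - 1));
      for (int low : e.getValue()) buf.putShort((short) low);
    }
    return buf.array();
  }

  // Inverse of serialize: rebuild each value as (48-bit key << 16) | 16-bit low part.
  static long[] deserialize(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
    if (buf.getInt() != MAGIC) throw new IllegalArgumentException("unknown format");
    int count = buf.getInt();
    long[] out = new long[16];
    int n = 0;
    for (int i = 0; i < count; i++) {
      long key = (Short.toUnsignedLong(buf.getShort()) << 32) | Integer.toUnsignedLong(buf.getInt());
      int card = Short.toUnsignedInt(buf.getShort()) + 1;
      for (int j = 0; j < card; j++) {
        if (n == out.length) out = Arrays.copyOf(out, 2 * n);
        out[n++] = (key << 16) | Short.toUnsignedInt(buf.getShort());
      }
    }
    return Arrays.copyOf(out, n);
  }
}
```

A 32-bit variant of the same idea would differ only in the key width (16 bits instead of 48), which is the conceptual similarity between the two solutions mentioned in the comment.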
At least for this implementation, the docs show the dragons.
Done. On checking, it wasn't used in this PR; it got copied over from the other one, but I will remove it there as well.
This comment is a bit outdated. We worked on a specification of a portable format. It is compatible with both CRoaring and GoRoaring, and compatible with Java-Roaring if a flag is toggled. It is, however, quite a burden for seemingly limited (but not non-existent) use.
This is the comment from Roaring64Bitmap. I doubt it would change from one Java version to another, but it would change with the internal details of the
Oh, OK. Then I suppose this could be easier to merge. @lemire I was not aware we were explicitly providing fewer guarantees on
Please see
@lemire It is clear to me that the Map-based 64-bit implementation has no clear serialization format (I'm the author of this part of the spec :D). What I meant here is that it feels weird that we do not provide any guarantee about the stability of the actual format, even if it is not specified (i.e. zero guarantee in terms of backward compatibility), given:
in
Also happy to work towards a format that is stable. My preference would be something based on some metadata and a sequence of the 16-bit Containers, as that seems to be the basis for everything. I am not tied to the format in this PR, so if the change was necessary but not sufficient, let's move to something that is portable, extensible, and reasonably efficient. Or have two formats: a native fast format with constraints, and a portable format that is stable.
Any more thoughts on what to do here?
We currently have one documented 64-bit format. We even have a formal Kaitai definition: https://github.com/RoaringBitmap/RoaringFormatSpec. Sadly, people have been designing their own 64-bit formats left and right in various programming languages, which makes interoperability difficult, and it is generally difficult to rely on or test any one format. I encourage everyone to adopt a common portable standard as much as possible.
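For comparison, a rough sketch of the outer layout of the documented portable 64-bit format, as we read RoaringFormatSpec: a little-endian 64-bit count of buckets, then, in increasing key order, each bucket's 32-bit key (the high 32 bits) followed by a standard 32-bit Roaring bitmap of the low 32 bits. To stay short, the inner writer only emits array containers (at most 4096 values per 16-bit key) under the no-run-container cookie; bitmap and run containers are not handled. The class and method names are hypothetical, and the specification remains the reference for every byte offset.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

final class Portable64Sketch {
  // Cookie for a 32-bit bitmap without run containers, as given in RoaringFormatSpec.
  static final int SERIAL_COOKIE_NO_RUNCONTAINER = 12346;

  // Standard 32-bit layout, array containers only:
  // cookie, container count, (key, cardinality - 1) pairs, 32-bit offsets, container data.
  static byte[] serialize32(TreeSet<Long> values) {
    TreeMap<Integer, TreeSet<Integer>> containers = new TreeMap<>();
    for (long v : values) {
      containers.computeIfAbsent((int) (v >>> 16), k -> new TreeSet<>()).add((int) (v & 0xFFFF));
    }
    int n = containers.size();
    int headerSize = 8 + 4 * n + 4 * n;
    int size = headerSize;
    for (TreeSet<Integer> c : containers.values()) {
      if (c.size() > 4096) {
        throw new UnsupportedOperationException("bitmap containers are not handled in this sketch");
      }
      size += 2 * c.size();
    }
    ByteBuffer buf = ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);
    buf.putInt(SERIAL_COOKIE_NO_RUNCONTAINER).putInt(n);
    for (Map.Entry<Integer, TreeSet<Integer>> e : containers.entrySet()) {
      buf.putShort(e.getKey().shortValue()).putShort((short) (e.getValue().size() - 1));
    }
    int offset = headerSize; // containers start right after the headers
    for (TreeSet<Integer> c : containers.values()) {
      buf.putInt(offset); // byte offset of this container from the start of the bitmap
      offset += 2 * c.size();
    }
    for (TreeSet<Integer> c : containers.values()) {
      for (int low : c) buf.putShort((short) low); // sorted 16-bit values
    }
    return buf.array();
  }

  // Portable 64-bit layout: uint64 bucket count, then per bucket a uint32 key
  // (the high 32 bits) followed by the 32-bit bitmap of the low 32 bits.
  static byte[] serialize64(long[] values) {
    TreeMap<Long, TreeSet<Long>> buckets = new TreeMap<>(); // keys in [0, 2^32): signed order is fine
    for (long x : values) {
      buckets.computeIfAbsent(x >>> 32, k -> new TreeSet<>()).add(x & 0xFFFFFFFFL);
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] count = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(buckets.size()).array();
    out.write(count, 0, count.length);
    for (Map.Entry<Long, TreeSet<Long>> e : buckets.entrySet()) {
      byte[] key = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(e.getKey().intValue()).array();
      out.write(key, 0, key.length);
      byte[] bitmap = serialize32(e.getValue());
      out.write(bitmap, 0, bitmap.length);
    }
    return out.toByteArray();
  }
}
```

The point of the sketch is the layering: the 64-bit format is just a keyed sequence of standard 32-bit bitmaps, so any implementation that can already write the 32-bit format can emit it. For real use, an existing implementation of the specification is preferable to hand-rolled code like this.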
SUMMARY
new serialisation format
It's a bit of a draft, and open for discussion. It compiles and passes the tests.
It is mostly mechanistic code taken from another PR, to keep this one simpler.
Automated Checks
I have run ./gradlew test and made sure that my PR does not break any unit test.