Remove HighLowContainer, Containers. Make LeafNode refer to a Container, rather than indexing via another component #803
base: master
remove index from ContainerWithIndex (so it is just the holder)
This is a massive PR. Difficult to review. Before doing anything of the sort, please make sure you have good benchmark results. Changing thousands of lines of code is not something that should be done lightly.
That's why I structured the commits to allow review in pieces. The first 5 commits are small enablers that are easy to review, but removing a data structure in the middle of the implementation had knock-on effects. The 6th commit is big, though.
I did try to discuss this as a design issue, but all I got was "you need benchmarks", with no design discussion. As I said in the summary, I will add benchmarks shortly. Benchmarks don't run on a code concept; they require an implementation. I offered to benchmark a slice, but that was rejected.
@mkeskells You need to sync with the main branch. @blacelle @richardstartin and others: would you chime in?
I have a general bias against far-reaching pull requests. My bias is not unusual, nor is it controversial. Please read again what I wrote: "This is a massive PR. Difficult to review. Before doing anything of the sort, please make sure you have good benchmark results. Changing thousands of lines of code is not something that should be done lightly." Let us go back to how you justify it: "It speed up access, by removing a layer if indirection on access to most operation". These are two objective and important statements. I am asking you to document and justify these claims.
Marking as draft, as I will not have time for this PR (to sync up, tidy up or benchmark) for at least this week.
The majority of the new code, other than tests, is for the change to the serialisation format (I think; I haven't counted, though). That change is required by this PR but could be implemented against the current main branch. It's not entirely separate, but mostly. This would make the serialised form smaller and less tied to the internal memory structure than the current serialised form. The change is largely orthogonal, but dependent. It would also make the serialised form deterministic, based only on the content. That isn't the case currently, because the memory leaks in Containers are reflected in both the serialised form and the dehydrated form. The numbers are easier to review as well (serialised size, time, garbage). Retained size should be unaffected.
Changing the serialization format certainly adds friction to this PR. Do you confirm that this PR breaks binary compatibility? Or does it change the binary output because of the optimizations, while keeping compatibility? #598 refers to some matters around the Roaring64 serialization format (it is mostly focused on the Map-based implementation, but it refers to the ART-based ones too).
A new format, or another format that serialises just the required data, could be made. I haven't looked at the NavigableMap-based format, but will have a look. The format that I proposed here (though I'm not too fussed about it) retains the trie structure. It is therefore implementation-dependent, because we could change the tree. It doesn't contain the internal fields of the implementation: just each node's prefix, its count of children, and its key/child array.
I would imagine that having a portable format would be useful, but a fast format also has some attractions. It's not black and white, and maybe there is a need for both. Is there a specific view of what the format should be?
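Here is a minimal sketch of a format along these lines. It uses a simplified, hypothetical `Node` type rather than the real ART node classes. Each node is written as its prefix, then its child count, then its key/child pairs, recursing into the children. A leaf writes a single `long` as a stand-in for its Container payload. The class, field and method names are illustrative only and are not taken from the PR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class TrieFormatSketch {
  /** Hypothetical, simplified trie node; not the real org.roaringbitmap.art classes. */
  static final class Node {
    final byte[] prefix;   // compressed path bytes (assumed shorter than 256)
    final byte[] keys;     // one key byte per child; empty for a leaf
    final Node[] children; // same length as keys; empty for a leaf
    final long payload;    // leaf-only stand-in for a serialised Container

    Node(byte[] prefix, byte[] keys, Node[] children, long payload) {
      this.prefix = prefix;
      this.keys = keys;
      this.children = children;
      this.payload = payload;
    }

    boolean isLeaf() {
      return children.length == 0;
    }
  }

  /** Writes prefix, child count, then key/child pairs (or the payload for a leaf). */
  static void write(Node node, DataOutput out) throws IOException {
    out.writeByte(node.prefix.length);
    out.write(node.prefix);
    out.writeShort(node.children.length); // up to 256 children, so one byte is not enough
    if (node.isLeaf()) {
      out.writeLong(node.payload);
      return;
    }
    for (int i = 0; i < node.children.length; i++) {
      out.writeByte(node.keys[i]);
      write(node.children[i], out);
    }
  }

  /** Reads a node written by write(); a child count of zero marks a leaf. */
  static Node read(DataInput in) throws IOException {
    byte[] prefix = new byte[in.readUnsignedByte()];
    in.readFully(prefix);
    int count = in.readUnsignedShort();
    if (count == 0) {
      return new Node(prefix, new byte[0], new Node[0], in.readLong());
    }
    byte[] keys = new byte[count];
    Node[] children = new Node[count];
    for (int i = 0; i < count; i++) {
      keys[i] = in.readByte();
      children[i] = read(in);
    }
    return new Node(prefix, keys, children, 0L);
  }

  public static void main(String[] args) throws IOException {
    Node leafA = new Node(new byte[] {1, 2}, new byte[0], new Node[0], 42L);
    Node leafB = new Node(new byte[] {3}, new byte[0], new Node[0], 7L);
    Node root = new Node(new byte[] {9}, new byte[] {10, 20}, new Node[] {leafA, leafB}, 0L);

    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    write(root, new DataOutputStream(bytes));
    Node copy = read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    // prints "31 bytes, second leaf payload 7"
    System.out.println(bytes.size() + " bytes, second leaf payload " + copy.children[1].payload);
  }
}
```

Only prefixes, counts, keys and payloads are written, so slot bookkeeping never leaks into the bytes. The trade-off is that the output still mirrors the shape of the trie.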
I copied the serialisation format to #805. I will check it in more detail later, but I'm sharing it for the conversation; it's very much a draft.
Sorry, I thought this was obvious. I have mentioned this memory leak many times in our conversations, and I presumed it was well known and accepted. Every time a LeafNode is removed (e.g. because it is empty), Containers nulls the reference but doesn't reclaim or reuse that slot.
In terms of access, most operations that need to do any bitmap operation traverse this graph of objects:
Roaring64Bitmap -> HighLowContainer -> Art -> Node ... LeafNode -> [then, to access the container] highLowContainer -> Containers -> array -> Container
This can be shortened to:
Roaring64Bitmap -> Art -> Node ... LeafNode -> Container
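To make the difference in hops concrete, here is a minimal sketch with hypothetical stand-in classes, not the real `LeafNode`, `Containers` or `Container`. In the current layout, the leaf stores an index and resolves it through a side table. In the proposed layout, the leaf stores the container reference itself. The real `Containers` uses a two-level array (`firstLevelIdx`/`secondLevelIdx`), which this sketch flattens to a single list.

```java
import java.util.ArrayList;
import java.util.List;

public class LeafIndirectionSketch {
  /** Stand-in for a Roaring Container. */
  static final class Container {
    final int cardinality;

    Container(int cardinality) {
      this.cardinality = cardinality;
    }
  }

  /** Stand-in for Containers: a side table of slots addressed by index. */
  static final class ContainerTable {
    private final List<Container> slots = new ArrayList<>();

    long add(Container c) {
      slots.add(c);
      return slots.size() - 1;
    }

    Container get(long idx) {
      return slots.get((int) idx);
    }
  }

  /** Current layout (simplified): the leaf only knows an index into the table. */
  static final class IndexedLeaf {
    final long containerIdx;

    IndexedLeaf(long containerIdx) {
      this.containerIdx = containerIdx;
    }
  }

  /** Proposed layout (simplified): the leaf holds the container directly. */
  static final class DirectLeaf {
    final Container container;

    DirectLeaf(Container container) {
      this.container = container;
    }
  }

  /** leaf -> table -> backing array -> container */
  static Container lookup(IndexedLeaf leaf, ContainerTable table) {
    return table.get(leaf.containerIdx);
  }

  /** leaf -> container */
  static Container lookup(DirectLeaf leaf) {
    return leaf.container;
  }

  public static void main(String[] args) {
    ContainerTable table = new ContainerTable();
    Container c = new Container(3);
    IndexedLeaf indexed = new IndexedLeaf(table.add(c));
    DirectLeaf direct = new DirectLeaf(c);
    System.out.println(lookup(indexed, table) == lookup(direct)); // true
  }
}
```

Once the leaf owns the reference, any code bound to the side-table layout has to change. Serialisation is one example.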
I do not consider this a memory leak. If we were to consider that it is a memory leak, then you'd have to consider the standard HashMap in Java to be plagued by memory leaks. The HashMap implementation does not include any mechanism to reduce the capacity of the table when keys are removed. The resize() method is designed to either initialize the table or increase its size (typically doubling it). There is no code path that creates a smaller table when the number of entries decreases. This does not mean that the current design cannot be improved, of course.
I get that your claim is that you can remove an indirection and thus improve the performance. However, if that is your claim, then you should be able to support it with benchmark results.
IMHO you have a very narrow definition of a memory leak. Let's look at an "accepted" definition of a memory leak (https://en.wikipedia.org/wiki/Memory_leak): "In computer science, a memory leak is a type of resource leak that occurs when a computer program incorrectly manages memory allocations in a way that memory which is no longer needed is not released". That matches the situation described pretty much exactly. It's also not an equivalent comparison. Repeated addition and removal in a HashMap doesn't grow memory use (it is capped at the maximum size). Repeated addition and removal in this structure leaks more and more memory. Serialisation and deserialisation clean up the memory for a HashMap; they don't for this structure.
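The difference being argued can be modelled in a few lines. This is a minimal sketch with hypothetical classes, not the real `Containers`.

- `AppendOnlyTable` follows the behaviour described in this thread: removal nulls the slot, and every add takes a fresh slot at the end.
- `ReusingTable` recycles freed slots through a free list. This caps growth at the peak number of live entries, much like the HashMap bound.

This is only a model for comparison. The PR itself takes a different route and removes the side table entirely.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class SlotReuseSketch {
  /** Removal nulls the slot; every add appends a fresh slot at the end. */
  static class AppendOnlyTable {
    final List<Object> slots = new ArrayList<>();

    int add(Object o) {
      slots.add(o);
      return slots.size() - 1;
    }

    void remove(int idx) {
      slots.set(idx, null);
    }

    int slotCount() {
      return slots.size();
    }
  }

  /** Removal pushes the slot onto a free list; add reuses a free slot first. */
  static class ReusingTable extends AppendOnlyTable {
    final Deque<Integer> free = new ArrayDeque<>();

    @Override
    int add(Object o) {
      Integer idx = free.poll(); // null when no slot has been freed
      if (idx == null) {
        return super.add(o);
      }
      slots.set(idx, o);
      return idx;
    }

    @Override
    void remove(int idx) {
      super.remove(idx);
      free.push(idx);
    }
  }

  /** Add-then-remove churn: at most two entries are ever live at once. */
  static int churn(AppendOnlyTable table, int cycles) {
    int live = table.add("first");
    for (int i = 0; i < cycles; i++) {
      int next = table.add("entry " + i);
      table.remove(live);
      live = next;
    }
    return table.slotCount();
  }

  public static void main(String[] args) {
    System.out.println("append-only slots: " + churn(new AppendOnlyTable(), 100_000)); // 100001
    System.out.println("reusing slots:     " + churn(new ReusingTable(), 100_000));    // 2
  }
}
```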
As I said in my initial post, and subsequently, I will provide benchmarks (I agree this is more than a few days away) when I have time and access to a machine that is stable. Laptops are not suitable for CPU comparisons due to thermal management and other issues, and this is not my day job. It's not a focus for me, as this is not a library that I will be using: I have a smaller, faster, closed-source, direct-mapped implementation that meets my much narrower needs.
I have not analyzed the potential memory leak. @mkeskells Could you provide a unit test demonstrating the issue, so we can agree whether the current behavior is acceptable or not (independently of any design analysis)? Similarly to benchmarks on the performance aspect, we're just expecting factual elements before moving forward on such evolutions. Such a unit test would also prevent future regressions (standard software practice).
I can provide an example. Not a unit test (as there isn't a pass/fail), but some code that demonstrates one of the causes.
Anything reproducing the behavior would be useful. Thanks
Some performance results. Some of the results are a little out of step, so more digging may be needed (e.g. addEachMising(10k, 10k, true)), but they are generally positive. For each case there is a summary, a CSV, and the full text in the gists below, as a comparison of the effect of the PR.
Here is a simple example where a bitmap containing one bit consumes the whole heap:

```java
import org.roaringbitmap.art.Containers;
import org.roaringbitmap.longlong.HighLowContainer;
import org.roaringbitmap.longlong.Roaring64Bitmap;

import java.lang.reflect.Field;

public class HighLowMemory {
  public static void main(String[] args) throws Exception {
    new HighLowMemory().run();
  }

  // Reflective handles on private fields, used only to report how many slots Containers has handed out
  private final Field highLow;
  private final Field containers;
  private final Field containers_containerSize;
  private final Field containers_firstLevelIndex;
  private final Field containers_secondLevelIndex;

  HighLowMemory() throws Exception {
    highLow = Roaring64Bitmap.class.getDeclaredField("highLowContainer");
    highLow.setAccessible(true);
    containers = HighLowContainer.class.getDeclaredField("containers");
    containers.setAccessible(true);
    containers_containerSize = Containers.class.getDeclaredField("containerSize");
    containers_containerSize.setAccessible(true);
    containers_firstLevelIndex = Containers.class.getDeclaredField("firstLevelIdx");
    containers_firstLevelIndex.setAccessible(true);
    containers_secondLevelIndex = Containers.class.getDeclaredField("secondLevelIdx");
    containers_secondLevelIndex.setAccessible(true);
  }

  public void run() throws Exception {
    Roaring64Bitmap rb = new Roaring64Bitmap();
    for (long i = 0; i < 100; i++) {
      for (long j = 0; j < 10000000; j++) {
        // Each value lands under a new high key (j << 16), and the value added in the
        // previous step is removed, so the bitmap only ever holds one or two values
        rb.add(j << 16);
        rb.remove((j - 1) << 16);
      }
      System.out.println("After cycle " + i + " size: " + rb.getLongCardinality());
      System.out.println("slots used " + getSlotsUsed(rb));
    }
  }

  private String getSlotsUsed(Roaring64Bitmap rb) throws Exception {
    Containers container = (Containers) containers.get(highLow.get(rb));
    return "containerSize=" + containers_containerSize.getLong(container) +
        ", firstLevelIdx=" + containers_firstLevelIndex.getInt(container) +
        ", secondLevelIdx=" + containers_secondLevelIndex.getInt(container);
  }
}
```

produces:

```
After cycle 0 size: 1
slots used containerSize=1, firstLevelIdx=0, secondLevelIdx=9999999
After cycle 1 size: 1
slots used containerSize=1, firstLevelIdx=0, secondLevelIdx=19999998
After cycle 2 size: 1
slots used containerSize=1, firstLevelIdx=0, secondLevelIdx=29999997
After cycle 3 size: 1
...
slots used containerSize=1, firstLevelIdx=0, secondLevelIdx=439999956
After cycle 44 size: 1
slots used containerSize=1, firstLevelIdx=0, secondLevelIdx=449999955
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.base/java.lang.reflect.Array.newArray(Native Method)
	at java.base/java.lang.reflect.Array.newInstance(Array.java:78)
	at java.base/java.util.Arrays.copyOf(Arrays.java:3721)
	at java.base/java.util.Arrays.copyOf(Arrays.java:3689)
	at org.roaringbitmap.art.Containers.grow(Containers.java:184)
	at org.roaringbitmap.art.Containers.addContainer(Containers.java:107)
	at org.roaringbitmap.longlong.HighLowContainer.put(HighLowContainer.java:78)
	at org.roaringbitmap.longlong.Roaring64Bitmap.addLong(Roaring64Bitmap.java:67)
	at org.roaringbitmap.longlong.Roaring64Bitmap.add(Roaring64Bitmap.java:1028)
	at HighLowMemory.run(HighLowMemory.java:37)
	at HighLowMemory.main(HighLowMemory.java:9)
```
After looking at the inconsistent results, it seems that roughly 1 run in 20 of one of the benchmarks is out of line with the norm. I expect the JIT made a different (better or worse) choice in those runs.
Did anyone have a chance to look at this?
```java
    1 + //runOptimized
    4 + //bitDepth
    this.ebM.serializedSizeInBytes();
if (size <= Integer.MAX_VALUE) {
```
You may want to also rule out the case where size becomes negative.
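Here is one way to address the review comment, as a sketch. The helper name and parameters are hypothetical, and the header bytes mirror the `1 + //runOptimized` and `4 + //bitDepth` terms in the snippet. Summing in `long` with `Math.addExact` guards against overflow of the sum itself. Checking both bounds also rejects a negative total, for example when a payload size computed as an `int` has already wrapped around.

```java
public class SizeCheckSketch {
  /**
   * Hypothetical helper: adds the header and payload sizes and returns the total
   * as an int only when it lies within [0, Integer.MAX_VALUE].
   */
  static int checkedSerializedSize(long headerBytes, long payloadBytes) {
    long size = Math.addExact(headerBytes, payloadBytes); // throws ArithmeticException on long overflow
    if (size < 0 || size > Integer.MAX_VALUE) {
      throw new IllegalStateException("serialized size out of int range: " + size);
    }
    return (int) size;
  }

  public static void main(String[] args) {
    long header = 1 + 4; // runOptimized flag + bitDepth, as in the reviewed snippet
    System.out.println(checkedSerializedSize(header, 1_000)); // 1005
    for (long payload : new long[] {Integer.MAX_VALUE, -100}) {
      try {
        checkedSerializedSize(header, payload);
      } catch (IllegalStateException e) {
        System.out.println("rejected: " + e.getMessage());
      }
    }
  }
}
```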
SUMMARY
Simplify and streamline access to a Container by having it held directly by a LeafNode, rather than via a reference to the container in another structure.
This also removes the inherent unbounded memory leaks in HighLowContainer, and the retained memory (EDIT: example in the comments).
It speeds up access by removing a layer of indirection for most operations (forEach is probably the only operation that isn't faster, and it may be slightly slower).
As serialisation was bound to this structure, the serialisation & deserialisation structure and code were rewritten.
The format of the serialised form changes (it's smaller).
The serialised form adds a version number, and old formats will fail with a message indicating that the format isn't correct. This also allows for future versions if the format changes again.
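Here is a minimal sketch of such a version check, assuming the version is written as a leading `int`. The constant, the method names and the message are hypothetical, not the PR's code. Data in an older, unversioned format is only rejected reliably if its leading bytes can never equal the chosen version value, so that is worth checking when picking it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class VersionedFormatSketch {
  /** Hypothetical version number; the value used by the PR is not stated here. */
  static final int FORMAT_VERSION = 2;

  /** Writes the version first so readers can reject data they do not understand. */
  static void writeHeader(DataOutput out) throws IOException {
    out.writeInt(FORMAT_VERSION);
  }

  /** Reads and validates the version, failing with an explicit message on mismatch. */
  static void readHeader(DataInput in) throws IOException {
    int version = in.readInt();
    if (version != FORMAT_VERSION) {
      throw new IOException("Unsupported serialised format version " + version
          + " (expected " + FORMAT_VERSION + "); the data may be in an older format");
    }
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream current = new ByteArrayOutputStream();
    writeHeader(new DataOutputStream(current));
    readHeader(new DataInputStream(new ByteArrayInputStream(current.toByteArray())));
    System.out.println("current version accepted");

    ByteArrayOutputStream old = new ByteArrayOutputStream();
    new DataOutputStream(old).writeInt(7); // stands in for the leading bytes of data in another layout
    try {
      readHeader(new DataInputStream(new ByteArrayInputStream(old.toByteArray())));
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
  }
}
```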
For serialisation and deserialisation, the code was moved closer to the abstraction being handled, except for the Containers. I didn't want to widen the scope of this PR to changing the 32-bit structure and code.
I think that some code duplication could be removed in a follow-up, as the DataInput/DataOutput and ByteBuffer code are similar and could be unified. However, IMO that would be too much for the scope of this PR.
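Here is one possible way to unify the two code paths, sketched under assumptions rather than taken from the PR. The serialisation logic is written once against `java.io.DataOutput`, and a `ByteBuffer` is adapted to an `OutputStream` so that `DataOutputStream` can write into it. The adapter class and the `serialize` method are hypothetical. Bytes are always written in `DataOutput`'s big-endian order, whatever order the buffer is set to. This matches the default order of a freshly allocated `ByteBuffer`.

```java
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

public class ByteBufferDataOutputSketch {
  /** Hypothetical adapter: lets a DataOutputStream write straight into a ByteBuffer. */
  static final class ByteBufferOutputStream extends OutputStream {
    private final ByteBuffer buffer;

    ByteBufferOutputStream(ByteBuffer buffer) {
      this.buffer = buffer;
    }

    @Override
    public void write(int b) {
      buffer.put((byte) b);
    }

    @Override
    public void write(byte[] b, int off, int len) {
      buffer.put(b, off, len);
    }
  }

  /** Hypothetical serialisation routine, written once against DataOutput. */
  static void serialize(int cardinality, long highKey, DataOutput out) throws IOException {
    out.writeInt(cardinality);
    out.writeLong(highKey);
  }

  public static void main(String[] args) throws IOException {
    ByteBuffer buffer = ByteBuffer.allocate(12);
    serialize(3, 0x1234L, new DataOutputStream(new ByteBufferOutputStream(buffer)));
    buffer.flip();
    System.out.println(buffer.getInt() + " " + buffer.getLong()); // 3 4660
  }
}
```

The cost is an extra virtual call per write, which may matter on hot paths. A too-small buffer also surfaces as a `BufferOverflowException` rather than an `IOException`.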
EDIT: Benchmarks and raw results are now included as a comment.
I have added tests for the serialisation format.
Automated Checks
I have run `./gradlew test` and made sure that my PR does not break any unit test. As this is a larger PR, I have structured the commits to allow review of some of the enabling changes prior to the bulk of the change. Each commit was initially tested against the unit tests in the project.