Implement new snapshotting system with support for elastic memory growth via mmap()
#277
|
The |
|
@mcevoypeter I don't think this is related to your changes. #255 is now hitting this too, even though it hasn't touched anything that ought to cause such errors. I tried adding a commit that removes pre-installed libs from the GitHub runner image, but it didn't help. I also tried using this GitHub workflow action, but it's not on the approved list and so caused the workflow to fail. The failures might have something to do with the outage GitHub experienced yesterday? |
|
@ashelkovnykov The |
|
Hi Peter, this is neat! I've been working on fixing the bugs we have in our current snapshot system (namely the strange page issue [we merged a PR for this] and when we crash after a guard page re-centering error [still WIP]). If you see #234 you can see that a reliable reproduction for the guard page crash is to boot a ship with I went ahead and ran the same process with a fakezod using a binary built with this branch Is this PR meant to fix this specific issue? Perhaps the issue isn't related to the snapshot system at all? I haven't yet determined the root cause of this crash in particular, but I'm also still deep in my inspection of backtraces. What do you think? |
|
@matthew-levan this PR does not address #234, but instead aims to reduce memory pressure on the host system by the runtime by |
|
The PR description previously mentioned that the guard page had been removed temporarily. That's no longer true; I reimplemented the guard page today. |
|
@mcevoypeter, I just tried running the tests on my M1 and got this: |
|
I've made the following changes after a live code review with @ashelkovnykov, @joemfb, and @philipcmonk:
|
|
@matthew-levan I suspect the issue you're seeing is related to the fixed address |
Not being snarky - anywhere where the comment is phrased as a question I'm genuinely unsure and would like your opinion, since you are more familiar with Vere and C than I.
pkg/pma/pma.c
Outdated
    // CONSTANTS

    /// Number of bits in a byte.
    static_ const size_t kBitsPerByte = 8;
Somewhat unrelated, but where does this style of "lower camel case" for some constants come from?
All constants in pkg/pma are prefixed with a k and written in upper camel case to distinguish them from other identifiers; if you see an identifier starting with k in upper camel case, you know it's a constant and therefore will never change, reducing the amount of information you need to keep in your head (such as "where might this value get changed?").
Why not K_BITS_PER_BYTE, in that case? Upper snake case is widely used for constants and the K_ can still represent that it's defined in pkg/pma.
It's purely stylistic. We can bikeshed endlessly here. The point is to clearly communicate that a given symbol represents a constant value.
I am strongly in favor, for what it's worth, of deciding on a single style and configuring clang-format and clang-tidy (or similar) to programmatically enforce that style. In lieu of that, which has met with resistance in the past, I'm simply using a style that I've found in my experience is most clear and straightforward in communicating the intentions of C code.
I'm also curious why you're opposed to linters.
@barter-simsum Also curious what's wrong with linters / auto-formatting.
@mcevoypeter re: bike-shedding - Sure, you should consider this conversation non-blocking for the purposes of merging. It just seems odd, given a professed preference for idiomatic C, not to declare system-level constants as macros and use the standard macro naming conventions for them.
If the linter were as unobtrusive as, say, the Linux kernel's checkpatch, I wouldn't mind it much, but most are not. Note the disclaimer:
"Checkpatch is not always right. Your judgement takes precedence over checkpatch messages. If your code looks better with the violations, then its probably best left alone."
Highly opinionated linters really rose in popularity with javascript (sacred airbnb style etc). Prior to that, style guidelines were often just that, guidelines, not law. My experience with these types has been overwhelmingly negative. Code run through these things is often not more readable, just mechanistically more consistent. I also dislike the comment litter they promote if you want to disable certain rules within a region of a file.
Strict characters per line, where I should and shouldn't put a comma, opinions on variable names, and var alignment, braces always or braces never, can't use 0xcafebabe (lol) literals and similar, etc etc ad nauseam -- all of this is just hell, and then we hook it into the build system.
I don't care if you use a linter to review your own code. They might help find conditions to collapse, unclear conditional expressions, etc, but I definitely don't want it enshrined in the build.
And no, I don't think they reduce "bikeshedding." We just bikeshed about the linter rules instead of the code change itself.
No codebase can meet the style preferences of everyone working on it if there are even two developers. Even if there's only one developer, it still won't meet the style preferences of everyone reading it. The only thing that you can hope for is consistency.
Having some auto-formatter or linter removes this factor from consideration; it removes an axis on which to bikeshed. You're right that there's still one ultimate bikeshed which is "what do we establish the rules to be?". However, it's a much easier problem with which to grapple once the system is in place: just make the top-level code owner for each lib supreme styling dictator.
> I don't care if you use a linter to review your own code... but I definitely don't want it enshrined in the build.
I think that we might be talking past each other here somewhat. I don't think anyone is pitching to connect a strict linter to the build workflow. What I'm hearing pitched is the inclusion of a clang-format config file in which the ultimate style rules for the component reside. This is to be used as a helper tool, so that the developer can write his code however he wants and run this tool once at the end before submitting PR. Maybe I'm the one who's misinterpreting what @mcevoypeter and @matthew-levan would like to have.
> My experience with these types has been overwhelmingly negative. Code run through these things is often not more readable, just mechanistically more consistent.
We're just going to have to agree to disagree here. I found this PR very readable.
> strict characters per line, where I should and shouldn't put a comma, opinions on variable names, and var alignment, braces always or braces never, can't use 0xcafebabe literals and similar, etc etc ad nauseam
We already have most of these things in u3, they're just manually enforced so contributors have to actively think about them instead of knowing that there's an auto-format tool that'll do it for them. Also, just because Rust chose stupid rules doesn't mean we have to.
> The only thing that you can hope for is consistency
Readability. Not consistency. The former is not context free. The latter is what linters enforce.
> I don't think anyone is pitching to connect a strict linter to the build workflow
This is exactly the case in new mars now (recently). I don't think intent to integrate linting into the bazel build of vere is at all an unreasonable assumption. zorp-corp/sword#39
> We're just going to have to agree to disagree here. I found this PR very readable
After some of the style changes Peter made, outside of a few extra long functions (made longer by 1 arg per line funcall formatting), I have few complaints about the general style of this PR. The below style I find slightly hard to read, but I didn't comment on it, because I really don't care. I do not, however, want such a style autistically enforced be it programmatic or informal.
    if (mmap(ptr,
             kPageSz,
             PROT_READ,
             MAP_FIXED | MAP_PRIVATE,
             fd,
             offset_)
        == MAP_FAILED)
> We already have most of these things in u3
We have style guidelines in u3, you're free to break them if reasonable. See some of the formatting of the pointer compression pr: https://github.com/urbit/vere/pull/164/files#diff-880188529bb675cec6511e9d295a25921cd0b0f95d7c7c4c14bb2c2dbbd2d6f7R2074.
> just make the top-level code owner for each lib supreme styling dictator
This encourages a proliferation of libs. We have too many Caesars already. There should just be one benevolent vere dictator. No one claims this title afaik, could be @joemfb or @belisarius222. (Or a state/deepstate type deal like with Linus and gregkh)
My biggest overall complaint with any of these efforts is that they just get in the way of writing code.
To quote Ryan Dahl: "If you think it would be cute to align all of the equals signs in your code, if you spend time configuring your window manager or editor, if put unicode check marks in your test runner, if you add unnecessary hierarchies in your code directories, if you are doing anything beyond just solving the problem - you don't understand how fucked the whole thing is"
Please don't add any more tools that yell at me, especially for trivial syntactical reasons. Please, in code reviews, avoid commenting on style, except for truly odd choices (like the _ suffix style which was addressed). Basic consistency is easily maintained. Who is harmed by slight deviations: a mul * unwrapped in spaces, a 1-line condition without swaddling braces, a switch case that allows fall through, etc?
I won't discuss this further in github comments. It's not relevant to this pr. We can settle it on the field with pistols. @joemfb is my second. Name yours ;)
More on this-- I am running the test again with |
The pma code is obviously clear and well-documented. Nice work there, I was able to fairly quickly read it and understand the system (I did gloss over verification of the bit arithmetic though).
First, a question on style: How does this code fit in with the rest of the codebase? The PMA is written in idiomatic C, and vere is obviously far from that. Does having both idiomatic C and c3-style C in the same codebase introduce confusion? Genuinely curious about these thoughts from those with more experience with this codebase.
On design, will you please summarize your approach vs. the demand paging implementation in 1.14? It'd be nice to understand the differences and corresponding justifications/arguments.
Lastly, it's clear that this (more or less) works as intended: a few minutes after boot, a fresh fakezod will reduce its memory footprint to a mere ~75MB, which is quite an improvement! Neat.
The PMA is in a separate directory than the other directories which use
Nice! On a similar note, I'm running my planet and star with the changes, and I saw the memory usage of both drop from ~450MB to ~150MB upon switching when running |
eedc798 to
436f900
Compare
|
Awesome thanks for elaborating @mcevoypeter. I think this work has a lot of merits. What's left? |
#293 needs to be reviewed and then merged. The suggested change only has the potential to adversely affect the sampling profiler, which is only used in development, so it's low risk in my opinion. |
I really don't think there's anything more aggressive than It's also possible that this is too aggressive and madvising with |
This reverts commit 3f4a31e.
After our group review, I've made a few fixes:

- removed the `MADV_DONTNEED` immediately after mmap in pma_load. mmap doesn't make anything resident, so it's unnecessary.
- removed the redundant `PS_MAPPED_INACCESSIBLE` state and instead check the guard page address, which we store anyway.
- removed `PS_UNMAPPED` now that everything is always mapped.
- This reduces the number of page states to two: clean or dirty. The default state is dirty, so that if we unexpectedly fault on a page (perhaps due to bad initialization), we will crash in `_handle_page_fault`.
- At the end of pma_sync, we no longer loop to mark those pages as clean. It doesn't matter what state they're in, since `_append_dirty_pages` is bounded by the actual size of the heap/stack.
- Instead, we `MADV_DONTNEED` the ephemeral space. This allows us to reclaim ephemeral memory on sync. As noted in the comment, this is likely fast enough to do after every event, at least if we use `MADV_FREE`, which is lazy. For now, we use the strict `MADV_DONTNEED` to make it easier to observe its behavior.

This last change fixes the issue @mopfel-winrux reported here: #277 (comment)
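As a rough illustration of the last fix, the sketch below dirties an anonymous mapping and then returns its pages to the kernel with `madvise()`, keeping the mapping itself in place. The function name `reclaim_ephemeral_demo` and the use of a fresh anonymous mapping as a stand-in for the ephemeral space are assumptions made for this example; this is not the code in pma_sync.

```c
#define _DEFAULT_SOURCE  // exposes madvise() and MAP_ANONYMOUS on glibc
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: dirty `page_cnt` pages of scratch memory, then hand
// the physical pages back to the kernel while keeping the mapping itself.
// Returns 0 on success, -1 on failure.
int
reclaim_ephemeral_demo(size_t page_cnt)
{
  assert(page_cnt > 0);
  long page_sz = sysconf(_SC_PAGESIZE);
  if (page_sz <= 0) return -1;
  size_t len = page_cnt * (size_t)page_sz;

  // Anonymous private mapping standing in for the ephemeral region.
  char *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (base == MAP_FAILED) return -1;

  // Touch every page so that it becomes resident.
  memset(base, 0xab, len);

  // Strict variant: ask the kernel to drop the pages now. MADV_FREE (where
  // available) would instead let the kernel reclaim them lazily.
  int ret = madvise(base, len, MADV_DONTNEED);

  // The range is still mapped, so it remains safe to touch after the call.
  if (ret == 0) base[0] = 1;

  if (munmap(base, len) == -1) return -1;
  return ret == 0 ? 0 : -1;
}
```

Calling `reclaim_ephemeral_demo(4)` and watching the process's resident set size before and after the `madvise()` call is one way to observe the effect described above.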
|
After our group review, I've made a few fixes:
This last change greatly reduces the average memory use of the process by fixing the issue @mopfel-winrux reported here: #277 (comment) I believe these are all the blocking changes we identified, but we did not finish walking through the code. @joemfb and everyone else, let's continue that soon. As a sanity benchmark, I tried refreshing groups (from a few weeks ago, before recent performance improvements), counting only 2nd and subsequent refreshes, both with and without memory pressure (running another ship on the same machine using 2.5G/3.7G, which uses 700MB swap on master). The margin of error is pretty high, but all results were between 15 and 22 seconds, with no discernable pattern (eg master under memory pressure gave both 16s and 22s). Thus, I don't believe this introduces any significant slowdown. |
|
We used to write the snapshots to .bhk every time we -- IGNORE misread implementation of |
|
When did we start doing that? I remember bhk being used exclusively for when you upgraded to v1.8 (or thereabouts), and then it got repurposed to be used every chop, which seems proper. I don't remember it happening more often than that, and I feel like you don't want it to -- if you're constantly backing up, then unless corruption is caught immediately (in which case you don't need the backup), your backup will be corrupted. |
|
misread the u3e_backup implementation. Looks like 3c13a6d added a call to origin/develop has some changes from Matt that changed |
Notes from a review of the WAL:
            strerror(errno));
        exit(ECANCELED);
      }
    }
This should be removed, it's not safe to unconditionally take a backup on startup (it may be corrupted).
    /// Global checksum.
    uint64_t global_checksum;
    /// WAL version number.
    uint64_t version;
The version number should be the first member of the struct.
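One way to read this suggestion: if the version always sits at offset 0, a reader can inspect it before assuming anything about the rest of the layout. The sketch below is a minimal illustration; `example_hdr_t` and `peek_version` are hypothetical names, and the real header may have more members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical header layout with the version as the first member. The field
// names mirror the quoted diff; the real struct may contain more members.
typedef struct {
  uint64_t version;          // at offset 0, so it can be read first
  uint64_t global_checksum;  // later members may differ between versions
} example_hdr_t;

// Read the version from the first bytes of a raw header buffer without
// assuming anything about the rest of the layout. Returns 0 on success and
// -1 if the buffer is too short to contain a version.
int
peek_version(const void *buf, size_t buf_len, uint64_t *version)
{
  assert(buf != NULL && version != NULL);
  if (buf_len < sizeof(*version)) {
    return -1;
  }
  memcpy(version, buf, sizeof(*version));  // host byte order assumed
  return 0;
}
```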
    // Don't include the header length in the entry count calculation.
    if (meta_len > 0) {
      meta_len -= sizeof(_metadata_hdr_t);
This could underflow.
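The `meta_len > 0` check only rules out an empty file; a non-zero length shorter than the header would still produce a wrapped (or negative) result. A minimal sketch of a guarded subtraction follows, assuming an unsigned length; `payload_length` is a hypothetical helper, not a function from the PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical helper: compute the number of payload bytes that follow a
// header of `hdr_sz` bytes in a file of `file_len` bytes. Returns false
// (leaving *payload_len untouched) if the file is too short for the header.
bool
payload_length(size_t file_len, size_t hdr_sz, size_t *payload_len)
{
  assert(payload_len != NULL);
  if (file_len < hdr_sz) {
    return false;  // subtracting here would wrap around
  }
  *payload_len = file_len - hdr_sz;
  return true;
}
```

A caller would treat a `false` return as a truncated or corrupt metadata file rather than computing an entry count from it.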
              strerror(err));
      goto fail;
    }
    assert(hdr.version == kWalVersion);
Need an error message on version mismatch.
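For context, `assert()` is compiled out when `NDEBUG` is defined and, when it does fire, reports only the failed expression rather than the versions involved. The sketch below shows one possible shape for an explicit check; `check_wal_version` and the message format are assumptions for illustration only.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper: compare the version read from disk against the version
// this binary understands. On mismatch, write a diagnostic into `msg` and
// return -1 so the caller can take its normal failure path.
int
check_wal_version(uint64_t found, uint64_t expected, char *msg, size_t msg_sz)
{
  assert(msg != NULL && msg_sz > 0);
  if (found == expected) {
    msg[0] = '\0';
    return 0;
  }
  snprintf(msg, msg_sz,
           "wal: version mismatch: found %" PRIu64 ", expected %" PRIu64,
           found, expected);
  return -1;
}
```

In the quoted code, the caller would presumably print the message to stderr and `goto fail` instead of asserting.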
    }

    // Seek past the global checksum to the first metdata entry.
    if (lseek(wal->meta_fd, sizeof(wal->checksum), SEEK_SET) == (off_t)-1) {
This should be sizeof the metadata header.
      return -1;
    }

    char page[kPageIdxSz + kPageSz];
This declaration is vestigial.
    _page_checksum(ssize_t pg_idx, const char pg[kPageSz])
    {
      uint64_t pg_idx_checksum = 0;
      MurmurHash3_x86_32(&pg_idx, kPageIdxSz, kSeed, &pg_idx_checksum);
The index can be included directly without hashing it first.
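A sketch of what "included directly" could look like: the raw index bytes go into the same hash stream as the page contents, with no separate hash of the index. FNV-1a is used here purely as a small, dependency-free stand-in for the MurmurHash3 call in the diff; the names `page_checksum`, `fnv1a_update`, and `EX_PAGE_SZ` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_PAGE_SZ 4096u  // illustrative page size, not taken from the PR

// FNV-1a is used here only as a small, dependency-free stand-in for the
// MurmurHash3 call in the quoted diff.
static uint64_t
fnv1a_update(uint64_t hash, const void *data, size_t len)
{
  const unsigned char *bytes = data;
  for (size_t i = 0; i < len; i++) {
    hash ^= bytes[i];
    hash *= 0x100000001b3ULL;  // 64-bit FNV prime
  }
  return hash;
}

// Hypothetical page checksum: the raw index bytes are fed into the same hash
// stream as the page contents, with no separate hash of the index.
uint64_t
page_checksum(uint64_t pg_idx, const unsigned char pg[EX_PAGE_SZ])
{
  assert(pg != NULL);
  uint64_t hash = 0xcbf29ce484222325ULL;  // 64-bit FNV offset basis
  hash = fnv1a_update(hash, &pg_idx, sizeof(pg_idx));
  hash = fnv1a_update(hash, pg, EX_PAGE_SZ);
  return hash;
}
```

Because the index is part of the hashed input, a page written at the wrong index would still fail verification, which appears to be the purpose of mixing the index in.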
      goto fail;
    }

    if (write_all(wal->meta_fd, &wal->checksum, sizeof(wal->checksum)) == -1) {
This should also write the full metadata header, referring to the struct explicitly.
|
Note to future self: @joemfb and @philipcmonk are working together on this offline. |
This PR is a forward-port / rewrite of the demand-paging implementation from v1.14 (see urbit/urbit#6063, urbit/urbit#6126, urbit/urbit#6127, and urbit/urbit#6152). The original scope has been decreased, and the implementation simplified: i/o errors are not retried, the dirty page bitmap is manipulated with much simpler code, page offsets/pointers are calculated with macros, &c. There are additional layers of snapshot validation for updates (controlled at compile time, always on as of this PR); clean pages are compared to disk both before and after update. (This validation should stay on for pre-release testing, and possibly for initial release as well.) This PR has been tested extensively on live ships; the corruption issues that plagued v1.14 cannot be reproduced. Fixes #188. Supersedes #277. (Was previously opened as #401, but a typo in the branch name was preventing updates.)
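To make the "dirty page bitmap" and "page offsets calculated with macros" concrete, here is a minimal sketch of that general technique. Every name and size below (`EX_PAGE_SHIFT`, `dirty_map`, `mark_dirty`, and so on) is an assumption chosen for illustration, not an identifier from the forward-ported implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// All sizes below are illustrative assumptions, not values from the PR.
#define EX_PAGE_SHIFT 12u    // 4 KiB pages
#define EX_PAGE_COUNT 1024u  // number of pages tracked
#define EX_WORD_BITS  32u    // bits per bitmap word

// Page index <-> byte offset, relative to the start of the tracked region.
#define EX_OFF_TO_PAGE(off) ((size_t)(off) >> EX_PAGE_SHIFT)
#define EX_PAGE_TO_OFF(pag) ((size_t)(pag) << EX_PAGE_SHIFT)

// One bit per page: 1 = dirty, 0 = clean.
static uint32_t dirty_map[EX_PAGE_COUNT / EX_WORD_BITS];

void
mark_dirty(size_t pag)
{
  assert(pag < EX_PAGE_COUNT);
  dirty_map[pag / EX_WORD_BITS] |= (uint32_t)1 << (pag % EX_WORD_BITS);
}

void
mark_clean(size_t pag)
{
  assert(pag < EX_PAGE_COUNT);
  dirty_map[pag / EX_WORD_BITS] &= ~((uint32_t)1 << (pag % EX_WORD_BITS));
}

int
is_dirty(size_t pag)
{
  assert(pag < EX_PAGE_COUNT);
  return (int)((dirty_map[pag / EX_WORD_BITS] >> (pag % EX_WORD_BITS)) & 1u);
}
```

A fault handler would typically call something like `mark_dirty(EX_OFF_TO_PAGE(fault_addr - base))`, and a snapshot pass would write out only the pages for which `is_dirty` returns non-zero.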
This PR replaces the existing snapshot system in pkg/noun/events.c with a conceptually similar but new implementation. This new implementation removes the need to `mmap()` a large chunk (i.e. 2GB) of memory when a ship launches. Instead, it creates a file-backed memory mapping for the snapshot and then lazily maps new pages that lie outside of the snapshot when necessary. When a new snapshot is captured, all anonymous mappings are removed and the snapshot files are remapped, leading to a minimization of the Urbit runtime's memory footprint. This work draws inspiration from @joemfb's work in urbit/urbit#6063 and urbit/urbit#6152.

In addition to the functionality described above, this PR also makes the snapshotting system a largely orthogonal component relative to the rest of the runtime. As much as possible, the implementation tries to be simple to read and understand. It's also unit tested. Finally, as an added benefit, a SIGSEGV raised as a result of a non-loom address access now generates a segfault as expected rather than complaining of "address out of loom" (thanks to `sigsegv_init()` and `sigsegv_dispatch()`).

Remaining tasks:
Resolves #188.
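The file-backed mapping described above can be sketched in isolation. The example below maps one page of a temporary file with `MAP_PRIVATE`, writes to it in memory, and checks that the file on disk is unchanged, which is the property that lets clean pages be served from the snapshot while dirty pages stay in memory until the next sync. The function name, the temporary file, and the single-page size are all assumptions for this demo; the PR's own mapping logic (fixed addresses, guard page, fault handler) is not reproduced here.

```c
#define _DEFAULT_SOURCE  // exposes mkstemp(), ftruncate(), pread() on glibc
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical demo: back one page of memory with a temporary file, write
// `fill` into the page in memory, and confirm the file on disk is untouched.
// Returns 0 if the behaviour matches, -1 on any error or mismatch.
int
private_mapping_demo(char fill)
{
  assert(fill != 0);  // a zero byte would match the file's initial contents

  long page_sz = sysconf(_SC_PAGESIZE);
  if (page_sz <= 0) return -1;
  size_t len = (size_t)page_sz;

  char path[] = "/tmp/snapshot-demo-XXXXXX";
  int  fd     = mkstemp(path);
  if (fd == -1) return -1;
  unlink(path);  // the open descriptor keeps the file alive

  int   ret     = -1;
  char *page    = MAP_FAILED;
  char  on_disk = 0;

  // Give the stand-in "snapshot" one page of zeroes.
  if (ftruncate(fd, (off_t)len) == -1) goto done;

  // MAP_PRIVATE: reads come from the file, writes go to a private copy.
  page = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
  if (page == MAP_FAILED) goto done;

  page[0] = fill;  // dirties the page in memory only

  // The file itself must still hold the original zero byte.
  if (pread(fd, &on_disk, 1, 0) != 1) goto done;
  if (page[0] == fill && on_disk == 0) ret = 0;

done:
  if (page != MAP_FAILED) munmap(page, len);
  close(fd);
  return ret;
}
```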
Testing
Functionality
All unit tests pass, including the new [pkg/pma/pma_tests]:

    $ bazel test --config=test --build_tests_only ...

Also booted comets on `macos-aarch64` and `linux-x86_64` using the 1.20, 1.21, and 1.22 pills and successfully sent a DM to my planet. For example, testing the 1.21 pill:

Also booted a comet using vere-v1.18, exited, and then successfully ran that comet and sent a DM using the binary built from the tip of this branch:
Performance
Ran pkg/vere/benchmarks.c 20 times on both `i/188` (commit SHA 67d596c) and `develop` (commit SHA dee0cef) and computed the following averages:

`i/188` | `develop`