Extend prevent-late-fallback by lock-counter #1732
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1732 +/- ##
==========================================
+ Coverage 84.62% 84.64% +0.01%
==========================================
Files 76 76
Lines 22727 22982 +255
==========================================
+ Hits 19232 19452 +220
- Misses 3495 3530 +35
|
8c35bbb to
65208dc
Compare
65208dc to
cc71a48
Compare
71c4606 to
5c3cfc8
Compare
ejoerns
left a comment
@zentax-dev Thank you for the contribution!
I had a look at the code and left some comments where I think the feature still deserves some rework. I guess most are just about proper wording/naming and error handling.
You should also point out in the documentation when and why boot counter locking might be preferable to the existing 'mark bad other' solution we have.
aab499f to
5d18924
Compare
|
Thanks for the feedback, I think I got everything now. Force-pushed the changes.
5d18924 to
aa96409
Compare
aa96409 to
9875dd0
Compare
6314e02 to
aaaf0c5
Compare
|
semantic question: what should happen here: |
The intention behind this is to make it clear that the system is misconfigured and not working as intended. If RAUC fell back to disabled, the service would be running and the user probably wouldn't notice the misconfiguration.
then i have a corner case: |
src/main.c
Outdated
| g_string_append_printf(text, "Variant: %s\n", status->variant); | ||
| g_string_append_printf(text, "Booted from: %s (%s)\n\n", bootedfrom ? bootedfrom->name : NULL, status->bootslot); | ||
| g_string_append_printf(text, "Booted from: %s (%s)\n", bootedfrom ? bootedfrom->name : NULL, status->bootslot); | ||
| g_string_append_printf(text, "Counter lock: %s\n\n", status->lock_counter ? "enabled" : "disabled"); |
| g_string_append_printf(text, "Counter lock: %s\n\n", status->lock_counter ? "enabled" : "disabled"); | |
| // g_string_append_printf(text, "Counter lock: %s\n\n", status->lock_counter ? "enabled" : "disabled"); |
With the service enabled, rauc status depends on retrieve_status_via_dbus, which isn't implemented for this field yet,
so the status would always say "disabled", which is misleading
(and misled me into debugging the code... something we could save others from ;-)
I agree that it's misleading and should not be printed. As it works for the non-service case, however, using
- g_string_append_printf(text, "Counter lock: %s\n\n", status->lock_counter ? "enabled" : "disabled");
+ if (!ENABLE_SERVICE) {
+ g_string_append_printf(text, "Counter lock: %s\n\n", status->lock_counter ? "enabled" : "disabled");
+ }
should work until the D-Bus part is implemented.
And of course, this should only be printed when counter locking is enabled!
r_status_formatter_readable is called via print_status -> status_start.
And there, print_status is only called if !ENABLE_SERVICE.
So I thought the check is not necessary here.
Or is there another way to call it that I didn't see?
Running rauc status needs to be possible for both the 'service' and the 'non-service' case.
In the code, print_status is called independently of ENABLE_SERVICE!
The status_print information is filled by either retrieve_status_via_dbus() (service case) or by filling in the information based on config and status (non-service case).
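To make the outcome of this thread concrete, here is a minimal sketch, assuming a simplified status struct: the "Counter lock" line is only emitted when the lock state could actually be retrieved and counter locking is enabled. `StatusSketch`, its `lock_known` field, and `format_status_sketch` are hypothetical names for illustration, not RAUC's actual types, and plain `snprintf` stands in for `GString` to keep the example self-contained.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the data printed by `rauc status`. */
typedef struct {
    const char *bootslot;
    bool lock_known;   /* false while the value cannot be retrieved (e.g. not yet via D-Bus) */
    bool lock_counter; /* true if counter locking is enabled */
} StatusSketch;

/* Writes a readable status into buf and returns the number of characters
 * written (excluding the terminating NUL), or the snprintf result on error
 * or truncation of the first line. */
static int format_status_sketch(const StatusSketch *s, char *buf, size_t len)
{
    int n = snprintf(buf, len, "Booted from: %s\n", s->bootslot);
    if (n < 0 || (size_t)n >= len)
        return n;

    /* Only print the line when it is both known and meaningful. */
    if (s->lock_known && s->lock_counter)
        n += snprintf(buf + n, len - (size_t)n, "Counter lock: enabled\n");

    return n;
}
```

Tracking "known" separately from "enabled" avoids printing a misleading "disabled" in the service case until the value is actually transported.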
In a system where you have A/B boot slots and A/B root slots, that should not be an issue. But I think you mean the case where you have A/B boot slots, but only a single root slot? |
when everything is A/B (bootloader, fitimage/kernel, rootfs); during an update rauc only installs a new bootloader and does an atomic operation (to switch the emmc boot-part select bit). Note that neither rauc nor barebox flip that switch back when one slot depletes attempts. |
20120bc to
b7fe88d
Compare
Just a quick heads up: I've briefly talked to @ejoerns about this and he will look into it. |
b7fe88d to
7d10f59
Compare
Is that an actual use case or a theoretical consideration? For most cases, the update of the bootloader is considered atomic, but without any fallback. A proper fallback would require an entity before the bootloader that can decide that a fallback is necessary. A design that actually expects the bootloader to be part of the A/B setup would also need to ensure that the bootloader boots the corresponding slot. Thus, if there is actually a fallback, the system with the older RAUC version should be booted instead. This sounds like what #1750 targets. For good reasons, the boot partition switching is not performed by RAUC here (but by the bootloader itself instead). |
At least the error checking and reporting could be improved here, since the message actually does not give a good indication about what's going wrong. The part that could be addressed in this PR is a change in if (!r_barebox_get_lock_counter(&locked, &ierror)) {
- g_propagate_error(error, ierror);
+ g_propagate_prefixed_error(error, ierror, "Failed to read barebox lock counter: ");
return FALSE;
The other part is that output parsing and error code checking for barebox-state currently happen in a suboptimal order. But that's the case for existing methods, too. |
For this, I've created #1781 now. |
src/config_file.c
Outdated
| { | ||
| g_return_val_if_fail(mode != NULL, FALSE); | ||
|
|
||
| if (!mode_str || g_strcmp0(mode_str, "disable") == 0) { |
Can mode_str be NULL? From the calling code, it should at least be ''.
| return res; | ||
| } | ||
|
|
||
| static gboolean parse_late_fallback_mode(const gchar *mode_str, RConfigLateFallback *mode, GError **error) |
In the corresponding commit message:
Better use 'lock-counter' in the title (than just 'lock').
Despite the text suggesting that locking is global, the following text refers to "counter of the slot", which seems confusing.
Since the commit also does not introduce the feature, it might make sense to better point out that only the parsing of the option is implemented.
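As a rough illustration of what "only the parsing of the option" could look like, here is a minimal sketch in plain C. The enum `LateFallbackSketch` and the function `parse_late_fallback_sketch` are hypothetical, the accepted value string "lock-counter" is assumed from the PR title, and treating an empty string like "disable" is an assumption based on the remark that the caller passes at least ''. The real code uses GLib and `GError`.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical mode enum; names are illustrative only. */
typedef enum {
    LATE_FALLBACK_SKETCH_DISABLE,
    LATE_FALLBACK_SKETCH_LOCK_COUNTER,
} LateFallbackSketch;

/* Parses mode_str into *mode.
 * Returns false and leaves *mode untouched for NULL or unknown input. */
static bool parse_late_fallback_sketch(const char *mode_str, LateFallbackSketch *mode)
{
    if (mode_str == NULL || mode == NULL)
        return false;

    /* Assumption: an empty value behaves like "disable". */
    if (mode_str[0] == '\0' || strcmp(mode_str, "disable") == 0) {
        *mode = LATE_FALLBACK_SKETCH_DISABLE;
        return true;
    }

    if (strcmp(mode_str, "lock-counter") == 0) {
        *mode = LATE_FALLBACK_SKETCH_LOCK_COUNTER;
        return true;
    }

    /* Unknown value: the caller should report a configuration error. */
    return false;
}
```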
| g_string_append_printf(text, "Compatible: %s\n", status->compatible); | ||
| g_string_append_printf(text, "Variant: %s\n", status->variant); | ||
| g_string_append_printf(text, "Booted from: %s (%s)\n\n", bootedfrom ? bootedfrom->name : NULL, status->bootslot); | ||
| g_string_append_printf(text, "Booted from: %s (%s)\n", bootedfrom ? bootedfrom->name : NULL, status->bootslot); |
In the commit message:
src/main: add prevent-late-fallback to userspace tools
Since RAUC only runs in 'userspace', I'd leave that term out. The commit also doesn't add 'prevent-late-fallback', it just adds output for counter locking. Essentially, you add counter locking information to rauc status output.
This adds the option to print the current state of prevent-late-fallback
and also to enable and disable sets the lock-counter via userspace.
The sentence looks inconsistent. Also, the corresponding commit does not add any functionality to enable/disable/set something.
The possibility to get/set it via D-Bus is not yet
implemented and will be added separately.
test/bootchooser.c
Outdated
| GError *error = NULL; | ||
| gboolean res; | ||
|
|
||
| // TODO[lsc]: add test here |
This is probably still open?
b680ec5 to
a9739c3
Compare
depends: if we want to cover transition states yes, otherwise no |
|
We can't necessarily expect that an older bundle is never installed.
With some more thought put into this contrived example, one might be able to arrive at a simpler sequence of events. With a recovery mode implemented directly in the bootloader we sidestep this particular issue, but it doesn't sit well with me that there are sequences that have the potential for RAUC to fail to start up not because of prior misconfiguration, but because of the bootloader version in the field. I'd rather have RAUC warn very loudly about this, but then do some fallback behavior: either set the attempts to a high number or just ignore the locking. Interested in your thoughts. |
a9739c3 to
71db5f6
Compare
We've had some discussions about this and I've changed the error for |
src/context.c
Outdated
| g_propagate_prefixed_error(error, ierror, "Failed to read barebox lock counter: "); | ||
| return FALSE; |
| g_propagate_prefixed_error(error, ierror, "Failed to read barebox lock counter: "); | |
| return FALSE; | |
| /* If we would throw an error here, RAUC would fail and it might not be possible to execute updates anymore. */ | |
| g_warning("Failed to read barebox lock counter: %s", ierror->message); | |
| g_clear_error(&ierror); |
this should also be a "warn very loudly, but proceed" in the case where the state variable is not present
Thanks for the hint, I've changed this and added a test for it
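The "warn very loudly, but proceed" behavior requested above can be sketched as follows. This is a simplified illustration in plain C, not the code from this PR: `lock_reader_fn`, `init_lock_state_sketch`, and the two stub readers are hypothetical names, and `fprintf(stderr, ...)` stands in for `g_warning()`.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical reader callback: returns false if the bootloader state
 * variable cannot be read (for example because it does not exist). */
typedef bool (*lock_reader_fn)(bool *locked);

/* Stub reader simulating a bootloader without the lock variable. */
static bool reader_missing_sketch(bool *locked)
{
    (void)locked;
    return false;
}

/* Stub reader simulating a bootloader that reports a locked counter. */
static bool reader_locked_sketch(bool *locked)
{
    *locked = true;
    return true;
}

/* Always returns true so that startup continues and updates stay possible.
 * On read failure, a warning is printed and *locked is set to false. */
static bool init_lock_state_sketch(lock_reader_fn read_lock, bool *locked)
{
    *locked = false;
    if (!read_lock(locked)) {
        fprintf(stderr, "WARNING: failed to read bootloader lock counter, "
                        "continuing without counter locking\n");
        *locked = false;
    }
    return true;
}
```

The design choice here is that a missing state variable degrades the feature instead of blocking the whole update service.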
In general the new option for prevent late fallback adds the possibility to lock the attempts counter. As first step towards this, the parser is extended to support the new value. Signed-off-by: Lars Schmidt <[email protected]>
Forward the counter locking option to the barebox bootloader, so that barebox can stop decrementing the remaining_attempts counter. This inhibits fallback to a previous version of the system, which can happen if a system is rebooted too frequently before a slot is marked good again and the remaining_attempts counter is incremented. As a side effect, it inhibits excessive write cycles on the storage medium. This also needs changes to barebox, see [1]. [1] https://lists.infradead.org/pipermail/barebox/2025-June/051393.html Signed-off-by: Lars Schmidt <[email protected]>
The setting is currently only supported by barebox. Signed-off-by: Lars Schmidt <[email protected]>
When activated, the attempts counter of the slot that is marked good will be locked and will no longer be decremented or incremented. It is unlocked again when the slot is marked active. Signed-off-by: Lars Schmidt <[email protected]>
This adds the option to print the current state of prevent-late-fallback. The possibility to get/set it via D-Bus is not yet implemented and will be added separately. Signed-off-by: Lars Schmidt <[email protected]>
It is best to show problems with inconsistent configuration early. When counter locking is enabled in rauc, it must also be set in the bootloader. Signed-off-by: Lars Schmidt <[email protected]>
Signed-off-by: Lars Schmidt <[email protected]>
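The semantics described in the commit messages above can be modeled with a small state sketch. Everything here is an assumption for illustration: `SlotStateSketch` is not the barebox-state layout, the lock is modeled per slot although the discussion leaves open whether it is global, and resetting `remaining_attempts` on mark-good and mark-active is assumed rather than taken from the PR.

```c
#include <stdbool.h>

/* Hypothetical model of the bootloader state for one slot. */
typedef struct {
    int remaining_attempts;
    bool locked;
} SlotStateSketch;

/* Bootloader side, called once per boot attempt.
 * Returns false if the slot is exhausted and a fallback would happen. */
static bool boot_attempt_sketch(SlotStateSketch *s)
{
    if (s->remaining_attempts <= 0)
        return false;
    if (!s->locked)
        s->remaining_attempts--; /* only an unlocked counter is written */
    return true;
}

/* RAUC side: marking a slot good resets the counter and locks it. */
static void mark_good_sketch(SlotStateSketch *s, int default_attempts)
{
    s->remaining_attempts = default_attempts;
    s->locked = true;
}

/* RAUC side: marking a slot active (e.g. after an update) unlocks the
 * counter so that a failing new system can still fall back. */
static void mark_active_sketch(SlotStateSketch *s, int default_attempts)
{
    s->remaining_attempts = default_attempts;
    s->locked = false;
}
```

With this model, any number of reboots of a slot marked good leaves the counter unchanged, while a freshly activated slot is still limited to `default_attempts` boot attempts.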
71db5f6 to
5108a49
Compare
This feature allows locking the remaining_attempts counter.
When remaining_attempts is locked, the bootloader should no longer decrement or increment the variable during each boot.
The lock is active when a slot is marked good and inactive when a slot is marked active.
This way it prevents fallback to an earlier version, while avoiding additional write cycles to the target medium.
The status can be printed with barebox-state and rauc status.
In a preliminary talk with @ejoerns the decision was made to not add this to the D-Bus interface yet. It will be added in a future pull request.
This feature also needs to be supported by the bootloader.
So far, a patch has been handed in for barebox to support this feature, see