@zentax-dev zentax-dev commented Jun 15, 2025

This feature allows locking the remaining_attempts counter.
When remaining_attempts is locked, the bootloader should no longer decrement or increment the variable on each boot.
The lock is enabled when a slot is marked good and disabled when a slot is marked active.
This prevents fallback to an earlier version while also avoiding additional write cycles to the target medium.
The status can be printed out with barebox-state and rauc status.
In a preliminary talk with @ejoerns, the decision was made not to add this to the D-Bus interface yet. It will be added in a future pull request.
This feature also needs to be supported by the bootloader.
So far, a patch has been handed in for barebox to support this feature, see
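
The locking semantics described in this comment can be sketched roughly as follows. This is a minimal illustration, not the actual RAUC or barebox code: the struct `boot_state`, the functions `mark_good`, `mark_active` and `bootloader_on_boot`, and the reset value of 3 attempts are all hypothetical names and assumptions.

```c
#include <stdbool.h>

/* Hypothetical model of the shared boot state; not the real barebox-state layout. */
struct boot_state {
    unsigned int remaining_attempts;
    bool attempts_locked;
};

/* Assumed reset value, chosen for illustration only. */
#define DEFAULT_ATTEMPTS 3u

/* Marking a slot good refills the counter and enables the lock. */
void mark_good(struct boot_state *s)
{
    s->remaining_attempts = DEFAULT_ATTEMPTS;
    s->attempts_locked = true;
}

/* Marking a slot active (e.g. after an update) disables the lock again. */
void mark_active(struct boot_state *s)
{
    s->remaining_attempts = DEFAULT_ATTEMPTS;
    s->attempts_locked = false;
}

/*
 * What a bootloader might do on each boot attempt: while locked, the
 * counter is left untouched (so nothing is written to the medium);
 * otherwise it is decremented. Returns false if no attempts are left.
 */
bool bootloader_on_boot(struct boot_state *s)
{
    if (s->attempts_locked)
        return true;
    if (s->remaining_attempts == 0)
        return false;
    s->remaining_attempts--;
    return true;
}
```

In this model, repeated reboots of a slot that was marked good never touch the counter, while a freshly installed slot still consumes attempts until it is marked good.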

@zentax-dev zentax-dev requested review from ejoerns and jluebbe June 15, 2025 11:23
@zentax-dev zentax-dev added the enhancement Adds new functionality or enhanced handling to RAUC label Jun 15, 2025
@zentax-dev zentax-dev self-assigned this Jun 15, 2025
codecov bot commented Jun 15, 2025

Codecov Report

❌ Patch coverage is 85.65891% with 37 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.64%. Comparing base (2e390ee) to head (5108a49).

Files with missing lines     Patch %   Lines
src/mark.c                   20.00%    12 Missing ⚠️
src/bootloaders/barebox.c    77.08%    11 Missing ⚠️
src/context.c                58.33%    5 Missing ⚠️
src/main.c                   61.53%    5 Missing ⚠️
src/bootchooser.c            92.59%    2 Missing ⚠️
src/config_file.c            92.00%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1732      +/-   ##
==========================================
+ Coverage   84.62%   84.64%   +0.01%     
==========================================
  Files          76       76              
  Lines       22727    22982     +255     
==========================================
+ Hits        19232    19452     +220     
- Misses       3495     3530      +35     
Flag Coverage Δ
service=false 81.05% <85.65%> (+0.06%) ⬆️
service=true 84.63% <87.14%> (+0.03%) ⬆️

@zentax-dev zentax-dev force-pushed the lsc/boot_slot_locking branch 2 times, most recently from 8c35bbb to 65208dc Compare June 18, 2025 12:33
@zentax-dev zentax-dev changed the title Add boot slots locking Extend prevent-late-fallback by lock-counter Jun 18, 2025
@zentax-dev zentax-dev force-pushed the lsc/boot_slot_locking branch from 65208dc to cc71a48 Compare June 19, 2025 11:01
@zentax-dev zentax-dev force-pushed the lsc/boot_slot_locking branch 2 times, most recently from 71c4606 to 5c3cfc8 Compare June 19, 2025 13:32
Member

@ejoerns ejoerns left a comment

@zentax-dev Thank you for the contribution!

I had a look at the code and left some comments where I think the feature still deserves some rework. I guess most are just about proper wording/naming and error handling.

You should also point out in the documentation when and why boot counter locking might be preferable to the existing 'mark bad other' solution we have.

@zentax-dev zentax-dev force-pushed the lsc/boot_slot_locking branch 6 times, most recently from aab499f to 5d18924 Compare June 20, 2025 13:13
Author

zentax-dev commented Jun 20, 2025

Thanks for the feedback, I think I got everything now. Force-pushed the changes.

@zentax-dev zentax-dev force-pushed the lsc/boot_slot_locking branch from 5d18924 to aa96409 Compare June 23, 2025 09:10
@zentax-dev zentax-dev force-pushed the lsc/boot_slot_locking branch from aa96409 to 9875dd0 Compare June 23, 2025 11:04
@zentax-dev zentax-dev removed their assignment Jun 23, 2025
@zentax-dev zentax-dev force-pushed the lsc/boot_slot_locking branch 3 times, most recently from 6314e02 to aaaf0c5 Compare June 23, 2025 12:54
Contributor

js731ca commented Jul 28, 2025

semantic question: what should happen here:
systemA and systemB have been updated, both in a "good" state, the system has settled, and the counter is locked.
now a rauc update would unlock the counter, and boot attempts would decrement until the system reaches rauc-mark-good again - so far so good.
but what should happen if the system has settled again, and for some reason a rauc status mark-bad is intentionally called? shouldn't that unlock the counter again? (currently it doesn't - going by what barebox-state -d says)

@zentax-dev
Author

gave this PR + the corresponding barebox changes a spin - nice work! sth i stumbled into: rauc fails when configured with prevent-late-fallback=lock-counter, but without the barebox-state being available for $reasons

rauc[839]: Using central status file /mnt/data/rauc/central.raucs
rauc[839]: Using system config file /etc/rauc/system.conf
rauc[845]: no such variable: bootstate.attempts_locked
rauc[839]: Failed to initialize context: No content to read
systemd[1]: rauc.service: Failed with result 'exit-code'.
systemd[1]: Failed to start RAUC Update Service.

-> could this be handled more gracefully by maybe falling back to 'disabled'?

The intention behind this is to make it clear, that the system is misconfigured and not working as intended. If rauc would fall back to disabled, the service would be running, and the user probably wouldn't notice it

Contributor

js731ca commented Jul 29, 2025

gave this PR + the corresponding barebox changes a spin - nice work! sth i stumbled into: rauc fails when configured with prevent-late-fallback=lock-counter, but without the barebox-state being available for $reasons

rauc[839]: Using central status file /mnt/data/rauc/central.raucs
rauc[839]: Using system config file /etc/rauc/system.conf
rauc[845]: no such variable: bootstate.attempts_locked
rauc[839]: Failed to initialize context: No content to read
systemd[1]: rauc.service: Failed with result 'exit-code'.
systemd[1]: Failed to start RAUC Update Service.

-> could this be handled more gracefully by maybe falling back to 'disabled'?

The intention behind this is to make it clear, that the system is misconfigured and not working as intended. If rauc would fall back to disabled, the service would be running, and the user probably wouldn't notice it

then i have a corner case:
a system that uses A/B boot-partitions and happens to fall back to an old bootloader that doesn't have that state variable could disable/sabotage an updated rauc that expects this variable

src/main.c Outdated
  g_string_append_printf(text, "Variant: %s\n", status->variant);
- g_string_append_printf(text, "Booted from: %s (%s)\n\n", bootedfrom ? bootedfrom->name : NULL, status->bootslot);
+ g_string_append_printf(text, "Booted from: %s (%s)\n", bootedfrom ? bootedfrom->name : NULL, status->bootslot);
+ g_string_append_printf(text, "Counter lock: %s\n\n", status->lock_counter ? "enabled" : "disabled");
Contributor

Suggested change
- g_string_append_printf(text, "Counter lock: %s\n\n", status->lock_counter ? "enabled" : "disabled");
+ // g_string_append_printf(text, "Counter lock: %s\n\n", status->lock_counter ? "enabled" : "disabled");

a rauc status depends on retrieve_status_via_dbus - which isn't implemented yet
so the status would always say "disabled" - which is misleading
(and mis-lead me to debug into the code... sth we could save others from ;-)

Member

I agree that it's misleading and should not be printed. As it works for the non-service case, however, using

-       g_string_append_printf(text, "Counter lock: %s\n\n", status->lock_counter ? "enabled" : "disabled");
+       if (!ENABLE_SERVICE) {
+               g_string_append_printf(text, "Counter lock: %s\n\n", status->lock_counter ? "enabled" : "disabled");
+       }

should work until the D-Bus part is implemented.

Member

And of course, this should only be printed when counter locking is enabled!

Author

@zentax-dev zentax-dev Aug 29, 2025

r_status_formatter_readable is called via print_status -> status_start
And there print_status is just called if !ENABLE_SERVICE.
So I thought this is not necessary here.
Or is there another way to call it, which I didn't see?

Member

Running rauc status needs to be possible for both the 'service' and the 'non-service' case.
In the code, print_status is called independently of ENABLE_SERVICE!
The status_print information is filled by either retrieve_status_via_dbus() (service case) or by filling in the information based on config and status (non-service case).
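
One way to avoid the misleading "disabled" output discussed in this thread is to distinguish "not reported" from "disabled" explicitly. The following is only a sketch under that assumption; `enum lock_info` and `format_counter_lock` are hypothetical names and do not exist in RAUC.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical tri-state: separates "not reported" from "disabled". */
enum lock_info {
    LOCK_INFO_UNAVAILABLE, /* e.g. status came via D-Bus, which does not carry it yet */
    LOCK_INFO_DISABLED,
    LOCK_INFO_ENABLED
};

/*
 * Write the "Counter lock" status line into buf, or an empty string if
 * the line should be omitted (locking not configured, or state unknown).
 * Returns the number of characters written, excluding the terminating
 * NUL, or -1 if buf is too small.
 */
int format_counter_lock(char *buf, size_t len, int locking_configured,
                        enum lock_info info)
{
    int n;

    if (len == 0)
        return -1;

    if (!locking_configured || info == LOCK_INFO_UNAVAILABLE) {
        buf[0] = '\0';
        return 0;
    }

    n = snprintf(buf, len, "Counter lock: %s\n",
                 info == LOCK_INFO_ENABLED ? "enabled" : "disabled");
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

With this shape, the service case would pass `LOCK_INFO_UNAVAILABLE` until the D-Bus part exists, and the line would simply not be printed.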

@zentax-dev
Author

then i have a corner case: a system that uses A/B boot-partitions and happens to fall back to an old bootloader that doesn't have that state variable could disable/sabotage an updated rauc that expects this variable

In a system where you have A/B boot slots and A/B root slots, that should not be an issue.
As both slots should then reset to the other state (as far as they are properly grouped).

But I think you mean the case where you have A/B boot slots, but only a single root slot?

Contributor

js731ca commented Aug 9, 2025

But I think you mean the case where you have A/B boot slots, but only a single root slot?

when everything is A/B (bootloader, fitimage/kernel, rootfs); during an update rauc only installs a new bootloader and does an atomic operation (to switch the emmc boot-part select bit). Note that neither rauc nor barebox flip that switch back when one slot depletes attempts.
The "problem"/cornercase we would run into if the boot-slot is switched back externally - and happens to be out-of-sync with the proper grouping

Contributor

js731ca commented Aug 11, 2025

@ejoerns would be nice if you could review this again

and @a3f : gentle(?) ping :-)
the barebox parts were merged into master - but since both parts are involved: the more reviewers the better

@zentax-dev zentax-dev force-pushed the lsc/boot_slot_locking branch from 20120bc to b7fe88d Compare August 19, 2025 07:20
@zentax-dev
Author

@ejoerns would be nice if you could review this again

and @a3f : gentle(?) ping :-) the barebox parts were merged into master - but since both parts are involved: the more reviewers the better

Just a quick heads up: I've briefly talked to @ejoerns about this and he will look into it.
As far as I know @a3f was already alright with everything, but I'll see him the next days, so I'll have a chat with him :)

@zentax-dev zentax-dev force-pushed the lsc/boot_slot_locking branch from b7fe88d to 7d10f59 Compare August 26, 2025 14:02
Member

ejoerns commented Aug 27, 2025

But I think you mean the case where you have A/B boot slots, but only a single root slot?

when everything is A/B (bootloader, fitimage/kernel, rootfs); during an update rauc only installs a new bootloader and does an atomic operation (to switch the emmc boot-part select bit). Note that neither rauc nor barebox flip that switch back when one slot depletes attempts. The "problem"/cornercase we would run into if the boot-slot is switched back externally - and happens to be out-of-sync with the proper grouping

Is that an actual use case or a theoretical consideration?

For most cases, the update of the bootloader is considered atomic, but without any fallback.

A proper fallback would require an entity before the bootloader that can decide that a fallback is necessary.

A design that actually expects the bootloader to be part of the A/B setup would also need to ensure that the bootloader boots the corresponding slot. Thus, if there is actually a fallback, the system with the older RAUC version should be booted instead. This sounds like what #1750 targets. For good reasons, the boot partition switching is not performed by RAUC here (but by the bootloader itself instead).

Member

ejoerns commented Aug 27, 2025

gave this PR + the corresponding barebox changes a spin - nice work! sth i stumbled into: rauc fails when configured with prevent-late-fallback=lock-counter, but without the barebox-state being available for $reasons

rauc[839]: Using central status file /mnt/data/rauc/central.raucs
rauc[839]: Using system config file /etc/rauc/system.conf
rauc[845]: no such variable: bootstate.attempts_locked
rauc[839]: Failed to initialize context: No content to read
systemd[1]: rauc.service: Failed with result 'exit-code'.
systemd[1]: Failed to start RAUC Update Service.

-> could this be handled more gracefully by maybe falling back to 'disabled'?

The intention behind this is to make it clear, that the system is misconfigured and not working as intended. If rauc would fall back to disabled, the service would be running, and the user probably wouldn't notice it

At least the error checking and reporting could be improved here, since the message actually does not give a good indication about what's going wrong. The part that could be addressed in this PR is a change in src/context.c to add something like

                        if (!r_barebox_get_lock_counter(&locked, &ierror)) {
-                               g_propagate_error(error, ierror);
+                               g_propagate_prefixed_error(error, ierror, "Failed to read barebox lock counter: ");
                                return FALSE;

The other part is that output parsing and error code checking for barebox-state currently happen in a suboptimal order. But that's the case for existing methods, too.

Member

ejoerns commented Aug 28, 2025

The other part is that output parsing and error code checking for barebox-state currently happen in a suboptimal order. But that's the case for existing methods, too.

For this, I've created #1781 now.

{
g_return_val_if_fail(mode != NULL, FALSE);

if (!mode_str || g_strcmp0(mode_str, "disable") == 0) {
Member

Can mode_str be NULL? From the calling code, it should at least be ''.
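
For illustration, a parser that behaves the same whether the caller passes NULL or an empty string could look like the sketch below. It is written in plain C rather than GLib, and the enum `late_fallback_mode` and the function `parse_mode_sketch` are hypothetical; only the strings "disable" and "lock-counter" come from this thread, and treating an empty string as "disabled" is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum; the real RConfigLateFallback values are not shown in this thread. */
enum late_fallback_mode {
    LATE_FALLBACK_DISABLED,
    LATE_FALLBACK_LOCK_COUNTER
};

/*
 * Parse the option string. NULL and "" are both treated as "not set"
 * (an assumption), so the result does not depend on which of the two
 * the caller passes. Returns false for unknown values and leaves *mode
 * untouched in that case.
 */
bool parse_mode_sketch(const char *mode_str, enum late_fallback_mode *mode)
{
    if (mode == NULL)
        return false;

    if (mode_str == NULL || mode_str[0] == '\0' ||
        strcmp(mode_str, "disable") == 0) {
        *mode = LATE_FALLBACK_DISABLED;
        return true;
    }

    if (strcmp(mode_str, "lock-counter") == 0) {
        *mode = LATE_FALLBACK_LOCK_COUNTER;
        return true;
    }

    return false; /* unknown value: let the caller report a config error */
}
```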

return res;
}

static gboolean parse_late_fallback_mode(const gchar *mode_str, RConfigLateFallback *mode, GError **error)
Member

In the corresponding commit message:
Better use 'lock-counter' in the title (than just 'lock').

Despite the text suggesting that locking is global, the following text refers to "counter of the slot", which seems confusing.

Since the commit also does not introduce the feature, it might make sense to better point out that only the parsing of the option is implemented.

  g_string_append_printf(text, "Compatible: %s\n", status->compatible);
  g_string_append_printf(text, "Variant: %s\n", status->variant);
- g_string_append_printf(text, "Booted from: %s (%s)\n\n", bootedfrom ? bootedfrom->name : NULL, status->bootslot);
+ g_string_append_printf(text, "Booted from: %s (%s)\n", bootedfrom ? bootedfrom->name : NULL, status->bootslot);
Member

In the commit message:

    src/main: add prevent-late-fallback to userspace tools

Since RAUC only runs in 'userspace', I'd leave that term out. The commit also doesn't add 'prevent-late-fallback', it just adds output for counter locking. Essentially, you add counter locking information to rauc status output.

    This adds the option to print the current state of prevent-late-fallback
    and also to enable and disable sets the lock-counter via userspace.

The sentence looks inconsistent. Also, the corresponding commit does not add any functionality to enable/disable/set something.

    The possibility to get/set it via D-Bus is not yet
    implemented and will be added separately.

GError *error = NULL;
gboolean res;

// TODO[lsc]: add test here
Member

This is probably still open?

@zentax-dev zentax-dev force-pushed the lsc/boot_slot_locking branch 3 times, most recently from b680ec5 to a9739c3 Compare August 29, 2025 15:12
Contributor

js731ca commented Aug 29, 2025

But I think you mean the case where you have A/B boot slots, but only a single root slot?

when everything is A/B (bootloader, fitimage/kernel, rootfs); during an update rauc only installs a new bootloader and does an atomic operation (to switch the emmc boot-part select bit). Note that neither rauc nor barebox flip that switch back when one slot depletes attempts. The "problem"/cornercase we would run into if the boot-slot is switched back externally - and happens to be out-of-sync with the proper grouping

Is that an actual use case or a theoretical consideration?

depends: if we want to cover transition states yes, otherwise no
transition state would be when coming from an old OS with a rauc version "without this feature", updating to a new one "with this feature". downgrading for $reasons or falling back because of depleted boot-attempts.
then a sane fallback (to e.g. 'disabled') that keeps rauc operational would IMHO be good.

Contributor

a3f commented Aug 30, 2025

We can't necessarily expect that an older bundle is never installed.
FOTA is not the only way to do an update; USB recovery (with a bundle that often lags behind the online version) is a possibility in my project. The problematic sequence could look as follows:

  • FOTA installation of new RAUC bundle with this feature here
  • System is partially broken: boot-complete is reached, but it's not possible to do further FOTA
  • USB recovery stick is inserted, update to old system: both rootfs and bootloader are at old state now
  • Recovered system fails to start up
  • As old system doesn't have support for switching eMMC boot partition on attempt depletion, we end up booting the new rootfs with the old bootloader
  • RAUC refuses to start up: online USB recovery no longer possible

With some more thought put into this contrived example, one might be able to arrive at a simpler sequence of events.

With a recovery mode implemented directly in the bootloader, we sidestep this particular issue, but it doesn't sit well with me that there are sequences that have the potential for RAUC to fail to start up not because of prior misconfiguration, but because of the bootloader version in the field. I'd rather have RAUC warn very loudly about this, but then do some fallback behavior: either set the attempts to a high number or just ignore the locking.

Interested in your thoughts.

@zentax-dev zentax-dev force-pushed the lsc/boot_slot_locking branch from a9739c3 to 71db5f6 Compare September 9, 2025 10:55
@zentax-dev
Author

We can't necessarily expect that an older bundle is never installed. FOTA is not the only way to do an update; USB recovery (with a bundle that often lags behind the online version) is a possibility in my project. The problematic sequence could look as follows:

* FOTA installation of new RAUC bundle with this feature here

* System is partially broken: boot-complete is reached, but it's not possible to do further FOTA

* USB recovery stick is inserted, update to old system: both rootfs and bootloader are at old state now

* Recovered system fails to start up

* As old system doesn't have support for switching eMMC boot partition on attempt depletion, we end up booting the new rootfs with the old bootloader

* RAUC refuses to start up: online USB recovery no longer possible

With some more thought put into this contrived example, one might be able to arrive at a simpler sequence of events.

With a recovery mode implemented directly in the bootloader, we sidestep this particular issue, but it doesn't sit well with me that there are sequences that have the potential for RAUC to fail to start up not because of prior misconfiguration, but because of the bootloader version in the field. I'd rather have RAUC warn very loudly about this, but then do some fallback behavior: either set the attempts to a high number or just ignore the locking.

Interested in your thoughts.

We've had some discussions about this, and I've changed the errors for mark_good and mark_active to warnings and also added some information about it in the documentation.
Also rebased onto current master.

src/context.c Outdated
Comment on lines 421 to 422
g_propagate_prefixed_error(error, ierror, "Failed to read barebox lock counter: ");
return FALSE;
Contributor

Suggested change
- g_propagate_prefixed_error(error, ierror, "Failed to read barebox lock counter: ");
- return FALSE;
+ /* If we would throw an error here, RAUC would fail and it might not be possible to execute updates anymore. */
+ g_warning("Failed to read barebox lock counter: %s", ierror->message);
+ g_clear_error(&ierror);

this should also be a "warn very loudly, but proceed" in the case where the state variable is not present
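
The "warn very loudly, but proceed" behavior asked for here can be sketched in plain C as follows. This is not the code from the PR: `lock_reader_fn`, `probe_lock_counter` and the two example readers are hypothetical names used only to show the control flow.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical reader callback: returns true on success and stores the lock state. */
typedef bool (*lock_reader_fn)(bool *locked);

/*
 * Probe the bootloader's lock state. If it cannot be read (for example
 * because an older bootloader lacks the variable), warn on stderr and
 * report locking as unavailable instead of failing startup, so that
 * updates remain possible. Returns true if the state was read.
 */
bool probe_lock_counter(lock_reader_fn read_lock, bool *locked)
{
    if (!read_lock(locked)) {
        fprintf(stderr, "WARNING: failed to read bootloader lock counter; "
                        "continuing without counter locking\n");
        *locked = false;
        return false;
    }
    return true;
}

/* Example reader simulating a bootloader without the state variable. */
bool reader_missing_variable(bool *locked)
{
    (void)locked;
    return false;
}

/* Example reader simulating a bootloader that reports a locked counter. */
bool reader_locked(bool *locked)
{
    *locked = true;
    return true;
}
```

The caller can then keep running with locking ignored when `probe_lock_counter` returns false, which matches the fallback preferred in the discussion above.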

Author

Thanks for the hint, I've changed this and added a test for it

Lars Schmidt added 7 commits October 21, 2025 12:29
In general the new option for prevent late fallback adds the possibility
to lock the attempts counter.

As first step towards this, the parser is extended to support the new value.

Signed-off-by: Lars Schmidt <[email protected]>
Forward the counter locking option to the barebox bootloader.
So the barebox can then stop decrementing the remaining_attempts
counter.
It inhibits fall-back to a previous version of the system, which
can happen if a system is rebooted too frequently before a slot
is marked good again and the remaining_attempts counter is
incremented. As a side effect, it inhibits excessive write
cycles on the storage medium.

This also needs changes to barebox, see [1].

[1] https://lists.infradead.org/pipermail/barebox/2025-June/051393.html

Signed-off-by: Lars Schmidt <[email protected]>
The setting is currently only supported by barebox.

Signed-off-by: Lars Schmidt <[email protected]>
When activated, the attempts counter of the slot that is marked good will
be locked and not decrease and increase anymore.
It will be unlocked again, when the slot is marked active.

Signed-off-by: Lars Schmidt <[email protected]>
This adds the option to print the current state of prevent-late-fallback.
The possibility to get/set it via D-Bus is not yet
implemented and will be added separately.

Signed-off-by: Lars Schmidt <[email protected]>
It is best to show problems with inconsistent configuration early.
When counter locking is enabled in rauc, it must also be set in the
bootloader.

Signed-off-by: Lars Schmidt <[email protected]>
@zentax-dev zentax-dev force-pushed the lsc/boot_slot_locking branch from 71db5f6 to 5108a49 Compare October 21, 2025 15:21