
@gc00
Contributor

@gc00 gc00 commented Oct 11, 2025

This is a draft PR -- still a work in progress. But we can support full forked ckpt (including shared memory), with only a little extra code.

Support shared memory for forked ckpt

  • mtcp_writememoryareas(int fd, int type):
    type = AREA_SHARED, AREA_NONSHARED, AREA_DONE, AREA_ALL
    The parent first writes the AREA_SHARED areas, then forks, and the child then writes the AREA_NONSHARED areas.
    This assumes the shared areas are small, so writing them to the checkpoint image file
    before the fork will not noticeably delay the parent.
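
The ordering above can be sketched roughly as follows. This is a toy illustration, not the MTCP code: `struct area`, `write_areas()`, and `forked_ckpt_sketch()` are hypothetical names, the areas are plain byte buffers rather than real memory mappings, and partial writes are treated as errors for brevity. It relies only on the fact that a forked child shares the parent's open file description, so the child's writes continue at the offset where the parent stopped.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical selectors modeled on the AREA_* names in the PR text. */
enum area_type { AREA_SHARED, AREA_NONSHARED };

/* Toy memory area: a byte buffer plus a flag saying whether it is shared. */
struct area {
  const char *data;
  size_t len;
  int is_shared;
};

/* Toy stand-in for mtcp_writememoryareas(fd, type):
 * write only the areas that match the requested type. */
static int write_areas(int fd, const struct area *areas, size_t n,
                       enum area_type type)
{
  assert(areas != NULL);
  for (size_t i = 0; i < n; i++) {
    int wanted = (type == AREA_SHARED) ? areas[i].is_shared
                                       : !areas[i].is_shared;
    if (!wanted) continue;
    if (write(fd, areas[i].data, areas[i].len) != (ssize_t)areas[i].len) {
      perror("write");
      return -1;
    }
  }
  return 0;
}

/* Parent writes the shared areas, then forks; the child writes the
 * non-shared areas from its copy-on-write snapshot and exits.
 * Returns the child's pid to the parent, or -1 on failure. */
static pid_t forked_ckpt_sketch(int fd, const struct area *areas, size_t n)
{
  if (write_areas(fd, areas, n, AREA_SHARED) != 0) return -1;

  pid_t pid = fork();
  if (pid == 0) {
    /* Child: the file offset is shared with the parent, so these
     * writes land right after the shared areas. */
    int rc = write_areas(fd, areas, n, AREA_NONSHARED);
    _exit(rc == 0 ? 0 : 1);
  }
  return pid; /* Parent resumes immediately (or sees -1 if fork failed). */
}
```

In this sketch the shared areas always precede the non-shared ones in the output, regardless of their order in the input array. Whether the real image format needs a matching change on the restart side is not addressed here.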

ALSO: rename *.dmtcp.temp to *.dmtcp only when done, and unlink the previous ckpt file when we begin a new ckpt.

  • This is to make sure that a user doesn't accidentally see the previous checkpoint image and think that the new one is done.
    But I should revise this. Instead of unlinking the previous ckpt file, I should rename it to *.dmtcp.old, and then unlink it only when the new ckpt image file has been created.
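
A minimal sketch of the revised renaming scheme, assuming the image path ends in `.dmtcp` and the writer produces `<image>.temp`. The helper names `with_suffix()`, `ckpt_begin()`, and `ckpt_commit()` are hypothetical and do not come from the DMTCP sources; only the standard `rename()` and `unlink()` calls are used.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

enum { PATH_BUF = 4096 }; /* illustrative buffer size */

/* Build "<image><suffix>" into out; 0 on success, -1 if it does not fit. */
static int with_suffix(char *out, size_t cap, const char *image,
                       const char *suffix)
{
  int n = snprintf(out, cap, "%s%s", image, suffix);
  return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}

/* Start of a new checkpoint: move any previous image aside to
 * <image>.old so nobody mistakes it for the new one.
 * A missing previous image is not an error. */
static int ckpt_begin(const char *image)
{
  char old[PATH_BUF];
  assert(image != NULL);
  if (with_suffix(old, sizeof old, image, ".old") != 0) return -1;
  if (rename(image, old) != 0 && errno != ENOENT) return -1;
  return 0;
}

/* End of a checkpoint: publish <image>.temp under the final name,
 * and only then remove <image>.old. */
static int ckpt_commit(const char *image)
{
  char tmp[PATH_BUF], old[PATH_BUF];
  assert(image != NULL);
  if (with_suffix(tmp, sizeof tmp, image, ".temp") != 0) return -1;
  if (with_suffix(old, sizeof old, image, ".old") != 0) return -1;
  if (rename(tmp, image) != 0) return -1;
  if (unlink(old) != 0 && errno != ENOENT) return -1;
  return 0;
}
```

With this ordering, a checkpoint that fails partway leaves `<image>.old` in place, so a restartable image still exists, while the final `*.dmtcp` name appears only once the new image is complete.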

TODO:

  • Verify that we are not doing any new mmap (e.g., JTRACE) in multiple calls to mtcp_writememoryareas()
  • Improve shared-memory1 and shared-memory2 to verify that the ckpt image doesn't include new writes by the parent. Then manually test that after configuring with --enable-forked-checkpointing.
  • Maybe also have coordinator rename restart_script.sh to restart_script.sh.old and then unlink it when the ckpts are done?
    But this code would be ugly. So, maybe just let the user beware, and they can look for *.dmtcp.old instead.

@gc00 gc00 requested review from karya0 and xuyao0127 October 11, 2025 19:42
@coderabbitai
Contributor

coderabbitai bot commented Oct 11, 2025

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@gc00 gc00 force-pushed the forked-ckpt-improved branch 2 times, most recently from c13227f to 3e88117 Compare October 12, 2025 08:25
gc00 added 3 commits October 12, 2025 04:27
 * mtcp_writememoryareas(int fd, int type):
     type = AREA_SHARED, AREA_NONSHARED, AREA_END, AREA_ALL
   Then write AREA_SHARED, then fork, and then AREA_NONSHARED.
   This assumes AREA_SHARED is smaller, and will not delay the parent,
     when writing this small amount of memory to the checkpoint image file.
@gc00 gc00 force-pushed the forked-ckpt-improved branch 3 times, most recently from 5f889a5 to 8ea8fa6 Compare October 13, 2025 07:42
gc00 added 2 commits October 14, 2025 00:58
 * shared-memory1 test is failing
    Probably, there was an intervening mmap.  Fix it:
     [130279] mtcp_util.c:884 mmap_fixed_noreplace:
      error 9 mapping 0x1000 bytes at 0x7df6d335b000, flags: 0x100000, prot :0x3
    [130279] mtcp_restart.c:867 read_one_memory_area:
      Assertion failed: mmappedat == area.addr
 * forkexec and vfork2 tests are failing.  Either fix, or "not supported".
    root-pids:[81145]msg:unexpected number of checkpoint files, 2 procs, 1 files
 * See ckptserializer.cpp:103 ("A better idea is to block SIGCHLD")
@gc00 gc00 force-pushed the forked-ckpt-improved branch from 8ea8fa6 to 6f5bd94 Compare October 14, 2025 05:02