[LLDB] Ptrace seize dead process #137041

Draft: wants to merge 3 commits into main

Jlalond
Contributor

@Jlalond Jlalond commented Apr 23, 2025

This is the actual PR for my SEIZE RFC. It currently contains the bare minimum for seizing a dead process, attaching to it, and introspecting it with LLDB.

Some caveats I need to address before we publish this PR: how to prevent LLDB from running expressions, or really anything that tries to SIGCONT the process, because that will immediately terminate it. I would like this behavior to mimic how we inform the user that post-mortem processes can't run expressions.

Additionally, right now I only check the process status before seizing; we should double-check after the seize that the process has not changed. Worth noting: once you seize a coredumping process (and it hits a trace stop), the CoreDumping field in /proc/<pid>/status will report 0.

This is pretty complicated to test because it requires integration with the kernel. Thankfully, the setup only involves some very simple toy programs, which I have outlined with instructions in this gist.

@labath
Collaborator

labath commented Apr 24, 2025

We already have one piece of "status" parsing code in source/Host/linux/Host.cpp. I think it'd be better to reuse that one. I'm slightly torn as to whether to reuse Host::GetProcessInfo for this (and add a new field to ProcessInstanceInfo, or possibly expand on IsZombie), or to create a new Linux-specific entry point that returns this data.

> Some caveats that I need to address before we publish this PR is how to prevent LLDB from running any expressions or really anything that tries to SIGCONT, because that will immediately terminate the process, I would like this behavior to mimic how we inform the user post mortem processes can't run expressions.

I don't know the answer to that, but I can say that I don't think this feature needs to be (or should be) specific to this use case. One of the things that I would like to be able to do is to stop a process right before it exits (regardless of whether that's through the exit syscall, or a fatal signal, etc.). PTRACE_O_TRACEEXIT lets you do that, but it means the process will end up in the same "almost a zombie" state, where any attempt to resume it will cause it to disappear. If we had a mechanism to prevent this, we could use it in this case as well. (and this case, unlike this "dead" state, is actually testable).
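To make the "stop right before exit" state concrete (and, as noted, testable without a coredumper), here is a minimal Linux-only sketch, not LLDB code. A parent seizes a forked child with PTRACE_SEIZE and the PTRACE_O_TRACEEXIT option; the child then calls _exit, and the parent observes the PTRACE_EVENT_EXIT stop and reads the pending exit status with PTRACE_GETEVENTMSG. Resuming with PTRACE_CONT afterwards simply lets the exit complete, which is the "any attempt to resume it will cause it to disappear" behavior. ExitStopResult and ObserveExitStop are hypothetical names, and the sketch assumes the environment permits a parent to ptrace its own child.

```cpp
// Linux-only sketch (hypothetical helper, not LLDB code): observe the
// PTRACE_EVENT_EXIT stop of a seized child, then let it finish exiting.
#include <csignal>
#include <cstdint>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct ExitStopResult {
  bool saw_exit_stop = false;      // tracee stopped at PTRACE_EVENT_EXIT
  int exit_code_at_stop = -1;      // exit code reported via PTRACE_GETEVENTMSG
  bool exited_after_cont = false;  // PTRACE_CONT let the exit complete
};

ExitStopResult ObserveExitStop(int exit_code) {
  ExitStopResult result;
  int fds[2];
  if (pipe(fds) != 0)
    return result;

  pid_t pid = fork();
  if (pid < 0)
    return result;
  if (pid == 0) {
    // Child: block until the parent has seized us, then exit.
    close(fds[1]);
    char c;
    ssize_t n = read(fds[0], &c, 1);
    (void)n;
    _exit(exit_code);
  }

  close(fds[0]);
  // Attach without stopping the child, asking for a stop right before exit.
  long rc = ptrace(PTRACE_SEIZE, pid, nullptr,
                   reinterpret_cast<void *>(
                       static_cast<uintptr_t>(PTRACE_O_TRACEEXIT)));
  // Release the child; closing the pipe unblocks it even if the seize failed.
  char go = 'g';
  ssize_t written = write(fds[1], &go, 1);
  (void)written;
  close(fds[1]);

  int status = 0;
  if (rc != 0) {
    waitpid(pid, &status, 0);  // reap the untraced child
    return result;
  }

  // The "almost a zombie" stop: the child is exiting but not yet gone.
  if (waitpid(pid, &status, 0) == pid && WIFSTOPPED(status) &&
      (status >> 8) == (SIGTRAP | (PTRACE_EVENT_EXIT << 8))) {
    result.saw_exit_stop = true;
    unsigned long msg = 0;
    if (ptrace(PTRACE_GETEVENTMSG, pid, nullptr, &msg) == 0)
      result.exit_code_at_stop = WEXITSTATUS(static_cast<int>(msg));
  }

  // Any resume from here just lets the process finish exiting.
  ptrace(PTRACE_CONT, pid, nullptr, nullptr);
  if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
    result.exited_after_cont = WEXITSTATUS(status) == exit_code;
  return result;
}
```

Calling ObserveExitStop(42) should report the exit stop, an exit code of 42 at that stop, and a normal exit once PTRACE_CONT is issued.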

I think the tricky part is that (in both cases) the user might legitimately want to let the process exit, and "continue" is the normal way to do that, so I don't think we'd want to just error out of the continue command (or from the vCont packet). I think what we'd want is to make sure that the process doesn't accidentally exit while running an expression (possibly from within a data formatter), and for that I guess we'd need to let lldb know that running expressions is "dangerous". We already have Thread::SafeToCallFunctions, even though it's used for a slightly different purpose, but maybe it could be extended to handle this as well?
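A rough sketch of the "extend Thread::SafeToCallFunctions" idea, using simplified stand-in types rather than LLDB's real Thread and Process classes: function calls are refused when the process is flagged non-resumable, in addition to whatever existing reasons already make calls unsafe. ProcessModel, ThreadModel, and EvaluateWithFunctionCall are hypothetical names for illustration only.

```cpp
// Hypothetical, simplified stand-ins; not LLDB's Thread/Process classes.
#include <string>

struct ProcessModel {
  // True for a process that must not be resumed implicitly, e.g. one seized
  // while dumping core or stopped at a PTRACE_EVENT_EXIT stop.
  bool non_resumable = false;
};

struct ThreadModel {
  const ProcessModel &process;
  // Stand-in for the existing reasons calling functions is unsafe.
  bool in_unsafe_state = false;

  // The SafeToCallFunctions idea, extended: running a function means
  // resuming the thread, which a non-resumable process would not survive.
  bool SafeToCallFunctions() const {
    return !in_unsafe_state && !process.non_resumable;
  }
};

// Hypothetical expression entry point that refuses instead of resuming.
std::string EvaluateWithFunctionCall(const ThreadModel &thread) {
  if (!thread.SafeToCallFunctions())
    return "error: can't run expressions that would resume this process";
  return "ok";  // a real evaluator would run the expression here
}
```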

@Jlalond
Contributor Author

Jlalond commented Apr 24, 2025

> I think the tricky part is that (in both cases) the user might legitimately want to let the process exit, and "continue" is the normal way to do that, so I don't think we'd want to just error out of the continue command (or from the vCont packet). I think what we'd want is to make sure that the process doesn't accidentally exit while running an expression (possibly from within a data formatter), and for that I guess we'd need to let lldb know that running expressions is "dangerous". We already have Thread::SafeToCallFunctions, even though it's used for a slightly different purpose, but maybe it could be extended to handle this as well?

I think disallowing any non-explicit continues/disconnects is a good user experience, as long as we display an appropriate message. The workflow I imagine: when halted in this state, any explicit continue or disconnect should just kill the process, but something like p MyVar.Size() should not.
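The policy described above (explicit continue or disconnect goes through with a message, implicit resumes are refused) could be modeled as a small decision function. This is a hypothetical sketch; ResumeOrigin, ResumeDecision, and DecideResume do not exist in LLDB.

```cpp
// Hypothetical types; nothing here exists in LLDB today.
enum class ResumeOrigin {
  UserContinue,    // explicit "continue" from the user
  UserDisconnect,  // explicit detach/disconnect from the user
  Expression,      // implicit: expression or data formatter calling a function
};

enum class ResumeDecision {
  Allow,             // normal process: resume as usual
  AllowWithWarning,  // resume, telling the user the process will now end
  Refuse,            // fail the request and keep the process stopped
};

// Explicit requests on a non-resumable process go through (which ends the
// process) with a message; implicit resumes are refused.
ResumeDecision DecideResume(bool non_resumable, ResumeOrigin origin) {
  if (!non_resumable)
    return ResumeDecision::Allow;
  switch (origin) {
  case ResumeOrigin::UserContinue:
  case ResumeOrigin::UserDisconnect:
    return ResumeDecision::AllowWithWarning;
  case ResumeOrigin::Expression:
    return ResumeDecision::Refuse;
  }
  return ResumeDecision::Refuse;  // unreachable; keeps compilers happy
}
```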

> We already have one piece of "status" parsing code in source/Host/linux/Host.cpp. I think it'd be better to reuse that one. I'm slightly torn as to whether to reuse Host::GetProcessInfo for this (and add a new field to ProcessInstanceInfo -- or possibly expand on IsZombie), or whether to create a new linux-specific entry point which will return this data.

Will refactor. I looked for existing status-parsing code, but it seems I missed it.

@Jlalond Jlalond changed the title Ptrace seize dead process [LLDB] Ptrace seize dead process Apr 24, 2025
@Jlalond Jlalond force-pushed the ptrace-seize-dead-process branch from 8de35b8 to 3b10fcd Compare April 24, 2025 22:26
@Jlalond Jlalond force-pushed the ptrace-seize-dead-process branch from 3b10fcd to f1574f3 Compare April 28, 2025 21:23
@labath
Collaborator

labath commented Apr 29, 2025

I see this is still a draft, but to avoid surprises, I want to say that I think this should be two or three patches in the final form: one for the PTRACE_SEIZE thingy, one for the "mechanism to prevent a process from resuming", and maybe (depending on how involved it gets) one for refactoring the /proc/status parser.

@Jlalond
Contributor Author

Jlalond commented Apr 29, 2025

> I see this is still a draft, but to avoid surprises, I want to say that I think this should be two or three patches in the final form. One for the PTRACE_SEIZE thingy, one for the "mechanism to prevent a process from resuming" and maybe (depending on how involved it gets) one for refactoring the /proc/status parser.

I'm okay with that; I'm still in the 'experiment and see what happens' phase when it comes to preventing continue.

How does this proposal sound:

  1. SEIZE + Parsing Proc Status
  2. GDB Server changes to prevent resumption
  3. Move the Proc Status (not stat) code to the HOST class

For #3, I think the scope is somewhat loose around whether it should replace the proc stat parsing or be in addition to it. The biggest complexity here is that we're adding information to qProcessInfo that isn't exclusively about the process, but about how we're interacting with the process. So I think tackling that as its own step makes sense.

@dmpots dmpots self-requested a review April 29, 2025 20:32
if (m_current_process && m_current_process->CanResume()) {
  response.Printf("vCont;c;C;s;S;t");
} else {
  response.Printf("vCont");
}
Collaborator

What does this do if the process can't resume anyway?

https://sourceware.org/gdb/current/onlinedocs/gdb.html/Packets.html#vCont-packet

Or is this just a WIP implementation, where doing something rather than nothing meant you didn't have to change a bunch of other stuff?
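For reference, a vCont? reply has the form vCont[;action...], so the bare "vCont" reply in the diff advertises the packet with no supported actions at all. A minimal parser (a hypothetical helper, not LLDB's) makes the difference visible. As far as I know, GDB treats a reply that lacks the c, C, s, and S actions as vCont being unusable and falls back to the plain c and s packets, so how each client reacts to an empty action list is worth checking.

```cpp
#include <set>
#include <string>

// Parse a "vCont?" reply of the form "vCont[;action...]" into the set of
// advertised actions. Returns false if the reply isn't a vCont reply at all
// (e.g. the empty reply, meaning vCont is not supported).
bool ParseVContActions(const std::string &reply,
                       std::set<std::string> &actions) {
  actions.clear();
  const std::string prefix = "vCont";
  if (reply.compare(0, prefix.size(), prefix) != 0)
    return false;
  size_t pos = prefix.size();
  while (pos < reply.size()) {
    if (reply[pos] != ';')
      return false;  // malformed reply
    size_t next = reply.find(';', pos + 1);
    size_t len = next == std::string::npos ? std::string::npos : next - pos - 1;
    std::string action = reply.substr(pos + 1, len);
    if (!action.empty())
      actions.insert(action);
    pos = next == std::string::npos ? reply.size() : next;
  }
  return true;  // "vCont" alone: packet supported, but no actions listed
}
```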

@DavidSpickett
Collaborator

Also if/when you commit parts of this, please include a version of the example gist in one of the commit messages, it might be useful in future.

@DavidSpickett
Collaborator

> This is pretty complicated to test because it requires integration with the Kernel

Can you make the same thing happen without using a coredumper? I feel like the answer is a solid no but I'm not sure why.

Another way we can do it is to write a test that checks that if the remote says it's in a non-resumable state, we act in a certain way. Only half the story but it's something.

@@ -1304,6 +1304,9 @@ void GDBRemoteCommunicationServerCommon::
if (!abi.empty())
  response.Printf("elf_abi:%s;", abi.c_str());
response.Printf("ptrsize:%d;", proc_arch.GetAddressByteSize());
std::optional<bool> non_resumable = proc_info.IsNonResumable();
if (non_resumable)
  response.Printf("non_resumable:%d;", *non_resumable);
Collaborator

This is part of qProcessInfo (https://lldb.llvm.org/resources/lldbgdbremote.html#qprocessinfo) which is, I presume, only requested once because all the information is constant for the process lifetime.

At least for the situation at hand, non-resumable is also constant. The process had to get into that state somehow, but if you were debugging it before the non-resumable point, it wouldn't have got into the non-resumable state anyway, so it makes no difference.

So unless anyone can think of a situation where non-resumable could change, this packet is probably fine.
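On the client side, the proposed field would be read like the other qProcessInfo entries, which are key:value pairs separated by ';'. A minimal sketch (hypothetical helpers, not LLDB's parser) that keeps "key absent" distinct from "non_resumable:0", so a client talking to an older server can tell that it simply wasn't told:

```cpp
#include <map>
#include <optional>
#include <string>

// Split a qProcessInfo-style reply ("key:value;key:value;...") into a map.
std::map<std::string, std::string> ParseKeyValuePairs(const std::string &reply) {
  std::map<std::string, std::string> pairs;
  size_t pos = 0;
  while (pos < reply.size()) {
    size_t end = reply.find(';', pos);
    if (end == std::string::npos)
      end = reply.size();  // tolerate a final pair with no trailing ';'
    std::string item = reply.substr(pos, end - pos);
    size_t colon = item.find(':');
    if (colon != std::string::npos)
      pairs[item.substr(0, colon)] = item.substr(colon + 1);
    pos = end + 1;
  }
  return pairs;
}

// "non_resumable" is the key proposed in this PR (printed with %d, so 0/1).
// An empty optional means the server didn't report it at all.
std::optional<bool>
GetNonResumable(const std::map<std::string, std::string> &pairs) {
  auto it = pairs.find("non_resumable");
  if (it == pairs.end())
    return std::nullopt;
  return it->second == "1";
}
```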

Contributor Author

Your intuition is correct: for now this works, but in the future (if we want to support PTRACE_O_TRACEEXIT), we would need to update this. Currently I can get away with attach returning process info that marks the process as non-resumable.

This is still very WIP, as I'm working through the gotchas with Greg. I will break this patch up into pieces soon :)

Collaborator

Cool, so something would get added to the remote protocol to make this work but exactly what we can decide later.

@labath
Collaborator

labath commented May 5, 2025

> Move the Proc Status (not stat) code to the HOST class

I'd put this first (in which case it wouldn't be called "move" but "extend" or "refactor"), for two reasons:

  • it reduces the chance of ending up with two parsers
  • I'm not very happy with the implementation you have here. I think using structured data is overkill and makes using it more complicated. Since this is an internal API, and we don't have to worry about stability, I think a struct with a bool field (or optional<bool> if you need to treat "not present" differently) would be better. (That's also more-or-less what the existing implementation does)
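A minimal sketch of the struct-based shape suggested here: parse the CoreDumping: line of /proc/<pid>/status into an optional<bool>, leaving it empty when the line is missing (for example on kernels that predate the field). ProcStatusInfo and ParseProcStatus are hypothetical names; the real change would extend the existing parser in source/Host/linux/Host.cpp rather than add a standalone one like this.

```cpp
#include <optional>
#include <sstream>
#include <string>

// Hypothetical result type: plain fields instead of StructuredData.
struct ProcStatusInfo {
  std::optional<bool> core_dumping;  // from the "CoreDumping:" line, if present
};

// Parse the text of /proc/<pid>/status (passed in as a string).
ProcStatusInfo ParseProcStatus(const std::string &text) {
  ProcStatusInfo info;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    size_t colon = line.find(':');
    if (colon == std::string::npos)
      continue;
    if (line.compare(0, colon, "CoreDumping") != 0)
      continue;
    std::istringstream value(line.substr(colon + 1));
    int n = 0;
    if (value >> n)  // operator>> skips the tab after the colon
      info.core_dumping = n != 0;
  }
  return info;
}
```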

3 participants