[LLDB] Ptrace seize dead process #137041
Conversation
We already have one piece of "status" parsing code in
I don't know the answer to that, but I can say that I don't think this feature needs to be (or should be) specific to this use case. One of the things that I would like to be able to do is to stop a process right before it exits (regardless of whether that's through the exit syscall, a fatal signal, etc.). I think the tricky part is that (in both cases) the user might legitimately want to let the process exit, and "continue" is the normal way to do that, so I don't think we'd want to just error out of the continue command (or from the
I think disallowing any non-explicit continues/disconnects is a good user experience, as long as we display an appropriate message. The workflow I imagine is that when halted in this state, any explicit
Will refactor. I looked for something for status and it seems I missed something.
I see this is still a draft, but to avoid surprises, I want to say that I think this should be two or three patches in the final form. One for the PTRACE_SEIZE thingy, one for the "mechanism to prevent a process from resuming", and maybe (depending on how involved it gets) one for refactoring the /proc/status parser.
I'm okay with that. I'm still in the 'experiment and see what happens' phase when it comes to preventing continue. How does this proposal sound:
For #3, I think it's got some loose scope around whether it should replace proc stat or be in addition to it. The biggest complexity here is that we're adding information into qProcessInfo that isn't exclusively about the process, but about how we're interacting with the process. So I think tackling that as its own step makes sense.
if (m_current_process && m_current_process->CanResume()) {
  response.Printf("vCont;c;C;s;S;t");
} else {
  response.Printf("vCont");
}
What does this do if the process can't resume anyway?
https://sourceware.org/gdb/current/onlinedocs/gdb.html/Packets.html#vCont-packet
Or is this just a WIP implementation, where doing something rather than nothing meant you didn't have to change a bunch of other stuff?
Also, if/when you commit parts of this, please include a version of the example gist in one of the commit messages; it might be useful in future.
Can you make the same thing happen without using a coredumper? I feel like the answer is a solid no, but I'm not sure why. Another way we can do it is to write a test that checks that if the remote says it's in a non-resumable state, we act in a certain way. Only half the story, but it's something.
@@ -1304,6 +1304,9 @@ void GDBRemoteCommunicationServerCommon::
   if (!abi.empty())
     response.Printf("elf_abi:%s;", abi.c_str());
   response.Printf("ptrsize:%d;", proc_arch.GetAddressByteSize());
+  std::optional<bool> non_resumable = proc_info.IsNonResumable();
+  if (non_resumable)
+    response.Printf("non_resumable:%d;", *non_resumable);
This is part of qProcessInfo (https://lldb.llvm.org/resources/lldbgdbremote.html#qprocessinfo) which is, I presume, only requested once, because all the information is constant for the process lifetime.
At least for the situation at hand, non-resumable is also constant. The process had to get into that state somehow, but if you were debugging it before the non-resumable point, it wouldn't have got into the non-resumable state anyway, so it makes no difference.
So unless anyone can think of a situation where non-resumable could change, this packet is probably fine.
Your intuition is correct; for now this works, but in the future (if we want to support PTRACE_O_TRACEEXIT), we would need to update this. Currently I can get away with attach returning a process info that we can't resume.
This is still very WIP, as I'm trying to sort out the gotchas with Greg. I will break this patch up into pieces soon :)
Cool, so something would get added to the remote protocol to make this work but exactly what we can decide later.
I'd put this first (in which case it wouldn't be called "move" but "extend" or "refactor"), for two reasons:
@llvm/pr-subscribers-lldb

Author: Jacob Lalonde (Jlalond)

Changes

This is the actual PR to my SEIZE RFC. This is currently the bare bones of seizing a dead process and being able to attach and introspect with LLDB.

Additionally, right now I only check proc status before seize; we should double check after seize that the process has not changed. Worth noting: once you seize a coredumping process (and it hits trace stop), CoreDumping in status will report 0.

This is pretty complicated to test because it requires integration with the kernel. Thankfully the setup only involves some very simple toy programs, which I have outlined with instructions in this gist.

Full diff: https://github.com/llvm/llvm-project/pull/137041.diff

2 Files Affected:
diff --git a/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp b/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
index 7f2aba0e4eb2c..141e49d8a0b7e 100644
--- a/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
+++ b/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
@@ -312,10 +312,26 @@ NativeProcessLinux::Manager::Attach(
Log *log = GetLog(POSIXLog::Process);
LLDB_LOG(log, "pid = {0:x}", pid);
- auto tids_or = NativeProcessLinux::Attach(pid);
- if (!tids_or)
- return tids_or.takeError();
- ArrayRef<::pid_t> tids = *tids_or;
+ // Check the process info first so we can decide whether we should
+ // seize (coredumping process) or attach (normal case).
+ ProcessInstanceInfo process_info;
+ if (!Host::GetProcessInfo(pid, process_info))
+ return llvm::make_error<StringError>("Unable to read process info",
+ llvm::inconvertibleErrorCode());
+
+ std::vector<::pid_t> tids;
+ if (process_info.IsCoreDumping()) {
+ auto attached_or = NativeProcessLinux::Seize(pid);
+ if (!attached_or)
+ return attached_or.takeError();
+ tids = std::move(*attached_or);
+ } else {
+ auto attached_or = NativeProcessLinux::Attach(pid);
+ if (!attached_or)
+ return attached_or.takeError();
+ tids = std::move(*attached_or);
+ }
+
llvm::Expected<ArchSpec> arch_or =
NativeRegisterContextLinux::DetermineArchitecture(tids[0]);
if (!arch_or)
@@ -444,6 +460,88 @@ NativeProcessLinux::NativeProcessLinux(::pid_t pid, int terminal_fd,
SetState(StateType::eStateStopped, false);
}
+llvm::Expected<std::vector<::pid_t>> NativeProcessLinux::Seize(::pid_t pid) {
+ Log *log = GetLog(POSIXLog::Process);
+
+ uint64_t options = GetDefaultPtraceOpts();
+ Status status;
+ // Use a map to keep track of the threads which we have seized/need to
+ // seize.
+ Host::TidMap tids_to_attach;
+ while (Host::FindProcessThreads(pid, tids_to_attach)) {
+ for (Host::TidMap::iterator it = tids_to_attach.begin();
+ it != tids_to_attach.end();) {
+ if (it->second == true) {
+ // Already seized; move on, otherwise this loops forever.
+ ++it;
+ continue;
+ }
+ lldb::tid_t tid = it->first;
+ if ((status = PtraceWrapper(PTRACE_SEIZE, tid, nullptr, (void *)options))
+ .Fail()) {
+ // No such thread. The thread may have exited. More error handling
+ // may be needed.
+ if (status.GetError() == ESRCH) {
+ it = tids_to_attach.erase(it);
+ continue;
+ }
+ if (status.GetError() == EPERM) {
+ // Depending on the value of ptrace_scope, we can return a
+ // different error that suggests how to fix it.
+ return AddPtraceScopeNote(status.ToError());
+ }
+ return status.ToError();
+ }
+
+ if ((status = PtraceWrapper(PTRACE_INTERRUPT, tid)).Fail()) {
+ // No such thread. The thread may have exited. More error handling
+ // may be needed.
+ if (status.GetError() == ESRCH) {
+ it = tids_to_attach.erase(it);
+ continue;
+ }
+ if (status.GetError() == EPERM) {
+ // Depending on the value of ptrace_scope, we can return a
+ // different error that suggests how to fix it.
+ return AddPtraceScopeNote(status.ToError());
+ }
+ return status.ToError();
+ }
+
+ int wpid =
+ llvm::sys::RetryAfterSignal(-1, ::waitpid, tid, nullptr, __WALL);
+ // Need to use __WALL, otherwise we receive an error with errno=ECHILD.
+ // At this point we should have a thread stopped if waitpid succeeds.
+ if (wpid < 0) {
+ // No such thread. The thread may have exited. More error handling
+ // may be needed.
+ if (errno == ESRCH) {
+ it = tids_to_attach.erase(it);
+ continue;
+ }
+ return llvm::errorCodeToError(
+ std::error_code(errno, std::generic_category()));
+ }
+
+ LLDB_LOG(log, "adding tid = {0}", tid);
+ it->second = true;
+
+ // Move the loop forward.
+ ++it;
+ }
+ }
+
+ size_t tid_count = tids_to_attach.size();
+ if (tid_count == 0)
+ return llvm::make_error<StringError>("No such process",
+ llvm::inconvertibleErrorCode());
+
+ std::vector<::pid_t> tids;
+ tids.reserve(tid_count);
+ for (const auto &p : tids_to_attach)
+ tids.push_back(p.first);
+
+ return std::move(tids);
+}
+
llvm::Expected<std::vector<::pid_t>> NativeProcessLinux::Attach(::pid_t pid) {
Log *log = GetLog(POSIXLog::Process);
@@ -513,8 +611,8 @@ llvm::Expected<std::vector<::pid_t>> NativeProcessLinux::Attach(::pid_t pid) {
return std::move(tids);
}
-Status NativeProcessLinux::SetDefaultPtraceOpts(lldb::pid_t pid) {
- long ptrace_opts = 0;
+uint64_t NativeProcessLinux::GetDefaultPtraceOpts() {
+ uint64_t ptrace_opts = 0;
// Have the child raise an event on exit. This is used to keep the child in
// limbo until it is destroyed.
@@ -537,6 +635,11 @@ Status NativeProcessLinux::SetDefaultPtraceOpts(lldb::pid_t pid) {
// the child finishes sharing memory.
ptrace_opts |= PTRACE_O_TRACEVFORKDONE;
+ return ptrace_opts;
+}
+
+Status NativeProcessLinux::SetDefaultPtraceOpts(lldb::pid_t pid) {
+ uint64_t ptrace_opts = GetDefaultPtraceOpts();
return PtraceWrapper(PTRACE_SETOPTIONS, pid, nullptr, (void *)ptrace_opts);
}
diff --git a/lldb/source/Plugins/Process/Linux/NativeProcessLinux.h b/lldb/source/Plugins/Process/Linux/NativeProcessLinux.h
index d345f165a75d8..9ae4e57e74add 100644
--- a/lldb/source/Plugins/Process/Linux/NativeProcessLinux.h
+++ b/lldb/source/Plugins/Process/Linux/NativeProcessLinux.h
@@ -175,7 +175,6 @@ class NativeProcessLinux : public NativeProcessELF,
private:
Manager &m_manager;
ArchSpec m_arch;
-
LazyBool m_supports_mem_region = eLazyBoolCalculate;
std::vector<std::pair<MemoryRegionInfo, FileSpec>> m_mem_region_cache;
@@ -191,9 +190,13 @@ class NativeProcessLinux : public NativeProcessELF,
// Returns a list of process threads that we have attached to.
static llvm::Expected<std::vector<::pid_t>> Attach(::pid_t pid);
+ // Returns a list of process threads that we have seized and interrupted.
+ static llvm::Expected<std::vector<::pid_t>> Seize(::pid_t pid);
static Status SetDefaultPtraceOpts(const lldb::pid_t);
+ static uint64_t GetDefaultPtraceOpts();
+
bool TryHandleWaitStatus(lldb::pid_t pid, WaitStatus status);
void MonitorCallback(NativeThreadLinux &thread, WaitStatus status);
|
@labath @DavidSpickett Thanks for the patience. I've broken down my prototype, and this is now patch 2, where we implement the SEIZE functionality. I will follow up with making it so we can't resume the process when we're in this state. @DavidSpickett, you mentioned you wanted me to include my gist; to my knowledge GitHub will include my summary by default. Did you want me to check in the toy program as an example?
You can include your gist content in the PR summary, I've certainly seen longer commit messages than that in llvm :) The question I want to be able to answer from the final commit message is "what is a dead process and how do I make one?". So that in future if I need to investigate this code, I know where to start. You can do that by linking to well established documentation on the subject, or writing it in your own words, and/or including that example. If we are able to construct a test case, that would serve the same purpose.
Also FYI this has CI failures, https://buildkite.com/llvm-project/github-pull-requests/builds/177176#0196b180-e7c0-49e9-99ac-65dcc8a3c1a9, not sure if you are aware. LLDB isn't known for being super stable in pre-commit CI, but they are on the same topic as this.