
Conversation

@hudson-ayers
Contributor

@hudson-ayers hudson-ayers commented Jul 7, 2022

Pull Request Overview

This pull request is a single commit that builds on #2958 and should not be reviewed until #2958 is merged.

This pull request modifies the grant code and process interface in two ways. First, it modifies Process::enter_grant() to return a NonNull<u8>, since a null grant pointer should not be possible. Second, it modifies allocate_grant() to no longer return a pointer, and instead return a bool indicating success or failure. This requires the code which allocates and initializes grants to actually enter the grant before it initializes its contents. This is important because otherwise dropping the EnterGrantKernelManagedLayout struct after initializing causes a call to leave_grant() for a grant that has not been entered at all, which does not make sense.
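The ordering described above can be sketched in a few lines of host-side Rust. This is only an illustration under stated assumptions, not the kernel code: `MockProcess`, `EnteredGrant`, and `allocate_and_init` are hypothetical names, the signatures are simplified, and the real `Process` trait and `EnterGrantKernelManagedLayout` differ in detail. The point is that the guard whose `Drop` calls `leave_grant()` is only constructed after `enter_grant()` has succeeded, and that the entered pointer is a `NonNull<u8>`.

```rust
use std::cell::Cell;
use std::ptr::NonNull;

/// Hypothetical stand-in for a process; NOT the real Tock `Process` trait.
struct MockProcess {
    storage: Cell<Option<NonNull<u8>>>, // grant memory, once allocated
    entered: Cell<bool>,                // whether the grant is currently entered
}

impl MockProcess {
    fn new() -> Self {
        MockProcess { storage: Cell::new(None), entered: Cell::new(false) }
    }

    /// Allocation only reports success or failure; it hands out no pointer.
    fn allocate_grant(&self, size: usize) -> bool {
        if size == 0 || self.storage.get().is_some() {
            return false;
        }
        // The buffer is leaked on purpose to keep the sketch short.
        let buf = vec![0u8; size].into_boxed_slice();
        self.storage.set(NonNull::new(Box::leak(buf).as_mut_ptr()));
        true
    }

    /// Entering yields a pointer that cannot be null; failure is an `Err`.
    fn enter_grant(&self) -> Result<NonNull<u8>, ()> {
        let ptr = self.storage.get().ok_or(())?;
        if self.entered.replace(true) {
            return Err(()); // already entered
        }
        Ok(ptr)
    }

    fn leave_grant(&self) {
        assert!(self.entered.get(), "leave_grant() without enter_grant()");
        self.entered.set(false);
    }
}

/// Hypothetical guard: leaves the grant when dropped.
struct EnteredGrant<'a> {
    process: &'a MockProcess,
    ptr: NonNull<u8>,
}

impl Drop for EnteredGrant<'_> {
    fn drop(&mut self) {
        self.process.leave_grant();
    }
}

/// Allocate, then enter, then initialize. The guard exists only once the
/// grant has really been entered, so its drop always pairs with an enter.
fn allocate_and_init(process: &MockProcess, size: usize, fill: u8) -> Option<EnteredGrant<'_>> {
    if !process.allocate_grant(size) {
        return None;
    }
    let ptr = process.enter_grant().ok()?;
    let guard = EnteredGrant { process, ptr };
    // Safety: `ptr` points at `size` bytes allocated above.
    unsafe { std::ptr::write_bytes(guard.ptr.as_ptr(), fill, size) };
    Some(guard)
}
```

If the guard were instead built from a pointer returned directly by allocation, dropping it would hit the assertion in `leave_grant()`, which is the mismatch this PR removes.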

Testing Strategy

This PR works fine, except for a specific app (analog_comparator) when flashed from a Linux host, but I am pretty confident that bug is unrelated to this PR.

TODO or Help Wanted

N/A

Documentation Updated

  • Updated the relevant files in /docs, or no updates are required.

Formatting

  • Ran make prepush.

@github-actions github-actions bot added the kernel and WG-OpenTitan labels Jul 7, 2022
@bradjc
Contributor

bradjc commented Jul 22, 2022

Need rebase now that #2958 is merged.

@hudson-ayers hudson-ayers force-pushed the more-grant-improvements branch from 0c7f917 to f6adf9d Compare July 22, 2022 17:29
@github-actions github-actions bot removed the WG-OpenTitan label Jul 22, 2022
@hudson-ayers
Contributor Author

After running tests on Imix with multiple applications, this seems to have broken something (or at least changed timing enough to reveal something already broken in the analog_comparator app?). Flashing both hello_loop and analog_comparator together leads to analog_comparator exceeding its time quantum, which does not happen without this commit. Removing a single println!() from the callback code of analog_comparator fixes this, even though that callback is not being called, so this is definitely a pretty strange bug. Either way, this should not be merged until I get to the bottom of it.

@hudson-ayers
Contributor Author

This bug is really nasty, and I am not sure if it is the fault of this PR at all.

A few things I have tried:

  • Current kernel master + libtock-c master acomp app: works
  • This PR + libtock-c master acomp app: fails
  • This PR + libtock-c acomp app with one 4 byte value added to the binary: works
  • This PR with a 48 byte string added to the kernel + libtock-c master acomp app: works
  • This PR with an 8 byte string removed from the kernel + libtock-c master acomp app: works
  • This PR w/ system call tracing enabled: works
  • Modifying master to add a 48 byte string, such that the master kernel is the exact same size as the kernel in this PR: works

All of these results are repeatable. Moving the app to a different location in memory by flashing other apps before it does not seem to fix things.

An interesting process_console printout from the failed version:

tock$ process analog_comparator
𝐀𝐩𝐩: analog_comparator   -   [Running]
 Events Queued: 0   Syscall Count: 3   Dropped Upcall Count: 0
 Restart Count: 0
 Last Syscall: Memop { operand: 11, arg0: 536906356 }
 Completion Code: None


 ╔═══════════╤══════════════════════════════════════════╗
 ║  Address  │ Region Name    Used | Allocated (bytes)  ║
 ╚0x2000A000═╪══════════════════════════════════════════╝
             │ Grant Ptrs      152
             │ Upcalls         360
             │ Process         880
  0x20009A90 ┼───────────────────────────────────────────
             │ ▼ Grant           0
  0x20009A90 ┼───────────────────────────────────────────
             │ Unused
  0x20008A74 ┼───────────────────────────────────────────
             │ ▲ Heap            0 |   4124               S
  0x20008A74 ┼─────────────────────────────────────────── R
             │ Data            628 |    628               A
  0x20008800 ┼─────────────────────────────────────────── M
             │ ▼ Stack         200 |   2048
  0x20008738 ┼───────────────────────────────────────────
             │ Unused
  0x20008000 ┴───────────────────────────────────────────
             .....
  0x00042000 ┬─────────────────────────────────────────── F
             │ App Flash      8128                        L
  0x00040040 ┼─────────────────────────────────────────── A
             │ Protected        64                        S
  0x00040000 ┴─────────────────────────────────────────── H
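One detail in the printout is easier to see in hex: the `arg0` of the last recorded syscall, 536906356, is 0x20008A74, which is exactly the Data/Heap boundary row in the table. The small Rust sketch below does that conversion; `as_console_address` and `in_region` are hypothetical helper names made up for this note. My reading that memop operand 11 is the call with which an app reports its heap start to the kernel is an assumption based on the Tock memop documentation, not something stated in this thread.

```rust
/// Format a 32-bit address the way the process console table prints it.
fn as_console_address(value: u32) -> String {
    format!("{:#010X}", value)
}

/// True if `addr` lies in the half-open range [start, end).
fn in_region(addr: u32, start: u32, end: u32) -> bool {
    start <= addr && addr < end
}
```

With the values from the printout, `as_console_address(536_906_356)` gives `0x20008A74`, and that address lies inside the process RAM region 0x20008000 to 0x2000A000 shown in the table.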

@bradjc
Contributor

bradjc commented Jul 25, 2022

To reproduce:

  • Need Hudson's imix kernel (or compile on Linux, not Mac).
  • Run only the analog comparator test app.
  • The app calls exit when doing its first sbrk, and gets stuck in a while(1) loop in the exit function. The symptom appears to be that the memop call returns an invalid return variant (failure with value, r0 == 1). However, the kernel does not record that the app ever called memop(1) (sbrk), so it's not clear what is happening.

Also note:

  • sbrk uses r0 == 1, and the error is happening because r0 == 1 "after" the memop runs. So it could be that r0 is not changing (i.e. it is not being set by the kernel).
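To make the "r0 is not changing" hypothesis concrete, here is a small host-side simulation in Rust. It is a sketch under assumptions: the return-variant numbers are the ones I understand Tock's system call documentation (TRD104) to define, the decoding only mirrors the general shape of the userspace memop wrapper, and `Registers`, `MemopResult`, and this `memop` function are hypothetical names. The "kernel" is just a closure that may or may not touch the registers.

```rust
// Return-variant values, assumed from Tock's syscall documentation (TRD104).
const FAILURE: u32 = 0;
#[allow(dead_code)]
const FAILURE_U32: u32 = 1; // "failure with value"; memop does not expect it
const SUCCESS: u32 = 128;
const SUCCESS_U32: u32 = 129;

/// Hypothetical model of the two registers involved in the call.
struct Registers {
    r0: u32,
    r1: u32,
}

#[derive(Debug, PartialEq)]
enum MemopResult {
    Success,
    SuccessWithValue(u32),
    Failure(u32),
    Invalid(u32), // the path that ends in exit() in the app
}

/// Load op_type into r0, let the "kernel" run, then decode r0 as the
/// return variant, roughly as the userspace wrapper does.
fn memop(op_type: u32, arg: u32, kernel: impl FnOnce(&mut Registers)) -> MemopResult {
    let mut regs = Registers { r0: op_type, r1: arg };
    kernel(&mut regs);
    match regs.r0 {
        SUCCESS => MemopResult::Success,
        SUCCESS_U32 => MemopResult::SuccessWithValue(regs.r1),
        FAILURE => MemopResult::Failure(regs.r1),
        other => MemopResult::Invalid(other),
    }
}
```

If the closure leaves the registers untouched, `memop(1, 0, |_regs| {})` decodes the stale op_type as `Invalid(1)`, which matches the observed symptom. A closure that sets r0 to `SUCCESS_U32` decodes normally.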

Giving up for now, but I was able to make two versions of the analog comparator test app that differ in one instruction, where one causes the error and the other doesn't. Basically, inserting one meaningless instruction before the svc 5 memop call makes the problem go away (inserting it after the svc 5 leaves it broken):

diff cortex-m4.userland_debug.lst.WORKING_MINOR_CHANGE cortex-m4.userland_debug.lst.BROKEN_MINOR_CHANGE
1962,1963c1962,1963
<    403c6:	4684      	mov	ip, r0
<    403c8:	df05      	svc	5
---
>    403c6:	df05      	svc	5
>    403c8:	4684      	mov	ip, r0

diff --git a/libtock/tock.c b/libtock/tock.c
index 04787bf..adb2941 100644
--- a/libtock/tock.c
+++ b/libtock/tock.c
@@ -348,16 +348,26 @@ allow_userspace_r_return_t allow_userspace_read(uint32_t driver,
 }

 memop_return_t memop(uint32_t op_type, int arg1) {
+  // uint32_t opppp=op_type;
+  // register uint32_t r5 __asm__ ("r5") = op_type;
   register uint32_t r0 __asm__ ("r0") = op_type;
   register int r1 __asm__ ("r1")      = arg1;
   register uint32_t val __asm__ ("r1");
   register uint32_t code __asm__ ("r0");
+  // register uint32_t myr3 __asm__ ("r3");
+  // register uint32_t opp __asm__ ("r5");
   __asm__ volatile (
-    "svc 5"
+    "mov r12, r0\n"
+    // "mov r12, r0\n"
+    "svc 5\n"
     : "=r" (code), "=r" (val)
     : "r" (r0), "r" (r1)
+    // : "r2", "r3", "memory"
     : "memory"
     );
+  // if (opp == 11) {
+  //   _exit(0xcc);
+  // }
   if (code == TOCK_SYSCALL_SUCCESS) {
     memop_return_t rv = {TOCK_STATUSCODE_SUCCESS, 0};
     return rv;
@@ -369,7 +379,9 @@ memop_return_t memop(uint32_t op_type, int arg1) {
     return rv;
   } else {
     // Invalid return type
-    exit(1);
+    exit(r0);
+    // exit(0xc);
+    // exit(op_type);
   }
 }

Any changes to the app or kernel seem to make the error go away.

Best ideas so far:

  • some sort of exception is happening during the svc call, which somehow confuses the svc_handler_arm_v7m code so that it returns to the app without the kernel ever handling the syscall.
  • we've stumbled on a very reproducible timing issue, where the svc for the memop sbrk happens at exactly the right moment to trigger the weird behavior. Anything that shifts the timing makes the issue go away: using a different kernel, loading a second process, or even just inserting a single instruction before the svc.

Contributor

@bradjc bradjc left a comment


But I think the main takeaway is that this bug doesn't seem to have anything to do with this PR, and we should move this discussion to an issue and move forward with this PR.

@bradjc bradjc added the last-call label Jul 27, 2022
Contributor

@brghena brghena left a comment


These changes look good to me.

Member

@ppannuto ppannuto left a comment


bors r+

Agreed, this changeset makes sense and looks good.

@ppannuto
Member

Note: Issue migrated to #3109

@bors
Contributor

bors bot commented Jul 28, 2022

@bors bors bot merged commit 03e3459 into tock:master Jul 28, 2022
