[snmp] peer-ify application #5417
Force-pushed from 3066550 to ca16869.

Rebased to fix merge conflict.

A couple of points:

Force-pushed from ca16869 to f76ebc6.
Extra actions taken are:
I have updated the PR with extra "-mgr", "-agent", "-v3mgr" and "-v3agent" suffixes to make node names clearer. In general, such clarity is exactly one of the design goals for [...] Would print [...] This way node names are guaranteed not to clash, even if I run two copies of the same test suite on the same host. It also guarantees that a test case that failed and left a node running will not affect another test case attempting to use the same node name. I verified that the tests pass on Linux (Ubuntu 20), FreeBSD 13 and a Windows machine.

You still do not start the snmp_test_sys_monitor process on all nodes.

This is done in [...] No existing test case called [...] I verified that tests pass on Linux, FreeBSD, macOS and Windows. Is there any other OS or combination of settings that may not work with my changes?

?ALIB:start_node is called directly in two places in snmp_agent_SUITE.erl.
Force-pushed from f76ebc6 to 54cd4c0.

CT Test Results: 2 files, 24 suites, 47m 12s. Results for commit a9d76c5. This comment has been updated with latest results.
To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass. See the TESTING and DEVELOPMENT HowTo guides for details about how to run tests locally.
// Erlang/OTP Github Action Bot

Thank you, I updated the PR, adding this to all peer nodes. I was not able to figure out how

snmp_test_sys_monitor is started on each new node.

A lot of test cases failed in the last nightly test run(s).

There are still a lot of failing test cases.

Thank you! Just in case, I re-ran the tests on Ubuntu 20, Windows, Mac OS X, Ubuntu ARM64 and FreeBSD 11; they still pass with no failures.

It's the same for me when I run on my own machine(s).
Replace test_server:start_node with ?CT_PEER. Clean up unused test case callbacks (e.g. `testcase(suite)`) while I'm at it.
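For readers unfamiliar with the macro, here is a minimal sketch of what the replacement described in the commit message can look like in a Common Test suite. It is not code from this PR: the module name `peer_example_SUITE` and the test case name are hypothetical, and the sketch assumes OTP 25 or later, where `?CT_PEER()` from `ct.hrl` returns `{ok, Peer, Node}` and picks a unique node name on its own.

```erlang
%% Hypothetical suite, for illustration only (not part of the snmp tests).
-module(peer_example_SUITE).

-include_lib("common_test/include/ct.hrl").

-export([all/0, start_and_stop_peer/1]).

all() ->
    [start_and_stop_peer].

start_and_stop_peer(Config) when is_list(Config) ->
    %% Start a peer node linked to the test case process. The node name is
    %% generated, so two concurrent runs on one host should not clash.
    {ok, Peer, Node} = ?CT_PEER(),
    %% Run something on the peer to confirm that it is reachable.
    Node = erpc:call(Node, erlang, node, []),
    %% Stop the peer explicitly when done.
    ok = peer:stop(Peer).
```

Because the peer is linked to the process that started it, a crashed test case is less likely to leave a stray node behind than with a manually named node, which matches the motivation given earlier in this thread.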
Force-pushed from 54cd4c0 to a9d76c5.

Rebased, verified Linux & FreeBSD passing.

Some failing test case(s), but not nearly as many as before (so far).
*** CT Error Notification 2022-02-02 04:47:06.433 ***
snmp_agent_SUITE:end_per_group failed
Full error description and stacktrace
=== Ended at 2022-02-02 04:47:06

Also, I noticed that there is a (test_server:)start_peer, called with CT_PEER,

Another "failure" (actually it's the end_per_testcase that fails): [...]
This is also a case when the node has already died (host name removed).

And another one:
*** [2022-02-02 05:20:50.881] INFO test_server@shelob <0.1659.0> ***
*** CT Error Notification 2022-02-02 05:20:50.883 ***
snmp_agent_SUITE:init_per_group failed
Full error description and stacktrace
=== Ended at 2022-02-02 05:20:50

I am not able to reproduce these test failures. What operating system should I try, and how should I start these tests? I also verify

peer:stop

end_per_group(major_tcs): peer:stop
=WARNING REPORT==== 2-Feb-2022::04:51:33.635222 ===
Note that this happens before end_per_group(major_tcs), in another test case that is not actually using these nodes.

Now I understand this (although I was not able to reproduce the behaviour even after pulling in 2 commits implementing changes in [...] When the first node stops (agent), something happens to the
I verified that both workarounds keep tests passing, and I can make any desired change. But to be honest I'm not sure whether

I would prefer alt. 1 (it was the workaround I used). Simple and no risk of strange "side effects".

Another thing is that on at least one platform almost the entire agent suite is skipped. This should be impossible on a newly created node. So I am not sure what happens here,
One of [...] If [...] Since this is quite a big change in behaviour compared to before, it will also be possible to disable this bugfix ( [...]
*) This is not completely true. You could also provide this service by implementing "virtual connections" routing signals over other connected nodes. We would, however, not be able to deliver such a solution in the near future. That is, it can only be a long-term option. Note that I'm not saying whether or not "virtual connections" will be implemented, just that it could be an option for the future.

I found why I was not able to reproduce the failure. I merged the patch from the maint branch, but the feature (disconnecting nodes [...] Adding [...] It also prevents us from running test cases in parallel! I am right now running rpc_SUITE tests concurrently, completing the suite in just a few seconds. This is one of the improvements that I'd like for OTP tests (so that we can get a testing signal in just a few minutes for the entire OTP codebase).
I should point out that all nodes in the snmp (agent) test suite are local (same host).
From the test perspective, they are distributed. Given that test cases call

@max-au wrote:
Yes, it is only enabled in our master tests. I'll soon make a PR available with the changes I've made in master, which hopefully can be helpful.
I don't think that is the correct way to handle this. The termination is due to nodes behaving as old style
Yes, it would have been nice to be able to run everything in parallel, but that is not always possible. When it comes to rpc_SUITE, I think that I've disabled global, if I remember correctly. I don't think it is a problem to disable

I'm copying the conversations related to

I am removing this from testing. We need to see what it looks like without this PR (again).