Conversation

ethomson
Member

Docker can leverage QEMU to run images on other platforms! Introduce an arm32 based build.

@ethomson ethomson force-pushed the ethomson/qemu-build branch 3 times, most recently from 0c98559 to 2eafcde Compare September 12, 2018 10:21
@ethomson
Member Author

This. Is. Blowing. My. Mind. 🤯

@ethomson
Member Author

2018-09-12T10:22:44.6331866Z Operating system version:
2018-09-12T10:22:44.6649758Z     Linux 2c96ecf832f7 4.15.0-1022-azure #22~16.04.1-Ubuntu SMP Thu Aug 16 10:31:05 UTC 2018 aarch64 aarch64 aarch64 GNU/Linux

@ethomson ethomson changed the title from ci: arm32 docker build to ci: arm docker builds Sep 12, 2018
@ghuntley

@shiftkey
Contributor

@neithernut
Contributor

It says it could not bind to ::1. Maybe they (the CI provider) don't provide IPv6 support on arm for some reason?

@ethomson
Member Author

It says it could not bind to ::1. Maybe they (the CI provider) don't provide IPv6 support on arm for some reason?

Not sure what's going on. This is running in an emulator in docker, so there are a lot of strange things that could be happening. I pushed up some changes to the proxy so that it will try to bind only on 127.0.0.1 in case it is the IPv6 problem.
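
For reference, the distinction being probed here is the IPv4 loopback versus the IPv6 one. A minimal C sketch (illustrative only: the real proxy, poxyproxy, is Java, and only port 8118 comes from the log later in the thread) of binding explicitly to 127.0.0.1 so that a missing or restricted ::1 never comes into play:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	struct sockaddr_in addr;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(8118);
	/* 127.0.0.1 only; an AF_INET6 socket bound to ::1 (or the
	 * wildcard) would depend on IPv6 working inside the container. */
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

	if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, 16) < 0) {
		perror("bind/listen");
		return 1;
	}

	puts("listening on 127.0.0.1:8118");
	close(fd);
	return 0;
}
```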

@neithernut
Contributor

Interestingly, something still tries to bind to ::1. And all Linux-based tests are now failing.

@ethomson
Member Author

/rebuild

@libgit2-azure-pipelines

Okay, @ethomson, I started to rebuild this pull request.

@neithernut
Contributor

At least in the latest test the proxy server does bind to 127.0.0.1:

[2018-09-12 17:18:08] com.microsoft.tfs.tools.poxy.Poxy: Starting server on 127.0.0.1:8118

Still, something tries to bind to ::1, so it's probably another process. Maybe it's just a deceptive error message and it's actually the process trying to connect to the proxy which is printing the error? This would at least explain why the other tests fail all of a sudden.

@ethomson
Member Author

Still, something tries to bind to ::1, so it's probably another process.

Yes, I agree. It appears to be a red herring here. I can repro this locally - I thought I had all the tests passing last night. So either I misunderstood what I saw last night (likely) or I broke something between then and now (also possible).

@ethomson
Member Author

Aha. Running java directly yields a big fat clue:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000004000e18dd4, pid=17, tid=0x000000404d339200
#
# JRE version: OpenJDK Runtime Environment (8.0_181-b13) (build 1.8.0_181-8u181-b13-0ubuntu0.16.04.1-b13)
# Java VM: OpenJDK 64-Bit Server VM (25.181-b13 mixed mode linux-aarch64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x3e3dd4]
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /build/hs_err_pid17.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

This appears to happen on any jar. 😞

@ethomson
Member Author

Aha - this happens on bionic too with OpenJDK 11. However, it errors with:

qemu: Unsupported syscall: 168

Right before the segfault. (I suspect that xenial does, too, and I just missed that message.) Right, then, let's try gcj. The poxyproxy is doing so precious little that it might just work.

@ethomson
Member Author

/rebuild

@libgit2-azure-pipelines

Okay, @ethomson, I started to rebuild this pull request.

.vsts-ci.yml Outdated
qemu: 'true'
imageName: 'libgit2/xenial-arm32:test'
environmentVariables:
SKIP_PROXY_TESTS: true
Contributor

@neithernut neithernut Sep 12, 2018

Looks like the wrong kind of whitespace. CI complains about this line, too.

.vsts-ci.yml Outdated
qemu: 'true'
imageName: 'libgit2/xenial-arm64:test'
environmentVariables:
SKIP_PROXY_TESTS: true
Contributor

Same here.

@ethomson
Member Author

ethomson commented Sep 12, 2018 via email

@ethomson ethomson force-pushed the ethomson/qemu-build branch 5 times, most recently from 53b4399 to ca36b88 Compare September 13, 2018 11:24
Contributor

@neithernut neithernut left a comment

Pointing out the obvious again.

.vsts-ci.yml Outdated
@@ -56,6 +56,32 @@ jobs:
CMAKE_OPTIONS=-DUSE_HTTPS=mbedTLS -DSHA1_BACKEND=mbedTLS
LEAK_CHECK=valgrind

- job: linux_xenial_x86
Contributor

Maybe you should include "gcc" in the name

.vsts-ci.yml Outdated
CC=gcc
LEAK_CHECK=valgrind

- job: linux_xenial_x86
Contributor

... and "clang" here.

@ethomson ethomson force-pushed the ethomson/qemu-build branch 4 times, most recently from 9200495 to 7c96eb8 Compare September 13, 2018 18:57
@ethomson
Member Author

/rebuild

@libgit2-azure-pipelines

Okay, @ethomson, I started to rebuild this pull request.

@ethomson ethomson force-pushed the ethomson/qemu-build branch 2 times, most recently from 7c96eb8 to b869a8a Compare September 18, 2018 02:15
Member

@pks-t pks-t left a comment

This is very cool, thanks a lot for working on this! My only worry right now is runtime, as the new jobs seem like they'd run by default. Doesn't this heavily impact the feedback loop?

.vsts-ci.yml Outdated
qemu: 'true'
imageName: 'libgit2/xenial-arm64:test'
environmentVariables: |
SKIP_PROXY_TESTS=true
Member

Is this all supposed to be part of our "main" CI infrastructure? If so, by how much does it increase our CI build times? If the delay is quite noticeable, I'd vote to have this as a nightly job instead, as we have said in past discussions.

Member Author

I 100% agree. I was just hoping to iterate on these by putting them in the PR validation configuration so that they would get built. (There are a handful of other ways to do this; I could have set up a new pipelines configuration, but I'm lazy.)

.vsts-ci.yml Outdated
@@ -57,7 +57,7 @@ jobs:
LEAK_CHECK=valgrind

- job: linux_xenial_arm32
displayName: 'Linux (Xenial; arm32)'
displayName: 'Linux (Xenial; arm32; GCC)'
Member

Shouldn't these jobs then also say whether they use OpenSSL or mbedTLS, if we aim for consistency?

@@ -40,6 +41,9 @@ void test_buf_oom__grow(void)
cl_assert(git_buf_oom(&buf));

git_buf_dispose(&buf);
#else
cl_skip();
#endif
Member

You know, we should just use our custom allocators here. Create a custom test allocator that only has $n$ bytes available, plug it in and then run this test. This should result in a "real" OOM without us having to trick the host. I also think it would be quite helpful in other tests to verify exceptional behaviour.

Obviously, that doesn't have to be part of this PR.
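
A minimal sketch of that idea, independent of libgit2's actual allocator plumbing (the limited_malloc name and the fixed budget below are purely illustrative): a malloc wrapper with a byte budget that a test could plug in to force a real allocation failure deterministically, instead of asking the host for an absurd size.

```c
#include <stdio.h>
#include <stdlib.h>

/* Total number of bytes the test allocator is allowed to hand out. */
static size_t budget = 64;

/* Behaves like malloc until the budget is exhausted, then fails. */
static void *limited_malloc(size_t len)
{
	if (len > budget)
		return NULL;

	budget -= len;
	return malloc(len);
}

int main(void)
{
	void *small = limited_malloc(32);   /* fits within the budget */
	void *large = limited_malloc(4096); /* exceeds what remains: NULL */

	printf("small: %s, large: %s\n",
	       small ? "ok" : "NULL",
	       large ? "ok" : "NULL");

	free(small);
	free(large); /* free(NULL) is a no-op */
	return 0;
}
```

Plugged into the library's allocator hooks, a test could set the budget just below what the code under test needs and then assert on the OOM path.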

Member Author

Ooooooh, that's a good idea.

Member Author

Addressing this in a separate pull request - forthcoming.

...
fun:gcry_pk_sign
obj:*libssh2.so*
}
Member

I really cannot imagine that libssh2 or libgcrypt is to blame. I mean, I've tried multiple times to debug this issue and see whether we fail to use the APIs correctly, without finding anything. But I just cannot imagine them being so lenient with regards to memory leaks.

@ethomson ethomson force-pushed the ethomson/qemu-build branch from b869a8a to 78ef950 Compare October 21, 2018 08:42
As the number of each grows, separate the CI build scripts from
the YAML definitions.
@ethomson ethomson force-pushed the ethomson/qemu-build branch from 78ef950 to 543a694 Compare October 21, 2018 08:44
Use multiarch arm32 and arm64 docker images to run Xenial-based images
for those platforms.  We can support all the tests on ARM32 and 64
_except_ the proxy-based tests.  Our proxy on ARM seems regrettably
unstable, either due to some shoddy dependencies (with native code?)
or the JREs themselves.

Run these platforms as part of our nightly builds; do not run them
during pull request or CI validation.

Bind the proxy specifically to 127.0.0.1 instead of all addresses.  This
is not strictly necessary for operations, but having a potentially open
proxy on a network is not a good idea.

Use Bionic so that we have a modern libssh2 (for communicating with
GitHub).  We've ported fixes to our Trusty-based amd64 images, but
maintaining patches for multiple platforms is heinous.

Newer dependencies mean newer places to leak!

On 32-bit Linux systems, the value large enough to make malloc
guarantee a failure is also large enough that valgrind considers it
"fishy".  Skip this test on those systems entirely.

We don't need two separate docker images for OpenSSL and mbedTLS.
They've been combined into a single image `trusty-amd64` that supports
both.
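
An aside on the valgrind note in the 32-bit commit message above: memcheck flags allocation sizes that look negative when read as a signed value, which on a 32-bit build covers anything at or above 2 GiB. A standalone sketch (not libgit2 code) of the kind of request that trips it:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	/* Just over 2 GiB: large enough that malloc fails on a 32-bit
	 * system, and >= 2^31, so valgrind's memcheck reports the size
	 * argument as "fishy" (possibly negative) on such a build. */
	size_t huge = ((size_t)1 << 31) + 1;
	void *p = malloc(huge);

	printf("allocation %s\n", p ? "succeeded" : "failed");
	free(p);
	return 0;
}
```
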
@ethomson ethomson force-pushed the ethomson/qemu-build branch from 57ef0f3 to 7c55716 Compare October 21, 2018 09:38
@ethomson
Member Author

I've also created a repository that contains the Dockerfiles that we use to create these images.

@ethomson ethomson merged commit 671b244 into master Oct 21, 2018
@pks-t
Member

pks-t commented Oct 25, 2018

Cool, this is great! Is there any particular reason why you feel like the Dockerfiles shouldn't be part of libgit2 directly? We already have a "ci" directory, so I'd feel like it's the perfect place for them to live in.

@ethomson
Member Author

Cool, this is great! Is there any particular reason why you feel like the Dockerfiles shouldn't be part of libgit2 directly? We already have a "ci" directory, so I'd feel like it's the perfect place for them to live in.

🤔 No. I have no preference either way. I'd be happy to move them into ci if that's what makes the most sense.

@pks-t
Member

pks-t commented Oct 25, 2018

I think it makes sense to have them as part of libgit2. It's not like they make any sense as standalone files outside of the scope of libgit2, and they are part of our CI setup. Furthermore, I don't think they're much of a maintenance burden and don't add much to libgit2's size. So only pros, no cons, I think.
