Conversation

@fgiudici
Contributor

@fgiudici fgiudici commented Sep 2, 2020

What type of PR is this?

/kind bug

What this PR does / why we need it:

golang protobuf has been updated to a newer implementation that is incompatible with the older one.
The ttrpc package relies on the gogo/protobuf implementation, which is no longer compatible with the newer golang protobuf implementation.
Until this is fixed in the ttrpc package, we have to ensure that the dependencies imported by the ttrpc package stay compatible with the older protobuf specification, since cri-o now imports the newer golang protobuf (>= 1.4.0).

The ttrpc package and those of its dependencies that are generated via golang protobuf have been forked to keep compatibility with gogo/protobuf. We will stick to the forks until the issue is fixed in the ttrpc package.

Which issue(s) this PR fixes:

Fixes #3991

Special notes for your reviewer:

This is a temporary workaround until the containerd/ttrpc package is fixed. That will not be an easy fix, and it may take time.

Does this PR introduce a user-facing change?

None

@openshift-ci-robot openshift-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. dco-signoff: no Indicates the PR's author has not DCO signed all their commits. labels Sep 2, 2020
@openshift-ci-robot

Hi @fgiudici. Thanks for your PR.

I'm waiting for a cri-o member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 2, 2020
@codecov

codecov bot commented Sep 2, 2020

Codecov Report

Merging #4151 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #4151   +/-   ##
=======================================
  Coverage   40.85%   40.85%           
=======================================
  Files         111      111           
  Lines        9501     9501           
=======================================
  Hits         3882     3882           
  Misses       5242     5242           
  Partials      377      377           

@openshift-ci-robot openshift-ci-robot added dco-signoff: yes Indicates the PR's author has DCO signed all their commits. and removed dco-signoff: no Indicates the PR's author has not DCO signed all their commits. labels Sep 2, 2020
@saschagrunert
Member

/ok-to-test

@openshift-ci-robot openshift-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 2, 2020
@fidencio
Contributor

fidencio commented Sep 2, 2020

I've just tested this PR and it does work as expected with latest CRI-O.

One thing that I'd like to see is a more detailed commit message, as this problem hit us and hit us hard. So, the more information we can add there, the better.

Last but not least, we'd need to backport this to the release-1.19 branch, once it's approved here.

@fgiudici, thanks a ton for working on this and for providing a temporary solution till we have the things fixed on ttrpc side, very much appreciated!

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fgiudici, mrunalp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 2, 2020
@haircommander
Member

@fgiudici test failures are legit:

# time="2020-09-02T11:51:08Z" level=fatal msg="run pod sandbox failed: rpc error: code = Unknown desc = error creating pod sandbox with name \"k8s_podsandbox1_redhat.test.crio_redhat-test-crio_1\": error creating an ID-mapped copy of layer \"ba0dae6243cc9fa2890df40a625721fdbea5c94ca6da897acdd814d710149770\": 2020/09/02 11:51:08 WARNING: proto: message google.rpc.Status is already registered\nA future release will panic on registration conflicts. See:\nhttps://developers.google.com/protocol-buffers/docs/reference/go/faq#namespace-conflict\n\n2020/09/02 11:51:08 WARNING: proto: file \"google/rpc/status.proto\" is already registered\nA future release will panic on registration conflicts. See:\nhttps://developers.google.com/protocol-buffers/docs/reference/go/faq#namespace-conflict\n\n"

you can find them by clicking on the details->Artifacts->artifacts/->testout.txt:
https://storage.googleapis.com/origin-federated-results/pr-logs/pull/cri-o_cri-o/4151/test_pull_request_crio_integration_rhel/20783/artifacts/testout.txt

you can run these tests locally with:

sudo test/test_runner.sh
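
For context on the warning in the log above: it is what a process-wide type registry reports when two Go packages both claim the same fully qualified message name, here `google.rpc.Status`. Presumably that happens in this PR because the forked copy of the status package and the original one end up linked into the same binary. The following is only a minimal sketch of that mechanism. The `registry` type and the `example.com/...` import path are hypothetical and are not the protobuf library's actual code or the fork's real location.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical stand-in for a process-wide protobuf type
// registry: it maps a fully qualified message name to the Go package
// that registered it.
type registry struct {
	mu    sync.Mutex
	types map[string]string
}

func newRegistry() *registry {
	return &registry{types: make(map[string]string)}
}

// register records owner as the provider of the message called name. It
// returns an error when a different package already registered that name.
func (r *registry) register(name, owner string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if prev, ok := r.types[name]; ok && prev != owner {
		return fmt.Errorf("proto: message %s is already registered (by %s, again by %s)", name, prev, owner)
	}
	r.types[name] = owner
	return nil
}

func main() {
	reg := newRegistry()
	// Two different Go packages that both describe google.rpc.Status.
	owners := []string{
		"google.golang.org/genproto/googleapis/rpc/status",
		"example.com/forked/googleapis/rpc/status", // hypothetical fork path
	}
	for _, o := range owners {
		if err := reg.register("google.rpc.Status", o); err != nil {
			fmt.Println("WARNING:", err)
		}
	}
}
```

In this sketch the first registration succeeds and the second one is reported as a conflict, which mirrors the shape of the warning in the test output. Making sure only one package provides a given message name in the final binary avoids the clash.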

@fgiudici
Contributor Author

fgiudici commented Sep 2, 2020

thanks, looking into it

@fidencio
Contributor

fidencio commented Sep 4, 2020

/hold

After talking with Francesco, we've decided to explicitly set a "do-not-merge" on this one, in order to avoid it being accidentally merged while he's still digging (more and more) into the issue.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 4, 2020
@fidencio
Contributor

fidencio commented Sep 4, 2020

I've tested the PR as it is now and it does solve the registration clash and works as expected, so, thumbs up!

Francesco mentioned (in a private chat) that there may still be two issues with this PR (or ways to improve it) and that he'll dig into them (that's the reason I've added the "do-not-merge" label). He'll update the issue with his findings once he's done with the research.

@fidencio
Contributor

fidencio commented Sep 4, 2020

Also, yesterday in the CRI-O meeting Mrunal raised that there may be another solution, based on the fact that we should only be vendoring k8s.io/cri-api instead of the whole of k8s.io/kubernetes. I've started to dig into that; it may bring good results, but it is parallel work.

EDITED: Mrunal pointed me to dims' PR for containerd: containerd/cri#1463

@openshift-ci-robot openshift-ci-robot added dco-signoff: no Indicates the PR's author has not DCO signed all their commits. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Sep 4, 2020
@fidencio
Contributor

fidencio commented Sep 4, 2020

I ended up opening what may be an alternative version of this PR: #4164

@openshift-ci-robot openshift-ci-robot added dco-signoff: yes Indicates the PR's author has DCO signed all their commits. and removed dco-signoff: no Indicates the PR's author has not DCO signed all their commits. labels Sep 4, 2020
@fgiudici
Contributor Author

fgiudici commented Sep 4, 2020

/retest

…obuf

The containerd/ttrpc package, used to support runtimes of type "vm",
relies on the gogo/protobuf protocol buffer implementation.
Recently, Google updated the golang protocol buffer implementation with
a newer, backward-incompatible API:
https://blog.golang.org/protobuf-apiv2

gogo/protobuf has not been updated and is now incompatible
with the newer golang protocol buffer implementation.
Unfortunately, the containerd/ttrpc package also imports the package:
google.golang.org/genproto/googleapis/rpc/status
which is generated with the golang protocol buffer implementation. With
the newer version, the generated code is no longer compatible with
gogo/protobuf, leading to a panic during execution of the ttrpc code.

While the ttrpc package needs a proper fix, this patch brings in a fork
of the containerd/ttrpc package that depends on an older version of the
googleapis/rpc/status package (compiled with the older protobuf API and
therefore compatible with the gogo/protobuf implementation), which has
also been forked into a custom repo. This way, the ttrpc code only
imports code that is compatible with the older protobuf specification.

Signed-off-by: Francesco Giudici <[email protected]>
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 4, 2020
@fgiudici
Contributor Author

fgiudici commented Sep 4, 2020

Import the gogo/protobuf-generated version of googleapis/rpc/status from the gogo/googleapis repo.
Import a forked version of the google/grpc/status package, modified to rely on the gogo googleapis/rpc/status code.
The changes in the imported packages are now kept to the bare minimum.

Note that PR #4164 is the current candidate for fixing issue #3991

@openshift-ci-robot

@fgiudici: The following tests failed, say /retest to rerun all failed tests:

Test name                                 Commit   Details  Rerun command
ci/openshift-jenkins/e2e_features_fedora  3a41397  link     /test e2e_features_fedora
ci/openshift-jenkins/integration_rhel     3a41397  link     /test integration_rhel
ci/prow/e2e-aws                           3a41397  link     /test e2e-aws
ci/openshift-jenkins/e2e_crun_cgroupv2    3a41397  link     /test e2e_cgroupv2

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@fgiudici
Contributor Author

fgiudici commented Sep 9, 2020

As PR #4164 has been merged and issue #3991 is now closed, we don't need this anymore (at least for now).
A similar PR has been opened against the containerd/ttrpc package itself: containerd/ttrpc#67.

@fgiudici fgiudici closed this Sep 9, 2020

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files.
dco-signoff: yes Indicates the PR's author has DCO signed all their commits.
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.
kind/bug Categorizes issue or PR as related to a bug.
ok-to-test Indicates a non-member PR verified by an org member that is safe to test.
release-note-none Denotes a PR that doesn't merit a release note.

Development

Successfully merging this pull request may close these issues.

runtime_vm: Panic creating a container due to "panic: protobuf tag not enough fields in Status.state"

6 participants