gopherjs deadlock with mqtt/websocket library port #1106

Closed
Bluebugs opened this issue Feb 18, 2022 · 4 comments · Fixed by #1108

@Bluebugs

Trying GopherJS with mqtt/websocket in both Chrome and Firefox, as in this branch: https://github.com/Bluebugs/paho.mqtt.golang/tree/bugs/gopherjs-deadlock, leads to a deadlock in the following function: https://github.com/Bluebugs/paho.mqtt.golang/blob/bugs/gopherjs-deadlock/token.go#L101 when called from https://github.com/Bluebugs/paho.mqtt.golang/blob/bugs/gopherjs-deadlock/net.go#L216 .

I tried to write a test that would trigger the problem, but couldn't figure out a way to get gopherjs test to execute it, as some syscalls seem to be missing in that case when doing websocket-related traffic.

Instead, I added a small example in cmd/websocket that works fine with go run or inside a browser with the Go wasm target, but fails with gopherjs serve and ends up in a deadlock. It uses the public Mosquitto MQTT server, which provides a websocket interface.

A working output would look like:

[client]   Connect()
[store]    memorystore initialized
[client]   about to write new connect msg
[client]   socket connected to broker
[client]   Using MQTT 3.1.1 protocol
[net]      connect started
[net]      received connack
[client]   startCommsWorkers called
[client]   client is connected/reconnected
[net]      incoming started
[net]      startIncomingComms started
[net]      outgoing started
[net]      startComms started
[client]   startCommsWorkers done
[pinger]   keepalive starting
[net]      outgoing waiting for an outbound message
[client]   exit startClient
[client]   enter Subscribe
[net]      logic waiting for msg on ibound
[net]      startIncomingComms: inboundFromStore complete
[net]      logic waiting for msg on ibound
[client]   SUBSCRIBE: dup: false qos: 1 retain: false rLength: 0 MessageID: 1 topics: [gopherjs/tests/attributes]
[client]   sending subscribe message, topic: gopherjs/tests/attributes
[net]      obound priority msg to write, type *packets.SubscribePacket
[client]   exit Subscribe
[net]      outgoing waiting for an outbound message
[net]      startIncoming Received Message
[net]      startIncomingComms: got msg on ibound
[net]      startIncomingComms: received suback, id: 1
[net]      startIncomingComms: granted qoss [1]
[net]      logic waiting for msg on ibound
[client]   enter Publish
[client]   sending publish message, topic: gopherjs/tests/attributes
[net]      obound msg to write 2
[net]      obound wrote msg, id: 2
[net]      outgoing waiting for an outbound message
[net]      startIncoming Received Message
[net]      startIncomingComms: got msg on ibound
[net]      startIncomingComms: received publish, msgId: 1
[net]      logic waiting for msg on ibound
[client]   enter Unsubscribe
[client]   sending unsubscribe message, topics: [gopherjs/tests/attributes]
[client]   exit Unsubscribe
[net]      putting puback msg on obound
[store]    memorystore del: message 1 was deleted
[net]      done putting puback msg on obound
[net]      obound priority msg to write, type *packets.UnsubscribePacket
[net]      outgoing waiting for an outbound message
[net]      obound priority msg to write, type *packets.PubackPacket
[net]      outgoing waiting for an outbound message
[net]      startIncoming Received Message
[net]      startIncomingComms: got msg on ibound
[store]    memorystore del: message 2 was deleted
[net]      startIncomingComms: received puback, id: 2
[net]      logic waiting for msg on ibound

A non-working one looks like:

[client]   Connect()
[store]    memorystore initialized
[client]   about to write new connect msg
[client]   socket connected to broker
[client]   Using MQTT 3.1.1 protocol
[net]      connect started
[net]      received connack
[client]   startCommsWorkers called
[client]   client is connected/reconnected
[net]      incoming started
[net]      startIncomingComms started
[net]      outgoing started
[net]      startComms started
[client]   startCommsWorkers done
[client]   exit startClient
[pinger]   keepalive starting
[net]      logic waiting for msg on ibound
[net]      startIncomingComms: inboundFromStore complete
[net]      logic waiting for msg on ibound
[net]      outgoing waiting for an outbound message
[client]   enter Subscribe
[client]   SUBSCRIBE: dup: false qos: 1 retain: false rLength: 0 MessageID: 1 topics: [gopherjs/tests/attributes]
[client]   sending subscribe message, topic: gopherjs/tests/attributes
[client]   exit Subscribe
[net]      obound priority msg to write, type *packets.SubscribePacket
[net]      outgoing waiting for an outbound message
[net]      startIncoming Received Message
[net]      startIncomingComms: got msg on ibound
[net]      startIncomingComms: received suback, id: 1
[net]      startIncomingComms: granted qoss [1]
[client]   enter Publish
[client]   sending publish message, topic: gopherjs/tests/attributes
[net]      obound msg to write 2
[net]      obound wrote msg, id: 2
[net]      outgoing waiting for an outbound message
[net]      startIncoming Received Message
[pinger]   ping check 4.822
[pinger]   ping check 9.822
[pinger]   ping check 14.821
[pinger]   ping check 19.819
[pinger]   ping check 24.819

As you can see, in the non-working case it never reaches [net] logic waiting for msg on ibound after [net] startIncomingComms: granted qoss [1].

@nevkontakte
Member

Thanks for the detailed report, I was able to reproduce it. I'll see if I can find the root cause.

@nevkontakte
Member

@Bluebugs I can confirm this is a GopherJS compiler bug. Your intuition was spot on — the bug happens when baseToken.flowComplete() is called from this goroutine. I was able to reduce reproduction to this example: https://gopherjs.github.io/playground/#/j4KWe1e1vS. Note that the Do() method here is called via an interface iface. If we do the same directly in the concrete type impl, all works as expected: https://gopherjs.github.io/playground/#/pH3zCKSEWm. For the sake of sharing knowledge, here's what is happening.
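
For readers who cannot open the playground links, the reduced case has roughly the following shape. This is a reconstruction from the description in this comment, not the exact playground code: the names iface, impl and Do come from the thread, while run and the done channel are assumptions added only to make the effect observable.

```go
package main

import "fmt"

// iface and impl mirror the names used in the reduced reproduction.
type iface interface{ Do() }

type impl struct{}

// Do contains only a select with a default clause, so it can never block.
func (i *impl) Do() {
	select {
	default:
	}
}

// run (a hypothetical helper) calls Do through the interface from a
// separate goroutine and reports back once the call has returned.
func run() string {
	done := make(chan string)
	go func() {
		var ii iface = &impl{}
		ii.Do() // the interface call is what exposes the bug
		done <- "after Do"
	}()
	return <-done
}

func main() {
	fmt.Println(run())
}
```

With the standard Go toolchain this prints "after Do". On a GopherJS build affected by this bug, the goroutine that calls Do() through the interface would be expected to stall, so the receive in run never completes.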

Let's look at what this function compiled into:

func (i *impl) Do() {
	select {
	default:
	}
}

impl.ptr.prototype.Do = function() {
	var _selection, i, $r;
	// Restore function context if the goroutine is being resumed:
	/* */ var $f, $c = false; if (this !== undefined && this.$blk !== undefined) { $f = this; $c = true; _selection = $f._selection; i = $f.i; $r = $f.$r; }
	i = this;
	_selection = $select([[]]);
	if (_selection[0] === 0) {
	}
	// Checkpoint function context if the goroutine is blocked. But we actually do it unconditionally?
	/* */ if ($f === undefined) { $f = { $blk: impl.ptr.prototype.Do }; } $f._selection = _selection; $f.i = i; $f.$r = $r; return $f;
};

Note the lines at the beginning and the end of the compiled function. These are the typical prologue and epilogue of a GopherJS function that may get blocked and needs to checkpoint/restore its state. The JavaScript runtime doesn't allow blocking functions, but Go very much does, so GopherJS has to emulate it by detecting when the function has blocked, saving local variable state and returning early. Once the blocking condition has passed, the function is called again, restores its context and continues execution.
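
The checkpoint/restore idea can be modelled in plain Go. The sketch below is only an illustration of the technique, not GopherJS's actual runtime: frame, work, drive and readyAfter are all hypothetical names, with frame standing in for the $f object seen in the compiled output.

```go
package main

import "fmt"

// frame is a hypothetical stand-in for $f: the saved locals plus the
// point at which execution should resume.
type frame struct {
	step int // statement to resume at
	sum  int // saved local variable
}

// work adds 1, waits until ready() reports true, then adds 10.
// A non-nil frame means "I blocked, call me again with this frame".
func work(f *frame, ready func() bool) (*frame, int) {
	step, sum := 0, 0
	if f != nil { // prologue: restore context when resumed
		step, sum = f.step, f.sum
	}
	switch step {
	case 0:
		sum++
		fallthrough
	case 1:
		if !ready() {
			// epilogue: checkpoint locals and return early
			return &frame{step: 1, sum: sum}, 0
		}
		sum += 10
	}
	return nil, sum
}

// drive plays the role of the scheduler: it re-invokes work with the
// saved frame until work finishes, counting the suspensions.
func drive(ready func() bool) (result, suspensions int) {
	var f *frame
	for {
		f, result = work(f, ready)
		if f == nil {
			return result, suspensions
		}
		suspensions++
	}
}

// readyAfter returns a condition that becomes true on the n-th poll.
func readyAfter(n int) func() bool {
	polls := 0
	return func() bool {
		polls++
		return polls >= n
	}
}

func main() {
	result, suspensions := drive(readyAfter(2))
	fmt.Println(result, suspensions) // prints "11 1"
}
```

The key property is that work never waits: it either finishes or hands back enough state to be re-entered later, which is the only option available on a runtime that cannot block.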

What seems suspicious here is that the function Do() is not supposed to be blocking: a select with a default clause never blocks, and there's nothing else in the function.

Besides, whether the bug occurs depends on how we call the function: directly or via an interface. Let's look at the call site.

Called directly as a method of impl:

ii.Do();

Called via an interface:

$r = ii.Do(); /* */ $s = 1; case 1: if($c) { $c = false; $r = $r.$blk(); } if ($r && $r.$blk !== undefined) { break s; }

This is actually interesting. When Do() is called directly, its return value is ignored. That means that, at least from the caller's perspective, the GopherJS compiler knows this function is not supposed to be blocking, so it doesn't bother to check whether it returned its checkpointed state (an omission made in the interest of reducing compiled output size). So even though the function attempts to checkpoint, the code continues to execute as normal. But when the function is called via an interface, the compiler has no way of knowing whether the actual function will be blocking, so it must check whether it attempted to checkpoint itself. If it did, the caller function checkpoints itself as well and returns, and so on all the way up the goroutine stack.

So what we observe here is that the Do() function checkpoints itself as if it was blocked and interrupts the goroutine. But because from the runtime perspective the goroutine isn't supposed to be blocked, it doesn't know how to pass control back to it, and so the goroutine effectively gets lost.
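
The difference between the two call sites can be sketched as a small Go model. Everything here is hypothetical (checkpoint, do, callDirect, callViaInterface, runGoroutine and the pendingWakeups counter are illustrative names, not GopherJS internals); it only mirrors the reasoning above.

```go
package main

import "fmt"

// checkpoint is a hypothetical stand-in for the saved state a function
// returns when it suspends.
type checkpoint struct{ fn string }

// do models the miscompiled Do(): nothing in it blocks, yet its epilogue
// still returns a checkpoint.
func do() *checkpoint { return &checkpoint{fn: "Do"} }

// callDirect models the direct call site: the return value is ignored,
// so the caller carries on as if nothing happened.
func callDirect() bool {
	do()
	return true // caller finished normally
}

// callViaInterface models the interface call site: the caller must check
// for a checkpoint and, if one is present, suspend as well.
func callViaInterface() (bool, *checkpoint) {
	if cp := do(); cp != nil {
		return false, cp // propagate the suspension up the stack
	}
	return true, nil
}

// runGoroutine models the scheduler's view. pendingWakeups counts events
// (channel operations, timers) that could resume the goroutine; a function
// that never really blocked registers none.
func runGoroutine(pendingWakeups int) string {
	finished, _ := callViaInterface()
	switch {
	case finished:
		return "finished"
	case pendingWakeups > 0:
		return "suspended, will resume"
	default:
		return "lost"
	}
}

func main() {
	fmt.Println(callDirect(), runGoroutine(0)) // prints "true lost"
}
```

In the model, the spurious checkpoint is harmless when ignored and fatal when honoured, because nothing is ever scheduled to resume the suspended goroutine.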

I'll need more time to investigate why the compiler inserts checkpointing logic where it shouldn't, but luckily there is an easy workaround: adding an explicit return at the end of the function:

func (i *impl) Do() {
	select {
	default:
	}
	return
}

In this case the generated code looks like this:

impl.ptr.prototype.Do = function() {
	var _selection, i, $r;
	/* */ var $f, $c = false; if (this !== undefined && this.$blk !== undefined) { $f = this; $c = true; _selection = $f._selection; i = $f.i; $r = $f.$r; }
	i = this;
	_selection = $select([[]]);
	if (_selection[0] === 0) {
	}
	return;
	// The line below is never executed, bug avoided.
	/* */ if ($f === undefined) { $f = { $blk: impl.ptr.prototype.Do }; } $f._selection = _selection; $f.i = i; $f.$r = $r; return $f;
};

nevkontakte added a commit to nevkontakte/gopherjs that referenced this issue Feb 20, 2022
In almost every place compiler checks whether a function is blocking by
checking `len(c.Blocking) > 0`, so assigning false to the map confuses
the check, causing unnecessary checkpointing prologue and epilogue to be
added to the function.

Fixes gopherjs#1106.
@nevkontakte
Member

Actually, never mind, the fix turned out to be simpler than I thought: #1108.
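
The confusion described in the commit message comes down to how Go maps behave: storing false under a key still adds an entry, so a length check reports the map as non-empty. Below is a minimal sketch, assuming a simplified stand-in for the compiler's bookkeeping. Only the len(c.Blocking) > 0 check is taken from the commit message; funcInfo, its string keys, and the helpers markAlways, markOnlyBlocking and check are hypothetical, and this is not claimed to be how #1108 implements the fix.

```go
package main

import "fmt"

// funcInfo is a hypothetical, simplified per-function analysis record.
type funcInfo struct {
	Blocking map[string]bool // keyed here by a description of the statement
}

// isBlocking is the style of check quoted in the commit message.
func (c *funcInfo) isBlocking() bool { return len(c.Blocking) > 0 }

// markAlways records every statement, storing false for non-blocking ones.
// The false entry still counts towards len(c.Blocking).
func markAlways(c *funcInfo, stmt string, blocks bool) {
	c.Blocking[stmt] = blocks
}

// markOnlyBlocking records a statement only when it really blocks.
func markOnlyBlocking(c *funcInfo, stmt string, blocks bool) {
	if blocks {
		c.Blocking[stmt] = true
	}
}

// check analyses a function consisting of a single non-blocking select
// and reports whether it ends up classified as blocking.
func check(mark func(*funcInfo, string, bool)) bool {
	c := &funcInfo{Blocking: map[string]bool{}}
	mark(c, "select with default", false)
	return c.isBlocking()
}

func main() {
	fmt.Println(check(markAlways), check(markOnlyBlocking)) // prints "true false"
}
```

With markAlways, a function that never blocks is still classified as blocking, which matches the unnecessary prologue and epilogue seen in the compiled Do().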

@Bluebugs
Author

Thanks for the quick fix and the detailed explanation of how GopherJS works, very much appreciated!
