Avoid potential deadlock using non-blocking send

Deadlock scenario:
    1. Due to scheduling, the state runner sends one snapshot into
    snapCh and then attempts to send a second. The first send succeeds
    because the channel is buffered, but the second blocks.
    2. Separately, Manager.Watch is called by the xDS server after
    getting a discovery request from Envoy. This function acquires the
    manager lock and then blocks on receiving the CurrentSnapshot from
    the state runner.
    3. Separately, there is a Manager goroutine that reads the snapshots
    from the channel in step 1. These reads are done to notify proxy
    watchers, but they require holding the manager lock. This goroutine
    goes to acquire that lock, but can't because it is held by step 2.

Now, the goroutine from step 3 is waiting on the one from step 2 to
release the lock. The goroutine from step 2 won't release the lock until
the goroutine in step 1 advances. But the goroutine in step 1 is waiting
for the one in step 3. Deadlock.

By making this send non-blocking, step 1 above can proceed. The coalesce
timer will be reset and a new valid snapshot will be delivered after it
elapses or when one is requested by xDS.
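
The non-blocking send described above can be sketched in isolation as follows. This is a minimal illustration, not the code from this commit: the `snapshot` type, the `trySend` helper, and the buffer size of 1 are assumptions made for the example.

```go
package main

import "fmt"

// snapshot is a hypothetical stand-in for the proxy config snapshot type.
type snapshot struct{ version int }

// trySend attempts to send s on ch without blocking. It reports whether
// the value was delivered; when the buffer is full the value is dropped.
func trySend(ch chan<- snapshot, s snapshot) bool {
	select {
	case ch <- s:
		return true
	default:
		// A snapshot is already buffered, so give up instead of blocking.
		return false
	}
}

func main() {
	// Assumed buffer size of 1, matching the scenario where the first
	// send succeeds and the second would otherwise block.
	snapCh := make(chan snapshot, 1)

	fmt.Println(trySend(snapCh, snapshot{version: 1})) // buffer empty: delivered
	fmt.Println(trySend(snapCh, snapshot{version: 2})) // buffer full: dropped
	fmt.Println((<-snapCh).version)                    // the buffered snapshot
}
```

The trade-off is that a snapshot can be dropped when the buffer is full, which is why the commit relies on the coalesce timer or a later xDS request to deliver a fresh one.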
freddygv 2021-02-02 11:31:14 -07:00
parent cf9a14ab6a
commit 37190c0d0d

@@ -622,7 +622,14 @@ func (s *state) run() {
 				)
 				continue
 			}
-			s.snapCh <- *snapCopy
+			select {
+			case s.snapCh <- *snapCopy:
+				// try to send
+			default:
+				// avoid blocking if a snapshot is already buffered
+			}
 			// Allow the next change to trigger a send
 			coalesceTimer = nil