Description
Per #750 I was going to try digging into some possible causes for the intermittent failures in localtests.
I now have a way to reliably reproduce it: zmoazeni@b3955d1
The commit message explains how to reproduce it (just run ./force-failure). It may take ~4-15 iterations before you run into the issue, which is only about a minute or two.
The problem seems to be in `ExpectProcess`:
Lines 711 to 723 in ffc6c40

    func (this *Applier) ExpectProcess(sessionId int64, stateHint, infoHint string) error {
        found := false
        query := `
            select id
                from information_schema.processlist
                where
                    id != connection_id()
                    and ? in (0, id)
                    and state like concat('%', ?, '%')
                    and info like concat('%', ?, '%')
        `
        err := sqlutils.QueryRowsMap(this.db, query, func(m sqlutils.RowMap) error {
            found = true

The timing issue is that at the moment we query, the sessionId does exist, but neither the state nor the info column is populated yet. The hints in the where clause therefore match no rows, found stays false, and we return an error.
Update: A follow-up query does include both, which made this hard to debug at first. Pushing the hint matching into Go instead is what highlighted the problem.
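
To illustrate what matching the hints in Go rather than in SQL can reveal, here is a minimal sketch. It is not the code from the commit: `processRow`, `matchResult` and `classify` are hypothetical names, the rows are hand-written stand-ins for `information_schema.processlist` results, and a plain substring check only approximates `like concat('%', ?, '%')` (collation and NULL handling are ignored, as is the `id != connection_id()` filter). The point is that once the filtering happens in Go, "session missing" and "session present, but state/info not yet matching" become distinguishable.

```go
package main

import (
	"fmt"
	"strings"
)

// processRow is a hypothetical, simplified view of one processlist row.
// Empty strings stand in for columns that are NULL or not yet populated.
type processRow struct {
	ID    int64
	State string
	Info  string
}

// matchResult separates the cases that the SQL where clause collapses
// into a single "no rows" outcome.
type matchResult int

const (
	sessionMissing matchResult = iota
	sessionFoundHintsMismatch
	sessionFoundHintsMatch
)

// classify applies the session filter and the two hints in Go.
// A sessionID of 0 matches any row, mirroring `? in (0, id)`.
func classify(rows []processRow, sessionID int64, stateHint, infoHint string) matchResult {
	result := sessionMissing
	for _, r := range rows {
		if sessionID != 0 && r.ID != sessionID {
			continue
		}
		if strings.Contains(r.State, stateHint) && strings.Contains(r.Info, infoHint) {
			return sessionFoundHintsMatch
		}
		// The session exists, but at least one hint does not match (yet).
		result = sessionFoundHintsMismatch
	}
	return result
}

func main() {
	// Simulated early snapshot: the session exists, state and info are empty.
	early := []processRow{{ID: 42}}
	// Simulated follow-up snapshot with both columns populated (made-up values).
	later := []processRow{{ID: 42, State: "Waiting for table metadata lock", Info: "rename table a to b"}}

	fmt.Println(classify(early, 42, "metadata lock", "rename") == sessionFoundHintsMismatch)
	fmt.Println(classify(later, 42, "metadata lock", "rename") == sessionFoundHintsMatch)
	fmt.Println(classify(later, 99, "metadata lock", "rename") == sessionMissing)
}
```

With the SQL-only version, the first and third calls would both look like "no rows", which is why the empty state/info case was hard to spot.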
Here's the output of a full run for me that failed on the 3rd attempt (it says 2nd attempt, but...off by one errors :/)
https://gist.github.com/zmoazeni/e40ba08113bc5f8e3843443d51e03b66
I wonder what you think @shlomi-noach (or anyone else). Do you think we should retry this query? Should we do some special handling on the hint criteria? Something else entirely?