Intermittent failures in localtests · Issue #753 · github/gh-ost · GitHub

Intermittent failures in localtests #753

@zmoazeni


Per #750 I was going to try digging into some possible causes for the intermittent failures in localtests.

I now have a way to reliably reproduce it: zmoazeni@b3955d1

If you read the commit, it explains how to reproduce (just run ./force-failure). It may take ~4-15 iterations before you run into the issue, which is only about a minute or two.

The problem seems to be in gh-ost/go/logic/applier.go (lines 711 to 723 in ffc6c40):

func (this *Applier) ExpectProcess(sessionId int64, stateHint, infoHint string) error {
	found := false
	query := `
		select id
			from information_schema.processlist
			where
				id != connection_id()
				and ? in (0, id)
				and state like concat('%', ?, '%')
				and info like concat('%', ?, '%')
	`
	err := sqlutils.QueryRowsMap(this.db, query, func(m sqlutils.RowMap) error {
		found = true

The timing issue is that at the time we query, the sessionId does exist, but neither the state nor the info column is populated yet. So the hints in the where clause match no rows, found stays false, and we return an error.

Update: A follow-up query does include both, which made it hard to debug at first. I decided to push the filtering logic into Go instead, which highlighted the problem.
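To illustrate what "pushing the logic into Go" can look like, here is a minimal sketch, not the code from the commit. It assumes the processlist rows have already been fetched without the hint filters; `processRow` and `matchProcess` are hypothetical names, and the hint strings are only example values. The point is that it can tell "session not there at all" apart from "session there, but state/info still empty". Note that `strings.Contains` is case-sensitive, while SQL `like` may not be, depending on collation, and the `id != connection_id()` filter is left out here.

```go
package main

import (
	"fmt"
	"strings"
)

// processRow is a hypothetical in-memory copy of one processlist row.
type processRow struct {
	ID    int64
	State string // empty if the server has not populated the column yet
	Info  string // empty if the server has not populated the column yet
}

// matchProcess reports whether the session was seen at all (seen) and
// whether it also matched both hints (matched). As with `? in (0, id)`
// in the SQL, a sessionID of 0 matches any row.
func matchProcess(rows []processRow, sessionID int64, stateHint, infoHint string) (seen, matched bool) {
	for _, r := range rows {
		if sessionID != 0 && r.ID != sessionID {
			continue
		}
		seen = true
		if strings.Contains(r.State, stateHint) && strings.Contains(r.Info, infoHint) {
			matched = true
		}
	}
	return seen, matched
}

func main() {
	// The session exists, but state and info are still empty.
	rows := []processRow{{ID: 42}}
	seen, matched := matchProcess(rows, 42, "metadata lock", "rename")
	fmt.Printf("seen=%v matched=%v\n", seen, matched) // seen=true matched=false
}
```

With the filters in the where clause, this case is indistinguishable from the session not existing, since both return zero rows.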

Here's the output of a full run for me that failed on the 3rd attempt (it says 2nd attempt, but...off by one errors :/)

https://gist.github.com/zmoazeni/e40ba08113bc5f8e3843443d51e03b66


I wonder what you think @shlomi-noach (or anyone else). Do you think we should retry this query? Should we do some special handling on the hint criteria? Something else entirely?
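For the retry option, a rough sketch of what I have in mind, assuming a bounded number of attempts with a short sleep in between. `retryUntilFound`, `errNotFound`, and the attempt/delay values are made up for illustration and are not existing gh-ost helpers; the `check` callback stands in for the processlist query.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotFound is a hypothetical sentinel error for "no matching process".
var errNotFound = errors.New("process not found")

// retryUntilFound calls check up to attempts times, sleeping delay between
// tries. It returns nil as soon as check reports found, returns the error
// from check immediately if there is one, and returns errNotFound once the
// attempts are exhausted.
func retryUntilFound(attempts int, delay time.Duration, check func() (bool, error)) error {
	for i := 0; i < attempts; i++ {
		found, err := check()
		if err != nil {
			return err
		}
		if found {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return errNotFound
}

func main() {
	// Simulate state/info being populated only by the third query.
	calls := 0
	err := retryUntilFound(5, 10*time.Millisecond, func() (bool, error) {
		calls++
		return calls >= 3, nil
	})
	fmt.Println(calls, err) // 3 <nil>
}
```

The tradeoff is that a real failure takes attempts × delay longer to surface, so the bound should stay small.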
