gh-ost's statistics, when run on a large, busy table with a lot of inserts, can become inaccurate over time. Internally we've sometimes seen cutovers become available when the migration is reported as 90% complete or less.
Copy: 1045877600/1264045717 82.7%; Applied: 884999578; Backlog: 12/1000; Time: 460h0m0s(total), 459h59m53s(copy); streamer: mysql-bin.012574:687201905; State: migrating; ETA: 95h57m17s
The 1045877600/1264045717 (82.7%) figure above is gh-ost's count of rows in the _gho table divided by its count of rows in the original table. Compared to the actual counts, the new _gho table was 11% larger than reported, so the migration was really about 93% complete. (The count of rows in the original table was quite accurate, if not exact.)
Currently gh-ost determines the table counts by:
- Getting the row count at the beginning (whether the exact row count or an estimate).
- Parsing the binlogs: +1 for inserts and -1 for deletes.
- Getting the rows_affected from the insert into _gho select ... from original_table.
The counts are likely inaccurate because row copying and binlog parsing/applying run as concurrent threads. We don't know whether a binlog INSERT results in a net increase of 1 row (it is applied as a REPLACE, so it might cancel out as a delete plus an insert), and we don't know whether a binlog DELETE actually deletes a row (the copy thread may not have copied that row over yet). Since gh-ost is erring on the side of too-low counts, @shlomi-noach guesses that the parsed delete counts are the bigger cause of the discrepancy.
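To make the drift concrete, here is a minimal toy model of the bookkeeping described above. It is not gh-ost code: `ghostSim`, `simulateDrift`, and the key values are all hypothetical, and the ghost table is just an in-memory set. It only shows how a binlog DELETE for a row that has not been copied yet pulls the estimate below the real _gho row count.

```go
package main

// ghostSim is a hypothetical, in-memory stand-in for the _gho table
// plus the naive row estimate described in this issue.
type ghostSim struct {
	ghost map[int]bool // rows actually present in the _gho table
	naive int64        // estimate: copied rows, +1 per binlog INSERT, -1 per binlog DELETE
}

func newGhostSim() *ghostSim {
	return &ghostSim{ghost: map[int]bool{}}
}

// copyChunk models the row-copy insert: only rows not already present
// are added, and the estimate grows by the rows actually written.
func (s *ghostSim) copyChunk(keys []int) {
	for _, k := range keys {
		if !s.ghost[k] {
			s.ghost[k] = true
			s.naive++
		}
	}
}

// applyInsert models a binlog INSERT applied as a REPLACE.
// The estimate is bumped even if the row was already in the ghost table.
func (s *ghostSim) applyInsert(k int) {
	s.ghost[k] = true
	s.naive++
}

// applyDelete models a binlog DELETE.
// The estimate is decremented even if the row was never copied.
func (s *ghostSim) applyDelete(k int) {
	delete(s.ghost, k)
	s.naive--
}

// actual returns the true number of rows in the simulated ghost table.
func (s *ghostSim) actual() int64 {
	return int64(len(s.ghost))
}

// simulateDrift copies keys 1..5, then sees DELETEs for keys 8 and 9
// (not copied yet), then copies what is left of the original table.
func simulateDrift() (naive, actual int64) {
	s := newGhostSim()
	s.copyChunk([]int{1, 2, 3, 4, 5})
	s.applyDelete(8) // row 8 is not in the ghost table yet
	s.applyDelete(9) // row 9 is not in the ghost table yet
	s.copyChunk([]int{6, 7, 10}) // 8 and 9 no longer exist in the original
	return s.naive, s.actual()
}
```

In this scenario the ghost table ends up with 8 rows while the naive estimate is 6, which is the same direction of error (too low) as in the status line above.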
If gh-ost instead derived all of the _gho row counts from the rows_affected of its writes to the _gho table, the statistics should be more accurate. E.g.
Line 1011 in 6524a15
    if _, err := tx.Exec(buildResult.query, buildResult.args...); err != nil {
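A rough sketch of what that could look like, assuming the result of `tx.Exec` is kept instead of discarded. The names `writeKind`, `netRowDelta`, and `execAndCount` are hypothetical and not part of gh-ost. The mapping assumes MySQL's documented behaviour that REPLACE reports deleted plus inserted rows, assumes the copy statement only reports rows it really inserted, and treats an applied UPDATE as not changing the row count.

```go
package main

import "database/sql"

// writeKind is a hypothetical label for the kind of statement
// written to the _gho table.
type writeKind int

const (
	kindCopy    writeKind = iota // row-copy insert ... select
	kindReplace                  // binlog INSERT applied as REPLACE
	kindDelete                   // binlog DELETE
	kindUpdate                   // binlog UPDATE
)

// netRowDelta converts rows_affected into the net change in the
// number of rows in the _gho table.
func netRowDelta(kind writeKind, rowsAffected int64) int64 {
	switch kind {
	case kindCopy:
		// Assumption: only rows actually inserted are reported.
		return rowsAffected
	case kindReplace:
		// A single-row REPLACE inserts 1 row and deletes
		// (rowsAffected - 1) rows: 1 means net +1, 2 means net 0.
		return 2 - rowsAffected
	case kindDelete:
		// 0 if the row had not been copied yet, otherwise 1.
		return -rowsAffected
	default:
		// Assumption: an UPDATE does not change the row count.
		return 0
	}
}

// execAndCount runs one statement in the transaction and returns
// the net row delta instead of throwing the result away.
func execAndCount(tx *sql.Tx, kind writeKind, query string, args ...interface{}) (int64, error) {
	res, err := tx.Exec(query, args...)
	if err != nil {
		return 0, err
	}
	affected, err := res.RowsAffected()
	if err != nil {
		return 0, err
	}
	return netRowDelta(kind, affected), nil
}
```

The caller would add the returned delta to its running _gho row count, so a DELETE that matched nothing or a REPLACE that overwrote an existing row no longer skews the estimate.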