Supporting UnreachableIntermediateMasterWithLaggingReplicas by shlomi-noach · Pull Request #1005 · openark/orchestrator · GitHub
This repository was archived by the owner on Feb 18, 2025. It is now read-only.


@shlomi-noach
Collaborator

Fixes #999

This PR introduces the UnreachableIntermediateMasterWithLaggingReplicas analysis. As the name suggests, orchestrator makes this analysis when it cannot reach an intermediate master and, in addition, all of that intermediate master's replicas are lagging.
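As a rough illustration of the condition described above, here is a minimal sketch in Go. It is not orchestrator's actual analysis code: the `IntermediateMaster` and `Replica` types, the `Lagging` flag, and the `analyze` function are hypothetical simplifications, and the real analysis considers more state than this.

```go
package main

import "fmt"

// Replica is a hypothetical, simplified view of a replica.
// Lagging is assumed to be precomputed against some lag threshold.
type Replica struct {
	Host    string
	Lagging bool
}

// IntermediateMaster is a hypothetical, simplified view of an
// intermediate master and its direct replicas.
type IntermediateMaster struct {
	Host      string
	Reachable bool
	Replicas  []Replica
}

// Analysis names used by this sketch. Only the second one comes from the PR text.
const (
	NoProblem                                        = "NoProblem"
	UnreachableIntermediateMasterWithLaggingReplicas = "UnreachableIntermediateMasterWithLaggingReplicas"
)

// analyze returns the new analysis only when the intermediate master
// cannot be reached AND every one of its replicas is lagging.
func analyze(im IntermediateMaster) string {
	if im.Reachable || len(im.Replicas) == 0 {
		return NoProblem
	}
	for _, r := range im.Replicas {
		if !r.Lagging {
			// At least one replica keeps up, so the condition does not hold.
			return NoProblem
		}
	}
	return UnreachableIntermediateMasterWithLaggingReplicas
}

func main() {
	im := IntermediateMaster{
		Host:      "im-1",
		Reachable: false,
		Replicas: []Replica{
			{Host: "replica-1", Lagging: true},
			{Host: "replica-2", Lagging: true},
		},
	}
	fmt.Println(analyze(im))
}
```

Note that a single non-lagging replica is enough to withhold the analysis in this sketch, which mirrors the "all of its replicas are lagging" wording.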

The remediation is similar to that of UnreachableMasterWithLaggingReplicas: orchestrator emergently restarts the replication IO_thread on all replicas of said intermediate master.

In scenarios like the one depicted in #999, the replicas then quickly identify themselves as broken. Thus, the next failure detection by orchestrator is expected to analyze a DeadIntermediateMaster and kick off a failover.

cc @jfg956


Development

Successfully merging this pull request may close these issues.

Orchestrator not detecting intermediate master failure with relay_log_space_limit.
