As per #26, the solution of atomically swapping the tables is unsafe in the event of connection death.
I've been considering some other solutions. I've reached a dead end with all of them, but hopefully someone else can pick up where I stopped and find a solution.
An idea based on views, derived from my earlier work on http://code.openark.org/blog/mysql/auto-caching-tables, is as follows:
Views & SLEEP(), dead-end on the SLEEP() part
- We assume we want to alter table `tbl`
- Ghost table is called `ghost`
- We issue: `create or replace view ghost_view as select * from ghost where sleep(600) >= 0` (this view is updatable, and can receive `INSERT`, `UPDATE`, `DELETE`)
- To begin the swap, we issue: `rename table tbl to tbl_old, ghost_view to tbl`. `RENAME` works just as well on views as it does on tables (the full sequence is sketched after this list)
- Queries are now executing on the view, but are stalling for a long enough period
- We now have time to apply changes onto `ghost`
- Problem/Dead-end: we would like to `rename table tbl to tbl_old_view, ghost to tbl`, but the `rename` is blocked until all those hanging queries complete. There's no way to migrate those already-running queries onto the renamed ghost table.
- I tried two levels of views (a view calling a view calling the table) and then swapping the mid-layered view. Still does not work. The `rename` hangs until the queries complete, which is not what we want.
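For readability, here is the whole SLEEP()-based flow written out as SQL. This is only a sketch of the steps above, using the same `tbl`/`ghost`/`ghost_view` names:

```sql
-- Updatable view over the ghost table; any query against it stalls
-- for up to 600 seconds before touching rows:
CREATE OR REPLACE VIEW ghost_view AS
  SELECT * FROM ghost WHERE SLEEP(600) >= 0;

-- Swap the view in place of the original table (RENAME works on views):
RENAME TABLE tbl TO tbl_old, ghost_view TO tbl;

-- Incoming queries now stall on the view while changes are applied onto
-- ghost. The dead-end: this final rename blocks until every query still
-- sleeping inside the view has completed:
RENAME TABLE tbl TO tbl_old_view, ghost TO tbl;
```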
Any ideas?
Views & GET_LOCK(), spaghetti on GET_LOCK()
Before I realized the spaghetti was there, I found this solution to be appealing:
- True, it uses `GET_LOCK`, so the same "what if the connection dies" problem is still there
- However I reduce the number of risky connections from 2 to 1
- And then make it possible to have n connections share the risk: it would take the death of all n connections to cause a premature `rename` (as opposed to the death of any single connection)
And then I found the spaghetti.
The solution is similar to the above, but:
- Connection #1 issues `GET_LOCK('ding', 0)` and succeeds
- We `create or replace view ghost_view as select * from ghost where get_lock('ding', 600) >= 0`
- We `rename table tbl to tbl_old, ghost_view to tbl`; queries are blocked on `tbl` (which is now a view)
- We complete writing events onto `ghost`
- We `rename table tbl to tbl_old_view, ghost to tbl` - this blocks due to the already existing queries
- We `RELEASE_LOCK('ding')` in connection #1
- Problem: queries are released, but are now blocking each other!!!
- `rename` unblocks and we have our migrated table in place (the full sequence is sketched after this list)
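Again a sketch of the above, with connection boundaries marked in comments:

```sql
-- Connection #1: take the lock up front and hold it.
SELECT GET_LOCK('ding', 0);

-- Migration connection: the view blocks on the lock instead of on SLEEP():
CREATE OR REPLACE VIEW ghost_view AS
  SELECT * FROM ghost WHERE GET_LOCK('ding', 600) >= 0;

RENAME TABLE tbl TO tbl_old, ghost_view TO tbl;
-- ... complete writing events onto ghost ...

-- Migration connection: blocks behind the queries stalled on the view:
RENAME TABLE tbl TO tbl_old_view, ghost TO tbl;

-- Connection #1: release the lock; stalled queries resume (each one grabs
-- and holds 'ding' in turn; this is the Problem), and the rename above
-- eventually completes:
SELECT RELEASE_LOCK('ding');
```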
Before discussing the Problem, note that we can vary this as follows:
- Have connections #1 .. #n issue `SELECT GET_LOCK('ding#1', 600)` through `SELECT GET_LOCK('ding#n', 600)` (we take n locks)
- Vary the view: `create or replace view ghost_view as select * from ghost where get_lock('ding#1',600) + get_lock('ding#2',600) + ... + get_lock('ding#n',600) >= 0`

Such that it would take the death of all n connections to cause a premature rename (written out for a concrete n below).
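For concreteness, the variant view written out for n = 3 (a sketch; lock names follow the pattern above):

```sql
-- Variant view for n = 3: a query blocks on each lock in turn, so it only
-- proceeds once all three locks are released (or the 600-second timeouts
-- fire). All n holding connections must die for a premature release.
CREATE OR REPLACE VIEW ghost_view AS
  SELECT * FROM ghost
  WHERE GET_LOCK('ding#1', 600)
      + GET_LOCK('ding#2', 600)
      + GET_LOCK('ding#3', 600) >= 0;
```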
So, back to the problem. Each query will issue `get_lock` and queries will block each other, leading to a really scary workload. We can hack around this by doing crazy stuff like: `create view... select where release_lock(concat('ding', if(get_lock('ding',600) >= 0, '', ''))) >= 0`. This will make sure to release any acquired lock (written out in full below).
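The hack in full, as a sketch that assumes the same view shape as above:

```sql
-- Each query blocks inside GET_LOCK(). The IF(...) always evaluates to ''
-- and exists only to force GET_LOCK() to run before RELEASE_LOCK(); once
-- the lock is acquired it is released immediately, so queries stop
-- serializing behind one another after the cut-over lock is freed.
CREATE OR REPLACE VIEW ghost_view AS
  SELECT * FROM ghost
  WHERE RELEASE_LOCK(CONCAT('ding',
          IF(GET_LOCK('ding', 600) >= 0, '', ''))) >= 0;
```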
This still scares me because of the crazy locking contention.
Ideas?