pt-online-schema-change is a tool in the Percona Toolkit which allows you to make large table alters without locking the database.

Newer versions of MySQL have Online DDL which reduce the need for this tool. But if you’re like us, pt-online-schema-change is still very valuable because we use Galera. Running a DDL query on a Galera cluster blocks the entire cluster (even other databases in the same cluster).

I’ve successfully used pt-online-schema-change in the past with great success. This week I tried to use it again to perform some expensive queries and I ran into an error message like this:

Error creating new table: DBD::mysql::db do failed: Can't write; duplicate key in table '_child_new'

A bit of Google-fu and I discovered it was linked to this bug.

The bug is marked as fixed in version 2.2.17 – but it is not. If you need a truly fixed version, jump to the end of this post: The Fix.

Briefly: How the tool works

To understand the bug, you need to understand a bit about how the tool actually works.

pt-online-schema-change creates a brand new table, applies your ALTER query on the new table (which is “fast” because it’s empty), and then proceeds to copy records from the old table to the new table in chunks. It installs triggers on the old table to handle deletes and updates. Basically, you’re syncing the two tables together.

When the new table is synced with the old table, the tool switches them: The old table is dropped and the new table is renamed to the real table name.

The Problem: Foreign Keys

The problem is foreign keys. The FK needs to exist simultaneously in the old table and the new table, but FK names must be unique across the entire database.

To work around this, pt-online-schema-change prefixes the FK’s with an underscore in the new table. Example:

  • Original FK: MY_FK
  • New FK: _MY_FK

Then when the tables are swapped, you’re still left with that prefixed FK in the production table.

Run the tool a second time? Well, once upon a time, you’d just end up with a double-prefixed FK. So the second time, you’d have __MY_FK. The third time, you’d have ___MY_FK… etc.

The issue is that the maximum length of a FK is 64 characters. Imagine a schema changing many times over the course of a few years, you can easily breach this limit.

So Percona patched the tool so it would detect if there was a FK with an underscore, and switch to stripping the underscore. In other words, they wanted it to alternate: MY_FK to _MY_FK, then back to MY_FK again if you ran it again.

The Bug

The bug is that the original patch checked for any FK with underscores. Meaning, if you had _MY_FK and MY_OTHER_FK, it would detect the underscore on the first one and assume that it needed to strip FKs. Of course, there is no underscore on the second, so it would be a no-op.

  • Originals: _MY_FK, MY_OTHER_FK

As you see, a new table would attempt to create a FK with a name that already exists. And that is the error.

The Fix

Peter Dolberg, the op of the bug report, supplied a suggested patch. Despite the bug being marked as fixed in 2.2.17, it is not. What I ended up doing is copying the source, patching it myself, and then I just used the patched version:

  1. Install percona-toolkit the usual way.

  2. Copy the source of the pt-online-schema-change tool.

cp `which pt-online-schema-change` my-pt-osc
  1. Patch it according to Peter’s instructions. I took the liberty of creating a patch file: