MySQL replication fails with WSREP error after restart

Scenario

A single Percona MySQL 5.7 server replicates asynchronously from a cluster of Percona MySQL 5.7 servers. After I rebooted the single server, replication failed with the following symptoms:

mysql> show slave status \G
*************************** 1. row ***************************
                Slave_IO_State: Waiting to reconnect after a failed registration on master
                   Master_Host: my-replication-partner
                   Master_User: repl
                   Master_Port: 3306
                 Connect_Retry: 60
               Master_Log_File: mysql-bin.000120
           Read_Master_Log_Pos: 3848512
                Relay_Log_File: my-host-name-relay-bin.000012
                 Relay_Log_Pos: 4
         Relay_Master_Log_File: mysql-bin.000120
              Slave_IO_Running: Connecting
             Slave_SQL_Running: Yes
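If you monitor several replicas, it can help to read this vertical (\G) output programmatically. Below is a minimal Python sketch. It assumes the output of SHOW SLAVE STATUS \G has already been captured as text, for example with mysql -e "SHOW SLAVE STATUS\G". The helpers parse_vertical_output and io_thread_stuck are hypothetical names, not part of any MySQL tool. The sketch flags the state shown above: the SQL thread is running, but the I/O thread is stuck in Connecting.

```python
def parse_vertical_output(text):
    """Parse MySQL vertical (\\G) output for a single row into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "*** 1. row ***" separator.
        if not line or line.startswith("*"):
            continue
        # Split on the first colon only; values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def io_thread_stuck(fields):
    """True when the SQL thread runs but the I/O thread is not connected."""
    return (
        fields.get("Slave_SQL_Running") == "Yes"
        and fields.get("Slave_IO_Running") != "Yes"
    )


# Trimmed copy of the status output shown above.
sample = """
*************************** 1. row ***************************
               Slave_IO_State: Waiting to reconnect after a failed registration on master
                  Master_Host: my-replication-partner
             Slave_IO_Running: Connecting
            Slave_SQL_Running: Yes
"""

status = parse_vertical_output(sample)
print(status["Slave_IO_Running"])  # Connecting
print(io_thread_stuck(status))     # True
```

A check like this only detects the symptom. The cause still has to be found in the log, as described next.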

I checked the MySQL error log (/var/log/mysql.log in my case; the path varies by installation, and messages may go to syslog instead) and found the following error:

Get master clock failed with error: WSREP has not yet prepared node for application use
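To find this message quickly in a large log, you can use a simple pattern match. Below is a minimal Python sketch. The regular expression is built from the error text quoted above, and the function name find_wsrep_errors is hypothetical. The sample lines are illustrative only and are not copied from a real log, since real log lines also carry timestamps and thread IDs.

```python
import re

# Pattern assumed from the error text quoted above; captures the WSREP reason.
WSREP_CLOCK_ERROR = re.compile(
    r"Get master clock failed with error: (?P<reason>WSREP.*)$"
)


def find_wsrep_errors(lines):
    """Return the WSREP reason text from every matching log line."""
    reasons = []
    for line in lines:
        match = WSREP_CLOCK_ERROR.search(line.rstrip("\n"))
        if match:
            reasons.append(match.group("reason"))
    return reasons


# Hypothetical log excerpt for illustration.
log = [
    "[Note] some unrelated startup message\n",
    "[ERROR] Get master clock failed with error: "
    "WSREP has not yet prepared node for application use\n",
]
print(find_wsrep_errors(log))
```

To scan a real file, pass an open file object, for example with open("/var/log/mysql.log") as f: find_wsrep_errors(f), and adjust the path to your installation.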

I was surprised, because I found this error on the single server (not a cluster member), and WSREP errors are related to the Galera clustering library. Why was my non-clustered node reporting a cluster error?

Root Cause Analysis

I then logged into the replication partner, which is a member of a Galera cluster, and found that clustering had also failed: the cluster size had dropped to one node. The WSREP error on the non-clustered node was reporting an error state on its clustered replication partner. I had to bootstrap the cluster to restart clustering and restore it to 3 nodes. Once the cluster was running, replication started working again without any change on the single server.

It appears that replication is protected when its source is a member of a cluster. The error message points to the registration step, in which the replica's I/O thread queries the source for its clock. A Galera node that is not part of a primary component refuses such queries with the WSREP error shown above, so the replica cannot register and keeps retrying. This explains the Connecting state. The safeguard makes sense. If the cluster has an issue, the clustered replication partner might not have the latest state of the database. If the non-clustered partner went ahead with replication, it might receive an old state. Therefore, replication stops until the cluster issue is resolved. This behavior doesn't seem to be documented, but it is impressive.
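You can also check the cluster side directly, before digging through the replica's log. Below is a minimal Python sketch. It assumes the rows of SHOW GLOBAL STATUS LIKE 'wsrep%' have already been fetched from the clustered partner as (name, value) pairs. The helper name source_is_ready and the default expected size of 3 (matching this cluster) are assumptions.

Galera sets wsrep_cluster_status to Primary only while the node belongs to a primary component. It sets wsrep_ready to ON only while the node accepts queries.

```python
def source_is_ready(status_rows, expected_size=3):
    """Decide whether a Galera node looks safe as an async replication source.

    status_rows: (Variable_name, Value) pairs from SHOW GLOBAL STATUS LIKE 'wsrep%'.
    expected_size: number of nodes the cluster should have (assumed 3 here).
    Returns (ok, reasons), where reasons lists every failed check.
    """
    status = {name: value for name, value in status_rows}
    reasons = []
    if status.get("wsrep_ready") != "ON":
        reasons.append("wsrep_ready is not ON: node rejects queries")
    if status.get("wsrep_cluster_status") != "Primary":
        reasons.append("node is not part of a primary component")
    size = int(status.get("wsrep_cluster_size", "0"))
    if size < expected_size:
        reasons.append(f"cluster size is {size}, expected {expected_size}")
    return (not reasons, reasons)


# Illustrative values, not captured from the incident: a cluster shrunk to one node.
broken = [
    ("wsrep_cluster_size", "1"),
    ("wsrep_cluster_status", "non-Primary"),
    ("wsrep_ready", "OFF"),
]
healthy = [
    ("wsrep_cluster_size", "3"),
    ("wsrep_cluster_status", "Primary"),
    ("wsrep_ready", "ON"),
]
print(source_is_ready(broken))
print(source_is_ready(healthy))
```

The checks are kept separate so that a partially recovered cluster is easy to diagnose. For example, a single node that has just been bootstrapped would report Primary and ON but still fail the size check.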
