MySQL MGR Troubleshooting Manual¶
Common Commands¶
Retrieve Root Password¶
The root password is stored in the secret ending with "-mgr-secret" in the namespace of the MySQL MGR cluster. For example, to retrieve it for the "kpanda-mgr" cluster:
kubectl get secrets/kpanda-mgr-mgr-secret -n mcamel-system --template={{.data.rootPassword}} | base64 -d
The command outputs the decoded root password.
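If you need the password again in later steps, you can keep it in a shell variable; this is only a convenience sketch, and the ROOT_PASS name is illustrative:
# Capture the decoded root password for reuse in later commands
ROOT_PASS=$(kubectl get secrets/kpanda-mgr-mgr-secret -n mcamel-system --template={{.data.rootPassword}} | base64 -d)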
Check Cluster Status¶
You can check the cluster status using the MySQL command line:
Replace Password with the actual root password retrieved in the previous step.
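The mysqlsh command below is run from inside one of the cluster's pods. A hedged sketch of opening a shell there (the pod name follows the kpanda-mgr-<index> pattern seen in the status output and may differ in your environment):
# Open a shell inside one of the MGR member pods
kubectl exec -it kpanda-mgr-0 -n mcamel-system -- bash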
sh-4.4$ mysqlsh -uroot -pPassword -- cluster status
{
"clusterName": "kpanda_mgr",
"defaultReplicaSet": {
"name": "default",
"primary": "kpanda-mgr-2.kpanda-mgr-instances.mcamel-system.svc.cluster.local:3306",
"ssl": "REQUIRED",
"status": "OK",
"statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
"topology": {
"kpanda-mgr-0.kpanda-mgr-instances.mcamel-system.svc.cluster.local:3306": {
"address": "kpanda-mgr-0.kpanda-mgr-instances.mcamel-system.svc.cluster.local:3306",
"memberRole": "SECONDARY",
"mode": "R/O",
"readReplicas": {},
"replicationLag": "applier_queue_applied",
"role": "HA",
"status": "ONLINE",
"version": "8.0.31"
},
"kpanda-mgr-1.kpanda-mgr-instances.mcamel-system.svc.cluster.local:3306": {
"address": "kpanda-mgr-1.kpanda-mgr-instances.mcamel-system.svc.cluster.local:3306",
"memberRole": "SECONDARY",
"mode": "R/O",
"readReplicas": {},
"replicationLag": "applier_queue_applied",
"role": "HA",
"status": "ONLINE",
"version": "8.0.31"
},
"kpanda-mgr-2.kpanda-mgr-instances.mcamel-system.svc.cluster.local:3306": {
"address": "kpanda-mgr-2.kpanda-mgr-instances.mcamel-system.svc.cluster.local:3306",
"memberRole": "PRIMARY",
"mode": "R/W",
"readReplicas": {},
"replicationLag": "applier_queue_applied",
"role": "HA",
"status": "ONLINE",
"version": "8.0.31"
}
},
"topologyMode": "Single-Primary"
},
"groupInformationSourceMember": "kpanda-mgr-2.kpanda-mgr-instances.mcamel-system.svc.cluster.local:3306"
}
Note
Under normal circumstances in the cluster:
- The status of all nodes should be ONLINE.
- One node should have the memberRole of PRIMARY, while the other nodes should have the memberRole of SECONDARY.
Alternatively, check with an SQL statement against performance_schema.replication_group_members:
mysql> SELECT * FROM performance_schema.replication_group_members\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: 6f464f4e-ba96-11ee-a028-a225dc125542
MEMBER_HOST: kpanda-mgr-2.kpanda-mgr-instances.mcamel-system.svc.cluster.local
MEMBER_PORT: 3306
MEMBER_STATE: ONLINE
MEMBER_ROLE: PRIMARY
MEMBER_VERSION: 8.0.31
MEMBER_COMMUNICATION_STACK: MySQL
*************************** 2. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: b3a53102-bfec-11ee-a821-0a8fb9f7d1ce
MEMBER_HOST: kpanda-mgr-0.kpanda-mgr-instances.mcamel-system.svc.cluster.local
MEMBER_PORT: 3306
MEMBER_STATE: ONLINE
MEMBER_ROLE: SECONDARY
MEMBER_VERSION: 8.0.31
MEMBER_COMMUNICATION_STACK: MySQL
*************************** 3. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: bdddd16f-bfec-11ee-a7a4-324c8edaca40
MEMBER_HOST: kpanda-mgr-1.kpanda-mgr-instances.mcamel-system.svc.cluster.local
MEMBER_PORT: 3306
MEMBER_STATE: ONLINE
MEMBER_ROLE: SECONDARY
MEMBER_VERSION: 8.0.31
MEMBER_COMMUNICATION_STACK: MySQL
3 rows in set (0.00 sec)
View Member Status¶
To view per-member replication statistics, query performance_schema.replication_group_member_stats:
mysql> SELECT * FROM performance_schema.replication_group_member_stats\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
VIEW_ID: 17066729271025607:9
MEMBER_ID: 6f464f4e-ba96-11ee-a028-a225dc125542
COUNT_TRANSACTIONS_IN_QUEUE: 0
COUNT_TRANSACTIONS_CHECKED: 4748638
COUNT_CONFLICTS_DETECTED: 0
COUNT_TRANSACTIONS_ROWS_VALIDATING: 1109
TRANSACTIONS_COMMITTED_ALL_MEMBERS: 6f464f4e-ba96-11ee-a028-a225dc125542:1-10,
95201c7d-ba96-11ee-a018-bed74fb0bf8d:1-8,
b438e224-ba96-11ee-bc57-bed74fb0bf8d:1-12516339,
b439a7d3-ba96-11ee-bc57-bed74fb0bf8d:1-18
LAST_CONFLICT_FREE_TRANSACTION: b438e224-ba96-11ee-bc57-bed74fb0bf8d:12519298
COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE: 0
COUNT_TRANSACTIONS_REMOTE_APPLIED: 6
COUNT_TRANSACTIONS_LOCAL_PROPOSED: 4748638
COUNT_TRANSACTIONS_LOCAL_ROLLBACK: 0
*************************** 2. row ***************************
CHANNEL_NAME: group_replication_applier
VIEW_ID: 17066729271025607:9
MEMBER_ID: b3a53102-bfec-11ee-a821-0a8fb9f7d1ce
COUNT_TRANSACTIONS_IN_QUEUE: 0
COUNT_TRANSACTIONS_CHECKED: 4514132
COUNT_CONFLICTS_DETECTED: 0
COUNT_TRANSACTIONS_ROWS_VALIDATING: 1110
TRANSACTIONS_COMMITTED_ALL_MEMBERS: 6f464f4e-ba96-11ee-a028-a225dc125542:1-10,
95201c7d-ba96-11ee-a018-bed74fb0bf8d:1-8,
b438e224-ba96-11ee-bc57-bed74fb0bf8d:1-12519027,
b439a7d3-ba96-11ee-bc57-bed74fb0bf8d:1-18
LAST_CONFLICT_FREE_TRANSACTION: b438e224-ba96-11ee-bc57-bed74fb0bf8d:12520590
COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE: 0
COUNT_TRANSACTIONS_REMOTE_APPLIED: 4514129
COUNT_TRANSACTIONS_LOCAL_PROPOSED: 0
COUNT_TRANSACTIONS_LOCAL_ROLLBACK: 0
*************************** 3. row ***************************
CHANNEL_NAME: group_replication_applier
VIEW_ID: 17066729271025607:9
MEMBER_ID: bdddd16f-bfec-11ee-a7a4-324c8edaca40
COUNT_TRANSACTIONS_IN_QUEUE: 0
COUNT_TRANSACTIONS_CHECKED: 4658713
COUNT_CONFLICTS_DETECTED: 0
COUNT_TRANSACTIONS_ROWS_VALIDATING: 1093
TRANSACTIONS_COMMITTED_ALL_MEMBERS: 6f464f4e-ba96-11ee-a028-a225dc125542:1-10,
95201c7d-ba96-11ee-a018-bed74fb0bf8d:1-8,
b438e224-ba96-11ee-bc57-bed74fb0bf8d:1-12519027,
b439a7d3-ba96-11ee-bc57-bed74fb0bf8d:1-18
LAST_CONFLICT_FREE_TRANSACTION: b438e224-ba96-11ee-bc57-bed74fb0bf8d:12520335
COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE: 0
COUNT_TRANSACTIONS_REMOTE_APPLIED: 4658715
COUNT_TRANSACTIONS_LOCAL_PROPOSED: 0
COUNT_TRANSACTIONS_LOCAL_ROLLBACK: 0
3 rows in set (0.00 sec)
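When diagnosing replication lag, the queue counters above are usually the first columns to look at; a small illustrative query that projects just those columns from the same table:
mysql> SELECT MEMBER_ID, COUNT_TRANSACTIONS_IN_QUEUE, COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE FROM performance_schema.replication_group_member_stats;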
Assign Member Role¶
To assign a specific node as PRIMARY, use the mysqlsh syntax shown in the sketch below.
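The exact command is not given in the source; a hedged sketch using MySQL Shell's AdminAPI, in the same CLI form as the cluster status command above (the target address is taken from the example cluster and is illustrative):
# Promote kpanda-mgr-0 to PRIMARY (single-primary mode)
sh-4.4$ mysqlsh -uroot -pPassword -- cluster set-primary-instance kpanda-mgr-0.kpanda-mgr-instances.mcamel-system.svc.cluster.local:3306
From an interactive mysqlsh session, the equivalent call is dba.getCluster().setPrimaryInstance("<address>").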
Common Failure Scenarios¶
A SECONDARY node is not in ONLINE status¶
{
"clusterName": "mgr0117",
"defaultReplicaSet": {
"name": "default",
"primary": "mgr0117-2.mgr0117-instances.m0103.svc.cluster.local:3306",
"ssl": "REQUIRED",
"status": "OK_NO_TOLERANCE_PARTIAL",
"statusText": "Cluster is NOT tolerant to any failures. 1 member is not active.",
"topology": {
"mgr0117-0.mgr0117-instances.m0103.svc.cluster.local:3306": {
"address": "mgr0117-0.mgr0117-instances.m0103.svc.cluster.local:3306",
"instanceErrors": [
"NOTE: group_replication is stopped."
],
"memberRole": "SECONDARY",
"memberState": "OFFLINE",
"mode": "R/O",
"readReplicas": {},
"role": "HA",
"status": "(MISSING)",
"version": "8.0.31"
},
"mgr0117-1.mgr0117-instances.m0103.svc.cluster.local:3306": {
"address": "mgr0117-1.mgr0117-instances.m0103.svc.cluster.local:3306",
"memberRole": "SECONDARY",
"mode": "R/O",
"readReplicas": {},
"replicationLag": "applier_queue_applied",
"role": "HA",
"status": "ONLINE",
"version": "8.0.31"
},
"mgr0117-2.mgr0117-instances.m0103.svc.cluster.local:3306": {
"address": "mgr0117-2.mgr0117-instances.m0103.svc.cluster.local:3306",
"memberRole": "PRIMARY",
"mode": "R/W",
"readReplicas": {},
"replicationLag": "applier_queue_applied",
"role": "HA",
"status": "ONLINE",
"version": "8.0.31"
}
},
"topologyMode": "Single-Primary"
},
"groupInformationSourceMember": "mgr0117-2.mgr0117-instances.m0103.svc.cluster.local:3306"
}
Here, the problem member is mgr0117-0.mgr0117-instances.m0103.svc.cluster.local:3306: its status is (MISSING) and the instanceErrors note shows that group_replication is stopped. Enter the mgr0117-0 pod and execute:
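The command itself is not shown in the source; a minimal sketch, assuming the member only needs group replication restarted (consistent with the "group_replication is stopped" note above):
sh-4.4$ mysql -uroot -pPassword
mysql> START GROUP_REPLICATION;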
If the data volume is large, this node will remain in the RECOVERING state for a relatively long time.
No PRIMARY Node¶
All Nodes Show OFFLINE¶
mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | | | NULL | OFFLINE | | | |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
1 row in set (0.00 sec)
At this point, you can attempt to restart the cluster from the MySQL shell:
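The exact command is not included in the source; a hedged sketch using MySQL Shell's standard AdminAPI call for recovering from a complete outage, in the same CLI form used above:
# Reboot the cluster after all members have gone OFFLINE
sh-4.4$ mysqlsh -uroot -pPassword -- dba reboot-cluster-from-complete-outage
The interactive equivalent is dba.rebootClusterFromCompleteOutage().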
If the issue persists, log in to the previous PRIMARY node using the command line and start the group replication on that node:
set global group_replication_bootstrap_group=on;
start group_replication;
set global group_replication_bootstrap_group=off;
Warning
Run the bootstrap sequence above only on the previous PRIMARY node. On each of the other nodes, run only start group_replication; (one node at a time), leaving group_replication_bootstrap_group set to off.