
MySQL运维与架构

第12章 备份与恢复

12.1 备份类型与策略

备份类型对比

| 备份类型 | 说明 | 优点 | 缺点 | 适用场景 |
|---------|------|------|------|---------|
| 逻辑备份 | 导出SQL语句 | 跨平台、可编辑 | 恢复慢、需要解析 | 小数据量、跨版本迁移 |
| 物理备份 | 复制数据文件 | 恢复快、原始数据 | 跨平台性差 | 大数据量、快速恢复 |
| 全量备份 | 备份所有数据 | 恢复简单 | 耗时长、空间大 | 周期性完整备份 |
| 增量备份 | 备份变化数据 | 节省空间和时间 | 恢复复杂 | 频繁备份 |
| 差异备份 | 备份上次全量后的变化 | 恢复比增量简单 | 空间比增量大 | 折中方案 |
| 热备份 | 在线备份(不锁表) | 不影响业务 | 需要特殊工具 | 24/7服务 |
| 温备份 | 在线备份(读锁) | 相对简单 | 影响写操作 | 允许短暂停写 |
| 冷备份 | 离线备份(停库) | 一致性好 | 停机影响业务 | 维护窗口 |

备份策略制定

-- 备份策略示例

-- 1. 全量 + 增量备份策略
-- 周日:全量备份
-- 周一至周六:增量备份
-- 保留周期:4周

-- 2. 全量 + 二进制日志策略
-- 每天凌晨:全量备份
-- 实时:二进制日志备份
-- 保留周期:7天

-- 3. 双备份策略
-- 本地备份:快速恢复
-- 异地备份:灾难恢复

-- 备份周期示例
-- 小型数据库(<10GB):每天全量 + binlog
-- 中型数据库(10-100GB):每周全量 + 每天增量 + binlog
-- 大型数据库(>100GB):每月全量 + 每周增量 + 每天binlog
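
上面的分级策略可以落成一个简单的选择函数。下面是一个示意脚本(阈值沿用上文,函数名与策略名为示例自拟):

```shell
#!/bin/sh
# 根据数据库大小(GB)返回建议的备份策略(阈值与上文分级一致,可按实际调整)
backup_strategy() {
  size_gb=$1
  if [ "$size_gb" -lt 10 ]; then
    echo "daily-full+binlog"                      # 小型库:每天全量 + binlog
  elif [ "$size_gb" -le 100 ]; then
    echo "weekly-full+daily-incr+binlog"          # 中型库:每周全量 + 每天增量
  else
    echo "monthly-full+weekly-incr+daily-binlog"  # 大型库:每月全量 + 每周增量
  fi
}

backup_strategy 5     # 小型库
backup_strategy 50    # 中型库
backup_strategy 500   # 大型库
```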

12.2 逻辑备份工具

mysqldump

# mysqldump基础用法

# 1. 备份单个数据库
mysqldump -u root -p --databases blogdb > blogdb_backup.sql

# 2. 备份多个数据库
mysqldump -u root -p --databases blogdb testdb > multi_db_backup.sql

# 3. 备份所有数据库
mysqldump -u root -p --all-databases > all_databases_backup.sql

# 4. 备份特定表
mysqldump -u root -p blogdb users orders > tables_backup.sql

# 5. 只备份表结构(不备份数据)
mysqldump -u root -p --no-data blogdb > blogdb_schema.sql

# 6. 只备份数据(不备份表结构)
mysqldump -u root -p --no-create-info blogdb > blogdb_data.sql

# 7. 备份存储过程和函数
mysqldump -u root -p --routines --triggers --events blogdb > blogdb_routines.sql

# 8. 压缩备份
mysqldump -u root -p blogdb | gzip > blogdb_backup.sql.gz

# 9. 一致性备份(MyISAM,全局读锁,备份期间阻塞写入)
mysqldump -u root -p --lock-all-tables --all-databases > backup.sql

# 10. InnoDB热备份(推荐)
mysqldump -u root -p --single-transaction --routines --triggers --events \
--all-databases > backup.sql

# 关键参数说明
# --single-transaction: InnoDB热备份,不锁表
# --master-data[=1|2]: 记录binlog位置(1-可执行语句,默认;2-注释形式)
# --flush-logs: 备份前刷新日志
# --quick: 逐行检索,避免内存溢出
# --lock-tables: 锁定要备份的表(MyISAM)
# --lock-all-tables: 锁定所有数据库(全局读锁)
# --databases: 指定数据库
# --all-databases: 备份所有数据库
# --add-drop-database: 添加DROP DATABASE语句
# --add-drop-table: 添加DROP TABLE语句
# --set-gtid-purged=ON: 包含GTID信息

mysqldump高级用法

# 1. 增量备份(配合binlog)
# 全量备份
mysqldump -u root -p --single-transaction --master-data=2 --flush-logs \
--all-databases > full_backup_$(date +%Y%m%d).sql

# 记录binlog位置
grep "CHANGE MASTER TO" full_backup_20240115.sql
-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000003', MASTER_LOG_POS=154;

# 备份binlog(从记录位置开始)
mysqlbinlog --start-position=154 /var/lib/mysql/mysql-bin.000003 > incremental.sql
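
上面grep出的坐标可以脚本化提取,直接拼出增量恢复命令。下面的草稿先构造一段最小的演示备份片段(内容为假设值),再用sed解析:

```shell
#!/bin/sh
# 从 mysqldump --master-data=2 生成的备份文件中提取binlog文件名与位置
# 先构造一个最小的演示片段(坐标为假设值)
cat > /tmp/demo_dump.sql <<'EOF'
-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000003', MASTER_LOG_POS=154;
EOF

BINLOG_FILE=$(sed -n "s/.*MASTER_LOG_FILE='\([^']*\)'.*/\1/p" /tmp/demo_dump.sql)
BINLOG_POS=$(sed -n 's/.*MASTER_LOG_POS=\([0-9]*\).*/\1/p' /tmp/demo_dump.sql)

echo "file=$BINLOG_FILE pos=$BINLOG_POS"
# 随后即可拼出增量恢复命令:
# mysqlbinlog --start-position=$BINLOG_POS /var/lib/mysql/$BINLOG_FILE
```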

# 2. 部分备份(仅部分表)
mysqldump -u root -p --where="created_at >= '2024-01-01'" \
blogdb orders > orders_partial.sql

# 3. 排除某些表
mysqldump -u root -p --ignore-table=blogdb.logs \
--ignore-table=blogdb.temp_data blogdb > backup.sql

# 4. 仅备份视图、存储过程、触发器
mysqldump -u root -p --no-data --no-create-info --routines --triggers --events \
blogdb > blogdb_objects.sql

# 5. 分片备份(大表)
# 备份orders表的第1-100万行
mysqldump -u root -p --where="1 LIMIT 1000000" blogdb orders > orders_part1.sql

# 6. 远程备份
mysqldump -u root -p -h remote_host blogdb | gzip > backup.sql.gz

# 7. 备份到远程服务器
mysqldump -u root -p blogdb | ssh user@remote "cat > /backup/blogdb.sql"

# 8. 加密备份
mysqldump -u root -p blogdb | openssl enc -aes-256-cbc -salt > backup.enc

mysqldump恢复

# 1. 恢复整个数据库
mysql -u root -p blogdb < blogdb_backup.sql

# 2. 恢复压缩备份
gunzip < blogdb_backup.sql.gz | mysql -u root -p blogdb

# 3. 恢复加密备份
openssl enc -aes-256-cbc -d -in backup.enc | mysql -u root -p blogdb

# 4. 从远程恢复
ssh user@remote "cat /backup/blogdb.sql" | mysql -u root -p blogdb

# 5. 恢复到指定时间点(全量 + binlog)
# 恢复全量备份
mysql -u root -p < full_backup_20240115.sql

# 应用binlog
mysqlbinlog --start-datetime="2024-01-15 10:00:00" \
--stop-datetime="2024-01-15 11:00:00" \
mysql-bin.000003 | mysql -u root -p

# 6. 恢复特定位置
mysqlbinlog --start-position=154 --stop-position=500 \
mysql-bin.000003 | mysql -u root -p

# 7. 恢复特定数据库
mysql -u root -p --one-database blogdb < all_databases_backup.sql

# 8. 并行恢复(大文件,需谨慎)
# 按固定行数split可能截断多行语句,且建表语句与数据之间存在先后依赖,
# 直接并行执行并不安全;生产环境建议改用mydumper/myloader做并行恢复。
# 若确需手工并行,应按库/按表拆分成互不依赖的文件再导入,例如:
cat db1_backup.sql | mysql -u root -p &
cat db2_backup.sql | mysql -u root -p &
wait

mysqlpump(MySQL 5.7+)

# mysqlpump:并行备份工具(比mysqldump快)

# 1. 基础用法
mysqlpump -u root -p --databases blogdb > blogdb_backup.sql

# 2. 并行备份(默认2线程)
mysqlpump -u root -p --parallel-threads=4 blogdb > blogdb_backup.sql

# 3. 多数据库并行备份
mysqlpump -u root -p --parallel-schemas=db1:db2:db3 > backup.sql

# 4. 排除数据库
mysqlpump -u root -p --exclude-databases=test,temp > backup.sql

# 5. 只备份特定表
mysqlpump -u root -p blogdb users orders > backup.sql

# 6. 压缩备份
mysqlpump -u root -p blogdb | gzip > blogdb_backup.sql.gz

# 7. 进度显示
mysqlpump -u root -p --watch-progress blogdb > blogdb_backup.sql

# mysqlpump vs mysqldump
# 优势:并行备份、进度显示、更好的压缩
# 劣势:仅MySQL 5.7+支持,且自MySQL 8.0.34起已被标记为废弃

mydumper

# mydumper:多线程备份工具

# 安装
# CentOS/RHEL: yum install mydumper
# Ubuntu: apt-get install mydumper

# 1. 基础备份
mydumper -u root -p -o /backup/

# 2. 备份指定数据库
mydumper -u root -p -d blogdb -o /backup/blogdb/

# 3. 并行备份(4线程)
mydumper -u root -p -t 4 -o /backup/

# 4. 备份特定表
mydumper -u root -p -B blogdb -T users,orders -o /backup/

# 5. 压缩备份
mydumper -u root -p -c -o /backup/

# 6. 记录binlog位置
mydumper -u root -p --trx-consistency-only --binlog -o /backup/

# 7. 备份文件说明
# -schema.sql: 表结构
# -table.sql: 表数据
# metadata: 元数据(包括binlog位置)

# mydumper优势
# 1. 多线程备份
# 2. 支持断点续传
# 3. 文件组织清晰
# 4. 支持守护进程模式

# 恢复:myloader
myloader -u root -p -d /backup/blogdb/ -B blogdb

# 并行恢复
myloader -u root -p -t 4 -d /backup/blogdb/ -B blogdb

12.3 物理备份工具

XtraBackup

# Percona XtraBackup:在线热备份工具(推荐)

# 安装
# CentOS/RHEL
yum install https://repo.percona.com/yum/percona-release-latest.noarch.rpm
percona-release enable-only tools release
yum install percona-xtrabackup-80

# Ubuntu
apt-get install percona-xtrabackup-80

# 1. 全量备份
xtrabackup --backup --target-dir=/backup/full \
--user=root --password=password

# 2. 压缩备份
xtrabackup --backup --compress --target-dir=/backup/full \
--user=root --password=password

# 3. 流式备份(备份到标准输出)
# 备份到tar文件
xtrabackup --backup --stream=xbstream \
--user=root --password=password > /backup/full.xbstream

# 压缩并备份
xtrabackup --backup --stream=xbstream --compress \
--user=root --password=password | gzip > /backup/full.xbstream.gz

# 4. 增量备份(基于全量备份)
# 全量备份
xtrabackup --backup --target-dir=/backup/full \
--user=root --password=password

# 增量备份1
xtrabackup --backup --target-dir=/backup/inc1 \
--incremental-basedir=/backup/full \
--user=root --password=password

# 增量备份2
xtrabackup --backup --target-dir=/backup/inc2 \
--incremental-basedir=/backup/inc1 \
--user=root --password=password

# 5. 并行备份(加速大数据库)
xtrabackup --backup --parallel=4 --target-dir=/backup/full \
--user=root --password=password

# 6. 限流备份(避免影响业务)
xtrabackup --backup --throttle=100 --target-dir=/backup/full \
--user=root --password=password

# 7. 备份特定数据库
xtrabackup --backup --databases="blogdb testdb" \
--target-dir=/backup/partial \
--user=root --password=password

# 8. 从库备份(不停止复制)
xtrabackup --backup --safe-slave-backup --slave-info \
--target-dir=/backup/slave \
--user=root --password=password

# 9. 加密备份
xtrabackup --backup --encrypt=AES256 \
--encrypt-key-file=/backup/key \
--target-dir=/backup/full \
--user=root --password=password

XtraBackup恢复

# 1. 准备备份(应用日志)
xtrabackup --prepare --target-dir=/backup/full

# 2. 恢复全量备份
# 停止MySQL
systemctl stop mysqld

# 备份原数据
mv /var/lib/mysql /var/lib/mysql.bak

# 恢复数据
xtrabackup --copy-back --target-dir=/backup/full

# 修改权限
chown -R mysql:mysql /var/lib/mysql

# 启动MySQL
systemctl start mysqld

# 3. 增量备份恢复
# 准备全量备份
xtrabackup --prepare --apply-log-only --target-dir=/backup/full

# 应用增量1
xtrabackup --prepare --apply-log-only --target-dir=/backup/full \
--incremental-dir=/backup/inc1

# 应用增量2(最后一个增量不加--apply-log-only,
# 应用后整个prepare即完成,无需再单独执行一次prepare)
xtrabackup --prepare --target-dir=/backup/full \
--incremental-dir=/backup/inc2

# 恢复
xtrabackup --copy-back --target-dir=/backup/full
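
增量恢复prepare的关键规则是:基础全量和中间增量都加--apply-log-only,最后一个增量不加。下面的shell草稿按这一规则生成命令序列(仅打印、不执行,目录名为示例):

```shell
#!/bin/sh
# 按XtraBackup增量恢复规则生成prepare命令序列(dry-run,只打印):
# 基础全量与中间增量带 --apply-log-only,最后一个增量不带
gen_prepare_cmds() {
  full=$1; shift
  echo "xtrabackup --prepare --apply-log-only --target-dir=$full"
  while [ $# -gt 1 ]; do
    echo "xtrabackup --prepare --apply-log-only --target-dir=$full --incremental-dir=$1"
    shift
  done
  if [ $# -eq 1 ]; then
    echo "xtrabackup --prepare --target-dir=$full --incremental-dir=$1"
  fi
}

gen_prepare_cmds /backup/full /backup/inc1 /backup/inc2
```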

# 4. 流式备份恢复
# 解压备份
xbstream -x < /backup/full.xbstream -C /backup/full

# 准备并恢复
xtrabackup --prepare --target-dir=/backup/full
xtrabackup --copy-back --target-dir=/backup/full

# 5. 压缩备份恢复
# 先解开gzip流,再解包xbstream,最后解压.qp压缩文件
gunzip -c /backup/full.xbstream.gz | xbstream -x -C /backup/full
xtrabackup --decompress --target-dir=/backup/full

# 准备并恢复
xtrabackup --prepare --target-dir=/backup/full
xtrabackup --copy-back --target-dir=/backup/full

# 6. 时间点恢复(备份 + binlog)
# 恢复备份后,应用binlog
mysqlbinlog --start-datetime="2024-01-15 10:00:00" \
/var/lib/mysql/mysql-bin.000003 | mysql -u root -p

LVM快照备份

# LVM快照:文件系统级快照备份

# 1. 创建LVM快照
# 获取MySQL数据目录逻辑卷
lvdisplay

# 刷新表并加读锁
mysql -u root -p -e "FLUSH TABLES WITH READ LOCK;"

# 记录binlog位置
mysql -u root -p -e "SHOW MASTER STATUS;" > /backup/master_info.txt

# 创建快照(假设LV是 /dev/vg0/mysql)
lvcreate -L 10G -s -n mysql_snapshot /dev/vg0/mysql

# 释放锁
mysql -u root -p -e "UNLOCK TABLES;"

# 2. 挂载快照
mkdir /mnt/snapshot
mount /dev/vg0/mysql_snapshot /mnt/snapshot

# 3. 备份快照数据
rsync -av /mnt/snapshot/ /backup/mysql/

# 或者tar打包
tar -czf /backup/mysql_backup.tar.gz -C /mnt/snapshot .

# 4. 清理
umount /mnt/snapshot
lvremove -f /dev/vg0/mysql_snapshot

# 5. 恢复
systemctl stop mysqld
rm -rf /var/lib/mysql/*
tar -xzf /backup/mysql_backup.tar.gz -C /var/lib/mysql/
systemctl start mysqld

12.4 二进制日志与时间点恢复

binlog基础

-- 查看binlog状态
SHOW VARIABLES LIKE 'log_bin%';
SHOW MASTER STATUS;

-- binlog格式
-- STATEMENT:基于SQL语句(默认,MySQL 5.7.6前)
-- ROW:基于行数据(推荐,更可靠)
-- MIXED:混合模式

-- 设置binlog格式
SET GLOBAL binlog_format = 'ROW';

-- 查看binlog文件
SHOW BINARY LOGS;

-- 查看binlog内容
SHOW BINLOG EVENTS IN 'mysql-bin.000003' LIMIT 10;

-- 刷新binlog(开始新文件)
FLUSH BINARY LOGS;

-- 清理binlog
PURGE BINARY LOGS BEFORE '2024-01-01 00:00:00';
PURGE BINARY LOGS TO 'mysql-bin.000010';
RESET MASTER; -- 删除所有binlog

mysqlbinlog工具

# mysqlbinlog:binlog查看和恢复工具

# 1. 查看binlog内容
mysqlbinlog /var/lib/mysql/mysql-bin.000003

# 2. 查看binlog事件(可读格式)
mysqlbinlog --base64-output=DECODE-ROWS -v \
/var/lib/mysql/mysql-bin.000003

# 3. 时间范围过滤
mysqlbinlog --start-datetime="2024-01-15 10:00:00" \
--stop-datetime="2024-01-15 11:00:00" \
/var/lib/mysql/mysql-bin.000003

# 4. 位置范围过滤
mysqlbinlog --start-position=154 --stop-position=500 \
/var/lib/mysql/mysql-bin.000003

# 5. 数据库过滤
mysqlbinlog --database=blogdb \
/var/lib/mysql/mysql-bin.000003

# 6. 应用到MySQL(恢复)
mysqlbinlog /var/lib/mysql/mysql-bin.000003 | mysql -u root -p

# 7. 从远程服务器恢复
mysqlbinlog --read-from-remote-server --host=remote_host \
--user=root --password mysql-bin.000003 | mysql -u root -p

# 8. 并行恢复(多binlog文件)
mysqlbinlog mysql-bin.000003 mysql-bin.000004 | mysql -u root -p

# 9. ROW格式binlog查看
mysqlbinlog --base64-output=DECODE-ROWS -v \
/var/lib/mysql/mysql-bin.000003

# 10. 提取特定表的变更
mysqlbinlog -d blogdb /var/lib/mysql/mysql-bin.000003 | \
grep -A 10 "UPDATE users"

时间点恢复实战

# 场景:误删除数据恢复

# 1. 全量备份(每天凌晨2点)
mysqldump -u root -p --single-transaction --master-data=2 \
--flush-logs blogdb > /backup/blogdb_full_$(date +%Y%m%d).sql

# 2. binlog实时备份
# 每小时备份到异地
rsync -av /var/lib/mysql/mysql-bin.* backup_server:/backup/binlog/

# 3. 误操作:上午10点误删除了users表
# DROP TABLE users;

# 4. 恢复步骤

# 步骤1:查看全量备份中的binlog位置
grep "CHANGE MASTER TO" /backup/blogdb_full_20240115.sql
-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000003', MASTER_LOG_POS=154;

# 步骤2:恢复全量备份(到昨晚2点)
mysql -u root -p blogdb < /backup/blogdb_full_20240115.sql

# 步骤3:应用binlog(到误操作前)
mysqlbinlog --start-position=154 \
--stop-datetime="2024-01-15 09:59:59" \
/var/lib/mysql/mysql-bin.000003 | mysql -u root -p

# 步骤2-3可合并为一条命令
# (mysql -u root -p blogdb < /backup/blogdb_full.sql; \
# mysqlbinlog --start-position=154 --stop-datetime="2024-01-15 09:59:59" \
# mysql-bin.000003 | mysql -u root -p) > restore.log

# 5. 验证恢复
mysql -u root -p -e "SELECT COUNT(*) FROM blogdb.users;"

# 6. 跳过误操作的binlog事件
# 查找误操作的position
mysqlbinlog --start-datetime="2024-01-15 09:59:00" \
--stop-datetime="2024-01-15 10:01:00" \
/var/lib/mysql/mysql-bin.000003 > /tmp/drop_event.sql

# 编辑文件,删除DROP TABLE语句
vi /tmp/drop_event.sql

# 应用修改后的binlog
mysqlbinlog --start-datetime="2024-01-15 10:00:01" \
/var/lib/mysql/mysql-bin.000003 > /tmp/after_drop.sql
mysql -u root -p blogdb < /tmp/after_drop.sql
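
"编辑文件删除DROP TABLE语句"这一步也可以脚本化。下面的草稿用grep剔除误操作语句(演示数据为假设内容;注意ROW格式binlog需先经 mysqlbinlog -v 解码为可读SQL才能这样按文本过滤):

```shell
#!/bin/sh
# 从提取出的binlog SQL中剔除误操作的DROP TABLE语句,其余语句原样保留
# (演示内容为假设数据)
cat > /tmp/drop_event.sql <<'EOF'
INSERT INTO users VALUES (1, 'alice');
DROP TABLE users;
INSERT INTO orders VALUES (100, 1);
EOF

grep -iv '^DROP TABLE' /tmp/drop_event.sql > /tmp/drop_event_clean.sql
cat /tmp/drop_event_clean.sql
```

过滤后的文件可直接用 mysql 导入,效果等同于手工编辑。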

12.5 备份自动化与验证

自动化备份脚本

#!/bin/bash
# MySQL自动备份脚本
set -o pipefail # 管道中任一命令失败都反映到$?,否则下文只会检查到gzip的退出码

# 配置
BACKUP_DIR="/backup/mysql"
BACKUP_USER="root"
BACKUP_PASS="password"
RETENTION_DAYS=7
MYSQL_DATADIR="/var/lib/mysql"

# 创建备份目录
mkdir -p $BACKUP_DIR/full
mkdir -p $BACKUP_DIR/incremental
mkdir -p $BACKUP_DIR/logs

# 全量备份(每周日凌晨2点)
full_backup() {
local date=$(date +%Y%m%d)
local backup_file="$BACKUP_DIR/full/full_backup_$date.sql.gz"

echo "[$(date)] Starting full backup..." >> $BACKUP_DIR/logs/backup.log

mysqldump -u $BACKUP_USER -p$BACKUP_PASS \
--single-transaction \
--master-data=2 \
--flush-logs \
--routines \
--triggers \
--events \
--all-databases | gzip > $backup_file

if [ $? -eq 0 ]; then
echo "[$(date)] Full backup completed: $backup_file" >> $BACKUP_DIR/logs/backup.log

# 清理旧备份
find $BACKUP_DIR/full -name "full_backup_*.sql.gz" -mtime +$RETENTION_DAYS -delete
else
echo "[$(date)] Full backup failed!" >> $BACKUP_DIR/logs/backup.log
exit 1
fi
}

# 增量备份(每天凌晨2点)
incremental_backup() {
local date=$(date +%Y%m%d)
local backup_file="$BACKUP_DIR/incremental/incremental_$date.xbstream.gz"

echo "[$(date)] Starting incremental backup..." >> $BACKUP_DIR/logs/backup.log

# 注意:不加--incremental-basedir时xtrabackup执行的是全量而非增量备份;
# 此处假设上次备份的checkpoints保存在 $BACKUP_DIR/full/latest(由--extra-lsndir生成)
xtrabackup --backup --compress --stream=xbstream \
--incremental-basedir=$BACKUP_DIR/full/latest \
--user=$BACKUP_USER --password=$BACKUP_PASS \
2>> $BACKUP_DIR/logs/backup.log | gzip > $backup_file

if [ $? -eq 0 ]; then
echo "[$(date)] Incremental backup completed: $backup_file" >> $BACKUP_DIR/logs/backup.log

# 清理旧备份
find $BACKUP_DIR/incremental -name "incremental_*.xbstream.gz" -mtime +$RETENTION_DAYS -delete
else
echo "[$(date)] Incremental backup failed!" >> $BACKUP_DIR/logs/backup.log
exit 1
fi
}

# binlog备份(每小时)
binlog_backup() {
local hour=$(date +%Y%m%d_%H)
local binlog_dir="$BACKUP_DIR/binlog/$hour"

echo "[$(date)] Starting binlog backup..." >> $BACKUP_DIR/logs/backup.log

mkdir -p $binlog_dir
cp $MYSQL_DATADIR/mysql-bin.* $binlog_dir/

if [ $? -eq 0 ]; then
echo "[$(date)] Binlog backup completed: $binlog_dir" >> $BACKUP_DIR/logs/backup.log
else
echo "[$(date)] Binlog backup failed!" >> $BACKUP_DIR/logs/backup.log
exit 1
fi
}

# 备份验证
verify_backup() {
local backup_file=$1

echo "[$(date)] Verifying backup: $backup_file" >> $BACKUP_DIR/logs/backup.log

# 解压并检查文件完整性
gunzip -t $backup_file

if [ $? -eq 0 ]; then
echo "[$(date)] Backup verification passed: $backup_file" >> $BACKUP_DIR/logs/backup.log
return 0
else
echo "[$(date)] Backup verification failed: $backup_file" >> $BACKUP_DIR/logs/backup.log
return 1
fi
}

# 远程同步
sync_to_remote() {
local remote_host="backup_server"
local remote_dir="/remote_backup/mysql"

echo "[$(date)] Syncing to remote server..." >> $BACKUP_DIR/logs/backup.log

rsync -avz --delete $BACKUP_DIR/ $remote_host:$remote_dir/ \
>> $BACKUP_DIR/logs/backup.log 2>&1

if [ $? -eq 0 ]; then
echo "[$(date)] Remote sync completed" >> $BACKUP_DIR/logs/backup.log
else
echo "[$(date)] Remote sync failed!" >> $BACKUP_DIR/logs/backup.log
fi
}

# 发送告警
send_alert() {
local message=$1

# 发送邮件
echo "$message" | mail -s "MySQL Backup Alert" admin@example.com

# 或发送到钉钉/企业微信
# curl -X POST "https://oapi.dingtalk.com/robot/send?access_token=xxx" \
# -H "Content-Type: application/json" \
# -d "{\"msgtype\":\"text\",\"text\":{\"content\":\"$message\"}}"
}

# 主函数
main() {
local day_of_week=$(date +%u) # 1-7 (Monday-Sunday)

# 周日全量备份
if [ $day_of_week -eq 7 ]; then
full_backup
verify_backup "$BACKUP_DIR/full/full_backup_$(date +%Y%m%d).sql.gz"
else
incremental_backup
fi

# binlog备份
binlog_backup

# 远程同步
sync_to_remote

# 清理binlog
mysql -u $BACKUP_USER -p$BACKUP_PASS -e "PURGE BINARY LOGS BEFORE DATE_SUB(NOW(), INTERVAL 3 DAY);"
}

# 执行
main

备份验证与测试

# 1. 定期恢复测试
# 每月恢复测试到测试环境

#!/bin/bash
# 恢复测试脚本

TEST_HOST="test_server"
TEST_DB="blogdb_test"
BACKUP_FILE="$1"

if [ -z "$BACKUP_FILE" ]; then
echo "Usage: $0 <backup_file>"
exit 1
fi

echo "Starting restore test..."

# 恢复到测试环境
ssh $TEST_HOST "mysql -u root -p -e 'DROP DATABASE IF EXISTS $TEST_DB; CREATE DATABASE $TEST_DB;'"

gunzip < $BACKUP_FILE | ssh $TEST_HOST "mysql -u root -p $TEST_DB"

if [ $? -eq 0 ]; then
echo "Restore test passed!"

# 数据一致性检查
ssh $TEST_HOST "mysql -u root -p $TEST_DB -e 'SELECT COUNT(*) FROM users;'"

# 发送报告
echo "Restore test passed: $BACKUP_FILE" | mail -s "Restore Test Success" admin@example.com
else
echo "Restore test failed!"
echo "Restore test failed: $BACKUP_FILE" | mail -s "Restore Test Failed" admin@example.com
exit 1
fi

# 2. 备份完整性检查
# 校验备份文件MD5
find /backup -name "*.sql.gz" -exec md5sum {} \; > /backup/checksums.md5

# 定期验证
md5sum -c /backup/checksums.md5

# 3. 自动化测试(cron)
# 每周日凌晨3点测试(注意:crontab中%是特殊字符,需转义为\%)
# 0 3 * * 0 /scripts/restore_test.sh /backup/full/full_backup_$(date +\%Y\%m\%d).sql.gz
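
gzip结构检查与MD5比对可以合并成一个校验函数。下面是一个可独立运行的示意(演示文件在/tmp/bk下构造,函数名为示例):

```shell
#!/bin/sh
# 备份文件完整性校验:先测试gzip结构,再比对记录的MD5
mkdir -p /tmp/bk && cd /tmp/bk
echo "demo data" | gzip > demo.sql.gz    # 构造演示用"备份"
md5sum demo.sql.gz > checksums.md5       # 备份完成时记录校验和

verify_backup() {
  f=$1
  gzip -t "$f" || return 1                       # 压缩包结构是否损坏
  md5sum -c checksums.md5 >/dev/null || return 1 # 内容是否被改动
  echo "OK: $f"
}

verify_backup demo.sql.gz
```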

第13章 复制与高可用

13.1 主从复制原理

复制原理

主从复制工作流程:

Master                                    Slave
  │
  │ 1. 数据变更(客户端写入,事务提交)
  ▼
Binlog Cache
  │
  │ 2. 写入Binlog
  ▼
Binary Log
  │
  │ 3. Binlog Dump线程读取并发送
  ├─────────────────────────────────────────►│
                                             │ 4. IO线程接收并写入
                                             ▼
                                         Relay Log
                                             │
                                             │ 5. SQL线程读取并执行
                                             ▼
                                          数据文件

复制线程

-- Master线程
-- 1. Binlog dump线程:读取binlog并发送给Slave

-- Slave线程
-- 1. IO线程:请求Master的binlog,写入Relay Log
-- 2. SQL线程:读取Relay Log,执行SQL语句

-- 查看复制状态
-- Slave上执行
SHOW SLAVE STATUS\G

-- 关键字段
-- Slave_IO_State: IO线程状态
-- Slave_IO_Running: IO线程是否运行
-- Slave_SQL_Running: SQL线程是否运行
-- Seconds_Behind_Master: 延迟秒数
-- Master_Log_File: Master binlog文件
-- Read_Master_Log_Pos: 读取位置
-- Relay_Master_Log_File: Relay log对应的Master binlog
-- Exec_Master_Log_Pos: 执行位置
-- Last_IO_Error, Last_SQL_Error: 最后的错误信息

13.2 搭建主从复制

主库配置

# /etc/my.cnf (Master)

[mysqld]
# 服务器ID(唯一)
server-id = 1

# 启用binlog
log-bin = mysql-bin
binlog_format = ROW

# binlog过期时间(7天;MySQL 8.0起建议改用 binlog_expire_logs_seconds = 604800)
expire_logs_days = 7

# binlog大小(100MB)
max_binlog_size = 100M

# 需要复制的数据库(可选)
# binlog-do-db = blogdb

# 不需要复制的数据库(可选)
binlog-ignore-db = mysql
binlog-ignore-db = information_schema
binlog-ignore-db = performance_schema

# GTID模式(推荐,MySQL 5.6.5+)
gtid_mode = ON
enforce_gtid_consistency = ON

# 重启MySQL
systemctl restart mysqld

从库配置

# /etc/my.cnf (Slave)

[mysqld]
# 服务器ID(唯一,不能与Master相同)
server-id = 2

# Relay log配置
relay-log = relay-bin
relay_log_purge = 1 # 自动清理relay log

# 只读模式(推荐)
read_only = 1
super_read_only = 1 # 超级用户也只读

# 复制过滤(可选)
# replicate-do-db = blogdb
# replicate-ignore-db = mysql

# 并行复制(MySQL 5.7+)
slave_parallel_type = LOGICAL_CLOCK
slave_parallel_workers = 4

# GTID模式
gtid_mode = ON
enforce_gtid_consistency = ON

# 重启MySQL
systemctl restart mysqld

创建复制用户

-- Master上执行

-- 创建复制用户
CREATE USER 'repl'@'%' IDENTIFIED BY 'ReplicaPassword123!';

-- 授予复制权限
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';

-- 刷新权限
FLUSH PRIVILEGES;

-- 查看Master状态
SHOW MASTER STATUS\G

-- 输出示例:
-- File: mysql-bin.000003
-- Position: 154
-- Binlog_Do_DB:
-- Binlog_Ignore_DB:

启动复制(传统模式)

-- Slave上执行

-- 1. 配置Master信息
CHANGE MASTER TO
MASTER_HOST='192.168.1.100', -- Master IP
MASTER_USER='repl', -- 复制用户
MASTER_PASSWORD='ReplicaPassword123!',
MASTER_LOG_FILE='mysql-bin.000003', -- binlog文件
MASTER_LOG_POS=154, -- binlog位置
MASTER_CONNECT_RETRY=10; -- 重试间隔(秒)

-- 2. 启动Slave
START SLAVE;

-- 3. 查看Slave状态
SHOW SLAVE STATUS\G

-- 关键检查
-- Slave_IO_Running: Yes
-- Slave_SQL_Running: Yes
-- Seconds_Behind_Master: 0(无延迟)

-- 4. 如果有问题,重置Slave
STOP SLAVE;
RESET SLAVE;
-- 重新配置并启动
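
巡检时通常只关心IO/SQL线程状态与延迟这三个字段。下面的shell草稿演示如何从 SHOW SLAVE STATUS\G 的文本输出中解析并判断健康状况(这里用一段假设的输出代替真实查询,阈值10秒为示例值):

```shell
#!/bin/sh
# 解析SHOW SLAVE STATUS\G的输出,判断复制是否健康
# (演示用假设输出;真实环境用 mysql -e "SHOW SLAVE STATUS\G" 生成)
cat > /tmp/slave_status.txt <<'EOF'
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 3
EOF

io=$(awk -F': ' '/Slave_IO_Running:/ {print $2}' /tmp/slave_status.txt)
sql=$(awk -F': ' '/Slave_SQL_Running:/ {print $2}' /tmp/slave_status.txt)
delay=$(awk -F': ' '/Seconds_Behind_Master:/ {print $2}' /tmp/slave_status.txt)

if [ "$io" = "Yes" ] && [ "$sql" = "Yes" ] && \
   [ "$delay" != "NULL" ] && [ "$delay" -le 10 ]; then
  echo "replication healthy (delay=${delay}s)"
else
  echo "replication problem: io=$io sql=$sql delay=$delay"
fi
```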

启动复制(GTID模式,推荐)

-- GTID(Global Transaction ID):全局事务ID

-- 优势:
-- 1. 简化主从切换
-- 2. 自动定位复制位置
-- 3. 避免重复事务

-- Slave上执行
-- 1. 配置Master信息(GTID模式)
CHANGE MASTER TO
MASTER_HOST='192.168.1.100',
MASTER_USER='repl',
MASTER_PASSWORD='ReplicaPassword123!',
MASTER_AUTO_POSITION = 1; -- 自动使用GTID

-- 2. 启动Slave
START SLAVE;

-- 3. 查看状态
SHOW SLAVE STATUS\G

-- 查看GTID执行情况
-- Retrieved_Gtid_Set: 接收的GTID集合
-- Executed_Gtid_Set: 执行的GTID集合

复制状态监控

-- 1. 查看Slave状态
SHOW SLAVE STATUS\G

-- 常用状态查询
-- (SHOW SLAVE STATUS的字段不能直接SELECT;
-- MySQL 5.7+可改查performance_schema中的复制状态表)
SELECT
c.SERVICE_STATE AS IO_Running,
a.SERVICE_STATE AS SQL_Running,
c.LAST_ERROR_MESSAGE AS IO_Error,
a.LAST_ERROR_MESSAGE AS SQL_Error
FROM performance_schema.replication_connection_status c
JOIN performance_schema.replication_applier_status_by_coordinator a
USING (CHANNEL_NAME);

-- 2. 查看复制线程
SHOW PROCESSLIST;

-- Master上应该看到Binlog dump线程
-- Slave上应该看到IO和SQL线程

-- 3. 查看GTID
SHOW MASTER STATUS; -- Master
SHOW SLAVE STATUS\G -- Slave

-- 4. 跳过复制错误(谨慎使用)
-- 跳过一个事务
STOP SLAVE;
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE;

-- GTID模式跳过错误
STOP SLAVE;
SET GTID_NEXT='<error-gtid>';
BEGIN; COMMIT;
SET GTID_NEXT='AUTOMATIC';
START SLAVE;

13.3 复制模式

异步复制(默认)

-- 异步复制:Master不等待Slave确认

-- 特点:
-- 1. 性能最好
-- 2. Master故障可能丢失数据
-- 3. 延迟取决于网络和Slave性能

-- 配置(默认模式)
-- 无需特殊配置

-- 适用场景:
-- - 对数据一致性要求不高
-- - 可以容忍少量数据丢失

半同步复制

-- 半同步复制:Master等待至少一个Slave确认

-- 安装插件
-- Master和Slave都执行
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';

-- Master配置
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 1000; -- 超时1000毫秒(1秒)后降级为异步

-- Slave配置
SET GLOBAL rpl_semi_sync_slave_enabled = 1;

-- 重启Slave IO线程
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;

-- 查看状态
-- Master
SHOW STATUS LIKE 'Rpl_semi_sync%';
-- Rpl_semi_sync_master_clients: 半同步Slave数量
-- Rpl_semi_sync_master_status: ON(半同步)/ OFF(异步)

-- Slave
SHOW STATUS LIKE 'Rpl_semi_sync%';

-- 持久化配置(my.cnf)
-- Master
[mysqld]
rpl_semi_sync_master_enabled = 1
rpl_semi_sync_master_timeout = 1000

-- Slave
[mysqld]
rpl_semi_sync_slave_enabled = 1

组复制(Group Replication)

-- Group Replication:多主或单主模式,基于Paxos协议

-- 安装插件
INSTALL PLUGIN group_replication SONAME 'group_replication.so';

-- 配置(my.cnf)
[mysqld]
# 服务器ID
server-id = 1

# Group Replication设置
plugin_load_add='group_replication.so'
group_replication_group_name="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
group_replication_start_on_boot=off
group_replication_local_address="192.168.1.100:33061"
group_replication_group_seeds="192.168.1.100:33061,192.168.1.101:33061,192.168.1.102:33061"
group_replication_bootstrap_group=off

# 单主模式
group_replication_single_primary_mode=ON
group_replication_enforce_update_everywhere_checks=OFF

# GTID(组复制必需)
# 注意:gtid_mode不能从OFF直接改为ON,需经OFF_PERMISSIVE/ON_PERMISSIVE
# 逐级切换,或写入my.cnf后重启生效
gtid_mode = ON
enforce_gtid_consistency = ON

-- 启动组复制(第一个节点)
SET GLOBAL group_replication_bootstrap_group=ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group=OFF;

-- 其他节点
SET GLOBAL group_replication_bootstrap_group=OFF;
START GROUP_REPLICATION;

-- 查看组状态
SELECT * FROM performance_schema.replication_group_members;

-- 查看主节点
SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE
FROM performance_schema.replication_group_members
WHERE MEMBER_ROLE = 'PRIMARY';

13.4 复制过滤与拓扑

复制过滤

# Master端过滤(binlog级别)
# /etc/my.cnf

# 只复制指定数据库
binlog-do-db = blogdb
binlog-do-db = testdb

# 不复制指定数据库
binlog-ignore-db = mysql
binlog-ignore-db = information_schema
binlog-ignore-db = performance_schema

# Slave端过滤(更灵活)
# /etc/my.cnf

# 只复制指定数据库
replicate-do-db = blogdb
replicate-do-db = testdb

# 不复制指定数据库
replicate-ignore-db = mysql
replicate-ignore-db = information_schema

# 只复制指定表
replicate-do-table = blogdb.users
replicate-do-table = blogdb.orders

# 忽略指定表
replicate-ignore-table = blogdb.logs
replicate-ignore-table = blogdb.temp_data

# 使用通配符
replicate-wild-do-table = blogdb.%
replicate-wild-ignore-table = blogdb.temp_%

复制拓扑

-- 1. 一主一从(最简单)
-- Master → Slave

-- 2. 一主多从(读写分离)
-- Master ─┬─► Slave1
--         ├─► Slave2
--         └─► Slave3

-- 3. 链式复制(级联)
-- Master → Relay Master → Slave
-- 中间节点需配置:log_slave_updates = 1

-- 4. 双主复制(互为主从)
-- Master1 ↔ Master2
-- 注意:需要设置自增步长避免冲突
-- Server1: auto_increment_increment = 2, auto_increment_offset = 1
-- Server2: auto_increment_increment = 2, auto_increment_offset = 2

-- 5. 多主一从(数据汇总,需MySQL 5.7+的多源复制)
-- Master1 ─┐
-- Master2 ─┼─► Slave
-- Master3 ─┘
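
双主自增错开的效果可以直接验证:每台服务器生成的ID满足 id ≡ offset (mod increment),两条序列互不重叠。下面用shell模拟两台服务器各自产生的前5个ID:

```shell
#!/bin/sh
# 模拟双主自增:auto_increment_increment=2,
# server1 offset=1 生成奇数ID,server2 offset=2 生成偶数ID
gen_ids() {
  offset=$1; increment=$2; count=$3
  i=0; id=$offset; out=""
  while [ $i -lt $count ]; do
    out="$out $id"
    id=$((id + increment))
    i=$((i + 1))
  done
  echo $out
}

echo "server1:$(gen_ids 1 2 5)"   # 1 3 5 7 9
echo "server2:$(gen_ids 2 2 5)"   # 2 4 6 8 10
```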

13.5 主从切换与故障转移

手动主从切换

#!/bin/bash
# 主从切换脚本

MASTER_HOST="192.168.1.100"
SLAVE_HOST="192.168.1.101"
MYSQL_USER="root"
MYSQL_PASS="password"

echo "Starting failover..."

# 1. 检查Slave状态
echo "Checking slave status..."
mysql -h $SLAVE_HOST -u $MYSQL_USER -p$MYSQL_PASS -e "SHOW SLAVE STATUS\G" > /tmp/slave_status.txt

# 检查是否延迟
DELAY=$(grep "Seconds_Behind_Master" /tmp/slave_status.txt | awk '{print $2}')
if [ "$DELAY" == "NULL" ] || [ "$DELAY" -gt 10 ]; then
echo "Error: Slave is lagging or stopped!"
exit 1
fi

# 2. 停止Slave的IO线程(不再接收新的binlog事件)
echo "Stopping slave IO thread..."
mysql -h $SLAVE_HOST -u $MYSQL_USER -p$MYSQL_PASS -e "STOP SLAVE IO_THREAD;"

# 3. 等待SQL线程执行完所有relay log
echo "Waiting for slave to apply all relay logs..."
until mysql -h $SLAVE_HOST -u $MYSQL_USER -p$MYSQL_PASS -e "SHOW SLAVE STATUS\G" | \
    grep -q "Slave has read all relay log"; do
    sleep 1
done
mysql -h $SLAVE_HOST -u $MYSQL_USER -p$MYSQL_PASS -e "STOP SLAVE;"

# 4. 提升Slave为新Master
echo "Promoting slave to new master..."
mysql -h $SLAVE_HOST -u $MYSQL_USER -p$MYSQL_PASS -e "RESET MASTER;"
mysql -h $SLAVE_HOST -u $MYSQL_USER -p$MYSQL_PASS -e "SET GLOBAL read_only = OFF;"
mysql -h $SLAVE_HOST -u $MYSQL_USER -p$MYSQL_PASS -e "SET GLOBAL super_read_only = OFF;"

# 5. 将原Master变为Slave
echo "Demoting old master to slave..."
mysql -h $MASTER_HOST -u $MYSQL_USER -p$MYSQL_PASS -e "STOP SLAVE;"
mysql -h $MASTER_HOST -u $MYSQL_USER -p$MYSQL_PASS -e "RESET SLAVE ALL;"
mysql -h $MASTER_HOST -u $MYSQL_USER -p$MYSQL_PASS -e "SET GLOBAL read_only = ON;"
mysql -h $MASTER_HOST -u $MYSQL_USER -p$MYSQL_PASS -e "SET GLOBAL super_read_only = ON;"

# 配置原Master复制到新Master
mysql -h $MASTER_HOST -u $MYSQL_USER -p$MYSQL_PASS <<EOF
CHANGE MASTER TO
MASTER_HOST='$SLAVE_HOST',
MASTER_USER='repl',
MASTER_PASSWORD='ReplicaPassword123!',
MASTER_AUTO_POSITION = 1;
START SLAVE;
EOF

# 6. 更新应用配置
echo "Updating application configuration..."
# 这里需要根据实际情况更新应用的数据库连接配置

# 7. 验证切换
echo "Verifying failover..."
mysql -h $SLAVE_HOST -u $MYSQL_USER -p$MYSQL_PASS -e "SHOW MASTER STATUS;"
mysql -h $MASTER_HOST -u $MYSQL_USER -p$MYSQL_PASS -e "SHOW SLAVE STATUS\G"

echo "Failover completed successfully!"
echo "New Master: $SLAVE_HOST"
echo "New Slave: $MASTER_HOST"

13.6 高可用方案

MHA(Master High Availability)

# MHA:自动故障检测和转移

# 安装MHA
# 所有节点安装Node
yum install mha4mysql-node

# 管理节点安装Manager
yum install mha4mysql-manager

# 配置SSH免密登录
ssh-copy-id root@node1
ssh-copy-id root@node2
ssh-copy-id root@node3

# MHA配置文件
# /etc/masterha/app1.cnf

[server default]
# 管理用户
user=mha_manager
password=ManagerPass123

# 复制用户
repl_user=repl
repl_password=ReplicaPassword123!

# SSH用户
ssh_user=root

# 工作目录
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/manager.log

# 远程工作目录
remote_workdir=/var/log/masterha/app1

[server1]
hostname=192.168.1.100
candidate_master=1
check_repl_delay=0

[server2]
hostname=192.168.1.101
candidate_master=1
check_repl_delay=0

[server3]
hostname=192.168.1.102
candidate_master=0
# 该节点不作为新Master

# 检查SSH连接
masterha_check_ssh --conf=/etc/masterha/app1.cnf

# 检查复制状态
masterha_check_repl --conf=/etc/masterha/app1.cnf

# 启动MHA Manager
nohup masterha_manager --conf=/etc/masterha/app1.cnf \
--ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &

# 查看MHA状态
masterha_check_status --conf=/etc/masterha/app1.cnf

# 停止MHA Manager
masterha_stop --conf=/etc/masterha/app1.cnf

# 手动故障转移
masterha_master_switch --master_state=dead \
--conf=/etc/masterha/app1.cnf \
--dead_master_host=192.168.1.100 \
--new_master_host=192.168.1.101

Orchestrator

# Orchestrator:MySQL拓扑管理和高可用

# 安装Orchestrator
wget https://github.com/openark/orchestrator/releases/download/v3.2.6/orchestrator-3.2.6-1.x86_64.rpm
rpm -ivh orchestrator-3.2.6-1.x86_64.rpm

# 配置文件
# /usr/local/orchestrator/conf/orchestrator.conf.json

{
"Debug": false,
"ListenAddress": ":3000",
"MySQLTopologyUser": "orchestrator",
"MySQLTopologyPassword": "OrchPass123",
"MySQLTopologyCredentialsConfigFile": "",
"MySQLTopologySSLPrivateKeyFile": "",
"MySQLTopologySSLCertFile": "",
"MySQLTopologySSLCAFile": "",
"MySQLTopologySSLSkipVerify": true,
"MySQLTopologyUseMutualTLS": false,

"BackendDB": "mysql",
"MySQLOrchestratorHost": "127.0.0.1",
"MySQLOrchestratorPort": 3306,
"MySQLOrchestratorDatabase": "orchestrator",
"MySQLOrchestratorUser": "orchestrator",
"MySQLOrchestratorPassword": "OrchPass123",

"MySQLConnectTimeoutSeconds": 1,
"MySQLDiscoveryReadTimeoutSeconds": 3,
"MySQLTopologyReadTimeoutSeconds": 3,

"DiscoverByShowSlaveHosts": true,
"InstancePollSeconds": 5,
"UnseenInstanceForgetHours": 24,
"DiscoveryPollSeconds": 5,

"RecoveryPeriodBlockSeconds": 60,
"RecoveryIgnoreHostnameFilters": [],
"RecoverMasterClusterFilters": ["*"],
"RecoverIntermediateMasterClusterFilters": ["*"],

"OnFailureDetectionProcesses": [
"echo 'Detected {failureType} on {failureCluster}. Affected: {failedHost}' >> /tmp/orchestrator-failure-detection.log"
],

"PreFailoverProcesses": [
"echo 'Will recover from {failureType} on {failureCluster}' >> /tmp/orchestrator-pre-failover.log"
],

"PostFailoverProcesses": [
"echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}; Promoted: {successorHost}' >> /tmp/orchestrator-post-failover.log"
],

"ApplyMySQLPromotionAfterMasterFailover": true,
"MasterFailoverLostInstancesDowntimeMinutes": 10,
"DetachLostSlavesAfterMasterFailover": true
}

# 启动Orchestrator
systemctl start orchestrator
systemctl enable orchestrator

# 访问Web界面
# http://orchestrator-host:3000

# CLI使用
orchestrator-client -c discover -i 192.168.1.100:3306
orchestrator-client -c topologies

# 故障转移
orchestrator-client -c graceful-master-takeover -a 192.168.1.101:3306

ProxySQL(读写分离)

# ProxySQL:数据库代理,实现读写分离

# 安装ProxySQL
yum install https://repo.percona.com/yum/percona-release-latest.noarch.rpm
percona-release enable-only tools release
yum install proxysql

# 或使用RPM
rpm -ivh https://github.com/sysown/proxysql/releases/download/v2.5.3/proxysql-2.5.3-1-centos7.x86_64.rpm

# 启动ProxySQL
systemctl start proxysql
systemctl enable proxysql

# 配置ProxySQL
# 连接到ProxySQL管理接口(默认6032端口)
mysql -u admin -padmin -h 127.0.0.1 -P 6032

-- 配置后端服务器
INSERT INTO mysql_servers(hostgroup_id, hostname, port) VALUES (10, '192.168.1.100', 3306); -- Master(写)
INSERT INTO mysql_servers(hostgroup_id, hostname, port) VALUES (20, '192.168.1.101', 3306); -- Slave1(读)
INSERT INTO mysql_servers(hostgroup_id, hostname, port) VALUES (20, '192.168.1.102', 3306); -- Slave2(读)
INSERT INTO mysql_servers(hostgroup_id, hostname, port) VALUES (20, '192.168.1.103', 3306); -- Slave3(读)

-- 配置监控用户
UPDATE global_variables SET variable_value='monitor' WHERE variable_name='mysql-monitor_username';
UPDATE global_variables SET variable_value='MonitorPass123' WHERE variable_name='mysql-monitor_password';

-- 配置应用用户
INSERT INTO mysql_users(username, password, default_hostgroup, max_connections) VALUES ('app_user', 'AppPass123', 10, 1000);

-- 配置查询规则(读操作路由到Slave)
INSERT INTO mysql_query_rules(rule_id, active, match_pattern, destination_hostgroup, apply) VALUES (1, 1, '^SELECT.*FOR UPDATE', 10, 1); -- SELECT FOR UPDATE到Master
INSERT INTO mysql_query_rules(rule_id, active, match_pattern, destination_hostgroup, apply) VALUES (2, 1, '^SELECT', 20, 1); -- 普通SELECT到Slave

-- 加载配置到运行时
LOAD MYSQL SERVERS TO RUNTIME;
LOAD MYSQL USERS TO RUNTIME;
LOAD MYSQL QUERY RULES TO RUNTIME;

-- 保存配置到磁盘
SAVE MYSQL SERVERS TO DISK;
SAVE MYSQL USERS TO DISK;
SAVE MYSQL QUERY RULES TO DISK;

-- 查看配置
SELECT * FROM mysql_servers;
SELECT * FROM mysql_users;
SELECT * FROM mysql_query_rules;

-- 连接到ProxySQL(MySQL端口6033)
mysql -u app_user -pAppPass123 -h 127.0.0.1 -P 6033 -e "SELECT * FROM blogdb.users;"

-- 查看统计信息
SELECT * FROM stats_mysql_connection_pool;
SELECT * FROM stats_mysql_query_digest;

第14章 监控与调优

14.1 性能监控指标

QPS/TPS

-- QPS(Queries Per Second):每秒查询数
-- TPS(Transactions Per Second):每秒事务数

-- 查询统计
SHOW STATUS LIKE 'Questions';
SHOW STATUS LIKE 'Uptime';

-- 计算QPS
-- QPS = Questions / Uptime

-- 实时QPS(每秒采样)
#!/bin/bash
# 实时QPS采样(建议将账号密码写入~/.my.cnf的[client]段,
# 避免循环中反复交互式输入密码)
while true; do
QUESTIONS1=$(mysql -e "SHOW STATUS LIKE 'Questions'" | awk 'NR==2 {print $2}')
sleep 1
QUESTIONS2=$(mysql -e "SHOW STATUS LIKE 'Questions'" | awk 'NR==2 {print $2}')
QPS=$((QUESTIONS2 - QUESTIONS1))
echo "QPS: $QPS"
done

-- TPS计算
SHOW STATUS LIKE 'Com_commit';
SHOW STATUS LIKE 'Com_rollback';

-- TPS = (Com_commit + Com_rollback) / Uptime

-- 完整监控脚本
mysql -u root -p -e "
SELECT
ROUND(QUESTIONS / UPTIME, 2) AS QPS,
ROUND(TRANS / UPTIME, 2) AS TPS
FROM (
SELECT
VARIABLE_VALUE AS QUESTIONS
FROM PERFORMANCE_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'QUESTIONS'
) q,
(
SELECT
VARIABLE_VALUE AS UPTIME
FROM PERFORMANCE_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'UPTIME'
) u,
(
SELECT
SUM(IF(VARIABLE_NAME IN ('COM_COMMIT', 'COM_ROLLBACK'), VARIABLE_VALUE, 0)) AS TRANS
FROM PERFORMANCE_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME IN ('COM_COMMIT', 'COM_ROLLBACK')
) t;
"

连接数

-- 查看连接数
SHOW STATUS LIKE 'Threads_connected';
SHOW STATUS LIKE 'Threads_running';
SHOW STATUS LIKE 'Max_used_connections';
SHOW VARIABLES LIKE 'max_connections';

-- 连接相关指标
SELECT
VARIABLE_NAME,
VARIABLE_VALUE,
CASE
WHEN VARIABLE_NAME = 'Threads_connected' THEN '当前连接数'
WHEN VARIABLE_NAME = 'Threads_running' THEN '活跃连接数'
WHEN VARIABLE_NAME = 'Max_used_connections' THEN '最大使用连接数'
END AS DESCRIPTION
FROM PERFORMANCE_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME IN ('Threads_connected', 'Threads_running', 'Max_used_connections')
UNION ALL
SELECT
VARIABLE_NAME,
VARIABLE_VALUE,
'最大连接数限制' AS DESCRIPTION
FROM PERFORMANCE_SCHEMA.GLOBAL_VARIABLES
WHERE VARIABLE_NAME = 'max_connections';

-- 连接使用率
SELECT
ROUND(Threads_connected / max_connections * 100, 2) AS connection_usage_percent
FROM (
SELECT
(SELECT VARIABLE_VALUE FROM PERFORMANCE_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'Threads_connected') AS Threads_connected,
(SELECT VARIABLE_VALUE FROM PERFORMANCE_SCHEMA.GLOBAL_VARIABLES WHERE VARIABLE_NAME = 'max_connections') AS max_connections
) t;

-- 查看当前连接详情
SHOW PROCESSLIST;
-- 或
SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST;

-- 查看长时间运行的查询
SELECT
ID,
USER,
HOST,
DB,
COMMAND,
TIME,
STATE,
LEFT(INFO, 100) AS QUERY
FROM INFORMATION_SCHEMA.PROCESSLIST
WHERE TIME > 10
ORDER BY TIME DESC;

-- 杀死长时间运行的查询
-- KILL <process_id>;

缓冲池命中率

-- InnoDB缓冲池命中率
-- 命中率 > 99% 为优

SELECT
ROUND(100 - (Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests * 100), 2) AS hit_rate_percent
FROM (
SELECT
(SELECT VARIABLE_VALUE FROM PERFORMANCE_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'Innodb_buffer_pool_reads') AS Innodb_buffer_pool_reads,
(SELECT VARIABLE_VALUE FROM PERFORMANCE_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'Innodb_buffer_pool_read_requests') AS Innodb_buffer_pool_read_requests
) t;

-- 详细缓冲池统计
SELECT
VARIABLE_NAME,
VARIABLE_VALUE,
CASE VARIABLE_NAME
WHEN 'Innodb_buffer_pool_read_requests' THEN '读请求数'
WHEN 'Innodb_buffer_pool_reads' THEN '物理读次数'
WHEN 'Innodb_buffer_pool_read_ahead' THEN '预读次数'
WHEN 'Innodb_buffer_pool_read_ahead_evicted' THEN '预读并驱逐'
WHEN 'Innodb_buffer_pool_pages_total' THEN '总页数'
WHEN 'Innodb_buffer_pool_pages_free' THEN '空闲页数'
WHEN 'Innodb_buffer_pool_pages_dirty' THEN '脏页数'
WHEN 'Innodb_buffer_pool_pages_flushed' THEN '刷新页数'
END AS DESCRIPTION
FROM PERFORMANCE_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME LIKE 'Innodb_buffer_pool%'
ORDER BY VARIABLE_NAME;

-- 缓冲池状态
SHOW ENGINE INNODB STATUS\G
-- 查看BUFFER POOL AND MEMORY部分

其他关键指标

-- 1. 慢查询
SHOW STATUS LIKE 'Slow_queries';
-- 配置慢查询阈值
SHOW VARIABLES LIKE 'long_query_time';

-- 2. 锁等待
SHOW STATUS LIKE 'Table_locks_waited';
SHOW STATUS LIKE 'Table_locks_immediate';

-- 锁等待率(越低越好)
SELECT
ROUND(Table_locks_waited / (Table_locks_waited + Table_locks_immediate) * 100, 2) AS lock_wait_percent
FROM (
SELECT
(SELECT VARIABLE_VALUE FROM PERFORMANCE_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'Table_locks_waited') AS Table_locks_waited,
(SELECT VARIABLE_VALUE FROM PERFORMANCE_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'Table_locks_immediate') AS Table_locks_immediate
) t;

-- 3. InnoDB行锁
SELECT
VARIABLE_NAME,
VARIABLE_VALUE,
CASE VARIABLE_NAME
WHEN 'Innodb_row_lock_current_waits' THEN '当前等待锁'
WHEN 'Innodb_row_lock_time' THEN '锁等待总时间(ms)'
WHEN 'Innodb_row_lock_waits' THEN '锁等待次数'
WHEN 'Innodb_row_lock_time_avg' THEN '平均锁等待时间(ms)'
END AS DESCRIPTION
FROM PERFORMANCE_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME LIKE 'Innodb_row_lock%';

-- 4. 临时表使用
SHOW STATUS LIKE 'Created_tmp_disk_tables';
SHOW STATUS LIKE 'Created_tmp_tables';

-- 磁盘临时表比率(越低越好)
SELECT
ROUND(Created_tmp_disk_tables / Created_tmp_tables * 100, 2) AS disk_tmp_table_percent
FROM (
SELECT
(SELECT VARIABLE_VALUE FROM PERFORMANCE_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'Created_tmp_disk_tables') AS Created_tmp_disk_tables,
(SELECT VARIABLE_VALUE FROM PERFORMANCE_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'Created_tmp_tables') AS Created_tmp_tables
) t;

-- 5. 网络流量
SHOW STATUS LIKE 'Bytes_received';
SHOW STATUS LIKE 'Bytes_sent';

-- 6. 复制延迟
SHOW SLAVE STATUS\G
-- Seconds_Behind_Master字段(MySQL 8.0.22起推荐SHOW REPLICA STATUS,对应字段为Seconds_Behind_Source)
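SHOW SLAVE STATUS不能作为子查询嵌入SQL,监控脚本通常用awk从`\G`格式输出中提取Seconds_Behind_Master。以下片段演示这一解析逻辑,其中用内嵌示例文本模拟真实输出:

```shell
#!/bin/bash
# 实际场景中应执行:STATUS_OUTPUT=$(mysql -u root -p密码 -e "SHOW SLAVE STATUS\G")
# 这里用示例文本模拟\G输出,演示解析方法
STATUS_OUTPUT="             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 3"

# 匹配目标行后取第二个字段(冒号后的值)
DELAY=$(printf '%s\n' "$STATUS_OUTPUT" | awk '/Seconds_Behind_Master/ {print $2}')
echo "replication delay: ${DELAY}s"
```

注意Seconds_Behind_Master在复制中断时为NULL,告警判断前需要先排除该情况。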

14.2 服务器参数调优

内存相关参数

-- 1. InnoDB缓冲池大小(最重要)
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
-- 建议:物理内存的50-80%
SET GLOBAL innodb_buffer_pool_size = 4294967296; -- 4GB

-- 2. 缓冲池实例
SHOW VARIABLES LIKE 'innodb_buffer_pool_instances';
-- 建议:当缓冲池 > 1GB时,设置为多个实例(每个实例1GB左右)
-- 注意:该参数为只读,需在my.cnf中设置并重启生效,例如:
-- innodb_buffer_pool_instances = 4

-- 3. 查询缓存(MySQL 8.0已移除)
-- MySQL 5.7及之前
SHOW VARIABLES LIKE 'query_cache%';
-- query_cache_type: ON/OFF/DEMAND
-- query_cache_size: 缓存大小
-- 建议:生产环境通常关闭(QPS > 1000时反而降低性能)

-- 4. 排序缓冲区
SHOW VARIABLES LIKE 'sort_buffer_size';
-- 默认:256KB
-- 建议:256KB-2MB(每个连接独立)
SET GLOBAL sort_buffer_size = 2097152; -- 2MB

-- 5. Join缓冲区
SHOW VARIABLES LIKE 'join_buffer_size';
-- 默认:256KB
-- 建议:256KB-1MB
SET GLOBAL join_buffer_size = 1048576; -- 1MB

-- 6. 读缓冲区
SHOW VARIABLES LIKE 'read_buffer_size';
SHOW VARIABLES LIKE 'read_rnd_buffer_size';
-- 建议:128KB-2MB

-- 7. MyISAM键缓冲区
SHOW VARIABLES LIKE 'key_buffer_size';
-- 建议:仅使用MyISAM时设置,建议20-30%的内存
SET GLOBAL key_buffer_size = 268435456; -- 256MB

-- 8. 表定义缓存
SHOW VARIABLES LIKE 'table_definition_cache';
-- 建议:2000+(根据表数量)
SET GLOBAL table_definition_cache = 2000;

-- 9. 表打开缓存
SHOW VARIABLES LIKE 'table_open_cache';
-- 建议:4000-10000
SET GLOBAL table_open_cache = 4000;

-- 10. 线程缓存
SHOW VARIABLES LIKE 'thread_cache_size';
-- 建议:16-100(根据连接数)
SET GLOBAL thread_cache_size = 32;
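缓冲池大小可以按物理内存比例估算。以下shell片段演示按70%计算的方法,其中16GB内存为示例值,线上应从 /proc/meminfo 读取实际内存:

```shell
#!/bin/bash
# 按物理内存的70%估算innodb_buffer_pool_size(示例:16GB内存)
# 线上可改为:TOTAL_MB=$(( $(awk '/MemTotal/ {print $2}' /proc/meminfo) / 1024 ))
TOTAL_MB=16384
BP_MB=$((TOTAL_MB * 70 / 100))
echo "suggested innodb_buffer_pool_size = ${BP_MB}M"
```

估算结果还应结合实例上其他内存消耗(连接缓冲、操作系统缓存等)向下取整到合理值。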

InnoDB参数

-- 1. 刷盘策略
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
-- 0: 每秒刷新到磁盘(可能丢失1秒数据)
-- 1: 每次提交刷新(默认,最安全)
-- 2: 每次提交写到缓存,每秒刷新

-- 2. IO能力
SHOW VARIABLES LIKE 'innodb_io_capacity';
-- 建议:SSD: 2000, SAS: 200, SATA: 100
SET GLOBAL innodb_io_capacity = 2000;
SET GLOBAL innodb_io_capacity_max = 4000;

-- 3. 日志文件大小
SHOW VARIABLES LIKE 'innodb_log_file_size';
-- 默认:48MB
-- 建议:256MB-1GB(大的日志文件减少checkpoint)
-- 注意:只读参数,修改配置文件后重启生效(MySQL 5.6.8起会自动重建日志文件,无需手动删除;8.0.30起由innodb_redo_log_capacity取代,可在线调整)

-- 4. 日志缓冲区
SHOW VARIABLES LIKE 'innodb_log_buffer_size';
-- 默认:16MB
-- 建议:16MB-64MB
SET GLOBAL innodb_log_buffer_size = 33554432; -- 32MB(MySQL 8.0起支持动态修改,5.7及之前为只读)

-- 5. 脏页刷盘
SHOW VARIABLES LIKE 'innodb_max_dirty_pages_pct';
-- 默认:75
-- 建议:50-75(SSD可以设低一些)
SET GLOBAL innodb_max_dirty_pages_pct = 50;

-- 6. 文件格式
SHOW VARIABLES LIKE 'innodb_file_per_table';
-- ON: 每个表独立文件(推荐)
-- OFF: 所有表在共享表空间
SET GLOBAL innodb_file_per_table = ON;

-- 7. 页大小
SHOW VARIABLES LIKE 'innodb_page_size';
-- 默认:16KB(创建实例时确定,不可修改)

-- 8. 自适应哈希索引
SHOW VARIABLES LIKE 'innodb_adaptive_hash_index';
-- 建议:某些场景关闭(OLTP通常ON)

-- 9. 死锁检测
SHOW VARIABLES LIKE 'innodb_deadlock_detect';
-- ON: 启用死锁检测(默认)
-- OFF: 使用锁等待超时
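SET GLOBAL的修改在实例重启后会丢失(MySQL 8.0也可改用SET PERSIST持久化)。以下片段演示将本节调优项写入独立配置片段的做法,其中文件路径与各取值均为示意:

```shell
#!/bin/bash
# 将InnoDB调优参数写入配置片段,防止重启后丢失(路径与取值均为示意)
CNF=/tmp/innodb_tuning.cnf
cat > "$CNF" <<'EOF'
[mysqld]
innodb_flush_log_at_trx_commit = 1
innodb_io_capacity = 2000
innodb_io_capacity_max = 4000
innodb_max_dirty_pages_pct = 50
innodb_file_per_table = ON
EOF
cat "$CNF"
```

实际部署时应将该片段放入MySQL的include目录(如 /etc/my.cnf.d/)并与线上SET GLOBAL值保持一致。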

连接与线程参数

-- 1. 最大连接数
SHOW VARIABLES LIKE 'max_connections';
-- 建议:500-2000(根据应用需求)
SET GLOBAL max_connections = 1000;

-- 2. 连接超时
SHOW VARIABLES LIKE 'connect_timeout';
SHOW VARIABLES LIKE 'wait_timeout';
SHOW VARIABLES LIKE 'interactive_timeout';
-- 建议:10-30秒
SET GLOBAL connect_timeout = 10;
SET GLOBAL wait_timeout = 600; -- 10分钟
SET GLOBAL interactive_timeout = 600;

-- 3. 最大允许数据包
SHOW VARIABLES LIKE 'max_allowed_packet';
-- 默认:4MB
-- 建议:16MB-64MB(大字段查询需要增大)
SET GLOBAL max_allowed_packet = 67108864; -- 64MB

-- 4. 线程栈大小
SHOW VARIABLES LIKE 'thread_stack';
-- 默认:256KB
-- 建议:256KB-512KB

14.3 慢查询优化

慢查询分析

# 使用pt-query-digest分析慢查询

# 1. 分析慢查询日志
pt-query-digest /var/log/mysql/mysql-slow.log

# 2. 输出到文件
pt-query-digest /var/log/mysql/mysql-slow.log > slow_query_report.txt

# 3. 分析最近1小时的慢查询
pt-query-digest --since 1h /var/lib/mysql/mysql-slow.log

# 4. 分析指定时间范围
pt-query-digest --since '2024-01-15 00:00:00' --until '2024-01-15 23:59:59' \
/var/lib/mysql/mysql-slow.log

# 5. 按总耗时排序,显示前20条
pt-query-digest --order-by Query_time:sum --limit 20 /var/lib/mysql/mysql-slow.log

# 6. 过滤特定查询
pt-query-digest --filter '$event->{fingerprint} =~ m/^SELECT/i' \
/var/lib/mysql/mysql-slow.log

# 7. 过滤特定数据库
pt-query-digest --filter '$event->{db} eq "blogdb"' \
/var/lib/mysql/mysql-slow.log

# 8. 输出JSON格式
pt-query-digest --output=json /var/lib/mysql/mysql-slow.log > slow_report.json

慢查询优化案例

-- 案例1:索引缺失
-- 慢查询
SELECT * FROM orders WHERE user_id = 100;
-- Query_time: 5.234567

-- 分析:无索引,全表扫描
EXPLAIN SELECT * FROM orders WHERE user_id = 100;
-- type: ALL(全表扫描)

-- 优化:添加索引
CREATE INDEX idx_user_id ON orders(user_id);

-- 验证
EXPLAIN SELECT * FROM orders WHERE user_id = 100;
-- type: ref(索引查找)

-- 案例2:索引失效
-- 慢查询
SELECT * FROM users WHERE YEAR(created_at) = 2024;
-- Query_time: 8.901234

-- 分析:函数破坏索引
EXPLAIN SELECT * FROM users WHERE YEAR(created_at) = 2024;
-- type: ALL

-- 优化:改写查询
SELECT * FROM users
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';

-- 案例3:JOIN优化
-- 慢查询
SELECT u.*, o.*
FROM users u
LEFT JOIN orders o ON u.username = o.user_id
WHERE u.city = 'Beijing';
-- Query_time: 15.678901

-- 分析:连接字段类型不一致
SHOW CREATE TABLE users\G
-- username: VARCHAR(50)

SHOW CREATE TABLE orders\G
-- user_id: INT

-- 优化:确保类型一致
ALTER TABLE orders MODIFY COLUMN user_id VARCHAR(50);
CREATE INDEX idx_user_id ON orders(user_id);

-- 案例4:子查询优化
-- 慢查询
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders WHERE amount > 1000);
-- Query_time: 10.123456

-- 优化:改写为JOIN
SELECT DISTINCT u.*
FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE o.amount > 1000;

-- 案例5:分页优化
-- 慢查询(深分页)
SELECT * FROM orders ORDER BY id LIMIT 1000000, 10;
-- Query_time: 20.345678

-- 优化:使用子查询
SELECT * FROM orders
WHERE id >= (SELECT id FROM orders ORDER BY id LIMIT 1000000, 1)
LIMIT 10;

-- 或记录上次最大ID(适用于连续浏览)
SELECT * FROM orders WHERE id > 1000000 ORDER BY id LIMIT 10;

14.4 监控工具

Percona Toolkit

# Percona Toolkit:强大的MySQL管理工具集

# 安装
yum install https://repo.percona.com/yum/percona-release-latest.noarch.rpm
percona-release enable-only tools release
yum install percona-toolkit

# 常用工具

# 1. pt-query-digest:慢查询分析
pt-query-digest /var/lib/mysql/mysql-slow.log

# 2. pt-mysql-summary:MySQL概要信息
pt-mysql-summary --host=localhost --user=root --password=password

# 3. pt-mext:并列对比多次SHOW GLOBAL STATUS采样(每10秒一次,共3次)
pt-mext -r -- mysqladmin ext -i10 -c3

# 4. pt-table-checksum:主从数据一致性检查
pt-table-checksum --host=localhost --user=root --password=password

# 5. pt-table-sync:修复主从数据不一致
pt-table-sync --host=slave_host --user=root --password=password \
--sync-to-master --print --execute

# 6. pt-online-schema-change:在线修改表结构(不锁表)
pt-online-schema-change \
--host=localhost \
--user=root \
--password=password \
--alter="ADD COLUMN new_col INT" \
D=blogdb,t=users \
--execute

# 7. pt-stalk:问题诊断与收集
pt-stalk --function=status --variable=Threads_running --threshold=100 \
--cycles=5 --interval=20 --notify-by-email=admin@example.com

# 8. pt-kill:杀死长时间运行的查询
pt-kill --host=localhost --user=root --password=password \
--match-command="Query" \
--match-state="Sending data" \
--victims=all \
--busy-time=60 \
--kill

# 9. pt-index-usage:分析索引使用情况
pt-index-usage /var/lib/mysql/mysql-slow.log

# 10. pt-fk-error-logger:记录外键错误
pt-fk-error-logger --host=localhost --user=root --password=password

Prometheus + Grafana

# Prometheus + Grafana监控方案

# 1. 安装mysqld_exporter(MySQL指标采集器)
wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.15.0/mysqld_exporter-0.15.0.linux-amd64.tar.gz
tar -xzf mysqld_exporter-0.15.0.linux-amd64.tar.gz
mv mysqld_exporter-0.15.0.linux-amd64/mysqld_exporter /usr/local/bin/

# 2. 创建监控用户
mysql -u root -p <<EOF
CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'ExporterPass123';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
FLUSH PRIVILEGES;
EOF

# 3. 创建配置文件
cat > /etc/.mysqld_exporter.cnf <<EOF
[client]
user=exporter
password=ExporterPass123
host=localhost
port=3306
EOF

chmod 600 /etc/.mysqld_exporter.cnf

# 4. 启动exporter
mysqld_exporter --config.my-cnf=/etc/.mysqld_exporter.cnf \
--web.listen-address=:9104

# 或使用systemd管理
cat > /etc/systemd/system/mysqld_exporter.service <<EOF
[Unit]
Description=MySQL Exporter
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/bin/mysqld_exporter \
--config.my-cnf=/etc/.mysqld_exporter.cnf \
--web.listen-address=:9104
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl start mysqld_exporter
systemctl enable mysqld_exporter

# 5. 配置Prometheus
# /etc/prometheus/prometheus.yml

scrape_configs:
  - job_name: 'mysql'
    static_configs:
      - targets: ['localhost:9104']

# 6. 导入Grafana仪表板
# 推荐:MySQL Overview (Prometheus) - ID 7362
# 或:MySQL InnoDB Metrics (Prometheus) - ID 6239

# 7. 关键监控面板
# - QPS/TPS
# - 连接数
# - 缓冲池命中率
# - 慢查询
# - 复制延迟
# - 锁等待

自定义监控脚本

#!/bin/bash
# MySQL监控脚本

MYSQL_USER="root"
MYSQL_PASS="password"
ALERT_EMAIL="admin@example.com"
ALERT_WEBHOOK="https://oapi.dingtalk.com/robot/send?access_token=xxx"

# 检查函数
check_metric() {
local metric_name=$1
local warning=$2
local critical=$3
local query=$4

local value=$(mysql -u $MYSQL_USER -p$MYSQL_PASS -se "$query")

if [ -n "$critical" ] && [ $(echo "$value >= $critical" | bc) -eq 1 ]; then
send_alert "CRITICAL: $metric_name = $value (threshold: $critical)"
elif [ -n "$warning" ] && [ $(echo "$value >= $warning" | bc) -eq 1 ]; then
send_alert "WARNING: $metric_name = $value (threshold: $warning)"
fi
}

# 发送告警
send_alert() {
local message=$1

# 发送邮件
echo "$message" | mail -s "MySQL Alert" $ALERT_EMAIL

# 发送钉钉
curl -X POST "$ALERT_WEBHOOK" \
-H "Content-Type: application/json" \
-d "{\"msgtype\":\"text\",\"text\":{\"content\":\"$message\"}}"
}

# 监控项
# 1. 连接数
check_metric "Connections" 800 900 \
"SELECT COUNT(*) FROM INFORMATION_SCHEMA.PROCESSLIST"

# 2. 慢查询
check_metric "Slow Queries" 100 500 \
"SELECT VARIABLE_VALUE FROM PERFORMANCE_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'Slow_queries'"

# 3. 复制延迟(SHOW SLAVE STATUS无法作为子查询,需从\G输出中解析)
delay=$(mysql -u $MYSQL_USER -p$MYSQL_PASS -e "SHOW SLAVE STATUS\G" | awk '/Seconds_Behind_Master/ {print $2}')
if [ -n "$delay" ] && [ "$delay" != "NULL" ] && [ "$delay" -gt 60 ]; then
send_alert "WARNING: Replication delay = ${delay}s (threshold: 60)"
fi

# 4. 缓冲池命中率
hit_rate=$(mysql -u $MYSQL_USER -p$MYSQL_PASS -se "
SELECT ROUND(100 - (
(SELECT VARIABLE_VALUE FROM PERFORMANCE_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'Innodb_buffer_pool_reads') /
(SELECT VARIABLE_VALUE FROM PERFORMANCE_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'Innodb_buffer_pool_read_requests') * 100
), 2)")

if [ $(echo "$hit_rate < 95" | bc) -eq 1 ]; then
send_alert "WARNING: Buffer pool hit rate = $hit_rate%"
fi

# 5. InnoDB死锁
# 注意:Innodb_deadlocks是MariaDB/Percona Server的状态变量,社区版MySQL需解析SHOW ENGINE INNODB STATUS
deadlocks=$(mysql -u $MYSQL_USER -p$MYSQL_PASS -se "
SELECT VARIABLE_VALUE FROM PERFORMANCE_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'Innodb_deadlocks'")

if [ -n "$deadlocks" ] && [ "$deadlocks" -gt 0 ]; then
send_alert "WARNING: InnoDB deadlocks detected: $deadlocks"
fi

# 6. 磁盘空间
disk_usage=$(df -h /var/lib/mysql | awk 'NR==2 {print $5}' | sed 's/%//')
if [ "$disk_usage" -gt 80 ]; then
send_alert "WARNING: MySQL data disk usage = $disk_usage%"
fi
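这类监控脚本需要配合定时任务周期执行。以下片段演示生成每5分钟执行一次的crontab条目,其中脚本与日志路径均为假设:

```shell
#!/bin/bash
# 生成crontab条目:每5分钟执行一次监控脚本(路径为示意)
CRON_LINE='*/5 * * * * /usr/local/bin/mysql_monitor.sh >> /var/log/mysql_monitor.log 2>&1'
echo "$CRON_LINE"
# 实际安装时追加到当前用户的crontab:
# (crontab -l 2>/dev/null; echo "$CRON_LINE") | crontab -
```

执行间隔应与各指标的采样语义匹配,例如基于累计计数器的指标(Slow_queries等)需要在脚本中记录上次采样值做差值。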

小结

本部分深入讲解了MySQL运维与架构的三个重要方面:

第12章 - 备份与恢复重点:

  • 理解逻辑备份和物理备份的区别和适用场景
  • 熟练使用mysqldump、mysqlpump、mydumper、XtraBackup等备份工具
  • 掌握二进制日志和时间点恢复
  • 了解备份策略制定和自动化实现
  • 能够进行备份验证和恢复测试

第13章 - 复制与高可用重点:

  • 理解主从复制的工作原理和线程机制
  • 掌握传统复制和GTID复制的配置方法
  • 了解异步、半同步、组复制三种复制模式
  • 掌握主从切换和故障转移的操作流程
  • 了解MHA、Orchestrator、ProxySQL等高可用方案

第14章 - 监控与调优重点:

  • 掌握QPS、TPS、连接数、缓冲池命中率等关键监控指标
  • 熟练进行服务器参数调优(内存、InnoDB、连接相关)
  • 掌握慢查询分析和优化方法
  • 了解Percona Toolkit、Prometheus+Grafana等监控工具
  • 能够编写自定义监控脚本

核心要点:

  • 备份是数据库安全的最后一道防线,必须定期测试恢复
  • 主从复制是高可用架构的基础,需要掌握故障转移技能
  • 监控是性能优化的前提,要建立完善的监控体系
  • 参数调优需要根据实际情况,不能盲目套用
  • 工具是提高效率的手段,但要理解底层原理

学习建议:

  1. 在测试环境完整演练备份恢复流程
  2. 搭建主从环境并手动切换验证
  3. 使用pt-query-digest分析实际慢查询日志
  4. 部署Prometheus+Grafana监控体系
  5. 定期回顾和更新运维文档

下章预告: 至此,MySQL系列文档已经完整覆盖从基础到高级、从开发到运维的全部内容。建议结合实际项目进行综合实践,构建完整的MySQL知识体系。