一. 参考

 

二. 前言
pt-table-checksum是一个在线验证主从数据一致性的工具,主要用于以下场景:
1. 数据迁移前后,进行数据一致性检查
2. 当主从复制出现问题,待修复完成后,对主从数据进行一致性检查
3. 把从库当成主库,进行数据更新,产生了"脏数据"
4. 定期校验
三. 环境
master
|----slave1
|----slave2
 
四. 临时校验(端口3306)
  • 准备工作:
  1. 创建用户checksums,192.168.250.249为当前写库
    1. GRANT SELECT, PROCESS, SUPER, REPLICATION SLAVE ON *.* TO 'checksums'@'192.168.250.249' IDENTIFIED BY 'xxxx';
    select 查看所有库的表,原理可加 explain选项查看
    process show processlist
    super set binlog_format='statement'
    replication slave show slave hosts

    连接范围: 允许master连接

  2. 参数解释
    DSN h='192.168.250.249',u='checksums',p='xxxx',P=3306

    -d

    指定校验的库,多个库用逗号(,)分隔
    --nocheck-replication-filters

    忽略replication-do-db规则

    --replicate=test.checksums

    在test库下创建checksums表,并将数据写入

    --recursion-method=none

    指定复制检查的方式,默认情况下使用SHOW PROCESSLIST,如果非标准的3306端口,就使用SHOW SLAVE HOSTS的方式,推荐使用dsn方式,手动指定
    METHOD USES =========== ================== processlist SHOW PROCESSLIST hosts SHOW SLAVE HOSTS dsn=DSN DSNs from a table none Do not find slaves DSN格式:dsn=h=host,D=pt,t=dsns
  • 使用举例:
  1. 登录主库,增加授权
    1. GRANT SELECT, PROCESS, SUPER, REPLICATION SLAVE ON *.* TO 'checksums'@'192.168.250.249' IDENTIFIED BY 'xxxx';
     
  2. 登录主库,执行pt-table-checksums脚本
  3. 只检查mysql库
    1. pt-table-checksum h='192.168.250.249',u='checksums',p='xxxx',P=3306 -d mysql --nocheck-replication-filters --replicate=test.checksums
     
  4. 只检查mysql库,但是过滤mysql.user表
    1. pt-table-checksum h='192.168.250.249',u='checksums',p='xxxx',P=3306 -d mysql --ignore-tables=mysql.user --nocheck-replication-filters --replicate=test.checksums
     
  5. 手动在从库上查询的sql
    --chunk-size 指定块的大小,默认是1000条.
    1. SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM test.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
 
五. 利用脚本做定期任务.
  • 准备工作
  1. 建库
    1. CREATE DATABASE pt CHARACTER SET utf8;
     
  2. 授权
    1. GRANT SELECT, PROCESS, SUPER, REPLICATION SLAVE ON *.* TO 'checksums'@'192.168.250.249' IDENTIFIED BY 'xxxx'; GRANT ALL ON pt.* TO 'checksums'@'192.168.250.249' IDENTIFIED BY 'xxxx';
     
  3. 在pt库中建表,并插入从库的信息.
    1. CREATE TABLE `dsns` ( `id` int(11) NOT NULL AUTO_INCREMENT, `parent_id` int(11) DEFAULT NULL, `dsn` varchar(255) NOT NULL, PRIMARY KEY (`id`) );
       
    2.  
    3. -- 写入从库信息
       
    4. INSERT INTO dsns (parent_id,dsn) values(1, "h=replica_host,u=checksums,p=password,P=3306");
       
    5. -- 如果有多个从库,就插入多条记录.
       
    6. -- 也可以按如下简写
       
    7. INSERT INTO dsns (parent_id,dsn) values(1, "h=replica_host");
     
  4. 全库校验的命令
    1. ./pt-table-checksum h='192.168.250.249',u='checksums',p='xxxx',P=3306 --nocheck-replication-filters --replicate=pt.checksums --recursion-method=dsn=h=host,D=pt,t=dsns
  • 脚本
    1. #!/bin/bash
    2. # ########################################################################
    3. # This program is percona-toolkit shell
    4. # Version: 1.0 (20120926)
    5. # Authors: lianjie.ning@qunar.com
    6. # History:
    7. # ########################################################################
    8. set -e
    9. set -u
    10. if [ $# -eq 0 ]; then
    11. echo "Usage:
    12. $0 -h, help
    13. $0 -P port -c [pt-table-checksum]"
    14. exit
    15. fi
    16. trap "send_mail" EXIT
    17. # send mail
    18. function send_mail()
    19. {
    20. ESUBJECT="$HOSTNAME $0"
    21. EMAIL='xxx@xxx.com'
    22. if [ -r $LOG ]; then
    23. cat $LOG |mail -s "$ESUBJECT" $EMAIL
    24. else
    25. echo "The log($LOG) is not exist !" |mail -s "$ESUBJECT" $EMAIL
    26. fi
    27. rm "$LOG"
    28. }
    29. # default variables
    30. TIME=`date +%Y%m%d%H%M%S`
    31. LOG="/tmp/$HOSTNAME.checksum.log"
    32. #CHECKSUM="/usr/local/bin/pt-table-checksum"
    33. CHECKSUM="/home/lianjie.ning/pt-table-checksum"
    34. exec 1>>$LOG 2>$LOG
    35. function checksum()
    36. {
    37. master_host="192.168.250.249"
    38. user="checksums"
    39. password="xxxx"
    40. : ${port:="3306"}
    41. db_table="pt.checksums"
    42. echo "MySQL Port: $port"
    43. echo "Verify MySQL replication integrity"
    44. $CHECKSUM h="$master_host",u="$user",p="$password",P="$port" --nocheck-replication-filters --replicate=$db_table --recursion-method=dsn=D=pt,t=dsns
    45. }
    46. # main
    47. while getopts :P:c arg
    48. do
    49. case $arg in
    50. P)
    51. port="$OPTARG"
    52. ;;
    53. c)
    54. checksum
    55. ;;
    56. :)
    57. echo "$arg: 错误参数"
    58. ;;
    59. \?)
    60. echo "$arg: 非法选项"
    61. ;;
    62. esac
    63. done
    六. 问题处理:
    1. 在主库上执行
      pt-table-checksum h='192.168.250.249',u='checksums',p='xxxx',P=3306 -d mysql --nocheck-replication-filters --replicate=test.checksums
      需要注意test.checksums是否存在,如果存在,则手工删除,否则会报如下错误
      06-07T16:54:23 User does not have all privileges on --replicate table `test`.`checksums`.
    2. binlog_format问题(脚本bug)
      06-07T19:06:07 Error checksumming table mysql.columns_priv: Error executing checksum query: Checksum query for table mysql.columns_priv caused MySQL error 1592:
      Level: Note
      Code: 1592
      Message: Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. REPLACE... SELECT is unsafe because the order in which rows are retrieved by the SELECT determines which (if any) rows are replaced. This order cannot be predicted and may differ on master and the slave.
      Query: REPLACE INTO `test`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT ?, ?, ?, ?, ?, ?, COUNT() AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `host`, `db`, `user`, `table_name`, `column_name`, `timestamp` + 0, `column_priv`)) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `mysql`.`columns_priv` /*checksum table/
      该问题与复制过滤器replicate-do-db 和 pt-table-checksums,innodb_strict_mode = 1均没有关系
      脚本的bug,在mysql 5.5.18版本之后,都会出现这个问题
      下个新版本,已经修复
    3. 索引问题
      06-07T21:19:31 Cannot checksum table db.table: There is no good index and the table is oversized. at ./pt-table-checksum line 3899.
      因为该工具是根据主键索引或者唯一索引来分块进行校验的,默认是1000行为一块.

文章原创链接: