Preface       Replication delay is a common issue in MySQL replications.Especially in those replications not using semi-synchronous method.Long term delay leads to inconsistency between master and slaves.Therefore,we should regard the replication delay as an important monitor item.To a certain extent,the value of "Seconds_Behind_Master" shows the situation of replication delay,but it's always inaccurate.Even if the value is "0" sometimes,the replication delay remains.   Introduce       pt-heartbeat is the very tool can be used to monitor replication delay of MySQL or PostgreSQL.It provides a way to update a master or to monitor a replica.If you do not speicify the connection options,it will retrieve them in your default my.cnf file.
  Procedure   Usage
1 pt-heartbeat [OPTIONS] [DSN] --update|--monitor|--check|--stop

 

Common parameters
 1 --check //Specify the check mode to slave delay.It only checks once then exits.
 2 --monitor //Specify the monitor mode to slave delay.
 3 --update //Specify the update mode on master.
 4 --check-read-only //It skips insert operations if the server is read-only status.
 5 --create-table //Create a heartbeat table.
 6 --table //Specify the heartbeat table(default "heartbeat").
 7 --daemonize //Run it as background process.
 8 --dbi-driver //Specify the drive of database(default "mysql",other value is "pg").
 9 --file //Specify a output file when using --monitor.
10 --log //Print all output to the file when using --daemonize
11 --master-server-id //Calculate delay from this master server id.
12 --interval //Specify the frequency to update or check the heartbeat table(default "1.0").
13 --run-time //Specify the period of time to run(default suffix is "s",if you set 60,it means 60s).
14 
15 Notice:
16 Specify at least one of --stop, --update, --monitor, or --check.
17 --update, --monitor, and --check are mutually exclusive.
18 --daemonize and --check are mutually exclusive.

 

SRE实战 互联网时代守护先锋,助力企业售后服务体系运筹帷幄!一键直达领取阿里云限量特价优惠。 Examples   Execute pt-heartbeat to update the heartbeat of master.
 1 [root@zlm2 03:21:21 ~]
 2 #pt-heartbeat --database=sysbench --create-table --table=hb --daemonize -h192.168.1.101 -P3306 -uzlm -pzlmzlm --update
 3 
 4 [root@zlm2 03:21:34 ~]
 5 #
 6 
 7 (zlm@192.168.1.101 3306)[sysbench]>show tables like 'hb';
 8 +-------------------------+
 9 | Tables_in_sysbench (hb) |
10 +-------------------------+
11 | hb                      |
12 +-------------------------+
13 1 row in set (0.00 sec)
14 
15 (zlm@192.168.1.101 3306)[sysbench]>select * from hb;
16 +----------------------------+-----------+------------------+----------+-----------------------+---------------------+
17 | ts                         | server_id | file             | position | relay_master_log_file | exec_master_log_pos |
18 +----------------------------+-----------+------------------+----------+-----------------------+---------------------+
19 | 2018-07-19T03:34:07.005490 |   1013306 | mysql-bin.000015 |   975231 | NULL                  |                NULL |
20 +----------------------------+-----------+------------------+----------+-----------------------+---------------------+
21 1 row in set (0.00 sec)
22 
23 //The operation above will create a heartbeat table "hb" in database "sysbench" if the target table not exist.
24 //It will update the table which contains the heartbeat info of master every time you execute with "--update".

 

Execute pt-heartbeat to check delay of slave.
 1 [root@zlm2 03:40:19 ~]
 2 #pt-heartbeat --database=sysbench -h192.168.1.102 -P3306 -uzlm -pzlmzlm --check
 3 DBD::mysql::db selectrow_arrayref failed: Table 'sysbench.heartbeat' doesn't exist [for Statement "SHOW CREATE TABLE `sysbench`.`heartbeat`"] at /usr/bin/pt-heartbeat line 5873.
 4 
 5 [root@zlm2 03:40:56 ~]
 6 #pt-heartbeat --database=sysbench --create-table --table=hb -h192.168.1.102 -P3306 -uzlm -pzlmzlm --check
 7 0.00
 8 
 9 (zlm@192.168.1.102 3306)[sysbench]>show tables like 'hb';
10 +-------------------------+
11 | Tables_in_sysbench (hb) |
12 +-------------------------+
13 | hb                      |
14 +-------------------------+
15 1 row in set (0.00 sec)
16 
17 (zlm@192.168.1.102 3306)[sysbench]>select * from hb;
18 +----------------------------+-----------+------------------+----------+-----------------------+---------------------+
19 | ts                         | server_id | file             | position | relay_master_log_file | exec_master_log_pos |
20 +----------------------------+-----------+------------------+----------+-----------------------+---------------------+
21 | 2018-07-19T03:43:53.005890 |   1013306 | mysql-bin.000015 |  2258210 | NULL                  |                NULL |
22 | 2018-07-19 03:42:02        |   1023306 | NULL             |     NULL | NULL                  |                NULL |
23 +----------------------------+-----------+------------------+----------+-----------------------+---------------------+
24 2 rows in set (0.00 sec)
25 
26 //The connection option refers to slave.
27 //It will also create the same heartbeat table on the slave.
28 //The output will show the delay time.

 

Execute pt-heartbeat to monitor delay of slave.
 1 [root@zlm2 03:52:42 ~]
 2 #pt-heartbeat --database=sysbench --create-table --table=hb --daemonize --log=/root/monitor.log -h192.168.1.102 -P3306 -uzlm -pzlmzlm --monitor
 3 
 4 [root@zlm2 03:54:02 ~]
 5 #ls -l|grep monitor
 6 -rw-r--r--   1 root root    2448 Jul 19 03:54 monitor.log
 7 
 8 [root@zlm2 03:54:24 ~]
 9 #cat monitor.log
10 *******************************************************************
11  Using the default of SSL_verify_mode of SSL_VERIFY_NONE for client
12  is deprecated! Please set SSL_verify_mode to SSL_VERIFY_PEER
13  possibly with SSL_ca_file|SSL_ca_path for verification.
14  If you really don't want to verify the certificate and keep the
15  connection open to Man-In-The-Middle attacks please set
16  SSL_verify_mode explicitly to SSL_VERIFY_NONE in your application.
17 *******************************************************************
18   at /usr/bin/pt-heartbeat line 4438.
19 *******************************************************************
20  Using the default of SSL_verify_mode of SSL_VERIFY_NONE for client
21  is deprecated! Please set SSL_verify_mode to SSL_VERIFY_PEER
22  possibly with SSL_ca_file|SSL_ca_path for verification.
23  If you really don't want to verify the certificate and keep the
24  connection open to Man-In-The-Middle attacks please set
25  SSL_verify_mode explicitly to SSL_VERIFY_NONE in your application.
26 *******************************************************************
27   at /usr/bin/pt-heartbeat line 4438.
28 
29 # A software update is available:
30 #   * The current version for Percona::Toolkit is 3.0.5
31 
32 0.00s [  0.00s,  0.00s,  0.00s ]
33 0.00s [  0.00s,  0.00s,  0.00s ]
34 0.00s [  0.00s,  0.00s,  0.00s ]
35 0.00s [  0.00s,  0.00s,  0.00s ]
36 0.00s [  0.00s,  0.00s,  0.00s ]
37 0.00s [  0.00s,  0.00s,  0.00s ]
38 0.00s [  0.00s,  0.00s,  0.00s ]
39 0.00s [  0.00s,  0.00s,  0.00s ]
40 0.00s [  0.00s,  0.00s,  0.00s ]
41 
42 [root@zlm2 03:54:32 ~]
43 #
44 
45 //The output will be write into the logfile sepcified by option "--log".
46 //The logfile will be continuously written unless you've specified th option "--run-time".
47 //It seems no delay between master and slave at all.We can use sysbench generate some transactions to observe the difference.

 

Execute sysbench and watch again.
  1 [root@zlm2 04:16:24 ~/sysbench-1.0/src/lua]
  2 #sysbench oltp_read_write.lua --mysql-host=192.168.1.101 --mysql-port=3306 --mysql-user=zlm --mysql-password=zlmzlm --mysql-db=sysbench --tables=1 --table-size=10000000 --mysql-storage-engine=innodb prepare
  3 sysbench 1.0.15 (using bundled LuaJIT 2.1.0-beta2)
  4 
  5 Creating table 'sbtest1'...
  6 Inserting 10000000 records into 'sbtest1'
  7 FATAL: mysql_drv_query() returned error 1114 (The table 'sbtest1' is full) for query 'INSERT INTO sbtest1(k, c, pad) VALUES(5007652, '86766538515-66952496290-52637383744-17777505783-29917382722-84335317264-36504610739-38121689798-84219011968-91366385874', '49260963569-36115667021-10552248239-18067079911-61369170530'),(4980838, '54654185213-72432916652-68658723905-01905687478-16685611449-05853232015-51457796697-51819403372-03987006557-33172822697', '21128589837-49224626065-81618031353-10869323172-08902635377'),(4369690, '19346998374-76157809978-61295763131-66817961402-14155793123-89485702729-52643919933-37547938307-65477074083-42947658759', '48407396301-60012651284-13188093294-65156227928-43584415056'),(4980019, '70453863386-09471491416-32761370166-00808275356-79347375847-13151837625-86471458250-39802434455-19087793196-21946389164', '33264881266-44980565111-26757160090-86077734704-00045098929'),(4997783, '58799482920-02512874911-59302214059-00300283635-35103600246-26397800977-48778245550-17735174970-94350595573-16543920606', '23090791422-15031674988-38742619953-96110910554-14814557148'),(4996542, '02526238751-03201222067-54203245066-96888627735-91568973200-70159837175-05408478565-91790491503-65208127071-01416690185', '49867218536-45143413529-09551369789-59447916184-43851467884'),(5016589, '74498181715-26316533762-09669873675-78321331809-32282301673-46976991777-42037713521-13856469837-75059942943-62359417982', '17496574432-77585238967-84065009097-69416190626-30209633909'),(4994933, '71859258017-48270005429-16959639772-39718531870-78995154112-71510567312-49374130735-00512159149-93514688119-69243734179', '23940826906-68127513411-04170272492-93173723876-07024484156'),(5024203, '01612812086-88501262208-29821147265-27833323182-26465297118-67798980074-90884201988-32250326631-52571482980-27120623387', '32676071422-40807168337-23971263649-30743049902-70369274393'),(4162450, '82998769803-54845523696-25503288553-48015207428-64262352808-97661531385-74384257021-19880800516-86393457605-98507752399', '02796854361-18464776198-06166096636-53988338130-49624903182'),(4995444, '66477706887-39253236907-03545461775-18565950790-70596896493-96343527077-27384490842-67789533300-13229220655-04716282051', '93177901449-78610249192-43379190612-81683429736-42624094282'),(5014023, '00026114498-69425370048-46519646170-15138611515-26550011239-94540841776-52595856582-34808319037-14376366590-67052294983', '56614501773-95380362876-81940933213-48069389750-96896566709'),(5024261, '49006870590-81308017607-77748035537-80561018761-10903078210-73474291600-34790301926-33257007983-62587725038-62428934340', '19026821007-38039245675-09902375187-72268734965-98944641058'),(5024236, '10110443501-28024080051-93513912505-09437560828-34230239659-09523768623-23688243983-59021499845-46898378159-07771020838', '24997608907-31970368651-18989687877-75418790951-46453076386'),(5049980, '19429198356-67366238484-55380613354-35977328479-83088969020-14656225800-25216048033-45250663251-98396746788-21125067075', '71090143904-45428036794-81542588223-10154666659-75546296798'),(5471904, '75703864024-12238744221-17802200544-33445694372-23795426538-35115713085-36013837723-42595287843-50130126814-13689292194', '33729585352-83996219254-04756782645-33242342933-26629849933'),(5019788, '26736440482-52751355069-31401045555-46875459054-74471530241-80094262346-04701622365-49412060729-73926195317-10568100029', '94218352951-20508982927-74070061682-74623136939-21804519403'),(5021303, '68369236188-31813582854-91111181432-49899358000-41024394879-31154794562-68384652868-38150046317-87574937898-60778176522', '57835818598-56026673181-41314509499-62765615632-08938407644'),(5015123, '67124253129-45231369455-80389246483-30556743024-23531555287-47922585120-65610495252-46796320051-45675319664-05751522029', '17364348778-61672427639-08537678844-68212837883-00352572907'),(3676715, '05788144973-64579114586-22354309091-20803641999-44828403219-79251514115-97961584283-16337702597-51136491029-67923490682', '13053760861-60814974240-44320882636-73693584211-05836315795'),(5236763, '68612629880-77489830256-75974497057-76984695030-FATAL: `sysbench.cmdline.call_command' function failed: ./oltp_common.lua:230: db_bulk_insert_next() failed
  8 
  9 //Master
 10 [root@zlm2 04:33:54 ~/sysbench-1.0/src/lua]
 11 #df -h
 12 Filesystem               Size  Used Avail Use% Mounted on
 13 /dev/mapper/centos-root  8.4G  8.4G   68M 100% / //The disk space of master is full.
 14 devtmpfs                 488M     0  488M   0% /dev
 15 tmpfs                    497M     0  497M   0% /dev/shm
 16 tmpfs                    497M  6.6M  491M   2% /run
 17 tmpfs                    497M     0  497M   0% /sys/fs/cgroup
 18 /dev/sda1                497M  118M  379M  24% /boot
 19 none                      87G   80G  6.6G  93% /vagrant
 20 
 21 //Slave
 22 [root@zlm3 04:34:17 ~]
 23 #df -h
 24 Filesystem               Size  Used Avail Use% Mounted on
 25 /dev/mapper/centos-root  8.4G  8.4G   20K 100% / //The disk space of slave is full,too.
 26 devtmpfs                 488M     0  488M   0% /dev
 27 tmpfs                    497M     0  497M   0% /dev/shm
 28 tmpfs                    497M  6.5M  491M   2% /run
 29 tmpfs                    497M     0  497M   0% /sys/fs/cgroup
 30 /dev/sda1                497M  118M  379M  24% /boot
 31 none                      87G   80G  6.6G  93% /vagrant
 32 
 33 //Check the output logfile.
 34 [root@zlm2 04:18:21 ~]
 35 #tail monitor.log 
 36 1.00s [  6.40s,  1.65s,  0.55s ]
 37 0.00s [  6.26s,  1.65s,  0.55s ]
 38 0.99s [  6.13s,  1.65s,  0.55s ]
 39 1.99s [  6.00s,  1.66s,  0.55s ]
 40 2.99s [  5.86s,  1.67s,  0.56s ]
 41 3.99s [  5.73s,  1.68s,  0.56s ]
 42 4.99s [  5.59s,  1.70s,  0.57s ]
 43 6.00s [  5.46s,  1.72s,  0.57s ]
 44 4.01s [  5.29s,  1.73s,  0.58s ]
 45 4.99s [  5.16s,  1.75s,  0.58s ]
 46 
 47 //Long time later.
 48 
 49 [root@zlm2 04:59:24 ~]
 50 #tail monitor.log 
 51 1769.00s [ 1739.50s, 1619.50s, 1319.50s ]
 52 1769.99s [ 1740.50s, 1620.50s, 1320.50s ]
 53 1770.99s [ 1741.50s, 1621.50s, 1321.50s ]
 54 1771.99s [ 1742.50s, 1622.50s, 1322.50s ]
 55 1772.99s [ 1743.50s, 1623.50s, 1323.50s ]
 56 1773.99s [ 1744.50s, 1624.50s, 1324.50s ]
 57 1774.99s [ 1745.50s, 1625.50s, 1325.50s ]
 58 1775.99s [ 1746.50s, 1626.50s, 1326.50s ]
 59 1776.99s [ 1747.50s, 1627.50s, 1327.50s ]
 60 1778.00s [ 1748.50s, 1628.50s, 1328.50s ]
 61 
 62 [root@zlm2 05:02:54 ~]
 63 #tail monitor.log 
 64 2122.00s [ 2092.50s, 1972.50s, 1672.50s ]
 65 2123.00s [ 2093.50s, 1973.50s, 1673.50s ]
 66 2124.00s [ 2094.50s, 1974.50s, 1674.50s ]
 67 2125.00s [ 2095.50s, 1975.50s, 1675.50s ]
 68 2126.00s [ 2096.50s, 1976.50s, 1676.50s ]
 69 2127.00s [ 2097.50s, 1977.50s, 1677.50s ]
 70 2127.99s [ 2098.50s, 1978.50s, 1678.50s ]
 71 2129.00s [ 2099.50s, 1979.50s, 1679.50s ]
 72 2129.99s [ 2100.50s, 1980.50s, 1680.50s ]
 73 2131.00s [ 2101.50s, 1981.50s, 1681.50s ]
 74 
 75 (zlm@192.168.1.102 3306)[sysbench]>show slave status\G
 76 ERROR 2013 (HY000): Lost connection to MySQL server during query
 77 (zlm@192.168.1.102 3306)[sysbench]>show slave status\G
 78 ERROR 2006 (HY000): MySQL server has gone away
 79 No connection. Trying to reconnect...
 80 Connection id:    10
 81 Current database: sysbench
 82 
 83 //The slave has hung because of the lack of disk space.
 84 //The delay continuously increasing what can be seen in the logfile on master.
 85 
 86 #180719  4:31:55 server id 1013306  end_log_pos 2044211         Table_map: `sysbench`.`hb` mapped to number 108
 87 # at 2044211
 88 #180719  4:31:55 server id 1013306  end_log_pos 2044359         Update_rows: table id 108 flags: STMT_END_F
 89 ### UPDATE `sysbench`.`hb`
 90 ### WHERE
 91 ###   @1='2018-07-19T04:31:55.004000' /* VARSTRING(78) meta=78 nullable=0 is_null=0 */
 92 ###   @2=1013306 /* INT meta=0 nullable=0 is_null=0 */
 93 ###   @3='mysql-bin.000019' /* VARSTRING(765) meta=765 nullable=1 is_null=0 */
 94 ###   @4=2043294 /* LONGINT meta=0 nullable=1 is_null=0 */
 95 ###   @5=NULL /* VARSTRING(765) meta=765 nullable=1 is_null=1 */
 96 ###   @6=NULL /* LONGINT meta=0 nullable=1 is_null=1 */
 97 ### SET
 98 ###   @1='2018-07-19T04:31:55.004400' /* VARSTRING(78) meta=78 nullable=0 is_null=0 */
 99 ###   @2=1013306 /* INT meta=0 nullable=0 is_null=0 */
100 ###   @3='mysql-bin.000019' /* VARSTRING(765) meta=765 nullable=1 is_null=0 */
101 ###   @4=2043294 /* LONGINT meta=0 nullable=1 is_null=0 */
102 ###   @5=NULL /* VARSTRING(765) meta=765 nullable=1 is_null=1 */
103 ###   @6=NULL /* LONGINT meta=0 nullable=1 is_null=1 */
104 # at 2044359
105 #180719  4:31:55 server id 1013306  end_log_pos 2044386         Xid = 94022
106 COMMIT/*!*/;
107 
108 //The output above is the detail of updating heartbeat table "hb" on master.

 

Summary
  • pt-heartbeat is rather useful in checking replica lag of MySQL and it also support PostgreSQL.
  • pt-heartbeat will read connection options in my.cnf by default if you do not specify them.
  • If you've executed the pt-heartbeat with option "--daemonize",don't forget to kill the redundant processes.
  • We can retrieve these lag information in heartbeat table using our monitor tool or script.

 

扫码关注我们
微信号:SRE实战
拒绝背锅 运筹帷幄