Troubleshooting MHA 0.56 installation and usage

Anonymous, 2019-05-08, Essay

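1. masterha_check_ssh --conf=/etc/mha/app1.cnf
When this check fails, I would say ninety percent of the time it is an SSH connectivity problem between the nodes. Make sure every node can log in to every other node with key-based (passwordless) SSH!

2. masterha_check_repl --conf=/etc/mha/app1.cnf
(1) Error: a message along the lines of the replication user (copyuser) having no privileges on a node. The fix is to create that user on every node. If replication is already running, remember to run stop slave; on the nodes first, then create the user on each one.
For MHA, binary logs and relay logs should probably be enabled on all databases, the grants should be identical everywhere, and the configuration files should be basically the same. On that basis, installing and running MHA should not hit too many problems. I just cannot yet confirm that this approach is the right answer.
(2) Error:
Tue Apr 30 09:26:44 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.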
Tue Apr 30 09:26:44 2019 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Tue Apr 30 09:26:44 2019 - [info] Reading server configuration from /etc/mha/app1.cnf..
Tue Apr 30 09:26:44 2019 - [info] MHA::MasterMonitor version 0.56.
Tue Apr 30 09:26:45 2019 - [info] GTID failover mode = 0
Tue Apr 30 09:26:45 2019 - [info] Dead Servers:
Tue Apr 30 09:26:45 2019 - [info] Alive Servers:
Tue Apr 30 09:26:45 2019 - [info]   103.75.1.22(103.75.1.22:3306)
Tue Apr 30 09:26:45 2019 - [info]   103.75.1.23(103.75.1.23:3306)
Tue Apr 30 09:26:45 2019 - [info]   103.75.1.24(103.75.1.24:3306)
Tue Apr 30 09:26:45 2019 - [info] Alive Slaves:
Tue Apr 30 09:26:45 2019 - [info]   103.75.1.23(103.75.1.23:3306) Version=5.7.25-log (oldest major version between slaves) log-bin:enabled
Tue Apr 30 09:26:45 2019 - [info]     Replicating from 103.75.1.22(103.75.1.22:3306)
Tue Apr 30 09:26:45 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Apr 30 09:26:45 2019 - [info]   103.75.1.24(103.75.1.24:3306) Version=5.7.25-log (oldest major version between slaves) log-bin:enabled
Tue Apr 30 09:26:45 2019 - [info]     Replicating from 103.75.1.22(103.75.1.22:3306)
Tue Apr 30 09:26:45 2019 - [info] Current Alive Master: 103.75.1.22(103.75.1.22:3306)
Tue Apr 30 09:26:45 2019 - [info] Checking slave configurations..
Tue Apr 30 09:26:45 2019 - [info] read_only=1 is not set on slave 103.75.1.24(103.75.1.24:3306).
Tue Apr 30 09:26:45 2019 - [info] Checking replication filtering settings..
Tue Apr 30 09:26:45 2019 - [info] binlog_do_db= , binlog_ignore_db=
Tue Apr 30 09:26:45 2019 - [info] Replication filtering check ok.
Tue Apr 30 09:26:45 2019 - [info] GTID (with auto-pos) is not supported
Tue Apr 30 09:26:45 2019 - [info] Starting SSH connection tests..
Tue Apr 30 09:26:53 2019 - [info] All SSH connection tests passed successfully.
Tue Apr 30 09:26:53 2019 - [info] Checking MHA Node version..
Tue Apr 30 09:26:57 2019 - [info] Version check ok.
Tue Apr 30 09:26:57 2019 - [info] Checking SSH publickey authentication settings on the current master..
Tue Apr 30 09:26:58 2019 - [info] HealthCheck: SSH to 103.75.1.22 is reachable.
Tue Apr 30 09:26:59 2019 - [info] Master MHA Node version is 0.56.
Tue Apr 30 09:26:59 2019 - [info] Checking recovery script configurations on 103.75.1.22(103.75.1.22:3306)..
Tue Apr 30 09:26:59 2019 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data --output_file=/data/mastermha/app1//save_binary_logs_test --manager_version=0.56 --start_file=master-bin.000008
Tue Apr 30 09:26:59 2019 - [info]   Connecting to root@103.75.1.22(103.75.1.22:22)..
Failed to save binary log: Binlog not found from /data! If you got this error at MHA Manager, please set "master_binlog_dir=/path/to/binlog_directory_of_the_master" correctly in the MHA Manager's configuration file and try again.
at /usr/bin/save_binary_logs line 123
    eval {...} called at /usr/bin/save_binary_logs line 70
    main::main() called at /usr/bin/save_binary_logs line 66
Tue Apr 30 09:27:00 2019 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln158] Binlog setting check failed!
Tue Apr 30 09:27:00 2019 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln405] Master configuration failed.
Tue Apr 30 09:27:00 2019 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations. at /usr/bin/masterha_check_repl line 48
Tue Apr 30 09:27:00 2019 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Tue Apr 30 09:27:00 2019 - [info] Got exit code 1 (Not master dead).

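MySQL Replication Health is NOT OK!

Solution: if you have set a custom path for the binary logs, you must specify master_binlog_dir='directory containing the binary logs' in the MHA configuration file. In my case the binlogs were not in /data, so I simply commented out (with #) the line master_binlog_dir=/data in app1.cnf. MHA then falls back to searching /var/lib/mysql,/var/log/mysql, as the later successful run shows.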
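This mismatch can be spotted before running masterha_check_repl by comparing the directory named in the MHA config with where the binlog files actually sit. The following is only a minimal sketch and not part of MHA: the helper names binlog_dir_from_cnf and has_binlogs are hypothetical, the master-bin prefix is taken from the log above, and has_binlogs has to be run on the master itself.

```shell
#!/bin/sh
# Hypothetical helpers, not part of MHA.

# Print the active (non-commented) master_binlog_dir value from an MHA config file.
# Prints nothing if the parameter is absent or commented out with '#'.
binlog_dir_from_cnf() {
    sed -n 's/^[[:space:]]*master_binlog_dir[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# Succeed if directory $1 holds at least one binlog file named <prefix>.<digits>.
# $2 is the binlog prefix; the default "master-bin" matches the log above.
has_binlogs() {
    for f in "$1"/"${2:-master-bin}".[0-9]*; do
        [ -f "$f" ] && return 0
    done
    return 1
}
```

For example, binlog_dir_from_cnf /etc/mha/app1.cnf prints nothing once the line is commented out, and has_binlogs /var/lib/mysql run on the master should succeed if the binlogs really live there.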
(3) Error:

Tue Apr 30 10:04:21 2019 - [info] Checking replication health on 103.75.1.23..
Tue Apr 30 10:04:21 2019 - [info] ok.
Tue Apr 30 10:04:21 2019 - [info] Checking replication health on 103.75.1.24..
Tue Apr 30 10:04:21 2019 - [info] ok.
Tue Apr 30 10:04:21 2019 - [warning] master_ip_failover_script is not defined.
Tue Apr 30 10:04:21 2019 - [warning] shutdown_script is not defined.
Tue Apr 30 10:04:21 2019 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

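These warnings appear at the very end of the check and mean that the two scripts are not defined. With them undefined, starting the manager directly just hung for me. Solution: add master_ip_failover_script='path to the script file' to app1.cnf. The script is available at: http://control.blog.sina.com.cn/admin/article/article_edit.php?blog_id=b4fca5310102yan0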

(4) Error:
103.75.1.22(103.75.1.22:3306) (current master)
+--103.75.1.23(103.75.1.23:3306)
+--103.75.1.24(103.75.1.24:3306)

Tue Apr 30 10:44:55 2019 - [info] Checking replication health on 103.75.1.23..
Tue Apr 30 10:44:55 2019 - [info] ok.
Tue Apr 30 10:44:55 2019 - [info] Checking replication health on 103.75.1.24..
Tue Apr 30 10:44:55 2019 - [info] ok.
Tue Apr 30 10:44:55 2019 - [info] Checking master_ip_failover_script status:
Tue Apr 30 10:44:55 2019 - [info]   /data/mastermha/app1/master_ip_failover --command=status --ssh_user=root --orig_master_host=103.75.1.22 --orig_master_ip=103.75.1.22 --orig_master_port=3306
Tue Apr 30 10:44:55 2019 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations. Can't exec "/data/mastermha/app1/master_ip_failover": Permission denied at /usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm line 68.
Tue Apr 30 10:44:55 2019 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Tue Apr 30 10:44:55 2019 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!
Tue Apr 30 10:44:55 2019 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln226] Failed to get master_ip_failover_script status with return code 1:0.
Tue Apr 30 10:44:55 2019 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations. at /usr/bin/masterha_check_repl line 48
Tue Apr 30 10:44:55 2019 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Tue Apr 30 10:44:55 2019 - [info] Got exit code 1 (Not master dead).

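MySQL Replication Health is NOT OK!

I went through a lot of material for this error. I kept assuming something was wrong with my master_ip_failover script itself. It was not: the script simply had no execute permission. The fix is to grant it with chmod +x /data/mastermha/app1/master_ip_failover. Running the check again, the problem was solved! Here is the final output:

[root@localhost ~]# chmod +x /data/mastermha/app1/master_ip_failover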
[root@localhost ~]# masterha_check_repl --conf=/etc/mha/app1.cnf
Tue Apr 30 10:51:59 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Apr 30 10:51:59 2019 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Tue Apr 30 10:51:59 2019 - [info] Reading server configuration from /etc/mha/app1.cnf..
Tue Apr 30 10:51:59 2019 - [info] MHA::MasterMonitor version 0.56.
Tue Apr 30 10:52:00 2019 - [info] GTID failover mode = 0
Tue Apr 30 10:52:00 2019 - [info] Dead Servers:
Tue Apr 30 10:52:00 2019 - [info] Alive Servers:
Tue Apr 30 10:52:00 2019 - [info]   103.75.1.22(103.75.1.22:3306)
Tue Apr 30 10:52:00 2019 - [info]   103.75.1.23(103.75.1.23:3306)
Tue Apr 30 10:52:00 2019 - [info]   103.75.1.24(103.75.1.24:3306)
Tue Apr 30 10:52:00 2019 - [info] Alive Slaves:
Tue Apr 30 10:52:00 2019 - [info]   103.75.1.23(103.75.1.23:3306) Version=5.7.25-log (oldest major version between slaves) log-bin:enabled
Tue Apr 30 10:52:00 2019 - [info]     Replicating from 103.75.1.22(103.75.1.22:3306)
Tue Apr 30 10:52:00 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Apr 30 10:52:00 2019 - [info]   103.75.1.24(103.75.1.24:3306) Version=5.7.25-log (oldest major version between slaves) log-bin:enabled
Tue Apr 30 10:52:00 2019 - [info]     Replicating from 103.75.1.22(103.75.1.22:3306)
Tue Apr 30 10:52:00 2019 - [info] Current Alive Master: 103.75.1.22(103.75.1.22:3306)
Tue Apr 30 10:52:00 2019 - [info] Checking slave configurations..
Tue Apr 30 10:52:00 2019 - [info] read_only=1 is not set on slave 103.75.1.24(103.75.1.24:3306).
Tue Apr 30 10:52:00 2019 - [info] Checking replication filtering settings..
Tue Apr 30 10:52:00 2019 - [info] binlog_do_db= , binlog_ignore_db=
Tue Apr 30 10:52:00 2019 - [info] Replication filtering check ok.
Tue Apr 30 10:52:00 2019 - [info] GTID (with auto-pos) is not supported
Tue Apr 30 10:52:00 2019 - [info] Starting SSH connection tests..
Tue Apr 30 10:52:07 2019 - [info] All SSH connection tests passed successfully.
Tue Apr 30 10:52:07 2019 - [info] Checking MHA Node version..
Tue Apr 30 10:52:11 2019 - [info] Version check ok.
Tue Apr 30 10:52:11 2019 - [info] Checking SSH publickey authentication settings on the current master..
Tue Apr 30 10:52:12 2019 - [info] HealthCheck: SSH to 103.75.1.22 is reachable.
Tue Apr 30 10:52:14 2019 - [info] Master MHA Node version is 0.56.
Tue Apr 30 10:52:14 2019 - [info] Checking recovery script configurations on 103.75.1.22(103.75.1.22:3306)..
Tue Apr 30 10:52:14 2019 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/data/mastermha/app1//save_binary_logs_test --manager_version=0.56 --start_file=master-bin.000008
Tue Apr 30 10:52:14 2019 - [info]   Connecting to root@103.75.1.22(103.75.1.22:22)..
Creating /data/mastermha/app1 if not exists..    ok.
Checking output directory is accessible or not..
   ok.
Binlog found at /var/lib/mysql, up to master-bin.000008
Tue Apr 30 10:52:16 2019 - [info] Binlog setting check done.
Tue Apr 30 10:52:16 2019 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Tue Apr 30 10:52:16 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mhauser' --slave_host=103.75.1.23 --slave_ip=103.75.1.23 --slave_port=3306 --workdir=/data/mastermha/app1/ --target_version=5.7.25-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Tue Apr 30 10:52:16 2019 - [info]   Connecting to root@103.75.1.23(103.75.1.23:22)..
Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to relay-log.000005
    Temporary relay log file is /var/lib/mysql/relay-log.000005
    Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Tue Apr 30 10:52:17 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mhauser' --slave_host=103.75.1.24 --slave_ip=103.75.1.24 --slave_port=3306 --workdir=/data/mastermha/app1/ --target_version=5.7.25-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Tue Apr 30 10:52:17 2019 - [info]   Connecting to root@103.75.1.24(103.75.1.24:22)..
Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to relay-log.000006
    Temporary relay log file is /var/lib/mysql/relay-log.000006
    Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Tue Apr 30 10:52:19 2019 - [info] Slaves settings check done.
Tue Apr 30 10:52:19 2019 - [info]
103.75.1.22(103.75.1.22:3306) (current master)
+--103.75.1.23(103.75.1.23:3306)
+--103.75.1.24(103.75.1.24:3306)

Tue Apr 30 10:52:19 2019 - [info] Checking replication health on 103.75.1.23..
Tue Apr 30 10:52:19 2019 - [info] ok.
Tue Apr 30 10:52:19 2019 - [info] Checking replication health on 103.75.1.24..
Tue Apr 30 10:52:19 2019 - [info] ok.
Tue Apr 30 10:52:19 2019 - [info] Checking master_ip_failover_script status:
Tue Apr 30 10:52:19 2019 - [info]   /data/mastermha/app1/master_ip_failover --command=status --ssh_user=root --orig_master_host=103.75.1.22 --orig_master_ip=103.75.1.22 --orig_master_port=3306

IN SCRIPT TEST====/sbin/ifconfig bond1:1 down==/sbin/ifconfig bond1:1 103.75.1.30/26===

Checking the Status of the script.. OK
SIOCSIFADDR: No such device
SIOCSIFNETMASK: No such device
SIOCGIFADDR: No such device
SIOCSIFBROADCAST: No such device
bond1:1: unknown interface: No such device
Tue Apr 30 10:52:21 2019 - [info] OK.
Tue Apr 30 10:52:21 2019 - [warning] shutdown_script is not defined.
Tue Apr 30 10:52:21 2019 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
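
The "Permission denied" failure from case (4) can be caught before running the check at all. The following is a minimal sketch, assuming the script path shown in the log; the helper name ensure_executable is hypothetical and not part of MHA.

```shell
#!/bin/sh
# Hypothetical helper, not part of MHA.
# Make sure a failover script exists and carries the execute bit,
# adding the bit with chmod +x if it is missing.
ensure_executable() {
    if [ ! -f "$1" ]; then
        echo "missing: $1" >&2
        return 1
    fi
    if [ ! -x "$1" ]; then
        chmod +x "$1" || return 1
        echo "added execute permission: $1"
    fi
    return 0
}
```

It would be called as ensure_executable /data/mastermha/app1/master_ip_failover on the manager host before masterha_check_repl.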

3. masterha_manager --conf=/etc/mha/app1.cnf
This is where I got stuck. After some searching I found that it has to be started differently:
[root@localhost ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf > /data/mastermha/app1/manager.log 2>&1 &
[1] 2190
The above is the startup command; it needs the configuration file and a log file.
[root@localhost ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 monitoring program is now on initialization phase(10:INITIALIZING_MONITOR). Wait for a while and try checking again.
Checking the status at first reports that it is still initializing. Wait a while and run it again, and you will see that it has started successfully:
[root@localhost ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:2190) is running(0:PING_OK), master:103.75.1.22
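
Instead of rerunning masterha_check_status by hand, the wait can be scripted. The following is a minimal sketch that polls until the output contains PING_OK, as in the last line above. The function name wait_for_ping_ok, the retry count, and the sleep interval are my own assumptions, and it matches on the output text rather than on exit codes.

```shell
#!/bin/sh
# Hypothetical helper, not part of MHA.
# Usage: wait_for_ping_ok TRIES INTERVAL_SECONDS COMMAND [ARGS...]
# Runs COMMAND up to TRIES times, sleeping INTERVAL_SECONDS after each
# failed attempt, and succeeds as soon as its output contains "PING_OK".
wait_for_ping_ok() {
    tries=$1
    interval=$2
    shift 2
    i=0
    while [ "$i" -lt "$tries" ]; do
        if "$@" 2>&1 | grep -q 'PING_OK'; then
            return 0
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    return 1
}
```

A call such as wait_for_ping_ok 10 3 masterha_check_status --conf=/etc/mha/app1.cnf would retry for roughly half a minute before giving up.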