MySQL MHA

1、什么是MHA

MHA(MasterHigh Availability)是一套优秀的MySQL高可用环境下故障切换和主从复制的软件。
MHA 的出现就是解决MySQL 单点的问题,实现MySQL主从故障切换功能。
MySQL故障切换过程中,MHA能做到0-30秒内自动完成故障切换操作。
MHA能在故障切换的过程中最大程度上保证数据的一致性,以达到真正意义上的高可用。

2、MHA组成

MHA组成 作用
MHA Node(数据节点) MHA Node 运行在每台 MySQL 服务器上。
MHA Manager(管理节点) MHA Manager 可以单独部署在一台独立的机器上,管理多个 master-slave 集群;也可以部署在一台 slave 节点上。MHA Manager 会定时探测集群中的 master 节点。当 master 出现故障时,它可以自动将最新数据的 slave 提升为新的 master, 然后将所有其他的 slave 重新指向新的 master。整个故障转移过程对应用程序完全透明。

3、MHA特点

自动故障切换过程中,MHA试图从宕机的主服务器上保存二进制日志,最大程度的保证数据不丢失,使用半同步复制,可以大大降低数据丢失的风险,如果只有一个slave已经收到了最新的二进制日志,MHA可以将最新的二进制日志应用于其他所有的slave服务器上,因此可以保证所有节点的数据一致性,目前MHA支持一主多从架构,最少三台服务,即一主两从。

注意:MHA是一次性的,当完成一次主从切换后,MHA进程将自动退出;必须重新对MHA进行配置恢复才能够重新启用。

4、MHA部署

实验思路:

  • MHA架构

    • 数据库安装
    • 一主两从
    • MHA搭建
  • 故障模拟

    • 主库失效
    • 备选主库成为主库
    • 原故障主库恢复重新加入到MHA成为从库

实验环境 :

实验主机 操作系统 安装组件
10.0.0.17(Master) CentOS7.4(64 位) 该节点是manager节点,安装mysql5.7、MHA manager和 node 组件
10.0.0.27(Slave) CentOS7.4(64 位) 安装mysql5.7、MHA node 组件
10.0.0.37(Slave) CentOS7.4(64 位) 安装mysql5.7、MHA node 组件

MHA下载地址:https://github.com/yoshinorim/

实验步骤:

1.创建软连接,所有节点都需操作

[root@db-17 ~]# ln -s /opt/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
[root@db-17 ~]# ln -s /opt/mysql/bin/mysql /usr/bin/mysql
#其他节点同理

2.配置各主机之间ssh免密登录

[root@db-17 ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:CJZrb8tk1KjfsFbhr7UB3iuo/5SjBq9NpJMGV0X4AB4 root@db-17
The key's randomart image is:
+---[RSA 2048]----+
|    E. oo        |
|   . oo.         |
|    = .o         |
|   . + +o        |
|  . + =.S.       |
|   +.B .o+       |
|    *oBo=.+      |
|   . X=B.o.+     |
|    o*O.+oo      |
+----[SHA256]-----+

[root@db-17 ~]# ssh-copy-id root@10.0.0.17
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host '10.0.0.17 (10.0.0.17)' can't be established.
ECDSA key fingerprint is SHA256:rTf+awJf/okAhV+47f4pbkueWtE//bqfsTNJmzaw0Rg.
ECDSA key fingerprint is MD5:7a:3e:15:3a:6f:d3:9f:5c:25:d2:73:ae:a2:4c:0b:da.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@10.0.0.17's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'root@10.0.0.17'"
and check to make sure that only the key(s) you wanted were added.

[root@db-17 ~]# ssh-copy-id root@10.0.0.27
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@10.0.0.27's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'root@10.0.0.27'"
and check to make sure that only the key(s) you wanted were added.

[root@db-17 ~]# ssh-copy-id root@10.0.0.37
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@10.0.0.37's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'root@10.0.0.37'"
and check to make sure that only the key(s) you wanted were added.

#其他节点同理

3.所有节点安装MHA node软件,10.0.0.17节点安装MHA manager软件

#所有节点安装node软件依赖包
yum -y install epel-release --nogpgcheck
yum -y install perl-DBD-MySQL

#所有节点安装node软件
rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm

#node 组件安装后会生成几个脚本(这些工具通常由MHA Manager的脚本触发,无需人为操作):
save_binary_logs	#保存和复制 master 的二进制日志
apply_diff_relay_logs	#识别差异的中继日志事件并将其差异的事件应用于其他的slave
filter_mysqlbinlog	#去除不必要的ROLLBACK事件(MHA已不再使用这个工具)
purge_relay_logs	#清除中继日志(不会阻塞SQL线程)

#manager节点安装manager软件依赖包
yum -y install perl-Config-Tiny \
perl-Log-Dispatch \
perl-Parallel-ForkManager \
perl-ExtUtils-CBuilder \
perl-ExtUtils-MakeMaker \
perl-CPAN

#manager节点安装manager软件
rpm -ivh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm

#manager软件安装后会生成几个工具,主要包括以下几个:
masterha_check_ssh	#检查MHA的SSH配置状况
masterha_check_repl	#检查MySQL复制状况
masterha_manger	#启动manager的脚本
masterha_check_status	#检测当前MHA运行状态
masterha_master_monitor	#检测master是否宕机
masterha_master_switch	#控制故障转移(自动或者手动)
masterha_conf_host	#添加或删除配置的server信息
masterha_stop	#关闭manager

4.配置数据库主从复制,并在数据库中创建mha用户

#主从复制配置此处省略,如有需要可查阅前面文章;注意从库必须配置为只读模式
mysql> set global read_only=1;

#主从复制配置完毕后,创建mha数据库用户,在主库中创建即可
mysql> grant all privileges on *.* to 'mha'@'10.0.0.%' identified by '1';
Query OK, 0 rows affected, 1 warning (0.01 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

5.在manager节点上准备MHA配置文件

#创建配置文件目录和日志目录
[root@db-17 ~]# mkdir /etc/mha
[root@db-17 ~]# mkdir -p /var/log/mha/app1

#编写MHA配置文件
[root@db-17 ~]# vim /etc/mha/app1.cnf
[server default]
manager_log=/var/log/mha/app1/manager.log
manager_workdir=/var/log/mha/app1
master_binlog_dir=/var/log/mysql
user=mha
password=1
ping_interval=2
repl_user=zcfz
repl_password=hzz123!
ssh_user=root

[server1]
hostname=10.0.0.17
port=3306

[server2]
candidate_master=1
check_repl_delay=0
hostname=10.0.0.27
port=3306

[server3]
hostname=10.0.0.37
port=3306

#配置文件解析
[server default]
manager_log=/var/log/mha/app1/manager.log	#MHA日志
manager_workdir=/var/log/mha/app1	#MHA工作目录
master_binlog_dir=/var/log/mysql	#主库的binlog目录
user=mha	#监控用的数据库用户
password=1	#监控用的数据库用户的密码
ping_interval=2	#心跳检测间隔时间
repl_user=zcfz	#主从复制使用的用户
repl_password=hzz123!	#主从复制用户的密码
ssh_user=root	#ssh用户

[server1]	#MHA node节点信息
hostname=10.0.0.17	#节点IP地址或域名
port=3306	#数据库端口

[server2]
candidate_master=1	#设置为候选master,设置该参数以后,发生主从切换以后将会将此从库提升为主库,即使这个从库不是集群中最新的slave
check_repl_delay=0	#强制设置为候选master,默认情况下主从切换时,从库中继日志落后于主库大于100M时,MHA将不在选择该从库成为主库,该参数用于忽略中继日志落后情况
hostname=10.0.0.27
port=3306

[server3]
hostname=10.0.0.37
port=3306

6.在manager节点上进行状态检测,检测主从复制状态和ssh状态

#检测主从复制状态,出现ok说明没问题
[root@db-17 ~]# masterha_check_repl --conf=/etc/mha/app1.cnf

#检测ssh免密登录状态,出现successfully说明没问题
[root@db-17 ~]# masterha_check_ssh --conf=/etc/mha/app1.cnf

7.在manager节点上开启MHA manager程序

nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &

--remove_dead_master_conf	#该参数代表当发生主从切换后,离线数据库节点信息将从配置文件中移除
--ignore_last_failover	#在缺省情况下,如果MHA检测到连续发生宕机,且两次宕机间隔不足8小时的话,则不会进行Failover,之所以这样限制是为了避免ping-pong效应。该参数代表忽略上次MHA触发切换产生的文件,默认情况下,MHA 发生切换后会在日志记目录,也就是上面设置的日志app1.failover.complete文件,下次再次切换的时候如果发现该目录下存在该文件将不允许触发切换,除非在第一次切换后收到删除该文件,为了方便,这里设置为--ignore_last_failover。

8.在manager节点上查看MHA状态

[root@db-17 ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:2824) is running(0:PING_OK), master:10.0.0.17

5、MHA应用透明

1.VIP故障转移脚本配置,将脚本上传到manager节点,然后授予权限

故障转移脚本下载地址:https://github.com/yoshinorim/mha4mysql-manager/tree/master/samples/scripts

[root@db-17 ~]# mv master_ip_failover master_ip_online_change power_manager send_report /usr/local/bin/ 
[root@db-17 ~]# chmod +x /usr/local/bin/*
[root@db-17 ~]# ll /usr/local/bin/
total 32
-rwxr-xr-x 1 root root  3648 Mar 23  2018 master_ip_failover
-rwxr-xr-x 1 root root  9870 Mar 23  2018 master_ip_online_change
-rwxr-xr-x 1 root root 11867 Mar 23  2018 power_manager
-rwxr-xr-x 1 root root  1360 Mar 23  2018 send_report

2.替换字符,将Windows字符转换为Linux字符

[root@db-17 ~]# yum -y install dos2unix
[root@db-17 ~]# dos2unix /usr/local/bin/*
dos2unix: converting file /usr/local/bin/master_ip_failover to Unix format ...
dos2unix: converting file /usr/local/bin/master_ip_online_change to Unix format ...
dos2unix: converting file /usr/local/bin/power_manager to Unix format ...
dos2unix: converting file /usr/local/bin/send_report to Unix format ...

3.修改脚本内容,删除脚本中所有内容,添加下面脚本内容,按情况修改my $vip,my $key,my $ssh_start_vip,my $ssh_stop_vip,my $ssh_Bcast_arp的值

[root@db-17 ~]# vim /usr/local/bin/master_ip_failover
#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);

my $vip = '10.0.0.100/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig ens36:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens36:$key down";
my $ssh_Bcast_arp = "/usr/sbin/arping -q -A -c 3 -I ens36 $vip";

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {

    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {

        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {

        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    elsif ( $command eq "change" ) {
        &stop_vip();
        &start_vip();
        print "Change the VIP of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
     return 0  unless  ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status|change --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

4.修改MHA配置文件,调用VIP故障转移脚本

[root@db-17 ~]# vim /etc/mha/app1.cnf
[server default]
.......
master_ip_failover_script=/usr/local/bin/master_ip_failover

5.重启MHA

[root@db-17 ~]# masterha_stop --conf=/etc/mha/app1.cnf
Stopped app1 successfully.

[root@db-17 ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &

6.手动在主库节点添加VIP

[root@db-17 ~]# ifconfig ens36:1 10.0.0.100/24

7.手动切换VIP命令

master_ip_failover --command=change --ssh_user=root  --orig_master_host=10.0.0.27 --orig_master_ip=10.0.0.27 --orig_master_port=3306 --new_master_host=10.0.0.17 --new_master_ip=10.0.0.17 --new_master_port=3306

此处使用ifconfig命令来实现VIP转移,也可使用keepalive实现VIP的转移。

6、MHA邮件告警

1.修改之前上传的“send_report”脚本,该脚本通过邮件的方式发送故障提醒;以下为脚本内容,根据情况修改my $smtp,my $mail_from,my $mail_user,my $mail_pass,my $mail_to的值

[root@db-17 ~]# vim /usr/local/bin/send_report
#!/usr/bin/perl

use strict;
use warnings FATAL => 'all';
use Mail::Sender;
use Getopt::Long;
 
#new_master_host and new_slave_hosts are set only when recovering master succeeded
my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );

my $smtp = 'smtp.qq.com';
my $mail_from = '2207118234@qq.com';
my $mail_user = '2207118234@qq.com';
my $mail_pass = 'xxxxxxx';
my $mail_to = '2207118234@qq.com';
#my $mail_to=['to1@qq.com','to2@qq.com'];
 
GetOptions(
  'orig_master_host=s' => \$dead_master_host,
  'new_master_host=s'  => \$new_master_host,
  'new_slave_hosts=s'  => \$new_slave_hosts,
  'subject=s'          => \$subject,
  'body=s'             => \$body,
);
 
# Do whatever you want here
mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);
 
sub mailToContacts {
    my ($smtp, $mail_from, $mail_user, $mail_pass, $mail_to, $subject, $msg ) = @_;
    open my $DEBUG, ">/var/log/mha/app1/mail.log"
        or die "Can't open the debug    file:$!\n";
    my $sender = new Mail::Sender {
        ctype        => 'text/plain;charset=utf-8',
        encoding    => 'utf-8',
        smtp        => $smtp,
        from        => $mail_from,
        auth        => 'LOGIN',
        TLS_allowed    => '0',
        authid        => $mail_user,
        authpwd        => $mail_pass,
        to        => $mail_to,
        subject        => $subject,
        debug        => $DEBUG
    };
    $sender->MailMsg(
        {
            msg => $msg,
            debug => $DEBUG
        }
    ) or print $Mail::Sender::Error;
    return 1;
}
 
exit 0;

2.修改MHA配置文件,调用故障提醒脚本

[root@db-17 ~]# vim /etc/mha/app1.cnf
[server default]
.......
report_script=/usr/local/bin/send_report

3.重启MHA

[root@db-17 ~]# masterha_stop --conf=/etc/mha/app1.cnf
Stopped app1 successfully.

[root@db-17 ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &

7、MHA数据补偿

1.随便选一台数据库服务器,配置binlog server提供数据补偿功能;在manager配置文件中添加binlog server配置

[root@db-17 ~]# vim /etc/mha/app1.cnf
[binlog1]
no_master=1
hostname=10.0.0.17
port=3306
master_binlog_dir=/data/binlog_server

2.在对应的binlog server服务器上创建相应目录并授权

[root@db-17 ~]# mkdir /data/binlog_server
[root@db-17 ~]# chown -R mysql.mysql /data/binlog_server

3.查看从库同步到哪个binlog文件,从最新的binlog文件开始拉取binlog日志

[root@db-17 ~] mysql -h10.0.0.27 -umha -p1 -e 'show slave status \G' | grep 'Master_Log'
              Master_Log_File: mysql_bin.000011
          Read_Master_Log_Pos: 194
        Relay_Master_Log_File: mysql_bin.000011
          Exec_Master_Log_Pos: 194
[root@db-17 ~]# cd /data/binlog_server/
[root@db-17 /data/binlog_server]# nohup mysqlbinlog -R --host=10.0.0.17 --user=mha --password=1 --raw --stop-never mysql_bin.000011 &> /dev/null &

[root@db-17 /data/binlog_server]# ll
total 4
-rw-r----- 1 root root 194 Apr  2 15:01 mysql_bin.000011

4.重启MHA

[root@db-17 ~]# masterha_stop --conf=/etc/mha/app1.cnf
Stopped app1 successfully.

[root@db-17 ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &

8、MHA故障模拟

故障模拟流程:

1、关闭主库
2、查看VIP是否转移
3、查看从库是否成功升为主库
4、查看主从复制关系是否切换成功

1.关闭主库

[root@db-17 ~]# systemctl stop mysqld.service

#注意MHA是一次性的,当完成主从切换后MHA和mysqlbinlog进程已经自动停止
[1]-  Done                    nohup mysqlbinlog -R --host=10.0.0.17 --user=mha --password=1 --raw --stop-never mysql_bin.000011 &>/dev/null
[2]+  Done                    nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1

2.查看VIP是否转移

[root@db-17 ~]# ip a
3: ens36: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:6a:bc:97 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.17/24 brd 10.0.0.255 scope global ens36
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe6a:bc97/64 scope link 
       valid_lft forever preferred_lft forever

[root@db-27 ~]# ip a
3: ens36: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:2b:c5:e8 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.27/24 brd 10.0.0.255 scope global ens36
       valid_lft forever preferred_lft forever
    inet 10.0.0.100/24 brd 10.0.0.255 scope global secondary ens36:1
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe2b:c5e8/64 scope link 
       valid_lft forever preferred_lft forever

#VIP成功转移

**3.在其他从库上查看是否成功切换为新的主库,同时查看主从复制关系也已经切换成功 **

[root@db-37 ~]# mysql -uroot -p1 -e 'show slave status \G' | grep "Master_Host"
                  Master_Host: 10.0.0.27

[root@db-37 ~]# mysql -uroot -p1 -e 'show slave status \G' | grep "Slave_"
               Slave_IO_State: Waiting for master to send event
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates

9、MHA重新上线

1.确保所有MySQL服务处于正常状态

[root@db-17 ~]# systemctl start mysqld.service

2.修复主从复制关系,目前10.0.0.27为主库,修改原主库为10.0.0.27的从库

[root@db-17 ~]# mysql -uroot -p1
mysql> CHANGE MASTER TO
MASTER_HOST='10.0.0.27',
MASTER_PORT=3306,
MASTER_USER='zcfz',
MASTER_PASSWORD='hzz123!',
MASTER_AUTO_POSITION=1;
Query OK, 0 rows affected, 2 warnings (0.01 sec)

mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

3.确保VIP存在于主库服务器上

[root@db-27 ~]# ip a
3: ens36: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:2b:c5:e8 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.27/24 brd 10.0.0.255 scope global ens36
       valid_lft forever preferred_lft forever
    inet 10.0.0.100/24 brd 10.0.0.255 scope global secondary ens36:1
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe2b:c5e8/64 scope link 
       valid_lft forever preferred_lft forever

4.检查binlog server并修复

#确保进程已经退出
[root@db-17 ~]# ps aux | grep [m]ysqlbinlog

#删除之前主库的binlog同步文件
[root@db-17 ~]# rm -rf /data/binlog_server/*

#重启开始同步主库的binlog,此时主库已经切换为10.0.0.27
[root@db-17 ~]# mysql -h10.0.0.37 -umha -p1 -e 'show slave status \G' | grep 'Master_Log'
              Master_Log_File: mysql_bin.000005
          Read_Master_Log_Pos: 194
        Relay_Master_Log_File: mysql_bin.000005
          Exec_Master_Log_Pos: 194

[root@db-17 ~]# cd /data/binlog_server/
[root@db-17 /data/binlog_server]# nohup mysqlbinlog -R --host=10.0.0.27 --user=mha --password=1 --raw --stop-never mysql_bin.000005 &> /dev/null &
[1] 9327
[root@db-17 /data/binlog_server]# ll
total 4
-rw-r----- 1 root root 194 Apr  2 16:02 mysql_bin.000005

5.检查manager配置文件,MHA会将离线节点从配置文件删除,此时需要检查并将修复好的节点添加进配置文件

#通过命令向配置文件添加节点
[root@db-17 ~]# masterha_conf_host --command=add --conf=/etc/mha/app1.cnf --block=server1 --hostname=10.0.0.17 --params='port=3306'

#通过命令从配置文件删除节点
[root@db-17 ~]# masterha_conf_host --command=delete --conf=/etc/mha/app1.cnf --block=server1

6.检查ssh互信和主从复制关系

[root@db-17 ~]# masterha_check_ssh --conf=/etc/mha/app1.cnf

[root@db-17 ~]# masterha_check_repl --conf=/etc/mha/app1.cnf

7.删除故障转移产生的确认文件

[root@db-17 ~]# rm -rf /var/log/mha/app1/app1.failover.complete

8.启动MHA

[root@db-17 ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &

10、MHA热切换主从

1.编辑master_ip_online_change脚本加入下面代码,根据情况修改my $vip,my $key,my $ssh_start_vip,my $ssh_stop_vip,my $ssh_Bcast_arp配置

[root@db-17 ~]# vim /usr/local/bin/master_ip_online_change
#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;
use MHA::NodeUtil;
use Time::HiRes qw( sleep gettimeofday tv_interval );
use Data::Dumper;

my $_tstart;
my $_running_interval = 0.1;
my (
  $command,
);

my $vip = '10.0.0.100/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig ens36:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens36:$key down";
my $ssh_Bcast_arp = "/usr/sbin/arping -q -A -c 3 -I ens36 $vip";

GetOptions(
  'command=s'                => \$command,
  'orig_master_is_new_slave' => \$orig_master_is_new_slave,
  'orig_master_host=s'       => \$orig_master_host,
  'orig_master_ip=s'         => \$orig_master_ip,
  'orig_master_port=i'       => \$orig_master_port,
  'orig_master_user=s'       => \$orig_master_user,
  'orig_master_password=s'   => \$orig_master_password,
  'orig_master_ssh_user=s'   => \$orig_master_ssh_user,
  'new_master_host=s'        => \$new_master_host,
  'new_master_ip=s'          => \$new_master_ip,
  'new_master_port=i'        => \$new_master_port,
  'new_master_user=s'        => \$new_master_user,
  'new_master_password=s'    => \$new_master_password,
  'new_master_ssh_user=s'    => \$new_master_ssh_user,
);

exit &main();

sub current_time_us {
  my ( $sec, $microsec ) = gettimeofday();
  my $curdate = localtime($sec);
  return $curdate . " " . sprintf( "%06d", $microsec );
}

sub sleep_until {
  my $elapsed = tv_interval($_tstart);
  if ( $_running_interval > $elapsed ) {
    sleep( $_running_interval - $elapsed );
  }
}

sub get_threads_util {
  my $dbh                    = shift;
  my $my_connection_id       = shift;
  my $running_time_threshold = shift;
  my $type                   = shift;
  $running_time_threshold = 0 unless ($running_time_threshold);
  $type                   = 0 unless ($type);
  my @threads;

  my $sth = $dbh->prepare("SHOW PROCESSLIST");
  $sth->execute();

  while ( my $ref = $sth->fetchrow_hashref() ) {
    my $id         = $ref->{Id};
    my $user       = $ref->{User};
    my $host       = $ref->{Host};
    my $command    = $ref->{Command};
    my $state      = $ref->{State};
    my $query_time = $ref->{Time};
    my $info       = $ref->{Info};
    $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
    next if ( $my_connection_id == $id );
    next if ( defined($query_time) && $query_time < $running_time_threshold );
    next if ( defined($command)    && $command eq "Binlog Dump" );
    next if ( defined($user)       && $user eq "system user" );
    next
      if ( defined($command)
      && $command eq "Sleep"
      && defined($query_time)
      && $query_time >= 1 );

    if ( $type >= 1 ) {
      next if ( defined($command) && $command eq "Sleep" );
      next if ( defined($command) && $command eq "Connect" );
    }

    if ( $type >= 2 ) {
      next if ( defined($info) && $info =~ m/^select/i );
      next if ( defined($info) && $info =~ m/^show/i );
    }

    push @threads, $ref;
}

sub main {
  if ( $command eq "stop" ) {
    ## Gracefully killing connections on the current master
    # 1. Set read_only= 1 on the new master
    # 2. DROP USER so that no app user can establish new connections
    # 3. Set read_only= 1 on the current master
    # 4. Kill current queries
    # * Any database access failure will result in script die.
    my $exit_code = 1;
    eval {
      ## Setting read_only=1 on the new master (to avoid accident)
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error(die_on_error)_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );
      print current_time_us() . " Set read_only on the new master.. ";
      $new_master_handler->enable_read_only();
      if ( $new_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }
      # Connecting to the orig master, die if any database error happens
      my $orig_master_handler = new MHA::DBHelper();
      $orig_master_handler->connect( $orig_master_ip, $orig_master_port,
        $orig_master_user, $orig_master_password, 1 );

      #$orig_master_handler->disable_log_bin_local();
      #print current_time_us() . " Drpping app user on the orig master..\n";
      ## Waiting for N * 100 milliseconds so that current connections can exit
      my $time_until_read_only = 15;
      $_tstart = [gettimeofday];
      my @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_read_only > 0 && $#threads >= 0 ) {
        if ( $time_until_read_only % 5 == 0 ) {
          printf
"%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_read_only * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_read_only--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

      print current_time_us() . " Set read_only=1 on the orig master.. ";
      $orig_master_handler->enable_read_only();
      if ( $orig_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }

      my $time_until_kill_threads = 5;
      @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
        if ( $time_until_kill_threads % 5 == 0 ) {
          printf
"%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        $time_until_kill_threads--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }



                print "Disabling the VIP on old master: $orig_master_host \n";
                &stop_vip();


      ## Terminating all threads
      print current_time_us() . " Killing all application threads..\n";
      $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
      print current_time_us() . " done.\n";
      #$orig_master_handler->enable_log_bin_local();
      $orig_master_handler->disconnect();

      ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
      $exit_code = 0;
    };
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {
    ## Activating master ip on the new master
    # 1. Create app user with write privileges
    # 2. Moving backup script if needed
    # 3. Register new master's ip to the catalog database

# If exit code is 0 or 10, MHA does not abort
    my $exit_code = 10;
    eval {
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );

      ## Set read_only=0 on the new master
      #$new_master_handler->disable_log_bin_local();
      print current_time_us() . " Set read_only=0 on the new master.\n";
      $new_master_handler->disable_read_only();

      ## Creating an app user on the new master
      #print current_time_us() . " Creating app user on the new master..\n";
      #FIXME_xxx_create_app_user($new_master_handler);
      #$new_master_handler->enable_log_bin_local();
      $new_master_handler->disconnect();
                $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {

    # do nothing
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

# A simple system call that enable the VIP on the new master 
sub start_vip() {
    `ssh $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
    `ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
  print
"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --orig_master_user=user --orig_master_password=password --orig_master_ssh_user=sshuser --new_master_host=host --new_master_ip=ip --new_master_port=port --new_master_user=user --new_master_password=password --new_master_ssh_user=sshuser \n";
  die;
}

2.修改MHA配置文件,调用master_ip_online_change脚本

[root@db-17 ~]# vim /etc/mha/app1.cnf
[server default]
.......
master_ip_online_change_script=/usr/local/bin/master_ip_online_change

3.停止MHA

[root@db-17 ~]# masterha_stop --conf=/etc/mha/app1.cnf
Stopped app1 successfully.

4.使用masterha_master_switch脚本对主从关系进行热切换,VIP也会随之转移;之前MHA重新上线后10.0.0.27为主库,此时将主库切换为10.0.0.17

[root@db-17 ~]# masterha_master_switch --conf=/etc/mha/app1.cnf --master_state=alive --new_master_host=10.0.0.17 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000

5.重构binlog server,由于主库发生切换,需要对binlog server进行重构

[root@db-17 ~]# pkill mysqlbinlog
[root@db-17 ~]# rm -rf /data/binlog_server/*

#在从库上查看同步的主库binlog文件
[root@db-17 ~]# mysql -h10.0.0.37 -umha -p1 -e 'show slave status \G' | grep 'Master_Log'
mysql: [Warning] Using a password on the command line interface can be insecure.
              Master_Log_File: mysql_bin.000013
          Read_Master_Log_Pos: 194
        Relay_Master_Log_File: mysql_bin.000013
          Exec_Master_Log_Pos: 194

#拉取主库的binlog日志
[root@db-17 ~]# cd /data/binlog_server/
[root@db-17 /data/binlog_server]# nohup mysqlbinlog -R --host=10.0.0.17 --user=mha --password=1 --raw --stop-never mysql_bin.000013 &> /dev/null &

[root@db-17 /data/binlog_server]# ll
total 4
-rw-r----- 1 root root 194 Apr  2 17:51 mysql_bin.000013

6.启动MHA

[root@db-17 ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &