keepalived: configuration explained, installation, and pitfalls
Contents:
I. Background
II. Official configuration reference
III. Worked example
IV. Other configuration variants
V. Pitfalls
Appendix 1: Installing keepalived
------------------------
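I. Background
Everything below was verified by experiment, so it is accurate!! That said, if you want the real meaning of every parameter, read the official documentation.
Problems this setup solves:
1. When one node (Linux machine) goes down, both VIPs float over to the other node.
2. When that node recovers, its services take a while to come back, so the floating IPs should not drift back immediately.
3. If the service on a node (the business process running on the machine) dies, the node must give up its VIP on its own.
4. When that node's service is healthy again, the node must not grab the VIP back immediately, but only after a transition period.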
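II. Official configuration reference
Overview:
VRRP instances:
First, keepalived manages floating IPs by creating VRRP instances; a VRRP instance can be thought of as one connection (speaking the VRRP protocol).
In this setup each instance carries one VIP (an instance's virtual_ipaddress block can list several), and one machine can run several VRRP instances, i.e. contend for several VIPs;
Next, two machines with matching VRRP instance configuration pair up because their instances match;
Finally, they negotiate who is master and who is backup, i.e. who holds the VIP.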
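The negotiation in the last step follows the VRRP election rule (RFC 3768 / RFC 5798): the highest priority wins, and on equal priority the node with the higher primary IP address wins. A minimal Python model of just that rule, ignoring timers, preemption and authentication; the node names and 10.0.0.x addresses are hypothetical:

```python
from __future__ import annotations

import ipaddress


def elect_master(nodes: dict[str, tuple[int, str]]) -> str:
    """Return the name of the node that wins a VRRP election.

    nodes maps node name -> (current priority, primary IP address).
    Highest priority wins; on equal priority the higher primary IP wins.
    """
    def rank(name: str) -> tuple[int, int]:
        priority, ip = nodes[name]
        return priority, int(ipaddress.ip_address(ip))

    return max(nodes, key=rank)


# Priority 110 beats 100, even if the 100 node is configured with state MASTER.
print(elect_master({"node1": (110, "10.0.0.1"), "node2": (100, "10.0.0.2")}))  # prints node1
# Equal priority: the higher primary address wins.
print(elect_master({"node1": (100, "10.0.0.1"), "node2": (100, "10.0.0.2")}))  # prints node2
```

This is why `state MASTER` / `state BACKUP` in the config only sets the starting state: with preemption enabled, priority decides who ends up holding the VIP.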
1. Global (top-level) configuration
1. Predefine a script and how it is run, for VRRP instances to reference later
vrrp_script <SCRIPT_NAME> {
# path of the script, or the script (command) itself
script <STRING>|<QUOTED-STRING>
# how often to run the script, in seconds
interval <INTEGER>
# if the script has not returned after this many seconds, it times out and counts as failed
timeout <INTEGER>
# adjust priority by this weight, (default: 0). For description of reverse, see track_script.
# 'weight 0 reverse' will cause the vrrp instance to be down when the script is up, and vice versa.
weight <INTEGER:-253..253> [reverse]
# required number of successes for OK transition
rise <INTEGER>
# required number of failures for KO transition
fall <INTEGER>
# user (and optionally group) to run the script as
user USERNAME [GROUPNAME]
# assume the script starts in the failed state
init_fail
}
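How a single run of the script above is judged can be modelled in a few lines: it counts as a success only if it exits with status 0 before `timeout` expires. A minimal Python sketch of that reading of `script`/`timeout`; `run_check` is a hypothetical helper, not keepalived code, and note that keepalived itself disables a script it cannot find or access (see pitfall 1) instead of counting it as failed:

```python
import subprocess
import sys


def run_check(cmd: list, timeout: float) -> bool:
    """One run of a health check: success only if the command exits
    with status 0 within `timeout` seconds."""
    try:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False  # no result in time: counted as failed (subprocess.run kills the child)
    except OSError:
        return False  # could not be executed at all (missing file, no shebang, ...)
    return result.returncode == 0


print(run_check([sys.executable, "-c", "raise SystemExit(0)"], timeout=5))            # True
print(run_check([sys.executable, "-c", "raise SystemExit(1)"], timeout=5))            # False
print(run_check([sys.executable, "-c", "import time; time.sleep(10)"], timeout=0.5))  # False
```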
2. VRRP instance configuration
# Ignore VRRP interface faults (default unset)
dont_track_primary # meaning: ignore faults on the interface. Without it, keepalived checks the link itself, and when it finds the link down it enters FAULT state and gives up all floating IPs
# optional, monitor these as well. go to FAULT state if any of these go down if unweighted.
# When a weight is specified in track_interface, instead of setting the vrrp instance to the FAULT state in case of failure, its priority will be
# increased by the weight when the interface is up (for positive weights), or decreased by the weight's absolute value when the interface is down
# (for negative weights), unless reverse is specified, in which case the direction of adjustment of the priority is reversed.
# The weight must be comprised between -253 and +253 inclusive. 0 is the default behaviour which means that a failure implies a
# FAULT state. The common practice is to use positive weights to count a limited number of good services so that the server with the highest count
# becomes master. Negative weights are better to count unexpected failures among a high number of interfaces, as it will not saturate even with high
# number of interfaces. Use reverse to increase priority if an interface is down
track_interface {
eth0
eth1
eth2 weight <-253..253> [reverse]
...
}
# 1 to 255 used to differentiate multiple instances of vrrpd running on the same NIC (and hence same socket).
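virtual_router_id 51 # distinguishes VRRP instances. It is scoped to the network segment, not to the machine: both peers of one instance must use the same value, while different instances on the same segment need different values
preempt_delay 300 # meaning: I am currently BACKUP, but I find the peer master is weaker than me (lower priority); I do not preempt immediately, but wait five minutes (300 s) before preempting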
How weight, rise and fall work together
A positive weight means that <rise> successes will add <weight> to the priority of all VRRP instances which monitor it.
On the opposite, a negative weight will be subtracted from the initial priority in case of <fall> failures
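Analysis: rise pairs with a positive weight: after rise consecutive successful runs (exit 0), the priority is raised by weight.
fall pairs with a negative weight: after fall consecutive failed runs (non-zero exit, e.g. 1), the priority is lowered by |weight|.
The sign of weight fixes the direction: a positive weight is never subtracted and a negative weight is never added, whatever rise and fall are set to.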
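The combined behaviour can be sketched as a small state machine. This Python model is an assumption-laden illustration, not keepalived's code: it treats rise/fall as consecutive-result counters for the KO to OK and OK to KO transitions (defaulting to 1), adds a positive weight while OK, applies a negative weight while KO, and clamps the result to 1..254. With weight 0 real keepalived puts the instance into FAULT instead, which this model does not cover.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TrackedScript:
    """Simplified model of one vrrp_script as seen by a vrrp_instance."""
    weight: int          # -253..253, as in the vrrp_script block
    rise: int = 1        # consecutive successes needed for KO -> OK
    fall: int = 1        # consecutive failures needed for OK -> KO
    ok: bool = True      # starting state (init_fail would start it as False)
    _successes: int = 0
    _failures: int = 0

    def record(self, exit_code: int) -> None:
        """Feed one script result (0 = success, anything else = failure)."""
        if exit_code == 0:
            self._successes += 1
            self._failures = 0
            if not self.ok and self._successes >= self.rise:
                self.ok = True
        else:
            self._failures += 1
            self._successes = 0
            if self.ok and self._failures >= self.fall:
                self.ok = False

    def adjustment(self) -> int:
        """Positive weight counts while OK; negative weight counts while KO."""
        if self.weight > 0:
            return self.weight if self.ok else 0
        return self.weight if not self.ok else 0


def effective_priority(base: int, scripts: list[TrackedScript]) -> int:
    """Base priority plus all script adjustments, kept within 1..254."""
    return max(1, min(254, base + sum(s.adjustment() for s in scripts)))


# Node 1 from section III: priority 110, chkBackup with weight -20 and fall 2.
chk = TrackedScript(weight=-20, fall=2)
for code in (1, 1, 0):
    chk.record(code)
    print(effective_priority(110, [chk]))
# prints 110, then 90 (below node 2's 100), then 110 again
```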
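III. Worked example
Node 1:
Summary: I am configured as BACKUP, but because my priority is higher I am the one actually in charge. When I notice that the service on my node has died, I lower my priority and let the real master take over; once my priority is back up, I do not grab control immediately but wait a while before taking over again.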
vrrp_script chkBackup {
##Check whether the process exists; if it does, check connectivity. Return 0 if reachable, 1 if the process is missing or unreachable
script "ps -fe|grep tranproxy |grep -v grep; [[ $? -eq 0 ]] && (/usr/local/bin/x.out; [[ $? -eq 0 ]] && exit 0 || exit 1) || exit 1"
interval 30
fall 2 ##lower the priority only after 2 KOs: two returns of 1 in a row (e.g. the process missing twice) drop the priority by 20
weight -20
user root
}
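The one-liner above can also live in a standalone executable, which avoids quoting problems inside the config file. A minimal Python sketch of the same contract (exit 0 only if the process runs and the probe succeeds); using `pgrep -x tranproxy` assumes procps is installed and that the process name is exactly `tranproxy`, and `/usr/local/bin/x.out` is the probe from the original script:

```python
#!/usr/bin/env python3
"""keepalived health check: exit 0 only if tranproxy runs and the probe succeeds."""
from __future__ import annotations

import subprocess
import sys

# Assumption: procps `pgrep` is installed and the process name is exactly "tranproxy".
PROCESS_CHECK = ["pgrep", "-x", "tranproxy"]
# Connectivity probe taken over from the original one-liner.
PROBE = ["/usr/local/bin/x.out"]


def check(process_cmd: list[str], probe_cmd: list[str]) -> int:
    """Return 0 if both commands succeed, else 1.

    The probe only runs when the process check passed, like the one-liner.
    """
    for cmd in (process_cmd, probe_cmd):
        try:
            status = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                    stderr=subprocess.DEVNULL).returncode
        except OSError:  # command missing or not executable
            return 1
        if status != 0:
            return 1
    return 0


def main() -> None:
    sys.exit(check(PROCESS_CHECK, PROBE))

# When installing it as the keepalived script (chmod +x, absolute path in `script`),
# end the file with:
#     if __name__ == "__main__":
#         main()
```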
vrrp_instance VI_1 {
state BACKUP
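#The interface VRRP packets are sent from; you can pick a dedicated pair of interfaces as the heartbeat link. Pay close attention here: blogs that blindly copy others claim this is the interface the VIP is bound to, which is shameless and plain misleading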
interface eno2
#Packets go out of eno2, but if you want them to carry a different (even a "fake") source IP, set it here
unicast_src_ip 182.168.1.30
unicast_peer {
182.168.1.245
}
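#This one matters too: the heartbeat link is usually a direct cable between master and backup, so once the master loses power (it really has to be a power loss), the heartbeat interface on the backup goes link DOWN, keepalived enters FAULT state and gives up all VIPs
dont_track_primary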
virtual_ipaddress {
##The interface the VIP is really bound to is set here; if you leave out dev, it is bound to the interface given by `interface` above
192.168.1.33/24 brd 192.168.1.255 dev eno1 label eno1:1
}
virtual_router_id 1
priority 110 ##higher priority: I am the one actually in charge
track_script
{
chkBackup #if I find that I have failed, my priority drops at once and the master takes over right away
}
preempt_delay 300 ##on seeing a master with lower priority than mine, I do not take over immediately but after 5 minutes
}
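What `preempt_delay 300` buys node 1 can be pictured as a timeline. A simplified Python model, under the assumption that the wait restarts whenever node 1 is not strictly above the master; the man page describes the delay as counting "after startup or seeing a lower priority master", and keepalived's exact timer handling may differ from this sketch:

```python
from __future__ import annotations


def preempt_time(samples: list[tuple[float, int, int]], delay: float) -> float | None:
    """Return the time at which a BACKUP would take over, or None.

    samples: time-ordered (t, local_priority, master_priority) observations.
    Simplified model (assumption): the local priority must stay strictly above
    the master's for `delay` seconds; any sample where it is not restarts the wait.
    """
    higher_since = None
    for t, local, master in samples:
        if local > master:
            if higher_since is None:
                higher_since = t
            if t - higher_since >= delay:
                return t
        else:
            higher_since = None
    return None


# Node 1 (110) against node 2 (100), one sample per minute, preempt_delay 300.
steady = [(t, 110, 100) for t in range(0, 601, 60)]
print(preempt_time(steady, 300))  # prints 300
# Priority dips to 90 at t=120 (chkBackup in KO), so the wait starts over.
flapping = [(t, 90 if t == 120 else 110, 100) for t in range(0, 601, 60)]
print(preempt_time(flapping, 300))  # prints 480
```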
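Node 2:
Summary: I am configured as MASTER, but because my priority is lower, the peer is the one actually in charge. When the service on the peer node dies, the peer lowers its priority and I start taking over,
and I take over immediately (not sure; remember to check this on a real environment)
Global configuration on node 2 (node 1's is similar); let's take this one as the example: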
global_defs {
notification_email {
wuxiaoyun@huanxingnet
}
notification_email_from Alexandre.Cassen@firewall.loc
smtp_server 127.0.0.1
smtp_connect_timeout 30
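router_id k-two2-fst-hx ##must be unique within a LAN; the hostname is commonly used. wxy: the company may run several test environments whose hostnames are identical, so better not to use the hostname directly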
script_user root
enable_script_security
}
Instance configuration on node 2, taking one of the instances as the example:
vrrp_instance VI_1 {
state MASTER
interface eno2
unicast_src_ip 182.168.1.30
unicast_peer {
182.168.1.245
}
virtual_router_id 1 ##virtual router ID; one pair of VRRP instances shares one router ID (I did not dig further into what exactly it means)
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass 11111
}
virtual_ipaddress {
192.168.1.33/24 brd 192.168.1.255 dev eno1 label eno1:0
}
}
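Two instances only pair up when the settings they exchange agree. A minimal Python sketch of a pre-flight check over two instance definitions written as plain dicts (a hypothetical structure, not something parsed from keepalived.conf); it compares virtual_router_id, advert_int (assumed default 1 s) and auth_pass, and checks that each node lists the other's unicast_src_ip as a peer. The 10.0.0.x addresses are made up:

```python
from __future__ import annotations


def pairing_problems(a: dict, b: dict) -> list[str]:
    """Reasons why two vrrp_instance definitions (plain dicts) would not pair up."""
    problems = []
    for key, default in (("virtual_router_id", None), ("advert_int", 1), ("auth_pass", None)):
        if a.get(key, default) != b.get(key, default):
            problems.append(f"{key} differs")
    if a["unicast_src_ip"] == b["unicast_src_ip"]:
        problems.append("both nodes use the same unicast_src_ip")
    if a["unicast_src_ip"] not in b["unicast_peer"]:
        problems.append("second node does not list the first one as unicast_peer")
    if b["unicast_src_ip"] not in a["unicast_peer"]:
        problems.append("first node does not list the second one as unicast_peer")
    return problems


node_a = {"virtual_router_id": 1, "auth_pass": "11111",
          "unicast_src_ip": "10.0.0.1", "unicast_peer": ["10.0.0.2"]}
node_b = {"virtual_router_id": 1, "auth_pass": "11111",
          "unicast_src_ip": "10.0.0.2", "unicast_peer": ["10.0.0.1"]}
print(pairing_problems(node_a, node_b))  # prints []
```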
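Appendix: the VRRP packet exchange shows that the addresses used are on the 182 network (eno2), while the VIP being exchanged is on the 192 network (eno1)
IV. Other configuration variants collected
1. Not specifying which interface the VIP is bound to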
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_ipaddress {
192.168.48.232
}
}
In this case ifconfig does not show the VIP; you need ip a:
[root@k8s-master1-192-168-48-231 keepalived]# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.48.231 netmask 255.255.255.0 broadcast 192.168.48.255
inet6 fe80::1e63:e31:eb50:4005 prefixlen 64 scopeid 0x20<link>
inet6 fe80::2a6e:d4ff:fe88:c80e prefixlen 64 scopeid 0x20<link>
ether 28:6e:d4:88:c8:0e txqueuelen 1000 (Ethernet)
...
[root@k8s-master1-192-168-48-231 keepalived]# ip a |grep eth0 -A5
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 28:6e:d4:88:c8:0e brd ff:ff:ff:ff:ff:ff
inet 192.168.48.231/24 brd 192.168.48.255 scope global eth0
valid_lft forever preferred_lft forever
inet 192.168.48.232/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::2a6e:d4ff:fe88:c80e/64 scope link
...
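Because ifconfig does not list a VIP added without a label, a script that wants to know whether this node currently holds the VIP has to look at `ip a`. A minimal Python sketch that parses `ip a` text and returns the addresses under one interface; the sample is the output shown above, and the parsing assumes the usual iproute2 layout (unindented "N: name:" header lines, indented "inet"/"inet6" lines):

```python
from __future__ import annotations

import re

HEADER = re.compile(r"^\d+:\s+([^:@\s]+)")  # e.g. "2: eth0: <BROADCAST,...>"


def addresses_on(ip_a_output: str, ifname: str) -> list[str]:
    """Collect the inet/inet6 addresses listed under `ifname` in `ip a` output."""
    current = None
    found = []
    for line in ip_a_output.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        fields = line.split()
        if current == ifname and fields and fields[0] in ("inet", "inet6"):
            found.append(fields[1])
    return found


SAMPLE = """\
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 28:6e:d4:88:c8:0e brd ff:ff:ff:ff:ff:ff
    inet 192.168.48.231/24 brd 192.168.48.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.48.232/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::2a6e:d4ff:fe88:c80e/64 scope link
"""
print(addresses_on(SAMPLE, "eth0"))
# prints ['192.168.48.231/24', '192.168.48.232/32', 'fe80::2a6e:d4ff:fe88:c80e/64']
```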
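------------------------------------------------------- Fancy divider: what follows is installation and the pitfalls hit along the way, briefly noted and full of holes...... ----------------------------------------------------------------------------
V. Pitfalls
Pitfall 1: problems you may hit when writing the script: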
vrrp_script chkBackup {
script "./keepalived_script.sh 172.18.1.10"
interval 10
fall 2 ##lower the priority only after 2 KOs
weight -20
user root
}
Error 1: Disabling track script chkBackup since not found/accessible
Cause: a relative path cannot be used; use an absolute path instead:
script "/etc/keepalived/keepalived_script.sh 172.18.1.10"
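Error 2: Error exec-ing command '/etc/keepalived/keepalived_script.sh', error 8: Exec format error
Running the script by hand works fine.
Cause: by hand it was run as  # bash /etc/keepalived/keepalived_script.sh 172.18.1.10  i.e. interpreted by bash, whereas keepalived executes the file directly.
So the script must start with the line: #!/bin/bash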
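Both errors can be caught before restarting keepalived. A minimal Python sketch of a pre-flight check for a `script` value; `track_script_problems` is a hypothetical helper that only encodes the two causes found above (relative path, and a file that is neither executable with a "#!" line nor a compiled ELF binary):

```python
from __future__ import annotations

import os
import shlex


def track_script_problems(script_value: str) -> list[str]:
    """Check a vrrp_script `script` value for the two mistakes in pitfall 1."""
    args = shlex.split(script_value)
    if not args:
        return ["empty script"]
    path = args[0]
    if not os.path.isabs(path):
        # Error 1: "Disabling track script ... since not found/accessible"
        return [f"{path}: relative path, use an absolute path"]
    if not (os.path.isfile(path) and os.access(path, os.X_OK)):
        return [f"{path}: missing or not executable"]
    with open(path, "rb") as f:
        head = f.read(4)
    # Error 2: without a "#!" line (and not an ELF binary) exec() fails with
    # ENOEXEC, which keepalived reports as "error 8: Exec format error".
    if not head.startswith(b"#!") and head != b"\x7fELF":
        return [f"{path}: no '#!' line, exec fails with 'Exec format error'"]
    return []


print(track_script_problems("./keepalived_script.sh 172.18.1.10"))
# prints ['./keepalived_script.sh: relative path, use an absolute path']
```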
Error 3: the local node did not get the VIP; the log shows:
Keepalived_vrrp[1884]: Assigned address 182.168.1.245 for interface enp5s0
Aug 20 11:37:31 one1-fst-hx Keepalived_vrrp[1884]: Assigned address fe80::fafd:41aa:f8d4:c6a4 for interface enp5s0
Aug 20 11:37:31 one1-fst-hx Keepalived_vrrp[1884]: (VI_1) entering FAULT state
Aug 20 11:37:31 one1-fst-hx Keepalived_vrrp[1884]: (VI_2) entering FAULT state
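Analysis: strange, it should be either MASTER or BACKUP state, so why FAULT?
Cause 1: a network problem: an interface that a VIP is supposed to be bound to is not up, as shown below
Details: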
virtual_ipaddress {
192.168.1.51/24 brd 192.168.1.255 dev eno1 label eno1:0 --- to be bound to eno1
192.168.2.51/24 brd 192.168.2.255 dev ens1f0 label ens1f0:0 --- to be bound to ens1f0
}
[root@two2-asm-hx keepalived]# ip link
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000 ----- I am bound interface 1
link/ether ac:1f:6b:d6:0d:ac brd ff:ff:ff:ff:ff:ff
3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000 --- I am the heartbeat interface
link/ether ac:1f:6b:d6:0d:ad brd ff:ff:ff:ff:ff:ff
4: ens1f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000 --- I am bound interface 2
link/ether 00:1b:21:bf:5c:3c brd ff:ff:ff:ff:ff:ff
Sep 24 22:30:02 two2-asm-hx Keepalived_vrrp[22859]: Opening file '/etc/f'.
Sep 24 22:30:02 two2-asm-hx Keepalived_vrrp[22859]: Assigned address 182.168.1.184 for interface eno2
Sep 24 22:30:02 two2-asm-hx Keepalived_vrrp[22859]: (VI_1) entering FAULT state
Sep 24 22:30:02 two2-asm-hx Keepalived_vrrp[22859]: (VI_2) entering FAULT state
Sep 24 22:30:02 two2-asm-hx Keepalived_vrrp[22859]: Registering gratuitous ARP shared channel
Sep 24 22:30:02 two2-asm-hx Keepalived_vrrp[22859]: (VI_1) removing VIPs.
Sep 24 22:30:02 two2-asm-hx Keepalived_vrrp[22859]: (VI_2) removing VIPs.
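Summary: because not all of the bound interfaces were up, keepalived considered this device faulty, gave up, and did not take the VIP.
Fix: naturally, make sure every interface you need is up. Whether configuring track_interface helps I don't know; a quick test said it does not, but I did not test it thoroughly.
Cause 2: the heartbeat interface is down
Sep 24 20:07:37 two2-asm-hx Keepalived_vrrp[14273]: Netlink reports eno2 down ----- because the heartbeat interface went down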
Sep 24 20:07:38 two2-asm-hx Keepalived_vrrp[14273]: Netlink reports ens1f0 down
Sep 24 20:07:38 two2-asm-hx Keepalived_vrrp[14273]: (VI_1) Entering FAULT STATE
Sep 24 20:07:38 two2-asm-hx Keepalived_vrrp[14273]: (VI_1) sent 0 priority
Sep 24 20:07:38 two2-asm-hx Keepalived_vrrp[14273]: (VI_1) removing VIPs.
Sep 24 20:07:38 two2-asm-hx Keepalived_vrrp[14273]: (VI_2) Entering FAULT STATE
Sep 24 20:07:38 two2-asm-hx Keepalived_vrrp[14273]: (VI_2) sent 0 priority
Sep 24 20:07:38 two2-asm-hx Keepalived_vrrp[14273]: (VI_2) removing VIPs
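Detail 1: why would the heartbeat interface go down? One scenario: the heartbeat link is a direct cable, so when the other end loses power, the local end of the link shows DOWN as well.
Detail 2: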
Sep 24 22:10:42 two2-asm-hx Keepalived_vrrp[12568]: Netlink reports eno2 down ---- once the link is found to be down
Sep 24 22:10:46 two2-asm-hx Keepalived_vrrp[12568]: Deassigned address 182.168.1.184 from interface eno2 --- the IP address on the heartbeat interface is removed
Sep 24 22:11:04 two2-asm-hx Keepalived_vrrp[12568]: Netlink reports eno2 up --- once the link is OK again
Sep 24 22:11:04 two2-asm-hx Keepalived_vrrp[12568]: Assigned address 182.168.1.184 for interface eno2 -- it is added back
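Summary: this means ARP can no longer be sent out. It can be changed by adding the option: dont_track_primary
With it, as the log shows, keepalived still detects that the interface went down, but the floating IP does not change.
wxy: actually, is this so-called IP removal really keepalived's doing? Once the link goes down, wouldn't the kernel remove the IP anyway, even without keepalived?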
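The two causes above boil down to one rule: an instance goes to FAULT when its heartbeat `interface` is down (unless dont_track_primary is set) or when an interface carrying one of its VIPs is down. A minimal Python model of that rule as observed in the logs; the `no_track` argument stands for the per-address `no_track` flag that keepalived 2.x documents for virtual_ipaddress entries, which is an assumption to verify against your version's keepalived.conf man page:

```python
from __future__ import annotations


def instance_state(links_up: dict[str, bool], interface: str, vip_devs: list[str],
                   dont_track_primary: bool = False,
                   no_track: frozenset[str] = frozenset()) -> str:
    """Return "FAULT" or "OK" for one vrrp_instance, following the behaviour seen above."""
    if not dont_track_primary and not links_up[interface]:
        return "FAULT"  # cause 2: heartbeat (interface) link down
    for dev in vip_devs:
        if dev not in no_track and not links_up[dev]:
            return "FAULT"  # cause 1: an interface carrying a VIP is down
    return "OK"


# Link states from the `ip link` output in cause 1: ens1f0 has no carrier.
links = {"eno1": True, "eno2": True, "ens1f0": False}
print(instance_state(links, "eno2", ["eno1", "ens1f0"]))                                  # FAULT
print(instance_state(links, "eno2", ["eno1", "ens1f0"], no_track=frozenset({"ens1f0"})))  # OK
# Cause 2: the heartbeat link drops.
links["eno2"] = False
print(instance_state(links, "eno2", ["eno1"]))                           # FAULT
print(instance_state(links, "eno2", ["eno1"], dont_track_primary=True))  # OK
```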
Pitfall 2: startup fails
[root@89 sbin]# ./opensipsctl start
INFO: Starting OpenSIPS :
ERROR: PID file /var/run/opensips.pid does not exist -- OpenSIPS start failed
Cause 1: after all kinds of experiments it turned out this is simply how debug mode behaves; with debug turned off, it works.
Cause 2:
tail -f /var/log/messages
Sep 24 21:06:16 mail ./opensips[66657]: ERROR:db_mysql:db_mysql_connect: driver error(1045): Access denied for user 'opensips'@'localhost' (using password: YES)
Sep 24 21:06:16 mail ./opensips[66657]: ERROR:db_mysql:db_mysql_new_connection: initial connect failed
Sep 24 21:06:16 mail ./opensips[66657]: ERROR:core:db_do_init: could not add connection to the pool
Sep 24 21:06:16 mail ./opensips[66657]: ERROR:uri:mod_init: Could not connect to database
Sep 24 21:06:16 mail ./opensips[66657]: ERROR:core:init_mod: failed to initialize module uri
Sep 24 21:06:16 mail ./opensips[66657]: ERROR:core:main: error while initializing modules
Sep 24 21:06:16 mail ./opensips[66657]: INFO:core:cleanup: cleanup
Sep 24 21:06:16 mail ./opensips[66657]: NOTICE:core:main:
Sep 24 21:06:16 mail opensips: INFO:core:daemonize: pre-daemon process exiting with -1
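It turned out the database had not been created, or had been created wrongly, precisely because the reference document had a mistake.......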
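Pitfall 3: client connections time out
Diagnosis: at first I only captured UDP packets and saw registration requests from the client going unanswered, so I assumed the OpenSIPS installation was broken and went as far as reinstalling it, among other things.
Then it suddenly occurred to me to capture without a filter.
Solution: the full capture showed there was a reply after all, an ICMP packet: host unreachable, host administratively prohibited.
That almost always means iptables: although the firewall had been turned off, it was in fact still in effect, so I added: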
# iptables -t filter -I INPUT -p udp --dport 5060 -j ACCEPT
Problem solved.
Or:
systemctl stop iptables.service
systemctl disable iptables.service
/usr/local/opensips/sbin/opensipsctl start
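Pitfall 4: for any other failure, first check whether the firewall is off
If the firewall was still on when the response binding was created, nothing gets sent out;
turning the firewall off afterwards does not fix it, it still cannot send;
so turn the firewall off before configuring UDP.
Pitfall 5: IPv6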
virtual_ipaddress {
192.168.1.160/24 brd 192.168.1.255 dev eno1 label eno1:1
1::161/64 dev eno1 label eno1:3
}
Nov 2 10:35:54 one1-asm-hx Keepalived_vrrp[17901]: (Line 54) Cannot specify label for IPv6 addresses (1::162/64) - ignoring label
Nov 2 10:35:54 one1-asm-hx Keepalived_vrrp[17901]: (Line 54) (VI_1): address family must match VRRP instance [1::162/64] - ignoring
Nov 2 10:35:54 one1-asm-hx Keepalived_vrrp[17901]: (Line 79) Cannot specify label for IPv6 addresses (1::161/64) - ignoring label
Nov 2 10:35:54 one1-asm-hx Keepalived_vrrp[17901]: (Line 79) (VI_2): address family must match VRRP instance [1::161/64] - ignoring
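The log states two rules: a VRRP instance carries addresses of one family only, and IPv6 addresses cannot take a label. keepalived has a virtual_ipaddress_excluded block for addresses that follow the instance without being sent in its adverts, commonly used for the other family (a separate IPv6 instance is the alternative). A minimal Python sketch that sorts virtual_ipaddress entries accordingly; `split_vips` is a hypothetical helper and the input lines are the ones from the config above:

```python
from __future__ import annotations

import ipaddress


def split_vips(lines: list[str], instance_family: int = 4) -> tuple[list[str], list[str]]:
    """Split virtual_ipaddress lines into same-family entries and the rest.

    Other-family entries are returned separately (e.g. for virtual_ipaddress_excluded
    or a separate instance); IPv6 entries lose their `label ...`, which keepalived ignores.
    """
    same, other = [], []
    for line in lines:
        tokens = line.split()
        family = ipaddress.ip_interface(tokens[0]).version
        if family == 6 and "label" in tokens:
            i = tokens.index("label")
            del tokens[i:i + 2]  # drop "label <name>"
        (same if family == instance_family else other).append(" ".join(tokens))
    return same, other


vips = ["192.168.1.160/24 brd 192.168.1.255 dev eno1 label eno1:1",
        "1::161/64 dev eno1 label eno1:3"]
same, other = split_vips(vips)
print(same)   # prints ['192.168.1.160/24 brd 192.168.1.255 dev eno1 label eno1:1']
print(other)  # prints ['1::161/64 dev eno1']
```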
virtual_ipaddress {