keepalived: configuration walkthrough, installation, and pitfalls

Contents:

I. Background

II. Official configuration reference

III. Case study

IV. Other configuration variants

V. Pitfalls

Appendix 1: Installing keepalived

------------------------

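I. Background

The experiments and explanations below were all verified by testing. For the precise meaning of each parameter, the official documentation is still the best reference.

Problems this setup solves:

1. When one node (a Linux machine) goes down, both VIPs float to the surviving node.

2. When that node comes back, the service needs some time to become ready, so the floating IP should not move back immediately.

3. When the service on a node (the business process running on the machine) dies, the node should give up the VIP on its own.

4. When the service on that node recovers, it should not reclaim the VIP immediately; it takes over again only after a transition period.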

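II. Official configuration reference

Overview:

How keepalived works internally is not covered here. Seen from the configuration file, it splits roughly into two parts: the global part and the VRRP instance part. The global part, as the name suggests, holds settings that apply to the whole daemon.

The VRRP instance part:

First, keepalived manages floating IPs by creating VRRP instances; each instance can be thought of as one connection that speaks the VRRP protocol.

In this setup one instance corresponds to one VIP, and one machine can be configured with several instances, that is, take part in the election for several VIPs.

Next, a pair of machines with matching VRRP instance configuration pair up because their instances match.

Finally, the two negotiate which one is master and which is backup, and therefore which one holds the VIP.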

1. Global configuration

1. Predefine a script and how it is run, so that VRRP instances can reference it later

vrrp_script <SCRIPT_NAME> {

    # path of the script, or the script text itself
    script <STRING>|<QUOTED-STRING>

    # seconds between script invocations
    interval <INTEGER>

    # seconds after which a script that has not returned is considered failed
    timeout <INTEGER>

    # adjust priority by this weight (default: 0). For a description of reverse, see track_script.
    # 'weight 0 reverse' will cause the vrrp instance to be down when the script is up, and vice versa.
    weight <INTEGER:-253..253> [reverse]

    # required number of successes for OK transition
    rise <INTEGER>

    # required number of failures for KO transition
    fall <INTEGER>

    # user (and optionally group) to run the script as
    user USERNAME [GROUPNAME]

    # assume the script is in the failed state initially
    init_fail

}
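To make rise and fall more concrete, here is a small shell sketch of how consecutive script results could flip a tracked script between OK and KO. It is a simplified model written for this article, not keepalived's actual code: the function name track_state is made up, and the model assumes the script starts out in the OK state (no init_fail).

```shell
# track_state RISE FALL RESULT...
# Simplified model (an assumption, not keepalived's source) of how
# consecutive script results flip a tracked script between OK and KO.
# RESULT is the exit status of one run: 0 = success, non-zero = failure.
# The model starts in OK (no init_fail) and prints the final state.
track_state() {
    local rise=$1 fall=$2 state=OK streak=0 rc
    shift 2
    for rc in "$@"; do
        if [ "$state" = OK ]; then
            # in OK: count consecutive failures
            if [ "$rc" -ne 0 ]; then streak=$((streak + 1)); else streak=0; fi
            if [ "$streak" -ge "$fall" ]; then state=KO; streak=0; fi
        else
            # in KO: count consecutive successes
            if [ "$rc" -eq 0 ]; then streak=$((streak + 1)); else streak=0; fi
            if [ "$streak" -ge "$rise" ]; then state=OK; streak=0; fi
        fi
    done
    echo "$state"
}

track_state 1 2 1 1      # two failures in a row with fall 2: prints KO
track_state 1 2 1 0 1    # the failures are not consecutive: prints OK
```

With interval 30 and fall 2, as in the case study, this means it can take roughly two check intervals before a dead service is reflected in the priority.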

2. VRRP instance

# Ignore VRRP interface faults (default unset)
dont_track_primary # Meaning: faults on the instance's own interface are ignored. Without it, keepalived checks the link, enters the FAULT state when the link goes down, and gives up all floating IPs.

# optional, monitor these as well. go to FAULT state if any of these go down if unweighted.
# When a weight is specified in track_interface, instead of setting the vrrp instance to the FAULT state in case of failure, its priority will be
# increased by the weight when the interface is up (for positive weights), or decreased by the weight's absolute value when the interface is down
# (for negative weights), unless reverse is specified, in which case the direction of adjustment of the priority is reversed.
# The weight must be comprised between -253 and +253 inclusive. 0 is the default behaviour which means that a failure implies a
# FAULT state. The common practice is to use positive weights to count a limited number of good services so that the server with the highest count
# becomes master. Negative weights are better to count unexpected failures among a high number of interfaces, as it will not saturate even with high
# number of interfaces. Use reverse to increase priority if an interface is down
track_interface {
    eth0
    eth1
    eth2 weight <-253..253> [reverse]
    ...
}

# 1 to 255 used to differentiate multiple instances of vrrpd running on the same NIC (and hence same socket).
virtual_router_id 51 # Distinguishes VRRP instances. Both peers of one instance must use the same value; other instances on the same network segment, whether on this machine or another, need a different one.

preempt_delay 300 # Meaning: I am currently BACKUP and see that the master's priority is lower than mine. Instead of preempting at once, I wait 300 seconds (5 minutes) first.

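Using weight, rise and fall together

A positive weight means that <rise> successes will add <weight> to the priority of all VRRP instances which monitor it.

On the opposite, a negative weight will be subtracted from the initial priority in case of <fall> failures

Explanation: rise pairs with a positive weight: once the script has succeeded (exit status 0) rise times in a row, the priority is raised by weight.

fall pairs with a negative weight: once the script has failed (non-zero exit status) fall times in a row, the priority is lowered by |weight|.

The adjustment is not permanent. A positive weight is taken away again after fall consecutive failures, and a negative weight is given back after rise consecutive successes. In neither case does the script push the instance into the FAULT state.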
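The arithmetic can be sketched in a few lines of shell. This is only an illustration of the rule quoted above: effective_priority is a made-up helper, it handles a single tracked script, and it ignores any clamping keepalived applies to the result.

```shell
# effective_priority BASE WEIGHT SCRIPT_OK
# BASE:      the priority configured in the vrrp_instance
# WEIGHT:    the weight of the tracked vrrp_script
# SCRIPT_OK: 1 if the script is currently in the OK state, 0 if KO
effective_priority() {
    local base=$1 weight=$2 ok=$3 prio=$1
    if [ "$weight" -gt 0 ] && [ "$ok" -eq 1 ]; then
        prio=$((base + weight))      # positive weight: bonus while OK
    elif [ "$weight" -lt 0 ] && [ "$ok" -eq 0 ]; then
        prio=$((base + weight))      # negative weight: penalty while KO
    fi
    echo "$prio"
}

effective_priority 110 -20 1   # service healthy: prints 110
effective_priority 110 -20 0   # service down:    prints 90
```

With the numbers used in the case study (priority 110, weight -20, peer priority 100), a failed check drops the node to 90, below the peer, which is what lets the peer take over.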

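III. Case study

Node 1:

Summary: my configured state is BACKUP, but because my priority is higher I am the one actually in charge. When I find that the service on my node has died, I lower my priority and let the configured MASTER take over. When my priority comes back up I do not seize the VIP at once; I wait for a while first.

vrrp_script chkBackup {

    ## Check that the process exists; if it does, check connectivity. Return 0 if both pass, 1 if the process is missing or connectivity fails.
    script "ps -fe|grep tranproxy |grep -v grep; [[ $? -eq 0 ]] && (/usr/local/bin/x.out; [[ $? -eq 0 ]] && exit 0 || exit 1) || exit 1"

    interval 30

    fall 2 ## demote only after 2 KOs: two consecutive non-zero results lower the priority by 20

    weight -20

    user root

}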

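vrrp_instance VI_1 {

    state BACKUP

    # The interface VRRP packets are sent on; a dedicated pair of interfaces can serve as the heartbeat link. Careful: many blog posts copied from one another claim this is the interface the VIP gets bound to, which is misleading.
    interface eno2

    # Packets leave through eno2 as configured above; this sets the source IP they carry, if a different one is wanted
    unicast_src_ip 182.168.1.30

    unicast_peer {
        182.168.1.245
    }

    # This one matters too. The heartbeat link is usually a direct cable between the two nodes. If the peer loses power (it has to be a real power loss), the heartbeat interface on this node goes link-DOWN, keepalived enters the FAULT state, and all VIPs are given up. dont_track_primary prevents that.
    dont_track_primary

    virtual_ipaddress {
        ## The interface the VIP is really bound to is set here; if dev is omitted, the VIP goes onto the interface named in "interface" above
        192.168.1.33/24 brd 192.168.1.255 dev eno1 label eno1:1
    }

    virtual_router_id 1

    priority 110 ## higher priority, so in practice I am in charge

    track_script {
        chkBackup # once my service is found dead my priority drops, and the MASTER takes over right away
    }

    preempt_delay 300 ## on seeing a master with a lower priority, wait 5 minutes before taking over

}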
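The inline check in chkBackup is hard to read, so one option is to move it into a script file and point vrrp_script at that file by absolute path. The sketch below is an assumption about how that might look: check_backup is a made-up name, and pgrep is swapped in for the ps/grep pipeline. The process name tranproxy and the probe /usr/local/bin/x.out come from the configuration above.

```shell
# check_backup PROC PROBE
# Return 0 only if a process named PROC is running and the PROBE
# command exits with status 0; return 1 in every other case.
check_backup() {
    pgrep -x "$1" > /dev/null 2>&1 || return 1   # process must exist
    "$2" > /dev/null 2>&1 || return 1            # connectivity probe must pass
    return 0
}

# In a script file the last line would be, for example:
#   check_backup tranproxy /usr/local/bin/x.out
```

keepalived only looks at the exit status, so the function's return value becomes the OK/KO result once it is the last command in the script.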

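Node 2:

Summary: my configured state is MASTER, but because my priority is lower, the peer is the one actually in charge. When the service on the peer dies, the peer lowers its priority and I take over.

I should take over immediately in that case (not confirmed; still to be checked on the test environment).

Global configuration on node 2 (node 1 is similar); this one is used for the walkthrough

global_defs {

    notification_email {
        wuxiaoyun@huanxingnet
    }

    notification_email_from Alexandre.Cassen@firewall.loc

    smtp_server 127.0.0.1

    smtp_connect_timeout 30

    router_id k-two2-fst-hx ## should be unique within a LAN; the hostname is the usual choice. wxy: our company runs several test environments whose hostnames are all the same, so better not to use the hostname directly

    script_user root

    enable_script_security

}

Instance configuration on node 2; one of the instances is used for the walkthrough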

vrrp_instance VI_1 {

state MASTER

interface eno2

unicast_src_ip 182.168.1.30

unicast_peer {

182.168.1.245

}

virtual_router_id 1 ## virtual router ID; the two vrrp instances of a pair share one router ID (I did not look further into its exact meaning)

priority 100

advert_int 1

authentication {

auth_type PASS

auth_pass 11111

}

virtual_ipaddress {

192.168.1.33/24 brd 192.168.1.255 dev eno1 label eno1:0

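}

}

Note: in a packet capture of the VRRP exchange, the packets travel between the 182.x addresses (eno2) while the VIP they advertise belongs to the 192.x network (eno1).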
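One way to look at this exchange yourself is to capture VRRP traffic on the heartbeat interface. VRRP is IP protocol 112, so a pcap filter on that protocol number is enough. The helper name vrrp_filter below is made up for this sketch; the interface and peer address are the ones from the case study.

```shell
# vrrp_filter [PEER]
# Print a pcap filter that matches VRRP (IP protocol 112),
# optionally limited to traffic to or from PEER.
vrrp_filter() {
    if [ -n "${1:-}" ]; then
        echo "ip proto 112 and host $1"
    else
        echo "ip proto 112"
    fi
}

# Run as root on the heartbeat interface, for example:
#   tcpdump -i eno2 -nn "$(vrrp_filter 182.168.1.245)"
```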

IV. Other configuration variants

1. Not specifying which interface the VIP is bound to

vrrp_instance VI_1 {

state MASTER

interface eth0

virtual_ipaddress {

192.168.48.232

}

}

In this case ifconfig does not show the address, because it was added without a label; use ip a instead

[root@k8s-master1-192-168-48-231 keepalived]# ifconfig eth0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500

inet 192.168.48.231 netmask 255.255.255.0 broadcast 192.168.48.255

inet6 fe80::1e63:e31:eb50:4005 prefixlen 64 scopeid 0x20<link>

inet6 fe80::2a6e:d4ff:fe88:c80e prefixlen 64 scopeid 0x20<link>

ether 28:6e:d4:88:c8:0e txqueuelen 1000 (Ethernet)

...

[root@k8s-master1-192-168-48-231 keepalived]# ip a |grep eth0 -A5

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000

link/ether 28:6e:d4:88:c8:0e brd ff:ff:ff:ff:ff:ff

inet 192.168.48.231/24 brd 192.168.48.255 scope global eth0

valid_lft forever preferred_lft forever

inet 192.168.48.232/32 scope global eth0

valid_lft forever preferred_lft forever

inet6 fe80::2a6e:d4ff:fe88:c80e/64 scope link

...
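For scripts that need to know whether this node currently holds the VIP, parsing the one-line output of ip is more reliable than ifconfig. The two helpers below are a sketch with made-up names (list_ipv4, has_vip); they read the output of ip -o -4 addr show on standard input.

```shell
# list_ipv4: read `ip -o -4 addr show` output on stdin and print one
# address/prefix per line (the field that follows the word "inet").
list_ipv4() {
    awk '$3 == "inet" { print $4 }'
}

# has_vip VIP: succeed if VIP (given without prefix length) is among
# the addresses found on stdin.
has_vip() {
    list_ipv4 | cut -d/ -f1 | grep -qxF "$1"
}

# Example:
#   ip -o -4 addr show dev eth0 | has_vip 192.168.48.232 && echo "VIP is here"
```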

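------------------------------------------------------- Divider: what follows covers installation and the pitfalls hit along the way. These are brief notes and far from complete. -------------------------------------------------------

V. Pitfalls

Pitfall 1: problems you may hit when writing the check script: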

vrrp_script chkBackup {

script "./keepalived_script.sh 172.18.1.10"

interval 10

fall 2 ## demote only after 2 KOs

weight -20

user root

}

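Error 1: Disabling track script chkBackup since not found/accessible

Cause: a relative path does not work here; use an absolute path instead:

script "/etc/keepalived/keepalived_script.sh 172.18.1.10"

Error 2: Error exec-ing command '/etc/keepalived/keepalived_script.sh', error 8: Exec format error

Running the script by hand works fine.

Cause: the manual run was # bash /etc/keepalived/keepalived_script.sh 172.18.1.10, which names the interpreter explicitly, whereas keepalived executes the file directly.

So the script must start with the line: #!/bin/bash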
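For reference, a script that avoids both errors might look like the sketch below. The original contents of keepalived_script.sh are not shown in this article, so the body is an assumption: check_peer is a made-up function that pings the address passed as the first argument. The file also needs to be executable (chmod +x).

```shell
#!/bin/bash
# Sketch of /etc/keepalived/keepalived_script.sh (contents are assumed).
# The first line must be a valid interpreter path; without it the kernel
# cannot exec the file and keepalived logs "Exec format error".

# check_peer IP
# Return 0 if IP answers one ping within 1 second, 1 if it does not,
# and 2 if no IP was given.
check_peer() {
    [ -n "${1:-}" ] || return 2
    ping -c 1 -W 1 "$1" > /dev/null 2>&1 || return 1
    return 0
}

# In the real script the last line would be:
#   check_peer "$1"
```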

Error 3: the local node does not get the VIP; the log shows

Keepalived_vrrp[1884]: Assigned address 182.168.1.245 for interface enp5s0

Aug 20 11:37:31 one1-fst-hx Keepalived_vrrp[1884]: Assigned address fe80::fafd:41aa:f8d4:c6a4 for interface enp5s0

Aug 20 11:37:31 one1-fst-hx Keepalived_vrrp[1884]: (VI_1) entering FAULT state

Aug 20 11:37:31 one1-fst-hx Keepalived_vrrp[1884]: (VI_2) entering FAULT state

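Analysis: this was puzzling. The state should be either MASTER or BACKUP, so why FAULT?

Cause 1: a network problem: an interface that a VIP is to be bound to is not available, as shown below

Details:

virtual_ipaddress {

192.168.1.51/24 brd 192.168.1.255 dev eno1 label eno1:0 --- to be bound to eno1

192.168.2.51/24 brd 192.168.2.255 dev ens1f0 label ens1f0:0 --- to be bound to ens1f0

}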

[root@two2-asm-hx keepalived]# ip link

2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000 ----- bound interface 1

link/ether ac:1f:6b:d6:0d:ac brd ff:ff:ff:ff:ff:ff

3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000 --- heartbeat interface

link/ether ac:1f:6b:d6:0d:ad brd ff:ff:ff:ff:ff:ff

4: ens1f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000 --- bound interface 2 (link is down)

link/ether 00:1b:21:bf:5c:3c brd ff:ff:ff:ff:ff:ff

9⽉ 24 22:30:02 two2-asm-hx Keepalived_vrrp[22859]: Opening file '/etc/f'.

9⽉ 24 22:30:02 two2-asm-hx Keepalived_vrrp[22859]: Assigned address 182.168.1.184 for interface eno2

9⽉ 24 22:30:02 two2-asm-hx Keepalived_vrrp[22859]: (VI_1) entering FAULT state

9⽉ 24 22:30:02 two2-asm-hx Keepalived_vrrp[22859]: (VI_2) entering FAULT state

9⽉ 24 22:30:02 two2-asm-hx Keepalived_vrrp[22859]: Registering gratuitous ARP shared channel

9⽉ 24 22:30:02 two2-asm-hx Keepalived_vrrp[22859]: (VI_1) removing VIPs.

9⽉ 24 22:30:02 two2-asm-hx Keepalived_vrrp[22859]: (VI_2) removing VIPs.

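Summary: because not all of the bound interfaces were up, keepalived concluded that this machine was faulty, so it stepped down and did not hold the VIPs.

Fix: make sure every interface you rely on is up. Whether track_interface could handle this through configuration is unclear: a quick test did not work, but I did not test it thoroughly.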
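Before starting keepalived it can help to list which interfaces are currently down. The helper below is a sketch with a made-up name (down_interfaces); it reads the one-line output of ip -o link show on standard input and prints the names of interfaces reported as state DOWN.

```shell
# down_interfaces: read `ip -o link show` output on stdin and print the
# name of every interface whose line contains "state DOWN".
down_interfaces() {
    awk '/state DOWN/ { sub(/:$/, "", $2); print $2 }'
}

# Example:
#   ip -o link show | down_interfaces
```

If an interface named in virtual_ipaddress (dev ...) shows up in that list, the FAULT state described above is the likely result.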

Cause 2: the heartbeat interface is down

9⽉ 24 20:07:37 two2-asm-hx Keepalived_vrrp[14273]: Netlink reports eno2 down ----- because the heartbeat interface went down

9⽉ 24 20:07:38 two2-asm-hx Keepalived_vrrp[14273]: Netlink reports ens1f0 down

9⽉ 24 20:07:38 two2-asm-hx Keepalived_vrrp[14273]: (VI_1) Entering FAULT STATE

9⽉ 24 20:07:38 two2-asm-hx Keepalived_vrrp[14273]: (VI_1) sent 0 priority

9⽉ 24 20:07:38 two2-asm-hx Keepalived_vrrp[14273]: (VI_1) removing VIPs.

9⽉ 24 20:07:38 two2-asm-hx Keepalived_vrrp[14273]: (VI_2) Entering FAULT STATE

9⽉ 24 20:07:38 two2-asm-hx Keepalived_vrrp[14273]: (VI_2) sent 0 priority

9⽉ 24 20:07:38 two2-asm-hx Keepalived_vrrp[14273]: (VI_2) removing VIPs

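Detail 1: why would the heartbeat interface go down? One scenario: the heartbeat link is a direct cable, so when the other end loses power, the link on this end shows DOWN as well.

Detail 2:

9⽉ 24 22:10:42 two2-asm-hx Keepalived_vrrp[12568]: Netlink reports eno2 down ---- the link is found to be down

9⽉ 24 22:10:46 two2-asm-hx Keepalived_vrrp[12568]: Deassigned address 182.168.1.184 from interface eno2 --- the IP address on the heartbeat interface is removed

9⽉ 24 22:11:04 two2-asm-hx Keepalived_vrrp[12568]: Netlink reports eno2 up --- the link is found to be up again

9⽉ 24 22:11:04 two2-asm-hx Keepalived_vrrp[12568]: Assigned address 182.168.1.184 for interface eno2 -- and the address is added back

Summary: in this situation ARP can no longer be sent out. The behaviour can be changed by adding the option dont_track_primary.

With that option set, the interface going down is still detected and logged, but the floating IPs are left unchanged.

wxy: the "removal" of the IP here is described from keepalived's point of view. Open question: once the link goes down, would the kernel drop the IP anyway, even without keepalived?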

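Pitfall 2: startup failure

[root@89 sbin]# ./opensipsctl start

INFO: Starting OpenSIPS :

ERROR: PID file /var/run/opensips.pid does not exist -- OpenSIPS start failed

Cause 1: after a lot of experimenting, it turned out that this is simply what happens in debug mode; with debug turned off it starts fine.

Cause 2: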

tail -f /var/log/messages

Sep 24 21:06:16 mail ./opensips[66657]: ERROR:db_mysql:db_mysql_connect: driver error(1045): Access denied for user 'opensips'@'localhost' (using password: YES)

Sep 24 21:06:16 mail ./opensips[66657]: ERROR:db_mysql:db_mysql_new_connection: initial connect failed

Sep 24 21:06:16 mail ./opensips[66657]: ERROR:core:db_do_init: could not add connection to the pool

Sep 24 21:06:16 mail ./opensips[66657]: ERROR:uri:mod_init: Could not connect to database

Sep 24 21:06:16 mail ./opensips[66657]: ERROR:core:init_mod: failed to initialize module uri

Sep 24 21:06:16 mail ./opensips[66657]: ERROR:core:main: error while initializing modules

Sep 24 21:06:16 mail ./opensips[66657]: INFO:core:cleanup: cleanup

Sep 24 21:06:16 mail ./opensips[66657]: NOTICE:core:main:

Sep 24 21:06:16 mail opensips: INFO:core:daemonize: pre-daemon process exiting with -1

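It turned out the database had not been created, or had been created incorrectly, because the reference document I followed had it wrong.

Pitfall 3: client connections time out

Troubleshooting: at first I captured only UDP traffic and saw registration requests from the client with no reply, so I assumed the OpenSIPS installation was broken and went through reinstalling and various other attempts.

Then it occurred to me that I should capture without a filter.

Resolution: the full capture showed that there was a reply after all, an ICMP packet: host unreachable, host administratively prohibited

That pointed to iptables: although the firewall service had been stopped, the rules were in fact still in effect. So I added

# iptables -t filter -I INPUT -p udp --dport 5060 -j ACCEPT

and the problem was solved.

Or: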

systemctl stop iptables.service

systemctl disable iptables.service

/usr/local/opensips/sbin/opensipsctl start
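If the rule is added from a script that may run more than once, it is worth checking for it first so it does not pile up. The sketch below uses iptables -C (check) before -I (insert). The function name ensure_udp_accept and the IPTABLES override variable are made up for this example; the override only exists so the logic can be tried without root.

```shell
# ensure_udp_accept PORT
# Insert an ACCEPT rule for inbound UDP on PORT unless an identical rule
# already exists. Needs root when run against the real iptables.
# IPTABLES may be set to another command name (for example a stub).
ensure_udp_accept() {
    local ipt=${IPTABLES:-iptables}
    if ! "$ipt" -C INPUT -p udp --dport "$1" -j ACCEPT 2>/dev/null; then
        "$ipt" -I INPUT -p udp --dport "$1" -j ACCEPT
    fi
}

# Example (as root):
#   ensure_udp_accept 5060
```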

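Pitfall 4: for any other failure, first check whether the firewall is turned off

If the firewall was still on when the reply binding was created, nothing can be sent out.

Turning the firewall off afterwards does not help; sending still fails.

So turn the firewall off before configuring UDP.

Pitfall 5: IPv6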

virtual_ipaddress {

192.168.1.160/24 brd 192.168.1.255 dev eno1 label eno1:1

1::161/64 dev eno1 label eno1:3

}

Nov  2 10:35:54 one1-asm-hx Keepalived_vrrp[17901]: (Line 54) Cannot specify label for IPv6 addresses (1::162/64) - ignoring label

Nov  2 10:35:54 one1-asm-hx Keepalived_vrrp[17901]: (Line 54) (VI_1): address family must match VRRP instance [1::162/64] - ignoring

Nov  2 10:35:54 one1-asm-hx Keepalived_vrrp[17901]: (Line 79) Cannot specify label for IPv6 addresses (1::161/64) - ignoring label

Nov  2 10:35:54 one1-asm-hx Keepalived_vrrp[17901]: (Line 79) (VI_2): address family must match VRRP instance [1::161/64] - ignoring

virtual_ipaddress {
