Redis Cluster is the official clustering support that Redis has shipped since version 3.0. It is fairly simple to use; the basic steps are below.

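Scenario:

  • Start with 6 instances, deployed as 3 masters and 3 slaves
  • Operate on data and look at how it is distributed
  • Then add a new master
  • Operate on data and assign slots
  • Add a new slave
  • Simulate a master going down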

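Setting up the Redis instances for a basic cluster

Because the experiment uses a single physical machine, the Redis instances are distinguished by port (7000~7005).

Each port gets its own directory, which holds the configuration file, the RDB file, and the nodes.conf file.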

mkdir 7000 7001 7002 7003 7004 7005

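Put the corresponding redis.conf in each directory (if you do not have one, download it from the official site and modify it as needed).

When setting up a cluster, pay attention to the following settings: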

# run the server in the background
daemonize yes
# each instance needs its own port
port 7000
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
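
Writing six nearly identical files by hand is error-prone, so the settings above can also be generated per port. The following is a minimal sketch, not part of the original setup: the function name write_node_conf and the scratch directory are made up for illustration, and only the five directives listed above are written.

```bash
#!/usr/bin/env bash
# Write a minimal cluster-mode redis.conf for one port under a base directory.
# redis.conf comments must sit on their own line, so none are placed after a
# directive.
write_node_conf() {
  local base=$1 port=$2
  mkdir -p "$base/$port"
  cat > "$base/$port/redis.conf" <<EOF
# run the server in the background
daemonize yes
port $port
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
EOF
}

# Demo: generate configs for ports 7000-7005 in a scratch directory.
demo_dir=$(mktemp -d)
for port in 7000 7001 7002 7003 7004 7005; do
  write_node_conf "$demo_dir" "$port"
done
cat "$demo_dir/7003/redis.conf"
```

To use it for the layout in this walkthrough, pass the working directory instead of the scratch directory, for example write_node_conf . 7000.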

The resulting directory layout is:

➜  $  tree .
.
├── 7000
│   └── redis.conf
├── 7001
│   └── redis.conf
├── 7002
│   └── redis.conf
├── 7003
│   └── redis.conf
├── 7004
│   └── redis.conf
└── 7005
    └── redis.conf
6 directories, 6 files

To start Redis, you can write a bash script that launches all instances at once:

➜  $  cat runcluster.sh
# start one redis-server per port directory (7000-7005)
for (( i = 0; i < 6; ++i )); do
  cd 700$i && redis-server redis.conf && cd ..
done
➜ $ ./runcluster.sh

You can now check that the Redis instances are running:

➜  $  ps -ef | grep redis
501 85482 1 0 3:10PM ?? 0:00.02 redis-server *:7000 [cluster]
501 85484 1 0 3:10PM ?? 0:00.02 redis-server *:7001 [cluster]
501 85486 1 0 3:10PM ?? 0:00.02 redis-server *:7002 [cluster]
501 85488 1 0 3:10PM ?? 0:00.02 redis-server *:7003 [cluster]
501 85490 1 0 3:10PM ?? 0:00.02 redis-server *:7004 [cluster]
501 85492 1 0 3:10PM ?? 0:00.02 redis-server *:7005 [cluster]

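Creating the cluster

Redis itself does not create the cluster directly. Cluster operations go through redis-trib.rb, a small official tool written in Ruby. It ships in the Redis source tree and can also be downloaded separately from the official site.

redis-trib provides the common cluster operations: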

➜  $  ../redis-trib.rb help
Usage: redis-trib <command> <options> <arguments ...>

create host1:port1 ... hostN:portN
--replicas <arg>
check host:port
info host:port
fix host:port
--timeout <arg>
reshard host:port
--from <arg>
--to <arg>
--slots <arg>
--yes
--timeout <arg>
--pipeline <arg>
rebalance host:port
--weight <arg>
--auto-weights
--threshold <arg>
--use-empty-masters
--timeout <arg>
--simulate
--pipeline <arg>
add-node new_host:new_port existing_host:existing_port
--slave
--master-id <arg>
del-node host:port node_id
set-timeout host:port milliseconds
call host:port command arg arg .. arg
import host:port
--from <arg>
--copy
--replace
help (show this help)

For check, fix, reshard, del-node, set-timeout you can specify the host and port of any working node in the cluster.

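To create the cluster, join all the running Redis instances together as shown below. Note that the IP you specify when creating the cluster must be the IP that clients will later use: when a client operation requires Redis Cluster to redirect, the cluster returns this IP, and if it is a loopback address such as 127.0.0.1, clients may be unable to reach the node.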

➜  $  ../redis-trib.rb create --replicas 1 192.168.1.80:7000 192.168.1.80:7001 192.168.1.80:7002 192.168.1.80:7003 192.168.1.80:7004 192.168.1.80:7005
>>> Creating cluster
>>> Performing hash slots allocation on 6 nodes...
Using 3 masters:
192.168.1.80:7000
192.168.1.80:7001
192.168.1.80:7002
Adding replica 192.168.1.80:7003 to 192.168.1.80:7000
Adding replica 192.168.1.80:7004 to 192.168.1.80:7001
Adding replica 192.168.1.80:7005 to 192.168.1.80:7002
M: f822e285d00305a3158408185abf838639b3db3e 192.168.1.80:7000
slots:0-5460 (5461 slots) master
M: 943e0d3987a4bdfa35229cc46ecc5e0c4c64c861 192.168.1.80:7001
slots:5461-10922 (5462 slots) master
M: bc86c1677869e05b6119139a838b06181f8378a2 192.168.1.80:7002
slots:10923-16383 (5461 slots) master
S: 42a80e0c660a65d6fd3132baf9787a1ced899cee 192.168.1.80:7003
replicates f822e285d00305a3158408185abf838639b3db3e
S: 87cd77a57f5074a01117243516b0dbfb4fd725fd 192.168.1.80:7004
replicates 943e0d3987a4bdfa35229cc46ecc5e0c4c64c861
S: 030674361cdc4dad424ae70115b2002908b40fe4 192.168.1.80:7005
replicates bc86c1677869e05b6119139a838b06181f8378a2
Can I set the above configuration? (type 'yes' to accept):

Note that the proposed slot allocation is shown during execution. If it looks right, type yes to continue:

>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join......
>>> Performing Cluster Check (using node 192.168.1.80:7000)
M: f822e285d00305a3158408185abf838639b3db3e 192.168.1.80:7000
slots:0-5460 (5461 slots) master
M: 943e0d3987a4bdfa35229cc46ecc5e0c4c64c861 192.168.1.80:7001
slots:5461-10922 (5462 slots) master
M: bc86c1677869e05b6119139a838b06181f8378a2 192.168.1.80:7002
slots:10923-16383 (5461 slots) master
M: 42a80e0c660a65d6fd3132baf9787a1ced899cee 192.168.1.80:7003
slots: (0 slots) master
replicates f822e285d00305a3158408185abf838639b3db3e
M: 87cd77a57f5074a01117243516b0dbfb4fd725fd 192.168.1.80:7004
slots: (0 slots) master
replicates 943e0d3987a4bdfa35229cc46ecc5e0c4c64c861
M: 030674361cdc4dad424ae70115b2002908b40fe4 192.168.1.80:7005
slots: (0 slots) master
replicates bc86c1677869e05b6119139a838b06181f8378a2
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

The cluster is now formed (the output above contains the cluster information and the slot allocation).
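
The three master ranges in the output (0-5460, 5461-10922, 10923-16383) are what you get from splitting the 16384 slots as evenly as possible. The sketch below is one rounding scheme that reproduces those ranges; it is an illustration and is not claimed to be the exact code redis-trib runs. The function name slot_ranges is made up.

```bash
#!/usr/bin/env bash
# Print one "first-last" range per master, splitting 16384 slots evenly.
# Each range ends at round((i + 1) * 16384 / n) - 1, computed with integers.
slot_ranges() {
  local n=$1 total=16384 first=0 last i
  for (( i = 0; i < n; i++ )); do
    if (( i == n - 1 )); then
      last=$(( total - 1 ))   # the last master takes whatever remains
    else
      last=$(( ((i + 1) * total * 2 + n) / (2 * n) - 1 ))
    fi
    echo "$first-$last"
    first=$(( last + 1 ))
  done
}

slot_ranges 3
```

With 3 masters this prints the same three ranges as the create output, which is why the middle master ends up with 5462 slots and the other two with 5461.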

If you look at the directory structure now, you will find newly generated dump.rdb and nodes.conf files:

➜  $  tree
.
├── 7000
│   ├── dump.rdb
│   ├── nodes.conf
│   └── redis.conf
├── 7001
│   ├── dump.rdb
│   ├── nodes.conf
│   └── redis.conf
├── 7002
│   ├── dump.rdb
│   ├── nodes.conf
│   └── redis.conf
├── 7003
│   ├── dump.rdb
│   ├── nodes.conf
│   └── redis.conf
├── 7004
│   ├── dump.rdb
│   ├── nodes.conf
│   └── redis.conf
├── 7005
│   ├── dump.rdb
│   ├── nodes.conf
│   └── redis.conf
└── runcluster.sh

6 directories, 19 files

nodes.conf holds the concrete description of the cluster, for example:

➜  $  cat 7002/nodes.conf
42a80e0c660a65d6fd3132baf9787a1ced899cee 192.168.1.80:7003 slave f822e285d00305a3158408185abf838639b3db3e 0 1453937827236 4 connected
943e0d3987a4bdfa35229cc46ecc5e0c4c64c861 192.168.1.80:7001 master - 0 1453937829255 2 connected 5461-10922
87cd77a57f5074a01117243516b0dbfb4fd725fd 192.168.1.80:7004 slave 943e0d3987a4bdfa35229cc46ecc5e0c4c64c861 0 1453937828246 5 connected
030674361cdc4dad424ae70115b2002908b40fe4 192.168.1.80:7005 slave bc86c1677869e05b6119139a838b06181f8378a2 0 1453937825215 6 connected
bc86c1677869e05b6119139a838b06181f8378a2 192.168.1.80:7002 myself,master - 0 0 3 connected 10923-16383
f822e285d00305a3158408185abf838639b3db3e 192.168.1.80:7000 master - 0 1453937826226 1 connected 0-5460
vars currentEpoch 6 lastVoteEpoch 0

Here 7000/7001/7002 are the masters, and 7003/7004/7005 are their respective slaves.
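
The master and slave pairing can be read off nodes.conf mechanically: field 1 is the node ID, field 2 the address, field 3 the flags, and field 4 the ID of the master a slave replicates. A small sketch under that reading of the lines shown above; the function name replica_map is made up, and the sample input is copied from the 7002/nodes.conf listing.

```bash
#!/usr/bin/env bash
# Read nodes.conf-style lines on stdin and print "slave -> master" addresses.
replica_map() {
  awk '
    # node lines have at least 8 fields; the trailing "vars ..." line does not
    NF >= 8 { addr[$1] = $2; flags[$1] = $3; master[$1] = $4 }
    END {
      for (id in flags)
        if (flags[id] ~ /slave/)
          print addr[id] " -> " addr[master[id]]
    }
  ' | sort
}

replica_map <<'EOF'
42a80e0c660a65d6fd3132baf9787a1ced899cee 192.168.1.80:7003 slave f822e285d00305a3158408185abf838639b3db3e 0 1453937827236 4 connected
943e0d3987a4bdfa35229cc46ecc5e0c4c64c861 192.168.1.80:7001 master - 0 1453937829255 2 connected 5461-10922
87cd77a57f5074a01117243516b0dbfb4fd725fd 192.168.1.80:7004 slave 943e0d3987a4bdfa35229cc46ecc5e0c4c64c861 0 1453937828246 5 connected
030674361cdc4dad424ae70115b2002908b40fe4 192.168.1.80:7005 slave bc86c1677869e05b6119139a838b06181f8378a2 0 1453937825215 6 connected
bc86c1677869e05b6119139a838b06181f8378a2 192.168.1.80:7002 myself,master - 0 0 3 connected 10923-16383
f822e285d00305a3158408185abf838639b3db3e 192.168.1.80:7000 master - 0 1453937826226 1 connected 0-5460
vars currentEpoch 6 lastVoteEpoch 0
EOF
```

The same function can be fed the output of redis-cli ... cluster nodes, since it uses the same line format.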

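Basic cluster test

You can connect to the cluster with redis-cli for a simple test. When connecting, remember to pass -c to indicate that you are connecting to a cluster: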

➜  ~  redis-cli --help
redis-cli 3.0.6

Usage: redis-cli [OPTIONS] [cmd [arg [arg ...]]]
-h <hostname> Server hostname (default: 127.0.0.1).
-p <port> Server port (default: 6379).
-c Enable cluster mode (follow -ASK and -MOVED redirections).
.
.
.

Test a simple set command:

➜  ~  redis-cli -c -h 192.168.1.80 -p 7002
192.168.1.80:7002> set hello world
-> Redirected to slot [866] located at 192.168.1.80:7000
OK
192.168.1.80:7000> set hello2 world2
-> Redirected to slot [7486] located at 192.168.1.80:7001
OK
192.168.1.80:7001> set hello3 world3
-> Redirected to slot [3359] located at 192.168.1.80:7000
OK
192.168.1.80:7000> set hello4 world4
-> Redirected to slot [15864] located at 192.168.1.80:7002
OK
192.168.1.80:7002>

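Note that after you set a key, if Redis Cluster computes that the key's slot does not belong to the current instance, it issues a redirect to the instance that should hold the key, and the instance that redis-cli is connected to changes to that new instance. This shows that Redis Cluster makes the client follow redirects rather than proxying requests on its behalf.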
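
The slot numbers in those redirects come from hashing the key: Redis Cluster maps a key to CRC16(key) mod 16384, using the XMODEM variant of CRC16. The following bash sketch reproduces that computation so the redirects can be predicted offline. It assumes plain ASCII keys and ignores the {hash tag} rule; the function names crc16_xmodem and key_slot are made up for illustration.

```bash
#!/usr/bin/env bash
# CRC16 (XMODEM): polynomial 0x1021, initial value 0, no bit reflection.
crc16_xmodem() {
  local s=$1 crc=0 i j byte
  for (( i = 0; i < ${#s}; i++ )); do
    printf -v byte '%d' "'${s:i:1}"      # ASCII code of the i-th character
    crc=$(( crc ^ (byte << 8) ))
    for (( j = 0; j < 8; j++ )); do
      if (( crc & 0x8000 )); then
        crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
      else
        crc=$(( (crc << 1) & 0xFFFF ))
      fi
    done
  done
  echo "$crc"
}

# Hash slot of a key: CRC16 modulo the 16384 slots of the cluster.
key_slot() {
  echo $(( $(crc16_xmodem "$1") % 16384 ))
}

key_slot hello   # prints 866, the slot in the first redirect above
```

On a live node, the built-in CLUSTER KEYSLOT command returns the same number without any client-side arithmetic.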

As another test, generate 10 consecutive keys and see how they are distributed across the cluster:

➜  $  cat initkeys.sh
for (( i = 0; i < 6; i++ )); do
redis-cli -c -h 192.168.1.80 -p 700$i flushdb
done
for (( i = 0; i < 10; i++ )); do
echo "set foo${i}:"
redis-cli -c -h 192.168.1.80 -p 7000 set foo$i bar
done
➜ $ ./initkeys.sh

Now look at how the generated keys are distributed:

➜  $  cat listkeys.sh
for (( i = 0; i < 8; ++i )); do
echo "redis at port: 700${i}"
redis-cli -c -h 192.168.1.80 -p 700$i keys "*"
done
➜ $ ./listkeys.sh
redis at port: 7000
1) "foo3"
2) "foo7"
3) "foo6"
4) "foo2"
redis at port: 7001
1) "foo4"
2) "foo8"
3) "foo0"
redis at port: 7002
1) "foo1"
2) "foo9"
3) "foo5"
redis at port: 7003
1) "foo7"
2) "foo3"
3) "foo2"
4) "foo6"
redis at port: 7004
1) "foo4"
2) "foo8"
3) "foo0"
redis at port: 7005
1) "foo1"
2) "foo9"
3) "foo5"

Note that the keys on each slave match those on its master.

Adding nodes

Create two new Redis instances, 7006 and 7007, in the same way as before.

Then run redis-trib.rb add-node to add the 7006 instance to the cluster:

➜  $  redis-server 7006/redis.conf
➜ $ ../redis-trib.rb add-node 192.168.1.80:7006 192.168.1.80:7000
>>> Adding node 192.168.1.80:7006 to cluster 192.168.1.80:7000
>>> Performing Cluster Check (using node 192.168.1.80:7000)
M: f822e285d00305a3158408185abf838639b3db3e 192.168.1.80:7000
slots:0-5460 (5461 slots) master
1 additional replica(s)
S: 87cd77a57f5074a01117243516b0dbfb4fd725fd 192.168.1.80:7004
slots: (0 slots) slave
replicates 943e0d3987a4bdfa35229cc46ecc5e0c4c64c861
M: 943e0d3987a4bdfa35229cc46ecc5e0c4c64c861 192.168.1.80:7001
slots:5461-10922 (5462 slots) master
1 additional replica(s)
M: bc86c1677869e05b6119139a838b06181f8378a2 192.168.1.80:7002
slots:10923-16383 (5461 slots) master
1 additional replica(s)
S: 030674361cdc4dad424ae70115b2002908b40fe4 192.168.1.80:7005
slots: (0 slots) slave
replicates bc86c1677869e05b6119139a838b06181f8378a2
S: 42a80e0c660a65d6fd3132baf9787a1ced899cee 192.168.1.80:7003
slots: (0 slots) slave
replicates f822e285d00305a3158408185abf838639b3db3e
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 192.168.1.80:7006 to make it join the cluster.
[OK] New node added correctly.

The new node has joined the cluster. Next, check the cluster information:

➜  $  redis-cli -c -h 192.168.1.80 -p 7000 cluster nodes
f822e285d00305a3158408185abf838639b3db3e 192.168.1.80:7000 myself,master - 0 0 1 connected 0-5460
ee016572fb68ff192276fd8dd83ab1429a534b43 192.168.1.80:7006 master - 0 1453941587053 0 connected
87cd77a57f5074a01117243516b0dbfb4fd725fd 192.168.1.80:7004 slave 943e0d3987a4bdfa35229cc46ecc5e0c4c64c861 0 1453941592097 5 connected
943e0d3987a4bdfa35229cc46ecc5e0c4c64c861 192.168.1.80:7001 master - 0 1453941591592 2 connected 5461-10922
bc86c1677869e05b6119139a838b06181f8378a2 192.168.1.80:7002 master - 0 1453941591089 3 connected 10923-16383
030674361cdc4dad424ae70115b2002908b40fe4 192.168.1.80:7005 slave bc86c1677869e05b6119139a838b06181f8378a2 0 1453941590083 6 connected
42a80e0c660a65d6fd3132baf9787a1ced899cee 192.168.1.80:7003 slave f822e285d00305a3158408185abf838639b3db3e 0 1453941589074 4 connected

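Note that 7006 is now a master but has 0 slots assigned, so no keys will ever be placed on it. The cluster has to be resharded to reassign slots before any keys can land on 7006.

The reshard syntax for Redis Cluster is: ./redis-trib.rb reshard --from <node-id> --to <node-id> --slots <arg> --yes <host>:<port>. You can also leave out the from and to nodes and assign slots through the interactive prompts, for example: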

➜  $  ../src/redis-trib.rb reshard 192.168.1.80:7000
>>> Performing Cluster Check (using node 192.168.1.80:7000)
M: f822e285d00305a3158408185abf838639b3db3e 192.168.1.80:7000
slots:1666-5460 (3795 slots) master
1 additional replica(s)
M: ee016572fb68ff192276fd8dd83ab1429a534b43 192.168.1.80:7006
slots: (0 slots) master
0 additional replica(s)
S: 87cd77a57f5074a01117243516b0dbfb4fd725fd 192.168.1.80:7004
slots: (0 slots) slave
replicates 943e0d3987a4bdfa35229cc46ecc5e0c4c64c861
M: 943e0d3987a4bdfa35229cc46ecc5e0c4c64c861 192.168.1.80:7001
slots:7128-10922 (3795 slots) master
1 additional replica(s)
M: bc86c1677869e05b6119139a838b06181f8378a2 192.168.1.80:7002
slots:12589-16383 (3795 slots) master
1 additional replica(s)
S: 030674361cdc4dad424ae70115b2002908b40fe4 192.168.1.80:7005
slots: (0 slots) slave
replicates bc86c1677869e05b6119139a838b06181f8378a2
S: 42a80e0c660a65d6fd3132baf9787a1ced899cee 192.168.1.80:7003
slots: (0 slots) slave
replicates f822e285d00305a3158408185abf838639b3db3e
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 5000
What is the receiving node ID? ee016572fb68ff192276fd8dd83ab1429a534b43

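The terminal prompts you for how many slots to move and which instance should receive them. After you answer, it shows every slot being migrated from the old nodes to the new node, for example: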

Moving slot 12587 from 192.168.1.80:7002 to 192.168.1.80:7006:
Moving slot 12588 from 192.168.1.80:7002 to 192.168.1.80:7006:
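
For scripted use, the prompts can be skipped by supplying the --from, --to, --slots and --yes options listed in the redis-trib help output. The sketch below only prints such a command rather than running it. It places the options before host:port, as the create and add-node invocations in this walkthrough do; the function name reshard_cmd and the slot count of 1000 are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Print (do not run) a non-interactive reshard command.
# Arguments: source node ID, target node ID, slot count, any working node.
reshard_cmd() {
  local from=$1 to=$2 slots=$3 node=$4
  printf './redis-trib.rb reshard --from %s --to %s --slots %s --yes %s\n' \
    "$from" "$to" "$slots" "$node"
}

# Example: move 1000 slots from the 7000 master to the 7006 master.
reshard_cmd f822e285d00305a3158408185abf838639b3db3e \
            ee016572fb68ff192276fd8dd83ab1429a534b43 \
            1000 192.168.1.80:7000
```

Printing first makes it easy to review the node IDs before any slots actually move.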

Now check the cluster information:

➜  $  redis-cli -c -h 192.168.1.80 -p 7000 cluster nodes
f822e285d00305a3158408185abf838639b3db3e 192.168.1.80:7000 myself,master - 0 0 1 connected 1666-5460
ee016572fb68ff192276fd8dd83ab1429a534b43 192.168.1.80:7006 master - 0 1453942147886 7 connected 0-1665 5461-7127 10923-12588
87cd77a57f5074a01117243516b0dbfb4fd725fd 192.168.1.80:7004 slave 943e0d3987a4bdfa35229cc46ecc5e0c4c64c861 0 1453942144357 5 connected
943e0d3987a4bdfa35229cc46ecc5e0c4c64c861 192.168.1.80:7001 master - 0 1453942150409 2 connected 7128-10922
bc86c1677869e05b6119139a838b06181f8378a2 192.168.1.80:7002 master - 0 1453942149401 3 connected 12589-16383
030674361cdc4dad424ae70115b2002908b40fe4 192.168.1.80:7005 slave bc86c1677869e05b6119139a838b06181f8378a2 0 1453942148391 6 connected
42a80e0c660a65d6fd3132baf9787a1ced899cee 192.168.1.80:7003 slave f822e285d00305a3158408185abf838639b3db3e 0 1453942147381 4 connected

You can see that there are now 4 masters and 3 slaves. 7006 has slots assigned, but it does not yet have a slave.
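
To confirm how many slots each master ended up with, the ranges at the end of each cluster nodes line can be summed. A minimal sketch, assuming the line format shown above (slot ranges start at field 9); the function name slot_counts is made up, and the sample input is the master lines from the listing above.

```bash
#!/usr/bin/env bash
# Read "cluster nodes" output on stdin; print "address slot-count" per master.
slot_counts() {
  awk '
    $3 ~ /master/ {
      n = 0
      for (i = 9; i <= NF; i++) {
        if ($i ~ /^\[/) continue          # skip slots that are mid-migration
        split($i, r, "-")
        n += (r[2] == "" ? 1 : r[2] - r[1] + 1)
      }
      print $2, n
    }
  '
}

slot_counts <<'EOF'
f822e285d00305a3158408185abf838639b3db3e 192.168.1.80:7000 myself,master - 0 0 1 connected 1666-5460
ee016572fb68ff192276fd8dd83ab1429a534b43 192.168.1.80:7006 master - 0 1453942147886 7 connected 0-1665 5461-7127 10923-12588
943e0d3987a4bdfa35229cc46ecc5e0c4c64c861 192.168.1.80:7001 master - 0 1453942150409 2 connected 7128-10922
bc86c1677869e05b6119139a838b06181f8378a2 192.168.1.80:7002 master - 0 1453942149401 3 connected 12589-16383
EOF
```

For this input the three original masters come out at 3795 slots each and 7006 at 4999, which together cover all 16384 slots.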

Checking the key distribution again shows that some keys have moved and 7006 now holds keys as well:

➜  $  ./listkeys.sh
redis at port: 7000
1) "foo3"
2) "foo7"
redis at port: 7001
1) "foo4"
2) "foo8"
3) "foo0"
redis at port: 7002
1) "foo1"
2) "foo9"
3) "foo5"
redis at port: 7003
1) "foo7"
2) "foo3"
redis at port: 7004
1) "foo4"
2) "foo8"
3) "foo0"
redis at port: 7005
1) "foo1"
2) "foo9"
3) "foo5"
redis at port: 7006
1) "foo6"
2) "foo2"

Now start the 7007 instance and add it as a slave of 7006:

➜  $  redis-server 7007/redis.conf

➜ $ ../redis-trib.rb add-node --slave --master-id ee016572fb68ff192276fd8dd83ab1429a534b43 192.168.1.80:7007 192.168.1.80:7000
>>> Adding node 192.168.1.80:7007 to cluster 192.168.1.80:7000
>>> Performing Cluster Check (using node 192.168.1.80:7000)
M: f822e285d00305a3158408185abf838639b3db3e 192.168.1.80:7000
slots:1666-5460 (3795 slots) master
1 additional replica(s)
M: ee016572fb68ff192276fd8dd83ab1429a534b43 192.168.1.80:7006
slots:0-1665,5461-7127,10923-12588 (4999 slots) master
0 additional replica(s)
S: 87cd77a57f5074a01117243516b0dbfb4fd725fd 192.168.1.80:7004
slots: (0 slots) slave
replicates 943e0d3987a4bdfa35229cc46ecc5e0c4c64c861
M: 943e0d3987a4bdfa35229cc46ecc5e0c4c64c861 192.168.1.80:7001
slots:7128-10922 (3795 slots) master
1 additional replica(s)
M: bc86c1677869e05b6119139a838b06181f8378a2 192.168.1.80:7002
slots:12589-16383 (3795 slots) master
1 additional replica(s)
S: 030674361cdc4dad424ae70115b2002908b40fe4 192.168.1.80:7005
slots: (0 slots) slave
replicates bc86c1677869e05b6119139a838b06181f8378a2
S: 42a80e0c660a65d6fd3132baf9787a1ced899cee 192.168.1.80:7003
slots: (0 slots) slave
replicates f822e285d00305a3158408185abf838639b3db3e
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 192.168.1.80:7007 to make it join the cluster.
Waiting for the cluster to join.
>>> Configure node as replica of 192.168.1.80:7006.
[OK] New node added correctly.

Taking an existing node offline and bringing it back

We simulate a Redis instance going down by killing the process for 7006:

➜  $  ps -ef | grep '[r]edis-server .*:7006' | awk '{print $2}' | xargs kill
➜ $ redis-cli -c -h 192.168.1.80 -p 7000 cluster nodes
f822e285d00305a3158408185abf838639b3db3e 192.168.1.80:7000 myself,master - 0 0 1 connected 1666-5460
ee016572fb68ff192276fd8dd83ab1429a534b43 192.168.1.80:7006 master,fail - 1453942479213 1453942473057 7 disconnected
87cd77a57f5074a01117243516b0dbfb4fd725fd 192.168.1.80:7004 slave 943e0d3987a4bdfa35229cc46ecc5e0c4c64c861 0 1453942501726 5 connected
943e0d3987a4bdfa35229cc46ecc5e0c4c64c861 192.168.1.80:7001 master - 0 1453942498379 2 connected 7128-10922
bc86c1677869e05b6119139a838b06181f8378a2 192.168.1.80:7002 master - 0 1453942503450 3 connected 12589-16383
030674361cdc4dad424ae70115b2002908b40fe4 192.168.1.80:7005 slave bc86c1677869e05b6119139a838b06181f8378a2 0 1453942501420 6 connected
42a80e0c660a65d6fd3132baf9787a1ced899cee 192.168.1.80:7003 slave f822e285d00305a3158408185abf838639b3db3e 0 1453942502436 4 connected
261efedb181985a24731c1f17353a379c86efe5b 192.168.1.80:7007 master - 0 1453942500405 8 connected 0-1665 5461-7127 10923-12588

At this point 7007 (formerly the slave of 7006) has been promoted to master automatically, and 7006 is shown in the fail state.

Restart 7006:

➜  $  redis-server 7006/redis.conf
➜ $ redis-cli -c -h 192.168.1.80 -p 7000 cluster nodes
f822e285d00305a3158408185abf838639b3db3e 192.168.1.80:7000 myself,master - 0 0 1 connected 1666-5460
ee016572fb68ff192276fd8dd83ab1429a534b43 192.168.1.80:7006 slave 261efedb181985a24731c1f17353a379c86efe5b 0 1453942543558 8 connected
87cd77a57f5074a01117243516b0dbfb4fd725fd 192.168.1.80:7004 slave 943e0d3987a4bdfa35229cc46ecc5e0c4c64c861 0 1453942544465 5 connected
943e0d3987a4bdfa35229cc46ecc5e0c4c64c861 192.168.1.80:7001 master - 0 1453942542034 2 connected 7128-10922
bc86c1677869e05b6119139a838b06181f8378a2 192.168.1.80:7002 master - 0 1453942541015 3 connected 12589-16383
030674361cdc4dad424ae70115b2002908b40fe4 192.168.1.80:7005 slave bc86c1677869e05b6119139a838b06181f8378a2 0 1453942543457 6 connected
42a80e0c660a65d6fd3132baf9787a1ced899cee 192.168.1.80:7003 slave f822e285d00305a3158408185abf838639b3db3e 0 1453942546079 4 connected
261efedb181985a24731c1f17353a379c86efe5b 192.168.1.80:7007 master - 0 1453942545070 8 connected 0-1665 5461-7127 10923-12588

7006 now comes back as a slave of 7007.