特色
MySQL/Galera 是一种多主同步集群,但只限于使用 MySQL/InnoDB 引擎,并具有下面特点
同步复制
多个主服务器的拓扑结构
可以在任意节点上进行读写
自动控制成员,自动删除故障节点
自动加入节点
真正给予行级别的并发复制
调度客户连接
优势
参考下面基于 DBMS 集群的解决方法
不存在从服务器角色
不存在事务丢失
读写操作可根据需要进行随意扩展
更少的闩操作
MySQL/Galera 集群使用 Galera 库执行复制,对应 Galera 复制接口,我们需要MySQL 服务器支持 wsrep API 接口
http://www.codership.com/products/mysql-write-set-replication-project
不可以,因为 mysql 中没有支持 wsrep_ 数据复制的参数,当然代码级别上也具有很大差别
工作原理
单纯的 mariadb 无法实现多个主服务器数据同步
多台数据库中数据同步由 wsrep 接口实现
最终目标,实现多个 MySQL 同时读写
wsrep API
wsrep API 是一种数据库插件接口,比较类似一种应用程序,主要针对写复制
该程序主要用于定义应用程序如何调用复制库实现回写
wsrep API 由支持改库的应用程序动态装载
wsrep API 描述下面复制模型,一个应用程序,如数据库当前的一个对象,当前被客户端修改,对象改变导致事务产生一系列的原子性改变, 在集群中所有的节点都具备相同的对象,并由同步复制应用都各自节点,按照相同的顺序产生相同变化从而实现数据同步
标识对象的改变
标识对象自身 ID 最后状态(正常情况下,ID 是连续不中断的)
GTID 包含
一个 UUID 作为对象标识及经历改变的序号,序号会发生连续的改变
GTID 允许比较应用程序状态,建立对象改变的顺序,决定对象的变化是否需要更新 GTID
通常 GTID 会卑记录成下面格式
45eec521-2f34-11e0-0800-2a36050b826b:94530586304
言归正传,我们需要编译 mariadb-mysql 及 galera 插件
galera/mysql 编译步骤
https://downloads.mariadb.org/interstitial/mariadb-galera-5.5.33a/kvm-tarbake-jaunty-x86/mariadb-galera-5.5.33a.tar.gz/from/http://mirrors.scie.in/mariadb
yum install -y cmake
tar xf mariadb-galera-5.5.33a.tar.gz
cd mariadb-5.5.33a/
cmake -LAH
参考 CMakeCache.txt 文件中的配置信息
cmake -DINSTALL_MYSQLDATADIR:STRING=/mdb -DINSTALL_UNIX_ADDRDIR:STRING=/var/run/mysqld/mysql5.socketmake
make install
galera 编译
https://launchpad.net/galera/2.x/23.2.7/+download/galera-23.2.7-src.tar.gz
添加数据源
baseurl=http://mirror.neu.edu.cn/fedora/epel//6Server/x86_64/添加下面软件包
yum erase -y mysql.x86_64 mysql-devel.x86_64 mysql-libs.x86_64yum install -y boost-devel.x86_64 libodb-boost-devel.x86_64 bzr scons
tar xf galera-23.2.7-src.tar.gz
cd galera-23.2.7-src
scons
编译后能生成 libgalera_smm.so
复制编译好的库至下面位置 /usr/local/galera/lib/libgalera_smm.so
mkdir /usr/local/galera/lib -pcp /usr/src/galera-23.2.7-src/libgalera_smm.so /usr/local/galera/lib/libgalera_smm.so
basedir=/usr/local/mysql
big-tables
bind-address=0.0.0.0
character-set-server=utf8
datadir=/mdb
log-error=/var/log/mysqld/mysql5-error.log
socket=/var/run/mysqld/mysql5.socket
pid-file=/var/run/mysqld/mysql5.pid
port=3306
user=mysql
binlog_format = ROW
binlog_cache_size = 1M
character_set_server = utf8
collation_server = utf8_general_ci
default-storage-engine = InnoDB
expire_logs_days = 10
innodb_buffer_pool_size = 300M
innodb_thread_concurrency = 16
innodb_log_buffer_size = 8M
innodb_doublewrite = 1
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 2
server-id = 1
max_connections = 1000
net_buffer_length = 8K
open-files-limit = 65535
wsrep_cluster_address = 'gcomm://192.168.200.163,192.168.200.171,172.18.8.49,172.18.8.50'
wsrep_provider = /usr/local/galera/lib/libgalera_smm.so
wsrep_retry_autocommit = 0
wsrep_sst_method = rsync
wsrep_provider_options="gcache.size=256m; gcache.page_size=256m"
wsrep_slave_threads=16
wsrep_cluster_name='my_cluster'
wsrep_node_name='db5'
wsrep_sst_auth=tt:tt123
初始化数据库
mkdir /mdbcd /usr/local/mysql
./scripts/install_mysql_db --datadir=/mdb
# chkconfig: 2345 64 36
# description: A very fast and reliable SQL database engine.
basedir=/usr/local/mysql
datadir=/mdb
# Default value, in seconds, afterwhich the script should timeout waiting
# for server start.
# Value here is overriden by value in my.cnf.
# 0 means don't wait at all
# Negative numbers mean to wait indefinitely
service_startup_timeout=900
startup_sleep=1
# Lock directory for RedHat / SuSE.
lockdir='/var/lock/subsys'
lock_file_path="$lockdir/mysql"
# The following variables are only set for letting mysql.server find things.
# Set some defaults
mysqld_pid_file_path=/var/run/mysqld/mysql5.pid
if test -z "$basedir"
then
basedir=/usr/local/mysql
bindir=/usr/local/mysql/bin
if test -z "$datadir"
then
datadir=/usr/local/mysql/data
fi
sbindir=/usr/local/mysql/bin
libexecdir=/usr/local/mysql/bin
else
bindir="$basedir/bin"
if test -z "$datadir"
then
datadir="$basedir/data"
fi
sbindir="$basedir/sbin"
if test -f "$basedir/bin/mysqld"
then
libexecdir="$basedir/bin"
else
libexecdir="$basedir/libexec"
fi
fi
# datadir_set is used to determine if datadir was set (and so should be
# *not* set inside of the --basedir= handler.)
datadir_set=
#
# Use LSB init script functions for printing messages, if possible
#
lsb_functions="/lib/lsb/init-functions"
if test -f $lsb_functions ; then
. $lsb_functions
else
log_success_msg()
{
echo " SUCCESS! $@"
}
log_failure_msg()
{
echo " ERROR! $@"
}
fi
PATH="/sbin:/usr/sbin:/bin:/usr/bin:$basedir/bin"
export PATH
mode=$1 # start or stop
[ $# -ge 1 ] && shift
other_args="$*" # uncommon, but needed when called from an RPM upgrade action
# Expected: "--skip-networking --skip-grant-tables"
# They are not checked here, intentionally, as it is the resposibility
# of the "spec" file author to give correct arguments only.
case `echo "testingc"`,`echo -n testing` in
*c*,-n*) echo_n= echo_c= ;;
*c*,*) echo_n=-n echo_c= ;;
*) echo_n= echo_c='c' ;;
esac
parse_server_arguments() {
for arg do
case "$arg" in
--basedir=*) basedir=`echo "$arg" | sed -e 's/^[^=]*=//'`
bindir="$basedir/bin"
if test -z "$datadir_set"; then
datadir="$basedir/data"
fi
sbindir="$basedir/sbin"
if test -f "$basedir/bin/mysqld"
then
libexecdir="$basedir/bin"
else
libexecdir="$basedir/libexec"
fi
libexecdir="$basedir/libexec"
;;
--datadir=*) datadir=`echo "$arg" | sed -e 's/^[^=]*=//'`
datadir_set=1
;;
--pid-file=*) mysqld_pid_file_path=`echo "$arg" | sed -e 's/^[^=]*=//'` ;;
--service-startup-timeout=*) service_startup_timeout=`echo "$arg" | sed -e 's/^[^=]*=//'` ;;
esac
done
}
wait_for_pid () {
verb="$1" # created | removed
pid="$2" # process ID of the program operating on the pid-file
pid_file_path="$3" # path to the PID file.
sst_progress_file=$datadir/sst_in_progress
i=0
avoid_race_condition="by checking again"
while test $i -ne $service_startup_timeout ; do
case "$verb" in
'created')
# wait for a PID-file to pop into existence.
test -s "$pid_file_path" && i='' && break
;;
'removed')
# wait for this PID-file to disappear
test ! -s "$pid_file_path" && i='' && break
;;
*)
echo "wait_for_pid () usage: wait_for_pid created|removed pid pid_file_path"
exit 1
;;
esac
# if server isn't running, then pid-file will never be updated
if test -n "$pid"; then
if kill -0 "$pid" 2>/dev/null; then
: # the server still runs
else
# The server may have exited between the last pid-file check and now.
if test -n "$avoid_race_condition"; then
avoid_race_condition=""
continue # Check again.
fi
# there's nothing that will affect the file.
log_failure_msg "The server quit without updating PID file ($pid_file_path)."
return 1 # not waiting any more.
fi
fi
if test -e $sst_progress_file && [ $startup_sleep -ne 100 ];then
echo $echo_n "SST in progress, setting sleep higher"
startup_sleep=100
fi
echo $echo_n ".$echo_c"
i=`expr $i + 1`
sleep $startup_sleep
done
if test -z "$i" ; then
log_success_msg
return 0
else
log_failure_msg
return 1
fi
}
# Get arguments from the my.cnf file,
# the only group, which is read from now on is [mysqld]
if test -x ./bin/my_print_defaults
then
print_defaults="./bin/my_print_defaults"
elif test -x $bindir/my_print_defaults
then
print_defaults="$bindir/my_print_defaults"
elif test -x $bindir/mysql_print_defaults
then
print_defaults="$bindir/mysql_print_defaults"
else
# Try to find basedir in /etc/my.cnf
conf=/usr/local/mysql/etc/my.cnf
print_defaults=
if test -r $conf
then
subpat='^[^=]*basedir[^=]*=(.*)$'
dirs=`sed -e "/$subpat/!d" -e 's//1/' $conf`
for d in $dirs
do
d=`echo $d | sed -e 's/[ ]//g'`
if test -x "$d/bin/my_print_defaults"
then
print_defaults="$d/bin/my_print_defaults"
break
fi
if test -x "$d/bin/mysql_print_defaults"
then
print_defaults="$d/bin/mysql_print_defaults"
break
fi
done
fi
# Hope it's in the PATH ... but I doubt it
test -z "$print_defaults" && print_defaults="my_print_defaults"
fi
#
# Read defaults file from 'basedir'. If there is no defaults file there
# check if it's in the old (depricated) place (datadir) and read it from there
#
extra_args=""
if test -r "$basedir/my.cnf"
then
extra_args="-e $basedir/my.cnf"
else
if test -r "$datadir/my.cnf"
then
extra_args="-e $datadir/my.cnf"
fi
fi
parse_server_arguments `$print_defaults $extra_args mysqld server mysql_server mysql.server`
#
# Set pid file if not given
#
if test -z "$mysqld_pid_file_path"
then
mysqld_pid_file_path=$datadir/`hostname`.pid
else
case "$mysqld_pid_file_path" in
/* ) ;;
* ) mysqld_pid_file_path="$datadir/$mysqld_pid_file_path" ;;
esac
fi
case "$mode" in
'start')
# Start daemon
# Safeguard (relative paths, core dumps..)
cd $basedir
echo $echo_n "Starting MySQL"
if test -x $bindir/mysqld_safe
then
# Give extra arguments to mysqld with the my.cnf file. This script
# may be overwritten at next upgrade.
$bindir/mysqld_safe --datadir="$datadir" --pid-file="$mysqld_pid_file_path" $other_args >/dev/null 2>&1 &
wait_for_pid created "$!" "$mysqld_pid_file_path"; return_value=$?
# Make lock for RedHat / SuSE
if test -w "$lockdir"
then
touch "$lock_file_path"
fi
exit $return_value
else
log_failure_msg "Couldn't find MySQL server ($bindir/mysqld_safe)"
fi
;;
'stop')
# Stop daemon. We use a signal here to avoid having to know the
# root password.
if test -s "$mysqld_pid_file_path"
then
mysqld_pid=`cat "$mysqld_pid_file_path"`
if (kill -0 $mysqld_pid 2>/dev/null)
then
echo $echo_n "Shutting down MySQL"
kill $mysqld_pid
# mysqld should remove the pid file when it exits, so wait for it.
wait_for_pid removed "$mysqld_pid" "$mysqld_pid_file_path"; return_value=$?
else
log_failure_msg "MySQL server process #$mysqld_pid is not running!"
rm "$mysqld_pid_file_path"
fi
# Delete lock for RedHat / SuSE
if test -f "$lock_file_path"
then
rm -f "$lock_file_path"
fi
exit $return_value
else
log_failure_msg "MySQL server PID file could not be found!"
fi
;;
'restart')
# Stop the service and regardless of whether it was
# running or not, start it again.
if $0 stop $other_args; then
$0 start $other_args
else
log_failure_msg "Failed to stop running server, so refusing to try to start."
exit 1
fi
;;
'reload'|'force-reload')
if test -s "$mysqld_pid_file_path" ; then
read mysqld_pid < "$mysqld_pid_file_path"
kill -HUP $mysqld_pid && log_success_msg "Reloading service MySQL"
touch "$mysqld_pid_file_path"
else
log_failure_msg "MySQL PID file could not be found!"
exit 1
fi
;;
'status')
# First, check to see if pid file exists
if test -s "$mysqld_pid_file_path" ; then
read mysqld_pid < "$mysqld_pid_file_path"
if kill -0 $mysqld_pid 2>/dev/null ; then
log_success_msg "MySQL running ($mysqld_pid)"
exit 0
else
log_failure_msg "MySQL is not running, but PID file exists"
exit 1
fi
else
# Try to find appropriate mysqld process
mysqld_pid=`pidof $libexecdir/mysqld`
# test if multiple pids exist
pid_count=`echo $mysqld_pid | wc -w`
if test $pid_count -gt 1 ; then
log_failure_msg "Multiple MySQL running but PID file could not be found ($mysqld_pid)"
exit 5
elif test -z $mysqld_pid ; then
if test -f "$lock_file_path" ; then
log_failure_msg "MySQL is not running, but lock file ($lock_file_path) exists"
exit 2
fi
log_failure_msg "MySQL is not running"
exit 3
else
log_failure_msg "MySQL is running but PID file could not be found"
exit 4
fi
fi
;;
'configtest')
# Safeguard (relative paths, core dumps..)
cd $basedir
echo $echo_n "Testing MySQL configuration syntax"
daemon=$bindir/mysqld
if test -x $libexecdir/mysqld
then
daemon=$libexecdir/mysqld
elif test -x $sbindir/mysqld
then
daemon=$sbindir/mysqld
elif test -x `which mysqld`
then
daemon=`which mysqld`
else
log_failure_msg "Unable to locate the mysqld binary!"
exit 1
fi
help_out=`$daemon --help 2>&1`; r=$?
if test "$r" != 0 ; then
log_failure_msg "$help_out"
log_failure_msg "There are syntax errors in the server configuration. Please fix them!"
else
log_success_msg "Syntax OK"
fi
exit $r
;;
'bootstrap')
# Bootstrap the cluster, start the first node
# that initiate the cluster
echo $echo_n "Bootstrapping the cluster"
$0 start $other_args --wsrep-new-cluster
;;
*)
# usage
basename=`basename "$0"`
echo "Usage: $basename {start|stop|restart|reload|force-reload|status|configtest|bootstrap} [ MySQL server options ]"
exit 1
;;
esac
exit 0
启动每一台数据库
service mysql5 start
GRANT ALL PRIVILEGES on *.* to tt@'%';
GRANT USAGE ON *.* to tt@'localhost' IDENTIFIED BY 'tt123';
GRANT ALL PRIVILEGES on *.* to tt@'localhost';
flush privileges;
创建并加入集群
集群中第一个节点启动 (192.168.200.163)
创建软链接,并启动集群,集群启动过程中会自动启动 mariadb
cd /usr/local/
./mysql-galera -g gcomm:// start
测试是否成功启动方法, 查询是否会自动启动 4567 端口
[root@db2 local]# netstat -ntlActive Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:4567 0.0.0.0:* LISTEN
登录 mysql 之后,查询当前是否启用 galera 插件
MariaDB [(none)]> show status like 'wsrep_ready';+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wsrep_ready | ON |
+---------------+-------+
1 row in set (0.00 sec)
./mysql-galera stop
第一台 (192.168.200.163) 节点已经启动成功
第二台 (192.168.200.171) 需要加入集群
./mysql-galera -g gcomm://192.168.200.163 start
+--------------------------+-------------------------------------------+
| Variable_name | Value |
+--------------------------+-------------------------------------------+
| wsrep_incoming_addresses | 192.168.200.171:3306,192.168.200.163:3306 |
+--------------------------+-------------------------------------------+
1 row in set (0.00 sec)
第三台 (172.18.8.49) 需要加入集群
cd /usr/local/./mysql-galera -g gcomm://192.168.200.163,192.168.200.171 start
./mysql-galera -g gcomm://192.168.200.163,192.168.200.171,172.18.8.49 start
注: 每次集群启动, 将会启用数据同步机制,令每个集群中的数据同步
假如,集群工作期间,节点 3(172.18.8.49) 脱离集群,重启,发生故障
而脱机期间,节点1,2,4 仍可继续工作
当节点3 重新在线时,加入集群前,将会自动进行数据同步
重新在线方法与上文中加入节点方法一致
另外,假如觉得要定义所有的服务器地址麻烦,可以加入集群时候只定义其中一台的地址,如 gcomm://192.168.200.163 集群也能够自动在加入后添加其他集群 url地址
+----------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------+--------------------------------------+
| wsrep_local_state_uuid | bb5b9e17-66c8-11e3-86ba-96854521d205 | uuid 集群唯一标记
| wsrep_protocol_version | 4 |
| wsrep_last_committed | 16 | sql 提交记录
| wsrep_replicated | 4 | 随着复制发出的次数
| wsrep_replicated_bytes | 692 | 数据复制发出的字节数
| wsrep_received | 18 | 数据复制接收次数
| wsrep_received_bytes | 3070 | 数据复制接收的字节数
| wsrep_local_commits | 4 | 本地执行的 sql
| wsrep_local_cert_failures | 0 | 本地失败事务
| wsrep_local_bf_aborts | 0 |从执行事务过程被本地中断
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 | 本地发出的队列
| wsrep_local_send_queue_avg | 0.142857 | 队列平均时间间隔
| wsrep_local_recv_queue | 0 | 本地接收队列
| wsrep_local_recv_queue_avg | 0.000000 | 本地接收时间间隔
| wsrep_flow_control_paused | 0.000000 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_cert_deps_distance | 0.000000 | 并发数量
| wsrep_apply_oooe | 0.000000 |
| wsrep_apply_oool | 0.000000 |
| wsrep_apply_window | 1.000000 |
| wsrep_commit_oooe | 0.000000 |
| wsrep_commit_oool | 0.000000 |
| wsrep_commit_window | 1.000000 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 0 |
| wsrep_causal_reads | 0 |
| wsrep_incoming_addresses | 172.18.8.50:3306,172.18.8.49:3306 | 连接中的数据库
| wsrep_cluster_conf_id | 18 |
| wsrep_cluster_size | 2 | 集群成员个数
| wsrep_cluster_state_uuid | bb5b9e17-66c8-11e3-86ba-96854521d205 | 集群 ID
| wsrep_cluster_status | Primary | 主服务器
| wsrep_connected | ON | 当前是否连接中
| wsrep_local_index | 1 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 2.7(rXXXX) |
| wsrep_ready | ON | 插件是否应用中
+----------------------------+--------------------------------------+
40 rows in set (0.05 sec)
另,如使用 rpm 则十分方便,网路很多教程, 不详细描述
auto_increment
当更多的 MariaDB 加入到集群之后,集群中的数据库会自动进行协调,并且自动定义偏移量, 这个比较人性化,自动化,如下描述
db1:
MariaDB [(none)]> show variables like 'auto_increment%';+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| auto_increment_increment | 4 |
| auto_increment_offset | 3 |
+--------------------------+-------+
2 rows in set (0.00 sec)
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| auto_increment_increment | 4 |
| auto_increment_offset | 4 |
+------------------------
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| auto_increment_increment | 4 |
| auto_increment_offset | 2 |
+--------------------------+-------+
2 rows in set (0.00 sec)
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| auto_increment_increment | 4 |
| auto_increment_offset | 1 |
+--------------------------+-------+
2 rows in set (0.00 sec)
模拟测试1
创建测试表
MariaDB [(none)]> desc terry.t2;+-------+-------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+-------------------+-----------------------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(20) | YES | | NULL | |
| time | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+-------+-------------+------+-----+-------------------+-----------------------------+
3 rows in set (0.00 sec)
在每台电脑中创建对应的数据插入脚本 (太大量的并发插入会导致服务器不断脱离集群,最终只剩下一次,因此减少数据插入量)
[root@db6 mdb]# cat /tmp/in.sh#!/bin/bash
for (( a=1 ; a<=1000 ; a++ ))
do
name="db6.$a"
mysql -u terry -p123 -e "insert into terry.t2 (name, time) values ("$name", now())"
done
插入过程中, 会出现锁,有一个数据库集群会自动脱离集群 >_<"
MySQL thread id 16, OS thread handle 0x7f2f2019a700, query id 4666 applied write set 183192TABLE LOCK table `terry`.`t2` trx id 2D3EF lock mode IX
---TRANSACTION 2D3EE, ACTIVE 0 sec
mysql tables in use 1, locked 1
1 lock struct(s), heap size 376, 0 row lock(s), undo log entries 1
MySQL thread id 633, OS thread handle 0x7f2f20076700, query id 4664 localhost terry query end
insert into terry.t2 (name, time) values ("db5.603", now())
TABLE LOCK table `terry`.`t2` trx id 2D3EE lock mode IX
---TRANSACTION 2D3ED, ACTIVE (PREPARED) 0 sec preparing
1 lock struct(s), heap size 376, 0 row lock(s), undo log entries 1
MySQL thread id 3, OS thread handle 0x7f2f3be4e700, query id 4662 committing 183190
参考其他三台 时间返回值
db1 (使用 13 秒)
db2 (使用 24秒)
db3(使用14秒)
db4(写入 35 条数据后 crash)