Wednesday, July 9, 2014

Installing Clustered Samba on a GlusterFS Architecture

1. Architecture

1.1 Prerequisite and Foundation

  • CentOS 6.x
  • GlusterFS
  • CTDB
  • Samba
Abbreviation  Full name                     Description
CIFS          Common Internet File System   Simply put, Windows "Network Neighborhood": a network file-sharing protocol
NFS           Network File System
PV            Physical Volume
VG            Volume Group
LV            Logical Volume

[Figure: Clustered Samba]

1.2 Network Configuration

Prepare two machines, each with three network interfaces.
[Figure: Network diagram]
Add the following hostnames to /etc/hosts:
# NFS/CIFS access
    nas1.rickpc gluster01
    nas2.rickpc gluster02

# CTDB interconnect
    gluster01c
    gluster02c

# GlusterFS interconnect
    gluster01g
    gluster02g
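Each hosts entry needs an IP address on the matching network. The sketch below is a minimal bash example that assumes a hypothetical addressing plan built from the documentation ranges 192.0.2.0/24 (client access), 198.51.100.0/24 (CTDB interconnect) and 203.0.113.0/24 (GlusterFS interconnect). The helper name gen_hosts is also hypothetical. It only prints the entries, so you can review them before appending them to /etc/hosts on both nodes.

```bash
# Hypothetical helper: print /etc/hosts entries for the three networks.
# ASSUMPTION: the IPs are documentation-range placeholders, not real addresses;
# replace them with the addresses of your own eth0/eth1/eth2 networks.
gen_hosts() {
    local i
    echo "# NFS/CIFS access"
    for i in 1 2; do
        printf '192.0.2.%d\tnas%d.rickpc gluster0%d\n' "$((10 + i))" "$i" "$i"
    done
    echo "# CTDB interconnect"
    for i in 1 2; do
        printf '198.51.100.%d\tgluster0%dc\n' "$((10 + i))" "$i"
    done
    echo "# GlusterFS interconnect"
    for i in 1 2; do
        printf '203.0.113.%d\tgluster0%dg\n' "$((10 + i))" "$i"
    done
}
```

After editing the addresses, run gen_hosts >> /etc/hosts on nas1 and on nas2.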

1.3. Prepare the Physical Disk

For the fundamentals of Linux disks and file systems, and for how to partition a disk with fdisk, see [1]. Only the basic commands are listed here.
Prepare a physical partition to create /dev/sdb1:
$ fdisk /dev/sdb
$ partprobe
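Partition the disks on both nas1 and nas2. The result is shown below. The disk I used is 8 GB, but I only carved out:
  • /dev/sdb4: 64 MB
  • /dev/sdb5: 2.1 GB (to be used as the physical volume)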
Disk /dev/sdb: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x9815603c

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               9        1044     8321670    5  Extended
/dev/sdb4               1           8       64228+  83  Linux
/dev/sdb5               9         270     2104483+  83  Linux

1.4. Create the Linux Volumes

For a deeper look at the concepts behind PV, VG and LV, see the Logical Volume Manager explanation in [2].
[Figure: Volume layout]
Create the physical volume:
$ pvcreate /dev/sdb5
Create the volume group:
$ vgcreate vg_bricks /dev/sdb5
Create the logical volumes:
$ lvcreate -n lv_lock -L 64M vg_bricks
$ lvcreate -n lv_brick01 -L 1.5G vg_bricks
Install the XFS tools:
$ yum install -y xfsprogs
Format the logical volumes as XFS, add them to /etc/fstab and mount them:
$ mkfs.xfs -i size=512 /dev/vg_bricks/lv_lock
$ mkfs.xfs -i size=512 /dev/vg_bricks/lv_brick01
$ echo '/dev/vg_bricks/lv_lock /bricks/lock xfs defaults 0 0' >> /etc/fstab
$ echo '/dev/vg_bricks/lv_brick01 /bricks/brick01 xfs defaults 0 0' >> /etc/fstab
$ mkdir -p /bricks/lock
$ mkdir -p /bricks/brick01
$ mount /bricks/lock
$ mount /bricks/brick01
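Re-running the two echo ... >> /etc/fstab lines on a node adds duplicate entries. The sketch below is a minimal, idempotent alternative in bash. The helper name add_fstab_entry and the FSTAB override variable are hypothetical and exist only for illustration. The helper appends a line only when that mount point is not already listed.

```bash
# Hypothetical helper: append an fstab line unless the mount point is already
# listed. FSTAB defaults to /etc/fstab; point it at a scratch file to test.
add_fstab_entry() {
    local dev=$1 mnt=$2 fstype=$3 opts=${4:-defaults}
    local fstab=${FSTAB:-/etc/fstab}
    # Field 2 of every non-comment fstab line is the mount point.
    if awk -v m="$mnt" '$1 !~ /^#/ && $2 == m { found = 1 }
                        END { exit (found ? 0 : 1) }' "$fstab" 2>/dev/null; then
        echo "already present: $mnt"
    else
        printf '%s %s %s %s 0 0\n' "$dev" "$mnt" "$fstype" "$opts" >> "$fstab"
        echo "added: $mnt"
    fi
}
```

On each node, for example: add_fstab_entry /dev/vg_bricks/lv_lock /bricks/lock xfs and add_fstab_entry /dev/vg_bricks/lv_brick01 /bricks/brick01 xfs.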
Create the PV, VG and LVs on both nas1 and nas2. The result on nas1:
[root@nas1 ~]# lvdisplay
  --- Logical volume ---
  LV Path                /dev/vg_bricks/lv_lock
  LV Name                lv_lock
  VG Name                vg_bricks
  LV UUID                rnRNbZ-QFun-pxvS-AS3f-pvn3-dvCY-h3qXgi
  LV Write Access        read/write
  LV Creation host, time nas1.rickpc, 2014-07-04 16:54:20 +0800
  LV Status              available
  # open                 1
  LV Size                64.00 MiB
  Current LE             16
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:2

  --- Logical volume ---
  LV Path                /dev/vg_bricks/lv_brick01
  LV Name                lv_brick01
  VG Name                vg_bricks
  LV UUID                BwMD2T-YOJi-spM4-aarC-3Yyj-Jfe2-nsecIJ
  LV Write Access        read/write
  LV Creation host, time nas1.rickpc, 2014-07-04 16:56:11 +0800
  LV Status              available
  # open                 1
  LV Size                1.50 GiB
  Current LE             384
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:3
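You can cross-check the Current LE values above against the sizes passed to lvcreate. This assumes vgcreate's default physical extent size of 4 MiB, since no -s option was given. The bash arithmetic below gives 64 MiB for lv_lock and 1536 MiB (1.5 GiB) for lv_brick01, which matches the LV Size fields. It also compares the total with the size of /dev/sdb5 from the fdisk listing.

```bash
# Cross-check lvdisplay's "Current LE" against the sizes given to lvcreate.
# ASSUMPTION: default 4 MiB physical extent size (vgcreate without -s).
pe_mib=4
lock_le=16      # Current LE of lv_lock
brick_le=384    # Current LE of lv_brick01
lock_mib=$(( lock_le * pe_mib ))
brick_mib=$(( brick_le * pe_mib ))
# fdisk reports /dev/sdb5 as 2104483 1K blocks; convert to MiB (rounded down).
sdb5_mib=$(( 2104483 / 1024 ))
echo "lv_lock:    ${lock_mib} MiB"
echo "lv_brick01: ${brick_mib} MiB"
echo "allocated:  $(( lock_mib + brick_mib )) MiB of ~${sdb5_mib} MiB on /dev/sdb5"
```

The partition figure is approximate, because LVM metadata takes a little space from the PV.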

1.5. Install GlusterFS and create volumes

To understand how CTDB and GlusterFS work together, and how to install GlusterFS and CTDB, see GlusterFS/CTDB Integration [3] and Clustered NAS For Everyone: Clustering Samba With CTDB [4].
Install GlusterFS packages on all nodes
$ wget -nc http://download.gluster.org/pub/gluster/glusterfs/3.5/LATEST/RHEL/glusterfs-epel.repo -O /etc/yum.repos.d/glusterfs-epel.repo
$ yum install -y rpcbind glusterfs-server
$ chkconfig rpcbind on
$ service rpcbind restart
$ service glusterd restart
Do not enable glusterd at boot with chkconfig. The ctdb_manage script in section 1.6 starts it together with the other services.
Configure the cluster and create the volumes from gluster01.
Add gluster02g to the trusted storage pool:
$ gluster peer probe gluster02g
If you get "gluster peer probe: failed: Probe returned with unknown errno 107", see [5].
$ gluster peer status
Create the volumes. In GlusterFS, each volume is a separate virtual file system.
# The transport defaults to tcp, so it is not specified below.
$ gluster volume create lockvol replica 2 gluster01g:/bricks/lock gluster02g:/bricks/lock force
$ gluster volume create vol01 replica 2 gluster01g:/bricks/brick01 gluster02g:/bricks/brick01 force
$ gluster vol start lockvol
$ gluster vol start vol01
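With replica 2, GlusterFS groups consecutive bricks on the command line into replica sets. Each pair must therefore name one brick per node, so that the two copies sit on different machines. The trailing force is presumably there because each brick is the root directory of its XFS mount, which gluster rejects by default. The hypothetical helper replica_bricks below is a minimal bash sketch that builds the brick list in node order. It only prints arguments and does not call gluster.

```bash
# Hypothetical helper: print "host:path" for each host, in the order given.
# With "replica 2", each consecutive pair becomes one replica set, so one
# brick per node keeps both copies of the data on different machines.
replica_bricks() {
    local path=$1; shift
    local host
    local -a out=()
    for host in "$@"; do
        out+=("${host}:${path}")
    done
    echo "${out[*]}"
}
```

For example: gluster volume create vol01 replica 2 $(replica_bricks /bricks/brick01 gluster01g gluster02g) force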
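The GlusterFS virtual file systems are now set up on both nas1 and nas2. The df output looks like this (the /gluster mount points are created in section 1.6):
Filesystem                       1K-blocks    Used Available Use% Mounted on
/dev/mapper/vg_bricks-lv_lock        60736    3576     57160   6% /bricks/lock
/dev/mapper/vg_bricks-lv_brick01   1562624  179536   1383088  12% /bricks/brick01
localhost:/lockvol                   60672    3584     57088   6% /gluster/lock
localhost:/vol01                   1562624  179584   1383040  12% /gluster/vol01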
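A /gluster path can look fine in df while actually being a plain directory on the root file system, for example if a mount failed. Native GlusterFS client mounts appear in /proc/mounts with the file system type fuse.glusterfs, so that table can be checked. The helper name is_gluster_mount and the MOUNTS override variable below are hypothetical; this is a minimal bash sketch.

```bash
# Hypothetical helper: succeed if $1 is mounted as a GlusterFS FUSE volume.
# /proc/mounts fields: device, mount point, fs type, options, dump, pass.
is_gluster_mount() {
    local mnt=$1
    awk -v m="$mnt" '$2 == m && $3 == "fuse.glusterfs" { found = 1 }
                     END { exit (found ? 0 : 1) }' "${MOUNTS:-/proc/mounts}"
}
```

For example: is_gluster_mount /gluster/vol01 && echo "vol01 is mounted".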

1.6. Install and configure Samba/CTDB

Install the Samba/CTDB packages [6] on all nodes with samba-3.6.9, samba-client-3.6.9 and ctdb-
$ yum install -y samba samba-client ctdb
Install NFS [7] (rpcbind-0.2.0, nfs-utils-1.2.3):
$ yum install -y rpcbind nfs-utils
$ chkconfig rpcbind on
$ service rpcbind start
Configure CTDB and Samba on gluster01 only. The files are stored on the shared lockvol volume, so every node sees the same copy:
$ mkdir -p /gluster/lock
$ mount -t glusterfs localhost:/lockvol /gluster/lock
Edit /gluster/lock/ctdb
# Only when using Samba. Unnecessary for NFS.
# some tunables
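The variable settings of /gluster/lock/ctdb do not appear above; only the two comment lines survive. The sketch below is a hypothetical example, not the author's file. It uses CTDB sysconfig variables that CTDB's init script reads as shell assignments. The recovery-lock path /gluster/lock/reclock is an assumption; the file only needs to be on the shared GlusterFS volume.

```bash
# /gluster/lock/ctdb -- HYPOTHETICAL example, not the original settings.
# Recovery lock file; ASSUMPTION: any path on the shared lockvol works.
CTDB_RECOVERY_LOCK=/gluster/lock/reclock
CTDB_NODES=/etc/ctdb/nodes
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
# Only when using Samba. Unnecessary for NFS.
CTDB_MANAGES_SAMBA=yes
```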
Edit /gluster/lock/nodes
Edit /gluster/lock/public_addresses (two floating IP entries, both on eth0)
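The contents of the nodes and public_addresses files are not shown above. In CTDB, nodes lists each node's interconnect address one per line, and it must be identical on every node. public_addresses lists the floating IPs as "address/prefix interface". The minimal bash sketch below writes both files. It assumes hypothetical documentation-range IPs (198.51.100.11/12 for gluster01c/gluster02c, and 192.0.2.101/102 as floating IPs); write_ctdb_addresses is also a hypothetical helper name.

```bash
# Hypothetical helper: write CTDB's nodes and public_addresses files into $1.
# ASSUMPTION: all IPs are placeholders; use your CTDB interconnect addresses
# for "nodes" and your client-facing floating IPs for "public_addresses".
write_ctdb_addresses() {
    local dir=${1:-/gluster/lock}
    # One CTDB interconnect IP per node (gluster01c, gluster02c).
    printf '%s\n' 198.51.100.11 198.51.100.12 > "$dir/nodes"
    # Floating IPs that CTDB moves between nodes, both on eth0.
    printf '%s eth0\n' 192.0.2.101/24 192.0.2.102/24 > "$dir/public_addresses"
}
```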
Edit /gluster/lock/smb.conf
[global]
    workgroup = MYGROUP
    server string = Samba Server Version %v
    clustering = yes
    security = user
    passdb backend = tdbsam

[<share name>]
    comment = Shared Directories
    path = /gluster/vol01
    browseable = yes
    writable = yes
Create symlinks to the config files on all nodes:
$ mv /etc/sysconfig/ctdb /etc/sysconfig/ctdb.orig
$ mv /etc/samba/smb.conf /etc/samba/smb.conf.orig
$ ln -s /gluster/lock/ctdb /etc/sysconfig/ctdb
$ ln -s /gluster/lock/nodes /etc/ctdb/nodes
$ ln -s /gluster/lock/public_addresses /etc/ctdb/public_addresses
$ ln -s /gluster/lock/smb.conf /etc/samba/smb.conf
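Running the mv/ln sequence a second time is risky. mv would move the existing symlink onto the .orig backup and destroy the original file. The hypothetical helper link_config below is a minimal bash sketch that can be re-run safely. It backs up a regular file only once and skips links that already point at the right target.

```bash
# Hypothetical helper: replace $2 with a symlink to $1, keeping a one-time
# backup of the original regular file as $2.orig. Safe to run repeatedly.
link_config() {
    local src=$1 dst=$2
    if [ -L "$dst" ] && [ "$(readlink "$dst")" = "$src" ]; then
        echo "already linked: $dst"
        return 0
    fi
    # Back up only a real file, and only if no backup exists yet.
    if [ -e "$dst" ] && [ ! -L "$dst" ] && [ ! -e "$dst.orig" ]; then
        mv "$dst" "$dst.orig"
    fi
    ln -sfn "$src" "$dst"
    echo "linked: $dst -> $src"
}
```

For example: link_config /gluster/lock/smb.conf /etc/samba/smb.conf, and likewise for the three CTDB files.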
Because smb.conf is in a non-standard location, set the smbd_t SELinux domain to permissive on all nodes:
$ yum install -y policycoreutils-python
$ semanage permissive -a smbd_t
Setting an appropriate security context would be better, but there is an open issue with using chcon on GlusterFS.
Create the following script at /usr/local/bin/ctdb_manage to start and stop the services on all nodes:
#!/bin/bash
# Run a command on every node in parallel over ssh, then wait for all of them.
function runcmd {
        echo exec on all nodes: "$@"
        ssh gluster01 "$@" &
        ssh gluster02 "$@" &
        wait
}

case $1 in
start)
        runcmd service glusterd start
        sleep 1
        runcmd mkdir -p /gluster/lock
        runcmd mount  -t glusterfs localhost:/lockvol /gluster/lock
        runcmd mkdir -p /gluster/vol01
        runcmd mount  -t glusterfs localhost:/vol01 /gluster/vol01
        runcmd service ctdb start
        ;;
stop)
        runcmd service ctdb stop
        runcmd umount /gluster/lock
        runcmd umount /gluster/vol01
        runcmd service glusterd stop
        runcmd pkill glusterfs
        ;;
*)
        echo "Usage: $0 {start|stop}" >&2
        exit 1
        ;;
esac

1.7. Start services

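Set the Samba password, then check the shared directories through one of the floating IPs.
$ pdbedit -a -u root
Test the Samba connection through each floating IP:
$ smbclient -L <floating IP 1> -U root
$ smbclient -L <floating IP 2> -U root
Check for Windows client connections (microsoft-ds is the SMB port, 445):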
$ ssh gluster01 netstat -aT | grep microsoft

2. Testing your clustered Samba

2.1. Client Disconnection

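On a Windows PC, map the shared folder as network drive Z: and run the following run_client.bat:
@echo off
rem Append the current time to Z:\wintest.txt every 2 seconds; stop with Ctrl+C.
rem timeout is built into Windows (Vista and later); sleep is not a standard command.
:loop
 echo "%time% (^_-) Writing on file in the shared folder...."
 echo %time% >> z:\wintest.txt
 timeout /t 2 /nobreak > nul

 echo "%time% (-_^) Writing on file in the shared folder...."
 echo %time% >> z:\wintest.txt
 timeout /t 2 /nobreak > nul
goto loop
The script appends the current timestamp to Z:\wintest.txt every two seconds. Test steps:
  1. Run run_client.bat.
  2. Disable the network interface on the Windows PC. The script can no longer write to the cluster file system.
  3. Re-enable the network interface. Shortly afterwards, the script writes to the cluster file system again.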

2.2. CTDB Failover

Use ctdb status and ctdb ip to check the current state of the cluster. Test steps:
  1. On the Windows PC, run run_client.bat.
  2. On either cluster node, stop CTDB:
     [root@nas2 ~]# ctdb stop
  3. Observe that the PC keeps writing timestamps to the cluster file system.

2.3. Cluster Node Crash

Reboot one cluster node and watch the connection on the Windows PC. Test steps:
  1. On the Windows PC, run run_client.bat.
  2. Shut down the OS on either cluster node.
  3. Watch how the timestamps on the PC change:
    "12:16:49.59 (-_^) Writing on file in the shared folder...."
    "12:16:51.62 (^_-) Writing on file in the shared folder...."
    "12:16:53.66 (-_^) Writing on file in the shared folder...."
    "12:16:55.70 (^_-) Writing on file in the shared folder...."
    "12:16:57.74 (-_^) Writing on file in the shared folder...."
    "12:17:41.90 (^_-) Writing on file in the shared folder...."
    "12:17:43.92 (-_^) Writing on file in the shared folder...."
    "12:17:45.95 (^_-) Writing on file in the shared folder...."
    "12:17:48.00 (-_^) Writing on file in the shared folder...."
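The interval between 12:16:57.74 and 12:17:41.90 shows that the Windows connection was interrupted for about 44 seconds. After that, the test program on the PC reconnected and resumed writing, which is consistent with HA-level recovery.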
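Instead of spotting the interruption by eye, you can scan the log for intervals much longer than the 2-second write period. The minimal bash/awk sketch below assumes each line contains a Windows %time% value in H:MM:SS.cc form, as in wintest.txt or the console output above. The helper name find_gaps and its 5-second default threshold are hypothetical choices.

```bash
# Hypothetical helper: read timestamped lines on stdin and report consecutive
# timestamps that are further apart than a threshold (default 5 seconds).
find_gaps() {
    local threshold=${1:-5}
    awk -v t="$threshold" '
        match($0, /[0-9]+:[0-9]+:[0-9]+\.[0-9]+/) {
            ts = substr($0, RSTART, RLENGTH)
            split(ts, p, ":")
            secs = p[1] * 3600 + p[2] * 60 + p[3]
            if (have_prev) {
                d = secs - prev
                if (d < 0) d += 86400      # the log crossed midnight
                if (d > t) printf "gap of %.2f s: %s -> %s\n", d, prev_ts, ts
            }
            prev = secs; prev_ts = ts; have_prev = 1
        }'
}
```

For example, after copying wintest.txt to a Linux host: find_gaps < wintest.txt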

2.4. Ping_pong for CTDB lock rate

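ping_pong [8] is a small tool from the Samba open-source project for measuring the CTDB lock rate.
I modified the original source code slightly so that it also writes the lock rate to Graphite [9]. This makes it easy to follow changes in the lock rate over long periods.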
source code
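The modified source is not reproduced here. As an alternative sketch, and not the author's approach, the unmodified tool's output can be piped into Graphite's plaintext protocol. That protocol accepts lines of the form "metric.path value unix_timestamp", on TCP port 2003 by default.

The sketch rests on several assumptions:
  • ping_pong prints its rate as "N locks/sec" and redraws that figure in place, ending each update with a carriage return.
  • The metric name ctdb.pingpong.lock_rate is hypothetical.
  • The helper name to_graphite is hypothetical.

```bash
# Hypothetical helper: convert ping_pong's "N locks/sec" updates into
# Graphite plaintext lines ("metric value unix_timestamp").
to_graphite() {
    local metric=${1:-ctdb.pingpong.lock_rate}
    local line rate unit
    # ping_pong redraws its rate in place, ending each update with \r.
    while IFS= read -r -d $'\r' line; do
        read -r rate unit _ <<< "$line"
        if [ "$unit" = "locks/sec" ]; then
            printf '%s %s %s\n' "$metric" "$rate" "$(date +%s)"
        fi
    done
}
```

A possible invocation, with graphite-host as a placeholder: ping_pong /gluster/vol01/test.dat 3 | to_graphite | nc graphite-host 2003. The Samba ping_pong documentation suggests a lock count of at least the number of nodes plus one, which gives 3 for this two-node cluster.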

3. References

  1. Linux 磁碟與檔案系統管理 (Linux Disk and File System Management), 鳥哥 (VBird)
  2. 邏輯捲軸管理員 (Logical Volume Manager), 鳥哥 (VBird)
  3. GlusterFS/CTDB Integration, Etsuji Nakai
  4. Clustered NAS For Everyone: Clustering Samba With CTDB, Michael Adam
  5. gluster peer probe: failed: Probe returned with unknown errno 107, Network Administrator Blog
  6. SAMBA 伺服器 (Samba Server), 鳥哥 (VBird)
  7. NFS 伺服器 (NFS Server), 鳥哥 (VBird)
  8. Ping pong, Samba
  9. Graphite - Scalable Realtime Graphing

