The Storage of Docker Swarm

GlusterFS usage workflow:

  • Following the previous post, the application services are already deployed. However, because Docker Swarm scheduling is non-deterministic (placement can be controlled at deploy time, but when a node fails a task may be rescheduled onto another node, and the data must stay consistent and continuously available), the volume layer is backed by a distributed file system (GlusterFS).
Overview

GlusterFS is an open-source distributed file system made up of storage servers (brick servers), clients, and an optional NFS/Samba storage gateway. It is also the core of Gluster, a scale-out storage solution: it scales horizontally to several petabytes of capacity and thousands of clients.
GlusterFS aggregates physically scattered storage resources over TCP/IP or InfiniBand RDMA (a technology that supports many concurrent connections with high bandwidth, low latency, and good scalability) into a single storage service, managed under one global unified namespace.

Components

Brick (storage unit)
A dedicated partition provided by a host in the trusted storage pool for physical storage; it is the basic storage unit in GlusterFS and the directory a server in the trusted pool exports.
A storage directory is identified by the server plus the absolute path of the directory, written as SERVER:EXPORT, e.g. 192.168.126.10:/data/mydir/.

Volume (logical volume)
A logical volume is a collection of bricks. A volume is the logical device on which data is stored, similar to a logical volume in LVM. Most Gluster management operations are performed on volumes.

FUSE
A kernel module that lets users implement their own file systems without modifying kernel code.

VFS
The interface the kernel exposes to user space for accessing storage.

Glusterd (management daemon)
Runs on every node in the storage cluster.

Workflow

1) A client or application accesses data through the GlusterFS mount point.
2) The Linux kernel receives the request through the VFS API and processes it.
3) VFS hands the request to the FUSE kernel module, which passes it through the /dev/fuse device file to the GlusterFS client; the FUSE file system can be thought of as a proxy.
4) The GlusterFS client processes the data according to its configuration.
5) The data is sent over the network to the remote GlusterFS servers and written to their storage devices.
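A quick way to see this chain on a client once a volume is mounted (a minimal check sketch; the mount point is the one used later in this post):

lsmod | grep fuse                       # the FUSE kernel module is loaded
mount | grep fuse.glusterfs             # the mount shows up with type fuse.glusterfs
ps -ef | grep '[g]lusterfs'             # the GlusterFS client process that services /dev/fuse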

Volume types:

Distributed: comparable to RAID 0; highly scalable but without redundancy. If a disk fails and its gluster service becomes unavailable, the data on that brick is lost, but the gluster volume as a whole stays usable. This is the default type and gives the highest read/write throughput.
Striped: comparable to RAID 0 at the block level, so large files are split across bricks; optimized for large-file reads. If any disk fails the whole gluster volume becomes unavailable. (No longer supported in recent versions.)
Replicated: comparable to RAID 1; fault tolerant, with better read performance and lower write performance. Needs at least 2 server disks and halves the usable space; suited to smaller data sets that must stay highly available. With more than two replicas, configure an arbiter to prevent split-brain.
Distributed Striped: combines the distributed and striped types; needs at least 4 servers, and a failure takes the whole volume down.
Distributed Replicated: comparable to RAID 10; needs at least 4 servers and combines the distributed and replicated types. Highly available; suited to workloads that need both performance and reliability.
Distributed Striped Replicated: a compound of the three basic volume types.
Dispersed: erasure coded; saves space while still protecting against disk or server failure. Better storage efficiency than replication with solid fault tolerance; suited to large data sets that need redundancy.
Distributed Dispersed: distribution on top of dispersed subvolumes (closer to RAID 60 than RAID 10); keeps and improves on the advantages of Distributed Replicated and fits the same scenarios.
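As a worked example of how the volume type drives usable capacity (illustrative numbers only, assuming six 1 TB bricks):
distributed: 6 x 1 TB = 6 TB usable, no redundancy at all
replica 3 (two replica sets of 3): 6 / 3 x 1 TB = 2 TB usable, every file kept in 3 copies
disperse 4+2 (redundancy 2): (6 - 2) x 1 TB = 4 TB usable, any 2 bricks may fail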

Setup steps:
Node             Directory   Mount point
192.168.40.239 /data /data
192.168.50.207 /data /data
192.168.50.208 /data /data
192.168.40.175 /data /data
192.168.40.240 /data /data

In production the cluster can be built quickly with Ansible: https://github.com/gluster/gluster-ansible
Linux kernel parameter tuning (recommended before building the cluster): https://docs.gluster.org/en/latest/Administrator-Guide/Linux-Kernel-Tuning/#commentbengland_1
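A minimal sketch of the kind of sysctl settings the linked tuning guide discusses (the values below are illustrative placeholders, not recommendations; take the real numbers from the guide and your own testing):

cat << EOF > /etc/sysctl.d/90-glusterfs.conf
# flush dirty pages earlier so large sequential writes do not stall
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5
# avoid swapping out the brick/server processes
vm.swappiness = 10
EOF
sysctl --system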

# Stop the firewall; after setup, reopen it by port (24007:24008, 49152:49156)
systemctl stop firewalld
# Put SELinux into permissive mode
setenforce 0
# Create the /data directory on every node
mkdir /data
# Optional: ideally each node dedicates a brand-new disk so existing data is not disturbed:
# mkfs.xfs -i size=512 /dev/sdb1   # mkfs.ext4 also works; rule of thumb: ext4 for small disks, xfs for large ones
# mkdir -p /data/brick1
# echo '/dev/sdb1 /data/brick1 xfs defaults 1 2' >> /etc/fstab
# mount -a && mount
# Set up internal hostname resolution via /etc/hosts
echo "192.168.40.239 node1" >> /etc/hosts
echo "192.168.50.207 node2" >> /etc/hosts
echo "192.168.50.208 node3" >> /etc/hosts
echo "192.168.40.175 node4" >> /etc/hosts
echo "192.168.40.240 node5" >> /etc/hosts
echo "192.168.50.177 node6" >> /etc/hosts
# For convenience, pick one machine as the manager and set up passwordless SSH to the others
# ssh-keygen -t rsa -q -P "<key passphrase>" -f ~/.ssh/id_rsa
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.50.207
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.50.208
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.40.175
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.40.240
# Kernel module that must be available beforehand
lsmod | grep -q fuse || modprobe fuse
# Prerequisite packages
yum -y install openssh-server wget fuse fuse-libs openmpi libibverbs
# Install GlusterFS on every node
dnf install centos-release-gluster9 -y
dnf install -y glusterfs glusterfs-api glusterfs-fuse glusterfs-rdma glusterfs-libs glusterfs-server
# Enable at boot and start
systemctl enable glusterfsd.service --now
systemctl enable glusterd.service --now
# Run from the manager node; GlusterFS peers are joined point-to-point
gluster peer probe node2
gluster peer probe node3
gluster peer probe node4
gluster peer probe node5
gluster peer probe node6
# Then, from any one of the other nodes, probe the manager so it is also recorded in the pool
gluster peer probe node1
# List the nodes currently in the gluster pool
gluster pool list
# gluster peer status shows the connection state (output differs slightly on each node, but the peer count matches)
# Prepare the brick directory on every node that will contribute a brick to the volume
# mkdir -p /data/glusterfs/<volume name>/brick<N> (one directory per brick)
mkdir -p /data/glusterfs/online-share/brick1

# The commands below can be run from any single node
## Create a distributed volume
gluster volume create online-share transport tcp node1:/data/glusterfs/online-share/brick1 node2:/data/glusterfs/online-share/brick1 node3:/data/glusterfs/online-share/brick1 node4:/data/glusterfs/online-share/brick1

## Create a replicated volume
# 2 bricks
gluster volume create online-share replica 2 transport tcp node1:/data/glusterfs/online-share/brick1 node2:/data/glusterfs/online-share/brick1
# 3 bricks (one of them an arbiter)
gluster volume create online-share replica 3 arbiter 1 transport tcp node1:/data/glusterfs/online-share/brick1 node2:/data/glusterfs/online-share/brick1 node3:/data/glusterfs/online-share/brick1
# 4 bricks
gluster volume create online-share replica 4 transport tcp node1:/data/glusterfs/online-share/brick1 node2:/data/glusterfs/online-share/brick1 node3:/data/glusterfs/online-share/brick1 node4:/data/glusterfs/online-share/brick1

## Create a distributed replicated volume
# 4 bricks
gluster volume create online-share replica 2 transport tcp node1:/data/glusterfs/online-share/brick1 node2:/data/glusterfs/online-share/brick1 node3:/data/glusterfs/online-share/brick1 node4:/data/glusterfs/online-share/brick1
# 6 bricks
gluster volume create online-share replica 2 transport tcp node1:/data/glusterfs/online-share/brick1 node2:/data/glusterfs/online-share/brick1 node3:/data/glusterfs/online-share/brick1 node4:/data/glusterfs/online-share/brick1 node5:/data/glusterfs/online-share/brick1 node6:/data/glusterfs/online-share/brick1

## Create a dispersed volume
# Usable size formula: <Usable size> = <Brick size> * (#Bricks - Redundancy), i.e. per-brick capacity * (total bricks - redundancy bricks); the redundancy count is the number of bricks that may fail
# 4 bricks; if redundancy is not given, the optimal value is computed automatically. If redundancy would equal half the brick count, a replicated volume is more efficient. Pitfall: when bricks are to be removed later, keep the layout even (e.g. 4+2).
gluster volume create online-share disperse 4 [redundancy 1-2] transport tcp node1:/data/glusterfs/online-share/brick1 node2:/data/glusterfs/online-share/brick1 node3:/data/glusterfs/online-share/brick1 node4:/data/glusterfs/online-share/brick1

## Create a distributed dispersed volume
# disperse must be specified, and the total brick count must be a multiple of the disperse count (6 bricks here = 2 subvolumes of disperse 3)
gluster volume create online-share disperse 3 transport tcp node1:/data/glusterfs/online-share/brick1 node2:/data/glusterfs/online-share/brick1 node3:/data/glusterfs/online-share/brick1 node4:/data/glusterfs/online-share/brick1 node5:/data/glusterfs/online-share/brick1 node6:/data/glusterfs/online-share/brick1

# Start the volume
gluster volume start online-share
# List the volumes gluster knows about
gluster volume list
# Show detailed information for the volume
gluster volume info online-share
# Steps to take the volume offline
# Unmount the volume on every client first
umount mount-point
gluster volume stop online-share
# Change volume settings
gluster volume set online-share config.transport tcp,rdma   # or tcp, or rdma
# Remount with a different transport
mount -t glusterfs -o transport=rdma node1:online-share /mnt/glusterfs

# Scaling out:
# Add a node (adding a brick works much the same way):
gluster peer probe newNode
gluster volume add-brick online-share newNode:/data/glusterfs/online-share/brick1
# Scaling in (a dispersed volume as the example):
gluster volume heal online-share info # make sure no self-heal is still running
# This volume was created with 4 bricks and auto-computed redundancy, so it has to reach a 4+2 layout before bricks can be removed
gluster volume remove-brick online-share node4:/data/glusterfs/online-share/brick1 start # rerun with commit instead of start once data migration finishes
# Optional but recommended after scaling: rebalance the volume
gluster volume rebalance online-share [fix-layout] start [force]

# A dispersed volume's brick count cannot be changed after creation, so capacity changes are done by swapping old bricks for new ones
gluster volume replace-brick online-share node5:/data/glusterfs/online-share/brick1 node6:/data/glusterfs/online-share/brick1 commit force

# Restrict which clients may access the volume
gluster volume set online-share auth.allow "*" # a specific IP or list of IPs can be used instead of *

# Letting clients keep using the volume while a brick is offline improves availability but increases the risk of inconsistent data (not recommended: it can cause split-brain)
# server quorum: bricks are only served while most of the cluster is up; 'none' disables the check
gluster volume set online-share cluster.server-quorum-type none
# client quorum: writes only proceed while most of the replica set is reachable; 'none' disables the check
gluster volume set online-share cluster.quorum-type none

# What was actually used in this setup:
gluster volume create online-share disperse 4 transport tcp node1:/data/glusterfs/online-share/brick1 node2:/data/glusterfs/online-share/brick1 node3:/data/glusterfs/online-share/brick1 node4:/data/glusterfs/online-share/brick1 force
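# A hedged follow-up to the firewall note at the top of this section: once the volume works,
# firewalld can be turned back on with only the GlusterFS ports open (adjust the brick port
# range to the number of bricks actually exported per node)
systemctl start firewalld
firewall-cmd --permanent --add-port=24007-24008/tcp
firewall-cmd --permanent --add-port=49152-49156/tcp
firewall-cmd --reload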

Client usage:

Note: every node in the GlusterFS cluster holds the full cluster information, so connecting to any single node is enough to reach the volume. That node, however, becomes a single point of failure for the initial connection; ways around it:
1. Put DNS round-robin in front, at the cost of DNS caching and propagation delays.
2. Put a load balancer such as HAProxy in front.
3. Configure all the nodes on the client side, e.g.: mount -t glusterfs node1,node2,node3:/online-share /mnt/glusterfs (a mount sketch follows the reference link below)
Reference: https://ruan.dev/blog/2019/03/05/setup-a-3-node-replicated-storage-volume-with-glusterfs
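A minimal sketch of option 3 using the FUSE client's backup-volfile-servers mount option (the same option used in the fstab entry further down); node1/node2/node3 are the hosts defined in /etc/hosts above:

mount -t glusterfs -o backup-volfile-servers=node2:node3 node1:online-share /mnt/glusterfs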

Plain Linux client nodes:
# The GlusterFS client bits need to be installed
yum -y install openssh-server wget fuse fuse-libs openmpi libibverbs
dnf install centos-release-gluster9 -y
dnf install -y glusterfs glusterfs-api glusterfs-fuse glusterfs-rdma glusterfs-libs glusterfs-cli glusterfs-client-xlators
for i in openssh-server wget fuse fuse-libs openmpi libibverbs centos-release-gluster9;do ansible hosts -m yum -a 'name='"${i}"' state=present';done
for i in glusterfs glusterfs-api glusterfs-fuse glusterfs-rdma glusterfs-libs glusterfs-cli glusterfs-client-xlators;do ansible hosts -m yum -a 'name='"${i}"' state=present';done
# Clients also need the node entries in /etc/hosts
echo "192.168.40.239 node1" >> /etc/hosts
echo "192.168.50.207 node2" >> /etc/hosts
echo "192.168.50.208 node3" >> /etc/hosts
echo "192.168.40.175 node4" >> /etc/hosts
echo "192.168.40.240 node5" >> /etc/hosts
echo "192.168.50.177 node6" >> /etc/hosts
ansible hosts -m shell -a 'grep -q "192.168.40.239 node1" /etc/hosts || echo "192.168.40.239 node1" >> /etc/hosts'
ansible hosts -m shell -a 'grep -q "192.168.50.207 node2" /etc/hosts || echo "192.168.50.207 node2" >> /etc/hosts'
ansible hosts -m shell -a 'grep -q "192.168.50.208 node3" /etc/hosts || echo "192.168.50.208 node3" >> /etc/hosts'
ansible hosts -m shell -a 'grep -q "192.168.40.175 node4" /etc/hosts || echo "192.168.40.175 node4" >> /etc/hosts'
ansible hosts -m shell -a 'grep -q "192.168.40.240 node5" /etc/hosts || echo "192.168.40.240 node5" >> /etc/hosts'
ansible hosts -m shell -a 'grep -q "192.168.50.177 node6" /etc/hosts || echo "192.168.50.177 node6" >> /etc/hosts'
# If the volume was mounted before, run systemctl daemon-reload first
mount -t glusterfs node1:online-share /mnt/glusterfs
# Mount at boot via /etc/fstab: node1:online-share /mnt/glusterfs glusterfs defaults,_netdev,direct-io-mode=enable,backup-volfile-servers=192.168.40.239:192.168.50.207:192.168.50.208:192.168.50.177 0 0
ansible hosts -m shell -a '( df -Th | grep -q "/mnt/glusterfs" && cat /etc/fstab | grep -q "/mnt/glusterfs" ) || (echo "node1:online-share /mnt/glusterfs glusterfs defaults,_netdev,direct-io-mode=enable,backup-volfile-servers=node1:node2:node3:node6 0 0" >> /etc/fstab && systemctl daemon-reload && mkdir -p /mnt/glusterfs && mount -a)'
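# Optional sanity check (not in the original steps): write a file from one client and read it from
# the others to confirm every client sees the same volume
echo "hello from $(hostname)" > /mnt/glusterfs/mount-test.txt
ansible hosts -m shell -a 'cat /mnt/glusterfs/mount-test.txt && df -Th /mnt/glusterfs'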

A record of mistakes made along the way:

# The command below is a trap: it wiped /etc/fstab (the redirection truncates the file before awk reads it). A reboot in that state would be very dangerous.
# ansible hosts -m shell -a 'awk "!seen[$0]++" /etc/fstab > /etc/fstab && mount -a'
# If the machine does reboot in that state, recover with:
mount -o remount,rw /
systemctl restart NetworkManager
# That restores networking
# Rebuild /etc/fstab
#!/bin/bash
# Rebuild /etc/fstab from what is currently mounted (findmnt) plus blkid output

findmnt --real > mounted_info.txt
blkid > partition_info.txt

glusterfs_entry="node1:online-share /mnt/glusterfs glusterfs defaults,_netdev,direct-io-mode=enable,backup-volfile-servers=node1:node2:node3:node6 0 0"

sed '1d' mounted_info.txt | while read -r line; do
    location=$(echo "${line}" | awk '{print $1}')
    source=$(echo "${line}" | awk '{print $2}')
    fstype=$(echo "${line}" | awk '{print $3}')
    if echo "${location}" | grep -qE "├|└"; then
        # Child mounts in the findmnt tree carry a "├─"/"└─" prefix; strip it to get the real path
        location=$(echo "${location}" | awk -F "─" '{print $NF}')
        grep -q "${location}" /etc/fstab || {
            if [[ "${location}" == "/boot" ]]; then
                fsid=$(grep "${source}" partition_info.txt | awk -F '"' '{print $2}')
                echo "UUID=${fsid} ${location} ${fstype} defaults 0 0" >> /etc/fstab && systemctl daemon-reload && mount -a
            elif [[ "${location}" == "/mnt/glusterfs" ]]; then
                echo "${glusterfs_entry}" >> /etc/fstab && systemctl daemon-reload && mkdir -p /mnt/glusterfs && mount -a
            else
                :   # leave other child mounts alone
            fi
        }
    else
        # Top-level mounts (e.g. /) are added back by device path if they are missing
        grep -q "${source}" /etc/fstab || { echo "${source} ${location} ${fstype} defaults 0 0" >> /etc/fstab && systemctl daemon-reload && mount -a; }
    fi
done
# Check the result with df -Th
ansible hosts -m shell -a "ls -lh /etc/fstab"
ansible hosts -m copy -a "src=/root/demo.sh dest=/root/ force=yes owner=root group=root mode=644"
ansible hosts -m shell -a "/bin/bash /root/demo.sh"
Docker Swarm:
# In practice this feels much like mounting on a plain Linux host: the volume plugin has to be installed separately and the volume still has to be created by hand on the relevant nodes
# The trajano plugin is no longer maintained but still works; it is not Swarm-integrated, so it must be installed on every node (downloading the plugin may require a proxy)
docker plugin install --alias glusterfs trajano/glusterfs-volume-plugin --grant-all-permissions --disable
docker plugin set glusterfs SERVERS=192.168.40.239,192.168.50.207,192.168.50.208,192.168.40.240
docker plugin enable glusterfs # enable the plugin
# docker plugin inspect glusterfs # show the plugin's details
# The plugin above no longer seems to work, so switch to a different one
docker plugin install --alias glusterfs mikebarkmin/glusterfs SERVERS=192.168.40.239,192.168.50.207,192.168.50.208,192.168.40.240 VOLNAME=online-share
docker volume create -d glusterfs -o servers=192.168.40.239,192.168.50.207,192.168.50.208,192.168.40.240 -o volname=online-share -o subdir=/data --scope multi --sharing all glustervolume
# Then mount and use it directly
services:
  ...:
    ...
    volumes:
      - glustervolume:/data
volumes:
  glustervolume:
    driver: glusterfs
    name: "glustervolume"

Kubernetes

There are several ways to consume it:
1. Manage GlusterFS through Heketi and have Kubernetes call the Heketi API (storage can be provisioned dynamically).
The next two approaches require configuring every node, and some nodes may not allow that:
2. Expose GlusterFS over NFS via NFS-Ganesha and mount it in Kubernetes as NFS.
3. Mount the GlusterFS volume into a local directory on each node and consume it in Kubernetes via hostPath (a sketch follows this list).
4. Container Storage Interface (CSI) volume plugins (more standards-compliant and probably the better choice, but no automatic expansion).
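A minimal sketch of option 3, assuming every Kubernetes node already mounts the volume at /mnt/glusterfs as shown earlier; the pod and file names are hypothetical:

cat << EOF > glusterfs-hostpath-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: glusterfs-hostpath-test
spec:
  containers:
  - name: shell
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: shared
      mountPath: /data
  volumes:
  - name: shared
    hostPath:
      path: /mnt/glusterfs
      type: Directory
EOF
kubectl apply -f glusterfs-hostpath-pod.yaml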

# The latter three approaches require the GlusterFS client packages on every node, otherwise they cannot be used
yum -y install openssh-server wget fuse fuse-libs openmpi libibverbs
dnf install centos-release-gluster9 -y
dnf install -y glusterfs glusterfs-api glusterfs-fuse glusterfs-rdma glusterfs-libs glusterfs-cli glusterfs-client-xlators
# Then follow the manifests on GitHub: https://github.com/rootsongjc/kubernetes-handbook/tree/master/manifests/glusterfs

# GlusterFS can also be managed automatically through a Heketi cluster and its API (in that approach Heketi provisions brand-new GlusterFS nodes itself, so no GlusterFS volumes need to be set up beforehand)

# Every GlusterFS node needs iptables rules that allow access from Kubernetes
iptables -N HEKETI
iptables -A HEKETI -p tcp -m state --state NEW -m tcp --dport 24007 -j ACCEPT
iptables -A HEKETI -p tcp -m state --state NEW -m tcp --dport 24008 -j ACCEPT
iptables -A HEKETI -p tcp -m state --state NEW -m tcp --dport 2222 -j ACCEPT
iptables -A HEKETI -p tcp -m state --state NEW -m multiport --dports 49152:49251 -j ACCEPT
service iptables save

# Kernel modules required by Heketi must be loaded beforehand
# Check with: lsmod | egrep 'dm_snapshot|dm_mirror|dm_thin_pool'
modprobe dm_snapshot
modprobe dm_mirror
modprobe dm_thin_pool
# The equivalent ansible run:
for i in dm_snapshot dm_mirror dm_thin_pool;do ansible nodes -m command -a 'modprobe '"$i"'';done

# sshd must accept ssh-rsa keys, otherwise creating the Heketi cluster fails
echo "PubkeyAcceptedKeyTypes=+ssh-rsa" >> /etc/ssh/sshd_config
echo "HostKeyAlgorithms=+ssh-rsa" >> /etc/ssh/sshd_config
systemctl restart sshd
ansible nodes -m shell -a 'echo "PubkeyAcceptedKeyTypes=+ssh-rsa" >> /etc/ssh/sshd_config;echo "HostKeyAlgorithms=+ssh-rsa" >> /etc/ssh/sshd_config;systemctl restart sshd'

# Set up the Heketi cluster, co-located on the same nodes as GlusterFS
wget https://github.com/heketi/heketi/releases/download/v10.4.0/heketi-v10.4.0-release-10.linux.amd64.tar.gz
tar -zxvf heketi-v10.4.0-release-10.linux.amd64.tar.gz
cp heketi/{heketi,heketi-cli} /usr/bin/

# The heketi service itself does not run as root, so give it a user and an SSH key
useradd -d /var/lib/heketi -s /sbin/nologin heketi
mkdir -p /etc/heketi
ssh-keygen -N '' -t rsa -q -f /etc/heketi/heketi_key
chown -R heketi:heketi /etc/heketi
ssh-copy-id -i /etc/heketi/heketi_key root@node1
ssh-copy-id -i /etc/heketi/heketi_key root@node2
ssh-copy-id -i /etc/heketi/heketi_key root@node3
ssh-copy-id -i /etc/heketi/heketi_key root@node6

# Heketi configuration
cat << EOF > /etc/heketi/heketi.json
{
  "_port_comment": "Heketi Server Port Number",
  "port": "18080",

  "_enable_tls_comment": "Enable TLS in Heketi Server",
  "enable_tls": false,

  "_cert_file_comment": "Path to a valid certificate file",
  "cert_file": "",

  "_key_file_comment": "Path to a valid private key file",
  "key_file": "",

  "_use_auth": "Enable JWT authorization. Please enable for deployment",
  "use_auth": true,

  "_jwt": "Private keys for access",
  "jwt": {
    "_admin": "Admin has access to all APIs",
    "admin": {
      "_key_comment": "Set the admin key in the next line",
      "key": "admin@P@88W0rd"
    },
    "_user": "User only has access to /volumes endpoint",
    "user": {
      "_key_comment": "Set the user key in the next line",
      "key": "user@P@88W0rd"
    }
  },

  "_backup_db_to_kube_secret": "Backup the heketi database to a Kubernetes secret when running in Kubernetes. Default is off.",
  "backup_db_to_kube_secret": false,

  "_profiling": "Enable go/pprof profiling on the /debug/pprof endpoints.",
  "profiling": false,

  "_glusterfs_comment": "GlusterFS Configuration",
  "glusterfs": {
    "_executor_comment": [
      "Execute plugin. Possible choices: mock, ssh",
      "mock: This setting is used for testing and development.",
      "      It will not send commands to any node.",
      "ssh:  This setting will notify Heketi to ssh to the nodes.",
      "      It will need the values in sshexec to be configured.",
      "kubernetes: Communicate with GlusterFS containers over",
      "            Kubernetes exec api."
    ],
    "executor": "ssh",

    "_sshexec_comment": "SSH username and private key file information",
    "sshexec": {
      "keyfile": "/etc/heketi/heketi_key",
      "user": "root",
      "port": "22",
      "fstab": "/etc/fstab"
    },

    "_db_comment": "Database file name",
    "db": "/var/lib/heketi/heketi.db",

    "_refresh_time_monitor_gluster_nodes": "Refresh time in seconds to monitor Gluster nodes",
    "refresh_time_monitor_gluster_nodes": 120,

    "_start_time_monitor_gluster_nodes": "Start time in seconds to monitor Gluster nodes when the heketi comes up",
    "start_time_monitor_gluster_nodes": 10,

    "_loglevel_comment": [
      "Set log level. Choices are:",
      "  none, critical, error, warning, info, debug",
      "Default is warning"
    ],
    "loglevel" : "warning"
  }
}
EOF

# Run heketi as a systemd service
cat << EOF > /usr/lib/systemd/system/heketi.service
[Unit]
Description=Heketi Server

[Service]
Type=simple
WorkingDirectory=/var/lib/heketi
User=heketi
ExecStart=/usr/bin/heketi --config=/etc/heketi/heketi.json
Restart=on-failure
StandardOutput=syslog
StandardError=syslog

[Install]
WantedBy=multi-user.target

EOF
systemctl enable heketi --now
systemctl status heketi -l # check that the heketi service is running
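# Optional check (a sketch, not part of the original steps): heketi answers on /hello, so a quick
# curl confirms the API is reachable before loading the topology
curl http://192.168.40.239:18080/hello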

# Create the Heketi cluster from a topology file
cat << EOF > /etc/heketi/topology.json
{
  "clusters": [
    {
      "nodes": [
        {
          "node": {
            "hostnames": {
              "manage": ["node1"],
              "storage": ["192.168.40.239"]
            },
            "zone": 1
          },
          "devices": [
            {
              "name": "/dev/vdb",
              "destroydata": false
            }
          ]
        },
        {
          "node": {
            "hostnames": {
              "manage": ["node2"],
              "storage": ["192.168.50.207"]
            },
            "zone": 1
          },
          "devices": [
            {
              "name": "/dev/vdb",
              "destroydata": false
            }
          ]
        },
        {
          "node": {
            "hostnames": {
              "manage": ["node3"],
              "storage": ["192.168.50.208"]
            },
            "zone": 1
          },
          "devices": [
            {
              "name": "/dev/vdb",
              "destroydata": false
            }
          ]
        },
        {
          "node": {
            "hostnames": {
              "manage": ["node6"],
              "storage": ["192.168.50.177"]
            },
            "zone": 1
          },
          "devices": [
            {
              "name": "/dev/vdb",
              "destroydata": false
            }
          ]
        }
      ]
    }
  ]
}
EOF

# Run on the heketi manager machine
heketi-cli --server http://192.168.40.239:18080 --user admin --secret admin@P@88W0rd topology load --json=/etc/heketi/topology.json
# Optional: add an alias
echo "alias heketi-cli='heketi-cli --server http://192.168.40.239:18080 --user admin --secret admin@P@88W0rd'" >> ~/.bashrc
heketi-cli cluster list # list cluster info and grab the cluster ID for the Kubernetes StorageClass below

# Using it from Kubernetes
# Every node still needs the basic client packages, otherwise the storage cannot be mounted
yum -y install openssh-server wget fuse fuse-libs openmpi libibverbs
dnf install centos-release-gluster9 -y
dnf install -y glusterfs glusterfs-api glusterfs-fuse glusterfs-rdma glusterfs-libs glusterfs-cli glusterfs-client-xlators

heketiSecret=$(echo -n "admin@P@88W0rd" | base64)
cat << EOF > /etc/heketi/heketi-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: heketi-secret
  namespace: kube-system
data:
  key: ${heketiSecret}
type: kubernetes.io/glusterfs
EOF

kubectl apply -f /etc/heketi/heketi-secret.yaml

cat << EOF > /etc/heketi/heketi-storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: glusterfs
  namespace: kube-system
parameters:
  resturl: "http://192.168.40.239:18080"
  clusterid: "9ad37206ce6575b5133179ba7c6e0935"
  restauthenabled: "true"
  restuser: "admin"
  secretName: "heketi-secret"
  secretNamespace: "kube-system"
  volumetype: "replicate:3" # 3-way replicated; "disperse:4:2" = dispersed, 4 data + 2 redundancy; "none" = plain distributed, no redundancy
provisioner: kubernetes.io/glusterfs
reclaimPolicy: Delete # Retain keeps the volume, Recycle scrubs it, Delete removes it
EOF

kubectl apply -f /etc/heketi/heketi-storageclass.yaml

cat << EOF > /etc/heketi/heketi-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: heketi-pvc
  annotations:
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/glusterfs
spec:
  storageClassName: "glusterfs"
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 20Gi
EOF

kubectl apply -f /etc/heketi/heketi-pvc.yaml
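# A hedged sketch (not in the original steps): mount the new PVC in a throwaway pod to confirm
# dynamic provisioning works end to end; the pod name is hypothetical
cat << EOF > /etc/heketi/heketi-pvc-test.yaml
apiVersion: v1
kind: Pod
metadata:
  name: heketi-pvc-test
spec:
  containers:
  - name: shell
    image: busybox
    command: ["sh", "-c", "echo ok > /data/ok.txt && sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: heketi-pvc
EOF
kubectl apply -f /etc/heketi/heketi-pvc-test.yaml
kubectl get pvc heketi-pvc # should become Bound once Heketi provisions the volume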

Troubleshooting:

1. After the glusterfs volume is mounted successfully, files can still be created, read, appended to and modified, and none of the operations on the directory are blocked.
For details see: https://www.cnblogs.com/wiseo/p/13035886.html

# Take the volume out of use first
ansible hosts -m shell -a "exit $(lsof /mnt/glusterfs)"
ansible hosts -m shell -a 'df -Th | grep -q "/mnt/glusterfs" && umount /mnt/glusterfs'
Check split-brain information:
gluster volume heal <VOLNAME> info

When this command is invoked it spawns a glfsheal process, which reads every entry in the subdirectories under .glusterfs/indices/ on each brick it can connect to;
these entries are the GFIDs of files that need healing;
once a GFID entry has been obtained from a brick, the file is looked up on every brick of the replica set together with its trusted.afr.* extended attributes to decide whether it needs healing, is in split-brain, or is in some other state.

Possible file states:
Is in split-brain: the file or directory must be repaired manually, otherwise it cannot self-heal
Is possibly undergoing heal: the file is locked while it is being checked for whether it needs healing

[root@node-08 glusterfs]# gluster volume heal online-share info 
Brick node1:/data/glusterfs/online-share/brick1
Status: Connected
Number of entries: 0

Brick node2:/data/glusterfs/online-share/brick1
/
Status: Connected
Number of entries: 1

Brick node3:/data/glusterfs/online-share/brick1
/
Status: Connected
Number of entries: 1

Brick node6:/data/glusterfs/online-share/brick1
Status: Connected
Number of entries: 0

# Corrupted files are normally listed with a split-brain tag; entries without the tag here still indicate that some files (in this case the directory) hit split-brain
Repairing split-brain:
# List the files in split-brain; entries that can heal on their own are handled by the self-heal daemon
gluster volume heal <VOLNAME> info split-brain
# Files that are genuinely in split-brain cannot be healed automatically; they need other measures
# The current case is directory split-brain, so that is where to start; if file split-brain shows up later, a section on repairing it will be added
# Oddly enough, simply unmounting the volume was enough for it to recover; the real cause was the two quorum settings above, which led to split-brain during use, so changing those settings is not recommended
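# For completeness, a hedged sketch of the CLI-based policies Gluster documents for resolving
# file split-brain on replicated volumes, should it appear later (<FILE> is a path relative to the
# volume root; pick the policy that matches which copy should win)
gluster volume heal online-share split-brain latest-mtime <FILE>
gluster volume heal online-share split-brain bigger-file <FILE>
gluster volume heal online-share split-brain source-brick node2:/data/glusterfs/online-share/brick1 <FILE>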