Date: December 13, 2024

How to Deploy Lustre with a ZFS Backend (RDMA, ACLs, Nodemaps, Clients)

 

This step-by-step guide walks you through deploying a production-ready Lustre filesystem backed by ZFS, including RDMA networking, MDT/OST setup, nodemaps, ACL configuration, and client mounting. This guide assumes:

  • MGS + MDS on one node
  • One or more OSS nodes
  • Clients mounting over RDMA (o2ib)
  • ZFS as the backend filesystem

0. Architecture & Assumptions

  • Filesystem name: lustrefs
  • MGS/MDS RDMA IP: 172.16.0.10
  • OSS RDMA IP: 172.16.0.20
  • Client RDMA IP: 172.16.0.30
  • RDMA interface: ib0
  • Network type: o2ib
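
For copy-paste convenience, the assumptions above can be captured once as shell variables (the variable names are my own; substitute your values):

```shell
# Deployment parameters from the assumptions above (variable names are arbitrary)
FSNAME=lustrefs
MGS_NID="172.16.0.10@o2ib"   # MGS/MDS NID on the o2ib fabric
IB_IF=ib0

# Later commands can then be expressed once, e.g. the client mount target:
echo "${MGS_NID}:/${FSNAME}"   # -> 172.16.0.10@o2ib:/lustrefs
```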

1. Management Server Setup (MGS + MDS with ZFS)

1.1 Install ZFS and Lustre MDS packages

sudo apt update
sudo apt install -y zfsutils-linux
sudo apt install -y lustre-osd-zfs-mount lustre-utils

1.2 Create a ZFS pool for MDT

sudo zpool create -o ashift=12 mdtpool mirror /dev/nvme0n1 /dev/nvme1n1
sudo zfs create -o recordsize=4K -o primarycache=metadata mdtpool/mdt0

1.3 Format MDT & enable MGS

sudo mkfs.lustre \
  --fsname=lustrefs \
  --mgs \
  --mdt \
  --index=0 \
  --backfstype=zfs mdtpool/mdt0
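
For reference, Lustre derives each target's label from the fsname, the target type, and a four-hex-digit index; a quick sketch of the names this and the later OST format step produce:

```shell
# Lustre target labels are <fsname>-<TYPE><index as 4 hex digits>
fsname=lustrefs
printf '%s-MDT%04X\n' "$fsname" 0   # -> lustrefs-MDT0000
printf '%s-OST%04X\n' "$fsname" 1   # -> lustrefs-OST0001
```

These labels are what later show up in lfs df and lctl dl output.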

1.4 Mount MDT

sudo mkdir -p /mnt/mdt0
sudo mount -t lustre mdtpool/mdt0 /mnt/mdt0

2. RDMA + LNET Configuration (All Nodes)

2.1 Install RDMA core utilities

sudo apt install -y rdma-core

2.2 Bring up the RDMA interface

sudo ip addr add 172.16.0.10/24 dev ib0   # use each node's own IP (.20 on the OSS, .30 on clients)
sudo ip link set ib0 up

2.3 Configure LNET to use o2ib

Create /etc/modprobe.d/lustre.conf:

options lnet networks="o2ib(ib0)"
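
Newer Lustre releases can instead configure LNET declaratively via /etc/lnet.conf, which the lnet systemd service loads at start. A minimal sketch in the YAML format produced by lnetctl export (verify the exact layout against your release):

```
net:
    - net type: o2ib
      local NI(s):
        - interfaces:
              0: ib0
```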

Load and enable LNET

sudo modprobe lnet
sudo systemctl enable lnet
sudo systemctl start lnet
sudo lctl list_nids
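
A quick sanity check on the NID string catches an accidental fallback to TCP. The snippet below uses a hardcoded sample NID; on a real node, substitute the output of lctl list_nids:

```shell
# Sample NID; on a real node use: nid="$(lctl list_nids | head -n1)"
nid="172.16.0.10@o2ib"
case "$nid" in
  *@o2ib*) echo "RDMA NID active: $nid" ;;
  *@tcp*)  echo "WARNING: LNET came up on TCP, not RDMA: $nid" ;;
  *)       echo "unexpected NID format: $nid" ;;
esac
```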

3. OFED / Mellanox Optional Performance Tuning

These settings are optional but recommended for high-performance Lustre deployments using Mellanox or OFED-based InfiniBand hardware.

3.1 Relevant config locations

  • /etc/infiniband/*
  • /etc/modprobe.d/mlx5.conf
  • /etc/security/limits.d/rdma.conf
  • /etc/sysctl.conf (MTU, hugepages, buffers)
  • /etc/rdma/modules/

3.2 Increase RDMA MTU (InfiniBand)

sudo ip link set ib0 mtu 65520   # requires IPoIB connected mode; datagram mode is limited to ~4092

3.3 Increase RDMA network buffers

echo 262144 | sudo tee /proc/sys/net/core/rmem_max
echo 262144 | sudo tee /proc/sys/net/core/wmem_max

These settings improve performance on high-speed links (56 Gb/s, 100 Gb/s, HDR100, etc.).
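
Values written to /proc/sys do not survive a reboot; to persist them, the same settings can go in a sysctl drop-in (the filename here is my own choice), applied with sudo sysctl --system:

```
# /etc/sysctl.d/90-lustre-rdma.conf
net.core.rmem_max = 262144
net.core.wmem_max = 262144
```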


4. OSS Node Setup (ZFS + OSTs)

4.1 Install ZFS + Lustre OSS components

sudo apt update
sudo apt install -y zfsutils-linux lustre-osd-zfs-mount lustre-utils

4.2 Create an OST ZFS pool

sudo zpool create -o ashift=12 ostpool raidz2 \
    /dev/sdc /dev/sdd /dev/sde /dev/sdf

sudo zfs create -o recordsize=1M ostpool/ost0

4.3 Format OST using RDMA to MGS

sudo mkfs.lustre \
  --fsname=lustrefs \
  --ost \
  --index=0 \
  --mgsnode=172.16.0.10@o2ib \
  --backfstype=zfs ostpool/ost0

4.4 Mount OST

sudo mkdir -p /mnt/ost0
sudo mount -t lustre ostpool/ost0 /mnt/ost0

5. Client Node Setup

5.1 Install Lustre client packages

sudo apt update
sudo apt install -y lustre-client-modules-$(uname -r) lustre-utils

If the MGS, OSS, and client are all configured correctly, the mount in section 9 will succeed immediately; no further registration step is needed.

5.2 Configure RDMA + LNET (same as above)

sudo ip addr add 172.16.0.30/24 dev ib0
sudo ip link set ib0 up

echo 'options lnet networks="o2ib(ib0)"' | sudo tee /etc/modprobe.d/lustre.conf

sudo modprobe lnet
sudo systemctl start lnet
sudo lctl list_nids

6. How to Get Lustre Target Names

List OSTs

lfs osts

List MDTs

lfs mdts

List all targets and connections

lctl dl

Check space and OST availability

lfs df -h
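
Output from lfs df is plain columnar text and easy to post-process. A sketch that pulls the per-OST usage, fed sample lines here (on a real system, pipe lfs df -h instead of the here-document):

```shell
# Print "<target> <used>" for each OST line; sample input mimics `lfs df -h` output
awk '/OST/ {print $1, $3}' <<'EOF'
lustre-MDT0000_UUID         4.5G        1.9M        4.1G   1% /mnt/lustre-client[MDT:0]
lustre-OST0000_UUID         7.5G        1.2M        7.0G   1% /mnt/lustre-client[OST:0]
lustre-OST0001_UUID         3.9G        1.2M        3.7G   1% /mnt/lustre-client[OST:1]
EOF
```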

7. Nodemap Configuration (Access Control)

7.1 Activate nodemaps

The nodemap feature is switched on at the MGS. The "default" nodemap already exists and catches any client not matched by a named nodemap; setting its trusted property keeps identity (pass-through) UID/GID mapping:

sudo lctl nodemap_activate 1
sudo lctl nodemap_modify --name default --property trusted --value 1

7.2 Restrict access to an RDMA subnet

IP ranges cannot be attached to the default nodemap, so create a named one (the name rdma_clients is arbitrary) and add the o2ib subnet to it:

sudo lctl nodemap_add rdma_clients
sudo lctl nodemap_add_range --name rdma_clients --range 172.16.0.[1-254]@o2ib
sudo lctl nodemap_modify --name rdma_clients --property trusted --value 1

7.3 Force read-only mounts (optional)

Recent Lustre releases (2.13+) can force a nodemap's clients to mount read-only:

sudo lctl nodemap_modify --name default --property readonly_mount --value 1

8. ACL Configuration (ZFS + Lustre)

8.1 Enable ACL support in ZFS (MDT)

sudo zfs set acltype=posixacl mdtpool/mdt0
sudo zfs set xattr=sa mdtpool/mdt0
sudo zfs set compression=off mdtpool/mdt0

8.2 Verify ACLs are enabled in Lustre

ACL support is negotiated between client and MDS at connect time and is enabled by default on modern Lustre. From a mounted client, confirm the MDC import advertises it:

lctl get_param mdc.*.import | grep acl

8.3 Use ACLs from clients

sudo setfacl -m u:alice:rwx /mnt/lustre/data
getfacl /mnt/lustre/data

9. Mounting Lustre on Clients (Over RDMA)

9.1 Mount command

sudo mkdir -p /mnt/lustre

sudo mount -t lustre \
  172.16.0.10@o2ib:/lustrefs \
  /mnt/lustre

Example over TCP instead of InfiniBand (from a small test VM):
[root@vbox ~]# mount -t lustre 192.168.50.5@tcp:/lustre /mnt/lustre-client
[root@vbox ~]# 
[root@vbox ~]# # Verify the mount worked
[root@vbox ~]# df -h /mnt/lustre-client
Filesystem                Size  Used Avail Use% Mounted on
192.168.50.5@tcp:/lustre   12G  2.5M   11G   1% /mnt/lustre-client
[root@vbox ~]# lfs df -h
UUID                       bytes        Used   Available Use% Mounted on
lustre-MDT0000_UUID         4.5G        1.9M        4.1G   1% /mnt/lustre-client[MDT:0]
lustre-OST0000_UUID         7.5G        1.2M        7.0G   1% /mnt/lustre-client[OST:0]
lustre-OST0001_UUID         3.9G        1.2M        3.7G   1% /mnt/lustre-client[OST:1]
filesystem_summary:        11.4G        2.4M       10.7G   1% /mnt/lustre-client

9.2 Verify the mount

df -h /mnt/lustre
lfs df -h

9.3 Persistent fstab entry

172.16.0.10@o2ib:/lustrefs  /mnt/lustre  lustre  _netdev,defaults  0 0
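
On systemd hosts, a mount unit is an alternative to fstab that makes the LNET ordering explicit. A sketch (the unit filename must be the mount path with slashes replaced by dashes); enable it with sudo systemctl enable --now mnt-lustre.mount:

```
# /etc/systemd/system/mnt-lustre.mount
[Unit]
Description=Lustre client filesystem
Requires=lnet.service
After=lnet.service network-online.target

[Mount]
What=172.16.0.10@o2ib:/lustrefs
Where=/mnt/lustre
Type=lustre
Options=_netdev

[Install]
WantedBy=multi-user.target
```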

10. Summary of the Correct Order

  1. Install ZFS + Lustre on MGS/MDS
  2. Create MDT ZFS dataset & format MDT+MGS
  3. Configure RDMA + LNET
  4. Apply optional OFED/Mellanox tuning
  5. Install ZFS + Lustre on OSS, create OSTs
  6. Format and mount OSTs
  7. Install Lustre client packages
  8. Mount client via RDMA
  9. Retrieve target names (OST/MDT)
  10. Configure nodemaps
  11. Configure ACLs

Final Notes

You now have a complete ZFS-backed Lustre filesystem with RDMA transport, OFED/Mellanox tunings, ACLs, and nodemaps. This layout provides high-grade parallel filesystem performance and clean scalability.

Note: I have also created an Ansible playbook that can deploy this across clients and test everything; it is currently not a public repo. Email me at support@nicktailor.com if you would like to hire me to set it up.

├── inventory/
│   └── hosts.yml              # Inventory file with host definitions
├── group_vars/
│   └── all.yml                # Global variables
├── roles/
│   ├── infiniband/            # InfiniBand/RDMA setup
│   ├── zfs/                   # ZFS installation and configuration
│   ├── lustre_mgs_mds/        # MGS/MDS server setup
│   ├── lustre_oss/            # OSS server setup
│   ├── lustre_client/         # Client setup
│   ├── lustre_nodemaps/       # Nodemap configuration
│   └── lustre_acls/           # ACL configuration
├── site.yml                   # Main deployment playbook
├── test_connectivity.yml      # Connectivity testing playbook
└── README.md                  # This file
