Deploying Lustre File System with RDMA, Node Maps, and ACLs
Lustre is the de facto standard parallel file system for high-performance computing (HPC) clusters, providing extreme scalability, high throughput, and low-latency access across thousands of nodes. This guide walks through a complete deployment of Lustre using RDMA over InfiniBand for performance, Node Maps for client access control, and ACLs for fine-grained permissions.
1. Understanding the Lustre Architecture
Lustre separates metadata and data services into distinct roles:
- MGS (Management Server) – Manages Lustre configuration and coordinates cluster services.
- MDT (Metadata Target) – Stores file system metadata (names, permissions, directories).
- OST (Object Storage Target) – Stores file data blocks.
- Clients – Mount and access the Lustre file system for I/O.
The typical architecture looks like this:
+-------------+        +-------------+
|  Client 1   |        |  Client 2   |
| /mnt/lustre |        | /mnt/lustre |
+------+------+        +------+------+
       |                      |
       +------o2ib (RDMA)-----+
                  |
          +-------+-------+
          |    OSS/OST    |
          |   (Data I/O)  |
          +-------+-------+
                  |
          +-------+-------+
          |    MGS/MDT    |
          |   (Metadata)  |
          +---------------+
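Once the file system is mounted (section 5), this division of labor is easy to observe through file striping; the directory and file names below are only examples:
mkdir -p /mnt/lustre/striped_dir
lfs setstripe -c 2 -S 1M /mnt/lustre/striped_dir            # stripe new files across 2 OSTs in 1 MiB chunks
dd if=/dev/zero of=/mnt/lustre/striped_dir/test.bin bs=1M count=64
lfs getstripe /mnt/lustre/striped_dir/test.bin              # shows which OST objects back the file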
2. Prerequisites and Environment
| Component | Requirements |
|---|---|
| OS | RHEL / Rocky / AlmaLinux 8.x or higher |
| Kernel | Built with Lustre and OFED RDMA modules |
| Network | InfiniBand fabric (Mellanox or compatible) |
| Lustre Version | 2.14 or later |
| Devices | Separate block devices for the MDT and OST(s) |
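A quick sanity check against this table might look like the following (a sketch; vendor and device names will differ per system):
grep -E '^(NAME|VERSION_ID)=' /etc/os-release    # RHEL / Rocky / Alma 8.x or newer?
uname -r                                         # kernel the Lustre and OFED modules must match
lspci | grep -iE 'infiniband|mellanox'           # InfiniBand HCA present?
lsblk                                            # spare block devices available for MDT/OST?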
3. Install Lustre Packages
On MGS/MDS and OSS Nodes:
dnf install -y lustre kmod-lustre lustre-osd-ldiskfs
On Client Nodes:
dnf install -y lustre-client kmod-lustre-client
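Before moving on, it can be worth confirming that the packages and the running kernel actually agree (a minimal check; it assumes the matching kmod packages installed cleanly):
modprobe -v lustre              # loads lnet and the Lustre client modules
lsmod | grep -E 'lustre|lnet'
lctl get_param version          # prints the Lustre version the kernel modules report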
4. Configure InfiniBand and RDMA (o2ib)
InfiniBand provides the lowest latency for Lustre communication via RDMA. Configure the o2ib network type for Lustre.
1. Install and verify InfiniBand stack
dnf install -y rdma-core infiniband-diags perftest libibverbs-utils
systemctl enable --now rdma
ibstat
2. Configure IB network
nmcli con add type infiniband ifname ib0 con-name ib0 ip4 10.0.0.1/24
nmcli con up ib0
3. Verify RDMA link
ibv_devinfo
The pingpong test needs two nodes: start it on one node, then connect from a second (the device name and peer hostname are examples):
ibv_rc_pingpong -d mlx5_0
ibv_rc_pingpong -d mlx5_0 <peer-node>
4. Configure LNET for o2ib
Create /etc/modprobe.d/lustre.conf with:
options lnet networks="o2ib(ib0)"
modprobe lnet
lnetctl lnet configure
lnetctl net add --net o2ib --if ib0
lnetctl net show
Expected output:
net:
    - net type: o2ib
      interfaces:
          0: ib0
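To keep this LNET configuration across reboots, one option is to export it to /etc/lnet.conf, which the lnet systemd unit shipped with the Lustre packages reads at start-up (hedged; unit and file names can differ between versions):
lnetctl export > /etc/lnet.conf
systemctl enable lnet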
5. Format and Mount Lustre Targets
Metadata Server (MGS + MDT)
mkfs.lustre --fsname=lustrefs --mgs --mdt --index=0 /dev/sdb
mkdir -p /mnt/mdt
mount -t lustre /dev/sdb /mnt/mdt
Object Storage Server (OSS)
mkfs.lustre --fsname=lustrefs --ost --index=0 --mgsnode=<MGS>@o2ib /dev/sdc
mkdir -p /mnt/ost
mount -t lustre /dev/sdc /mnt/ost
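Additional OSTs follow the same pattern with an incremented index; the device name below is only an example:
mkfs.lustre --fsname=lustrefs --ost --index=1 --mgsnode=<MGS>@o2ib /dev/sdd
mkdir -p /mnt/ost1
mount -t lustre /dev/sdd /mnt/ost1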
Client Node
mkdir -p /mnt/lustre
mount -t lustre <MGS>@o2ib:/lustrefs /mnt/lustre
For example, with the MGS at 172.16.0.10 on the o2ib network:
mount -t lustre 172.16.0.10@o2ib:/lustrefs /mnt/lustre
Example without InfiniBand (using TCP-based LNET instead of o2ib):
[root@vbox ~]# mount -t lustre 172.16.0.10@tcp:/lustre /mnt/lustre-client
[root@vbox ~]#
[root@vbox ~]# # Verify the mount worked
[root@vbox ~]# df -h /mnt/lustre-client
Filesystem Size Used Avail Use% Mounted on
172.16.0.10@tcp:/lustre 12G 2.5M 11G 1% /mnt/lustre-client
[root@vbox ~]# lfs df -h
UUID bytes Used Available Use% Mounted on
lustre-MDT0000_UUID 4.5G 1.9M 4.1G 1% /mnt/lustre-client[MDT:0]
lustre-OST0000_UUID 7.5G 1.2M 7.0G 1% /mnt/lustre-client[OST:0]
lustre-OST0001_UUID 3.9G 1.2M 3.7G 1% /mnt/lustre-client[OST:1]
filesystem_summary: 11.4G 2.4M 10.7G 1% /mnt/lustre-client
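To make these mounts persistent, /etc/fstab entries along the following lines can be used (a sketch; adjust devices, NIDs, and mount points to your environment):
# On the MGS/MDS node
/dev/sdb                    /mnt/mdt     lustre  defaults,_netdev  0 0
# On the OSS node
/dev/sdc                    /mnt/ost     lustre  defaults,_netdev  0 0
# On client nodes
172.16.0.10@o2ib:/lustrefs  /mnt/lustre  lustre  defaults,_netdev  0 0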
6. Configuring Node Maps (Access Control)
Node maps allow administrators to restrict Lustre client access based on network or host identity.
1. View the current node maps (node map commands are run on the MGS)
lctl nodemap_info
2. Create a new node map for trusted clients
lctl nodemap_add trusted_clients
3. Add the allowed client NID range
lctl nodemap_add_range --name trusted_clients --range 10.0.0.[0-255]@o2ib
4. Give those clients full (admin, trusted) access
lctl nodemap_modify --name trusted_clients --property admin --value 1
lctl nodemap_modify --name trusted_clients --property trusted --value 1
5. Restrict the default node map and activate enforcement
lctl nodemap_modify --name default --property deny_unknown --value 1
lctl nodemap_activate 1
With this in place, only clients whose NIDs fall in 10.0.0.[0-255]@o2ib receive full access to the Lustre file system; everything else lands in the restricted default node map.
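To confirm how a given client will be classified before it ever mounts, lctl can resolve a NID to its node map (run on the MGS; the NIDs below are hypothetical):
lctl nodemap_test_nid 10.0.0.15@o2ib      # expected: trusted_clients
lctl nodemap_test_nid 192.168.1.50@tcp    # expected: default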
7. Configuring Access Control Lists (ACLs)
Lustre supports standard POSIX ACLs for fine-grained directory and file permissions.
1. Enable ACL support
ACL support is controlled on the metadata server and is enabled by default in current Lustre releases. If it has been disabled, remount the MDT with the acl option:
mount -t lustre -o acl /dev/sdb /mnt/mdt
2. Verify ACL support from a client
lctl get_param mdc.*.connect_flags | grep acl
The acl flag should appear in the output.
3. Set ACLs on directories
setfacl -m u:researcher:rwx /mnt/lustre/projects
setfacl -m g:analysts:rx /mnt/lustre/reports
4. View ACLs
getfacl /mnt/lustre/projects
Sample output:
# file: projects
# owner: root
# group: root
user::rwx
user:researcher:rwx
group::r-x
group:analysts:r-x
mask::rwx
other::---
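Default (inherited) ACLs are often useful on shared directories so that newly created files pick up the same access rules; the users and groups here are the same illustrative ones as above:
setfacl -d -m u:researcher:rwx /mnt/lustre/projects   # new entries inherit researcher rwx
setfacl -d -m g:analysts:rx /mnt/lustre/reports       # new entries inherit analysts r-x
getfacl /mnt/lustre/projects                          # default: lines now appear below the access ACL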
8. Verifying Cluster Health
On all nodes:
lctl ping <MGS>@o2ib
lctl dl
lctl list_nids
Check LNET traffic statistics (send/receive counts over the o2ib network):
lnetctl stats show
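For a raw RDMA bandwidth figure independent of Lustre, the perftest tools installed in section 4 can be run between two nodes (the device name and address below are examples):
ib_write_bw -d mlx5_0              # on the first node (server side)
ib_write_bw -d mlx5_0 10.0.0.1     # on the second node, pointed at the first node's IB address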
Check file system mount from client:
df -h /mnt/lustre
Optional: Check node map enforcement
Try mounting from an unauthorized IP — it should fail:
mount -t lustre <MGS>@o2ib:/lustrefs /mnt/test
mount.lustre: mount <MGS>@o2ib:/lustrefs at /mnt/test failed: Permission denied
9. Common Issues and Troubleshooting
| Issue | Possible Cause | Resolution |
|---|---|---|
| Mount failed: no route to host | IB subnet mismatch or LNET not configured | Verify lnetctl net show and ping -I ib0 between nodes. |
| Permission denied | Node map restriction active | Check lctl nodemap_info and ensure the client NID range is allowed. |
| Slow performance | RDMA disabled or fallback to TCP | Verify lctl list_nids shows the @o2ib transport. |
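When any of the above needs deeper digging, the Lustre kernel debug log and dmesg are usually the next stop (a sketch; the debug flag added here is just an example):
lctl set_param debug=+neterror     # add network error messages to the debug mask
lctl dk /tmp/lustre-debug.log      # dump and clear the kernel debug buffer to a file
dmesg | grep -iE 'lustre|lnet'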
10. Final Validation Checklist
- InfiniBand RDMA verified with ibv_rc_pingpong
- LNET configured for o2ib(ib0)
- MGS, MDT, and OST mounted successfully
- Clients connected via @o2ib
- Node maps restricting unauthorized hosts
- ACLs correctly enforcing directory-level access
Summary
With RDMA transport, Lustre achieves near line-rate performance while node maps and ACLs enforce robust security and access control. This combination provides a scalable, high-performance, and policy-driven storage environment ideal for AI, HPC, and research workloads.
