Copied from: https://community.mellanox.com/s/article/howto-configure-nvme-over-fabrics

This post is a quick guide to bringing up an NVMe over Fabrics host-to-target association using the RDMA transport layer.

NVMEoF can run over any RDMA-capable adapter (e.g. ConnectX-3 or ConnectX-4) using an IB or RoCE link layer.


Note: This post focuses on NVMEoF configuration for the target and host, and assumes that the RDMA layer is enabled. Refer to RDMA/RoCE Solutions for topics related to the RDMA layer.

Setup

(See the original post for the setup diagram and the Mellanox Academy configuration video.)

Before you Start

Using MLNX_OFED

Note that MLNX_OFED does not necessarily have to be installed on the servers. In case MLNX_OFED is needed, install v3.4.2 or later.

See HowTo Install MLNX_OFED Driver and make sure to install it with the --add-kernel-support and --with-nvmf flags.

# ./mlnxofedinstall --add-kernel-support --with-nvmf

Benchmarks

Make sure that the RDMA layer is configured correctly and that it is running.

Test the RDMA performance using one of the methods, see for example: HowTo Enable, Verify and Troubleshoot RDMA.

In case MLNX_OFED is not installed for RDMA benchmark testing, follow HowTo Enable Perftest Package for Upstream Kernel to verify that the RDMA layer is working correctly using the perftest package (ib_send_bw, ib_write_bw, etc.).
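For example, a minimal bandwidth test with the perftest tools could look as follows (the RDMA device name mlx5_0 and the server IP 1.1.1.1 are assumptions for this sketch; adjust them to your setup). On the server side of the test run:

# ib_write_bw -d mlx5_0 -R

On the client side, point the same tool at the server's IP:

# ib_write_bw -d mlx5_0 -R 1.1.1.1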

InfiniBand Network Considerations

This post discusses the Ethernet (RoCE) network; the InfiniBand configuration is essentially the same, as NVMEoF is agnostic to the link layer.

To enable NVMEoF over an InfiniBand network, make sure that a Subnet Manager is running in the fabric and that IP addresses are configured on the IPoIB interfaces (the RDMA-CM connection establishment used by NVMEoF requires them); the remaining steps are identical. For example:
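This is a minimal sketch only, assuming the opensm package is installed and the IPoIB interface is named ib0 (both are assumptions; adjust to your fabric). Start a Subnet Manager on one node (skip this if a managed switch already runs one), then assign an IP address to the IPoIB interface, since RDMA-CM uses it as the transport address:

# /etc/init.d/opensmd start

# modprobe ib_ipoib

# ip addr add 1.1.1.1/24 dev ib0

# ip link set ib0 up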

Prerequisites

  1. Follow HowTo Compile Linux Kernel for NVMe over Fabrics and make sure that you have nvme modules on the client and target servers.

  2. Make sure that the mlx4 (ConnectX-3/ConnectX-3 Pro) or mlx5 (ConnectX-4/ConnectX-4 Lx) drivers are loaded.

mlx4 Driver Example:

# modprobe mlx4_core

# lsmod | grep mlx

mlx4_ib 148806 0

ib_core 195846 13 rdma_cm,ib_cm,iw_cm,rpcrdma,mlx4_ib,ib_srp,ib_ucm,ib_iser,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib,ib_isert

mlx4_en 97313 0

ptp 12434 1 mlx4_en

mlx4_core 294165 2 mlx4_en,mlx4_ib

mlx5 Driver Example:

# modprobe mlx5_core

# lsmod | grep mlx

mlx5_ib 167936 0

ib_core 208896 14 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm

mlx5_core 188416 1 mlx5_ib

  3. On the target server, load the nvmet and nvmet-rdma kernel modules.

# modprobe nvmet

# modprobe nvmet-rdma

# modprobe nvme-rdma    <-- needed only if you also want to run a client (initiator) on the target server

# lsmod | grep nvme

nvmet_rdma 24576 1

nvmet 49152 7 nvmet_rdma

rdma_cm 53248 2 rdma_ucm,nvmet_rdma

ib_core 237568 11 ib_cm,rdma_cm,ib_umad,ib_uverbs,ib_ipoib,iw_cm,mlx5_ib,ib_ucm,rdma_ucm,nvmet_rdma,mlx4_ib

mlx_compat 16384 16 ib_cm,rdma_cm,ib_umad,ib_core,ib_uverbs,nvmet,mlx4_en,ib_ipoib,mlx5_core,iw_cm,mlx5_ib,mlx4_core,ib_ucm,rdma_ucm,nvmet_rdma,mlx4_ib

nvme 28672 2

nvme_core 36864 3 nvme

  4. On the client server, load the nvme-rdma kernel module.

# modprobe nvme-rdma

# lsmod | grep nvme

nvme_rdma 28672 0

nvme_fabrics 20480 1 nvme_rdma

nvme 28672 0

nvme_core 49152 3 nvme_fabrics,nvme_rdma,nvme

rdma_cm 53248 2 nvme_rdma,rdma_ucm

ib_core 237568 11 ib_cm,rdma_cm,ib_umad,nvme_rdma,ib_uverbs,ib_ipoib,iw_cm,mlx5_ib,ib_ucm,rdma_ucm,mlx4_ib

mlx_compat 16384 18 ib_cm,rdma_cm,ib_umad,nvme_fabrics,ib_core,nvme_rdma,ib_uverbs,nvme,nvme_core,mlx4_en,ib_ipoib,mlx5_core,iw_cm,mlx5_ib,mlx4_core,ib_ucm,rdma_ucm,mlx4_ib

NVME Target Configuration

Prerequisites

For more information about NVMe subsystem, refer to: http://www.nvmexpress.org/specifications/

  1. Create an nvmet-rdma subsystem. Run the 'mkdir /sys/kernel/config/nvmet/subsystems/<nvme-subsystem-name>' command; any name can be selected.

# mkdir /sys/kernel/config/nvmet/subsystems/nvme-subsystem-name

# cd /sys/kernel/config/nvmet/subsystems/nvme-subsystem-name

  2. Allow any host to be connected to this target:

# echo 1 > attr_allow_any_host

Note: ACLs are supported, yet they are not described in this post.
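If you do want to restrict access, the nvmet configfs also exposes a hosts directory and a per-subsystem allowed_hosts directory. The following is a minimal sketch run from the subsystem directory created above, assuming the client's host NQN is nqn.2014-08.org.nvmexpress:uuid:host-1 (an example value; the real one can be read from /etc/nvme/hostnqn on the client): disable allow_any_host, register the host NQN, and link it into allowed_hosts.

# echo 0 > attr_allow_any_host

# mkdir /sys/kernel/config/nvmet/hosts/nqn.2014-08.org.nvmexpress:uuid:host-1

# ln -s /sys/kernel/config/nvmet/hosts/nqn.2014-08.org.nvmexpress:uuid:host-1 allowed_hosts/nqn.2014-08.org.nvmexpress:uuid:host-1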

  3. Create a namespace inside the subsystem using the 'mkdir namespaces/<ns_num>' command, where <ns_num> is the number of the namespace to create (similar to a LUN).

# mkdir namespaces/10

# cd namespaces/10

  4. Set the path to the NVMe device (e.g. /dev/nvme0n1) and enable the namespace.

# echo -n /dev/nvme0n1 > device_path

# echo 1 > enable

Note: The enable command will fail if no NVMe device is installed. For NVMEoF network benchmarking, you can use a null block device instead:

# modprobe null_blk nr_devices=1

# ls /dev/nullb0

/dev/nullb0

# echo -n /dev/nullb0 > device_path

# echo 1 > enable

  5. Create a directory for an NVMe port using the 'mkdir /sys/kernel/config/nvmet/ports/<port_num>' command. Any port number can be used.

# mkdir /sys/kernel/config/nvmet/ports/1

# cd /sys/kernel/config/nvmet/ports/1

  6. Set the IP address of the relevant port using the 'echo <ip_address> > addr_traddr' command, where traddr is the transport address.

Set the IP address on the Mellanox adapter. For example:

# ip addr add 1.1.1.1/24 dev enp2s0f0

The address configured on the port should be the address on which the NVMe target listens (1.1.1.1 in this example). Run:

# echo 1.1.1.1 > addr_traddr

  7. Set RDMA as the transport type and set the RDMA transport service port. Any port number can be used; in the following example the port is 4420, which is the default IANA assignment for NVMe over Fabrics.

# echo rdma > addr_trtype

# echo 4420 > addr_trsvcid

  8. Set IPv4 as the address family of the port:

# echo ipv4 > addr_adrfam

  9. Create a soft link that binds the subsystem to the port:

# ln -s /sys/kernel/config/nvmet/subsystems/nvme-subsystem-name /sys/kernel/config/nvmet/ports/1/subsystems/nvme-subsystem-name

  10. Check dmesg to make sure that the NVMe target is listening on the port:

# dmesg | grep "enabling port"

[ 1066.294179] nvmet_rdma: enabling port 1 (1.1.1.1:4420)

At this point, the NVMe target is ready to accept connection requests.
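Before moving on to the client, you can also read the port attributes back from configfs to confirm they match what was configured; with the values used in this example the output should look similar to the following:

# cat /sys/kernel/config/nvmet/ports/1/addr_trtype /sys/kernel/config/nvmet/ports/1/addr_traddr /sys/kernel/config/nvmet/ports/1/addr_trsvcid

rdma

1.1.1.1

4420

# ls /sys/kernel/config/nvmet/ports/1/subsystems/

nvme-subsystem-name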

NVMe Client (Initiator) Configuration

NVMe has a user-space utility for executing NVMe commands. This tool, called nvme-cli, supports the NVMe over Fabrics functionality and is essential for some of the operations below.

  1. Install nvme-cli. Clone nvme-cli from its Git repository:

# git clone https://github.com/linux-nvme/nvme-cli.git

Cloning into 'nvme-cli'...

remote: Counting objects: 1741, done.

remote: Total 1741 (delta 0), reused 0 (delta 0), pack-reused 1741

Receiving objects: 100% (1741/1741), 862.69 KiB | 384.00 KiB/s, done.

Resolving deltas: 100% (1188/1188), done.

  2. Compile and install nvme-cli. Run make and make install:

# cd nvme-cli

# make

# make install
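Alternatively, most recent distributions ship nvme-cli as a package, so building from source is not strictly required; for example (package names and availability depend on your distribution):

# yum install nvme-cli

# apt-get install nvme-cli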

  3. Verify the installation by running the nvme command:

# nvme

nvme-0.8

usage: nvme <command> [<device>] [<args>]

The '<device>' may be either an NVMe character device (ex: /dev/nvme0) or an

nvme block device (ex: /dev/nvme0n1).

The following are all implemented sub-commands:

list List all NVMe devices and namespaces on machine

id-ctrl Send NVMe Identify Controller

id-ns Send NVMe Identify Namespace, display structure

list-ns Send NVMe Identify List, display structure

create-ns Creates a namespace with the provided parameters

delete-ns Deletes a namespace from the controller

attach-ns Attaches a namespace to requested controller(s)

detach-ns Detaches a namespace from requested controller(s)

list-ctrl Send NVMe Identify Controller List, display structure

get-ns-id Retrieve the namespace ID of opened block device

get-log Generic NVMe get log, returns log in raw format

fw-log Retrieve FW Log, show it

smart-log Retrieve SMART Log, show it

smart-log-add Retrieve additional SMART Log, show it

error-log Retrieve Error Log, show it

get-feature Get feature and show the resulting value

set-feature Set a feature and show the resulting value

format Format namespace with new block format

fw-activate Activate new firmware slot

fw-download Download new firmware

admin-passthru Submit arbitrary admin command, return results

io-passthru Submit an arbitrary IO command, return results

security-send Submit a Security Send command, return results

security-recv Submit a Security Receive command, return results

resv-acquire Submit a Reservation Acquire, return results

resv-register Submit a Reservation Register, return results

resv-release Submit a Reservation Release, return results

resv-report Submit a Reservation Report, return results

dsm Submit a Data Set Management command, return results

flush Submit a Flush command, return results

compare Submit a Compare command, return results

read Submit a read command, return results

write Submit a write command, return results

write-zeroes Submit a write zeroes command, return results

write-uncor Submit a write uncorrectable command, return results

reset Resets the controller

subsystem-reset Resets the controller

show-regs Shows the controller registers. Requires admin character device

discover Discover NVMeoF subsystems

connect-all Discover and Connect to NVMeoF subsystems

connect Connect to NVMeoF subsystem

disconnect Disconnect from NVMeoF subsystem

version Shows the program version

help Display this help

See 'nvme help <command>' for more information on a specific command

The following are all installed plugin extensions:

intel Intel vendor specific extensions

lnvm LightNVM specific extensions

See 'nvme <plugin> help' for more information on a plugin

  4. Re-check that the nvme-rdma module is loaded. If not, load it using 'modprobe nvme-rdma'.

# lsmod | grep nvme

nvme_rdma 19605 0

nvme_fabrics 10929 1 nvme_rdma

nvme_core 43067 2 nvme_fabrics,nvme_rdma

rdma_cm 45356 5 rpcrdma,nvme_rdma,ib_iser,rdma_ucm,ib_isert

ib_core 195846 14 rdma_cm,ib_cm,iw_cm,rpcrdma,mlx4_ib,ib_srp,ib_ucm,nvme_rdma,ib_iser,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib,ib_isert

  5. Discover the available subsystems on the NVMe over Fabrics target using the 'nvme discover -t rdma -a <target_ip> -s <port>' command.

Make sure to use the IP of the target port.

# nvme discover -t rdma -a 1.1.1.1 -s 4420

Discovery Log Number of Records 1, Generation counter 1

=====Discovery Log Entry 0======

trtype: rdma

adrfam: ipv4

subtype: nvme subsystem

treq: not specified

portid: 1

trsvcid: 4420

subnqn: nvme-subsystem-name

traddr: 1.1.1.1

rdma_prtype: not specified

rdma_qptype: connected

rdma_cms: rdma-cm

rdma_pkey: 0x0000

Note: Make sure you are aware of the subnqn name; in this case the value is nvme-subsystem-name.

  6. Connect to the discovered subsystem using the 'nvme connect -t rdma -n <subnqn> -a <target_ip> -s <port>' command:

# nvme connect -t rdma -n nvme-subsystem-name -a 1.1.1.1 -s 4420

# lsblk

NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT

sda 8:0 0 930.4G 0 disk

├─sda2 8:2 0 929.9G 0 part

│ ├─centos-swap 253:1 0 31.5G 0 lvm [SWAP]

│ ├─centos-home 253:2 0 100G 0 lvm /home

│ └─centos-root 253:0 0 798.4G 0 lvm /

└─sda1 8:1 0 500M 0 part /boot

nvme0n1 259:0 0 250G 0 disk

Note: nvme0n1 block device was created.
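To confirm that this block device belongs to a fabrics controller connected over RDMA, you can inspect the controller's sysfs attributes; a quick check, assuming the controller was enumerated as nvme0 (the output should look similar to the following):

# cat /sys/class/nvme/nvme0/transport

rdma

# cat /sys/class/nvme/nvme0/address

traddr=1.1.1.1,trsvcid=4420

# cat /sys/class/nvme/nvme0/subsysnqn

nvme-subsystem-name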

  7. In order to disconnect from the target, run the nvme disconnect command:

# nvme disconnect -d /dev/nvme0n1

Fast Startup and Persistent Configuration Scripts

Target Configuration

  1. Create a persistent interface configuration:

# cat /etc/sysconfig/network-scripts/ifcfg-enp2s0f0

DEVICE=enp2s0f0

BOOTPROTO=static

IPADDR=1.1.1.1

NETMASK=255.255.255.0

ONBOOT=yes

  2. Copy the following script to /etc/rc.d/rc.local (or create a startup script for Linux):

#!/bin/bash

# NVME Target Configuration

# Assuming the following:

# Interface is enp2s0f0

# IP is 1.1.1.1/24

# link is Up

# Using NULL Block device nullb0

# Change the parameters below to suit your setup

modprobe mlx5_core

modprobe nvmet

modprobe nvmet-rdma

modprobe nvme-rdma

modprobe null_blk nr_devices=1

mkdir /sys/kernel/config/nvmet/subsystems/nvme-subsystem-name

cd /sys/kernel/config/nvmet/subsystems/nvme-subsystem-name

echo 1 > attr_allow_any_host

mkdir namespaces/10

cd namespaces/10

echo -n /dev/nullb0 > device_path

echo 1 > enable

mkdir /sys/kernel/config/nvmet/ports/1

cd /sys/kernel/config/nvmet/ports/1

echo 1.1.1.1 > addr_traddr

echo rdma > addr_trtype

echo 4420 > addr_trsvcid

echo ipv4 > addr_adrfam

ln -s /sys/kernel/config/nvmet/subsystems/nvme-subsystem-name /sys/kernel/config/nvmet/ports/1/subsystems/nvme-subsystem-name

# End of NVMe Target Configuration
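As an alternative to a hand-written configfs script, the nvmetcli utility (if installed on the target) can save the running target configuration to a JSON file and replay it at boot; a sketch, assuming nvmetcli is available:

# nvmetcli save /etc/nvmet/config.json

# nvmetcli restore /etc/nvmet/config.json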

  3. Make sure that the mode is +x:

# chmod ugo+x /etc/rc.d/rc.local

  4. Reboot the server:

# reboot

  5. Verify that the target is enabled on the interface:

# lsmod | grep nvme

nvme_rdma 28672 0

nvme_fabrics 20480 1 nvme_rdma

nvme_core 45056 2 nvme_fabrics,nvme_rdma

nvmet_rdma 24576 1

nvmet 49152 7 nvmet_rdma

rdma_cm 53248 3 nvme_rdma,rdma_ucm,nvmet_rdma

ib_core 147456 14 ib_cm,rdma_cm,ib_umad,nvme_rdma,ib_uverbs,ib_mad,ib_ipoib,ib_sa,iw_cm,mlx5_ib,ib_ucm,rdma_ucm,nvmet_rdma,mlx4_ib

# dmesg | grep "enabling port"

[ 55.766228] enabling port 1 (1.1.1.1:4420)

Client Configuration

  1. Create a persistent interface configuration:

# cat /etc/sysconfig/network-scripts/ifcfg-enp2s0f0

DEVICE=enp2s0f0

BOOTPROTO=static

IPADDR=1.1.1.2

NETMASK=255.255.255.0

ONBOOT=yes

  2. Copy the following script to /etc/rc.d/rc.local (or create a startup script for Linux).

Note: nvme-cli should be installed (see the first step under the NVMe Client (Initiator) Configuration section above).

#!/bin/bash

# NVME Client Configuration

# Assuming the following:

# Interface is enp2s0f0

# IP is 1.1.1.2/24, remote target is 1.1.1.1

# link is Up

# nvme-cli is installed

modprobe mlx5_core

modprobe nvme-rdma

nvme discover -t rdma -a 1.1.1.1 -s 4420

nvme connect -t rdma -n nvme-subsystem-name -a 1.1.1.1 -s 4420

# End of NVME Client Configuration
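Alternatively, instead of hard-coding the discover and connect commands, recent nvme-cli versions can read persistent discovery entries from /etc/nvme/discovery.conf and connect to everything they find with a single 'nvme connect-all'; a sketch for this setup (exact behavior depends on the nvme-cli version):

# cat /etc/nvme/discovery.conf

-t rdma -a 1.1.1.1 -s 4420

# nvme connect-all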

  3. Make sure that the mode is +x:

# chmod ugo+x /etc/rc.d/rc.local

  4. Reboot the server and make sure that the connection to the target is established:

# reboot

  5. Run lsblk and lsmod:

# lsmod | grep nvme

nvme_rdma 28672 0

nvme_fabrics 20480 1 nvme_rdma

nvme_core 45056 2 nvme_fabrics,nvme_rdma

rdma_cm 53248 2 nvme_rdma,rdma_ucm

ib_core 147456 13 ib_cm,rdma_cm,ib_umad,nvme_rdma,ib_uverbs,ib_mad,ib_ipoib,ib_sa,iw_cm,mlx5_ib,ib_ucm,rdma_ucm,mlx4_ib

mlx_compat 16384 19 ib_cm,rdma_cm,ib_umad,ib_core,nvme_rdma,ib_uverbs,ib_mad,ib_addr,mlx4_en,ib_ipoib,mlx5_core,ib_sa,iw_cm,mlx5_ib,mlx4_core,ib_ucm,rdma_ucm,ib_netlink,mlx4_ib

# lsblk

NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT

sda 8:0 0 930.4G 0 disk

├─sda2 8:2 0 929.9G 0 part

│ ├─centos-swap 253:1 0 31.5G 0 lvm [SWAP]

│ ├─centos-home 253:2 0 100G 0 lvm /home

│ └─centos-root 253:0 0 798.4G 0 lvm /

└─sda1 8:1 0 500M 0 part /boot

nvme0n1 259:0 0 250G 0 disk

Useful commands

nvme list

Run from the client to see the list of the NVMe devices currently connected.

# nvme list

Node SN Model Namespace Usage Format FW Rev


/dev/nvme0n1 3b605a467714f272 Linux 10 268.44 GB / 268.44 GB 512 B + 0 B 4.8.7

Benchmarking

After establishing a connection between the NVMe over Fabrics host (initiator) and the target, a new NVMe block device appears under the /dev directory on the initiator side. This block device represents the remote backing store of the connected subsystem.

Perform a simple traffic test on the block device to make sure everything is working properly. Use the fio command (install fio package if not available) or any other traffic generator.

Note: Make sure to update the filename parameter to suit the nvme device created in your system.

# fio --bs=64k --numjobs=16 --iodepth=4 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1 --name=read-phase --rw=randread

For more details about fio installations and usage, see: HowTo Install Flexible I/O (Fio) for Storage Benchmarking I/O Testing.

Troubleshooting

  1. In case the soft link creation fails, this is the output you will get when executing the dmesg command:

# dmesg | grep nvmet_rdma

[ 462.992749] nvmet_rdma: binding CM ID to 1.1.1.1:4420 failed (-19)

[ 8552.951381] nvmet_rdma: binding CM ID to 1.1.1.1:4420 failed (-99)

Check the IP connectivity, ping the target, and try again. For example:
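For example (the RDMA device name mlx5_0 is an assumption; list your devices with ibv_devices):

# ping -c 3 1.1.1.1

# ibv_devinfo -d mlx5_0 | grep -E "state|link_layer"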

  2. RDMA performance tools may not work by default. Follow HowTo Enable Perftest Package for Upstream Kernel to make sure the relevant modules and userspace libraries are enabled.

  3. The 'nvme disconnect -n nvme-subsystem-name' command (shown below) may fail due to a bug in nvme-cli; in that case, use 'nvme disconnect -d /dev/nvme0n1' instead.

# nvme disconnect -n nvme-subsystem-name

  4. In case you cannot load the nvme-rdma module, make sure that you installed MLNX_OFED with the --with-nvmf flag.