
MLNX_EN for Linux User Manual


Contents

1. … Table 5: MLNX_EN Software Components
Table 6: Flow Specific Parameters
Table 7: ethtool Supported Options
Table 8: General Related Issues
Table 9: Ethernet Related Issues
Table 10: Performance Related Issues
Table 11: SR-IOV Related Issues
Mellanox Technologies, Rev 3.1-1.0.4
Document Revision History
Table 1: Document Revision History
Release 3.1-1.0.4 (October 08, 2015)
Added the following new sections:
Section 3.14, "Wake-on-LAN (WoL)", on page 67
Section 3.15, "Hardware Accelerated 802.1ad VLAN (Q-in-Q Tunneling)", on page 67
Section 3.10.2, "ConnectX-4 ECN", on page 63
Section 3.4.1.2.2, "Configuring SR-IOV for ConnectX-4", on page 43
Section 3.4.1.2.3, "Note on VFs Initialization", on page 45
Updated the following sections:
Section 3.7, "Ethtool", on page 54
Section 3.3.1, "Enable/Disable Flow Steering", on page 33
Section 3.3.2.1, "A0 Static Device Managed Flow Steering", on pa…
2. … 2.6 Recompiling MLNX_EN
2.7 Updating Firmware After…
2.7.1 Updating the Device…
2.7.2 Updating the Device…
2.8 Ethernet Driver Usage and…
2.9 Performance…
Chapter 3 Feature Overview and Configuration
3.1 Quality of Service
3.1.1 Mapping Traffic to Traffic…
3.1.2 Plain Ethernet Quality of Service Mapping
3.1.3 Map Priorities with tc_wrap.py/mlnx_qos
3.1.4 Quality of Service Properties
3.1.4.1 Strict Priority
3.1.4.2 Minimal Bandwidth Guarantee
3.1.4.3 Rate…
3.1.5 Quality of Service Tools
3.1.5.1 mlnx_qos
3.1.5.2 tc and tc_wrap.py
3.1.5.3 Additional Tools
3. … Chapter 1 Overview
1.1 MLNX_EN Package Contents
1.1.1 Tarball Package
1.1.2 Software Components
1.1.3 Firmware
1.1.4 Directory Structure
1.1.5 mlx4 VPI Driver
1.1.6 mlx5 Driver
1.2 Module Parameters
1.2.1 mlx4 Module Parameters
1.2.1.1 mlx4_core Parameters
1.2.1.2 mlx4_en Parameters
1.2.2 mlx5 Module Parameters
Chapter 2 Installation
2.1 Software…
2.2 Downloading MLNX_EN
2.3 Installing MLNX_EN
2.3.1 Installation Modes
2.3.2 Installation Procedure
2.4 Unloading MLNX_EN
2.5 Uninstalling MLNX_EN
4. … 3.2 Time Stamping Service
3.2.1 Enabling Time Stamping
3.2.2 Getting Time Stamping
3.3 Flow Steering
3.3.1 Enable/Disable Flow Steering
3.3.2 Flow Steering Support
3.3.2.1 A0 Static Device Managed Flow Steering
3.3.3 Flow Domains and…
3.4 Virtualization
3.4.1 Single Root IO Virtualization (SR-IOV)
3.4.1.1 System Requirements
3.4.1.2 Setting Up SR-IOV
3.4.1.3 Enabling SR-IOV and Para Virtualization on the Same Setup
3.4.1.4 Assigning a Virtual Function to a Virtual Machine
3.4.1.5 Uninstalling SR-IOV
3.4.1.6 Ethernet Virtual Function Configuration when Running SR-IOV
3.4.1.7 MAC Forwarding DataBase (FDB) Management
5. … 3.4.2 VXLAN Hardware Stateless Offloads
3.4.2.1 Prerequisites
3.4.2.2 Enabling VXLAN Hardware Stateless Offloads
3.4.2.3 Important…
3.5 Resiliency
3.5.1 Reset Flow
3.5.1.1 Kernel ULPs
3.5.1.2 SR-IOV
3.5.1.3 Forcing the VF to…
3.5.1.4 Advanced Error Reporting
3.5.1.5 Extended Error Handling (EEH)
3.6 Ignore Frame Check Sequence (FCS) Errors
3.7 Ethtool
3.8 Checksum Offload
3.9 Quantized Congestion Control (QCN)
3.9.1 QCN Tool…
3.9.2 Setting QCN Configuration
3.10 Explicit Congestion Notification (ECN)
3.10.1 ConnectX-3/ConnectX-3 Pro ECN
6. Glossary
The following is a list of concepts and terms related to InfiniBand in general and to Subnet Managers in particular. It is included here for ease of reference, but the main reference remains the InfiniBand Architecture Specification.
Table 3: Glossary
Channel Adapter (CA), Host Channel Adapter (HCA): An IB device that terminates an IB link and executes transport functions. This may be an HCA (Host CA) or a TCA (Target CA).
HCA Card: A network adapter card based on an InfiniBand channel adapter device.
IB Devices: Integrated circuit implementing InfiniBand compliant communication.
In-Band: A term assigned to administration activities traversing the IB connectivity only.
Local Port: The IB port of the HCA through which IBDIAG tools connect to the IB fabric.
Master Subnet Manager: The Subnet Manager that is authoritative, that has the reference configuration information for the subnet. See Subnet Manager.
Multicast Forwarding Tables: A table that exists in every switch providing the list of ports to forward received multicast packets. The table is organized by MLID.
Network Interface Card (NIC): A network adapter card that plugs into the PCI Express slot and provides one or more ports to an Ethernet network.
Unicast Linear Forwarding Tables (LFT): A table that exists in every switch providing the port through which packets should be sent to each LID.
Virtual Pr…
7. … Single/Dual port
Up to 16 Rx queues per port
6 Tx queues per port
Rx steering mode: Receive Core Affinity (RCA)
MSI-X or INTx
Adaptive interrupt moderation
HW Tx/Rx checksum calculation
Large Send Offload (TCP Segmentation Offload)
Large Receive Offload
Multi-core NAPI support
VLAN Tx/Rx acceleration (HW VLAN stripping/insertion)
Ethtool support
Net device statistics
SR-IOV support
Flow steering
Ethernet Time Stamping
1.1 MLNX_EN Package Contents
1.1.1 Tarball Package
MLNX_EN for Linux is provided as a tarball that includes source code and firmware. The tarball contains an installation script called install.sh that performs the necessary steps to accomplish the following:
Discover the currently installed kernel
Uninstall any previously installed MLNX_OFED/MLNX_EN packages
Install the MLNX_EN binaries, if they are available for the current kernel
Identify the currently installed HCAs and perform the required firmware updates
Note: 56GbE is a Mellanox proprietary link speed and can be achieved while connecting a Mellanox adapter card to a Mellanox SX10XX series switch, or connecting a Mellanox adapter card to another Mellanox adapter card.
1.1.2 Software Components
MLNX_EN contains the following software components:
Table 5: MLNX_EN Software Components
mlx5 driver: mlx5 is the low-level d…
8. …balancer, refer to the Performance Tuning Guide.
…Ubuntu 14.04 is not optimal and may achieve results below the line rate in 40GE link speed.
Issue: UDP receiver throughput may be lower than expected when running over the mlx4_en Ethernet driver.
Cause: This is caused by the adaptive interrupt moderation routine, which sets high values of interrupt coalescing, causing the driver to process a large number of packets in the same interrupt and leading UDP to drop packets due to overflow in its buffers.
Solution: Disable adaptive interrupt moderation and set lower values for the interrupt coalescing manually:
ethtool -C <ethX> adaptive-rx off rx-usecs 64 rx-frames 24
The values above may need tuning, depending on the system configuration and link speed.
4.4 SR-IOV Related Issues
Table 11: SR-IOV Related Issues
Issue: Failed to enable SR-IOV. The following message is reported in dmesg:
mlx4_core 0000:xx:xx.0: Failed to enable SR-IOV, continuing without SR-IOV (err = -22)
Cause: The number of VFs configured in the driver is higher than configured in the firmware.
Solution: 1. Check the firmware SR-IOV configuration by running the mlxconfig tool. 2. Set the same number of VFs for the driver.
Issue: Failed to enable SR-IOV. The following message…
Cause: SR-IOV is disabled in the BIOS.
Solution: Check that SR-IOV is enabled in the BIOS (see Section 3.4.1.2, "Setting Up SR-IOV", on page 37).
9. Time Stamping Service
Note: Time stamping is currently supported in ConnectX-3/ConnectX-3 Pro adapter cards only.
Time stamping is the process of keeping track of the creation of a packet. A time stamping service supports assertions of proof that a datum existed before a particular time. Incoming packets are time stamped before they are distributed on the PCI, depending on the congestion in the PCI buffers. Outgoing packets are time stamped very close to placing them on the wire.
Enabling Time Stamping
Time stamping is off by default and should be enabled before use.
> To enable time stamping for a socket:
Call setsockopt() with SO_TIMESTAMPING and with the following flags:
SOF_TIMESTAMPING_TX_HARDWARE: try to obtain the send time stamp in hardware
SOF_TIMESTAMPING_TX_SOFTWARE: if SOF_TIMESTAMPING_TX_HARDWARE is off or fails, then do it in software
SOF_TIMESTAMPING_RX_HARDWARE: return the original, unmodified time stamp as generated by the hardware
SOF_TIMESTAMPING_RX_SOFTWARE: if SOF_TIMESTAMPING_RX_HARDWARE is off or fails, then do it in software
SOF_TIMESTAMPING_RAW_HARDWARE: return the original raw hardware time stamp
SOF_TIMESTAMPING_SYS_HARDWARE: return the hardware time stamp transformed to the system time base
SOF_TIMESTAMPING_SOFTWARE: return the system time stamp generated in software
SOF_TIMESTAMPING_TX/RX: determine how time stamps are generated
SOF_TIMESTAMPING_RAW/SYS: determine how they are reported
10. To enable time stamping for a net device:
An admin-privileged user can enable/disable time stamping by calling ioctl(sock, SIOCSHWTSTAMP, &ifreq) with the following values:
Send side time stamping:
Enabled by ifreq.hwtstamp_config.tx_type when…
Receive side time stamping:
Enabled by ifreq.hwtstamp_config.rx_filter when…
Possible values for hwtstamp_config->rx_filter (enum hwtstamp_rx_filters):
time stamp no incoming packet at all: HWTSTAMP_FILTER_NONE
time stamp any incoming packet: HWTSTAMP_FILTER_ALL
return value: time stamp all packets requested plus some others: HWTSTAMP_FILTER_SOME
PTP v1, UDP, any kind of event packet: HWTSTAMP_FILTER_PTP_V1_L4_EVENT
PTP v1, UDP, Sync packet: HWTSTAMP_FILTER_PTP_V1_L4_SYNC
PTP v1, UDP, Delay_req packet: HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ
PTP v2, UDP, any kind of event packet: HWTSTAMP_FILTER_PTP_V2_L4_EVENT
PTP v2, UDP, Sync packet: HWTSTAMP_FILTER_PTP_V2_L4_SYNC
PTP v2, UDP, Delay_req packet: HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ
802.AS1, Ethernet, any kind of event packet: HWTSTAMP_FILTER_PTP_V2_L2_EVENT
802.AS1, Ethernet, Sync packet: HWTSTAMP_FILTER_PTP_V2_L2_SYNC
802.AS1, Ethernet, Delay_req packet: HWTSTAMP_FILTER_PTP_V2…
11. …is reported in dmesg:
mlx4_core 0000:xx:xx.0: Failed to enable SR-IOV, continuing without SR-IOV (err = -12)
Issue: When assigning a VF to a VM, the following message is reported on the screen: "PCI-assign: error: requires KVM support"
Cause: SR-IOV and virtualization are not enabled in the BIOS.
Solution: 1. Verify they are both enabled in the BIOS. 2. Add the following kernel parameter to the GRUB configuration file: intel_iommu=on (see Section 3.4.1.2, "Setting Up SR-IOV", on page 37).
12. …/lib/modules/<kernel-ver>/extra/mlnx-en
For non-KMP RPMs (mlnx-en RPM):
On SLES: /lib/modules/<kernel-ver>/updates/mlnx-en
On RHEL: /lib/modules/<kernel-ver>/extra/mlnx-en
The kernel module sources are placed under /usr/src/mellanox-mlnx-en-<ver>/.
2.3.2 Installation Procedure
Step 1. Log in to the installation machine as root.
Step 2. Extract the tarball image on your machine.
tar -xzvf mlnx-en-3.0-1.0.1.tgz
Step 3. Change the working directory.
cd mlnx-en-3.0-1.0.1
Step 4. Run the installation script.
./install.sh
Step 5. Load the driver.
/etc/init.d/mlnx-en.d restart
Unloading NIC driver: [ OK ]
Loading NIC driver: [ OK ]
The /etc/init.d/mlnx-en.d service script loads the mlx4 and/or mlx5 drivers, as set in the /etc/mlnx-en.conf configuration file.
The result is a new net device appearing in the ifconfig -a output.
2.4 Unloading MLNX_EN
> To unload the Ethernet driver:
/etc/init.d/mlnx-en.d stop
Unloading NIC driver: [ OK ]
2.5 Uninstalling MLNX_EN
Use the script /sbin/mlnx_en_uninstall.sh to uninstall the MLNX_EN package.
2.6 Recompiling MLNX_EN
> To recompile the driver:
Step 1. Enter the source directory.
cp -a /usr/src/mlnx-en-3.0 /tmp
cd /tmp/mlnx-en-3.0
Step 2. Apply the kernel backport patch.
scripts/mlnx_en_patch.sh
Step 3. C…
13. …ConnectX-4 and ConnectX-4 Lx network adapters
Firmware configuration (INI) files for Mellanox standard network adapter cards and custom cards
1.1.4 Directory Structure
The tarball image of MLNX_EN contains the following files and directories:
install.sh: The MLNX_EN installation script
mlnx_en_uninstall.sh: The MLNX_EN uninstallation script
firmware: Directory of the Mellanox HCA firmware images
SOURCES: Directory of the MLNX_EN source tarball (SRPM-based)
…: A script required to rebuild MLNX_EN for a customized kernel version on supported RPM-based Linux distributions
1.1.5 mlx4 VPI Driver
mlx4 is the low-level driver implementation for the ConnectX family adapters designed by Mellanox Technologies. The MLNX_EN driver supports Ethernet NIC configurations. To accommodate the supported configurations, the driver is split into the following modules:
mlx4_core: Handles low-level functions like device initialization and firmware command processing. Also controls resource allocation so that the Ethernet functions can share the device without interfering with each other.
mlx4_en: A 10/40GigE driver under drivers/net/ethernet/mellanox/mlx4 that handles Ethernet-specific functions and plugs into the netdev mid-layer.
1.1.6 mlx5 Driver
mlx5 is the low-level driver implementation for the ConnectX-4 adapters designed by Mellanox Technologies. ConnectX-4 ope…
14. …Down, it is modified to Initialize. In all other states, it is unmodified. The result is that the SM may bring the VPort up.
Follow: follows the PortState of the physical port. If the PortState of the physical port is Active, then the VPort implements the Up policy. Otherwise, the VPort PortState is Down.
The policy of all the vports is initialized to Down after a restart of the PF driver, except for vport0, for which the PF driver modifies the policy to Follow.
3.4.1.2.3 Note on VFs Initialization
Since the same mlx5_core driver supports both Physical and Virtual Functions, once the Virtual Functions are created, the driver of the PF will attempt to initialize them so they will be available to the OS owning the PF. If you want to assign a Virtual Function to a VM, you need to make sure the VF is not used by the PF driver. If a VF is used, you should first unbind it before assigning it to a VM.
> To unbind a device, use the following commands:
1. Get the full PCI address of the device.
lspci -D
Example: 0000:09:00.2
2. Unbind the device.
echo 0000:09:00.2 > /sys/bus/pci/drivers/mlx5_core/unbind
3. Bind the unbound VF.
echo 0000:09:00.2 > /sys/bus/pci/drivers/mlx5_core/bind
3.4.1.3 Enabling SR-IOV and Para Virtualization on the Same Setup
> To enable SR-IOV and Para Virtualization on the same setup:
Step…
15. …The operating system, network, and devices provide counter data that an application can consume to provide users with a graphical view of how well the system is performing.
The counter index is a QP attribute given in the QP context. Multiple QPs may be associated with the same counter set. If multiple QPs share the same counter, its value represents the cumulative total.
ConnectX-3 supports 127 different counters, which are allocated as follows:
4 counters reserved for the PF (2 counters for each port)
2 counters reserved for each VF (1 counter for each port)
All other counters, if they exist, are allocated on demand.
Counters are available only through sysfs, located under:
/sys/class/infiniband/mlx4_*/ports/*/counters/
/sys/class/infiniband/mlx4_*/ports/*/counters_ext/
The Physical Function can also read the Virtual Functions' port counters through sysfs, located under:
/sys/class/net/eth*/vf*/statistics/
To display the network device Ethernet statistics, you can run:
ethtool -S <devname>
Counter: Description
rx_packets: Total packets successfully received
rx_bytes: Total bytes in successfully received packets
rx_multicast_packets: Total multicast packets successfully received
rx_broadcast_packets: Total broadcast packets successfully received
rx_errors: Number of receive packets that contained errors preventi…
16. ip link set dev <PF device> vf <NUM> vlan <vlan_id> qos <qos>
where NUM = 0..max-vf-num, vlan_id = 0..4095 (4095 means set VGT), qos = 0..7
For example:
ip link set dev eth2 vf 2 vlan <vlan_id> qos 3: sets VST mode for VF #2 belonging to PF eth2, with qos = 3
ip link set dev eth2 vf 2 vlan 4095: sets the mode for VF 2 back to VGT
3.4.1.6.2 Additional Ethernet VF Configuration Options
Guest MAC configuration
By default, guest MAC addresses are configured to be all zeroes. In the MLNX_EN guest driver, if a guest sees a zero MAC, it generates a random MAC address for itself. If the administrator wishes the guest to always start up with the same MAC, he/she should configure guest MACs before the guest driver comes up.
The guest MAC may be configured by using:
ip link set dev <PF device> vf <NUM> mac <LLADDR>
For legacy guests, which do not generate random MACs, the administrator should always configure their MAC addresses via ip link, as above.
Spoof checking
Spoof checking is currently available only on upstream kernels newer than 3.1.
ip link set dev <PF device> vf <NUM> spoofchk [on | off]
Link State
VF link state can be configured to one of the following 3 options: auto, enable, disable.
ip link set dev <PF device> vf <NUM> state [auto | enable | disable]
3.4.1.6.3 Mapping VFs to Ports using the mlnx_get_vfs.pl Tool
> To map t…
17. …Step 1. Create a bridge.
vim /etc/sysconfig/network-scripts/ifcfg-bridge0
DEVICE=bridge0
TYPE=Bridge
IPADDR=12.195.15.1
NETMASK=255.255.0.0
BOOTPROTO=static
ONBOOT=yes
NM_CONTROLLED=no
DELAY=0
Step 2. Change the related interface (in the example below, bridge0 is created over eth5).
DEVICE=eth5
BOOTPROTO=none
STARTMODE=on
HWADDR=00:02:c9:2e:66:52
TYPE=Ethernet
NM_CONTROLLED=no
ONBOOT=yes
BRIDGE=bridge0
Step 3. Restart the service network.
Step 4. Attach a virtual NIC to the VM.
ifconfig -a
eth6 Link encap:Ethernet HWaddr 52:54:00:E7:77:99
inet addr:13.195.15.5 Bcast:13.195.255.255 Mask:255.255.0.0
inet6 addr: fe80::5054:ff:fee7:7799/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:481 errors:0 dropped:0 overruns:0 frame:0
TX packets:450 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:22440 (21.9 KiB) TX bytes:19232 (18.7 KiB)
Interrupt:10 Base address:0xa000
Step 5. Add the MAC 52:54:00:E7:77:99 to the /sys/class/net/eth5/fdb table on the HV.
Before:
cat /sys/class/net/eth5/fdb
…
echo 52:54:00:E7:77:99 > /sys/class/net/eth5/fdb
After:
cat /sys/class/net/eth5/fdb
52:54:00:e7:77:99
…
3.4.1.4 Assigning a Virtual Function to a Vir…
18. …rpg_max_rate: 40000
rpg_ai_rate: 10
rpg_hai_rate: 50
rpg_gd: 8
rpg_min_dec_fac: 2
rpg_min_rate: 10
cndd_state_machine: 0
3.9.2 Setting QCN Configuration
Setting a QCN parameter requires updating its value for each priority. The value -1 indicates no change to the current value.
Example for setting rpg_enable in order to enable QCN for priorities 3, 5 and 6:
…
Example for setting rpg_hai_rate for priorities 1, 6 and 7:
…
3.10 Explicit Congestion Notification (ECN)
3.10.1 ConnectX-3/ConnectX-3 Pro ECN
ECN is an extension to the IP protocol. It allows reliable communication by notifying all ends of communication when congestion occurs. This is done without dropping packets.
Please note that this feature requires all nodes in the path (nodes, routers, etc.) between the communicating nodes to support ECN to ensure reliable communication. ECN is marked using 2 bits of the traffic class field in the IP header. This ECN implementation refers to both RoCE and RoCEv2.
The ECN command interface is used to configure ECN activity. It is accessed through the file system (mounting debugfs is required). The interface provides a set of changeable attributes and information regarding ECN's counters and statistics.
Enabling the ECN comm…
19. …_L2_DELAY_REQ
PTP v2/802.AS1, any layer, any kind of event packet: HWTSTAMP_FILTER_PTP_V2_EVENT
PTP v2/802.AS1, any layer, Sync packet: HWTSTAMP_FILTER_PTP_V2_SYNC
PTP v2/802.AS1, any layer, Delay_req packet: HWTSTAMP_FILTER_PTP_V2_DELAY_REQ
Note: For receive side time stamping, currently only HWTSTAMP_FILTER_NONE and HWTSTAMP_FILTER_ALL are supported.
3.2.2 Getting Time Stamping
Once time stamping is enabled, the time stamp is placed in the socket ancillary data. recvmsg() can be used to get this control message for regular incoming packets. For send time stamps, the outgoing packet is looped back to the socket's error queue with the send time stamp(s) attached. It can be received with recvmsg(flags=MSG_ERRQUEUE). The call returns the original outgoing packet data, including all headers prepended down to and including the link layer, the scm_timestamping control message, and a sock_extended_err control message with ee_errno==ENOMSG and ee_origin==SO_EE_ORIGIN_TIMESTAMPING. A socket with such a pending bounced packet is ready for reading as far as select() is concerned. If the outgoing packet has to be fragmented, then only the first fragment is time stamped and returned to the sending socket.
When time stamping is enabled, VLAN stripping is disabled. For more information, please refer to Documentation/networking/timestamping.txt
20. …a general parameter is /sys/kernel/debug/mlx4_ib/<DEVICE>/<attribute>. The path to a counter is /sys/kernel/debug/mlx4_ib/<DEVICE>/ecn/<algorithm>/ports/<port>/statistics/prios/<prio>/<counter>.
3.10.2 ConnectX-4 ECN
ECN in ConnectX-4 enables end-to-end congestion notifications between two end points when congestion occurs, and works over Layer 3. ECN must be enabled on all nodes in the path (nodes, routers, etc.) between the two end points and on the intermediate devices (switches) between them to ensure reliable communication.
3.10.2.1 Enabling ECN
> To enable ECN on the hosts:
Step 1. Enable ECN in sysfs.
/sys/class/net/<interface>/<protocol>/ecn_<protocol>_enable = 1
Step 2. Query the attribute.
cat /sys/class/net/<interface>/ecn/<protocol>/params/<requested attribute>
Step 3. Modify the attribute.
echo <value> > /sys/class/net/<interface>/ecn/<protocol>/params/<requested attribute>
ECN supports the following algorithms:
r_roce_ecn_rp: Reaction point
r_roce_ecn_np: Notification point
Each algorithm has a set of relevant parameters and statistics, which are defined per device, per port, per priority.
> To query ECN enable per priority X:
cat /sys/class/net/<interface>/ecn/<protocol>/enable/X
> To read ECN configurable parameters:
cat /sys/class/net/<interface>/ecn/<protocol>/requeste…
21. …com > Products > Software > Ethernet Drivers
Step 3. Use the md5sum utility to confirm the file integrity of your tarball image.
2.3 Installing MLNX_EN
The installation script, install.sh, performs the following:
Discovers the currently installed kernel
Uninstalls any previously installed MLNX_OFED/MLNX_EN packages
Installs the MLNX_EN binaries, if they are available for the current kernel
Identifies the currently installed Ethernet network adapters and automatically upgrades the firmware
2.3.1 Installation Modes
The mlnx_en installer supports 2 modes of installation. The install script selects the mode of driver installation depending on the running OS/kernel version.
Kernel Module Packaging (KMP) mode, where the source RPM is rebuilt for each installed flavor of the kernel. This mode is used for RedHat and SUSE distributions.
Non-KMP installation mode, where the sources are rebuilt with the running kernel. This mode is used for vanilla kernels.
If the vanilla kernel is installed as an RPM, please use the --disable-kmp flag when installing the driver.
The package consists of several source RPMs. The install script rebuilds the source RPMs, then installs the created binary RPMs. The created kernel module binaries are located at:
For KMP RPMs installation:
On SLES (mellanox-mlnx-en-kmp RPM): /lib/modules/<kernel-ver>/updates/mlnx-en
On RHEL (kmod-mellanox-mlnx-en RPM): …
22. …hardware offload features
Control DMA ring sizes and interrupt moderation
The following are the ethtool supported options:
Table 7: ethtool Supported Options
ethtool -i eth<x>: Checks driver and device information. For example:
ethtool -i eth2
driver: mlx4_en (MT_0DD0120009_CX3)
version: 2.1.6 (Aug 2013)
firmware-version: 2.30.3000
bus-info: 0000:1a:00.0
ethtool -k eth<x>: Queries the stateless offload status.
ethtool -c eth<x>: Queries interrupt coalescing settings.
ethtool -C eth<x> adaptive-rx on|off: Enables/disables adaptive interrupt moderation. By default, the driver uses adaptive interrupt moderation for the receive path, which adjusts the moderation time to the traffic pattern. For further information, please refer to the Adaptive Interrupt Moderation section. Note: Supported in ConnectX-3/ConnectX-3 Pro cards only.
ethtool -C eth<x> pkt-rate-low N pkt-rate-high N rx-usecs-low N rx-usecs-high N: Sets the values for packet rate limits and for moderation time high and low values. For further information, please refer to the Adaptive Interrupt Moderation section. Note: Supported in ConnectX-3/ConnectX-3 Pro cards only.
ethtool -C eth<x> rx-usecs N rx-frames N: Sets the interrupt coalescing setting. rx-frames will be enforced…
23. …in kernel.org.
3.3 Flow Steering
Note: Flow Steering is applicable to the mlx4 driver only.
Flow steering is a new model which steers network flows based on flow specifications to specific QPs. Those flows can be either unicast or multicast network flows. In order to maintain flexibility, domains and priorities are used. Flow steering uses a methodology of flow attribute, which is a combination of L2-L4 flow specifications, a destination QP and a priority. Flow steering rules may be inserted either by using ethtool or by using InfiniBand verbs. The verbs abstraction uses a different terminology: a flow attribute (ibv_flow_attr), defined by a combination of specifications (struct ibv_flow_spec_*).
3.3.1 Enable/Disable Flow Steering
Note: Only applicable to the mlx4 driver. Flow Steering is automatically enabled in the mlx5 driver as of MLNX_EN v3.1-1.0.4 and above.
Flow steering is generally enabled when the log_num_mgm_entry_size module parameter is non-positive (e.g., -log_num_mgm_entry_size), meaning the absolute value of the parameter is a bit field. Every bit indicates a condition or an option regarding the flow steering mechanism:
bit | Operation | Description
b0 | Force device managed Flow Steering | When set to 1, it forces HCA to be enabled regardless of whether NC-SI Flow Steering is supported or not.
b1 | Disable IPoIB Flow Steering | When set to 1, i…
24. …of traffic. However, the general QoS flow may vary among them.
Plain Ethernet: Applications use regular inet sockets and the traffic passes via the kernel Ethernet driver.
RoCE: Applications use the RDMA API to transmit using QPs.
Raw Ethernet QP: Applications use the VERBs API to transmit using a Raw Ethernet QP.
3.1.2 Plain Ethernet Quality of Service Mapping
Applications use regular inet sockets and the traffic passes via the kernel Ethernet driver. The following is the Plain Ethernet QoS mapping flow:
1. The application sets the ToS of the socket using setsockopt(IP_TOS, value).
2. ToS is translated into the sk_prio using a fixed translation:
TOS 0 <=> sk_prio 0
TOS 8 <=> sk_prio 2
TOS 24 <=> sk_prio 4
TOS 16 <=> sk_prio 6
3. The socket priority is mapped to the UP:
If the underlying device is a VLAN device, egress_map is used, controlled by the vconfig command. This is a per-VLAN mapping.
If the underlying device is not a VLAN device, the tc command is used. In this case, even though the tc manual states that the mapping is from the sk_prio to the TC number, the mlx4_en driver interprets this as an sk_prio to UP mapping.
Mapping the sk_prio to the UP is done by using tc_wrap.py -i <dev name> -u 0,1,2,3,4,5,6,7
4. The UP is mapped to the TC as configured by the mlnx_qos tool or by the lldpad daemon if DCBX is u…
25. …the N/A port type for port2 (e.g., 1,4).
Note that this parameter is valid only when num_vfs is not zero (i.e., SR-IOV is enabled). Otherwise, it is ignored.
Parameter: probe_vf
Recommended Value:
If absent or zero: no VF interfaces will be loaded in the Hypervisor/host.
If probe_vf is a number in the range of 1-63, the driver running on the Hypervisor will itself activate that number of VFs. All these VFs will run on the Hypervisor. This number will apply to all ConnectX HCAs on that host.
If it is a triplet x,y,z (applies only if all ports are configured as Ethernet), the driver probes:
x single-port VFs on physical port 1
y single-port VFs on physical port 2 (applies only if such a port exists)
z n-port VFs (where n is the number of physical ports on the device)
Those VFs are attached to the hypervisor.
If its format is a string: the string specifies the probe_vf parameter separately per installed HCA. The string format is: "bb:dd.f-v,bb:dd.f-v,…"
bb:dd.f = bus:device.function of the PF of the HCA
v = number of VFs to use in the PF driver for that HCA, which is either a single value or a triplet, as described above
For example:
probe_vf=5: The PF driver will activate 5 VFs on the HCA, and this will be applied to all ConnectX HCAs on the host.
probe_vf=00:04.0-5,00:07.0-8: The PF driver will activ…
26. …update the /boot/grub/grub.conf file to include a similar command line load parameter for the Linux kernel.
For example, on Intel systems, add:
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Red Hat Enterprise Linux Server (2.6.32-36.x86-64)
root (hd0,0)
kernel /vmlinuz-2.6.32-36.x86-64 ro root=/dev/VolGroup00/LogVol00 rhgb quiet intel_iommu=on
initrd /initrd-2.6.32-36.x86-64.img
Note: Please make sure the parameter "intel_iommu=on" exists when updating the /boot/grub/grub.conf file, otherwise SR-IOV cannot be loaded.
3.4.1.2.1 Configuring SR-IOV for ConnectX-3/ConnectX-3 Pro
Step 1. Install the MLNX_OFED driver for Linux that supports SR-IOV.
SR-IOV can be enabled and managed by using one of the following methods:
Run the mlxconfig tool and set the SRIOV_EN parameter to "1" without re-burning the firmware. To find the mst device, run "mst start" and "mst status".
mlxconfig -d <mst_device> s SRIOV_EN=1
For further information, please refer to the section "mlxconfig - Changing Device Configuration Tool" in the MFT User Manual (www.mellanox.com > Products > Software > Firmware Tools).
Burn firmware with SR-IOV support, where the number of virtual functions (VFs) will be set to 16:
--enable-sriov
Step 2. Verify the HCA is configured to support SR-IOV.
mstflint -dev <PCI Device> dc
1. Verify in the [HCA] section the followi…
27. 3.10.1.1 Enabling…
3.10.1.2 Various ECN Paths
3.10.2 ConnectX-4 ECN
3.10.2.1 Enabling ECN
3.11 RSS Hash Function
3.12 Ethernet Performance Counters
3.13 RSS Support for IP Fragments
3.14 Wake-on-LAN (WoL)
3.15 Hardware Accelerated 802.1ad VLAN (Q-in-Q Tunneling)
Chapter 4 Troubleshooting
4.1 General Related Issues
4.2 Ethernet Related Issues
4.3 Performance Related Issues
4.4 SR-IOV Related Issues
List of Tables
Table 1: Document Revision History
Table 2: Abbreviations and Acronyms
Table 3: Glossary
Table 4: Reference…
Adaptive Interrupt Moderation Algorithm on page 63
• Section 3.13 RSS Support for IP Fragments on page 67
Updated Table 7 ethtool Supported Options on page 55:
• Updated the ethtool -K eth<x> flag options
• Added the following new flags: ethtool -s eth<x> speed <SPEED> autoneg off, and ethtool -s eth<x> advertise <N> autoneg on
Updated the port_type_array parameter description in Section 3.4.1.2 Setting Up SR-IOV on page 37
Updated the following sections:
• Section 3.8 Checksum Offload on page 58
• Section 3.4.2 VXLAN Hardware Stateless Offloads on page 52
• Section 3.4.2.2 Enabling VXLAN Hardware Stateless Offloads on page 52
• Section 4.3 Performance Related Issues on page 71

2.3-2.0.1 (November 27, 2014): Added Section 3.5.1.5 Extended Error Handling (EEH) on page 54

Table 1: Document Revision History (continued)
2.3-1.0.0 (September 2014): Added the following sections:
• Section 1.1.1 Tarball Package on page 13
• Section 1.1.3 Firmware on page 14
• Section 1.1.4 Directory Structure on page 14
• Section 1.2.1 mlx4 Module Parameters on page 15
• Section 2.2 Downloading MLNX_EN on page 18
• Section 2.3.1 Installation Modes on page 18
• Section 3.3.2 Flow Steering Support on page 35
• Sect
Flow Steering regardless of NC-SI Flow Steering support; disabling IPoIB Flow Steering support; enabling A0 static DMFS steering; steering table is not optimized for rules ignoring source IP check.

The default value of log_num_mgm_entry_size is -10. This means that Ethernet Flow Steering (IPoIB DMFS is disabled by default) is enabled by default if NC-SI DMFS is supported and the HCA supports at least 64 QPs per MCG entry. Otherwise, L2 steering (B0) is used.

When using SR-IOV, flow steering is enabled if there is an adequate amount of space to store the flow steering table for the guest/master.

To enable Flow Steering:
Step 1. Open the /etc/modprobe.d/mlnx.conf file.
Step 2. Set the parameter log_num_mgm_entry_size to a non-positive value by writing the option:
    options mlx4_core log_num_mgm_entry_size=<value>
Step 3. Restart the driver.

To disable Flow Steering:
Step 1. Open the /etc/modprobe.d/mlnx.conf file.
Step 2. Remove the line "options mlx4_core log_num_mgm_entry_size=<value>".
Step 3. Restart the driver.

For example, a value of -7 means forcing flow steering regardless of NC-SI flow steering support, disabling IPoIB flow steering support, and enabling A0 static DMFS steering.
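The enable and disable procedures above can be sketched in POSIX sh as follows. The function names are hypothetical and not part of MLNX_EN; the sketch assumes the option is kept on a line of its own in the modprobe file, and it only edits the file, so the driver restart in Step 3 is still required.

```shell
#!/bin/sh
# Assumption: the log_num_mgm_entry_size option sits on its own line. Any other
# line that mentions log_num_mgm_entry_size is also dropped by these helpers.

# set_mgm_entry_size CONF VALUE
# Replaces (or adds) "options mlx4_core log_num_mgm_entry_size=VALUE" in CONF.
set_mgm_entry_size() {
  conf=$1
  value=$2
  case "$value" in
    0|-[1-9]|-[1-9][0-9]) ;;   # non-positive integer, as the text requires
    *) echo "log_num_mgm_entry_size must be non-positive, got: $value" >&2
       return 1 ;;
  esac
  tmp="$conf.tmp.$$"
  if [ -f "$conf" ]; then
    # Keep every other line; grep exits 1 when nothing remains, which is fine.
    grep -v 'log_num_mgm_entry_size' "$conf" > "$tmp" || true
  else
    : > "$tmp"
  fi
  echo "options mlx4_core log_num_mgm_entry_size=$value" >> "$tmp"
  mv "$tmp" "$conf"
}

# clear_mgm_entry_size CONF
# Removes the option line, which disables Flow Steering after a driver restart.
clear_mgm_entry_size() {
  conf=$1
  [ -f "$conf" ] || return 0
  tmp="$conf.tmp.$$"
  grep -v 'log_num_mgm_entry_size' "$conf" > "$tmp" || true
  mv "$tmp" "$conf"
}

# Typical use (as root), followed by a driver restart:
#   set_mgm_entry_size /etc/modprobe.d/mlnx.conf -7
```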
Fs varies according to the working mode requirements.
Step 5. Reboot the server.
If SR-IOV is not supported by the server, the machine might not come out of boot/load.
Step 6. Load the driver and verify SR-IOV is supported. Run:
    lspci | grep Mellanox
    03:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)
    03:00.1 InfiniBand: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] (rev b0)
    03:00.2 InfiniBand: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] (rev b0)
    03:00.3 InfiniBand: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] (rev b0)
    03:00.4 InfiniBand: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] (rev b0)
    03:00.5 InfiniBand: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] (rev b0)
Where:
• "03:00" represents the Physical Function
• "03:00.X" represents the Virtual Function connected to the Physical Function

3.4.1.2.2 Configuring SR-IOV for ConnectX-4

Step 1. Install the MLNX_OFED driver for Linux that supports SR-IOV.
Step 2. Check if SR-IOV is enabled in the firmware.
    mlxconfig -d /dev/mst/mt4115_pciconf0 q
    Device 1:
    Device type:    ConnectX4
    PCI device:     /dev/mst/mt4115_pciconf0
    Configurations:         Current
             SRIOV_EN       1
             NUM_OF_VFS     8
             FPP_EN         1
If not, use mlxconfig to enable it.
    mlxconf
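The two checks above (counting the VFs listed by lspci, and reading SRIOV_EN or NUM_OF_VFS from mlxconfig) can be scripted. A minimal sketch; the function names are hypothetical, and mlxconfig_value assumes the two-column "KEY VALUE" layout shown in the example output, which may differ between MFT versions.

```shell
#!/bin/sh
# count_vfs: reads "lspci | grep Mellanox" output on stdin and prints how many
# devices are Virtual Functions (their description contains "Virtual Function").
count_vfs() {
  # grep -c prints 0 and exits 1 when nothing matches; keep the "0" output.
  grep -c 'Virtual Function' || true
}

# mlxconfig_value KEY: reads "mlxconfig -d <dev> q" output on stdin and prints
# the value column for KEY (first matching line only).
mlxconfig_value() {
  awk -v key="$1" '$1 == key { print $2; exit }'
}

# Hypothetical usage:
#   lspci | grep Mellanox | count_vfs
#   mlxconfig -d /dev/mst/mt4115_pciconf0 q | mlxconfig_value NUM_OF_VFS
```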
Mellanox Technologies
350 Oakmead Parkway Suite 100
Sunnyvale, CA 94085
U.S.A.
www.mellanox.com
Tel: (408) 970-3400
Fax: (408) 970-3403

Copyright 2015. Mellanox Technologies. All Rights Reserved.

Mellanox, Mellanox logo, BridgeX, ConnectX, Connect-IB, CoolBox, CORE-Direct, GPUDirect, InfiniBridge, InfiniHost, InfiniScale, Kotura, Kotura logo, Mellanox Connect. Accelerate. Outperform logo, Mellanox Federal Systems, Mellanox Open Ethernet, Mellanox Virtual Modular Switch, MetroDX, MetroX, MLNX-OS, Open Ethernet logo, PhyX, ScalableHPC, SwitchX, TestX, The Generation of Open Ethernet logo, UFM, Virtual Protocol Interconnect, Voltaire and Voltaire logo are registered trademarks of Mellanox Technologies, Ltd.

CyPU, ExtendX, FabricIT, FPGADirect, HPC-X, Mellanox Care, Mellanox CloudX, Mellanox Multi-Host, Mellanox NEO, Mellanox Open Ethernet, Mellanox PeerDirect, Mellanox Socket Direct, NVMeDirect, StPU, Spectrum, Switch-IB, Unbreakable-Link are trademarks of Mellanox Technologies, Ltd.

All other trademarks are property of their respective owners.

Document Number: 2950

Table of Contents
Usage:
Options:

3.1.5.1.1 Get Current Configuration

3.1.5.1.2 Set ratelimit: 3Gbps for tc0, 4Gbps for tc1 and 2Gbps for tc2

3.1.5.1.3 Configure QoS: map UP 0,7 to tc0, 1,2,3 to tc1 and 4,5,6 to tc2; set tc0 and tc1 as ets and tc2 as strict; divide ets 30% for tc0 and 70% for tc1

3.1.5.2 tc and tc_wrap.py
The tc tool is used to set up sk_prio to UP mapping, using the mqprio queue discipline.
In kernels that do not support mqprio (such as 2.6.34), an alternate mapping is created in sysfs. The tc_wrap.py tool will use either the sysfs or the tc tool to configure the sk_prio to UP mapping.
Usage:
Options:
Example: set skprio 0-2 to UP0, and skprio 3-7 to UP1 on eth4

3.1.5.3 Additional Tools
• The tc tool compiled with the sch_mqprio module is required to support kernel v2.6.32 or higher. This is a part of the iproute2 package v2.6.32-19 or higher. Otherwise, an alternative custom sysfs interface is available.
• mlnx_qos tool (package: ofed-scripts) requires python >= 2.5
• tc_wrap.py (package: ofed-scripts) requires python >= 2.5
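The UP-to-TC map in the "Configure QoS" example is easier to review as the 8-entry list indexed by user priority 0-7. The sketch below builds that list; the function name and its "TC:UP,UP" input syntax are hypothetical, and unmentioned priorities defaulting to TC 0 is an assumption. Check mlnx_qos --help on your system for the option that accepts such a list.

```shell
#!/bin/sh
# prio_tc_list MAPPING
# MAPPING is a space-separated list of "TC:UP,UP,..." groups, for example
#   "0:0,7 1:1,2,3 2:4,5,6"
# Prints the traffic class for each user priority 0..7, comma-separated.
# Priorities that no group mentions default to TC 0 (an assumption).
prio_tc_list() {
  out=""
  up=0
  while [ "$up" -le 7 ]; do
    tc=0
    for group in $1; do
      g_tc=${group%%:*}    # text before ':' is the traffic class
      ups=${group#*:}      # text after ':' is the list of user priorities
      case ",$ups," in
        *",$up,"*) tc=$g_tc ;;
      esac
    done
    out="${out:+$out,}$tc"
    up=$((up + 1))
  done
  echo "$out"
}
```

For the example above, prio_tc_list "0:0,7 1:1,2,3 2:4,5,6" prints 0,1,1,1,2,2,2,0.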
Mellanox Technologies
Connect. Accelerate. Outperform.

MLNX_EN for Linux User Manual
Rev 3.1-1.0.4
www.mellanox.com

NOTE:
THIS HARDWARE, SOFTWARE OR TEST SUITE PRODUCT ("PRODUCT(S)") AND ITS RELATED DOCUMENTATION ARE PROVIDED BY MELLANOX TECHNOLOGIES "AS-IS" WITH ALL FAULTS OF ANY KIND AND SOLELY FOR THE PURPOSE OF AIDING THE CUSTOMER IN TESTING APPLICATIONS THAT USE THE PRODUCTS IN DESIGNATED SOLUTIONS. THE CUSTOMER'S MANUFACTURING TEST ENVIRONMENT HAS NOT MET THE STANDARDS SET BY MELLANOX TECHNOLOGIES TO FULLY QUALIFY THE PRODUCT(S) AND/OR THE SYSTEM USING IT. THEREFORE, MELLANOX TECHNOLOGIES CANNOT AND DOES NOT GUARANTEE OR WARRANT THAT THE PRODUCTS WILL OPERATE WITH THE HIGHEST QUALITY. ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT ARE DISCLAIMED. IN NO EVENT SHALL MELLANOX BE LIABLE TO CUSTOMER OR ANY THIRD PARTIES FOR ANY DIRECT, INDIRECT, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES OF ANY KIND (INCLUDING, BUT NOT LIMITED TO, PAYMENT FOR PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY FROM THE USE OF THE PRODUCT(S) AND RELATED DOCUMENTATION EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMA
Offload (GRO) is available throughout all kernels.
• Hardware VLAN Stripping Offload (rxvlan): When enabled, received VLAN traffic has its VLAN tag stripped by the hardware.
• RX FCS (rx-fcs): Keeps the FCS field in the received packets.
• RX FCS validation (rx-all): Ignores FCS validation on the received packets.
Note: The flags below are supported in ConnectX-3/ConnectX-3 Pro cards only:
[rxvlan on/off] [txvlan on/off] [ntuple on/off] [rxhash on/off] [rx-all on/off] [rx-fcs on/off]

ethtool -a eth<x>
    Queries the pause frame settings.
    Note: Supported in ConnectX-3/ConnectX-3 Pro cards only.
ethtool -A eth<x> [rx on|off] [tx on|off]
    Sets the pause frame settings.
    Note: Supported in ConnectX-3/ConnectX-3 Pro cards only.
ethtool -g eth<x>
    Queries the ring size values.
ethtool -G eth<x> [rx <N>] [tx <N>]
    Modifies the rings size.

Table 7: ethtool Supported Options (continued)
ethtool -p|--identify eth<x> <LED duration>
    Allows users to identify the interface's physical port by turning the port's LED on for a number of seconds.
    Note: The limit for the LED duration is 65535 seconds.
ethtool -S eth<x>
    Obtains additional device statistics.
ethtool -t eth<x>
    Performs a self-diagnostics test.
ethtool -s eth<x> msglvl [N]
    Changes the current dri
ack packets if > 0 (default: 1). (int)
• num_vfs: Either a single value (e.g. '5') to define a uniform num_vfs value for all device functions, or a string to map device function numbers to their num_vfs values (e.g. '0000:04:00.0-5,002b:1c:0b.a-15'). Hexadecimal digits for the device function (e.g. 002b:1c:0b.a) and decimal for the num_vfs value (e.g. 15). (string)
• probe_vf: Either a single value (e.g. '3') to indicate that the Hypervisor driver itself should activate this number of VFs for each HCA on the host, or a string to map device function numbers to their probe_vf values (e.g. '0000:04:00.0-3,002b:1c:0b.a-13'). Hexadecimal digits for the device function (e.g. 002b:1c:0b.a) and decimal for the probe_vf value (e.g. 13). (string)
• log_num_mgm_entry_size: log mgm size, which defines the number of QPs per MCG; for example, 10 gives 248. Range: 7 <= log_num_mgm_entry_size <= 12. To activate device-managed flow steering when available, set to -1. (int)
• Enable steering mode for higher packet rate (default: off). (int)
• Enable fast packet drop when no receive WQEs are posted. (int)
• Enable 64 byte CQEs/EQEs when the FW supports this (if non-zero) (default: 1). (int)
• Log2 max number of MACs per ETH port (1-7). (int)
• Obsolete. Log2 max number of VLANs per ETH port (0-7). (int)
• Log2 number of MTT entries per segment (0-7) (default: 0). (int)
• Either a pair of values (e.g. '1,2') to define uniform port1/port2 types configuration for all device functions, or a string to map device funct
ad the driver and verify SR-IOV is supported. Run:
    lspci | grep Mellanox
    08:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
    08:00.1 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
    08:00.2 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
    08:00.3 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
    08:00.4 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
    08:00.5 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
When SR-IOV is enabled in the PF, the following structure becomes available at /sys/class/net/ethX/device/sriov:
    sriov/0: node  policy  port
    sriov/1: node  policy  port
    sriov/2: node  policy  port
In the extract above, there are three Virtual Functions, numbered 0 to 2. Each Virtual Function has the following files:
• node: the node GUID. The user can set the node GUID of the VF by writing to this file.
• port: the port GUID. The user can set the port GUID of the VF by writing to this file.
• policy: the vport policy, which the user can read or modify. Policy can be one of:
  • Down: the VPort PortState always remains Down
  • Up: if the current VPort PortState is
ager_pci | grep PSID
    PSID: 1210110019
Step 2. Download the firmware BIN file from the Mellanox website or the OEM website.
Step 3. Burn the firmware.
    mlxfwmanager_pci -i <fw_file.bin>
Step 4. Reboot your machine after the firmware burning is completed.

2.8 Ethernet Driver Usage and Configuration

To assign an IP address to the interface:
    ifconfig eth<x> <ip>
  Note: x is the OS-assigned interface number.
To check driver and device information:
    ethtool -i eth<x>
  Example:
    ethtool -i eth2
    driver: mlx4_en
    version: 2.1.8 (Oct 06 2013)
    firmware-version: 2.30.3110
    bus-info: 0000:1a:00.0
To query stateless offload status:
    ethtool -k eth<x>
To set stateless offload status:
    ethtool -K eth<x> [rx on|off] [tx on|off] [sg on|off] [tso on|off] [lro on|off]
To query interrupt coalescing settings:
    ethtool -c eth<x>
To enable/disable adaptive interrupt moderation:
    ethtool -C eth<x> adaptive-rx on|off
  By default, the driver uses adaptive interrupt moderation for the receive path, which adjusts the moderation time to the traffic pattern.
To set the values for packet rate limits and for moderation time high and low:
    ethtool -C eth<x> [pkt-rate-low N] [pkt-rate-high N] [rx-usecs-low N] [rx-usecs-high N]
  Above an upper limit of packet rate, adaptive moderation will set the moderation time to its highest valu
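When deciding whether a firmware update is needed, the firmware-version field of ethtool -i can be extracted in scripts. A sketch; the function name is hypothetical, and it assumes the "key: value" line format shown in the example above.

```shell
#!/bin/sh
# ethtool_info_field FIELD
# Reads "ethtool -i eth<x>" output on stdin and prints the value of FIELD
# (e.g. "driver", "version", "firmware-version", "bus-info").
# Each line is assumed to look like "key: value"; everything after the first
# ": " is printed, so values that themselves contain spaces are kept whole.
ethtool_info_field() {
  awk -v key="$1" -F': ' '$1 == key { print substr($0, length(key) + 3); exit }'
}

# Hypothetical usage:
#   ethtool -i eth2 | ethtool_info_field firmware-version
```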
and interface is done by setting the en_ecn module parameter of mlx4_ib to 1:
    options mlx4_ib en_ecn=1

Enabling ECN
To enable ECN on the hosts:
Step 1. Enable ECN in procfs.
    echo 1 > /proc/sys/net/ipv4/tcp_ecn
Step 2. Enable ECN CLI.
    options mlx4_ib en_ecn=1
Step 3. Restart the driver.
    /etc/init.d/openibd restart
Step 4. Mount debugfs to access ECN attributes.
    mount -t debugfs none /sys/kernel/debug/
  Please note that mounting debugfs is required.

The following is an example of ECN configuration through debugfs (echo 1 to enable an attribute):
    echo 1 > /sys/kernel/debug/mlx4_ib/<device>/ecn/<algo>/ports/1/params/prios/<prio>/<the requested attribute>

ECN supports the following algorithms:
• r_roce_ecn_rp
• r_roce_ecn_np
Each algorithm has a set of relevant parameters and statistics, which are defined per device, per port, per priority. r_roce_ecn_np has an extra set of general parameters, which are defined per device.

ECN and QCN are not compatible. When using ECN, QCN (and all its related daemons/utilities that could enable it) should be turned OFF.

3.10.1.2 Various ECN Paths
The following are the paths to the ECN algorithm, general parameters and counters.
• The path to an algorithm attribute is (except for general parameters):
    /sys/kernel/debug/mlx4_ib/{DEVICE}/ecn/{algorithm}/ports/{port}/params/prios/{prio}/{attribute}
• The path to
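The debugfs layout above can be built programmatically, which avoids typos in long paths. A sketch; the function name is hypothetical, the device name (e.g. mlx4_0) and attribute names depend on your system, and restricting priorities to 0-7 is an assumption (the 802.1p priority range).

```shell
#!/bin/sh
# ecn_attr_path DEVICE ALGO PORT PRIO ATTR [ROOT]
# Prints ROOT/mlx4_ib/DEVICE/ecn/ALGO/ports/PORT/params/prios/PRIO/ATTR,
# following the per-priority attribute layout given above.
# ROOT defaults to /sys/kernel/debug (debugfs must be mounted there).
ecn_attr_path() {
  case "$4" in
    [0-7]) ;;   # assumption: priorities are 0..7
    *) echo "priority must be 0-7, got: $4" >&2; return 1 ;;
  esac
  root=${6:-/sys/kernel/debug}
  printf '%s/mlx4_ib/%s/ecn/%s/ports/%s/params/prios/%s/%s\n' \
    "$root" "$1" "$2" "$3" "$4" "$5"
}

# Hypothetical usage (root privileges, real device and attribute names needed):
#   echo 1 > "$(ecn_attr_path mlx4_0 r_roce_ecn_rp 1 3 <attribute>)"
```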
and mitigate congestion spreading and resulting victim flows in lossless environments.

The Quantized Congestion Notification (QCN) IEEE standard (802.1Qau) provides congestion control for long-lived flows in limited bandwidth-delay product Ethernet networks. It is part of the IEEE Data Center Bridging (DCB) protocol suite, which also includes ETS, PFC, and DCBX. QCN is conducted at L2 and is targeted for hardware implementations. QCN applies to all Ethernet packets and all transports, and both the host and switch behavior is detailed in the standard.

The QCN user interface allows the user to configure QCN activity. QCN configuration and retrieval of information is done by the mlnx_qcn tool. The command interface provides the user with a set of changeable attributes, and with information regarding QCN's counters and statistics. All parameters and statistics are defined per port and priority. The QCN command interface is available if and only if the hardware supports it.

3.9.1 QCN Tool - mlnx_qcn
mlnx_qcn is a tool used to configure QCN attributes of the local host. It communicates directly with the driver and thus does not require setting up a DCBX daemon on the system.
mlnx_qcn enables the user to:
• Inspect the current QCN configurations for a certain port, sorted by priority
• Inspect the current QCN statistics and counters for a certain port, sorted by priority
• Set values of chosen QCN parameters
Usage:
    mlnx_qcn -i <interface> [options]
applied to all ConnectX HCAs on the host.
• num_vfs=00:04.0-5,00:07.0-8
  The driver will enable 5 VFs on the HCA positioned in BDF 00:04.0 and 8 on the one in 00:07.0.
• num_vfs=1,2,3
  The driver will enable 1 VF on physical port 1, 2 VFs on physical port 2 and 3 dual port VFs (applies only to dual port HCA when all ports are Ethernet ports).
• num_vfs=00:04.0-5;6;7,00:07.0-8;9;10
  The driver will enable:
  • HCA positioned in BDF 00:04.0: 5 single VFs on port 1, 6 single VFs on port 2, 7 dual port VFs
  • HCA positioned in BDF 00:07.0: 8 single VFs on port 1, 9 single VFs on port 2, 10 dual port VFs
  Applies when all ports are configured as Ethernet in dual port HCAs.
Notes:
• PFs not included in the above list will not have SR-IOV enabled.
• Triplets and single port VFs are only valid when all ports are configured as Ethernet. When an InfiniBand port exists, only the num_vfs=a syntax is valid, where "a" is a single value that represents the number of VFs.
• The second parameter in a triplet is valid only when there is more than 1 physical port.
• In a triplet, x+z<=63 and y+z<=63; the maximum number of VFs on each physical port must be 63.

port_type_array: Specifies the protocol type of the ports. It is either one array of 2 port types 't1,t2' for all devices, or a list of BDF to port_type_array 'bb:dd.f-t1;t2,...'. (string)
Valid port types: 1-ib, 2-eth, 3-auto, 4-N/A. If only a single port is available, use
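The per-port limit for triplets can be checked before writing the options line. Because the z dual-port VFs exist on both physical ports, port 1 carries x+z VFs and port 2 carries y+z. A sketch with a hypothetical function name:

```shell
#!/bin/sh
# check_num_vfs_triplet X Y Z
# Applies the triplet constraint stated above: x single-port VFs on port 1,
# y single-port VFs on port 2, z dual-port VFs on both ports, and at most
# 63 VFs per physical port. Returns 0 if valid, 1 (with a message) if not.
check_num_vfs_triplet() {
  x=$1 y=$2 z=$3
  if [ $((x + z)) -gt 63 ]; then
    echo "port 1 would have $((x + z)) VFs (max 63)" >&2
    return 1
  fi
  if [ $((y + z)) -gt 63 ]; then
    echo "port 2 would have $((y + z)) VFs (max 63)" >&2
    return 1
  fi
  return 0
}
```

For the triplet 5;6;7 used above, port 1 gets 12 VFs and port 2 gets 13, well within the limit.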
ate 5 VFs on the HCA positioned in BDF 00:04.0 and 8 for the one in 00:07.0.
• probe_vf=1,2,3
  The PF driver will activate 1 VF on physical port 1, 2 VFs on physical port 2 and 3 dual port VFs (applies only to dual port HCA when all ports are Ethernet ports). This applies to all ConnectX HCAs in the host.
• probe_vf=00:04.0-5;6;7,00:07.0-8;9;10
  The PF driver will activate:
  • HCA positioned in BDF 00:04.0: 5 single VFs on port 1, 6 single VFs on port 2, 7 dual port VFs
  • HCA positioned in BDF 00:07.0: 8 single VFs on port 1, 9 single VFs on port 2, 10 dual port VFs
  Applies when all ports are configured as Ethernet in dual port HCAs.

probe_vf (continued)
Notes:
• PFs not included in the above list will not activate any of their VFs in the PF driver.
• Triplets and single port VFs are only valid when all ports are configured as Ethernet. When an InfiniBand port exists, only the probe_vf=a syntax is valid, where "a" is a single value that represents the number of VFs.
• The second parameter in a triplet is valid only when there is more than 1 physical port.
• Every value (either a value in a triplet or a single value) should be less than or equal to the respective value of the num_vfs parameter.

The example above loads the driver with 5 VFs (num_vfs). The standard use of a VF is a single VF per a single VM. However, the number of V
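The rule that every probe_vf value must not exceed the matching num_vfs value can be checked for the single-value and triplet forms. A sketch with a hypothetical function name; it does not parse the per-BDF string form, and treating a different number of fields as an error is an assumption.

```shell
#!/bin/sh
# probe_vf_ok NUM_VFS PROBE_VF
# Each argument is a single value ("5") or a comma-separated triplet ("5,6,7").
# Succeeds if both have the same number of fields and every probe_vf field is
# <= the num_vfs field in the same position; fails otherwise.
probe_vf_ok() {
  awk -v n="$1" -v p="$2" 'BEGIN {
    cn = split(n, nv, ",")
    cp = split(p, pv, ",")
    if (cn != cp) exit 1                       # assumption: shapes must match
    for (i = 1; i <= cn; i++)
      if (pv[i] + 0 > nv[i] + 0) exit 1        # probe_vf exceeds num_vfs
    exit 0
  }'
}
```

For the example options line above (num_vfs=5 ... probe_vf=1), probe_vf_ok 5 1 succeeds.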
bytes: Unicast packet bytes received successfully
vport_rx_multicast_packets: Multicast packets received successfully
vport_rx_multicast_bytes: Multicast packet bytes received successfully
vport_rx_broadcast_packets: Broadcast packets received successfully
vport_rx_broadcast_bytes: Broadcast packet bytes received successfully
vport_rx_dropped: Received packets discarded due to lack of software receive buffers (WQEs). An important indication of whether RX completion routines are keeping up with the hardware ingress packet rate.
vport_rx_filtered: Received packets dropped due to a packet check that failed, for example: incorrect VLAN, incorrect Ethertype, unavailable queue/QP, or loopback prevention.
vport_tx_unicast_packets: Unicast packets sent successfully
vport_tx_unicast_bytes: Unicast packet bytes sent successfully
vport_tx_multicast_packets: Multicast packets sent successfully
vport_tx_multicast_bytes: Multicast packet bytes sent successfully
vport_tx_broadcast_packets: Broadcast packets sent successfully
vport_tx_broadcast_bytes: Broadcast packet bytes sent successfully
vport_tx_dropped: Packets dropped due to transmit errors
rx_lro_aggregated: Number of packets processed by the LRO mechanism
rx_lro_flushed: Number of offloaded packets the LRO mechanism passed to the kernel
rx_lro_no_desc: LRO mechanism has no room to receive packets from the adap
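vport_rx_dropped is most useful relative to traffic volume. The sketch below reports drops per million packets that reached the vport, computed from ethtool -S output. The function name is hypothetical, it assumes the counter names listed above and ethtool's "name: value" layout, and the ratio is only a rough indicator of whether RX completion handling keeps up.

```shell
#!/bin/sh
# rx_drop_ppm: reads "ethtool -S eth<x>" output on stdin and prints
#   vport_rx_dropped / (unicast + multicast + broadcast received + dropped)
# scaled to parts per million and rounded to the nearest integer.
# Prints nothing if the relevant counters are absent or all zero.
rx_drop_ppm() {
  awk -F':' '
    NF >= 2 {
      name = $1; val = $2
      gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim the counter name
      gsub(/[ \t]+/, "", val)              # trim the counter value
      c[name] = val
    }
    END {
      dropped = c["vport_rx_dropped"] + 0
      total = c["vport_rx_unicast_packets"] + c["vport_rx_multicast_packets"] + c["vport_rx_broadcast_packets"] + dropped
      if (total > 0) printf "%.0f\n", dropped / total * 1000000
    }'
}

# Hypothetical usage:
#   ethtool -S eth2 | rx_drop_ppm
```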
ctober 2013: Added the following sections:
• Section 3.4.1 Single Root IO Virtualization (SR-IOV) on page 37
• Section 3.3 Flow Steering on page 33
• Section 3.2 Time-Stamping Service on page 30

About this Manual
This Preface provides general information concerning the scope and organization of this User's Manual.

Intended Audience
This manual is intended for system administrators responsible for the installation, configuration, management and maintenance of the software and hardware of VPI (InfiniBand, Ethernet) adapter cards. It is also intended for application developers.

Common Abbreviations and Acronyms
Table 2: Abbreviations and Acronyms
B: (Capital) "B" is used to indicate size in bytes or multiples of bytes (e.g., 1KB = 1024 bytes, and 1MB = 1048576 bytes)
b: (Small) "b" is used to indicate size in bits or multiples of bits (e.g., 1Kb = 1024 bits)
FW: Firmware
HW: Hardware
LSB: Least significant byte
lsb: Least significant bit
MSB: Most significant byte
msb: Most significant bit
NIC: Network Interface Card
SW: Software
VPI: Virtual Protocol Interconnect
PFC: Priority Flow Control
PR: Path Record
RDS: Reliable Datagram Sockets
SL: Service Level
QoS: Quality of Service
ULP: Upper Level Protocol
VL: Virtual Lane
d Flow Control policy on RX[7:0]. Per priority bit mask. (uint)

1.2.2 mlx5 Module Parameters
The mlx5_core module supports a single parameter used to select the profile, which defines the number of resources supported. The parameter name for selecting the profile is prof_sel.
The supported values for profiles are:
• 0 - for medium resources, medium performance
• 1 - for low resources
• 2 - for high performance (int) (default)

Chapter 2 Installation
This chapter describes how to install and test the MLNX_EN for Linux package on a single host machine with Mellanox InfiniBand and/or Ethernet adapter hardware installed.

2.1 Software Dependencies
• To install the driver software, kernel sources must be installed on the machine.
• The MLNX_EN driver cannot coexist with OFED software on the same machine. Hence, when installing MLNX_EN, all OFED packages should be removed (run the install.sh script).

2.2 Downloading MLNX_EN
Step 1. Verify that the system has a Mellanox network adapter (HCA/NIC) installed.
The following example shows a system with an installed Mellanox HCA:
    lspci -v | grep Mellanox
    06:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
        Subsystem: Mellanox Technologies Device 0024
Step 2. Download the tarball image to your host.
The image's name has the format MLNX_EN-<ver>.tgz. You can download it from http://www.mellanox
d attributes.
To enable ECN for each priority per protocol:
    echo 1 > /sys/class/net/<interface>/ecn/<protocol>/enable/X
To modify ECN configurable parameters:
    echo <value> > /sys/class/net/<interface>/ecn/<protocol>/<requested_attributes>
Where:
• X: priority {0..7}
• protocol: roce_rp / roce_np
• requested_attributes: the attributes defined for each protocol

3.11 RSS Hash Function
The device can use XOR as the RSS distribution function instead of the default Toeplitz function.
With a small number of streams, the XOR function can distribute traffic better among the driver's receive queues, since it distributes each TCP/UDP stream to a different queue.
MLNX_EN v2.2-1.0.0 and onwards provides an option to change the working RSS hash function from Toeplitz to XOR (and vice versa) through ethtool priv-flags.
For further information, please refer to Table 7 ethtool Supported Options on page 55.
Note: This is the default behavior when using ConnectX-4 adapter cards, and it cannot be changed.

3.12 Ethernet Performance Counters
Supported in ConnectX-3 and ConnectX-3 Pro only.
Counters are used to provide information about how well an operating system, an application, a service, or a driver is performing. The counter data helps determine system bottlenecks and fine-tune the system and application performance.
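To enable ECN on all eight priorities of one protocol, the echo command above can be looped. A sketch; the function name is hypothetical, and the optional root argument exists only so the loop can be tried in a scratch directory instead of /sys.

```shell
#!/bin/sh
# ecn_enable_all_prios INTERFACE PROTOCOL [ROOT]
# Writes 1 to ROOT/class/net/INTERFACE/ecn/PROTOCOL/enable/X for X = 0..7,
# following the path layout above. ROOT defaults to /sys.
# On sysfs the per-priority files are provided by the driver; in an ordinary
# scratch directory the redirection simply creates them.
ecn_enable_all_prios() {
  dir="${3:-/sys}/class/net/$1/ecn/$2/enable"
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
  for x in 0 1 2 3 4 5 6 7; do
    echo 1 > "$dir/$x" || return 1
  done
}

# Hypothetical usage (as root, on a ConnectX-4 host):
#   ecn_enable_all_prios eth1 roce_np
```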
d the ethtool device feature are set, the device will be ready for 802.1ad VLAN acceleration.
The phv-bit private flag setting is available for the Physical Function (PF) only.
The Virtual Function (VF) can use the VLAN acceleration by setting the tx-vlan-stag-hw-insert parameter only if the private flag phv-bit is enabled by the PF. If the PF enables or disables the phv-bit flag after the VF driver is up, the configuration will take effect only after the VF driver is restarted.

Chapter 4 Troubleshooting
You may be able to easily resolve the issues described in this section. If a problem persists and you are unable to resolve it yourself, please contact your Mellanox representative or Mellanox Support at support@mellanox.com.

4.1 General Related Issues
Table 8: General Related Issues
Issue: The system panics when it is booted with a failed adapter installed.
Cause: Malfunctioning hardware component.
Solution: 1. Remove the failed adapter. 2. Reboot the system.

Issue: Mellanox adapter is not identified as a PCI device.
Cause: PCI slot or adapter PCI connector dysfunctionality.
Solution: 1. Run lspci. 2. Reseat the adapter in its PCI slot or insert the adapter into a different PCI slot. If the PCI slot is confirmed to be functional, the adapter should be replaced.

Issue: Mellanox adapters are not installed in the system.
Cause: Misidentification of the Mellanox adapter installed.
Solution: Run the command below and check Mellanox's MAC to identify the Mellanox adapter installed:
    lspci | grep Mellanox
  or
    lspci -d 15b3:
  Mellanox MACs start with 00:02:C9:xx:xx:xx, 00:25:8B:xx:xx:xx or F4:52:14:xx:xx:xx.

4.2 Ethernet Related Issues
Table 9: Ethernet Related Issues
Issue: No link.
Cause: Misconfiguration of the switch port, or using a cable not supporting the link rate.
Solution:
• Ensure the switch port is not down.
• Ensure the switch port rate is configured to the same rate as the adapter's port.

Issue: Degraded performance is measured when having a mixed rate environment (10GbE, 40GbE and 56GbE).
Cause: Sending traffic from a node with a higher rate to a node with a lower rate.
Solution: Enable Flow Control on both the switch's ports and the nodes:
• On the server side, run:
    ethtool -A <interface> rx on tx on
• On the switch side, run the following command on the relevant interface: send on force and receive on force

Issue: No link with break-out cable.
Cause: Misuse of the break-out cable or misconfiguration of the switch's split ports.
Solution:
• Use supported ports on the switch with proper configuration. For further information, please refer to the MLNX-OS User Manual.
• Make sure the QSFP break-out cable side is connected to the SwitchX.

Issue: Physical link fails to negotiate to maximum supported rate.
Cause: The adapter is running an outdated firmware.
Solution: Install the latest firmware on the adapter.

Issue: Physical link fails to come up while port physical state is Polling.
Cause: The cable is not connected to the port, or the port on the other end of the cable is disabled.
Solution:
• Ensure that the cable is connected on both ends, or use a known working cable.
• Check the status of the connected port using the ibportstate command and enable it if necessary.

Issue: Physical link fails to come up while port physical state is Disabled.
Cause: The port was manually disabled.
Solution: Restart the driver:
    /etc/init.d/openibd restart

4.3 Performance Related Issues
Table 10: Performance Related Issues
Issue: The driver works but the transmit and/or receive data rates are not optimal.
Solution: These recommendations may assist with gaining immediate improvement:
1. Confirm the PCI link negotiated uses its maximum capability.
2. Stop the IRQ Balancer service:
    /etc/init.d/irq_balancer stop
3. Start the mlnx_affinity service:
    mlnx_affinity start
For best performance practices, please refer to the "Performance Tuning Guide for Mellanox Network Adapters" (www.mellanox.com > Products > InfiniBand/VPI Drivers > Linux SW/Drivers).

Issue: Out-of-the-box throughput performance in Ubun
Cause: IRQ affinity is not set properly by the
Solution: For additional performance tuning, please
ding to a minimal guarantee policy.
If, for instance, TC0 is set to 80% guarantee and TC1 to 20% (the TCs sum must be 100), then the BW left after servicing all strict priority TCs will be split according to this ratio. Since this is a minimal guarantee, there is no maximum enforcement. This means, in the same example, that if TC1 does not use its 20% share, the remainder will be used by TC0.
ETS is configured using the mlnx_qos tool, which allows you to:
• Assign a transmission algorithm to each TC (strict or ETS)
• Set minimal BW guarantee to ETS TCs
Usage:
    mlnx_qos -i <interface> [options]

3.1.4.3 Rate Limit
Rate limit defines a maximum bandwidth allowed for a TC. Please note that a 10% deviation from the requested values is considered acceptable.

3.1.5 Quality of Service Tools

3.1.5.1 mlnx_qos
mlnx_qos is a centralized tool used to configure QoS features of the local host. It communicates directly with the driver and thus does not require setting up a DCBX daemon on the system.
The mlnx_qos tool enables the administrator of the system to:
• Inspect the current QoS mappings and configuration. The tool will also display maps configured by TC and vconfig set_egress_map tools, in order to give a centralized view of all QoS mappings.
• Set UP to TC mapping
• Assign a transmission algorithm to each TC (strict or ETS)
• Set minimal BW guarantee to ETS TCs
• Set rate limit to TCs
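The 80%/20% example works out as follows. A sketch with a hypothetical function and a made-up figure of 10000 Mb/s left after the strict-priority TCs; since ETS shares are minimal guarantees, the results are floors, not caps.

```shell
#!/bin/sh
# ets_guarantees LEFTOVER_MBPS SHARE...
# Splits the bandwidth left after strict-priority TCs among the ETS TCs by
# their percentage shares, which must sum to 100. Prints one minimum
# guarantee (Mb/s) per share, in order. Integer arithmetic, rounded down.
ets_guarantees() {
  left=$1
  shift
  sum=0
  for s in "$@"; do sum=$((sum + s)); done
  if [ "$sum" -ne 100 ]; then
    echo "ETS shares sum to $sum, expected 100" >&2
    return 1
  fi
  for s in "$@"; do
    echo $((left * s / 100))
  done
}
```

With 10000 Mb/s left over, ets_guarantees 10000 80 20 prints 8000 and 2000. If TC1 sends less than its 2000 Mb/s, TC0 may use the remainder.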
directed to different rings.
When the first packet in an IP fragments chain contains an upper layer transport header (e.g. UDP packets larger than the MTU), it will be directed to the same target as the IP fragments that follow it, to prevent out-of-order processing.

3.14 Wake-on-LAN (WoL)
Wake-on-LAN (WoL) is a technology that allows a network professional to remotely power on a computer or to wake it up from sleep mode.
• To enable WoL:
    ethtool -s <interface> wol g
• To get WoL:
    ethtool <interface> | grep Wake-on
    Wake-on: g
  Where "g" is the magic packet activity.

3.15 Hardware Accelerated 802.1ad VLAN (Q-in-Q Tunneling)
Q-in-Q tunneling allows the user to create a Layer 2 Ethernet connection between two servers. The user can segregate different VLAN traffic on a link or bundle different VLANs into a single VLAN. Q-in-Q tunneling adds a service VLAN tag before the user's 802.1Q VLAN tags.
To enable device support for accelerated 802.1ad VLAN:
1. Turn on the new ethtool private flag "phv-bit" (disabled by default).
    ethtool --set-priv-flags eth1 phv-bit on
  Enabling this flag sets the phv_en port capability.
2. Change the interface device features by turning on the ethtool device feature "tx-vlan-stag-hw-insert" (disabled by default).
    ethtool -K eth1 tx-vlan-stag-hw-insert on
Once the private flag an
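The "Wake-on:" check above can be scripted. A sketch with hypothetical function names; it assumes ethtool's usual output, where a "Supports Wake-on:" line lists capabilities and a separate "Wake-on:" line lists the active modes ("g" for magic packet, "d" for disabled).

```shell
#!/bin/sh
# wol_modes: reads "ethtool <interface>" output on stdin and prints the active
# WoL modes from the "Wake-on:" line. The regex is anchored at the start of the
# line (after indentation), so "Supports Wake-on:" is not matched.
wol_modes() {
  awk '/^[ \t]*Wake-on:/ { print $2; exit }'
}

# magic_packet_enabled: succeeds if the active modes include "g".
magic_packet_enabled() {
  case "$(wol_modes)" in
    *g*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical usage:
#   ethtool eth1 | magic_packet_enabled && echo "WoL (magic packet) is on"
```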
e. Below a lower limit of packet rate, the moderation time will be set to its lowest value.
To set interrupt coalescing settings when adaptive moderation is disabled:
    ethtool -C eth<x> [rx-usecs N] [rx-frames N]
  Note: usec settings correspond to the time to wait after the last packet is sent/received before triggering an interrupt.
[ConnectX-3/ConnectX-3 Pro] To query pause frame settings:
    ethtool -a eth<x>
[ConnectX-3/ConnectX-3 Pro] To set pause frame settings:
    ethtool -A eth<x> [rx on|off] [tx on|off]
To query ring size values:
    ethtool -g eth<x>
To modify rings size:
    ethtool -G eth<x> [rx <N>] [tx <N>]
To obtain additional device statistics:
    ethtool -S eth<x>
[ConnectX-3/ConnectX-3 Pro] To perform a self-diagnostics test:
    ethtool -t eth<x>

The driver defaults to the following parameters:
• Both ports are activated (i.e., a net device is created for each port).
• The number of Rx rings for each port is the nearest power of 2 of the number of CPU cores, limited by 16.
• LRO is enabled with 32 concurrent sessions per Rx ring.
Some of these values can be changed using module parameters, which can be displayed by running:
    modinfo mlx4_en
To set non-default values to module parameters, add to the /etc/modprobe.conf file o
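Before modifying ring sizes with ethtool -G, the current values can be read from ethtool -g. A sketch with a hypothetical function name, assuming ethtool's usual output, where the current values follow a "Current hardware settings:" line and each entry looks like "RX:  1024".

```shell
#!/bin/sh
# current_ring_size RX|TX
# Reads "ethtool -g eth<x>" output on stdin and prints the current RX or TX
# ring size. Values listed under "Pre-set maximums:" are skipped, and lines
# such as "RX Mini:" do not match because their first field is "RX", not "RX:".
current_ring_size() {
  awk -v want="$1:" '
    /^Current hardware settings:/ { cur = 1; next }
    cur && $1 == want { print $2; exit }'
}

# Hypothetical usage: double the current RX ring size.
#   rx=$(ethtool -g eth2 | current_ring_size RX)
#   ethtool -G eth2 rx $((2 * rx))
```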
e checksum value. This allows accelerating checksum validation in the Linux Networking Stack, since it does not have to calculate the whole checksum, including the payload, by itself.
Checksum Complete is passed to the OS when all of the following are true:
• ethtool -k <DEV> shows rx-checksumming: on
• Using ConnectX-3 firmware version 2.31.7000 and up
• Received an IPv4/IPv6 non-TCP/UDP packet

The ingress parser of the ConnectX-3 Pro card comes by default without checksum offload support for non-TCP/UDP packets. To change that, please set the value of the module parameter ingress_parser_mode in mlx4_core to 1. In this mode, IPv4/IPv6 non-TCP/UDP packets will be passed up to the protocol stack with the CHECKSUM_COMPLETE tag.
In this mode of the ingress parser, the following features are unavailable:
• NVGRE stateless offloads
• VXLAN stateless offloads
• RoCE v2 (RoCE over UDP)
Change the default behavior only if non-TCP/UDP traffic is very common.

• CHECKSUM_NONE: By setting this mode, the driver indicates to the Linux Networking Stack that the hardware failed to validate the IP or L4 checksum, so the Linux Networking Stack must calculate and validate the IP/L4 checksum. Checksum None is passed to the OS for all other cases.

3.9 Quantized Congestion Control
Supported in ConnectX-3 and ConnectX-3 Pro only.
Congestion control is used to reduce packet drops in lossy environments
e text file /etc/modprobe.d/mlx4_core.conf if it does not exist.
Step 4. Insert an "options" line in the /etc/modprobe.d/mlx4_core.conf file to set the number of VFs, the protocol type per port, and the allowed number of virtual functions to be used by the physical function driver (probe_vf). For example:
    options mlx4_core num_vfs=5 port_type_array=1,2 probe_vf=1
Parameter: num_vfs
Recommended Value:
• If absent, or zero: no VFs will be available.
• If its value is a single number in the range of 0-63: the driver will enable num_vfs VFs on the HCA, and this will be applied to all ConnectX HCAs on the host.
• If it is a triplet x,y,z (applies only if all ports are configured as Ethernet), the driver creates:
  - x single-port VFs on physical port 1
  - y single-port VFs on physical port 2 (applies only if such a port exists)
  - z n-port VFs (where n is the number of physical ports on the device)
  This applies to all ConnectX HCAs on the host.
• If its format is a string: the string specifies the num_vfs parameter separately per installed HCA.
  The string format is: "bb:dd.f-v,bb:dd.f-v,…"
  bb:dd.f = bus:device.function of the PF of the HCA
  v = number of VFs to enable for that HCA, which is either a single value or a triplet, as described above.
  For example:
  • num_vfs=5 - The driver will enable 5 VFs on the HCA and this will be
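The three accepted forms of num_vfs can be told apart as in this Python sketch. parse_num_vfs is a hypothetical helper, not part of the driver; it assumes that a triplet inside a per-HCA entry is written with semicolons (for example 00:04.0-1;2;3), since the string format above only shows single values per HCA.

```python
import re
from typing import Dict, List, Union

VfSpec = Union[int, List[int]]  # a single VF count, or an [x, y, z] triplet


def parse_vf_spec(text: str) -> VfSpec:
    """Parse "5", or a triplet written as "1,2,3" (top level) or "1;2;3" (per HCA)."""
    parts = [int(p) for p in re.split(r"[,;]", text)]
    if len(parts) == 1:
        if not 0 <= parts[0] <= 63:
            raise ValueError("a single num_vfs value must be in the range 0-63")
        return parts[0]
    if len(parts) == 3:
        return parts
    raise ValueError(f"expected 1 or 3 values, got {text!r}")


def parse_num_vfs(value: str) -> Union[VfSpec, Dict[str, VfSpec]]:
    """Return a uniform spec, or a mapping from PF bus:device.function to its spec."""
    if "-" not in value:
        # Uniform value applied to every ConnectX HCA on the host.
        return parse_vf_spec(value)
    per_hca: Dict[str, VfSpec] = {}
    for entry in value.split(","):
        bdf, _, spec = entry.partition("-")
        per_hca[bdf] = parse_vf_spec(spec)
    return per_hca


print(parse_num_vfs("00:04.0-5,00:07.0-8"))
```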
ed UDP port. If using previous firmware versions, set the VXLAN tunnel over UDP port 4789.
To add the UDP port to /etc/modprobe.d/vxlan.conf:
    options vxlan udp_port=<number decided above>
3.4.2.3 Important Notes
• VXLAN tunneling adds 50 bytes (14 eth + 20 ip + 8 udp + 8 vxlan) to the VM Ethernet frame. Please verify that either the MTU of the NIC that sends the packets (e.g., the VM virtio-net NIC or the host-side veth device) or the uplink takes the tunneling overhead into account. That is, the MTU of the sending NIC has to be decremented by 50 bytes (e.g., 1450 instead of 1500), or the uplink NIC MTU has to be incremented by 50 bytes (e.g., 1550 instead of 1500).
• From upstream 3.15-rc1 onward, it is possible to use an arbitrary UDP port for VXLAN. Note that this requires firmware version 2.31.2800 or higher. Additionally, you need to enable this kernel configuration option: CONFIG_MLX4_EN_VXLAN=y.
• On upstream kernels 3.12/3.13, GRO with VXLAN is not supported.
3.5 Resiliency
3.5.1 Reset Flow
Supported in ConnectX-3 and ConnectX-3 Pro only.
Reset Flow is activated by default once a "fatal device" error is recognized. Both the HCA and the software are reset, the ULPs and user applications are notified about it, and a recovery process is performed once the event is raised.
The Reset Flow is activated by the mlx4_core module parameter internal_err_reset, and its defaul
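The 50-byte VXLAN overhead and the two MTU adjustments described above can be checked with a few lines of Python. The helper names are illustrative only.

```python
# Header sizes added by VXLAN encapsulation, as listed in the note above (bytes).
VXLAN_HEADERS = {"eth": 14, "ip": 20, "udp": 8, "vxlan": 8}
OVERHEAD = sum(VXLAN_HEADERS.values())  # 50 bytes


def inner_mtu(uplink: int) -> int:
    """MTU for the sending NIC (VM virtio-net NIC or host veth) when the uplink MTU is fixed."""
    return uplink - OVERHEAD


def uplink_mtu(inner: int) -> int:
    """MTU for the uplink NIC so that frames from an `inner`-MTU sender still fit."""
    return inner + OVERHEAD


print(OVERHEAD)          # 50
print(inner_mtu(1500))   # 1450
print(uplink_mtu(1500))  # 1550
```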
eering B0 is used.
When using SR-IOV, flow steering is enabled if there is an adequate amount of space to store the flow steering table for the guest/master.
To enable Flow Steering:
Step 1. Open the /etc/modprobe.d/mlnx.conf file.
Step 2. Set the parameter log_num_mgm_entry_size to a non-positive value by writing the option:
    options mlx4_core log_num_mgm_entry_size=<value>
Step 3. Restart the driver.
To disable Flow Steering:
Step 1. Open the /etc/modprobe.d/mlnx.conf file.
Step 2. Remove the option:
    options mlx4_core log_num_mgm_entry_size=<value>
Step 3. Restart the driver.
3.3.2 Flow Steering Support
To determine which Flow Steering features are supported:
    ethtool --show-priv-flags eth4
The following output is shown:
    mlx4_flow_steering_ethernet_l2: on    Creating Ethernet L2 (MAC) rules is supported
    mlx4_flow_steering_ipv4: on           Creating IPv4 rules is supported
    mlx4_flow_steering_tcp: on            Creating TCP/UDP rules is supported
Flow Steering support in InfiniBand is determined according to the EXP_MANAGED_FLOW_STEERING flag.
3.3.2.1 A0 Static Device Managed Flow Steering
Only applicable to the mlx4 driver.
This mode enables fast steering; however, it might impact flexibility. Using it increases the packet rate performance by 30%, with the following limitations for Ethernet link-layer unicast QPs:
• Limits the number of opened RSS Kern
el QPs to 96
• MACs should be unique (1 MAC per 1 QP)
• The number of VFs is limited
• When creating Flow Steering rules for user QPs, only MAC -> QP rules are allowed. Both MACs and QPs should be unique between rules. Only 62 such rules can be created.
• When creating rules with Ethtool, MAC -> QP rules can be used, where the QP must be the indirection (RSS) QP. Creating rules that indirect traffic to other rings is not allowed. Ethtool MAC rules to drop packets (action -1) are supported.
• RFS is not supported in this mode
• VLAN is not supported in this mode
3.3.3 Flow Domains and Priorities
Flow steering defines the concept of domain and priority. Each domain represents a user agent that can attach a flow. The domains are prioritized: a higher priority domain will always supersede a lower priority domain when their flow specifications overlap. Setting a lower priority value will result in a higher priority.
In addition to the domain, there is a priority within each of the domains. Each domain can have at most 2^12 priorities, in accordance with its needs.
The following are the domains, in descending order of priority:
• Ethtool
  The Ethtool domain is used to attach an RX ring, specifically its QP, to a specified flow. Please refer to the most recent ethtool manpage for all the ways to specify a flow.
  Examples:
  • ethtool -U eth5 flow-ty
ering infrastructure to support the RFS logic by implementing ndo_rx_flow_steer, which, in turn, calls the underlying flow steering mechanism with the RFS domain.
  Enabling RFS requires enabling the "ntuple" flag via ethtool. For example, to enable ntuple for eth0, run:
    ethtool -K eth0 ntuple on
  RFS requires the kernel to be compiled with the CONFIG_RFS_ACCEL option. This option is available in kernels 2.6.39 and above. Furthermore, RFS requires Device Managed Flow Steering support.
  RFS cannot function if LRO is enabled. LRO can be disabled via ethtool.
• All of the rest
  The lowest priority domain serves the following users:
  • The mlx4 Ethernet driver attaches its unicast and multicast MAC addresses to its QP using L2 flow specifications.
Fragmented UDP traffic cannot be steered. It is treated as "other" protocol by the hardware (from the first packet) and is not considered UDP traffic.
3.4 Virtualization
3.4.1 Single Root IO Virtualization (SR-IOV)
Single Root IO Virtualization (SR-IOV) is a technology that allows a physical PCIe device to present itself multiple times through the PCIe bus. This technology enables multiple virtual instances of the device with separate resources. Mellanox adapters are capable of exposing up to 126 virtual instances, called Virtual Functions (VFs), in ConnectX-3 adapter cards, and up to 62 virtual i
es.
  --rpg_gd=RPG_GD_LIST    Set value of rpg_gd according to priority, use spaces between values and -1 for unknown values.
  --rpg_min_dec_fac=RPG_MIN_DEC_FAC_LIST    Set value of rpg_min_dec_fac according to priority, use spaces between values and -1 for unknown values.
  --rpg_min_rate=RPG_MIN_RATE_LIST    Set value of rpg_min_rate according to priority, use spaces between values and -1 for unknown values.
  --cndd_state_machine=CNDD_STATE_MACHINE_LIST    Set value of cndd_state_machine according to priority, use spaces between values and -1 for unknown values.
To get the current QCN configuration sorted by priority:
    mlnx_qcn -i eth2 -g parameters
To show QCN's statistics sorted by priority:
    mlnx_qcn -i eth2 -g statistics
Example output when running mlnx_qcn -i eth2 -g parameters:
    priority 0
        rpg_enable 0
        rppp_max_rps 1000
        rpg_time_reset 1464
        rpg_byte_reset 150000
        rpg_threshold 5
        rpg_max_rate 40000
        rpg_ai_rate 10
        rpg_hai_rate 50
        rpg_gd 8
        rpg_min_dec_fac 2
        rpg_min_rate 10
        cndd_state_machine 0
    priority 1
        rpg_enable 0
        rppp_max_rps 1000
        rpg_time_reset 1464
        rpg_byte_reset 150000
        rpg_threshold 5
        rpg_max_rate 40000
        rpg_ai_rate 10
        rpg_hai_rate 50
        rpg_gd 8
        rpg_min_dec_fac 2
        rpg_min_rate 10
        cndd_state_machine 0
    priority 2
        rpg_enable 0
        rppp_max_rps 1000
        rpg_time_reset 1464
        rpg_byte_reset 150000
        rpg_threshold
ge 35
  • Section 3.4.1.6.2, "Additional Ethernet VF Configuration Options", on page 49
  • Section 3.4.1.2.2, "Configuring SR-IOV for ConnectX-4", on page 43
3.0-1.0.1    June 21, 2015    Added the following new sections:
  • Section 1.1.5, "mlx4 VPI Driver", on page 15
  • Section 1.1.6, "mlx5 Driver", on page 15
  • Section 1.2.2, "mlx5 Module Parameters", on page 17
  • Section 3.6, "Ignore Frame Check Sequence (FCS) Errors", on page 54
Updated the following sections:
  • Section 1.1.2, "Software Components", on page 14
  • Section 2.3.1, "Installation Modes", on page 18
  • Section 2.3.2, "Installation Procedure", on page 19
  • Section 2.7.1, "Updating the Device Online", on page 20
  • Section 2.7.2, "Updating the Device Manually", on page 20
  • Section 2.8, "Ethernet Driver Usage and Configuration", on page 21
  • Section 3.7, "Ethtool", on page 54
  • Section 3.12, "Ethernet Performance Counters", on page 64
Removed the following sections:
  • Power Management
  • Adaptive Interrupt Moderation Algorithm
  • Virtual Guest Tagging (VGT)
  • Installing MLNX_EN on XenServer6.1
2.4-1.0.0.1    January 26, 2015    Added the following new sections:
  • Section 2.8.2, "Updating the Device Online", on page 21
  • Section 3.4.1.7.1, "FDB Status Reporting", on page 49
  • Section 3.13
he PCI representation in BDF to the respective ports:
    mlnx_get_vfs.pl
The output is as follows:
    BDF 0000:04:00.0
    Port 1: 2
    vf0 0000:04:00.1
    vf1 0000:04:00.2
    Port 2: 2
    vf2 0000:04:00.3
    vf3 0000:04:00.4
    Both: 1
    vf4 0000:04:00.5
3.4.1.7 Forwarding DataBase (FDB) Management
3.4.1.7.1 FDB Status Reporting
FDB, also known as Forwarding Information Base (FIB) or the forwarding table, is most commonly used in network bridging, routing, and similar functions to find the proper interface to which the input interface should forward a packet.
In the SR-IOV environment, the Ethernet driver can share the existing 128 MACs (for each port) among the Virtual interfaces (VF) and Physical interfaces (PF) that share the same table, as follows:
• Each VF gets 2 granted MACs (which are taken from the general pool of the 128 MACs).
• Each VF/PF can ask for up to 128 MACs on a first-asks-first-served policy (meaning that, except for the 2 granted MACs, the other MACs in the pool are free to be asked for).
To check if there are free MACs for its interface (PF or VF), run:
    cat /sys/class/net/<ethX>/fdb_det
Example:
    cat /sys/class/net/eth2/fdb_det
    device eth2: max: 112, used: 2, free macs: 110
To add a new MAC to the interface:
    echo +<MAC> > /sys/class/net/eth<X>/fdb
Once run
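The MAC sharing policy can be modeled roughly as follows. This is a simplified Python sketch, not the driver's implementation: it assumes the 2 granted MACs of every VF are reserved up front and that all other requests are served first-come, first-served from the remainder of the 128-MAC pool. MacPool and its methods are hypothetical names, and the model is not meant to reproduce the exact fdb_det figures.

```python
from typing import Dict


class MacPool:
    """Toy model of one port's FDB MAC pool shared by the PF and its VFs."""

    def __init__(self, size: int = 128, num_vfs: int = 0, granted_per_vf: int = 2):
        if granted_per_vf * num_vfs > size:
            raise ValueError("not enough MACs for the granted reservations")
        # Assumption: granted MACs are reserved from the pool at creation time.
        self.shared_free = size - granted_per_vf * num_vfs
        self.granted_left: Dict[str, int] = {f"vf{i}": granted_per_vf for i in range(num_vfs)}
        self.used: Dict[str, int] = {}
        self.max_per_function = size  # each VF/PF may ask for up to 128 MACs

    def add_mac(self, func: str) -> bool:
        """Try to allocate one MAC for `func` ("pf" or "vfN"); False if none is free."""
        if self.used.get(func, 0) >= self.max_per_function:
            return False
        if self.granted_left.get(func, 0) > 0:
            self.granted_left[func] -= 1  # consume one of the VF's granted MACs
        elif self.shared_free > 0:
            self.shared_free -= 1         # first-asks-first-served from the pool
        else:
            return False                  # the driver reports a lack of MACs here
        self.used[func] = self.used.get(func, 0) + 1
        return True


pool = MacPool(num_vfs=8)
print(pool.shared_free)  # 112
```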
ig -d /dev/mst/mt4115_pciconf0 set SRIOV_EN=1 NUM_OF_VFS=16 FPP_EN=1
Step 3. Either reset the firmware (mlxfwreset) or reboot.
Step 4. Write to the sysfs file the number of Virtual Functions you need to create for the PF. You can use one of the following files:
• A standard Linux kernel generated file that is available in the new kernels:
    echo [num_vfs] > /sys/class/net/ethX/device/sriov_numvfs
• A file generated by the mlx5_core driver with the same functionality as the kernel-generated one, used by old kernels that do not have the standard file:
    echo [num_vfs] > /sys/class/net/ethX/device/mlx5_num_vfs
The following rules apply when writing to these files:
• If there are no VFs assigned, the number of VFs can be changed to any valid value (0 to the maximum number of VFs set during FW burning).
• If there are VFs assigned to a VM, it is not possible to change the number of VFs.
• If the administrator unloads the PF driver while there are no VFs assigned, the driver will unload and SR-IOV will be disabled.
• If there are VFs assigned while the PF driver is unloaded, SR-IOV is not disabled. This means the VFs will be visible on the VM; however, they will not be operational.
  - The VF driver will discover this situation and will close its resources.
  - When the driver on the PF is reloaded, the VF becomes operational. The administrator of the VF will need to restart the driver in order to resume working with the VF.
Step 5. Lo
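The rules for writing the VF count can be expressed as a pre-check before touching sysfs. In this Python sketch, can_set_num_vfs, numvfs_path and set_num_vfs are hypothetical helpers; the caller is assumed to know how many VFs are currently assigned to VMs and the maximum burned into the firmware, since neither is read from the system here. Writing the file requires root.

```python
from pathlib import Path


def can_set_num_vfs(requested: int, max_vfs: int, assigned_vfs: int) -> bool:
    """Apply the rules above to a requested VF count."""
    if assigned_vfs > 0:
        # VFs are assigned to a VM: the number of VFs cannot be changed.
        return False
    # No VFs assigned: any value from 0 up to the firmware-configured maximum.
    return 0 <= requested <= max_vfs


def numvfs_path(ifname: str, legacy: bool = False) -> Path:
    """Standard kernel file, or the mlx5_core-generated file on old kernels."""
    name = "mlx5_num_vfs" if legacy else "sriov_numvfs"
    return Path("/sys/class/net") / ifname / "device" / name


def set_num_vfs(ifname: str, requested: int, max_vfs: int,
                assigned_vfs: int, legacy: bool = False) -> None:
    """Equivalent of: echo <requested> > /sys/class/net/<ifname>/device/<file>."""
    if not can_set_num_vfs(requested, max_vfs, assigned_vfs):
        raise ValueError("requested VF count is not allowed in the current state")
    numvfs_path(ifname, legacy).write_text(f"{requested}\n")


print(can_set_num_vfs(8, max_vfs=16, assigned_vfs=0))  # True
print(numvfs_path("eth2"))
```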
immediately. rx-usecs will be enforced only when adaptive moderation is disabled.
Note: usec settings correspond to the time to wait after the last packet is sent/received before triggering an interrupt.
Table 7 - ethtool Supported Options
Options / Description
ethtool -K eth<x> [rx on|off] [tx on|off] [sg on|off] [tso on|off] [lro on|off] [gro on|off] [gso on|off] [rxvlan on|off] [txvlan on|off] [ntuple on|off] [rxhash on|off] [rx-all on|off] [rx-fcs on|off]
    Sets the stateless offload status.
    TCP Segmentation Offload (TSO) and Generic Segmentation Offload (GSO) increase outbound throughput by reducing CPU overhead. They work by queuing up large buffers and letting the network interface card split them into separate packets.
    Large Receive Offload (LRO) increases inbound throughput of high-bandwidth network connections by reducing CPU overhead. It works by aggregating multiple incoming packets from a single stream into a larger buffer before they are passed higher up the networking stack, thus reducing the number of packets that have to be processed. LRO is available in kernel versions 3.1 for untagged traffic.
    Hardware VLAN insertion Offload (txvlan): when enabled, the sent VLAN tag will be inserted into the packet by the hardware.
    Note: LRO will be done whenever possible. Otherwise GRO will be done. Generic Receive
ion 3.3.2.1, "A0 Static Device Managed Flow Steering", on page 35
  • Section 3.4.1.8, "Virtual Guest Tagging (VGT)", on page 46
  • Section 3.5.1, "Reset Flow", on page 53, and its subsections
  • Section 3.10, "Explicit Congestion Notification (ECN)", on page 62
  • Section 4.1, "General Related Issues", on page 69
  • Section 4.2, "Ethernet Related Issues", on page 69
  • Section 4.3, "Performance Related Issues", on page 71
  • Section 4.4, "SR-IOV Related Issues", on page 72
Updated the following sections:
  • Section 3.3.1, "Enable/Disable Flow Steering", on page 33
  • Section 3.4.1.2, "Setting Up SR-IOV", on page 37
  • Section 3.7, "Ethtool", on page 54
2.2-1.0.1    May 2014    Added the following sections:
  • Section 3.4.1.6.3, "Mapping VFs to Ports using the mlnx_get_vfs.pl Tool", on page 49
  • Section 3.7, "Ethtool", on page 54
  • Section 3.9, "Quantized Congestion Control", on page 59
  • Section 3.10, "Power Management", on page 58
  • Section 3.11, "XOR RSS Hash Function", on page 64
Updated the following section:
  • Section 3.4.1.2, "Setting Up SR-IOV", on page 37
Removed the following sections:
  • Burning Firmware with SR-IOV
  • Performance
2.1-1.0.0    January 2014    Added Section 3.12, "Ethernet Performance Counters", on page 64
2.0-3.0.0    O
ion numbers to their pair of port type values (e.g. 0000:04:00.0-1,2;002b:1c:0b.a-1,1).
Valid port types: 1-ib, 2-eth, 3-auto, 4-N/A.
If only a single port is available, use the N/A port type for port2 (e.g. 1,4).
log maximum number of QPs per HCA (default: 19) (int)
log maximum number of SRQs per HCA (default: 16) (int)
log number of RDMARC buffers per QP (default: 4) (int)
log maximum number of CQs per HCA (default: 16) (int)
log maximum number of multicast groups per HCA (default: 13) (int)
log maximum number of memory protection table entries per HCA (default: 19) (int)
log_num_mtt: log maximum number of memory translation table segments per HCA (default: max(20, 2*MTTs for registering all of the host memory, limited to 30)) (int)
enable_qos: Enable Quality of Service support in the HCA (default: off) (bool)
internal_err_reset: Reset device on internal errors if non-zero (default is 1) (int)
1.2.1.2 mlx4_en Parameters
inline_thold: Threshold for using inline data (int). Default and max value is 104 bytes. Saves a PCI read operation transaction: packets smaller than the threshold size will be copied to the hw buffer directly.
udp_rss: Enable RSS for incoming UDP traffic (uint). On by default. Once disabled, no RSS for incoming UDP traffic will be done.
pfctx: Priority-based Flow Control policy on TX[7:0]. Per priority bit mask (uint).
pfcrx: Priority base
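pfctx and pfcrx take a per-priority bit mask. Assuming bit i of the mask enables PFC for priority i (consistent with the [7:0] notation, but not spelled out above), the value for a set of priorities can be computed as in this Python sketch; pfc_bitmask is a hypothetical helper.

```python
from typing import Iterable


def pfc_bitmask(priorities: Iterable[int]) -> int:
    """Build the per-priority bit mask for pfctx/pfcrx (assumed: bit i = priority i)."""
    mask = 0
    for prio in priorities:
        if not 0 <= prio <= 7:
            raise ValueError(f"priority must be 0-7, got {prio}")
        mask |= 1 << prio
    return mask


mask = pfc_bitmask([3, 4])
print(mask, hex(mask))  # 24 0x18
print(f"options mlx4_en pfctx={mask} pfcrx={mask}")
```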
ng them from being deliverable to a higher-layer protocol.
rx_dropped: Number of received packets which were chosen to be discarded even though no errors had been detected, to prevent their being deliverable to a higher-layer protocol.
rx_length_errors: Number of received frames that were dropped due to an error in frame length.
rx_over_errors: Number of received frames that were dropped due to hardware port receive buffer overflow.
rx_crc_errors: Number of received frames with a bad CRC that are not runts, jabbers, or alignment errors.
rx_jabbers: Number of received frames with a length greater than MTU octets and a bad CRC.
rx_in_range_length_error: Number of received frames with a length/type field value in the (decimal) range [1500:46] (42 is also counted for VLAN-tagged frames).
rx_out_range_length_error: Number of received frames with a length/type field value in the (decimal) range [1535:1501].
tx_packets: Total packets successfully transmitted.
tx_bytes: Total bytes in successfully transmitted packets.
tx_multicast_packets: Total multicast packets successfully transmitted.
tx_broadcast_packets: Total broadcast packets successfully transmitted.
tx_errors: Number of frames that failed to transmit.
tx_dropped: Number of transmitted frames that were dropped.
rx_prio_<i>_packets: Total packets successfully received with priority i.
p
ng fields:
    [HCA]
    num_pfs = 1
    total_vfs = <0-126>
    sriov_en = true
Parameter / Recommended Value
num_pfs: 1. Note: This field is optional and might not always appear.
total_vfs: When using firmware version 2.31.5000 and above, the recommended value is 126. When using firmware version 2.30.8000 and below, the recommended value is 63.
    Note: Before setting the number of VFs in SR-IOV, please make sure your system can support that number of VFs. Setting a number of VFs larger than what your hardware and software can support may cause your system to cease working.
sriov_en: true
2. Add the above fields to the INI if they are missing.
3. Set the total_vfs parameter to the desired number if you need to change the number of total VFs.
4. Reburn the firmware using the mlxburn tool if the fields above were added to the INI, or if the total_vfs parameter was modified.
[1] If SR-IOV is supported, to enable SR-IOV (if it is not enabled), it is sufficient to set "sriov_en = true" in the INI.
[2] If the HCA does not support SR-IOV, please contact Mellanox Support: support@mellanox.com
If mlxburn is not installed, please download it from the Mellanox website: http://www.mellanox.com > Products > Firmware Tools
    mlxburn -fw fw-ConnectX3-rel.mlx -dev /dev/mst/mt4099_pci_cr0 -conf MCX341A-XCG_Ax.ini
Step 3. Create th
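The total_vfs recommendation depends on the firmware version, and version strings must be compared numerically (compared as strings, "2.9" would sort after "2.31"). A minimal Python sketch with hypothetical helper names; versions between 2.30.8000 and 2.31.5000 are not covered by the recommendation above, so the sketch returns None for them.

```python
from typing import Optional, Tuple


def parse_fw_version(version: str) -> Tuple[int, ...]:
    """Turn '2.31.5000' into (2, 31, 5000) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def recommended_total_vfs(fw_version: str) -> Optional[int]:
    """Recommended total_vfs INI value for a given firmware version."""
    v = parse_fw_version(fw_version)
    if v >= (2, 31, 5000):
        return 126
    if v <= (2, 30, 8000):
        return 63
    return None  # not covered by the recommendation


print(recommended_total_vfs("2.34.5000"))  # 126
```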
ning the command above, the interface (VF/PF) verifies whether a free MAC exists. If there is a free MAC, the VF/PF takes it from the global pool and allocates it. If there is no free MAC, an error is returned notifying the user of the lack of MACs in the pool.
To delete a MAC from the interface:
    echo -<MAC> > /sys/class/net/eth<X>/fdb
If /sys/class/net/eth<X>/fdb does not exist, use the bridge tool from the iproute2 package, which includes the tool to manage FDB tables, as the kernel supports FDB callbacks:
    bridge fdb add 00:01:02:03:04:05 permanent self dev p3p1
    bridge fdb del 00:01:02:03:04:05 permanent self dev p3p1
    bridge fdb show dev p3p1
If adding a new MAC from the kernel's NDO function fails due to insufficient MACs in the pool, the following error flow will occur:
• If the interface is a PF, it will automatically enter promiscuous mode.
• If the interface is a VF, it will try to enter promiscuous mode; since it does not support it, the action will fail and an error will be printed in the kernel's log.
3.4.2 VXLAN Hardware Stateless Offloads
VXLAN technology, introduced for solving scalability and security challenges, requires extension of the traditional stateless offloads to avoid a performance drop. The ConnectX-3 Pro adapter card offers the following stateless offloads for a VXLAN packet, similar to the ones offered to
non-encapsulated packets. The VXLAN protocol encapsulates its packets using an outer UDP header.
Available hardware stateless offloads:
• Checksum generation (Inner IP and Inner TCP/UDP)
• Checksum validation (Inner IP and Inner TCP/UDP). This allows the use of GRO for inner TCP packets.
• TSO support for inner TCP packets
• RSS distribution according to inner packet attributes
• Receive queue selection: inner frames may be steered to specific QPs
3.4.2.1 Prerequisites
• HCA: ConnectX-3 Pro
• Firmware 2.32.5100 or higher
• RHEL7, Ubuntu 14.04, or upstream kernel 3.12.10 (or higher)
• DMFS enabled
• A0 static mode disabled
3.4.2.2 Enabling VXLAN Hardware Stateless Offloads
To enable the VXLAN offloads support, load the mlx4_core driver with Device-Managed Flow Steering (DMFS) enabled. DMFS is the default steering mode.
To verify it is enabled by the adapter card:
Step 1. Open the /etc/modprobe.d/mlnx.conf file.
Step 2. Set the parameter debug_level to 1:
    options mlx4_core debug_level=1
Step 3. Restart the driver.
Step 4. Verify in the dmesg that the tunneling mode is: vxlan.
The net device will advertise the tx-udp_tnl-segmentation flag (shown when running "ethtool -k $DEV | grep udp") only when VXLAN is configured in Open vSwitch (OVS) with the configured UDP port. For example:
    $ ethtool -k eth0 | grep udp_tnl
    tx-udp_tnl-segmentation: on
As of firmware version 2.31.5050, the VXLAN tunnel can be set on any desir
69. ns Options version Show program s version number and exit h help Show this help message and exit i INTF interface INTF Interface name g TYPE get type TYPE Type of information to get statistics param eters Mellanox Technologies 59 J 3 1 1 0 4 Feature Overview and Configuration rpg_enable RPG ENABLE LIST Set value of rpg enable according to prior ity use spaces between values and 1 for unknown values rppp max rps RPPP MAX RPS LIST Set value of rppp max rps according to prior ity use spaces between values and 1 for unknown values rpg time reset RPG TIME RESET LIST Set value of rpg time reset according to pri ority use spaces between values and 1 for unknown values rpg byte reset RPG BYTE RESET LIST Set value of rpg byte reset according to pri ority use spaces between values and 1 for unknown values rpg threshold RPG THRESHOLD LIST Set value of rpg threshold according to pri ority use spaces between values and 1 for unknown values rpg max rate RPG MAX RATE LIST Set value of rpg max rate according to prior ity use spaces between values and 1 for unknown values rpg ai rate RPG AI RATE LIST Set value of rpg ai rate according to prior ity use spaces between values and 1 for unknown values rpg hai rate RPG HAI RATE LIST Set value of rpg hai rate according to prior ity use spaces between values and 1 for unknown valu
nstances in ConnectX-4/Connect-IB adapter cards. These virtual functions can then be provisioned separately. Each VF can be seen as an additional device connected to the Physical Function. It shares the same resources with the Physical Function, and its number of ports equals that of the Physical Function.
SR-IOV is commonly used in conjunction with an SR-IOV enabled hypervisor to provide virtual machines direct hardware access to network resources, hence increasing their performance.
In this chapter we demonstrate the setup and configuration of SR-IOV in a Red Hat Linux environment using the Mellanox ConnectX VPI adapter card family.
3.4.1.1 System Requirements
To set up an SR-IOV environment, the following is required:
• MLNX_EN Driver
• A server/blade with an SR-IOV-capable motherboard BIOS
• A hypervisor that supports SR-IOV, such as Red Hat Enterprise Linux Server Version 6
• A Mellanox ConnectX VPI adapter card family with SR-IOV capability
3.4.1.2 Setting Up SR-IOV
Depending on your system, perform the steps below to set up your BIOS. The figures used in this section are for illustration purposes only. For further information, please refer to the appropriate BIOS User Manual.
Step 1. Enable "SR-IOV" in the system BIOS.
[Figure: BIOS SETUP UTILITY]
Step 2. Enable "Intel Virtualization Technology".
Step 3. Install a hypervisor that supports SR-IOV.
Step 4. Depending on your system
ompile the driver sources.
    make
Step 4. Install the driver kernel modules.
    make install
2.7 Updating Firmware After Installation
The firmware can be updated using one of the following methods.
2.7.1 Updating the Device Online
To update the device online on the machine from the Mellanox site, use the following command line:
    mlxfwmanager --online -u -d <device>
Example:
    mlxfwmanager --online -u -d 0000:09:00.0
    Querying Mellanox devices firmware ...
    Device #1:
      Device Type:      ConnectX3
      Part Number:      MCX354A-FCA_A2-A4
      Description:      ConnectX-3 VPI adapter card; dual-port QSFP; FDR IB (56Gb/s) and 40GigE; PCIe3.0 x8 8GT/s; RoHS R6
      PSID:             MT_1020120019
      PCI Device Name:  0000:09:00.0
      Port1 GUID:       0002c90001004051
      Port2 MAC:        0002c9000002
      Versions:         Current        Available
         FW             2.33.5000      2.34.5000
      Status:           Update required
    Found 1 device(s) requiring firmware update. Please use -u flag to perform the update.
2.7.2 Updating the Device Manually
In case you ran the install script with the "--without-fw-update" option, or you are using an OEM card and now wish to manually update the firmware on your adapter card(s), you need to perform the steps below. The following steps are also appropriate in case you wish to burn newer firmware that you have downloaded from the Mellanox Technologies Web site (http://www.mellanox.com > Support > Firmware Download).
Step 1. Get the device's PSID.
    mlxfwman
otocol Interconnect (VPI): A Mellanox Technologies technology that allows Mellanox channel adapter devices (ConnectX) to simultaneously connect to an InfiniBand subnet and a 10GigE subnet (each subnet connects to one of the adapter ports).
Related Documentation
Table 4 - Reference Documents
Document Name: IEEE Std 802.3ae-2002 (Amendment to IEEE Std 802.3-2002), Document # PDF: 5594996
Description: Part 3: Carrier Sense Multiple Access with Collision Detection (CSMA/CD) Access Method and Physical Layer Specifications. Amendment: Media Access Control (MAC) Parameters, Physical Layers, and Management Parameters for 10 Gb/s Operation
Support and Updates Webpage
Please visit http://www.mellanox.com > Products > Software > Ethernet Drivers > Linux Drivers for downloads, FAQ, troubleshooting, future updates to this manual, etc.
1 Overview
This document provides information on the MLNX_EN Linux driver and instructions for installing the driver on Mellanox ConnectX adapter cards supporting:
• ConnectX-4
  - Ethernet: 10GigE, 25GigE, 40GigE, 50GigE and 100GigE
• ConnectX-4 Lx
  - Ethernet: 10GigE, 25GigE, 40GigE and 100GigE
• ConnectX-3/ConnectX-3 Pro
  - Ethernet: 10GigE, 40GigE and 56GigE
• PCI Express 2.0: 2.5 or 5.0 GT/s
• PCI Express 3.0: 8 GT/s
The MLNX_EN driver release exposes the following capabilities
pe ether dst 00:11:22:33:44:55 loc 5 action 2
    All packets that contain the above destination MAC address are to be steered into rx-ring 2 (its underlying QP), with priority 5 (within the ethtool domain).
  • ethtool -U eth5 flow-type tcp4 src-ip 1.2.3.4 dst-port 8888 loc 5 action 2
    All packets that contain the above source IP address and destination port are to be steered into rx-ring 2. When the destination MAC is not given, the user's destination MAC is filled automatically.
  • ethtool -u eth5
    Shows all of ethtool's steering rules.
  When configuring two rules with the same priority, the second rule will overwrite the first one, so this ethtool interface is effectively a table. Inserting Flow Steering rules in the kernel requires support from both the ethtool in user space and the kernel (v2.6.28).
  MLX4 Driver Support
  The mlx4 driver supports only a subset of the flow specification the ethtool API defines. Asking for an unsupported flow specification will result in an invalid value failure.
  The following are the flow-specific parameters:
  Table 6 - Flow Specific Parameters
                 ether    tcp4/udp4                                   ip4
    Mandatory    dst      src-ip/dst-ip                               src-ip/dst-ip
    Optional     vlan     src-ip, dst-ip, src-port, dst-port, vlan    src-ip, dst-ip, vlan
• RFS
  RFS is an in-kernel logic responsible for load balancing between CPUs by attaching flows to the CPUs used by the flows' owner applications. This domain allows the RFS mechanism to use the flow ste
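The "effectively a table" behavior can be illustrated with a toy model. This Python sketch is not how the driver stores rules; it only shows that inserting at an existing loc replaces the previous rule. It also assumes that, within the ethtool domain, a lower loc value takes precedence, mirroring the rule stated earlier for domains. Rule and EthtoolRuleTable are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Rule:
    dst_mac: str
    action: int  # rx ring index (action -1 would mean drop)


class EthtoolRuleTable:
    """Toy model of ethtool steering rules indexed by location (priority)."""

    def __init__(self) -> None:
        self.rules: Dict[int, Rule] = {}

    def insert(self, loc: int, rule: Rule) -> None:
        # Same loc -> the new rule overwrites the old one, like a table slot.
        self.rules[loc] = rule

    def lookup(self, dst_mac: str) -> Optional[int]:
        # Assumption: a lower loc value means a higher priority.
        for loc in sorted(self.rules):
            if self.rules[loc].dst_mac == dst_mac:
                return self.rules[loc].action
        return None


table = EthtoolRuleTable()
table.insert(5, Rule("00:11:22:33:44:55", 2))
table.insert(5, Rule("00:11:22:33:44:55", 3))  # replaces the rule at loc 5
print(len(table.rules), table.lookup("00:11:22:33:44:55"))  # 1 3
```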
porting (AER)
• AER, a mechanism used by the driver to get notifications upon PCI errors, is supported only in native mode. ULPs are called with remove_one/add_one and are expected to continue working properly after that flow.
• User space applications will work in the same mode as defined in the Reset Flow above.
3.5.1.5 Extended Error Handling (EEH)
Extended Error Handling (EEH) is a PowerPC mechanism that encapsulates AER, thus exposing AER events to the operating system as EEH events.
The behavior of ULPs and user space applications is identical to the behavior of AER.
3.6 Ignore Frame Check Sequence (FCS) Errors
Supported in ConnectX-3 Pro and ConnectX-4 only.
Upon receiving packets, the packets go through a checksum validation process for the FCS field. If the validation fails, the received packets are dropped.
When FCS is enabled (disabled by default), the device does not validate the FCS field, even if the field is invalid.
It is not recommended to enable FCS.
For further information on how to enable/disable FCS, please refer to Table 7, "ethtool Supported Options", on page 55.
3.7 Ethtool
ethtool is a standard Linux utility for controlling network drivers and hardware, particularly for wired Ethernet devices. It can be used to:
• Get identification and diagnostic information
• Get extended device statistics
• Control speed, duplex, autonegotiation and flow control for Ethernet devices
• Control checksum offload and other
ptions mlx4_en <param_name>=<value> <param_name>=<value> ...
Values of all parameters can be observed in /sys/module/mlx4_en/parameters/.
2.9 Performance Tuning
For further information on Linux performance, please refer to the Performance Tuning Guide for Mellanox Network Adapters.
3 Feature Overview and Configuration
3.1 Quality of Service
Quality of Service (QoS) is a mechanism of assigning a priority to a network flow (socket, rdma_cm connection) and managing its guarantees, limitations, and priority over other flows. This is accomplished by mapping the user's priority to a hardware TC (traffic class) through a 2- or 3-stage process. The TC is assigned the QoS attributes, and the different flows behave accordingly.
3.1.1 Mapping Traffic to Traffic Classes
Mapping traffic to TCs consists of several actions which are user-controllable, some controlled by the application itself and others by the system/network administrators.
The following is the general flow for mapping traffic to Traffic Classes:
1. The application sets the required Type of Service (ToS).
2. The ToS is translated into a Socket Priority (sk_prio).
3. The sk_prio is mapped to a User Priority (UP) by the system administrator (some applications set sk_prio directly).
4. The UP is mapped to a TC by the network/system administrator.
5. TCs hold the actual QoS parameters.
QoS can be applied on the following types
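The mapping chain above can be traced with a short Python sketch. All three tables are hypothetical example values: the ToS-to-sk_prio step is performed by the kernel, and the sk_prio-to-UP and UP-to-TC maps are whatever the system and network administrators configure. traffic_class is an illustrative helper only.

```python
from typing import Optional

# Hypothetical example values, for illustration only.
TOS_TO_SK_PRIO = {0x00: 0, 0x08: 2, 0x10: 6}                   # step 2 (done by the kernel)
SK_PRIO_TO_UP = {0: 0, 2: 2, 4: 4, 6: 6}                       # step 3 (system administrator)
UP_TO_TC = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}    # step 4 (network administrator)


def traffic_class(tos: int, sk_prio: Optional[int] = None) -> int:
    """Follow steps 1-4 of the flow above and return the TC index."""
    if sk_prio is None:
        # Step 2: ToS -> socket priority (some applications set sk_prio directly).
        sk_prio = TOS_TO_SK_PRIO.get(tos, 0)
    up = SK_PRIO_TO_UP.get(sk_prio, 0)  # step 3: sk_prio -> User Priority
    return UP_TO_TC[up]                 # step 4: UP -> TC; the TC holds the QoS parameters


print(traffic_class(0x10))  # 3
```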
rates as a VPI adapter.
The mlx5 driver is comprised of the following kernel module:
• mlx5_core
  Acts as a library of common functions (e.g., initializing the device after reset) required by the ConnectX-4 adapter cards. The mlx5_core driver also implements the Ethernet interfaces for ConnectX-4. Unlike mlx4_en/mlx4_core, the mlx5 drivers do not require an mlx5_en module, as the Ethernet functionalities are built into the mlx5_core module.
1.2 Module Parameters
1.2.1 mlx4 Module Parameters
In order to set mlx4 parameters, add the following line(s) to /etc/modprobe.conf:
    options mlx4_core parameter=<value>
and/or
    options mlx4_en parameter=<value>
The following sections list the available mlx4 parameters.
1.2.1.1 mlx4_core Parameters
set_4k_mtu: (Obsolete) attempt to set 4K MTU to all ConnectX ports (int)
debug_level: Enable debug tracing if > 0 (int)
msi_x: 0 - don't use MSI-X; >1 - limit number of MSI-X irqs to msi_x (non-SRIOV only) (int)
enable_sys_tune: Tune the CPUs for better performance (default 0) (int)
block_loopback: Block multicast loopb
num_vfs
probe_vf
log_num_mgm_entry_size
high_rate_steer
fast_drop
enable_64b_cqe_eqe
log_num_mac
log_num_vlan
log_mtts_per_seg
port_type_array
log_num_qp
log_num_srq
log_rdmarc_per_qp
log_num_cq
log_num_mcg
log_num_mpt
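Option lines in the format shown above can be generated programmatically, for example when provisioning many hosts. modprobe_options is a hypothetical Python helper; the example reproduces the mlx4_core options line used in the SR-IOV setup steps.

```python
from typing import Dict, Union


def modprobe_options(module: str, params: Dict[str, Union[int, str]]) -> str:
    """Format an 'options <module> name=value ...' line for /etc/modprobe.conf."""
    if not params:
        raise ValueError("no parameters given")
    pairs = " ".join(f"{name}={value}" for name, value in params.items())
    return f"options {module} {pairs}"


line = modprobe_options("mlx4_core", {"num_vfs": 5, "port_type_array": "1,2", "probe_vf": 1})
print(line)  # options mlx4_core num_vfs=5 port_type_array=1,2 probe_vf=1
```

Values such as port_type_array=1,2 are passed through unchanged, so any comma- or colon-separated syntax must already be correct in the input.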
...rx_prio_<i>_bytes: Total bytes in successfully received packets with priority i.
rx_novlan_packets: Total packets successfully received with no VLAN priority.
rx_novlan_bytes: Total bytes in successfully received packets with no VLAN priority.
tx_prio_<i>_packets: Total packets successfully transmitted with priority i.
tx_prio_<i>_bytes: Total bytes in successfully transmitted packets with priority i.
tx_novlan_packets: Total packets successfully transmitted with no VLAN priority.
tx_novlan_bytes: Total bytes in successfully transmitted packets with no VLAN priority.
rx_pause: The total number of PAUSE frames received from the far-end port.
rx_pause_duration: The total time in microseconds that the far-end port was requested to pause transmission of packets.
rx_pause_transition: The number of receiver transitions from XON state (non-paused) to XOFF state (paused).
tx_pause: The total number of PAUSE frames sent to the far-end port.
tx_pause_duration: The total time in microseconds that transmission of packets has been paused.
tx_pause_transition: The number of transmitter transitions from XON state (non-paused) to XOFF state (paused).
vport_rx_unicast_packets: Unicast packets received successfully.
vport_rx_unicast...
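The pause-duration counters are cumulative and expressed in microseconds. The share of time a port spent paused can therefore be estimated from two ethtool -S snapshots taken a known interval apart. This is a minimal sketch. It assumes the counters appear in ethtool -S output as tx_pause_duration / rx_pause_duration in the usual "name: value" layout. The function names are hypothetical.

```python
from __future__ import annotations


def parse_ethtool_stats(text: str) -> dict[str, int]:
    """Parse 'ethtool -S' output ('    name: value' lines) into a dict."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value:
            continue  # header ("NIC statistics:") or blank line
        try:
            stats[name.strip()] = int(value)
        except ValueError:
            continue  # skip non-numeric fields
    return stats


def paused_fraction(before: dict[str, int], after: dict[str, int],
                    interval_s: float, counter: str = "tx_pause_duration") -> float:
    """Share of the sampling interval spent paused, computed from a cumulative
    microsecond counter sampled at the start and end of the interval."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    delta_us = after[counter] - before[counter]
    return delta_us / (interval_s * 1_000_000)
```

In practice, each snapshot's text would come from subprocess.run(["ethtool", "-S", "eth0"], capture_output=True, text=True).stdout, with eth0 standing in for the actual interface.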
...driver implementation for the ConnectX-4 adapters designed by Mellanox Technologies. ConnectX-4 operates as a VPI adapter.
mlx5_core: Acts as a library of common functions (e.g. initializing the device after reset) required by the ConnectX-4 adapter cards.
mlx4 driver: mlx4 is the low-level driver implementation for the ConnectX adapters designed by Mellanox Technologies. The ConnectX can operate as an InfiniBand adapter and as an Ethernet NIC. To accommodate the two flavors, the driver is split into modules: mlx4_core, mlx4_en, and mlx4_ib.
Note: mlx4_ib is not part of this package.
mlx4_core: Handles low-level functions like device initialization and firmware commands processing. Also controls resource allocation so that the InfiniBand, Ethernet and FC functions can share a device without interfering with each other.
mlx4_en: Handles Ethernet specific functions and plugs into the netdev mid-layer.
mstflint: An application to burn a firmware binary image.
Software modules: Source code for all software modules (for use under conditions mentioned in the modules' LICENSE files).
Documentation: Release Notes, User Manual.
For further information, please refer to Section 1.1.5, "mlx4 VPI Driver", on page 15 and Section 1.1.6, "mlx5 Driver", on page 15.
Firmware
The tarball image includes the following firmware items:
Firmware images (.bin format) for ConnectX-2, ConnectX-3, ConnectX-3 Pro, ConnectX...
...sed.
Socket applications can use setsockopt(SO_PRIORITY, value) to directly set the sk_prio of the socket. In this case, the ToS to sk_prio fixed mapping is not needed. This allows the application and the administrator to utilize more than the 4 values possible via ToS.
In the case of a VLAN interface, the UP obtained according to the above mapping is also used in the VLAN tag of the traffic.

3.1.3 Map Priorities with tc_wrap.py/mlnx_qos
A network flow that can be managed by QoS attributes is described by a User Priority (UP). A user's sk_prio is mapped to a UP, which in turn is mapped into a TC.
Indicating the UP
When the user uses sk_prio, it is mapped into a UP by the 'tc' tool. This is done by the tc_wrap.py tool, which gets a list of up to 16 comma-separated UPs and maps each sk_prio to the specified UP.
For example:
tc_wrap.py -i eth0 -u 1,5
maps sk_prio 0 of the eth0 device to UP 1, and sk_prio 1 to UP 5.
Setting set_egress_map in VLAN maps the skb_priority of the VLAN to a vlan_qos. The vlan_qos represents a UP for the VLAN device.
rdma_set_option with RDMA_OPTION_ID_TOS could be used to set the UP.
When creating QPs, the sl field in the modify QP command represents the UP.
Indicating the TC
After mapping the skb_priority to a UP, one should map the UP into a TC. This assigns the user priority to a specific hardware traffic class. In order to do that, mlnx_qos should be used.
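The two mapping stages can be modeled as two list lookups. This sketch makes two assumptions, suggested by the examples in this section. First, the i-th entry of the tc_wrap.py -u list is the UP for sk_prio i. Second, the i-th entry of the mlnx_qos -p list is the TC for UP i. How sk_prio values beyond the -u list are treated is not stated here, so the sketch simply rejects them. All names are hypothetical.

```python
from __future__ import annotations

MAX_UP = 7  # 802.1p user priorities are 3-bit values (0-7)


def parse_priority_list(csv: str, max_len: int) -> list[int]:
    """Parse a comma-separated list such as '1,5' or '0,0,0,0,1,1,1,1'."""
    values = [int(item) for item in csv.split(",")]
    if len(values) > max_len:
        raise ValueError(f"expected at most {max_len} entries")
    return values


def sk_prio_to_up_tc(sk_prio: int, prio_to_up: list[int],
                     up_to_tc: list[int]) -> tuple[int, int]:
    """Return (UP, TC) for a socket priority.

    prio_to_up[i] is the UP for sk_prio i (tc_wrap.py -u style list);
    up_to_tc[u] is the TC for UP u (mlnx_qos -p style list).
    """
    if not 0 <= sk_prio < len(prio_to_up):
        raise KeyError(f"sk_prio {sk_prio} is not covered by the UP list")
    up = prio_to_up[sk_prio]
    if not 0 <= up <= MAX_UP:
        raise ValueError(f"UP {up} is outside 0-{MAX_UP}")
    if up >= len(up_to_tc):
        raise KeyError(f"UP {up} has no TC assigned")
    return up, up_to_tc[up]


if __name__ == "__main__":
    ups = parse_priority_list("1,5", max_len=16)
    tcs = parse_priority_list("0,0,0,0,1,1,1,1", max_len=MAX_UP + 1)
    for prio in range(len(ups)):
        up, tc = sk_prio_to_up_tc(prio, ups, tcs)
        print(f"sk_prio {prio} -> UP {up} -> TC {tc}")
```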
...it disables the support of IPoIB Flow Steering. This bit should be set to 1 when "b2 - Enable A0 static DMFS steering" is used (see Section 3.3.2.1.0, "Static Device Managed Flow Steering", on page 35).
b2 - Enable A0 static DMFS steering: When set to 1, A0 static DMFS steering is enabled. This bit should be set to 0 when "b1 - Disable IPoIB Flow Steering" is 0 (see Section 3.3.2.1.0, "Static Device Managed Flow Steering", on page 35).
b3 - Enable DMFS only if the HCA supports more than 64 QPs per MCG entry: When set to 1, DMFS is enabled only if the HCA supports more than 64 QPs attached to the same rule. For example, attaching 64 VFs to the same multicast address causes 64 QPs to be attached to the same MCG. If the HCA supports less than 64 QPs per MCG, B0 is used.
b4 - Optimize IPoIB/EoIB steering table for non source IP rules when possible: When set to 1, the IPoIB/EoIB steering table will be optimized to support rules ignoring the source IP check. This optimization is available only when IPoIB Flow Steering is set.
b5 - Optimize steering table for non source IP rules when possible: When set to 1, the steering table will be optimized to support rules ignoring the source IP check. This optimization is possible only when DMFS mode is set.
For example, a value of 7 means forcing...
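Because the value is a bitmask, a given setting can be decoded bit by bit. For example, 7 (binary 111) sets b0, b1 and b2. The sketch below decodes a value against the bit descriptions in the table and checks the documented b1/b2 dependency: b1 must be set whenever b2 is. It is an illustration only. The function names are hypothetical, and b0 is reported by number only.

```python
from __future__ import annotations

# Bit descriptions from the steering table above.
STEERING_BITS = {
    1: "Disable IPoIB Flow Steering",
    2: "Enable A0 static DMFS steering",
    3: "Enable DMFS only if the HCA supports more than 64 QPs per MCG entry",
    4: "Optimize IPoIB/EoIB steering table for non source IP rules",
    5: "Optimize steering table for non source IP rules",
}


def set_bits(value: int) -> list[int]:
    """Return the indices of the bits set in a non-negative bitmask."""
    if value < 0:
        raise ValueError("expected a non-negative bitmask")
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def describe(value: int) -> list[str]:
    """Label each set bit with its table description, if one is listed."""
    return [f"b{bit}: {STEERING_BITS.get(bit, '(not listed above)')}"
            for bit in set_bits(value)]


def check_dependencies(value: int) -> list[str]:
    """Report violations of the b1/b2 rule stated in the table."""
    bits = set(set_bits(value))
    problems = []
    if 2 in bits and 1 not in bits:
        problems.append("b2 (A0 static DMFS) requires b1 (Disable IPoIB Flow Steering)")
    return problems
```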
...value is 1.

3.5.1.1 Kernel ULPs
Once a fatal device error(1) is recognized, an EVENT_DEVICE_FATAL event is created and ULPs are notified about the incident. Outstanding WQEs are simulated to be returned with a "flush in error" message. This enables each ULP to close its resources and not get stuck, via calling its "remove_one" callback as part of the "Reset Flow".
Once the unload part is terminated, each ULP is called with its "add_one" callback. Its resources are re-initialized and it is re-activated.

3.5.1.2 SR-IOV
If the Physical Function recognizes the error, it notifies all the VFs about it by marking their communication channel with that information. Consequently, all the VFs and the PF are reset.
If a VF encounters an error, only that VF is reset, whereas the PF and other VFs continue to work unaffected.

3.5.1.3 Forcing the VF to Reset
If an outside "reset" is forced by using the PCI sysfs entry for a VF, a reset is executed on that VF once it runs any command over its communication channel.
For example, the below command can be used on a hypervisor to reset a VF defined by 0000:04:00.1:
echo 1 > /sys/bus/pci/devices/0000:04:00.1/reset

(1) A fatal device error can be a timeout from a firmware command, an error on a firmware closing command, a communication channel not being responsive in a VF, etc.

3.5.1.4 Advanced Error Re...
...In normal conditions, it should not increase.
rx_alloc_failed: Number of times preparing a receive descriptor failed.
rx_csum_good: Number of packets received with good checksum.
rx_csum_none: Number of packets received with no checksum indication.
tx_chksum_offload: Number of packets transmitted with checksum offload.
tx_queue_stopped: Number of times the transmit queue was suspended.
tx_wake_queue: Number of times the transmit queue was resumed.
tx_timeout: Number of times the transmitter timed out.
xmit_more: Number of times the doorbell was not triggered due to skb xmit_more.
tx_tso_packets: Number of packets that were aggregated.
rx<i>_packets: Total packets successfully received on ring i.
rx<i>_bytes: Total bytes in successfully received packets on ring i.
tx<i>_packets: Total packets successfully transmitted on ring i.
tx<i>_bytes: Total bytes in successfully transmitted packets on ring i.
a. Pause statistics can be divided into per-priority counters (<i>), depending on the PFC configuration set.

3.13 RSS Support for IP Fragments
Supported in ConnectX-3 and ConnectX-3 Pro only.
As of MLNX_EN for Linux v2.4-1.0.0, RSS will distribute incoming IP fragmented datagrams according to its hash function, considering the L3 IP header values. Different IP fragmented datagram flows will be...
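Non-first IP fragments carry no L4 header. An L3-only hash over the source and destination addresses is therefore what keeps all fragments of one datagram on the same receive ring. The sketch below illustrates this with the Toeplitz hash commonly used for RSS, together with the well-known default key from Microsoft's RSS specification. The resulting hash selects a ring through an indirection table filled round-robin. Several details are assumptions the manual does not state: the hash function, the key, and the 128-entry table size. The round-robin fill is also an assumption about how ethtool -X eth<x> equal N populates the table.

```python
from __future__ import annotations

import ipaddress

# Default 40-byte key from Microsoft's RSS specification (assumption:
# the key actually programmed into the adapter may differ).
RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0"
    "d0ca2bcbae7b30b477cb2da38030f20c"
    "6a42b73bbeac01fa"
)


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash: for every set input bit, XOR in the 32-bit
    window of the key that starts at that bit position."""
    key_bits = len(key) * 8
    if len(data) * 8 + 32 > key_bits:
        raise ValueError("key is too short for this input")
    key_int = int.from_bytes(key, "big")
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def l3_hash(src: str, dst: str, key: bytes = RSS_KEY) -> int:
    """Hash over the IPv4 source, then destination address only (no ports)."""
    data = ipaddress.IPv4Address(src).packed + ipaddress.IPv4Address(dst).packed
    return toeplitz_hash(key, data)


def equal_indirection_table(num_rings: int, size: int = 128) -> list[int]:
    """Spread table entries round-robin over the first num_rings rings."""
    return [i % num_rings for i in range(size)]


def rx_ring_for(src: str, dst: str, table: list[int]) -> int:
    """Pick the receive ring: the hash modulo the table size indexes the table."""
    return table[l3_hash(src, dst) % len(table)]
```

With this model, every fragment exchanged between the same pair of addresses resolves to the same ring, whatever its fragment offset.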
Table 7 - ethtool Supported Options
ethtool -s eth<x> advertise <N> autoneg on
    Changes the advertised link modes to the requested link modes <N>.
    To check the link modes' hex values, run man ethtool. To check the supported link modes, run ethtool eth<x>.
    NOTE: "autoneg on" only sends a hint to the driver that the user wants to modify advertised link modes and not speed.
ethtool -X eth<x> equal a b c...
    Sets the receive flow hash indirection table.
ethtool -x eth<x>
    Retrieves the receive flow hash indirection table.

3.8 Checksum Offload
MLNX_EN supports the following Receive IP/L4 Checksum Offload modes:
• CHECKSUM_UNNECESSARY: By setting this mode, the driver indicates to the Linux Networking Stack that the hardware successfully validated the IP and L4 checksum, so the Linux Networking Stack does not need to deal with IP/L4 checksum validation.
  Checksum Unnecessary is passed to the OS when all of the following are true:
  • ethtool -k <DEV> shows rx-checksumming: on
  • A TCP/UDP packet was received, and both the IP checksum and the L4 protocol checksum are correct.
ConnectX-3/ConnectX-3 Pro:
• CHECKSUM_COMPLETE: When the checksum validation cannot be done or fails, the driver still reports to the OS the calculated by hardware...
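Both offload modes concern the RFC 1071 Internet checksum. With CHECKSUM_UNNECESSARY, the stack skips this computation entirely. With CHECKSUM_COMPLETE, it can reuse the hardware-calculated sum instead of recomputing it over the payload. As an illustration (not the driver's code), the sketch below computes and validates the checksum of an IPv4 header. A header whose checksum field is correct yields a recomputed checksum of zero.

```python
def ones_complement_sum16(data: bytes) -> int:
    """16-bit ones' complement sum (RFC 1071), folding carries as it goes."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def internet_checksum(data: bytes) -> int:
    """Checksum value to place in the header (complement of the sum)."""
    return ~ones_complement_sum16(data) & 0xFFFF


def ipv4_header_is_valid(header: bytes) -> bool:
    """A header with a correct checksum field sums to 0xFFFF, so the
    checksum recomputed over the whole header is zero."""
    return internet_checksum(header) == 0
```

For the sample header 4500 0073 0000 4000 4011 0000 c0a8 0001 c0a8 00c7 (checksum field zeroed), internet_checksum returns 0xB861. With 0xB861 written into bytes 10-11, ipv4_header_is_valid returns True.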
...Virtual Machine
This section describes a mechanism for adding an SR-IOV VF to a Virtual Machine.

3.4.1.4.1 Assigning the SR-IOV Virtual Function to the Red Hat KVM VM Server
Step 1. Run the virt-manager.
Step 2. Double-click on the virtual machine and open its Properties.
Step 3. Go to Details > Add hardware > PCI host device.
[Figure: virt-manager "Adding Virtual Hardware" dialog, listing the hardware types Storage, Network, Input, Physical Host Device, Video and Watchdog]
Step 4. Choose a Mellanox virtual function according to its PCI device (e.g., 00:03.1).
Step 5. If the Virtual Machine is up, reboot it; otherwise, start it.
Step 6. Log into the virtual machine and verify that it recognizes the Mellanox card. Run:
lspci | grep Mellanox
00:03.0 InfiniBand: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] (rev b0)
Step 7. Add the device to the /etc/sysconfig/network-scripts/ifcfg-ethX configuration file. The MAC address for every virtual function is configured randomly; therefore, it is not necessary to add it.

3.4.1.5 Uninstalling SR-IOV Driver
To uninstall the SR-IOV driver, perform the following:
Step 1. For Hypervisors, detach all the Virtual Functions (VF) from all the Virtual Machines (VM), or stop the Virtual Machines that use the Virtual Functions.
Please be aware that stopping the driver while there are VMs using the VFs will cause the machine to hang.
Step 2. Run the script below. Please be aware that uninstalling the driver deletes the entire driver's files, but does not unload the driver.
/sbin/mlnx_en_uninstall.sh
MLNX_EN uninstall done
Step 3. Restart the server.

3.4.1.6 Ethernet Virtual Function Configuration when Running SR-IOV

3.4.1.6.1 VLAN Guest Tagging (VGT) and VLAN Switch Tagging (VST)
When running ETH ports on VFs, the ports may be configured in one of two ways. They may simply pass through packets as-is from VFs (VLAN Guest Tagging). Alternatively, the administrator may configure the Hypervisor to silently force packets to be associated with a VLAN/QoS (VLAN Switch Tagging).
In the latter case, untagged or priority-tagged outgoing packets from the guest will have the VLAN tag inserted, and incoming packets will have the VLAN tag removed. Any VLAN-tagged packets sent by the VF are silently dropped. The default behavior is VGT.
The feature may be controlled on the Hypervisor from userspace via iproute2/netlink:
ip link set { dev DEVICE | group DEVGROUP } [ { up | down } ]
    [ vf NUM [ mac LLADDR ] [ vlan VLANID [ qos VLAN-QOS ] ]
    [ spoofchk { on | off } ] [ state { auto | enable | disable } ] ]
Use...
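To configure VST for a VF, the administrator sets a VLAN, and optionally a QoS value, on that VF from the Hypervisor using the syntax above. The sketch below builds the argument list for such a command. The device name eth2 and the helper name are hypothetical, and running the command requires root on the Hypervisor. The range checks reflect the 802.1Q tag: the VLAN ID is a 12-bit field and the priority (QoS) field is 3 bits.

```python
from __future__ import annotations


def vf_link_cmd(dev: str, vf: int, mac: str | None = None,
                vlan: int | None = None, qos: int | None = None,
                spoofchk: bool | None = None, state: str | None = None) -> list[str]:
    """Build 'ip link set dev DEV vf NUM ...' following the syntax above."""
    cmd = ["ip", "link", "set", "dev", dev, "vf", str(vf)]
    if mac is not None:
        cmd += ["mac", mac]
    if vlan is not None:
        if not 0 <= vlan <= 4095:
            raise ValueError("VLAN ID is a 12-bit value (0-4095)")
        cmd += ["vlan", str(vlan)]
        if qos is not None:
            if not 0 <= qos <= 7:
                raise ValueError("VLAN QoS is a 3-bit value (0-7)")
            cmd += ["qos", str(qos)]
    elif qos is not None:
        raise ValueError("qos is only valid together with vlan")
    if spoofchk is not None:
        cmd += ["spoofchk", "on" if spoofchk else "off"]
    if state is not None:
        if state not in ("auto", "enable", "disable"):
            raise ValueError("state must be auto, enable or disable")
        cmd += ["state", state]
    return cmd
```

The resulting list can be passed to subprocess.run(vf_link_cmd("eth2", 0, vlan=100, qos=3), check=True) on the Hypervisor.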
...driver message level.
ethtool -T eth<x>
    Note: Supported in ConnectX-3/ConnectX-3 Pro cards only.
    Shows time stamping capabilities.
ethtool -l eth<x>
    Shows the number of channels.
ethtool -L eth<x> [rx <N>] [tx <N>]
    Sets the number of channels.
    Note: For ConnectX-4 cards, use ethtool -L eth<x> combined <N> to set both RX and TX channels.
ethtool -m|--dump-module-eeprom eth<x> [ raw on|off ] [ hex on|off ] [ offset N ] [ length N ]
    Queries/Decodes the cable module EEPROM information.
ethtool --show-priv-flags eth<x>
    Shows driver private flags and their states (ON/OFF).
    The private flag is:
    • qcn_disable_32_14_4_e
    The flags below indicate the current flow steering configuration and limits:
    • mlx4_flow_steering_ethernet_l2
    • mlx4_flow_steering_ipv4
    • mlx4_flow_steering_tcp
    For further information, refer to the Flow Steering section.
    The flags below are related to Ignore Frame Check Sequence, and they are active when ethtool -k does not support them:
    • rx-fcs
    • rx-all
ethtool --set-priv-flags eth<x> <priv flag> <on/off>
    Enables/disables the driver feature matching the given private flag.
ethtool -s eth<x> speed <SPEED> autoneg off
    Changes the link speed to the requested <SPEED>. To check the supported speeds, run ethtool eth<x>.
    NOTE: "autoneg off" does not set autoneg OFF; it only hints the driver to set a specific speed.
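ConnectX-4 cards take a single combined channel count, while other cards take separate rx and tx counts. A script that sets channels therefore has to branch on the card type. In the minimal sketch below, the connectx4 flag and the helper name are hypothetical. The supported maximums should be checked first with ethtool -l eth<x>.

```python
from __future__ import annotations


def set_channels_cmd(iface: str, count: int, connectx4: bool) -> list[str]:
    """Build the ethtool -L command for the given card family."""
    if count < 1:
        raise ValueError("channel count must be at least 1")
    if connectx4:
        # ConnectX-4: one combined count sets both RX and TX channels.
        return ["ethtool", "-L", iface, "combined", str(count)]
    return ["ethtool", "-L", iface, "rx", str(count), "tx", str(count)]
```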
mlnx_qos gets a list of mappings from UPs to TCs. For example:
mlnx_qos -i eth0 -p 0,0,0,0,1,1,1,1
maps UPs 0-3 to TC0 and UPs 4-7 to TC1.

3.1.4 Quality of Service Properties
The different QoS properties that can be assigned to a TC are:
• Strict Priority (see "Strict Priority")
• Minimal Bandwidth Guarantee (ETS) (see "Minimal Bandwidth Guarantee (ETS)")
• Rate Limit (see "Rate Limit")

3.1.4.1 Strict Priority
When a TC's transmission algorithm is set to 'strict', this TC has absolute (strict) priority over the other strict-priority TCs coming before it, as determined by the TC number: TC 7 is the highest priority, TC 0 is the lowest. It also has absolute priority over non-strict TCs (ETS).
This property needs to be used with care, as it may easily cause starvation of other TCs.
A higher strict priority TC is always given the first chance to transmit. Only if the highest strict priority TC has nothing more to transmit will the next highest TC be considered.
Non-strict priority TCs will be considered last to transmit.
This property is extremely useful for low-latency, low-bandwidth traffic. Such traffic needs immediate service when it exists, but its volume is not high enough to starve other transmitters in the system.

3.1.4.2 Minimal Bandwidth Guarantee (ETS)
After servicing the strict priority TCs, the amount of bandwidth (BW) left on the wire may be split among other TCs according...
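The interaction between strict-priority and ETS TCs can be approximated with a single allocation pass. This is a simplified, hypothetical model, not the hardware scheduler. Strict TCs are served from the highest TC number down. The remaining bandwidth is then split among ETS TCs in proportion to assumed percentage shares. Share that a TC leaves unused is not redistributed.

```python
from __future__ import annotations


def allocate_bandwidth(link: float, demand: dict[int, float], strict: set[int],
                       ets_share: dict[int, float]) -> dict[int, float]:
    """Simplified model of one allocation pass over TCs 0-7."""
    if strict & set(ets_share):
        raise ValueError("a TC cannot be both strict and ETS")
    alloc: dict[int, float] = {}
    left = link
    # Strict TCs first, highest TC number first (TC 7 is highest priority).
    for tc in sorted(strict, reverse=True):
        alloc[tc] = min(demand.get(tc, 0.0), left)
        left -= alloc[tc]
    # ETS TCs split what is left in proportion to their shares.
    total_share = sum(ets_share.values())
    for tc, share in ets_share.items():
        fair = left * share / total_share if total_share else 0.0
        alloc[tc] = min(demand.get(tc, 0.0), fair)
    return alloc
```

In this model, a strict TC 7 that demands the whole link leaves nothing for the ETS TCs. That is the starvation risk described above.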
