
Mellanox WinOF-2 User Manual


Contents

1. Note: This registry key is not exposed to the user via the UI.

Mellanox Technologies, Rev 1.10

Value Name: LSOIpOptions
Default Value: 1
Description: Enables the NIC to segment a large TCP packet whose IP header contains IP options. The valid values are:
• 0: disable
• 1: enable
Note: This registry key is not exposed to the user via the UI.

Value Name: IPChecksumOffloadIPv4
Default Value: 3
Description: Specifies whether the device performs the calculation of IPv4 checksums. The valid values are:
• 0: disable
• 1: Tx Enable
• 2: Rx Enable
• 3: Tx and Rx enable

Value Name: TCPUDPChecksumOffloadIPv4
Default Value: 3
Description: Specifies whether the device performs the calculation of TCP or UDP checksum over IPv4. The valid values are:
• 0: disable
• 1: Tx Enable
• 2: Rx Enable
• 3: Tx and Rx enable

Value Name: TCPUDPChecksumOffloadIPv6
Default Value: 3
Description: Specifies whether the device performs the calculation of TCP or UDP checksum over IPv6. The valid values are:
• 0: disable
• 1: Tx Enable
• 2: Rx Enable
• 3: Tx and Rx enable

3.3.4 Performance Registry Keys
This group of registry keys configures parameters that can improve adapter performance.

Value Name: RecvCompletionMethod
Default Value: 1
Description: Sets the completion methods of the receive packets, and it affects network throughput and CPU utilization. The supported methods are: Polling increases the CPU utilizati
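The values 0 to 3 listed for the checksum offload keys read naturally as a two-bit mask: bit 0 for transmit, bit 1 for receive. The Python sketch below decodes a value under that reading. The bitmask interpretation is an assumption drawn from the value list, and the names `OffloadState` and `decode_checksum_offload` are hypothetical helpers, not part of the driver or of any Windows API.

```python
from typing import NamedTuple


class OffloadState(NamedTuple):
    """Which directions have checksum offload enabled."""
    tx: bool
    rx: bool


def decode_checksum_offload(value: int) -> OffloadState:
    """Decode a checksum offload registry value (0..3).

    Assumption: bit 0 means Tx offload, bit 1 means Rx offload,
    which matches the table (0 disable, 1 Tx, 2 Rx, 3 Tx and Rx).
    """
    if value not in (0, 1, 2, 3):
        raise ValueError(f"unsupported checksum offload value: {value}")
    return OffloadState(tx=bool(value & 1), rx=bool(value & 2))


if __name__ == "__main__":
    for v in range(4):
        print(v, decode_checksum_offload(v))
```

The same helper would apply to the IPv4, TCP/UDP over IPv4, and TCP/UDP over IPv6 keys, since the manual lists identical value sets for all three.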
2. Issue: Low performance.
Cause: Non-optimal system configuration might have occurred.
Solution: See section "Performance Tuning and Counters" on page 50 to take advantage of Mellanox 10/40/56 GBit NIC performance.

Table 14: Ethernet Related Issues

Issue: The driver fails to start.
Cause: There might have been an RSS configuration mismatch between the TCP stack and the Mellanox adapter.
Solution:
1. Open the event log and look under "System" for the "mlx4ethx" source.
2. If found, enable RSS. Run:
   netsh int tcp set global rss=enabled
   or, as a less recommended suggestion (it will cause low performance), disable RSS on the adapter. Run:
   netsh int tcp set global rss=no dynamic balancing

Issue: The driver fails to start and a yellow sign appears near the "Mellanox ConnectX 10Gb Ethernet Adapter" in the Device Manager display (Code 10).
Cause: A hardware error might have occurred.
Solution: Disable and re-enable "Mellanox ConnectX Adapter" from the Device Manager display. In case it does not work, refer to support.

Issue: No connectivity to a Fault Tolerance team while using network capture tools (e.g., Wireshark).
Cause: The network capture tool might have captured the network traffic of the non-active adapter in the team. This is not allowed, since the tool sets the packet filter to "promiscuous", thus causing traffic to be transferred on mu
3. Common Abbreviations and Acronyms
Related Documents
Chapter 1 Introduction
1.1 Supplied Packages
1.2 WinOF-2 Set of Documentation
1.3 Windows MPI (MS-MPI)
Chapter 2 Installation
2.1 Hardware and Software Requirements
2.2 Installing Mellanox WinOF-2 Driver
2.2.1 Attended Installation
2.2.2 Unattended Installation
2.3 Installation Results
2.4 Extracting Files Without Running Installation
2.5 Uninstalling Mellanox WinOF-2 Driver
2.5.1 Attended Uninstallation
2.5.2 Unattended Uninstallation
2.6 Firmware Upgrade
Chapter 3 Features Overview and Configuration
3.1 Ethernet Network
3.1.1 Assigning Port IP After Installation
3.1.2 RDMA over Converged Ethernet (RoCE)
3.1.3 Teaming and VLAN
3.1.4 Configuring
4. 5.2 Ethernet Related Troubleshooting
5.3 Performance Related Troubleshooting
5.3.1 General Diagnostics
5.4 Reported Driver Events
Appendix A Performance Tools
A.1 nd_write_bw
A.2 nd_write_lat
A.3 nd_read_bw
A.4 nd_read_lat
A.5 nd_send_bw
A.6 nd_send_lat
A.7 NTttcp
Appendix B Windows MPI (MS-MPI)
B.1 Overview
B.2 System Requirements
B.3 Running MPI
B.4 Directing MSMPI Traffic
B.5 Running MSMPI on the Desired Priority
B.6 Configuring MPI
B.7 PFC Example
B.8 Running MPI Command Examples

List of Tables
Table 1 Document Revision History
Table 2 Documentation Conventions
Table 3 Abbreviations and Acronyms
5. [Screenshot: InstallShield "Network Location" dialog]

Step 5. Click Install to extract this folder, or click Change to install to a different folder.
[Screenshot: InstallShield "Network Location" dialog]

Step 6. To complete the extraction, click Finish.
[Screenshot: "InstallShield Wizard Completed" dialog]

2.5 Uninstalling Mellanox WinOF-2 Driver

2.5.1 Attended Uninstallation
To uninstall MLNX_WinOF2 on a single node:
Click Start > Control Panel > Programs and Features > MLNX_WinOF2 > Uninstall.
(NOTE: This requires elevated administrator privileges. See Section 1.1, "Supplied Packages", on page 11 for details.)

2.5.2 Unattended Uninstallation
If no reboot options are specified, the installer restarts the computer whenever necessary without displaying any pr
6. Document Revision, Date, Changes:

Rev 1.10, July 8, 2015: Updated the following sections:
• Section 1, "Introduction", on page 11
• Section 3.1.2.1, "IP Routable (RoCEv2)", on page 26
• Section 3.1.2.6, "Configuring the RoCE Mode", on page 31

Rev 1.10, June 2015: Beta Release

About this Manual

Scope
Mellanox WinOF-2 is the driver for adapter cards based on the Mellanox ConnectX-4 family of adapter IC devices. It does not support earlier Mellanox adapter generations.
The document describes WinOF-2 Rev 1.10 features, performance, diagnostic tools, content and configuration. Additionally, this document provides information on various performance tools supplied with this version.

Intended Audience
This manual is intended for system administrators responsible for the installation, configuration, management and maintenance of the software and hardware of Ethernet adapter cards. It is also intended for application developers.

Documentation Conventions

Table 2: Documentation Conventions
Description, Convention, Example:
• File names: file.extension
• Directory names: directory
• Commands and their parameters: command param1. Example: mts3610-1 > show hosts
• Required item: < >
• Optional item: [ ]
• Mutually exclusive parameters: { p1, p2, p3 } or { p1 | p2 | p3 }
• Optional mutually exclusive parameters: [ p1 | p2 | p3 ]
7. Table 4 Related Documents
Table 5 Hardware and Software Requirements
Table 6 Registry Key Parameters
Table 7 Registry Keys Setting
Table 8 RDMA Activity
Table 9 Fabric Performance Utilities
Table 10 Installation Related Issues
Table 11 Setup Return Codes
Table 12 Firmware Burning Warning Codes
Table 13 Restore Configuration Warnings
Table 14 Ethernet Related Issues
Table 15 Performance Related Issues
Table 16 nd_write_bw Flags and Options
Table 17 nd_write_lat Options
Table 18 nd_read_bw Options
Table 19 nd_read_lat Options
Table 20 nd_send_bw Flags and Options
Table 21 nd_send_lat Options
Table 22 NTttcp Options

Document Revision History
Table 1: Document Revision History
8. • TCP/UDP Checksum Offload for IPv4 packets: Enables the adapter to compute the TCP/UDP checksum over IPv4 packets upon transmit and/or receive, instead of the CPU (default Enabled).
• TCP/UDP Checksum Offload for IPv6 packets: Enables the adapter to compute the TCP/UDP checksum over IPv6 packets upon transmit and/or receive, instead of the CPU (default Enabled).
• Large Send Offload (LSO): Allows the TCP stack to build a TCP message up to 64KB long and send it in one call down the stack. The adapter then re-segments the message into multiple TCP packets for transmission on the wire, with each packet sized according to the MTU. This option offloads a large amount of kernel processing time from the host CPU to the adapter.

3.4.4 Adapter Proprietary Performance Counters
Proprietary Performance Counters are used to provide information on Operating System, application, service or the driver's performance. Counters can be used for different system debugging purposes, help to determine system bottlenecks and fine-tune system and application performance. The Operating System, network, and devices provide counter data that the application can consume to provide users with a graphical view of the system's performance quality. WinOF counters hold the standard Windows CounterSet API that includes:
• Network Interface
• RDMA activity
• SMB Direct Connection

3.4.4.0.1 RDMA Activity
RDMA Activity counter set cons
9. Create a Quality of Service (QoS) policy and tag each type of traffic with the relevant priority.
In this example, TCP/UDP use priority 1, SMB over TCP use priority 3.
    PS $ New-NetQosPolicy "DEFAULT" -store Activestore -Default -PriorityValue8021Action 3
    PS $ New-NetQosPolicy "TCP" -store Activestore -IPProtocolMatchCondition TCP -PriorityValue8021Action 1
    PS $ New-NetQosPolicy "UDP" -store Activestore -IPProtocolMatchCondition UDP -PriorityValue8021Action 1
    New-NetQosPolicy "SMB" -SMB -PriorityValue8021Action 3

Step 5. Create a QoS policy for SMB over SMB Direct traffic on Network Direct port 445.
    PS $ New-NetQosPolicy "SMBDirect" -store Activestore -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

Step 6. [Optional] If VLANs are used, mark the egress traffic with the relevant VlanID.
The NIC is referred to as "Ethernet 4" in the examples below.
    PS $ Set-NetAdapterAdvancedProperty -Name "Ethernet 4" -RegistryKeyword "VlanID" -RegistryValue "55"

Step 7. [Optional] Configure the IP address for the NIC.
If DHCP is used, the IP address will be assigned automatically.
    PS $ Set-NetIPInterface -InterfaceAlias "Ethernet 4" -DHCP Disabled
    PS $ Remove-NetIPAddress -InterfaceAlias "Ethernet 4" -AddressFamily IPv4 -Confirm:$false
    PS $ New-NetIPAddress -InterfaceAlias "Ethernet 4" -IPAddress 192.168.1.10 -PrefixLength 24 -Type Unicast

Step 8. [Optional] Set the DNS serve
10. [Screenshot: Group Policy editor, PowerShell startup scripts dialog. The dialog notes that PowerShell scripts require at least Windows 7 or Windows Server 2008 R2.]

5. Click Add.
The script should include only the following commands:
    PS $ Remove-NetQosTrafficClass
    PS $ Remove-NetQosPolicy -Confirm:$False
    PS $ Set-NetQosDcbxSetting -Willing 0
    PS $ New-NetQosPolicy "SMB" -PolicyStore Activestore -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
    PS $ New-NetQosPolicy "DEFAULT" -PolicyStore Activestore -Default -PriorityValue8021Action 3
    PS $ New-NetQosPolicy "TCP" -PolicyStore Activestore -IPProtocolMatchCondition TCP -PriorityValue8021Action 1
    PS $ New-NetQosPolicy "UDP" -PolicyStore Activestore -IPProtocolMatchCondition UDP -PriorityValue8021Action 1
    PS $ Disable-NetQosFlowControl 0,1,2,4,5,6,7
    PS $ Enable-NetAdapterQos -InterfaceAlias "port1"
    PS $ Enable-NetAdapterQos -InterfaceAlias "port2"
    PS $ Enable-NetQosFlowControl -Priority 3
    PS $ New-NetQosTrafficClass -Name "SMB class" -Priority 3 -BandwidthPercentage 50 -Algorithm ETS
6. Browse for the script's location.
7. Click OK.
8. To confirm the settings applied after boot, run:
    PS $ Get-NetQosPolicy -PolicyStore activestore

3.1.5 Configuring the Ethernet Driver
The following steps describe how to configure advanced featur
11. • Variables for which users supply specific values: Italic font. Example: enable
• Emphasized words: Italic font. Example: These are emphasized words
• Note: <text>. Example: This is a note.
• Warning: <text>. Example: May result in system instability.

Common Abbreviations and Acronyms

Table 3: Abbreviations and Acronyms
Abbreviation/Acronym: Whole Word/Description
• B: (Capital) 'B' is used to indicate size in bytes or multiples of bytes (e.g., 1KB = 1024 bytes, and 1MB = 1048576 bytes)
• b: (Small) 'b' is used to indicate size in bits or multiples of bits (e.g., 1Kb = 1024 bits)
• FW: Firmware
• HCA: Host Channel Adapter
• HW: Hardware
• IB: InfiniBand
• LSB: Least significant byte
• lsb: Least significant bit
• MSB: Most significant byte
• msb: Most significant bit
• NIC: Network Interface Card
• NVGRE: Network Virtualization using Generic Routing Encapsulation
• SW: Software
• VPI: Virtual Protocol Interconnect
• IPoIB: IP over InfiniBand
• PFC: Priority Flow Control
• PR: Path Record
• RDS: Reliable Datagram Sockets
• RoCE: RDMA over Converged Ethernet
• SL: Service Level
• MPI: Message Passing Interface
• QoS: Quality of Service

Related Documents

Table 4: Related Documents
Document: Description
• MFT User Manual: Describes the set of firmware management tools for a single Inf
12. [Figure 1: Installation Results. Device Manager screenshot showing the Mellanox ConnectX-4 VPI Adapter entries under Network adapters]

2.4 Extracting Files Without Running Installation
To extract the files without running installation, perform the following steps.

Step 1. Open a CMD console [Windows Server 2012 R2]: Click Start > Task Manager > File > Run new task, and enter CMD.

Step 2. Extract the driver and the tools:
    > MLNX_WinOF2-1_10_All_x64 /a
• To extract only the driver files:
    > MLNX_WinOF2-1_10_All_x64 /a /vMT_DRIVERS_ONLY=1

Step 3. Click Next to create a server image.
[Screenshot: InstallShield Wizard welcome dialog for MLNX_VPI]

Step 4. Click Change and specify the location to which the files are extracted.
[Screenshot: InstallShield "Network Location" dialog]
13. Configuring SwitchX Based Switch System
To enable RoCE, the SwitchX should be configured as follows:
• Ports facing the host should be configured as access ports, and either use global pause or Priority Code Point (PCP) for priority flow control.
• Ports facing the network should be configured as trunk ports, and use Priority Code Point (PCP) for priority flow control.
For further information on how to configure SwitchX, please refer to the SwitchX User Manual.

3.1.2.4 Configuring Arista Switch

Step 1. Set the ports that face the hosts as trunk.
    (config)# interface et10
    (config-if-Et10)# switchport mode trunk

Step 2. Set the VID allowed on the trunk port to match the host VID.
    (config-if-Et10)# switchport trunk allowed vlan 100

Step 3. Set the ports that face the network as trunk.
    (config)# interface et20
    (config-if-Et20)# switchport mode trunk

Step 4. Assign the relevant ports to LAG.
    (config)# interface et10
    (config-if-Et10)# dcbx mode ieee
    (config-if-Et10)# speed forced 40gfull
    (config-if-Et10)# channel-group 11 mode active

Step 5. Enable PFC on ports that face the network.
    (config)# interface et20
    (config-if-Et20)# load-interval 5
    (config-if-Et20)# speed forced 40gfull
    (config-if-Et20)# switchport trunk native vlan tag
    (config-if-Et20)# switchport trunk allowed vlan 11
    (config-if-Et20)# switchport mode trunk
    (config-if-Et20)# dcbx mode ieee
    (config-if-Et20)#
14. Control mechanism, the adapters can overcome any TCP/IP issues and eliminate the risk of data loss.

Value Name: FlowControl
Default Value: 0
Description: When Rx Pause is enabled, the receiving adapter generates a flow control frame when its receive queue reaches a pre-defined limit. The flow control frame is sent to the sending adapter. When Tx Pause is enabled, the sending adapter pauses the transmission if it receives a flow control frame from a link partner. The valid values are:
• 0: Flow control is disabled
• 1: Tx Flow control is enabled
• 2: Rx Flow control is enabled
• 3: Rx & Tx Flow control is enabled

3.3.5.2 VMQ Options
This section describes the registry keys that are used to control the NDIS Virtual Machine Queue (VMQ). VMQ is supported by WinOF-2 and allows a performance boost for Hyper-V VMs.
For more details about VMQ, please refer to the Microsoft web site:
http://msdn.microsoft.com/en-us/library/windows/hardware/ff571034(v=vs.85).aspx

Value Name: VMQ
Default Value: 1
Description: The support for the virtual machine queue (VMQ) features of the network adapter. The valid values are:
• 1: enable
• 0: disable

Value Name: RssOrVmqPreference
Default Value: 0
Description: Specifies whether VMQ capabilities should be enabled instead of receive-side scaling (RSS) capabilities. The valid values are:
• 0: Report RSS capabili
15. -D10 -S 11.137.53.1
Client side:
    start /b /wait /affinity 0X1 nd_write_bw -s1048576 -D10 -C 11.137.53.1

nd_write_bw Options
The table below lists the various flags of the command.

Table 16: nd_write_bw Flags and Options
Flag: Description
• -h: Shows the Help screen.
• -V: Shows the version number.
• -p: Connects to the port <port> <default 6830>.
• -s <msg size>: Exchanges the message size with <default 65536B>, and it must not be combined with the -a flag.
• -a: Runs all the message sizes from 1B to 8MB, and it must not be combined with the -s flag.
• -n <num of iterations>: The number of exchanges (at least 2, the default is 100000).
• -I <max inline size>: The maximum size of message to send inline. The default number is 128B.
• -D <test duration in seconds>: Test duration in seconds.
• -f <margin time in seconds>: The margin time to avoid calculation, and it must be less than half of the duration time.
• -Q: CQ-Moderation value. The default number is 100.
• -S <server interface IP>: Server side only; must be the last parameter.
• -C <server interface IP>: Client side only; must be the last parameter.

A.2 nd_write_lat
This test is used for performance measuring of RDMA-Write requests in Microsoft Windows Operating Systems. nd_write_lat is performance oriented for RDMA-Write with minimum latency, and runs over Microsoft's NetworkDirect stand
16. PFC. To use PFC, it must be enabled on all endpoints and switches in the flow path.
In the following section we present instructions to configure PFC on Mellanox ConnectX cards. There are multiple configuration steps required, all of which may be performed via PowerShell. Therefore, although we present each step individually, you may ultimately choose to write a PowerShell script to do them all in one step. Note that administrator privileges are required for these steps.
For further information about RoCE configuration, please refer to: https://community.mellanox.com/docs/DOC-1844

3.1.2.2.1 Configuring Windows Host
Note: Since PFC is responsible for flow controlling at the granularity of traffic priority, it is necessary to assign different priorities to different types of network traffic. As per RoCE configuration, all ND/NDK traffic is assigned to one or more chosen priorities, where PFC is enabled on those priorities.
Configuring the Windows host requires configuring QoS. To configure QoS, please follow the procedure described in Section 3.1.4, "Configuring Quality of Service (QoS)", on page 32.

3.1.2.2.1.1 Global Pause (Flow Control)
To use Global Pause (Flow Control) mode, disable QoS and Priority:
    PS $ Disable-NetQosFlowControl
    PS $ Disable-NetAdapterQos <interface name>
To confirm flow control is enabled in adapter parameters:
Device manager > Network adap
17. Quality of Service (QoS)
3.1.5 Configuring the Ethernet Driver
3.1.6 Receive Side Scaling (RSS)
3.2 Storage Protocols
3.2.1 Deploying SMB Direct
3.3 Configuration Using Registry Keys
3.3.1 Finding the Index Value of the Network Interface
3.3.2 Basic Registry Keys
3.3.3 Off-load Registry Keys
3.3.4 Performance Registry Keys
3.3.5 Ethernet Registry Keys
3.3.6 Network Direct Interface
3.4 Performance Tuning and Counters
3.4.1 General Performance Optimization and Tuning
3.4.2 Application Specific Optimization and Tuning
3.4.3 Tunable Performance Parameters
3.4.4 Adapter Proprietary Performance Counters
Chapter 4 Utilities
4.1 Fabric Performance Utilities
Chapter 5 Troubleshooting
5.1 Installation Related Troubleshooting
5.1.1 Installation Error Codes and Troubleshooting
18. coalesce many small messages into a large one. The valid MTU range for an Ethernet driver is between 614 and 9614.
Note: All devices on the same physical network, or on the same logical network, must have the same MTU.
• Receive Buffers: The number of receive buffers (default 1024).
• Send Buffers: The number of send buffers (default 2048).
• Performance Options: Configures parameters that can improve adapter performance.
• Interrupt Moderation: Moderates or delays the generation of interrupts, and hence optimizes network throughput and CPU utilization (default Enabled).
  When interrupt moderation is enabled, the system accumulates interrupts and sends a single interrupt rather than a series of interrupts. An interrupt is generated after receiving 5 packets or after 10ms from the first packet received. This improves performance and reduces CPU load; however, it increases latency.
  When interrupt moderation is disabled, the system generates an interrupt each time a packet is received or sent. In this mode, CPU utilization increases at higher data rates, as the system handles a larger number of interrupts. However, the latency decreases as each packet is handled faster.
• Receive Side Scaling (RSS Mode): Improves incoming packet processing performance. RSS enables the adapter port to utilize the multiple CPUs in a multi-core system for receiving incoming packets and steering them to the designated
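The interrupt moderation rule quoted above (one interrupt after 5 packets, or 10ms after the first pending packet) can be illustrated with a small simulation. This is a sketch of the rule as worded in the text, not of the driver's actual implementation; the function name `coalesce_interrupts` and its parameters are hypothetical.

```python
def coalesce_interrupts(arrivals_ms, max_packets=5, max_delay_ms=10.0):
    """Return the times (ms) at which a moderated interrupt would fire.

    arrivals_ms: packet arrival times in milliseconds, sorted ascending.
    An interrupt fires when max_packets packets are pending, or when
    max_delay_ms has elapsed since the first pending packet arrived.
    """
    interrupts = []
    pending = 0          # packets received since the last interrupt
    first = None         # arrival time of the first pending packet
    for t in arrivals_ms:
        # The delay timer may have expired before this packet arrived.
        if pending and t - first >= max_delay_ms:
            interrupts.append(first + max_delay_ms)
            pending = 0
        if pending == 0:
            first = t
        pending += 1
        # The packet-count threshold fires immediately on arrival.
        if pending == max_packets:
            interrupts.append(t)
            pending = 0
    if pending:
        # Remaining packets are flushed when the timer expires.
        interrupts.append(first + max_delay_ms)
    return interrupts


if __name__ == "__main__":
    # Twelve packets, one per millisecond.
    print(coalesce_interrupts(list(range(12))))
```

With moderation disabled, the same twelve packets would raise twelve interrupts; the sketch makes the trade-off concrete: fewer interrupts, but the last packets wait for the timer.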
19. data ports (if used, -p should be the same on every instance)
• -a <outstanding I/O> [default: 2]
• -X <PacketArray size> [default: 1]
• -rb <Receive buffer size> [default: 64K]
• -sb <Send buffer size> [default: 8K]
• -u: UDP send/recv
• -W: WSARecv/WSASend
• -d: Verifies Flag
• -t: Runtime in seconds
• -cd <Cool-down> in seconds
• -wu <Warm-up> in seconds
• -nic <NIC IP>: Use NIC with <NIC IP> for sending data (sender only)
• -m <mapping> [mapping]

Appendix B Windows MPI (MS-MPI)

B.1 Overview
Message Passing Interface (MPI) is meant to provide virtual topology, synchronization, and communication functionality between a set of processes. With MPI you can run one process on several hosts.
Windows MPI runs over the following protocols:
• Sockets (Ethernet)
• Network Direct (ND)

B.1.1 System Requirements
• Install HPC (Build: 4.0.3906.0).
• Validate traffic (ping) between all of the MPI hosts.
• Every MPI client needs to run the smpd process, which opens the MPI channel.
• The MPI Initiator Server needs to run mpiexec. If the initiator is also a client, it should also run smpd.

B.2 Running MPI
Step 1. Run the following command on each MPI client:
    start smpd -d -p <port>
Step 2. Install the ND provider on each MPI client in MPI ND.
Step 3. Run the following command on the MPI server:
    mpiexec.exe -p <smpd_port> -hosts <num_of_hosts> <hosts_ip_list> -env MPICH_N
20. debugging: right-click and choose Properties, and go to the Information tab:
• PCI Gen 2 should appear as "PCI-E 5.0 GT/s"
• PCI Gen 3 should appear as "PCI-E 8.0 GT/s"
• Link Speed: 56.0 Gbps / 40.0Gbps / 10.0Gbps

Issue 2. To determine if the Mellanox NIC and PCI bus can achieve their maximum speed, it's best to run nd_send_bw in a loopback. On the same machine:
1. Run: start /b /affinity 0x1 nd_send_bw -S 127.0.0.1
2. Run: start /b /affinity 0x2 nd_send_bw -C 127.0.0.1
3. Repeat for port 2 with the appropriate IP.
4. On PCI Gen3 the expected result is around 5700MB/s. On PCI Gen2 the expected result is around 3300MB/s.
Any number lower than that points to bad configuration or installation on the wrong PCI slot. Malfunctioning QoS settings and Flow Control can be the cause as well.

Issue 3. To determine the maximum speed between the two sides with the most basic test:
1. Run: nd_send_bw -S <IP_host1> on machine 1, where <IP_host1> is the local IP.
2. Run: nd_send_bw -C <IP_host1> on machine 2.
3. Results appear in MB/s (MegaBytes, 2^20) and reflect the actual data that was transferred, excluding headers.
4. If these results are not as expected, the problem is most probably with one or more of the following:
• Old Firmware version.
• Misconfigured Flow control: Global pause or PFC is configured wrong on the hosts, routers and switches. See Section 3.1.2, "RDMA over Converged Ethernet (RoCE)", on page 26.
• CPU powe
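Because the tools report MB/s with MB meaning 2^20 bytes, a reading has to be converted before it can be compared with a link speed quoted in Gbps. The Python sketch below does that conversion and flags a loopback result that falls clearly below the figures quoted above (around 5700MB/s on PCI Gen3, around 3300MB/s on PCI Gen2). The 5% tolerance and the helper names are assumptions made for illustration; the manual only says "around".

```python
MIB = 2 ** 20  # the tools report MB/s where 1 MB = 2^20 bytes

# Loopback figures quoted in the manual, in MB/s (2^20 bytes per second).
EXPECTED_MB_PER_S = {"gen2": 3300, "gen3": 5700}


def mb_per_s_to_gbit_per_s(mb_per_s: float) -> float:
    """Convert MB/s (2^20 bytes) to Gbit/s (10^9 bits)."""
    return mb_per_s * MIB * 8 / 1e9


def looks_misconfigured(measured_mb_per_s: float, pci_gen: str,
                        tolerance: float = 0.05) -> bool:
    """True if the result is more than `tolerance` below the expected figure.

    The tolerance is a hypothetical threshold, not a value from the manual.
    """
    expected = EXPECTED_MB_PER_S[pci_gen]
    return measured_mb_per_s < expected * (1.0 - tolerance)


if __name__ == "__main__":
    for gen, mbps in EXPECTED_MB_PER_S.items():
        print(gen, mbps, "MB/s =", round(mb_per_s_to_gbit_per_s(mbps), 2), "Gbit/s")
    print(looks_misconfigured(5000, "gen3"))
```

Under this conversion the Gen3 figure corresponds to roughly 47.8 Gbit/s of payload, which is why it is a useful sanity check against a 56 Gbps or 40 Gbps link.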
21. platform.
For additional details on Windows installer return codes, please refer to: http://support.microsoft.com/kb/229683

5.1.1.2 Firmware Burning Warning Codes

Table 12: Firmware Burning Warning Codes
Error Code, Description, Troubleshooting:
• 1004: Failed to open the device. Troubleshooting: Contact support.
• 1005: Could not find an image for at least one device. Troubleshooting: The firmware for your device was not found. Please try to manually burn the firmware.
• 1006: Found one device that has multiple images. Troubleshooting: Burn the firmware manually and select the image you want to burn.
• 1007: Found one device for which force update is required. Troubleshooting: Burn the firmware manually with the force flag.
• 1008: Found one device that has mixed versions. Troubleshooting: The firmware version or the expansion ROM version does not match.
For additional details, please refer to the MFT User Manual: http://www.mellanox.com > Products > Firmware Tools

5.1.1.3 Restore Configuration Warnings

Table 13: Restore Configuration Warnings
Error Code, Description, Troubleshooting:
• 3: Failed to restore the configuration. Troubleshooting: Please see the log for more details and contact the support team.

5.2 Ethernet Related Troubleshooting
For further performance related information, please refer to the Performance Tuning Guide and to Section 3.4, "Performance Tuning and Counters", on page 50.

Table 14: Ethernet Related Issues
22. the following PowerShell cmdlets to verify Network Direct is globally enabled and that you have NICs with the RDMA capability.
• Run on both the SMB server and the SMB client:
    PS $ Get-NetOffloadGlobalSetting | Select NetworkDirect
    PS $ Get-NetAdapterRDMA
    PS $ Get-NetAdapterHardwareInfo

3.2.1.1.2 Verifying SMB Configuration
Use the following PowerShell cmdlets to verify SMB Multichannel is enabled, confirm the adapters are recognized by SMB and that their RDMA capability is properly identified.
• On the SMB client, run the following PowerShell cmdlets:
    PS $ Get-SmbClientConfiguration | Select EnableMultichannel
    PS $ Get-SmbClientNetworkInterface
• On the SMB server, run the following PowerShell cmdlets (1):
    PS $ Get-SmbServerConfiguration | Select EnableMultichannel
    PS $ Get-SmbServerNetworkInterface
    PS $ netstat.exe -xan | ? {$_ -match "445"}

3.2.1.1.3 Verifying SMB Connection
To verify the SMB connection on the SMB client:
Step 1. Copy the large file to create a new session with the SMB Server.
Step 2. Open a PowerShell window while the copy is ongoing.
Step 3. Verify that SMB Direct is working properly and that the correct SMB dialect is used.
    PS $ Get-SmbConnection
    PS $ Get-SmbMultichannelConnection
    PS $ netstat.exe -xan | ? {$_ -match "445"}
Note: If you have no activity while you run the commands above, you might get an empty list due to session expiration and the absence of current connections.

1. The NETSTAT command confirms i
23. unattended installation session:

Step 1. Open a CMD console [Windows Server 2012 R2]: Click Start > Task Manager > File > Run new task, and enter CMD.

Step 2. Install the driver. Run:
    > MLNX_WinOF2-1_10_All_x64.exe /S /v"/qn"

Step 3. [Optional] Manually configure your setup to contain the logs option:
    > MLNX_WinOF2-1_10_All_x64.exe /S /v"/qn" /v"/l*vx [LogFile]"

Step 4. [Optional] If you want to control whether to install the ND provider or not (1):
    > MLNX_WinOF2-1_10_All_win2012_x64.exe /vMT_NDPROPERTY=1

Note: Applications that hold the driver files (such as ND applications) will be closed during the unattended installation.

1. MT_NDPROPERTY default value is True.

2.3 Installation Results
Upon installation completion, you can verify the successful addition of the network card(s) through the Device Manager.
Upon installation completion, the inf files can be located at:
• %ProgramFiles%\Mellanox\MLNX_WinOF2\ETH
To see the Mellanox network adapter device, and the Ethernet or IPoIB network device (depending on the used card) for each port, display the Device Manager and expand "Network adapters".

Figure 1: Installation Results
[Device Manager screenshot, Network adapters node expanded]
24. [Screenshot: Device Manager, Mellanox ConnectX-4 VPI Adapter Properties dialog, Details tab, showing the "Driver key" property]

3.3.2 Basic Registry Keys
This group contains the registry keys that control the basic operations of the NIC.

Value Name: JumboPacket
Default Value: 1514
Description: The maximum size of a frame (or a packet) that can be sent over the wire. This is also known as the maximum transmission unit (MTU). The MTU may have a significant impact on the network's performance, as a large packet can cause high latency. However, it can also reduce the CPU utilization and improve the wire efficiency. The standard Ethernet frame size is 1514 bytes, but Mellanox drivers support a wide range of packet sizes.
The valid values are:
• Ethernet: 600 up to 9600
Note: All the devices across the network (switches and routers) should support the same frame siz
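The claim that a larger frame size improves wire efficiency can be made concrete with a short calculation. The Python sketch below assumes that the JumboPacket value includes the 14-byte Ethernet header but not the 4-byte FCS (consistent with the 1514-byte standard frame quoted above), that each frame also costs 8 bytes of preamble/SFD and a 12-byte inter-frame gap on the wire, and that the payload is TCP over IPv4 with no options (40 bytes of headers). The function name is hypothetical.

```python
ETH_HEADER = 14     # destination MAC + source MAC + EtherType
FCS = 4             # frame check sequence, appended on the wire
PREAMBLE_SFD = 8    # preamble plus start-of-frame delimiter
IFG = 12            # minimum inter-frame gap, in byte times


def wire_efficiency(frame_size: int, l3_l4_headers: int = 40) -> float:
    """Fraction of wire time that carries application payload.

    frame_size: frame size as configured (Ethernet header included,
    FCS excluded), e.g. 1514 or 9600.
    l3_l4_headers: IPv4 + TCP header bytes without options (assumed).
    """
    payload = frame_size - ETH_HEADER - l3_l4_headers
    if payload <= 0:
        raise ValueError("frame too small to carry any payload")
    on_wire = frame_size + FCS + PREAMBLE_SFD + IFG
    return payload / on_wire


if __name__ == "__main__":
    for size in (1514, 9600):
        print(size, f"{wire_efficiency(size):.4f}")
```

Under these assumptions the standard 1514-byte frame carries payload for about 94.9% of the wire time, and a 9600-byte frame for about 99.2%, which is the efficiency gain the description refers to.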
25. AMAGE Mellanox TECHNOLOGIES Mellanox Technologies Mellanox Technologies Ltd 350 Oakmead Parkway Suite 100 Hakidma 26 Sunnyvale CA 94085 Ofer Industrial Park U S A Yokneam 2069200 www mellanox com Israel Tel 408 970 3400 www mellanox com Fax 408 970 3403 Tel 4972 0 74 723 7200 Fax 4972 0 4 959 3245 Copyright 2015 Mellanox Technologies All Rights Reserved Mellanox Mellanox logo BridgeX ConnectX Connect IB CoolBox CORE Direct GPUDirect InfiniBridge InfiniHost InfiniScale Kotura Kotura logo MetroX MLNX OS PhyX ScalableHPC SwitchX TestX UFM Virtual Protocol Interconnect Voltaire and Voltaire logo are registered trademarks of Mellanox Technologies Ltd CyPU ExtendX FabricIT FPGADirect HPC X Mellanox Care Mellanox CloudX Mellanox Open Ethernet Mellanox PeerDirect Mellanox Virtual Modular Switch MetroDX NVMeDirect StPU Switch IBTM Unbreakable Link are trademarks of Mellanox Technologies Ltd All other trademarks are property of their respective owners 2 Mellanox Technologies Document Number MLNX 15 3280 Rev 1 10 Table of Contents Document Revision History ccc cece ccc cece cece eee ff n n ere rr 0 6 About this Manali ese ge aes a A eee haw OPER Reed NER ee STARE ad SCOPE AMI Em 7 Intended Audience ariadna Lace tmr e rte Re Ir eR e ce ee Gare ace 7 Documentation Conventions
26. Mellanox WinOF-2 User Manual, Rev 1.10 (Beta)
www.mellanox.com
NOTE: THIS HARDWARE, SOFTWARE OR TEST SUITE PRODUCT ("PRODUCT(S)") AND ITS RELATED DOCUMENTATION ARE PROVIDED BY MELLANOX TECHNOLOGIES "AS-IS" WITH ALL FAULTS OF ANY KIND AND SOLELY FOR THE PURPOSE OF AIDING THE CUSTOMER IN TESTING APPLICATIONS THAT USE THE PRODUCTS IN DESIGNATED SOLUTIONS. THE CUSTOMER'S MANUFACTURING TEST ENVIRONMENT HAS NOT MET THE STANDARDS SET BY MELLANOX TECHNOLOGIES TO FULLY QUALIFY THE PRODUCT(S) AND/OR THE SYSTEM USING IT. THEREFORE, MELLANOX TECHNOLOGIES CANNOT AND DOES NOT GUARANTEE OR WARRANT THAT THE PRODUCTS WILL OPERATE WITH THE HIGHEST QUALITY. ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT ARE DISCLAIMED. IN NO EVENT SHALL MELLANOX BE LIABLE TO CUSTOMER OR ANY THIRD PARTIES FOR ANY DIRECT, INDIRECT, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES OF ANY KIND (INCLUDING, BUT NOT LIMITED TO, PAYMENT FOR PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY FROM THE USE OF THE PRODUCT(S) AND RELATED DOCUMENTATION EVEN IF ADVISED OF THE POSSIBILITY OF SUCH D
27. ETMASK <network_ip/subnet> -env MPICH_ND_ZCOPY_THRESHOLD 1 -env MPICH_DISABLE_ND <0/1> -env MPICH_DISABLE_SOCK <0/1> -affinity <process>
B.3 Directing MSMPI Traffic
Directing MPI traffic to a specific QoS priority may be delayed due to the following:
• Except for NetDirectPortMatchCondition, the QoS PowerShell cmdlet for NetworkDirect traffic does not support a port range. Therefore, NetworkDirect traffic cannot be directed to ports 1-65536.
• The MSMPI directive to control the port range (namely MPICH_PORT_RANGE 3000,3030) does not work for ND, and MSMPI chooses a random port.
B.4 Running MSMPI on the Desired Priority
Step 1. Set the default QoS policy to be the desired priority. (Note: this priority should be lossless all the way in the switches.)
Step 2. Set the SMB policy to a desired priority only if SMB traffic is running.
Step 3. [Recommended] Direct ALL TCP/UDP traffic to a lossy priority by using the "IPProtocolMatchCondition". TCP is used for the MPI control channel (smpd), while UDP is used for other services such as remote desktop.
Arista switches forward the PCP bits (e.g., the 802.1p priority within the VLAN tag) from ingress to egress, so that any two end nodes in the fabric can maintain the priority along the route. In this case, the packet from the sender goes out with priority X and reaches the far end node with the same priority X.
The priority should be lo
28. [Figure: Custom Setup page listing the MLNX_WinOF2 features, including Performance tools]
b. Confirm the start of the installation.
[Figure: Windows Security prompt asking whether to install device software published by Mellanox Technologies LTD]
c. Click Install to start the installation.
[Figure: "Ready to Install the Program" wizard page]
Step 7. Click Finish to complete the installation.
[Figure: "InstallShield Wizard Completed" page. When performance tuning was selected, the page notes that the log file can be found at C:\Windows\System32\LogFiles\PerformanceTuning.log]
2.2.2 Unattended Installation
If no reboot options are specified, the installer restarts the computer whenever necessary without displaying any prompt or warning to the user. Use the /norestart or /forcerestart standard command-line options to control reboots.
The following is an example of an
29. Operating Systems: Windows Server 2012 and Windows Server 2012 R2
3.1.4.2 QoS Configuration
Prior to configuring Quality of Service, you must install Data Center Bridging using one of the following methods:
To Disable Flow Control Configuration
Device Manager > Network adapters > Mellanox ConnectX-4 Ethernet Adapter > Properties > Advanced tab
[Figure: Device Manager with the Advanced tab of the adapter's Properties dialog open; the visible properties include IPv4 Checksum Offload, Jumbo Packet, Large Send Offload V2 (IPv4), Large Send Offload V2 (IPv6), Maximum Number of RSS Queues, and Network Address]
30. age to send inline. The default number is 128B.
-D: <test duration in seconds> Test duration in seconds.
-f: <margin time in seconds> The margin time to avoid calculation; it must be less than half of the duration time.
-S: <server interface IP> Server side only; must be the last parameter.
-C: <server interface IP> Client side only; must be the last parameter.
-h: Shows the Help screen.
A.7 NTttcp
NTttcp is a Windows-based testing application that sends and receives TCP data between two or more endpoints. It is a Winsock-based port of the ttcp tool that measures networking performance (bytes/second).
To download the latest version of NTttcp (5.28), please refer to the Microsoft website, following the link below:
http://gallery.technet.microsoft.com/NTttcp-Version-528-Now-f8b12769
This tool should be run from cmd only.
NTttcp Synopsis
Server: ntttcp_x64.exe -r -t 15 -m 16,*,<interface IP>
Client: ntttcp_x64.exe -s -t 15 -m 16,*,<same address as above>
NTttcp Options
The table below lists the various flags of the command.
Table 22: NTttcp Options
Flags: Description
-s: Works as a sender.
-r: Works as a receiver.
-l: <Length of buffer> [default TCP: 64K, UDP: 128]
-n: <Number of buffers> [default: 20K]
-p: <port base> [default: 5001]
-sp: Synchronizes
31. arameter.
-C: <server interface IP> Client side only; must be the last parameter.
-h: Shows the Help screen.
A.3 nd_read_bw
This test is used for performance measuring of RDMA Read requests in Microsoft Windows Operating Systems. nd_read_bw is performance-oriented for RDMA Read with maximum throughput, and runs over Microsoft's NetworkDirect standard. The level of customizing for the user is relatively high. The user may choose to run with a customized message size, a customized number of iterations, or alternatively, a customized test duration time. nd_read_bw runs with all message sizes from 1B to 4MB (powers of 2), message inlining, and CQ moderation.
nd_read_bw Synopsis
<running on specific single core>
Server side: start /b /affinity 0X1 nd_read_bw -s1048576 -D10 -S 11.137.53.1
Client side: start /b /wait /affinity 0X1 nd_read_bw -s1048576 -D10 -C 11.137.53.1
nd_read_bw Options
The table below lists the various flags of the command.
Table 18: nd_read_bw Options
Flags: Description
-h: Shows the Help screen.
-v: Shows the version number.
-p: Connects to the port <port> <default 6830>.
-s: <msg size> Exchanges the message size with <default 65536B>; it must not be combined with the -a flag.
-a: Runs all the message sizes from 1B to 8MB; it must not be combined with the -s flag.
-n: <num of iterations> The number of exchanges a
32. ard. The level of customizing for the user is relatively high. The user may choose to run with a customized message size, a customized number of iterations, or alternatively, a customized test duration time. nd_write_lat runs with all message sizes from 1B to 4MB (powers of 2), message inlining, and CQ moderation.
nd_write_lat Synopsis
<running on specific single core>
Server side: start /b /affinity 0X1 nd_write_lat -s1048576 -D10 -S 11.137.53.1
Client side: start /b /wait /affinity 0X1 nd_write_lat -s1048576 -D10 -C 11.137.53.1
nd_write_lat Options
The table below lists the various flags of the command.
Table 17: nd_write_lat Options
Flag: Description
-h: Shows the Help screen.
-v: Shows the version number.
-p: Connects to the port <port> <default 6830>.
-s: <msg size> Exchanges the message size with <default 65536B>; it must not be combined with the -a flag.
-a: Runs all the message sizes from 1B to 8MB; it must not be combined with the -s flag.
-n: <num of iterations> The number of exchanges (at least 2; the default is 100000).
-I: <max inline size> The maximum size of message to send inline. The default number is 128B.
-D: <test duration in seconds> Test duration in seconds.
-f: <margin time in seconds> The margin time to avoid calculation; it must be less than half of the duration time.
-S: <server interface IP> Server side only; must be last p
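The constraints stated in the options table (at least 2 iterations, the message-size option not combined with the all-sizes option, and a margin time below half of the test duration) can be checked before a test is launched. The following is a minimal Python sketch of such a check. The function name and its parameters are hypothetical and are not part of the nd_* tools, and treating a margin without a duration as an error is an assumption.

```python
from typing import List, Optional


def check_nd_test_args(
    iterations: int = 100000,
    msg_size: Optional[int] = None,
    all_sizes: bool = False,
    duration_s: Optional[float] = None,
    margin_s: Optional[float] = None,
) -> List[str]:
    """Return a list of rule violations (empty when the arguments look valid)."""
    problems: List[str] = []
    if iterations < 2:
        problems.append("number of iterations must be at least 2")
    if msg_size is not None and all_sizes:
        problems.append("message size and all-sizes options must not be combined")
    if margin_s is not None:
        if duration_s is None:
            # Assumption: a margin only makes sense for a duration-based run.
            problems.append("margin time requires a test duration")
        elif margin_s >= duration_s / 2:
            problems.append("margin time must be less than half of the duration")
    return problems


if __name__ == "__main__":
    # Values taken from the synopsis: 1 MB messages, 10 second duration.
    print(check_nd_test_args(msg_size=1048576, duration_s=10, margin_s=2))
    print(check_nd_test_args(duration_s=10, margin_s=5))
```

The function returns messages rather than raising, so a wrapper script could report every problem at once before starting the server and client sides.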
33. ch interface. The number can be different for each interface. This allows partitioning of CPUs across network adapters. Note: Restart the network adapter when you change this registry key. (Sub-key, continued: 08002be10318}\<nn>\RssBaseProcNumber)
Sub-key: HKLM\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\<nn>\NumaNodeID
Description: NUMA node affinitization.
Sub-key: HKLM\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\<nn>\RssBaseProcGroup
Description: Sets the RSS base processor group for systems with more than 64 processors.
3.2 Storage Protocols
3.2.1 Deploying SMB Direct
The Server Message Block (SMB) protocol is a network file sharing protocol implemented in Microsoft Windows. The set of message packets that defines a particular version of the protocol is called a dialect.
The Microsoft SMB protocol is a client-server implementation and consists of a set of data packets, each containing a request sent by the client or a response sent by the server.
The SMB protocol is used on top of the TCP/IP protocol or other network protocols. Using the SMB protocol allows applications to access files or other resources on a remote server, to read, create, and update them. In addition, it enables communication with any server program that is set up to receive an SMB client request.
3.2.1.1 SMB Configuration Verification
3.2.1.1.1 Verifying Network Adapter Configuration
Use
34. destination. RSS can significantly improve the number of transactions, the number of connections per second, and the network throughput. This parameter can be set to one of the following values:
• Enabled (default): Set RSS Mode.
• Disabled: The hardware is configured once to use the Toeplitz hash function, and the indirection table is never changed.
Note: IOAT is not used while in RSS mode.
Receive Completion Method: Sets the completion methods of the received packets, and can affect network throughput and CPU utilization.
• Polling Method: Increases the CPU utilization as the system polls the received rings for the incoming packets. However, it may increase the network performance as the incoming packet is handled faster.
• Interrupt Method: Optimizes the CPU as it uses interrupts for handling incoming messages. However, in certain scenarios it can decrease the network throughput.
• Adaptive (Default Settings): A combination of the interrupt and polling methods, dynamically, depending on traffic type and network usage. Choosing a different setting may improve network and/or system performance in certain configurations.
Interrupt Moderation RX Packet Count: Number of packets that need to be received before an interrupt is generated on the receive side (default 5).
Interrupt Moderation RX Packet Time: Maximum elapsed time (in usec) between the receiving of a packet and the generation of an interrupt, even if the mode
35. e. Be aware that different network devices calculate the frame size differently. Some devices include the header information in the frame size, while others do not. Mellanox adapters do not include Ethernet header information in the frame size (i.e., when setting JumboPacket to 1500, the actual frame size is 1514).
ReceiveBuffers (512): The number of packets each ring receives. This parameter affects the memory consumption and the performance. Increasing this value can enhance receive performance, but also consumes more system memory. In case of a lack of receive buffers (dropped packets or out-of-order received packets), you can increase the number of receive buffers. The valid values are 256 up to 4096.
TransmitBuffers (2048): The number of packets each ring sends. Increasing this value can enhance transmission performance, but also consumes system memory. The valid values are 256 up to 4096.
SpeedDuplex (7): The Speed and Duplex settings that a device supports. This registry key should not be changed; it can be used to query the device capability. The Mellanox ConnectX device is set to 7, meaning 10Gbps and Full Duplex. Note: The default value should not be modified.
Value Name (Default Value): Description
RxIntModerationProfile (2): Enables the assignment of different interrupt moderation profiles for receive completions. Interrupt moderation can have a great eff
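The relationship between the JumboPacket value and the frame size on the wire (1500 becomes 1514) can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the 14-byte figure is the standard untagged Ethernet header (two 6-byte MAC addresses plus the 2-byte EtherType), which matches the example in the text.

```python
ETHERNET_HEADER_BYTES = 14  # destination MAC (6) + source MAC (6) + EtherType (2)


def actual_frame_size(jumbo_packet: int) -> int:
    """Frame size on the wire for a JumboPacket value that excludes the header."""
    if jumbo_packet <= 0:
        raise ValueError("JumboPacket must be a positive number of bytes")
    return jumbo_packet + ETHERNET_HEADER_BYTES


if __name__ == "__main__":
    for value in (1500, 9000):
        print(value, "->", actual_frame_size(value))
```

A check like this is useful when comparing the adapter setting with a switch whose configured frame size already includes the header.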
36. e IP-based GIDs by default.
A straightforward extension of the RoCE protocol enables traffic to operate in layer 3 environments. This capability is obtained via a simple modification of the RoCE packet format. Instead of the GRH used in RoCE, routable RoCE packets carry an IP header, which allows traversal of IP L3 routers, and a UDP header that serves as a stateless encapsulation layer for the RDMA Transport Protocol Packets over IP.
Figure 2: RoCE and RoCE v2 Frame Format Differences
[Figure: In RoCE, the EtherType indicates that the packet is RoCE (i.e., the next header is IB GRH). In RoCE v2, the EtherType indicates that the packet is IP (i.e., the next header is IP), the IP protocol number indicates that the packet is UDP, and the UDP dport number indicates that the next header is IB BTH.]
The proposed RoCEv2 packets use a well-known UDP destination port value that unequivocally distinguishes the datagram. Similar to other protocols that use UDP encapsulation, the UDP source port field is used to carry an opaque flow identifier that allows network devices to implement packet forwarding optimizations (e.g., ECMP) while staying agnostic to the specifics of the protocol header format.
The UDP source port is calculated as follows:
UDP.SrcPort = (SrcPort XOR DstPort) OR 0xC000
where SrcPort and DstPort are the ports used to establish the connection. For example, in a Network Direct application, when connecting to a remote peer, the destinat
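The source-port rule quoted above, (SrcPort XOR DstPort) OR 0xC000, is simple enough to express directly. The Python sketch below is an illustration only: the function name is hypothetical, and the 16-bit range check is an assumption added so that the result is always a valid UDP port.

```python
def rocev2_udp_src_port(src_port: int, dst_port: int) -> int:
    """Compute (src_port XOR dst_port) OR 0xC000 for two 16-bit port numbers."""
    for name, port in (("src_port", src_port), ("dst_port", dst_port)):
        if not 0 <= port <= 0xFFFF:
            raise ValueError(f"{name} must be in the range 0..65535, got {port}")
    # OR-ing with 0xC000 sets the two top bits, so the result is always >= 49152.
    return (src_port ^ dst_port) | 0xC000


if __name__ == "__main__":
    print(hex(rocev2_udp_src_port(0x1234, 0x4321)))
```

Because the two top bits are always set, every computed value falls in the 49152..65535 range, while the lower 14 bits still vary per connection and give ECMP-style hashing something to spread on.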
37. e cases, the DHCP server may require the MAC address of the network adapter installed in your machine.
To obtain the MAC address:
Step 1. Open a CMD console. (Windows Server 2012 R2: click Start > Task Manager > File > Run new task, and enter CMD.)
Step 2. Display the MAC address as "Physical Address":
ipconfig /all
Configuring a static IP is the same for Ethernet adapters.
To assign a static IP address to a network port after installation:
Step 1. Open the Network Connections window. Locate Local Area Connections with Mellanox devices.
[Figure: Network Connections window listing the Ethernet and Local Area Connection entries]
Step 2. Right-click a Mellanox Local Area Connection and left-click Properties.
[Figure: connection Properties dialog, Networking tab, connecting using a Mellanox ConnectX-4 VPI Adapter (MT4115)]
38. ect on optimizing network throughput and CPU utilization. The valid values are:
• 0: Low Latency. Implies a higher rate of interrupts to achieve better latency, or to handle scenarios where only a small number of streams are used.
• 1: Moderate. Interrupt moderation is set to midrange defaults to allow maximum throughput at minimum CPU utilization for common scenarios.
• 2: Aggressive. Interrupt moderation is set to maximal values to allow maximum throughput at minimum CPU utilization for more intensive, multi-stream scenarios.
TxIntModerationProfile (1): Enables the assignment of different interrupt moderation profiles for send completions. Interrupt moderation can have a great effect on optimizing network throughput and CPU utilization. The valid values are:
• 0: Low Latency. Implies a higher rate of interrupts to achieve better latency, or to handle scenarios where only a small number of streams are used.
• 1: Moderate. Interrupt moderation is set to midrange defaults to allow maximum throughput at minimum CPU utilization for common scenarios.
• 2: Aggressive. Interrupt moderation is set to maximal values to allow maximum throughput at minimum CPU utilization for more intensive, multi-stream scenarios.
3.3.3 Off-load Registry Keys
This group of registry keys allows the administrator to specify which TCP/IP offload settings are handled by the adapter rather than by the operating system.
39. epresentative or Mellanox Support at support@mellanox.com.
5.1 Installation Related Troubleshooting
Table 10: Installation Related Issues
Issue: The installation of WinOF-2 fails with the following error message: "This installation package is not supported by this processor type. Contact your product vendor."
Cause: An incorrect driver version might have been installed, e.g., you are trying to install a 64-bit driver on a 32-bit machine (or vice versa).
Solution: Use the correct driver package according to the CPU architecture.
Issue: The installation of WinOF-2 fails and reads as follows: "The installation cannot be done while the RDSH service is enabled. Please disable it. You may re-enable it after the installation is complete."
Cause: A known issue in the Windows installer when using the chain MSI feature, as described in the following link: http://rcmtech.wordpress.com/2013/08/27/server-2012-remote-desktop-session-host-installation-hangs-at-windows-installer-coordinator
Solution: Follow the recommendation in the article.
5.1.1 Installation Error Codes and Troubleshooting
5.1.1.1 Setup Return Codes
Table 11: Setup Return Codes
Error Code: Description. Troubleshooting.
1603: Fatal error during installation. Contact support.
1633: The installation package is not supported on this platform. Make sure you are installing the right package for your
40. es
Step 1. Display the Device Manager.
Step 2. Right-click a Mellanox network adapter (under the "Network adapters" list) and left-click Properties. Select the Advanced tab from the Properties sheet.
Step 3. Modify configuration parameters to suit your system.
Please note the following:
• For help on a specific parameter/option, check the help button at the bottom of the dialog.
• If you select one of the entries Off-load Options, Performance Options, or Flow Control Options, you'll need to click the Properties button to modify parameters via a pop-up dialog.
3.1.6 Receive Side Scaling (RSS)
RSS settings can be set per individual adapters as well as globally. To do so, set the registry keys listed below. For instructions on how to find the interface index in the registry (<nn>), please refer to Section 3.3.1, "Finding the Index Value of the Network Interface," on page 39.
Table 7: Registry Keys Setting
Sub-key: HKLM\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\<nn>\MaxRSSProcessors
Description: Maximum number of CPUs allotted. Sets the desired maximum number of processors for each interface. The number can be different for each interface. Note: Restart the network adapter after you change this registry key.
Sub-key: HKLM\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1
Description: Base CPU number. Sets the desired base CPU number for ea
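The RSS sub-keys in Table 7 all share one prefix and differ only in the interface index <nn> and the value name. The Python sketch below assembles those paths as plain strings so they can be pasted into a registry tool. It is a sketch under stated assumptions: the function and constant names are hypothetical, the value names are copied as they appear in this manual (standardized NDIS keywords are often stored with a leading asterisk, so check the names present on the installed driver), and the four-digit zero-padded index follows the "0010" example given for the Driver key.

```python
from typing import Dict

# Network adapter device class key; <nn> is the per-interface index below it.
NET_CLASS_KEY = (
    r"HKLM\SYSTEM\CurrentControlSet\Control\Class"
    r"\{4d36e972-e325-11ce-bfc1-08002be10318}"
)

# Value names as listed in the RSS registry table of this manual.
RSS_VALUE_NAMES = (
    "MaxRSSProcessors",
    "RssBaseProcNumber",
    "NumaNodeID",
    "RssBaseProcGroup",
)


def rss_registry_paths(interface_index: int) -> Dict[str, str]:
    """Map each RSS value name to its full registry path for one interface."""
    if not 0 <= interface_index <= 9999:
        raise ValueError("interface index must fit in four decimal digits")
    nn = f"{interface_index:04d}"  # assumption: index is zero-padded, e.g. 0010
    return {name: "\\".join((NET_CLASS_KEY, nn, name)) for name in RSS_VALUE_NAMES}


if __name__ == "__main__":
    for name, path in rss_registry_paths(10).items():
        print(name, "=>", path)
```

The sketch only builds strings and does not touch the registry. As the table notes, the network adapter has to be restarted after any of these values is changed.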
41. f the File Server is listening on the RDMA interfaces.
3.2.1.2 Verifying SMB Events that Confirm RDMA Connection
To confirm RDMA connection, verify the SMB events:
Step 1. Open a PowerShell window on the SMB client.
Step 2. Run the following cmdlets. NOTE: Any RDMA-related connection errors will be displayed as well.
PS> Get-WinEvent -LogName Microsoft-Windows-SMBClient/Operational | ? Message -match "RDMA"
For further details on how to configure the switches to be lossless, please refer to https://community.mellanox.com
3.3 Configuration Using Registry Keys
3.3.1 Finding the Index Value of the Network Interface
To find the index value of your Network Interface from the Device Manager, please perform the following steps:
Step 1. Open Device Manager and go to Network Adapters.
Step 2. Right-click > Properties on Mellanox Connect-X Ethernet Adapter.
Step 3. Go to the Details tab.
Step 4. Select the Driver key and obtain the nn number.
In the below example, the index equals 0010.
[Figure: Device Manager with the Network adapters list expanded]
42. hernet (RoCE) is a mechanism to provide this efficient data transfer with very low latencies on lossless Ethernet networks. With advances in data center convergence over reliable Ethernet, ConnectX EN with RoCE uses the proven and efficient RDMA transport to provide the platform for deploying RDMA technology in mainstream data center applications at 10GigE and 40GigE link speeds. ConnectX EN, with its hardware offload support, takes advantage of this efficient RDMA transport (InfiniBand) services over Ethernet to deliver ultra-low latency for performance-critical and transaction-intensive applications such as financial, database, storage, and content delivery networks.
RoCE encapsulates IB transport and GRH headers in Ethernet packets bearing a dedicated ether type. While the use of GRH is optional within InfiniBand subnets, it is mandatory when using RoCE. Applications written over IB verbs should work seamlessly, but they require provisioning of GRH information when creating address vectors. The library and driver are modified to provide mapping from GID to MAC addresses required by the hardware.
3.1.2.1 IP Routable (RoCEv2)
RoCE has two addressing modes: MAC-based GIDs and IP address-based GIDs. In RoCE IP-based, if the IP address changes while the system is running, the GID for the port will automatically be updated with the new IP address, using either IPv4 or IPv6.
RoCE IP-based allows RoCE traffic between Windows and Linux systems, which us
43. ifferent setting may improve network and system performance in certain configurations. The valid values are:
• 1: static
• 2: adaptive. The interrupt moderation count and time are configured dynamically, based on traffic types and rate.
Value Name (Default Value): Description
RSS (1): Sets the driver to use Receive Side Scaling (RSS) mode to improve the performance of handling incoming packets. This mode allows the adapter port to utilize the multiple CPUs in a multi-core system for receiving incoming packets and steering them to their destination. RSS can significantly improve the number of transactions per second, the number of connections per second, and the network throughput. This parameter can be set to one of two values:
• 1: enable (default). Sets RSS Mode.
• 0: disable. The hardware is configured once to use the Toeplitz hash function, and the indirection table is never changed.
Note: the I/O Acceleration Technology (IOAT) is not functional in this mode.
ReturnPacketThreshold (341): The allowed number of free received packets on the rings. Any number above it will cause the driver to return the packet to the hardware immediately. When the value is set to 0, the adapter uses 2/3 of the received ring size. The valid values are 0 to 4096. Note: This registry value is not exposed via the UI.
NumTcb (16): The number of send buffers that the driver allocates fo
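The ReturnPacketThreshold rule (a value of 0 means 2/3 of the receive ring size) can be sketched as follows. This Python snippet is an illustration, not driver code: the function name is hypothetical, and rounding down to a whole packet count is an assumption, although it does reproduce the documented default of 341 for a 512-entry ring.

```python
def effective_return_packet_threshold(threshold: int, ring_size: int) -> int:
    """Resolve the threshold the adapter would apply for a given ring size."""
    if not 0 <= threshold <= 4096:
        raise ValueError("threshold must be in the range 0..4096")
    if ring_size <= 0:
        raise ValueError("ring size must be positive")
    if threshold == 0:
        # 0 selects 2/3 of the receive ring; rounding down is an assumption.
        return (2 * ring_size) // 3
    return threshold


if __name__ == "__main__":
    print(effective_return_packet_threshold(0, 512))
    print(effective_return_packet_threshold(100, 512))
```

Resolving the value this way makes it easier to see how a change to the number of receive buffers shifts the point at which packets are returned to the hardware.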
44. ing disclaimer
[Figure: License Agreement page of the installation wizard, with the options "I accept the terms in the license agreement" and "I do not accept the terms in the license agreement"]
Step 5. Select the target folder for the installation.
[Figure: Destination Folder page; MLNX_WinOF2 is installed to C:\Program Files\Mellanox\MLNX_WinOF2\ unless Change is clicked]
Step 6. Select a Complete or Custom installation; follow Step a and on, on page 16.
[Figure: Setup Type page. Complete: all program features will be installed (requires the most disk space). Custom: choose which program features you want installed and where they will be installed (recommended for advanced users).]
a. Select the desired feature to install:
• Performances tools: install the performance tools that are used to measure performance in user environment.
• Documentation: contains the User Manual and Release Notes.
[Figure: Custom Setup page listing the program features]
45. iniBand node. MFT can be used for:
• Generating a standard or customized Mellanox firmware image
• Querying for firmware information
• Burning a firmware image to a single InfiniBand node
• Enabling/changing card configuration to support SRIOV
WinOF-2 Release Notes: For possible software issues, please refer to WinOF-2 Release Notes.
1 Introduction
This User Manual describes installation, configuration and operation of the Mellanox WinOF-2 driver Rev 1.10 package.
Mellanox WinOF-2 is composed of several software modules that contain Ethernet drivers. It supports 10, 25, 40, 50 or 100 Gb/s Ethernet network ports. The port type is determined upon boot, based on card capabilities and user settings.
The Mellanox WinOF-2 driver release introduces the following capabilities:
• Support for ConnectX-4 single and dual port adapter cards
• Up to 16 Rx queues per port
• Dedicated PCI function per physical port
• Rx steering mode (RSS)
• Hardware Tx/Rx checksum calculation
• Large Send off-load (i.e., TCP Segmentation Off-load)
• Receive Side Coalescing (RSC, or LRO in Linux)
• Hardware multicast filtering
• Adaptive interrupt moderation
• Support for MSI-X interrupts
• NDK with SMB-Direct
• ND v1 and v2 API support in user space
• VMQ for Hypervisor
• Hardware VLAN filtering
• RDMA over Converged Ethernet: RoCE MAC Based (v1), RRoCE over UDP (v2)
1.1 Supplied Packages
Mellanox Wi
46. ion IP address and the destination port must be provided, as they are used in the calculation above. The source port provision is optional.
Furthermore, since this change exclusively affects the packet format on the wire, and due to the fact that with RDMA semantics packets are generated and consumed below the API, applications can seamlessly operate over any form of RDMA service (including the routable version of RoCE, as shown in Figure 2, "RoCE and RoCE v2 Frame Format Differences") in a completely transparent way. (Footnote 1: Standard RDMA APIs are IP-based already for all existing RDMA technologies.)
Figure 3: RoCE and RoCEv2 Protocol Stack
[Figure: protocol stack showing the RDMA Application, the OFA (Open Fabric Alliance) Stack, the RDMA API (Verbs), RoCE v1 and RoCE v2, and the Ethernet link layer]
The fabric must use the same protocol stack in order for nodes to communicate.
The default RoCE mode in Windows is MAC based. The default RoCE mode in Linux is IP based. In order to communicate between Windows and Linux over RoCE, please use RoCE v2 (the default mode for Windows).
3.1.2.2 RoCE Configuration
In order to function reliably, RoCE requires a form of flow control. While it is possible to use global flow control, this is normally undesirable for performance reasons.
The normal and optimal way to use RoCE is to use Priority Flow Control
47. iption
RoceMaxFrameSize (1024): The maximum size of a frame (or a packet) that can be sent by the RoCE protocol (a.k.a. Maximum Transmission Unit, MTU). Using a larger RoCE MTU will improve the performance; however, one must ensure that the entire system, including switches, supports the defined MTU. Ethernet packets use the general MTU value, whereas RoCE packets use the RoCE MTU. The valid values are:
• 256
• 512
• 1024
• 2048
Note: This registry key is supported only in Ethernet drivers.
PriorityVLANTag (3, Packet Priority & VLAN Enabled): Enables sending and receiving IEEE 802.3ac tagged frames, which include:
• 802.1p QoS (Quality of Service) tags for priority-tagged packets
• 802.1Q tags for VLANs
When this feature is enabled, the Mellanox driver supports sending and receiving a packet with VLAN and QoS tag.
Value Name (Default Value): Description
PromiscuousVlan (0): Specifies whether a promiscuous VLAN is enabled or not. When this parameter is set, all the packets with VLAN tags are passed to an upper level without executing any filtering. The valid values are:
• 0: disable
• 1: enable
Note: This registry value is not exposed via the UI.
3.3.5.1 Flow Control Options
This group of registry keys allows the administrator to control the TCP/IP traffic by pausing frame transmitting and/or receiving operations. By enabling the Flow
48. irmware does not mode of the installed NIC.
RoCE mode can be enabled and disabled either via the registry key or the PowerShell. RoCE is enabled by default.
To enable it using the registry key:
• Set the roce_mode as follows: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\mlx5\Parameters\Roce
For changes to take effect, please restart the network adapter after changing this registry key.
3.1.2.6.1 Registry Key Parameters
The following are per-driver and will apply to all available adapters.
Table 6: Registry Key Parameters
Parameters Name: roce_mode
Parameter type: DWORD
Description: Sets the RoCE mode. The following are the possible RoCE modes: RoCE MAC Based, RoCE v2, No RoCE.
Allowed Values and Default: RoCE MAC Based = 0; RoCE v2 = 2; No RoCE = 4; Default: No RoCE
3.1.3 Teaming and VLAN
Windows Server 2012 and above supports Teaming as part of the operating system. Please refer to the Microsoft guide "NIC Teaming in Windows Server 2012", following the link below:
http://www.microsoft.com/en-us/download/confirmation.aspx?id=40319
Note that the Microsoft teaming mechanism is only available on Windows Server distributions.
3.1.3.1 Configuring a Network Interface to Work with VLAN in Windows Server 2012 and Above
In this procedure you DO NOT create a VLAN; rather, you use an existing VLAN ID.
To configure a port to work with VLAN using
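The allowed roce_mode values listed in Table 6 (0, 2 and 4) map naturally onto an enumeration. The Python sketch below shows one way a configuration script could validate a value before it is written to the registry. The class and function names are hypothetical, and the sketch does not read or write the registry itself.

```python
from enum import IntEnum


class RoceMode(IntEnum):
    """roce_mode DWORD values as listed in the registry key parameters table."""
    MAC_BASED = 0
    ROCE_V2 = 2
    NO_ROCE = 4


_DESCRIPTIONS = {
    RoceMode.MAC_BASED: "RoCE MAC Based",
    RoceMode.ROCE_V2: "RoCE v2",
    RoceMode.NO_ROCE: "No RoCE",
}


def describe_roce_mode(value: int) -> str:
    """Return the mode name for a DWORD value, rejecting unlisted values."""
    try:
        mode = RoceMode(value)
    except ValueError:
        raise ValueError(f"unsupported roce_mode value: {value}") from None
    return _DESCRIPTIONS[mode]


if __name__ == "__main__":
    for raw in (0, 2, 4):
        print(raw, "=", describe_roce_mode(raw))
```

Values such as 1 or 3 are rejected, which guards against typing a mode number that the table does not define.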
49. ists of NDK performance counters. These performance counters allow you to track Network Direct Kernel (RDMA) activity, including traffic rates, errors, and control plane activity.
Table 8: RDMA Activity
RDMA Activity Counters: Description
RDMA Accepted Connections: The number of inbound RDMA connections established.
RDMA Active Connections: The number of active RDMA connections.
RDMA Completion Queue Errors: This counter is not supported, and is always set to zero.
RDMA Connection Errors: The number of established connections with an error before a consumer disconnected the connection.
RDMA Failed Connection Attempts: The number of inbound and outbound RDMA connection attempts that failed.
RDMA Inbound Bytes/sec: The number of bytes for all incoming RDMA traffic. This includes additional layer two protocol overhead.
RDMA Inbound Frames/sec: The number, in frames, of layer two frames that carry incoming RDMA traffic.
RDMA Initiated Connections: The number of outbound connections established.
RDMA Outbound Bytes/sec: The number of bytes for all outgoing RDMA traffic. This includes additional layer two protocol overhead.
RDMA Outbound Frames/sec: The number, in frames, of layer two frames that carry outgoing RDMA traffic.
4 Utilities
4.1 Fabric Performance Utilities
The performance utilities described in this chapter are intended to be used as a perfo
50. ith all message sizes from 1B to 4MB (powers of 2), message inlining, CQ moderation.
    nd_send_bw Synopsis
    <running on specific single core>
    Server side: start /b /affinity 0X1 nd_send_bw -s1048576 -D10 -S 11.137.53.1
    Client side: start /b /wait /affinity 0X1 nd_send_bw -s1048576 -D10 -C <server interface IP>
    nd_send_bw Options
    The table below lists the various flags of the command.
    Table 20 - nd_send_bw Flags and Options
    Flag: Description
    -h: Shows the Help screen.
    -v: Shows the version number.
    -p: Connects to the port <port> <default 6830>.
    -s <msg size>: Exchanges the message size with <default 65536B>; it must not be combined with the -a flag.
    -a: Runs all the message sizes from 1B to 8MB; it must not be combined with the -s flag.
    -n <num of iterations>: The number of exchanges (at least 2, the default is 100000).
    -I <max inline size>: The maximum size of message to send inline. The default number is 128B.
    -D <test duration in seconds>: Test duration in seconds.
    -f <margin time in seconds>: The margin time to avoid calculation; it must be less than half of the duration time.
    -Q: CQ Moderation <value>. The default number is 100.
    -S <server interface IP>: <server side only, must be last parameter>
    -C <server interface IP>: <client side only, must be la
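The "all message sizes from 1B to 4MB (powers of 2)" sweep described above can be enumerated directly. The Python sketch below is an illustration, not part of the tool: the function name message_sizes is hypothetical, and it assumes that 4MB means 4 x 1024 x 1024 bytes.

```python
def message_sizes(max_bytes=4 * 1024 * 1024):
    """List every power-of-two message size from 1 byte up to max_bytes."""
    sizes = []
    size = 1
    while size <= max_bytes:
        sizes.append(size)
        size *= 2  # next power of two
    return sizes


if __name__ == "__main__":
    sizes = message_sizes()
    print(len(sizes), "sizes, from", sizes[0], "to", sizes[-1], "bytes")
```

Passing 8 * 1024 * 1024 as max_bytes gives the 1B to 8MB range that the options table lists for the "run all sizes" flag.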
51. lists the various flags of the command.
    Table 19 - nd_read_lat Options
    Flag: Description
    -h: Shows the Help screen.
    -v: Shows the version number.
    -p: Connects to the port <port> <default 6830>.
    -s <msg size>: Exchanges the message size with <default 65536B>; it must not be combined with the -a flag.
    -a: Runs all the message sizes from 1B to 8MB; it must not be combined with the -s flag.
    -n <num of iterations>: The number of exchanges (at least 2, the default is 100000).
    -I <max inline size>: The maximum size of message to send inline. The default number is 128B.
    -D <test duration in seconds>: Test duration in seconds.
    -f <margin time in seconds>: The margin time to avoid calculation; it must be less than half of the duration time.
    -S <server interface IP>: <server side only, must be last parameter>
    -C <server interface IP>: <client side only, must be last parameter>
    A.5 nd_send_bw
    This test is used for performance measuring of Send requests in Microsoft Windows Operating Systems. nd_send_bw is performance oriented for Send with maximum throughput, and runs over Microsoft's NetworkDirect standard. The level of customizing for the user is relatively high. The user may choose to run with a customized message size, a customized number of iterations, or alternatively a customized test duration time. nd_send_bw runs w
52. ltiple interfaces. Solution: Close the network capture tool on the physical adapter card, and set it on the team interface instead.
    Issue: No Ethernet connectivity on 10Gb adapters after activating Performance Tuning (part of the installation).
    Cause: A TcpWindowSize registry value might have been added.
    Solution: Remove the value key under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpWindowSize, or set its value to 0xFFFF.
    Issue: Packets are being lost.
    Cause: The port MTU might have been set to a value higher than the maximum MTU supported by the switch.
    Solution: Change the MTU according to the maximum MTU supported by the switch.
    Issue: NVGRE changes done on a running VM are not propagated to the VM.
    Cause: The configuration changes might not have taken effect until the OS is restarted.
    Solution: Stop the VM and afterwards perform any NVGRE configuration changes on the VM connected to the SR-IOV-enabled virtual switch.
    5.3 Performance Related Troubleshooting
    Table 15 - Performance Related Issues
    Issue: Low performance issues.
    Cause: The OS profile might not be configured for maximum performance.
    Solution: 1. Go to "Power Options" in the "Control Panel". Make sure "Maximum Performance" is set as the power scheme. 2. Reboot the machine.
    5.3.1 General Diagnostic
    Issue 1: Go to "Device Manager", locate the Mellanox adapter that you are
53. mmended to back up the registry on your system before implementing recommendations included in this section. If the modifications you apply lead to serious problems, you will be able to restore the original registry state. For more details about backing up and restoring the registry, please visit www.microsoft.com.
    3.4.1 General Performance Optimization and Tuning
    To achieve the best performance for Windows, you may need to modify some of the Windows registries.
    3.4.1.1 Registry Tuning
    The registry entries that may be added/changed by this "General Tuning" procedure are:
    Under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters:
    - Disable the TCP selective acks option for better CPU utilization: SackOpts, type REG_DWORD, value set to 0.
    Under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters:
    - Enable fast datagram sending for UDP traffic: FastSendDatagramThreshold, type REG_DWORD, value set to 64K.
    Under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Ndis\Parameters:
    - Set RSS parameters: RssBaseCpu, type REG_DWORD, value set to 1.
    3.4.1.2 Enable RSS
    Enabling Receive Side Scaling (RSS) is performed by means of the following command:
    netsh int tcp set global rss=enabled
    3.4.1.3 Improving Live Migration
    In order to improve live migration over SMB direct performance, please set the following registry key to 0 and reboot the machine:
    HKEY_LOCAL_MACHINE\System\CurrentControlSe
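The three "General Tuning" registry entries above can be turned into reg add command lines for review before anything is applied. The Python sketch below only builds and prints the command strings; it never touches the registry. The names TUNING and reg_add_command are hypothetical, HKLM is the standard short form of HKEY_LOCAL_MACHINE, and reading "64K" as 65536 is an assumption.

```python
# (key path, value name, DWORD data) for each entry in the tuning list.
# 64 * 1024 is an assumed reading of "64K" from the text.
TUNING = [
    (r"HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters", "SackOpts", 0),
    (r"HKLM\SYSTEM\CurrentControlSet\Services\AFD\Parameters",
     "FastSendDatagramThreshold", 64 * 1024),
    (r"HKLM\SYSTEM\CurrentControlSet\Services\Ndis\Parameters", "RssBaseCpu", 1),
]


def reg_add_command(key, name, value):
    """Build a 'reg add' command line that sets a REG_DWORD value."""
    return 'reg add "%s" /v %s /t REG_DWORD /d %d /f' % (key, name, value)


if __name__ == "__main__":
    for key, name, value in TUNING:
        print(reg_add_command(key, name, value))
```

Printing the commands first makes it easy to back up the registry and inspect each change, in line with the warning at the start of this section.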
54. nOF-2 driver Rev 1.10 includes the following package:
    - MLNX_WinOF2-1_10_All_x64.exe
    1.2 WinOF-2 Set of Documentation
    Under <installation_directory>\Documentation:
    - License file
    - User Manual (this document)
    - MLNX_WinOF-2 Release Notes
    Footnote 1: WinOF-2 does not support earlier Mellanox adapters. For earlier adapters, the Windows driver is MLNX_WinOF.
    1.3 Windows MPI (MS-MPI)
    Message Passing Interface (MPI) is meant to provide virtual topology, synchronization, and communication functionality between a set of processes. MPI enables running one process on several hosts.
    Windows MPI runs over the following protocols:
    - Sockets (Ethernet)
    - Network Direct (ND)
    For further details on MPI, please refer to Appendix B, "Windows MPI (MS-MPI)".
    2 Installation
    2.1 Hardware and Software Requirements
    Table 5 - Hardware and Software Requirements
    Description (a): Package
    - Windows Server 2012 R2 (64 bit only): MLNX_WinOF2-1_10_All_x64.exe
    - Windows Server 2012 (64 bit only): MLNX_WinOF2-1_10_All_x64.exe
    a. The Operating System listed above must run with administrator privileges.
    2.2 Installing Mellanox WinOF-2 Driver
    WinOF-2 supports adapter cards based on the Mellanox ConnectX-4 family of adapter IC devices only. If you have ConnectX-3 and ConnectX-3 Pro on your server, you will need to install WinOF drive
55. ompt or warning to the user. Use the /norestart or /forcerestart standard command-line options to control reboots.
    To uninstall MLNX_WinOF in unattended mode:
    Step 1. Open a CMD console. [Windows Server 2012 R2] Click Start > Task Manager > File > Run new task, and enter CMD.
    Step 2. Uninstall the driver. Run:
    > MLNX_WinOF2-1_10_All_win2012_x64.exe /S /x /v"/qn"
    2.6 Firmware Upgrade
    If the machine has a standard Mellanox card with an older firmware version, the firmware will be automatically updated as part of the WinOF-2 package installation.
    For information on how to upgrade firmware manually, please refer to the MFT User Manual: www.mellanox.com > Products > InfiniBand/VPI Drivers > Firmware Tools
    The adapter card may not have been shipped with the latest firmware version. The section below describes how to update firmware.
    3 Features Overview and Configuration
    Once you have installed the Mellanox WinOF-2 package, you can perform various modifications to your driver to make it suitable for your system's needs.
    Changes made to the Windows registry happen immediately, and no backup is automatically made. Do not edit the Windows registry unless you are confident regarding the changes.
    3.1 Ethernet Network
    3.1.1 Assigning Port IP After Installation
    By default, your machine is configured to obtain an automatic IP address via a DHCP server. In som
56. on because the system polls the received rings for incoming packets; however, it may increase the network bandwidth since the incoming packet is handled faster. Adaptive combines the interrupt and polling methods dynamically, depending on traffic type and network usage.
    The valid values are:
    - 0 = polling
    - 1 = adaptive
    Value Name: InterruptModeration
    Default Value: 1
    Description: Sets the rate at which the controller moderates or delays the generation of interrupts, making it possible to optimize network throughput and CPU utilization. When disabled, the interrupt moderation of the system generates an interrupt when the packet is received. In this mode, the CPU utilization is increased at higher data rates, because the system must handle a larger number of interrupts. However, the latency is decreased, since that packet is processed more quickly. When interrupt moderation is enabled, the system accumulates interrupts and sends a single interrupt rather than a series of interrupts. An interrupt is generated after receiving 5 packets or after the passing of 10 microseconds from receiving the first packet.
    The valid values are:
    - 0 = disable
    - 1 = enable
    Value Name: RxIntModeration
    Default Value: 2
    Description: Sets the rate at which the controller moderates or delays the generation of interrupts, making it possible to optimize network throughput and CPU utilization. The default setting (Adaptive) adjusts the interrupt rates dynamically, depending on traffic type and network usage. Choosing a d
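The InterruptModeration rule above (one interrupt after 5 packets, or 10 microseconds after the first pending packet) can be illustrated with a small simulation. This Python sketch is a simplified model under stated assumptions, not the driver's implementation: the function name coalesce_interrupts is hypothetical, timestamps are in microseconds, and a packet arriving exactly when the timer expires is treated as arriving after the timer fired.

```python
def coalesce_interrupts(arrival_times_us, max_packets=5, max_wait_us=10):
    """Return the times (in microseconds) at which interrupts would fire."""
    interrupts = []
    pending = 0           # packets received since the last interrupt
    first_arrival = None  # arrival time of the first pending packet
    for t in sorted(arrival_times_us):
        # The timer started by the first pending packet expired before
        # (or exactly when) this packet arrived: fire on the timer.
        if pending and t - first_arrival >= max_wait_us:
            interrupts.append(first_arrival + max_wait_us)
            pending = 0
        if pending == 0:
            first_arrival = t
        pending += 1
        # Packet-count threshold reached: fire immediately.
        if pending == max_packets:
            interrupts.append(t)
            pending = 0
    # Any leftover packets are flushed when their timer expires.
    if pending:
        interrupts.append(first_arrival + max_wait_us)
    return interrupts


if __name__ == "__main__":
    print(coalesce_interrupts([0, 1, 2, 3, 4]))  # burst: count threshold
    print(coalesce_interrupts([0, 3, 50]))       # sparse: timer threshold
```

A dense burst triggers a single interrupt on the fifth packet, while sparse traffic falls back to the timer, which is the trade-off between CPU utilization and latency that the description refers to.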
57. ormal operation.
    - Mellanox ConnectX Ethernet Adapter <X> device detected that the link connected to port <Y> is down. This can occur if the physical link is disconnected or damaged, or if the other end-port is down.
    - Mismatch in the configurations between the two ports may affect the performance. When using MSI-X, both ports should use the same RSS mode. To fix the problem, configure the RSS mode of both ports to be the same in the driver GUI.
    - Mellanox ConnectX Ethernet Adapter <X> device failed to create enough MSI-X vectors. The network interface will not use MSI-X interrupts. This may affect the performance. To fix the problem, configure the number of MSI-X vectors in the registry to be at least <Y>.
    Appendix A: Performance Tools
    A.1 nd_write_bw
    This test is used for performance measuring of RDMA Write requests in Microsoft Windows Operating Systems. nd_write_bw is performance oriented for RDMA Write with maximum throughput, and runs over Microsoft's NetworkDirect standard. The level of customizing for the user is relatively high. The user may choose to run with a customized message size, a customized number of iterations, or alternatively a customized test duration time. nd_write_bw runs with all message sizes from 1B to 4MB (powers of 2), message inlining, CQ moderation.
    nd_write_bw Synopsis
    <running on specific single core>
    Server side: start /b /affinity 0X1 nd_write_bw -s1048576
58. ources on a Microsoft network.
    Step 3. Select Internet Protocol Version 4 (TCP/IPv4) from the scroll list and click Properties.
    Step 4. Select the "Use the following IP address:" radio button and enter the desired IP information.
    [Figure: Internet Protocol Version 4 (TCP/IPv4) Properties dialog, General tab, with fields for IP address, Subnet mask, Default gateway, and Preferred/Alternate DNS server]
    Step 5. Click OK.
    Step 6. Close the Local Area Connection dialog.
    Step 7. Verify the IP configuration by running 'ipconfig' from a CMD console.
    > ipconfig
    Ethernet adapter Local Area Connection 4:
    Connection-specific DNS Suffix, IPv4 Address, Subnet Mask, Default Gateway
    3.1.2 RDMA over Converged Ethernet (RoCE)
    Remote Direct Memory Access (RDMA) is the remote memory management capability that allows server-to-server data movement directly between application memory without any CPU involvement. RDMA over Converged Et
59. Enabling offloading services increases transmission performance. This is because offload tasks (such as checksum calculations) are performed by the adapter hardware rather than by the operating system, and therefore with lower latency. In addition, CPU resources become more available for other tasks.
    Value Name (Default Value): Description
    - LsoV1IPv4 (1): Large Send Offload Version 1 (IPv4). The valid values are: 0 = disable, 1 = enable.
    - LsoV2IPv4 (1): Large Send Offload Version 2 (IPv4). The valid values are: 0 = disable, 1 = enable.
    - LsoV2IPv6 (1): Large Send Offload Version 2 (IPv6). The valid values are: 0 = disable, 1 = enable.
    - LSOSize (64000): The maximum number of bytes that the TCP/IP stack can pass to an adapter in a single packet. This value affects the memory consumption and the NIC performance. The valid values are MTU+1024 up to 64000. Note: This registry key is not exposed to the user via the UI. If LSOSize is smaller than MTU+1024, LSO will be disabled.
    - LSOMinSegment (2): The minimum number of segments that a large TCP packet must be divisible by before the transport can offload it to a NIC for segmentation. The valid values are 2 up to 32. Note: This registry key is not exposed to the user via the UI.
    - LSOTcpOptions (1): Enables the miniport driver to segment a large TCP packet whose TCP header contains TCP options. The valid values are: 0 = disable, 1 = enable.
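The LSOSize constraint above can be expressed as a simple range check. The Python sketch below is an illustration under an assumption: the extracted text reads "MTU 1024", and it is interpreted here as MTU + 1024 (the operator appears to have been lost in extraction). The names LSO_SIZE_MAX, is_valid_lso_size, and lso_stays_enabled are hypothetical.

```python
LSO_SIZE_MAX = 64000  # upper bound of the valid range stated for LSOSize


def is_valid_lso_size(lso_size, mtu):
    """True if lso_size lies in the assumed valid range [mtu + 1024, 64000]."""
    return mtu + 1024 <= lso_size <= LSO_SIZE_MAX


def lso_stays_enabled(lso_size, mtu):
    """LSO is described as disabled when LSOSize is below mtu + 1024 (assumed)."""
    return lso_size >= mtu + 1024


if __name__ == "__main__":
    for size in (2000, 2524, 64000, 65000):
        print(size, is_valid_lso_size(size, mtu=1500),
              lso_stays_enabled(size, mtu=1500))
```

With a 1500-byte MTU, for example, the smallest value that keeps LSO enabled under this reading is 2524.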
60. priority-flow-control mode on
    (config-if-Et20)# priority-flow-control priority 3 no-drop
    3.1.2.4.1 Using Global Pause Flow Control
    To enable Global Pause on ports that face the hosts, perform the following:
    (config)# interface et10
    (config-if-Et10)# flowcontrol receive on
    (config-if-Et10)# flowcontrol send on
    3.1.2.4.2 Using Priority Flow Control (PFC)
    To enable PFC on ports that face the hosts, perform the following:
    (config)# interface et10
    (config-if-Et10)# dcbx mode ieee
    (config-if-Et10)# priority-flow-control mode on
    (config-if-Et10)# priority-flow-control priority 3 no-drop
    3.1.2.5 Configuring Router (PFC only)
    The router uses the L3 DSCP value to mark the egress traffic with the L2 PCP. The required mapping maps the three most significant bits of the DSCP into the PCP. This is the default behavior, and no additional configuration is required.
    3.1.2.5.1 Copying Priority Code Point (PCP) between Subnets
    The captured PCP option from the Ethernet header of the incoming packet can be used to set the PCP bits on the outgoing Ethernet header.
    3.1.2.6 Configuring the RoCE Mode
    Configuring the RoCE mode requires the following:
    - RoCE mode is configured per driver and is enforced on all the devices in the system.
    support the needed mode, the fallback mode would be the maximum supported RoCE
    The supported RoCE modes depend on the firmware installed. If the f
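The router mapping described in section 3.1.2.5, where the three most significant bits of the DSCP become the PCP, is a single bit shift. The Python sketch below is illustrative; the function name dscp_to_pcp is hypothetical, and it relies on DSCP being a 6-bit field and PCP a 3-bit field.

```python
def dscp_to_pcp(dscp):
    """Map a 6-bit DSCP value to a 3-bit PCP using its three top bits."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp >> 3  # drop the three least significant bits


if __name__ == "__main__":
    for dscp in (0, 26, 46, 63):
        print("DSCP", dscp, "-> PCP", dscp_to_pcp(dscp))
```

Under this mapping, DSCP values 24 through 31 all land on PCP 3, the priority used for lossless traffic in the switch examples above.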
61. r.
    - For details on how to install the WinOF driver, please refer to the WinOF User Manual.
    This section provides instructions for two types of installation procedures:
    - "Attended Installation": An installation procedure that requires frequent user intervention.
    - "Unattended Installation": An automated installation procedure that requires no user intervention.
    Both Attended and Unattended installations require administrator privileges.
    2.2.1 Attended Installation
    The following is an example of an installation session.
    Step 1. Double click the .exe and follow the GUI instructions to install MLNX_WinOF2.
    Step 2. [Optional] Manually configure your setup to contain the logs option:
    > MLNX_WinOF2-1_10_All_x64.exe /v"/l*vx [LogFile]"
    Step 3. Click Next in the Welcome screen.
    [Figure: InstallShield Wizard for MLNX_WinOF2, Welcome screen and license agreement]
62. r, assuming its IP address is 192.168.1.2:
    PS $ Set-DnsClientServerAddress -InterfaceAlias "Ethernet 4" -ServerAddresses 192.168.1.2
    After establishing the priorities of ND/NDK traffic, the priorities must have PFC enabled on them.
    Step 9. Disable Priority Flow Control (PFC) for all other priorities except for 3.
    PS $ Disable-NetQosFlowControl 0,1,2,4,5,6,7
    Step 10. Enable QoS on the relevant interface.
    PS $ Enable-NetAdapterQos -InterfaceAlias "Ethernet 4"
    Step 11. Enable PFC on priority 3.
    PS $ Enable-NetQosFlowControl -Priority 3
    To add the script to the local machine startup scripts:
    Step 1. From the PowerShell, invoke gpedit.msc.
    Step 2. In the pop-up window, under the 'Computer Configuration' section, perform the following:
    1. Select Windows Settings
    2. Select Scripts (Startup/Shutdown)
    3. Double click Startup to open the Startup Properties
    4. Move to the "PowerShell Scripts" tab
    [Figure: Local Group Policy Editor, Startup Properties dialog]
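Steps 9 and 11 above keep PFC on one priority and turn it off on the remaining seven. The Python sketch below derives both PowerShell command lines from the chosen lossless priority; it only builds strings and does not run them. The function name pfc_commands is hypothetical, and it assumes the eight 802.1p priorities 0 through 7.

```python
def pfc_commands(lossless_priority=3):
    """Build the PowerShell lines that leave PFC enabled on one priority only."""
    if not 0 <= lossless_priority <= 7:
        raise ValueError("priority must be in the range 0..7")
    # Every 802.1p priority except the lossless one gets PFC disabled.
    others = [p for p in range(8) if p != lossless_priority]
    return [
        "Disable-NetQosFlowControl " + ",".join(str(p) for p in others),
        "Enable-NetQosFlowControl -Priority %d" % lossless_priority,
    ]


if __name__ == "__main__":
    for line in pfc_commands(3):
        print(line)
```

Generating the two lines from a single parameter keeps the disable list and the enable command consistent if a different lossless priority is chosen for the cluster.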
63. r options are not set to "Maximum Performance".
    5.4 Reported Driver Events
    The driver records events in the system log of the Windows server event system, which can be used to identify, diagnose, and predict sources of system problems.
    To see the log of events, open System Event Viewer as follows:
    - Right click on My Computer, click Manage, and then click Event Viewer.
    OR
    1. Click Start > Run and enter "eventvwr.exe".
    2. In Event Viewer, select the system log.
    The following events are recorded:
    - Mellanox ConnectX Ethernet Adapter <X> has been successfully initialized and enabled.
    - Failed to initialize Mellanox ConnectX Ethernet Adapter.
    - Mellanox ConnectX Ethernet Adapter <X> has been successfully initialized and enabled. The port's network address is <MAC Address>.
    - The Mellanox ConnectX Ethernet was reset.
    - Failed to reset the Mellanox ConnectX Ethernet NIC. Try disabling then re-enabling the "Mellanox Ethernet Bus Driver" device via the Windows device manager.
    - Mellanox ConnectX Ethernet Adapter <X> has been successfully stopped.
    - Failed to initialize the Mellanox ConnectX Ethernet Adapter <X> because it uses old firmware version (<old firmware version>). You need to burn firmware version <new firmware version> or higher, and to restart your computer.
    - Mellanox ConnectX Ethernet Adapter <X> device detected that the link connected to port <Y> is up, and has initiated n
64. r sending purposes. Each buffer is in LSO size (if LSO is enabled) or in MTU size otherwise. The valid values are 1 up to 64. Note: This registry value is not exposed via the UI.
    Value Name (Default Value): Description
    - ThreadPoll (10000): The number of cycles that should be passed without receiving any packet before the polling mechanism stops, when using the polling completion method for receiving. Afterwards, receiving new packets will generate an interrupt that reschedules the polling mechanism. The valid values are 0 up to 200000. Note: This registry value is not exposed via the UI.
    - AverageFactor (16): The weight of the last polling in the decision whether to continue the polling or give up, when using the polling completion method for receiving. The valid values are 0 up to 256. Note: This registry value is not exposed via the UI.
    - AveragePollThresh (10): The average threshold polling number, when using the old polling completion method for receiving. If the average number is higher than this value, the adapter continues to poll. The valid values are 0 up to 1000. Note: This registry value is not exposed via the UI.
    - ThisPollThreshold (100): The threshold number of the last polling cycle, when using the polling completion method for receiving. If the number of packets received in the last polling cycle is higher than this value, the adapter continues to poll. The valid values a
65. ration count has not been reached (default 10).
    Rx Interrupt Moderation Type
    Sets the rate at which the controller moderates or delays the generation of interrupts, making it possible to optimize network throughput and CPU utilization. The default setting (Adaptive) adjusts the interrupt rates dynamically, depending on the traffic type and network usage. Choosing a different setting may improve network and system performance in certain configurations.
    Send completion method
    Sets the completion methods of the Send packets, and it may affect network throughput and CPU utilization.
    Interrupt Moderation TX Packet Count
    Number of packets that need to be sent before an interrupt is generated on the send side (default 0).
    Interrupt Moderation TX Packet Time
    Maximum elapsed time (in usec) between the sending of a packet and the generation of an interrupt, even if the moderation count has not been reached (default 0).
    Offload Options
    Allows you to specify which TCP/IP offload settings are handled by the adapter rather than the operating system. Enabling offloading services increases transmission performance, as the offload tasks are performed by the adapter hardware rather than the operating system, thus freeing CPU resources to work on other tasks.
    IPv4 Checksums Offload
    Enables the adapter to compute the IPv4 checksum upon transmit and/or receive instead of the CPU (default Enabled).
66. re 0 up to 1000. Note: This registry value is not exposed via the UI.
    Value Name (Default Value): Description
    - VlanId (0): Enables packets with VlanId. It is used when no team intermediate driver is used. The valid values are: 0 = disable (no Vlan Id is passed); 1-4095 = valid Vlan Id that will be passed. Note: This registry value is only valid for Ethernet.
    - NumRSSQueues (8): The maximum number of the RSS queues that the device should use. Note: This registry key is only in Windows Server 2012 and above.
    - BlueFlame (1): The latency-critical Send WQEs to the device. When BlueFlame is used, the WQEs are written directly to the PCI BAR of the device (in addition to memory), so that the device may handle them without having to access memory, thus shortening the execution latency. For best performance, it is recommended to use BlueFlame when the HCA is lightly loaded. For high-bandwidth scenarios, it is recommended to use regular posting (without BlueFlame). The valid values are: 0 = disable, 1 = enable. Note: This registry value is not exposed via the UI.
    - MaxRSSProcessors (8): The maximum number of RSS processors. Note: This registry key is only in Windows Server 2012 and above.
    3.3.5 Ethernet Registry Keys
    The following section describes the registry keys that are only relevant to the Ethernet driver.
    Value Name (Default Value): Descr
67. rityValue8021Action 3
    New-NetQosPolicy "DEFAULT" -Default -PriorityValue8021Action 3
    New-NetQosPolicy "TCP" -IPProtocolMatchCondition TCP -PriorityValue8021Action 1
    New-NetQosPolicy "UDP" -IPProtocolMatchCondition UDP -PriorityValue8021Action 1
    Enable PFC on priority 3:
    Enable-NetQosFlowControl 3
    Disable Priority Flow Control (PFC) for all other priorities except for 3:
    Disable-NetQosFlowControl 0,1,2,4,5,6,7
    Enable QoS on the relevant interface:
    Enable-NetAdapterQos -Name
    B.5.2 Running MPI Command Examples
    Running MPI pallas test over ND:
    > mpiexec.exe -p 19020 -hosts 4 11.11.146.101 11.21.147.101 ... 11.11.145.101 -env MPICH_NETMASK 11.0.0.0/255.0.0.0 -env MPICH_ND_ZCOPY_THRESHOLD 1 -env MPICH_DISABLE_ND 0 -env MPICH_DISABLE_SOCK 1 -affinity c:\test1.exe
    Running MPI pallas test over ETH:
    > mpiexec.exe -p 19020 -hosts 4 11.11.146.101 11.21.147.101 ... 11.11.145.101 -env MPICH_NETMASK 11.0.0.0/255.0.0.0 -env MPICH_ND_ZCOPY_THRESHOLD 1 -env MPICH_DISABLE_ND 1 -env MPICH_DISABLE_SOCK 0 -affinity c:\test1.exe
68. rmance micro-benchmark. They support both InfiniBand and RoCE.
    For further information on the following tools, please refer to the help text of the tool by running it with the help command-line parameter.
    Table 9 - Fabric Performance Utilities
    Utility: Description
    - nd_write_bw: This test is used for performance measuring of RDMA Write requests in Microsoft Windows Operating Systems. nd_write_bw is performance oriented for RDMA Write with maximum throughput, and runs over Microsoft's NetworkDirect standard. The level of customizing for the user is relatively high. The user may choose to run with a customized message size, a customized number of iterations, or alternatively a customized test duration time. nd_write_bw runs with all message sizes from 1B to 4MB (powers of 2), message inlining, CQ moderation.
    - nd_write_lat: This test is used for performance measuring of RDMA Write requests in Microsoft Windows Operating Systems. nd_write_lat is performance oriented for RDMA Write with minimum latency, and runs over Microsoft's NetworkDirect standard. The level of customizing for the user is relatively high. The user may choose to run with a customized message size, a customized number of iterations, or alternatively a customized test duration time. nd_write_lat runs with all message sizes from 1B to 4MB (powers of 2), message inlining, CQ moderation.
    - nd_read_bw: This test is used for performance measuring of RDMA Read requests in Microsof
69. ssless in the switches.
    To force MSMPI to work over ND and not over sockets, add the following to the mpiexec command:
    -env MPICH_DISABLE_ND 0 -env MPICH_DISABLE_SOCK 1
    B.5 Configuring MPI
    Step 1. Configure all the hosts in the cluster with identical PFC (see the PFC example below).
    Step 2. Run the WHCK ND based traffic tests to check PFC (ndrping, ndping, ndrpingpong, ndpingpong).
    Step 3. Validate PFC counters during the run-time of ND tests with the "Mellanox Adapter QoS Counters" in perfmon.
    Step 4. Install the same version of HPC Pack in the entire cluster. NOTE: A version mismatch in HPC Pack 2012 can cause MPI to hang.
    Step 5. Validate the MPI base infrastructure with simple commands, such as "hostname".
    B.5.1 PFC Example
    In the example below, ND and NDK go to priority 3, which is configured as no-drop in the switches. All TCP/UDP traffic is directed to priority 1.
    Install dcbx:
    Install-WindowsFeature Data-Center-Bridging
    Remove all previous settings:
    Remove-NetQosTrafficClass
    Remove-NetQosPolicy -Confirm:$False
    Set the DCBX Willing parameter to false, as Mellanox drivers do not support this feature:
    Set-NetQosDcbxSetting -Willing 0
    Create a Quality of Service (QoS) policy and tag each type of traffic with the relevant priority. In this example we used TCP/UDP priority 1 and ND/NDK priority 3.
    New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -Prio
70. st parameter>
    A.6 nd_send_lat
    This test is used for performance measuring of Send requests in Microsoft Windows Operating Systems. nd_send_lat is performance oriented for Send with minimum latency, and runs over Microsoft's NetworkDirect standard. The level of customizing for the user is relatively high. The user may choose to run with a customized message size, a customized number of iterations, or alternatively a customized test duration time. nd_send_lat runs with all message sizes from 1B to 4MB (powers of 2), message inlining, CQ moderation.
    nd_send_lat Synopsis
    <running on specific single core>
    Server side: start /b /affinity 0X1 nd_send_lat -s1048576 -D10 -S <server interface IP>
    Client side: start /b /wait /affinity 0X1 nd_send_lat -s1048576 -D10 -C <server interface IP>
    nd_send_lat Options
    The table below lists the various flags of the command.
    Table 21 - nd_send_lat Options
    Flag: Description
    -h: Shows the Help screen.
    -v: Shows the version number.
    -p: Connects to the port <port> <default 6830>.
    -s <msg size>: Exchanges the message size with <default 65536B>; it must not be combined with the -a flag.
    -a: Runs all the message sizes from 1B to 8MB; it must not be combined with the -s flag.
    -n <num of iterations>: The number of exchanges (at least 2, the default is 100000).
    -I <max inline size>: The maximum size of mess
71. t Windows Operating Systems. nd_read_bw is performance oriented for RDMA Read with maximum throughput, and runs over Microsoft's NetworkDirect standard. The level of customizing for the user is relatively high. The user may choose to run with a customized message size, a customized number of iterations, or alternatively a customized test duration time. nd_read_bw runs with all message sizes from 1B to 4MB (powers of 2), message inlining, CQ moderation.
    Utility: Description
    - nd_read_lat: This test is used for performance measuring of RDMA Read requests in Microsoft Windows Operating Systems. nd_read_lat is performance oriented for RDMA Read with minimum latency, and runs over Microsoft's NetworkDirect standard. The level of customizing for the user is relatively high. The user may choose to run with a customized message size, a customized number of iterations, or alternatively a customized test duration time. nd_read_lat runs with all message sizes from 1B to 4MB (powers of 2), message inlining, CQ moderation.
    - nd_send_bw: This test is used for performance measuring of Send requests in Microsoft Windows Operating Systems. nd_send_bw is performance oriented for Send with maximum throughput, and runs over Microsoft's NetworkDirect standard. The level of customizing for the user is relatively high. The user may choose to run with a customized message size, a customized number of iterations, or alternati
72. t least 2, the default is 100000).
    -I <max inline size>: The maximum size of message to send inline. The default number is 128B.
    -D <test duration in seconds>: Test duration in seconds.
    -f <margin time in seconds>: The margin time to avoid calculation; it must be less than half of the duration time.
    -Q: CQ Moderation <value>. The default number is 100.
    -S <server interface IP>: <server side only, must be last parameter>
    -C <server interface IP>: <client side only, must be last parameter>
    -h: Shows the Help screen.
    A.4 nd_read_lat
    This test is used for performance measuring of RDMA Read requests in Microsoft Windows Operating Systems. nd_read_lat is performance oriented for RDMA Read with minimum latency, and runs over Microsoft's NetworkDirect standard. The level of customizing for the user is relatively high. The user may choose to run with a customized message size, a customized number of iterations, or alternatively a customized test duration time. nd_read_lat runs with all message sizes from 1B to 4MB (powers of 2), message inlining, CQ moderation.
    nd_read_lat Synopsis
    <running on specific single core>
    Server side: start /b /affinity 0X1 nd_read_lat -s1048576 -D10 -S <server interface IP>
    Client side: start /b /wait /affinity 0X1 nd_read_lat -s1048576 -D10 -C <server interface IP>
    nd_read_lat Options
    The table below
73. t\Services\LanmanServer\Parameters\RequireSecuritySignature
    3.4.2 Application Specific Optimization and Tuning
    3.4.2.1 Ethernet Performance Tuning
    The user can configure the Ethernet adapter by setting some registry keys. The registry keys may affect Ethernet performance.
    To improve performance, activate the performance tuning tool as follows:
    Step 1. Start the "Device Manager" (open a command line window and enter: devmgmt.msc).
    Step 2. Open "Network Adapters".
    Step 3. Right click the relevant Ethernet adapter and select Properties.
    Step 4. Select the "Advanced" tab.
    Step 5. Modify performance parameters (properties) as desired.
    3.4.2.1.1 Performance Known Issues
    - On Intel I/OAT supported systems, it is highly recommended to install and enable the latest I/OAT driver (download from www.intel.com).
    - With I/OAT enabled, sending 256-byte messages or larger will activate I/OAT. This will cause a significant latency increase due to I/OAT algorithms. On the other hand, throughput will increase significantly when using I/OAT.
    3.4.3 Tunable Performance Parameters
    The following is a list of key parameters for performance tuning.
    Jumbo Packet
    The maximum available size of the transfer unit, also known as the Maximum Transmission Unit (MTU). The MTU of a network can have a substantial impact on performance. A 4K MTU size improves performance for short messages, since it allows the OS to
74. ters > Mellanox ConnectX-4 Ethernet Adapter > Properties > Advanced tab. [Screenshot: Device Manager with the Mellanox ConnectX-4 VPI Adapter Properties dialog open on the Advanced tab. Properties visible in the list include Large Send Offload V2 (IPv4), Large Send Offload V2 (IPv6), Maximum Number of RSS Processors, Maximum Number of RSS Queues, Network Address, NetworkDirect Functionality, Preferred NUMA Node, and Priority & VLAN Tag; the value shown is "Rx & Tx Enabled".] 3.1.2.3
75. the Device Manager: Step 1. Open the Device Manager. Step 2. Go to Network adapters. Step 3. Go to the properties of the Mellanox ConnectX-4 Ethernet Adapter card. Step 4. Go to the Advanced tab. Step 5. Choose the VLAN ID in the Property window. Step 6. Set its value in the Value window. [Screenshot: Device Manager with the adapter Properties dialog open on the Advanced tab.] 3.1.4 Configuring Quality of Service (QoS) 3.1.4.1 System Requirements
76. ties; 1: Report VMQ capabilities. Note: This registry value is not exposed via the UI. VMQVlanFiltering (default 1): Specifies whether the device enables or disables the ability to filter network packets by using the VLAN identifier in the media access control (MAC) header. The valid values are 0: disable; 1: enable. 3.3.6 Network Direct Interface: The Network Direct Interface (NDI) architecture provides application developers with a networking interface that enables zero-copy data transfers between applications, kernel-bypass I/O generation and completion processing, and one-sided data transfer operations. NDI is supported by Microsoft and is the recommended method for writing InfiniBand applications. NDI exposes the advanced capabilities of the Mellanox networking devices and allows applications to leverage the advances of InfiniBand. For further information, please refer to: http://msdn.microsoft.com/en-us/library/cc904397(v=vs.85).aspx 3.4 Performance Tuning and Counters: For further information on WinOF-2 performance, please refer to the Performance Tuning Guide for Mellanox Network Adapters. This section describes how to modify Windows registry parameters in order to improve performance. Please note that modifying the registry incorrectly might lead to serious problems, including the loss of data and system hang, and you may need to reinstall Windows. As such it is reco
77. [Screenshot residue: Device Manager tree and the adapter's Advanced tab property list, including NetworkDirect Functionality, Preferred NUMA Node, and Priority & VLAN Tag.] To install the Data Center Bridging using the Server Manager: Step 1. Open the Server Manager. Step 2. Select Add Roles and Features. Step 3. Click Next. Step 4. Select Features on the left panel. Step 5. Check the Data Center Bridging checkbox. Step 6. Click Install. To install the Data Center Bridging using PowerShell: Step 1. Enable Data Center Bridging (DCB): PS $ Install-WindowsFeature Data-Center-Bridging. To configure QoS on the host: Note: The procedure below is not saved after you reboot your system. Hence, we recommend you create a script using the steps below and run it on the startup of the local machine. Please see the procedure below on how to add the script to the local machine startup scripts. Step 1. Change the Windows PowerShell execution policy: PS $ Set-ExecutionPolicy AllSigned. Step 2. Remove the entire previous QoS configuration: PS $ Remove-NetQosTrafficClass, then PS $ Remove-NetQosPolicy -Confirm:$False. Step 3. Set the DCBX Willing parameter to false, as Mellanox drivers do not support this feature: PS $ Set-NetQosDcbxSetting -Willing 0. Step 4
78. vely customized test duration time. nd_send_bw runs with all message sizes from 1B to 4MB (powers of 2), message inlining, and CQ moderation. nd_send_lat: This test is used for performance measuring of Send requests in Microsoft Windows Operating Systems. nd_send_lat is performance oriented for Send with minimum latency, and runs over Microsoft's NetworkDirect standard. The level of customizing for the user is relatively high. The user may choose to run with a customized message size, a customized number of iterations, or alternatively a customized test duration time. nd_send_lat runs with all message sizes from 1B to 4MB (powers of 2), message inlining, and CQ moderation. NTttcp: NTttcp is a Windows-based testing application that sends and receives TCP data between two or more endpoints. It is a Winsock-based port of the ttcp tool that measures networking performance in bytes/second. To download the latest version of NTttcp (5.28), please refer to the Microsoft website at the link below: http://gallery.technet.microsoft.com/NTttcp-Version-528-Now-f8b12769 NOTE: This tool should be run from cmd only. Note: The following InfiniBand performance tests are deprecated and might be removed in future releases. 5 Troubleshooting: You may be able to easily resolve the issues described in this section. If a problem persists and you are unable to resolve it, please contact your Mellanox r
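
The benchmark excerpts above say the nd_* tests run with all message sizes from 1B to 4MB in powers of 2, and that the -f margin must be less than half of the test duration. The sketch below works through that arithmetic. It is an illustration only, not part of the Mellanox tools: the function names are hypothetical, 4MB is assumed to mean 4 * 1024 * 1024 bytes, and the margin is assumed to be non-negative.

```python
def message_sizes(min_bytes=1, max_bytes=4 * 1024 * 1024):
    """Return every power-of-two size from min_bytes to max_bytes inclusive.

    Assumption: "4MB" is read as 4 * 1024 * 1024 bytes.
    """
    sizes = []
    size = min_bytes
    while size <= max_bytes:
        sizes.append(size)
        size *= 2
    return sizes


def margin_is_valid(duration_s, margin_s):
    """Check the stated rule: the margin is less than half the duration.

    Assumption: a negative margin is treated as invalid.
    """
    return 0 <= margin_s < duration_s / 2


sizes = message_sizes()
print(len(sizes), sizes[0], sizes[-1])  # number of sizes, smallest, largest
print(1048576 in sizes)                 # the -s1048576 value from the synopsis
print(margin_is_valid(10, 2), margin_is_valid(10, 5))
```

Under these assumptions the sweep covers 23 sizes (2^0 through 2^22), and the 1048576-byte size used in the nd_read_lat synopsis is one of them. For a 10-second run (-D10), a 2-second margin passes the check while a 5-second margin does not.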
