
Utilizing NIC’s enhancements

Utilizing NIC’s enhancements. A look at how driver software needs to change when using newer features of our hardware. ‘theory’ versus ‘practice’.

glemley

Presentation Transcript


  1. Utilizing NIC’s enhancements A look at how driver software needs to change when using newer features of our hardware

  2. ‘theory’ versus ‘practice’ • The engineering designs one encounters in computer hardware components can be observed to undergo an ‘evolution’ during successive iterations, from a scheme that embodies simplicity, purity, and symmetry at the outset, based upon what designers think will be the device’s likely uses, to a conglomeration of disparate ‘add-ons’ as actual practices dictate accommodations

  3. ‘backward compatibility’ • An historically important consideration in the marketing of computer hardware has been the need to maintain past functions in a ‘transparent’ manner – i.e., no change is needed to run older software on newer equipment, while offering enhancements as ‘options’ that can be selectively enabled

  4. Example: Intel’s x86 • The current generation of Intel CPUs will still execute all of the software written for PCs a quarter-century ago (based on a small set of 16-bit registers, a restricted set of instructions, and a one-megabyte memory-space) but is able, as an option, to use more and larger registers (64 bits), richer instruction-sets, and more memory

  5. Gigabit NICs • Intel’s network controller designs exhibit this same kind of ‘evolution’ over time • The ‘Legacy’ descriptor-formats are just one example of keeping prior-generation functionality: it’s simple, it’s ‘pure’ (i.e., not tied to any specific network-protocols, but emphasizing ‘mechanism’, not ‘policy’) • But now alternatives exist -- as options!

  6. ‘Legacy’ RX-Descriptors • Layout: Base-address (64 bits), followed by Packet-length, Packet-checksum, status, errors, and VLAN tag • The device-driver initializes the ‘base-address’ field with the physical address of a packet-buffer, and the network hardware never modifies it • The network controller later ‘writes back’ values into all the other fields when it has finished transferring a received packet’s data into that packet-buffer
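The layout described on this slide could be written as a C struct along the following lines. This is only a sketch: the type name `LEGACY_RX_DESC`, the helper `legacy_rx_desc_init()`, and the individual field widths (two 16-bit fields, two 8-bit fields, one 16-bit field) are assumptions chosen so that the descriptor occupies the 16 bytes that the driver code elsewhere in this deck relies on.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type name; field order follows the slide's diagram. */
typedef struct {
    uint64_t base_address;   /* written by the driver, never modified by the NIC */
    uint16_t packet_length;  /* this and the remaining fields are written back   */
    uint16_t packet_chksum;
    uint8_t  desc_status;
    uint8_t  desc_errors;
    uint16_t vlan_tag;
} LEGACY_RX_DESC;

/* The ring arithmetic in the driver assumes 16-byte descriptors. */
static_assert(sizeof(LEGACY_RX_DESC) == 16, "legacy descriptor must be 16 bytes");

/* Hypothetical helper: hand a descriptor to the NIC by storing the buffer's
 * physical address and clearing every write-back field. */
static void legacy_rx_desc_init(LEGACY_RX_DESC *d, uint64_t buf_phys)
{
    memset(d, 0, sizeof *d);
    d->base_address = buf_phys;
}
```

Because the NIC leaves `base_address` alone in this format, a driver can reuse a legacy descriptor simply by clearing its status byte.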

  7. RxDesc Status-field • Bit layout (bit 7 down to bit 0): PIF, IPCS, TCPCS, UDPCS, VP, IXSM, EOP, DD • DD = Descriptor Done (1=yes, 0=no): shows if the NIC is finished with the descriptor • EOP = End Of Packet (1=yes, 0=no): shows if this descriptor holds the end of the packet • IXSM = Ignore Checksum Indications (1=yes, 0=no) • VP = VLAN Packet match (1=yes, 0=no) • UDPCS = UDP Checksum calculated on packet (1=yes, 0=no) • TCPCS = TCP Checksum calculated on packet (1=yes, 0=no) • IPCS = IPv4 Checksum calculated on packet (1=yes, 0=no) • PIF = Passed In-exact Filter (1=yes, 0=no): shows if software must check the packet’s address itself
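As an illustration of how a driver might test these bits, here is a small sketch. The bit positions come from the slide; the constant names (`RXD_STAT_*`) and the two helper functions are hypothetical and are not taken from the course's driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as listed on the slide; the names are assumptions. */
enum {
    RXD_STAT_DD    = 1 << 0,  /* Descriptor Done              */
    RXD_STAT_EOP   = 1 << 1,  /* End Of Packet                */
    RXD_STAT_IXSM  = 1 << 2,  /* Ignore Checksum Indications  */
    RXD_STAT_VP    = 1 << 3,  /* VLAN Packet match            */
    RXD_STAT_UDPCS = 1 << 4,  /* UDP checksum calculated      */
    RXD_STAT_TCPCS = 1 << 5,  /* TCP checksum calculated      */
    RXD_STAT_IPCS  = 1 << 6,  /* IPv4 checksum calculated     */
    RXD_STAT_PIF   = 1 << 7   /* Passed In-exact Filter       */
};

static_assert(RXD_STAT_PIF == 0x80, "all flags must fit in one status byte");

/* True when the NIC has finished with the descriptor and it ends a packet. */
static bool rx_packet_complete(uint8_t status)
{
    const uint8_t need = RXD_STAT_DD | RXD_STAT_EOP;
    return (status & need) == need;
}

/* True when the TCP checksum indication is present and may be trusted,
 * i.e. TCPCS is set and IXSM does not tell software to ignore it. */
static bool rx_tcp_checksum_reported(uint8_t status)
{
    return (status & RXD_STAT_TCPCS) && !(status & RXD_STAT_IXSM);
}
```

A receive loop would typically poll `rx_packet_complete()` on the descriptor at its current ring index before touching the packet-buffer.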

  8. RxDesc Errors-field • Bit layout (bit 7 down to bit 0): RXE, IPE, TCPE, reserved (=0), reserved (=0), SEQ, SE, CE • CE = CRC Error or Alignment Error (check statistics registers to differentiate) • TCPE = TCP/UDP Checksum Error • IPE = IPv4 Checksum Error • These bits are relevant only while the NIC is operating in ‘SerDes’ mode: SE = Symbol Error, SEQ = Sequence Error, RXE = Rx Data Error

  9. ‘Extended’ RX-Descriptors • CPU writes this, NIC reads it: Base-address (64 bits), reserved (=0) • NIC writes this, CPU reads it: Packet-checksum, IP identification, MRQ (multiple receive queues) in place of the base-address; VLAN tag, Packet-length, Extended errors, Extended status in place of the reserved field • The device-driver initializes the ‘base-address’ field with the physical address of a packet-buffer, and it initializes the ‘reserved’ field with a zero-value; the network hardware will later modify both fields • The network controller will ‘write back’ the values for its fields when it has transferred a received packet’s data into the packet-buffer

  10. An alternative option • CPU writes this, NIC reads it: Base-address (64 bits), reserved (=0) • NIC writes this, CPU reads it: RSS Hash (Receive Side Scaling), MRQ (multiple receive queues) in place of the base-address; VLAN tag, Packet-length, Extended errors, Extended status in place of the reserved field • ‘Receive Side Scaling’ refers to an optional capability in the network controller to assist with routing of network packets to various CPUs within a modern multiprocessor system (See Section 3.2.13 in Intel’s Software Developer’s Manual)

  11. Extended Rx-Status (20 bits) • Bit layout: bits 19..16 = 0, bit 15 = ACK, bits 14..11 = 0, bit 10 = UDPV, bit 9 = IPIV, bit 8 = 0, bit 7 = PIF, bit 6 = IPCS, bit 5 = TCPCS, bit 4 = UDPCS, bit 3 = VP, bit 2 = IXSM, bit 1 = EOP, bit 0 = DD • These ‘extra’ status-bits provide additional hardware support to driver software for processing ethernet packets that conform to standard TCP/IP network protocols (with possibilities for future expansion): ACK = TCP ACK-Packet identification, UDPV = Valid UDP checksum, IPIV = Valid IP Identification • The low eight bits have the same meanings as in a ‘Legacy’ Rx-Status byte: DD = Descriptor Done, EOP = End Of Packet, IXSM = Ignore Checksum Indications, VP = VLAN Packet match, UDPCS = UDP Checksum calculated, TCPCS = TCP Checksum calculated, IPCS = IPv4 Checksum calculated, PIF = Passed In-exact Filter

  12. Extended Rx-Errors (12 bits) • Bit layout: bit 11 = RXE, bit 10 = IPE, bit 9 = TCPE, bits 8..7 = 0, bit 6 = SEQ, bit 5 = SE, bit 4 = CE, bits 3..0 = 0 • The upper eight bits (11..4) have the same meanings, and they occupy the same arrangement, as in the ‘Legacy’ Rx-Errors byte
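The 20-bit status and 12-bit errors fields share a single 32-bit word, with status in bits 0..19 and errors in bits 20..31. One compiler-independent way to take that word apart is with shifts and masks, as in the sketch below. The function and macro names are hypothetical; the bit positions are those given on the two slides above.

```c
#include <assert.h>
#include <stdint.h>

#define EXT_STATUS_BITS  20u
#define EXT_STATUS_MASK  ((1u << EXT_STATUS_BITS) - 1u)   /* 0x000FFFFF */

static_assert(EXT_STATUS_MASK == 0x000FFFFFu, "status occupies bits 0..19");

/* Low 20 bits: the extended status. */
static uint32_t ext_status(uint32_t word) { return word & EXT_STATUS_MASK; }

/* High 12 bits: the extended errors. */
static uint32_t ext_errors(uint32_t word) { return word >> EXT_STATUS_BITS; }

/* The low eight status bits mirror the 'Legacy' status byte. */
static uint8_t legacy_status(uint32_t word)
{
    return (uint8_t)(ext_status(word) & 0xFFu);
}

/* Error bits 11..4 mirror the 'Legacy' errors byte (RXE down to CE). */
static uint8_t legacy_errors(uint32_t word)
{
    return (uint8_t)((ext_errors(word) >> 4) & 0xFFu);
}
```

For example, a word with ACK, UDPV, EOP and DD set in the status part and RXE and CE set in the errors part is `0x81008403`; `legacy_status()` then yields `0x03` and `legacy_errors()` yields `0x81`.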

  13. Main device-driver changes • If we want to utilize the NIC’s ‘Extended’ Receive Descriptor format, we will need several significant changes in our driver source-code and data-types: • Our module’s initialization of ‘base_address’ fields • Our new need for programming register RFCTL • Our ‘typedef’ for the ‘RX_DESCRIPTOR’ structs • Our ‘get_info_rx()’ function for ‘/proc/nicrx’ display • Our interrupt-handler’s treatment of ‘rxring’ entries

  14. Use of C language ‘union’ • Each Receive-Descriptor now has a ‘dual’ identity, as far as the NIC is concerned: • one layout during its ‘fetch’ from memory • another layout during ‘write-back’ to memory • The C language provides a special ‘type’ construction for accommodating this kind of programming situation: it’s known as a union, and it requires a special syntax

  15. ‘Bitfields’ in C • Some of the fields in the ‘Extended’ RX Descriptor do not align with the CPU’s natural 8-bit, 16-bit and 32-bit data-sizes: ‘Extended status’ is 20 bits wide and ‘Extended errors’ is 12 bits wide • The C language provides ‘bitfields’ for a situation like this (though how a compiler lays them out in memory is implementation-defined rather than fixed by the C standard)

  16. Syntax for Rx-Descriptors

      typedef struct {
          unsigned long long  base_address;
          unsigned long long  reserved;
      } RX_DESC_FETCH;

      typedef struct {
          unsigned int    mrq;
          unsigned short  ip_identification;
          unsigned short  packet_chksum;
          unsigned int    desc_status:20;
          unsigned int    desc_errors:12;
          unsigned short  packet_length;
          unsigned short  vlan_tag;
      } RX_DESC_STORE;

      typedef union {
          RX_DESC_FETCH  rxf;
          RX_DESC_STORE  rxs;
      } RX_DESCRIPTOR;
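To see the ‘dual identity’ in action without any hardware, the following user-space sketch repeats the three typedefs (so that it compiles on its own) and adds a hypothetical stand-in for the NIC's write-back, `fake_nic_writeback()`. It assumes a compiler such as gcc that packs the 20-bit and 12-bit bitfields into a single `unsigned int`, so that both layouts are 16 bytes.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    unsigned long long base_address;
    unsigned long long reserved;
} RX_DESC_FETCH;

typedef struct {
    unsigned int   mrq;
    unsigned short ip_identification;
    unsigned short packet_chksum;
    unsigned int   desc_status:20;
    unsigned int   desc_errors:12;
    unsigned short packet_length;
    unsigned short vlan_tag;
} RX_DESC_STORE;

typedef union {
    RX_DESC_FETCH rxf;   /* layout the NIC fetches    */
    RX_DESC_STORE rxs;   /* layout the NIC writes back */
} RX_DESCRIPTOR;

/* Both views must describe the same 16 bytes of memory. */
static_assert(sizeof(RX_DESC_FETCH) == 16, "fetch layout must be 16 bytes");
static_assert(sizeof(RX_DESC_STORE) == 16, "store layout must be 16 bytes");

/* Hypothetical stand-in for the hardware: overwrite the whole descriptor
 * with the STORE layout, reporting DD|EOP and the given packet length. */
static void fake_nic_writeback(RX_DESCRIPTOR *d, unsigned short length)
{
    memset(d, 0, sizeof *d);
    d->rxs.desc_status   = 0x3;     /* DD | EOP */
    d->rxs.packet_length = length;
}
```

After the write-back, reading `d.rxf.base_address` no longer returns the address the driver stored there, which is exactly why the driver must find its packet-buffers some other way.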

  17. RFCTL (0x5008) • The Receive Filter Control register • Bit layout: bits 31..16 = reserved (=0), bit 15 = EXTEN, bit 14 = IPFRSP_DIS, bit 13 = ACKD_DIS, bit 12 = ACKDIS, bit 11 = IPv6XSUM_DIS, bit 10 = IPv6_DIS, bits 9..8 = NFS_VER, bit 7 = NFSR_DIS, bit 6 = NFSW_DIS, bits 5..1 = iSCSI_DWC, bit 0 = iSCSI_DIS • EXTEN (bit 15) = Extended Status Enable (1=yes, 0=no): this enables the NIC to write back the ‘Extended Status’
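Turning on the extended write-back then comes down to setting bit 15 in this register. The sketch below only computes the new register value; the macro name `RFCTL_EXTEN` and the helper `rfctl_with_extended_status()` are hypothetical, and only the register offset and the bit position are taken from the slide.

```c
#include <assert.h>
#include <stdint.h>

#define E1000_RFCTL  0x5008u        /* register offset, from the slide   */
#define RFCTL_EXTEN  (1u << 15)     /* hypothetical name for bit 15      */

static_assert(RFCTL_EXTEN == 0x8000u, "EXTEN is bit 15");

/* Return the value to write to RFCTL so that extended status is enabled,
 * preserving the other filter-control bits and keeping the reserved
 * upper half (bits 31..16) at zero. */
static uint32_t rfctl_with_extended_status(uint32_t rfctl)
{
    return (rfctl & 0x0000FFFFu) | RFCTL_EXTEN;
}
```

Inside the driver, with `io` being the mapped register base used elsewhere in this deck, this would presumably be applied as `iowrite32( rfctl_with_extended_status( ioread32( io + E1000_RFCTL ) ), io + E1000_RFCTL );` before the receiver is enabled.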

  18. Modifying ‘my_read()’ • To implement use of ‘Extended’ Receive Descriptors in our most recent character-mode device-driver (i.e., ‘zerocopy.c’), we need some changes in the ‘read()’ method • Most obvious example: a packet-buffer’s memory address can no longer be gotten from an Rx-Descriptor’s ‘base_address’ (which now gets ‘overwritten’ by the NIC)

  19. For our pseudo-file’s sake… • Also our driver’s ‘read()’ function shouldn’t prepare a current rx-descriptor for reuse, as it did in earlier drivers, since that would destroy all of the useful information which the NIC has just written into that descriptor • Instead, the preparation of a descriptor for reuse in a future packet-receive operation should be deferred, at least temporarily

  20. OK, but then when? • We can reassign the duty to ‘refresh’ some Rx-Descriptors for reuse to our driver’s Interrupt Service Routine; specifically, at the point in time when an ‘RXDMT0’ event is signaled (Rx-Descriptor Min-Threshold) • It might be best to create a ‘bottom half’ to take care of those re-initializations, but we haven’t yet done that in our new prototype

  21. Handling ‘RXDMT0’ interrupts

      irqreturn_t my_isr( int irq, void *dev_id )
      {
          int  intr_cause = ioread32( io + E1000_ICR );

          if ( intr_cause & (1<<4) )    // Rx-Descriptors Low
          {
              // packet-buffers begin just past the ring of 16-byte descriptors
              unsigned int  rx_buf = virt_to_phys( rxring ) + 16 * N_RX_DESC;
              unsigned int  rxtail = ioread32( io + E1000_RDT ), i, ba;

              // prepare the next eight Rx-Descriptors for ‘reuse’ by the NIC
              for (i = 0; i < 8; i++)
              {
                  ba = rx_buf + rxtail * RX_BUFSIZ;
                  rxring[ rxtail ].base_address = ba;
                  rxring[ rxtail ].reserved = 0LL;
                  rxtail = (1 + rxtail) % N_RX_DESC;
              }

              // now give the NIC ‘ownership’ of these reinitialized descriptors
              iowrite32( rxtail, io + E1000_RDT );
          }

          return IRQ_HANDLED;
      }
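The index arithmetic in that loop, including the wrap-around at the end of the ring, can be tried out in user space. The sketch below is a simplified model rather than the driver's code: the ring size, the buffer size, and the function name `refresh_rx_descriptors()` are assumptions, and the function merely returns the new tail value that a driver would write to the RDT register.

```c
#include <assert.h>
#include <stdint.h>

#define N_RX_DESC  8u      /* hypothetical ring size for this sketch   */
#define RX_BUFSIZ  2048u   /* hypothetical packet-buffer size in bytes */

typedef struct {
    uint64_t base_address;
    uint64_t reserved;
} RX_DESC_FETCH;

/* Re-arm 'count' descriptors starting at index 'tail'.  Each descriptor
 * gets back the physical address of its own buffer (buffer i starts at
 * rx_buf_phys + i * RX_BUFSIZ) and a zeroed 'reserved' field.
 * Returns the new tail index, wrapped to the ring size. */
static unsigned refresh_rx_descriptors(RX_DESC_FETCH ring[], unsigned tail,
                                       unsigned count, uint64_t rx_buf_phys)
{
    assert(tail < N_RX_DESC);
    for (unsigned i = 0; i < count; i++) {
        ring[tail].base_address = rx_buf_phys + (uint64_t)tail * RX_BUFSIZ;
        ring[tail].reserved = 0;
        tail = (tail + 1) % N_RX_DESC;
    }
    return tail;
}
```

Note that the buffer address is recomputed from the ring index each time. That is what makes the scheme work with ‘Extended’ descriptors, whose ‘base_address’ field has been overwritten by the NIC's write-back.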

  22. ‘extended.c’ • Here’s our revision of ‘zerocopy.c’, aimed at showing how we can incorporate use of the NIC’s ‘Extended’ Receive Descriptors • It appears to function exactly as before, until a user attempts to view the driver’s Receive-Descriptor queue: $ cat /proc/nicrx • Then we are shown descriptors having two distinct formats (i.e., FETCH and STORE)

  23. Demo: ‘bitfield.c’ • Because the manner in which ‘bitfields’ are handled in the C language varies with the particular C-compiler being used, we have created a short demo-program that shows us how our GNU C-compiler ‘gcc’ handles the layout of bitfields within a C data-item

      typedef struct {
          unsigned int  desc_status:20;    // bits 0..19
          unsigned int  desc_errors:12;    // bits 20..31
      } RXD_ELT;
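The full text of ‘bitfield.c’ is not reproduced in this transcript, so the following is only a guess at how such a check could be written. It copies an `RXD_ELT` into a plain 32-bit integer so that the position of each bitfield can be inspected; the helper name `rxd_elt_raw()` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned int desc_status:20;    /* bits 0..19 with gcc on x86  */
    unsigned int desc_errors:12;    /* bits 20..31 with gcc on x86 */
} RXD_ELT;

/* The whole point of the bitfields is to fit one 32-bit word. */
static_assert(sizeof(RXD_ELT) == sizeof(uint32_t), "RXD_ELT must be 32 bits");

/* Hypothetical helper: build an RXD_ELT from the two values and return
 * its raw 32-bit memory image, copied out with memcpy(). */
static uint32_t rxd_elt_raw(unsigned status, unsigned errors)
{
    RXD_ELT  elt;
    uint32_t raw;

    elt.desc_status = status;
    elt.desc_errors = errors;
    memcpy(&raw, &elt, sizeof raw);
    return raw;
}
```

Printing `rxd_elt_raw( 0xFFFFF, 0 )` with `printf( "%08X\n", ... )` shows which end of the word the compiler gave to ‘desc_status’: the layout noted in the slide's comments corresponds to a result whose low 20 bits are all ones.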
