
Network Device Drivers

Network Device Drivers. Dr A Sahu, Dept of Comp Sc & Engg, IIT Guwahati. Outline: PCI devices; NIC cards; specifics of the 82573 (Intel NIC); how transmit works; how receive works; network device driver. PCI Configuration Header: 16 doublewords.



  1. Network Device Drivers. Dr A Sahu, Dept of Comp Sc & Engg, IIT Guwahati

  2. Outline
     • PCI devices
     • NIC cards
     • Specifics of the 82573 (Intel NIC)
     • How transmit works
     • How receive works
     • Network device driver

  3. PCI Configuration Header (16 doublewords; fields listed from bit 31 down to bit 0)
       Dword   Contents
       0       Device ID | Vendor ID
       1       Status Register | Command Register
       2       Class Code (Class/SubClass/ProgIF) | Revision ID
       3       BIST | Header Type | Latency Timer | Cache Line Size
       4       Base Address 0
       5       Base Address 1
       6       Base Address 2
       7       Base Address 3
       8       Base Address 4
       9       Base Address 5
       10      CardBus CIS Pointer
       11      Subsystem Device ID | Subsystem Vendor ID
       12      Expansion ROM Base Address
       13      reserved | capabilities pointer
       14      reserved
       15      Maximum Latency | Minimum Grant | Interrupt Pin | Interrupt Line

  4. Three IA-32 address-spaces
     • Memory space (4GB): accessed using a large variety of processor instructions (mov, add, or, shr, push, etc.) and virtual-to-physical address-translation
     • I/O space (64KB): accessed only by using the processor’s special ‘in’ and ‘out’ instructions (without any translation of port-addresses)
     • PCI configuration space (16MB): i/o-ports 0x0CF8-0x0CFF are dedicated to accessing PCI Configuration Space

  5. Interface to PCI Configuration Space
     • PCI Configuration Space Address Port (32-bits): CONFADD (0x0CF8)
       Bits     Field
       31       EN = Enable Configuration Space Mapping (1=yes, 0=no)
       30-24    reserved
       23-16    bus (8-bits)
       15-11    device (5-bits)
       10-8     function (3-bits)
       7-2      doubleword (6-bits)
       1-0      00
     • PCI Configuration Space Data Port (32-bits): CONFDAT (0x0CFC)

  6. Reading PCI Configuration Data
     • Step one: Output the desired longword’s address (bus, device, function, and dword) with bit 31 set to 1 (to enable access) to the Configuration-Space Address-Port
     • Step two: Read the designated data from the Configuration-Space Data-Port
     • Already discussed: PCI probes (pciprobes.c)
     • Lect 29: showing vram, pciprobe.cpp
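Step one can be sketched in plain C. This is only an illustration of how the 32-bit value for the address port is packed from the bit fields on the previous slide; the helper name `pci_conf_addr` is hypothetical, and the actual port write and read (with the processor's 'out' and 'in' instructions) are left out because they need I/O privileges.

```c
#include <stdint.h>

/* Port numbers as given on the slides. */
#define CONFADD 0x0CF8u   /* Configuration-Space Address-Port */
#define CONFDAT 0x0CFCu   /* Configuration-Space Data-Port */

/*
 * Pack bus/device/function/dword into the value written to CONFADD.
 * Hypothetical helper: bit 31 enables access, bits 1-0 stay zero.
 */
static uint32_t pci_conf_addr(unsigned bus, unsigned dev,
                              unsigned fn, unsigned dword)
{
    return (1u << 31)
         | ((uint32_t)(bus   & 0xFFu) << 16)
         | ((uint32_t)(dev   & 0x1Fu) << 11)
         | ((uint32_t)(fn    & 0x07u) << 8)
         | ((uint32_t)(dword & 0x3Fu) << 2);
}
```

For example, dword 0 (Device ID and Vendor ID) of bus 0, device 0, function 0 would be selected by writing `pci_conf_addr(0, 0, 0, 0)` to CONFADD and then reading CONFDAT.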

  7. How ‘transmit’ works
     [Diagram: a list of buffer-descriptors (descriptor0 to descriptor3) in random access memory, pointing to Buffer0 to Buffer3]
     • We set up each data-packet that we want transmitted in a ‘Buffer’ area in RAM
     • We also create a list of buffer-descriptors and inform the NIC of its location and size
     • Then, when ready, we tell the NIC to ‘Go!’ (i.e., start transmitting), but to let us know when these transmissions are ‘Done’

  8. Registers’ Names
     • Memory-information registers
        • TDBA(L/H) = Transmit-Descriptor Base-Address Low/High (64-bits)
        • TDLEN = Transmit-Descriptor array Length
        • TDH = Transmit-Descriptor Head
        • TDT = Transmit-Descriptor Tail
     • Transmit-engine control registers
        • TXDCTL = Transmit-Descriptor Control Register
        • TCTL = Transmit Control Register
     • Notification timing registers
        • TIDV = Transmit Interrupt Delay Value
        • TADV = Transmit-interrupt Absolute Delay Value

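  9. Tx-Desc Ring-Buffer
     [Diagram: circular array of descriptors at offsets 0x00, 0x10, 0x20, ..., 0x80 from the base-address, shaded to show which are owned by hardware (nic) and which are owned by software (cpu)]
     • TDBA = base-address of the descriptor array
     • TDLEN = length of the array (in bytes)
     • TDH (head) and TDT (tail) index into the array
     • Circular buffer (128-bytes minimum)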
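The index arithmetic behind such a ring can be sketched in userspace C. This sketch assumes the usual convention that descriptors from the head up to (but not including) the tail are the ones handed to the NIC, and that one slot is kept unused so a full ring can be told apart from an empty one. The struct and function names are hypothetical, and no hardware register is touched.

```c
#include <stdint.h>

#define DESC_SIZE 16u            /* each descriptor is 16 bytes */

/* Software copy of the ring registers (hypothetical struct). */
struct tx_ring {
    uint32_t tdlen;              /* array length in bytes */
    uint32_t tdh;                /* head index, advanced by the NIC */
    uint32_t tdt;                /* tail index, advanced by the driver */
};

/* Number of descriptors in the array. */
static uint32_t ring_count(const struct tx_ring *r)
{
    return r->tdlen / DESC_SIZE;
}

/* Descriptors currently handed to the NIC: head up to tail, with wrap. */
static uint32_t ring_pending(const struct tx_ring *r)
{
    uint32_t n = ring_count(r);
    return (r->tdt + n - r->tdh) % n;
}

/* Hand one more descriptor to the NIC; returns -1 if the ring is full. */
static int ring_post(struct tx_ring *r)
{
    uint32_t next = (r->tdt + 1) % ring_count(r);
    if (next == r->tdh)
        return -1;               /* advancing would make tail catch head */
    r->tdt = next;
    return 0;
}
```

With the 128-byte minimum, `ring_count()` is 8, so at most 7 descriptors can be outstanding under this convention.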

  10. Tx-Descriptor Control (0x3828)
       Bits     Field
       31-25    0
       24       GRAN (Granularity)
       23-22    0
       21-16    WTHRESH (Writeback Threshold)
       15-14    0
       13-8     HTHRESH (Host Threshold)
       7-6      0
       5-0      PTHRESH (Prefetch Threshold)
     “This register controls the fetching and write back of transmit descriptors. The three threshold values are used to determine when descriptors are read from, and written to, host memory. Their values can be in units of cache lines or of descriptors (each descriptor is 16 bytes), based on the value of the GRAN bit (0=cache lines, 1=descriptors). When GRAN = 1, all descriptors are written back (even if not requested).” (Intel manual)
     Recommended for 82573: 0x01010000 (GRAN=1, WTHRESH=1)
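As a quick check of the recommended value, the fields can be packed by hand. This is a sketch only: the helper name `txdctl_value` is hypothetical, and the field positions are the ones from the slide's bit diagram (PTHRESH in bits 5-0, HTHRESH in bits 13-8, WTHRESH in bits 21-16, GRAN in bit 24).

```c
#include <stdint.h>

#define TXDCTL 0x3828u           /* register offset from the slide */

/* Pack the three thresholds and the granularity bit (hypothetical helper). */
static uint32_t txdctl_value(unsigned pthresh, unsigned hthresh,
                             unsigned wthresh, int gran)
{
    return  (uint32_t)(pthresh & 0x3Fu)
         | ((uint32_t)(hthresh & 0x3Fu) << 8)
         | ((uint32_t)(wthresh & 0x3Fu) << 16)
         | ((uint32_t)(gran ? 1u : 0u) << 24);
}
```

Calling `txdctl_value(0, 0, 1, 1)` gives 0x01010000, matching GRAN=1 and WTHRESH=1.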

  11. Transmit Control (0x0400) (82573L)
       Bits     Field
       31-29    R=0
       28       MULR = Multiple Request Support
       27-26    TXCSCMT = TxDescriptor Minimum Threshold
       25       UNORTX = Underrun No Re-Transmit
       24       RTLC = Retransmit on Late Collision
       23       R=0
       22       SWXOFF = Software XOFF Transmission
       21-12    COLD = Collision Distance (=0x3F)
       11-4     CT = Collision Threshold (=0xF)
       3        PSP = Pad Short Packets
       2        R=0
       1        EN = Transmit Enable
       0        R=0
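A sketch of how a driver might compose a TCTL value from the fields above, using the CT and COLD values quoted on the slide. The macro and function names are hypothetical; only the bit positions come from the register layout.

```c
#include <stdint.h>

#define TCTL      0x0400u        /* register offset from the slide */
#define TCTL_EN   (1u << 1)      /* Transmit Enable */
#define TCTL_PSP  (1u << 3)      /* Pad Short Packets */

/* Enable transmit with short-packet padding (hypothetical helper). */
static uint32_t tctl_value(unsigned ct, unsigned cold)
{
    return TCTL_EN | TCTL_PSP
         | ((uint32_t)(ct   & 0xFFu)  << 4)    /* Collision Threshold */
         | ((uint32_t)(cold & 0x3FFu) << 12);  /* Collision Distance */
}
```

With CT=0xF and COLD=0x3F this evaluates to 0x0003F0FA.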

  12. Tx Configuration Word (0x0178) (82573L)
       Bits     Field
       31       ANE = Auto-Negotiation Enable
       30       TxConfig = Transmit Configuration Control bit
       29-16    Reserved (=0)
       15-0     TxConfigWord = Transmit Configuration Word
     This register has two meanings, depending on the state of the ANE bit (i.e., setting ANE=1 enables the hardware auto-negotiation machine). Applicable only in SerDes mode; program as 0 for internal-PHY mode.

  13. TxDesc Command-field
       Bit   Field
       7     IDE = Interrupt-Delay Enable (1=yes, 0=no)
       6     VLE = VLAN-Packet Enable (1=yes, 0=no), provided EOP is set
       5     DEXT = Descriptor Extension (1=yes, 0=no); use ‘0’ for Legacy-Mode
       4     reserved =0
       3     RS = Report Status (1=yes, 0=no)
       2     IC = Insert CheckSum (1=yes, 0=no) as indicated by CSO/CSS fields
       1     IFCS = Insert Frame CheckSum (1=yes, 0=no), provided EOP is set
       0     EOP = End Of Packet (1=yes, 0=no)

  14. TxDesc Status field
       Bit   Field
       3     reserved =0
       2     LC = Late Collision
       1     EC = Excess Collisions
       0     DD = Descriptor Done
     • DD: this bit is written back after the NIC processes the descriptor, provided the descriptor’s RS-bit was set (i.e., Report Status)
     • EC: indicates that the packet has experienced more than the maximum number of excessive collisions (as defined by the TCTL.CT field) and therefore was not transmitted. (This bit is meaningful only in HALF-DUPLEX mode.)
     • LC: indicates that a Late Collision has occurred while operating in HALF-DUPLEX mode. Note that the collision window size is dependent on the SPEED: 64-bytes for 10/100-Mbps, or 512-bytes for 1000-Mbps.
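The command and status bits can be tied together in a small sketch of a legacy transmit descriptor. The struct layout below follows Intel's 16-byte legacy format as an assumption (the slides show only the command and status fields), and all names are hypothetical. The code only fills and inspects memory; it does not talk to a NIC.

```c
#include <stdint.h>
#include <string.h>

/* Assumed 16-byte legacy Tx descriptor layout (not shown on the slides). */
struct tx_desc {
    uint64_t base_address;       /* physical address of the packet buffer */
    uint16_t length;             /* number of bytes to send */
    uint8_t  cso;                /* checksum offset */
    uint8_t  cmd;                /* command field */
    uint8_t  status;             /* status field, written back by the NIC */
    uint8_t  css;                /* checksum start */
    uint16_t special;
};

/* Command-field bits. */
#define CMD_EOP   (1u << 0)
#define CMD_IFCS  (1u << 1)
#define CMD_RS    (1u << 3)

/* Status-field bits. */
#define STA_DD    (1u << 0)
#define STA_EC    (1u << 1)
#define STA_LC    (1u << 2)

/* Describe one whole packet and ask for a DD write-back. */
static void tx_desc_fill(struct tx_desc *d, uint64_t phys, uint16_t len)
{
    memset(d, 0, sizeof *d);
    d->base_address = phys;
    d->length = len;
    d->cmd = CMD_EOP | CMD_IFCS | CMD_RS;
}

/* Nonzero once the NIC has reported the descriptor as done. */
static int tx_desc_done(const struct tx_desc *d)
{
    return (d->status & STA_DD) != 0;
}
```

Setting RS matters here: without it the DD bit is not written back, so `tx_desc_done()` would never become true for that descriptor.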

  15. Device Status (0x0008) (82573L)
       Bits     Field
       31       ? (some undocumented functionality?)
       30-20    0
       19       GIO Master EN
       18-16    0
       15-11    0
       10       PHY reset
       9-8      ASDV = Auto-negotiation Speed Detection Value
       7-6      SPEED (00=10Mbps, 01=100Mbps, 10=1000Mbps, 11=reserved)
       5        0
       4        TXOFF = Transmission Paused
       3-2      Function ID
       1        LU = Link Up
       0        FD = Full-Duplex
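Decoding a value read from this register is a matter of shifting and masking. A minimal sketch, with hypothetical helper names and the bit positions listed on the slide:

```c
#include <stdint.h>

/* Bit 0: FD = Full-Duplex. */
static int status_full_duplex(uint32_t status)
{
    return (int)(status & 1u);
}

/* Bit 1: LU = Link Up. */
static int status_link_up(uint32_t status)
{
    return (int)((status >> 1) & 1u);
}

/* Bits 7-6: SPEED. Returns Mbps, or -1 for the reserved encoding. */
static int status_speed_mbps(uint32_t status)
{
    switch ((status >> 6) & 3u) {
    case 0:  return 10;
    case 1:  return 100;
    case 2:  return 1000;
    default: return -1;
    }
}
```

A status value of 0x83, for instance, decodes as link up, full duplex, 1000 Mbps.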

  16. Device Control (0x0000) (82573L)
       Bits     Field
       31       PHYRST = Phy Reset
       30       VME = VLAN Mode Enable
       29       R=0
       28       TFCE = Tx Flow-Control Enable
       27       RFCE = Rx Flow-Control Enable
       26       RST = Device Reset
       25-21    R=0
       20       ADVD3WUC = Advertise Cold Wake Up Capability
       19       R=0
       18       D/UD = Dock/Undock status
       17-16    R=0
       15-13    R=0
       12       FRCDPLX = Force Duplex
       11       FRCSPD = Force Speed
       10       R=0
       9-8      SPEED (00=10Mbps, 01=100Mbps, 10=1000Mbps, 11=reserved)
       7        R=0
       6        SLU = Set Link Up
       5-4      R=0
       3        R=1
       2        GIOMD = GIO Master Disable
       1        R=0
       0        FD = Full-Duplex

  17. Ethernet packet layout
     • Total size normally can vary from 64 bytes up to 1518 bytes (unless ‘jumbo’ packets and/or ‘undersized’ packets are enabled)
     • The NIC expects a 14-byte packet ‘header’ and it appends a 4-byte CRC check-sum
       Offset   Field
       0        destination MAC address (6-bytes)
       6        source MAC address (6-bytes)
       12       Type/length (2-bytes)
       14       the packet’s data ‘payload’ goes here (usually varies from 46 to 1500 bytes)
       end      Cyclic Redundancy Checksum (4-bytes)
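The header offsets above translate directly into code. The following sketch builds a frame in a plain byte buffer; the function name `eth_build` is hypothetical, the caller is assumed to supply a buffer large enough for header plus payload, and the CRC is left out because the NIC appends it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HEADER_LEN 14        /* 6 + 6 + 2 bytes */

/*
 * Write header and payload into 'frame' and return the number of bytes
 * written. The type/length field is stored most significant byte first.
 */
static size_t eth_build(uint8_t *frame,
                        const uint8_t dst[6], const uint8_t src[6],
                        uint16_t type, const void *payload, size_t len)
{
    memcpy(frame, dst, 6);                 /* offset 0: destination MAC */
    memcpy(frame + 6, src, 6);             /* offset 6: source MAC */
    frame[12] = (uint8_t)(type >> 8);      /* offset 12: type/length */
    frame[13] = (uint8_t)(type & 0xFFu);
    memcpy(frame + ETH_HEADER_LEN, payload, len);
    return ETH_HEADER_LEN + len;
}
```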

  18. How ‘receive’ works
     [Diagram: a list of buffer-descriptors (descriptor0 to descriptor3) in random access memory, pointing to Buffer0 to Buffer3]
     • We set up memory-buffers where we want received packets to be placed by the NIC
     • We also create a list of buffer-descriptors and inform the NIC of its location and size
     • Then, when ready, we tell the NIC to ‘Go!’ (i.e., start receiving), but to let us know when these receptions have occurred

  19. Receive Control (0x0100)
       Bits     Field
       31       R=0
       30-27    FLXBUF = Flexible Buffer size
       26       SECRC = Strip Ethernet CRC
       25       BSEX = Buffer Size Extension
       24       R=0
       23       PMCF = Pass MAC Control Frames
       22       DPF = Discard Pause Frames
       21       R=0
       20       CFI = Canonical Form Indicator
       19       CFIEN = Canonical Form Indicator Enable
       18       VFE = VLAN Filter Enable
       17-16    BSIZE = Receive Buffer Size
       15       BAM = Broadcast Accept Mode
       14       R=0
       13-12    MO = Multicast Offset
       11-10    DTYP = Descriptor Type
       9-8      RDMTS = Rx-Descriptor Minimum Threshold Size
       7-6      LBM = Loopback Mode
       5        LPE = Long Packet reception Enable
       4        MPE = Multicast Promiscuous Enable
       3        UPE = Unicast Promiscuous Enable
       2        SBP = Store Bad Packets
       1        EN = Receive Enable
       0        R=0

  20. Registers’ Names
     • Memory-information registers
        • RDBA(L/H) = Receive-Descriptor Base-Address Low/High (64-bits)
        • RDLEN = Receive-Descriptor array Length
        • RDH = Receive-Descriptor Head
        • RDT = Receive-Descriptor Tail
     • Receive-engine control registers
        • RXDCTL = Receive-Descriptor Control Register
        • RCTL = Receive Control Register
     • Notification timing registers
        • RDTR = Receive-interrupt packet Delay Timer
        • RADV = Receive-interrupt Absolute Delay Value

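  21. Rx-Desc Ring-Buffer
     [Diagram: circular array of descriptors at offsets 0x00, 0x10, 0x20, ..., 0x80 from the base-address, shaded to show which are owned by hardware (nic) and which are owned by software (cpu)]
     • RDBA = base-address of the descriptor array
     • RDLEN = length of the array (in bytes)
     • RDH (head) and RDT (tail) index into the array
     • Circular buffer (128-bytes minimum)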

  22. Rx-Descriptor Control (0x2828)
       Bits     Field
       31-25    R=0
       24       GRAN (Granularity): 1=descriptor-size, 0=cacheline-size
       23-22    R=0
       21-16    WTHRESH (Writeback Threshold)
       15-14    R=0
       13-8     HTHRESH (Host Threshold)
       7-6      R=0
       5-0      PTHRESH (Prefetch Threshold)
     • Prefetch Threshold: a prefetch operation is considered when the number of valid, but unprocessed, receive descriptors that the ethernet controller has in its on-chip buffer drops below this threshold
     • Host Threshold: a prefetch occurs if at least this many valid descriptors are available in host memory
     • Writeback Threshold: this field controls the writing back to host memory of already processed receive descriptors in the ethernet controller’s on-chip buffer which are ready to be written back to host memory

  23. RxDesc Status-field
       Bit   Field
       7     PIF = Passed In-exact Filter (1=yes, 0=no); shows if software must check
       6     IPCS = IPv4 Checksum calculated on packet (1=yes, 0=no)
       5     TCPCS = TCP Checksum calculated in packet (1=yes, 0=no)
       4     UDPCS = UDP Checksum calculated in packet (1=yes, 0=no)
       3     VP = VLAN Packet match (1=yes, 0=no)
       2     IXSM = Ignore Checksum Indications (1=yes, 0=no)
       1     EOP = End Of Packet (1=yes, 0=no); shows if this descriptor is the last one for its packet
       0     DD = Descriptor Done (1=yes, 0=no); shows if the nic is finished with the descriptor

  24. RxDesc Error-field
       Bit   Field
       7     RXE = Received-data Error (1=yes, 0=no)
       6     IPE = IPv4-checksum error (1=yes, 0=no)
       5     TCPE = TCP/UDP checksum error (1=yes, 0=no)
       4     reserved =0
       3     reserved =0
       2     SEQ = Sequence error (1=yes, 0=no)
       1     SE = Symbol Error (1=yes, 0=no)
       0     CE = CRC Error or alignment error (1=yes, 0=no)
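A receive path typically looks at both fields before handing a packet upward. The sketch below assumes Intel's 16-byte legacy receive-descriptor layout (the slides show only the status and error bytes) and uses hypothetical names throughout. The acceptance policy shown (DD and EOP set, no error bits) is one simple choice for single-buffer packets, not the only possible one.

```c
#include <stdint.h>

/* Assumed 16-byte legacy Rx descriptor layout (not shown on the slides). */
struct rx_desc {
    uint64_t base_address;       /* physical address of the receive buffer */
    uint16_t length;             /* bytes written by the NIC */
    uint16_t checksum;           /* packet checksum */
    uint8_t  status;             /* status field */
    uint8_t  errors;             /* error field */
    uint16_t special;
};

/* Status-field bits used here. */
#define RXSTA_DD   (1u << 0)
#define RXSTA_EOP  (1u << 1)

/* Error-field bits. */
#define RXERR_CE   (1u << 0)
#define RXERR_SE   (1u << 1)
#define RXERR_SEQ  (1u << 2)
#define RXERR_TCPE (1u << 5)
#define RXERR_IPE  (1u << 6)
#define RXERR_RXE  (1u << 7)

/* Nonzero if the descriptor holds a complete, error-free packet. */
static int rx_desc_usable(const struct rx_desc *d)
{
    const uint8_t need = RXSTA_DD | RXSTA_EOP;
    return (d->status & need) == need && d->errors == 0;
}
```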

  25. Statistics registers
     • The 82573L has several dozen statistical counters which automatically operate to keep track of significant events affecting the ethernet controller’s performance
     • Most are 32-bit ‘read-only’ registers, and they are automatically cleared when read
     • Your module’s initialization routine could read them all (to start counting from zero)

  26. Initializing the nic’s counters
     • The statistical counters all have address-offsets in the range 0x04000 to 0x04FFF
     • You can use a very simple program-loop to ‘clear’ each of these read-only registers
         // Here ‘io’ is the virtual base-address
         // of the nic’s i/o-memory region
         {
             int r;

             // clear all of the Pro/1000 controller’s statistical counters
             for (r = 0x4000; r < 0x4FFF; r += 4)
                 ioread32( io + r );
         }

  27. A few ‘counter’ examples
       Offset   Name      Description
       0x4000   CRCERRS   CRC Errors Count
       0x400C   RXERRC    Receive Error Count
       0x4014   SCC       Single Collision Count
       0x4018   ECOL      Excessive Collision Count
       0x4074   GPRC      Good Packets Received
       0x4078   BPRC      Broadcast Packets Received
       0x407C   MPRC      Multicast Packets Received
       0x40D0   TPR       Total Packets Received
       0x40D4   TPT       Total Packets Transmitted
       0x40F0   MPTC      Multicast Packets Transmitted
       0x40F4   BPTC      Broadcast Packets Transmitted

  28. A ‘nic.c’ character driver?
     • my_fops table:
       ioctl     my_ioctl()
       open      my_open()
       read      my_read()
       write     my_write()
       release   my_release()
     • Other entry points: my_isr(), module_init(), module_exit()

  29. Network drivers
     • A network interface driver is similar to the driver for a mounted block device
     • A block device registers its disks and methods with the kernel, then transmits and receives blocks on request
     • Applications reach the network through socket read/write system calls
     • A network driver receives packets asynchronously from the outside world
     • It asks to push incoming packets towards the kernel

  30. Network drivers
     • Many administrative tasks
        • Setting up addresses, modifying transmission parameters, maintaining traffic and error statistics
     • The network subsystem is completely protocol independent
        • This holds for both software protocols (IP) and hardware protocols (Ethernet, token ring)

  31. Snull: the network interface driver
     • Compare the Linux loopback driver, at drivers/net/loopback.c
     • snull simulates conversations with real remote hosts in order to demonstrate the task of writing network drivers

  32. Assigning IP numbers
     • Suppose the system has two interfaces, sn0 and sn1
     • A plain loopback does not really send anything, so it cannot simulate a remote host
     • To simulate actual sending: toggle the least significant bit of the third octet of both the source and destination addresses
     • This changes both the network number and the host number of a class C IP number
     • The net effect is that a packet sent to network interface sn0 appears on sn1
     • snullnet0: network connected to the sn0 interface; snullnet1: network connected to the sn1 interface
     • Both networks must have 24-bit netmasks
     • local0, local1: the IP addresses assigned to the two interfaces; they must differ in the least significant bit of their 3rd octet and in their 4th octet
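The address trick amounts to a single XOR on the third octet. A minimal userspace sketch, with a hypothetical function name and the address held as four bytes in network order:

```c
#include <stdint.h>

/* Flip the least significant bit of the third octet of an IPv4 address. */
static void snull_toggle_net(uint8_t addr[4])
{
    addr[2] ^= 1;
}
```

Applied to both the source and the destination address, this turns a packet from 192.168.0.1 to 192.168.0.2 into one from 192.168.1.1 to 192.168.1.2, so it appears to arrive from the other network. Applying it twice restores the original address.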

  33. The physical transport of packets
     • The snull interfaces belong to the Ethernet class
     • snull emulates Ethernet
     • The kernel offers some generalized support for Ethernet devices
     • Ethernet support is so widespread that even plip (the interface that uses the printer ports) declares itself as an Ethernet device
     • Watch packets with tcpdump
     • snull works only with IP packets
     • It modifies the source, destination, and checksum in the IP header without checking whether the packet actually conveys IP information

  34. Connecting to kernel: Device registration
     • loopback.c, plip.c, e100.c are examples of network drivers in drivers/net/
     • Device registration:
        • Allocate net devices (request resources and offer facilities)
        • struct net_device *snull_dev[2]; // linux/netdevice.h
        • snull_dev[0] = alloc_netdev(sizeof(struct snull_priv), "sn%d", snull_init);
        • alloc_etherdev(int sizeof_priv); // wrapper around alloc_netdev
     • After initialization is complete, register the devices
        • register_netdev(snull_dev[i]); // returns nonzero (a negative error code) if it fails

  35. Connecting to kernel: Device initialization
     • snull uses alloc_netdev, so it has a separate initialization function
     • ether_setup(dev); // assigns some fields
     • dev->open = snull_open;
     • dev->stop = snull_release;
     • Also set: set_config, hard_start_xmit, do_ioctl, get_stats, rebuild_header, tx_timeout, watchdog_timeo
     • dev->flags |= IFF_NOARP;
     • dev->features |= NETIF_F_NO_CSUM;
     • dev->hard_header_cache = NULL; // disable caching
     • Private data pointer: priv, allocated along with the net_device

  36. Private data
     • struct snull_priv *priv = netdev_priv(dev);
         struct snull_priv {
             struct net_device_stats stats;
             int status;
             struct snull_packet *ppool;
             struct snull_packet *rx_queue;
             int rx_int_enabled, tx_packetlen;
             u8 *tx_packetdata;
             struct sk_buff *skb;
             spinlock_t lock;
         };
     • Initialization
         priv = netdev_priv(dev);
         memset(priv, 0, sizeof(struct snull_priv));
         spin_lock_init(&priv->lock);
         snull_rx_ints(dev, 1); /* enable receive interrupts */

  37. Connecting to kernel: Module unloading
     • Cleanup, for each snull_dev[i]:
         unregister_netdev(snull_dev[i]);
         snull_teardown_pool(snull_dev[i]);
         free_netdev(snull_dev[i]);
     • snull_teardown_pool: flushes the packet pool and the buffers held in the private data

  38. net_device structure
     • Global information
        • name: name of the device
        • state: state of the device
        • struct net_device *next; // pointer to the next device in the global list
        • init function: an initialization function called by register_netdev()
     • Hardware information
     • Interface information
     • Device methods

  39. net_device structure: Hardware info
     • Low-level hardware information
        • base_addr: the I/O base address of the network interface
        • unsigned char irq: dev->irq, the assigned interrupt number (shown by ifconfig)
        • unsigned char if_port: the port in use on a multiport device (e.g., 10base)
        • unsigned char dma; // DMA channel allocated by the device, for the ISA bus
     • Device memory information: addresses of the shared memory used by the device
        • rmem (rx memory), mem (tx memory)
        • rmem_start, rmem_end, mem_start, mem_end

  40. net_device: Interface information
     • The init function sets up most of this information, but device-specific settings need to be filled in later
     • Non-Ethernet interfaces can use helper functions
        • fc_setup, ltalk_setup, fddi_setup
        • Fibre Channel, LocalTalk, Fiber Distributed Data Interface, token ring, High-Performance Parallel Interface (hippi_setup)
     • Non-default interface fields
        • hard_header_len, mtu (maximum transfer unit = 1500 octets), tx_queue_len (Ethernet = 1000, plip = 10), unsigned short type, unsigned char addr_len, unsigned char dev_addr[MAX_ADDR_LEN], broadcast[MAX_ADDR_LEN]
     • Flag bits: mask bits, loopback, debug, noarp, multicast
     • Special hardware capabilities the device has: DMA

  41. net_device: Device methods
     • Fundamental methods
        • open, stop, hard_start_xmit
        • hard_header, rebuild_header
        • tx_timeout, get_stats (returns net_device_stats), set_config
     • Optional methods
        • poll, poll_controller, do_ioctl, set_multicast_list
        • set_mac_address, change_mtu, header_cache, header_cache_update, hard_header_parse
     • Utility fields (not methods)
        • trans_start, last_rx, watchdog_timeo, priv, mc_list, mc_count, xmit_lock, xmit_lock_owner

  42. open() and close()
         int snull_open(struct net_device *dev)
         {
             /* request_region( ), request_irq( ), ... */

             /*
              * Assign the hardware address of the board: use "\0SNULx", where
              * x is 0 or 1. The first byte is '\0' to avoid being a multicast
              * address (the first byte of multicast addrs is odd).
              */
             memcpy(dev->dev_addr, "\0SNUL0", ETH_ALEN);
             if (dev == snull_devs[1])
                 dev->dev_addr[ETH_ALEN-1]++; /* \0SNUL1 */
             netif_start_queue(dev);
             return 0;
         }

         int snull_release(struct net_device *dev)
         {
             /* release ports, irq and such -- like fops->close */
             netif_stop_queue(dev); /* can't transmit any more */
             return 0;
         }

  43. Tx()
         int snull_tx(struct sk_buff *skb, struct net_device *dev)
         {
             int len;
             char *data, shortpkt[ETH_ZLEN];
             struct snull_priv *priv = netdev_priv(dev);

             data = skb->data;
             len = skb->len;
             if (len < ETH_ZLEN) {
                 memset(shortpkt, 0, ETH_ZLEN);
                 memcpy(shortpkt, skb->data, skb->len);
                 len = ETH_ZLEN;
                 data = shortpkt;
             }
             dev->trans_start = jiffies; /* save the timestamp */

             /* Remember the skb, so we can free it at interrupt time */
             priv->skb = skb;

             /* actual delivery of data is device-specific, and not shown here */
             snull_hw_tx(data, len, dev);

             return 0;
         }
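The short-packet padding step can be tried outside the kernel. The sketch below repeats the same logic on plain byte buffers; the function name `pad_short_frame` is hypothetical, and ETH_ZLEN is defined locally as 60, the value used by the Linux headers for the minimum frame length without the CRC.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ZLEN 60              /* minimum frame length without the CRC */

/*
 * Return the length to transmit. *out is set to the original data, or to
 * the zero-padded copy in 'shortpkt' when the packet is too short.
 */
static size_t pad_short_frame(const uint8_t *data, size_t len,
                              uint8_t shortpkt[ETH_ZLEN],
                              const uint8_t **out)
{
    if (len < ETH_ZLEN) {
        memset(shortpkt, 0, ETH_ZLEN);     /* pad with zeros */
        memcpy(shortpkt, data, len);       /* keep the real bytes in front */
        *out = shortpkt;
        return ETH_ZLEN;
    }
    *out = data;                           /* long enough: send as is */
    return len;
}
```

As in the slide's code, the padded copy lives in a caller-provided scratch buffer, so the original packet data is never modified.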

  44. Rx()
         void snull_rx(struct net_device *dev, struct snull_packet *pkt)
         {
             struct sk_buff *skb;
             struct snull_priv *priv = netdev_priv(dev);

             /*
              * The packet has been retrieved from the transmission
              * medium. Build an skb around it, so upper layers can handle it
              */
             skb = dev_alloc_skb(pkt->datalen + 2);
             if (!skb) {
                 if (printk_ratelimit( ))
                     printk(KERN_NOTICE "snull rx: low on mem - packet dropped\n");
                 priv->stats.rx_dropped++;
                 goto out;
             }
             memcpy(skb_put(skb, pkt->datalen), pkt->data, pkt->datalen);

             /* Write metadata, and then pass to the receive level */
             skb->dev = dev;
             skb->protocol = eth_type_trans(skb, dev);
             skb->ip_summed = CHECKSUM_UNNECESSARY; /* don't check it */
             priv->stats.rx_packets++;
             priv->stats.rx_bytes += pkt->datalen;
             netif_rx(skb);
         out:
             return;
         }

  45. Thanks. Ref: Chap 17, LDD 3e, Rubini-Corbet
