Network Processor: Architecture and Applications

Yan Luo

Yan_Luo@uml.edu

http://faculty.uml.edu/yluo/

12/18/05, Yan Luo, CAR of UML

• Overview of Network Processors
• Network Processor Architectures
• Applications
• Case Studies
  - Wireless Mesh Network
  - A Content-Aware Switch
• Conclusion


Packet Processing in the Future Internet

[Figure: the Future Internet brings more packets & more complex packet processing; existing options: ASICs and General-Purpose Processors]

• High processing power
• Support wire speed
• Programmable
• Scalable
• Optimized for network applications
• …


What Is a Network Processor?

Programmable processors optimized for network applications and protocol processing

High performance

Programmable & Flexible

Optimized for packet processing

Main players: AMCC, Intel, Hifn, EZchip, Agere

Semico Research Corp. Oct. 14, 2003


Commercial Network Processors

Vendor   Product       Line speed        Features
AMCC     nP7510        OC-192/10 Gbps    Multi-core, customized ISA, multi-tasking
Intel    IXP2850       OC-192/10 Gbps    Multi-core, h/w multi-threaded, coprocessor, h/w accelerators
Hifn     5NP4G         OC-48/2.5 Gbps    Multi-threaded multiprocessor complex, h/w accelerators
EZchip   NP-2          OC-192/10 Gbps    Classification engines, traffic managers
Agere    PayloadPlus   OC-192/10 Gbps    Multi-threaded, on-chip traffic management
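To see what "wire speed" demands at these line rates, a back-of-the-envelope per-packet budget helps. The sketch below assumes back-to-back minimum-size Ethernet frames (64 bytes plus 20 bytes of preamble, start delimiter and inter-frame gap on the wire); SONET/OC framing differs, and the 600 MHz clock is borrowed from the IXP2400 in the experimental setup purely for illustration. The function name `packet_budget` is hypothetical.

```python
def packet_budget(line_rate_bps, frame_bytes=64, overhead_bytes=20, clock_hz=None):
    """Worst-case per-packet time budget at a given line rate.

    Assumes minimum-size Ethernet frames: 64-byte frame plus 20 bytes of
    preamble, start delimiter and inter-frame gap per packet on the wire.
    """
    bits_per_packet = (frame_bytes + overhead_bytes) * 8
    packets_per_second = line_rate_bps / bits_per_packet
    ns_per_packet = 1e9 / packets_per_second
    # Clock cycles available per packet if a single processor saw every packet.
    cycles_per_packet = clock_hz / packets_per_second if clock_hz else None
    return packets_per_second, ns_per_packet, cycles_per_packet


if __name__ == "__main__":
    for name, rate in [("OC-48 / 2.5 Gbps", 2.5e9), ("OC-192 / 10 Gbps", 10e9)]:
        pps, ns, cyc = packet_budget(rate, clock_hz=600e6)
        print(f"{name}: {pps / 1e6:.2f} Mpps, {ns:.1f} ns/packet, "
              f"{cyc:.1f} cycles @ 600 MHz")
```

At 10 Gbps this works out to about 67 ns per minimum-size packet, roughly 40 cycles at 600 MHz, which suggests why these NPs rely on multiple cores and hardware threads rather than a single fast processor.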

Typical Network Processor Architecture

[Figure: a network processor with processing elements (PEs), a co-processor and a h/w accelerator on an internal bus, attached to network interfaces, SDRAM (e.g. packet buffer) and SRAM (e.g. routing table)]
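The SRAM in this picture typically holds lookup structures such as the routing table. As a minimal illustration of the lookup such a table serves, here is a generic binary trie for IPv4 longest-prefix match; it is a textbook technique, not the data structure of any particular NP, and the `PrefixTrie` class is hypothetical. A multibit trie would cut the number of memory accesses per lookup, which matters when every access goes to off-chip SRAM.

```python
import ipaddress
from typing import Optional


class PrefixTrie:
    """Binary trie for IPv4 longest-prefix match (one address bit per level)."""

    def __init__(self):
        # Each node is a dict: keys 0/1 point to children, "hop" holds a next hop.
        self.root = {}

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.IPv4Network(prefix)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1
            node = node.setdefault(bit, {})
        node["hop"] = next_hop

    def lookup(self, addr: str) -> Optional[str]:
        bits = int(ipaddress.IPv4Address(addr))
        node, best = self.root, self.root.get("hop")  # default route, if any
        for i in range(32):
            node = node.get((bits >> (31 - i)) & 1)
            if node is None:
                break
            best = node.get("hop", best)  # remember the longest match so far
        return best


if __name__ == "__main__":
    table = PrefixTrie()
    table.insert("0.0.0.0/0", "port0")
    table.insert("10.0.0.0/8", "port1")
    table.insert("10.1.0.0/16", "port2")
    print(table.lookup("10.1.2.3"), table.lookup("10.2.3.4"), table.lookup("192.168.1.1"))
```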
 

Intel IXP2400 Network Processor


Snapshots of IXP2xxx-Based Systems

ADI Roadrunner Platform

• IPv4 forwarding/NAT
• Forwarding w/ QoS / DiffServ
• ATM RAN
• IP RAN
• IPv6/v4 dual-stack forwarding

Radisys ENP2611 PCI Packet Processing Engine

• Multiservice switches
• Routers, broadband access devices
• Intrusion detection and prevention (IDS/IPS)
• Voice over IP (VoIP) gateway
• Virtual Private Network (VPN) gateway
• Content-aware switch


Intel IXP425 Network Processor


StarEast: IXP425-Based Multi-radio Platform


Applications of Network Processors

Core router

DSL modem

Edge router

Wireless router

VoIP terminal

VPN gateway

Print server


Case Study 1: Wireless Mesh Network


Software Stack on StarEast


Case Study 2: Content-aware Switch

[Figure: a request for www.yahoo.com (IP | TCP | APP. DATA, e.g. "GET /cgi-bin/form HTTP/1.1", "Host: www.yahoo.com…") crosses the Internet to the switch, which forwards it to a Media Server, an Application Server or an HTML Server]

Front end of a Web cluster, exposing only one virtual IP

Routes packets based on Layer 5 information

Examines application data in addition to the IP & TCP headers

Advantages over Layer 4 switches

Better load balancing: requests distributed based on content type

Faster response: exploit cache affinity

Better resource utilization: partition the database
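Routing on Layer 5 information means the switch must read the request itself. A minimal sketch of that decision, assuming hypothetical URL-prefix rules and pool names that mirror the Media, Application and HTML servers in the figure; a real switch would also need to handle requests split across several packets.

```python
from typing import Dict, Tuple

# Hypothetical routing rules: URL path prefix -> server pool.
RULES = [
    ("/cgi-bin/", "application-server"),
    ("/media/", "media-server"),
]
DEFAULT_POOL = "html-server"


def parse_request(payload: bytes) -> Tuple[str, str, Dict[str, str]]:
    """Return (method, path, headers) from the start of an HTTP/1.1 request."""
    head = payload.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")
    method, path, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return method, path, headers


def choose_pool(payload: bytes) -> str:
    """Pick a server pool from Layer 5 data (here, the URL path)."""
    _method, path, _headers = parse_request(payload)
    for prefix, pool in RULES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL


if __name__ == "__main__":
    req = b"GET /cgi-bin/form HTTP/1.1\r\nHost: www.yahoo.com\r\n\r\n"
    print(choose_pool(req))
```

Because a client sends its HTTP request only after the TCP handshake completes, the switch has to accept the connection before it can make this choice; that is the problem the mechanisms on the next slide address.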


Mechanisms to Build a Content-aware Switch

TCP gateway

An application-level proxy

Sets up a 1st connection with the client, parses the request, then sets up a 2nd connection with the server

Copy overhead

TCP splicing

Reduces the copy overhead

Forwards packets at the network level, between the network interface driver and the TCP/IP stack

The two connections are spliced together

Modifies fields in the IP and TCP headers


[Figure: client-to-server data path through the switch's user and kernel space, for a TCP gateway and for TCP splicing]

Anatomy of TCP Splicing

• Bookkeeping of connection states, selection of servers, state migration
• SEQ # translation
• Checksum recalculation
• Etc.
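SEQ # translation is needed because the switch (in its handshake with the client) and the server each choose their own initial sequence number (ISN). A minimal sketch, assuming the switch replays the client's request to the server using the client's own sequence numbers, so only the server's sequence space needs an offset; `SpliceTranslator` is a hypothetical name. All arithmetic is modulo 2^32, since TCP sequence numbers wrap.

```python
MOD = 1 << 32  # TCP sequence numbers are 32-bit and wrap around


class SpliceTranslator:
    """Hypothetical per-connection state for sequence-number translation.

    Assumption: the client's sequence space is passed through unchanged, so
    only server->client SEQ and client->server ACK values are shifted.
    """

    def __init__(self, switch_isn: int, server_isn: int):
        # Offset from the server's sequence space into the one the client saw.
        self.delta = (switch_isn - server_isn) % MOD

    def server_to_client_seq(self, seq: int) -> int:
        return (seq + self.delta) % MOD

    def client_to_server_ack(self, ack: int) -> int:
        return (ack - self.delta) % MOD


if __name__ == "__main__":
    t = SpliceTranslator(switch_isn=1000, server_isn=5000)
    print(t.server_to_client_seq(5001), t.client_to_server_ack(1001))
```

SACK blocks, if negotiated, also carry sequence numbers and would need the same offset.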

[Figure: packet processing without TCP splicing vs. with TCP splicing]
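Because splicing rewrites header fields such as sequence numbers, the TCP checksum must be fixed. Rather than summing the whole segment again, it can be patched incrementally with the one's-complement update of RFC 1624, HC' = ~(~HC + ~m + m'). A sketch with hypothetical helper names; the full `checksum` function is included only to cross-check the incremental result. Rewritten IP addresses would need the same treatment, since the TCP checksum also covers a pseudo-header containing them.

```python
def ones_add(a: int, b: int) -> int:
    """16-bit one's-complement addition with end-around carry."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def checksum(words) -> int:
    """Full Internet checksum over a list of 16-bit words."""
    s = 0
    for w in words:
        s = ones_add(s, w)
    return ~s & 0xFFFF


def update16(old_csum: int, old_word: int, new_word: int) -> int:
    """RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m') for one changed 16-bit word."""
    s = ones_add(~old_csum & 0xFFFF, ~old_word & 0xFFFF)
    s = ones_add(s, new_word)
    return ~s & 0xFFFF


def update32(old_csum: int, old_val: int, new_val: int) -> int:
    """Patch the checksum for a changed 32-bit field (e.g. SEQ), half by half."""
    c = update16(old_csum, old_val >> 16, new_val >> 16)
    return update16(c, old_val & 0xFFFF, new_val & 0xFFFF)


if __name__ == "__main__":
    seg = [0x1F90, 0xD431, 0x0000, 0x03E8, 0x0000, 0x0000, 0x5010, 0xFFFF]
    old = checksum(seg)
    new_seq = 0xFFFF0010
    patched = update32(old, 0x000003E8, new_seq)
    seg[2], seg[3] = new_seq >> 16, new_seq & 0xFFFF
    print(hex(patched), hex(checksum(seg)))  # the two values should agree
```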
 

Design Options

• Option 0: GP-based (Linux-based) switch

• Option 1: CP sets up and splices connections; DPs process packets sent after splicing
  - Connection setup & splicing is more complex than data forwarding
  - Packets before splicing need to be passed through DRAM queues

• Option 2: DPs handle connection setup, splicing & forwarding


IXP2400 Block Diagram

[Figure: XScale core, eight MEs, SRAM and SDRAM controllers, scratchpad, hash unit, CSRs, PCI and IX bus interfaces]

XScale core

Microengines (MEs)
• 2 clusters of 4 microengines each
• Each ME
  - runs up to 8 threads
  - 16KB instruction store
  - local memory

Scratchpad memory, SRAM & DRAM controllers
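The hardware threads on each ME exist mainly to hide memory latency: while one thread waits on an SRAM or DRAM reference, another runs. An idealized model of that effect, with hypothetical cycle counts and context-switch cost ignored; `me_utilization` is a hypothetical name.

```python
def me_utilization(threads: int, compute_cycles: int, mem_latency_cycles: int) -> float:
    """Idealized fraction of ME cycles spent on useful work.

    Each thread computes for `compute_cycles`, then issues a memory reference
    and sleeps for `mem_latency_cycles` while other threads run. Context
    switches are assumed to be free.
    """
    period = compute_cycles + mem_latency_cycles
    return min(1.0, threads * compute_cycles / period)


if __name__ == "__main__":
    # Hypothetical workload: 20 cycles of work per 100-cycle memory access.
    for n in range(1, 9):
        print(n, "threads:", round(me_utilization(n, 20, 100), 2))
```

Under this model, once threads x compute cycles covers a full compute-plus-memory period, the ME is never idle, which is why eight threads can keep an ME busy despite long memory latencies.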


Resource Allocation

SRAM (8MB)
• Client-side control block (CB) list: records states for connections between clients and SpliceNP, including states after splicing
• Server-side CB list: records states for connections between servers and SpliceNP
• Server selection table
• Locks

DRAM (256MB)
• Packet buffer

Scratchpad (16KB)
• Packet queues

Microengines
• Rx ME, Client ME, Server ME, Tx ME
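Control block lookup (step 5 of SYN processing) maps a packet's connection 4-tuple to its CB. A minimal sketch of a fixed-size, open-addressing table of the kind one might lay out in SRAM; the fields of `ControlBlock` and the table layout are assumptions, since the slides only say that CBs record connection states. The sketch omits deletion (which needs tombstones under linear probing) and the locks that concurrent MEs would need.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControlBlock:
    # Hypothetical fields; the slides only say CBs "record states".
    state: str
    seq_delta: int = 0
    server_port: Optional[int] = None


class CBTable:
    """Fixed-size hash table with linear probing, keyed by the 4-tuple."""

    def __init__(self, n_slots: int = 1024):
        self.keys = [None] * n_slots
        self.vals = [None] * n_slots

    def _slot(self, key) -> int:
        return hash(key) % len(self.keys)

    def insert(self, key, cb: ControlBlock) -> None:
        i = self._slot(key)
        for _ in range(len(self.keys)):
            if self.keys[i] is None or self.keys[i] == key:
                self.keys[i], self.vals[i] = key, cb
                return
            i = (i + 1) % len(self.keys)  # probe the next slot
        raise MemoryError("control block table full")

    def lookup(self, key) -> Optional[ControlBlock]:
        i = self._slot(key)
        for _ in range(len(self.keys)):
            if self.keys[i] is None:
                return None  # an empty slot ends the probe sequence
            if self.keys[i] == key:
                return self.vals[i]
            i = (i + 1) % len(self.keys)
        return None


if __name__ == "__main__":
    table = CBTable()
    conn = ("10.0.0.1", 1234, "192.0.2.1", 80)  # (src IP, src port, VIP, port)
    table.insert(conn, ControlBlock(state="SYN_RCVD"))
    print(table.lookup(conn))
```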
    
       

Comparison of Functionality

A lite version of TCP, due to the limited instruction store of the microengines.

Processing a SYN packet

Step | Functionality                                | TCP | Linux Splicer | SpliceNP
1    | Dequeue packet                               | Y   | Y             | Y
2    | IP header verification                       | Y   | Y             | Y
3    | IP option processing                         | Y   | Y             | N
4    | TCP header verification                      | Y   | Y             | Y
5    | Control block lookup                         | Y   | Y             | Y
6    | Create new socket and set state to LISTEN    | Y   | Y             | No socket, only control block
7    | Initialize TCP and IP header template        | Y   | Y             | N
8    | Reset idle time and keep-alive timer         | Y   | Y             | N
9    | Process TCP option                           | Y   | Y             | Only MSS option
10   | Send ACK packet, change state to SYN_RCVD    | Y   | Y             | Y
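Step 9 keeps SpliceNP lean by honoring only the MSS option. A sketch of such an MSS-only parser over raw TCP option bytes; `parse_mss` is hypothetical, while option kinds 0 (end of list), 1 (NOP) and 2 (MSS, length 4) come from the TCP specification. Every other option is skipped using its length byte.

```python
from typing import Optional

TCPOPT_EOL, TCPOPT_NOP, TCPOPT_MSS = 0, 1, 2


def parse_mss(options: bytes) -> Optional[int]:
    """Return the MSS value from raw TCP options, ignoring all other options."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == TCPOPT_EOL:
            break
        if kind == TCPOPT_NOP:
            i += 1
            continue
        if i + 1 >= len(options):
            break  # truncated option: no length byte
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break  # malformed length: stop parsing
        if kind == TCPOPT_MSS and length == 4:
            return int.from_bytes(options[i + 2:i + 4], "big")
        i += length  # skip window scale, SACK-permitted, timestamps, ...
    return None


if __name__ == "__main__":
    # NOP, NOP, SACK-permitted, MSS=1460, NOP, window scale=7
    opts = bytes([1, 1, 4, 2, 2, 4, 0x05, 0xB4, 1, 3, 3, 7])
    print(parse_mss(opts))
```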

Experimental Setup

Radisys ENP2611 containing an IXP2400

XScale & ME: 600MHz

8MB SRAM and 128MB DRAM

Three 1Gbps Ethernet ports: one for the client, two for the servers

Server: Apache web server on an Intel 3.0GHz Xeon processor

Client: httperf on a 2.5GHz Intel P4 processor

Linux-based switch

Loadable kernel module

2.5GHz P4, two 1Gbps Ethernet NICs


Latency on a Linux-based TCP Splicer

Latency is reduced by TCP splicing


Latency vs. Request File Size

[Figure: latency on the splicer (ms, 0 to 20) vs. request file size (1, 4, 16, 64, 256, 1024 KB), Linux-based vs. NP-based]

Latency is reduced significantly

83.3% (from 0.6 ms to 0.1 ms) @ 1KB

The larger the file size, the higher the reduction

89.5% @ 1MB file
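The reduction percentages are relative to the Linux-based latency. For the 1KB case:

```latex
\text{reduction} = \frac{L_{\text{Linux}} - L_{\text{NP}}}{L_{\text{Linux}}}
  = \frac{0.6\,\text{ms} - 0.1\,\text{ms}}{0.6\,\text{ms}} \approx 83.3\%
```

The 89.5% figure at 1MB is presumably computed the same way from the corresponding latencies in the plot.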


Comparison of Packet Processing Latency


Analysis of Latency Reduction

Linux-based | NP-based
Interrupt: the NIC raises an interrupt once a packet arrives | Polling
NIC-to-memory copy: on a dual-processor 3.0GHz Xeon with a 1Gbps Intel Pro 1000 (82544GC) NIC, DMA takes 3 us to copy a 64-byte packet | No copy: packets are processed inside the NP without the two copies
Linux processing: OS overheads; processing a data packet in the splicing state takes 13.6 us | IXP processing: optimized ISA; 6.5 us

Throughput vs. Request File Size

[Figure: throughput (Mbps, 0 to 800) vs. request file size (1, 4, 16, 64, 256, 1024 KB), Linux-based vs. NP-based]

Throughput is increased significantly

5.7x for small files @ 1KB, 2.2x for large files @ 1MB

Higher improvement for small files

Latency reduction for control packets > latency reduction for data packets

Control packets make up a larger portion of the traffic for small files
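The reasoning above can be made concrete with a simple per-connection cost model: if the splicer's processing time bounds throughput, the speedup is the ratio of total processing time per connection. Only the data-packet costs (13.6 us and 6.5 us) come from the latency analysis; the control-packet costs, the five control packets per connection and the 1460-byte segment size are hypothetical values chosen for illustration, and `splicing_speedup` is a hypothetical name.

```python
import math


def splicing_speedup(n_ctrl, n_data, ctrl_linux_us, data_linux_us, ctrl_np_us, data_np_us):
    """Linux-based / NP-based processing time for one connection."""
    linux_us = n_ctrl * ctrl_linux_us + n_data * data_linux_us
    np_us = n_ctrl * ctrl_np_us + n_data * data_np_us
    return linux_us / np_us


if __name__ == "__main__":
    N_CTRL = 5                        # hypothetical: handshake, request, teardown
    CTRL_LINUX, CTRL_NP = 60.0, 8.0   # hypothetical control-packet costs (us)
    DATA_LINUX, DATA_NP = 13.6, 6.5   # data-packet costs from the latency analysis (us)
    MSS = 1460                        # hypothetical segment size (bytes)
    for kb in (1, 4, 16, 64, 256, 1024):
        n_data = math.ceil(kb * 1024 / MSS)
        s = splicing_speedup(N_CTRL, n_data, CTRL_LINUX, DATA_LINUX, CTRL_NP, DATA_NP)
        print(f"{kb:5d} KB: {n_data:4d} data packets, modeled speedup {s:.2f}x")
```

In this model the speedup falls toward the data-packet ratio (13.6 / 6.5, about 2.1x) as files grow, matching the direction of the measured trend; the model's absolute numbers are not measurements.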


Conclusion

Network processors combine high-performance packet processing and programmability

A large variety of NP applications

Efficient resource utilization is challenging


Thank you!


Microengine
