Network Processor: Architecture and Applications

Yan Luo

Yan_Luo@uml.edu

http://faculty.uml.edu/yluo/

12/18/05

Yan Luo, CAR of UML

Overview of Network Processors

Network Processor Architectures

Applications

Case Studies

Wireless Mesh Network

A Content-Aware Switch

Conclusion

Packet Processing in the Future Internet

[Figure: the future Internet brings more packets and more complex packet processing; the platforms shown are ASICs and general-purpose processors]

•High processing power

•Support wire speed

•Programmable

•Scalable

•Optimized for network applications

• …

What is a Network Processor?

Programmable processors optimized for network applications and protocol processing

High performance

Programmable & Flexible

Optimized for packet processing

Main players: AMCC, Intel, Hifn, EZchip, Agere

Semico Research Corp. Oct. 14, 2003

Commercial Network Processors

Vendor	Product	Line speed	Features
AMCC	nP7510	OC-192 / 10 Gbps	Multi-core, customized ISA, multi-tasking
Intel	IXP2850	OC-192 / 10 Gbps	Multi-core, h/w multi-threaded, coprocessor, h/w accelerators
Hifn	5NP4G	OC-48 / 2.5 Gbps	Multi-threaded multiprocessor complex, h/w accelerators
EZchip	NP-2	OC-192 / 10 Gbps	Classification engines, traffic managers
Agere	PayloadPlus	OC-192 / 10 Gbps	Multi-threaded, on-chip traffic management

Snapshots of IXP2xxx-Based Systems

ADI Roadrunner Platform

•IPv4 Forwarding/NAT

•Forwarding w/ QoS / DiffServ

•ATM RAN

•IP RAN

•IPv6/v4 dual stack forwarding

Radisys ENP2611 PCI Packet Processing Engine

•Multiservice switches

•Routers, broadband access devices

•Intrusion detection and prevention (IDS/IPS)

•Voice over IP (VoIP) gateway

•Virtual Private Network gateway

•Content-aware switch

Applications of Network Processors

Core router

DSL modem

Edge router

Wireless router

VoIP terminal

VPN gateway

Printer server

Case Study 2: Content-aware Switch

[Figure: a client request from the Internet (GET /cgi-bin/form HTTP/1.1, Host: www.yahoo.com…), carried as TCP application data, reaches the switch, which forwards it to one of the media server, application server, or HTML server]

Front-end of a Web cluster, only one Virtual IP

Route packets based on Layer 5 information

Examine application data in addition to IP & TCP headers

Advantages over layer 4 switches

Better load balancing: distributed based on content type

Faster response: exploit cache affinity

Better resource utilization: partition database
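
As a rough illustration of routing on Layer 5 information, the sketch below parses the HTTP request line and picks a back-end by content type. The pool names, the extension list, the URL rules, and the function names `parse_request_line` and `select_server` are assumptions made for illustration; they do not come from the slides.

```python
# Hypothetical back-end pools behind the single virtual IP (names are illustrative).
SERVER_POOLS = {
    "media": "media-server",
    "app": "application-server",
    "html": "html-server",
}

# Illustrative list of extensions treated as media content.
MEDIA_EXTENSIONS = (".mp3", ".mpg", ".avi", ".jpg", ".gif")


def parse_request_line(payload: bytes) -> tuple[str, str]:
    """Return (method, path) from the first line of an HTTP request."""
    first_line = payload.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
    parts = first_line.split(" ")
    if len(parts) != 3:
        raise ValueError(f"malformed request line: {first_line!r}")
    method, path, _version = parts
    return method, path


def select_server(payload: bytes) -> str:
    """Choose a back-end from application data, not just IP/TCP headers."""
    _method, path = parse_request_line(payload)
    path = path.split("?", 1)[0].lower()   # drop the query string
    if path.startswith("/cgi-bin/"):
        return SERVER_POOLS["app"]          # dynamic content
    if path.endswith(MEDIA_EXTENSIONS):
        return SERVER_POOLS["media"]        # large media objects
    return SERVER_POOLS["html"]             # static pages


request = b"GET /cgi-bin/form HTTP/1.1\r\nHost: www.yahoo.com\r\n\r\n"
print(select_server(request))
```

A Layer 4 switch sees only addresses and ports, so it could not make this distinction; the request has to be read before a server is chosen, which is why the client connection must be accepted first.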

Mechanisms to Build a Content-aware Switch

TCP gateway

An application level proxy

Sets up the 1st connection w/ the client, parses the request to select a server, then sets up the 2nd connection w/ that server

Copy overhead

TCP splicing

Reduces the copy overhead

Forwards packets at the network level, between the network interface driver and the TCP/IP stack

The two connections are spliced together

Modifies fields in the IP and TCP headers

Anatomy of TCP Splicing

Bookkeeping of connection states, selection of servers, state migration

SEQ # translation

Checksum recalculation

Etc.

[Figure: packet path without TCP splicing vs. with TCP splicing]
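
The SEQ # translation and checksum recalculation steps can be sketched as follows. This is a simplified illustration, not the SpliceNP code: it assumes a server-to-client packet whose sequence number is shifted by the difference between the two initial sequence numbers, it treats the segment as a bare list of 16-bit words (no pseudo-header), and it patches the checksum incrementally in the style of RFC 1624 rather than recomputing it. All names and sample values are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def ones_complement_sum16(words):
    """Add 16-bit words with end-around carry (Internet checksum arithmetic)."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)
    return total


def checksum16(words):
    """Full Internet checksum over a list of 16-bit words."""
    return ~ones_complement_sum16(words) & 0xFFFF


def translate_seq(seq, delta):
    """Shift a 32-bit sequence number by the per-connection offset, modulo 2**32."""
    return (seq + delta) & MASK32


def update_checksum(old_csum, old_seq, new_seq):
    """Patch a checksum after a 32-bit field changes: HC' = ~(~HC + ~m + m')."""
    old_words = [old_seq >> 16, old_seq & 0xFFFF]
    new_words = [new_seq >> 16, new_seq & 0xFFFF]
    terms = [~old_csum & 0xFFFF]
    terms += [~w & 0xFFFF for w in old_words]
    terms += new_words
    return ~ones_complement_sum16(terms) & 0xFFFF


# Made-up initial sequence numbers for illustration only.
server_isn = 0x1A2B3C4D   # chosen by the real server
switch_isn = 0xF0000010   # chosen by the switch when it answered the client
delta = (switch_isn - server_isn) & MASK32

# Toy segment; the sequence number occupies words 2 and 3.
segment = [0x0050, 0xC001, 0x1A2B, 0x3C4E, 0x0000, 0x0001, 0x5010, 0x2000]
old_csum = checksum16(segment)
old_seq = (segment[2] << 16) | segment[3]
new_seq = translate_seq(old_seq, delta)
segment[2], segment[3] = new_seq >> 16, new_seq & 0xFFFF

patched = update_checksum(old_csum, old_seq, new_seq)
print(hex(new_seq), patched == checksum16(segment))
```

The incremental update touches only the changed words, so its cost does not grow with payload size. The acknowledgment number on packets travelling the other way would be shifted by the opposite offset in the same manner.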

Design Options

•Option 0: GP-based (Linux-based) switch

•Option 1: CP sets up and splices connections; DPs process packets sent after splicing

Connection setup & splicing is more complex than data forwarding

Packets before splicing need to be passed through DRAM queues

•Option 2: DPs handle connection setup, splicing & forwarding

IXP 2400 Block Diagram

[Figure: IXP2400 block diagram; labeled blocks: SRAM controller, Scratch, Hash, CSR, XScale, IX bus, PCI interface, SDRAM controller]

XScale core

Microengines (MEs)

2 clusters of 4 microengines each

Each ME

Runs up to 8 threads

16KB instruction store

Local memory

Scratchpad memory, SRAM & DRAM controllers

Resource Allocation

SRAM (8MB)

•Client-side CB list

•Server-side CB list

•Server selection table

•Locks

DRAM (256MB)

•Packet buffer

Scratchpad (16KB)

•Packet queues

Microengines

•Rx ME

•Client ME

•Server ME

•Tx ME

Client-side control block (CB) list

Records states for connections between clients and SpliceNP, and states after splicing

Server-side control block (CB) list

Records states for connections between servers and SpliceNP

Comparison of Functionality

•SpliceNP implements a lite version of TCP because of the limited instruction store of the microengines.

Processing a SYN packet

Step	Functionality	TCP	Linux Splicer	SpliceNP
1	Dequeue packet	Y	Y	Y
2	IP header verification	Y	Y	Y
3	IP option processing	Y	Y	N
4	TCP header verification	Y	Y	Y
5	Control block lookup	Y	Y	Y
6	Create new socket and set state to LISTEN	Y	Y	No socket, only control block
7	Initialize TCP and IP header template	Y	Y	N
8	Reset idle time and keep-alive timer	Y	Y	N
9	Process TCP option	Y	Y	Only MSS option
10	Send ACK packet, change state to SYN_RCVD	Y	Y	Y
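
To make step 9 concrete, the sketch below walks the TCP option bytes of a SYN and extracts only the MSS value, skipping every other option. It is an illustrative assumption about what "Only MSS option" could look like, not the microengine code; the function name `parse_mss_only` and the sample bytes are hypothetical, while the option kinds (0 = end of list, 1 = no-op, 2 = MSS with length 4) follow the TCP specification.

```python
TCP_OPT_EOL = 0   # end of option list
TCP_OPT_NOP = 1   # one-byte padding
TCP_OPT_MSS = 2   # maximum segment size, total length 4


def parse_mss_only(options: bytes):
    """Return the MSS from raw TCP option bytes, or None if it is absent."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == TCP_OPT_EOL:
            break
        if kind == TCP_OPT_NOP:
            i += 1
            continue
        if i + 1 >= len(options):
            break                        # truncated option: no length byte
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break                        # malformed length, stop parsing
        if kind == TCP_OPT_MSS and length == 4:
            return int.from_bytes(options[i + 2:i + 4], "big")
        i += length                      # skip window scale, SACK, timestamps, ...
    return None


# Sample option bytes: NOP, window scale (kind 3), then MSS = 1460.
syn_options = bytes([1, 3, 3, 7, 2, 4, 0x05, 0xB4])
print(parse_mss_only(syn_options))
```

Ignoring the other options keeps the handler short, which matters when the whole TCP path has to fit in a small instruction store.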

Experimental Setup

Radisys ENP2611 containing an IXP2400

XScale & ME: 600MHz

8MB SRAM and 128MB DRAM

Three 1Gbps Ethernet ports: 1 for Client port and 2 for Server ports

Server: Apache web server on an Intel 3.0GHz Xeon processor

Client: Httperf on a 2.5GHz Intel P4 processor

Linux-based switch

Loadable kernel module

2.5GHz P4, two 1Gbps Ethernet NICs

Latency vs Request File Size

[Figure: latency on the splicer (ms) vs. request file size (KB), Linux-based vs. NP-based]

Latency reduced significantly

83.3% (0.6ms to 0.1ms) @ 1KB

The larger the file size, the higher the reduction

89.5% @ 1MB file
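
The 1KB figure follows from the two latencies quoted above. A minimal check of the arithmetic (the helper name `reduction_percent` is just for illustration):

```python
def reduction_percent(before_ms: float, after_ms: float) -> float:
    """Relative latency reduction, in percent."""
    return (before_ms - after_ms) / before_ms * 100.0


# 0.6 ms on the Linux-based switch vs. 0.1 ms on the NP-based switch at 1KB.
print(f"{reduction_percent(0.6, 0.1):.1f}%")
```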

Analysis of Latency Reduction

Linux-based	NP-based
Interrupt: NIC raises an interrupt once a packet comes	Polling
NIC-to-mem copy: Xeon 3.0GHz dual processor w/ 1Gbps Intel Pro 1000 (88544GC) NIC, 3 us to copy a 64-byte packet by DMA	No copy: packets are processed inside without two copies
Linux processing: OS overheads; processing a data packet in splicing state takes 13.6 us	IXP processing: optimized ISA; 6.5 us
