DonNTU  Masters' portal

Abstract

At the time this abstract was written, the master's thesis was not yet complete. Final completion: December 2014. The full text of the work and materials on the topic can be obtained from the website after this date.

Content

Introduction

Throughput is an important characteristic of routing equipment. The demands placed on modern routers have forced manufacturers to use specialized network processors instead of general-purpose processors. Network processors perform key packet-processing functions, such as routing table lookup, packet fragmentation, and quality-of-service enforcement. A generalized network processor architecture is shown in Figure 1.

Figure 1 – Generalized network processor architecture

To accelerate hardware development, engineers often use software models of the hardware they are designing. These models are written in traditional programming languages or hardware description languages. Programs can be run on these models to verify the performance and correctness of the design.

Programmers can use software models to develop and test software before the hardware becomes available. Although software models are slower than their hardware counterparts, they can be built and tested quickly without the need to manufacture real equipment. This approach enables rapid development of high-quality hardware.

The development of software models of hardware is driven by three basic criteria: performance, flexibility, and detail. Performance determines the load that the model can handle given the resources available to the simulation. Flexibility determines how well the model is designed for further modification. Detail is determined by the level of abstraction used to implement the components of the model. A highly detailed model emulates all aspects of the machine's operation with greater accuracy.

In practice, it is not possible to optimize all three characteristics at once. Therefore, most software models optimize only one or two of them. This explains the large number of simulators available today [1].

1. Relevance of the topic

The performance of network processors depends on the speed of packet analysis algorithms, so the search for new, more effective network processor models is a pressing task. On the other hand, developing hardware prototypes requires substantial material, time, and intellectual costs. For this reason, an important role is played by simulation and analytical modeling, which can identify bottlenecks in the architecture at early stages of development.

2. Goal and tasks of the research

The master's thesis is devoted to the study of methods for constructing and analyzing multi-core network processors, as well as to refining the SimpleScalar modeling application and conducting experiments on these models.

3. Overview of research and developments

This research topic is popular both in the international scientific community and locally, as evidenced by the large number of research and development efforts.

3.1. Overview of international sources

Internationally, there is not only a variety of scientific articles on the topic of modeling, but also specialized modeling software.

One such tool is the SimpleScalar simulator. SimpleScalar reproduces the operation of a computing device by executing all program instructions using an interpreter. The toolkit supports popular instruction sets, including Alpha, PowerPC, x86, and ARM.

The article [2] describes a modeling framework for network processor systems. The article [3] describes NetBench, a benchmark suite for measuring performance. NetBench contains a total of 9 applications for network processors.

The article [4] proposes another simulator, NePSim. It includes a simulator for a typical network processor and a framework for testing and validation. It can also be used to estimate the power consumption of the simulated processor.

3.2. Overview of local sources

In Ukraine, research related to the modeling of network processors is conducted only at DonNTU.

At the university, this research is carried out by V.I. Grishchenko, Y.V. Ladyzhenskii, and Moataz Younis, as well as by master's students of past years: E.V. Miller, D.D. Morgaylov, and M.V. Matvienko.

In their article [5], V.I. Grishchenko, Y.V. Ladyzhenskii, and Moataz Younis describe the particular features of network processors, analyze modern network processors, their specifications and functionality, and propose a generalized structure reflecting future trends in the development of network processors.

An integrated method for simulating network processors, along with the method's shortcomings and ways of developing it further, is given in [6].

The features and functions of network processors, together with an analysis of modern network processors, their specifications and capabilities, are discussed in [7].

In [8], approaches to modeling packet-processing network processors are considered. The article presents a model of a multi-core network processor in which the instruction cache of the cores is large enough to hold all the code of the running applications, and compares the performance of network processors with different organizations of memory blocks.

In [9], approaches to modeling specialized multi-core packet-processing network processors are discussed, as well as a model of a network processor with separate memory for code and data.

4. Modeling the network processor cache

4.1. Basics of modeling

The typical approach to modeling computer systems uses a simple approximate model with good simulation performance and a modular code structure. This type of simulation is suitable for research because a simple model focuses on the basic components of the design without going into details that may adversely affect the performance and flexibility of the model.

On the other hand, industry requires detailed structural models to minimize risk. Detailed modeling gives confidence that the hardware will have no faulty components or bottlenecks.

Simulation modeling is a research method in which the system under study is replaced by a model that describes the real system with sufficient accuracy; experiments are then conducted with this model to obtain information about the system [10].

Existing simulation models provide a detailed view of all the features of the processor under test and can be used to analyze any application and any input stream. The disadvantage of simulation models is the complexity of their development and their specialization. Typically, these models are designed for only one or several similar structures, and extending them to support other devices is overly burdensome. For these reasons, simulation models, unlike analytical ones, are difficult to use for analyzing a large number of structures. However, they allow a closer look at the operation of all processor components and are less demanding regarding the form in which the input data stream and the running applications are described.

4.2. Cache simulation using SimpleScalar

A cache is a fast-access buffer memory containing the information that is most likely to be requested. Accessing data in the cache is faster than fetching the original data from slower memory or from a remote source, but the cache capacity is considerably limited compared with the original data store [11].

With a well-chosen cache replacement policy, significant performance gains can be achieved because the data will already be in the cache at the moment it is requested. The hierarchy of data requests is shown in Figure 2.

Figure 2 – Hierarchy of data requests (animation: 6 frames, 10 repetition cycles, 79.1 KB)
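The effect of the replacement policy on the hit rate can be illustrated with a small sketch. It is not taken from the thesis or from SimpleScalar: the function names, the fully associative organization, and the access trace are assumptions chosen only to compare two common policies, LRU and FIFO.

```python
from collections import OrderedDict, deque


def hit_rate_lru(trace, capacity):
    """Fraction of accesses served from a fully associative LRU cache."""
    cache = OrderedDict()  # block addresses, ordered from oldest to newest use
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
    return hits / len(trace)


def hit_rate_fifo(trace, capacity):
    """Fraction of accesses served from a fully associative FIFO cache."""
    resident = set()
    order = deque()  # insertion order; a hit does not change it
    hits = 0
    for block in trace:
        if block in resident:
            hits += 1
        else:
            if len(resident) >= capacity:
                resident.discard(order.popleft())  # evict the oldest inserted block
            resident.add(block)
            order.append(block)
    return hits / len(trace)


# Hypothetical trace: block 0 is reused often, the other blocks stream through.
TRACE = [0, 1, 0, 2, 0, 3, 0, 4, 0, 5]

print(hit_rate_lru(TRACE, 2), hit_rate_fifo(TRACE, 2))
```

On this trace with a capacity of two blocks, LRU serves 4 of the 10 accesses from the cache and FIFO serves 2, because FIFO evicts the frequently reused block 0 as soon as it becomes the oldest entry.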

Simulation starts with the choice of the network processor parameters: the number of cores, the sizes of the data and instruction caches, the number of supported hardware threads, the fabrication process used (to calculate the die area), and the test application. In the second stage, statistics are collected on the operation of the test application in the SimpleScalar simulation system. SimpleScalar is widely used in academic research for modeling processors with different structures and different instruction sets. It makes it possible to flexibly vary cache sizes, ALU parameters, the branch prediction unit, memory access latency, and other processor characteristics. Based on these data, the performance and area of the network processor are calculated; they are used to determine the effectiveness of the modeled structure. The network processor parameters are then adjusted and the whole simulation cycle is repeated [12]. The structure of network processor modeling is shown in Figure 3.

Figure 3 – Structure of network processor modeling
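The cycle of choosing parameters, collecting statistics, and evaluating performance and area can be sketched as a loop over candidate configurations. This is only an illustration: it replaces the step-by-step adjustment of parameters with an exhaustive sweep, and `NpConfig`, `run_simulation`, and `estimate_area` are hypothetical names. In a real study `run_simulation` would launch a simulator run; here it is a toy placeholder formula.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class NpConfig:
    cores: int
    icache_kb: int
    dcache_kb: int


def run_simulation(cfg):
    """Placeholder for a simulator run; returns a toy performance figure."""
    # Assumption: diminishing returns from larger caches, linear scaling with cores.
    per_core = 1.0 - 1.0 / (1 + cfg.icache_kb + cfg.dcache_kb)
    return cfg.cores * per_core


def estimate_area(cfg):
    """Placeholder area model in arbitrary units (toy formula)."""
    return cfg.cores * (10 + cfg.icache_kb + cfg.dcache_kb)


def explore(core_counts, icache_sizes, dcache_sizes):
    """Evaluate every configuration; return the best by performance per area."""
    best_cfg, best_score = None, float("-inf")
    for cores, ic, dc in product(core_counts, icache_sizes, dcache_sizes):
        cfg = NpConfig(cores, ic, dc)
        score = run_simulation(cfg) / estimate_area(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


best_cfg, best_score = explore([1, 2], [4, 8], [4, 8])
print(best_cfg, round(best_score, 4))
```

The efficiency metric used here, performance divided by area, is one possible choice; the numbers it produces say nothing about real network processors.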

SimpleScalar contains instruction interpreters for the ARM, x86, PPC, and Alpha instruction sets. The interpreters are written in a definition language that provides a comprehensive framework for describing how instructions change the contents of registers and the state of memory. A preprocessor uses these definitions to synthesize the interpreters, dependence analyzers, and microcode generators that SimpleScalar models require. With a small amount of additional effort, models can support several instruction sets.

The I/O emulation module provides simulated programs with access to I/O devices. SimpleScalar supports several I/O emulation modules, ranging from system-call emulation to full-system simulation. In system-call emulation, a call is emulated by translating it into an equivalent call to the host operating system and directing the simulator to execute the call on behalf of the simulated program. For example, if the simulated program tries to open a file, the I/O emulation module translates the request into a call to open() and returns the file descriptor or error code to the simulated program.
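The translation of a simulated open() call can be sketched as follows. This is not SimpleScalar code: the register names `a0`, `a1`, and `v0`, the representation of simulated memory as a byte array, and the negative-error-code convention are assumptions made for the illustration.

```python
import os


def read_c_string(memory, addr):
    """Read a NUL-terminated string from simulated memory (a bytearray)."""
    end = memory.index(0, addr)
    return memory[addr:end].decode()


def emulate_open(memory, regs):
    """Service a simulated open() call by calling the host operating system.

    Assumed (hypothetical) convention: regs["a0"] is the address of the path,
    regs["a1"] holds the flags, and the result is written to regs["v0"].
    """
    path = read_c_string(memory, regs["a0"])
    try:
        regs["v0"] = os.open(path, regs["a1"])  # host file descriptor
    except OSError as err:
        regs["v0"] = -err.errno  # negative error code on failure
    return regs["v0"]


# Usage: the simulated program asks to open the host's null device read-only.
memory = bytearray(os.devnull.encode() + b"\0")
regs = {"a0": 0, "a1": os.O_RDONLY}
fd = emulate_open(memory, regs)
if fd >= 0:
    os.close(fd)
```

The simulated program never touches the host file system directly; it only sees the value placed in its result register.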

Other modules provide more detailed simulation of actual hardware devices. For example, the SimpleScalar/ARM version includes an I/O device emulator for the Compaq iPAQ. This emulator is detailed enough to boot the ARM Linux operating system. Device-level I/O emulation is useful for analyzing the role of the operating system while an application runs. It has proven effective for server applications, where networking and file system services account for most of the runtime load.

The simulator core of each model defines the organization of the hardware model. The simulator code defines a main loop that performs one iteration for each instruction of the program until the program ends. For a timing model, the main loop must also account for execution time, measured in clock cycles for the given model. The cycle variable stores the execution time, recorded as the number of cycles the program has taken up to the current instruction.
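A main loop of this kind can be sketched in a few lines. The instruction format, the three opcodes, and the latency table below are invented for the illustration and do not correspond to any SimpleScalar instruction set; the program is straight-line code, so branches are not modeled.

```python
def run(program, latencies):
    """Execute a toy program, accumulating elapsed cycles per instruction.

    `program` is a list of (opcode, dest, a, b) tuples; `latencies` maps an
    opcode to its cost in cycles. Both are illustrative assumptions.
    """
    regs = {}
    cycle = 0  # execution time up to the current instruction, in cycles
    for opcode, dest, a, b in program:  # one iteration per instruction
        if opcode == "li":  # load immediate: dest = a
            regs[dest] = a
        elif opcode == "add":
            regs[dest] = regs[a] + regs[b]
        elif opcode == "mul":
            regs[dest] = regs[a] * regs[b]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        cycle += latencies[opcode]
    return regs, cycle


PROGRAM = [
    ("li", "r1", 6, None),
    ("li", "r2", 7, None),
    ("mul", "r3", "r1", "r2"),
    ("add", "r4", "r3", "r1"),
]
LATENCIES = {"li": 1, "add": 1, "mul": 3}  # made-up cycle costs

regs, cycle = run(PROGRAM, LATENCIES)
print(regs["r4"], cycle)
```

With the made-up latencies above, the four instructions leave 48 in `r4` and take 6 cycles in total.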

The cache.c module, supplied with the SimpleScalar distribution, implements the data cache. The cache module uses a hash table to record cache blocks. If the requested address corresponds to an entry in the hash table, the access is a cache hit. If the requested address has no entry in the hash table, the access is a cache miss. In that case the cache miss handler is called, which returns the number of cycles required to service the miss. The miss handler is defined by the model; it may be another cache module or a DRAM memory model. The cache module does not return the value read from the cache, because the value has no influence on cache access latency. For structures in which the cached value can affect latency, the cache module can be configured to return the value.
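The idea of a latency-only cache with a pluggable miss handler can be sketched as follows. This is not the cache.c implementation: the class name, the FIFO eviction, and the latency values are assumptions, and a Python dictionary stands in for the hash table.

```python
class Cache:
    """Latency-only cache model: tracks which blocks are present, not their data."""

    def __init__(self, num_blocks, block_size, hit_latency, miss_handler):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self.hit_latency = hit_latency
        self.miss_handler = miss_handler  # callable: address -> cycles
        self.blocks = {}  # hash table keyed by block number, in insertion order

    def access(self, addr):
        """Return the number of cycles needed to service an access to addr."""
        block = addr // self.block_size
        if block in self.blocks:
            return self.hit_latency
        # Miss: the next level reports how long it takes to fetch the block.
        miss_cycles = self.miss_handler(addr)
        if len(self.blocks) >= self.num_blocks:
            del self.blocks[next(iter(self.blocks))]  # evict the oldest block
        self.blocks[block] = True
        return self.hit_latency + miss_cycles


def dram(addr):
    """Hypothetical DRAM model with a fixed access latency."""
    return 100


# The miss handler of one level may be another cache or a memory model.
l2 = Cache(num_blocks=64, block_size=32, hit_latency=10, miss_handler=dram)
l1 = Cache(num_blocks=4, block_size=32, hit_latency=1, miss_handler=l2.access)
```

With these made-up latencies, the first access to an address misses in both levels and costs 1 + 10 + 100 = 111 cycles, while a later access to the same block costs 1 cycle.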

In addition to the standard components, SimpleScalar provides a variety of auxiliary modules that implement useful functions required in many models. These modules include a debugger, a program loader, a command-line processor, and a statistics package.

Conclusion

Studying the performance of network processors is an important task that can significantly reduce the time and labor costs of developing and improving network processors.

Further work will focus on refining SimpleScalar and conducting experiments on models of multi-core network processors in order to increase modeling accuracy.

References

  1. Todd Austin, Eric Larson, Dan Ernst – SimpleScalar: An Infrastructure for Computer System Modeling [Electronic source]. Access mode: https://web.eecs.umich.edu/~taustin/papers/IEEEcomp-simplescalar.pdf.
  2. Patrick Crowley, Jean-Loup Baer – A Modeling Framework for Network Processor Systems [Electronic source]. Access mode: http://arl.wustl.edu/~pcrowley/np1.pdf.
  3. Gokhan Memik, William H. Mangione-Smith, Wendong Hu – NetBench: A Benchmarking Suite for Network Processors [Electronic source]. Access mode: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.12.8449&rep=rep1&type=pdf.
  4. Yan Luo, Jun Yang, Laxmi N. Bhuyan, Li Zhao – NePSim: A network processor simulator with a power evaluation framework [Electronic source]. Access mode: http://www.cs.ucr.edu/~bhuyan/papers/micro.pdf.
  5. Грищенко В.И., Ладыженский Ю.В., Моатаз Юнис Основные направления развития современных сетевых процессоров. В сб. Научные труды ДонНТУ. Серия Информатика, кибернетика и вычислительная техника. – 2011. – Вып. 14(188). – с. 123–127.
  6. Грищенко В.И., Ладыженский Ю.В. Моделирование работы приложений на сетевых процессорах. // Моделирование и компьютерная графика : Материалы 2-й международной научно-технической конференции, г. Донецк, 10-12 октября 2007 г. — Донецк, ДонНТУ, Министерство образования и науки Украины, 2007. — с. 167-173.
  7. Грищенко В.И., Ладыженский Ю.В., Моатаз Юнис Перспективные архитектуры и тенденции развития современных сетевых процессоров. // Моделирование и компьютерная графика : Материалы 4-й международной научно-технической конференции, г. Донецк, 5-8 октября 2011 г. — Донецк, ДонНТУ, Министерство образования и науки Украины, 2011. — с. 93-97.
  8. Грищенко В.И., Ладыженский Ю.В., Моатаз Юнис Влияние выделенного кэша команд на производительность сетевого процессора. В сб. Научные труды ДонНТУ. Серия Информатика, кибернетика и вычислительная техника. – 2011. – Вып. 13(185). – с. 85-91.
  9. Грищенко В.И., Ладыженский Ю.В. Исследование влияния раздельной памяти на производительность многоядерного сетевого процессора. В сб. Научные труды ДонНТУ. Серия Проблемы моделирования и автоматизации проектирования (МАП-2011). Вып. 9(179). – Донецк: ДонНТУ. – 2011. – 356 с.
  10. Имитационное моделирование [Electronic source]. Access mode: http://ru.wikipedia.org/wiki/Имитационное моделирование.
  11. Кэш [Electronic source]. Access mode: https://ru.wikipedia.org/wiki/%D0%9A%D1%8D%D1%88.
  12. Грищенко В.И. Оптимизация методики моделирования кэша сетевых процессоров // Моделирование и компьютерная графика: Материалы 3-й международной научно-технической конференции, г. Донецк, 7-9 октября 2009 г. — Донецк: ДонНТУ, Министерство образования и науки Украины, 2009. — C. 249—254.