Taras Soroka

Faculty: Computer Science and Technology

Department of Applied Mathematics and Informatics

Speciality "Software engineering"

Theme of master's work:

Research of hierarchic memory systems efficiency

Scientific adviser: prof. Lev Feldman

Resume

Abstract

   Introduction
   Relevance of the work
   Scientific novelty
   Goals and objectives of the work
   Hierarchy of storage devices
   Cache coherence problem in multiprocessors
   Expected practical results
   References

Introduction

The main characteristics of any computer system are the CPU performance and memory capacity. In the Master's work is analyzed the effectiveness of memory hierarchy and designed the optimum cache memory system to speed up the interaction between the CPU with RAM.

Clock speed of modern processors exceeds 3 GHz. At the same time desktop computers have memory reaches 4 GB. The main problem in the further development of computer systems is a sharp distinction between performances of two components: the CPU and memory. With increasing processor performance, this gap is increased only, so in fact the higher the CPU frequency, the longer it runs idle. At the same time, developers and manufacturers of RAM are guided not by increasing the speed of memory access, and to further increase its volume.

Relevance of the work

This problem was solved using a multilevel cache, which runs at speeds far exceeding the speed of main memory. There is no single answer to the question of what exactly cache architecture is optimal. Since 1960's, was developed a number of schemes of the cache memory and the algorithms of its work, but today, developers do not stop research in this field: almost every new processor family has its own organization of cache memory.

The performance of the cache memory is a relative measure and depends on the solved task. In this regard, it is necessary to simulate the cache system's work for solving various problems. Modeling system that allows to choose the best structure of cache memory, allows to develop effective architecture for solving various scientific problems.

Scientific novelty

Assumed design and implementation of a distributed modeling system, based on a flexible multi-parameter simulation model of multiprocessor, which, unlike most counterparts, will not fix hardly the characteristics of cache memory, but allows to edit them in a wide range.

Goals and objectives of the work

The aim is to develop and implement a fully functional simulator to study the effectiveness of different configurations of multiprocessor's cache memory, based on simulation models of these configurations. To achieve the goal were set tasks:

review and analyze the algorithms and the structure of the cache for a single-processor system;
review and analyze the possible structure of the cache for multiprocessor with cache coherence subsystem;
develop a multiprocessor simulation model that supports the parameterization of the structure of cache memory, algorithms of its work and the coherence protocol;
develop a simulation model of multiprocessor workload, which will make active use of the cache memory;
implement this models and link them into a single modeling system;
provide an interactive calculation of multiprocessor performance parameters;
make a distributed model of a multiprocessor.

Hierarchy of storage devices

When you create a memory system you always have to solve the problem of providing the required capacity and high performance at an affordable price. The most common approach here is to construct a system of computer memory in a hierarchical manner [1]. Hierarchical memory consists of multiple storage devices with different capacity, access speed and cost: registers, cache memory, main memory, hard drive, removable media (see Fig. 1). Here, as we move downward change 3 parameters: increased capacity, increased access time, reduced cost per bit.

Figure 1 – Hierarchy of storage devices

If the memory system is designed well, then as you move down the hierarchy levels will also decrease processor reference frequency to the memory.

Cache memory, also can be divided into several levels: L1, L2, L3, L4 (for supercomputers only). The first level – the fastest and has the smallest capacity, cache is the third level – the biggest and the slowest (but faster than the RAM). Comparison of memory access time is shown in Table 1.

Table 1. Memory access time

Storage device	Estimated access time, ns
CPU registers	<10ns
cache L1	10-15ns
cache L2	15-30ns
cache L3	40-50ns
main memory (RAM)	>100ns

Capacity of processor registers - 100 bytes, of a cache in the first level - up to 128 KB, of a cache (L2) - 128 KB - 1-12 MB, of a cache L3 - up to 24 MB and of the RAM - several gigabytes.

Hierarchical organization of memory leads to decrease in its value at a given level of performance. At each level of the hierarchy data is divided into blocks. Block size can be fixed or variable. Block size at each level often varies and increases from the upper to the lower levels.

The memory access time is very important for computations. The idea of hierarchical storage management is to move the commands and data to be used as soon as possible to the processor [2]. If the processor needs some data, then it starts to look for them, starting with the highest level, if data at top level is not found, then the processor looks for them in the lower level and so on, until the lowest level.

Before contacting to the memory, the processor looks at all levels of cache memory; if required word is in the cache, then reference to the RAM is not required, thus the effectiveness of the cache depends largely on the effectiveness of the memory system.

Cache coherence problem in multiprocessors

Fig. 2 as an example shows the status of dual-processor system with shared memory and caches for each processor before and after the shared variable changing. When the X-value is changed with CPU2, it is written to Cache2 or both to Cache2 ant to RAM. In both cases, the value of shared variable in the Cache1 turns out to be wrong, which leads to inconsistency of the contents of caches.

Cache coherence problem (MP Gif Animator, 148 KB)

Figure 2 – Cache coherence problem

According to [4], the system is coherent if each read operation on an address, executed by any processor returns the value entered in the during the last write operation to this address, regardless of which processor is writing the last.

There are several ways to solve the problem of coherence in hardware [5]:

refusal of caching: fundamentally solve the problem, but it greatly reduces the system performance due to significant differences between processor speed and the speed of memory access;
broadcast: when the value of a variable is changed with the processors used variable recording into the cache of this processor and shared memory, immediately thereafter, all processors sent a request to update this variable in their caches - the easiest and most foolproof method, but creates excessive amount of messages through the communications media, which also significantly reduces overall system performance;
implementation of specialized protocols coherence: is the hardware implementation of a specific algorithm - the most difficult way, requiring pre-verification and analysis at the level of models, but allows to maximize the performance of the parallel system.

By cache coherence protocol understands any hardware mechanism to overcome the problem of coherence. First Protocol of the cache coherence ("write-once") described by Goodman in 1983 [6]. The first systematic work on the analysis and comparison of several cache coherence protocols is [7]. Among more recent works on verification, simulation and analysis of the coherence protocols can be noted [8,9].

Each cache coherence protocol consists of a specification of possible block states in the local caches and the actions that are to be taken by the cache controller as certain bus transactions are observed [7].

Expected practical results

It is expected to get:

dependence of the hit cache memory from its structure and characteristics: number of levels, the block size, cache size, replacement algorithm, mapping, and displacement;
load factor of multiprocessor cores and the whole system depending on the cache coherence protocol and number of processors; repression;
load factor on the system bus depending on the cache coherence protocol and number of processors.

References

Орлов С.А., Цилькер Б.Я. Организация ЭВМ и систем: учебник для вузов. 2-е изд. – CПб.: Питер, 2011. – 668 с.
Таненбаум Э. Архитектура компьютера. 5-е изд. – CПб.: Питер, 2007. – 844 с.
Пом А., Агравал О. Быстродействующие системы памяти: Пер. с англ. – М.: Мир, 1987. – 264 с.
Tomasevic M., Milutinovic V. “The cache coherence problem in shared–memory multiprocessors”, IEEE Computer Society Press, LosAlamos, CA. 1993.
Сорока Т.Е., Фельдман Л.П., Михайлова Т.В. «Обзор протоколов когерентности кэш-памяти» // Iнформацiйнi управляючi системи та комп`ютерний монiторiнг / Матерiали II всеукраїнської конференцiї студентiв, аспiрантiв та молодих вчених – 11-13 квiтня 2011р., Донецьк, ДонНТУ – 2011, с. 168-172
Goodman J. R., “Using cache memory to reduce processor–memory traffic”, Proceeding of the 10th International Symposium on Computer Architecture, 1983.
Archibald J., Baer J., “Cache coherence protocols: evaluation using a multiprocessor simulation model”. ACM Trans. Comput. Syst., 1986.
Emerson E. A., Kahlon V. “Rapid parameterized model checking of snoopy cache coherence protocol”, Proceedings of 9th International Workshop on frontiers of combining systems, 2002.
Рудометов В. В., Семенов В. С., «Анализ когерентности кэш-памятей для повышения эффективности тестирования подсистемы памяти». //Сборник научно-технических трудов «Высокопроизводительные вычислительные системы и микропроцессоры», – М.: ИМВС РАН №4, 2003.
Фельдман Л.П., Дедищев В.О. Математическое обеспечение САПР: Моделирование вычислительных и управляющих систем. – К.: УМК ВО, 1992. – 256 с.
Гуров В. В. Архитектура микропроцессоров. – [Internet resource]. – Access mode: http://www.intuit.ru/department/hardware/microarch.
Крис Касперски. «Подсистема кэш-памяти, как она есть». [Internet resource]. – Access mode: http://www.insidepro.com/kk/008/008r.shtml.

The master's work have not been completed yet. Final completion is on December 1, 2011. Full text of the work can be obtained from the author or his advisor after this date.

Resume

Abstract