Maksim Sergeevich Filenko
DonNTU master's student Maksim Sergeevich Filenko

Faculty of Computer Science

Major: “Computer Systems and Networks”

Dissertation topic: “Development of a distributed storage and information protection method”

Scientific adviser: Malcheva Raisa Victorovna, Ph.D.


Abstract of the dissertation

§ Introduction

The relevance of the topic. The rapid development of telecommunication systems and of local and global computer networks has led to a dramatic increase in the capacity of data transmission channels. As a result, the speed of information exchange between nodes in a computer network often exceeds the speed of working with a local hard disk. Until these speeds even out, the advantages of a distributed network should be exploited: reliability, scalability, flexibility and greater information security thanks to the possibility of geographic dispersion. Therefore, work aimed at developing a method for distributed storage of information is relevant.

Related academic programs, plans, topics. The master's work is being carried out during 2008-2009 within the "Digital City" project, in which Donetsk National Technical University takes a leading position.

The purpose and objectives of the research. The goal of this work is to create an information storage technology that provides an adequate level of security and optimizes the use of disk resources.
The idea of the work is to treat a computer network as a single disk space. Information is divided into blocks, which are distributed to nodes for storage.
The main objectives of development and research. To achieve this goal it is necessary to:

  1. Analyze network protocols of peer-to-peer computer networks and methods of cryptographic protection of information.
  2. Develop an algorithm for partitioning data into blocks of equal size.
  3. Create a URI scheme for reading and writing data.
  4. Examine information storage schemes in order to determine their effective areas of application.
The subject of development and research: decentralized computer networks.
The object of research: information-sharing protocol in a decentralized network.
Methodology and research methods. The research draws on the theory of computer networks, graph theory, probability theory, the formal apparatus of cryptography, engineering analysis and the theory of modeling.

Scientific novelty is determined by the following statements:

The practical significance of the results lies in the development of cross-platform software that implements the proposed method of distributed information storage.

Review of related research and developments. The master's works of Babenko and Dosta can be considered related local research; they studied LAN-based computing clusters. At the national level, a GRID infrastructure is being created. Global research has gone much further and has even produced a new paradigm, cloud computing, which offers its resources to end users fully transparently, hiding the internal structure and creating the impression of a single system. Leading research on this topic is carried out at the Department of Computer Science of the University of California.

§ Main content of the work

The introduction justifies the relevance of the topic of the master's work and formulates the goal, motivation and objectives of the research, the idea of the work and its scientific novelty.

The first section contains an analysis of modern methods of storing information, storage optimization methods and ways of transmitting information in peer-to-peer networks, and states the main task of the research.
Quite a few unified data repositories are available today; they differ from one another, meeting different requirements placed on a storage system.
In computer storage, logical volume management (LVM) is a method of allocating space on mass storage devices that is more flexible than conventional partitioning schemes. In particular, a volume manager can concatenate, stripe together or otherwise combine partitions into larger virtual ones that can be resized or moved, possibly while they are in use.
Volume management is one of many forms of storage virtualization, the one implemented as a layer in the OS disk driver stack.
Most volume manager implementations share the same basic design and start with physical volumes (PVs), which can be hard disk partitions, RAID devices or a SAN. PVs are split into chunks called physical extents (PEs). Some volume managers (such as that in HP-UX and Linux) will have PEs of a uniform size; others (such as that in Veritas) will have variably-sized PEs that can be split and merged at will.
Normally, PEs are simply mapped one-to-one to logical extents (LEs). With mirroring, multiple PEs are mapped onto each LE. These PEs are drawn from a physical volume group (PVG), a set of same-sized PVs which act similarly to hard disks in a RAID1 array. PVGs are usually laid out so that they reside on different disks and/or data buses for maximum redundancy.
The LEs are pooled into a volume group (VG). The pooled LEs can then be concatenated together into virtual disk partitions called logical volumes or LVs. LVs are usable as raw block devices just as disk partitions are: mountable file systems can be created on them, or they can be used as swap storage.
On striped LVs, each successive LE is allocated from a different PV; depending on the size of the LE, this can improve performance on large sequential reads by bringing to bear the combined read throughput of multiple PVs.
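To make the extent mapping concrete, the following Python sketch illustrates only the general scheme, not any particular volume manager; the physical volume names, extent counts and extent size are assumptions made up for the example. It allocates the logical extents of a striped LV round-robin across several physical volumes:

    # Sketch of striped extent allocation; PV names and sizes are invented.
    PE_SIZE_MB = 4                                # uniform extent size (LVM-style)
    PV_NAMES = ["pv0", "pv1", "pv2"]              # hypothetical physical volumes

    def allocate_striped_lv(num_les, pv_names):
        """Map each logical extent to (PV name, PE index); successive LEs
        are drawn from different PVs, which is what striping means here."""
        next_free_pe = {name: 0 for name in pv_names}
        mapping = {}
        for le in range(num_les):
            pv = pv_names[le % len(pv_names)]     # round-robin over the PVs
            mapping[le] = (pv, next_free_pe[pv])
            next_free_pe[pv] += 1
        return mapping

    for le, (pv, pe) in allocate_striped_lv(6, PV_NAMES).items():
        print(f"LE {le} ({PE_SIZE_MB} MB) -> {pv}, PE {pe}")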
The LVs can be grown or shrunk by concatenating more LEs from or returning them to the pool. The concatenated LEs do not have to be contiguous. This allows LVs to be grown without having to move already-allocated LEs. Some volume managers allow LVs to be resized in either direction while online. Changing the size of the LV does not necessarily change the size of a filesystem on it; it merely changes the size of its containing space. A file system that can be resized online is recommended because it allows the system to adjust its storage on-the-fly without interrupting applications.
PVs and LVs cannot be shared between or span different VGs (although some volume managers may allow them to be moved at will between VGs on the same host). This allows VGs to be conveniently brought online, taken offline or moved between host systems as a single administrative unit.
VGs can grow their storage pool by absorbing new PVs or shrink by retracting from PVs. This may involve moving already-allocated LEs out of the PV. Most volume managers can perform this movement online; this allows storage (if the underlying hardware is hot-pluggable) to be upgraded or replaced without system downtime.

Figure 1 — Logical Volume Manager.

Some volume managers also implement snapshots by applying copy-on-write to each LE. In this scheme, the volume manager will copy the LE to a copy-on-write table just before it is written to. This preserves an old version of the LV (the snapshot) which can later be reconstructed by overlaying the copy-on-write table atop the current LV. Snapshots which are read-write are branching snapshots because they implicitly allow diverging versions of an LV.
Snapshots can be useful for backing up self-consistent versions of volatile data like table files from a busy database, or for rolling back large changes in one swoop, such as an operating system upgrade. Some Linux-based Live CD systems also use snapshots to simulate read-write access on a read-only compact disc.
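The copy-on-write mechanism itself can be shown with a small in-memory Python sketch; this is a toy model of the idea only (the class name and the extent representation are invented), not how any real volume manager stores snapshots:

    # Toy copy-on-write snapshot over a list of "extents" (plain strings here).
    class VolumeWithSnapshot:
        def __init__(self, extents):
            self.extents = list(extents)   # current contents, one item per LE
            self.cow = {}                  # LE index -> contents at snapshot time

        def write(self, index, data):
            # Preserve the old contents the first time an extent is overwritten.
            if index not in self.cow:
                self.cow[index] = self.extents[index]
            self.extents[index] = data

        def snapshot_view(self):
            # Overlay the copy-on-write table on top of the current volume.
            return [self.cow.get(i, e) for i, e in enumerate(self.extents)]

    vol = VolumeWithSnapshot(["a", "b", "c"])   # snapshot taken at creation time
    vol.write(1, "B")
    print(vol.extents)                          # ['a', 'B', 'c']  current state
    print(vol.snapshot_view())                  # ['a', 'b', 'c']  snapshot state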
The LVM can:

This can be useful when migrating whole logical volumes to or from offline storage. The idea of combining physical disks into a single storage array was developed at the University of California, Berkeley.
RAID is an acronym first defined by David A. Patterson, Garth A. Gibson and Randy Katz at the University of California, Berkeley in 1987 to describe a Redundant Array of Inexpensive Disks,[1] a technology that allowed computer users to achieve high levels of storage reliability from low-cost and less reliable PC-class disk-drive components, via the technique of arranging the devices into arrays for redundancy.
More recently, marketers representing industry RAID manufacturers reinvented the term to describe a Redundant Array of Independent Disks as a means of disassociating a "low cost" expectation from RAID technology.[2]
"RAID" is now used as an umbrella term for computer data storage schemes that can divide and replicate data among multiple hard disk drives. The different schemes/architectures are named by the word RAID followed by a number, as in RAID 0, RAID 1, etc. RAID's various designs all involve two key design goals: increased data reliability or increased input/output performance. When multiple physical disks are set up to use RAID technology, they are said to be in a RAID array. This array distributes data across multiple disks, but the array is seen by the computer user and operating system as one single disk. RAID can be set up to serve several different purposes.
Redundancy is achieved by either writing the same data to multiple drives (known as mirroring), or writing extra data (known as parity data) across the array, calculated such that the failure of one (or possibly more, depending on the type of RAID) disks in the array will not result in loss of data. A failed disk may be replaced by a new one, and the lost data reconstructed from the remaining data and the parity data. Organizing disks into a redundant array decreases the usable storage capacity. For instance, a 2-disk RAID 1 array loses half of the total capacity that would have otherwise been available using both disks independently, and a RAID 5 array with several disks loses the capacity of one disk. Other types of RAID arrays are arranged so that they are faster to write to and read from than a single disk.
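The parity idea can be illustrated with a short Python sketch (a sketch of the principle only, not of a real RAID implementation; block contents and sizes are arbitrary): an XOR parity block is computed over equally sized data blocks, and any single lost block is rebuilt from the survivors and the parity.

    from functools import reduce

    def xor_blocks(blocks):
        """Byte-wise XOR of equally sized blocks."""
        return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

    # Three equally sized data blocks on three "disks" plus one parity block.
    data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
    parity = xor_blocks(data_blocks)

    # Suppose the second disk fails: its block is recovered from the rest.
    survivors = [data_blocks[0], data_blocks[2], parity]
    assert xor_blocks(survivors) == data_blocks[1]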
There are various combinations of these approaches giving different trade-offs of protection against data loss, capacity, and speed. RAID levels 0, 1, and 5 are the most commonly found, and cover most requirements.
RAID combines two or more physical hard disks into a single logical unit by using either special hardware or software. Hardware solutions are often designed to present themselves to the attached system as a single hard drive, so that the operating system is unaware of the technical workings. For example, a 1TB hardware RAID 5 array might be configured from three 500GB hard drives; the operating system would simply be presented with a single 1TB disk. Software solutions are typically implemented in the operating system and present the RAID drive as a single drive to applications running on the operating system.
There are three key concepts in RAID: mirroring, the copying of data to more than one disk; striping, the splitting of data across more than one disk; and error correction, where redundant data is stored to allow problems to be detected and possibly fixed (known as fault tolerance). Different RAID levels use one or more of these techniques, depending on the system requirements. RAID's main aim can be either to improve reliability and availability of data, ensuring that important data is available more often than not (e.g. a database of customer orders), or merely to improve the access speed to files (e.g. for a system that delivers video on demand TV programs to many viewers).
The configuration affects reliability and performance in different ways. The problem with using more disks is that it is more likely that one will fail, but by using error checking the total system can be made more reliable by being able to survive and repair the failure. Basic mirroring can speed up reading data as a system can read different data from both the disks, but it may be slow for writing if the configuration requires that both disks must confirm that the data is correctly written. Striping is often used for performance, where it allows sequences of data to be read from multiple disks at the same time. Error checking typically will slow the system down as data needs to be read from several places and compared. The design of RAID systems is therefore a compromise and understanding the requirements of a system is important. Modern disk arrays typically provide the facility to select the appropriate RAID configuration.
The RAID level closest to our storage goal is RAID 0, but it cannot be used in a network environment where the physical disks are attached to different nodes.
Figure 2 — RAID 0.

RAID 0 (striped disks) distributes data across several disks in a way that gives improved speed and no lost capacity, but all data on all disks will be lost if any one disk fails. Although such an array has no actual redundancy, it is customary to call it RAID 0.
RAID 0 ("Striped set without parity" or "Striping") provides improved performance and additional storage but no redundancy or fault tolerance. Any disk failure destroys the array, which has greater consequences with more disks in the array (at a minimum, catastrophic data loss is twice as severe compared to single drives without RAID). A single disk failure destroys the entire array because when data is written to a RAID 0 drive, the data is broken into fragments. The number of fragments is dictated by the number of disks in the array. The fragments are written to their respective disks simultaneously on the same sector. This allows smaller sections of the entire chunk of data to be read off the drive in parallel, increasing bandwidth. RAID 0 does not implement error checking so any error is unrecoverable. More disks in the array means higher bandwidth, but greater risk of data loss.
Moreover, neither approach gives a final and complete solution to the problem of organizing the repository, since both LVM and RAID are implemented within a local physical system rather than over a LAN: they cannot mount resources at the network level.

All the resources of the information infrastructure need to be united into a single environment with an interface that is transparent to the user. A prototype can be seen in cluster systems: computers joined by a network that transparently offer their computing power as if the work were carried out on a single machine. Either a local (LAN) or a global network can serve as the interconnect. A cluster, in general, provides the CPU time of all of its nodes for mathematical calculations. The challenge is to find and improve, or to build, a mechanism for storing files in a similar cluster-like arrangement.
The basis for constructing such a scheme is the concept of distributing stored information across P2P networks.

A peer-to-peer (P2P, or decentralized) network is a computer network based on the equality of its participants. In such networks there are no dedicated servers, and each node (peer) acts as both a client and a server. Unlike a client-server architecture, this organization keeps the network operating with any number and any combination of available nodes.
A peer-to-peer (or P2P) computer network uses diverse connectivity between participants in a network and the cumulative bandwidth of network participants rather than conventional centralized resources where a relatively low number of servers provide the core value to a service or application.
An important goal in P2P networks is that all clients provide resources, including bandwidth, storage space, and computing power. Thus, as nodes arrive and demand on the system increases, the total capacity of the system also increases. This is not true of a client-server architecture with a fixed set of servers, in which adding more clients could mean slower data transfer for all users.
The distributed nature of P2P networks also increases robustness in case of failures by replicating data over multiple peers, and—in pure P2P systems—by enabling peers to find the data without relying on a centralized index server. In the latter case, there is no single point of failure in the system.[3]
This nature helps with sharing extremely large files: every peer that downloads also uploads, so all "pieces" are exchanged among users at the same time. Depending on the type of network, however, P2P systems are also vulnerable. As with most network systems, insecure and unsigned code may allow remote access to files on a victim's computer or even compromise the entire network. In the past this happened, for example, to the FastTrack network, when anti-P2P companies managed to introduce faked chunks into downloads, after which the downloaded files (mostly MP3 files) were unusable or even contained malicious code. As a result of this and similar tactics, today's P2P networks have greatly strengthened their security and file verification mechanisms. Modern hashing, chunk verification and various encryption methods have made most networks resistant to almost any type of attack, even when major parts of the network have been replaced by faked or nonfunctional hosts.

The second section contains the algorithm of distributed file storage, which can be described by the following sequence of steps:

  1. File segmentation.
  2. Calculating hashes.
  3. Distributing the blocks and generating URLs.
  4. Storing procedure.
In most cases a file cannot be divided into blocks of the same size, so it is necessary to put a special end-of-data mark after the data and fill the rest of the last block with random information.
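The first three steps can be sketched in Python as follows; the block size, the value of the end-of-data mark, the choice of SHA-1 as the hash function and the "block://" URL form are assumptions made for illustration, not fixed by this work. The padding bytes are random but never equal to the mark, so the last occurrence of the mark unambiguously terminates the data:

    import hashlib, os

    BLOCK_SIZE = 4096        # assumed block size
    END_MARK = 0x80          # assumed end-of-data mark

    def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
        """Step 1: append the end-of-data mark, pad the tail with random bytes
        (never equal to the mark) and cut the result into equal blocks."""
        padded = bytearray(data)
        padded.append(END_MARK)
        while len(padded) % block_size:
            filler = os.urandom(1)[0]
            if filler != END_MARK:
                padded.append(filler)
        return [bytes(padded[i:i + block_size])
                for i in range(0, len(padded), block_size)]

    def block_urls(blocks):
        """Steps 2-3: hash each block and build a hypothetical URL for it."""
        return ["block://" + hashlib.sha1(b).hexdigest() for b in blocks]

    def join_blocks(blocks):
        """Reverse of step 1: the last END_MARK terminates the real data."""
        joined = b"".join(blocks)
        return joined[:joined.rfind(bytes([END_MARK]))]

    payload = os.urandom(10000)          # data itself may contain 0x80 bytes
    blocks = split_into_blocks(payload)
    assert join_blocks(blocks) == payload
    print(len(blocks), block_urls(blocks)[0])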
Figure 3 — The process of storing a file in a distributed manner.
(Animation. Number of frames - 3, number of cycles - 7, size - 38,263 bytes)

The third section describes possible ways of cryptographic protection.
In cryptography, the one-time pad (OTP) is an encryption algorithm in which the plaintext is combined with a secret random key (or pad) that is as long as the plaintext and used only once. Modular addition is typically used to combine plaintext elements with pad elements (for binary data, the XOR operation amounts to the same thing). It was invented in 1917 and patented a couple of years later. If the key is truly random, never reused in whole or in part, and kept secret, the one-time pad provides perfect secrecy. It has also been proven that any cipher with the perfect secrecy property must use keys with effectively the same requirements as OTP keys. The key normally consists of a random stream of numbers, each of which indicates the number of places in the alphabet (or number stream, if the plaintext message is in numerical form) by which the corresponding letter or number in the plaintext message should be shifted. For messages in the Latin alphabet, for example, the key will consist of a random string of numbers between 0 and 25; for binary messages the key will consist of a random string of 0s and 1s; and so on.
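For binary data the one-time pad is just a byte-wise XOR with a random, never-reused key of the same length. The following Python sketch shows both directions (the message text is arbitrary, and os.urandom stands in for a true random source, which a real one-time pad would require):

    import os

    def otp_xor(message: bytes, pad: bytes) -> bytes:
        """XOR the message with the pad; because XOR is its own inverse,
        the same function both encrypts and decrypts."""
        assert len(pad) >= len(message), "pad must be at least as long as the message"
        return bytes(m ^ k for m, k in zip(message, pad))

    plaintext = b"attack at dawn"
    pad = os.urandom(len(plaintext))     # pad: used once, kept secret, then destroyed
    ciphertext = otp_xor(plaintext, pad)
    assert otp_xor(ciphertext, pad) == plaintext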
The "pad" part of the name comes from early implementations where the key material was distributed as a pad of paper, so the top sheet could be easily torn off and destroyed after use. For easy concealment, the pad was sometimes reduced to such a small size that a powerful magnifying glass was required to use it. Photos accessible on the Internet show captured KGB pads that fit in the palm of one's hand,[13] or in a walnut shell.[14] To increase security, one-time-pads were sometimes printed onto sheets of highly flammable nitrocellulose.
The one-time pad is derived from the Vernam cipher, named after Gilbert Vernam, one of its inventors. Vernam's system was a cipher that combined a message with a key read from a paper tape loop. In its original form, Vernam's system was not unbreakable because the key could be reused. One-time use came a little later, when Joseph Mauborgne recognized that if the key tape was totally random, cryptanalytic difficulty would be increased.
There is some ambiguity to the term due to the fact that some authors use the term "Vernam cipher" synonymously for the "one-time-pad", while others refer to any additive stream cipher as a "Vernam cipher", including those based on a cryptographically secure pseudorandom number generator (CSPRNG).[15]

The fourth section develops fault-tolerance features.
The method used in ECC memory can correct single errors and detect double errors on the fly, so it can serve as a basis for protecting information blocks against loss due to node failures.
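As a minimal illustration of the single-error-correcting principle behind ECC memory, the following Python sketch uses a classic Hamming(7,4) code over individual bits (the block-level protection scheme developed in this section is not reproduced here): four data bits are encoded with three parity bits, and a single flipped bit is located and corrected.

    # Hamming(7,4): 4 data bits + 3 parity bits; any single bit error in the
    # 7-bit codeword can be located and corrected.
    def encode(d):                        # d = [d1, d2, d3, d4]
        p1 = d[0] ^ d[1] ^ d[3]           # parity over positions 1, 3, 5, 7
        p2 = d[0] ^ d[2] ^ d[3]           # parity over positions 2, 3, 6, 7
        p3 = d[1] ^ d[2] ^ d[3]           # parity over positions 4, 5, 6, 7
        return [p1, p2, d[0], p3, d[1], d[2], d[3]]     # codeword, positions 1..7

    def decode(c):                        # c = 7-bit codeword, possibly corrupted
        s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
        s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
        s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
        error_pos = s1 + 2 * s2 + 4 * s3  # 0 means no single-bit error detected
        if error_pos:
            c = list(c)
            c[error_pos - 1] ^= 1         # flip the faulty bit back
        return [c[2], c[4], c[5], c[6]]   # the recovered data bits d1..d4

    data = [1, 0, 1, 1]
    codeword = encode(data)
    codeword[4] ^= 1                      # simulate a single bit error
    assert decode(codeword) == data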

The fifth section contains information about the effective practical use of the technology being developed.

Important notice! This work is still in progress. ETA is December 2009. Feel free to contact the author or scientific adviser for more information.

§ Further Reading

  1. David A. Patterson, Garth Gibson, Randy H. Katz, A Case for Redundant Arrays of Inexpensive Disks (RAID), University of California, Berkeley, 1988.
  2. Gupta, Meeta, Storage Area Network Fundamentals, Cisco Press. Appendix A.
  3. Advantages of peer-to-peer networks - http://www.solyrich.com/p2p-pros-cons.asp
  4. Lewis, LVM HOWTO, Linux Documentation Project - http://tldp.org/HOWTO/LVM-HOWTO
  5. Konstantin Baranov, Linux Logical Volume Manager - http://const.tltsu.ru/articles/lvm.pdf
  6. Javed I. Khan, Adam Wierzbicki, Foundation of Peer-to-Peer Computing, Elsevier Journal of Computer Communication, February 2008. Volume 31, Issue 2.
  7. Ross J. Anderson, The eternity service, Pragocrypt, 1996.
  8. Marling Engle, J. I. Khan, Vulnerabilities of P2P systems and a critical look at their solutions, May 2006 - http://www.medianet.kent.edu/techreports/TR2006-11-01-p2pvuln-EK.pdf
  9. Stephanos Androutsellis-Theotokis, Diomidis Spinellis, A survey of peer-to-peer content distribution technologies, ACM Computing Surveys, December 2004.
  10. John F. Buford, Heather Yu, Eng Keong Lua, P2P Networking and Applications, Morgan Kaufmann, December 2008.
  11. Ralf Steinmetz, Klaus Wehrle, Peer-to-Peer Systems and Applications, Lecture Notes in Computer Science, September 2005.
  12. Detlef Schoder, Kai Fischbach, Core Concepts in Peer-to-Peer Networking, Idea Group Inc, 2005 - http://www.idea-group.com/downloads/excerpts/Subramanian01.pdf
  13. Marcus J. Ranum, One-Time-Pad (Vernam's Cipher) Frequently Asked Questions, 1995 - http://www.ranum.com/security/computer_security/papers/otp-faq
  14. Savory, Stuart, One-Time-Pad, Chiffriergerätebau - http://home.egge.net/~savory/chiffre9.htm
  15. Kahn, David, The Codebreakers, Macmillan, 1967.