NTFS file system


By Dmitriy Mikhailov
Source: digit-life.com






The Microsoft operating systems of the Windows NT family are hard to imagine without the NTFS file system - one of the most complex and successful file systems in existence. This article describes the features and shortcomings of the system, the principles on which it organises information, how to keep it in a stable condition, and what possibilities NTFS offers and how an ordinary user can take advantage of them.



Part 1. NTFS physical structure


Let's begin with the general facts. In theory an NTFS partition can be of almost any size. A limit certainly exists, but I shall not state it here, since it will be more than enough for the next hundred years of computer technology development at any growth rate. What about practice? Almost the same. At the moment the maximum size of an NTFS partition is limited only by the size of hard disks. NT4, however, will probably have problems installing onto a partition if any part of it lies more than 8 GBytes from the physical beginning of the disk, but this problem concerns only the boot partition.

The way NT4.0 installs onto an empty disk is rather peculiar and can give a false impression of what NTFS is capable of. If you tell the installation program that you wish to format the disk as NTFS, the maximum size it will offer you is only 4 GBytes. Why so little, if the size of an NTFS partition is practically unlimited? The answer is that the installer simply does not know this file system. :) The installation program formats the disk as ordinary FAT, whose maximum size in NT is 4 GBytes (using a not quite standard, huge 64 KByte cluster), and NT is installed onto this FAT. Then, during the first load of the operating system (still in the installation phase), the partition is quickly converted to NTFS, so that the user notices nothing except the strange "limit" on the NTFS size at installation time.
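The 4 GByte figure can be checked with simple arithmetic. The sketch below assumes that the "usual FAT" here is FAT16, i.e. that a cluster is addressed by a 16-bit number; it also ignores the few cluster numbers FAT reserves for its own use, so the real limit is slightly lower. The function name is made up for the illustration.

```python
# Assumption: FAT16 addresses clusters with a 16-bit number,
# so at most 2**16 clusters exist (reserved values are ignored here).
CLUSTER_INDEX_BITS = 16
CLUSTER_SIZE = 64 * 1024  # the non-standard 64 KByte cluster from the text


def max_fat_partition_bytes(index_bits: int = CLUSTER_INDEX_BITS,
                            cluster_size: int = CLUSTER_SIZE) -> int:
    """Upper bound on partition size: addressable clusters times cluster size."""
    return (2 ** index_bits) * cluster_size


if __name__ == "__main__":
    # Convert bytes to GBytes (2**30 bytes) for display.
    print(max_fat_partition_bytes() / 2**30, "GBytes")
```

With the more usual 32 KByte cluster the same formula gives 2 GBytes, which is why the 64 KByte cluster is needed to reach 4 GBytes.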

Overview of the partition structure.

Like any other file system, NTFS divides all usable space into clusters - the blocks of data handled as a single unit. NTFS supports almost all cluster sizes, from 512 bytes up to 64 KBytes; the 4 KByte cluster is regarded as something of a standard. NTFS has no anomalies in its cluster structure, so there is nothing more to say on the subject.
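To see what the choice of cluster size means in practice, here is a small sketch that counts how many clusters a file occupies and how many bytes of the last cluster stay unused. The function names are invented for the example, and the calculation assumes the file data is stored entirely in clusters.

```python
def clusters_needed(file_size: int, cluster_size: int) -> int:
    """Number of whole clusters required to hold file_size bytes."""
    # Integer ceiling division: a partly filled cluster still counts as one.
    return -(-file_size // cluster_size)


def slack_bytes(file_size: int, cluster_size: int) -> int:
    """Bytes allocated to the file but not occupied by its data."""
    return clusters_needed(file_size, cluster_size) * cluster_size - file_size


if __name__ == "__main__":
    size = 10_000  # an arbitrary example file size in bytes
    for cluster in (512, 4096, 65536):
        print(cluster, clusters_needed(size, cluster), slack_bytes(size, cluster))
```

The larger the cluster, the fewer clusters the system has to track per file, but the more space is lost at the end of each file.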

An NTFS disk is conventionally divided into two parts. The first 12% of the disk is assigned to the so-called MFT area - the space into which the MFT metafile grows. Under normal conditions no data is written to this area: the MFT area is kept empty so that the most important service file (MFT) does not become fragmented as it grows. The remaining 88% of the disk is ordinary space for file storage.

The free space reported for the disk, however, includes all physically free space, the unused part of the MFT area among it. The MFT area is used as follows: when files can no longer be written to the ordinary space, the MFT area is simply reduced (in current versions of the operating system, halved), freeing space for files. When space in the ordinary area is freed again, the MFT area can be extended again. Ordinary files may well remain in this area, and that is normal: the system tried to keep it free but failed. Life goes on... The MFT metafile can still become fragmented, undesirable though that is.
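The reservation and shrinking of the MFT area can be modelled in a few lines. This is only a toy model of the behaviour described above, not the actual NTFS allocator: the function names are invented, and the rule that the area never shrinks below the space MFT already occupies is my assumption.

```python
MFT_AREA_PERCENT = 12  # from the text: the first 12% of the disk


def initial_mft_area(disk_bytes: int) -> int:
    """Size of the area initially reserved for MFT growth."""
    return disk_bytes * MFT_AREA_PERCENT // 100


def shrink_mft_area(area_bytes: int, mft_bytes: int) -> int:
    """Halve the reserved area when ordinary space runs out.

    Assumption: the area cannot become smaller than MFT itself.
    """
    return max(area_bytes // 2, mft_bytes)


if __name__ == "__main__":
    disk = 100 * 2**30               # a hypothetical 100 GByte disk
    area = initial_mft_area(disk)    # 12 GBytes reserved
    area = shrink_mft_area(area, mft_bytes=2**30)
    print(area / 2**30, "GBytes reserved after one reduction")
```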

MFT and its structure

The NTFS file system is a distinguished achievement of structuring: every component of the system is a file, even the system information. The most important file on NTFS is called MFT, or Master File Table - the general table of files. It is located in the MFT area and is a centralised directory of all the other files on the disk and, paradoxically, of itself. MFT is divided into records of fixed size (usually 1 KByte), and each record corresponds to some file. The first 16 files are of a service nature and are not accessible through the operating system's ordinary means. They are called metafiles, and the very first metafile is MFT itself. These first 16 MFT elements are the only part of the disk with a fixed position. Interestingly, a second copy of the first 3 records is stored, for reliability (they are very important), exactly in the middle of the disk. The rest of the MFT file can be located, like any other file, anywhere on the disk; its position can be recovered from MFT itself, starting from its very first element.
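Because the records have a fixed size, finding a record inside the MFT file is plain multiplication. The sketch below assumes the usual 1 KByte record and exactly one record per file (large files may need several, so the size estimate is a lower bound). All names are invented for the illustration.

```python
MFT_RECORD_SIZE = 1024   # "usually 1 KByte", as stated above
METAFILE_RECORDS = 16    # the first 16 records describe the metafiles


def record_offset(record_number: int) -> int:
    """Byte offset of a record from the beginning of the MFT file."""
    return record_number * MFT_RECORD_SIZE


def is_metafile(record_number: int) -> bool:
    """True for the service records at the start of MFT."""
    return 0 <= record_number < METAFILE_RECORDS


def min_mft_size(user_file_count: int) -> int:
    """Lower bound on MFT size, assuming one record per file."""
    return (METAFILE_RECORDS + user_file_count) * MFT_RECORD_SIZE


if __name__ == "__main__":
    print(record_offset(16))                  # first record after the metafiles
    print(min_mft_size(100_000) / 2**20, "MBytes at least for 100,000 files")
```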

Metafiles

The first 16 NTFS files (metafiles) are service files. Each of them is responsible for some aspect of the system's operation. The advantage of such a modular approach is its remarkable flexibility: on FAT, for example, a physical fault in the FAT area itself is fatal for the whole disk, whereas NTFS can relocate, and even fragment across the disk, all of its service areas, working around any surface damage except damage to the first 16 MFT elements.

The metafiles reside in the root directory of the NTFS disk and their names begin with the character "$", although it is difficult to obtain any information about them by standard means. Curiously, a perfectly real size is reported even for these files, so you can find out, for example, how much space the operating system spends on cataloguing your entire disk by looking at the size of the $MFT file. The following table lists the metafiles currently in use and their functions.



Files and streams

So the system has files and nothing but files. What does this concept include on NTFS?

First of all, the compulsory element is the record in MFT: as mentioned above, every file on the disk is listed in MFT. All the information about a file except the data itself is stored here: the file name, its size, the position on disk of its separate fragments, and so on. If one MFT record is not enough for this information, several records are used, and not necessarily consecutive ones.

The optional element is the file's data streams. Calling them "optional" may seem a little strange, but there is nothing strange about it. Firstly, a file may have no data at all, in which case it consumes no free disk space. Secondly, a file may be quite small. In that case a rather elegant solution is applied: the file's data is stored directly in MFT, in the space left free by the main data within a single MFT record. Files a few hundred bytes in size usually have no "physical" embodiment in the main file area: all the data of such a file is stored in one place, in MFT.
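The three cases just described (no data, data inside the MFT record, data in ordinary clusters) can be expressed as a simple decision. This is a rough sketch: the 1 KByte record size comes from the text above, but the amount of space taken by the file's other information varies from file to file, so metadata_size here is a hypothetical input, and the function name is invented.

```python
MFT_RECORD_SIZE = 1024  # usual record size mentioned earlier


def data_placement(data_size: int, metadata_size: int) -> str:
    """Say where a file's data would live in this simplified model.

    metadata_size is the (hypothetical) number of bytes of the record
    already occupied by the name, size, timestamps and so on.
    """
    if data_size == 0:
        return "no data, no disk space used"
    if metadata_size + data_size <= MFT_RECORD_SIZE:
        return "inside the MFT record"
    return "in clusters of the ordinary file area"


if __name__ == "__main__":
    for size in (0, 300, 5000):
        print(size, "->", data_placement(size, metadata_size=400))
```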

The situation with file data is an interesting one. Every file on NTFS has, in general, a rather abstract structure: it has no data as such, it has streams. One of the streams has the meaning familiar to us - the file data. But most of the file attributes are streams too! So it turns out that the only fundamental thing about a file is its number in MFT, and everything else is optional. This abstraction can be used to build rather convenient things: for example, one more stream can be "attached" to a file and any data written to it, such as information about the author and the file's content, as is done in Windows 2000 (the rightmost tab in the file properties dialog, accessible from Explorer). Interestingly, these additional streams are not visible by standard means: the reported file size is only the size of the main stream, which contains the traditional data. It is possible, for example, to have a file of zero length whose deletion frees 1 GByte of space, simply because some program or technology attached an additional (alternate) data stream of gigabyte size to it. In practice, though, streams are hardly used at the moment, so there is no need to fear such situations, although hypothetically they are possible. Just keep in mind that a file on NTFS is a deeper and more global concept than one might imagine from simply browsing the disk's directories.

And finally: a file name can contain any characters, including the full set of national alphabets, because the data is represented in Unicode - a 16-bit representation that gives 65,536 different code values. The maximum file name length is 255 characters.
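The difference between the size a user sees and the space a file really occupies can be shown with a small in-memory model. This is not an NTFS API, only an illustration of the idea: the class ModelFile and its methods are invented, and the main stream is represented by an empty stream name.

```python
from dataclasses import dataclass, field
from typing import Dict

MAIN_STREAM = ""  # in this model the unnamed stream holds the ordinary file data


@dataclass
class ModelFile:
    """A file as a set of named streams (purely illustrative)."""
    streams: Dict[str, bytes] = field(default_factory=dict)

    def reported_size(self) -> int:
        """What a directory listing shows: the main stream only."""
        return len(self.streams.get(MAIN_STREAM, b""))

    def space_used(self) -> int:
        """What deleting the file frees: all streams together."""
        return sum(len(data) for data in self.streams.values())


if __name__ == "__main__":
    f = ModelFile()
    f.streams["author"] = b"x" * 1000   # an extra stream attached to the file
    print(f.reported_size(), f.space_used())
```

Here the file reports a length of zero while still holding 1000 bytes, which is the gigabyte example above in miniature.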

The directories

A directory on NTFS is a special file that stores references to other files and directories, creating the hierarchical structure of the data on the disk. The directory file is divided into blocks, each of which contains a file name, the basic attributes and a reference to the MFT element, which in turn provides the complete information about that directory entry. The internal structure of the directory is a binary tree. Here is what that means: to find a file with a given name in a linear directory, such as FAT's, the operating system has to look through the directory entries one by one until it finds the right one. A binary tree arranges the file names so that the search goes faster, by obtaining two-valued answers to questions about the file's position. The question a binary tree can answer is: in which group, relative to a given element, does the required name lie - above or below? We start by putting this question to the middle element, and each answer halves the search area. The files are sorted alphabetically, and the question is answered in the obvious way, by comparing initial letters. The search area, now halved, is examined in the same way, again starting from its middle element.
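The halving procedure described above can be written down directly. The sketch works on a plain sorted list of names, which is a simplification of the real directory file; the function name and the generated file names are made up for the example.

```python
from typing import List, Optional, Tuple


def find_in_sorted_directory(names: List[str],
                             target: str) -> Tuple[Optional[int], int]:
    """Search an alphabetically sorted list of names by repeated halving.

    Returns (index or None, number of elements examined).
    """
    low, high = 0, len(names) - 1
    steps = 0
    while low <= high:
        middle = (low + high) // 2       # ask the question of the middle element
        steps += 1
        if names[middle] == target:
            return middle, steps
        if names[middle] < target:       # the target lies "below": drop the upper half
            low = middle + 1
        else:                            # the target lies "above": drop the lower half
            high = middle - 1
    return None, steps


if __name__ == "__main__":
    directory = [f"file{i:04d}.txt" for i in range(1000)]  # already sorted
    print(find_in_sorted_directory(directory, "file0700.txt"))
    print(find_in_sorted_directory(directory, "missing.txt"))
```

For a directory of 1000 names the loop never examines more than 10 elements, whether or not the name is present.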

To find one file among 1000, for example, FAT has to make about 500 comparisons on average (most probably the file will be found halfway through the search), while a tree-based system needs only about 10 (2^10 = 1024). The saving in search time is evident. Do not think, however, that traditional systems (FAT) are so neglected: firstly, maintaining the file list as a binary tree is rather laborious, and secondly, even FAT as implemented by a modern system (Windows 2000 or Windows 98) uses a similar search optimisation. This is simply one more fact for your store of knowledge. I would also like to dispel a widespread misconception (which I myself shared until recently) that adding a file to a tree-structured directory is harder than adding it to a linear one. The two operations take comparable time: to add a file to a directory one must first make sure that no file with that name is already there, and it is exactly here that the linear system runs into the search difficulties described above, which more than cancel out the apparent ease of adding the file itself.
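The two figures quoted for 1000 files follow from simple formulas, sketched below. The functions are idealised estimates of the number of comparisons, not measurements of any real file system, and their names are invented.

```python
def linear_average_comparisons(file_count: int) -> float:
    """Average comparisons when the file is equally likely to be anywhere in the list."""
    return file_count / 2


def halvings_needed(file_count: int) -> int:
    """How many times the search area must be halved to narrow it to one name.

    Equal to the smallest k with 2**k >= file_count.
    """
    return (file_count - 1).bit_length()


if __name__ == "__main__":
    for count in (1000, 100_000):
        print(count, linear_average_comparisons(count), halvings_needed(count))
```

The same numbers apply to the existence check made before a file is added, which is why the linear directory loses its apparent advantage on insertion.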

What information can be obtained simply by reading a directory file? Exactly what the dir command prints. For elementary navigation around the disk there is no need to go to MFT for each file; it is enough to read the most general information about the files from the directory files. The main directory of the disk, the root, differs from ordinary directories in nothing except a special reference to it from the beginning of the MFT metafile.