File Management How much does the operating system know about file types? • Some systems support different file types. Advantage: prevents you trying to read executable files. Disadvantage: added complexity, and the system can't cope with new file types, e.g. MP3. • Both MS-DOS and UNIX don't care: a file is considered to be a sequence of bytes with no structure. • However, UNIX recognises: regular files (text, data etc.); directories; char/block files, which refer to devices; pipes (FIFO buffers). • MS-DOS only really has attributes: system, archive, hidden, read-only. • Application packages do the rest.
File System Services In one form or another, all file systems provide applications with the ability to: • Create a file • Remove a file • Open an existing file • Read from an open file • Write to an open file • Close an open file • Fetch metadata of a file • Modify metadata of a file Metadata are data about a file, e.g. file attributes (name, size, data type, etc.), data about records or data structures (length, fields, columns, etc.) and data about data (where it is located, how it is associated, ownership, etc.).
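The services listed above can be sketched with Python's standard library. This is only an illustration: the file name example.txt and the function name are made up for the example, and the permission values are one possible choice of metadata change.

```python
import os
import tempfile


def exercise_file_services(directory):
    """Walk through create, write, close, open, read, metadata and remove."""
    path = os.path.join(directory, "example.txt")  # hypothetical file name

    # Create a file and write to it; leaving the 'with' block closes it.
    with open(path, "w") as f:
        f.write("hello")

    # Open the existing file and read from it.
    with open(path, "r") as f:
        contents = f.read()

    # Fetch metadata: the size in bytes is one of the stored attributes.
    size = os.stat(path).st_size

    # Modify metadata: clear the write permission bits (make it read-only).
    os.chmod(path, 0o444)
    read_only = not (os.stat(path).st_mode & 0o222)

    # Remove the file. Write permission is restored first, since some
    # platforms refuse to delete a read-only file.
    os.chmod(path, 0o644)
    os.remove(path)

    return contents, size, read_only, os.path.exists(path)


with tempfile.TemporaryDirectory() as tmp:
    result = exercise_file_services(tmp)
```

Each step maps onto one of the bulleted services; the operating system does the real work behind calls such as open, stat, chmod and remove.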
File Structure In the simplest scenario the data is totally unstructured and appears as a stream of bytes. • The disadvantage of this approach is that each application may treat the data differently, e.g. one program may interpret the fields in a database in a totally different way from another. • The second way is to store and process data in terms of records of 80 (or some other fixed number of) characters. • E.g. the first nine characters might be the Social Security Number, the next 15 might be the first name, etc. • When dealing with fixed-size records, the record size is usually stored in the file's metadata. • However, there are disadvantages in having the operating system know about file structures. • The principal one is the resultant size and complexity of the system. • Additionally, a new application may require a file structure or access facility not implemented by the supplied system.
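A fixed-size record layout like the one described can be sketched as follows. The 9-character and 15-character fields come from the example above; the 56 bytes of padding, the helper names and the sample values are assumptions made to fill out an 80-character record.

```python
import struct

# Hypothetical 80-byte record: 9-byte SSN, 15-byte first name, and
# 56 pad bytes standing in for the remaining (unspecified) fields.
RECORD = struct.Struct("<9s15s56x")


def pack_record(ssn, first_name):
    # struct pads short byte strings with NUL bytes up to the field width.
    return RECORD.pack(ssn.encode("ascii"), first_name.encode("ascii"))


def read_record(data, index):
    """Return (ssn, first_name) for the record at the given index."""
    offset = index * RECORD.size  # fixed size makes the offset trivial
    ssn, name = RECORD.unpack_from(data, offset)
    return ssn.decode("ascii"), name.rstrip(b"\x00").decode("ascii")


data = pack_record("123456789", "Ada") + pack_record("987654321", "Grace")
second = read_record(data, 1)
```

Because every record has the same length, record n starts at byte n × 80, which is what makes this structure convenient to process.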
In this respect, UNIX adopts an extreme position; files are considered to be sequences of bytes with no structure. UNIX recognises a limited number of 'file types', which are described below: • regular: 'ordinary' files such as programs, text, data etc.; in fact any file which is not of the other types. • directory: a file containing references to other files; described shortly. • char/block: not 'true' files at all but directory entries which refer to devices. • pipe: a pipe file is used as a queuing buffer which holds the standard output of one process and supplies this data as the standard input of another process.
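On a UNIX-like system the file type is recorded in the mode field returned by stat. A minimal sketch of telling the types above apart, using Python's stat module (the function name and the returned labels are made up for the example):

```python
import os
import stat
import tempfile


def unix_file_type(mode):
    """Name the UNIX file type encoded in an st_mode value."""
    if stat.S_ISREG(mode):
        return "regular"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISCHR(mode):
        return "char device"
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISFIFO(mode):
        return "pipe"
    return "other"


# Example: the system's temporary directory is, of course, a directory.
kind_of_temp_dir = unix_file_type(os.stat(tempfile.gettempdir()).st_mode)
```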
Both Microsoft and UNIX use directories, which are notional groupings of files; since directories reside on disk, they can be considered as special files. With the exception of directories, the nearest that Microsoft comes to having different file types is that files can have certain attributes. The possible attributes are: • System: assigned to system files such as the operating system files • Archive: used by file back-up systems • Hidden: a file with this attribute is ignored by many system commands • Read-only: the file cannot be written to or deleted Attributes are not mutually exclusive; e.g. a 'read-only' file can also be 'hidden'.
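Because the attributes are not mutually exclusive, they are naturally stored as independent bits. The sketch below uses the bit values of the attribute byte in a FAT directory entry; those values are not given in the slides, and the class name is made up for the example.

```python
from enum import IntFlag


class DosAttr(IntFlag):
    # Bit positions as used in the FAT directory-entry attribute byte.
    READ_ONLY = 0x01
    HIDDEN = 0x02
    SYSTEM = 0x04
    ARCHIVE = 0x20


# A file can carry several attributes at once.
attrs = DosAttr.READ_ONLY | DosAttr.HIDDEN

is_hidden = DosAttr.HIDDEN in attrs
is_archive = DosAttr.ARCHIVE in attrs
```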
File identification • Microsoft Windows' original naming convention was the 8.3 filename convention, BASENAME.EXT. • When the Internet first arrived, Windows systems were still restricted to the 8.3 filename format, so web pages had to be created with names ending in .HTM, while Macintosh and Unix systems used the .html filename extension. • Java raised a similar problem, since its source code files have the extension .java and its compiled code the extension .class. • Eventually, Windows introduced support for long file names and removed the 8.3 name/extension split in file names. It changed the length restriction to 255 characters and allowed a mix of upper case and lower case letters. • The use of three-character extensions under Microsoft Windows has continued, mainly for backward compatibility, although an extension can be longer as long as the whole name stays within the length limit. • The characters / \ ? : * < > " | and control characters cannot be used in a filename.
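The naming rules above can be expressed as two small checks. This is a sketch of the rules as stated on the slide only (Windows applies further rules that are not covered here), and the function names are made up for the example.

```python
FORBIDDEN = set('/\\?:*<>"|')   # characters listed on the slide
MAX_NAME_LENGTH = 255


def is_valid_long_name(name):
    """Check a name against the length limit and forbidden characters."""
    if not name or len(name) > MAX_NAME_LENGTH:
        return False
    # Control characters have code points below 32.
    return not any(ch in FORBIDDEN or ord(ch) < 32 for ch in name)


def fits_8_3(name):
    """Check whether a name fits the old BASENAME.EXT (8.3) convention."""
    base, dot, ext = name.rpartition(".")
    if not dot:                 # no dot at all: the whole name is the base
        base, ext = name, ""
    return 1 <= len(base) <= 8 and len(ext) <= 3 and "." not in base
```

For instance, INDEX.HTM fits the 8.3 convention while index.html does not, which is why early Windows web pages ended in .HTM.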
Unix stores the file name as a single string, not split into base name and extension components, with the '.' being just another character. Some applications use suffixes to indicate file types, but suffixes are relied on less than under Windows; for example, executables and ordinary text files traditionally have no suffix in their names.
Directories • Early operating systems 'lumped together' all the files on a disk. • Files belonging to several different users and/or applications could not be readily distinguished, hence problems such as file naming, security and 'housekeeping'. • For example, if several people were using the disk, the names of files would need to be strictly controlled by some person assigned to this task or by enforcing conventions which avoided name conflicts. • This was not a problem in earlier systems where, in effect, access to the computer was centralised in the data processing department. • Systems introduced directories as a logical grouping of files, managed by using a special directory file which contains a list of the directory's member files. The first directory systems were simply two-level; the top level contained user names, each with a pointer to another directory which held all the files for that user.
For each of its component files, the directory will generally hold information pertaining to the file, e.g. • filename • file type, if the system recognises different file types • file attributes • information indicating the location of the file on the disk • access rights, i.e. an indication of who can access the file and how it can be accessed • file size in bytes • date information, e.g. date of creation, date of last access, date of last amendment Note that it is admissible to have two or more files with the same name within the system provided that they are in separate directories.
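A two-level directory and its entries can be sketched as a dictionary of dictionaries. The field selection follows the list above, but the class, the user names, the file name and all the values are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DirectoryEntry:
    filename: str
    first_block: int   # location of the file on the disk
    size_bytes: int
    owner: str         # crude stand-in for access rights
    modified: date     # date of last amendment


# Master directory: user name -> that user's directory (filename -> entry).
master = {"ADAMS": {}, "JONES": {}}


def add_file(user, entry):
    """Add an entry; names need only be unique within one directory."""
    directory = master[user]
    if entry.filename in directory:
        raise FileExistsError(entry.filename)
    directory[entry.filename] = entry


# The same name is fine in two different directories.
add_file("ADAMS", DirectoryEntry("PROG1", 40, 1200, "ADAMS", date(2000, 12, 9)))
add_file("JONES", DirectoryEntry("PROG1", 75, 300, "JONES", date(2000, 12, 9)))
```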
Managing file space • Generally, space is allocated in units of a fixed size, called allocation units or blocks, each a simple multiple of the disk's physical sector size (usually 512 bytes). Typical block sizes are 512, 1024 and 2048 bytes. • Unix generally uses 1 KB (1024-byte) blocks. • Each disk block has a unique address, its disk block number.
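Since space is handed out in whole blocks, a file usually occupies slightly more disk space than its size in bytes. A short sketch of the arithmetic (the function names are made up for the example; 1024 bytes is the Unix block size quoted above):

```python
def blocks_needed(file_size, block_size=1024):
    """Number of whole blocks required to hold file_size bytes."""
    return -(-file_size // block_size)   # ceiling division


def wasted_bytes(file_size, block_size=1024):
    """Unused space left at the end of the file's last block."""
    return blocks_needed(file_size, block_size) * block_size - file_size


blocks = blocks_needed(11230)   # an 11,230-byte file with 1 KB blocks
slack = wasted_bytes(11230)
```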
The actual representation of the set of free blocks generally takes one of several forms: • Firstly, there is the free bitmap. In this representation, each block is represented by a single bit, which is 1 if the block is free and 0 if allocated. • The second representation is a free list, normally implemented as a linked list. The system need only store a single pointer to the head of the list.
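A free bitmap can be sketched in a few lines. For readability the bits are held in a Python list rather than packed into bytes, and the class and method names are assumptions; the convention 1 = free, 0 = allocated is the one given above.

```python
class FreeBitmap:
    def __init__(self, n_blocks):
        self.bits = [1] * n_blocks   # 1 = free, 0 = allocated

    def allocate(self):
        """Find the first free block, mark it allocated, return its number."""
        for block, bit in enumerate(self.bits):
            if bit == 1:
                self.bits[block] = 0
                return block
        raise OSError("no free blocks")

    def release(self, block):
        """Mark a block as free again."""
        self.bits[block] = 1
```

The block number doubles as the index into the bitmap, so no extra pointers are needed; the cost is a scan to find a free block.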
The third representation for free blocks is a simple list of free block numbers. If there is at least one free block on the disk then the list can be stored in the free blocks themselves. However, there must be a way to find the rest of the list if it doesn't fit into one block. One approach is to create a linked list of these list blocks, using the last pointer in each block to point to the next block in the list.
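Walking such a chain of list blocks might look like the sketch below. The in-memory disk dictionary, the block numbers and the END marker are all hypothetical; only the idea that the last slot of each list block points to the next list block comes from the text.

```python
END = 0  # hypothetical marker meaning "no further list block"


def free_blocks(disk, head):
    """Yield every free block number recorded in the chained list blocks."""
    block = head
    while block != END:
        # All slots but the last hold free block numbers;
        # the last slot is the pointer to the next list block.
        *entries, next_block = disk[block]
        yield from entries
        block = next_block


# Two list blocks: block 7 chains to block 9, which ends the list.
disk = {
    7: [12, 13, 20, 9],
    9: [31, 32, 40, END],
}
all_free = list(free_blocks(disk, 7))
```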
Treatment of Devices and Files [Figure: layered I/O structure, from top to bottom: users, system call interface, I/O subsystem, device driver interface, drivers, and the devices (terminal, disk, network).] • As far as the user is concerned, all sources of input and output in a Unix system are represented as files. Terminals, disk drives, files and communication mechanisms such as pipes and sockets all look alike to the systems programmer and are treated in the same way.
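File Management File naming • MS-DOS: up to 8-character name + dot + 3-character extension • UNIX: traditionally up to 14 characters, but Linux allows 255; no structure required; any character except / is permitted, although spaces and characters such as >, * and ? are best avoided because they have special meaning to the shell • Windows: up to 255 characters, can have spaces; also generates an MS-DOS (8.3) filename [Figure: Two Level Directory System. A master directory holds the list of user names (ADAMS, JONES, SMITH); each user name points to a user directory (the user directory for ADAMS is shown) listing that user's files. The file name PROG1 appears twice in the figure.]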
File Management [Figure: hierarchical directory tree starting at ROOT; names shown in the tree: system, cprogs, src, include, work, edit, spool, prog1.c, prog2.c, prog1.e, file.dat.] • A directory normally holds the following information: • filename • file type • file attributes • location on disk • access rights • file size • date information
File Management Clusters [Figure: disk space as an array of clusters, showing an allocated file and the unused portion of its last cluster.] • Cluster sizes range from 512 bytes to 64 KB. • With LBA addressing, each cluster has a 32-bit address. • Therefore there are 2^32 = 4,294,967,296 addresses. • At 512 bytes per cluster: 4,294,967,296 × 512 = 2,199,023,255,552 bytes = 2 terabytes (2 × 2^40 bytes).
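The capacity calculation can be repeated for any cluster size. A minimal sketch (the function name is made up; the 32-bit address width is the one quoted above):

```python
def max_volume_bytes(cluster_size, address_bits=32):
    """Largest addressable volume: number of addresses x cluster size."""
    return (2 ** address_bits) * cluster_size


smallest = max_volume_bytes(512)          # 512-byte clusters
largest = max_volume_bytes(64 * 1024)     # 64 KB clusters
```

Larger clusters raise the maximum volume size, at the price of more unused space in the last cluster of each file.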
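File Management Space allocation: chained clusters [Figure: typical cluster allocation of several files. The disk clusters, in order, hold A A B A A C C C B B. The chain for each file starts from its directory entry and runs through that file's clusters to an end-of-file marker; chains for File A and File B are labelled, and free clusters are marked.]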
File Management (FAT Table)
Directory entry: ORDERS DAT no attribs 9/12/00 11:23:44 40 11,230
• Field 1 – filename (ORDERS)
• Field 2 – extension, e.g. .txt, .dat (DAT)
• Field 3 – attributes, e.g. hidden, read only, directory (no attribs)
• Field 4 – date last modified (9/12/00)
• Field 5 – time last modified (11:23:44)
• Field 6 – starting cluster (40)
• Field 7 – size in bytes (11,230)
File Allocation Table (FAT)
Entry#   Value
...      ...
39       EOF
40       41
41       42
42       44
43       Bad
44       102
...      ...
102      103
103      EOF
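Reading the file means following the chain through the FAT, starting at the cluster named in the directory entry. A sketch using the table entries shown above (the dictionary representation, the EOF and BAD markers as strings, and the function name are assumptions made for the example):

```python
EOF = "EOF"
BAD = "Bad"

# The FAT entries shown on the slide: entry number -> value.
fat = {39: EOF, 40: 41, 41: 42, 42: 44, 43: BAD, 44: 102, 102: 103, 103: EOF}


def cluster_chain(fat, start):
    """Return the list of clusters occupied by the file starting at start."""
    chain = []
    cluster = start
    while True:
        chain.append(cluster)
        value = fat[cluster]
        if value == EOF:        # last cluster of the file
            return chain
        if value == BAD:        # a chain should never run into a bad cluster
            raise ValueError(f"chain reaches bad cluster entry {cluster}")
        cluster = value


orders_dat = cluster_chain(fat, 40)   # starting cluster from the directory entry
```

Note how the chain steps from 42 to 44, skipping cluster 43, which is marked bad.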
Current Windows File Systems • HPFS (High Performance File System) is used by OS/2 and is supported by Windows NT. It provides better performance than FAT on larger disk volumes and supports long file names. • NTFS (New Technology File System) is the standard file system of Windows NT, including its later versions Windows 2000, Windows XP, Windows Server 2003, Windows Server 2008, Windows Vista, and Windows 7. • NTFS supersedes the FAT file system as the preferred file system for Microsoft’s Windows operating systems. NTFS has several improvements over FAT and HPFS. • NTFS supports long file names including Unicode filenames, large volumes, data security, and universal file sharing. • Formatting a volume with the NTFS file system results in the creation of several system files and the Master File Table (MFT), which contains information about all the files and folders on the NTFS volume.
Master File Table • Logically, the disk consists of allocation units called clusters. • A cluster is a power-of-two multiple of the physical disk block size. The cluster size is set when the disk is formatted. • The free list is a bitmap, each bit of which describes one cluster. • Clusters on the disk are numbered from zero up to the number of clusters minus one. These numbers are called logical cluster numbers (LCNs) and are used to name blocks (clusters) on disk.
MFT • Standard information: this attribute includes the information that was standard in the MS-DOS world: read/write permissions, creation time, last modification time, and a count of how many directories point to this file (the hard link count). • File Name: this attribute describes the file's name in the Unicode character set. • Security Descriptor: this attribute lists which user owns the file and which users can access it (and how they can access it). • Data: this attribute either contains the actual file data, in the case of a small file, or points to the data.
MFT • When dealing with large data, the Data attribute contains pointers to the data. • The pointers to data are actually pointers to sequences of logical clusters on the disk. • Each sequence is identified by three parts: • starting cluster in the file, called the virtual cluster number (VCN), • starting logical cluster (LCN) of the sequence on disk, • length, counted as the number of clusters. • The run of clusters is called an extent.
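Translating a position in the file (a VCN) into a position on disk (an LCN) is a lookup through the list of extents. A sketch with a made-up run list; the tuple layout mirrors the three parts named above, while the class name, function name and cluster numbers are illustrative assumptions.

```python
from typing import NamedTuple


class Extent(NamedTuple):
    vcn: int      # starting cluster within the file
    lcn: int      # starting logical cluster on disk
    length: int   # number of clusters in the run


def vcn_to_lcn(extents, vcn):
    """Map a virtual cluster number to its logical cluster number."""
    for extent in extents:
        if extent.vcn <= vcn < extent.vcn + extent.length:
            return extent.lcn + (vcn - extent.vcn)
    raise ValueError(f"VCN {vcn} is not covered by any extent")


# Hypothetical file of 7 clusters stored in two runs on disk.
runs = [Extent(vcn=0, lcn=1000, length=4), Extent(vcn=4, lcn=2500, length=3)]
```

With these runs, VCN 3 is the last cluster of the first extent and VCN 5 is the second cluster of the second extent.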
Unix File Systems A Unix file system is laid out as: • boot block: used to boot the operating system. • super block: its main function is to tell the file system how big the various pieces of the file system are. The super block contains the following information, to keep track of the entire file system: size of the file system; number of free blocks on the system; a list of free blocks; index to the next free block on the list; size of the inode list; number of free inodes; a list of free inodes; index to the next free inode on the list; lock fields for the free block and free inode lists; flag to indicate modification of the super block. • i-nodes, followed by the blocks available for storage. Note that the free space is maintained as a linked list of available blocks.
• Encryption: the Encrypting File System (EFS) provides the core file encryption technology. • Disk Quotas: disk quotas can be used to monitor and limit disk-space use. • Reparse Points: similar to Windows shortcuts and Unix symbolic links. For example, a reparse point would allow a folder such as C:\DVD to point to E:, the actual DVD drive. • Volume Mount Points: suppose you already have one hard disk (Drive 1) mapped as C and you don't want to map a second disk (Drive 2) as D. You can get around this by adding a mount point to the directory structure of Drive 1 that references Drive 2. • Distributed Link Tracking: maintains the integrity of shortcuts to files as well as OLE links within compound documents. • Sparse Files: sparse files allow programs to create very large files but consume disk space only as needed.
Shared and Exclusive Access • If a file is already open and another process wants access to it, the operating system has to decide whether to allow this or block it. • In practice both cases may be desirable. • For instance, if both processes are only reading the file there is no problem; however, if both processes want to write to the file, it may lead to inconsistent data. • Consequently most file systems allow for both shared and exclusive access. Two methods of requesting exclusive access are: • The system call to open the file is passed a flag to say it is to be opened exclusively; if another process wants to access it, it will have to wait. • A separate system call locks a file or parts of it. • The difference between locking a file and locking an area of memory is that processes declare when they intend to write to a file.
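The second method, a separate locking call, can be sketched on a Unix-like system with flock. This is an assumption-laden illustration: the fcntl module is not available on Windows, the helper name is made up, and a non-blocking request is used so that a refused lock is reported instead of making the caller wait. Two independent opens of the same file stand in for two processes.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(fd):
    """Request an exclusive lock without waiting; report whether we got it."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False


with tempfile.NamedTemporaryFile() as tmp:
    first = os.open(tmp.name, os.O_RDWR)
    second = os.open(tmp.name, os.O_RDWR)

    got_first = try_exclusive_lock(first)
    got_second = try_exclusive_lock(second)    # refused: first holds the lock

    fcntl.flock(first, fcntl.LOCK_UN)          # release the lock
    got_second_later = try_exclusive_lock(second)

    os.close(first)
    os.close(second)
```

Dropping LOCK_NB from the request would give the waiting behaviour described in the text: the second caller would block until the lock is released.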
Access Patterns • More often than not, a process expects to open a file and begin reading or writing at the beginning. • Each subsequent read or write continues where the last one left off. • This type of sequential access requires that the operating system keeps track of the current location in the file. • However, there are times when random access is required. • This feature is usually provided by a rewind or seek operation. Both are available in the C standard library, as rewind() and fseek().
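Sequential and random access can be seen side by side in a short sketch. It uses Python's file methods seek and tell, which play the same role as the C functions; the file contents are made up for the example.

```python
import tempfile

with tempfile.TemporaryFile() as f:
    f.write(b"ABCDEFGHIJ")

    f.seek(0)                 # rewind to the beginning
    first = f.read(3)         # sequential read from the start
    second = f.read(3)        # continues where the last read left off
    position = f.tell()       # current location kept for the open file

    f.seek(8)                 # random access: jump straight to byte 8
    tail = f.read(2)

    f.seek(0)                 # rewind again
    start = f.read(1)
```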
File Management [Figure: a directory pointing to File A, File B and File C, with the numbered disk blocks (1 to 8, 10, 11) that the files occupy.]