
Chap 5. Managing Files of Records


wooda

Presentation Transcript


  1. Chap 5. Managing Files of Records

  2. Chapter Objectives • Extend the file structure concepts of Chapter 4: • Search keys and canonical forms • Sequential search and direct access • File access and file organization • Examine other kinds of file structures in terms of • Abstract data models • Metadata • Object-oriented file access • Extensibility • Examine issues of portability and standardization.

  3. Contents 5.1 Record Access 5.2 More about Record Structures 5.3 Encapsulating Record I/O Ops in a Single Class 5.4 File Access and File Organization 5.5 Beyond Record Structures 5.6 Portability and Standardization

  4. Record Access • Record key • Canonical form: a standard form of a key • e.g. Ames, ames, and AMES must be converted to one form before comparison • Distinct keys: uniquely identify a single record • Primary keys, secondary keys, candidate keys • Primary keys should be dataless (not subject to update) • Primary keys should be unchanging • Social Security number: a good primary key • but 999-99-9999 is used for all non-registered aliens
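
The canonical-form idea on this slide can be sketched in a few lines of C++. The names CanonicalKey and SameKey are hypothetical, and the rule chosen here (strip surrounding blanks, convert to upper case) is only one possible canonical form, not necessarily the one used in the chapter.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: reduce a key to one canonical form
// (surrounding blanks removed, letters converted to upper case).
std::string CanonicalKey(const std::string& key)
{
    const std::string blanks = " \t";
    std::size_t first = key.find_first_not_of(blanks);
    if (first == std::string::npos) return "";   // empty or all-blank key
    std::size_t last = key.find_last_not_of(blanks);
    assert(last >= first);
    std::string canon = key.substr(first, last - first + 1);
    for (char& c : canon)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return canon;
}

// Two keys match when their canonical forms are identical.
bool SameKey(const std::string& a, const std::string& b)
{
    return CanonicalKey(a) == CanonicalKey(b);
}
```

With this rule, SameKey("ames", "AMES") is true, so a search for any spelling of the key finds the same record.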

  5. Sequential Search (1) • O(n), n : the number of records • Use record blocking • A block holds several records • fields < records < blocks • Still O(n), but blocking decreases the number of seeks • e.g. 4000 records, each 512 bytes long • Unblocked (sector-sized buffers): 512-byte buffer => average 2000 READ() calls • Blocked (16 recs / block): 8K buffer => average 125 READ() calls
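
The READ() counts on this slide follow from simple arithmetic, sketched below. The function names are hypothetical, and the "average" is taken as half of a full scan, which is the assumption the slide's figures imply.

```cpp
#include <cassert>

// READ() calls needed to scan the whole file when each call
// fetches one block of recordsPerBlock records (rounded up).
long ReadsForFullScan(long recordCount, long recordsPerBlock)
{
    assert(recordCount >= 0 && recordsPerBlock > 0);
    return (recordCount + recordsPerBlock - 1) / recordsPerBlock;
}

// Average calls for a successful search, taken as half of a full scan
// (integer division, so odd block counts round down).
long AverageReads(long recordCount, long recordsPerBlock)
{
    return ReadsForFullScan(recordCount, recordsPerBlock) / 2;
}
```

AverageReads(4000, 1) gives 2000 and AverageReads(4000, 16) gives 125, matching the unblocked and blocked cases. The search is still O(n); only the constant factor changes.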

  6. Sequential Search (2) • UNIX tools for sequential processing • cat, wc, grep • When sequential search is useful • Searching for patterns in ASCII files • Processing files with few records • Searching for records with a certain secondary key value

  7. Direct Access • O(1) operation • RRN (Relative Record Number) • Gives the relative position of the record • Byte offset = n × r • r : record size, n : RRN value • Applies to fixed-length records • Class IOBuffer includes • direct read (DRead) • direct write (DWrite)

      int IOBuffer::DRead (istream & stream, int recref)
      // read specified record
      {
         stream . seekg (recref, ios::beg);
         if (stream . tellg () != recref) return -1;
         return Read (stream);
      }

      int IOBuffer::DWrite (ostream & stream, int recref) const
      // write specified record
      {
         stream . seekp (recref, ios::beg);
         if (stream . tellp () != recref) return -1;
         return Write (stream);
      }
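
As a rough illustration of the RRN rule above, the sketch below computes the byte offset n × r and reads one fixed-length record from a stream. ByteOffset and ReadByRRN are hypothetical names, not members of IOBuffer, and RRNs are assumed to start at 0.

```cpp
#include <cassert>
#include <istream>
#include <sstream>   // std::istringstream is handy for trying this out
#include <string>

// Byte offset of record n (RRN, counted from 0) when every record is r bytes.
long ByteOffset(long rrn, long recordSize)
{
    assert(rrn >= 0 && recordSize > 0);
    return rrn * recordSize;
}

// Read record `rrn` from a stream of fixed-length records.
// Returns false if the seek or the read fails (e.g. RRN past end of file).
bool ReadByRRN(std::istream& stream, long rrn, long recordSize, std::string& record)
{
    stream.clear();                                   // drop any earlier eof/fail state
    stream.seekg(ByteOffset(rrn, recordSize), std::ios::beg);
    if (!stream) return false;
    record.assign(static_cast<std::size_t>(recordSize), '\0');
    stream.read(&record[0], recordSize);
    return stream.gcount() == recordSize;             // a short read counts as failure
}
```

The cost of one lookup does not depend on how many records precede the target, which is why the slide calls direct access an O(1) operation.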

  8. Choosing a record length and structure • Record length is related to the size of the fields • Access vs. fragmentation vs. implementation • Fixed-length record • (a) With fixed-length fields • (b) With variable-length fields • The unused space is filled with null characters in C
      (a) Ames      John   123 Maple     Stillwater  OK74075
          Mason     Alan   90 Eastgate   Ada         OK74820
      (b) Ames|John|123 Maple|Stillwater|OK|74075|  <unused space>
          Mason|Alan|90 Eastgate|Ada|OK|74820|      <unused space>
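
Layout (b) can be sketched as follows: fields are joined with a '|' delimiter and the rest of the fixed-length record is filled with null characters. PackFixedRecord and UnpackFixedRecord are hypothetical helpers written for this illustration, not part of the chapter's buffer classes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Join fields with '|' and pad the record to recordSize with '\0'.
// Returns an empty string if the packed fields do not fit.
std::string PackFixedRecord(const std::vector<std::string>& fields, std::size_t recordSize)
{
    assert(recordSize > 0);
    std::string record;
    for (const std::string& f : fields) {
        record += f;
        record += '|';
    }
    if (record.size() > recordSize) return std::string();
    record.resize(recordSize, '\0');   // unused space is null-filled
    return record;
}

// Split the used part of the record back into its fields.
std::vector<std::string> UnpackFixedRecord(const std::string& record)
{
    std::vector<std::string> fields;
    std::string current;
    for (char c : record) {
        if (c == '\0') break;          // reached the unused space
        if (c == '|') { fields.push_back(current); current.clear(); }
        else current += c;
    }
    return fields;
}
```

A larger record size wastes more space per record (fragmentation) but makes it less likely that a record fails to fit, which is the trade-off named on the slide.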

  9. Header Records • General information about the file • date and time of the most recent update, count of the number of records • The header record is often placed at the beginning of the file • Header records are a widely used, important file design tool
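
A minimal sketch of a header record, assuming a made-up layout: a fixed 32-byte text header holding a marker string, the record count, and the record size. The layout, the marker "RECFILE", and the free functions below are assumptions for illustration and are not the header format used by the IOBuffer classes.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical layout: "RECFILE <recordCount> <recordSize>", null-padded to 32 bytes.
const std::size_t kHeaderSize = 32;

struct FileHeader {
    long recordCount;
    long recordSize;
};

// Writes the header at the beginning of the stream.
// Returns the header size in bytes, or -1 on failure.
int WriteHeader(std::ostream& stream, const FileHeader& h)
{
    assert(h.recordCount >= 0 && h.recordSize > 0);
    std::ostringstream text;
    text << "RECFILE " << h.recordCount << ' ' << h.recordSize;
    std::string bytes = text.str();
    if (bytes.size() >= kHeaderSize) return -1;
    bytes.resize(kHeaderSize, '\0');
    stream.seekp(0, std::ios::beg);
    stream.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return stream ? static_cast<int>(kHeaderSize) : -1;
}

// Reads the header and checks the marker for consistency.
// Returns the header size in bytes, or -1 on failure.
int ReadHeader(std::istream& stream, FileHeader& h)
{
    char bytes[kHeaderSize + 1] = {0};
    stream.seekg(0, std::ios::beg);
    stream.read(bytes, static_cast<std::streamsize>(kHeaderSize));
    if (stream.gcount() != static_cast<std::streamsize>(kHeaderSize)) return -1;
    std::string headerText(bytes);
    std::istringstream text(headerText);
    std::string marker;
    if (!(text >> marker >> h.recordCount >> h.recordSize) || marker != "RECFILE")
        return -1;
    return static_cast<int>(kHeaderSize);
}
```

Because both functions return the header size, a caller can add it to n × r to locate data records that follow the header.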

  10. IO Buffer Class definition (1) • Abstract base class for file buffers

      class IOBuffer
      {
      public:
         virtual int Read (istream &) = 0;          // read a buffer from the stream
         virtual int Write (ostream &) const = 0;   // write a buffer to the stream

         // these are the direct access read and write operations
         virtual int DRead (istream &, int recref);         // read specified record
         virtual int DWrite (ostream &, int recref) const;  // write specified record

         // these header operations return the size of the header
         virtual int ReadHeader (istream &);
         virtual int WriteHeader (ostream &) const;

      protected:
         int Initialized;   // TRUE if buffer is initialized
         char * Buffer;     // character array to hold field values
      };

  11. IO Buffer Class definition (2) • The full definition of the buffer class hierarchy • Write methods : add the header to a file and return the number of bytes in the header • Read methods : read the header and check for consistency • WriteHeader method : writes the string IOBuffer at the beginning of the file • ReadHeader method : reads the record size from the header and checks that its value is the same as that of the BufferSize member of the buffer object • DWrite/DRead methods : operate using the byte address of the record as the record reference; the DRead method begins by seeking to the requested spot

  12. Encapsulating Record I/O Ops in a Single Class (1) • A good design for making objects persistent • provides operations to read and write objects directly • The write operation until now: • two operations: pack into a buffer + write the buffer to a file • Class ‘RecordFile’ • supports a write operation that takes an object of some class and writes it to a file • the use of buffers is hidden inside the class • Problem with defining class ‘RecordFile’: • how to make it possible to support files for different object types without needing different versions of the class

  13. Encapsulating Record I/O Ops in a Single Class (2) • Class ‘RecordFile’ • uses C++ template features to solve the problem • definition of the template class RecordFile

      template <class RecType>
      class RecordFile : public BufferFile
      {
      public:
         int Read (RecType & record, int recaddr = -1);
         int Write (const RecType & record, int recaddr = -1);
         RecordFile (IOBuffer & buffer) : BufferFile (buffer) { }
      };

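  14. Template method bodies

      // template method bodies
      // (the default value recaddr = -1 appears only in the class declaration;
      //  repeating it in the definitions is an error in standard C++)
      template <class RecType>
      int RecordFile<RecType>::Read (RecType & record, int recaddr)
      {
         int writeAddr, result;
         writeAddr = BufferFile::Read (recaddr);   // fill the buffer from the file
         if (!writeAddr) return -1;
         result = record . Unpack (Buffer);        // buffer -> object
         if (!result) return -1;
         return writeAddr;
      }

      template <class RecType>
      int RecordFile<RecType>::Write (const RecType & record, int recaddr)
      {
         int result;
         result = record . Pack (Buffer);          // object -> buffer
         if (!result) return -1;
         return BufferFile::Write (recaddr);       // write the buffer to the file
      }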
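
The template idea can be tried out with a self-contained sketch. Everything below is a simplified stand-in: SimpleBuffer, SimpleBufferFile, SimpleRecordFile, and Person are hypothetical names, records are stored one per line, and none of this reproduces the actual BufferFile or IOBuffer code.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for the buffer classes: holds one packed record.
struct SimpleBuffer {
    std::string data;
};

// Hypothetical stand-in for BufferFile: stores one buffer per line of a stream.
class SimpleBufferFile {
public:
    explicit SimpleBufferFile(std::iostream& file) : File(file) {}
protected:
    // Appends the buffer; returns the byte address where it was written, or -1.
    int WriteBuffer() {
        assert(Buffer.data.find('\n') == std::string::npos);
        File.clear();
        File.seekp(0, std::ios::end);
        int addr = static_cast<int>(File.tellp());
        File << Buffer.data << '\n';
        return File ? addr : -1;
    }
    // Reads the buffer stored at byte address addr; returns addr, or -1.
    int ReadBuffer(int addr) {
        File.clear();
        File.seekg(addr, std::ios::beg);
        if (!std::getline(File, Buffer.data)) return -1;
        return addr;
    }
    SimpleBuffer Buffer;
    std::iostream& File;
};

// One template serves every record type that offers Pack and Unpack.
template <class RecType>
class SimpleRecordFile : public SimpleBufferFile {
public:
    explicit SimpleRecordFile(std::iostream& file) : SimpleBufferFile(file) {}
    int Write(const RecType& record) {
        if (!record.Pack(Buffer)) return -1;      // object -> buffer
        return WriteBuffer();                     // buffer -> file
    }
    int Read(RecType& record, int addr) {
        if (ReadBuffer(addr) == -1) return -1;    // file -> buffer
        return record.Unpack(Buffer) ? addr : -1; // buffer -> object
    }
};

// Example record type used only for this sketch.
struct Person {
    std::string LastName, FirstName;
    bool Pack(SimpleBuffer& b) const { b.data = LastName + '|' + FirstName; return true; }
    bool Unpack(const SimpleBuffer& b) {
        std::size_t bar = b.data.find('|');
        if (bar == std::string::npos) return false;
        LastName = b.data.substr(0, bar);
        FirstName = b.data.substr(bar + 1);
        return true;
    }
};
```

A caller writes `SimpleRecordFile<Person> people(file); people.Write(p);` and never touches the buffer, which is the encapsulation the slides describe. Supporting another record type needs only a class with Pack and Unpack, not a new file class.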

  15. File Access and File Organization • There is a difference between file access and file organization. • Variable-length records • Sequential access is suitable • Fixed-length records • Direct access and sequential access are possible
      (Diagram) File organization: variable-length records, fixed-length records
                File access: sequential access, direct access

  16. Abstract Data Model • Data objects such as documents, images, and sound • e.g. color raster images, FITS image files • An abstract data model does not view data as it appears on a particular medium; it is an application-oriented view • Headers and self-describing files

  17. Metadata • Data that describe the primary data in a file • A common place to store metadata in a file is the header record • Standard format • FITS (Flexible Image Transport System), endorsed by the International Astronomical Union (see Figure 5.7)

  18. Mixing object Types in a file • Each field is identified using “keyword = value” • Index table with tags • e.g.
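
The "keyword = value" convention can be illustrated with a small parser. This is a simplified sketch: ParseHeader and Trim are hypothetical names, and the one-pair-per-line text format is an assumption, not the actual FITS header layout.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Strip leading and trailing blanks from a string.
static std::string Trim(const std::string& s)
{
    const char* blanks = " \t\r";
    std::size_t first = s.find_first_not_of(blanks);
    if (first == std::string::npos) return "";
    std::size_t last = s.find_last_not_of(blanks);
    assert(last != std::string::npos && last >= first);
    return s.substr(first, last - first + 1);
}

// Parse header text made of "keyword = value" lines into a lookup table.
// Lines without '=' are ignored.
std::map<std::string, std::string> ParseHeader(const std::string& header)
{
    std::map<std::string, std::string> fields;
    std::istringstream lines(header);
    std::string line;
    while (std::getline(lines, line)) {
        std::size_t eq = line.find('=');
        if (eq == std::string::npos) continue;   // not a keyword line
        fields[Trim(line.substr(0, eq))] = Trim(line.substr(eq + 1));
    }
    return fields;
}
```

Because each value is found by its keyword rather than by its position, a reader can pick out the fields it understands and ignore the rest.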

  19. Object-oriented file access • Separate the translation to and from the physical format from the application (representation-independent file access)
      Program find_star
          :
          read_image("star1", image)
          process image
          :
      end find_star
      (Diagram: image is held in RAM; star1 and star2 are stored on disk)

  20. Extensibility • Advantage of using tags to identify objects within files: we do not have to know a priori what all of the objects will look like • When we encounter a new type of object, we implement methods for reading and writing that object and add them.
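
One way to picture tag-based extensibility is a table that maps each tag to the routine that reads that kind of object. ReaderRegistry below is a hypothetical sketch of this idea, not a class from the chapter; readers simply return a description string to keep the example short.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical registry mapping an object tag to the routine that reads it.
class ReaderRegistry {
public:
    typedef std::function<std::string(const std::string&)> Reader;

    // Supporting a new object type is one Register call.
    void Register(const std::string& tag, Reader reader) {
        assert(reader);                 // an empty function cannot read anything
        readers[tag] = reader;
    }

    bool Knows(const std::string& tag) const { return readers.count(tag) != 0; }

    // Dispatch on the tag; objects with unknown tags yield an empty result
    // instead of an error, so older programs can skip newer object types.
    std::string Read(const std::string& tag, const std::string& bytes) const {
        std::map<std::string, Reader>::const_iterator it = readers.find(tag);
        if (it == readers.end()) return "";
        return it->second(bytes);
    }

private:
    std::map<std::string, Reader> readers;
};
```

Existing code that reads "txt" objects keeps working unchanged when a reader for "img" is registered later, which is the advantage the slide attributes to tags.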

  21. Factors Affecting Portability • Differences among operating systems • Differences among languages • Differences in machine architecture • Differences among platforms • EBCDIC and ASCII

  22. Achieving Portability (1) • Standardization • Standard physical record format • extensible, simple • Standard binary encoding for data elements • IEEE, XDR • File structure conversion • Number and text conversion
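
A standard binary encoding fixes the byte layout regardless of the host machine. As a small sketch, the functions below store a 32-bit signed integer as four bytes, most significant byte first, which is the byte order XDR uses for integers. EncodeInt32 and DecodeInt32 are hypothetical names, not part of any XDR library.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode a 32-bit signed integer as 4 bytes, most significant byte first.
std::string EncodeInt32(std::int32_t value)
{
    std::uint32_t u = static_cast<std::uint32_t>(value);
    std::string bytes(4, '\0');
    bytes[0] = static_cast<char>((u >> 24) & 0xFF);
    bytes[1] = static_cast<char>((u >> 16) & 0xFF);
    bytes[2] = static_cast<char>((u >> 8) & 0xFF);
    bytes[3] = static_cast<char>(u & 0xFF);
    return bytes;
}

// Decode 4 bytes written by EncodeInt32, independent of host byte order.
std::int32_t DecodeInt32(const std::string& bytes)
{
    assert(bytes.size() == 4);
    std::uint32_t u = 0;
    for (int i = 0; i < 4; ++i)
        u = (u << 8) | static_cast<unsigned char>(bytes[i]);
    return static_cast<std::int32_t>(u);
}
```

Because the code shifts and masks instead of copying raw memory, it produces the same file bytes on little-endian and big-endian machines.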

  23. Achieving Portability (2) • File system differences • Block size is 512 bytes on UNIX systems • Block size may differ on non-UNIX systems (e.g. 2880 bytes) • UNIX and portability • UNIX supports portability by being commonly available on a large number of platforms • UNIX provides a utility called dd • dd : facilitates data conversion

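  24. Portability • File sharing • Files should be accessible from different computers and from different programs • Portability and standardization • Factors affecting portability • Two companies share files • Company A: Sun computer, C programming; Company B: Turbo Pascal programming on an IBM PC • Differences among operating systems • The ultimate physical format of a file can vary because of differences among operating systems • Differences among programming languages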

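  25. Portability • Achieving portability • Agree on a standard physical record format and stay with it • Physical standard: one that is represented physically the same way regardless of language, machine, or operating system • e.g. FITS • Agree on a standard binary encoding for data elements • Basic data elements: text, numbers • e.g. the IEEE standard formats and XDR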

  26. Portability • Conversion 1: direct conversion between each pair of formats • Conversion 2: conversion through an intermediate standard form
      (Diagrams: machines IBM, VAX, Cray, Sun 3, and IBM PC; in the second diagram the conversions pass through XDR)

  27. Let’s Review !!! 5.1 Record Access 5.2 More about Record Structures 5.3 Encapsulating Record I/O Ops in a Single Class 5.4 File Access and File Organization 5.5 Beyond Record Structures 5.6 Portability and Standardization
