
File System Numbers

File System Numbers. 4/18/2002. Michael Ferguson, mpf7@cornell.edu. Why? Make trace studies of filesystems to inform development and to see trends in file system usage. Ask these questions: How do people actually use filesystems? What do they store and how do they access their data?


Presentation Transcript


  1. File System Numbers 4/18/2002 Michael Ferguson mpf7@cornell.edu

  2. Why? • Make trace studies of filesystems to • Inform development • See trends in file system usage • Ask these questions • How do people actually use filesystems? What do they store and how do they access their data? • What caching strategies are best? • Filesystem statistics have wider implications • Network activity may depend on these filesystem statistics (think of a web server)

  3. What data do we gather? • User activity – e.g. number of users, amount of data transferred • File access patterns – e.g. was the file read sequentially from start to finish? • File lifetimes – e.g. what percentage of files exist for less than a second?
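
The file-lifetime question on this slide can be sketched as a small computation over a trace. This is a minimal illustration, not the method used in any of the studies: the record layout `(created_at, deleted_at)` and the function name `fraction_short_lived` are assumptions, and files that were never deleted during the trace are simply left out of the denominator.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical record layout: (created_at, deleted_at) in seconds.
# deleted_at is None when the file still existed at the end of the trace.
LifetimeRecord = Tuple[float, Optional[float]]


def fraction_short_lived(records: Iterable[LifetimeRecord],
                         threshold_s: float = 1.0) -> float:
    """Fraction of deleted files whose lifetime was under threshold_s seconds."""
    lifetimes = [deleted - created
                 for created, deleted in records
                 if deleted is not None]
    if not lifetimes:
        return 0.0  # no deletions observed, so nothing to report
    short = sum(1 for life in lifetimes if life < threshold_s)
    return short / len(lifetimes)


if __name__ == "__main__":
    trace = [(0.0, 0.5), (1.0, 5.0), (2.0, None), (3.0, 3.2)]
    print(fraction_short_lived(trace))
```

Excluding files that outlive the trace is a simplification; a real study has to decide how to treat them, since a short trace will otherwise overstate the share of short-lived files.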

  4. File System Trace Studies • BSD numbers from 1985 (Ousterhout & others) • Sprite numbers from 1991 (Ousterhout & others) • Windows NT numbers from 1999 (Vogels)

  5. The BSD Study - 1985 • Local BSD 4.2 filesystem on three VAX-11/780s • Ucbarpa – used by graduate students for program development and document formatting – 4 MB of memory • Ucbernie – used by grad students and by administration – 8 MB of memory • Ucbcad – used to run CAD programs for EE – 16 MB of memory • Average file access rate is only a few hundred bytes/sec/user • 75% of files are open for less than ½ second • Many files only exist for a few seconds • File accesses tend to be sequential • Most file accesses are to short files, but most bytes transferred are from large ones

  6. Sprite Overview • Network-Oriented OS • File system servers and diskless workstations • Supports process migration

  7. Sprite Study - Environment • 40 10-MIPS workstations running Sprite • 4 are fileservers • Memory averages 24 MB/workstation • Pmake commonly used to migrate processes and make use of idle workstations

  8. Sprite Users • ~ ¼ OS researchers • ~ ¼ Architecture researchers who design and simulate I/O subsystems • ~ ¼ Researchers studying VLSI design and parallel processing • ~ ¼ Administrators, graphics researchers, and other people

  9. Sprite – Measurement Approach • Instrumented kernels on file servers • Kernel records a trace of activity (open, close, delete, lseek, etc., but not read or write) • Kernel gives the log to a user process, which records it in a file • Can deduce the exact range of bytes accessed • lseek was modified to call the file server • Removed trace-file records and tape backup records • Total statistics are gathered in-kernel • I’ll talk about results in comparison with Windows
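
The claim that byte ranges can be deduced without logging reads or writes can be sketched as follows. This is an illustration under stated assumptions, not the Sprite implementation: it assumes each open and close record carries the current file offset, that each seek record carries the offset before and after the seek, and that all transfer between two such records is sequential. The tuple layout and the name `sequential_runs` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical event layout for one open-to-close session on a file:
#   ("open", offset)
#   ("seek", offset_before, offset_after)
#   ("close", offset)
Run = Tuple[int, int]  # half-open byte range [start, end)


def sequential_runs(events: Sequence[tuple]) -> List[Run]:
    """Rebuild the byte ranges transferred between open, seek, and close."""
    runs: List[Run] = []
    start = None  # offset at which the current sequential run began
    for event in events:
        kind = event[0]
        if kind == "open":
            start = event[1]
        elif kind == "seek":
            if start is None:
                raise ValueError("seek before open")
            before, after = event[1], event[2]
            if before > start:  # bytes were transferred before the seek
                runs.append((start, before))
            start = after  # the next run begins at the seek target
        elif kind == "close":
            if start is None:
                raise ValueError("close before open")
            if event[1] > start:
                runs.append((start, event[1]))
            start = None
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return runs


if __name__ == "__main__":
    session = [("open", 0), ("seek", 4096, 8192), ("close", 10000)]
    print(sequential_runs(session))
```

Under these assumptions a session with no seeks yields a single run, which is how a whole-file sequential access would show up in the trace.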

  10. Windows NT Measurements • 1998 – used 45 Windows NT 4 systems • Systems are used by one person at a time • Statistics are gathered with • File system snapshots • A transparent filter device driver for tracing

  11. Windows trace summary

  12. User Activity Comparison

  13. File Access Pattern Comparison

  14. File Lifetimes (Windows NT vs. Sprite)

  15. Sequential Runs - Comparison (Windows NT vs. Sprite)

  16. File Size Distribution - Comparison (Windows NT vs. Sprite)

  17. File Open Times - Comparison (Windows NT vs. Sprite)

  18. Windows NT interesting notes • Time between sequential reads and writes differs – 90 microseconds for reads, 30 microseconds for writes • 74% of sessions were opening files for control – not read or write • A common operation checks whether or not the volume is mounted

  19. Statistical Gotcha! • The data from the Windows NT trace is not a Poisson process – it is better modeled by the Pareto distribution
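
For reference, the two models named on this slide can be written side by side. These are the standard textbook forms, not formulas taken from the slides; the symbols $\lambda$, $\alpha$, and $x_m$ are the usual rate, shape, and scale parameters.

```latex
% Poisson process: inter-arrival times T are exponential with rate \lambda
P(T > t) = e^{-\lambda t}, \qquad t \ge 0

% Pareto distribution with shape \alpha > 0 and scale x_m > 0
P(X > x) = \left(\frac{x_m}{x}\right)^{\alpha}, \qquad x \ge x_m

% Pareto moments: the mean is finite only for \alpha > 1,
% and the variance only for \alpha > 2
E[X] = \frac{\alpha\, x_m}{\alpha - 1} \quad (\alpha > 1), \qquad
\operatorname{Var}[X] = \frac{\alpha\, x_m^{2}}{(\alpha - 1)^{2}(\alpha - 2)} \quad (\alpha > 2)
```

The exponential tail decays much faster than any power law, so a Poisson model predicts far fewer very long gaps and very large bursts than a Pareto model does.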

  20. Open requests vs. Poisson Process

  21. What does it mean? • There is extreme variance at all time scales • Mean and variance of the request distribution do not stabilize over time! • Other components have heavy-tailed distributions as well: • Process lifetime • Number of DLLs accessed • Number of files open per process • Spacing of file accesses
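
One way to see why the statistics fail to stabilize is to draw synthetic heavy-tailed samples and watch the running mean. The sketch below uses only the Python standard library and is not fitted to the Windows NT traces: the function names, the seed, and any shape parameter passed in are illustrative assumptions.

```python
import random
from typing import List, Sequence


def pareto_samples(alpha: float, x_min: float, n: int, seed: int = 0) -> List[float]:
    """Draw n Pareto(alpha, x_min) samples by inverse-transform sampling."""
    rng = random.Random(seed)
    # rng.random() lies in [0, 1), so u = 1 - rng.random() lies in (0, 1]
    # and x_min / u ** (1 / alpha) is always at least x_min.
    return [x_min / (1.0 - rng.random()) ** (1.0 / alpha) for _ in range(n)]


def running_mean(values: Sequence[float]) -> List[float]:
    """Mean of the first k values, for k = 1 .. len(values)."""
    means: List[float] = []
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        means.append(total / count)
    return means


if __name__ == "__main__":
    heavy = pareto_samples(alpha=1.2, x_min=1.0, n=100_000)
    light_rng = random.Random(0)
    light = [light_rng.expovariate(1.0) for _ in range(100_000)]
    for k in (100, 1_000, 10_000, 100_000):
        print(k, running_mean(heavy)[k - 1], running_mean(light)[k - 1])
```

Plotting the two running means against the sample count is a quick check: the exponential series typically settles early, while the Pareto series with a shape parameter close to 1 tends to keep jumping whenever a rare, very large sample arrives.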

  22. File Size Distribution • File Sizes are not normally distributed!

  23. Bottom Line – WinNT traces • Although all systems were interactive and used by a single person at a time • 92% of file system operations were from processes that have no direct user input • Even explorer.exe’s behavior does not come directly from the user • “It is the structure and content of the filesystem that determines explorer’s file system interactions, not the user requests.”

  24. Summary • We’ve followed several statistics through Sprite and Windows NT measurements • Network filesystems are still feasible but • Access is quite bursty • Most accesses are for controlling files • But beware! Several statistical assumptions about filesystems seem to be just plain wrong

  25. Summary
