Learn how to read, create, and modify files effectively in Python programming. Understand file accessing, permissions, and storage mechanisms. Explore file systems and directory structures.
COSC 1306 COMPUTER SCIENCE AND PROGRAMMING Jehan-François Pâris jfparis@uh.edu Fall 2016
Chapter Overview We will learn how to read, create and modify files Essential if we want to store our program inputs and results. Pay special attention to pickled files They are very easy to use!
Accessing file contents Three-step process: First we open the file Then we access its contents (read, write) When we are done, we close the file
What happens at open() time? The system verifies that you are an authorized user and that you have the right permission (read permission, write permission; execute permission exists but doesn't apply), and returns a file handle/file descriptor
The file handle Gives the user Fast direct access to the file No folder lookups Authority to execute the file operations whose permissions have been requested
Python open() open(name, mode = 'r', buffering = -1) where name is name of file mode is permission requested Default is 'r' for read only buffering specifies the buffer size Use system default value (code -1)
The modes Can request 'r' for read-only 'w' for write-only Always overwrites the file 'a' for append Writes at the end 'r+' or 'a+' for updating (read + write/append)
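The difference between 'w' and 'a' is easiest to see by running both on the same file. The sketch below is only an illustration: the scratch file name modes_demo.txt and the temporary folder are assumptions made so the example can run anywhere without touching real files.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary folder, used only for this demo.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "modes_demo.txt")

fh = open(path, "w")      # 'w' creates the file, or erases what it held
fh.write("first line\n")
fh.close()

fh = open(path, "w")      # opening with 'w' again discards "first line"
fh.write("second line\n")
fh.close()

fh = open(path, "a")      # 'a' keeps the contents and writes at the end
fh.write("third line\n")
fh.close()

fh = open(path, "r")      # 'r' only allows reading
contents = fh.read()
fh.close()

print(contents)

# Clean up the scratch file and folder.
os.remove(path)
os.rmdir(folder)
```

Only the second and third lines survive, because the second open() with 'w' overwrote the file before 'a' appended to it.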
Examples
    f1 = open("myfile.txt")
same as
    f1 = open("myfile.txt", "r")
    f2 = open("test\\sample.txt", "r")
    f3 = open("test/sample.txt", "r")
    f4 = open("C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt")
The file system Provides long term storage of information. Will store data in stable storage (disk) Cannot be RAM because: Dynamic RAM loses its contents when powered off Static RAM is too expensive System crashes can corrupt contents of the main memory
Overall organization Data managed by the file system are grouped in user-defined data sets called files The file system must provide a mechanism for naming these data Each file system has its own set of conventions All modern operating systems use a hierarchical directory structure
Windows solution Each device and each disk partition is identified by a letter A: and B: were used by the floppy drives C: is the first disk partition of the hard drive If hard drive has no other disk partition, D: denotes the DVD drive Each device and each disk partition has its own hierarchy of folders
Windows solution (diagram) C: contains the folders Windows, Users and Program Files Second disk D: Flash drive F:
Linux organization Inherited from Unix Each device and disk partition has its own directory tree Disk partitions are glued together through the mount operation to form a single tree Typical user does not know where her files are stored Uses "/" as a separator
UNIX/LINUX organization Root partition / Other partition usr The magic mount bin Second partition can be accessed as /usr
Mac OS organization Similar to Windows Disk partitions are not merged Represented by separate icons on the desktop
Accessing a file (I) Your Python programs are stored in a folder AKA directory On my home PC it is C:\Users\Jehan-Francois Paris\Documents\Courses\1306\Python All files in that folder can be directly accessed through their names "myfile.txt"
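Folder hierarchy (diagram): The root > Users > J.-F. Paris > Documents > Courses > 1306 > Python > x.txt Name of the file as seen from each folder: from Documents, Courses\1306\Python\x.txt; from Courses, 1306\Python\x.txt; from 1306, Python\x.txt; from Python, x.txt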
Accessing a file (II) Files in folders inside that folder—subfolders—can be accessed by specifying first the subfolder Windows style: "test\\sample.txt" Note the double backslash Linux/Unix/Mac OS X style: "test/sample.txt" Generally works for Windows
Why the double backslash? The backslash is an escape character in Python Combines with its successor to represent non-printable characters ‘\n’ represents a newline ‘\t’ represents a tab Must use ‘\\’ to represent a plain backslash
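A short experiment makes the escape rule concrete. This is a minimal sketch; the name test\temp.txt is a made-up example chosen because its single backslash is followed by a 't'.

```python
# Two backslashes in the source text produce ONE backslash in the string.
windows_path = "test\\sample.txt"
print(windows_path)          # displays the path with a single backslash
print(len("\\"))             # the string holds exactly one character

# Hypothetical file name: with a single backslash, \t is read as a tab,
# so the string no longer contains any backslash at all.
broken_path = "test\temp.txt"
print("\t" in broken_path)   # the tab character is inside the string
print("\\" in broken_path)   # no backslash is left
```

This is why "test\\sample.txt" or "test/sample.txt" is the safe way to write a Windows path inside a Python string.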
Accessing a file (III) For other files, must use full pathname Windows Style: "C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt" Linux and Mac: "/Users/Jehan-Francois Paris/Documents/Courses/1306/Python/myfile.txt"
Reading a file Four ways: Line by line Global reads Within a while loop Also works with other languages Pickled files
Line-by-line reads
    for line in fh:   # special for loop
        # anything you want
    fh.close()   # optional
Example
    f3 = open("test/sample.txt", "r")
    for line in f3:
        print(line)
    f3.close()   # optional
Output
    To be or not to be that is the question

    Now is the winter of our discontent
With one or more extra blank lines
Why? Each line ends with newline print(…) adds an extra newline
Trying to remove blank lines
    print('-----')
    f5 = open("test/sample.txt", "r")
    for line in f5:
        print(line[:-1])   # remove last char
    f5.close()   # optional
    print('------')
The output
    -----
    To be or not to be that is the question
    Now is the winter of our disconten
    ------
The last line did not end with a newline!
A smarter solution (I) Only remove the last character if it is a newline
    if line[-1] == '\n':
        print(line[:-1])
    else:
        print(line)
A smarter solution (II)
    print('-------')
    fh = open("test/sample.txt", "r")
    for line in fh:
        if line[-1] == '\n':
            print(line[:-1])   # remove last char
        else:
            print(line)
    print('------')
    fh.close()   # optional
It works!
    -------
    To be or not to be that is the question
    Now is the winter of our discontent
    ------
We can do better • Use the rstrip() Python method • astring.rstrip() removes all trailing whitespace (spaces, tabs, newlines) from astring • astring.rstrip('\n') removes all trailing newlines from astring
The simplest solution This will remove all trailing newlines even the ones we should keep
    print('-------')
    fh = open("test/sample.txt", "r")
    for line in fh:
        print(line.rstrip('\n'))
    print('------')
    fh.close()   # optional
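To see what each form of rstrip() removes, it helps to print the results with repr(), which makes trailing spaces and newlines visible. The strings below are made-up test values, not the contents of the sample file.

```python
line = "that is the question  \n"          # two trailing spaces, one newline

no_whitespace = line.rstrip()              # drops spaces AND the newline
no_newline = line.rstrip("\n")             # drops only the newline

print(repr(no_whitespace))
print(repr(no_newline))

# Several trailing newlines are all removed, not just the last one.
several = "blank lines follow\n\n\n".rstrip("\n")
print(repr(several))

# A line without a final newline is returned unchanged.
last_line = "Now is the winter of our discontent"
print(repr(last_line.rstrip("\n")))
```

Unlike line[:-1], rstrip('\n') never cuts off a real character when the last line of the file has no newline.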
Global reads fh.read() Returns whole contents of file specified by file handle fh File contents are stored in a single string that might be very large
Example
    f2 = open("test\\sample.txt", "r")
    bigstring = f2.read()
    print(bigstring)
    f2.close()   # optional
Output of example
    To be or not to be that is the question
    Now is the winter of our discontent
Exact contents of file 'test\sample.txt' followed by an extra return
fh.read() and fh.read(n) fh.read() reads in the whole fh file and returns its contents as a single string fh.read(n) reads the next n characters of file fh (the next n bytes if the file was opened in binary mode)
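The sketch below shows how successive reads move through a file. To stay self-contained it uses io.StringIO as a stand-in for a file handle returned by open(); that substitution and the two lines of text are assumptions made for the demo.

```python
import io

# Stand-in for fh = open("test/sample.txt", "r")
fh = io.StringIO("To be or not to be\nthat is the question\n")

first = fh.read(5)    # the first five characters
second = fh.read(3)   # the NEXT three characters, not the first three
rest = fh.read()      # everything that has not been read yet
at_end = fh.read()    # nothing is left: returns the empty string
fh.close()

print(repr(first))
print(repr(second))
print(repr(rest))
print(repr(at_end))
```

Each read continues where the previous one stopped, and a read at the end of the file returns an empty string rather than raising an error.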
Reading within a loop Standard method for C/C++
    infile = open("test/sample.txt", "r")
    line = infile.readline()   # priming read
    while line:   # false if empty
        print(line.rstrip("\n"))
        line = infile.readline()
    infile.close()
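One detail of the while loop deserves a check: a blank line in the middle of a file does not stop the loop, because readline() returns "\n" for it and only returns the empty string at the end of the file. A minimal sketch, again using io.StringIO as an assumed stand-in for an opened file:

```python
import io

# Stand-in for infile = open(..., "r"); the second line is blank.
infile = io.StringIO("first\n\nthird\n")

lines_seen = []
line = infile.readline()            # priming read
while line:                         # "" (end of file) is false, "\n" is true
    lines_seen.append(line.rstrip("\n"))
    line = infile.readline()
infile.close()

print(lines_seen)
```

All three lines are collected, including the blank one, which shows up as an empty string only after its newline has been stripped.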
Making sense of file contents Most files contain more than one data item per line
    COSC 713-743-3350
    UHPD 713-743-3333
Must split lines mystring.split(sepchar) where sepchar is a separation character returns a list of items
Splitting strings
    >>> txt = "Four score and seven years ago"
    >>> txt.split()
    ['Four', 'score', 'and', 'seven', 'years', 'ago']
    >>> record = "1,'Baker, Andy', 83, 89, 85"
    >>> record.split(',')
    ['1', "'Baker", " Andy'", ' 83', ' 89', ' 85']
Not what we wanted!
Example
    # how2split.py
    print('-----')
    fh = open("test/sample.txt", "r")
    for line in fh:
        words = line.split()
        for xxx in words:
            print(xxx)
    fh.close()   # optional
    print('-----')
Output Spurious newlines are gone
    -----
    To
    be
    …
    of
    our
    discontent
    -----
Standard way to access a file
    # preprocessing
    # set up counters, strings and lists
    fh = open("input.txt", "r")
    for line in fh:
        words = line.split(sepchar)   # often space
        for xxx in words:
            # do something
    fh.close()   # optional
    # postprocessing
    # print results
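Here is one way the template might be filled in, counting the words in a text and remembering the longest one. It is only a sketch: io.StringIO stands in for open("input.txt", "r"), and the two lines of text and the variable names are assumptions.

```python
import io

# preprocessing: set up counters and strings
word_count = 0
longest = ""

# Stand-in for fh = open("input.txt", "r")
fh = io.StringIO("To be or not to be\nthat is the question\n")

for line in fh:
    words = line.split()             # split on whitespace
    for word in words:
        word_count += 1              # do something with each word
        if len(word) > len(longest):
            longest = word
fh.close()

# postprocessing: print results
print(word_count, "words; the longest is", longest)
```

The three parts of the template stay clearly separated: the counters are set up before the loop, updated inside it, and reported after the file is closed.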
Example
• List of expenditures with dates:
    Rent 11/2/16 $850
    Latte 11/2/16 $4.50
    Food 11/2/16 $35.47
    Latte 11/3/16 $4.50
    Latte 11/3/16 $4.50
    Outing 11/4/16 $27.00
• Want to know how much money was spent on latte
First attempt • Read line by line • Will split all lines such as • "Food 11/2/16 $35.47" into • ["Food", "11/2/16", "$35.47"] • Will use first and last entries of each line list
First attempt
    total = 0   # set up accumulator
    fh = open("expenses.txt", "r")
    for line in fh:
        words = line.split(" ")
        if words[0] == 'Latte':
            total += words[2]   # increment
    fh.close()   # optional
    print("you spent %.2f on latte" % total)
It does not work!
Second attempt Must first remove the offending '$' Must also convert string to float
    def price2float(s):
        """ remove leading dollar sign"""
        if s[0] == "$":
            return float(s[1:])
        else:
            return float(s)
Second attempt
    total = 0   # set up accumulator
    fh = open("expenses.txt", "r")
    for line in fh:
        words = line.split(" ")
        if words[0] == 'Latte':
            total += price2float(words[2])
    fh.close()   # optional
    print("You spent $%.2f on latte" % total)
Output
    You spent $13.50 on latte
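The second attempt can be tried without creating expenses.txt. In the sketch below, io.StringIO stands in for the opened file and holds the six expense lines from the example; that substitution is an assumption made so the code runs on its own.

```python
import io

def price2float(s):
    """Remove a leading dollar sign, if any, and convert to float."""
    if s[0] == "$":
        return float(s[1:])
    return float(s)

# Stand-in for fh = open("expenses.txt", "r")
fh = io.StringIO(
    "Rent 11/2/16 $850\n"
    "Latte 11/2/16 $4.50\n"
    "Food 11/2/16 $35.47\n"
    "Latte 11/3/16 $4.50\n"
    "Latte 11/3/16 $4.50\n"
    "Outing 11/4/16 $27.00\n"
)

total = 0                            # set up accumulator
for line in fh:
    words = line.split(" ")
    if words[0] == "Latte":
        total += price2float(words[2])
fh.close()

print("You spent $%.2f on latte" % total)
```

The last item of each line still carries its newline, as in "$4.50\n", but float() ignores surrounding whitespace, so the conversion succeeds.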
Picking the right separator (I) Commas CSV Excel format Values are separated by commas Strings are stored without quotes Unless they contain a comma “Doe, Jane”, freshman, 90, 90 Quotes within strings are doubled
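A plain split(',') cannot follow these quoting rules, but Python's standard csv module can. The sketch below is one possible way to read such lines, not necessarily the approach the course takes; the second row is made up to show a doubled quote, and skipinitialspace=True is needed here because the sample has a space after each comma.

```python
import csv

lines = [
    '"Doe, Jane", freshman, 90, 90',
    '"Roe, ""Rick""", senior, 85, 80',   # made-up row with doubled quotes
]

# csv.reader accepts any iterable of strings, including an open file.
rows = list(csv.reader(lines, skipinitialspace=True))

for row in rows:
    print(row)
```

The comma inside the quoted name no longer splits the field, and each doubled quote comes back as a single quote character.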