parsing strings

parsing strings. Fred Kuhns Computer Science and Engineering Applied Research laboratory Washington University in St. Louis. Working Example. As software developers we frequently find ourselves needing to parse strings.


Presentation Transcript


  1. Parsing strings
     Fred Kuhns, Computer Science and Engineering, Applied Research Laboratory, Washington University in St. Louis

  2. Working Example
     • As software developers we frequently find ourselves needing to parse strings.
     • We will focus on character strings using the STL string class, but the techniques have broader applicability.
     • A common scenario: we have a file containing tabulated data that we must read, process, and then store the results back in a file.
     • A popular file format is CSV, or Comma Separated Values.
       • Commas separate fields and newlines separate records.
     CS422 – Operating Systems Concepts

  3. CSV
     # Registration table
     # Last <fs> First <fs> MI <fs> ID <fs> Email <fs> Comments
     # Each line represents the record for one registered person
     Smith , John , M , 1001 , john@someplace.com , needs receipt
     Jackson, Mary , I , 2010 , mary@thatplace.edu ,
     Mitchel, Mark, L, 4000, mm@candy.com, must call
     Hicks,,, 2110, , must get missing information
     • It is convenient to think of the file as a two-dimensional array of records and fields.
     • Be specific about your assumptions concerning the data format and whether comments and escape sequences are permitted.
     • Don't assume that all fields will have values or that the proper number of field separators is present, especially if people are permitted to edit the file.
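The cautions above can be made concrete with a small sketch. It assumes, as in the sample file, that a line whose first non-blank character is '#' is a comment and that ',' is the field separator. The helper names is_comment and count_separators are hypothetical, not part of the slides.

```cpp
#include <algorithm>
#include <string>

// Assumption: a line whose first non-blank character is '#' is a comment.
bool is_comment(const std::string& line) {
    std::string::size_type first = line.find_first_not_of(" \t");
    return first != std::string::npos && line[first] == '#';
}

// Count field separators so that a record with too few or too many
// can be flagged before its fields are extracted.
std::string::size_type count_separators(const std::string& line, char fs = ',') {
    return static_cast<std::string::size_type>(
        std::count(line.begin(), line.end(), fs));
}
```

A record with six fields should contain five separators, so a reader could reject or repair any non-comment line where count_separators returns a different value.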

  4. Accessing the file
     • C++ gives you access to the C and C++ standard libraries for I/O. See the required text for the details.
     • I assume you need input and output files open:
       char ch;
       ifstream fin("data.csv");
       if (!fin) { cerr << "open fin failed"; exit(1); }
       ofstream fout("result.csv");
       if (!fout) { cerr << "open fout failed"; exit(1); }
       while (fin.get(ch)) { … fout.put(ch); }
       if (!fin.eof() || !fout) { cerr << "File IO error"; exit(1); }
     • You may explicitly open a file: fin.open("filename");
     • The stream destructor closes the file, or you may explicitly close it: fin.close();
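The open, copy and check sequence on this slide can be packaged as a function. This is a minimal sketch: the name copy_file and the choice to return false instead of calling exit(1) are assumptions made for illustration.

```cpp
#include <fstream>
#include <string>

// Copy src to dst one character at a time.
// Returns false if either file cannot be opened or an I/O error occurs.
bool copy_file(const std::string& src, const std::string& dst) {
    std::ifstream fin(src.c_str(), std::ios::binary);
    if (!fin) return false;               // input could not be opened
    std::ofstream fout(dst.c_str(), std::ios::binary);
    if (!fout) return false;              // output could not be opened

    char ch;
    while (fin.get(ch)) {                 // loop ends when get() fails
        fout.put(ch);
    }
    // Success means the loop stopped at end-of-file and every write worked.
    return fin.eof() && !fout.fail();
}
```

Both streams are closed by their destructors when the function returns, so no explicit close() call is needed here.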

  5. Operations
     • Stream objects have state:
       • good() – next operation is expected to succeed
       • eof() – end of file (input) reached
       • fail() – next operation will fail
       • bad() – corrupted stream
     • An operation on a stream that is not in a good state is a null op.
     • bool operator!() const on a stream returns fail().
     • operator void*() const returns fail() ? 0 : -1; (C++11 and later provide explicit operator bool() instead, which returns !fail()).
     • Character-oriented I/O uses get, put, read, write, getline and the operators << and >>.
     • get(char*, …) does not remove '\n' from the stream, but getline(char*, …) does.
     • You can also use the non-member function getline, which takes a string.
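The state functions are easiest to see in a read loop. The following sketch uses a hypothetical helper, read_ints, that extracts integers until the stream stops testing true and then reports why it stopped.

```cpp
#include <istream>
#include <sstream>

// Read whitespace-separated integers until an extraction fails.
// Returns how many were read and reports why the loop stopped.
int read_ints(std::istream& in, bool& hit_eof, bool& hit_bad_input) {
    int value = 0, count = 0;
    while (in >> value) {                     // true while !in.fail()
        ++count;
    }
    hit_eof = in.eof();                       // input was exhausted
    hit_bad_input = in.fail() && !in.eof();   // a token was not an integer
    return count;
}

// Usage (an in-memory stream stands in for a file):
//   std::istringstream in("1 x 3");
//   bool at_eof, bad_input;
//   int n = read_ints(in, at_eof, bad_input);
//   // n counts only the integers before the offending token
```

After a failure the stream ignores further extractions until clear() is called, which is the "null op" behavior noted in the slide.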

  6. Reading CSV
     • Questions:
       • Is it OK to add fl to the vector records?
       • Does the line read retain all whitespace?
     ifstream fin(argv[1]);
     string line;
     vector<string> lines;
     vector<FieldList> records;
     while (getline(fin, line)) {
       lines.push_back(line); // example of using vectors
       FieldList fl(line);
       records.push_back(fl);
     }
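One way to explore both questions is to run the read loop on an in-memory stream. The sketch below leaves out FieldList, whose definition is not shown in the slides, and uses a hypothetical helper named read_lines.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read every line of the stream into a vector, one string per record.
std::vector<std::string> read_lines(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);   // push_back stores a copy, so reusing line is safe
    }
    return lines;
}

// Usage (an in-memory stream stands in for the file):
//   std::istringstream in("  Smith , John \nHicks,,,\n");
//   std::vector<std::string> lines = read_lines(in);
//   // lines[0] is "  Smith , John ": whitespace kept, newline removed
```

Because push_back copies its argument, pushing a local object inside the loop is safe as long as the type is copyable. The non-member getline removes only the line delimiter, so leading and trailing blanks survive.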

  7. Reading the fields, one of many ways
     FieldList::FieldList(const string &rec, …)
     {
       // you can fill in the missing pieces
       string fld;
       string::size_type indx, fend, tmp, end = rec.size();
       for (indx = 0; indx <= end; indx = fend + 1) {
         // skip over any initial white space
         indx = rec.find_first_not_of(ws_, indx);
         ???
         flds_.push_back(fld);
       }
     }
     • To solve this, consider the edge cases.
     • Make sure you explicitly address each case.
     • Draw a picture.
     • Do you allow comments?
     • What about quoted text with embedded field separators?
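As one possible way to fill in the missing pieces, here is a sketch written as a free function rather than the FieldList constructor. The name split_fields and its parameters are hypothetical. It assumes no quoted fields, no escape sequences and no comment handling, so it addresses only the empty-field and whitespace edge cases.

```cpp
#include <string>
#include <vector>

// Split rec on fs, trimming the characters in ws from both ends of each field.
// Empty fields are kept so field positions stay aligned across records.
// Assumptions: no quoted fields, no escape sequences, no comments.
std::vector<std::string> split_fields(const std::string& rec,
                                      char fs = ',',
                                      const std::string& ws = " \t\r") {
    std::vector<std::string> flds;
    const std::string::size_type end = rec.size();
    std::string::size_type indx = 0;
    while (true) {
        // The field runs from indx up to the next separator or end of record.
        std::string::size_type fend = rec.find(fs, indx);
        if (fend == std::string::npos) fend = end;

        // Trim leading and trailing whitespace inside [indx, fend).
        std::string::size_type first = rec.find_first_not_of(ws, indx);
        if (first == std::string::npos || first >= fend) {
            flds.push_back(std::string());       // empty or all-blank field
        } else {
            std::string::size_type last = rec.find_last_not_of(ws, fend - 1);
            flds.push_back(rec.substr(first, last - first + 1));
        }

        if (fend == end) break;   // no separator left: that was the last field
        indx = fend + 1;          // continue just past the separator
    }
    return flds;
}
```

With this design an empty record yields a single empty field, and a trailing separator yields a trailing empty field. Whether that is the right behavior depends on the format assumptions you chose to make explicit.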

  8. Simple Examples
     • You can use the find family of string member functions to split up this line: find(), find_first_of(), find_first_not_of(), find_last_of(), find_last_not_of()
     Record as it appears in the file:
       a,b,c\n
     String representation of the record after a call to getline(fin, line):
       char:   a  ,  b  ,  c
       index:  0  1  2  3  4  5
     line.size() == 5 (index 5 is one past the last character)
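To connect the index picture to the find family, the sketch below collects the index of every separator in a line. The helper name separator_positions is hypothetical.

```cpp
#include <string>
#include <vector>

// Return the index of every character in line that appears in seps.
std::vector<std::string::size_type>
separator_positions(const std::string& line, const std::string& seps = ",") {
    std::vector<std::string::size_type> found;
    std::string::size_type pos = line.find_first_of(seps);
    while (pos != std::string::npos) {
        found.push_back(pos);
        pos = line.find_first_of(seps, pos + 1);   // resume after this match
    }
    return found;
}
```

For the line "a,b,c" the separators sit at indices 1 and 3, so the fields occupy the index ranges [0,1), [2,3) and [4,5).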
