
Lecture 4 Basic Web Concepts

CS 502 Computing Methods for Digital Libraries, Cornell University – Computer Science. Herbert Van de Sompel, herbertv@cs.cornell.edu. Lecture 4: Basic Web Concepts.


Presentation Transcript


  1. CS 502 Computing Methods for Digital Libraries
      Cornell University – Computer Science
      Herbert Van de Sompel, herbertv@cs.cornell.edu
      Lecture 4: Basic Web Concepts

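  2. HyperText Transfer Protocol (HTTP) over a TCP/IP network
      [Diagram: a web browser (HTTP client) at IP address 1 sends an HTTP request across a TCP/IP network to a web server (HTTP server) at IP address 2; the server returns an HTTP response, which the browser renders.]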

  3. Transmission Control Protocol/Internet Protocol (TCP/IP)
      • is the protocol suite that drives the Internet
      • handles network communication between network nodes (computers, printers, webcams, … connected to the Internet)
      • protocol suite:
          • TCP: reliable, connection-oriented communication of data between applications
          • IP: communication of data between nodes
          • UDP: connectionless communication between applications, without delivery guarantees
          • ICMP: error reporting and diagnostic messages (e.g. ping)
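A minimal Python sketch, using only the standard `socket` module on the loopback interface (127.0.0.1), of the difference between the two transport protocols listed above: TCP needs a connection (the handshake happens inside `create_connection`/`accept`) before data flow as a byte stream, while UDP simply addresses and sends a self-contained datagram. The function names are illustrative, not part of any library.

```python
import socket

def tcp_roundtrip(message: bytes) -> bytes:
    """Send `message` over a TCP connection on the loopback interface; return what arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        listener.listen(1)
        listener.settimeout(5)
        # create_connection() runs the TCP handshake; the OS completes it and
        # queues the new connection until accept() picks it up.
        with socket.create_connection(listener.getsockname(), timeout=5) as client:
            conn, _ = listener.accept()
            with conn:
                client.sendall(message)
                client.shutdown(socket.SHUT_WR)   # no more data from the client
                received = b""
                while chunk := conn.recv(1024):   # a stream: read until the sender closes
                    received += chunk
    return received

def udp_roundtrip(message: bytes) -> bytes:
    """Send one UDP datagram to a socket on the loopback interface; return what arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(5)   # UDP gives no delivery guarantee, so don't wait forever
        sender.sendto(message, receiver.getsockname())   # no connection, no handshake
        data, _ = receiver.recvfrom(65535)
    return data

print(tcp_roundtrip(b"GET / HTTP/1.1\r\n\r\n"))
print(udp_roundtrip(b"ping"))
```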

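  4. TCP/IP protocol architecture
      [Diagram: the client sends an HTTP request and the server receives it; the message passes down through the Application layer, the Transport layer (TCP), the Internet layer (IP) and the Network Access layer (Ethernet, …) on the client side, and back up through the same layers on the server side.]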

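  5. Transmission Control Protocol (TCP)
      • opens a connection with the addressee (handshake)
      • breaks the message up into chunks (segments)
      • gives each chunk a sequence number and the port of the receiving application (the addressee's IP address is added by the IP layer)
      • hands the chunks over to the IP layer
      • guarantees error-free, in-order delivery of the chunks at the addressee over that connection, by acknowledging received chunks and retransmitting lost ones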
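The chunking and reassembly described above can be simulated in a few lines of Python. This is a toy model under stated assumptions: sequence numbers are byte offsets starting at 0 (real TCP starts from a random initial sequence number), and acknowledgments and retransmission are left out. It only shows why numbering the chunks lets the receiver restore their order and discard duplicates.

```python
import random
from typing import List, Tuple

Segment = Tuple[int, bytes]   # (sequence number, chunk of data)

def segment(message: bytes, size: int) -> List[Segment]:
    """Break a message into chunks numbered by byte offset."""
    return [(offset, message[offset:offset + size])
            for offset in range(0, len(message), size)]

def reassemble(segments: List[Segment]) -> bytes:
    """Order chunks by sequence number and drop duplicates (assumes none are missing)."""
    unique = dict(segments)           # a duplicate chunk overwrites its identical twin
    return b"".join(unique[seq] for seq in sorted(unique))

message = b"GET / HTTP/1.1\r\nHost: no.good.com\r\n\r\n"
chunks = segment(message, 8)
random.shuffle(chunks)                # chunks may arrive out of order...
chunks.append(chunks[0])              # ...or more than once
print(reassemble(chunks) == message)  # True
```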

  6. Internet Protocol (IP)
      • handles the routing of chunks towards the addressee (through routers)
      • IP addressing:
          • each node has an IP address: 157.193.101.6
          • each node can have a readable name: erlserv.rug.ac.be
          • DNS (Domain Name System) maps readable names to IP addresses
      • IP data transmission:
          • the sender delivers a chunk to a router (via a lower-level protocol)
          • each router delivers the chunk to the next router or to the destination host
          • individual chunks can be delivered via different paths
          • each router chooses the next hop for a chunk from its routing table
          • at the addressee, IP delivers the chunk to the TCP layer
      • IP itself does not guarantee delivery or ordering; that is TCP's job
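The slide leaves open how a router picks a path. Conventional IP routers forward each chunk to a next hop chosen by longest-prefix match against a routing table; the Python sketch below (standard `ipaddress` module) shows that lookup. The routing table, its prefixes, and the router names are hypothetical.

```python
import ipaddress

# Hypothetical routing table: (destination network, next hop).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "default-gateway"),
    (ipaddress.ip_network("157.193.0.0/16"), "router-A"),
    (ipaddress.ip_network("157.193.101.0/24"), "router-B"),
]

def next_hop(destination: str) -> str:
    """Among routes whose network contains the address, pick the longest prefix."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    _, hop = max(matches, key=lambda route: route[0].prefixlen)
    return hop

print(next_hop("157.193.101.6"))   # router-B: the /24 is more specific than the /16
print(next_hop("157.193.7.1"))     # router-A
print(next_hop("128.84.1.1"))      # default-gateway
```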

  7. TCP/IP protocol architecture
      • Application layer: HTTP, FTP, telnet
      • Transport layer: TCP, UDP
      • Internet layer: IP, ICMP
      • Network Access layer: Ethernet, …
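As a toy illustration of the layering (not real wire formats), each layer can be pictured as wrapping the data from the layer above in its own header on the way down, with the receiver peeling the headers off in reverse order on the way up. The header strings, including port 80 (HTTP's default port), are illustrative assumptions.

```python
# Toy header for each layer, from transport down to network access; illustrative only.
LAYER_HEADERS = ["[TCP dst_port=80]", "[IP dst=157.193.101.6]", "[Ethernet]"]

def encapsulate(app_data: str) -> str:
    """Going down the stack, each layer prepends its header to what it received."""
    for header in LAYER_HEADERS:
        app_data = header + app_data
    return app_data

def decapsulate(frame: str) -> str:
    """Going up the stack, each layer strips its header, bottom layer first."""
    for header in reversed(LAYER_HEADERS):
        if not frame.startswith(header):
            raise ValueError(f"expected {header!r}")
        frame = frame[len(header):]
    return frame

frame = encapsulate("GET / HTTP/1.1")
print(frame)  # [Ethernet][IP dst=157.193.101.6][TCP dst_port=80]GET / HTTP/1.1
print(decapsulate(frame))  # GET / HTTP/1.1
```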

  8. HTTP request (parts: method, header, entity-body)
          GET / HTTP/1.1
          Date: Wednesday, 02-Feb-99 23:04:12 GMT
          Accept-Language: en-us
          User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
          Host: no.good.com
          Connection: Keep-Alive
          * a blank line *
      [Diagram: the web browser (HTTP client) sends this request to the web server (HTTP server) at no.good.com.]

  9. HTTP request
      • method (request line): method, URI, HTTP-version
          • methods: GET, POST, HEAD, PUT, …
          • example: GET / HTTP/1.1
      • header
          • general-header: optional, general information
              • Date: Wednesday, 02-Feb-99 23:04:12 GMT
              • Connection: Keep-Alive
          • request-header: about the client and the request
              • Accept-Language: en-us
              • User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
          • entity-header: about the entity-body
      • entity-body: what is sent to the server
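A minimal sketch of assembling the request structure above into bytes, assuming HTTP/1.1 framing: each line ends in CRLF (`\r\n`), and an empty line separates the header from the entity-body. `build_request` is a hypothetical helper, not a library function.

```python
from typing import Dict

def build_request(method: str, uri: str, headers: Dict[str, str], body: bytes = b"") -> bytes:
    """Assemble request line, header lines, blank line and entity-body."""
    lines = [f"{method} {uri} HTTP/1.1"]                  # method URI HTTP-version
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"   # last header line's CRLF, then the blank line
    return head.encode("ascii") + body

request = build_request("GET", "/", {
    "Host": "no.good.com",
    "Accept-Language": "en-us",
    "Connection": "Keep-Alive",
})
print(request.decode("ascii"))
```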

  10. HTTP response (parts: status, header, entity-body)
          HTTP/1.1 200 OK
          Date: Wednesday, 02-Feb-99 23:04:25 GMT
          Server: Apache/1.3.6 (Unix)
          Last-Modified: Sun, 01 Feb 1999 13:54:26 GMT
          ETag: "2f5cd-964-38js8"
          Content-length: 327
          Connection: close
          Content-Type: text/html
          * a blank line *
          <title>Welcome to nogood</title>
          <img src="/images/nogood-logo.gif">
      [Diagram: the web server (HTTP server) at no.good.com returns this response to the web browser (HTTP client).]

  11. HTTP response
      • status (status line): HTTP-version, Status-code, Reason-phrase
          • example: HTTP/1.1 200 OK
      • header
          • general-header: optional, general information
              • Date: Wednesday, 02-Feb-99 23:04:25 GMT
          • response-header: about the server and the response
              • Server: Apache/1.3.6 (Unix)
              • ETag: "2f5cd-964-38js8"
          • entity-header: about the entity-body
              • Content-Type: text/html
              • Content-length: 327
      • entity-body: what is sent to the client
              <title>Welcome to nogood</title>
              <img src="/images/nogood-logo.gif">
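The reverse operation, splitting a raw response into status line, header and entity-body, can be sketched as follows. It assumes the complete response is already in memory and that the status line carries a reason phrase; a real client uses Content-Length (or the closing of the connection) to know where the entity-body ends. `parse_response` is a hypothetical helper.

```python
from typing import Dict, Tuple

def parse_response(raw: bytes) -> Tuple[str, int, str, Dict[str, str], bytes]:
    """Split a raw HTTP response into version, status code, reason phrase, headers, body."""
    head, _, body = raw.partition(b"\r\n\r\n")      # the blank line ends the header
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers: Dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise them to lower case.
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body

raw = (b"HTTP/1.1 200 OK\r\n"
       b"Server: Apache/1.3.6 (Unix)\r\n"
       b'ETag: "2f5cd-964-38js8"\r\n'
       b"Content-Type: text/html\r\n"
       b"\r\n"
       b"<title>Welcome to nogood</title>")
version, code, reason, headers, body = parse_response(raw)
print(code, reason, headers["content-type"])   # 200 OK text/html
```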

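  12. HTTP request for the embedded image
      (the browser finds <img src="/images/nogood-logo.gif"> in the returned HTML and requests that file separately)
          GET /images/nogood-logo.gif HTTP/1.1
          Date: Wednesday, 02-Feb-99 23:04:27 GMT
          Accept-Language: en-us
          User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
          Host: no.good.com
          Connection: Keep-Alive
          * a blank line *
      [Diagram: the web browser (HTTP client) sends this request to the web server (HTTP server) at no.good.com.]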
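The second request exists because the browser parses the returned HTML and issues a separate GET for each embedded resource. A sketch of that discovery step with Python's standard `html.parser`; `ImageFinder` is a hypothetical name, and only `<img>` tags are handled here.

```python
from html.parser import HTMLParser

class ImageFinder(HTMLParser):
    """Collect the src of every <img> tag; each one triggers another GET request."""

    def __init__(self) -> None:
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)

finder = ImageFinder()
finder.feed('<title>Welcome to nogood</title> <img src="/images/nogood-logo.gif">')
for src in finder.sources:
    print(f"GET {src} HTTP/1.1")   # prints: GET /images/nogood-logo.gif HTTP/1.1
```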

  13. HTTP response
          HTTP/1.1 200 OK
          Date: Wednesday, 02-Feb-99 23:04:29 GMT
          Server: Apache/1.3.6 (Unix)
          Last-Modified: Sun, 01 Feb 1999 08:20:00 GMT
          ETag: "2f5cd-964-445e"
          Content-length: 220
          Connection: close
          Content-Type: image/gif
          * a blank line *
          (the GIF file)
      [Diagram: the web server (HTTP server) at no.good.com returns this response to the web browser (HTTP client).]

  14. HyperText Transfer Protocol (HTTP)
      [Diagram: the web browser (HTTP client) sends an HTTP request; the web server (HTTP server) answers with an HTTP response carrying a MIME type plus a file, which the browser renders.]

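  15. Browser
      [Diagram: the browser passes the received file and its MIME type to presentation software, which produces the display.]
      Presentation software can be:
      • built into the browser
      • a plug-in
      • a helper application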
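How the browser picks presentation software can be pictured as a lookup keyed on the MIME type from the Content-Type header, with unknown types falling through to a helper application. The table entries below are hypothetical. On the server side, the MIME type is commonly derived from the file extension, which Python's standard `mimetypes` module can also do.

```python
import mimetypes

# Hypothetical presentation table for an imaginary browser.
PRESENTATION = {
    "text/html": "built into browser",
    "image/gif": "built into browser",
    "application/pdf": "plug-in",
}

def presentation_for(mime_type: str) -> str:
    """Choose presentation software from the MIME type in the Content-Type header."""
    # Media-type parameters such as '; charset=...' do not affect the choice here.
    base_type = mime_type.split(";")[0].strip().lower()
    return PRESENTATION.get(base_type, "helper application")

# Server side: derive the MIME type from the file extension.
print(mimetypes.guess_type("nogood-logo.gif")[0])          # image/gif
print(presentation_for("text/html; charset=iso-8859-1"))   # built into browser
print(presentation_for("application/x-unknown"))           # helper application
```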

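  16. HTTP proxies
      • reduce network traffic: caching, with ETag and Last-Modified used to check whether a cached copy is still valid
      • IP-based authentication: the server sees requests arriving from the proxy's IP address
      [Diagram: an HTTP proxy with a cache sits between the web browser (HTTP client) and the web server (HTTP server) at no.good.com; it acts as a server towards the browser and as a client towards the web server.]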
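A sketch of how a caching proxy can use the ETag: when it already holds a copy, it sends a conditional GET with an `If-None-Match` header, and a `304 Not Modified` answer means the cached copy can be served without the body crossing the network again (the Last-Modified counterpart is `If-Modified-Since`). The `ProxyCache` class is hypothetical and ignores expiry rules such as Expires and Cache-Control.

```python
from typing import Dict, Optional, Tuple

class ProxyCache:
    """Toy proxy cache keyed by URL that revalidates entries with conditional GET."""

    def __init__(self) -> None:
        self.entries: Dict[str, Tuple[str, bytes]] = {}   # url -> (ETag, body)

    def conditional_headers(self, url: str) -> Dict[str, str]:
        # With a cached copy, ask the origin server to send the body only if it changed.
        if url in self.entries:
            etag, _ = self.entries[url]
            return {"If-None-Match": etag}
        return {}

    def handle_response(self, url: str, status: int,
                        etag: Optional[str], body: bytes) -> bytes:
        if status == 304:                        # Not Modified: serve the cached copy
            return self.entries[url][1]
        if status == 200 and etag is not None:
            self.entries[url] = (etag, body)     # keep a fresh copy for next time
        return body

cache = ProxyCache()
url = "http://no.good.com/images/nogood-logo.gif"
first = cache.handle_response(url, 200, '"2f5cd-964-445e"', b"GIF89a...")
print(cache.conditional_headers(url))   # {'If-None-Match': '"2f5cd-964-445e"'}
```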

  17. HTTP cookies
      • The HTTP protocol is stateless: once a server has given a response to a client, it forgets about it. There is no session information.
      • Fake state with cookies:
          • the server sends a token to the client (Set-Cookie response header)
          • the client sends the token back to the server with later requests (Cookie request header)
          • the server understands the meaning of the token
      • for instance: the server avoids requiring a username/password with every request by reading the authorization from a cookie
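Python's standard `http.cookies` module can illustrate the token round trip. The cookie name `session`, its value, and the server-side `sessions` table are hypothetical.

```python
from http.cookies import SimpleCookie

# Server side: issue a token in a Set-Cookie response header.
outgoing = SimpleCookie()
outgoing["session"] = "a1b2c3"             # hypothetical opaque token
set_cookie_header = outgoing.output()
print(set_cookie_header)                    # Set-Cookie: session=a1b2c3

# Client side: the browser returns the token in a Cookie request header.
cookie_header_value = "session=a1b2c3"

# Server side again: parse the header and look the token up in its own state.
sessions = {"a1b2c3": "herbert"}            # hypothetical server-side session table
incoming = SimpleCookie(cookie_header_value)
user = sessions.get(incoming["session"].value)
print(user)                                 # herbert
```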

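  18. Dynamic content: Common Gateway Interface (CGI)
      • client interaction with non-web servers: the web server hands the request to a separate program, which generates the response
      [Diagram: the web browser (HTTP client) sends an HTTP request to the web server (HTTP server) at no.good.com, which passes it via CGI to a program; the program's output comes back as the HTTP response.]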

  19. CGI: HTTP POST request
          POST /cgi-bin/find HTTP/1.1
          Date: Wednesday, 02-Feb-99 23:04:27 GMT
          Accept-Language: en-us
          User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
          Host: no.good.com
          Connection: Keep-Alive
          Content-length: 26
          Content-type: application/x-www-form-urlencoded
          * a blank line *
          search=herbert&type=author
      [Diagram: the web browser (HTTP client) sends this request to the web server (HTTP server) at no.good.com, which passes the form data to the program find.]
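The entity-body of the POST request above is produced by form URL-encoding. A quick check with Python's standard `urllib.parse` confirms that the two form fields encode to exactly the 26 bytes announced in Content-length, and decode back on the server side.

```python
from urllib.parse import urlencode, parse_qs

form = {"search": "herbert", "type": "author"}
body = urlencode(form)                      # 'search=herbert&type=author'
content_length = len(body.encode("ascii"))
print(content_length)                       # 26, the Content-length value above

# Server side: decode the body back into fields (values come back as lists).
fields = parse_qs(body)                     # {'search': ['herbert'], 'type': ['author']}

# Spaces and special characters are escaped; a space becomes '+'.
print(urlencode({"search": "van de sompel"}))   # search=van+de+sompel
```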

  20. CGI: HTTP GET request
          GET /cgi-bin/find?search=herbert&type=author HTTP/1.1
          Date: Wednesday, 02-Feb-99 23:04:27 GMT
          Accept-Language: en-us
          User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
          Host: no.good.com
          Connection: Keep-Alive
          * a blank line *
      [Diagram: the web browser (HTTP client) sends this request to the web server (HTTP server) at no.good.com, which passes the query data to the program find.]
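With GET, the same URL-encoded fields are appended to the URI after a `?`, and the server splits the request-target into the program path and the query string. A sketch with the standard `urllib.parse` module:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Client side: the form fields are appended to the program's URI after a '?'.
target = "/cgi-bin/find?" + urlencode({"search": "herbert", "type": "author"})
request_line = f"GET {target} HTTP/1.1"

# Server side: the path selects the program, the query string carries the data.
parts = urlsplit(target)
print(parts.path)              # /cgi-bin/find
print(parse_qs(parts.query))   # {'search': ['herbert'], 'type': ['author']}
```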

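  21. CGI: the interface (input)
      • the program find receives input from
          • STDIN (the entity-body of a POST request): search=herbert&type=author
          • environment variables (about the client, the server, the request, …), e.g.
              SERVER_NAME   server.good.com
              REMOTE_HOST   157.193.101.6
              …
      • for a GET request, the form data arrive in the QUERY_STRING environment variable instead of on STDIN
      [Diagram: the web server (HTTP server) at no.good.com passes this input to the program find via CGI.]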
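A sketch of the input side of CGI in Python, assuming the standard CGI variable names (REQUEST_METHOD, CONTENT_LENGTH, QUERY_STRING, as in RFC 3875): POST data are read from STDIN, limited to CONTENT_LENGTH bytes, while GET data come from QUERY_STRING. `read_cgi_input` is a hypothetical helper; a real CGI program would pass `os.environ` and `sys.stdin.buffer`.

```python
import io
from urllib.parse import parse_qs

def read_cgi_input(environ: dict, stdin) -> dict:
    """Return the form fields a CGI program receives, for GET or POST requests."""
    if environ.get("REQUEST_METHOD") == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        # POST: the entity-body arrives on STDIN; read exactly CONTENT_LENGTH bytes.
        raw = stdin.read(length).decode("ascii")
    else:
        # GET: the data arrive in an environment variable instead.
        raw = environ.get("QUERY_STRING", "")
    return parse_qs(raw)

# Simulated POST invocation; the environment values and STDIN are stand-ins.
fields = read_cgi_input(
    {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "26",
     "SERVER_NAME": "server.good.com", "REMOTE_HOST": "157.193.101.6"},
    io.BytesIO(b"search=herbert&type=author"),
)
print(fields)   # {'search': ['herbert'], 'type': ['author']}
```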

  22. CGI: the interface (output)
      • the program find writes its output to STDOUT:
              Content-type: text/html
              (blank line)
              <title>Search results</title>
              …
      • the web server adds the remaining header information (status line, Date, Server, …) and sends the response to the client
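The output side, sketched as a hypothetical body for the program find: it writes a CGI header (just Content-type), a blank line, and then HTML to STDOUT; the web server turns that into a full HTTP response. The search logic itself is omitted, and the HTML is illustrative.

```python
import html
import io
import sys

def find(fields: dict, out=None) -> None:
    """Hypothetical 'find' program: write a CGI header, a blank line, then an HTML page."""
    if out is None:
        out = sys.stdout                  # a real CGI program writes to STDOUT
    query = fields.get("search", [""])[0]
    out.write("Content-type: text/html\n")
    out.write("\n")                       # blank line ends the CGI header
    out.write("<title>Search results</title>\n")
    # Escape user input before echoing it into HTML.
    out.write(f"<p>Results for {html.escape(query)} (search logic omitted)</p>\n")

# Capture the output in memory to see exactly what the web server would receive.
buffer = io.StringIO()
find({"search": ["herbert"], "type": ["author"]}, buffer)
print(buffer.getvalue())
```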

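  23. Dynamic content: mobile code, JavaScript
      • JavaScript arrives embedded in (or referenced from) the HTML of an HTTP response and is executed by the browser
      • uses: user interface, client-side validation, …
      [Diagram: the web server (HTTP server) at no.good.com sends HTML plus JavaScript in an HTTP response to the web browser (HTTP client).]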

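  24. Dynamic content: mobile code, Java applets
      • executed by a Java virtual machine in the browser
      • interaction with the program find does not have to go via HTTP
      [Diagram: the web server (HTTP server) at no.good.com sends a Java applet in an HTTP response to the web browser (HTTP client); the applet then communicates with the program find directly.]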

  25. Want to read a bit more?
      • on Web characterization: http://www.w3.org/1999/05/WCA-terms/01
      • on CGI: http://www.ukans.edu/~acs/docs/other/forms-intro.shtml
      • on the Web, TCP/IP, CGI: http://www.wdvl.com/Authoring/Tools/Tutorial/index4.html
      • on HTTP: http://www.ietf.org/rfc/rfc1945.txt?number=1945 (RFC 1945, HTTP/1.0); http://www.jmarshall.com/easy/http/
