Guntis Bārzdiņš Artūrs Lavrenovs Web servers
What web servers do • Implement the HTTP protocol • Listen for HTTP requests from browsers • Try to fulfill them with static content from the file system • Modern web servers also • Forward dynamic content requests to other systems • Perform many other useful tasks through modules
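A minimal sketch of the static-content case, assuming Python's standard `http.server` module stands in for a real web server. The helper name `fetch_from_static_server` is hypothetical; it starts a server on a free local port, requests one path, and shuts the server down again.

```python
import functools
import http.client
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def fetch_from_static_server(docroot, url_path):
    """Serve files from docroot over HTTP and fetch url_path once."""
    # The handler maps URL paths to files below docroot.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=docroot)
    # Port 0 asks the OS for any free port.
    server = ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", url_path)
        response = conn.getresponse()
        body = response.read()
        conn.close()
        return response.status, body
    finally:
        server.shutdown()
        server.server_close()
```

A request for an existing file should come back with status 200 and the file's bytes; a missing file yields 404.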
C10K problem • Dan Kegel, 1999 • Web servers should handle ten thousand clients simultaneously • Operating system kernel limitations • Operating system provided functionality • Web server design flaws
C10K problem solution – OS kernel • The open-source nature of Unix kernels made it possible to quickly identify and fix all C10K bottlenecks • Networking-related algorithms and data structures in Unix kernels were originally implemented with O(n), O(n^2), ... complexity and were fixed to O(1) or O(n) • As a result, the networking capabilities of Unix kernels are virtually limitless (bounded only by hardware resources)
C10K – OS functionality • New scalable I/O event notification mechanisms were implemented (epoll – Linux, kqueue – *BSD) • Better performance than traditional poll/select • Can receive all pending events using one system call • AIO – The POSIX asynchronous I/O (AIO) interface allows applications to initiate one or more I/O operations that are performed asynchronously (i.e., in the background). The application can elect to be notified of completion of the I/O operation in a variety of ways: by delivery of a signal, by instantiation of a thread, or no notification at all.
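The "all pending events in one system call" point can be sketched with Python's `selectors` module, which picks epoll on Linux and kqueue on BSD systems. The function name `collect_ready` and the use of local socket pairs instead of real client connections are assumptions made for illustration.

```python
import selectors
import socket


def collect_ready(n_pairs=3):
    """Register several sockets and read every ready one after a single select()."""
    sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on *BSD
    pairs = [socket.socketpair() for _ in range(n_pairs)]
    for i, (reader, writer) in enumerate(pairs):
        reader.setblocking(False)
        sel.register(reader, selectors.EVENT_READ, data=i)
        writer.sendall(b"ping %d" % i)  # make the reader side readable
    # One call returns the whole list of sockets that have pending data.
    ready = sel.select(timeout=1)
    result = {key.data: key.fileobj.recv(64) for key, _events in ready}
    sel.close()
    for reader, writer in pairs:
        reader.close()
        writer.close()
    return result
```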
C10K – web server design • Non-blocking I/O for networking and disk • Don't block waiting for an action to complete; serve other requests and wait for notifications about I/O completion • Many threads • Use all available CPU cores to achieve maximum concurrency, avoid locking data structures • Each thread serves many requests • Don't create a thread per request; reuse threads, and process other requests while a non-blocking action completes
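As a rough illustration of "each thread serves many requests", the sketch below uses Python's `asyncio` to handle many connections in a single thread. The handler, the 50 ms simulated delay, and the function names are assumptions for the demo, not part of any particular web server.

```python
import asyncio


async def handle(reader, writer):
    """Serve one connection without blocking the thread while waiting."""
    line = await reader.readline()
    await asyncio.sleep(0.05)  # stands in for slow disk or backend I/O
    writer.write(b"echo: " + line)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def run_clients(n):
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    async def client(i):
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(b"%d\n" % i)
        await writer.drain()
        reply = await reader.readline()
        writer.close()
        await writer.wait_closed()
        return reply

    async with server:
        # All clients are in flight at once, served by one thread.
        return await asyncio.gather(*(client(i) for i in range(n)))


def serve_many(n=100):
    return asyncio.run(run_clients(n))
```

Because the waits overlap, the total run time stays close to one simulated delay rather than growing with the number of clients.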
C10M problem – Next decade • 10 million concurrent connections per server • Current Unix kernels can't handle that • Application thread locks in the kernel • Hardware drivers (NIC) • Memory management • Solution: a new generation of high-load Unix kernels • 1 main application per server • Minimize the number of system calls • Minimize kernel work
Dynamic content • Web servers can't create dynamic content themselves • We need an application written in some programming language • We need some way for the web server to communicate with the application • CGI • Apache modules • FastCGI, SCGI, ... • WSGI, PSGI, JSGI, ...
CGI – Common Gateway Interface • Oldest method of getting dynamic content from web servers • For each browser request, the web server defines a set of environment variables derived from the request and the server configuration • The web server starts the application in the prepared environment • Sends POST data (if any) as standard input • Waits for standard output from the executed file and returns it to the browser
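The server side of these steps can be sketched as follows. This is a simplified illustration, not a full CGI implementation: `run_cgi` and `DEMO_SCRIPT` are hypothetical names, only a few variables are set, and Unix line endings are assumed.

```python
import os
import subprocess
import sys

# A tiny stand-in CGI program: prints a header, a blank line, then a body.
DEMO_SCRIPT = (
    "import os, sys\n"
    "sys.stdout.write('Content-Type: text/plain\\n\\n')\n"
    "sys.stdout.write(os.environ['REQUEST_METHOD'] + ' ' + sys.stdin.read())\n"
)


def run_cgi(command, method, query_string="", body=b""):
    """Run a CGI program the way a web server would, for a single request."""
    # 1. Prepare the environment from the request.
    env = dict(
        os.environ,
        GATEWAY_INTERFACE="CGI/1.1",
        REQUEST_METHOD=method,
        QUERY_STRING=query_string,
        CONTENT_LENGTH=str(len(body)),
    )
    # 2. Start the program, 3. feed POST data to stdin, 4. collect stdout.
    completed = subprocess.run(
        command, input=body, env=env, stdout=subprocess.PIPE, check=True
    )
    # Headers and document body are separated by a blank line.
    raw_headers, _, document = completed.stdout.partition(b"\n\n")
    return raw_headers.decode(), document
```

For example, `run_cgi([sys.executable, "-c", DEMO_SCRIPT], "POST", body=b"a=1")` starts a fresh process for that single request, which is exactly the cost CGI pays on every hit.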
CGI application • Can be ANY script or binary file executable in UNIX • No libraries required • Uses request information from environment variables • Or ignores it completely if not needed • Processes standard input if needed • Writes additional HTTP headers and then the generated document body to standard output
CGI environment variables • REQUEST_METHOD: name of HTTP method • PATH_INFO: path suffix, if appended to URL after program name and a slash • PATH_TRANSLATED: corresponding full path as supposed by server, if PATH_INFO is present • SCRIPT_NAME: relative path to the program, like /cgi-bin/script.cgi • QUERY_STRING: the part of URL after ? character. The query string may be composed of name=value pairs separated with ampersands (such as var1=val1&var2=val2...) when used to submit form data transferred via GET method as defined by HTML application/x-www-form-urlencoded • REMOTE_HOST: host name of the client, unset if server did not perform such lookup • REMOTE_ADDR: IP address of the client (dot-decimal) • Variables passed by user agent (HTTP_ACCEPT, HTTP_ACCEPT_LANGUAGE, HTTP_USER_AGENT, HTTP_COOKIE and possibly others) contain values of corresponding HTTP headers • And a few more
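On the application side, reading these variables might look like the sketch below. The function name `describe_request` and the shape of the returned dictionary are illustrative assumptions; `parse_qs` from the standard library decodes the `name=value&...` form described above.

```python
from urllib.parse import parse_qs


def describe_request(environ):
    """Extract the commonly used CGI variables from an environment mapping."""
    return {
        "method": environ.get("REQUEST_METHOD", "GET"),
        "path_info": environ.get("PATH_INFO", ""),
        # parse_qs turns "var1=val1&var2=val2" into {"var1": ["val1"], ...}
        "params": parse_qs(environ.get("QUERY_STRING", "")),
        "user_agent": environ.get("HTTP_USER_AGENT"),
    }
```

In a real CGI program the mapping passed in would be `os.environ`.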
CGI example
#!/bin/bash
echo "Content-type: text/plain"
echo ""
echo "Hello world!"
echo "Today is:" `date`
CGI issues • Each request creates a new process, with big overhead for process creation and destruction • Script files must be interpreted again on each request, another big overhead • Not scalable • Not suitable for modern web servers • Still widely used in embedded systems (e.g. Wi-Fi router web management consoles) that only handle occasional requests
FastCGI • Multiple long-running processes are started • The web server communicates with them over Unix sockets or TCP • Each process serves many requests • Good performance • Complete separation of web server and dynamic content system • Great scalability – spread FastCGI processes across a server farm
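To give a feel for what travels over that socket, here is a sketch of how a FastCGI record carrying request parameters is encoded: an 8-byte header (version, type, request id, content length, padding length, reserved) followed by length-prefixed name/value pairs. The helper names are made up for this example, and it covers only this one record type, not a complete client.

```python
import struct

FCGI_VERSION_1 = 1
FCGI_PARAMS = 4  # record type that carries the CGI-style variables


def encode_length(n):
    """Lengths below 128 take one byte, larger ones four bytes with the high bit set."""
    return bytes([n]) if n < 128 else struct.pack(">I", n | 0x80000000)


def encode_params(params):
    """Encode a dict of variables as FastCGI name-value pairs."""
    out = b""
    for name, value in params.items():
        n, v = name.encode(), value.encode()
        out += encode_length(len(n)) + encode_length(len(v)) + n + v
    return out


def make_record(record_type, request_id, content):
    """Prefix content (at most 65535 bytes) with the 8-byte record header."""
    header = struct.pack(
        ">BBHHBB", FCGI_VERSION_1, record_type, request_id, len(content), 0, 0
    )
    return header + content
```

The request id in the header lets one long-lived process tell requests apart, which is what allows a single worker to serve many requests over its lifetime.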
Other communication methods • Integrate the dynamic content generation system into the web server process (Apache modules) • CGI derivatives (SCGI) • *SGI interfaces define a language-specific way for the web server to communicate with applications (WSGI – Python, PSGI – Perl) • Proxy requests to applications that communicate via HTTP themselves
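For the WSGI case, a minimal application is just a callable that receives the request environment and a `start_response` function. The greeting logic and the `call_app` test helper below are illustrative assumptions.

```python
def application(environ, start_response):
    """A minimal WSGI application: greet whatever the query string names."""
    name = environ.get("QUERY_STRING") or "world"
    body = f"Hello, {name}!\n".encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]


def call_app(app, query_string=""):
    """Invoke a WSGI app directly, the way a server would, and collect the result."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    environ = {"REQUEST_METHOD": "GET", "QUERY_STRING": query_string}
    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

To serve it over HTTP for local experiments, the standard library's `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` is enough.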
LAMP • Linux Apache MySQL PHP • Most common web server stack • Simple to install and configure • Simple to develop web applications • Acceptable performance and security
Apache • One of the oldest web servers • Still actively developed • Most popular web server today and in recorded web server history • Highly configurable and extensible using modules • All in one solution • Runs on many OS, most often on unix servers
PHP • One of the most popular web application programming languages • Easy to learn (which encourages bad coding practices) • Interpreted language • Functions from Unix libraries and tools • Huge number of ready-made applications, libraries and modules
MySQL • Unix distributions are moving towards MariaDB (a MySQL fork) after the acquisition by Oracle • Fast relational DB implementation • Fairly easy to use • Different storage engines (faster without transactions, slower with, memory based, etc.) • Query caching • User quotas
Historical installation • Acquire source files for all required software (Apache MySQL PHP) • Acquire all dependencies and install them • Configure make files via ./configure • Compile everything • Configure each piece of software so it works with the others • Use it
Modern installation • Use OS package manager • root@server# apt-get install libapache2-mod-php5 apache2 php5 mysql-server • Use it
Simple web site example • Create database user, database, table structure and maybe some data • Using MySQL command prompt accessed by • $ mysql -u root -p • > CREATE DATABASE `example` COLLATE 'utf8_general_ci'; • > CREATE TABLE `posts` (...) • > CREATE USER 'example'@'localhost' IDENTIFIED BY PASSWORD '…' • > GRANT ... ON `example`.* TO 'example'@'localhost'; • > INSERT INTO `posts` (`title`, `info`) VALUES ('a', 'a');
Simple web site example II • Or be lazy and use some web interface like phpMyAdmin or Adminer • Download single file adminer.php • Drop it into /var/www • Navigate your browser to http://localhost/adminer.php • Do all the tasks in browser without really knowing SQL
Simple web site example III • Create file example.php in /var/www • Write your HTML and PHP code inside • Connect to database • Select data • Show data • Your simple web site is ready • Navigate your browser to http://localhost/example.php • Enjoy result
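The connect, select, show sequence can be sketched as below. This is a stand-in, not the slide's PHP and MySQL code: it uses Python with an in-memory SQLite database so it runs anywhere, and the column types of `posts` are assumed (only `title` and `info` appear in the INSERT statement earlier).

```python
import html
import sqlite3


def demo_connection():
    """Create a throwaway database holding the example row."""
    conn = sqlite3.connect(":memory:")
    # Column types are an assumption; the original table definition is elided.
    conn.execute("CREATE TABLE posts (title TEXT, info TEXT)")
    conn.execute("INSERT INTO posts (title, info) VALUES ('a', 'a')")
    return conn


def render_posts(conn):
    """Select all posts and show them as an HTML list."""
    rows = conn.execute("SELECT title, info FROM posts ORDER BY rowid").fetchall()
    items = "".join(
        # Escape values so stored text cannot inject markup into the page.
        f"<li><b>{html.escape(title)}</b>: {html.escape(info)}</li>"
        for title, info in rows
    )
    return f"<html><body><ul>{items}</ul></body></html>"
```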
Simple web site example - Result • From http://localhost/example.php
nginx • Contender for 2nd place in web server rankings • Event-driven • High-performance (thousands of req/s) • Small memory footprint per request • Efficient CPU usage • Advanced configuration and functionality via modules • Often used as the FrontEnd for big websites • CloudFlare is built on top of it
High-load web systems • A big dynamic web site can't reside on a single server • Need some strategy to split the load across multiple web servers • One possible strategy • One entry point, the “FrontEnd”, which receives all requests and can handle the load (e.g., Varnish, nginx) • Backends process requests from the FrontEnd (nginx, Apache)
What is Varnish? • A proxy server • Reverse • Caching • Programmable • Load balancer • Dynamic content generator • Tools for logging, debugging, monitoring
Why Varnish? • Fantastic performance even on low-end servers: 1,000 to 10,000 requests per second per server is the norm • Written in C by GOOD C programmers • Takes advantage of the Unix architecture • After tuning, tens of thousands of requests per second; more than 100k/s reached in testing • Free open-source software • VCL, a request-oriented domain-specific configuration/programming language • Almost everything a high-load website needs, in one package
Caching • Generating any dynamic web page is very slow: depending on the environment, hundreds or thousands of times slower than returning static content • A low-end server can generate a couple of hundred such dynamic pages per second • Any development framework makes dynamic page generation another tens or hundreds of times slower (especially Java EE, Zend Framework) • That leaves only a few tens of requests per second • Rough math: 100 x 100 = 10,000 times slower than a static page
Caching II • The idea: ideally, return dynamic content with performance close to that of static pages • We can store the pages that are the same for every user and do not change significantly within a given time period • Hard disks are slow; good practice is to use only RAM or server-grade SSDs to hold all cached content • A caching strategy has to be designed for each specific case, and it can be very subjective
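The idea of keeping a generated page in RAM for a fixed time can be sketched as a small time-to-live cache. The class name `TTLCache` and its interface are assumptions for illustration; real caches such as Varnish do far more (eviction, headers, variants).

```python
import time


class TTLCache:
    """Keep generated pages in memory and reuse them until they expire."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # key -> (expiry time, cached value)

    def get_or_generate(self, key, generate):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: skip the expensive generation
        value = generate()   # cache miss or expired: build the page again
        self._entries[key] = (now + self._ttl, value)
        return value
```

With a 60 second TTL, a page requested 1,000 times a minute is generated once and served from memory the other 999 times.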
Varnish caching • Based on the request URL (a full URL or a regular expression) you can decide which requests to cache, how long to cache a particular element, or not to cache it at all: the standard caching approach practically everywhere • Users: Facebook, Twitter, WikiLeaks, ThePirateBay • Developed in Norway • Advertised as speeding up page delivery by 300 to several thousand times, i.e. ending up only about 10 times slower than static content • Fast compared to other caching approaches
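Deciding by URL what to cache and for how long can be expressed as an ordered rule table. The patterns and durations below are invented examples, and `cache_ttl` is a hypothetical helper; in Varnish itself this logic would be written in VCL.

```python
import re

# (pattern, seconds to cache); 0 means "do not cache". First match wins.
# These rules are made-up examples, not recommendations.
RULES = [
    (re.compile(r"^/admin"), 0),
    (re.compile(r"\.(js|css|png)$"), 3600),
    (re.compile(r"^/$"), 60),
]


def cache_ttl(url, rules=RULES, default=0):
    """Return how many seconds a response for url may be cached."""
    for pattern, ttl in rules:
        if pattern.search(url):
            return ttl
    return default
```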
The VCL DSL • Simple syntax (similar to C), which is translated to C and then compiled to machine code • =, ==, !=, ~, !~, !, &&, ||, +, "string" • if () {} else {}, set, unset, return • 9 subroutines, each a different stage of request processing where you can influence the outcome • Only predefined objects: client, server, req, bereq, beresp, obj, resp
sub vcl_recv {
    if (req.request == "GET" && req.url ~ "\.js$") {
        return (lookup);
    }
}
Integration • A fixed caching time may not be optimal • Content may change more often than the configured time: users get stale information • Or less often: servers do unnecessary work • Solution: notify the server that the content must be refreshed
acl purge {
    "192.168.0.0"/24;
}
sub vcl_recv {
    if (req.request == "PURGE") {
        if (!client.ip ~ purge) {
            error 405 "Not allowed.";
        }
        return (lookup);
    }
}
sub vcl_hit {
    if (req.request == "PURGE") {
        purge;
        error 200 "Purged.";
    }
}
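The same purge logic, restated as a Python sketch over a plain dictionary cache: only clients inside the allowed network may purge, and a purge removes the cached entry. The function name is hypothetical, and the 404 answer for a URL that is not cached is an assumption, since the VCL above only handles the hit case.

```python
import ipaddress

# Same network as the "purge" ACL in the VCL above.
PURGE_ACL = ipaddress.ip_network("192.168.0.0/24")


def handle_purge(cache, client_ip, url):
    """Remove url from cache if the client is allowed to request that."""
    if ipaddress.ip_address(client_ip) not in PURGE_ACL:
        return 405, "Not allowed."
    if url in cache:
        del cache[url]
        return 200, "Purged."
    # Assumption: report a miss explicitly; the original VCL does not cover it.
    return 404, "Not in cache."
```

The application that edits content would send such a purge right after saving, so users see the update without waiting for the TTL to run out.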
Dynamic content generation with ESI • Web pages often consist of blocks that change at different rates • Or there is a small block of information specific to each user (for example, "Hello, Jānis Bērziņš | You have [0] new messages") • We can load it after the page has loaded using JSON, or generate the content on Varnish
<TABLE><TR><esi:include src="sveiks.html"/></TR>
<TR><TD><esi:include src="index.html"/></TD>
<TD><esi:include src="article.html"/></TD></TR>
</TABLE>
• Varnish parses the <esi> tags and assembles the elements; every element is configured and cached independently
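A rough sketch of what assembling a page from ESI fragments involves: find each `<esi:include>` tag and substitute the fragment it names. The regular expression handles only the simple self-closing form shown above, and `fetch_fragment` is a hypothetical callback that would look the fragment up in the cache or ask a backend.

```python
import re

# Matches only the simple form <esi:include src="..."/> used in the example.
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def assemble(page, fetch_fragment):
    """Replace every ESI include tag in page with the fragment it refers to."""
    return ESI_INCLUDE.sub(lambda match: fetch_fragment(match.group(1)), page)
```

Because each fragment is fetched separately, the per-user greeting can bypass the cache while the article body stays cached for minutes.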
Load balancing • A single URL can be served by several backends • Different URLs can be served by different backends • Monitoring • Taking dead servers out of rotation (restart, upgrade, repair) • Putting revived servers back into rotation (new ones too) • In practice this means a pile of CHEAP desktop-grade hardware can be used to generate dynamic content • Adding one more frontend gives high yet cheap fault tolerance • If we use NoSQL or otherwise obtain a replicated database, expensive servers are not needed at all
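Round-robin selection with health tracking, one simple way to implement the behaviour listed above, might be sketched like this. The class and method names are assumptions; a real balancer would also probe backends itself rather than wait to be told.

```python
class RoundRobinBalancer:
    """Hand out backends in turn, skipping those currently marked as down."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._healthy = set(self._backends)
        self._next = 0

    def mark_down(self, backend):
        """Take a dead server out of rotation."""
        self._healthy.discard(backend)

    def mark_up(self, backend):
        """Put a revived (or brand new) server into rotation."""
        if backend not in self._backends:
            self._backends.append(backend)
        self._healthy.add(backend)

    def pick(self):
        # Try each backend at most once per call.
        for _ in range(len(self._backends)):
            backend = self._backends[self._next % len(self._backends)]
            self._next += 1
            if backend in self._healthy:
                return backend
        raise RuntimeError("no healthy backends")
```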
Varnish usage in Latvia
$ curl -I www.delfi.lv
HTTP/1.1 200 OK
X-Fe-Node: nuffy
Content-type: text/html; charset=utf-8
Server: lighttpd/1.4.31 (PLD Linux)
Content-Length: 159097
Date: Wed, 07 Nov 2012 20:20:58 GMT
X-Varnish: 734492112 734450241
Age: 58
Via: 1.1 varnish
Connection: keep-alive
$ curl -I www.tvnet.lv
HTTP/1.1 200 OK
Server: Apache
Last-Modified: Wed, 07 Nov 2012 20:09:08 GMT
Expires: Wed, 07 Nov 2012 20:10:08 GMT
Cache-Control: max-age=60
Vary: Accept-Encoding
Content-Type: text/html; charset=UTF-8
Content-Length: 185924
Date: Wed, 07 Nov 2012 20:10:15 GMT
X-Varnish: 2025605055 2025545136
Age: 67
Via: 1.1 varnish
Connection: keep-alive
Non-standard uses – WAF • Programmability enables non-standard uses such as a WAF (web application firewall) • Define the URLs and methods of accepted requests as precisely as possible • req.url ~ "^/topic/([0-9])$" rather than "^/topic" • req.request == "GET" • At the end, use return(error); • Restrict access to the backend servers (or disconnect them from the internet) • Attackers now attack the frontend, so protect it • Does not help against logic (and many other) vulnerabilities
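The allowlist idea can be sketched as follows: every request must match an explicitly permitted method and URL pattern, and everything else is rejected. The rule list reuses the pattern from the slide as written (a single digit after `/topic/`); the function name is hypothetical.

```python
import re

# Explicitly allowed (method, URL pattern) pairs; anything else is refused.
ALLOWED = [
    ("GET", re.compile(r"^/topic/([0-9])$")),
]


def is_allowed(method, url):
    """Return True only for requests that match an allowlist rule."""
    return any(
        method == allowed_method and pattern.search(url) is not None
        for allowed_method, pattern in ALLOWED
    )
```

The precise pattern rejects `/topic/7/../admin`, which a loose `^/topic` prefix rule would have let through.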
New trend • The web application is the central thing • Develop the application in some framework • No separate web server; it is now just a part of the application (a library from the framework in use) • Extremely customizable
Current situation • The standard web development solution is an HTTP server plus a classic dynamic content generation system (PHP, ASP, Python, etc.), which has problems: • Long-running requests and persistent connections • Number of clients that can be served simultaneously • Compatibility with other technologies • Room for future development
Event-driven programming frameworks • Neither the idea nor the implementations are new (Python Twisted, Perl Object Environment, Ruby EventMachine, Node.js) • Not widely used in web solutions • Solve the problems of the standard technologies • Reactor pattern, C10K problem • Let web programmers build network solutions
Node.js • A set of libraries for building network solutions in the JavaScript programming language; runs on the V8 engine • JavaScript engine performance assessment • New related technologies – Socket.IO, CoffeeScript • Problematic aspects – package management, application hosting