
The p2pweb Project Low cost Peer to Peer solutions for high availability web hosting

The p2pweb Project: low cost Peer to Peer solutions for high availability web hosting. 19 May 2005, seminar « Peer-To-Peer : Concept, Tools and Applications », Ecole d’ingénieurs de Genève. Agenda: the project goals; web hosting solutions and architecture; the p2pweb solution.



Presentation Transcript


  1. The p2pweb Project: Low cost Peer to Peer solutions for high availability web hosting. 19 May 2005, seminar « Peer-To-Peer : Concept, Tools and Applications », Ecole d’ingénieurs de Genève

  2. Agenda
     • The project goals
     • Web hosting solutions and architecture
     • The p2pweb solution
     • Project constraints and key technologies
     • Related projects
     • The project components
        • Global server load balancing system
        • Distributed set of web servers
        • Monitoring system
     • Node architecture and hardware
     • Conclusion

  3. The Project goals
     To explore and implement low cost solutions for high availability web hosting: “Do More with Less”.
     Our targets are:
     • small or medium structures (associations, NGOs, etc.)
     • with limited resources (money, IT people)
     • with important web hosting needs (bandwidth available)
        • rich and complex web sites
        • medium to high web traffic
        • high availability and visibility needs
     It may fit very well the needs of many projects in Least Developed Countries: TeleCentres Networks, Rural Organisations, Universities, Cultural Centres, Public Libraries, Community Multimedia Centres, Health Networks, etc.

  4. Examples of hosted web sites
     Afromix.org (personal web site)
     A portal of African and Caribbean cultures since 1993. A complex web site using multiple technologies:
     • in-house Perl Content Management System (CMS)
     • an extended discographic database (1,600 artists, more than 50 styles from all of Africa and the French West Indies)
     • multilingual (French, English, Spanish) site running on a Java application server (Tomcat)
     • about 25,000 files, 400,000 pages/month, 2 million hits/month, 60,000 unique visitors/month
     Mediaport.net (community web site)
     One of the first French web pioneers, first developed at INA.
     • mostly static content (nearly 10,000 files)
     • multilingual (French, English) site running on a PHP CMS (ezpublish)
     • it is the main p2pweb test platform and it will evolve into an open web hosting solution for artistic and cultural web projects (an editorial committee is being formed)

  5. The web hosting market
     • Free web hosting
        • Very limited
        • static HTML or small PHP sites (limited computing resources)
        • can’t use your own domain name
     • Professional web hosting
        • A broad range of services
           • private virtual server
           • dedicated server
           • colocation
        • But the price is quite high
           • 100-200 €/month for one dedicated server
        • and maintenance can be complex

  6. Centralized architecture
     Server in one location: the server and its Internet link are single points of failure (SPOF)

  7. Centralized architecture (cont.)
     High availability architecture: datacenter hosting, BGP routing, hardware load balancing, SAN storage.
     (Diagram tiers: Multi-homing with BGP routing; Load Balancers; Reverse Proxy / Cache / SSL accelerators; Load Balancers; Web servers; Application Servers; Database cluster; SAN Storage.)
     • In theory, no SPOF
     • but very complex architecture
     • very high cost

  8. CDN Architecture
     Content Delivery Network: a service delivered by companies like Akamai, Speedera, and others. Edge servers provide caching and data replication for fast delivery to clients worldwide. A solution for very high traffic web sites, and a very expensive one.

  9. Alternative web hosting
     • Community based web hosting
        • Initiatives from various associations: ouvaton.coop, globenet.net, autre.net, altern.net, ...
        • Most of the time, people pool their money and knowledge to buy and administer one or two dedicated servers.
     • Home server
        • We now have sufficient bandwidth (ADSL), computing power (PCs) and good software (apache, linux …)
        • We lack reliability!

  10. First idea : big home server

  11. Second idea (a better one)
     • Lots of people (family, friends, co-workers, …) already have:
        • an ADSL Internet access or a permanent high speed connection
        • one or more PCs (with a lot of unused disk space)
     • So, what about sharing those resources to build a more powerful and resilient network of web servers?

  12. Web Hosting : the p2pweb way
     (Diagram labels: ADSL ISP 1, ADSL ISP 2, ADSL ISP 3.)
     Each member of the p2pweb network shares a portion of his Internet bandwidth (most of the time an ADSL line) and hosts a small server. The result is a powerful network that is the sum of the bandwidth and computing resources of all the members.

  13. A peer to peer solution
     • In a way, it is a return to the fundamental principles of the Internet:
        • a cooperative solution (network of servers)
        • a distributed solution (no central control)
        • a fault tolerant solution (resilience)
     • But with all the power of existing Internet and open source technologies
        • consumer computers and Internet access
        • overlay network and services over the Internet
     • It is a peer to peer solution!

  14. The project constraints
     • Unreliable components
        • Node failure is not an exception, it’s the rule.
        • Internet link failure, power outage, server crash …
     • Automatic operation
        • Murphy’s law: servers will always crash when there is nobody to fix the problem (at night, when you are on vacation …)
     • Pragmatic approach
        • Build from existing components
        • Simple and efficient solutions are preferred

  15. Key technologies
     Mass market products are now available at low cost!
     • ADSL lines
        • 1 Mb/s up / 15 Mb/s down for 30 €/month (free.fr)
     • ADSL router / firewall / ethernet or wifi
        • D-LINK, NetGear, LINKSYS from 75 to 150 €
     • Small servers
        • PC barebones (Asus, Biostar, Shuttle …) from 300 to 500 €
        • Mac mini (Apple) 499 €
     • Open Source Software
        • BSD, Linux, apache, tomcat, etc.

  16. Related projects
     YouServ (IBM) http://www.almaden.ibm.com/cs/people/bayardo/userv/
     • YouServ is software that forms a webserving "grid" by allowing its users to pool their desktop computing resources to create one large, virtual web-space.
     • An intranet project, more oriented towards desktop file sharing.
     • Unfortunately not open source
     Vergenet (Simon Horman) http://www.vergenet.net/
     • Vergenet has servers located in Sydney, Amsterdam, London, Tokyo and Indiana. These servers all run Linux and a variant of Super Sparrow to load balance traffic between them.
     • Super Sparrow enables users to load balance traffic between geographically separated points of presence by finding the site network-wise closest to clients. This is done by accessing BGP routing information (but it requires direct access to a BGP router)

  17. Related projects (cont.)
     Coral (New York University) http://www.coralcdn.org/
     • Coral is a peer-to-peer content distribution network, comprised of a world-wide network of web proxies and name servers
     • Publishing through Coral is as simple as appending a short string to the hostname of objects' URLs; a peer-to-peer DNS layer transparently redirects browsers to participating caching proxies
     • a URL like www.myserver.com/some/path.html becomes www.myserver.com.nyud.net:8090/some/path.html
     • Coral currently runs on top of the planet-lab network (a grid computing research network: http://www.planet-lab.org/)
     Globule (Vrije Universiteit Amsterdam) http://www.globule.org/
     • Globule is a module for the Apache Web server that allows a given server to replicate its documents to other Globule servers. Clients are automatically redirected to one of the available replicas.
     • The project provides both content replication and HTTP or DNS based redirection mechanisms

  18. P2PWeb - Project Components
     • A global server load balancing system, with two main functions:
        • Load balance the traffic across the web servers
        • Provide failover, i.e. only send traffic to web servers that are alive
     • A distributed set of web servers, and a set of tools to:
        • Publish content on the servers
        • Keep all servers in sync (replication mechanism)
     • Monitoring services

  19. Global server load balancing
     • Load balancing
        • achieved using Round Robin DNS
        • a simple system, with well known limits (http://www.tenereillo.com/GSLBPageOfShame.htm)
     • Failover
        • achieved by coupling a monitoring system (NAGIOS) with the DNS
        • DNS entries have a short TTL (time to live)
        • NAGIOS monitors each web server
        • When a server changes state (for example to DOWN), a special handler is called that updates the DNS entry and reloads the DNS
        • The failed server is no longer announced by the DNS
     To have a fully redundant system, we use 3 independent DNS servers (all primary), each running its own NAGIOS instance
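The handler's DNS update step can be illustrated with a minimal Python sketch. It is not the project's actual handler (the slides do not show it): the function name `set_record_state` is hypothetical, and the sketch assumes that each A record sits on its own line and that a failed server is disabled by prefixing its record with `;`, the zone-file comment character.

```python
def set_record_state(zone_text: str, ip: str, up: bool) -> str:
    """Comment out (server down) or restore (server up) the A record for `ip`.

    Assumption: records look like 'www 300 IN A 82.66.103.28', one per line.
    """
    out = []
    for line in zone_text.splitlines():
        body = line.lstrip(";").strip()       # record text without comment marker
        fields = body.split()
        is_target = len(fields) >= 2 and fields[-2:] == ["A", ip]
        if is_target:
            out.append(body if up else ";" + body)
        else:
            out.append(line)                  # leave every other line untouched
    return "\n".join(out) + "\n"


zone = "www 300 IN A 82.66.103.28\nwww 300 IN A 195.101.152.113\n"
print(set_record_state(zone, "195.101.152.113", up=False))
```

After rewriting the zone file, a real handler would still have to reload the name server, as the slide describes; that step is left out here.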

  20. GSLB : Failover illustrated
     Initial DNS entries (all servers are up):
        www 300 IN A 82.66.103.28
        www 300 IN A 195.101.152.113
        www 300 IN A 82.232.203.167
        www 300 IN A 66.35.250.210
     Server 195.101.152.113 fails. In the syslog trace, we can see:
        22:22:46 nagios: SERVICE ALERT: ns1;HTTP-P2PWEB;CRITICAL;SOFT;1;Connection refused by host
        22:23:47 nagios: SERVICE ALERT: ns1;HTTP-P2PWEB;CRITICAL;SOFT;2;Connection refused by host
        22:24:46 nagios: SERVICE ALERT: ns1;HTTP-P2PWEB;CRITICAL;HARD;3;Connection refused by host
     After 3 unsuccessful tries, a notification is sent by email to the admin:
        22:24:46 nagios: SERVICE NOTIFICATION: nagios;ns1;HTTP-P2PWEB;CRITICAL;notify-by-email;Connection refused by host
     The specific handler is called:
        22:24:47 nagios: SERVICE EVENT HANDLER: ns1;HTTP-P2PWEB;CRITICAL;HARD;3;http_p2pweb_handler
     And the DNS is reloaded:
        22:24:47 named[17379]: master/p2pweb.net.zone:1: no TTL specified; using SOA MINTTL instead
     Now we can verify that the DNS entries are (the failed server is commented out):
        www 300 IN A 82.66.103.28
        ;www 300 IN A 195.101.152.113
        www 300 IN A 82.232.203.167
        www 300 IN A 66.35.250.210
     Failover time: 2 or 3 minutes (NAGIOS) + DNS max TTL (here 5 minutes) = less than 10 minutes
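The failover time estimate can be written out as a small calculation. This is a sketch under assumptions read off the trace: checks roughly 60 seconds apart, 3 attempts before the HARD state, and a 300 second TTL. The function name is hypothetical, and resolvers or browsers that cache beyond the TTL are not modeled.

```python
def worst_case_failover_seconds(check_interval_s: int, max_attempts: int, dns_ttl_s: int) -> int:
    """Upper bound on the time before clients stop being sent to a failed server."""
    # Detection: the failure may occur just after a successful check, so it can
    # take up to `max_attempts` full check intervals to reach the HARD state.
    detection = check_interval_s * max_attempts
    # Propagation: a client may have cached the old record just before the update.
    return detection + dns_ttl_s


seconds = worst_case_failover_seconds(check_interval_s=60, max_attempts=3, dns_ttl_s=300)
print(seconds / 60)  # 8.0 minutes, consistent with "less than 10 minutes"
```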

  21. GSLB : next steps
     Improvements:
     • Better service provisioning (a manual process for now)
     • Better support for “long downtime”
        • When a server crashes for a long period of time and then recovers, its content may be outdated
        • We must not announce it again until it has re-synchronized itself
     • Proximity load balancing
        • The goal is to load balance traffic between geographically distributed servers by finding the site network-wise closest to clients.
        • A technology used in the CDN (Content Delivery Network) world
     We can use part of the Globule project, as Globule supports DNS redirection based on an 'AS-path length' policy (used in BGP routing), which tries to redirect clients to a server close to them. This BGP information can be collected through routeviews.org (no direct access to a BGP router needed)
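The "long downtime" rule could be expressed as a simple gate in front of the DNS update. The sketch below is purely illustrative: the slides do not say how freshness would be checked, so the content stamps (for example a timestamp written by the master at each publication and copied to the replicas along with the content) and the function name `may_announce` are assumptions.

```python
def may_announce(http_ok: bool, replica_stamp: int, master_stamp: int) -> bool:
    """Re-announce a recovered server only if it answers AND its content is current.

    Assumption: both stamps are publication timestamps (seconds since the epoch).
    """
    return http_ok and replica_stamp >= master_stamp


# A server that is back up but still carries old content stays out of the DNS.
print(may_announce(True, replica_stamp=1116400000, master_stamp=1116486400))  # False
```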

  22. Web server content management
     (Diagram labels: ADSL ISP 1, ADSL ISP 2, ADSL ISP 3.)
     We have a set of web servers and we need tools to:
     • Publish content on all servers
     • Keep them in sync (content replication)
     Two main replication strategies:
     • primary-backup: one master server from which the replicas are created
     • active replication: when content changes on any replica, that replica propagates the change to all the others

  23. Static content replication
     (Diagram labels: Master; Replica on ADSL ISP 1; Replica on ADSL ISP 2; Replica on ADSL ISP 3.)
     One server plays the master’s role:
     • Content is published first on the master (for example via FTP)
     • Then the content is either pushed to or pulled by the replicas
     The easiest way is to use rsync (rsync.samba.org):
     • Content can be pulled via anonymous rsync from the master
     • Content can be pushed via rsync over ssh (using a private/public key pair for security)

  24. Content replication : rsync
     rsync is a file transfer program for Unix systems. rsync provides a very fast method for bringing remote files into sync. It does this by sending just the differences in the files across the link, without requiring that both sets of files are present at one of the ends of the link beforehand.
     Anonymous rsync server (pull mode)
     • Runs as a standalone daemon or can be launched by inetd
     • Advanced security options (read-only, chroot, IP access list)
     • Use: run from crontab on each mirror
        rsync -a master.mydomain.com::www/ /data/www/
     Rsync over SSH (push mode)
     • Needs ssh access on each mirror
     • And an exchange of ssh cryptographic keys for unattended operation
     • Use: run on demand or from crontab on the master (note the single colon: a double colon would address an rsync daemon module instead of a path reached over ssh)
        rsync -a -e ssh /data/www/ user@mirror.mydomain.com:/data/www/
     Useful options
        --compress        compress file data during the transfer
        --bwlimit=KBPS    limit I/O bandwidth; KBytes per second
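For unattended operation, the two rsync modes can be wrapped in a small script. The following Python sketch only builds the command lines, using the options named on the slide; the helper names, the host names and the 64 KB/s limit are illustrative assumptions.

```python
import subprocess


def _base(bwlimit_kbps=None):
    cmd = ["rsync", "-a", "--compress"]
    if bwlimit_kbps is not None:
        cmd.append(f"--bwlimit={bwlimit_kbps}")   # protect the ADSL uplink
    return cmd


def pull_command(master, module, dest, bwlimit_kbps=None):
    """Pull mode: anonymous rsync daemon on the master (double colon = module)."""
    return _base(bwlimit_kbps) + [f"{master}::{module}/", dest]


def push_command(src, user, mirror, dest, bwlimit_kbps=None):
    """Push mode: rsync over ssh to one mirror (single colon = remote path)."""
    return _base(bwlimit_kbps) + ["-e", "ssh", src, f"{user}@{mirror}:{dest}"]


def run(cmd):
    """Execute the command; raises CalledProcessError if rsync fails."""
    subprocess.run(cmd, check=True)


print(" ".join(pull_command("master.mydomain.com", "www", "/data/www/", bwlimit_kbps=64)))
```

Building the argument list separately from running it keeps the commands easy to log and to check before they are scheduled from crontab.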

  25. Content distribution : Satellite
     For a large number of geographically distributed mirrors, an interesting solution can be datacasting over satellite
     • Technology used by some CDN vendors
        • Skycache, cidera, Skystream.com, panamsat.com
     • Now available at lower cost from worldspace.fr (SatPost Solution)

  26. Use of CMS
     Nowadays most webmasters use CMS (Content Management System) tools for publishing
     • A lot of open source and commercial tools
        • Spip, mambo, typo3, phpnuke, … (php)
        • Bricolage, metadot, slashcode, … (perl)
        • Cofax, opencms, magnolia, jahia, … (java)
        • Plone, cps, zwook, … (python)
     • But none of them has direct support for a distributed architecture
        • Most use a database as a back-end store
        • Distributed database transactions and replication are a hard problem

  27. CMS : a pragmatic solution
     (Diagram labels: webmaster; CMS back office; html export; Master: static html files; Replicas on ADSL ISP 1, ADSL ISP 2, ADSL ISP 3.)
     The webmaster publishes using the CMS as usual
     • The content is exported as static html files
     • Then distributed to the replicas using rsync
     Constraint: the CMS must support export with “static like URLs”, either directly or through URL rewriting
        /article/sport/2005/4/13/football.html (good)
        /article.php?id_category=3&id_article=25 (bad for mirroring)
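To make the "static like URL" constraint concrete, here is a minimal Python sketch of the mapping an export step would need: article metadata in, mirror-friendly path out. The function name and the metadata fields (category, publication date, slug) are assumptions chosen to reproduce the good example on the slide, not the scheme of any particular CMS.

```python
from datetime import date


def static_path(category: str, published: date, slug: str) -> str:
    """Build a path that can be exported as a plain file and mirrored as-is."""
    return (f"/article/{category}/{published.year}/"
            f"{published.month}/{published.day}/{slug}.html")


print(static_path("sport", date(2005, 4, 13), "football"))
# /article/sport/2005/4/13/football.html
```

Because the path carries no query string, every replica can serve the page from a static file without running the CMS.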

  28. CMS : distributed architecture (1)
     (Diagram labels: Mali, Senegal, Burkina Faso, Ivory Coast; ADSL ISP 1, ADSL ISP 2, ADSL ISP 3; XML content exchange.)
     Example: a non-governmental organization is active in 4 countries and wants to provide a global web presence. The same global web design and tools are used on all servers.
     • Local publishing: each local webmaster publishes news about his country using the CMS on the local server
     • Content exchange using web services: each local web server “collects” (pulls) new articles from the other servers using RSS (Really Simple Syndication) web services
     • Global web presence: global content is (re)constructed on each server (from the data of all the others) and served on the Internet
     Such a solution may be built by hacking/customizing an existing CMS
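The "collect and reconstruct" step can be sketched with the Python standard library. This is an illustration under assumptions: each server is taken to expose a plain RSS 2.0 feed whose items carry `title`, `link` and `pubDate`, and the function names are hypothetical. Fetching the feeds over HTTP is left out; the sketch starts from the XML text.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def parse_items(rss_xml: str) -> list:
    """Extract title, link and publication date from one RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        }
        for item in root.iter("item")
    ]


def merge_feeds(feeds: list) -> list:
    """Merge the feeds of all servers: drop duplicate links, newest first."""
    by_link = {}
    for xml_text in feeds:
        for entry in parse_items(xml_text):
            by_link.setdefault(entry["link"], entry)
    return sorted(by_link.values(), key=lambda e: e["published"], reverse=True)
```

Each server would run the same merge over the feeds of its peers plus its own, so that all of them end up serving the same global list of articles.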

  29. CMS : distributed architecture (2)
     CMS + Message-oriented middleware (MOM)
     A MOM is a client/server infrastructure that increases the interoperability, portability and flexibility of an application by allowing the application to be distributed over multiple heterogeneous platforms. Through the use of a queue system, a MOM can provide asynchronous, reliable data exchange.
     MOM is typically asynchronous and peer-to-peer and supports
     • Point to point communication
     • Publish and subscribe communication
     There is a standardized interface in Java: the JMS (Java Message Service) API
     Various open source implementations exist in the Java world:
     • ActiveMQ (activemq.codehaus.org)
     • OpenJMS (openjms.sourceforge.net)
     • Joram (joram.objectweb.org)
     • MantaRay (mantamq.org)
     No CMS uses it yet (as far as I know), but it may be a very good solution

  30. Performance monitoring
     We collaborate with the webperf.org project
     • WebPerf is a system for measuring the response time of specified URLs from multiple locations on the Internet.
     • The project is founded on the premise that there are a lot of other companies who also require such a monitoring service. If the other companies are willing to monitor our URLs, we will monitor theirs (a free co-peering arrangement).
     Perl scripts installed on the local node collect data from the other web sites, then the data are pushed to a central repository for further analysis. A web interface allows members to display various statistics: a view of one’s web site as seen from all over the world.
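The measurement each node performs can be illustrated as follows. WebPerf itself uses Perl scripts that are not shown in the slides; this Python sketch only demonstrates the idea of timing one URL fetch. The function name `measure`, the 10 second timeout and the injectable `fetch` parameter are assumptions made to keep the example self-contained.

```python
import time
import urllib.request


def measure(url, fetch=None, clock=time.perf_counter):
    """Time one fetch of `url`; returns the URL, success flag and elapsed seconds."""
    # Default fetcher downloads the whole page; a custom one can be injected.
    fetch = fetch or (lambda u: urllib.request.urlopen(u, timeout=10).read())
    start = clock()
    try:
        fetch(url)
        ok = True
    except OSError:          # urllib's URLError and HTTPError derive from OSError
        ok = False
    return {"url": url, "ok": ok, "seconds": clock() - start}
```

A node would call `measure` for each peer URL and push the resulting records to the central repository for analysis.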

  31. Webperf.org : sample graph (1)

  32. Webperf.org : sample graph (2)

  33. Webperf.org : sample graph (3)

  34. Node architecture and security
     (Diagram labels: Internet; ADSL or cable modem; Ethernet link; Ethernet router/firewall; optional Wifi access point; P2pweb traffic; private Ethernet LAN; web server.)
     • Security
        • Mandatory
           • Hardware router/firewall with NAT capabilities
           • Internal private network using RFC 1918 IP addresses (192.168.x.y)
        • No incoming traffic from the outside other than what is required
           • Controlled via redirects on the firewall
              • http (port 80)
              • ssh (port 22, optional)

  35. Node hardware (example)
     • Runs on the corner of a desk
     • An ethernet and wifi switch
        • Connects other computers (not shown here)
     • A web and application server
        • Mac mini (Apple) running apache2 and tomcat
     • A firewall
        • Embedded PC (www.pcengines.ch) running pf (packet filter) on OpenBSD from a compact flash card
     • No noise, and low electric power consumption (about 50 W)

  36. Conclusion
     • It can be done (at low cost)
     • It runs, with good results (service uptime measured by siteuptime.com)
        • www.p2pweb.net, hosted by the p2pweb network: monitored since 9/23/2004, outages: 40, total uptime: 99.560%, downtime/year: 38.5 hours
        • www.afromix.org, hosted on a single node: monitored since 9/23/2004, outages: 37, total uptime: 97.634%, downtime/year: 207.3 hours
     • Still a lot of improvements to make
        • Not yet an easy-to-use solution: node administration still requires good Unix knowledge
     • Most important: a new way to design web applications
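The downtime figures follow directly from the uptime percentages, assuming a 365-day year (8,760 hours). A short Python check, with a hypothetical function name:

```python
HOURS_PER_YEAR = 365 * 24  # 8760, assuming a non-leap year


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Convert an uptime percentage into expected hours of downtime per year."""
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR


print(round(downtime_hours_per_year(99.560), 1))  # 38.5  (p2pweb network)
print(round(downtime_hours_per_year(97.634), 1))  # 207.3 (single node)
```

On these figures, the distributed setup cuts yearly downtime by a factor of roughly five compared with the single node.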

  37. The Future
     What we can provide right now
     • P2pweb.net: a global load balancing solution for any distributed web project
        • Just provide the servers' IP addresses and a health check URL
     • Mediaport.net: a community web hosting solution
        • We can host various web projects
     We are looking for partnerships in the following domains:
     • Packaging an easy and ready to use solution for deploying web mirrors (industrializing the solution)
        • dedicated LINUX or BSD distribution with preinstalled packages
        • “all in one” solution: Java CMS + MOM in one web application
     • Helping to deploy such a solution in Least Developed Countries
     The P2PWeb solution is well suited to Least Developed Countries with weak bandwidth and low connectivity.

  38. Contacts
     P2pweb is a SourceForge project (BSD license): www.p2pweb.net or mediaport.sourceforge.net
     Contacts:
     • about the project: fgaillard@w3architect.com
     • if you want to be hosted on mediaport.net: fabrice.gaillard@mediaport.net, pierre.genillon@mediaport.net

  39. Questions Thank you • Questions ?
