Python at Elemental Security
EuroPython - June 29, 2005
Guido van Rossum
Elemental Security, Inc.
guido@elementalsecurity.com
guido@python.org
Elemental Security, Inc.
• Enterprise security software
  • product: Elemental Compliance System (ECS)
  • express, monitor and enforce security policies for any computer connecting to the network (cross-platform)
  • scored 9.3 in recent InfoWorld Test Center
• Startup (no longer in stealth mode!)
  • C round just closed; 11M led by Lehman Brothers
• Using lots of Python (and Java!)
• We're always hiring!
• See http://www.elementalsecurity.com
  • Now a real website :-)
ECS Application Structure
• One Central Server
  • Java, J2EE (Tomcat), some Python, Oracle
  • front-end: rich web UI (JavaScript + XML-RPC)
  • back-end: agent connector (HTTP+SSL)
• Many Agents
  • Python and C
  • run on Windows, Solaris, Linux, ...
  • main components:
    • scheduler
    • server connector
    • policy engine – I'll get back to this later
    • packet filter – nearly the only part written in C
Why Does Elemental Use Python?
A. Because I'm There :-)
B. Python is the best tool for the job
  • small footprint
  • runs everywhere (or almost runs :-)
  • access to platform-specific APIs (e.g. registry)
  • much of what we do is "script-like"
    • gather various configuration information about the host
    • check specific policy rules
      • this is so important we have a custom language for it!
  • application changes frequently
    • we continually learn to understand the problem better
    • quickly refactor code as needed
ElementClass – a Simpler XML API
• Use cases:
  • exchange data with central server
    • policies, reports, etc.
  • persist structured data within agent
    • policies, schedule, etc.
  • tool to manage policy definitions (Tkinter UI)
• XML an obvious choice
• Want better mapping between Python & XML
  • example:
    • XML: <schedule start="1" offset="100" />
    • Py: sch.start + sch.offset  # not int(sch.getattr("start"))
ElementClass – Example Input
    <group name="PSF">
      <employee name="Guido" age="49" />
      <employee name="Tim" age="99" />
      <employee name="Ben" age="17" />
      <employee name="Dan" age="15" />
    </group>
ElementClass – Example Code
    from xmlparse import ElementClass, String, Integer

    class Employee(ElementClass):
        __element__ = "employee"
        __attributes__ = {"name": String, "age": Integer}

    class Group(ElementClass):
        __element__ = "group"
        __attributes__ = {"name": String}
        __children__ = {Employee: "employees[]"}

    group = Group.__parseFile__(filename)
    minors = [e for e in group.employees if e.age < 18]
    group.employees = minors
    f = open(filename, "w")
    group.__render__(f)
    f.close()
ElementClass – Example Output
    <group name="PSF">
      <employee age="17" name="Ben" />
      <employee age="15" name="Dan" />
    </group>
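The input, code and output slides can be approximated with the standard library alone. What follows is a minimal sketch, assuming Python 3 and `xml.etree.ElementTree`. It is not Elemental's `xmlparse` module: `MiniElement`, `from_node`, `parse_string`, `to_node` and `render` are hypothetical names, and `__children__` maps a child class to a plain list attribute name instead of the `"employees[]"` notation.

```python
import xml.etree.ElementTree as ET


class MiniElement:
    """Hypothetical stand-in for ElementClass (not the real xmlparse API)."""

    __element__ = ""      # XML tag name
    __attributes__ = {}   # XML attribute name -> converter (e.g. str, int)
    __children__ = {}     # child class -> name of the list attribute

    @classmethod
    def from_node(cls, node):
        obj = cls()
        # Convert each declared attribute; missing ones become None.
        for name, convert in cls.__attributes__.items():
            raw = node.get(name)
            setattr(obj, name, None if raw is None else convert(raw))
        # Collect subelements into lists, one list per declared child class.
        for child_cls, list_name in cls.__children__.items():
            found = node.findall(child_cls.__element__)
            setattr(obj, list_name, [child_cls.from_node(n) for n in found])
        return obj

    @classmethod
    def parse_string(cls, text):
        return cls.from_node(ET.fromstring(text))

    def to_node(self):
        node = ET.Element(self.__element__)
        # Attributes are written in sorted order; the input order is not kept.
        for name in sorted(self.__attributes__):
            value = getattr(self, name)
            if value is not None:
                node.set(name, str(value))
        for list_name in self.__children__.values():
            for child in getattr(self, list_name):
                node.append(child.to_node())
        return node

    def render(self):
        return ET.tostring(self.to_node(), encoding="unicode")


class Employee(MiniElement):
    __element__ = "employee"
    __attributes__ = {"name": str, "age": int}


class Group(MiniElement):
    __element__ = "group"
    __attributes__ = {"name": str}
    __children__ = {Employee: "employees"}


SOURCE = """<group name="PSF">
  <employee name="Guido" age="49" />
  <employee name="Tim" age="99" />
  <employee name="Ben" age="17" />
  <employee name="Dan" age="15" />
</group>"""

group = Group.parse_string(SOURCE)
group.employees = [e for e in group.employees if e.age < 18]
output = group.render()
```

Because the converters run at parse time, `e.age < 18` compares integers directly, which is the "better mapping between Python & XML" the earlier slide asks for.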
ElementClass – Limitations, Features
• No namespace support
  • attribute names must be Python identifiers
    • (except '-' mapped to '_')
• Can have CDATA or subelements but not both
• Subelement choices for #occurrences:
  • zero or once: Python attribute is None or object
  • any number: Python attribute is a list, may be empty
• Ordering of attributes and subelements is lost
  • except for relative ordering of similar elements
• All attributes and elements are optional
• Optionally, can ignore unrecognized attrs/elements
ElementClass – What's Next?
• Improve the API a bit?
  • use lists of tuples instead of dicts for metadata
    • this allows specifying attribute/subelement ordering
  • decide what to do with Unicode values
    • convert to str if ASCII only, or not?
  • add more attribute data types?
    • currently String, Integer, Boolean, Timestamp
    • add Float; what else? enumerations?
  • add required attributes, subelements? (which API?)
  • tidy up output (fewer line breaks)
• Document it
• Contribute it to the PSF in time for Python 2.5!
  • ESI lawyers to look at PSF Contribution Agreement
Really Hammering The Server
• Server scalability requirement: support 4000 agents
• Available: a few dozen test machines
• How to do server load testing?
• Solution 1: run 50 agents on one test machine
  • test machines overloaded
  • test machines look too similar
  • can't quite reach scalability requirement
• Solution 2: run 500 synthetic agents on one box
  • skips work that doesn't affect what the server sees
  • started out as a private hack, adopted very quickly
  • full potential not yet reached (next: 20K agents!)
  • can easily inject additional test data into server
The Approach
• Share as much code as possible with real agent
  • fortunately, most agent code is in library modules
• N agent objects, K worker threads (K ≤ N)
• 1 scheduler thread
  • real-time event queue managed using heapq module
  • main loop sleeps until next event ready
    • beware: event queue may be updated while sleeping!
  • distributes events to workers via Queue.Queue
• worker main loop:
    while True:
        callable, args = workQueue.get()
        callable(*args)
  • callable is typically a bound method of an agent object
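The scheduler/worker arrangement described on this slide might look roughly like the following. This is a sketch under assumptions, not the load tester's actual code: it uses Python 3 (`queue.Queue` rather than `Queue.Queue`), and the `Agent` class, its `report` method, the `None` shutdown sentinel and the event tuple layout are all hypothetical.

```python
import heapq
import queue
import threading
import time

work_queue = queue.Queue()
results = []
results_lock = threading.Lock()


class Agent:
    """Hypothetical synthetic agent; a real one would talk to the server."""

    def __init__(self, name):
        self.name = name

    def report(self, tick):
        with results_lock:
            results.append((self.name, tick))


def worker():
    # Same shape as the worker main loop on the slide, plus a None
    # sentinel (an addition here) so the thread can shut down.
    while True:
        item = work_queue.get()
        if item is None:
            break
        func, args = item
        func(*args)


def run_scheduler(events):
    # events is a heap of (due_time, sequence, callable, args) tuples;
    # the sequence number breaks ties so callables are never compared.
    while events:
        due, _seq, func, args = events[0]
        delay = due - time.monotonic()
        if delay > 0:
            time.sleep(delay)   # sleep until the next event is ready
            continue            # re-check the top of the heap after waking
        heapq.heappop(events)
        work_queue.put((func, args))


agents = [Agent("agent-%d" % i) for i in range(6)]               # N = 6
workers = [threading.Thread(target=worker) for _ in range(2)]    # K = 2
for t in workers:
    t.start()

start = time.monotonic()
events = []
for seq, agent in enumerate(agents):
    heapq.heappush(events, (start + 0.01 * seq, seq, agent.report, (seq,)))

run_scheduler(events)
for _ in workers:
    work_queue.put(None)   # one sentinel per worker
for t in workers:
    t.join()
```

The heap is filled before the scheduler starts, so this sketch sidesteps the slide's warning about the event queue changing during a sleep. Handling that case would need a lock or condition variable around the heap so that a newly pushed earlier event can wake the scheduler.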
The Outcome
• Works really well despite its simplicity
  • didn't have to use asynchronous I/O
• Randomized synthetic data sent to server
  • example: simulate all agents being "nmapped"
• Probably bounded by number of threads
  • can't have too many agents per thread
• Inexplicable slow memory leak (not M2Crypto!)
A Policy Implementation Language
• ECS is all about policy compliance
  • each host has a policy compliance score: 0-100%
  • composed of individual (Boolean) policy rule scores
  • some (not all) policy rules can also be enforced
• So what's a policy rule? Examples:
  • all passwords must be at least 6 characters
  • ftpd should be disabled
  • all email must go through server X
• Elemental has a library of 1000+ policy rules
  • user selects some and deploys to group of hosts
  • agent gets rule list, executes rules, uploads results
  • repeat on user-selected schedule (30 min – 7 days)
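The slide does not say how the Boolean rule results combine into a 0-100% score. One simple possibility, shown purely as an illustration, is the unweighted share of passing rules. The function name, the empty-input behaviour and the fourth rule below are assumptions, not ECS behaviour.

```python
def compliance_score(rule_results):
    """Return the percentage of passing rules (unweighted; an assumption)."""
    if not rule_results:
        return 100.0  # assumption: no deployed rules counts as fully compliant
    passed = sum(1 for ok in rule_results.values() if ok)
    return 100.0 * passed / len(rule_results)


# Rule names taken from the slide's examples, with made-up outcomes.
rule_results = {
    "passwords at least 6 characters": True,
    "ftpd disabled": False,
    "all email goes through server X": True,
    "hypothetical fourth rule": True,
}
score = compliance_score(rule_results)
```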
How To Implement Policy Rules
• Requirements:
  • Cost to add another rule must be low
  • Some rules are relatively complex programming tasks
  • Rule authors are security experts, not programmers
• Some possibilities:
  • shell scripts (Titan)
  • Perl, Python, etc.
  • XML
  • custom language
Why Write Another Language
• Need a library of policy-checking methods, e.g.:
  • assert that a file has a specific mode, owner, group
  • assert that a registry entry has a specific value
  • parse a configuration file using "name = value" syntax and then check a specific name/value pair
• Ideal: constraint-based (declarative) language
  • execution order doesn't matter
  • compiler can check for conflicts between rules
• Python would be fine if I were writing all the rules
  • still fairly low-level; risk of using the wrong approach
• Compromise: nearly-declarative language
  • resembles Python except where it doesn't
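Two of the policy-checking methods listed above could be written in plain Python along these lines. This is a sketch, assuming Python 3 on a POSIX system; the function names, the comment syntax accepted by the parser and the demo settings are hypothetical and not taken from the ECS library.

```python
import os
import stat
import tempfile


def file_has_mode(path, expected_mode):
    """True if the permission bits of path equal expected_mode."""
    return stat.S_IMODE(os.stat(path).st_mode) == expected_mode


def parse_name_value(text):
    """Parse 'name = value' lines, skipping blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip()
    return settings


def config_has_value(path, name, expected):
    """True if the config file at path sets name to expected."""
    with open(path) as f:
        return parse_name_value(f.read()).get(name) == expected


# Demonstration against a temporary file (hypothetical setting names).
with tempfile.TemporaryDirectory() as tmp:
    conf = os.path.join(tmp, "demo.conf")
    with open(conf, "w") as f:
        f.write("# demo\nPermitRootLogin = no\nPort = 22\n")
    os.chmod(conf, 0o600)
    mode_ok = file_has_mode(conf, 0o600)
    value_ok = config_has_value(conf, "PermitRootLogin", "no")
    wrong_value = config_has_value(conf, "Port", "2222")
```

Even in this small form the author has to choose how to open, parse and compare, which is the "fairly low-level" concern the slide raises about writing rules directly in Python.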
How Fuel Differs From Python
    func has_localhost(host: Host, group: str): bool:
        for ip in host.gethostgroup(group):
            if substr(ip, 0, 4) == "127.":
                return true
        return false
• Declarations required; all code is type-checked
  • interfaces used for library code written in Python
• Single-assignment language with immutable values
  • let var [: type] = expr
• Argument defaults computed dynamically
• Many Python features left out (e.g. slicing!)
• Container types: immutable set and struct
• Fuel is not Turing-complete!
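For comparison, a hand-written Python 3 rendering of the Fuel function above might read as follows. The `Host` class here is a hypothetical stand-in for the interface named on the slide, and returning a `frozenset` from `gethostgroup` is an assumption that mirrors Fuel's immutable set type.

```python
class Host:
    """Hypothetical stand-in for the Host interface used on the slide."""

    def __init__(self, groups):
        self._groups = groups

    def gethostgroup(self, group):
        # Return the addresses in the named group as an immutable set.
        return frozenset(self._groups.get(group, ()))


def has_localhost(host: Host, group: str) -> bool:
    for ip in host.gethostgroup(group):
        if ip[0:4] == "127.":   # Fuel has no slicing, hence substr() there
            return True
    return False


host = Host({"trusted": ["10.0.0.5", "127.0.0.1"], "dmz": ["192.168.1.9"]})
trusted_has_local = has_localhost(host, "trusted")
dmz_has_local = has_localhost(host, "dmz")
```

The visible differences are small: Python's annotations are optional and unchecked at run time, whereas the slide says Fuel requires declarations and type-checks all code.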
Implementing Fuel
• Process grammar with pgen
  • eventually reimplemented pgen in Python
• Use tokenize.py for tokenization
• Implemented pgen parsing automaton
  • as-we-go parse tree reduction
• Use visitor pattern to translate to Python source
• Parse tree node classes have grammar in docstrings
• Run-time library in Python
  • defines some mutable object types
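The visitor-based translation step could be sketched as below. Everything here is illustrative: the node classes, the grammar fragments in their docstrings and the `PythonEmitter` class are hypothetical, and the tree is built by hand instead of by a pgen-style parser.

```python
class Node:
    """Base class for parse tree nodes."""


class Let(Node):
    """let_stmt: 'let' NAME '=' expr"""

    def __init__(self, name, expr):
        self.name, self.expr = name, expr


class BinOp(Node):
    """expr: expr OP expr"""

    def __init__(self, left, op, right):
        self.left, self.op, self.right = left, op, right


class Name(Node):
    """atom: NAME"""

    def __init__(self, ident):
        self.ident = ident


class Number(Node):
    """atom: NUMBER"""

    def __init__(self, value):
        self.value = value


class PythonEmitter:
    """Visitor that turns a tree of the nodes above into Python source."""

    def visit(self, node):
        # Dispatch on the node's class name, e.g. Let -> visit_Let.
        method = getattr(self, "visit_" + type(node).__name__)
        return method(node)

    def visit_Let(self, node):
        return "%s = %s" % (node.name, self.visit(node.expr))

    def visit_BinOp(self, node):
        left, right = self.visit(node.left), self.visit(node.right)
        return "(%s %s %s)" % (left, node.op, right)

    def visit_Name(self, node):
        return node.ident

    def visit_Number(self, node):
        return repr(node.value)


# Tree for the Fuel-like statement:  let total = count + 1
tree = Let("total", BinOp(Name("count"), "+", Number(1)))
source = PythonEmitter().visit(tree)

namespace = {"count": 41}
exec(source, namespace)   # the emitted text is ordinary Python
```

Keeping the grammar rule in each node class's docstring places the syntax next to the code that handles it, which is one plausible reading of the docstring bullet on this slide.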
Challenges in Writing Fuel
• Not enough users yet to know we're doing it right
  • yes, we should open-source it!
• Main challenge is to keep the language expressive without compromising its declarative nature
  • Fuel 2.0 will tweak the design quite a bit
  • host.runscript("userdel", "-r", acct.name)
    • admission of defeat – but sometimes unavoidable
• Source code organization
  • linkage between source & hierarchical menu of rules
  • metadata repeated in source & XML
  • same rule implemented differently per platform
How We Use Fuel
• ~1400 policy rules implemented in Fuel
• Written by about 4 people part-time over 1 year
• Rules cover Solaris, Linux, Windows (2k+), ...
• Rules cover all areas of security:
  • accounts, network, filesystem, system, hardware, software, packet filter, trust, authentication, logging