Have Your Cake and Eat It Too: Cascading Disclosure Control Language. GJXDM Users’ Conference September 6-8, 2006 San Diego, California. Agenda. Background CDCL Overview Rules Authoring Disclosure Concern Abstraction Use With GJXDM & NIEM Questions .
Have Your Cake and Eat It Too: Cascading Disclosure Control Language GJXDM Users’ Conference September 6-8, 2006 San Diego, California
Agenda • Background • CDCL Overview • Rules Authoring • Disclosure Concern Abstraction • Use With GJXDM & NIEM • Questions
Data From A Disclosure Point of View • Most document instances can be represented hierarchically • Regardless of actual IEPD, XML Document, etc. • Simplifies thinking about the application of disclosure control [Diagram labels: A Root XML document; Equivalent to a complex XML data type; Equivalent to a simple XML data type]
Data To XML From A Disclosure Point of View
<MyRootIepdElement j:id="1">
  <Person j:id="2">
    <PersonName>
      <PersonGivenName>Adam</PersonGivenName>
      <PersonSurName>Brooks</PersonSurName>
    </PersonName>
    <PersonBirthDate>1960-10-07</PersonBirthDate>
  </Person>
</MyRootIepdElement>
Traditional Approaches • Don’t Share Sensitive Data • Safe, but… Anyone who still isn't convinced that we really should share data is probably at the wrong conference. • Write Restrictions Into Each Database, Application & Exchanges • Works fine at first. • The costs start mounting when the rules change. • The risks to production systems increase whenever code must be modified.
The CDCL Approach • Process each data node individually. • Define rules that match data nodes to recipient users, and specify what kind of disclosure will be permitted. • Separate the rules from the application code. • Define a predictable processing model. • Make the rules easy to sight-read and author. • Accommodate distributed authorship based on "Custodial" roles.
Hasn’t this been done before? • No. • True, effective Permissions & Rights management languages have emerged • XACML/XRML/etc. • CDCL is complementary, not competing, technology • Addresses a different problem space.
XACML/XRML | CDCL
Comprehensive solution for Resource Access | Disclosure Control only
Made for machines | Made for humans
Requires Programmer time to develop | Business Users can sight-read and author
Based on "Rights/Access" paradigm | Supports multiaxial semantics
Implementation-neutral | Specific “Gatepoint” implementation model
XML-specific | Compatible with XML, RDF, RDBMS, LDAP, etc., and with W3C Semantic Web stack
The Basis of CDCL: Data Custodianship • A Custodian is anybody who writes CDCL Rules that someone will pay attention to. • There are two kinds of Custodian: • Primary Custodians can write rules that: • Authorize disclosure of data. • Restrict disclosure of data. • Stakeholder Custodians can only • Restrict data disclosure • Never authorize it. • Primary Custodians are usually identified with the entity that “owns” the data.
The CDCL Project • Ad Hoc: make it happen. Cocktail Napkins instead of White Papers • Public specifications, open standards, cross-platform • Open source reference implementations: parsers, editors and transformers • All content to be licensed under vendor-friendly Open Source licenses • Open to contributions from all interested parties • W3C Semantic Web development/compatibility path • Agile techniques • Community forum/publications at http://wijiscommons.org/cdcl/
Using The Gatepoint • The Gatepoint can be deployed in two ways: • As a service component in your Enterprise architecture • Or, because it’s a platform-independent specification of behavior, you can build it into your applications, using any language and platform you choose • Roll your own or use an existing implementation [Diagram: Application, Gatepoint]
The CDCL Gatepoint • The primary CDCL Processing component is called a Gatepoint. • The Gatepoint provides a single operational service. • The service accepts: • A document (i.e., a structure of information nodes), and • Information about the intended recipient. • The service returns either: • The document, unaltered; • The document with some content redacted; or • A distinguished value (such as an empty document) signifying that nothing in the document may be released to that particular recipient.
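The single Gatepoint service described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `gatepoint`, the `may_release` callback (a stand-in for real rule evaluation), the dictionary used as recipient information, and the choice to drop a withheld node together with its children are all hypothetical and not part of any CDCL specification. `None` plays the role of the distinguished "nothing may be released" value.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional

# Hypothetical stand-in for rule evaluation: may this node go to this recipient?
Decision = Callable[[ET.Element, dict], bool]


def gatepoint(node: ET.Element, recipient: dict,
              may_release: Decision) -> Optional[ET.Element]:
    """Return a releasable copy of the document, or None if nothing may be released."""
    if not may_release(node, recipient):
        return None  # assumption: a withheld node takes its children with it
    out = ET.Element(node.tag, dict(node.attrib))
    out.text, out.tail = node.text, node.tail
    for child in node:
        kept = gatepoint(child, recipient, may_release)
        if kept is not None:
            out.append(kept)
    return out


# Sample document reusing the element names from the earlier slide.
doc = ET.fromstring(
    "<Person><PersonName><PersonGivenName>Adam</PersonGivenName>"
    "<PersonSurName>Brooks</PersonSurName></PersonName>"
    "<PersonBirthDate>1960-10-07</PersonBirthDate></Person>"
)


def hide_birth_date(node: ET.Element, recipient: dict) -> bool:
    # Hypothetical policy: only recipients with the "courts" role see birth dates.
    return node.tag != "PersonBirthDate" or recipient.get("role") == "courts"


public_view = gatepoint(doc, {"role": "public"}, hide_birth_date)
courts_view = gatepoint(doc, {"role": "courts"}, hide_birth_date)
```

Here `public_view` is the redacted case, `courts_view` is the unaltered case, and a policy that refuses the root node yields `None`. The input document is never modified; the output is assembled as a fresh tree.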
How Does CDCL Work? • The Gatepoint consults CDCL Rules assembled, from various sources, into the Rulesheet Deck. [Diagram: Gatepoint, Rulesheet Deck]
How Does CDCL Work? • At runtime, the Gatepoint is aware of: • The present document, which is a structured, well-understood collection of individual information items, or datanodes; • The recipient user context, i.e. information about the authenticated User to whom the present document is to be disclosed. [Diagram: User Context, Present Document, Gatepoint, Rulesheet Deck]
How Does CDCL Work? • The Gatepoint creates an output document which is to be provided to the recipient user. • One by one, the nodes in the present document are evaluated… • To see whether they can be released to this recipient user. [Diagram: User Context, Present Document, Gatepoint, Rulesheet Deck, Output Document]
How Does CDCL Work? • In this manner, the output document is assembled on the fly; it may include all, some, or none of the present document’s content. • It is guaranteed to be compliant with all of the rules in the Rulesheet Deck, and can be safely released to the recipient user. [Diagram: User Context, Present Document, Gatepoint, Rulesheet Deck, Output Document]
How Does the Gatepoint Evaluate a Node? • It “deals” itself a hand from the deck: • For the present item (i.e., the node being evaluated) • Selects only those rules which are applicable: • Checks whether the present item matches the rule’s nodeset specification. • Checks whether the recipient user’s user context matches the rule’s userset specification. • If the answer to both of those is “yes”, the rule is added to the Hand.
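Dealing a hand amounts to filtering the deck by two tests per rule. The sketch below assumes a rule can be modeled as two predicates plus an outcome; the `Rule` class, the path-string representation of the present item, the `org` key in the user context, and the rule ids are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Rule:
    rule_id: str
    nodeset: Callable[[str], bool]   # does the present item match?
    userset: Callable[[dict], bool]  # does the recipient user context match?
    outcome: str                     # e.g. "disclose" or "withhold"


def deal_hand(deck: List[Rule], present_item: str, user_context: dict) -> List[Rule]:
    """Keep only the rules whose nodeset AND userset both match."""
    return [r for r in deck
            if r.nodeset(present_item) and r.userset(user_context)]


# Hypothetical deck: the present item is modeled as a simple path string.
deck = [
    Rule("D1",
         nodeset=lambda item: item.startswith("/Prisoner/Sentence"),
         userset=lambda user: user.get("org") in {"corrections", "courts"},
         outcome="disclose"),
    Rule("W1",
         nodeset=lambda item: item.endswith("/MedicalHistory"),
         userset=lambda user: True,
         outcome="withhold"),
]

hand = deal_hand(deck, "/Prisoner/Sentence/Length", {"org": "courts"})
```

With this deck, a courts user asking for a sentence item is dealt only rule D1, while a recipient outside both organizations is dealt an empty hand for the same item.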
How Does the Gatepoint Evaluate a Node? • Resolves any conflicts between rules • Probably, the rules in the hand will specify some different outcomes. Those outcomes are then resolved by the Cascade. • Cascade • Not a sequence of waterfalls • Not a popular brand of dish soap. • Cascade is cribbed from the W3C’s “Cascading Style Sheets” activity. It means: a reliable, deterministic resolution of conflicting directives.
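The slides do not spell out the actual Cascade precedence, so the following is only one plausible way to make conflict resolution deterministic. It assumes that the most restrictive outcome wins, that a "disclose" written by a Stakeholder Custodian is ignored (since stakeholders can only restrict), and that an empty hand means "withhold". The outcome names and the ranking table are hypothetical.

```python
from typing import Iterable, Tuple

# Assumed ranking, higher is more restrictive. Illustrative only: the real
# CDCL Cascade ordering is not given in this presentation.
RESTRICTIVENESS = {"disclose": 0, "withhold": 1, "deny-knowledge": 2}


def cascade(hand: Iterable[Tuple[str, str]]) -> str:
    """Resolve (custodian_kind, outcome) pairs into a single outcome.

    custodian_kind is "primary" or "stakeholder".
    """
    effective = [outcome for kind, outcome in hand
                 if not (kind == "stakeholder" and outcome == "disclose")]
    if not effective:
        return "withhold"  # assumption: nothing authorizes disclosure
    return max(effective, key=RESTRICTIVENESS.__getitem__)


resolved = cascade([("primary", "disclose"), ("stakeholder", "withhold")])
```

Because the result depends only on the set of outcomes in the hand and not on their order, the same hand always resolves the same way, which is the property the Cascade is meant to provide.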
The Structure of a CDCL Rule • A CDCL Rule consists of three parts: • A Userset specification • A Nodeset specification • An Outcome • In general, you can think of it as a simple imperative statement: "If these users want to see these data items, I want this outcome to happen." • Outcomes are things like: • Disclose the info • Withhold/Redact • Deny knowledge of the info
Avoiding the “Lost In Translation” Effect • Policymakers should be able to read & write the rules • Should minimize programmer time spent understanding and implementing rules • Want to be able to react to rule changes quickly • Third parties (the average Joe) should be able to review and understand the rules One of the hardest problems for non-programmers is Boolean logic
Booliette Notation • Userset and Nodeset specifications are written in a special notation called “Booliette”. • Booliette expresses Boolean logic as nested bullet-point lists. • What’s a bullet-point list? Ha Ha just kidding. • Example:
* exactly-one-true:
  * my job is awesome
  * all-must-be-true:
    * my job is adequate
    * my job sends me to San Diego!
    * my job pays the big bucks
  * I am buffing The Resume. Ya hey!
Note on the example: [symbol] is here for humor. Higher order ANSI characters are not part of the permitted Booliette character set.
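To show how a nested bullet list can carry Boolean logic, here is a small evaluator sketch. It assumes that nesting is expressed by indentation, that a bullet ending in a colon is an operator, and that every other bullet is an assertion looked up in a table of known truth values. The indentation convention, the function names, and the `facts` table are assumptions made for this example; no formal Booliette grammar is given in the slides.

```python
from typing import Dict, List, Tuple

OPERATORS = {
    "exactly-one-true": lambda values: sum(values) == 1,
    "all-must-be-true": all,
    "at-least-one-of-these-true": any,
}


def parse(text: str) -> List[Tuple[int, str]]:
    """Turn bullet lines into (indent, label) pairs, skipping blank lines."""
    items = []
    for line in text.splitlines():
        if line.strip():
            indent = len(line) - len(line.lstrip())
            items.append((indent, line.strip().lstrip("*").strip()))
    return items


def evaluate(items: List[Tuple[int, str]], facts: Dict[str, bool],
             pos: int = 0) -> Tuple[bool, int]:
    """Evaluate the bullet at items[pos]; return (value, next position)."""
    indent, label = items[pos]
    pos += 1
    if not label.endswith(":"):
        return facts[label], pos          # plain assertion
    values = []
    while pos < len(items) and items[pos][0] > indent:
        value, pos = evaluate(items, facts, pos)  # consumes the child's subtree
        values.append(value)
    return OPERATORS[label[:-1]](values), pos


EXAMPLE = """\
* exactly-one-true:
  * my job is awesome
  * all-must-be-true:
    * my job is adequate
    * my job sends me to San Diego!
    * my job pays the big bucks
  * I am buffing The Resume. Ya hey!
"""

facts = {
    "my job is awesome": False,
    "my job is adequate": True,
    "my job sends me to San Diego!": True,
    "my job pays the big bucks": True,
    "I am buffing The Resume. Ya hey!": False,
}

result, _ = evaluate(parse(EXAMPLE), facts)
```

With these facts the inner `all-must-be-true` group holds and the other two alternatives do not, so exactly one branch is true and the whole expression evaluates to true. Flipping the last fact to `True` makes two branches true and the expression false.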
Authoring a CDCL Rulesheet • step 0: Determine policy and write it down. • Authorized Policymaker • step 1: Rephrase policy statements as empty CDCL Rules. • Business Analyst • step 2: Fill Rules with an outcome specification, & logically exact implementation assertions. • Business Analyst • step 3: Write technical statements that test assertions at runtime. • Business Analyst/Developer • The following examples of this are from a lengthier demonstration at wijiscommons.org/
Step 0: Write Policy • rule D: Disclose all sentence information either to members of all Wisconsin Corrections roles or to members of all Wisconsin Courts roles or, as long as the prisoner's entry date is more than 30 days ago, to anyone.
Step 1: Rewrite as Empty CDCL Rule
# Disclose all sentence information either to members of
# all Wisconsin Corrections roles or to members of
# all Wisconsin Courts roles
rule id = {D1}
Step 2: Fill in outcome and assertions
# Disclose all sentence information either to members of
# all Wisconsin Corrections roles or to members of
# all Wisconsin Courts roles
rule id = {D1}
apply-outcome:{disclose}
for-items:
  * plain [sentence info]
for-any-user-like-this:
  * at-least-one-of-these-true:
    * plain [Wisconsin Corrections user]
    * plain [Wisconsin Courts user]
Step 3: Apply Technical Content
# Disclose all sentence information either to members of
# all Wisconsin Corrections roles or to members of
# all Wisconsin Courts roles
rule id = {D1}
apply-outcome:{disclose}
for-items:
  * plain [sentence info]
    * presentitem-or-parent described-by xpath [//Prisoner/Sentence]
for-any-user-like-this:
  * at-least-one-of-these-true:
    * plain [Wisconsin Corrections user]
      * recipientuser in ldap [dir.wi-doc.com/o=doc.wi.us?memberOf(ou=correctionalroles)]
    * plain [Wisconsin Courts user]
      * recipientuser in ldap [dir.wicourts.gov/o=wicourts.gov]
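A technical statement such as `presentitem-or-parent described-by xpath [...]` has to be tested against the present item at runtime. The sketch below shows one way that test might look with Python's standard library. It reads "or-parent" as "any ancestor", which is an assumption, and it writes the path as `.//Prisoner/Sentence` because `xml.etree.ElementTree` supports only a subset of XPath. The sample document, including the `Record`, `Name`, and `Length` elements, is invented for illustration. The LDAP userset test is not sketched here.

```python
import xml.etree.ElementTree as ET


def item_or_parent_matches(root: ET.Element, present_item: ET.Element,
                           path: str) -> bool:
    """True if the present item, or any of its ancestors, is selected by path."""
    matched = set(root.findall(path))
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = present_item
    while node is not None:
        if node in matched:
            return True
        node = parents.get(node)
    return False


# Hypothetical sample document.
root = ET.fromstring(
    "<Record><Prisoner><Name>Adam Brooks</Name>"
    "<Sentence><Length>P5Y</Length></Sentence></Prisoner></Record>"
)
PATH = ".//Prisoner/Sentence"

length_is_sentence_info = item_or_parent_matches(
    root, root.find("Prisoner/Sentence/Length"), PATH)
name_is_sentence_info = item_or_parent_matches(
    root, root.find("Prisoner/Name"), PATH)
```

The `Length` element counts as sentence information because its parent matches the path, while `Name` does not, so only the former would satisfy the `for-items` part of rule D1.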
Bareknuckle Rules • When you write your Rule assertions about specific, even idiosyncratic features of Data Nodes, Documents, and recipient User Contexts, it's called Bareknuckle authoring. • Very powerful • Sometimes necessary • In general not a good thing.
Bareknuckle Illustration • As a general rule, a Bareknuckle assertion references specific characteristics of a specific target document type • e.g. an XPath expression statement like "value-of(/../@juvenile) = 'true'" • Tight coupling between rules & data items: changes to the data structure force the rules to change
Helping Authors Avoid Bareknuckle Rules • Disclosure Concerns Abstraction Layer (DiCAL) • Reduce the quantity of rule mappings • Decouple the rules from specific data • Metadata Extensions • Improve the quality of the mappings • By Such Metrics As • Clarity • Generality • Economy • Stability
What is a DiCAL? • A Disclosure Concern is a general concept, or abstraction, of the kind of thing you're thinking about when you write a Rule. Examples: • "Juvenile" • "Personally Identifiable Information" • "Open Investigation" • The DiCAL is the combination of two things: • A shared abstract definition of the Concern publicly posted, identified by a URI. • A Bareknuckle-type mapping between the identified Concern and the characteristics of a particular Document type, canonized and published. • Any Rule that uses the Bareknuckle approach to address the mapped Document type can substitute a simple reference to the Disclosure Concern URI. • At runtime, the Gatepoint dereferences the URI to the appropriate mapping for the Present Document, and applies that.
DiCAL Abstraction Logic Example: Step 0 • Hypothetical rule like "Disclose all Inmate Medical Info to Corrections Medical Staff" • Could reasonably lead to finished bareknuckle rule below:
# Disclose all prisoner medical information to any
# Corrections medical staff
rule id = {H}
apply-outcome:{disclose}
for-items:
  * any-of:
    * plain [corrections drug dispensary info]
      * presentitem-or-parent described-by xpath [//Prison/DrugDispensary]
    * plain [subject medical history]
      * presentitem-or-parent described-by xpath [//Subject/MedicalHistory]
for-any-user-like-this:
  * at-least-one-of-these-true:
    * plain [Wisconsin Corrections Medical user]
      * recipientuser in ldap [dir.wi-doc.com/o=doc.wi.us?memberOf(cn=correctionsmedical)]
DiCAL Abstraction Logic Example: Step 1 • Need to define a useful abstraction. "Medical Info"? "Inmate Medical Info"? "Corrections Sensitive Data"? • For this example, "Inmate Medical Info". • Assign URI "http://wi-doc.com/dical/concern/InmateMedicalInfo/" • Place mapping into an official mapfile published at a common location:
<mapping>
  <concern>http://wi-doc.com/dical/concern/InmateMedicalInfo/</concern>
  <cdcl>
    * any-of:
      * plain [corrections drug dispensary info]
        * presentitem-or-parent described-by xpath [//Prison/DrugDispensary]
      * plain [subject medical history]
        * presentitem-or-parent described-by xpath [//Subject/MedicalHistory]
  </cdcl>
</mapping>
DiCAL Abstraction Logic Example: Step 2 • Rule from Step 0 can be rewritten to a simpler & more robust form • Instead of the nodeset specification:
for-items:
  * any-of:
    * plain [corrections drug dispensary info]
      * presentitem-or-parent described-by xpath [//Prison/DrugDispensary]
    * plain [subject medical history]
      * presentitem-or-parent described-by xpath [//Subject/MedicalHistory]
Can write
for-items:
  * dical [http://wi-doc.com/dical/concern/InmateMedicalInfo/]
And achieve same result
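The substitution in this step can be pictured as a lookup followed by a textual expansion. The sketch below loads a mapfile and replaces each `dical [URI]` bullet with the mapped Booliette lines. The `<mappings>` wrapper element, the function names, and the idea of expanding at the text level are assumptions for illustration; a real Gatepoint would also pick the mapping that fits the present document's type, which this sketch leaves out.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical mapfile: the <mappings> wrapper is assumed, the entry follows Step 1.
MAPFILE = """\
<mappings>
  <mapping>
    <concern>http://wi-doc.com/dical/concern/InmateMedicalInfo/</concern>
    <cdcl>
* any-of:
  * plain [corrections drug dispensary info]
    * presentitem-or-parent described-by xpath [//Prison/DrugDispensary]
  * plain [subject medical history]
    * presentitem-or-parent described-by xpath [//Subject/MedicalHistory]
    </cdcl>
  </mapping>
</mappings>
"""

DICAL_REF = re.compile(r"^(\s*)\* dical \[(.+)\]\s*$")


def load_mappings(xml_text: str) -> Dict[str, List[str]]:
    """Map each concern URI to its Booliette lines."""
    table = {}
    for mapping in ET.fromstring(xml_text).iter("mapping"):
        uri = mapping.findtext("concern").strip()
        body = mapping.findtext("cdcl")
        table[uri] = [ln.rstrip() for ln in body.splitlines() if ln.strip()]
    return table


def expand(lines: List[str], table: Dict[str, List[str]]) -> List[str]:
    """Replace every 'dical [URI]' bullet with its mapped lines, keeping indentation."""
    out = []
    for line in lines:
        ref = DICAL_REF.match(line)
        if ref:
            out.extend(ref.group(1) + mapped for mapped in table[ref.group(2)])
        else:
            out.append(line)
    return out


table = load_mappings(MAPFILE)
short_form = [
    "for-items:",
    "  * dical [http://wi-doc.com/dical/concern/InmateMedicalInfo/]",
]
expanded = expand(short_form, table)
```

After expansion, `expanded` holds the same six lines as the long nodeset specification, which is the sense in which the short form achieves the same result. If the document structure changes, only the mapfile entry needs to be edited.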
Metadata Runtime Associations • A representation to be able to access the specific data item • Evaluates every data item & applies rules • The URI defining a concrete definition of a disclosure concern • The link between the concrete data and the URI • The representation of the user • A link to the primary custodian of this data
GJXDM/NIEM • Both GJXDM 3.1 beta1 & NIEM 1.0 beta 2 have mechanisms for supporting classification, location, & custodian metadata needed for CDCL • Departs from the traditional GJXDM representation of metadata as attributes • Linkage between the two defined in an XML instance
Simple Metadata Instance Example
<Person s:metadata="M1 M2">
  <PersonName>
    <PersonGivenName>Adam</PersonGivenName>
    <PersonSurName>Brooks</PersonSurName>
  </PersonName>
  <PersonBirthDate>1960-10-07</PersonBirthDate>
</Person>
<Metadata s:id="M1">
  <ReportedDate>2005-08-01</ReportedDate>
</Metadata>
<my:Metadata s:id="M2">
  <my:DatabaseID>2829019291</my:DatabaseID>
</my:Metadata>
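The linkage in this instance works by reference: the `s:metadata` attribute lists the ids of the metadata elements that apply to `Person`. A minimal sketch of resolving those references follows. The `Root` wrapper element and both namespace URIs are placeholders invented so that the fragment parses; the real GJXDM/NIEM namespace URIs are not given in the slides.

```python
import xml.etree.ElementTree as ET
from typing import List

S = "urn:example:structures"  # hypothetical namespace URI for the s: prefix
MY = "urn:example:my"         # hypothetical namespace URI for the my: prefix

DOC = f"""\
<Root xmlns:s="{S}" xmlns:my="{MY}">
  <Person s:metadata="M1 M2">
    <PersonBirthDate>1960-10-07</PersonBirthDate>
  </Person>
  <Metadata s:id="M1"><ReportedDate>2005-08-01</ReportedDate></Metadata>
  <my:Metadata s:id="M2"><my:DatabaseID>2829019291</my:DatabaseID></my:Metadata>
</Root>
"""


def metadata_for(root: ET.Element, element: ET.Element) -> List[ET.Element]:
    """Resolve the space-separated ids in s:metadata to the elements carrying them."""
    id_attr = f"{{{S}}}id"
    by_id = {el.get(id_attr): el for el in root.iter() if el.get(id_attr)}
    refs = element.get(f"{{{S}}}metadata", "").split()
    return [by_id[ref] for ref in refs]


root = ET.fromstring(DOC)
linked = metadata_for(root, root.find("Person"))
```

`linked` contains the two metadata elements in the order they are referenced, so a Gatepoint could inspect them while deciding what to do with the `Person` node.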
GJXDM/NIEM Extensions • The extensions to support the metadata types needed turn out to be relatively simple
Simple Instance Example
<Person s:metadata="M1 M2">
  <PersonName>
    <PersonGivenName>Adam</PersonGivenName>
    <PersonSurName>Brooks</PersonSurName>
  </PersonName>
  <PersonBirthDate>1960-10-07</PersonBirthDate>
</Person>
<my:ClassificationMetadata s:id="M1">
  <my:ConcernReferenceURI>http://etc..</my:ConcernReferenceURI>
  <my:LocationalMetadata s:id="M2">
    <my:AbsoluteLocation>/../@Person</my:AbsoluteLocation>
  </my:LocationalMetadata>
</my:ClassificationMetadata>
WARNING: None of these values were made explicitly correct
Applying Metadata • GJXDM/NIEM extensions needed to support metadata are defined in their own namespace. • Actual DiCAL mappings implemented using extensions can be implemented in many ways • Also contained in own namespace • Managed independently from IEPDs • Mapping associates • IEPD data node types to Metadata definition • Actual definition of Metadata instances to be used in construction of IEPD instances
To Be Continued... • The work is far from complete • Still need to research the following: • We need your help • You need to VOICE your opinion • Need to solidify • Lexical Analysis for Booliette • Lexical Analysis for CDCL • Solid list of keywords • XMLCode form • CDCL -> XMLCode parser • Mathematical proof of Booliette • Formal Gatepoint specification • Gatepoint Reference implementation
Questions? • Bill Blondeau whblondeau@yahoo.com bill.blondeau@wisconsin.gov • Joe Mierwa jjmierwa@visionair.com • Chelle Uecker cuecker@occourts.org
DiCAL Overview • Nodeset specification implicitly references an abstraction in the reasoning of the Rules Author: • In this case, the abstraction would be easy to identify, and characterize as "juvenile data". • The point is, the author is setting disclosure policy • not about the specific characteristics of the document • but about juvenile data as a concept. • The document characteristics are only a means to an end • the rules author is establishing (again *implicitly*) a mapping between • those characteristics of that document and • the implicitly defined abstraction that the author would probably describe as "juvenile data"
DiCAL - Step One • Take that implicit abstraction and make it explicit, • With an unambiguous name and well-understood semantics. • Done by defining a URI for the abstraction of "juvenile" data, and writing its defining characteristics down somewhere that's publicly accessible. • Some abstractions may require no more than a sentence or two to sufficiently describe the definition to everybody's satisfaction • Other abstractions might have their own entire dedicated websites • There may be multiple competing or complementary definitions for something that we would reasonably expect to see as a single abstraction; and there may be jurisdiction-specific definitions as well, each with its own URI. • Regardless of process, end result is that every rules author should be able to choose specific definitions for desired abstractions, & be confident that they are well-understood by all players* • End result is that implicit abstraction is now explicit.
DiCAL - Step Two • Make the implicit mapping explicit. • Explicitly associate our earlier XPath statement (the one about the specific structure and semantics of that particular target document) with the URI of the abstraction. • Thus, the following is mapped as an explicit assertion:
http://wijiscommons.org/dcal/juvenileData/
value-of(/../@juvenile) = 'true'
DiCAL - Step Three • Move mapping out of individual rulesheet (where it can only be of use to its single containing rule) & place it into a public repository of such mappings • The Nodeset specification can then reference the abstraction by URI: dical[http://wijiscommons.org/dcal/juvenileData/] • Or, for clarity, declare character string "juvenile" as an alias for full URI: dical[juvenile] • Very simple & understandable • Makes the rule much more powerful • The rule is now about the abstraction, not the implementation of a particular document. • Makes Nodeset applicable to any document format anywhere that has an acknowledged mapping to the abstraction identified by http://wijiscommons.org/dcal/juvenileData/