Explore the underrated importance of data modeling in various domains, its relevance in the back-to-basics trend, and its critical role in effective application development, regulatory compliance, and the exploitation of emerging technologies.
Data Modeling is Underrated: A Bright Future Ahead in the Grand Schema Things
Peter O’Kelly, Research Director
pokelly@burtongroup.com | www.burtongroup.com | pbokelly.blogspot.com
Thursday, November 30, 2006
Data Modeling is Underrated • Agenda • Synopsis • ~7-minute summary • Discussion • Extended-play overview (for reference) • Analysis • Market snapshot • Market trends • Market impact • Recommendations
Data Modeling is Underrated • Synopsis • Data modeling used to be seen primarily as part of database analysis and design -- for DBMS-nerds only • There is now growing appreciation for the value of logical data modeling in many domains, both technical and non-technical • Historically, most data modeling techniques and tools have been inadequate, and often focused more on implementation details than logical analysis and design • Pervasive use of XML and broader exploitation of metadata, along with improved techniques and tools, is making data modeling more useful for all information workers (as well as data-nerds) • Data modeling is a critical success factor for XML – in SOA and elsewhere • Data modeling is now • A fundamental part of the back-to-basics trend in application development • Key to effective exploitation of emerging applications and tools • Essential to regulatory compliance (e.g., information disclosure tracking)
Data Modeling is Underrated • ~7-minute summary • Logical data modeling is often misunderstood and underrated • Models of real-world things (entities), attributes, relationships, and identifiers • Logical => technology-independent (not implementation models) • Logical data modeling is not 1:1 with relational database design • It’s as much about building contextual consensus among people as it is capturing model design for software systems • It’s also exceptionally useful for database design, however • Some of the historical issues • Costly, complex, and cumbersome tools/techniques • Disproportionate focus on physical database design
Data Modeling is Underrated • ~7-minute summary • Logical data modeling is more relevant than ever before • Entities, attributes, relationships, and identifiers • None of the above are optional if you seek to • Respect and accommodate real-world complexity • Establish robust, shared context with other people • Revenge of the DBMS nerds • Not just for normalized “number-crunching” anymore… • Native DBMS XML data model management => fundamental changes • XQuery: relational calculus for XML • SQL and XQuery have very strong synergy • All of the capabilities that made DBMS useful in the first place apply to XML as well as traditional database models • DBMS price/performance and other equations have radically improved • Logical modeling tools/techniques are more powerful and intuitive • And less expensive
Data Modeling is Underrated • ~7-minute summary • XML-based models are useful but insufficient • Document-centric meta-meta-models are not substitutes for techniques based on entities, attributes, relationships, and identifiers • Some XML-centric techniques have a lot in common with pre-relational data model types (hierarchical and network navigation) or mutant “object database” models • XML also has ambiguous aspects, much as the “Entity-Relationship” (E-R) model does • Logical data modeling is not ideal for document-oriented scenarios (involving narrative, hierarchy, and sequence; optimized for human comprehension) • But a very large percentage of XML today is data-centric rather than document-centric • And increasingly pervasive beyond-the-basics hypertext (with compound and interactive document models) is often more data- than document-centric
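The data-centric reading of XML described above can be sketched in a few lines: regular, repeating elements map naturally onto entity instances with attributes and identifiers. This is an illustrative sketch only; the XML fragment and function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data-centric XML: regular, repeating records rather than
# narrative/document-centric markup.
DOC = """
<customers>
  <customer id="017823"><name>Acme Widgets</name><industry>Manufacturing</industry></customer>
  <customer id="75912"><name>NewBank.com</name><industry>Financial services</industry></customer>
</customers>
"""

def xml_to_rows(xml_text):
    """Flatten repeating XML records into tuples -- the data-centric reading,
    where each element corresponds to an entity instance and the id attribute
    serves as its identifier."""
    root = ET.fromstring(xml_text)
    return [
        (c.get("id"), c.findtext("name"), c.findtext("industry"))
        for c in root.findall("customer")
    ]

rows = xml_to_rows(DOC)
```

Document-centric XML (mixed content, significant ordering, narrative text) resists this flattening, which is the deck's point about logical data modeling being a poor fit for that case.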
Data Modeling is Underrated • ~7-minute summary • Ontology is necessary but insufficient • Categorization is obviously a useful organizing construct • “Folksonomies” are also often very effective • But… • Categorization is just one facet of modeling • Many related techniques are conducive to insufficient model detail, creating ambiguity and unnecessary complexity, e.g., for model mapping • So… • We’re now seeing microformats and other new words • … that are fundamentally focused on logical data model concepts • It’d be a lot simpler and more effective to start with logical data models in the first place
Data Modeling is Underrated • Discussion
[Extended-play version] Analysis • Market snapshot • Data modeling concepts • Data modeling benefits • Data modeling in the broader analysis/design landscape • Why data modeling hasn’t been used more pervasively
Market Snapshot • Data modeling concepts: the joy of sets • Core concepts • Entity: a type of real-world thing of interest • Anything about which we wish to capture descriptions • More precisely, an entity is an arbitrarily defined but mutually agreed upon classification of things in the real world • Examples: customer, report, reservation, purchase • Attribute: a descriptor (characteristic) of an entity • A customer entity, for example, is likely to have attributes including customer name, address, … • Relationship: a bidirectional connection between two entities • Composed of two links, each with a link label/descriptor • Example: customer has dialogue; dialogue of customer • Identifier: one or more descriptors (attributes and/or relationship links) that together uniquely identify entity instances, e.g., CustomerID
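The four core concepts above can be made concrete in code. The following is a minimal Python sketch (class and field names are hypothetical, chosen to match the deck's customer/dialogue example), showing where entity, attribute, relationship, and identifier each land:

```python
from dataclasses import dataclass

# Entity: a type of real-world thing of interest; its attributes are descriptors.
@dataclass
class Customer:
    customer_id: str   # identifier: uniquely identifies entity instances
    name: str          # attribute
    address: str       # attribute

@dataclass
class Dialogue:
    customer: Customer  # relationship link: "dialogue OF customer"
    date: str           # customer + date together form the identifier
    topic: str          # attribute

# The relationship is bidirectional: "customer HAS dialogues" is the other link.
acme = Customer("017823", "Acme Widgets", "123 Main Street")
dialogue = Dialogue(acme, "2004/10/14", "Portal")
```

Note this remains a logical-level illustration: nothing here commits to a storage technology, which is exactly the deck's distinction between logical and physical models.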
Market Snapshot • Data modeling concepts: example data model fragment diagram • [Diagram: entities, attributes, relationships, and identifiers for Customer and Dialogue] • Following Carlis/Maguire (from their data modeling book): • About each customer, we can remember its name, industry, address, renewal date, and ID. Each customer is identified by its ID. • About each dialogue, we can remember its customer and its date, topic, and analyst. Each dialogue is identified by its customer and its date. [Note: this model fragment is an example and is not very well-formed]
Market Snapshot • Data modeling concepts: example data model instance • (PKn: participates in primary key; FKn: participates in foreign key)

Customer
  CustomerID (PK1) | CustomerName | CustomerIndustry   | CustomerAddress  | CustomerRenewalDate
  017823           | Acme Widgets | Manufacturing      | 123 Main Street… | 2005/10/14
  75912            | NewBank.com  | Financial services | 456 Central…     | 2006/05/28
  91641            | Degrees 4U   | Education          | P.O. Box 1642…   | 2004/12/31

Dialogue
  CustomerID (PK1, FK1) | DialogueDate (PK1) | DialogueTopic     | DialogueAnalyst
  75912                 | 2005/06/18         | Data architecture | Peter O’Kelly
  91641                 | 2003/12/13         | SIP/SIMPLE        | Mike Gotta
  017823                | 2004/10/14         | Portal            | Craig Roth

• Bonus: it’s very simple to create instance models (and thus relational database designs) from well-formed logical data models
Market Snapshot • Data modeling benefits • Precision and consistency • High fidelity models • Which are easier to maintain in order to reflect real-world changes • Improved • Ability to analyze, visualize, communicate, collaborate, and build consensus • Potential for data reuse • A fundamental DBMS goal • Easier to recognize common shapes and patterns • Impact analysis (e.g., “what if” assessments for proposed changes) • Exploitation of tools, servers, and services • DBMSs and modern design tools/services assume well-formed data models • “Being normal is not enough”… • SOA, defined in terms of schemas, requires data model consensus
Market Snapshot • Data modeling in the broader analysis/design landscape • Four dimensions to consider • Data, process, and events • Roles/concerns/views: strategic, operational, and technology • Logical and physical • Current/as-is and goal/to-be states
Market Snapshot • Data, process, and events • Think of nouns, verbs, and state transitions • Data: describes structure and state at a given point in time • Process: algorithm for accomplishing work and state changes • Event: trigger for data change and/or other action execution • Integrated models are critically important • Data modeling, for example, is guided by process and event analyses • Otherwise scope creep is likely • There is no clear right/wrong in data modeling • Scope and detail are determined by the processes and events you wish to support, and they often change over time
Market Snapshot • Roles/concerns/views • Three key dimensions • Strategic • Organization mission, vision, goals, and strategy • Operational • Data, process, and event details to support the strategic view • Technology • Systems (applications, databases, and services) to execute operations • Again pivotal to maintain integrated models • Data modeling that’s not guided by higher-level goal modeling can suffer from scope creep and become an academic exercise
Market Snapshot • Logical and physical • Another take on operational/technology • Logical: technology-independent data, process, and event models • Examples: • Entity-Relationship (ER) diagram • Data flow diagram (process model) • Physical: logical models defined in software • (Doesn’t imply illogical…) • Examples • Data definition language statements for database definition, including details such as indexing and table space management for performance and fault tolerance • Class and program modules in a specific programming language • Integration and alignment between logical and physical are key • But are often far from ideal, in practice today
Market Snapshot • Current/as-is and goal/to-be states • Combining as-is/to-be states and logical/physical: • Logical, current/as-is: technology-independent view of current systems • Logical, goal/to-be: real-world model unconstrained by current systems • Physical, current/as-is: systems already in place; the stuff we need to live with… • Physical, goal/to-be: new system view with high-fidelity mapping to logical goal state
Market Snapshot • Why data modeling hasn’t been used more pervasively • So, why isn’t everybody doing this?... • Data modeling is hard work • Historically • Disproportionate focus on physical modeling • Inadequate techniques and tools • Suboptimal “burden of knowledge” distribution • Reduced “green field” application development • Data modeling has a mixed reputation
Market Snapshot • Data modeling is hard work • It’s straightforward to read well-formed data models, but it’s often very difficult to create them • Key challenges • Capturing and accommodating real-world complexity • Dealing with existing applications and systems • Organizational issues • Collaboration and consensus-building • Role definitions and incentive systems that discourage designing for reuse and working with other project teams • Politics
Market Snapshot • Historically disproportionate focus on physical modeling • Radical IT economic model shifts during recent years • Design used to be optimized for scarce computing resources including MIPs, disk space, and network bandwidth • The “Y2K crisis” is a classic example of the consequences of placing too much emphasis on physical modeling-related constraints • Relatively stand-alone systems discouraged designing for reuse • Now • Applications are increasingly integrated, e.g., SOA • Hardware and networking resources are abundant and inexpensive • The ability to flexibly accommodate real-world changes is mission-critical • Logical modeling is more important than ever before
Market Snapshot • Historically inadequate techniques and tools • Tendency to focus on physical, often product-specific (e.g., PeopleSoft or SAP) models • Lack of robust repository offerings • Making it very difficult to discover, explore, and share/reuse models • Entity-Relationship (ER) “model” • More of an ambiguous and incomplete diagramming technique, but still the de facto standard for data modeling
Market Snapshot • Tangent: ER, what’s the matter? • Entity Relationship deficiencies • Per E. F. Codd [1990]: • “Only the structural aspects were described; neither the operators upon those structures nor the integrity constraints were discussed. Therefore, it was not a data model.” • “The distinction between entities and relationships was not, and is still not, precisely defined. Consequently, one person’s entity is another person’s relationship.” • “Even if this distinction had been precisely defined, it would have added complexity without adding power.” • Source: Codd, The Relational Model for Database Management, Version 2
Market Snapshot • Tangent: ER, what’s the matter? • Many vendors have addressed some original ER limitations, but the fact that ER is ambiguous and incomplete has led to considerable problems • The Logical Data Structure (LDS) technique is much more consistent and concise, but it’s only supported by one tool vendor (Grandite) • It’s possible to use the ER-based features in many tools in an LDS-centric approach, however • Ultimately, diagramming techniques are simply views atop an underlying meta-meta model • The most useful tools now include • Well designed and integrated meta-meta models • Options for multiple view types, including data, process, and event logical views, as well as assorted physical views
Market Snapshot • Historically inadequate techniques and tools • Unfortunate detours such as overzealous object-oriented analysis and design • Class modeling is not a substitute for data modeling • “Everything is an object” and system-assigned identifiers often mean insufficient specificity and endless refactoring • Fine to capture entity behaviors and to highlight generalization, but you still need to be rigorous about entities, attributes, relationships, and identifiers • No “Dummie’s Guide to Logical Data Modeling” • E.g., normalization: a useful set of heuristics for assessing and fixing poorly-formed data models • But there has been a shortage of useful resources for people who seek to develop data modeling skills – in order to create well-formed data models in the first place • Result: often intimidating levels of complexity…
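The normalization-as-heuristics point above can be illustrated with a toy example: repeated attribute values in a flat record set are the classic signal that two entities were conflated, and the fix is to split them apart. This sketch is purely illustrative (the data and function are hypothetical, not from the deck):

```python
# A denormalized "dialogue log" repeats customer facts in every row --
# a classic symptom of a poorly-formed model that conflates two entities.
log = [
    {"cust_id": "017823", "cust_name": "Acme Widgets", "date": "2004/10/14", "topic": "Portal"},
    {"cust_id": "017823", "cust_name": "Acme Widgets", "date": "2005/01/05", "topic": "SOA"},
]

def normalize(rows):
    """Split the repeating customer attributes into their own relation,
    leaving dialogues identified by (cust_id, date) -- the normalization
    heuristic applied after the fact."""
    customers = {r["cust_id"]: r["cust_name"] for r in rows}
    dialogues = [(r["cust_id"], r["date"], r["topic"]) for r in rows]
    return customers, dialogues

customers, dialogues = normalize(log)
```

The deck's complaint stands, though: heuristics like this repair a poorly-formed model after the fact, whereas rigorous entity/attribute/relationship/identifier analysis would have produced the well-formed model in the first place.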
Market Snapshot • Historically inadequate tools and techniques • An Object Role Modeling (ORM) example • Consistent and concise • But also overwhelming • Doesn’t scale well for more complex modeling domains • Useful for some designers • But not as useful for collaborative modeling with subject matter experts who don’t seek to master the technique • Source: http://www.orm.net/pdf/ORMwhitePaper.pdf
Market Snapshot • Historically suboptimal “burden of knowledge” distribution • Following Carlis: knowledge is generally captured in three places • Resource managers/systems such as DBMSs • Applications/programs • People’s heads • Universally-applicable data, process, and event details are ideally captured in DBMSs • Applications can be circumvented and are often cruelly complex • People come and go (and take their knowledge with them) • But in recent years, DBMSs have been relegated to reduced roles • Suboptimal in many data modeling-related respects • Often meant inappropriate distribution of the burden of knowledge • DBMSs (and thus data modeling) are now resurgent, however
Market Snapshot • Reduced “green field” application development • Following the enterprise shift toward purchased-and-customized applications such as ERP and CRM • Start with models supplied by vendor • Usually with major penalties for extensive customization • So we often see enterprises changing their operations to match purchased applications instead of the other way around • In many cases, packaged applications • Follow least common denominator approaches in order to support multiple DBMS types • Capture universally-applicable data/process/event model facets at the application tier instead of in DBMSs • Far from ideal distribution of the burden of knowledge • Trade off increased complexity for increased generality • Good for application vendors; not always so good for customer organizations • Overall, this has often resulted in • Reduced incentives and utility for data modeling • Many organizations deferring to application suppliers for data models, often with undesirable results such as “lock-in” and endless consulting
Market Snapshot • Recap: data modeling has a mixed reputation • Because of the historical challenges • The return on data modeling time investment has been far from ideal because of • Lack of best practices, techniques, and tools • Environmental dimensions that reduced the utility of data modeling • Many enterprise data modeling projects became IT full-employment acts • With endless scope creep, unclear milestones, completion criteria, and return on investment • As a result, enterprise data modeling endeavors have become scarcer during recent years, with the relentless IT focus on ROI and TCO • Obviously an untenable situation • Both IT people and information workers are increasingly making decisions when they literally don’t know what they’re talking about, due to the lack of high quality and fidelity data models
Analysis • Market trends • Back to data basics • Broader and deeper data modeling applicability • Availability of more and better data models • Simpler and more effective techniques and tools • Increasing data modeling utility, requirements, and risks
Market Trends • Back to data basics • Growing appreciation for • The reality that all bets are off if you’re not confident you have established consensus about goals, nouns, verbs, and events • Software development life cycle economic realities • It’s much more disruptive and expensive to correct models as you go through analysis, design, implementation, and maintenance phases • Less expensive hardware and networking means the return on time investment for logical modeling is increasing while the return for physical modeling is decreasing • Indeed, emerging model-driven tools increasingly make it possible for the logical model to serve as the application specification, with penalties for developers who insist on endlessly tweaking the generated physical models (code)
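The closing point above, that model-driven tools can treat the logical model itself as the specification and generate the physical model from it, can be sketched in miniature. Everything here is a hypothetical toy (the model format and generator are illustrative, not any vendor's actual tooling):

```python
def ddl_from_logical(model):
    """Generate physical DDL from a logical model description -- a toy
    version of model-driven generation, where the logical model (entities,
    attributes, identifiers) serves as the specification and the physical
    schema is derived rather than hand-tweaked."""
    stmts = []
    for entity, spec in model.items():
        cols = ", ".join(f"{attr} TEXT" for attr in spec["attributes"])
        pk = ", ".join(spec["identifier"])
        stmts.append(f"CREATE TABLE {entity} ({cols}, PRIMARY KEY ({pk}));")
    return stmts

# Hypothetical logical model: entities with attributes and identifiers only;
# no physical concerns (indexing, table spaces) appear at this level.
model = {
    "Customer": {"attributes": ["CustomerID", "CustomerName"],
                 "identifier": ["CustomerID"]},
}
ddl = ddl_from_logical(model)
```

Real model-driven tools add type mapping, indexing strategies, and round-tripping, but the direction of derivation (logical to physical, not the reverse) is the point the slide is making.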
Market Trends • Broader and deeper data modeling applicability • SOA is one of the most significant data modeling-related developments of recent years • All about services, but with a deep data model prerequisite • Don Box: services share schema and contract, not class • From a DBMS-centric world view, web services => pragmatic XML evolution • Parameterized queries, as in DBMS stored procedures • Structured and grouped query results • SOA has also driven the need for web services repository (WSR) products • Increasingly powerful tools for information workers have also expanded the applicability of data modeling • An early example: Business Objects – focused on making data useful for more people through data model abstractions • Similar capabilities are now available throughout products such as Microsoft Office • Recent developments such as XQuery will dramatically advance the scope and power of applied set theory
Market Trends • Availability of more and better models • Resources such as books focused on the topic area, e.g., Carlis/Maguire and David Hay’s Data Model Patterns • Products that include expansive data models, ranging from ERP to recent data model-focused offerings such as • NCR Teradata’s logical data model-based solutions • “Universal model” resources from enterprise architecture tool vendors such as Visible Systems • Based on decades of in-market enterprise modeling experience
Market Trends • Availability of more and better models • Standards groups and initiatives, such as • ACCORD • Open Application Group • OASIS Universal Business Language • Models developed by enterprises and government agencies, e.g., • Canada’s Integrated Justice Information (IJI) initiative • Provides a data model and context framework for all aspects of law enforcement • No magic: a multi-year effort with pragmatic hard work and governance • Similar initiatives are now under way in the United States and other countries
Market Trends • Simpler and more effective techniques and tools • Most now include • Cleaner separation of concerns and more intuitive user experiences • For data modeling: ER subsets/refinements that reduce ambiguity and notational complexity • And support view preferences with variable levels of detail • Integrated meta-meta models and unified repositories • Supporting enterprise architecture models such as the Zachman Framework as navigational guides • Although there’s still a perplexing lack of repository-related standards
Market Trends • Data modeling in the enterprise architecture landscape • Relative to the Zachman Framework Source: http://www.zifa.com/
Market Trends • Simpler and more effective techniques and tools • Most now include (continued) • Model-driven analysis and design tools • Building on virtualization and application frameworks with declarative services for transactions, security, and more • Even more incentive to focus more on logical models and less on physical models • More powerful and robust forward- and reverse-engineering capabilities • To transform physical => logical as well as logical => physical • Many are also available at much lower cost • And some open source modeling tools have emerged
Market Trends • Increasing data modeling utility, requirements, and risks • To recap: much more utility from effective data modeling • Related trends and risks • Regulatory compliance requirements, especially concerning information disclosure • Impossible to track what’s been disclosed (both by and to whom) if you don’t know what you’re managing and who has access to it • Increasing demand for reverse-engineering tools in order to better understand existing systems and interactions • “Cognitive overreach” – the potential for information workers to create nonsensical queries based on poorly-designed data models • The queries will often execute and return arbitrary results • With which people will make equally arbitrary business decisions
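The "cognitive overreach" risk above is easy to demonstrate: against a poorly-designed model, a plausible-looking query executes cleanly and returns a number, just not a meaningful one. A sketch using SQLite (the tables and figures are hypothetical, invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders  (region TEXT, amount INTEGER);
CREATE TABLE Targets (region TEXT, target INTEGER);
INSERT INTO Orders  VALUES ('East', 100), ('East', 200);
-- Poor design: Targets has no identifier, so duplicate rows accumulate.
INSERT INTO Targets VALUES ('East', 500), ('East', 900);
""")

# The query executes without error, but the missing identifier on Targets
# silently doubles every order row in the join: the true order total for
# East is 300, yet the query reports 600 -- an arbitrary result a business
# user could easily act on.
total = conn.execute("""
    SELECT SUM(o.amount)
    FROM Orders o JOIN Targets t ON o.region = t.region
""").fetchone()[0]
```

No error, no warning: the DBMS faithfully answered a question nobody meant to ask, which is exactly what well-formed identifiers exist to prevent.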
Analysis • Market impact • Pervasive data modeling and model-driven analysis/design • Vendor consolidation and superplatform alignment • Potentially disruptive market dynamics
Market Impact • Pervasive data modeling and model-driven analysis/design • No longer optional (never really was) • Most of today’s software products assume effective data modeling • Using a DBMS or an abstraction layer such as Microsoft’s ADO.NET with poorly-designed data models results in significant penalties • Often implicit, e.g., in • Information worker-oriented tools such as the query and data manipulation tools included in Microsoft Office • Not a recent development – e.g., consider the > $1B annual market for products such as Apple FileMaker Pro and Microsoft Access – but rapidly expanding • Future offerings such as Microsoft Vista and Microsoft Office 2007, which are deeply data model- and schema-based • For documents, messages, calendar entries, and more, all with extensible schemas and tools for direct information worker metadata manipulation actions
Market Impact • Vendor consolidation and superplatform alignment • A familiar pattern: commoditization, standardization, and consolidation, resulting in • Significant merger/acquisition activity • Shifting product categories, in this context including • Specialized/focused modeling tools • Including widely-used products such as Microsoft Visio • Enterprise architecture/application lifecycle management tool suites • Essentially CASE++, with more and better integrated tools, deeper standards support, and often with support for strategic views • Examples: Borland, Embarcadero, Grandite, Telelogic, Visible • Superplatform-aligned tool suites • IBM, Microsoft, and Oracle, for example, all either now or plan to soon offer end-to-end model-driven tool suites • IBM currently has a significant market lead, through its Rational acquisition • Broader support for interoperability-focused standards initiatives such as XMI (OMG’s XML Metadata Interchange specification)
Market Impact • Vendor consolidation and superplatform alignment • Some CASE and modeling tool vendor merger/acquisition activity: • SDP (S-Designor) -> PowerSoft -> Sybase (PowerDesigner) • Visio -> Microsoft • Popkin -> Telelogic • TogetherSoft -> Borland • Rational -> IBM
Market Impact • Potentially disruptive market dynamics • Opportunities for new or refocused entrants, e.g., • Adobe: a potential leader in WSR following its acquisition of Yellow Dragon Software • Adobe doesn’t offer data modeling tools, but it has a broad suite of tools that exploit XML and data models • The urgent need for WSR products could result in SOA-centric repository offerings expanding to encompass more traditional repository needs as well • Altova: expanding into UML modeling from its XML mapping/modeling franchise • Microsoft: Visual Studio Team System (VSTS) is Microsoft’s first direct foray into modeling tools • It used to offer Visual Modeler, a little-used OEM’d version of Rational Rose • VSTS won’t initially include data modeling tools, but they are part of the plan for future releases • MySQL AB: acquired an open source data modeling tool (DBDesigner 4) and is preparing to reintroduce an expanded version (which will remain open source)
Market Impact • Potentially disruptive market dynamics • New challenges for UML, with significant implications • UML is the most widely-used set of diagramming techniques today, but it’s not particularly useful for data modeling, and it has some ambiguities and limitations • Microsoft and some other vendors believe domain-specific languages (DSLs) are more effective than UML for many needs • If UML falters, vendors that have placed strategic bets on UML (such as Borland, IBM, and Oracle) will face major challenges • Open source modeling initiatives • Some examples • Argo UML • MySQL’s future Workbench tool set • MyEclipse: $29.95 annual subscription for multifaceted tools with modeling • These initiatives will accelerate modeling tool commoditization and standardization
Market Impact • The U in UML stands for “unified,” not “universal” • UML is in some ways ambiguous and is not a substitute for data modeling • Some tools include UML profiles for data modeling, however • UML profiles are similar to domain specific languages in many respects • It’s not clear that UML is ideal for meta-meta-meta models • UML represents unification of three leading diagramming techniques, but it’s not universally applicable • UML is much better than not using any modeling/diagramming tools, but it’s not a panacea • Although it’s getting more expressive and consistent, with UML v2
Analysis • Recommendations • Think and work in models • Build and use model repositories • Create high-fidelity modeling abstractions for SOA • Revisit modeling tool vendor assumptions and alternatives • Respect and accommodate inherent complexity
Recommendations • Think and work in models • Develop skills and experience in • Thinking at the type level of abstraction • Using set-oriented query tools/services • Data modeling utility now extends far beyond database analysis and design • Information workers who have effective data modeling skills will be much more productive • Use data modeling to analyze, visualize, communicate, and collaborate • Provide guidance in • Data modeling training • Selecting appropriate tools • Don’t use ambiguous or incomplete diagramming techniques • Making resources available in models
Recommendations • Build and use model repositories • Do not • Needlessly recreate/reinvent models • Default to exclusively extrapolating models from existing XML schemas or query results • Reality check: that’s how most XML-oriented modeling is done today, but it often propagates suboptimal designs and limits reuse • This may seem familiar: it repeats an early DBMS pattern, when many developers simply moved earlier file designs into DBMSs rather than checking design assumptions/goals • Ensure policies and incentive systems are in place to encourage and reward model sharing via repositories • Add to data governance strategy
Recommendations • Create high-fidelity modeling abstractions for SOA • SOA is rapidly becoming a primary means of facilitating inter-application integration • Robust SOA schema design entails abstraction layers • Exposing public interfaces to private systems otherwise often means propagating suboptimal data model design decisions • Sharing services with users whom you may never actually meet • Making unambiguous and robust models more important than ever • WSR is likely to become a key part of enterprise model repository strategy • Encompassing contexts and models that aren’t exclusively SOA-focused