1 / 53

제 2 장

제 2 장. 데이터 웨어하우스 : 구성 블록 Data Warehouse: The Building Blocks. Chapter Objectives. Review formal definition of a data warehouse Discuss the defining features Distinguish between data warehouses and data marts Study each component or building block that makes up a data warehouse

valiant
Download Presentation

제 2 장

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 제 2 장 데이터 웨어하우스: 구성 블록 Data Warehouse: The Building Blocks Data Warehousing

  2. Chapter Objectives • Review formal definition of a data warehouse • Discuss the defining features • Distinguish between data warehouses and data marts • Study each component or building block that makes up a data warehouse • Introduce metadata and highlight its significance Data Warehousing

  3. Data Warehouse • Information delivery system • Integrate and transform enterprise data into information • suitable for strategic decision making • Take all the historic data from the various operational systems • Combine this internal data with any relevant data from outside sources • Pull them together Data Warehousing

  4. Set up information delivery system • Need different components or building blocks • Arranged together in the most optimal way • Arranged in a suitable architecture Data Warehousing

  5. Bill Inmon’s Definition of DW • The father of Data Warehouse • “A Data Warehouse is a subject oriented, integrated, nonvolatile, and time variant collection of data in support of management’s decisions.” Data Warehousing

  6. Sean Kelly • Another leading data warehouse practitioner • The data in the data warehouse is: • 분리되고 Separate • 이용가능하고 Available • 통합되고 Integrated • 시간을 새겨넣은 Time stamped • 주제 중심의 Subject oriented • 비휘발성의 Nonvolatile • 접근가능한 Accessible Data Warehousing

  7. DEFINING FEATURES • What about the nature of the data in the data warehouse? • How is this data different from the data in any operational system? • Why does it have to be different? • How is the data content in the data warehouse used? Data Warehousing

  8. DEFINING FEATURES • Some of Key Defining Features of the Data Warehouse • Subject-Oriented • Integrated Data • Time-Variant Data • Nonvolatile Data • 데이터 구체화정도 Data Granularity Data Warehousing

  9. Subject-Oriented Data • Data is stored by subjects, not by applications • The subjects are critical for the enterprise • Sales, shipments and inventory for a manufacturing company • Figure 2-1 • There is no application flavor • The data in a data warehouse cut across applications Data Warehousing

  10. Data Warehousing

  11. Integrated Data • Need to pull together all the relevant data from the various systems • Data from internal operational systems • Data from outside sources • Before the data can be stored in a DW, • Remove the inconsistencies • Standardize the various data elements • Go through a process of transformation, consolidation, and integration of the source data Data Warehousing

  12. Data Warehousing

  13. Standardization • Some of the items that would need standardization: • Naming Conventions • Codes • Data attributes • Measurements Data Warehousing

  14. Time-Variant Data • For an operational system, • the stored data contains the current values • The data in the data warehouse is meant for analysis and decision making. • The use needs data not only about the current purchase, but on the past purchases. • A data warehouse has to contain historical data, not just current values. Data Warehousing

  15. Time-variant nature • The time-variant nature of the data • Allows for analysis of the past • Relates information to the present • Enables forecasts for the future Data Warehousing

  16. Nonvolatile Date • The data in the data warehouse is not intended to run the day-to-day business. • You do not update the data warehouse every time you process a single order. • Data from the operational systems are moved into the data warehouse at specific intervals. • Figure 2-3, not update Data Warehousing

  17. Data Granularity • The analysis begins at a high level and moves down to lower levels of detail • Start by looking at summary data • Look at the breakdown • Data granularity in a data warehouse refers to the level of detail • The lower the level of detail, the finer the data granularity • The lowest level of detail  a lot of data in the data warehouse Data Warehousing

  18. Data Warehousing

  19. DATA WAREHOUSES AND DATA MARTS • In 1998, Bill Inmon stated, “The single most important issue facing the IT manager this year is whether to build the data warehouse first or the data mart first.” Data Warehousing

  20. DATA WAREHOUSES AND DATA MARTS • Before deciding to build a data warehouse, you need to ask: • Top-down or bottom-up approach? • Enterprise-wide or department? • Which first – data warehouse or data mart? • Build pilot or go with a full-fledged implementation? • Dependent or independent data marts? Data Warehousing

  21. Data Warehousing

  22. How are They Different? • Figure 2-5 • Two different basic approaches • Overall data warehouse feeding dependent data marts • Several departmental or local data marts combining into a data warehouse Data Warehousing

  23. Data Warehousing

  24. Top-Down Approach: Advantages • A truly corporate effort, an enterprise view of data • Inherently architected – not a union of disparate data marts • Single, central storage of data about the content • Centralized rules and control • May see quick results if implemented with iterations Data Warehousing

  25. Top-Down Approach: Disadvantages • Takes longer to build even with an iterative method • High exposure/risk to failure • Needs high level of cross-functional skills • High outlay without proof of concept Data Warehousing

  26. Bottom-Up Approach: Advantages • Faster and easier implementation of manageable pieces • Favorable return on investment and proof of concept • Less risk of failure • Inherently incremental; can schedule important data marts first • Allows project team to learn and grow Data Warehousing

  27. Bottom-Up Approach: Disadvantages • Each data mart has its own narrow view of data • Permeates redundant data in every data mart • Perpetuates inconsistent and irreconcilable data • Proliferates unmanageable interfaces Data Warehousing

  28. A Practical Approachby Ralph Kimball • Plan and define requirements at the overall corporate level • Create a surrounding architecture for a complete warehouse • Conform and standardize the data content • Implement the data warehouse as a series of supermarts, one at a time • Supermarts are carefully architected data marts Data Warehousing

  29. An Enterprise Data Warehouse • A data mart is a logical subset of the complete data warehouse • A data warehouse is a conformed union of all data marts • Individual data marts are targeted to particular business groups • The collection of all the data martsforman integrated whole, called the enterprise data warehouse Data Warehousing

  30. OVERVIEW OF THE COMPONENTS • Architecture is the proper arrangement of the components • Build a data warehouse with software and hardware components • Arrange the building blocks for maximum benefit • May lay special emphasis on one component Data Warehousing

  31. Basic Components of a typical warehouse • Figure 2-6: building blocks or components • Source Data Component • Data Staging Component • Data Storage Component • Store and manage the data, keep track of the data by means of the metadata repository • Information Delivery Component • Metadata Component • Management and Control Component Data Warehousing

  32. Data Warehousing

  33. Source Data Component • Production Data • Internal Data • Archived Data • External Data Data Warehousing

  34. Production Data • Data from the various operational systems • on different hardware platforms • by different database systems and operating systems • from many vertical applications • No conformance of data among the various operational systems • The significant and disturbing characteristic of production data is disparity • Standardize, transform, convert, and integrate the disparate data Data Warehousing

  35. Internal Data • Data from users’ “private” spreadsheets, documents, customer profiles, and sometimes even departmental database • Add additional complexity to the process of transforming and integrating the data • Determine strategies for collecting data from spreadsheets • Find ways of taking data from textual documents • Tie into departmental databases to gather pertinent data from these sources Data Warehousing

  36. Archived Data • Periodically take the old data and store it in archived files in an operational system • Many different methods of archiving • A separate archival database • Flat files on disk storage • Tape cartridges or microfilm and even off-site • A data warehouse keeps historical snapshots of data • Look into your archived data sets • Useful for discerning patterns and analyzing trends Data Warehousing

  37. External Data • Data from external sources for information that most executives use • Statistics relating to their industry produced by external agencies • Market share data of competitors • Standard values of financial indicators for their business • To spot industry trends and compare performance against other organizations • Usually, data from outside sources do not conform to your formats Data Warehousing

  38. Data Staging Component • Three major functions need to be performed for getting the data ready • extract the data • transform the data • and then load the data into the data warehouse storage • ETT: • 추출(Extraction) • 가공(Transformation) • 전송(Transportation) Data Warehousing

  39. Data Staging • Provide a place and an area with a set of functions to clean, change, combine, convert, deduplicate, and prepare source data for storage and use in the data warehouse Data Warehousing

  40. Data Extraction • Deal with numerous data sources • Tools for data extraction • Purchasing outside tools • Developing in-house programs • Extract the source data into • a group of flat files, • or a data-staging relational database, • or a combination of both Data Warehousing

  41. Data Transformation • Perform a number of individual tasks • Clean • Standardization • Combine • Purging and separating out • Sorting and merging • Assignment of surrogate keys • Results: a collection of integrated data that is cleaned, standardized, and summarized Data Warehousing

  42. Data Loading • Two distinct groups of tasks • The initial loading of the data into the data warehouse • Refresh cycles • Extract the changes to the source data • Transform the data revisions • And feed the incremental data revisions on an ongoing basis • Figure 2-7 Data Warehousing

  43. Data Warehousing

  44. Data Storage Component • A separate repository • To keep large volume of historical data for analysis • To keep the data in structures suitable for analysis • The data warehouses are “read-only” data repositories • The data is stable and it represents snapshots at specified periods • The database in a data warehouse must be open • Must be open to different tools • RDBMSs or MDDBs Data Warehousing

  45. Information Delivery Component • Who are the users? • The novices, the casual users, the business analysts, and the power users • Different methods of information delivery • Ad hoc reports, complex queries, multidimensional analysis, statistical analysis, EIS feed, data-mining applications • Information delivery mechanism • Online, internet, intranet, e-mail Data Warehousing

  46. Data Warehousing

  47. Metadata Component • The data about the data in the data warehouse • Similar to a data dictionary, but much more than a data dictionary • (Later, in a separate section) Data Warehousing

  48. Management and Control Component • Sit on top of all the other components • Coordinate the services and activities • Control the data transformation and the data transfer into the data warehouse storage • Moderate the information delivery to the users • Monitor the movements of data into the staging area and from there into the data warehouse storage • The metadata is the source of information for the management module Data Warehousing

  49. METADATA IN THE DATA WAREHOUSE • The Yellow Pages • A directory with data about the institutions • Types of Metadata • Operational Metadata • Extraction and Transformation Metadata • End-user Metadata Data Warehousing

  50. Operational Metadata • Contain all of next information about the operational data sources • Data for the data warehouse comes from several operational systems • The data elements have various field lengths and data types • You split records, combine parts of records from different source files, and deal with multiple coding schemes and field lengths Data Warehousing

More Related