
PS1 PSPS Object Data Manager Design




  1. PS1 PSPS Object Data Manager Design PSPS Critical Design Review, November 5-6, 2007, IfA

  2. Detail Design • General Concepts • Distributed Database architecture • Ingest Workflow • Prototype

  3. Zones (spatial partitioning and indexing algorithm) • Partition and bin the data into declination (Dec) zones • ZoneID = floor((dec + 90.0) / zoneHeight) • A few tricks are required to handle spherical geometry • Places related data close together on disk • Clustered index on ZoneID and RA (Right Ascension) • Fully implemented in SQL • Efficient for • Nearby searches • Cross-Match (especially) • Plays a fundamental role in addressing the critical requirements • Data volume management • Association speed • Spatial capabilities

  4. Zoned Table • ZoneID = floor((dec + 90.0) / zoneHeight) • zoneHeight = 8 arcsec in this example
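
As a rough illustration of the zoning idea (not the production DDL), here is a T-SQL sketch that bins a hypothetical staging table into zones with the formula above and clusters it on (zoneID, ra); the StagingObjects and ZonedObjects names and columns are assumptions:

-- Illustrative sketch only; StagingObjects and ZonedObjects are hypothetical tables
DECLARE @zoneHeight float;
SET @zoneHeight = 8.0 / 3600.0;   -- 8 arcsec expressed in degrees

SELECT CAST(FLOOR((o.dec + 90.0) / @zoneHeight) AS int) AS zoneID,
       o.ra, o.dec, o.cx, o.cy, o.cz
INTO   ZonedObjects
FROM   StagingObjects AS o;

-- Cluster on (zoneID, ra) so neighboring objects end up close together on disk
CREATE CLUSTERED INDEX idx_ZonedObjects_zone_ra ON ZonedObjects (zoneID, ra);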

  5. SQL CrossNeighbors
SELECT *
FROM prObj1 z1
JOIN zoneZone ZZ ON ZZ.zoneID1 = z1.zoneID
JOIN prObj2 z2 ON ZZ.zoneID2 = z2.zoneID
WHERE z2.ra BETWEEN z1.ra - ZZ.alpha AND z1.ra + ZZ.alpha
  AND z2.dec BETWEEN z1.dec - @r AND z1.dec + @r
  AND (z1.cx*z2.cx + z1.cy*z2.cy + z1.cz*z2.cz) > cos(radians(@r))

  6. Good CPU Usage

  7. Partitions • SQL Server 2005 introduces technology to handle tables that are partitioned across different disk volumes and managed by a single server. • Partitioning makes management and access of large tables and indexes more efficient • Enables parallel I/O • Reduces the amount of data that needs to be accessed • Related tables can be aligned and collocated in the same place, speeding up JOINs

  8. Partitions • 2 key elements • Partitioning function • Specifies how the table or index is partitioned • Partitioning scheme • Using a partitioning function, the scheme specifies the placement of the partitions on file groups • Data can be managed very efficiently using Partition Switching • Add a table as a partition to an existing table • Switch a partition from one partitioned table to another • Reassign a partition to form a single table • Main requirement • The table must be constrained on the partitioning column
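
A minimal T-SQL sketch of the two key elements and of partition switching; the boundary values, file groups, and table names are made up for illustration and are not the real PS1 DDL:

-- Partitioning function: ranges of ObjectID (boundary values are illustrative)
CREATE PARTITION FUNCTION pfObjectID (bigint)
    AS RANGE RIGHT FOR VALUES (100000000000000000, 200000000000000000);

-- Partitioning scheme: place each range on its own file group
CREATE PARTITION SCHEME psObjectID
    AS PARTITION pfObjectID TO (fg1, fg2, fg3);

-- The table is partitioned (and therefore constrained) on the partitioning column
CREATE TABLE Objects_demo (
    objID bigint NOT NULL PRIMARY KEY,
    ra    float,
    [dec] float
) ON psObjectID (objID);

-- Partition switching: a fully loaded, identically structured staging table
-- with a CHECK constraint matching the range can be switched in as a partition
ALTER TABLE Objects_stage SWITCH TO Objects_demo PARTITION 1;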

  9. Partitions • For the PS1 design, • Partitions mean File Group Partitions • Tables are partitioned into ranges of ObjectID, which correspond to declination ranges. • ObjectID boundaries are selected so that each partition has a similar number of objects.

  10. Distributed Partitioned Views • Tables participating in a Distributed Partitioned View (DPV) reside in different databases, which may live on different instances or on different (linked) servers
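
For example, slide 24 below glues the per-slice detection tables into a single Detections view in the main database; a minimal sketch, assuming linked servers named Slice01 and Slice02 and per-slice databases PS1_p1 and PS1_p2 (all names illustrative):

-- Each member table carries a CHECK constraint on its objID range, so the
-- optimizer can route a query to the relevant slice only
CREATE VIEW Detections AS
    SELECT * FROM Slice01.PS1_p1.dbo.Detections_p1
    UNION ALL
    SELECT * FROM Slice02.PS1_p2.dbo.Detections_p2;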

  11. Concept: Slices • In the PS1 design, the bigger tables will be partitioned across servers • To avoid confusion with the File Group Partitioning, we call them “Slices” • Data is glued together using Distributed Partitioned Views • The ODM will manage slices; using slices improves system scalability • In the PS1 design, tables are sliced into ranges of ObjectID, which correspond to broad declination ranges; each slice is subdivided into partitions that correspond to narrower declination ranges • ObjectID boundaries are selected so that each slice has a similar number of objects

  12. Detail Design Outline • General Concepts • Distributed Database architecture • Ingest Workflow • Prototype

  13. PS1 Distributed DB system (architecture diagram): loader databases (LoadAdmin, Load Support1 … Load Supportn, holding detections, objZoneIndx, Orphans_l, Detections_l, LnkToObj_l) feed, via linked servers, the slice databases P1 … Pm ([Objects_p], [LnkToObj_p], [Detections_p], Meta) and the main PS1 database (PartitionsMap, Objects, LnkToObj, Meta, Detections); queries arrive through the Web Based Interface (WBI) and the Query Manager (QM). Legend: database, full table, [partitioned table], output table, partitioned view.

  14. Design Decisions: ObjID • Objects have their positional information encoded in their objID • fGetPanObjID (ra, dec, zoneH) • ZoneID is the most significant part of the ID • It gives scalability, performance, and spatial functionality • Object tables are range partitioned according to their object ID

  15. ObjectID Clusters Data Spatially • Dec = –16.71611583, ZH = 0.008333 • ZID = (Dec + 90) / ZH = 08794.0661 • RA = 101.287155 • ObjectID = 087941012871550661 • ObjectID is unique when objects are separated by > 0.0043 arcsec
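
The slide does not spell out the internals of fGetPanObjID; the sketch below simply packs the digits the way the worked example suggests (zoneID, then RA × 10^6, then four digits of the fractional zone position) and should be read as an assumption, not the production function:

-- Illustrative sketch only; digit layout inferred from the example above
CREATE FUNCTION dbo.fGetPanObjID_demo (@ra float, @dec float, @zoneH float)
RETURNS bigint
AS
BEGIN
    DECLARE @z float;
    SET @z = (@dec + 90.0) / @zoneH;                                     -- e.g. 8794.0661
    RETURN CAST(FLOOR(@z) AS bigint) * CAST(10000000000000 AS bigint)    -- zoneID
         + CAST(FLOOR(@ra * 1000000.0) AS bigint) * 10000                -- RA * 10^6
         + CAST(FLOOR((@z - FLOOR(@z)) * 10000.0) AS bigint);            -- zone fraction
END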

  16. Design Decisions: DetectID • Detections have their positional information encoded in the detection identifier • fGetDetectID (dec, observationID, runningID, zoneH) • Primary key (objID, detectionID) aligns detections with objects within partitions • Provides efficient access to all detections associated with one object • Provides efficient access to all detections of nearby objects

  17. DetectionID Clusters Data in Zones • Dec = –16.71611583, ZH = 0.008333 • ZID = (Dec + 90) / ZH = 08794.0661 • DetectID = 0879410500001234567 • ObservationID = 1050000 • Running ID = 1234567
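
Again, the exact packing used by fGetDetectID is not given on the slide; the layout below (5 zone digits, 7 observationID digits, 7 runningID digits) is just read off the example and is an assumption:

-- Illustrative decomposition of the example DetectID
DECLARE @zoneID bigint, @obsID bigint, @runID bigint;
SET @zoneID = 8794;      -- FLOOR((dec + 90.0) / zoneH), as for ObjectID
SET @obsID  = 1050000;   -- ObservationID
SET @runID  = 1234567;   -- Running ID
SELECT @zoneID * CAST(100000000000000 AS bigint)   -- shift by 10^14
     + @obsID * 10000000                           -- shift by 10^7
     + @runID AS detectID;                         -- = 879410500001234567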

  18. ODM Capacity 5.3.1.3 The PS1 ODM shall be able to ingest into the ODM a total of • 1.5 × 10^11 P2 detections • 8.3 × 10^10 cumulative sky (stack) detections • 5.5 × 10^9 celestial objects together with their linkages.

  19. PS1 Table Sizes - Monolithic (table; sizes are in TB)

  20. What goes into the main Server (diagram): the main PS1 database keeps the full Objects, LnkToObj, Meta, and PartitionsMap tables and is connected to the slice servers P1 … Pm via linked servers. Legend: database, full table, [partitioned table], output table, distributed partitioned view.

  21. What goes into slices (diagram): each slice database P1 … Pm holds its range-partitioned [Objects_p], [LnkToObj_p], and [Detections_p] tables plus local copies of PartitionsMap and Meta; the main PS1 database keeps the full PartitionsMap, Objects, LnkToObj, and Meta tables. Legend: database, full table, [partitioned table], output table, distributed partitioned view.

  22. What goes into slices (same diagram as the previous slide)

  23. Duplication of Objects & LnkToObj • Objects are distributed across slices • Objects, P2ToObj, and StackToObj are duplicated in the slices to parallelize “inserts” & “updates” • Detections belong in their object’s slice • Orphans belong to the slice that their position would assign them to • Orphans near slice boundaries will need special treatment • Objects keep their original object identifier, even though positional refinement might change their zoneID and therefore the most significant part of their identifier

  24. Glue = Distributed Views (diagram): the Detections view in the main PS1 database is a distributed partitioned view over the [Detections_p1] … [Detections_pm] tables in the slices, reached through linked servers. Legend: database, full table, [partitioned table], output table, distributed partitioned view.

  25. Partitioning in Main Server • The main server is partitioned (Objects) and collocated (LnkToObj) by objID • Slices are partitioned (Objects) and collocated (LnkToObj) by objID • (Diagram: linked servers P1 … Pm, the main PS1 database, the Query Manager (QM), and the Web Based Interface (WBI))

  26. PS1 Table Sizes - Main Server (table; sizes are in TB)

  27. PS1 Table Sizes - Each Slice (table; sizes are in TB)

  28. PS1 Table Sizes - All Servers (table; sizes are in TB)

  29. Detail Design Outline • General Concepts • Distributed Database architecture • Ingest Workflow • Prototype

  30. PS1 Distributed DB system (same architecture diagram as slide 13: loader databases, linked servers, slice databases P1 … Pm, the main PS1 database, the Query Manager (QM), and the Web Based Interface (WBI))

  31. “Insert” & “Update” • SQL Insert and Update are expensive operations due to logging and re-indexing • In the PS1 design, Insert and Update have been re-factored into sequences of: Merge + Constrain + Switch Partition • Frequency • f1: daily • f2: at least monthly • f3: TBD (likely to be every 6 months)

  32. Ingest Workflow (diagram): incoming CSV detections are zoned (DZone), cross-matched against ObjectsZ within 1" (DXO_1a) and 2" (DXO_2a), unmatched detections go through NoMatch / Resolve / Detect, and the results populate P2PsfFits, P2ToObj, and Orphans.

  33. Ingest @ frequency = f1 (diagram): the LOADER database (ObjectsZ, P2ToObj, P2PsfFits, Orphans, Metadata+) feeds SLICE_1 (Objects_1, P2ToObj_1, P2ToPsfFits_1, Stack*_1, Orphans_1) and the MAIN database (Objects, P2ToObj, StackToObj, Metadata+); numbered arrows give the order of the steps.

  34. Updates @ frequency = f2 (diagram): the LOADER and SLICE_1 tables (Objects_1, P2ToObj_1, P2ToPsfFits_1, Stack*_1, Orphans_1) are merged into the MAIN database tables (Objects, P2ToObj, StackToObj, Metadata+).

  35. Updates @ frequency = f2 (continuation of the previous diagram)

  36. Snapshots @ frequency = f3 (diagram): the Objects, P2ToObj, and StackToObj tables in MAIN, together with Metadata+, are copied into a Snapshot.

  37. Batch Update of a Partition (diagram): the affected partitions A1, A2, A3 are merged with SELECT INTO into a single merged table; SELECT INTO … WHERE then splits the merged data back into B1, B2, B3, each given a primary-key index; finally each Bi is switched in to replace the corresponding partition.
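
A hedged T-SQL sketch of the merge + constrain + switch sequence behind the diagram; all table names and the objID boundary are hypothetical, and details such as file-group placement and index alignment are omitted:

-- 1. Merge: combine the current partition contents with the newly loaded batch
SELECT *
INTO   B1                                    -- hypothetical work table
FROM   (SELECT * FROM A1                     -- current contents of the partition
        UNION ALL
        SELECT * FROM NewBatch_1) AS m;      -- freshly ingested rows for that range

-- 2. Constrain: primary-key index plus a CHECK matching the partition's range
ALTER TABLE B1 ADD CONSTRAINT pk_B1 PRIMARY KEY (objID, detectID);
ALTER TABLE B1 ADD CONSTRAINT ck_B1 CHECK (objID < 100000000000000000);

-- 3. Switch: swap the rebuilt table in as partition 1 (metadata-only, so it is fast);
--    the old contents are switched out to an empty table of identical structure first
ALTER TABLE Detections_p1 SWITCH PARTITION 1 TO A1_old;
ALTER TABLE B1 SWITCH TO Detections_p1 PARTITION 1;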

  38. Scaling-out • Apply a Ping-Pong strategy to maintain query performance during ingest: 2 x (1 main + m slices) • (Diagram: two copies of the main PS1 database and of the slice servers P1 … Pm behind the Query Manager (QM), with duplicated full tables, [partitioned tables], and partitioned views)

  39. Scaling-out • More robustness, fault tolerance, and reliability call for 3 x (1 main + m slices) • (Diagram: as on the previous slide)

  40. Adding New Slices • SQL Server range partitioning capabilities make it easy • Recalculate partitioning limits • Transfer data to new slices • Remove data from old slices • Define and apply a new partitioning scheme • Add new partitions to the main server • Apply the new partitioning scheme to the main server
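
For the "recalculate partitioning limits" step, SQL Server lets an existing partition function be split in place; a sketch reusing the hypothetical pfObjectID / psObjectID names from the earlier example:

-- Reserve a file group for the new range, then split the function at a new boundary
ALTER PARTITION SCHEME psObjectID NEXT USED fg4;
ALTER PARTITION FUNCTION pfObjectID() SPLIT RANGE (300000000000000000);

-- The reverse operation, when a range is retired, is MERGE RANGE:
-- ALTER PARTITION FUNCTION pfObjectID() MERGE RANGE (300000000000000000);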

  41. Adding New Slices

  42. Detail Design Outline • General Concepts • Distributed Database architecture • Ingest Workflow • Prototype

  43. ODM Ingest Performance 5.3.1.6 The PS1 ODM shall be able to ingest the data from the IPP at two times the nominal daily arrival rate* * The nominal daily data rate from the IPP is defined as the total data volume to be ingested annually by the ODM divided by 365. • Nominal daily data rate: • 1.5 × 10^11 / 3.5 / 365 = 1.2 × 10^8 P2 detections / day • 8.3 × 10^10 / 3.5 / 365 = 6.5 × 10^7 stack detections / day

  44. Number of Objects • * “SDSS” includes a mirror of the 11.3 < Dec < 30 objects to Dec < 0 • Total CSV data loaded: 300 GB • CSV bulk insert load: 8 MB/s • Binary bulk insert: 18-20 MB/s • Creation started: October 15th 2007 • Finished: October 29th 2007 (??) • Includes • 10 epochs of P2PsfFits detections • 1 epoch of Stack detections

  45. Prototype in Context

  46. Size of Prototype Database (table; sizes are in billions of rows)

  47. Size of Prototype Database (table; sizes are in GB) • 9.6 TB of data in a distributed database

  48. Well-Balanced Partitions

  49. Ingest and Association Times

  50. Ingest and Association Times (chart; projected values labeled “Educated Guess” and “Wild Guess”)
