740 likes | 943 Views
Configuring Global Payroll for Optimal Performance. nalin.n.patel@abbey.com david.kurtz@go-faster.co.uk. Abbey key facts 1. Sixth largest bank by assets in the UK Founded in 1944 Currently have approximately 18m customers 741 branches across UK. Abbey key facts 2.
E N D
Configuring Global Payroll for Optimal Performance nalin.n.patel@abbey.com david.kurtz@go-faster.co.uk
Abbey key facts 1 • Sixth largest bank by assets in the UK • Founded in 1944 • Currently have approximately 18m customers • 741 branches across UK
Abbey key facts 2 • Abbey's main offices are in London, Milton Keynes, Bradford, Glasgow and Belfast. • We have around 26,000 people (full time equivalent) • We have about 1.8 million shareholders • Assets at 30 June 2004 - £171 billion • Personal Financial Services trading profit before tax for 6 months to 30 June 2004 - £340 million
History of PeopleSoft at Abbey • PeopleSoft HRMS acquired for recruitment in 1994 • Implemented PeopleSoft HRMS in 1997 • Recruitment, Personnel & Training • Paylink used to send data from HRMS to payroll • Workflow and self-service with v7.5 in 2000 • JAVA HTML Clients • PeopleSoft HRMS upgraded to 8 SP1 in 2001 • Implemented PeopleSoft Payroll in August 2003 • Project initiated to upgrade to HCM 8.8
Current Platform • AppServer runs on SUN E4500 • Database runs on SUN E10000 • Both boxes are shared with other applications • Tier 1 mirrored disks • Oracle 9.2.0.4
Why PeopleSoft Payroll ? • Integrated HRMS • Common infrastructure • Web enabled • Automate administrative functions • Manager Self-Service • Absence and maternity input • Employee Self-Service • Overtime input • On-line payslips • Real-time data input • Increase system availability
PeopleSoft Payroll Implementation • Development commenced in January 2002 • In-house IT Project team • Project delays due to re-scoping and internal re-structure • Streamed payroll during parallel run tests • Went live with payroll and absence in August 2003 • 30,000 staff and 7,000 pensioners • 12 streams introduced in February 2004 • Introduced hash partition in July 2004 due to increased run times • Identify and calculate taking 2.5 hours • But we had to tune it.
Resources • If you can’t hear me say so now. • Please feel free to ask questions as we go along. • The presentation will be available from • Customer Connection –> Conference Website • www.go-faster.co.uk
Independent consultant Abbey, DoD, Unilever, UBS… System Performance Tuning Oracle Databases Unix Tuxedo PeopleSoft Apps Inc. Global Payroll Book www.psftdba.com Who am I?
Technical? DBA Developer Familiar with PeopleSoft infrastructure Non-Technical? HR Functional HR/P Administrator Project Manager Who are You?
Physical Database Considerations Parallel processing Increase concurrency Reduce Contention Reduce I/O Permit CPU usage some Oracle specific GP Changes Efficent GP ‘rules’ Reduce CPU consumption of rules Engine Data Migration Configuring Global Payroll
This is not theory! • This has been done for real • UBS – 32,000 payees • Abbey – 36,000 payees* • DoD – 640,000 payees (benchmark) • Unilever – 12,500 payees (weekly & monthly) • 3 other installations in UK, France & Japan • And it works!
Overview • Payroll is calculated by a Cobol program • GPPDPRUN • Single non-threaded process • Four Stages • Cancel • Identify • (Re-)Calculate • Finalise
Cancellation Monolithic SQL to delete results Identify Populating temporary work tables Database Intensive SQL Set processing ~10-20 minutes Calculation Opening cursors Load data into memory Evaluation of rules (Cobol only) Batch insert of results into database Cobol (CPU) Intensive ~6500 segments / hour / stream (was 400) Three stages with different behaviours
What is Streaming? • Employees are split into groups defined by ranges of employee ID • Each group/range can be processed by a different instance/stream of GPPDPRUN • The streams can then be run in parallel. • Vanilla PeopleSoft functionality. • This is not customisation
Why is Streaming Necessary? • GPPDPRUN is a standard Cobol program. • It is a single threaded process • One Cobol process can only run on one CPU at any one time • 36000 payees at 2700 payees /stream/hour • 97000 segments at 7350 segments/stream/hour • 49m - 1h11m - 12 streams • 13h12m if run in one stream • On a multi-processor server streaming enables consumption of extra CPU.
Calculation of Stream Definitions • Objective is roughly equal processing time for all stream • PS_GP_PYE_SEG_STAT indicates work to be done by payroll. • Calculate ranges of roughly equal numbers of rows for this table • Script using Oracle’s Analytic functions that directly populates PS_GP_STRM • Equal processing time does NOT correspond to equal volumes of result data.
Employee Distribution Creep • As new employees hired EMPLIDs allocated into the same stream. • That stream starts to run longer. • Effective execution time is maximum execution time for all streams. • Need to periodically recalculate stream ranges • Need to reflect this in physical changes.
Employee Distribution Creep • Company merger/divestment. • Pensioners • Abbey • 30000 employees – avg 3.03 segments per employee • 6000 pensioners – 1 segment per pensioner • 12 streams • Employee IDs allocated sequentially • Earlier streams richer in pensioners • Later streams richer in employees
Rollback Contention Snapshot Too Old Insert Contention I/O Volume Datafile I/O Redo/Archive Log Activity Database Contention
Rollback Contention • Working Storage Tables • Shared by all streams • Rows inserted/deleted during run • Different Streams never create locks that block each other • Do update different rows in same block during processing • 1 interested transaction per stream in many blocks. • There is a additional rollback overhead of 16 bytes per row if two rows in same block -v- different blocks • updates of ~<100 bytes / row
Read Consistency • Oracle guarantees that data is consistent throughout life of a query • If a block has been updated by another transaction since a long running query started, it must be possible to reconstruct the state of that block at the time the query started using the rollback segment. • If that information cannot be found in the rollback segment the long running query fails with ORA-01555.
ORA-01555 Snapshot Too Old • Rollback segments are not extended for read consistency. • Additional rollback overhead can cause rollback segments to spin and wrap. • Error message also described a ‘rollback segments too small.’ • In this case, to simply extend the segments is the wrong response. • CPU overhead to navigate rollback segment header chain
Insert Contention • During the calculation phase results are written to the result tables. • A number of stream can simultaneously insert into the same result tables. • Increases chance that one block will contain rows relating to more than one stream. • This in turn causes rollback problems during the cancel in the next calculation.
Another cause of ORA-1555 • If not processing calendar for the first time, previous results cancelled • Result table are deleted • Monolithic deletes from each table. • If Streams start together tend to delete same table at same time in each stream. • A long running delete is also a query for the purposes of read consistency. • It is necessary to reconstruct a block as at the time the long running delete started in order to delete a row from it. • Reconstruction occurs during ‘consistent read’. • Deletes by primary key columns, thus Oracle tends to look each row up row by index. Thus index reads also ‘consistent’.
Datafile and Log Physical I/O Activity • During the identify phase data is shuffled from table to table • This generates datafile and redo log I/O • Rollback activity is also written to disk, undo information is also written to the redo log. • All the data placed in the temporary working tables by a stream is of no use to any other instance of the calculation process. • It will be deleted by a future process. • Dirty blocks written to disk before the rollback segment wraps.
High Water Marks • The working storage tables tend to be used to drive processing. • Thus, the SQL tends to use full table scans. • In Oracle, High Water Mark is the highest block that has ever contained data. • Full Scans scan the table up to the high water mark. • Temporary tables contain data for ALL streams. • All streams can have to scan data for all streams.
How to avoid inter-stream contention? • Keep rows from different streams in different blocks • Each block should contain rows for one and only one stream. • Need Two Oracle Features • Partitioning • Global Temporary Tables
What is Partitioning? • Logically • a partitioned table is a still a single table • Physically • each partition is a separate table. • in a partitioned table, the partition in which a row is placed is determined by the value of one or more columns. • Local Index • is partitioned on the same logical basis as the table.
What is Partitioning? • Typically used in DSS • But can also be effective in OLTP • (From Oracle documentation)
Range Streams defined in terms of ranges Queries specify range of employees Fits well with range partitioning Ensures partition elimination. Range Partition on EMPLID Hash Psuedo-random Hash function Same input always gets same output Good for single value look up. Single pay period (calendar group ID) Hash partition on CAL_RUN_ID What sort of Partitioning
How should Range Partitioning used in GP? • Largest Result tables range each partitioned on EMPLID to match GP streaming • 1 stream : 1 partition • Thus each stream references one partition in each result table. • Only 1 interested transaction per block • Indexes ‘locally’ partitioned • Partitioning really designed for DSS systems. • Most efficient for large tables. • GP_RSLT_ACUM, GP_RSLT_ERN_DED, • GP_RSLT_PIN, GP_RSLT_PI_DATA • Effective on smaller ones too • GP_PYE_PRC_STAT, GP_PYE_SEG_STAT
How should Hash Partitioning used in GP? • Partition by CAL_RUN_ID because SQL contains • CAL_RUN_ID = … • Only worthwhile on the very largest • GP_RSLT_ACUM, GP_RSLT_ERN_DED, GP_RSLT_PIN • Adjust CAL_RUN_IDs to control partition to balance hash partition volumes.
Predicting Hash Values • Use Oracle PL/SQL function SELECT sys.dbms_utility.get_hash_value( CAL_RUN_ID,1,16) • Number of partitions should be a power of 2 • Due to mathematics of hash function • 16,32,64 not 12, 53,61, 106, 118 • Abbey use 32 • They want to hold 18 months of data, 18>16, so 32.
Calendar Group ID Suffixes • Original Calendar Group ID • AN2004/10 • Hash value 15 • But partition 15 already used and 14 is least empty • AN2004/10E • Hash value 14 • Putting data into hash partition with least data improves performance. • If only monthly payroll then you could arrange for one month per partition. That would make archiving easier later!
CAL_RUN_IDX HASHVALUEX ----------- ---------- AN2004/01 8 AN2004/02 7 AN2004/03 15 AN2004/04 6 AN2004/05 10 AN2004/06 18 AN2004/07 31 AN2004/08 3 AN2004/09 4 AN2004/10 15 AN2004/11 5 AN2004/12 30 CAL_RUN_IDX HASHVALUEX ----------- ---------- AN2004/01 8 AN2004/02 7 AN2004/03 15 AN2004/04 6 AN2004/05B 20 AN2004/06A 11 AN2004/07B 20 AN2004/08B 30 AN2004/09A 21 AN2004/10E 14 AN2004/11B 16 AN2004/12D 22 Calendar Group ID Suffixes (i)
CAL_RUN_IDX HASHVALUEX ----------- ---------- AN2004/01 8 AN2004/02 7 AN2004/03 15 AN2004/04 6 AN2004/05 10 AN2004/06 18 AN2004/07 31 AN2004/08 3 AN2004/09 4 AN2004/10 15 AN2004/11 5 AN2004/12 30 CAL_RUN_IDX HASHVALUEX ----------- ---------- AN2004/01BE 1 AN2004/02AL 2 AN2004/03AT 3 AN2004/04AJ 4 AN2004/05AF 5 AN2004/06AC 6 AN2004/07BC 7 AN2004/08AI 8 AN2004/09BJ 9 AN2004/10AW 10 AN2004/11BR 11 AN2004/12EB 12 Calendar Group ID Suffixes (ii)
Partitioning on other platforms • DB2 does range partitioning • Latest version will do multi-dimensional range partitioning • Only Oracle does range partitioning and hash sub-partitioning • multi-dimensional range partitioning could be more effective.
Global Temporary Tables • Oracle specific feature that is appearing in other DB platforms. • Definition is permanently defined in database catalogue. • Physically created on demand by database in temporary tablespace for duration of session/transaction. Then dropped. • Each session has its own copy of each referenced GT table. • Each physical instance of each GT table only contains data for one stream. • Working Storage Tables PS_GP_%_WRK converted to GT tables.
Advantages Not recoverable, therefore no Redo/Archive Logging some undo information improved performance reduce rollback No High Water Mark problems Smaller object to scan. No permanent tablespace overhead. Disadvantages Does consume temporary tablespace but only during payroll Can’t Analyze in Oracle 8i Work arounds Can in Oracle 9i Can hamper debugging New in Oracle 8.1, some bugs. GP not affected Global Temporary Tables
Cobol run on database server Either Cobol is active or database is active No more than one stream per CPU Perhaps CPUs -1 be careful not to starve database of CPU run process scheduler at lower OS priority Cobol and database on different servers Cobol active for 2/3 of execution time. Up to 1.5 streams per CPU on Cobol server Up to 3 streams per CPU on database server Hotsos Profiler How many streams should be run?
Other Streamable Processes • Application Engine • GP_PMT_PREP • OK in CH • Bug in UK extensions • GP_GL_PREP • GPGB_PSLIP • Bug fixed • Additional partitioned and GT tables required
Abbey Production Payroll Configuration • 2 nodes • Database Node • 12 CPU – shared with other services • Application Server/Process Scheduler Node • 8 CPU each • 12 Streams • 2/3 of 12 is 8, so all 8 application server node CPUs active during calculate phase • ‘nice’ the Cobol processes (by nicing the process scheduler) • 1/3 of 12 is 4, so 4 of 12 DB CPUs active • important to leave some free CPU for database else spins escalated to sleeps generating latch contention
Unilever Production Payroll Configuration • 1 node • 4 CPUs each – dedicated to GP only • 4 Streams • 1 per CPU • monthly payroll only – 10000 payees • weekly payroll not streamed
UBS Production Payroll Configuration • 2 nodes • Database Node • Application Server/Process Scheduler Node • 20 CPUs each – dedicated to HR&GP • 30 Streams • 2/3 of 30 is 20, so all 20 application server node CPUs active during calculate phase • 1/3 of 30 is 10, so 10 of 20 CPUs active
UBS QA Payroll Configuration • 2 nodes • Database Node • Application Server/Process Scheduler Node • 10 CPUs each • Still 30 Streams • Only 15 run concurrently • Full production volume payroll • < 1 hour
GP Development Goals • How to create and test efficient rules that work without adversely effecting performance • How best to identify problems particularly in the area of system setup/data versus a problem in a rule or underlying program • How to use GP payroll debugging tools
Efficient Rules • Responsible for two thirds of the execution time, and so could produce the greatest saving, it will also require the greatest effort. • Detailed functional and technical analysis of the definition of the payroll rules.
Efficient Rules • The process ideally starts during the design stage when various implementation schemes are analysed, intermediate tests are performed and the most efficient scheme is chosen. • All aspects of Global Payroll must be considered since creating rules to simplify calculation can adversely affect reporting or other online and batch areas and vice versa.