
Automated Testing of Massively Multi-Player Games Lessons Learned from The Sims Online


Presentation Transcript


  1. Automated Testing of Massively Multi-Player Games: Lessons Learned from The Sims Online. Larry Mellon, Spring 2003

  2. Context: What Is Automated Testing?

  3. Classes Of Testing: Random Input, System Stress, Feature Regression, Load; run by both QA and developers

  4. Automation Components: Startup & Control; Repeatable, Sync’ed Test Inputs; Collection & Analysis; wrapped around multiple instances of the System Under Test

  5. What Was Not Automated? Startup & Control; Repeatable, Synchronized Inputs; Results Analysis; Visual Effects

  6. Lessons Learned: Automated Testing. Talk outline (60 minutes): 1/3 Design & Initial Implementation (architecture, scripting tests, test client, initial results); 1/3 Fielding: Analysis & Adaptations; 1/3 Wrap-up & Questions (what worked best, what didn’t; Tabula Rasa: MMP / SPG)

  7. Design Constraints: Load, Regression, Churn Rate. Approach: strong abstraction, with automation providing repeatable, synchronized input and data management.

  8. Single, Data-Driven Test Client: one Test Client with a Single API runs both Load and Regression testing from reusable scripts & data.

  9. Data-Driven Test Client: the same Single API serves regression (“testing feature correctness”: key game states, pass/fail) and load (“testing system performance”: responsiveness), with configurable logs & metrics, all driven by reusable scripts & data.

  10. Problem: Testing Accuracy
  • Load & regression: inputs must be accurate and repeatable
  • Churn rate: logic/data in constant motion
  • How to keep the test client accurate?
  • Solution: the game client becomes the test client
  • Exact mimicry
  • Lower maintenance costs

  11. Test Client == Game Client: Test Control and the Game GUI both issue state commands through the Presentation Layer to the client-side game logic.

  12. Game Client: How Much To Keep? (Game Client = View + Presentation Layer + Logic)

  13. What Level To Test At? Mouse clicks (at the View): too brittle for regression (pixel shift), too bulky for load.

  14. What Level To Test At? Internal events (below the View): still too brittle for regression (churn rate vs logic & data).

  15. Gameplay: Semantic Abstractions (Buy Lot, Enter Lot, Buy Object, Use Object, …). Basic gameplay changes less frequently than UI or protocol implementations. NullView client: drops the View (~¾ of the client), keeps the Presentation Layer and Logic (~¼).
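The semantic-abstraction idea above can be sketched in code. This is a hypothetical illustration, not Maxis code: a thin adapter exposes stable gameplay verbs (enter_lot, buy_object), so test scripts survive UI and protocol churn because only the adapter tracks the wire format.

```python
# Hypothetical sketch: stable gameplay primitives insulating test scripts
# from UI/protocol churn. All names here are illustrative.

class GameConnection:
    """Stand-in for the real client/server plumbing."""
    def send(self, message):
        return {"ok": True, "echo": message}

class GameplayPrimitives:
    """Stable semantic verbs; only this layer changes with the protocol."""
    def __init__(self, conn):
        self.conn = conn
        self.log = []

    def enter_lot(self, lot):
        self.log.append(("enter_lot", lot))
        return self.conn.send({"cmd": "enter", "lot": lot})["ok"]

    def buy_object(self, kind, x, y):
        self.log.append(("buy_object", kind))
        return self.conn.send({"cmd": "buy", "what": kind, "at": (x, y)})["ok"]

api = GameplayPrimitives(GameConnection())
assert api.enter_lot("alpha_chimp")
assert api.buy_object("chair", 10, 10)
print([name for name, _ in api.log])
```

A script written against `enter_lot` / `buy_object` keeps working when the protocol underneath changes; only `GameplayPrimitives` needs updating.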

  16. Scriptable User Play Sessions
  • SimScript
  • Collection: Presentation Layer “primitives”
  • Synchronization: wait_until, remote_command
  • State probes: arbitrary game state (avatar’s body skill, lamp on/off, …)
  • Test scripts: specific, ordered inputs
  • Single-user play session
  • Multiple-user play session

  17. Scriptable User Play Sessions
  • Scriptable play sessions: big win
  • Load: tunable based on actual play
  • Regression: constantly repeat hundreds of play sessions, validating correctness
  • Gameplay semantics: very stable
  • UI / protocols shifted constantly
  • Game play remained (about) the same

  18. SimScript: Abstract User Actions
  include_script setup_for_test.txt
  enter_lot $alpha_chimp
  wait_until game_state inlot
  chat I’m an Alpha Chimp, in a Lot.
  log_message Testing object purchase.
  log_objects
  buy_object chair 10 10
  log_objects

  19. SimScript: Control & Sync
  # Have a remote client use the chair
  remote_cmd $monkey_bot use_object chair sit
  set_data avatar reading_skill 80
  set_data book unlock
  use_object book read
  wait_until avatar reading_skill 100
  set_recording on
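To make the command semantics above concrete, here is a toy interpreter for two SimScript-like commands, assuming a simple dict-backed game state. The real SimScript is far richer (remote commands, recording, blocking waits); this sketch only shows the command-to-state mapping.

```python
# Toy interpreter for SimScript-style commands (illustrative only).
# Assumes state is a dict keyed by (object, field).

def run_script(lines, state):
    for line in lines:
        cmd, *args = line.split()
        if cmd == "set_data":
            obj, field, value = args
            state[(obj, field)] = int(value)
        elif cmd == "wait_until":
            obj, field, value = args
            # A real client would block/poll here; this sketch checks once.
            if state.get((obj, field)) != int(value):
                return False
    return True

state = {}
ok = run_script(["set_data avatar reading_skill 100",
                 "wait_until avatar reading_skill 100"], state)
print(ok)
```

The key design point carried over from the talk: commands name game state abstractly (object, field, value), so scripts stay valid as the implementation underneath changes.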

  20. Client Implementation

  21. Composable Client: multiple event generators (scripts, cheat console, GUI) feed the Presentation Layer and Game Logic.

  22. Composable Client: any/all components may be loaded per instance: event generators (scripts, console, GUI) and viewing systems (console, lurker, GUI) around the Presentation Layer and Game Logic.
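The per-instance composition described above can be sketched as follows. This is an illustrative pattern, not the Maxis implementation: the same client shell loads whichever event generators and viewers a given run needs, so a headless load client and a full developer client differ only in configuration.

```python
# Sketch of a composable client: components chosen per instance.
# Component names are hypothetical stand-ins.

class Client:
    def __init__(self, *components):
        self.components = [c() for c in components]

    def names(self):
        return [type(c).__name__ for c in self.components]

class ScriptDriver: pass   # event generator: SimScript playback
class CheatConsole: pass   # event generator: developer console
class NullView: pass       # viewing system: no rendering (load tests)
class FullGui: pass        # viewing system: normal game view

load_client = Client(ScriptDriver, NullView)   # headless load-test instance
dev_client = Client(CheatConsole, FullGui)     # developer instance
print(load_client.names(), dev_client.names())
```

Keeping composition at construction time means the game logic never knows which front end is attached, which is what makes the test client and game client interchangeable.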

  23. Lesson: View & Logic Entangled (in the original game client)

  24. Few Clean Separation Points (between View, Presentation Layer, and Logic in the game client)

  25. Solution: Refactored for Isolation (Game Client cleanly split into View, Presentation Layer, and Logic)

  26. Lesson: NullView Debugging. Without the (legacy) view system attached, tracing was “difficult”.

  27. Solution: Embedded Diagnostics: timeout handlers and diagnostic hooks embedded in the Presentation Layer and Logic.

  28. Talk Outline: Automated Testing (60 minutes): 1/3 Design & Initial Implementation (architecture & design, test client, initial results); 1/3 Lessons Learned: Fielding; 1/3 Wrap-up & Questions

  29. Mean Time Between Failure
  • Random event, log & execute
  • Record client lifetime / RAM
  • Worked: just not relevant in early stages of development
  • Most failures / leaks found were not high-priority at that time, when weighed against server crashes

  30. Monkey Tests
  • Constant repetition of simple, isolated actions against servers
  • Very useful: direct observation of servers while under constant, simple input
  • Server processes “aged” all day
  • Examples: Login / Logout; Enter House / Leave House

  31. QA Test Suite Regression
  • High false-positive rate & high maintenance
  • New bugs / old bugs
  • Shifting game design
  • “Unknown” failures
  Not helping in day-to-day work.

  32. Talk Outline: Automated Testing (60 minutes): ¼ Design & Initial Implementation; ½ Fielding: Analysis & Adaptations (non-determinism, maintenance overhead, solutions & results: monkey / sniff / load / harness); ¼ Wrap-up & Questions

  33. Analysis: Testing Isolated Features

  34. Analysis: Critical Path. Test case: can an avatar sit in a chair? Requires login(), create_avatar(), buy_house(), enter_house(), buy_object(), then use_object(). Failures on the critical path block access to much of the game.

  35. Solution: Monkey Tests
  • Primitives placed in monkey tests
  • Isolate as much as possible, repeat 400x
  • Report only aggregate results: Create Avatar: 93% pass (375 of 400)
  • “Poor man’s” unit test
  • Feature-based, not class-based
  • Limited isolation
  • Easy failure analysis / reporting
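The aggregate-reporting idea can be sketched briefly. This is an illustration with made-up failure behaviour, not the actual harness: run one isolated primitive many times and emit a single pass-rate line, in the spirit of the “Create Avatar: 93% pass” report above.

```python
# Sketch: monkey test that reports only an aggregate pass rate.
# create_avatar here is a hypothetical flaky primitive for illustration.

def monkey_test(action, runs):
    passes = sum(1 for i in range(runs) if action(i))
    return f"{100 * passes // runs}% pass ({passes} of {runs})"

def create_avatar(i):
    # Pretend the server drops every 25th attempt.
    return i % 25 != 0

print("Create Avatar:", monkey_test(create_avatar, 400))
# → Create Avatar: 96% pass (384 of 400)
```

One line per primitive per run keeps failure reporting cheap to read, which is the point of the “poor man’s unit test”.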

  36. Talk Outline: Automated Testing (60 minutes): 1/3 Design & Initial Implementation; 1/3 Lessons Learned: Fielding (non-determinism, maintenance costs, solution approaches: monkey / sniff / load / harness); 1/3 Wrap-up & Questions

  37. Analysis: Maintenance Cost
  • High defect rate in game code
  • Code coupling: “side effects”
  • Churn rate: frequent changes
  • Critical path: fatal dependencies
  • High debugging cost
  • Non-deterministic, distributed logic

  38. Turnaround Time. Tests were too far removed from the introduction of defects.

  39. Critical Path Defects Were Very Costly

  40. Solution: Sniff Test. Pre-checkin regression: don’t let broken code into Mainline.

  41. Solution: Hourly Diagnostics
  • SniffTest Stability Checker
  • Emulates a developer: every hour, sync / build / test
  • Critical Path monkeys ran non-stop: a constant “baseline”
  • Traffic generation: keep the pipes full & servers aging; keep the DB growing
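The hourly sync / build / test cycle can be sketched as a small loop. The commands below are placeholders, not the Maxis tooling; the point is the shape: run each step in order, and treat any failure as a broken mainline.

```python
# Sketch of an hourly sniff-test driver. The step commands are
# illustrative placeholders for real sync/build/test invocations.
import subprocess

STEPS = [["echo", "sync"], ["echo", "build"], ["echo", "run-critical-path"]]

def sniff_once():
    for step in STEPS:
        result = subprocess.run(step, capture_output=True, text=True)
        if result.returncode != 0:
            return False  # broken mainline: report it, block checkins
    return True

print(sniff_once())
# A real checker would loop: while True: sniff_once(); time.sleep(3600)
```

Running this from a clean workspace every hour gives the constant baseline the slide describes: when it goes red, the breaking change landed within the last hour.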

  42. Analysis: CONSTANT SHOUTING IS REALLY IRRITATING
  • Bugs spawned many, many emails
  • Solution: report managers
  • Aggregate / correlate across tests
  • Filter known defects
  • Translate common failure reports to their root causes
  • Solution: data managers
  • Information overload: automated workflow tools mandatory
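A minimal sketch of the report-manager idea, with invented failure strings: collapse duplicate failures across runs and drop defects that are already tracked, so one root cause produces one summary line instead of a flood of emails.

```python
# Sketch of a report manager: de-duplicate and filter known defects.
# Failure strings and the known-defect list are made up for illustration.

KNOWN_DEFECTS = {"db timeout on enter_house"}

def summarize(failures):
    counts = {}
    for f in failures:
        if f in KNOWN_DEFECTS:
            continue  # already tracked; don't shout again
        counts[f] = counts.get(f, 0) + 1
    return counts

failures = ["crash in buy_object", "crash in buy_object",
            "db timeout on enter_house", "login refused"]
print(summarize(failures))
# → {'crash in buy_object': 2, 'login refused': 1}
```

A real report manager would also map distinct symptoms to a shared root cause before counting; the filtering and aggregation shown here are the core of cutting the noise.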

  43. ToolKit Usability
  • Workflow automation
  • Information management
  • Developer / tester “push button” ease of use
  • XP flavour: increasingly easy to run tests
  • Must be easier to run than to avoid running
  • Must solve problems “on the ground now”

  44. Sample Testing Harness Views

  45. Load Testing: Goals
  • Expose issues that only occur at scale
  • Establish hardware requirements
  • Establish that response is playable at scale
  • Emulate user behaviour
  • Use server-side metrics to tune test scripts against observed Beta behaviour
  • Run full-scale load tests daily
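Tuning load scripts against observed behaviour can be sketched as weighted sampling. The action mix below is invented for illustration; in practice the weights would come from server-side metrics gathered during Beta. A fixed seed keeps the generated load repeatable across runs.

```python
# Sketch: generate a synthetic play session whose action mix matches an
# observed distribution. Weights here are made-up example values.
import random

OBSERVED_MIX = {"chat": 0.5, "use_object": 0.3, "buy_object": 0.2}

def pick_actions(n, rng):
    actions, weights = zip(*OBSERVED_MIX.items())
    return rng.choices(actions, weights=weights, k=n)

rng = random.Random(42)  # fixed seed: repeatable load runs
session = pick_actions(1000, rng)
print(session.count("chat") > session.count("buy_object"))
```

Because each session is driven by the same seeded distribution, a daily full-scale run exercises the servers with realistic, comparable traffic rather than uniform noise.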

  46. Load Testing: Data Flow. A Load Control Rig drives many Test Clients (several per Test Driver CPU), which send game traffic to the Server Cluster; client metrics, server-side metrics, and internal system probes/monitors feed resource and debugging data back to the load testing team.

  47. Load Testing: Lessons Learned
  • Very successful
  • “Scale & Break”: up to 4,000 clients
  • Some conflicting requirements with regression: continue-on-fail, transaction tracking
  • NullView client a little “chunky”

  48. Current Work
  • QA test suite automation
  • Workflow tools
  • Integrating testing into the new-feature design/development process
  Planned work:
  • Extend Esper Toolkit for general use
  • Port to other Maxis projects

  49. Talk Outline: Automated Testing (60 minutes): 1/3 Design & Initial Implementation; 1/3 Lessons Learned: Fielding; 1/3 Wrap-up & Questions (biggest wins / losses, reuse, Tabula Rasa: MMP & SSP)

  50. Biggest Wins
  • Presentation Layer abstraction
  • NullView client
  • Scripted play sessions: powerful for regression & load
  • Pre-checkin Snifftest
  • Load testing
  • Continual usability enhancements
  • Team: upper-management commitment; focused group, senior developers
