
VoIP Testing—A How-to Session for Performance and Functional Test Methodologies






Presentation Transcript


  1. VoIP Testing—A How-to Session for Performance and Functional Test Methodologies Chris Bajorek, Director, CT Labs

  2. Before We Start • Every year we consult with many companies, helping them to perform many different types of VoIP-oriented tests • This provides a unique industry perspective on the market readiness of a wide range of VoIP products • I’m pleased to have this opportunity to share our test experiences with you today

  3. VoIP Products by Market Area • Residential (Voice over Broadband) • Analog terminal adapters, VoIP softphones, residential routers • Enterprise • IP PBXs, IP Contact Centers, VoIP phones & softphones, firewalls/ALGs, intrusion prevention devices, media servers (conferencing, voice mail, IVR) • Next-Gen Network Carriers and Service Providers • Session border controllers, softswitches, media servers, proxies, media gateways, VQ enhancement processors

  4. Building VoIP Networks: IMS is here and it needs testing • Key elements of IMS: • Enables innovative new applications • High levels of network complexity • Modules from multiple vendors must peacefully coexist • High rate of carrier adoption • Global deployments • Standards based • Exploits strengths of IP+SIP

  5. IMS Basics: Functions by Layer • Services / Application Layer • Application servers, Media servers • Control / Switching Layer • HSS (Home Subscriber Server) • CSCF (Call Session Control Function) • BGCF (Breakout Gateway Control Function) • MGCF (Media Gateway Control Function) • MRFC (Media Resource Function Control) • etc. • Transport / Access Layer • IP/MPLS, PSTN/PLMN, Cellular, SONET/SDH, ATM, Satellite

  6. Risks of Inadequate Testing • From the CT Labs VoIP project files: • VoIP terminal adapters that act unreliably and “emulate” an occasional bad Internet connection • IP PBXs that drop calls when subjected to only certain types of call loads • VoIP soft clients that distort the caller audio • High-end enterprise firewalls that grind to a standstill under certain denial-of-service attacks • Session border controllers that degrade voice quality at traffic levels below rated maximums

  7. Test Automation: Reaping the Benefits of Shorter Test Cycles • Why you should consider using it sooner, not later

  8. Test Automation—The Benefits • Tightly controlled test environment • All aspects of the test setup can be controlled and coordinated by testing scripts • Repeatable results • Key to resolving issues that arise during testing • Includes the ability to exactly reproduce product settings and test conditions • Faster test execution • Weeks of manual testing can literally be executed in hours • Increased accuracy of results reporting • All of the above results in: • Lower testing costs over the product’s lifetime • Greater product and delivered-service reliability • Fewer field failures, fewer customer-reported issues

  9. Challenges Using Live Callers in Tests • The exact timing and sequence of caller actions is not synchronized or repeatable • Ability to distinguish and describe nuances of results varies widely from person to person • i.e. reliability of reported results can be low • Ability to correlate assessment of voice quality and anomalies across multiple listeners is typically poor • Unless you just happen to know how to run ITU-T P.800 MOS tests • Call arrival profiles difficult to control when using large numbers of callers for “load tests” • In other words, don’t expect more than coarse results

  10. Conference test, via automation • Go through a defined sequence of talker/listener subtests while under real call loads, while monitoring actual voice quality for each listener [Diagram: several simultaneous conferences of varying sizes, each with its own talkers and listeners]

  11. Automation-based VoIP Testing Goals • Verify call-handling performance • Verify voice quality • With a wide variety of caller and noise environments • Verify performance under real-world traffic and network impairment conditions • Verify performance under malicious attack conditions • Verify service reliability • i.e. Availability of service under extended test run durations • Verify interoperability and feature interaction • Verify quality of access to enhanced services • Applications such as voice mail, conferencing, IVR, etc.

  12. Real-world automation testing: The 3-phase approach • Phase 1: Test with minimal stress in a sterile environment • i.e. no WAN impairments or network traffic, light call loads • This establishes an important performance baseline • Phase 2: Test with realistic network traffic and call load conditions • Phase 3: Test to rated device call loads

  13. “Rules of Thumb” that simply do not work • “I tested it with 50 calls and the CPU only went to 25%, so we know the device can scale to 200 calls” • Not quite. Our experience shows that most VoIP devices exhibit performance-thresholding effects that are non-linear and very hard to predict. In other words, after a certain load or capacity limit is reached, the device can fail catastrophically. • If you don’t test to full rated capacity, you are playing Russian roulette with your customers.

  14. “Rules of Thumb” that simply do not work • “We don’t need to test voice quality because we are OEM’ing the software that does that part.” • Dangerous assumption. OEM software typically has many interface points and configuration options and is hardly in and of itself a guarantee of performance. The “glue” code around these objects can still cause voice quality issues.

  15. Emulation of Network Impairments • Perfectly clean networks are not the real world • Real networks corrupt the flow of packets in the following time-varying ways: • Packet loss (especially burst loss), packet duplication, and out-of-order packets • Latency and jitter • Restricted bandwidth • If you test while inducing these conditions, your product or service will be the cause of far fewer post-deployment issues • You can perform both static and dynamic emulation of impairment conditions • Both have value depending on nature of the VoIP device • e.g. IP phone that renegotiates codec type or codec mode when network degrades in mid-call
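As an illustration, here is a minimal Python sketch of static impairment emulation, assuming a Linux machine with iproute2 is routing the test traffic; dedicated network emulators expose the same controls through test-script APIs. The interface name and parameter values are placeholders.

import subprocess

IFACE = "eth0"  # hypothetical interface carrying the test traffic

def set_impairments(delay_ms=80, jitter_ms=20, loss_pct=1.0):
    """Apply delay, jitter, and random loss to packets leaving IFACE."""
    subprocess.run(
        ["tc", "qdisc", "replace", "dev", IFACE, "root", "netem",
         "delay", f"{delay_ms}ms", f"{jitter_ms}ms", "loss", f"{loss_pct}%"],
        check=True)

def clear_impairments():
    """Remove the netem qdisc, restoring a clean network."""
    subprocess.run(["tc", "qdisc", "del", "dev", IFACE, "root"], check=True)

set_impairments()      # degrade the link, then drive test calls through it
# ...run test calls; calling set_impairments() again with new values
# mid-call gives a crude form of the dynamic emulation shown next...
clear_impairments()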

  16. Dynamic Emulation of Network Impairments [Diagram: a TDM call generator and a VoIP call generator drive calls through the firewall/SBC DUT via a network emulator; the network emulator can be controlled via test script for WAN condition changes]

  17. Adding “Internet Mix” Network Traffic • The goal: see the DUT’s impact on VoIP calls when subjected to network traffic at rated capacity • Product examples: • Firewalls, intrusion prevention devices, IP phones with integrated switch ports, session border controllers, etc. • What we do: generate real session-based “Internet Mix” traffic and measure throughput performance of VoIP calls and IMIX traffic • e.g. HTTP, FTP, P2P, SMTP, POP3, etc. • Open source tool: “D-ITG” http://www.grid.unina.it/software/ITG/ (see the sketch below) • Notable vendor: Shenick (www.shenick.com)
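As a hedged sketch of scripting D-ITG for background IMIX traffic: the flags below follow the D-ITG manual (-a destination, -T transport, -C packets/sec, -c payload bytes, -t duration in ms), but verify them against your installed version. The addresses and flow mix are placeholders, and the destination host must be running ITGRecv.

import subprocess

DEST = "10.0.0.2"  # hypothetical host running ITGRecv

# Rough IMIX blend: many small packets, some mid-size, some near-MTU
flows = [
    ("TCP", 500, 64),
    ("TCP", 200, 576),
    ("UDP", 100, 1400),
]

procs = [
    subprocess.Popen(["ITGSend", "-a", DEST, "-T", proto,
                      "-C", str(pps), "-c", str(size), "-t", "60000"])
    for proto, pps, size in flows  # each flow runs for 60 s
]
for p in procs:
    p.wait()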

  18. Voice and Video Quality Assessment: Automated Testing Techniques

  19. Voice Quality Test Techniques • Automated VQ measurement techniques are designed to estimate the way humans perceive voice quality • MOS live-listener tests are done per ITU-T P.800 • Active versus passive VQ monitoring • Passive E-model via packet inspection • Active end-to-end VQ measurement “to the audio wires” • Both techniques have their benefits [Diagram: active end-to-end PESQ VQ versus passive E-model VQ]

  20. Active vs Passive VQ Testing • Active voice quality testing • Involves evaluation of “received” audio signals as compared to known references • i.e. you drive real 2-way calls through the VoIP network • PESQ P.862 (2001) • High correlation with standard MOS-LQ subjective tests • Benefits: More accurate, uses mature standards (PESQ) for automated quality assessment • Negatives: Consumes VoIP network resources

  21. Active vs Passive VQ Testing • Passive voice quality testing • Involves passive evaluation of call-based packet flows • ITU-T G.107 E-Model • Can return estimated MOS-LQ and MOS-CQ scores (Listening versus Conversational) • Benefits: Can be embedded into products and test equipment with relatively low resource footprint • Negatives: Ignores (or models) VoIP endpoint-specific behaviors to network conditions. Vendor implementations can vary.
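To make the E-model concrete, here is a simplified, illustrative Python sketch of its final step: folding one-way delay and packet loss into an R-factor, then mapping R to an estimated MOS per G.107. Real implementations take many more inputs; the delay term is the common simplified approximation, and the default Ie/Bpl values shown suit G.711 with packet loss concealment.

def r_factor(delay_ms, loss_pct, ie=0.0, bpl=25.1, burst_r=1.0):
    """Simplified R-factor; ie/bpl are codec-specific (0 / 25.1 suit G.711 with PLC)."""
    r0 = 93.2                        # default "best case" R value
    # Simplified delay impairment Id (common approximation to G.107)
    id_ = 0.024 * delay_ms
    if delay_ms > 177.3:
        id_ += 0.11 * (delay_ms - 177.3)
    # Effective equipment impairment Ie-eff under (bursty) packet loss
    ie_eff = ie + (95.0 - ie) * loss_pct / (loss_pct / burst_r + bpl)
    return r0 - id_ - ie_eff

def r_to_mos(r):
    """G.107 Annex B mapping from R-factor to estimated MOS."""
    if r < 0:
        return 1.0
    if r > 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6

print(r_to_mos(r_factor(delay_ms=150, loss_pct=1.0)))  # ~4.2 for a good link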

  22. How PESQ works • Computes a voice quality score by comparing degraded received audio with a “reference” speech prompt • Reference prompts are actual speech clips played during an active test call • Quality scores relate only to the time during the test call when the reference prompts were played with far-end audio being captured • The calculation does not just compare the reference and degraded waveforms; it uses a human perceptual model to ultimately compute a quality score (1 = bad to 4.5 = excellent) [Diagram: an 8-second reference prompt played within a 60-second test call]
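For a feel of how this scoring works in practice, a minimal sketch using the open-source `pesq` Python wrapper around the ITU-T P.862 reference code follows; the file names are hypothetical, and commercial call generators integrate this step directly.

from scipy.io import wavfile
from pesq import pesq  # pip install pesq (wraps the ITU-T P.862 reference code)

# Score degraded received audio against the reference prompt, assuming
# both were captured as 8 kHz mono WAV files during the test call.
fs, ref = wavfile.read("reference_prompt.wav")   # hypothetical file names
_, deg = wavfile.read("received_audio.wav")

score = pesq(fs, ref, deg, "nb")   # "nb" = narrowband mode for 8 kHz audio
print(f"PESQ score (MOS-LQO): {score:.2f}")  # roughly 1 (bad) to 4.5 (excellent)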

  23. What PESQ VQ Testing is designed for • PESQ is a way to quickly and cost-effectively estimate the effects of one-way speech distortion and noise on speech quality • PESQ is “endpoint-agnostic” – can be used for VoIP-to-VoIP, VoIP-to-PSTN calls, etc. • Strengths • Provides excellent estimate of voice quality • Tests can be performed quickly • Tests are very repeatable

  24. Passive versus Active VQ: A Real Example • From an actual CT Labs project • In this example, the phone had quality issues that the passive test did not see • Being aware of the difference in scoring techniques is critical when debugging reported VQ issues [Chart: E-model scores versus PESQ scores for the same calls]

  25. Video Quality Test Techniques • Automated video quality measurement techniques estimate the way humans perceive picture quality • Live viewer tests are done per ITU-R BT.500 • Three classes of objective video quality algorithms • Full reference, partial reference, and zero reference • Full-reference techniques • PSNR (most used), VIM, SSIM. See ITU-T J.144. • Compute-intensive, not useful for real-time measurements • Software suite available at: http://www.compression.ru • Zero-reference techniques • Best suited for in-service monitoring • Standards activity continues • Encompasses quality tests for picture, audio, multimedia, and the network’s ability to carry streams.
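Since PSNR is the most-used full-reference metric named above, a minimal sketch of the computation follows; capturing and time-aligning the reference and degraded frames (the hard part in practice) is assumed to have already happened.

import numpy as np

def psnr(reference: np.ndarray, degraded: np.ndarray) -> float:
    """Peak signal-to-noise ratio between two aligned 8-bit frames."""
    mse = np.mean((reference.astype(np.float64)
                   - degraded.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")        # identical frames
    return 10.0 * np.log10(255.0 ** 2 / mse)

# Example: a frame corrupted with mild additive noise scores roughly 28 dB
ref = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
deg = np.clip(ref + np.random.normal(0, 10, ref.shape), 0, 255).astype(np.uint8)
print(f"PSNR: {psnr(ref, deg):.1f} dB")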

  26. Load and Stress Testing

  27. Load and Stress Testing • What it is • Verifying the DUT’s performance at rated call and traffic loads • Verify those “theoretical” specs on the data sheet • How many simultaneous sessions? It’s all relative • A full load & stress test on a 2-line VoIP terminal adapter will require 2 simultaneous calls • A full load & stress test on a carrier-grade session border controller may require upwards of 150,000 simultaneous SIP calls with media (or more) • The key is this: if you want to be assured of acceptable performance at your spec-sheet limits, you cannot linearly scale a partial load test’s results

  28. Load / Stress Testing: Helpful Hints • Use call rates and call ramp profiles that emulate the actual call environment, e.g. burst, ramp, etc. • Monitor and log DUT platform resources during the test • CPU, memory, disk I/O, and network I/O can all provide clues as to why a test failed • Capture a periodic snapshot into a logfile for post-test-run analysis: Windows Perfmon, Linux (various utilities; see the sketch below) • Synchronize system clocks on DUT and test equipment devices before a test run • Allows failure events to be correlated from logs
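A minimal sketch of that periodic snapshot, using the cross-platform psutil library; the interval and output file name are arbitrary choices, and on the DUT itself you may prefer native tools.

import csv, time
import psutil  # cross-platform CPU/memory/disk/network counters

INTERVAL_S = 5
with open("dut_resources.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["epoch", "cpu_pct", "mem_pct",
                "disk_read", "disk_write", "net_sent", "net_recv"])
    while True:  # run for the duration of the test, then Ctrl-C
        disk = psutil.disk_io_counters()
        net = psutil.net_io_counters()
        w.writerow([int(time.time()),
                    psutil.cpu_percent(interval=None),
                    psutil.virtual_memory().percent,
                    disk.read_bytes, disk.write_bytes,
                    net.bytes_sent, net.bytes_recv])
        f.flush()           # keep the log current in case the DUT crashes
        time.sleep(INTERVAL_S)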

  29. Load / Stress Testing: Pitfalls • The temptation is to do high volumes of “simple” calls • Problem with this: it will not exercise internal resources in a real-world way • Example: conference bridge load test • The wrong way: calls with a simple one-dimensional “can you hear me?” test • The right way: multiple conferences of varying sizes with real talker-listener exchanges • Not running tests long enough • Not testing during DUT housekeeping periods • Leaving verbose DUT logging enabled can consume significant resources

  30. Functional Testing

  31. Functional Testing • What it is • Verify that the DUT can execute all features and functions correctly (positive stimulus/response testing) • Verify that the DUT responds properly to negative stimuli • Very often ignored, to the detriment of product stability in the field • How many simultaneous sessions to test? • Depends on the device: one or a few, as required to verify all features • Quick examples of functional tests • Application servers: conferencing • Verify all host and listener TUI commands and DUT responses • VoIP endpoint devices: terminal adapters (TAs) • Verify all call features against softswitch/feature server environments • Question: does verifying voice quality belong in a functional test?

  32. Functional Testing: A Few Hints • Test script synchronization with the DUT is key • DTMF or MF handshaking • Typically involves “tagging” voice prompts with numeric sequences (see the sketch below) • Speech recognition • Delays • Automation-based functional tests allow: • Much faster test cycles • TA functional test plan comparison: • 150 test cases verified against 4 different softswitch platforms • Good idea: a functional test suite can be turned into a performance test suite • If the tests are designed on a flexible call generator platform • Can mix call traffic from functional and load generator platforms
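A sketch of the prompt-tagging idea: prepend a DTMF digit sequence to a voice prompt so the test script can later detect which prompt the DUT actually played. File names, levels, and durations are illustrative placeholders; the frequency pairs are the standard DTMF row/column tones.

import numpy as np
from scipy.io import wavfile

FS = 8000  # assumes 8 kHz, 16-bit mono prompts
DTMF = {"0": (941, 1336), "1": (697, 1209), "2": (697, 1336),
        "3": (697, 1477), "4": (770, 1209), "5": (770, 1336),
        "6": (770, 1477), "7": (852, 1209), "8": (852, 1336),
        "9": (852, 1477)}  # standard row/column frequency pairs

def dtmf_tone(digit, dur=0.1):
    """Synthesize one DTMF digit as a dual-tone burst."""
    t = np.arange(int(FS * dur)) / FS
    lo, hi = DTMF[digit]
    return 0.4 * (np.sin(2 * np.pi * lo * t) + np.sin(2 * np.pi * hi * t))

def tag_prompt(digits, prompt):
    """Prepend tag tones (with short gaps) to the prompt audio."""
    gap = np.zeros(int(FS * 0.05))
    tag = np.concatenate([np.concatenate([dtmf_tone(d), gap]) for d in digits])
    return np.concatenate([tag, prompt])

_, prompt = wavfile.read("greeting.wav")          # hypothetical prompt file
tagged = tag_prompt("42", prompt.astype(np.float64) / 32768.0)
wavfile.write("greeting_tagged.wav", FS, (tagged * 32767).astype(np.int16))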

  33. Test Automation Setups

  34. Session Border Controller/Firewall Automation Setup Goals • Verify call-handling performance and advertised specifications at real-world high-density VoIP loads • Verify voice quality under different codec, frame-packing, and other configuration settings • Verify call-handling performance when subjected to different call rate profiles, e.g. burst, ramp, etc. • Verify thru-SBC registration performance under burst registration conditions • Verify ability to survive and handle legitimate VoIP call loads while under various types of DoS “attacks” • Verify long-term call-handling reliability

  35. SBC / Firewall Automation Test Setup [Diagram: VoIP call generators (each with optional voice quality measurement) on the protected and unprotected networks, plus a registration generator, a SIP attack generator, and a SIP proxy, drive traffic through the firewall/SBC DUT via a network emulator]

  36. VoIP PSTN/Analog Terminal Adapter (TA) Real-World Network Model [Diagram]

  37. Automated Feature Test Suite Goals • Automate as much of the Terminal Adapter interoperability feature regression test as possible • i.e. Verify call features of TA devices against core VoIP service architectures • Support input configuration files, event and error log files • Support multiple TA devices and PSTN access lines in setup

  38. VoIP PSTN/Analog Automation Feature Test Solution [Diagram: a feature test framework coordinates a TDM call generator and a VoIP call generator; TA/voice gateways connect through a router/edge device over the LAN/WAN to a softswitch + media server, etc., with off-net PSTN caller access via a PSTN/PBX gateway and on-net subscriber access]

  39. Automation Feature Test Framework Details • Supports 140+ feature tests • Including 2-way calls, 3-way calls, features including hold/park/transfer, 911/411, voice mail, + voice quality checking • Test run results captured in easily analyzed logs • Custom reports are generated • Individual test case scripts easily changed

  40. Setting up for IMS Tests • Emulation of IMS devices in a QA lab setting will be critical… unless you plan to purchase, support, and maintain a wide variety of third-party IMS devices in your lab, a costly and time-consuming proposition.

  41. Setting up for IMS Tests [Diagram: the IMS device under test on an IP network, surrounded by IMS/NGN device emulation, mixed (Vo)IP and TDM endpoint emulation (VoIP clients and TDM users), a WAN emulator, an SS7/TDM network, and a network analyzer]

  42. VoIP Security Testing—Issues to Consider

  43. VoIP Vulnerabilities/Threats • The bad news: • VoIP systems are vulnerable • Platforms are vulnerable • VoIP-specific attacks are becoming more common • The good news: • The threat is still developing • VoIP handsets are still in the minority “out there” • The vast majority of VoIP is company-internal • VoIP networks share the same vulnerabilities that plague data networks, PLUS some specific additional threats (Courtesy: Mark Collier, CTO, SecureLogix)

  44. VoIP Product Vulnerabilities • A major area of focus at CT Labs • Voice applications: toll fraud, SPIT • VoIP protocols: protocol attacks such as SIP floods and RTP floods • Services (database, web server): Slammer worm, SQL attacks • Network stack(s) (IP, UDP, TCP, RTP, …): SYN floods, etc. (many…) • Operating systems: OS attacks (Windows worms, viruses) • Physical infrastructure (power, wiring): physical hacking • These layers apply across telephony devices, network devices, and servers

  45. DoS Attack Testing • Generate SIP-specific attacks (send “fuzzed” and other types of SIP protocol packet floods) while also sending legitimate SIP calls • Measure call performance (dropped, blocked, delayed calls) and voice quality with security measures in place • Test calls are sent with media (real speech) to verify true voice quality via PESQ while under attack [Diagram: VoIP call generators A and B exchange RTP (reference and received prompts) through routers/switches and the device(s) under test while SIP/RTP attacks are injected]

  46. SIP-Specific Attacks to Launch • i.e. in addition to well-known lower-layer DoS attacks • Blast packets from these scenarios at up to line rates: • Malformed and torture-test floods • Using SIP packets from the open source PROTOS test suite • INVITE, REGISTER, and response floods (see the sketch below) • Spoofed variations of the above • i.e. spoofing the IP address and port of legitimate devices, or spoofing the Via or AoR of legitimate users • RTP attacks • Rogue / random RTP fraud and floods
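As a hedged sketch of the simplest scenario above, an INVITE flood with varying Call-IDs can be generated in a few lines of Python. The addresses are TEST-NET placeholders; this belongs only on an isolated lab network, aimed at equipment you own, and is a crude baseline compared to fuzzers such as the PROTOS suite.

import socket, time

TARGET = ("192.0.2.10", 5060)   # hypothetical DUT SIP address (TEST-NET)
INVITE = (
    "INVITE sip:user@192.0.2.10 SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 192.0.2.99:5060;branch=z9hG4bK{n}\r\n"
    "From: <sip:test@192.0.2.99>;tag={n}\r\n"
    "To: <sip:user@192.0.2.10>\r\n"
    "Call-ID: flood-{n}@192.0.2.99\r\n"
    "CSeq: 1 INVITE\r\nMax-Forwards: 70\r\nContent-Length: 0\r\n\r\n"
)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
RATE = 1000                          # nominal packets per second for this step
for n in range(RATE * 60 * 10):      # run the scenario for ~10 minutes
    # Unique branch/tag/Call-ID makes each packet look like a new dialog
    sock.sendto(INVITE.format(n=n).encode(), TARGET)
    time.sleep(1.0 / RATE)           # crude pacing; Python overhead lowers it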

  47. SIP-Specific Attacks: What to expect • Run each variation for 10-15 minutes • In the presence of varying levels of legitimate VoIP traffic • Monitoring DUT resources (CPU, memory), call completion rates, and voice quality of completed calls • It’s typical to see threshold failure effects • i.e. above certain levels of legitimate SIP calls + attack packets, service takes a major hit. Below that threshold normal calls may be handled fine. • DUT often shows weakness within seconds of test start • DUT may exhibit hard or soft crashes • Voice quality may show early warning of catastrophic failure

  48. Good resources on VoIP Security • NIST (National Institute of Standards and Technology): Publication 800-58, “Security Considerations for VoIP Systems” (99 pgs, free), http://csrc.nist.gov/publications/nistpubs • VoIPSA (Voice over IP Security Alliance): promotes education & awareness, research, and testing methodologies & tools; extensive membership of vendors, VoIP providers, researchers, security vendors, and test tool vendors; www.voipsa.org • PROTOS group (University of Oulu, Finland): uses protocol fuzzing to discover a wide variety of DoS and buffer-overflow vulnerabilities; has exposed HTTP, LDAP, SNMP, WAP, and VoIP vulnerabilities; www.ee.oulu.fi/research/ouspg/protos/index.html • Mu Security: manufacturer of a powerful protocol mutation tester (Mu-4000); www.musecurity.com

  49. Feel free to call if you have any questions • Chris Bajorek • chris@ct-labs.com • 916-577-2110 (direct line)
