

  1. AN INTRODUCTION TO STATISTICAL ANALYSIS OF SIMULATION OUTPUTS

  2. Sample Statistics • If x1, x2, …, xn are n observations of the value of an unknown quantity X, they constitute a sample of size n for the population on which X is defined. • Sample mean: x̄ = (x1 + x2 + … + xn)/n • Sample variance: s² = Σ(xi − x̄)²/(n − 1)
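A minimal stand-alone C++ sketch (not part of the original slides) of these two formulas; the function names are illustrative and the observations are the ten values reused later in the batch-means example.

    #include <cstdio>
    #include <numeric>
    #include <vector>

    // Sample mean: xbar = (1/n) * sum of the xi
    double sample_mean(const std::vector<double>& x) {
        return std::accumulate(x.begin(), x.end(), 0.0) / x.size();
    }

    // Sample variance (unbiased): s^2 = (1/(n-1)) * sum of (xi - xbar)^2
    double sample_variance(const std::vector<double>& x) {
        double xbar = sample_mean(x);
        double ss = 0.0;
        for (double xi : x) ss += (xi - xbar) * (xi - xbar);
        return ss / (x.size() - 1);
    }

    int main() {
        std::vector<double> x = {4, 3, 3, 4, 5, 5, 3, 2, 2, 3};
        std::printf("mean = %.3f, variance = %.3f\n",
                    sample_mean(x), sample_variance(x));
        return 0;
    }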

  3. Central Limit Theorem • If the n mutually independent random variables x1, x2, …, xn have the same distribution, and if their mean μ and their variance σ² exist, then …

  4. Central Limit Theorem • The random variable z = (x̄ − μ)/(σ/√n), where x̄ is the mean of the n observations, is asymptotically distributed according to the standard normal distribution (zero mean and unit variance).
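A quick Monte Carlo illustration (added here, not from the slides): each sample mean of n uniform(0, 1) values is standardized with μ = 0.5 and σ = √(1/12), and the fraction of standardized values below 1.96 should approach the standard-normal value of about 0.975.

    #include <cmath>
    #include <cstdio>
    #include <random>

    int main() {
        const int n = 30;                   // observations per sample
        const int samples = 100000;         // number of simulated samples
        const double mu = 0.5;              // mean of uniform(0,1)
        const double sigma = std::sqrt(1.0 / 12.0);  // its standard deviation

        std::mt19937 gen(42);
        std::uniform_real_distribution<double> u(0.0, 1.0);

        int below = 0;
        for (int s = 0; s < samples; ++s) {
            double sum = 0.0;
            for (int i = 0; i < n; ++i) sum += u(gen);
            // standardized sample mean, approximately N(0, 1) by the CLT
            double z = (sum / n - mu) / (sigma / std::sqrt((double) n));
            if (z < 1.96) ++below;
        }
        std::printf("P(z < 1.96) = %.4f (expected about 0.975)\n",
                    (double) below / samples);
        return 0;
    }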

  5. Estimating a mean • Assume that we have a sample x1, x2, …, xn consisting of n independent* observations of a given population • The sample mean x̄ is an unbiased estimator of the mean μ of the population • *This is the critical assumption: without it, we cannot apply the formula

  6. Large populations • For large values of n, the (1 − α) confidence interval for μ is given by x̄ ± zα/2 · σ/√n • with zα/2 chosen so that P(z ≤ zα/2) = 1 − α/2 for the standard normal distribution

  7. Explanations • 1 − α, expressed in percent, is the level of the confidence interval • 90% means α = 0.10 • 95% means α = 0.05 • 99% means α = 0.01 • α is the error probability • Probability that the true mean falls outside the confidence interval

  8. 95% Confidence Intervals • For α = 0.05, zα/2 = 1.96 • Example: • If x̄ = 35, σ = 4 and n = 100 • The 95% confidence interval for μ is 35 ± 1.96 × 4/10 = 35 ± 0.784
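A short C++ sketch (added here, not in the slides) that reproduces this calculation; the quantile 1.96 is hard-coded for the 95 percent level.

    #include <cmath>
    #include <cstdio>

    // Large-sample (1 - alpha) confidence interval for the mean:
    //   xbar +/- z(alpha/2) * sigma / sqrt(n)
    int main() {
        double xbar  = 35.0;   // sample mean from the slide's example
        double sigma = 4.0;    // known standard deviation
        int    n     = 100;    // sample size
        double z     = 1.96;   // z(alpha/2) for alpha = 0.05

        double half_width = z * sigma / std::sqrt((double) n);
        std::printf("95%% CI: %.3f +/- %.3f = [%.3f, %.3f]\n",
                    xbar, half_width, xbar - half_width, xbar + half_width);
        return 0;
    }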

  9. When σ is not known • We can replace σ in the preceding formula by the standard deviation s of the sample • When n < 30, we must also replace zα/2 by the value tα/2 read from a table of Student's t-distribution with n − 1 degrees of freedom • When n ≥ 30, we can use the standard normal values

  10. Confidence Intervals in CSIM • CSIM can automatically compute confidence intervals for the mean value of any table, qtable and so on • For everything but boxes: • xyx->confidence(); • For the elapsed times in a box: • bd->time_confidence(); • For the population of a box: • bd->number_confidence(); • We get 90, 95 and 98 percent confidence intervals

  11. Confidence Intervals in CSIM • We get 90, 95 and 98 percent confidence intervals • Computed using batch means method • See next section • But only for the mean

  12. Estimating a proportion • A proportion p represents the probability P(X ≤ x0) that the observed quantity X does not exceed some fixed threshold x0 • 97% of our customers have to wait less than one minute • Confidence intervals for proportions are much easier to compute than confidence intervals for quantiles

  13. Assumptions • Assume that we have n independent observations x1, x2, …, xn of a given population variable X and that this variable has a continuous distribution • Let p represent the proportion we want to estimate, say P(X ≤ x0) • Let k represent the number of observations that are ≤ x0

  14. Basic property • If p represents the proportion we want to estimate, say P(X ≤ x0), and k represents the number of observations that are ≤ x0 • The rv k is distributed according to a binomial distribution with mean np and variance np(1 − p) • The rv k/n therefore has mean p and variance p(1 − p)/n

  15. The formula • When n > 29, the distribution of k/n is approximately normal • We then have the (1 − α) confidence interval for p: k/n ± zα/2 √((k/n)(1 − k/n)/n)
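A hedged C++ sketch of this formula (not in the slides); the counts n and k are invented so that k/n matches the "97% of our customers" example, and 1.96 again corresponds to a 95 percent interval.

    #include <cmath>
    #include <cstdio>

    // Normal approximation to the (1 - alpha) confidence interval for a
    // proportion p estimated by phat = k/n:
    //   phat +/- z(alpha/2) * sqrt(phat * (1 - phat) / n)
    int main() {
        int    n = 400;   // independent observations (illustrative value)
        int    k = 388;   // observations below the threshold (k/n = 0.97)
        double z = 1.96;  // z(alpha/2) for a 95 percent interval

        double phat = (double) k / n;
        double half = z * std::sqrt(phat * (1.0 - phat) / n);
        std::printf("95%% CI for p: %.4f +/- %.4f\n", phat, half);
        return 0;
    }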

  16. AUTOCORRELATION

  17. The problem • All statistical analysis techniques we have discussed assume that sample values are mutually independent • Generally false for quantities such as • Waiting times, response times, … • These tend instead to be autocorrelated • When the waiting lines are long, everybody waits a long time

  18. Traditional solution • Keep the measurements sufficiently far apart • Sample them every T minutes • Not practical • Would require very long simulations

  19. Three good solutions • Batch means • Regenerative method • Time series analysis

  20. Batch means • We group consecutive observations into “batches” • We compute the means of these batches • We observe that autocorrelation among batch means decreases with size of batches • When size increases, each batch includes more observations that are far apart

  21. Example • We collected the following values: • 4, 3, 3, 4, 5, 5, 3, 2, 2, 3 • We group them into two batches of five observations: • 4, 3, 3, 4, 5 and 5, 3, 2, 2, 3 • The batch means are: • 3.8 and 3
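The grouping step can be sketched in a few lines of C++ (added here for illustration); with the slide's ten values and a batch size of five it prints the two batch means 3.8 and 3.0.

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Group consecutive observations into fixed-size batches and return the
    // batch means; confidence intervals are then computed from these means,
    // which become nearly independent once the batches are large enough.
    std::vector<double> batch_means(const std::vector<double>& x,
                                    std::size_t batch_size) {
        std::vector<double> means;
        for (std::size_t start = 0; start + batch_size <= x.size();
             start += batch_size) {
            double sum = 0.0;
            for (std::size_t i = start; i < start + batch_size; ++i) sum += x[i];
            means.push_back(sum / batch_size);
        }
        return means;
    }

    int main() {
        std::vector<double> x = {4, 3, 3, 4, 5, 5, 3, 2, 2, 3};
        for (double m : batch_means(x, 5)) std::printf("%.1f\n", m);
        return 0;
    }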

  22. Batch means in CSIM • CSIM uses fixed-size batches • To compute confidence intervals • To control the duration of a simulation(run-length control)

  23. Regenerative method • Most systems with queues go through states that return them to a state identical to their original state • The system regenerates itself • Examples: • Whenever a disk array is brought back to its original state • Whenever a camper rental agency has all its campers available

  24. Key idea • Define a regeneration interval as an interval between two consecutive regeneration points: • Observations collected during the same regeneration interval can be correlated • Observations collected during different regeneration intervals are independent

  25. Application • We group together observations that occur within the same regeneration interval • We compute the means of these groups of observations • These group means are independent from each other
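A small C++ sketch (not from the slides) of this grouping step; the Observation structure and the interval indices are illustrative, and detecting the regeneration points themselves is left to the simulation.

    #include <cstdio>
    #include <map>
    #include <utility>
    #include <vector>

    // An observation tagged with the regeneration interval during which it
    // was collected.
    struct Observation {
        int    interval;  // index of the regeneration interval
        double value;     // measured quantity (waiting time, etc.)
    };

    // One mean per regeneration interval; these group means can be treated
    // as mutually independent.
    std::vector<double> interval_means(const std::vector<Observation>& obs) {
        std::map<int, std::pair<double, int>> acc;  // interval -> (sum, count)
        for (const Observation& o : obs) {
            acc[o.interval].first  += o.value;
            acc[o.interval].second += 1;
        }
        std::vector<double> means;
        for (const auto& kv : acc)
            means.push_back(kv.second.first / kv.second.second);
        return means;
    }

    int main() {
        std::vector<Observation> obs = {
            {0, 4.0}, {0, 3.0}, {0, 3.0},            // first interval
            {1, 5.0}, {1, 2.0},                      // second (shorter) interval
            {2, 3.0}, {2, 2.0}, {2, 2.0}, {2, 3.0}   // third interval
        };
        for (double m : interval_means(obs)) std::printf("%.2f\n", m);
        return 0;
    }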

  26. Limitations of the approach • Not general: • The system must go through regeneration points • The system must periodically be idle • Leads to complex computations • We rarely have exactly the same number of observations in two different regeneration intervals

  27. Time series analysis • Treats consecutive observations as elements of a time series • Estimates autocorrelation among the elements of a time series • Includes this autocorrelation in the computation of all confidence intervals

  28. RUN LENGTH CONTROL

  29. Objective • The accuracy of confidence intervals increases with the duration of the simulation • The 1/√n factor in their half-width • We would like to be able to stop the simulation once a given accuracy level has been reached for the confidence interval of a specific measurement

  30. The CSIM solution (I) • Specify • Quantity of interest • Mean value from a table, a qtable, … • A relative accuracy • Maximum relative error • A confidence level (say, 0.90 to 0.99) • A maximum simulation duration • In seconds of CPU time

  31. The CSIM solution (II)
    void table::run_length(double accuracy, double conf_level, double max_time)
    void qtable::run_length(double accuracy, double conf_level, double max_time)
    void meter::run_length(double accuracy, double conf_level, double max_time)
    void box::time_run_length(double accuracy, double conf_level, double max_time)
    void box::number_run_length(double accuracy, double conf_level, double max_time)

  32. The CSIM solution (III) • Example: • thistable->run_length(.01, .95, 500) • Specifies that we want an error of less than 1 percent for the 95% confidence interval of the mean of thistable • Stops the simulation after at most 500 seconds of CPU time • thisbox->time_run_length(.01, .99, 500) • Same for the time average of a box, at the 99% level

  33. The CSIM solution (IV) • Replace termination test by • converged.wait();

  34. Example (I) • The campers • We want • A maximum error of 1 percent (0.01) • For the 95 percent confidence interval • Of the average number of rented campers • A maximum simulation time of 100 s of CPU time • Will use the number aspect of the agency box

  35. Example (II) • Add • agency->number_run_length(0.01, 0.95, 100.0); • after the activation of box agency in the csim process

  36. Example (III) • Put the main loop
    while (simtime() < DURATION) {
        hold(exponential(MIART));
        customer();
    }
in a separate arrivals process • Make it an infinite loop

  37. Example (IV) • Add • converged.wait(); • after the call to the arrivals process: • arrivals(); converged.wait(); • Best way to let the sim process generate customers and wait for termination in parallel

  38. Example (V) • The new arrivals process:
    void arrivals() {
        create("arrivals");  // REQUIRED first statement of a CSIM process
        for (;;) {           // forever loop
            hold(exponential(MIART));
            customer();
        } // forever
    } // arrivals

  39. Warnings • Confidence intervals do not take into account model inaccuracies • While the batch means method eliminates most effects of measurement autocorrelation, it is not always 100% effective • The max_time parameter of run_length() will not necessarily stop the simulation right at the specified CPU time • It acts like the emergency brake of a train

  40. Process classes: objective • Partition processes into different classes • Low priority • High priority • Obtain separate statistics for each process priority class

  41. Declaring and initializing process classes • To declare a dynamic process class: • process_class *c; • The class must be initialized before it is used in any other statement: • c = new process_class("low priority");

  42. To assign a priority class • Add inside the process • c->set_process_class(); • Processes that have not been assigned a process class belong to the “default” process class

  43. Reporting • Can use • report() • report_classes()

  44. Other options • Can change the name of a process class: • c->set_name("high priority"); • Can reset the statistics associated with a process class: • c->reset(); • Can do the same for all process classes: • reset_process_classes(); • Can delete a dynamic process class: • delete c;

  45. RANDOM NUMBERS

  46. Background • We need to distinguish between • Truly random numbers • Obtained through observations of a physical random process • Rolling dice, roulette • Atomic decay • Pseudo-random numbers • Obtained through arithmetic operations

  47. Example • Linear Congruential Generators (LCG) • Easy to implement and fast • Defined by the recurrence relation rn+1 = (a · rn + c) mod m, where • r1, r2, … are the random values • a and c are constants (the multiplier and the increment) • m is the "modulus" • r0 is the seed

  48. Two realizations • GCC family of compilers: m = 2^32, a = 69069, c = 5 • Microsoft Visual/Quick C/C++: m = 2^32, a = 214013, c = 2531011
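The recurrence is short enough to sketch directly in C++ (not part of the slides); this version uses the GCC-style constants quoted above, and unsigned 32-bit arithmetic supplies the mod 2^32 reduction for free.

    #include <cstdint>
    #include <cstdio>

    // Linear congruential generator r(n+1) = (a * r(n) + c) mod m
    // with m = 2^32, a = 69069, c = 5 (the constants quoted on the slide).
    struct Lcg {
        std::uint32_t state;  // current value r(n); the initial value is the seed r(0)
        explicit Lcg(std::uint32_t seed) : state(seed) {}
        std::uint32_t next() {
            state = 69069u * state + 5u;   // wraps around, i.e. reduced mod 2^32
            return state;
        }
        double next_uniform() {            // pseudo-random value in [0, 1)
            return next() / 4294967296.0;  // divide by 2^32
        }
    };

    int main() {
        Lcg rng(12345u);                   // arbitrary seed r(0)
        for (int i = 0; i < 5; ++i)
            std::printf("%.6f\n", rng.next_uniform());
        return 0;
    }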

  49. Problems with pseudorandom number generators • Much shorter periods for some seed states • Lack of uniformity of distribution • Correlation of successive values • …

  50. Better RNGs • Use the Mersenne twister • Period is 2^19937 − 1 • Or Blum Blum Shub
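For most simulation code there is no need to implement these generators by hand: the C++ standard library already ships a Mersenne twister (std::mt19937, period 2^19937 − 1). The sketch below (added for illustration) uses it to draw exponential inter-arrival times, the kind of random variate the earlier CSIM examples rely on; the seed and the mean are arbitrary.

    #include <cstdio>
    #include <random>

    int main() {
        std::mt19937 gen(2024);  // Mersenne twister seeded with an arbitrary value

        // Exponential inter-arrival times with mean 5.0 (rate 1/5).
        std::exponential_distribution<double> inter_arrival(1.0 / 5.0);

        for (int i = 0; i < 5; ++i)
            std::printf("%.3f\n", inter_arrival(gen));
        return 0;
    }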
