300 likes | 373 Views
Implementing Legacy Statistical Algorithms in a Spreadsheet Environment. Stephen W. Liddle Information Systems Faculty Rollins eBusiness Center. John S. Lawson Department of Statistics. Brigham Young University Provo, UT 84602. Overview. Introduction Fundamentals of VBA in Excel
E N D
Implementing Legacy Statistical Algorithms in a Spreadsheet Environment Stephen W. Liddle Information Systems Faculty Rollins eBusiness Center John S. Lawson Department of Statistics Brigham Young University Provo, UT 84602
Overview • Introduction • Fundamentals of VBA in Excel • Retargeting traditional algorithms to a spreadsheet environment • Converting FORTRAN to VBA • Conclusions
Why Convert FORTRAN Programs to Run in a Spreadsheet Environment? • Useful code available that is not implemented in standard statistical packages • FORTRAN compilers not usually available on normal Windows workstation • Many textbooks refer to published FORTRAN algorithms
Sources for Published FORTRAN Algorithms • STATLIB (http://lib.stat.cmu.edu/) • General Archive • Applied Statistics Archive • Journal of Quality Technology Archive • JASA Software Archive • JCGS Archive
Advantages of Running Legacy FORTRAN Code in Excel • Comfortable environment for practitioners • More user friendly input from spreadsheet • Output to spreadsheet allows further graphical and computational analysis of results with Excel functions
Proposed Methodology • Understand original FORTRAN program • Choose suitable I/O methods • Convert original FORTRAN code to VBA • Test and use resulting Excel code
Visual Basic For Applications • Built on ANSI BASIC • Language engine of Microsoft Office • Modern structured programming language • Has vast array of types, functions, programming helps • Powerful support environment (Office platform) • Popular in business contexts
Application Workbooks (Workbook) Worksheets (Worksheet) Range Chart Excel Object Model • Objects in Excel are addressable in VBA • Each object has: • Properties • Methods
Output Region Input Region Clicking these buttons runs the ORPS1 and ORPS2 algorithms. Input/Output Methods • Non-interactive • Files, databases • Worksheet cells • Interactive • Message boxes • Input boxes • Custom GUI forms
FORTRAN vs. VBA • VBA: (-b+Sqr (b^ 2-4*a*c))/(2*a) • FORTRAN: (-b+SQRT(b**2-4*a*c))/(2*a)
More Operators • .EQ. = • .NE. <> • .LT. < • .LE. <= • .GT. > • .GE. >= • .AND. And • .OR. Or • .NOT. Not • // &
Data Types • INTEGER Byte, Integer, Long • REAL Single • DOUBLE PRECISION Double • COMPLEX Non-primitive in VBA • LOGICAL Boolean • CHARACTER String • CHARACTER*length String*length • Other notable VBA types: • Currency, Decimal, Date, Variant
Worksheet Functions • ChiDist(x,deg_freedom) • Returns one-tailed probability of the χ2 distribution. • Correl(array1,array2) • Returns the correlation coefficient of two cell ranges. • Fisher(x) • Returns the Fisher transformation at a given x. • Pearson(array1,array2) • Returns the Pearson product moment correlation coefficient for two sets. • Quartile(array,quart) • Returns the requested quartile of a data set. • StDev(array) • Returns the standard deviation of a data set. • ZTest(array,x,sigma) • Returns the two-tailed P-value of a z-test.
Subtle Differences (“Gotchas”) • Implicit conversion of real to integer values • FORTRAN: truncate • VBA: round • Solution: use VBA’s Fix(), which truncates • Both languages allow implicit typing • This introduces ambiguity • Solution: supply explicit types everywhere
Eliminating Goto Statements • Computer science accepts the axiom that goto is generally “considered harmful” • We advocate rewriting alogrithms to use structured programming techniques where feasible • Sine qua non is “make it work” • It’s a good idea for maintainability, understandability to move to structured form
Eliminating Goto Statements DO 8 J=1,3 ... 6 ... IF(OBJFN.GT.BESTFN) GO TO 7 ... GO TO 6 7 IF(J.EQ.3) GO TO 8 XK=BESTK-STEP 8 CONTINUE
Eliminating Goto Statements For j=1 To 3 ... 6 ... IF(OBJFN.GT.BESTFN) GO TO 7 ... GO TO 6 7 IF(J.EQ.3) GO TO 8 XK=BESTK-STEP 8 Next j
Eliminating Goto Statements For j=1 To 3 ... 6 ... IF(OBJFN.GT.BESTFN) GO TO 7 ... GO TO 6 7 If j <> 3 Then xk = bestk - step End If Next j
Eliminating Goto Statements For j=1 To 3 ... Do Until objfn > bestfn ... Loop If j <> 3 Then xk = bestk - step End If Next j
Our Reasoning • Digital assets are fragile • FORTRAN is not universally available • Excel is a ubiquitous, powerful platform • VBA is a full-featured language capable of handling sophisticated statistical computations
Conclusions • We recommend creating a Web-based repository of Excel/VBA implementations of classic statistical algorithms • We can preserve our legacy algorithms in this modern spreadsheet environment • E-mail us if you want a copy of our manuscript (liddle or lawson@byu.edu)