SHOWTIME!! EVALUATING ACHIEVEMENT
INTRODUCTION • BOTH CHILDREN AND ADULTS WANT TO KNOW HOW THEY COMPARE TO OTHERS OR A STANDARD • PRIMARY ROLE OF TEACHER OR PROGRAM LEADER IS TO PROMOTE DESIRABLE CHANGES IN PEOPLE • FOR INSTRUCTIONAL OR PROGRAM PROCESS TO BE MEANINGFUL: - RELEVANT STATED OBJECTIVES - INSTRUCTION OR PROGRAM MUST BE DESIGNED TO ACHIEVE OBJECTIVES EFFECTIVELY - RELIABLE AND VALID EVALUATION PROCESS THAT ASSESSES ACHIEVEMENT • TESTS ARE ADMINISTERED PRIMARILY TO FACILITATE THE ACHIEVEMENT OF INSTRUCTIONAL AND PROGRAM OBJECTIVES
INTRODUCTION • EDUCATIONAL TESTS CAN BE USED FOR PLACEMENT, DIAGNOSIS, EVALUATION OF LEARNING, PREDICTION, PROGRAM EVALUATION, AND MOTIVATION (CHAPTER 1) • EVALUATION IS NOT SYNONYMOUS WITH GRADING, AND EVALUATION CAN OCCUR WITHOUT THE ASSIGNMENT OF GRADES • A TEACHER WHO PASSES ALL STUDENTS REGARDLESS OF THEIR LEVEL OF ACHIEVEMENT, OR A TRAINER WHO DOES NOT TELL A CLIENT THAT THE CLIENT IS NOT DOING WELL, IS IGNORING HER/HIS PROFESSIONAL RESPONSIBILITIES
EVALUATION • OFTEN FOLLOWS MEASUREMENT TAKING THE FORM OF A JUDGMENT ABOUT THE QUALITY OF A PERFORMANCE • OBJECTIVITY OF EVALUATION INCREASES WHEN IT IS BASED ON DEFINED STANDARDS SUCH AS - REQUIRED LEVELS OF PERFORMANCE BASED ON TEACHER’S OR TRAINER’S EXPERIENCE AND/OR CONVICTIONS - THE RANKED PERFORMANCE OF THE REST OF THE GROUP - EXISTING STANDARDS CALLED NORMS
TYPES OF EVALUATION • FORMATIVE EVALUATION THROUGHOUT THE PROGRAM MOTIVATES AND INFORMS PARTICIPANTS OF THEIR PROGRESS AND ALLOWS FOR JUDGMENT REGARDING THE PROGRAM’S EFFECTIVENESS • SUMMATIVE EVALUATION IS THE FINAL MEASUREMENT OF A PARTICIPANT’S PERFORMANCE AT THE END OF A PROGRAM, WHICH OFTEN INVOLVES COMPARISON AMONG STUDENTS OR OF STUDENTS TO NORMS OR AN IDEAL STANDARD
STANDARDS FOR EVALUATION: CRITERION-REFERENCED STANDARDS • REPRESENT THE LEVEL OF PERFORMANCE THAT ALL INDIVIDUALS SHOULD BE ABLE TO ACHIEVE GIVEN PROPER INSTRUCTION • MUST BE USED WITH EXPLICIT OBJECTIVES • USED IN FORMATIVE EVALUATION TO DIAGNOSE WEAKNESSES AND TO DETERMINE WHEN PARTICIPANTS ARE READY TO PROGRESS • STANDARDS TEND TO BE PASS OR FAIL
PROCEDURES TO DEVELOP CRITERION-REFERENCED STANDARDS • IDENTIFY THE SPECIFIC BEHAVIORS THAT MUST BE ACHIEVED TO ACCOMPLISH A BROAD OBJECTIVE • DEVELOP CLEARLY DEFINED OBJECTIVES THAT CORRESPOND TO THE SPECIFIC BEHAVIORS • DEVELOP STANDARDS THAT GIVE EVIDENCE OF SUCCESSFUL ACHIEVEMENT OF THE OBJECTIVE; THESE STANDARDS MAY BE BASED ON LOGIC, EXPERT OPINION, RESEARCH LITERATURE, AND/OR ANALYSIS OF TEST SCORES • TRY THE SYSTEM AND EVALUATE THE STANDARDS; DETERMINE WHETHER THE STANDARDS MUST BE ALTERED AND DO SO IF NECESSARY “IF STANDARDS ARE TOO HIGH, VERY FEW PEOPLE WILL PASS AND RECEIVE POSITIVE REINFORCEMENT; IF STANDARDS ARE TOO LOW, MANY WILL PASS THAT MAY HAVE FALSE ILLUSIONS OF THEIR CAPABILITIES”
STANDARDS FOR EVALUATION: NORM-REFERENCED STANDARDS • COMPARE THE PERFORMANCES OF PEERS • USED IN SUMMATIVE EVALUATION TO DETERMINE IF BROAD PROGRAM OBJECTIVES HAVE BEEN MET • LEVELS OF PERFORMANCE ARE ESTABLISHED THAT DISTINGUISH BETWEEN ABILITY GROUPS RANGING FROM “HIGH ABILITY” TO “LOW ABILITY”
GRADING • GRADING IS A TWO-FOLD PROCESS: THE SELECTION OF THE MEASUREMENTS (SUBJECTIVE OR OBJECTIVE) THAT FORM THE BASIS OF THE GRADE AND THE ACTUAL CALCULATION • INSTRUCTIONAL PROCESS BEGINS WITH INSTRUCTIONAL OBJECTIVES AND CULMINATES WITH EVALUATION • GRADES SHOULD BE BASED ON INSTRUCTIONAL OBJECTIVES AND THE SCORES FROM RELIABLE AND VALID TESTS • SELECTION OF TESTING INSTRUMENTS SHOULD CONSIDER: - WHAT ARE THE INSTRUCTIONAL OBJECTIVES? - WERE THE STUDENTS TAUGHT IN ACCORDANCE WITH THESE OBJECTIVES? - DOES THE TEST YIELD SCORES THAT REFLECT ACHIEVEMENT OF THE OBJECTIVES?
GRADING ISSUES • IS THE ATTRIBUTE BEING GRADED A MAJOR OBJECTIVE OF THE PHYSICAL EDUCATION PROGRAM? • DO ALL STUDENTS HAVE IDENTICAL OPPORTUNITIES TO DEMONSTRATE THEIR ABILITY RELATIVE TO THE ATTRIBUTE? • CAN THE ATTRIBUTE BE MEASURED SO THAT THE TEST SCORES ARE RELIABLE AND THE INTERPRETATIONS OF THE SCORES VALID? • WERE THE GRADING POLICIES EXPLAINED AT THE BEGINNING OF THE PROGRAM? • WERE THE GRADES BASED ON A SUFFICIENT AMOUNT OF VALID EVIDENCE? • WHAT SHOULD THE RANGE IN GRADING BE? • SHOULD THE RANGE IN GRADING BE THE SAME FOR A BEGINNING COURSE COMPARED TO AN ADVANCED COURSE? • SHOULD THE OVERALL QUALITY OF THE CLASS AFFECT THE GRADING DISTRIBUTION? • DOES THE GRADING REPRESENT ONLY ACHIEVEMENT OR ACHIEVEMENT AND STUDENT EFFORT AS WELL? • IF PASS-FAIL GRADES ARE ASSIGNED, WILL ANYONE FAIL?
GENERALLY ACCEPTED GRADING PHILOSOPHY • GRADE A STUDENT RECEIVES SHOULD NOT DEPEND ON - THE SEMESTER OR YEAR IN WHICH THE CLASS IS TAKEN - THE INSTRUCTOR, PARTICULARLY IF SEVERAL INSTRUCTORS TEACH THE COURSE - OTHER STUDENTS IN THE COURSE
GRADING METHODS • NATURAL BREAKS • TEACHER’S STANDARD • RANK ORDER • NORMS
GRADING METHODS: NATURAL BREAKS • SCORES ARE LISTED FROM BEST TO WORST • EACH BREAK OR GAP IS A CUT-OFF POINT FOR A LETTER GRADE • USEFUL METHOD FOR TEACHERS WHO DO NOT BELIEVE IN SPECIFYING THE POSSIBLE GRADES AND PERCENTAGES FOR THESE GRADES • POOREST METHOD OF ASSIGNING GRADES • NO SEMESTER-TO-SEMESTER CONSISTENCY • EACH STUDENT’S GRADE IS DEPENDENT ON THE PERFORMANCE OF OTHER STUDENTS IN THE CLASS
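The natural-breaks idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `natural_break_grades`, the choice to cut at the widest gaps, the five-letter scale, and both sets of scores are hypothetical and do not come from the slides.

```python
def natural_break_grades(scores, letters=("A", "B", "C", "D", "F")):
    """Cut the best-to-worst score list at its widest gaps and assign letters."""
    ordered = sorted(scores, reverse=True)  # best to worst
    # Gap between each score and the next lower score, tagged with its position.
    gaps = [(ordered[i] - ordered[i + 1], i) for i in range(len(ordered) - 1)]
    # Assumption: keep the widest non-zero gaps, one fewer than the number of letters.
    widest = sorted((g for g in gaps if g[0] > 0), key=lambda g: g[0], reverse=True)
    cut_after = {i for _, i in widest[: len(letters) - 1]}

    graded, level = [], 0
    for i, score in enumerate(ordered):
        graded.append((score, letters[level]))
        if i in cut_after:
            level += 1  # the next score falls below a break
    return graded


# Two hypothetical classes that both contain a score of 85.
class_a = [95, 94, 93, 85, 84, 70, 69, 68, 50, 30]
class_b = [85, 84, 70, 69, 68, 50, 49, 30, 29, 10]
print(dict(natural_break_grades(class_a))[85])  # grade for 85 in class A
print(dict(natural_break_grades(class_b))[85])  # grade for 85 in class B
```

With these made-up scores, the same 85 earns a different letter in each class, which illustrates why the method gives no semester-to-semester consistency.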
GRADING METHODS: TEACHER’S STANDARD • GRADES ARE BASED ON THE TEACHER’S PERCEPTION OF WHAT IS FAIR AND APPROPRIATE, SOMETIMES WITHOUT ANALYZING ANY DATA • EX.: 90-100 A, 80-89 B, ETC. • CONSISTENT STANDARDS FROM YEAR TO YEAR ARE POSSIBLE • STUDENT’S GRADE IS NOT DEPENDENT ON THE PERFORMANCE OF OTHER STUDENTS • GOOD METHOD FOR EXPERIENCED TEACHERS WHO HAVE REASONABLE STANDARDS OR EXPECTATIONS OF STUDENTS’ ABILITIES • NORM-REFERENCED STANDARDS DEVELOPED USING THE CRITERION-REFERENCED STANDARDS SET BY THE TEACHER
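A fixed-standard scheme like the 90-100 / 80-89 example reduces to a simple lookup. The sketch below is a minimal illustration: only the A and B cut-offs come from the slide, while the C, D, and F cut-offs and the name `teacher_standard_grade` are assumptions added for completeness.

```python
# Cut-offs for A and B follow the slide; C, D, and F are assumed for illustration.
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def teacher_standard_grade(percent, cutoffs=CUTOFFS, lowest="F"):
    """Return the first letter whose minimum score the student reaches."""
    for minimum, letter in cutoffs:
        if percent >= minimum:
            return letter
    return lowest


print(teacher_standard_grade(93))  # A
print(teacher_standard_grade(89))  # B
```

Because the cut-offs never look at the rest of the class, a given score earns the same letter every semester.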
GRADING METHODS: RANK ORDER • STRAIGHTFORWARD, NORM-REFERENCED METHOD OF GRADING • TEACHER DECIDES WHICH LETTER GRADES WILL BE ASSIGNED AND WHAT PERCENTAGE OF THE CLASS SHOULD RECEIVE EACH LETTER GRADE • SCORES ARE ORDERED AND GRADES ARE ASSIGNED • ADVANTAGES INCLUDE THAT IT IS QUICK AND EASY TO USE AND ALLOWS GRADES TO BE DISTRIBUTED AS WANTED • DISADVANTAGES INCLUDE THAT A STUDENT’S GRADE IS DEPENDENT ON THE GRADES OF OTHER STUDENTS AND THAT NO ALLOWANCE IS MADE FOR THE QUALITY OF THE CLASS, WHICH RESULTS IN GRADES VARYING FROM SEMESTER TO SEMESTER
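The rank-order procedure can be sketched as follows. The quota percentages (10/20/40/20/10), the function name `rank_order_grades`, and the scores are hypothetical; tied scores are not treated specially in this sketch, so a real gradebook would need a rule for them.

```python
def rank_order_grades(scores, quotas):
    """quotas: list of (letter, percent_of_class) pairs whose percents sum to 100."""
    if sum(percent for _, percent in quotas) != 100:
        raise ValueError("quota percentages must sum to 100")
    ordered = sorted(scores, reverse=True)  # best to worst
    n = len(ordered)
    graded, start, cumulative = [], 0, 0
    for letter, percent in quotas:
        cumulative += percent
        end = (cumulative * n + 50) // 100  # cumulative head-count, rounded
        graded.extend((score, letter) for score in ordered[start:end])
        start = end
    return graded


# Assumed distribution chosen by a hypothetical teacher.
QUOTAS = [("A", 10), ("B", 20), ("C", 40), ("D", 20), ("F", 10)]

typical_class = [99, 91, 88, 85, 80, 77, 75, 70, 64, 52]
strong_class = [99, 98, 97, 96, 95, 94, 93, 92, 91, 90]
print(dict(rank_order_grades(typical_class, QUOTAS))[91])
print(dict(rank_order_grades(strong_class, QUOTAS))[91])
```

In this made-up data a score of 91 lands in a different quota in each class, which is the "no allowance for the quality of the class" disadvantage in concrete form.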
GRADING METHODS: NORMS • NORMS BASED ON ANALYSIS OF THE DATA, NOT ON SUBJECTIVE STANDARDS CHOSEN BY THE TEACHER • DEVELOPED BY GATHERING SCORES FOR A LARGE NUMBER OF INDIVIDUALS WITH SIMILAR DEMOGRAPHICS • DATA IS STATISTICALLY ANALYZED AND PERFORMANCE STANDARDS ARE THEN CONSTRUCTED BASED ON THE ANALYSIS • ADVANTAGES INCLUDE - THE STUDENT’S GRADE IS NOT BASED ON THE PERFORMANCE OF THE GROUP OR CLASS BEING EVALUATED - THE NORMS CAN BE USED FOR SEVERAL YEARS (THEREBY PROVIDING CONSISTENCY FROM SEMESTER TO SEMESTER) BEFORE THEY NEED TO BE RE-EVALUATED AND PERHAPS REVISED • HOWEVER, THE TEACHER STILL NEEDS TO DECIDE HOW LETTER GRADES WILL BE ASSIGNED TO THE NORMS
GRADING METHODS: NORMS • HOW WOULD YOU ASSIGN A LETTER GRADE TO THESE NORMS?
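One common way to express norms is as percentile ranks against a reference sample. The sketch below assumes that approach; the reference scores (simply 1 through 100), the percentile bands used for letters, and the names `percentile_rank` and `norm_grade` are all hypothetical stand-ins, not values from the slides.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, reference_sorted):
    """Percent of the reference sample scoring below `score` (ties count half)."""
    below = bisect_left(reference_sorted, score)
    equal = bisect_right(reference_sorted, score) - below
    return 100.0 * (below + 0.5 * equal) / len(reference_sorted)


# Assumption: the teacher maps percentile bands to letters like this.
BANDS = [(90, "A"), (70, "B"), (30, "C"), (10, "D")]


def norm_grade(score, reference_sorted, bands=BANDS, lowest="F"):
    rank = percentile_rank(score, reference_sorted)
    for minimum, letter in bands:
        if rank >= minimum:
            return letter
    return lowest


# Hypothetical reference sample standing in for a large normative data set.
reference = sorted(range(1, 101))
print(percentile_rank(50, reference))  # 49.5
print(norm_grade(95, reference))       # A
```

The grade depends only on the stored reference sample, so it stays the same no matter who else is in the current class; the band cut-offs remain the teacher's decision.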
FINAL GRADES • ASSIGNMENT OF A FINAL GRADE OR FINAL CLASSIFICATION (FITNESS OR REHAB) MUST BE BASED ON ALL AVAILABLE INFORMATION • TEACHER SHOULD CHOOSE AND EXPLAIN THE FINAL GRADING SYSTEM AT THE BEGINNING OF A PROGRAM • THREE METHODS OF ASSIGNING FINAL GRADES - SUM OF LETTER GRADES - POINT SYSTEM - SUM OF THE T-SCORES
SUM OF THE LETTER GRADES • USED WHEN TEST SCORES REFLECT DIFFERENT UNITS OF MEASURE THAT CANNOT BE SUMMED • SCORES ON TESTS ARE CONVERTED TO LETTER GRADES • LETTER GRADES ON EACH TEST ARE CONVERTED TO POINTS (A+ = 14, A = 13, A- = 12, B+ = 11, ETC. DOWN TO F = 1 AND F- = 0) • POINTS ON ALL TESTS ARE ADDED TOGETHER AND DIVIDED BY THE NUMBER OF TESTS TO GET AN AVERAGE SCORE (POINT VALUE), WHICH IS CONVERTED BACK INTO A LETTER GRADE USING THE 14-POINT SCALE ABOVE
SUM OF THE LETTER GRADES WHEN TESTS ARE EQUALLY WEIGHTED • USING TABLE 5.5 AS AN EXAMPLE THAT HAD 5 TESTS • SUM = 45 / 5 TESTS = 9 • AVERAGE SCORE (POINT VALUE) OF 9 = B-
SUM OF THE LETTER GRADES WHEN TESTS ARE EQUALLY WEIGHTED • USING TABLE 5.6 AS AN EXAMPLE THAT HAD 5 TESTS • SUM = 59 / 5 TESTS = 11.8 • AVERAGE SCORE (POINT VALUE) OF 11.8 = B+ AS 12 IS NEEDED FOR AN “A-” • DOES THIS SEEM FAIR LOOKING AT THE TEST SCORES?
DRAWBACKS OF THE SUM OF THE LETTER GRADES METHOD • LOSE INFORMATION BY CONVERTING TEST SCORES TO POINT VALUES • 96% AND 93% ARE BOTH AN “A” OR 13 POINTS • WASTE OF TIME TO CALCULATE THE MEAN • NO ALLOWANCE IS MADE IN THE FINAL GRADE FOR THE REGRESSION EFFECT AND THUS VERY FEW HIGH OR LOW GRADES ARE GIVEN; MOST GRADES ARE IN THE MIDDLE OF THE RANGE • REGRESSION EFFECT: A STUDENT WHO EARNS AN “A” OR AN “F” ON ONE TEST IS MORE LIKELY ON THE NEXT TEST TO EARN A GRADE CLOSER TO “C” THAN TO REPEAT THE FIRST PERFORMANCE
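The letter-to-points conversion and averaging can be sketched as below. Tables 5.5 and 5.6 are not reproduced in these slides, so the two lists of letter grades are hypothetical and were chosen only because they add up to the same totals (45 and 59). The scale follows A+ = 14 down to F- = 0, with F+ = 2 inferred from the pattern, and the average is rounded down because 11.8 is treated as a B+ above.

```python
import math

# Index in the list is the point value: F- = 0, F = 1, ..., A = 13, A+ = 14.
SCALE = ["F-", "F", "F+", "D-", "D", "D+", "C-", "C", "C+",
         "B-", "B", "B+", "A-", "A", "A+"]
POINTS = {letter: value for value, letter in enumerate(SCALE)}


def final_letter(test_letters):
    """Average the point values of equally weighted tests and convert back."""
    average = sum(POINTS[g] for g in test_letters) / len(test_letters)
    # Assumption: the full point value must be reached, so round down.
    return average, SCALE[math.floor(average)]


# Hypothetical grades that merely reproduce the two totals, not the book's tables.
print(final_letter(["B+", "B", "B-", "C+", "C"]))   # sum 45, average 9.0, B-
print(final_letter(["A+", "A", "A", "A-", "C"]))    # sum 59, average 11.8, B+
```

In the second hypothetical case, four grades of A- or better and a single C still produce a B+, which is the kind of outcome the fairness question is pointing at.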
POINT SYSTEMS • OFTEN USED BY CLASSROOM TEACHERS SO THAT ALL TEST SCORES ARE IN THE SAME UNIT OF MEASURE AND CAN BE EASILY COMBINED
SUM OF THE T-SCORES • CHANGE TEST SCORE TO T-SCORES AND SUM THE T-SCORES AS PREVIOUSLY DISCUSSED • POSSIBLE TO WEIGHT EACH TEST DIFFERENTLY IN SUMMING THE T-SCORES BY USING THE PROCEDURES JUST OUTLINED FOR WEIGHTING LETTER-GRADE POINTS
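A T-score rescales a raw score so that the group mean becomes 50 and the standard deviation becomes 10, that is, T = 50 + 10 (x - mean) / SD. The sketch below sums T-scores across tests with optional weights. The function names, the use of the population standard deviation, and the raw scores are assumptions for illustration.

```python
from statistics import mean, pstdev


def t_scores(raw_scores):
    """T = 50 + 10 * (x - mean) / SD for every raw score on one test."""
    m = mean(raw_scores)
    s = pstdev(raw_scores)  # assumption: population SD; it must not be zero
    return [50 + 10 * (x - m) / s for x in raw_scores]


def sum_of_t_scores(tests, weights=None):
    """tests: one list of raw scores per test, students in the same order."""
    weights = weights or [1] * len(tests)
    per_test = [t_scores(raw) for raw in tests]
    # zip(*per_test) regroups the T-scores student by student.
    return [sum(w * t for w, t in zip(weights, student))
            for student in zip(*per_test)]


# Two hypothetical tests measured in different units for the same 8 students.
sit_ups = [2, 4, 4, 4, 5, 5, 7, 9]
jump_cm = [20, 40, 40, 40, 50, 50, 70, 90]
print(t_scores(sit_ups))
print(sum_of_t_scores([sit_ups, jump_cm]))
```

Because each test is standardized before summing, tests in different units contribute equally unless weights are supplied. For tests where a lower raw score is better, such as run times, the sign of the deviation would need to be reversed first.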
OTHER EVALUATION TECHNIQUES • BEST 5 PEOPLE RECEIVE A SCHOLARSHIP, JOB, PROMOTION, ETC. • RANK-ORDER SITUATION WHEN THE 5 BEST PEOPLE ARE REWARDED (PASS) AND THE REST GET NOTHING (FAIL) • NUMBER OF PEOPLE AWARDED OR RECOGNIZED IS NOT LIMITED • CRITERION-REFERENCED SITUATION THAT IDEALLY NEEDS A GOLD STANDARD OR A STANDARD ESTABLISHED BY EXPERT(S) • PHYSICAL THERAPIST OR ATHLETIC TRAINER SETS A STANDARD FOR RELEASING PEOPLE FROM A THERAPY PROGRAM - CRITERION-REFERENCED STANDARD THAT SHOULD BE BASED ON THE MINIMUM STRENGTH OR ABILITY NEEDED TO FUNCTION IN DAILY LIFE
AUTHENTIC ASSESSMENT “AN ATTEMPT TO EVALUATE PEOPLE IN A REAL-LIFE OR MORE “AUTHENTIC” SETTING”
CHARACTERISTICS OF AUTHENTIC ASSESSMENT • AUTHENTIC ASSESSMENTS PRESENT CHALLENGES THAT ARE REPRESENTATIVE OF REAL LIFE • AUTHENTIC ASSESSMENTS REQUIRE STUDENTS TO DEMONSTRATE HIGHER-LEVEL THINKING • STUDENTS KNOW THE STANDARDS FOR ASSESSMENT FROM THE BEGINNING, ALLOWING THEM TO CONSTANTLY RECEIVE FEEDBACK ABOUT THEIR PROGRESS • AUTHENTIC ASSESSMENTS BECOME PART OF THE CURRICULUM, RESULTING IN TEACHERS TEACHING TO THE TEST • STUDENTS OFTEN PRESENT THE CULMINATION OF THE AUTHENTIC ASSESSMENT PUBLICLY • THERE IS AN EMPHASIS ON PROCESS (HOW STUDENTS ARRIVE AT THE CORRECT ANSWER) AND NOT JUST PRODUCT (CORRECT ANSWER)
TYPES OF AUTHENTIC ASSESSMENT • STUDENT PROJECTS • STUDENT LOGS • STUDENT JOURNALS • PEER OBSERVATION • SELF-ASSESSMENT • GROUP PROJECTS • PORTFOLIOS • EVENT TASKS • TEACHER OBSERVATION
RUBRICS • OFTEN USED IN AUTHENTIC ASSESSMENT • PERSON’S PERFORMANCE IS COMPARED TO CRITERIA SPECIFIED IN THE RUBRIC USING A SCALE THAT RANGES FROM 3 (OUTSTANDING, ACCEPTABLE, AND DEFICIENT) TO 5 (EXCELLENT, GOOD, SATISFACTORY, FAIR, AND POOR) LEVELS • WHEN DESIGNING THE RUBRIC: - DECIDE WHICH ERRORS WOULD BE MOST JUSTIFIABLE FOR DISCRIMINATING BETWEEN ABILITY LEVELS - BE AS SPECIFIC AS POSSIBLE WHEN DESIGNING RUBRICS AS THIS WILL INCREASE OBJECTIVITY
CONCERNS WITH AUTHENTIC ASSESSMENT • QUALITY (VALIDITY, RELIABILITY, AND OBJECTIVITY) OF AUTHENTIC ASSESSMENT • HOW WELL DOES THE AUTHENTIC ASSESSMENT TEST RELATE TO OTHER MEASURES? (CRITERION-RELATED VALIDITY) - ONE MEASURE OF VOLLEYBALL SKILL SHOULD BE RELATED TO OTHER MEASURES OF VOLLEYBALL SKILL • ABILITY OF THE ASSESSMENT TO PREDICT FUTURE PERFORMANCE (PREDICTIVE VALIDITY) - CAN AUTHENTIC ASSESSMENT OF CURRENT FITNESS PREDICT FUTURE FITNESS BEHAVIOR? • DOES THE AUTHENTIC ASSESSMENT COVER ALL AREAS OF THE ACTIVITY? (CONTENT VALIDITY) - IS THE AUTHENTIC ASSESSMENT OF SOME SOFTBALL SKILLS REFLECTIVE OF ALL THE COMPONENTS OF SOFTBALL? • DETAILED RUBRIC AND PRACTICE SCORING WITH THE RUBRIC CAN ENHANCE THE RELIABILITY AND OBJECTIVITY OF AUTHENTIC ASSESSMENT
CHARACTERISTICS OF GOOD AUTHENTIC ASSESSMENT • MEANINGFUL FOR BOTH TEACHERS AND STUDENTS • SERVES AS MOTIVATION FOR PERFORMANCE • EVALUATES ATTRIBUTES THAT ARE IMPORTANT TO BOTH TEACHERS AND STUDENTS • REQUIRES DEMONSTRATION OF COMPLEX COGNITION • EXEMPLIFIES CURRENT STANDARDS OF CONTENT QUALITY • MINIMIZES THE EFFECTS OF IRRELEVANT SKILLS • POSSESSES EXPLICIT STANDARDS FOR RATING OR JUDGMENT
PROGRAM EVALUATION • SUCCESS OF A PROGRAM DEPENDS LESS ON ITS PHYSICAL CHARACTERISTICS (E.G., FACILITIES AND EQUIPMENT) AND MORE ON THE MANNER IN WHICH THEY ARE USED IN THE INSTRUCTIONAL OR PROGRAM PROCESS • ARE STUDENTS ACHIEVING IMPORTANT INSTRUCTIONAL OBJECTIVES? • ARE PARTICIPANTS BENEFITING FROM THE PROGRAM? • ARE PROGRAM OBJECTIVES BEING MET? • BOTH FORMATIVE AND SUMMATIVE EVALUATION ARE REQUIRED FOR PROGRAM EVALUATION • REQUIRES PLANNED DATA COLLECTION FROM TESTING AND/OR GOOD DAILY RECORD KEEPING
PROGRAM EVALUATION • FORMATIVE EVALUATION IS THE PROCESS OF JUDGING PERFORMANCE WITH REFERENCE TO AN ESTABLISHED STANDARD (CRITERION) • FORMATIVE EVALUATION REQUIRES SELECTION OF WELL-DEFINED PROGRAM OBJECTIVES AND ESTABLISHMENT OF REALISTIC STANDARDS • VALUE OF FORMATIVE EVALUATION IS THAT IF IT SIGNALS THAT SOMETHING IS WRONG, ACTION CAN STILL BE TAKEN TO ADJUST AND IMPROVE THE PROGRAM
PROGRAM EVALUATION • SUCCESS OF A PROGRAM IS REFLECTED IN TERMS OF HOW WELL A PROGRAM ACHIEVES ITS BROAD, OVERALL OBJECTIVES • SCHOOL PERFORMANCES ARE OFTEN COMPARED TO NATIONAL, STATEWIDE, OR LOCAL NORMS • IN FITNESS PROGRAMS, PARTICIPANT PERFORMANCE IS OFTEN COMPARED TO NATIONAL OR LOCAL STANDARDS OR PERHAPS TO LONG-TERM EXERCISE ADHERENCE PATTERNS
PROGRAM IMPROVEMENT • EVALUATION IS A DYNAMIC DECISION-MAKING PROCESS THAT WORKS TOWARD PROGRAM IMPROVEMENT • FORMATIVE EVALUATION LEADS TO HIGHER-LEVEL ACHIEVEMENT OF OBJECTIVES EVALUATED SUMMATIVELY • PRIMARY OBJECTIVE OF PROGRAM DEVELOPERS SHOULD BE IMPROVED PARTICIPANT PERFORMANCE OVER TIME
COMMENTS OR QUESTIONS?? THANK YOU, THANK YOU VERY MUCH!!