Personality assessment

Recent survey of practicing Ph.D.s, Psy.D.s, and Ed.D.s revealed that only 32% use personality tests and only 43% do treatment planning. De-emphasis in personality training occurred at the same time as the Mischel shock in 1968, so clinicians trained in the late 1960s and 1970s did not value personality assessment. Today, treatment planning based on assessments is essential both from an ethical standpoint and for insurance reimbursement.

Objective assessments? • Personality assessment is subjective - for the most part it is, though subjective doesn't necessarily mean inaccurate or even less accurate. • How can personality assessment be more objective? • assess any biases and correct for them (lie, defensiveness) • find a method to avoid such biases • look for convergence with reports from others • assess with instruments that have low face validity and look for consistent patterns (though this only really addresses intentional faking) • Personality assessment is used to further describe the client, just as a diagnosis does (note that you would not say that depression is causing the patient's behaviors; you merely use the term to summarize a cluster of behaviors. The diagnosis itself also does not necessarily imply a causal mechanism or an explanation - those from different perspectives would define it differently) • e.g., if someone is depressed it could be explained biologically, cognitively, behaviorally, or even in psychodynamic terms

The structure of personality • Personality involves stable patterns of behavior, affect, and cognitions. So how stable is stable? (states vs. traits) • Levels of analysis • 1. factors - groups of traits that show better global predictive utility (e.g., Big 5 of N, E, O, A, C; The Big 3 of N, E, P; Big 2) • 2. traits - clusters of consistent individual behaviors • 3. habits - consistent (over time) individual behaviors • 4. single acts - individual behaviors • All levels are used to predict future behavior with the top being the most robust • Consider this model when recommending or implementing change in clients

Predicting behavior • Difficult to predict specific single behaviors from global trends (Epstein, 1983) • For clinical evaluations, if the context of interest is known, then you may want to trade off generalizability for a specific prediction • e.g., Pt.’s test scores indicate that he is generally impulsive. This may be exacerbated when in the company of other individuals who are also impulsive and when the individual is drinking, as alcohol minimizes any inhibition processes that he might have. This substantially increases the likelihood that he will act impulsively when...

Readings/Discussion • I will present all of the readings (see PowerPoint slides) • Read material in advance and know your MMPI • You will generate 3 questions for every reading, at least one of which will be an open-ended question leading to class discussion. • Two scheduled “debates”: • Should we use a unique or standardized test battery? (Pros and Cons) • Should we use projective tests? (Are projectives tests or techniques?)

Axis I and II • Personality addresses both AXIS I and AXIS II disorders. • What are some AXIS I disorders that might be related to personality traits? e.g., • depression and NA/neuroticism • anxiety and NA/neuroticism • impulse control disorders & extraversion/sensation seeking • AXIS II personality disorders explicitly link up with personality assessments (video & DSM-IV) • Cluster A (odd): Paranoid, Schizoid, Schizotypal • Cluster B (emotional): ASPD, Borderline, Histrionic, Narcissistic • Cluster C (anxious): Avoidant, Dependent, Obsessive-Compulsive • PD NOS – features of several Dx, but does not meet criteria for any one.

Selecting a test battery (see Beutler, 1995) • What is the referral question? • Single most important determinant • Are there any limiting factors with regard to the client? • Context of the evaluation? (work, school, hospital, etc.) • Follow up assessment relevant to trait findings (e.g., patients who show impulse control problems should also be assessed for potential for acting out violently) • Problem focused or broad, multipurpose battery • Nomothetic (allows for normative evaluations) or ipsative (allows for the evaluation of the individual) analysis

Next Class Debate: Pros and cons of using a standardized test battery vs. a unique battery to meet the client’s assessment needs. • Use Beutler (1995) reading and any other sources. • 2 page paper and debate • For next Tues: Complete MMPI-2 and score clinical scales

If using qualitative methods, consider: • 1. Method appropriateness – are there quantitative methods that you could use instead? • 2. Openness – make clear the theoretical orientation that undergirds the qualitative assessment • 3. Theoretical sensitivity – use qualitative methods that are based on accepted theories, not your own theories • 4. Bracketing of expectation – you must explicitly state where your conclusions depart from accepted theories • 5. Responsibility – how were the qualitative methods administered and interpreted? • 6. Saturation/generalizability – when assessing traits, sample from a large number and wide range of situations • 7. Verification of methods – cross-validate your methods using other reports and other test material to see if they agree with your conclusions, check whether findings predict outcomes, etc.

If using qualitative methods, consider: (cont) • 8. Grounding – stay close to the data when making interpretations (no big theoretical leaps) • 9. Coherence – do all of the interpretations fit together to make a coherent story? • 10. Believability/usefulness – does the use of the qualitative method provide more info on the client, or just raise more questions? Does it result in a believable narrative? • 11. Intelligibility – is the report readable and jargon free?

MMPI (Hathaway & McKinley, 1943) • 10 clinical scales and 3 validity scales • Empirical scale development with items selected based on their ability to differentiate normals from a target group (another clinical group with similar symptoms was sometimes also employed) • Clients should be 18 or older with at least a 6th grade education • Generally lower face validity (breaks with tradition of items that clearly sample the domain of interest); most relevant for clinical population

MMPI development • Item pool derived from psychological and psychiatric reports, textbooks, previous scales, etc. • Criterion group composition • Minnesota normals – 724 relatives and visitors of patients at the U. of M. Hospitals, 265 recent high school grads, 265 administration workers, and 254 medical patients • Clinical groups – 221 patients representing the major psychiatric categories (excludes those with multiple diagnoses, or questionable diagnoses) • Item analysis to identify those items differentiating the clinical and normal groups

MMPI development – cont. • The items that could differentiate were then cross validated with new groups of normals and patients • Later developed two non-clinical scales • M/F – initially intended to identify male homosexuals; later augmented with broader items • Si – derived from an introversion/extraversion scale and cross validated by predicting involvement in college activities in a second sample (all female college students) • Validity scales were either derived rationally (L & K) or from base rates in the normal group (F)

Utility of the MMPI • Not considered a diagnostic inventory (as was originally intended) • Ineffective at differential diagnosis (based on how it was originally developed) • Numerical scale labels were intended to further minimize the connection with a specific diagnostic label

Some problems with MMPI • Method of determining the criterion group • The PIGs were not a truly random group (relatives and friends of those in the hospital – though largely the medical patients); convenient • Criterion and PIGs were largely from the midwest, in the late 1930s/early 1940s • Utility of some of the scales as it matched diagnostic concerns of that era, dated and culture-specific item content, and representativeness of the norm group.

MMPI vs. MMPI-2 (1989) • MMPI was the most widely used personality test in all pops (though only validated for inpatient adult samples) • MMPI validation and norm samples were ones of convenience with limited variability on education (M=8 years), coming from a rural background in the midwest • Normative data collected in the 1930s • Clinical cut-off now defined by t-score of 65 vs. 70 on the MMPI

MMPI vs. MMPI-2 • Advantages of updating the test • more representative norms (based on projected census data) • relevance of the items • language employed for the items (both temporally laden references like “drop the hanky”, and gender biases in item content) • addition of new scales of relevance today • Uniform T-score transformation now used so that T-scores reflect percentile ranks that are the same across all clinical scales
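
The cut-off change noted on the slides (T of 65 on the MMPI-2 vs. 70 on the MMPI) can be illustrated with a minimal sketch. It uses the ordinary linear T-score (mean 50, SD 10), which is not the uniform T transformation the MMPI-2 applies to its clinical scales. The function names, the normative mean and SD, and the reading of the cut-off as "at or above" are assumptions made for illustration only.

```python
def linear_t_score(raw, norm_mean, norm_sd):
    """Linear T-score: rescales a raw score to mean 50, SD 10 in the norm group.

    Note: this is the plain linear transformation, not the MMPI-2 uniform T.
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return 50 + 10 * (raw - norm_mean) / norm_sd


def is_clinically_elevated(t_score, version="MMPI-2"):
    """Apply the cut-offs named on the slides (assumed to mean 'at or above')."""
    cutoffs = {"MMPI-2": 65, "MMPI": 70}
    return t_score >= cutoffs[version]


# Hypothetical scale with normative mean 12 and SD 4 (illustrative numbers only).
t = linear_t_score(raw=19, norm_mean=12, norm_sd=4)
print(t)                                    # 67.5
print(is_clinically_elevated(t, "MMPI-2"))  # True
print(is_clinically_elevated(t, "MMPI"))    # False
```

The same score counts as elevated under the MMPI-2 cut-off but not under the original one, which is the practical consequence of lowering the threshold.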

MMPI vs. MMPI-2 • Disadvantages to all updates • over 20,000 published studies no longer apply • MMPI-2 must revalidate all of the scales • inability to make comparisons with adolescent scores (MMPI-2 vs. MMPI-A) • Many of the new scales are very short and lack appropriate psychometric properties • How often should we redevelop or renorm the scale?

MMPI-2 (1989): 567 items • Norm group = 2,600 community based subjects • 1,138 m & 1,462 f, aged 18-85 (M = 41, SD = 15.3), education 3 yrs - 20+, 61% married, median incomes $25,000-$35,000, 3% of m and 6% of f receiving mental health treatment • 81% Caucasian, 12% A-A, 3% Hispanic, 3% Native American, 1% Asian-American

Validity scales • Assumption that the clinical population will not be able to answer forthrightly • Lie – naive or unsophisticated lying (low SES and education) • K – less obvious defensiveness (high SES and education); defensiveness is a component of all responding • F – answering questions in such a way so as to be different from 90% or more of the population (non-normative responses); see fake bad/fake good profiles • F – K Index – can be used to indicate fake bad, with larger numbers making it more likely (little evidence to suggest that fake good can be detected); see p. 38
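
The F – K index is simply the raw F score minus the raw K score. The sketch below computes it and ranks a few protocols by it. The raw scores are hypothetical, and no cut-off is applied because the slides do not give one (only that larger values make fake bad more likely).

```python
def f_minus_k(f_raw, k_raw):
    """F - K index: raw F score minus raw K score."""
    return f_raw - k_raw


# Hypothetical (F raw, K raw) pairs for three protocols; illustrative only.
protocols = {"A": (6, 15), "B": (14, 12), "C": (25, 8)}

# Rank protocols from most to least suggestive of fake bad (largest index first).
ranked = sorted(protocols, key=lambda name: f_minus_k(*protocols[name]), reverse=True)
for name in ranked:
    print(name, f_minus_k(*protocols[name]))
```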

Clinical Scales • 1. Hs - exaggerated concerns re: physical illness, or tendency to report symptoms • 2. D - Clinical dep; unhappy & pessimistic about the future • 3. Hy - conversion reactions (substitute illness for emotions) • 4. Pd - History of delinquency, antisocial behavior (non-conventional re: moral standards)

Clinical scales - continued • 5. Mf - prototypical gender identity (military recruits, stewardesses, homosexual male students) • 6. Pa - paranoid symptoms (ideas of reference, persecution, grandeur) • 7. Pt - anxious, obsessive-compulsive, guilt ridden, self-doubts • 8. Sc - thought disorder, perceptual abnormalities (various types of Schiz.)

Clinical Scales - continued • 9. Ma - exhibition of mania, elevated mood, excessive activity, distractibility (possible manic-depression or BP II) • 10. Si - college students scoring in the extreme range on introversion-extraversion • Costa & McCrae (1990) suggest that the MMPI-2 won’t work in the normal pop., as people don’t respond “passively” to items

New Validity Indexes • Basic validity comes from L, F, & K • VRIN (variable response inconsistency) • 67 pairs of items that should be answered similarly or in the opposing direction. Client gets a point for each inconsistent response. • A completely random response set results in T scores of 96 for m and 98 for f (>80 invalid) • acquiescent responding T = 50

New Validity – cont. • TRIN (true response inconsistency) • 23 pairs of items that are opposite in content • either T/T or F/F to assess acquiescent or non-acquiescent responding • larger raw scores = true responding while smaller raw scores = false responding • raw scores should be between 6 and 12 in order to consider the profile valid • Fb - back infrequency items for latter part
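
The VRIN and TRIN rules of thumb from the slides (VRIN T above 80 is invalid; TRIN raw scores should fall between 6 and 12) can be written as a small screening sketch. The class, function, and flag wording are hypothetical, and the scores are assumed to have been computed elsewhere.

```python
from dataclasses import dataclass


@dataclass
class ConsistencyScores:
    vrin_t: float   # VRIN T-score
    trin_raw: int   # TRIN raw score


def screen_consistency(scores):
    """Return a list of validity flags; an empty list means no flag was raised."""
    flags = []
    if scores.vrin_t > 80:
        flags.append("VRIN T > 80: inconsistent (possibly random) responding")
    if scores.trin_raw > 12:
        flags.append("TRIN raw > 12: indiscriminate 'true' responding")
    elif scores.trin_raw < 6:
        flags.append("TRIN raw < 6: indiscriminate 'false' responding")
    return flags


# A random response set (VRIN T of 96 for men, per the slide) is flagged.
print(screen_consistency(ConsistencyScores(vrin_t=96, trin_raw=9)))
# A consistent protocol raises no flags.
print(screen_consistency(ConsistencyScores(vrin_t=50, trin_raw=9)))
```

A flagged protocol would normally be set aside before any clinical scale is interpreted.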

Coding the Profile • List scale # codes in order of their T-score elevations (from highest to lowest) • usually only interpret 4 scale codes and order does not matter • Welsh coding system involves adding symbols to numerical scale codes • e.g., T-scores by scale: L = 57, F = 75, K = 43, 1 = 69, 2 = 88, 3 = 75, 4 = 94, 5 = 52, 6 = 81, 7 = 75, 8 = 79, 9 = 59, 0 = 65 • Welsh: 4268371095 FLK

Codes (listed to the right) • ** 100-109, * 90-99, " 80-89, ' 70-79, + 65-69, - 60-64, / 50-59, : 40-49, # 30-39 • Some coding forms use ! to denote scores of 110-119 and !! for 120 or greater • Underline identical T-scores (and list in ascending order) as well as those within one point of each other • e.g., 4*26"837'10+95/ F'L/K:
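
The coding steps above can be sketched in a few lines, using the T-scores from the coding example. This is only an illustration under stated assumptions: the symbol for 40-49 is read as ":", ties are broken by ascending scale order (with 0 last), symbols for empty ranges are omitted as in the slide's example (some coding guides print them), and underlining is left out because it cannot be shown in plain text. All names are hypothetical.

```python
# (lower bound, upper bound, symbol) for each T-score band, highest first.
BANDS = [
    (120, 999, "!!"),
    (110, 119, "!"),
    (100, 109, "**"),
    (90, 99, "*"),
    (80, 89, '"'),
    (70, 79, "'"),
    (65, 69, "+"),
    (60, 64, "-"),
    (50, 59, "/"),
    (40, 49, ":"),
    (30, 39, "#"),
]


def band_symbol(t):
    """Return the elevation symbol for an integer T-score."""
    for low, high, symbol in BANDS:
        if low <= t <= high:
            return symbol
    return ""  # below 30: no symbol is listed on the slide


def welsh_code(scores, order):
    """scores: scale label -> T-score; order: labels in tie-break order."""
    ranked = sorted(order, key=lambda s: (-scores[s], order.index(s)))
    out = []
    for i, scale in enumerate(ranked):
        out.append(scale)
        sym = band_symbol(scores[scale])
        nxt = band_symbol(scores[ranked[i + 1]]) if i + 1 < len(ranked) else None
        if sym != nxt:  # write the symbol after the last scale in its band
            out.append(sym)
    return "".join(out)


clinical = {"1": 69, "2": 88, "3": 75, "4": 94, "5": 52,
            "6": 81, "7": 75, "8": 79, "9": 59, "0": 65}
validity = {"L": 57, "F": 75, "K": 43}

print(welsh_code(clinical, list("1234567890")))  # 4*26"837'10+95/
print(welsh_code(validity, ["L", "F", "K"]))     # F'L/K:
```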

MMPI-2 practice case: M.S. • Integrate the MMPI-2 data with the client information (vs. laundry list). Note: profile valid. • e.g., profile 3-2/2-3 should revolve around the discussion of depression and the manifestation of symptoms (physical symptoms tend to be substituted) • How does this relate to M.S.? • Recent loss, seeing her physician, isolation • What does the 8 (or 2-3-8) tell you? • How might psychotic symptoms relate to M.S.? • Confusion from malnutrition, confusion as a result of depression, her age re: dementia? All are possible

M.S. - continued • Include discussion of (or section on) prognosis, recommendations, and diagnosis • AXIS I: 296.24, Major depression, single episode, with psychotic features • AXIS II: No diagnosis (or deferred) • AXIS III: Malnutrition, dehydration, poor hygiene & personal care • AXIS IV: Death of spouse (Severity: extreme; acute event) • AXIS V: GAF: Current, 24; highest past year, 52

MMPI-2 with other pops. • MMPI was originally developed using Caucasian groups of patients • Although some research has shown mean score differences between majority and minority groups, this is less relevant to the issue of whether there is differential predictive validity (few studies on this) • Hall, Bansal, & Lopez, 2000, have conducted a meta-analysis of 30 years research on minority groups and the MMPI (both versions)

Hall et al., 2000 - summary • AA – first note that cultural identification moderates all findings (cf. acculturation) • Inconsistent findings re: mean differences, with F, 8, & 9 sometimes higher by approximately 5 T-score points • Many matched-group studies of patients have found no differences, though Ns were small (meaning what?) • Generally no differences in predictive validity that achieve statistical or clinical significance, and any differences can be attributed to SES and age • MMPI-2 has representative norms • Minimal information on the supplemental scales and even less for the content scales

Hall et al., 2000 – sum cont • Hispanics likewise show few differences from Caucasians • Possible differences for scales 3 and 0, with Hispanics scoring higher on 3 and lower on 0, but these effects were small with minimal clinical or statistical sig. • Much stronger effect for acculturation in this ethnic group • Few studies on Native Americans, but they show this pop. to score slightly higher on most scales • Few studies for Asian Americans, and they show slight elevations for scales F, 2, & 8. • Generally valid to use for these pops given appropriate acculturation and understanding of the language

Other populations • Given its original construction, there should be no problems using the MMPI in medical settings • Medical problems do not necessarily result in higher scores (i.e., more distress) • In substance abuse settings, no profile emerged to detect substance abuse, but scale 4 was a good predictor (see also the supplemental scales) • We will discuss forensic applications later in the semester (see chapter 13) • MMPI-2 can be used in non-clinical settings to screen for psychopathology, but there are some concerns. • False positives are more common • Has not been validated to predict success in other settings (e.g., jobs) which is true of most personality tests (predict interest)

MMPI-A (1992) • Do we need a different inventory for adolescents? Why? Scales of concern? • M/F for adolescents may be less defined • Theoretically Pd is thought to be elevated, but actually it tends to be lower • Personality is less stable overall so we need different norms to better interpret scores and relevant items for this age group • Valid for those aged 14-18 (for 18 y.o., the decision is based on life circumstances; e.g. at home? working?) • Important to score on both adult and adolescent norms as there can be substantial differences (T-score shifts of 15 points) • 478 items (some new some from the original inventory) • written & auditory forms both in English and Spanish

MMPI-A • Includes all of the clinical, & some new supplemental & content scales. So we use basically the same scales but different descriptors (i.e., a high score on Hs will not mean exactly the same thing for the MMPI-A; e.g., Pd equates more with acting out) • Biggest change was with the F scale since it is a norm defined scale (we need new norms) • Norms: 805 boys & 815 girls aged 14-18 solicited randomly from schools in 7 states. Represents the U.S. for SES and ethnicity (again minimal diffs for ethnicity) • Change from MMPI which had separate norms for different adolescent age groups (now only one) • F scale now has 2 parts: F1 = 1st part of test, F2 = 2nd part (F=total)

MMPI-A: New scales • New Supplemental scales: • Alcohol/drug problem proneness (PRO) – empirically derived to assess the likelihood of alcohol or other drug problems. Items differentiate adolescents in tx from those having other psychological problems • Alcohol/drug problem acknowledgement (ACK) – face valid items that reflect the admission of problems • Immaturity (IMM) – reporting behaviors, attitudes, and perceptions that reflect immaturity (e.g., poor impulse control, judgment, and self-awareness). Items predict academic problems and cognitive limitations. • Check for diagnoses such as oppositional-defiant, conduct disorder, and in adulthood ASPD

MMPI-A Psychometrics • For the most part, the psychometric properties of the MMPI-A are sound. The reliability values are lower than the MMPI-2 values, but still within acceptable limits. • Why might there be less temporal stability in the MMPI-A? • General interpretative data from the MMPI-2 can be generalized to the MMPI-A, but this data should be considered in light of the client’s position in life (i.e., consider how the scores relate to school life, problems with parents, need for independence, etc.) • Note: no K-correction for clinical scales even though a defensiveness score is calculated. So what are the clinical scale implications for a high K?

MCMI-III (Millon, 1990) • 175 item scale assessing problematic personality styles and classic psychiatric disorders (drawn from the DSM) • In contrast to the MMPI, this scale was derived theoretically to match the nosology (taxonomy) of the DSM to facilitate diagnosis and intervention planning. Assumes that any assessment is theory driven (vs. MMPI which tried to be atheoretical) • The theory is grounded in evolutionary principles assessing 4 spheres: existence (from serendipity to an organized structure), adaptation (survival), replication (reproductive styles that maximize diversity), and abstraction (the emergence of competencies to foster planning). • Scored according to a polarity model. e.g., self vs. other orientation (reproduction), pleasure vs. pain (existential, or aim of, existence) • Illustration: Schizoid is marked by deficits in both pleasure and pain as indicated by the lack of emotion and apathy

MCMI-III properties • A brief inventory (175 items) that takes only 30 minutes to complete • 3 modifier scales that correspond to the validity scales • Disclosure = defensiveness • Desirability = favorable response set • Debasement = lying • 11 clinical personality patterns: schizoid, avoidant, depressive, dependent, histrionic, narcissistic, antisocial, aggressive (sadistic), compulsive, passive-aggressive, self-defeating • 3 scales denoting severe personality patterns: schizotypal, borderline, paranoid • 7 clinical syndromes: anxiety, somatoform, bipolar, dysthymia, alcohol dependence, drug dependence, PTSD • 3 severe syndromes: thought disorder, major depression, delusional disorder

MCMI-III- continued • Scales interpreted based on base rates for each dx and it assumes that disorders are interconnected (consistent with comorbidity data) • Initial studies had classification rates of 90%, but follow-up studies have been much lower (50% or less) • Validity data has been equivocal and the reliability data is likewise lower than the MMPI-2 (these are related, and both linked to number of items)

CPI (Harrison Gough) • Developed at the same time as the MMPI and served as the personality test for the normal population (MMPI for the clinical pop.). Drew from a similar item pool. • 480 T/F questions (some overlap with MMPI and others are new) • Emphasizes more positive/normal aspects of personality • 3 validity scales: well being (normals asked to fake bad), good impression (normals asked to fake good), communality (popular/obvious responding that may reflect defensiveness and conformity) • 15 general scales assessing a wide range of traits such as intellectual efficiency, capacity for status, achievement via conformity • Grouped into 4 quadrants (factors): Norm favoring vs. norm doubting and externalizing vs. internalizing

CPI - continued • CPI was revised in 1986 with norms based on 13,000 males & females • Most commonly used personality inventory overall • It has been replaced by the NEO-PI as most common in the last 15 years. • Psychometrically sound (reliability and validity coefficients are high and stable for different pops), but a very long instrument. • Also some question as to the need for validity scales in the normal pop. • Burisch suggests these are unnecessary provided there is: 1) no reason to lie, 2) knowledge of the construct(s), and 3) self awareness.

NEO-PI (Costa & McCrae, 1985, 1992) • Based on the empirically derived 5 factor model • Assumption that 5 factors can represent all of normal personality • Evaluated this model in a variety of contexts, with samples from all over the world and in different languages • Assumes that language is the best place to start examining how to describe behavior (132 Eskimo words for “snow” indicates it is a meaningful construct) • Neuroticism (emotional stability), extraversion, openness to new experience, agreeableness (quality of interactions) and conscientiousness (dutiful, organized). • 5 factors have been recovered from other inventories like the Myers-Briggs, 16PF, etc.

NEO-PI • Full version is 240 items and has 6 facets for each of the 5 factors • Short form (NEO-FFI) has 60 items and provides factor scores only • Norms are available for adults, college students and adolescents (though minimal differences between the latter two groups) • Strong psychometric properties including very stable retest coefficients, internal reliability, and validation against other personality scales. • Can be used to predict job interests (though vocational inventories such as the Strong Interest Inventory are better suited for this), but it does not predict job success (same is true for interest inventories) • Often used for intuitive purposes and not empirically validated purposes (e.g., assume that a manager should be low on N and high on C vs. empirically testing this assumption with current managers)

Measures of Affect • Note: The EPI (Eysenck) likewise measures personality (extraversion and neuroticism) in the normal population, and these two factors are usually the first two to emerge in factor analysis. • These factors correspond to the Big Two affect constructs (PA and NA) • Note: most of these measures do not address validity of responding • Nevertheless, research suggests that these scales tend to be fairly accurate and reflect actuarial rates for affective disorders (5-9% of adult women and 2-3% of adult men) • BDI – published in 1961 and revised in ’74, ’78, and ’96. • Among the most commonly used inventories, with comprehensive manuals published in 1987, 1993, and 1996 (BDI-II) • Normed for adolescents and adults aged 13 and older. 21 items with response options arranged in a Guttman approach (increasing order of severity) • Suicide potential in items 2 and 9. For dx of Depression see neurovegetative items

BDI - continued • Internally consistent and reliabilities range from .48 to .86 for periods ranging from several hours to four weeks • Why are retest coefficients smaller? • No way to correct for faked scores • Validated extensively for use in clinical settings • BDI-II validated on 500 outpatients drawn from across the country and a student sample of 120 • 1 week retest was .93 and coefficient alphas were .92 or higher • Average BDI-II scores are 3 points higher than the original BDI • BDI-II time frame for each item focuses on last two weeks to match the DSM criteria

BAI (Beck & Steer, 1993) • 21 item symptomatic inventory • Items rated on a 0-3 scale • Validated for use with inpatient (N = 1,086), outpatient (N = 160) and college student samples (N = 65). • Shows convergent validity with other measures of anxiety and some discriminant validity with depression measures (though they are correlated – sharing 10-25% variance) • Rapid self-report tool
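
Shared variance is the squared correlation, so the 10-25% overlap between anxiety and depression measures can be converted back to correlation size with a short sketch. The function names are hypothetical.

```python
import math


def shared_variance(r):
    """Proportion of variance two measures share, given their correlation r."""
    return r ** 2


def correlation_from_shared_variance(proportion):
    """Size of the correlation implied by a proportion of shared variance."""
    return math.sqrt(proportion)


print(round(correlation_from_shared_variance(0.10), 2))  # 0.32
print(round(correlation_from_shared_variance(0.25), 2))  # 0.5
```

In other words, the overlap reported on the slide corresponds to correlations of roughly .32 to .50 between the anxiety and depression measures.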

CES-D (Radloff, 1977) • Developed by NIMH for use as a screening tool in the general population (also in college and geriatric pops) • Optimal test for this purpose in this population • 20 Likert-type items focusing on the last week • Better than the BDI-II at differentiating among those experiencing lower levels of depression • Internal consistency is high (.85 in general pop. and .90 in patient samples). • Retest figures tend to be low (.48) but this is less relevant for this construct • A score of 16 is the clinical cutoff and it assesses depressed affect, positive affect, somatic activity, and interpersonal functioning
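
A minimal scoring sketch for the 20 items and the cutoff of 16 follows. Two details are not on the slide and are assumptions taken from the commonly published scoring: items are rated 0-3, and the four positive-affect items (4, 8, 12, 16) are reverse scored. The cutoff is read as "16 or higher". All names are hypothetical.

```python
REVERSE_SCORED = {4, 8, 12, 16}  # positive-affect items, 1-based (assumption)
CUTOFF = 16                      # clinical cutoff named on the slide


def cesd_total(responses):
    """Sum 20 item ratings (0-3), reverse scoring the positive-affect items."""
    if len(responses) != 20:
        raise ValueError("CES-D has 20 items")
    total = 0
    for number, value in enumerate(responses, start=1):
        if value not in (0, 1, 2, 3):
            raise ValueError(f"item {number}: rating must be 0-3")
        total += (3 - value) if number in REVERSE_SCORED else value
    return total


def meets_cutoff(responses):
    """True when the total is at or above the cutoff (assumed reading)."""
    return cesd_total(responses) >= CUTOFF


all_ones = [1] * 20
print(cesd_total(all_ones))    # 24
print(meets_cutoff(all_ones))  # True
```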

MAACL-R (Zuckerman & Lubin, 1985) • Originally published in 1965 and revised in ’85 (132 checklist-type items) • Normed on over 1,500 adults and 400 adolescents (approx. 90% Caucasian, 10% Black) • Scores for Anxiety, Depression, Hostility, PA, and SS (the latter has very poor internal reliability) • A rapid assessment but not as good psychometrically • Can be used to evaluate states or traits, and reliability figures are better (though not very high) for the latter • Scales do not correlate with social desirability and do converge with MMPI ratings

Behavioral Assessments • Assumption: behaviors can reflect cognitions and emotions (e.g., FACS; Ekman & Friesen, 1978) • Proliferation of behavioral assessments with limited validity due to the assumption that behavior can be easily defined and that it represents a meaningful (typically underlying) construct e.g., sweating, pacing • How to improve behavioral assessments? • Identify the actual behavior being assessed (lip turned downward vs. sadness) • Habitual behaviors may indicate underlying condition • Acknowledge role of both traits and situations
