Budgett, J., Brown, A., Daley, S., Page, T., Banerjee, S., Livingston, G., & Sommerlad, A. (2019). The social functioning in dementia scale (SF-DEM): Exploratory factor analysis and psychometric properties in mild, moderate and severe dementia. Alzheimer’s & Dementia: Diagnosis, Assessment & Disease Monitoring, 11, 45-52. doi:10.1016/j.dadm.2018.11.001
Introduction: The psychometric properties of the social functioning in dementia scale (SF-DEM) over different dementia severities are unknown.
Methods: We interviewed 299 family carers of people with mild, moderate or severe dementia from two UK research sites; examined acceptability (completion rates); conducted exploratory factor analysis and tested each factor’s internal consistency and construct validity.
Results: 285/299 (95.3%) carers completed questionnaires. Factor analysis indicated three distinct factors with acceptable internal consistency: spending time with other people, correlating with overall social function (r=0.56, p<0.001) and activities of daily living (ADLs) (r=-0.48, p<0.001); communicating with other people correlating with ADLs (r=-0.66, p<0.001); and sensitivity to other people correlating with quality of life (r=0.35, p<0.001) and inversely with neuropsychiatric symptoms (r=-0.45, p<0.001). The three factors’ correlations with other domains were similar across all dementia severities.
Discussion: The SF-DEM carer version measures three social functioning domains and has satisfactory psychometric properties in all severities of dementia.
Brown, A., & Fong, S. (2019). How valid are 11-plus tests? Evidence from Kent. British Educational Research Journal, 45, 1235-1254. doi:10.1002/berj.3560
Despite profound influence of selection-by-ability on children’s educational opportunities, empirical evidence for validity of 11-plus tests is scarce. This study focused on secondary selection in Kent, the largest grammar school area in England. We analysed scores from the ‘Kent Test’ (the 11-plus test used in Kent), Cognitive Assessment Tests (CAT4), and Key Stage 2 Standardised Assessment Tests (KS2) using longitudinal data of two year cohorts (N1=95, N2=99) from one primary school. All the assessment batteries provided highly overlapping information, with the decisive effect of content area (e.g. verbal versus maths) over task type (e.g. knowledge-loaded versus knowledge-free). Thus, the value in differentiating ‘pure’ (i.e. knowledge-free) ability in 11-plus testing is questionable. KS2 and Kent Test aggregated scores overlapped very strongly, sharing nearly 80% of variance; moreover, KS2-based eligibility decisions had higher sensitivity than the Kent Test in predicting the actual admissions to grammar schools after Head Teacher Assessment (HTA) appeals have taken place. Finally, the use of multiple pass marks for each Kent Test component as well as the total score was found to increase the chance of false rejection. This study provides preliminary evidence that national examinations could be a good basis for selection to grammar schools; it challenges the use of complex admission rules and multiple decisions and questions the value of 11-plus tests.
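The two decision-accuracy ideas in this abstract, sensitivity against actual admissions and the extra false rejections produced by multiple pass marks, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the pupil scores, the pass marks (`TOTAL_CUT`, `COMPONENT_CUT`) and the function names are hypothetical and are not taken from the study or from the Kent Test rules.

```python
def sensitivity(predicted, actual):
    """Share of actually admitted pupils whom the rule flagged as eligible."""
    true_pos = sum(1 for p, a in zip(predicted, actual) if p and a)
    actual_pos = sum(1 for a in actual if a)
    return true_pos / actual_pos if actual_pos else float("nan")


def eligible_total_only(scores, total_cut):
    """Single decision: only the aggregated score has to reach the pass mark."""
    return sum(scores) >= total_cut


def eligible_multiple_cuts(scores, total_cut, component_cut):
    """Multiple decisions: the total AND every component must reach a pass mark."""
    return sum(scores) >= total_cut and all(s >= component_cut for s in scores)


# Hypothetical pupils: (three component scores, actually admitted?)
pupils = [
    ((120, 118, 115), True),
    ((130, 125, 104), True),   # high total, one component just under its cut
    ((100, 102, 99), False),
    ((110, 108, 109), False),
]
TOTAL_CUT, COMPONENT_CUT = 330, 106  # made-up pass marks

actual = [admitted for _, admitted in pupils]
single = [eligible_total_only(s, TOTAL_CUT) for s, _ in pupils]
multi = [eligible_multiple_cuts(s, TOTAL_CUT, COMPONENT_CUT) for s, _ in pupils]

print(sensitivity(single, actual), sensitivity(multi, actual))
```

In this toy data the second pupil passes on the total but fails one component, so adding component pass marks lowers sensitivity. This is the false-rejection mechanism the abstract describes, shown here only under the invented numbers above.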
Brown, A., Page, T., Daley, S., Farina, N., Bassett, T., Livingston, G., Budgett, J., Gallaher, L., Feeney, I., Murray, J., Bowling, A., Knapp, M., & Banerjee, S. (2019). Measuring the quality of life of family carers of people with dementia: Development and validation of C-DEMQOL. Quality of Life Research, 28, 2299-2310. doi:10.1007/s11136-019-02186-w
Purpose. We aimed to address gaps identified in the evidence base and instruments available to measure the quality of life (QOL) of family carers of people with dementia, and develop a new brief, reliable, condition-specific instrument.
Methods. We generated measurable domains and indicators of carer QOL from systematic literature reviews and qualitative interviews with 32 family carers and 9 support staff, and two focus groups with 6 carers and 5 staff. Statements with five tailored response options, presenting variation on the QOL continuum, were piloted (n = 25), pre-tested (n = 122) and field-tested (n = 300) in individual interviews with family carers from North London and Sussex. The best 30 questions formed the C-DEMQOL questionnaire, which was evaluated for usability, face and construct validity, reliability, and convergent/discriminant validity using a range of validation measures.
Results. C-DEMQOL was received positively by the carers. Factor analysis confirmed that C-DEMQOL sum scores are reliable in measuring overall QOL (omega = 0.97) and its five subdomains: ‘meeting personal needs’ (omega = 0.95); ‘carer wellbeing’ (omega = 0.91); ‘carer-patient relationship’ (omega = 0.82); ‘confidence in the future’ (omega = 0.90), and ‘feeling supported’ (omega = 0.85). The overall QOL and domain scores show the expected pattern of convergent and discriminant relationships with established measures of carer mental health, activities, and dementia severity and symptoms.
Conclusions. The robust psychometric properties support the use of C-DEMQOL in evaluation of overall and domain-specific carer QOL; replications in independent samples and studies of responsiveness would be of value.
Daley, S., Murray, J., Farina, N., Page, T., Brown, A., Bassett, T., Livingston, G., Bowling, A., Knapp, M., & Banerjee, S. (2018). Understanding the quality of life of family carers of people with dementia: Development of a new conceptual framework. International Journal of Geriatric Psychiatry, 1-8. doi:10.1002/gps.4990
Dementia is a major global health and social care challenge, and family carers are a vital determinant of positive outcomes for people with dementia. This study’s aim was to develop a conceptual framework for the Quality of Life (QOL) of family carers of people with dementia.
We studied family carers of people with dementia and staff working in dementia services iteratively using in-depth individual qualitative interviews and focus groups discussions. Analysis used constant comparison techniques underpinned by a collaborative approach with a study-specific advisory group of family carers.
We completed 41 individual interviews with 32 family carers and 9 staff and two focus groups with 6 family carers and 5 staff. From the analysis, we identified 12 themes that influenced carer QOL. These were organised into three categories focussing on: person with dementia, carer, and external environment.
For carers of people with dementia, the QOL construct was found to include condition-specific domains which are not routinely considered in generic assessment of QOL. This has implications for researchers, policy makers and service providers in addressing and measuring QOL in family carers of people with dementia.
Guenole, N., Brown, A., & Cooper, A. (2018). Forced Choice Assessment of Work Related Maladaptive Personality Traits: Preliminary Evidence from an Application of Thurstonian Item Response Modeling. Assessment, 25, 513-526. doi:10.1177/1073191116641181
This article describes an investigation of whether Thurstonian item response modeling is a viable method for assessment of maladaptive traits. Forced-choice responses from 420 working adults to a broad-range personality inventory assessing six maladaptive traits were considered. The Thurstonian item response model’s fit to the forced-choice data was adequate, while the fit of a counterpart item response model to responses to the same items but arranged in a single-stimulus design was poor. Mono-trait hetero-method correlations indicated corresponding traits in the two formats overlapped substantially, although they did not measure equivalent constructs. A better goodness of fit and higher factor loadings for the Thurstonian item response model, coupled with a clearer conceptual alignment to the theoretical trait definitions, suggested that the single-stimulus item responses were influenced by biases that the independent clusters measurement model did not account for. Researchers may wish to consider forced-choice designs and appropriate item response modeling techniques such as Thurstonian item response modeling for personality questionnaire applications in industrial psychology, especially when assessing maladaptive traits. We recommend further investigation of this approach in actual selection situations and with different assessment instruments.
Brown, A., & Maydeu-Olivares, A. (2018). Ordinal Factor Analysis of Graded-Preference Questionnaire Data. Structural Equation Modeling: A Multidisciplinary Journal, 25, 516-529. doi:10.1080/10705511.2017.1392247
We introduce a new comparative response format, suitable for assessing personality and similar constructs. In this “graded-block” format, items measuring different constructs are first organized in blocks of 2 or more; then, pairs are formed from items within blocks. The pairs are presented one at a time, to enable respondents to express the extent of preference for one item or the other using several graded categories. We model such data using confirmatory factor analysis (CFA) for ordinal outcomes. We derive Fisher information matrices for the graded pairs, and supply R code to enable computation of standard errors of trait scores. An empirical example illustrates the approach in low-stakes personality assessments and shows that similar results are obtained when using graded blocks of size 3 and a standard Likert format. However, graded-block designs may be superior when insufficient differentiation between items is expected (due to acquiescence, halo or social desirability).
Wetzel, E., Brown, A., Hill, P., Chung, J., Robins, R., & Roberts, B. (2017). The narcissism epidemic is dead; long live the narcissism epidemic. Psychological Science, 28, 1833-1847. doi:10.1177/0956797617724208
Are recent cohorts of college students more narcissistic than their predecessors? To address debates about the so-called “narcissism epidemic,” we used data from three cohorts of students (N1990s = 1,166; N2000s = 33,647; N2010s = 25,412) to test whether narcissism levels (overall and specific facets) have increased across generations. We also tested whether our measure, the Narcissistic Personality Inventory (NPI), showed measurement equivalence across the three cohorts, a critical analysis that had been overlooked in prior research. We found that several NPI items were not equivalent across cohorts. Models accounting for nonequivalence of these items indicated a small decline in overall narcissism levels from the 1990s to the 2010s (d = –0.27). At the facet-level, leadership (d = –0.20), vanity (d = –0.16), and entitlement (d = –0.28) all showed decreases. Our results contradict the claim that recent cohorts of college students are more narcissistic than earlier generations of college students.
Page, T., Farina, N., Brown, A., Daley, S., Bowling, A., Bassett, T., Livingston, G., Knapp, M., Murray, J., & Banerjee, S. (2017). Instruments Measuring the Disease-Specific Quality of Life of Family Carers of People with Neurodegenerative Diseases: A Systematic Review. BMJ Open, 7. doi:10.1136/bmjopen-2016-013611
Objective: Neurodegenerative diseases, such as dementia, have a profound impact on those with the conditions and their family carers. Consequently, the accurate measurement of family carers’ quality of life (QOL) is important. Generic measures may miss key elements of the impact of these conditions so using disease-specific instruments has been advocated. This systematic review aimed to identify and examine the psychometric properties of disease-specific outcome measures of QOL of family carers of people with neurodegenerative diseases (Alzheimer’s disease, other dementias; Huntington’s disease; Parkinson’s disease; Multiple Sclerosis; and Motor Neurone Disease).
Design: Systematic review.
Methods: Instruments were identified using five electronic databases (PubMed, PsycINFO, Web of Science, Scopus and IBSS) and lateral search techniques. Only studies which reported the development and/or validation of a disease-specific measure for adult family carers, and which were written in English, were eligible for inclusion. The methodological quality of the included studies was evaluated using the COSMIN checklist. The psychometric properties of each instrument were examined.
Results: Six hundred and seventy six articles were identified. Following screening and lateral searches, a total of eight articles were included; these reported seven disease-specific carer QOL measures. Limited evidence was available for the psychometric properties of the seven instruments. Psychometric analyses were mainly focused on internal consistency, reliability and construct validity. None of the measures assessed either criterion validity or responsiveness to change.
Conclusions: There are very few measures of carer QOL that are specific to particular neurodegenerative diseases. The findings of this review emphasise the importance of developing and validating psychometrically robust disease-specific measures of carer QOL.
Farina, N., Page, T., Daley, S., Brown, A., Bowling, A., Bassett, T., Livingston, G., Knapp, M., Murray, J., & Banerjee, S. (2017). Factors associated with the quality of life of family carers of people with dementia: A systematic review. Alzheimer’s & Dementia, 13, 572-581. doi:10.1016/j.jalz.2016.12.010
INTRODUCTION: Family carers of people with dementia are their most important support in practical, personal and economic terms. Carers are vital to maintaining the quality of life (QOL) of people with dementia. This review aims to identify factors related to the QOL of family carers of people with dementia.
METHODS: Searches on terms including ‘carers’, ‘dementia’, ‘family’ and ‘quality of life’ in research databases. Findings were synthesised inductively, grouping factors associated with carer QOL into themes.
RESULTS: 909 abstracts were identified. Following screening, lateral searches and quality appraisal, 41 studies (n=5,539) were included for synthesis. Ten themes were identified: demographics; carer-patient relationship; dementia characteristics; demands of caring; carer health; carer emotional wellbeing; support received; carer independence; carer self-efficacy; and future.
DISCUSSION: The quality and level of evidence supporting each theme varied. We need further research on what factors predict carer QOL in dementia and how to measure it.
Brown, A., Inceoglu, I., & Lin, Y. (2016). Preventing Rater Biases in 360-Degree Feedback by Forcing Choice. Organizational Research Methods, 20, 121-148. doi:10.1177/1094428116668036
We examined the effects of response biases on 360-degree feedback using a large sample (N=4,675) of organizational appraisal data. Sixteen competencies were assessed by peers, bosses and subordinates of 922 managers, as well as self-assessed, using the Inventory of Management Competencies (IMC) administered in two formats – Likert scale and multidimensional forced choice. Likert ratings were subject to strong response biases, making even theoretically unrelated competencies correlate highly. Modeling a latent common method factor, which represented non-uniform distortions similar to those of “ideal-employee” factor in both self- and other assessments, improved validity of competency scores as evidenced by meaningful second-order factor structures, better inter-rater agreement, and better convergent correlations with an external personality measure. Forced-choice rankings modelled with Thurstonian IRT yielded as good construct and convergent validities as the bias-controlled Likert ratings, and slightly better rater agreement. We suggest that the mechanism for these enhancements is finer differentiation between behaviors in comparative judgements, and advocate the operational use of the multidimensional forced-choice response format as an effective bias prevention method.
Velikonja, T., Edbrooke-Childs, J., Calderon, A., Sleed, M., Brown, A., & Deighton, J. (2016). The psychometric properties of the Ages & Stages Questionnaires for use as population outcome indicators at 2.5 years in England: A systematic review. Child: Care, Health and Development, 43, 1-17. doi:10.1111/cch.12397
Background: Early identification of children with potential development delay is essential to ensure access to care. The Ages & Stages Questionnaires (ASQ) are used as population outcome indicators in England as part of the 2.5 year review.
Method: The aim of this article was to systematically review the worldwide evidence for the psychometric properties of the ASQ third edition (ASQ-3™) and the Ages & Stages Questionnaires®: Social-Emotional (ASQ:SE). Eight electronic databases and grey literature were searched for original research studies available in English language, which reported reliability, validity, or responsiveness of the ASQ-3™ or ASQ:SE for children aged between 2 and 2.5 years. Twenty studies were included. Eligible studies used either the ASQ-3™ or the ASQ:SE and reported at least one measurement property of the ASQ-3™ and/or ASQ:SE. Data were extracted from all papers identified for final inclusion, drawing on Cochrane guidelines.
Results: Using ‘positive’, ‘intermediate’, and ‘negative’ criteria for evaluating psychometric properties, results showed ‘positive’ reliability values in 11/18 instances reported, ‘positive’ sensitivity values in 13/18 instances reported, and ‘positive’ specificity values in 19/19 instances reported.
Conclusions: Variations in age or language versions used, quality of psychometric properties, and quality of papers resulted in heterogeneous evidence. It is important to consider differences in cultural and contextual factors when measuring child development using these indicators. Further research is very likely to have an important impact on the interpretation of the ASQ-3™ and ASQ:SE psychometric evidence.
Chua, K., Brown, A., Little, R., Matthews, D., Morton, L., Loftus, V., Watchurst, C., Tait, R., Romeo, R., & Banerjee, S. (2016). Quality-of-life assessment in dementia: the use of DEMQOL and DEMQOL-Proxy total scores. Quality of Life Research, 25, 3107-3118. doi:10.1007/s11136-016-1343-1
There is a need to determine whether health-related quality-of-life (HRQL) assessments in dementia capture what is important, to form a coherent basis for guiding research and clinical and policy decisions. This study investigated structural validity of HRQL assessments made using the DEMQOL system, with particular interest in studying domains that might be central to HRQL, and the external validity of these HRQL measurements.
HRQL of people with dementia was evaluated by 868 self-reports (DEMQOL) and 909 proxy reports (DEMQOL-Proxy) at a community memory service. Exploratory and confirmatory factor analyses (EFA and CFA) were conducted using bifactor models to investigate domains that might be central to general HRQL. Reliability of the general and specific factors measured by the bifactor models was examined using omega (ω) and omega hierarchical (ωh) coefficients. Multiple-indicators multiple-causes models were used to explore the external validity of these HRQL measurements in terms of their associations with other clinical assessments.
Bifactor models showed adequate goodness of fit, supporting HRQL in dementia as a general construct that underlies a diverse range of health indicators. At the same time, additional factors were necessary to explain residual covariation of items within specific health domains identified from the literature. Based on these models, DEMQOL and DEMQOL-Proxy overall total scores showed excellent reliability (ωh > 0.8). After accounting for common variance due to a general factor, subscale scores were less reliable (ωh < 0.7) for informing on individual differences in specific HRQL domains. Depression was more strongly associated with general HRQL based on DEMQOL than on DEMQOL-Proxy (-0.55 vs -0.22). Cognitive impairment had no reliable association with general HRQL based on DEMQOL or DEMQOL-Proxy.
The tenability of a bifactor model of HRQL in dementia suggests that it is possible to retain theoretical focus on the assessment of a general phenomenon, while exploring variation in specific HRQL domains for insights on what may lie at the ‘heart’ of HRQL for people with dementia. These data suggest that DEMQOL and DEMQOL-Proxy total scores are likely to be accurate measures of individual differences in HRQL, but that subscale scores should not be used. No specific domain was solely responsible for general HRQL at dementia diagnosis. Better HRQL was moderately associated with fewer depressive symptoms, but this was less apparent based on informant reports. HRQL was not associated with severity of cognitive impairment.
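The omega and omega hierarchical coefficients mentioned in this abstract can be sketched from the loadings of a bifactor model. The function below is a minimal illustration using the standard textbook formulas, assuming orthogonal factors with unit variance and a unit-weighted total score; the function name and the loadings are hypothetical and are not the DEMQOL estimates.

```python
def bifactor_omegas(general, specific, uniqueness):
    """Return (omega, omega_hierarchical) for a unit-weighted total score.

    general    -- general-factor loading of every item
    specific   -- one list of loadings per specific factor
                  (only the items that load on that factor)
    uniqueness -- residual variance of every item
    Assumes orthogonal factors with unit variance.
    """
    general_var = sum(general) ** 2
    specific_var = sum(sum(group) ** 2 for group in specific)
    total_var = general_var + specific_var + sum(uniqueness)
    omega = (general_var + specific_var) / total_var   # all common variance
    omega_h = general_var / total_var                  # general factor only
    return omega, omega_h


# Hypothetical standardized solution: 4 items, 2 specific factors of 2 items.
general = [0.6, 0.6, 0.6, 0.6]
specific = [[0.4, 0.4], [0.4, 0.4]]
uniqueness = [1 - 0.6 ** 2 - 0.4 ** 2] * 4

omega, omega_h = bifactor_omegas(general, specific, uniqueness)
print(round(omega, 3), round(omega_h, 3))
```

Omega hierarchical counts only the variance due to the general factor as reliable, so it can never exceed omega. A large gap between the two suggests that much of the total score's reliable variance comes from specific domains rather than from the general construct.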
Lin, Y., & Brown, A. (2016). Influence of Context on Item Parameters in Forced-Choice Personality Assessments. Educational and Psychological Measurement, 77, 389-414. doi:10.1177/0013164416646162
A fundamental assumption in computerized adaptive testing (CAT) is that item parameters are invariant with respect to context – items surrounding the administered item. This assumption, however, may not hold in forced-choice (FC) assessments, where explicit comparisons are made between items included in the same block. We empirically examined the influence of context on item parameters by comparing parameter estimates from two FC instruments. The first instrument was compiled of blocks of three items, whereas in the second, the context was manipulated by adding one item to each block, resulting in blocks of four. The item parameter estimates were highly similar. However, a small number of significant deviations were observed, confirming the importance of context when designing adaptive FC assessments. Two patterns of such deviations were identified, and methods to reduce their occurrences in an FC CAT setting were proposed. It was shown that with a small proportion of violations of the parameter invariance assumption, score estimation remained stable.
Brown, A. (2016). Thurstonian Scaling of Compositional Questionnaire Data. Multivariate Behavioral Research, 51, 345-356. doi:10.1080/00273171.2016.1150152
To prevent response biases, personality questionnaires may use comparative response formats. These include forced choice, where respondents choose among a number of items, and quantitative comparisons, where respondents indicate the extent to which items are preferred to each other. The present article extends Thurstonian modeling of binary choice data (Brown & Maydeu-Olivares, 2011a) to “proportion-of-total” (compositional) formats. Following Aitchison (1982), compositional item data are transformed into log-ratios, conceptualized as differences of latent item utilities. The mean and covariance structure of the log-ratios is modelled using Confirmatory Factor Analysis (CFA), where the item utilities are first-order factors, and personal attributes measured by a questionnaire are second-order factors. A simulation study with two sample sizes, N=300 and N=1000, shows that the method provides very good recovery of true parameters and near-nominal rejection rates. The approach is illustrated with empirical data from N=317 students, comparing model parameters obtained with compositional and Likert scale versions of a Big Five measure. The results show that the proposed model successfully captures the latent structures and person scores on the measured traits.
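The log-ratio step described in this abstract can be sketched as follows. This is only an illustration of an additive log-ratio transformation in the spirit of Aitchison (1982), taking the last item of a block as the reference; the function name, the choice of reference item and the example responses are assumptions, and the handling of zero allocations (rejected here) is not taken from the article.

```python
import math


def log_ratios(parts):
    """Additive log-ratios of a composition, with the last part as reference.

    parts -- positive amounts a respondent allocated to the items of one
             block (e.g. points out of 100); only their ratios matter.
    """
    if any(p <= 0 for p in parts):
        # Zero allocations need separate treatment before taking logs.
        raise ValueError("all parts must be strictly positive")
    reference = parts[-1]
    return [math.log(p / reference) for p in parts[:-1]]


# Hypothetical response: 100 points distributed over a block of three items.
block = [50, 30, 20]
ratios = log_ratios(block)
print(ratios)
```

Each log-ratio is unbounded and depends only on the relative size of two allocations, which is why it can be read as a difference between the utilities of the two items and modelled with a linear factor model. Rescaling the response (for example, proportions instead of points) leaves the log-ratios unchanged.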
Van Dam, N., Brown, A., Mole, T., Davis, J., Britton, W., & Brewer, J. (2015). Development and Validation of the Behavioral Tendencies Questionnaire. PLoS ONE, 10, 1-21. doi:10.1371/journal.pone.0140867
At a fundamental level, taxonomy of behavior and behavioral tendencies can be described in terms of approach, avoid, or equivocate (i.e., neither approach nor avoid). While there are numerous theories of personality, temperament, and character, few seem to take advantage of parsimonious taxonomy. The present study sought to implement this taxonomy by creating a questionnaire based on a categorization of behavioral temperaments/tendencies first identified in Buddhist accounts over fifteen hundred years ago. Items were developed using historical and contemporary texts of the behavioral temperaments, described as “Greedy/Faithful”, “Aversive/Discerning”, and “Deluded/Speculative”. To both maintain this categorical typology and benefit from the advantageous properties of forced-choice response format (e.g., reduction of response biases), binary pairwise preferences for items were modeled using Latent Class Analysis (LCA). One sample (n1 = 394) was used to estimate the item parameters, and the second sample (n2 = 504) was used to classify the participants using the established parameters and cross-validate the classification against multiple other measures. The cross-validated measure exhibited good nomothetic span (construct-consistent relationships with related measures) that seemed to corroborate the ideas present in the original Buddhist source documents. The final 13-block questionnaire created from the best performing items (the Behavioral Tendencies Questionnaire or BTQ) is a psychometrically valid questionnaire that is historically consistent, based in behavioral tendencies, and promises practical and clinical utility particularly in settings that teach and study meditation practices such as Mindfulness Based Stress Reduction (MBSR).
Megreya, A., Bindemann, M., & Brown, A. (2015). Criminal thinking in a Middle Eastern prison sample of thieves, drug dealers and murderers. Legal and Criminological Psychology, 20, 324-342. doi:10.1111/lcrp.12029
Purpose: The Psychological Inventory of Criminal Thinking Styles (PICTS) has been applied extensively to the study of criminal behaviour and cognition. This study aimed to explore the psychometric characteristics (factorial structure, reliability and external validity) of an Arabic version of the PICTS, to explore cross-cultural differences between a sample of Middle-Eastern (Egyptian) prisoners and Western prison samples, and to examine the influence of type of crime on criminal thinking styles.
Method: A group of 130 Egyptian male prisoners who had been sentenced for theft, drug dealing or murder completed the PICTS. Their scores were compared with the reported data of American, British, and Dutch prisoners.
Results: The Arabic PICTS showed scale reliabilities estimated by coefficient alpha comparable to the English version, and reliabilities estimated as test-retest correlations were high. Confirmatory factor analysis showed that the PICTS subscale scores of Egyptian prisoners best fitted a two-factor model, in which one dimension comprised mollification, entitlement, superoptimism, sentimentality and discontinuity, and the second dimension reflected the thinking styles of power orientation, cut-off and cognitive indolence. Observed levels of thinking styles varied by type of crime, specifically between prisoners sentenced for theft, drug dealing, and murder. Cultural differences in criminal thinking styles were also found, whereby the Egyptian prisoners recorded the highest scores in most thinking styles, while American, Dutch and English prisoners were more comparable to each other.
Conclusions: This study provides one of the first investigations of criminal thinking styles in a non-Western sample and suggests that cross-cultural differences in the structure of these thinking styles exist. In addition, the results indicate that criminal thinking styles need to be understood by the type of crime for which a person has been sentenced.
Wetzel, E., Roberts, B., Fraley, C., & Brown, A. (2015). Equivalence of Narcissistic Personality Inventory constructs and correlates across scoring approaches and response formats. Journal of Research in Personality, 61, 87-98. doi:10.1016/j.jrp.2015.12.002
The prevalent scoring practice for the Narcissistic Personality Inventory (NPI) ignores the forced-choice nature of the items. The aim of this study was to investigate whether findings based on NPI scores reported in previous research can be confirmed when the forced-choice nature of the NPI’s original response format is appropriately modeled, and when NPI items are presented in different response formats (true/false or rating scale). The relationships between NPI facets and various criteria were robust across scoring approaches (mean score vs. model-based), but were only partly robust across response formats. In addition, the scoring approaches and response formats achieved equivalent measurements of the vanity facet and in part of the leadership facet, but differed with respect to the entitlement facet.
Brown, A. (2014). Item Response Models for Forced-Choice Questionnaires: A Common Framework. Psychometrika, 81, 135-160. doi:10.1007/s11336-014-9434-9
In forced-choice questionnaires, respondents have to make choices between two or more items presented at the same time. Several IRT models have been developed to link respondent choices to underlying psychological attributes, including the recent MUPP (Stark, Chernyshenko & Drasgow, 2005) and Thurstonian IRT (Brown & Maydeu-Olivares, 2011) models. In the present article, a common framework is proposed that describes forced-choice models along three axes: 1) the forced-choice format used; 2) the measurement model for the relationships between items and psychological attributes they measure; and 3) the decision model for choice behavior. Using the framework, fundamental properties of forced-choice measurement of individual differences are considered. It is shown that the scale origin for the attributes is generally identified in questionnaires using either unidimensional or multidimensional comparisons. Both dominance and ideal point models can be used to provide accurate forced-choice measurement; and the rules governing accurate person score estimation with these models are remarkably similar.
Guenole, N., & Brown, A. (2014). The consequences of ignoring measurement invariance for path coefficients in structural equation models. Frontiers in Psychology, 5, 1-16. doi:10.3389/fpsyg.2014.00980
We report a Monte Carlo study examining the effects of two strategies for handling measurement non-invariance – modeling and ignoring non-invariant items – on structural regression coefficients between latent variables measured with item response theory models for categorical indicators. These strategies were examined across four levels and three types of non-invariance – non-invariant loadings, non-invariant thresholds, and combined non-invariance on loadings and thresholds – in simple, partial, mediated and moderated regression models where the non-invariant latent variable occupied predictor, mediator, and criterion positions in the structural regression models. When non-invariance is ignored in the latent predictor, the focal group regression parameters are biased in the opposite direction to the difference in loadings and thresholds relative to the referent group (i.e., lower loadings and thresholds for the focal group lead to overestimated regression parameters). With criterion non-invariance, the focal group regression parameters are biased in the same direction as the difference in loadings and thresholds relative to the referent group. While unacceptable levels of parameter bias were confined to the focal group, bias occurred at considerably lower levels of ignored non-invariance than was previously recognized in referent and focal groups.
Hill, A., Stoeber, J., Brown, A., & Appleton, P. (2014). Team perfectionism and team performance: A prospective study. Journal of Sport & Exercise Psychology, 36, 303-315. doi:10.1123/jsep.2013-0206
Perfectionism is a personality characteristic that has been found to predict sports performance in athletes. To date, however, research has exclusively examined this relationship at an individual level (i.e., athletes’ perfectionism predicting their personal performance). The current study extends this research to team sports by examining whether, when manifested at team level, perfectionism predicts team performance. A sample of 231 competitive rowers from 36 boats completed measures of self-oriented, team-oriented, and team-prescribed perfectionism prior to competing against one another in a 4-day rowing competition. Strong within-boat similarities in the levels of team members’ team-oriented perfectionism supported the existence of collective team-oriented perfectionism at the boat level. Two-level latent growth curve modeling of day-by-day boat performance showed that team-oriented perfectionism positively predicted the position of the boat in mid-competition and the linear improvement in position. The findings suggest that imposing perfectionistic standards on team members may drive teams to greater levels of performance.
Brodbeck, J., Bachmann, M., Brown, A., & Znoj, H. (2014). Effects of depressive symptoms on antecedents of lapses during a smoking cessation attempt: An ecological momentary assessment study. Addiction, 109, 1363-1370. doi:10.1111/add.12563
AIMS: To investigate pathways through which momentary negative affect and depressive symptoms affect risk of lapse during smoking cessation attempts.
DESIGN: Ecological Momentary Assessment was carried out during two weeks after an unassisted smoking cessation attempt. A three-month follow-up measured smoking frequency.
SETTING: Data were collected via mobile devices in German-speaking Switzerland.
PARTICIPANTS: A total of 242 individuals (age 20-40, 67% men) reported 7,112 observations.
MEASUREMENTS: Online surveys assessed baseline depressive symptoms and nicotine dependence. Real-time data were collected on negative affect, physical withdrawal symptoms, urge to smoke, abstinence-related self-efficacy, and lapses.
FINDINGS: A two-level structural equation model suggested that on the situational level, negative affect increased the urge to smoke and decreased self-efficacy (β = .20; β = -.12, respectively), but had no direct effect on lapse risk. A higher urge to smoke (β = .09) and lower self-efficacy (β = -.11) were confirmed as situational antecedents of lapses. Depressive symptoms at baseline were a strong predictor of a person's average negative affect (β = .35, all p < .001). However, the baseline characteristics influenced smoking frequency three months later only indirectly, through influences of average states on the number of lapses during the quit attempt.
CONCLUSIONS: Controlling for nicotine dependence, higher depressive symptoms at baseline were strongly associated with a worse longer-term outcome. Negative affect experienced during the quit attempt was the only pathway through which the baseline depressive symptoms were associated with a reduced self-efficacy and increased urges to smoke, all leading to the increased probability of lapses.
Brown, A., Ford, T., Deighton, J., & Wolpert, M. (2014). Satisfaction in Child and Adolescent Mental Health Services: Translating Users’ Feedback into Measurement. Administration and Policy in Mental Health and Mental Health Services Research, 41, 434-446. doi:10.1007/s10488-012-0433-9
The present research addressed gaps in our current understanding of validity and quality of measurement provided by Patient Reported Experience Measures (PREM). We established the psychometric properties of a freely available Experience of Service Questionnaire (ESQ), based on responses from 7,067 families of patients across 41 UK providers of Child and Adolescent Mental Health Services (CAMHS), using two-level latent trait modeling. Responses to the ESQ were subject to strong ‘halo’ effects, which were thought to represent the overall positive or negative affect towards one’s treatment. Two strongly related constructs measured by the ESQ were interpreted as specific aspects of global satisfaction, namely Satisfaction with Care, and with Environment. The Care construct was sensitive to differences between less satisfied patients, facilitating individual and service-level problem evaluation. The effects of nesting within service providers were strong, with parental reports being the most reliable source of data for the between-provider comparisons. We provide a scoring protocol for converting the hand-scored ESQ to the model-based population-referenced scores with supplied standard errors, which can be used for benchmarking services as well as individual evaluations.
Stoeber, J., Kobori, O., & Brown, A. (2014). Examining mutual suppression effects in the assessment of perfectionism cognitions: Evidence supporting multidimensional assessment. Assessment, 21, 647-660. doi:10.1177/1073191114534884
Perfectionism cognitions capture automatic perfectionistic thoughts and have explained variance in psychological adjustment and maladjustment beyond trait perfectionism. The aim of the present research was to investigate whether a multidimensional assessment of perfectionism cognitions has advantages over a unidimensional assessment. To this aim, we examined in a sample of 324 university students how the Perfectionism Cognitions Inventory (PCI) and the Multidimensional Perfectionism Cognitions Inventory (MPCI) explained variance in positive affect, negative affect, and depressive symptoms when factor or subscale scores were used as predictors compared to total scores. Results showed that a multidimensional assessment (PCI factor scores, MPCI subscale scores) explained more variance than a unidimensional assessment (PCI and MPCI total scores) because, when the different dimensions were entered simultaneously as predictors, perfectionistic strivings cognitions and perfectionistic concerns cognitions acted as mutual suppressors, thereby increasing each other’s predictive validity. With this, the present findings provide evidence that, regardless of whether the PCI or the MPCI is used, a multidimensional assessment of perfectionism cognitions has advantages over a unidimensional assessment in explaining variance in psychological adjustment and maladjustment.
Stoeber, J., Kobori, O., & Brown, A. (2014). Perfectionism cognitions are multidimensional: A reply to Flett and Hewitt (2014). Assessment, 21, 666-668. doi:10.1177/1073191114550676
We reply to Flett and Hewitt’s (2014) commentary on our findings (Stoeber, Kobori, & Brown, 2014) focusing on the multidimensionality of the Perfectionism Cognitions Inventory (PCI) and the question of whether the Multidimensional Perfectionism Cognitions Inventory (MPCI) represents an alternative to the PCI. In addition, we reiterate the importance of considering suppression effects when examining different dimensions of perfectionism and, in concluding, invite researchers to join forces to further advance the assessment of multidimensional perfectionism cognitions.
Deighton, J., Tymms, P., Vostanis, P., Belsky, J., Fonagy, P., Brown, A., Martin, A., Patalay, P., & Wolpert, M. (2013). The Development of a School-Based Measure of Child Mental Health. Journal of Psychoeducational Assessment, 31, 247-257. doi:10.1177/0734282912465570
Early detection of child mental health problems in schools is critical for implementing strategies for prevention and intervention. The development of an effective measure of mental health and well-being for this context must be both empirically sound and practically feasible. This study reports the initial validation of a brief self-report measure for child mental health suitable for use with children as young as eight (“Me and My School” (M&MS)). After factor analysis, and studies of measurement invariance, two subscales emerged: emotional difficulties and behavioral difficulties. These two subscales were highly correlated with corresponding constructs of the Strengths and Difficulties Questionnaire (SDQ) and showed correlations with attainment, deprivation and educational needs similar to ones obtained between these demographic measures and the SDQ. Results suggest that this school-based self-report measure is psychometrically sound, and has the potential of contributing to school mental health surveys, evaluation of interventions, and recognition of mental health problems within schools.
Brown, A., & Maydeu-Olivares, A. (2013). How IRT can solve problems of ipsative data in forced-choice questionnaires. Psychological Methods, 18, 36-52. doi:10.1037/a0030641
In multidimensional forced-choice (MFC) questionnaires, items measuring different attributes are presented in blocks, and participants have to rank-order the items within each block (fully or partially). Such comparative formats can reduce the impact of numerous response biases often affecting single-stimulus items (a.k.a. rating or Likert scales). However, if scored with traditional methodology, MFC instruments produce ipsative data, whereby all individuals have a common total test score. Ipsative scoring distorts individual profiles (it is impossible to achieve all high or all low scale scores), construct validity (covariances between scales must sum to zero), criterion-related validity (validity coefficients must sum to zero), and reliability estimates.
We argue that these problems are caused by inadequate scoring of forced-choice items, and advocate the use of item response theory (IRT) models based on an appropriate response process for comparative data, such as Thurstone’s Law of Comparative Judgment. We show that by applying Thurstonian IRT modeling (Brown & Maydeu-Olivares, 2011), even existing forced-choice questionnaires with challenging features can be scored adequately and that the IRT-estimated scores are free from the problems of ipsative data.
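The ipsative constraints described in this abstract can be illustrated with a small simulation. The following is a minimal Python sketch, not the authors' method: it assumes a hypothetical questionnaire of 10 blocks with one item per scale in each block, random rankings, and classical scoring in which an item earns one point for every item it is ranked above. All function names are illustrative.

```python
import random

def ipsative_scale_scores(n_blocks, n_scales, rng):
    """Classically score one simulated respondent.

    Each block holds one item per scale; an item earns as many points
    as the number of items it is ranked above (0 .. n_scales - 1).
    """
    scores = [0] * n_scales
    for _ in range(n_blocks):
        points = list(range(n_scales))
        rng.shuffle(points)              # hypothetical random ranking
        for scale, p in enumerate(points):
            scores[scale] += p
    return scores

def covariance_matrix(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(k)] for a in range(k)]

rng = random.Random(0)
sample = [ipsative_scale_scores(n_blocks=10, n_scales=3, rng=rng)
          for _ in range(200)]

totals = {sum(s) for s in sample}        # every respondent has the same total
cov = covariance_matrix(sample)
row_sums = [sum(row) for row in cov]     # each row (variance included) sums to ~0

print(totals)
print([round(x, 10) for x in row_sums])
```

Because every block distributes the same number of points, each respondent's total is fixed at 30 in this setup, and each row of the covariance matrix of scale scores sums to zero up to floating-point error.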
Brodbeck, J., Bachmann, M., Croudace, T., & Brown, A. (2013). Comparing Growth Trajectories of Risk Behaviors From Late Adolescence Through Young Adulthood: An Accelerated Design. Developmental Psychology, 49, 1732-1738. doi:10.1037/a0030873
Risk behaviors such as substance use or deviance are often limited to the early stages of the life course. Whereas the onset of risk behavior is well studied, less is currently known about the decline and timing of cessation of risk behaviors of different domains during young adulthood. Prevalence and longitudinal developmental patterning of alcohol use, drinking to the point of drunkenness, smoking, cannabis use, deviance, and HIV-related sexual risk behavior were compared in a Swiss community sample (N = 2,843). Using a longitudinal cohort-sequential approach to link multiple assessments with 3 waves of data for each individual, the studied period spanned the ages of 16 to 29 years. Although smoking had a higher prevalence, both smoking and drinking up to the point of drunkenness followed an inverted U-shaped curve. Alcohol consumption was also best described by a quadratic model, though largely stable at a high level through the late 20s. Sexual risk behavior increased slowly from age 16 to age 22 and then remained largely stable. In contrast, cannabis use and deviance linearly declined from age 16 to age 29. Young men were at higher risk for all behaviors than were young women, but apart from deviance, patterning over time was similar for both sexes. Results about the timing of increase and decline as well as differences between risk behaviors may inform tailored prevention programs during the transition from late adolescence to adulthood.
Brown, A., & Maydeu-Olivares, A. (2012). Fitting a Thurstonian IRT model to forced-choice data using Mplus. Behavior Research Methods, 44, 1135-1147. doi:10.3758/s13428-012-0217-x
To counter response distortions associated with the use of rating scales (a.k.a. Likert scales), items can be presented in a comparative fashion, so that respondents are asked to rank the items within blocks (forced-choice format). However, classical scoring procedures for these forced-choice designs lead to ipsative data, which presents psychometric challenges that are well described in the literature. Recently, Brown and Maydeu-Olivares (Educational and Psychological Measurement 71: 460-502, 2011a) introduced a model based on Thurstone's law of comparative judgment, which overcomes the problems of ipsative data. Here, we provide a step-by-step tutorial for coding forced-choice responses, specifying a Thurstonian item response theory model that is appropriate for the design used, assessing the model's fit, and scoring individuals on psychological attributes. Estimation and scoring are performed using Mplus, and a very straightforward Excel macro is provided that writes full Mplus input files for any forced-choice design. Armed with these tools, using a forced-choice design is now as easy as using ratings.
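The coding step mentioned in this abstract can be sketched as follows. This is an illustrative Python version, not the Excel macro or Mplus syntax from the article: it assumes a full ranking of the items in one block and recodes it into binary outcomes, one per pair of items, with 1 meaning the first item of the pair was preferred. The function name and item labels are hypothetical.

```python
from itertools import combinations

def rank_to_binary(ranks):
    """Recode a full ranking of one block into pairwise binary outcomes.

    ranks: dict mapping item label -> rank position (1 = most preferred).
    Returns a dict mapping each pair (i, k) to 1 if item i was ranked
    above item k, and 0 otherwise.
    """
    items = list(ranks)
    return {(i, k): int(ranks[i] < ranks[k]) for i, k in combinations(items, 2)}

# A block of three items in which B is ranked first, A second, C third.
outcomes = rank_to_binary({"A": 2, "B": 1, "C": 3})
print(outcomes)  # {('A', 'B'): 0, ('A', 'C'): 1, ('B', 'C'): 1}
```

A block of n items yields n(n-1)/2 binary outcomes, so a triplet produces three and a quad produces six.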
Brown, A., & Maydeu-Olivares, A. (2011). Item Response Modeling of Forced-Choice Questionnaires. Educational and Psychological Measurement, 71, 460-502. doi:10.1177/0013164410375112
Multidimensional forced-choice formats can significantly reduce the impact of numerous response biases typically associated with rating scales. However, if scored with classical methodology, these questionnaires produce ipsative data, which lead to distorted scale relationships and make comparisons between individuals problematic. This research demonstrates how item response theory (IRT) modeling may be applied to overcome these problems. A multidimensional IRT model based on Thurstone’s framework for comparative data is introduced, which is suitable for use with any forced-choice questionnaire composed of items fitting the dominance response model, with any number of measured traits, and any block sizes (i.e., pairs, triplets, quads, etc.). Thurstonian IRT models are normal ogive models with structured factor loadings, structured uniquenesses, and structured local dependencies. These models can be straightforwardly estimated using structural equation modeling (SEM) software Mplus. A number of simulation studies are performed to investigate how latent traits are recovered under various forced-choice designs and provide guidelines for optimal questionnaire design. An empirical application is given to illustrate how the model may be applied in practice. It is concluded that when the recommended design guidelines are met, scores estimated from forced-choice questionnaires with the proposed methodology reproduce the latent traits well.
Lievens, F., Sanchez, J., Bartram, D., & Brown, A. (2010). Lack of consensus among competency ratings of the same occupation: Noise or substance? Journal of Applied Psychology, 95, 562-571. doi:10.1037/a0018035
Although rating differences among incumbents of the same occupation have traditionally been viewed as error variance in the work analysis domain, such differences might often capture substantive discrepancies in how incumbents approach their work. This study draws from job crafting, creativity, and role theories to uncover situational factors (i.e., occupational activities, context, and complexity) related to differences among competency ratings of the same occupation. The sample consisted of 192 incumbents from 64 occupations. Results showed that 25% of the variance associated with differences in competency ratings of the same occupation was related to the complexity, the context, and primarily the nature of the occupation's work activities. Consensus was highest for occupations involving equipment-related activities and direct contact with the public.
Maydeu-Olivares, A., & Brown, A. (2010). Item Response Modeling of Paired Comparison and Ranking Data. Multivariate Behavioral Research, 45, 935-974. doi:10.1080/00273171.2010.531231
The comparative format used in ranking and paired comparisons tasks can significantly reduce the impact of uniform response biases typically associated with rating scales. Thurstone's (1927, 1931) model provides a powerful framework for modeling comparative data such as paired comparisons and rankings. Although Thurstonian models are generally presented as scaling models, that is, stimuli-centered models, they can also be used as person-centered models. In this article, we discuss how Thurstone's model for comparative data can be formulated as item response theory models so that respondents' scores on underlying dimensions can be estimated. Item parameters and latent trait scores can be readily estimated using a widely used statistical modeling program. Simulation studies show that item characteristic curves can be accurately estimated with as few as 200 observations and that latent trait scores can be recovered to a high precision. Empirical examples are given to illustrate how the model may be applied in practice and to recommend guidelines for designing ranking and paired comparisons tasks in the future.
Brown, A., & Maydeu-Olivares, A. (2010). Issues That Should Not Be Overlooked in the Dominance Versus Ideal Point Controversy. Industrial and Organizational Psychology, 3, 489-493. doi:10.1111/j.1754-9434.2010.01277.x
Bartram, D., Warr, P., & Brown, A. (2010). Let’s Focus on Two-Stage Alignment Not Just on Overall Performance. Industrial and Organizational Psychology, 3, 335-339. doi:10.1111/j.1754-9434.2010.01247.x
Brown, A. (2010). Doing less but getting more: Improving forced-choice measures with Item Response Theory. Assessment and Development Matters, 2, 21-25. Retrieved from http://shop.bps.org.uk/assessment-development-matters-vol-2-no-1-spring-2010.html
Forced-choice tests, despite being resistant to response biases and showing good operational validities, have psychometric problems if scored traditionally. These questionnaires are generally longer than their normative counterparts, and more cognitively challenging.
The OPQ32i was shortened and re-scored using the latest advances in IRT. One item was removed from each block, making completion quicker and less cognitively complex. The shortened version (OPQ32r) shows good reliability, equivalent or better validity than the full ipsative version, and produces scale scores with normative properties.
Results suggest that the IRT methodology can significantly improve the efficiency of existing forced-choice measures so that test takers can do less (complete a shorter and easier questionnaire) and test users can get more (a bias-resistant instrument of superior psychometric quality).
Bywater, J., & Brown, A. (2010). Shorter Personality Questionnaires—A User’s Guide Part 1. Assessment and Development Matters, 2, 15.
In this two-part series, James Bywater and Anna Brown summarise some of the issues involved in determining the correct length of assessment in a personality questionnaire (PQ). In the first instalment they discuss the general issues that test designers face, and in the second they cover some more modern solutions to these, with associated disadvantages.
It is aimed at practitioners rather than hard-core psychometricians and cannot be exhaustive. However, wherever possible, it attempts to distil practical messages for the audience.
Bywater, J., & Brown, A. (2010). Shorter Personality Questionnaires—A User’s Guide Part 2. Assessment and Development Matters, 2, 10.
In this two-part series, James Bywater and Anna Brown summarise some of the issues involved in determining the correct length of assessment in a personality questionnaire (PQ). In the last edition of Assessment & Development Matters they discussed the general issues that test designers face, and in this one they cover some more modern solutions to these.
It is aimed at practitioners rather than hard-core psychometricians and cannot be exhaustive. However, wherever possible, it attempts to distil practical messages for the audience.
Warr, P., Bartram, D., & Brown, A. (2005). Big Five validity: Aggregation method matters. Journal of Occupational and Organizational Psychology, 78, 377-386. doi:10.1348/096317905X53868
Correlations between Big Five personality factors and other variables have been examined in three different ways: direct scoring of items within a factor, application of a composite score formula, and taking the average of single-scale correlations. Those methods were shown to yield consistently different outcomes in four sets of data from sales-people and managers. Factor correlations with job performance were greatest for direct scoring, and were reduced by half when scale correlations were averaged. The insertion of previously suggested estimates into the composite score formula yielded intermediate correlations with performance. It is necessary to interpret summary accounts of correlations with a compound construct in the light of the aggregation method employed.