Chalder fatigue scale

The Chalder fatigue scale (CFQ) is a questionnaire created by the research team of Trudie Chalder at King's College London to measure the severity of tiredness in fatiguing illnesses. The scale has been used in multiple randomized trials of behavioral interventions in patients with myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS), including the PACE trial. While the scale has good internal consistency and convergent validity, it has been criticized for having ceiling effects and other operational flaws.

Origin
The Chalder fatigue scale was developed by the research team of Trudie Chalder at King's College London in 1993. The scale was based on a similar questionnaire that had been used in a hospital-based case study comparing ME/CFS patients to patients with neuromuscular illnesses, and a study testing the efficacy of cognitive behavioral therapy in ME/CFS patients.

From 14 to 11 items
Three versions of the CFQ exist. The original scale had 14 questions, but 3 items were excluded following a principal component analysis (PCA) and an assessment of item discriminative properties based on receiver operating characteristic (ROC) analysis. While the 14-item scale was used in several studies in the 1990s, the 11-item version is the most widely used today.

The items of the 11-item version are:
 * 1) Do you have problems with tiredness?
 * 2) Do you need to rest more?
 * 3) Do you feel sleepy or drowsy?
 * 4) Do you have problems starting things?
 * 5) Do you lack energy?
 * 6) Do you have less strength in your muscles?
 * 7) Do you feel weak?
 * 8) Do you have difficulties concentrating?
 * 9) Do you make slips of the tongue when speaking?
 * 10) Do you find it more difficult to find the right word?
 * 11) How is your memory?

The items deleted from the 14-item scale are:
 * Do you start things without difficulty but get weak as you go on?
 * Do you think as clearly as usual?
 * Are you still interested in the things you used to do?

Likert or bimodal scoring
The Chalder fatigue scale has two scoring systems. In the bimodal scoring system, respondents answer each question with a 1 or a 0 to indicate whether or not the question applies to them. In the Likert scoring system, respondents give a score of 0 to 3 to indicate how much each statement applies to them, from “less than usual” to “much more than usual”. While the first system counts the number of symptoms, the second weights the intensity of the symptoms. Some have argued that these should not be seen as merely different scoring schemes, but as different versions of the same scale. In the PACE trial, both scoring schemes of the Chalder fatigue scale were recorded. The data showed that 22 participants improved at the primary trial endpoint according to one scoring method but declined according to the other.
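The two scoring systems can be sketched in a few lines of Python. This is an illustrative sketch, not an official scoring tool: the function names are invented here, and the bimodal rule used (an item counts as 1 when its Likert score is 2 or 3) is an assumption about how the four response options are collapsed. The example respondent is hypothetical and only shows how the two totals can move in opposite directions.

```python
N_ITEMS = 11  # the 11-item version of the scale


def _check(items):
    """Validate that we have 11 item scores, each an integer from 0 to 3."""
    if len(items) != N_ITEMS or any(x not in (0, 1, 2, 3) for x in items):
        raise ValueError("expected 11 item scores, each 0, 1, 2 or 3")


def likert_score(items):
    """Likert total: sum of the 0-3 item scores (range 0-33)."""
    _check(items)
    return sum(items)


def bimodal_score(items):
    """Bimodal total: number of items counted as present (range 0-11).

    Assumption: an item counts as 1 when its Likert score is 2 or 3.
    """
    _check(items)
    return sum(1 for x in items if x >= 2)


# Hypothetical respondent at two time points (invented numbers).
before = [2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1]
after = [3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1]

print(likert_score(before), likert_score(after))    # Likert total rises
print(bimodal_score(before), bimodal_score(after))  # bimodal count falls
```

In this invented case, five symptoms become more intense while one resolves, so the Likert total worsens even though the bimodal symptom count improves.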

Physical and mental fatigue
The 11-item Chalder fatigue scale is often divided into two components: one that measures physical fatigue (questions 1-7) and one that measures mental fatigue (questions 8-11). These components were confirmed by an analysis of a sample of 361 ME/CFS patients and 1615 healthy persons. Other analyses, however, found only one component (De Vries et al. 2003, Norton et al. 2016, Jing et al. 2016) or more than two components (Morris et al. 1998, Fong et al. 2015) in the 11-item Chalder fatigue scale.

Internal consistency and convergent validity
The Chalder fatigue scale has been shown to have good internal consistency, as indicated by a split-half reliability of 0.85 and a Cronbach's alpha that ranges between 0.86 and 0.92. De Vries et al. 2003 demonstrated that the CFQ had good test-retest reliability and convergent validity, as it correlated strongly with other fatigue questionnaires. This was confirmed by Jason et al., who noted a good correlation between the CFQ and the fatigue severity scale by Krupp et al.

Findings on the correlation of the CFQ with other measures of health have been conflicting. Fong et al. 2015 noted that the CFQ correlated modestly with poor physical and mental quality of life, while Wong et al. found that the CFQ was only weakly correlated with physical quality of life. According to Jason et al. 2000, the CFQ correlated poorly with characteristic CFS symptoms such as post-exertional malaise or muscle pain, compared to another fatigue questionnaire.

Content validity
Some authors have questioned whether the CFQ adequately assesses fatigue. Morris et al. noted that the item “feeling sleepy or drowsy” seems more related to sleepiness and problems with maintaining sleep at night than to fatigue. Kindlon has argued that the item “do you have problems starting things” seems more related to motivation than to fatigue. Both of these questions received low scores in a study of ME patients. Wilshire et al. pointed out that four items of the CFQ measure cognitive difficulties (difficulties concentrating, slips of the tongue, difficulty finding the right word and memory problems) instead of fatigue. Since a bimodal score of four or more on the CFQ has been defined as caseness of fatigue, it is possible for a patient to be a fatigue case if their only symptoms are these neurological symptoms. A detailed critique of the CFQ, written by members of the online forum Science for ME, concluded: “The scale assumes that memory problems, speech errors, sleepiness/drowsiness, muscle weakness and so on are indicators of fatigue, and that the more such symptoms a patient reports, the greater their overall fatigue. These assumptions are untested and their basis is unclear.”
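The caseness argument can be checked with a short sketch. The threshold of four and the item numbers 8-11 come from the text above; the function and variable names are hypothetical, and the respondent is invented for illustration.

```python
CASENESS_THRESHOLD = 4           # bimodal score of four or more = fatigue case
COGNITIVE_ITEMS = (8, 9, 10, 11)  # concentration, slips of the tongue, word finding, memory


def is_fatigue_case(bimodal_items):
    """bimodal_items maps question number (1-11) to a bimodal score of 0 or 1."""
    return sum(bimodal_items.values()) >= CASENESS_THRESHOLD


# Hypothetical respondent who endorses only the four cognitive items.
only_cognitive = {q: (1 if q in COGNITIVE_ITEMS else 0) for q in range(1, 12)}

print(is_fatigue_case(only_cognitive))  # the four cognitive items alone reach the threshold
```

Under these definitions, a respondent who scores 0 on every item about tiredness, rest, energy and weakness still meets the caseness threshold.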

Discrimination
The Chalder fatigue scale has been used in studies of various diseases such as cancer, multiple sclerosis, and rheumatoid arthritis. It has been translated into several languages, including Brazilian Portuguese and Chinese. The use of the CFQ in patients with ME/CFS, however, has been criticized. Most studies on the validity of the CFQ in this patient population used the Oxford criteria, a definition of ME/CFS that has been rejected by American health authorities.

Cella et al. reported that high scores on the CFQ effectively discriminate between ME/CFS patients and the general population. In their study, a person with a Likert score of 29 had less than a 5% chance of not having CFS. Good sensitivity of the CFQ in selecting patients with ME/CFS was confirmed by Jason et al. 2011. The same study indicated, however, that the CFQ lacks adequate specificity: some patients without ME/CFS also obtained high scores.

High sensitivity but low specificity was also the conclusion of a study that used the CFQ as an early screening tool in making the diagnosis of ME/CFS. While the CFQ was able to distinguish ME/CFS patients from healthy persons, it failed to differentiate ME/CFS from other fatiguing illnesses such as multiple sclerosis and lupus. A further study by Friedberg & Jason concluded that the CFQ was unable to distinguish individuals with CFS from those with primary depression.

Only one study examined the validity of the CFQ in patients with ME. Goudsmit et al. reported that the fatigue scale failed to discriminate between patients with moderate and severe ME: there was a large overlap in fatigue scores between these two groups.

Ceiling effects
The use of the Chalder fatigue scale in ME/CFS has been criticized because ME/CFS patients often record the maximum score on most of the 11 questions. As a result, patients can no longer indicate a worsening of their fatigue, a phenomenon that is called the ceiling effect. This can influence the findings of randomized trials. As explained by Rebecca Goldin: “Let us suppose for a moment that 100 people are experiencing extreme fatigue. They each answer ‘much worse than usual’ on the Questionnaire (a 3) to all 11 questions, resulting in a score of 33. Over the course of a year, there are random fluctuations in their health: half get worse, and half get better. Now they take the questionnaire again. Those who get worse still answer ‘3’ to all questions (final score: 33). Those who improve now answer a ‘2’ to all questions, stating that they are just ‘worse than usual’ but not ‘much worse’ (final score: 22). The new average is now 27.5, a significant improvement over the original score of 33.” In other words, if patients record the maximum score and half of them improve while the other half deteriorate during follow-up, then only the improvement will become visible on the questionnaire.
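Goldin's thought experiment can be reproduced numerically. The sketch below is only an illustration of her arithmetic: the idea of a latent per-item severity level that can exceed 3, and the names used, are assumptions introduced here, not part of the questionnaire.

```python
from statistics import mean

N_ITEMS = 11
MAX_ITEM = 3  # highest score an item can record


def observed_score(latent_level):
    """Likert total when every item sits at the same latent severity level.

    The questionnaire cannot record anything above 3 per item, so higher
    latent levels are clamped to the ceiling.
    """
    return N_ITEMS * min(max(latent_level, 0), MAX_ITEM)


# 100 hypothetical patients, all at the ceiling at baseline.
baseline_latent = [3] * 100
# One year later: half are one level worse, half are one level better.
followup_latent = [4] * 50 + [2] * 50

baseline = [observed_score(x) for x in baseline_latent]
followup = [observed_score(x) for x in followup_latent]

print(mean(baseline_latent), mean(followup_latent))  # latent average is unchanged
print(mean(baseline), mean(followup))                # observed average drops from 33 to 27.5
```

The average latent severity is the same at both time points, yet the observed average falls by 5.5 points because the deterioration is clamped at the ceiling.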

Stouten (2005) calculated lower bounds for the number of items with the maximum score on the CFQ for several behavioral intervention studies. High ceiling effects were noted in multiple trials. In the randomized trials of Deale et al. and Powell et al., the intervention group recorded the maximum bimodal score on more than 90% of the questions on the CFQ. A study of 25 patients with ME found that 50% of the patients recorded the maximum score using the bimodal method.

The problem is less pronounced with Likert scoring, though 15% of ME patients still recorded the maximum score of 33. In the FINE and PACE trials, 29.1% and 14.5% of the participants respectively recorded the maximum score at baseline.

Problems with the Likert score
Due to ceiling effects, Likert scoring has become the more popular scoring method for the CFQ. The PACE trial, for example, changed its primary outcome measure of fatigue from bimodal scoring, as specified in the protocol, to Likert scoring.

Separate problems have been noted with this scoring method. The introduction to this version of the CFQ asks respondents to compare themselves to how they felt when they were last well. A response of ‘no more than usual’ (score 1) would thus indicate full recovery. Persons without fatigue problems would score 11/33, indicating that they had fatigue ‘no more than usual’ on all 11 items. Indeed, the use of the CFQ in healthy community samples yielded scores of 12-14.

Yet the Likert scoring of the CFQ also offers the option “less than usual” (score 0). It is not clear what such an answer means. It seems to indicate an abnormal absence of fatigue complaints. Evidence that this option confuses respondents comes from a trial of cognitive behavioral therapy in patients with multiple sclerosis. After treatment, MS patients recorded a score of less than 10, indicating they had less fatigue than healthy persons. Even the control group, which received relaxation therapy, had lower fatigue scores than healthy persons. This suggests that they misinterpreted the “less than usual” (score 0) option. Results such as these call into question the reliability of the Likert scoring system of the CFQ.
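The arithmetic behind this argument can be made explicit. A Likert total below 11 is only possible if some items were answered “less than usual”, because every other option scores at least 1. The sketch below computes that lower bound for an individual total; the function name is hypothetical and the total of 9 is just an example of a score below 10.

```python
N_ITEMS = 11
NO_MORE_THAN_USUAL = 1  # Likert score for a 'no more than usual' answer

# Total for someone who answers 'no more than usual' to every item.
healthy_baseline = N_ITEMS * NO_MORE_THAN_USUAL


def min_items_less_than_usual(likert_total):
    """Lower bound on the number of items answered 'less than usual' (score 0).

    Every item not scored 0 contributes at least 1 to the total, so a total T
    requires at least 11 - T items scored 0 (and never fewer than zero).
    """
    return max(0, N_ITEMS - likert_total)


print(healthy_baseline)               # 11
print(min_items_less_than_usual(9))   # at least 2 items answered 'less than usual'
print(min_items_less_than_usual(13))  # 0: consistent with no such answers
```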

Notable studies

 * 1993, Development of a fatigue scale
 * 2011, PACE trial
 * 2017, Public Review - Draft of Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) Common Data Elements (CDE); Fatigue Subgroup Materials (Full Text)

Learn more

 * Chalder fatigue scale
 * ME analysis - The Chalder fatigue scale
 * 2016, Fatigued by scales as outcome measures
 * 2018, Analysis of the Chalder fatigue scale submitted by a team of Science for ME forum members to the NIH/CDC Common Data Elements review.