The American Academy of Sleep Medicine Inter-scorer Reliability program: respiratory events

J Clin Sleep Med. 2014 Apr 15;10(4):447-54. doi: 10.5664/jcsm.3630.

Abstract

Study objectives: The American Academy of Sleep Medicine (AASM) Inter-scorer Reliability program provides a unique opportunity to compare a large number of scorers with varied levels of experience to determine agreement in the scoring of respiratory events. The objective of this paper is to examine areas of disagreement to inform future revisions of the AASM Manual for the Scoring of Sleep and Associated Events.

Methods: The sample comprised 15 monthly records of 200 epochs each. The number of scorers increased steadily during the period of data collection, reaching more than 3,600 scorers by the final record. For each of the 200 epochs, scorers were asked to identify whether an obstructive, mixed, or central apnea; a hypopnea; or no event was seen. The "correct" respiratory event score was defined as the score endorsed by the most scorers. Percentage agreement with the majority score was determined for each epoch, and the mean agreement was then calculated.
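
The majority-score and percentage-agreement calculation described in the Methods can be sketched as follows. This is a minimal illustration, not the program's actual analysis code: the function names, the label strings, and the toy data are hypothetical, and the handling of ties (the first label encountered wins) is an assumption, since the abstract does not say how ties were resolved.

```python
from collections import Counter


def majority_score(scores):
    """Return the label endorsed by the most scorers for one epoch."""
    # Assumption: on a tie, the label seen first wins (Counter ordering).
    return Counter(scores).most_common(1)[0][0]


def epoch_agreement(scores):
    """Percentage of scorers whose label matches the majority label."""
    label = majority_score(scores)
    return 100.0 * sum(s == label for s in scores) / len(scores)


def mean_agreement(record):
    """Mean of the per-epoch percentage agreements across a record."""
    return sum(epoch_agreement(epoch) for epoch in record) / len(record)


# Toy data invented for illustration (not from the study):
# 3 epochs, each scored by 5 scorers.
record = [
    ["none", "none", "none", "none", "none"],
    ["hypopnea", "hypopnea", "obstructive", "none", "hypopnea"],
    ["central", "central", "obstructive", "central", "central"],
]

for epoch in record:
    print(majority_score(epoch), epoch_agreement(epoch))
print(mean_agreement(record))
```

In this toy example the per-epoch agreements are 100%, 60%, and 80%, so the mean agreement is 80%.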

Results: The overall agreement for scoring of respiratory events was 93.9% (κ = 0.92). There was very high agreement on epochs without respiratory events (97.4%), and the majority score for most of the epochs (87.8%) was no event. For the 364 epochs scored as having a respiratory event, overall agreement that some type of respiratory event occurred was 88.4% (κ = 0.77). The agreement for epochs scored as obstructive apnea by the majority was 77.1% (κ = 0.71), and the most common disagreement was a score of hypopnea rather than obstructive apnea (14.4%). The agreement for hypopnea was 65.4% (κ = 0.57), with 16.4% scoring no event and 14.8% scoring obstructive apnea. The agreement for central apnea was 52.4% (κ = 0.41). A single epoch was scored as a mixed apnea by a plurality of scorers.

Conclusions: The study demonstrated excellent agreement among a large sample of scorers for epochs with no respiratory events. Agreement for some type of event was good, but disagreements in scoring of apnea vs. hypopnea and type of apnea were common. A limitation of the analysis is that most of the records had normal breathing. A review of controversial events yielded no consistent bias that might be resolved by a change of scoring rules.

Keywords: Scoring; apnea; hypopnea; reliability; respiratory events.

MeSH terms

  • Humans
  • Observer Variation*
  • Reproducibility of Results
  • Respiratory Physiological Phenomena
  • Sleep Apnea Syndromes / diagnosis*
  • Sleep Apnea Syndromes / physiopathology
  • Sleep Medicine Specialty / standards*
  • Societies, Medical / standards
  • United States