Because direct measurements of past occupational exposures are rarely available in population-based case-control studies, exposure is frequently assessed by having multiple expert raters review job histories; however, the subjective nature of this method makes measuring reliability an important quality control step. We evaluated inter-rater reliability for 7729 retrospectively reported jobs in the National Birth Defects Prevention Study. Two industrial hygienists independently classified each job as exposed, unexposed, or exposure unknown; exposed jobs were further evaluated for intensity, frequency, and routes of exposure. Exposure prevalence ranged from 0.1% to 9.8%. Inter-rater reliability for exposure (yes/no), assessed by kappa coefficients, was fair to good for cadmium (κ = 0.46), chlorinated solvents (κ = 0.59), cobalt (κ = 0.54), glycol ethers (κ = 0.50), nickel compounds (κ = 0.65), oil mists (κ = 0.63), and Stoddard solvent (κ = 0.55); agreement was poor for polycyclic aromatic hydrocarbons (PAHs; κ = 0.24) and elemental nickel (κ = 0.37). After a consensus conference resolved disagreements, an additional 4962 jobs were evaluated. In this second job set, inter-rater reliability improved or stayed the same for cadmium (κ = 0.51), chlorinated solvents (κ = 0.81), oil mists (κ = 0.63), PAHs (κ = 0.52), and Stoddard solvent (κ = 0.92). Inter-rater reliability varied by exposure agent and prevalence, demonstrating the importance of measuring reliability in studies that use multiple expert raters for exposure assessment.
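
As an illustration of the agreement statistic reported above, the following is a minimal sketch of an unweighted Cohen's kappa for two raters. It assumes this is the kappa variant intended, which the text does not specify. The function name `cohen_kappa` and the two small rating vectors are hypothetical and are not data from the study; they only show how the same raw agreement can yield different kappa values at different exposure prevalences.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters over the same items.

    Assumption: the abstract does not state which kappa variant was used;
    this is the standard two-rater, unweighted form.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)

    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings (1 = exposed, 0 = unexposed), NOT study data.
# Both pairs agree on 8 of 10 jobs, but exposure prevalence differs.
low_prev_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
low_prev_b = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
high_prev_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
high_prev_b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

print(round(cohen_kappa(low_prev_a, low_prev_b), 3))    # 20% prevalence
print(round(cohen_kappa(high_prev_a, high_prev_b), 3))  # 50% prevalence
```

In this toy example the raters agree on 80% of jobs in both cases, yet kappa is lower when exposure is rarer, because chance agreement on "unexposed" is higher. This is one reason kappa should be read alongside exposure prevalence.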