The purpose of the study was to estimate the temporal processing capacity of human object identification under different stimulus conditions. Objects, either facial images or characters, were shown in rapid sequence on a computer display using the rapid serial visual presentation (RSVP) method. One of the images was a target and the others were distractors. The observer's task was to identify the target. A staircase algorithm was used to determine the threshold frequency of image presentation in the RSVP sequence, and this threshold was measured as a function of image contrast, size, and mean luminance. The threshold frequency was around 10 Hz for faces (100 ms per face) and about 25 Hz for characters (40 ms per character). It was independent of contrast and size at medium and high contrast values, medium and large sizes, and high luminances, but decreased at very low contrasts or small sizes and at medium or low levels of luminance. Computer simulations with a model in which temporal integration limited perceptual speed suggest that the experimentally observed difference in processing time between faces and characters is not due to physical differences between these stimulus types; rather, face-specific sites in the brain appear to process facial information more slowly than object-specific areas process character information. Contrast, size, and luminance affect the signal-to-noise ratio and the temporal characteristics of low-level neural signal representation. Thus, the results suggest that at low contrasts, low luminances, and small sizes, the processing speed of object identification is limited by low-level factors, whereas at high contrasts and luminances, and at large sizes, it is limited by higher-order processing stages. Processing speed also appears to depend on stimulus type: faces are processed more slowly than characters.
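As an illustration of the staircase procedure and of the frequency-to-duration arithmetic mentioned above (10 Hz corresponds to 100 ms per image, 25 Hz to 40 ms), the following is a minimal sketch only. The abstract does not state the staircase rule, step size, or threshold criterion, so the 1-up/1-down rule, the multiplicative step, the reversal-based estimate, and the names `hz_to_ms`, `run_staircase`, and `respond` are all assumptions made for illustration, not the procedure used in the study.

```python
import math


def hz_to_ms(frequency_hz):
    """Convert presentation frequency (images per second) to ms per image."""
    return 1000.0 / frequency_hz


def run_staircase(respond, start_hz=5.0, step_factor=1.1, n_reversals=8):
    """Hypothetical 1-up/1-down staircase on RSVP presentation frequency.

    respond(frequency_hz) must return True if the target was identified.
    A correct response raises the frequency (harder); an error lowers it.
    The estimate is the geometric mean of the frequencies at which the
    direction of change reversed. All rules here are assumed, not taken
    from the study.
    """
    frequency = start_hz
    last_direction = 0  # +1 = stepped up, -1 = stepped down, 0 = no step yet
    reversals = []
    while len(reversals) < n_reversals:
        direction = 1 if respond(frequency) else -1
        if last_direction != 0 and direction != last_direction:
            reversals.append(frequency)  # direction changed at this frequency
        last_direction = direction
        if direction > 0:
            frequency *= step_factor
        else:
            frequency /= step_factor
    log_mean = sum(math.log(f) for f in reversals) / len(reversals)
    return math.exp(log_mean)


# Idealised, noise-free observer (an assumption): correct up to 10 Hz.
estimate_hz = run_staircase(lambda f: f <= 10.0)
print(f"estimated threshold: {estimate_hz:.1f} Hz, "
      f"{hz_to_ms(estimate_hz):.0f} ms per image")
```

With a real observer, `respond` would present an RSVP sequence at the given frequency and return whether the target was reported correctly. A 1-up/1-down rule converges on the 50%-correct point; other rules target other performance levels.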