This paper presents our response to the first international challenge on facial expression recognition and analysis. We propose to combine different types of features to automatically detect action units (AUs) in facial images. We use one multikernel support vector machine (SVM) for each AU to be detected. The first kernel matrix is computed from local Gabor binary pattern (LGBP) histograms using a histogram intersection kernel. The second kernel matrix is computed from active appearance model (AAM) coefficients using a radial basis function (RBF) kernel. During training, these two types of features are combined with the recently proposed SimpleMKL algorithm. SVM outputs are then averaged over time to exploit temporal information in the sequence. To evaluate our system, we perform extensive experiments on several key issues: the influence of the features and of the kernel function in histogram-based SVM approaches, the influence of spatially independent information versus geometric local appearance information and the benefit of combining both, the sensitivity to training data, and the benefit of temporal context adaptation. We also compare our results with those of the other participants and try to explain why our method achieved the best performance in the facial expression recognition and analysis challenge.
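The kernel combination and temporal smoothing described above can be illustrated with a minimal NumPy sketch. It computes a histogram intersection kernel, $K(h, h') = \sum_k \min(h_k, h'_k)$, on histogram features and an RBF kernel on coefficient vectors, then forms the convex combination $d_1 K_{\text{hist}} + d_2 K_{\text{rbf}}$ with $d_m \ge 0$ and $\sum_m d_m = 1$. SimpleMKL learns these weights during SVM training; here they are simply passed in as fixed values, which is an assumption made for illustration. The resulting matrix could be given to any SVM that accepts a precomputed kernel. The moving-average window and the edge padding used to smooth per-frame SVM outputs are also assumptions, not details taken from the paper. All function names are hypothetical.

```python
import numpy as np


def histogram_intersection_kernel(X, Y):
    """K[i, j] = sum_k min(X[i, k], Y[j, k]) for nonnegative histograms (rows)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)


def rbf_kernel(X, Y, gamma):
    """K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    sq_dist = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def combined_kernel(H1, H2, A1, A2, d, gamma):
    """Convex combination of the two base kernels.

    H1/H2: histogram features (e.g. LGBP histograms), A1/A2: coefficient
    vectors (e.g. AAM coefficients). The weights d are fixed here (assumption);
    SimpleMKL would learn them jointly with the SVM.
    """
    d = np.asarray(d, dtype=float)
    if d.shape != (2,) or np.any(d < 0) or not np.isclose(d.sum(), 1.0):
        raise ValueError("need two nonnegative kernel weights summing to 1")
    return d[0] * histogram_intersection_kernel(H1, H2) + d[1] * rbf_kernel(A1, A2, gamma)


def temporal_average(scores, window):
    """Centered moving average of per-frame SVM outputs.

    Edges are padded by repeating the first/last score (assumption).
    """
    scores = np.asarray(scores, dtype=float)
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    padded = np.pad(scores, half, mode="edge")
    # 'valid' convolution of the padded signal returns exactly len(scores) values.
    return np.convolve(padded, np.ones(window) / window, mode="valid")


if __name__ == "__main__":
    H = np.array([[0.5, 0.5], [1.0, 0.0]])  # toy histograms
    A = np.array([[0.0], [1.0]])            # toy coefficient vectors
    print(combined_kernel(H, H, A, A, d=[0.5, 0.5], gamma=1.0))
    print(temporal_average([0.0, 0.0, 3.0, 0.0, 0.0], window=3))  # [0. 1. 1. 1. 0.]
```

Because both base kernels are positive semidefinite and the weights are nonnegative, the combined matrix is also a valid kernel, which is what allows the two heterogeneous feature types to share a single SVM.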