The rising incidence of head and neck cancer is a serious global health challenge that calls for more accurate diagnosis and innovative surgical approaches. Multimodal nonlinear optical microscopy, which combines coherent anti-Stokes Raman scattering (CARS), two-photon excited fluorescence (TPEF), and second-harmonic generation (SHG) with deep learning-based analysis routines, offers label-free assessment of the tissue's morphochemical composition and allows early-stage, automatic detection of disease. Clinical intraoperative application requires compact devices. In this preclinical study, a cohort of 15 patients was examined with a newly developed rigid CARS/TPEF/SHG endomicroscope. Deep learning-based semantic segmentation models were used to detect head and neck tumors in the multimodal data. The study yielded a diagnostic sensitivity of 88% and a specificity of 96%. To combine diagnostics with therapy, femtosecond laser ablation was integrated into the endomicroscope, enabling machine learning-inspired, image-guided selective tissue removal. The result is a powerful intraoperative "seek and treat" approach that paves the way to advanced surgical treatment.
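
For readers less familiar with the two reported metrics, the following minimal sketch shows how sensitivity and specificity are conventionally computed from binary tumor/non-tumor labels. It is an illustration only: the function name `sensitivity_specificity` and the toy label lists are hypothetical, and the abstract does not state at which level (pixel, image, or patient) the study evaluated its predictions.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    predicted: Sequence[int], reference: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels.

    Labels are assumed to be 1 = tumor and 0 = non-tumor.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    tp = fp = tn = fn = 0
    for p, r in zip(predicted, reference):
        if r == 1 and p == 1:
            tp += 1  # tumor correctly detected
        elif r == 1 and p == 0:
            fn += 1  # tumor missed
        elif r == 0 and p == 1:
            fp += 1  # healthy tissue flagged as tumor
        else:
            tn += 1  # healthy tissue correctly left alone

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in the reference labels")

    sensitivity = tp / (tp + fn)  # fraction of tumor samples detected
    specificity = tn / (tn + fp)  # fraction of non-tumor samples cleared
    return sensitivity, specificity


# Toy, made-up labels for illustration only (not data from the study).
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(predicted, reference)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In a segmentation setting, the same counts could be accumulated over flattened prediction and annotation masks, but that choice is an assumption here rather than a description of the study's evaluation protocol.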