Laryngeal high-speed video classification

A variety of sophisticated examination techniques are used today for the clinical diagnosis of pathological conditions of the human body. Most of these techniques, e.g. MRI, CT, and ultrasound, yield vast amounts of image and measurement data with high spatial and/or temporal resolution. Reliably evaluating these data for diagnostic purposes requires a certain degree of subjective experience on the part of the physician. For various reasons, the time available for analyzing and interpreting the acquired data is limited in everyday clinical practice. As a result, diagnostic errors may occur, which can have serious consequences for the affected patient. Combining image processing with data analysis makes it possible to objectify and automate this crucial diagnostic process. The resulting Computer-Aided Diagnosis systems can support the physician's clinical decision and yield more reliable identification of pathological alterations.

One particular field of interest within this medical context is the automatic identification of voice disorders that result in perceivable hoarseness. Commonly, audio recordings of the acoustic voice signal are analyzed for this purpose with specialized software that quantifies the amount of perturbation (noise) in the signal. However, this type of acoustic analysis does not allow specific clinical pictures to be clearly assigned to a distinct set of perturbation parameters. A more revealing approach to voice diagnosis is endoscopic examination of the sound-producing vocal folds in the larynx by means of digital high-speed cameras. These cameras are capable of recording the laryngeal movements at a frame rate of several thousand images per second and thus allow for conclusive real-time analysis. However, manually analyzing the resulting high-speed videos is time-consuming and error-prone. Through automated feature extraction from the recordings and subsequent machine learning analysis, laryngeal movement patterns can be captured quantitatively and classified automatically into different diagnostic classes (e.g. organic and functional dysphonia). Using the distributedDataMining infrastructure, we evaluated a large number of machine learning paradigms (e.g. Support Vector Machines, Artificial Neural Networks) and corresponding parameter optimization strategies (e.g. grid search, evolution strategies, genetic algorithms). This preliminary evaluation step allowed us to identify learning schemes and parameters that are particularly well suited to the considered clinical classification task. Details on the proposed methodology and the obtained classification results can be found in [1], [2], [3], and [4].
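To make the idea of parameter optimization by grid search more concrete, the following is a minimal sketch and not the implementation used in the project. It rests on several assumptions: the feature vectors are synthetic two-dimensional points rather than features extracted from high-speed recordings, a simple k-nearest-neighbour classifier stands in for the Support Vector Machines and Artificial Neural Networks named above, and all function names (`make_toy_data`, `knn_predict`, `cross_val_accuracy`, `grid_search`) are hypothetical. Each parameter combination is scored by 5-fold cross-validation, and the best-scoring combination is kept.

```python
import itertools
import random


def make_toy_data(n_per_class=40, seed=0):
    """Generate synthetic 2-D feature vectors for two hypothetical classes."""
    rng = random.Random(seed)
    data = []
    for label, center in ((0, (0.0, 0.0)), (1, (3.0, 3.0))):
        for _ in range(n_per_class):
            x = (rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0))
            data.append((x, label))
    rng.shuffle(data)
    return data


def knn_predict(train, x, k, p):
    """Majority vote among the k nearest training samples.

    Distances are Minkowski distances of order p (p=1: Manhattan, p=2: Euclidean).
    """
    dists = sorted(
        (sum(abs(a - b) ** p for a, b in zip(xt, x)) ** (1.0 / p), yt)
        for xt, yt in train
    )
    votes = [y for _, y in dists[:k]]
    return max(set(votes), key=votes.count)


def cross_val_accuracy(data, k, p, n_folds=5):
    """Estimate accuracy for one parameter combination by n-fold cross-validation."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    correct = 0
    for i, test in enumerate(folds):
        # Train on all folds except the one currently held out for testing.
        train = [s for j, f in enumerate(folds) if j != i for s in f]
        correct += sum(knn_predict(train, x, k, p) == y for x, y in test)
    return correct / len(data)


def grid_search(data, grid):
    """Evaluate every combination in the grid and return the best one with its score."""
    best_params, best_score = None, -1.0
    names = sorted(grid)
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = cross_val_accuracy(data, **params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


data = make_toy_data()
best_params, best_score = grid_search(data, {"k": [1, 3, 5, 7], "p": [1, 2]})
print(best_params, round(best_score, 3))
```

Because every parameter combination is evaluated independently of the others, this kind of search splits naturally into many small work units, which is what makes it a plausible fit for a distributed infrastructure. Evolution strategies and genetic algorithms explore the same parameter space but choose new candidates based on earlier scores instead of enumerating a fixed grid.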


  1. Voigt D. Objective Analysis and Classification of Vocal Fold Dynamics from Laryngeal High-Speed Recordings. Aachen: Shaker Verlag GmbH; 2010.
  2. Voigt D, Döllinger M, Braunschweig T, Yang A, Eysholdt U, Lohscheller J. Classification of functional voice disorders based on phonovibrograms. Artificial Intelligence in Medicine. 2010;49(1):51-9.
  3. Voigt D, Lohscheller J, Döllinger M, Yang A, Eysholdt U. Automatic diagnosis of vocal fold paresis by employing phonovibrogram features and machine learning methods. Comput Methods Programs Biomed. 2010;99(3):275-88.
  4. Voigt D, Eysholdt U. Identifying relevant analysis parameters for the classification of vocal fold dynamics. J Acoust Soc Am. 2011;130:2550.