Abstract: Multimodal data processing, especially the fusion of the image and speech modalities, is important for future human-computer interfaces, medical applications, and security surveillance. This ...