Abstract: Audio-visual zero-shot learning (ZSL) leverages both video and audio information for model training, aiming to classify new video categories that were not seen during training. However, ...
Fish have been known to make sounds for over two millennia, yet much of this underwater world has remained acoustically ...
2025 in visual storytelling
Explore some favorite visual stories of designers, developers and art directors from The Washington Post’s Design, Graphics and Opinions teams.
Abstract: Source Device Identification (SDI) is pivotal in multimedia forensics, as it entails the recognition of the device that captured a specific image or video. This paper introduces an ...
Bipolar Disorder, Digital Phenotyping, Multimodal Learning, Face/Voice/Phone, Mood Classification, Relapse Prediction, t-SNE, Ablation. Share and Cite: de Filippis, R. and Al Foysal, A. (2025) ...
In this paper, we propose a new multi-modal task, termed audio-visual instance segmentation (AVIS), which aims to simultaneously identify, segment and track individual sounding object instances in ...
Deepfake scams are increasing at an alarming rate, surging over 520% in 2025 alone. AI-generated voices and faces are tricking people into transferring millions of dollars, often under the guise of ...
Music is an essential part of human culture, but automatically classifying songs into genres is a challenging problem for computers. With the explosion of digital music libraries, manual tagging is ...