Audio-visual based online multi-source separation

Ong, J; Vo, BT; Nordholm, S; Vo, BN; Moratuwage, Diluka; Shim, C

journal contribution

posted on 2024-08-06, 02:00 authored by J Ong, BT Vo, S Nordholm, BN Vo, Diluka Moratuwage, C Shim

Meeting or conference assistance is a popular application that typically requires compact configurations of co-located audio and visual sensors. This paper proposes a novel solution for online separation of an unknown and time-varying number of moving sources using only a single microphone array co-located with a single visual device. The approach exploits the complementary nature of simultaneous audio and visual measurements through a model-centric three-stage process of detection, tracking, and (spatial) filtering, which performs separation in a block-wise or recursive fashion. Fusing the measurements requires solving the multi-modal space-time permutation problem: the audio and visual measurements reside in different observation spaces, are unidentified or unlabeled (with respect to the unknown and time-varying number of sources), and are subject to noise, extraneous measurements, and missing measurements. A labeled random finite set tracking filter is applied to resolve the permutation problem and recursively estimate the source identities and trajectories. A time-varying set of generalized side-lobe cancellers is constructed from the tracking estimates to perform online separation. Evaluations are undertaken with live human speakers.
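The abstract names the generalized side-lobe canceller (GSC) as the separation stage. As a rough illustration of the generic GSC structure (not the authors' implementation), the sketch below runs a single-frequency-bin GSC in plain Python: a fixed distortionless beamformer steered at the look direction, a blocking matrix that nulls that direction, and an NLMS-adapted canceller that removes whatever leaks through from other directions. The uniform linear array geometry, the far-field steering model, the pairwise blocking matrix, the step size, and the names `steering_vector` and `NarrowbandGSC` are all assumptions for illustration; in the paper's pipeline the look direction of each canceller would come from the tracking estimates.

```python
import cmath
import math
import random


def steering_vector(num_mics, spacing_m, freq_hz, angle_rad, c=343.0):
    """Far-field steering vector for an assumed uniform linear array."""
    k = 2 * math.pi * freq_hz / c  # wavenumber
    return [cmath.exp(-1j * k * m * spacing_m * math.sin(angle_rad))
            for m in range(num_mics)]


def inner(a, b):
    """Hermitian inner product a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


class NarrowbandGSC:
    """Hypothetical single-bin GSC: fixed beamformer, blocking matrix, NLMS canceller."""

    def __init__(self, d, mu=0.1, eps=1e-8):
        m = len(d)
        norm = inner(d, d).real
        # Quiescent weights satisfy wq^H d = 1 (distortionless toward d).
        self.wq = [di / norm for di in d]
        # Blocking matrix columns b_k with b_k^H d = 0, built from adjacent mic pairs.
        self.blocks = []
        for k in range(m - 1):
            b = [0j] * m
            b[k] = d[k + 1].conjugate()
            b[k + 1] = -d[k].conjugate()
            self.blocks.append(b)
        self.wa = [0j] * (m - 1)  # adaptive canceller weights
        self.mu = mu
        self.eps = eps

    def step(self, x):
        """Process one multichannel snapshot x and return the GSC output."""
        fixed = inner(self.wq, x)               # wq^H x
        u = [inner(b, x) for b in self.blocks]  # B^H x (look direction blocked)
        y = fixed - inner(self.wa, u)           # subtract estimated leakage
        power = inner(u, u).real + self.eps
        # Complex NLMS update that minimizes |y|^2 over wa.
        self.wa = [w + self.mu * ui * y.conjugate() / power
                   for w, ui in zip(self.wa, u)]
        return y


if __name__ == "__main__":
    look = steering_vector(4, 0.05, 1000.0, 0.0)
    interferer = steering_vector(4, 0.05, 1000.0, math.radians(40))
    gsc = NarrowbandGSC(look)
    rng = random.Random(0)
    for t in range(200):
        n = complex(rng.gauss(0, 1), rng.gauss(0, 1))
        y = gsc.step([n * v for v in interferer])
    print("residual interference gain:", abs(y) / abs(n))
```

Because the blocking matrix removes the look-direction component before adaptation, the canceller can only subtract interference, so a source at the steered direction passes with unit gain while a single off-axis interferer is driven toward zero.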

Funding

Category 1 - Australian Competitive Grants (this includes ARC, NHMRC)

History

Volume

30

Start Page

1219

End Page

1234

Number of Pages

16

eISSN

2329-9304

ISSN

2329-9290

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Publisher DOI

https://dx.doi.org/10.1109/TASLP.2022.3156758

Full Text URL

https://ieeexplore.ieee.org/document/9733974

Peer Reviewed

Yes

Open Access

No

ERA Eligible

Yes

Journal

IEEE/ACM Transactions on Audio, Speech, and Language Processing

Keywords

Audio-visual; Source separation; Spatial filtering; Labeled random finite sets; Generalized labeled multi-Bernoulli

Licence

CQUniversity General 1.0