CQUniversity

Audio-visual based online multi-source separation

journal contribution
posted on 2024-08-06, 02:00 authored by J Ong, BT Vo, S Nordholm, BN Vo, Diluka Moratuwage, C Shim
Meeting or conference assistance is a popular application that typically requires compact configurations of co-located audio and visual sensors. This paper proposes a novel solution for online separation of an unknown and time-varying number of moving sources using only a single microphone array co-located with a single visual device. The approach exploits the complementary nature of simultaneous audio and visual measurements through a model-centric three-stage process of detection, tracking, and (spatial) filtering, which performs separation in a block-wise or recursive fashion. Fusing the measurements requires solving the multi-modal space-time permutation problem. The audio and visual measurements not only reside in different observation spaces but are also unidentified or unlabeled (with respect to the unknown and time-varying number of sources), and they are subject to noise, extraneous measurements, and missing measurements. A labeled random finite set tracking filter is applied to resolve the permutation problem and recursively estimate the source identities and trajectories. A time-varying set of generalized side-lobe cancellers is constructed based on the tracking estimates to perform online separation. Evaluations are undertaken with live human speakers.
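The abstract states that the final separation stage uses generalized side-lobe cancellers (GSCs) built from the tracking estimates. The sketch below is for illustration only. It shows a textbook two-microphone Griffiths-Jim GSC and is not the paper's implementation, which uses a time-varying set of GSCs driven by a labeled random finite set tracker. The sketch makes the following assumptions:

- The microphone signals are already steered (time-aligned) toward one target. This stands in for the steering that tracking would provide.
- The fixed beamformer is a delay-and-sum average and the blocking branch is a simple difference.
- The interference canceller is an NLMS adaptive filter.

The names `gsc_two_mic` and `simulate_scene`, the tap count, the step size, and the toy two-gain interferer model are all hypothetical choices.

```python
import random


def gsc_two_mic(x1, x2, taps=8, mu=0.1, eps=1e-8):
    """Hypothetical two-mic Griffiths-Jim GSC on signals pre-steered to the target.

    Assumes the target is identical (time-aligned) at both microphones, so the
    difference channel contains only interference.
    """
    w = [0.0] * taps    # adaptive canceller weights
    buf = [0.0] * taps  # recent blocking-branch samples, newest first
    out = []
    for a, b_in in zip(x1, x2):
        d = 0.5 * (a + b_in)  # fixed delay-and-sum beamformer: target passes through
        b = a - b_in          # blocking branch: target cancels, interference remains
        buf = [b] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))  # estimate of interference in d
        e = d - y                                   # GSC output sample
        norm = eps + sum(bi * bi for bi in buf)
        # NLMS update: minimizing output power removes only what correlates with b
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out


def simulate_scene(n, seed=0):
    """Toy scene (assumption): target equal at both mics, interferer with unequal gains."""
    rng = random.Random(seed)
    s = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # steered target
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # interfering source
    x1 = [si + vi for si, vi in zip(s, v)]
    x2 = [si + 0.5 * vi for si, vi in zip(s, v)]
    return s, x1, x2
```

For example, `s, x1, x2 = simulate_scene(4000)` followed by `e = gsc_two_mic(x1, x2)` should give an output `e` whose residual interference, after the filter converges, is well below that of the fixed beamformer alone. The blocking branch is what prevents the adaptive filter from cancelling the target. In a multi-source setting, one such canceller would be steered per tracked source.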

Funding

Category 1 - Australian Competitive Grants (this includes ARC, NHMRC)

History

Volume

30

Start Page

1219

End Page

1234

Number of Pages

16

eISSN

2329-9304

ISSN

2329-9290

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Peer Reviewed

  • Yes

Open Access

  • No

ERA Eligible

  • Yes

Journal

IEEE/ACM Transactions on Audio, Speech, and Language Processing
