Acoustic Scene Classification with Modulation Spectrogram Features and a Convolutional Recurrent Network

Accepted for oral presentation
Paper code: 1041-ISAV (R2)
1 University of Tehran, Department of Algorithms and Computation
2 School of Engineering Science, College of Engineering, University of Tehran
One of the major objectives of artificial intelligence systems is to make machines aware of their environment. Acoustic scene classification (ASC) aims to identify the acoustic scene in which a sound was recorded. In this paper, we propose a novel feature extraction approach based on the modulation spectrogram instead of the commonly used Mel spectrogram. The modulation spectrogram provides more discriminative features for classification. We split each recording into several temporal segments and compute the modulation spectrogram of each segment individually. The resulting feature tensors form the input to a Convolutional Long Short-Term Memory (Conv-LSTM) model for classification. The LSTM component captures temporal information that is useful for classification, while the convolutional layers effectively extract the spectral structure of the audio signal. The proposed model outperforms state-of-the-art methods in terms of prediction accuracy on the evaluation data for ASC on the DCASE 2017 dataset.
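The segment-wise feature extraction described above can be illustrated with a minimal sketch. The abstract does not give the number of segments, the window sizes, or the exact definition of the modulation spectrogram, so everything below is an assumption: the function names (`stft_magnitude`, `modulation_spectrogram`, `segment_features`) are hypothetical, the parameter values are arbitrary, and the modulation spectrogram is taken in one common form, namely the magnitude of a second Fourier transform applied along time to each frequency band of a short-time magnitude spectrogram.

```python
import numpy as np


def stft_magnitude(x, frame_len=256, hop=64):
    """Return the magnitude STFT of x, shape (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def modulation_spectrogram(segment, frame_len=256, hop=64):
    """Return an (acoustic frequency, modulation frequency) map for one segment.

    Assumed definition: FFT magnitude of each band's temporal envelope.
    """
    spec = stft_magnitude(segment, frame_len, hop)  # (time, acoustic_freq)
    # Remove each band's mean so the zero-modulation bin does not dominate.
    envelopes = spec - spec.mean(axis=0, keepdims=True)
    # Second transform along the time axis yields modulation frequencies.
    mod = np.abs(np.fft.rfft(envelopes, axis=0))  # (mod_freq, acoustic_freq)
    return mod.T


def segment_features(x, n_segments=4, frame_len=256, hop=64):
    """Split x into equal segments and stack their modulation spectrograms.

    Output shape: (n_segments, acoustic_freq_bins, modulation_freq_bins).
    Samples left over after the last full segment are dropped.
    """
    seg_len = len(x) // n_segments
    segments = [x[i * seg_len : (i + 1) * seg_len] for i in range(n_segments)]
    return np.stack([modulation_spectrogram(s, frame_len, hop) for s in segments])
```

With these assumed defaults, a 4-second signal sampled at 8 kHz yields a tensor of shape `(4, 129, 62)`: one modulation spectrogram per segment, ordered in time. A sequence of this kind is what a convolutional recurrent model could consume, with the convolutional layers operating on each segment's map and the recurrent layer operating across segments.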
Keywords