Urban Sound Classification for Audio Analysis using Long Short Term Memory

Authors

  • Shivam Tyagi Department of Computer Science and Engineering, Bharati Vidyapeeth’s College of Engineering, New Delhi
  • Kanishka Aggarwal Department of Computer Science and Engineering, Bharati Vidyapeeth’s College of Engineering, New Delhi
  • Deepika Kumar Bharati Vidyapeeth’s College of Engineering, New Delhi
  • Shreya Garg Department of Computer Science and Engineering, Bharati Vidyapeeth’s College of Engineering, New Delhi
  • Neeraj Department of Computer Science and Engineering, Bharati Vidyapeeth’s College of Engineering, New Delhi

Keywords:

Sound classification, UrbanSound8K, Mel spectrogram, Long short-term memory, Deep learning

Abstract

Audio classification assigns audio signals to predefined classes based on their acoustic characteristics. Deep learning techniques have played a significant role in this task, and researchers have advanced the field by exploring different neural network architectures, incorporating auxiliary information such as keywords or sentence-level information to guide classification, and applying diverse training strategies. This study proposes a Long Short-Term Memory (LSTM) network for classifying environmental sounds. The proposed LSTM model categorizes the audio files of the UrbanSound8K dataset into 10 classes. The model achieves an accuracy of 0.86, precision of 0.87, recall of 0.87, and F1 score of 0.87, with a support of 1747 samples. The methodology is compared with state-of-the-art approaches, and the empirical evaluation is presented alongside the findings.
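
The reported metrics can be illustrated with a minimal sketch in plain Python. It assumes support-weighted averaging over classes, which is one common convention; the abstract does not state which averaging scheme was used. The function name `classification_summary` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def classification_summary(y_true, y_pred):
    """Return accuracy plus support-weighted precision, recall, and F1.

    Assumption: weighted averaging; the paper may have used another scheme.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    support = Counter(y_true)    # number of true samples per class
    predicted = Counter(y_pred)  # number of predictions per class
    # Correct predictions (true positives) per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, n in support.items():
        tp = correct[label]
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / n
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = n / total       # class weight = share of the test samples
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    return {
        "accuracy": sum(correct.values()) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "support": total,        # total number of evaluated samples
    }


# Hypothetical toy labels for three classes (not data from the study)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
summary = classification_summary(y_true, y_pred)
print(summary)
```

Under this convention, the support is the number of evaluated samples, and weighted recall always equals accuracy.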

Published

2023-08-24