Introduction/Background
We will perform classification on two music audio datasets, MusicNet and GTZAN. MusicNet consists of 330 WAV files and 330 MIDI files, corresponding to 330 separate classical piano compositions by 15 different composers. GTZAN consists of 1000 examples per file type, stored as mean feature matrices in CSV files and as spectrogram PNG images, corresponding to 10 genres of music. For MusicNet, the task is to identify the composer of a given audio input; for GTZAN, the task is to classify the genre. Both datasets are taken from Kaggle. Recent work on genre classification has reached roughly 92% validation accuracy [4], whereas earlier works struggled to push any model above 80% [1], [2]. One study applied a gradient boosted decision tree method, LightGBM, that outperformed fully connected neural networks [2]. Although more recent results surpass those numbers, little recent work has revisited tree classifiers for this problem, and most implementations focus on neural networks. We therefore aim to re-examine decision trees' abilities for this task and to improve upon published neural network results.

Additionally, during exploratory data analysis and pre-processing we would like to consider both linear and non-linear dimensionality reduction techniques. Similar to Pál and Várkonyi [3], we will evaluate these methods by reducing the data to a specified number of dimensions, running a clustering algorithm on the reduced data, and evaluating the results post hoc. In their experiments, t-SNE consistently outperformed the other dimensionality reduction techniques, so we plan to use t-SNE alongside principal component analysis (PCA) to better understand and visualize our data.
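To make this concrete, below is a minimal sketch of the kind of reduce-then-cluster evaluation we have in mind, in the spirit of [3]. The file name (`gtzan_features.csv`), the `label` column, and the choice of k-means with 10 clusters are placeholder assumptions for illustration, not fixed design decisions.

```python
# Minimal sketch (assumptions noted above): reduce GTZAN-style tabular features
# with PCA and t-SNE, cluster the reduced data, and score the clusters post hoc
# against the genre labels.
import pandas as pd
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

df = pd.read_csv("gtzan_features.csv")           # assumed path to the GTZAN feature CSV
y = LabelEncoder().fit_transform(df["label"])    # assumes a 'label' column with genre names
X = StandardScaler().fit_transform(df.select_dtypes("number"))

reducers = {
    "PCA": PCA(n_components=2, random_state=0),
    "t-SNE": TSNE(n_components=2, perplexity=30, random_state=0),
}
for name, reducer in reducers.items():
    Z = reducer.fit_transform(X)                 # 2-D embedding for visualization
    clusters = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(Z)
    print(f"{name}: adjusted Rand index vs. true genres = {adjusted_rand_score(y, clusters):.3f}")
```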
Proposed Methods
We plan primarily to explore neural networks, both fully connected and convolutional. Within this class, we specifically plan to improve upon the CNNs and the combined method illustrated in [1]. We will also explore and evaluate gradient boosted decision trees such as XGBoost and LightGBM, and compare these methods against classic baselines such as support vector classifiers and logistic regression; a rough sketch of this comparison appears below. If time permits, we may also construct a super learner model and assess whether the gains justify such a costly ensemble learning method.
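As a rough sketch of how the tabular-model comparison could be framed, the snippet below cross-validates the candidate classifiers on synthetic stand-in data; the hyperparameters are placeholders rather than tuned values, and in practice `X` and `y` would come from the pre-processed CSV features.

```python
# Sketch of the planned comparison of gradient boosted trees against classic
# baselines on tabular features; synthetic data is a stand-in for real features.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier

# Placeholder feature matrix/labels; in practice these come from the CSV pre-processing.
X, y = make_classification(n_samples=1000, n_features=57, n_informative=30,
                           n_classes=10, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=2000),
    "SVC (RBF)": SVC(kernel="rbf", C=1.0),
    "LightGBM": LGBMClassifier(n_estimators=300, learning_rate=0.05, random_state=0),
    "XGBoost": XGBClassifier(n_estimators=300, learning_rate=0.05,
                             eval_metric="mlogloss", random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean CV accuracy = {scores.mean():.3f} (+/- {scores.std():.3f})")
```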
Potential Results/Discussions
In exploring our data, we hope to substantially reduce the dimensionality of the CSV feature sets while retaining some ability to separate data belonging to different classes. For each classification task, based on previous works we expect a combination of CNNs and MLPs to perform best, with marginal improvements from adding more spectrogram and image data. Although neural networks are expected to outperform the other methods, we believe gradient boosted decision trees may perform comparably while offering lower training times and the ability to analyze decision splits at parent nodes.
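If the tree models do hold up, one interpretability payoff is that a fitted model's splits can be inspected directly. The snippet below is a hypothetical illustration using LightGBM's booster inspection utilities on the same kind of synthetic stand-in data as above, not a result we have produced.

```python
# Hypothetical illustration: inspecting a fitted LightGBM model's splits and
# per-feature gains; synthetic data stands in for the real feature matrices.
from sklearn.datasets import make_classification
from lightgbm import LGBMClassifier

X, y = make_classification(n_samples=1000, n_features=57, n_informative=30,
                           n_classes=10, random_state=0)
booster = LGBMClassifier(n_estimators=100, random_state=0).fit(X, y).booster_

# Gain-based importances: how much each feature contributes across all of its splits.
gains = sorted(zip(booster.feature_name(),
                   booster.feature_importance(importance_type="gain")),
               key=lambda pair: -pair[1])
for feature, gain in gains[:10]:
    print(f"{feature}: total split gain = {gain:.1f}")

# Individual split rules (feature and threshold per internal node) can be dumped as well.
tree_df = booster.trees_to_dataframe()
print(tree_df[["tree_index", "split_feature", "threshold"]].head())
```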
Checkpoints
- Initial EDAs and Inner Group Presentations/Tutorials - October 20th
- Dataset Pre-processing - October 25th
- First Model Implementations - November 1st
- Midterm Report - November 10th
- Final Report - December 5th
Contribution Table
| Contributor Name | Contribution Type |
|---|---|
| Austin Barton | GitHub, Proposal, Dataset Choices |
| Aditya Radhakrishnan | Gantt Chart |
| Isabelle Murray | Gantt Chart, Contribution Table |
| Karpagam Karthikeyan | Video script, Video presentation |
| Niki (Keyang) Lu | Video presentation |
Gantt Chart
Link to Gantt Chart: Gantt Chart
References
1. Pun, A., & Nazirkhanova, K. (2021). Music Genre Classification with Mel Spectrograms and CNN.
2. Jain, S., Smit, A., & Yngesjo, T. (2019). Analysis and Classification of Symbolic Western Classical Music by Composer.
3. Pál, T., & Várkonyi, D.T. (2020). Comparison of Dimensionality Reduction Techniques on Audio Signals. Conference on Theory and Practice of Information Technologies.
4. Gupta, S. (2021). GTZAN-Genre Classification-Deep Learning-Val-92.4%.