06 – January 26

 

Mel-frequency Cepstral Coefficients (MFCCs)

MFCCs provide information about the timbral characteristics of a waveform and discard pitch information. They are used in speech recognition because they provide a good representation of formants, and in music information retrieval for timbre-dependent tasks like genre recognition. They are calculated as follows:

  • Calculate the Short Time Fourier Transform (STFT, which we covered last class in our discussion of spectrograms)
  • Map the linear frequencies of the spectrum obtained from the STFT onto the mel scale, using triangular overlapping windows
  • Convert the spectrum amplitudes to the log scale
  • Take the Discrete Cosine Transform (DCT) of the mel log-amplitudes
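
The middle two steps can be sketched in a few lines of Python. This is only an illustration for a single STFT frame, not the Auditory Toolbox implementation: the function names, the number of filters, the toy spectrum, and the Hz-to-mel formula (one common analytic approximation) are all assumptions made for the example.

```python
import math

def hz_to_mel(f_hz):
    # Assumed conversion formula; other mel variants exist.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters over n_bins linear-frequency bins (0 Hz to Nyquist)."""
    nyquist = sample_rate / 2.0
    max_mel = hz_to_mel(nyquist)
    # n_filters + 2 edge frequencies, equally spaced on the mel scale.
    edges_hz = [mel_to_hz(max_mel * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    bin_hz = [nyquist * b / (n_bins - 1) for b in range(n_bins)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges_hz[j - 1], edges_hz[j], edges_hz[j + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:        # rising slope of the triangle
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:       # falling slope of the triangle
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def log_mel_energies(magnitudes, bank, floor=1e-10):
    # Weighted sum per filter, then log; the floor avoids log(0).
    return [math.log(max(sum(w * a for w, a in zip(row, magnitudes)), floor))
            for row in bank]

# Toy frame: a single spectral peak at 1000 Hz (bin 32 of 129 at 8 kHz).
sample_rate = 8000
n_bins = 129
magnitudes = [0.0] * n_bins
magnitudes[32] = 1.0
bank = mel_filterbank(10, n_bins, sample_rate)
log_mel = log_mel_energies(magnitudes, bank)
```

Because neighbouring triangles overlap, the single peak contributes to two adjacent mel bands. The final step would take the DCT of `log_mel`.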

The mel scale is a perceptual scale of pitches judged by listeners to be equidistant from one another. The mel and hertz scales intersect at 1000 mels/1000 Hz; their divergence away from this point is shown in the plot below.
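
To get a feel for the divergence numerically, here is a minimal sketch using one widely used analytic approximation of the mel scale, m = 2595 log10(1 + f/700). The formula is an assumption for illustration; it is not taken from these notes, and individual toolboxes may use a different variant.

```python
import math

def hz_to_mel(f_hz):
    """Assumed approximation: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Print a few Hz -> mel pairs on either side of 1000 Hz.
for f_hz in (250, 500, 1000, 2000, 4000, 8000):
    print(f"{f_hz:5d} Hz -> {hz_to_mel(f_hz):7.1f} mel")
```

With this formula, 1000 Hz maps to roughly 1000 mel, frequencies below that map to mel values larger than their hertz values, and frequencies above it map to progressively smaller ones.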

The DCT is similar to the discrete Fourier transform (DFT) but differs in that its output is real, whereas the DFT's output is complex. Thus the DCT returns a single coefficient for each frequency, with a fixed phase, whereas the DFT returns two values (amplitude and phase) for each frequency. The DCT is useful for compression algorithms because much of the information is concentrated in the lowest coefficients.

– x is the signal
– N is the length of the signal
– k is the number (index) of the coefficient
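
As a rough illustration of these symbols, the sketch below assumes the common DCT-II form, X[k] = sum over n = 0..N-1 of x[n] cos(pi/N (n + 1/2) k). Toolboxes may scale the coefficients differently, and the function name `dct_ii` and its `n_coeffs` argument are made up for this example.

```python
import math

def dct_ii(x, n_coeffs=None):
    """Unnormalised DCT-II of the sequence x (assumed form, see above)."""
    N = len(x)                      # N is the length of the signal
    if n_coeffs is None:
        n_coeffs = N                # by default return all N coefficients
    return [
        sum(x[n] * math.cos(math.pi / N * (n + 0.5) * k) for n in range(N))
        for k in range(n_coeffs)    # k indexes the coefficients
    ]

# A constant signal puts all of its information into the lowest coefficient.
flat = dct_ii([1.0, 1.0, 1.0, 1.0])

# Keep only the first two coefficients of a slowly varying signal.
first_two = dct_ii([1.0, 2.0, 3.0, 4.0], n_coeffs=2)
```

Every coefficient is a single real number, and for smooth inputs most of the information sits in the low-k coefficients, which is why MFCC pipelines typically keep only the first dozen or so.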

The figure below shows MFCCs 2–13 for avm.wav (the opening phrase of Schubert’s Ave Maria).

Today we will explore the frequency-domain representations available in the MATLAB Chroma Toolbox by Meinard Müller and Sebastian Ewert and the Auditory Toolbox by Malcolm Slaney.

Chromagram

Installation Instructions: Once you download the MATLAB Chroma Toolbox and unzip the folder, add it and its subfolders to your MATLAB path. Details about the toolbox are available in this paper: Müller, M., and S. Ewert. 2011. Chroma toolbox: MATLAB implementations for extracting variants of chroma-based audio features. In Proceedings of the International Society for Music Information Retrieval.

Chromagrams represent the total energy in each chroma (the 12 pitch classes) across the entire spectrum by applying octave equivalency to the frequency bins of an STFT or the output of a filter bank (a constant-Q filter bank is used in the Chroma Toolbox). The figure below shows a chromagram of avm.wav (the opening phrase of Schubert’s Ave Maria).
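
Octave equivalency can be sketched for a single frame as follows. This is a simplified illustration that folds individual frequency bins onto pitch classes, not the filter-bank approach of the Chroma Toolbox. The A4 = 440 Hz reference, the function names, and the toy input values are assumptions.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]

def pitch_class(freq_hz, a4_hz=440.0):
    """Index (0 = C ... 11 = B) of the nearest equal-tempered pitch class."""
    midi = 69 + 12 * math.log2(freq_hz / a4_hz)
    return int(round(midi)) % 12

def chroma_vector(freqs_hz, energies):
    """Sum the energy of every frequency bin into its pitch class."""
    chroma = [0.0] * 12
    for f, e in zip(freqs_hz, energies):
        if f > 0:                   # skip the DC bin, which has no pitch
            chroma[pitch_class(f)] += e
    return chroma

# Toy frame: three octaves of A plus middle C.
freqs = [220.0, 440.0, 880.0, 261.63]
energies = [1.0, 2.0, 0.5, 1.0]
chroma = chroma_vector(freqs, energies)
```

The three A's at 220, 440, and 880 Hz all land in the same chroma bin, which is exactly the octave folding described above. Repeating this for every frame gives the columns of a chromagram.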

Cochleagram

Installation Instructions: Once you download the Auditory Toolbox, add the unzipped folder and its subfolders to your MATLAB path. Then, in MATLAB, go to the ‘src’ folder and run the ‘Makefile’ script to compile the mex files. After that you should be able to run the ‘test_auditory’ script.

If running the ‘test_auditory’ script produces the error message “Error: File: SeneffEarSetup.m Line: 217 Column: 5 “i” previously appeared to be used as a function or command, conflicting with its use here as the name of a variable.”, you will have to make the following change to ‘SeneffEarSetup.m’: in line 214, change cf = exp(i*2*pi*f/fs); to cf = exp(1i*2*pi*f/fs); (that is, add a ‘1’ before the ‘i’ that follows ‘cf = exp(’).

The figure below shows a cochleagram of avm.wav (the opening phrase of Schubert’s Ave Maria).
