A quick look at the wiki page reveals that MFCCs (Mel-Frequency Cepstral Coefficients) are computed on (logarithmically distributed) human auditory bands rather than linearly spaced ones. So, as an initial expectation, there are about 9 full octaves from 30 Hz to 16 kHz (or 10 if you start at 20 Hz and go up to 20 kHz), and more bands still if you prefer processing in 1/3 octaves: you would then have around 30 ...
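As a rough check of the octave arithmetic, here is a minimal sketch that counts octaves (doublings) between two band edges and the corresponding number of third-octave bands. The helper name `octaves_between` is made up for this illustration, and the band edges are the ones quoted in the post.

```python
import math

def octaves_between(f_low_hz, f_high_hz):
    """Number of octaves (frequency doublings) between two frequencies."""
    return math.log2(f_high_hz / f_low_hz)

# Band edges quoted in the post: 30 Hz to 16 kHz, and 20 Hz to 20 kHz.
for f_low, f_high in [(30.0, 16000.0), (20.0, 20000.0)]:
    n_oct = octaves_between(f_low, f_high)
    n_third = 3.0 * n_oct  # three third-octave bands per octave
    print(f"{f_low:g} Hz to {f_high:g} Hz: {n_oct:.2f} octaves, "
          f"about {n_third:.0f} third-octave bands")
```

Note that octave and third-octave bands are only a ballpark for how many mel filters are reasonable; the mel scale is roughly linear at low frequencies and logarithmic only higher up.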
MFCC - Significance of number of features - Signal Processing Stack ...
I'm studying speech recognition, in particular the use of MFCCs for feature extraction. All the examples I've found online tend to graph a series of MFCCs extracted from a particular utterance as follows (
The MFCC features are represented by 39 values for each window frame. 12 values come from the mel filter bank and we get the 13th value by taking the DCT [is this right]? So the rest are the delta and double-delta coefficients and their energies. Below is the equation for calculating the mel-frequency cepstrum: It appears to me that it gives a single value per window frame.
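For what it's worth, one common layout (an assumption here, not something the post confirms) is 13 static values per frame, often 12 cepstral coefficients plus an energy term, with 13 deltas and 13 double-deltas appended. Note also that the cepstrum equation is evaluated once per coefficient index, so a frame yields a vector, not a single value. A minimal sketch of the stacking, using a simplified central-difference delta (real front ends usually use a wider regression window) and made-up input data:

```python
def delta(frames):
    """Central difference over time, with edge frames repeated.

    This is an assumed, simplified delta; it is not a specific toolkit's formula.
    """
    n_frames = len(frames)
    out = []
    for t in range(n_frames):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n_frames - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out

def stack_features(static):
    """Concatenate static, delta and double-delta values frame by frame."""
    d1 = delta(static)
    d2 = delta(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]

# Hypothetical input: 4 frames of 13 static coefficients each.
static = [[float(t * k) for k in range(13)] for t in range(4)]
features = stack_features(static)
print(len(features), len(features[0]))  # 4 39
```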
Also, the shape of and MFCC of low energy seem to be very similar. I realised that it is similar after I take the DCT. Where is my mistake in the calculation? Cheers! Celdor EDIT: I understand now why the first MFCC coefficient is very low. If I look at the DCT-II, its first component is just a straight line:
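The observation in the EDIT can be checked numerically. In the sketch below (an unnormalised DCT-II; the input values are made up to resemble the log filter-bank outputs of a quiet frame), the k = 0 basis function is cos(0) = 1 everywhere, so the first coefficient is just the sum of the log energies. For a low-energy frame those logs are strongly negative, and so is the first coefficient.

```python
import math

def dct2(x):
    """Unnormalised DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    n_pts = len(x)
    return [
        sum(x[n] * math.cos(math.pi / n_pts * (n + 0.5) * k) for n in range(n_pts))
        for k in range(n_pts)
    ]

# Hypothetical log filter-bank outputs of a quiet frame.
log_energies = [-8.0, -7.5, -9.0, -8.2]
coeffs = dct2(log_energies)

# The k = 0 basis is constant, so coeffs[0] equals the plain sum of the inputs.
print(coeffs[0], sum(log_energies))
```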
The steps for computing the Mel-Frequency Cepstral Coefficients (MFCC) are: Frame blocking -> Windowing -> abs(DFT) -> Mel filter bank -> Sum coefficients for each filter -> Logarithm -> DCT. But what is the purpose of the logarithm step?
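The chain of steps can be sketched for a single frame using only the standard library. This is an illustrative sketch, not a reference implementation: the function names, the Hamming window, the 2595 * log10(1 + f/700) mel formula, the filter count and the small constant inside the logarithm are all assumptions chosen for the demo.

```python
import cmath
import math

def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def hamming(n_pts):
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_pts - 1)) for n in range(n_pts)]

def magnitude_spectrum(frame):
    """abs(DFT) for bins 0..N/2, computed directly (O(N^2), fine for a demo)."""
    n_pts = len(frame)
    return [
        abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts)))
        for k in range(n_pts // 2 + 1)
    ]

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top_mel = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top_mel * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            f = k * sample_rate / n_fft  # centre frequency of DFT bin k
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def dct2(x):
    """Unnormalised DCT-II."""
    n_pts = len(x)
    return [
        sum(x[n] * math.cos(math.pi / n_pts * (n + 0.5) * k) for n in range(n_pts))
        for k in range(n_pts)
    ]

def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    windowed = [s * w for s, w in zip(frame, hamming(len(frame)))]       # windowing
    spectrum = magnitude_spectrum(windowed)                              # abs(DFT)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)            # mel filter bank
    energies = [sum(w * s for w, s in zip(row, spectrum)) for row in bank]  # sum per filter
    log_energies = [math.log(e + 1e-10) for e in energies]               # logarithm (offset avoids log(0))
    return dct2(log_energies)[:n_coeffs]                                 # DCT, keep the first coefficients

# Hypothetical input: one 256-sample frame of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2.0 * math.pi * 440.0 * n / sample_rate) for n in range(256)]
coeffs = mfcc_frame(frame, sample_rate)
print(len(coeffs))  # 13
```

One property that may help with the question about the logarithm: scaling the input by a constant gain multiplies every filter output by the same factor, which the log turns into the same additive offset in every band. The DCT then puts that offset almost entirely into the first coefficient, so calling `mfcc_frame` on a doubled frame should change `coeffs[0]` and leave the others essentially as they were.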