Abstract:
Since the earliest studies of human behavior, emotions have attracted the
attention of researchers in many disciplines, including psychology,
neuroscience, and, more recently, computer science. Speech is considered a
salient conveyor of emotional cues and can be used as an important source for
studying emotion. Speech is modulated for different emotions by varying
frequency- and energy-related acoustic parameters such as pitch, energy,
and formants. In this paper, we explore inter- and intra-subband energy
variations as a means to differentiate six emotions: anger, disgust, fear,
happiness, neutral, and sadness. We introduce Two-Layered Cascaded Subband
Cepstral Coefficients (TLCS-CC) analysis to study energy variations within
low- and high-arousal emotions as a novel approach to emotion classification.
The new approach was
compared with Mel Frequency Cepstral Coefficients (MFCC) and Log
Frequency Power Coefficients (LFPC). Experiments were conducted on the
Berlin Emotional Data Corpus (BEDC). With energy-related features, we
achieved average accuracies of 73.9% and 80.1% for speaker-independent and
speaker-dependent emotion classification, respectively.
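As a rough illustration of the baseline features the abstract refers to (not part of the paper itself), the sketch below extracts MFCCs for a single utterance using the librosa library. The sampling rate, frame settings, and file path are assumptions chosen for illustration, not values taken from this work.

    # Illustrative sketch only: extracting the MFCC baseline features
    # that the proposed TLCS-CC analysis is compared against.
    # Frame settings and the file path below are hypothetical.
    import librosa

    def mfcc_features(wav_path, n_mfcc=13):
        """Return an (n_frames, n_mfcc) matrix of MFCCs for one utterance."""
        y, sr = librosa.load(wav_path, sr=16000)   # mono, assumed 16 kHz
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                    n_fft=400, hop_length=160)  # 25 ms frames, 10 ms hop
        return mfcc.T

    # Example usage on one (hypothetical) BEDC utterance:
    # feats = mfcc_features("bedc/03a01Fa.wav")
    # print(feats.shape)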