(a) Quantifying the data in the Fisher corpus
How many utterances and words do we have in the Fisher corpus on Carmen?
total lines/“utterances” in Fisher (math in Python):
total words in Fisher (math in Python):
29734 utterances
362020 words
in class
forgot to subtract the time-stamps and the markers for the words; perhaps the best way to do this would be to multiply the total number of lines by three (three non-words in each line), and then subtract that product from the total number of words.
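A minimal sketch of that correction, using the two counts reported above. It assumes every line carries exactly three non-word tokens, as the note suggests; the variable names are only illustrative.

```python
# Counts reported above for the Fisher corpus on Carmen.
total_lines = 29734     # utterances, one per line
total_tokens = 362020   # raw word count, non-words still included

# Assumption: each line has exactly three non-word tokens
# (the time-stamps and the marker).
NON_WORDS_PER_LINE = 3

non_words = total_lines * NON_WORDS_PER_LINE
adjusted_words = total_tokens - non_words

print(non_words)       # 89202
print(adjusted_words)  # 272818
```

If some lines have fewer or more than three non-word tokens, the adjusted figure is only an estimate.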
(b) Do people laugh?
What is the percentage of utterances containing laughter, totaling all the files of the “058” directory?
utterances containing laughter in Fisher (directions from the pdf):
the number of lines, 1681, is the number of “utterances” with laughter in all of Fisher
utterances with laughter in only 058:
using different commands returned the same number of lines, but different numbers of bytes. Is it the “-R” that changed it?
10556 utterances in 058
199 utterances in 058 containing [laughter]
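The same two counts could also be cross-checked in Python rather than on the command line. This is only a sketch: the function name `count_utterances` and the directory path in the usage note are hypothetical, and it assumes each line of each file is one utterance and that laughter is written literally as `[laughter]`.

```python
from pathlib import Path


def count_utterances(directory, marker="[laughter]"):
    """Return (total_lines, lines_containing_marker) over all files under directory."""
    total = 0
    with_marker = 0
    # rglob("*") walks the directory recursively, including subdirectories.
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                total += 1  # every line counts as one utterance
                if marker in line:
                    with_marker += 1
    return total, with_marker
```

Calling `count_utterances("058")` from the directory that contains 058 would return the pair of counts to compare against the numbers above.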
percentage of utterances containing [laughter] in 058 (math in Python):
approx 1.9% of the utterances in 058 contain laughter.
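The percentage can be reproduced from the two counts for 058 given above; the variable names are only illustrative.

```python
laughter_utterances = 199   # utterances in 058 containing [laughter]
total_utterances = 10556    # all utterances in 058

percentage = laughter_utterances / total_utterances * 100

print(round(percentage, 1))  # 1.9
```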