Homework 1
tasks in the homework:
We want to address the question of whether women talk more than men. To answer this, we will use the Fisher corpus on Carmen.
Write a Python script that outputs:
- the raw total number of words spoken by women
- the raw total number of words spoken by men
- the total number of utterances spoken by women
- the total number of utterances spoken by men
- the average number of words per utterance spoken by women and by men
- the number of female speakers
- the number of male speakers
the previous assignment plus opening the Fisher files with nested ‘for’ loops gives:
which is the output of this loop (so far) written in class:
# ----- importing modules ----- #
import os

# ----- initialization of tracking variables ----- #
totalWordsSpoken = 0
totalUtterances = 0
wordsW = 0
wordsM = 0
wordsN = 0
utterW = 0
utterM = 0
utterN = 0

# ----- listing top-level directory ----- #
dir = "Fisher"
dirA = os.listdir(dir)

# ----- listing subdirectories ----- #
for dirB in dirA:
    dirC = dir + "/" + dirB
    fileA = os.listdir(dirC)

    # ----- opening files ----- #
    for fileB in fileA:
        path = dirC + "/" + fileB
        fileC = open(path)

        # ----- for loop ----- #
        for sentence in fileC:

            # ----- processing block ----- #
            words = sentence.split()
            onlywords = words[3:]        # everything after the speaker tag
            genderLetter = words[2][2]   # e.g. the 'f' in 'A-f:'
            speakerID = words[2][0]      # e.g. the 'A' in 'A-f:'
            numberwords = len(onlywords)
            onlysen = " ".join(onlywords)

            # ----- count ----- #
            totalWordsSpoken += numberwords
            totalUtterances += 1

            # ----- gender output ----- #
            if genderLetter == 'f':
                gender = "woman"
                wordsW += numberwords
                utterW += 1
            elif genderLetter == 'm':
                gender = "man"
                wordsM += numberwords
                utterM += 1
            else:
                gender = "non-gendered person"
                wordsN += numberwords    # accumulate, as with wordsW and wordsM
                utterN += 1

            # ----- output per sentence ----- #
            print()
            print(" ", "sentence number", totalUtterances, ":", onlysen)
            print(" ", "number of words:", numberwords)
            print(" ", "words are:", onlywords)
            print(" ", "speaker ID:", speakerID)
            print(" ", "speaker is a:", gender)

        # done with this file
        fileC.close()

# ----- final totals output ----- #
print()
print(" ", "total number of words:", totalWordsSpoken)
print(" ", "total number of utterances:", totalUtterances)
print(" ", "the average number of words per utterance was :", totalWordsSpoken / totalUtterances)
print()
print(" ", "total words spoken by women:", wordsW)
print(" ", "total number of utterances:", utterW)
print(" ", "the average number of words per utterance was :", wordsW / utterW)
print()
print(" ", "total words spoken by men:", wordsM)
print(" ", "total number of utterances:", utterM)
print(" ", "the average number of words per utterance was :", wordsM / utterM)
print()
print(" ", "total words unaccounted for by gender:", wordsN)
print(" ", "total number of utterances unaccounted for:", utterN)
print()
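The processing block relies on every line having at least three whitespace-separated fields, with the third one looking like 'A-f:'. Here is a minimal sketch of that step as a standalone function, so it can be tried on a single line before running the whole corpus. The name parse_line and the sample line are made up for illustration, and the line layout is only what the indexing in the loop implies, not something checked against the corpus.

```python
def parse_line(sentence):
    """Split one transcript line into (speakerID, genderLetter, onlywords).

    Assumes the layout implied by the indexing in the loop:
    two leading fields, then a tag such as 'A-f:', then the words.
    Returns None for lines that do not fit (blank lines, short lines).
    """
    words = sentence.split()
    if len(words) < 3 or len(words[2]) < 3:
        return None
    tag = words[2]
    return tag[0], tag[2], words[3:]


# Made-up sample line, not taken from the corpus.
sample = "0.00 1.25 A-f: hello how are you"
print(parse_line(sample))
print(parse_line(""))
```

Returning None for lines that do not fit lets the main loop skip them instead of crashing on words[2].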
Still need to add code to count the number of speakers of each gender.
I’m having a bit of trouble with the if/else statements and the nesting here. The plan is to build the code for a single file, extend it to two, and then run it on the entire corpus once it can tell the ‘A-f:’ in one file apart from the ‘A-f:’ in the next file in the list. I’m also assuming that speakers are never repeated across files.
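One possible way to keep the ‘A-f:’ of one file separate from the ‘A-f:’ of the next is to remember each speaker as a (file name, speaker letter) pair in a set. This is only a sketch under the assumption stated above, that no speaker shows up in more than one file. The function name count_speakers, the dictionary input, and the toy lines are all hypothetical.

```python
def count_speakers(lines_by_file):
    """Count distinct speakers per gender letter.

    lines_by_file maps a file name to a list of its lines. A speaker is
    identified by (file name, speaker letter), which relies on the
    assumption that no speaker appears in more than one file.
    """
    seen = set()      # (file name, speaker letter) pairs already counted
    speakers = {}     # gender letter -> number of distinct speakers
    for name, lines in lines_by_file.items():
        for sentence in lines:
            words = sentence.split()
            if len(words) < 3 or len(words[2]) < 3:
                continue  # skip lines without a tag like 'A-f:'
            speakerID = words[2][0]
            genderLetter = words[2][2]
            if (name, speakerID) not in seen:
                seen.add((name, speakerID))
                speakers[genderLetter] = speakers.get(genderLetter, 0) + 1
    return speakers


# Made-up lines for two files; both files contain an 'A-f:' speaker.
toy = {
    "file1.txt": ["0.0 1.0 A-f: hi there", "1.0 2.0 B-m: hello", "2.0 3.0 A-f: so"],
    "file2.txt": ["0.0 1.0 A-f: hey", "1.0 2.0 B-f: hi"],
}
print(count_speakers(toy))
```

In the real script the same set could be filled inside the existing nested loops, using fileB as the file name, so no extra pass over the corpus would be needed.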
Here are the counts for fe_03_06500.txt to compare the code against:
(There is 1 female speaker and 1 male speaker.)
here is my speaker count abstraction (I’m sure there’s an easier way to do this):
# --- babyHW1.py --- #

# --- instantiate --- #
a = 13  # w
b = 0   # w
c = 2   # m
d = 9   # m
e = 0   # n
w = 0
m = 0
n = 0

# --- speaker count --- #
# a zero leaves the tally unchanged; resetting it in an else branch
# would throw away speakers that were already counted
if a > 0:
    w += 1
if b > 0:
    w += 1
if c > 0:
    m += 1
if d > 0:
    m += 1
if e > 0:
    n += 1

# --- output --- #
print("w:", w)
print("m:", m)
print("n:", n)
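One possibly easier way to write the same speaker count, as a rough sketch: group the values by gender letter and count how many in each group are greater than zero. The dictionary layout is my own assumption; the numbers are the same toy values as in babyHW1.py.

```python
# Toy values from babyHW1.py, grouped by gender letter.
counts = {"w": [13, 0], "m": [2, 9], "n": [0]}

# Each value greater than zero counts as one speaker.
speakers = {g: sum(1 for c in slots if c > 0) for g, slots in counts.items()}

for g, total in speakers.items():
    print(g + ":", total)
```

This avoids one if statement per variable, and adding another slot only means adding a number to a list.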