09 | May | 2014 | Technical Tools

Tasks in the homework:

We want to address the question of whether women talk more than men. To answer this, we will use the Fisher corpus on Carmen.
Write a Python script that outputs:

the raw total number of words spoken by women

the raw total number of words spoken by men

the total number of utterances spoken by women

the total number of utterances spoken by men

the average number of words per utterance spoken by women and by men

the number of female speakers

the number of male speakers
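Every number in this list comes from two parts of each transcript line: the speaker tag and the words that follow it. A minimal sketch of that tally for a single file, assuming the class format `start end A-f: word word ...` (the third field is the speaker tag, as in the in-class loop). The helper names `parse_line` and `summarize` are hypothetical, and the three sample lines are invented stand-ins for a real Fisher file:

```python
from collections import Counter


def parse_line(line):
    """Split one transcript line into (speaker tag, gender letter, words).

    Assumes the format "start end A-f: word word ...", where the third field
    is the speaker tag; returns None for blank lines and "#" header lines.
    """
    fields = line.split()
    if len(fields) < 3 or fields[0].startswith("#"):
        return None
    tag = fields[2]
    gender = tag[2:3]  # "A-f:" -> "f"; "" if the tag carries no gender
    return tag, gender, fields[3:]


def summarize(lines):
    """Tally words, utterances, and distinct speaker tags per gender letter."""
    words = Counter()
    utterances = Counter()
    speakers = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        tag, gender, spoken = parsed
        words[gender] += len(spoken)
        utterances[gender] += 1
        speakers.setdefault(gender, set()).add(tag)
    return words, utterances, {g: len(tags) for g, tags in speakers.items()}


# invented sample lines, not taken from the corpus
sample = [
    "0.5 2.0 A-f: hi how are you",
    "2.1 3.0 B-m: fine thanks",
    "3.2 4.0 A-f: good",
]
words, utterances, speakers = summarize(sample)
for g in ("f", "m"):
    # guard against dividing by zero when a gender never speaks
    avg = words[g] / utterances[g] if utterances[g] else 0
    print(g, words[g], utterances[g], round(avg, 2), speakers.get(g, 0))
```

Counting speakers by tag like this only works inside one file, since every file reuses the same ‘A’ and ‘B’ channel letters.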

Combining the previous assignment with code that opens the Fisher files in nested ‘for’ loops gives the following loop, written in class (so far):

# ----- importing the os module (for listing directories) ----- #
import os

# ----- initialization of tracking variables ----- #
totalWordsSpoken = 0
totalUtterances = 0

wordsW = 0    # words spoken by women
wordsM = 0    # words spoken by men
wordsN = 0    # words with no gender tag

utterW = 0    # utterances by women
utterM = 0    # utterances by men
utterN = 0    # utterances with no gender tag

# ----- listing top-level directory ----- #
topDir = "Fisher"    # avoids the name "dir", which is a Python built-in
dirA = sorted(os.listdir(topDir))

# ----- listing subdirectories ----- #
for dirB in dirA:
    dirC = os.path.join(topDir, dirB)
    if not os.path.isdir(dirC):
        continue    # skip stray files such as .DS_Store
    fileA = sorted(os.listdir(dirC))

    # ----- opening files ----- #
    for fileB in fileA:
        path = os.path.join(dirC, fileB)
        with open(path) as fileC:    # "with" closes each file when done

            # ----- for loop over the lines of one file ----- #
            for sentence in fileC:

                # ----- processing block ----- #
                words = sentence.split()
                if len(words) < 3 or words[0].startswith("#"):
                    continue    # skip blank lines and "#" header lines
                onlywords = words[3:]           # drop the two time stamps and the speaker tag
                genderLetter = words[2][2:3]    # "A-f:" -> "f" ("" if no gender)
                speakerID = words[2][0]         # "A-f:" -> "A" (unique only within one file)
                numberwords = len(onlywords)
                onlysen = " ".join(onlywords)

                # ----- count ----- #
                totalWordsSpoken += numberwords
                totalUtterances += 1

                # ----- gender output ----- #
                if genderLetter == 'f':
                    gender = "woman"
                    wordsW += numberwords
                    utterW += 1
                elif genderLetter == 'm':
                    gender = "man"
                    wordsM += numberwords
                    utterM += 1
                else:
                    gender = "non-gendered person"
                    wordsN += numberwords    # accumulate, like the other counters
                    utterN += 1

                # ----- output per sentence ----- #
                print()
                print("  ", "sentence number", totalUtterances, ":", onlysen)
                print("  ", "number of words:", numberwords)
                print("  ", "words are:", onlywords)
                print("  ", "speaker ID:", speakerID)
                print("  ", "speaker is a:", gender)

# ----- final totals output ----- #
# each average is guarded so an empty count prints 0 instead of crashing
print()
print("  ", "total number of words:", totalWordsSpoken)
print("  ", "total number of utterances:", totalUtterances)
print("  ", "average number of words per utterance:",
      totalWordsSpoken / totalUtterances if totalUtterances else 0)
print()
print("  ", "total words spoken by women:", wordsW)
print("  ", "total utterances by women:", utterW)
print("  ", "average number of words per utterance by women:",
      wordsW / utterW if utterW else 0)
print()
print("  ", "total words spoken by men:", wordsM)
print("  ", "total utterances by men:", utterM)
print("  ", "average number of words per utterance by men:",
      wordsM / utterM if utterM else 0)
print()
print("  ", "total words unaccounted for by gender:", wordsN)
print("  ", "total utterances unaccounted for by gender:", utterN)
print()
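The two nested `os.listdir` loops can also be written with `os.walk`, which visits every subdirectory of a tree in one loop. A sketch, assuming the transcripts are the `.txt` files somewhere under `Fisher`; the extension filter and the helper name `transcript_paths` are assumptions, not part of the assignment:

```python
import os


def transcript_paths(top="Fisher"):
    """Return the path of every .txt file under `top`, in sorted order.

    os.walk descends through all subdirectories, replacing the two nested
    os.listdir loops; the ".txt" filter skips stray files such as .DS_Store.
    If `top` does not exist, os.walk yields nothing and the list is empty.
    """
    paths = []
    for dirpath, dirnames, filenames in os.walk(top):
        dirnames.sort()  # sorting in place makes os.walk visit subdirectories in order
        for name in sorted(filenames):
            if name.endswith(".txt"):
                paths.append(os.path.join(dirpath, name))
    return paths


print(len(transcript_paths()), "transcript files found")
```

Each returned path can then be opened with `with open(path) as fileC:` and run through the same processing block.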

I still need to add code that counts the number of speakers of each gender.

I’m having a bit of trouble with the if/else statements and the nesting here. I’m going to build the code for a single file, extend it to two files, and then run it on the entire corpus once I can distinguish the ‘A-f:’ speaker in one file from the ‘A-f:’ speaker in the next file in the list. I’m also assuming that no speaker ever appears in more than one file.
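One way to tell the ‘A-f:’ in one file from the ‘A-f:’ in the next is to record each speaker as a (file name, speaker tag) pair in a set, so the same tag in two different files counts as two speakers. A minimal sketch under the same no-repeated-speakers assumption; `count_speakers` and the two files in it are invented for illustration:

```python
def count_speakers(files):
    """Count distinct speakers per gender letter across several transcripts.

    `files` maps a file name to its lines. A tag like "A-f:" is only unique
    within one file, so each speaker is keyed by (file name, tag); this
    relies on the assumption that no speaker appears in two files.
    """
    speakers = set()
    for name, lines in files.items():
        for line in lines:
            fields = line.split()
            if len(fields) < 3 or fields[0].startswith("#"):
                continue  # blank line or header line
            speakers.add((name, fields[2]))

    counts = {}
    for _, tag in speakers:
        gender = tag[2:3]  # "A-f:" -> "f"
        counts[gender] = counts.get(gender, 0) + 1
    return counts


# invented two-file example: both files contain an "A-f:" speaker
files = {
    "file1.txt": ["0.5 2.0 A-f: hello", "2.1 3.0 B-m: hi", "3.1 4.0 A-f: ok"],
    "file2.txt": ["0.4 1.0 A-f: yes", "1.2 2.0 B-f: no"],
}
print(count_speakers(files))
```

Running this on fe_03_06500.txt alone should report 1 female and 1 male speaker, which fits the one-file-then-two-files plan.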
The file fe_03_06500.txt gives a known case to check the code against: it has 1 female speaker and 1 male speaker.

Here is my speaker-count abstraction (I’m sure there’s an easier way to do this):

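# --- babyHW1.py --- #

# --- initialize: example counts for each speaker tag --- #
# a, b: women (w); c, d: men (m); e: non-gendered (n)
a = 13  # w
b = 0   # w
c = 2   # m
d = 9   # m
e = 0   # n

w = 0
m = 0
n = 0

# --- speaker count --- #
# a speaker is counted only if their count is above zero; there is no else
# branch, because resetting a counter to 0 would erase speakers already counted
if a > 0:
    w += 1

if b > 0:
    w += 1

if c > 0:
    m += 1

if d > 0:
    m += 1

if e > 0:
    n += 1

# --- output --- #
print("w:", w)
print("m:", m)
print("n:", n)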
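As for an easier way: Python treats `True` as 1 when summing, so each speaker count can be a single line that adds up how many of that gender's counts are above zero. A sketch using the same example values as babyHW1.py:

```python
# example per-speaker counts from babyHW1.py
a, b, c, d, e = 13, 0, 2, 9, 0

# each comparison is True (1) or False (0), so the sum is
# the number of speakers whose count is above zero
w = sum(count > 0 for count in (a, b))
m = sum(count > 0 for count in (c, d))
n = sum(count > 0 for count in (e,))

print("w:", w)
print("m:", m)
print("n:", n)
```

Adding another speaker then only means adding a variable to the tuple, instead of writing a new if block.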
