Homework 1



Homework 1

tasks in the homework:

We want to address the question of whether women talk more than men. To answer this, we will use the Fisher corpus on Carmen.
Write a python script that outputs —

  • the raw total number of words spoken by women
  • the raw total number of words spoken by men
  • the total number of utterances spoken by women
  • the total number of utterance spoken by men
  • the average number of words per utterance spoken by women and by men
  • the number of female speakers
  • the number of male speakers

the previous assignment plus opening the Fisher files with nested ‘for’ loops gives:
HW1_output

which is the output of this loop (so far) written in class:

# ------ importing all directories ----- #
import os

# ----- initialization of tracking variables ----- #
totalWordsSpoken = 0
totalUtterances = 0

wordsW = 0
wordsM = 0
wordsN = 0

utterW = 0
utterM = 0
utterN = 0

# ----- listing top-level directory ----- #
dir = "Fisher"
dirA = os.listdir(dir)

# ----- listing subdirectories ----- #
for dirB in dirA:
    dirC = dir + "/" + dirB
    fileA = os.listdir(dirC)
    
    # ----- opening files ----- #
    for fileB in fileA :
        path = dirC + "/" + fileB
        fileC = open(path)

        # -----	for loop ----- #
        for sentence in fileC:
            
            # -----	processing block ----- #
            words = sentence.split()
            onlywords = words[3:]
            genderLetter = words[2][2]
            speakerID = words[2][0]
            numberwords = len(onlywords)
            onlysen = " ".join(onlywords)
            
            # -----	count ----- #
            totalWordsSpoken += numberwords
            totalUtterances += 1
            
            # -----	gender output ----- #
            if genderLetter == 'f':
                gender = "woman"
                wordsW += numberwords
                utterW += 1
            elif genderLetter == 'm':
                gender = "man"
                wordsM += numberwords
                utterM += 1
            else:
                gender = "non-gendered person"
                wordsN = numberwords
                utterN += 1

            # -----	output per sentence----- #
            print()
            print("  ", "sentence number", totalUtterances, ":", onlysen)
            print("  ", "number of words:", numberwords)
            print("  ", "words are:", onlywords)
            print("  ", "speaker ID:", speakerID)
            print("  ", "speaker is a:", gender)

# -----	final totals output ----- #
print()
print("  ", "total number of words:", totalWordsSpoken)
print("  ", "total number of utterances:", totalUtterances)
print("  ", "the average number of words per utterance was :",
      totalWordsSpoken / totalUtterances)
print()
print("  ", "total words spoken by women:", wordsW)
print("  ", "total number of utterances:", utterW)
print("  ", "the average number of words per utterance was :",
      wordsW / utterW)
print()
print("  ", "total words spoken by men:", wordsM)
print("  ", "total number of utterances:", utterM)
print("  ", "the average number of words per utterance was :",
      wordsM / utterM)
print()
print("  ", "total words unaccounted for by gender:", wordsN)
print("  ", "total number of utterances unaccounted for:", utterN)
print()

need to add some stuff in to count the number of speakers of each gender.

having a bit of trouble with the if/else statements here and the nesting. I’m going to build a code for only one file, extend it to two, and then run it on the entire corpus once I can differentiate between the ‘A-f:’ in one file, and the ‘A-f:’ in the next file in the list. I’m also assuming that there are never any repeated speakers across files.
Here are the counts for fe_03_06500.txt to compare the code against:
(there is 1 female speaker, and 1 male speaker.
grep_speakerID_06500

here is my speaker count abstraction (I’m sure there’s an easier way to do this):

# --- babyHW1.py --- #

# --- instantiate --- #
a = 13 	#w
b = 0	#w
c = 2	#m
d = 9	#m
e = 0	#n

w = 0
m = 0
n = 0

# --- speaker count --- #
if a > 0:
	w += 1
else:
	w = 0

if b > 0:
	w += 1
else:
	w = 0

if c > 0:
	m += 1
else:
	m = 0

if d > 0:
	m += 1
else:
	m = 0

if e > 0:
	n += 1
else:
	n = 0

# --- output --- #
print("w:", w)
print("m:", m)
print("n:", n)

A and B loops

import os

countX = 0
countA = 0
countB = 0

dirX = "Fisher"
listA = os.listdir(dirX)
countX += 1

for itemB in listA:
    pathC = dirX + "/" + itemB
    listD = os.listdir(pathC)
    countA += 1
    
    for itemE in listD:
        pathF = pathC + "/" + itemE
        itemG = open(pathF)
        countB += 1

        for fileH in itemG:
            outputI = len(fileH)


print()
print(dirX)
print(itemB)
print(itemE)
print(itemG)
print(outputI)

print()
print(" lines/items in --")
print("   --       Fisher =", countX)
print("   --    Fisher/.. =", countA)
print("   -- Fisher/../.. =", countB)
print()

Python IDs



[pdf]

module.fuction("argument")

here,
module = os
funtion = listdir
argument = path

so…

import os
os.listdir("Fisher")   
# This is a relative path (relative to where I am)\
  which director I start in