over the weekend coding music… well, I guess this is debugging music, really.
some more coding music:
Joe Acheson (Hidden Orchestra) – Archipelago Mixtape by Tru Thoughts on Mixcloud
proper link to the python 3.3 tutorial
A bit of music to code by:
DJ DARKHORSE : MIXTAPE N° 188 by The Voice Of Cassandre on Mixcloud
finishing up Alice and word count (exercise 7):
from the python tutorial: tutorial
all of these expressions build the same dictionary, so the final comparison is True:
a = dict(one=1, two=2, three=3)
b = {'one': 1, 'two': 2, 'three': 3}
c = dict(zip(['one', 'two', 'three'], [1, 2, 3]))
d = dict([('two', 2), ('one', 1), ('three', 3)])
e = dict({'three': 3, 'one': 1, 'two': 2})
a == b == c == d == e
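Since exercise 7 is a word count, here is a minimal sketch of how a dictionary can hold the counts. The sample text and the name `count_words` are made up for illustration; the real exercise reads its text from a file.

```python
def count_words(text):
    """Return a dict mapping each lowercase word to how often it occurs."""
    counts = {}
    for word in text.lower().split():
        # dict.get returns 0 the first time a word is seen
        counts[word] = counts.get(word, 0) + 1
    return counts


sample = "the cat and the hat and the bat"  # made-up sample text
counts = count_words(sample)
print(counts["the"], counts["and"], counts["cat"])
```

This only splits on whitespace, so punctuation stays attached to the words; that is probably something to clean up for a real text.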
from the tutorial:
tasks in the homework:
We want to address the question of whether women talk more than men. To answer this, we will use the Fisher corpus on Carmen.
Write a python script that outputs —
- the raw total number of words spoken by women
- the raw total number of words spoken by men
- the total number of utterances spoken by women
- the total number of utterances spoken by men
- the average number of words per utterance spoken by women and by men
- the number of female speakers
- the number of male speakers
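A rough sketch of how the whole list could come out of one pass over the lines. It assumes each line has two leading fields followed by a speaker tag like `A-f:` (which is what the indexing in my scripts relies on); the sample lines and the name `tally` are made up, and it ignores the question of speakers repeating across files.

```python
def tally(lines):
    """Tally words, utterances and speaker letters per gender letter."""
    stats = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        tag = fields[2]   # assumed to look like 'A-f:'
        gender = tag[2]   # 'f', 'm', or something else
        entry = stats.setdefault(
            gender, {"words": 0, "utterances": 0, "speakers": set()}
        )
        entry["words"] += len(fields[3:])
        entry["utterances"] += 1
        entry["speakers"].add(tag[0])
    return stats


# made-up lines in the assumed layout
sample_lines = [
    "0.00 1.20 A-f: hello there",
    "1.20 2.50 B-m: hi",
    "2.50 4.00 A-f: how are you today",
]
stats = tally(sample_lines)
for gender, entry in sorted(stats.items()):
    average = entry["words"] / entry["utterances"]
    print(gender, entry["words"], entry["utterances"], average, len(entry["speakers"]))
```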
the previous assignment plus opening the Fisher files with nested ‘for’ loops gives:
which is the output of this loop (so far) written in class:
# ------ importing modules ----- #
import os

# ----- initialization of tracking variables ----- #
totalWordsSpoken = 0
totalUtterances = 0
wordsW = 0
wordsM = 0
wordsN = 0
utterW = 0
utterM = 0
utterN = 0

# ----- listing top-level directory ----- #
topDir = "Fisher"  # renamed from dir, which shadows a Python built-in
dirA = os.listdir(topDir)

# ----- listing subdirectories ----- #
for dirB in dirA:
    dirC = topDir + "/" + dirB
    fileA = os.listdir(dirC)

    # ----- opening files ----- #
    for fileB in fileA:
        path = dirC + "/" + fileB
        fileC = open(path)

        # ----- for loop ----- #
        for sentence in fileC:

            # ----- processing block ----- #
            words = sentence.split()
            onlywords = words[3:]
            genderLetter = words[2][2]
            speakerID = words[2][0]
            numberwords = len(onlywords)
            onlysen = " ".join(onlywords)

            # ----- count ----- #
            totalWordsSpoken += numberwords
            totalUtterances += 1

            # ----- gender output ----- #
            if genderLetter == 'f':
                gender = "woman"
                wordsW += numberwords
                utterW += 1
            elif genderLetter == 'm':
                gender = "man"
                wordsM += numberwords
                utterM += 1
            else:
                gender = "non-gendered person"
                wordsN += numberwords  # was "=", which overwrote the running total
                utterN += 1

            # ----- output per sentence ----- #
            print()
            print(" ", "sentence number", totalUtterances, ":", onlysen)
            print(" ", "number of words:", numberwords)
            print(" ", "words are:", onlywords)
            print(" ", "speaker ID:", speakerID)
            print(" ", "speaker is a:", gender)

        fileC.close()

# ----- final totals output ----- #
print()
print(" ", "total number of words:", totalWordsSpoken)
print(" ", "total number of utterances:", totalUtterances)
print(" ", "the average number of words per utterance was :", totalWordsSpoken / totalUtterances)
print()
print(" ", "total words spoken by women:", wordsW)
print(" ", "total number of utterances:", utterW)
print(" ", "the average number of words per utterance was :", wordsW / utterW)
print()
print(" ", "total words spoken by men:", wordsM)
print(" ", "total number of utterances:", utterM)
print(" ", "the average number of words per utterance was :", wordsM / utterM)
print()
print(" ", "total words unaccounted for by gender:", wordsN)
print(" ", "total number of utterances unaccounted for:", utterN)
print()
need to add some stuff in to count the number of speakers of each gender.
having a bit of trouble with the if/else statements here and the nesting. I’m going to write the code for only one file, extend it to two, and then run it on the entire corpus once I can differentiate between the ‘A-f:’ in one file and the ‘A-f:’ in the next file in the list. I’m also assuming that no speaker ever appears in more than one file.
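One way to keep the ‘A-f:’ of one file apart from the ‘A-f:’ of the next might be to remember each speaker together with the file name, in a set. This is only a sketch under the same assumption (no speaker appears in two files); the file names, the sample lines and the function name `count_speakers` are made up, and the files are stood in for by lists of strings.

```python
def count_speakers(files):
    """files maps a file name to its list of lines; returns speakers per gender letter."""
    seen = set()
    for name, lines in files.items():
        for line in lines:
            fields = line.split()
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            tag = fields[2]  # assumed to look like 'A-f:'
            # the file name makes 'A' in one file differ from 'A' in another
            seen.add((name, tag[0], tag[2]))

    per_gender = {}
    for _name, _speaker, gender in seen:
        per_gender[gender] = per_gender.get(gender, 0) + 1
    return per_gender


# made-up stand-ins for two corpus files
files = {
    "file_one.txt": ["0.0 1.0 A-f: hi", "1.0 2.0 B-m: hello", "2.0 3.0 A-f: bye"],
    "file_two.txt": ["0.0 1.0 A-f: hey", "1.0 2.0 B-f: yes"],
}
speakers = count_speakers(files)
print(speakers)
```

Because a set ignores repeats, a speaker who talks many times in the same file is still counted once.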
Here are the counts for fe_03_06500.txt to compare the code against:
(there is 1 female speaker and 1 male speaker.)
here is my speaker count abstraction (I’m sure there’s an easier way to do this):
# --- babyHW1.py --- #

# --- instantiate --- #
a = 13  # w
b = 0   # w
c = 2   # m
d = 9   # m
e = 0   # n
w = 0
m = 0
n = 0

# --- speaker count --- #
# a speaker is counted only if they said at least one word;
# there is no else branch, because resetting to 0 would erase earlier speakers
if a > 0:
    w += 1
if b > 0:
    w += 1
if c > 0:
    m += 1
if d > 0:
    m += 1
if e > 0:
    n += 1

# --- output --- #
print("w:", w)
print("m:", m)
print("n:", n)
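A possibly easier way, sketched with the same made-up word counts: put the counts for each gender in a list and count how many are above zero. The list names and `speakers_with_words` are my own, not from the class notes.

```python
def speakers_with_words(counts):
    """Count how many entries are greater than zero."""
    return sum(1 for number in counts if number > 0)


women_counts = [13, 0]  # same values as a and b
men_counts = [2, 9]     # same values as c and d
other_counts = [0]      # same value as e

w = speakers_with_words(women_counts)
m = speakers_with_words(men_counts)
n = speakers_with_words(other_counts)

print("w:", w)
print("m:", m)
print("n:", n)
```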
import os

countX = 0
countA = 0
countB = 0

dirX = "Fisher"
listA = os.listdir(dirX)
countX += 1

for itemB in listA:
    pathC = dirX + "/" + itemB
    listD = os.listdir(pathC)
    countA += 1

    for itemE in listD:
        pathF = pathC + "/" + itemE
        itemG = open(pathF)
        countB += 1

        for fileH in itemG:
            outputI = len(fileH)

        print()
        print(dirX)
        print(itemB)
        print(itemE)
        print(itemG)
        print(outputI)

print()
print(" lines/items in --")
print(" -- Fisher =", countX)
print(" -- Fisher/.. =", countA)
print(" -- Fisher/../.. =", countB)
print()
module.function("argument")
here,
module = os
function = listdir
argument = path
so…
import os
os.listdir("Fisher")  # "Fisher" is a relative path: relative to where I am, i.e. the directory I start in
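To see what `os.listdir` gives back without needing the corpus, here is a small sketch that builds a throwaway directory first. The folder name "065" and the file name "example.txt" are placeholders I made up; `os.path.join` is used instead of gluing the pieces together with "/".

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    # build a tiny stand-in for the corpus layout: top/065/example.txt
    os.makedirs(os.path.join(top, "065"))
    with open(os.path.join(top, "065", "example.txt"), "w") as handle:
        handle.write("0.0 1.0 A-f: hi\n")

    found = []
    for sub in sorted(os.listdir(top)):          # names of the subdirectories
        sub_path = os.path.join(top, sub)
        for name in sorted(os.listdir(sub_path)):  # names of the files inside
            found.append(os.path.join(sub, name))

print(found)
```

Note that `os.listdir` returns bare names, not full paths, which is why each name has to be joined back onto its parent directory before opening it.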
In class, we have seen how to read from a file. Here is what the code looks like so far:
#------------initialization of tracking variables-------------------
totalWordsSpoken = 0
totalUtterances = 0

#------------open the file-------------------------
fisherFile = open("Fisher/065/fe_03_06500.txt")

#-----------processing block---------------------
for line in fisherFile:
    #list of the items in the line
    words = line.split()
    print("Here are all the words", words)
    #extracting speaker ID
    speaker = words[2]
    #actual words uttered by the speaker
    actualWords = words[3:]
    print("the sentence is spoken by", speaker)
    print("their actual utterance was", actualWords)
    print("the sentence has", len(actualWords), "words")
    totalWordsSpoken += len(actualWords)
    totalUtterances += 1

#---------done with all the sentences; post-analysis----------
print("the total number of words spoken was", totalWordsSpoken)
print("the total number of utterances was", totalUtterances)
print("the average number of words per utterance was", totalWordsSpoken / totalUtterances)
Now we want to keep track of the gender information too. We want the total number of words and the total number of utterances spoken by women, as well as the same two totals for men. Look at the notes and adapt your code to use an “if statement” to do so. Make sure your code runs ;-) The notes give the results.
wrote this code in class (see Python scripting)
# ----- initialization of tracking variables ----- #
totalWordsSpoken = 0
totalUtterances = 0
wordsW = 0
wordsM = 0
wordsN = 0
utterW = 0
utterM = 0
utterN = 0

# ----- opening file ----- #
fF = open("../Downloads/Fisher/065/fe_03_06500.txt")

# ----- for loop ----- #
for sentence in fF:

    # ----- processing block ----- #
    words = sentence.split()
    onlywords = words[3:]
    genderLetter = words[2][2]
    speakerID = words[2][0]
    numberwords = len(onlywords)
    onlysen = " ".join(onlywords)

    # ----- count ----- #
    totalWordsSpoken += numberwords
    totalUtterances += 1

    # ----- gender output ----- #
    if genderLetter == 'f':
        gender = "woman"
        wordsW += numberwords
        utterW += 1
    elif genderLetter == 'm':
        gender = "man"
        wordsM += numberwords
        utterM += 1
    else:
        gender = "non-gendered person"
        wordsN += numberwords  # was "=", which overwrote the running total
        utterN += 1

    # ----- output per sentence ----- #
    print()
    print(" ", "sentence number", totalUtterances, ":", onlysen)
    print(" ", "number of words:", numberwords)
    print(" ", "words are:", onlywords)
    print(" ", "speaker ID:", speakerID)
    print(" ", "speaker is a:", gender)

fF.close()

# ----- final totals output ----- #
print()
print(" ", "total number of words:", totalWordsSpoken)
print(" ", "total number of utterances:", totalUtterances)
print(" ", "the average number of words per utterance was :", totalWordsSpoken / totalUtterances)
print()
print(" ", "total words spoken by women:", wordsW)
print(" ", "total number of utterances:", utterW)
print(" ", "the average number of words per utterance was :", wordsW / utterW)
print()
print(" ", "total words spoken by men:", wordsM)
print(" ", "total number of utterances:", utterM)
print(" ", "the average number of words per utterance was :", wordsM / utterM)
print()
print(" ", "total words unaccounted for by gender:", wordsN)
print(" ", "total number of utterances unaccounted for:", utterN)
print()
which returns this output on the macs at school (haven’t tried it at home yet)
>>> fisherFile = open("Fisher/065/fe_03_06500.txt")
Creates a file object
“tail” in Unix prints the last ten lines of a file by default
(use as such):
tail "Fisher/065/fe_03_06500.txt"
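Something similar can be done from inside Python. This is a sketch, not how `tail` itself is implemented: a `collections.deque` with `maxlen` keeps only the most recent items. The in-memory `fake_file` stands in for a real file object, and the function name `tail` is my own.

```python
import io
from collections import deque


def tail(lines, n=10):
    """Return the last n lines from any iterable of lines."""
    # a deque with maxlen drops the oldest item whenever it is full
    return list(deque(lines, maxlen=n))


# made-up stand-in for an open file with 25 lines
fake_file = io.StringIO("".join("line {}\n".format(i) for i in range(1, 26)))
last = tail(fake_file)

for line in last:
    print(line, end="")
```

The same function should work on a real file object from `open(...)`, since looping over a file also yields one line at a time.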
this code to count the number of words by gender:
# ----- initialization of tracking variables ----- #
totalWordsSpoken = 0
totalUtterances = 0
wordsW = 0
wordsM = 0
wordsN = 0
utterW = 0
utterM = 0
utterN = 0

# ----- opening file ----- #
fF = open("../Downloads/Fisher/065/fe_03_06500.txt")

# ----- for loop ----- #
for sentence in fF:

    # ----- processing block ----- #
    words = sentence.split()
    onlywords = words[3:]
    genderLetter = words[2][2]
    speakerID = words[2][0]
    numberwords = len(onlywords)
    onlysen = " ".join(onlywords)

    # ----- count ----- #
    totalWordsSpoken += numberwords
    totalUtterances += 1

    # ----- gender output ----- #
    if genderLetter == 'f':
        gender = "woman"
        wordsW += numberwords
        utterW += 1
    elif genderLetter == 'm':
        gender = "man"
        wordsM += numberwords
        utterM += 1
    else:
        gender = "non-gendered person"
        wordsN += numberwords  # was "=", which overwrote the running total
        utterN += 1

    # ----- output per sentence ----- #
    print()
    print(" ", "sentence number", totalUtterances, ":", onlysen)
    print(" ", "number of words:", numberwords)
    print(" ", "words are:", onlywords)
    print(" ", "speaker ID:", speakerID)
    print(" ", "speaker is a:", gender)

fF.close()

# ----- final totals output ----- #
print()
print(" ", "total number of words:", totalWordsSpoken)
print(" ", "total number of utterances:", totalUtterances)
print(" ", "the average number of words per utterance was :", totalWordsSpoken / totalUtterances)
print()
print(" ", "total words spoken by women:", wordsW)
print(" ", "total number of utterances:", utterW)
print(" ", "the average number of words per utterance was :", wordsW / utterW)
print()
print(" ", "total words spoken by men:", wordsM)
print(" ", "total number of utterances:", utterM)
print(" ", "the average number of words per utterance was :", wordsM / utterM)
print()
print(" ", "total words unaccounted for by gender:", wordsN)
print(" ", "total number of utterances unaccounted for:", utterN)
print()
how do we check that our output is correct? Count them manually?
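One idea short of counting a whole file by hand: run the same counting logic on a few lines small enough to count manually, and also check that the per-gender totals add up to the overall total. A sketch, with made-up sample lines in the layout my scripts assume (two leading fields, then a tag like `A-f:`); `count_by_gender` is a hypothetical helper, not the class code.

```python
def count_by_gender(lines):
    """Return {key: [words, utterances]} for 'f', 'm' and 'other'."""
    totals = {"f": [0, 0], "m": [0, 0], "other": [0, 0]}
    for line in lines:
        fields = line.split()
        letter = fields[2][2]
        key = letter if letter in ("f", "m") else "other"
        totals[key][0] += len(fields[3:])
        totals[key][1] += 1
    return totals


# four made-up lines, small enough to count by hand
sample = [
    "0.0 1.0 A-f: I'm in graduate school",
    "1.0 2.0 B-m: at OSU?",
    "2.0 3.0 A-f: yeah",
    "3.0 4.0 C-?: me want cookies",
]
totals = count_by_gender(sample)

# overall totals computed a second, independent way
all_words = sum(len(line.split()[3:]) for line in sample)
words_add_up = sum(pair[0] for pair in totals.values()) == all_words
utterances_add_up = sum(pair[1] for pair in totals.values()) == len(sample)

print(totals)
print("words add up:", words_add_up)
print("utterances add up:", utterances_add_up)
```

The add-up check would not catch every mistake (for example, swapping the women's and men's counters), so the hand-counted sample is still worth keeping.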
In class, we have started writing a script that processes two hard-coded sentences and keeps a running total of the number of words actually uttered by the speakers. We did this in a dumb way: copy-pasting the processing block. Now simplify the script using a “for loop”. The idea should be to replace the two copies of the processing block with a single copy inside a loop. To do this, you’ll need to create a list containing the two sentences, write a “for loop” operating over this list, and put the processing block inside that loop. You can of course have more than 2 sentences in the list ;-)
Here is what the code looks like so far (also posted on the website):
# initialization
# keep track of total of words
totalWords = 0

# processing of sentence 1
sentence = "B-f: I'm in graduate school"
# get the words of the sentence
words = sentence.split()
print("Words of sentence 1:", words)
# extract speaker ID
speaker = words[0]
# extract words uttered (everything except first element in the list)
actualWords = words[1:]
# number of words uttered
numberAWords = len(actualWords)
# increment the total
totalWords = totalWords + numberAWords
print("speaker is: ", speaker)
print("words are", actualWords)
print("number of words:", numberAWords)
print("total so far:", totalWords)

# processing of sentence 2
sentence = "A-f: at OSU?"
# get the words of the sentence
words = sentence.split()
print("Words of sentence 2:", words)
# extract speaker ID
speaker = words[0]
# extract words uttered (everything except first element in the list)
actualWords = words[1:]
# number of words uttered
numberAWords = len(actualWords)
# increment the total
totalWords = totalWords + numberAWords
print("speaker is: ", speaker)
print("words are", actualWords)
print("number of words:", numberAWords)
print("total so far:", totalWords)

# post-analysis
print("total words uttered:", totalWords)
my code:
# -- initialization of tracking variables
totalWordsSpoken = 0
totalUtterances = 0

# ----- variables in list ----- #
s1 = "B-f: I'm in graduate school"
s2 = "A-m: at OSU?"
s3 = "B-f: um, yeah. Here at OSU"
s4 = "A-m: cool, me too."
s5 = "C-?: me want cookies!! nom nom nom"
senList = [s1, s2, s3, s4, s5]

# ----- for loop ----- #
for sentence in senList:

    # ----- processing block ----- #
    words = sentence.split()
    onlywords = words[1:]
    genderLetter = words[0][2]
    speakerID = words[0][0]
    numberwords = len(onlywords)

    # ----- count ----- #
    totalWordsSpoken += len(onlywords)
    totalUtterances += 1

    # ----- gender output ----- #
    if genderLetter == 'f':
        gender = "woman"
    elif genderLetter == 'm':
        gender = "man"
    else:
        gender = "non-gendered person"

    # ----- output ----- #
    print(" ")
    print(" ", "sentence number", totalUtterances, ":", sentence[5:])
    print(" ", "number of words:", numberwords)
    print(" ", "words are:", onlywords)
    print(" ", "speaker ID:", speakerID)
    print(" ", "speaker is a:", gender)

# ----- final output : counts outside the loop
# how does it know to exit? no indentation?
print(" ")
print(" ", "total number of words:", totalWordsSpoken)
print(" ", "total number of utterances:", totalUtterances)
print(" ", "the average number of words per utterance was :", totalWordsSpoken / totalUtterances)
print(" ")
A problem that I ran into was trying to make the elements in the list strings