Elliott | Technical Tools

HW 2

May 16, 2014 at 1:19pm (17 May 2014) by Elliott

over the weekend coding music… well, I guess this is debugging music, really.

some more coding music:

Joe Acheson (Hidden Orchestra) – Archipelago Mixtape by Tru Thoughts on Mixcloud

Sorting and Zipf frequency

May 15, 2014 at 12:11pm (15 May 2014) by Elliott

proper link to the python 3.3 tutorial

A bit of music to code by:

DJ DARKHORSE : MIXTAPE N° 188 by The Voice Of Cassandre on Mixcloud

some screenshots I’d like to save:

Dictionaries and keys

May 14, 2014 at 11:13am (14 May 2014) by Elliott

finishing up Alice and word count (exercise 7):

from the python tutorial: tutorial

all of these expressions set up the same dictionary, so the comparison on the last line is True

a = dict(one=1, two=2, three=3)
b = {'one': 1, 'two': 2, 'three': 3}
c = dict(zip(['one', 'two', 'three'], [1, 2, 3]))
d = dict([('two', 2), ('one', 1), ('three', 3)])
e = dict({'three': 3, 'one': 1, 'two': 2})
a == b == c == d == e

creating our dict:

adding to it:
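
A minimal sketch of both steps, assuming the goal is a word count like the Alice exercise. The sample text and the names `sample` and `counts` are made up for illustration; this is not the actual exercise code.

```python
# Hypothetical sample text standing in for the Alice file.
sample = "the cat saw the dog and the dog saw the cat"

# creating the dict: start with an empty one
counts = {}

# adding to it: one key per distinct word, value = times seen so far
for word in sample.split():
    if word in counts:
        counts[word] += 1
    else:
        counts[word] = 1

print(counts["the"])
print(sorted(counts))
```

Checking `word in counts` first avoids a KeyError the first time a word shows up.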

from the tutorial:

Homework 1

May 9, 2014 at 11:57pm (10 May 2014) by Elliott

Homework 1

tasks in the homework:

We want to address the question of whether women talk more than men. To answer this, we will use the Fisher corpus on Carmen.
Write a python script that outputs —

the raw total number of words spoken by women

the raw total number of words spoken by men

the total number of utterances spoken by women

the total number of utterances spoken by men

the average number of words per utterance spoken by women and by men

the number of female speakers

the number of male speakers

the previous assignment plus opening the Fisher files with nested ‘for’ loops gives:

which is the output of this loop (so far) written in class:

# ----- importing the os module (for listing directories) ----- #
import os

# ----- initialization of tracking variables ----- #
totalWordsSpoken = 0
totalUtterances = 0

wordsW = 0
wordsM = 0
wordsN = 0

utterW = 0
utterM = 0
utterN = 0

# ----- listing top-level directory ----- #
dir = "Fisher"
dirA = os.listdir(dir)

# ----- listing subdirectories ----- #
for dirB in dirA:
    dirC = dir + "/" + dirB
    fileA = os.listdir(dirC)
    
    # ----- opening files ----- #
    for fileB in fileA :
        path = dirC + "/" + fileB
        fileC = open(path)

        # -----	for loop ----- #
        for sentence in fileC:
            
            # -----	processing block ----- #
            words = sentence.split()
            onlywords = words[3:]
            genderLetter = words[2][2]
            speakerID = words[2][0]
            numberwords = len(onlywords)
            onlysen = " ".join(onlywords)
            
            # -----	count ----- #
            totalWordsSpoken += numberwords
            totalUtterances += 1
            
            # -----	gender output ----- #
            if genderLetter == 'f':
                gender = "woman"
                wordsW += numberwords
                utterW += 1
            elif genderLetter == 'm':
                gender = "man"
                wordsM += numberwords
                utterM += 1
            else:
                gender = "non-gendered person"
                wordsN += numberwords
                utterN += 1

            # -----	output per sentence----- #
            print()
            print("  ", "sentence number", totalUtterances, ":", onlysen)
            print("  ", "number of words:", numberwords)
            print("  ", "words are:", onlywords)
            print("  ", "speaker ID:", speakerID)
            print("  ", "speaker is a:", gender)

# -----	final totals output ----- #
print()
print("  ", "total number of words:", totalWordsSpoken)
print("  ", "total number of utterances:", totalUtterances)
print("  ", "the average number of words per utterance was :",
      totalWordsSpoken / totalUtterances)
print()
print("  ", "total words spoken by women:", wordsW)
print("  ", "total number of utterances:", utterW)
print("  ", "the average number of words per utterance was :",
      wordsW / utterW)
print()
print("  ", "total words spoken by men:", wordsM)
print("  ", "total number of utterances:", utterM)
print("  ", "the average number of words per utterance was :",
      wordsM / utterM)
print()
print("  ", "total words unaccounted for by gender:", wordsN)
print("  ", "total number of utterances unaccounted for:", utterN)
print()

need to add some stuff in to count the number of speakers of each gender.

having a bit of trouble with the if/else statements here and the nesting. I’m going to build a code for only one file, extend it to two, and then run it on the entire corpus once I can differentiate between the ‘A-f:’ in one file, and the ‘A-f:’ in the next file in the list. I’m also assuming that there are never any repeated speakers across files.
Here are the counts for fe_03_06500.txt to compare the code against:
(there is 1 female speaker and 1 male speaker.)
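
One possible way to tell the ‘A-f:’ in one file from the ‘A-f:’ in the next, sketched under the same assumption that speakers never repeat across files: store (file name, speaker label) pairs in a set. The file names and lines below are invented stand-ins for the corpus, and `fake_corpus` and `speakers` are hypothetical names.

```python
# Invented stand-in for the corpus: file name -> lines in the layout the
# script expects (two leading fields, a speaker label like "A-f:", the words).
fake_corpus = {
    "file_one.txt": [
        "0.00 1.50 A-f: hello there",
        "1.50 2.75 B-m: hi",
        "2.75 4.00 A-f: how are you",
    ],
    "file_two.txt": [
        "0.00 1.00 A-f: good morning",
        "1.00 2.00 B-f: morning",
    ],
}

# A set keeps each (file, speaker) pair only once, however often it appears.
speakers = set()

for fileName, lines in fake_corpus.items():
    for sentence in lines:
        label = sentence.split()[2]        # e.g. "A-f:"
        speakers.add((fileName, label))

# Count the distinct pairs by the gender letter inside the label.
speakersW = sum(1 for _, label in speakers if label[2] == 'f')
speakersM = sum(1 for _, label in speakers if label[2] == 'm')

print("female speakers:", speakersW)
print("male speakers:", speakersM)
```

Because the file name is part of the pair, ‘A-f:’ in the first file and ‘A-f:’ in the second count as two different women.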

here is my speaker count abstraction (I’m sure there’s an easier way to do this):

# --- babyHW1.py --- #

# --- instantiate --- #
a = 13 	#w
b = 0	#w
c = 2	#m
d = 9	#m
e = 0	#n

w = 0
m = 0
n = 0

# --- speaker count --- #
# a speaker counts only if they said at least one word;
# there is no else branch, so an earlier count is never reset to 0
if a > 0:
	w += 1

if b > 0:
	w += 1

if c > 0:
	m += 1

if d > 0:
	m += 1

if e > 0:
	n += 1

# --- output --- #
print("w:", w)
print("m:", m)
print("n:", n)
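
One possibly easier way, as a sketch: put the per-speaker word counts in lists and count the entries greater than zero. The list names and the helper `speakerCount` are hypothetical; the values are the same ones used in babyHW1.py.

```python
# Same made-up word counts as in babyHW1.py, grouped by gender.
countsW = [13, 0]   # a, b
countsM = [2, 9]    # c, d
countsN = [0]       # e

def speakerCount(counts):
    """Number of entries that spoke at least one word."""
    return sum(1 for c in counts if c > 0)

w = speakerCount(countsW)
m = speakerCount(countsM)
n = speakerCount(countsN)

print("w:", w)
print("m:", m)
print("n:", n)
```

Adding another speaker then means adding a number to a list rather than writing another if block.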

A and B loops

May 9, 2014 at 2:09pm (9 May 2014) by Elliott

import os

countX = 0
countA = 0
countB = 0

dirX = "Fisher"
listA = os.listdir(dirX)
countX += 1

for itemB in listA:
    pathC = dirX + "/" + itemB
    listD = os.listdir(pathC)
    countA += 1
    
    for itemE in listD:
        pathF = pathC + "/" + itemE
        itemG = open(pathF)
        countB += 1

        for fileH in itemG:
            outputI = len(fileH)


print()
print(dirX)
print(itemB)
print(itemE)
print(itemG)
print(outputI)

print()
print(" lines/items in --")
print("   --       Fisher =", countX)
print("   --    Fisher/.. =", countA)
print("   -- Fisher/../.. =", countB)
print()

Python IDs

May 9, 2014 at 10:03am (9 May 2014) by Elliott

[pdf]

module.function("argument")

here,
module = os
function = listdir
argument = path

so…

import os
os.listdir("Fisher")   
# This is a relative path: it is resolved relative to
# the directory I start in (the current working directory)
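
A small sketch of what "relative" means here, using only standard `os` and `tempfile` calls. The directory it lists is a throwaway one created for the example, not the real Fisher folder.

```python
import os
import tempfile

# Make a throwaway directory tree so the example runs anywhere.
start = tempfile.mkdtemp()
os.makedirs(os.path.join(start, "Fisher", "065"))

# A relative path is resolved against the current working directory.
previous = os.getcwd()
os.chdir(start)
try:
    listing = os.listdir("Fisher")      # relative path
    full = os.path.abspath("Fisher")    # the same place, as an absolute path
finally:
    os.chdir(previous)                  # go back to where we started

print(listing)
print(full)
```

Running the same `os.listdir("Fisher")` from a different starting directory would look for a different Fisher folder, or fail if none exists there.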

Exercise 4

May 8, 2014 at 11:55pm (9 May 2014) by Elliott

Exercise 4

In class, we have seen how to read from a file. Here is what the code looks like so far:

#------------initialization of tracking variables-------------------
totalWordsSpoken = 0
totalUtterances = 0

#------------open the file-------------------------

fisherFile = open("Fisher/065/fe_03_06500.txt")

#-----------processing block---------------------

for line in fisherFile:
        #list of the items in the line
        words = line.split()

        print("Here are all the words", words)

        #extracting speaker ID
        speaker = words[2]

        #actual words uttered by the speaker
        actualWords = words[3:]

        print("the sentence is spoken by", speaker)
        print("their actual utterance was", actualWords)
        print("the sentence has", len(actualWords), "words")

        totalWordsSpoken += len(actualWords)
        totalUtterances += 1

#---------done with all the sentences; post-analysis----------

print("the total number of words spoken was", totalWordsSpoken)
print("the total number of utterances was", totalUtterances)
print("the average number of words per utterance was",
      totalWordsSpoken / totalUtterances)

Now we want to keep track of the gender information too. We want the total of words and the total of utterances uttered by women as well as the total of words and the total of utterances uttered by men. Look at the notes and adapt your code to use an “if statement” to do so. Make sure your code runs ;-) The notes give the results.

wrote this code in class (see Python scripting)

# ----- initialization of tracking variables ----- #
totalWordsSpoken = 0
totalUtterances = 0

wordsW = 0
wordsM = 0
wordsN = 0

utterW = 0
utterM = 0
utterN = 0

# ----- opening file ----- #

fF = open("../Downloads/Fisher/065/fe_03_06500.txt")


# -----	for loop ----- #
for sentence in fF:
	
    # -----	processing block ----- #
    words = sentence.split()
    onlywords = words[3:]
    genderLetter = words[2][2]
    speakerID = words[2][0]
    numberwords = len(onlywords)
    onlysen = " ".join(onlywords)
	
    # -----	count ----- #
    totalWordsSpoken += numberwords
    totalUtterances += 1

    # -----	gender output ----- #
    if genderLetter == 'f':
        gender = "woman"
        wordsW += numberwords
        utterW += 1
    elif genderLetter == 'm':
        gender = "man"
        wordsM += numberwords
        utterM += 1
    else:
        gender = "non-gendered person"
        wordsN += numberwords
        utterN += 1

    # -----	output per sentence----- #
    print()
    print("  ", "sentence number", totalUtterances, ":", onlysen)
    print("  ", "number of words:", numberwords)
    print("  ", "words are:", onlywords)
    print("  ", "speaker ID:", speakerID)
    print("  ", "speaker is a:", gender)

# -----	final totals output ----- #

print()
print("  ", "total number of words:", totalWordsSpoken)
print("  ", "total number of utterances:", totalUtterances)
print("  ", "the average number of words per utterance was :",
      totalWordsSpoken / totalUtterances)
print()
print("  ", "total words spoken by women:", wordsW)
print("  ", "total number of utterances:", utterW)
print("  ", "the average number of words per utterance was :",
      wordsW / utterW)
print()
print("  ", "total words spoken by men:", wordsM)
print("  ", "total number of utterances:", utterM)
print("  ", "the average number of words per utterance was :",
      wordsM / utterM)
print()
print("  ", "total words unaccounted for by gender:", wordsN)
print("  ", "total number of utterances unaccounted for:", utterN)
print()

which returns this output on the macs at school (haven’t tried it at home yet)

Python scripting

May 8, 2014 at 10:25am (8 May 2014) by Elliott

open interactive Python

>>> fisherFile = open("Fisher/065/fe_03_06500.txt")

Creates a file object

“tail” in Unix prints the last ten lines of a file by default
(use it like this):

tail "Fisher/065/fe_03_06500.txt"
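
A rough Python counterpart, as a sketch: `collections.deque` with `maxlen=10` keeps only the last ten lines it is fed. The file here is a temporary one written for the example, and `lastLines` is a hypothetical helper name.

```python
import collections
import os
import tempfile

def lastLines(path, n=10):
    """Return the last n lines of the file at path, like Unix tail."""
    with open(path) as f:
        # A deque with maxlen drops the oldest item as each new one arrives.
        return list(collections.deque(f, maxlen=n))

# Write a 25-line temporary file to try it on.
handle, tmpPath = tempfile.mkstemp(suffix=".txt")
with os.fdopen(handle, "w") as f:
    for i in range(1, 26):
        f.write("line " + str(i) + "\n")

tailLines = lastLines(tmpPath)
os.remove(tmpPath)

print(len(tailLines))
print(tailLines[0].strip())
```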

this code counts the number of words by gender:

# ----- initialization of tracking variables ----- #
totalWordsSpoken = 0
totalUtterances = 0

wordsW = 0
wordsM = 0
wordsN = 0

utterW = 0
utterM = 0
utterN = 0

# ----- opening file ----- #

fF = open("../Downloads/Fisher/065/fe_03_06500.txt")


# -----	for loop ----- #
for sentence in fF:
	
    # -----	processing block ----- #
    words = sentence.split()
    onlywords = words[3:]
    genderLetter = words[2][2]
    speakerID = words[2][0]
    numberwords = len(onlywords)
    onlysen = " ".join(onlywords)
	
    # -----	count ----- #
    totalWordsSpoken += numberwords
    totalUtterances += 1

    # -----	gender output ----- #
    if genderLetter == 'f':
        gender = "woman"
        wordsW += numberwords
        utterW += 1
    elif genderLetter == 'm':
        gender = "man"
        wordsM += numberwords
        utterM += 1
    else:
        gender = "non-gendered person"
        wordsN += numberwords
        utterN += 1

    # -----	output per sentence----- #
    print()
    print("  ", "sentence number", totalUtterances, ":", onlysen)
    print("  ", "number of words:", numberwords)
    print("  ", "words are:", onlywords)
    print("  ", "speaker ID:", speakerID)
    print("  ", "speaker is a:", gender)

# -----	final totals output ----- #

print()
print("  ", "total number of words:", totalWordsSpoken)
print("  ", "total number of utterances:", totalUtterances)
print("  ", "the average number of words per utterance was :",
      totalWordsSpoken / totalUtterances)
print()
print("  ", "total words spoken by women:", wordsW)
print("  ", "total number of utterances:", utterW)
print("  ", "the average number of words per utterance was :",
      wordsW / utterW)
print()
print("  ", "total words spoken by men:", wordsM)
print("  ", "total number of utterances:", utterM)
print("  ", "the average number of words per utterance was :",
      wordsM / utterM)
print()
print("  ", "total words unaccounted for by gender:", wordsN)
print("  ", "total number of utterances unaccounted for:", utterN)
print()

returns this result:

how do we check that our output is correct? Count them manually?
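
One way to check without counting the whole file by hand, sketched here: run the same counting logic on a few invented lines whose totals are easy to work out, and compare. `countByGender` is a hypothetical wrapper around the processing block above, and the sample lines are made up.

```python
def countByGender(lines):
    """Return {gender letter: [words, utterances]} for Fisher-style lines."""
    totals = {}
    for sentence in lines:
        words = sentence.split()
        genderLetter = words[2][2]          # "A-f:" -> "f"
        numberwords = len(words[3:])
        entry = totals.setdefault(genderLetter, [0, 0])
        entry[0] += numberwords
        entry[1] += 1
    return totals

# Made-up lines, small enough to count by hand:
# women: 2 + 3 = 5 words in 2 utterances; men: 1 word in 1 utterance.
sample = [
    "0.00 1.00 A-f: hello there",
    "1.00 2.00 B-m: hi",
    "2.00 3.00 A-f: how are you",
]

result = countByGender(sample)
assert result["f"] == [5, 2]
assert result["m"] == [1, 1]
print("counts match the hand count")
```

If the asserts pass on the small sample, the same function can be pointed at the lines of a real file with more confidence. It does not prove the corpus totals are right, only that the counting logic behaves as expected.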

Exercise 3

May 7, 2014 at 7:21pm (8 May 2014) by Elliott

exercise 3

In class, we have started writing a script that processes two hard-coded sentences and keeps a running total of the number of words actually uttered by the speakers. We did this in a dumb way: copy-pasting the processing block. Now simplify the script using a “for loop”. The idea should be to replace the two copies of the processing block with a single copy inside a loop. To do this, you’ll need to create a list containing the two sentences, write a “for loop” operating over this list, and put the processing block inside that loop. You can of course have more than 2 sentences in the list ;-)

Here is what the code looks like so far (also posted on the website):

# initialization
# keep track of total of words
totalWords = 0


# processing of sentence 1
sentence = "B-f: I'm in graduate school"

# get the words of the sentence
words = sentence.split()

print("Words of sentence 1:", words)

# extract speaker ID
speaker = words[0]

# extract words uttered (everything except first element in the list)
actualWords = words[1:]

# number of words uttered
numberAWords = len(actualWords)

# increment the total
totalWords = totalWords + numberAWords

print("speaker is: ", speaker)
print("words are", actualWords)
print("number of words:", numberAWords)
print("total so far:", totalWords)

# processing of sentence 2
sentence = "A-f: at OSU?"

# get the words of the sentence
words = sentence.split()

print("Words of sentence 2:", words)

# extract speaker ID
speaker = words[0]

# extract words uttered (everything except first element in the list)
actualWords = words[1:]

# number of words uttered
numberAWords = len(actualWords)

# increment the total
totalWords = totalWords + numberAWords

print("speaker is: ", speaker)
print("words are", actualWords)
print("number of words:", numberAWords)
print("total so far:", totalWords)

# post-analysis
print("total words uttered:", totalWords)

my code:


# -- initialization of tracking variables
totalWordsSpoken = 0
totalUtterances = 0

# -----	variables in list ----- #

s1 = "B-f: I'm in graduate school"
s2 = "A-m: at OSU?"
s3 = "B-f: um, yeah. Here at OSU"
s4 = "A-m: cool, me too."
s5 = "C-?: me want cookies!! nom nom nom"

senList = [s1, s2, s3, s4, s5]

# -----	for loop ----- #
for sentence in senList:
	
	# -----	processing block ----- #
	words = sentence.split()
	onlywords = words[1:]
	genderLetter = words[0][2]
	speakerID = words[0][0]
	numberwords = len(onlywords)
	
	# -----	count ----- #
	totalWordsSpoken += len(onlywords)
	totalUtterances += 1
	
	# -----	gender output ----- #
	if genderLetter == 'f':
		gender = "woman"
	elif genderLetter == 'm':
		gender = "man"
	else:
		gender = "non-gendered person"

	# -----	output ----- #
	print("  ")
	print("  ", "sentence number", totalUtterances, ":", sentence[5:])
	print("  ", "number of words:", numberwords)
	print("  ", "words are:", onlywords)
	print("  ", "speaker ID:", speakerID)
	print("  ", "speaker is a:", gender)
		
# -----	final output : counts outside the loop
#		how does it know to exit? no indentation?

print("  ")
print("  ", "total number of words:", totalWordsSpoken)
print("  ", "total number of utterances:", totalUtterances)
print("  ", "the average number of words per utterance was :",
            totalWordsSpoken / totalUtterances)

print("  ")

returns these results:

A problem that I ran into was trying to make the elements in the list strings
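
On the question in the code comment above (how does the loop know to exit?): Python ends a loop body at the first line indented less than the body, so the dedented lines run once, after the last item. A tiny sketch with made-up values:

```python
log = []

for item in ["a", "b", "c"]:
    log.append("inside " + item)    # indented: runs once per item

log.append("after")                 # not indented: runs once, after the loop

print(log)
```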
