Dictionaries and keys

finishing up Alice and word count (exercise 7):
Screen Shot 2014-05-14 at 10.20.05 AM

from the python tutorial: tutorial

all of these values will set up a dictionary

a = dict(one=1, two=2, three=3)
b = {'one': 1, 'two': 2, 'three': 3}
c = dict(zip(['one', 'two', 'three'], [1, 2, 3]))
d = dict([('two', 2), ('one', 1), ('three', 3)])
e = dict({'three': 3, 'one': 1, 'two': 2})
a == b == c == d == e

creating our dict:
Screen Shot 2014-05-14 at 11.19.52 AM

adding to it:
Screen Shot 2014-05-14 at 11.21.18 AM

from the tutorial:
Screen Shot 2014-05-14 at 11.49.37 AM

A and B loops

import os

countX = 0
countA = 0
countB = 0

dirX = "Fisher"
listA = os.listdir(dirX)
countX += 1

for itemB in listA:
    pathC = dirX + "/" + itemB
    listD = os.listdir(pathC)
    countA += 1
    
    for itemE in listD:
        pathF = pathC + "/" + itemE
        itemG = open(pathF)
        countB += 1

        for fileH in itemG:
            outputI = len(fileH)


print()
print(dirX)
print(itemB)
print(itemE)
print(itemG)
print(outputI)

print()
print(" lines/items in --")
print("   --       Fisher =", countX)
print("   --    Fisher/.. =", countA)
print("   -- Fisher/../.. =", countB)
print()

Python IDs



[pdf]

module.fuction("argument")

here,
module = os
funtion = listdir
argument = path

so…

import os
os.listdir("Fisher")   
# This is a relative path (relative to where I am)\
  which director I start in

Python scripting



open interactive Python

>>> fisherFile = open("Fisher/065/fe_03_06500.txt")

Creates a file object

“tail” in Unix prints ~ the last ten lines of a flie
(use as such):

tail "Fisher/065/fe_03_06500.txt"

this code to count the number of words by gender:

# ----- initialization of tracking variables ----- #
totalWordsSpoken = 0
totalUtterances = 0

wordsW = 0
wordsM = 0
wordsN = 0

utterW = 0
utterM = 0
utterN = 0

# ----- opening file ----- #

fF = open("../Downloads/Fisher/065/fe_03_06500.txt")


# -----	for loop ----- #
for sentence in fF:
	
    # -----	processing block ----- #
    words = sentence.split()
    onlywords = words[3:]
    genderLetter = words[2][2]
    speakerID = words[2][0]
    numberwords = len(onlywords)
    onlysen = " ".join(onlywords)
	
    # -----	count ----- #
    totalWordsSpoken += numberwords
    totalUtterances += 1

    # -----	gender output ----- #
    if genderLetter == 'f':
        gender = "woman"
        wordsW += numberwords
        utterW += 1
    elif genderLetter == 'm':
        gender = "man"
        wordsM += numberwords
        utterM += 1
    else:
        gender = "non-gendered person"
        wordsN = numberwords
        utterN += 1

    # -----	output per sentence----- #
    print()
    print("  ", "sentence number", totalUtterances, ":", onlysen)
    print("  ", "number of words:", numberwords)
    print("  ", "words are:", onlywords)
    print("  ", "speaker ID:", speakerID)
    print("  ", "speaker is a:", gender)

# -----	final totals output ----- #

print()
print("  ", "total number of words:", totalWordsSpoken)
print("  ", "total number of utterances:", totalUtterances)
print("  ", "the average number of words per utterance was :",
      totalWordsSpoken / totalUtterances)
print()
print("  ", "total words spoken by women:", wordsW)
print("  ", "total number of utterances:", utterW)
print("  ", "the average number of words per utterance was :",
      wordsW / utterW)
print()
print("  ", "total words spoken by men:", wordsM)
print("  ", "total number of utterances:", utterM)
print("  ", "the average number of words per utterance was :",
      wordsM / utterM)
print()
print("  ", "total words unaccounted for by gender:", wordsN)
print("  ", "total number of utterances unaccounted for:", utterN)
print()

returns this result:
Screen Shot 2014-05-08 at 11.54.15 AM

how do we check that our output is correct? Count them manually?

Python sentences

class notes [pdf]

Python (if installed properly) always tells the terminal where it is located, so, relative to Python, the program doesn’t need to be in the same directory, or even referenced in relation to it.

Screen Shot 2014-05-07 at 10.32.06 AM

*/ — set up Cygwin on the Dellosaurus to do the same thing, if possible? — /*

First program (in Xcode on mac):
Screen Shot 2014-05-07 at 11.09.15 AM

returns this output in Python 3.3.4 in Terminal:
Screen Shot 2014-05-07 at 11.09.29 AM

because the program is processed in a linear order, we can change the value of “sentence” at any point after where it was initially set
Screen Shot 2014-05-07 at 12.57.14 PM

but if you don’t redefine the other variables, it returns:
Screen Shot 2014-05-07 at 12.37.47 PM

redefining the variables for the second sentence:
Screen Shot 2014-05-07 at 12.53.57 PM

you get the correct, expected output, and the correct number count:
Screen Shot 2014-05-07 at 12.41.31 PM

I wonder if there is a more concise way to do this, or maybe I’m doing it incorrectly?

the for loop
unlike other programming languages, Python uses indentation to define the scope of the loop (not {} like some, etc.)

to start counting, we have to set the total to 0 so that we can mark it as an integer, and have a starting point

the loop runs in the interactive mode in Python:

# -- for loop
for number in [1, 2, 3]:
    number *= 2
    print(number)

# -- processing 1
number = 1
number *= 2
print(number)

# -- processing 2
number = 2
number *= 2
print(number)

# -- processing 3
number = 3
number *= 2
print(number)

in class programs:
sentence.py

#this is a note not seen by Python

# -- first test
# print("   ")
# print("hello world!")

# -- doing it the long way (sentence 1)
# print("   ")
# print("Here are all the words:", "B-f: I'm in graduate school".split())
# print("This sentence has", len("B-f: I'm in graduate school".split()), "words")

# print("   ")
# print("If we remove the identifier at the beginning, then...")
# print("   ")
# print("Here are all the words:", "B-f: I'm in graduate school".split()[1:])
# print("This sentence has", len("B-f: I'm in graduate school".split()[1:]), "words")

# print("   ")
# print("   ", "assigning variables to the things that we've just input ")
# print("   ", "it's a good idea to give a sensible name to your variable")

# -- adding variables (sentence 1)

print("   ")
sentence = "B-f: I'm in graduate school"
words = sentence.split()
onlywords = words[1:]
gender = words[0][2]
speakerID = words[0][0]
numberwords = len(onlywords)

# print("all word-like things:", words)
# print("   ")
# print("the words", onlywords, "were spoken by a", gender, "person")

print("  ", "for the sentence,", sentence)
print("  ")
print("  ", "number of words:", numberwords)
print("  ", "words are:", onlywords)
print("  ", "speaker ID:", speakerID)
print("  ", "speaker is:", gender)

# print("   ")
# print("  * --", "figure out how to print something based on the input", "-- *")
# print("  * --", "like, return 'Female' for the 'gender' variable 'f'", "-- *")
# print("   ", "")


# -- looking at a second sentence (sentence 2): changing the value of "sentence", and re-running everything
# -- taking the whole thing out of the print (after the fact)

print("   ")
sentence = "B-f: at OSU"
words = sentence.split() # only need to redefine the "words" variable
# -- don't need to redefine the other variables

print("  ", "for the sentence,", sentence)
print("  ")
print("  ", "number of words:", numberwords)
print("  ", "words are:", onlywords)
print("  ", "speaker ID:", speakerID)
print("  ", "speaker is:", gender)

print("  ")

sentence2.py



# -- initialization of tracking variables
totalWordsSpoken = 0
totalUtterances = 0

# -- adding variables (sentence 1)

print("   ")
sentence = "B-f: I'm in graduate school"
words = sentence.split()
onlywords = words[1:]
gender = words[0][2]
speakerID = words[0][0]
numberwords = len(onlywords)

totalWordsSpoken += len(onlywords)
totalUtterances += 1

print("  ", "sentence 1")
print("  ", "number of words:", numberwords)
print("  ", "words are:", onlywords)
print("  ", "speaker ID:", speakerID)
print("  ", "speaker is:", gender)

# -- sentence 2

print("   ")
sentence = "B-f: at OSU"
words = sentence.split() # define these variables again
onlywords = words[1:]
gender = words[0][2]
speakerID = words[0][0]
numberwords = len(onlywords)

totalWordsSpoken += len(onlywords)
totalUtterances += 1

print("  ", "sentence 2")
print("  ", "number of words:", numberwords)
print("  ", "words are:", onlywords)
print("  ", "speaker ID:", speakerID)
print("  ", "speaker is:", gender)

# -- final output


print("  ")
print("  ", "total number of words:", totalWordsSpoken)
print("  ", "total number of utterances:", totalUtterances)
print("  ", "the average number of words per utterance was :",
            totalWordsSpoken / totalUtterances)

print("  ")

Python intro


class notes [pdf]

python tutorial

notes
need to use escape characters for some elements
if you are using single quotes for something other than delimiting the string, it needs to be marked as unique — not a command

>>> 'I\'m in graduate school'

length = takes an argument and returns the length of the string

>>> len("hello")
5

len = length
int = integer
str = string (by convention, we put a string between quotes – single or double)

type = will return one of these data types

>>> type("hello")
<class 'str'>

the dot ” . ” connects the string with

>>> "I'm in graduate school".split()
["I'm", 'in', 'graduate', 'school']
>>> len(["I'm", 'in', 'graduate', 'school'])
4
>>> len("I'm in graduate school".split())
4

in lists, the elements are ordered, Python starts with [0]
The number in brackets is called the index, and points to a specific element within the list

>>> "I'm in graduate school".split()[3]
'school'
>>> 

here’s an exciting oddity that I accidentally discovered:

>>> "I'm in graduate school".split()[2][3]
'd'

the second index returns the third character within the second element split from the string

Ctrl+C will escape you from Python back to cmd (or Terminal, as the case may be)

Unix counting


class notes [pdf]

more v less — there is less difference between them on the mac, but you can still see the output in Terminal after you quit if you use more, but the output disappears if you use less

-lh = lists the files with the additional metadata (time created, etc.)

wc shows:

lines  words  bytes  file_name.txt

Screen Shot 2014-05-06 at 9.59.05 AM

And from the manual:
Terminal_man_wc

| “pipe” – from the pdf :

We use the vertical bar “|” to connect two commands together so that the output from one command becomes the input of the next command. Two (or more) commands connected in this way form what’s called a pipe.

regular expression

xkcd
regular expression games

grep = going to count the number of lines that some element we are looking for appears in
useful for looking for a word in a file
to get out, type Ctrl+C or Ctrl+D

grep returns the number of lines, and when you conjoin it with “wc” ( via | ), it counts the number of words in each line that contains the expression that you were searching for, NOT the number of instances of the expression itself.

grep –color=auto”[laughter]” will return each character that matches those in brackets. (the brackets have a particular function in regular)
if you type “grep” in with nothing else, you get a list of available commands.

Unix navigation

Unix commands (in Terminal on mac)

class notes

$ man ls — manual about the operator “ls”

to exit, once in the manual — type “q”

most of the applications that can run in terminal won’t respond to the mouse (i.e. scrolling), but will respond to the keyboard

cd = change directory (like cmd prompt in windows)

pwd = shows which directory you are in (the address/location) — “pathname working directory”

directories in a computer looks just like a syntax tree — hierarchy

Downloads
       Fisher
           058
              fe_03_05851.txt
              fe_03_05852.txt
              etc.
           065

.. = move up one level in the hierarchy

can combine these elements for efficiency: cd ../065 (if you’re in 058, for example)

cd without an argument will take you back to the home directory. (all the way back up the hierarchy)

more = open text files (look into) text files
less = same command as “more” — ‘less is more’ but it allows backward as well as forward movement in the file

wc = word, line, character, and byte count