06 | May | 2014 | Technical Tools

Exercise 2

May 6, 2014 at 11:05pm (7 May 2014) by Elliott

(a) Quantifying the data in the Fisher corpus

How many utterances and words do we have in the Fisher corpus on Carmen?

files in Fisher:

wcs in Fisher:
065

058

total lines/“utterances” in Fisher (math in Python):

total words in Fisher (math in Python):

29734 utterances
362020 words

in class
I forgot to subtract the time-stamps and the markers, which wc counted as words. Perhaps the best way to do this would be to multiply the total number of lines by three, and then subtract those non-words (three in each line) from the total number of words.
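
A minimal sketch of that correction in Python, assuming every counted line carries exactly three non-word tokens as described above. The variable names are only illustrative, and the result is an estimate that holds only if that assumption is true for every line.

```python
# Totals reported above for the Fisher corpus on Carmen.
total_lines = 29734
total_tokens = 362020

# Assumption: each line has exactly three non-word tokens
# (the time-stamps and the marker).
non_words_per_line = 3

corrected_words = total_tokens - non_words_per_line * total_lines
print(corrected_words)  # 272818
```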

(b) Do people laugh?

What is the percentage of utterances containing laughter, totaling all the files of the “058” directory?

utterances containing laughter in Fisher (directions from the pdf):

from Cygwin:

the number of lines, 1681, is the number of “utterances” with laughter in all of Fisher

utterances in only 058:

utterances with laughter in only 058:

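Using different commands returned the same number of lines, but different numbers of bytes. Is it the “-R” that changed it? One likely cause: when grep searches more than one file, as it does with -R, it prefixes each matching line with the file name, which adds bytes without adding lines.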

10556 utterances in 058
199 utterances in 058 containing [laughter]

percentage of utterances containing [laughter] in 058 (math in Python):

approx 1.9% of the utterances in 058 contain laughter.
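
The percentage can be reproduced from the two counts above. This is a small sketch with made-up variable names, assuming Python 3 (where / is true division).

```python
# Counts reported above for the 058 directory.
utterances_058 = 10556
laughter_058 = 199

# Share of utterances that contain [laughter], as a percentage.
percentage = laughter_058 / utterances_058 * 100
print(round(percentage, 1))  # 1.9
```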

Python intro

May 6, 2014 at 11:06am by Elliott

class notes [pdf]

python tutorial

notes
some characters inside a string need an escape character (a backslash)
if a single quote appears inside a single-quoted string as an apostrophe, escape it so Python reads it as part of the text and not as the end of the string

>>> 'I\'m in graduate school'
"I'm in graduate school"

len = takes an argument and returns its length (for a string, the number of characters)

>>> len("hello")
5

len = length
int = integer
str = string (by convention, we put a string between quotes – single or double)

type = will return one of these data types

>>> type("hello")
<class 'str'>

the dot “ . ” connects the string with a method that acts on it, here split()

>>> "I'm in graduate school".split()
["I'm", 'in', 'graduate', 'school']
>>> len(["I'm", 'in', 'graduate', 'school'])
4
>>> len("I'm in graduate school".split())
4

In lists, the elements are ordered, and Python starts counting at [0].
The number in brackets is called the index, and points to a specific element within the list.

>>> "I'm in graduate school".split()[3]
'school'
>>>

here’s an exciting oddity that I accidentally discovered:

>>> "I'm in graduate school".split()[2][3]
'd'

the first index, [2], picks the third element split from the string ('graduate'), and the second index, [3], picks the fourth character within that element ('d'), since both counts start at 0
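
Breaking the chained index into steps may make this easier to see. This is only an illustrative sketch, and the variable names are my own.

```python
sentence = "I'm in graduate school"

words = sentence.split()      # ["I'm", 'in', 'graduate', 'school']
third_word = words[2]         # index 2 is the third element
fourth_char = third_word[3]   # index 3 is the fourth character

print(third_word)   # graduate
print(fourth_char)  # d
```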

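To leave Python and get back to cmd (or Terminal, as the case may be), type exit() or press Ctrl+D (Ctrl+Z then Enter in Windows cmd). Ctrl+C normally only interrupts the command that is currently running.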

Unix counting

May 6, 2014 at 9:52am by Elliott

class notes [pdf]

more v less: there is less difference between them on the Mac, but if you use more you can still see the output in Terminal after you quit, whereas with less the output disappears.

ls -lh = lists the files in long format with additional metadata (permissions, owner, size, time last modified), with sizes shown in human-readable units

wc shows:

lines  words  bytes  file_name.txt
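
To make the three columns concrete, here is a rough Python imitation of what wc counts. The function name wc and the sample text are invented for illustration, and the sketch assumes UTF-8 text; it is not how the real tool is implemented.

```python
def wc(text):
    """Return (lines, words, bytes) for a string, roughly like wc."""
    lines = text.count("\n")               # number of newline characters
    words = len(text.split())              # whitespace-separated tokens
    num_bytes = len(text.encode("utf-8"))  # size in bytes, assuming UTF-8
    return lines, words, num_bytes

# Invented sample text, not taken from the corpus.
sample = "hello there\n[laughter] yeah\n"
print(wc(sample))  # (2, 4, 28)
```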

And from the manual:

| “pipe” – from the pdf :

We use the vertical bar “|” to connect two commands together so that the output from one command becomes the input of the next command. Two (or more) commands connected in this way form what’s called a pipe.

regular expression

xkcd
regular expression games

grep = prints every line in which the pattern we are looking for appears (grep -c prints only the number of such lines)
useful for looking for a word in a file
if grep is waiting for input because no file was given, type Ctrl+C or Ctrl+D to get out

grep returns the matching lines, and when you conjoin it with “wc” ( via | ), wc counts the lines, words, and bytes of those matching lines. The line count is therefore the number of lines that contain the expression you were searching for, NOT the number of instances of the expression itself.
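
The difference between matching lines and instances can be sketched in Python. The three sample lines below are invented for illustration and are not taken from the corpus.

```python
# Invented sample lines; one of them contains the target twice.
lines = [
    "A: [laughter] yeah [laughter]",
    "B: right",
    "A: oh [laughter]",
]
target = "[laughter]"

# Lines that contain the target at least once (what grep | wc -l reports).
matching_lines = [line for line in lines if target in line]

# Total occurrences of the target across all lines.
instances = sum(line.count(target) for line in lines)

print(len(matching_lines))  # 2
print(instances)            # 3
```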

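grep --color=auto "[laughter]" will highlight every single character that matches one of those in the brackets, because square brackets define a set of characters in regular expressions. To search for the literal text, escape the brackets ("\[laughter\]") or use grep -F.
if you type “grep” in with nothing else, you get a short usage message.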
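
The bracket behavior can be tried out in Python, whose re module treats square brackets the same way for this simple case. This is a sketch with an invented sample string, not a claim about how grep works internally.

```python
import re

# Invented sample text.
text = "hello [laughter]"

# Unescaped brackets: matches any single character from the set l, a, u, g, h, t, e, r.
char_class = re.findall(r"[laughter]", text)

# Escaped brackets: matches the literal string "[laughter]".
literal = re.findall(r"\[laughter\]", text)

print(len(char_class))  # 12
print(literal)          # ['[laughter]']
```

The unescaped pattern matches the h, e, l, l of "hello" as well as the eight letters inside the brackets, which is why the counts come out so differently.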
