The Virtue of Laziness: -append- and -replace- with big loops

When you’re running large production jobs in Stata (or whatever) that involve iterative runs (perhaps over a set of files, or subsets of data, or files representing subsets of data…), it’s useful to produce files containing summary stats, parameter estimates, metadata, etc.

You can do this with the -log- and -cmdlog- commands, and I’m very fond of Stata’s -file- commands.

You run into problems, however, when deciding whether to ‘append’ or ‘replace’ your logs and output files. You could write one set of commands for your first iteration with the ‘replace’ option, and a separate set for the 2nd…nth iterations, but that’s painful — and unnecessary.

Instead, just define a local macro whose value will either be ‘replace’ or ‘append,’ depending on whether the loop is in its first iteration or not, respectively:
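The post's own example is in Stata. As a rough illustration of the same idea in Python (a sketch, not the author's code), the block below chooses the file mode once per iteration. The function name write_summaries, the file name, and the toy subsets are all hypothetical.

```python
import os
import tempfile


def write_summaries(subsets, path):
    """Write one summary line per subset to a single output file."""
    for i, (label, values) in enumerate(subsets):
        # First pass overwrites any stale file ("replace");
        # every later pass adds to it ("append").
        mode = "w" if i == 0 else "a"
        with open(path, mode) as out:
            mean = sum(values) / len(values)
            out.write(f"{label}\tn={len(values)}\tmean={mean:.2f}\n")


# Hypothetical toy subsets standing in for the files or subsamples of a real job.
subsets = [("a", [1, 2, 3]), ("b", [4, 5]), ("c", [6])]
summary_path = os.path.join(tempfile.mkdtemp(), "summary.txt")

write_summaries(subsets, summary_path)
write_summaries(subsets, summary_path)  # a rerun starts fresh instead of piling up

with open(summary_path) as f:
    summary_lines = f.read().splitlines()
```

Because the mode is decided inside the loop, the same write statement serves every iteration, and rerunning the whole job never leaves old output behind.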

Continue reading The Virtue of Laziness: -append- and -replace- with big loops

Using semaphores to make Stata wait on a -winexec- result

You can run non-Stata commands from within Stata; cf. -shell- and its Windows-specific cousin, -winexec-.

Problem is, Stata doesn’t wait to see how your command turned out — it just keeps going.

Here’s where semaphores come in handy. I’ll spare the background, since it’s covered well elsewhere. Applied to waiting, and to Stata, the idea is as follows:

  • Create a file somewhere. This will serve as your semaphore.
  • Launch a non-Stata command via -winexec-.
  • -while- the file still exists, -sleep- for a few seconds.
  • The non-Stata command somehow deletes the file you created, e.g. you’ve executed a batch file whose last command is to delete it.
  • Once the file is gone, Stata exits the loop and moves on.
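The steps above can be sketched in Python rather than Stata, purely as an illustration of the pattern. The function name run_and_wait, the file name, and the child command are hypothetical, and the timeout is an addition of this sketch so that a failed command cannot stall the loop forever.

```python
import os
import subprocess
import sys
import tempfile
import time


def run_and_wait(command, semaphore_path, poll_seconds=0.1, timeout_seconds=30.0):
    """Launch command without blocking, then wait until it deletes semaphore_path."""
    # Step 1: create the semaphore file.
    with open(semaphore_path, "w"):
        pass
    # Step 2: start the external command; Popen returns immediately.
    proc = subprocess.Popen(command)
    # Steps 3-5: poll until the file disappears, giving up after the timeout.
    deadline = time.monotonic() + timeout_seconds
    while os.path.exists(semaphore_path):
        if time.monotonic() > deadline:
            raise TimeoutError(f"{semaphore_path} was never removed")
        time.sleep(poll_seconds)
    return proc


semaphore = os.path.join(tempfile.mkdtemp(), "job.semaphore")
# Stand-in for the batch file: a child process whose last act is deleting the semaphore.
child = [sys.executable, "-c",
         "import os, sys, time; time.sleep(0.5); os.remove(sys.argv[1])",
         semaphore]
proc = run_and_wait(child, semaphore)
proc.wait()  # the work is already done; this only reaps the finished child
semaphore_gone = not os.path.exists(semaphore)
```

The waiting side never needs to know what the external command does, only that the command's final step removes the file.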

Here we go:

Continue reading Using semaphores to make Stata wait on a -winexec- result

Make use of UN TFR data in Stata

This post is deprecated: another post does a far better job.

This isn’t a difficult task, but I do it rarely enough that it’s not solidly lodged in memory.

I want to merge some UN TFR data with what I’m already working on. I can get Excel files from the UN World Population Prospects pages, but these aren’t exactly prêt-à-porter for Stata. For one thing, they’ve got a bunch of rows I don’t care about at the top. For another, my unit of analysis is the time period, not the country, which is how the UN data are arranged.

Small matter! Let’s…

  • download the TFR file in Excel format,
  • open it in Stata,
  • toss the rows at the top,
  • treat the data with variable labels as such,
  • rename the period variables,
  • save the country-level file in Stata (for good measure),
  • reshape into a period-level file.
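The post itself does this in Stata. As a rough Python illustration of the middle steps only (drop the leading rows, use the header row as names, turn the period columns into period-level records), here is a minimal sketch. The sample text, the number of rows to skip, and the function name wide_to_long are assumptions, and the country names and values are placeholders rather than real UN figures.

```python
import csv
import io

# Made-up stand-in for the exported sheet: two junk rows, a header row, then data.
RAW = """Title row to discard
Notes row to discard
Country,1950-1955,1955-1960
Country A,6.0,5.5
Country B,3.0,2.5
"""

SKIP_ROWS = 2  # assumed count of leading rows to drop


def wide_to_long(text, skip_rows):
    """Return one record per country-period cell of a country-level table."""
    rows = list(csv.reader(io.StringIO(text)))[skip_rows:]
    header, data = rows[0], rows[1:]
    long_rows = []
    for row in data:
        country = row[0]
        for period, value in zip(header[1:], row[1:]):
            # Keep the period's start year, e.g. "1950-1955" becomes 1950.
            long_rows.append({"country": country,
                              "period_start": int(period.split("-")[0]),
                              "tfr": float(value)})
    return long_rows


long_rows = wide_to_long(RAW, SKIP_ROWS)
```

Each country row fans out into one record per period, which is the same move a wide-to-long reshape makes.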

Continue reading Make use of UN TFR data in Stata

Parking Ratios and Planning Dilemmas

I’m on the University Area Commission’s Zoning Committee. We’re reviewing the City of Columbus’ Draft University District Plan. The Plan itself is neither policy nor law; it’s a set of recommendations reflecting, ostensibly, aspirations for the District. Yet, because the Plan will inform and doubtless be used as justification for policy and practice in the foreseeable future, there’s a lot at stake in getting it “right.” The City’s Planning Division has put who-knows-how-many hours into developing it, holding public forums, collecting and analyzing community input, and putting up with tough questions and sometimes outright ignorance and confusion from constituents (plenty from me).

Parking is a central and constant concern and struggle.

Continue reading Parking Ratios and Planning Dilemmas

Scrape personnel data from ASC QuickSites

My unit needs to periodically review the ‘people’ pages on our web site. We could do this page-by-page, but with 70-odd pages this is a drag and it limits our ability to coordinate these checks.

So, I wrote a little Python using lxml to scrape all of our people and produce a tabular file that multiple staff can pop into Excel and edit. The QuickSites don’t provide for mass updates (that’s OK! they do lots of good stuff), but this gets us just a little more efficiency.

import csv
from urllib.request import urlopen

import lxml.html as lh  # the .cssselect() calls also need the cssselect package

BASE_URL = 'http://ipr.osu.edu'
OUTPUT_PATH = '/Volumes/WYRK/iprscrape/affils.txt'
HEADER = ['lastnamenum', 'name', 'role', 'position', 'education', 'bio']


def joined(nodes):
    """Collapse a list of XPath text results into one clean string."""
    return '; '.join(text.strip() for text in nodes if text.strip())


# The directory page links to one /people/<lastname.num> page per person.
doc = lh.parse(urlopen(BASE_URL + '/directory')).getroot()

with open(OUTPUT_PATH, 'w', newline='') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerow(HEADER)
    for link in doc.cssselect('a'):
        href = link.get('href')
        if not (href and link.text_content() and '/people' in href):
            continue
        personurl = BASE_URL + href
        lastnamenum = personurl.replace(BASE_URL + '/people/', '', 1).strip()
        tree = lh.fromstring(urlopen(personurl).read())
        name = joined(tree.xpath('//*[@id="title"]/text()'))
        role = joined(tree.xpath('//*[@id="bio_block"]/div[1]/div/div/text()'))
        position = ''
        for div in tree.cssselect('div.field.field-type-text.field-field-ascpeople-position'):
            position = div.text_content().strip()
        bio = joined(tree.xpath('//*[@id="bio_block"]/ul/li/text()'))
        education = joined(tree.xpath('//*[@id="leftcontent"]/div/div[2]/ul/li/text()'))
        # One tab-separated row per person, in the same order as HEADER.
        writer.writerow([lastnamenum, name, role, position, education, bio])

Learning to Program

I saw Scott Klein’s excellent talk at the also-excellent Big Data Future conference this week, and started reading his ProPublica posts. One of them had a provocative title: How to Start Learning to Program. The post, unfortunately, doesn’t deliver on the title’s promise; in fact, it could lead to non-programmers getting their feet wet and then becoming discouraged because they think they need to do production-quality work, rather than getting into programming the best and most fun way: doing goofy stuff for its own sake, with no productive goal. (None of this is an ad hominem slight on Mr. Klein, who’s doing terrific stuff and is by all accounts a fantastic guy.)

I thought, You know what’d be really handy? A comparison of how the canonical first program is written in a bunch of different languages, partly to illustrate the regularities and divergences across languages for a basic task, and partly to show how, in some languages, it can be hard to do even a simple thing.

Then I found this: http://en.wikipedia.org/wiki/List_of_Hello_world_program_examples.

OK, so ‘Hello World’ is a canonical example, but it’s not informative about the stuff you really want a program to do, like iterate. So, here’s Nodewave’s comparison of the 500-iteration loop printing “I will not throw paper airplanes in class.” Of course, most languages have multiple ways to achieve the same end, and this listing is not exhaustive (but at the time of this writing, there are three Perl entries…).
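For a sense of what one entry in such a comparison looks like, here is a minimal sketch of that loop in Python. The function name chalkboard is made up for this illustration, and this is only one of several ways to write it.

```python
def chalkboard(sentence, times):
    """Return the sentence repeated once per loop iteration."""
    lines = []
    for _ in range(times):
        lines.append(sentence)
    return lines


lines = chalkboard("I will not throw paper airplanes in class.", 500)
print("\n".join(lines))
```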

Penultimately, and contrary to the advice to leaf through books, take a look at the tutorials at http://www.w3schools.com/. You don’t need to install anything to get started, and the tutorials don’t have the unnecessary, yet apparently obligatory, prose that fills lots of programming books.

Finally, the realization that there are better and worse ways to write code (that what works may not be what’s best, that what’s best may simply be unnecessary, and that the easiest way to do something in the short run can cost you later) can come late, and it’s often followed by the feeling: why didn’t someone tell me the importance of doing things the right way? Yet, even in the hyper-formalized world of coding there are things that can only be known via oblique means and subtle vehicles:

XKCD's

Box Sync for Mac: Setting a custom location

tOSU has partnered with the Box cloud storage service. It’s got its virtues, to be sure. Ease of customization is not one of them (sync reliability prior to v4.0.something was sketchy, too, but let’s stay focused). In fact, it feels like Box had to go to real lengths to make it so difficult to do something basic like, oh, choose where your synced files live. I can see the appeal for enterprise management, but those aims could have been achieved without making things so obtuse for end-users.

Box Sync for Mac wants to put your files in your home folder. In fact, when you install it and log in with your tOSU credentials, it automatically creates that location (if non-existent) and starts syncing there.

What if you…

  • Don’t have enough space in that location?
  • Have an SSD and would like to minimize extraneous writes?
  • Have a roaming profile?
  • Have a roaming profile plus offline files?

The list goes on. Lots of situations, to be sure, where this default behavior is pathological.

Continue reading Box Sync for Mac: Setting a custom location

Portrait of Ohio State University

The portrait is up: http://trustees.osu.edu/universityportrait/. To be sure, it’s better thought of as a gallery of highlights than as something to be read cover-to-cover, except by the true enthusiast.

Incidentally, the Presidential Profile (largely crafted by the Presidential Search Advisory Subcommittee) is here: http://trustees.osu.edu/assets/files/Presidential-Profile.pdf