Learning to Program

I saw Scott Klein’s excellent talk at the also-excellent Big Data Future conference this week, and started reading his ProPublica posts. One of them had a provocative title: How to Start Learning to Program. The post, unfortunately, doesn’t deliver on the title’s promise. In fact, it could lead non-programmers to get their feet wet and then become discouraged, because they feel they need to do production-quality work rather than getting into programming the best and most fun way: doing goofy stuff for its own sake, with no productive goal. (None of this is a personal slight against Mr. Klein, who’s doing terrific stuff and is by all accounts a fantastic guy.)

I thought, You know what’d be really handy? A comparison of how the canonical first program is written in a bunch of different languages, partly to illustrate the regularities and divergences across languages for a basic task, and partly to show how, in some languages, it can be hard to do even a simple thing.

Then I found this: http://en.wikipedia.org/wiki/List_of_Hello_world_program_examples.

OK, so ‘Hello World’ is a canonical example, but it’s not informative about the stuff you really want a program to do, like iterate. So, here’s Nodewave’s comparison of the 500-iteration loop printing “I will not throw paper airplanes in class.” Of course, most languages have multiple ways to achieve the same end, and this listing is not exhaustive (but at the time of this writing, there are three Perl entries…).
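To give a flavor of what such a comparison shows, here is a rough Python sketch of that loop. It is only an illustration and is not taken from Nodewave’s listing; the helper name chalkboard_lines is made up for this example.

```python
# One way (of several) to repeat a sentence 500 times in Python.
# `chalkboard_lines` is a hypothetical helper name used only for this sketch.
def chalkboard_lines(sentence, times=500):
    """Return a list holding `sentence` repeated `times` times."""
    return [sentence for _ in range(times)]


lines = chalkboard_lines("I will not throw paper airplanes in class.")

# Print each repetition on its own line.
for line in lines:
    print(line)
```

Building the list first and printing second is a choice made for readability here; a bare for loop over range(500) with a print inside would do the same job.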

Penultimately, and contrary to the advice to leaf through books, take a look at the tutorials at http://www.w3schools.com/. You don’t need to install anything to get started, and the tutorials don’t have the unnecessary, yet apparently obligatory, prose that fills lots of programming books.

Finally, the realization that there are better and worse ways to write code (that what works may not be what’s best, that what’s best may simply be unnecessary, and that the easiest way to do something in the short run can cost you later) can come late, and it’s often followed by the feeling, why didn’t someone tell me the importance of doing things the right way? Yet even in the hyper-formalized world of coding, there are things that can only be known via oblique means and subtle vehicles:

XKCD's

Box Sync for Mac: Setting a custom location

tOSU has partnered with the Box cloud storage service. It’s got its virtues, to be sure. Ease of customization is not one of them (sync reliability prior to v4.0.something was sketchy, too, but let’s stay focused). In fact, it feels like Box had to go to real lengths to make it so difficult to do something basic like, oh, choose where your synced files live. I can see the appeal for enterprise management, but those aims could have been achieved without making things so obtuse for end-users.

Box Sync for Mac wants to put your files in your home folder. In fact, when you install it and log in with your tOSU credentials, it automatically creates that location (if non-existent) and starts syncing there.

What if you…

  • Don’t have enough space in that location?
  • Have an SSD and would like to minimize extraneous writes?
  • Have a roaming profile?
  • Have a roaming profile plus offline files?

The list goes on. Lots of situations, to be sure, where this default behavior is pathological.


Clean up inconsistent text with Levenshtein distance

Researchers who deal with text data, particularly categorical text (as opposed to free prose), have long recognized the need to clean up data-entry, coding, and other inconsistencies. I’ve seen many files with a field for ‘organization’ where the organization’s name is, alternately, spelled out, abbreviated, misspelled, or otherwise heterogeneous. For example:

  • National Association for the Advancement of Colored People
  • National Association for the Advancement of Colored Persons
  • The National Association for the Advancement of Colored People
  • NAACP
  • N.A.A.C.P.
  • N A A C P

… and so on. In an analysis, we generally want to refer to these as the same organization, but that involves lots of cleaning up by hand.
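As a rough sketch of the idea in the title, Levenshtein distance counts the single-character insertions, deletions, and substitutions needed to turn one string into another, so near-duplicates score low. The Python below is a textbook dynamic-programming version written for illustration, not necessarily the approach the full post takes; the function name levenshtein and the variable names are assumptions.

```python
def levenshtein(a, b):
    """Count the single-character edits needed to turn string a into string b."""
    # previous[j] holds the distance between the prefix of a seen so far
    # and the first j characters of b. Start with the empty prefix of a.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning the first i characters of a into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca from a
                current[j - 1] + 1,      # insert cb into a
                previous[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        previous = current
    return previous[-1]


# Compare each variant from the list above against one chosen spelling.
canonical = "National Association for the Advancement of Colored People"
variants = [
    "National Association for the Advancement of Colored Persons",
    "The National Association for the Advancement of Colored People",
    "NAACP",
    "N.A.A.C.P.",
    "N A A C P",
]
for variant in variants:
    print(levenshtein(canonical, variant), variant)
```

Note that the spelled-out variants land close to each other, and the abbreviated ones land close to each other, but the two groups are far apart in edit distance. A distance threshold alone will not merge "NAACP" with the full name, so some hand-built mapping is still likely to be needed for abbreviations.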
