Digging into Random Forests

More than a year ago, I worked through a few Fast.ai courses.

It was a fantastic experience: an introduction to the nuts and bolts of applying machine-intensive analysis to real problems in a way that let one see results almost immediately; a code-based approach (it helped that I already knew lots of Python) that got into the nitty-gritty of solving tough problems; an emphasis on working habits that encouraged an agile approach; and an amazingly active and collaborative community of students.  Somehow, whatever Jeremy Howard says always makes sense and is steeped in hard-won expertise.  It was a perfect fit for me, and I learned a ton.

But honestly, I have not applied the Fast.ai material to my own projects much at all.  I think the simple reason is that there is an inertia I have to overcome whenever I start a Fast.ai-style analysis: securing enough GPU computing power, making sure all the modules are up to date and aligned with the project at hand, and relearning the processes that Jeremy outlined and adapting them to the project.  While I am sure that many Fast.ai students are plugged in enough to make those obstacles relatively trivial, my available time and resources just aren’t enough to make all that work.

But I’ve realized that there should be a lot I can do towards implementing a machine learning approach without the heavy infrastructure, especially since the problems I’m likely to tackle aren’t necessarily the massive ones targeted in the courses.  Jeremy’s approach covered the tools students need to tackle some of the biggest problems, but that brought lots of complexity.  I just need to pare it down to the basics (Fast.ai on the cheap) and create a learning environment for myself in which I can make some progress.

The projects I have in mind right now would be best served by an exploration of Random Forest methodologies.  That’s where I will start…