Posts

Day 1

Wow! What a day! I am so pumped for this semester!!

The morning started off with Introduction to Database Systems, a class with coursework in database systems use, logical design, the entity-relationship model, normalization, query languages and SQL, relational algebra and calculus, object-relational databases, XML, and active databases, along with a database design project. The professor is a network security researcher for Battelle (cool!) and seems interested in making sure that the content we learn in the class is applicable to the real world. For instance, he’s going to steer us away from coding in ways that leave things open to SQL injection exploits. He challenged the class to “think about [our lives] in terms of data.” He presented us with a thought exercise: consider the sentence, “The man was afraid to go home because of the man in the mask.” We may ask, “Who is the man in the mask?” One student conjured an image of Bane from Batman; another, an image of a thief. The figure is sort of amorphous without further description. And then, with a word, “Baseball,” everyone imagines an umpire or catcher. This is the way we see ourselves in relation to data: we know it is in the background impacting the way that we interact with the world, but we are unable to understand precisely what the data is or what it “looks” like, and as such are left not knowing what the state of affairs truly is. While there is no magical word like “baseball” to illuminate the machinations of database systems, the professor stated that his goal is to illuminate a secondary level of thinking with regard to data, and a new way of connecting the dots.
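To make the SQL injection point concrete for myself, here is a minimal sketch in Python using the standard library's sqlite3 module. It is not from the lecture: the users table, the function names, and the malicious input are all made up for illustration. It contrasts building a query by string concatenation with passing the input through a placeholder.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Vulnerable: the input is pasted straight into the SQL text,
    # so a crafted value can change the meaning of the query.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Safer: the ? placeholder sends the input as data, never as SQL.
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


# Hypothetical in-memory table, just for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

malicious = "nobody' OR '1'='1"
unsafe_rows = find_user_unsafe(conn, malicious)  # the OR clause matches every row
safe_rows = find_user_safe(conn, malicious)      # no user has that literal name

print(len(unsafe_rows), len(safe_rows))
```

With the concatenated version, the quote in the input closes the string early and the OR clause becomes part of the query, so every row comes back. With the placeholder version, the whole input is compared as a single name and nothing matches.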

After a brief lunch break, I sat in on Lie Groups and Representation Theory for fun. The class investigates two structures, Lie groups and Lie algebras, that are useful in mathematical physics. It is a 7000-level graduate course, and I might be a bit out of my depth, but it looks pretty neat.

Then it was onward to Survey of Artificial Intelligence I, with coursework in basic concepts and techniques in artificial intelligence, including problem solving, knowledge representation, and machine learning. The professor for the course is very animated and encouraged a lot of discussion during the class, spending the entire period investigating the question, “What is intelligence?” Is an amoeba intelligent? What about a gorilla, bird, or dog? What precisely does it mean to be intelligent? It is an ill-defined concept, and many people will provide different definitions. An elementary stab off the top of my head might be something like the following: Intelligence is a broad variety of spectra describing the ability of a collection of organized matter to analyze and respond to its environment with regard to given criteria. We closed by acknowledging two schools of thought in artificial intelligence: that of an engineering approach and that of the connectionists. Engineers seek to provide algorithmic solutions to problems, while the connectionists seek to mimic neural circuitry with the implementation of neural networks. Given my previous coursework in Quantitative Neuroscience, I find the connectionist approach to be a bit more enticing, but I am eager to see how both approaches work and to encounter the insights that may be gleaned about real-world situations from each.

The last class of the afternoon was Intermediate Data Analysis II, a class that will focus heavily on regression techniques. Regression was touched upon at the end of my Stat 4202 class, but it is this class that will provide me with the deep understanding of regression required for a career in data science and analytics. Regression is a core technique that, once thoroughly understood, can be extended to much more complicated settings. We will consider continuous responses in this class, though categorical responses are possible. I learned that applying a log transformation spreads out small values and compresses large values, and is a useful technique in dealing with outliers. We closed by working through some elementary R code, and recalling the assumptions of linear regression:

  1. The assumption that the mean of Y given X is a linear function of X.
  2. The assumption that the standard deviation of Y given X is the same for all values of X.
  3. The independence of the responses. That is, given x, each y is determined independently of the others.
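As a quick check on the log transformation claim and the linearity assumption, here is a small sketch in plain Python rather than the R we used in class. The sample values, the toy data, and the helper names (gap_share, fit_line) are my own assumptions for illustration, not anything from the lecture.

```python
import math


def gap_share(xs, i):
    # Share of the total range taken up by the gap between xs[i] and xs[i + 1].
    return (xs[i + 1] - xs[i]) / (xs[-1] - xs[0])


def fit_line(xs, ys):
    # Ordinary least squares for a single predictor: y = intercept + slope * x.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up sample with a few small values and one very large one.
values = [1, 2, 5, 10, 100, 1000]
logged = [math.log10(v) for v in values]

# Before the transform, the gap from 1 to 2 is a sliver of the range and the
# gap from 100 to 1000 dominates; afterwards the two are much closer in size.
raw_small, raw_large = gap_share(values, 0), gap_share(values, 4)
log_small, log_large = gap_share(logged, 0), gap_share(logged, 4)

# Toy data that lies exactly on y = 1 + 2x, so assumption 1 holds perfectly.
intercept, slope = fit_line([1, 2, 3, 4], [3, 5, 7, 9])

print(raw_small, raw_large, log_small, log_large)
print(intercept, slope)
```

On the raw scale the largest value takes up most of the range, which is the sense in which an outlier can dominate a fit. On the log scale the small values get more room and the large one is pulled in.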

The last class of the day was quite the finale: Introduction to Data Mining, a class with coursework in knowledge discovery, data mining, data preprocessing, data transformations, clustering, classification, frequent pattern mining, anomaly detection, graph and network analysis, and applications. The class was packed. Not a single seat was left in the house. The professor really knows his stuff and is a researcher for IBM’s Watson! We began by asking what data mining is, and determined that it lies somewhere in between Machine Learning and Statistical Analysis. It is a results-driven process that utilizes end-to-end Knowledge Discovery in Databases (KDD).

We explored the various arenas in which data mining is useful, including bioinformatics, customer relationship management (CRM), fraud detection, targeted marketing, and recommendation systems. We briefly described a predictive method (classification) and two descriptive methods (clustering and association rules) that we will be investigating during the class. We closed by reviewing some elementary statistical and linear algebra concepts. I am by far the most excited for this class and can’t wait to put on my proverbial helmet and hit the mines!

I’m in for a real treat this semester.

A Defining Moment

While a degree in theoretical mathematics demonstrates a passion for problem solving, its curriculum does not require the acquisition of skills necessary for entry-level data analytics work. I intend to acquire these skills this semester and to document my progress here. I am pursuing coursework in Database Systems, Data Analysis, Artificial Intelligence, Data Visualization, Data Mining, and Machine Learning. To say that I am excited for what lies ahead in the coming weeks and months is an understatement.