One of the more recent topics gaining traction in Big Data Analytics is the notion of machine learning. Many people think that this is a recent development or phenomenon occurring as a result of newer Big Data technologies. But think about the phrase “machine learning”. Essentially, the computer “learns” based on what the user or programmer has instructed the computer to do.
Let’s start off with a simple example. RFM measures (recency of purchase, frequency of purchase, and monetary amount per purchase) are commonly used to create indexes, where the actual behavior of each individual is compared against the average behavior of the group or population, thereby creating an index for each behavior. These three indexes (recency, frequency, and monetary value) can then be combined to create one overall index, and ranks can be assigned based on this overall index. As new information arrives regarding updated customer behavior, the computer recreates the indexes and may assign a different rank to the customer. For example, in one period the customer is assigned a rank of 2, whereas in the most recent period the customer is assigned a rank of 10, with higher ranks implying a higher RFM index. The computer in effect has learned that the current behavior warrants a different rank. Obviously this is machine learning at its most basic level.
Let’s take a look at a more advanced technique such as stepwise multiple regression. Here the computer learning is more advanced in that the mathematical rules or algorithm instruct the computer when to include or drop variables based on a certain statistical threshold. The machine outputs the best equation based on the threshold instructions from the user. Now let’s take this notion of machine learning up a notch.
The use of neural net modelling has always been considered a type of artificial intelligence. Some of its more practical applications have been in the area of facial recognition. The mathematics employed in neural net modelling tries to find nonlinear patterns or relationships, as opposed to stepwise multiple or logistic regression, where the explained relationship is assumed to be linear or logistic. In attempting to find these nonlinear patterns, the mathematics becomes quite iterative in the sense that an estimate is built which is then compared against an observed behavior. The algorithm is then adjusted based on the results of this comparison of the “observed” output versus the “estimate” output, in essence providing feedback to the algorithm. Hence the “feedback loop” is a common term in most discussions of neural net modelling, and it ultimately represents the “learning” of the computer.
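The feedback loop described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not a production neural net: the function name `train_tiny_net`, the single hidden layer of four tanh units, the learning rate, and the toy target y = x² are all choices made here for the example.

```python
import math
import random

def train_tiny_net(xs, ys, hidden=4, lr=0.05, epochs=1000, seed=0):
    """Fit a one-hidden-layer net by repeatedly comparing estimate vs. observed.

    All names and settings here are illustrative assumptions.
    """
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]  # input -> hidden weights
    b1 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden biases
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0                                          # output bias

    def forward(x):
        h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
        return sum(w2[j] * h[j] for j in range(hidden)) + b2, h

    history = []  # mean squared error per pass over the data
    for _ in range(epochs):
        total = 0.0
        for x, observed in zip(xs, ys):
            estimate, h = forward(x)
            error = estimate - observed      # the "feedback" signal
            total += error * error
            for j in range(hidden):
                # Gradient through the tanh unit, computed before w2[j] changes.
                grad_h = error * w2[j] * (1 - h[j] ** 2)
                w2[j] -= lr * error * h[j]
                w1[j] -= lr * grad_h * x
                b1[j] -= lr * grad_h
            b2 -= lr * error
        history.append(total / len(xs))

    return (lambda x: forward(x)[0]), history

# Toy nonlinear relationship: y = x squared on a small grid.
xs = [i / 5 - 1 for i in range(11)]
ys = [x * x for x in xs]
predict, history = train_tiny_net(xs, ys)
print("error after first pass:", history[0])
print("error after last pass: ", history[-1])
```

Each pass builds an estimate, compares it with the observed value, and nudges the weights in the direction that shrinks the gap, which is the iterative adjustment the paragraph refers to.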
The use of decision trees in building models also incorporates this notion of machine learning, as the algorithms again rely on an iterative process. As we saw with neural nets, an estimated output is compared against the observed output, with the end objective being to narrow the gap between the two. The only difference is the type of mathematics being employed, and not all decision-tree tools employ the same mathematics: compare, for example, CHAID and CART.
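To make the iterative search concrete, here is a minimal sketch of how a CART-style tool might pick a single split on one numeric variable using Gini impurity (CHAID would use a chi-square test instead). The function names `gini` and `best_split` and the toy data are assumptions made for illustration; a real tool repeats this search recursively over many variables.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 outcomes."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Try every midpoint between distinct sorted values and keep the
    threshold whose two groups have the lowest weighted impurity."""
    pairs = sorted(zip(values, labels))
    best_threshold, best_impurity = None, gini(labels)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for v, y in pairs if v <= threshold]
        right = [y for v, y in pairs if v > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if weighted < best_impurity:
            best_threshold, best_impurity = threshold, weighted
    return best_threshold, best_impurity

# Toy data: purchases last year vs. whether the customer responded (1) or not (0).
purchases = [1, 2, 3, 10, 11, 12]
responded = [0, 0, 0, 1, 1, 1]
print(best_split(purchases, responded))  # → (6.5, 0.0)
```

Each candidate split is scored against the observed outcomes, and the algorithm keeps whichever one leaves the smallest gap.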
As predictive analytics solutions have advanced, many different types of models have been applied in a variety of business scenarios. With all these different models being employed, one advancement in predictive analytics has been to create a technique that combines them (ensemble modelling) in order to produce a better overall solution. Why not combine a CHAID tree, a logistic regression, and a neural net into one overall solution? Here, the computer is learning to optimize and combine tools after initially learning how to build the separate tools themselves.
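One simple way to combine models is a weighted average of their scores, with the weights themselves learned from holdout data. The sketch below assumes three already-built models whose scores are stand-ins for a CHAID tree, a logistic regression, and a neural net; the numbers are invented for illustration, and `learn_blend_weights` is a hypothetical helper, not the API of any particular tool.

```python
from itertools import product

def brier(scores, outcomes):
    """Mean squared gap between predicted probability and observed 0/1 outcome."""
    return sum((s - y) ** 2 for s, y in zip(scores, outcomes)) / len(outcomes)

def blend(model_scores, weights):
    """Weighted average of each model's score, record by record."""
    return [sum(w * s for w, s in zip(weights, row)) for row in zip(*model_scores)]

def learn_blend_weights(model_scores, outcomes, step=0.1):
    """Grid-search non-negative weights summing to 1 that minimize the Brier score."""
    n_steps = round(1 / step)
    best_weights, best_error = None, float("inf")
    for combo in product(range(n_steps + 1), repeat=len(model_scores)):
        if sum(combo) != n_steps:
            continue
        weights = [c / n_steps for c in combo]
        error = brier(blend(model_scores, weights), outcomes)
        if error < best_error:
            best_weights, best_error = weights, error
    return best_weights, best_error

# Made-up holdout data: observed responses and each model's predicted probability.
observed = [1, 0, 1, 1, 0, 0, 1, 0]
chaid    = [0.7, 0.4, 0.6, 0.5, 0.3, 0.5, 0.8, 0.2]
logistic = [0.6, 0.2, 0.9, 0.4, 0.4, 0.3, 0.7, 0.1]
neural   = [0.9, 0.5, 0.7, 0.8, 0.1, 0.6, 0.6, 0.3]

weights, error = learn_blend_weights([chaid, logistic, neural], observed)
print("blend weights:", weights)
print("blend error:  ", error)
```

Because the grid includes the cases where one model gets all the weight, the blend can never score worse on this holdout set than the best single model; whether it generalizes better is something that would need checking on fresh data.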
In all these examples, the computer is learning based on the programmer’s mathematical instructions. In many cases, the learning is advanced as the instructions to the computer employ complex algorithms. Yet even in RFM cases, the machine still learns as it can reassign individuals to different segments at different points in time based on its initial instructions from the human programmer.
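The RFM case can be sketched as follows. This is a minimal illustration, assuming equal weighting of the three indexes, an index of 100 for average behavior, and recency measured in days since the last purchase (so fewer days gives a higher index); the function name `rfm_ranks` and the toy customers are invented for the example.

```python
from statistics import mean

def rfm_ranks(customers, n_ranks=10):
    """customers maps a name to (days_since_last_purchase, purchases, avg_amount).

    Returns a rank from 1 to n_ranks per customer; a higher rank means a
    higher overall RFM index.
    """
    avg_recency = mean(c[0] for c in customers.values())
    avg_frequency = mean(c[1] for c in customers.values())
    avg_monetary = mean(c[2] for c in customers.values())

    overall = {}
    for name, (recency, frequency, monetary) in customers.items():
        # Each index compares the individual with the group average (100 = average).
        r_index = 100 * avg_recency / max(recency, 1)  # inverted: recent is better
        f_index = 100 * frequency / avg_frequency
        m_index = 100 * monetary / avg_monetary
        overall[name] = (r_index + f_index + m_index) / 3  # equal weights (assumption)

    ordered = sorted(overall, key=overall.get)  # lowest overall index first
    return {name: 1 + pos * n_ranks // len(ordered) for pos, name in enumerate(ordered)}

# Same instructions, two periods: only customer C's behavior changes.
period_1 = {"A": (10, 5, 50.0), "B": (30, 2, 20.0), "C": (90, 1, 10.0), "D": (20, 4, 40.0)}
period_2 = {"A": (10, 5, 50.0), "B": (30, 2, 20.0), "C": (5, 8, 100.0), "D": (20, 4, 40.0)}

print(rfm_ranks(period_1, n_ranks=4)["C"])  # → 1
print(rfm_ranks(period_2, n_ranks=4)["C"])  # → 4
```

Nothing in the instructions changed between the two runs; the reassignment comes entirely from the updated data, which is the basic sense in which the machine has "learned".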
The essence of machine learning is about simplification of human work effort. Rather than the human being manually doing the work, the machine performs the task. However, much of the recent research and development in this area goes beyond just mathematics and into the realm of advanced computer science. For instance, can programmers begin to code the way a human being processes information in a given situation? This is the ultimate dilemma between machines and humans. Let’s think about this further.
Obviously, the win of IBM’s “Deep Blue” over the world’s top human chess player suggested that maybe we are moving into a brave new world of computers over humans. Yet chess is a game where permutations and probabilities can be estimated for each move. The computer programmer in this case is coding instructions for all possible outcomes. There is no “judgment” by the computer, as the machine is bound or limited by its instructions from the programmer. The human being, however, is not bound by these constraints or limitations. The human being might use so-called “intuition” to override a certain move given the situation and environment surrounding the game and, more importantly, the type of opponent he or she is confronting. In chess, the advantage of this so-called “intuition” factor may be limited, as the game itself is very structured and does lend itself to outcomes that are optimized through machine learning.
As our learning in science and mathematics continues to develop, we can always instruct or “codify” our learning into the machine. But the “real” challenge for the machine and distinct advantage of the human is the ability to “apply” this learning. Through his or her experiences, the human being formulates judgments about the right course of action to be taken in a given situation. At any one time, there are millions of interactions going on within the human brain that enable the human to formulate a given action. Each of these interactions will be unique to the specific human being. Interactions will be composed of facts and intuition, where the weight of intuition relative to facts will vary with each action depending on the type of situation. Good examples of this are doctors or lawyers who may prescribe or advise very different actions in exactly the same client case. The computer, of course, might have codified all the facts of the case with instructions on how to process historical “similar” situations. The issue here is the so-called “accuracy” of the instructions given to the computer in processing these historical outcomes. The nimbleness of the human brain in exploring many different “similar” outcomes and, most importantly, assigning different weights to a given outcome is a huge advantage for the human. This nimbleness, arising from the brain’s complex structure of neurons and interactions, is for now something that simply cannot be programmed. But as computer science advances, the discipline will look to develop solutions that increase this so-called nimbleness at the machine level. Certainly, as data science practitioners or users of machine learning tools, our distinct and current advantage is the nimbleness of our brains and our ability to apply solutions within a given situation. Do I believe that the computer can be developed to the point where it can assist practitioners in this area?
Yes, to a point, but the nimbleness of the human brain is something I don’t believe can ever be entirely programmed into the machine. Computers and machine learning will always assist humans but never replace them. In fact, as our world becomes more complex, there will be an increased need for automation and so-called “quicker” learning provided by machines. But this will just increase the need for practitioners to apply this learning and to ultimately rely on the “nimbleness” of their thinking in developing solutions in an increasingly complex world.
Richard Boire, B.Sc. (McGill), MBA (Concordia), is the founding partner at the Boire Filler Group and a nationally recognized expert in the database and data analytics industry. He is among the top experts in this field in Canada, with unique expertise and background experience.
Mr. Boire’s mathematical and technical expertise is complemented by experience working at and with clients in both B2C and B2B environments. He previously worked at and with clients such as Reader’s Digest, American Express, Loyalty Group, and Petro-Canada, among many others, to establish his top-notch credentials.
After 12 years of progressive data mining and analytical experience, Mr. Boire established his own consulting company, Boire Direct Marketing, in 1994. He writes numerous articles for industry publications, is a sought-after speaker on data mining, and works closely with the Canadian Marketing Association in a number of areas, including Education and the Database and Technology councils. He is currently the Chair of Predictive Analytics World Toronto.