5 years ago
Are Random Variables a Fact of Life in Predictive Models?

Some of the more recent literature discusses pure random, or noise, variables that end up as key variables in predictive models. In our big data environment, with millions of records and thousands of variables, one might intuitively expect random or spurious variables to be a normal outcome in many models. As a data miner, I am always intrigued by the findings colleagues arrive at through their research and work. When considering the validity of these comments, I remember my own experiences in building hundreds of models over

