Some of the more recent literature has discussed pure random, or noise, variables that end up as key variables in predictive models. In our big data environment, with millions of records and thousands of variables, one might intuitively expect random or spurious variables to be a normal outcome in many models. As a data miner, I am always intrigued by the findings that colleagues arrive at through their research and work. When considering the validity of these comments, I remember my own experiences in building hundreds of models over the years.