As I have stated in previous articles, the most difficult challenge in building predictive models is creating the analytical file. This typically consumes 80% to 90% of the data scientist's time, with the remaining 10% to 20% spent on the actual run or runs of the different mathematical and statistical algorithms. The design of the analytical file has two elements: the development of the target variable and the development of the independent variables, or potential predictor variables. Data challenges are a reality in creating the right analytical file for any model. Yet with certain models, such as fraud models, the level of complexity increases.
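To make the two design elements concrete, here is a minimal sketch of an analytical file in Python. Everything in it is an illustrative assumption rather than a method described in this article: the sample transactions, the field names (`tx_count`, `total_amount`, `target_fraud`), the cutoff date, and the choice to build predictors from activity on or before the cutoff and the target from activity after it.

```python
from datetime import date

# Hypothetical raw records: (customer_id, transaction_date, amount, flagged_as_fraud).
# The data and field layout are assumptions made purely for illustration.
transactions = [
    ("c1", date(2023, 1, 5), 120.0, False),
    ("c1", date(2023, 2, 10), 80.0, False),
    ("c1", date(2023, 4, 2), 500.0, True),
    ("c2", date(2023, 1, 20), 40.0, False),
    ("c2", date(2023, 4, 15), 60.0, False),
]

# Assumed split point: predictors use activity on or before this date,
# the target uses activity after it.
CUTOFF = date(2023, 3, 31)


def build_analytical_file(records, cutoff):
    """Return one row per customer with predictor variables and a target variable."""
    rows = {}
    for customer_id, tx_date, amount, is_fraud in records:
        row = rows.setdefault(
            customer_id,
            {
                "customer_id": customer_id,
                "tx_count": 0,        # predictor: transactions up to the cutoff
                "total_amount": 0.0,  # predictor: amount spent up to the cutoff
                "target_fraud": 0,    # target: any fraud flag after the cutoff
            },
        )
        if tx_date <= cutoff:
            row["tx_count"] += 1
            row["total_amount"] += amount
        elif is_fraud:
            row["target_fraud"] = 1
    return sorted(rows.values(), key=lambda r: r["customer_id"])


analytical_file = build_analytical_file(transactions, CUTOFF)
for row in analytical_file:
    print(row)
```

Each output row is one record of the analytical file: a customer identifier, the candidate predictor variables, and the target variable. Keeping the predictors strictly on one side of the cutoff and the target on the other is a design choice in this sketch, made so that no information from the outcome period feeds into the predictors.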