Machine Learning Times

2 months ago
Fashion Repeats Itself: Generating Tabular Data Via Diffusion and XGBoost

Originally published by Alexia Jolicoeur-Martineau, Sept 19, 2023.

Since AlexNet showed the world the power of deep learning, the field of AI has rapidly shifted to focus almost exclusively on deep learning. Some of the main justifications are that 1) neural networks are Universal Function Approximators (UFAs, not UFOs 🛸), 2) deep learning generally works the best, and 3) it is highly scalable through SGD and GPUs. However, when you look a bit below the surface, you see that 1) simple methods such as Decision Trees are also UFAs, 2) fancy tree-based methods such as Gradient-Boosted Trees (GBTs) actually work better than deep learning on tabular data, and 3) tabular datasets tend to be small, but GBTs can optionally be trained on GPUs and iterated over small data chunks to scale to large datasets. At least for the tabular data case, deep learning is not all you need.

In this collaboration with Kilian Fatras and Tal Kachman at the Samsung SAIT AI Lab, we show that you can combine the magic of diffusion models (and their deterministic siblings, conditional flow matching (CFM) methods) with XGBoost, a popular GBT method, to get state-of-the-art tabular data generation and diverse data imputations. To make it accessible to everyone (not just AI researchers but also statisticians, econometricians, physicists, data scientists, etc.), we made the code available through a Python library (on PyPI) and an R package (on CRAN). See our GitHub for more information. [Note: The R code will be released soon.]
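To make the idea of pairing flow matching with boosted trees more concrete, here is a minimal toy sketch in pure Python. It is not the authors' library or their exact algorithm. It assumes a single numeric feature, a linear interpolation path between Gaussian noise and data, one regressor per time level, and a few noise draws per data row. A small hand-rolled boosted-stump regressor stands in for XGBoost so the example has no dependencies. All function names and parameters (fit_stump, fit_boosted, train_flow, generate, n_t, n_noise) are hypothetical.

```python
import random
import statistics

def fit_stump(xs, residuals, order):
    """Least-squares depth-1 tree on one feature: (threshold, left, right)."""
    n = len(xs)
    total = sum(residuals)
    best_gain, best = -1.0, (None, total / n, total / n)
    left_sum = 0.0
    for k in range(1, n):
        i, j = order[k - 1], order[k]
        left_sum += residuals[i]
        if xs[i] == xs[j]:
            continue  # cannot split between two equal values
        right_sum = total - left_sum
        # Maximizing this quantity minimizes the squared error of the split.
        gain = left_sum ** 2 / k + right_sum ** 2 / (n - k)
        if gain > best_gain:
            best_gain = gain
            best = ((xs[i] + xs[j]) / 2, left_sum / k, right_sum / (n - k))
    return best

def stump_value(stump, x):
    threshold, left, right = stump
    return left if threshold is None or x <= threshold else right

def fit_boosted(xs, ys, n_rounds=30, lr=0.3):
    """Toy gradient boosting with squared loss (a stand-in for XGBoost)."""
    order = sorted(range(len(xs)), key=xs.__getitem__)
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals, order)
        stumps.append(stump)
        preds = [p + lr * stump_value(stump, x) for p, x in zip(preds, xs)]
    return base, lr, stumps

def predict(model, x):
    base, lr, stumps = model
    return base + lr * sum(stump_value(s, x) for s in stumps)

def train_flow(data, rng, n_t=10, n_noise=5):
    """Fit one regressor per time level t_k = k / n_t (assumed convention)."""
    # Pair every data point x1 with n_noise Gaussian noise draws x0.
    pairs = [(rng.gauss(0.0, 1.0), x1) for x1 in data for _ in range(n_noise)]
    models = []
    for k in range(n_t):
        t = k / n_t
        x_t = [(1 - t) * x0 + t * x1 for x0, x1 in pairs]  # point on the path
        velocity = [x1 - x0 for x0, x1 in pairs]           # regression target
        models.append(fit_boosted(x_t, velocity))
    return models

def generate(models, n_samples, rng):
    """Euler integration of the learned vector field from noise (t=0) to t=1."""
    h = 1.0 / len(models)
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        for model in models:
            x += h * predict(model, x)
        samples.append(x)
    return samples

rng = random.Random(0)
data = [rng.gauss(3.0, 0.5) for _ in range(200)]  # toy one-column "table"
models = train_flow(data, rng)
samples = generate(models, 500, rng)
print(statistics.mean(data), statistics.mean(samples))
```

In this sketch the tree ensemble only ever sees ordinary (input, target) rows, which is why a tabular learner can play the role a neural network usually plays in flow matching. A real table would need one such regressor per feature and per time level, with all features as inputs.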

