Sometimes people ask what is the difference between what statisticians and machine learning researchers do. The best answer that I have found so far can be found in

"Statistical Modeling: The Two Cultures" by Leo Breiman (Statistical Science, 16:199-231, 2001).

According to this, statisticians like to start by making modeling assumptions about how the data is generated (e.g., the response is a noise added to the linear combination of the predictor variables), while in machine learning people use algorithm models and treat the data mechanism as unknown. He estimates that (back in 2001) less than 2% of statisticians work in the realm when the data mechanism is considered as unknown.

It seems that there are two problem with the data model approach.

One is that the this approach does not address the ultimate question which is making good predictions: if the data does not fit the model, this approach has nothing to offer (it does not make sense to apply a statistical test if the as…

"Statistical Modeling: The Two Cultures" by Leo Breiman (Statistical Science, 16:199-231, 2001).

According to this, statisticians like to start by making modeling assumptions about how the data is generated (e.g., the response is a noise added to the linear combination of the predictor variables), while in machine learning people use algorithm models and treat the data mechanism as unknown. He estimates that (back in 2001) less than 2% of statisticians work in the realm when the data mechanism is considered as unknown.

It seems that there are two problem with the data model approach.

One is that the this approach does not address the ultimate question which is making good predictions: if the data does not fit the model, this approach has nothing to offer (it does not make sense to apply a statistical test if the as…