I am noticing a lot of hype around data science and analytics. Not that it is new; it has probably been going on for the past 4 to 6 years. So I looked into what it really is, beyond the buzzwords. What I don't get is that most of it is what has already been done, just in a different fashion. What seems to be new is that it is being simplified so that everyone can do it.
Analytics is mostly taking data, organizing it, and creating graphs or dashboards. DBAs and web developers have been doing that for a long time: IT server dashboards, web traffic, visitors, application performance, etc. The only difference is that now it is being used for non-IT things as well. Mostly good, can't complain.
But my confusion is around the data science and machine learning area. These concepts have been known for a long time; they are just being reintroduced to a different audience. Things like classification, linear/logistic regression, decision trees, random forests, neural networks, etc. 10-15 years ago you had to know either SAS or some other specialized statistical software to do these. Then there was the issue of having a powerful enough computer to handle the calculations. Now there is R, and Python has packages that can do a lot of these statistical thingies.
Realistically, to use these concepts and put them into practice, one has to know some serious stats or maths. Beyond the usual Hello World equivalents, such as the credit card default data and the flower/wheat type data sets, real data is neither clean nor normally distributed. Then there is the issue of interpreting the results, and then fine-tuning the models. Sure, lots of tutorials exist, but would a non-stats or non-math person be able to take those and make something serious?
I have sat through the YouTube videos and GitHub tutorials. Basic regression and classification are easy to do. But once you get to segmented regression, time series analysis, k-fold validation and all that, things get complicated.
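To make one of those concrete, here is a rough sketch of what k-fold validation boils down to, using only the Python standard library. This is my own toy illustration and not how any particular package does it: the function names (`k_fold_indices`, `fit_line`, `cross_validate`) and the data are all made up, and the model is just a straight-line fit. The idea is to split the data into k parts, hold one part out, fit on the rest, score on the held-out part, and repeat.

```python
import random
from statistics import mean

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the row indices and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (one predictor)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def cross_validate(xs, ys, k=5):
    """Return the held-out mean squared error for each of the k folds."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit only on the rows that are NOT in the held-out fold.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Score only on the rows the model has not seen.
        mse = mean((ys[i] - (a + b * xs[i])) ** 2 for i in held_out)
        scores.append(mse)
    return scores

# Made-up data: a straight line plus Gaussian noise.
rng = random.Random(42)
xs = [float(i) for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]

scores = cross_validate(xs, ys, k=5)
print("per-fold MSE:", scores)
print("average MSE:", mean(scores))
```

The mechanics are simple enough; the hard part, as far as I can tell, is knowing when a plain random split like this is the wrong choice (time series being the obvious case, since shuffling lets the model see the future).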
I guess no real question, just observations from what I have noticed.