Data science
Data science is an emerging field that develops processes and practices for exploring and analyzing data and for building models that describe and predict from a wide range of data types. Ultimately, these processes and practices aim to improve the performance and effectiveness of organizations and the quality of life of citizens.
Data science models and transforms data to inform the decision process through computational thinking, moving toward data-driven decision making.
Data Scientist
Professional of the decade
Profile:
- Analytical ability
- Investigative capacity
- Entrepreneurship
- Business understanding
- Programming skills
Data Science in Practice
Data management: several general or specialized platforms for all kinds of data
Data mining: several implementations of each technique
User expertise: does the data scientist need to program?
NO! (S)he just needs to think algorithmically.
Lemonade in the context of data science
Enablers:
- Wide availability of algorithm implementations
- Broad spectrum of databases and storage technologies
- Massively parallel processing commercial solutions
- Mature virtualization technology
- Real-time transpiling technology is a reality
- Awareness of the data potential
Motivations
- Data scientists do not literally need to program
- Data scientists need to express tasks as algorithmic abstractions
- Cloud-style, web-based platforms provide good interactive support
- Visual programming is a need
Data mining
Machine learning
Data science 101
Techniques, algorithms and models
How to choose between the different available techniques?
Is my data set ready for what I want to do?
How to formulate the correct question about data?
Predict and evaluate an answer
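The last step above, predicting and evaluating an answer, can be illustrated with a minimal sketch. It assumes a toy, made-up data set of (feature, label) pairs and two hypothetical models written here only for illustration: a majority-class baseline and a 1-nearest-neighbor rule. None of the names come from Lemonade or any particular library. The idea is simply to hold out part of the data, predict on it, and compare the result against a trivial baseline.

```python
import random
from collections import Counter

# Toy, made-up data: (feature value, label). Not taken from any real data set.
DATA = [(1.0, "no"), (1.5, "no"), (2.0, "no"), (2.5, "no"),
        (3.0, "yes"), (3.5, "yes"), (4.0, "yes"), (4.5, "yes")]

def split(rows, test_fraction=0.25, seed=0):
    """Shuffle reproducibly, then hold out a fraction of the rows for testing."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def majority_baseline(train):
    """Always predict the most frequent label seen in the training rows."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda x: most_common

def nearest_neighbor(train):
    """Predict the label of the training row whose feature is closest to x."""
    def predict(x):
        return min(train, key=lambda row: abs(row[0] - x))[1]
    return predict

def accuracy(predict, test):
    """Fraction of held-out rows whose predicted label matches the true label."""
    hits = sum(1 for x, label in test if predict(x) == label)
    return hits / len(test)

train, test = split(DATA)
for name, build in [("baseline", majority_baseline), ("1-NN", nearest_neighbor)]:
    print(name, accuracy(build(train), test))
```

Comparing against the baseline is a deliberate choice: an accuracy figure only means something relative to what a trivial predictor achieves on the same held-out rows.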
Standing on the shoulders of giants (or Ctrl+C, Ctrl+V)
Copy workflows
Use external tutorials
Repositories of machine learning experiments
Resources
Kaggle
Cortana Intelligence Gallery
"Cortana Intelligence Gallery enables our growing community of developers and data scientists to share their analytics solutions".
Graph analysis
https://blog.cloudera.com/blog/2016/10/how-to-do-scalable-graph-analytics-with-apache-spark/
Regression
https://hortonworks.com/tutorial/predicting-airline-delays-using-sparkr/
Sentiment analysis
https://hortonworks.com/tutorial/sentiment-analysis-with-apache-spark/