In this lesson, we will learn about data science.
Whatever problem you are solving, you must know exactly what it is you are solving.
Data science involves computer science, domain science and statistics:
• Computer science involves hacking skills.
• Statistics involves math and statistics knowledge.
• Domain science involves substantive expertise.
So, a data scientist is a person who is a programmer, computer scientist, mathematician, storyteller and domain expert.
What kind of problems can you solve?
• Predict whether a patient, hospitalized due to a heart attack, will have a second heart attack. The prediction is to be based on demographic, diet and clinical measurements for that patient.
• Predict the price of a stock 6 months from now, on the basis of company performance measures and economic data.
• Identify the risk factors for prostate cancer, based on clinical and demographic variables.
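To make the stock-price kind of problem concrete, here is a minimal sketch that fits a straight line through a few past observations and uses it to predict a new value. The numbers are invented for illustration, the single input (earnings per share) is an assumption, and the helper names `fit_line` and `predict` are hypothetical. A real project would use many more variables and a proper evaluation.

```python
# Toy illustration only: the numbers below are invented, not real market data.

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input value."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical history: earnings per share -> stock price six months later
earnings = [1.0, 2.0, 3.0, 4.0]
price_later = [12.0, 14.0, 16.0, 18.0]

model = fit_line(earnings, price_later)
print(predict(model, 5.0))
```

The same pattern (learn from past measurements, then predict an unseen case) applies to the heart attack example, although that one predicts a yes/no outcome rather than a number.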
Data science project life cycle:
• What is the business problem?
• What kind of data do we require?
• Is that data already available, or should I collect it?
• What is the best way to collect the data?
• Prepare the data for exploration.
• Build hypotheses to discover insights.
• Evaluate the hypotheses.
• Develop visualizations to present results.
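The middle steps of the life cycle can be sketched in a few lines. This is a rough illustration under stated assumptions: the records are invented, the hypothesis ("customers who received a discount spend more") is only an example, and the function names `prepare` and `mean_difference` are hypothetical.

```python
from statistics import mean

# Hypothetical raw records: (received_discount, amount_spent).
# None marks a missing value that has to be handled during preparation.
raw_records = [
    (True, 120.0), (True, 95.0), (True, None),
    (False, 80.0), (False, 70.0), (False, 90.0),
]

def prepare(records):
    """Prepare the data for exploration: drop records with a missing amount."""
    return [(flag, amount) for flag, amount in records if amount is not None]

def mean_difference(records):
    """Compare the average spend of the two groups."""
    with_discount = [amount for flag, amount in records if flag]
    without_discount = [amount for flag, amount in records if not flag]
    return mean(with_discount) - mean(without_discount)

clean = prepare(raw_records)
difference = mean_difference(clean)
print(f"Average spend difference: {difference:.2f}")
```

A difference in averages on its own is not proof. With real data the hypothesis would be checked with a proper statistical test and a much larger sample.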
Basic principles of data science:
• Use many data sources
• Design smart ways to collect the data
• Prioritize the data that is important for you
• Use mathematics to understand the data
• Help yourself with data visualization tools
• Communicate in simple words: could you explain what you are doing to your younger brother? Business managers, too, like things that are simple and easy to understand.
Getting the data involves:
• Designing methods to collect new data.
• Obtaining already available data, which includes:
1 Internal data: customer data and purchase data
2 External data: Facebook data and APIs, Google Analytics data and APIs
Tools for integrating data from such sources include:
• Oracle Warehouse Builder
• IBM InfoSphere Information Server
• Informatica PowerCenter
Data collection is the process of gathering and measuring information on targeted variables in an established systematic fashion, which then enables one to answer relevant questions and evaluate outcomes. The data collection component of research is common to all fields of study including physical and social sciences, humanities and business. While methods vary by discipline, the emphasis on ensuring accurate and honest collection remains the same. The goal for all data collection is to capture quality evidence that then translates to rich data analysis and allows the building of a convincing and credible answer to questions that have been posed.
Generally, there are three common methods of data collection:
• Surveys: Standardized paper-and-pencil or phone questionnaires that ask predetermined questions.
• Interviews: Structured or unstructured one-on-one directed conversations with key individuals or leaders in a community.
• Focus groups: Structured interviews with small groups of like individuals using standardized questions, follow-up questions, and exploration of other topics that arise to better understand participants.
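Because a survey uses standardized answer options, its responses are easy to process systematically. The sketch below tallies answers to a single question and sets aside anything outside the allowed options. The question, the answer set `ALLOWED_ANSWERS` and the function name `tally` are hypothetical, and the responses are invented.

```python
from collections import Counter

# Hypothetical survey question with a fixed set of allowed answers.
ALLOWED_ANSWERS = {"yes", "no", "unsure"}

def tally(responses):
    """Count valid answers; set aside anything outside the allowed options."""
    # Normalize spacing and capitalization before checking each answer
    cleaned = [r.strip().lower() for r in responses]
    valid = [r for r in cleaned if r in ALLOWED_ANSWERS]
    rejected = [r for r in cleaned if r not in ALLOWED_ANSWERS]
    return Counter(valid), rejected

counts, rejected = tally(["Yes", "no", " yes ", "maybe", "Unsure"])
print(counts)
print(rejected)
```

Keeping the rejected answers, rather than silently dropping them, supports the accurate and honest collection that the text above calls for. Interviews and focus groups produce free-form answers, so they need manual review instead of a simple tally like this.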