Coursera Notes
Module One
Wednesday, June 18, 2025
1. Course Introduction
1.1. Basic introduction to the course and the topics which it will discuss
2. Modern Data Ecosystem
2.1. Enterprise data environment
2.1.1. Data integrated from disparate sources
2.1.2. Different types of analysis and skills to generate insights
2.1.3. Active stakeholders to collaborate and act on insights generated
2.1.4. Tools, applications and infrastructure to store, process and disseminate data as
required
2.2. Data sources
2.2.1. Structured and unstructured
2.2.1.1. Texts, images, videos, user clicks, social media platforms, real time
events, legacy databases, agencies etc…
2.2.2. First step: pull a copy of the data from the data repository
2.2.2.1. Challenges: reliability, security and integrity
2.2.3. Second step: once the raw data is collected, it needs to get organized, cleaned up,
and optimized for access by end users. It also needs to conform to guidelines that
regulate storage and use of data.
2.2.3.1. Ex. regulated data such as health, biometric, or household data
2.2.3.2. Ex. Adhering to master data tables within the organization to ensure
standardization of master data across all applications and systems of an
organization
2.2.3.3. Challenges: data management and working with data repositories that
provide high availability, flexibility, accessibility, and security
2.3. Users
2.3.1. Third step: the data reaches business stakeholders, applications, programmers, analysts, and data scientists
2.3.1.1. Challenges: interfaces, APIs, and applications that can get this data to the end users in line with their specific needs
2.4. Emerging technologies shaping the modern data ecosystem
2.4.1. Cloud computing, machine learning and big data
2.4.2. Benefits include: access to limitless storage, high-performance computing, open
source technologies, machine learning technologies, and the latest tools and
libraries
2.4.3. These technologies enable new methods of data analysis, driven by the increasing amount of data and the new tools available to analyze it
3. Key Players in a Data Ecosystem
3.1. Data Engineer
3.1.1. People who develop and maintain data architectures and make data available for
business operations and analysis
3.1.2. They work within the data ecosystem to extract, integrate, and organize data from
disparate sources
3.1.3. They clean, transform, and prepare data; they design, store, and manage data in data repositories
3.1.4. They enable data to be accessible in formats and systems that the various business applications, as well as stakeholders like data analysts and data scientists, can utilize
3.1.5. Key qualities: good knowledge of programming, sound knowledge of systems and technology architectures, and an in-depth understanding of relational databases and non-relational data stores
3.2. Data Analyst
3.2.1. People who translate data and numbers into plain language, so organizations can
make decisions
3.2.2. They inspect and clean data for deriving insights, identify correlations, find
patterns, and apply statistical methods
3.2.3. They analyze and mine data and visualize data to interpret and present the
findings of data analysis
3.2.4. They answer questions to help improve how the data is understood and
interpreted
3.2.5. Key qualities: good knowledge of spreadsheets, writing queries, and using statistical tools to create charts and dashboards; some programming skills; strong analytical and storytelling skills
3.3. Data Scientists
3.3.1. People who analyze data for actionable insights and build machine learning or
deep learning models that train on past data to create predictive models
3.3.2. They are people that answer questions about predictions and the numerical
aspects
3.3.3. Key qualities: knowledge of mathematics and statistics; a fair understanding of programming languages, databases, and building data models; and domain knowledge
3.4. Business Analysts/Business Intelligence Analysts
3.4.1. People who leverage the work of data analysts and data scientists to look at
possible implications for their business and the actions they need to take or
recommend. (BI analysts do the same but focus more on external influences)
3.4.2. They provide business solutions by organizing and monitoring data on different
business functions and exploring that data to extract insights and actionables that
improve business performance.
4. Defining Data Analysis
4.1. Process of gathering, cleaning, analyzing and mining data, interpreting results, and
reporting the findings
4.2. To find patterns within data and correlations between different data points
4.3. It helps businesses understand their performance and make informed choices in the future
4.3.1. It helps them validate a course of action (COA) before committing to it
4.3.2. It helps them save time, money and resources
4.4. Descriptive analytics
4.4.1. Summarizes past data and presents the findings to stakeholders
4.4.2. Provides essential insights into past events
4.5. Diagnostic analytics
4.5.1. Takes the insights from descriptive analytics to dig deeper to find the cause of the
outcome
4.6. Predictive analytics
4.6.1. Uses past insights to observe trends to predict future outcomes
4.6.2. Can be used for risk assessment and sales forecasts
4.7. Prescriptive analytics
4.7.1. Analyzes past decisions and events, and the likelihood of different outcomes, to recommend the best course of action
4.8. Key steps
4.8.1. Starts with understanding the problem: where you are and where you need to be
4.8.2. A clear metric should be set that can be measured
4.8.2.1. Ex. sales in a quarter or during a festival season. Gathering data comes next: once you know what you're going to measure and how you're going to measure it, you identify the data you require, the data sources you need to pull it from, and the best tools for the job
4.8.3. Cleaning the data by fixing quality issues that could affect the accuracy of the
analysis
4.8.3.1. Clean missing or incomplete values
4.8.3.2. Standardize the data coming from multiple sources
4.8.4. Extract and analyze the data from different perspectives
4.8.4.1. Manipulate the data in several ways to gain as many perspectives as possible
4.8.4.2. Understand the correlations, patterns, variations and trends
4.8.4.3. Conduct further research (possibly in an iterative loop)
4.8.5. Interpret your results and evaluate whether the analysis is defensible against objections
4.8.6. Present your findings using reports, dashboards, charts, graphs, maps and case
studies
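As a concrete sketch of the cleaning, analysis, and interpretation steps above, the snippet below cleans a few records and derives one simple metric. The store names, columns, and values are entirely hypothetical, invented for illustration:

```python
import statistics

# Hypothetical daily sales records pulled from two sources; the column
# names and values are illustrative, not from the course.
records = [
    {"store": "A", "units": 120, "revenue": 2400.0},
    {"store": "B", "units": None, "revenue": 1800.0},    # missing value
    {"store": "C", "units": 95,  "revenue": "1,900.0"},  # inconsistent format
]

# Clean: drop incomplete rows and standardize the revenue format.
cleaned = []
for row in records:
    if row["units"] is None:
        continue  # or impute with a mean/median, depending on the analysis
    revenue = float(str(row["revenue"]).replace(",", ""))
    cleaned.append({**row, "revenue": revenue})

# Analyze from one simple perspective: revenue per unit by store.
per_unit = {r["store"]: r["revenue"] / r["units"] for r in cleaned}

# Interpret: e.g., the average revenue per unit across stores.
avg = statistics.mean(per_unit.values())
```

Real projects would weigh whether dropping the incomplete row (store B) biases the result, which is part of evaluating whether the analysis is defensible.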
5. Viewpoints: What is Data Analytics?
5.1. Interviews from various data analysts.
6. Data Analytics vs. Data Analysis
6.1. Analysis - detailed examination of the elements or structure of something
6.1.1. Can be done without numbers or data, such as business analysis, psychoanalysis, etc.
6.1.2. Draws inferences from historical data
6.2. Analytics - the systematic computational analysis of data or statistics
6.2.1. Implies use of data for performing numerical manipulation and inference
6.2.2. For predicting future performance
7. Responsibilities of a Data Analyst
7.1. Acquiring data from primary and secondary data sources
7.2. Creating queries to extract required data from databases and other data collection systems
7.3. Filtering, cleaning, standardizing, and reorganizing data in preparation for data analysis
7.4. Using statistical tools to interpret data sets
7.5. Using statistical techniques to identify patterns and correlations in data
7.6. Analyzing patterns in complex data sets and interpreting trends
7.7. Preparing reports and charts that effectively communicate trends and patterns
7.8. Creating appropriate documentation to define and demonstrate the steps of the data
analysis process
7.9. Skills needed to become a good data analyst that correspond to the mentioned
responsibilities
7.9.1. Expertise in using spreadsheets such as Microsoft Excel or Google Sheets; proficiency in statistical analysis and visualization tools and software such as IBM Cognos, IBM SPSS, Oracle Visual Analyzer, Microsoft Power BI, SAS, and Tableau
7.9.2. Proficiency in at least one programming language, such as R or Python, and in some cases C++, Java, or MATLAB
7.9.3. Good knowledge of SQL, and ability to work with data in relational and NoSQL
databases
7.9.4. The ability to access and extract data from data repositories such as data marts,
data warehouses, data lakes, and data pipelines
7.9.5. Familiarity with Big Data processing tools such as Hadoop, Hive, and Spark.
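To illustrate the query-writing and extraction skills above, here is a minimal sketch using Python's built-in sqlite3 module; the table, columns, and values are hypothetical:

```python
import sqlite3

# In-memory database standing in for an organizational data repository.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("North", 250.0), ("North", 100.0), ("South", 300.0)],
)

# A typical extraction query: total order amount per region.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
conn.close()
```

The same SELECT/GROUP BY pattern carries over to larger relational databases; NoSQL stores expose different query interfaces for the same task.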
7.10. Functional skills required
7.10.1. Proficiency in Statistics to help you analyze your data, validate your analysis, and
identify fallacies and logical errors
7.10.2. Analytical skills that help you research and interpret data, theorize, and make
forecasts
7.10.3. Problem-solving skills, because ultimately, the end-goal of all data analysis is to
solve problems
7.10.4. Probing skills that are essential for the discovery process, that is, for
understanding a problem from the perspective of varied stakeholders and
users—because the data analysis process really begins with a clear articulation of
the problem statement and desired outcome
7.10.5. Data Visualization skills that help you decide on the techniques and tools that
present your findings effectively based on your audience, type of data, context,
and end-goal of your analysis
7.10.6. Project Management skills to manage the process, people, dependencies, and
timelines of the initiative
7.11. Soft skills required
7.11.1. The ability to work collaboratively with business and cross-functional teams;
communicate effectively to report and present your findings; tell a compelling
and convincing story; and gather support and buy-in for your work
7.11.2. Being curious; during the course of your work, you will stumble upon patterns,
phenomena, and anomalies that may show you a different path.
7.11.3. The ability to allow new questions to surface and challenge your assumptions and
hypotheses makes for a great analyst.
8. Viewpoints: Qualities and Skills to be a Data Analyst
8.1. Interviews from various data analysts.
9. Generative AI: An Essential Skill for Today’s Data Analysts
9.1. Generative adversarial networks (GANs) - They consist of two neural networks: the generator, which creates new data, and the discriminator, which evaluates it. Through this adversarial training, the generator eventually produces more realistic data.
9.2. Variational autoencoders (VAEs) - VAEs encode input data into a compressed format
and decode it back to generate new data points similar to the input
9.3. Transformers - Used in natural language processing (NLP), where transformers generate human-like text by predicting the next word in a sequence. Generative Pre-trained Transformer 3 (GPT-3) is a notable example.
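The next-word idea can be illustrated with a toy bigram model, which is a drastic simplification: real transformers condition on the whole sequence with learned attention, not simple word-pair counts. The corpus below is invented for the example:

```python
from collections import Counter, defaultdict

# Tiny illustrative corpus; real transformers train on billions of tokens.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows each word (a bigram table).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most frequently observed after `word`."""
    return following[word].most_common(1)[0][0]
```

Here `predict_next("the")` picks "cat" because it follows "the" most often; a transformer instead assigns probabilities over its whole vocabulary at each step.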
9.4. Generative AI models combine various models to represent and process content
9.4.1. Ex. to generate text, NLP techniques change raw characters into sentences, parts
of speech, entities and actions, represented as vectors using encoding techniques.
9.4.2. Ex. images are transformed into various visual elements, represented as vectors
9.4.3. A caution: these techniques can encode bias, racism, deception, and puffery
9.5. Use cases include:
9.5.1. Implementing chatbots for customer service & technical support
9.5.2. Deploying deepfakes to mimic individuals
9.5.3. Improving dubbing and subtitling in movies and educational content
9.5.4. Writing emails, profiles, resumes, essays
9.5.5. Creating art
9.5.6. Improving product demo videos
9.5.7. Suggesting new drug compounds
9.5.8. Designing physical products
9.5.9. Optimizing chip designs
9.5.10. Writing music
9.6. Benefits include:
9.6.1. Automating the manual process of writing content.
9.6.2. Reducing the effort of responding to emails.
9.6.3. Improving the response to specific technical queries.
9.6.4. Creating realistic representations of people.
9.6.5. Summarizing complex information into a coherent narrative.
9.6.6. Simplifying the process of creating content in a particular style.
9.7. Limitations include:
9.7.1. It does not always identify the source of content.
9.7.2. It can be challenging to assess the bias of original sources.
9.7.3. Realistic-sounding content makes it harder to identify inaccurate information.
9.7.4. It can be difficult to understand how to tune it for new circumstances.
9.7.5. Results can gloss over bias, prejudice, and hatred.
9.8. Concerns include:
9.8.1. It can provide inaccurate and misleading information.
9.8.2. It is more difficult to trust without knowing the source and provenance of information.
9.8.3. It can promote new kinds of plagiarism that ignore the rights of content creators and artists of original content.
9.8.4. It might disrupt existing business models built around search engine optimization and advertising.
9.8.5. It makes it easier to generate fake news.
9.8.6. It makes it easier to claim that real photographic evidence of wrongdoing was just an AI-generated fake.
9.8.7. It could impersonate people for more effective social engineering cyberattacks.
9.8.8. Given the newness of GenAI tools and their rapid adoption, enterprises should prepare for the inevitable "trough of disillusionment" that's part and parcel of emerging technology by adopting sound AI engineering practices and making responsible AI a cornerstone of their GenAI efforts, ensuring transparency, ethical considerations, and long-term sustainability in their AI implementations.
9.9. Examples of GenAI:
9.9.1. Text generation tools include GPT, Jasper, AI-Writer, and Lex.
9.9.2. Image generation tools include Dall-E 2, Midjourney, and Stable Diffusion.
9.9.3. Music generation tools include Amper, Dadabots, and MuseNet.
9.9.4. Code generation tools include codeStarter, Codex, GitHub Copilot, and Tabnine.
9.9.5. Voice synthesis tools include Descript, Listnr, and Podcast.ai.
9.9.6. AI chip design tool companies include Synopsys, Cadence, Google, and
NVIDIA.
9.10. Data augmentation: Create synthetic data to augment existing data sets, which is
especially useful when data is scarce or imbalanced. This can improve predictive model
performance.
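A minimal sketch of the augmentation idea, assuming a scarce numeric feature: new points are jittered copies of real ones. Production work would use richer techniques such as SMOTE or generative models; the values here are invented:

```python
import random

random.seed(42)  # deterministic for the example

# A scarce class with only a few observed measurements (illustrative values).
minority = [10.2, 9.8, 10.5]

def augment(samples, n, noise=0.1):
    """Generate n synthetic points by adding Gaussian noise to real ones."""
    return [random.choice(samples) + random.gauss(0, noise) for _ in range(n)]

synthetic = augment(minority, 10)
augmented = minority + synthetic  # 13 points instead of 3
```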
9.11. Anomaly Detection: Identify anomalies or outliers by understanding the distribution of
normal data. This is valuable in fraud detection, network security, and quality control.
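A minimal z-score sketch of this idea on invented transaction amounts; real fraud detection models the distribution of normal data far more richly than a single mean and standard deviation:

```python
import statistics

# Transaction amounts; most follow a normal pattern, one is suspicious.
amounts = [52.0, 48.5, 50.2, 49.8, 51.1, 50.5, 49.0, 250.0]

mean = statistics.mean(amounts)
stdev = statistics.stdev(amounts)

# Flag points more than 2 standard deviations from the mean.
anomalies = [x for x in amounts if abs(x - mean) > 2 * stdev]
```

Note that a large outlier inflates the standard deviation itself, so robust variants (median and MAD) are often preferred in practice.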
9.12. Text and image generation: Generate realistic text and images for marketing, content
creation, and customer engagement, such as automatic product descriptions and
marketing visuals.
9.13. Simulation and forecasting: Simulate scenarios and forecast future events by generating
potential outcomes from historical data. This is crucial in financial planning, supply chain
management, and strategic decision-making.
10. A Day in the Life of a Data Analyst
10.1. Sivaram Jaladi is a data analyst at Fluentgrid, a smart grid solutions company in
Visakhapatnam, India. His client, a South Indian power utility, was facing a spike in
overbilling complaints. He investigated using complaint data, subscriber information, and
billing records. His initial hypotheses focused on usage patterns, area-wise complaint
concentration, and repeat complaints by the same subscribers. He discovered that most
complaints came from subscribers with over seven years of service. The complaints were
concentrated in specific geographic areas. Further analysis showed that the faulty meters
belonged to the same batch from a particular supplier, installed mostly in those affected
areas. He presented these findings with clear data sources and methodology to
stakeholders. The issue may evolve, prompting further analysis in the future.
11. Viewpoints: Applications of Data Analytics
11.1. Data analytics is widely applied across everyday life and industries. From analyzing
consumer behavior in commercials to tracking health metrics like blood sugar, data
analysis is everywhere. It is essential across all sectors—sales, finance, HR, and
operations—in industries like airlines, pharmaceuticals, and banking. During the
pandemic, companies increasingly relied on analytics to understand shifting customer
habits and adapt their strategies. In finance, alternative data such as sentiment analysis
from social media, satellite imagery, and geolocation data is now used to enhance
investment decisions and predict market activity. The scope and impact of analytics are
universal and continuously expanding.
Module Two