MLT Unit 5 Notes

Unit 5 discusses the applications of machine learning, particularly focusing on recommendation systems that utilize algorithms to suggest products based on user data. It covers various types of recommender systems, including content-based, collaborative filtering, and hybrid approaches, along with their benefits and challenges in online advertising. The document also addresses issues like data validity, cold start problems, and the importance of balancing user experience with revenue generation.

Uploaded by ranandraj
Copyright © All Rights Reserved

UNIT-5

APPLICATIONS OF MACHINE LEARNING

1. Recommendation Systems

A recommendation system is an artificial intelligence (AI) algorithm, usually built with machine learning,
that uses large volumes of data (big data) to suggest or recommend additional products to consumers. These
suggestions can be based on various criteria, including past purchases, search history, demographic
information, and other factors.

1. Recommender Systems: Why and How?

o Recommender systems are algorithms that provide personalized suggestions for items (such as
movies, music, products, or articles) that are most relevant to each user.

o With the massive growth of available online content, users are inundated with choices. Efficient
recommender systems are crucial for web platforms to increase user satisfaction and engagement.

o Here are some examples of platforms that rely on recommender systems:

 YouTube: Recommends videos to users based on their interests, helping them discover
relevant content amidst a vast library of videos.

 Spotify: Offers personalized recommendations from a catalog of over 80 million tracks
and podcasts.

 Amazon: Suggests relevant products from a catalog of more than 350 million items.

o These platforms leverage powerful machine learning models to generate personalized
recommendations for each user.

2. Explicit vs. Implicit Feedback:

o In recommender systems, machine learning models predict the rating (or preference) of a user for
an item. During inference, the system recommends items with the highest predicted ratings.

o We collect user feedback to train and evaluate our models. There are two types of feedback:

 Explicit Feedback: Users explicitly rate items (e.g., giving stars or thumbs up/down).
Explicit feedback is informative but harder to collect, since many users never rate what they consume.

 Implicit Feedback: Assumes that user-item interactions (e.g., purchases, browsing
history, songs played) indicate preferences. This feedback is abundant but less detailed.

o Most modern recommender systems rely on implicit feedback due to its abundance and
practicality.
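To make implicit feedback usable for training, interaction counts are often converted into a binary preference plus a confidence weight. The sketch below follows the convention from Hu, Koren and Volinsky's work on implicit-feedback collaborative filtering (preference = 1 if there is any interaction, confidence = 1 + α · count). The play log, the function names, and α = 40 are illustrative assumptions, not part of these notes.

```python
from collections import Counter

# Hypothetical implicit-feedback log: one (user, item) entry per play event.
events = [("u1", "song_a"), ("u1", "song_a"), ("u1", "song_b"), ("u2", "song_b")]

# Number of interactions per (user, item) pair; missing pairs count as 0.
counts = Counter(events)

ALPHA = 40.0  # assumed confidence scaling factor (tuned in practice)

def preference(user, item):
    """Binary preference: 1 if the user interacted with the item at all, else 0."""
    return 1 if counts[(user, item)] > 0 else 0

def confidence(user, item):
    """More interactions -> more confidence that the observed preference is real."""
    return 1.0 + ALPHA * counts[(user, item)]

print(preference("u1", "song_a"), confidence("u1", "song_a"))  # 1 81.0
print(preference("u2", "song_a"), confidence("u2", "song_a"))  # 0 1.0
```

Items a user never touched get preference 0 with the lowest confidence, reflecting that a missing interaction is only weak evidence of dislike.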

3. Design Pattern for Recommender Systems:

o Recommender systems follow a design pattern consisting of four stages:

 Retrieval: Gather relevant candidate items.

 Filtering: Narrow down the candidate items based on user preferences or constraints.

 Scoring: Assign scores to the filtered items (e.g., using collaborative filtering or content-
based methods).

 Ordering: Rank the items based on their scores and present recommendations to the user.
o These stages form the backbone of various recommender systems, ensuring effective content
delivery.
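A toy version of the four-stage pattern, with each stage written as a separate function. The catalog, the genre-based retrieval rule, and the use of a fixed popularity number as the "score" are hypothetical stand-ins; a production system would typically retrieve with a fast approximate search and score with a trained model.

```python
# Hypothetical catalog: item -> (genre, score standing in for a model's prediction).
CATALOG = {
    "m1": ("action", 0.9),
    "m2": ("comedy", 0.7),
    "m3": ("action", 0.4),
    "m4": ("drama", 0.8),
    "m5": ("action", 0.6),
}

def retrieve(user_genres):
    """Retrieval: cheaply gather candidates, here every item in a genre the user likes."""
    return [item for item, (genre, _) in CATALOG.items() if genre in user_genres]

def filter_items(candidates, already_seen):
    """Filtering: drop candidates that violate constraints, e.g. already watched."""
    return [item for item in candidates if item not in already_seen]

def score(candidates):
    """Scoring: attach a relevance score to each remaining candidate."""
    return {item: CATALOG[item][1] for item in candidates}

def order(scores, k):
    """Ordering: rank by score and keep the top k."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def recommend(user_genres, already_seen, k=2):
    return order(score(filter_items(retrieve(user_genres), already_seen)), k)

print(recommend({"action", "drama"}, already_seen={"m1"}))  # ['m4', 'm5']
```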

4. Content-Based Models and Cold-Start Scenarios:

o Content-based recommendation systems focus on item features (e.g., genre, actors, keywords) and
recommend similar items.

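o They work well in cold-start scenarios for new items (items with little or no interaction data),
because they rely on item metadata rather than on interaction history.

o However, they require knowledge of both user and item features, so a new user with no history still
gives the system little to go on.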

2. Models for Recommendation Systems

1. Content-Based Recommendation Systems:

o Content-based systems recommend items whose features are similar to the features of items the user
has previously rated highly.

o Here’s how they work:

 Item Profiles: First, a profile is created for each item, representing its properties (such as
genre, actors, or keywords).

 Item Similarity: The system then calculates the similarity between items based on these
properties.

 User Recommendations: When a user interacts with certain items (e.g., rates movies or
listens to songs), the system recommends similar items based on their features.

o Content-based recommendation systems are commonly used in applications like movie
recommendations, music playlists, and news articles.
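A minimal content-based sketch, assuming each item profile is a hand-made binary feature vector (action, comedy, sci-fi, romance) and using cosine similarity to find items close to one the user rated highly. The titles and features are illustrative only.

```python
import math

# Hypothetical item profiles: binary features [action, comedy, sci-fi, romance].
ITEM_FEATURES = {
    "Alien":        [1, 0, 1, 0],
    "Star Wars":    [1, 0, 1, 1],
    "Notting Hill": [0, 1, 0, 1],
    "Gravity":      [0, 0, 1, 0],
}

def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def similar_items(liked, k=2):
    """Rank the other items by feature similarity to an item the user rated highly."""
    target = ITEM_FEATURES[liked]
    scores = {item: cosine(target, feats)
              for item, feats in ITEM_FEATURES.items() if item != liked}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(similar_items("Alien"))  # ['Star Wars', 'Gravity']
```

Because only item features are used, a brand-new item can be recommended as soon as its profile exists, even before anyone has rated it.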

2. Collaborative Filtering:

o Collaborative filtering recommends items based on similarity measures between users and/or
items.

o There are two main types:

 User-Based: Identifies users with similar preferences and recommends items liked by
those similar users.

 Item-Based: Focuses on the similarity between items and recommends items similar to
those the user has already interacted with.

o Collaborative filtering is widely used in e-commerce, social media, and entertainment platforms.
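A small user-based collaborative filtering sketch: the predicted rating is a similarity-weighted average of the ratings other users gave the item. The ratings are hypothetical, and plain cosine similarity over co-rated items is used for brevity; many implementations first subtract each user's mean rating (Pearson correlation) to correct for users who rate consistently high or low.

```python
import math

# Hypothetical 1-5 ratings; a missing key means "not rated".
RATINGS = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 4, "m2": 3, "m3": 5, "m4": 4},
    "carol": {"m1": 1, "m2": 5, "m4": 2},
}

def similarity(u, v):
    """Cosine similarity between two users, computed over the items both rated."""
    common = RATINGS[u].keys() & RATINGS[v].keys()
    if not common:
        return 0.0
    dot = sum(RATINGS[u][i] * RATINGS[v][i] for i in common)
    nu = math.sqrt(sum(RATINGS[u][i] ** 2 for i in common))
    nv = math.sqrt(sum(RATINGS[v][i] ** 2 for i in common))
    return dot / (nu * nv)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item (None if nobody rated it)."""
    neighbours = [(similarity(user, v), RATINGS[v][item])
                  for v in RATINGS if v != user and item in RATINGS[v]]
    total = sum(s for s, _ in neighbours)
    return sum(s * r for s, r in neighbours) / total if total else None

print(round(predict("alice", "m4"), 2))  # 3.19: pulled toward bob, the more similar user
```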

3. Hybrid Approaches:

o Many recommendation systems combine content-based and collaborative filtering techniques to
improve accuracy.

o Hybrid models leverage the strengths of both approaches, providing more personalized
recommendations.

o These systems enhance user engagement, satisfaction, and overall business growth.
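One simple hybrid design is a weighted blend of the two scores. This sketch assumes both scores are already on a 0 to 1 scale and lets the weight on the collaborative score grow with the number of ratings an item has, so new items fall back on content features. The linear ramp and the `full_trust_at` parameter are hypothetical choices; switching and feature-combination hybrids are common alternatives.

```python
def hybrid_score(cf_score, cb_score, n_ratings, full_trust_at=20):
    """Blend a collaborative-filtering score with a content-based score.

    alpha (the weight on the CF score) rises linearly from 0 to 1 as the item
    collects ratings, reaching 1 at `full_trust_at` ratings (an assumed threshold).
    """
    alpha = min(n_ratings / full_trust_at, 1.0)
    return alpha * cf_score + (1 - alpha) * cb_score

print(hybrid_score(cf_score=1.0, cb_score=0.5, n_ratings=0))   # 0.5: new item, content only
print(hybrid_score(cf_score=1.0, cb_score=0.5, n_ratings=10))  # 0.75: equal blend
print(hybrid_score(cf_score=1.0, cb_score=0.5, n_ratings=50))  # 1.0: enough data, CF only
```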

4. Machine Learning-Based Recommendation Systems:

o These systems segment customers based on user data and behavioral patterns (such as purchase
history, browsing activity, likes, and reviews).

o ML algorithms analyze this data to predict users’ interests and target them with personalized
product or content suggestions.

o Examples include personalized product recommendations on e-commerce websites and content
suggestions on streaming services.

5. Matrix Factorization:

o Matrix factorization is a fundamental model for recommendation systems.

o It falls into the category of model-based collaborative filtering.

o The idea is to learn embeddings (low-dimensional representations) for users and items.

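o Close embeddings correspond to similar items or users. A user’s predicted rating for an item is the dot
product of their two embeddings, and the embeddings are learned so that these dot products approximate the
observed ratings; this is what lets the model fill in ratings that were never observed.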
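A minimal matrix factorization sketch trained with stochastic gradient descent on a tiny hypothetical ratings list. Each user and item gets a k-dimensional embedding, the prediction is their dot product, and each observed rating nudges both embeddings to reduce the squared error, with L2 regularization to limit overfitting. The data, k = 2, and the learning-rate, regularization, and epoch values are assumptions chosen for illustration.

```python
import random

# Hypothetical observed ratings as (user, item, rating); all other cells are unknown.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
n_users, n_items, k = 3, 3, 2        # k = embedding size (assumed)
lr, reg, epochs = 0.05, 0.02, 500    # assumed hyperparameters

random.seed(0)
P = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]  # user embeddings
Q = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]  # item embeddings

def predict(u, i):
    """Predicted rating: dot product of the user and item embeddings."""
    return sum(P[u][f] * Q[i][f] for f in range(k))

for _ in range(epochs):
    for u, i, r in ratings:
        err = r - predict(u, i)
        for f in range(k):
            pu, qi = P[u][f], Q[i][f]
            # SGD step on the squared error plus an L2 penalty on both embeddings.
            P[u][f] += lr * (err * qi - reg * pu)
            Q[i][f] += lr * (err * pu - reg * qi)

rmse = (sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5
print(f"training RMSE: {rmse:.3f}")
print(f"predicted rating for user 0, item 2: {predict(0, 2):.2f}")
```

After training, `predict(0, 2)` fills in a cell that was never observed, which is how a factorized model scores unseen user-item pairs.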

6. Discovering Features of Documents:

a) Text Preprocessing:

o Start by preprocessing the text in the documents. This involves tasks like:

 Tokenization: Splitting the text into individual words or tokens.

 Lowercasing: Converting all text to lowercase to ensure consistency.

 Removing Punctuation: Eliminating punctuation marks.

 Handling Special Characters: Addressing special characters or symbols.
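The preprocessing steps above can be sketched with the Python standard library. This version lowercases, strips ASCII punctuation via `string.punctuation`, replaces any remaining non-word symbols (such as ™) with spaces, and tokenizes on whitespace. Real pipelines often use a dedicated tokenizer, so treat this as a simplified illustration.

```python
import re
import string

def preprocess(text):
    """Lowercase, remove punctuation and special characters, then tokenize."""
    text = text.lower()                                               # lowercasing
    text = text.translate(str.maketrans("", "", string.punctuation))  # drop ASCII punctuation
    text = re.sub(r"[^\w\s]", " ", text)                              # other symbols -> space
    return text.split()                                               # whitespace tokenization

print(preprocess("Deep Learning, ML & AI: what's NEW in 2024?!"))
# ['deep', 'learning', 'ml', 'ai', 'whats', 'new', 'in', '2024']
print(preprocess("Café™ menu"))
# ['café', 'menu']
```

Removing apostrophes turns "what's" into "whats"; whether that is acceptable depends on the features computed downstream.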

b) Text Representation:

o Choose an appropriate text representation technique to convert the processed text into numerical
features that can be used by machine learning algorithms. Common techniques include:

 TF-IDF (Term Frequency-Inverse Document Frequency):

 Calculate the TF-IDF values for words in each document. This gives you a
numerical representation of the importance of words in the document relative to
their frequency in the entire corpus.

 Word Embeddings:

 Use pre-trained word embeddings like Word2Vec, GloVe, or FastText to convert
words into dense vector representations.

 You can average these word vectors to get a document vector or use more
advanced techniques like Doc2Vec to directly generate document embeddings.

 BERT and Transformers:

 More recently, transformer-based models like BERT can be fine-tuned on your
text data to generate context-rich embeddings for documents.
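A from-scratch TF-IDF sketch using one common definition: tf is the term count divided by the document length, and idf is log(N / df), where N is the number of documents and df the number that contain the term. The documents are hypothetical; library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed idf by default, so their numbers will differ.

```python
import math
from collections import Counter

# Hypothetical documents, already preprocessed into tokens.
docs = [
    ["machine", "learning", "is", "fun"],
    ["deep", "learning", "is", "powerful"],
    ["cooking", "is", "fun"],
]

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = count of the term in the document / number of tokens in the document
    idf = log(N / number of documents that contain the term)
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()})
    return vectors

vecs = tf_idf(docs)
print(vecs[0]["is"])                 # 0.0: "is" occurs in every document
print(round(vecs[0]["machine"], 3))  # 0.275: "machine" occurs only in document 0
```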

c) Feature Engineering:

o Alongside the text-based features, consider incorporating other features that might enhance the
quality of recommendations. These could include:

 Metadata:

 If available, metadata such as author, publication date, genre, etc., can provide
valuable information.

 Topic Modeling:

 Use techniques like Latent Dirichlet Allocation (LDA) or Non-Negative Matrix
Factorization (NMF) to discover the main topics within documents.

 Use the topic proportions as features.

 Sentiment Analysis:

 Extract sentiment scores from the text to understand the emotional tone of the
documents.

 Named Entity Recognition (NER):

 Identify and use entities like names, locations, and organizations as features.

d) Feature Vector Creation:

o Combine the various features you’ve extracted into a single feature vector for each document. This
vector represents the document’s content from multiple perspectives.

e) Normalization:

o Normalize the feature vectors to ensure that the values are on a consistent scale. This is important
for distance-based similarity calculations.

f) Similarity Calculation:

o Calculate the similarity between documents using techniques like cosine similarity or Euclidean
distance.

o This measures how similar the content of two documents is based on their feature vectors.

g) Recommendation Generation:

o When a user expresses interest in a particular document, find similar documents based on the
calculated similarities.

o The most similar documents can then be recommended to the user.
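Steps (e) to (g) can be combined in a short sketch: L2-normalize each document's feature vector, compare documents with cosine similarity (a plain dot product once vectors have unit length), and return the top-k most similar documents. The document names and vector values are hypothetical, and per-vector L2 normalization is only one option; per-feature scaling such as min-max or z-scores is another.

```python
import math

# Hypothetical combined feature vectors (e.g. text features plus topic proportions).
DOC_VECTORS = {
    "intro_to_ml":   [0.9, 0.1, 0.0, 2.0],
    "deep_learning": [0.8, 0.3, 0.1, 1.5],
    "pasta_recipes": [0.0, 0.0, 0.9, 0.2],
    "ml_in_finance": [0.7, 0.0, 0.2, 1.8],
}

def l2_normalize(v):
    """Scale a vector to unit length so no document dominates by magnitude alone."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v

NORMALIZED = {doc: l2_normalize(v) for doc, v in DOC_VECTORS.items()}

def cosine(a, b):
    # For unit-length vectors, cosine similarity reduces to the dot product.
    return sum(x * y for x, y in zip(a, b))

def recommend(liked_doc, k=2):
    """Return the k documents most similar to one the user showed interest in."""
    target = NORMALIZED[liked_doc]
    scores = {d: cosine(target, v) for d, v in NORMALIZED.items() if d != liked_doc}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("intro_to_ml"))  # ['ml_in_finance', 'deep_learning']
```

For unit-length vectors, squared Euclidean distance equals 2 - 2 × cosine similarity, so ranking by smallest Euclidean distance gives the same order as ranking by largest cosine similarity.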

h) Personalization:

o To enhance personalization, consider incorporating user feedback into the content-based
recommendations.

o Update the feature vectors based on the documents the user interacts with and tailor
recommendations accordingly.
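One simple way to update a user profile is to keep a running mean of the feature vectors of the documents the user interacts with; the profile can then be compared with unread documents using cosine similarity. This is an illustrative assumption rather than the only approach (Rocchio-style updates, for example, also subtract the vectors of disliked documents), and the three-dimensional vectors are hypothetical.

```python
def update_profile(profile, doc_vector, n_seen):
    """Fold a newly read document into the profile as a running mean.

    profile: average of the feature vectors of the n_seen documents read so far.
    """
    return [(p * n_seen + d) / (n_seen + 1) for p, d in zip(profile, doc_vector)]

profile, n_seen = [0.0, 0.0, 0.0], 0
for doc in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]):  # hypothetical reads
    profile = update_profile(profile, doc, n_seen)
    n_seen += 1

print([round(x, 3) for x in profile])  # [0.667, 0.333, 0.0]
```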

3. Advertising on the Web

1. What Is a Recommendation System?

o A recommendation system is an advanced technology that utilizes machine learning and data
analysis to provide personalized suggestions to users.

o It collects and analyzes user behavior, preferences, and historical interactions to predict and
present items, services, or content aligned with individual interests.

2. Types of Recommendation Systems:

o There are several types of recommender systems:

 Content-Based Filtering: Recommends items similar to those a user has previously
interacted with based on item features (e.g., genre, keywords).

 Collaborative Filtering: Assumes that users with similar preferences will like similar items.
It uses user-item interactions to make recommendations.

 Hybrid Methods: Combine content-based and collaborative filtering techniques for
improved accuracy.

 Deep Learning-Based: Utilize neural networks to learn complex patterns and generate
recommendations.

3. Benefits of Recommendation Systems in Advertising:

o Personalized User Experience (UX): By showing relevant ads, users are more likely to engage,
leading to a better overall experience.

o Increased Revenue: Targeted ads result in higher click-through rates (CTR) and conversions.

o Enhanced User Engagement: Users appreciate relevant content and are more likely to interact
with ads.

o Improved Ad Campaign Efficiency: Advertisers can allocate resources effectively by reaching the
right audience.

4. Industries Utilizing Recommendation Systems:

o E-Commerce: Recommending products based on user preferences and browsing history.

o Media and Entertainment: Suggesting movies, shows, or music.

o Travel: Recommending destinations, hotels, or flights.

o Gaming: Personalized game recommendations.

o Advertising: Delivering targeted ads to users.

5. Real-World Examples:

o Pinterest: The Ads Intelligence team at Pinterest builds recommendation systems to provide
advertisers with effective targeting and personalized ad placements.

4. Issues in Online Advertising

1. Limited Resources:

o Online advertising platforms often face resource constraints, such as limited server capacity,
bandwidth, or computational power.

o Efficiently managing these resources while delivering personalized recommendations is a
challenge.

2. Data Validity Period:

o The validity of user behavior data (clicks, views, purchases) is time-sensitive.

o Recommendations based on outdated data may not accurately reflect users’ current preferences.

o Balancing real-time updates with historical data relevance is crucial.
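A common way to balance fresh and historical behavior is to weight each interaction by an exponential time decay, so recent clicks count more than old ones. In this sketch the 14-day half-life and the click history are hypothetical values; in practice the half-life would be tuned per product or signal.

```python
HALF_LIFE_DAYS = 14.0  # assumed: an interaction loses half its weight every 14 days

def decayed_weight(age_days, half_life=HALF_LIFE_DAYS):
    """Exponential time decay: weight = 0.5 ** (age / half_life)."""
    return 0.5 ** (age_days / half_life)

# Hypothetical click history for one user: (ad category, days since the click).
clicks = [("sports", 60), ("sports", 45), ("travel", 2), ("travel", 1)]

interest = {}
for category, age in clicks:
    interest[category] = interest.get(category, 0.0) + decayed_weight(age)

print({c: round(w, 3) for c, w in interest.items()})
print(max(interest, key=interest.get))  # travel: two recent clicks outweigh two old ones
```

Without decay, both categories would tie at two clicks each.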

3. Cold Start Problem:

o Recommender systems struggle when dealing with new users or items (the “cold start” scenario).

o Without sufficient historical data, it’s challenging to make accurate recommendations.

o Strategies like content-based recommendations or hybrid approaches can mitigate this issue.

4. Long Tail Problem:

o The majority of items in an inventory have low popularity (the “long tail”).

o Focusing only on popular items neglects niche products or content.

o Recommendation systems need to address both popular and long-tail items.

5. User Engagement and Ad Fatigue:

o Users often become bored with advertising content, especially video ads.

o Capturing changes in user interests over a brief period is challenging.

6. Scalability:

o As online platforms grow, handling large user bases and massive amounts of data becomes
complex.

o Scalable algorithms and distributed computing are essential for efficient recommendations.

7. Privacy and Ethical Concerns:

o Balancing personalized recommendations with user privacy is critical.

o Avoiding intrusive tracking and respecting user consent are ongoing challenges.

8. Ad Blockers and Ad Blindness:

o Users increasingly employ ad blockers to avoid intrusive ads.

o Ad blindness occurs when users ignore or mentally filter out ads due to their ubiquity.

9. Dynamic User Preferences:

o User preferences change over time due to various factors (seasonal trends, life events, etc.).

o Recommendation systems must adapt to these shifts.

10. Quality vs. Revenue Trade-off:

o Balancing revenue generation (through ads) with user experience (relevant recommendations) is
delicate.

o Overloading users with ads can lead to dissatisfaction.

5. Online and Offline Algorithms

1. Offline Algorithms:

o Definition: Offline algorithms are designed to make recommendations based on historical data,
typically collected in batch mode.

o Usage:

 These algorithms analyze past interactions (such as clicks, purchases, or views) to learn
patterns and generate recommendations.

 They are suitable for scenarios where real-time updates are not critical.

o Advantages:

 Can handle large datasets efficiently.

 Suitable for batch processing and model training.

 Often used for initial model development and evaluation.

o Challenges:

 May suffer from concept drift (changes in user behavior over time).

 Not ideal for rapidly changing environments.

 Lack real-time responsiveness.

2. Online Algorithms:

o Definition: Online algorithms adapt to real-time data and make recommendations as new
interactions occur.

o Usage:

 These algorithms continuously update their models based on recent user behavior.

 Used in scenarios where freshness and responsiveness are critical (e.g., news feeds, real-
time bidding).

o Advantages:

 React promptly to changes in user preferences.

 Handle dynamic environments.

 Improve recommendation quality by incorporating up-to-date data.

o Challenges:

 Need efficient online learning techniques.

 Must manage computational resources effectively.

 Balancing exploration (trying new recommendations) and exploitation (recommending items
already known to perform well).
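The difference between the two modes can be seen on a single statistic, an ad's click-through rate. The batch function recomputes it from the full history, while the online version updates it in constant time per new impression using an incremental mean; the click stream is hypothetical.

```python
def batch_ctr(history):
    """Offline: recompute CTR from the full history of 0/1 click outcomes."""
    return sum(history) / len(history) if history else 0.0

class OnlineCTR:
    """Online: update the CTR estimate in O(1) time as each impression arrives."""

    def __init__(self):
        self.impressions = 0
        self.ctr = 0.0

    def update(self, clicked):
        self.impressions += 1
        # Incremental mean: new_mean = old_mean + (x - old_mean) / n
        self.ctr += (clicked - self.ctr) / self.impressions

stream = [0, 1, 0, 0, 1, 1, 0, 0]  # hypothetical click outcomes (1 = click)
online = OnlineCTR()
for outcome in stream:
    online.update(outcome)

print(batch_ctr(stream), round(online.ctr, 6))  # 0.375 0.375
```

Replacing `1 / n` with a fixed step size (an exponential moving average) would make the online estimate favor recent impressions, which is one simple way to track concept drift.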

3. Transfer Learning for Online Recommendations:

o Method:

 Transfer learning allows offline recommendation models to be adapted for online use.

 The model trained offline is fine-tuned using online data.

o Advantages:

 Improves recommendation quality by leveraging pre-trained models.

 Addresses concept drift by adapting to real-time data.

 Scales well for large user bases.

o Considerations:

 Regular retraining is necessary to maintain model performance.

 Efficient online learning techniques are crucial.

4. Reinforcement Learning (RL) for Online Advertising:

o Context:

 RL has gained interest for online advertising in recommendation platforms (e.g., e-commerce
and news feed sites).

 It allows advertisers to dynamically adjust bids, creatives, and targeting strategies.

o Benefits:

 Optimizes ad delivery based on user interactions and feedback.

 Balances exploration (trying new ads) and exploitation (showing effective ads).

 Adapts to changing user behavior and market dynamics.
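Full reinforcement learning is beyond a short example, but the exploration/exploitation trade-off can be illustrated with an epsilon-greedy multi-armed bandit, a simplified RL setting with no state. With probability ε the sketch shows a random ad; otherwise it shows the ad with the best observed CTR so far. The true CTRs, ε = 0.1, and the seeds are hypothetical; UCB and Thompson sampling are common alternatives.

```python
import random

class EpsilonGreedy:
    """Explore a random ad with probability epsilon, otherwise exploit the best-looking ad."""

    def __init__(self, n_ads, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.counts = [0] * n_ads    # times each ad was shown
        self.values = [0.0] * n_ads  # running mean reward (observed CTR) per ad
        self.rng = random.Random(seed)

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))                   # explore
        return max(range(len(self.values)), key=self.values.__getitem__)  # exploit

    def update(self, ad, reward):
        self.counts[ad] += 1
        self.values[ad] += (reward - self.values[ad]) / self.counts[ad]

# Simulation: hypothetical true CTRs that the algorithm does not know.
TRUE_CTR = [0.02, 0.05, 0.11]
env = random.Random(42)
bandit = EpsilonGreedy(n_ads=3, epsilon=0.1, seed=7)
for _ in range(20_000):
    ad = bandit.select()
    clicked = 1 if env.random() < TRUE_CTR[ad] else 0
    bandit.update(ad, clicked)

print(bandit.counts)  # the best ad (index 2) should receive most impressions
```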

6. The Matching Problem

1. Matching Problem in Advertising:

o In online advertising, the matching problem refers to the process of connecting advertisers (who
want to display their ads) with relevant users (who might be interested in those ads).

o The goal is to serve ads to users who are likely to engage with them, leading to clicks, conversions,
or other desired actions.

o Key components of the matching problem include:

 User Profiling: Understanding user interests, demographics, and behavior.

 Ad Targeting: Selecting the right ads based on user profiles and context.

 Real-Time Decision Making: Making quick decisions on which ad to show to a specific
user during their online session.

 Optimization: Maximizing ad relevance and user engagement while respecting constraints
(e.g., budget, ad inventory).

2. Machine Learning Approaches:

o Machine learning plays a crucial role in solving the matching problem:

 Feature Engineering: Extracting relevant features from user data (search queries,
browsing history, location, etc.) and ad content (keywords, creatives, landing pages).

 Predictive Models: Building models to predict user behavior (e.g., click-through rate,
conversion probability) based on features.

 Ranking Algorithms: Determining the order in which ads are displayed to users (e.g.,
using gradient boosting, neural networks, or reinforcement learning).

 Contextual Matching: Considering the context (user intent, device, time of day) when
serving ads.

 Personalization: Tailoring ad recommendations to individual users.

3. Challenges:

o Scale: Handling millions of users and ads in real time.

o Freshness: Keeping user profiles and ad data up to date.

o Privacy: Balancing personalized recommendations with user privacy.

o Ad Fatigue: Avoiding showing the same ad repeatedly to the same user.

o Budget Constraints: Optimizing ad delivery within budget limits.

7. The AdWords Problem

1. AdWords Problem:

o The AdWords problem is a fundamental challenge faced by online advertising platforms,
particularly in the context of pay-per-click (PPC) advertising.

o It involves optimizing the allocation of ads to available ad slots (such as search engine results
pages or display placements) to maximize revenue while respecting constraints like advertisers’
budgets.

o Key components of the AdWords problem include:

 Advertisers: Each advertiser has a budget and bids for specific keywords or ad
placements.

 User Queries: Users submit search queries or interact with content.

 Ad Auctions: When a user query matches an advertiser’s keyword, an auction determines
which ad(s) to display.

 Quality Score: Ad relevance, landing page quality, and expected click-through rate
influence ad rankings.

 Budget Constraints: Advertisers’ daily budgets limit their spending.

 User Experience: Balancing relevant ads with user satisfaction.

o The goal is to allocate ad slots efficiently, considering both advertiser bids and user relevance.
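A toy allocation for a single ad slot per query, assuming advertisers are ranked by expected revenue (bid × predicted CTR) and that an advertiser is eligible only if it bid on the keyword and can still afford that bid. The advertisers, the numbers, and the rule of charging the bid every time the ad is shown are simplifications for illustration; real pay-per-click systems charge on clicks and run auctions with more elaborate pricing.

```python
# Hypothetical advertisers: keyword bids, remaining daily budget, predicted CTR per keyword.
advertisers = {
    "A": {"bids": {"shoes": 2.0, "boots": 1.0}, "budget": 3.0,  "ctr": {"shoes": 0.05, "boots": 0.04}},
    "B": {"bids": {"shoes": 1.5},               "budget": 10.0, "ctr": {"shoes": 0.09}},
    "C": {"bids": {"boots": 0.8},               "budget": 1.0,  "ctr": {"boots": 0.10}},
}

def allocate(keyword):
    """Give the slot to the eligible advertiser with the highest bid x predicted CTR.

    Eligible = bid on this keyword and has enough remaining budget to pay the bid.
    Simplification: the bid is deducted whenever the ad is shown.
    """
    eligible = [name for name, a in advertisers.items()
                if keyword in a["bids"] and a["budget"] >= a["bids"][keyword]]
    if not eligible:
        return None  # the slot goes unfilled
    winner = max(eligible,
                 key=lambda n: advertisers[n]["bids"][keyword] * advertisers[n]["ctr"][keyword])
    advertisers[winner]["budget"] -= advertisers[winner]["bids"][keyword]
    return winner

result = [allocate(q) for q in ["shoes", "boots", "boots", "boots"]]
print(result)  # ['B', 'C', 'A', 'A']
```

Advertiser C wins the first "boots" query thanks to its high CTR, but its remaining budget (0.2) then cannot cover its 0.8 bid, so the later "boots" queries go to A.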

2. Machine Learning in AdWords:

o Machine learning plays a crucial role in solving the AdWords problem:

 Predictive Models: Predicting click-through rates (CTR) based on historical data and
features (e.g., ad content, user context).

 Ranking Algorithms: Determining the order in which ads appear in search results or on
websites.

 Budget Optimization: Allocating budgets across campaigns and keywords.

 Ad Relevance: Evaluating ad quality and relevance.

 Personalization: Tailoring ads to individual users.

o Google AdWords (renamed Google Ads in 2018) uses machine learning to optimize ad delivery, improve
targeting, and enhance user experience.

3. Challenges:

o Scale: Handling millions of advertisers, keywords, and user queries.

o Real-Time Decision Making: Ad auctions occur in milliseconds, requiring efficient algorithms.

o Quality vs. Revenue Trade-off: Balancing ad relevance with revenue generation.

o Dynamic User Behavior: User preferences change over time.

o Privacy and Ethical Considerations: Respecting user privacy while personalizing ads.

8. The Balance Algorithm

1. Balance Algorithm:

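o The Balance Algorithm is an online algorithm for allocating ad slots while respecting advertisers’
budgets. In its classic form from the analysis of the AdWords problem, each incoming query is assigned to
the advertiser who has bid on that query and has the largest remaining budget, with ties broken
arbitrarily. In the simplified setting where all bids are equal, its competitive ratio is at least
1 - 1/e (about 0.63), compared with 1/2 for a simple greedy assignment.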

o It aims to strike a balance between maximizing revenue (by showing high-bidding ads) and
enhancing user experience (by displaying relevant ads).

o Key considerations include:

 Bid Amounts: Advertisers bid different amounts for specific keywords or placements.

 Ad Quality: The relevance and quality of ads impact user engagement.

 User Satisfaction: Showing too many ads can negatively affect user satisfaction.

 Budget Constraints: Advertisers have daily budgets that limit their spending.

o The algorithm dynamically adjusts ad placements based on real-time auctions and user
interactions.
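A sketch of the classic Balance Algorithm in the simplified AdWords setting where every bid is worth one unit: each query goes to the bidder with the largest remaining budget. The advertisers, keywords, and budgets are a hypothetical example, and ties are broken by advertiser name only to make the output deterministic.

```python
def balance(queries, bids, budgets):
    """Classic Balance Algorithm with unit bids.

    bids:    {advertiser: set of query keywords they bid on}
    budgets: {advertiser: remaining budget, in units}
    Each query goes to the eligible bidder with the LARGEST remaining budget;
    a query with no eligible bidder is left unassigned (None).
    """
    budgets = dict(budgets)  # work on a copy; don't mutate the caller's dict
    assignment = []
    for q in queries:
        eligible = [a for a in bids if q in bids[a] and budgets[a] > 0]
        if not eligible:
            assignment.append(None)
            continue
        winner = max(eligible, key=lambda a: (budgets[a], a))  # ties -> larger name
        budgets[winner] -= 1
        assignment.append(winner)
    return assignment

bids = {"A": {"x", "y"}, "B": {"x"}}
budgets = {"A": 2, "B": 2}
print(balance(["x", "x", "y", "y"], bids, budgets))  # ['B', 'A', 'A', None]
```

Here Balance earns 3 units. A greedy rule that happened to give both x queries to A would earn only 2, because A would have no budget left for the y queries, while the best possible assignment (x to B, y to A) earns 4.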

2. Machine Learning in Balance Algorithms:

o Machine learning techniques play a crucial role in optimizing ad delivery:

 Predictive Models: Predicting click-through rates (CTR) based on historical data and
features (e.g., ad content, user context).

 Ranking Algorithms: Determining the order in which ads appear in search results or on
websites.

 Budget Allocation: Allocating budgets across campaigns and keywords.

 Personalization: Tailoring ads to individual users.

o The goal is to find the right balance between maximizing revenue and providing a positive user
experience.

3. Challenges:

o Real-Time Decision Making: Ad auctions occur in milliseconds, requiring efficient algorithms.

o Quality vs. Revenue Trade-off: Balancing ad relevance with revenue generation.

o Dynamic User Behavior: User preferences change over time.

o Privacy and Ethical Considerations: Respecting user privacy while personalizing ads.

9. Applications of Dimensionality Reduction

1. Data Visualization:

o Dimensionality reduction helps visualize high-dimensional data in lower dimensions.

o Techniques like Principal Component Analysis (PCA) transform data into a lower-dimensional
space, making it easier to plot and interpret.
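A minimal PCA sketch with NumPy: center the data, take its SVD, and project onto the top right singular vectors. The data set is synthetic and hypothetical: 200 points in five dimensions that really vary along only two directions, so two components should keep almost all of the variance.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto the top principal components, computed via SVD."""
    X_centered = X - X.mean(axis=0)                    # PCA assumes mean-centered data
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]                     # directions of largest variance
    variance = S**2 / (len(X) - 1)                     # variance along each direction
    ratio = variance[:n_components] / variance.sum()   # explained-variance ratio
    return X_centered @ components.T, ratio

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))   # 2 underlying factors
mixing = rng.normal(size=(2, 5))     # spread over 5 observed features
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

X_2d, ratio = pca(X, n_components=2)
print(X_2d.shape)   # (200, 2): ready for a 2-D scatter plot
print(ratio.sum())  # close to 1: two components keep almost all the variance
```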

2. Noise Reduction:

o High-dimensional data often contains noise or irrelevant features.

o By reducing dimensions, we filter out noise and focus on the most informative features.

3. Feature Engineering:

o Dimensionality reduction aids in feature selection and feature extraction.

o We can identify significant variables and exclude irrelevant ones, improving model performance.

4. Computational Efficiency:

o High-dimensional data requires more computational resources.

o Reducing dimensions speeds up training and inference.

5. Anomaly Detection:

o In fraud detection or fault diagnosis, dimensionality reduction highlights anomalies.

o Outliers become more apparent in lower-dimensional spaces.

6. Collaborative Filtering:

o Recommender systems use dimensionality reduction to handle sparse user-item matrices.

o Techniques like Singular Value Decomposition (SVD) reduce dimensions while preserving user
preferences.

7. Natural Language Processing (NLP):

o In text analysis, techniques like Word2Vec replace sparse, vocabulary-sized word vectors (such as one-hot
encodings) with dense, low-dimensional embeddings that capture semantic meaning efficiently.

8. Gene Expression Analysis:

o In bioinformatics, dimensionality reduction helps analyze gene expression data.

o It identifies relevant genes and reduces noise.

9. Image Compression:

o Techniques like autoencoders reduce image dimensions while preserving essential features.

o This aids in image storage and transmission.

10. Clustering and Classification:

o Reduced dimensions improve clustering and classification algorithms.

o K-means, DBSCAN, and SVMs benefit from dimensionality reduction.

10. SVD for Latent Semantic Indexing

Latent Semantic Analysis (LSA), also known as Latent Semantic Indexing (LSI), is a natural language
processing method that uses statistical approaches to identify associations among words across a collection
of documents. Let’s explore how Singular Value Decomposition (SVD) is applied in LSA:

1. Latent Semantic Analysis (LSA):

o LSA aims to capture the underlying semantic structure of a collection of documents.

o It deals with issues like synonymy (different words with similar meanings) and polysemy (one
word having multiple meanings).

o For example, consider the words “mobile,” “phone,” “cell phone,” and “telephone.” LSA helps ensure
that documents containing any of these related terms are retrieved when a user poses a query like
“The cell phone has been ringing.”

2. Assumptions of LSA:

o Words that are used in the same contexts tend to have similar meanings.

o The hidden semantic structure of the data is unclear due to word ambiguity.

3. Singular Value Decomposition (SVD):

o SVD is a mathematical operation that decomposes a matrix into three matrices:

 Let’s define:

 C: Collection of documents.

 d: Number of documents.

 n: Number of unique words in the entire collection.

 M: Word-to-document matrix of size n × d (one row per word, one column per document).

 SVD decomposes M as follows:

 $M = U \Sigma V^{T}$

 U: Distribution of words across different contexts (latent concepts).

 Σ: Diagonal matrix of singular values, one per context, ordered from highest to lowest significance.

 $V^{T}$: Distribution of contexts across different documents.

o SVD allows us to truncate unnecessary contexts, reducing dimensions and serving as a
dimensionality reduction technique.

o By keeping only the k largest diagonal values in Σ, together with the first k columns of U and the first
k rows of $V^{T}$, we obtain an approximated matrix $M_k$, which is the best rank-k approximation of M in the
least-squares sense (the Eckart-Young theorem):

 $M_k = U_k \Sigma_k V_k^{T}$
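A small LSI sketch with NumPy, using a hypothetical term-document count matrix with terms as rows and documents as columns (so M is n × d). Documents d1 ("mobile phone") and d3 ("cell telephone") share no terms, yet after truncating the SVD to k = 2 concepts they land on the same phone-related concept, while the cooking document d4 stays separate. The vocabulary, counts, and k are illustrative assumptions.

```python
import numpy as np

terms = ["mobile", "phone", "cell", "telephone", "pasta", "sauce"]
# Hypothetical counts; columns are documents d1..d4.
M = np.array([
    [1, 0, 0, 0],  # mobile
    [1, 1, 0, 0],  # phone
    [0, 1, 1, 0],  # cell
    [0, 0, 1, 0],  # telephone
    [0, 0, 0, 2],  # pasta
    [0, 0, 0, 1],  # sauce
], dtype=float)

U, s, Vt = np.linalg.svd(M, full_matrices=False)  # M = U @ diag(s) @ Vt, s sorted descending
k = 2                                             # number of latent concepts kept
U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]
M_k = U_k @ S_k @ Vt_k                            # rank-k approximation of M

# Each document as a k-dimensional concept vector: the columns of S_k @ Vt_k.
doc_vecs = (S_k @ Vt_k).T

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cos(M[:, 0], M[:, 2]))                    # 0.0: d1 and d3 share no terms
print(round(cos(doc_vecs[0], doc_vecs[2]), 3))  # about 1.0 in the latent space
print(round(cos(doc_vecs[0], doc_vecs[3]), 3))  # about 0.0: phones vs. cooking
```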

4. Applications of LSA with SVD:

o Information Retrieval: LSA improves document retrieval by capturing latent semantic
relationships.

o Topic Modeling: LSA identifies underlying topics within a collection of documents.

o Document Clustering: Similar documents are grouped based on latent semantic similarities.

o Dimensionality Reduction: LSA reduces the high-dimensional word space to a lower-dimensional
semantic space.
