Apache Mahout Algorithms
Mahout
Algorithms
Mahmut Karakaya
Agenda
- Introduction
- Collaborative Filtering
- Map/Reduce
- Clustering
- Demo
What mahout means
Elephant rider in Hindi
What Apache Mahout is
- Java, Hadoop
- Collaborative Filtering
- Mahout In Action
- user@mahout.apache.org
- 0.9 (1-Feb-2014)
Who uses Mahout
Mahout in Apache Foundation
overstock.com saves $2m a year
Judd Bagley Saum Noursalehi
Others
- Weka (Machine Learning Library)
- Lenskit (Grouplens)
- EasyRec (RestAPI)
- Write it yourself :)
Need to know ML?
hadoop jar mahout-core-0.8-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  -Dmapred.input.dir=input/input.txt \
  -Dmapred.output.dir=output \
  --usersFile input/users.txt --booleanData
Data Model (u,i,r)
Similarity
Cosine Similarity
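Cosine similarity can be sketched in a few lines. This is a minimal illustration in plain Python, not Mahout's implementation; the function name and the sample rating vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Convention chosen for this sketch: an all-zero vector is
        # treated as having no similarity to anything.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical ratings given to two items by the same three users.
item_a = [5.0, 3.0, 4.0]
item_b = [4.0, 2.0, 5.0]
similarity = cosine_similarity(item_a, item_b)
```

Vectors pointing the same way score 1, and vectors with no rating in common score 0.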
Collaborative Filtering
- Data format = userId, itemId, rating
- Create Model + Predict
Item Based - Similarity Matrix (Item-Item)
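One way to build an item-item similarity matrix from (userId, itemId, rating) triples is sketched below. The triples are made up, missing ratings are treated as zero, and cosine similarity is an assumed choice; Mahout's own similarity classes may handle missing values differently.

```python
import math
from collections import defaultdict

# Hypothetical (userId, itemId, rating) triples.
ratings = [
    (1, 101, 5.0), (1, 102, 3.0),
    (2, 101, 4.0), (2, 102, 2.0), (2, 103, 5.0),
    (3, 102, 4.0), (3, 103, 4.0),
]


def item_vectors(triples):
    """Group ratings by item: itemId -> {userId: rating}."""
    vectors = defaultdict(dict)
    for user, item, rating in triples:
        vectors[item][user] = rating
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; missing entries count as 0."""
    dot = sum(r * v[user] for user, r in u.items() if user in v)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def item_similarity_matrix(triples):
    """Similarity for every ordered pair of distinct items."""
    vectors = item_vectors(triples)
    items = sorted(vectors)
    return {(i, j): cosine(vectors[i], vectors[j])
            for i in items for j in items if i != j}


matrix = item_similarity_matrix(ratings)
```

The matrix is symmetric, so a real implementation would store only one half of it.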
Item Based - Predict
- Weighted Sum:
r^(3,1) = 2 * 0.91 + ...
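The weighted-sum prediction can be sketched as follows. Only the first term (rating 2, similarity 0.91) comes from the slide; the second item's values are invented for illustration, and dividing by the sum of similarities is an assumed convention that the elided formula may or may not use.

```python
def predict_rating(user_ratings, similarities):
    """Similarity-weighted average of the ratings the user already gave.

    user_ratings: itemId -> rating by this user
    similarities: itemId -> similarity of that item to the target item
    """
    numerator = 0.0
    denominator = 0.0
    for item, rating in user_ratings.items():
        sim = similarities.get(item, 0.0)
        numerator += rating * sim
        denominator += abs(sim)
    return numerator / denominator if denominator else 0.0


# "item2" mirrors the slide's first term; "item3" is a made-up second term.
ratings_by_user_3 = {"item2": 2.0, "item3": 4.0}
similarity_to_item_1 = {"item2": 0.91, "item3": 0.50}
prediction = predict_rating(ratings_by_user_3, similarity_to_item_1)
```

With normalisation the prediction always lies between the lowest and highest rating that contributed to it.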
Item Based
Item Based.. Why in Mahout
- Generic recommender like User Based
- User Based similarity matrix is heavier
Singular Value Decomposition (SVD)
SVDRecommender
Factorization
Factorizer
Singular Value Decomposition (SVD)
m * n → m * k + n * k
Let's say m = 10K, n = 1K, k = 10:
10M → 100K + 10K
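The storage arithmetic above can be checked with a short sketch. The function names are hypothetical, and the dot-product prediction is the usual way factor models score a user-item pair, shown here as an assumption rather than as Mahout's code.

```python
def dense_cells(m, n):
    """Cells in the full m x n user-item rating matrix."""
    return m * n


def factor_cells(m, n, k):
    """Cells in an m x k user-factor matrix plus an n x k item-factor matrix."""
    return m * k + n * k


def predict(user_factors, item_factors):
    """Estimated rating as the dot product of two k-dimensional factor vectors."""
    return sum(u * i for u, i in zip(user_factors, item_factors))


m, n, k = 10_000, 1_000, 10
full = dense_cells(m, n)          # 10M cells
factors = factor_cells(m, n, k)   # 100K + 10K cells
```

With these numbers the factorised form needs roughly 1% of the cells of the full matrix.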
Singular Value Decomposition (SVD)
SVD k=3 λ=0.1 a=40 c.a=1
SVD k=3 λ=0.1 a=40 c.a=10
SVD.. Why in Mahout
- Won Netflix Prize
- Parallelizable by row, column
Map / Reduce Mapper
1.txt 2.txt
Hello Hello
Hello
Map / Reduce Mapper
Map1 Map2
Hello,1 Hello,1
Hello,1
Map / Reduce Reducer
Hello,3
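The word-count flow on these slides can be simulated in a single process. This is a sketch, not Hadoop code; the function names are hypothetical, and the split of the three "Hello" words across 1.txt and 2.txt is assumed from the slide layout.

```python
from collections import defaultdict


def map_words(text):
    """Mapper: emit (word, 1) for every word in one input split."""
    return [(word, 1) for word in text.split()]


def shuffle(pairs):
    """Group the emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reducer: sum all the counts emitted for one word."""
    return key, sum(values)


# Assumed contents of the two input files.
files = {"1.txt": "Hello Hello", "2.txt": "Hello"}

mapped = [pair for text in files.values() for pair in map_words(text)]
result = [reduce_counts(key, values) for key, values in shuffle(mapped).items()]
```

Each mapper sees only its own file, and the reducer for "Hello" receives the three emitted ones and sums them.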
Map / Reduce ItemBased
hadoop jar mahout-core-0.8-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  -Dmapred.input.dir=input/input.txt \
  -Dmapred.output.dir=output \
  --usersFile input/users.txt --booleanData
Map / Reduce ItemBased
- Map 1
- Reduce 1
- Map 2
- Reduce 2
Map / Reduce.. Why in Mahout
Clustering
- KMeans Clustering (SM,MR)
- Fuzzy kMeans (SM,MR)
- Canopy Clustering (SM,MR)
- Dirichlet (SM,MR)
Kmeans
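A single-machine k-means loop can be sketched as below. This is plain Lloyd's algorithm with Euclidean distance and caller-supplied starting centroids, all of which are assumptions for illustration; it is not Mahout's KMeans driver, and the sample points are made up.

```python
import math


def closest(point, centroids):
    """Index of the centroid nearest to the point (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: math.dist(point, centroids[c]))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def kmeans(points, centroids, max_iterations=10):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = list(centroids)
    assignment = []
    for _ in range(max_iterations):
        assignment = [closest(p, centroids) for p in points]
        updated = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            # An empty cluster keeps its previous centroid.
            updated.append(mean(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, assignment


# Hypothetical 2-D points forming two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, assignment = kmeans(points, [(1, 1), (8, 8)])
```

The result depends on the starting centroids, which is one reason a cheap first pass such as canopy clustering is sometimes used to pick them.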
Clustering Evaluation
Clustering Intra Distance
Clustering Inter Distance
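Intra- and inter-cluster distance can be computed in several ways. The sketch below uses one common pair of definitions, assumed here rather than taken from the slides: average pairwise distance inside a cluster, and distance between the centroids of two clusters. The sample clusters are made up.

```python
import math
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a non-empty cluster."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def intra_cluster_distance(points):
    """Average distance over all pairs of points in one cluster (lower is tighter)."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def inter_cluster_distance(cluster_a, cluster_b):
    """Distance between the centroids of two clusters (higher is better separated)."""
    return math.dist(centroid(cluster_a), centroid(cluster_b))


# Hypothetical clusters.
cluster_a = [(0, 0), (0, 2)]
cluster_b = [(10, 0), (10, 2)]
intra_a = intra_cluster_distance(cluster_a)
inter_ab = inter_cluster_distance(cluster_a, cluster_b)
```

A good clustering has a small intra-cluster distance relative to its inter-cluster distance.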
Clustering.. Why in Mahout
- Sparsity
- ~10m of 11m users registered 1 Sony product
Clustering.. Why in Mahout
- Group Recommendation
- Cluster Based Recommendation
Create WishList Experience
- Mahout (SVD)
- Play
- Heroku
- MongoLab
- Rest
http://recommenderplaybbs.herokuapp.com/
Thank you
