Guide to Using scikit-learn KMeans and DBSCAN
1. KMeans Clustering
KMeans is a centroid-based clustering algorithm that partitions data into K clusters, assigning each point to the cluster whose centroid (mean) is nearest.
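To make "centroid-based" concrete, here is a minimal NumPy sketch of a single assign-and-update step. It is an illustration of the idea only, not scikit-learn's actual implementation, and the starting centroids are hypothetical values chosen for the example.

```python
import numpy as np

# Same toy data as the KMeans example in this guide.
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], dtype=float)

# Hypothetical starting centroids (an assumption made for illustration).
centroids = np.array([[1.0, 2.0], [10.0, 2.0]])

# Assignment step: Euclidean distance from every point to every centroid,
# then pick the index of the nearest centroid for each point.
distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
assignments = distances.argmin(axis=1)

# Update step: each centroid moves to the mean of the points assigned to it.
new_centroids = np.array(
    [X[assignments == k].mean(axis=0) for k in range(len(centroids))]
)

print(assignments)
print(new_centroids)
```

The full algorithm repeats these two steps until the assignments stop changing.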
Step-by-step Guide:
1. Import the required libraries:
from sklearn.cluster import KMeans
2. Prepare your dataset:
Example: X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
3. Initialize and fit the model:
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
4. Access results:
- kmeans.labels_: Cluster labels of the training data points
- kmeans.cluster_centers_: Coordinates of cluster centers
- kmeans.predict([[0, 0], [12, 3]]): Index of the closest cluster center for each new sample
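Steps 1 to 4 can be combined into one short script. This is a sketch using the example data above. Setting n_init=10 explicitly is an assumption added here so the behavior does not depend on the default, which differs between scikit-learn versions. The numbering of the clusters (which one is 0 and which is 1) is arbitrary, so the script checks grouping rather than specific label values.

```python
import numpy as np
from sklearn.cluster import KMeans

# Step 2: the example dataset, as a NumPy array.
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

# Step 3: fit two clusters; random_state makes the run reproducible.
# n_init=10 is set explicitly (an assumption, not part of the original steps).
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X)

# Step 4: read the results.
labels = kmeans.labels_              # one label per training point
centers = kmeans.cluster_centers_    # one row per cluster
new_labels = kmeans.predict(np.array([[0, 0], [12, 3]]))

print("labels:", labels)
print("centers:", centers)
print("predictions:", new_labels)

# The three points with x=1 share a cluster, as do the three with x=10.
same_left = len(set(labels[:3])) == 1
same_right = len(set(labels[3:])) == 1
print("left group together:", same_left, "| right group together:", same_right)
```

For this data the two centers land at (1, 2) and (10, 2), the means of the two groups.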
2. DBSCAN Clustering
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) groups together points that are close
to each other and marks points in low-density regions as outliers.
Step-by-step Guide:
1. Import the required libraries:
from sklearn.cluster import DBSCAN
2. Prepare your dataset:
Example: X = [[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]]
3. Initialize and fit the model:
dbscan = DBSCAN(eps=3, min_samples=2).fit(X)
4. Access results:
- dbscan.labels_: Cluster labels for each point (-1 means noise)
- dbscan.core_sample_indices_: Indices of core samples
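The DBSCAN steps can likewise be run end to end. The sketch below uses the example data above. The cluster and noise counts are derived values added for illustration, not attributes of the fitted model.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Step 2: the example dataset; the last point lies far from all others.
X = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]])

# Step 3: fit with the parameters from the guide.
dbscan = DBSCAN(eps=3, min_samples=2).fit(X)

# Step 4: read the results.
labels = dbscan.labels_                      # -1 marks noise
core_indices = dbscan.core_sample_indices_   # indices into X

# Derived summaries (not model attributes): count clusters and noise points.
n_clusters = len(set(labels.tolist())) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))

print("labels:", labels)                 # [ 0  0  0  1  1 -1]
print("core samples:", core_indices)     # [0 1 2 3 4]
print("clusters:", n_clusters, "noise points:", n_noise)  # 2 and 1
```

Unlike KMeans, DBSCAN does not take the number of clusters as input and has no predict method for new samples. The cluster count follows from eps and min_samples.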
Tips:
- eps: Maximum distance between two samples for one to be considered in the neighborhood of the other. It is not a bound on the distance between points within a cluster.
- min_samples: Minimum number of samples in a point's eps-neighborhood, counting the point itself, for it to be considered a core point.
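To see how eps changes the outcome, the sketch below refits DBSCAN on the same example data with three eps values. The helper summarize is a hypothetical name introduced here, and the values 0.5 and 100 are chosen only to show the two extremes.

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]])

def summarize(eps, min_samples=2):
    """Hypothetical helper: return (number of clusters, number of noise points)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit(X).labels_
    n_clusters = len(set(labels.tolist()) - {-1})
    n_noise = int(np.sum(labels == -1))
    return n_clusters, n_noise

# Too small: no point has a neighbor within eps, so everything is noise.
print(summarize(0.5))   # (0, 6)

# The value used in this guide: two clusters and one outlier.
print(summarize(3))     # (2, 1)

# Too large: every point is within eps of every other, so one cluster remains.
print(summarize(100))   # (1, 0)
```

A small eps labels most points as noise, while a large eps merges everything into one cluster, so eps should be chosen with the scale of the data in mind.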