Perform naive Bayesian classification into an arbitrary number of classes on sets of strings. bayesian also supports term frequency-inverse document frequency calculations (TF-IDF).
Copyright (c) 2011-2017. Jake Brukhman. (jbrukh@gmail.com). All rights reserved. See the LICENSE file for BSD-style license.
This is meant to be an low-entry barrier Go library for basic Bayesian classification. See code comments for a refresher on naive Bayesian classifiers, and please take some time to understand underflow edge cases as this otherwise may result in innacurate classifications.
Using the go command:
go get github.com/jbrukh/bayesian
go install !$See the GoPkgDoc documentation here.
- Conditional probability and "log-likelihood"-like scoring.
- Underflow detection.
- Simple persistence of classifiers.
- Statistics.
- TF-IDF support.
To use the classifier, first you must create some classes and train it:
import "github.com/jbrukh/bayesian"
const (
Good bayesian.Class = "Good"
Bad bayesian.Class = "Bad"
)
classifier := bayesian.NewClassifier(Good, Bad)
goodStuff := []string{"tall", "rich", "handsome"}
badStuff := []string{"poor", "smelly", "ugly"}
classifier.Learn(goodStuff, Good)
classifier.Learn(badStuff, Bad)Then you can ascertain the scores of each class and the most likely class your data belongs to:
scores, likely, _ := classifier.LogScores(
[]string{"tall", "girl"},
)Magnitude of the score indicates likelihood. Alternatively (but with some risk of float underflow), you can obtain actual probabilities:
probs, likely, _ := classifier.ProbScores(
[]string{"tall", "girl"},
)To use the TF-IDF classifier, first you must create some classes
and train it and you need to call ConvertTermsFreqToTfIdf() AFTER training
and before calling classification methods such as LogScores, SafeProbScores, and ProbScores)
import "github.com/jbrukh/bayesian"
const (
Good bayesian.Class = "Good"
Bad bayesian.Class = "Bad"
)
// Create a classifier with TF-IDF support.
classifier := bayesian.NewClassifierTfIdf(Good, Bad)
goodStuff := []string{"tall", "rich", "handsome"}
badStuff := []string{"poor", "smelly", "ugly"}
classifier.Learn(goodStuff, Good)
classifier.Learn(badStuff, Bad)
// Required
classifier.ConvertTermsFreqToTfIdf()Then you can ascertain the scores of each class and the most likely class your data belongs to:
scores, likely, _ := classifier.LogScores(
[]string{"tall", "girl"},
)Magnitude of the score indicates likelihood. Alternatively (but with some risk of float underflow), you can obtain actual probabilities:
probs, likely, _ := classifier.ProbScores(
[]string{"tall", "girl"},
)Use wisely.