Sensitivity, Specificity and Accuracy
A classical approach to assessing the accuracy of algorithms (and of diagnostic tests) is the ROC curve, which plots the true positive rate against the false positive rate across decision thresholds. The further the curve lies from the 45-degree diagonal, the better; more precisely, it's the area under the curve (AUC) that matters, and bigger is "usually" better.
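As a quick illustration, here is a minimal sketch of computing a ROC curve and its AUC, assuming scikit-learn is available; `y_true` and `y_score` are hypothetical arrays of ground-truth labels and classifier scores, not anything from your data:

```python
# Minimal ROC/AUC sketch (assumes scikit-learn is installed).
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1, 1, 0]                    # illustrative ground-truth labels
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.3]  # illustrative classifier scores

fpr, tpr, thresholds = roc_curve(y_true, y_score)     # points on the ROC curve
print(roc_auc_score(y_true, y_score))                 # 0.9375 here; closer to 1 is better
```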
First, a quick summary of sensitivity, specificity and accuracy; then some thoughts as to whether they apply in your case. I've linked here to Wikipedia entries, which are very good summaries of this topic.
To calculate sensitivity, specificity and accuracy, you require 4 things (a counting sketch follows the list):
- True positives (TP). A true positive is an observation, identified by your algorithm, which is a real instance of the feature.
- False positives (FP). A false positive is when your algorithm identifies an observation as a real instance of the feature but in fact, it is not.
- True negatives (TN). A true negative is the case where the observation is not a real instance of the feature and your algorithm identifies it as such.
- False negatives (FN). A false negative is the case where your algorithm does not identify the observation as a real instance of the feature but in fact, it is.
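To see how those four counts fall out of a set of predictions, here is a minimal sketch in plain Python, assuming binary labels where 1 means the feature is present (the function name `confusion_counts` is just an illustrative choice):

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN and FN by comparing each prediction to its label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # real feature, found
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # no feature, but flagged
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # no feature, correctly rejected
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # real feature, missed
    return tp, fp, tn, fn
```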
Sensitivity is then:
TP / (TP + FN)
And specificity is:
TN / (TN + FP)
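Continuing the hypothetical sketch above, these two formulas translate directly:

```python
def sensitivity(tp, fn):
    # Fraction of real instances of the feature that the algorithm found.
    return tp / (tp + fn)

def specificity(tn, fp):
    # Fraction of non-instances that the algorithm correctly rejected.
    return tn / (tn + fp)
```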
Accuracy does not have a single, concise definition; it is measured in many ways, using various combinations of TP, FP, TN and FN. One measure is:
(TP + TN) / (TP + FP + TN + FN)
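In code, continuing the same sketch (note the parentheses around TP + TN):

```python
def accuracy(tp, fp, tn, fn):
    # Fraction of all observations, positive or negative, classified correctly.
    return (tp + tn) / (tp + fp + tn + fn)
```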
Another metric is positive predictive value, defined as:
TP / (TP + FP)
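And as one more continuation of the sketch:

```python
def positive_predictive_value(tp, fp):
    # Of the observations flagged as positive, the fraction that really are.
    return tp / (tp + fp)
```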