Watchful Overview

What is Watchful?

Watchful is a Machine Teaching Platform designed to help data science teams get better models into production faster.

Watchful gives data scientists and subject matter experts an intelligent system for creating high-quality training data in hours instead of weeks.

📘
Dig Deeper
Learn more about Machine Teaching here

Who should use Watchful?

Watchful is a good fit if any of the following apply to you:

The data you use for your machine learning projects requires subject matter experts to label it
You require a high degree of explainability in your machine learning pipelines
You find yourself retraining your models frequently because of data or model drift (this is common in adversarial problem spaces such as fraud detection)

What kinds of data can I use with Watchful?

Right now, we support CSV file imports, and Watchful works best with data that is more textual than numerical. We will expand this scope over time to include other data modalities such as time series, images, and video.

What is Probabilistic Labeling?

Classically, labeled training data is assumed to be deterministic. For example, if you were training a model to tell spam e-mails from ham (legitimate) e-mails, you would expect your training data to be encoded with 1s for spam and 0s for ham (or vice versa). However, the real world is rarely this clean. In practice, there is usually some ambiguity or "gray area" that deterministic labels do not capture.

Email	Is_Spam
"Subject: finest online medicine here need pres cription medication without a prior prescri ption? absolutely no doctor's appointments needed! ..."	1
"Subject: board presentation please find attached the presentation for the board of directors regarding the las vegas expansion ..."	0
"Subject: re : congratulations thanks. congratulations to you ..."	0

Labeling teams typically use some form of majority vote to collapse disagreements in the labeling process into a single label, but some information is lost along the way. If the disagreement is instead encoded as a probability, rather than being collapsed to a 0 or a 1, your model has more information from which to learn each candidate's relationship to the class. Probabilistically labeled data captures that relationship in a way that deterministically labeled data cannot. In the table below, Prob_Spam is expressed on a scale from 0 to 100.

Email	Prob_Spam
"Subject: finest online medicine here need pres cription medication without a prior prescri ption? absolutely no doctor's appointments needed! ..."	100
"Subject: board presentation please find attached the presentation for the board of directors regarding the las vegas expansion ..."	0
"Subject: re : congratulations thanks. congratulations to you, very well deserved ..."	25

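To make the difference concrete, here is a minimal Python sketch of the two encodings. It assumes four hypothetical annotators voting on a single e-mail; the function names are illustrative only, and this is not necessarily how Watchful itself derives its probabilities.

```python
from collections import Counter


def majority_vote(votes):
    """Collapse annotator votes (1 = spam, 0 = ham) into one hard label."""
    counts = Counter(votes)
    # Ties go to ham (0) here; that tie-break is an arbitrary assumption.
    return 1 if counts[1] > counts[0] else 0


def vote_probability(votes):
    """Keep the disagreement as a spam score on a 0-100 scale."""
    return 100 * sum(votes) / len(votes)


# Hypothetical votes: one of four annotators flagged the e-mail as spam.
votes = [1, 0, 0, 0]
print(majority_vote(votes))     # 0
print(vote_probability(votes))  # 25.0
```

The hard label reports only "ham", while the score of 25 also records that one annotator in four disagreed.
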
How do I use Watchful's Probabilistic Labels?

There are a few ways you could train a model using your probabilistically labeled data. You could:

Directly use your full probabilistic labels as they are
Use the most-likely labels, that is, Spam = 1 if Prob_Spam >= 50 and Spam = 0 if Prob_Spam < 50. This is quick to set up, but the quality of the results depends on how much time was spent labeling
Filter out the lower-quality labels by imposing thresholds; for example, use Spam = 1 if Prob_Spam > 90 and Spam = 0 if Prob_Spam < 10. Data in the range 10 <= Prob_Spam <= 90 is then left unlabeled (abstained). Importantly, you are making a trade-off between the quality and the amount of labeled data
Sample labels from your probabilistically labeled data
Use your labeled data together with other relevant features of your dataset
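
Three of these options (most-likely labels, thresholded labels, and sampled labels) can be sketched in a few lines of Python. The function names and the default thresholds below are illustrative assumptions, not part of any Watchful API; Prob_Spam is taken to be on the 0 to 100 scale used in the table above.

```python
import random


def most_likely_label(prob_spam):
    """Hard label: spam (1) at 50 or above, ham (0) otherwise."""
    return 1 if prob_spam >= 50 else 0


def thresholded_label(prob_spam, low=10, high=90):
    """Keep only confident labels; return None to abstain in between."""
    if prob_spam > high:
        return 1
    if prob_spam < low:
        return 0
    return None  # abstain: the row is dropped from the training set


def sampled_label(prob_spam, rng):
    """Draw a hard label, choosing spam with probability prob_spam / 100."""
    return 1 if rng.random() < prob_spam / 100 else 0


probs = [100, 0, 25]  # the Prob_Spam column from the example table
print([most_likely_label(p) for p in probs])  # [1, 0, 0]
print([thresholded_label(p) for p in probs])  # [1, 0, None]

rng = random.Random(0)  # seeded so that repeated runs draw the same labels
print([sampled_label(p, rng) for p in probs])
```

With thresholds of 10 and 90, the third e-mail (Prob_Spam = 25) is excluded from training, which illustrates the quality-versus-quantity trade-off described above.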

Generally speaking, we recommend using the probabilities themselves where possible; we have seen that this can improve model performance without adding much complexity to the pipeline.
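
One common way to train directly on probabilities is to use them as soft targets in a binary cross-entropy loss, which is well defined for fractional targets. The sketch below is an illustration under that assumption, not a description of Watchful's internals; `soft_label_loss` is a hypothetical helper name.

```python
import math


def soft_label_loss(prob_spam, predicted, eps=1e-12):
    """Binary cross-entropy against a soft target.

    prob_spam: probabilistic label on a 0-100 scale.
    predicted: the model's predicted spam probability, in [0, 1].
    """
    target = prob_spam / 100
    # Clamp the prediction to avoid log(0) at exactly 0 or 1.
    p = min(max(predicted, eps), 1 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))


# With the soft label Prob_Spam = 25, predicting 0.25 costs less than a
# confident "ham" prediction of 0.01.
print(soft_label_loss(25, 0.25) < soft_label_loss(25, 0.01))  # True

# With the collapsed hard label 0, the confident prediction is rewarded instead.
print(soft_label_loss(0, 0.01) < soft_label_loss(0, 0.25))    # True
```

The soft target steers the model toward reproducing the labeled uncertainty, whereas the collapsed label steers it toward overconfidence on ambiguous e-mails.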
