Hinters are the primary way to distill your heuristic knowledge about a task in a way that Watchful can interpret. They allow you to build noisy indicators which Watchful can learn from.
One way to build hinters is to manually write or discover queries that you know to be strongly correlated with a class. As an example, you might know that the term `Red Sox` correlates highly with the class `Sports` in the context of news articles. You can write this out as a query in Watchful's interface, add an expectation to indicate the strength of your conviction (see more about this below), and then check it in as a hinter so Watchful can learn from the heuristic.
It's important to note that, in order to create an NER hinter, your query must contain a capture group: the explicit span of text that will be labeled with the NER class. A capture group is written with double square brackets. For example, in the query `TOKS: all of the [[ Red Sox ]] players`, the selected NER class (say, `Team`) would be applied to `Red Sox`. For more information, refer to the Querying article.
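To make the capture-group idea concrete, here is a minimal Python sketch. It is not Watchful's query engine: it assumes a simplified matcher that treats the pattern as literal words, it omits the `TOKS:` prefix because it does not interpret Watchful query syntax, and the helper names `_literal` and `capture_span` are hypothetical. It matches the whole phrase but returns only the span inside the double square brackets, which is the part an NER class such as `Team` would label.

```python
import re

def _literal(words: str) -> str:
    """Turn plain words into a regex that tolerates any whitespace between them."""
    return r"\s+".join(re.escape(w) for w in words.split())

def capture_span(pattern: str, text: str):
    """Return (start, end, captured_text) for the [[ ... ]] part of a literal pattern.

    Hypothetical helper for illustration; returns None when the phrase is absent.
    """
    m = re.fullmatch(r"(.*?)\[\[(.*?)\]\](.*)", pattern)
    if m is None:
        raise ValueError("an NER pattern needs a [[ ... ]] capture group")
    before, captured, after = m.groups()
    # The surrounding words must match too, but only the capture group is reported.
    pieces = [_literal(before), f"(?P<cap>{_literal(captured)})", _literal(after)]
    regex = r"\s+".join(p for p in pieces if p)
    hit = re.search(regex, text)
    if hit is None:
        return None
    return hit.start("cap"), hit.end("cap"), hit.group("cap")

text = "He greeted all of the Red Sox players after the game."
span = capture_span("all of the [[ Red Sox ]] players", text)
print(span)
```

Note that the context words ("all of the", "players") decide whether the query matches at all, while the brackets decide which characters receive the label.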
When writing queries, Watchful can assist you with the autocomplete feature. As you type, the autocomplete window updates based on your cursor location and shows potential options to extend your query. For example, typing `[ pos` will show available
When the query box is empty, Watchful will display the previous queries you have typed.
While you are in the Query Helper window and no text is selected, Watchful will display a breakdown of each part of the query and its hits across your dataset. From here you can see how much each part of the query affects the number of results returned.
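The idea of a per-part breakdown can be sketched in a few lines of Python. This is an illustration only, assuming that each query part is a plain case-insensitive substring and that the dataset is a list of strings. The function name `hits_per_part` and the sample rows are made up, and Watchful's own matching is richer than this.

```python
def hits_per_part(parts, rows):
    """Count rows matching each part on its own, plus rows matching every part."""
    lowered = [row.lower() for row in rows]
    breakdown = {
        part: sum(part.lower() in row for row in lowered) for part in parts
    }
    # Rows that satisfy the full query, i.e. all parts at once.
    breakdown["<all parts>"] = sum(
        all(part.lower() in row for part in parts) for row in lowered
    )
    return breakdown

rows = [
    "The Red Sox won the game last night",
    "Red Sox players signed autographs",
    "The players union met on Tuesday",
    "Parliament passed the budget",
]
breakdown = hits_per_part(["Red Sox", "players"], rows)
print(breakdown)
```

Comparing the count for each part with the count for the full query shows which part narrows the results the most.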
Watchful has several visual cues to help you understand what your hinter is matching. Using these can help you explore your data and refine your hinter for optimal results.
Hinter matches are indicated by orange highlighting over the text:
Capture groups are shown as orange-highlighted text with a darker orange underline:
Watchful renders attributes as you type the entities that are present in the candidate table view. Visual "hats" appear above the candidate text and react to your cursor position in the query input box. For example, if you type `pos`, you will see all parts of speech present. You can refine your view by typing a specific part of speech, such as `PROPN`, and only the text that matches `PROPN` will display.
Full Text Classes
As you introduce more signal into Watchful, the platform is able to suggest hinters for you to try. In general, we encourage you to use the suggested hinters as much as possible, as they are predicted to be maximally impactful to Watchful's understanding of the class. That said, suggested hinters can be overly general at times, so it can be helpful to look through "specializations" of that hinter (by mousing over the suggestion) or to use it as a base for building a better hinter.
For NER classes, Watchful provides both hinter and hand label suggestions to speed up your workflow. To start getting suggestions, you must first hand label at least one kilobyte of data. See Hand Labeling for reference.
Once you've submitted some hand labels, you can navigate back to the Query page to start interacting with the Suggestion Engine.
For hand label suggestions, you can choose `Yes`, `No`, or `Skip`. Only choose `Yes` if the entire highlighted text belongs to the class; if something is missing or too much is highlighted, click `No`. You can skip if you don't want to pass judgment.
For hinter suggestions, you can choose `Try` or `Skip`. `Try` populates that suggestion in the query input for you to review and set an expectation (see the Expectation section below). You can click `Skip` if it doesn't look like a good suggestion, but bear in mind that you will also see negative hinter suggestions, which are worth checking in as well.
Expectations are the user's prediction of how correlated a query is with a class. A way to think about a hinter's expectation is: "Of all of the data this query may run on, I expect that X% is of the class in question, and (100-X)% is of some other class." This is meant to be an estimate. While your expectation has an impact on how Watchful interprets the class, you shouldn't spend a huge amount of time aiming for "perfection" in your expectations, as the model expects these to be purely heuristic.
As an example, `Red Sox` may be known to be highly correlated with the class `Sports` in the context of news articles, so you may enter an extremely high expectation for the hinter (something between 95% and 99%). You may also have the conviction that the term `Patriot` has a high correlation with the class `Sports` on the same type of data, but it might also have a strong correlation with another class (like `WorldPolitics`). You can account for this in two ways:
- Go through 1-2 pages of the query results manually to get a sense of what percentage of the returned candidates are actually positive for the class.
- Provide a gut estimate of what percentage of the data this hinter sees will be positive for the class.
In general, we suggest gut estimates where possible. If you manually count instances of a class in a dataset, your expectation may overfit to the specific dataset you worked with at the time. This makes it difficult to reuse hinters in the future (that is, to re-run them on future samples from the same data source).
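If you do take the manual route, the following Python sketch shows why a count from one or two pages is only a rough guide. It is not part of Watchful. It assumes you have recorded a yes/no judgment for each reviewed candidate, and it uses a standard Wilson score interval to show how wide the plausible range is for a small sample. The function name `expectation_from_review` and the sample counts are hypothetical.

```python
import math

def expectation_from_review(judgments, z=1.96):
    """Estimate an expectation (in percent) from manually reviewed query results.

    judgments: booleans, True when a reviewed candidate belongs to the class.
    Returns (point_estimate, low, high), where low/high come from a Wilson
    score interval at the confidence level implied by z (1.96 is about 95%).
    """
    judgments = list(judgments)
    n = len(judgments)
    if n == 0:
        raise ValueError("review at least one candidate")
    p = sum(judgments) / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return 100 * p, 100 * (center - half), 100 * (center + half)

# Hypothetical review: 40 candidates checked, 34 judged positive for the class.
point, low, high = expectation_from_review([True] * 34 + [False] * 6)
print(f"estimate {point:.0f}%, plausible range {low:.0f}% to {high:.0f}%")
```

With a sample this small, the plausible range spans roughly twenty percentage points, which is one reason a reasonable gut estimate is usually good enough.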
To view all your hinters, click the hinter list button in the upper left. Here you will find all hinters that have been created for the selected class. From here you can update a hinter's expectation or select a hinter to revisit or modify.
Watchful assisted expectation for Hinters
In the hinter list you will see two expectations: the current expectation and the empirical expectation. The empirical expectation is derived by Watchful from the hand labels in the dataset. In this way, the empirical suggestions are "smart" and will update as more hand labels are created.
You can update each empirical suggestion individually by clicking the arrow icon, or click the submit button to approve all.
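One plausible reading of an empirical expectation is the share of a hinter's hand-labeled matches that carry the target class. The Python sketch below works under that assumption. How Watchful actually computes the value is not documented here, and the `Hinter` class, the function names, and the sample labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Hinter:
    query: str
    current: float                     # expectation entered by the user, in percent
    empirical: Optional[float] = None  # derived from hand labels, in percent

def empirical_expectation(matched_ids, hand_labels, target_class):
    """Percent of hand-labeled matches with target_class, or None if none are labeled."""
    labeled = [hand_labels[i] for i in matched_ids if i in hand_labels]
    if not labeled:
        return None
    return 100 * sum(label == target_class for label in labeled) / len(labeled)

def accept_empirical(hinter):
    """Mimic approving a suggestion: copy the empirical value into the current one."""
    if hinter.empirical is not None:
        hinter.current = hinter.empirical
    return hinter

# Hypothetical data: candidate id -> hand label. Candidates 4 and 5 are unlabeled.
hand_labels = {1: "Sports", 2: "Sports", 3: "WorldPolitics", 7: "Sports"}
patriot = Hinter(query="Patriot", current=80.0)
patriot.empirical = empirical_expectation([1, 2, 3, 4, 5], hand_labels, "Sports")
accept_empirical(patriot)
print(round(patriot.current, 1))
```

Because only labeled matches count, the value shifts as more hand labels arrive, which matches the behavior described above.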
- Keep your hinters small. The more atomic and overlapping hinters you have, the more signal Watchful has to work with.
- Use hinter suggestions as much as you can. They speed up your workflow and allow you to create many more hinters than you otherwise would be able to manually.
- Use your gut where possible. By applying an expectation that "feels" right based on your understanding of your classes, you can build robust hinters that allow you to repeatedly create high-quality training data from future samples.