Querying
Queries
Watchful supports a rich query syntax that lets you explore your data, create heuristics, and analyze your labels. The table updates in real time as you type a query, keeping you in a tight loop with your data. Note: Watchful uses the Rust regex library, so any regex in the query language is limited to what that library supports. By design, the library omits constructs for which only backtracking implementations are known; in particular, backreferences and look-around assertions are not supported.
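Because these constructs are rejected, it can help to screen a pattern before using it in a query. The Python sketch below is a hypothetical helper, not part of Watchful. It is a heuristic that only looks for numbered and named (`\k<name>`) backreferences and look-around groups, and it does not fully parse regex syntax (for example, it ignores edge cases such as a `]` placed first inside a character class).

```python
# Look-around group openers, which the Rust regex library rejects.
LOOKAROUND_OPENERS = ("(?=", "(?!", "(?<=", "(?<!")


def unsupported_regex_features(pattern: str) -> list:
    """Heuristically list constructs in `pattern` that the Rust regex library rejects.

    Hypothetical helper (not part of Watchful). It flags numbered (\\1 to \\9) and
    named (\\k<name>) backreferences plus look-around groups. Escaped characters
    are skipped, and nothing inside a [...] character class is flagged.
    """
    found = []
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if not in_class and (nxt in "123456789" or pattern.startswith("k<", i + 1)):
                found.append("backreference")
            i += 2  # skip the escaped character entirely
            continue
        if in_class:
            if ch == "]":
                in_class = False  # end of character class
        elif ch == "[":
            in_class = True  # start of character class
        elif pattern.startswith(LOOKAROUND_OPENERS, i):
            found.append("look-around")
        i += 1
    return found


print(unsupported_regex_features(r"(\w+) \1"))    # ['backreference']
print(unsupported_regex_features(r"foo(?=bar)"))  # ['look-around']
print(unsupported_regex_features(r"abc|123"))     # []
```

An empty list only means the heuristic found nothing; the Rust regex library remains the final authority on whether a pattern is accepted.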


Implicit ROWS Queries
Name | Example | Description |
---|---|---|
Regex | | Matches the regular expression between the slashes against the data selected by the surrounding query syntax. A regex without surrounding syntax matches text within each cell of the dataset. The … |
Bare word | | Any candidate that has … |
Hint | | Finds data that is matched by the hinter with the specified ID. |
Hinted | | Finds data matched by any hinter. |
Label | | Finds data whose plabel (short for "probabilistic label") for the specified class satisfies the numerical comparison. |
Hand label | | Finds data that has been hand-labeled positively or negatively for the specified class. |
Column | | Finds data in the given field (a column from your source dataset) that matches the comparison or regex. |
Negation | | Finds data that does not match the predicate following the exclamation point (!). |
Boolean operators | | Finds data that matches both predicates or either predicate. |
Attributes | | Any token, in any column, that is longer than 5 characters. |
Numeric | | Any candidate where the column … |
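Negation and the Boolean operators behave like set operations on the matched data. The exact operator syntax (beyond `!` for negation) is not reproduced in the table above, so the Python sketch below models only the semantics, as a toy illustration rather than Watchful's implementation. The helper names `regex`, `negate`, `both`, `either`, and `select` are hypothetical, and treating a row as matched when any of its cells matches is an assumption.

```python
import re
from typing import Callable, Dict, List

Row = Dict[str, str]
Predicate = Callable[[Row], bool]


def regex(pattern: str) -> Predicate:
    """Bare regex (toy model): true when any cell in the row contains a match."""
    rx = re.compile(pattern)
    return lambda row: any(rx.search(cell) for cell in row.values())


def negate(p: Predicate) -> Predicate:
    """Negation: keep rows the inner predicate does NOT match."""
    return lambda row: not p(row)


def both(p: Predicate, q: Predicate) -> Predicate:
    """AND: keep rows matched by both predicates."""
    return lambda row: p(row) and q(row)


def either(p: Predicate, q: Predicate) -> Predicate:
    """OR: keep rows matched by at least one predicate."""
    return lambda row: p(row) or q(row)


def select(rows: List[Row], p: Predicate) -> List[int]:
    """Return the indices of the rows that satisfy the predicate."""
    return [i for i, row in enumerate(rows) if p(row)]


rows = [
    {"text": "refund requested"},
    {"text": "refund denied"},
    {"text": "order shipped"},
]

print(select(rows, negate(regex(r"refund"))))                         # [2]
print(select(rows, both(regex(r"refund"), negate(regex(r"denied")))))  # [0]
print(select(rows, either(regex(r"denied"), regex(r"shipped"))))       # [1, 2]
```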
Advanced Usage
Index & Context 101
Every query in Watchful has a context, or query unit, whether explicit or implicit. The default context for every query is `ROWS`, meaning every query operates at the row level unless you specify otherwise. The table above outlines `ROWS` queries. For example, `/abc|123/i` is equivalent to `ROWS: /abc|123/i`: both queries simply match each row against the regex `/abc|123/i`.
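As a rough mental model, the context decides which unit a match is reported for. The toy Python sketch below is an illustration under assumptions, not Watchful's implementation: it treats a row as matching when any of its cells matches (an assumption about `ROWS` semantics), writes `/abc|123/i` with Python's inline `(?i)` flag, and contrasts row-level hits with the cell-level `CELLS` context introduced next. The dataset and function names are hypothetical.

```python
import re

# A tiny hypothetical dataset: each row maps column names to cell text.
DATA = [
    {"title": "Order ABC", "notes": "shipped"},
    {"title": "Refund", "notes": "ticket 123"},
    {"title": "abc kit", "notes": "batch 123"},
    {"title": "Misc", "notes": "nothing here"},
]

# /abc|123/i written with Python's inline case-insensitive flag.
PATTERN = re.compile(r"(?i)abc|123")


def match_rows(rows, rx):
    """ROWS context (toy model): one hit per row that has a match in any cell."""
    return [i for i, row in enumerate(rows) if any(rx.search(cell) for cell in row.values())]


def match_cells(rows, rx):
    """CELLS context (toy model): one hit per individual (row, column) cell that matches."""
    return [
        (i, col)
        for i, row in enumerate(rows)
        for col, cell in row.items()
        if rx.search(cell)
    ]


print(match_rows(DATA, PATTERN))   # [0, 1, 2]
print(match_cells(DATA, PATTERN))  # [(0, 'title'), (1, 'notes'), (2, 'title'), (2, 'notes')]
```

Row 2 shows the difference: it is a single hit at the row level but two separate hits at the cell level.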
You can change the context from `ROWS` to `CELLS` (think of a cell in an Excel spreadsheet) by writing the query as `CELLS: /abc|123/i`. Now the query matches each cell against the regex `/abc|123/i`. Changing the context is a powerful way to scope queries so that they match data at different granularities.

For example, `SENTS TOKS: /^[A-Z]/` matches any token that starts with a capital letter. Leveling that up, `SENTS TOKS: /^[A-Z]/+` matches any contiguous run of tokens that start with capital letters (i.e., anything that is title-cased). You can take it one step further with `SENTS TOKS: /^[A-Z]/+ inc`, which matches any sequence of title-cased tokens that ends in "inc". The space between `/^[A-Z]/+` and `inc` tells Watchful that the sequence should end with the token `Inc`. As you might have guessed, this query matches certain kinds of company names (e.g., `Watchful Inc.` or `Acme Studios Inc.`).
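The trailing `inc` works even though `Inc` itself starts with a capital letter because `+` behaves like a regex quantifier: in regex terms, the `/^[A-Z]/+` run must leave the final `Inc` for the `inc` constraint. The Python sketch below is a toy model, not Watchful's engine, and rests on several assumptions: a simple tokenizer, bare words compared case-insensitively against whole tokens, and leftmost, non-overlapping, greedy matches within a single sentence. All function names are hypothetical.

```python
import re
from typing import Callable, List, Optional, Tuple

# A step is a (constraint, quantifier) pair; the quantifier is "" (exactly one) or "+".
Constraint = Callable[[str], bool]
Step = Tuple[Constraint, str]


def regex(pattern: str) -> Constraint:
    """/pattern/: true when the regex matches somewhere inside the token."""
    rx = re.compile(pattern)
    return lambda tok: rx.search(tok) is not None


def bare_word(word: str) -> Constraint:
    """Bare word (assumption): the whole token equals the word, ignoring case."""
    return lambda tok: tok.lower() == word.lower()


def match_at(tokens: List[str], steps: List[Step], pos: int) -> Optional[int]:
    """Return the end index of a match of `steps` starting at `pos`, or None."""
    if not steps:
        return pos
    constraint, quant = steps[0]
    if quant == "+":
        # Take as many matching tokens as possible, then give them back one at a
        # time until the remaining steps can match (regex-style greedy matching).
        end = pos
        while end < len(tokens) and constraint(tokens[end]):
            end += 1
        for stop in range(end, pos, -1):
            result = match_at(tokens, steps[1:], stop)
            if result is not None:
                return result
        return None
    if pos < len(tokens) and constraint(tokens[pos]):
        return match_at(tokens, steps[1:], pos + 1)
    return None


def find_sequences(tokens: List[str], steps: List[Step]) -> List[List[str]]:
    """Scan left to right, collecting non-overlapping matches within one sentence."""
    matches, pos = [], 0
    while pos < len(tokens):
        end = match_at(tokens, steps, pos)
        if end is not None and end > pos:
            matches.append(tokens[pos:end])
            pos = end
        else:
            pos += 1
    return matches


def tokenize(sentence: str) -> List[str]:
    """Toy tokenizer: runs of word characters, plus each punctuation mark on its own."""
    return re.findall(r"\w+|[^\w\s]", sentence)


# Toy equivalent of SENTS TOKS: /^[A-Z]/+ inc
QUERY = [(regex(r"^[A-Z]"), "+"), (bare_word("inc"), "")]

sentence = "We met Acme Studios Inc. and Watchful Inc. today."
print(find_sequences(tokenize(sentence), QUERY))
# [['Acme', 'Studios', 'Inc'], ['Watchful', 'Inc']]
```

Note that `We` starts the sentence with a capital letter but produces no match, because no `inc` token follows it.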
Spaces separate query predicates that act on successive units at the level of the given context. `TOKS: /ABC/ /123/` matches any two contiguous tokens where the first contains `ABC` and the second contains `123`. If the query were `SENTS: /ABC/ /123/`, it would instead match any two contiguous sentences where the first contains `ABC` and the second contains `123`.
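The same two-step logic applies whichever unit you choose; only the list of units changes. The Python sketch below illustrates this under loudly stated assumptions: naive whitespace tokenization and splitting sentences on periods stand in for Watchful's own tokenizer and sentence splitter, which are not described here, and `contiguous_pairs` is a hypothetical helper.

```python
import re
from typing import List, Tuple


def contiguous_pairs(units: List[str], first: str, second: str) -> List[Tuple[str, str]]:
    """Return each pair of adjacent units where the first matches `first`
    and the one right after it matches `second`.

    The logic is the same whether `units` holds tokens or sentences.
    """
    rx1, rx2 = re.compile(first), re.compile(second)
    return [
        (units[i], units[i + 1])
        for i in range(len(units) - 1)
        if rx1.search(units[i]) and rx2.search(units[i + 1])
    ]


text = "Part ABC 123 arrived. ABC was late. Ticket 123 is closed."

# Toy stand-in for TOKS: /ABC/ /123/ (naive whitespace tokens).
tokens = text.split()
print(contiguous_pairs(tokens, r"ABC", r"123"))
# [('ABC', '123')]

# Toy stand-in for SENTS: /ABC/ /123/ (naive split on periods).
sentences = [s for s in re.split(r"\.\s*", text) if s]
print(contiguous_pairs(sentences, r"ABC", r"123"))
# [('ABC was late', 'Ticket 123 is closed')]
```

In this model the first sentence contains both `ABC` and `123` yet is not part of a `SENTS` match, because the two predicates must hold on two contiguous sentences and the sentence after it has no `123`.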
Refer to the table below for more information about query units.
Explicit Sequence Unit Queries
Name | Example | Definition |
---|---|---|
Query unit | | The query unit determines the element matched against the query, such as sentences (SENTS), entire rows (ROWS), single columns within a row (CELLS), or single tokens (TOKS). |
Sequence unit | | The sequence unit lets you query with a sequence of constraints that match successive elements of a given type, such as tokens, within the query unit; for example, matching successive tokens within a single sentence by … |
Constraint | | Constraints follow the colon after the query and sequence units. Each constraint applies to one sequence unit (or to the query unit if no sequence unit is specified). A constraint is one of the following: … |
Compound constraint | | Multiple constraints can be combined by writing them without a space between them. The sequence unit matches only if all of the combined constraints match on it (their intersection). |
Quantifier | | A constraint can match a variable number of sequence units by using a quantifier. Quantifiers go at the end of the constraint, with no space, and have the same meaning as regex quantifiers: … |
Constraint sequence | | A sequence of constraints that matches consecutive sequence units (or query units, if no sequence unit is specified) that satisfy the constraints in order. |
Bracket expression | | A [bracket expression] can define a metadata constraint or a nested query. A metadata constraint is defined as: … A nested query is defined as: … |
Capture group | | A capture group surrounds one or more constraints with double brackets. Notice the space between the brackets and the constraint in the examples. NER hinters use capture groups: the entities captured by the group are the data whose plabels are affected. Consider this sentence: … |
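The difference between a compound constraint and a constraint sequence comes down to whether the constraints share one unit or apply to successive units. In the toy Python model below (an illustration under assumptions, not Watchful's parser), each constraint is a predicate on a single token; `regex`, `compound`, and `sequence_matches` are hypothetical helper names, and for simplicity the sequence matcher reports every window, including overlapping ones.

```python
import re
from typing import Callable, List, Tuple

Predicate = Callable[[str], bool]


def regex(pattern: str) -> Predicate:
    """Constraint that holds when the regex finds a match inside the unit."""
    rx = re.compile(pattern)
    return lambda unit: rx.search(unit) is not None


def compound(*constraints: Predicate) -> Predicate:
    """Compound constraint: every part must hold on the *same* unit (intersection)."""
    return lambda unit: all(c(unit) for c in constraints)


def sequence_matches(units: List[str], constraints: List[Predicate]) -> List[Tuple[str, ...]]:
    """Constraint sequence: successive units must satisfy the constraints in order."""
    n = len(constraints)
    return [
        tuple(units[i:i + n])
        for i in range(len(units) - n + 1)
        if all(c(u) for c, u in zip(constraints, units[i:i + n]))
    ]


tokens = ["Acme", "Inc", "incoming", "Prince", "increase"]
capitalized, has_inc = regex(r"^[A-Z]"), regex(r"(?i)inc")

# Compound (no space between constraints): one token must satisfy both.
compound_hits = [t for t in tokens if compound(capitalized, has_inc)(t)]
# Sequence (space between constraints): a capitalized token followed by a token containing "inc".
sequence_hits = sequence_matches(tokens, [capitalized, has_inc])

print(compound_hits)  # ['Inc', 'Prince']
print(sequence_hits)  # [('Acme', 'Inc'), ('Inc', 'incoming'), ('Prince', 'increase')]
```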
Common Query Types
Example | Definition |
---|---|
| | False negatives: any candidate that has a positive hand label but for which Watchful currently predicts a probability below 10% for the class. |
| | False positives: any candidate that has a negative hand label but for which Watchful currently predicts a probability above 90% for the class. |
| | Any token that contains … |
| | An information extraction query that matches any sequence of title-cased tokens that ultimately ends in … |
| | Any token that contains … |
| | Any token that has a specified attribute and matches the regex. |
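The two audit queries at the top of this table boil down to comparing hand labels with plabels. The Python sketch below models them over an in-memory list using the 10% and 90% cut-offs from the table; the `Candidate` record and its fields are hypothetical stand-ins for whatever Watchful stores internally, not its API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    text: str
    hand_label: Optional[bool]  # True = positive, False = negative, None = not hand-labeled
    plabel: float               # predicted probability for the class, between 0.0 and 1.0


def false_negatives(cands: List[Candidate], cutoff: float = 0.10) -> List[Candidate]:
    """Hand-labeled positive, yet the predicted probability is below the cutoff."""
    return [c for c in cands if c.hand_label is True and c.plabel < cutoff]


def false_positives(cands: List[Candidate], cutoff: float = 0.90) -> List[Candidate]:
    """Hand-labeled negative, yet the predicted probability is above the cutoff."""
    return [c for c in cands if c.hand_label is False and c.plabel > cutoff]


cands = [
    Candidate("a", True, 0.05),   # positive label, low prediction -> false negative
    Candidate("b", True, 0.95),   # agrees with the hand label
    Candidate("c", False, 0.97),  # negative label, high prediction -> false positive
    Candidate("d", None, 0.02),   # never hand-labeled, so neither list includes it
    Candidate("e", False, 0.50),  # uncertain, but inside both cut-offs
]

print([c.text for c in false_negatives(cands)])  # ['a']
print([c.text for c in false_positives(cands)])  # ['c']
```

Unlabeled candidates are excluded by design: without a hand label there is nothing for the prediction to disagree with.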