Watchful supports a rich query syntax that lets you explore your data, create heuristics, and analyze your labels. The table updates in real time as you type your query, keeping you in a tight feedback loop with your data.

Note: Watchful uses the Rust regex library, so any regex in the query language is limited to what that library supports. By design, the library omits constructs for which only backtracking implementations are known. This means that backreferences and look-around assertions are not supported.


Implicit ROWS Queries

A regular expression matches the pattern between its slashes (for example, /regex/i) against the data determined by the surrounding syntax of the query. A regex without surrounding syntax matches text within each cell of the dataset.

The i flag at the end of the regex makes the match case-insensitive.
Bare word
abc123
Any candidate that contains abc123 as a whole word, in any letter case. Equivalent to /\babc123\b/i but more performant.
Hint
hint 42
Finds data matched by the hinter with the specified id.
Hinted
hinted
Finds data matched by any hinter.
Label
label Class > 90
label Class < 90
label Class = 90
label Class != 90
label Class <= 90
label Class >= 90
Finds data that has a plabel* for the specified class that matches the numerical comparison.

*plabel is short for “probabilistic label”
Hand label
handlabel pos Class
handlabel neg Class
Finds data that’s been hand labeled positively or negatively for the specified class.
Column
Field ~ /regex/
Field < 42
Field > 42
Field <= 42
Field >= 42
Field = "string"
Field != "string"
Finds data in the Field* that matches the comparison or regex.

* The field is the column from your source dataset
!hint 42
!label Class > 90
!handlabel pos Class
!Field ~ /regex/
Finds the data that does not match the predicate after the exclamation point.
Boolean operators
/regex/ && hinted
/regex/ || hinted
Finds data that matches both predicates (&&) or either predicate (||).
Attributes
TOKS: [len > 5]
Any token in any column that is longer than 5 characters.
Numeric
FieldName < 20
FieldName > 3.14
Any candidate where the column FieldName is of a numeric type (number, float, or exponential) and is less than (<) or greater than (>) the given value.
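
As a rough illustration of the bare word row above, the sketch below applies the equivalent regex /\babc123\b/i to a few made-up rows. It uses Python's re module purely as a stand-in for the Rust regex engine (both support \b and case-insensitive matching). The row data, the row_matches helper, and the idea that a row matches when any of its cells matches are assumptions for illustration, not Watchful's implementation.

```python
import re

# Stand-in for the query /\babc123\b/i (assumption: Python's `re` behaves like
# the Rust regex engine for word boundaries and case-insensitive matching).
BARE_WORD = re.compile(r"\babc123\b", re.IGNORECASE)


def row_matches(row):
    """Return True if any cell in the row contains abc123 as a whole word."""
    return any(BARE_WORD.search(str(cell)) for cell in row.values())


# Hypothetical rows, invented for this example.
rows = [
    {"id": 1, "note": "Order ABC123 shipped"},
    {"id": 2, "note": "xabc123y is not a whole word"},
]

matching_ids = [row["id"] for row in rows if row_matches(row)]
```

Only the first row is selected: the second contains abc123 only inside a longer word, so the word boundaries do not match.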

Advanced Usage

Index & Context 101

Every query in Watchful has a context, or query unit, whether explicit or implicit. The default context is ROWS, meaning every query operates at the row level unless you specify otherwise. The table above outlines ROWS queries. For example:
/abc|123/i is equivalent to ROWS: /abc|123/i. Both queries match each row against the regex /abc|123/i.

You can change the context from ROWS to CELLS (think of a cell in an Excel spreadsheet) by writing the query as CELLS: /abc|123/i. The query now matches each cell against the regex /abc|123/i. Changing the context lets you scope queries to match data at different granularities. As an example:

SENTS TOKS: /^[A-Z]/ matches any token that starts with a capital letter. Extending it to SENTS TOKS: /^[A-Z]/+ matches any contiguous run of tokens that each start with a capital letter (i.e., anything that is title-cased). Going one step further, SENTS TOKS: /^[A-Z]/+ inc matches any sequence of title-cased tokens that ends in the token inc. The space between /^[A-Z]/+ and inc tells Watchful that inc is a separate token that must end the sequence; because bare words ignore case, it also matches Inc. As you might have guessed, this query matches certain kinds of company names (e.g., Watchful Inc. or Acme Studios Inc.).
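
To make the behavior of SENTS TOKS: /^[A-Z]/+ inc more concrete, here is a minimal Python sketch that emulates it on a single sentence. It is not Watchful's implementation: the whitespace tokenizer, the find_company_names helper, and the use of Python's re module in place of the Rust regex engine are all assumptions made for illustration.

```python
import re

CAPITALIZED = re.compile(r"^[A-Z]")            # stands in for /^[A-Z]/
INC = re.compile(r"\binc\b", re.IGNORECASE)    # stands in for the bare word inc


def find_company_names(sentence):
    """Find runs of capitalized tokens that are followed by an 'inc' token."""
    tokens = sentence.split()  # assumption: naive whitespace tokenization
    matches = []
    i = 0
    while i < len(tokens):
        # Greedily extend a run of capitalized tokens starting at i.
        j = i
        while j < len(tokens) and CAPITALIZED.search(tokens[j]):
            j += 1
        # Shrink the run until the token right after it matches "inc".
        end = None
        for k in range(j, i, -1):
            if k < len(tokens) and INC.search(tokens[k]):
                end = k
                break
        if end is not None:
            matches.append(" ".join(tokens[i:end + 1]))
            i = end + 1
        else:
            i += 1
    return matches


names = find_company_names("She joined Acme Studios Inc. last year")
```

The shrinking step matters because Inc itself starts with a capital letter: a greedy run of title-cased tokens swallows it, so the run has to give it back for the final inc constraint to match.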

Spaces separate query predicates, which then act on sequences at the level of the given context. TOKS: /ABC/ /123/ matches any two contiguous tokens where the first contains ABC and the second contains 123. If the query were SENTS: /ABC/ /123/, it would match any two contiguous sentences where the first contains ABC and the second contains 123.
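
The following Python sketch illustrates how the same two-predicate sequence behaves under different contexts. It is only an emulation under stated assumptions: tokens are split on whitespace, sentences are split naively on a period followed by a space, and adjacent_matches is a hypothetical helper, not part of Watchful.

```python
import re


def adjacent_matches(units, first, second):
    """Return every pair of neighboring units matching `first` then `second`."""
    return [
        (a, b)
        for a, b in zip(units, units[1:])
        if re.search(first, a) and re.search(second, b)
    ]


text = "Part ABC-7 x123 arrived. Batch ABC passed. Lot 123 failed."

# TOKS-like context: units are whitespace-separated tokens (an assumption).
tokens = text.split()
token_pairs = adjacent_matches(tokens, r"ABC", r"123")

# SENTS-like context: units are sentences, split naively on ". " (an assumption).
sentences = text.split(". ")
sentence_pairs = adjacent_matches(sentences, r"ABC", r"123")
```

With tokens as the unit, the only match is the neighboring pair ABC-7 and x123. With sentences as the unit, the match is the second and third sentences, even though ABC and 123 are far apart as tokens.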

Refer to the table below for more information about query units.

Explicit Sequence Unit Queries

Query unit
TOKS:
The query unit determines the element that will be matched against the query, such as sentences (SENTS), entire rows (ROWS), single columns within the row (CELLS), or single tokens (TOKS).
Sequence unit
ROWS CELLS:
The sequence unit lets you query by a sequence of constraints that match successive elements of a given type, such as tokens, within the query unit. For example, SENTS TOKS: matches successive tokens within a single sentence.
Constraint
TOKS: token
TOKS: /regex/
TOKS: "string"
TOKS: [attr name]

CELLS: /some cell/
CELLS: "entire cell"
CELLS: [attr name]
Constraints follow the colon after the query and sequence units. They are applied to one sequence unit (or to the query unit if no sequence unit is specified). A constraint is one of the following:
- Regex
- Bareword
- String
- Dot
- BracketExpression
Compound constraint
TOKS: [text a]/xyz/

CELLS: [column A][len > 9]
TOKS: [!text foo][hinted][!hint 42]
Multiple constraints can be combined by writing them next to each other with no space in between. The sequence unit matches only if all of the constraints match it, i.e., the result is the intersection of the constraints.
Quantifier
TOKS: token+
TOKS: token*
TOKS: token?

TOKS: /regex/+
TOKS: "and"*
TOKS: first .* last
SENTS TOKS: [len > 5]+
A constraint can match a variable number of sequence units by using a quantifier. The quantifiers go on the end of the constraint without a space character and have the same definition as regex quantifiers:
- ? means zero or one (or optional)
- * means zero or more (any number of times)
- + means one or more (any number of times, but at least once).
Constraint sequence
TOKS: token /regex/

ROWS SENTS: /1st/ /2nd/ /3rd/
TOKS: token+ /regex/? token
SENTS: [len < 10] .+ [len < 10]
A sequence of constraints matches when consecutive items in the sequence unit (or the query unit if no sequence unit is specified) match the constraints in order.
Bracket expression
TOKS: [attr val]
TOKS: [len < 3]
TOKS: [case title]
TOKS: [!case title]
TOKS: [text "token"]
TOKS: [!text /token/]
TOKS: [column ColumnA]
TOKS: [!column /regex/i]
SENTS: [TOKS: a || b][hinted]
A [bracket expression] can be used to define metadata constraints or a nested query.

A metadata constraint is defined as:
[name op? value]

A nested query is defined as:
[query], where query can be any valid query.
Capture group
TOKS: [[ token ]]
TOKS: token [[ . ]]
TOKS: token [[ . . ]]

TOKS: work for [[ .+ Inc ]]
SENTS TOKS: [[ /regex/ .? ]] /a/
A capture group surrounds one or many constraints with double brackets. Notice the space between the brackets and the constraint in the examples.

The capture group is used by NER hinters: the entities captured by the group are the data whose plabels will be affected.

Consider this sentence:
I like grilled asparagus for dinner
with the query:
SENTS TOKS: like [[ grilled . ]] for dinner
It would match and display the whole sentence, and capture “grilled asparagus”, which we could use in an NER hinter if we want to create entities matching the query.
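
The sentence and query above can be emulated with a short Python sketch. It is only a sketch under assumptions: tokens are split on whitespace, bare words are compared case-insensitively, None plays the role of the dot (any token), and match_with_capture is a hypothetical helper rather than anything Watchful exposes.

```python
# Emulates: SENTS TOKS: like [[ grilled . ]] for dinner
PATTERN = ["like", "grilled", None, "for", "dinner"]  # None stands for the dot
CAPTURE = slice(1, 3)  # positions inside the [[ ... ]] capture group


def match_with_capture(sentence):
    """Return the captured tokens if the pattern occurs in the sentence, else None."""
    tokens = sentence.split()  # assumption: naive whitespace tokenization
    for start in range(len(tokens) - len(PATTERN) + 1):
        window = tokens[start:start + len(PATTERN)]
        if all(p is None or p == t.lower() for p, t in zip(PATTERN, window)):
            return " ".join(window[CAPTURE])
    return None


captured = match_with_capture("I like grilled asparagus for dinner")
```

The whole five-token window has to match for the sentence to be selected, but only the two tokens inside the capture group are returned as the entity.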

Common Query Types

handlabel pos CLASS && label CLASS < 10
False negatives: any candidate that has a positive hand label but for which Watchful currently predicts a probability of less than 10% for the class.
handlabel neg CLASS && label CLASS > 90
False positives: any candidate that has a negative hand label but for which Watchful currently predicts a probability greater than 90% for the class.
TOKS: /abc123/[column COL]
Any token that contains abc123 and is in the column COL.
SENTS TOKS: [[ /^[A-Z]/+ ]] inc
An information extraction query that matches any sequence of title-cased tokens that ultimately ends in Inc, while explicitly capturing only the title-cased tokens.
SENTS TOKS: /abc123/[len > 10][len < 15]
Any token that contains abc123 and is both longer than 10 characters and shorter than 15 characters.
TOKS: [attr /regex/]
Any token that has a specified attribute and matches the regex.
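
The first two queries above amount to simple filters over hand labels and plabels. The Python sketch below shows that logic on invented data; the candidates list, its field names, and the representation of a plabel as a percentage are assumptions for illustration, not Watchful's data model.

```python
# Hypothetical candidates: a hand label ("pos", "neg", or None) and a plabel
# expressed as a percentage for one class.
candidates = [
    {"id": 1, "handlabel": "pos", "plabel": 4},
    {"id": 2, "handlabel": "pos", "plabel": 88},
    {"id": 3, "handlabel": "neg", "plabel": 95},
    {"id": 4, "handlabel": None, "plabel": 97},
]

# Emulates: handlabel pos CLASS && label CLASS < 10
false_negatives = [
    c["id"] for c in candidates
    if c["handlabel"] == "pos" and c["plabel"] < 10
]

# Emulates: handlabel neg CLASS && label CLASS > 90
false_positives = [
    c["id"] for c in candidates
    if c["handlabel"] == "neg" and c["plabel"] > 90
]
```

Candidate 4 appears in neither list: it has a high plabel but no hand label, so there is nothing for the prediction to disagree with.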
