Datasets
The Settings page allows you to import, replace, and export datasets for use inside your project. You can navigate to the Settings page by clicking the gear icon in the top right corner of the project.
Data Format
Watchful supports only CSV data whose column names start with a capital letter and contain only alphanumeric characters and underscores.
- Files must be in .csv format with a header row
- Column names must start with a capital letter
- Watchful Community Edition has a 5MB import limit
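If you want to catch format problems before importing, a local pre-check can help. The following is a minimal Python sketch, not a Watchful tool: `check_csv` and `invalid_columns` are hypothetical helper names, and reading "5MB" as 5 × 1024 × 1024 bytes is an assumption, since this page does not state the exact byte count.

```python
import csv
import os
import re

# Column-name rule from the Data Format section: a capital letter first,
# then only letters, digits, or underscores.
COLUMN_NAME = re.compile(r"[A-Z][A-Za-z0-9_]*")

# Assumption: "5MB" is interpreted as 5 * 1024 * 1024 bytes.
COMMUNITY_LIMIT_BYTES = 5 * 1024 * 1024


def invalid_columns(header):
    """Return the column names that break the column-name rule."""
    return [name for name in header if not COLUMN_NAME.fullmatch(name)]


def check_csv(path, community_edition=False):
    """Hypothetical pre-import check; returns a list of problems found."""
    problems = []
    if not path.lower().endswith(".csv"):
        problems.append("file does not have a .csv extension")
    if community_edition and os.path.getsize(path) > COMMUNITY_LIMIT_BYTES:
        problems.append("file exceeds the 5MB Community Edition limit")
    with open(path, newline="", encoding="utf-8") as f:
        # The first row is treated as the header; an empty file has none.
        header = next(csv.reader(f), [])
    if not header:
        problems.append("missing header row")
    bad = invalid_columns(header)
    if bad:
        problems.append(f"invalid column names: {bad}")
    return problems
```

For example, `check_csv("my_data.csv", community_edition=True)` returns an empty list when no problems are found. Note that the sketch cannot tell whether the first row is really a header or a data row; it only checks that its values look like valid column names.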
Dataset Import Choices
Watchful currently supports importing .csv files either locally from disk or via our S3 integration, which is available with Watchful Scale.
Local CSV File
Once you select the file, Watchful shows a preview of how your dataset will look when loaded into the project. If the data looks as expected, confirm the import by clicking the "Import" button in the top right.
S3 Import
Watchful supports integration with any S3 API-compatible storage via Watchful Hub (see Setting up Watchful Hub). To import a CSV file from an existing bucket, you will need:
- Watchful Hub set up with appropriate AWS credentials.
- To be logged in to the Application (see Watchful Hub Overview and User Roles)
- The name of the bucket and the path to the file.
For the S3 URI s3://meerkat-manor/season-1/cast.csv:
- The S3 Bucket would be meerkat-manor
- The S3 Path would be season-1/cast.csv
- Note: Do not include the leading / in either field
To begin, select Import from S3, then paste the bucket name and file path into the S3 Bucket and S3 Path fields under Add S3 file to project.
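If you are starting from a full S3 URI, the two field values can be derived mechanically. This is a minimal Python sketch; `split_s3_uri` is a hypothetical helper, not part of Watchful. It splits on the first "/" after the bucket name and drops any leading "/" from the path, matching the rule above.

```python
S3_PREFIX = "s3://"


def split_s3_uri(uri):
    """Split an s3:// URI into the S3 Bucket and S3 Path field values."""
    if not uri.startswith(S3_PREFIX):
        raise ValueError(f"not an s3:// URI: {uri!r}")
    # Everything before the first "/" is the bucket; the rest is the path.
    bucket, _, path = uri[len(S3_PREFIX):].partition("/")
    # Neither field should start with "/".
    path = path.lstrip("/")
    if not bucket or not path:
        raise ValueError(f"URI must include a bucket and a path: {uri!r}")
    return bucket, path


bucket, path = split_s3_uri("s3://meerkat-manor/season-1/cast.csv")
print(bucket)  # meerkat-manor
print(path)    # season-1/cast.csv
```

Plain string splitting is used instead of a URL parser so that object keys containing characters such as "?" or "#" are kept intact in the path.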
Replacing a Dataset
You can replace the existing dataset in this project by selecting a new dataset to import. Replacing a dataset permanently removes the current file from your project; existing Hinters and Hand Labels remain in the project. Each time you replace a dataset, you will need to hand label again via the Hand Label tab to recalibrate the base rate, so that Watchful gets an accurate representation of the classes in the new dataset. For more on how hand labels are used within Watchful, see Hand Labeling.
After replacing a dataset, you will have to:
- Recalibrate your base rates by hand labeling.
- Edit any hinters that rely on specific attributes of the replaced dataset, if needed.
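One way to find hinters that may need editing is to check which columns disappear between the old and new datasets. The following Python sketch assumes that "attributes" include column names; `read_header` and `removed_columns` are hypothetical helpers, and the sketch does not inspect your hinters, it only lists columns that any hinter referencing them would no longer find.

```python
import csv


def read_header(path):
    """Return the header row (column names) of a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f), [])


def removed_columns(old_header, new_header):
    """Columns present in the old dataset but missing from the new one."""
    new = set(new_header)
    return [name for name in old_header if name not in new]
```

For example, `removed_columns(read_header("old.csv"), read_header("new.csv"))` lists the columns to look for in your hinters before you replace the dataset.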
Importing Hand Labels
Watchful supports importing hand labels from a column named exactly HandLabels, written in the specific format Watchful expects: "Class1-Y" or "Class1-N" for a single-class label, and "Class1-Y Class2-N Class3-Y", with each class label separated by a space, for multi-class labels.
Class names must follow this format:
- Class names must start with an uppercase letter
- Can only contain alphanumeric characters or underscores (_)
- Cannot exceed 32 characters
In addition, every Y and N must be uppercase.
This is also the format Watchful uses when exporting hand labels as part of the Exporting process.
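The format rules above can be captured in a pair of helpers for building and checking HandLabels cells. This is a minimal Python sketch under the rules stated in this section, not Watchful's own parser: `format_hand_labels` and `parse_hand_labels` are hypothetical names, and mapping `True`/`False` to Y/N is a choice made for illustration.

```python
import re

# Class-name rules: an uppercase first letter, then only letters, digits,
# or underscores, with at most 32 characters in total.
CLASS_NAME = re.compile(r"[A-Z][A-Za-z0-9_]{0,31}")


def format_hand_labels(labels):
    """Build a HandLabels cell from a {class_name: True/False} dict."""
    parts = []
    for name, is_positive in labels.items():
        if not CLASS_NAME.fullmatch(name):
            raise ValueError(f"invalid class name: {name!r}")
        parts.append(f"{name}-{'Y' if is_positive else 'N'}")
    # Multiple class labels are separated by a single space.
    return " ".join(parts)


def parse_hand_labels(cell):
    """Parse a cell such as "Class1-Y Class2-N" into a dict of booleans."""
    labels = {}
    # split() also tolerates repeated whitespace; Watchful may be stricter.
    for token in cell.split():
        # Class names cannot contain "-", so the last "-" separates name and Y/N.
        name, sep, flag = token.rpartition("-")
        if not sep or flag not in ("Y", "N") or not CLASS_NAME.fullmatch(name):
            raise ValueError(f"malformed hand label: {token!r}")
        labels[name] = flag == "Y"
    return labels
```

For example, `format_hand_labels({"Class1": True, "Class2": False, "Class3": True})` produces `"Class1-Y Class2-N Class3-Y"`, and `parse_hand_labels` turns that string back into the same dict. Lowercase flags such as "Class1-y" are rejected, matching the requirement that every Y and N be uppercase.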
When Watchful sees the HandLabels column in your dataset, it automatically creates hand labels for all correctly formatted, hand-labeled candidates. If your dataset contains classes that do not yet exist in the project, Watchful creates those classes. To verify that your hand labels were imported successfully, you can:
- If your dataset contains hand labels for classes Watchful hasn't seen before, check that Watchful has created those classes.
- Use the handlabel <pos/neg> <Class-Name> query to see all hand-labeled candidates and verify how many there are (see the Querying section for more details).
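To know what counts to expect from those queries, you can tally the well-formed labels in your CSV before importing. This Python sketch assumes that each CSV row is one candidate and that `pos` corresponds to Y and `neg` to N; `count_hand_labels` and `count_hand_labels_in_file` are hypothetical helpers. Malformed tokens are skipped, since only correctly formatted labels are imported.

```python
import csv
import re
from collections import Counter

# Same class-name rule as the HandLabels format above.
CLASS_NAME = re.compile(r"[A-Z][A-Za-z0-9_]{0,31}")


def count_hand_labels(rows):
    """Count (class, "pos"/"neg") pairs across HandLabels cells.

    `rows` is any iterable of dicts, for example a csv.DictReader.
    """
    counts = Counter()
    for row in rows:
        # Rows without a HandLabels value contribute nothing.
        for token in (row.get("HandLabels") or "").split():
            name, sep, flag = token.rpartition("-")
            if sep and flag in ("Y", "N") and CLASS_NAME.fullmatch(name):
                counts[(name, "pos" if flag == "Y" else "neg")] += 1
    return counts


def count_hand_labels_in_file(path):
    """Tally hand labels in a CSV file that has a HandLabels column."""
    with open(path, newline="", encoding="utf-8") as f:
        return count_hand_labels(csv.DictReader(f))
```

After import, compare an entry such as `count_hand_labels_in_file("data.csv")[("Spam", "pos")]` with the number of candidates returned by `handlabel pos Spam`, where `Spam` stands in for one of your own class names.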
Limitations
- File size is limited by the amount of RAM available on the machine.
- Regardless of file size, Watchful streams the import, so you can start working immediately while the rest of the file is imported.