2.3.16 Queries

In most cases, the first step in building an experiment or sandbox pipeline is to design a query. The query API selects the project data you want to use to train your model. See the tutorial in Getting Started with the ML SDK for a practical introduction to queries.

Queries can be created in the Microchip ML Model Builder UI or programmatically with the create_query API.

Examples:

client.create_query('my_query',
                    columns=['AccelX', 'AccelY', 'AccelZ'],
                    metadata_columns=['Subject'],
                    label_column='Label',
                    metadata_filter='[Subject] IN [User001, User002]',
                    force=True)

client.pipeline.set_input_query('my_query')

Managing the query cache

# list the queries in the current project
client.list_queries()

# get a query that you have already created
q = client.get_query("<query-name>")

# check if there is a cache and how many partitions there are
print(q.cache)

# update the cache for the query from the latest information in the project
q.cache_query()

# check the status of the query
q.cache_query_status()

# stop the current query cache operation
q.cache_query_stop()
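
The cache calls above can be combined into a small polling helper. The following is a minimal sketch, not part of the SDK: refresh_cache, is_finished, poll_seconds and timeout_seconds are hypothetical names, and only cache_query() and cache_query_status() come from this section. Because the value returned by cache_query_status() is not described here, the caller supplies the predicate that decides when the cache is finished.

```python
import time


def refresh_cache(query, is_finished, poll_seconds=5.0, timeout_seconds=600.0):
    """Start a cache build and poll until the caller's predicate reports completion.

    query       -- any object exposing cache_query() and cache_query_status()
    is_finished -- hypothetical callable that interprets the status value;
                   the shape of that value is an assumption left to the caller
    """
    query.cache_query()
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = query.cache_query_status()
        if is_finished(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("query cache did not finish within the timeout")
        time.sleep(poll_seconds)
```

A caller that catches TimeoutError may choose to call q.cache_query_stop() to cancel the operation that is still running.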

class mplabml.datamanager.queries.Queries(connection, project)

Base class for a collection of queries.

build_query_list(): Populates the function_list property from the server

create_query(name, columns=None, metadata_columns=None, metadata_filter='', label_column='')

Creates a query with the given input properties and inserts it on the server

Parameters

name (str) – Name of the query
columns (list [ str ]) – Sensor columns to select
metadata_columns (list [ str ]) – Metadata columns to select
metadata_filter (str) – Specifies one or more metadata filter conditions

Returns

query object

get_or_create_query(name)

Calls the REST API and gets the query by name; if it does not exist, inserts a new query

Parameters: name (str) – name of the query
Returns: query object

get_queries()

Gets all project queries from the server and creates corresponding local query objects

Returns: list[query]

get_query_by_name(name, raise_exception=False)

Retrieves a query by name from the server if it exists

Parameters: name (str) – Name of the query
Returns: query object or None

get_query_by_uuid(uuid)

Retrieves a query by UUID from the server if it exists

Parameters: uuid (str) – UUID of the query
Returns: query object or None

new_query(): Initializes a new query for the project, but does not assign property values or insert it into the server

class mplabml.datamanager.query.Query(connection, project)

Base class for a query

Queries extract project data or a subset of project data for use in a pipeline. The query must specify which columns of data to extract and what filter conditions to apply.

cache_query(renderer=None): Caches the current version of the query

cache_query_status(renderer=None): Gets the status of the current cache operation for the query

cache_query_stop(renderer=None): Kills the job for the currently executing query

cache_queryv1(renderer=None): Caches the current version of the query

cache_queryv2(renderer=None): Caches the current version of the query

check_query_cache_up_to_date(renderer=None)

Checks if the current cached query is up to date with the current training data

The sensor data in a query is cached when the query is built. If the segments or metadata have changed since the last time the sensor data was cached, then, for a query to use the new data, it needs to be rebuilt. This API returns whether or not the sensor data changed since the last time the query was cached.

property columns: Sensor columns to include in the query result
Note:
Columns must correspond to actual project sensor columns or the reserved word ‘SequenceID’ for the original sample index.
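
The rule in this note can be checked locally before a query is created. The sketch below is illustrative only: find_invalid_columns and project_sensor_columns are hypothetical names, the list of project sensor columns is assumed to be known to the caller, and matching is assumed to be case-sensitive.

```python
RESERVED_COLUMNS = {"SequenceID"}  # reserved word named in the note above


def find_invalid_columns(requested, project_sensor_columns):
    """Return the requested columns that are neither project sensor columns
    nor the reserved word 'SequenceID' (case-sensitive match is an assumption)."""
    allowed = set(project_sensor_columns) | RESERVED_COLUMNS
    return [column for column in requested if column not in allowed]
```

For example, calling find_invalid_columns(['AccelX', 'SequenceID', 'GyroX'], ['AccelX', 'AccelY', 'AccelZ']) returns ['GyroX'], so only 'GyroX' would need to be corrected before setting the columns property.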

property combine_labels

Combines label values into new values to use in the query result

Label = Gesture
Label_Values = A,B,C,D,E
combine_labels = {'Group1': ['A', 'B', 'C'], 'Group2': ['D', 'E']}

The labels returned will be Group1 and Group2.
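
To make the effect of this mapping concrete, the following sketch applies a combine_labels dictionary to a plain list of label values. It only illustrates the grouping described above and is not the server's implementation. apply_combine_labels is a hypothetical helper, and leaving unlisted values unchanged is an assumption.

```python
def apply_combine_labels(labels, combine_labels):
    """Replace each label value with the name of the group that lists it.

    Values that appear in no group are returned unchanged (an assumption;
    this section does not say how the server treats them).
    """
    value_to_group = {}
    for group, members in combine_labels.items():
        for value in members:
            if value in value_to_group:
                raise ValueError(f"label value {value!r} is listed in more than one group")
            value_to_group[value] = group
    return [value_to_group.get(label, label) for label in labels]


groups = {"Group1": ["A", "B", "C"], "Group2": ["D", "E"]}
print(apply_combine_labels(["A", "D", "C", "E", "B"], groups))
# prints ['Group1', 'Group2', 'Group1', 'Group2', 'Group1']
```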

property created_at: Date the query was created

data(partition=0): Calls the REST API for query execution and returns the result
Note:
Intended for previewing the query result before creating a query call object and using it in a sandbox step. The resulting DataFrame is not cached on the server, but when it is used in a sandbox, it may be cached.

delete(renderer=None): Calls the REST API and deletes the query object from the server

get_feature_statistics(): Returns metadata statistics for the query

get_statistics_summary(renderer=None): Returns metadata statistics for the query

initialize_from_dict(data): Reads a JSON dict and populates a single query

insert(renderer=None): Calls the REST API and inserts a new query

property label_column: Label column to use in the query result
Note:
The column must correspond to an actual project label column.

property metadata_columns: Metadata columns to include in the query result
Note:
Columns must correspond to actual project metadata columns.

property metadata_filter

Filter criteria of the query

Parameters: value (str) – Similar to a SQL WHERE clause, the string can contain any number of AND-concatenated expressions where square brackets surround the column name and comparison value, with the operator in between. Supported operators: >, >=, <, <=, =, !=, IN

Examples:

metadata_filter = '[Subject] > [5] AND [Subject] <= [15]'
metadata_filter = '[Gender] = [Female] AND [Activity] != [Walking]'
metadata_filter = '[Subject] IN [5, 7, 9, 11, 13, 15]'

Note:

Queries do not support OR-concatenation between expressions, but often the IN operator can be used to achieve OR-like functionality on a single column. For example:

[Gesture] IN [A, M, L]

is equivalent to:

[Gesture] = [A] OR [Gesture] = [M] OR [Gesture] = [L]
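
Filter strings in this syntax can be assembled programmatically instead of by hand. The sketch below is one possible approach under the rules stated above (square brackets around the column name and the comparison value, AND-concatenation only, and the listed operators). format_condition and build_metadata_filter are hypothetical helpers, not SDK functions.

```python
SUPPORTED_OPERATORS = {">", ">=", "<", "<=", "=", "!=", "IN"}


def format_condition(column, operator, value):
    """Render one condition, for example '[Subject] IN [5, 7, 9]'."""
    if operator not in SUPPORTED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    if operator == "IN":
        # Accept a single value or any iterable of values for IN.
        values = [value] if isinstance(value, (str, bytes)) else list(value)
        rendered = ", ".join(str(v) for v in values)
    else:
        rendered = str(value)
    return f"[{column}] {operator} [{rendered}]"


def build_metadata_filter(conditions):
    """Join (column, operator, value) tuples with AND; OR is not supported."""
    return " AND ".join(format_condition(*condition) for condition in conditions)


print(build_metadata_filter([("Subject", ">", 5), ("Subject", "<=", 15)]))
# prints [Subject] > [5] AND [Subject] <= [15]
print(build_metadata_filter([("Gesture", "IN", ["A", "M", "L"])]))
# prints [Gesture] IN [A, M, L]
```

The resulting string can then be assigned to metadata_filter or passed to create_query.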

property name: Name of the query

plot_statistics(renderer=None, **kwargs): Generates a bar plot of the query statistics

post_feature_statistics(window_size=None): Returns metadata statistics for the query

refresh(): Calls the REST API and populates the query object using its UUID

property segmenter

Segmenter to use for the query

Parameters: value (int) – ID of segmenter

size(): Returns the size of the DataFrame that would result from the query

statistics_segments(renderer=None): Returns metadata statistics for the query

property summary_statistics: Summary statistics for the query

update(renderer=None): Calls the REST API and updates the query object on the server