2.3.16 Queries
In most cases, the first step in building an experiment or sandbox pipeline is to design a query. The query API selects the data used to train your model. See the tutorial in Getting Started with the ML SDK for a practical introduction to queries.
Queries can be created either in the Microchip ML Model Builder UI or programmatically with the create_query API.
Example:
# create a query that selects three accelerometer axes for two subjects
client.create_query('my_query',
                    columns=['AccelX', 'AccelY', 'AccelZ'],
                    metadata_columns=['Subject'],
                    label_column='Label',
                    metadata_filter='[Subject] IN [User001, User002]',
                    force=True)
# use the query as the input of the current pipeline
client.pipeline.set_input_query('my_query')
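Because metadata_filter is a plain string, it can be convenient to build it from Python values instead of formatting it by hand. The sketch below assumes only the bracketed syntax documented under the metadata_filter property; the helpers condition and build_metadata_filter are hypothetical and are not part of the mplabml API.

```python
SUPPORTED_OPERATORS = {">", ">=", "<", "<=", "=", "!=", "IN"}


def condition(column, operator, value):
    """Format one bracketed filter expression (hypothetical helper).

    Follows the documented form: [column] operator [value].
    """
    if operator not in SUPPORTED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    if operator == "IN":
        # IN takes a comma-separated list of values inside one pair of brackets
        rendered = ", ".join(str(v) for v in value)
    else:
        rendered = str(value)
    return f"[{column}] {operator} [{rendered}]"


def build_metadata_filter(*conditions):
    """Join expressions with AND, the only concatenation queries support."""
    return " AND ".join(conditions)


metadata_filter = build_metadata_filter(
    condition("Subject", "IN", ["User001", "User002"]),
)
print(metadata_filter)  # [Subject] IN [User001, User002]
```

The resulting string can be passed as the metadata_filter argument of client.create_query, exactly as in the example above.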
Managing the query cache
# list the queries in the current project
client.list_queries()
# get a query that you have already created
q = client.get_query("<query-name>")
# check if there is a cache and how many partitions there are
print(q.cache)
# update the cache for the query from the latest information in the project
q.cache_query()
# check the status of the query
q.cache_query_status()
# stop the current query cache operation
q.cache_query_stop()
- class mplabml.datamanager.queries.Queries(connection, project)
Base class for a collection of queries.
- build_query_list()
Populates the function_list property from the server
- create_query(name, columns=None, metadata_columns=None, metadata_filter='', label_column='')
Creates a query with the given input properties and inserts it into the server
- Parameters
name (str) – Name of the query
columns (list [ str ]) – Sensor columns to select
metadata_columns (list [ str ]) – Metadata columns to select
metadata_filter (str) – Specifies one or more metadata filter conditions
label_column (str) – Label column to use in the query result
- Returns
query object
- get_or_create_query(name)
Calls the REST API and gets the query by name; if it does not exist, inserts a new query
- Parameters
name (str) – name of the query
- Returns
query object
- get_queries()
Gets all project queries from the server and creates corresponding local query objects
- Returns
list[query]
- get_query_by_name(name, raise_exception=False)
Retrieves a query by name from the server if it exists
- Parameters
name (str) – Name of the query
- Returns
query object or None
- get_query_by_uuid(uuid)
Retrieves a query by UUID from the server if it exists
- Parameters
uuid (str) – UUID of the query
- Returns
query object or None
- new_query()
Initializes a new query for the project, but does not assign property values or insert it into the server
- class mplabml.datamanager.query.Query(connection, project)
Base class for a query
Queries extract project data or a subset of project data for use in a pipeline. The query must specify which columns of data to extract and what filter conditions to apply.
- cache_query(renderer=None)
Caches the current version of the query
- cache_query_status(renderer=None)
Gets the status of the current caching of the query
- cache_query_stop(renderer=None)
Stops the currently running cache job for the query
- cache_queryv1(renderer=None)
Caches the current version of the query
- cache_queryv2(renderer=None)
Caches the current version of the query
- check_query_cache_up_to_date(renderer=None)
Checks if the current cached query is up to date with the current training data
The sensor data in a query is cached when the query is built. If the segments or metadata have changed since the sensor data was last cached, the query must be rebuilt before it can use the new data. This API reports whether the sensor data has changed since the query was last cached.
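The check-then-rebuild workflow described above can be sketched as a small helper. This is a minimal sketch under one loud assumption: that check_query_cache_up_to_date() returns a truthy value when the cache is current (the return type is not documented here, so verify it against your SDK version). recache_if_stale is a hypothetical helper, and FakeQuery is a stand-in used only so the example runs without a server.

```python
def recache_if_stale(query):
    """Rebuild the query cache only when the cached sensor data is out of date.

    ASSUMPTION: query.check_query_cache_up_to_date() returns True when the
    cache is current. Returns True if a rebuild was started.
    """
    if query.check_query_cache_up_to_date():
        return False
    query.cache_query()  # re-cache from the latest segments and metadata
    return True


class FakeQuery:
    """Stand-in for an mplabml Query object, for illustration only."""

    def __init__(self, up_to_date):
        self.up_to_date = up_to_date
        self.cache_calls = 0

    def check_query_cache_up_to_date(self, renderer=None):
        return self.up_to_date

    def cache_query(self, renderer=None):
        self.cache_calls += 1
        self.up_to_date = True


stale = FakeQuery(up_to_date=False)
print(recache_if_stale(stale))  # True: the cache was rebuilt
print(recache_if_stale(stale))  # False: already up to date
```

With a real object from client.get_query("<query-name>"), the same helper would start a rebuild only when needed, after which cache_query_status() can be used to follow progress.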
- property columns
Sensor columns to include in the query result
Note: Columns must correspond to actual project sensor columns or the reserved word ‘SequenceID’ for the original sample index.
- property combine_labels
Combines label values into new values to use in the query result
For example, if the label column is Gesture and its label values are A, B, C, D, and E, then
combine_labels = {'Group1': ['A', 'B', 'C'], 'Group2': ['D', 'E']}
makes the query result contain only the labels Group1 and Group2.
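To make the mapping concrete, the following pure-Python sketch applies a combine_labels dictionary to a list of label values locally. It only mirrors the documented example; apply_combine_labels is a hypothetical helper, and the server may handle labels that appear in no group differently (this sketch raises KeyError for them).

```python
def apply_combine_labels(labels, combine_labels):
    """Map each original label value to its group name (local illustration)."""
    lookup = {}
    for group, members in combine_labels.items():
        for member in members:
            if member in lookup:
                raise ValueError(f"label {member!r} appears in more than one group")
            lookup[member] = group
    # Raises KeyError for a label that is not listed in any group
    return [lookup[label] for label in labels]


combine_labels = {"Group1": ["A", "B", "C"], "Group2": ["D", "E"]}
print(apply_combine_labels(["A", "D", "C", "E", "B"], combine_labels))
# ['Group1', 'Group2', 'Group1', 'Group2', 'Group1']
```

Building the inverse lookup once keeps the per-label mapping a single dictionary access and catches a label assigned to two groups before any data is relabeled.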
- property created_at
Date the query was created
- data(partition=0)
Calls the REST API for query execution and returns the result
Note: Intended for previewing the query result before creating a query call object and using it in a sandbox step. The resulting DataFrame is not cached on the server, but it may be cached when it is used in a sandbox.
- delete(renderer=None)
Calls the REST API and deletes the query object from the server
- get_feature_statistics()
Returns metadata statistics for the query
- get_statistics_summary(renderer=None)
Returns metadata statistics for the query
- initialize_from_dict(data)
Reads a json dict and populates a single query
- insert(renderer=None)
Calls the REST API and inserts a new query
- property label_column
Label column to use in the query result
Note: The column must correspond to an actual project label column.
- property metadata_columns
Metadata columns to include in the query result
Note: Columns must correspond to actual project metadata columns.
- property metadata_filter
Filter criteria of the query
- Parameters
value (str) – Similar to a SQL WHERE clause, the string can contain any number of AND-concatenated expressions, each with the column name and the comparison value in square brackets and the operator between them. Supported operators: >, >=, <, <=, =, !=, IN
Examples:
metadata_filter = '[Subject] > [5] AND [Subject] <= [15]'
metadata_filter = '[Gender] = [Female] AND [Activity] != [Walking]'
metadata_filter = '[Subject] IN [5, 7, 9, 11, 13, 15]'
Note: Queries do not support OR-concatenation between expressions, but often the IN operator can be used to achieve OR-like functionality on a single column. For example:
[Gesture] IN [A, M, L]
is equivalent to:
[Gesture] = [A] OR [Gesture] = [M] OR [Gesture] = [L]
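The filter semantics above (AND-joined bracketed expressions, with IN acting like an OR of equality tests on one column) can be illustrated with a small local evaluator. This is a hypothetical sketch for understanding the syntax, not how mplabml evaluates filters; the real filtering happens on the server and may differ in details such as value typing. Here values are compared numerically when both sides parse as numbers and as strings otherwise, which is an assumption of the sketch.

```python
import operator
import re

# Hypothetical parser for one expression: [column] operator [value]
_EXPRESSION = re.compile(r"^\[([^\]]+)\]\s*(>=|<=|!=|>|<|=|IN)\s*\[([^\]]*)\]$")
_COMPARISONS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    "!=": operator.ne,
}


def _coerce_pair(left, right):
    """Compare numerically when both sides parse as numbers, else as strings."""
    try:
        return float(left), float(right)
    except ValueError:
        return str(left), str(right)


def _compare(symbol, left, right):
    a, b = _coerce_pair(left, right)
    return _COMPARISONS[symbol](a, b)


def matches_filter(row, metadata_filter):
    """Return True if a metadata row (dict) satisfies every AND-joined expression."""
    for expression in metadata_filter.split(" AND "):
        match = _EXPRESSION.match(expression.strip())
        if match is None:
            raise ValueError(f"cannot parse expression: {expression!r}")
        column, symbol, raw_value = match.groups()
        value = row[column]
        if symbol == "IN":
            # IN behaves like OR-ing '=' tests over the listed values
            items = [item.strip() for item in raw_value.split(",")]
            if not any(_compare("=", value, item) for item in items):
                return False
        elif not _compare(symbol, value, raw_value):
            return False
    return True


rows = [
    {"Subject": 5, "Gesture": "A"},
    {"Subject": 7, "Gesture": "M"},
    {"Subject": 12, "Gesture": "Z"},
]
print([r["Subject"] for r in rows if matches_filter(r, "[Gesture] IN [A, M, L]")])  # [5, 7]
print([r["Subject"] for r in rows if matches_filter(r, "[Subject] > [5] AND [Subject] <= [15]")])  # [7, 12]
```

Because every expression must hold for a row to match, an OR across different columns cannot be written in this syntax; IN only covers alternatives within a single column.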
- property name
Name of the query
- plot_statistics(renderer=None, **kwargs)
Generates a bar plot of the query statistics
- post_feature_statistics(window_size=None)
Returns metadata statistics for the query
- refresh()
Calls the REST API and repopulates the query object from the server using its UUID
- property segmenter
Segmenter to use for the query
- Parameters
value (int) – ID of segmenter
- size()
Returns the size of the DataFrame that would result from the query
- statistics_segments(renderer=None)
Returns metadata statistics for the query
- property summary_statistics
Summary statistics for the query
- update(renderer=None)
Calls the REST API and updates the query object on the server