2.3.19 CSV Upload
In some cases, you may already have a CSV file that you want to use, such as raw sensor data with labels or features that you have cached. Uploads are split into DataFiles and FeatureFiles, which are explained below.
DataFiles
A DataFile allows you to upload sensor data into a pipeline for testing, rather than using Project Capture data. Files must be in CSV format. Using a DataFile is convenient when:
You have test data that you want to test against a model without adding the file to your Project Capture list.
Examples:
client.list_datafiles()
# Upload directly from a CSV file; force=True overwrites the file on the server if it already exists.
client.upload_data_file(name, path, force=True)
# if you already have a dataframe
client.upload_dataframe(name, dataframe)
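As a rough illustration of the DataFile workflow described above, the sketch below writes a small labeled CSV with the standard library and wraps the upload_data_file call in a helper. The helper names (write_sensor_csv, upload_test_data), the column names, and the sample values are hypothetical and chosen only for this example; the only SDK call assumed is upload_data_file(name, path, force=True) as shown above.

```python
import csv
import os
import tempfile


def write_sensor_csv(path, columns, rows):
    """Write rows of sensor samples to a CSV file with a header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(columns)
        writer.writerows(rows)
    return path


def upload_test_data(client, name, path):
    """Upload a CSV as a DataFile, replacing any server copy with the same name."""
    # upload_data_file and force=True are the calls shown in the examples above.
    return client.upload_data_file(name, path, force=True)


# Hypothetical column names and values, for illustration only.
columns = ["AccelerometerX", "AccelerometerY", "AccelerometerZ", "Label"]
rows = [
    [12, -3, 980, "idle"],
    [15, -1, 978, "idle"],
    [220, 140, 860, "walking"],
]
csv_path = write_sensor_csv(
    os.path.join(tempfile.mkdtemp(), "test_data.csv"), columns, rows
)
# With a connected client: upload_test_data(client, "test_data.csv", csv_path)
```

Keeping the file writing separate from the upload makes it possible to inspect the CSV locally before anything is sent to the server.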
FeatureFiles
A FeatureFile can be used to directly load data into a pipeline, rather than querying Project Capture data. Files must be in CSV format. Using FeatureFiles is convenient when:
You want to cache features locally, then use those as input into the training algorithm, so you can avoid running previous steps in the pipeline.
Examples:
client.list_featurefiles()
# if you want to upload directly from a csv file
client.upload_feature_file(name, path)
# if you already have a dataframe
client.upload_dataframe(name, dataframe)
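The caching use case described above could look roughly like the following sketch: features are computed once, written to a local CSV, and reused on later runs before being uploaded. The helpers cache_features, upload_cached_features, and example_features are hypothetical, as are the feature names and values; the only SDK call assumed is upload_feature_file(name, path) as shown above.

```python
import csv
import os
import tempfile


def cache_features(path, compute_features):
    """Return the path to a cached feature CSV, computing it only if missing.

    compute_features is a hypothetical callable returning (header, rows).
    """
    if not os.path.exists(path):
        header, rows = compute_features()
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(rows)
    return path


def upload_cached_features(client, name, path, compute_features):
    """Make sure the cache exists, then upload it with upload_feature_file."""
    cache_features(path, compute_features)
    return client.upload_feature_file(name, path)


def example_features():
    # Stand-in for an expensive feature-extraction step; values are made up.
    header = ["feature_mean", "feature_std", "Label"]
    rows = [[0.12, 0.03, "idle"], [0.87, 0.21, "walking"]]
    return header, rows


feature_path = os.path.join(tempfile.mkdtemp(), "features.csv")
cache_features(feature_path, example_features)
# With a connected client:
# upload_cached_features(client, "features.csv", feature_path, example_features)
```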
- class mplabml.datamanager.featurefiles.FeatureFiles(connection, project)
Base class for a collection of FeatureFiles.
- build_datafile_list()
Populates the function_list property from the server
- build_featurefile_list()
Populates the function_list property from the server
- build_full_list()
Populates the function_list property from the server
- create_featurefile(filename, path, is_features=False, label_column='')
Creates a featurefile object from the filename and path
- Parameters
filename (str) – Desired name of the featurefile on the server; must have a .csv or .arff extension
path (str) – Full local path to the file, including the file’s local name and extension
- Returns
featurefile object
- Raises
FeatureFileExistsError – if the featurefile already exists on the server
- get_by_name(filename)
Gets a FeatureFile or DataFile from the server referenced by name
- Parameters
filename – Name of the featurefile as stored on the server
- Returns
featurefile object or None if it does not exist
- get_featurefile(uuid)
Gets a single FeatureFile from the server, referenced by its UUID
- Returns
featurefile object
- get_featurefiles()
Gets a list of all featurefiles in the project
- Returns
list (featurefiles)
- new_featurefile()
Initializes a new featurefile object but does not insert it
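Because get_by_name returns None for a missing file and create_featurefile raises FeatureFileExistsError for an existing one, a caller may want to check before creating. The following is a minimal sketch of that pattern, assuming an object that behaves like the FeatureFiles collection documented above; the helper name get_or_create_featurefile is hypothetical and not part of the SDK.

```python
def get_or_create_featurefile(featurefiles, filename, path):
    """Return the featurefile named `filename`, creating it if it is absent.

    `featurefiles` is assumed to expose get_by_name(filename) and
    create_featurefile(filename, path) as documented above.
    """
    # The documentation requires a .csv or .arff extension, so fail early.
    if not filename.lower().endswith((".csv", ".arff")):
        raise ValueError("filename must have a .csv or .arff extension")

    # get_by_name is documented to return None when the file does not exist.
    existing = featurefiles.get_by_name(filename)
    if existing is not None:
        return existing

    return featurefiles.create_featurefile(filename, path)
```

Checking first avoids relying on the exception for control flow, although another client could still create the file between the two calls.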
- class mplabml.datamanager.featurefile.FeatureFile(connection, project, name='', path='', is_features=True, uuid=None, label_column='', number_rows=None)
Base class for a featurefile object
- compute_analysis(analysis_type='UMAP', **kwargs)
Calls the REST API to compute the analysis for the feature file
- Parameters
analysis_type (str) – The type of clustering analysis: UMAP (default), TSNE, or PCA.
- Kwargs:
shuffle_seed (int): Random seed used to shuffle and resample the feature vectors.
analysis_seed (int): Random state of the analysis. Default is 0.
n_neighbor (int): The size of the local neighborhood (in number of neighboring sample points) used for manifold approximation. If not specified, the default is the number of unique labels.
n_components (int): The dimension of the output result. Default is 2. n_components is adjusted based on the method, the dimension of the feature vector, and the number of samples.
n_sample (int): Maximum number of output samples. Default is 1000.
- Returns
A JSON response containing the metadata of the generated analysis
Example
>>> feature_file = client.get_featurefile(<feature-file uuid>)
>>> response = feature_file.compute_analysis(analysis_type="PCA", shuffle_seed=13, n_components=5)
>>> response.json()
- property created_at
Date the FeatureFile was created
- delete()
Calls the REST API and deletes the featurefile from the server
- download()
Calls the REST API and retrieves the FeatureFile's binary data
- Returns
featurefile contents
- download_json()
Calls the REST API and retrieves the FeatureFile's JSON data
- Returns
FeatureFile contents as JSON
- property filename
The name of the file as stored on the server
Note: The filename must contain a .csv or .arff extension
- insert()
Calls the REST API to insert a new FeatureFile
- property is_features
Whether this file is a FeatureFile or a DataFile
- list_analysis()
Calls the REST API and retrieves the list of computed analyses for the FeatureFile
- Returns
JSON response holding the list of all computed analyses
- refresh()
Calls the REST API and populates the FeatureFile's properties from the server
- update()
Calls the REST API to update the FeatureFile's properties on the server
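The compute_analysis call documented above accepts an analysis type plus optional keyword arguments. The sketch below shows one possible thin wrapper that rejects unknown analysis types before any request is made. The names run_analysis and SUPPORTED_ANALYSES are hypothetical, and dropping options set to None so that the documented defaults apply is an assumption about how the call handles omitted keywords.

```python
# Analysis types listed in the compute_analysis documentation above.
SUPPORTED_ANALYSES = ("UMAP", "TSNE", "PCA")


def run_analysis(feature_file, analysis_type="UMAP", **kwargs):
    """Validate the analysis type locally, then call compute_analysis.

    `feature_file` is assumed to behave like the FeatureFile class above.
    """
    if analysis_type not in SUPPORTED_ANALYSES:
        raise ValueError(
            "analysis_type must be one of " + ", ".join(SUPPORTED_ANALYSES)
        )
    # Assumption: omitting a keyword lets the documented default apply,
    # so options explicitly set to None are not forwarded.
    options = {key: value for key, value in kwargs.items() if value is not None}
    return feature_file.compute_analysis(analysis_type=analysis_type, **options)
```

With a FeatureFile object in hand, a call such as run_analysis(feature_file, "PCA", shuffle_seed=13, n_components=5) would forward the same arguments as the example in the compute_analysis entry.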