2.4.8 Feature Generators
A collection of feature generators operates on a segment of data to extract meaningful information. The outputs of all feature generators are concatenated to form the feature vector.
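For example, several generators can be added in a single call, and each contributes its own columns to the feature vector for a segment. A minimal sketch using the same toy accelerometer data as the examples below (output omitted):
>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]],
...                   columns=['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> client.pipeline.reset(delete_cache=False)
>>> client.pipeline.set_input_data('test_data', df, force=True,
...                                data_columns=['accelx', 'accely', 'accelz'],
...                                group_columns=['Subject'])
>>> # two generators at once; each produces its own gen_xxxx_... feature columns
>>> client.pipeline.add_feature_generator([{'name': 'Mean', 'params': {"columns": ['accelx']}},
...                                        {'name': 'Maximum', 'params': {"columns": ['accelx']}}])
>>> result, stats = client.pipeline.execute()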
Statistical
- Absolute Mean
Computes the arithmetic mean of the absolute values of each column in ‘columns’ in the dataframe.
- Parameters
columns – list of columns on which to apply the feature generator
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject'] ) >>> client.pipeline.add_feature_generator([{'name':'Absolute Mean', 'params':{"columns": ['accelx', 'accely', 'accelz'] } }]) >>> result, stats = client.pipeline.execute()
>>> print result out: Subject gen_0001_accelxAbsMean gen_0002_accelyAbsMean gen_0003_accelzAbsMean 0 s01 2.0 7.2 5.8
- Absolute Sum
Computes the sum of the absolute values of each column in ‘columns’ in the dataframe.
- Parameters
columns – list of columns on which to apply the feature generator
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject'] ) >>> client.pipeline.add_feature_generator([{'name':'Absolute Sum', 'params':{"columns": ['accelx', 'accely', 'accelz'] } }]) >>> result, stats = client.pipeline.execute()
>>> print result out: Subject gen_0001_accelxAbsSum gen_0002_accelyAbsSum gen_0003_accelzAbsSum 0 s01 10.0 36.0 29.0
- Interquartile Range
The IQR (interquartile range) of a vector V with N items is the difference between its 75th percentile and 25th percentile values.
- Parameters
columns – list of columns on which to apply the feature generator
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject'] ) >>> client.pipeline.add_feature_generator([{'name':'Interquartile Range', 'params':{"columns": ['accelx', 'accely', 'accelz'] } }]) >>> result, stats = client.pipeline.execute()
>>> print result out: Subject gen_0001_accelxIQR gen_0002_accelyIQR gen_0003_accelzIQR 0 s01 4.0 2.0 2.0
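The same values can be checked directly with NumPy, outside the pipeline:
>>> import numpy as np
>>> accelx = [-3, 3, 0, -2, 2]
>>> float(np.percentile(accelx, 75) - np.percentile(accelx, 25))   # IQR of accelx
4.0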
- Kurtosis
Kurtosis is the degree of ‘peakedness’ or ‘tailedness’ of a distribution and is related to its shape. A high kurtosis corresponds to a peaked distribution with fat tails, whereas a low kurtosis corresponds to skinny tails with the distribution concentrated towards the mean. Kurtosis is calculated using Fisher’s method.
- Parameters
columns – list of columns on which to apply the feature generator
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01 >>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject'] ) >>> client.pipeline.add_feature_generator([{'name':'Kurtosis', 'params':{"columns": ['accelx', 'accely', 'accelz'] } }]) >>> result, stats = client.pipeline.execute()
>>> print result out: Subject gen_0001_accelxKurtosis gen_0002_accelyKurtosis gen_0003_accelzKurtosis 0 s01 -1.565089 -1.371972 -1.005478
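The value for accelx can be reproduced with SciPy, which also uses Fisher’s definition (and the biased estimator) by default:
>>> from scipy.stats import kurtosis
>>> round(float(kurtosis([-3, 3, 0, -2, 2])), 6)   # accelx column
-1.565089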
- Linear Regression Stats
Calculates a linear least-squares regression and returns the regression statistics: slope, intercept, r value, and standard error.
slope: Slope of the regression line.
intercept: Intercept of the regression line.
r value: Correlation coefficient.
StdErr: Standard error of the estimated gradient.
- Parameters
columns – list of columns on which to apply the feature generator
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> from pandas import DataFrame >>> df = pd.DataFrame({'Subject': ['s01'] * 10,'Class': ['Crawling'] * 10 ,'Rep': [1] * 10 }) >>> df["X"] = [i + 2 for i in range(10)] >>> df["Y"] = [i for i in range(10)] >>> df["Z"] = [1, 2, 3, 3, 5, 5, 7, 7, 9, 10] >>> print(df) out: Subject Class Rep X Y Z 0 s01 Crawling 1 2 0 1 1 s01 Crawling 1 3 1 2 2 s01 Crawling 1 4 2 3 3 s01 Crawling 1 5 3 3 4 s01 Crawling 1 6 4 5 5 s01 Crawling 1 7 5 5 6 s01 Crawling 1 8 6 7 7 s01 Crawling 1 9 7 7 8 s01 Crawling 1 10 8 9 9 s01 Crawling 1 11 9 10
>>> client.upload_dataframe('test_data', df, force=True) >>> client.pipeline.reset(delete_cache=True) >>> client.pipeline.set_input_data('test_data.csv', group_columns=['Subject','Rep'], label_column='Class', data_columns=['X','Y','Z']) >>> client.pipeline.add_feature_generator([{'name':'Linear Regression Stats', 'params':{"columns": ['X','Y','Z'] }}]) >>> results, stats = client.pipeline.execute() >>> print(results.T) out: 0 Rep 1 Subject s01 gen_0001_XLinearRegressionSlope 1 gen_0001_XLinearRegressionIntercept 2 gen_0001_XLinearRegressionR 1 gen_0001_XLinearRegressionStdErr 0 gen_0002_YLinearRegressionSlope 1 gen_0002_YLinearRegressionIntercept 0 gen_0002_YLinearRegressionR 1 gen_0002_YLinearRegressionStdErr 0 gen_0003_ZLinearRegressionSlope 0.982 gen_0003_ZLinearRegressionIntercept 0.782 gen_0003_ZLinearRegressionR 0.987 gen_0003_ZLinearRegressionStdErr 0.056
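Assuming the regression is taken against the sample index (consistent with the output above), the Z-column statistics can be reproduced with SciPy:
>>> import numpy as np
>>> from scipy.stats import linregress
>>> z = [1, 2, 3, 3, 5, 5, 7, 7, 9, 10]
>>> fit = linregress(np.arange(len(z)), z)
>>> [round(float(v), 3) for v in (fit.slope, fit.intercept, fit.rvalue, fit.stderr)]
[0.982, 0.782, 0.987, 0.056]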
- Maximum
Computes the maximum of each column in ‘columns’ in the dataframe. The maximum of a vector V is the largest value in V.
- Parameters
columns – list of columns on which to apply the feature generator
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject'] ) >>> client.pipeline.add_feature_generator([{'name':'Maximum', 'params':{"columns": ['accelx', 'accely', 'accelz'] } }]) >>> result, stats = client.pipeline.execute()
>>> print result out: Subject gen_0001_accelxmaximum gen_0002_accelymaximum gen_0003_accelzmaximum 0 s01 3.0 9.0 8.0
- Mean
Computes the arithmetic mean of each column in columns in the dataframe.
- Parameters
columns – list of columns on which to apply the feature generator
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject'] ) >>> client.pipeline.add_feature_generator([{'name':'Mean', 'params':{"columns": ['accelx', 'accely', 'accelz'] } }]) >>> result, stats = client.pipeline.execute()
>>> print result out: Subject gen_0001_accelxMean gen_0002_accelyMean gen_0003_accelzMean 0 s01 0.0 7.2 5.8
- Median
The median of a vector V with N items, is the middle value of a sorted copy of V (V_sorted). When N is even, it is the average of the two middle values in V_sorted.
- Parameters
columns – list of columns on which to apply the feature generator
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject'] ) >>> client.pipeline.add_feature_generator([{'name':'Median', 'params':{"columns": ['accelx', 'accely', 'accelz'] } }]) >>> result, stats = client.pipeline.execute()
>>> print result out: Subject gen_0001_accelxMedian gen_0002_accelyMedian gen_0003_accelzMedian 0 s01 0.0 7.0 6.0
- Minimum
Computes the minimum of each column in ‘columns’ in the dataframe. The minimum of a vector V is the smallest value in V.
- Parameters
columns – list of columns on which to apply the feature generator
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject'] ) >>> client.pipeline.add_feature_generator([{'name':'Minimum', 'params':{"columns": ['accelx', 'accely', 'accelz'] } }]) >>> result, stats = client.pipeline.execute()
>>> print result out: Subject gen_0001_accelxminimum gen_0002_accelyminimum gen_0003_accelzminimum 0 s01 -3.0 6.0 3.0
- Negative Zero Crossings
Computes the number of times the selected input crosses the mean+threshold and mean-threshold values with a negative slope. The threshold value is specified by the user. Crossing the mean value when the threshold is 0 only counts as a single crossing.
- Parameters
columns – list of columns on which to apply the feature generator
threshold – value in addition to mean which must be crossed to count as a crossing
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
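Examples
Since this generator has no example above, here is a usage sketch modeled on the other crossing-rate examples in this section (output omitted, as the exact feature names are not documented here):
>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]],
...                   columns=['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> client.pipeline.reset(delete_cache=False)
>>> client.pipeline.set_input_data('test_data', df, force=True,
...                                data_columns=['accelx', 'accely', 'accelz'],
...                                group_columns=['Subject'])
>>> client.pipeline.add_feature_generator([{'name': 'Negative Zero Crossings',
...                                         'params': {"columns": ['accelx', 'accely', 'accelz'],
...                                                    "threshold": 0}}])
>>> result, stats = client.pipeline.execute()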
- 25th Percentile
Computes the 25th percentile of each column in ‘columns’ in the dataframe. A q-th percentile of a vector V of length N is the q-th ranked value in a sorted copy of V. If the normalized ranking does not match q exactly, the value is interpolated from the two nearest values.
- Parameters
columns – list of columns on which to apply the feature generator
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject'] ) >>> client.pipeline.add_feature_generator([{'name':'25th Percentile', 'params':{"columns": ['accelx', 'accely', 'accelz'] } }]) >>> result, stats = client.pipeline.execute()
>>> print result out: Subject gen_0001_accelx25Percentile gen_0002_accely25Percentile gen_0003_accelz25Percentile 0 s01 -2.0 6.0 5.0
- 75th Percentile
Computes the 75th percentile of each column in ‘columns’ in the dataframe. A q-th percentile of a vector V of length N is the q-th ranked value in a sorted copy of V. If the normalized ranking does not match q exactly, the value is interpolated from the two nearest values.
- Parameters
columns – list of columns on which to apply the feature generator
- Returns
Returns data frame with 75th percentile of each specified column.
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject'] ) >>> client.pipeline.add_feature_generator([{'name':'75th Percentile', 'params':{"columns": ['accelx', 'accely', 'accelz'] } }]) >>> result, stats = client.pipeline.execute() >>> print result out: Subject gen_0001_accelx75Percentile gen_0002_accely75Percentile gen_0003_accelz75Percentile 0 s01 2.0 8.0 7.0
- 100th Percentile
Computes the 100th percentile of each column in ‘columns’ in the dataframe. The 100th percentile of a vector V is the maximum value in V.
- Parameters
columns – list of columns on which to apply the feature generator
- Returns
Returns feature vector with 100th percentile (sample maximum) of each specified column.
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject'] ) >>> client.pipeline.add_feature_generator([{'name':'100th Percentile', 'params':{"columns": ['accelx', 'accely', 'accelz'] } }]) >>> result, stats = client.pipeline.execute() >>> print result out: Subject gen_0001_accelx100Percentile gen_0002_accely100Percentile gen_0003_accelz100Percentile 0 s01 3.0 9.0 8.0
- Positive Zero Crossings
Computes the number of times the selected input crosses the mean+threshold and mean-threshold values with a positive slope. The threshold value is specified by the user. Crossing the mean value when the threshold is 0 only counts as a single crossing.
- Parameters
columns – list of columns on which to apply the feature generator
threshold – value in addition to mean which must be crossed to count as a crossing
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
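Examples
As with Negative Zero Crossings, a usage sketch modeled on the other crossing-rate examples in this section (output omitted):
>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]],
...                   columns=['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> client.pipeline.reset(delete_cache=False)
>>> client.pipeline.set_input_data('test_data', df, force=True,
...                                data_columns=['accelx', 'accely', 'accelz'],
...                                group_columns=['Subject'])
>>> client.pipeline.add_feature_generator([{'name': 'Positive Zero Crossings',
...                                         'params': {"columns": ['accelx', 'accely', 'accelz'],
...                                                    "threshold": 0}}])
>>> result, stats = client.pipeline.execute()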
- Skewness
Skewness is a measure of the asymmetry of the distribution of a variable about its mean. The skewness value can be positive, negative, or undefined. A positive skew indicates that the tail on the right side is fatter than the tail on the left; a negative value indicates the opposite.
- Parameters
columns – list of columns on which to apply the feature generator
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> from pandas import DataFrame >>> df = DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns=['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject'] ) >>> client.pipeline.add_feature_generator([{'name':'Skewness', 'params':{"columns": ['accelx', 'accely', 'accelz'] } }]) >>> result, stats = client.pipeline.execute()
>>> print result out: Subject gen_0001_accelxSkew gen_0002_accelySkew gen_0003_accelzSkew 0 s01 0.0 0.363174 -0.395871
- Standard Deviation
The standard deviation of a vector V with N items is a measure of the spread of the distribution. It is the square root of the average of the squared deviations from the mean, i.e., std = sqrt(mean(abs(x - x.mean())**2)).
- Parameters
columns – list of columns on which to apply the feature generator
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject'] ) >>> client.pipeline.add_feature_generator([{'name':'Standard Deviation', 'params':{"columns": ['accelx', 'accely', 'accelz'] } }]) >>> result, stats = client.pipeline.execute()
>>> print result out: Subject gen_0001_accelxStd gen_0002_accelyStd gen_0003_accelzStd 0 s01 2.280351 1.16619 1.720465
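The formula above is the population standard deviation, which is NumPy’s default (ddof=0):
>>> import numpy as np
>>> round(float(np.std([-3, 3, 0, -2, 2])), 6)   # accelx column
2.280351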
- Sum
Computes the sum of each column in ‘columns’ in the dataframe.
- Parameters
columns – list of columns on which to apply the feature generator
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject'] ) >>> client.pipeline.add_feature_generator([{'name':'Sum', 'params':{"columns": ['accelx', 'accely', 'accelz'] } }]) >>> result, stats = client.pipeline.execute()
>>> print result out: Subject gen_0001_accelxSum gen_0002_accelySum gen_0003_accelzSum 0 s01 0.0 36.0 29.0
- Variance
Computes the variance of desired column(s) in the dataframe.
- Parameters
columns – list of columns on which to apply the feature generator
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject'] ) >>> client.pipeline.add_feature_generator([{'name':'Variance', 'params':{"columns": ['accelx', 'accely', 'accelz'] } }]) >>> result, stats = client.pipeline.execute() >>> print result out: Subject gen_0001_accelxVariance gen_0002_accelyVariance gen_0003_accelzVariance 0 s01 6.5 1.7 3.7
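Note that the example output matches the sample variance (ddof=1), which is also the pandas default:
>>> import pandas as pd
>>> float(pd.Series([-3, 3, 0, -2, 2]).var())   # accelx column
6.5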
- Zero Crossings
Computes the number of times the selected input crosses the mean+threshold and mean-threshold values. The threshold value is specified by the user. Crossing the mean value when the threshold is 0 only counts as a single crossing.
- Parameters
columns – list of columns on which to apply the feature generator
threshold – value in addition to mean which must be crossed to count as a crossing
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject'] ) >>> client.pipeline.add_feature_generator([{'name':'Zero Crossings', 'params':{"columns": ['accelx', 'accely', 'accelz'], "threshold": 5} }]) >>> result, stats = client.pipeline.execute()
Histogram
- Histogram
Translates the data stream(s) from a segment into a feature vector in histogram space.
- Parameters
columns (list of strings) – names of the sensor streams to use
range_left (int) – the left limit (or the min) of the range for a fixed bin histogram
range_right (int) – the right limit (or the max) of the range for a fixed bin histogram
number_of_bins (int , optional) – the number of bins used for the histogram
scaling_factor (int , optional) – scaling factor used to fit for the device
- Returns
feature vector in histogram space.
- Return type
DataFrame
Examples
>>> client.pipeline.reset() >>> df = client.datasets.load_activity_raw_toy() >>> print df out: Subject Class Rep accelx accely accelz 0 s01 Crawling 1 377 569 4019 1 s01 Crawling 1 357 594 4051 2 s01 Crawling 1 333 638 4049 3 s01 Crawling 1 340 678 4053 4 s01 Crawling 1 372 708 4051 5 s01 Crawling 1 410 733 4028 6 s01 Crawling 1 450 733 3988 7 s01 Crawling 1 492 696 3947 8 s01 Crawling 1 518 677 3943 9 s01 Crawling 1 528 695 3988 10 s01 Crawling 1 -1 2558 4609 11 s01 Running 1 -44 -3971 843 12 s01 Running 1 -47 -3982 836 13 s01 Running 1 -43 -3973 832 14 s01 Running 1 -40 -3973 834 15 s01 Running 1 -48 -3978 844 16 s01 Running 1 -52 -3993 842 17 s01 Running 1 -64 -3984 821 18 s01 Running 1 -64 -3966 813 19 s01 Running 1 -66 -3971 826 20 s01 Running 1 -62 -3988 827 21 s01 Running 1 -57 -3984 843
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns=['accelx', 'accely', 'accelz'], group_columns=['Subject', 'Class', 'Rep'], label_column='Class') >>> client.pipeline.add_feature_generator([{'name':'Histogram', 'params':{"columns": ['accelx','accely','accelz'], "range_left": 10, "range_right": 1000, "number_of_bins": 5, "scaling_factor": 254 }}]) >>> results, stats = client.pipeline.execute()
>>> print results out: Class Rep Subject gen_0000_hist_bin_000000 gen_0000_hist_bin_000001 gen_0000_hist_bin_000002 gen_0000_hist_bin_000003 gen_0000_hist_bin_000004 0 Crawling 1 s01 8.0 38.0 46.0 69.0 0.0 1 Running 1 s01 85.0 0.0 0.0 0.0 85.0
- Histogram Auto Scale Range
Translates the data stream(s) from a segment into a feature vector in histogram space, where the range is set by the min and max values of the data and the number of bins is set by the user.
- Parameters
columns (list of strings) – names of the sensor streams to use
number_of_bins (int , optional) – the number of bins used for the histogram
scaling_factor (int , optional) – scaling factor used to fit for the device
- Returns
feature vector in histogram space.
- Return type
DataFrame
Examples
>>> client.pipeline.reset() >>> df = client.datasets.load_activity_raw_toy() >>> print df out: Subject Class Rep accelx accely accelz 0 s01 Crawling 1 377 569 4019 1 s01 Crawling 1 357 594 4051 2 s01 Crawling 1 333 638 4049 3 s01 Crawling 1 340 678 4053 4 s01 Crawling 1 372 708 4051 5 s01 Crawling 1 410 733 4028 6 s01 Crawling 1 450 733 3988 7 s01 Crawling 1 492 696 3947 8 s01 Crawling 1 518 677 3943 9 s01 Crawling 1 528 695 3988 10 s01 Crawling 1 -1 2558 4609 11 s01 Running 1 -44 -3971 843 12 s01 Running 1 -47 -3982 836 13 s01 Running 1 -43 -3973 832 14 s01 Running 1 -40 -3973 834 15 s01 Running 1 -48 -3978 844 16 s01 Running 1 -52 -3993 842 17 s01 Running 1 -64 -3984 821 18 s01 Running 1 -64 -3966 813 19 s01 Running 1 -66 -3971 826 20 s01 Running 1 -62 -3988 827 21 s01 Running 1 -57 -3984 843
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns=['accelx', 'accely', 'accelz'], group_columns=['Subject', 'Class', 'Rep'], label_column='Class') >>> client.pipeline.add_feature_generator([{'name':'Histogram Auto Scale Range', 'params':{"columns": ['accelx','accely','accelz'], "number_of_bins": 5, "scaling_factor": 254 }}]) >>> results, stats = client.pipeline.execute()
>>> print results out: Class Rep Subject gen_0000_hist_bin_000000 gen_0000_hist_bin_000001 gen_0000_hist_bin_000002 gen_0000_hist_bin_000003 gen_0000_hist_bin_000004 0 Crawling 1 s01 8.0 38.0 46.0 69.0 0.0 1 Running 1 s01 85.0 0.0 0.0 0.0 85.0
Sampling
- Downsample
This function takes the input_data dataframe as input and groups it by group_columns. Then, for each group, it drops the passthrough_columns and performs downsampling on the remaining columns.
On each column, perform the following steps:
Divide the entire column into windows of size total length/new_length.
Calculate mean for each window
Concatenate all the mean values.
The length of the downsampled signal is equal to ‘new length’.
Then all such means are concatenated to get new_length * # of columns. These constitute features in downstream analyses. For instance, if there are three columns and the new_length value is 12, then total number of means is 12 * 3 = 36. Each will represent a feature.
- Parameters
columns – List of columns to be downsampled
new_length – integer; Downsampled length
- Returns
DataFrame; downsampled dataframe
Examples
>>> client.pipeline.reset() >>> df = client.datasets.load_activity_raw_toy() >>> print df out: Subject Class Rep accelx accely accelz 0 s01 Crawling 1 377 569 4019 1 s01 Crawling 1 357 594 4051 2 s01 Crawling 1 333 638 4049 3 s01 Crawling 1 340 678 4053 4 s01 Crawling 1 372 708 4051 5 s01 Crawling 1 410 733 4028 6 s01 Crawling 1 450 733 3988 7 s01 Crawling 1 492 696 3947 8 s01 Crawling 1 518 677 3943 9 s01 Crawling 1 528 695 3988 10 s01 Crawling 1 -1 2558 4609 11 s01 Running 1 -44 -3971 843 12 s01 Running 1 -47 -3982 836 13 s01 Running 1 -43 -3973 832 14 s01 Running 1 -40 -3973 834 15 s01 Running 1 -48 -3978 844 16 s01 Running 1 -52 -3993 842 17 s01 Running 1 -64 -3984 821 18 s01 Running 1 -64 -3966 813 19 s01 Running 1 -66 -3971 826 20 s01 Running 1 -62 -3988 827 21 s01 Running 1 -57 -3984 843
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns=['accelx', 'accely', 'accelz'], group_columns=['Subject', 'Class', 'Rep'], label_column='Class') >>> client.pipeline.add_feature_generator([{'name':'Downsample', 'params':{"columns": ['accelx'], "new_length": 5}}]) >>> results, stats = client.pipeline.execute()
>>> print results out: Class Rep Subject gen_0001_accelx_0 gen_0001_accelx_1 gen_0001_accelx_2 gen_0001_accelx_3 gen_0001_accelx_4 0 Crawling 1 s01 367.0 336.5 391.0 471.0 523.0 1 Running 1 s01 -45.5 -41.5 -50.0 -64.0 -64.0
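The windowed-mean computation can be reproduced directly with NumPy; assuming the window size is floor(total length / new_length) and trailing samples that do not fill a window are dropped, this reproduces the Crawling accelx output above:
>>> import numpy as np
>>> accelx = np.array([377, 357, 333, 340, 372, 410, 450, 492, 518, 528, -1])
>>> new_length = 5
>>> win = len(accelx) // new_length                          # window size = 2
>>> accelx[:win * new_length].reshape(new_length, win).mean(axis=1).tolist()
[367.0, 336.5, 391.0, 471.0, 523.0]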
- Downsample Average with Normalization
This function takes the input_data dataframe as input and groups it by group_columns. Then, for each group, it drops the passthrough_columns and performs a convolution on the remaining columns.
On each column, perform the following steps:
Divide the entire column into windows of size total length/new_length.
Calculate mean for each window
Concatenate all the mean values into a feature vector of length new_length
Normalize the signal to be between 0-255
Then all such means are concatenated to get new_length * # of columns. These constitute features in downstream analyses. For instance, if there are three columns and the new_length value is 12, then total number of means is 12 * 3 = 36. Each will represent a feature.
- Parameters
input_data – dataframe
columns – List of columns to be downsampled
group_columns (a list) – List of columns on which grouping is to be done. Each group will go through downsampling one at a time
new_length – integer; Downsampled length
**kwargs –
- Returns
DataFrame; convolution avg dataframe
Examples
>>> from pandas import DataFrame >>> df = DataFrame([[3, 3], [4, 5], [5, 7], [4, 6], [3, 1], [3, 1], [4, 3], [5, 5], [4, 7], [3, 6]], columns=['accelx', 'accely']) >>> df Out: accelx accely 0 3 3 1 4 5 2 5 7 3 4 6 4 3 1 5 3 1 6 4 3 7 5 5 8 4 7 9 3 6 >>> client.pipeline.reset() >>> client.pipeline.set_input_data('test_data', df, force=True) >>> client.pipeline.add_feature_generator(["Downsample Average with Normalization"], params = {"group_columns": []}, function_defaults={"columns":['accelx', 'accely'], 'new_length' : 5}) >>> result, stats = client.pipeline.execute() >>> print result Out: accelx_1 accelx_2 accelx_3 accelx_4 accelx_5 accely_1 accely_2 0 3.5 4.5 3 4.5 3.5 4 6.5 accely_3 accely_4 accely_5 0 1 4 6.5
- Downsample Max With Normalization
This function takes the input_data dataframe as input and groups it by group_columns. Then, for each group, it drops the passthrough_columns and performs max downsampling on the remaining columns.
On each column, perform the following steps:
Divide the entire column into windows of size total length/new_length.
Calculate max value for each window
Concatenate all the max values into a feature vector of length new_length
Normalize the signal to be between 0-255
Then all such means are concatenated to get new_length * # of columns. These constitute features in downstream analyses. For instance, if there are three columns and the new_length value is 12, then the total number of means is 12 * 3 = 36. Each will represent a feature.
- Parameters
input_data – dataframe
columns – List of columns to be downsampled
group_columns (a list) – List of columns on which grouping is to be done. Each group will go through downsampling one at a time
new_length – integer; Downsampled length
**kwargs –
- Returns
DataFrame; convolution avg dataframe
Examples
>>> from pandas import DataFrame >>> df = DataFrame([[3, 3], [4, 5], [5, 7], [4, 6], [3, 1], [3, 1], [4, 3], [5, 5], [4, 7], [3, 6]], columns=['accelx', 'accely']) >>> df Out: accelx accely 0 3 3 1 4 5 2 5 7 3 4 6 4 3 1 5 3 1 6 4 3 7 5 5 8 4 7 9 3 6 >>> client.pipeline.reset() >>> client.pipeline.set_input_data('test_data', df, force=True) >>> client.pipeline.add_feature_generator(["Downsample Max with Normalization"], params = {"group_columns": []}, function_defaults={"columns":['accelx', 'accely'], 'new_length' : 5}) >>> result, stats = client.pipeline.execute() >>> print result Out: accelx_1 accelx_2 accelx_3 accelx_4 accelx_5 accely_1 accely_2 0 3.5 4.5 3 4.5 3.5 4 6.5 accely_3 accely_4 accely_5 0 1 4 6.5
Rate of Change
- Mean Crossing Rate
Calculates the rate at which the mean value is crossed for each specified column. Works with grouped data. The total number of mean-value crossings is found and then divided by the total number of samples to get the mean_crossing_rate.
- Parameters
columns – list of all column names on which mean_crossing_rate is to be computed.
- Returns
Return the number of mean crossings divided by the length of the signal.
- Return type
DataFrame
Examples
>>> client.pipeline.reset() >>> df = client.datasets.load_activity_raw_toy() >>> print df out: Subject Class Rep accelx accely accelz 0 s01 Crawling 1 377 569 4019 1 s01 Crawling 1 357 594 4051 2 s01 Crawling 1 333 638 4049 3 s01 Crawling 1 340 678 4053 4 s01 Crawling 1 372 708 4051 5 s01 Crawling 1 410 733 4028 6 s01 Crawling 1 450 733 3988 7 s01 Crawling 1 492 696 3947 8 s01 Crawling 1 518 677 3943 9 s01 Crawling 1 528 695 3988 10 s01 Crawling 1 -1 2558 4609 11 s01 Running 1 -44 -3971 843 12 s01 Running 1 -47 -3982 836 13 s01 Running 1 -43 -3973 832 14 s01 Running 1 -40 -3973 834 15 s01 Running 1 -48 -3978 844 16 s01 Running 1 -52 -3993 842 17 s01 Running 1 -64 -3984 821 18 s01 Running 1 -64 -3966 813 19 s01 Running 1 -66 -3971 826 20 s01 Running 1 -62 -3988 827 21 s01 Running 1 -57 -3984 843
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns=['accelx', 'accely', 'accelz'], group_columns=['Subject', 'Class', 'Rep'], label_column='Class')
>>> client.pipeline.add_feature_generator([{'name':'Mean Crossing Rate', 'params':{"columns": ['accelx','accely', 'accelz']} }])
>>> results, stats = client.pipeline.execute() >>> print results out: Class Rep Subject gen_0001_accelxMeanCrossingRate gen_0002_accelyMeanCrossingRate gen_0003_accelzMeanCrossingRate 0 Crawling 1 s01 0.181818 0.090909 0.090909 1 Running 1 s01 0.090909 0.454545 0.363636
- Mean Difference
Calculates the mean difference of each specified column. Works with grouped data. For a given column, it computes the difference between the i-th and (i-1)-th elements and then takes the mean of these differences over the entire column.
mean(diff(arr)) = mean(arr[i] - arr[i-1]), for all 1 <= i <= n-1 (for a zero-indexed array of length n).
- Parameters
columns – list of all column names on which mean_difference is to be computed.
- Returns
Returns the mean difference for each specified column.
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject'] ) >>> client.pipeline.add_feature_generator([{'name':'Mean Difference', 'params':{"columns": ['accelx', 'accely', 'accelz'] } }]) >>> result, stats = client.pipeline.execute()
>>> print result out: Subject gen_0001_accelxMeanDifference gen_0002_accelyMeanDifference gen_0003_accelzMeanDifference 0 s01 1.25 0.75 0.25
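The values above follow directly from NumPy’s diff:
>>> import numpy as np
>>> float(np.diff([-3, 3, 0, -2, 2]).mean())   # accelx column
1.25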
- Second Sigma Crossing Rate
Calculates the rate at which the second standard deviation value (second sigma) is crossed for each specified column. The total number of second-sigma crossings is found and then divided by the total number of samples to get the second_sigma_crossing_rate.
- Parameters
columns – list of all column names on which second_sigma_crossing_rate is to be computed.
- Returns
Return the second sigma crossing rate.
- Return type
DataFrame
Examples
>>> client.pipeline.reset() >>> df = client.datasets.load_activity_raw_toy() >>> print df out: Subject Class Rep accelx accely accelz 0 s01 Crawling 1 377 569 4019 1 s01 Crawling 1 357 594 4051 2 s01 Crawling 1 333 638 4049 3 s01 Crawling 1 340 678 4053 4 s01 Crawling 1 372 708 4051 5 s01 Crawling 1 410 733 4028 6 s01 Crawling 1 450 733 3988 7 s01 Crawling 1 492 696 3947 8 s01 Crawling 1 518 677 3943 9 s01 Crawling 1 528 695 3988 10 s01 Crawling 1 -1 2558 4609 11 s01 Running 1 -44 -3971 843 12 s01 Running 1 -47 -3982 836 13 s01 Running 1 -43 -3973 832 14 s01 Running 1 -40 -3973 834 15 s01 Running 1 -48 -3978 844 16 s01 Running 1 -52 -3993 842 17 s01 Running 1 -64 -3984 821 18 s01 Running 1 -64 -3966 813 19 s01 Running 1 -66 -3971 826 20 s01 Running 1 -62 -3988 827 21 s01 Running 1 -57 -3984 843
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns=['accelx', 'accely', 'accelz'], group_columns=['Subject', 'Class', 'Rep'], label_column='Class') >>> client.pipeline.add_feature_generator([{'name':'Second Sigma Crossing Rate', 'params':{"columns": ['accelx','accely', 'accelz']} }]) >>> results, stats = client.pipeline.execute()
>>> print results out: Class Rep Subject gen_0001_accelx2ndSigmaCrossingRate gen_0002_accely2ndSigmaCrossingRate gen_0003_accelz2ndSigmaCrossingRate 0 Crawling 1 s01 0.090909 0.090909 0.0 1 Running 1 s01 0.000000 0.000000 0.0
- Sigma Crossing Rate
Calculates the rate at which the standard deviation value (sigma) is crossed for each specified column. The total number of sigma crossings is found and then divided by the total number of samples to get the sigma_crossing_rate.
- Parameters
columns – list of all column names on which sigma_crossing_rate is to be computed.
- Returns
Return the sigma crossing rate.
- Return type
DataFrame
Examples
>>> client.pipeline.reset() >>> df = client.datasets.load_activity_raw_toy() >>> print df out: Subject Class Rep accelx accely accelz 0 s01 Crawling 1 377 569 4019 1 s01 Crawling 1 357 594 4051 2 s01 Crawling 1 333 638 4049 3 s01 Crawling 1 340 678 4053 4 s01 Crawling 1 372 708 4051 5 s01 Crawling 1 410 733 4028 6 s01 Crawling 1 450 733 3988 7 s01 Crawling 1 492 696 3947 8 s01 Crawling 1 518 677 3943 9 s01 Crawling 1 528 695 3988 10 s01 Crawling 1 -1 2558 4609 11 s01 Running 1 -44 -3971 843 12 s01 Running 1 -47 -3982 836 13 s01 Running 1 -43 -3973 832 14 s01 Running 1 -40 -3973 834 15 s01 Running 1 -48 -3978 844 16 s01 Running 1 -52 -3993 842 17 s01 Running 1 -64 -3984 821 18 s01 Running 1 -64 -3966 813 19 s01 Running 1 -66 -3971 826 20 s01 Running 1 -62 -3988 827 21 s01 Running 1 -57 -3984 843
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns=['accelx', 'accely', 'accelz'], group_columns=['Subject', 'Class', 'Rep'], label_column='Class') >>> client.pipeline.add_feature_generator([{'name':'Sigma Crossing Rate', 'params':{"columns": ['accelx','accely', 'accelz']} }]) >>> results, stats = client.pipeline.execute()
>>> print results out: Class Rep Subject gen_0001_accelxSigmaCrossingRate gen_0002_accelySigmaCrossingRate gen_0003_accelzSigmaCrossingRate 0 Crawling 1 s01 0.090909 0.0 0.0 1 Running 1 s01 0.000000 0.0 0.0
- Threshold Crossing Rate
The total number of threshold crossings is found and divided by the total number of samples to get the threshold_crossing_rate.
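Examples
This entry has no pipeline example, so here is a plain-NumPy illustration of the computation as described (one reasonable convention for counting crossings; the library’s exact convention may differ):
>>> import numpy as np
>>> def threshold_crossing_rate(x, threshold):
...     # count sign changes of (x - threshold), then divide by the number of samples
...     signs = np.sign(np.asarray(x) - threshold)
...     return np.count_nonzero(np.diff(signs)) / float(len(x))
...
>>> threshold_crossing_rate([-3, 3, 0, -2, 2], threshold=1)
0.6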
- Threshold With Offset Crossing Rate
The total number of threshold crossings is found and divided by the total number of samples to get the threshold_crossing_rate.
- Zero Crossing Rate
Calculates the rate at which the zero value is crossed for each specified column. The total number of zero crossings is found and then divided by the total number of samples to get the zero_crossing_rate.
- Parameters
columns – list of all column names on which zero_crossing_rate is to be computed.
- Returns
A DataFrame containing the zero crossing rate for each specified column.
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject'] ) >>> client.pipeline.add_feature_generator([{'name':'Zero Crossing Rate', 'params':{"columns": ['accelx', 'accely', 'accelz'] } }]) >>> result, stats = client.pipeline.execute()
>>> print result out: Subject gen_0001_accelxZeroCrossingRate gen_0002_accelyZeroCrossingRate gen_0003_accelzZeroCrossingRate 0 s01 0.6 0.0 0.0
Frequency
- Dominant Frequency
Calculates the dominant frequency for each specified signal. For each column, finds the frequency at which the signal has the highest power.
Note: The current FFT length is 512, data larger than this will be truncated. Data smaller than this will be zero-padded.
- Parameters
columns – List of columns on which dominant_frequency needs to be calculated
- Returns
DataFrame of dominant_frequency for each column and the specified group_columns
Examples
>>> import matplotlib.pyplot as plt >>> import numpy as np
>>> sample = 100 >>> df = pd.DataFrame() >>> df = pd.DataFrame({ 'Subject': ['s01'] * sample , 'Class': ['0'] * (sample/2) + ['1'] * (sample/2) }) >>> x = np.arange(sample) >>> fx = 2; fy = 3; fz = 5 >>> df['accelx'] = 100 * np.sin(2 * np.pi * fx * x / sample ) >>> df['accely'] = 100 * np.sin(2 * np.pi * fy * x / sample ) >>> df['accelz'] = 100 * np.sin(2 * np.pi * fz * x / sample ) >>> df['accelz'] = df['accelx'][:25].tolist() + df['accely'][25:50].tolist() + df['accelz'][50:].tolist()
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject','Class'] )
>>> client.pipeline.add_feature_generator([{'name':'Dominant Frequency', 'params':{"columns": ['accelx', 'accely', 'accelz' ], "sample_rate" : sample } }])
>>> result, stats = client.pipeline.execute() >>> print result out: Class Subject gen_0001_accelxDomFreq gen_0002_accelyDomFreq gen_0003_accelzDomFreq 0 0 s01 22.0 28.0 34.0 1 1 s01 22.0 26.0 52.0
- MFCC
Translates the data stream(s) from a segment into a feature vector of Mel-Frequency Cepstral Coefficients (MFCC). The features are derived in the frequency domain and mimic the human auditory response.
Note: the current FFT length is 512, data larger than this will be truncated. Data smaller than this will be zero-padded.
- Parameters
columns (list of strings) – names of the sensor streams to use
sample_rate (int) – sampling rate
cepstra_count (int) – number of coefficients to generate
- Returns
feature vector of MFCC coefficients.
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject']) >>> client.pipeline.add_feature_generator([{'name':'MFCC', 'params':{"columns": ['accelx'], "sample_rate": 10, "cepstra_count": 23 }}]) >>> result, stats = client.pipeline.execute()
>>> print result out: Subject gen_0001_accelxmfcc_000000 gen_0001_accelxmfcc_000001 ... gen_0001_accelxmfcc_000021 gen_0001_accelxmfcc_000022 0 s01 131357.0 -46599.0 ... 944.0 308.0
- MFE
Translates the data stream(s) from a segment into a feature vector of Mel-Frequency Filterbanks (MFE). The features are derived in the frequency domain.
Note: the current FFT length is 512, data larger than this will be truncated. Data smaller than this will be zero-padded.
- Parameters
columns (list of strings) – names of the sensor streams to use
num_filters (int) – number of MFE coefficients to generate
- Returns
feature vector of MFE coefficients.
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject']) >>> client.pipeline.add_feature_generator([{'name':'MFE', 'params':{"columns": ['accelx'], "num_filters": 23 }}]) >>> result, stats = client.pipeline.execute()
>>> print result out: Subject gen_0001_accelxmfe_000000 gen_0001_accelxmfe_000001 ... gen_0001_accelxmfe_000021 gen_0001_accelxmfe_000022 0 s01 131357.0 -46599.0 ... 944.0 308.0
- Peak Frequencies
Calculates the peak frequencies for each specified signal. For each column, finds the frequencies at which the signal has the highest power.
Note: the current FFT length is 512, data larger than this will be truncated. Data smaller than this will be zero-padded.
The FFT is computed and the cutoff frequencies are converted to bins based on the following formulas:
fft_min_bin_index = (min_freq * FFT_length / sample_rate); fft_max_bin_index = (max_freq * FFT_length / sample_rate);
- Parameters
columns – List of columns on which peak frequencies need to be calculated
sample_rate – sample rate of the sensor data
window_type – hanning
num_peaks – the number of peaks to identify
min_frequency – the min frequency bound to look for peaks
max_frequency – the max frequency bound to look for peaks
threshold – the threshold value a peak must be above to be considered a peak
- Returns
DataFrame of peak frequencies for each column and the specified group_columns
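Examples
A usage sketch following the pattern of the Dominant Frequency example above; the parameter values here are illustrative only (output omitted):
>>> import numpy as np
>>> import pandas as pd
>>> sample = 100
>>> x = np.arange(sample)
>>> df = pd.DataFrame({'Subject': ['s01'] * sample})
>>> df['accelx'] = 100 * np.sin(2 * np.pi * 5 * x / sample)   # 5 Hz tone sampled at 100 Hz
>>> client.pipeline.reset(delete_cache=False)
>>> client.pipeline.set_input_data('test_data', df, force=True,
...                                data_columns=['accelx'], group_columns=['Subject'])
>>> client.pipeline.add_feature_generator([{'name':'Peak Frequencies',
...                                         'params':{"columns": ['accelx'],
...                                                   "sample_rate": sample,
...                                                   "window_type": "hanning",
...                                                   "num_peaks": 2,
...                                                   "min_frequency": 1,
...                                                   "max_frequency": 45,
...                                                   "threshold": 0.2}}])
>>> result, stats = client.pipeline.execute()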
- Power Spectrum
Calculates the power spectrum of the signal; the resulting power spectrum is binned into number_of_bins bins.
Note: the current FFT length is 512, data larger than this will be truncated. Data smaller than this will be zero-padded.
- Parameters
columns – List of columns on to use for the frequency calculation
window_type – hanning
number_of_bins – number of bins to reduce the FFT to
- Returns
DataFrame of power spectrum for each column and the specified group_columns
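Examples
A usage sketch with the toy activity dataset used elsewhere in this section; the parameter values are illustrative only (output omitted):
>>> client.pipeline.reset()
>>> df = client.datasets.load_activity_raw_toy()
>>> client.pipeline.set_input_data('test_data', df, force=True,
...                                data_columns=['accelx', 'accely', 'accelz'],
...                                group_columns=['Subject', 'Class', 'Rep'],
...                                label_column='Class')
>>> client.pipeline.add_feature_generator([{'name':'Power Spectrum',
...                                         'params':{"columns": ['accelx'],
...                                                   "window_type": "hanning",
...                                                   "number_of_bins": 8}}])
>>> results, stats = client.pipeline.execute()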
- Spectral Entropy
Calculates the spectral entropy for each specified signal. For each column, the power spectrum is calculated first and then used to compute the entropy in the spectral domain. Spectral entropy measures the spectral complexity of the signal.
Note: the current FFT length is 512, data larger than this will be truncated. Data smaller than this will be zero-padded.
- Parameters
columns – List of all columns for which spectral_entropy is to be calculated
- Returns
DataFrame of spectral_entropy for each column and the specified group_columns
Examples
>>> import matplotlib.pyplot as plt >>> import numpy as np
>>> sample = 100 >>> df = pd.DataFrame() >>> df = pd.DataFrame({ 'Subject': ['s01'] * sample , 'Class': ['0'] * (sample/2) + ['1'] * (sample/2) }) >>> x = np.arange(sample) >>> fx = 2; fy = 3; fz = 5 >>> df['accelx'] = 100 * np.sin(2 * np.pi * fx * x / sample ) >>> df['accely'] = 100 * np.sin(2 * np.pi * fy * x / sample ) >>> df['accelz'] = 100 * np.sin(2 * np.pi * fz * x / sample ) >>> df['accelz'] = df['accelx'][:25].tolist() + df['accely'][25:50].tolist() + df['accelz'][50:].tolist()
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject','Class'] )
>>> client.pipeline.add_feature_generator([{'name':'Spectral Entropy', 'params':{"columns": ['accelx', 'accely', 'accelz' ]} }])
>>> result, stats = client.pipeline.execute() >>> print result out: Class Subject gen_0001_accelxSpecEntr gen_0002_accelySpecEntr gen_0003_accelzSpecEntr 0 0 s01 1.97852 1.983631 1.981764 1 1 s01 1.97852 2.111373 2.090683
Shape
- Global Peak to Peak of High Frequency
Global peak to peak of high frequency. The high frequency signal is calculated by subtracting the moving average filter output from the original signal.
- Parameters
smoothing_factor (int) – determines the amount of attenuation for frequencies over the cutoff frequency. The number of elements in individual columns should be at least three times the smoothing factor.
columns – List of str; Set of columns on which to apply the feature generator
- Returns
DataFrame of global p2p high frequency for each column and the specified group_columns
Examples
>>> import numpy as np >>> sample = 100 >>> df = pd.DataFrame() >>> df = pd.DataFrame({ 'Subject': ['s01'] * sample , 'Class': ['0'] * (sample/2) + ['1'] * (sample/2) }) >>> x = np.arange(sample) >>> fx = 2; fy = 3; fz = 5 >>> df['accelx'] = 100 * np.sin(2 * np.pi * fx * x / sample ) >>> df['accely'] = 100 * np.sin(2 * np.pi * fy * x / sample ) >>> df['accelz'] = 100 * np.sin(2 * np.pi * fz * x / sample ) >>> df['accelz'] = df['accelx'][:25].tolist() + df['accely'][25:50].tolist() + df['accelz'][50:75].tolist() + df['accely'][75:].tolist()
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject','Class'] )
>>> client.pipeline.add_feature_generator([{'name':'Global Peak to Peak of High Frequency', 'params':{"smoothing_factor": 5, "columns": ['accelx','accely','accelz'] }}])
>>> result, stats = client.pipeline.execute() >>> print result out: Class Subject gen_0001_accelxMaxP2PGlobalAC gen_0002_accelyMaxP2PGlobalAC gen_0003_accelzMaxP2PGlobalAC 0 0 s01 3.6 7.8 86.400002 1 1 s01 3.6 7.8 165.000000
- Global Peak to Peak of Low Frequency
Global peak to peak of low frequency. The low frequency signal is calculated by applying a moving average filter with a smoothing factor.
- Parameters
smoothing_factor (int) – determines the amount of attenuation for frequencies over the cutoff frequency.
columns – List of str; Set of columns on which to apply the feature generator
- Returns
DataFrame of global p2p low frequency for each column and the specified group_columns
Examples
>>> import numpy as np >>> sample = 100 >>> df = pd.DataFrame() >>> df = pd.DataFrame({ 'Subject': ['s01'] * sample , 'Class': ['0'] * (sample/2) + ['1'] * (sample/2) }) >>> x = np.arange(sample) >>> fx = 2; fy = 3; fz = 5 >>> df['accelx'] = 100 * np.sin(2 * np.pi * fx * x / sample ) >>> df['accely'] = 100 * np.sin(2 * np.pi * fy * x / sample ) >>> df['accelz'] = 100 * np.sin(2 * np.pi * fz * x / sample ) >>> df['accelz'] = df['accelx'][:25].tolist() + df['accely'][25:50].tolist() + df['accelz'][50:75].tolist() + df['accely'][75:].tolist()
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject','Class'] )
>>> client.pipeline.add_feature_generator([{'name':'Global Peak to Peak of Low Frequency', 'params':{"smoothing_factor": 5, "columns": ['accelx','accely','accelz'] }}])
>>> result, stats = client.pipeline.execute() >>> print result out: Class Subject gen_0001_accelxMaxP2PGlobalDC gen_0002_accelyMaxP2PGlobalDC gen_0003_accelzMaxP2PGlobalDC 0 0 s01 195.600006 191.800003 187.000000 1 1 s01 195.600006 191.800003 185.800003
- Max Peak to Peak of first half of High Frequency
Max Peak to Peak of first half of High Frequency. The high frequency signal is calculated by subtracting the moving average filter output from the original signal.
- Parameters
smoothing_factor (int) – determines the amount of attenuation for frequencies over the cutoff frequency.
columns – List of str; Set of columns on which to apply the feature generator
- Returns
DataFrame of max p2p half high frequency for each column and the specified group_columns
Examples
>>> import numpy as np >>> sample = 100 >>> df = pd.DataFrame() >>> df = pd.DataFrame({ 'Subject': ['s01'] * sample , 'Class': ['0'] * (sample/2) + ['1'] * (sample/2) }) >>> x = np.arange(sample) >>> fx = 2; fy = 3; fz = 5 >>> df['accelx'] = 100 * np.sin(2 * np.pi * fx * x / sample ) >>> df['accely'] = 100 * np.sin(2 * np.pi * fy * x / sample ) >>> df['accelz'] = 100 * np.sin(2 * np.pi * fz * x / sample ) >>> df['accelz'] = df['accelx'][:25].tolist() + df['accely'][25:50].tolist() + df['accelz'][50:75].tolist() + df['accely'][75:].tolist()
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject','Class'] )
>>> client.pipeline.add_feature_generator([{'name':'Max Peak to Peak of first half of High Frequency', 'params':{"smoothing_factor": 5, "columns": ['accelx','accely','accelz'] }}])
>>> result, stats = client.pipeline.execute() >>> print result out: Class Subject gen_0001_accelxMaxP2P1stHalfAC gen_0002_accelyMaxP2P1stHalfAC gen_0003_accelzMaxP2P1stHalfAC 0 0 s01 1.8 7.0 1.8 1 1 s01 1.8 7.0 20.0
- Global Min Max Sum
This function computes the sum of the maximum and minimum values. It is also known as the ‘min max amplitude difference’.
- Parameters
columns – (list of str): Set of columns on which to apply the feature generator
- Returns
DataFrame of min max sum for each column and the specified group_columns
Examples
>>> from pandas import DataFrame >>> df = DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns=['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject'] )
>>> client.pipeline.add_feature_generator([{'name':'Global Min Max Sum', 'params':{"columns": ['accelx','accely','accelz'] }}])
>>> result, stats = client.pipeline.execute() >>> print result out: Subject gen_0001_accelxMinMaxSum gen_0002_accelyMinMaxSum gen_0003_accelzMinMaxSum 0 s01 0.0 15.0 11.0
- Global Peak to Peak
Global Peak to Peak of signal.
- Parameters
columns – (list of str): Set of columns on which to apply the feature generator
- Returns
DataFrame of peak to peak for each column and the specified group_columns
Examples
>>> from pandas import DataFrame >>> df = DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns=['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject'] )
>>> client.pipeline.add_feature_generator([{'name':'Global Peak to Peak', 'params':{"columns": ['accelx','accely','accelz'] }}])
>>> result, stats = client.pipeline.execute() >>> print result out: Subject gen_0001_accelxP2P gen_0002_accelyP2P gen_0003_accelzP2P 0 s01 6.0 3.0 5.0
- Shape Absolute Median Difference
Computes the absolute value of the difference in median between the first and second half of a signal
- Parameters
columns – list of columns on which to apply the feature generator
center_ratio – ratio of the signal that determines the split point between the first and second halves
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject'] ) >>> client.pipeline.add_feature_generator([{'name':'Shape Absolute Median Difference', 'params':{"columns": ['accelx', 'accely', 'accelz'], "center_ratio": 0.5} }]) >>> result, stats = client.pipeline.execute() >>> print result out: Subject gen_0001_accelxShapeAbsoluteMedianDifference gen_0002_accelyShapeAbsoluteMedianDifference gen_0003_accelzShapeAbsoluteMedianDifference 0 s01
- Difference of Peak to Peak of High Frequency between two halves
Difference of peak to peak of high frequency between two halves. The high frequency signal is calculated by subtracting the moving average filter output from the original signal.
- Parameters
smoothing_factor (int) – determines the amount of attenuation for frequencies over the cutoff frequency. The number of elements in individual columns should be at least three times the smoothing factor.
columns – List of str; Set of columns on which to apply the feature generator
- Returns
DataFrame of difference high frequency for each column and the specified group_columns
Examples
>>> import numpy as np >>> sample = 100 >>> df = pd.DataFrame() >>> df = pd.DataFrame({ 'Subject': ['s01'] * sample , 'Class': ['0'] * (sample/2) + ['1'] * (sample/2) }) >>> x = np.arange(sample) >>> fx = 2; fy = 3; fz = 5 >>> df['accelx'] = 100 * np.sin(2 * np.pi * fx * x / sample ) >>> df['accely'] = 100 * np.sin(2 * np.pi * fy * x / sample ) >>> df['accelz'] = 100 * np.sin(2 * np.pi * fz * x / sample ) >>> df['accelz'] = df['accelx'][:25].tolist() + df['accely'][25:50].tolist() + df['accelz'][50:75].tolist() + df['accely'][75:].tolist()
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject','Class'] )
>>> client.pipeline.add_feature_generator([{'name':'Difference of Peak to Peak of High Frequency between two halves', 'params':{"smoothing_factor": 5, "columns": ['accelz'] }}])
>>> result, stats = client.pipeline.execute() >>> print result out: Class Subject gen_0001_accelzACDiff 0 0 s01 -5.199997 1 1 s01 13.000000
- Shape Median Difference
Computes the difference in median between the first and second half of a signal
- Parameters
columns – list of columns on which to apply the feature generator
center_ratio – fraction of the signal assigned to the first half; the remainder forms the second half
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject'] ) >>> client.pipeline.add_feature_generator([{'name':'Shape Median Difference', 'params':{"columns": ['accelx', 'accely', 'accelz'], "center_ratio": 0.5} }]) >>> result, stats = client.pipeline.execute() >>> print result out: Subject gen_0001_accelxShapeMedianDifference gen_0002_accelyShapeMedianDifference gen_0003_accelzShapeMedianDifference 0 s01
- Ratio of Peak to Peak of High Frequency between two halves
Ratio of peak to peak of high frequency between two halves. The high frequency signal is calculated by subtracting the moving average filter output from the original signal.
- Parameters
smoothing_factor (int) – Determines the amount of attenuation for frequencies over the cutoff frequency. The number of elements in individual columns should be at least three times the smoothing factor.
columns – List of str; Set of columns on which to apply the feature generator
- Returns
DataFrame of the high-frequency peak-to-peak ratio for each column and the specified group_columns
Examples
>>> import numpy as np >>> import pandas as pd >>> sample = 100 >>> df = pd.DataFrame({ 'Subject': ['s01'] * sample , 'Class': ['0'] * (sample // 2) + ['1'] * (sample // 2) }) >>> x = np.arange(sample) >>> fx = 2; fy = 3; fz = 5 >>> df['accelx'] = 100 * np.sin(2 * np.pi * fx * x / sample ) >>> df['accely'] = 100 * np.sin(2 * np.pi * fy * x / sample ) >>> df['accelz'] = 100 * np.sin(2 * np.pi * fz * x / sample ) >>> df['accelz'] = df['accelx'][:25].tolist() + df['accely'][25:50].tolist() + df['accelz'][50:75].tolist() + df['accely'][75:].tolist()
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject','Class'] )
>>> client.pipeline.add_feature_generator([{'name':'Ratio of Peak to Peak of High Frequency between two halves', 'params':{"smoothing_factor": 5, "columns": ['accelz'] }}])
>>> result, stats = client.pipeline.execute() >>> print result out: Class Subject gen_0001_accelzACRatio 0 0 s01 3.888882 1 1 s01 0.350000
Time
- Abs Percent Time Over Threshold
Percentage of samples in the series whose absolute value is above the offset
- Average Time Over Threshold
Average of the time spent above the threshold across all crossings (see the sketch below).
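Conceptually, the two threshold-based generators above reduce to simple boolean statistics on a segment. A minimal numpy sketch; the threshold variable here is illustrative, not a documented parameter name:

from itertools import groupby
import numpy as np

x = np.array([-3, 3, 0, -2, 2])
threshold = 1

# Abs Percent Time Over Threshold: fraction of samples whose absolute
# value exceeds the threshold.
abs_pct_over = np.mean(np.abs(x) > threshold)

# Average Time Over Threshold: average run length (in samples) of the
# stretches spent above the threshold.
runs = [len(list(g)) for above, g in groupby(x > threshold) if above]
avg_time_over = np.mean(runs) if runs else 0.0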
- Percent Time Over Second Sigma
Percentage of samples in the series that are above the sample mean + two sigma
- Parameters
columns – List of str; Set of columns on which to apply the feature generator
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> client.pipeline.reset() >>> df = client.datasets.load_activity_raw_toy() >>> print df out: Subject Class Rep accelx accely accelz 0 s01 Crawling 1 377 569 4019 1 s01 Crawling 1 357 594 4051 2 s01 Crawling 1 333 638 4049 3 s01 Crawling 1 340 678 4053 4 s01 Crawling 1 372 708 4051 5 s01 Crawling 1 410 733 4028 6 s01 Crawling 1 450 733 3988 7 s01 Crawling 1 492 696 3947 8 s01 Crawling 1 518 677 3943 9 s01 Crawling 1 528 695 3988 10 s01 Crawling 1 -1 2558 4609 11 s01 Running 1 -44 -3971 843 12 s01 Running 1 -47 -3982 836 13 s01 Running 1 -43 -3973 832 14 s01 Running 1 -40 -3973 834 15 s01 Running 1 -48 -3978 844 16 s01 Running 1 -52 -3993 842 17 s01 Running 1 -64 -3984 821 18 s01 Running 1 -64 -3966 813 19 s01 Running 1 -66 -3971 826 20 s01 Running 1 -62 -3988 827 21 s01 Running 1 -57 -3984 843
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns=['accelx', 'accely', 'accelz'], group_columns=['Subject', 'Class', 'Rep'], label_column='Class') >>> client.pipeline.add_feature_generator([{'name':'Percent Time Over Second Sigma', 'params':{"columns": ['accelx','accely','accelz'] }}]) >>> results, stats = client.pipeline.execute()
>>> print results out: Class Rep Subject gen_0001_accelxPctTimeOver2ndSigma gen_0002_accelyPctTimeOver2ndSigma gen_0003_accelzPctTimeOver2ndSigma 0 Crawling 1 s01 0.0 0.090909 0.090909 1 Running 1 s01 0.0 0.000000 0.000000
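The Crawling accely figure can be reproduced approximately with numpy; whether the library uses the population or sample standard deviation is an assumption here:

import numpy as np

crawling_accely = np.array([569, 594, 638, 678, 708, 733, 733, 696, 677, 695, 2558])
limit = crawling_accely.mean() + 2 * crawling_accely.std()   # population std
pct_over = np.mean(crawling_accely > limit)                  # 1 of 11 samples, ~0.0909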
- Percent Time Over Sigma
Percentage of samples in the series that are above the sample mean + one sigma
- Parameters
columns – List of str; Set of columns on which to apply the feature generator
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> client.pipeline.reset() >>> df = client.datasets.load_activity_raw_toy() >>> print df out: Subject Class Rep accelx accely accelz 0 s01 Crawling 1 377 569 4019 1 s01 Crawling 1 357 594 4051 2 s01 Crawling 1 333 638 4049 3 s01 Crawling 1 340 678 4053 4 s01 Crawling 1 372 708 4051 5 s01 Crawling 1 410 733 4028 6 s01 Crawling 1 450 733 3988 7 s01 Crawling 1 492 696 3947 8 s01 Crawling 1 518 677 3943 9 s01 Crawling 1 528 695 3988 10 s01 Crawling 1 -1 2558 4609 11 s01 Running 1 -44 -3971 843 12 s01 Running 1 -47 -3982 836 13 s01 Running 1 -43 -3973 832 14 s01 Running 1 -40 -3973 834 15 s01 Running 1 -48 -3978 844 16 s01 Running 1 -52 -3993 842 17 s01 Running 1 -64 -3984 821 18 s01 Running 1 -64 -3966 813 19 s01 Running 1 -66 -3971 826 20 s01 Running 1 -62 -3988 827 21 s01 Running 1 -57 -3984 843
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns=['accelx', 'accely', 'accelz'], group_columns=['Subject', 'Class', 'Rep'], label_column='Class') >>> client.pipeline.add_feature_generator([{'name':'Percent Time Over Sigma', 'params':{"columns": ['accelx','accely','accelz'] }}]) >>> results, stats = client.pipeline.execute()
>>> print results out: Class Rep Subject gen_0001_accelxPctTimeOverSigma gen_0002_accelyPctTimeOverSigma gen_0003_accelzPctTimeOverSigma 0 Crawling 1 s01 0.181818 0.090909 0.090909 1 Running 1 s01 0.272727 0.090909 0.272727
- Percent Time Over Threshold
Percentage of samples in the series that are above threshold
- Parameters
columns – List of str; Set of columns on which to apply the feature generator
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> client.pipeline.reset() >>> df = client.datasets.load_activity_raw_toy() >>> print df out: Subject Class Rep accelx accely accelz 0 s01 Crawling 1 377 569 4019 1 s01 Crawling 1 357 594 4051 2 s01 Crawling 1 333 638 4049 3 s01 Crawling 1 340 678 4053 4 s01 Crawling 1 372 708 4051 5 s01 Crawling 1 410 733 4028 6 s01 Crawling 1 450 733 3988 7 s01 Crawling 1 492 696 3947 8 s01 Crawling 1 518 677 3943 9 s01 Crawling 1 528 695 3988 10 s01 Crawling 1 -1 2558 4609 11 s01 Running 1 -44 -3971 843 12 s01 Running 1 -47 -3982 836 13 s01 Running 1 -43 -3973 832 14 s01 Running 1 -40 -3973 834 15 s01 Running 1 -48 -3978 844 16 s01 Running 1 -52 -3993 842 17 s01 Running 1 -64 -3984 821 18 s01 Running 1 -64 -3966 813 19 s01 Running 1 -66 -3971 826 20 s01 Running 1 -62 -3988 827 21 s01 Running 1 -57 -3984 843
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns=['accelx', 'accely', 'accelz'], group_columns=['Subject', 'Class', 'Rep'], label_column='Class') >>> client.pipeline.add_feature_generator([{'name':'Percent Time Over Threshold', 'params':{"columns": ['accelx','accely','accelz'] }}]) >>> results, stats = client.pipeline.execute()
- Percent Time Over Zero
Percentage of samples in the series that are positive.
- Parameters
columns – List of str; Set of columns on which to apply the feature generator
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> client.pipeline.reset() >>> df = client.datasets.load_activity_raw_toy() >>> print df out: Subject Class Rep accelx accely accelz 0 s01 Crawling 1 377 569 4019 1 s01 Crawling 1 357 594 4051 2 s01 Crawling 1 333 638 4049 3 s01 Crawling 1 340 678 4053 4 s01 Crawling 1 372 708 4051 5 s01 Crawling 1 410 733 4028 6 s01 Crawling 1 450 733 3988 7 s01 Crawling 1 492 696 3947 8 s01 Crawling 1 518 677 3943 9 s01 Crawling 1 528 695 3988 10 s01 Crawling 1 -1 2558 4609 11 s01 Running 1 -44 -3971 843 12 s01 Running 1 -47 -3982 836 13 s01 Running 1 -43 -3973 832 14 s01 Running 1 -40 -3973 834 15 s01 Running 1 -48 -3978 844 16 s01 Running 1 -52 -3993 842 17 s01 Running 1 -64 -3984 821 18 s01 Running 1 -64 -3966 813 19 s01 Running 1 -66 -3971 826 20 s01 Running 1 -62 -3988 827 21 s01 Running 1 -57 -3984 843
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns=['accelx', 'accely', 'accelz'], group_columns=['Subject', 'Class', 'Rep'], label_column='Class') >>> client.pipeline.add_feature_generator([{'name':'Percent Time Over Zero', 'params':{"columns": ['accelx','accely','accelz'] }}]) >>> results, stats = client.pipeline.execute()
>>> print results out: Class Rep Subject gen_0001_accelxPctTimeOverZero gen_0002_accelyPctTimeOverZero gen_0003_accelzPctTimeOverZero 0 Crawling 1 s01 0.909091 1.0 1.0 1 Running 1 s01 0.000000 0.0 1.0
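A quick sanity check on the Crawling accelx value: the generator is simply the fraction of positive samples.

import numpy as np

crawling_accelx = np.array([377, 357, 333, 340, 372, 410, 450, 492, 518, 528, -1])
pct_over_zero = np.mean(crawling_accelx > 0)   # 10 of 11 samples, ~0.909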
- Duration of the Signal
Duration of the signal. It is calculated by dividing the length of the vector by the sampling rate.
- Parameters
sample_rate – float; Sampling rate
columns – List of str; Set of columns on which to apply the feature generator
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject']) >>> client.pipeline.add_feature_generator([{'name':'Duration of the Signal', 'params':{"columns": ['accelx'] , "sample_rate": 10 }}]) >>> result, stats = client.pipeline.execute()
>>> print result out: Subject gen_0001_accelxDurSignal 0 s01 0.5
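The value follows directly from the definition: 5 samples at a 10 Hz sampling rate give 5 / 10 = 0.5 seconds.

sample_rate = 10.0
n_samples = 5                        # length of the accelx column above
duration = n_samples / sample_rate   # 0.5 seconds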
Area
- Absolute Area of High Frequency
Absolute area of high frequency components of the signal. It calculates absolute area by applying a moving average filter on the signal with a smoothing factor and subtracting the filtered signal from the original.
- Parameters
sample_rate – float; Sampling rate of the signal
smoothing_factor (int) – Determines the amount of attenuation for frequencies over the cutoff frequency.
columns – List of str; Set of columns on which to apply the feature generator
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject']) >>> client.pipeline.add_feature_generator([{'name':'Absolute Area of High Frequency', 'params':{"sample_rate": 10, "smoothing_factor": 5, "columns": ['accelx','accely','accelz'] }}]) >>> result, stats = client.pipeline.execute()
>>> print result out: Subject gen_0001_accelxAbsAreaAc gen_0002_accelyAbsAreaAc gen_0003_accelzAbsAreaAc 0 s01 76.879997 800.099976 470.160004
- Absolute Area of Low Frequency
Absolute area of low frequency components of the signal. It calculates absolute area by first applying a moving average filter on the signal with a smoothing factor.
- Parameters
sample_rate – float; Sampling rate of the signal
smoothing_factor (int) – Determines the amount of attenuation for frequencies over the cutoff frequency.
columns – List of str; Set of columns on which to apply the feature generator
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject']) >>> client.pipeline.add_feature_generator([{'name':'Absolute Area of Low Frequency', 'params':{"sample_rate": 10, "smoothing_factor": 5, "columns": ['accelx','accely','accelz'] }}]) >>> result, stats = client.pipeline.execute()
- Absolute Area of Spectrum
Absolute area of spectrum.
- Parameters
sample_rate – Sampling rate of the signal
columns – List of str; Set of columns on which to apply the feature generator
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject'])
>>> client.pipeline.add_feature_generator([{'name':'Absolute Area of Spectrum', 'params':{"sample_rate": 10, "columns": ['accelx','accely','accelz'] }}]) >>> result, stats = client.pipeline.execute()
>>> print result out: Subject gen_0001_accelxAbsAreaSpec gen_0002_accelyAbsAreaSpec gen_0003_accelzAbsAreaSpec 0 s01 260.0 2660.0 1830.0
- Total Area
Total area under the signal. Total area = sum(signal(t)*dt), where signal(t) is signal value at time t, and dt is sampling time (dt = 1/sample_rate).
- Parameters
sample_rate – Sampling rate of the signal
columns – List of str; Set of columns on which to apply the feature generator
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject'])
>>> client.pipeline.add_feature_generator([{'name':'Total Area', 'params':{"sample_rate": 10, "columns": ['accelx','accely','accelz'] }}]) >>> result, stats = client.pipeline.execute()
>>> print result out: Subject gen_0001_accelxTotArea gen_0002_accelyTotArea gen_0003_accelzTotArea 0 s01 0.0 3.6 2.9
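The printed values follow directly from the definition (dt = 1/sample_rate = 0.1):

import pandas as pd

df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]],
                  columns=['accelx', 'accely', 'accelz'])
dt = 1.0 / 10
total_area = df.sum() * dt   # accelx: 0.0, accely: 3.6, accelz: 2.9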
- Total Area of High Frequency
Total area of high frequency components of the signal. It calculates total area by applying a moving average filter on the signal with a smoothing factor and subtracting the filtered signal from the original.
- Parameters
sample_rate – float; Sampling rate of the signal
smoothing_factor (int) – Determines the amount of attenuation for frequencies over the cutoff frequency.
columns – List of str; Set of columns on which to apply the feature generator
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject']) >>> client.pipeline.add_feature_generator([{'name':'Total Area of High Frequency', 'params':{"sample_rate": 10, "smoothing_factor": 5, "columns": ['accelx','accely','accelz'] }}]) >>> result, stats = client.pipeline.execute()
>>> print result out: Subject gen_0001_accelxTotAreaAc gen_0002_accelyTotAreaAc gen_0003_accelzTotAreaAc 0 s01 0.0 0.12 0.28
- Total Area of Low Frequency
Total area of low frequency components of the signal. It calculates total area by first applying a moving average filter on the signal with a smoothing factor.
- Parameters
sample_rate – float; Sampling rate of the signal
smoothing_factor (int) – Determines the amount of attenuation for frequencies over the cutoff frequency.
columns – List of str; Set of columns on which to apply the feature generator
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject']) >>> client.pipeline.add_feature_generator([{'name':'Total Area of Low Frequency', 'params':{"sample_rate": 10, "smoothing_factor": 5, "columns": ['accelx','accely','accelz'] }}]) >>> result, stats = client.pipeline.execute()
>>> print result out: Subject gen_0001_accelxTotAreaDc gen_0002_accelyTotAreaDc gen_0003_accelzTotAreaDc 0 s01 0.0 0.72 0.58
Energy
- Average Demeaned Energy
Average Demeaned Energy.
Calculate the element-wise demeaned by its column average of the input columns.
Sum the squared components across each column for the total demeaned energy per sample.
Take the average of the sum of squares to get the average demeaned energy.
- Parameters
columns – List of str; The columns represent a list of all column names on which the average demeaned energy is to be computed.
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject']) >>> client.pipeline.add_feature_generator([{'name':'Average Demeaned Energy', 'params':{ "columns": ['accelx','accely','accelz'] }}]) >>> result, stats = client.pipeline.execute()
>>> print result out: Subject gen_0000_AvgDemeanedEng 0 s01 9.52
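The 9.52 value can be verified directly from the definition with plain pandas:

import pandas as pd

df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]],
                  columns=['accelx', 'accely', 'accelz'])
demeaned = df - df.mean()                                    # subtract each column's mean
avg_demeaned_energy = (demeaned ** 2).sum(axis=1).mean()     # 9.52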
- Average Energy
Average Energy.
Calculate the element-wise square of the input columns.
Sum the squared components across each column for the total energy per sample.
Take the average of the sum of squares to get the average energy.
- Parameters
columns – List of str; The columns represent a list of all column names on which the average energy is to be computed.
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print(df) out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject']) >>> client.pipeline.add_feature_generator([{'name':'Average Energy', 'params':{ "columns": ['accelx','accely','accelz'] }}]) >>> result, stats = client.pipeline.execute()
>>> print(result) out: Subject gen_0000_AvgEng 0 s01 95.0
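The 95.0 value matches the definition applied directly:

import pandas as pd

df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]],
                  columns=['accelx', 'accely', 'accelz'])
avg_energy = (df ** 2).sum(axis=1).mean()   # (70 + 122 + 45 + 117 + 121) / 5 = 95.0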
- Total Energy
Total Energy.
Calculate the element-wise square of the input columns.
Sum the squared values over all streams to get the total energy.
- Parameters
columns – List of str; The columns represent a list of all column names on which the total energy is to be computed.
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print df out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject']) >>> client.pipeline.add_feature_generator([{'name':'Total Energy', 'params':{ "columns": ['accelx','accely','accelz'] }}]) >>> result, stats = client.pipeline.execute()
>>> print result out: Subject gen_0000_TotEng 0 s01 475.0
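Consistent with the example above (and with the Average Energy of 95.0 over 5 samples), the total is the sum of the squared values across all columns and samples:

import pandas as pd

df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]],
                  columns=['accelx', 'accely', 'accelz'])
total_energy = (df ** 2).values.sum()   # 475, matching the 475.0 printed above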
Physical
- Average of Movement Intensity
Calculates the average movement intensity defined by:
\[\frac{1}{N}\sum_{i=1}^{N} \sqrt{x_{i}^2 + y_{i}^2 + \dots + n_{i}^2}\]
- Parameters
columns (list) – list of columns to calculate average movement intensity.
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print(df) out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject']) >>> client.pipeline.add_feature_generator([{'name':'Average of Movement Intensity', 'params':{ "columns": ['accelx','accely','accelz'] }}]) >>> result, stats = client.pipeline.execute()
>>> print(result) out: Subject gen_0000_AvgInt 0 s01 9.0
- Average Signal Magnitude Area
Average signal magnitude area.
\[\frac{1}{N}\sum_{i=1}^{N} {x_{i} + y_{i} + \dots + n_{i}}\]
- Parameters
columns – List of str; The columns represent a list of all column names on which the average signal magnitude area is to be computed.
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print(df) out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject']) >>> client.pipeline.add_feature_generator([{'name':"Average Signal Magnitude Area", 'params':{ "columns": ['accelx','accely','accelz'] }}]) >>> result, stats = client.pipeline.execute()
>>> print(result) out: Subject gen_0000_AvgSigMag 0 s01 13.0
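The 13.0 value matches the formula above applied directly:

import pandas as pd

df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]],
                  columns=['accelx', 'accely', 'accelz'])
avg_sig_mag_area = df.sum(axis=1).mean()   # (8 + 18 + 9 + 13 + 17) / 5 = 13.0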
- Variance of Movement Intensity
Variance of movement intensity
- Parameters
columns – List of str; The columns represent a list of all column names on which the variance of movement intensity is to be computed.
- Returns
Returns data frame with specified column(s).
- Return type
DataFrame
Examples
>>> import pandas as pd >>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns= ['accelx', 'accely', 'accelz']) >>> df['Subject'] = 's01' >>> print(df) out: accelx accely accelz Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01
>>> client.pipeline.reset(delete_cache=False) >>> client.pipeline.set_input_data('test_data', df, force=True, data_columns = ['accelx', 'accely', 'accelz'], group_columns = ['Subject']) >>> client.pipeline.add_feature_generator([{'name':'Variance of Movement Intensity', 'params':{ "columns": ['accelx','accely','accelz'] }}]) >>> result, stats = client.pipeline.execute()
>>> print(result) out: Subject gen_0000_VarInt 0 s01 3.082455
- library.core_functions.feature_generators.fg_physical.magnitude(input_data, input_columns)
Computes the magnitude of each column in a dataframe
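The implementation is not reproduced here; based on the movement-intensity formula above, a plausible per-sample magnitude is the Euclidean norm across the input columns. A hedged sketch:

import numpy as np

def magnitude(input_data, input_columns):
    # Per-row Euclidean norm of the selected columns (one value per sample).
    return np.sqrt((input_data[input_columns] ** 2).sum(axis=1))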
Sensor Fusion
- Abs Max Column
Returns the index of the column with the max abs value for each segment.
- Parameters
input_data (DataFrame) – input data
columns (list of strings) – name of the sensor streams to use
- Returns
feature vector with index of max abs value column.
- Return type
DataFrame
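Sensor-fusion generators are added to a pipeline in the same way as the generators above. A minimal usage sketch; the exact output column name produced for this generator is not documented here:

client.pipeline.add_feature_generator([{'name': 'Abs Max Column',
                                        'params': {"columns": ['accelx', 'accely', 'accelz']}}])
result, stats = client.pipeline.execute()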
- Cross Column Correlation
Compute the correlation of the slopes between two columns.
- Parameters
input_data (DataFrame) – input data
columns (list of strings) – name of the sensor streams to use
sample_frequency (int) – frequency to sample correlation at. Default 1 which is every sample
- Returns
feature vector mean difference
- Return type
DataFrame
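A usage sketch for Cross Column Correlation; the sample_frequency of 1 (every sample) mirrors the documented default, and the column choice is illustrative:

client.pipeline.add_feature_generator([{'name': 'Cross Column Correlation',
                                        'params': {"columns": ['accelx', 'accely'],
                                                   "sample_frequency": 1}}])
result, stats = client.pipeline.execute()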
- Max Column
Returns the index of the column with the max value for each segment.
- Parameters
input_data (DataFrame) – input data
columns (list of strings) – name of the sensor streams to use
- Returns
feature vector with index of max column.
- Return type
DataFrame
- Cross Column Mean Crossing Rate
Compute the crossing rate of column 2 over the mean of column 1
- Parameters
input_data (DataFrame) – input data
columns (list of strings) – name of the sensor streams to use (requires 2 inputs)
- Returns
feature vector mean crossing rate
- Return type
DataFrame
- Cross Column Mean Crossing with Offset
Compute the crossing rate of column 2 over the mean of column 1
- Parameters
input_data (DataFrame) – input data
columns (list of strings) – name of the sensor streams to use (requires 2 inputs)
- Returns
feature vector mean crossing rate
- Return type
DataFrame
- Two Column Mean Difference
Compute the mean difference between two columns.
- Parameters
input_data (DataFrame) – input data
columns (list of strings) – name of the sensor streams to use
- Returns
feature vector mean difference
- Return type
DataFrame
- Two Column Median Difference
Compute the median difference between two columns.
- Parameters
input_data (DataFrame) – input data
columns (list of strings) – name of the sensor streams to use
- Returns
feature vector median difference
- Return type
DataFrame
- Min Column
Returns the index of the column with the min value for each segment.
- Parameters
input_data (DataFrame) – input data
columns (list of strings) – name of the sensor streams to use
- Returns
feature vector with index of min value column.
- Return type
DataFrame
- Two Column Min Max Difference
Compute the min max difference between two columns. It finds the location of the min value for each of the two columns and, at the index of whichever one is larger, computes the difference between the two columns.
- Parameters
input_data (DataFrame) – input data
columns (list of strings) – name of the sensor streams to use
- Returns
feature vector difference of two columns
- Return type
DataFrame
- Two Column Peak To Peak Difference
Compute the max value for each column, then subtract the first column's value from the second column's.
- Parameters
input_data (DataFrame) – input data
columns (list of strings) – name of the sensor streams to use
- Returns
feature vector mean difference
- Return type
DataFrame
- Two Column Peak Location Difference
Computes the location of the maximum value for each column and then finds the difference between those two points.
- Parameters
input_data (DataFrame) – input data
columns (list of strings) – name of the sensor streams to use
- Returns
feature vector mean difference
- Return type
DataFrame