2.4.4 Segmenters
A segmenter takes input from the sensor transform/filter step and buffers the data until a segment is found.
- Windowing
- Parameters
input_data (_type_) – _description_
group_columns (_type_) – _description_
window_size (_type_) – _description_
delta (_type_) – _description_
train_delta (int, optional) – _description_. Defaults to 0.
return_segment_index (bool, optional) – _description_. Defaults to False.
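The Windowing parameters are not described in this section, so the following is only a minimal sketch of fixed-size sliding windows. It assumes that window_size is the number of samples per segment and that delta is the step between the starts of consecutive windows. Both meanings, and the helper name sliding_windows, are assumptions for illustration and not the library's implementation.

```python
def sliding_windows(n_samples, window_size, delta):
    """Return (start, end) index pairs for fixed-size windows; end is exclusive.

    Assumption: delta is the step between consecutive window starts, so
    delta == window_size gives back-to-back windows with no overlap.
    """
    if window_size <= 0 or delta <= 0:
        raise ValueError("window_size and delta must be positive")
    last_start = n_samples - window_size
    # range() is empty when the data is shorter than one window.
    return [(start, start + window_size) for start in range(0, last_start + 1, delta)]


print(sliding_windows(10, window_size=5, delta=5))  # [(0, 5), (5, 10)]
print(sliding_windows(10, window_size=4, delta=2))  # [(0, 4), (2, 6), (4, 8), (6, 10)]
```

Under these assumptions, a delta smaller than window_size produces overlapping windows, and trailing samples that do not fill a whole window are dropped.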
- Windowing Threshold Segmentation
This function transfers the input_data and group_column from the previous pipeline block. It is a single-pass threshold segmentation algorithm that transforms a window of the data stream, whose size is defined by threshold_space_width, into threshold space. The threshold space can be computed as standard deviation (std), sum, absolute sum, absolute average, or variance. The calculated value is then compared against vt_threshold with a comparison type of ">=" (the comparison parameter also accepts "<="). The first point at which the threshold space is detected above vt_threshold becomes the anchor point. The segment starts at the index of the anchor point minus a user-specified offset. The end of the segment is set immediately, window_size samples after its start.
- Parameters
column_of_interest (str) – name of the stream to use for segmentation
window_size (int) – number of samples in the window (default is 100)
offset (int) – The offset between the anchor point and the start of the segment. For an offset of 0, the window starts at the anchor point. (default is 0)
vt_threshold (int) – vt_threshold value which determines the segment.
threshold_space_width (int) – Size of the threshold buffer.
threshold_space (str) – Threshold transformation space. (std, sum, absolute sum, variance, absolute avg)
comparison (str) – the comparison between threshold space and vertical threshold (>=, <=)
return_segment_index (bool) – Set to True to see the segment indexes for start and end. (default is False)
- Returns
The segmented result will have a new column called SegmentID that contains the segment IDs.
- Return type
DataFrame
Example
>>> client.pipeline.reset()
>>> df = client.datasets.load_activity_raw_toy()
>>> df
out:
    Subject     Class  Rep  accelx  accely  accelz
 0      s01  Crawling    1     377     569    4019
 1      s01  Crawling    1     357     594    4051
 2      s01  Crawling    1     333     638    4049
 3      s01  Crawling    1     340     678    4053
 4      s01  Crawling    1     372     708    4051
 5      s01  Crawling    1     410     733    4028
 6      s01  Crawling    1     450     733    3988
 7      s01  Crawling    1     492     696    3947
 8      s01  Crawling    1     518     677    3943
 9      s01  Crawling    1     528     695    3988
10      s01  Crawling    1      -1    2558    4609
11      s01   Running    1     -44   -3971     843
12      s01   Running    1     -47   -3982     836
13      s01   Running    1     -43   -3973     832
14      s01   Running    1     -40   -3973     834
15      s01   Running    1     -48   -3978     844
16      s01   Running    1     -52   -3993     842
17      s01   Running    1     -64   -3984     821
18      s01   Running    1     -64   -3966     813
19      s01   Running    1     -66   -3971     826
20      s01   Running    1     -62   -3988     827
21      s01   Running    1     -57   -3984     843
>>> client.pipeline.set_input_data('test_data', df, force=True, data_columns=['accelx', 'accely', 'accelz'], group_columns=['Subject', 'Class', 'Rep'], label_column='Class')
>>> client.pipeline.add_transform("Windowing Threshold Segmentation", params={"column_of_interest": 'accelx', "window_size": 5, "offset": 0, "vt_threshold": 0.05, "threshold_space_width": 4, "threshold_space": 'std', "return_segment_index": False })
>>> results, stats = client.pipeline.execute()
>>> print(results)
out:
       Class  Rep  SegmentID  Subject  accelx  accely  accelz
 0  Crawling    1          0      s01     377     569    4019
 1  Crawling    1          0      s01     357     594    4051
 2  Crawling    1          0      s01     333     638    4049
 3  Crawling    1          0      s01     340     678    4053
 4  Crawling    1          0      s01     372     708    4051
 5  Crawling    1          1      s01     410     733    4028
 6  Crawling    1          1      s01     450     733    3988
 7  Crawling    1          1      s01     492     696    3947
 8  Crawling    1          1      s01     518     677    3943
 9  Crawling    1          1      s01     528     695    3988
10   Running    1          0      s01     -44   -3971     843
11   Running    1          0      s01     -47   -3982     836
12   Running    1          0      s01     -43   -3973     832
13   Running    1          0      s01     -40   -3973     834
14   Running    1          0      s01     -48   -3978     844
15   Running    1          1      s01     -52   -3993     842
16   Running    1          1      s01     -64   -3984     821
17   Running    1          1      s01     -64   -3966     813
18   Running    1          1      s01     -66   -3971     826
19   Running    1          1      s01     -62   -3988     827
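To make the single-pass logic described for Windowing Threshold Segmentation concrete, here is a minimal pure-Python sketch. It is not the library's implementation: the function name windowing_threshold_segments, the choice of the first sample of the threshold window as the anchor point, the use of population statistics for std and variance, and the rule that the search resumes at the end of each segment are all assumptions.

```python
import statistics

# Threshold-space transforms named in the parameter list (population statistics assumed).
THRESHOLD_SPACES = {
    "std": statistics.pstdev,
    "variance": statistics.pvariance,
    "sum": sum,
    "absolute sum": lambda w: sum(abs(x) for x in w),
    "absolute avg": lambda w: sum(abs(x) for x in w) / len(w),
}


def windowing_threshold_segments(samples, window_size=100, offset=0,
                                 vt_threshold=0.0, threshold_space_width=4,
                                 threshold_space="std", comparison=">="):
    """Return (start, end) index pairs for one group; end is exclusive."""
    transform = THRESHOLD_SPACES[threshold_space]
    segments = []
    i = 0
    while i + threshold_space_width <= len(samples):
        value = transform(samples[i:i + threshold_space_width])
        hit = value >= vt_threshold if comparison == ">=" else value <= vt_threshold
        if not hit:
            i += 1
            continue
        # Assumption: the anchor point is the first sample of the threshold window.
        start = max(i - offset, 0)
        end = start + window_size
        if end > len(samples):
            break  # not enough data left for a full window
        segments.append((start, end))
        i = end  # single pass: resume searching after the segment
    return segments


crawling_accelx = [377, 357, 333, 340, 372, 410, 450, 492, 518, 528, -1]
print(windowing_threshold_segments(crawling_accelx, window_size=5,
                                   vt_threshold=0.05, threshold_space_width=4))
# [(0, 5), (5, 10)]
```

With the Crawling accelx values from the example above, this sketch yields two five-sample segments and drops the trailing sample, which is consistent with the row counts shown in the example output.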
- Max Min Threshold Segmentation
This is a max min threshold segmentation algorithm that transforms a window of the data stream of size threshold_space_width into threshold space. This function transfers the input_data and group_column from the previous pipeline block.
The threshold space can be computed as standard deviation, sum, absolute sum, absolute average, or variance. The calculated value is then compared against the vt threshold with a comparison type of ">=" for the start of the segment and "<=" for the end of the segment. This algorithm is a two-pass detection: the first pass detects the start of the segment, and the second pass detects the end of the segment.
- Parameters
column_of_interest (str) – name of the stream to use for segmentation
max_segment_length (int) – number of samples in the window (default is 100)
min_segment_length (int) – The smallest segment allowed.
threshold_space_width (float) – Number of samples to check for being above the vt_threshold before forgetting the segment.
threshold_space (str) – Threshold transformation space. (std, sum, absolute sum, variance, absolute avg)
first_vt_threshold (int) – vt_threshold value to begin detecting a segment.
second_vt_threshold (int) – vt_threshold value to detect a segment's end.
return_segment_index (bool) – Set to True to see the segment indexes for start and end. (default is False)
- Returns
The segmented result will have a new column called SegmentID that contains the segment IDs.
- Return type
DataFrame
Example
>>> client.pipeline.reset()
>>> df = client.datasets.load_activity_raw_toy()
>>> df
out:
    Subject     Class  Rep  accelx  accely  accelz
 0      s01  Crawling    1     377     569    4019
 1      s01  Crawling    1     357     594    4051
 2      s01  Crawling    1     333     638    4049
 3      s01  Crawling    1     340     678    4053
 4      s01  Crawling    1     372     708    4051
 5      s01  Crawling    1     410     733    4028
 6      s01  Crawling    1     450     733    3988
 7      s01  Crawling    1     492     696    3947
 8      s01  Crawling    1     518     677    3943
 9      s01  Crawling    1     528     695    3988
10      s01  Crawling    1      -1    2558    4609
11      s01   Running    1     -44   -3971     843
12      s01   Running    1     -47   -3982     836
13      s01   Running    1     -43   -3973     832
14      s01   Running    1     -40   -3973     834
15      s01   Running    1     -48   -3978     844
16      s01   Running    1     -52   -3993     842
17      s01   Running    1     -64   -3984     821
18      s01   Running    1     -64   -3966     813
19      s01   Running    1     -66   -3971     826
20      s01   Running    1     -62   -3988     827
21      s01   Running    1     -57   -3984     843
>>> client.pipeline.set_input_data('test_data', df, force=True, data_columns=['accelx', 'accely', 'accelz'], group_columns=['Subject', 'Class', 'Rep'], label_column='Class')
>>> client.pipeline.add_transform("Max Min Threshold Segmentation", params={ "column_of_interest": 'accelx', "max_segment_length": 5, "min_segment_length": 5, "threshold_space_width": 3, "threshold_space": 'std', "first_vt_threshold": 0.05, "second_vt_threshold": 0.05, "return_segment_index": False})
>>> results, stats = client.pipeline.execute()
>>> print(results)
out:
      Class  Rep  SegmentID  Subject  accelx  accely  accelz
0  Crawling    1          0      s01     377     569    4019
1  Crawling    1          0      s01     357     594    4051
2  Crawling    1          0      s01     333     638    4049
3  Crawling    1          0      s01     340     678    4053
4  Crawling    1          0      s01     372     708    4051
5   Running    1          0      s01     -44   -3971     843
6   Running    1          0      s01     -47   -3982     836
7   Running    1          0      s01     -43   -3973     832
8   Running    1          0      s01     -40   -3973     834
9   Running    1          0      s01     -48   -3978     844
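The two-pass idea behind Max Min Threshold Segmentation can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's code: the name max_min_segments, the use of population standard deviation as the threshold space, the choice of the first sample of the threshold window as the start, and the rule that a segment is capped at max_segment_length when no end is detected are all assumptions. It is not expected to reproduce the example output above row for row.

```python
import statistics


def max_min_segments(samples, max_segment_length, min_segment_length,
                     threshold_space_width, first_vt_threshold,
                     second_vt_threshold, space=statistics.pstdev):
    """Return (start, end) index pairs for one group; end is exclusive."""
    n, width = len(samples), threshold_space_width
    segments = []
    i = 0
    while i + width <= n:
        # First pass: slide until the threshold-space value is >= the start threshold.
        if space(samples[i:i + width]) < first_vt_threshold:
            i += 1
            continue
        start = i
        # Second pass: once min_segment_length samples have passed, look for the
        # value falling to <= the end threshold; otherwise cap the segment
        # at max_segment_length (assumed behavior).
        limit = min(start + max_segment_length, n)
        end = limit
        for j in range(start + min_segment_length, limit):
            if j + width <= n and space(samples[j:j + width]) <= second_vt_threshold:
                end = j
                break
        if end - start >= min_segment_length:
            segments.append((start, end))
        i = max(end, start + 1)  # always move forward
    return segments


# Quiet, then a burst of activity, then quiet again.
signal = [0] * 5 + [5, -5, 5, -5, 5, -5] + [0] * 6
print(max_min_segments(signal, max_segment_length=10, min_segment_length=3,
                       threshold_space_width=3, first_vt_threshold=1.0,
                       second_vt_threshold=1.0))
# [(3, 11)]
```

The segment opens at index 3, the first position whose three-sample window contains activity, and closes at index 11, the first position after the minimum length whose window is flat again.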
- General Threshold Segmentation
This is a general threshold segmentation algorithm that transforms a window of the data stream of size threshold_space_width into threshold space. This function transfers the input_data and group_column from the previous pipeline block.
The threshold space can be computed as standard deviation, sum, absolute sum, absolute average, or variance. The calculated value is then compared against the vt threshold with a comparison type of "<=" or ">=", depending on whether "min" or "max" is given as the comparison type. This algorithm is a two-pass detection: the first pass detects the start of the segment, and the second pass detects the end of the segment. In this generalized algorithm, the two passes can be configured independently.
- Parameters
first_column_of_interest (str) – name of the stream to use for first threshold segmentation
second_column_of_interest (str) – name of the stream to use for second threshold segmentation
max_segment_length (int) – number of samples in the window (default is 200)
min_segment_length (int) – The smallest segment allowed. (default 100)
first_vt_threshold (int) – vt_threshold value to begin detecting a segment
first_threshold_space (str) – threshold space to detect segment against (std, variance, absolute avg, absolute sum, sum)
first_comparison (str) – detect threshold above(max) or below(min) the vt_threshold (max, min)
second_vt_threshold (int) – vt_threshold value to detect a segment's end.
second_threshold_space (str) – threshold space to detect segment end (std, variance, absolute avg, absolute sum, sum)
second_comparison (str) – detect threshold above(max) or below(min) the vt_threshold (max, min)
threshold_space_width (int) – the size of the buffer that the threshold value is calculated from.
return_segment_index (bool) – Set to True to see the segment indexes for start and end. (default is False)
- Returns
The segmented result will have a new column called SegmentID that contains the segment IDs.
- Return type
DataFrame
Example
>>> client.pipeline.reset()
>>> df = client.datasets.load_activity_raw_toy()
>>> df
out:
    Subject     Class  Rep  accelx  accely  accelz
 0      s01  Crawling    1     377     569    4019
 1      s01  Crawling    1     357     594    4051
 2      s01  Crawling    1     333     638    4049
 3      s01  Crawling    1     340     678    4053
 4      s01  Crawling    1     372     708    4051
 5      s01  Crawling    1     410     733    4028
 6      s01  Crawling    1     450     733    3988
 7      s01  Crawling    1     492     696    3947
 8      s01  Crawling    1     518     677    3943
 9      s01  Crawling    1     528     695    3988
10      s01  Crawling    1      -1    2558    4609
11      s01   Running    1     -44   -3971     843
12      s01   Running    1     -47   -3982     836
13      s01   Running    1     -43   -3973     832
14      s01   Running    1     -40   -3973     834
15      s01   Running    1     -48   -3978     844
16      s01   Running    1     -52   -3993     842
17      s01   Running    1     -64   -3984     821
18      s01   Running    1     -64   -3966     813
19      s01   Running    1     -66   -3971     826
20      s01   Running    1     -62   -3988     827
21      s01   Running    1     -57   -3984     843
>>> client.pipeline.set_input_data('test_data', df, force=True, data_columns=['accelx', 'accely', 'accelz'], group_columns=['Subject', 'Class', 'Rep'], label_column='Class')
>>> client.pipeline.add_transform("General Threshold Segmentation", params={"first_column_of_interest": 'accelx', "second_column_of_interest": 'accely', "max_segment_length": 5, "min_segment_length": 5, "threshold_space_width": 2, "first_vt_threshold": 0.05, "first_threshold_space": 'std', "first_comparison": 'max', "second_vt_threshold": 0.05, "second_threshold_space": 'std', "second_comparison": 'min', "return_segment_index": False})
>>> results, stats = client.pipeline.execute()
>>> print(results)
out:
      Class  Rep  SegmentID  Subject  accelx  accely  accelz
0  Crawling    1          0      s01     377     569    4019
1  Crawling    1          0      s01     357     594    4051
2  Crawling    1          0      s01     333     638    4049
3  Crawling    1          0      s01     340     678    4053
4  Crawling    1          0      s01     372     708    4051
5   Running    1          0      s01     -44   -3971     843
6   Running    1          0      s01     -47   -3982     836
7   Running    1          0      s01     -43   -3973     832
8   Running    1          0      s01     -40   -3973     834
9   Running    1          0      s01     -48   -3978     844
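The generalized variant differs from the max min variant mainly in that the start and end detectors can watch different streams, use different threshold spaces, and compare in either direction. The sketch below illustrates that independence. It is a hedged illustration only: the function and helper names, the mapping of "max" to ">=" and "min" to "<=", the population statistics, and the cap at max_segment_length are assumptions, not the library's implementation.

```python
import statistics

# Threshold-space transforms named in the parameter list (population statistics assumed).
SPACES = {
    "std": statistics.pstdev,
    "variance": statistics.pvariance,
    "sum": sum,
    "absolute sum": lambda w: sum(abs(x) for x in w),
    "absolute avg": lambda w: sum(abs(x) for x in w) / len(w),
}


def crosses(value, threshold, comparison):
    """Assumed mapping: 'max' means value >= threshold, 'min' means value <= threshold."""
    return value >= threshold if comparison == "max" else value <= threshold


def general_threshold_segments(first, second, *, max_segment_length,
                               min_segment_length, threshold_space_width,
                               first_vt_threshold, first_threshold_space="std",
                               first_comparison="max", second_vt_threshold,
                               second_threshold_space="std", second_comparison="min"):
    """Return (start, end) index pairs; `first` triggers starts, `second` triggers ends."""
    if len(first) != len(second):
        raise ValueError("both streams must have the same length")
    n, width = len(first), threshold_space_width
    start_space, end_space = SPACES[first_threshold_space], SPACES[second_threshold_space]
    segments = []
    i = 0
    while i + width <= n:
        # First pass: detect the start on the first stream.
        if not crosses(start_space(first[i:i + width]), first_vt_threshold, first_comparison):
            i += 1
            continue
        start = i
        # Second pass: detect the end on the second stream, within the length limits.
        limit = min(start + max_segment_length, n)
        end = limit
        for j in range(start + min_segment_length, limit):
            if j + width <= n and crosses(end_space(second[j:j + width]),
                                          second_vt_threshold, second_comparison):
                end = j
                break
        if end - start >= min_segment_length:
            segments.append((start, end))
        i = max(end, start + 1)
    return segments


trigger = [0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 0]
motion = [4, 4, 4, 4, 1, 7, 2, 8, 3, 3, 3, 3]
print(general_threshold_segments(
    trigger, motion, max_segment_length=10, min_segment_length=2,
    threshold_space_width=2, first_vt_threshold=5,
    first_threshold_space="absolute sum", first_comparison="max",
    second_vt_threshold=0.5, second_threshold_space="std", second_comparison="min"))
# [(2, 8)]
```

Here the spike in the hypothetical trigger stream opens the segment, and the segment closes once the hypothetical motion stream settles to a near-constant value.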
- Double Peak Key Segmentation
Considers a double peak as the key to begin segmentation and a single peak as the end.
- Parameters
input_data (DataFrame) – The input data.
axis_of_interest (str) – The stream to use for segmentation.
group_columns ([ str ]) – A list of column names to use for grouping.
max_segment_length (int) – This is the maximum number of samples a segment can contain. A segment length too large will not fit on the device.
return_segment_index (bool) – Set to True to see the segment indexes for start and end. (default is False)
Note: This should only be used for visualization, not pipeline building.
- Adaptive Windowing Segmentation
A sliding window technique with adaptive sizing. This will find the largest point after min_segment_length that is above the threshold. That point will be considered the end of the segment. If no points are above the threshold before reaching max_segment_length, then the segment will stop at max_segment_length.
- Parameters
input_data (DataFrame) – The input data.
columns_of_interest (str) – The stream to use for segmentation.
group_columns ([ str ]) – A list of column names to use for grouping.
max_segment_length (int) – This is the maximum number of samples a segment can contain.
min_segment_length (int) – This is the minimum number of samples a segment can contain.
threshold (int) – The threshold that must be met to start looking for the end of the segment early. If the threshold is not met, the segment ends at max_segment_length.
absolute_value (bool) – Takes the absolute value of the sensor data prior to doing the comparison.
return_segment_index (bool) – Set to True to see the segment indexes for start and end. (default is False)
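The adaptive sizing rule described for Adaptive Windowing Segmentation can be sketched as below. This is a minimal illustration, not the library's implementation: the function name adaptive_windows, the treatment of the peak index as an exclusive segment end (so the peak sample opens the next segment), and the decision to drop a trailing remainder shorter than max_segment_length are assumptions.

```python
def adaptive_windows(samples, max_segment_length, min_segment_length,
                     threshold, absolute_value=True):
    """Return (start, end) index pairs; end is exclusive."""
    if not 1 <= min_segment_length <= max_segment_length:
        raise ValueError("need 1 <= min_segment_length <= max_segment_length")
    values = [abs(x) for x in samples] if absolute_value else list(samples)
    segments = []
    start = 0
    while start + max_segment_length <= len(values):
        # Candidate end points lie between the minimum and maximum segment length.
        candidates = range(start + min_segment_length, start + max_segment_length)
        peak = max(candidates, key=values.__getitem__, default=None)
        if peak is not None and values[peak] > threshold:
            end = peak  # largest point above the threshold ends the segment early
        else:
            end = start + max_segment_length  # no point above threshold: full length
        segments.append((start, end))
        start = end
    return segments


signal = [0, 1, 0, 2, 9, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
print(adaptive_windows(signal, max_segment_length=6, min_segment_length=2, threshold=5))
# [(0, 4), (4, 10), (10, 16)]
```

The first segment ends early at the sample with value 9, while the remaining two segments run to the full max_segment_length because nothing in their search range exceeds the threshold.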