2.4.5 Augmentation
A large number of model parameters or an insufficient amount of data can cause overfitting. One way to address this problem is data augmentation, the process of generating additional data from existing data.
The ML Pipeline offers a set of augmentation methods for time series data. Each augmentation method is implemented as a member of an augmentation set and operates on segmented time series data individually. Users can create a subset of the original data set by defining the “target sensor”, “target label”, and “percentage” parameters. If “target sensor” is not defined, data from all sensors will be used; similarly, if “target label” is not defined, data from all labels will be used. A subset will be created for each method individually and concatenated with the original data.
See the following examples for more clarification.
Original Data:
Subject Class Rep accelx accely accelz SegmentID
0 s01 Crawling 1 377 569 4019 0
1 s01 Crawling 1 357 594 4051 0
2 s01 Crawling 1 333 638 4049 0
3 s01 Crawling 1 340 678 4053 0
4 s01 Crawling 1 372 708 4051 0
5 s01 Crawling 1 410 733 4028 1
6 s01 Crawling 1 450 733 3988 1
7 s01 Crawling 1 492 696 3947 1
8 s01 Crawling 1 518 677 3943 1
9 s01 Crawling 1 528 695 3988 1
10 s01 Running 1 -44 -3971 843 0
11 s01 Running 1 -47 -3982 836 0
12 s01 Running 1 -43 -3973 832 0
13 s01 Running 1 -40 -3973 834 0
14 s01 Running 1 -48 -3978 844 0
15 s01 Running 1 -52 -3993 842 1
16 s01 Running 1 -64 -3984 821 1
17 s01 Running 1 -64 -3966 813 1
18 s01 Running 1 -66 -3971 826 1
19 s01 Running 1 -62 -3988 827 1
Example 1: Creating augmented data using the Add Noise method.
>>> client.pipeline.reset()
>>> df = client.datasets.load_activity_raw_toy()
>>> client.upload_dataframe("toy_data.csv", df, force=True)
>>> client.pipeline.reset()
>>> client.pipeline.set_input_data('toy_data',
data_columns=['accelx', 'accely', 'accelz'],
group_columns=['Subject', 'Class', 'Rep'],
label_column='Class')
>>> client.pipeline.add_transform('Windowing', params={'window_size' : 5, 'delta': 5 })
>>> client.pipeline.add_augmentation(
[
{"name": "Add Noise",
"params": {"scale": [0.3, 0.5],
"target_labels": ['Crawling'],
"target_sensors": ['accelx', 'accelz']
}},
],
params={"percent": 0.5}
)
>>> results, stats = client.pipeline.execute()
The subset of the data will be created by grouping the original data. The concatenation of the group columns ([‘Subject’, ‘Class’, ‘Rep’]) and SegmentID is used as the grouping key. Only groups with the targeted labels (‘Crawling’) will be selected.
Subset-1:
Subject Class Rep accelx accely accelz SegmentID
0 s01 Crawling 1 377 569 4019 0
1 s01 Crawling 1 357 594 4051 0
2 s01 Crawling 1 333 638 4049 0
3 s01 Crawling 1 340 678 4053 0
4 s01 Crawling 1 372 708 4051 0
Subset-2:
Subject Class Rep accelx accely accelz SegmentID
1 s01 Crawling 1 410 733 4028 1
2 s01 Crawling 1 450 733 3988 1
3 s01 Crawling 1 492 696 3947 1
4 s01 Crawling 1 518 677 3943 1
5 s01 Crawling 1 528 695 3988 1
Since “percent” is 0.5, half of the subsets will be selected randomly. Let’s say Subset-2 is selected. The augmentation method will be applied only to the time series of the targeted sensors ([‘accelx’, ‘accelz’]), once for each of the defined scale values.
Augmented data produced by the Add Noise method with a scale of 0.3:
Subject Class Rep accelx accely accelz SegmentID
1 s01 Crawling 1 351 733 4016 1__add_noise_scale_0.3
2 s01 Crawling 1 474 733 3978 1__add_noise_scale_0.3
3 s01 Crawling 1 441 696 3943 1__add_noise_scale_0.3
4 s01 Crawling 1 504 677 3969 1__add_noise_scale_0.3
5 s01 Crawling 1 508 695 3975 1__add_noise_scale_0.3
Augmented data produced by the Add Noise method with a scale of 0.5:
Subject Class Rep accelx accely accelz SegmentID
1 s01 Crawling 1 486 733 4067 1__add_noise_scale_0.5
2 s01 Crawling 1 395 733 3955 1__add_noise_scale_0.5
3 s01 Crawling 1 479 696 3873 1__add_noise_scale_0.5
4 s01 Crawling 1 481 677 3907 1__add_noise_scale_0.5
5 s01 Crawling 1 482 695 4001 1__add_noise_scale_0.5
At the end of the augmentation function, the original data and the augmented data will be concatenated.
>>> print(results)
Out:
Subject Class Rep accelx accely accelz SegmentID
0 s01 Crawling 1 377 569 4019 0
1 s01 Crawling 1 357 594 4051 0
2 s01 Crawling 1 333 638 4049 0
3 s01 Crawling 1 340 678 4053 0
4 s01 Crawling 1 372 708 4051 0
5 s01 Crawling 1 410 733 4028 1
6 s01 Crawling 1 450 733 3988 1
7 s01 Crawling 1 492 696 3947 1
8 s01 Crawling 1 518 677 3943 1
9 s01 Crawling 1 528 695 3988 1
10 s01 Running 1 -44 -3971 843 0
11 s01 Running 1 -47 -3982 836 0
12 s01 Running 1 -43 -3973 832 0
13 s01 Running 1 -40 -3973 834 0
14 s01 Running 1 -48 -3978 844 0
15 s01 Running 1 -52 -3993 842 1
16 s01 Running 1 -64 -3984 821 1
17 s01 Running 1 -64 -3966 813 1
18 s01 Running 1 -66 -3971 826 1
19 s01 Running 1 -62 -3988 827 1
20 s01 Crawling 1 351 733 4016 1__add_noise_scale_0.3
21 s01 Crawling 1 474 733 3978 1__add_noise_scale_0.3
22 s01 Crawling 1 441 696 3943 1__add_noise_scale_0.3
23 s01 Crawling 1 504 677 3969 1__add_noise_scale_0.3
24 s01 Crawling 1 508 695 3975 1__add_noise_scale_0.3
25 s01 Crawling 1 486 733 4067 1__add_noise_scale_0.5
26 s01 Crawling 1 395 733 3955 1__add_noise_scale_0.5
27 s01 Crawling 1 479 696 3873 1__add_noise_scale_0.5
28 s01 Crawling 1 481 677 3907 1__add_noise_scale_0.5
29 s01 Crawling 1 482 695 4001 1__add_noise_scale_0.5
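The overall mechanism of Example 1 (group by segment, sample a fraction, augment the targeted sensors, concatenate) can be summarized with a short, purely illustrative pandas sketch. The function below, its name, the augment callback, and the hard-coded "__augmented" suffix are assumptions for illustration only, not the pipeline’s internal API:

    import random
    import pandas as pd

    def augment_dataset(df, group_columns, label_column, augment,
                        target_labels=None, target_sensors=None,
                        percent=0.5, seed=0):
        # Keep only segments whose label is targeted (all labels if None).
        pool = df if target_labels is None else df[df[label_column].isin(target_labels)]
        # One group per segment: group columns plus SegmentID form the grouping key.
        segments = [g for _, g in pool.groupby(group_columns + ["SegmentID"])]
        random.seed(seed)
        chosen = random.sample(segments, int(len(segments) * percent))
        augmented = []
        for seg in chosen:
            new_seg = seg.copy()
            sensors = target_sensors or [c for c in seg.columns
                                         if c not in group_columns + ["SegmentID"]]
            for s in sensors:
                new_seg[s] = augment(new_seg[s].to_numpy())  # per-sensor augmentation
            # The real pipeline appends the method name and parameters here.
            new_seg["SegmentID"] = new_seg["SegmentID"].astype(str) + "__augmented"
            augmented.append(new_seg)
        # Finally, the augmented segments are concatenated with the original data.
        return pd.concat([df] + augmented, ignore_index=True)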
Example 2: Creating augmented data using the Add Noise and Add Quantize methods.
>>> client.pipeline.reset()
>>> df = client.datasets.load_activity_raw_toy()
>>> client.upload_dataframe("toy_data.csv", df, force=True)
>>> client.pipeline.reset()
>>> client.pipeline.set_input_data('toy_data',
data_columns=['accelx', 'accely', 'accelz'],
group_columns=['Subject', 'Class', 'Rep'],
label_column='Class')
>>> client.pipeline.add_transform('Windowing', params={'window_size' : 5, 'delta': 5 })
>>> client.pipeline.add_augmentation(
[
{"name": "Add Noise",
"params": {"scale": [0.3, 0.5],
"target_labels": ['Crawling'],
"target_sensors": ['accelx', 'accelz']
}},
{"name": "Add Quantize",
"params": {"n_levels": [10] }},
],
params={"percent": 0.5}
)
>>> results, stats = client.pipeline.execute()
For each augmentation method, subsets are created individually. In the “Add Noise” method, target sensors and target labels are defined, so a subset of the original data is created as explained in Example 1. In the “Add Quantize” method, however, target sensors and target labels are not defined; therefore, data from all sensors and labels will be used to create the groups, and only half of the groups will be used for each label.
After the augmented data sets are created by these two methods individually, they will be concatenated with the original data set.
>>> print(results)
Out:
Subject Class Rep accelx accely accelz SegmentID
0 s01 Crawling 1 377 569 4019 0
1 s01 Crawling 1 357 594 4051 0
2 s01 Crawling 1 333 638 4049 0
3 s01 Crawling 1 340 678 4053 0
4 s01 Crawling 1 372 708 4051 0
5 s01 Crawling 1 410 733 4028 1
6 s01 Crawling 1 450 733 3988 1
7 s01 Crawling 1 492 696 3947 1
8 s01 Crawling 1 518 677 3943 1
9 s01 Crawling 1 528 695 3988 1
10 s01 Running 1 -44 -3971 843 0
11 s01 Running 1 -47 -3982 836 0
12 s01 Running 1 -43 -3973 832 0
13 s01 Running 1 -40 -3973 834 0
14 s01 Running 1 -48 -3978 844 0
15 s01 Running 1 -52 -3993 842 1
16 s01 Running 1 -64 -3984 821 1
17 s01 Running 1 -64 -3966 813 1
18 s01 Running 1 -66 -3971 826 1
19 s01 Running 1 -62 -3988 827 1
20 s01 Crawling 1 396 733 3981 1__add_noise_scale_0.3
21 s01 Crawling 1 447 733 3958 1__add_noise_scale_0.3
22 s01 Crawling 1 464 696 3924 1__add_noise_scale_0.3
23 s01 Crawling 1 486 677 3979 1__add_noise_scale_0.3
24 s01 Crawling 1 528 695 3961 1__add_noise_scale_0.3
25 s01 Crawling 1 378 733 4152 1__add_noise_scale_0.5
26 s01 Crawling 1 442 733 4036 1__add_noise_scale_0.5
27 s01 Crawling 1 479 696 3927 1__add_noise_scale_0.5
28 s01 Crawling 1 482 677 4060 1__add_noise_scale_0.5
29 s01 Crawling 1 575 695 3944 1__add_noise_scale_0.5
30 s01 Crawling 1 374 575 4020 0__add_quantize_n_levels_10
31 s01 Crawling 1 357 603 4051 0__add_quantize_n_levels_10
32 s01 Crawling 1 335 645 4051 0__add_quantize_n_levels_10
33 s01 Crawling 1 344 687 4051 0__add_quantize_n_levels_10
34 s01 Crawling 1 374 701 4051 0__add_quantize_n_levels_10
35 s01 Running 1 -52 -3991 840 1__add_quantize_n_levels_10
36 s01 Running 1 -63 -3983 823 1__add_quantize_n_levels_10
37 s01 Running 1 -63 -3967 814 1__add_quantize_n_levels_10
38 s01 Running 1 -65 -3970 826 1__add_quantize_n_levels_10
39 s01 Running 1 -61 -3986 828 1__add_quantize_n_levels_10
The details of each augmentation method are explained below.
library.core_functions.augmentation.add_convolve(input_data, label_column, fraction_of_segment, kernel, target_labels, percent, group_columns, target_sensors, **kwargs)
- Add Convolve:
Convolve (smoothing) time series with a kernel window for each segment.
- Parameters
input_data (DataFrame) – Input data
fraction_of_segment (list) – The size of the kernel window, as a fraction of the segment length.
kernel (str) – The type of kernel window used for the convolution (e.g. 'hann').
target_labels (list) – List of labels that the convolution will be applied to.
percent (float) – Percentage of the data set that the convolution will be applied to.
target_sensors (list) – List of sensors that the convolution will be applied to.
- Returns
DataFrame which includes the convolved data set.
- Return type
DataFrame
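The core operation is a normalized window convolution. The following is a minimal NumPy/SciPy sketch of the idea, assuming the kernel size is fraction_of_segment times the segment length (as suggested by the "size" tag in the augmented SegmentID suffix); it is not the library's exact implementation:

    import numpy as np
    from scipy.signal import get_window

    def convolve_segment(segment, fraction_of_segment, kernel="hann"):
        size = max(1, int(len(segment) * fraction_of_segment))
        window = get_window(kernel, size)
        window = window / window.sum()  # normalize so the signal scale is preserved
        return np.convolve(segment, window, mode="same")  # smoothed, same length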
Example
>>> client.pipeline.reset()
>>> df = client.datasets.load_activity_raw_toy()
>>> df = pd.concat([df,df]).reset_index(drop=True)
>>> client.upload_dataframe("toy_data.csv", df, force=True)
>>> client.pipeline.reset()
>>> client.pipeline.set_input_data('toy_data',
        data_columns=['accelx', 'accely', 'accelz'],
        group_columns=['Subject', 'Class', 'Rep'],
        label_column='Class')
>>> client.pipeline.add_transform('Windowing', params={'window_size' : 5, 'delta': 5})
>>> client.pipeline.add_augmentation(
        [
            {"name": "Add Convolve",
             "params": {"fraction_of_segment": [0.5, 0.2],
                        "kernel": "hann",
                        "target_labels": ['Crawling'],
                        "target_sensors": ['accelx']
            }},
        ],
        params={"percent": 0.8}
    )
>>> results, stats = client.pipeline.execute()
>>> print(results)
Out:
Subject Class Rep accelx accely accelz SegmentID
0 s01 Crawling 1 377 569 4019 0
1 s01 Crawling 1 357 594 4051 0
2 s01 Crawling 1 333 638 4049 0
3 s01 Crawling 1 340 678 4053 0
4 s01 Crawling 1 372 708 4051 0
5 s01 Crawling 1 410 733 4028 0
6 s01 Crawling 1 450 733 3988 0
7 s01 Crawling 1 492 696 3947 0
8 s01 Crawling 1 518 677 3943 0
9 s01 Crawling 1 528 695 3988 0
10 s01 Crawling 1 -1 2558 4609 0
11 s01 Crawling 1 377 569 4019 0
12 s01 Crawling 1 357 594 4051 0
13 s01 Crawling 1 333 638 4049 0
14 s01 Crawling 1 340 678 4053 0
15 s01 Crawling 1 372 708 4051 0
16 s01 Crawling 1 410 733 4028 0
17 s01 Crawling 1 450 733 3988 0
18 s01 Crawling 1 492 696 3947 0
19 s01 Crawling 1 518 677 3943 0
20 s01 Running 1 -44 -3971 843 0
21 s01 Running 1 -47 -3982 836 0
.. ... ... ... ... ... ... ...
60 s01 Crawling 1 366 569 4019 0__add_convolve_size_0.2_window_hann
61 s01 Crawling 1 344 594 4051 0__add_convolve_size_0.2_window_hann
62 s01 Crawling 1 335 638 4049 0__add_convolve_size_0.2_window_hann
63 s01 Crawling 1 355 678 4053 0__add_convolve_size_0.2_window_hann
64 s01 Crawling 1 390 708 4051 0__add_convolve_size_0.2_window_hann
65 s01 Crawling 1 429 733 4028 0__add_convolve_size_0.2_window_hann
66 s01 Crawling 1 470 733 3988 0__add_convolve_size_0.2_window_hann
67 s01 Crawling 1 504 696 3947 0__add_convolve_size_0.2_window_hann
68 s01 Crawling 1 522 677 3943 0__add_convolve_size_0.2_window_hann
69 s01 Crawling 1 263 695 3988 0__add_convolve_size_0.2_window_hann
70 s01 Crawling 1 187 2558 4609 0__add_convolve_size_0.2_window_hann
71 s01 Crawling 1 366 569 4019 0__add_convolve_size_0.2_window_hann
72 s01 Crawling 1 344 594 4051 0__add_convolve_size_0.2_window_hann
73 s01 Crawling 1 335 638 4049 0__add_convolve_size_0.2_window_hann
74 s01 Crawling 1 355 678 4053 0__add_convolve_size_0.2_window_hann
75 s01 Crawling 1 390 708 4051 0__add_convolve_size_0.2_window_hann
76 s01 Crawling 1 429 733 4028 0__add_convolve_size_0.2_window_hann
77 s01 Crawling 1 470 733 3988 0__add_convolve_size_0.2_window_hann
78 s01 Crawling 1 504 696 3947 0__add_convolve_size_0.2_window_hann
79 s01 Crawling 1 517 677 3943 0__add_convolve_size_0.2_window_hann
library.core_functions.augmentation.add_drift(input_data, label_column, max_drift, n_drift_points, target_labels, percent, group_columns, target_sensors, **kwargs)
- Add Drift:
The augmenter drifts the values of a time series away from the original values, randomly and smoothly. The extent of drifting is controlled by the maximal drift and the number of drift points.
- Parameters
input_data (DataFrame) – Input data
max_drift (list) – The maximal amount of drift added to a time series. The maximal drift added to a time series (each sensor, each segment) is sampled randomly from this interval.
n_drift_points (int) – The number of time points at which a new drifting trend is defined in a series.
target_labels (list) – List of labels that the drift will be applied to.
percent (float) – Percentage of the data set that the drift will be applied to.
target_sensors (list) – List of sensors that the drift will be applied to.
- Returns
DataFrame which includes the drift-added data set.
- Return type
DataFrame
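A minimal sketch of the idea: sample drift values at a few anchor points, interpolate a smooth trend between them, and add it to the series. Scaling the drift by the segment amplitude is an assumption here, not documented behavior:

    import numpy as np

    def drift_segment(segment, max_drift, n_drift_points, rng=np.random.default_rng()):
        seg = np.asarray(segment, dtype=float)
        n = len(seg)
        # Random drift values at anchor points; zero drift at the start.
        anchors = np.linspace(0, n - 1, n_drift_points + 2)
        values = np.concatenate(([0.0],
                                 rng.uniform(-max_drift, max_drift, n_drift_points + 1)))
        trend = np.interp(np.arange(n), anchors, values)  # smooth drifting trend
        return seg + trend * np.abs(seg).max()  # drift scaled to signal amplitude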
Example
>>> client.pipeline.reset()
>>> df = client.datasets.load_activity_raw_toy()
>>> client.upload_dataframe("toy_data.csv", df, force=True)
>>> client.pipeline.reset()
>>> client.pipeline.set_input_data('toy_data',
        data_columns=['accelx', 'accely', 'accelz'],
        group_columns=['Subject', 'Class', 'Rep'],
        label_column='Class')
>>> client.pipeline.add_transform('Windowing', params={'window_size' : 5, 'delta': 5})
>>> client.pipeline.add_augmentation(
        [
            {"name": "Add Drift",
             "params": {"max_drift": [0.1, 1.5],
                        "n_drift_points": 1,
                        "target_labels": ['Crawling'],
                        "target_sensors": ['accelx']
            }},
        ],
        params={"percent": 0.4}
    )
>>> results, stats = client.pipeline.execute()
>>> print(results)
Out:
Subject Class Rep accelx accely accelz SegmentID
0 s01 Crawling 1 377 569 4019 0
1 s01 Crawling 1 357 594 4051 0
2 s01 Crawling 1 333 638 4049 0
3 s01 Crawling 1 340 678 4053 0
4 s01 Crawling 1 372 708 4051 0
5 s01 Crawling 1 410 733 4028 1
6 s01 Crawling 1 450 733 3988 1
7 s01 Crawling 1 492 696 3947 1
8 s01 Crawling 1 518 677 3943 1
9 s01 Crawling 1 528 695 3988 1
10 s01 Running 1 -44 -3971 843 0
11 s01 Running 1 -47 -3982 836 0
12 s01 Running 1 -43 -3973 832 0
13 s01 Running 1 -40 -3973 834 0
14 s01 Running 1 -48 -3978 844 0
15 s01 Running 1 -52 -3993 842 1
16 s01 Running 1 -64 -3984 821 1
17 s01 Running 1 -64 -3966 813 1
18 s01 Running 1 -66 -3971 826 1
19 s01 Running 1 -62 -3988 827 1
20 s01 Crawling 1 410 733 4028 1__add_drift_max_drift_0.1_n_drift_points_1
21 s01 Crawling 1 449 733 3988 1__add_drift_max_drift_0.1_n_drift_points_1
22 s01 Crawling 1 488 696 3947 1__add_drift_max_drift_0.1_n_drift_points_1
23 s01 Crawling 1 511 677 3943 1__add_drift_max_drift_0.1_n_drift_points_1
24 s01 Crawling 1 516 695 3988 1__add_drift_max_drift_0.1_n_drift_points_1
25 s01 Crawling 1 410 733 4028 1__add_drift_max_drift_1.5_n_drift_points_1
26 s01 Crawling 1 436 733 3988 1__add_drift_max_drift_1.5_n_drift_points_1
27 s01 Crawling 1 445 696 3947 1__add_drift_max_drift_1.5_n_drift_points_1
28 s01 Crawling 1 416 677 3943 1__add_drift_max_drift_1.5_n_drift_points_1
29 s01 Crawling 1 351 695 3988 1__add_drift_max_drift_1.5_n_drift_points_1
library.core_functions.augmentation.add_dropout(input_data, label_column, fraction_of_segment, p, target_labels, percent, group_columns, target_sensors, **kwargs)
- Add Dropout:
Drop out the values of some random time points in a time series without changing its length.
- Parameters
input_data (DataFrame) – Input data
fraction_of_segment (list) – The size of a dropout unit, as a fraction of the segment length.
p (float) – Probability of the value of a time point being dropped out.
target_labels (list) – List of labels that the dropout will be applied to.
percent (float) – Percentage of the data set that the dropout will be applied to.
target_sensors (list) – List of sensors that the dropout will be applied to.
- Returns
DataFrame which includes the dropout-applied data set.
- Return type
DataFrame
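The repeated values in the example output suggest that dropped points are forward-filled with the previous value. A minimal sketch under that assumption (single-point dropout; the fraction_of_segment unit size is omitted for brevity):

    import numpy as np

    def dropout_segment(segment, p, rng=np.random.default_rng()):
        out = np.asarray(segment, dtype=float).copy()
        drop = rng.random(len(out)) < p  # each time point dropped with probability p
        drop[0] = False                  # keep the first point as a fill anchor
        for i in np.flatnonzero(drop):
            out[i] = out[i - 1]          # forward-fill keeps the length unchanged
        return out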
Example
>>> client.pipeline.reset()
>>> df = client.datasets.load_activity_raw_toy()
>>> df = pd.concat([df,df]).reset_index(drop=True)
>>> client.upload_dataframe("toy_data.csv", df, force=True)
>>> client.pipeline.reset()
>>> client.pipeline.set_input_data('toy_data',
        data_columns=['accelx', 'accely', 'accelz'],
        group_columns=['Subject', 'Class', 'Rep'],
        label_column='Class')
>>> client.pipeline.add_transform('Windowing', params={'window_size' : 10, 'delta': 10})
>>> client.pipeline.add_augmentation(
        [
            {"name": "Add Dropout",
             "params": {"fraction_of_segment": [0.1, 0.5],
                        "p": 0.5,
                        "target_labels": ['Crawling'],
                        "target_sensors": ['accelx']
            }},
        ],
        params={"percent": 0.8}
    )
>>> results, stats = client.pipeline.execute()
>>> print(results)
Out:
Subject Class Rep accelx accely accelz SegmentID
0 s01 Crawling 1 377 569 4019 0
1 s01 Crawling 1 357 594 4051 0
2 s01 Crawling 1 333 638 4049 0
3 s01 Crawling 1 340 678 4053 0
4 s01 Crawling 1 372 708 4051 0
5 s01 Crawling 1 410 733 4028 0
6 s01 Crawling 1 450 733 3988 0
7 s01 Crawling 1 492 696 3947 0
8 s01 Crawling 1 518 677 3943 0
9 s01 Crawling 1 528 695 3988 0
10 s01 Running 1 -44 -3971 843 0
11 s01 Running 1 -47 -3982 836 0
12 s01 Running 1 -43 -3973 832 0
13 s01 Running 1 -40 -3973 834 0
14 s01 Running 1 -48 -3978 844 0
15 s01 Running 1 -52 -3993 842 0
16 s01 Running 1 -64 -3984 821 0
17 s01 Running 1 -64 -3966 813 0
18 s01 Running 1 -66 -3971 826 0__add_dropout_size_0.1_p_0.5
19 s01 Running 1 -62 -3988 827 0__add_dropout_size_0.1_p_0.5
20 s01 Crawling 1 377 569 4019 0__add_dropout_size_0.1_p_0.5
21 s01 Crawling 1 357 594 4051 0__add_dropout_size_0.1_p_0.5
22 s01 Crawling 1 357 638 4049 0__add_dropout_size_0.1_p_0.5
23 s01 Crawling 1 333 678 4053 0__add_dropout_size_0.1_p_0.5
24 s01 Crawling 1 340 708 4051 0__add_dropout_size_0.1_p_0.5
25 s01 Crawling 1 410 733 4028 0__add_dropout_size_0.1_p_0.5
26 s01 Crawling 1 410 733 3988 0__add_dropout_size_0.1_p_0.5
27 s01 Crawling 1 450 696 3947 0__add_dropout_size_0.1_p_0.5
28 s01 Crawling 1 492 677 3943 0__add_dropout_size_0.1_p_0.5
29 s01 Crawling 1 528 695 3988 0__add_dropout_size_0.5_p_0.5
30 s01 Crawling 1 377 569 4019 0__add_dropout_size_0.5_p_0.5
31 s01 Crawling 1 357 594 4051 0__add_dropout_size_0.5_p_0.5
32 s01 Crawling 1 333 638 4049 0__add_dropout_size_0.5_p_0.5
33 s01 Crawling 1 340 678 4053 0__add_dropout_size_0.5_p_0.5
34 s01 Crawling 1 372 708 4051 0__add_dropout_size_0.5_p_0.5
35 s01 Crawling 1 410 733 4028 0__add_dropout_size_0.5_p_0.5
36 s01 Crawling 1 450 733 3988 0__add_dropout_size_0.5_p_0.5
37 s01 Crawling 1 492 696 3947 0__add_dropout_size_0.5_p_0.5
38 s01 Crawling 1 518 677 3943 0__add_dropout_size_0.5_p_0.5
39 s01 Crawling 1 528 695 3988 0__add_dropout_size_0.5_p_0.5
library.core_functions.augmentation.add_noise(input_data, label_column, group_columns, target_sensors, target_labels, percent=0.1, scale=None)
- Add Noise:
Add random noise to time series. The noise added to every time point of a time series is independent and identically distributed.
- Parameters
input_data (DataFrame) – Input data
scale (list) – List of standard deviations of the random noise to be added.
target_labels (list) – List of labels that the noise will be applied to.
percent (float) – Percentage of the data set that the noise will be applied to.
target_sensors (list) – List of sensors that the noise will be applied to.
- Returns
DataFrame which includes the noise-added data set.
- Return type
DataFrame
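A minimal sketch of the operation: draw one independent Gaussian sample per time point and add it to the series. Interpreting scale relative to the segment's standard deviation is an assumption; the library may define it differently:

    import numpy as np

    def noise_segment(segment, scale, rng=np.random.default_rng()):
        seg = np.asarray(segment, dtype=float)
        sigma = scale * seg.std()  # assumed: scale is relative to the segment's std
        return seg + rng.normal(0.0, sigma, size=len(seg))  # i.i.d. noise per point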
Example
>>> client.pipeline.reset()
>>> df = client.datasets.load_activity_raw_toy()
>>> client.upload_dataframe("toy_data.csv", df, force=True)
>>> client.pipeline.reset()
>>> client.pipeline.set_input_data('toy_data',
        data_columns=['accelx', 'accely', 'accelz'],
        group_columns=['Subject', 'Class', 'Rep'],
        label_column='Class')
>>> client.pipeline.add_transform('Windowing', params={'window_size' : 5, 'delta': 5})
>>> client.pipeline.add_augmentation(
        [
            {"name": "Add Noise",
             "params": {"scale": [0.1, 0.2],
                        "target_labels": ['Crawling'],
                        "target_sensors": ['accelx']
            }},
        ],
        params={"percent": 0.4}
    )
>>> results, stats = client.pipeline.execute()
>>> print(results)
Out:
Subject Class Rep accelx accely accelz SegmentID
0 s01 Crawling 1 377 569 4019 0
1 s01 Crawling 1 357 594 4051 0
2 s01 Crawling 1 333 638 4049 0
3 s01 Crawling 1 340 678 4053 0
4 s01 Crawling 1 372 708 4051 0
5 s01 Crawling 1 410 733 4028 1
6 s01 Crawling 1 450 733 3988 1
7 s01 Crawling 1 492 696 3947 1
8 s01 Crawling 1 518 677 3943 1
9 s01 Crawling 1 528 695 3988 1
10 s01 Running 1 -44 -3971 843 0
11 s01 Running 1 -47 -3982 836 0
12 s01 Running 1 -43 -3973 832 0
13 s01 Running 1 -40 -3973 834 0
14 s01 Running 1 -48 -3978 844 0
15 s01 Running 1 -52 -3993 842 1
16 s01 Running 1 -64 -3984 821 1
17 s01 Running 1 -64 -3966 813 1
18 s01 Running 1 -66 -3971 826 1
19 s01 Running 1 -62 -3988 827 1
20 s01 Crawling 1 401 733 4028 1__add_noise_scale_0.1
21 s01 Crawling 1 447 733 3988 1__add_noise_scale_0.1
22 s01 Crawling 1 496 696 3947 1__add_noise_scale_0.1
23 s01 Crawling 1 521 677 3943 1__add_noise_scale_0.1
24 s01 Crawling 1 519 695 3988 1__add_noise_scale_0.1
25 s01 Crawling 1 378 733 4028 1__add_noise_scale_0.2
26 s01 Crawling 1 443 733 3988 1__add_noise_scale_0.2
27 s01 Crawling 1 507 696 3947 1__add_noise_scale_0.2
28 s01 Crawling 1 524 677 3943 1__add_noise_scale_0.2
29 s01 Crawling 1 529 695 3988 1__add_noise_scale_0.2
library.core_functions.augmentation.add_pool(input_data, label_column, fraction_of_segment, target_labels, percent, group_columns, target_sensors, **kwargs)
- Add Pool:
Reduce the temporal resolution without changing the length.
- Parameters
input_data (DataFrame) – Input data
fraction_of_segment (list) – The size of a pooling block, as a fraction of the segment length.
target_labels (list) – List of labels that the pooling will be applied to.
percent (float) – Percentage of the data set that the pooling will be applied to.
target_sensors (list) – List of sensors that the pooling will be applied to.
- Returns
DataFrame which includes the reduced-resolution data set.
- Return type
DataFrame
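The example output below shows runs of identical values equal to the block mean, so pooling appears to replace each block of size fraction_of_segment times the segment length with its mean. A minimal sketch of that idea:

    import numpy as np

    def pool_segment(segment, fraction_of_segment):
        seg = np.asarray(segment, dtype=float)
        size = max(1, int(len(seg) * fraction_of_segment))  # pooling block size
        out = seg.copy()
        for start in range(0, len(seg), size):
            # Each block collapses to its mean: resolution drops, length is kept.
            out[start:start + size] = seg[start:start + size].mean()
        return out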
Example
>>> client.pipeline.reset()
>>> df = client.datasets.load_activity_raw_toy()
>>> df = pd.concat([df,df]).reset_index(drop=True)
>>> client.upload_dataframe("toy_data.csv", df, force=True)
>>> client.pipeline.reset()
>>> client.pipeline.set_input_data('toy_data',
        data_columns=['accelx', 'accely', 'accelz'],
        group_columns=['Subject', 'Class', 'Rep'],
        label_column='Class')
>>> client.pipeline.add_transform('Windowing', params={'window_size' : 10, 'delta': 10})
>>> client.pipeline.add_augmentation(
        [
            {"name": "Add Pool",
             "params": {"fraction_of_segment": [0.3, 0.5],
                        "target_labels": ['Crawling'],
                        "target_sensors": ['accelx']
            }},
        ],
        params={"percent": 0.8}
    )
>>> results, stats = client.pipeline.execute()
>>> print(results)
Out:
Subject Class Rep accelx accely accelz SegmentID
0 s01 Crawling 1 377 569 4019 0
1 s01 Crawling 1 357 594 4051 0
2 s01 Crawling 1 333 638 4049 0
3 s01 Crawling 1 340 678 4053 0
4 s01 Crawling 1 372 708 4051 0
5 s01 Crawling 1 410 733 4028 0
6 s01 Crawling 1 450 733 3988 0
7 s01 Crawling 1 492 696 3947 0
8 s01 Crawling 1 518 677 3943 0
9 s01 Crawling 1 528 695 3988 0
10 s01 Running 1 -44 -3971 843 0
11 s01 Running 1 -47 -3982 836 0
12 s01 Running 1 -43 -3973 832 0
13 s01 Running 1 -40 -3973 834 0
14 s01 Running 1 -48 -3978 844 0
15 s01 Running 1 -52 -3993 842 0
16 s01 Running 1 -64 -3984 821 0
17 s01 Running 1 -64 -3966 813 0
18 s01 Running 1 -66 -3971 826 0
19 s01 Running 1 -62 -3988 827 0
20 s01 Crawling 1 355 569 4019 0__add_pool_size_0.3
21 s01 Crawling 1 355 594 4051 0__add_pool_size_0.3
22 s01 Crawling 1 355 638 4049 0__add_pool_size_0.3
23 s01 Crawling 1 374 678 4053 0__add_pool_size_0.3
24 s01 Crawling 1 374 708 4051 0__add_pool_size_0.3
25 s01 Crawling 1 374 733 4028 0__add_pool_size_0.3
26 s01 Crawling 1 486 733 3988 0__add_pool_size_0.3
27 s01 Crawling 1 486 696 3947 0__add_pool_size_0.3
28 s01 Crawling 1 486 677 3943 0__add_pool_size_0.3
29 s01 Crawling 1 528 695 3988 0__add_pool_size_0.3
30 s01 Crawling 1 355 569 4019 0__add_pool_size_0.5
31 s01 Crawling 1 355 594 4051 0__add_pool_size_0.5
32 s01 Crawling 1 355 638 4049 0__add_pool_size_0.5
33 s01 Crawling 1 355 678 4053 0__add_pool_size_0.5
34 s01 Crawling 1 355 708 4051 0__add_pool_size_0.5
35 s01 Crawling 1 479 733 4028 0__add_pool_size_0.5
36 s01 Crawling 1 479 733 3988 0__add_pool_size_0.5
37 s01 Crawling 1 479 696 3947 0__add_pool_size_0.5
38 s01 Crawling 1 479 677 3943 0__add_pool_size_0.5
39 s01 Crawling 1 479 695 3988 0__add_pool_size_0.5
library.core_functions.augmentation.add_quantize(input_data, label_column, n_levels, target_labels, percent, group_columns, target_sensors, **kwargs)
- Add Quantize:
Quantize time series to a level set.
- Parameters
input_data (DataFrame) – Input data
n_levels (list) – The number of levels in the level set; values in a time series are rounded to the nearest level.
target_labels (list) – List of labels that the quantization will be applied to.
percent (float) – Percentage of the data set that the quantization will be applied to.
target_sensors (list) – List of sensors that the quantization will be applied to.
- Returns
DataFrame which includes the quantized data set.
- Return type
DataFrame
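In the example below, n_levels=2 maps accelx values in [333, 377] to the levels 344 and 366, i.e. the centers of two equal-width bins spanning the segment range. A minimal sketch consistent with that output:

    import numpy as np

    def quantize_segment(segment, n_levels):
        seg = np.asarray(segment, dtype=float)
        lo, hi = seg.min(), seg.max()
        # Level set: centers of n_levels equal-width bins over the segment range.
        levels = lo + (np.arange(n_levels) + 0.5) * (hi - lo) / n_levels
        nearest = np.abs(seg[:, None] - levels[None, :]).argmin(axis=1)
        return levels[nearest]  # each value rounded to its nearest level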
Example
>>> client.pipeline.reset()
>>> df = client.datasets.load_activity_raw_toy()
>>> df = pd.concat([df,df]).reset_index(drop=True)
>>> client.upload_dataframe("toy_data.csv", df, force=True)
>>> client.pipeline.reset()
>>> client.pipeline.set_input_data('toy_data',
        data_columns=['accelx', 'accely', 'accelz'],
        group_columns=['Subject', 'Class', 'Rep'],
        label_column='Class')
>>> client.pipeline.add_transform('Windowing', params={'window_size' : 5, 'delta': 5})
>>> client.pipeline.add_augmentation(
        [
            {"name": "Add Quantize",
             "params": {"n_levels": [2, 4],
                        "target_labels": ['Crawling'],
                        "target_sensors": ['accelx']
            }},
        ],
        params={"percent": 0.5}
    )
>>> results, stats = client.pipeline.execute()
>>> print(results)
Out:
Subject Class Rep accelx accely accelz SegmentID
0 s01 Crawling 1 377 569 4019 0
1 s01 Crawling 1 357 594 4051 0
2 s01 Crawling 1 333 638 4049 0
3 s01 Crawling 1 340 678 4053 0
4 s01 Crawling 1 372 708 4051 0
5 s01 Crawling 1 410 733 4028 1
6 s01 Crawling 1 450 733 3988 1
7 s01 Crawling 1 492 696 3947 1
8 s01 Crawling 1 518 677 3943 1
9 s01 Crawling 1 528 695 3988 1
10 s01 Running 1 -44 -3971 843 0
11 s01 Running 1 -47 -3982 836 0
12 s01 Running 1 -43 -3973 832 0
13 s01 Running 1 -40 -3973 834 0
14 s01 Running 1 -48 -3978 844 0
15 s01 Running 1 -52 -3993 842 1
16 s01 Running 1 -64 -3984 821 1
17 s01 Running 1 -64 -3966 813 1
18 s01 Running 1 -66 -3971 826 1
19 s01 Running 1 -62 -3988 827 1
20 s01 Crawling 1 366 569 4019 0__add_quantize_n_levels_2
21 s01 Crawling 1 366 594 4051 0__add_quantize_n_levels_2
22 s01 Crawling 1 344 638 4049 0__add_quantize_n_levels_2
23 s01 Crawling 1 344 678 4053 0__add_quantize_n_levels_2
24 s01 Crawling 1 366 708 4051 0__add_quantize_n_levels_2
25 s01 Crawling 1 371 569 4019 0__add_quantize_n_levels_4
26 s01 Crawling 1 360 594 4051 0__add_quantize_n_levels_4
27 s01 Crawling 1 338 638 4049 0__add_quantize_n_levels_4
28 s01 Crawling 1 349 678 4053 0__add_quantize_n_levels_4
29 s01 Crawling 1 371 708 4051 0__add_quantize_n_levels_4
library.core_functions.augmentation.add_reverse(input_data, label_column, target_labels, percent, group_columns, target_sensors, **kwargs)
- Add Reverse:
Reverse the timeline of a series.
- Parameters
input_data (DataFrame) – Input data
target_labels (list) – List of labels that the reversal will be applied to.
percent (float) – Percentage of the data set that the reversal will be applied to.
target_sensors (list) – List of sensors that the reversal will be applied to.
- Returns
DataFrame which includes the reversed time series.
- Return type
DataFrame
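The operation itself is simply a flip of the time axis, e.g.:

    import numpy as np

    def reverse_segment(segment):
        return np.asarray(segment)[::-1]  # values unchanged, order reversed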
Example
>>> client.pipeline.reset()
>>> df = client.datasets.load_activity_raw_toy()
>>> df = pd.concat([df,df]).reset_index(drop=True)
>>> client.upload_dataframe("toy_data.csv", df, force=True)
>>> client.pipeline.reset()
>>> client.pipeline.set_input_data('toy_data',
        data_columns=['accelx', 'accely', 'accelz'],
        group_columns=['Subject', 'Class', 'Rep'],
        label_column='Class')
>>> client.pipeline.add_transform('Windowing', params={'window_size' : 5, 'delta': 5})
>>> client.pipeline.add_augmentation(
        [
            {"name": "Add Reverse",
             "params": {"target_labels": ['Crawling'],
                        "target_sensors": ['accelx']
            }},
        ],
        params={"percent": 0.5}
    )
>>> results, stats = client.pipeline.execute()
>>> print(results)
Out:
Subject Class Rep accelx accely accelz SegmentID
0 s01 Crawling 1 377 569 4019 0
1 s01 Crawling 1 357 594 4051 0
2 s01 Crawling 1 333 638 4049 0
3 s01 Crawling 1 340 678 4053 0
4 s01 Crawling 1 372 708 4051 0
5 s01 Crawling 1 410 733 4028 1
6 s01 Crawling 1 450 733 3988 1
7 s01 Crawling 1 492 696 3947 1
8 s01 Crawling 1 518 677 3943 1
9 s01 Crawling 1 528 695 3988 1
10 s01 Running 1 -44 -3971 843 0
11 s01 Running 1 -47 -3982 836 0
12 s01 Running 1 -43 -3973 832 0
13 s01 Running 1 -40 -3973 834 0
14 s01 Running 1 -48 -3978 844 0
15 s01 Running 1 -52 -3993 842 1
16 s01 Running 1 -64 -3984 821 1
17 s01 Running 1 -64 -3966 813 1
18 s01 Running 1 -66 -3971 826 1
19 s01 Running 1 -62 -3988 827 1
20 s01 Crawling 1 528 733 4028 1__add_reverse__0
21 s01 Crawling 1 518 733 3988 1__add_reverse__0
22 s01 Crawling 1 492 696 3947 1__add_reverse__0
23 s01 Crawling 1 450 677 3943 1__add_reverse__0
24 s01 Crawling 1 410 695 3988 1__add_reverse__0
library.core_functions.augmentation.add_timewarp(input_data, label_column, n_speed_change, max_speed_ratio, target_labels, percent, group_columns, target_sensors, **kwargs)
- Add Timewarp:
Random time warping. The augmenter randomly changes the speed of the timeline. The time warping is controlled by the number of speed changes and the maximal ratio of max/min speed. A smaller n_speed_change and a max_speed_ratio closer to 1 produce a signal more similar to the original.
- Parameters
input_data (DataFrame) – Input data
n_speed_change (list) – The number of speed changes in each series.
max_speed_ratio (float) – The maximal ratio of max/min speed in the warped timeline. The timeline of a series is more likely to be significantly warped if this value is greater.
target_labels (list) – List of labels that the time warping will be applied to.
percent (float) – Percentage of the data set that the time warping will be applied to.
target_sensors (list) – List of sensors that the time warping will be applied to.
- Returns
DataFrame which includes the time-warped data set.
- Return type
DataFrame
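A minimal sketch of the idea: build a monotone warped time axis from piecewise-constant random speeds, then resample the series on it. The uniform speed sampling and linear resampling are assumptions, not the library's exact algorithm:

    import numpy as np

    def timewarp_segment(segment, n_speed_change, max_speed_ratio,
                         rng=np.random.default_rng()):
        seg = np.asarray(segment, dtype=float)
        n = len(seg)
        # Piecewise-constant speeds; their max/min ratio is bounded by max_speed_ratio.
        speeds = rng.uniform(1.0, max_speed_ratio, n_speed_change + 1)
        bounds = np.linspace(0, n, n_speed_change + 2)
        piece = np.minimum(np.searchsorted(bounds, np.arange(n), side="right") - 1,
                           n_speed_change)
        warped = np.cumsum(speeds[piece])            # integrate speed -> warped axis
        warped = (warped - warped[0]) / (warped[-1] - warped[0]) * (n - 1)
        return np.interp(warped, np.arange(n), seg)  # resample on the warped axis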
Example
>>> client.pipeline.reset()
>>> df = client.datasets.load_activity_raw_toy()
>>> df = pd.concat([df,df]).reset_index(drop=True)
>>> client.upload_dataframe("toy_data.csv", df, force=True)
>>> client.pipeline.reset()
>>> client.pipeline.set_input_data('toy_data',
        data_columns=['accelx', 'accely', 'accelz'],
        group_columns=['Subject', 'Class', 'Rep'],
        label_column='Class')
>>> client.pipeline.add_transform('Windowing', params={'window_size' : 5, 'delta': 5})
>>> client.pipeline.add_augmentation(
        [
            {"name": "Add TimeWarp",
             "params": {"n_speed_change": [3, 5],
                        "max_speed_ratio": 2.5,
                        "target_labels": ['Crawling'],
                        "target_sensors": ['accelx']
            }},
        ],
        params={"percent": 0.5}
    )
>>> results, stats = client.pipeline.execute()
>>> print(results)
Out:
Subject Class Rep accelx accely accelz SegmentID
0 s01 Crawling 1 377 569 4019 0
1 s01 Crawling 1 357 594 4051 0
2 s01 Crawling 1 333 638 4049 0
3 s01 Crawling 1 340 678 4053 0
4 s01 Crawling 1 372 708 4051 0
5 s01 Crawling 1 410 733 4028 0
6 s01 Crawling 1 450 733 3988 0
7 s01 Crawling 1 492 696 3947 0
8 s01 Crawling 1 518 677 3943 0
9 s01 Crawling 1 528 695 3988 0
10 s01 Running 1 -44 -3971 843 0
11 s01 Running 1 -47 -3982 836 0
12 s01 Running 1 -43 -3973 832 0
13 s01 Running 1 -40 -3973 834 0
14 s01 Running 1 -48 -3978 844 0
15 s01 Running 1 -52 -3993 842 0
16 s01 Running 1 -64 -3984 821 0
17 s01 Running 1 -64 -3966 813 0
18 s01 Running 1 -66 -3971 826 0__add_timewarp_n_speed_change_3_max_speed_ratio_2.5
19 s01 Running 1 -62 -3988 827 0__add_timewarp_n_speed_change_3_max_speed_ratio_2.5
20 s01 Crawling 1 377 569 4019 0__add_timewarp_n_speed_change_3_max_speed_ratio_2.5
21 s01 Crawling 1 352 594 4051 0__add_timewarp_n_speed_change_3_max_speed_ratio_2.5
22 s01 Crawling 1 333 638 4049 0__add_timewarp_n_speed_change_3_max_speed_ratio_2.5
23 s01 Crawling 1 336 678 4053 0__add_timewarp_n_speed_change_3_max_speed_ratio_2.5
24 s01 Crawling 1 372 708 4051 0__add_timewarp_n_speed_change_3_max_speed_ratio_2.5
25 s01 Crawling 1 377 569 4019 0__add_timewarp_n_speed_change_5_max_speed_ratio_2.5
26 s01 Crawling 1 359 594 4051 0__add_timewarp_n_speed_change_5_max_speed_ratio_2.5
27 s01 Crawling 1 336 638 4049 0__add_timewarp_n_speed_change_5_max_speed_ratio_2.5
28 s01 Crawling 1 337 678 4053 0__add_timewarp_n_speed_change_5_max_speed_ratio_2.5
29 s01 Crawling 1 372 708 4051 0__add_timewarp_n_speed_change_5_max_speed_ratio_2.5