2.4.5 Augmentation

A large number of model parameters or an insufficient amount of training data can cause over-fitting. One way to address this problem is data augmentation, a process that generates additional data from the existing data.

The ML Pipeline offers a set of augmentation methods for time series data. Each augmentation method is implemented as a member of an augmentation set, and each function in the set operates on the segmented time series data individually. Users can restrict augmentation to a subset of the original data set by defining the "target_sensors", "target_labels" and "percent" parameters. If "target_sensors" is not defined, data from all sensors is used; similarly, if "target_labels" is not defined, data from all labels is used. A subset is created for each method individually, and the augmented result is concatenated with the original data.

See the following examples for further clarification.

Original Data:

   Subject     Class  Rep  accelx  accely  accelz               SegmentID
0      s01  Crawling    1     377     569    4019                       0
1      s01  Crawling    1     357     594    4051                       0
2      s01  Crawling    1     333     638    4049                       0
3      s01  Crawling    1     340     678    4053                       0
4      s01  Crawling    1     372     708    4051                       0
5      s01  Crawling    1     410     733    4028                       1
6      s01  Crawling    1     450     733    3988                       1
7      s01  Crawling    1     492     696    3947                       1
8      s01  Crawling    1     518     677    3943                       1
9      s01  Crawling    1     528     695    3988                       1
10     s01   Running    1     -44   -3971     843                       0
11     s01   Running    1     -47   -3982     836                       0
12     s01   Running    1     -43   -3973     832                       0
13     s01   Running    1     -40   -3973     834                       0
14     s01   Running    1     -48   -3978     844                       0
15     s01   Running    1     -52   -3993     842                       1
16     s01   Running    1     -64   -3984     821                       1
17     s01   Running    1     -64   -3966     813                       1
18     s01   Running    1     -66   -3971     826                       1
19     s01   Running    1     -62   -3988     827                       1

Example 1: Creating augmented data by using the Add Noise method.

>>> client.pipeline.reset()
>>> df = client.datasets.load_activity_raw_toy()
>>> client.upload_dataframe("toy_data.csv", df, force=True)
>>> client.pipeline.reset()
>>> client.pipeline.set_input_data('toy_data',
                  data_columns=['accelx', 'accely', 'accelz'],
                  group_columns=['Subject', 'Class', 'Rep'],
                  label_column='Class')
>>> client.pipeline.add_transform('Windowing', params={'window_size' : 5, 'delta': 5 })
>>> client.pipeline.add_augmentation(
                  [
                        {"name": "Add Noise",
                        "params": {"scale": [0.3, 0.5],
                                    "target_labels": ['Crawling'],
                                    "target_sensors": ['accelx', 'accelz']
                                 }},
                  ],
                  params={"percent": 0.5}
               )
>>> results, stats = client.pipeline.execute()

The subsets are created by grouping the original data; the concatenation of the group columns (['Subject', 'Class', 'Rep']) and SegmentID is used as the grouping key. Only segments with the targeted labels ('Crawling') are selected.
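As a rough illustration, this grouping step can be reproduced with a plain pandas groupby over the group columns plus SegmentID. This is a sketch only; `segmented` stands in for the windowed table shown above, and the pipeline performs this grouping internally.

>>> import pandas as pd
>>> segmented = pd.DataFrame({
        'Subject': ['s01'] * 4, 'Class': ['Crawling', 'Crawling', 'Running', 'Running'],
        'Rep': [1] * 4, 'accelx': [377, 357, -44, -47], 'SegmentID': [0, 0, 0, 0]})
>>> for key, segment in segmented.groupby(['Subject', 'Class', 'Rep', 'SegmentID']):
        print(key, len(segment))  # each group is one candidate segment
    Out:
    ('s01', 'Crawling', 1, 0) 2
    ('s01', 'Running', 1, 0) 2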

Subset-1:

   Subject     Class  Rep  accelx  accely  accelz               SegmentID
0      s01  Crawling    1     377     569    4019                       0
1      s01  Crawling    1     357     594    4051                       0
2      s01  Crawling    1     333     638    4049                       0
3      s01  Crawling    1     340     678    4053                       0
4      s01  Crawling    1     372     708    4051                       0

Subset-2:

   Subject     Class  Rep  accelx  accely  accelz               SegmentID
1      s01  Crawling    1     410     733    4028                       1
2      s01  Crawling    1     450     733    3988                       1
3      s01  Crawling    1     492     696    3947                       1
4      s01  Crawling    1     518     677    3943                       1
5      s01  Crawling    1     528     695    3988                       1

Since "percent" is 0.5, half of the subset data is selected randomly; suppose Subset-2 is selected. The augmentation method is applied, once for each defined scale value, only to the time series of the targeted sensors (['accelx', 'accelz']).
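A minimal sketch of the "percent" selection, assuming a uniform random sample of the candidate segment keys (hypothetical helper code, not the pipeline's internal implementation):

>>> import random
>>> segment_keys = [('s01', 'Crawling', 1, 0), ('s01', 'Crawling', 1, 1)]
>>> percent = 0.5
>>> selected = random.sample(segment_keys, max(int(len(segment_keys) * percent), 1))
>>> len(selected)  # one of the two Crawling segments, chosen at random
    Out:
    1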

Augmented Data with Add Noise method with scale of 0.3:
   Subject     Class  Rep  accelx  accely  accelz               SegmentID
1      s01  Crawling    1     351     733    4016  1__add_noise_scale_0.3
2      s01  Crawling    1     474     733    3978  1__add_noise_scale_0.3
3      s01  Crawling    1     441     696    3943  1__add_noise_scale_0.3
4      s01  Crawling    1     504     677    3969  1__add_noise_scale_0.3
5      s01  Crawling    1     508     695    3975  1__add_noise_scale_0.3

Augmented Data with Add Noise method with scale of 0.5:
   Subject     Class  Rep  accelx  accely  accelz               SegmentID
1      s01  Crawling    1     486     733    4067  1__add_noise_scale_0.5
2      s01  Crawling    1     395     733    3955  1__add_noise_scale_0.5
3      s01  Crawling    1     479     696    3873  1__add_noise_scale_0.5
4      s01  Crawling    1     481     677    3907  1__add_noise_scale_0.5
5      s01  Crawling    1     482     695    4001  1__add_noise_scale_0.5

At the end of the augmentation function, the original data and the augmented data are concatenated.

>>> print(results)
   Out:
   Subject     Class  Rep  accelx  accely  accelz               SegmentID
0      s01  Crawling    1     377     569    4019                       0
1      s01  Crawling    1     357     594    4051                       0
2      s01  Crawling    1     333     638    4049                       0
3      s01  Crawling    1     340     678    4053                       0
4      s01  Crawling    1     372     708    4051                       0
5      s01  Crawling    1     410     733    4028                       1
6      s01  Crawling    1     450     733    3988                       1
7      s01  Crawling    1     492     696    3947                       1
8      s01  Crawling    1     518     677    3943                       1
9      s01  Crawling    1     528     695    3988                       1
10     s01   Running    1     -44   -3971     843                       0
11     s01   Running    1     -47   -3982     836                       0
12     s01   Running    1     -43   -3973     832                       0
13     s01   Running    1     -40   -3973     834                       0
14     s01   Running    1     -48   -3978     844                       0
15     s01   Running    1     -52   -3993     842                       1
16     s01   Running    1     -64   -3984     821                       1
17     s01   Running    1     -64   -3966     813                       1
18     s01   Running    1     -66   -3971     826                       1
19     s01   Running    1     -62   -3988     827                       1
20     s01  Crawling    1     351     733    4016  1__add_noise_scale_0.3
21     s01  Crawling    1     474     733    3978  1__add_noise_scale_0.3
22     s01  Crawling    1     441     696    3943  1__add_noise_scale_0.3
23     s01  Crawling    1     504     677    3969  1__add_noise_scale_0.3
24     s01  Crawling    1     508     695    3975  1__add_noise_scale_0.3
25     s01  Crawling    1     486     733    4067  1__add_noise_scale_0.5
26     s01  Crawling    1     395     733    3955  1__add_noise_scale_0.5
27     s01  Crawling    1     479     696    3873  1__add_noise_scale_0.5
28     s01  Crawling    1     481     677    3907  1__add_noise_scale_0.5
29     s01  Crawling    1     482     695    4001  1__add_noise_scale_0.5

Example 2: Creating augmented data by using the Add Noise and Add Quantize methods.

>>> client.pipeline.reset()
>>> df = client.datasets.load_activity_raw_toy()
>>> client.upload_dataframe("toy_data.csv", df, force=True)
>>> client.pipeline.reset()
>>> client.pipeline.set_input_data('toy_data',
                  data_columns=['accelx', 'accely', 'accelz'],
                  group_columns=['Subject', 'Class', 'Rep'],
                  label_column='Class')
>>> client.pipeline.add_transform('Windowing', params={'window_size' : 5, 'delta': 5 })
>>> client.pipeline.add_augmentation(
                  [
                        {"name": "Add Noise",
                        "params": {"scale": [0.3, 0.5],
                                    "target_labels": ['Crawling'],
                                    "target_sensors": ['accelx', 'accelz']
                                 }},
                        {"name": "Add Quantize",
                        "params": {"n_levels": [10] }},
                  ],
                  params={"percent": 0.5}
               )
>>> results, stats = client.pipeline.execute()

For each augmentation method, subsets are created individually. In the "Add Noise" method, target sensors and target labels are defined, so a subset of the original data is created as explained in Example 1. In the "Add Quantize" method, however, target sensors and target labels are not defined, so data from all sensors and all labels is used to create the groups. Only half of the grouped data is used for each label.

After the augmented data sets are created by these two methods individually, they are concatenated with the original data set.

>>> print(results)
   Out:
         Subject     Class  Rep  accelx  accely  accelz                    SegmentID
   0      s01  Crawling    1     377     569    4019                            0
   1      s01  Crawling    1     357     594    4051                            0
   2      s01  Crawling    1     333     638    4049                            0
   3      s01  Crawling    1     340     678    4053                            0
   4      s01  Crawling    1     372     708    4051                            0
   5      s01  Crawling    1     410     733    4028                            1
   6      s01  Crawling    1     450     733    3988                            1
   7      s01  Crawling    1     492     696    3947                            1
   8      s01  Crawling    1     518     677    3943                            1
   9      s01  Crawling    1     528     695    3988                            1
   10     s01   Running    1     -44   -3971     843                            0
   11     s01   Running    1     -47   -3982     836                            0
   12     s01   Running    1     -43   -3973     832                            0
   13     s01   Running    1     -40   -3973     834                            0
   14     s01   Running    1     -48   -3978     844                            0
   15     s01   Running    1     -52   -3993     842                            1
   16     s01   Running    1     -64   -3984     821                            1
   17     s01   Running    1     -64   -3966     813                            1
   18     s01   Running    1     -66   -3971     826                            1
   19     s01   Running    1     -62   -3988     827                            1
   20     s01  Crawling    1     396     733    3981       1__add_noise_scale_0.3
   21     s01  Crawling    1     447     733    3958       1__add_noise_scale_0.3
   22     s01  Crawling    1     464     696    3924       1__add_noise_scale_0.3
   23     s01  Crawling    1     486     677    3979       1__add_noise_scale_0.3
   24     s01  Crawling    1     528     695    3961       1__add_noise_scale_0.3
   25     s01  Crawling    1     378     733    4152       1__add_noise_scale_0.5
   26     s01  Crawling    1     442     733    4036       1__add_noise_scale_0.5
   27     s01  Crawling    1     479     696    3927       1__add_noise_scale_0.5
   28     s01  Crawling    1     482     677    4060       1__add_noise_scale_0.5
   29     s01  Crawling    1     575     695    3944       1__add_noise_scale_0.5
   30     s01  Crawling    1     374     575    4020  0__add_quantize_n_levels_10
   31     s01  Crawling    1     357     603    4051  0__add_quantize_n_levels_10
   32     s01  Crawling    1     335     645    4051  0__add_quantize_n_levels_10
   33     s01  Crawling    1     344     687    4051  0__add_quantize_n_levels_10
   34     s01  Crawling    1     374     701    4051  0__add_quantize_n_levels_10
   35     s01   Running    1     -52   -3991     840  1__add_quantize_n_levels_10
   36     s01   Running    1     -63   -3983     823  1__add_quantize_n_levels_10
   37     s01   Running    1     -63   -3967     814  1__add_quantize_n_levels_10
   38     s01   Running    1     -65   -3970     826  1__add_quantize_n_levels_10
   39     s01   Running    1     -61   -3986     828  1__add_quantize_n_levels_10

The details of each augmentation method are explained below.

library.core_functions.augmentation.add_convolve(input_data, label_column, fraction_of_segment, kernel, target_labels, percent, group_columns, target_sensors, **kwargs)

Add Convolve:

Convolve (smooth) the time series of each segment with a kernel window.
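Conceptually, this is a same-length convolution with a normalized kernel window whose length is fraction_of_segment times the segment length. A minimal sketch, assuming a Hann window and "same" padding; the pipeline's padding and normalization may differ:

>>> import numpy as np
>>> def convolve_sketch(x, fraction_of_segment=0.5):
        size = max(int(len(x) * fraction_of_segment), 3)
        window = np.hanning(size)
        window = window / window.sum()              # preserve the signal amplitude
        return np.convolve(x, window, mode='same')  # keep the segment length
>>> segment = np.array([377, 357, 333, 340, 372, 410, 450, 492, 518, 528], dtype=float)
>>> smoothed = convolve_sketch(segment, fraction_of_segment=0.5)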

Parameters
  • input_data (DataFrame) – Input data

  • fraction_of_segment (list) – The kernel window length as a fraction of the segment length.

  • kernel (str) – The type of kernel window used for the convolution (e.g. "hann").

  • target_labels (list) – List of labels to which the convolution will be applied.

  • percent (float) – Percentage of the data set to which the convolution will be applied.

  • target_sensors (list) – List of sensors to which the convolution will be applied.

Returns

A data set that includes the convolved segments.

Return type

DataFrame

Example

>>> client.pipeline.reset()
>>> df = client.datasets.load_activity_raw_toy()
>>> df = pd.concat([df,df]).reset_index(drop=True)
>>> client.upload_dataframe("toy_data.csv", df, force=True)
>>> client.pipeline.reset()
>>> client.pipeline.set_input_data('toy_data',
                        data_columns=['accelx', 'accely', 'accelz'],
                        group_columns=['Subject', 'Class', 'Rep'],
                        label_column='Class')
>>> client.pipeline.add_transform('Windowing', params={'window_size' : 5, 'delta': 5})
>>> client.pipeline.add_augmentation(
        [
            {"name": "Add Convolve",
             "params": {"fraction_of_segment": [0.5, 0.2],
                        "kernel": "hann",
                        "target_labels": ['Crawling'],
                        "target_sensors": ['accelx']
                       }}
        ],
        params={"percent": 0.8}
    )
>>> results, stats = client.pipeline.execute()
>>> print(results)
    Out:
           Subject     Class  Rep  accelx  accely  accelz                               SegmentID
        0      s01  Crawling    1     377     569    4019                                       0
        1      s01  Crawling    1     357     594    4051                                       0
        2      s01  Crawling    1     333     638    4049                                       0
        3      s01  Crawling    1     340     678    4053                                       0
        4      s01  Crawling    1     372     708    4051                                       0
        5      s01  Crawling    1     410     733    4028                                       0
        6      s01  Crawling    1     450     733    3988                                       0
        7      s01  Crawling    1     492     696    3947                                       0
        8      s01  Crawling    1     518     677    3943                                       0
        9      s01  Crawling    1     528     695    3988                                       0
        10     s01  Crawling    1      -1    2558    4609                                       0
        11     s01  Crawling    1     377     569    4019                                       0
        12     s01  Crawling    1     357     594    4051                                       0
        13     s01  Crawling    1     333     638    4049                                       0
        14     s01  Crawling    1     340     678    4053                                       0
        15     s01  Crawling    1     372     708    4051                                       0
        16     s01  Crawling    1     410     733    4028                                       0
        17     s01  Crawling    1     450     733    3988                                       0
        18     s01  Crawling    1     492     696    3947                                       0
        19     s01  Crawling    1     518     677    3943                                       0
        20     s01   Running    1     -44   -3971     843                                       0
        21     s01   Running    1     -47   -3982     836                                       0
        ..     ...       ...  ...     ...     ...     ...                                     ...
        60     s01  Crawling    1     366     569    4019    0__add_convolve_size_0.2_window_hann
        61     s01  Crawling    1     344     594    4051    0__add_convolve_size_0.2_window_hann
        62     s01  Crawling    1     335     638    4049    0__add_convolve_size_0.2_window_hann
        63     s01  Crawling    1     355     678    4053    0__add_convolve_size_0.2_window_hann
        64     s01  Crawling    1     390     708    4051    0__add_convolve_size_0.2_window_hann
        65     s01  Crawling    1     429     733    4028    0__add_convolve_size_0.2_window_hann
        66     s01  Crawling    1     470     733    3988    0__add_convolve_size_0.2_window_hann
        67     s01  Crawling    1     504     696    3947    0__add_convolve_size_0.2_window_hann
        68     s01  Crawling    1     522     677    3943    0__add_convolve_size_0.2_window_hann
        69     s01  Crawling    1     263     695    3988    0__add_convolve_size_0.2_window_hann
        70     s01  Crawling    1     187    2558    4609    0__add_convolve_size_0.2_window_hann
        71     s01  Crawling    1     366     569    4019    0__add_convolve_size_0.2_window_hann
        72     s01  Crawling    1     344     594    4051    0__add_convolve_size_0.2_window_hann
        73     s01  Crawling    1     335     638    4049    0__add_convolve_size_0.2_window_hann
        74     s01  Crawling    1     355     678    4053    0__add_convolve_size_0.2_window_hann
        75     s01  Crawling    1     390     708    4051    0__add_convolve_size_0.2_window_hann
        76     s01  Crawling    1     429     733    4028    0__add_convolve_size_0.2_window_hann
        77     s01  Crawling    1     470     733    3988    0__add_convolve_size_0.2_window_hann
        78     s01  Crawling    1     504     696    3947    0__add_convolve_size_0.2_window_hann
        79     s01  Crawling    1     517     677    3943    0__add_convolve_size_0.2_window_hann

library.core_functions.augmentation.add_drift(input_data, label_column, max_drift, n_drift_points, target_labels, percent, group_columns, target_sensors, **kwargs)

Add Drift:

The augmenter drifts the values of a time series away from their original values randomly and smoothly. The extent of drifting is controlled by the maximal drift and the number of drift points.
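A conceptual sketch of the drift operation, assuming a piecewise-linear drift curve through n_drift_points random anchors bounded by max_drift and scaled by the segment's value range; the actual smoothing and scaling may differ:

>>> import numpy as np
>>> def drift_sketch(x, max_drift=0.5, n_drift_points=1, rng=np.random.default_rng(0)):
        anchors = np.zeros(n_drift_points + 2)  # drift starts and ends at zero here
        anchors[1:-1] = rng.uniform(-max_drift, max_drift, n_drift_points)
        positions = np.linspace(0, len(x) - 1, n_drift_points + 2)
        drift = np.interp(np.arange(len(x)), positions, anchors)
        return x + drift * (x.max() - x.min())
>>> segment = np.array([410, 450, 492, 518, 528], dtype=float)
>>> drifted = drift_sketch(segment, max_drift=0.1, n_drift_points=1)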

Parameters
  • input_data (DataFrame) – Input data

  • max_drift (list) – The maximal amount of drift added to a time series; each value in the list produces a separate augmented copy. The drift added to a time series (each sensor, each segment) is sampled randomly, bounded by this value.

  • n_drift_points (int) – The number of time points in a series at which a new drifting trend is defined.

  • target_labels (list) – List of labels to which drift will be applied.

  • percent (float) – Percentage of the data set to which drift will be applied.

  • target_sensors (list) – List of sensors to which drift will be applied.

Returns

A data set that includes the drift-added segments.

Return type

DataFrame

Example

>>> client.pipeline.reset()
>>> df = client.datasets.load_activity_raw_toy()
>>> client.upload_dataframe("toy_data.csv", df, force=True)
>>> client.pipeline.reset()
>>> client.pipeline.set_input_data('toy_data',
                        data_columns=['accelx', 'accely', 'accelz'],
                        group_columns=['Subject', 'Class', 'Rep'],
                        label_column='Class')
>>> client.pipeline.add_transform('Windowing', params={'window_size' : 5, 'delta': 5})
>>> client.pipeline.add_augmentation(
        [
            {"name": "Add Drift", "params": {"max_drift": [0.1, 1.5],
                                            "n_drift_points": 1,
                                            "target_labels": ['Crawling'],
                                            "target_sensors": ['accelx']}}
        ],
        params={"percent": 0.4}
    )
>>> results, stats = client.pipeline.execute()
>>> print(results)
    Out:
           Subject     Class  Rep  accelx  accely  accelz                                    SegmentID
        0      s01  Crawling    1     377     569    4019                                            0
        1      s01  Crawling    1     357     594    4051                                            0
        2      s01  Crawling    1     333     638    4049                                            0
        3      s01  Crawling    1     340     678    4053                                            0
        4      s01  Crawling    1     372     708    4051                                            0
        5      s01  Crawling    1     410     733    4028                                            1
        6      s01  Crawling    1     450     733    3988                                            1
        7      s01  Crawling    1     492     696    3947                                            1
        8      s01  Crawling    1     518     677    3943                                            1
        9      s01  Crawling    1     528     695    3988                                            1
        10     s01   Running    1     -44   -3971     843                                            0
        11     s01   Running    1     -47   -3982     836                                            0
        12     s01   Running    1     -43   -3973     832                                            0
        13     s01   Running    1     -40   -3973     834                                            0
        14     s01   Running    1     -48   -3978     844                                            0
        15     s01   Running    1     -52   -3993     842                                            1
        16     s01   Running    1     -64   -3984     821                                            1
        17     s01   Running    1     -64   -3966     813                                            1
        18     s01   Running    1     -66   -3971     826                                            1
        19     s01   Running    1     -62   -3988     827                                            1
        20     s01  Crawling    1     410     733    4028  1__add_drift_max_drift_0.1_n_drift_points_1
        21     s01  Crawling    1     449     733    3988  1__add_drift_max_drift_0.1_n_drift_points_1
        22     s01  Crawling    1     488     696    3947  1__add_drift_max_drift_0.1_n_drift_points_1
        23     s01  Crawling    1     511     677    3943  1__add_drift_max_drift_0.1_n_drift_points_1
        24     s01  Crawling    1     516     695    3988  1__add_drift_max_drift_0.1_n_drift_points_1
        25     s01  Crawling    1     410     733    4028  1__add_drift_max_drift_1.5_n_drift_points_1
        26     s01  Crawling    1     436     733    3988  1__add_drift_max_drift_1.5_n_drift_points_1
        27     s01  Crawling    1     445     696    3947  1__add_drift_max_drift_1.5_n_drift_points_1
        28     s01  Crawling    1     416     677    3943  1__add_drift_max_drift_1.5_n_drift_points_1
        29     s01  Crawling    1     351     695    3988  1__add_drift_max_drift_1.5_n_drift_points_1

library.core_functions.augmentation.add_dropout(input_data, label_column, fraction_of_segment, p, target_labels, percent, group_columns, target_sensors, **kwargs)

Add Dropout:

Drop out the values of some random time points in a time series without changing its length.
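A conceptual sketch, assuming each dropped time point is forward-filled with the previous value so the segment length is unchanged; the fill strategy and the role of fraction_of_segment in the pipeline may differ:

>>> import numpy as np
>>> def dropout_sketch(x, p=0.5, rng=np.random.default_rng(0)):
        out = x.astype(float).copy()
        drop = rng.random(len(x)) < p  # each point dropped with probability p
        drop[0] = False                # keep the first point as an anchor
        for i in np.where(drop)[0]:
            out[i] = out[i - 1]        # forward-fill the dropped point
        return out
>>> segment = np.array([377, 357, 333, 340, 372, 410, 450, 492, 518, 528], dtype=float)
>>> dropped = dropout_sketch(segment, p=0.5)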

Parameters
  • input_data (DataFrame) – Input data

  • fraction_of_segment (list) – The size of the dropped-out units as a fraction of the segment length.

  • p (float) – Probability that the value of a time point is dropped out.

  • target_labels (list) – List of labels to which dropout will be applied.

  • percent (float) – Percentage of the data set to which dropout will be applied.

  • target_sensors (list) – List of sensors to which dropout will be applied.

Returns

A data set that includes the dropout-applied segments.

Return type

DataFrame

Example

>>> client.pipeline.reset()
>>> df = client.datasets.load_activity_raw_toy()
>>> df = pd.concat([df,df]).reset_index(drop=True)
>>> client.upload_dataframe("toy_data.csv", df, force=True)
>>> client.pipeline.reset()
>>> client.pipeline.set_input_data('toy_data',
                        data_columns=['accelx', 'accely', 'accelz'],
                        group_columns=['Subject', 'Class', 'Rep'],
                        label_column='Class')
>>> client.pipeline.add_transform('Windowing', params={'window_size' : 10, 'delta': 10})
>>> client.pipeline.add_augmentation(
        [
            {"name": "Add Dropout",
             "params": {"fraction_of_segment": [0.1, 0.5],
                        "p": 0.5,
                        "target_labels": ['Crawling'],
                        "target_sensors": ['accelx']
                       }}
        ],
        params={"percent": 0.8}
    )
>>> results, stats = client.pipeline.execute()
>>> print(results)
    Out:
           Subject     Class  Rep  accelx  accely  accelz                       SegmentID
        0      s01  Crawling    1     377     569    4019                               0
        1      s01  Crawling    1     357     594    4051                               0
        2      s01  Crawling    1     333     638    4049                               0
        3      s01  Crawling    1     340     678    4053                               0
        4      s01  Crawling    1     372     708    4051                               0
        5      s01  Crawling    1     410     733    4028                               0
        6      s01  Crawling    1     450     733    3988                               0
        7      s01  Crawling    1     492     696    3947                               0
        8      s01  Crawling    1     518     677    3943                               0
        9      s01  Crawling    1     528     695    3988                               0
        10     s01   Running    1     -44   -3971     843                               0
        11     s01   Running    1     -47   -3982     836                               0
        12     s01   Running    1     -43   -3973     832                               0
        13     s01   Running    1     -40   -3973     834                               0
        14     s01   Running    1     -48   -3978     844                               0
        15     s01   Running    1     -52   -3993     842                               0
        16     s01   Running    1     -64   -3984     821                               0
        17     s01   Running    1     -64   -3966     813                               0
        18     s01   Running    1     -66   -3971     826   0__add_dropout_size_0.1_p_0.5
        19     s01   Running    1     -62   -3988     827   0__add_dropout_size_0.1_p_0.5
        20     s01  Crawling    1     377     569    4019   0__add_dropout_size_0.1_p_0.5
        21     s01  Crawling    1     357     594    4051   0__add_dropout_size_0.1_p_0.5
        22     s01  Crawling    1     357     638    4049   0__add_dropout_size_0.1_p_0.5
        23     s01  Crawling    1     333     678    4053   0__add_dropout_size_0.1_p_0.5
        24     s01  Crawling    1     340     708    4051   0__add_dropout_size_0.1_p_0.5
        25     s01  Crawling    1     410     733    4028   0__add_dropout_size_0.1_p_0.5
        26     s01  Crawling    1     410     733    3988   0__add_dropout_size_0.1_p_0.5
        27     s01  Crawling    1     450     696    3947   0__add_dropout_size_0.1_p_0.5
        28     s01  Crawling    1     492     677    3943   0__add_dropout_size_0.1_p_0.5
        29     s01  Crawling    1     528     695    3988   0__add_dropout_size_0.5_p_0.5
        30     s01  Crawling    1     377     569    4019   0__add_dropout_size_0.5_p_0.5
        31     s01  Crawling    1     357     594    4051   0__add_dropout_size_0.5_p_0.5
        32     s01  Crawling    1     333     638    4049   0__add_dropout_size_0.5_p_0.5
        33     s01  Crawling    1     340     678    4053   0__add_dropout_size_0.5_p_0.5
        34     s01  Crawling    1     372     708    4051   0__add_dropout_size_0.5_p_0.5
        35     s01  Crawling    1     410     733    4028   0__add_dropout_size_0.5_p_0.5
        36     s01  Crawling    1     450     733    3988   0__add_dropout_size_0.5_p_0.5
        37     s01  Crawling    1     492     696    3947   0__add_dropout_size_0.5_p_0.5
        38     s01  Crawling    1     518     677    3943   0__add_dropout_size_0.5_p_0.5
        39     s01  Crawling    1     528     695    3988   0__add_dropout_size_0.5_p_0.5

library.core_functions.augmentation.add_noise(input_data, label_column, group_columns, target_sensors, target_labels, percent=0.1, scale=None)

Add Noise:

Add random noise to time series. The noise added to every time point of a time series is independent and identically distributed.
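A conceptual sketch, assuming i.i.d. Gaussian noise whose standard deviation is scale times the standard deviation of the segment; the exact scaling inside the pipeline is an assumption:

>>> import numpy as np
>>> def noise_sketch(x, scale=0.1, rng=np.random.default_rng(0)):
        return x + rng.normal(0.0, scale * x.std(), size=len(x))
>>> segment = np.array([410, 450, 492, 518, 528], dtype=float)
>>> noisy = noise_sketch(segment, scale=0.1)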

Parameters
  • input_data (DataFrame) – Input data

  • scale (list) – List of standard deviations of the random noise to be added; each value produces a separate augmented copy.

  • target_labels (list) – List of labels to which noise will be applied.

  • percent (float) – Percentage of the data set to which noise will be applied.

  • target_sensors (list) – List of sensors to which noise will be applied.

Returns

A data set that includes the noise-added segments.

Return type

DataFrame

Example

>>> client.pipeline.reset()
>>> df = client.datasets.load_activity_raw_toy()
>>> client.upload_dataframe("toy_data.csv", df, force=True)
>>> client.pipeline.reset()
>>> client.pipeline.set_input_data('toy_data',
                        data_columns=['accelx', 'accely', 'accelz'],
                        group_columns=['Subject', 'Class', 'Rep'],
                        label_column='Class')
>>> client.pipeline.add_transform('Windowing', params={'window_size' : 5, 'delta': 5})
>>> client.pipeline.add_augmentation(
        [
            {"name": "Add Noise", "params":{"scale": [0.1, 0.2],
                                            "target_labels": ['Crawling'],
                                            "target_sensors": ['accelx'] }
            }
        ],
        params={"percent": 0.4}
    )
>>> results, stats = client.pipeline.execute()
>>> print(results)
    Out:
           Subject     Class  Rep  accelx  accely  accelz               SegmentID
        0      s01  Crawling    1     377     569    4019                       0
        1      s01  Crawling    1     357     594    4051                       0
        2      s01  Crawling    1     333     638    4049                       0
        3      s01  Crawling    1     340     678    4053                       0
        4      s01  Crawling    1     372     708    4051                       0
        5      s01  Crawling    1     410     733    4028                       1
        6      s01  Crawling    1     450     733    3988                       1
        7      s01  Crawling    1     492     696    3947                       1
        8      s01  Crawling    1     518     677    3943                       1
        9      s01  Crawling    1     528     695    3988                       1
        10     s01   Running    1     -44   -3971     843                       0
        11     s01   Running    1     -47   -3982     836                       0
        12     s01   Running    1     -43   -3973     832                       0
        13     s01   Running    1     -40   -3973     834                       0
        14     s01   Running    1     -48   -3978     844                       0
        15     s01   Running    1     -52   -3993     842                       1
        16     s01   Running    1     -64   -3984     821                       1
        17     s01   Running    1     -64   -3966     813                       1
        18     s01   Running    1     -66   -3971     826                       1
        19     s01   Running    1     -62   -3988     827                       1
        20     s01  Crawling    1     401     733    4028  1__add_noise_scale_0.1
        21     s01  Crawling    1     447     733    3988  1__add_noise_scale_0.1
        22     s01  Crawling    1     496     696    3947  1__add_noise_scale_0.1
        23     s01  Crawling    1     521     677    3943  1__add_noise_scale_0.1
        24     s01  Crawling    1     519     695    3988  1__add_noise_scale_0.1
        25     s01  Crawling    1     378     733    4028  1__add_noise_scale_0.2
        26     s01  Crawling    1     443     733    3988  1__add_noise_scale_0.2
        27     s01  Crawling    1     507     696    3947  1__add_noise_scale_0.2
        28     s01  Crawling    1     524     677    3943  1__add_noise_scale_0.2
        29     s01  Crawling    1     529     695    3988  1__add_noise_scale_0.2

library.core_functions.augmentation.add_pool(input_data, label_column, fraction_of_segment, target_labels, percent, group_columns, target_sensors, **kwargs)

Add Pool:

Reduce the temporal resolution without changing the length.
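A conceptual sketch of pooling, assuming each block of fraction_of_segment × segment length consecutive samples is replaced by the block mean; this reproduces the size-0.3 rows in the example below (edge handling is an assumption):

>>> import numpy as np
>>> def pool_sketch(x, fraction_of_segment=0.3):
        size = max(int(len(x) * fraction_of_segment), 1)
        out = x.astype(float).copy()
        for start in range(0, len(x), size):
            out[start:start + size] = out[start:start + size].mean()
        return out
>>> segment = np.array([377, 357, 333, 340, 372, 410, 450, 492, 518, 528], dtype=float)
>>> pool_sketch(segment, fraction_of_segment=0.3).astype(int)
    Out:
    array([355, 355, 355, 374, 374, 374, 486, 486, 486, 528])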

Parameters
  • input_data (DataFrame) – Input data

  • fraction_of_segment (list) – The pooling block size as a fraction of the segment length.

  • target_labels (list) – List of labels to which pooling will be applied.

  • percent (float) – Percentage of the data set to which pooling will be applied.

  • target_sensors (list) – List of sensors to which pooling will be applied.

Returns

A data set that includes the pooled (reduced-resolution) segments.

Return type

DataFrame

Example

>>> client.pipeline.reset()
>>> df = client.datasets.load_activity_raw_toy()
>>> df = pd.concat([df,df]).reset_index(drop=True)
>>> client.upload_dataframe("toy_data.csv", df, force=True)
>>> client.pipeline.reset()
>>> client.pipeline.set_input_data('toy_data',
                        data_columns=['accelx', 'accely', 'accelz'],
                        group_columns=['Subject', 'Class', 'Rep'],
                        label_column='Class')
>>> client.pipeline.add_transform('Windowing', params={'window_size' : 10, 'delta': 10})
>>> client.pipeline.add_augmentation(
        [
            {"name": "Add Pool",
             "params": {"fraction_of_segment": [0.3, 0.5],
                        "target_labels": ['Crawling'],
                        "target_sensors": ['accelx']
                       }}
        ],
        params={"percent": 0.8}
    )
>>> results, stats = client.pipeline.execute()
>>> print(results)
    Out:
           Subject     Class  Rep  accelx  accely  accelz             SegmentID
        0      s01  Crawling    1     377     569    4019                     0
        1      s01  Crawling    1     357     594    4051                     0
        2      s01  Crawling    1     333     638    4049                     0
        3      s01  Crawling    1     340     678    4053                     0
        4      s01  Crawling    1     372     708    4051                     0
        5      s01  Crawling    1     410     733    4028                     0
        6      s01  Crawling    1     450     733    3988                     0
        7      s01  Crawling    1     492     696    3947                     0
        8      s01  Crawling    1     518     677    3943                     0
        9      s01  Crawling    1     528     695    3988                     0
        10     s01   Running    1     -44   -3971     843                     0
        11     s01   Running    1     -47   -3982     836                     0
        12     s01   Running    1     -43   -3973     832                     0
        13     s01   Running    1     -40   -3973     834                     0
        14     s01   Running    1     -48   -3978     844                     0
        15     s01   Running    1     -52   -3993     842                     0
        16     s01   Running    1     -64   -3984     821                     0
        17     s01   Running    1     -64   -3966     813                     0
        18     s01   Running    1     -66   -3971     826                     0
        19     s01   Running    1     -62   -3988     827                     0
        20     s01  Crawling    1     355     569    4019  0__add_pool_size_0.3
        21     s01  Crawling    1     355     594    4051  0__add_pool_size_0.3
        22     s01  Crawling    1     355     638    4049  0__add_pool_size_0.3
        23     s01  Crawling    1     374     678    4053  0__add_pool_size_0.3
        24     s01  Crawling    1     374     708    4051  0__add_pool_size_0.3
        25     s01  Crawling    1     374     733    4028  0__add_pool_size_0.3
        26     s01  Crawling    1     486     733    3988  0__add_pool_size_0.3
        27     s01  Crawling    1     486     696    3947  0__add_pool_size_0.3
        28     s01  Crawling    1     486     677    3943  0__add_pool_size_0.3
        29     s01  Crawling    1     528     695    3988  0__add_pool_size_0.3
        30     s01  Crawling    1     355     569    4019  0__add_pool_size_0.5
        31     s01  Crawling    1     355     594    4051  0__add_pool_size_0.5
        32     s01  Crawling    1     355     638    4049  0__add_pool_size_0.5
        33     s01  Crawling    1     355     678    4053  0__add_pool_size_0.5
        34     s01  Crawling    1     355     708    4051  0__add_pool_size_0.5
        35     s01  Crawling    1     479     733    4028  0__add_pool_size_0.5
        36     s01  Crawling    1     479     733    3988  0__add_pool_size_0.5
        37     s01  Crawling    1     479     696    3947  0__add_pool_size_0.5
        38     s01  Crawling    1     479     677    3943  0__add_pool_size_0.5
        39     s01  Crawling    1     479     695    3988  0__add_pool_size_0.5

library.core_functions.augmentation.add_quantize(input_data, label_column, n_levels, target_labels, percent, group_columns, target_sensors, **kwargs)

Add Quantize:

Quantize time series to a level set.
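A conceptual sketch of quantization, assuming n_levels equal-width bins between the segment minimum and maximum, with each value snapped to its bin center; this reproduces the n_levels=2 rows in the example below:

>>> import numpy as np
>>> def quantize_sketch(x, n_levels=2):
        edges = np.linspace(x.min(), x.max(), n_levels + 1)
        centers = (edges[:-1] + edges[1:]) / 2
        idx = np.clip(np.digitize(x, edges) - 1, 0, n_levels - 1)
        return centers[idx]  # snap each point to its bin center
>>> segment = np.array([377, 357, 333, 340, 372], dtype=float)
>>> quantize_sketch(segment, n_levels=2)
    Out:
    array([366., 366., 344., 344., 366.])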

Parameters
  • input_data (DataFrame) – Input data

  • n_levels (list) – The number of levels in the level set; values in a time series are rounded to the nearest level.

  • target_labels (list) – List of labels to which quantization will be applied.

  • percent (float) – Percentage of the data set to which quantization will be applied.

  • target_sensors (list) – List of sensors to which quantization will be applied.

Returns

A data set that includes the quantized segments.

Return type

DataFrame

Example

>>> client.pipeline.reset()
>>> df = client.datasets.load_activity_raw_toy()
>>> df = pd.concat([df,df]).reset_index(drop=True)
>>> client.upload_dataframe("toy_data.csv", df, force=True)
>>> client.pipeline.reset()
>>> client.pipeline.set_input_data('toy_data',
                        data_columns=['accelx', 'accely', 'accelz'],
                        group_columns=['Subject', 'Class', 'Rep'],
                        label_column='Class')
>>> client.pipeline.add_transform('Windowing', params={'window_size' : 5, 'delta': 5})
>>> client.pipeline.add_augmentation(
        [
            {"name": "Add Quantize",
             "params": {"n_levels": [2, 4],
                        "target_labels": ['Crawling'],
                        "target_sensors": ['accelx']
                       }}
        ],
        params={"percent": 0.5}
    )
>>> results, stats = client.pipeline.execute()
>>> print(results)
    Out:
           Subject     Class  Rep  accelx  accely  accelz                   SegmentID
        0      s01  Crawling    1     377     569    4019                           0
        1      s01  Crawling    1     357     594    4051                           0
        2      s01  Crawling    1     333     638    4049                           0
        3      s01  Crawling    1     340     678    4053                           0
        4      s01  Crawling    1     372     708    4051                           0
        5      s01  Crawling    1     410     733    4028                           1
        6      s01  Crawling    1     450     733    3988                           1
        7      s01  Crawling    1     492     696    3947                           1
        8      s01  Crawling    1     518     677    3943                           1
        9      s01  Crawling    1     528     695    3988                           1
        10     s01   Running    1     -44   -3971     843                           0
        11     s01   Running    1     -47   -3982     836                           0
        12     s01   Running    1     -43   -3973     832                           0
        13     s01   Running    1     -40   -3973     834                           0
        14     s01   Running    1     -48   -3978     844                           0
        15     s01   Running    1     -52   -3993     842                           1
        16     s01   Running    1     -64   -3984     821                           1
        17     s01   Running    1     -64   -3966     813                           1
        18     s01   Running    1     -66   -3971     826                           1
        19     s01   Running    1     -62   -3988     827                           1
        20     s01  Crawling    1     366     569    4019  0__add_quantize_n_levels_2
        21     s01  Crawling    1     366     594    4051  0__add_quantize_n_levels_2
        22     s01  Crawling    1     344     638    4049  0__add_quantize_n_levels_2
        23     s01  Crawling    1     344     678    4053  0__add_quantize_n_levels_2
        24     s01  Crawling    1     366     708    4051  0__add_quantize_n_levels_2
        25     s01  Crawling    1     371     569    4019  0__add_quantize_n_levels_4
        26     s01  Crawling    1     360     594    4051  0__add_quantize_n_levels_4
        27     s01  Crawling    1     338     638    4049  0__add_quantize_n_levels_4
        28     s01  Crawling    1     349     678    4053  0__add_quantize_n_levels_4
        29     s01  Crawling    1     371     708    4051  0__add_quantize_n_levels_4

library.core_functions.augmentation.add_reverse(input_data, label_column, target_labels, percent, group_columns, target_sensors, **kwargs)

Add Reverse:

Reverse the timeline of a series.
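As the example output below shows, only the target sensor columns are reversed within a segment; the other columns keep their original order. A minimal pandas sketch of that behavior:

>>> import pandas as pd
>>> def reverse_sketch(segment, target_sensors):
        out = segment.copy()
        out[target_sensors] = segment[target_sensors].values[::-1]
        return out
>>> seg = pd.DataFrame({'accelx': [410, 450, 492], 'accely': [733, 733, 696]})
>>> reverse_sketch(seg, ['accelx'])
    Out:
       accelx  accely
    0     492     733
    1     450     733
    2     410     696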

Parameters
  • input_data (DataFrame) – Input data

  • target_labels (list) – List of labels to which the reversal will be applied.

  • percent (float) – Percentage of the data set to which the reversal will be applied.

  • target_sensors (list) – List of sensors to which the reversal will be applied.

Returns

A data set that includes the reversed time series.

Return type

DataFrame

Example

>>> client.pipeline.reset()
>>> df = client.datasets.load_activity_raw_toy()
>>> df = pd.concat([df,df]).reset_index(drop=True)
>>> client.upload_dataframe("toy_data.csv", df, force=True)
>>> client.pipeline.reset()
>>> client.pipeline.set_input_data('toy_data',
                        data_columns=['accelx', 'accely', 'accelz'],
                        group_columns=['Subject', 'Class', 'Rep'],
                        label_column='Class')
>>> client.pipeline.add_transform('Windowing', params={'window_size' : 5, 'delta': 5})
>>> client.pipeline.add_augmentation(
        [
            {"name": "Add Reverse",
             "params": {"target_labels": ['Crawling'],
                        "target_sensors": ['accelx']
                       }}
        ],
        params={"percent": 0.5}
    )
>>> results, stats = client.pipeline.execute()
>>> print(results)
    Out:
        Subject     Class  Rep  accelx  accely  accelz          SegmentID
    0      s01  Crawling    1     377     569    4019                  0
    1      s01  Crawling    1     357     594    4051                  0
    2      s01  Crawling    1     333     638    4049                  0
    3      s01  Crawling    1     340     678    4053                  0
    4      s01  Crawling    1     372     708    4051                  0
    5      s01  Crawling    1     410     733    4028                  1
    6      s01  Crawling    1     450     733    3988                  1
    7      s01  Crawling    1     492     696    3947                  1
    8      s01  Crawling    1     518     677    3943                  1
    9      s01  Crawling    1     528     695    3988                  1
    10     s01   Running    1     -44   -3971     843                  0
    11     s01   Running    1     -47   -3982     836                  0
    12     s01   Running    1     -43   -3973     832                  0
    13     s01   Running    1     -40   -3973     834                  0
    14     s01   Running    1     -48   -3978     844                  0
    15     s01   Running    1     -52   -3993     842                  1
    16     s01   Running    1     -64   -3984     821                  1
    17     s01   Running    1     -64   -3966     813                  1
    18     s01   Running    1     -66   -3971     826                  1
    19     s01   Running    1     -62   -3988     827                  1
    20     s01  Crawling    1     528     733    4028  1__add_reverse__0
    21     s01  Crawling    1     518     733    3988  1__add_reverse__0
    22     s01  Crawling    1     492     696    3947  1__add_reverse__0
    23     s01  Crawling    1     450     677    3943  1__add_reverse__0
    24     s01  Crawling    1     410     695    3988  1__add_reverse__0

library.core_functions.augmentation.add_timewarp(input_data, label_column, n_speed_change, max_speed_ratio, target_labels, percent, group_columns, target_sensors, **kwargs)

Add Timewarp:

Random time warping. The augmenter randomly changes the speed of the timeline. The warping is controlled by the number of speed changes and the maximal ratio of max/min speed: the smaller n_speed_change is and the closer max_speed_ratio is to 1, the more similar the result is to the original signal.
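A conceptual sketch of time warping, assuming a piecewise-linear warp of the time axis built from n_speed_change random segment speeds whose max/min ratio is bounded by max_speed_ratio, followed by linear re-interpolation; the pipeline's warp construction is likely more elaborate:

>>> import numpy as np
>>> def timewarp_sketch(x, n_speed_change=3, max_speed_ratio=2.5,
                        rng=np.random.default_rng(0)):
        n = len(x)
        speeds = rng.uniform(1.0, max_speed_ratio, n_speed_change)
        anchors_t = np.linspace(0, n - 1, n_speed_change + 1)
        warped = np.concatenate([[0.0], np.cumsum(speeds * np.diff(anchors_t))])
        warped *= (n - 1) / warped[-1]  # rescale so the warp ends where the series ends
        warped_time = np.interp(np.arange(n), anchors_t, warped)
        return np.interp(warped_time, np.arange(n), x)  # resample along the warped axis
>>> segment = np.array([377, 357, 333, 340, 372], dtype=float)
>>> warped = timewarp_sketch(segment)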

Parameters
  • input_data (DataFrame) – Input data

  • n_speed_change (list) – The number of speed changes in each series; each value in the list produces a separate augmented copy.

  • max_speed_ratio (float) – The maximal ratio of max/min speed in the warped timeline. The timeline of a series is more likely to be significantly warped if this value is greater.

  • target_labels (list) – List of labels to which time warping will be applied.

  • percent (float) – Percentage of the data set to which time warping will be applied.

  • target_sensors (list) – List of sensors to which time warping will be applied.

Returns

A data set that includes the time-warped segments.

Return type

DataFrame

Example

>>> client.pipeline.reset()
>>> df = client.datasets.load_activity_raw_toy()
>>> df = pd.concat([df,df]).reset_index(drop=True)
>>> client.upload_dataframe("toy_data.csv", df, force=True)
>>> client.pipeline.reset()
>>> client.pipeline.set_input_data('toy_data',
                        data_columns=['accelx', 'accely', 'accelz'],
                        group_columns=['Subject', 'Class', 'Rep'],
                        label_column='Class')
>>> client.pipeline.add_transform('Windowing', params={'window_size' : 5, 'delta': 5})
>>> client.pipeline.add_augmentation(
        [
            {"name": "Add TimeWarp",
             "params": {"n_speed_change": [3, 5],
                        "max_speed_ratio": 2.5,
                        "target_labels": ['Crawling'],
                        "target_sensors": ['accelx']
                       }}
        ],
        params={"percent": 0.5}
    )
>>> results, stats = client.pipeline.execute()
>>> print(results)
    Out:
       Subject     Class  Rep  accelx  accely  accelz                                             SegmentID
    0      s01  Crawling    1     377     569    4019                                                     0
    1      s01  Crawling    1     357     594    4051                                                     0
    2      s01  Crawling    1     333     638    4049                                                     0
    3      s01  Crawling    1     340     678    4053                                                     0
    4      s01  Crawling    1     372     708    4051                                                     0
    5      s01  Crawling    1     410     733    4028                                                     0
    6      s01  Crawling    1     450     733    3988                                                     0
    7      s01  Crawling    1     492     696    3947                                                     0
    8      s01  Crawling    1     518     677    3943                                                     0
    9      s01  Crawling    1     528     695    3988                                                     0
    10     s01   Running    1     -44   -3971     843                                                     0
    11     s01   Running    1     -47   -3982     836                                                     0
    12     s01   Running    1     -43   -3973     832                                                     0
    13     s01   Running    1     -40   -3973     834                                                     0
    14     s01   Running    1     -48   -3978     844                                                     0
    15     s01   Running    1     -52   -3993     842                                                     0
    16     s01   Running    1     -64   -3984     821                                                     0
    17     s01   Running    1     -64   -3966     813                                                     0
    18     s01   Running    1     -66   -3971     826  0__add_timewarp_n_speed_change_3_max_speed_ratio_2.5
    19     s01   Running    1     -62   -3988     827  0__add_timewarp_n_speed_change_3_max_speed_ratio_2.5
    20     s01  Crawling    1     377     569    4019  0__add_timewarp_n_speed_change_3_max_speed_ratio_2.5
    21     s01  Crawling    1     352     594    4051  0__add_timewarp_n_speed_change_3_max_speed_ratio_2.5
    22     s01  Crawling    1     333     638    4049  0__add_timewarp_n_speed_change_3_max_speed_ratio_2.5
    23     s01  Crawling    1     336     678    4053  0__add_timewarp_n_speed_change_3_max_speed_ratio_2.5
    24     s01  Crawling    1     372     708    4051  0__add_timewarp_n_speed_change_3_max_speed_ratio_2.5
    25     s01  Crawling    1     377     569    4019  0__add_timewarp_n_speed_change_5_max_speed_ratio_2.5
    26     s01  Crawling    1     359     594    4051  0__add_timewarp_n_speed_change_5_max_speed_ratio_2.5
    27     s01  Crawling    1     336     638    4049  0__add_timewarp_n_speed_change_5_max_speed_ratio_2.5
    28     s01  Crawling    1     337     678    4053  0__add_timewarp_n_speed_change_5_max_speed_ratio_2.5
    29     s01  Crawling    1     372     708    4051  0__add_timewarp_n_speed_change_5_max_speed_ratio_2.5