Model step

The model step can be used to train a model, predict with an existing model, or both, depending on how you execute your pipeline.

By default, models are trained only in experiment mode. You can use the 'train' parameter, either within the execution_options property or directly as a run() or experiment() parameter, to determine whether the pipeline execution should attempt to train your model or only use it for prediction.
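For example, a minimal sketch of the execution_options property described above; where exactly it sits in your pipeline or execution definition may vary:

{
  "execution_options": {
    "train": true
  }
}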

The model step stores and retrieves trained models exclusively through the platform repository.

{
  "type": "model",
  "config": {
    "train_script" : "your_script",
    "model_key": "your_key"
  },
  "inputs": [
    { "dataframe" : "model_input" }
  ],
  "outputs": [
    { "dataframe" : "model_output" }
  ]
}

The model step uses wizata_dsapi.MLModelConfig as config and can also be created directly using Python:

pipeline.add_model(
    config=wizata_dsapi.MLModelConfig(
        train_script="your_script",
        model_key="your_key"
    ),
    input_df="model_input",
    output_df="model_output"
)

The model step must have strictly one input and one output. It is strongly recommended to use context: wizata_dsapi.Context as the sole parameter of your script.

Key Identifier

Models are identified primarily by a key called model_key, which is used to locate the model in the repository/registry.

A model can be unique to a pipeline or shared by multiple pipelines. If you want the model to be tied to your pipeline, you can omit the model_key parameter and the pipeline key will be used instead.
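For instance, a minimal sketch of a pipeline-bound model, identical to the earlier example but omitting model_key:

pipeline.add_model(
    config=wizata_dsapi.MLModelConfig(
        train_script="your_script"
        # model_key omitted: the pipeline key identifies the model
    ),
    input_df="model_input",
    output_df="model_output"
)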

By Twin

A model can be trained by a pipeline differently for each registered twin; in this case the hardware_id is appended to the model_key and Wizata stores a separate model for each twin. You must execute the pipeline in train mode for each twin to train all models, or upload them with the proper key identifier.

Note that the hardware_id of the registered twin must be specified in the Digital Twin. To tell the pipeline to keep and use a different model per twin, use the by_twin config key.

e.g. if a twin "asset_a" is used for this pipeline, the model used will be your_key.asset_a

pipeline.add_model(
    config=wizata_dsapi.MLModelConfig(
        train_script="your_script",
        model_key="your_key",
        by_twin=True
    ),
    input_df="model_input",
    output_df="model_output"
)

By Property

In addition, or independently, you can also append a custom property to the model_key. This allows you to maintain different models depending on the situation. To tell the pipeline to keep and use a different model per value of a specific property, use the by_property config key.

You also need to specify, through property_name, the name of the variable containing the value of your variation.

e.g. if a twin "asset_a" is used for this pipeline and the "production_type" property determines which kind of raw material is currently processed, you may want a different model per type of material. You can create a variable named 'production_type' and set its value, e.g. 'type_a', in a previous step. The model used will therefore be your_key.asset_a.type_a

pipeline.add_model(
    config=wizata_dsapi.MLModelConfig(
        train_script="your_script",
        model_key="your_key",
        by_twin=True,
        by_property=True,
        property_name="production_type"
    ),
    input_df="model_input",
    output_df="model_output"
)

📘

You need to ensure you have a proper training scenario for every combination of model key, twin, and property value.

Training Script

Training a model within Wizata helps you automate re-training. This can be done within the same pipeline or in a separate pipeline.

You can decide to train your model from the Wizata app or upload a model trained elsewhere to the repository. Models trained within Wizata need a training script and a pipeline executed in experiment mode or with the parameter train=True.

Make sure your script uses the context to retrieve context.dataframe and calls context.set_model to set the model.

import sklearn.linear_model
import wizata_dsapi

def train_bearings_fit(context: wizata_dsapi.Context):
    # retrieve the input data frame from the pipeline context
    df = context.dataframe

    x = df[['x1', 'x2']]
    y = df['y']

    # fit a linear regression on the selected features
    model = sklearn.linear_model.LinearRegression()
    model.fit(x, y)

    # register the trained model and its feature columns
    context.set_model(model, x.columns)

For more information, please check Script. You must pass the script function name in the train_script config key.

Features

By default, the model block will attempt to use all columns of your input data frame, but you can use the features config key to pass an explicit list of features.

Wizata will then select and order columns from your input data frame based on that list.

{
  "config": {
    // ...
    "features": ["dp1", "dp2"]
    // ...
  }
}

Target Features

You can use the target_feat config key to set one column name to be used as the target. This column will be dropped at prediction time.

More generally, if your model needs extra column(s) during training that must be removed during predict, use the target_feat key to define the name of the column(s) to be removed at predict time.

{
  "type": "model",
  "config": {
    // ...
    "target_feat": "y",
    // ...
  },
  "inputs": [
    { "dataframe" : "model_input" }
  ],
  "outputs": [
    { "dataframe" : "model_output" }
  ]
}

Split

By default, the pipeline will use your input data frame for both training and scoring.

The train_test_split_type config key can accept:

  • "ignore" (default) - no split

This feature is under development.
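For reference, a minimal sketch setting the only documented value:

{
  "config": {
    // ...
    "train_test_split_type": "ignore"
    // ...
  }
}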

Output

By default, the output of the predict/x function is appended to your input data frame. If only a single column is generated, it is named 'result' by default; multiple columns keep their default naming. You may specify a few optional parameters to alter this behaviour:

  • output_columns_names - str or list; a single name or a list of names used to rename the columns returned by the predict/x function. The number of names must correspond to the number of columns.
  • output_append - bool; true by default, appending results to the input data frame. If false, only the prediction result is returned.
  • output_prefix - str; a prefix added to all columns output by the predict/x data frame.

e.g. renaming a three-column output and not appending the results:

{
  "type": "model",
  "config": {
    // ...
    "output_columns_names": ["c1", "c2", "c3"],
    "output_append": False,
    // ...
  },
  "inputs": [
    { "dataframe" : "model_input" }
  ],
  "outputs": [
    { "dataframe" : "model_output" }
  ]
}

e.g. adding the prefix "result" to all columns:

{
  "type": "model",
  "config": {
    // ...
    "output_prefix": "result",
    // ...
  },
  "inputs": [
    { "dataframe" : "model_input" }
  ],
  "outputs": [
    { "dataframe" : "model_output" }
  ]
}
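Assuming these output keys are also accepted as wizata_dsapi.MLModelConfig keyword arguments (an assumption, not confirmed by this page), the Python equivalent might look like:

pipeline.add_model(
    config=wizata_dsapi.MLModelConfig(
        train_script="your_script",
        model_key="your_key",
        output_prefix="result"  # assumed keyword argument mirroring the config key
    ),
    input_df="model_input",
    output_df="model_output"
)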

Function

By default, the pipeline will attempt to use the predict function of your model. If you need to use another function instead, such as "transform" or any other function of your model, you can use the function config key.

{
  "type": "model",
  "config": {
    "train_script" : "uat_custom_function",
    "model_key": "uat_function_01",
    "function": "transform"
  },
  "inputs": [
    { "dataframe" : "model_input" }
  ],
  "outputs": [
    { "dataframe" : "model_output" }
  ]
}
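As an illustration, a hypothetical training script matching the config above could register a scikit-learn scaler, whose transform function is then called at execution time instead of predict:

import sklearn.preprocessing
import wizata_dsapi

def uat_custom_function(context: wizata_dsapi.Context):
    # hypothetical example: fit a scaler, which exposes transform() but no predict()
    df = context.dataframe

    scaler = sklearn.preprocessing.StandardScaler()
    scaler.fit(df)

    # register the fitted scaler and its input columns
    context.set_model(scaler, df.columns)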

Source

By default, the repository source is a cloud blob storage within Wizata where all pickled model files are stored. The default repository is identified by the optional source config key, with the value "wizata".
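A minimal sketch making the default explicit, assuming source sits alongside the other config keys:

{
  "config": {
    // ...
    "source": "wizata"
    // ...
  }
}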

If you want, you can integrate your MLflow environment to use versioning, tracking, and an external registry.