Model step
A Model step is used to train a machine learning model, run inference with an existing one, or both, depending on how you execute your pipeline. By default, models are trained only during experiment mode. You can use the train parameter within execution_options or directly as a parameter of run() or experiment() to control whether a pipeline execution should retrain the model or only use it for prediction.
The model step always stores and retrieves trained models from the Wizata model repository. For a complete overview of how models are stored, versioned, and identified within the platform, refer to Understanding Model Storage and Metadata in Wizata.
Model Types
The Model step supports two types of model trainers:
- Custom training scripts: Python functions you write and register on the platform, giving you full control over the algorithm, preprocessing logic, and framework used. Refer to the Script step article for details on how to create and register scripts.
- Built-in model trainers: predefined algorithms provided by the Wizata library, available out of the box for common industrial use cases such as anomaly detection, regression, and classification. For more details, refer to the Wizata library for common functions.
A Model step must strictly have one input and one output dataframe.
Adding a Model Step
Using the Pipeline UI, navigate to Library > Model Trainers in the left-hand panel. You can filter by Custom, Uploaded, or Built-in trainers. Drag the desired model trainer onto the canvas. Once the block is on the canvas, open the configuration panel on the right to set the training script, model key, features, and any other parameters.
Using the Python Toolkit, add the model as a step in the pipeline using pipeline.add_model() with an MLModelConfig object:
pipeline.add_model(
config=wizata_dsapi.MLModelConfig(
train_script="your_script",
model_key="your_key",
model_format="pkl"
),
input_df="model_input",
output_df="model_output"
)For the full list of available parameters, refer to the MLModelConfig reference.
Key Identifier
Models are identified primarily by a model_key , which is used to locate the model in the repository. A model can be shared across multiple pipelines or kept specific to one. If you omit the model_key, the pipeline key will be used as the identifier. For a complete explanation of how models are named and versioned using the <key>.<twin>.<property>@<alias> pattern, refer to Understanding Model Storage and Metadata in Wizata.
By twin
When using a pipeline across multiple assets, you can tell Wizata to store a separate model for each twin by setting by_twin=True. In this case, the twin's hardware_id is appended to the model_key. You must execute the pipeline in training mode for each twin separately to produce all individual models.
For example, if the twin
asset_ais used, the stored model will be identified asyour_key.asset_a.
pipeline.add_model(
config=wizata_dsapi.MLModelConfig(
train_script="your_script",
model_key="your_key",
by_twin=True
),
input_df="model_input",
output_df="model_output"
)By Property
In addition to or independently of by_twin, you can append a custom property value to the model_key using by_property=True. This allows you to maintain different models depending on a specific condition, such as the type of raw material being processed or the current operating mode.
For example, if the twin
asset_ais used and the propertyproduction_typehas valuetype_a, the stored model will beyour_key.asset_a.type_a.
pipeline.add_model(
config=wizata_dsapi.MLModelConfig(
train_script="your_script",
model_key="your_key",
by_twin=True,
by_property=True,
property_name="production_type"
),
input_df="model_input",
output_df="model_output"
)Make sure you have a proper training scenario for every combination of twin and property value you intend to use in production.
Training Script
A training script is a Python function that receives a dataframe from the context, fits a model, and stores it using context.set_model(). It follows the same conventions as any other Script step.
def train_my_model(context: wizata_dsapi.Context):
df = context.dataframe
x = df[['x1', 'x2']]
y = df['y']
model = sklearn.linear_model.LinearRegression()
model.fit(x, y)
context.set_model(model)The set_model method accepts the model as its first argument. Any additional keyword arguments will be stored as extra files in the model repository. Files will be stored as JSON if the object is serializable, otherwise as pickle.
context.set_model(model, features=x.columns)For models trained outside the platform, you can also upload them manually. Refer to Uploading a Manually Trained Model for the full process. If your use case requires automation, retraining across multiple assets, or scheduled executions, using a training pipeline is strongly recommended over manual uploads. See Automating Model Training with Pipelines for more details.
Features
By default, the model step uses all columns of the input dataframe as features. You can restrict and order the features in two ways.
Directly in the configuration:
{
"config": {
"features": ["dp1", "dp2"]
}
}From a file stored in the model repository:
{
"config": {
"features_from_file": "features.json"
}
}If both features and features_from_file are provided, features takes precedence.
Target Features
Use target_feat to specify a column that should be used as the prediction target during training and dropped during inference.
{
"type": "model",
"config": {
"target_feat": "y"
},
"inputs": [{ "dataframe": "model_input" }],
"outputs": [{ "dataframe": "model_output" }]
}Output
By default, the output of the predict function is appended to the input dataframe. The generated column is named result by default when there is a single output column. You can customize this behavior with the following parameters:
output_columns_names: a string or list of strings to rename the output columns. The number of names must match the number of output columns.output_append: when set toFalse, only the prediction result is returned instead of appending it to the input dataframe.output_prefix: adds a prefix to all output columns.
Function
By default, the pipeline calls the predict function on the trained model. If you need to use a different function such as transform, you can specify it with the function config key.
{
"type": "model",
"config": {
"train_script": "your_script",
"model_key": "your_key",
"function": "transform"
},
"inputs": [{ "dataframe": "model_input" }],
"outputs": [{ "dataframe": "model_output" }]
}Model Format
By default, models are stored as pickle files (pkl). PyTorch scripted models (pt) are also supported on Python version 3.11 and above.
Properties Mapping
Properties mapping renames properties in the context before passing them to your model script. The key is the name your model expects and the value is the actual property name found in the context. Both mappings are reversed after execution.
{
"type": "model",
"config": {
"properties_mapping": {
"percentage": "pct",
"size": "amount"
}
},
"inputs": [{ "dataframe": "query_output" }],
"outputs": [{ "dataframe": "first_output" }]
}If you mention a property in the mapping, it becomes mandatory.
Source and MLflow Integration
By default, models are stored in Wizata's internal cloud blob storage, identified by the source key wizata. If you have configured an external MLflow environment, you can use it for versioning, tracking, and model registry. For more information, refer to the ML Flow integration article.
Updated about 1 month ago