ML Flow

MLflow is a solution to many of these issues in this dynamic landscape, offering tools and simplifying processes to streamline the ML lifecycle and foster collaboration among ML practitioners.

MLflow can be used to version, track and register ML models used within Wizata pipelines as an alternative to the built-in model registry. MLflow offers additional features, such as e.g. model tracking, versioning, and artifacts management.

Configure

To get started, you will need to configure MLflow with Wizata through the custom.ini file.

[mlflow]
api=https://<your_url>/
username=<username>
password=<password>
azure_storage_access_key=<access_key>

You can upload your updated custom_ini file under Pipeline Settings

📘
MLflow must be configured using a remote tracking server
For more information, you can access to the common setups

Currently, Wizata only supports Azure Blob Storage with access key as a remote tracking server.

Once configured properly, you can start using MLflow as a source for Model step within your pipelines.

Model configuration

To use MLflow as the source repository for the ML models within your pipeline you need to adapt the configuration of the Model step.

Source

Add the property “source” with value mlflow to the config in order to specify that you want to use MLflow within your model step config.

You can do it within your pipeline JSON definition:

{
  "type" : "model",
  "config":
  {
    "train_script": "your_script",
    "model_key": "your_model_key",
    "source": "mlflow"
  },
  "inputs": [
    { "dataframe" : "model_input" }
  ],
  "outputs": [
    { "dataframe" : "model_output" }
  ]
}

Or you can do it, from your Python definition:

pipeline.add_model(
    config=wizata_dsapi.MLModelConfig(
        train_script="your_script",
        model_key="your_key",
        source="mlflow"
    ),
    input_df="model_input",
    output_df="model_output"
)

Equivalent to native model, your model is identified within MLflow based on its model key combined by twin or/and by custom property. For more details, please see Model step.

You can continue using your native model training script with the method context.set_model( … ) to train your model. Note that your whole scripts will run within mlflow.startrun() therefore you can within your script use MLflow additional logging capabilities.

def train_script_sample(context: wizata_dsapi.Context):
    df = context.dataframe

    x = df[['x1', 'x2']]
    y = df['y']
    model = sklearn.linear_model.LinearRegression()
    model.fit(x, y)

    mlflow.log_param('coef_x1', model.coef_[0])
    mlflow.log_param('coef_x2', model.coef_[1])
    mlflow.log_param('intercept', model.intercept_)

    context.set_model(model, x.columns)

Type/Flavor

By default, Wizata DS API and Runners will use mlflow.pyfunc type and flavor to log and load your model. To use another flavor/type please add model_type key/value in model config.

{
  "type" : "model",
  "config":
  {
    "train_script": "your_script",
    "model_key": "your_model_key",
    "model_type": "sklearn"
    "source": "mlflow"
  },
  "inputs": [
    { "dataframe" : "model_input" }
  ],
  "outputs": [
    { "dataframe" : "model_output" }
  ]
}

📘
Currently, support flavors are: pyfunc (default), sklearn, xgboost, lightgbm, catboost, statsmodels and prophet .
Any type of models can generally be supported by using default (pyfunc).

Version

During training phase, Wizata creates a new version of the model. During predict/inference the system download by default the latest updated version from MLflow based on the key. You can add an alias to the version you preferred and set the alias with the model_alias key/value in model config to force the usage of this version.

{
  "type" : "model",
  "config":
  {
    "train_script": "your_script",
    "model_key": "your_model_key",
    "model_alias": "your_alias>"
    "source": "mlflow"
  },
  "inputs": [
    { "dataframe" : "model_input" }
  ],
  "outputs": [
    { "dataframe" : "model_output" }
  ]
}

The model doesn’t have to be necessary trained within Wizata platform as you can simply point out it with the model_key, by_twin and by_property key/value.

ML Flow

Configure

📘
MLflow must be configured using a remote tracking server

Model configuration

Source

Type/Flavor

📘
Currently, support flavors are: `pyfunc` (default), `sklearn`, `xgboost`, `lightgbm`, `catboost`, `statsmodels` and `prophet` .

Version

Configure

📘MLflow must be configured using a remote tracking server

Model configuration

Source

Type/Flavor

📘Currently, support flavors are: pyfunc (default), sklearn, xgboost, lightgbm, catboost, statsmodels and prophet .

Version

📘
MLflow must be configured using a remote tracking server

📘
Currently, support flavors are: `pyfunc` (default), `sklearn`, `xgboost`, `lightgbm`, `catboost`, `statsmodels` and `prophet` .