MLflow is a solution to many of these issues in this dynamic landscape, offering tools and simplifying processes to streamline the ML lifecycle and foster collaboration among ML practitioners.

MLflow can be used to version, track and register ML models used within Wizata pipelines as an alternative to the built-in model registry. MLflow offers additionnal features, such as e.g. model tracking, versioning, and artifacts management.

Configure

To get started, you will need to configure MLflow with Wizata through the custom.ini file.

[mlflow]
api=https://<your_url>/
username=<username>
password=<password>
azure_storage_access_key=<access_key>

๐Ÿ“˜

MLflow must be configured using a remote tracking server

For more information, you can access to the common setups

Currently, Wizata only supports Azure Blob Storage with access key as a remote tracking server.

Once configured properly, you can start using MLflow as a source for Model step within your pipelines.

Model configuration

To use MLflow as the source repository for the ML models within your pipeline you need to adapt the configuration of the Model step.

Source

Add the property โ€œsourceโ€ with value โ€œmlflowโ€ to the config in order to specify that you want to use MLflow within your model step config.

You can do it within your pipeline JSON definition:

{
  "type" : "model",
  "config":
  {
    "train_script": "your_script",
    "model_key": "your_model_key",
    "source": "mlflow"
  },
  "inputs": [
    { "dataframe" : "model_input" }
  ],
  "outputs": [
    { "dataframe" : "model_output" }
  ]
}

Or you can do it, from your Python definition:

pipeline.add_model(
    config=wizata_dsapi.MLModelConfig(
        train_script="your_script",
        model_key="your_key",
        source="mlflow"
    ),
    input_df="model_input",
    output_df="model_output"
)

Equivalent to native model, your model is identified within MLflow based on its model key combined by twin or/and by custom property. For more details, please see Model step.

You can continue using your native model training script with the method context.set_model( โ€ฆ ) to train your model. Note that your whole scripts will run within mlflow.startrun() therefore you can within your script use mlflow additional logging capabilities.

def train_script_sample(context: wizata_dsapi.Context):
    df = context.dataframe

    x = df[['x1', 'x2']]
    y = df['y']
    model = sklearn.linear_model.LinearRegression()
    model.fit(x, y)

    mlflow.log_param('coef_x1', model.coef_[0])
    mlflow.log_param('coef_x2', model.coef_[1])
    mlflow.log_param('intercept', model.intercept_)

    context.set_model(model, x.columns)

Type/Flavor

By default, Wizata DS API and Runners will use mlflow.pyfunc type and flavor to log and load your model. To use another flavor/type please add model_type key/value in model config.

{
  "type" : "model",
  "config":
  {
    "train_script": "your_script",
    "model_key": "your_model_key",
    "model_type": "sklearn"
    "source": "mlflow"
  },
  "inputs": [
    { "dataframe" : "model_input" }
  ],
  "outputs": [
    { "dataframe" : "model_output" }
  ]
}

๐Ÿ“˜

Currently, support flavors are: pyfunc (default), sklearn, xgboost, lightgbm, catboost, statsmodels and prophet .

Anytype of models can generally be supported by using default (pyfunc).

Version

During training phase, Wizata creates a new version of the model. During predict/inference the system download by default the latest updated version from MLflow based on the key. You can add an alias to the version you preferred and set the alias with the model_alias key/value in model config to force the usage of this version.

{
  "type" : "model",
  "config":
  {
    "train_script": "your_script",
    "model_key": "your_model_key",
    "model_alias": "your_alias>"
    "source": "mlflow"
  },
  "inputs": [
    { "dataframe" : "model_input" }
  ],
  "outputs": [
    { "dataframe" : "model_output" }
  ]
}

๐Ÿ“˜

The model doesnโ€™t have to be necessary trained within Wizata platform as you can simply point out it with the model_key, by_twin and by_property key/value.