Script step

The Script step can be used for various tasks, as it executes a custom script.

Creating scripts allows you to build transformations, Plotly visuals, connectivity with third-party apps, ML model training, … They let you share your solutions with your colleagues to be used within Wizata. They should respect some conventions in order to be usable and understandable by others.

The custom script must have the same number of data frame inputs and outputs as declared in the step. You can always add a properties dict or a context as a parameter to access more information.

Uploading a custom script

To store a script on Wizata, you can either use create/update or upsert. With the upsert function, the script is updated on the server even if another one with an identical name is found. Ensure that the function name you assign is unique.

Here is an example of the script:

import wizata_dsapi

def your_function(context: wizata_dsapi.Context):
    df = context.dataframe  # input data frame, coming from the query step of the pipeline

    # put your logic here
    # df_result = ...

    return df_result

wizata_dsapi.api().upsert(your_function)

If you are unsure whether your function name is unique, you can use the get function to check. If the function doesn't exist, you should receive a 404 Not Found exception.

my_script = wizata_dsapi.api().get(
    script_name="your_function"
)
print(my_script)

You can alternatively use the create and update functions. That way you control that the operation executed is the one intended.

wizata_dsapi.api().create(
     wizata_dsapi.Script(
         function=<your_function>
     )
)

wizata_dsapi.api().update(
     wizata_dsapi.Script(
         function=<your_function>
     )
)

Parameters

Depending on your needs and the flexibility you desire for your script function, you can bind parameters to it. Your function must accept at least one parameter to be compatible with Wizata. The parameters are bound based on their type annotation first, and secondly by their variable name.
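The binding rule described above (type annotation first, then variable name) can be sketched in plain Python. This is a simplified illustration, not the actual Wizata implementation; bind_parameters and my_script are hypothetical names.

```python
import inspect
import pandas

def bind_parameters(func, context, dataframe, properties):
    """Toy resolver: bind by type annotation first, then by parameter name."""
    bound = {}
    for name, param in inspect.signature(func).parameters.items():
        if param.annotation is pandas.DataFrame:
            bound[name] = dataframe          # matched by type annotation
        elif param.annotation is dict:
            bound[name] = properties         # matched by type annotation
        elif name in ("df", "dataframe"):
            bound[name] = dataframe          # fallback: matched by name
        elif name in ("props", "properties"):
            bound[name] = properties         # fallback: matched by name
        else:
            bound[name] = context            # everything else gets the context
    return bound

def my_script(data: pandas.DataFrame, props):
    return len(data), props["pct"]

args = bind_parameters(my_script, None,
                       pandas.DataFrame({"a": [1, 2, 3]}), {"pct": 0.5})
print(my_script(**args))  # (3, 0.5)
```

Here `data` is bound by its pandas.DataFrame annotation even though its name matches nothing, while `props` is bound by name.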

To get all possible data within your function, you should use a Context object. You can do so by using the type annotation wizata_dsapi.Context or by naming your parameter context, as in the example below:

def your_function(context: wizata_dsapi.Context):

    # example how to get useful context properties and input data
    df = context.dataframe
    props = context.properties   

And its equivalent:

def your_function(context):

    # example how to get useful context properties and input data
    df = context.dataframe
    props = context.properties

Within the context, you can also access all properties from Data Points used in queries through context.datapoints. This is a dictionary where the key is either the hardware ID or the template property name used in the request.

Properties

The context might contain useful custom variables and information to configure your function. For that, you can use the properties dictionary, accessible within the Context object on context.properties.

def your_function(context: wizata_dsapi.Context):

    props = context.properties

You should always prevent runtime errors by checking for NoneType or missing keys.

def your_function(context: wizata_dsapi.Context):

    if context.properties.get('mykey') is None:
        raise KeyError('cannot find mykey within properties in your_function')
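A defensive variant of this pattern can be written with plain Python and dict.get, so a missing key raises a descriptive error instead of an anonymous one (read_percentage and the "percentage" key are hypothetical):

```python
def read_percentage(properties: dict) -> float:
    # .get returns None instead of raising when the key is absent,
    # letting us raise a message that names the missing key.
    value = properties.get("percentage")
    if value is None:
        raise KeyError("cannot find 'percentage' within properties")
    return float(value)

print(read_percentage({"percentage": "0.8"}))  # 0.8
```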

You can also use type annotation dict or the name props or properties for properties.

def sample1(properties):
    return

def sample2(params: dict):
    return

def sample3(props):
    return

The order of parameters has no importance; binding considers first the type and secondly the name.

While executed within a pipeline, properties contain additional information from your registration or variables passed as parameters of your execution.

Dataframes

Data frames are by far the most useful data inside a function. They allow you to create data transformation scripts or to use the output data of a query or of another transformation.

You can extract input data from context.dataframe and set output data on context.result_dataframe.

def transformation(context: wizata_dsapi.Context):

   df = context.dataframe

   # YOUR LOGIC

   context.result_dataframe = df

Alternatively, you can simplify the declaration by using the type annotation pandas.DataFrame or the parameter name df or dataframe, and return the resulting pandas.DataFrame or wizata_dsapi.DSDataframe with a return statement:

def transformation_sample_1(var1: pandas.DataFrame):
    # YOUR LOGIC
    return var1

def transformation_sample_2(df):
    # YOUR LOGIC
    return df

def transformation_sample_3(dataframe):
    # YOUR LOGIC
    return dataframe

Consider always using type annotations to avoid wrong parameter binding.

def transformation(df: pandas.DataFrame) -> pandas.DataFrame:
    # YOUR LOGIC
    return df

You can also add multiple inputs or outputs to your script and combine them.

def transformation(df1: pandas.DataFrame, df2: pandas.DataFrame):
    # YOUR LOGIC
    return df3, df4
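For illustration, a hypothetical script with one input and two outputs could look like this (the half/half split is an arbitrary choice, and split_df is a made-up name that only mirrors the JSON example below):

```python
import pandas

def split_df(df: pandas.DataFrame):
    # Split one input data frame into two output data frames:
    # the first half of the rows and the second half.
    middle = len(df) // 2
    return df.iloc[:middle], df.iloc[middle:]

first, second = split_df(pandas.DataFrame({"value": [1, 2, 3, 4]}))
print(len(first), len(second))  # 2 2
```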

JSON Format of the Script step

Here’s an example of a script step config that uses a split function:

{
  "type": "script",
  "config": {
    "function" : "split_df"
  },
  "inputs": [
    { "dataframe" : "query_output" }
  ],
  "outputs": [
    { "dataframe" : "first_output" },
    { "dataframe" : "second_output" }
  ]
}

The Script step must contain the following properties:

  • "config" with a function referring to your script name
  • "inputs" with a list of inputs (dataframe)
  • "outputs" with a list of outputs (dataframe)

A script must have at least one output or one input.

You may want to add a features_mapping and/or a properties_mapping, both are dictionaries.

Features Mapping

Features mapping attempts to rename columns of the input data frame(s) from value to key; the key is the name your script expects and the value is the name found in the data frame.

{
  "type": "script",
  "config": {
    "function" : "split_df",
    "features_mapping" : {
        "feature_column" : "query_column"
    }
  },
  "inputs": [
    { "dataframe" : "query_output" }
  ],
  "outputs": [
    { "dataframe" : "first_output" },
    { "dataframe" : "second_output" }
  ]
}
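The renaming above can be sketched with plain pandas (assumed behavior, not the engine's actual code): the column named by the value is renamed to the key before the script runs, and renamed back afterwards.

```python
import pandas

df = pandas.DataFrame({"query_column": [1.0, 2.0, 3.0]})
features_mapping = {"feature_column": "query_column"}

# Before the script runs: rename value -> key.
renamed = df.rename(columns={v: k for k, v in features_mapping.items()})
print(list(renamed.columns))  # ['feature_column']

# After the script runs: the rename is reversed (key -> value).
restored = renamed.rename(columns=features_mapping)
print(list(restored.columns))  # ['query_column']
```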

Properties Mapping

Properties mapping attempts to rename properties of the context from value to key; the key is the name your script expects and the value is the name found in the context.

{
  "type": "script",
  "config": {
    "function" : "split_df",
    "properties_mapping" : {
        "percentage" : "pct",
        "size" : "amount"
    }
  },
  "inputs": [
    { "dataframe" : "query_output" }
  ],
  "outputs": [
    { "dataframe" : "first_output" },
    { "dataframe" : "second_output" }
  ]
}
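The key renaming can be sketched with a plain dictionary (assumed behavior; the property values here are made up):

```python
properties = {"pct": 0.8, "amount": 100}
properties_mapping = {"percentage": "pct", "size": "amount"}

# Rename value -> key: the script sees "percentage" and "size"
# instead of "pct" and "amount".
mapped = {k: properties[v] for k, v in properties_mapping.items()}
print(mapped)  # {'percentage': 0.8, 'size': 100}
```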

Both operations are reversed after your script executes, leaving the data frame and properties as you received them (except for modifications your script applied to them).

🚧

If you mention a feature or a property in the mapping, it becomes mandatory.

Data Points

Within a script step you can access Data Point properties and other useful information through the context object passed as parameter.

In order to do so, you can use context.datapoints[name_of_your_datapoint] to receive the wizata_dsapi.DataPoint object.
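Outside Wizata this lookup cannot be run directly, but its shape can be simulated with a plain dictionary. The DataPoint class below is a stand-in; the real wizata_dsapi.DataPoint object exposes its own set of properties.

```python
class DataPoint:
    """Stand-in for wizata_dsapi.DataPoint, holding a couple of fields."""
    def __init__(self, hardware_id, unit):
        self.hardware_id = hardware_id
        self.unit = unit

# context.datapoints is a dictionary keyed by hardware ID or by the
# template property name used in the request.
datapoints = {
    "motor_temp": DataPoint("motor_temp", "degC"),
}

dp = datapoints["motor_temp"]
print(dp.unit)  # degC
```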