Script step
The Script step executes a custom script, which makes it suitable for a wide range of tasks.
Creating scripts allows you to build transformations, Plotly visuals, connectivity with third-party apps, ML model training, and more. Scripts let you share your solutions with your colleagues for use within Wizata. They should respect a few conventions in order to be usable and understandable by others.
The custom script must have the same number of Data Frame inputs and outputs as declared in the step. You can always add a properties dict parameter or a context parameter to access more information.
Uploading a custom script
To store a Script on Wizata, you can use either create/update or upsert. With the upsert function, the script is updated on the server even if another one with an identical name is found. Ensure that the function name you assign is unique.
Here is an example of the script:
def your_function(context: wizata_dsapi.Context):
    df = context.dataframe  # input dataframe, coming from the query step of the pipeline

    # put your logic here
    # df_result = ...

    return df_result

wizata_dsapi.api().upsert(your_function)
If you are unsure whether your function name is unique, you can use the get function to check. If the function doesn't exist, you should receive a 404 Not Found exception.
my_script = wizata_dsapi.api().get(
    script_name="your_function"
)
print(my_script)
You can alternatively use the create and update functions. That way you control that the operation executed is the one intended.
wizata_dsapi.api().create(
    wizata_dsapi.Script(
        function=<your_function>
    )
)

wizata_dsapi.api().update(
    wizata_dsapi.Script(
        function=<your_function>
    )
)
Parameters
Depending on your needs and the flexibility you desire for your script function, you can bind parameters to it. Your function must accept at least one parameter to be compatible with Wizata. The parameters are bound based on their type annotation first, and secondly by their variable name.
To get all possible data within your function, you should use a Context object. You can do so by using the type annotation wizata_dsapi.Context, or by naming your parameter context, as in the example below:
def your_function(context: wizata_dsapi.Context):
    # example of how to get useful context properties and input data
    df = context.dataframe
    props = context.properties
And its equivalent:
def your_function(context):
    # example of how to get useful context properties and input data
    df = context.dataframe
    props = context.properties
Within the context, you can also access all properties from Data Points used in queries through context.datapoints. This is a dictionary where the key is either the hardware ID or the template property name used in the request.
Properties
The context might contain useful custom variables and information to configure your function.
You can use the properties dictionary for that. It is accessible within the Context object on context.properties.
def your_function(context: wizata_dsapi.Context):
    props = context.properties
You should always prevent runtime errors by checking for NoneType or missing keys.
def your_function(context: wizata_dsapi.Context):
    # checking membership first avoids an unhandled KeyError on a missing key
    if 'mykey' not in context.properties or context.properties['mykey'] is None:
        raise KeyError('cannot find mykey within properties in your_function')
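A defensive accessor can centralize this check. The sketch below uses a plain dict to stand in for context.properties; the key name and helper are illustrative, not part of the Wizata API.

```python
def read_percentage(properties: dict) -> float:
    # .get() returns None instead of raising when the key is absent,
    # so both "missing" and "explicitly None" are handled in one place
    value = properties.get("percentage")
    if value is None:
        raise KeyError("cannot find 'percentage' within properties")
    return float(value)

result = read_percentage({"percentage": "0.8"})  # → 0.8
```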
You can also use the type annotation dict, or the name props or properties, for properties.
def sample1(properties):
    return

def sample2(params: dict):
    return

def sample3(props):
    return
The order of parameters has no importance: binding is resolved first by type and then by name.
When executed within a pipeline, properties contain additional information from your registration, or variables passed as parameters of your execution.
Dataframes
Data frames are by far the most useful data inside a function. They allow you to create data transformation scripts, or to use the output data of a query or of another transformation.
You can extract input data from context.dataframe and set output data on context.result_dataframe.
def transformation(context: wizata_dsapi.Context):
    df = context.dataframe
    # YOUR LOGIC
    context.result_dataframe = df
Alternatively, you can simplify the declaration by using the type annotation pandas.DataFrame, or by using the parameter name df or dataframe.
def transformation_sample_1(context: wizata_dsapi.Context, var1: pandas.DataFrame):
    # YOUR LOGIC
    context.result_dataframe = var1

def transformation_sample_2(context: wizata_dsapi.Context, df):
    # YOUR LOGIC
    context.result_dataframe = df

def transformation_sample_3(context: wizata_dsapi.Context, dataframe):
    # YOUR LOGIC
    context.result_dataframe = dataframe
You can also use a return statement with a pandas.DataFrame or a wizata_dsapi.DSDataframe.

def transformation(df):
    # YOUR LOGIC
    return df
Consider always using type annotations to avoid wrong parameter binding.

def transformation(df: pandas.DataFrame) -> pandas.DataFrame:
    # YOUR LOGIC
    return df
You can also add multiple inputs or outputs to your script and combine them.

def transformation(df1: pandas.DataFrame, df2: pandas.DataFrame):
    # YOUR LOGIC
    return df3, df4
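For illustration, a two-output function such as the split_df used in the JSON example below could be sketched as follows. The half-and-half split logic here is an assumption for the example, not Wizata's actual split_df implementation.

```python
import pandas

def split_df(df: pandas.DataFrame):
    # Split the input data frame into two halves by row position,
    # returning two outputs to match a two-output script step.
    midpoint = len(df) // 2
    first = df.iloc[:midpoint]
    second = df.iloc[midpoint:]
    return first, second

df1, df2 = split_df(pandas.DataFrame({"a": [1, 2, 3, 4]}))  # two rows each
```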
JSON Format of the Script step
Here’s an example of a script step config that uses a split function:
{
    "type": "script",
    "config": {
        "function" : "split_df"
    },
    "inputs": [
        { "dataframe" : "query_output" }
    ],
    "outputs": [
        { "dataframe" : "first_output" },
        { "dataframe" : "second_output" }
    ]
}
The Script step must contain the following properties:
- “config” with a function referring to your script name
- “inputs” with a list of inputs (dataframe)
- “outputs” with a list of outputs (dataframe)
A script must have at least one output or one input.
You may want to add a features_mapping and/or a properties_mapping; both are dictionaries.
Features Mapping
Features mapping will attempt to rename columns of the input data frame(s) from the value to the key; the key being the name your script expects, and the value the name found in the data frame.
{
    "type": "script",
    "config": {
        "function" : "split_df",
        "features_mapping" : {
            "feature_column" : "query_column"
        }
    },
    "inputs": [
        { "dataframe" : "query_output" }
    ],
    "outputs": [
        { "dataframe" : "first_output" },
        { "dataframe" : "second_output" }
    ]
}
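Conceptually, the mapping above behaves like a pandas column rename applied to the input data frame before your script runs. This is a sketch of the idea, not the actual engine code:

```python
import pandas

# features_mapping from the step config: expected name -> name in the data frame
features_mapping = {"feature_column": "query_column"}

df = pandas.DataFrame({"query_column": [1, 2, 3]})

# rename from the value (name in the data frame) to the key (name the script expects)
renamed = df.rename(columns={v: k for k, v in features_mapping.items()})
print(list(renamed.columns))  # → ['feature_column']
```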
Properties Mapping
Properties mapping will attempt to rename properties of the context from the value to the key; the key being the name your script expects, and the value the name found in the context.
{
    "type": "script",
    "config": {
        "function" : "split_df",
        "properties_mapping" : {
            "percentage" : "pct",
            "size" : "amount"
        }
    },
    "inputs": [
        { "dataframe" : "query_output" }
    ],
    "outputs": [
        { "dataframe" : "first_output" },
        { "dataframe" : "second_output" }
    ]
}
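The equivalent operation on the properties dictionary can be sketched the same way; the property names and values below are illustrative only:

```python
# properties_mapping from the step config: expected name -> name in the context
properties_mapping = {"percentage": "pct", "size": "amount"}

# stand-in for context.properties as received from the pipeline
context_properties = {"pct": 0.8, "amount": 100}

# rename from the value (name in the context) to the key (name the script expects)
mapped = {k: context_properties[v] for k, v in properties_mapping.items()}
print(mapped)  # → {'percentage': 0.8, 'size': 100}
```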
Both operations are reversed after your script executes, leaving the data frame and properties as you received them (except for modifications your script has applied to them).
If you mention a feature or a property in the mapping, it becomes mandatory.
Data Points
Within a script step, you can access Data Point properties and other useful information through the context object passed as a parameter. To do so, use context.datapoints[name_of_your_datapoint] to receive the wizata_dsapi.DataPoint object.