A Query formalizes the process of retrieving data from the time-series database using constraints such as digital twin, time, aggregation, and/or filtering. You can fetch data using a Request object, which returns a fully formatted Pandas dataframe.

You will notice that the concept of a query is also discussed in the Pipeline section, as a query can also function as a pipeline Step.

Structure

A request uses a logical JSON-based format wrapped in a wizata_dsapi.Request entity. It contains the following properties:

  • Datapoints selection
  • Time selection information
  • Aggregations information
  • Filters

Only the datapoints, time selection, and aggregation information are required; filters are optional.

Using the following logic, we can visualize datapoint information in the platform. In our case, working with the bearing datapoints from the motor sensor, we retrieve data from all the bearings attached to motor 1, adding a filter that keeps only mt1_bearing1 values higher than 0.060:

from datetime import datetime
import wizata_dsapi

df = wizata_dsapi.api().query(
    datapoints=['mt1_bearing1', 'mt1_bearing2', 'mt1_bearing3', 'mt1_bearing4'],
    start=datetime.strptime("2024-08-29", "%Y-%m-%d"),
    end=datetime.strptime("2024-08-30", "%Y-%m-%d"),
    agg_method="mean",
    interval=600000,
    filters={
        "mt1_bearing1": {
            ">": 0.060
        }
    }
)

You can also use a JSON file to define your query and parse it. In our case, we will store the following JSON with the name bearing_query.json:

{
  "datapoints": [
    "mt1_bearing1",
    "mt1_bearing2",
    "mt1_bearing3",
    "mt1_bearing4"
  ],
  "timeframe": {
    "start": 1724889600000,
    "end": 1724976000000
  },
  "aggregations": {
    "agg_method": "mean",
    "interval": 600000
  },
  "filters": {
    "mt1_bearing1": {
        ">": 0.060
    }
  }
}
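The epoch values in the timeframe above are UTC timestamps in milliseconds. As a quick local sketch (plain Python, not part of the SDK), here is how they can be derived from the dates used in the earlier example:

```python
from datetime import datetime, timezone

def to_epoch_ms(date_string: str) -> int:
    """Convert a YYYY-MM-DD date (interpreted as UTC midnight) to epoch milliseconds."""
    dt = datetime.strptime(date_string, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

print(to_epoch_ms("2024-08-29"))  # 1724889600000 (the "start" above)
print(to_epoch_ms("2024-08-30"))  # 1724976000000 (the "end" above)
```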

You will use the following code to parse and execute your query:

import json
import wizata_dsapi

query = wizata_dsapi.Request()
with open("bearing_query.json") as file:
    query.from_json(json.load(file))
df = wizata_dsapi.api().get(query)

Query parameters

Datapoints

You can define the datapoints you want to select directly from their hardware ID or using a template.

To use datapoints, simply provide a list of str corresponding to hardware IDs in the platform.

df = wizata_dsapi.api().query(
        ...
        datapoints=['mt1_bearing1', 'mt1_bearing2', 'mt1_bearing3', 'mt1_bearing4'],
        ...
    )

You can specify the key identifier of a solution template to fetch datapoints.

If you are querying data directly you should also pass the hardware ID of an asset registered on the template. Inside a pipeline, the hardware ID of the asset is passed at runtime.

df = wizata_dsapi.api().query(
        ...
        template='example_template',
        twin='example_factory',
        ...
    )

By default, the query returns all properties of type ‘datapoint’ from your template. You can use datapoints to pass a list of template properties to select.

df = wizata_dsapi.api().query(
        ...
        template='example_template',
        twin='bearing_sensor1',
        datapoints=['mt1_bearing1'],
        ...
    )

In addition, you can use a dynamic list of datapoints stored as a JSON property of your template:

  • Add a template property of type JSON
  • Add a registration and, as its value, use a JSON list of hardware IDs
  • In the query using the template, simply use the name of the template property in the datapoints list
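As an illustrative sketch (the property name bearing_list is hypothetical), the registration value holds a JSON array of hardware IDs that the platform resolves at query time:

```python
import json

# Hypothetical value stored on the registration for a template
# property of type JSON (the property name "bearing_list" is illustrative):
registration_value = '["mt1_bearing1", "mt1_bearing2", "mt1_bearing3", "mt1_bearing4"]'

# The platform resolves the property to this list of hardware IDs:
hardware_ids = json.loads(registration_value)
print(hardware_ids)

# The query then references the property by name, e.g.:
# df = wizata_dsapi.api().query(
#     template='example_template',
#     twin='example_factory',
#     datapoints=['bearing_list'],
# )
```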

Timeframe

There are different ways to select a timeframe within a request.

You can use static “datetime” values: an epoch integer in milliseconds in JSON, or a datetime in Python:

df = wizata_dsapi.api().query(
    ...
    start=datetime.strptime("2023-01-01", "%Y-%m-%d"),
    end=datetime.strptime("2023-07-17", "%Y-%m-%d"),
    ...
)
"timeframe": {
  "start": 1682035200000,
  "end": 1682294400000
},

Alternatively, you can use the “relative” format to specify the start and end values. A relative value is a str based on “now”, optionally offset by a delay.

The format is now, then + or -, then an integer value, then a unit (y, M, d, h, m, s, ms).

e.g. “now-1d”, “now+1h”, “now”, “now-2M”, …

"timeframe": {
  "start": "now-1d",
  "end": "now"
},
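To make the semantics concrete, here is a local sketch of a parser for this format (an illustration only, not the platform's implementation), using the fixed 365-day year and 30-day month:

```python
import re
from datetime import datetime, timedelta

# Milliseconds per unit; year and month are fixed periods (365 / 30 days).
UNITS_MS = {
    "y": 365 * 86400000, "M": 30 * 86400000, "d": 86400000,
    "h": 3600000, "m": 60000, "s": 1000, "ms": 1,
}

def resolve_relative(expr: str, now: datetime) -> datetime:
    """Resolve a 'now', 'now-1d', 'now+1h' style expression to a datetime."""
    match = re.fullmatch(r"now(?:([+-])(\d+)(ms|[yMdhms]))?", expr)
    if not match:
        raise ValueError(f"invalid relative expression: {expr}")
    sign, value, unit = match.groups()
    if sign is None:
        return now
    offset = timedelta(milliseconds=int(value) * UNITS_MS[unit])
    return now + offset if sign == "+" else now - offset

now = datetime(2024, 8, 30)
print(resolve_relative("now-1d", now))   # 2024-08-29 00:00:00
print(resolve_relative("now-2M", now))   # 60 days earlier
```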

📘

A year corresponds to a fixed period of 365 days, and a month to a fixed period of 30 days.

Aggregation

When fetched from the platform, all data are aggregated according to an aggregation method and an interval.

The interval is a fixed number representing milliseconds.

🚧

Please adapt your interval to the period fetched to avoid performance issues in your environment.

The agg_method is a str representing the method used to aggregate the data. By default, the query uses ‘mean’ if not specified otherwise.

The query accepts the following methods:

  • mean
  • stddev
  • mode
  • median
  • count
  • sum
  • first
  • last
  • max
  • min

e.g. using ‘count’ with a 1-minute interval:

df = wizata_dsapi.api().query(
    ...
    agg_method="count",
    interval=60000,
    ...
)
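Conceptually, the aggregation is comparable to a pandas resample over the raw readings; a local illustration using plain pandas (no platform call):

```python
import pandas as pd

# One reading per minute for ten minutes, for a single datapoint.
index = pd.date_range("2024-08-29 00:00", periods=10, freq="min")
raw = pd.DataFrame({"mt1_bearing1": range(10)}, index=index)

# A 600000 ms (10-minute) interval with agg_method="count" behaves like:
counted = raw.resample("10min").count()
print(counted)  # a single 10-minute bucket holding all 10 readings
```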

Filters

Filters can be used to restrict the returned rows based on a datapoint's value.

Filters are cumulative and support simple comparisons such as:

  • ‘<’ less than
  • ‘<=’ less than or equal to
  • ‘=’ equal to
  • ‘!=’ not equal to
  • ‘>’ greater than
  • ‘>=’ greater than or equal to

Filters compare the datapoint value to a fixed float value. A filter consists of a dict where the key is the name of the datapoint/property and the value is another dict mapping an operator (key) to a float (value).

Filters must apply to a datapoint selected within the request. When using a template, use the template property name instead.
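As a hedged illustration (plain pandas, not the SDK), cumulative filters behave like combined row conditions on the returned dataframe:

```python
import pandas as pd

# Example filters dict: filters are cumulative, so both conditions must hold.
filters = {
    "mt1_bearing1": {">": 0.060},
    "mt1_bearing2": {"<=": 0.100},
}

# Conceptually equivalent to this row mask on the returned dataframe:
df = pd.DataFrame({
    "mt1_bearing1": [0.05, 0.07, 0.08],
    "mt1_bearing2": [0.09, 0.12, 0.08],
})
mask = (df["mt1_bearing1"] > 0.060) & (df["mt1_bearing2"] <= 0.100)
print(df[mask])  # keeps only the last row
```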

Options

Options can be used to influence the query behavior.

They are passed, if needed, as a dictionary through the options parameter.

df = wizata_dsapi.api().query(
    ...
    options={
        'null': 'all',
        'fill_behavior': 'interpolation',
        'auto_fill_set_points': False
    },
    ...
)

null

The “null” option tells the query how to deal with missing values. It is applied post-aggregation. By default, it is set to ‘drop’ for experiments and ‘ignore’ for queries from code:

  • drop: drop all rows that contain at least one missing value.
  • all: drop rows only if all columns have missing values.
  • ignore: do not drop any rows and leave missing values in place.
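The three modes map naturally onto pandas' dropna semantics; a local illustration (pandas only, not a platform call):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "mt1_bearing1": [0.07, np.nan, np.nan],
    "mt1_bearing2": [0.08, 0.09, np.nan],
})

# 'drop'  ~ drop rows with at least one missing value
print(df.dropna(how="any"))   # keeps row 0 only
# 'all'   ~ drop rows only when every column is missing
print(df.dropna(how="all"))   # keeps rows 0 and 1
# 'ignore' ~ leave the dataframe untouched
```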

fill_behavior

“fill_behavior” is applied after “null” and tells the query how to fill any remaining missing values. By default, it is set to ‘ignore’:

  • ignore: leave missing values untouched.
  • forward: use the ffill function to fill values with the previous known value.
  • back: use the bfill function to fill values with the next known value.
  • interpolation: use AI to try to interpolate the missing values.
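A local pandas sketch of the fill behaviors (note the platform's ‘interpolation’ uses AI, so the plain linear interpolation shown here is only an analogy):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

print(s.ffill().tolist())        # 'forward': [1.0, 1.0, 1.0, 4.0]
print(s.bfill().tolist())        # 'back':    [1.0, 4.0, 4.0, 4.0]
print(s.interpolate().tolist())  # linear analogy: [1.0, 2.0, 3.0, 4.0]
```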

auto_fill_set_points

“auto_fill_set_points” accepts a True or False boolean value and defaults to True. The system considers the last known value of any datapoint of type Set Point as active and fills all missing values with it until a new value is found. Set it to False to disable this behaviour.

📘

We are trying to make our query system flexible and adaptive so you don’t have to worry about data quality. Don’t hesitate to contact us if you encounter unexpected behaviour or need a specific case implemented.