DataFrame

pyclarify.client.Client.data_frame(filter={}, sort: List[str] = [], limit: int = 20, skip: int = 0, total: bool = False, gte: Union[datetime.datetime, str] = None, lt: Union[datetime.datetime, str] = None, last: int = -1, rollup: Union[str, datetime.timedelta] = None, include: List[str] = [], window_size: Union[str, datetime.timedelta] = None) → pyclarify.views.generics.Response

Retrieve DataFrame for items stored in Clarify.

Parameters

filter (Filter, optional) – A Filter model describing a MongoDB-style filter to apply.
sort (list of strings) – List of strings describing the order in which to sort the items in the response.
limit (int, default: 20) – The maximum number of resources to select. A negative number means no limit, which may or may not be allowed.
skip (int, default: 0) – Skip the first N matches. A negative skip is treated as 0.
total (bool, default: False) – When true, force the inclusion of a total count in the response. The total count is the number of resources that match the filter.
gte (ISO 8601 timestamp, default: <now - 7 days>) – An RFC 3339 time describing the inclusive start of the window.
lt (ISO 8601 timestamp, default: <now + 7 days>) – An RFC 3339 time describing the exclusive end of the window.
last (int, default: -1) – If greater than 0, select the last N timestamps per series. The selection happens after the rollup aggregation.
rollup (RFC3339 duration or “window”, default: None) – If specified, roll up the values into either the full time window (gte -> lt) when set to “window”, or evenly sized buckets when set to a duration.
include (List of strings, default: []) – A list of strings specifying which relationships to include in the response.
window_size (RFC3339 duration, default: None) – If a duration is specified, the iterator uses it as the paging size instead of the default API limits. This is commonly used when the data resolution is too high for a response to be packaged with the default values.
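
The gte and lt parameters accept either datetime objects or timestamp strings. The following is a minimal sketch of building such strings with the standard library only. The helper name default_window is hypothetical and not part of pyclarify; it merely mirrors the documented defaults (now - 7 days, now + 7 days), and whether the client computes these defaults locally or leaves them to the API is not stated here.

```python
from datetime import datetime, timedelta, timezone


def default_window(now=None):
    """Return (gte, lt) as RFC 3339 strings mirroring the documented defaults.

    Hypothetical helper for illustration; not part of pyclarify.
    """
    now = now or datetime.now(timezone.utc)
    gte = now - timedelta(days=7)  # inclusive start of the window
    lt = now + timedelta(days=7)   # exclusive end of the window
    fmt = "%Y-%m-%dT%H:%M:%SZ"     # UTC timestamp with a trailing "Z"
    return gte.strftime(fmt), lt.strftime(fmt)


# A fixed "now" keeps the example reproducible.
fixed_now = datetime(2022, 1, 8, 1, 1, 1, tzinfo=timezone.utc)
gte, lt = default_window(fixed_now)
```

The resulting strings can be passed as the gte and lt arguments in the same way as the literal timestamps in the examples on this page.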

Returns

Response.result.data is a DataFrame

Return type

Response

See also

Client.select_items: Retrieve item metadata from selected items.

Notes

Time selection:

Maximum window size is 40 days (40 * 24 hours) when rollup is null or less than PT1M (1 minute).
Maximum window size is 400 days (400 * 24 hours) when rollup is greater than or equal to PT1M (1 minute).
No maximum window size if rollup is window.

These limits are applied internally by the Clarify API. If you have very high-resolution data (>= 1 Hz), you can use the window_size argument to reduce the window, at the cost of more requests.
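
The window limits above can be expressed as a small lookup, together with a generator that splits a long time range into windows that respect them. This is a sketch under stated assumptions: max_window and split_range are hypothetical names, not pyclarify functions, and the rollup is taken as None, the string "window", or a datetime.timedelta rather than a duration string.

```python
from datetime import datetime, timedelta, timezone

ONE_MINUTE = timedelta(minutes=1)


def max_window(rollup):
    """Return the maximum window for a rollup, or None if there is no maximum.

    Hypothetical helper; `rollup` is None, "window", or a timedelta.
    """
    if rollup == "window":
        return None
    if rollup is None or rollup < ONE_MINUTE:
        return timedelta(days=40)
    return timedelta(days=400)


def split_range(gte, lt, window):
    """Yield (start, end) pairs covering [gte, lt) in steps of at most `window`."""
    if window is None:
        yield gte, lt
        return
    start = gte
    while start < lt:
        end = min(start + window, lt)
        yield start, end
        start = end


start = datetime(2022, 1, 1, tzinfo=timezone.utc)
end = start + timedelta(days=100)
# Without a rollup the limit is 40 days, so 100 days need three windows.
chunks = list(split_range(start, end, max_window(None)))
```

Each (start, end) pair could then be used as the gte and lt of a separate request.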

Examples

>>> client = Client("./clarify-credentials.json")

Getting data frame with a filter.

>>> client.data_frame(
...     filter = query.Filter(fields={"name": query.NotEqual(value="Air Temperature")}),
... )

Getting data with a time range.

>>> client.data_frame(
...     gte="2022-01-01T01:01:01Z",
...     lt="2022-01-09T01:01:01Z",
... )

Skipping the first 3 items and retrieving only 5 items, sorted by descending id.

>>> client.data_frame(
...     sort = ["-id"],
...     limit = 5,
...     skip = 3,
... )

Setting a smaller window size to work around JSON decoding errors.

>>> client.data_frame(
...     window_size = "P20D",
...     limit = 5,
...     skip = 3,
... )

Warning

We recommend using rollup instead of window_size, because it executes much faster.

Using rollup to get sampled data.

>>> r = client.data_frame(
...     rollup = "PT5M",
...     limit = 5,
...     skip = 3,
... )
>>> r.result.data
... DataFrame(
...     times=[datetime.datetime(2022, 9, 5, 11, 5, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 5, 11, 10, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 5, 11, 15, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 5, 11, 30, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 5, 11, 35, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 6, 13, 40, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 6, 13, 45, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 6, 13, 50, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 7, 13, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 7, 13, 5, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 7, 13, 10, tzinfo=datetime.timezone.utc)],
...     series={
...         'cbpmaq6rpn52969vfl1g_avg': [1.0, 5.0, 5.875, 6.8, 4.2, 7.0, 3.6, 5.0, 2.0, 2.2, 4.25],
...         'cbpmaq6rpn52969vfl1g_count': [2.0, 10.0, 8.0, 5.0, 5.0, 3.0, 5.0, 2.0, 1.0, 5.0, 4.0],
...         'cbpmaq6rpn52969vfl1g_max': [1.0, 9.0, 9.0, 9.0, 8.0, 9.0, 6.0, 6.0, 2.0, 6.0, 8.0],
...         'cbpmaq6rpn52969vfl1g_min': [1.0, 0.0, 0.0, 5.0, 1.0, 6.0, 0.0, 4.0, 2.0, 0.0, 0.0],
...         'cbpmaq6rpn52969vfl1g_sum': [2.0, 50.0, 47.0, 34.0, 21.0, 21.0, 18.0, 10.0, 2.0, 11.0, 17.0],
...         'cbpmaq6rpn52969vfl20_avg': [5.0, 4.7, 3.75, 3.6, 5.2, 7.333333333333333, 3.6, 7.0, 9.0, 3.6, 6.75],
...         'cbpmaq6rpn52969vfl20_count': [2.0, 10.0, 8.0, 5.0, 5.0, 3.0, 5.0, 2.0, 1.0, 5.0, 4.0],
...         'cbpmaq6rpn52969vfl20_max': [8.0, 9.0, 8.0, 7.0, 9.0, 9.0, 8.0, 9.0, 9.0, 8.0, 9.0],
...         'cbpmaq6rpn52969vfl20_min': [2.0, 1.0, 0.0, 1.0, 2.0, 4.0, 0.0, 5.0, 9.0, 0.0, 1.0],
...         'cbpmaq6rpn52969vfl20_sum': [10.0, 47.0, 30.0, 18.0, 26.0, 22.0, 18.0, 14.0, 9.0, 18.0, 27.0],
...         'cbpmaq6rpn52969vfl2g_avg': [8.0, 3.7, 4.75, 1.6, 3.6, 2.0, 5.6, 8.5, 4.0, 3.8, 5.0],
...         'cbpmaq6rpn52969vfl2g_count': [2.0, 10.0, 8.0, 5.0, 5.0, 3.0, 5.0, 2.0, 1.0, 5.0, 4.0],
...         'cbpmaq6rpn52969vfl2g_max': [8.0, 9.0, 9.0, 5.0, 8.0, 5.0, 9.0, 9.0, 4.0, 8.0, 7.0],
...         'cbpmaq6rpn52969vfl2g_min': [8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 8.0, 4.0, 0.0, 1.0],
...         'cbpmaq6rpn52969vfl2g_sum': [16.0, 37.0, 38.0, 8.0, 18.0, 6.0, 28.0, 17.0, 4.0, 19.0, 20.0],
...         'cbpmaq6rpn52969vfl30_avg': [2.0, 5.6, 3.875, 3.2, 5.2, 4.666666666666667, 5.0, 4.5, 7.0, 5.8, 8.0],
...         'cbpmaq6rpn52969vfl30_count': [2.0, 10.0, 8.0, 5.0, 5.0, 3.0, 5.0, 2.0, 1.0, 5.0, 4.0],
...         'cbpmaq6rpn52969vfl30_max': [3.0, 9.0, 7.0, 9.0, 9.0, 8.0, 7.0, 8.0, 7.0, 9.0, 9.0],
...         'cbpmaq6rpn52969vfl30_min': [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 2.0, 1.0, 7.0, 1.0, 6.0],
...         'cbpmaq6rpn52969vfl30_sum': [4.0, 56.0, 31.0, 16.0, 26.0, 14.0, 25.0, 9.0, 7.0, 29.0, 32.0],
...         'cbpmaq6rpn52969vfl3g_avg': [1.5, 3.3, 6.75, 5.8, 4.8, 5.666666666666667, 3.8, 6.5, 5.0, 3.0, 3.25],
...         'cbpmaq6rpn52969vfl3g_count': [2.0, 10.0, 8.0, 5.0, 5.0, 3.0, 5.0, 2.0, 1.0, 5.0, 4.0],
...         'cbpmaq6rpn52969vfl3g_max': [2.0, 9.0, 9.0, 9.0, 9.0, 7.0, 8.0, 8.0, 5.0, 7.0, 5.0],
...         'cbpmaq6rpn52969vfl3g_min': [1.0, 1.0, 4.0, 1.0, 1.0, 3.0, 0.0, 5.0, 5.0, 0.0, 0.0],
...         'cbpmaq6rpn52969vfl3g_sum': [3.0, 33.0, 54.0, 29.0, 24.0, 17.0, 19.0, 13.0, 5.0, 15.0, 13.0]
... })
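
In the output above, each series key appears to combine an item id with an aggregate suffix (avg, count, max, min, sum). Assuming that naming pattern, which is inferred from this example rather than from a documented contract, the series can be regrouped per item with plain dictionaries. The helper group_by_item is hypothetical, and the data is a three-bucket excerpt of one item from the output above.

```python
from collections import defaultdict

# Excerpt of the series shown above: first three buckets of one item.
series = {
    "cbpmaq6rpn52969vfl1g_avg": [1.0, 5.0, 5.875],
    "cbpmaq6rpn52969vfl1g_count": [2.0, 10.0, 8.0],
    "cbpmaq6rpn52969vfl1g_sum": [2.0, 50.0, 47.0],
}


def group_by_item(series):
    """Group '<item id>_<aggregate>' keys into {item id: {aggregate: values}}.

    Hypothetical helper; assumes the suffix follows the last underscore.
    """
    grouped = defaultdict(dict)
    for key, values in series.items():
        item_id, aggregate = key.rsplit("_", 1)
        grouped[item_id][aggregate] = values
    return dict(grouped)


grouped = group_by_item(series)
item = grouped["cbpmaq6rpn52969vfl1g"]
# Sanity check: the average of each bucket should equal sum / count.
recomputed_avg = [s / c for s, c in zip(item["sum"], item["count"])]
```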

Response

On success, the method returns a pydantic model with the following format:

>>> jsonrpc = '2.0'
... id = '1'
... result = Selection(
...     meta={
...         'total': -1,
...         'groupIncludedByType': True
...     },
...     data=DataFrame(
...         times=[
...             datetime.datetime(2022, 1, 1, 12, 0, tzinfo=datetime.timezone.utc),
...             datetime.datetime(2022, 1, 1, 13, 0, tzinfo=datetime.timezone.utc),
...             ...],
...         series={
...             'c5i41fjsbu8cohpkcpvg': [0.18616, 0.18574000000000002, ...],
...             'c5i41fjsbu8cohfdepvg': [450.876543125, 450.176543554, ...],
...             ...
...         }
...     )
... )
... error = None

In case of an error, the method returns a pydantic model with the following format:

>>> jsonrpc = '2.0'
... id = '1'
... result = None
... error = Error(
...         code = '-32602',
...         message = 'Invalid params',
...         data = ErrorData(trace = <trace_id>, params = {})
... )
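
Because a failed call sets error and leaves result as None, it can help to check error before reading result.data. The following is a minimal sketch of that check. The unwrap function and the ClarifyRequestError exception are hypothetical and not part of pyclarify, and SimpleNamespace objects stand in for real responses so the example runs without credentials.

```python
from types import SimpleNamespace


class ClarifyRequestError(Exception):
    """Hypothetical exception type for illustration; not part of pyclarify."""


def unwrap(response):
    """Return response.result.data, or raise if the response carries an error."""
    if response.error is not None:
        raise ClarifyRequestError(
            f"{response.error.code}: {response.error.message}"
        )
    return response.result.data


# Stand-ins shaped like the two response formats shown above.
ok = SimpleNamespace(
    error=None,
    result=SimpleNamespace(data={"times": [], "series": {}}),
)
failed = SimpleNamespace(
    error=SimpleNamespace(code="-32602", message="Invalid params"),
    result=None,
)

data = unwrap(ok)  # returns the data payload of the successful response
```

Calling unwrap(failed) raises ClarifyRequestError with the error code and message instead of failing later with an attribute error on None.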

Tip

You can convert a pyclarify DataFrame to a pandas DataFrame using the to_pandas() method.

>>> r = client.data_frame()
>>> c_df = r.result.data
>>> p_df = c_df.to_pandas()
>>> p_df.head()
...                                   cbpmaq6rpn52969vfl00  cbpmaq6rpn52969vfl0g  ...  cbpmaq6rpn52969vfl90  cbpmaq6rpn52969vfl9g
... 2022-09-05 11:30:11.432725+00:00                   2.0                   8.0  ...                   0.0                   4.0
... 2022-09-05 11:31:11.432723+00:00                   9.0                   2.0  ...                   8.0                   8.0
... 2022-09-05 11:32:11.432722+00:00                   6.0                   4.0  ...                   8.0                   9.0
... 2022-09-05 11:33:11.432720+00:00                   0.0                   7.0  ...                   9.0                   4.0
... 2022-09-05 11:34:11.432719+00:00                   8.0                   6.0  ...                   8.0                   5.0