DataFrame

pyclarify.client.Client.data_frame(filter: Optional[pyclarify.query.filter.Filter] = None, sort: List[str] = [], limit: int = 20, skip: int = 0, total: bool = False, gte: Union[datetime.datetime, str] = None, lt: Union[datetime.datetime, str] = None, last: int = -1, rollup: Union[str, datetime.timedelta] = None, include: List[str] = [], window_size: Union[str, datetime.timedelta] = None) -> pyclarify.views.generics.Response

Retrieve DataFrame for items stored in Clarify.

Parameters
  • filter (Filter, optional) – A Filter Model that describes a mongodb filter to be applied.

  • sort (list of strings) – List of strings describing the order in which to sort the items in the response.

  • limit (int, default: 20) – The maximum number of resources to select. Negative numbers mean no limit, which may or may not be allowed.

  • skip (int, default: 0) – Skip the first N matches. A negative skip is treated as 0.

  • total (bool, default: False) – When true, force the inclusion of a total count in the response. A total count is the total number of resources that match the filter.

  • gte (ISO 8601 timestamp, default: <now - 7 days>) – An RFC3339 time describing the inclusive start of the window.

  • lt (ISO 8601 timestamp, default: <now + 7 days>) – An RFC3339 time describing the exclusive end of the window.

  • last (int, default: -1) – If above 0, select the last N timestamps per series. The selection happens after the rollup aggregation.

  • rollup (RFC3339 duration or “window”, default: None) – If a duration is specified, roll up the values into evenly sized buckets of that duration. If “window” is specified, roll up the values into the full time window (gte -> lt).

  • include (List of strings, default: []) – A list of strings specifying which relationships to include in the response.

  • window_size (RFC3339 duration, default: None) – If a duration is specified, the iterator uses it as the paging window instead of the default API limits. This is commonly needed when the data resolution is too high for a response to be packaged with the default values.
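
The gte and lt bounds accept either datetime objects or timestamp strings. As a minimal sketch using only the standard library, the snippet below builds an explicit window that mirrors the documented defaults (now - 7 days to now + 7 days). The helper name default_window is hypothetical and not part of pyclarify.

```python
from datetime import datetime, timedelta, timezone


def default_window(now=None, days=7):
    """Return (gte, lt) as UTC timestamp strings spanning now +/- days.

    Hypothetical helper, not part of pyclarify. `now` must be a
    timezone-aware datetime; it defaults to the current UTC time.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    gte = now - timedelta(days=days)  # inclusive start of the window
    lt = now + timedelta(days=days)   # exclusive end of the window
    return gte.strftime(fmt), lt.strftime(fmt)


gte, lt = default_window(datetime(2022, 1, 8, 1, 1, 1, tzinfo=timezone.utc))
print(gte, lt)
```

The resulting strings can be passed as the gte and lt arguments in the same way as the literal timestamps used in the examples on this page.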

Returns

Response.result.data is a DataFrame

Return type

Response

See also

Client.select_items

Retrieve item metadata from selected items.

Notes

Time selection:

  • Maximum window size is 40 days (40 * 24 hours) when rollup is null or less than PT1M (1 minute).

  • Maximum window size is 400 days (400 * 24 hours) when rollup is greater than or equal to PT1M (1 minute).

  • No maximum window size if rollup is window.

The limits are used internally by the Clarify API. Should you have very high resolution data (>= 1 Hz), you can use the window_size argument to reduce the window, at the cost of more requests.
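
To make the limits above concrete, here is a small sketch of the arithmetic: it picks the maximum window for a given rollup and splits a time range into chunks no longer than that window. The helpers max_window and split_range are hypothetical illustrations, not the client's actual implementation, and the sketch assumes rollup is given as a datetime.timedelta, the string "window", or None.

```python
from datetime import datetime, timedelta, timezone


def max_window(rollup=None):
    """Return the documented maximum window, or None when there is no maximum.

    Hypothetical helper. `rollup` is None, "window", or a timedelta.
    """
    if rollup == "window":
        return None  # no maximum window size
    if rollup is None or rollup < timedelta(minutes=1):
        return timedelta(days=40)
    return timedelta(days=400)


def split_range(gte, lt, window):
    """Yield consecutive (start, end) pairs covering [gte, lt), each at most `window` long."""
    start = gte
    while start < lt:
        end = min(start + window, lt)
        yield start, end
        start = end


gte = datetime(2022, 1, 1, tzinfo=timezone.utc)
lt = gte + timedelta(days=100)
chunks = list(split_range(gte, lt, max_window()))
for start, end in chunks:
    print(start.isoformat(), "->", end.isoformat())
```

With no rollup, a 100-day range is covered by three chunks of 40, 40, and 20 days. A smaller window gives more, smaller chunks, which is the trade-off the window_size argument exposes.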

Examples

>>> from pyclarify import Client, query
>>> client = Client("./clarify-credentials.json")

Getting data frame with a filter.

>>> client.data_frame(
...     filter = query.Filter(fields={"name": query.NotEqual(value="Air Temperature")}),
... )

Getting data with a time range.

>>> client.data_frame(
...     gte="2022-01-01T01:01:01Z",
...     lt="2022-01-09T01:01:01Z",
... )

Skipping the first 3 items and retrieving only 5 items, sorted by descending id.

>>> client.data_frame(
...     sort = ["-id"],
...     limit = 5,
...     skip = 3,
... )

Setting a smaller window size to work around JSON decoding errors.

>>> client.data_frame(
...     window_size = "P20D",
...     limit = 5,
...     skip = 3,
... )

Warning

We recommend using rollup instead of window_size, since execution is much faster.

Using rollup to get sampled data.

>>> r = client.data_frame(
...     rollup = "PT5M",
...     limit = 5,
...     skip = 3,
... )
>>> r.result.data
... DataFrame(
...     times=[datetime.datetime(2022, 9, 5, 11, 5, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 5, 11, 10, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 5, 11, 15, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 5, 11, 30, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 5, 11, 35, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 6, 13, 40, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 6, 13, 45, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 6, 13, 50, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 7, 13, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 7, 13, 5, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 7, 13, 10, tzinfo=datetime.timezone.utc)],
...     series={
...         'cbpmaq6rpn52969vfl1g_avg': [1.0, 5.0, 5.875, 6.8, 4.2, 7.0, 3.6, 5.0, 2.0, 2.2, 4.25],
...         'cbpmaq6rpn52969vfl1g_count': [2.0, 10.0, 8.0, 5.0, 5.0, 3.0, 5.0, 2.0, 1.0, 5.0, 4.0],
...         'cbpmaq6rpn52969vfl1g_max': [1.0, 9.0, 9.0, 9.0, 8.0, 9.0, 6.0, 6.0, 2.0, 6.0, 8.0],
...         'cbpmaq6rpn52969vfl1g_min': [1.0, 0.0, 0.0, 5.0, 1.0, 6.0, 0.0, 4.0, 2.0, 0.0, 0.0],
...         'cbpmaq6rpn52969vfl1g_sum': [2.0, 50.0, 47.0, 34.0, 21.0, 21.0, 18.0, 10.0, 2.0, 11.0, 17.0],
...         'cbpmaq6rpn52969vfl20_avg': [5.0, 4.7, 3.75, 3.6, 5.2, 7.333333333333333, 3.6, 7.0, 9.0, 3.6, 6.75],
...         'cbpmaq6rpn52969vfl20_count': [2.0, 10.0, 8.0, 5.0, 5.0, 3.0, 5.0, 2.0, 1.0, 5.0, 4.0],
...         'cbpmaq6rpn52969vfl20_max': [8.0, 9.0, 8.0, 7.0, 9.0, 9.0, 8.0, 9.0, 9.0, 8.0, 9.0],
...         'cbpmaq6rpn52969vfl20_min': [2.0, 1.0, 0.0, 1.0, 2.0, 4.0, 0.0, 5.0, 9.0, 0.0, 1.0],
...         'cbpmaq6rpn52969vfl20_sum': [10.0, 47.0, 30.0, 18.0, 26.0, 22.0, 18.0, 14.0, 9.0, 18.0, 27.0],
...         'cbpmaq6rpn52969vfl2g_avg': [8.0, 3.7, 4.75, 1.6, 3.6, 2.0, 5.6, 8.5, 4.0, 3.8, 5.0],
...         'cbpmaq6rpn52969vfl2g_count': [2.0, 10.0, 8.0, 5.0, 5.0, 3.0, 5.0, 2.0, 1.0, 5.0, 4.0],
...         'cbpmaq6rpn52969vfl2g_max': [8.0, 9.0, 9.0, 5.0, 8.0, 5.0, 9.0, 9.0, 4.0, 8.0, 7.0],
...         'cbpmaq6rpn52969vfl2g_min': [8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 8.0, 4.0, 0.0, 1.0],
...         'cbpmaq6rpn52969vfl2g_sum': [16.0, 37.0, 38.0, 8.0, 18.0, 6.0, 28.0, 17.0, 4.0, 19.0, 20.0],
...         'cbpmaq6rpn52969vfl30_avg': [2.0, 5.6, 3.875, 3.2, 5.2, 4.666666666666667, 5.0, 4.5, 7.0, 5.8, 8.0],
...         'cbpmaq6rpn52969vfl30_count': [2.0, 10.0, 8.0, 5.0, 5.0, 3.0, 5.0, 2.0, 1.0, 5.0, 4.0],
...         'cbpmaq6rpn52969vfl30_max': [3.0, 9.0, 7.0, 9.0, 9.0, 8.0, 7.0, 8.0, 7.0, 9.0, 9.0],
...         'cbpmaq6rpn52969vfl30_min': [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 2.0, 1.0, 7.0, 1.0, 6.0],
...         'cbpmaq6rpn52969vfl30_sum': [4.0, 56.0, 31.0, 16.0, 26.0, 14.0, 25.0, 9.0, 7.0, 29.0, 32.0],
...         'cbpmaq6rpn52969vfl3g_avg': [1.5, 3.3, 6.75, 5.8, 4.8, 5.666666666666667, 3.8, 6.5, 5.0, 3.0, 3.25],
...         'cbpmaq6rpn52969vfl3g_count': [2.0, 10.0, 8.0, 5.0, 5.0, 3.0, 5.0, 2.0, 1.0, 5.0, 4.0],
...         'cbpmaq6rpn52969vfl3g_max': [2.0, 9.0, 9.0, 9.0, 9.0, 7.0, 8.0, 8.0, 5.0, 7.0, 5.0],
...         'cbpmaq6rpn52969vfl3g_min': [1.0, 1.0, 4.0, 1.0, 1.0, 3.0, 0.0, 5.0, 5.0, 0.0, 0.0],
...         'cbpmaq6rpn52969vfl3g_sum': [3.0, 33.0, 54.0, 29.0, 24.0, 17.0, 19.0, 13.0, 5.0, 15.0, 13.0]
... })
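
In the rollup output above, each series key appears to follow the pattern <item id>_<aggregate> (avg, count, max, min, sum). Assuming that pattern holds, the sketch below regroups a plain dictionary of series by item id. The helper group_rollup_series is hypothetical and operates on an ordinary dict, not on a pyclarify object.

```python
from collections import defaultdict


def group_rollup_series(series):
    """Group {"<item id>_<aggregate>": values} into {item id: {aggregate: values}}.

    Hypothetical helper; the key pattern is inferred from the example output.
    """
    grouped = defaultdict(dict)
    for key, values in series.items():
        # Split on the last underscore so the item id itself is kept intact.
        item_id, _, aggregate = key.rpartition("_")
        grouped[item_id][aggregate] = values
    return dict(grouped)


series = {
    "cbpmaq6rpn52969vfl1g_avg": [1.0, 5.0],
    "cbpmaq6rpn52969vfl1g_count": [2.0, 10.0],
    "cbpmaq6rpn52969vfl1g_sum": [2.0, 50.0],
}
grouped = group_rollup_series(series)
print(sorted(grouped["cbpmaq6rpn52969vfl1g"]))
```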

Response

In case of a valid return value, returns a pydantic model with the following format:

>>> jsonrpc = '2.0'
... id = '1'
... result = Selection(
...     meta={
...         'total': -1,
...         'groupIncludedByType': True
...     },
...     data=DataFrame(
...         times=[
...             datetime.datetime(2022, 1, 1, 12, 0, tzinfo=datetime.timezone.utc),
...             datetime.datetime(2022, 1, 1, 13, 0, tzinfo=datetime.timezone.utc),
...             ...],
...         series={
...             'c5i41fjsbu8cohpkcpvg': [0.18616, 0.18574000000000002, ...],
...             'c5i41fjsbu8cohfdepvg': [450.876543125, 450.176543554, ...],
...             ...
...         }
...     )
... )
... error = None

In case of an error, the method returns a pydantic model with the following format:

>>> jsonrpc = '2.0'
... id = '1'
... result = None
... error = Error(
...         code = '-32602',
...         message = 'Invalid params',
...         data = ErrorData(trace = <trace_id>, params = {})
... )
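
Because a response carries either a result or an error, calling code usually checks error before touching result.data. The sketch below shows one way to do that. The helper unwrap is hypothetical, and SimpleNamespace objects stand in for real responses so that the snippet runs without a Clarify connection; only the attribute names (result, error, data, code, message) are taken from the formats shown above.

```python
from types import SimpleNamespace


def unwrap(response):
    """Return response.result.data, or raise RuntimeError if the response holds an error.

    Hypothetical helper, not part of pyclarify.
    """
    if response.error is not None:
        raise RuntimeError(
            f"Clarify error {response.error.code}: {response.error.message}"
        )
    return response.result.data


# Stand-ins that mimic the two response shapes documented above.
ok = SimpleNamespace(
    result=SimpleNamespace(data={"times": [], "series": {}}),
    error=None,
)
failed = SimpleNamespace(
    result=None,
    error=SimpleNamespace(code="-32602", message="Invalid params"),
)

print(unwrap(ok))
try:
    unwrap(failed)
except RuntimeError as exc:
    print(exc)
```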

Tip

You can convert a pyclarify DataFrame to a pandas DataFrame using the to_pandas() method.

>>> r = client.data_frame()
>>> c_df = r.result.data
>>> p_df = c_df.to_pandas()
>>> p_df.head()
...                                   cbpmaq6rpn52969vfl00  cbpmaq6rpn52969vfl0g  ...  cbpmaq6rpn52969vfl90  cbpmaq6rpn52969vfl9g
... 2022-09-05 11:30:11.432725+00:00                   2.0                   8.0  ...                   0.0                   4.0
... 2022-09-05 11:31:11.432723+00:00                   9.0                   2.0  ...                   8.0                   8.0
... 2022-09-05 11:32:11.432722+00:00                   6.0                   4.0  ...                   8.0                   9.0
... 2022-09-05 11:33:11.432720+00:00                   0.0                   7.0  ...                   9.0                   4.0
... 2022-09-05 11:34:11.432719+00:00                   8.0                   6.0  ...                   8.0                   5.0