DataFrame
- pyclarify.client.Client.data_frame(filter={}, sort: List[str] = [], limit: int = 20, skip: int = 0, total: bool = False, gte: Union[datetime.datetime, str] = None, lt: Union[datetime.datetime, str] = None, last: int = - 1, rollup: Union[str, datetime.timedelta] = None, include: List[str] = [], window_size: Union[str, datetime.timedelta] = None) pyclarify.views.generics.Response
Retrieve DataFrame for items stored in Clarify.
- Parameters
filter (Filter, optional) – A Filter Model that describes a mongodb filter to be applied.
sort (list of strings) – List of strings describing the order in which to sort the items in the response.
limit (int, default 20) – The maximum number of resources to select. Negative numbers means no limit, which may or may not be allowed.
skip (int, default: 0) – Skip the first N matches. A negative skip is treated as 0.
total (bool, default: False) – When true, force the inclusion of a total count in the response. A total count is the total number of resources that matches filter.
gte (ISO 8601 timestamp , default: <now - 7 days>) – An RFC3339 time describing the inclusive start of the window.
lt (ISO 8601 timestamp , default: <now + 7 days>) – An RFC3339 time describing the exclusive end of the window.
last (int, default: -1) – If above 0, select last N timestamps per series. The selection happens after the rollup aggregation.
rollup (RFC3339 duration or “window”, default: None) – If duration is specified, roll-up the values into either the full time window (gte -> lt) or evenly sized buckets.
include (List of strings, default: []) – A list of strings specifying which relationships to be included in the response.
window_size (RFC3339 duration, default None) – If duration is specified, the iterator will use the specified window as a paging size instead of default API limits. This is commonly used when resolution of data is too high to be packaged with default values.
- Returns
Response.result.data
is a DataFrame- Return type
Response
See also
Client.select_items
Retrieve item metadata from selected items.
Notes
Time selection:
Maximum window size is 40 days (40 * 24 hours) when rollup is null or less than PT1M (1 minute).
Maximum window size is 400 days (400 * 24 hours) when rollup is greater than or equal to PT1M (1 minute).
No maximum window size if rollup is window.
The limits are used internally by the Clarify API. Should you have very high resolution data (>=1hz), you can use
time_window
argument to reduce the window, resulting in more requests.Examples
>>> client = Client("./clarify-credentials.json")
Getting data frame with a filter.
>>> client.data_frame( ... filter = query.Filter(fields={"name": query.NotEqual(value="Air Temperature")}), ... )
Getting data with a time range.
>>> client.data_frame( ... gte="2022-01-01T01:01:01Z", ... lt="2022-01-09T01:01:01Z", ... )
Skipping first 3 items and only retrieving 5 items, sorted with descending id.
>>> client.data_frame( ... sort = ["-id"], ... limit = 5, ... skip = 3, ... )
Setting a lower window size due to json decoding errors.
>>> client.data_frame( ... window_size = "P20DT", ... limit = 5, ... skip = 3, ... )
Warning
We recommend using
rollup
instead ofwindow_size
due to execution time being much faster.Using rollup to get sampled data.
>>> r = client.data_frame( ... rollup = "PT5M", ... limit = 5, ... skip = 3, ... ) >>> r.result.data ... DataFrame( ... times=[datetime.datetime(2022, 9, 5, 11, 5, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 5, 11, 10, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 5, 11, 15, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 5, 11, 30, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 5, 11, 35, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 6, 13, 40, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 6, 13, 45, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 6, 13, 50, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 7, 13, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 7, 13, 5, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 9, 7, 13, 10, tzinfo=datetime.timezone.utc)], ... series={ ... 'cbpmaq6rpn52969vfl1g_avg': [1.0, 5.0, 5.875, 6.8, 4.2, 7.0, 3.6, 5.0, 2.0, 2.2, 4.25], ... 'cbpmaq6rpn52969vfl1g_count': [2.0, 10.0, 8.0, 5.0, 5.0, 3.0, 5.0, 2.0, 1.0, 5.0, 4.0], ... 'cbpmaq6rpn52969vfl1g_max': [1.0, 9.0, 9.0, 9.0, 8.0, 9.0, 6.0, 6.0, 2.0, 6.0, 8.0], ... 'cbpmaq6rpn52969vfl1g_min': [1.0, 0.0, 0.0, 5.0, 1.0, 6.0, 0.0, 4.0, 2.0, 0.0, 0.0], ... 'cbpmaq6rpn52969vfl1g_sum': [2.0, 50.0, 47.0, 34.0, 21.0, 21.0, 18.0, 10.0, 2.0, 11.0, 17.0], ... 'cbpmaq6rpn52969vfl20_avg': [5.0, 4.7, 3.75, 3.6, 5.2, 7.333333333333333, 3.6, 7.0, 9.0, 3.6, 6.75], ... 'cbpmaq6rpn52969vfl20_count': [2.0, 10.0, 8.0, 5.0, 5.0, 3.0, 5.0, 2.0, 1.0, 5.0, 4.0], ... 'cbpmaq6rpn52969vfl20_max': [8.0, 9.0, 8.0, 7.0, 9.0, 9.0, 8.0, 9.0, 9.0, 8.0, 9.0], ... 'cbpmaq6rpn52969vfl20_min': [2.0, 1.0, 0.0, 1.0, 2.0, 4.0, 0.0, 5.0, 9.0, 0.0, 1.0], ... 'cbpmaq6rpn52969vfl20_sum': [10.0, 47.0, 30.0, 18.0, 26.0, 22.0, 18.0, 14.0, 9.0, 18.0, 27.0], ... 'cbpmaq6rpn52969vfl2g_avg': [8.0, 3.7, 4.75, 1.6, 3.6, 2.0, 5.6, 8.5, 4.0, 3.8, 5.0], ... 'cbpmaq6rpn52969vfl2g_count': [2.0, 10.0, 8.0, 5.0, 5.0, 3.0, 5.0, 2.0, 1.0, 5.0, 4.0], ... 'cbpmaq6rpn52969vfl2g_max': [8.0, 9.0, 9.0, 5.0, 8.0, 5.0, 9.0, 9.0, 4.0, 8.0, 7.0], ... 'cbpmaq6rpn52969vfl2g_min': [8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 8.0, 4.0, 0.0, 1.0], ... 'cbpmaq6rpn52969vfl2g_sum': [16.0, 37.0, 38.0, 8.0, 18.0, 6.0, 28.0, 17.0, 4.0, 19.0, 20.0], ... 'cbpmaq6rpn52969vfl30_avg': [2.0, 5.6, 3.875, 3.2, 5.2, 4.666666666666667, 5.0, 4.5, 7.0, 5.8, 8.0], ... 'cbpmaq6rpn52969vfl30_count': [2.0, 10.0, 8.0, 5.0, 5.0, 3.0, 5.0, 2.0, 1.0, 5.0, 4.0], ... 'cbpmaq6rpn52969vfl30_max': [3.0, 9.0, 7.0, 9.0, 9.0, 8.0, 7.0, 8.0, 7.0, 9.0, 9.0], ... 'cbpmaq6rpn52969vfl30_min': [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 2.0, 1.0, 7.0, 1.0, 6.0], ... 'cbpmaq6rpn52969vfl30_sum': [4.0, 56.0, 31.0, 16.0, 26.0, 14.0, 25.0, 9.0, 7.0, 29.0, 32.0], ... 'cbpmaq6rpn52969vfl3g_avg': [1.5, 3.3, 6.75, 5.8, 4.8, 5.666666666666667, 3.8, 6.5, 5.0, 3.0, 3.25], ... 'cbpmaq6rpn52969vfl3g_count': [2.0, 10.0, 8.0, 5.0, 5.0, 3.0, 5.0, 2.0, 1.0, 5.0, 4.0], ... 'cbpmaq6rpn52969vfl3g_max': [2.0, 9.0, 9.0, 9.0, 9.0, 7.0, 8.0, 8.0, 5.0, 7.0, 5.0], ... 'cbpmaq6rpn52969vfl3g_min': [1.0, 1.0, 4.0, 1.0, 1.0, 3.0, 0.0, 5.0, 5.0, 0.0, 0.0], ... 'cbpmaq6rpn52969vfl3g_sum': [3.0, 33.0, 54.0, 29.0, 24.0, 17.0, 19.0, 13.0, 5.0, 15.0, 13.0] ... })
- Response
In case of a valid return value, returns a pydantic model with the following format:
>>> jsonrpc = '2.0' ... id = '1' ... result = Selection( ... meta={ ... 'total': -1, ... 'groupIncludedByType': True ... }, ... data=DataFrame( ... times=[ ... datetime.datetime(2022, 1, 1, 12, 0, tzinfo=datetime.timezone.utc), ... datetime.datetime(2022, 1, 1, 13, 0, tzinfo=datetime.timezone.utc), ... ...], ... series={ ... 'c5i41fjsbu8cohpkcpvg': [0.18616, 0.18574000000000002, ...], ... 'c5i41fjsbu8cohfdepvg': [450.876543125, 450.176543554, ...], ... ... ... } ... ) ... ... error = None
In case of the error the method return a pydantic model with the following format:
>>> jsonrpc = '2.0' ... id = '1' ... result = None ... error = Error( ... code = '-32602', ... message = 'Invalid params', ... data = ErrorData(trace = <trace_id>, params = {}) ... )
Tip
You can change the type of DataFrame from pyclarify to pandas using the to_pandas() method.
>>> r = client.data_frame() >>> c_df = r.result.data >>> p_df = c_df.to_pandas() >>> p_df.head() ... cbpmaq6rpn52969vfl00 cbpmaq6rpn52969vfl0g ... cbpmaq6rpn52969vfl90 cbpmaq6rpn52969vfl9g ... 2022-09-05 11:30:11.432725+00:00 2.0 8.0 ... 0.0 4.0 ... 2022-09-05 11:31:11.432723+00:00 9.0 2.0 ... 8.0 8.0 ... 2022-09-05 11:32:11.432722+00:00 6.0 4.0 ... 8.0 9.0 ... 2022-09-05 11:33:11.432720+00:00 0.0 7.0 ... 9.0 4.0 ... 2022-09-05 11:34:11.432719+00:00 8.0 6.0 ... 8.0 5.0