DataFrame module

class DataFrame(*, times: List[datetime] = None, series: Dict[str, List[float | int | None]] = None)[source]

Bases: BaseModel

The DataFrame structure maps to the data structure the API uses for saving time series. It supports merging with other Clarify DataFrame objects and can be converted to and from pandas.DataFrame.

Parameters:

series (Dict[InputID, List[Union[None, float, int]]]) – Map of Input IDs to arrays of data points to insert. The length of each array must match that of the times array. To omit a value for a given timestamp in times, use None (null in JSON).
times (List of timestamps) – The timestamps to insert, each given either as a Python datetime or as a string of the form YYYY-MM-DD[T]HH:MM[:SS[.ffffff]][Z or [±]HH[:]MM].
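
The two constraints above (one value per timestamp, None for gaps) can be sketched with the standard library alone. This is only an illustration, not pyclarify's validation code: the helper names parse_time and check_frame are hypothetical, and treating naive timestamps as UTC is an assumption of the sketch.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional, Union

Number = Union[int, float]


def parse_time(value: Union[str, datetime]) -> datetime:
    """Return a timezone-aware datetime from a datetime or an ISO 8601 string."""
    if isinstance(value, datetime):
        parsed = value
    else:
        # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11,
        # so rewrite it as an explicit UTC offset first.
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    # Assumption of this sketch: naive timestamps are interpreted as UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def check_frame(
    times: List[Union[str, datetime]],
    series: Dict[str, List[Optional[Number]]],
) -> List[datetime]:
    """Parse the timestamps and verify every series has one value per timestamp."""
    parsed = [parse_time(t) for t in times]
    for input_id, values in series.items():
        if len(values) != len(parsed):
            raise ValueError(
                f"{input_id}: expected {len(parsed)} values, got {len(values)}"
            )
    return parsed


times = ["2021-11-01T21:50:06Z", "2021-11-02T21:50:06Z"]
# None marks a timestamp for which INPUT_ID_1 has no value.
series = {"INPUT_ID_1": [1, None], "INPUT_ID_2": [3, 4]}
parsed_times = check_frame(times, series)
```

A series whose list is shorter or longer than times raises ValueError in this sketch; the error pyclarify itself reports for such input may differ.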

Example

>>> from pyclarify import DataFrame
>>> data = DataFrame(
...     series={"INPUT_ID_1": [1, 2], "INPUT_ID_2": [3, 4]},
...     times=["2021-11-01T21:50:06Z",  "2021-11-02T21:50:06Z"]
... )

classmethod from_dict(data)[source]: Converts a dictionary to a pyclarify.DataFrame. Handles both series and flat dictionaries. There is no need to name the time column, since only one time column is accepted.

classmethod from_pandas(df, time_col=None)[source]

Convert a pandas DataFrame into a Clarify DataFrame.

Parameters:

df (pandas.DataFrame) – The pandas.DataFrame object to cast to pyclarify.DataFrame.
time_col (str, default None) – Name of the column containing the time axis. If no name is given, the time axis is assumed to be the index of the DataFrame.

Returns:

The Clarify DataFrame representing this instance.

Return type:

pyclarify.DataFrame

Example

>>> from pyclarify import DataFrame
>>> import pandas as pd
>>> df = pd.DataFrame(data={"INPUT_ID_1": [1, 2], "INPUT_ID_2": [3, 4]})
>>> df.index = ["2021-11-01T21:50:06Z",  "2021-11-02T21:50:06Z"]
>>> DataFrame.from_pandas(df)
... DataFrame(
...     times=[
...         datetime.datetime(2021, 11, 1, 21, 50, 6, tzinfo=datetime.timezone.utc),
...         datetime.datetime(2021, 11, 2, 21, 50, 6, tzinfo=datetime.timezone.utc)],
...     series={
...         'INPUT_ID_1': [1.0, 2.0],
...         'INPUT_ID_2': [3.0, 4.0]
...     }
... )

With specific time column.

>>> from pyclarify import DataFrame
>>> import pandas as pd
>>> df = pd.DataFrame(data={
...     "INPUT_ID_1": [1, 2],
...     "INPUT_ID_2": [3, 4],
...     "timestamps": ["2021-11-01T21:50:06Z",  "2021-11-02T21:50:06Z"]
... })
>>> DataFrame.from_pandas(df, time_col="timestamps")
... DataFrame(
...     times=[
...         datetime.datetime(2021, 11, 1, 21, 50, 6, tzinfo=datetime.timezone.utc),
...         datetime.datetime(2021, 11, 2, 21, 50, 6, tzinfo=datetime.timezone.utc)],
...     series={
...         'INPUT_ID_1': [1.0, 2.0],
...         'INPUT_ID_2': [3.0, 4.0]
...     }
... )

classmethod merge(data_frames) → DataFrame[source]

Merges two or more Clarify DataFrames. Overlapping signal names are mapped to a single series, and the timestamps of all data frames are combined. A series that has no entry at a given timestamp receives None there.

Parameters: data_frames (List[DataFrame]) – A Clarify DataFrame or a list of Clarify DataFrames.
Returns: Merged data frame of all input data frames.
Return type: DataFrame

Example

Merging two data frames.

>>> df1 = DataFrame(
...     series={"INPUT_ID_1": [1, 2], "INPUT_ID_2": [3, 4]},
...     times=["2021-11-01T21:50:06Z",  "2021-11-02T21:50:06Z"]
... )
>>> df2 = DataFrame(
...     series={"INPUT_ID_1": [5, 6], "INPUT_ID_3": [7, 8]},
...     times=["2021-11-01T21:50:06Z",  "2021-11-03T21:50:06Z"]
... )
>>> merged_df = DataFrame.merge([df1, df2])
>>> merged_df.to_pandas()
...                            INPUT_ID_2  INPUT_ID_1  INPUT_ID_3
... 2021-11-01 21:50:06+00:00         3.0         5.0         7.0
... 2021-11-02 21:50:06+00:00         4.0         2.0         NaN
... 2021-11-03 21:50:06+00:00         NaN         6.0         8.0

Warning

Notice from the example above that when time series have overlapping timestamps, values from data frames later in the list overwrite those from earlier ones.

>>> df1 = DataFrame(
...     series={"INPUT_ID_1": [1, 2]},
...     times=["2021-11-01T21:50:06Z",  "2021-11-02T21:50:06Z"]
... )
>>> df2 = DataFrame(
...     series={"INPUT_ID_1": [5, 6]},
...     times=["2021-11-01T21:50:06Z",  "2021-11-03T21:50:06Z"]
... )
>>> DataFrame.merge([df1, df2]).to_pandas()
...                            INPUT_ID_1
... 2021-11-01 21:50:06+00:00         5.0   <--
... 2021-11-02 21:50:06+00:00         2.0
... 2021-11-03 21:50:06+00:00         6.0
>>> DataFrame.merge([df2, df1]).to_pandas()
...                            INPUT_ID_1
... 2021-11-01 21:50:06+00:00         1.0   <--
... 2021-11-02 21:50:06+00:00         2.0
... 2021-11-03 21:50:06+00:00         6.0
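
The overwrite rule can be reproduced in plain Python to make the ordering effect explicit. This is a minimal sketch of the documented behaviour, not pyclarify's implementation: merge_frames is a hypothetical helper, frames are modelled as (times, series) tuples, and letting a later None overwrite an earlier value is an assumption of the sketch.

```python
from typing import Dict, List, Optional, Tuple

# A frame is modelled here as (times, series) rather than a pyclarify.DataFrame.
Frame = Tuple[List[str], Dict[str, List[Optional[float]]]]


def merge_frames(frames: List[Frame]) -> Frame:
    """Merge frames so that later frames win on overlapping timestamps."""
    cells: Dict[str, Dict[str, Optional[float]]] = {}  # input id -> {time: value}
    all_times = set()
    for times, series in frames:
        all_times.update(times)
        for input_id, values in series.items():
            column = cells.setdefault(input_id, {})
            for timestamp, value in zip(times, values):
                # Later frames overwrite earlier ones at the same timestamp.
                column[timestamp] = value
    # ISO 8601 strings in one fixed format sort chronologically as text.
    merged_times = sorted(all_times)
    merged_series = {
        # dict.get() yields None where a series has no entry for a timestamp.
        input_id: [column.get(t) for t in merged_times]
        for input_id, column in cells.items()
    }
    return merged_times, merged_series


df1 = (["2021-11-01T21:50:06Z", "2021-11-02T21:50:06Z"], {"INPUT_ID_1": [1, 2]})
df2 = (["2021-11-01T21:50:06Z", "2021-11-03T21:50:06Z"], {"INPUT_ID_1": [5, 6]})

times_a, series_a = merge_frames([df1, df2])  # df2 wins at 2021-11-01
times_b, series_b = merge_frames([df2, df1])  # df1 wins at 2021-11-01
```

With these inputs, series_a["INPUT_ID_1"] is [5, 2, 6] and series_b["INPUT_ID_1"] is [1, 2, 6], matching the two tables above.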

model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}: A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'json_encoders': {<class 'datetime.datetime'>: <function time_to_string>}, 'ser_json_inf_nan': 'null'}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[dict[str, FieldInfo]] = {'series': FieldInfo(annotation=Dict[Annotated[str, FieldInfo(annotation=NoneType, required=True, metadata=[_PydanticGeneralMetadata(pattern='^^[a-zA-Z0-9-_:.#+/]{1,128}$')])], List[Union[float, int, NoneType]]], required=False), 'times': FieldInfo(annotation=List[datetime], required=False)}

Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.

series: Dict[str, List[float | int | None]]

times: List[datetime]

to_pandas()[source]

Convert the instance into a pandas DataFrame.

Returns: The pandas DataFrame representing this instance.
Return type: pandas.DataFrame

Example

>>> from pyclarify import DataFrame
>>> data = DataFrame(
...     series={"INPUT_ID_1": [1, 2], "INPUT_ID_2": [3, 4]},
...     times=["2021-11-01T21:50:06Z",  "2021-11-02T21:50:06Z"]
... )
>>> data.to_pandas()
...                            INPUT_ID_1  INPUT_ID_2
... 2021-11-01 21:50:06+00:00         1.0         3.0
... 2021-11-02 21:50:06+00:00         2.0         4.0