DataFrame module
- class DataFrame(*, times: List[datetime] = None, series: Dict[str, List[float | int | None]] = None)[source]
Bases: BaseModel
The DataFrame structure maps to the data structure used in the API for saving time series. It supports merging with other Clarify DataFrame objects and can be converted to and from pandas.DataFrame.
- Parameters:
series (Dict[InputID, List[Union[None, float, int]]]) – Map of Input IDs to arrays of data points to insert. The length of each array must match that of the times array. To omit a value for a given timestamp in times, use None (null in the JSON payload).
times (List of timestamps) – Timestamps to insert, each given either as a Python datetime or as a string of the form YYYY-MM-DD[T]HH:MM[:SS[.ffffff]][Z or [±]HH[:]MM].
Example
>>> from pyclarify import DataFrame
>>> data = DataFrame(
...     series={"INPUT_ID_1": [1, 2], "INPUT_ID_2": [3, 4]},
...     times=["2021-11-01T21:50:06Z", "2021-11-02T21:50:06Z"]
... )
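The two constraints stated in the parameter descriptions (timestamps given as datetime objects or ISO 8601 strings, and every series array matching the length of times) can be sketched with the standard library alone. This is an illustration, not pyclarify's own validation: the names `parse_timestamp` and `check_frame` are hypothetical, and the real class relies on Pydantic for parsing.

```python
from datetime import datetime, timezone
from typing import Dict, List, Union

Value = Union[float, int, None]


def parse_timestamp(raw: Union[str, datetime]) -> datetime:
    """Return a datetime for an ISO 8601 string; pass datetimes through."""
    if isinstance(raw, datetime):
        return raw
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    if raw.endswith("Z"):
        raw = raw[:-1] + "+00:00"
    return datetime.fromisoformat(raw)


def check_frame(
    times: List[Union[str, datetime]], series: Dict[str, List[Value]]
) -> List[datetime]:
    """Parse the timestamps and verify each series has one value per timestamp.

    Hypothetical helper: raises ValueError on a length mismatch.
    """
    parsed = [parse_timestamp(t) for t in times]
    for input_id, values in series.items():
        if len(values) != len(parsed):
            raise ValueError(
                f"{input_id}: {len(values)} values for {len(parsed)} timestamps"
            )
    return parsed


# None marks a timestamp for which INPUT_ID_2 has no value.
parsed_times = check_frame(
    ["2021-11-01T21:50:06Z", "2021-11-02T21:50:06Z"],
    {"INPUT_ID_1": [1, 2], "INPUT_ID_2": [3, None]},
)
```

A series that omits a point still needs a placeholder at that position, which is why None is used rather than a shorter array.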
- classmethod from_dict(data)[source]
Converts a dictionary to a pyclarify.DataFrame. Handles series and flat dictionaries. There is no need to specify a time column, as only one time column is accepted.
- classmethod from_pandas(df, time_col=None)[source]
Convert a pandas DataFrame into a Clarify DataFrame.
- Parameters:
df (pandas.DataFrame) – The pandas.DataFrame object to cast to pyclarify.DataFrame.
time_col (str, default None) – The name of the column containing the time axis. If no name is given, the index of the DataFrame is used as the time axis.
- Returns:
The Clarify DataFrame representing the given pandas DataFrame.
- Return type:
pyclarify.DataFrame
Example
>>> from pyclarify import DataFrame
>>> import pandas as pd
>>> df = pd.DataFrame(data={"INPUT_ID_1": [1, 2], "INPUT_ID_2": [3, 4]})
>>> df.index = ["2021-11-01T21:50:06Z", "2021-11-02T21:50:06Z"]
>>> DataFrame.from_pandas(df)
DataFrame(
    times=[
        datetime.datetime(2021, 11, 1, 21, 50, 6, tzinfo=datetime.timezone.utc),
        datetime.datetime(2021, 11, 2, 21, 50, 6, tzinfo=datetime.timezone.utc)],
    series={
        'INPUT_ID_1': [1.0, 2.0],
        'INPUT_ID_2': [3.0, 4.0]
    }
)
With a specific time column.
>>> from pyclarify import DataFrame
>>> import pandas as pd
>>> df = pd.DataFrame(data={
...     "INPUT_ID_1": [1, 2],
...     "INPUT_ID_2": [3, 4],
...     "timestamps": ["2021-11-01T21:50:06Z", "2021-11-02T21:50:06Z"]
... })
>>> DataFrame.from_pandas(df, time_col="timestamps")
DataFrame(
    times=[
        datetime.datetime(2021, 11, 1, 21, 50, 6, tzinfo=datetime.timezone.utc),
        datetime.datetime(2021, 11, 2, 21, 50, 6, tzinfo=datetime.timezone.utc)],
    series={
        'INPUT_ID_1': [1.0, 2.0],
        'INPUT_ID_2': [3.0, 4.0]
    }
)
- classmethod merge(data_frames) → DataFrame[source]
Merges two or more Clarify DataFrames. Overlapping signal names are mapped to a single series, and the timestamps of all data frames are combined into one time axis. A None value is inserted for any series that has no entry at a given timestamp.
- Parameters:
data_frames (List[DataFrame]) – A Clarify DataFrame or a list of Clarify DataFrames.
- Returns:
Merged data frame of all input data frames.
- Return type:
DataFrame
Example
Merging two data frames.
>>> from pyclarify import DataFrame
>>> df1 = DataFrame(
...     series={"INPUT_ID_1": [1, 2], "INPUT_ID_2": [3, 4]},
...     times=["2021-11-01T21:50:06Z", "2021-11-02T21:50:06Z"]
... )
>>> df2 = DataFrame(
...     series={"INPUT_ID_1": [5, 6], "INPUT_ID_3": [7, 8]},
...     times=["2021-11-01T21:50:06Z", "2021-11-03T21:50:06Z"]
... )
>>> merged_df = DataFrame.merge([df1, df2])
>>> merged_df.to_pandas()
                           INPUT_ID_2  INPUT_ID_1  INPUT_ID_3
2021-11-01 21:50:06+00:00         3.0         5.0         7.0
2021-11-02 21:50:06+00:00         4.0         2.0         NaN
2021-11-03 21:50:06+00:00         NaN         6.0         8.0
Warning
As the example above shows, when series have overlapping timestamps, values from the last data frame overwrite those from earlier ones.
>>> df1 = DataFrame(
...     series={"INPUT_ID_1": [1, 2]},
...     times=["2021-11-01T21:50:06Z", "2021-11-02T21:50:06Z"]
... )
>>> df2 = DataFrame(
...     series={"INPUT_ID_1": [5, 6]},
...     times=["2021-11-01T21:50:06Z", "2021-11-03T21:50:06Z"]
... )
>>> DataFrame.merge([df1, df2]).to_pandas()
                           INPUT_ID_1
2021-11-01 21:50:06+00:00         5.0  <--
2021-11-02 21:50:06+00:00         2.0
2021-11-03 21:50:06+00:00         6.0
>>> DataFrame.merge([df2, df1]).to_pandas()
                           INPUT_ID_1
2021-11-01 21:50:06+00:00         1.0  <--
2021-11-02 21:50:06+00:00         2.0
2021-11-03 21:50:06+00:00         6.0
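The merge semantics described for this method (one combined time axis, later frames winning on overlapping timestamps, None where a series has no entry) can be sketched in plain Python. This is only an illustration of the documented behavior and not pyclarify's implementation: `merge_frames` and the `(times, series)` tuple layout are hypothetical, and timestamps are assumed to be UTC ISO 8601 strings in one consistent format so that sorting them as strings is chronological.

```python
from typing import Dict, List, Optional, Tuple

Value = Optional[float]
# Hypothetical stand-in for a frame: (times, series keyed by input ID).
Frame = Tuple[List[str], Dict[str, List[Value]]]


def merge_frames(frames: List[Frame]) -> Frame:
    """Merge frames: union of timestamps, later frames win on overlap."""
    # cells[input_id][timestamp] -> value
    cells: Dict[str, Dict[str, Value]] = {}
    all_times = set()
    for times, series in frames:
        all_times.update(times)
        for input_id, values in series.items():
            column = cells.setdefault(input_id, {})
            for timestamp, value in zip(times, values):
                # A later frame overwrites any earlier value at this timestamp.
                column[timestamp] = value
    merged_times = sorted(all_times)
    # dict.get() returns None for timestamps a series never covered.
    merged_series = {
        input_id: [column.get(t) for t in merged_times]
        for input_id, column in cells.items()
    }
    return merged_times, merged_series


T1, T2, T3 = (
    "2021-11-01T21:50:06Z",
    "2021-11-02T21:50:06Z",
    "2021-11-03T21:50:06Z",
)
frame_1: Frame = ([T1, T2], {"INPUT_ID_1": [1, 2], "INPUT_ID_2": [3, 4]})
frame_2: Frame = ([T1, T3], {"INPUT_ID_1": [5, 6], "INPUT_ID_3": [7, 8]})
merged_times, merged_series = merge_frames([frame_1, frame_2])
```

With the same inputs as the first merge example, INPUT_ID_1 at the shared timestamp takes the value from the second frame, and reversing the list order reverses which frame wins.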
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'json_encoders': {<class 'datetime.datetime'>: <function time_to_string>}, 'ser_json_inf_nan': 'null'}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'series': FieldInfo(annotation=Dict[Annotated[str, FieldInfo(annotation=NoneType, required=True, metadata=[_PydanticGeneralMetadata(pattern='^^[a-zA-Z0-9-_:.#+/]{1,128}$')])], List[Union[float, int, NoneType]]], required=False), 'times': FieldInfo(annotation=List[datetime], required=False)}
Metadata about the fields defined on the model; a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
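The model_fields entry above shows that keys of series are constrained by a pattern: 1 to 128 characters drawn from letters, digits, and `-_:.#+/`. A minimal sketch of checking an input ID against that character set before building a frame follows. The helper name `is_valid_input_id` is hypothetical, and the regular expression is a transcription of the pattern shown above with the hyphen escaped for clarity; pyclarify itself enforces the constraint through Pydantic.

```python
import re

# Transcribed from the pattern in model_fields (assumption: same character
# set; the hyphen is escaped so it cannot be read as a range).
INPUT_ID_PATTERN = re.compile(r"[a-zA-Z0-9\-_:.#+/]{1,128}")


def is_valid_input_id(input_id: str) -> bool:
    """Return True if the whole string matches the input ID pattern."""
    # fullmatch() requires the entire string to match, so a trailing
    # newline or any disallowed character is rejected.
    return INPUT_ID_PATTERN.fullmatch(input_id) is not None


candidate_ids = ["INPUT_ID_1", "sensor/temp#1", "has space", ""]
valid_ids = [i for i in candidate_ids if is_valid_input_id(i)]
```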
- series: Dict[str, List[float | int | None]]
- times: List[datetime]
- to_pandas()[source]
Convert the instance into a pandas DataFrame.
- Returns:
The pandas DataFrame representing this instance.
- Return type:
pandas.DataFrame
Example
>>> from pyclarify import DataFrame
>>> data = DataFrame(
...     series={"INPUT_ID_1": [1, 2], "INPUT_ID_2": [3, 4]},
...     times=["2021-11-01T21:50:06Z", "2021-11-02T21:50:06Z"]
... )
>>> data.to_pandas()
                           INPUT_ID_1  INPUT_ID_2
2021-11-01 21:50:06+00:00         1.0         3.0
2021-11-02 21:50:06+00:00         2.0         4.0