Smart Datasets (🧪Beta)

albert.resources.smart_datasets

Attributes:

Name Type Description
SmartDatasetVariable

SmartDatasetVariable

SmartDatasetVariable = Annotated[
    MaterialAmountVariable
    | ParameterVariable
    | MoleculeVariable
    | PropertyVariable,
    Field(discriminator="type"),
]
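
The `Field(discriminator="type")` annotation means the `type` value of each variable payload decides which of the four models is used to validate it. A minimal, library-free sketch of that dispatch follows. The `VARIABLE_KINDS` table and the `variable_kind` helper are illustrative names, not part of `albert`, and the models are referred to by name only.

```python
# Illustrative stand-in for the dispatch a "type" discriminator performs.
# The tags are the `type` literals of the four variable models.
VARIABLE_KINDS = {
    "material_amount": "MaterialAmountVariable",
    "parameter": "ParameterVariable",
    "molecule": "MoleculeVariable",
    "property": "PropertyVariable",
}


def variable_kind(payload: dict) -> str:
    """Return the name of the model that the payload's `type` tag selects."""
    tag = payload.get("type")
    if tag not in VARIABLE_KINDS:
        raise ValueError(f"unknown variable type: {tag!r}")
    return VARIABLE_KINDS[tag]


# Hypothetical payload; the key and name values are placeholders.
example = {"key": "k1", "name": "Viscosity", "type": "property", "data_type": "numeric"}
print(variable_kind(example))
```

A payload with a missing or unrecognized `type` has no model to validate against, which this sketch reports as a `ValueError`.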

SmartDatasetBuildState

Bases: str, Enum

The build state of a smart dataset.

Attributes:

Name Type Description
BUILDING
READY
FAILED

BUILDING

BUILDING = 'building'

READY

READY = 'ready'

FAILED

FAILED = 'failed'
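
Because the enum subclasses `str`, a raw state string converts directly to a member and compares equal to it. The sketch below uses a local copy of the three documented values; `BuildState` and `is_terminal` are illustrative names, and treating `READY` and `FAILED` as the only terminal states is an assumption drawn from the member names.

```python
from enum import Enum


class BuildState(str, Enum):
    """Local copy of the three documented values, for illustration only."""

    BUILDING = "building"
    READY = "ready"
    FAILED = "failed"


def is_terminal(state: BuildState) -> bool:
    """Assumption: a build is finished once it is no longer BUILDING."""
    return state is not BuildState.BUILDING


# A raw string converts to the matching member.
state = BuildState("ready")
print(state is BuildState.READY, is_terminal(state))
```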

SmartDatasetScope

Bases: BaseAlbertModel

Represents the scope of a smart dataset.

Attributes:

Name Type Description
project_ids list[ProjectId]

List of project IDs.

target_ids list[TargetId]

List of target IDs.

sheet_ids list[WorksheetId] | None

List of worksheet IDs. If None, all worksheets in the projects will be used.

target_parent_ids dict[TargetId, ProjectId] | None

Optional mapping from target ID to a parent project ID. When set, the target inherits its ACL policy from the referenced project.

Show JSON schema:
{
  "description": "Represents the scope of a smart dataset.\n\nAttributes\n----------\nproject_ids : list[ProjectId]\n    List of project IDs.\ntarget_ids : list[TargetId]\n    List of target IDs.\nsheet_ids : list[WorksheetId] | None\n    List of worksheet IDs. If None, all worksheets in the projects will be used.\ntarget_parent_ids : dict[TargetId, ProjectId] | None\n    Optional mapping from target ID to a parent project ID. When set, the target\n    inherits its ACL policy from the referenced project.",
  "properties": {
    "projectIds": {
      "items": {
        "type": "string"
      },
      "title": "Projectids",
      "type": "array"
    },
    "targetIds": {
      "items": {
        "type": "string"
      },
      "title": "Targetids",
      "type": "array"
    },
    "sheetIds": {
      "anyOf": [
        {
          "items": {
            "type": "string"
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "title": "Sheetids"
    },
    "targetParentIds": {
      "anyOf": [
        {
          "additionalProperties": {
            "type": "string"
          },
          "type": "object"
        },
        {
          "type": "null"
        }
      ],
      "title": "Targetparentids"
    }
  },
  "title": "SmartDatasetScope",
  "type": "object"
}

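Fields:

project_ids (list[ProjectId])
target_ids (list[TargetId])
sheet_ids (list[WorksheetId] | None)
target_parent_ids (dict[TargetId, ProjectId] | None)

Validators:

filter_invalid_sheet_ids → sheet_ids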

project_ids

project_ids: list[ProjectId]

target_ids

target_ids: list[TargetId]

sheet_ids

sheet_ids: list[WorksheetId] | None = None

target_parent_ids

target_parent_ids: dict[TargetId, ProjectId] | None

filter_invalid_sheet_ids

filter_invalid_sheet_ids(v)
Source code in src/albert/resources/smart_datasets.py
@field_validator("sheet_ids", mode="before")
@classmethod
def filter_invalid_sheet_ids(cls, v):
    if v is None:
        return v
    valid = [sid for sid in v if isinstance(sid, str) and sid.upper().startswith("WKS")]
    return valid or None
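
The validator's behavior is easiest to see on a few inputs. The sketch below lifts the same logic into a standalone function (`filter_sheet_ids` is an illustrative name) and uses placeholder IDs. The prefix check is case-insensitive and the original casing is kept.

```python
def filter_sheet_ids(v):
    """Same logic as the validator body, lifted out of the model for illustration."""
    if v is None:
        return v
    valid = [sid for sid in v if isinstance(sid, str) and sid.upper().startswith("WKS")]
    return valid or None


# Placeholder IDs: only strings starting with "WKS" (any casing) survive.
print(filter_sheet_ids(["WKS1", "wks2", "PRO3", 42]))
# A list with no valid entries collapses to None, as does an empty list.
print(filter_sheet_ids(["PRO3"]))
print(filter_sheet_ids([]))
```

Since a `sheet_ids` value of `None` means all worksheets in the projects are used, a list containing only invalid IDs ends up widening the scope rather than narrowing it to nothing.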

SmartDataset

Bases: BaseResource

Represents a smart dataset entity.

Attributes:

Name Type Description
id SmartDatasetId | None

The unique identifier of the smart dataset.

parent_id ProjectId | None

The ID of the parent project this smart dataset belongs to. When set, the smart dataset inherits its ACL policy from the referenced project.

build_state SmartDatasetBuildState | None

The build state of the smart dataset.

scope SmartDatasetScope | None

The dataset scope containing project, target, and sheet IDs.

schema_ dict | None

The dataset schema.

storage_key str | None

The storage key for the dataset.

type

type: Literal['smart'] = 'smart'

id

id: SmartDatasetId | None = Field(default=None)

parent_id

parent_id: ProjectId | None = Field(
    default=None, alias="parentId"
)

build_state

build_state: SmartDatasetBuildState | None = Field(
    default=None, alias="buildState"
)

scope

scope: SmartDatasetScope | None = Field(default=None)

schema_

schema_: dict | None = Field(default=None, alias='schema')

storage_key

storage_key: str | None = Field(
    default=None, alias="storageKey"
)
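
Several fields are declared with camelCase aliases, so the keys in a serialized payload differ from the Python attribute names. A small sketch of that renaming, using only the aliases listed above, follows. `ALIASES` and `to_field_names` are illustrative names, and the payload values are placeholders.

```python
# Aliases as declared on the SmartDataset fields above.
ALIASES = {
    "parentId": "parent_id",
    "buildState": "build_state",
    "schema": "schema_",
    "storageKey": "storage_key",
}


def to_field_names(payload: dict) -> dict:
    """Rename aliased keys to their attribute names; leave other keys untouched."""
    return {ALIASES.get(key, key): value for key, value in payload.items()}


# Hypothetical payload; the ID and storage key are placeholders.
payload = {"id": "SDS1", "type": "smart", "buildState": "ready", "storageKey": "some/key"}
print(to_field_names(payload))
```

The trailing underscore in `schema_` presumably avoids a clash with the `schema` name that pydantic models already define; the alias keeps the serialized key as plain `schema`.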

SmartDatasetAggregateBy

Bases: str, Enum

The aggregation level for smart dataset experiment data.

Methods:

Name Description
to_api_value
from_api_value

Attributes:

Name Type Description
INV
LOT
WFL
PTD

INV

INV = 'inv'

LOT

LOT = 'lot'

WFL

WFL = 'wfl'

PTD

PTD = 'ptd'

to_api_value

to_api_value() -> str
Source code in src/albert/resources/smart_datasets.py
def to_api_value(self) -> str:
    return {
        SmartDatasetAggregateBy.INV: "inventory",
        SmartDatasetAggregateBy.LOT: "lot",
        SmartDatasetAggregateBy.WFL: "workflow",
        SmartDatasetAggregateBy.PTD: "measurement",
    }[self.value]

from_api_value

from_api_value(value: str) -> SmartDatasetAggregateBy
Source code in src/albert/resources/smart_datasets.py
@staticmethod
def from_api_value(value: str) -> "SmartDatasetAggregateBy":
    return {
        "inventory": SmartDatasetAggregateBy.INV,
        "lot": SmartDatasetAggregateBy.LOT,
        "workflow": SmartDatasetAggregateBy.WFL,
        "measurement": SmartDatasetAggregateBy.PTD,
    }[value]
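
The two methods are inverses: the short enum values (`inv`, `lot`, `wfl`, `ptd`) map to the long-form API names and back. The sketch below reproduces both tables on a local copy of the enum (`AggregateBy`, `TO_API` and `FROM_API` are illustrative names) and checks the round trip.

```python
from enum import Enum


class AggregateBy(str, Enum):
    """Local copy of the documented members, for illustration only."""

    INV = "inv"
    LOT = "lot"
    WFL = "wfl"
    PTD = "ptd"


# Long-form names taken from the two mapping tables above.
TO_API = {
    AggregateBy.INV: "inventory",
    AggregateBy.LOT: "lot",
    AggregateBy.WFL: "workflow",
    AggregateBy.PTD: "measurement",
}
FROM_API = {api: member for member, api in TO_API.items()}

for member in AggregateBy:
    # Converting to the API name and back returns the same member.
    assert FROM_API[TO_API[member]] is member

print(TO_API[AggregateBy.PTD])
```

Members of a `str`-based enum hash and compare like their string values, which is why `to_api_value` can index a member-keyed dict with `self.value`. `from_api_value` does a plain dict lookup, so a string outside the four API names raises `KeyError`.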

SmartDatasetVariableDataType

Bases: str, Enum

The data type of a smart dataset variable.

Attributes:

Name Type Description
NUMERIC
CATEGORICAL
MOLECULAR
BOOLEAN

NUMERIC

NUMERIC = 'numeric'

CATEGORICAL

CATEGORICAL = 'categorical'

MOLECULAR

MOLECULAR = 'molecular'

BOOLEAN

BOOLEAN = 'boolean'

SmartDatasetRecordIdentifier

Bases: BaseAlbertModel

An identifier for a record in a smart dataset experiment data matrix.

The same shape is used across all aggregation levels (inventory, material, experiment, measurement); fields that don't apply at a given level are left unset.

Attributes:

Name Type Description
type str

The identifier type (e.g., albert_inventory, albert_material).

inventory_id str

The inventory ID of the record.

key str | None

The unique key of the identifier.

lot_id str | None

The lot ID, if applicable.

workflow_interval str | None

The workflow interval, if applicable.

task_id str | None

The task ID, if applicable.

property_data_id str | None

The property data ID, if applicable.

Show JSON schema:
{
  "description": "An identifier for a record in a smart dataset experiment data matrix.\n\nThe same shape is used across all aggregation levels (inventory, material,\nexperiment, measurement); fields that don't apply at a given level are left\nunset.\n\nAttributes\n----------\ntype : str\n    The identifier type (e.g., ``albert_inventory``, ``albert_material``).\ninventory_id : str\n    The inventory ID of the record.\nkey : str | None\n    The unique key of the identifier.\nlot_id : str | None\n    The lot ID, if applicable.\nworkflow_interval : str | None\n    The workflow interval, if applicable.\ntask_id : str | None\n    The task ID, if applicable.\nproperty_data_id : str | None\n    The property data ID, if applicable.",
  "properties": {
    "type": {
      "title": "Type",
      "type": "string"
    },
    "inventory_id": {
      "title": "Inventory Id",
      "type": "string"
    },
    "key": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "title": "Key"
    },
    "lot_id": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "title": "Lot Id"
    },
    "workflow_interval": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "title": "Workflow Interval"
    },
    "task_id": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "title": "Task Id"
    },
    "property_data_id": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "title": "Property Data Id"
    }
  },
  "required": [
    "type",
    "inventory_id"
  ],
  "title": "SmartDatasetRecordIdentifier",
  "type": "object"
}

Fields:

type

type: str

inventory_id

inventory_id: str

key

key: str | None = None

lot_id

lot_id: str | None = None

workflow_interval

workflow_interval: str | None = None

task_id

task_id: str | None = None

property_data_id

property_data_id: str | None = None
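
Only `type` and `inventory_id` are required; every other field defaults to `None` and stays unset when it does not apply. The sketch below mirrors the fields in a plain dataclass to show that shape. `RecordIdentifier` and `set_fields` are illustrative names, the IDs are placeholders, and which optional fields are populated at which aggregation level is an assumption.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class RecordIdentifier:
    """Dataclass mirror of the documented fields, for illustration only."""

    type: str
    inventory_id: str
    key: Optional[str] = None
    lot_id: Optional[str] = None
    workflow_interval: Optional[str] = None
    task_id: Optional[str] = None
    property_data_id: Optional[str] = None


def set_fields(identifier: RecordIdentifier) -> dict:
    """Return only the fields that carry a value."""
    return {name: value for name, value in asdict(identifier).items() if value is not None}


# Placeholder IDs; the lot-level example assumes lot_id is the extra field filled in.
inventory_level = RecordIdentifier(type="albert_inventory", inventory_id="INV1")
lot_level = RecordIdentifier(type="albert_inventory", inventory_id="INV1", lot_id="LOT1")
print(set_fields(inventory_level))
print(set_fields(lot_level))
```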

MaterialAmountVariable

Bases: _BaseVariable

A material amount variable.

Show JSON schema:
{
  "description": "A material amount variable.",
  "properties": {
    "key": {
      "title": "Key",
      "type": "string"
    },
    "name": {
      "title": "Name",
      "type": "string"
    },
    "type": {
      "const": "material_amount",
      "default": "material_amount",
      "title": "Type",
      "type": "string"
    },
    "data_type": {
      "const": "numeric",
      "default": "numeric",
      "title": "Data Type",
      "type": "string"
    }
  },
  "required": [
    "key",
    "name"
  ],
  "title": "MaterialAmountVariable",
  "type": "object"
}

Fields:

type

type: Literal['material_amount'] = 'material_amount'

data_type

data_type: Literal[NUMERIC] = NUMERIC

ParameterVariable

Bases: _BaseVariable

A parameter variable.

Show JSON schema:
{
  "$defs": {
    "SmartDatasetVariableDataType": {
      "description": "The data type of a smart dataset variable.",
      "enum": [
        "numeric",
        "categorical",
        "molecular",
        "boolean"
      ],
      "title": "SmartDatasetVariableDataType",
      "type": "string"
    }
  },
  "description": "A parameter variable.",
  "properties": {
    "key": {
      "title": "Key",
      "type": "string"
    },
    "name": {
      "title": "Name",
      "type": "string"
    },
    "type": {
      "const": "parameter",
      "default": "parameter",
      "title": "Type",
      "type": "string"
    },
    "data_type": {
      "$ref": "#/$defs/SmartDatasetVariableDataType"
    },
    "sources": {
      "items": {
        "enum": [
          "property",
          "batch",
          "process_design"
        ],
        "type": "string"
      },
      "title": "Sources",
      "type": "array"
    }
  },
  "required": [
    "key",
    "name",
    "data_type"
  ],
  "title": "ParameterVariable",
  "type": "object"
}

Fields:

type

type: Literal['parameter'] = 'parameter'

data_type

data_type: SmartDatasetVariableDataType

sources

sources: list[
    Literal["property", "batch", "process_design"]
]

MoleculeVariable

Bases: _BaseVariable

A molecule variable.

Show JSON schema:
{
  "description": "A molecule variable.",
  "properties": {
    "key": {
      "title": "Key",
      "type": "string"
    },
    "name": {
      "title": "Name",
      "type": "string"
    },
    "type": {
      "const": "molecule",
      "default": "molecule",
      "title": "Type",
      "type": "string"
    },
    "data_type": {
      "const": "molecular",
      "default": "molecular",
      "title": "Data Type",
      "type": "string"
    }
  },
  "required": [
    "key",
    "name"
  ],
  "title": "MoleculeVariable",
  "type": "object"
}

Fields:

type

type: Literal['molecule'] = 'molecule'

data_type

data_type: Literal[MOLECULAR] = MOLECULAR

PropertyVariable

Bases: _BaseVariable

A property variable.

Show JSON schema:
{
  "$defs": {
    "SmartDatasetVariableDataType": {
      "description": "The data type of a smart dataset variable.",
      "enum": [
        "numeric",
        "categorical",
        "molecular",
        "boolean"
      ],
      "title": "SmartDatasetVariableDataType",
      "type": "string"
    }
  },
  "description": "A property variable.",
  "properties": {
    "key": {
      "title": "Key",
      "type": "string"
    },
    "name": {
      "title": "Name",
      "type": "string"
    },
    "type": {
      "const": "property",
      "default": "property",
      "title": "Type",
      "type": "string"
    },
    "data_type": {
      "$ref": "#/$defs/SmartDatasetVariableDataType"
    }
  },
  "required": [
    "key",
    "name",
    "data_type"
  ],
  "title": "PropertyVariable",
  "type": "object"
}

Fields:

type

type: Literal['property'] = 'property'

data_type

data_type: SmartDatasetVariableDataType

SmartDatasetData

Bases: BaseAlbertModel

The experiment data matrix for a smart dataset.

Attributes:

Name Type Description
aggregate_by SmartDatasetAggregateBy

The aggregation level of the returned data.

identifiers list[SmartDatasetRecordIdentifier]

The identifier metadata for each row index entry.

variables list[SmartDatasetVariable]

The variable metadata for each column entry.

data OrientTightDataFrame

The experiment data values.

uncertainty OrientTightDataFrame | None

The associated uncertainty values, if available.

counts OrientTightDataFrame | None

The associated observation counts, if available.

Show JSON schema:
{
  "$defs": {
    "MaterialAmountVariable": {
      "description": "A material amount variable.",
      "properties": {
        "key": {
          "title": "Key",
          "type": "string"
        },
        "name": {
          "title": "Name",
          "type": "string"
        },
        "type": {
          "const": "material_amount",
          "default": "material_amount",
          "title": "Type",
          "type": "string"
        },
        "data_type": {
          "const": "numeric",
          "default": "numeric",
          "title": "Data Type",
          "type": "string"
        }
      },
      "required": [
        "key",
        "name"
      ],
      "title": "MaterialAmountVariable",
      "type": "object"
    },
    "MoleculeVariable": {
      "description": "A molecule variable.",
      "properties": {
        "key": {
          "title": "Key",
          "type": "string"
        },
        "name": {
          "title": "Name",
          "type": "string"
        },
        "type": {
          "const": "molecule",
          "default": "molecule",
          "title": "Type",
          "type": "string"
        },
        "data_type": {
          "const": "molecular",
          "default": "molecular",
          "title": "Data Type",
          "type": "string"
        }
      },
      "required": [
        "key",
        "name"
      ],
      "title": "MoleculeVariable",
      "type": "object"
    },
    "ParameterVariable": {
      "description": "A parameter variable.",
      "properties": {
        "key": {
          "title": "Key",
          "type": "string"
        },
        "name": {
          "title": "Name",
          "type": "string"
        },
        "type": {
          "const": "parameter",
          "default": "parameter",
          "title": "Type",
          "type": "string"
        },
        "data_type": {
          "$ref": "#/$defs/SmartDatasetVariableDataType"
        },
        "sources": {
          "items": {
            "enum": [
              "property",
              "batch",
              "process_design"
            ],
            "type": "string"
          },
          "title": "Sources",
          "type": "array"
        }
      },
      "required": [
        "key",
        "name",
        "data_type"
      ],
      "title": "ParameterVariable",
      "type": "object"
    },
    "PropertyVariable": {
      "description": "A property variable.",
      "properties": {
        "key": {
          "title": "Key",
          "type": "string"
        },
        "name": {
          "title": "Name",
          "type": "string"
        },
        "type": {
          "const": "property",
          "default": "property",
          "title": "Type",
          "type": "string"
        },
        "data_type": {
          "$ref": "#/$defs/SmartDatasetVariableDataType"
        }
      },
      "required": [
        "key",
        "name",
        "data_type"
      ],
      "title": "PropertyVariable",
      "type": "object"
    },
    "SmartDatasetAggregateBy": {
      "description": "The aggregation level for smart dataset experiment data.",
      "enum": [
        "inv",
        "lot",
        "wfl",
        "ptd"
      ],
      "title": "SmartDatasetAggregateBy",
      "type": "string"
    },
    "SmartDatasetRecordIdentifier": {
      "description": "An identifier for a record in a smart dataset experiment data matrix.\n\nThe same shape is used across all aggregation levels (inventory, material,\nexperiment, measurement); fields that don't apply at a given level are left\nunset.\n\nAttributes\n----------\ntype : str\n    The identifier type (e.g., ``albert_inventory``, ``albert_material``).\ninventory_id : str\n    The inventory ID of the record.\nkey : str | None\n    The unique key of the identifier.\nlot_id : str | None\n    The lot ID, if applicable.\nworkflow_interval : str | None\n    The workflow interval, if applicable.\ntask_id : str | None\n    The task ID, if applicable.\nproperty_data_id : str | None\n    The property data ID, if applicable.",
      "properties": {
        "type": {
          "title": "Type",
          "type": "string"
        },
        "inventory_id": {
          "title": "Inventory Id",
          "type": "string"
        },
        "key": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Key"
        },
        "lot_id": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Lot Id"
        },
        "workflow_interval": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Workflow Interval"
        },
        "task_id": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Task Id"
        },
        "property_data_id": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Property Data Id"
        }
      },
      "required": [
        "type",
        "inventory_id"
      ],
      "title": "SmartDatasetRecordIdentifier",
      "type": "object"
    },
    "SmartDatasetVariableDataType": {
      "description": "The data type of a smart dataset variable.",
      "enum": [
        "numeric",
        "categorical",
        "molecular",
        "boolean"
      ],
      "title": "SmartDatasetVariableDataType",
      "type": "string"
    }
  },
  "description": "The experiment data matrix for a smart dataset.\n\nAttributes\n----------\naggregate_by : SmartDatasetAggregateBy\n    The aggregation level of the returned data.\nidentifiers : list[SmartDatasetRecordIdentifier]\n    The identifier metadata for each row index entry.\nvariables : list[SmartDatasetVariable]\n    The variable metadata for each column entry.\ndata : OrientTightDataFrame\n    The experiment data values.\nuncertainty : OrientTightDataFrame | None\n    The associated uncertainty values, if available.\ncounts : OrientTightDataFrame | None\n    The associated observation counts, if available.",
  "properties": {
    "aggregate_by": {
      "$ref": "#/$defs/SmartDatasetAggregateBy"
    },
    "identifiers": {
      "items": {
        "$ref": "#/$defs/SmartDatasetRecordIdentifier"
      },
      "title": "Identifiers",
      "type": "array"
    },
    "variables": {
      "items": {
        "discriminator": {
          "mapping": {
            "material_amount": "#/$defs/MaterialAmountVariable",
            "molecule": "#/$defs/MoleculeVariable",
            "parameter": "#/$defs/ParameterVariable",
            "property": "#/$defs/PropertyVariable"
          },
          "propertyName": "type"
        },
        "oneOf": [
          {
            "$ref": "#/$defs/MaterialAmountVariable"
          },
          {
            "$ref": "#/$defs/ParameterVariable"
          },
          {
            "$ref": "#/$defs/MoleculeVariable"
          },
          {
            "$ref": "#/$defs/PropertyVariable"
          }
        ]
      },
      "title": "Variables",
      "type": "array"
    },
    "data": {
      "title": "Data",
      "type": "object"
    },
    "uncertainty": {
      "anyOf": [
        {
          "type": "object"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "title": "Uncertainty"
    },
    "counts": {
      "anyOf": [
        {
          "type": "object"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "title": "Counts"
    }
  },
  "required": [
    "aggregate_by",
    "data"
  ],
  "title": "SmartDatasetData",
  "type": "object"
}

Fields:

aggregate_by

aggregate_by: SmartDatasetAggregateBy

identifiers

variables

data

data: OrientTightDataFrame

uncertainty

uncertainty: OrientTightDataFrame | None = None

counts

counts: OrientTightDataFrame | None = None
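
The name `OrientTightDataFrame` suggests the layout produced by pandas' `DataFrame.to_dict(orient="tight")`: a dict with `index`, `columns`, `data`, `index_names` and `column_names` keys. That layout is an assumption here, not something this page confirms. Under that assumption, a library-free sketch of turning such a dict into per-row mappings follows. `tight_to_rows` is an illustrative name and the labels and values are made up.

```python
# Assumed layout: the dict produced by pandas' DataFrame.to_dict(orient="tight").
tight = {
    "index": ["row-0", "row-1"],
    "columns": ["var-a", "var-b"],
    "data": [[1.0, 2.0], [3.0, None]],
    "index_names": [None],
    "column_names": [None],
}


def tight_to_rows(frame: dict) -> dict:
    """Map each row label to a {column label: value} dict."""
    return {
        label: dict(zip(frame["columns"], row))
        for label, row in zip(frame["index"], frame["data"])
    }


for label, values in tight_to_rows(tight).items():
    print(label, values)
```

With pandas available, `pandas.DataFrame.from_dict(tight, orient="tight")` rebuilds a DataFrame from the same layout. Per the attribute descriptions, `identifiers` carries metadata for the row index entries and `variables` for the column entries; how they are matched to the labels (by key or by position) is not stated on this page.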