ThermoMLDataSet¶
-
class
openff.evaluator.datasets.thermoml.
ThermoMLDataSet
[source]¶ A dataset of physical property measurements created from a ThermoML dataset.
Examples
For example, we can use the DOI 10.1016/j.jct.2005.03.012 as a key for retrieving the dataset from the ThermoML Archive:
>>> dataset = ThermoMLDataSet.from_doi('10.1016/j.jct.2005.03.012')
You can also specify multiple ThermoML Archive keys to create a dataset from multiple ThermoML files:
>>> thermoml_keys = ['10.1021/acs.jced.5b00365', '10.1021/acs.jced.5b00474']
>>> dataset = ThermoMLDataSet.from_doi(*thermoml_keys)
Methods
__init__
()Constructs a new ThermoMLDataSet object.
add_properties
(*physical_properties[, validate])Adds a physical property to the data set.
filter_by_components
(number_of_components)Filter the data set based on the number of components present in the substance the data points were collected for.
filter_by_elements
(*allowed_elements)Filters out those properties which were estimated for compounds which contain elements outside of those defined in allowed_elements.
filter_by_function
(filter_function)Filter the data set using a given filter function.
filter_by_phases
(phases)Filter the data set based on the phase of the property (e.g. liquid).
filter_by_pressure
(min_pressure, max_pressure)Filter the data set based on a minimum and maximum pressure.
filter_by_property_types
(*property_types)Filter the data set based on the type of property (e.g. Density).
filter_by_smiles
(*allowed_smiles)Filters out those properties which were estimated for compounds which do not appear in the allowed smiles list.
filter_by_temperature
(min_temperature, …)Filter the data set based on a minimum and maximum temperature.
filter_by_uncertainties
()Filters out those properties which don’t have their uncertainties reported.
from_doi
(*doi_list)Load a ThermoML data set from a list of DOIs
from_file
(*file_list)Load a ThermoML data set from a list of files
from_json
(file_path)Create this object from a JSON file.
from_url
(*url_list)Load a ThermoML data set from a list of URLs
from_xml
(xml, default_source)Load a ThermoML data set from an xml object.
json
([file_path, format])Creates a JSON representation of this class.
merge
(data_set[, validate])Merge another data set into the current one.
parse_json
(string_contents[, encoding])Parses a typed json string into the corresponding class structure.
properties_by_substance
(substance)A generator which may be used to loop over all of the properties which were measured for a particular substance.
properties_by_type
(property_type)A generator which may be used to loop over all of the properties of a particular type, e.g. all “Density” properties.
to_pandas
()Converts a PhysicalPropertyDataSet to a pandas.DataFrame object.
validate
()Checks to ensure that all properties within the set are valid physical property objects.
Attributes
properties
A list of all of the properties within this set.
property_types
The types of property within this data set.
registered_properties
sources
The sources from which the properties in this data set were gathered.
substances
The substances for which the properties in this data set were collected.
-
classmethod
from_doi
(*doi_list)[source]¶ Load a ThermoML data set from a list of DOIs
- Parameters
doi_list (str) – The list of DOIs to pull data from
- Returns
The loaded data set.
- Return type
ThermoMLDataSet
-
classmethod
from_url
(*url_list)[source]¶ Load a ThermoML data set from a list of URLs
- Parameters
url_list (str) – The list of URLs to pull data from
- Returns
The loaded data set.
- Return type
ThermoMLDataSet
-
classmethod
from_file
(*file_list)[source]¶ Load a ThermoML data set from a list of files
- Parameters
file_list (str) – The list of files to pull data from
- Returns
The loaded data set.
- Return type
ThermoMLDataSet
-
add_properties
(*physical_properties, validate=True)¶ Adds a physical property to the data set.
- Parameters
physical_properties (PhysicalProperty) – The physical property to add.
validate (bool) – Whether to validate the properties before adding them to the set.
-
filter_by_components
(number_of_components)¶ Filter the data set based on the number of components present in the substance the data points were collected for.
- Parameters
number_of_components (int) – The allowed number of components in the mixture.
Examples
Filter the dataset to only include pure substance properties.
>>> # Load in the data set of properties which will be used for comparisons
>>> from openff.evaluator.datasets.thermoml import ThermoMLDataSet
>>> data_set = ThermoMLDataSet.from_doi('10.1016/j.jct.2016.10.001')
>>>
>>> data_set.filter_by_components(number_of_components=1)
-
filter_by_elements
(*allowed_elements)¶ Filters out those properties which were estimated for compounds which contain elements outside of those defined in allowed_elements.
- Parameters
allowed_elements (str) – The symbols (e.g. C, H, Cl) of the elements to retain.
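The retention rule can be illustrated without the library. The sketch below assumes that a property is kept only when every compound in its substance is built solely from allowed elements. It uses a hypothetical hand-written lookup table (`ELEMENTS`) and a hypothetical helper (`only_allowed_elements`), neither of which is part of openff-evaluator.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical hand-written element sets keyed by SMILES; the library would
# derive these from each compound rather than from a table like this one.
ELEMENTS: Dict[str, FrozenSet[str]] = {
    "CCO": frozenset({"C", "H", "O"}),      # ethanol
    "ClCCl": frozenset({"C", "H", "Cl"}),   # dichloromethane
    "O": frozenset({"H", "O"}),             # water
}


def only_allowed_elements(substance: Tuple[str, ...], *allowed_elements: str) -> bool:
    """Return True if every compound contains only allowed elements (assumed rule)."""
    allowed = set(allowed_elements)
    return all(ELEMENTS[smiles] <= allowed for smiles in substance)


# Each tuple stands in for the substance of one property.
substances = [("CCO",), ("ClCCl",), ("CCO", "O"), ("ClCCl", "O")]
retained = [s for s in substances if only_allowed_elements(s, "C", "H", "O")]
```

Under this reading, any substance containing dichloromethane is dropped because Cl is not among the allowed symbols, including the mixture with water.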
-
filter_by_function
(filter_function)¶ Filter the data set using a given filter function.
- Parameters
filter_function (lambda) – The filter function.
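The parameter description does not say what the filter function receives or returns. One plausible reading, assumed here and not confirmed by this page, is that it is called once per property and returns True for the properties to keep. The sketch below shows that predicate pattern with a hypothetical stand-in record (`Measurement`) and a plain list in place of the library's classes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Measurement:
    """Hypothetical stand-in for a physical property; not an openff-evaluator class."""

    property_type: str
    temperature_k: float
    value: float


def keep_matching(
    measurements: List[Measurement],
    filter_function: Callable[[Measurement], bool],
) -> List[Measurement]:
    """Return only the entries for which the predicate is true (assumed semantics)."""
    return [m for m in measurements if filter_function(m)]


measurements = [
    Measurement("Density", 298.15, 0.997),
    Measurement("Density", 350.00, 0.974),
    Measurement("DielectricConstant", 298.15, 78.4),
]

# Keep only densities measured below 300 K.
kept = keep_matching(
    measurements,
    lambda m: m.property_type == "Density" and m.temperature_k < 300.0,
)
```

A custom predicate like this is useful when a criterion combines several conditions that the dedicated filter methods handle only one at a time.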
-
filter_by_phases
(phases)¶ Filter the data set based on the phase of the property (e.g. liquid).
- Parameters
phases (PropertyPhase) – The phase of property which should be retained.
Examples
Filter the dataset to only include liquid properties.
>>> # Load in the data set of properties which will be used for comparisons
>>> from openff.evaluator.datasets.thermoml import ThermoMLDataSet
>>> data_set = ThermoMLDataSet.from_doi('10.1016/j.jct.2016.10.001')
>>>
>>> from openff.evaluator.datasets import PropertyPhase
>>> data_set.filter_by_phases(PropertyPhase.Liquid)
-
filter_by_pressure
(min_pressure, max_pressure)¶ Filter the data set based on a minimum and maximum pressure.
- Parameters
min_pressure (pint.Quantity) – The minimum pressure.
max_pressure (pint.Quantity) – The maximum pressure.
Examples
Filter the dataset to only include properties measured between 70-150 kPa.
>>> # Load in the data set of properties which will be used for comparisons
>>> from openff.evaluator.datasets.thermoml import ThermoMLDataSet
>>> data_set = ThermoMLDataSet.from_doi('10.1016/j.jct.2016.10.001')
>>>
>>> from openff.evaluator import unit
>>> data_set.filter_by_pressure(min_pressure=70*unit.kilopascal, max_pressure=150*unit.kilopascal)
-
filter_by_property_types
(*property_types)¶ Filter the data set based on the type of property (e.g. Density).
- Parameters
property_types (PropertyType or str) – The type of property which should be retained.
Examples
Filter the dataset to only contain densities and static dielectric constants.
>>> # Load in the data set of properties which will be used for comparisons
>>> from openff.evaluator.datasets.thermoml import ThermoMLDataSet
>>> data_set = ThermoMLDataSet.from_doi('10.1016/j.jct.2016.10.001')
>>>
>>> # Filter the dataset to only include densities and dielectric constants.
>>> from openff.evaluator.properties import Density, DielectricConstant
>>> data_set.filter_by_property_types(Density, DielectricConstant)
or
>>> data_set.filter_by_property_types('Density', 'DielectricConstant')
-
filter_by_smiles
(*allowed_smiles)¶ Filters out those properties which were estimated for compounds which do not appear in the allowed smiles list.
- Parameters
allowed_smiles (str) – The smiles identifiers of the compounds to keep after filtering.
-
filter_by_temperature
(min_temperature, max_temperature)¶ Filter the data set based on a minimum and maximum temperature.
- Parameters
min_temperature (pint.Quantity) – The minimum temperature.
max_temperature (pint.Quantity) – The maximum temperature.
Examples
Filter the dataset to only include properties measured between 130-260 K.
>>> # Load in the data set of properties which will be used for comparisons
>>> from openff.evaluator.datasets.thermoml import ThermoMLDataSet
>>> data_set = ThermoMLDataSet.from_doi('10.1016/j.jct.2016.10.001')
>>>
>>> from openff.evaluator import unit
>>> data_set.filter_by_temperature(min_temperature=130*unit.kelvin, max_temperature=260*unit.kelvin)
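This page does not state whether the minimum and maximum bounds are themselves retained. The sketch below assumes an inclusive range and uses plain floats in kelvin instead of pint quantities. `within_range` is a hypothetical helper, not a library function, and the same pattern would apply to a pressure range.

```python
from typing import List


def within_range(value: float, minimum: float, maximum: float) -> bool:
    """Inclusive range check; inclusivity is an assumption, not documented here."""
    return minimum <= value <= maximum


# Temperatures in kelvin standing in for the state of five measurements.
temperatures_k: List[float] = [120.0, 130.0, 200.0, 260.0, 298.15]
retained = [t for t in temperatures_k if within_range(t, 130.0, 260.0)]
```

If measurements at exactly the boundary values matter for a study, it is worth checking the behaviour on a small data set before relying on it.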
-
filter_by_uncertainties
()¶ Filters out those properties which don’t have their uncertainties reported.
-
classmethod
from_json
(file_path)¶ Create this object from a JSON file.
- Parameters
file_path (str) – The path to load the JSON from.
- Returns
The parsed class.
- Return type
cls
-
classmethod
from_xml
(xml, default_source)[source]¶ Load a ThermoML data set from an xml object.
- Parameters
- Returns
The loaded ThermoML data set.
- Return type
ThermoMLDataSet
-
json
(file_path=None, format=False)¶ Creates a JSON representation of this class.
-
merge
(data_set, validate=True)¶ Merge another data set into the current one.
- Parameters
data_set (PhysicalPropertyDataSet) – The secondary data set to merge into this one.
validate (bool) – Whether to validate the other data set before merging.
-
classmethod
parse_json
(string_contents, encoding='utf8')¶ Parses a typed json string into the corresponding class structure.
-
property
properties
¶ A list of all of the properties within this set.
- Type
tuple of PhysicalProperty
-
properties_by_substance
(substance)¶ A generator which may be used to loop over all of the properties which were measured for a particular substance.
- Parameters
substance (Substance) – The substance of interest.
- Returns
- Return type
generator of PhysicalProperty
-
properties_by_type
(property_type)¶ A generator which may be used to loop over all of properties of a particular type, e.g. all “Density” properties.
- Parameters
property_type (str or type of PhysicalProperty) – The type of property of interest. This may either be the string class name of the property or the class type.
- Returns
- Return type
generator of PhysicalProperty
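Accepting either a class name or a class can be handled by normalising both forms to a name before comparing. The sketch below shows that idea with hypothetical stand-in classes and a hypothetical generator (`properties_of_type`). It is an illustration of the calling convention described above, not the library's implementation.

```python
from typing import Iterable, Iterator, Type, Union


class Density:
    """Hypothetical stand-in property class."""

    def __init__(self, value: float) -> None:
        self.value = value


class DielectricConstant:
    """Hypothetical stand-in property class."""

    def __init__(self, value: float) -> None:
        self.value = value


def properties_of_type(
    properties: Iterable[object],
    property_type: Union[str, Type],
) -> Iterator[object]:
    """Lazily yield the properties whose class name matches the requested type."""
    # Normalise a class to its name so both call styles share one comparison.
    if isinstance(property_type, str):
        type_name = property_type
    else:
        type_name = property_type.__name__

    for physical_property in properties:
        if type(physical_property).__name__ == type_name:
            yield physical_property


properties = [Density(0.997), DielectricConstant(78.4), Density(0.789)]
by_name = [p.value for p in properties_of_type(properties, "Density")]
by_class = [p.value for p in properties_of_type(properties, Density)]
```

Because the result is a generator, nothing is filtered until it is iterated, and it can only be consumed once.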
-
property
property_types
¶ The types of property within this data set.
- Type
set of str
-
property
sources
¶ The sources from which the properties in this data set were gathered.
- Type
set of Source
-
property
substances
¶ The substances for which the properties in this data set were collected.
- Type
set of Substance
-
to_pandas
()¶ Converts a PhysicalPropertyDataSet to a pandas.DataFrame object with columns of
‘Id’
‘Temperature (K)’
‘Pressure (kPa)’
‘Phase’
‘N Components’
‘Component 1’
‘Role 1’
‘Mole Fraction 1’
‘Exact Amount 1’
…
‘Component N’
‘Role N’
‘Mole Fraction N’
‘Exact Amount N’
‘<Property 1> Value (<default unit>)’
‘<Property 1> Uncertainty / (<default unit>)’
…
‘<Property N> Value / (<default unit>)’
‘<Property N> Uncertainty / (<default unit>)’
‘Source’
where ‘Component X’ is a column containing the smiles representation of component X.
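The per-component group of four columns repeats once for each component up to N. The sketch below builds those column names for a given N using a hypothetical helper (`component_columns`). The property value and uncertainty columns are left out because their names depend on the property types and default units present in the set.

```python
from typing import List


def component_columns(n_components: int) -> List[str]:
    """Build the repeated per-component column names for components 1..N."""
    columns: List[str] = []
    for index in range(1, n_components + 1):
        columns += [
            f"Component {index}",
            f"Role {index}",
            f"Mole Fraction {index}",
            f"Exact Amount {index}",
        ]
    return columns


# Fixed leading columns, followed by the groups for a two-component data set.
header = ["Id", "Temperature (K)", "Pressure (kPa)", "Phase", "N Components"]
header += component_columns(2)
```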
- Returns
The created data frame.
- Return type
pandas.DataFrame
-
validate
()¶ Checks to ensure that all properties within the set are valid physical property objects.
-
classmethod