Datasets
counts(*, version=None, per_platform=False)
Retrieve the number of dataset assets in the metadata catalogue.
All parameters must be specified by name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
version | str \| None | The version of the endpoint (default is None). | None |
per_platform | bool | Whether to list counts per platform (default is False). | False |
Returns:
Type | Description |
---|---|
int \| dict[str, int] | The number of dataset assets in the metadata catalogue. If per_platform is True, a dictionary with platform names as keys and the number of dataset assets from each platform as values. |
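Because counts returns either an int or a dict depending on per_platform, code that accepts both shapes may want to normalise them. The sketch below is a hypothetical helper, not part of the library. The platform names and numbers are invented sample data, and summing per-platform values only equals the overall total if every asset is counted under exactly one platform (an assumption).

```python
from typing import Dict, Union


def total_count(counts: Union[int, Dict[str, int]]) -> int:
    """Collapse either return shape of counts() into one integer."""
    if isinstance(counts, dict):
        # per_platform=True: add up the per-platform counts.
        return sum(counts.values())
    # per_platform=False: the value is already a single total.
    return counts


# Invented sample of a per_platform=True result (hypothetical values).
per_platform = {"huggingface": 120, "openml": 45, "zenodo": 30}
print(total_count(per_platform))  # 195
print(total_count(7))  # 7
```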
delete(*, identifier, version=None)
Delete a dataset from the catalogue.
All parameters must be specified by name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
identifier | str | The identifier of the dataset to delete. | required |
version | str \| None | The version of the endpoint (default is None). | None |
Returns:
Type | Description |
---|---|
Response | The server response. |
get_asset(identifier, *, version=None, data_format='pandas')
Retrieve metadata for a specific dataset.
All parameters except identifier must be specified by name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
identifier | str | The identifier of the dataset to retrieve. | required |
version | str \| None | The version of the endpoint (default is None). | None |
data_format | Literal['pandas', 'json'] | The desired format for the response (default is "pandas"). For "json", the decoded JSON is returned, in this case a dict. | 'pandas' |
Returns:
Type | Description |
---|---|
Series \| dict | The retrieved metadata for the specified dataset. |
Raises:
Type | Description |
---|---|
KeyError | If the asset cannot be found. |
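Since get_asset raises KeyError for unknown identifiers, a caller that treats "not found" as a normal outcome can catch it. In this minimal sketch, fake_get_asset and its in-memory catalogue are hypothetical stand-ins for the real call (json format only), so the example runs without a server; the identifiers and fields are invented.

```python
from typing import Dict, Optional

# Hypothetical in-memory catalogue standing in for the server; IDs and fields are invented.
_FAKE_CATALOGUE: Dict[str, dict] = {
    "data_0001": {"identifier": "data_0001", "name": "iris"},
}


def fake_get_asset(identifier: str, *, version: Optional[str] = None, data_format: str = "json") -> dict:
    """Stand-in mimicking the documented contract: raise KeyError for unknown assets."""
    if identifier not in _FAKE_CATALOGUE:
        raise KeyError(identifier)
    return _FAKE_CATALOGUE[identifier]


def dataset_name_or_none(identifier: str) -> Optional[str]:
    """Return the dataset's name, or None when the catalogue has no such asset."""
    try:
        metadata = fake_get_asset(identifier, data_format="json")
    except KeyError:
        return None
    return metadata.get("name")


print(dataset_name_or_none("data_0001"))  # iris
print(dataset_name_or_none("data_9999"))  # None
```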
get_asset_from_platform(*, platform, platform_identifier, version=None, data_format='pandas')
Retrieve metadata for a specific dataset, identified by its identifier on an external platform.
All parameters must be specified by name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
platform | str | The platform the dataset asset was retrieved from. | required |
platform_identifier | str | The identifier under which the dataset is known on that platform. | required |
version | str \| None | The version of the endpoint (default is None). | None |
data_format | Literal['pandas', 'json'] | The desired format for the response (default is "pandas"). For "json", the decoded JSON is returned, in this case a dict. | 'pandas' |
Returns:
Type | Description |
---|---|
Series \| dict | The retrieved metadata for the specified dataset. |
async get_assets_async(identifiers, *, version=None, data_format='pandas')
Asynchronously retrieve metadata for a list of dataset identifiers.
All parameters except identifiers must be specified by name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
identifiers | list[str] | The list of identifiers of the datasets to retrieve. | required |
version | str \| None | The version of the endpoint (default is None). | None |
data_format | Literal['pandas', 'json'] | The desired format for the response (default is "pandas"). For "json", the decoded JSON is returned, in this case a list of dicts. | 'pandas' |
Returns:
Type | Description |
---|---|
DataFrame \| list[dict] | The retrieved metadata for the specified datasets. |
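An async function returns a coroutine that must be awaited before any request is made. The sketch below uses a hypothetical stand-in, fake_get_assets_async, that fetches each identifier concurrently with asyncio.gather; this illustrates the calling pattern only and is not a claim about how the library works internally. From a plain script, asyncio.run drives the coroutine; in a notebook, where an event loop is already running, await it directly instead.

```python
import asyncio
from typing import List, Optional


async def fake_get_assets_async(identifiers: List[str], *, version: Optional[str] = None) -> List[dict]:
    """Hypothetical stand-in for get_assets_async(..., data_format="json")."""

    async def fetch_one(identifier: str) -> dict:
        await asyncio.sleep(0)  # placeholder for a network request
        return {"identifier": identifier}

    # gather runs the fetches concurrently and returns results in input order.
    return list(await asyncio.gather(*(fetch_one(i) for i in identifiers)))


async def main() -> List[dict]:
    # Nothing happens until the coroutine is awaited.
    return await fake_get_assets_async(["data_0001", "data_0002"])


results = asyncio.run(main())
print([r["identifier"] for r in results])  # ['data_0001', 'data_0002']
```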
get_content(*, identifier, distribution_idx=0, version=None)
Retrieve the data content of a specific dataset.
All parameters must be specified by name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
identifier | str | The identifier of the dataset asset. | required |
distribution_idx | int | The index of the distribution, in the asset's distribution list, whose content to retrieve (default is 0). | 0 |
version | str \| None | The version of the endpoint (default is None). | None |
Returns:
Type | Description |
---|---|
bytes | The data content of the specified dataset. |
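get_content returns raw bytes, so interpreting them is up to the caller. Assuming the selected distribution is a UTF-8 CSV file (an assumption; distributions can have other formats), the bytes can be parsed with the standard library as sketched below, or saved as-is with pathlib.Path.write_bytes. The sample bytes are invented.

```python
import csv
import io
from typing import Dict, List


def parse_csv_bytes(data: bytes, encoding: str = "utf-8") -> List[Dict[str, str]]:
    """Decode raw bytes and parse them as CSV with a header row."""
    text = data.decode(encoding)
    return list(csv.DictReader(io.StringIO(text)))


# Invented bytes standing in for what get_content(...) might return for a CSV distribution.
content = b"sepal_length,species\n5.1,setosa\n6.2,virginica\n"
rows = parse_csv_bytes(content)
print(rows[0]["species"])  # setosa
```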
get_list(*, platform=None, offset=0, limit=10, version=None, data_format='pandas')
Retrieve a list of datasets from the catalogue.
All parameters must be specified by name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
platform | str \| None | If given, only return metadata of dataset assets from this platform (default is None). | None |
offset | int | The offset for pagination (default is 0). | 0 |
limit | int | The maximum number of items to retrieve (default is 10). | 10 |
version | str \| None | The version of the endpoint (default is None). | None |
data_format | Literal['pandas', 'json'] | The desired format for the response (default is "pandas"). For "json", the decoded JSON is returned, in this case a list of dicts. | 'pandas' |
Returns:
Type | Description |
---|---|
DataFrame \| list[dict] | The retrieved metadata in the specified format. |
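With offset and limit, the whole catalogue can be walked page by page. The sketch below assumes the server returns fewer than limit items once the end is reached. iterate_pages and fake_get_list are hypothetical; with the real client one would pass something like `lambda offset, limit: get_list(offset=offset, limit=limit, data_format="json")` as fetch_page.

```python
from typing import Callable, Iterator, List


def iterate_pages(fetch_page: Callable[[int, int], List[dict]], page_size: int = 10) -> Iterator[dict]:
    """Yield items page by page until a page shorter than page_size is returned."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            break  # a short (or empty) page means nothing is left
        offset += page_size


# Hypothetical stand-in for get_list(offset=..., limit=..., data_format="json").
_FAKE_ROWS = [{"identifier": f"data_{i:04d}"} for i in range(23)]


def fake_get_list(offset: int, limit: int) -> List[dict]:
    return _FAKE_ROWS[offset:offset + limit]


all_rows = list(iterate_pages(fake_get_list, page_size=10))
print(len(all_rows))  # 23
```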
async get_list_async(*, offset=0, limit=100, batch_size=10, version=None, data_format='pandas')
Asynchronously retrieve a list of datasets from the catalogue in batches.
All parameters must be specified by name.
Returns:
Type | Description |
---|---|
DataFrame \| list[dict] | The retrieved metadata in the specified format. |
Raises:
Type | Description |
---|---|
ValueError | If batch_size is not larger than 0. |
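One plausible reading of "in batches" is that the window from offset to offset + limit is split into requests of at most batch_size items each. This is an assumption, not documented behaviour, and split_into_batches is a hypothetical helper that only computes the (offset, limit) pairs such requests would use, including the documented ValueError for a non-positive batch size.

```python
from typing import List, Tuple


def split_into_batches(offset: int, limit: int, batch_size: int) -> List[Tuple[int, int]]:
    """Split [offset, offset + limit) into (offset, limit) pairs of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("Batch size must be larger than 0.")
    batches = []
    start, end = offset, offset + limit
    while start < end:
        size = min(batch_size, end - start)
        batches.append((start, size))
        start += size
    return batches


print(split_into_batches(0, 25, 10))  # [(0, 10), (10, 10), (20, 5)]
```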
register(*, metadata, version=None)
Register a dataset in the catalogue.
All parameters must be specified by name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
metadata | dict | A dictionary that maps each metadata attribute to its value. | required |
version | str \| None | If provided, use this version of the REST API instead of the default (default is None). | None |
Returns:
Name | Type | Description |
---|---|---|
identifier | str | The identifier of the registered asset, if registration succeeds. |
error response | requests.Response | The error response, if registration fails. |
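Because register returns either an identifier string or an error response, callers can branch on the type. To keep the sketch runnable without a server or the requests package, FakeResponse is a hypothetical minimal stand-in exposing only status_code and text (attributes that requests.Response also has); describe_registration is likewise hypothetical.

```python
from typing import Union


class FakeResponse:
    """Hypothetical minimal stand-in for requests.Response (status_code and text only)."""

    def __init__(self, status_code: int, text: str) -> None:
        self.status_code = status_code
        self.text = text


def describe_registration(result: Union[str, FakeResponse]) -> str:
    """Branch on the two documented return shapes of register()."""
    if isinstance(result, str):
        # Success: register() returned the new asset's identifier.
        return f"registered as {result}"
    # Failure: register() returned the error response.
    return f"registration failed ({result.status_code}): {result.text}"


print(describe_registration("data_0001"))  # registered as data_0001
print(describe_registration(FakeResponse(400, "invalid metadata")))  # registration failed (400): invalid metadata
```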
replace(*, identifier, metadata, version=None)
Replace a dataset in the catalogue.
All parameters must be specified by name.
Notes
Any attribute not specified in metadata will be reset to its default value!
If you only want to modify some attributes and keep the values of the others, provide the complete asset metadata.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
identifier | str | The identifier of the asset whose metadata to replace. | required |
metadata | dict | A dictionary that maps each metadata attribute to its value. | required |
version | str \| None | If provided, use this version of the REST API instead of the default (default is None). | None |
|
Returns:
Type | Description |
---|---|
Response | The server response. |
Raises:
Type | Description |
---|---|
KeyError | If the identifier is not known by the server. |
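Since replace resets every omitted attribute, changing a few fields safely means sending the current metadata with the changes applied on top. merged_metadata is a hypothetical helper and the field values are invented; in practice the current values might come from get_asset(..., data_format="json"). Fetched metadata may also contain server-managed fields that should not be sent back; this sketch does not handle that.

```python
import copy
from typing import Any, Dict


def merged_metadata(current: Dict[str, Any], changes: Dict[str, Any]) -> Dict[str, Any]:
    """Return a full metadata dict: the current values, overridden by changes."""
    merged = copy.deepcopy(current)  # leave the caller's dict untouched
    merged.update(changes)  # shallow overlay: top-level keys in changes win
    return merged


# Invented example metadata; real assets have more attributes.
current = {"name": "iris", "keyword": ["flowers"]}
payload = merged_metadata(current, {"keyword": ["flowers", "botany"]})
print(payload)  # {'name': 'iris', 'keyword': ['flowers', 'botany']}
```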
search(query, *, platforms=None, offset=0, limit=10, search_field=None, get_all=True, version=None, data_format='pandas', asset_type)
Search dataset metadata using the Elasticsearch endpoint of the AIoD metadata catalogue.
All parameters except query must be specified by name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
query | str | The string to be matched against the search fields. | required |
platforms | list[str] \| None | The platforms to filter the search results by. If None, results from all platforms are returned (default is None). | None |
offset | int | The offset for pagination (default is 0). | 0 |
limit | int | The maximum number of results to retrieve (default is 10). | 10 |
search_field | Optional[Literal['name', 'issn', 'description_html', 'description_plain']] | The specific field to search within. If None, the query is matched against all fields (default is None). | None |
get_all | bool | If True, a request to the database is made to retrieve all data for each result. If False, only the indexed information is returned (default is True). | True |
version | str \| None | The version of the endpoint to use (default is None). | None |
data_format | Literal['pandas', 'json'] | The desired format for the response (default is "pandas"). For "json", the decoded JSON is returned, in this case a list of dicts. | 'pandas' |
Returns:
Type | Description |
---|---|
DataFrame \| list[dict] | The retrieved metadata in the specified format. |
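If search_field comes from user input, it can be checked against the documented values before search is called. validate_search_field is a hypothetical helper; the allowed values are copied from the search_field type above, and typing.get_args extracts them from the Literal so the list is defined only once.

```python
from typing import Literal, Optional, get_args

# Mirrors the search_field type documented above.
SearchField = Literal["name", "issn", "description_html", "description_plain"]


def validate_search_field(field: Optional[str]) -> Optional[str]:
    """Return field unchanged if it is a documented value or None; raise ValueError otherwise."""
    if field is None:
        return None  # None means: match the query against all fields
    allowed = get_args(SearchField)
    if field not in allowed:
        raise ValueError(f"search_field must be one of {allowed} or None, got {field!r}")
    return field
```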
update(*, identifier, metadata, version=None)
Update a dataset in the catalogue.
All parameters must be specified by name.
Notes
This is a best-effort implementation, but is not yet officially supported by the server.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
identifier | str | The identifier of the asset whose metadata to update. | required |
metadata | dict | A dictionary that maps each metadata attribute to its value. | required |
version | str \| None | If provided, use this version of the REST API instead of the default (default is None). | None |
|
Returns:
Type | Description |
---|---|
Response | The server response. |
Raises:
Type | Description |
---|---|
KeyError | If the identifier is not known by the server. |