Getting Started
This page has information on how to host your own metadata catalogue. If you plan to locally develop the REST API, please follow the installation procedure in "Contributing" after following the instructions on this page.
The platform is tested on Linux, but should also work on Windows and macOS. It requires Docker and Docker Compose (version 2.21.0 or higher).
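The version requirement can be checked before going further. The following is a minimal sketch, not part of the project: the helper name `version_ge` is hypothetical, and it assumes your `sort` supports `-V` (version sort, as in GNU coreutils).

```shell
#!/bin/sh
# Succeeds if version $1 is greater than or equal to version $2.
# Assumes a `sort` that supports -V (version sort), e.g. GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="2.21.0"
if command -v docker >/dev/null 2>&1; then
    # `--short` prints only the version number; strip a leading "v" if present.
    installed="$(docker compose version --short 2>/dev/null || true)"
    installed="${installed#v}"
    if [ -n "$installed" ] && version_ge "$installed" "$required"; then
        echo "Docker Compose $installed is new enough."
    else
        echo "Docker Compose $required or higher is required (found: ${installed:-none})."
    fi
else
    echo "Docker does not appear to be installed."
fi
```

The comparison sorts the two versions and checks that the required one comes first, which avoids comparing version strings alphabetically (where 2.9.0 would wrongly rank above 2.21.0).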
Starting the metadata catalogue is as simple as spinning up the docker containers through docker compose. This means that other than the prerequisites, no installation steps are necessary. However, we do need to fetch files from the latest release of the repository:
git clone
It is also possible to clone using SSH. If you plan to develop the metadata catalogue, check the "Contributing" page for more information on this step.
- Navigate to the project page aiondemand/AIOD-rest-api.
- Click the green <> Code button and download the ZIP file.
- Find the downloaded file on disk, and extract the content.
Starting the Metadata Catalogue
From the root of the project directory (i.e., the directory with the docker-compose.yaml file), run:
docker compose up -d
A convenience script is also provided under ./scripts/. It is especially useful when running with a non-default or development configuration; more on that later.
This will start a number of services running within one docker network:
- Database: a MySQL database that contains the metadata.
- Keycloak: an authentication service, provides login functionality.
- Metadata Catalogue REST API:
- Elasticsearch: indexes metadata catalogue data for faster keyword searches.
- Logstash: loads data into Elasticsearch.
- Deletion: takes care of cleaning up deleted data.
- nginx: redirects network traffic within the docker network.
- es_logstash_setup: generates scripts for Logstash and creates Elasticsearch indices.
These services are described in more detail in their dedicated pages. After the previous command was executed successfully, you can navigate to localhost and see the REST API documentation. This should look similar to the page, but is connected to your local database and services.
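The services can take a little while to become ready after the containers start. As a rough sketch, the helper below retries a command until it succeeds. The function names `wait_for` and `api_docs_reachable` are hypothetical, and the URL `http://localhost/` is an assumption based on the instruction to navigate to localhost; adjust it if your setup serves the documentation elsewhere.

```shell
#!/bin/sh
# Retry a command until it succeeds.
# Usage: wait_for <attempts> <delay_seconds> <command> [args...]
wait_for() {
    attempts="$1"
    delay="$2"
    shift 2
    while [ "$attempts" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    return 1
}

# Poll the (assumed) documentation URL up to 30 times, 2 seconds apart.
# Defined but not called here, because it needs the services to be running.
api_docs_reachable() {
    wait_for 30 2 curl --fail --silent --output /dev/null "http://localhost/"
}
```

After `docker compose up -d`, running `api_docs_reachable && echo "REST API documentation is up"` gives a quick confirmation without opening a browser.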
Starting Connector Services
To start connector services that automatically index data from external platforms into the metadata catalogue, you must specify their docker-compose profiles (as defined in the docker-compose.yaml file).
Their configuration, if any, is done through environment variables, which can be set in the override.env file as explained in "Configuring the Metadata Catalogue".
For example, you can use the following commands when starting the connectors for OpenML and Zenodo.
./scripts/ openml zenodo-datasets
docker compose --profile openml --profile zenodo-datasets --env-file=.env --env-file=override.env up -d
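When the set of connectors varies between deployments, the `--profile` flags can be generated from a list of profile names. This is a small sketch with a hypothetical helper, `compose_up_command`, that only prints the command so it can be reviewed before it is run.

```shell
#!/bin/sh
# Print the docker compose command that starts the given connector profiles.
compose_up_command() {
    cmd="docker compose"
    for profile in "$@"; do
        cmd="$cmd --profile $profile"
    done
    printf '%s\n' "$cmd --env-file=.env --env-file=override.env up -d"
}

compose_up_command openml zenodo-datasets
```

Called without arguments, the helper prints the command without any `--profile` flags, which starts only the services that are not tied to a profile.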
Connectors and Syncing Nodes
If you are configuring your metadata catalogue as part of a set of root nodes, only one of the root nodes should be running the connectors. Running the same connector on multiple root nodes may introduce conflicts.
The full list of connector profiles is:
- aibuilder: indexes models on AI Builder.
- huggingface-datasets: indexes datasets from Hugging Face.
For more information, see the "Connectors" page.
Configuring the Metadata Catalogue
There are two main places to configure the metadata catalogue services: environment variables configured in .env files, and REST API configuration in a .toml file.
The default files are ./.env and ./src/config.default.toml, shown below.
If you want to use non-default values, we strongly encourage you not to overwrite the contents of these files.
Instead, you can create ./override.env and ./src/config.override.toml files to override those files.
When using the ./scripts/ script to launch your services, these overrides are automatically taken into account.
# Configures the REST API
# Information on which database to connect to
host = "sqlserver"
port = 3306
database = "aiod"
username = "root"
password = "ok"
# Additional options for development
reload = true
request_timeout = 10 # seconds
log_level = "INFO" # Python log levels:
# Authentication and authorization
server_url = "http://keycloak:8080/aiod-auth/"
realm = "aiod"
client_id = "aiod-api" # a private client, used by the backend
client_id_swagger = "aiod-api-swagger" # a public client, used by the Swagger Frontend
openid_connect_url = "http://localhost/aiod-auth/realms/aiod/.well-known/openid-configuration"
scopes = "openid profile roles"
role = "edit_aiod_resources"
# Override USE_LOCAL_DEV to "true" to automatically mount your
# source code to the containers when using `scripts/`.
ES_JAVA_OPTS="-Xmx256m -Xms256m"
LS_JAVA_OPTS="-Xmx256m -Xms256m"
Overwriting these files directly will likely complicate updating to newer releases due to merge conflicts.
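As an illustration of the override approach, the sketch below creates an override.env that sets `USE_LOCAL_DEV` to true, the variable mentioned in the default .env comments. The helper name `write_override_env` is hypothetical, and the demonstration writes to a throwaway directory; in practice you would pass the project root.

```shell
#!/bin/sh
# Write an override.env into the given directory, unless one already exists.
write_override_env() {
    target="$1/override.env"
    if [ -e "$target" ]; then
        echo "Refusing to overwrite existing $target" >&2
        return 1
    fi
    printf 'USE_LOCAL_DEV=true\n' > "$target"
}

# Demonstration in a temporary directory, removed afterwards.
demo_dir="$(mktemp -d)"
write_override_env "$demo_dir"
cat "$demo_dir/override.env"
rm -rf "$demo_dir"
```

Refusing to overwrite an existing file keeps earlier local settings intact; add further variables to the file by hand as needed.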
Updating to New Releases
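First, stop running services. Then fetch the latest changes and check out the release you want to update to (replace vX.Y.Z with its tag):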
git fetch origin
git checkout vX.Y.Z
Then run the startup commands again (either the convenience script or docker compose).
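The whole update sequence can be sketched as a list of commands. Two assumptions are made here: `docker compose down` is used to stop the services (as the counterpart of `docker compose up -d`), and release tags follow the vX.Y.Z pattern. The helper name `update_commands` is hypothetical, and it prints the commands rather than running them.

```shell
#!/bin/sh
# Print the commands for updating to a given release tag, one per line.
update_commands() {
    tag="$1"
    # Accept only tags shaped like v1.2.3 (assumed from the vX.Y.Z placeholder).
    if ! printf '%s\n' "$tag" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'; then
        echo "expected a release tag like v1.2.3, got: $tag" >&2
        return 1
    fi
    printf '%s\n' \
        "docker compose down" \
        "git fetch origin" \
        "git checkout $tag" \
        "docker compose up -d"
}

update_commands v1.2.3
```

If connector profiles were started, you may need to pass the same `--profile` flags when stopping and restarting the services.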
Database Schema Migration
We use Alembic to automate database schema migrations (e.g., adding a table, altering a column, and so on). Please refer to the Alembic documentation for more information. Commands below assume that the root directory of the project is your current working directory.
Database migrations may be irreversible. Always make sure there is a backup of the old database.
Build the database schema migration docker image with:
docker build -f alembic/Dockerfile . -t aiod-migration
With the sqlserver container running, you can migrate to the latest schema with
docker run -v $(pwd)/alembic:/alembic:ro -v $(pwd)/src:/app -it --network aiod-rest-api_default aiod-migration
This works because the default entrypoint of the container upgrades the database to the latest schema.
Make sure that the specified --network is the docker network that has the sqlserver container.
The alembic directory is mounted to ensure the latest migrations are available, and the src directory is mounted so the migration scripts can use the classes and variables defined in the project.
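If you are unsure which network to pass, it can be looked up from the running container. The sketch below assumes the container is named `sqlserver`, as in the text above; both helper names are hypothetical. `container_networks` needs a running Docker daemon and is therefore only defined, not called.

```shell
#!/bin/sh
# List the docker networks a container is attached to, one per line.
container_networks() {
    docker inspect --format '{{range $name, $net := .NetworkSettings.Networks}}{{println $name}}{{end}}' "$1"
}

# Print the migration command for a given network name.
migration_command() {
    printf '%s\n' "docker run -v \$(pwd)/alembic:/alembic:ro -v \$(pwd)/src:/app -it --network $1 aiod-migration"
}

migration_command aiod-rest-api_default
```

With the services running, `container_networks sqlserver` shows the candidate network names, and the chosen name can be passed to `migration_command` to review the full command before running it.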