Getting Started

This page describes how to host your own metadata catalogue. If you plan to locally develop the REST API, please follow the installation procedure in "Contributing" after following the instructions on this page.

Prerequisites

The platform is tested on Linux, but should also work on Windows and MacOS. Additionally, it needs Docker and Docker Compose (version 2.21.0 or higher).

Installation

Starting the metadata catalogue is as simple as spinning up the docker containers through docker compose. This means that other than the prerequisites, no installation steps are necessary. However, we do need to fetch files from the latest release of the repository:

git clone https://github.com/aiondemand/AIOD-rest-api.git
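By default this checks out the repository's default branch. To run a specific release instead, you can check out its tag after cloning (vX.Y.Z is a placeholder for the release listed on GitHub):

cd AIOD-rest-api
git checkout vX.Y.Z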

It is also possible to clone using SSH. If you plan to develop the metadata catalogue, check the "Contributing" page for more information on this step.
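For example, cloning over SSH (this assumes you have an SSH key registered with your GitHub account):

git clone git@github.com:aiondemand/AIOD-rest-api.git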

Alternatively, you can download the source as a ZIP file instead of using git:

  • Navigate to the project page aiondemand/AIOD-rest-api.
  • Click the green <> Code button and download the ZIP file.
  • Find the downloaded file on disk, and extract the content.

Starting the Metadata Catalogue

From the root of the project directory (i.e., the directory with the docker-compose.yaml file), you can start the services in one of two ways.

We provide the following script as a convenience; it is especially useful when running with a non-default or development configuration (more on that later):

./scripts/up.sh

Alternatively, run docker compose directly:

docker compose up -d
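After running either command, you can verify that all containers started successfully:

docker compose ps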

This will start a number of services running within one docker network:

  • Database: a MySQL database that contains the metadata.
  • Keycloak: an authentication service that provides login functionality.
  • Metadata Catalogue REST API: the main API service for managing and accessing metadata.
  • Elastic Search: indexes metadata catalogue data for faster keyword searches.
  • Logstash: loads data into Elastic Search.
  • Deletion: takes care of cleaning up deleted data.
  • nginx: redirects network traffic within the docker network.
  • es_logstash_setup: generates scripts for Logstash and creates Elastic Search indices.
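To see what the individual services are doing, you can follow their logs (append a service name as defined in docker-compose.yaml to follow a single service):

docker compose logs -f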

These services are described in more detail in their dedicated pages. After the previous command has executed successfully, you can navigate to localhost and see the REST API documentation. It should look similar to the api.aiod.eu page, but is connected to your local database and services.
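A quick smoke test from the command line (this assumes the Swagger UI is served at /docs behind nginx on port 80, as the default REDIRECT_URIS and AIOD_NGINX_PORT settings below suggest):

curl -I http://localhost/docs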

Starting Connector Services

To start connector services that automatically index data from external platforms into the metadata catalogue, you must specify their docker-compose profiles (as defined in the docker-compose.yaml file). Their configuration, if any, is done through environment variables, which can be set in the override.env file as explained in "Configuring the Metadata Catalogue". For example, either of the following commands starts the connectors for OpenML and Zenodo:

./scripts/up.sh openml zenodo-datasets
docker compose --profile openml --profile zenodo-datasets --env-file=.env --env-file=override.env up -d

Connectors and Syncing Nodes

If you are configuring your metadata catalogue as part of a set of root nodes, only one of the root nodes should be running the connectors. Running the same connector on multiple root nodes may introduce conflicts.

For the full list of connector profiles and more information, see the "Connectors" page.

Configuration

There are two main places to configure the metadata catalogue services: environment variables configured in .env files, and REST API configuration in a .toml file. The default files, ./.env and ./src/config.default.toml, are shown below.

If you want to use non-default values, we strongly encourage you not to overwrite the contents of these files. Instead, create ./override.env and ./src/config.override.toml files to override the default values. When using the ./scripts/up.sh script to launch your services, these overrides are automatically taken into account.
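For example, a minimal override that changes the MySQL root password and the port nginx listens on could look as follows (the values are illustrative; the variable names appear in the default files below):

# ./override.env
MYSQL_ROOT_PASSWORD=my-secret-password
AIOD_NGINX_PORT=8081

# ./src/config.override.toml
[database]
password = "my-secret-password"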

The default REST API configuration, ./src/config.default.toml:

# Configures the REST API
# TODO: refactor configuration (https://github.com/aiondemand/AIOD-rest-api/issues/82)

# Information on which database to connect to
[database]
host = "sqlserver"
port = 3306
database = "aiod"
username = "root"
password = "ok"

# Additional options for development
[dev]
reload = true
request_timeout = 10  # seconds
log_level = "INFO"  # Python log levels: https://docs.python.org/3/library/logging.html#logging-levels

# Authentication and authorization
[keycloak]
server_url = "http://keycloak:8080/aiod-auth/"
realm = "aiod"
client_id = "aiod-api"  # a private client, used by the backend
client_id_swagger = "aiod-api-swagger"  # a public client, used by the Swagger Frontend
openid_connect_url = "http://localhost/aiod-auth/realms/aiod/.well-known/openid-configuration"
scopes = "openid profile roles"
The default environment file, ./.env:

# DEV
# Override USE_LOCAL_DEV to "true" to automatically mount your
# source code to the containers when using `scripts/up.sh`.
USE_LOCAL_DEV=

# REST API
AIOD_REST_PORT=8000

#MYSQL
MYSQL_ROOT_PASSWORD=ok

#KEYCLOAK
HOSTNAME=localhost
KEYCLOAK_ADMIN=admin
KEYCLOAK_ADMIN_PASSWORD=password
KEYCLOAK_CLIENT_SECRET="QJiOGn09eCEfnqAmcPP2l4vMU8grlmVQ"
REDIRECT_URIS=http://${HOSTNAME}/docs/oauth2-redirect
POST_LOGOUT_REDIRECT_URIS=http://${HOSTNAME}/aiod-auth/realms/aiod/protocol/openid-connect/logout
AIOD_KEYCLOAK_PORT=8080
REVIEWER_ROLE_NAME=review_aiod_resources

EGICHECKINALIAS=

#AIBUILDER
AIBUILDER_API_TOKEN=""

#ELASTICSEARCH
ES_USER=elastic
ES_PASSWORD=changeme
ES_DISCOVERY_TYPE=single-node
ES_JAVA_OPTS="-Xmx256m -Xms256m"
AIOD_ES_HTTP_PORT=9200
AIOD_ES_TRANSPORT_PORT=9300

#LOGSTASH
LS_JAVA_OPTS="-Xmx256m -Xms256m"
AIOD_LOGSTASH_BEATS_PORT=5044
AIOD_LOGSTASH_PORT=5001
AIOD_LOGSTASH_API_PORT=9600

#NGINX
AIOD_NGINX_PORT=80

#DATA STORAGE
DATA_PATH=./data
BACKUP_PATH=./data/backups

If you do not use ./scripts/up.sh, you can make sure the environment files are included by specifying them in your docker compose up call, e.g.: docker compose --env-file=.env --env-file=override.env up. Note that the order is important: later environment files override earlier ones. For example, if .env sets AIOD_NGINX_PORT=80 and override.env sets AIOD_NGINX_PORT=8081, the services start with port 8081 because override.env is listed last.

Overwriting .env or src/config.default.toml directly will likely complicate updating to newer releases due to merge conflicts.

Updating to New Releases

First, stop the running services:

./scripts/down.sh

Then get the new release:

git fetch origin
git checkout vX.Y.Z
A new release might come with a database migration. If that is the case, follow the instructions in "Database Schema Migration" below. The database schema migration must be performed before resuming operations.

Then run the startup commands again (either up.sh or docker compose).
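Putting the steps together, an update looks like this (vX.Y.Z stands for the release tag you are updating to):

./scripts/down.sh
git fetch origin
git checkout vX.Y.Z
# if the release includes a schema migration, run it now
# (see "Database Schema Migration" below)
./scripts/up.sh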

Creating the Database

By default, the server will create a database on the provided MySQL server if it does not yet exist. You can change this behavior through the build-db command-line parameter, which takes the following options:

  • never: never creates the database, not even if none exists yet. Use this only if you expect the database to be created through other means, such as MySQL group replication.
  • if-absent (default): creates a database only if none exists.
  • drop-then-build: drops the database on startup to recreate it from scratch. THIS REMOVES ALL DATA PERMANENTLY. NO RECOVERY POSSIBLE.
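For example, to recreate the database from scratch you would pass drop-then-build. Where exactly the parameter is set depends on how you run the API; the sketch below assumes it is forwarded to the Python entrypoint (check the command of the REST API service in docker-compose.yaml for the actual location):

python main.py --build-db drop-then-build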

Populating the Database

To populate the database with some examples, run the connectors/fill-examples.sh script. When using docker compose, you can easily do this by running the "examples" profile:

docker compose --profile examples up
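Once the examples are loaded, you can spot-check them through the REST API (this assumes a dataset listing endpoint such as /datasets/v1; see the Swagger UI at localhost for the exact routes):

curl http://localhost/datasets/v1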

Database Schema Migration

We use Alembic to automate database schema migrations (e.g., adding a table, altering a column, and so on). Please refer to the Alembic documentation for more information. Commands below assume that the root directory of the project is your current working directory.

Warning

Database migrations may be irreversible. Always make sure there is a backup of the old database.

Build the database schema migration docker image with:

docker build -f alembic/Dockerfile . -t aiod-migration

With the sqlserver container running, you can migrate to the latest schema with:

docker run -v $(pwd)/alembic:/alembic:ro -v $(pwd)/src:/app -it --network aiod-rest-api_default aiod-migration

The default entrypoint of the container upgrades the database to the latest schema, so no further arguments are needed.

Make sure that the specified --network is the docker network that has the sqlserver container. The alembic directory is mounted to ensure the latest migrations are available; the src directory is mounted so that the migration scripts can use the classes and variables defined in the project.
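If you are unsure of the network name, you can list the networks docker compose created; by default they are prefixed with the name of the project directory:

docker network ls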

Using connectors

You can start different connectors using their profiles, e.g.:

docker compose --profile examples --profile huggingface-datasets --profile openml --profile zenodo-datasets up -d
docker compose --profile examples --profile huggingface-datasets --profile openml --profile zenodo-datasets down

Make sure you use the same profiles for up and down, or use ./scripts/down.sh (see below); otherwise, some containers might keep running.

Shorthands

We provide two auxiliary scripts for launching docker containers and bringing them down. The first, ./scripts/up.sh, invokes docker compose up -d and takes any number of profiles to launch as parameters. It also ensures that your configuration overrides (see above) are taken into account. If USE_LOCAL_DEV is set to true (e.g., in override.env), your local source code is mounted into the containers; this is useful for local development but should not be used in production. For example, with USE_LOCAL_DEV set to true, ./scripts/up.sh examples resolves to:

docker compose --env-file=.env --env-file=override.env -f docker-compose.yaml -f docker-compose.dev.yaml --profile examples up -d

The second script, ./scripts/down.sh, is a convenience for bringing down all services, including all profiles:

./scripts/down.sh