Running Marquez on AWS
This guide helps you deploy and manage Marquez on AWS EKS.
PREREQUISITES
AWS EKS Cluster
To create an AWS EKS cluster, please follow the steps outlined in the AWS EKS documentation.
CONNECT TO AWS EKS CLUSTER
- Make sure you have configured your AWS CLI, then create or update the kubeconfig file for your cluster:
  $ aws eks --region <AWS-REGION> update-kubeconfig --name <AWS-EKS-CLUSTER>
- Verify that the context has been switched:
  $ kubectl config current-context
  arn:aws:eks:<AWS-REGION>:<AWS-ACCOUNT-ID>:cluster/<AWS-EKS-CLUSTER>
- Using kubectl, verify that you can connect to your cluster:
  $ kubectl get svc
  NAME             TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
  svc/kubernetes   ClusterIP   10.100.0.1   <none>        443/TCP   1m
Note: If you’re having issues connecting to your cluster, please see Why can’t I connect to my AWS EKS cluster?
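The context check in the second step can be scripted. The following is a minimal sketch, assuming the kubeconfig entry was created by update-kubeconfig without an alias, so the context name equals the cluster ARN in the standard aws partition. The function names expected_context and check_context are hypothetical helpers, not part of the AWS CLI or kubectl.

```shell
#!/bin/sh
# Sketch only: expected_context and check_context are hypothetical helper names.

# Print the context name that update-kubeconfig assigns by default (the cluster ARN).
# $1 = AWS region, $2 = AWS account ID, $3 = EKS cluster name
expected_context() {
  printf 'arn:aws:eks:%s:%s:cluster/%s\n' "$1" "$2" "$3"
}

# Compare the active kubectl context with the expected ARN.
# Returns 0 on a match and 1 otherwise. Requires kubectl and a kubeconfig.
check_context() {
  expected="$(expected_context "$1" "$2" "$3")"
  actual="$(kubectl config current-context)"
  if [ "$actual" = "$expected" ]; then
    echo "OK: context is $actual"
  else
    echo "Mismatch: expected $expected, got $actual" >&2
    return 1
  fi
}

# Example call (placeholders as in the steps above):
# check_context <AWS-REGION> <AWS-ACCOUNT-ID> <AWS-EKS-CLUSTER>
```

If the kubeconfig entry was created with an alias, or the cluster lives in a different AWS partition, the expected string would need to be adjusted accordingly.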
AWS RDS
Next, create an AWS RDS instance as outlined in the AWS RDS documentation. This database will be used to store dataset, job, and run metadata collected as OpenLineage events via the Marquez HTTP API.
CREATE AWS RDS DATABASE
- Navigate to the AWS RDS page and create a PostgreSQL database, leaving the database template as Production.
- Use marquez as the database identifier and set the master username to marquez.
- Choose a master password to use later in your Helm deployment (see password in values.yaml).
- Leave public access to the database off.
- Choose the same VPC where your AWS EKS cluster resides.
- In a separate tab, navigate to the AWS EKS cluster page and make note of the security group attached to your cluster.
- Navigate back to the AWS RDS page and, in the security group section, add the AWS EKS cluster’s security group that you noted in the previous step.
- Next, under the Additional Configuration tab, enter marquez as the initial database name.
- Finally, select Create Database.
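For readers who prefer the AWS CLI over the console, the steps above can be approximated with aws rds create-db-instance. This is a rough sketch, not an exact equivalent of the console's Production template: the instance class, the storage size, and the function name create_marquez_rds are assumptions chosen for illustration, and the subnet group is assumed to already exist in the same VPC as the AWS EKS cluster.

```shell
#!/bin/sh
# Sketch only: create_marquez_rds is a hypothetical helper name.
# $1 = master password
# $2 = security group ID of the AWS EKS cluster
# $3 = name of an existing DB subnet group in the cluster's VPC
# By default (DRY_RUN=1) the command is only printed, including the password,
# so avoid running the dry run where output is logged. Set DRY_RUN=0 to execute it.
create_marquez_rds() {
  # db.t3.medium and 20 GiB of storage are assumed values; size them for your workload.
  set -- aws rds create-db-instance \
    --db-instance-identifier marquez \
    --engine postgres \
    --db-instance-class db.t3.medium \
    --allocated-storage 20 \
    --master-username marquez \
    --master-user-password "$1" \
    --db-name marquez \
    --no-publicly-accessible \
    --vpc-security-group-ids "$2" \
    --db-subnet-group-name "$3"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Example dry run:
# create_marquez_rds '<AWS-RDS-PASSWORD>' <EKS-SECURITY-GROUP-ID> <DB-SUBNET-GROUP>
```

Printing the command first makes it easy to review the identifier, username, and initial database name against the console steps before anything is created.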
CONNECT TO AWS RDS DATABASE
- Create a marquez namespace:
  $ kubectl create namespace marquez
- Next, run the following command with your AWS RDS host, username, and password:
  $ kubectl run pgsql-postgresql-client --rm --tty -i --restart='Never' \
      --namespace marquez \
      --image docker.io/bitnami/postgresql:12-debian-10 \
      --env="PGPASSWORD=<AWS-RDS-PASSWORD>" \
      --command -- psql --host <AWS-RDS-HOST> -U <AWS-RDS-USERNAME> -d marquez -p 5432
Deploy Marquez on AWS EKS
INSTALLING MARQUEZ
- Get Marquez:
  $ git clone git@github.com:MarquezProject/marquez.git && cd marquez/chart
- Install Marquez:
  $ helm upgrade --install marquez . \
      --set marquez.db.host=<AWS-RDS-HOST> \
      --set marquez.db.user=<AWS-RDS-USERNAME> \
      --set marquez.db.password=<AWS-RDS-PASSWORD> \
      --namespace marquez --atomic --wait
  Note: To avoid overriding deployment settings via the command line, update the marquez.db section of the Marquez Helm chart’s values.yaml to include the AWS RDS host, username, and password in your deployment.
- Verify all the pods have come up correctly:
  $ kubectl get pods --namespace marquez
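As an alternative to editing the chart's values.yaml in place, the database settings from the install step can be kept in a separate overrides file and passed to Helm with -f. The sketch below assumes the key layout implied by the --set flags above (marquez.db.host, marquez.db.user, marquez.db.password). The function name write_db_values and the file name rds-values.yaml are hypothetical.

```shell
#!/bin/sh
# Sketch only: write_db_values is a hypothetical helper name.
# $1 = AWS RDS host, $2 = username, $3 = password, $4 = output file path
# The password is wrapped in double quotes for YAML; a password containing
# a double quote or backslash would need additional escaping.
write_db_values() {
  cat > "$4" <<EOF
marquez:
  db:
    host: $1
    user: $2
    password: "$3"
EOF
}

# Example usage from the chart directory:
# write_db_values <AWS-RDS-HOST> <AWS-RDS-USERNAME> '<AWS-RDS-PASSWORD>' rds-values.yaml
# helm upgrade --install marquez . -f rds-values.yaml --namespace marquez --atomic --wait
```

Keeping the password in a file rather than on the command line keeps it out of shell history, though the file itself should then be protected and kept out of version control.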
UNINSTALLING MARQUEZ
$ helm uninstall marquez --namespace marquez
SPDX-License-Identifier: Apache-2.0 Copyright 2018-2023 contributors to the Marquez project.