Create and Connect a Remote GKE Cluster to Data Lakehouse
Introduction
Ilum empowers you to manage a powerful multi-cluster setup from a single, central control plane. While Ilum automates the deployment and configuration of its core components, setting up the underlying infrastructure requires precise coordination.
This guide provides a comprehensive walkthrough for setting up a multi-cluster architecture on Google Kubernetes Engine (GKE). You will learn how to:
- Provision a central control plane (Master Cluster).
- Set up a dedicated execution environment (Remote Cluster).
- Establish secure communication between them using client certificates and ingress rules.
This guide walks you through the steps required to launch your first Ilum Job on a remote cluster. We use Google Kubernetes Engine (GKE) as an example, but you can follow the same flow with any Kubernetes distribution.
Prerequisites
Before starting the tutorial, make sure you have:
- Access to a Google Cloud project with billing enabled.
- kubectl installed and configured on your machine (version compatible with your GKE cluster).
- Helm installed (v3+).
- The Google Cloud CLI (gcloud) installed and initialized (you can run gcloud auth login and gcloud config list without errors).
- The gke-gcloud-auth-plugin installed and available in your PATH so that kubectl can authenticate to GKE clusters.
- Permissions in the target Google Cloud project to:
  - create and manage GKE clusters (e.g. Kubernetes Engine Cluster Admin or equivalent),
  - create and use Cloud Storage buckets if you plan to use GCS for data.
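The tool prerequisites above can be checked from a terminal before you start. The following is a minimal sketch, not part of Ilum: the helper name missing_tools is hypothetical, and it only confirms that each binary is on your PATH, not that its version is compatible or that you are authenticated.

```shell
# Print the name of every listed tool that is NOT found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the four command-line tools this guide relies on.
missing=$(missing_tools gcloud kubectl helm gke-gcloud-auth-plugin)

if [ -n "$missing" ]; then
  echo "Missing prerequisites:"
  echo "$missing"
else
  echo "All prerequisite tools found."
fi
```

If anything is reported as missing, install it and re-run the check before continuing.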
What you'll accomplish in this guide:
| Step | Task | Purpose |
|---|---|---|
| 1 | Create two GKE clusters | Set up master (control plane) and remote (job execution) |
| 2 | Install Ilum on master | Deploy Ilum's core components |
| 3 | Set up authentication | Create secure credentials for remote cluster access |
| 4 | Register remote cluster | Add cluster to Ilum's management interface |
| 5 | Configure networking | Enable communication between clusters |
| 6 | Run your first job | Verify the multi-cluster setup works |
Step 1. Provision Master and Remote GKE Clusters
The foundation of a multi-cluster setup consists of two distinct entities:
- Master Cluster: Hosts the Ilum control plane (UI, API, Scheduler).
- Remote Cluster: A dedicated environment where Ilum executes the Spark jobs dispatched from the master.
Create a project
- Open Google Cloud Console.
- Click the Project selector in the top-left corner.
- Click New Project.
- Enter a project name and (if applicable) select an Organization/Folder.
- Click Create.
- Select the newly created project in the project selector.
Enable Google Kubernetes Engine API
- In the Console search bar, type Kubernetes Engine.
- Open Kubernetes Engine.
- Click Enable to enable the Google Kubernetes Engine API for the selected project.
Switch to the chosen project in gcloud
- In the Console, open the project selector and copy the Project ID.
- In your terminal, set this project as active:
gcloud config set project PROJECT_ID
- (Optional) If you plan to create clusters in a specific region often, set a default region:
gcloud config set compute/region europe-central2
With a default region set, later gcloud commands no longer fail with an error asking for the --region or --zone flag.
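A common slip at this point is passing the project name or project number instead of the Project ID. The sketch below guards against that and then applies the settings in one go. The function names are hypothetical helpers, not part of gcloud; the pattern reflects Google Cloud's documented project ID format (6 to 30 characters, lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen). The last helper shows the command-line counterpart of enabling the API in the Console.

```shell
# Succeed only if the argument looks like a Google Cloud project ID.
is_valid_project_id() {
  printf '%s\n' "$1" | grep -Eq '^[a-z][a-z0-9-]{4,28}[a-z0-9]$'
}

# Set the active project and default region, then print both back.
# Arguments: <project-id> <region>
set_active_project() {
  is_valid_project_id "$1" || { echo "Invalid project ID: $1" >&2; return 1; }
  gcloud config set project "$1" &&
  gcloud config set compute/region "$2" &&
  gcloud config get-value project &&
  gcloud config get-value compute/region
}

# Enable the Google Kubernetes Engine API from the command line.
# Argument: <project-id>
enable_gke_api() {
  gcloud services enable container.googleapis.com --project "$1"
}
```

For example, set_active_project PROJECT_ID europe-central2 followed by enable_gke_api PROJECT_ID, with PROJECT_ID replaced by your own value.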
Create a cluster
Create the master cluster first:
gcloud container clusters create master-cluster \
--machine-type=n1-standard-8 \
--num-nodes=1
Create the remote cluster with a different name:
gcloud container clusters create remote-cluster \
--machine-type=n1-standard-4 \
--num-nodes=1
Resource Requirements & Architecture:
Why two clusters? The master cluster runs Ilum's control plane (UI, API, scheduler). The remote cluster executes your Spark jobs. This separation allows independent scaling and multi-cluster management from one interface.
Sizing:
- Master cluster: This example uses n1-standard-8 (8 vCPU, 30 GB RAM) for testing only. Minimum recommended: 12 vCPUs and 48 GB RAM (e.g., n1-standard-12). Production environments with many users need significantly more.
- Remote cluster: This example uses n1-standard-4 (4 vCPU, 15 GB RAM) for testing only. Production workloads require larger machines (e.g., n1-standard-16 or larger) and multiple nodes depending on your Spark job requirements.
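Before moving on, it can help to confirm that both clusters are reachable from kubectl. The sketch below is one way to do that, under stated assumptions: the clusters are named master-cluster and remote-cluster as in the commands above, and they were created as regional clusters (for zonal clusters, replace --region with --zone and pass the zone instead). The function names are hypothetical. gcloud writes kubeconfig contexts named gke_PROJECT_LOCATION_CLUSTER, which the first helper reconstructs.

```shell
# Build the kubeconfig context name gcloud generates for a GKE cluster:
# gke_<project>_<location>_<cluster>
gke_context_name() {
  printf 'gke_%s_%s_%s\n' "$1" "$2" "$3"
}

# Fetch credentials for both clusters and list their nodes.
# Arguments: <project-id> <region>
check_clusters() {
  project=$1
  region=$2
  for cluster in master-cluster remote-cluster; do
    # Adds or updates the cluster's entry in your kubeconfig.
    gcloud container clusters get-credentials "$cluster" \
      --region "$region" --project "$project" || return 1
    # Lists nodes using the explicit context, so the current context does not matter.
    kubectl --context "$(gke_context_name "$project" "$region" "$cluster")" \
      get nodes || return 1
  done
}
```

Calling check_clusters PROJECT_ID europe-central2 should list the nodes of each cluster in turn. Addressing clusters by explicit context name, rather than relying on the current context, reduces the risk of running a later command against the wrong cluster.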
Step 2. Install Ilum Control Plane on Master Cluster
Once your clusters are running, the next step is to deploy the Ilum platform on the master cluster.