Configure Cloud Object Storage (GCS, S3, Azure) for Data Lake
Ilum allows you to link GCS, S3, WASBS, and HDFS storage to your clusters. Once a storage is linked, Ilum automatically configures all your jobs to use your cloud data lake, so you do not need to set Spark parameters manually.
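For context, the sketch below shows the kind of manual configuration that linking a storage is meant to replace. It builds the standard Hadoop S3A properties that would otherwise be passed to Spark by hand. The object and method names are hypothetical, and the exact properties Ilum applies on your behalf are not stated in this guide.

```scala
// Hypothetical helper: ManualS3aConf and sparkProperties are illustrative names only.
object ManualS3aConf {
  /** Standard Hadoop S3A properties, prefixed with "spark.hadoop." so Spark forwards them. */
  def sparkProperties(
      accessKey: String,
      secretKey: String,
      endpoint: Option[String] = None // e.g. a MinIO URL for S3-compatible storage
  ): Map[String, String] = {
    val base = Map(
      "spark.hadoop.fs.s3a.access.key" -> accessKey,
      "spark.hadoop.fs.s3a.secret.key" -> secretKey
    )
    // Custom endpoints usually need path-style access instead of virtual-hosted buckets.
    val custom = endpoint.map { url =>
      Map(
        "spark.hadoop.fs.s3a.endpoint" -> url,
        "spark.hadoop.fs.s3a.path.style.access" -> "true"
      )
    }.getOrElse(Map.empty[String, String])
    base ++ custom
  }
}
```

Without a linked storage, each entry in the returned map would have to be supplied as a Spark configuration value for every job.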
Supported Storage Providers
| Provider | Type | Description |
|---|---|---|
| Google Cloud Storage | GCS | Native integration for GCP projects. |
| Amazon S3 | S3 | Standard S3 and S3-compatible storage support. |
| Azure Blob Storage | WASBS/ABFS | Integration for Azure data lakes. |
| HDFS | HDFS | Connect to existing Hadoop Distributed File Systems. |
Google Cloud Storage (GCS)
Step 1: Create a GCS Bucket
- Create a Google Cloud Project
  - Open the Google Cloud Console and go to the project selector / Manage Resources.
  - Click New Project / Create Project.
  - Enter a Project name, then choose an Organization and Location.
- Create a GCS Bucket
  - In the Console, navigate to Cloud Storage → Buckets.
  - Click Create.
  - Enter a globally unique Bucket name (e.g., my-ilum-bucket) and select your Region.
  Note: Remember the bucket name you created. You will need it when adding this storage to Ilum.
- Create a Service Account and JSON Key
  - Go to IAM & Admin → Service Accounts.
  - Click Create Service Account, fill in the details, and grant the Storage Admin role.
  - Click the created email, go to the Keys tab, and select Create new key (JSON).
  - Save the downloaded JSON file securely.
  Important: Organization Policy Update: In new organizations, creating service account keys might be disabled by default. Contact your administrator if you cannot create keys.
Step 2: Add GCS to Ilum Cluster
- Navigate to Workloads → Clusters → Edit → Storage → Add Storage.
- Configure General Settings:
| Parameter | Value Example | Description |
|---|---|---|
| Name | my-gcs-storage | Unique name for this storage config. |
| Type | GCS | Select GCS provider. |
| Spark Bucket | my-ilum-bucket | Bucket for Spark logs/events. |
| Data Bucket | my-ilum-bucket | Bucket for your data. |
- Configure GCS Authorization: Open your JSON key file and copy the values:
| Parameter | Source Key | Description |
|---|---|---|
| Client Email | client_email | Service account email address. |
| Private Key | private_key | Full key including -----BEGIN.... |
| Private Key ID | private_key_id | Key ID string. |
- Click Submit to save.
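To avoid copy mistakes, the three values can also be pulled out of the key file programmatically. The following is a minimal sketch that assumes the key file is a flat JSON object with string values; the object and method names are hypothetical, and it uses a regular expression instead of a JSON library to stay dependency-free. Values are returned exactly as written in the file, so the private key keeps its `\n` escapes.

```scala
import scala.util.matching.Regex

// Hypothetical helper: extracts the three values the form asks for from the key file text.
object ServiceAccountKey {
  // Field names as they appear in the downloaded JSON key file.
  val FormFields: Seq[String] = Seq("client_email", "private_key", "private_key_id")

  /** Read one top-level string field with a regex; sufficient for a flat key file. */
  def field(json: String, name: String): Option[String] = {
    val pattern: Regex =
      ("\"" + Regex.quote(name) + "\"\\s*:\\s*\"((?:\\\\.|[^\"\\\\])*)\"").r
    pattern.findFirstMatchIn(json).map(_.group(1))
  }

  /** All three form values, or None if any of them is missing. */
  def formValues(json: String): Option[Map[String, String]] = {
    val found = FormFields.flatMap(name => field(json, name).map(value => name -> value))
    if (found.size == FormFields.size) Some(found.toMap) else None
  }
}
```

The file contents could be loaded with `scala.io.Source.fromFile(...)` and passed to `ServiceAccountKey.formValues`; a `None` result means at least one of the three fields was not found.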
Amazon S3
The process for adding S3 storage is nearly identical to GCS. You will need to provide your AWS credentials (Access Key and Secret Key) instead of a JSON key file.
- Navigate to Workloads → Clusters → Edit → Storage → Add Storage.
- Select S3 as the Type.
- Fill in the required fields:
| Parameter | Description |
|---|---|
| Name | Unique name for this storage config. |
| Access Key | Your AWS Access Key ID. |
| Secret Key | Your AWS Secret Access Key. |
| Region | AWS Region of your bucket (e.g., us-east-1). |
| Endpoint | (Optional) Custom endpoint for S3-compatible storage (e.g., MinIO). |
Azure Blob Storage
The process for adding Azure storage is nearly identical to GCS and S3. You will need your Azure Storage Account Name and Access Key.
- Navigate to Workloads → Clusters → Edit → Storage → Add Storage.
- Select Azure (or WASBS) as the Type.
- Fill in the required fields:
| Parameter | Description |
|---|---|
| Name | Unique name for this storage config. |
| Account Name | Your Azure Storage Account name. |
| Account Key | Your Azure Storage Account Access Key. |
| Container | Name of the container to use. |
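Each provider uses a different path scheme when you read or write data from a job. The sketch below assembles paths for the `gs://`, `s3a://`, and `wasbs://` schemes from the values entered above. The object and method names are hypothetical, and the Azure form assumes the standard `blob.core.windows.net` endpoint.

```scala
// Hypothetical helper: StoragePaths and its method names are illustrative only.
object StoragePaths {
  // Avoid a double slash when the caller passes a path starting with "/".
  private def clean(path: String): String = path.stripPrefix("/")

  def gcs(bucket: String, path: String): String = s"gs://${bucket}/${clean(path)}"

  def s3(bucket: String, path: String): String = s"s3a://${bucket}/${clean(path)}"

  // WASBS addresses a container inside a storage account.
  def azure(container: String, account: String, path: String): String =
    s"wasbs://${container}@${account}.blob.core.windows.net/${clean(path)}"
}
```

For example, `StoragePaths.gcs("my-ilum-bucket", "output/")` yields `gs://my-ilum-bucket/output/`.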
Step 3: Verify Connection
To ensure your storage is correctly configured, run a simple Spark job.
- Create a Code Service:
  - Go to Workloads → Services → New Service +.
  - Select Type: Code, Language: Scala, and your Cluster.
- Execute Test Code: Paste and run the following Scala code:

Test Storage Connection
```scala
// Write test data
val data = Seq(("Alice", 34), ("Bob", 45))
val df = spark.createDataFrame(data).toDF("name", "age")
// Replace with your bucket path (e.g., gs://..., s3a://..., wasbs://...)
val path = "gs://my-ilum-bucket/output/"
df.write.mode("overwrite").format("csv").save(path)
// Read back data
spark.read.format("csv").load(path).show()
```
- Check Results: If the job completes and displays the data table, your storage connection is active.
Common Issues & FAQ
Why do I get a "Permission Denied" error?
Cause: The service account or user does not have permission to access the bucket.
Solution:
- Go to your cloud provider's console (e.g., Google Cloud Console).
- Navigate to the bucket's Permissions tab.
- Grant your service account the Storage Admin or Storage Object Admin role.
Why does it say "Bucket does not exist"?
Cause: The bucket name in your code doesn't match the actual bucket name, or the region is incorrect.
Solution:
- Verify the bucket exists in your cloud console.
- Check that the bucket name in your code matches exactly (names are often case-sensitive).
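A quick way to catch a name mismatch is to extract the bucket from the path used in your code and compare it with the name configured in Ilum. This is a minimal sketch with hypothetical names; it only compares strings and does not contact the cloud provider.

```scala
import java.net.URI
import scala.util.Try

// Hypothetical helper: pulls the bucket (or container) name out of a storage path.
object BucketCheck {
  def bucketOf(path: String): Option[String] =
    Try(new URI(path)).toOption
      .flatMap(uri => Option(uri.getAuthority)) // "bucket" or "container@account.blob..."
      .map(_.takeWhile(_ != '@'))               // keep only the container for wasbs paths
      .filter(_.nonEmpty)

  /** Exact, case-sensitive comparison against the bucket configured in Ilum. */
  def matches(path: String, configuredBucket: String): Boolean =
    bucketOf(path).contains(configuredBucket)
}
```

If `BucketCheck.matches` returns `false` for the path in your job, correct the path or the storage configuration before re-running.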
Why do I get "Invalid credentials"?
Cause: The keys (JSON or Access Keys) were not copied correctly.
Solution:
- Re-open your key file.
- Carefully copy the values again. For GCS, ensure you include the -----BEGIN PRIVATE KEY----- and -----END PRIVATE KEY----- lines.
- Re-save the storage configuration in Ilum.
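Before re-saving, a pasted GCS private key can be sanity-checked for its marker lines. The sketch below uses hypothetical names and only verifies that the value starts and ends with the expected markers; it does not validate the key material itself.

```scala
// Hypothetical helper: checks that a pasted private key still has both marker lines.
object PrivateKeyCheck {
  val Begin = "-----BEGIN PRIVATE KEY-----"
  val End = "-----END PRIVATE KEY-----"

  def hasMarkers(key: String): Boolean = {
    // Accept both real line breaks and the literal "\n" escapes used in the JSON file.
    val normalized = key.replace("\\n", "\n").trim
    normalized.startsWith(Begin) && normalized.endsWith(End)
  }
}
```

A `false` result usually means the first or last line was dropped while copying.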