Run Spark Jobs via REST API

Overview

Ilum provides a robust REST API that allows you to manage, submit, and execute Apache Spark jobs programmatically. This capability is essential for organizations running Spark on Kubernetes who need to automate their data workflows.

Using the API is particularly effective for:

  • CI/CD Integration: Seamlessly trigger Spark jobs from GitLab CI, Jenkins, GitHub Actions, or Airflow.
  • Custom Orchestration: Build your own data platforms or internal tools on top of Ilum.
  • Automation: Replace manual spark-submit CLI commands with reliable, code-driven API calls.
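
As an illustration, a CI pipeline step can trigger a submission with a single curl call. The fragment below uses GitLab CI syntax as a sketch; the ILUM_URL variable, job name, class, and JAR path are assumptions to adapt to your project:

```yaml
# Hypothetical .gitlab-ci.yml fragment (ILUM_URL is an assumed CI/CD variable)
submit-spark-job:
  stage: deploy
  image: curlimages/curl:latest
  script:
    - >
      curl --fail -X POST "$ILUM_URL/api/v1/job/submit"
      -F "name=nightly-etl"
      -F "clusterName=default"
      -F "language=SCALA"
      -F "jobClass=com.example.NightlyEtl"
      -F "jars=@target/etl.jar"
```

The `--fail` flag makes curl return a non-zero exit code on HTTP errors, so a rejected submission fails the pipeline step.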

REST API vs. Spark CLI

| Feature | REST API | Spark CLI (spark-submit) |
| --- | --- | --- |
| Primary Use Case | Automation, CI/CD, Web Apps | Ad-hoc testing, Local development |
| Client Requirement | curl or HTTP client | Spark binaries & Java installed |
| Feedback Loop | JSON response (Job ID) | Console logs (streamed) |
| Firewall Friendly | Yes (single HTTP port) | No (requires random ports) |

In this guide, you will learn how to:

  1. Submit a Spark job using the multipart/form-data endpoint.
  2. Monitor the job's status via the API.

Prerequisites

To follow this example, you will need the curl command-line tool and a sample Spark JAR file.

Accessing the API

The Ilum Core API is exposed by default on port 9888. Depending on your environment, you can access it using one of the following methods:

API Base URL

In the examples below, replace http://localhost:9888 with your actual Ilum Core address.

1. Port Forwarding (Development)

If you are running the API on your local machine using a Kubernetes cluster (like Minikube or MicroK8s), you can use kubectl port-forward to access it locally:

Port Forward
kubectl port-forward svc/ilum-core 9888:9888

The API will then be available at http://localhost:9888/api/v1.

2. NodePort

If your Ilum installation is configured with a NodePort service type, you can access it via any Kubernetes node IP:

Get Nodes & Services
# Get the node IP
kubectl get nodes -o wide

# Get the assigned NodePort
kubectl get svc ilum-core

Access the API at http://<NODE_IP>:<NODE_PORT>/api/v1.

3. Ingress (Production)

For production environments, use an Ingress controller to expose the API. This allows you to use a custom domain and SSL/TLS encryption.

Example Ingress Path
- path: /api/v1/(.*)
  pathType: ImplementationSpecific
  backend:
    service:
      name: ilum-core
      port:
        number: 9888

Access the API at https://your-domain.com/api/v1.
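
For reference, the path above fits into a complete Ingress manifest along these lines. This is a sketch assuming the NGINX Ingress Controller; the rewrite-target annotation and host are assumptions to adapt to your environment:

```yaml
# Illustrative Ingress for the Ilum Core API (NGINX Ingress Controller assumed)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ilum-core-api
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /api/v1/$1
spec:
  ingressClassName: nginx
  rules:
    - host: your-domain.com
      http:
        paths:
          - path: /api/v1/(.*)
            pathType: ImplementationSpecific
            backend:
              service:
                name: ilum-core
                port:
                  number: 9888
```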

Which Method Should I Use?

| Method | Best For | Requirement |
| --- | --- | --- |
| Port Forwarding | Local development, one-off tests | kubectl access to the cluster |
| NodePort | Internal lab environments, simple setups | Access to Kubernetes node IPs |
| Ingress | Production, team collaboration, CI/CD | Ingress controller (Nginx, Traefik, etc.) |

Submit Apache Spark Jobs Programmatically

To submit a new Spark application, use the POST /api/v1/job/submit endpoint. This endpoint accepts multipart/form-data requests, allowing you to upload your application JAR or Python script along with the job configuration. This method is the programmatic equivalent of spark-submit.

Example: Submitting MiniReadWriteTest

The following curl command submits the MiniReadWriteTest example job (from the downloaded JAR). This job writes a file and then reads it back to verify the setup.

Submit Job
curl -X POST "http://localhost:9888/api/v1/job/submit" \
-F "name=MiniReadWriteTest" \
-F "clusterName=default" \
-F "language=SCALA" \
-F "jobClass=org.apache.spark.examples.MiniReadWriteTest" \
-F "jobConfig=spark.executor.instances=2" \
-F "args=/opt/spark/examples/src/main/resources/kv1.txt" \
-F "jars=@spark-examples_2.12-3.5.7.jar"

Parameter Reference

| Parameter | Type | Description | Required | Example |
| --- | --- | --- | --- | --- |
| name | String | A unique identifier for your job. | Yes | MiniReadWriteTest |
| clusterName | String | The name of the Kubernetes cluster registered in Ilum. | Yes | default |
| language | String | The programming language of the job (SCALA or PYTHON). | Yes | SCALA |
| jobClass | String | Scala: the fully qualified main class name. Python: the script filename (without extension). | Yes | org.apache.spark.examples.MiniReadWriteTest |
| jobConfig | String | Semicolon-separated list of Spark configuration properties in key=value format. | No | spark.executor.instances=2 |
| args | String | Semicolon-separated list of arguments to pass to the job's main method. | No | /path/to/input.txt |
| jars | File | The application JAR file. Use the @ prefix in curl to upload the file. | Yes (for Scala) | @app.jar |
| pyFiles | File | The main Python script or ZIP package. | Yes (for Python) | @job.py |

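
For comparison, a PySpark submission swaps language, jobClass, and pyFiles. The sketch below only assembles and prints the command rather than running it; the script name, paths, and config values are assumptions for illustration:

```shell
# Assemble (but do not execute) a hypothetical PySpark submission.
# jobClass is the script filename without the .py extension;
# jobConfig and args use semicolons to separate multiple values.
submit_cmd=(curl -X POST "http://localhost:9888/api/v1/job/submit"
  -F "name=WordCount"
  -F "clusterName=default"
  -F "language=PYTHON"
  -F "jobClass=wordcount"
  -F "jobConfig=spark.executor.instances=2;spark.executor.memory=2g"
  -F "args=/data/input.txt;/data/output"
  -F "pyFiles=@wordcount.py")
printf '%s\n' "${submit_cmd[*]}"
```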
Full API Specification

For a complete list of all available parameters and their detailed descriptions, refer to the Ilum API documentation.

Monitor Spark Job Status

Upon successful submission, the API returns a JSON response containing the jobId. You can use this ID to poll for the job's completion status, making it easy to build wait-logic into your automation scripts.

{
  "jobId": "20251222-0931-f56pqk5y1ap"
}

You can use this jobId to check the current status of your job:

Get Job Status
curl "http://localhost:9888/api/v1/job/{jobId}"

The response provides a comprehensive overview of the job's configuration, state, and execution timing.

{
  "jobId": "20251222-0931-f56pqk5y1ap",
  "jobName": "MiniReadWriteTest",
  "jobType": "SINGLE",
  "language": "SCALA",
  "appId": "spark-92b3da7ee0fa4d1e965b521ba356544c",
  "state": "FINISHED",
  "submissionTime": 1766395898079,
  "startTime": 1766395899941,
  "endTime": 1766395905785,
  "jobConfig": {
    "spark.executor.instances": "2",
    "spark.kubernetes.namespace": "default",
    "spark.eventLog.enabled": "true",
    "...": "..."
  }
}

Key fields to monitor include:

  • state: The current lifecycle phase (e.g., SUBMITTED, RUNNING, FINISHED, FAILED).
  • appId: The Spark Application ID assigned by the cluster manager.
  • startTime / endTime: Epoch timestamps (ms) for performance tracking.
  • error: If the state is FAILED, this field will contain the error message or stack trace.
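
The wait-logic mentioned above can be sketched as a small shell function. Here get_state is a placeholder for the status call shown earlier, so the loop can be tested or reused with any fetch mechanism:

```shell
# Poll until the job reaches a terminal state.
# In a real script, get_state would wrap the status endpoint, e.g.:
#   get_state() { curl -s "http://localhost:9888/api/v1/job/$1" \
#     | python3 -c 'import sys, json; print(json.load(sys.stdin)["state"])'; }
wait_for_job() {
  jobId="$1"
  while true; do
    state=$(get_state "$jobId")
    case "$state" in
      FINISHED) echo "Job $jobId finished"; return 0 ;;
      FAILED)   echo "Job $jobId failed" >&2; return 1 ;;
      *)        sleep 5 ;;
    esac
  done
}
```

In practice you would also add a timeout (for example, a maximum number of iterations) so a stuck job cannot block the pipeline indefinitely.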

Troubleshooting Common Issues

If you encounter issues while submitting jobs, refer to the table below for common error codes and solutions.

| HTTP Code | Error | Possible Cause & Solution |
| --- | --- | --- |
| 400 | Bad Request | Missing parameters: ensure jobClass, clusterName, and jars (for Scala) are provided correctly in the form data. |
| 401 | Unauthorized | Auth failure: check if your cluster requires an API token or Basic Auth header. |
| 404 | Not Found | Invalid cluster: the clusterName specified does not exist. Verify active clusters via GET /api/v1/cluster. |
| 500 | Internal Server Error | Cluster connection: Ilum cannot talk to the K8s API server. Check the ilum-core logs for connectivity issues. |

Frequently Asked Questions (FAQ)

Can I upload Python dependencies?

Yes. For PySpark jobs, use the pyFiles parameter to upload your .py script or a .zip archive containing your Python modules.

How do I secure the API?

We recommend placing the Ilum API behind an Ingress Controller with Basic Auth or OAuth2 enabled. You can then pass the credentials via standard HTTP headers.

What is the maximum JAR size?

The default limit is usually 100MB (configured in your Ingress or Spring Boot settings). For larger JARs, we recommend uploading them to S3/HDFS first and referencing them via spark.jars config, rather than uploading directly.