Scheduling Apache Spark Jobs
Scheduling in Ilum allows you to automate the execution of Apache Spark jobs on Kubernetes clusters at specified intervals using CRON expressions. This is essential for setting up reliable ETL pipelines, regular data analysis, or maintenance tasks that need to run without manual intervention.
You can use the JAR file with the Spark examples from this link.
Step-by-Step Guide: Scheduling a Spark Job
- Navigate to Schedules: Access the Schedules section in your Ilum dashboard.
- Create New Schedule: Click the New Schedule + button to start setting up your automated job.
- Fill Out Schedule Details:
  - General tab:
    - Name: enter ScheduledMiniReadWriteTest
    - Cluster: Select your target cluster
    - Class: enter org.apache.spark.examples.MiniReadWriteTest
    - Language: Select Scala
  - Timing tab:
    - CRON Expression: Select the Custom tab
    - Custom expression: enter @daily
    - This configuration will trigger the job once every day at midnight. You can adjust it to any valid CRON expression (e.g., 0 */12 * * * for every 12 hours).
  - Configuration tab:
    - Arguments: enter /opt/spark/examples/src/main/resources/kv1.txt
  - Resources tab:
    - Jars: Upload the JAR file from the link above.
  - Memory tab:
    - Leave all settings at their default values for this example.
- Submit and Monitor:
  - Press Submit to create the schedule.
  - Your new schedule will appear in the list.
  - When the scheduled time arrives, a new job instance is launched automatically. You can view these instances in the Jobs section.
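Before submitting, it can be handy to sanity-check what a custom expression will actually do. The sketch below is a minimal, illustrative expansion of the minute and hour fields of a five-field CRON expression (it supports only `*`, `*/n`, and plain numbers, and has no Ilum dependency; it is not Ilum's scheduler):

```python
# Minimal sanity-checker for the minute/hour fields of a five-field CRON
# expression. Supports "*", "*/n", and plain numbers only -- an
# illustrative sketch, not Ilum's scheduler implementation.

def expand_field(field: str, limit: int) -> list[int]:
    """Expand one CRON field into the concrete values it matches."""
    if field == "*":
        return list(range(limit))
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(0, limit, step))
    return [int(field)]

def daily_fire_times(expr: str) -> list[str]:
    """Return the HH:MM times a day-agnostic expression fires each day."""
    minute, hour, *_ = expr.split()
    return [f"{h:02d}:{m:02d}"
            for h in expand_field(hour, 24)
            for m in expand_field(minute, 60)]

print(daily_fire_times("0 */12 * * *"))  # ['00:00', '12:00']
```

For example, `0 0 * * *` (the expansion of `@daily`) yields a single `00:00` fire time per day, matching the midnight behavior described above.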
Schedule Configuration Reference
Below is a detailed breakdown of all available settings, organized by tab as they appear in the UI.
General

| Parameter | Description |
|---|---|
| Name | A unique identifier for the schedule. |
| Cluster | The target cluster where the scheduled jobs will be executed. |
| Class | The fully qualified class name of the application (e.g., org.apache.spark.examples.SparkPi) or the filename for Python scripts. |
| Language | The programming language used for the job (Scala or Python). |
| Description | An optional description to explain the purpose of this schedule. |
| Max Retries | The maximum number of times Ilum will attempt to restart the job if it fails. |
Timing

| Parameter | Description |
|---|---|
| Start Time | (Optional) The specific date and time when the schedule should become active. If left blank, it starts immediately. |
| End Time | (Optional) The specific date and time when the schedule should stop triggering new jobs. |
| CRON Expression | Defines the frequency of the job execution. You can use the visual builders (Minutes, Hourly, Daily, Weekly, Monthly) or select Custom to enter a standard Unix-style CRON expression (e.g., 0 12 * * * for noon daily). |
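A common failure mode is pasting an expression with the wrong number of fields or an out-of-range value into the Custom box. The following is a hedged sketch of a pre-submit check in plain Python (it mirrors standard Unix CRON field ranges and is not Ilum's actual validator):

```python
# Sketch: validate that a Unix-style CRON string has five fields within
# standard ranges before pasting it into the Custom tab. Macros like
# "@daily" pass through. Not Ilum's actual validation logic.

FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]  # min, hr, dom, mon, dow
MACROS = {"@hourly", "@daily", "@weekly", "@monthly", "@yearly"}

def is_valid_cron(expr: str) -> bool:
    if expr in MACROS:
        return True
    fields = expr.split()
    if len(fields) != 5:
        return False
    for field, (lo, hi) in zip(fields, FIELD_RANGES):
        for part in field.split(","):
            part = part.split("/")[0]          # drop a "/n" step suffix
            if part == "*":
                continue
            bounds = part.split("-")           # ranges like "1-5"
            if not all(p.isdigit() and lo <= int(p) <= hi for p in bounds):
                return False
    return True

print(is_valid_cron("0 12 * * *"))   # True
print(is_valid_cron("0 25 * * *"))   # False (hour out of range)
```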
Configuration

| Parameter | Description |
|---|---|
| Parameters | Key-value pairs for configuring Spark properties (e.g., spark.executor.memory). |
| Arguments | Command-line arguments passed to the main method of your application. |
| Tags | Custom labels to categorize and filter your scheduled jobs. |
Resources

| Parameter | Description |
|---|---|
| Jars | Additional JAR files to be included in the classpath. |
| Files | Auxiliary files to be placed in the working directory of each executor. |
| PyFiles | Python dependencies (.zip, .egg, .py) for Python jobs. |
| Requirements | Additional Python packages to install. |
| Spark Packages | Maven coordinates for Spark JAR packages to be downloaded. |
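For the PyFiles field, multi-module dependencies are typically bundled into a single .zip with the modules at the archive root so Spark can import them. A standard-library-only sketch (the file names here are hypothetical examples):

```python
# Illustrative sketch: bundle local Python modules into a .zip suitable
# for a PyFiles-style field, using only the standard library. The module
# names below are hypothetical examples.
import pathlib
import tempfile
import zipfile

def bundle_pyfiles(sources: list[str], out_zip: str) -> str:
    """Zip the given .py files at the archive root so Spark can import them."""
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for src in sources:
            path = pathlib.Path(src)
            zf.write(path, arcname=path.name)
    return out_zip

# Example: create two tiny modules and bundle them.
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "utils.py").write_text("def helper(): return 42\n")
(tmp / "etl.py").write_text("from utils import helper\n")
deps = bundle_pyfiles([str(tmp / "utils.py"), str(tmp / "etl.py")], str(tmp / "deps.zip"))
print(zipfile.ZipFile(deps).namelist())  # ['utils.py', 'etl.py']
```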
Memory

| Parameter | Description |
|---|---|
| Executors | The number of executor instances to allocate. |
| Driver Cores | The number of CPU cores assigned to the driver. |
| Executor Cores | The number of CPU cores assigned to each executor. |
| Driver Memory | The amount of RAM allocated to the driver. |
| Executor Memory | The amount of RAM allocated to each executor. |
| Dynamic Allocation | Enables automatic scaling of executors based on workload. |
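These settings correspond to well-known stock Spark configuration properties. How Ilum maps them internally is an assumption here; the sketch below just shows the conventional Spark keys the table's fields line up with:

```python
# Hedged sketch: assemble the Memory-tab settings as the conventional
# Spark property keys (spark.executor.instances etc.). The exact mapping
# Ilum applies under the hood is an assumption based on stock Spark.

def memory_tab_to_spark_conf(executors: int, driver_cores: int,
                             executor_cores: int, driver_memory: str,
                             executor_memory: str,
                             dynamic_allocation: bool) -> dict[str, str]:
    return {
        "spark.executor.instances": str(executors),
        "spark.driver.cores": str(driver_cores),
        "spark.executor.cores": str(executor_cores),
        "spark.driver.memory": driver_memory,
        "spark.executor.memory": executor_memory,
        "spark.dynamicAllocation.enabled": str(dynamic_allocation).lower(),
    }

conf = memory_tab_to_spark_conf(2, 1, 2, "1g", "2g", False)
print(conf["spark.executor.memory"])  # 2g
```

Note that when Dynamic Allocation is enabled, the fixed executor count is typically treated as an initial or maximum bound rather than a hard size.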
Frequently Asked Questions
Can I schedule PySpark jobs using Ilum?
Yes, Ilum fully supports scheduling for both Scala/Java (JARs) and Python (PySpark) jobs. Simply select "Python" as the language in the General tab and provide your script.
How does the retry mechanism work?
If a scheduled job fails, Ilum can automatically attempt to restart it based on the "Max Retries" configuration. This ensures transient issues don't break your pipelines.
What CRON formats are supported?
Ilum supports standard Unix-style CRON expressions (e.g., 0 12 * * *) as well as predefined macros like @daily, @hourly, etc.
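The macros are shorthand for full five-field expressions. The expansions below are the standard Unix cron conventions; verify them against your Ilum version's scheduler:

```python
# Standard expansions of the predefined CRON macros (Unix cron
# convention; confirm against your Ilum version before relying on them).
CRON_MACROS = {
    "@hourly":  "0 * * * *",    # top of every hour
    "@daily":   "0 0 * * *",    # every day at midnight
    "@weekly":  "0 0 * * 0",    # every Sunday at midnight
    "@monthly": "0 0 1 * *",    # first day of each month at midnight
    "@yearly":  "0 0 1 1 *",    # January 1st at midnight
}

print(CRON_MACROS["@daily"])  # 0 0 * * *
```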