Zeppelin in Ilum

Overview

Zeppelin is an interactive, web-based notebook platform for data exploration, visualization, and analytics on big data platforms such as Apache Spark.
In Ilum, Zeppelin is tightly integrated with Ilum core services, including the Spark cluster and Ilum-Livy-Proxy. It supports collaborative, multi-language analytics with strong visualization capabilities, making it ideal for ad-hoc analysis, dashboards, and team data workflows.

Note:

Zeppelin is optional in Ilum. It can be enabled and managed as a separate module.

Zeppelin provides a different experience from JupyterLab—see the comparison tables in Notebooks Overview.

Currently, Zeppelin in Ilum does NOT provide any authentication or user access control. Anyone who can access the Zeppelin web interface has full access to all notebooks and features.

Key Features

Multi-language Analytics:
Use interpreters to run code in Python, Scala, SQL, Bash, and more—all in a single document.
First-class Spark Support:
Dedicated Spark interpreters (via Livy) allow running Spark jobs directly from notebook cells, supporting %livy.spark (Scala), %livy.pyspark (Python), and %livy.sql (SQL).
Built-in Visualizations:
Instantly generate bar charts, line plots, pie charts, tables, and more from SQL/Spark results—no additional coding required.
Team Collaboration:
Notebooks can be shared among users, and visualizations can be combined into dashboards for presentation.
Dynamic, Block-by-Block Execution:
Execute cells incrementally and visualize results in real time.
Integration with Ilum Services:
Access to Ilum's Spark clusters, storage, lineage, and history server via the Ilum-Livy-Proxy.

Zeppelin in Ilum vs. JupyterLab/JupyterHub

Aspect	Zeppelin	JupyterLab / JupyterHub
User Model	Shared notebooks (no isolation)	Multi-user (JupyterHub), single-user (JupyterLab)
Authentication	No authentication	LDAP/SSO via Ilum
Workspace Isolation	Shared or per-notebook	Per-user (JupyterHub), shared/single (JupyterLab)
Spark Integration	Built-in Livy interpreters	Sparkmagic magics & Ilum-Livy-Proxy
Version Control	Manual (notebook export)	Git (Gitea integration)
Visualization	Built-in charts/dashboards	Widgets, matplotlib, plotly, etc.
Best For	Dashboards, ad-hoc analytics, interactive data exploration	Data science pipelines, ML, reproducible workflows

Access & Deployment

Enable Zeppelin in Ilum:
Zeppelin is not enabled by default. You can enable it via Helm:

helm upgrade \
  --set ilum-zeppelin.enabled=true \
  --reuse-values \
  ilum ilum/ilum

Access Zeppelin UI: After deployment, open Zeppelin from the Ilum UI under Modules > Zeppelin.
Authentication: Currently, Zeppelin in Ilum does NOT provide any authentication or access control. Anyone who can reach the Zeppelin web UI (via browser) will have full access to create, edit, run, and delete all notebooks.

How Zeppelin Works in Ilum

Interpreter Architecture: Zeppelin uses interpreters for each language or system (e.g., %livy.spark, %livy.pyspark, %livy.sql). Each interpreter connects via the Ilum-Livy-Proxy to Spark clusters, mapping notebook blocks to Spark jobs and code services.
Session Management: For each notebook, separate Spark sessions are created for %livy.spark (Scala), %livy.pyspark (Python), and %livy.sql (SQL). Sessions are managed automatically but can be configured via interpreter settings.
Integration with Ilum Services: Spark jobs launched from Zeppelin are visible in the Ilum UI (Workloads). These sessions inherit all cluster integrations—Hive Metastore, lineage, storage access, and monitoring.
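To make the interpreter and session model above more concrete, the following Python sketch shows the kind of REST calls a Livy-based interpreter issues: one session per interpreter kind (`POST /sessions`), then one statement per executed cell (`POST /sessions/{id}/statements`). It assumes the Ilum-Livy-Proxy exposes the standard Apache Livy REST API; the proxy URL and the interpreter-to-kind mapping are illustrative assumptions, and this is not Zeppelin's own implementation.

```python
"""Sketch of the Livy REST calls behind a Zeppelin Livy interpreter.

Assumptions: LIVY_URL is a hypothetical proxy address, and the
Ilum-Livy-Proxy is assumed to speak the standard Apache Livy REST API.
"""
import json
import time
import urllib.request
from typing import Optional

# Hypothetical Ilum-Livy-Proxy address; replace with your deployment's URL.
LIVY_URL = "http://ilum-livy-proxy:8998"

# Assumed mapping from Zeppelin interpreter prefix to Livy session kind.
INTERPRETER_KINDS = {
    "%livy.spark": "spark",
    "%livy.pyspark": "pyspark",
    "%livy.sql": "sql",
}


def session_request(interpreter: str) -> dict:
    """JSON body for POST /sessions: one session per interpreter kind."""
    return {"kind": INTERPRETER_KINDS[interpreter]}


def statement_request(code: str) -> dict:
    """JSON body for POST /sessions/{id}/statements: one executed cell."""
    return {"code": code}


def _call(path: str, body: Optional[dict] = None) -> dict:
    """GET (no body) or POST (JSON body) against the Livy API; returns parsed JSON."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        LIVY_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST" if body is not None else "GET",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for(path: str, ready: str, poll_seconds: float = 2.0) -> dict:
    """Poll a session or statement resource until its state equals `ready`."""
    while True:
        resource = _call(path)
        if resource["state"] == ready:
            return resource
        if resource["state"] in ("error", "dead", "killed", "cancelled"):
            raise RuntimeError(f"{path} ended in state {resource['state']}")
        time.sleep(poll_seconds)


def run_cell(interpreter: str, code: str) -> dict:
    """Start a session for `interpreter`, run `code` once, and return its output."""
    session = _call("/sessions", session_request(interpreter))
    sid = session["id"]
    # A new session must reach 'idle' before it accepts statements.
    wait_for(f"/sessions/{sid}", ready="idle")
    stmt = _call(f"/sessions/{sid}/statements", statement_request(code))
    # A statement is finished once its state is 'available'.
    done = wait_for(f"/sessions/{sid}/statements/{stmt['id']}", ready="available")
    return done["output"]
```

For example, `run_cell("%livy.pyspark", "spark.range(10).count()")` would start a PySpark session and return the statement's `output` object. The session stays open afterwards, as an interpreter session does, until it is deleted with `DELETE /sessions/{id}`.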

Example Workflows

Examples and hands-on workflows for Zeppelin (including running Spark, SQL, visualizations, dashboards, and session lifecycle management) are described in a dedicated guide:

See Zeppelin Usage Examples

Best Practices

Interpreter Selection: Always use Livy-based interpreters (%livy.spark, %livy.pyspark, %livy.sql) for Spark jobs in Ilum.
Data Visualization: Use Zeppelin's built-in charts for quick insight; export charts as images or combine them into dashboards as needed.
Resource Awareness: Sessions consume Spark resources; close notebooks or stop sessions when not needed.
Versioning: Use notebook export for backup or manual versioning, or integrate with external Git if required.
Collaboration: Remember: there is no access control. Treat all Zeppelin notebooks as visible/editable by anyone who can access the service.
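As a sketch of the resource-awareness practice above, the Python snippet below lists Livy sessions (`GET /sessions`) and deletes those reported as `idle` (`DELETE /sessions/{id}`). It assumes the Ilum-Livy-Proxy exposes the standard Livy REST API at a hypothetical `LIVY_URL`. Because Zeppelin in Ilum has no access control, and Livy reports a session as `idle` whenever it is simply waiting for its next statement, this would also stop sessions that other users are still working in; narrow the filter before using it on a shared cluster.

```python
"""Sketch: stop idle Livy sessions to free Spark resources.

Assumption: LIVY_URL is a hypothetical Ilum-Livy-Proxy address exposing
the standard Apache Livy REST API.
"""
import json
import urllib.request
from typing import List

# Hypothetical proxy address; replace with your deployment's URL.
LIVY_URL = "http://ilum-livy-proxy:8998"


def idle_session_ids(listing: dict) -> List[int]:
    """Pick the ids of sessions in state 'idle' from a GET /sessions reply."""
    return [s["id"] for s in listing.get("sessions", []) if s.get("state") == "idle"]


def list_sessions() -> dict:
    """Fetch the session listing from the Livy API."""
    with urllib.request.urlopen(f"{LIVY_URL}/sessions") as resp:
        return json.load(resp)


def delete_session(session_id: int) -> None:
    """Stop one session so its Spark resources can be released."""
    req = urllib.request.Request(f"{LIVY_URL}/sessions/{session_id}", method="DELETE")
    with urllib.request.urlopen(req):
        pass


def stop_idle_sessions() -> List[int]:
    """Delete every idle session and return the ids that were stopped."""
    stopped = idle_session_ids(list_sessions())
    for session_id in stopped:
        delete_session(session_id)
    return stopped
```

Keeping the filtering in a pure function (`idle_session_ids`) makes it easy to add stricter criteria, such as matching only your own session names, without touching the HTTP calls.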

Troubleshooting

Cannot Access Zeppelin:
- Check if the module is enabled and properly deployed.
- Make sure the Zeppelin service is reachable (check port-forward or ingress).
Spark Session Issues:
- If jobs don't start, ensure the Ilum-Livy-Proxy is enabled and accessible.
- Review the interpreter settings or logs in the Zeppelin UI.
Timeouts:
- Adjust session timeouts in interpreter config for long-running jobs.
Visualization Issues:
- Try switching chart types or exporting results for offline analysis.
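For the "Cannot Access Zeppelin" and "Spark Session Issues" checks above, a small Python script can confirm whether each service answers HTTP at all. It calls Zeppelin's `GET /api/version` endpoint and the Livy `GET /sessions` endpoint. Both URLs are hypothetical (for example, local ports opened with `kubectl port-forward`) and must be adjusted to your service names, ports, or ingress.

```python
"""Sketch: check that Zeppelin and the Livy proxy answer HTTP requests.

Assumption: the URLs below are hypothetical local port-forwards.
"""
import urllib.error
import urllib.request
from typing import Dict

# Hypothetical endpoints; adjust host, port, or ingress path for your cluster.
ENDPOINTS = {
    "zeppelin": "http://localhost:8080/api/version",
    "livy-proxy": "http://localhost:8998/sessions",
}


def check(url: str, timeout: float = 5.0) -> str:
    """Return a short status string for a single HTTP GET to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"OK ({resp.status})"
    except urllib.error.HTTPError as err:
        # The service answered, but with an error status (e.g. 404 or 503).
        return f"HTTP {err.code}"
    except OSError as err:
        # URLError (refused connection, DNS failure) and timeouts are OSError subclasses.
        return f"UNREACHABLE ({err})"


def run_checks(endpoints: Dict[str, str] = ENDPOINTS) -> Dict[str, str]:
    """Check every endpoint and map its name to a status string."""
    return {name: check(url) for name, url in endpoints.items()}
```

Running `print(run_checks())` reports one status per service. An `UNREACHABLE` result points to deployment or networking (module not enabled, pod not running, wrong port-forward or ingress), while an `HTTP` status means the service is up but returned an error.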
