ETL Pipeline Flow
RAW APIs -> Validate -> Cleanse -> Transform -> Aggregate -> Warehouse
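The flow above can be sketched as a chain of small functions, one per stage. This is only an illustrative sketch in plain Python, not the dashboard's actual implementation: the record shape (`region`, `value`), the function names, and the rules each stage applies are all assumptions made for the example, and the final warehouse load is left out.

```python
from collections import defaultdict

# Hypothetical record shape (an assumption for this sketch):
#   {"region": str, "value": str | int | float | None}


def validate(records):
    """Keep only records that carry both required fields."""
    return [r for r in records if "region" in r and "value" in r]


def cleanse(records):
    """Drop records whose value is missing and trim whitespace from region names."""
    return [
        {"region": r["region"].strip(), "value": r["value"]}
        for r in records
        if r["value"] is not None
    ]


def transform(records):
    """Normalize region names to title case and cast values to float."""
    return [
        {"region": r["region"].title(), "value": float(r["value"])}
        for r in records
    ]


def aggregate(records):
    """Sum values per region."""
    totals = defaultdict(float)
    for r in records:
        totals[r["region"]] += r["value"]
    return dict(totals)


def run_pipeline(raw_records):
    """Run the stages in the order shown in the flow: validate, cleanse, transform, aggregate."""
    return aggregate(transform(cleanse(validate(raw_records))))
```

Keeping each stage as a separate function means a stage can be tested or swapped out on its own; in a Spark job the same stages would more likely be expressed as DataFrame operations.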
Processing Distribution by Region
Live Weather Data - Indonesia Cities
Source: Open-Meteo API
Recent Earthquakes - Indonesia Region
Source: USGS Earthquake API
Spark Pipeline Configuration
# SparkSession is exported from pyspark.sql, not from the top-level pyspark package
from pyspark.sql import SparkSession

# Apache Spark-style pipeline
# Data sources configuration: endpoint, payload format, and expected record count
DATA_SOURCES = {
    "indonesia_population": {
        "url": "bps.go.id/api/v1",
        "format": "JSON",
        "records": 2_400_000,
    },
    "seasia_weather": {
        "url": "open-meteo.com/v1",
        "format": "JSON",
        "records": 1_200_000,
    },
}

# Spark pipeline execution: create (or reuse) a session with 200 shuffle partitions
spark = (
    SparkSession.builder
    .appName("IndonesiaBigData")
    .config("spark.sql.shuffle.partitions", 200)
    .getOrCreate()
)
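Before a Spark session is started, a source configuration like the one above could be sanity-checked in plain Python. The sketch below is an assumption about how that might look, not part of the original pipeline: `REQUIRED_KEYS`, `check_sources`, and `total_records` are hypothetical helpers, and the `DATA_SOURCES` dictionary is repeated only so the block runs on its own.

```python
# Copy of the configuration shown above, repeated so this sketch is self-contained.
DATA_SOURCES = {
    "indonesia_population": {
        "url": "bps.go.id/api/v1",
        "format": "JSON",
        "records": 2_400_000,
    },
    "seasia_weather": {
        "url": "open-meteo.com/v1",
        "format": "JSON",
        "records": 1_200_000,
    },
}

# Hypothetical rule: every source must define these three keys.
REQUIRED_KEYS = {"url", "format", "records"}


def check_sources(sources):
    """Return a list of problems found; an empty list means the config passed."""
    problems = []
    for name, cfg in sources.items():
        missing = REQUIRED_KEYS - set(cfg)
        if missing:
            problems.append(f"{name}: missing {sorted(missing)}")
            continue
        if not isinstance(cfg["records"], int) or cfg["records"] < 0:
            problems.append(f"{name}: records must be a non-negative integer")
    return problems


def total_records(sources):
    """Sum the expected record counts across all configured sources."""
    return sum(cfg["records"] for cfg in sources.values())
```

With the two sources listed above, `total_records(DATA_SOURCES)` adds 2,400,000 and 1,200,000 for an expected 3,600,000 records, a figure that could inform how many shuffle partitions are worth configuring.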
Live Data Points - Indonesia
Recent Processing Jobs
| Job ID | Source | Region | Records | Duration | Status |
|---|---|---|---|---|---|