ETL Pipeline Flow

RAW APIs -> Validate -> Cleanse -> Transform -> Aggregate -> Warehouse
156K records/sec
98.2% validation rate
12 workers
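
The stages in the flow above can be sketched in plain Python. This is a minimal illustration, not the dashboard's actual implementation: the record fields (`region`, `value`), the validation rule, and every function name are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sample records; field names are illustrative only.
RAW_RECORDS = [
    {"region": " java ", "value": "10"},
    {"region": "Sumatra", "value": "5"},
    {"region": "Java", "value": "7"},
    {"region": "", "value": "3"},        # fails validation: empty region
    {"region": "Bali", "value": "n/a"},  # fails validation: non-numeric value
]


def is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except (TypeError, ValueError):
        return False


def validate(records):
    """Validate: keep records with a non-empty region and a numeric value."""
    return [
        rec for rec in records
        if str(rec.get("region", "")).strip() and is_number(rec.get("value"))
    ]


def cleanse(records):
    """Cleanse: trim whitespace and normalize region capitalization."""
    return [{**rec, "region": rec["region"].strip().title()} for rec in records]


def transform(records):
    """Transform: convert the value field from text to float."""
    return [{**rec, "value": float(rec["value"])} for rec in records]


def aggregate(records):
    """Aggregate: sum values per region."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["region"]] += rec["value"]
    return dict(totals)


def run_pipeline(records):
    """Run all stages and return (per-region totals, validation rate)."""
    valid = validate(records)
    rate = len(valid) / len(records) if records else 0.0
    return aggregate(transform(cleanse(valid))), rate


totals, rate = run_pipeline(RAW_RECORDS)
print(totals, f"validation rate: {rate:.1%}")
```

The validation rate here is simply valid records divided by total records, which is one plausible reading of the "validation rate" tile; the dashboard does not say how its figure is computed.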

Processing Distribution by Region

Live Weather Data - Indonesia Cities

Source: Open-Meteo API
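
A request to the Open-Meteo forecast endpoint for this panel might be built as follows. This is a sketch under assumptions: the page does not list the cities or the query it uses, so the city coordinates are approximate placeholders and the `current_weather` parameter is one option the public API is understood to accept.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the public Open-Meteo forecast API.
OPEN_METEO_URL = "https://api.open-meteo.com/v1/forecast"

# Illustrative cities with approximate coordinates (not taken from the page).
CITIES = {
    "Jakarta": (-6.2, 106.8),
    "Surabaya": (-7.25, 112.75),
}


def build_weather_url(lat, lon):
    """Build a query URL asking for current conditions at one location."""
    params = {"latitude": lat, "longitude": lon, "current_weather": "true"}
    return f"{OPEN_METEO_URL}?{urlencode(params)}"


def fetch_weather(lat, lon, timeout=10):
    """Fetch and parse the JSON response (performs a network call)."""
    with urlopen(build_weather_url(lat, lon), timeout=timeout) as resp:
        return json.load(resp)


for city, (lat, lon) in CITIES.items():
    print(city, build_weather_url(lat, lon))
```

Only the URLs are printed here; calling `fetch_weather` requires network access and returns whatever JSON the service sends back.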

Recent Earthquakes - Indonesia Region

Source: USGS Earthquake API
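
A query against the USGS event service for this panel could look like the sketch below. The bounding box for the Indonesia region, the magnitude threshold, and the result limit are all assumptions chosen for illustration; the page does not state the filters it applies.

```python
from urllib.parse import urlencode

# USGS FDSN event query endpoint.
USGS_URL = "https://earthquake.usgs.gov/fdsnws/event/1/query"

# Rough bounding box around Indonesia (an assumption for this example).
INDONESIA_BBOX = {
    "minlatitude": -11,
    "maxlatitude": 6,
    "minlongitude": 95,
    "maxlongitude": 141,
}


def build_earthquake_url(min_magnitude=4.5, limit=10):
    """Build a GeoJSON query for recent events inside the bounding box."""
    params = {
        "format": "geojson",
        **INDONESIA_BBOX,
        "minmagnitude": min_magnitude,
        "limit": limit,
        "orderby": "time",
    }
    return f"{USGS_URL}?{urlencode(params)}"


def summarize(feature_collection):
    """Extract (magnitude, place) pairs from a GeoJSON FeatureCollection."""
    return [
        (f["properties"]["mag"], f["properties"]["place"])
        for f in feature_collection.get("features", [])
    ]


# Hypothetical response fragment, used only to show the expected shape.
SAMPLE = {"features": [{"properties": {"mag": 5.1, "place": "example location"}}]}

print(build_earthquake_url())
print(summarize(SAMPLE))
```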

Spark Pipeline Configuration

# Apache Spark-style pipeline
# SparkSession lives in pyspark.sql, not in the top-level pyspark package
from pyspark.sql import SparkSession

# Data Sources Configuration
DATA_SOURCES = {
    "indonesia_population": {
        "url": "bps.go.id/api/v1",
        "format": "JSON",
        "records": 2_400_000
    },
    "seasia_weather": {
        "url": "open-meteo.com/v1",
        "format": "JSON",
        "records": 1_200_000
    }
}

# Spark Pipeline Execution
spark = SparkSession.builder \
    .appName("IndonesiaBigData") \
    .config("spark.sql.shuffle.partitions", 200) \
    .getOrCreate()
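
To get a feel for what the `spark.sql.shuffle.partitions` setting implies, the record counts from the configuration above can be divided across the 200 shuffle partitions. This is a back-of-the-envelope sketch that assumes both sources are shuffled together and that rows spread evenly; real partition sizes depend on the keys and on data skew.

```python
# Record counts and partition count copied from the configuration above.
DATA_SOURCES = {
    "indonesia_population": {"records": 2_400_000},
    "seasia_weather": {"records": 1_200_000},
}
SHUFFLE_PARTITIONS = 200


def records_per_partition(sources, partitions):
    """Return (total records, average records per shuffle partition)."""
    total = sum(src["records"] for src in sources.values())
    return total, total / partitions


total, per_partition = records_per_partition(DATA_SOURCES, SHUFFLE_PARTITIONS)
print(f"{total:,} records -> about {per_partition:,.0f} per partition")
```

With these figures the average comes to 18,000 records per partition.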

Live Data Points - Indonesia

Recent Processing Jobs

Job ID | Source | Region | Records | Duration | Status