Jose Aguilar

Schoolytics runs scheduled data syncs for every district: grades from PowerSchool overnight, assignment data from Google Classroom hourly, state assessment results weekly, and so on. Hand-writing a DAG for every district × integration pair doesn't scale, because every district's timezone, schedule, and integration set is different. This is the meta-DAG that generates the fleet.

How it works

A single meta-DAG runs nightly. It queries Cloud Spanner for the current set of active customers and their integration configs, then emits per-customer DAG Python files directly to the Composer GCS bucket.
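As a rough sketch of that nightly loop, the generator can be thought of as "for each active customer and integration, render source, sanity-check it, write it under the bucket's dags/ folder". Everything named here is hypothetical: the Customer shape, the object naming scheme, and the render/upload callables, which stand in for the real Spanner query, template composer, and GCS client.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Customer:
    """Assumed shape of one active-customer row read from Spanner."""
    customer_id: str
    timezone: str
    integrations: tuple[str, ...]


def dag_object_name(customer_id: str, integration: str) -> str:
    # Hypothetical naming scheme. Composer parses files under the bucket's dags/ folder.
    return f"dags/generated/{customer_id}__{integration}.py"


def generate_fleet(
    customers: Iterable[Customer],
    render: Callable[[Customer, str], str],
    upload: Callable[[str, str], None],
) -> list[str]:
    """Render one DAG file per (customer, integration) and hand it to upload()."""
    written = []
    for customer in customers:
        for integration in customer.integrations:
            name = dag_object_name(customer.customer_id, integration)
            source = render(customer, integration)
            # Syntax check only: raises SyntaxError before this file is uploaded.
            compile(source, name, "exec")
            upload(name, source)
            written.append(name)
    return written


def fake_render(customer: Customer, integration: str) -> str:
    # Stand-in for the template composer; emits a trivial but valid module.
    dag_id = f"{customer.customer_id}_{integration}"
    return f"DAG_ID = {dag_id!r}\n"


bucket: dict[str, str] = {}  # in-memory stand-in for the Composer GCS bucket
customers = [Customer("d123", "America/Chicago", ("powerschool", "classroom"))]
written = generate_fleet(customers, fake_render, bucket.__setitem__)
```

Passing render and upload as callables keeps the loop testable without touching Spanner or GCS.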

YAML templates per integration type describe the task graph declaratively (imports, tasks, dependencies).
A composer class merges a template with customer-specific parameters (customer id, timezone, source, schedule) via recursive format substitution, producing Python DAG source.
Multiple DAG variants per customer: a main DAG, per-source cron DAGs for integrations that need their own schedule, and a legacy static-file path for a handful of historical customers.
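A minimal sketch of the recursive format substitution step, assuming the YAML template has already been parsed into nested dicts and lists. The template contents, field names, and the substitute function are illustrative, not the actual composer class; turning the resolved structure into Python DAG source is left out.

```python
from typing import Any


def substitute(node: Any, params: dict) -> Any:
    """Walk a parsed template and apply str.format to every string in it."""
    if isinstance(node, str):
        return node.format(**params)
    if isinstance(node, dict):
        return {substitute(k, params): substitute(v, params) for k, v in node.items()}
    if isinstance(node, list):
        return [substitute(item, params) for item in node]
    return node  # numbers, booleans, None pass through unchanged


# Hypothetical template for one integration type (what a parsed YAML file might look like).
TEMPLATE = {
    "dag_id": "{customer_id}_{source}_sync",
    "schedule": "{schedule}",
    "timezone": "{timezone}",
    "tasks": [
        {"id": "extract_{source}", "upstream": []},
        {"id": "load_{source}", "upstream": ["extract_{source}"]},
    ],
}

resolved = substitute(
    TEMPLATE,
    {
        "customer_id": "d123",
        "source": "powerschool",
        "schedule": "0 2 * * *",
        "timezone": "America/Chicago",
    },
)
```

With str.format, a missing parameter raises KeyError at generation time, and any literal braces in a template have to be doubled ({{ and }}).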

Why a meta-DAG

Config is the source of truth. Onboard a new district in Spanner and the next nightly run produces its DAGs. No code change, no deploy.
Per-customer task overrides. A district with a custom sync can patch or add tasks via a database field without a code change.
Pre-generation validation. Required fields and defaults are enforced at generation time, so bad config fails the meta-DAG rather than the downstream DAG at runtime.
Timezone spread. Customers in the same timezone stagger start times by a few minutes to avoid thundering-herd load on downstream vendor APIs.
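The override, validation, and stagger behaviors above could look roughly like the following. This is a sketch under assumptions: the required fields, default values, three-minute step, and all function names are made up for illustration, and the real generator may choose offsets differently.

```python
from collections import defaultdict

REQUIRED = ("customer_id", "timezone", "source")   # assumed required fields
DEFAULTS = {"schedule": "0 2 * * *", "retries": 2}  # assumed defaults


class ConfigError(ValueError):
    """Raised at generation time so the meta-DAG fails, not the generated DAG."""


def validate(raw: dict) -> dict:
    missing = [field for field in REQUIRED if not raw.get(field)]
    if missing:
        raise ConfigError(f"missing required fields: {', '.join(missing)}")
    # Explicit values win over defaults; None means "not set".
    return {**DEFAULTS, **{k: v for k, v in raw.items() if v is not None}}


def apply_overrides(tasks: dict, overrides: dict) -> dict:
    """Patch existing tasks by id and add any new ones, without mutating the input."""
    merged = {task_id: dict(spec) for task_id, spec in tasks.items()}
    for task_id, patch in overrides.items():
        merged.setdefault(task_id, {}).update(patch)
    return merged


def stagger_minutes(customers: list, step: int = 3) -> dict:
    """Give customers that share a timezone start offsets of 0, step, 2*step, ... minutes."""
    by_tz = defaultdict(list)
    for customer_id, tz in customers:
        by_tz[tz].append(customer_id)
    offsets = {}
    for ids in by_tz.values():
        # Sorting keeps the assignment stable from one nightly run to the next.
        for rank, customer_id in enumerate(sorted(ids)):
            offsets[customer_id] = (rank * step) % 60
    return offsets


config = validate({"customer_id": "d1", "timezone": "America/Chicago", "source": "powerschool"})
tasks = apply_overrides(
    {"extract": {"retries": 2}},
    {"extract": {"retries": 5}, "custom_export": {"upstream": ["extract"]}},
)
offsets = stagger_minutes(
    [("d2", "America/Chicago"), ("d1", "America/Chicago"), ("d3", "America/Denver")]
)
```

Sorting by customer id before assigning offsets is one way to keep start times from shifting between runs; wrapping at 60 minutes means offsets repeat once a timezone has more than 20 customers at a three-minute step.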

Hierarchical orgs

Customers can have child orgs with roll-up behavior: a parent district's warehouse is populated by queries that union IDs across all children. The generator knows about the parent-child relationships and produces a parent DAG that correctly includes child customers in aggregations.
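One way to compute the ID set for a parent's roll-up, assuming the relationships are available as a simple child-to-parent mapping. The mapping, the function name, and the choices to include the parent itself and to follow more than one level of nesting are assumptions for illustration.

```python
from collections import defaultdict
from typing import Optional


def rollup_ids(parent_of: dict[str, Optional[str]], root: str) -> list[str]:
    """Return root plus every descendant customer id, sorted for stable output."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        if parent is not None:
            children[parent].append(child)

    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # guards against accidental cycles in the metadata
        seen.add(node)
        stack.extend(children[node])
    return sorted(seen)


# Hypothetical org tree: one district with two schools, one of which has a campus.
parent_of = {
    "district": None,
    "school_a": "district",
    "school_b": "district",
    "campus_b1": "school_b",
    "other_district": None,
}
ids = rollup_ids(parent_of, "district")
```

The resulting list is what a parent DAG's aggregation query would filter or union on, for example as a parameter to a BigQuery IN UNNEST(...) clause.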

Stack

Cloud Composer 2 (managed Airflow 2.x), Python 3.11
Cloud Spanner for customer + integration metadata
BigQuery for the per-tenant data warehouse
Cloud Secret Manager for customer credentials (via Airflow's Secret Manager secrets backend in Composer)
Cloud Functions + Cloud Tasks operators for serverless integration calls
Terraform for the Composer environment itself