Schoolytics runs scheduled data syncs for every district: grades from PowerSchool overnight, assignment data from Google Classroom hourly, state assessment results weekly, and so on. Hand-writing one DAG per district × integration pair doesn't scale, and every district's timezone, schedule, and integration set is different. This is the meta-DAG that generates the fleet.
How it works
A single meta-DAG runs nightly. It queries Cloud Spanner for the current set of active customers and their integration configs, then emits per-customer DAG Python files directly to the Composer GCS bucket.
- YAML templates per integration type describe the task graph declaratively (imports, tasks, dependencies).
- A composer class merges a template with customer-specific parameters (customer id, timezone, source, schedule) via recursive format substitution, producing Python DAG source.
- Multiple DAG variants per customer: a main DAG, per-source cron DAGs for integrations that need their own schedule, and a legacy static-file path for a handful of historical customers.
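The merge step above can be sketched roughly as follows. This is a minimal illustration, not the actual composer class: the template is shown as the nested dict a YAML loader would return, and `TEMPLATE`, `substitute`, `render_dag_source`, the placeholder names, the `EmptyOperator` tasks, and the fixed start date are all assumptions made for the example. The emitted source assumes Airflow 2.4 or later (the `schedule` argument).

```python
from typing import Any

# Hypothetical integration template, as the dict a YAML loader would return.
TEMPLATE = {
    "dag_id": "{customer_id}_{source}_sync",
    "schedule": "{schedule}",
    "timezone": "{timezone}",
    "tasks": [
        {"id": "extract_{source}", "upstream": []},
        {"id": "load_{source}", "upstream": ["extract_{source}"]},
    ],
}


def substitute(node: Any, params: dict) -> Any:
    """Recursively apply str.format to every string in a nested structure."""
    if isinstance(node, str):
        return node.format(**params)
    if isinstance(node, dict):
        return {key: substitute(value, params) for key, value in node.items()}
    if isinstance(node, list):
        return [substitute(item, params) for item in node]
    return node  # numbers, booleans, None pass through unchanged


def render_dag_source(template: dict, params: dict) -> str:
    """Merge a template with customer parameters and emit DAG source text."""
    spec = substitute(template, params)
    lines = [
        "import pendulum",
        "from airflow import DAG",
        "from airflow.operators.empty import EmptyOperator",
        "",
        "with DAG(",
        f"    dag_id={spec['dag_id']!r},",
        f"    schedule={spec['schedule']!r},",
        # Placeholder start date; a real generator would take it from config.
        f"    start_date=pendulum.datetime(2024, 1, 1, tz={spec['timezone']!r}),",
        "    catchup=False,",
        ") as dag:",
        "    tasks = {}",
    ]
    # Declare every task first, then wire dependencies between them.
    for task in spec["tasks"]:
        lines.append(
            f"    tasks[{task['id']!r}] = EmptyOperator(task_id={task['id']!r})"
        )
    for task in spec["tasks"]:
        for upstream in task["upstream"]:
            lines.append(f"    tasks[{upstream!r}] >> tasks[{task['id']!r}]")
    return "\n".join(lines) + "\n"


params = {
    "customer_id": "district_042",
    "timezone": "America/Chicago",
    "source": "powerschool",
    "schedule": "0 2 * * *",
}
source = render_dag_source(TEMPLATE, params)
print(source)
```

One caveat with `str.format`-style substitution is that literal braces in a template string must be doubled (`{{` and `}}`), otherwise they are treated as placeholders.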
Why a meta-DAG
- Config is the source of truth. Onboard a new district in Spanner and the next nightly run produces its DAGs. No code change, no deploy.
- Per-customer task overrides. A district with a custom sync can patch or add tasks via a database field without a code change.
- Pre-generation validation. Required fields and defaults are enforced at generation time, so bad config fails the meta-DAG rather than the downstream DAG at runtime.
- Timezone spread. Customers in the same timezone stagger start times by a few minutes to avoid thundering-herd load on downstream vendor APIs.
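As a rough sketch of the validation and stagger points above: the required-field list, the defaults, the five-minute step, and the names `validate`, `staggered_schedules`, and `ConfigError` are all hypothetical, chosen only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical required fields, defaults, and stagger step.
REQUIRED_FIELDS = ("customer_id", "timezone", "source")
DEFAULTS = {"schedule_hour": 2, "retries": 3}
STAGGER_MINUTES = 5


class ConfigError(ValueError):
    """Raised at generation time so a bad config fails the meta-DAG."""


def validate(config: dict) -> dict:
    """Check required fields, then fill in defaults for anything unset."""
    missing = [field for field in REQUIRED_FIELDS if not config.get(field)]
    if missing:
        customer = config.get("customer_id", "<unknown>")
        raise ConfigError(f"{customer}: missing {', '.join(missing)}")
    return {**DEFAULTS, **config}


def staggered_schedules(configs: list) -> dict:
    """Offset each customer's cron start by its position within its timezone."""
    by_timezone = defaultdict(list)
    for config in configs:
        by_timezone[config["timezone"]].append(config)

    schedules = {}
    for group in by_timezone.values():
        # Sort by id so the result is deterministic for a given customer set.
        ordered = sorted(group, key=lambda c: c["customer_id"])
        for position, config in enumerate(ordered):
            offset = position * STAGGER_MINUTES
            hour = (config["schedule_hour"] + offset // 60) % 24
            schedules[config["customer_id"]] = f"{offset % 60} {hour} * * *"
    return schedules


configs = [
    validate({"customer_id": "b_district", "timezone": "America/Chicago", "source": "powerschool"}),
    validate({"customer_id": "a_district", "timezone": "America/Chicago", "source": "powerschool"}),
    validate({"customer_id": "c_district", "timezone": "America/New_York", "source": "powerschool"}),
]
schedules = staggered_schedules(configs)
print(schedules)
```

Here the two Chicago customers start five minutes apart, while the New York customer starts at the top of its own local hour, assuming each cron expression is interpreted in the customer's timezone.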
Hierarchical orgs
Customers can have child orgs with roll-up behavior: a parent district's warehouse is populated by queries that union IDs across all children. The generator knows about the parent-child relationships and produces a parent DAG that correctly includes child customers in aggregations.
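One way to resolve the set of child IDs a parent DAG needs is a simple graph walk. The sketch below is an assumption about shape, not the generator's actual code: the `PARENT_OF` mapping, the customer names, and the helpers `build_children` and `descendant_ids` are hypothetical, and it assumes nested children (grandchildren) should roll up as well.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical child -> parent mapping, as it might be read from metadata.
PARENT_OF: Dict[str, Optional[str]] = {
    "metro_district": None,
    "north_elementary": "metro_district",
    "south_high": "metro_district",
    "south_high_annex": "south_high",
    "solo_charter": None,
}


def build_children(parent_of: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Invert the child -> parent mapping into parent -> direct children."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        if parent is not None:
            children[parent].append(child)
    return dict(children)


def descendant_ids(customer_id: str, children: Dict[str, List[str]]) -> List[str]:
    """Return every direct and indirect child of a customer, sorted."""
    seen = set()
    stack = list(children.get(customer_id, []))
    while stack:
        current = stack.pop()
        # Skip repeats so a cyclic or duplicated config cannot loop forever.
        if current in seen or current == customer_id:
            continue
        seen.add(current)
        stack.extend(children.get(current, []))
    return sorted(seen)


children = build_children(PARENT_OF)
rollup = descendant_ids("metro_district", children)
print(rollup)
```

The resulting list could then be passed as a parameter to the parent's aggregation queries. Whether the parent's own ID belongs in the union is a separate choice that this sketch leaves out.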
Stack
- Cloud Composer 2 (managed Airflow 2.x), Python 3.14
- Cloud Spanner for customer + integration metadata
- BigQuery for the per-tenant data warehouse
- Cloud Secret Manager for customer credentials (via the Composer backend)
- Cloud Functions + Cloud Tasks operators for serverless integration calls
- Terraform for the Composer environment itself