What Airflow actually does
Airflow is a pipeline orchestration tool. It does not process data itself; that is the job of Spark, Pandas, or SQL queries. Airflow manages when those processing tasks run, in what order, and what happens when one fails, including whether and how it is retried. Think of it as the conductor of an orchestra: it does not play the instruments, but it coordinates everything so the output is coherent.
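To make "order plus retries" concrete, here is a toy sketch in plain Python. It is not Airflow code and not how Airflow is implemented; it only uses the standard library's graphlib module (Python 3.9+) to show the idea. The function run_pipeline and the task names are hypothetical, invented for this illustration.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies, max_retries=2):
    """Run tasks in dependency order, retrying each task that fails.

    tasks: maps task name -> zero-argument callable
    dependencies: maps task name -> set of task names that must finish first
    Returns the task names in the order they completed.
    """
    completed = []
    # static_order() yields a task only after all of its upstream tasks.
    for name in TopologicalSorter(dependencies).static_order():
        for attempt in range(1, max_retries + 2):  # first try plus retries
            try:
                tasks[name]()
                completed.append(name)
                break
            except Exception as exc:
                print(f"{name}: attempt {attempt} failed ({exc})")
        else:
            # Retries exhausted: stop so downstream tasks never run.
            raise RuntimeError(f"{name} failed after {max_retries} retries")
    return completed


# Hypothetical tasks; 'transform' fails once to exercise the retry path.
calls = {"transform": 0}


def extract():
    pass


def transform():
    calls["transform"] += 1
    if calls["transform"] == 1:
        raise ValueError("temporary failure")


def notify():
    pass


tasks = {"extract": extract, "transform": transform, "notify": notify}
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "notify": {"transform"},
}

order = run_pipeline(tasks, dependencies)
print(order)
```

A real orchestrator adds scheduling, parallel execution, logging, and a UI on top of this, but the core contract is the same: a task runs only after its upstream tasks succeed, and a failure is retried before the run is marked as failed.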
In practice, a data engineer writes a DAG (Directed Acyclic Graph), defined in a Python file that describes tasks and their dependencies. "Acyclic" means no task can depend on itself, directly or indirectly, so there is always a valid order in which to run everything. Airflow reads these DAGs and executes them on schedule. You might have a DAG that runs every morning: first extract data from three source databases, then load it into a staging area, then run transformations, then send a Slack notification when the pipeline is done.
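The morning pipeline described above might look roughly like this as a DAG file. Treat it as a sketch, assuming Airflow 2.4 or later (where the schedule argument is available) and using PythonOperator for every step. The DAG name, source names, 06:00 schedule, and retry settings are placeholders, and each function body is a stub that only prints; a real pipeline would call the databases and Slack here.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(source: str) -> None:
    # Stub: a real task would query the source database.
    print(f"extracting from {source}")


def load_to_staging() -> None:
    print("loading extracted data into staging")  # stub


def run_transformations() -> None:
    print("running transformations")  # stub


def notify_slack() -> None:
    print("pipeline done")  # stub: a real task would post to Slack


with DAG(
    dag_id="morning_pipeline",  # hypothetical name
    schedule="0 6 * * *",  # cron expression: every day at 06:00
    start_date=datetime(2024, 1, 1),
    catchup=False,  # do not backfill runs for past dates
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    # One extract task per (hypothetical) source database.
    extracts = [
        PythonOperator(
            task_id=f"extract_{source}",
            python_callable=extract,
            op_kwargs={"source": source},
        )
        for source in ("orders_db", "customers_db", "inventory_db")
    ]
    load = PythonOperator(task_id="load_to_staging", python_callable=load_to_staging)
    transform = PythonOperator(
        task_id="run_transformations", python_callable=run_transformations
    )
    notify = PythonOperator(task_id="notify_slack", python_callable=notify_slack)

    # All three extracts must succeed before the load; the rest run in sequence.
    extracts >> load >> transform >> notify
```

The >> operator is how the file declares dependencies: everything on the left must finish successfully before the task on the right starts. Because the three extract tasks do not depend on each other, Airflow is free to run them in parallel.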
Airflow alternatives you may see in job descriptions
Prefect and Dagster are newer alternatives that emphasize developer experience, including running and testing pipelines locally as ordinary Python code. Prefect in particular has gained adoption among startups. AWS offers a managed Airflow service, Amazon Managed Workflows for Apache Airflow (MWAA), and also AWS Step Functions as a non-Airflow alternative. Google Cloud has an equivalent pair: Cloud Composer (managed Airflow) and Workflows. Understanding Airflow gives you a conceptual foundation that makes all of these easier to learn.