Airflow Setup Guide for Beginners

The document outlines the steps to install and configure Apache Airflow: 1) installing Airflow with pip and initializing the metadata database, 2) creating an admin user, 3) setting the DAGs folder location and disabling the example DAGs, and 4) starting the Airflow webserver and scheduler processes.

Uploaded by HEMANTH REDDY
Copyright © All Rights Reserved

@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021 (02-02) $

export AIRFLOW_HOME="/workspaces/hands-on-introduction-data-engineering-4395021/airflow" && \
pip install "apache-airflow==2.5.1" \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.5.1/constraints-3.7.txt"
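A constraints file pins every dependency for one Airflow version and one Python minor version, and its URL encodes both. The command above hard-codes Python 3.7. If the Codespace runs a different Python, the matching file is needed, and constraints files exist only for the Python versions a given Airflow release supports. The helper below builds the URL from both versions. It is a minimal sketch, and the function name `airflow_constraint_url` is hypothetical.

```shell
# Hypothetical helper: build the Airflow constraints URL for a given
# Airflow version and Python minor version (for example 2.5.1 and 3.7).
airflow_constraint_url() {
  local airflow_version="$1"
  local python_version="$2"
  if [ -z "$airflow_version" ] || [ -z "$python_version" ]; then
    echo "usage: airflow_constraint_url AIRFLOW_VERSION PYTHON_VERSION" >&2
    return 1
  fi
  echo "https://raw.githubusercontent.com/apache/airflow/constraints-${airflow_version}/constraints-${python_version}.txt"
}
```

For example, on a Python 3.10 interpreter (check with `python3 --version`), run `pip install "apache-airflow==2.5.1" --constraint "$(airflow_constraint_url 2.5.1 3.10)"`.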

@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021 (02-02) $


airflow  confirms the CLI is installed and on the PATH (it prints usage information)

@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021 (02-02) $


airflow db init  initializes the metadata database (by default a SQLite file, airflow.db, in $AIRFLOW_HOME)

@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021 (02-02) $


airflow users create \
--username admin \
--firstname Firstname \
--lastname Lastname \
--role Admin \
--email admin@example.org \
--password password

 creates an admin user (username admin) for logging in to the web UI

@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021 (02-02) $


cd airflow/
@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021/airflow
(02-02) $ ls

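Open webserver_config.py (in the Airflow home directory) and set WTF_CSRF_ENABLED = False. This turns off CSRF protection for the web UI's forms. That is acceptable only in a disposable development environment such as this Codespace, never in production.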
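The same edit can be scripted. This sketch assumes two things: the generated webserver_config.py contains the line `WTF_CSRF_ENABLED = True` at the start of a line (as in Airflow 2's default template), and GNU sed is available, as it is in a Linux Codespace. The function name is hypothetical.

```shell
# Hypothetical helper: flip WTF_CSRF_ENABLED from True to False in a
# webserver_config.py. Development use only: it disables CSRF protection.
disable_airflow_csrf() {
  local config_file="$1"
  if ! grep -q '^WTF_CSRF_ENABLED *= *True' "$config_file"; then
    echo "WTF_CSRF_ENABLED = True not found in $config_file" >&2
    return 1
  fi
  # GNU sed in-place edit; BSD/macOS sed would need: sed -i '' ...
  sed -i 's/^WTF_CSRF_ENABLED *= *True/WTF_CSRF_ENABLED = False/' "$config_file"
}
```

Usage: `disable_airflow_csrf "$AIRFLOW_HOME/webserver_config.py"`. Then restart the webserver so it reads the file again.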

The Airflow webserver is a web-based UI that's commonly used in production to provide an overview of all DAGs and their execution flow. It offers several ways to manage administrative settings, connections, variables, and other components of Airflow through an easy-to-use web interface.

The Airflow scheduler is a process that continually monitors all tasks and DAGs in Airflow. It starts subprocesses that keep track of the heartbeat of all DAGs and checks whether any active tasks can be triggered. The webserver can run without the scheduler, but this is not recommended: without the scheduler, no DAG runs or tasks are triggered. Now let's switch back to Codespaces and see how to run both the Airflow webserver and the scheduler.

@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021/airflow
(02-02) $ airflow webserver -D

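Codespaces generates a forwarded port for the webserver (port 8080 by default), and the UI opens in the browser through that port.
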
(cd ..)

@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021/airflow
(02-02) $ airflow dags list

@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021/airflow
(02-02) $ airflow scheduler -D
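With -D, each process runs in the background and records its PID in a pid file under $AIRFLOW_HOME. These are airflow-webserver.pid and airflow-scheduler.pid, the same files used by the stop commands that follow. The sketch below checks whether the process recorded in a pid file is still alive. It uses `kill -0`, which sends no signal and only tests that the PID exists and can be signaled. The function name is hypothetical.

```shell
# Hypothetical helper: succeed if the PID stored in the given pid file
# belongs to a live process that the current user may signal.
is_airflow_daemon_running() {
  local pidfile="$1"
  local pid
  [ -f "$pidfile" ] || return 1
  # Strip whitespace; a file emptied with echo "" holds only a newline.
  pid="$(tr -d '[:space:]' < "$pidfile")"
  [ -n "$pid" ] || return 1
  # kill -0 delivers no signal; it only checks the PID can be signaled.
  kill -0 "$pid" 2>/dev/null
}
```

Usage: `is_airflow_daemon_running "$AIRFLOW_HOME/airflow-scheduler.pid" && echo "scheduler is up"`. If the PID has since been reused by an unrelated process, this check gives a false positive, so treat it as a quick check only.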

@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021/airflow
(02-02) $ cat $AIRFLOW_HOME/airflow-webserver.pid | xargs kill
@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021/airflow
(02-02) $ echo "" > $AIRFLOW_HOME/airflow-webserver.pid
@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021/airflow
(02-02) $ cat $AIRFLOW_HOME/airflow-scheduler.pid | xargs kill
@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021/airflow
(02-02) $ echo "" > $AIRFLOW_HOME/airflow-scheduler.pid
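The kill-then-blank steps above can be wrapped in one function and applied to both pid files. In this sketch (the function name is hypothetical), the function skips `kill` when the pid file is missing or already blank, sends the default SIGTERM like `xargs kill`, and then blanks the file with `echo ""`, the same as the manual steps.

```shell
# Hypothetical helper: stop the daemon whose PID is stored in a pid file,
# then blank the file, mirroring the manual kill/echo steps.
stop_airflow_daemon() {
  local pidfile="$1"
  local pid=""
  if [ -f "$pidfile" ]; then
    pid="$(tr -d '[:space:]' < "$pidfile")"
  fi
  if [ -n "$pid" ]; then
    # Default signal is SIGTERM, as with `xargs kill`.
    kill "$pid" 2>/dev/null || echo "no running process with PID $pid" >&2
  fi
  echo "" > "$pidfile"
}
```

Usage: `for name in webserver scheduler; do stop_airflow_daemon "$AIRFLOW_HOME/airflow-$name.pid"; done`.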

The first time an airflow command runs, Airflow creates an airflow.cfg file in the Airflow home directory. To see where that directory is, run echo $AIRFLOW_HOME.

@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021/airflow
(main) $ echo $AIRFLOW_HOME

I start by checking whether any Airflow environment variables are set, because environment variables can override the settings in airflow.cfg. In this case, only AIRFLOW_HOME is set, so the values in airflow.cfg apply.

@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021/airflow
(main) $ env | grep -i airflow
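For reference, the environment variable that overrides a given airflow.cfg key is named AIRFLOW__{SECTION}__{KEY}: upper case, with double underscores. The sketch below builds that name for a section and key and reports whether it is set in the environment that Airflow would inherit. Both function names are hypothetical.

```shell
# Hypothetical helper: print the environment variable name that overrides
# an airflow.cfg section/key, e.g. core load_examples -> AIRFLOW__CORE__LOAD_EXAMPLES.
airflow_env_var_name() {
  local section="$1" key="$2"
  printf 'AIRFLOW__%s__%s\n' "$section" "$key" | tr '[:lower:]' '[:upper:]'
}

# Hypothetical helper: report whether that override is exported.
show_airflow_override() {
  local name
  name="$(airflow_env_var_name "$1" "$2")"
  if printenv "$name" > /dev/null; then
    echo "$name is set to '$(printenv "$name")' and overrides airflow.cfg"
  else
    echo "$name is not set; airflow.cfg applies"
  fi
}
```

Usage: `show_airflow_override core dags_folder`.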

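In airflow.cfg, the dags_folder setting in the [core] section tells Airflow where to look for DAG files:

# The folder where your airflow pipelines live, most likely a
# subfolder in a code repository. This path must be absolute.
dags_folder = /workspaces/hands-on-introduction-data-engineering-4395021/airflow/dags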
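Because dags_folder must be an absolute path, a quick check before pointing Airflow at a new folder can catch a relative path early. This minimal sketch (the function name is hypothetical) rejects relative paths. It also creates the folder with mkdir -p in case the folder does not exist yet.

```shell
# Hypothetical helper: reject relative paths and make sure the DAGs folder exists.
ensure_dags_folder() {
  local dags_folder="$1"
  case "$dags_folder" in
    /*) ;;  # absolute path: OK
    *)
      echo "dags_folder must be an absolute path: $dags_folder" >&2
      return 1
      ;;
  esac
  mkdir -p "$dags_folder"
}
```

Usage: `ensure_dags_folder "$AIRFLOW_HOME/dags"`.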

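To stop Airflow from loading the bundled example DAGs, set load_examples = False in the [core] section of airflow.cfg. Then restart the webserver and scheduler so they pick up the change.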
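Instead of editing airflow.cfg by hand, you can rewrite a key with sed. This sketch makes several assumptions:

- The key appears once, at the start of a line, in `key = value` form, as in the generated file.
- Section headers are ignored.
- GNU sed is available.
- The value contains no `|`, `&` or backslash. The `|` character serves as the sed delimiter so that paths with slashes work.

The helper name is hypothetical. Exporting AIRFLOW__CORE__LOAD_EXAMPLES=False has the same effect without touching the file.

```shell
# Hypothetical helper: set "key = value" for an existing key in airflow.cfg.
# Ignores [section] headers, so it is only safe for keys that appear once.
set_airflow_cfg_option() {
  local cfg_file="$1" key="$2" value="$3"
  if ! grep -q "^${key} *=" "$cfg_file"; then
    echo "key '$key' not found in $cfg_file" >&2
    return 1
  fi
  # '|' is the sed delimiter so that values may contain '/' (paths).
  sed -i "s|^${key} *=.*|${key} = ${value}|" "$cfg_file"
}
```

Usage: `set_airflow_cfg_option "$AIRFLOW_HOME/airflow.cfg" load_examples False`, followed by a restart of the webserver and scheduler.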

@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021/airflow
(main) $ cat airflow.cfg | grep "dags_folder"
