induction-airflow

Induction to Airflow

Apache Airflow Setup on Proxmox LXC Containers

Vršič Pass, Soča, Slovenia, by Miha Rekar on Unsplash

Overview

This README is part of a broader tutorial on workflow engines. It details how to set up Apache Airflow in an LXC container on a Proxmox host, secured by an SSH gateway that also acts as an Nginx-based reverse proxy.

For the installation of the Proxmox host and of the LXC containers themselves, refer to the dedicated tutorial on GitHub, which is itself a full tutorial on Kubernetes (k8s). Only a summary is given here, focusing on Apache Airflow.

References

Installation

LXC Containers on Proxmox

Apache Airflow

Host preparation

This section assumes that you are logged in to the Proxmox host as root.

The following parameters are used in the remainder of the guide, and may be adapted to your configuration:

| VM ID | Private IP  | Host name (full)     | Short name |
|-------|-------------|----------------------|------------|
| 104   | 10.30.2.4   | proxy8.example.com   | proxy8     |
| 200   | 10.30.2.200 | arfl-int.example.com | arfl-int   |
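
As a convenience, the parameters above could be kept in shell variables so that later commands do not hard-code them. The following is only a sketch: the variable names (`PROXY_VMID`, `AIRFLOW_IP`, and so on) and the `hosts_line` helper are hypothetical and are not used elsewhere in this guide.

```bash
#!/usr/bin/env bash
# Hypothetical variable names; the values come from the parameter table.
PROXY_VMID=104
PROXY_IP=10.30.2.4
PROXY_FQDN=proxy8.example.com
PROXY_NAME=proxy8

AIRFLOW_VMID=200
AIRFLOW_IP=10.30.2.200
AIRFLOW_FQDN=arfl-int.example.com
AIRFLOW_NAME=arfl-int

# Print an /etc/hosts-style entry: IP address, full host name, short name.
hosts_line() {
  printf '%s %s %s\n' "$1" "$2" "$3"
}

hosts_line "$PROXY_IP" "$PROXY_FQDN" "$PROXY_NAME"
hosts_line "$AIRFLOW_IP" "$AIRFLOW_FQDN" "$AIRFLOW_NAME"
```

Adapting the guide to another addressing plan would then only require changing the assignments at the top.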

    # The loopback network interface
    auto lo
    iface lo inet loopback

    auto eno1
    iface eno1 inet manual

    auto eno2
    iface eno2 inet manual

    auto bond0
    iface bond0 inet manual
        bond-slaves eno1 eno2
        bond-miimon 100
        bond-mode active-backup

    # vmbr0: Bridging. Make sure to use only MAC adresses that were assigned to you.
    auto vmbr0
    iface vmbr0 inet static
        address ${HST_IP}
        netmask 255.255.255.0
        gateway ${HST_GTW_IP}
        bridge_ports bond0
        bridge_stp off
        bridge_fd 0

    auto vmbr2
    iface vmbr2 inet static
        address 10.30.2.2
        netmask 255.255.255.0
        bridge-ports none
        bridge-stp off
        bridge-fd 0
        post-up echo 1 > /proc/sys/net/ipv4/ip_forward
        post-up iptables -t nat -A POSTROUTING -s '10.30.2.0/24' -o vmbr0 -j MASQUERADE
        post-down iptables -t nat -D POSTROUTING -s '10.30.2.0/24' -o vmbr0 -j MASQUERADE
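
The `vmbr2` stanza relies on IP forwarding being enabled so that the private 10.30.2.0/24 network can reach the outside through `vmbr0`. A small check such as the following may help confirm this after a reboot. It is a sketch only: the `forwarding_enabled` function name is an assumption made for illustration, and it merely reads the same `/proc` file that the `post-up` line writes to.

```bash
#!/usr/bin/env bash
# Return success (0) when the given file is readable and holds the value 1.
# Defaults to the kernel flag written by the post-up line of vmbr2.
forwarding_enabled() {
  local flag_file="${1:-/proc/sys/net/ipv4/ip_forward}"
  [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}

if forwarding_enabled; then
  echo "IPv4 forwarding is enabled"
else
  echo "IPv4 forwarding is disabled"
fi
```

The matching NAT rule can be listed on the host with `iptables -t nat -S POSTROUTING`.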

    root@proxmox:~$ cat /etc/systemd/network/50-default.network
    # This file sets the IP configuration of the primary (public) network device.
    # You can also see this as "OSI Layer 3" config.
    # It was created by the OVH installer, please be careful with modifications.
    # Documentation: man systemd.network or https://www.freedesktop.org/software/systemd/man/systemd.network.html

    [Match]
    Name=vmbr0

    [Network]
    Description=network interface on public network, with default route
    DHCP=no
    Address=${HST_IP}/24
    Gateway=${HST_GTW_IP}
    IPv6AcceptRA=no
    NTP=ntp.ovh.net
    DNS=127.0.0.1
    DNS=8.8.8.8

    [Address]
    Address=${HST_IPv6}

    [Route]
    Destination=2001:0000:0000:34ff:ff:ff:ff:ff
    Scope=link

    root@proxmox:~$ cat /etc/systemd/network/50-public-interface.link
    # This file configures the relation between network device and device name.
    # You can also see this as "OSI Layer 2" config.
    # It was created by the OVH installer, please be careful with modifications.
    # Documentation: man systemd.link or https://www.freedesktop.org/software/systemd/man/systemd.link.html

    [Match]
    Name=vmbr0

    [Link]
    Description=network interface on public network, with default route
    MACAddressPolicy=persistent
    NamePolicy=kernel database onboard slot path mac
    #Name=eth0    # name under which this interface is known under OVH rescue system
    #Name=eno1    # name under which this interface is probably known by systemd


* The maximum number of memory map areas per process (`vm.max_map_count`) needs to be increased on the host:
```bash
$ sysctl -w vm.max_map_count=262144
$ cat >> /etc/sysctl.conf << _EOF

###########################
# Elasticsearch in VM
vm.max_map_count = 262144

_EOF
```

Get the latest CentOS templates

Kernel modules

Overlay module

root@proxmox:~$ modprobe overlay && \
  cat > /etc/modules-load.d/docker-overlay.conf << _EOF
overlay
_EOF

nf_conntrack

SSH gateway and reverse proxy

    ip addr add ${HST_GTW_IP}/5 dev eth0
    ip link set eth0 up
    ip route add default via ${GTW_IP} dev eth0

    ip addr add 10.30.2.4/24 dev eth1
    ip link set eth1 up

    _EOF
    [root@proxy8]# chmod 755 ~/bin/netup.sh
    [root@proxy8]# ~/bin/netup.sh
    [root@proxy8]# dnf -y upgrade
    [root@proxy8]# dnf -y install epel-release
    [root@proxy8]# dnf -y install NetworkManager-tui
    [root@proxy8]# systemctl start NetworkManager.service \
    	&& systemctl status NetworkManager.service \
    	&& systemctl enable NetworkManager.service
    [root@proxy8]# nmcli con # to check the name of the connection
    [root@proxy8]# nmcli con up "System eth0"
    [root@proxy8]# exit


* Complement the installation on the SSH gateway/reverse proxy container.
  For security reasons, it may be a good idea to change the SSH port
  from `22` to, say, `7022`:
```bash
root@proxmox:~$ pct enter 104
[root@proxy8]# dnf -y install hostname rpmconf dnf-utils wget curl net-tools tar
[root@proxy8]# hostnamectl set-hostname proxy8.example.com
[root@proxy8]# dnf -y install htop less screen bzip2 dos2unix man man-pages
[root@proxy8]# dnf -y install sudo whois ftp rsync vim git-all patch mutt
[root@proxy8]# dnf -y install java-11-openjdk-headless
[root@proxy8]# dnf -y install nginx python3-pip
[root@proxy8]# pip-3 install certbot-nginx
[root@proxy8]# rpmconf -a
[root@proxy8]# ln -sf /usr/share/zoneinfo/Europe/Paris /etc/localtime
[root@proxy8]# setenforce 0
[root@proxy8]# dnf -y install openssh-server
[root@proxy8]# sed -i -e 's/#Port 22/Port 7022/g' /etc/ssh/sshd_config
[root@proxy8]# systemctl start sshd.service \
	&& systemctl status sshd.service \
	&& systemctl enable sshd.service
[root@proxy8]# mkdir ~/.ssh && chmod 700 ~/.ssh
[root@proxy8]# cat > ~/.ssh/authorized_keys << _EOF
ssh-rsa AAAA<Add-Your-own-SSH-public-key>BLAgU first.last@example.com
_EOF
[root@proxy8]# chmod 600 ~/.ssh/authorized_keys
[root@proxy8]# passwd -d root
[root@proxy8]# rpm --import http://wiki.psychotic.ninja/RPM-GPG-KEY-psychotic
[root@proxy8]# rpm -ivh http://packages.psychotic.ninja/7/base/x86_64/RPMS/keychain-2.8.0-3.el7.psychotic.noarch.rpm
```

    # ES cluster
    10.30.2.200 arfl-int.example.com arfl-int

    _EOF


* A few handy aliases:
```bash
root@proxy8:~# cat >> ~/.bashrc << _EOF

# Source aliases
if [ -f ~/.bash_aliases ]
then
        . ~/.bash_aliases
fi

_EOF
root@proxy8:~# cat > ~/.bash_aliases << _EOF
# User specific aliases and functions
alias dir='ls -laFh --color'
alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'

_EOF
root@proxy8:~# . ~/.bashrc
root@proxy8:~# exit
```

    _EOF
    [root@proxy8]# htpasswd -c /etc/nginx/.airflow-user
    New password:
    Re-type new password:
    Adding password for user
    [root@proxy8]# /usr/local/bin/certbot --nginx
    [root@proxy8]# nginx -t
    [root@proxy8]# nginx -s reload
    [root@proxy8]# exit


# Airflow node
* Create the LXC container:
```bash
root@proxmox:~$ pct create 200 local:vztmpl/centos-8-default_20191016_amd64.tar.xz --arch amd64 --cores 2 --hostname arfl-int.example.com --memory 16134 --swap 32268 --net0 name=eth0,bridge=vmbr2,gw=10.30.2.2,ip=10.30.2.200/24,type=veth --onboot 1 --ostype centos
root@proxmox:~$ pct resize 200 rootfs 50G
root@proxmox:~$ ls -laFh /var/lib/vz/images/200/vm-200-disk-0.raw
-rw-r----- 1 root root 50G Dec 19 22:27 /var/lib/vz/images/200/vm-200-disk-0.raw
root@proxmox:~$ cat /etc/pve/lxc/200.conf
arch: amd64
cores: 2
hostname: arfl-int.example.com
memory: 16134
net0: name=eth0,bridge=vmbr2,gw=10.30.2.2,hwaddr=1A:EC:7F:9E:90:34,ip=10.30.2.200/24,type=veth
onboot: 1
ostype: centos
rootfs: local:200/vm-200-disk-0.raw,size=50G
swap: 32268
```

    ip addr add 10.30.2.200/24 dev eth0
    ip link set eth0 up
    ip route add default via 10.30.2.2 dev eth0

    _EOF
    [root@arfl-int]# chmod 755 ~/bin/netup.sh
    [root@arfl-int]# ~/bin/netup.sh # may not be needed
    [root@arfl-int]# dnf -y upgrade
    [root@arfl-int]# dnf -y install epel-release
    [root@arfl-int]# dnf -y install NetworkManager-tui
    [root@arfl-int]# systemctl start NetworkManager.service \
    	&& systemctl status NetworkManager.service \
    	&& systemctl enable NetworkManager.service
    [root@arfl-int]# nmcli con # to check the name of the connection
    [root@arfl-int]# nmcli con up "System eth0"
    [root@arfl-int]# exit


* Complement the installation:
```bash
root@proxmox:~$ pct enter 200
[root@arfl-int]# dnf -y install hostname rpmconf dnf-utils wget curl net-tools tar
[root@arfl-int]# hostnamectl set-hostname arfl-int.example.com
[root@arfl-int]# dnf -y install htop less screen bzip2 dos2unix man man-pages
[root@arfl-int]# dnf -y install sudo ftp rsync vim git-all patch mutt
[root@arfl-int]# dnf -y install autoconf libtool make gcc gcc-c++ m4
[root@arfl-int]# dnf -y install python2-devel python3-devel
[root@arfl-int]# dnf -y install postgresql
[root@arfl-int]# dnf -y install python3-pip
[root@arfl-int]# rpmconf -a
[root@arfl-int]# ln -sf /usr/share/zoneinfo/Europe/Paris /etc/localtime
[root@arfl-int]# setenforce 0
[root@arfl-int]# dnf -y install openssh-server
[root@arfl-int]# systemctl start sshd.service \
	&& systemctl status sshd.service \
	&& systemctl enable sshd.service
[root@arfl-int]# mkdir ~/.ssh && chmod 700 ~/.ssh
[root@arfl-int]# cat > ~/.ssh/authorized_keys << _EOF
ssh-rsa AAAA<Add-Your-own-SSH-public-key>BLAgU first.last@example.com
_EOF
[root@arfl-int]# chmod 600 ~/.ssh/authorized_keys
[root@arfl-int]# passwd -d root
[root@arfl-int]# rpm --import http://wiki.psychotic.ninja/RPM-GPG-KEY-psychotic
[root@arfl-int]# rpm -ivh http://packages.psychotic.ninja/7/base/x86_64/RPMS/keychain-2.8.0-3.el7.psychotic.noarch.rpm
[root@arfl-int]# cat > ~/.screenrc << _EOF
hardstatus alwayslastline "%{.kW}%-w%{.B}%n %t%{-}%{=b kw}%?%+w%? %=%c %d/%m/%Y" #B&W & date&time
startup_message off
defscrollback 1024
_EOF
[root@arfl-int]# exit
```

PostgreSQL database server

Docker

SystemD services for Airflow