Rebuilding the Homestead’s DNS with Consul, DNSMasq, and Ansible

August 29, 2018 at 10:21 pm

My friend Jason recently posted an update on his blog over at Peaks and Protocols about redoing his home network’s DNS setup. This reminded me that I really needed to write up my own recent DNS rebuild, which is based around HashiCorp’s Consul, DNSMasq, and Ansible running on some Raspberry Pi 3s. Overkill? Probably. But if you can’t have fun with your home network, what’s the point? On to the setup…

House Network Diagram

Consul

Consul Logo

Consul started life as a distributed service locator and key-value store. It has grown significantly over the years and is now becoming a full-fledged service mesh. Any server can register and provide one or more services, using simple config files or API calls. Further, Consul natively supports multiple locations and has built-in health checks. This means it will give you your local, healthy service endpoint.

One of the main reasons I chose Consul is that it makes itself available via DNS as the .consul domain. Want to know where your git server is? dig git.service.consul. Your documentation hosted on a webserver somewhere? dig docs.service.consul. This makes finding a service you have running somewhere trivial, and means never having to update a DNS zone file again.
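To make the naming scheme concrete, here is a small sketch of how those lookup names are put together. Consul's standard service lookup takes the form [tag.]<service>.service[.datacenter].<domain>. The helper name consul_service_name and the datacenter name dc1 are assumptions for illustration only, not part of my setup.

```python
def consul_service_name(service, tag=None, datacenter=None, domain="consul"):
    """Build the DNS name Consul answers for a registered service.

    Standard lookup format: [tag.]<service>.service[.datacenter].<domain>
    """
    labels = []
    if tag:
        labels.append(tag)          # optional tag filter, e.g. "udp"
    labels += [service, "service"]
    if datacenter:
        labels.append(datacenter)   # optional; omitted means the local datacenter
    labels.append(domain)
    return ".".join(labels)


if __name__ == "__main__":
    print(consul_service_name("git"))    # git.service.consul
    print(consul_service_name("docs"))   # docs.service.consul
    # "dc1" is a hypothetical datacenter name used only for this example
    print(consul_service_name("dns", tag="udp", datacenter="dc1"))
```

Any of these names can then be handed to dig or to an ordinary resolver call, as long as queries for the .consul domain reach a Consul agent.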

Another reason, which I’m not using yet, is that it has a solid key-value store. This is great for storing configuration settings for distributed applications. There are a ton of tools that take advantage of this, and even provide dynamic reloading capabilities to the app when a key is changed in Consul.

DNSMasq

In order to take advantage of Consul’s DNS features you need a DNS server that can point to Consul for just that domain, while passing through all other traffic to a normal DNS resolver. I chose DNSMasq for this because it is simple and well understood. There were some security issues with it last year, but they have since been addressed. I may migrate to unbound in the long run, but DNSMasq is fine for my use cases.
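The forwarding decision itself is simple enough to sketch in a few lines. This is only an illustration of the idea, not how DNSMasq is implemented: names under .consul go to the local Consul agent, and everything else goes to the normal resolvers. The addresses mirror the DNSMasq settings shown later in this post; the function and variable names are made up for the example.

```python
# Per-domain forwarding rules: queries under these domains go to a specific upstream.
DOMAIN_UPSTREAMS = {"consul": ("127.0.0.1", 8600)}

# Everything else goes to the normal public resolvers.
DEFAULT_UPSTREAMS = [("8.8.8.8", 53), ("8.8.4.4", 53)]


def pick_upstreams(query_name):
    """Return the list of (address, port) upstreams for a query name."""
    labels = query_name.rstrip(".").lower().split(".")
    # Check suffixes from longest to shortest so the most specific rule wins.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in DOMAIN_UPSTREAMS:
            return [DOMAIN_UPSTREAMS[suffix]]
    return DEFAULT_UPSTREAMS


if __name__ == "__main__":
    print(pick_upstreams("git.service.consul"))  # local Consul agent on port 8600
    print(pick_upstreams("www.example.com"))     # public resolvers
```

Matching on whole labels rather than raw string suffixes means a name such as notconsul is not mistaken for the consul domain.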

Ansible & Putting it All Together

Ansible Logo

Ansible is the glue that makes sure I can redo this config easily should something happen to the Pis. It is a configuration management system that just works, with minimal extra craziness. I could go on for days about Ansible, and probably should write a dozen posts on it alone, but there’s so much out there already that I don’t feel the need. Bottom line is, this is the tool that sets up Consul and DNSMasq for me, and ensures that I can reset everything to a known working state in the event of configuration drift.

I used several modules to help get this project running quickly.

I ended up having to change some of the roles around to suit the Raspberry Pi environment, but otherwise it was fairly easy. I created my own baseline role, which updates and upgrades the system and installs some packages, including Python and its tools. This base role also creates user accounts for me and for Ansible itself. The first time I ran it, I had to pass parameters to log in as the default Raspbian user, but after that it can run using the Ansible user instead.

- name: Update Apt and Upgrade Packages
  apt:
    update_cache: yes
    cache_valid_time: 3600
    name: "*"
    state: latest
  tags:
    - packages

- name: Install Baseline Apps
  apt:
    name: "{{ packages }}"
    state: present
  vars:
    packages:
    - python
    - python-pip
    - python3
    - python3-pip
    - virtualenv
    - python3-virtualenv
    - dnsutils
  tags:
    - packages

- name: Install pi base python packages
  pip:
    name: "{{ packages }}"
    state: present
  vars:
    packages:
    - python-consul
    - hvac

- name: Create Ansible management user
  user:
    name: ansible
    comment: Ansible system user
    group: admin
    state: present

- name: Create dmurawsky user
  user:
    name: dmurawsky
    comment: Derek Murawsky
    group: admin
    state: present

For my group_vars, I created a DNS.yml file with the needed variables for Consul and DNSMasq.

---
# Consul Configuration
consul_version: 1.2.2
#consul_package: consul_1.2.2_linux_arm.zip
consul_server: true
consul_agent: true
consul_ui: true

consul_server_nodes:
  - 192.168.1.2
  - 192.168.1.3

# Services #
consul_agent_services: true
consul_services_register:
# Register NTP in consul
  - name: ntp
    port: 123
    tags:
      - udp
  - name: dns
    port: 53
    tags:
      - udp

# Hashicorp Vault
vault_version: 0.10.4
vault_pkg: vault_{{ vault_version }}_linux_arm.zip
vault_pkg_sum: 384e47720cdc72317d3b8c98d58e6c8c719ff3aaeeb71b147a6f5f7a529ca21b

# DNSMasq
dnsmasq_dnsmasq_conf:
  - |
    port=53
    bind-interfaces
    server=8.8.8.8
    server=8.8.4.4

dnsmasq_dnsmasq_d_files_present:
  cache:
    - |
      domain-needed
      bogus-priv
      no-hosts
      dns-forward-max=150
      cache-size=1000
      neg-ttl=3600
      no-poll
      no-resolv
  consul:
    - |
      server=/consul/127.0.0.1#8600
  homestead-murawsky-net:
    - address=/usg.homestead.murawsky.net/192.168.1.1
    - address=/ns1.homestead.murawsky.net/192.168.1.2
    - address=/ns2.homestead.murawsky.net/192.168.1.3

# NTP
ntp_enabled: true
ntp_manage_config: true
ntp_area: 'us'
ntp_servers:
  - "0.{{ ntp_area }}.pool.ntp.org iburst"
  - "1.{{ ntp_area }}.pool.ntp.org iburst"
  - "2.{{ ntp_area }}.pool.ntp.org iburst"
  - "3.{{ ntp_area }}.pool.ntp.org iburst"
ntp_timezone: America/New_York

And finally, the simple site.yml file.

---
- name: Configure System Baselines
  hosts: all
  roles:
    - { role: baseline, tags: ['baseline']}

- name: Configure DNS hosts
  hosts: dns
  roles:
    - { role: ntp, tags: ['ntp'] }
    - { role: dnsmasq, tags: ['dnsmasq'] }
    - { role: consul, tags: ['consul'] }
    - { role: hashivault, tags: ['hashivault'] }
...

Results

DNS resolution worked perfectly out of the gate as expected, but what about Consul?

Consul Dashboard

Brilliant! Sure, the services that I have loaded are pretty simple and don’t really benefit from a service locator, but they’re examples of what is possible. Now I can register any new service by loading the Consul agent onto the server and simply adding a definition file in the appropriate folder! This should make future expansion of services much easier.
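For reference, a service definition file is just a small JSON document with a top-level "service" key. The sketch below generates one. The git service, port 22, and the ssh tag are hypothetical values for illustration, and the file would go in whatever config directory your Consul agent is set to load.

```python
import json


def service_definition(name, port, tags=None):
    """Return a Consul service definition as a plain dict."""
    service = {"name": name, "port": port}
    if tags:
        service["tags"] = list(tags)
    return {"service": service}


if __name__ == "__main__":
    # Hypothetical example service; adjust name, port, and tags to suit.
    definition = service_definition("git", 22, tags=["ssh"])
    print(json.dumps(definition, indent=2))
```

Saving that output as, say, git.json in the agent's config directory and reloading the agent should be enough for the service to show up under git.service.consul.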

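Note: I currently have only two Consul servers, which is not highly available. A two-server cluster needs both members for quorum, so losing either one stops the cluster from functioning. I need to get a third Consul server online, and I’m debating between another Pi and putting it on the home server.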
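The arithmetic behind that is worth spelling out. Consul servers need a majority (quorum) of the cluster to agree, so the number of failures a cluster can survive is the cluster size minus the quorum size. A quick sketch, with function names of my own choosing:

```python
def quorum(servers):
    """Smallest majority of a cluster with the given number of servers."""
    return servers // 2 + 1


def failure_tolerance(servers):
    """How many servers can be lost while a majority still remains."""
    return servers - quorum(servers)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} servers: quorum {quorum(n)}, tolerates {failure_tolerance(n)} failure(s)")
```

Two servers tolerate zero failures, the same as one, while three servers tolerate one. That is why the third server matters so much more than the second.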

Future Plans

You’ll notice there’s no real security around the deployment above either. That needs to be fixed in terms of Consul ACLs, Vault, and password/key management for user accounts. There’s also a cool tool called Pi-hole, a DNS-level ad blocker, that I want to integrate into my environment. I also plan on setting up Docker on my home server in the not too distant future to make it easier to host some fun services like Prometheus, Grafana, HomeAssistant, and some other cool tools. I’ll also have to extend the network to my barn, as the office is moving out there. Lastly, I want to build a portable lab that I can take with me when doing demos or presentations at local user groups.

How to clear all Workstation DNS caches from PowerShell

September 4, 2014 at 2:32 pm

I recently found myself in need of the ability to clear the DNS cache of all the laptops in my company. I found a very powerful and simple way to do so and thought I would share.

$c = Get-ADComputer -Filter { OperatingSystem -notlike "*server*" }
Invoke-Command -ComputerName $c.Name -ScriptBlock { ipconfig /flushdns }

The first line queries Active Directory for all computers whose operating system name does not contain “server”. The second line runs the normal Windows command “ipconfig /flushdns” on each of those computers via PowerShell remoting.

This technique could be used to run any command across all workstations. Very powerful, and dangerous. Use at your own risk!

Monitors and Caching DNS

June 20, 2013 at 4:53 pm

Had an interesting issue today. One of the production systems suddenly went dark, and we found out about it from the client. This is never a good way to start a Thursday. It turns out that the client was having DNS issues and the domain was no longer valid. Relatively simple fix, crisis averted…

But why didn’t the monitoring system pick it up?

We use Dotcom-Monitor to check each of our sites on a regular basis. The monitor actually logs in to each website to verify functionality. What in the DNS world could cause this issue in such a scenario? How about a caching nameserver? Turns out, to limit the stress on their nameserver, Dotcom-Monitor set up a standard caching nameserver that keeps a record in cache until the record expires. So even though DNS was no longer working for this site, the monitor thought everything was A-OK.
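The effect is easy to reproduce in miniature. The sketch below is a toy caching resolver, not anything Dotcom-Monitor actually runs: the class name, the fake clock, the 3600-second TTL, and the 203.0.113.10 documentation address are all assumptions for illustration. It keeps answering from cache after the upstream lookup has started failing, right up until the TTL runs out.

```python
class CachingResolver:
    """Toy resolver that serves cached answers until their TTL expires."""

    def __init__(self, upstream, clock):
        self._upstream = upstream  # callable: name -> (address, ttl_seconds)
        self._clock = clock        # callable returning the current time in seconds
        self._cache = {}           # name -> (address, expiry_time)

    def resolve(self, name):
        now = self._clock()
        cached = self._cache.get(name)
        if cached and cached[1] > now:
            return cached[0]       # still fresh: upstream is never consulted
        address, ttl = self._upstream(name)
        self._cache[name] = (address, now + ttl)
        return address


if __name__ == "__main__":
    state = {"now": 0, "broken": False}

    def upstream(name):
        if state["broken"]:
            raise LookupError(f"{name}: lookup failed")
        return "203.0.113.10", 3600

    resolver = CachingResolver(upstream, clock=lambda: state["now"])
    resolver.resolve("www.example.com")         # populates the cache
    state["broken"] = True                      # the domain stops resolving
    state["now"] = 1800
    print(resolver.resolve("www.example.com"))  # cached answer, failure is hidden
    state["now"] = 3601
    try:
        resolver.resolve("www.example.com")
    except LookupError as err:
        print("only now does the check fail:", err)
```

A monitor sitting behind a cache like this cannot notice a DNS outage any sooner than the record's TTL allows.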

What can we do to fix this issue? Not much, unfortunately. Dotcom-Monitor would have to change their infrastructure in a way that would likely increase the load on their DNS servers significantly. Since that’s not likely to happen, it looks like I’ll have to build a check into our internal (Zabbix-based) monitor that queries the domain’s authoritative nameserver, as listed in its SOA record, instead of relying on a cache.

Flush DNS Cache for a Single Domain

June 11, 2013 at 10:13 am

I was working on the site today and ran into an issue: our caching DNS server (Windows Server 2008) was holding on to the old webserver’s IP. This wasn’t a problem for me locally, as I used the old hosts file trick to point to the new server. However, this meant I couldn’t show other folks the site until either the cache was completely flushed or the record expired.

A little googling later, and I found this little command from ServerFault.

dnscmd dnsserver.local /NodeDelete ..Cache whatever.com [/Tree] [/f]

/tree    Specifies to delete all of the child records.

/f       Executes the command without asking for confirmation.

This allows you to clear just a small portion of the cache, as you define it. Pretty handy!