Rebuilding the Homestead’s DNS with Consul, DNSMasq, and Ansible

August 29, 2018 at 10:21 pm

My friend Jason recently posted an update on his blog over at Peaks and Protocols about redoing his home network’s DNS setup. This reminded me that I really needed to write up my own recent DNS rebuild, which is based around HashiCorp’s Consul, DNSMasq, and Ansible running on some Raspberry Pi 3s. Overkill? Probably. But if you can’t have fun with your home network, what’s the point? On to the setup…

House Network Diagram

Consul

Consul Logo

Consul started life as a distributed service locator and key-value store. It has grown significantly over the years and is now becoming a full-fledged service mesh. Any server can register and provide one or more services through simple config files or API calls. Further, Consul supports multiple datacenters natively and has built-in health checks, which means a query returns a local, healthy endpoint for the service.
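As a sketch of what registration looks like, the agent’s HTTP API accepts a JSON service definition via a PUT to `/v1/agent/service/register`. The helper below just builds that payload; the service names are hypothetical, but the `Name`/`Port`/`Tags`/`Check` fields are part of Consul’s documented API:

```python
import json

def service_definition(name, port, tags=(), http_check=None):
    """Build a minimal Consul service-registration payload.

    The resulting dict would be PUT (as JSON) to a local agent at
    http://127.0.0.1:8500/v1/agent/service/register.
    """
    payload = {"Name": name, "Port": port, "Tags": list(tags)}
    if http_check:
        # Optional health check: the agent polls this URL on the given interval.
        payload["Check"] = {"HTTP": http_check, "Interval": "30s"}
    return payload

# A hypothetical docs webserver, health-checked over HTTP:
print(json.dumps(service_definition("docs", 80, ["web"], "http://localhost:80/")))
```

The same structure, wrapped in a `"service"` key, works as a config file dropped into the agent’s config directory.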

One of the main reasons I chose Consul is because it makes itself available via DNS as the .consul domain. Want to know where your git server is? dig git.service.consul. Your documentation hosted on a webserver somewhere? dig docs.service.consul. This makes finding a service you have running somewhere trivial, and means never having to update a DNS zone file again.
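The names follow Consul’s fixed pattern, `[tag.]<service>.service[.<datacenter>].consul`. A tiny helper (purely illustrative; the service and datacenter names are made up) captures the scheme:

```python
def consul_dns_name(service, tag=None, datacenter=None, domain="consul"):
    """Compose the DNS name Consul answers for a registered service."""
    parts = [tag, service, "service", datacenter, domain]
    return ".".join(p for p in parts if p)

# The queries from the examples above:
print(consul_dns_name("git"))    # git.service.consul
print(consul_dns_name("docs"))   # docs.service.consul
# Scoped to a tag and a specific datacenter:
print(consul_dns_name("ntp", tag="udp", datacenter="home"))
```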

Another reason, which I’m not using yet, is that it has a solid key-value store. This is great for storing configuration settings for distributed applications. There are a ton of tools that take advantage of this, and even provide dynamic reloading capabilities to the app when a key is changed in Consul.

DNSMasq

In order to take advantage of Consul’s DNS features you need a DNS server that can point to Consul for just that domain, while passing through all other traffic to a normal DNS resolver. I chose DNSMasq for this because it is simple and well understood. There were some security issues with it last year, but they have since been addressed. I may migrate to unbound in the long run, but DNSMasq is fine for my use cases.
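In DNSMasq terms this split is a single `server=/consul/127.0.0.1#8600` directive. The selection logic it implements can be sketched roughly as follows (a toy model for illustration, not how dnsmasq is actually implemented):

```python
# Rough sketch of dnsmasq's domain-based upstream selection: queries under
# a listed domain go to that domain's server; everything else goes upstream.
DOMAIN_SERVERS = {"consul": "127.0.0.1#8600"}   # server=/consul/127.0.0.1#8600
DEFAULT_SERVERS = ["8.8.8.8", "8.8.4.4"]        # server=8.8.8.8 / server=8.8.4.4

def upstream_for(qname):
    labels = qname.rstrip(".").split(".")
    # Try the longest suffix first, then progressively shorter ones.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in DOMAIN_SERVERS:
            return DOMAIN_SERVERS[suffix]
    return DEFAULT_SERVERS[0]

print(upstream_for("git.service.consul"))   # 127.0.0.1#8600
print(upstream_for("example.com"))          # 8.8.8.8
```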

Ansible & Putting it All Together

Ansible Logo

Ansible is the glue that makes sure I can redo this config easily should something happen to the Pis. It is a configuration management system that just works, with minimal extra craziness. I could go on for days about Ansible, and probably should write a dozen posts on it alone, but there’s so much out there already that I don’t feel the need. The bottom line is, this is the tool that sets up Consul and DNSMasq for me, and ensures that I can reset everything to a known working state in the event of configuration drift.

I used several existing roles and modules to help get this project running quickly.

I ended up having to change some of the roles around to suit the Raspberry Pi environment, but otherwise it was fairly easy. I created my own baseline role that updates and upgrades the system and installs some packages, including Python and its tooling. This base role also creates user accounts for me and for Ansible itself. The first time I ran it, I had to pass parameters to log in as the default Raspbian user, but after that it can run as the Ansible user instead.

- name: Update Apt and Upgrade Packages
  apt:
    update_cache: yes
    cache_valid_time: 3600
    name: "*"
    state: latest
  tags:
    - packages

- name: Install Baseline Apps
  apt:
    name: "{{ packages }}"
    state: present
  vars:
    packages:
    - python
    - python-pip
    - python3
    - python3-pip
    - virtualenv
    - python3-virtualenv
    - dnsutils
  tags:
    - packages

- name: Install pi base python packages
  pip:
    name: "{{ packages }}"
    state: present
  vars:
    packages:
    - python-consul
    - hvac

- name: Create Ansible management user
  user:
    name: ansible
    comment: Ansible system user
    group: admin
    state: present

- name: Create dmurawsky user
  user:
    name: dmurawsky
    comment: Derek Murawsky
    group: admin
    state: present

For my group_vars, I created a dns.yml file with the variables needed for Consul and DNSMasq.

---
# Consul Configuration
consul_version: 1.2.2
#consul_package: consul_1.2.2_linux_arm.zip
consul_server: true
consul_agent: true
consul_ui: true

consul_server_nodes:
  - 192.168.1.2
  - 192.168.1.3

# Services #
consul_agent_services: true
consul_services_register:
# Register NTP in consul
  - name: ntp
    port: 123
    tags:
      - udp
  - name: dns
    port: 53
    tags:
      - udp

# Hashicorp Vault
vault_version: 0.10.4
vault_pkg: vault_{{ vault_version }}_linux_arm.zip
vault_pkg_sum: 384e47720cdc72317d3b8c98d58e6c8c719ff3aaeeb71b147a6f5f7a529ca21b

# DNSMasq
dnsmasq_dnsmasq_conf:
  - |
    port=53
    bind-interfaces
    server=8.8.8.8
    server=8.8.4.4

dnsmasq_dnsmasq_d_files_present:
  cache:
    - |
      domain-needed
      bogus-priv
      no-hosts
      dns-forward-max=150
      cache-size=1000
      neg-ttl=3600
      no-poll
      no-resolv
  consul:
    - |
      server=/consul/127.0.0.1#8600
  homestead-murawsky-net:
    - address=/usg.homestead.murawsky.net/192.168.1.1
    - address=/ns1.homestead.murawsky.net/192.168.1.2
    - address=/ns2.homestead.murawsky.net/192.168.1.3

# NTP
ntp_enabled: true
ntp_manage_config: true
ntp_area: 'us'
ntp_servers:
  - "0.{{ ntp_area }}.pool.ntp.org iburst"
  - "1.{{ ntp_area }}.pool.ntp.org iburst"
  - "2.{{ ntp_area }}.pool.ntp.org iburst"
  - "3.{{ ntp_area }}.pool.ntp.org iburst"
ntp_timezone: America/New_York

And finally, the simple site.yml file.

---
- name: Configure System Baselines
  hosts: all
  roles:
    - { role: baseline, tags: ['baseline']}

- name: Configure DNS hosts
  hosts: dns
  roles:
    - { role: ntp, tags: ['ntp'] }
    - { role: dnsmasq, tags: ['dnsmasq'] }
    - { role: consul, tags: ['consul'] }
    - { role: hashivault, tags: ['hashivault'] }
...

Results

DNS resolution worked perfectly out of the gate as expected, but what about Consul?

Consul Dashboard

Brilliant! Sure, the services I have registered are pretty simple and don’t really benefit from a service locator, but they show what is possible. Now I can register any new service by installing the Consul agent on the server and simply dropping a definition file into the appropriate folder! This should make future expansion of services much easier.
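As a sketch of what such a definition file looks like, here is a hypothetical one for the NTP service registered in the group_vars above, dropped into the agent’s config directory (commonly `/etc/consul.d/`):

```json
{
  "service": {
    "name": "ntp",
    "port": 123,
    "tags": ["udp"]
  }
}
```

After a `consul reload` (or an agent restart), the service shows up in the catalog and answers at `ntp.service.consul`.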

Note: I currently have only two Consul servers. That isn’t highly available: losing either one costs the cluster its quorum. I need to get a third Consul server online, and I’m debating between another Pi and the home server.

Future Plans

You’ll notice there’s no real security around the deployment above. That needs to be fixed in terms of Consul ACLs, Vault, and password/key management for user accounts. There’s also a cool tool called Pi-hole, a DNS-level ad blocker, that I want to integrate into my environment. I also plan on setting up Docker on my home server in the not-too-distant future to make it easier to host some fun services like Prometheus, Grafana, Home Assistant, and other cool tools. I’ll also have to extend the network to my barn, as the office is moving out there. Lastly, I want to build a portable lab that I can take with me when doing demos or presentations at local user groups.

Homestead Network Upgrades

October 22, 2017 at 1:05 pm

Despite coming from the networking side of IT, I tend to use regular consumer grade equipment at home. It typically just works, and I’m not looking for extreme reliability or features. I’ve been using hardware from Linksys, Netgear, and the other consumer network vendors for at least the last 10 years. Sometimes, though, things happen that make you reevaluate your previous life choices…

For me, that thing was an email that I received from Verizon saying my router was infected with malware. Since I always take basic precautions like changing the default password and locking down external ports, I was a bit surprised. Turns out, there was a vulnerability in the firmware that had gone unpatched for months… In hindsight, I should not have been that surprised. At all. I thought I had purchased a flagship router that would be supported for at least a few years, but it didn’t look like any more patches were coming. Ever. I looked into trusty old DD-WRT figuring that I could flash the router and at least get another year out of it, but apparently the R7000 has some performance issues with DD-WRT.

After having issues like this a few times with generic consumer grade stuff over the years, no matter the vendor, I decided enough was enough. I researched available options in the enterprise hardware space (way too expensive and time consuming to set up), looked at open source alternatives (cheap, but time consuming, and not well integrated), and even looked at the more pro-level offerings from consumer manufacturers (underwhelming). After a few days, I decided on and purchased some Ubiquiti hardware based on the many good reviews and a few personal recommendations from networking folks I respect.

Ubiquiti’s hardware is solid stuff, performance wise, and they have a very good reputation. The hardware is what I would call “Enterprise Lite”, meaning it’s not Cisco, but it’s perfect for small to medium businesses that just want things to work. Additionally, the Unifi configuration system and dashboard are excellent, taking a significant configuration and support burden off of me.

The initial hardware purchase was:

  • Unifi Security Gateway Pro (Amazon) - I definitely went overkill here. The entry-model USG is capable of routing gigabit at near wirespeed. However, I decided that I liked the extra ports for a few future projects, like the barn office.
  • Unifi Switch 8, 60 Watt (Amazon) - Since the new network was not an all-in-one setup, I needed something to power the other devices around the house. This managed switch provides a lot more than just that, though. The VLANs will come in handy when we set up the home office.
  • Unifi AP AC Pro (Amazon) - Another bit of overkill for home use, but this one was easier to justify than the firewall. Simply put, it has more power, and I need that given the 2′-thick stone walls in the farmhouse.
  • Unifi Cloud Key (Amazon) - Though not strictly necessary, the Cloud Key allows you to run the network controller app on dedicated hardware. It can also be linked to the Unifi cloud portal, allowing for a very convenient and secure hybrid cloud management platform.

Simple Network Diagram

The hardware wasn’t cheap, but surprisingly, it wasn’t much more than I paid for the R7000 two years ago. If I had chosen the regular USG, the price difference would have been negligible.

As for the setup, it was easier than I thought. I racked the USG Pro, plugged in the switch, then the cloud key. Thankfully I had already run the line to the wireless AP so that was easy. I also threw in a Raspberry Pi server for fun. It took about 10 minutes to patch everything together. But what about the configuration?

Well, thanks to the Unifi software on the Cloud Key, I was able to “adopt” the other devices and have them configured in no time at all. My basic single-VLAN setup was ready to go out of the box. All told, I had the network up and running in 20 minutes. Time versus the R7000? Maybe an extra 10 minutes.

Unifi Dashboard

What has it been like living with “Enterprise Lite” hardware at home? Fantastic. Having a useful dashboard that I can glance at to see the status of the home network is a perk I didn’t think I would care about, but I’ve used it several times already. The speed is true gigabit on wired, the wireless coverage is solid, and we don’t have random drops in connectivity anymore. And as for patches… I’ve already had two patches come through for the stack. It’s a simple matter of hitting the upgrade button for each device, or setting up auto-upgrade. As far as I’m concerned, I’m never going back to consumer gear again.

Monitors and Caching DNS

June 20, 2013 at 4:53 pm

Had an interesting issue today. One of the production systems suddenly went dark, and we found out about it from the client. This is never a good way to start a Thursday. It turns out that the client was having DNS issues and the domain was no longer valid. Relatively simple fix, crisis averted…

But why didn’t the monitoring system pick it up?

We use Dotcom-Monitor to check each of our sites on a regular basis. The monitor actually logs in to each website to verify functionality. What in the DNS world could cause this issue in such a scenario? How about a caching nameserver? Turns out, to limit the stress on their nameservers, Dotcom-Monitor set up a standard caching nameserver that keeps serving a record from cache until it expires. So even though DNS was no longer working for this site, the monitor thought everything was A-OK.
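The failure mode is easy to model: a resolver that caches a record keeps answering from cache until the TTL runs out, so a monitor behind it keeps seeing a “healthy” answer for up to a full TTL after the authoritative record breaks. A toy sketch (the hostname and address are hypothetical placeholders):

```python
import time

class CachingResolver:
    """Toy TTL cache: answers from cache until the record's TTL expires."""
    def __init__(self, clock=time.time):
        self.clock = clock
        self.cache = {}  # name -> (answer, expires_at)

    def resolve(self, name, lookup, ttl=3600):
        answer, expires_at = self.cache.get(name, (None, 0))
        if self.clock() < expires_at:
            return answer          # served from cache, even if now stale
        answer = lookup(name)      # authoritative lookup
        self.cache[name] = (answer, self.clock() + ttl)
        return answer

# Simulate: record cached at t=0, authoritative DNS breaks shortly after.
now = [0]
resolver = CachingResolver(clock=lambda: now[0])
resolver.resolve("app.example.com", lambda n: "203.0.113.10")
now[0] = 1800   # half the TTL later, upstream would now fail...
stale = resolver.resolve("app.example.com", lambda n: None)
print(stale)    # 203.0.113.10 -- the monitor still sees a valid answer
```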

What can we do to fix this issue? Not much, unfortunately. Dotcom-Monitor would have to change their infrastructure, which would likely increase the load on their DNS servers significantly. Since that’s not likely, it looks like I’ll have to build a check into our internal monitor (Zabbix based) that queries the domain directly against its authoritative nameservers, as found via the SOA record.