--- title: Using GoAccess with Nginx to replace Google Analytics url: using-goaccess-with-nginx-to-replace-google-analytics.html date: 2021-01-25T12:00:00+02:00 type: post draft: false --- ## Introduction I know! You cannot simply replace Google Analytics with parsing access logs and displaying a couple of charts. But to be honest, I actually never used Google Analytics to the fullest extent and was usually interested in seeing page hits and which pages were visited most often. I recently moved my blog from Firebase to a VPS and also decided to remove Google Analytics tracking code from the site since its quite malicious and tracks users across other pages also and is creating a profile of a user, and I've had it. But I also need some insight of what is happening on a server and which content is being read the most etc. I have looked at many existing solutions like: - [Umami](https://umami.is/) - [Freshlytics](https://github.com/sheshbabu/freshlytics) - [Matomo](https://matomo.org/) But the more I looked at them the more I noticed that I am replacing one evil with another one. Don't get me wrong. Some of these solutions are absolutely fantastic but would require installation of databases and something like PHP or Node. And I was not ready to put those things on my fresh server. Also having Docker installed is out of the question. ## Opting for log parsing So, I defaulted to parsing already existing logs and generating HTML reports from this data. I found this amazing software [GoAccess](https://goaccess.io/) which provides all the functionalities I need, and it's a single binary. Written in Go. GoAccess can be used in two different modes. ![GoAccess Terminal](/posts/goaccess/goaccess-dash-term.png) *Running in a terminal* ![GoAccess HTML](/posts/goaccess/goaccess-dash-html.png) *Running in a browser* I, however, need this to run in a browser. So, the second option is the way to go. The Idea is to periodically run cronjob and export this report into a folder that gets then server by Nginx behind a Basic authentication. ## Getting Nginx ready I choose Ubuntu on [DigitalOcean](https://www.digitalocean.com/). First I installed [Nginx](https://nginx.org/en/), and [Letsencrypt](https://letsencrypt.org/getting-started/) certbot and all the necessary dependencies. ```sh # log in as root user sudo su - # first let's update the system apt update && apt upgrade -y # let's install apt install nginx certbot python3-certbot-nginx apache2-utils ``` After all this is installed we can create a new configuration for a statistics. Stats will be available at `stats.domain.com`. ```sh # creates directory where html will be hosted mkdir -p /var/www/html/stats.domain.com cp /etc/nginx/sites-available/default /etc/nginx/sites-available/stats.domain.com nano /etc/nginx/sites-available/stats.domain.com ``` ```nginx server { root /var/www/html/stats.domain.com; server_name stats.domain.com; index index.html; location / { try_files $uri $uri/ =404; } } ``` Now we check if the configuration is ok. We can do this with `nginx -t`. If all is ok, we can restart Nginx with `service nginx restart`. After all that you should add A record for this domain that points to IP of a droplet. Before enabling SSL you should test if DNS records have propagated with `curl stats.domain.com`. Now, it's time to provision TLS certificate. To achieve this, you execute command `certbot --nginx`. Follow the wizard and when you are asked about redirection always choose 2 (always redirect to HTTPS). When this is done you can visit https://stats.domain.com and you should get 404 not found error which is correct. ## Getting GoAccess ready If you are using Debian like system GoAccess should be available in repository. Otherwise refer to the official website. ```sh apt install goaccess ``` To enable Geo location we also need one additiona thing. ```sh cd /var/www/html/stats.stats.com wget https://github.com/P3TERX/GeoLite.mmdb/raw/download/GeoLite2-City.mmdb ``` Now we create a shell script that will be executed every 10 minutes. ```sh nano /var/www/html/stats.domain.com/generate-stats.sh ``` Contents of this file should look like this. ```sh #!/bin/sh zcat -f /var/log/nginx/access.log* > /var/log/nginx/access-all.log goaccess \ --log-file=/var/log/nginx/access-all.log \ --log-format=COMBINED \ --exclude-ip=0.0.0.0 \ --geoip-database=/var/www/html/stats.domain.com/GeoLite2-City.mmdb \ --ignore-crawlers \ --real-os \ --output=/var/www/html/stats.domain.com/index.html rm /var/log/nginx/access-all.log ``` Because after a while nginx creates multiple files with access logs we use [`zcat`](https://linux.die.net/man/1/zcat) to extract Gziped contents and create a file that has all the access logs. After this file is used we delete it. If you want to exclude your home IP's result look at the `--exclude-ip` option in script and instead of `0.0.0.0` add your own home IP address. You can find your home IP by executing `curl ifconfig.me` from your local machine and NOT from the droplet. Test the script by executing `sh /var/www/html/stats.domain.com/generate-stats.sh` and then checking `https://stats.domain.com`. If you can see stats instead of 404 than you are set. It's time to add this script to cron with `cron -e`. ```go */10 * * * * sh /var/www/html/stats.domain.com/generate-stats.sh ``` ## Securing with Basic authentication You probably don't want stats to be publicly available, so we should create a user and a password for Basic authentication. First we create a password for a user `stats` with `htpasswd -c /etc/nginx/.htpasswd stats`. Now we update config file with `nano /etc/nginx/sites-available/stats.domain.com`. You probably noticed that the file looks a bit different from before. This is because `certbot` added additional rules for SSL. Your location portion the config file should now look like. You should add `auth_basic` and `auth_basic_user_file` lines to the file. ```nginx location / { try_files $uri $uri/ =404; auth_basic "Private Property"; auth_basic_user_file /etc/nginx/.htpasswd; } ``` Test if config is still ok with `nginx -t` and if it is you can restart Nginx with `service nginx restart`. If you now visit `https://stats.domain.com` you should be prompted for username and password. If not, try reopening your browser. That is all. You now have analytics for your server that gets refreshed every 10 minutes.