From 1100562e29f6476448b656dbddd4cf22505523f6 Mon Sep 17 00:00:00 2001 From: Mitja Felicijan Date: Sun, 10 Mar 2024 14:59:14 +0100 Subject: Move back to JBMAFP --- content/posts/2021-01-25-goaccess.md | 204 +++++++++++++++++++++++++++++++++++ 1 file changed, 204 insertions(+) create mode 100644 content/posts/2021-01-25-goaccess.md (limited to 'content/posts/2021-01-25-goaccess.md') diff --git a/content/posts/2021-01-25-goaccess.md b/content/posts/2021-01-25-goaccess.md new file mode 100644 index 0000000..0f6f1a6 --- /dev/null +++ b/content/posts/2021-01-25-goaccess.md @@ -0,0 +1,204 @@ +--- +title: Using GoAccess with Nginx to replace Google Analytics +url: /using-goaccess-with-nginx-to-replace-google-analytics.html +date: 2021-01-25T12:00:00+02:00 +type: post +draft: false +--- + +## Introduction + +I know! You cannot simply replace Google Analytics with parsing access logs and +displaying a couple of charts. But to be honest, I actually never used Google +Analytics to the fullest extent and was usually interested in seeing page hits +and which pages were visited most often. + +I recently moved my blog from Firebase to a VPS and also decided to remove +Google Analytics tracking code from the site since its quite malicious and +tracks users across other pages also and is creating a profile of a user, and +I've had it. But I also need some insight of what is happening on a server and +which content is being read the most etc. + +I have looked at many existing solutions like: + +- [Umami](https://umami.is/) +- [Freshlytics](https://github.com/sheshbabu/freshlytics) +- [Matomo](https://matomo.org/) + +But the more I looked at them the more I noticed that I am replacing one evil +with another one. Don't get me wrong. Some of these solutions are absolutely +fantastic but would require installation of databases and something like PHP or +Node. And I was not ready to put those things on my fresh server. Also having +Docker installed is out of the question. + +## Opting for log parsing + +So, I defaulted to parsing already existing logs and generating HTML reports +from this data. + +I found this amazing software [GoAccess](https://goaccess.io/) which provides +all the functionalities I need, and it's a single binary. Written in Go. + +GoAccess can be used in two different modes. + +![GoAccess Terminal](/assets/posts/goaccess/goaccess-dash-term.png) + +*Running in a terminal* + +![GoAccess HTML](/assets/posts/goaccess/goaccess-dash-html.png) + +*Running in a browser* + +I, however, need this to run in a browser. So, the second option is the way to +go. The Idea is to periodically run cronjob and export this report into a folder +that gets then server by Nginx behind a Basic authentication. + +## Getting Nginx ready + +I choose Ubuntu on [DigitalOcean](https://www.digitalocean.com/). First I +installed [Nginx](https://nginx.org/en/), and +[Letsencrypt](https://letsencrypt.org/getting-started/) certbot and all the +necessary dependencies. + +```sh +# log in as root user +sudo su - + +# first let's update the system +apt update && apt upgrade -y + +# let's install +apt install nginx certbot python3-certbot-nginx apache2-utils +``` + +After all this is installed we can create a new configuration for a statistics. +Stats will be available at `stats.domain.com`. + +```sh +# creates directory where html will be hosted +mkdir -p /var/www/html/stats.domain.com + +cp /etc/nginx/sites-available/default /etc/nginx/sites-available/stats.domain.com +nano /etc/nginx/sites-available/stats.domain.com +``` + +```nginx +server { + root /var/www/html/stats.domain.com; + server_name stats.domain.com; + + index index.html; + location / { + try_files $uri $uri/ =404; + } +} +``` + +Now we check if the configuration is ok. We can do this with `nginx -t`. If all +is ok, we can restart Nginx with `service nginx restart`. + +After all that you should add A record for this domain that points to IP of a +droplet. + +Before enabling SSL you should test if DNS records have propagated with `curl +stats.domain.com`. + +Now, it's time to provision TLS certificate. To achieve this, you execute +command `certbot --nginx`. Follow the wizard and when you are asked about +redirection always choose 2 (always redirect to HTTPS). + +When this is done you can visit https://stats.domain.com and you should get 404 +not found error which is correct. + +## Getting GoAccess ready + +If you are using Debian like system GoAccess should be available in repository. +Otherwise refer to the official website. + +```sh +apt install goaccess +``` + +To enable Geo location we also need one additiona thing. + +```sh +cd /var/www/html/stats.stats.com +wget https://github.com/P3TERX/GeoLite.mmdb/raw/download/GeoLite2-City.mmdb +``` + +Now we create a shell script that will be executed every 10 minutes. + +```sh +nano /var/www/html/stats.domain.com/generate-stats.sh +``` + +Contents of this file should look like this. + +```sh +#!/bin/sh + +zcat -f /var/log/nginx/access.log* > /var/log/nginx/access-all.log + +goaccess \ + --log-file=/var/log/nginx/access-all.log \ + --log-format=COMBINED \ + --exclude-ip=0.0.0.0 \ + --geoip-database=/var/www/html/stats.domain.com/GeoLite2-City.mmdb \ + --ignore-crawlers \ + --real-os \ + --output=/var/www/html/stats.domain.com/index.html + +rm /var/log/nginx/access-all.log +``` + +Because after a while nginx creates multiple files with access logs we use +[`zcat`](https://linux.die.net/man/1/zcat) to extract Gziped contents and create +a file that has all the access logs. After this file is used we delete it. + +If you want to exclude your home IP's result look at the `--exclude-ip` option +in script and instead of `0.0.0.0` add your own home IP address. You can find +your home IP by executing `curl ifconfig.me` from your local machine and NOT +from the droplet. + +Test the script by executing `sh +/var/www/html/stats.domain.com/generate-stats.sh` and then checking +`https://stats.domain.com`. If you can see stats instead of 404 than you are +set. + +It's time to add this script to cron with `cron -e`. + +```go +*/10 * * * * sh /var/www/html/stats.domain.com/generate-stats.sh +``` + +## Securing with Basic authentication + +You probably don't want stats to be publicly available, so we should create a +user and a password for Basic authentication. + +First we create a password for a user `stats` with `htpasswd -c /etc/nginx/.htpasswd stats`. + +Now we update config file with `nano +/etc/nginx/sites-available/stats.domain.com`. You probably noticed that the +file looks a bit different from before. This is because `certbot` added +additional rules for SSL. + +Your location portion the config file should now look like. You should add +`auth_basic` and `auth_basic_user_file` lines to the file. + +```nginx +location / { + try_files $uri $uri/ =404; + auth_basic "Private Property"; + auth_basic_user_file /etc/nginx/.htpasswd; +} +``` + +Test if config is still ok with `nginx -t` and if it is you can restart Nginx +with `service nginx restart`. + +If you now visit `https://stats.domain.com` you should be prompted for username +and password. If not, try reopening your browser. + +That is all. You now have analytics for your server that gets refreshed every 10 +minutes. -- cgit v1.2.3