| author | Mitja Felicijan <mitja.felicijan@gmail.com> | 2024-02-23 10:35:22 +0100 |
|---|---|---|
| committer | Mitja Felicijan <mitja.felicijan@gmail.com> | 2024-02-23 10:35:22 +0100 |
| commit | 4abcce013c9ee3053badf2abda77190233066676 (patch) | |
| tree | 450de7e8fed3c3c7501a9d2e2eb60a676bdfa09e /_posts/posts/2021-01-25-goaccess.md | |
| parent | cdf50cb2e3051200c6ea0628c318d66220b7d1a1 (diff) | |
Testing thoughts page
Diffstat (limited to '_posts/posts/2021-01-25-goaccess.md')
| -rw-r--r-- | _posts/posts/2021-01-25-goaccess.md | 205 |
1 files changed, 205 insertions, 0 deletions
diff --git a/_posts/posts/2021-01-25-goaccess.md b/_posts/posts/2021-01-25-goaccess.md
new file mode 100644
index 0000000..779bce5
--- /dev/null
+++ b/_posts/posts/2021-01-25-goaccess.md
@@ -0,0 +1,205 @@

---
title: Using GoAccess with Nginx to replace Google Analytics
permalink: /using-goaccess-with-nginx-to-replace-google-analytics.html
date: 2021-01-25T12:00:00+02:00
layout: post
type: post
draft: false
---

## Introduction

I know! You cannot simply replace Google Analytics by parsing access logs and
displaying a couple of charts. But to be honest, I never used Google Analytics
to its full extent. I was mostly interested in page hits and in which pages
were visited most often.

I recently moved my blog from Firebase to a VPS and also decided to remove the
Google Analytics tracking code from the site, since it's quite malicious: it
tracks users across other websites and builds a profile of each user, and I've
had it. But I still need some insight into what is happening on the server and
which content is read the most.

I have looked at many existing solutions, such as:

- [Umami](https://umami.is/)
- [Freshlytics](https://github.com/sheshbabu/freshlytics)
- [Matomo](https://matomo.org/)

But the more I looked at them, the more I noticed that I would be replacing one
evil with another. Don't get me wrong: some of these solutions are absolutely
fantastic, but they would require installing a database and a runtime such as
PHP or Node, and I was not ready to put those things on my fresh server. Having
Docker installed is also out of the question.

## Opting for log parsing

So I settled on parsing the logs Nginx already writes and generating HTML
reports from that data.

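To give an idea of what "parsing logs" means at its simplest, the sketch below counts the most requested paths in a log that uses Nginx's default `combined` format, using only `awk`, `sort` and `head`. It is not how GoAccess works internally; it assumes the request line is the first double-quoted field of each entry, and the function name `top_pages` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: print the N most requested paths (default 10) from a
# log in Nginx's default "combined" format, as "HITS PATH" lines.
top_pages() {
    log="$1"
    n="${2:-10}"
    # With '"' as the field separator, field 2 is the request line
    # (e.g. GET /about.html HTTP/1.1) and its second word is the path.
    awk -F'"' '{ split($2, req, " "); if (req[2] != "") hits[req[2]]++ }
        END { for (path in hits) print hits[path], path }' "$log" |
        sort -k1,1nr -k2,2 |
        head -n "$n"
}
```

For example, `top_pages /var/log/nginx/access.log 5` lists the five most requested paths. Anything beyond this (unique visitors, referrers, 404s, geolocation) quickly becomes a project of its own, which is why a ready-made tool is attractive.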
I found this amazing tool, [GoAccess](https://goaccess.io/), which provides all
the functionality I need and ships as a single binary. (Despite its name, it is
written in C, not Go.)

GoAccess can be used in two different modes: as an interactive dashboard in the
terminal, or as a generated HTML report that you open in a browser.

*Screenshot: GoAccess running in a terminal*

*Screenshot: GoAccess running in a browser*

I, however, need this to run in a browser, so the second option is the way to
go. The idea is to have a cron job periodically export the report into a
folder, which Nginx then serves behind Basic authentication.

## Getting Nginx ready

I chose Ubuntu on [DigitalOcean](https://www.digitalocean.com/). First, I
installed [Nginx](https://nginx.org/en/), the
[Let's Encrypt](https://letsencrypt.org/getting-started/) certbot and all the
necessary dependencies.

```sh
# log in as the root user (the remaining commands assume a root shell)
sudo su -

# first, update the system
apt update && apt upgrade -y

# install Nginx, certbot with its Nginx plugin, and apache2-utils
# (which provides the htpasswd tool used later for Basic authentication)
apt install nginx certbot python3-certbot-nginx apache2-utils
```

Once everything is installed, we can create a new Nginx site for the
statistics. The stats will be available at `stats.domain.com` (replace it with
your own domain throughout).

```sh
# create the directory the HTML report will be served from
mkdir -p /var/www/html/stats.domain.com

# start from the default site config and edit it
cp /etc/nginx/sites-available/default /etc/nginx/sites-available/stats.domain.com
nano /etc/nginx/sites-available/stats.domain.com
```

Replace the contents of the copied file with the following:

```nginx
server {
    root /var/www/html/stats.domain.com;
    server_name stats.domain.com;

    index index.html;
    location / {
        try_files $uri $uri/ =404;
    }
}
```

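Enable the site by symlinking it into `sites-enabled` with
`ln -s /etc/nginx/sites-available/stats.domain.com /etc/nginx/sites-enabled/`
(Ubuntu's stock `nginx.conf` loads `sites-enabled/*`, not `sites-available`).
Then check that the configuration is valid with `nginx -t`. If all is OK,
restart Nginx with `service nginx restart`.
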
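If you would rather script this step than edit files in `nano`, a sketch like the one below writes the same server block and enables it with a symlink. The function name `write_stats_site` and the `NGINX_DIR` override (there only so it can be tried in a scratch directory instead of `/etc/nginx`) are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: write the stats server block for DOMAIN into
# sites-available and enable it with a symlink in sites-enabled.
# NGINX_DIR defaults to /etc/nginx; point it elsewhere to try this safely.
write_stats_site() {
    domain="$1"
    nginx_dir="${NGINX_DIR:-/etc/nginx}"
    conf="$nginx_dir/sites-available/$domain"
    mkdir -p "$nginx_dir/sites-available" "$nginx_dir/sites-enabled"
    # Unquoted heredoc: $domain is expanded, while Nginx's own \$uri
    # variables are escaped so they are written literally.
    cat > "$conf" <<EOF
server {
    root /var/www/html/$domain;
    server_name $domain;

    index index.html;
    location / {
        try_files \$uri \$uri/ =404;
    }
}
EOF
    # Ubuntu's stock nginx.conf includes sites-enabled/*, not sites-available.
    ln -sf "$conf" "$nginx_dir/sites-enabled/$domain"
}
```

On the server it would be used as `write_stats_site stats.domain.com && nginx -t && service nginx restart`, so Nginx is only restarted when the new config passes the syntax check.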
Next, add a DNS A record for `stats.domain.com` that points to your droplet's
IP address.

Before enabling TLS, check that the DNS record has propagated with
`curl stats.domain.com`.

Now it's time to provision a TLS certificate. To do this, run
`certbot --nginx` and follow the wizard. If it asks about redirection, choose 2
(always redirect to HTTPS).

When this is done, visit https://stats.domain.com. Since there is no report in
the directory yet, you should get an error page (such as 404 Not Found or 403
Forbidden), which is expected at this point.

## Getting GoAccess ready

On Debian-based systems such as Ubuntu, GoAccess should be available in the
official repositories. Otherwise, refer to the official website.

```sh
apt install goaccess
```

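To enable geolocation, we need one more thing: a GeoIP2 database. The commands
below download the GeoLite2 City database from a community mirror on GitHub into
the stats directory. This assumes your GoAccess package was built with GeoIP2
support, which `--geoip-database` needs in order to read `.mmdb` files.

```sh
cd /var/www/html/stats.domain.com
wget https://github.com/P3TERX/GeoLite.mmdb/raw/download/GeoLite2-City.mmdb
```
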
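Because the file comes from a third-party mirror, it can be worth checking that `wget` actually saved a MaxMind DB file and not, say, an HTML error page. Per the MaxMind DB format specification, every `.mmdb` file ends with a metadata section that starts with the bytes `\xAB\xCD\xEF` followed by `MaxMind.com` and is at most 128 KiB long. The sketch below only looks for the ASCII part of that marker in the last 128 KiB, so it is a heuristic rather than a validation of the data; the function name `looks_like_mmdb` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: rough check that FILE looks like a MaxMind DB (.mmdb).
# The metadata marker ("\xAB\xCD\xEF" + "MaxMind.com") sits within the last
# 128 KiB of a valid file; only its ASCII part is searched for here.
looks_like_mmdb() {
    file="$1"
    [ -s "$file" ] || return 1
    tail -c 131072 "$file" | LC_ALL=C grep -aqF 'MaxMind.com'
}
```

Usage: `looks_like_mmdb /var/www/html/stats.domain.com/GeoLite2-City.mmdb || echo "download looks wrong"`.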
Now we create a shell script that cron will run every 10 minutes.

```sh
nano /var/www/html/stats.domain.com/generate-stats.sh
```

The file should contain the following:

```sh
#!/bin/sh

# merge the current log and all rotated logs (plain or gzipped) into one file
zcat -f /var/log/nginx/access.log* > /var/log/nginx/access-all.log

# generate the HTML report from the merged log
goaccess \
  --log-file=/var/log/nginx/access-all.log \
  --log-format=COMBINED \
  --exclude-ip=0.0.0.0 \
  --geoip-database=/var/www/html/stats.domain.com/GeoLite2-City.mmdb \
  --ignore-crawlers \
  --real-os \
  --output=/var/www/html/stats.domain.com/index.html

# the merged log is only needed while the report is generated
rm /var/log/nginx/access-all.log
```

Because logrotate periodically rotates Nginx's access log into numbered,
gzip-compressed files (`access.log.1`, `access.log.2.gz` and so on), we use
[`zcat`](https://linux.die.net/man/1/zcat) with `-f`, which decompresses gzipped
files and passes uncompressed ones through unchanged, to merge them into a
single file with all the access logs. Once the report is generated, we delete
that file.

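To see how `zcat -f` treats a mix of rotated files, the self-contained demo below recreates the usual layout in a temporary directory: a plain current log, a plain `.1` file and a gzip-compressed `.2.gz` file. The log lines are made up; the point is that all three end up in the merged output.

```sh
#!/bin/sh
# Demo: build a fake rotated-log directory and merge it with the same glob
# that generate-stats.sh uses (captured in a variable instead of a file).
demo_dir=$(mktemp -d)
echo "line from access.log" > "$demo_dir/access.log"
echo "line from access.log.1" > "$demo_dir/access.log.1"
echo "line from access.log.2" | gzip > "$demo_dir/access.log.2.gz"

# zcat -f decompresses the .gz file and passes the plain files through as-is.
merged=$(zcat -f "$demo_dir"/access.log*)
printf '%s\n' "$merged"

rm -rf "$demo_dir"
```

It prints all three lines, including the one that was stored compressed.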
If you want to exclude your own visits from the results, replace `0.0.0.0` in
the script's `--exclude-ip` option with your home IP address. You can find your
home IP by running `curl ifconfig.me` on your local machine and NOT on the
droplet.

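GoAccess accepts `--exclude-ip` more than once, and also accepts ranges written as `START-END`, so excluding several addresses (home, office, VPN) just means repeating the option. A hypothetical helper that turns a list of IPs or ranges into those arguments could look like this:

```sh
#!/bin/sh
# Hypothetical helper: print one --exclude-ip=... argument per IP or range.
exclude_ip_args() {
    for ip in "$@"; do
        printf '%s\n' "--exclude-ip=$ip"
    done
}
```

Inside `generate-stats.sh`, the `--exclude-ip=0.0.0.0 \` line could then become `$(exclude_ip_args 203.0.113.7 198.51.100.1-198.51.100.9) \` (placeholder addresses). The substitution is left unquoted on purpose so that each line becomes a separate argument, which is safe only because IP addresses contain no spaces or glob characters.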
Test the script by running
`sh /var/www/html/stats.domain.com/generate-stats.sh` and then opening
`https://stats.domain.com`. If you see the stats instead of an error page, you
are all set.

It's time to add this script to cron. Open root's crontab with `crontab -e` and
add the following line:

```
*/10 * * * * sh /var/www/html/stats.domain.com/generate-stats.sh
```

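Editing the crontab by hand is fine once. If you provision servers with a script instead, a sketch like the one below adds the entry only when an identical line is not already present, so running it twice does not schedule the job twice. The filter works on crontab text from stdin, so it can be tried without touching a real crontab; the function name `add_cron_line` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: read crontab text on stdin and print it back with LINE
# appended, unless an identical line is already there.
add_cron_line() {
    line="$1"
    existing=$(cat)
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi
    if ! printf '%s\n' "$existing" | grep -qxF -- "$line"; then
        printf '%s\n' "$line"
    fi
}
```

On the server it would be used as `crontab -l 2>/dev/null | add_cron_line '*/10 * * * * sh /var/www/html/stats.domain.com/generate-stats.sh' | crontab -`, where `2>/dev/null` hides the "no crontab" message when root has no crontab yet.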
## Securing with Basic authentication

You probably don't want the stats to be publicly available, so we should create
a user and a password for Basic authentication.

First, create a password for the user `stats` with
`htpasswd -c /etc/nginx/.htpasswd stats`. The `-c` flag creates the file (and
overwrites it if it already exists), so leave it out when adding more users
later.

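If you would rather not install `apache2-utils` just for `htpasswd`, Nginx's `auth_basic_user_file` also accepts entries hashed with the Apache MD5 (`$apr1$`) scheme, which `openssl passwd -apr1` can produce. The sketch below appends such an entry; the function name `add_basic_auth_user` is hypothetical, and APR1 is an old scheme, so treat this as a convenience rather than a recommendation.

```sh
#!/bin/sh
# Hypothetical helper: append "USER:HASH" to an htpasswd-style FILE.
# The hash uses OpenSSL's APR1 scheme (Apache's MD5 variant), one of the
# formats Nginx's auth_basic_user_file accepts. The password is fed to
# openssl on stdin so it does not appear in openssl's command line.
add_basic_auth_user() {
    file="$1"
    user="$2"
    password="$3"
    hash=$(printf '%s\n' "$password" | openssl passwd -apr1 -stdin) || return 1
    printf '%s:%s\n' "$user" "$hash" >> "$file"
}
```

Usage: `add_basic_auth_user /etc/nginx/.htpasswd stats 'a-long-random-password'`.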
Now edit the config file again with
`nano /etc/nginx/sites-available/stats.domain.com`. You will probably notice
that the file looks a bit different from before. This is because `certbot`
added additional rules for SSL.

Add the `auth_basic` and `auth_basic_user_file` lines, so that the `location`
part of the config file looks like this:

```nginx
location / {
    try_files $uri $uri/ =404;
    auth_basic "Private Property";
    auth_basic_user_file /etc/nginx/.htpasswd;
}
```

Check that the configuration is still valid with `nginx -t`, and if it is,
restart Nginx with `service nginx restart`.

If you now visit `https://stats.domain.com`, you should be prompted for a
username and password. If not, try reopening your browser.

That is all. You now have analytics for your server that are refreshed every
10 minutes.
