author Mitja Felicijan <m@mitjafelicijan.com> 2023-07-08 23:25:41 +0200
committer Mitja Felicijan <m@mitjafelicijan.com> 2023-07-08 23:25:41 +0200
commit cd6644ea4ddc78597934ab0ef5ba50e3c3daa927
tree 03de331a8db6386dfd6fa75155bfbcea6b4feaf3 /content/posts/2021-01-25-goaccess.md
parent 84ed124529ffeee1590295b8de3a8faf51848680
Moved to a simpler SSG
Diffstat (limited to 'content/posts/2021-01-25-goaccess.md')
-rw-r--r-- content/posts/2021-01-25-goaccess.md 202
1 files changed, 0 insertions, 202 deletions
diff --git a/content/posts/2021-01-25-goaccess.md b/content/posts/2021-01-25-goaccess.md
deleted file mode 100644
index 1b6a330..0000000
--- a/content/posts/2021-01-25-goaccess.md
+++ /dev/null
@@ -1,202 +0,0 @@
---
title: Using GoAccess with Nginx to replace Google Analytics
url: using-goaccess-with-nginx-to-replace-google-analytics.html
date: 2021-01-25T12:00:00+02:00
draft: false
---

## Introduction

I know! You cannot simply replace Google Analytics by parsing access logs and
displaying a couple of charts. But to be honest, I never used Google Analytics
to its fullest extent and was usually only interested in page hits and in which
pages were visited most often.

I recently moved my blog from Firebase to a VPS and decided to remove the
Google Analytics tracking code from the site. It is quite malicious: it tracks
users across other sites and builds a profile of each user, and I've had it.
But I still need some insight into what is happening on the server and which
content is being read the most.

I have looked at many existing solutions like:

- [Umami](https://umami.is/)
- [Freshlytics](https://github.com/sheshbabu/freshlytics)
- [Matomo](https://matomo.org/)

But the more I looked at them, the more I noticed that I would be replacing one
evil with another. Don't get me wrong. Some of these solutions are absolutely
fantastic, but they would require installing a database and something like PHP
or Node, and I was not ready to put those things on my fresh server. Installing
Docker is also out of the question.

## Opting for log parsing

So, I defaulted to parsing the already existing logs and generating HTML
reports from this data.

I found this amazing software, [GoAccess](https://goaccess.io/), which provides
all the functionality I need, and it's a single binary written in C.

GoAccess can be used in two different modes.

![GoAccess Terminal](/assets/goaccess/goaccess-dash-term.png)
<center><i>Running in a terminal</i></center>

![GoAccess HTML](/assets/goaccess/goaccess-dash-html.png)
<center><i>Running in a browser</i></center>

I, however, need this to run in a browser, so the second option is the way to
go. The idea is to periodically run a cron job that exports the report into a
folder, which is then served by Nginx behind Basic authentication.

## Getting Nginx ready

I chose Ubuntu on [DigitalOcean](https://www.digitalocean.com/). First I
installed [Nginx](https://nginx.org/en/), the
[Let's Encrypt](https://letsencrypt.org/getting-started/) certbot and all the
necessary dependencies.

```sh
# log in as root user
sudo su -

# first let's update the system
apt update && apt upgrade -y

# let's install Nginx, certbot and htpasswd (part of apache2-utils)
apt install nginx certbot python3-certbot-nginx apache2-utils
```

After all this is installed, we can create a new configuration for the
statistics site. Stats will be available at `stats.domain.com`.

```sh
# creates directory where html will be hosted
mkdir -p /var/www/html/stats.domain.com

cp /etc/nginx/sites-available/default /etc/nginx/sites-available/stats.domain.com
nano /etc/nginx/sites-available/stats.domain.com

# enable the site (Ubuntu's Nginx only loads configs linked into sites-enabled)
ln -s /etc/nginx/sites-available/stats.domain.com /etc/nginx/sites-enabled/
```

```nginx
server {
    root /var/www/html/stats.domain.com;
    server_name stats.domain.com;

    index index.html;
    location / {
        try_files $uri $uri/ =404;
    }
}
```

Now we check whether the configuration is OK with `nginx -t`. If it is, we
can restart Nginx with `service nginx restart`.

After that, add an A record for this domain that points to the IP address of
the droplet.

Before enabling SSL, test whether the DNS record has propagated with
`curl stats.domain.com`.

Now it's time to provision a TLS certificate. To do this, run
`certbot --nginx`. Follow the wizard, and when you are asked about redirection,
choose 2 (always redirect to HTTPS).

When this is done, you can visit https://stats.domain.com and you should get a
404 Not Found error, which is expected at this point.

## Getting GoAccess ready

If you are using a Debian-like system, GoAccess should be available in the
repository. Otherwise refer to the official website.

```sh
apt install goaccess
```

To enable geolocation, we also need one additional thing: a GeoLite2 City
database.

```sh
cd /var/www/html/stats.domain.com
wget https://github.com/P3TERX/GeoLite.mmdb/raw/download/GeoLite2-City.mmdb
```

Now we create a shell script that will be executed every 10 minutes.

```sh
nano /var/www/html/stats.domain.com/generate-stats.sh
```

Contents of this file should look like this.

```sh
#!/bin/sh

# merge current, rotated and gzipped access logs into one temporary file
zcat -f /var/log/nginx/access.log* > /var/log/nginx/access-all.log

# generate the HTML report into the directory served by Nginx
goaccess \
    --log-file=/var/log/nginx/access-all.log \
    --log-format=COMBINED \
    --exclude-ip=0.0.0.0 \
    --geoip-database=/var/www/html/stats.domain.com/GeoLite2-City.mmdb \
    --ignore-crawlers \
    --real-os \
    --output=/var/www/html/stats.domain.com/index.html

# remove the temporary merged log
rm /var/log/nginx/access-all.log
```

Because log rotation splits the Nginx access logs into multiple files over
time, some of them gzipped, we use [`zcat`](https://linux.die.net/man/1/zcat)
with `-f` to read compressed and uncompressed files alike and merge them into a
single file. After this file has been used, we delete it.
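
To see the merge step in isolation, here is a minimal sketch that runs the same `zcat -f` call against a throwaway directory instead of `/var/log/nginx`. The file names mimic rotated logs and the log lines are made-up placeholders; it assumes GNU `gzip`/`zcat` as shipped on Ubuntu.

```sh
#!/bin/sh
# Sketch: merge plain and gzipped logs the way generate-stats.sh does.
# Paths and log contents are hypothetical and exist only for this demo.
set -eu

demo_dir=$(mktemp -d)

# two uncompressed logs and one gzipped log, as left behind by log rotation
printf 'newest line\n' > "$demo_dir/access.log"
printf 'older line\n' > "$demo_dir/access.log.1"
printf 'oldest line\n' | gzip -c > "$demo_dir/access.log.2.gz"

# -f lets zcat pass uncompressed files through unchanged, like cat
zcat -f "$demo_dir"/access.log* > "$demo_dir/access-all.log"

# count the merged lines; one line per input file is expected
merged_lines=$(wc -l < "$demo_dir/access-all.log" | tr -d ' ')
echo "merged lines: $merged_lines"

rm -r "$demo_dir"
```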

If you want to exclude hits from your home IP, look at the `--exclude-ip` option
in the script and replace `0.0.0.0` with your own home IP address. You can find
your home IP by executing `curl ifconfig.me` from your local machine and NOT
from the droplet.
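
Before settling on a value for `--exclude-ip`, it can help to confirm that the address really appears in the log. The following is only a sketch on made-up data: the sample lines and the address `203.0.113.7` (from a documentation-only range) are assumptions, and on the server you would point `awk` at the real access log instead.

```sh
#!/bin/sh
# Sketch: count requests from one client IP in a COMBINED-format log.
# The sample log lines and the IP address are hypothetical.
set -eu

sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
203.0.113.7 - - [25/Jan/2021:12:00:01 +0200] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
198.51.100.23 - - [25/Jan/2021:12:00:05 +0200] "GET /posts/ HTTP/1.1" 200 1024 "-" "Mozilla/5.0"
203.0.113.7 - - [25/Jan/2021:12:01:10 +0200] "GET /posts/ HTTP/1.1" 200 1024 "-" "Mozilla/5.0"
EOF

home_ip="203.0.113.7"

# the client IP is the first whitespace-separated field of every line
home_hits=$(awk -v ip="$home_ip" '$1 == ip { n++ } END { print n + 0 }' "$sample_log")
echo "requests from $home_ip: $home_hits"

rm "$sample_log"
```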

Test the script by executing
`sh /var/www/html/stats.domain.com/generate-stats.sh` and then checking
`https://stats.domain.com`. If you can see stats instead of a 404, then you are
set.

It's time to add this script to cron with `crontab -e`.

```sh
*/10 * * * * sh /var/www/html/stats.domain.com/generate-stats.sh
```
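
If the `*/10` notation is unfamiliar: in the minute field it is a step value, so the job fires at every minute of the hour that is a multiple of 10. The sketch below only illustrates that rule in plain shell; it does not talk to cron, and the variable names are made up.

```sh
#!/bin/sh
# Sketch: list the minutes of each hour matched by the cron field "*/10".
step=10
fire_minutes=""
minute=0

while [ "$minute" -lt 60 ]; do
    # "*/N" in the minute field matches the minutes 0-59 divisible by N
    if [ $((minute % step)) -eq 0 ]; then
        fire_minutes="$fire_minutes $minute"
    fi
    minute=$((minute + 1))
done

echo "runs at minutes:$fire_minutes"
```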

## Securing with Basic authentication

You probably don't want the stats to be publicly available, so we should create
a user and a password for Basic authentication.

First we create a password for the user `stats` with
`htpasswd -c /etc/nginx/.htpasswd stats`.

Now we update the config file with
`nano /etc/nginx/sites-available/stats.domain.com`. You will probably notice
that the file looks a bit different from before. This is because `certbot`
added additional rules for SSL.

Add the `auth_basic` and `auth_basic_user_file` lines to the `location` block,
which should now look like this.

```nginx
location / {
    try_files $uri $uri/ =404;
    auth_basic "Private Property";
    auth_basic_user_file /etc/nginx/.htpasswd;
}
```

Test whether the config is still OK with `nginx -t`, and if it is, restart Nginx
with `service nginx restart`.

If you now visit `https://stats.domain.com`, you should be prompted for a
username and password. If not, try reopening your browser.
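
It is worth knowing what the browser sends after you fill in that prompt. With Basic authentication the credentials are only base64-encoded, not encrypted, which is why serving the stats over HTTPS matters. A minimal sketch, assuming the user `stats` from above and a made-up placeholder password `secret`:

```sh
#!/bin/sh
# Sketch: build the Authorization header a client sends for Basic auth.
# The password "secret" is a hypothetical placeholder, not a recommendation.
user="stats"
password="secret"

# Basic auth is base64("user:password"); anyone who sees it can decode it
token=$(printf '%s:%s' "$user" "$password" | base64)
auth_header="Authorization: Basic $token"

echo "$auth_header"
```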

That is all. You now have analytics for your server that get refreshed every 10
minutes.