---
title: Using GoAccess with Nginx to replace Google Analytics
url: using-goaccess-with-nginx-to-replace-google-analytics.html
date: 2021-01-25T12:00:00+02:00
type: post
draft: false
---

## Introduction

I know! You cannot simply replace Google Analytics by parsing access logs and
displaying a couple of charts. But to be honest, I never used Google Analytics
to its fullest extent and was usually only interested in seeing page hits and
which pages were visited most often.

I recently moved my blog from Firebase to a VPS and also decided to remove the
Google Analytics tracking code from the site, since it's quite malicious: it
tracks users across other pages and builds a profile of each user, and I've had
it. But I still need some insight into what is happening on the server, which
content is being read the most, etc.

I have looked at many existing solutions like:

- [Umami](https://umami.is/)
- [Freshlytics](https://github.com/sheshbabu/freshlytics)
- [Matomo](https://matomo.org/)

But the more I looked at them, the more I noticed that I would be replacing one
evil with another. Don't get me wrong. Some of these solutions are absolutely
fantastic, but they would require installing databases and something like PHP
or Node, and I was not ready to put those things on my fresh server. Having
Docker installed is also out of the question.

## Opting for log parsing

So, I defaulted to parsing already existing logs and generating HTML reports
from this data.

I found this amazing software, [GoAccess](https://goaccess.io/), which provides
all the functionality I need, and it's a single binary written in C.

GoAccess can be used in two different modes.

![GoAccess Terminal](/assets/goaccess/goaccess-dash-term.png)
<center><i>Running in a terminal</i></center>

![GoAccess HTML](/assets/goaccess/goaccess-dash-html.png)
<center><i>Running in a browser</i></center>

I, however, need this to run in a browser, so the second option is the way to
go. The idea is to run a cron job periodically and export the report into a
folder that then gets served by Nginx behind Basic authentication.

## Getting Nginx ready

I chose Ubuntu on [DigitalOcean](https://www.digitalocean.com/). First I
installed [Nginx](https://nginx.org/en/), the
[Let's Encrypt](https://letsencrypt.org/getting-started/) certbot, and all the
necessary dependencies.

```sh
# log in as root user
sudo su -

# first let's update the system
apt update && apt upgrade -y

# let's install
apt install nginx certbot python3-certbot-nginx apache2-utils
```

After all this is installed we can create a new configuration for the
statistics site. Stats will be available at `stats.domain.com`.

```sh
# creates directory where html will be hosted
mkdir -p /var/www/html/stats.domain.com

cp /etc/nginx/sites-available/default /etc/nginx/sites-available/stats.domain.com
nano /etc/nginx/sites-available/stats.domain.com

# enable the site (Ubuntu's Nginx only loads configs from sites-enabled)
ln -s /etc/nginx/sites-available/stats.domain.com /etc/nginx/sites-enabled/
```

```nginx
server {
    root /var/www/html/stats.domain.com;
    server_name stats.domain.com;

    index index.html;
    location / {
        try_files $uri $uri/ =404;
    }
}
```

Now we check if the configuration is ok. We can do this with `nginx -t`. If all
is ok, we can restart Nginx with `service nginx restart`.

After all that you should add an A record for this domain that points to the IP
of the droplet.

Before enabling SSL you should test if the DNS records have propagated with
`curl stats.domain.com`.

Now it's time to provision a TLS certificate. To do this, run
`certbot --nginx`. Follow the wizard, and when you are asked about redirection
choose 2 (always redirect to HTTPS).

When this is done you can visit `https://stats.domain.com`, and you should get
a 404 Not Found error, which is correct at this point.

## Getting GoAccess ready

If you are using a Debian-like system, GoAccess should be available in the
repository. Otherwise refer to the official website.

```sh
apt install goaccess
```

To enable geolocation we also need one additional thing: a GeoIP database.

```sh
cd /var/www/html/stats.domain.com
wget https://github.com/P3TERX/GeoLite.mmdb/raw/download/GeoLite2-City.mmdb
```

Now we create a shell script that will be executed every 10 minutes.

```sh
nano /var/www/html/stats.domain.com/generate-stats.sh
```

The contents of this file should look like this.

```sh
#!/bin/sh

# merge the current log and all rotated logs (plain or gzipped) into one file
zcat -f /var/log/nginx/access.log* > /var/log/nginx/access-all.log

# generate the HTML report from the merged log
goaccess \
    --log-file=/var/log/nginx/access-all.log \
    --log-format=COMBINED \
    --exclude-ip=0.0.0.0 \
    --geoip-database=/var/www/html/stats.domain.com/GeoLite2-City.mmdb \
    --ignore-crawlers \
    --real-os \
    --output=/var/www/html/stats.domain.com/index.html

# remove the temporary merged log
rm /var/log/nginx/access-all.log
```

Because after a while there are multiple access log files, some of them
gzipped, we use [`zcat`](https://linux.die.net/man/1/zcat) with `-f` to
decompress the gzipped ones, pass the plain ones through unchanged, and create
a single file that has all the access logs. After this file is used we delete
it.
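
To see what `zcat -f` does with a mix of plain and compressed files, here is a
small sketch that works only on throwaway files in a temporary directory. The
file names imitate rotated Nginx logs but are made up for the demonstration,
and it assumes the GNU gzip version of `zcat` that ships with Ubuntu.

```sh
#!/bin/sh
# Sketch: zcat -f reads plain and gzipped files alike.
# The files below are fake stand-ins for Nginx logs, created in a temp dir.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# simulate the current log (plain) and an older rotated log (gzipped)
printf 'line from access.log\n' > "$workdir/access.log"
printf 'line from access.log.2\n' | gzip -c > "$workdir/access.log.2.gz"

# -f makes zcat copy non-gzip input unchanged instead of rejecting it
merged=$(zcat -f "$workdir"/access.log*)
printf '%s\n' "$merged"
```

Both lines end up in the output, which is why the script can feed the merged
file straight to GoAccess.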

If you want to exclude visits from your home IP, look at the `--exclude-ip`
option in the script and replace `0.0.0.0` with your own home IP address. You
can find your home IP by executing `curl ifconfig.me` from your local machine
and NOT from the droplet.
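
If you are curious how much of the traffic is your own before excluding it, a
rough sketch like the one below counts requests per client IP. It assumes the
client IP is the first field of each line, as in the COMBINED format used
above. `top_ips` is a hypothetical helper name, not part of GoAccess.

```sh
#!/bin/sh
# Sketch: count requests per client IP from log lines read on stdin.
# Assumes the IP is the first whitespace-separated field (COMBINED format).
# top_ips is a made-up helper; its optional argument limits the output rows.
top_ips() {
    awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' |
        sort -rn |
        head -n "${1:-5}"
}

# Example usage on the server (not run here):
# zcat -f /var/log/nginx/access.log* | top_ips 10
```

Each output line is a hit count followed by the IP, with the busiest client
first.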

Test the script by executing
`sh /var/www/html/stats.domain.com/generate-stats.sh` and then checking
`https://stats.domain.com`. If you can see stats instead of the 404, then you
are all set.

It's time to add this script to cron with `crontab -e`.

```text
*/10 * * * * sh /var/www/html/stats.domain.com/generate-stats.sh
```
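
If you would rather not open an editor, the same entry can be added from the
command line. The sketch below is one possible way to do it without creating
duplicates when run twice. `add_cron_line` is a hypothetical helper written for
this post, not a standard tool.

```sh
#!/bin/sh
# Sketch: append a cron entry only if it is not already present.
# add_cron_line is a made-up helper: it reads an existing crontab on stdin
# and prints the updated crontab on stdout.
add_cron_line() {
    line=$1
    existing=$(cat)
    # keep whatever was already there
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi
    # -x matches whole lines, -F treats the entry as a fixed string
    if ! printf '%s\n' "$existing" | grep -qxF -- "$line"; then
        printf '%s\n' "$line"
    fi
}

job='*/10 * * * * sh /var/www/html/stats.domain.com/generate-stats.sh'

# Usage (commented out because it modifies the current user's crontab):
# crontab -l 2>/dev/null | add_cron_line "$job" | crontab -
```

Afterwards `crontab -l` should list the entry exactly once.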

## Securing with Basic authentication

You probably don't want the stats to be publicly available, so we should create
a user and a password for Basic authentication.

First we create a password for the user `stats` with
`htpasswd -c /etc/nginx/.htpasswd stats`.

Now we update the config file with
`nano /etc/nginx/sites-available/stats.domain.com`. You will probably notice
that the file looks a bit different from before. This is because `certbot`
added additional rules for SSL.

The `location` portion of the config file should now look like the snippet
below, with the `auth_basic` and `auth_basic_user_file` lines added.

```nginx
location / {
    try_files $uri $uri/ =404;
    auth_basic "Private Property";
    auth_basic_user_file /etc/nginx/.htpasswd;
}
```

Test if the config is still ok with `nginx -t`, and if it is you can restart
Nginx with `service nginx restart`.

If you now visit `https://stats.domain.com` you should be prompted for a
username and password. If not, try reopening your browser.
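
Browsers cache credentials, so a quick check with `curl` can be more reliable.
The sketch below assumes `curl` is installed on your local machine.
`http_status` and `auth_verdict` are hypothetical helper names. An
unauthenticated request to a protected site should come back as `401`.

```sh
#!/bin/sh
# Sketch: check from the command line whether Basic authentication is enforced.
# http_status and auth_verdict are made-up helpers for this post.

# print only the HTTP status code; extra arguments are passed on to curl
http_status() {
    url=$1
    shift
    curl -s -o /dev/null -w '%{http_code}' "$@" "$url"
}

# interpret the status code of a request sent without credentials
auth_verdict() {
    case $1 in
        401) echo "protected" ;;
        200) echo "publicly readable" ;;
        *)   echo "unexpected status $1" ;;
    esac
}

# Usage, run manually:
# auth_verdict "$(http_status https://stats.domain.com)"
# http_status https://stats.domain.com -u stats   # curl asks for the password
```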

That is all. You now have analytics for your server that gets refreshed every 10
minutes.