---
title: Using GoAccess with Nginx to replace Google Analytics
permalink: /using-goaccess-with-nginx-to-replace-google-analytics.html
date: 2021-01-25T12:00:00+02:00
layout: post
type: post
draft: false
---
## Introduction
I know! You cannot simply replace Google Analytics with parsed access logs and a
couple of charts. But to be honest, I never used Google Analytics to its full
extent anyway; I was usually only interested in page hits and which pages were
visited most often.
I recently moved my blog from Firebase to a VPS and decided to remove the Google
Analytics tracking code from the site. It's quite invasive: it tracks users
across other sites and builds a profile of each visitor, and I've had enough of
it. But I still need some insight into what is happening on the server and which
content is being read the most.
I have looked at many existing solutions like:
- [Umami](https://umami.is/)
- [Freshlytics](https://github.com/sheshbabu/freshlytics)
- [Matomo](https://matomo.org/)
But the more I looked at them, the more I felt I was replacing one evil with
another. Don't get me wrong: some of these solutions are absolutely fantastic,
but they would require installing a database and something like PHP or Node, and
I was not ready to put those on my fresh server. Installing Docker was also out
of the question.
## Opting for log parsing
So, I defaulted to parsing already existing logs and generating HTML reports
from this data.
I found this amazing piece of software, [GoAccess](https://goaccess.io/), which
provides all the functionality I need, and it's a single binary written in C.
GoAccess can be used in two different modes.
*Running in a terminal*

*Running in a browser*
I, however, need this to run in a browser, so the second option is the way to
go. The idea is to run a cron job periodically and export the report into a
folder that is then served by Nginx behind Basic authentication.
## Getting Nginx ready
I chose Ubuntu on [DigitalOcean](https://www.digitalocean.com/). First I
installed [Nginx](https://nginx.org/en/), the
[Letsencrypt](https://letsencrypt.org/getting-started/) certbot, and all the
necessary dependencies.
```sh
# log in as root user
sudo su -
# first let's update the system
apt update && apt upgrade -y
# let's install
apt install nginx certbot python3-certbot-nginx apache2-utils
```
After everything is installed we can create a new configuration for the
statistics site. The stats will be available at `stats.domain.com`.
```sh
# create the directory where the HTML will be hosted
mkdir -p /var/www/html/stats.domain.com
cp /etc/nginx/sites-available/default /etc/nginx/sites-available/stats.domain.com
nano /etc/nginx/sites-available/stats.domain.com
```
```nginx
server {
    root /var/www/html/stats.domain.com;
    server_name stats.domain.com;
    index index.html;

    location / {
        try_files $uri $uri/ =404;
    }
}
```
Now we check whether the configuration is valid with `nginx -t`. If it is, we
can restart Nginx with `service nginx restart`.
After that, add an A record for this subdomain that points to the IP of the
droplet.
Before enabling SSL, test whether the DNS record has propagated with `curl
stats.domain.com`.
Now it's time to provision a TLS certificate. To do this, run `certbot --nginx`,
follow the wizard, and when asked about redirection choose option 2 (always
redirect to HTTPS).
When this is done you can visit https://stats.domain.com; you should get a 404
Not Found error, which is correct at this point.
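You can also verify this from the command line; a quick check (assuming the example domain from above):

```sh
# request only the headers; a 404 from Nginx over HTTPS means
# the certificate and the server block are both working
curl -I https://stats.domain.com
```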
## Getting GoAccess ready
If you are using a Debian-like system, GoAccess should be available in the
repository; otherwise refer to the official website.
```sh
apt install goaccess
```
To enable geolocation we also need one additional thing: a GeoIP database.
```sh
cd /var/www/html/stats.domain.com
wget https://github.com/P3TERX/GeoLite.mmdb/raw/download/GeoLite2-City.mmdb
```
Now we create a shell script that will be executed every 10 minutes.
```sh
nano /var/www/html/stats.domain.com/generate-stats.sh
```
Contents of this file should look like this.
```sh
#!/bin/sh
zcat -f /var/log/nginx/access.log* > /var/log/nginx/access-all.log
goaccess \
  --log-file=/var/log/nginx/access-all.log \
  --log-format=COMBINED \
  --exclude-ip=0.0.0.0 \
  --geoip-database=/var/www/html/stats.domain.com/GeoLite2-City.mmdb \
  --ignore-crawlers \
  --real-os \
  --output=/var/www/html/stats.domain.com/index.html
rm /var/log/nginx/access-all.log
```
Because Nginx rotates its access logs after a while, there end up being multiple
log files, some of them gzipped. We use [`zcat`](https://linux.die.net/man/1/zcat)
with `-f` to extract both plain and gzipped contents into a single file
containing all the access logs. After the report is generated, we delete that
file.
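As a quick illustration of what `zcat -f` does here, a self-contained sketch using a temporary directory instead of the real log files:

```sh
# create a fake "current" log and a gzipped "rotated" log
tmp=$(mktemp -d)
printf 'current request\n' > "$tmp/access.log"
printf 'rotated request\n' > "$tmp/access.log.2"
gzip "$tmp/access.log.2"

# zcat -f reads plain files as-is and decompresses gzipped ones,
# producing one combined log
zcat -f "$tmp"/access.log* > "$tmp/access-all.log"
cat "$tmp/access-all.log"

rm -r "$tmp"
```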
If you want to exclude hits from your home IP, look at the `--exclude-ip` option
in the script and replace `0.0.0.0` with your own home IP address. You can find
your home IP by executing `curl ifconfig.me` from your local machine, NOT from
the droplet.
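The option can be repeated, and GoAccess also accepts IP ranges, so you can exclude several addresses at once (the addresses below are placeholders):

```sh
goaccess \
  --log-file=/var/log/nginx/access-all.log \
  --log-format=COMBINED \
  --exclude-ip=203.0.113.7 \
  --exclude-ip=192.168.0.1-192.168.0.255 \
  --output=/var/www/html/stats.domain.com/index.html
```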
Test the script by executing `sh
/var/www/html/stats.domain.com/generate-stats.sh` and then checking
`https://stats.domain.com`. If you see stats instead of a 404, you are all set.
It's time to add this script to cron with `crontab -e`.
```sh
*/10 * * * * sh /var/www/html/stats.domain.com/generate-stats.sh
```
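If the report ever stops updating, it helps to see why; a variant of the entry that captures the script's output (the log path here is just my own choice):

```sh
*/10 * * * * sh /var/www/html/stats.domain.com/generate-stats.sh >> /var/log/generate-stats.log 2>&1
```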
## Securing with Basic authentication
You probably don't want stats to be publicly available, so we should create a
user and a password for Basic authentication.
First, we create a password for the user `stats` with `htpasswd -c /etc/nginx/.htpasswd stats`.
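Note that `-c` creates the password file from scratch. If you later want to add a second user, leave the flag off so the existing entries are kept (the username here is just an example):

```sh
# append a new user to the existing password file; you will be
# prompted for the password interactively
htpasswd /etc/nginx/.htpasswd another-user
```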
Now we update the config file with `nano
/etc/nginx/sites-available/stats.domain.com`. You will probably notice that the
file looks a bit different from before; that is because `certbot` added
additional rules for SSL.
Add the `auth_basic` and `auth_basic_user_file` lines so that the `location`
portion of the config file looks like this:
```nginx
location / {
    try_files $uri $uri/ =404;

    auth_basic "Private Property";
    auth_basic_user_file /etc/nginx/.htpasswd;
}
```
Test whether the config is still valid with `nginx -t`, and if it is, restart
Nginx with `service nginx restart`.
If you now visit `https://stats.domain.com` you should be prompted for a
username and password. If not, try reopening your browser.
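You can also test the protection non-interactively with `curl` (replace the credentials with your own):

```sh
# without credentials Nginx should answer 401 Unauthorized
curl -I https://stats.domain.com

# with valid credentials you should get the report (200 OK)
curl -I -u stats:your-password https://stats.domain.com
```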
That is all. You now have analytics for your server that gets refreshed every 10
minutes.