---
title: Using GoAccess with Nginx to replace Google Analytics
permalink: /using-goaccess-with-nginx-to-replace-google-analytics.html
date: 2021-01-25T12:00:00+02:00
layout: post
type: post
draft: false
---

## Introduction

I know! You cannot simply replace Google Analytics by parsing access logs and
displaying a couple of charts. But to be honest, I never used Google Analytics
to its fullest extent and was usually only interested in page hits and in which
pages were visited most often.

I recently moved my blog from Firebase to a VPS and also decided to remove the
Google Analytics tracking code from the site, since it's quite malicious: it
tracks users across other sites and builds a profile of each user, and I've had
it. But I still need some insight into what is happening on the server and
which content is being read the most.

I have looked at many existing solutions like:

- [Umami](https://umami.is/)
- [Freshlytics](https://github.com/sheshbabu/freshlytics)
- [Matomo](https://matomo.org/)

But the more I looked at them, the more I noticed that I would be replacing one
evil with another. Don't get me wrong. Some of these solutions are absolutely
fantastic, but they would require installing databases and something like PHP
or Node, and I was not ready to put those things on my fresh server. Installing
Docker is also out of the question.

## Opting for log parsing

So, I defaulted to parsing the already existing logs and generating HTML
reports from this data.

I found this amazing software, [GoAccess](https://goaccess.io/), which provides
all the functionality I need, and it's a single binary written in C.

GoAccess can be used in two different modes.

![GoAccess Terminal](/assets/posts/goaccess/goaccess-dash-term.png){:loading="lazy"}

*Running in a terminal*

![GoAccess HTML](/assets/posts/goaccess/goaccess-dash-html.png){:loading="lazy"}

*Running in a browser*

I, however, need this to run in a browser, so the second option is the way to
go. The idea is to periodically run a cron job that exports the report into a
folder that is then served by Nginx behind Basic authentication.

## Getting Nginx ready

I chose Ubuntu on [DigitalOcean](https://www.digitalocean.com/). First I
installed [Nginx](https://nginx.org/en/), the
[Let's Encrypt](https://letsencrypt.org/getting-started/) certbot, and all the
necessary dependencies.

```sh
# log in as root user
sudo su -

# first let's update the system
apt update && apt upgrade -y

# let's install Nginx, certbot and htpasswd (part of apache2-utils)
apt install nginx certbot python3-certbot-nginx apache2-utils
```

After all this is installed, we can create a new configuration for the
statistics site. Stats will be available at `stats.domain.com`.

```sh
# creates directory where html will be hosted
mkdir -p /var/www/html/stats.domain.com

cp /etc/nginx/sites-available/default /etc/nginx/sites-available/stats.domain.com
nano /etc/nginx/sites-available/stats.domain.com

# enable the site (on Ubuntu, Nginx only loads configs linked in sites-enabled)
ln -s /etc/nginx/sites-available/stats.domain.com /etc/nginx/sites-enabled/
```

```nginx
server {
    root /var/www/html/stats.domain.com;
    server_name stats.domain.com;

    index index.html;
    location / {
        try_files $uri $uri/ =404;
    }
}
```

Now we check if the configuration is OK. We can do this with `nginx -t`. If all
is OK, we can restart Nginx with `service nginx restart`.

After that, you should add an A record for this domain that points to the IP of
the droplet.

Before enabling SSL, you should test whether the DNS records have propagated
with `curl stats.domain.com`.

Now it's time to provision a TLS certificate. To do this, run
`certbot --nginx`. Follow the wizard, and when you are asked about redirection,
choose 2 (always redirect to HTTPS).

When this is done, you can visit https://stats.domain.com, and you should get a
404 Not Found error, which is expected at this point.

## Getting GoAccess ready

If you are using a Debian-like system, GoAccess should be available in the
repository. Otherwise, refer to the official website.

```sh
apt install goaccess
```

To enable geolocation, we also need one additional thing: a GeoIP database.

```sh
cd /var/www/html/stats.domain.com
wget https://github.com/P3TERX/GeoLite.mmdb/raw/download/GeoLite2-City.mmdb
```

Now we create a shell script that will be executed every 10 minutes.

```sh
nano /var/www/html/stats.domain.com/generate-stats.sh
```

The contents of this file should look like this.

```sh
#!/bin/sh

# merge the current and rotated (possibly gzipped) access logs into one file
zcat -f /var/log/nginx/access.log* > /var/log/nginx/access-all.log

# generate the HTML report into the folder served by Nginx
goaccess \
    --log-file=/var/log/nginx/access-all.log \
    --log-format=COMBINED \
    --exclude-ip=0.0.0.0 \
    --geoip-database=/var/www/html/stats.domain.com/GeoLite2-City.mmdb \
    --ignore-crawlers \
    --real-os \
    --output=/var/www/html/stats.domain.com/index.html

# remove the temporary merged log
rm /var/log/nginx/access-all.log
```

Because the access logs get rotated, after a while there are multiple access
log files, some of them gzipped. We use
[`zcat`](https://linux.die.net/man/1/zcat) with `-f` to read both the gzipped
and the plain files and create a single file that contains all the access logs.
After this file has been used, we delete it.

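To see what `-f` does, here is a minimal sketch that can be run anywhere
without touching `/var/log`. It fakes a rotated log directory in a temporary
folder and merges plain and gzipped files in one go. The file contents and the
`demo_dir` and `merged` names are made up for illustration.

```sh
#!/bin/sh
set -eu

# Work in a throwaway directory so nothing under /var/log is touched.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT

# Simulate a rotated log directory: a live log, a plain rotated log
# and a gzipped rotated log.
printf 'current\n' > "$demo_dir/access.log"
printf 'rotated-1\n' > "$demo_dir/access.log.1"
printf 'rotated-2\n' > "$demo_dir/access.log.2"
gzip "$demo_dir/access.log.2"

# -f makes zcat pass uncompressed files through unchanged
# instead of failing on them.
merged=$(zcat -f "$demo_dir"/access.log*)
printf '%s\n' "$merged"
```

All three lines end up in the output, regardless of whether they came from a
plain or a gzipped file.
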
If you want to exclude visits from your home IP, look at the `--exclude-ip`
option in the script and replace `0.0.0.0` with your own home IP address. You
can find your home IP by executing `curl ifconfig.me` from your local machine
and NOT from the droplet.

Test the script by executing `sh
/var/www/html/stats.domain.com/generate-stats.sh` and then checking
`https://stats.domain.com`. If you can see stats instead of a 404, then you are
all set.

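If you want a quick sanity check from the shell as well, a rough approximation
of the "most visited pages" panel can be produced with `awk`. This is only a
sketch: it assumes Nginx's default `combined` log format, where the request
path is the seventh whitespace-separated field. The sample log lines and the
`top_pages` function name are made up for illustration, and this is not how
GoAccess works internally.

```sh
#!/bin/sh

# Count hits per request path from combined-format log lines read on stdin,
# most requested first. Field 7 is the path in: "GET /path HTTP/1.1".
top_pages() {
    awk '{ hits[$7]++ } END { for (p in hits) print hits[p], p }' | sort -rn
}

# Made-up sample lines (documentation IP range), not real traffic.
sample_log='203.0.113.5 - - [25/Jan/2021:12:00:00 +0200] "GET /index.html HTTP/1.1" 200 612 "-" "Mozilla/5.0"
203.0.113.7 - - [25/Jan/2021:12:00:05 +0200] "GET /index.html HTTP/1.1" 200 612 "-" "Mozilla/5.0"
203.0.113.5 - - [25/Jan/2021:12:00:09 +0200] "GET /about.html HTTP/1.1" 200 431 "-" "Mozilla/5.0"'

printf '%s\n' "$sample_log" | top_pages
```

On the server, the same function could be fed with
`zcat -f /var/log/nginx/access.log* | top_pages`.
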
It's time to add this script to cron with `crontab -e`.

```sh
*/10 * * * * sh /var/www/html/stats.domain.com/generate-stats.sh
```

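`crontab -e` opens an editor, which is awkward when you provision a server from
a script. As a sketch of a non-interactive alternative, the hypothetical helper
`add_cron_line` below (not part of cron, just a few lines of shell) appends the
entry to a crontab only if it is not already there, so running it twice does
not schedule the job twice.

```sh
#!/bin/sh

ENTRY='*/10 * * * * sh /var/www/html/stats.domain.com/generate-stats.sh'

# Print the crontab read from stdin, appending "$1" only if that exact
# line is not already present.
add_cron_line() {
    entry=$1
    existing=$(cat)
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi
    # -x matches whole lines only, -F treats the entry as a fixed string
    if ! printf '%s\n' "$existing" | grep -qxF -- "$entry"; then
        printf '%s\n' "$entry"
    fi
}

# Demo on an empty crontab; nothing is installed here.
printf '' | add_cron_line "$ENTRY"
```

On the droplet it could be wired up as
`crontab -l 2>/dev/null | add_cron_line "$ENTRY" | crontab -`.
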
## Securing with Basic authentication

You probably don't want the stats to be publicly available, so we should create
a user and a password for Basic authentication.

First we create a password for the user `stats` with
`htpasswd -c /etc/nginx/.htpasswd stats`.

Now we update the config file with `nano
/etc/nginx/sites-available/stats.domain.com`. You will probably notice that the
file looks a bit different from before. This is because `certbot` added
additional rules for SSL.

Add the `auth_basic` and `auth_basic_user_file` lines to the `location` block,
so that this portion of the config file looks like this:

```nginx
location / {
    try_files $uri $uri/ =404;
    auth_basic "Private Property";
    auth_basic_user_file /etc/nginx/.htpasswd;
}
```

Test whether the config is still OK with `nginx -t`, and if it is, restart
Nginx with `service nginx restart`.

If you now visit `https://stats.domain.com`, you should be prompted for a
username and password. If not, try reopening your browser.

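It is worth knowing why TLS was set up before adding the password. With Basic
authentication the client sends the username and password on every request,
merely base64-encoded in an `Authorization` header. The sketch below shows this
with a made-up password (`secret`); the `basic_auth_header` function name is
hypothetical.

```sh
#!/bin/sh

# Build the header a client sends for HTTP Basic authentication:
# "Basic " followed by base64("user:password").
basic_auth_header() {
    printf 'Authorization: Basic %s\n' "$(printf '%s:%s' "$1" "$2" | base64)"
}

header=$(basic_auth_header stats secret)
printf '%s\n' "$header"
# prints: Authorization: Basic c3RhdHM6c2VjcmV0

# Anyone who sees the header can reverse the encoding, no key needed.
printf '%s\n' "${header#Authorization: Basic }" | base64 -d
```

To check the protected site from the command line, something like
`curl -I -u stats https://stats.domain.com` should prompt for the password and
return the response headers.
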
That is all. You now have analytics for your server that gets refreshed every 10
minutes.