Diffstat (limited to 'content/posts/2021-01-25-goaccess.md')
-rw-r--r-- content/posts/2021-01-25-goaccess.md 204
1 files changed, 0 insertions, 204 deletions
diff --git a/content/posts/2021-01-25-goaccess.md b/content/posts/2021-01-25-goaccess.md
deleted file mode 100644
index 84ea3cd..0000000
--- a/content/posts/2021-01-25-goaccess.md
+++ /dev/null
@@ -1,204 +0,0 @@
---
title: Using GoAccess with Nginx to replace Google Analytics
url: using-goaccess-with-nginx-to-replace-google-analytics.html
date: 2021-01-25T12:00:00+02:00
type: post
draft: false
---

## Introduction

I know! You cannot simply replace Google Analytics with parsing access logs and
displaying a couple of charts. But to be honest, I never used Google Analytics
to its fullest extent and was usually only interested in seeing page hits and
which pages were visited most often.

I recently moved my blog from Firebase to a VPS and also decided to remove the
Google Analytics tracking code from the site, since it's quite malicious: it
tracks users across other sites as well and builds a profile of each user, and
I've had it. But I still need some insight into what is happening on the server,
which content is being read the most, and so on.

I have looked at many existing solutions like:

- [Umami](https://umami.is/)
- [Freshlytics](https://github.com/sheshbabu/freshlytics)
- [Matomo](https://matomo.org/)

But the more I looked at them, the more I noticed that I would be replacing one
evil with another. Don't get me wrong. Some of these solutions are absolutely
fantastic, but they would require installing databases and something like PHP
or Node, and I was not ready to put those things on my fresh server. Installing
Docker was also out of the question.

## Opting for log parsing

So, I defaulted to parsing already existing logs and generating HTML reports
from this data.

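To get a feel for what such parsing involves, here is a minimal sketch that
counts the most requested paths straight from an access log. It assumes the
default Nginx `combined` log format, where the request path is the seventh
whitespace-separated field; `top_pages` is just a name made up for this example.

```sh
# top_pages [N]: read access log lines in the combined format on stdin and
# print the N most requested paths (default 10) as "<hits> <path>".
# Assumption: the request path is field 7, as in Nginx's default log format.
top_pages() {
    awk '{ hits[$7]++ } END { for (p in hits) print hits[p], p }' |
        sort -rn |
        head -n "${1:-10}"
}

# Example: zcat -f /var/log/nginx/access.log* | top_pages 5
```
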
I found this amazing software, [GoAccess](https://goaccess.io/), which provides
all the functionality I need, and it's a single binary written in C.

GoAccess can be used in two different modes.

![GoAccess Terminal](/posts/goaccess/goaccess-dash-term.png)

*Running in a terminal*

![GoAccess HTML](/posts/goaccess/goaccess-dash-html.png)

*Running in a browser*

I, however, need this to run in a browser, so the second option is the way to
go. The idea is to periodically run a cron job that exports this report into a
folder that is then served by Nginx behind Basic authentication.

## Getting Nginx ready

I chose Ubuntu on [DigitalOcean](https://www.digitalocean.com/). First I
installed [Nginx](https://nginx.org/en/), the
[Let's Encrypt](https://letsencrypt.org/getting-started/) certbot and all the
necessary dependencies.

```sh
# log in as root user
sudo su -

# first let's update the system
apt update && apt upgrade -y

# let's install Nginx, certbot with its Nginx plugin, and apache2-utils
# (apache2-utils provides the htpasswd tool used later for Basic authentication)
apt install nginx certbot python3-certbot-nginx apache2-utils
```

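After all this is installed, we can create a new configuration for the
statistics site. Stats will be available at `stats.domain.com`.

```sh
# creates directory where html will be hosted
mkdir -p /var/www/html/stats.domain.com

cp /etc/nginx/sites-available/default /etc/nginx/sites-available/stats.domain.com
nano /etc/nginx/sites-available/stats.domain.com

# enable the site (on Debian/Ubuntu, Nginx only loads configs linked in sites-enabled)
ln -s /etc/nginx/sites-available/stats.domain.com /etc/nginx/sites-enabled/
```

Trim the copied file down to a minimal server block like this:
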
```nginx
server {
    root /var/www/html/stats.domain.com;
    server_name stats.domain.com;

    index index.html;
    location / {
        try_files $uri $uri/ =404;
    }
}
```

Now we check whether the configuration is OK. We can do this with `nginx -t`.
If all is OK, we can restart Nginx with `service nginx restart`.

After all that, you should add an A record for this domain that points to the
IP of the droplet.

Before enabling SSL, you should test whether the DNS records have propagated
with `curl stats.domain.com`.

Now it's time to provision a TLS certificate. To do this, run
`certbot --nginx`. Follow the wizard, and when you are asked about redirection,
choose option 2 (always redirect to HTTPS).

When this is done, you can visit `https://stats.domain.com`. You should get an
error page such as 404 Not Found (or 403 Forbidden, because the directory does
not contain an `index.html` yet), which is expected at this point.

## Getting GoAccess ready

If you are using a Debian-like system, GoAccess should be available in the
repository. Otherwise, refer to the official website.

```sh
apt install goaccess
```

To enable geolocation, we also need one additional thing: a GeoLite2 City
database file.

```sh
cd /var/www/html/stats.domain.com
wget https://github.com/P3TERX/GeoLite.mmdb/raw/download/GeoLite2-City.mmdb
```

Now we create a shell script that will be executed every 10 minutes.

```sh
nano /var/www/html/stats.domain.com/generate-stats.sh
```

Contents of this file should look like this.

```sh
#!/bin/sh

# merge the current and all rotated (possibly gzipped) access logs into one file
zcat -f /var/log/nginx/access.log* > /var/log/nginx/access-all.log

# generate the HTML report; replace 0.0.0.0 with an IP you want to exclude
goaccess \
    --log-file=/var/log/nginx/access-all.log \
    --log-format=COMBINED \
    --exclude-ip=0.0.0.0 \
    --geoip-database=/var/www/html/stats.domain.com/GeoLite2-City.mmdb \
    --ignore-crawlers \
    --real-os \
    --output=/var/www/html/stats.domain.com/index.html

# remove the temporary merged log
rm /var/log/nginx/access-all.log
```

Because logs are rotated, after a while Nginx ends up with multiple access log
files, some of them gzipped. We use [`zcat`](https://linux.die.net/man/1/zcat)
with the `-f` flag to read both the plain and the gzipped files and write them
into one file that contains all the access logs. After this file has been used,
we delete it.

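The `zcat -f` behaviour is easy to try out on throwaway files. The sketch below
wraps the same command in a hypothetical `merge_logs` helper (not part of
GoAccess or Nginx) that takes the log directory and the output file as
arguments. The output name deliberately does not match the `access.log*`
pattern, so the merged file can never be read as one of its own inputs.

```sh
# merge_logs DIR OUT: write the contents of every DIR/access.log* file to OUT.
# zcat -f decompresses gzipped files and passes plain-text files through as-is.
# Hypothetical helper for illustration; OUT must not match DIR/access.log*.
merge_logs() {
    zcat -f "$1"/access.log* > "$2"
}

# Example: merge_logs /var/log/nginx /var/log/nginx/access-all.log
```
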
If you want to exclude visits from your home IP, look at the `--exclude-ip`
option in the script and replace `0.0.0.0` with your own home IP address. You
can find your home IP by executing `curl ifconfig.me` from your local machine
and NOT from the droplet.

Test the script by executing
`sh /var/www/html/stats.domain.com/generate-stats.sh` and then checking
`https://stats.domain.com`. If you see stats instead of an error page, then
you are set.

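The same check can be done from the shell, which is handy for confirming later
that the report keeps being regenerated. A minimal sketch, assuming a `find`
that supports `-mmin` (GNU and BSD `find` do); `report_is_fresh` is a made-up
helper name.

```sh
# report_is_fresh FILE MINUTES: succeed when FILE exists, is non-empty and was
# modified less than MINUTES minutes ago.
# Assumption: find supports the non-POSIX -mmin test.
report_is_fresh() {
    [ -s "$1" ] && [ -n "$(find "$1" -mmin "-$2")" ]
}

# Example:
# report_is_fresh /var/www/html/stats.domain.com/index.html 15 && echo "report is up to date"
```
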
It's time to add this script to cron with `crontab -e`.

```sh
*/10 * * * * sh /var/www/html/stats.domain.com/generate-stats.sh
```

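If you prefer not to edit the crontab by hand, the entry can also be appended
from the shell. The sketch below uses a hypothetical `append_once` filter that
reads an existing crontab on stdin and adds the line only when it is not
already there, so running it twice does not create a duplicate job.

```sh
CRON_LINE='*/10 * * * * sh /var/www/html/stats.domain.com/generate-stats.sh'

# append_once LINE: copy stdin to stdout and append LINE unless an identical
# line is already present. Hypothetical helper, not part of cron.
append_once() {
    awk -v line="$1" '
        { print }
        $0 == line { found = 1 }
        END { if (!found) print line }
    '
}

# Example (as root): crontab -l 2>/dev/null | append_once "$CRON_LINE" | crontab -
```
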
## Securing with Basic authentication

You probably don't want stats to be publicly available, so we should create a
user and a password for Basic authentication.

First we create a password for the user `stats` with
`htpasswd -c /etc/nginx/.htpasswd stats`.

Now we update the config file with
`nano /etc/nginx/sites-available/stats.domain.com`. You probably noticed that
the file looks a bit different from before. This is because `certbot` added
additional rules for SSL.

The location portion of the config file should now look like this. You should
add the `auth_basic` and `auth_basic_user_file` lines to the file.

```nginx
location / {
    try_files $uri $uri/ =404;
    auth_basic "Private Property";
    auth_basic_user_file /etc/nginx/.htpasswd;
}
```

Test if the config is still OK with `nginx -t`, and if it is, restart Nginx
with `service nginx restart`.

If you now visit `https://stats.domain.com`, you should be prompted for a
username and password. If not, try reopening your browser.

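Under the hood, the browser answers that prompt by sending an `Authorization`
header that contains the Base64-encoded `user:password` pair, which is why
Basic authentication should only be used over HTTPS. A small sketch of how that
header is built; `basic_auth_header` and the password `secret` are made up for
the example, and `base64` is assumed to be the coreutils tool.

```sh
# basic_auth_header USER PASSWORD: print the header a client sends for
# HTTP Basic authentication (Base64 of "USER:PASSWORD").
basic_auth_header() {
    printf 'Authorization: Basic %s\n' "$(printf '%s:%s' "$1" "$2" | base64)"
}

basic_auth_header stats secret
# prints: Authorization: Basic c3RhdHM6c2VjcmV0

# The same check against the server, without a browser:
# curl -I -u stats https://stats.domain.com
```
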
That is all. You now have analytics for your server that are refreshed every 10
minutes.