Real-Time Web Analytics with GoAccess

GoAccess is a free, open source, real-time web analytics tool that generates reports and graphs from access logs.

GoAccess

GoAccess is a free, open source, real-time web log analyzer and interactive viewer that runs in a terminal on *nix systems or through your browser. It provides fast and valuable HTTP statistics for system administrators who need a visual server report on the fly.

Why GoAccess?

GoAccess was designed to be a fast, terminal-based log analyzer. Its core idea is to quickly analyze and view web server statistics in real time without needing to use your browser (great if you want to do a quick analysis of your access log via SSH, or if you simply love working in the terminal). While terminal output is the default, it can also generate a complete, self-contained, real-time HTML report (great for analytics, monitoring and data visualization), as well as JSON and CSV reports.

Key Features

  • Fast, real-time, millisecond/second updates, written in C
  • Only ncurses as a dependency
  • Nearly all web log formats (Apache, Nginx, Amazon S3, Elastic Load Balancing, CloudFront, Caddy, etc)
  • Simply set the log format and run it against your log
  • Beautiful terminal and Bootstrap dashboards (tailor GoAccess to suit your own color taste/schemes)
  • And of course, Valgrind tested

Download GoAccess

First, you need to get GoAccess. There are several ways to do this. Choose one of the following options:

  • Download and build from source (tar.gz).
  • Use the package manager of your Linux distribution.
  • Build from development and get the latest and greatest.
  • Build GoAccess’ Docker image from upstream.
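
For the package manager route, the command depends on the distribution. The following is a minimal sketch, assuming the package is named goaccess in each repository; the function name goaccess_install_cmd is made up for this example. It only prints the command, so nothing is installed until you run the printed line yourself.

```shell
#!/bin/sh
# Print the install command for a given package manager.
# Assumption: the package is called "goaccess" in each repository.
goaccess_install_cmd() {
  case "$1" in
    apt-get) echo "sudo apt-get install goaccess" ;;
    dnf)     echo "sudo dnf install goaccess" ;;
    pacman)  echo "sudo pacman -S goaccess" ;;
    brew)    echo "brew install goaccess" ;;
    *)       echo "unsupported package manager: $1" >&2; return 1 ;;
  esac
}

# Show the command for the first package manager found on this machine.
for pm in apt-get dnf pacman brew; do
  if command -v "$pm" >/dev/null 2>&1; then
    goaccess_install_cmd "$pm"
    break
  fi
done
```

Distribution packages can lag behind upstream, so building from source may still be the better choice when you need a recent release.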

Determine Log Format

Once GoAccess is installed, you are almost ready to start using it. First, however, you need to determine the log format of your access log. GoAccess comes with several predefined log format options. You can either set one permanently in your configuration file or pass it on the command line.

If you are unsure, feel free to open a new issue on GitHub and post a few sample lines from your access log. If you have a custom log format, take a look at the custom log format options.
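
Before asking for help, a quick look at one log line often settles the question. The sketch below is a heuristic, not part of GoAccess: the function guess_log_format and its patterns are assumptions that cover typical Apache/Nginx lines only, and it returns one of the predefined format names (COMBINED, VCOMBINED, COMMON) or UNKNOWN.

```shell
#!/bin/sh
# Heuristic sketch: guess which predefined format name fits one log line.
# The patterns are assumptions and only cover typical Apache/Nginx lines.
guess_log_format() {
  line=$1
  # Common Log Format: host ident user [date] "request" status bytes
  core='[^ ]+ [^ ]+ [^ ]+ \[[^]]+] "[^"]*" [0-9]{3} [0-9-]+'
  # Combined appends "referer" "user-agent"
  extra=' "[^"]*" "[^"]*"'
  # VCOMBINED prefixes the line with virtualhost:port
  vhost='[^ ]+:[0-9]+ '

  if printf '%s\n' "$line" | grep -Eq "^${core}${extra}\$"; then
    echo COMBINED
  elif printf '%s\n' "$line" | grep -Eq "^${vhost}${core}${extra}\$"; then
    echo VCOMBINED
  elif printf '%s\n' "$line" | grep -Eq "^${core}\$"; then
    echo COMMON
  else
    echo UNKNOWN
  fi
}

# Example: classify the first line of a log file (the path is a placeholder).
# guess_log_format "$(head -n 1 access.log)"
```

If the result is UNKNOWN, the log most likely uses a custom format and the custom log format options apply.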

Run GoAccess

At this point you are ready to run GoAccess against your access log(s). The following are the most basic and common scenarios.

Terminal Output

The following prompts a log configuration dialog with predefined log formats for you to choose from and then displays the stats in real-time.

goaccess access.log -c

Static HTML Output

The following parses the access log and displays the stats in a static HTML report.

goaccess access.log -o report.html --log-format=COMBINED

Note:

Here we specify the log format directly on the command line using --log-format. You can also set the log format in your configuration file, as described in the comments of the config file itself.

Real-Time HTML Output


The following parses the access log and displays the stats in a real-time HTML report.

goaccess access.log -o /var/www/html/report.html --log-format=COMBINED --real-time-html
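
With --real-time-html, GoAccess keeps running and pushes updates to the report over a WebSocket connection. As a sketch, the helper below prints a variant of the command that pins the WebSocket port and runs in the background. It assumes the --port and --daemonize options behave as described in the GoAccess man page, with 7890 as the default port; the function name realtime_cmd is made up for this example.

```shell
#!/bin/sh
# Print a goaccess command line for a real-time HTML report.
# Arguments: log file, HTML output path, WebSocket port (optional).
# Assumption: --port and --daemonize work as described in the GoAccess man page.
realtime_cmd() {
  log=$1
  out=$2
  port=${3:-7890}   # fall back to the default WebSocket port
  printf 'goaccess %s -o %s --log-format=COMBINED --real-time-html --port=%s --daemonize\n' \
    "$log" "$out" "$port"
}

# Print the command for the paths used in this section.
realtime_cmd access.log /var/www/html/report.html
```

The helper does not quote its arguments, so paths containing spaces need extra care. The browser must also be able to reach the chosen port on the server, otherwise the page loads but never updates.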

Using Cron to Schedule GoAccess Runs

Create a script that rebuilds the GoAccess report, and save it as ~/goaccess/cron/cron-goaccess.sh so that cron can run it every hour.

#!/bin/bash

# Change this to match the log file your server writes to
SERVER_LOG='/var/log/???'
# Change this to match the master log file GoAccess reads from
MASTER_LOG='/home/???/goaccess/goaccess-master.log'
# Change this to match the HTML page location to serve
HTML_OUT='/var/www/???/stats/index.html'
# Piwik spam domains file
BLACKLIST='/home/???/goaccess/spammers.txt'

# Append new log data to the existing master log
cat "$SERVER_LOG" >> "$MASTER_LOG"

# Deduplicate the master log in place ("-i inplace" needs GNU awk 4.1 or newer, which Ubuntu 16.04 has)
awk -i inplace '!seen[$0]++' "$MASTER_LOG"

# This is the GoAccess command that processes the data and generates the stats page.
# If using Apache, replace "--config-file ~/.goaccessrc" with "--log-format=VCOMBINED".
# $(<"$BLACKLIST") is left unquoted on purpose so that each domain becomes its own --ignore-referer flag.
goaccess -f "$MASTER_LOG" --config-file ~/.goaccessrc $(printf -- "--ignore-referer=%s " $(<"$BLACKLIST")) --geoip-database ~/goaccess/GeoLiteCity.dat --agent-list --no-progress --output="$HTML_OUT"
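
Two shell idioms in the script are easy to misread, so here they are in isolation. This is an illustrative sketch with made-up sample data; the helper names dedupe and referer_flags are not part of the script. It uses plain awk on a pipe rather than -i inplace, so it runs with any POSIX awk.

```shell
#!/bin/sh
# 1) awk '!seen[$0]++' prints a line only the first time it appears,
#    so the original order is kept and later duplicates are dropped.
dedupe() {
  awk '!seen[$0]++'
}

# Sample input with two repeated lines.
printf 'a\nb\na\nc\nb\n' | dedupe

# 2) printf reuses its format string for every extra argument, which is how
#    a list of domains becomes a series of --ignore-referer flags.
referer_flags() {
  printf -- '--ignore-referer=%s ' "$@"
}

# Hypothetical spam domains, for illustration only.
referer_flags spam.example bad.example
```

Because the whole server log is appended on every run, the deduplication step is what keeps requests from being counted more than once. Note that it also collapses genuinely identical log lines, such as two identical requests logged within the same second.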

Scheduling GoAccess Runs

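Make the script executable (chmod +x ~/goaccess/cron/cron-goaccess.sh), then open your crontab for editing:

crontab -e

Add the following entry so that cron runs the script once an hour:

@hourly ~/goaccess/cron/cron-goaccess.sh > /dev/null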
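
If you prefer to add the entry without opening an editor, the sketch below shows one way to do it idempotently. The helper with_goaccess_entry is a hypothetical name: it reads an existing crontab on standard input and appends the entry only when that exact line is missing, so running it twice does not create duplicates.

```shell
#!/bin/sh
# The crontab entry from this section.
CRON_ENTRY='@hourly ~/goaccess/cron/cron-goaccess.sh > /dev/null'

# Read an existing crontab on stdin and write it back out, appending
# CRON_ENTRY only if that exact line is not already present.
with_goaccess_entry() {
  existing=$(cat)
  if [ -n "$existing" ]; then
    printf '%s\n' "$existing"
  fi
  if ! printf '%s\n' "$existing" | grep -Fxq -- "$CRON_ENTRY"; then
    printf '%s\n' "$CRON_ENTRY"
  fi
}

# Usage (assumes your crontab implementation accepts "-" to read stdin):
# crontab -l 2>/dev/null | with_goaccess_entry | crontab -
```

Only standard error is left unredirected in the entry, so cron mails you when the script reports an error and stays quiet otherwise.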