“If you know your enemy and you know yourself, you need not fear the result of a hundred battles.” — Sun Tzu
We need to know ourselves.
Knowing what users look at and what they don’t; what they interact with and what they don’t; where they go and how they get there: these are all examples of things that would be useful to know, to a greater or lesser extent.
We can use this information to make our site and network better.
Analytics can give us aggregate and quantitative insights into this. What type of analytics should we use?
My take on site analytics (I haven’t actually researched this; it’s based on dealing with it for various web sites over many years):
First generation - Read through log files and create analytics after the fact; limited, depending on how navigation is done, and not real-time (see the sketch after this list)
Second generation - A bit of code on each page to track locally (i.e., within the web site’s own server)
Third generation - Google and similar: third-party tracking using a snippet of JavaScript on every page
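To make the first-generation approach concrete, here is a minimal sketch that counts page views after the fact from a common-log-format access log. The file path and log format are assumptions for illustration, not a spec:

```typescript
import { readFileSync } from "fs";

// After-the-fact analytics from a server access log (common log format
// assumed). This is exactly the "first generation" limitation: nothing
// is real-time, and client-side navigation never appears in the log.
function countPageViews(logPath: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of readFileSync(logPath, "utf8").split("\n")) {
    // e.g. 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
    const match = line.match(/"GET (\S+) HTTP/);
    if (match) {
      counts.set(match[1], (counts.get(match[1]) ?? 0) + 1);
    }
  }
  return counts;
}

// Usage (placeholder path): countPageViews("/var/log/nginx/access.log")
```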
“Everyone” seems to love Google Analytics. But it has a lot of problems (and I love Google, this is not about them per se):
Information (no PII, but still) constantly transmitted to a third party
Requires JavaScript (admittedly, we will need JavaScript for a ton of features, but a basic user who deliberately turns off JS except when needed would not get tracked)
Dependent on a third-party system for analysis (OTOH: we get to use great tools that others developed)
I much prefer a tracking system within our own server. That is relatively easy to implement: instead of a snippet of third-party JS, it is an actual log record in our own database for each page view. It also has the great advantage of being able to log actions as well as views, in more detail, if we want to. The only real catch is database storage, but done well (and I’ve worked with some not done so well… reminds me, I need to check on one of those) it is quite manageable and can provide a lot of useful information.
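As a rough sketch of what “a log record in our own database for each page view” could look like, assuming a Node/Express server; insertPageView and its schema are hypothetical placeholders for whatever storage layer we pick:

```typescript
import express, { Request, Response, NextFunction } from "express";

// Hypothetical storage helper; in practice this would be a single
// INSERT INTO page_views (path, referrer, user_agent, viewed_at) ...
async function insertPageView(row: {
  path: string;
  referrer: string | null;
  userAgent: string | null;
  viewedAt: Date;
}): Promise<void> {
  // database write goes here
}

const app = express();

// First-party tracking: one row in our own database per page view.
app.use((req: Request, _res: Response, next: NextFunction) => {
  // Fire and forget, so tracking can never slow down or break a response.
  insertPageView({
    path: req.path,
    referrer: req.get("referer") ?? null,
    userAgent: req.get("user-agent") ?? null,
    viewedAt: new Date(),
  }).catch(() => { /* swallow analytics failures */ });
  next();
});
```

Logging actions would just be another insert at the point where the action is handled, with whatever extra detail we want.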
One catch - much more of one with our own system, since others have already solved it for Google Analytics, etc. - any tracking needs to take robots/web crawlers into account (some of which we actually want), as they can significantly skew view statistics for non-logged-in users.
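A naive sketch of the kind of crawler filter that implies; the user-agent substrings below are just common examples, and real detection is harder since some bots do not identify themselves:

```typescript
// Naive bot check based on the User-Agent header. Note that crawlers we
// welcome (Googlebot, Bingbot, ...) should still be excluded from view
// statistics even though we want them to crawl.
const BOT_PATTERNS = /bot|crawler|spider|slurp|curl|wget/i;

function looksLikeBot(userAgent: string | null): boolean {
  return userAgent === null || BOT_PATTERNS.test(userAgent);
}
```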
Matomo can be hosted on our servers (PHP-based) and has a nice opt-out functionality1, which can be linked to at the bottom of every page. It respects Do Not Track requests by the browser and offers tools to fulfill GDPR requests.
Having multiple sites is also easy: they differ only in a numeric “site id” that needs to be set in the site configuration.
AFAIK it is also possible to import existing server logs if we decide not to use it from the beginning.
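For reference, the standard Matomo tracking snippet looks roughly like this (from memory; check Matomo’s docs for the current version). The tracker URL and site id are placeholders, and the site id is the only thing that differs between sites:

```typescript
// Roughly the standard Matomo snippet; verify against Matomo's docs.
const _paq = ((window as any)._paq = (window as any)._paq || []);
_paq.push(["trackPageView"]);
_paq.push(["enableLinkTracking"]);

const u = "https://analytics.example.org/"; // placeholder tracker URL
_paq.push(["setTrackerUrl", u + "matomo.php"]);
_paq.push(["setSiteId", "1"]); // the only per-site difference

const g = document.createElement("script");
g.async = true;
g.src = u + "matomo.js";
document.head.appendChild(g);
```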
1 For example, see the footer of the German FOI site FragDenStaat:
Translation
FragDenStaat uses the more privacy-friendly technology Matomo instead of the common alternatives in order to gather site usage analytics. To opt out, click here. You can find more information in our privacy policy.
I’m not familiar with Matomo. I took a very quick look: they have paid plans where the data lives in their database, and also a free version using your own database (far more secure, but you have to store the data somewhere). While I can’t say “this is the one,” this is the type of system I am talking about.
I, for one, oppose anything third-party or involving actual tracking of user activities - no collected metrics should contain information that allows identifying individual users.
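One common way to honor that constraint while still counting views is to truncate IP addresses before anything is stored (both Matomo and Google Analytics offer a variant of this). A minimal sketch, assuming IPv4 only:

```typescript
// Zero the last octet of an IPv4 address before storage, so the stored
// value can no longer identify an individual user. IPv6 handling and
// validation are left out of this sketch.
function anonymizeIPv4(ip: string): string {
  const parts = ip.split(".");
  if (parts.length !== 4) return "0.0.0.0"; // not IPv4; store nothing useful
  parts[3] = "0";
  return parts.join(".");
}

// anonymizeIPv4("203.0.113.42") === "203.0.113.0"
```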
I had never heard of Matomo before. It looks good; probably a decent choice.
I’m more of a lurker here, but I’ll chime in with my 2 cents. I probably don’t feel as strongly about 3rd-party tracking as others, but I do think we should use an existing tool or framework as opposed to writing our own. Better to focus our development resources on features that differentiate our product.