PHPFixing
  • Privacy Policy
  • TOS
  • Ask Question
  • Contact Us
  • Home
  • PHP
  • Programming
  • SQL Injection
  • Web3.0

Monday, September 5, 2022

[FIXED] What is hyperloglog and why is this good for?

 September 05, 2022     database, hyperloglog, redis     No comments   

Issue

I was studying data structures supported by Redis and I was not able to find out an explanation that could make me understand what HyperLogLog is.

How do I use it and why is this good for?


Solution

Basically is a kind of Redis Set which uses optimized algorithms to count elements by avoiding a great consumption of memory. The difference between a Set and a HyperLogLog is that with a HyperLogLog you just can add, count unique element and merge some HyperLogLogs in another one, so basically you don't store the members in a HyperLogLog as you could do in a SET, and retrieve them, you just store the occurrences of different members, that is the reason which HyperLogLog doesn't provide a command to retrieve its stored members.

A clear uses case could be if you want to have a huge SET where you want to count so many times the number of unique data inside the set, you are not interested in which data are inside the set, you are only interested in consuming low memory even when the set grows a lot. For instance, imagine you have a high impact system with a large number of users all of them very active, and you are interested in knowing the number of unique visitors in every webpage of your system. You want to be updated real-time, so you will query every second the unique visitors for every website. You could create a HyperLogLog for every URI in your system, which will represent the webpage and every time a user visits a URL you will PFAAD the user_id:

PFAAD /api/show/concerts id789989

then every second you will iterate for every URL-HyperLogLog to get number of unique user-visitors

PFCOUNT /api/show/concerts

145542

PFCOUNT /api/show/open-airs

25565223

And you would say, yes but I can get the same functionality by using SET with the benefit of having the user_ids in every set as members. Yes, you can, but you will consume much memory by using sets and every time (second) you query every set to get the numbers of unique visitors with SCARD command, you will spend even much memory, so at least you need to store user_ids for some reason, HyperLogLogs are better options as counters of unique elements. For our use case, imagine having 200-300 sets with around 20-30k of users inside.

The correspondence between HyperLogLog and Set commands:

  • PFADD = SADD
  • PFCOUNT = SCARD
  • PFMERGE = SUNION


Answered By - Averias
Answer Checked By - Timothy Miller (PHPFixing Admin)
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg
Newer Post Older Post Home

0 Comments:

Post a Comment

Note: Only a member of this blog may post a comment.

Total Pageviews

Featured Post

Why Learn PHP Programming

Why Learn PHP Programming A widely-used open source scripting language PHP is one of the most popular programming languages in the world. It...

Subscribe To

Posts
Atom
Posts
Comments
Atom
Comments

Copyright © PHPFixing