PHPFixing
  • Privacy Policy
  • TOS
  • Ask Question
  • Contact Us
  • Home
  • PHP
  • Programming
  • SQL Injection
  • Web3.0

Saturday, October 29, 2022

[FIXED] How to return rows with 0 count for missing data?

 October 29, 2022     database, generate-series, left-join, postgresql, python     No comments   

Issue

I have unevenly distributed data (wrt date) for a few years (2003-2008). I want to query data for a given set of start and end date, grouping the data by any of the supported intervals (day, week, month, quarter, year) in PostgreSQL 8.3.

The problem is that some of the queries give results continuous over the required period, as this one:

select to_char(date_trunc('month',date), 'YYYY-MM-DD'), count(distinct post_id) 
from some_table
where category_id = 1
and entity_id = 77
and entity2_id = 115 
and date <= '2008-12-06'
and date >= '2007-12-01'
group by date_trunc('month',date)
order by date_trunc('month',date);

  to_char   | count 
------------+-------
 2007-12-01 |    64
 2008-01-01 |    31
 2008-02-01 |    14
 2008-03-01 |    21
 2008-04-01 |    28
 2008-05-01 |    44
 2008-06-01 |   100
 2008-07-01 |    72
 2008-08-01 |    91
 2008-09-01 |    92
 2008-10-01 |    79
 2008-11-01 |    65
(12 rows)

But some of them miss some intervals because there is no data present, as this one:

select to_char(date_trunc('month',date), 'YYYY-MM-DD'), count(distinct post_id) 
from some_table
where category_id=1
and entity_id = 75
and entity2_id = 115 
and date <= '2008-12-06'
and date >= '2007-12-01'
group by date_trunc('month',date)
order by date_trunc('month',date);
    
  to_char   | count 
------------+-------

 2007-12-01 |     2
 2008-01-01 |     2
 2008-03-01 |     1
 2008-04-01 |     2
 2008-06-01 |     1
 2008-08-01 |     3
 2008-10-01 |     2
(7 rows)

where the required resultset is:

  to_char   | count 
------------+-------
 2007-12-01 |     2
 2008-01-01 |     2
 2008-02-01 |     0
 2008-03-01 |     1
 2008-04-01 |     2
 2008-05-01 |     0
 2008-06-01 |     1
 2008-07-01 |     0
 2008-08-01 |     3
 2008-09-01 |     0
 2008-10-01 |     2
 2008-11-01 |     0
(12 rows)

A count of 0 for missing entries.

I have seen earlier discussions on Stack Overflow but they don't solve my problem it seems, since my grouping period is one of (day, week, month, quarter, year) and decided on runtime by the application. So an approach like left join with a calendar table or sequence table will not help I guess.

My current solution to this is to fill in these gaps in Python (in a Turbogears App) using the calendar module.

Is there a better way to do this?


Solution

You can create the list of all first days of the last year (say) with

select distinct date_trunc('month', (current_date - offs)) as date 
from generate_series(0,365,28) as offs;
          date
------------------------
 2007-12-01 00:00:00+01
 2008-01-01 00:00:00+01
 2008-02-01 00:00:00+01
 2008-03-01 00:00:00+01
 2008-04-01 00:00:00+02
 2008-05-01 00:00:00+02
 2008-06-01 00:00:00+02
 2008-07-01 00:00:00+02
 2008-08-01 00:00:00+02
 2008-09-01 00:00:00+02
 2008-10-01 00:00:00+02
 2008-11-01 00:00:00+01
 2008-12-01 00:00:00+01

Then you can join with that series.



Answered By - Martin v. Löwis
Answer Checked By - David Goodson (PHPFixing Volunteer)
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg
Newer Post Older Post Home

0 Comments:

Post a Comment

Note: Only a member of this blog may post a comment.

Total Pageviews

Featured Post

Why Learn PHP Programming

Why Learn PHP Programming A widely-used open source scripting language PHP is one of the most popular programming languages in the world. It...

Subscribe To

Posts
Atom
Posts
Comments
Atom
Comments

Copyright © PHPFixing