Issue
I just started learning SAS and realised that proc sql don't use window functions. As I am more at ease with sql I was wondering how I can simulate a sum window function in proc?
desired result
select a.active, a.store_id, a.nbr, sum(nbr) over (partition by a.store_id)
from(select active, store_id, count(customer_id) as nbr from customer group by active, store_id) as a
;
active | store_id | nbr | sum |
---|---|---|---|
0 | 1 | 8 | 326 |
1 | 1 | 318 | 326 |
0 | 2 | 7 | 273 |
1 | 2 | 266 | 273 |
eg of raw data
select active, store_id, customer_id
from customer
limit 10;
active | store_id | customer_id |
---|---|---|
1 | 1 | 1 |
1 | 1 | 2 |
1 | 2 | 3 |
1 | 2 | 4 |
1 | 1 | 5 |
1 | 1 | 6 |
0 | 1 | 7 |
1 | 2 | 8 |
1 | 1 | 9 |
1 | 2 | 10 |
current result and query
select a.active, a.store_id, a.nbr, sum(nbr)
from(select active, store_id, count(customer_id) as nbr from customer group by active, store_id) as a
group by a.active, a.store_id, a.nbr;
active | store_id | nbr | sum |
---|---|---|---|
0 | 1 | 8 | 8 |
1 | 1 | 318 | 318 |
0 | 2 | 7 | 7 |
1 | 2 | 266 | 266 |
Solution
You can do the equivalent in proc sql
by merging two sub-queries: one for the count of customers by active, store_id
, and another for the total customers for each store_id
.
proc sql noprint;
create table want as
select t1.active
, t1.store_id
, t1.nbr
, t2.sum
from (select active
, store_id
, count(customer_id) as nbr
from have
group by store_id, active
) as t1
LEFT JOIN
(select store_id
, count(customer_id) as sum
from have
group by store_id
) as t2
ON t1.store_id = t2.store_id
;
quit;
If you wanted to do this in a more SASsy way, you can run proc means
and merge together the results from a single dataset that holds everything you need. proc means
will calculate all possible combinations of your variables by default.
proc means data=have noprint;
class store_id active;
ways 1 2;
output out=want_total
n(customer_id) = total
;
run;
data want;
merge want_total(where=(_TYPE_ = 3) rename=(total = nbr) )
want_total(where=(_TYPE_ = 2) rename=(total = sum) keep=_TYPE_ store_id total)
;
by store_id;
drop _:;
run;
Or, in SQL:
proc sql;
create table want as
select t1.store_id
, t1.active
, t1.total as nbr
, t2.total as sum
from want_total as t1
LEFT JOIN
want_total as t2
ON t1.store_id = t2.store_id
where t1._TYPE_ = 3
AND t2._TYPE_ = 2
;
quit;
The _TYPE_
variable identifies the level of the analysis. For example, _TYPE_ = 1
is for active
only, _TYPE_ = 2
is for store_id
only, and _TYPE_ = 3
is for all combinations. You can view this in the output dataset from proc means
:
store_id active _TYPE_ _FREQ_ total
. 0 1 3 3
. 1 1 7 7
1 . 2 6 6
2 . 2 4 4
1 0 3 1 1
1 1 3 5 5
2 0 3 2 2
2 1 3 2 2
And if you wanted faster high-performance results, check out its big sibling, proc hpsummary
.
Therein lies the cool thing about SAS: You can bounce between PROCs, SQL, the DATA Step, and Python via Pandas/proc python. You can exploit the unique benefits of each of these methods and thought processes for any number of data engineering and statistics problems.
Answered By - Stu Sztukowski Answer Checked By - Cary Denson (PHPFixing Admin)
0 Comments:
Post a Comment
Note: Only a member of this blog may post a comment.