PHPFixing
  • Privacy Policy
  • TOS
  • Ask Question
  • Contact Us
  • Home
  • PHP
  • Programming
  • SQL Injection
  • Web3.0

Monday, August 29, 2022

[FIXED] How To Determine Bulk Customers (Customers Buying More Than 'N' Items Or So In A Single Transaction) In Python (Sales Data Analytics)?

 August 29, 2022     csv, dataframe, numpy, pandas, python     No comments   

Issue

so I've the following sample dataset:

Sample Dataset

Column A: Name

Column B: Email

Column C: Products

Column D: Transaction Date

I've two objectives:

  1. To determine bulk customers (customers who purchase, let's say, 5 products or more in a single transaction), where each row represents a unique transaction with a unique timestamp.

  2. To determine from the recurring customers (customers frequently making different transactions), who all are also bulk customers.

Now, I've already determined the list of recurring customers as follows:

n = 15
custmost1 = Order_Details['Name'].value_counts().index.tolist()[:n]
custmost2 = Order_Details['Name'].value_counts().values.tolist()[:n]
custmost = np.column_stack((custmost1,custmost2))

Where custmost denotes the series tuple clubbed together as an array for customers making frequent purchases with their counts. Order_Details is the dataframe I created for the dataset.

Now, I'm at my wit ends to figure out to maintain a count of different products being purchased in a single transaction (with a unique timestamp), and possibly, add it as a separate column in a dataframe.

I don't know if it's a feasible approach or not, but two ways were coming to my mind:

  1. One to count the number of commas, so that number of commas+1 will be number of products.
  2. To segregate each products into a separate line (which I already did, by the way, for maintaining a total count for a different insight), and checking with timestamp the number of products sold at a given timestamp.

I'd segregated the Products as follows:

reshaped = \
(Order_Details.set_index(Order_Details.columns.drop('Product',1).tolist())
   .Product.str.split(',', expand=True)
   .stack()
   .reset_index()
   .rename(columns={0:'Product'})
   .loc[:, Order_Details.columns]
)

So, in this lieu, I would want someone to guide me as I feel like the aforementioned approaches are actually rather messy.


Solution

Assuming you already have a proper DataFrame:

>>> df.applymap(lambda e: type(e).__name__).value_counts()
name  email  product  date     
str   str    list     Timestamp    29
dtype: int64

(i.e., with columns: ['name', 'email', 'product', 'date'], where the 'product' column contains list objects, and date contains Timestamp),

Then you could do this:

bulk_customers = set(df.loc[df['product'].apply(len) >= 5, 'name'])

s = df.groupby('name').size() > 1
recur_customers = set(s[s].index)

>>> bulk_customers
{'PERSON_108', 'PERSON_123'}

>>> recur_customers
{'PERSON_105'}

Notes

  1. I changed the row of PERSON_125 to be PERSON_105, so that there would be one repeat customer. Likewise, I used a threshold of n_visits > 1 as the criterion for "recurring", but you may want something different.

  2. You'd be well inspired to assign a unique ID to each of your customers. This could be based on email or perhaps you already have a customer ID. In any case, using name is prone to collisions, plus sometimes customers change name (e.g. through marriage) while keeping the same email or customer ID.

  3. You didn't mention over what period of time a customer needs to visit again in order to be considered "frequent". If that is to be considered, you have to be specific whether it is e.g. "within a calendar month", or "over the past 30 days", etc., as each leads to slightly different expressions.



Answered By - Pierre D
Answer Checked By - Timothy Miller (PHPFixing Admin)
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg
Newer Post Older Post Home

0 Comments:

Post a Comment

Note: Only a member of this blog may post a comment.

Total Pageviews

Featured Post

Why Learn PHP Programming

Why Learn PHP Programming A widely-used open source scripting language PHP is one of the most popular programming languages in the world. It...

Subscribe To

Posts
Atom
Posts
Comments
Atom
Comments

Copyright © PHPFixing