Issue
I am trying to take the input and keyword in the two tables from the database. So am using pandas to read both the tables and using the respective columns for splitting up of data and then write back the output in the same table in DB.
My input:
Original_Input
LARIDENT SRL
MIZUHO Corporation Gosen Factory
ZIMMER MANUFACTURING BV
GALT MEDICAL CORP
MIZUHO AMERICA INC
AVENT S de RL de CV
LUV N CARE LTD
STERIS ISOMEDIX PUERTO RICO INC
MEDISTIM INC
Cadence Science Inc
TECHNOLOGIES SA
AMG Mdicale Co Inc
My keyword table:
**Name_Extension** **Company_Type** **Priority**
co llc Company LLC 2
Pvt ltd Private Limited 8
Corp Corporation 4
CO Ltd Company Limited 3
inc Incorporated 5
CO Company 1
ltd Limited 7
llc LLC 6
Corporation Corporation 4
& Co Company 1
Company Limited Company Limited 3
Limited Limited 7
Co inc Company Incorporated 9
AB AB 10
SA SA 11
S A SA 11
GmbH GmbH 12
Sdn Bhd Sdn Bhd 13
llp LLP 14
co llp LLP 14
SA DE CV SA DE CV 19
Company Company 1
Coinc Company Incorporated 9
Coltd Company Limited 3
So if the input(in table 1) has any of the name extension(this is in table 2) then it has to be split and put in as Core_input and Type_input columns where core input will contain the company names and type_input will contain the company type(from table 2 column 2) and it has to be checked with the priority.
My output will be:
Core_Input Type_input
NULL NULL
NULL NULL
NULL NULL
GALT MEDICAL Corporation
MIZUHO AMERICA Incorporated
NULL NULL
LUV N CARE Limited
STERIS ISOMEDIX PUERTO RICO Incorporated
MEDISTIM Incorporated
Cadence Science Incorporated
My Code:
k1=[]
k2=[]
df1=pd.read_sql('select * from [dbo].[company_Extension]',engine)
for inp1 in df1['Name_Extension']:
k1.append(inp1.strip())
for inp2 in df1['Company_Type']:
k2.append(inp2.strip())
p=1
p1=max(df1['Priority'])
for k1 in df1['Name_Extension']:
for k2 in df1['Company_Type']:
#for pr in df1['Priority']:
for i in df['Cleansed_Input']:
while p<=p1:
if re.search(r'[^>]*?\s'+str(k1).strip(),str(i).strip(),re.I) and (p == (pr for pr in
df1['Priority'])):
splits = i.str.split(str(k1),re.I)
df['Core_Input'] = splits[0] #df['Cleansed_Input'].str.replace(str(k1),'',re.I)
df['Type_input'] = str(k2)
p=p+1
data.to_sql('Testtable', con=engine, if_exists='replace',index= False)
Any help is appreciated.
Edit:
df=pd.read_sql('select * from [dbo].[TempCompanyName]',engine)
df1=pd.read_sql('select * from [dbo].[company_Extension]',engine)
ext_list = df1['Name_Extension']
type_list =df1['Company_Type']
for i, j in df.iterrows():
comp_name = df['Original_Input'][i]
for idx, ex in enumerate(ext_list):
if re.search(rf'\b{ex}\b', comp_name,re.IGNORECASE):
df['Core_Input'] = type_list[idx]
df['Type_input'].iloc[i] = comp_type
print(df)
df.to_sql('TempCompanyName', con=engine, if_exists='replace',index= False)
Edit:
ext_list = df1['Name_Extension']
type_list =df1['Company_Type']
for i, j in enumerate(df['Cleansed_Input']):
comp_name = df['Cleansed_Input'][i]
for idx, ex in enumerate(ext_list):
comp_name.replace('.,','')
if re.search(rf'(\b{ex}\b)', comp_name, re.I):
comp_type = type_list[idx]
df['Type_input'].iloc[i]= comp_type
# Delete the extension name from company name
updated_comp_name =
re.sub(rf'(\b{str(ex).upper()}\b)','',str(comp_name).upper())
# Above regex is leaving space post word removal adding space
from next word becomes 2 spaces
updated_comp_name = str(updated_comp_name).replace(' ',' ')
# Update the company name
df['Core_Input'].iloc[i] = updated_comp_name
Solution
Hi Hope below lines help you to get the solution...i am not using SQL due to some reason, but taken your data in 2 different excels...you need to add a column Type in input table before run the code...
import pandas as pd
import numpy
import re
input_df = pd.read_excel('input.xlsx',sheet_name='Sheet1')
exts_df = pd.read_excel('exts.xlsx', sheet_name='Sheet1')
# Check if correct data is loaded
print(input_df.head())
ext_list = exts_df['Name_Extension']
type_list =exts_df['Company_Type']
for i, j in input_df.iterrows():
comp_name = input_df['Company Names'][i]
for idx, ex in enumerate(ext_list):
if re.search(rf'\b{ex}\b', comp_name,re.IGNORECASE):
comp_type = type_list[idx]
input_df['Type'].iloc[i] = comp_type
# Delete teh extension name from company name
updated_comp_name = re.sub(rf'\b{str(ex).upper()}\b','',str(comp_name).upper())
# Above regex is leaving space post word removal adding space from next word becomes 2 spaces
updated_comp_name = str(updated_comp_name).replace(' ',' ')
# Update the company name
input_df['Company Names'].iloc[i] = updated_comp_name
print(input_df)
input_df.to_excel('output.xlsx', index=False)
output post removal extension from input Company Name Column mapping Company_Type ...
Answered By - Hietsh Kumar Answer Checked By - Mildred Charles (PHPFixing Admin)
0 Comments:
Post a Comment
Note: Only a member of this blog may post a comment.