Issue
I'm working on a program that parses weather information. In this part of my code, I am trying to re-organise the results in order of time before continuing to append more items later on.
The time in these lines is usually the first 4 digits of any line (the first 2 digits are the day and the others are the hour). The exception is the line that starts with 11010KT: this line is always assumed to be the first line in any weather report, and those numbers are a wind vector and NOT a time.
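To make the format above concrete, here is a minimal sketch of reading the day and hour from a line. It assumes, as described, that the time is the first four digits of the line and that a line whose first token is five digits followed by KT (like 11010KT) is the wind-vector line. The function name line_day_hour is hypothetical.

```python
import re

# Assumed shape of the wind-vector token, e.g. '11010KT' (five digits + 'KT')
WIND_VECTOR = re.compile(r'\d{5}KT')


def line_day_hour(line):
    """Return (day, hour) from the first four digits of a line,
    or None for the wind-vector line (its digits are not a time)."""
    first_token = line.split()[0]
    if WIND_VECTOR.fullmatch(first_token):
        return None
    match = re.search(r'(\d{2})(\d{2})', line)  # first 2 digits = day, next 2 = hour
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(line_day_hour('\nFM050200 12012KT 9999 LIGHT DRIZZLE BKN008'))  # (5, 2)
print(line_day_hour('\n11010KT 5000 MODERATE DRIZZLE BKN004'))        # None
```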
You will see that I am removing any line that has TEMPO, INTER or PROB at the start of this example, because I want lines containing these words to be added to the end of the other restructured list. These lines can be thought of as a separate list, which I want organised by time in the same way as the other items.
I am trying to use regex to pull the times from the lines that remain after removing the TEMPO, INTER and PROB lines and then sort them. Once they are sorted, I use regex again to find each line in full and create a restructured list. Once that list has been completed, I sort the TEMPO, INTER and PROB list and then append it to the newly completed list.
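The steps described above can also be done without searching the text a second time: if each line is sorted by a key computed from the line itself, the time never gets separated from its line. This is a minimal sketch, assuming the four example lines from this question and that TEMPO/INTER/PROB lines start with one of those words (so a PROB30-style token also counts); TIP_PREFIXES and first_four_digits are illustrative names. It also builds new lists instead of calling remove() on a list while looping over it, which can make the loop skip the item that slides into the removed slot.

```python
import re

total_print = ['\nFM050200 12012KT 9999 LIGHT DRIZZLE BKN008',
               '\n11010KT 5000 MODERATE DRIZZLE BKN004',
               '\nINTER 0502/0506 4000 SHOWERS OF MODERATE RAIN BKN008',
               '\nTEMPO 0501/0502 2000 MODERATE DRIZZLE BKN002']

TIP_PREFIXES = ('TEMPO', 'INTER', 'PROB')  # str.startswith accepts a tuple


def first_four_digits(line):
    """Sort key: the first run of four digits, e.g. '0502' from 'FM050200'."""
    return re.search(r'\d{4}', line).group()


# Partition into new lists rather than removing items from total_print
wind_lines = [line for line in total_print if re.match(r'\s*\d{5}KT', line)]
tip_lines = [line for line in total_print if line.split()[0].startswith(TIP_PREFIXES)]
main_lines = [line for line in total_print
              if line not in wind_lines and line not in tip_lines]

# Sort the lines themselves, so each time stays attached to its own line
ordered = (wind_lines
           + sorted(main_lines, key=first_four_digits)
           + sorted(tip_lines, key=first_four_digits))
print('\n'.join(line.strip() for line in ordered))
```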
I have also tried a for loop that removes any duplicate lines that were added, but it seems to only remove one duplicate of the TEMPO line.
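For reference, a duplicate-removal loop like the one at the end of the code only treats two items as duplicates when the strings are exactly equal, so two copies of a line that differ by a trailing space (which ' '.join can leave behind) are both kept. Here is a small sketch with hypothetical strings, using dict.fromkeys as an order-preserving alternative to the loop:

```python
final = [
    '\nTEMPO 0501/0502 2000 MODERATE DRIZZLE BKN002 ',  # trailing space, as ' '.join can leave
    '\nTEMPO 0501/0502 2000 MODERATE DRIZZLE BKN002',
    '\nTEMPO 0501/0502 2000 MODERATE DRIZZLE BKN002',
]

# dict.fromkeys keeps the first occurrence of each key, in order
exact = list(dict.fromkeys(final))
# Stripping surrounding whitespace first lets near-duplicates collapse too
normalised = list(dict.fromkeys(line.strip() for line in final))
print(len(exact), len(normalised))  # 2 1
```

Deduplicating only hides the symptom, though: in this example the repeated lines are produced by the regex searches earlier in the code.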
Can someone please help me figure this out? I am kind of new to this, thank you...
This ideally should come back looking like this:
ETA IS 0230 which is 1430 local
11010KT 5000 MODERATE DRIZZLE BKN004
FM050200 12012KT 9999 LIGHT DRIZZLE BKN008
TEMPO 0501/0502 2000 MODERATE DRIZZLE BKN002
INTER 0502/0506 4000 SHOWERS OF MODERATE RAIN BKN008
Instead of this, I am getting repeats of the line that starts with FM050200 and then repeats of the line starting with TEMPO. It doesn't find the line starting with INTER either...
I have made a minimal reproducible example for anyone to try and help me. I will include that here:
import re
total_print = ['\nFM050200 12012KT 9999 LIGHT DRIZZLE BKN008', '\n11010KT 5000 MODERATE DRIZZLE BKN004', '\nINTER 0502/0506 4000 SHOWERS OF MODERATE RAIN BKN008', '\nTEMPO 0501/0502 2000 MODERATE DRIZZLE BKN002']
removed_lines = []
for a in total_print:  # finding and removing lines with reference to TEMPO INTER PROB
    if 'TEMPO' in a:
        total_print.remove(a)
        removed_lines.append(a)
for b in total_print:
    if 'INTER' in b:
        total_print.remove(b)
        removed_lines.append(b)
for f in total_print:
    if 'PROB' in f:
        total_print.remove(f)
        removed_lines.append(f)
list_time_in_line = []
for line in total_print:  # finding the times in the remaining lines
    time_in_line = re.search(r'\d\d\d\d', line)
    list_time_in_line.append(time_in_line.group())
sorted_time_list = sorted(list_time_in_line)
removed_time_in_line = []
for g in removed_lines:  # finding the times in the lines that were originally removed
    removed_times = re.search(r'\d\d\d\d', g)
    removed_time_in_line.append(removed_times.group())
sorted_removed_time_list = sorted(removed_time_in_line)
final = []
final.append('ETA IS 1230 which is 1430 local\n')  # appending the time display
search_for_first_line = re.search(r'[\n]\d\d\d\d\dKT', ' '.join(total_print))  # searching for line that has wind vector instead of time
search_for_first_line = search_for_first_line.group()
if search_for_first_line:  # adding wind vector line so that it's the first line listed in the group
    search_for_first_line = re.search(r'%s.*' % search_for_first_line, ' '.join(total_print)).group()
    final.append('\n' + search_for_first_line)
print(sorted_time_list)  # the list of possible times found (the second item in list is the wind vector and not a time)
d = 0
for c in sorted_time_list:  # finding the whole line for the corresponding time
    print(sorted_time_list[d])
    search_for_whole_line = re.search(r'.*\w+\s*%s.*' % sorted_time_list[d], ' '.join(total_print))
    print(search_for_whole_line.group())  # it is doubling up on the 0502 time???????
    d += 1
    final.append('\n' + str(search_for_whole_line.group()))
h = 0
for i in sorted_removed_time_list:  # finding the whole line for the corresponding times from the previously removed items
    whole_line_in_removed_srch = re.search(r'.*%s.*' % sorted_removed_time_list[h], ' '.join(removed_lines))
    h += 1
    final.append('\n' + str(whole_line_in_removed_srch.group()))  # appending them
l_new = []
for item in final:  # this doesn't seem to properly remove duplicates ?????
    if item not in l_new:
        l_new.append(item)
total_print = l_new
print(' '.join(total_print))
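Running two of the searches above on their own shows where the repeats come from. This sketch reproduces only those searches with the data from the example (the names tip_hits and wind_hit are new). re.search returns the first match in the joined string, so '0502' is found inside the TEMPO line (0501/0502) before the INTER line is ever reached. Separately, \s* also matches the ' \n' that ' '.join leaves between two lines.

```python
import re

removed_lines = ['\nTEMPO 0501/0502 2000 MODERATE DRIZZLE BKN002',
                 '\nINTER 0502/0506 4000 SHOWERS OF MODERATE RAIN BKN008']
joined_removed = ' '.join(removed_lines)

# re.search returns the FIRST match; '0502' already occurs in the TEMPO line
tip_hits = [re.search(r'.*%s.*' % t, joined_removed).group() for t in ['0501', '0502']]
for hit in tip_hits:
    print(repr(hit))  # the TEMPO line both times

remaining = ['\nFM050200 12012KT 9999 LIGHT DRIZZLE BKN008',
             '\n11010KT 5000 MODERATE DRIZZLE BKN004']
joined_remaining = ' '.join(remaining)

# '1101' is the wind vector mistaken for a time; '\w+\s*' can end the FM line
# ('8' + ' \n') so the match runs across both joined lines
wind_hit = re.search(r'.*\w+\s*%s.*' % '1101', joined_remaining).group()
print(repr(wind_hit))
```

That second result is the FM050200 line glued to the wind-vector line, which is why the FM050200 line (and the wind-vector line with it) shows up twice in the output.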
EDIT:
I had asked this recently and got an excellent answer to my problem from @diggusbickus. I have now hit a new problem with the sorting in the answer.
Because my original question had only one type of weather line (beginning with the letters 'FM') in my data['other'], the lambda with split() only looked at the first item of the line, [0], for the time:
data['other'] = sorted(data['other'], key=lambda x: x.split()[0])
That is where the time is located (in the previous question it was FM050200, where 05 is the day and 0200 is the time). This works very well when the lines begin with FM, but I have realised that lines like this occasionally exist:
'\nBECMG 0519/0520 27007KT 9999 SHOWERS OF LIGHT RAIN SCT020 BKN030'
The time in this style of line is the FIRST 4 digits of the item at index [1], in a 4-digit format instead of the 6-digit format of FM050200. The time in this new line is 05 as the day and 19 as the hour (so 1900).
I need this style of line to be grouped with the FM lines; the problem is that they don't sort. I am trying to find a way to sort the lines by time regardless of whether the time is at index [0] in 6-digit format or at index [1] in 4-digit format.
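One way to get a key that works for both shapes is to branch on the first token. This is a minimal sketch, assuming every line in data['other'] is either an FM line (time right after the FM prefix in token [0]) or a BECMG-style line (time in the first four characters of token [1]). group_time is a hypothetical name, and any other line type would need its own branch. A shorter alternative is key=lambda x: re.search(r'\d{4}', x).group(), which takes the first four digits of the line whatever its prefix.

```python
other = ['\nBECMG 1315/1317 27007KT 9999 SHOWERS OF LIGHT RAIN SCT020 BKN030',
         '\nFM131200 20010KT 9999 SHOWERS OF LIGHT RAIN SCT006 BKN010',
         '\nFM131400 20010KT 9999 SHOWERS OF LIGHT RAIN SCT006 BKN010']


def group_time(line):
    """Return the 'DDHH' time of an FM or BECMG-style line."""
    tokens = line.split()
    if tokens[0].startswith('FM'):
        return tokens[0][2:6]  # 'FM131200' -> '1312' (6-digit time at index [0])
    return tokens[1][:4]       # 'BECMG 1315/1317' -> '1315' (4-digit time at index [1])


other_sorted = sorted(other, key=group_time)
for line in other_sorted:
    print(line.strip())
```

Either way the key is a 'DDHH' string, which sorts correctly within one month; a forecast that runs past the end of the month (day 31 followed by day 01) would need a different key.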
I will include a new example with a couple of small changes to the originally answered question. This new question uses different data in the total_print variable. This is a working example.
I essentially need the lines to be sorted by the FIRST 4 digits of any line, and the results should look like this:
ETA IS 0230 which is 1430 local
FM131200 20010KT 9999 SHOWERS OF LIGHT RAIN SCT006 BKN010
FM131400 20010KT 9999 SHOWERS OF LIGHT RAIN SCT006 BKN010
BECMG 1315/1317 27007KT 9999 SHOWERS OF LIGHT RAIN SCT020 BKN030
TEMPO 1312/1320 4000 SHOWERS OF MODERATE RAIN BKN007
NB. The TEMPO line is supposed to stay at the end, so don't worry about that one.
Here is the example, thank you so much to anyone who helps.
import re
total_print = ['\nBECMG 1315/1317 27007KT 9999 SHOWERS OF LIGHT RAIN SCT020 BKN030', '\nFM131200 20010KT 9999 SHOWERS OF LIGHT RAIN SCT006 BKN010', '\nFM131400 20010KT 9999 SHOWERS OF LIGHT RAIN SCT006 BKN010', '\nTEMPO 1312/1320 4000 SHOWERS OF MODERATE RAIN BKN007']
data = {
    'windvector': [],  # if it is the first line of the TAF
    'other': [],  # anything with FM or BECMG
    'tip': []  # tempo/inter/prob
}
wind_vector = re.compile(r'^\s\d{5}KT')  # raw string so \s and \d reach the regex engine unchanged
for line in total_print:
    if 'TEMPO' in line \
            or 'INTER' in line \
            or 'PROB' in line:
        key = 'tip'
    elif re.match(wind_vector, line):
        key = 'windvector'
    else:
        key = 'other'
    data[key].append(line)
final = []
data['other'] = sorted(data['other'], key=lambda x: x.split()[0])
data['tip'] = sorted(data['tip'], key=lambda x: x.split()[1])
final.append('ETA IS 0230 which is 1430 local\n')
for lst in data.values():
    for line in lst:
        final.append('\n' + line[1:])  # get rid of newline
print(' '.join(final))
Solution
Just sort your data into a dict; you're always creating lists and removing items, and it's too confusing.
Your regex to catch the wind vector also catches 12012KT, which is why that line was repeated. The ^ ensures it only matches your pattern if it's at the beginning of the line.
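Here is a quick sketch of the difference the anchor makes, using two lines from the example (the pattern names are only for illustration). An unanchored \d{5}KT also matches the 12012KT wind group in the middle of the FM line, while ^\s\d{5}KT only matches when the wind group opens the line right after the leading newline.

```python
import re

fm_line = '\nFM050200 12012KT 9999 LIGHT DRIZZLE BKN008'
wind_line = '\n11010KT 5000 MODERATE DRIZZLE BKN004'

unanchored = re.compile(r'\d{5}KT')    # may match anywhere in the string
anchored = re.compile(r'^\s\d{5}KT')   # must sit at the very start (after the '\n')

print(unanchored.search(fm_line))       # matches '12012KT' mid-line
print(anchored.search(fm_line))         # None
print(anchored.search(wind_line))       # matches '\n11010KT'
```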
import re
total_print = ['\nFM050200 12012KT 9999 LIGHT DRIZZLE BKN008', '\n11010KT 5000 MODERATE DRIZZLE BKN004', '\nINTER 0502/0506 4000 SHOWERS OF MODERATE RAIN BKN008', '\nTEMPO 0501/0502 2000 MODERATE DRIZZLE BKN002']
data = {
    'windvector': [],
    'other': [],
    'tip': []  # tempo/inter/prob
}
wind_vector = re.compile(r'^\s\d{5}KT')  # raw string so \s and \d reach the regex engine unchanged
for line in total_print:
    if 'TEMPO' in line \
            or 'INTER' in line \
            or 'PROB' in line:
        key = 'tip'
    elif re.match(wind_vector, line):
        key = 'windvector'
    else:
        key = 'other'
    data[key].append(line)
data['tip'] = sorted(data['tip'], key=lambda x: x.split()[1])
print('ETA IS 0230 which is 1430 local')
print()
for lst in data.values():
    for line in lst:
        print(line[1:])  # get rid of newline
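The final loop relies on data.values() coming back in insertion order (windvector, other, tip), which dicts guarantee from Python 3.7. If you would rather make that order explicit and build the report as a single string, a minimal sketch could look like this. SECTION_ORDER and report are hypothetical names, and data is filled in by hand with the lines as they would be after the classification and sort above.

```python
# Hypothetical contents of `data` after the classification loop and the tip sort
data = {
    'windvector': ['\n11010KT 5000 MODERATE DRIZZLE BKN004'],
    'other': ['\nFM050200 12012KT 9999 LIGHT DRIZZLE BKN008'],
    'tip': ['\nTEMPO 0501/0502 2000 MODERATE DRIZZLE BKN002',
            '\nINTER 0502/0506 4000 SHOWERS OF MODERATE RAIN BKN008'],
}

SECTION_ORDER = ('windvector', 'other', 'tip')  # output order, stated explicitly

lines = ['ETA IS 0230 which is 1430 local', '']
for section in SECTION_ORDER:
    lines.extend(line.strip() for line in data[section])  # drop the leading '\n'
report = '\n'.join(lines)
print(report)
```

Joining with '\n' also avoids the stray leading spaces that ' '.join adds when each item already starts with a newline.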
Answered By - diggusbickus Answer Checked By - Katrina (PHPFixing Volunteer)