PHPFixing
Showing posts with label parsing.

Tuesday, December 13, 2022

[FIXED] What is the double brace syntax in ASN.1?

 December 13, 2022     asn.1, parsing, syntax     No comments   

Issue

I'm reading the PKCS #7 ASN.1 definition and came across this type. I can't find out what {{Authenticated}} is doing in this code, or what this production would be called. I've also seen {{...}} in the PKCS #8 standard.

-- ATTRIBUTE information object class specification
ATTRIBUTE ::= CLASS {
  &derivation            ATTRIBUTE OPTIONAL,
  &Type                  OPTIONAL, -- either &Type or &derivation required
  &equality-match        MATCHING-RULE OPTIONAL,
  &ordering-match        MATCHING-RULE OPTIONAL,
  &substrings-match      MATCHING-RULE OPTIONAL,
  &single-valued         BOOLEAN DEFAULT FALSE,
  &collective            BOOLEAN DEFAULT FALSE,
  &dummy                 BOOLEAN DEFAULT FALSE,
  -- operational extensions
  &no-user-modification  BOOLEAN DEFAULT FALSE,
  &usage                 AttributeUsage DEFAULT userApplications,
  &id                    OBJECT IDENTIFIER UNIQUE
}
WITH SYNTAX {
  [SUBTYPE OF &derivation]
  [WITH SYNTAX &Type]
  [EQUALITY MATCHING RULE &equality-match]
  [ORDERING MATCHING RULE &ordering-match]
  [SUBSTRINGS MATCHING RULE &substrings-match]
  [SINGLE VALUE &single-valued]
  [COLLECTIVE &collective]
  [DUMMY &dummy]
  [NO USER MODIFICATION &no-user-modification]
  [USAGE &usage]
  ID &id
}


Authenticated ATTRIBUTE ::= {
  contentType |
  messageDigest |
-- begin added for VCE SCEP-support
  transactionID |
  messageType |
  pkiStatus |
  failInfo |
  senderNonce |
  recipientNonce,
-- end added for VCE SCEP-support
  ...,  -- add application-specific attributes here
  signingTime
}

SignerInfoAuthenticatedAttributes ::= CHOICE {
    aaSet         [0] IMPLICIT SET OF AttributePKCS-7 {{Authenticated}},
    aaSequence    [2] EXPLICIT SEQUENCE OF AttributePKCS-7 {{Authenticated}}
    -- Explicit because easier to compute digest on sequence of attributes and then reuse
    -- encoded sequence in aaSequence.
}

-- Also defined in X.501
-- Redeclared here as a parameterized type
AttributePKCS-7 { ATTRIBUTE:IOSet } ::= SEQUENCE {
   type    ATTRIBUTE.&id({IOSet}),
   values  SET SIZE (1..MAX) OF ATTRIBUTE.&Type({IOSet}{@type})
}

-- Inlined from PKCS5v2-0 since it is the only thing imported from that module
-- AlgorithmIdentifier { ALGORITHM-IDENTIFIER:InfoObjectSet } ::=
AlgorithmIdentifier { TYPE-IDENTIFIER:InfoObjectSet } ::=
SEQUENCE {
--  algorithm ALGORITHM-IDENTIFIER.&id({InfoObjectSet}),
  algorithm TYPE-IDENTIFIER.&id({InfoObjectSet}),
--  parameters ALGORITHM-IDENTIFIER.&Type({InfoObjectSet}
  parameters TYPE-IDENTIFIER.&Type({InfoObjectSet}
    {@algorithm}) OPTIONAL }

-- Private-key information syntax

PrivateKeyInfo ::= SEQUENCE {
  version Version,
--  privateKeyAlgorithm AlgorithmIdentifier {{PrivateKeyAlgorithms}},
  privateKeyAlgorithm AlgorithmIdentifier {{...}},
  privateKey PrivateKey,
  attributes [0] Attributes OPTIONAL }

Solution

There is no ASN.1 construct called a "double brace". Each single brace (even when nested) is a separate token. AttributePKCS-7 is a parameterized definition that takes an Information Object Set as its parameter. The outer pair of braces indicates parameter substitution, while the inner pair of braces indicates that Authenticated is an Information Object Set (which is used as the parameter).

The purpose of the information object set is to restrict the possible values of certain fields to those contained in the object set. Look at the definition of AttributePKCS-7 to see which components are being restricted by the object set.

As for the {{...}}, this is similar to the above except that the object set is an empty extensible object set (indicated by the {...}) which is being used as a parameter (indicated by the outer pair of braces).



Answered By - Paul Thorpe
Answer Checked By - David Goodson (PHPFixing Volunteer)

[FIXED] How to produce a sequence of parallel XML elements (STag content ETag) using its grammar?

 December 13, 2022     context-free-grammar, parsing, syntax, xml     No comments   

Issue

I refer to this link for the following grammar,

[1]  document      ::=      prolog element Misc*
[39] element       ::=      STag content ETag
[43] content       ::=      CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*

Obviously, we can produce elements (like, <p>hello world</p>) by decomposing

  • element to <p> content </p>, and then
  • content to hello world

But, what I am wondering is how to produce a sequence of parallel elements, like below,

<p>hello world</p>
<p>hello world</p>
<p>hello world</p>
<p>hello world</p>

It seems that we can only decompose the element in the grammar into nested elements, like below,

<p>
   <p>
       <p>hello world</p>
   </p>
</p>

From what I understand, in order to produce a sequence of parallel elements, we need to use a grammar like the following one,

document      ::=      prolog elements Misc*
elements      ::=      STag content ETag (STag content ETag)*
content       ::=      CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*

So, did I miss anything?


Solution

The linked grammar says that:

  • a document must have a single top-level element, and
  • an element (via content) can contain zero or more (child) elements.

So,

<p>hello world</p>
<p>hello world</p>

isn't a well-formed document, but

<something>
  <p>hello world</p>
  <p>hello world</p>
</something>

is a well-formed document.
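This can be checked with any conforming XML parser; a quick sketch with Python's standard library (an illustration, not part of the original answer):

```python
import xml.etree.ElementTree as ET

# Two parallel top-level elements are rejected: a document needs one root.
parse_error = None
try:
    ET.fromstring("<p>hello world</p><p>hello world</p>")
except ET.ParseError as exc:
    parse_error = exc
print("not well-formed:", parse_error)

# The same elements wrapped in a single root element are accepted.
root = ET.fromstring("<something><p>hello world</p><p>hello world</p></something>")
print(root.tag, len(root))  # something 2
```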


Your suggested grammar would allow

<p>hello world</p>
<p>hello world</p>

as a document (well, not quite, because it doesn't allow the line-break between the two elements), but then you're not talking about XML documents any more.



Answered By - Michael Dyck
Answer Checked By - Pedro (PHPFixing Volunteer)

Monday, December 12, 2022

[FIXED] How does an empty regular expression evaluate?

 December 12, 2022     grammar, parsing, regex, syntax     No comments   

Issue

For doing something like the following:

select regexp_matches('X', '');

Is a regular expression consisting of the empty string defined behavior? If so, how does it normally work?

In other words, which of the following is the base production (ignoring some of the advanced constructs such as repetition, grouping, etc.)?

regex
    : atom+
    ;

Or:

regex
    : atom*
    ;

As an example:


regex101 shows no match for all 7 flavors, but Postgres returns true on select regexp_matches('X', '');.


Solution

The empty regex, by definition, matches the empty string. In a substring match (which is what PostgreSQL's regex_match performs), the match always succeeds since the empty string is a substring of every string, including itself. So it's not a very useful query, but it should work with any regex implementation. (It might be more useful as a full string match, but string equality would also work and probably with less overhead.)

One aspect of empty matches which does vary between regex implementations is how they interact with the "global" (repeated application) flag or equivalent. Most regex engines will advance one character after a successful zero-length substring match, but there are exceptions. As a general rule, nullable regexes (including the empty regex) should not be used with a repeated application flag unless the result is explicitly documented by the regex library (and, for what it's worth, I couldn't find such documentation for PostgreSQL, but that doesn't mean that it doesn't exist somewhere).
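Both points can be sketched with Python's re module (an illustration of typical engine behavior, not PostgreSQL's implementation):

```python
import re

# The empty pattern matches the empty string at position 0 of any input,
# so a substring search always succeeds.
m = re.search("", "X")
print(m.span())  # (0, 0)

# Under repeated application, most engines advance one position after each
# zero-length match: a string of length n yields n + 1 empty matches.
print(re.findall("", "abc"))  # ['', '', '', '']

# The same behavior makes substitution insert at every position.
print(re.sub("", "-", "ab"))  # -a-b-
```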



Answered By - rici
Answer Checked By - Cary Denson (PHPFixing Admin)

Wednesday, November 2, 2022

[FIXED] how to parse windows inf files for python?

 November 02, 2022     drivers, file, parsing, python, windows     No comments   

Issue

Please help me. Here is an example INF file:

;=============================================================================
;
; Copyright (c) Intel Corporation (2002).
;
; INTEL MAKES NO WARRANTY OF ANY KIND REGARDING THE CODE.  THIS CODE IS
; LICENSED ON AN "AS IS" BASIS AND INTEL WILL NOT PROVIDE ANY SUPPORT,
; ASSISTANCE, INSTALLATION, TRAINING OR OTHER SERVICES.  INTEL DOES NOT
; PROVIDE ANY UPDATES, ENHANCEMENTS OR EXTENSIONS.  INTEL SPECIFICALLY
; DISCLAIMS ANY WARRANTY OF MERCHANTABILITY, NONINFRINGEMENT, FITNESS FOR ANY
; PARTICULAR PURPOSE, OR ANY OTHER WARRANTY.  Intel disclaims all liability,
; including liability for infringement of any proprietary rights, relating to
; use of the code. No license, express or implied, by estoppel or otherwise,
; to any intellectual property rights is granted herein.
;
;=============================================================================

; Installation inf for the Intel Corporation graphics adapter.

[Version]
Signature="$WINDOWS NT$"
Provider=%Intel%
ClassGUID={4D36E968-E325-11CE-BFC1-08002BE10318}
Class=Display
CatalogFile=i830mnt5.cat

DriverVer=08/20/2004,6.14.10.3889

[DestinationDirs]
DefaultDestDir   = 11
ialm.Miniport  = 12  ; drivers
ialm.Display   = 11  ; system32
Help.Copy = 11
CUI.Copy = 11
Uninstall_Copy = 11

OpenGL.Copy    = 11  ; OpenGL Drivers in System32

;
; Driver information
;

[Manufacturer]
%Intel%   = Intel.Mfg

[Intel.Mfg]
;830
;%i830M% = i830M, PCI\VEN_8086&DEV_3577
%i830M% = i830M, PCI\VEN_8086&DEV_3577&SUBSYS_00C81028
%i830M% = i830M, PCI\VEN_8086&DEV_3577&SUBSYS_01221028
%i830M% = i830M, PCI\VEN_8086&DEV_3577&SUBSYS_00B81028
%i830M% = i830M, PCI\VEN_8086&DEV_3577&SUBSYS_00B91028
%i830M% = i830M, PCI\VEN_8086&DEV_3577&SUBSYS_00F51028

;845
;%iBKDG% = i845G, PCI\VEN_8086&DEV_2562
%iBKDG% = i845G, PCI\VEN_8086&DEV_2562&SUBSYS_013D1028
%iBKDG% = i845G, PCI\VEN_8086&DEV_2562&SUBSYS_01471028
%iBKDG% = i845G, PCI\VEN_8086&DEV_2562&SUBSYS_03011028
%iBKDG% = i845G, PCI\VEN_8086&DEV_2562&SUBSYS_013A1028
%iBKDG% = i845G, PCI\VEN_8086&DEV_2562&SUBSYS_01481028
%iBKDG% = i845G, PCI\VEN_8086&DEV_2562&SUBSYS_01381028
%iBKDG% = i845G, PCI\VEN_8086&DEV_2562&SUBSYS_01261028
%iBKDG% = i845G, PCI\VEN_8086&DEV_2562&SUBSYS_01271028
%iBKDG% = i845G, PCI\VEN_8086&DEV_2562&SUBSYS_01331028
%iBKDG% = i845G, PCI\VEN_8086&DEV_2562&SUBSYS_014B1028
%iBKDG% = i845G, PCI\VEN_8086&DEV_2562&SUBSYS_01601028
%iBKDG% = i845G, PCI\VEN_8086&DEV_2562&SUBSYS_01611028
%iBKDG% = i845G, PCI\VEN_8086&DEV_2562&SUBSYS_01291028
%iBKDG% = i845G, PCI\VEN_8086&DEV_2562&SUBSYS_01461028
%iBKDG% = i845G, PCI\VEN_8086&DEV_2562&SUBSYS_03031028

;845GM
%iBKDGM% = i845GM, PCI\VEN_8086&DEV_2562&SUBSYS_01491028

How can I parse this file and extract the fields from a line like "%iBKDGM% = i845GM, PCI\VEN_8086&DEV_2562&SUBSYS_01491028"?


Solution

You may try the built-in ConfigParser (configparser in Python 3):

http://docs.python.org/library/configparser.html

As well as ConfigObj:

http://code.google.com/p/configobj/

Both claim to be able to handle Windows INI files.
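As a minimal sketch with Python 3's configparser (the relaxed options below are assumptions for INF input: interpolation off because of literal % signs, non-strict mode because real INF sections repeat keys, and ; treated as a comment prefix). The section and key names are taken from the INF excerpt above:

```python
import configparser

inf_text = """\
[Version]
Signature="$WINDOWS NT$"
Provider=%Intel%

[Intel.Mfg]
%iBKDGM% = i845GM, PCI\\VEN_8086&DEV_2562&SUBSYS_01491028
"""

cfg = configparser.ConfigParser(
    interpolation=None,              # INF keys/values use literal % signs
    strict=False,                    # real INF sections repeat keys
    inline_comment_prefixes=(";",),  # "= 12  ; drivers" style comments
)
cfg.read_string(inf_text)

# Option names are lower-cased by default (optionxform).
value = cfg["Intel.Mfg"]["%ibkdgm%"]
model, hwid = [part.strip() for part in value.split(",", 1)]
print(model)  # i845GM
print(hwid)   # PCI\VEN_8086&DEV_2562&SUBSYS_01491028
```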



Answered By - NerdyNick
Answer Checked By - Willingham (PHPFixing Volunteer)

Monday, October 31, 2022

[FIXED] How do I detect end of file in Ruby?

 October 31, 2022     csv, eof, parsing, ruby     No comments   

Issue

I wrote the following script to read a CSV file:

f = File.open("aFile.csv")
text = f.read
text.each_line do |line|
  if (f.eof?)
    puts "End of file reached"
  else
    line_num +=1
    if(line_num < 6) then
      puts "____SKIPPED LINE____"
      next
    end
  end

  arr = line.split(",")
  puts "line number  = #{line_num}" 
end

This code runs fine if I take out the line:

 if (f.eof?)
     puts "End of file reached"

With this line in I get an exception.

I was wondering how I can detect the end of file in the code above.


Solution

https://www.ruby-forum.com/topic/218093#946117 talks about this.

content = File.read("file.txt")
content = File.readlines("file.txt")

The above 'slurps' the entire file into memory.

File.foreach("file.txt") {|line| content << line}

You can also use IO#each_line. These last two options do not read the entire file into memory. The use of the block makes this automatically close your IO object as well. There are other ways as well, IO and File classes are pretty feature rich!

I refer to IO objects, as File is a subclass of IO. I tend to use IO when I don't really need the added methods from File class for the object.

This way you don't need to deal with EOF at all; Ruby handles it for you.

Sometimes the best way to handle EOF is not to handle it, when you really don't need to. (If you do need it, Ruby of course has a method for it: IO#eof?.)



Answered By - vgoff
Answer Checked By - Pedro (PHPFixing Volunteer)

Friday, October 28, 2022

[FIXED] How to handle empty strings with `datenum`

 October 28, 2022     date, is-empty, matlab, parsing     No comments   

Issue

I have a comma-separated text file I am reading in and parsing using textscan. Two of the fields are the date and time of day. I am able to convert both fields to fractional days using datenum, with the intention to sum the two resulting vectors.

My problem is that every so often one of the data messages includes the TIME field but not the DATE field. This is read in by textscan as an empty string. I have found that when datenum encounters the empty string, it returns an empty matrix rather than a NaN value or other filler value. This results in having vectors for TIME and DATE that are not the same length, and no obvious indicator of how to align the data.

How can I handle these empty strings in such a way that preserves the order of the data? Is there a way to get datenum to output a null value rather than simply ignoring the field? I would be fine with having a NaN or 0 or similar value to indicate the empty string. I would prefer to keep this vectorized if possible, but I understand a for loop may be necessary.


Solution

One easy way would be to use logical indexing to process only your valid dates, and initialize the empty ones to 0 in the output. For example, if you have your dates in a cell array C, you could use cellfun and isempty to get the index like so:

index = cellfun(@isempty, C);
out(index) = 0;  % Empty dates are 0 in output
out(~index) = datenum(C(~index), 'ddmmyy');

Alternatively, you could first replace your empty strings with '0/0/0', which will be converted to a 0 by datenum. For example:

C(cellfun(@isempty, C)) = {'0/0/0'};

However, this conversion doesn't work with your specific 'ddmmyy' format (i.e. datenum('000000', 'ddmmyy') doesn't ever return 0, even when specifying the PivotYear argument). The first option may be your best bet.



Answered By - gnovice
Answer Checked By - Marie Seifert (PHPFixing Admin)

Tuesday, October 25, 2022

[FIXED] How do I do a partial match in Elasticsearch?

 October 25, 2022     elasticsearch, json, parsing, regex, url     No comments   

Issue

I have a link like http://drive.google.com and I want to match "google" out of the link.

I have:

query: {
    bool : {
        must: {
            match: { text: 'google'} 
        }
    }
}

But this only matches if the whole text is 'google' (case insensitive, so it also matches Google or GooGlE etc). How do I match for the 'google' inside of another string?


Solution

The point is that the ElasticSearch regex you are using requires a full string match:

Lucene’s patterns are always anchored. The pattern provided must match the entire string.

Thus, to match any character (but a newline), you can use .* pattern:

match: { text: '.*google.*'}
                ^^      ^^

In ES6+, use regexp instead of match:

"query": {
   "regexp": { "text": ".*google.*"} 
}

One more variation is for cases when your string can have newlines: match: { text: '(.|\n)*google(.|\n)*'}. This awful (.|\n)* is a must in ElasticSearch because this regex flavor does not allow any [\s\S] workarounds, nor any DOTALL/Singleline flags. "The Lucene regular expression engine is not Perl-compatible but supports a smaller range of operators."

However, if you do not plan to match any complicated patterns and need no word boundary checking, regex search for a mere substring is better performed with a mere wildcard search:

{
    "query": {
        "wildcard": {
            "text": {
                "value": "*google*",
                "boost": 1.0,
                "rewrite": "constant_score"
            }
        }
    }
} 

See Wildcard search for more details.

NOTE: The wildcard pattern also needs to match the whole input string, thus

  • google* finds all strings starting with google
  • *google* finds all strings containing google
  • *google finds all strings ending with google

Also, bear in mind the only pair of special characters in wildcard patterns:

?, which matches any single character
*, which can match zero or more characters, including an empty one


Answered By - Wiktor Stribiżew
Answer Checked By - Katrina (PHPFixing Volunteer)

Tuesday, August 30, 2022

[FIXED] How can I read and parse CSV files in C++?

 August 30, 2022     c++, csv, parsing, text     No comments   

Issue

I need to load and use CSV file data in C++. At this point it can really just be a comma-delimited parser (i.e., don't worry about escaping newlines and commas). The main need is a line-by-line parser that will return a vector for the next line each time the method is called.

I found this article which looks quite promising: http://www.boost.org/doc/libs/1_35_0/libs/spirit/example/fundamental/list_parser.cpp

I've never used Boost's Spirit, but am willing to try it. But only if there isn't a more straightforward solution I'm overlooking.


Solution

If you don't care about escaping commas and newlines, and you can't embed commas and newlines in quotes (if you can't escape, then...), then it's only about three lines of code (OK, 14, but it's only 15 to read the whole file).

std::vector<std::string> getNextLineAndSplitIntoTokens(std::istream& str)
{
    std::vector<std::string>   result;
    std::string                line;
    std::getline(str,line);

    std::stringstream          lineStream(line);
    std::string                cell;

    while(std::getline(lineStream,cell, ','))
    {
        result.push_back(cell);
    }
    // This checks for a trailing comma with no data after it.
    if (!lineStream && cell.empty())
    {
        // If there was a trailing comma then add an empty element.
        result.push_back("");
    }
    return result;
}

I would just create a class representing a row.
Then stream into that object:

#include <iterator>
#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>
#include <string>

class CSVRow
{
    public:
        std::string_view operator[](std::size_t index) const
        {
            return std::string_view(&m_line[m_data[index] + 1], m_data[index + 1] -  (m_data[index] + 1));
        }
        std::size_t size() const
        {
            return m_data.size() - 1;
        }
        void readNextRow(std::istream& str)
        {
            std::getline(str, m_line);

            m_data.clear();
            m_data.emplace_back(-1);
            std::string::size_type pos = 0;
            while((pos = m_line.find(',', pos)) != std::string::npos)
            {
                m_data.emplace_back(pos);
                ++pos;
            }
            // This checks for a trailing comma with no data after it.
            pos   = m_line.size();
            m_data.emplace_back(pos);
        }
    private:
        std::string         m_line;
        std::vector<int>    m_data;
};

std::istream& operator>>(std::istream& str, CSVRow& data)
{
    data.readNextRow(str);
    return str;
}   
int main()
{
    std::ifstream       file("plop.csv");

    CSVRow              row;
    while(file >> row)
    {
        std::cout << "4th Element(" << row[3] << ")\n";
    }
}

But with a little work we could technically create an iterator:

class CSVIterator
{   
    public:
        typedef std::input_iterator_tag     iterator_category;
        typedef CSVRow                      value_type;
        typedef std::size_t                 difference_type;
        typedef CSVRow*                     pointer;
        typedef CSVRow&                     reference;

        CSVIterator(std::istream& str)  :m_str(str.good()?&str:nullptr) { ++(*this); }
        CSVIterator()                   :m_str(nullptr) {}

        // Pre Increment
        CSVIterator& operator++()               {if (m_str) { if (!((*m_str) >> m_row)){m_str = nullptr;}}return *this;}
        // Post increment
        CSVIterator operator++(int)             {CSVIterator    tmp(*this);++(*this);return tmp;}
        CSVRow const& operator*()   const       {return m_row;}
        CSVRow const* operator->()  const       {return &m_row;}

        bool operator==(CSVIterator const& rhs) {return ((this == &rhs) || ((this->m_str == nullptr) && (rhs.m_str == nullptr)));}
        bool operator!=(CSVIterator const& rhs) {return !((*this) == rhs);}
    private:
        std::istream*       m_str;
        CSVRow              m_row;
};


int main()
{
    std::ifstream       file("plop.csv");

    for(CSVIterator loop(file); loop != CSVIterator(); ++loop)
    {
        std::cout << "4th Element(" << (*loop)[3] << ")\n";
    }
}

Now that we are in 2020, let's add a CSVRange object:

class CSVRange
{
    std::istream&   stream;
    public:
        CSVRange(std::istream& str)
            : stream(str)
        {}
        CSVIterator begin() const {return CSVIterator{stream};}
        CSVIterator end()   const {return CSVIterator{};}
};

int main()
{
    std::ifstream       file("plop.csv");

    for(auto& row: CSVRange(file))
    {
        std::cout << "4th Element(" << row[3] << ")\n";
    }
}


Answered By - Martin York
Answer Checked By - Candace Johnson (PHPFixing Volunteer)

Monday, August 29, 2022

[FIXED] where do the trailing commas come from (perl)

 August 29, 2022     csv, parsing, perl, text-files     No comments   

Issue

Here is a Perl script that takes a tab-delimited output file and outputs three different text files, also tab-delimited. Another user on SO helped me correct a mistake that created extra whitespace at the end of each line in the output files. However, I now wish to output comma-delimited text instead. When I substitute print $Afile join( ",", @ADD) , "\n"; for print $Afile join( "\t", @ADD) , "\n"; I get two trailing commas at the end of each line in the output files. Where are these coming from?

#!/usr/bin/perl
use strict; use warnings;

die "usage: [ imputed genotype.file ]\n" unless @ARGV == 1;
my $imputed = $ARGV[0];  # input file name, also used to name the output files

open my $Afile, ">$imputed" . "_ADD.txt" or die $!;
open my $Dfile, ">$imputed" . "_DOM.txt" or die $!;
open my $Ifile, ">$imputed" . "_IMP.txt" or die $!;

<>; #skip header
while(<>){ 
  chomp;
  my @entries = split( '\t', $_ );

  my @ADD = ();
  my @DOM = ();
  my @IMP = ();

  push( @ADD, $entries[ 0 ], $entries[ 1 ], $entries[ 2 ]);
  push( @DOM, $entries[ 0 ], $entries[ 1 ], $entries[ 2 ]);
  push( @IMP, $entries[ 0 ], $entries[ 1 ], $entries[ 2 ]);

  for ( my $i = 3; $i < scalar @entries - 1 ; $i+=3 ) { ### for each entry per line
      push( @ADD, $entries[ $i ] );
      push( @DOM, $entries[ $i + 1 ] );

  $entries[ $i + 2 ] =~ s/^NA$//; 

      push( @IMP, $entries[ $i + 2 ] );
  }

  print $Afile join( "\t", @ADD) , "\n"; 
  print $Dfile join( "\t", @DOM) , "\n"; 
  print $Ifile join( "\t", @IMP) , "\n"; 

} ### for loop   

close $Afile;
close $Dfile;
close $Ifile;

Solution

Since tabs are whitespace characters, you do not see them with your current version, but you already have trailing tabs. They are caused by empty (undefined) elements in your arrays. You can filter them out with grep:

print $Afile join( ",", grep { $_ } @ADD) , "\n"; 

(Note that grep { $_ } also drops defined-but-false values such as 0 and the empty string; use grep { defined && length } if those can be valid data.)


Answered By - perreal
Answer Checked By - Marilyn (PHPFixing Volunteer)

Sunday, August 28, 2022

[FIXED] How to parse csv file in python to get this output?

 August 28, 2022     csv, parsing, python, python-3.x     No comments   

Issue

I have a csv file which contains data like that

Sample csv

Name Start End
John 12:00 13:00
John 12:10 13:00
John 12:20 13:20
Tom 12:00 13:10
John 13:50 14:00
Jerry 14:00 14:30
Alice 15:00 16:00
Jerry 11:00 15:00
  1. I need to find the average time taken by each person in Python. How do I do that?

Sample output

Avg time taken by different people are :

John: (60+50+60+10)/4 min
Tom: (70)/1 min
Jerry: (30+240)/2 min
Alice: (60)/1 min

I tried parsing the csv file by python csv

import datetime
import csv


with open('people.csv', 'r') as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(row['Start'],row['End'])

But I am unable to select the rows belonging to a particular person (e.g. Jerry) and find the difference between their times.

  2. I also need to find which person took the maximum time.

In this case, Jerry took the maximum time.

  3. I also need to merge each person's overlapping intervals.

For example, John has [12:00,13:00], [12:10,13:00], [12:20,13:20], [13:50,14:00].

Expected output: [12:00,13:20], [13:50,14:00]

Any help will be appreciated.


Solution

Here's another method, without using pandas:

import csv
from datetime import datetime, timedelta

with open("data.csv", "r") as f:
    f = csv.DictReader(f)
    data = [row for row in f]

diffs = {list(row.values())[0]: [] for row in data}
for row in data:
    vals = list(row.values())
    diffs[vals[0]].append(datetime.strptime(vals[2], "%H:%M") - datetime.strptime(vals[1], "%H:%M"))

diffs_avg = [str(timedelta(seconds = sum(map(timedelta.total_seconds, times)) / len(times))) for times in diffs.values()]
dict(zip(diffs.keys(), diffs_avg))

Output -

{'Alice': '1:00:00', 'Jerry': '2:15:00', 'John': '0:45:00', 'Tom': '1:10:00'}
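The merge operation from the question's third part is not covered above; here is a minimal sketch of the usual sort-and-sweep approach (the function name is illustrative):

```python
from datetime import datetime

def merge_intervals(intervals):
    """Merge overlapping ("HH:MM", "HH:MM") intervals; returns merged pairs."""
    parsed = sorted(
        (datetime.strptime(s, "%H:%M"), datetime.strptime(e, "%H:%M"))
        for s, e in intervals
    )
    merged = [list(parsed[0])]
    for start, end in parsed[1:]:
        if start <= merged[-1][1]:  # overlaps or touches the previous interval
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s.strftime("%H:%M"), e.strftime("%H:%M")) for s, e in merged]

print(merge_intervals([("12:00", "13:00"), ("12:10", "13:00"),
                       ("12:20", "13:20"), ("13:50", "14:00")]))
# [('12:00', '13:20'), ('13:50', '14:00')]
```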


Answered By - Zero
Answer Checked By - Mildred Charles (PHPFixing Admin)

Saturday, August 27, 2022

[FIXED] How do you dynamically identify unknown delimiters in a data file?

 August 27, 2022     csv, parsing, python, text-files, textinput     No comments   

Issue

I have three input data files. Each uses a different delimiter for the data contained therein. Data file one looks like this:

apples | bananas | oranges | grapes

data file two looks like this:

quarter, dime, nickel, penny

data file three looks like this:

horse cow pig chicken goat

(the change in the number of columns is also intentional)

The thought I had was to count the number of non-alpha characters, and presume that the highest count was the separator character. However, the files with non-space separators also have spaces before and after the separators, so the spaces win on all three files. Here's my code:

def count_chars(s):
    valid_seps=[' ','|',',',';','\t']
    cnt = {}
    for c in s:
        if c in valid_seps: cnt[c] = cnt.get(c,0) + 1
    return cnt

infile = 'pipe.txt' #or 'comma.txt' or 'space.txt'
records = open(infile,'r').read()
print count_chars(records)

It will print a dictionary with the counts of all the acceptable characters. In each case, the space always wins, so I can't rely on that to tell me what the separator is.

But I can't think of a better way to do this.

Any suggestions?


Solution

If you're using python, I'd suggest just calling re.split on the line with all valid expected separators:

>>> l = "big long list of space separated words"
>>> re.split(r'[ ,|;"]+', l)
['big', 'long', 'list', 'of', 'space', 'separated', 'words']

The only issue would be if one of the files used a separator as part of the data.

If you must identify the separator, your best bet is to count everything excluding spaces. If there are almost no occurrences, then it's probably space, otherwise, it's the max of the mapped characters.

Unfortunately, there's really no way to be sure. You may have space separated data filled with commas, or you may have | separated data filled with semicolons. It may not always work.
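A minimal sketch of that heuristic (the function name and candidate list are illustrative): count the non-space candidates first, and fall back to whitespace only when none of them appear.

```python
def guess_separator(sample, candidates=("|", ",", ";", "\t")):
    """Return the most frequent non-space candidate, or None for whitespace."""
    counts = {sep: sample.count(sep) for sep in candidates}
    best = max(counts, key=counts.get)
    if counts[best] == 0:
        return None  # no non-space separator found: assume whitespace
    return best

print(guess_separator("apples | bananas | oranges | grapes"))  # |
print(guess_separator("quarter, dime, nickel, penny"))         # ,
print(guess_separator("horse cow pig chicken goat"))           # None
```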



Answered By - JoshD
Answer Checked By - Timothy Miller (PHPFixing Admin)

Thursday, August 18, 2022

[FIXED] How would I parse JSON in Ruby to give me specific output

 August 18, 2022     httparty, json, output, parsing, ruby     No comments   

Issue

So I'm trying, in Ruby (I don't quite know how to express myself), to parse JSON from this API so that the output is:

{"infected"=>19334, "deceased"=>429, "recovered"=>14047, "tested"=>515395, "tested24hours"=>8393, "infected24hours"=>351, "deceased24hours"=>11, "sourceUrl"=>"https://covid19.rs/homepage-english/", "lastUpdatedAtApify"=>"2020-07-15T14:00:00.000Z", "readMe"=>"https://apify.com/krakorj/covid-serbia"}

and I would like it to show only, for example, the value of "infected"=>19334 as the number 19334.

I'm new to Ruby programming and I'm still learning it. Since it's the COVID pandemic and lockdown, I have more free time, and it kind of makes sense to make something related to it.

This is what I've done so far:

require 'httparty'
require 'json'

url = 'https://api.apify.com/v2/key-value-stores/aHENGKUPUhKlX97aL/records/LATEST?disableRedirect=true'
response = HTTParty.get(url)
re = response.parsed_response
puts re

Solution

Sure, you do it like this:

re["infected"]
 => 19334 

HTTParty's parsed response returns a hash, so you access the value by its key. If the key is a "string", you access it with a "string". If the key is a :symbol, you access it with a :symbol.



Answered By - benjessop
Answer Checked By - Candace Johnson (PHPFixing Volunteer)

Tuesday, July 26, 2022

[FIXED] How do I take a JSON string and select a random entry and get the variables from that entry

 July 26, 2022     c#, json, parsing     No comments   

Issue

I have the following JSON:

{
    "Followers": [{
        "ID": 0,
        "Username": "nutty",
        "Game": "Just Chatting",
        "Viewers": 200,
        "Image": "https://static-cdn.jtvnw.net/previews-ttv/live_user_nutty-1920x1080.jpg"
    }, {
        "ID": 1,
        "Username": "CloneKorp",
        "Game": "Software and Game Development",
        "Viewers": 31,
        "Image": "https://static-cdn.jtvnw.net/previews-ttv/live_user_clonekorp-1920x1080.jpg"
    }, {
        "ID": 2,
        "Username": "kingswarrior9953",
        "Game": "Art",
        "Viewers": 1,
        "Image": "https://static-cdn.jtvnw.net/previews-ttv/live_user_kingswarrior9953-1920x1080.jpg"
    }]
}

I'd like to do something like..

JObject data = JObject.Parse(json);
int SelectedViewers = data["Followers"][1]["Viewers"];

Where it would grab the second entry (the one with ID 1) and set the variable SelectedViewers to 31. Eventually the index would be a random number based on the count of all the entries, but I'm not at that point yet.

However, this doesn't seem to work. Any ideas on what is broken here?


Solution

You are missing a cast here:

int SelectedViewers = Int32.Parse((string)data["Followers"][1]["Viewers"]);

The above should work.



Answered By - Niraj Bihani
Answer Checked By - Terry (PHPFixing Volunteer)

Monday, July 25, 2022

[FIXED] How to decode JSON in Flutter?

 July 25, 2022     dart, decode, flutter, json, parsing     No comments   

Issue

How to decode JSON in Flutter?

The question is simple, but the answer isn't, at least for me.

I have a project that uses a lot of JSON Strings. Basically, the entire communication between the app and the server is through JSON.

I have been using JSON.decode(json_string) to deal with it, but today I updated the Flutter core (0.5.8-pre.178) and JSON.decode isn't available anymore.

I went to the Flutter Docs to seek help, but it still says to use JSON.decode.

So, how to decode JSON in Flutter from now on?


Solution

Just use

json.decode()

or

jsonDecode()

In Dart 2 all screaming-case constants were changed to lower-camel-case.

Be sure to import 'dart:convert';



Answered By - Günter Zöchbauer
Answer Checked By - Gilberto Lyons (PHPFixing Admin)

[FIXED] How to parse data in JSON format

 July 25, 2022     json, parsing, python     No comments   

Issue

My project is currently receiving a JSON message in python which I need to get bits of information out of. For the purposes of this, let's set it to some simple JSON in a string:

jsonStr = '{"one" : "1", "two" : "2", "three" : "3"}'

So far I've been generating JSON requests using a list and then json.dumps, but to do the opposite of this I think I need to use json.loads. However I haven't had much luck with it. Could anyone provide me a snippet that would return "2" with the input of "two" in the above example?


Solution

Very simple:

import json
data = json.loads('{"one" : "1", "two" : "2", "three" : "3"}')
print(data['two'])
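If a key might be missing, dict.get avoids the KeyError that plain indexing raises; a small extension of the same snippet:

```python
import json

data = json.loads('{"one" : "1", "two" : "2", "three" : "3"}')
print(data["two"])            # plain indexing; raises KeyError if the key is absent
print(data.get("four", "?"))  # .get returns a default instead of raising
```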


Answered By - John Giotta
Answer Checked By - Gilberto Lyons (PHPFixing Admin)

Sunday, July 24, 2022

[FIXED] How to read a text file and parse it to JSON format

 July 24, 2022     json, parsing, python, python-3.x, text     No comments   

Issue

I have a text file with the following format.

Order|AA|BB|CC|DD
2|status1|Cdd.int|true|false
12|status2|Cdd.String|true|false
1|status3|Cdd.Float|false|true
  1. I would like to read this text file.

  2. I would also like to append metadata with an empty value.

  3. I am only interested in Order, AA, BB, and CC, sorted by Order and then parsed into JSON format as follows.

The expected output looks like the following.

{
  "fields": [
    {
      "metadata": {},
      "name": "status3",
      "type": "Float",
      "nullable": false
    },
    {
      "metadata": {},
      "name": "status1",
      "type": "int",
      "nullable": true
    },
    {
      "metadata": {},
      "name": "status2",
      "type": "String",
      "nullable": true
    }
  ],
  "type": "struct"
}

Can anyone help with this?


Solution

Assuming we have the data stored in a txt file file.txt as follows:

Order|AA|BB|CC|DD
2|status1|Cdd.int|true|false
12|status2|Cdd.String|true|false
1|status3|Cdd.Float|false|true

The following code does what you need (explained in the comments of the code itself):

import pandas as pd
import json

#Read the pipe-delimited file
df = pd.read_csv("file.txt", sep = "|")

#Sort by the Order column, as requested
df.sort_values("Order", inplace = True)

headers = {"AA": "name",
           "BB": "type",
           "CC": "nullable"
}

#Drop columns which are not in the headers dict
df.drop([c for c in df.columns if c not in headers.keys()], inplace=True, axis=1)

#Rename columns based on the headers dict
df.rename(columns = headers, inplace = True)

#Format the type column ("Cdd.Float" -> "float")
df["type"] = df["type"].str.split(".").str[1].str.lower()

#Build your final dict
output = {"fields": [], "type": "struct"}
for n, row in df.iterrows():
    data_dict = {"metadata": {}}
    data_dict.update(row.to_dict())
    output["fields"].append(data_dict)

#Save json
with open("output.json", "w") as f:
    json.dump(output, f, indent = 4)

The output json (output.json), now sorted by Order, is as follows:

{
    "fields": [
        {
            "metadata": {},
            "name": "status3",
            "type": "float",
            "nullable": false
        },
        {
            "metadata": {},
            "name": "status1",
            "type": "int",
            "nullable": true
        },
        {
            "metadata": {},
            "name": "status2",
            "type": "string",
            "nullable": true
        }
    ],
    "type": "struct"
}

Hope it helps!
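If pandas is not available, the same transformation can be sketched with just the standard library (a hedged alternative to the answer above; it recreates file.txt inline so the snippet is self-contained):

```python
import csv
import json

# Recreate file.txt inline so the demo is self-contained.
with open("file.txt", "w") as f:
    f.write("Order|AA|BB|CC|DD\n"
            "2|status1|Cdd.int|true|false\n"
            "12|status2|Cdd.String|true|false\n"
            "1|status3|Cdd.Float|false|true\n")

with open("file.txt") as f:
    rows = list(csv.DictReader(f, delimiter="|"))

# Sort by the numeric Order column, then keep only the wanted fields.
rows.sort(key=lambda r: int(r["Order"]))
fields = [
    {
        "metadata": {},
        "name": r["AA"],
        "type": r["BB"].split(".")[1].lower(),  # "Cdd.Float" -> "float"
        "nullable": r["CC"] == "true",
    }
    for r in rows
]
output = {"fields": fields, "type": "struct"}

with open("output.json", "w") as f:
    json.dump(output, f, indent=4)
```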



Answered By - Álvaro Cuartero Montilla
Answer Checked By - Marilyn (PHPFixing Volunteer)

Saturday, July 23, 2022

[FIXED] How to parse this JSON file in Snowflake?

 July 23, 2022     database, json, parsing, snowflake-cloud-data-platform, sql     No comments   

Issue

So I have a column in a Snowflake table that stores JSON data but the column is of a varchar data type.

The JSON looks like this:

{
    "FLAGS": [],
    "BANNERS": {},
    "TOOLS": {
        "game.appConfig": {
            "type": ["small", "normal", "huge"],
            "flow": ["control", "noncontrol"]
        }
    },
    "PLATFORM": {}
}

I want to filter only the data inside TOOLS and want to get the following result:

TOOLS_ID        TOOLS
game.appConfig  type
game.appConfig  flow

How can I achieve this?


Solution

I assumed that TOOLS can contain more than one tool ID, so I wrote this query:

with mydata as ( select
'{
    "FLAGS": [],
    "BANNERS": {},
    "TOOLS": {
        "game.appConfig": {
            "type": ["small", "normal", "huge"],
            "flow": ["control", "noncontrol"]
        }
    },
    "PLATFORM": {}
}' as v1 )
select main.KEY TOOLS_ID, sub.KEY TOOLS
from mydata,
lateral flatten ( parse_json(v1):"TOOLS" ) main,
lateral flatten ( main.VALUE ) sub;

+----------------+-------+
|    TOOLS_ID    | TOOLS |
+----------------+-------+
| game.appConfig | flow  |
| game.appConfig | type  |
+----------------+-------+
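For intuition, the two LATERAL FLATTENs correspond to two nested loops over the parsed JSON; a plain-Python sketch of the same logic (illustration only, not a Snowflake feature):

```python
import json

doc = json.loads('{"FLAGS": [], "BANNERS": {}, '
                 '"TOOLS": {"game.appConfig": {"type": ["small", "normal", "huge"], '
                 '"flow": ["control", "noncontrol"]}}, "PLATFORM": {}}')

# Outer loop = first FLATTEN (tool IDs); inner loop = second FLATTEN (keys per tool).
pairs = [(tool_id, key)
         for tool_id, tool in doc["TOOLS"].items()
         for key in tool]
for tool_id, key in pairs:
    print(tool_id, key)
```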


Answered By - Gokhan Atil
Answer Checked By - Timothy Miller (PHPFixing Admin)

Wednesday, July 20, 2022

[FIXED] Which is the fastest way to convert an integer to a byte array in Julia

 July 20, 2022     arrays, hex, integer, julia, parsing     No comments   

Issue

Question 1:Which is the fastest way to convert an integer to byte array?

a = 1026
aHexStr = string(a,base = 16,pad = 4) #2 bytes, 4 chars
b = zeros(UInt8,2)
k = 1
for i in 1:2:4
  b[k] = parse(UInt8,aHexStr[i:i+1],base = 16)
  k += 1
end

Is this method the fastest?

Related Question 2: Which is the fastest way to convert a hexadecimal string to byte array?

I have a string of hexadecimal numbers

a = "ABCDEF12345678"

How can I convert this hex string to byte array?

b = zeros(UInt8,7)
k = 1
for i in 1:2:14
  b[k] = parse(UInt8,a[i:i+1],base = 16)
  k += 1
end

Is this method the fastest?


Solution

For the first operation I assume that you want to keep only as many bytes as are actually used by your integer, so you could do:

julia> a = 1026
1026

julia> [(a>>((i-1)<<3))%UInt8 for i in 1:sizeof(a)-leading_zeros(a)>>3]
2-element Vector{UInt8}:
 0x02
 0x04

Explanation:

  • leading_zeros(a): the number of zero bits a starts with
  • leading_zeros(a)>>3: the number of bytes that are fully empty (>>3 shifts the number right by 3 bits, i.e. floor division by 8)
  • sizeof(a)-leading_zeros(a)>>3: the number of bytes to convert
  • (i-1)<<3: the number of bits to shift for index i (i.e. (i-1) times 8)
  • (a>>((i-1)<<3))%UInt8: the (i-1)-th byte of a (0-based)

For the second operation I assume that, if you have an odd number of characters, the remaining part of the last byte is filled with 0 bits, and that we do not need to check whether the passed data is valid:

julia> a = "ABCDEF12345678"
"ABCDEF12345678"

julia> function s2b(a::String)
           b = zeros(UInt8, (sizeof(a) + 1) >> 1)
           for (i, c) in enumerate(codeunits(a))
               b[(i+1)>>1] |= (c - (c < 0x40 ? 0x30 : 0x37))<<(isodd(i)<<2)
           end
           return b
       end
s2b (generic function with 1 method)

julia> s2b(a)
7-element Vector{UInt8}:
 0xab
 0xcd
 0xef
 0x12
 0x34
 0x56
 0x78

Both methods should be fast, but it is hard to guarantee that they are the fastest possible.


EDIT

Benchmarks:

julia> function f1(a)
           aHexStr = string(a,base = 16,pad = 4) #2 bytes, 4 chars
           b = zeros(UInt8,2)
               k = 1
           for i in 1:2:4
               b[k] = parse(UInt8,aHexStr[i:i+1],base = 16)
               k += 1
           end
           return b
       end
f1 (generic function with 1 method)

julia> f2(a) = [(a>>((i-1)<<3))%UInt8 for i in 1:sizeof(a)-leading_zeros(a)>>3]
f2 (generic function with 1 method)

julia> using BenchmarkTools

julia> a = 1026
1026

julia> @btime f1($a)
  141.795 ns (5 allocations: 224 bytes)
2-element Vector{UInt8}:
 0x04
 0x02

julia> @btime f2($a)
  29.317 ns (1 allocation: 64 bytes)
2-element Vector{UInt8}:
 0x02
 0x04

julia> function s2b(a::String)
           b = zeros(UInt8, (sizeof(a) + 1) >> 1)
           for (i, c) in enumerate(codeunits(a))
               b[(i+1)>>1] |= (c - (c < 0x40 ? 0x30 : 0x37))<<(isodd(i)<<2)
           end
           return b
       end
s2b (generic function with 1 method)

julia> a = "ABCDEF12345678"
"ABCDEF12345678"

julia> @btime hex2bytes($a)
  50.000 ns (1 allocation: 64 bytes)
7-element Vector{UInt8}:
 0xab
 0xcd
 0xef
 0x12
 0x34
 0x56
 0x78

julia> @btime s2b($a)
  48.830 ns (1 allocation: 64 bytes)
7-element Vector{UInt8}:
 0xab
 0xcd
 0xef
 0x12
 0x34
 0x56
 0x78

As @SundarR commented, in the latter case the built-in hex2bytes should be used; I had forgotten that it exists.
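As a cross-language aside (not part of the Julia benchmark above), Python's standard library happens to have direct analogues of both operations:

```python
a = 1026
# Keep only the bytes actually used, like f2 above; 1026 = 0x0402 fits in 2 bytes.
n = max(1, (a.bit_length() + 7) // 8)
little = a.to_bytes(n, "little")  # little-endian, matching f2's byte order
print(list(little))

# Analogue of Julia's hex2bytes for the second question.
b = bytes.fromhex("ABCDEF12345678")
print(b.hex())
```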



Answered By - Bogumił Kamiński
Answer Checked By - Clifford M. (PHPFixing Volunteer)

Tuesday, July 19, 2022

[FIXED] How do I parse a string to a float or int?

 July 19, 2022     floating-point, integer, parsing, python, type-conversion     No comments   

Issue

  • How can I convert a str to float?
    "545.2222"  →  545.2222
    
  • How can I convert a str to int?
    "31"        →  31
    

Solution

>>> a = "545.2222"
>>> float(a)
545.22220000000004
>>> int(float(a))
545
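The interpreter transcript above is from an older Python; modern versions print 545.2222 because of repr improvements. If the input may be either an integer or a float string, a small helper can try both (parse_number is just an illustrative name):

```python
def parse_number(s):
    """Return an int when the string is integral, otherwise a float."""
    try:
        return int(s)
    except ValueError:
        return float(s)

print(parse_number("31"))
print(parse_number("545.2222"))
```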


Answered By - Harley Holcombe
Answer Checked By - Gilberto Lyons (PHPFixing Admin)

[FIXED] Why would R use the "L" suffix to denote an integer?

 July 19, 2022     integer, parsing, r, semantics     No comments   

Issue

In R we all know it is convenient for those times we want to ensure we are dealing with an integer to specify it using the "L" suffix like this:

1L
# [1] 1

If we don't explicitly tell R we want an integer it will assume we meant to use a numeric data type...

str( 1 * 1 )
# num 1
str( 1L * 1L )
# int 1

Why is "L" the preferred suffix, why not "I" for instance? Is there a historical reason?

In addition, why does R allow me to do (with warnings):

str(1.0L)
# int 1
# Warning message:
# integer literal 1.0L contains unnecessary decimal point 

But not..

str(1.1L)
# num 1.1
#Warning message:
#integer literal 1.1L contains decimal; using numeric value 

I'd expect the two cases to behave consistently, or for both to raise an error.


Solution

Why is "L" used as a suffix?

I've never seen it written down, but I theorise there are two reasons:

  1. Because R handles complex numbers, which are specified using the suffix "i", and "I" would look too similar to "i".

  2. Because R's integers are 32-bit long integers, so "L" appears to be sensible shorthand for referring to that data type.

The value a long integer can take depends on the word size. R does not natively support integers with a word length of 64-bits. Integers in R have a word length of 32 bits and are signed and therefore have a range of −2,147,483,648 to 2,147,483,647. Larger values are stored as double.

This wiki page has more information on common data types, their conventional names and ranges.

And also from ?integer

Note that current implementations of R use 32-bit integers for integer vectors, so the range of representable integers is restricted to about +/-2*10^9: doubles can hold much larger integers exactly.


Why do 1.0L and 1.1L return different types?

The reason 1.0L and 1.1L return different data types is that returning an integer for 1.1 would lose information, while for 1.0 it would not (though you might want to know that you no longer have a floating-point numeric). Buried deep within the lexical analyser (/src/main/gram.c:4463-4485) is this code (part of the function NumericValue()), which actually creates an int data type from a double input suffixed by an ASCII "L":

/* Make certain that things are okay. */
if(c == 'L') {
    double a = R_atof(yytext);
    int b = (int) a;
    /* We are asked to create an integer via the L, so we check that the
       double and int values are the same. If not, this is a problem and we
       will not lose information and so use the numeric value.
    */
    if(a != (double) b) {
        if(GenerateCode) {
            if(seendot == 1 && seenexp == 0)
                warning(_("integer literal %s contains decimal; using numeric value"), yytext);
            else {
                /* hide the L for the warning message */
                *(yyp-2) = '\0';
                warning(_("non-integer value %s qualified with L; using numeric value"), yytext);
                *(yyp-2) = (char)c;
            }
        }
        asNumeric = 1;
        seenexp = 1;
    }
}


Answered By - Simon O'Hanlon
Answer Checked By - Terry (PHPFixing Volunteer)
Copyright © PHPFixing