PHPFixing
  • Privacy Policy
  • TOS
  • Ask Question
  • Contact Us
  • Home
  • PHP
  • Programming
  • SQL Injection
  • Web3.0

Tuesday, August 2, 2022

[FIXED] How to convert specific HTML Table with specific columns/values to Excel using JavaScript and Puppeteer?

 August 02, 2022     converters, html-table, javascript, puppeteer     No comments   

Issue

I am making a web project with puppeteer.

In one of the pages of the website i am running (i don't own it), out of 12 tables, there is a html table inside of a table, that inside the tbody and tr, there is two td elements. No classes and no IDs.

The first td is always the column and the second one is the value.
How can i convert this specific table into a line on an Excel spreadsheet or a .csv with JavaScript and Puppeteer?

Dave's answer below only works if you have the page's html and if the content. works, which is not working for me, even in an example.

Here is the table i am trying to convert. The real one has 29 columns.

<table style="color:Black;background-color:LightGrey;height:400px;width:898px;border: 1px solid grey">
        <tbody>
            <tr>
                <td>
                    <table>
                        <tbody>
                            <tr>
                                <td>Column1</td>
                                <td>Value1</td>
                            </tr>
                            <tr>
                                <td>Column2</td>
                                <td>Value2</td>
                            </tr>
                            <tr>
                                <td>Column3</td>
                                <td>Value3</td>
                            </tr>
                        </tbody>
                    </table>
                </td>
            </tr>
        </tbody>
    </table>

The .csv output should be:

Column1,Column2,Column3
Value1,Value2,Value3

Solution

This solution has been tested on your sample HTML page. You will have to craft the CSS selector to be specific enough on the actual page. And/or filter the resulting elementHandles:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://renatocfrancisco.github.io/getTableWithPuppeteer/');

  // Get array of elementHandles from page matching this CSS selector.
  const elements = await page.$$('table table td');

  const columns = [];
  const values  = [];

  // Process the elementHandles.
  let i = 0;
  for (const element of elements) {
    // Extract text from elementHandle.
    const text = await element.evaluate(el => el.textContent);
    if ((i++ % 2) == 0) {
        // Even elements are columns.
        columns.push(text);
    } else {
        // Odd elements are values.
        values.push(text);
    }
  };

  // Construct CSV string.
  console.log(`${columns.join(',')}\n${values.join(',')}`);

  await browser.close();
})();

Output:

$ node ./src/index.js 
column1,column2,column3
value1,value2,value3

update:

This is a variation that doesn't use Puppeteer; only fetch and Cheerio.js. It is much faster and uses less resources, so I recommend using this version when possible. (Your sample URL can be processed fine without Puppeteer.)

const cheerio = require('cheerio');

(async () => {
  const fetch = (...args) => import('node-fetch').then(({default: fetch}) => fetch(...args));
  const response = await fetch('https://renatocfrancisco.github.io/getTableWithPuppeteer/');
  const html = await response.text();
  const $ = cheerio.load(html);

  // Get array of elements from page matching this CSS selector.
  const elements = $('table table td');

  const columns = [];
  const values  = [];

  // Process the elementHandles.
  let i = 0;
  for (const element of elements) {
    // Extract text from elementHandle.
    const text = $(element).text();
    if ((i++ % 2) == 0) {
        // Even elements are columns.
        columns.push(text);
    } else {
        // Odd elements are values.
        values.push(text);
    }
  };

  // Construct CSV string.
  console.log(`${columns.join(',')}\n${values.join(',')}`);
})();

Output:

$ node ./src/index.js 
column1,column2,column3
value1,value2,value3


Answered By - Leftium
Answer Checked By - Katrina (PHPFixing Volunteer)
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg
Newer Post Older Post Home

0 Comments:

Post a Comment

Note: Only a member of this blog may post a comment.

Total Pageviews

Featured Post

Why Learn PHP Programming

Why Learn PHP Programming A widely-used open source scripting language PHP is one of the most popular programming languages in the world. It...

Subscribe To

Posts
Atom
Posts
Comments
Atom
Comments

Copyright © PHPFixing