Issue
I am making a web project with puppeteer.
In one of the pages of the website i am running (i don't own it), out of 12 tables, there is a html table inside of a table, that inside the tbody
and tr
, there is two td
elements. No classes and no IDs.
The first td
is always the column and the second one is the value.
How can i convert this specific table into a line on an Excel spreadsheet or a .csv with JavaScript and Puppeteer?
Dave's answer below only works if you have the page's html and if the content. works, which is not working for me, even in an example.
Here is the table i am trying to convert. The real one has 29 columns.
<table style="color:Black;background-color:LightGrey;height:400px;width:898px;border: 1px solid grey">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>Column1</td>
<td>Value1</td>
</tr>
<tr>
<td>Column2</td>
<td>Value2</td>
</tr>
<tr>
<td>Column3</td>
<td>Value3</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
The .csv output should be:
Column1,Column2,Column3
Value1,Value2,Value3
Solution
This solution has been tested on your sample HTML page. You will have to craft the CSS selector to be specific enough on the actual page. And/or filter the resulting elementHandles
:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://renatocfrancisco.github.io/getTableWithPuppeteer/');
// Get array of elementHandles from page matching this CSS selector.
const elements = await page.$$('table table td');
const columns = [];
const values = [];
// Process the elementHandles.
let i = 0;
for (const element of elements) {
// Extract text from elementHandle.
const text = await element.evaluate(el => el.textContent);
if ((i++ % 2) == 0) {
// Even elements are columns.
columns.push(text);
} else {
// Odd elements are values.
values.push(text);
}
};
// Construct CSV string.
console.log(`${columns.join(',')}\n${values.join(',')}`);
await browser.close();
})();
Output:
$ node ./src/index.js column1,column2,column3 value1,value2,value3
update:
This is a variation that doesn't use Puppeteer; only fetch and Cheerio.js. It is much faster and uses less resources, so I recommend using this version when possible. (Your sample URL can be processed fine without Puppeteer.)
const cheerio = require('cheerio');
(async () => {
const fetch = (...args) => import('node-fetch').then(({default: fetch}) => fetch(...args));
const response = await fetch('https://renatocfrancisco.github.io/getTableWithPuppeteer/');
const html = await response.text();
const $ = cheerio.load(html);
// Get array of elements from page matching this CSS selector.
const elements = $('table table td');
const columns = [];
const values = [];
// Process the elementHandles.
let i = 0;
for (const element of elements) {
// Extract text from elementHandle.
const text = $(element).text();
if ((i++ % 2) == 0) {
// Even elements are columns.
columns.push(text);
} else {
// Odd elements are values.
values.push(text);
}
};
// Construct CSV string.
console.log(`${columns.join(',')}\n${values.join(',')}`);
})();
Output:
$ node ./src/index.js column1,column2,column3 value1,value2,value3
Answered By - Leftium Answer Checked By - Katrina (PHPFixing Volunteer)
0 Comments:
Post a Comment
Note: Only a member of this blog may post a comment.