PHPFixing

Friday, April 15, 2022

[FIXED] how do I get multiple a tag href links (in form of an array) from inside an iframe (puppeteer)?

Tags: iframe, javascript, node.js, puppeteer, web-scraping

Issue

I'm pretty new to coding, so this question might be easy to answer, but after searching the internet for two days without finding a real solution, I thought I'd just ask here.

As the title says, the website I want to scrape contains an iframe with an id attribute (call it iframeid). Somewhere inside this iframe is a div container with a class attribute (call it divclass) that contains, among other elements, multiple <a> tags. My goal is to get an array listing all the links from those <a> tags, but to date I have only achieved the following, through research and a bit of luck:

// Wait for the iframe element, then get the Frame it contains
const elementHandle = await page.waitForSelector('iframe#iframeid');
const frame = await elementHandle.contentFrame();
// Wait until at least one link exists inside the container
await frame.waitForSelector('div[class=divclass] a');
const x = 2; // which <a> tag I want
const oneA = await frame.$('div[class=divclass] a:nth-child(' + x + ')');
// jsonValue() is the public way to read the property value
const link = await (await oneA.getProperty('href')).jsonValue();
console.log(link);

It takes a variable and pulls the link of the corresponding <a> tag, but I can't figure out how to put it in a loop. On top of that, the number of <a> tags varies, which makes the loop even harder for me to write.

Would it even be possible to leave out the loop completely? I found similar Stack Overflow questions, but one, for example, only had a single <a> tag, which seems to change the code completely.

In the end I just want a working piece of code that I, as a newbie, can understand but that is still fairly compact. Thanks in advance for helping!

EDIT

My solution with the help of a comment:

const elementHandle = await page.waitForSelector('iframe#iframeid');
const frame = await elementHandle.contentFrame();
const thisDiv = await frame.waitForSelector('div[class=divclass]');
const xpath_expression = '//a[@href]';
// Wait inside the frame, not on the top-level page
await frame.waitForXPath(xpath_expression);
const links = await thisDiv.$x(xpath_expression);
// evaluate() passes thisDiv itself as the first argument, followed by the link handles
const link_urls = await thisDiv.evaluate((...links) => {
    return links.map(e => e.href);
}, ...links);
console.log(link_urls);

It does pull out some strange other links, though, but I'm just going to filter them out afterwards.
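
One likely reason for the extra links: an XPath that starts with // searches the whole document of the frame, even when it is run from thisDiv. A minimal sketch of a scoped alternative follows, assuming the container is a Puppeteer ElementHandle such as thisDiv. The helper name linksInside is hypothetical and not part of the original post.

```javascript
// Sketch only: `containerHandle` is assumed to be a Puppeteer ElementHandle,
// for example the `thisDiv` handle from the code in the question.
// `linksInside` is a hypothetical helper name.
async function linksInside(containerHandle) {
  // $$eval queries 'a[href]' relative to the container, so only links
  // that are descendants of the container are matched.
  return containerHandle.$$eval('a[href]', anchors => anchors.map(a => a.href));
}
```

Called as linksInside(thisDiv), this should resolve to an array of href strings without a manual loop, and without the container itself ending up in the list.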


Solution

As far as I know, every iframe can be treated as a separate page: the Frame returned by contentFrame() offers the same selector methods as the page itself. Here is the reference I used for the same kind of task: https://stackoverflow.com/a/54940865/17755263
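
To make that idea concrete, here is a rough sketch of collecting all links in one call, with the frame used the same way a page would be. It assumes page is a Puppeteer Page that has already navigated to the site. The helper name linksFromIframe and its parameters are made up for illustration and are not taken from the linked answer.

```javascript
// Sketch only: `page` is assumed to be a Puppeteer Page that has already
// loaded the target site. `linksFromIframe` is a hypothetical helper name.
async function linksFromIframe(page, iframeSelector, containerSelector) {
  // Wait for the <iframe> element and get the Frame behind it.
  const iframeHandle = await page.waitForSelector(iframeSelector);
  const frame = await iframeHandle.contentFrame();

  // From here on the frame is queried just like a page.
  const linkSelector = `${containerSelector} a[href]`;
  await frame.waitForSelector(linkSelector);

  // $$eval collects every match and maps it to its href in one step,
  // so the number of <a> tags does not need to be known in advance.
  return frame.$$eval(linkSelector, anchors => anchors.map(a => a.href));
}
```

With the placeholders from the question, the call would look like linksFromIframe(page, 'iframe#iframeid', 'div[class=divclass]').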



Answered By - Rohit Nehra
Answer Checked By - Willingham (PHPFixing Volunteer)