PHPFixing

Monday, June 27, 2022

[FIXED] How can I make a graph of a website in python?

Tags: graph, python, python-2.7

Issue

I would like to write a Python script that takes a primary URL as input, for example https://stackoverflow.com/, and then recursively visits all the pages of the site to build a directed graph. The nodes are the pages of the site and its sub-pages, and there is an edge from node a to node b if and only if page a contains a link to page b. I assume something like this already exists, but I could not find it on Google. If you have any ideas, maybe using wget or something else, I would love to hear them.


Solution

I'll only give you pointers to what you'll need to build such a tool using basic Python:

  • First, you will need urllib to open URLs.
  • Then, you can use either regular expressions or BeautifulSoup to find the links in your pages. Regular expressions are cheaper on CPU but less precise; BeautifulSoup is a lenient HTML parser, meaning that it accepts malformed HTML.
  • You can then keep the URLs still to crawl in a list. For each link you find in a page, check against a set of already-seen URLs whether you have crawled it before, to avoid infinite cycles.
  • To build your graph, treat each new page you crawl as a new node and each link you find as a new directed edge, from the page containing the link to the page it points to.
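
The steps above can be sketched roughly as follows. This is only a minimal illustration under a few assumptions, not a finished crawler: it targets Python 3 (where urllib is split into urllib.request and urllib.parse) rather than 2.7, it swaps in the standard-library html.parser in place of BeautifulSoup so that nothing needs installing, and the names LinkParser, fetch, extract_links, crawl and max_pages are made up for this example. The graph is a plain dict mapping each page URL to the set of same-host URLs it links to.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch(url):
    """Download a page and return its text (hypothetical helper)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_links(html, base_url):
    """Return the absolute URLs linked from html, with #fragments removed."""
    parser = LinkParser()
    parser.feed(html)
    return {urldefrag(urljoin(base_url, href)).url for href in parser.hrefs}


def crawl(start_url, fetch_page=fetch, max_pages=50):
    """Breadth-first crawl of a single host.

    Returns a dict {page_url: set of same-host URLs that page links to}.
    max_pages is an assumed safety limit so the crawl always terminates.
    """
    host = urlparse(start_url).netloc
    graph = {}                  # node -> set of outgoing edges
    seen = {start_url}          # URLs already queued, to avoid cycles
    queue = deque([start_url])  # URLs still to crawl
    while queue and len(graph) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except (OSError, ValueError):
            html = ""  # unreachable page: keep the node, no outgoing edges
        links = {
            link for link in extract_links(html, url)
            if urlparse(link).netloc == host
        }
        graph[url] = links
        for link in links - seen:
            seen.add(link)
            queue.append(link)
    return graph
```

Calling crawl("https://stackoverflow.com/") would then return the adjacency mapping for up to max_pages pages. The fetch_page parameter is there only so that the download step can be replaced, for instance with a fake that serves pages from a dict while testing.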

Or you can use Scrapy, a Python framework made for crawling.



Answered By - Scharron
Answer Checked By - Timothy Miller (PHPFixing Admin)