Scraping websites with Python

Retrieve the URLs in a website

The code below retrieves the links from a webpage. Note that the script is written for plain HTTP sites and may not work over HTTPS. If you need a more advanced scraper or crawler, you can contact us below.

If you want to contribute to building this website, contact us.

We are building this website to shape the next generation of hackers, with skills that keep pace as technology advances.

Help us build this community, because we need hackers and geeks, and you will figure out why we need them during your journey to becoming a great hacker.

# SCRAPE HTTP WEBSITES

 

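from urllib.request import urlopen
import sys

from bs4 import BeautifulSoup

# Fetch the page whose URL is given as the first command-line argument
html = urlopen(sys.argv[1])
bsobj = BeautifulSoup(html, 'html5lib')

# Collect each distinct href once; href=True skips <a> tags that have no href
links = []
for link in bsobj.find_all('a', href=True):
    if link['href'] not in links:
        links.append(link['href'])

links.sort()

# Print the links as a numbered list
for i, url in enumerate(links, start=1):
    print(i, ' ', url)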
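
The script above prints each href exactly as it appears in the page, so relative links such as /about come out without a host. Below is a minimal sketch of the same link-collecting idea wrapped in a function that also resolves relative links. To keep it runnable without extra packages, it swaps BeautifulSoup for the standard library's html.parser, which is a substitution and not the script's own method. The names LinkCollector and extract_links and the example.com URLs are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it is fed."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        for name, value in attrs:
            if name == 'href' and value:
                self.hrefs.append(value)


def extract_links(html_text, base_url):
    """Return the sorted, de-duplicated absolute URLs linked from html_text."""
    collector = LinkCollector()
    collector.feed(html_text)
    # urljoin turns a relative href such as '/about' into a full URL
    return sorted({urljoin(base_url, href) for href in collector.hrefs})


if __name__ == '__main__':
    # Made-up sample page; the last anchor has no href and is skipped
    sample = (
        '<a href="/about">About</a> '
        '<a href="http://example.com/">Home</a> '
        '<a name="top">no href</a>'
    )
    for i, url in enumerate(extract_links(sample, 'http://example.com/blog/'), start=1):
        print(i, ' ', url)
```

Using a set removes duplicates in one step, and sorting afterwards keeps the numbered output stable between runs. To use it on a live page, you could pass the decoded body returned by urlopen as html_text and the page's own address as base_url.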

 