Saturday, January 21, 2017

BFS (Boy Friend Search)... I am kidding, it's --Breadth First Search--


BFS is a graph search algorithm: it visits the nodes level by level, starting from a chosen node.


Graph????????

  • Has nodes (vertices)
  • Has edges (links), which can carry a value (weight)
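
To make those two ingredients concrete, here is a minimal sketch of one way to store such a graph in Python. The node numbers, links and weights are invented for illustration; they are not taken from the picture.

```python
# A tiny weighted graph stored as a dict of dicts (an adjacency list).
# Nodes, links and weights are invented for illustration.
graph = {
    0: {1: 4, 2: 1},   # node 0 links to node 1 (weight 4) and node 2 (weight 1)
    1: {0: 4},
    2: {0: 1},
}

nodes = sorted(graph)
# list each undirected link once, as (node, node, weight)
edges = [(a, b, w) for a in graph for b, w in graph[a].items() if a < b]
print(nodes)
print(edges)
```

A dict of dicts is only one option; an adjacency matrix works just as well for small graphs.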


This is an example: the numbers are the nodes, and the links connect
them.

Graphs can be directed or undirected.

The picture shows an undirected graph. You can easily identify it... no arrows on the links, lads.

In a directed graph,
you can guess: you can travel between nodes only in the direction the arrows indicate.
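
One way to see the difference in code, as a rough sketch: in an adjacency matrix an undirected graph is always symmetric, while a directed one usually is not. Both matrices below are invented examples, and is_symmetric is just a helper name made up for this sketch.

```python
# adj[a][b] == 1 means "there is a link from a to b".
# Both graphs are invented examples with 3 nodes.
undirected = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
directed = [
    [0, 1, 0],   # 0 -> 1
    [0, 0, 1],   # 1 -> 2
    [0, 0, 0],   # node 2 has no outgoing links
]

def is_symmetric(adj):
    """Return True when every link a->b has a matching link b->a."""
    n = len(adj)
    return all(adj[a][b] == adj[b][a] for a in range(n) for b in range(n))

print(is_symmetric(undirected))
print(is_symmetric(directed))
```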



My sample code for BFS in Python:




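# adj is the adjacency matrix: adj[a][b] == 1 means nodes a and b are linked.
# The matrix below is only an assumed example, use the one for your own graph.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
]
nodes = [0, 1, 2, 3, 4, 5]
to_visit = [0]   # discovered nodes in BFS order, starting from node 0
parent = {}      # parent[j] = the node from which j was first discovered
i = 0            # index of the next node to expand
while i < len(to_visit):
    front = to_visit[i]
    for j in nodes:
        # queue every neighbour of front that has not been discovered yet
        if j not in to_visit and adj[front][j] == 1:
            to_visit.append(j)
            parent[j] = front
    print(to_visit, parent)
    i += 1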
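
The loop above keeps every discovered node in one list and walks an index through it. A common alternative, sketched here under my own assumptions, uses collections.deque as the queue and an adjacency list instead of a matrix. The function name bfs and the example graph are made up for illustration.

```python
from collections import deque

def bfs(graph, start):
    """Return (order, parent) for a BFS over an adjacency-list graph."""
    order = []                  # nodes in the order they are visited
    parent = {start: None}      # doubles as the "already discovered" set
    queue = deque([start])
    while queue:
        node = queue.popleft()  # take the oldest discovered node
        order.append(node)
        for neighbour in graph[node]:
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return order, parent

# Invented example graph (undirected: every link is listed in both directions).
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
order, parent = bfs(graph, 0)
print(order)
print(parent)
```

Checking "neighbour not in parent" is a dictionary lookup, so it stays fast even when the graph gets big.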


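  • BFS can be used to get the shortest path (fewest links) between two nodes in an unweighted graph.
  • The parent of each node indicates the path: follow the parents back from the target node to the start node.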
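
Following the parents back can be sketched like this. The helper name path_from_parents and the parent values are invented for illustration; the dictionary has the same shape as the parent map a BFS builds.

```python
def path_from_parents(parent, start, target):
    """Walk parent links back from target to start; return the path or None."""
    path = [target]
    while path[-1] != start:
        if path[-1] not in parent:
            return None             # target was never reached from start
        path.append(parent[path[-1]])
    path.reverse()                  # we collected it backwards
    return path

# A parent map of the kind BFS builds (values invented for illustration).
parent = {1: 0, 2: 0, 3: 1, 4: 3}
print(path_from_parents(parent, 0, 4))
print(path_from_parents(parent, 0, 5))
```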


















Friday, January 20, 2017

BeautifulSoup4: an extreme beginner's guide





First you have to,


  • Install the beautifulsoup4 and requests libraries.

pip install beautifulsoup4
pip install requests

  • Import these libraries into your script
from bs4 import BeautifulSoup
import requests


Now the Fun Part begins,



  • There are a few steps to do before we can start scraping.
  • Using requests, we retrieve the data from a specific URL with a GET request and store the response in the variable r. The requests.get method does this.
r=requests.get(url)


  • Then the content of that response (by the way, it is HTML content) is used to create a BeautifulSoup object called soup.

soup=BeautifulSoup(r.content,"html.parser")

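The "html.parser" argument is optional, but if you leave it out BeautifulSoup picks whichever parser it finds installed and shows a warning, so the same script can behave differently on another machine. Better to name it every time!!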
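
If you want to try the soup object without hitting a real website, here is a small sketch that feeds it a hand-written page instead of r.content. The HTML is invented for illustration.

```python
from bs4 import BeautifulSoup

# Stand-in for r.content: a small hand-written page (not fetched from any site).
html = b"<html><head><title>Demo page</title></head><body><p>Hello</p></body></html>"

soup = BeautifulSoup(html, "html.parser")
print(soup.title.text)   # text inside <title>
print(soup.p.text)       # text inside the first <p>
```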


  • Actually, to do any web scrape you only need to know 3 keywords, and that's all. The rest is up to you.
  • Here they are:
    • findAll() function
    • .contents
    • .text

  • findAll() function 
soupObject.findAll("element", {"property": "name"})   # the {...} filter is optional

This returns a list of all the HTML elements with that tag name and those properties. Newer code spells the same method find_all().
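
A quick sketch of that call on a made-up snippet of HTML (the class names and company names are invented). It uses the find_all spelling, which does the same thing as findAll.

```python
from bs4 import BeautifulSoup

html = """
<div class="item"><span itemprop="name">Alpha</span></div>
<div class="item"><span itemprop="name">Beta</span></div>
<div class="other">ignored</div>
"""
soup = BeautifulSoup(html, "html.parser")

# all <div> elements whose class is "item"
items = soup.find_all("div", {"class": "item"})
# inside each one, take the first <span itemprop="name"> and read its text
names = [div.find_all("span", {"itemprop": "name"})[0].text for div in items]
print(len(items))
print(names)
```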


  • .contents
item.contents

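This gives the immediate children of an HTML element as a list. The list also includes the text between the tags (often just line breaks), which is why the sample code below uses indexes such as 1, 3 and 5.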
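
Here is a small sketch, on invented HTML, of what .contents actually holds. Notice the line breaks between the tags show up as list items too, so the real tags sit at the odd indexes.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = "<ul>\n<li>one</li>\n<li>two</li>\n</ul>"
ul = BeautifulSoup(html, "html.parser").ul

children = ul.contents
print(len(children))                 # line breaks are counted as children too
print([repr(c) for c in children])

# keep only the real tags and drop the text in between
tags_only = [c for c in children if isinstance(c, Tag)]
print([t.text for t in tags_only])
```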


  • .text
Gets the text (the content visible to you on the website) inside an HTML element, with all the HTML tags stripped out.
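
A tiny sketch of .text on an invented snippet (the phone number is made up):

```python
from bs4 import BeautifulSoup

html = '<p>Call <b>us</b> on <span itemprop="telephone">011 234 5678</span></p>'
p = BeautifulSoup(html, "html.parser").p

print(p.text)                        # text of the whole paragraph, tags removed
print(p.find_all("span")[0].text)    # text of just the <span>
```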


This is a sample for web scraping and storing the data in an Excel sheet.



from bs4 import BeautifulSoup
import requests




def getSoupObject(url="http://www.list.com/search/home.html"):
    # download the page and parse its HTML into a soup object
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")
    return soup


def getDataFromPage(soupObj):
    # returns a list of [companyName, companyTele] pairs found on the page
    lst = []

    # every listing on the page sits in a <div class="list_l_box">
    divCont = soupObj.findAll("div", {"class": 'list_l_box'})

    for item in divCont:
        itemList = item.contents

        # skip listings that only have the "No image" placeholder
        if itemList[1].findAll('img', {"alt": "No image"}) != []:
            continue

        companyDataList = itemList[3].contents
        telephone = companyDataList[5].findAll("span", {"itemprop": "telephone"})

        # keep only the companies that list a telephone number
        if telephone != []:
            companyName = companyDataList[1].text
            companyTele = telephone[0].text
            lst.append([companyName, companyTele])
    return lst
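
For the "store it in a sheet" step, one simple option is a CSV file, which Excel opens directly. This is only a sketch and swaps in CSV for a real .xlsx workbook; the helper name write_rows, the file name and the rows are invented for illustration.

```python
import csv
import io

def write_rows(rows, out_file):
    """Write [name, telephone] rows as CSV with a header line."""
    writer = csv.writer(out_file)
    writer.writerow(["Company", "Telephone"])
    writer.writerows(rows)

# Invented rows in the same [name, telephone] shape that getDataFromPage returns.
rows = [["Acme Ltd", "011 234 5678"], ["Beta & Co", "011 876 5432"]]

buffer = io.StringIO()   # in-memory stand-in for a real file
write_rows(rows, buffer)
print(buffer.getvalue())

# To write a real file instead:
# with open("companies.csv", "w", newline="") as f:
#     write_rows(rows, f)
```

In a real script you would pass the list returned by getDataFromPage(getSoupObject()) instead of the invented rows.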