@jaggedsoft
Last active August 29, 2015 14:19

Revisions

  1. @ishfuseini ishfuseini created this gist Jun 17, 2014.
    27 changes: 27 additions & 0 deletions gistfile1.txt
    @@ -0,0 +1,27 @@
    from scrapy import Spider
    from scrapy.selector import Selector
    from yp.items import YpItem


    class YpSpider(Spider):
        name = "yp"
        allowed_domains = ["yellowpages.com"]
        start_urls = [
            "http://www.yellowpages.com/ft-worth-tx/churches?g=ft.%20worth%2C%20tx&q=churches"
        ]

        def parse(self, response):
            sel = Selector(response)
            # Container that holds all search results on the page.
            divs = sel.xpath('//div[@id="main-content"]')
            items = []
            # One "info" div per listing; query each field relative to that
            # listing so an item only receives its own values.
            for listing in divs.xpath('.//div[@class="info"]'):
                item = YpItem()
                item['name'] = listing.xpath('.//span[@itemprop="name"]/text()').extract()
                item['streetAddress'] = listing.xpath('.//span[@itemprop="streetAddress"]/text()').extract()
                item['addressCity'] = listing.xpath('.//span[@itemprop="addressLocality"]/text()').extract()
                item['addressState'] = listing.xpath('.//span[@itemprop="addressRegion"]/text()').extract()
                item['addressZip'] = listing.xpath('.//span[@itemprop="postalCode"]/text()').extract()
                item['phone'] = listing.xpath('.//li[@itemprop="telephone"]/text()').extract()
                items.append(item)
            return items
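
When a spider loops over listing nodes, each field query has to run against the current listing rather than the outer container. If it runs against the container, every item ends up with the values for the whole page. The sketch below shows the difference using only the standard library's `xml.etree.ElementTree` as a stand-in for Scrapy selectors. The `SAMPLE` markup and the two helper functions are hypothetical and much simpler than a real results page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for a results page.
SAMPLE = """
<div id="main-content">
  <div class="info">
    <span itemprop="name">First Church</span>
    <span itemprop="postalCode">76102</span>
  </div>
  <div class="info">
    <span itemprop="name">Second Church</span>
    <span itemprop="postalCode">76104</span>
  </div>
</div>
"""


def names_scoped(root):
    """Query each listing node, so every record holds only its own name."""
    records = []
    for listing in root.findall('.//div[@class="info"]'):
        names = [s.text for s in listing.findall('.//span[@itemprop="name"]')]
        records.append(names)
    return records


def names_unscoped(root):
    """Query the container inside the loop: every record holds all names."""
    records = []
    for _ in root.findall('.//div[@class="info"]'):
        names = [s.text for s in root.findall('.//span[@itemprop="name"]')]
        records.append(names)
    return records


root = ET.fromstring(SAMPLE)
scoped = names_scoped(root)
unscoped = names_unscoped(root)
print(scoped)    # [['First Church'], ['Second Church']]
print(unscoped)  # [['First Church', 'Second Church'], ['First Church', 'Second Church']]
```

The same scoping rule applies to Scrapy selectors: a relative XPath such as `.//span[...]` is evaluated from whichever node it is called on, so the loop variable is the one to query.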