How to crawl hotel names and urls from booking.com using Python and Selenium – Bits & Notes

You may need a list of all the hotels in a city for one reason or another. Most of them can be found on booking.com (assuming the city is in Europe).

If you need a list of hotel names, ratings and/or hotel URLs for any city, you can crawl booking.com for it. Coding this with Python and Selenium is pretty easy. Below is a script that collects hotel names, booking.com hotel URLs and ratings for the city of Vienna. At the end, the list is saved to a JSON file.

Go crazy with it…

#!/usr/bin/env python3
# coding: utf-8

__author__ = "selfconstruct3d"
__date__ = "$Jun 17, 2016 11:41:36 PM$"

# this script is used to collect basic hotel info from booking.com
# hotel name, url, and user rating are extracted and saved to a json file

import json

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
# output dict, keyed by hotel url
hotelsDict = dict()

# pagination offset (each result page lists 15 hotels, see rows=15 in the url)
booking_list_offset = 0
CITY_NAME = "Vienna"

for i in range(1, 80):

    # just paste booking.com link with city entered. arrival and departure dates are not inserted
    driver.get('http://www.booking.com/searchresults.de.html?dcid=1&label=gen173nr-1DCAEoggJCAlhYSDNiBW5vcmVmaBKIAQGYAQe4AQrIAQzYAQPoAQGoAgM&lang=de&sid=e8b897b588f56aa2e25913117df47bcc&sb=1&src=searchresults&src_elem=sb&error_url=http%3A%2F%2Fwww.booking.com%2Fsearchresults.de.html%3Flabel%3Dgen173nr-1DCAEoggJCAlhYSDNiBW5vcmVmaBKIAQGYAQe4AQrIAQzYAQPoAQGoAgM%3Bsid%3De8b897b588f56aa2e25913117df47bcc%3Bdcid%3D1%3Bclass_interval%3D1%3Bdest_id%3D-1746443%3Bdest_type%3Dcity%3Bgroup_adults%3D2%3Bgroup_children%3D0%3Bhlrd%3D0%3Blabel_click%3Dundef%3Bno_rooms%3D1%3Boffset%3D0%3Breview_score_group%3Dempty%3Broom1%3DA%252CA%3Bsb_price_type%3Dtotal%3Bscore_min%3D0%3Bsrc%3Dindex%3Bsrc_elem%3Dsb%3Bss%3DBerlin%252C%2520Berlin%2520%2528Bundesland%2529%252C%2520Deutschland%3Bss_raw%3Dber%3Bssb%3Dempty%26%3B&ss=Wien%2C+Wien+%28Bundesland%29%2C+%C3%96sterreich&ssne=Berlin&ssne_untouched=Berlin&city=-1746443&room1=A%2CA&no_rooms=1&group_adults=2&group_children=0&ss_raw=wien&ac_popular_badge=1&ac_position=0&ac_langcode=de&dest_id=-1995499&dest_type=city&ac_pageview_id=d2db9ad66c2d0283&ac_suggestion_list_length=5&ac_suggestion_theme_list_length=1&rows=15&offset='+str(booking_list_offset))

    hotelUrls = driver.find_elements(By.CSS_SELECTOR, "a.hotel_name_link.url")
    hotelRatings = driver.find_elements(By.CSS_SELECTOR, "span.average.js--hp-scorecard-scoreval")

    # note: zip pairs links and ratings by position, so if a hotel on the page
    # has no rating element, the ratings of the hotels after it shift by one
    for hotelurl, hotelRating in zip(hotelUrls, hotelRatings):
        # get hotel name (the link text)
        name = hotelurl.text
        # get url without the query string
        url = hotelurl.get_attribute("href").split("?")[0]
        # get rating
        rating = hotelRating.text
        print(url, ",", name, ",", rating)
        # set up dictionary structure
        hotelsDict[url] = {"name": name, "rating": rating}

    # increase offset to move to the next result page
    booking_list_offset += 15

# close the browser once all pages are crawled
driver.quit()

# save to json file
with open("crawlbooking-" + CITY_NAME + "-hotel-urls-ratings.json", "w") as f:
    json.dump(hotelsDict, f)
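
Once the JSON file is written you will probably want to do something with it. Here is a small sketch that reads a file in the same shape back in and lists the best-rated hotels first. The helper names `parse_rating` and `top_hotels` and the sample hotels are made up for illustration. The decimal comma handling is an assumption: ratings are stored as the text scraped from the German results page, which may show them as "8,5" rather than "8.5".

```python
import json
import os
import tempfile


def parse_rating(text):
    """Convert a scraped rating string to a float, or None if it is not numeric."""
    try:
        # Assumption: the German page may use a decimal comma ("8,5").
        return float(text.strip().replace(",", "."))
    except ValueError:
        return None


def top_hotels(path, limit=10):
    """Return up to `limit` (name, rating, url) tuples, best rated first."""
    with open(path) as f:
        hotels = json.load(f)
    rated = []
    for url, info in hotels.items():
        rating = parse_rating(info["rating"])
        # Skip entries whose rating text could not be parsed.
        if rating is not None:
            rated.append((info["name"], rating, url))
    return sorted(rated, key=lambda row: row[1], reverse=True)[:limit]


# Made-up sample data in the same shape the crawler writes.
sample = {
    "http://www.booking.com/hotel/at/example-a.de.html": {"name": "Example A", "rating": "8,5"},
    "http://www.booking.com/hotel/at/example-b.de.html": {"name": "Example B", "rating": "9.1"},
    "http://www.booking.com/hotel/at/example-c.de.html": {"name": "Example C", "rating": ""},
}
sample_path = os.path.join(tempfile.mkdtemp(), "sample-hotels.json")
with open(sample_path, "w") as f:
    json.dump(sample, f)

for name, rating, url in top_hotels(sample_path):
    print(rating, name, url)
```

To run it on real data, pass the path of the file the crawler produced to `top_hotels` instead of `sample_path`.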
