How to crawl hotel names and URLs from Booking.com using Python and Selenium

You might need a list of all hotels in your city for one reason or another. Most of them can be found on Booking.com (assuming it’s a city in Europe).

If you need a list of hotel names, ratings and/or hotel URLs for any city, you can crawl Booking for it. Coding it with Python and Selenium is pretty easy. Below is a script that collects hotel names, hotel URLs and ratings for the city of Vienna. The list is finally saved to a JSON file.
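
To make the output format concrete before the full script, here is a small browser-free sketch of how one hotel ends up in the result. The URL, name and rating are made-up sample values, and `add_hotel` is a hypothetical helper used only for illustration; it is not part of the script.

```python
import json


def add_hotel(hotels, raw_url, name, rating):
    """Store one hotel, keyed by its URL with the query string removed."""
    url = raw_url.split("?")[0]  # same cleanup the script applies to each link
    hotels[url] = {"name": name, "rating": rating}
    return url


hotels = {}
# made-up sample values, for illustration only
add_hotel(hotels, "http://example.com/hotel/at/sample.html?sid=123", "Sample Hotel", "8.5")
print(json.dumps(hotels, indent=2))
```

Keying the dictionary by the cleaned URL means a hotel that shows up on more than one results page is stored only once.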

Go crazy with it…

#! /usr/bin/python
# coding: utf-8

__date__ ="$Jun 17, 2016 11:41:36 PM$"

# this script collects basic hotel info from booking.com:
# hotel name, url, and user rating are extracted and saved to a json file

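from selenium import webdriver
from selenium.webdriver.common.by import By
import json

driver = webdriver.Firefox()
# output dict
hotelsDict = dict()

# pagination offset
booking_list_offset = 0
CITY_NAME = "Vienna"

# placeholder: just paste the search results link with the city entered here
# (arrival and departure dates are not inserted). this assumes the link ends
# with the pagination offset parameter, so the offset can be appended to it
SEARCH_URL = ""

for i in range(1, 80):

    # load the next page of search results
    driver.get(SEARCH_URL + str(booking_list_offset))

    hotelUrls = driver.find_elements(By.CSS_SELECTOR, "a.hotel_name_link.url")
    # hotel names are read from the link text below, so no separate name selector is needed
    hotelRatings = driver.find_elements(By.CSS_SELECTOR, "span.average.js--hp-scorecard-scoreval")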

    for hotelurl, hotelRating in zip(hotelUrls, hotelRatings):
        #get hotel name
        name = hotelurl.text
        # get url
        url = hotelurl.get_attribute("href").split("?")[0]
        # get rating
        rating = hotelRating.text
        print(url, ",", name, ",", rating)
        # set up dictionary structure
        hotelsDict[url] = {}
        hotelsDict[url]["name"] = name
        hotelsDict[url]["rating"] = rating

    #increase offset
    booking_list_offset += 15

# save to json file
with open("crawlbooking-"+CITY_NAME+"-hotel-urls-ratings.json","w") as f:
    json.dump(hotelsDict, f)

# close the browser
driver.quit()
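
Once the file is saved, you will probably want to do something with it. Here is a minimal sketch of ranking hotels by rating. It assumes the ratings were saved as text, possibly with a decimal comma or empty for unrated hotels; `to_score`, `top_hotels` and the sample data are hypothetical and only mirror the structure the script builds.

```python
import json


def to_score(rating_text):
    """Convert a scraped rating string to a float, or None if it is not numeric."""
    try:
        return float(rating_text.replace(",", "."))
    except ValueError:
        return None


def top_hotels(hotels, n=3):
    """Return up to n (name, score) pairs, best rated first, skipping unrated hotels."""
    scored = [(h["name"], to_score(h["rating"])) for h in hotels.values()]
    scored = [(name, score) for name, score in scored if score is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


# made-up sample in the same shape as the saved file
sample = json.loads(
    '{"http://example.com/a": {"name": "A", "rating": "8.5"},'
    ' "http://example.com/b": {"name": "B", "rating": "9,1"},'
    ' "http://example.com/c": {"name": "C", "rating": ""}}'
)
print(top_hotels(sample))
```

To run it on real data, replace `sample` with the result of `json.load` on the file the script wrote.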
