You might need a list of all the hotels in your city for some reason. Most of them can be found on booking.com (assuming it’s a city in Europe).
If you need the hotel names, ratings and/or hotel URLs for any city, you can crawl booking.com for them. Coding it with Python and Selenium is pretty easy. Below is a script that collects hotel names, booking.com hotel URLs and ratings for the city of Vienna. It pages through the search results 15 hotels at a time, and the list is finally saved to a JSON file.
Go crazy with it…
#! /usr/bin/python
# coding: utf-8

__author__ = "selfconstruct3d"
__date__ = "$Jun 17, 2016 11:41:36 PM$"

# this script is used to collect basic hotel-info from booking.com
# hotel name, url, and user rating are extracted and saved to json file

from selenium import webdriver
import json

driver = webdriver.Firefox()

# output dict
hotelsDict = dict()

# pagination offset
booking_list_offset = 0

CITY_NAME = "Vienna"

for i in range(1, 80):
    # just paste booking.com link with city entered. arrival and departure dates are not inserted
    driver.get('http://www.booking.com/searchresults.de.html?dcid=1&label=gen173nr-1DCAEoggJCAlhYSDNiBW5vcmVmaBKIAQGYAQe4AQrIAQzYAQPoAQGoAgM&lang=de&sid=e8b897b588f56aa2e25913117df47bcc&sb=1&src=searchresults&src_elem=sb&error_url=http%3A%2F%2Fwww.booking.com%2Fsearchresults.de.html%3Flabel%3Dgen173nr-1DCAEoggJCAlhYSDNiBW5vcmVmaBKIAQGYAQe4AQrIAQzYAQPoAQGoAgM%3Bsid%3De8b897b588f56aa2e25913117df47bcc%3Bdcid%3D1%3Bclass_interval%3D1%3Bdest_id%3D-1746443%3Bdest_type%3Dcity%3Bgroup_adults%3D2%3Bgroup_children%3D0%3Bhlrd%3D0%3Blabel_click%3Dundef%3Bno_rooms%3D1%3Boffset%3D0%3Breview_score_group%3Dempty%3Broom1%3DA%252CA%3Bsb_price_type%3Dtotal%3Bscore_min%3D0%3Bsrc%3Dindex%3Bsrc_elem%3Dsb%3Bss%3DBerlin%252C%2520Berlin%2520%2528Bundesland%2529%252C%2520Deutschland%3Bss_raw%3Dber%3Bssb%3Dempty%26%3B&ss=Wien%2C+Wien+%28Bundesland%29%2C+%C3%96sterreich&ssne=Berlin&ssne_untouched=Berlin&city=-1746443&room1=A%2CA&no_rooms=1&group_adults=2&group_children=0&ss_raw=wien&ac_popular_badge=1&ac_position=0&ac_langcode=de&dest_id=-1995499&dest_type=city&ac_pageview_id=d2db9ad66c2d0283&ac_suggestion_list_length=5&ac_suggestion_theme_list_length=1&rows=15&offset=' + str(booking_list_offset))

    # note: find_elements_by_css_selector is the Selenium 3 API;
    # Selenium 4 uses driver.find_elements(By.CSS_SELECTOR, ...) instead
    hotelUrls = driver.find_elements_by_css_selector("a.hotel_name_link.url")
    hotelNames = driver.find_elements_by_css_selector("span.sr-hotel__name")
    hotelRatings = driver.find_elements_by_css_selector("span.average.js--hp-scorecard-scoreval")

    for hotelurl, hotelRating in zip(hotelUrls, hotelRatings):
        # get hotel name
        name = hotelurl.text
        # get url (strip the query string)
        url = hotelurl.get_attribute("href").split("?")[0]
        # get rating
        rating = hotelRating.text

        # same output as before, but valid in both Python 2 and 3
        print("%s , %s , %s" % (url, name, rating))

        # set up dictionary structure
        hotelsDict[url] = {}
        hotelsDict[url]["name"] = name
        hotelsDict[url]["rating"] = rating

    # increase offset (15 results per page)
    booking_list_offset += 15

# close the browser
driver.quit()

# save to json file
with open("crawlbooking-" + CITY_NAME + "-hotel-urls-ratings.json", "w") as f:
    json.dump(hotelsDict, f)
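Once the crawl has finished, the JSON file maps each hotel URL to its name and rating, so it is easy to load it again and, for example, rank the hotels by rating. The snippet below is only a rough sketch in Python 3 and not part of the original script: `parse_rating` and `top_hotels` are made-up helper names, the sample entries are invented for illustration, and it assumes the ratings were scraped as strings that may use a decimal comma (the search URL requests the German page) or be empty.

```python
import json


def parse_rating(text):
    """Convert a scraped rating string such as "8,7" or "8.7" to a float.

    Returns None when the value is empty, missing or not a number.
    """
    try:
        return float(text.strip().replace(",", "."))
    except (AttributeError, ValueError):
        return None


def top_hotels(hotels, n=10):
    """Return up to n (name, rating, url) tuples, best rating first."""
    rated = []
    for url, info in hotels.items():
        rating = parse_rating(info.get("rating", ""))
        if rating is not None:
            rated.append((info.get("name", ""), rating, url))
    return sorted(rated, key=lambda item: item[1], reverse=True)[:n]


# Invented sample data in the same shape the crawler writes:
# {url: {"name": ..., "rating": ...}}
sample = {
    "http://www.booking.com/hotel/at/example-a.de.html": {"name": "Example A", "rating": "8,7"},
    "http://www.booking.com/hotel/at/example-b.de.html": {"name": "Example B", "rating": "9.1"},
    "http://www.booking.com/hotel/at/example-c.de.html": {"name": "Example C", "rating": ""},
}

# Round-trip through JSON to mimic reading the saved file.
hotels = json.loads(json.dumps(sample))

for name, rating, url in top_hotels(hotels, n=5):
    print(name, rating, url)
```

To run it on real data, replace the `sample` round-trip with `json.load` on the file the crawler wrote, e.g. `crawlbooking-Vienna-hotel-urls-ratings.json`. Hotels without a parsable rating are simply left out of the ranking.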