## Anonymous Web Scraping with Python Selenium PhantomJS Xpath and TOR

Using pure Selenium methods to extract data from metasearch websites is quite tricky and uses a lot of CPU resources. I have spent a significant amount of time with Selenium's built-in methods in Python, and I find development this way tedious, time-consuming and prone to bugs. Selenium’s WebElement objects are not flexible, ...

## Searching neighbors in Graph Data Structure by matrix multiplication

Which vertices in a graph can be reached in 1, 2 or N hops? This can basically be implemented in two ways. First, if the graph is implemented as an adjacency matrix, by checking connections (ones in a row of the two-dimensional array) and iterating further. Second, if the graph structure is implemented as an ArrayList, by checking lists of ...
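
As a rough illustration of the adjacency-matrix idea from the title, here is a minimal Python sketch (not the code from the post). It relies on the fact that entry (i, j) of the k-th power of the adjacency matrix is non-zero exactly when a walk of k edges leads from i to j. The function names and the small sample graph are made up for this example, and `hops` is assumed to be at least 1.

```python
def mat_mul_bool(a, b):
    """Boolean matrix product: result[i][j] is 1 if some k has a[i][k] and b[k][j] set."""
    n = len(a)
    return [[1 if any(a[i][k] and b[k][j] for k in range(n)) else 0
             for j in range(n)]
            for i in range(n)]


def reachable_in_hops(adj, start, hops):
    """Vertices reachable from `start` by a walk of exactly `hops` edges (hops >= 1)."""
    power = adj
    for _ in range(hops - 1):
        power = mat_mul_bool(power, adj)
    return [v for v, flag in enumerate(power[start]) if flag]


# Hypothetical directed graph: 0 -> 1 -> 2 -> 3
ADJ = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

print(reachable_in_hops(ADJ, 0, 1))  # [1]
print(reachable_in_hops(ADJ, 0, 2))  # [2]
```

To get everything reachable in at most N hops, one option is to take the union of the results for 1..N.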

## Java MapReduce for top N Twitter Hashtags

Back in 2015 I had to implement a MapReduce job to extract the top 15 hashtags from Twitter’s raw data in Hadoop. This was part of a Business Intelligence lecture exercise at Vienna University of Technology. The regex used for hashtag extraction (not the best one, in my opinion) has the following format: `String regex = "text\":\\\"(.*)\\\",\"source";` The ‘source’ field comes ...
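
To show the general shape of such a job, here is a small sketch in plain Python (not the Java/Hadoop code from the post). It assumes the tweet text has already been extracted from the raw data; the hashtag pattern `#(\w+)`, the function names and the sample tweets are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumed hashtag pattern; the post itself uses a different regex on the raw JSON.
HASHTAG = re.compile(r"#(\w+)")


def map_hashtags(tweet_text):
    """Map step: emit a (hashtag, 1) pair for every hashtag in one tweet."""
    return [(tag.lower(), 1) for tag in HASHTAG.findall(tweet_text)]


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per hashtag."""
    counts = Counter()
    for tag, n in pairs:
        counts[tag] += n
    return counts


def top_n_hashtags(tweets, n):
    """Run map and reduce in-process and return the n most frequent hashtags."""
    pairs = [pair for text in tweets for pair in map_hashtags(text)]
    return reduce_counts(pairs).most_common(n)


# Hypothetical sample tweets
SAMPLE = [
    "Learning #Hadoop and #MapReduce",
    "More #hadoop today",
    "#Java with #Hadoop",
]

print(top_n_hashtags(SAMPLE, 1))  # [('hadoop', 3)]
```

In a real Hadoop job the map and reduce steps run on different nodes, and picking the top N usually needs a final single-reducer pass; this sketch only mirrors the counting logic.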

## Black Jack in Java

Back in 2009 at the University we had to implement a game for mobile devices in J2ME. We chose Black Jack and I did the business logic. The logic consists of 3 packages. The bl package implements interfaces from the interfaces package and the executable is in the game package. So here is the code … Package: game ...

## Linear Regression with multiple Variables in Matlab

In the previous post I showed you how to implement Linear Regression with one Variable in Matlab. In this one I’m going to discuss the implementation with multiple variables. Before implementing multivariate Linear Regression, feature normalization is a smart first step, since gradient descent then converges (finds the minimum of the cost function) much more quickly. Every sample value is ...
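
One common way to normalize features is to subtract each column's mean and divide by its standard deviation. The following Python sketch (not the Matlab code from the post) shows that variant; the function name, the use of the population standard deviation and the sample data are assumptions for illustration.

```python
from statistics import mean, pstdev


def normalize_features(rows):
    """Return z-scored rows plus the per-column means and standard deviations."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    sigmas = [pstdev(col) or 1.0 for col in columns]
    scaled = [[(x - mu) / sigma for x, mu, sigma in zip(row, mus, sigmas)]
              for row in rows]
    return scaled, mus, sigmas


# Hypothetical samples: [size, rooms]
DATA = [[1000, 2], [2000, 3], [3000, 4]]
scaled, mus, sigmas = normalize_features(DATA)
print(mus)  # [2000, 3]
```

The means and standard deviations are returned because new samples have to be scaled with the same values before a prediction is made.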

## Linear Regression with one Variable in Matlab

In this post I will show you how to implement one of the basic Machine Learning concepts in Matlab: Linear Regression with one Variable. Matlab and Octave are very useful high-level languages for prototyping Machine Learning algorithms. [Linear Regression Example from mathworks.com] The idea is to find the line that best fits all ...
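
As a rough sketch of the idea, the line `y = theta0 + theta1 * x` can be fitted with batch gradient descent on the mean squared error. The code below is Python rather than Matlab and is not taken from the post; the learning rate, the iteration count and the sample points are assumptions chosen so that this small example converges.

```python
def gradient_descent(xs, ys, alpha=0.05, iterations=5000):
    """Fit y = theta0 + theta1 * x by batch gradient descent on the squared error."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        # Prediction error for every sample with the current parameters.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters simultaneously.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


# Hypothetical points lying exactly on y = 2x + 1
XS = [0, 1, 2, 3, 4]
YS = [1, 3, 5, 7, 9]
theta0, theta1 = gradient_descent(XS, YS)
print(round(theta0, 3), round(theta1, 3))  # 1.0 2.0
```

With a learning rate that is too large the updates diverge, so `alpha` usually needs some tuning for real data.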

## Distribute Missing Values randomly across Columns in Python Pandas

Continuing from the previous post, this script distributes missing values across all features in a Pandas DataFrame. Just set the min and max bounds for the missing-value distribution and you’re ready to execute … Enjoy…
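
The idea can be sketched without Pandas as follows (this is not the script from the post). A plain dict of lists stands in for the DataFrame, and each column gets its own missing-value fraction drawn uniformly between the two bounds; the function name, the parameters and the use of `None` as the missing marker are assumptions for this example.

```python
import random


def distribute_missing_across(table, min_fraction, max_fraction, seed=None):
    """Blank out a random share of every column, between min_fraction and max_fraction.

    `table` maps column names to equally long lists of values.
    """
    rng = random.Random(seed)
    result = {}
    for name, values in table.items():
        # Each column gets its own fraction of missing values.
        fraction = rng.uniform(min_fraction, max_fraction)
        n_missing = round(fraction * len(values))
        missing_idx = set(rng.sample(range(len(values)), n_missing))
        result[name] = [None if i in missing_idx else v
                        for i, v in enumerate(values)]
    return result


# Hypothetical table with two numeric features
TABLE = {"a": list(range(100)), "b": list(range(100, 200))}
masked = distribute_missing_across(TABLE, 0.1, 0.3, seed=1)
print({name: col.count(None) for name, col in masked.items()})
```

The input is left unchanged, so the original data stays available for comparison.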

## Distribute Missing Values in Pandas DataFrame Column with Python

If you want to test how machine learning algorithms perform with missing values, you may need a script that distributes a fixed percentage of missing values in a feature. This script randomly distributes missing values in a single column of a data set. Enjoy…
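
A minimal sketch of the single-column case, in plain Python instead of Pandas (not the script from the post): a list stands in for the column, and a fixed fraction of randomly chosen positions is set to `None`. The function name and parameters are assumptions for this example.

```python
import random


def distribute_missing(values, fraction, seed=None):
    """Return a copy of `values` with round(fraction * len(values)) entries set to None."""
    rng = random.Random(seed)
    n_missing = round(fraction * len(values))
    # Choose distinct positions so the requested share is hit exactly.
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


column = list(range(20))
masked = distribute_missing(column, 0.25, seed=42)
print(masked.count(None))  # 5
```

Sampling positions without replacement gives exactly the requested share; flipping a coin per row would only hit it on average.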

## Pandas DataFrame Subsampling in Python

Written a long time ago to feed some ML algorithms with data subsets, because the original data set was too large and the algorithms took too long to run. Have fun with the script…
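
For illustration, random subsampling can be sketched in plain Python like this (not the script from the post); a list of rows stands in for the DataFrame, and the function name and parameters are assumptions.

```python
import random


def subsample(rows, fraction, seed=None):
    """Return a random subset with round(fraction * len(rows)) rows, in original order."""
    rng = random.Random(seed)
    k = round(fraction * len(rows))
    keep = sorted(rng.sample(range(len(rows)), k))
    return [rows[i] for i in keep]


rows = [(i, i * i) for i in range(100)]  # hypothetical data set
subset = subsample(rows, 0.1, seed=0)
print(len(subset))  # 10
```

With Pandas itself, `DataFrame.sample(frac=0.1, random_state=0)` does the same kind of job in one call.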

## Feature Scaling in Python and Pandas DataFrame

Here is a small script that I wrote a long time ago to scale some features in order to get better performance and better prediction results from some ML algorithms. I used Python with Pandas to read in the CSV file and process the feature values. Formulas for feature scaling used in the script can be found ...
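
One widely used scaling formula is min-max scaling, `(x - min) / (max - min)`, which maps a feature to the range 0 to 1. Whether the script in the post uses exactly this formula is not visible from the excerpt, so treat the following plain-Python sketch as an assumption; the function name is made up for this example.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


print(min_max_scale([10, 20, 30]))  # [0.0, 0.5, 1.0]
```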

## How to crawl Hotel Information from booking using Python and Selenium

This post continues from the last one. Assuming you have the hotel list with URLs from booking, you can now extract the address of each hotel. The address can then be converted to latitude and longitude, since the geo information that can be crawled from booking is not quite right. The script below extracts the address and hotel star ...

## How to crawl hotel names and urls from booking.com using Python and Selenium

You might need a list of all hotels in your city for some reason. Most of them can be found at booking.com (assuming it’s a city in Europe). If you need a list of hotel names, ratings and/or hotel URLs for any city, you can crawl booking for it. Coding it with Python and Selenium is pretty easy. Below ...