Category: Data Science

Artificial Neuron learning in Python

It all starts with a single neuron! Or with a perceptron. A perceptron is a computational model of a neuron. If you link several of them together, you get a real ‘artificial brain’ capable of learning complex stuff. When I try to explain basic AI concepts to non-computer scientists, I usually start like this … You ...
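Here is a minimal sketch of the classic perceptron learning rule in plain Python. It assumes a step activation, weights initialized to zero and the logical AND function as toy data. The function names and the learning rate of 0.5 are illustrative choices, not the code from the full post:

```python
# Perceptron with a step activation, trained with the classic perceptron rule.
def predict(weights, bias, x):
    # Fire (1) if the weighted sum plus bias is positive, otherwise output 0.
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, learning_rate=0.5, epochs=100):
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)  # -1, 0 or +1
            if error:
                errors += 1
                # Nudge weights and bias towards the correct answer.
                weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
                bias += learning_rate * error
        if errors == 0:  # every sample classified correctly: converged
            break
    return weights, bias


# Toy data: logical AND is linearly separable, so the rule converges.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [0, 0, 0, 1]
weights, bias = train_perceptron(X, Y)
print([predict(weights, bias, x) for x in X])  # [0, 0, 0, 1]
```

A single perceptron can only learn linearly separable functions. XOR, for example, cannot be learned by one neuron alone, which is exactly why we link several of them together.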

The power of Visual Data Science.

When I’m not dealing with people, machines or numbers, I enjoy the pleasures of data visualization. #VisualThinking #DataScience #Python

Some nice & new visualizations

I love plotting data and learning from the visualizations. Here are some new plots from projects I have been working on lately.

Visual Analysis of the Real Estate Market of Bosnia and Herzegovina

Where to buy, rent or sell real estate in Bosnia and Herzegovina. This visual presentation will help you with some important questions. Here you go…

Hotel Time Series Clustering Analysis - Dzenan Hamzic Blog

3D Interactive Hotel Market Segmentation

There is life beyond regression analysis! You can do so much more with pure time-series data! #datascience #plotly #timeseries

dzenanhamzic.com

Web Mining on CSS classes

I was just working on some ‘side projects’ and made some nice visualizations of CSS classes as graph structures from Amazon, The New York Times and some other sites. Some interesting clusters can be spotted. Somehow I have a feeling I’m going to dig deeper into this web mining thing… Check out the plots below. Cheers! ...
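The excerpt does not say exactly how the graph was built, so here is one possible sketch. It assumes that nodes are CSS classes and that an edge links two classes whenever they appear together on the same HTML element. It uses only Python’s standard library (html.parser); the CssClassGraph name and the toy HTML are made up for illustration:

```python
from collections import Counter
from html.parser import HTMLParser
from itertools import combinations


class CssClassGraph(HTMLParser):
    """Hypothetical helper: builds a weighted co-occurrence graph of CSS classes."""

    def __init__(self):
        super().__init__()
        self.node_weights = Counter()  # class name -> number of elements using it
        self.edge_weights = Counter()  # (class_a, class_b) -> co-occurrence count

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                classes = sorted(set(value.split()))
                self.node_weights.update(classes)
                # Every pair of classes on the same element becomes an edge.
                self.edge_weights.update(combinations(classes, 2))


html = '<div class="nav main"><a class="nav link">Home</a><span class="link nav">News</span></div>'
graph = CssClassGraph()
graph.feed(html)
graph.close()
print(graph.edge_weights.most_common())  # [(('link', 'nav'), 2), (('main', 'nav'), 1)]
```

The resulting node and edge weights could then be handed to a graph library (networkx, for example) for layout and cluster detection.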

Market Basket Analysis – Mining Frequent Pairs in Python

Have you ever asked yourself how store managers decide on product shelf placement in retail stores? There must be some strategy behind it, right? It can’t be just a random choice. Almost on a daily basis, you receive product purchase recommendations from a variety of sources where you have left your “digital fingerprint”. In many cases these ...
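As a rough illustration of mining frequent pairs, here is a small A-Priori-style sketch in plain Python. A first pass counts single items, and a second pass counts only pairs whose items are both frequent. The basket data and the min_support threshold are made up for the example:

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    # Pass 1: count how many baskets contain each item.
    item_counts = Counter(item for basket in baskets for item in set(basket))
    frequent_items = {item for item, count in item_counts.items() if count >= min_support}

    # Pass 2: count only pairs built from frequent items.
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket) & frequent_items)
        pair_counts.update(combinations(items, 2))
    return {pair: count for pair, count in pair_counts.items() if count >= min_support}


baskets = [
    ["bread", "milk", "butter"],
    ["bread", "milk"],
    ["milk", "beer"],
    ["bread", "butter"],
]
pairs = frequent_pairs(baskets, min_support=2)
print(pairs)  # {('bread', 'butter'): 2, ('bread', 'milk'): 2}
```

The pruning after the first pass is the A-Priori trick: a pair can only be frequent if both of its items are frequent on their own, so infrequent items never need to be paired at all.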

Bloom Filter Example in Python

The title might also have been “How to reduce 10 GB of data to a single megabyte”. Big Data is only going to get bigger in the future. Our challenge, among others, is to find efficient methods and algorithms to (quickly) deal with vast amounts of data, extract meaningful information and to find ways to ...
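A minimal Bloom filter sketch using only Python’s standard library. It assumes k hash functions simulated by salting SHA-256 with a seed, which is one common trick and not necessarily the one used in the post. The optimal_parameters helper applies the standard sizing formulas m = -n·ln(p) / (ln 2)² and k = (m/n)·ln 2 for n items and a target false-positive rate p:

```python
import hashlib
import math


def optimal_parameters(n_items, fp_rate):
    # Standard Bloom filter sizing: number of bits m and number of hash functions k.
    m = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    k = max(1, round(m / n_items * math.log(2)))
    return m, k


class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # packed bit array

    def _positions(self, item):
        # Derive k bit positions by hashing the item with k different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # False means "definitely not seen"; True means "probably seen".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


m, k = optimal_parameters(n_items=1_000_000, fp_rate=0.01)
print(m, k, f"{m / 8 / 1024 ** 2:.2f} MiB")  # about 9.6 million bits, k = 7, roughly 1.14 MiB

bf = BloomFilter(m, k)
bf.add("alice@example.com")
print("alice@example.com" in bf)  # True
```

The filter’s size depends only on the number of items and the accepted false-positive rate, not on how large each item is. That is how a huge set of keys can be summarized in about a megabyte, at the price of occasional false positives (never false negatives).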

Recommender System in Python

Amazon and Netflix make almost 50% of their revenue by recommending appropriate products (books, movies) to their users. But how do they know what to recommend? Well, they use the power of collaboration between all the other items and users (collaborative filtering). Btw… If you would like to go deeper into the ...
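To make the idea of collaboration between items and users concrete, here is a small item-based collaborative filtering sketch in plain Python. Item similarity is the cosine similarity of the items’ rating vectors, and a missing rating is predicted as a similarity-weighted average of the user’s own ratings. The ratings dictionary and all function names are hypothetical, and this is only one of several collaborative filtering variants:

```python
import math
from collections import defaultdict


def item_vectors(ratings):
    # Turn {user: {item: rating}} into {item: {user: rating}}.
    vectors = defaultdict(dict)
    for user, items in ratings.items():
        for item, rating in items.items():
            vectors[item][user] = rating
    return vectors


def cosine(u, v):
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict(ratings, vectors, user, target):
    # Weighted average of the user's ratings, weighted by item similarity.
    num = den = 0.0
    for item, rating in ratings[user].items():
        sim = cosine(vectors[target], vectors[item])
        num += sim * rating
        den += abs(sim)
    return num / den if den else 0.0


def recommend(ratings, user, top_n=3):
    vectors = item_vectors(ratings)
    unseen = [item for item in vectors if item not in ratings[user]]
    scored = [(item, predict(ratings, vectors, user, item)) for item in unseen]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 2, "D": 5},
    "carol": {"B": 4, "C": 5, "D": 2},
}
print(recommend(ratings, "alice"))  # D is the only item Alice has not rated yet
```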

Visual Gallery

As the end of 2016 approaches, I decided to gather and summarize my visualizations on one page. Have a look: Visual Gallery. Happy new year! Cheers! #visual #analytics #bigdata #BI #ML #datamining #mmds #plots Post by @dzhamzic. Source: Visual Gallery

Twitter API Streaming in Python

Perhaps the easiest way to connect to the Twitter API and stream tweets is with Python and Tweepy. You can install Tweepy using pip: pip install tweepy. Tweepy is an open-source library, and you can check the source here. Maybe take a peek at the StreamListener class and see what additional options are available. ‘Nuff said. Here’s the ...

Google’s PageRank Algorithm in Python

Have you ever asked yourself how Google ranks pages when you search for something on google.com? If so, have a look at the definition of the PageRank algorithm. I won’t go into much detail here, but to give you an idea, the World Wide Web can be seen as a large graph, consisting of pages as nodes and ...
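A minimal power-iteration sketch of PageRank in plain Python. It assumes the commonly used damping factor of 0.85 and spreads the rank of dangling pages (pages without outgoing links) evenly over all pages. The three-page link graph is a toy example:

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    # links: {page: [pages it links to]}; pages only linked to are included too.
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by pages with no outgoing links is spread over all pages.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            out = links.get(v, [])
            for target in out:
                # Each page passes its rank equally to the pages it links to.
                new[target] += damping * rank[v] / len(out)
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
print({page: round(score, 3) for page, score in ranks.items()})  # C ranks highest, then A, then B
```

The ranks always sum to 1, so they can be read as the probability that a “random surfer” who keeps clicking links ends up on each page.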

Searching neighbors in Graph Data Structure by matrix multiplication

Which vertices in a graph can be reached in 1, 2 or N hops? There are basically two ways to implement this. First, if the graph is implemented as an adjacency matrix, by checking connections (the ones in the vertex’s row of the 2D array) and iterating further. Second, if the graph structure is implemented as an ArrayList, by checking lists of ...
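A small sketch of the matrix idea in plain Python. If A is the adjacency matrix, entry (i, j) of A^k counts the walks of exactly k hops from vertex i to vertex j, so the non-zero entries in row i are the vertices reachable in exactly k hops. The helper names and the four-vertex path graph are illustrative assumptions:

```python
def mat_mult(a, b):
    # Plain matrix product of two square integer matrices.
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)] for i in range(n)]


def reachable_in_k_hops(adj, start, k):
    if k < 1:
        raise ValueError("k must be at least 1")
    power = adj
    for _ in range(k - 1):
        power = mat_mult(power, adj)  # after the loop, power holds A^k
    return [v for v, walks in enumerate(power[start]) if walks > 0]


# Directed path 0 -> 1 -> 2 -> 3
adj = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
for k in (1, 2, 3):
    print(k, reachable_in_k_hops(adj, 0, k))  # prints 1 [1], then 2 [2], then 3 [3]
```

To get every vertex reachable within N hops rather than in exactly N, take the union of the non-zero entries of row i over A, A^2, up to A^N.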

Java MapReduce for top N Twitter Hashtags

Back in 2015 I had to implement a MapReduce job in Hadoop to extract the top 15 hashtags from Twitter’s raw data. This was part of a Business Intelligence lecture exercise at Vienna University of Technology. The regex I used for hashtag extraction (not the best one, in my opinion) has the following format: String regex = "text\":\\\"(.*)\\\",\"source"; The ‘source’ field comes ...
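The original job was written in Java for Hadoop. To keep the new examples on this page in Python, here is a local, single-process simulation of the same map, shuffle and reduce phases. It swaps in a simpler #(\w+) hashtag regex instead of the JSON-based regex above and assumes the tweet texts have already been extracted; the sample tweets are made up:

```python
import heapq
import re
from collections import defaultdict

HASHTAG = re.compile(r"#(\w+)")


def mapper(tweet_text):
    # Map: emit (hashtag, 1) for every hashtag in the tweet.
    for tag in HASHTAG.findall(tweet_text):
        yield tag.lower(), 1


def reducer(tag, counts):
    # Reduce: sum all the ones emitted for the same hashtag.
    return tag, sum(counts)


def top_n_hashtags(tweets, n=15):
    grouped = defaultdict(list)  # shuffle: group emitted values by key
    for text in tweets:
        for tag, one in mapper(text):
            grouped[tag].append(one)
    totals = [reducer(tag, counts) for tag, counts in grouped.items()]
    return heapq.nlargest(n, totals, key=lambda pair: pair[1])


tweets = [
    "I love #Python and #DataScience",
    "#python rocks",
    "Learning #Hadoop with #python",
]
print(top_n_hashtags(tweets, n=3))  # ('python', 3) comes first
```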

Linear Regression with multiple Variables in Matlab

In the previous post I showed you how to implement Linear Regression with one Variable in Matlab. In this one I’m going to discuss the implementation with multiple variables. Before implementing multivariate Linear Regression, feature normalization is a smart first step, since gradient descent then converges (finds the minimum of the cost function) much more quickly. Every sample value is ...
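The post itself uses Matlab; to keep the new examples on this page in one language, here is a Python sketch of feature normalization. It assumes the common z-score variant (subtract each feature’s mean, then divide by its standard deviation). statistics.stdev computes the sample standard deviation, which matches Matlab’s default std. The house data is made up:

```python
import statistics


def feature_normalize(rows):
    # Transpose the list of samples into one tuple per feature (column).
    columns = list(zip(*rows))
    mu = [statistics.mean(col) for col in columns]
    sigma = [statistics.stdev(col) for col in columns]
    # Guard against constant features, whose standard deviation is zero.
    sigma = [s if s > 0 else 1.0 for s in sigma]
    normalized = [[(x - m) / s for x, m, s in zip(row, mu, sigma)] for row in rows]
    return normalized, mu, sigma


# Toy samples: (size in square feet, number of bedrooms)
X = [(2104, 3), (1600, 3), (2400, 4), (1416, 2)]
X_norm, mu, sigma = feature_normalize(X)
for row in X_norm:
    print([round(v, 3) for v in row])
```

Keep mu and sigma around: any new sample has to be normalized with the same values before it is fed to the trained model.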