Linear Regression with Multiple Variables in Matlab

In the previous post I showed you how to implement Linear Regression with one Variable in Matlab. In this post I'm going to discuss the implementation with multiple variables.

Before implementing multivariate linear regression, normalizing the features is a smart first step, since gradient descent converges (finds the minimum of the cost function) much more quickly on normalized data. Every sample value is normalized with standard score (z-score) normalization: subtract the feature's mean and divide by its standard deviation.

X_norm = X;
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));

% normalize every feature column: subtract its mean, then divide by its standard deviation
for feature_index = 1:size(X, 2)

	feature_mean = mean(X(:, feature_index));
	X_norm(:, feature_index) = X(:, feature_index) - feature_mean;

	feature_std = std(X_norm(:, feature_index));
	X_norm(:, feature_index) = X_norm(:, feature_index) / feature_std;

	% store the mean and standard deviation so new samples can be normalized the same way
	sigma(feature_index) = feature_std;
	mu(feature_index) = feature_mean;
end
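
The same normalization can also be written without the loop, in vectorized form. This is just an alternative sketch, assuming X holds one training example per row and one feature per column:

mu = mean(X);                       % 1 x n row vector of column means
sigma = std(X);                     % 1 x n row vector of column standard deviations
X_norm = (X - mu) ./ sigma;         % implicit expansion (R2016b and newer); use bsxfun on older versions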


Computing the cost for the theta parameters stays the same as in the univariate implementation, since the whole X matrix (with all features) is multiplied by the theta vector.

squared_error = sum(((X * theta) - y).^2); % vectorized squared error over all m training examples
J = (1/(2*m)) * squared_error;
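
The same computation wrapped up as the computeCostMulti function, which the gradient descent code below calls, might look like this (a minimal sketch, assuming X already contains the column of ones for the intercept term):

function J = computeCostMulti(X, y, theta)
	m = length(y);                             % number of training examples
	squared_error = sum(((X * theta) - y).^2); % vectorized squared error over all examples
	J = (1/(2*m)) * squared_error;
end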


Gradient descent for multiple linear regression updates a theta for every feature, so instead of having only two thetas as in the univariate case, we now have to update one theta per feature (column) in the data matrix.

m = length(y);                     % number of training examples
J_history = zeros(num_iters, 1);   % cost of every iteration, useful for checking convergence

for iter = 1:num_iters
	new_theta = zeros(size(theta));

	% compute the update for every theta using the old theta values
	for feature_index = 1:size(X, 2)
		new_theta(feature_index) = theta(feature_index) - alpha*(1/m)*sum(((X * theta) - y).*X(:,feature_index));
	end

	% simultaneous update: overwrite the thetas only after all of them have been computed
	for feature_index = 1:size(X, 2)
		theta(feature_index) = new_theta(feature_index);
	end

	% hold costs in array
	J_history(iter) = computeCostMulti(X, y, theta);
end
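
To put the pieces together, a rough usage sketch could look like the following. The function names featureNormalize and gradientDescentMulti are assumptions here (wrappers around the snippets above), and the data file, alpha and num_iters are just example values:

data = load('data.txt');                    % assumed file: features in all columns but the last, target in the last
X = data(:, 1:end-1);
y = data(:, end);

[X_norm, mu, sigma] = featureNormalize(X); % normalization snippet from above, wrapped in a function
X_norm = [ones(length(y), 1), X_norm];     % add the intercept column of ones

theta = zeros(size(X_norm, 2), 1);         % one theta per column, including the intercept
alpha = 0.01;                              % example learning rate
num_iters = 400;                           % example number of iterations
[theta, J_history] = gradientDescentMulti(X_norm, y, theta, alpha, num_iters);

% predict a new sample: normalize it with the stored mu and sigma, then prepend the intercept term
x_new = ([2000, 3] - mu) ./ sigma;         % hypothetical new sample with two features
prediction = [1, x_new] * theta;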

Note that all the formulas are the same as in the univariate linear regression post.

Cheers!
