Example: Predicting House Prices

Now it's time to put everything together! Let's build a complete Machine Learning model to predict house prices, using all the concepts we've covered: functions, matrices, statistics, probability, and gradient descent. This example will show you how these mathematical tools work together in a real machine learning application.

We'll walk through the entire process—from defining the problem to training the model—just like a data scientist would approach it in practice. This section is more mathematical, but you now have all the foundation needed to understand it.


Step 1: Defining Our Problem (Supervised Learning)

Remember from our discussion of machine learning types that this is a supervised learning problem—we have labeled training data (houses with known prices) and want to predict prices for new houses.

Specifically, it's a regression problem because we're predicting a continuous numerical value (price) rather than categories.

Our Dataset: Imagine you're working for a real estate company with data on:

  • Size(square meters): Our primary feature.
  • Location Score (1-10): Quality of neighborhood.
  • Price (€): What we want to predict (our target variable).

We want to build a function (or model) that maps house features to prices:

$$f(size, location) \rightarrow price$$

In other words, we want to teach the model to capture the relationship between house features and price so it can make accurate predictions on unseen properties.


Step 2: Choosing Our Model (Linear Regression)

Based on our exploration of core machine learning algorithms, we'll use linear regression because:

  • House prices tend to increase with size (linear relationship).
  • We want an interpretable model where we can understand each feature's impact.
  • It's simple enough to solve step-by-step.

Single Feature Model: Starting with just house size, our function looks like:

$$y = w_1 x + b$$

Let's break it down:

  • $y$ is the predicted price.
  • $x$ is the size of the house.
  • $w_1$ is the weight (how much price increases per square meter).
  • $b$ is the bias (baseline price for a house with 0 square meters).

Multiple Features Model: Including location score, we extend to:

$$y = w_1 x_1 + w_2 x_2 + b$$

where $x_1$ is our first feature (size) and $x_2$ is the location score.

We can extend a linear regression model to however many features we have!
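As a minimal sketch, this model is just a small Python function. The default weight values below are placeholders for illustration, not learned parameters:

```python
# A sketch of the two-feature linear model y = w1*x1 + w2*x2 + b.
# The default weights are placeholders, not values learned from data.

def predict_price(size, location, w1=3000.0, w2=0.0, b=0.0):
    """Predict a house price (€) from size (m²) and location score (1-10)."""
    return w1 * size + w2 * location + b

print(predict_price(50, 5))  # 150000.0 with the placeholder weights
```

Adding another feature just means adding another `w * x` term to the sum.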


Step 3: Representing Data with Matrices

Remember matrices from Chapter 2? Here's how we organize our data for efficient processing. Using matrix notation, we can write:

$$\mathbf{y} = \mathbf{X} \mathbf{w} + \mathbf{b}$$

This is new to us, but remember how we can stack information to form matrices or vectors? Matrices are often represented using a capital letter ($\mathbf{X}$) while vectors are represented with lowercase letters ($\mathbf{y}$).

What do you think these matrices and vectors look like? Let's break them down.

$\mathbf{X}$ is the feature matrix, where we stack all features for each data point. The labels (or truth values) are stored in vector $\mathbf{y}$ and the weights are represented by $\mathbf{w}$.

$$\mathbf{X} = \begin{bmatrix} size_1 & location_1 \\ size_2 & location_2 \\ \vdots & \vdots \end{bmatrix} \quad \mathbf{y} = \begin{bmatrix} price_1 \\ price_2 \\ \vdots \end{bmatrix} \quad \mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}$$

Each row in $\mathbf{X}$ represents one house with all its features. Each column represents one feature across all houses. The $\mathbf{y}$ vector contains all the actual prices we're trying to predict.

This matrix representation lets us process all houses simultaneously instead of one at a time—much more efficient!
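Here is a small NumPy sketch of this matrix form. The houses and weights below are made up for illustration (size weight €3.000/m², location weight 0):

```python
import numpy as np

# Feature matrix X: one row per house, one column per feature (size, location).
X = np.array([[50.0, 5.0],
              [100.0, 8.0],
              [120.0, 7.0]])
w = np.array([3000.0, 0.0])  # illustrative weights, not learned values
b = 0.0

# A single matrix-vector product predicts prices for all houses at once.
y_pred = X @ w + b
print(y_pred)  # [150000. 300000. 360000.]
```

One `@` operation replaces a Python loop over every house, which is exactly the efficiency gain the matrix notation buys us.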


Step 4: Finding the Best Weights (Optimization)

Now comes the core challenge: finding the optimal weights. This is where statistics, derivatives, and loss functions from our previous discussions come together.

To build our model, we need to find the best weights that determine how much house size affects price. Intuitively, we want to find the line that best fits the data, minimizing the difference between our predictions and actual prices.

If we want to find $w_1$ by hand, let's start with two houses to illustrate the principle:

  • House 1: Has a size of 50m² and costs €150.000.
  • House 2: Has a size of 100m² and costs €300.000.

Let's use the slope formula, which is actually a derivative!

$\Delta$ (delta) just means "change." So when you see $\Delta x$, it means "change in $x$"—how much $x$ has increased or decreased. Same with $\Delta y$, which is the "change in $y$." It's like saying: how far did we move horizontally ($x$), and how far did things shift vertically ($y$)?

$$w_1 = \frac{\Delta y}{\Delta x} = \frac{300.000 - 150.000}{100 - 50} = \frac{150.000}{50} = 3000$$

This means each additional square meter adds €3.000 to the house price.

Now, to find the bias ($b$), we substitute one of our data points into the equation:

$$150.000 = 3000(50) + b$$

$$b = 150.000 - 150.000 = 0$$

Thus, our final model for predicting price based on size is:

$$\hat{y} = 3000x$$
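The slope-and-bias calculation above can be checked with a few lines of Python:

```python
# Recomputing w1 and b from the two example houses.
x1, y1 = 50, 150_000   # House 1: 50 m², €150.000
x2, y2 = 100, 300_000  # House 2: 100 m², €300.000

w1 = (y2 - y1) / (x2 - x1)  # slope: price change per square meter
b = y1 - w1 * x1            # bias: substitute one point back into y = w1*x + b

print(w1, b)  # 3000.0 0.0
```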

If we extend this to multiple features, we move from simple algebra to matrices—where the same principle applies, but we solve for all weights at once! This ensures we find the optimal weights that minimize the overall prediction error across all training data.


Step 5: Using Gradient Descent for Complex Cases

The manual approach works for simple cases, but real datasets have hundreds of houses and multiple features. This is where gradient descent (which we just learned about) becomes essential.

The process is as follows:

  1. Start with random weights: $w_1 = 0.5$, $w_2 = 0.3$, $b = 1000$.
  2. Make predictions: Use current weights to predict all house prices.
  3. Calculate error: Use Mean Squared Error (our loss function from evaluation).
  4. Compute gradients: Use derivatives to find which direction to adjust weights.
  5. Update weights: Move in the direction that reduces error.
  6. Repeat: Continue until error stops decreasing.
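The numbered steps above can be sketched as a short NumPy loop. The data, the feature/price scaling, the learning rate, and the iteration count are all illustrative choices made here for numerical stability, not values from the text:

```python
import numpy as np

# Minimal gradient descent for y = X @ w + b with an MSE loss.
# Sizes are scaled by 1/100 and prices by 1/100.000 so one learning
# rate works for both parameters (an illustrative choice).
X = np.array([[50.0], [100.0], [80.0]]) / 100.0
y = np.array([150_000.0, 300_000.0, 240_000.0]) / 100_000.0

w = np.zeros(1)  # step 1: start weights (zeros here for reproducibility)
b = 0.0
lr = 0.1

for _ in range(5000):                     # step 6: repeat
    y_pred = X @ w + b                    # step 2: make predictions
    error = y_pred - y                    # step 3: prediction error (MSE = mean(error²))
    grad_w = 2 * X.T @ error / len(y)     # step 4: gradient of MSE w.r.t. w
    grad_b = 2 * error.mean()             #         gradient of MSE w.r.t. b
    w -= lr * grad_w                      # step 5: move against the gradient
    b -= lr * grad_b

print(w, b)  # w converges to ≈ [3.0], b to ≈ 0 in scaled units (€3.000 per m²)
```

Because the MSE surface of linear regression is convex, this loop reliably approaches the same answer we derived by hand.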

Why Derivatives Matter

Remember from Chapter 2 that derivatives tell us how functions change. In gradient descent:

  • $\frac{\Delta MSE}{\Delta w_1}$ tells us how the error changes when we adjust the size weight.
  • $\frac{\Delta MSE}{\Delta w_2}$ tells us how the error changes when we adjust the location weight.
  • $\frac{\Delta MSE}{\Delta b}$ tells us how the error changes when we adjust the bias.

So, gradient descent uses these derivatives as a guide, steadily adjusting the weights until the model makes the most accurate predictions possible.


Step 6: Evaluating Our Model

Using our evaluation techniques, let's assess how well our model performs. After gradient descent, our model might learn:

  • $w_1 = 2800$ (each m² adds €2.800).
  • $w_2 = 15.000$ (each location point adds €15.000).
  • $b = 50.000$ (base price regardless of size/location).

After plugging these values into our linear regression formula, our model becomes:

$$price = 2800 \times size + 15000 \times location + 50000$$

Let's interpret this model:

  • Location has a bigger impact than size (weight of 15.000 vs 2.800).
  • Even a tiny house in a great location has significant base value.
  • This makes intuitive sense—location is often more important than size in real estate.

After applying our train/validation/test split and testing our model, we might get:

  • Training MSE: 25.000² (average error of €25.000).
  • Validation MSE: 28.000² (slightly higher, which is normal).
  • Test MSE: 27.000² (final performance estimate).

The similar performance across all datasets suggests good generalization—our model learned patterns rather than memorizing training data.
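As a sketch of how such an MSE is computed on a held-out set (the predictions and prices below are made up for illustration, not the results quoted above):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: the average of the squared prediction errors."""
    return float(np.mean((y_true - y_pred) ** 2))

# Illustrative held-out prices and model predictions, each off by €10.000.
y_true = np.array([150_000.0, 300_000.0, 240_000.0])
y_pred = np.array([160_000.0, 290_000.0, 250_000.0])

print(mse(y_true, y_pred))  # 100000000.0, i.e. 10.000², an average error of €10.000
```

Taking the square root of the MSE recovers an error in euros, which is why the text reports values like 25.000².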


Step 7: Making Predictions

Now we can use our trained model to predict prices for new houses!

🏠 New House Example: Let's say a new house comes on the market and we want to decide what price to ask for it. It has the following features:

  1. Size: 120 m²
  2. Location Score: 7

Using our model and the known features we get a prediction:

$$price = 2.800(120) + 15.000(7) + 50.000$$

$$price = 336.000 + 105.000 + 50.000 = 491.000$$

So this new house has a predicted price of €491.000.
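We can double-check this arithmetic in a few lines of Python:

```python
# The learned weights from Step 6 applied to the new house.
w1, w2, b = 2800, 15_000, 50_000
size, location = 120, 7

price = w1 * size + w2 * location + b
print(price)  # 491000
```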

Remember probability from Chapter 2? In practice, we'd also provide confidence intervals: "The predicted price is €491.000 ± €35.000 with 95% confidence." This acknowledges the inherent uncertainty in predictions.


What We've Accomplished

This example demonstrates how all our learned concepts work together:

  • Functions: We created a mathematical function mapping house features to prices.
  • Matrices: We organized data efficiently for batch processing.
  • Statistics: We used mean squared error to measure and minimize prediction errors.
  • Derivatives: We applied gradient descent to find optimal model parameters.
  • Evaluation: We properly split data and assessed model performance.
  • Supervised Learning: We learned from labeled examples to make predictions on new data.
  • Linear Regression: We implemented one of the core machine learning algorithms.

Professional, real-world models would also include:

  • More features (bedrooms, age, garage, etc.).
  • Polynomial regression (higher-order terms) for non-linear relationships.
  • Regularization to prevent overfitting.
  • Cross-validation for more robust evaluation.
  • Feature engineering to create better input variables.

Together, these steps give you the foundation for building real-world machine learning models that are both accurate and reliable.


Final Takeaways

Building a machine learning model involves systematically applying mathematical and statistical principles to learn patterns from data. This house price predictor demonstrates how functions, matrices, derivatives, and optimization techniques combine to create systems that can make accurate predictions on new, unseen data.

The key insight is that machine learning isn't magic—it's the systematic application of mathematical tools to find the best possible function for mapping inputs to outputs. Understanding this foundation prepares you to tackle more complex machine learning problems and appreciate the sophisticated algorithms powering modern AI applications.