If you’re thinking of buying a new laptop, you might wonder how much you should pay for one. You could sift through websites, but some Python code and a little linear regression could make the job easier.

Why Build a Laptop Price Predictor?

You could search through thousands of web pages and trawl through physical stores, but this would take a lot of time. I like computers, but I don’t have the time or the inclination for this task. I have other things I’d rather be doing. I’d like a program where I can enter specs, such as how much RAM I want or the screen resolution I need, and have it spit out a price.

There are so many machines on the market that it would be difficult for me to compute all of this information by hand. I also thought that sharing this info with other people who might be in the market for a new laptop could be helpful. Who wants to overpay for a machine? I don’t, and I can guess that you probably don’t either.

Output of laptops.head() in Jupyter notebook.

With my knowledge of linear regression from statistics, I realized I could easily build a model to answer these questions. Python is a great language, and I already had some basic familiarity with it. It’s become popular in data analysis because it’s simple enough for people without computer science backgrounds to pick up yet offers powerful libraries to analyze data.

Assembling the Python Statistical Toolkit

For this project, I used pieces of the Python statistical ecosystem that I was already comfortable with.

I’d already set up a Mamba environment with these tools. While many systems, including Linux, ship with Python, that copy is meant to support the system rather than user programs. If you upgrade the system Python, scripts that depend on it might break. Tools such as virtualenv also let you install custom environments.

Descriptive statistics of the numerical columns in the laptops dataset.

The first component is NumPy. It’s a popular library for all kinds of numerical operations, particularly statistical and linear algebra calculations that will happen in the background.

The next library you’ll need is Pandas, which will let you import the dataset and view it in columns as a “data frame.” It’s a bit like a cross between a relational database and a spreadsheet. You can also perform some powerful manipulations on your data.

Histogram of laptop prices in a Jupyter notebook.

Seaborn is a library for plotting statistical data. I use it for visualizing data distributions in histograms, scatterplots, and linear regressions.

Finally, Pingouin lets me perform many statistical tests easily, without having to memorize all those formulas I forgot in my college statistics class years ago. This is the library that will build the model through multiple linear regression of the retail price against the laptop attributes.

Laptop multiple regression model in a Jupyter notebook.

Putting all this together is simple in most Unix-like environments, including Windows using the Windows Subsystem for Linux. You can follow the instructions on the web page to install it.

Jupyter notebooks provide a relatively user-friendly way to run the Python commands and view the results, as well as store the results for later, but they’re strictly optional. I created a Jupyter notebook, and will be demonstrating code examples from it. I’ve posted it to my GitHub, so you can see the code and some examples I couldn’t cover in this article.

With Mamba installed, you can create the environment you need. As on a cooking show, I had one ready ahead of time. To activate it, I type this at the Linux shell:

Acquiring the Laptop Data

To build the dataset for the regression model, I could trawl through internet stores and build up a comprehensive database of laptops. That would take a long time, and I’d also have to clean the data to make it consistent. Fortunately, someone has done that already.

There’s a database of laptops with certain hardware specs like CPU speed, amount of RAM, amount of storage, and horizontal and vertical screen resolutions available on Kaggle.

The prices of the laptops are in euros, but a quick check on Xe.com in July 2025 showed that the euro and the US dollar were fairly close in value, so the euro prices can be read as rough dollar figures.

Building the Regression Model

With the environment assembled and the data acquired, now is the time to build the model. First, I have to import the libraries I’m going to use.

These lines import the NumPy, Pandas, Seaborn, and Pingouin libraries, shortened to “np,” “pd,” “sns,” and “pg.” The line that starts with “%” is for use in a Jupyter notebook. It tells the notebook to display the plots drawn by the Matplotlib library inline. Otherwise, they’ll be displayed in a separate window.

Next, we’ll import the data with Pandas:
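Something along these lines should work. The file name laptop_prices.csv is an assumption (use whatever name the CSV you downloaded from Kaggle has), and load_laptops is a hypothetical helper of my own, not part of Pandas.

```python
import pandas as pd

# Assumed file name; replace it with the name of your downloaded CSV.
CSV_PATH = "laptop_prices.csv"

def load_laptops(path=CSV_PATH):
    """Read the laptop CSV into a Pandas data frame."""
    return pd.read_csv(path)
```

In a notebook, you would then call laptops = load_laptops() once and reuse the result.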

This will create a Pandas data frame. We can see how the data is laid out with the head() method:
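Here is a small sketch of head() in action. The rows are made up for illustration, and the column names are assumptions about how the dataset labels its fields, so check them against your own copy.

```python
import pandas as pd

# Made-up rows for illustration only; the real values come from the Kaggle CSV.
laptops = pd.DataFrame({
    "Company": ["A", "B", "C", "D", "E", "F"],
    "Ram": [4, 8, 8, 16, 16, 32],
    "Price_euros": [350.0, 600.0, 650.0, 1100.0, 1250.0, 2400.0],
})

# head() returns the first five rows by default; pass a number for more or fewer.
first_rows = laptops.head()
print(first_rows)
```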

We can also see basic descriptive statistics of all the numerical columns with the describe() method.
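As a rough illustration, here is describe() on a tiny made-up data frame (the values and column names are placeholders, not taken from the dataset). Text columns such as the brand are left out of the summary by default.

```python
import pandas as pd

# Made-up rows for illustration only.
laptops = pd.DataFrame({
    "Company": ["A", "B", "C", "D", "E", "F"],
    "Ram": [4, 8, 8, 16, 16, 32],
    "Price_euros": [350.0, 600.0, 650.0, 1100.0, 1250.0, 2400.0],
})

# describe() summarizes only the numerical columns unless told otherwise.
summary = laptops.describe()
print(summary)

# The row labelled "50%" is the median of each column.
median_price = summary.loc["50%", "Price_euros"]
```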

This will show the count, the mean, the standard deviation, the minimum value, the lower quartile or 25th percentile, the median, the upper quartile or 75th percentile, and the maximum value of each column.

I also like to visualize the distributions of data through histograms. Seaborn’s displot does this.

To see how the prices are distributed:

This tells Seaborn to plot the prices along the x axis and to use the laptop data frame as the source. The distribution is noticeably skewed to the right, with a long tail of expensive machines.
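If you would rather put a number on the skew than judge it by eye, Pandas can compute it. This is a side sketch on made-up prices, not output from the actual dataset: a positive skewness, along with a mean that sits above the median, points to a right skew.

```python
import pandas as pd

# Made-up prices for illustration only: most are modest, a few are very high.
prices = pd.Series([400, 450, 500, 550, 600, 650, 700, 1500, 3000], dtype=float)

skewness = prices.skew()  # positive values indicate a right skew
mean, median = prices.mean(), prices.median()

print(f"skewness={skewness:.2f}, mean={mean:.0f}, median={median:.0f}")
```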

We’ll build up a model that uses various specs. It’ll look something like this:

price = a(CPU speed) + b(RAM) + c(size in inches) …

The letters are stand-ins for the coefficients defined by the regression. It’s similar to the simple linear regression you might have seen, but instead of fitting a line over a scatterplot, you’re fitting a plane. Since there are more than three dimensions in this model, it’s actually a hyperplane.
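To make the idea of fitting a hyperplane concrete, here is a small NumPy sketch using ordinary least squares. Everything in it is synthetic: the specs are random, and the coefficients are invented so we can check that the fit recovers them. It also includes an intercept term, which the equation above leaves out for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic specs for 50 imaginary laptops: CPU speed (GHz), RAM (GB), size (inches).
X = np.column_stack([
    rng.uniform(1.0, 3.5, 50),
    rng.choice([4, 8, 16, 32], 50),
    rng.uniform(11.0, 17.0, 50),
])

# Invented coefficients used to generate noise-free prices.
true_intercept = 100.0
true_coefs = np.array([150.0, 40.0, 10.0])
price = true_intercept + X @ true_coefs

# Add a column of ones so the fit includes an intercept, then solve by least squares.
design = np.column_stack([np.ones(len(X)), X])
fitted, *_ = np.linalg.lstsq(design, price, rcond=None)

intercept, a, b, c = fitted
print(intercept, a, b, c)  # recovers the invented values up to rounding error
```

Real data is noisy, so a fit on the actual dataset gives estimates rather than exact values.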

To obtain the regression of price in euros vs. RAM, CPU speed, screen size, weight, primary storage, and secondary storage, use Pingouin’s linear regression function:

This will give us the coefficients for this regression equation. The relimp=True option tells Pingouin to calculate how much each variable contributes to the price. The coefficients are displayed in the “coef” column, with the column on the far right telling us that RAM is the biggest predictor of price. The number to watch when judging the quality of the fit is the square of the correlation coefficient, shown as “r2” in this table. It’s around .66, which means the model explains about two-thirds of the variation in price, a pretty good fit.

With the predicted coefficients, we can now plug values into the equation to predict the price. Here’s a function that does just that:

You should indent the second line, but the limitations of our system require me to present it this way.
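As a rough sketch, a prediction function of this kind could look like the following. It is a generic version of my own that takes the coefficients as a dictionary rather than hard-coding them, and the numbers in the example are placeholders, not the fitted values from the model.

```python
def predict_price(coefs, specs):
    """Return the intercept plus each coefficient times its spec value."""
    return coefs["Intercept"] + sum(coefs[name] * value for name, value in specs.items())

# Placeholder coefficients for illustration only, not the fitted values.
example_coefs = {"Intercept": 100.0, "Ram": 50.0, "Weight": -10.0}
example_price = predict_price(example_coefs, {"Ram": 8, "Weight": 2.0})
print(example_price)  # 100 + 50*8 - 10*2 = 480.0
```

If the Pingouin regression table is stored in a variable such as model, dict(zip(model["names"], model["coef"])) builds a suitable coefficient dictionary.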

Do Prices Really Differ Among Brands?

This regression model only looks at specs. You might wonder if brand is really a predictor of price. A one-way analysis of variance, or ANOVA, is the usual way to determine whether the differences among brands are significant. Because the price data was skewed, as seen in the histogram, a non-parametric alternative will be more accurate. Pingouin has a Kruskal-Wallis test that does this.

This will test the null hypothesis that there’s no relationship between price and brand:

The p-value rounds to 0, which means that the difference in price among brands is indeed significant. The rounding was done to make the p-value more apparent. Otherwise, it would be shown in scientific notation. This means that we can reject the null hypothesis and conclude that brand is a predictor of price.

I was able to build a price predictor to help me decide what a fair price to pay for a machine would be based on its specifications, and run a test to determine how significant brand was. This shows the power of Python and its libraries to reduce something that might have been difficult to do by hand to a few lines of code.