
Predicting Laptop Prices: An End-to-End Project Using E-commerce Website Data

In this project, I built a model that predicts the price of a laptop from its specifications, using data on available laptops that I collected from Digikala, the largest e-commerce site in Iran.

Project GitHub Repository

Project Demo

Note: Since laptop prices in Iran change constantly, this model was trained on data for laptops available on Digikala on 2022-11-06 (1401-8-15). Its predictions may become inaccurate in the coming years; the remedy is to retrain the model on up-to-date data.

Project Overview:

1 - Collect Data

To collect my data, I used Digikala's internal (undocumented) API. With a simple filter, I was able to collect exactly the data I wanted: available laptops with prices.

  • Collect Laptop main data

    • id
    • title_fa
    • title_en
    • price
    • image_url
    • brand
  • Collect Laptop details data

    • cpu manufacturer
    • cpu series
    • cpu model
    • ram
    • ram type
    • internal storage
    • internal storage type
    • gpu manufacturer
    • gpu model
    • screen resolution
    • ports
  • Merge the collected data

  • Remove duplicate rows

  • Save the data to a CSV file
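
The merge, de-duplication, and save steps can be sketched with pandas roughly as follows. The two small frames stand in for the scraped main and details data; their values, the use of `id` as the join key, and the `laptops.csv` file name are illustrative assumptions, not the project's actual data or code.

```python
import pandas as pd

# Hypothetical stand-ins for the scraped data (all values are made up).
main_df = pd.DataFrame({
    "id": [1, 2, 2],
    "title_en": ["Laptop A", "Laptop B", "Laptop B"],
    "price": [100, 200, 200],
    "brand": ["asus", "acer", "acer"],
})
details_df = pd.DataFrame({
    "id": [1, 2],
    "ram": ["8", "16"],
    "cpu manufacturer": ["Intel", "AMD"],
})

# Join main and details data on the product id (assumed join key).
laptops = main_df.merge(details_df, on="id", how="inner")

# Drop rows that are exact repeats, then renumber the index.
laptops = laptops.drop_duplicates().reset_index(drop=True)

# Persist the merged table; index=False avoids writing the row numbers.
laptops.to_csv("laptops.csv", index=False)
```

An inner join keeps only laptops that have both main and details records; a left join would be the alternative if laptops with missing details should be kept.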

2 - Take a look at the data

After collecting the data, I inspected it to make sure it had been collected correctly.

  • Check the shape of the data
  • Check whether there are any null values
  • Check data types
  • Check the number of unique values in each column
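
Each of these checks maps to a single pandas call. The sketch below uses a tiny made-up frame in place of the collected data; the column names and values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample standing in for the collected data.
df = pd.DataFrame({
    "brand": ["asus", "acer", "asus", None],
    "ram": ["8", "16", "8", "4"],
    "price": [100.0, 200.0, 150.0, 90.0],
})

shape = df.shape                 # (number of rows, number of columns)
null_counts = df.isnull().sum()  # missing values per column
dtypes = df.dtypes               # pandas dtype of each column
unique_counts = df.nunique()     # distinct non-null values per column

print(shape, null_counts, dtypes, unique_counts, sep="\n\n")
```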

3 - Cleaning data

As in all machine learning projects, the data didn't arrive perfect and ready for prediction, so at this point I started cleaning it.

  • Convert brand names from Persian to English
  • Convert RAM values from Persian to English digits
  • Clean and convert internal storage to English
  • Clean and convert laptop screen sizes
  • Clean laptop screen resolutions
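
As a rough illustration of the Persian-to-English conversions, the sketch below translates Persian (and Arabic-Indic) digits to ASCII digits and maps brand names through a lookup table. The helper names and the two-entry brand table are hypothetical; the real project would need the complete list of brands found in the data.

```python
# Persian digits are U+06F0..U+06F9; Arabic-Indic digits are U+0660..U+0669.
PERSIAN_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))
ARABIC_DIGITS = "".join(chr(0x0660 + i) for i in range(10))
DIGIT_TABLE = str.maketrans(PERSIAN_DIGITS + ARABIC_DIGITS, "0123456789" * 2)

# Hypothetical, incomplete brand lookup (Persian spelling -> English name).
BRAND_FA_TO_EN = {"ایسوس": "asus", "لنوو": "lenovo"}


def to_english_digits(text: str) -> str:
    """Replace Persian and Arabic-Indic digits with ASCII digits."""
    return text.translate(DIGIT_TABLE)


def brand_to_english(brand_fa: str) -> str:
    """Look up the English brand name; return the input unchanged if unknown."""
    return BRAND_FA_TO_EN.get(brand_fa.strip(), brand_fa)
```

Returning unknown brands unchanged makes unmapped values easy to spot in a later `value_counts()` check instead of silently turning them into nulls.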

4 - EDA

The next step, feature engineering, required a good understanding of the data, so in this step I analyzed and explored it.

  • Laptop price distribution
  • Number of laptops of each brand
  • Number of laptops per CPU manufacturer
  • Number of laptops for each RAM group
  • Number of laptops for each RAM type group
  • Number of laptops with different internal storage
  • Number of laptops for each internal storage group
  • Number of laptops with different screen sizes
  • Number of laptops with different screen resolutions
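
Most of these counts come down to `value_counts()` on one column, and the price distribution can be summarized with `describe()` before plotting it. A minimal sketch on made-up data (the columns and values are assumptions):

```python
import pandas as pd

# Hypothetical cleaned sample; the real data has many more rows and columns.
df = pd.DataFrame({
    "brand": ["asus", "acer", "asus", "lenovo", "asus"],
    "ram": [8, 16, 8, 4, 16],
    "price": [100.0, 200.0, 150.0, 90.0, 300.0],
})

price_summary = df["price"].describe()             # count, mean, std, quartiles
brand_counts = df["brand"].value_counts()          # laptops per brand
ram_counts = df["ram"].value_counts().sort_index() # laptops per RAM size

print(price_summary, brand_counts, ram_counts, sep="\n\n")
```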

5 - Feature Engineering

In this step, I prepared the features for training the model.

  • Remove outliers based on laptop price using z-scores
  • Convert screen resolution to a number
  • Extract gaming brands from the title (Asus ROG, Acer Nitro, ...)
  • Remove brands with only one laptop
  • Extract a clean GPU model from the GPU model column
  • Remove laptops whose GPU model appears fewer than 3 times
  • Label encode the cleaned GPU models
  • Convert internal storage from TB and GB to MB
  • Label encode internal storage type
  • Convert RAM from str to int
  • Extract port count
  • Label encode RAM type
  • Label encode CPU series
  • One-hot encode brand, CPU manufacturer, and GPU manufacturer (nominal categorical variables)
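
A few of these transformations are sketched below with pandas. Everything specific here is an assumption made for illustration: the z-score threshold of 3, the `WIDTHxHEIGHT` resolution format (converted to a pixel count), the `<number> <unit>` storage format with 1 GB = 1024 MB, and the explicit category order used as a simple stand-in for label encoding. The helper names are hypothetical.

```python
import pandas as pd


def remove_price_outliers(df, column="price", threshold=3.0):
    """Keep rows whose z-score on `column` is within +/- threshold."""
    z = (df[column] - df[column].mean()) / df[column].std()
    return df[z.abs() <= threshold]


def resolution_to_pixels(resolution):
    """'1920x1080' -> 2073600 (assumes a 'WIDTHxHEIGHT' string)."""
    width, height = resolution.lower().split("x")
    return int(width) * int(height)


def storage_to_mb(storage):
    """'512 GB' -> 524288 (assumes '<number> <unit>' with MB, GB, or TB)."""
    value, unit = storage.split()
    factor = {"MB": 1, "GB": 1024, "TB": 1024 * 1024}[unit.upper()]
    return int(float(value) * factor)


def label_encode(series, order):
    """Map categories to 0..n-1 following an explicit order."""
    return series.map({name: i for i, name in enumerate(order)})


# Hypothetical cleaned sample.
df = pd.DataFrame({
    "brand": ["asus", "acer", "asus"],
    "screen resolution": ["1920x1080", "1366x768", "2560x1440"],
    "internal storage": ["512 GB", "1 TB", "256 GB"],
    "ram type": ["DDR4", "DDR3", "DDR5"],
})

df["screen resolution"] = df["screen resolution"].map(resolution_to_pixels)
df["internal storage"] = df["internal storage"].map(storage_to_mb)
df["ram type"] = label_encode(df["ram type"], ["DDR3", "DDR4", "DDR5"])

# One-hot encode the nominal column: one indicator column per brand.
df = pd.get_dummies(df, columns=["brand"])
```

Label encoding suits columns with a natural order (such as RAM generations), while one-hot encoding avoids implying an order among brands or manufacturers.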

6 - Feature Selection

In this step, I chose the features needed to train the model.

  • Check correlation
  • Mutual information regression
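
Both checks can be run in a few lines with pandas and scikit-learn. The sketch below uses synthetic data in which the price depends on `ram` and `storage_mb` but not on `noise`; the feature names and the data are assumptions, not the project's real features.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

# Synthetic features (illustrative only).
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "ram": rng.choice([4, 8, 16, 32], size=n),
    "storage_mb": rng.choice([262144, 524288, 1048576], size=n),
    "noise": rng.normal(size=n),
})
# Synthetic price driven by ram and storage only.
y = 10 * X["ram"] + X["storage_mb"] / 10_000 + rng.normal(scale=5, size=n)

# Linear relationship between each feature and the target.
correlations = X.corrwith(y).sort_values(ascending=False)

# Mutual information also picks up non-linear dependence.
mi = pd.Series(
    mutual_info_regression(X, y, random_state=0),
    index=X.columns,
).sort_values(ascending=False)

print(correlations, mi, sep="\n\n")
```

Features that score near zero on both measures, like `noise` here, are candidates for removal.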

7 - Model training

8 - Hyperparameter tuning

In this step, I tuned the hyperparameters of the model that had the best R² score.
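
One way to compare candidate models by R² and then tune the winner is sketched below with scikit-learn. The two candidate models, the parameter grids, and the synthetic data are assumptions chosen for illustration; the write-up does not say which models or grids were actually used.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic regression data standing in for the laptop features and prices.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hypothetical candidates; fit each and score it on the held-out split.
candidates = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

best_name = max(scores, key=scores.get)

# Hypothetical grids; only the grid of the best model is searched.
param_grids = {
    "linear": {"fit_intercept": [True, False]},
    "random_forest": {"n_estimators": [50, 100], "max_depth": [None, 5]},
}
search = GridSearchCV(
    candidates[best_name], param_grids[best_name], scoring="r2", cv=3
)
search.fit(X_train, y_train)

print(best_name, scores[best_name], search.best_params_)
```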

9 - Cross validation
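
A minimal cross-validation sketch with scikit-learn, assuming 5 shuffled folds, R² as the metric, and a linear model on synthetic data (none of which is confirmed by the project itself):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data standing in for the laptop features and prices.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Shuffle before splitting so each fold sees a mix of rows.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")

print(f"R2 per fold: {scores.round(3)}; mean: {scores.mean():.3f}")
```

A large spread between folds would suggest the model's score depends heavily on which laptops land in the test split.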

10 - Save model

I pickled the model for use in the GUI environment.
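
Saving and reloading with the standard `pickle` module looks roughly like this. The dictionary is only a stand-in for the trained estimator, and the file name and temporary directory are assumptions.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for the trained model; any picklable object works the same way.
model = {"coef": [1.5, -0.2], "intercept": 10.0}

# Hypothetical location for the saved model file.
model_path = Path(tempfile.gettempdir()) / "laptop_price_model.pkl"

# Serialize the model to disk.
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Later (for example, in the web app), load it back.
with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)
```

Only unpickle files you created or trust, because loading a pickle can execute arbitrary code.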

11 - Website

To create the website, I used Streamlit, a powerful framework that let me build the desired user interface entirely in Python.

12 - Deploy

I deployed the app on Heroku, a cloud platform as a service which offered free hosting. It's an amazing platform that gave me a lot of flexibility in deploying my app.

Libraries and Frameworks used in the project
