Analyzing a boat listings website


Data analysis with Pandas and data visualization with Seaborn for a boat listing website | Most popular boats, prices, etc.

Published on December 08, 2021 by Andrés Ingelmo Poveda

jupyter notebook python data analysis pandas seaborn data visualization

8 min READ

What do you do when you want to sell your boat? You can either sell it to a friend or relative or post it online. This is why boat listing websites exist. In this blog post, I will try to identify the common patterns among the ads that were viewed the most within the last 7 days. This analysis was part of my DataCamp Data Analyst certification.

Dataset

The dataset contains almost 10,000 observations of boat ads from an unknown website. It has the following structure:

Data columns (total 10 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   Price                        9888 non-null   object 
 1   Boat Type                    9888 non-null   object 
 2   Manufacturer                 8550 non-null   object 
 3   Type                         9882 non-null   object 
 4   Year Built                   9888 non-null   int64  
 5   Length                       9879 non-null   float64
 6   Width                        9832 non-null   float64
 7   Material                     8139 non-null   object 
 8   Location                     9852 non-null   object 
 9   Number of views last 7 days  9888 non-null   int64

As we can see, the data is not clean enough. Several columns have missing values and, well, the price column mixes different currencies. If you want to see all of the steps I followed to clean up the data, please check out the Jupyter notebook associated with this post.
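To give an idea of what the currency clean-up can look like, here is a minimal sketch. It assumes the raw price is a string such as "CHF 3337" (currency code, space, amount); that layout, the `Currency` column and the exchange rates below are illustrative assumptions, not values taken from the notebook. Only the `Amount`, `Exchange_EUR` and `Price_EUR` column names come from the tables in this post.

```python
import pandas as pd

# Hypothetical raw prices; the "CUR amount" layout is an assumption.
raw = pd.DataFrame({"Price": ["CHF 3337", "EUR 3490", "DKK 25900", "EUR 949000"]})

# Illustrative exchange rates to EUR (assumed, not the notebook's values).
RATES_TO_EUR = {"EUR": 1.0, "CHF": 0.96, "DKK": 0.13}

# Split each price into a currency code and a numeric amount.
parts = raw["Price"].str.split(" ", n=1, expand=True)
raw["Currency"] = parts[0]
raw["Amount"] = pd.to_numeric(parts[1])

# Look up the rate for each currency and convert the amount to EUR.
raw["Exchange_EUR"] = raw["Currency"].map(RATES_TO_EUR)
raw["Price_EUR"] = raw["Amount"] * raw["Exchange_EUR"]

print(raw)
```

A currency that is missing from the rate table ends up as NaN in `Exchange_EUR`, which makes unconverted rows easy to spot.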

After cleaning the dataset, I decided to create a filtered dataframe with the top 1% of the most viewed ads. Why? Because they account for almost 7% of the total views, which I think is representative enough. Don’t forget, we are trying to examine the top-performing ads to look for common patterns.
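One possible way to build such a filter is sketched below. The `boats` dataframe and its view counts are made up for illustration, and the exact cut-off used in the notebook may differ (for example, a quantile threshold instead of a fixed row count).

```python
import pandas as pd

VIEWS = "Number of views last 7 days"

# Toy stand-in for the cleaned dataframe (made-up view counts, 200 rows).
boats = pd.DataFrame({VIEWS: [3263, 2432, 150, 108, 70, 95, 210, 180, 60, 13] * 20})

# Keep the top 1% of rows by number of views (at least one row).
n_top = max(1, int(len(boats) * 0.01))
top = boats.nlargest(n_top, VIEWS)

# Share of all views captured by that top slice.
share = top[VIEWS].sum() / boats[VIEWS].sum()
print(f"{n_top} ads hold {share:.1%} of the views")
```

`nlargest` keeps the original index, so the selected rows can still be traced back to the full dataframe.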

The filtered dataframe looks like this:

      Boat Type                        Manufacturer          Type                Year Built  Length  Width  Material  Location                                    Number of views last 7 days  Price_EUR
9580  Motor Yacht                      Bayliner power boats  Used boat,Unleaded  1992        7.7     2.46   Plastic   Switzerland » Le Landeron (NE)              3263                         14304
8723  Hardtop                          Princess power boats  Used boat,Diesel    1979        11.12   3.88   GRP       Switzerland » Neuenburgersee » Hauterive    2432                         33600
6211  Bowrider,Motor Yacht,Sport Boat  Windy power boats     Used boat,Diesel    2002        12.35   3.48   GRP       Switzerland » Lago Maggiore » 6600 Locarno  2261                         120864
3700  Hardtop                          Pershing power boats  Used boat,Diesel    2009        20.3    5.2    GRP       Neustadt in Holstein (Ostsee)               2154                         949000
308   Sport Boat                       Sea Ray power boats   Used boat,Unleaded  1993        6.14    2.34   Plastic   Switzerland » Murtensee » Avenches          2026                         19104

Exploratory data analysis

By looking at the summary statistics, we can draw some conclusions. Let’s start with the statistics of the top performers:

       Year Built   Length    Width  Number of views last 7 days    Price_EUR
count          80       80       80                           80           80
mean      1997.88  9.82725    3.129                      1202.56       558834
std       20.1942   7.1716  1.47437                      445.153  3.51186e+06
min          1901     3.35     1.55                          859         3648
25%          1989      6.3    2.355                          895        21336
50%          2003      7.5     2.59                         1019        46280
75%       2011.25    11.03   3.4575                         1327        93400
max          2020     54.4     9.95                         3263      3.1e+07

And those of the whole dataframe:

       Year Built   Length    Width  Number of views last 7 days   Amount  Exchange_EUR  Price_EUR
count        9246     9246     9246                         9246     9246          9246       9246
mean      2004.92  11.7174  3.55226                      150.431   320240       0.98518     301920
std       16.4064  6.00027  1.21254                      155.088   979004      0.122769     940032
min          1885     1.04     0.01                           13     3300          0.13    3203.52
25%          1999     7.56     2.55                           70    45000             1      44500
50%          2008     10.5     3.38                          108    98000             1      95000
75%          2018       14     4.26                          172   259000             1     249000
max          2021      100    25.16                         3263  3.1e+07          1.17    3.1e+07

Just by looking at the two tables above, we can get some valuable information:

  • The median boat in the most viewed ads is older (built in 2003 vs 2008).
  • The top-performing boats are smaller in both length and width (median 7.5 m vs 10.5 m long, 2.59 m vs 3.38 m wide).
  • The median price of the most viewed boats is lower (46,280 EUR vs 95,000 EUR).
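The comparison behind these three points can be laid out side by side. The sketch below simply copies the medians (the 50% rows) from the two summary tables into a small dataframe; the `medians` name and the column labels are my own choices for illustration.

```python
import pandas as pd

cols = ["Year Built", "Length", "Width", "Price_EUR"]

# Medians copied from the 50% rows of the two summary tables.
medians = pd.DataFrame(
    {
        "top 1%": [2003, 7.5, 2.59, 46280],
        "all ads": [2008, 10.5, 3.38, 95000],
    },
    index=cols,
)

# Negative values mean the most viewed ads sit below the overall median.
medians["difference"] = medians["top 1%"] - medians["all ads"]
print(medians)
```

With the real dataframes at hand, the same two columns could be obtained by calling `.median()` on the selected columns of each dataframe.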

Let’s build some visualizations to check our hypotheses!

Data visualizations

Let’s start by asking the following question: do more expensive boats receive more views?

Relation between price and number of views

As the graph shows, there is no positive relationship between the number of views and the price. This means that the most expensive boats are not getting most of the attention, and it makes sense. People may enjoy looking at expensive or overpriced boats, but those are not practical purchases. Most people cannot afford them, so they search for more affordable options.
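A quick numeric companion to the graph is a rank correlation between price and views. The sketch below uses only the five rows listed in the filtered dataframe earlier in this post, so it is far too small to prove anything; it just shows the calculation. Ranking both columns first and then taking the default Pearson correlation is equivalent to a Spearman correlation, which is less sensitive to a few very expensive boats.

```python
import pandas as pd

VIEWS = "Number of views last 7 days"

# The five rows shown in the filtered dataframe earlier in the post.
ads = pd.DataFrame({
    "Price_EUR": [14304, 33600, 120864, 949000, 19104],
    VIEWS: [3263, 2432, 2261, 2154, 2026],
})

# Spearman correlation computed as the Pearson correlation of the ranks.
rho = ads["Price_EUR"].rank().corr(ads[VIEWS].rank())
print(f"rank correlation between price and views: {rho:.2f}")
```

A value near zero or below zero is consistent with the graph: a higher price does not go hand in hand with more views.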

Now, let’s check whether the year they were built matters. Do newer boats make it to the top of the list?

Years when top boats were built

And, as we can see, the top 80 most viewed boats are fairly new, but not as new as the dataset as a whole. The median build year of the most viewed boats is 2003, while for the whole dataset it is 2008. We can say that older boats may have a somewhat higher chance of reaching the top positions of the list.

Does size matter?

Size of most viewed boats

We can get some important insights from this visualization. The most viewed boats are smaller than the average boat listed on the site. Most of them are roughly 2.5 to 3 meters wide and about 7.5 meters long. This suggests that people prefer smaller boats. And again, it makes sense: smaller boats are usually more affordable.
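A size comparison like this can be drawn with a Seaborn box plot. The snippet below is only a sketch of the plotting call: the lengths are made-up numbers, and the `sizes` dataframe and `group` column are hypothetical names, not the ones used in the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display

import pandas as pd
import seaborn as sns

# Made-up lengths in meters, only to demonstrate the plotting call.
sizes = pd.DataFrame({
    "group": ["top 1%"] * 5 + ["all ads"] * 5,
    "Length": [6.1, 7.5, 7.7, 11.1, 12.4, 7.6, 9.8, 10.5, 14.0, 18.2],
})

# One box per group makes the difference in typical length easy to see.
ax = sns.boxplot(data=sizes, x="group", y="Length")
ax.set_ylabel("Length (m)")
ax.set_title("Boat length: most viewed ads vs all ads")
ax.figure.savefig("length_boxplot.png")
```

In a Jupyter notebook the backend line and the `savefig` call can be dropped, since the figure is rendered inline.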

Now we know that users prefer older and smaller boats, probably because they are cheaper. Let’s try to figure out the most common type of boat.

Type of most viewed boats

And yes! Sport boats are the most popular ones on the site. These boats are usually small and cheap. People can afford them, so they are more likely to rank at the top. Cabin boats are also small and affordable, so selling these types of boats gives users a better chance of being visible.
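One detail worth noting when counting types: a single ad can list several boat types separated by commas (for example "Bowrider,Motor Yacht,Sport Boat"). The sketch below shows one way to count each type separately, using the five "Boat Type" values from the filtered dataframe shown earlier. Whether the notebook splits the values this way is an assumption on my part.

```python
import pandas as pd

# "Boat Type" values of the five ads listed earlier in the post.
top = pd.DataFrame({
    "Boat Type": [
        "Motor Yacht",
        "Hardtop",
        "Bowrider,Motor Yacht,Sport Boat",
        "Hardtop",
        "Sport Boat",
    ]
})

# Split multi-type entries, give each type its own row, then count.
type_counts = (
    top["Boat Type"]
    .str.split(",")
    .explode()
    .str.strip()
    .value_counts()
)
print(type_counts)
```

Without the split, the combined string would be counted as its own category and the individual types would be undercounted.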

Let’s check the most common type of material.

Materials of most viewed boats

And we have a winner! GRP (glass-reinforced plastic) is the most common material among the most viewed boats! GRP is generally regarded as offering a good strength-to-weight ratio compared with aluminum or steel. Smaller boats are usually made of it, so it’s another reason to take into account when explaining why smaller boats are so popular.

Conclusion

Here it comes! The part we were all waiting for…

After analyzing the whole dataset to find the characteristics that the most viewed boats have in common, we can conclude the following:

  • Cheaper boats make it to the top of the list.
  • Smaller boats do as well.
  • Sport and cabin boats are the most common types.
  • GRP is the most common material.

Putting all the previous insights together, we can say that listing a well-priced boat, which is usually a smaller one, gives the seller a better chance of being visible than listing a big and expensive boat. People are using the website to buy sport and cabin boats, probably because they are more affordable than yachts.

If you liked my article, please consider reading another one from my blog or check out my projects.