Interesting IMDb facts


Have you ever wondered what the most expensive movie ever made is? In this post, you may find some interesting answers.

Published on October 10, 2021 by Andrés Ingelmo Poveda

pandas jupyter notebook data analysis

13 min READ

IMDb is the biggest source of information about movies in the world. By analyzing its database, we can gain insight into some very interesting questions. For example, have you ever wondered which genre has been the most popular over the years? Are new movies more profitable than older ones? Find the answers to your questions in this post!

Dataset

The dataset used for the analysis contains information on more than 10,000 movies from the well-known website “IMDb”. Its columns cover popularity, genres, revenue, budget and release year, among others. It is structured as follows:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10866 entries, 0 to 10865
Data columns (total 21 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   id                    10866 non-null  int64  
 1   imdb_id               10856 non-null  object 
 2   popularity            10866 non-null  float64
 3   budget                10866 non-null  int64  
 4   revenue               10866 non-null  int64  
 5   original_title        10866 non-null  object 
 6   cast                  10790 non-null  object 
 7   homepage              2936 non-null   object 
 8   director              10822 non-null  object 
 9   tagline               8042 non-null   object 
 10  keywords              9373 non-null   object 
 11  overview              10862 non-null  object 
 12  runtime               10866 non-null  int64  
 13  genres                10843 non-null  object 
 14  production_companies  9836 non-null   object 
 15  release_date          10866 non-null  object 
 16  vote_count            10866 non-null  int64  
 17  vote_average          10866 non-null  float64
 18  release_year          10866 non-null  int64  
 19  budget_adj            10866 non-null  float64
 20  revenue_adj           10866 non-null  float64
dtypes: float64(4), int64(6), object(11)

Exploratory data analysis

Usually, when performing an analysis, starting with a histogram is a good idea. It gives a quick overview of how the values in the dataset are distributed.

imdb-hist

From this histogram we can get two obvious facts: a lot of new movies have been made in the last twenty years, and the average vote is around 6. But let’s dig deeper.

How many movies have been made over the years?

It is nice to begin our analysis by looking more closely at the first of those facts. In the following graph, we can see that the number of movies released per year has skyrocketed in the past twenty years.

imdb-movies-count

More and more movies are being made. The sector is booming, but are all of them profitable? Let’s find out.
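A count like the one in the graph can be sketched with a groupby on the release_year column. The snippet below is only an illustration: it uses a handful of movies mentioned in this post as a stand-in for the full dataset, and the variable names are my own assumptions, not necessarily what the notebook uses.

```python
import pandas as pd

# A tiny stand-in for the full dataset: six movies mentioned in this post,
# with the release years shown in the tables.
df = pd.DataFrame(
    {
        "original_title": [
            "Star Wars",
            "Eraserhead",
            "Titanic",
            "Pirates of the Caribbean: At World's End",
            "Paranormal Activity",
            "Avatar",
        ],
        "release_year": [1977, 1977, 1997, 2007, 2007, 2009],
    }
)

# Count how many movies were released in each year.
# groupby sorts the years in ascending order by default.
movies_per_year = df.groupby("release_year").size()
print(movies_per_year)
```

On the real dataset, calling movies_per_year.plot() (which needs matplotlib installed) would be one way to draw a line chart of the result.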

What are the most expensive movies ever made?

At this point you may be wondering what the most expensive movie ever made is, and our dataset allows us to find out.

      original_title                               release_date  director        budget_adj   revenue_adj
2244  The Warrior’s Way                            12/2/10       Sngmoo Lee      4.25e+08     1.10876e+07
3375  Pirates of the Caribbean: On Stranger Tides  5/11/11       Rob Marshall    3.68371e+08  9.90418e+08
7387  Pirates of the Caribbean: At World’s End     5/19/07       Gore Verbinski  3.15501e+08  1.01065e+09
6570  Superman Returns                             6/28/06       Bryan Singer    2.92051e+08  4.2302e+08
5231  Titanic                                      11/18/97      James Cameron   2.71692e+08  2.50641e+09

The Warrior’s Way, a 2010 action/fantasy film by the Korean director Lee Seung-moo (listed as Sngmoo Lee in the dataset), seems to be the highest-budgeted movie in history. However, it is not. Someone added an extra 0 when entering the data, which made it look like the highest-budgeted movie ever, with more than 420,000,000 $. The correct amount is 42,000,000 $.

Knowing that, the second on the list must be the highest-budgeted movie in history: Pirates of the Caribbean: On Stranger Tides by American director Rob Marshall. It cost the incredible amount of more than 360,000,000 $. It also seems that the Pirates of the Caribbean saga is quite costly, because the next most expensive movie is Pirates of the Caribbean: At World’s End by American director Gore Verbinski.

These movies are super expensive, but are they really the ones that bring in the most revenue? We’ll find out.
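A ranking like this one, including the exclusion of the row with the wrong budget, could be sketched as follows. The values are copied from the table above; dropping row 2244 by hand is just one way to deal with the error and is my assumption for the example, not necessarily what the notebook does.

```python
import pandas as pd

# Values copied from the table above; the index is the row label in the dataset.
df = pd.DataFrame(
    {
        "original_title": [
            "The Warrior's Way",
            "Pirates of the Caribbean: On Stranger Tides",
            "Pirates of the Caribbean: At World's End",
            "Superman Returns",
            "Titanic",
        ],
        "budget_adj": [4.25e08, 3.68371e08, 3.15501e08, 2.92051e08, 2.71692e08],
    },
    index=[2244, 3375, 7387, 6570, 5231],
)

# Naive ranking: the three largest budgets, bad row included.
top_budget = df.nlargest(3, "budget_adj")

# Row 2244 has a known data-entry error (an extra 0 in the budget),
# so it is excluded before ranking again.
cleaned = df.drop(index=2244)
top_budget_clean = cleaned.nlargest(3, "budget_adj")
print(top_budget_clean)
```

Instead of dropping the row, the budget could also be corrected in place once the right figure is known.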

What are the highest-grossing films ever made?

It is nice to know the most expensive movies ever made, but what about the highest-grossing ones?

       original_title  release_date  director          budget_adj   revenue_adj
1386   Avatar          12/10/09      James Cameron     2.40887e+08  2.82712e+09
1329   Star Wars       3/20/77       George Lucas      3.95756e+07  2.78971e+09
5231   Titanic         11/18/97      James Cameron     2.71692e+08  2.50641e+09
10594  The Exorcist    12/26/73      William Friedkin  3.92893e+07  2.16732e+09
9806   Jaws            6/18/75       Steven Spielberg  2.83627e+07  1.90701e+09

Avatar, a movie directed by James Cameron and released in 2009, is the highest-grossing movie, with more than 2.8 billion dollars at the box office. But this is not the only James Cameron movie in the top 5! Titanic, released in 1997, was also directed by him! What a guy!

There are other well-known movies on the list like Star Wars by George Lucas or Jaws by Steven Spielberg.

Another interesting insight is that three out of the five highest-grossing films ever made were released in the 1970s. Maybe people had fewer leisure options back then.

Now we know the movies with the highest box office in history, but some may be interested in knowing the profit. Because, yes, a movie can earn a lot, but if it also cost a lot… that isn’t great news.

What are the most profitable movies ever made?

Profitability can be measured in a variety of ways. It can be absolute or relative. In absolute terms (revenue minus budget), big-budget productions will have an advantage because they are made to sell a lot. Let’s check it out.

       original_title  release_date  director          budget_adj   revenue_adj  profit_adj
1329   Star Wars       3/20/77       George Lucas      3.95756e+07  2.78971e+09  2.75014e+09
1386   Avatar          12/10/09      James Cameron     2.40887e+08  2.82712e+09  2.58624e+09
5231   Titanic         11/18/97      James Cameron     2.71692e+08  2.50641e+09  2.23471e+09
10594  The Exorcist    12/26/73      William Friedkin  3.92893e+07  2.16732e+09  2.12804e+09
9806   Jaws            6/18/75       Steven Spielberg  2.83627e+07  1.90701e+09  1.87864e+09
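The profit_adj column in this table is simply revenue_adj minus budget_adj. A minimal sketch of that calculation, using the five rows above (the exact code in the notebook may differ):

```python
import pandas as pd

# Values copied from the table above.
df = pd.DataFrame(
    {
        "original_title": ["Avatar", "Star Wars", "Titanic", "The Exorcist", "Jaws"],
        "budget_adj": [2.40887e08, 3.95756e07, 2.71692e08, 3.92893e07, 2.83627e07],
        "revenue_adj": [2.82712e09, 2.78971e09, 2.50641e09, 2.16732e09, 1.90701e09],
    },
    index=[1386, 1329, 5231, 10594, 9806],
)

# Absolute profit: what the movie earned minus what it cost.
df["profit_adj"] = df["revenue_adj"] - df["budget_adj"]

# Sort from most to least profitable.
top_profit = df.sort_values("profit_adj", ascending=False)
print(top_profit[["original_title", "profit_adj"]])
```

Note how Avatar has the highest revenue but falls to second place once its much larger budget is subtracted.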

Based on the previous list, Star Wars is the most profitable movie of all time. We can also see that the list includes big-budget productions like Avatar and Titanic. But how about smaller productions? Can we measure their impact on the industry? Of course we can, by calculating the profit in relative terms: how much they earned in proportion to what they cost.

      original_title           release_date  director         budget_adj  revenue_adj  profit_adj   profit_rel_adj
7447  Paranormal Activity      9/14/07       Oren Peli        15775       2.03346e+08  2.0333e+08   12889.4
2449  The Blair Witch Project  7/14/99       Daniel Myrick    32726.3     3.24645e+08  3.24612e+08  9919
1354  Eraserhead               3/19/77       David Lynch      35977.8     2.51845e+07  2.51485e+07  699
7277  Pink Flamingos           3/12/72       John Waters      62574.7     3.12874e+07  3.12248e+07  499
7178  Super Size Me            1/17/04       Morgan Spurlock  75039       3.29884e+07  3.29133e+07  438.617

In this list, Paranormal Activity, a horror movie directed by Oren Peli and released in 2007, takes first place. However, its profit is far lower than that of Star Wars or Avatar. How is this possible? Well, Paranormal Activity’s budget was only about 15,000 dollars and it brought in more than 200,000,000 dollars in revenue. This is huge! It earned more than 12,500 times its budget. Incredible.
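The numbers in the table are consistent with the relative profit being profit_adj divided by budget_adj. Here is a small sketch of that calculation on four movies from this post. Filtering out rows with a budget of zero is my own precaution to avoid dividing by zero; I am assuming such rows could exist in the raw data.

```python
import pandas as pd

# Two low-budget movies and two big productions, values taken from the tables.
df = pd.DataFrame(
    {
        "original_title": [
            "Star Wars",
            "Avatar",
            "Paranormal Activity",
            "The Blair Witch Project",
        ],
        "budget_adj": [3.95756e07, 2.40887e08, 15775.0, 32726.3],
        "revenue_adj": [2.78971e09, 2.82712e09, 2.03346e08, 3.24645e08],
    }
)

# Precaution (assumption): keep only rows with a positive budget,
# otherwise the division below would produce infinite values.
df = df[df["budget_adj"] > 0].copy()

# Absolute profit, then profit relative to what the movie cost.
df["profit_adj"] = df["revenue_adj"] - df["budget_adj"]
df["profit_rel_adj"] = df["profit_adj"] / df["budget_adj"]

ranking = df.sort_values("profit_rel_adj", ascending=False)
print(ranking[["original_title", "profit_adj", "profit_rel_adj"]])
```

In relative terms the order flips: the two low-budget horror movies come first, and Star Wars and Avatar drop to the bottom even though their absolute profit is more than ten times larger.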

Looking at the list, we may find some other interesting insights. The Blair Witch Project and Eraserhead are also horror movies. It seems like this genre can be very profitable!

What are the biggest disappointments in history?

Not all movies are successful. Some of them have a very high budget but perform badly at the box office. We call this a disappointment. Let’s take a look at the movies with the worst profit in history.

      original_title     release_date  director                     budget_adj   revenue_adj  profit_adj
2244  The Warrior’s Way  12/2/10       Sngmoo Lee                   4.25e+08     1.10876e+07  -4.13912e+08
5508  The Lone Ranger    7/3/13        Gore Verbinski               2.38689e+08  8.35783e+07  -1.5511e+08
7031  The Alamo          4/7/04        John Lee Hancock             1.67395e+08  2.98077e+07  -1.37587e+08
2435  The 13th Warrior   8/27/99       John McTiernan               2.09448e+08  8.07671e+07  -1.28681e+08
4970  Brother Bear       10/20/03      Aaron Blaise, Robert Walker  1.18535e+08  296.338      -1.18535e+08

Again, The Warrior’s Way appears to be the worst. However, we know that its budget is not correct, so let’s move on to the second. The Lone Ranger, released by Disney in 2013, is the worst movie ever in economic terms. It made the studio lose more than 150,000,000 dollars. That is a huge loss!

Taking a deeper look at the table, we find an interesting fact. All of the movies were very expensive, costing more than 100,000,000 dollars. It looks like big-budget productions can turn into a studio’s biggest enemy.
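Both observations can be checked with a few lines of pandas. This is a sketch on the five rows of the table above, with variable names of my own choosing:

```python
import pandas as pd

# Values copied from the table above.
worst = pd.DataFrame(
    {
        "original_title": [
            "The Warrior's Way",
            "The Lone Ranger",
            "The Alamo",
            "The 13th Warrior",
            "Brother Bear",
        ],
        "budget_adj": [4.25e08, 2.38689e08, 1.67395e08, 2.09448e08, 1.18535e08],
        "revenue_adj": [1.10876e07, 8.35783e07, 2.98077e07, 8.07671e07, 296.338],
    },
    index=[2244, 5508, 7031, 2435, 4970],
)
worst["profit_adj"] = worst["revenue_adj"] - worst["budget_adj"]

# Skip row 2244, whose budget is known to be wrong, and take the biggest loss.
biggest_loss = worst.drop(index=2244).nsmallest(1, "profit_adj")

# Check that every movie in the table cost more than 100 million dollars.
all_over_100m = bool((worst["budget_adj"] > 100_000_000).all())

print(biggest_loss[["original_title", "profit_adj"]])
print(all_over_100m)
```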

A common assumption is that more popular movies gross higher revenue. However, does this relate to profit? The following graph shows the relation between profit and popularity.

imdb-popularity-profit

Looking at the graph above, we may conclude that more popular movies tend to be more profitable, and it makes sense. More popular movies have a higher chance of grossing a higher revenue, and the more revenue a movie brings in, the more potential it has to become profitable.

What is the average vote rating?

Another important insight we can get from the data is the average rating given by users. This lets us know whether a movie is above or below the average. If a movie’s rating is higher than the median, the chances of it being good are higher than if its rating is lower.

imdb-movies-rating

Based on the previous histogram, we can conclude that most of the movies are rated near 6. Let’s look at the details.

       vote_average
count  10725
mean   5.96432
std    0.930166
min    1.5
25%    5.4
50%    6
75%    6.6
max    9.2

The median is exactly 6. This means that about half of the movies are rated 6 or higher and about half are rated 6 or lower. The highest rating in the dataset is 9.2 while the lowest is 1.5. We may also say that a movie with a rating above 6 has a higher chance of being good.
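A summary like the one above comes from describe(). The sketch below runs it on five placeholder ratings that I picked to match the quartiles reported in the table; they are not the real data, only a way to show the calls.

```python
import pandas as pd

# Placeholder ratings (NOT the real dataset), chosen so that the minimum,
# quartiles and maximum match the ones reported in the table above.
ratings = pd.Series([1.5, 5.4, 6.0, 6.6, 9.2], name="vote_average")

# count, mean, std, min, 25%, 50% (the median), 75% and max in one call.
summary = ratings.describe()
median = ratings.median()

# Share of movies strictly above and strictly below the median.
# Movies rated exactly at the median fall into neither group.
share_above = (ratings > median).mean()
share_below = (ratings < median).mean()

print(summary)
print(share_above, share_below)
```

Because ratings are rounded to one decimal, many movies in the real dataset can sit exactly on the median, which is why the two shares are only roughly 50% each.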

Are newer movies worse than old ones?

This is another question that may arise. OK, we know the average rating of movies, but are the new ones rated worse? Let’s find out. The following graph splits the movies into two groups: those released in 2006 or later, and those released before 2006.

imdb-movies-rating-old-new.png

The graph shows that old movies are slightly better rated than new movies. However, the difference is not large enough to claim that old movies are better than new movies.
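The split into two groups could look something like this. The release years belong to movies from this post, but the ratings are placeholders I made up for the example, so the resulting averages say nothing about the real dataset.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "release_year": [1973, 1977, 1997, 2006, 2009, 2013],
        # Placeholder ratings, invented for illustration only.
        "vote_average": [6.0, 6.4, 6.2, 5.8, 6.1, 6.0],
    }
)

# Label each movie: released in 2006 or later (2006 included), or before 2006.
is_new = df["release_year"] >= 2006
df["period"] = is_new.map({True: "2006 or later", False: "before 2006"})

# Compare the two groups side by side.
by_period = df.groupby("period")["vote_average"].agg(["count", "mean", "median"])
print(by_period)
```

Looking at the count next to the mean helps to judge whether a small gap between the groups is meaningful or not.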

Conclusion

I could keep sharing insights from this dataset. However, the post is getting too long, so I will stop here. If you are interested in knowing more about movies and the procedures used to clean the data and draw conclusions, please take a look at my Jupyter notebook.

After analyzing the dataset, we may conclude the following:

  • Lots of movies are being made nowadays. However, this doesn’t imply that all of them are profitable. Big failures happen alongside big successes. Avatar, released in 2009, is the highest-grossing movie of all time and the second most profitable. However, The Lone Ranger, released in 2013, is the least profitable movie of all time. Studios should be careful when releasing movies nowadays, as people have more leisure options than 40 years ago.
  • Horror movies seem to perform well in terms of yield. Paranormal Activity multiplied its budget by more than 12,000 in revenue. This is a huge success, as are The Blair Witch Project and Eraserhead.
  • Older movies are slightly better rated than newer ones, although the difference is small. In the old days, as fewer movies were made, it may have been easier to make a good movie. Nowadays, the probability of making a bad movie may be higher, as the number of movies has increased a lot.

Thank you for reading until the end. Hope you enjoyed it!