Exploratory data analysis with each data type (part 2)

Mustapha Mekhatria
3 min read · Dec 4, 2019

This article continues Exploratory data analysis with each data type (part 1).

Ratio data

Data

The tables below show the fuel consumption of 2019 Toyota and Dodge cars in the city, on the highway, and combined. They cover the following car classes for both brands: Mini-compact, Mid-size, Compact, and Full-size:

The fuel consumption in the tables is ratio data: it is measured, it has a true zero, and proportions between values are meaningful. For example, a car that consumes 10 L/100km uses twice as much fuel as a car that consumes 5 L/100km.
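To see why proportions are meaningful for ratio data, here is a small Python sketch. It is my own illustration: the helper names and the temperature comparison are assumptions, not part of the original data. It converts the two consumption values from the example above to US gallons per 100 miles and checks that the 2:1 ratio survives the unit change, whereas the same ratio does not survive for an interval quantity such as temperature in Celsius.

```python
# Illustration only: helper names are hypothetical.
LITRES_PER_US_GALLON = 3.785411784
KM_PER_MILE = 1.609344


def l_per_100km_to_gal_per_100mi(value):
    """Convert L/100km to US gallons per 100 miles (a pure rescaling)."""
    return value / LITRES_PER_US_GALLON * KM_PER_MILE


def celsius_to_fahrenheit(value):
    """Convert Celsius to Fahrenheit (a rescaling plus a shifted zero)."""
    return value * 9 / 5 + 32


# Ratio data: the ratio is the same in both units.
fuel_ratio_metric = 10.0 / 5.0
fuel_ratio_us = l_per_100km_to_gal_per_100mi(10.0) / l_per_100km_to_gal_per_100mi(5.0)

# Interval data: the ratio changes with the unit, so it carries no meaning.
temp_ratio_celsius = 10.0 / 5.0
temp_ratio_fahrenheit = celsius_to_fahrenheit(10.0) / celsius_to_fahrenheit(5.0)

print(fuel_ratio_metric, round(fuel_ratio_us, 6))
print(temp_ratio_celsius, round(temp_ratio_fahrenheit, 2))
```

The fuel ratio stays at 2 because the conversion only multiplies by a constant; the temperature ratio changes because the Fahrenheit scale moves the zero point.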

Statistics

Let’s explore the data using the statistical tools for ratio data:
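For readers who want to compute this kind of summary themselves, here is a minimal Python sketch using only the standard library. The function name `summarize` and the sample values are hypothetical (they are not the figures from the tables), and I assume the sample standard deviation is wanted. The coefficient of variation is included because it only makes sense for ratio data, where the mean is measured from a true zero.

```python
from statistics import mean, median, quantiles, stdev


def summarize(values):
    """Return common summary statistics for a ratio-scale sample."""
    data = sorted(values)
    # quantiles() uses the "exclusive" method by default; other tools
    # may use a slightly different quartile convention.
    q1, _, q3 = quantiles(data, n=4)
    avg = mean(data)
    spread = stdev(data)  # sample standard deviation
    return {
        "min": data[0],
        "max": data[-1],
        "range": data[-1] - data[0],
        "mean": avg,
        "median": median(data),
        "stdev": spread,
        "iqr": q3 - q1,
        "cv": spread / avg,  # coefficient of variation, ratio data only
    }


# Made-up consumption values in L/100km, for illustration only.
sample = [6.0, 7.0, 8.0, 9.0, 10.0]
for name, value in summarize(sample).items():
    print(f"{name}: {value:.2f}")
```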

Let’s use visualization to get better insight from the statistical data.

Visualization

As with interval data, histograms and box plots are often used to explore ratio data sets.

From the box plot charts below (link to box plot demo), it is clear that Toyota cars consume less fuel than Dodge cars. Even the most fuel-consuming Toyota cars consume less than the most fuel-efficient Dodge cars, whether in the city, on the highway, or combined. Toyota cars are also more consistent with each other: all Toyota IQRs are compact and similar across the three conditions, whereas the Dodge IQRs are wide and differ from one another.
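The values a box plot draws can be computed by hand. The sketch below is one common way to do it (quartiles plus 1.5 × IQR fences for the whiskers). It is an assumption on my part, not necessarily the exact convention behind the charts in this article. The function names and the sample values are hypothetical.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Compute quartiles, whiskers, and outliers using 1.5 * IQR fences."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "low_whisker": inside[0],
        "q1": q1,
        "median": med,
        "q3": q3,
        "high_whisker": inside[-1],
        "iqr": iqr,
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


def ranges_separated(lower_group, higher_group):
    """True when every value in lower_group is below every value in higher_group."""
    return max(lower_group) < min(higher_group)


# Made-up values in L/100km, for illustration only.
group_a = [6.0, 7.0, 8.0, 9.0, 10.0, 30.0]
print(box_plot_summary(group_a))
print(ranges_separated([6.0, 7.0, 8.0], [11.0, 14.0, 19.0]))
```

A check like `ranges_separated` is the numeric version of the observation that one brand's highest value sits below the other brand's lowest value.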

The histogram charts below compare the fuel consumption of Toyota and Dodge cars in the city, on the highway, and combined.

I use the same approach as with interval data to explore a histogram, checking four points: shape, outliers, center, and spread.
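Two of these checks can be roughed out in a few lines of Python. The sketch below bins values into a histogram with a fixed bin width and uses the mean-versus-median rule of thumb to guess the skew direction. Both function names, the 5% tolerance, and the sample values are my own assumptions, and the skew heuristic is only a quick hint, not a formal test.

```python
from collections import Counter
from statistics import mean, median


def histogram(values, bin_width):
    """Return (bin_start, count) pairs, including empty bins in between."""
    counts = Counter(int(v // bin_width) for v in values)
    first, last = min(counts), max(counts)
    return [(i * bin_width, counts.get(i, 0)) for i in range(first, last + 1)]


def skew_direction(values, tolerance=0.05):
    """Rule of thumb: compare the mean with the median to guess the skew."""
    avg, med = mean(values), median(values)
    if abs(avg - med) <= tolerance * med:
        return "roughly centered"
    return "right-skewed" if avg > med else "left-skewed"


# Made-up consumption values in L/100km, for illustration only.
sample = [6.1, 7.2, 7.8, 8.0, 8.4, 9.5]
for start, count in histogram(sample, bin_width=1.0):
    print(f"{start:4.1f} | {'#' * count}")
print(skew_direction(sample))
```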

The histograms, like the previous box plots, show how consistent the fuel consumption of Toyota cars is compared with the inconsistent fuel consumption of Dodge cars:

  • All Toyota histograms are relatively centered, whereas among the Dodge histograms only the highway one is centered; the others are skewed.
  • All Toyota histograms have more cars in the middle bins than the Dodge histograms.
  • All Dodge histograms have wider ranges and higher fuel consumption than the Toyota histograms.

From the tables and the charts above, Toyota cars consume much less fuel than Dodge cars. The average consumption of Toyota cars in the city (7.94 L/100km) is about half that of Dodge cars (15.32 L/100km). The standard deviations for Toyota cars are lower than those for Dodge cars, and the city IQR for Toyota cars (1.2) is a quarter of the city IQR for Dodge cars (4.8).
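Because fuel consumption is ratio data, these "half" and "four times" comparisons are legitimate, and they are easy to verify from the quoted figures. Here is a quick Python check (the variable names are mine; the four numbers come from the paragraph above):

```python
# City figures quoted in the article, in L/100km.
toyota_city_mean = 7.94
dodge_city_mean = 15.32
toyota_city_iqr = 1.2
dodge_city_iqr = 4.8

# Toyota's average as a share of Dodge's average.
mean_ratio = toyota_city_mean / dodge_city_mean

# How many times wider the Dodge IQR is than the Toyota IQR.
iqr_ratio = dodge_city_iqr / toyota_city_iqr

print(f"Toyota / Dodge city mean: {mean_ratio:.2f}")
print(f"Dodge / Toyota city IQR: {iqr_ratio:.1f}")
```

The first ratio comes out at roughly 0.52, so "half" is a slight rounding; the second is 4.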

As I mentioned in my previous article, this is just an introduction to data exploration. I will keep publishing and sharing other articles to help readers understand data better and improve their data literacy.
