# Data types used in statistics: definition and examples

## Overview of data types

The graphic below describes the different data types used in statistics and Six Sigma to categorize data. Let's look at how they are defined and illustrate each type with some examples.

The first distinction is between attribute data and variable data. Attribute data is also called qualitative or categorical data, while variable data is also called quantitative or numerical data.

## Attribute / variable data definition and examples

Let's start with attribute data. When you can only put your data into categories, it is referred to as attribute data. It consists of labels or names, where each label or name indicates a separate category of the data.

The list below contains examples of attribute data:

- Yes, No
- Go, No Go
- Machine 1, Machine 2, Machine 3
- Green, Red, Orange, Blue
- Pass/Fail

Variable data is data that you can quantify. With attribute data you can only count the number of items within a category, whereas with variable data you can numerically describe the characteristic itself.

The list below contains examples of variable data:

- number of people on a bus
- temperature

## Attribute data subtypes and examples

Within attribute data, there are three subcategories: binary data, nominal data, and ordinal data.

### Binary data definition and examples

Binary data is data that has only two possible outcomes, so it can only be classified into two categories. Examples are:

- On/Off
- Yes/No
- Pass/Fail
- Good/Bad
- Agree/Disagree

### Nominal data definition and examples

Nominal data is data that is descriptive rather than numeric, with more than two categories. It consists of names, labels or categories. The data cannot be arranged in any meaningful order, and no arithmetic operations can be performed on the nominal data itself. Examples are:

- city of birth: Amsterdam, Tokyo, New York, …
- blood type: A, B, O, AB
- car brands: Ford, Mercedes, Tesla, Jeep, …
- last holiday country: US, UK, China, Spain, …
- colors: green, blue, yellow, …
- phone numbers

### Ordinal data definition and examples

Ordinal data is data that is descriptive rather than numeric, with more than two categories. It consists of names, labels or categories. Ordinal data can be arranged in a meaningful order, but differences between data values either cannot be determined or are meaningless. No arithmetic operations can be performed on the ordinal data itself. Examples are:

- service rating: poor, neutral, good, best
- sport results: first place, second place, third place
- clothing sizes: XS, S, M, L, XL
- automobile sizes: subcompact, compact, intermediate, full size, luxury
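As a minimal sketch of what "can be ordered, but not subtracted" means in practice, the service-rating example can be stored as an ordered scale. The names `SERVICE_SCALE` and `is_better` are hypothetical and only serve the illustration.

```python
# Ordered scale from the service-rating example; the list position is the rank.
SERVICE_SCALE = ["poor", "neutral", "good", "best"]


def is_better(a: str, b: str) -> bool:
    """Return True if rating a ranks above rating b on the ordinal scale."""
    return SERVICE_SCALE.index(a) > SERVICE_SCALE.index(b)


ratings = ["good", "poor", "best", "neutral"]

# Sorting by rank is meaningful for ordinal data.
ranked = sorted(ratings, key=SERVICE_SCALE.index)

print(is_better("good", "neutral"))  # True
print(ranked)  # ['poor', 'neutral', 'good', 'best']
```

The ranks are only used for comparison. Subtracting them would suggest that the step from "poor" to "neutral" equals the step from "good" to "best", which ordinal data does not guarantee.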

## Variable data subtypes and examples

Variable data can be further divided into interval or ratio data. Alternatively, it can be divided into discrete or continuous data. Discrete data is all about counting, while continuous data is all about measurement. Let's first look at the definitions and then clarify them with examples of all the possible combinations.

### Discrete data definition

Discrete data is data that can only take point values (1, 2, 3, ...) and no values in between. It is all about counting.

### Continuous data definition

Continuous data is data that can take values in between point values: between any two values you can always find another value that makes sense. Continuous data can therefore take on any value on a continuous scale, and it is usually associated with some sort of physical measurement.

### Interval data definition

Interval data is data that can be arranged in some order and for which differences between data values are meaningful, so addition and subtraction can be performed. Multiplication and division are not possible or do not make sense. Most importantly, the zero point is arbitrary, and negative values can exist.

### Ratio data definition

Ratio data is data that can be ranked and for which all arithmetic operations including multiplication and division can be performed and make sense. Moreover, the data has an absolute zero and a value of zero indicates a complete absence of the characteristic of interest.

### Discrete interval data examples

Year of birth is a good example of discrete interval data. You can order the years in a meaningful way: somebody born in 1965 is older than somebody born in 2005. Differences can be interpreted: the older person was born 40 years before the younger person. Multiplication and division do not make sense. You cannot say anything meaningful about somebody who will be born in 4010 (2 x 2005) compared to a person born in 2005; he or she will not be two times younger or older.

The zero point of the calendar is arbitrarily chosen. It does not mean a complete absence of the characteristic of interest: people were also born before it. Furthermore, you have "negative values": the years before Christ (B.C.).

The above clearly illustrates that the data is interval data.

Birth years are only expressed in point values. Nobody will have 2005.43 as the birth year on their birth certificate. Therefore it is discrete interval data.

### Discrete ratio data examples

Number of people on a bus is a good example of discrete ratio data. You can order the number of people on a bus in a meaningful way. A bus with 40 people has twice as many people on board as a bus with 20 people. When you have 0 people on the bus, you have an absolute absence of the characteristic "people on a bus".

### Continuous interval data examples

Temperature expressed in Celsius or Fahrenheit is a good example of continuous interval data. You can do subtractions and additions. It makes sense to say that tomorrow will be 5°C hotter than today. But if it is 5°C today and 10°C tomorrow, it does not make sense to say that tomorrow will be two times hotter. Since you can measure negative temperatures, it is clear that 0 does not mean an absolute lack of the characteristic. The data is continuous if you measure it precisely enough. It is meaningful to say that the temperature is 20.3°C.
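The point about ratios can be checked with a short sketch: the same two temperatures give a ratio of 2 in Celsius but a different ratio in Fahrenheit, because each scale places its zero arbitrarily. The helper name `celsius_to_fahrenheit` is an assumption made for this illustration.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a Celsius reading to Fahrenheit."""
    return celsius * 9 / 5 + 32


today_c, tomorrow_c = 5.0, 10.0
today_f = celsius_to_fahrenheit(today_c)        # 41.0
tomorrow_f = celsius_to_fahrenheit(tomorrow_c)  # 50.0

# Differences stay meaningful: a 5 degree step in Celsius is a 9 degree step in Fahrenheit.
diff_c = tomorrow_c - today_c
diff_f = tomorrow_f - today_f

# Ratios depend on where the arbitrary zero sits, so they carry no meaning.
ratio_c = tomorrow_c / today_c
ratio_f = tomorrow_f / today_f

print(diff_c, diff_f)              # 5.0 9.0
print(ratio_c, round(ratio_f, 2))  # 2.0 1.22
```

If "two times hotter" were a real property of the weather, it would not change when the unit changes.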

### Continuous ratio data examples

A good example of continuous ratio data is temperature expressed in kelvin. 0 kelvin means that there is an absolute lack of heat, and nearly all molecular motion stops. The zero point is thus meaningful: it means an absolute lack of the observed characteristic "heat". The data is also continuous if you measure it precisely enough. It is meaningful to say that the temperature is 303.48 kelvin.

Another example is age. On the one hand, the zero point makes sense (birth) and is an absolute lack of the characteristic (years alive). On the other hand, it makes sense to say that you are 4.5 years old. Multiplication and division also make sense: somebody who is 40 years old is twice as old as somebody who is 20 years old.

## Data type as a function of the analysis

Although age is a good example of continuous ratio data, your first thought may have been that age is discrete ratio data: if you ask somebody how old they are, it is very unlikely that the person will answer 24.53 years.

You will measure your characteristics differently depending on the kind of data analysis you plan to do. How you measure the characteristic determines the type of data you use in your analysis.

Imagine you want to know at which age people drank their first beer. Not many will know the exact month or day, so you will limit the responses to point values (..., 16, 17, 18, 19, 20, 21, 22, ...). In this case "age" is discrete ratio data.

Imagine you want to know the favorite TV show of different age groups (baby, toddler, teenager, ...). In this case you will probably create age groups (0-3, 3-6, 6-12, ...) in your survey, and "age" is ordinal data.

The height of a person is also continuous ratio data. But in theme parks you are not allowed on an attraction if you are under a certain height. In this case you could treat height as binary data.
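The three situations above can be sketched as three ways of recording the same underlying measurement. Everything named here is hypothetical: only the first three age groups come from the text, the remaining groups and the 120 cm height limit are invented for the illustration.

```python
import bisect
import math

# Upper bounds of the age groups. Only "0-3", "3-6" and "6-12" come from
# the text; the last two groups are invented for illustration.
AGE_BOUNDS = [3, 6, 12, 18]
AGE_LABELS = ["0-3", "3-6", "6-12", "12-18", "18+"]

MIN_HEIGHT_CM = 120.0  # hypothetical height limit for an attraction


def age_as_discrete(age_years: float) -> int:
    """Continuous age -> completed whole years (discrete ratio data)."""
    return math.floor(age_years)


def age_as_ordinal(age_years: float) -> str:
    """Continuous age -> ordered age group (ordinal data)."""
    return AGE_LABELS[bisect.bisect_right(AGE_BOUNDS, age_years)]


def height_as_binary(height_cm: float) -> bool:
    """Continuous height -> allowed or not allowed (binary data)."""
    return height_cm >= MIN_HEIGHT_CM


print(age_as_discrete(24.53))   # 24
print(age_as_ordinal(24.53))    # 18+
print(height_as_binary(118.0))  # False
```

Each conversion throws information away, so it only works in one direction: you can derive the age group from the exact age, but not the exact age from the age group.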

## Converting data into continuous data

The table below shows which statistical Six Sigma operations you can perform on the different types of data:

| Data type | Binary | Nominal | Ordinal | Interval | Ratio |
|---|---|---|---|---|---|
| Mode or frequency distribution | X | X | X | X | X |
| Median and percentiles | | | X | X | X |
| Mean and standard deviation | | | | X | X |
| Coefficient of variation | | | | | X |
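As a rough sketch of how the table plays out, the snippet below computes one summary per row with Python's standard `statistics` module. The sample values are made up for illustration, and the coefficient of variation is taken here as the sample standard deviation divided by the mean.

```python
import statistics

# Nominal: only counting is meaningful, so the mode is the available summary.
machines = ["Machine 1", "Machine 2", "Machine 1", "Machine 3"]
most_common_machine = statistics.mode(machines)

# Ordinal: the median is found through the rank on the ordered scale.
SIZE_SCALE = ["XS", "S", "M", "L", "XL"]
sizes = ["M", "S", "XL", "M", "L"]
size_ranks = [SIZE_SCALE.index(size) for size in sizes]
median_size = SIZE_SCALE[statistics.median_low(size_ranks)]

# Interval: mean and standard deviation make sense (temperatures in Celsius).
temperatures_c = [18.0, 20.0, 22.0]
mean_temperature = statistics.mean(temperatures_c)
stdev_temperature = statistics.stdev(temperatures_c)

# Ratio: a true zero makes the coefficient of variation meaningful.
people_on_bus = [20, 30, 40]
coefficient_of_variation = statistics.stdev(people_on_bus) / statistics.mean(people_on_bus)

print(most_common_machine)                 # Machine 1
print(median_size)                         # M
print(mean_temperature)                    # 20.0
print(round(stdev_temperature, 2))         # 2.0
print(round(coefficient_of_variation, 3))  # 0.333
```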

Ratio data gives you the most possibilities. Therefore, it is the most useful type of data to work with if you want to solve a problem using Six Sigma techniques. Within ratio data there is a preference for continuous data.

This means that you should try to convert your data into continuous data if you have the opportunity. Let's look at some examples:

- 40 defects could be converted into 1.6 defects per 1000 hours of production
- 3 hole diameters out of tolerance could be converted into diameters of 5.2, 5.3 and 5.1 mm
- a go / no-go paint control could be converted into a scratch of 12.4 cm
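The first conversion is a simple rate calculation. In the sketch below, the 25,000 production hours are an assumption: the figure is not given in the text and was chosen only because it reproduces the 1.6 defects per 1000 hours quoted in the list.

```python
def defects_per_1000_hours(defect_count: int, production_hours: float) -> float:
    """Turn a raw defect count into a rate per 1000 hours of production."""
    return defect_count * 1000 / production_hours


# Assumed production time; not stated in the text.
ASSUMED_PRODUCTION_HOURS = 25_000

rate = defects_per_1000_hours(40, ASSUMED_PRODUCTION_HOURS)
print(rate)  # 1.6
```

Unlike the raw count, the rate can be compared across production runs of different lengths.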

## Summary

The table below gives a summary:

| Data type | Binary | Nominal | Ordinal | Interval | Ratio |
|---|---|---|---|---|---|
| Categories | max 2 | X | X | X | X |
| Order is known | | | X | X | X |
| Equal intervals: addition and subtraction make sense | | | | X | X |
| True zero: multiplication and division make sense | | | | | X |