**Statistics** is a branch of mathematics concerned with the collection, organisation, and analysis of numerical information called data.

What is **data**? Data is a collection of information obtained from people or by conducting experiments. It is a record of information, and you get it by counting or measuring.

There are mainly two types of data: categorical and quantitative.

**Categorical data** describes the category or group that a person or thing belongs to. It is recorded as a label rather than as a counted or measured value. Examples are nationality, eye colour, type of sport, method of transport, and gender.

There are two types of categorical data: ordinal and nominal. **Ordinal data** has categories with a natural order, such as first, second, third, and so on. **Nominal data**, on the other hand, describes a type or characteristic with no natural order, such as gender, hair colour, or occupation.

**Quantitative data**, on the other hand, can be counted or measured and has numerical values. There are two types of quantitative data: discrete and continuous. **Discrete data** takes exact values that are separate and distinct, with no in-between values. Examples of discrete data are shoe sizes, exam marks, and the number of people in a room.

**Continuous data** can take any value between certain limits, so it has a range of values. Examples are height, weight, temperature, and the speed of a car.

The chart below summarises this brief introduction to statistics:

There are different ways of collecting statistical information (or data). One of the most common is a survey. A survey should contain a set of questions that are clear and relevant to the people being asked. The questionnaire (the list of questions) should also be completely objective and unbiased.

When the survey involves asking the same questions of *all* members of a group or population, it is called a **census**. Censuses are typically complex and detailed projects with a huge scope for gathering information, so they tend to be time-consuming and costly. That is why censuses are rarely done, and governments generally conduct one only every 5 or 10 years to collect and update information about their country's population.

When a survey is conducted with a cross-section of a population, it is called a **sample**. A sample needs to be **random** and **representative**. Random means the people in the sample are chosen purely by chance, so any one person is just as likely to be selected as any other. In other words, all people have an equal chance of being selected. Representative means the number of people selected is large enough to represent the entire population.

If a sample does not satisfy the above conditions, it is considered biased, and the results obtained by analysing its data will not be truly representative of the entire population, and hence will be unreliable. A sample is typically much smaller than the total population it represents.

There are three main sampling techniques:

- A **simple random sample** involves choosing members of the population at random, so that each member has the same chance of being selected as everybody else. For example, drawing a name out of a hat, or choosing a person from a crowd.
- A **systematic random sample** (or systematic sample) is obtained by choosing members of the population at equally spaced intervals. For example, choosing every tenth person entering an auditorium.
- A **stratified random sample** (or stratified sample) is obtained by dividing the population into groups, called strata, and selecting a random sample from each stratum. For example, choosing 10% of students from every grade to make a working bee committee.
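
The three techniques can be sketched in a few lines of Python. This is only an illustration under assumptions of our own: the function names, the made-up population of 100 people, and the three grades of 30 students each are all hypothetical, and the systematic sample starts from a random position within the first interval.

```python
import random


def simple_random_sample(population, k, rng):
    """Pick k members so that every member is equally likely to be chosen."""
    return rng.sample(population, k)


def systematic_sample(population, step, rng):
    """Pick every step-th member, starting at a random position in the first interval."""
    start = rng.randrange(step)
    return population[start::step]


def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample of the same fraction from each stratum."""
    chosen = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        chosen.extend(rng.sample(members, k))
    return chosen


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 people entering an auditorium.
people = [f"person_{i}" for i in range(1, 101)]

# Hypothetical strata: three grades with 30 students each.
grades = {
    grade: [f"grade{grade}_student{i}" for i in range(1, 31)]
    for grade in (7, 8, 9)
}

print(simple_random_sample(people, 5, rng))   # 5 people, like names from a hat
print(systematic_sample(people, 10, rng))     # every tenth person
print(stratified_sample(grades, 0.10, rng))   # 10% of each grade
```

With these inputs the systematic sample always contains 10 people and the stratified sample contains 3 students from each grade; which individuals are picked depends on the random seed.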

We now look at various types of graphs to represent statistical data, or data in general.