Reading Graphs

It's very helpful to display data visually for easier analysis. That's where graphs come in! This section will discuss different types of graphs and how to interpret them.


Statistics uses a few types of variables, and the type of variable you have affects the best way to visualize it.

Categorical Variables are different groups (or categories) such as "wet" and "dry" that are qualitative rather than numerical.

Discrete Variables are quantitative but must be whole numbers. Essentially, it's something you can "count", such as the number of times I walked my dog each day. There are no decimals in each individual data point (although the average can have a decimal). I can't have taken my dog for a fraction of a walk; he either went for a walk or he didn't.

Continuous Variables are also quantitative, but they can have decimals and have essentially an infinite number of possible values between any two points. Examples include time, temperature, and the concentration of a substance. It could be 23°C, 24°C, or 23.2415°C. Likewise, you could have 5 M of glucose just as easily as you could have 3.12 M of glucose.

Graphing Essentials

There are a few key things that you should keep in mind whenever you are constructing a graph. Usually, the dependent variable will be plotted on the y-axis and the independent variable will be plotted on the x-axis.


The major points can be remembered with the acronym "SULTAN"

Scale - First, be sure to include a scale and have numbers on your axes where appropriate. Make sure that your scale on your graph is consistent (e.g. each box/tick should go up by the same amount). Choose an appropriate scale such that you fill almost the entire graph area.

Units - Include units for any measured values where appropriate.

Labels - Label each axis with the variable it represents. If needed, include a legend, or key, to describe what different colors/designs represent in situations with more than one set of data.

Title - Include a title summarizing what the graph shows. Oftentimes in published research, the titles are replaced by information in the figure legend.

Accuracy - This one should go without saying, but be sure to accurately plot the data.

Neatness - Make the graph neat so it's possible to easily read and understand.

Bar Graphs

Bar graphs are best used when comparing data from different groups. In fancier words, one of the two variables will be categorical if you are using a bar graph. Each group will have its own bar, which can display individual data points or calculated averages.
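As a minimal sketch of what goes into a bar graph of averages, the snippet below computes one bar height per category. The plant-height numbers and the variable names are made up for illustration.

```python
from statistics import mean

# Hypothetical measurements: plant height (cm) under two watering conditions.
heights_cm = {
    "wet": [12.1, 13.4, 11.8, 12.9],
    "dry": [8.2, 7.9, 9.1, 8.4],
}

# One bar per category; each bar's height is that group's average.
bar_heights = {group: mean(values) for group, values in heights_cm.items()}

for group, height in bar_heights.items():
    print(f"{group}: {height:.2f} cm")
```

The category names would go along one axis and the averages (with units) along the other.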

There are times when you will have more than one categorical variable along with your quantitative variable. In these situations, a grouped bar chart is appropriate. A bar's position along the axis indicates which group it belongs to for the first categorical variable, while the color or design of the bar (such as solid color vs striped) identifies its group for the second.

Scatterplot

If you have two quantitative variables and are trying to look at potential trends or correlations, scatterplots are an extremely useful tool. Each individual datapoint should be plotted as a dot on the graph, and the points should not be connected.

The general pattern of the dots can give you helpful information. For example, a bell-shaped curve indicates normal distribution while a concave upward curve indicates exponential growth.

If the two variables have a linear relationship, regression lines can be drawn. These are straight lines that best describe how the dependent variable changes as the independent variable changes. These lines can then be used to predict the value of the dependent variable for a specific value of the independent variable.
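One common way to find a regression line is the least-squares method. The sketch below assumes that method, fits a line to made-up data, and then uses the line to estimate the dependent variable at a chosen value of the independent variable. The function name `fit_line` and the data are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data: hours of light per day (independent, x-axis)
# and plant growth in cm (dependent, y-axis).
light_hours = [2, 4, 6, 8, 10]
growth_cm = [1.1, 2.0, 2.9, 4.2, 4.8]

slope, intercept = fit_line(light_hours, growth_cm)

# Use the line to estimate growth at 7 hours of light.
predicted_growth = slope * 7 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction={predicted_growth:.2f} cm")
```

Predictions are most trustworthy inside the range of the measured data; extending the line far beyond it is riskier.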

If the two variables have a positive correlation, that means one goes up as the other goes up. If they have a negative correlation, then as one goes up, the other goes down.
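The direction of a correlation is often summarized with the Pearson correlation coefficient r, which is positive when the variables rise together and negative when one falls as the other rises. The text above does not name a specific coefficient, so treat this as one possible illustration; the data and the `pearson_r` helper are made up.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient, ranging from -1 to 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data chosen to be perfectly linear.
temperature_c = [10, 15, 20, 25, 30]
ice_cream_sales = [20, 35, 50, 65, 80]  # rises as temperature rises
hot_cocoa_sales = [90, 70, 50, 30, 10]  # falls as temperature rises

r_positive = pearson_r(temperature_c, ice_cream_sales)
r_negative = pearson_r(temperature_c, hot_cocoa_sales)
print(f"positive: r={r_positive:.2f}, negative: r={r_negative:.2f}")
```

Real data rarely falls exactly on a line, so r usually lands somewhere between these two extremes.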

Line Graph

One of the most commonly known types of graphs, line graphs are used to display the relationship between two quantitative variables. They are used for two continuous variables, or for a discrete variable that changes based on a continuous variable.

Dual-Y Line Graph

Sometimes, you want to look for potential relationships between multiple variables. A dual-y line graph has two y-axes, each representing a different variable, and each with its own line(s). A legend is essential for this so that readers know which line goes with which y-axis.

Line Graph (Semi-Log)

In a semi-log line graph, we break the rule about going up by the same amount each tick for our axes. In these, one axis (usually y) is plotted on a logarithmic scale, while the other axis is on a linear scale like normal. 

These graphs make certain non-linear relationships, such as exponential ones, appear linear, which makes it easier to compare the variables and examine the relationship between them.

These are useful when there is a very large range of values for one of your variables.
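To see why a logarithmic axis straightens out exponential data, the sketch below uses made-up bacterial counts that double every hour. On a linear scale the step between points keeps growing, while the step between the base-10 logarithms stays constant, which is what a straight line looks like.

```python
from math import log10

# Hypothetical bacterial counts that double every hour.
hours = [0, 1, 2, 3, 4, 5]
cells = [100 * 2 ** h for h in hours]

# Change between consecutive points on a linear y-axis.
linear_steps = [b - a for a, b in zip(cells, cells[1:])]

# Change between consecutive points on a logarithmic y-axis.
log_values = [log10(c) for c in cells]
log_steps = [b - a for a, b in zip(log_values, log_values[1:])]

print("linear steps:", linear_steps)
print("log steps:", [round(s, 3) for s in log_steps])
```

The counts here span 100 to 3,200, and the log values span only about 2 to 3.5, which is also why a log axis helps when one variable covers a very large range.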

Histogram

Sometimes, rather than just plotting the average of a set of data, it is useful to display the distribution of data. The x-axis is divided into ranges of numbers known as bins. The frequency of datapoints within each bin is then what is plotted on the y-axis.

These are particularly useful for seeing whether data is normally distributed (and so suited to parametric tests) or not (and so better suited to nonparametric tests).
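The binning step behind a histogram can be sketched in a few lines. The seedling heights, the bin width of 2 cm, and the starting value of 4 cm are all assumptions chosen for the example.

```python
# Hypothetical data: heights of 12 seedlings in cm.
heights_cm = [4.2, 5.1, 5.8, 6.3, 6.7, 7.0, 7.4, 7.9, 8.2, 8.8, 9.5, 10.6]

bin_start = 4   # left edge of the first bin
bin_width = 2   # every bin covers the same width
num_bins = 4    # bins: 4-6, 6-8, 8-10, 10-12

# Count how many datapoints fall in each bin.
# A value on a boundary (such as 6.0) would go into the bin that starts there.
counts = [0] * num_bins
for height in heights_cm:
    index = int((height - bin_start) // bin_width)
    counts[index] += 1

for i, count in enumerate(counts):
    low = bin_start + i * bin_width
    print(f"{low}-{low + bin_width} cm: {count}")
```

Each count becomes the height of one histogram bar. Changing the bin width can change the apparent shape of the distribution, so it is worth trying more than one.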

Box and Whisker

Another way to look at the distribution of data, box and whisker graphs (or box plots) make it easier to compare different datasets. These graphs use five pieces of data: the minimum, the first quartile (Q1), the median, the third quartile (Q3), and the maximum.


This type of graph is named for the shape that it takes on. The "box" is the middle part, which contains the middle 50% of the data. The "whiskers" are the lines on either end of the box, which display the top and bottom 25% of the datapoints.
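The values a box plot is drawn from (often called the five-number summary) can be computed with Python's standard `statistics` module. Quartile conventions differ between textbooks and software; this sketch assumes the "inclusive" method, and the dataset is made up.

```python
from statistics import quantiles

# Hypothetical dataset of nine measurements.
data = [2, 4, 4, 5, 6, 7, 8, 9, 12]

# Cut the data into four equal parts; the three cut points are Q1, median, Q3.
q1, median, q3 = quantiles(data, n=4, method="inclusive")

five_numbers = {
    "minimum": min(data),   # end of the lower whisker
    "Q1": q1,               # bottom edge of the box
    "median": median,       # line inside the box
    "Q3": q3,               # top edge of the box
    "maximum": max(data),   # end of the upper whisker
}

for name, value in five_numbers.items():
    print(f"{name}: {value}")
```

Some plotting tools draw outliers as separate dots and stop the whiskers short of the true minimum and maximum, so a plotted box may not match these values exactly.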

Pie Chart

Pie charts are used to compare the parts of a whole. 


These are most useful when there are few categories. However, bar graphs often make the same comparisons easier to see.
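Because a pie chart shows parts of a whole, each slice is sized by its share of the total. As a small sketch with made-up counts, the share can be expressed as a percentage or as an angle out of 360 degrees.

```python
# Hypothetical counts of blood types in a class of 40 students.
counts = {"O": 18, "A": 14, "B": 6, "AB": 2}
total = sum(counts.values())

# For each category: (percentage of the whole, angle of the slice in degrees).
slices = {
    label: (count / total * 100, count / total * 360)
    for label, count in counts.items()
}

for label, (percent, degrees) in slices.items():
    print(f"{label}: {percent:.0f}% of the pie, {degrees:.0f} degrees")
```

The percentages should add up to 100 and the angles to 360; if they do not, a category is missing or has been counted twice.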

Statistical Analysis with Graphs

When graphing averages, it is helpful to indicate the uncertainty in your data. Error bars are graphic representations of uncertainty that help to show how precise the measurement is. The larger the error bar, the more uncertain the data is.

Error bars can also be used to display standard deviation, which shows variation in the data rather than uncertainty. More rarely, confidence intervals are used, which show the reliability of the measurement.

Typically, we will use SEM (standard error of the mean) bars. Oftentimes, 2SEM (double the standard error of the mean) is used for the error bars. The bar should extend down from the average (either the dot in a line graph or the top of the bar in a bar graph) by the 2SEM value and up from the average by the same amount. The range covered by the error bar then approximates a 95% confidence interval for the mean.
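As a worked sketch of the 2SEM calculation, the snippet below assumes the usual formula SEM = s / sqrt(n), where s is the sample standard deviation and n is the number of datapoints. The heart-rate values are made up.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical resting heart rates (beats per minute) for 8 students.
heart_rates_bpm = [68, 72, 75, 70, 74, 71, 69, 73]

n = len(heart_rates_bpm)
average = mean(heart_rates_bpm)
sd = stdev(heart_rates_bpm)   # sample standard deviation (divides by n - 1)
sem = sd / sqrt(n)            # standard error of the mean

# The error bar extends 2 * SEM below and above the average.
lower = average - 2 * sem
upper = average + 2 * sem

print(f"mean={average:.2f}, SD={sd:.2f}, SEM={sem:.2f}")
print(f"error bar from {lower:.2f} to {upper:.2f}")
```

Note that SEM shrinks as n grows, so collecting more data narrows the error bars even when the spread of the data stays the same.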

We can use error bars to help determine if observed differences are meaningful. If the error bars overlap between two datasets, that suggests that there may not be a statistical difference between them, even if the averages are different. While a statistical test would need to be performed to determine this, for many classes (such as AP Biology), whether they overlap or not is considered enough to make the conclusion.
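The overlap check described here is simple enough to write down. In this sketch, `error_bars_overlap` is a hypothetical helper, and each error bar is given as an average plus the distance the bar extends on either side (for example, the 2SEM value).

```python
def error_bars_overlap(mean_a, half_width_a, mean_b, half_width_b):
    """Return True if the two error bars share any part of their ranges."""
    low_a, high_a = mean_a - half_width_a, mean_a + half_width_a
    low_b, high_b = mean_b - half_width_b, mean_b + half_width_b
    return low_a <= high_b and low_b <= high_a

# Bars covering 9-11 and 10.5-12.5 share the range 10.5-11.
close_groups = error_bars_overlap(10.0, 1.0, 11.5, 1.0)

# Bars covering 9-11 and 12-14 have a gap between them.
distant_groups = error_bars_overlap(10.0, 1.0, 13.0, 1.0)

print("close groups overlap:", close_groups)
print("distant groups overlap:", distant_groups)
```

This is only the quick visual rule in code form; it does not replace a statistical test.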

Sometimes, specific statistical tests (such as a t-test or ANOVA) are used to determine whether there is a statistical difference between the values being graphed. In this situation, asterisks are often used to indicate a statistical difference. The test used and the significance level found (typically reported as a p value) are then usually described in the figure legend.