How to Run Regression In Excel

Discover interesting correlations in data with regression analysis

If you've ever wanted to find a correlation between two things, using regression analysis in Excel is one of the best ways to do that.

Regression in Excel is a way to automate the statistical process of comparing several sets of information to see how changes in independent variables affect changes in dependent variables.

Instructions in this article apply to Excel 2019, 2016, 2013, and 2010; Excel for Office 365; and Excel for Mac.

What's the Meaning of Regression?

Regression is a statistical modeling approach that analysts use to determine relationships between multiple variables.

Regression analysis starts with a single variable you're trying to analyze, and independent variables you're testing to see if they affect that single variable. The analysis looks at changes in the independent variables and attempts to correlate those changes with resulting changes in the single (dependent) variable.

This may sound like advanced statistics, but Excel makes this complex analysis available to anyone.

Performing Linear Regression in Excel

The simplest form of regression analysis is linear regression. Simple linear regression looks at the relationship between only two variables.

For example, the following spreadsheet shows data containing the number of calories a person has eaten each day, and their weight on that day.

Screenshot of a weight and calorie spreadsheet

Since this spreadsheet contains two columns of data, and one variable could potentially have an effect on the other, you can run a regression analysis on this data using Excel.
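To see what a simple linear regression actually calculates, here is a minimal Python sketch that fits a straight line to two columns of data. The calorie and weight values are made up for illustration (the spreadsheet's real numbers aren't shown here), and the fit_line name is just a hypothetical helper, not part of Excel.

```python
# Hypothetical sample data; the article's actual spreadsheet values are not shown.
calories = [1800, 2000, 2200, 2400, 2600]   # independent variable (Input X Range)
weight = [150, 152, 155, 157, 160]          # dependent variable (Input Y Range)


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(calories, weight)
print(f"weight = {intercept:.2f} + {slope:.4f} * calories")
```

With these sample values, the slope works out to 0.0125, meaning each additional calorie is associated with about 0.0125 more units of weight. Excel reports the same two numbers in the Coefficients column of its regression output.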

Before you can use Excel's regression analysis feature, you need to enable the Analysis ToolPak add-in.

  1. Select the File menu, and select Options.

    Screenshot of Options in Excel
  2. Select Add-ins in the left navigation menu. Then, make sure Excel Add-ins is selected in the Manage field. Finally, select the Go button.

    Screenshot of adding Excel Add-ins
  3. In the Add-ins popup window, enable Analysis ToolPak and select OK.

    Screenshot of enabling Analysis ToolPak in Excel
  4. Now that Analysis ToolPak is enabled, you're ready to start doing regression analysis in Excel.

How to Perform Simple Linear Regression in Excel

Using the weight and calories spreadsheet as an example, you can perform a linear regression analysis in Excel as follows.

  1. Select the Data menu. Then, in the Analysis group, select Data Analysis.

    Screenshot of selecting Data Analysis in Excel
  2. In the Data Analysis window, select Regression from the list and click OK.

    Screenshot of selecting Regression data analysis
  3. The Input Y Range is the range of cells that contains the dependent variable. In this case that's the weight. The Input X Range is the range of cells that contains the independent variable. In this case that's the calorie column. Select Labels for the header cells, and then select New Worksheet to send results to a new worksheet.

    Screenshot of configuring regression analysis in Excel
  4. Select OK to have Excel run the analysis and send the results into a new sheet. The analysis output has a number of values that you'll need to understand to interpret the results.

    Screenshot of regression analysis output in Excel

Each of these numbers has the following meanings:

  • Multiple R: The correlation coefficient, which measures the strength of the linear relationship. Excel reports it as a value from 0 to 1: values near 1 mean a strong relationship, and 0 means there's no correlation. The sign of the slope in the Coefficients table shows whether the relationship is positive or negative.
  • R Square: The coefficient of determination, which shows the proportion of the variation in the dependent variable that the regression line explains. Statistically, this is the regression sum of squares divided by the total sum of squares.
  • Adjusted R Square: The R Square value adjusted for the number of independent variables you've chosen. It's most useful when comparing models with different numbers of independent variables.
  • Standard Error: The typical distance between the observed values and the regression line, in the units of the dependent variable. If this error is small, then your regression results are more precise.
  • Observations: The number of observations (data rows) in your regression model.
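If you'd like to see how these summary values relate to each other, the following Python sketch computes them for a one-variable regression using the standard textbook formulas. The regression_summary function and its dictionary keys are hypothetical names chosen to mirror Excel's labels; this is an illustration of the math, not Excel's own code.

```python
import math


def regression_summary(xs, ys):
    """Compute the summary statistics for a one-variable linear regression."""
    n = len(xs)
    k = 1  # number of independent variables
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x

    # Residual SS: squared gaps between observed and predicted values.
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total SS: squared deviations of the observed values from their mean.
    ss_total = sum((y - mean_y) ** 2 for y in ys)

    # Clamp at zero to guard against tiny negative values from rounding.
    r_square = max(0.0, 1 - ss_residual / ss_total)
    return {
        "Multiple R": math.sqrt(r_square),
        "R Square": r_square,
        "Adjusted R Square": 1 - (1 - r_square) * (n - 1) / (n - k - 1),
        "Standard Error": math.sqrt(ss_residual / (n - k - 1)),
        "Observations": n,
    }


# Hypothetical data, used only to show the call.
for name, value in regression_summary([1, 2, 3, 4], [2, 4, 5, 8]).items():
    print(f"{name}: {value}")
```

Notice that Multiple R is simply the square root of R Square, which is why it never comes out negative.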

The remaining values in the regression output give you details about smaller components in the regression analysis.

  • df: Degrees of freedom associated with each source of variance (regression, residual, and total).
  • SS: Sum of squares. The ratio of the residual sum of squares to the total SS should be smaller if most of your data fits the regression line.
  • MS: Mean square, which is each sum of squares divided by its degrees of freedom.
  • F: The F statistic (F-test) for the null hypothesis that the independent variables have no effect. This provides the overall significance of the regression model.
  • Significance F: The P-value of the F statistic. A small value (commonly below 0.05) suggests the model's fit is unlikely to be due to chance.
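The df, SS, MS, and F values build on one another, which a short Python sketch can make concrete. The anova_table function below is a hypothetical helper for a one-variable regression. It stops at the F statistic because Significance F requires an F-distribution calculation that Python's standard library doesn't include.

```python
def anova_table(xs, ys):
    """Build the df / SS / MS / F table for a one-variable linear regression."""
    n = len(xs)
    k = 1  # number of independent variables
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x

    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_regression = ss_total - ss_residual

    df_regression = k
    df_residual = n - k - 1
    # Each mean square is a sum of squares divided by its degrees of freedom.
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual

    return {
        "Regression": {"df": df_regression, "SS": ss_regression,
                       "MS": ms_regression, "F": ms_regression / ms_residual},
        "Residual": {"df": df_residual, "SS": ss_residual, "MS": ms_residual},
        "Total": {"df": n - 1, "SS": ss_total},
    }


# Hypothetical data, used only to show the call.
for source, values in anova_table([1, 2, 3, 4], [2, 4, 5, 8]).items():
    print(source, values)
```

A large F value means the regression explains far more variation than is left over in the residuals, which is what drives Significance F toward zero.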

Unless you understand statistics and calculating regression models, the values at the bottom of the summary won't have a lot of meaning. However, the Multiple R and R Square are the two most important.

As you can see in this example, calories have a very strong correlation with total weight.

Multiple Linear Regression Analysis in Excel

To perform the same linear regression above but with multiple independent variables, you can just select the entire range (multiple columns and rows) for the Input X Range.

Screenshot of selecting a range for Input X Range

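When you select multiple independent variables, not every variable will show a strong relationship with the dependent variable. Because R Square never decreases as you add variables, use Adjusted R Square to compare models with different numbers of independent variables.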

However, a regression analysis in Excel can help you find correlations with one or more of those variables that you may not realize exist just by reviewing the data manually.
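For the curious, multiple linear regression can be sketched in Python as well. The example below solves the standard least-squares "normal equations" with a small elimination routine. The fit_multiple function, the exercise-minutes column, and all of the numbers are hypothetical and exist only to illustrate the idea; this is one common textbook approach, and Excel's internal method may differ.

```python
def fit_multiple(rows, ys):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk via the normal equations.

    Returns [b0, b1, ..., bk], where b0 is the intercept.
    """
    # Prepend a 1 to each row so the first coefficient is the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(p)]
        + [sum(r[i] * y for r, y in zip(X, ys))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("independent variables are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical data: each row is (calories, exercise minutes) for one day.
days = [(1800, 60), (2000, 30), (2200, 45), (2400, 20), (2600, 10)]
weight = [150, 152, 155, 157, 160]

intercept, per_calorie, per_minute = fit_multiple(days, weight)
print(f"intercept={intercept:.3f}, calories={per_calorie:.5f}, exercise={per_minute:.5f}")
```

Each coefficient estimates how much the dependent variable changes when that one independent variable increases by one unit while the others stay the same, which matches the Coefficients column Excel produces when the Input X Range spans several columns.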