Chapter 1.  General Introduction

Practicing biologists need a strong foundation in experimental design and data analysis; these skills allow biologists to transform an idea or hypothesis into a conclusion.

Biologists who do not perform (experimental) research also require these skills to critically evaluate a study (is the study likely reliable?), draw meaning from a study’s conclusions, and understand the limits of the conclusions. 

Reason for being

Starting in the early 2010s, prominent research demonstrated that results from many studies were not “replicable”; i.e., when scientists tried to repeat previous research, they obtained results and conclusions that differed from the original work (see the chapter “Questionable Research Practices”).  This observation raised serious doubt regarding the reliability of scientific findings.

As a result, biology and other sciences have rapidly evolved to improve reproducibility.  For example, journals are updating requirements for authors, granting agencies require additional consideration of proposed experiments, and new models of publishing have arisen (e.g., 'registered reports').  This website aims to provide training in experimental design and data analysis to meet these new developments.

Does this mean that this website is only useful to people who wish to work in academia?  No; this website can help a great diversity of people.  For example, many companies need to know how to design better experiments, and many use R (the same software we use) for data science:


Companies that use R include Starbucks, Kraft Heinz, Twitter, and United Airlines (please note that by listing these companies, we do not endorse their products or business models; we only selected names that may be familiar to you).  Our primary motivation for creating this website is to improve research reproducibility in general, and the principles we teach apply beyond academia.

Structure and content

This website provides resources to design and conduct reproducible research.  We begin with an introduction to hypothesis testing, which lays the foundation to understand the principles behind a vast array of statistical tests.  We then discuss how to plot data and how to estimate values with uncertainty, and we introduce t-tests.

Some research programs rely almost exclusively on t-tests.  Therefore, after introducing t-tests, we discuss approaches to improve research reproducibility in such experiments.  Specifically, we discuss experimental design (including pseudo-replication and power analysis) and Questionable Research Practices.

With this foundation for conducting reliable experiments with simple designs in hand, we then introduce General Linear Models (GLMs) as a powerful and flexible approach to analyzing data from a vast array of experimental designs.  For each form of GLM, we explore power analysis and aspects of experimental design.

Software

We use the software 'R' throughout this website.  R is a computing language with a vast array of statistical and computational tools.  It is widely used, has facilities to promote reproducibility of analyses, is open source, and is freely downloadable.  Some prior knowledge is required to use R effectively; however, our experience with hundreds of undergraduates is that these skills grow quickly.

Read further information about The R Project for Statistical Computing.

This is not a course about R

Please note that, while we use R for many purposes, this website does not present a course about R, per se.  Instead, this website aims to teach principles for experimental design and analysis that can be implemented in many statistical software packages.  For example, when discussing t-tests, we spend most of our time learning how t-tests work and when they should be used in research; these principles can be applied using any statistical software.

Building experience

Practice is essential for learning data analysis and experimental design (these are skills to be learned).  Therefore, this website walks through example analyses (using R) and offers practice problems with answers.

Accessibility 

To improve accessibility (e.g., with weak internet connections), we provide access to slides and other materials (where appropriate) for all videos.  Please note that all videos include subtitles (we apologize in advance that the subtitles are not perfect).  A user can download a transcript for any video by: (i) clicking on the link to the video (hosted by MediaHopper CREATE) and then clicking on the University of Edinburgh logo in the bottom-right of the video; (ii) selecting the “Attachments” tab below the video; and (iii) clicking on the “download” icon (a picture of a downward-facing arrow) in the “Actions” column.  You are then given the choice of downloading a .txt file or a .json file.

We also provide practice problems with answers, suggested further reading, and R code to help users take their next steps.