December 2021 How I teach life scientists… Gaps, Missteps, and Errors in Data Analysis

How I Teach Life Scientists Talk Series

The “How I Teach” talk series is an invitation for anyone delivering professional development to life scientists and educators to share their curriculum, tips, technologies, and approaches. Email info@lifescitrainers.org to participate or complete a submission form to sign up to give a short talk and/or demo of the teaching skill you want to share. See full blog post for details.

Time and Date for Talks

LifeSciTrainers Community Calls December 2021

Register on Zoom for our community call or join our Slack for more details.

YouTube: Link


How I teach life scientists… Gaps, Missteps, and Errors in Data Analysis

Arjun Krishnan, Michigan State University

Format: Short talk and demo

Key “take home” points

  1. What the practical definitions of various statistical concepts are and how they relate to one another
  2. When and why they are applied in certain situations and not others
  3. What a robust sequence of actions is when applying them to data
  4. How to judiciously interpret the results
  5. How to clearly and transparently communicate the findings

Abstract

As many parts of biological and biomedical research are becoming data-intensive, the typical graduate training in this area involves courses in coding and statistics. In these courses, students learn the array of summary statistics and hypothesis tests available and the functions in R or Python to wrangle data, make plots, calculate various statistics, and perform different tests. However, students face a significant challenge when trying to apply what they have learned to novel research questions and new complex datasets. This is because such an application requires them to know: i) what the practical definitions of various concepts and their relationships are, ii) when and why they are applied in certain situations and not others, iii) what a robust sequence of actions is when applying them to data, iv) how to judiciously interpret the results, and, finally, v) how to clearly and transparently communicate the findings. Currently, most students learn these ideas only by piecing together a mental model of the “acceptable/standard/best practices” in their field based on bits of information from their mentors, peers, and, often, published papers.

To address this challenge, we developed a short course focused on discussing common misunderstandings and typical errors in the practice of statistical data analysis, and providing a mental toolkit for critically thinking about statistical methods and results. The course covers twelve major topics: i) The scientific method, Critically reading literature, Cognitive biases; ii) Estimation of error and uncertainty; iii) P-value, P-hacking, Publication bias, Multiple hypothesis correction; iv) Statistical power, Underpowered statistics; v) Pseudoreplication, Confounding variables; vi) Circular analysis, Regression to the mean, Sampling biases; vii) Base rates, Conditional probabilities, Bayesian reasoning; viii) Measuring associations in continuous big data; ix) Issues with high dimensional data, Evaluating predictive models; x) Challenges in data presentation and visualization, Communicating statistics; xi) Principles for effective data management, analysis, and sharing; and xii) Code management and sharing, Reproducible research. The course builds on introductory statistics and coding, and includes several hands-on exercises. We are continuing to improve the course to better train students in skills critical to the day-to-day use and consumption of data analysis.
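
To give a flavor of one topic on the list, multiple hypothesis correction, here is a minimal Python sketch. It is not taken from the course materials: the p-values are made up for illustration, the function name `benjamini_hochberg` is our own, and the two corrections shown (Bonferroni and the Benjamini-Hochberg step-up procedure) are simply common choices, not necessarily the ones the course teaches.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per test: True if rejected at false discovery rate alpha."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject the max_k tests with the smallest p-values.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from ten tests (invented for this example).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

naive = [p <= 0.05 for p in p_values]                       # no correction
bonferroni = [p <= 0.05 / len(p_values) for p in p_values]  # family-wise error control
bh = benjamini_hochberg(p_values, alpha=0.05)               # false discovery rate control

print("uncorrected:", sum(naive))
print("Bonferroni: ", sum(bonferroni))
print("BH (FDR):   ", sum(bh))
```

With these invented numbers, five tests pass the uncorrected 0.05 threshold, one survives Bonferroni, and two survive Benjamini-Hochberg, which shows how much the conclusion can depend on whether and how the correction is applied.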
