Bas(e|ic) R
Lesson 2 with Ian Carroll
Contents
- Why learn R?
- The Editor
- Data types
- Multi-dimensional data structures
- Load data into R
- Parts of an Object
- Functions
- Flow control
- Linear models
- Review
- Exercise solutions
Why learn R?
- High-level programming language good for interactive statistical analysis
- General purpose programming language for scripting entire data-processing pipelines
- Large selection of add-on packages that extend the capabilities of “base R”
- Large user community especially within statistics and ecology
- Open source
The Editor
The console is for evaluating commands you don’t intend to keep or reuse. It’s useful for testing commands and poking around. The editor is where you compose scripts that will process data, perform analyses, code up visualizations, and even write reports.
These work together in RStudio, which has multiple ways to send parts of the script you are editing to the console for immediate evaluation. Alternatively you can “source” the entire script.
Open up “worksheet-2.R” in the editor, and follow along by replacing the ...
placeholders with the code here. Then evalute just this line (Ctrl+Enter on Windows, ⌘+Enter on Mac OS).
vals <- seq(1, 100)
Let’s review the elements of this statement, from left to right:
vals
is the name of a (new) variable<-
is an operator that assigns what’s named on the left to equal the result of the expression on the rightseq
is the name of the function(
is the opening paren of the function call1
and100
are both arguments, or parameters, to the function)
is the closing paren of a function call
- Question
- Why call
vals
a “variable” andseq
a “function”? - Answer
- It is true they are both names of objects known to R, and could be called variables. But
seq
has the important distinguishing feature of being callable, which is indicated in documentation by writing the function name with empty parens, as inseq()
.
Our call to the function seq
could have been much more explicit. We could give the arguments by the names that seq
is expecting.
vals <- seq(from = 1,
to = 100)
Run this code either line-by-line, or highlight the section to run (optionally with keyboard shortcut Ctrl-Return or ⌘ Return).
- Question
- What’s an advantage of naming arguments?
- Answer
- One advantage is that you can put them in any order. A related advantage is that you can then skip some arguments, which is fine to do if each skipped argument has a default value. A third advantage is code readability, which you should always be concious of while writing in the editor.
Readability
Code readability in the editor cuts both ways: sometimes verbosity is useful, sometimes it is cumbersome.
The seq()
function has an operator form available when only the from
and to
arguments are needed.
1:100
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
[18] 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
[35] 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
[52] 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
[69] 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85
[86] 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
The :
operator should be used whenever possible because it replaces a common, cumbersome function call with an brief, intuitive syntax.
Likewise, the assign
function duplicates the functionallity of the <-
symbol, but is never used when the simpler operator will suffice.
Function documentation
How would you get to know these properties and the names of a function’s arguments?
?seq
How would you even know what function to call?
??sequence
Data types
Type | Example |
---|---|
integer | -4, 0, 999 |
double | 3.1, -4, Inf, NaN |
character | ‘a’, “4”, “👏” |
logical | TRUE, FALSE |
missing | NA |
Data structures
Compound objects, built from one or more of these data types, or even other objects.
Common one-dimensional, array data structures:
- Vectors
- Lists
- Factors
Vectors
Vectors are the basic data structure in R. They are a collection of data that are all of the same type. Create a vector by combining elements together using the function c()
. Use the operator :
for a sequence of numbers (forwards or backwards), otherwise separate elements with commas.
counts <- c(4, 3, 7, 5)
All elements of an vector must be the same type, so when you attempt to combine different types they will be coerced to the most flexible type.
c(1, 2, "c")
[1] "1" "2" "c"
Lists
Lists are like vectors but their elements can be of any data type or structure. You construct lists by using list()
instead of c()
.
list(1, 2, "c")
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] "c"
Lists can even include another list!
list(1, list(2, 3))
[[1]]
[1] 1
[[2]]
[[2]][[1]]
[1] 2
[[2]][[2]]
[1] 3
Exercise 1
Look at the outputs of list(1, list(2, 3))
and c(1, list(1, 2))
. Store the output of each command as new variables, and then examine each variable’s structure with the str()
function. What’s different about the structure of the two variables? Are both lists?
Factors
A factor is a vector that can contain only predefined values, and is used to store categorical data. Factors are built on top of integer vectors using two attributes: the class()
, “factor”, which makes them behave differently from regular integer vectors, and their levels()
, or the set of allowed values.
Use factor()
to create a vector with predefined values, which are often characters or “strings”.
education <- factor(
c("college", "highschool", "college", "middle"),
levels = c("middle", "highschool", "college"))
str(education)
Factor w/ 3 levels "middle","highschool",..: 3 2 3 1
A factor can be unorderd, as above, or ordered with each level somehow “less than” the next.
education <- factor(
c("college", "highschool", "college", "middle"),
levels = c("middle", "highschool", "college"),
ordered = TRUE)
str(education)
Ord.factor w/ 3 levels "middle"<"highschool"<..: 3 2 3 1
Multi-dimensional data structures
Data can be stored in several types of data structures depending on its complexity.
Dimensions | Homogeneous | Heterogeneous |
---|---|---|
1d | c() | list() |
2d | matrix() | data.frame() |
nd | array() |
Of these, the data frame is far and away the most used.
Data frames
Data frames are 2-dimensional and can contain heterogenous data like numbers in one column and a factor in another.
It is the data structure most similar to a spreadsheet, with two key differences:
- Data frames columns are equal-length vectors.
- As vectors, the columns are homogeneous and cannot hold values of the wrong type.
Creating a data frame from scratch can be done by combining vectors with the data.frame()
function.
df <- data.frame(education, counts)
df
education counts
1 college 4
2 highschool 3
3 college 7
4 middle 5
Some functions to get to know your data frame are:
Function | Output |
---|---|
dim() |
dimensions |
nrow() |
number of rows |
ncol() |
number of columns |
names() |
(column) names |
str() |
structure |
summary() |
summary info |
head() |
shows beginning rows |
names(df)
[1] "education" "counts"
Exercise 2
Create a data frame with two columns, one called “species” and another called “abund”. Store your data frame as a variable called data
. You can do this with or without populating the data frame with values.
Load data into R
We will use the function read.csv()
that reads in a Comma-Separated-Values file by passing it the location of the file.
The essential argument for the function to read in data is the path to the file, and optinal additional arguments specify additional ways of reading the data.
Additional file types can be read in using read.table()
; in fact, read.csv()
is a simple wrapper for the read.table()
function that specifies default values for some of the optional arguments.
Type a comma after read.csv(
and then press tab to see what arguments that this function takes.
Hovering over each item in the list will show a description of that argument from the help documentation about that function.
Specify the values to use for an argument using the syntax name = value
.
read.csv(file = "data/plots.csv", header = TRUE)
id treatment
1 1 Spectab exclosure
2 2 Control
3 3 Long-term Krat Exclosure
4 4 Control
5 5 Rodent Exclosure
6 6 Short-term Krat Exclosure
7 7 Rodent Exclosure
8 8 Control
9 9 Spectab exclosure
10 10 Rodent Exclosure
11 11 Control
12 12 Control
13 13 Short-term Krat Exclosure
14 14 Control
15 15 Long-term Krat Exclosure
16 16 Rodent Exclosure
17 17 Control
18 18 Short-term Krat Exclosure
19 19 Long-term Krat Exclosure
20 20 Short-term Krat Exclosure
21 21 Long-term Krat Exclosure
22 22 Control
23 23 Rodent Exclosure
24 24 Rodent Exclosure
- Question
- Is the
header
argument necessary? - Answer
- No. Look at
?read.csv
to see thatTRUE
is the default value for this argument.
Use the assignment operator “<-“ to store that data in memory and work with it
plots <- read.csv(file = "data/plots.csv")
animals <- read.csv(file = "data/animals.csv")
You can specify what indicates missing data in the read.csv function using either na.strings = "NA"
or na = "NA"
.
You can also specify multiple things to be interpreted as missing values, such as na.strings = c("missing", "no data", "< 0.05 mg/L", "XX")
.
After reading in the “animals.csv” and “plots.csv” files, let’s explore what types of data are in each column and what kind of structure your data has.
str(plots)
'data.frame': 24 obs. of 2 variables:
$ id : int 1 2 3 4 5 6 7 8 9 10 ...
$ treatment: Factor w/ 5 levels "Control","Long-term Krat Exclosure",..: 5 1 2 1 3 4 3 1 5 3 ...
summary(plots)
id treatment
Min. : 1.00 Control :8
1st Qu.: 6.75 Long-term Krat Exclosure :4
Median :12.50 Rodent Exclosure :6
Mean :12.50 Short-term Krat Exclosure:4
3rd Qu.:18.25 Spectab exclosure :2
Max. :24.00
Exercise 3
By default, all character data is read in to a data.frame as factors.
Use the read.csv()
argument stringsAsFactors
to suppress this behavior, then subsequently modify the sex
column in animals
to make it a factor.
Columns of a data.frame
are identified to the R interpreter with the $
operator, e.g. animals$sex
. We’ll see more on this below.
Parts of an Object
Parts of objects are always accessible, either by their name or by their position, using square brackets: [
and ]
.
Position
counts[1]
[1] 4
counts[3]
[1] 7
Names
Parts of an object can usually also have a name. The names can be given when you are creating a vector or afterwards using the names()
function.
df['education']
education
1 college
2 highschool
3 college
4 middle
names(df) <- c('ed', 'ct')
df['ed']
ed
1 college
2 highschool
3 college
4 middle
- Question
- This use of
<-
withnames(x)
on the left is a little odd. What’s going on? - Answer
- We are overwriting an existing variable, but one that is accessed through the output of the function on the left rather than the global environment.
In a multi-dimensional array, you separate the dimension along which a part is requested with a comma.
df[3, 'ed']
[1] college
Levels: middle < highschool < college
It’s fine to mix names and indices when selecting parts of an object.
Subsetting ranges
There are multiple ways to simultaneously extract multiple parts of an object.
Use in brackets | Subset instructions |
---|---|
positive integers | elements at the specified positions |
negative integers | omit elements at the specified positions |
logical vectors | select elements where the corresponding value is TRUE |
nothing | return the original vector (all) |
days <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
weekdays <- days[2:6]
weekend <- days[c(1, 7)]
weekdays
[1] "Monday" "Tuesday" "Wednesday" "Thursday" "Friday"
weekend
[1] "Sunday" "Saturday"
Exercise 4
- Get weekdays using negative integers.
- Get M-W-F using a vector of postitions generated by
seq()
that uses theby
argument (don’t forget to?seq
for help).
Subsetting data frames
The $
sign is an operator that makes for quick access to a single, named part of an object.
It’s most useful when used interactively with “tab completion” on the columns of a data frame.
df$ed
[1] college highschool college middle
Levels: middle < highschool < college
A logical test applied to a single column produces a vector of TRUE
and FALSE
values that’s the right length for subsetting the data.
df[df$ed == 'college', ]
ed ct
1 college 4
3 college 7
Functions
The purpose of R functions is to package up a batch of commands. There are several reasons to develop functions
- reuse
- readability
- modularity
- consistency
Writing functions to use multiple times within a project prevents you from duplicating code, a real time-saver when you want to update what the function does. If you see blocks of similar lines of code through your project, those are usually candidates for being moved into functions.
Anatomy of a function
Writing functions is also a great way to understand the terminology and workings of R.
Like all programming languages, R has keywords that are reserved for import activities, like creating functions.
Keywords are usually very intuitive, the one we need is function
.
function(...) {
...
return(...)
}
Three components:
- arguments: control how you can call the function
- body: the code inside the function
- return value: controls what output the function gives
We’ll make a function to extract the first row of its argument, which we give a name to use inside the function:
function(dat) {
result <- dat[1, ]
return(result)
}
Note that dat
doesn’t exist until we call the function, which merely contains the instructions for how any dat
will be handled.
Finally, we need to give the function a name so we can use it like we used c()
and seq()
above.
first <- function(dat) {
result <- dat[1, ]
return(result)
}
first(df)
ed ct
1 college 4
- Question
- Can you explain the result of entering
first(counts)
into the console? - Answer
- The function caused an error, which prompted the interpreter to print a helpful error message. Never ignore an error message. (It’s okay to ignore a “warning”.)
Flow control
The R interpreter’s “focus” flows through a script (or any section of code you run) line by line. Without additional instruction, every line is processed from the top to bottom. “Flow control” is the generic term for causing the interpreter to repeat or skip certain lines, using concepts like “for loops” and “if/else conditionals”.
Flow control happens within blocks of code isolated between curly braces {
and }
, known as “statements”.
if (...) {
...
} else {
...
}
The keyword if
must be followed by a logical test which determines, at runtime, what to do next.
The R interpreter goes to the first statement if the logical value is TRUE
and to the second statement if it’s FALSE
.
An if/else conditional would allow the first
function to avoid the error thrown by calling first(counts)
.
first <- function(dat) {
if (is.vector(dat)) {
result <- dat[1]
} else {
result <- dat[1, ]
}
return(result)
}
first(df)
ed ct
1 college 4
first(counts)
[1] 4
Linear models
The formula requires a response variable left of a “~” and any number of predictors to its right.
Formula | Equivalent | Description |
y ~ a |
y ~ 1 + a |
constant and one predictor |
y ~ -1 + a |
y ~ 0 + a |
one predictor with no constant |
y ~ a + b |
y ~ 1 + a + b |
constant and two predictors |
y ~ a:b |
constant and one predictor, the interaction of (at least) one factor and another variable | |
y ~ a*b |
y ~ 1 + a + b + a:b |
constant and three predictors |
y ~ a*b - a |
y ~ 1 + b + a:b |
constant and two predictors |
y ~ (a + b + ... )^n |
constant and all combinations of predictors up to order n |
Linear model
In addition, certain functions are allowed within the formula definition.
animals <- read.csv('data/animals.csv', stringsAsFactors = FALSE, na.strings = '')
fit <- lm(
log(weight) ~ hindfoot_length,
data = animals)
summary(fit)
Call:
lm(formula = log(weight) ~ hindfoot_length, data = animals)
Residuals:
Min 1Q Median 3Q Max
-2.39077 -0.21749 -0.05046 0.15017 2.08463
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.5604389 0.0072416 215.5 <2e-16 ***
hindfoot_length 0.0650048 0.0002357 275.8 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3943 on 30736 degrees of freedom
(4811 observations deleted due to missingness)
Multiple R-squared: 0.7122, Adjusted R-squared: 0.7122
F-statistic: 7.607e+04 on 1 and 30736 DF, p-value: < 2.2e-16
Exercise 5
Regress hindfoot_length against weight and species_id. Does it appear that the Chihuahuan Desert’s common kangaroo rat (DM) have inordinately large feet for their weight?
Pay attention to factors
Data type matters in statistical modelling. For the predictors in a linear model, the most important distinction is discrete versus continuous.
animals$species_id <- factor(animals$species_id)
fit <- lm(
log(weight) ~ species_id,
data = animals)
summary(fit)
Call:
lm(formula = log(weight) ~ species_id, data = animals)
Residuals:
Min 1Q Median 3Q Max
-2.28157 -0.10063 0.02803 0.12574 1.48272
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.12857 0.03110 68.448 < 2e-16 ***
species_idDM 1.62159 0.03117 52.031 < 2e-16 ***
species_idDO 1.74519 0.03134 55.690 < 2e-16 ***
species_idDS 2.63791 0.03139 84.024 < 2e-16 ***
species_idNL 2.89645 0.03170 91.373 < 2e-16 ***
species_idOL 1.29724 0.03181 40.780 < 2e-16 ***
species_idOT 1.04031 0.03142 33.110 < 2e-16 ***
species_idOX 0.91176 0.09066 10.056 < 2e-16 ***
species_idPB 1.30426 0.03135 41.609 < 2e-16 ***
species_idPE 0.92374 0.03165 29.188 < 2e-16 ***
species_idPF -0.07717 0.03155 -2.446 0.014447 *
species_idPH 1.28769 0.04869 26.446 < 2e-16 ***
species_idPI 0.82629 0.08004 10.323 < 2e-16 ***
species_idPL 0.79433 0.04665 17.029 < 2e-16 ***
species_idPM 0.90396 0.03189 28.349 < 2e-16 ***
species_idPP 0.69278 0.03133 22.114 < 2e-16 ***
species_idPX 0.81448 0.15075 5.403 6.61e-08 ***
species_idRF 0.45257 0.03934 11.505 < 2e-16 ***
species_idRM 0.20908 0.03137 6.665 2.70e-11 ***
species_idRO 0.18447 0.08004 2.305 0.021191 *
species_idRX 0.56824 0.15075 3.769 0.000164 ***
species_idSF 1.87613 0.04504 41.656 < 2e-16 ***
species_idSH 2.10024 0.03572 58.803 < 2e-16 ***
species_idSO 1.80152 0.04504 40.000 < 2e-16 ***
species_idSS 2.32672 0.15075 15.434 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2086 on 32258 degrees of freedom
(3266 observations deleted due to missingness)
Multiple R-squared: 0.9216, Adjusted R-squared: 0.9215
F-statistic: 1.58e+04 on 24 and 32258 DF, p-value: < 2.2e-16
The difference between 1 and 24 degrees of freedom between the last two models—with one fixed effect each—arises from the discreteness of species_id
.
Review
In this introduction to R, we briefly touched on several key principle of scientific programming.
- Data types
- Assignment
- Reading data
- Subsetting data
- Functions (keyword
function
) - Statistics
Special characters in R
Perhaps more than most languages, an R script can appear like a jumble of archaic symbols. Here is a little table of characters to recognize as having special meaning
Symbol | Meaning |
---|---|
? |
get help |
# |
comment |
: |
sequence |
:: , ::: |
access namespaces (advanced) |
<- |
assignment |
$ , [ ] , [[ ]] |
subsetting |
% % |
infix operators, e.g. %*% |
{ } |
statements |
. |
|
@ |
slot (advanced) |
Yes, the .
in R has no fixed meaning and is often used as _
might be used to separate words in a variable name.
Exercise solutions
Solution 1
x <- list(1, list(2, 3))
y <- c(1, list(2, 3))
str(x)
List of 2
$ : num 1
$ :List of 2
..$ : num 2
..$ : num 3
str(y)
List of 3
$ : num 1
$ : num 2
$ : num 3
The variable x
contains two elements, a number and a list. The variable y
has concatenation of the two arguments, coerced to the more flexible of the two (a list is more flexible than a number). Both x
and y
are lists.
Solution 2
species <- c()
abund <- c()
data <- data.frame(species, abund)
str(data)
'data.frame': 0 obs. of 0 variables
Solution 3
animals <- read.csv('data/animals.csv', stringsAsFactors = FALSE, na.strings = '')
animals$sex <- factor(animals$sex)
str(animals)
'data.frame': 35549 obs. of 9 variables:
$ id : int 2 3 4 5 6 7 8 9 10 11 ...
$ month : int 7 7 7 7 7 7 7 7 7 7 ...
$ day : int 16 16 16 16 16 16 16 16 16 16 ...
$ year : int 1977 1977 1977 1977 1977 1977 1977 1977 1977 1977 ...
$ plot_id : int 3 2 7 3 1 2 1 1 6 5 ...
$ species_id : chr "NL" "DM" "DM" "DM" ...
$ sex : Factor w/ 2 levels "F","M": 2 1 2 2 2 1 2 1 1 1 ...
$ hindfoot_length: int 33 37 36 35 14 NA 37 34 20 53 ...
$ weight : int NA NA NA NA NA NA NA NA NA NA ...
Solution 4
sol5a <- days[c(-1, -7)]
sol5b <- days[seq(2, 7, 2)]
sol5a
[1] "Monday" "Tuesday" "Wednesday" "Thursday" "Friday"
sol5b
[1] "Monday" "Wednesday" "Friday"
Solution 5
hl_model <- lm(hindfoot_length ~ log(weight) * species_id, data = animals)
summary(hl_model)
Call:
lm(formula = hindfoot_length ~ log(weight) * species_id, data = animals)
Residuals:
Min 1Q Median 3Q Max
-24.0566 -0.6536 0.0456 0.7234 29.7702
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.7858 2.1240 6.020 1.77e-09 ***
log(weight) 0.1006 0.9932 0.101 0.919307
species_idDM 11.9327 2.1438 5.566 2.63e-08 ***
species_idDO 14.3676 2.1935 6.550 5.84e-11 ***
species_idDS 17.7785 2.2183 8.015 1.14e-15 ***
species_idNL 7.8180 2.2269 3.511 0.000448 ***
species_idOL 3.8329 2.2205 1.726 0.084326 .
species_idOT 3.9959 2.1768 1.836 0.066421 .
species_idOX 2.8502 19.2255 0.148 0.882145
species_idPB 7.7369 2.1607 3.581 0.000343 ***
species_idPE 2.7944 2.2052 1.267 0.205104
species_idPF 0.1775 2.1608 0.082 0.934532
species_idPH 0.3679 4.6568 0.079 0.937035
species_idPI 6.7215 19.7938 0.340 0.734180
species_idPL 5.5096 3.3721 1.634 0.102294
species_idPM 3.3929 2.1991 1.543 0.122874
species_idPP 4.3321 2.1504 2.015 0.043963 *
species_idPX 34.6473 54.3637 0.637 0.523918
species_idRF 2.7435 3.4223 0.802 0.422754
species_idRM 1.5570 2.1465 0.725 0.468236
species_idRO 11.3148 6.9792 1.621 0.104982
species_idRX -7.8187 9.0769 -0.861 0.389032
species_idSF 3.0692 3.2602 0.941 0.346492
species_idSH 0.8047 2.5495 0.316 0.752296
species_idSO 6.0981 2.9040 2.100 0.035746 *
log(weight):species_idDM 2.9055 0.9962 2.917 0.003541 **
log(weight):species_idDO 2.0768 1.0032 2.070 0.038436 *
log(weight):species_idDS 3.9753 1.0022 3.967 7.31e-05 ***
log(weight):species_idNL 2.2186 1.0020 2.214 0.026832 *
log(weight):species_idOL 1.0443 1.0110 1.033 0.301634
log(weight):species_idOT 0.9991 1.0045 0.995 0.319923
log(weight):species_idOX 1.4617 6.3414 0.231 0.817702
log(weight):species_idPB 1.5264 0.9998 1.527 0.126857
log(weight):species_idPE 1.4128 1.0119 1.396 0.162681
log(weight):species_idPF 1.1727 1.0116 1.159 0.246392
log(weight):species_idPH 3.5936 1.5661 2.295 0.021763 *
log(weight):species_idPI 0.7853 6.7317 0.117 0.907135
log(weight):species_idPL 0.4921 1.3354 0.368 0.712515
log(weight):species_idPM 1.3015 1.0107 1.288 0.197849
log(weight):species_idPP 1.5413 1.0003 1.541 0.123355
log(weight):species_idPX -9.5918 18.4815 -0.519 0.603767
log(weight):species_idRF 0.6685 1.4342 0.466 0.641130
log(weight):species_idRM 0.7976 1.0019 0.796 0.426004
log(weight):species_idRO -3.8730 3.0337 -1.277 0.201735
log(weight):species_idRX 4.9175 3.4007 1.446 0.148179
log(weight):species_idSF 2.6055 1.1681 2.231 0.025712 *
log(weight):species_idSH 3.4480 1.0472 3.292 0.000994 ***
log(weight):species_idSO 1.6095 1.1125 1.447 0.147980
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.375 on 30690 degrees of freedom
(4811 observations deleted due to missingness)
Multiple R-squared: 0.9793, Adjusted R-squared: 0.9792
F-statistic: 3.085e+04 on 47 and 30690 DF, p-value: < 2.2e-16