Section 8 Prepare the vis-NIR augmented dataset
Here we create the dataset that will be used for spatial modelling. It contains the following variables:
- Nr: The arbitrary sample number.
- ID: A factor indicating the sample IDs.
- POINT_X: The X (geographical) coordinate.
- POINT_Y: The Y (geographical) coordinate.
- layer: A factor indicating the depth layer at which the sample was collected (A: 0-20 cm; B: 80-100 cm).
- set: A factor indicating whether the sample was used for the vis-NIR calibrations (train), for vis-NIR predictions (prediction) or for model validation (validation). The samples labeled as validation are the same samples initially labeled as validation in the original dataset.
- Ca: The exchangeable calcium content of the sample (\(mmol_{c}\ kg^{-1}\)), measured by conventional laboratory methods.
- Clay: The clay content of the soil sample (%), measured by conventional laboratory methods.
- Silt: The silt content of the soil sample (%), measured by conventional laboratory methods.
- Sand: The sand content of the soil sample (%), measured by conventional laboratory methods.
- alr_Clay: The additive log-ratio transformed clay content (from the conventional laboratory measurements).
- alr_Silt: The additive log-ratio transformed silt content (from the conventional laboratory measurements).
- Ca_spec: The vis-NIR augmented exchangeable Ca\(^{2+}\) content.
- alr_Clay_spec: The vis-NIR augmented additive log-ratio transformed clay content.
- alr_Silt_spec: The vis-NIR augmented additive log-ratio transformed silt content.
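The additive log-ratio (alr) transformation expresses a D-part composition as D - 1 log-ratios against a reference part, which removes the constant-sum constraint of the texture fractions. This section does not name the reference part; since only alr_Clay and alr_Silt are listed, the sketch below assumes Sand is the denominator, i.e. alr_Clay = log(Clay/Sand) and alr_Silt = log(Silt/Sand). The helper names alr_texture() and inv_alr_texture() are hypothetical.

```r
## Hypothetical helper, ASSUMING Sand is the reference (denominator) part
alr_texture <- function(clay, silt, sand) {
  data.frame(alr_Clay = log(clay / sand),
             alr_Silt = log(silt / sand))
}

## Hypothetical inverse: alr coordinates back to percentages summing to 100
inv_alr_texture <- function(alr_clay, alr_silt) {
  denom <- 1 + exp(alr_clay) + exp(alr_silt)
  data.frame(Clay = 100 * exp(alr_clay) / denom,
             Silt = 100 * exp(alr_silt) / denom,
             Sand = 100 / denom)
}

## Example: a sample with 30% clay, 20% silt and 50% sand
tex <- alr_texture(clay = 30, silt = 20, sand = 50)
back <- inv_alr_texture(tex$alr_Clay, tex$alr_Silt)
print(tex)
print(round(back, 6))
```

The inverse is what would be needed to turn vis-NIR predicted alr values (e.g. alr_Clay_spec and alr_Silt_spec) back into texture percentages.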
For the vis-NIR augmented variables (Ca_spec, alr_Clay_spec and alr_Silt_spec) there are three classes of values:
- Samples labeled as train take the values obtained by conventional laboratory methods (e.g. their Ca_spec values are identical to the corresponding values in Ca).
- Samples labeled as prediction take the values predicted by the respective vis-NIR model.
- Samples labeled as validation are treated as missing (i.e. NAs).
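The rule above can be written as a small vectorized function. This is only a sketch: augment_spec() and the Ca_vnir column (standing in for vis-NIR model predictions) are hypothetical names, and the workflow in this section reaches the same result by building the three subsets separately and stacking them with rbind().

```r
## Hypothetical helper (not part of the original workflow): combine
## laboratory values and vis-NIR predictions according to 'set'
augment_spec <- function(lab, spec_pred, set) {
  set <- as.character(set)
  out <- rep(NA_real_, length(set))    # validation (and any other label) stays NA
  is_train <- set %in% "train"
  is_pred <- set %in% "prediction"
  out[is_train] <- lab[is_train]       # laboratory values for 'train'
  out[is_pred] <- spec_pred[is_pred]   # vis-NIR predictions for 'prediction'
  out
}

## Toy data with invented values, for illustration only
toy <- data.frame(set = factor(c("train", "prediction", "validation")),
                  Ca = c(25.1, 31.4, 18.0),
                  Ca_vnir = c(24.0, 30.2, 19.5))
toy$Ca_spec <- augment_spec(toy$Ca, toy$Ca_vnir, toy$set)
print(toy)
```

Using %in% rather than == keeps the function from failing on samples whose set label is missing; those simply end up as NA.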
## samples for the set 'prediction' (the vnirpredictions object created earlier)
vnirpredictions
## samples for the set 'train'
vnirtrain <- train[, c("ID", "POINT_X", "POINT_Y", "set", "Ca", "Clay", "Silt",
"Sand", "alr_Clay", "alr_Silt")]
vnirtrain$set <- factor("train")
vnirtrain$Ca_spec <- vnirtrain$Ca
vnirtrain$alr_Clay_spec <- vnirtrain$alr_Clay
vnirtrain$alr_Silt_spec <- vnirtrain$alr_Silt
## samples for the set 'validation'
vnirvalidation <- valida[, c("ID", "POINT_X", "POINT_Y", "set", "Ca", "Clay",
"Silt", "Sand", "alr_Clay", "alr_Silt")]
vnirvalidation$set <- factor(vnirvalidation$set)
vnirvalidation$Ca_spec <- NA
vnirvalidation$alr_Clay_spec <- NA
vnirvalidation$alr_Silt_spec <- NA
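When the three subsets are stacked with rbind(), their set factors, each holding a single level, are merged into one factor containing the union of the levels; the layer factor is then derived from the first character of ID. The toy check below mimics both steps with invented IDs and only two columns.

```r
## Toy pieces mimicking the three sets (IDs invented for illustration)
p1 <- data.frame(ID = c("A001", "B002"), set = factor("train"))
p2 <- data.frame(ID = "A003", set = factor("prediction"))
p3 <- data.frame(ID = "B004", set = factor("validation"))

## rbind() on data.frames merges the factor levels of 'set'
combined <- rbind(p1, p2, p3)

## The first character of each ID is taken as the depth layer
combined$layer <- factor(substr(combined$ID, 1, 1))

print(levels(combined$set))
print(table(combined$set, combined$layer))
```

On the real data, table(vniraugmented$set, vniraugmented$layer) gives the same kind of cross-check of how many samples fall in each set and layer.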
Now combine the three sets into a single data.frame:
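## Stack the three sets (their 'set' factor levels are merged)
vniraugmented <- rbind(vnirtrain, vnirpredictions, vnirvalidation)
## The first character of the sample ID encodes the depth layer (A or B)
vniraugmented$layer <- factor(substr(vniraugmented$ID, 1, 1))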
## Reorganize the variables
vniraugmented <- vniraugmented[, c("ID", "POINT_X", "POINT_Y", "layer", "set",
"Ca", "Clay", "Silt", "Sand", "alr_Clay", "alr_Silt", "Ca_spec", "alr_Clay_spec",
"alr_Silt_spec")]
Compute summary statistics of each property for every combination of set and layer:
## Names of the properties
props <- c("Ca", "Clay", "Silt", "Sand", "alr_Clay", "alr_Silt", "Ca_spec",
"alr_Clay_spec", "alr_Silt_spec")
## Compute the statistics: mean, standard deviation and the quantiles ('0%',
## '25%', '50%', '75%' and '100%') for each combination of set and layer
statsprops <- aggregate(vniraugmented[, props], by = list(set = vniraugmented$set,
  layer = vniraugmented$layer), FUN = function(x) {
  c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE),
    quantile(x, na.rm = TRUE))
})
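## Reorganize the object containing the results of the statistics:
## aggregate() stores the statistics of each property as a matrix column,
## so bind each of these matrices to the grouping columns
statsprops <- lapply(props, FUN = function(x, object, ids) {
  cbind(object[, ids], as.data.frame(object[[x]]))
}, object = statsprops, ids = c("set", "layer"))
names(statsprops) <- props
statsprops <- do.call("rbind", statsprops)
## rbind() names the rows "<property>.<row number>"; strip the suffix
statsprops$property <- sub("\\.[0-9]+$", "", rownames(statsprops))
## Groups with only missing values give NaN means; store them as NA
statsprops[is.na(statsprops)] <- NA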
## Reorganize the order of the variables
statsprops <- statsprops[, c("set", "layer", "property", "mean", "sd", "0%",
"25%", "50%", "75%", "100%")]
statsprops
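Two details of the aggregate() call are easy to miss. When FUN returns several values, aggregate() stores them as a matrix inside a single column per property, which is why the results have to be unpacked property by property before they can be stacked. And a group whose values are all missing (e.g. Ca_spec for the validation samples) yields a NaN mean and NA for the remaining statistics, which the statsprops[is.na(statsprops)] <- NA line normalizes to NA. The toy data below are invented.

```r
## Toy data: the 'validation' group has only missing values (invented numbers)
toy <- data.frame(set = factor(c("train", "train", "validation", "validation")),
                  Ca_spec = c(10, 20, NA, NA))

st <- aggregate(toy[, "Ca_spec", drop = FALSE], by = list(set = toy$set),
                FUN = function(x) {
                  c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE),
                    quantile(x, na.rm = TRUE))
                })

## The statistics for Ca_spec live in one matrix column of 'st'
m <- st$Ca_spec
print(is.matrix(m))
print(m)

## Same NaN -> NA normalization as in the section
m[is.na(m)] <- NA
print(m)
```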
Optionally, save the final dataset as a tab-delimited text file in your working directory:
write.table(x = vniraugmented, file = "vniraugmented.txt", sep = "\t", row.names = FALSE)
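To reuse the file in a later session, it can be read back with read.delim(), which defaults to tab separators and a header row. The sketch below does a round trip on a small invented data frame written to a temporary file rather than to vniraugmented.txt. Character columns only come back as factors when stringsAsFactors = TRUE, and their levels are then rebuilt in alphabetical order.

```r
## Toy data frame standing in for vniraugmented (values invented)
toy <- data.frame(ID = c("A001", "B002"),
                  set = factor(c("train", "validation")),
                  Ca_spec = c(25.1, NA))

## Write it exactly as in the section, but to a temporary file
f <- tempfile(fileext = ".txt")
write.table(x = toy, file = f, sep = "\t", row.names = FALSE)

## Missing values were written as the string "NA" and are read back as NA
back <- read.delim(f, stringsAsFactors = TRUE)
str(back)
```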