Section 8 Prepare the vis-NIR augmented dataset
Here we’ll create the data that will be used for spatial modelling. This dataset will contain:
- Nr: The arbitrary sample number.
- ID: A factor indicating the sample IDs.
- POINT_X: The X (geographical) coordinate.
- POINT_Y: The Y (geographical) coordinate.
- layer: A factor indicating the depth layer at which the sample was collected (A: 0-20 cm and B: 80-100 cm).
- set: A factor indicating whether the sample was used for vis-NIR calibrations (train), for vis-NIR predictions (prediction) or whether it belongs to the model’s validation set (validation). The samples labeled as validation are the same samples initially labeled as validation in the original dataset.
- Ca: The exchangeable calcium content in the sample (\(mmol_{c}\) \(kg^{-1}\), measured by conventional laboratory methods).
- Clay: The percentage of clay content in the soil sample (measured by conventional laboratory methods).
- Silt: The percentage of silt content in the soil sample (measured by conventional laboratory methods).
- Sand: The percentage of sand content in the soil sample (measured by conventional laboratory methods).
- alr_Clay: The additive log-ratio transformed clay contents (measured by conventional laboratory methods).
- alr_Silt: The additive log-ratio transformed silt contents (measured by conventional laboratory methods).
- Ca_spec: The vis-NIR augmented exchangeable \(Ca^{2+}\) contents.
- alr_Clay_spec: The vis-NIR augmented additive log-ratio transformed clay contents.
- alr_Silt_spec: The vis-NIR augmented additive log-ratio transformed silt contents.
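The alr_ variables in the list above are additive log-ratio transforms of the particle-size fractions. A minimal sketch of such a transform and its inverse follows. It assumes that Sand is the common denominator (this section does not state which fraction is used), and the helper names alr_psf and alr_inv_psf as well as the toy values are hypothetical.

```r
## Hypothetical helper: alr transform with Sand as denominator (an assumption)
alr_psf <- function(clay, silt, sand) {
  data.frame(alr_Clay = log(clay / sand), alr_Silt = log(silt / sand))
}

## Hypothetical helper: back-transform alr values to percentages summing to 100
alr_inv_psf <- function(alr_clay, alr_silt) {
  denom <- 1 + exp(alr_clay) + exp(alr_silt)
  data.frame(Clay = 100 * exp(alr_clay) / denom,
             Silt = 100 * exp(alr_silt) / denom,
             Sand = 100 / denom)
}

## Toy fractions (made-up values, in percent)
toy <- data.frame(Clay = c(20, 55), Silt = c(30, 25), Sand = c(50, 20))
toy_alr <- alr_psf(toy$Clay, toy$Silt, toy$Sand)
toy_back <- alr_inv_psf(toy_alr$alr_Clay, toy_alr$alr_Silt)
```

Working in alr space keeps back-transformed fractions positive and summing to 100 %, which is why the transformed variables, rather than the raw percentages, are carried into the augmented columns.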
For the vis-NIR augmented variables (alr_Clay_spec, alr_Silt_spec and Ca_spec) there are three classes of values:
- The values of the samples labeled as train come from the conventional laboratory methods (e.g. for Ca_spec, the values of these samples are identical to the corresponding values in the variable Ca).
- The values of the samples labeled as prediction come from the predictions made with the respective vis-NIR model.
- The values of the samples labeled as validation are treated as missing (i.e. NAs).
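The three rules above can be illustrated on a toy table. This is only a sketch: the sample IDs, the Ca values and the vector toy_pred (standing in for vis-NIR model predictions) are made up, and the values are filled in place by logical indexing rather than by building three data frames and binding them as the tutorial code does.

```r
## Toy data (made-up values): one sample per set
toy <- data.frame(
  ID  = c("A001", "A002", "B003"),
  set = factor(c("train", "prediction", "validation")),
  Ca  = c(12.5, 30.1, 8.4)
)
## Hypothetical vis-NIR predictions for the same samples
toy_pred <- c(13.0, 28.7, 9.1)

## Start with all augmented values missing, then fill them in by set
toy$Ca_spec <- NA_real_
is_train <- toy$set == "train"
is_pred  <- toy$set == "prediction"
toy$Ca_spec[is_train] <- toy$Ca[is_train]   # laboratory values
toy$Ca_spec[is_pred]  <- toy_pred[is_pred]  # model predictions
## Rows labeled 'validation' are left as NA
toy
```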
## samples for the set 'prediction'
vnirpredictions
## samples for the set 'train'
vnirtrain <- train[, c("ID", "POINT_X", "POINT_Y", "set", "Ca", "Clay", "Silt", 
                    "Sand", "alr_Clay", "alr_Silt")]
vnirtrain$set <- factor("train")
vnirtrain$Ca_spec <- vnirtrain$Ca
vnirtrain$alr_Clay_spec <- vnirtrain$alr_Clay
vnirtrain$alr_Silt_spec <- vnirtrain$alr_Silt
## samples for the set 'validation'
vnirvalidation <- valida[, c("ID", "POINT_X", "POINT_Y", "set", "Ca", "Clay", 
                    "Silt", "Sand", "alr_Clay", "alr_Silt")]
vnirvalidation$set <- factor(vnirvalidation$set)
vnirvalidation$Ca_spec <- NA
vnirvalidation$alr_Clay_spec <- NA
vnirvalidation$alr_Silt_spec <- NA
Now create a single data.frame containing the three data sets…
vniraugmented <- rbind(vnirtrain, vnirpredictions, vnirvalidation)
vniraugmented$layer <- factor(substr(vniraugmented$ID, 1, 1))
## Reorganize the variables
vniraugmented <- vniraugmented[, c("ID", "POINT_X", "POINT_Y", "layer", "set", 
                    "Ca", "Clay", "Silt", "Sand", "alr_Clay", "alr_Silt", "Ca_spec", "alr_Clay_spec", 
                    "alr_Silt_spec")]
Compute some statistics for the final data set…
## Names of the properties
props <- c("Ca", "Clay", "Silt", "Sand", "alr_Clay", "alr_Silt", "Ca_spec", 
                    "alr_Clay_spec", "alr_Silt_spec")
## Compute the statistics: mean, standard deviation and the quantiles ('0%',
## '25%', '50%', '75%' and '100%')
statsprops <- aggregate(vniraugmented[, props], by = list(set = vniraugmented$set, 
                    layer = vniraugmented$layer), FUN = function(x) {
                    c(mean = mean(as.matrix(x), na.rm = TRUE), sd = sd(as.matrix(x), na.rm = TRUE), 
                                        quantile(x, na.rm = TRUE))
})
## Reorganize the object containing the results of the statistics
statsprops <- lapply(props, FUN = function(x, object, ids) {
                    ## Bind the grouping columns to the statistics matrix of property x
                    cbind(object[, ids], as.data.frame(object[[x]]))
}, object = statsprops, ids = c("set", "layer"))
names(statsprops) <- props
statsprops <- do.call("rbind", statsprops)
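## Strip the numeric suffix that rbind adds to the row names (e.g. 'Ca.1')
statsprops$property <- gsub("\\.[0-9]+$", "", rownames(statsprops))
## Replace NaN (e.g. the mean of an all-NA group) with NA
statsprops[is.na(statsprops)] <- NA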
## Reorganize the order of the variables
statsprops <- statsprops[, c("set", "layer", "property", "mean", "sd", "0%", 
                    "25%", "50%", "75%", "100%")]
statsprops
Optionally, save this data in your working directory:
write.table(x = vniraugmented, file = "vniraugmented.txt", sep = "\t", row.names = FALSE)
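To check that a file written this way can be read back with the three classes of values intact, something along the following lines may help. It is a sketch on a made-up stand-in table written to a temporary file, not on the real vniraugmented object; the names toy, toy_file and toy_read are hypothetical.

```r
## Toy stand-in for vniraugmented (made-up values)
toy <- data.frame(
  ID      = c("A001", "A002", "B003"),
  set     = factor(c("train", "prediction", "validation")),
  Ca      = c(12.5, 30.1, 8.4),
  Ca_spec = c(12.5, 28.7, NA)
)
toy_file <- tempfile(fileext = ".txt")
write.table(x = toy, file = toy_file, sep = "\t", row.names = FALSE)

## Read the file back; by default missing values are written and read as "NA"
toy_read <- read.table(toy_file, header = TRUE, sep = "\t",
                       stringsAsFactors = TRUE)

## Checks on the classes of values
train_ok <- with(toy_read[toy_read$set == "train", ], all(Ca_spec == Ca))
valid_ok <- all(is.na(toy_read$Ca_spec[toy_read$set == "validation"]))
```

Setting stringsAsFactors = TRUE restores ID and set as factors, since the text file itself does not store column types.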