The Nuolja Transect project organizes, processes, and validates ecological transect survey data, focusing on snow and phenology observations.
This section introduces the overall structure of the codebase.
project_directory/
| ├── bash
| │ ├── build.doc.R
| ├── build.docs.R
| ├── data
| │ ├── DATAHERE.md
| ├── descriptions
| │ ├── Nuolja Master Documents
| ├── docs
| │ ├── reference
| ├── man
| ├── out
| ├── R
| │ ├── helper.R
| │ ├── patterns.R
| │ ├── phenology.R
| │ ├── phenology_survey.R
| │ ├── repack.R
| │ ├── snow.R
| │ └── validation.R
| ├── README.md
| ├── repack
| ├── script.R
descriptions/transect_description.csv
file. This file contains information about the transect points, including plot numbers, coordinates (latitude and longitude), and elevations.

#### Notes about Data

- There are inconsistencies in date naming within the dataset.
- One entry from 2022 is missing a valid date.
- Geotagging is inconsistent throughout the dataset.

This section explains the purpose of each key function in the project, showing how they interact to transform raw survey data into structured, validated outputs for further analysis.
Here you’ll find the necessary system and package requirements to ensure the project runs smoothly. Installing these dependencies ensures that scripts and documentation build correctly.
To run the project, ensure the following are installed and configured on your system:

1. R and Rtools
   - Install the latest version of R from CRAN.
   - If on Windows, install Rtools for building and compiling packages.
dplyr
ggplot2
tidyr
lubridate
readr
devtools
roxygen2
pkgdown
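
The packages listed above can be checked before running the pipeline. The following is a minimal sketch and not part of the project itself: `missing_packages()` is a hypothetical helper name, and it relies only on base R's `requireNamespace()`. The install call is left commented out so nothing is installed without your consent.

```r
# Packages named in the requirements list above.
required_pkgs <- c("dplyr", "ggplot2", "tidyr", "lubridate",
                   "readr", "devtools", "roxygen2", "pkgdown")

# Hypothetical helper: return the entries of `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required_pkgs)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Uncomment to install them from CRAN:
  # install.packages(to_install)
} else {
  message("All required packages are installed.")
}
```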
This section provides step-by-step instructions for setting up the working environment, running the main script, and navigating the project’s interactive menu to process and validate datasets.
1. Place your data directories into the `data` directory.
2. Run `Rscript script.R` to process the data. This will generate files in the `/repack` directory and output files in the `/out` directory.
3. Follow the prompted options to validate or generate files as required.
Follow these steps to run your R script interactively in RStudio’s terminal, ensuring the working directory is set correctly:
data/
# Example
project_directory/
...
| ├── data
| │ ├── DATAHERE.md
| │ ├── Plant Phenology Data/
| │ ├── Nuolja Snow Data/
...
Before opening the terminal, set the working directory in RStudio using one of these methods:
- From the menu, choose `Session > Set Working Directory > Choose Directory...`, select the project folder, and click `Open`.
- In the RStudio console, set the working directory manually by typing:
setwd("path/to/your/script")
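
Before launching the script, it can help to confirm that the working directory really is the project root. The sketch below assumes the layout shown in the directory tree (a `script.R` file next to a `data` directory); `is_project_root()` is a hypothetical helper, not a function provided by the project.

```r
# Hypothetical helper: TRUE when `path` contains script.R and a data/ directory.
is_project_root <- function(path = getwd()) {
  file.exists(file.path(path, "script.R")) &&
    dir.exists(file.path(path, "data"))
}

if (is_project_root()) {
  message("Working directory looks like the project root: ", getwd())
} else {
  message("script.R or data/ not found in ", getwd(),
          "; set the working directory to the project root first.")
}
```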
Open a terminal from the menu with `Tools > Terminal > New Terminal`, or use the keyboard shortcut `Shift + Alt + T` (Windows/Linux) or `Shift + Option + T` (macOS).
Rscript script.R
out\
and log files in `log\`.

#### 2) Construct Survey

Constructs the survey tables for next year and produces files in `out\Plant Phenology Survey`.

### Step 4: Select data to process

#### 0) Exit

Quits the process.

#### n) Data to be processed

List of the data sets placed in `data\`.
Here you’ll learn where to access full documentation for functions and workflows, as well as how to rebuild and publish updated documentation manually to GitHub Pages.
The full documentation for this project is available as a GitHub Pages site. You can access it here.
This documentation includes detailed information about the project’s structure, data processing steps, and usage examples. It is generated automatically from the source code comments using `roxygen2` and `pkgdown`.
To rebuild the documentation manually, run the following in an R environment:

devtools::document()
pkgdown::build_site()

This section defines the structure and meaning of each dataset produced by the pipeline, with column descriptions, file usage notes, and example data to illustrate expected formats.
repack/
The files in the `repack/` directory are structured as CSV files with detailed information about geographical plots and their associated data. Each file adheres to the following schema:

| Column Name | Description |
|---|---|
| `plot` | The plot number associated with the data entry. |
| `subplot` | The subplot number within the plot. |
| `proj_factor` | A calculated projection factor, used for scaling or alignment in analyses. |
| `id` | A unique identifier for each record, formatted as `NS-YYYYMMDD-XXX`, where `XXX` is the sequential entry. |
| `date` | The date of the record, formatted as `YYYY-MM-DD`. |
| `latitude` | The latitude of the recorded point in decimal degrees. |
| `longitude` | The longitude of the recorded point in decimal degrees. |
| `elevation` | The elevation at the specific point, measured in meters. |
| `contemporary` | A label indicating the contemporary observation status. Possible values: `o` (Open), `s` (Snow), `os` (Both Open and Snow). |
| `historical` | A label indicating the historical observation status. Possible values: `o` (Open), `s` (Snow). |

Below is an excerpt to illustrate the format:
"plot","subplot","proj_factor","id","date","latitude","longitude","elevation","contemporary","historical"
20,78,3357.62764497642,"NS-20180506-001","2018-05-06","68.37261122","18.69783956",1180.841,"o","o"
19,76,3260.95020778743,"NS-20180506-004","2018-05-06","68.37218199","18.69989872",1169.419,"os","s"
18,69,2957.15889307984,"NS-20180506-011","2018-05-06","68.37041561","18.70585272",1103.361,"s","s"
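
The schema above can be checked mechanically. The following is a minimal sketch in base R, not the project's own validation code: `check_repack()` and `expected_cols` are hypothetical names, the allowed status values are the ones listed in the table, and the excerpt rows are read from an inline string rather than from a file in `repack/`.

```r
# The excerpt above, read from an inline string instead of a file.
repack_csv <- '"plot","subplot","proj_factor","id","date","latitude","longitude","elevation","contemporary","historical"
20,78,3357.62764497642,"NS-20180506-001","2018-05-06","68.37261122","18.69783956",1180.841,"o","o"
19,76,3260.95020778743,"NS-20180506-004","2018-05-06","68.37218199","18.69989872",1169.419,"os","s"
18,69,2957.15889307984,"NS-20180506-011","2018-05-06","68.37041561","18.70585272",1103.361,"s","s"'
repack <- read.csv(text = repack_csv, stringsAsFactors = FALSE)

expected_cols <- c("plot", "subplot", "proj_factor", "id", "date",
                   "latitude", "longitude", "elevation",
                   "contemporary", "historical")

# Hypothetical helper: return a character vector of problems (empty if none).
check_repack <- function(df) {
  missing_cols <- setdiff(expected_cols, names(df))
  if (length(missing_cols) > 0) {
    return(paste("missing column:", missing_cols))
  }
  problems <- character(0)
  if (!all(grepl("^NS-[0-9]{8}-[0-9]{3}$", df$id))) {
    problems <- c(problems, "id does not match NS-YYYYMMDD-XXX")
  }
  if (anyNA(as.Date(df$date, format = "%Y-%m-%d"))) {
    problems <- c(problems, "date is not a valid YYYY-MM-DD date")
  }
  if (!all(df$contemporary %in% c("o", "s", "os"))) {
    problems <- c(problems, "unexpected value in contemporary")
  }
  if (!all(df$historical %in% c("o", "s"))) {
    problems <- c(problems, "unexpected value in historical")
  }
  problems
}

check_repack(repack)
```

An empty result means the rows passed every check; each failed check adds one message to the returned vector.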

| Column Name | Description |
|---|---|
| `Species` | Scientific name of the observed plant species. |
| `Date` | Date of observation, formatted as `YYYY-MM-DD`. |
| `Subplot` | Identifier of the subplot area within the transect (e.g., `"20 to 21"`). |
| `Code` | Phenological code representing the observed developmental stage (e.g., `"+"`, `"B1"`, `"b2"`). |

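
Subplot labels such as `"20 to 21"` are strings, so sorting or filtering by position needs the two pole numbers extracted first. The sketch below shows one way to do that in base R; `parse_subplot()` is a hypothetical helper, and the assumption that every label follows the `"<number> to <number>"` pattern comes only from the example in the table.

```r
# Hypothetical helper: split labels such as "20 to 21" into numeric poles.
# Labels that do not match the assumed pattern yield NA in both columns.
parse_subplot <- function(label) {
  pattern <- "^ *([0-9]+) +to +([0-9]+) *$"
  ok <- grepl(pattern, label)
  from <- rep(NA_integer_, length(label))
  to <- rep(NA_integer_, length(label))
  from[ok] <- as.integer(sub(pattern, "\\1", label[ok]))
  to[ok] <- as.integer(sub(pattern, "\\2", label[ok]))
  data.frame(label = label, from = from, to = to, stringsAsFactors = FALSE)
}

parse_subplot(c("20 to 21", "20-21"))
```

The second label is deliberately malformed to show that unparseable values surface as `NA` rather than stopping the run.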
out/
The files in the `out/` directory include CSV files with daily snow-status proportions for each plot. Each file adheres to the following schema:

| Column Name | Description |
|---|---|
| `DOY` | Day of the Year (DOY) for the recorded observations. |
| `plot/subplot` | The plot number associated with the data entry. |
| `s` | Proportion categorized as “Snow” for the given plot and day. |
| `so` | Proportion categorized as “Snow and Open” for the given plot and day. |
| `o` | Proportion categorized as “Open” for the given plot and day. |
| `os` | Proportion categorized as “Open and Snow” for the given plot and day. |

`s`, `so`, `o`, and `os` represent proportions (values between 0 and 1) for each category.

Below is an excerpt to illustrate the format:
"DOY","plot","s","so","o","os"
126,6,0.0540559471516565,0.21497303215613,0.0818149541820005,0.649156066510213
126,7,0,0.318104640512793,0.139848133706484,0.542047225780723
126,8,0.123763616329758,0.876236383670242,0,0
126,9,0,0.863436913617926,0,0.136563086382074
126,10,0.768916411918146,0,0.0997298448068066,0.13135374327
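
A quick sanity check on these files is that every proportion lies in [0, 1]. The sketch below also tests whether the four categories of a row sum to 1; that is an assumption that holds for the excerpt rows but is not stated in the schema. The DOY conversion assumes the year 2018, which is likewise not stored in the file.

```r
# The excerpt above, read from an inline string instead of a file in out/.
daily_csv <- '"DOY","plot","s","so","o","os"
126,6,0.0540559471516565,0.21497303215613,0.0818149541820005,0.649156066510213
126,7,0,0.318104640512793,0.139848133706484,0.542047225780723
126,8,0.123763616329758,0.876236383670242,0,0
126,9,0,0.863436913617926,0,0.136563086382074
126,10,0.768916411918146,0,0.0997298448068066,0.13135374327'
daily <- read.csv(text = daily_csv)

props <- daily[, c("s", "so", "o", "os")]

# Every proportion should lie between 0 and 1.
in_range <- all(props >= 0 & props <= 1)

# Assumption: the four categories partition the plot, so each row sums to 1
# (within a small tolerance for rounding).
sums_to_one <- abs(rowSums(props) - 1) < 1e-6

# Assumption: the observations are from 2018; DOY 1 is 1 January.
obs_date <- as.Date(daily$DOY - 1, origin = "2018-01-01")

in_range
all(sums_to_one)
```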

| Column Name | Description |
|---|---|
| `Synonym Current` | The current accepted scientific name of the species. |
| `Year` | The year the observations were made. |
| `Poles` | The transect segment or subplot identifier (e.g., `"14 to 15"`). |
| `Number of Observations` | The total number of days on which the species was observed in that segment. |


| Column Name | Description |
|---|---|
| `Synonym Current` | The current accepted scientific name of the species. |
| `Year` | The year the observations were made. |
| `Poles` | The transect segment or subplot identifier. |
| `Code` | Phenological development stage code (e.g., `B1`, `C`, `K`, `g1`). |
| `Number of Observations` | Count of how many times this code/stage was recorded for the species. |


| Column Name | Description |
|---|---|
| `Synonym Current` | The current accepted scientific name of the species. |
| `Year` | The year the observations were made. |
| `Code` | Phenological development stage code (e.g., `+`, `K`, `b2`, `ed`). |
| `Poles` | The transect segment or subplot identifier. |
| `First Observation Date` | The earliest date that this stage was observed, formatted as `YYYY-MM-DD`. |
| `Last Observation Date` | The latest date that this stage was observed, formatted as `YYYY-MM-DD`. |
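
To make the first and last observation dates concrete, the sketch below derives them from rows in the repack phenology layout (`Species`, `Date`, `Subplot`, `Code`) using base R. It is an illustration under stated assumptions, not the project's implementation: the species names and dates are invented placeholders, `summarise_group()` is a hypothetical helper, and the mapping from `Species` to `Synonym Current` is skipped.

```r
# Placeholder observations in the repack phenology layout. Species names and
# dates are invented for illustration and are not taken from the Nuolja data.
obs <- data.frame(
  Species = c("Species A", "Species A", "Species A", "Species B"),
  Date    = as.Date(c("2018-06-01", "2018-06-15", "2018-06-08", "2018-06-20")),
  Subplot = c("20 to 21", "20 to 21", "20 to 21", "20 to 21"),
  Code    = c("B1", "B1", "b2", "B1"),
  stringsAsFactors = FALSE
)
obs$Year <- as.integer(format(obs$Date, "%Y"))

# One group per species, year, stage code, and transect segment.
groups <- split(obs, list(obs$Species, obs$Year, obs$Code, obs$Subplot),
                drop = TRUE)

# Hypothetical helper: collapse one group into a single summary row.
summarise_group <- function(g) {
  data.frame(
    Species = g$Species[1],
    Year    = g$Year[1],
    Code    = g$Code[1],
    Poles   = g$Subplot[1],
    First   = min(g$Date),
    Last    = max(g$Date),
    stringsAsFactors = FALSE
  )
}

stage_dates <- do.call(rbind, lapply(groups, summarise_group))
rownames(stage_dates) <- NULL
stage_dates
```

Here `First` and `Last` stand in for `First Observation Date` and `Last Observation Date`; the shorter names only avoid quoting column names that contain spaces.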