---
title: "Getting started"
knitr:
  opts_chunk:
    collapse: true
    comment: "#>"
# description: |
#   An overview of the nowcastr package.
vignette: >
  %\VignetteIndexEntry{Getting started}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
library(dplyr) # Ensure pipe operator is available
```

Nowcasting is the process of estimating the current state of a phenomenon when the data are incomplete due to reporting delays. The **nowcastr** package implements the chain-ladder method for nowcasting, supporting both non-cumulative delay-based estimation and model-based completeness fitting (*e.g.*, logistic or Gompertz curves). This vignette provides a quick start guide to using the package with demo data.

## Setup

The package is available on GitHub. Install it with:

```{r}
#| eval: false
pak::pak("whocov/nowcastr")
```

```{r}
library(nowcastr)
```



## Data Structure

Your dataset must contain at least three columns, plus any optional grouping columns:

- **occurrence date**: when the event happened
- **reporting date**: when the event was reported
- **value**: the observed count/value
- \<*groups*\> (optional): zero, one, or several grouping columns, *e.g.* `group_cols = "species"` or `group_cols = c("region", "disease")`

The package includes a demo dataset, `nowcast_demo`, that follows this structure:

```{r}
print(nowcast_demo)
```

The demo data also includes a `group` column to demonstrate grouped processing. Your own data can have several grouping columns.
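To make the expected shape concrete, here is a small hand-built data frame in the same long format. It is only a sketch: the object name `toy_reports` and all values are made up for illustration, and the column names simply mirror those used with `nowcast_demo` in this vignette.

```{r}
# Toy data in long format: one row per occurrence date and reporting date.
# All values are invented for illustration.
toy_reports <- data.frame(
  date_occurrence = as.Date(c(
    "2024-01-01", "2024-01-01", "2024-01-01",
    "2024-01-08", "2024-01-08",
    "2024-01-15"
  )),
  date_report = as.Date(c(
    "2024-01-08", "2024-01-15", "2024-01-22",
    "2024-01-15", "2024-01-22",
    "2024-01-22"
  )),
  value = c(40, 80, 100, 45, 90, 50),
  group = "A"
)

# Older occurrence dates have been reported more often than recent ones.
table(toy_reports$date_occurrence)

# Sanity check: a report cannot precede the event it describes.
stopifnot(all(toy_reports$date_report >= toy_reports$date_occurrence))
```

The same occurrence date appears once per reporting date, which is what lets the package track how each value is revised over time.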




```{r}
#| echo: false
#| eval: false
# generate_test_data(
#   n_reportdates = 5,
#   n_delays = 5
# )
```


## Workflow

A typical nowcasting workflow with **nowcastr** involves the following steps.



### 1. Visualize Input Data

Before nowcasting, inspect the reporting pattern of your data:

```{r, fig.width=9, fig.asp=5.5/10, out.width="100%"}
nowcast_demo %>%
  plot_nc_input(
    option = "triangle",
    col_date_occurrence = date_occurrence,
    col_date_reporting = date_report,
    col_value = value,
    group_cols = "group" # use a vector for multiple columns: c("column1", "column2", ...)
  )
```

The "millipede" plot provides an alternative view of delays. 
Each reporting date is mapped to a distinct line and color. 

```{r, fig.width=9, fig.asp=5.5/10, out.width="100%"}
nowcast_demo %>%
  plot_nc_input(
    option = "millipede",
    col_date_occurrence = date_occurrence,
    col_date_reporting = date_report,
    col_value = value,
    group_cols = "group"
  )
```



### 2. Prepare Data (Optional)

Depending on your data and use case, you may want to fill missing values with the last known reported value, then plot the filled data to check the result:


```{r, fig.width=9, fig.asp=5.5/10, out.width="100%"}
data_filled <- nowcast_demo %>%
  fill_future_reported_values(
    col_date_occurrence = date_occurrence,
    col_date_reporting = date_report,
    col_value = value,
    group_cols = "group",
    max_delay = "auto"
  )
data_filled %>%
  plot_nc_input(
    option = "triangle",
    col_date_occurrence = date_occurrence,
    col_date_reporting = date_report,
    col_value = value,
    group_cols = "group"
  )
```

This step is optional; `nowcast_cl()` can handle unfilled data.
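As a rough illustration of what "filling with the last known reported value" means, the base R sketch below carries values forward along the reporting dates of each occurrence date. It is not the implementation of `fill_future_reported_values()`: the toy data, the helper `locf()`, and the object names are all assumptions made for this example, and the `max_delay` argument is not modelled.

```{r}
# Toy data (illustrative): the 2024-01-01 occurrence date is absent from the
# 2024-01-22 report.
toy_gappy <- data.frame(
  date_occurrence = as.Date(c("2024-01-01", "2024-01-01", "2024-01-08", "2024-01-08")),
  date_report = as.Date(c("2024-01-08", "2024-01-15", "2024-01-15", "2024-01-22")),
  value = c(40, 80, 45, 90)
)

# Every combination of occurrence date and reporting date.
toy_grid <- expand.grid(
  date_occurrence = unique(toy_gappy$date_occurrence),
  date_report = unique(toy_gappy$date_report)
)
toy_filled <- merge(toy_grid, toy_gappy, all.x = TRUE)
toy_filled <- toy_filled[order(toy_filled$date_occurrence, toy_filled$date_report), ]

# Hypothetical helper: carry the last non-missing value forward.
# Leading NAs (nothing reported yet) stay NA.
locf <- function(x) {
  idx <- cumsum(!is.na(x))
  c(NA, x[!is.na(x)])[idx + 1]
}
toy_filled$value <- ave(
  toy_filled$value,
  format(toy_filled$date_occurrence),
  FUN = locf
)

# Drop combinations that precede the first report of an occurrence date.
toy_filled <- toy_filled[!is.na(toy_filled$value), ]
toy_filled
```

In this toy example the 2024-01-01 occurrence date gains a row for the 2024-01-22 report, with the value carried over from 2024-01-15.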



### 3. Run Nowcast

Perform the nowcasting using the chain-ladder method:

```{r}
nc_obj <-
  data_filled %>%
  nowcast_cl(
    col_date_occurrence = date_occurrence,
    col_date_reporting = date_report,
    col_value = value,
    group_cols = "group",
    time_units = "weeks",
    # max_delay = 5,
    # max_reportunits = 8,
    do_model_fitting = TRUE
  )
```

The `nowcast_cl()` function returns a `nowcast_results` object containing predictions, delay distributions, completeness estimates, and parameters. List its properties with `S7::prop_names()`:


```{r}
S7::prop_names(nc_obj)
```
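The call above sets `do_model_fitting = TRUE`, which relates to the model-based completeness fitting mentioned in the introduction. How **nowcastr** fits its curves is not detailed in this vignette, so the sketch below only conveys the general idea: smoothing completeness as a function of delay with a logistic curve. It uses `stats::glm()` with a quasibinomial family as a stand-in, and the completeness values are invented.

```{r}
# Invented average completeness by delay (illustrative values only).
toy_curve <- data.frame(
  delay = 0:7,
  completeness = c(0.12, 0.27, 0.52, 0.71, 0.88, 0.94, 0.98, 0.99)
)

# Logistic curve with an upper asymptote of 1, fitted on the logit scale.
# Assumption: this stands in for the package's own fitting routine.
toy_fit <- glm(
  completeness ~ delay,
  family = quasibinomial(link = "logit"),
  data = toy_curve
)

# Smoothed completeness for each delay.
toy_curve$fitted <- predict(toy_fit, type = "response")
round(toy_curve, 3)
```

A fitted curve of this kind gives a smooth, monotone completeness estimate even when the raw averages are noisy for some delays.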

### 4. Explore Results

Access each dataset as a property of the object with `@`:

```{r}
nc_obj@results # Final nowcasted values
nc_obj@delays # Summarised completeness values by delay
nc_obj@completeness # Detailed completeness estimates
```

Plot the results:

```{r, fig.width=9, fig.asp=5.5/10, out.width="100%"}
#| warning: false
# Delay distribution
plot(nc_obj, which = "delays") +
  ggplot2::labs(
    caption = NULL,
    subtitle = paste0("From data reported on: ", max(data_filled$date_report))
  )

# Nowcast time series
plot(nc_obj, which = "results") +
  ggplot2::labs(
    caption = NULL,
    subtitle = paste0("From data reported on: ", max(data_filled$date_report))
  )
```


Open a Shiny app to explore results group by group:

```{r}
#| eval: false
nowcast_explore(nc_obj)
```





## How It Works

The chain-ladder method estimates "completeness" for each delay bucket:

- **Delay** = reporting date - occurrence date
- **Completeness** = observed value / last reported value (an approximation of the true value)
- **Average completeness** = mean completeness per delay bucket (across occurrence dates)
- **Nowcast** = observed value / average completeness

Recent occurrence dates have only been observed at short delays, so their completeness is low. Dividing by the average completeness scales these observations up to estimate the true count.
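The four steps above can be traced by hand on a small example. The base R sketch below is a simplified illustration, not the code inside `nowcast_cl()`: the data are invented, delays are measured in weeks, and two assumptions are made explicit in the comments (reports are treated as complete after 3 weeks, and only occurrence dates observed that long contribute to the averages).

```{r}
# Invented weekly reports for four occurrence dates.
toy_triangle <- data.frame(
  date_occurrence = as.Date(c(
    rep("2024-01-01", 3), rep("2024-01-08", 3), rep("2024-01-15", 2), "2024-01-22"
  )),
  date_report = as.Date(c(
    "2024-01-08", "2024-01-15", "2024-01-22",
    "2024-01-15", "2024-01-22", "2024-01-29",
    "2024-01-22", "2024-01-29",
    "2024-01-29"
  )),
  value = c(40, 80, 100, 30, 45, 50, 51, 85, 35)
)
toy_triangle <- toy_triangle[order(toy_triangle$date_occurrence, toy_triangle$date_report), ]
occ_key <- format(toy_triangle$date_occurrence)

# Delay = reporting date - occurrence date, here in weeks.
toy_triangle$delay <- as.numeric(
  toy_triangle$date_report - toy_triangle$date_occurrence,
  units = "days"
) / 7

# Completeness = observed value / last reported value.
toy_triangle$last_value <- ave(toy_triangle$value, occ_key, FUN = function(v) v[length(v)])
toy_triangle$completeness <- toy_triangle$value / toy_triangle$last_value

# Assumption: reports are complete after 3 weeks, and only occurrence dates
# observed that long are used to average completeness.
toy_max_delay <- 3
toy_triangle$max_seen <- ave(toy_triangle$delay, occ_key, FUN = max)
toy_mature <- toy_triangle[toy_triangle$max_seen >= toy_max_delay, ]

# Average completeness per delay bucket.
toy_avg <- tapply(toy_mature$completeness, toy_mature$delay, mean)
toy_avg

# Nowcast = latest observed value / average completeness at its delay.
toy_latest <- toy_triangle[
  toy_triangle$delay == toy_triangle$max_seen,
  c("date_occurrence", "delay", "value")
]
toy_latest$nowcast <- toy_latest$value /
  as.numeric(toy_avg[as.character(toy_latest$delay)])
toy_latest
```

The two oldest occurrence dates are left unchanged because their completeness is 1, while the two most recent ones are scaled up by the inverse of the average completeness at their current delay.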










## Other Utility Functions


### Calculate Retro-Scores of Input Data

The retro-score is the ratio of actual value changes to the maximum possible changes, so it ranges from 0 to 1.


```{r calculate_retro_score}
# Calculate retro-scores
retroscores <- nowcast_demo %>%
  calculate_retro_score(
    col_date_occurrence = date_occurrence,
    col_date_reporting = date_report,
    col_value = value,
    group_cols = "group"
  )
print(retroscores)
```


```{r, fig.width=9, fig.asp=5.5/10, out.width="100%"}
retroscores %>%
  ggplot2::ggplot(ggplot2::aes(y = stats::reorder(group, retro_score), x = retro_score)) +
  ggplot2::geom_col(fill = "dodgerblue1") +
  theme_nowcastr() +
  ggplot2::scale_x_continuous(
    limits = c(0, 1) # optionally add: labels = scales::label_percent()
  ) +
  ggplot2::labs(
    y = "Group",
    x = "Retro-Score"
  )
```
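The exact formula behind `calculate_retro_score()` is not spelled out in this vignette. As one possible reading of "actual value changes over maximum possible changes", the sketch below counts how often a report differs from the previous report of the same occurrence date, and divides by the number of such consecutive report pairs. The data and object names are invented, and the package's own definition may differ.

```{r}
# Invented reports: the first occurrence date is revised twice,
# the second one never changes.
toy_revisions <- data.frame(
  date_occurrence = as.Date(c(rep("2024-01-01", 4), rep("2024-01-08", 3))),
  date_report = as.Date(c(
    "2024-01-08", "2024-01-15", "2024-01-22", "2024-01-29",
    "2024-01-15", "2024-01-22", "2024-01-29"
  )),
  value = c(40, 80, 80, 100, 45, 45, 45)
)
toy_revisions <- toy_revisions[
  order(toy_revisions$date_occurrence, toy_revisions$date_report),
]

# 1 where a report differs from the previous report of the same occurrence
# date, 0 where it repeats it, NA for the first report (no predecessor).
toy_changed <- ave(
  toy_revisions$value,
  format(toy_revisions$date_occurrence),
  FUN = function(v) c(NA, diff(v) != 0)
)

# Assumed definition: observed changes / consecutive report pairs.
toy_score <- sum(toy_changed, na.rm = TRUE) / sum(!is.na(toy_changed))
toy_score
```

Under this reading, a score near 0 means reported values are rarely revised, and a score near 1 means almost every new report changes the value.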




### Remove Repeated Values

`rm_repeated_values()` does roughly the opposite of `fill_future_reported_values()`: it drops reports that repeat the value of an earlier reporting date.

```{r rm_repeated_values}
# Remove repeated reported values (same value at a later reporting date)
nowcast_demo %>%
  rm_repeated_values(
    col_date_occurrence = date_occurrence,
    col_date_reporting = date_report,
    col_value = value,
    group_cols = "group"
  )
```
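The idea can be illustrated in base R on a toy data frame. This is a sketch under the assumption that a "repeated" value is one equal to the immediately preceding report of the same occurrence date. It is not the package's implementation, and the object names are invented.

```{r}
# Invented reports with repeated values.
toy_repeats <- data.frame(
  date_occurrence = as.Date(c(rep("2024-01-01", 4), rep("2024-01-08", 3))),
  date_report = as.Date(c(
    "2024-01-08", "2024-01-15", "2024-01-22", "2024-01-29",
    "2024-01-15", "2024-01-22", "2024-01-29"
  )),
  value = c(40, 80, 80, 100, 45, 45, 45)
)
toy_repeats <- toy_repeats[
  order(toy_repeats$date_occurrence, toy_repeats$date_report),
]

# Flag reports whose value equals the previous report of the same
# occurrence date; the first report is always kept.
toy_is_repeat <- ave(
  toy_repeats$value,
  format(toy_repeats$date_occurrence),
  FUN = function(v) c(FALSE, diff(v) == 0)
) == 1

toy_dedup <- toy_repeats[!toy_is_repeat, ]
toy_dedup
```

Only the rows where the value first appears remain, which keeps the revision history while shrinking the data.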

