# Randomness and variability

The parameters **`seed`** and **`noise`** control the stochastic behavior of the scheduler.  
They define how reproducible or variable the generated dataset will be across runs, balancing determinism with realism.

---

## `seed`

Determines the **reproducibility** of all random processes in the scheduler, including NumPy, Python’s built-in `random`, and the Faker library used for synthetic names.

### Format
**Type:** `int` or `None`  
**Default:** `42`  
**Accepted values:** any integer (positive or negative), or `None`

### Validation rules
- Must be an integer or `None`.  
- When set to an integer, all random sources are initialized with the same seed.  
- When `None`, outputs vary from run to run.  
- Negative or very large integers are accepted but discouraged; small non-negative values are easier to read and share.
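
The rules above can be sketched as a small check. This is an illustration only: `validate_seed` is a hypothetical helper, not part of the medscheduler API, and both the choice of `TypeError` and the rejection of `bool` values are assumptions.

```python
def validate_seed(seed):
    """Hypothetical check: accept an int or None, reject everything else."""
    if seed is None:
        return None
    # bool is a subclass of int in Python; rejecting it here is an assumption.
    if isinstance(seed, bool) or not isinstance(seed, int):
        raise TypeError(f"seed must be an int or None, got {type(seed).__name__}")
    return seed


print(validate_seed(42))
print(validate_seed(None))
```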

### How it works
At initialization, the scheduler configures a consistent random state:
- NumPy’s `default_rng(seed)`  
- Python’s `random.seed(seed)`  
- Faker’s internal random generator (`self.fake.seed_instance(seed)`)

This ensures that all randomness (patient demographics, slot assignment, rebooking, and durations) is repeatable.  
When `seed=None`, every execution generates new random outcomes, suitable for stochastic experiments.
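
The effect of a fixed seed can be illustrated without the scheduler itself. The sketch below uses only the standard-library `random` module as a stand-in for the three generators listed above; `make_rng` and `draw` are hypothetical names introduced for this example.

```python
import random


def make_rng(seed=42):
    """Hypothetical helper: a generator seeded with `seed` (None uses fresh entropy)."""
    return random.Random(seed)


def draw(seed, n=5):
    """Draw n integers in [1, 100] from a freshly seeded generator."""
    rng = make_rng(seed)
    return [rng.randint(1, 100) for _ in range(n)]


# Same seed: both calls replay the same sequence.
print(draw(42) == draw(42))  # True

# seed=None: each call starts from a different state, so the draws usually differ.
print(draw(None), draw(None))
```

Seeding every random source from the same value is what makes a whole run repeatable; leaving any one source unseeded would be enough to make outputs drift between runs.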

### Examples

**Reproducible simulation**
```python
from medscheduler import AppointmentScheduler

sched = AppointmentScheduler(seed=42)
sched.generate()
```

**Non-deterministic simulation**
```python
sched = AppointmentScheduler(seed=None)
sched.generate()
```

### Recommended usage
For reproducible results and testing, fix the seed (e.g., `seed=42`).  
To explore scenario variability, pass `seed=None` or vary the seed between runs; omitting the argument falls back to the default `42` and stays deterministic.

---

## `noise`

Adds **controlled randomness** to probabilistic processes throughout the simulation.  
It allows small deviations from deterministic baselines, producing more realistic yet statistically stable outcomes.

### Format
**Type:** `float`  
**Default:** `0.1`  
**Accepted range:** `≥ 0.0`

### Validation rules
- Must be a non-negative float.  
- Values above `0.5` introduce substantial variability and are not recommended when stable, comparable outputs are needed.  
- If set to `0.0`, the noise perturbation is disabled; the remaining randomness is governed by `seed` alone.
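
A minimal sketch of these checks follows. `validate_noise` is a hypothetical helper, not the library's own code; accepting plain integers such as `0`, rejecting `NaN`, and the specific exception types are all assumptions.

```python
import math


def validate_noise(noise):
    """Hypothetical check: return noise as a float if it is a non-negative number."""
    # Accepting ints (e.g. 0) alongside floats is an assumption of this sketch.
    if isinstance(noise, bool) or not isinstance(noise, (int, float)):
        raise TypeError(f"noise must be a float, got {type(noise).__name__}")
    noise = float(noise)
    if math.isnan(noise) or noise < 0.0:
        raise ValueError("noise must be >= 0.0")
    return noise


print(validate_noise(0.1))
```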

### How it works
The noise parameter acts as a multiplicative perturbation:
\[
x' = x \times U(1 - \text{noise}, 1 + \text{noise})
\]
where \(U(a,b)\) is a uniform random factor applied to intermediate probabilities and scaling operations.

`noise` is applied in several contexts:
- **Patient generation:** adds slight randomness to demographic sampling and visit frequency.  
- **Appointment filling:** modulates local fill rates and lead-time distribution.  
- **Custom distributions:** smooths otherwise uniform probabilities to avoid artifacts.
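
The multiplicative perturbation can be sketched in a few lines. This is an illustration under stated assumptions, not the scheduler's implementation: `perturb` and `perturb_probabilities` are hypothetical names, the standard-library `random.Random` stands in for the scheduler's generators, and the renormalization step is assumed rather than documented.

```python
import random


def perturb(x, noise, rng):
    """Scale x by a factor drawn uniformly from [1 - noise, 1 + noise]."""
    return x * rng.uniform(1.0 - noise, 1.0 + noise)


def perturb_probabilities(probs, noise, rng):
    """Perturb each probability, then renormalize so they sum to 1 (assumed step)."""
    noisy = [perturb(p, noise, rng) for p in probs]
    total = sum(noisy)
    return [p / total for p in noisy]


rng = random.Random(42)
base = [0.5, 0.3, 0.2]

# noise=0.0: the factor is exactly 1, so values pass through unchanged.
print(perturb(0.5, 0.0, rng))  # 0.5

# noise=0.1: each probability moves by at most 10% before renormalization.
print(perturb_probabilities(base, 0.1, rng))
```

Because the factor is drawn from `[1 - noise, 1 + noise]`, a `noise` above `1.0` would make the lower bound negative and could flip the sign of the perturbed value, which is one more reason to keep `noise` small.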

### Examples

**Deterministic output**
```python
sched = AppointmentScheduler(seed=42, noise=0.0)
```

**Slightly variable simulation**
```python
sched = AppointmentScheduler(seed=42, noise=0.1)
```

**Highly stochastic output**
```python
sched = AppointmentScheduler(seed=None, noise=0.3)
```

### Interpretation
| Noise value | Behavior | Use case |
|--------------|-----------|----------|
| 0.0 | No noise perturbation; fully deterministic with a fixed `seed` | Unit testing, reproducible examples |
| 0.1 | Realistic variability | Default educational setting |
| 0.3 | Strong heterogeneity | Scenario simulation, Monte Carlo |

---

### Next steps
- Explore {doc}`appointments_table` to observe how random variation affects outcomes and scheduling.  
- See {doc}`custom_columns` for how the same noise parameter applies to user-defined categorical data.  
- You can also revisit {doc}`patient_demographics` to understand how noise shapes sampling distributions.