# Patients table

The **`patients`** table is the demographic backbone of the synthetic outpatient population.  
Each row represents one unique, fully synthetic patient with a generated identifier, name, sex, and date of birth.  
These records are the foundation for all simulated appointments; the `appointments` table references them via `patient_id`.

---

## Overview

- **File name:** `patients.csv`  
- **Generated by:** `AppointmentScheduler.generate_patients()`  
- **Schema:** One row per unique patient  
- **Linked to:** `appointments` table through `patient_id`  

This table represents the **base population**.  
The number of patients generated depends on visit frequency and attendance ratios (`visits_per_year`, `first_attendance`).

---

## Core columns

| Column | Type | Description |
|---------|------|-------------|
| `patient_id` | string | Unique, sequential, zero-padded identifier. |
| `name` | string | Realistic synthetic name generated using the [Faker](https://faker.readthedocs.io/) library. |
| `sex` | string | `"Male"` or `"Female"`, assigned using NHS outpatient demographic proportions. |
| `dob` | date | Simulated date of birth, used to derive patient age during appointment generation. |

> **Note:**  
> Age and age group are not stored in this table.  
> They are dynamically calculated in the `appointments` table based on each appointment’s date.
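
Because age is derived rather than stored, it can be recomputed for any appointment date. The following is a minimal sketch of that calculation, assuming age means completed years on the appointment date; `age_on` is a hypothetical helper and not part of the library's API.

```python
from datetime import date

def age_on(dob: date, on_date: date) -> int:
    """Return the number of completed years between dob and on_date."""
    # Subtract one year if the birthday has not yet occurred in on_date's year.
    had_birthday = (on_date.month, on_date.day) >= (dob.month, dob.day)
    return on_date.year - dob.year - (0 if had_birthday else 1)

dob = date(1987, 5, 22)
print(age_on(dob, date(2024, 5, 21)))  # 36 (day before the birthday)
print(age_on(dob, date(2024, 5, 22)))  # 37 (on the birthday)
```

The same patient can therefore fall into different age groups across appointments that span a birthday.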

---

## Demographic generation

Patient demographics are created according to national-level outpatient statistics from *NHS England (2023–24, Summary Report 3)*.  
This ensures realistic, reproducible age–sex distributions while maintaining complete anonymity.

### Sampling process

1. **Age–sex structure:** sampling probabilities are taken from `age_gender_probs` (see {doc}`patient_demographics`).  
2. **Truncation:** the eligible age range is controlled by `lower_cutoff`, `upper_cutoff`, and `truncated`.  
3. **Date of birth (`dob`):** back-calculated from the sampled age as of the simulation's `ref_date`, with a random 0–364 day offset so that birthdays do not cluster on a single calendar date.  
4. **Population size:** derived from the total appointment volume together with `visits_per_year` and `first_attendance`.
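
The sampling steps above can be illustrated with a small, standard-library sketch. Everything here is an assumption made for illustration: the weights in `TOY_PROBS` are invented (they are not the NHS figures, and the real `age_gender_probs` structure may differ), the helper names are hypothetical, and the offset is assumed to move the date of birth backwards so that the sampled age still holds on `ref_date`.

```python
import random
from datetime import date, timedelta

# Toy age-sex weights for illustration only; these are NOT the NHS figures.
TOY_PROBS = {("Female", 30): 0.30, ("Female", 60): 0.25,
             ("Male", 30): 0.20, ("Male", 60): 0.25}

def years_before(d: date, years: int) -> date:
    """Shift a date back by whole years, mapping 29 Feb to 28 Feb if needed."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:  # 29 Feb in a non-leap target year
        return d.replace(year=d.year - years, day=28)

def sample_dob(age: int, ref_date: date, rng: random.Random) -> date:
    """Pick a dob so the patient is `age` completed years old on ref_date."""
    latest_birthday = years_before(ref_date, age)
    # Assumed direction: move back 0-364 days so the age at ref_date is unchanged.
    return latest_birthday - timedelta(days=rng.randint(0, 364))

def sample_patient(ref_date: date, rng: random.Random) -> dict:
    """Draw one (sex, age) pair from the toy weights and derive a dob."""
    keys = list(TOY_PROBS)
    sex, age = rng.choices(keys, weights=list(TOY_PROBS.values()), k=1)[0]
    return {"sex": sex, "dob": sample_dob(age, ref_date, rng)}

rng = random.Random(42)  # a seeded generator keeps the draws reproducible
for _ in range(3):
    print(sample_patient(date(2024, 12, 31), rng))
```

Passing an explicit `random.Random` instance, rather than using the module-level functions, keeps the sketch reproducible without touching global state.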

---

## Identity fields

### Field: `patient_id`

Each patient is assigned a **sequential, zero-padded identifier** that ensures uniqueness within the dataset.

**Format**  
- Type: string  
- Example: `"00001"`, `"01234"`  
- Length: automatically determined by population size (minimum 5 digits).  

**Logic**  
Identifiers are generated after all demographic records are assembled, using an internal counter that increments across simulation runs:

```python
ids = [f"{i:0{id_length}d}" for i in range(start, start + total_patients)]
```

This guarantees reproducibility and uniqueness, even when multiple populations are created sequentially.
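
As a rough illustration of how `id_length` might be chosen, the sketch below pads to the number of digits in the largest identifier, with a floor of five. The exact rule used by the library is not shown in this page, so `make_patient_ids` and its `min_width` parameter are hypothetical.

```python
def make_patient_ids(total_patients: int, start: int = 1, min_width: int = 5) -> list[str]:
    """Return sequential, zero-padded IDs beginning at `start`."""
    last_id = start + total_patients - 1
    # Assumed rule: wide enough for the largest ID, but never below min_width.
    id_length = max(min_width, len(str(last_id)))
    return [f"{i:0{id_length}d}" for i in range(start, start + total_patients)]

print(make_patient_ids(3))                # ['00001', '00002', '00003']
print(make_patient_ids(2, start=123456))  # ['123456', '123457']
```

Using one shared width for the whole batch keeps the identifiers sortable as strings.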

---

### Field: `name`

Names are generated using the **Faker** library so that each patient receives a realistic name consistent with their assigned `sex`.

**Format**  
- Type: string  
- Example: `"Emma Clark"`, `"James Turner"`  
- Locale: `"en_US"` by default (can be customized, e.g. `"es_ES"` for Spanish names).  
- Stored only in the `patients` table (not replicated in appointments).  

**Logic**  
- For `sex = "Female"` → `fake.name_female()`  
- For `sex = "Male"` → `fake.name_male()`  
- The resulting name is appended to the patient record alongside `patient_id`, `sex`, and `dob`.

This approach adds realism for UI mockups or teaching datasets without any link to real individuals.

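```python
# Excerpt: record built for a patient sampled as female
patients.append({
    "name": self.fake.name_female(),  # Faker name matching the sampled sex
    "sex": "Female",
    "age": int(age),  # sampled age; `dob` is back-calculated from it
})
```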

---

## Example

```python
from medscheduler import AppointmentScheduler

sched = AppointmentScheduler(seed=42)
patients = sched.generate_patients(total_patients=5)
print(patients)
```

Example output:

| patient_id | name           | sex     | dob        |
|-------------|----------------|---------|------------|
| 00001       | Emily Lewis     | Female  | 1987-05-22 |
| 00002       | James Turner    | Male    | 1962-10-03 |
| 00003       | Sarah Murray    | Female  | 1975-08-16 |
| 00004       | Robert Hill     | Male    | 1953-12-30 |
| 00005       | Olivia Clark    | Female  | 1967-01-19 |
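
To show how `patient_id` links the two tables, here is a small standard-library sketch that counts appointments per patient. The appointment rows and the `appointment_id` column are hypothetical; only `patient_id` is documented on this page as the shared key.

```python
import csv
import io

patients_csv = """patient_id,name,sex,dob
00001,Emily Lewis,Female,1987-05-22
00002,James Turner,Male,1962-10-03
"""

# Hypothetical appointment rows, invented for this example.
appointments_csv = """appointment_id,patient_id
A1,00002
A2,00001
A3,00002
"""

# Index patients by ID; csv yields strings, so the zero padding survives.
patients = {row["patient_id"]: row for row in csv.DictReader(io.StringIO(patients_csv))}

# Count appointments per patient via the shared key.
counts = {}
for appt in csv.DictReader(io.StringIO(appointments_csv)):
    pid = appt["patient_id"]
    counts[pid] = counts.get(pid, 0) + 1

for pid, n in sorted(counts.items()):
    print(pid, patients[pid]["name"], n)
# 00001 Emily Lewis 1
# 00002 James Turner 2
```

When loading `patients.csv` with pandas instead, passing `dtype={"patient_id": str}` to `read_csv` avoids the IDs being parsed as integers and losing their leading zeros.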

---

## Related parameters

| Parameter | Description | Reference |
|------------|--------------|------------|
| `age_gender_probs` | Defines population structure and sampling probabilities. | {doc}`patient_demographics` |
| `lower_cutoff`, `upper_cutoff`, `truncated` | Control inclusion of age ranges. | {doc}`patient_demographics` |
| `visits_per_year` | Determines population size relative to appointments. | {doc}`patient_flow` |
| `first_attendance` | Controls share of new patients. | {doc}`patient_flow` |
| `seed` | Enables reproducible patient generation. | {doc}`randomness_and_noise` |
| `noise` | Adds slight variability to sampling probabilities. | {doc}`randomness_and_noise` |

---

## References

- NHS England (2024). *Hospital Outpatient Activity 2023–24: Summary Report 3.*  
  [https://files.digital.nhs.uk/34/18846B/hosp-epis-stat-outp-rep-tabs-2023-24-tab.xlsx](https://files.digital.nhs.uk/34/18846B/hosp-epis-stat-outp-rep-tabs-2023-24-tab.xlsx)  
- [Faker library documentation](https://faker.readthedocs.io/en/master/) – used for name generation.

---

## Next steps
- Discover how to enrich the patient registry with synthetic fields in {doc}`custom_columns`.  
- Explore {doc}`appointments_table` to see how patient IDs are linked to appointment records.  
- For related logic, review {doc}`patient_demographics` and {doc}`patient_flow`.