# Patients table

The **`patients`** table defines the demographic backbone of the synthetic outpatient population.
Each row represents one unique, fully synthetic patient with assigned demographic attributes, including a generated identifier, name, sex, and date of birth.

These records provide the foundation for all simulated appointments and are later referenced in the `appointments` table via `patient_id`.

---

## Overview

- **File name:** `patients.csv`
- **Generated by:** `AppointmentScheduler.generate_patients()`
- **Schema:** One row per unique patient
- **Linked to:** `appointments` table through `patient_id`

This table represents the **base population**.
The number of patients generated depends on visit frequency and attendance ratios (`visits_per_year`, `first_attendance`).

---

## Core columns

| Column | Type | Description |
|---------|------|-------------|
| `patient_id` | string | Unique, sequential, zero-padded identifier. |
| `name` | string | Realistic synthetic name generated using the [Faker](https://faker.readthedocs.io/) library. |
| `sex` | string | `"Male"` or `"Female"`, assigned using NHS outpatient demographic proportions. |
| `dob` | date | Simulated date of birth, used to derive patient age during appointment generation. |

> **Note:**
> Age and age group are not stored in this table.
> They are dynamically calculated in the `appointments` table based on each appointment’s date.

---

## Demographic generation

Patient demographics are created according to national-level outpatient statistics from *NHS England (2023–24, Summary Report 3)*.
This ensures realistic, reproducible age–sex distributions while maintaining complete anonymity.

### Sampling process

1. **Age–sex structure:** probabilities derived from {doc}`patient_demographics` via `age_gender_probs`.
2. **Truncation:** defined by `lower_cutoff`, `upper_cutoff`, and `truncated`.
3. **Date of birth (`dob`):** reverse-calculated from sampled age at the simulation’s `ref_date` plus a random 0–364 day offset to avoid clustering.
4. **Population size:** determined by the combination of `visits_per_year` and total appointment volume.

---

## Identity fields

### Field: `patient_id`

Each patient is assigned a **sequential, zero-padded identifier** that ensures uniqueness within the dataset.

**Format**

- Type: string
- Example: `"00001"`, `"01234"`
- Length: automatically determined by population size (minimum 5 digits).

**Logic**

Identifiers are generated after all demographic records are assembled, using an internal counter that increments across simulation runs:

```python
ids = [f"{i:0{id_length}d}" for i in range(start, start + total_patients)]
```

This guarantees reproducibility and uniqueness, even when multiple populations are created sequentially.

---

### Field: `name`

Names are generated using the **Faker** library to simulate realistic and gender-consistent identifiers.

**Format**

- Type: string
- Example: `"Emma Clark"`, `"James Turner"`
- Locale: `"en_US"` by default (can be customized, e.g. `"es_ES"` for Spanish names).
- Stored only in the `patients` table (not replicated in appointments).

**Logic**

- For `sex = "Female"` → `fake.name_female()`
- For `sex = "Male"` → `fake.name_male()`
- The resulting name is appended to the patient record alongside `patient_id`, `sex`, and `dob`.

This approach adds realism for UI mockups or de-identified educational datasets, without any link to real individuals.

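Besides the name, each record carries the `patient_id` and `dob` described earlier on this page. The following is a minimal, standard-library sketch of those two rules, not the library's actual implementation. The helper names (`dob_from_age`, `make_ids`, `age_on`), the example `ref_date`, the direction of the day offset (moving the birthday earlier), and the way `id_length` is derived from the largest ID are all assumptions made for illustration.

```python
import random
from datetime import date, timedelta


def dob_from_age(age: int, ref_date: date, rng: random.Random) -> date:
    """Return a birth date that makes the patient `age` years old on ref_date."""
    try:
        anniversary = ref_date.replace(year=ref_date.year - age)
    except ValueError:
        # ref_date is 29 Feb and the target year is not a leap year
        anniversary = ref_date.replace(year=ref_date.year - age, day=28)
    # Assumption: the random 0-364 day offset shifts the birthday earlier,
    # which keeps the age at ref_date unchanged while spreading birthdays out.
    return anniversary - timedelta(days=rng.randint(0, 364))


def make_ids(start: int, total_patients: int, min_length: int = 5) -> list[str]:
    """Sequential zero-padded IDs, widened if the largest ID needs more digits."""
    last_id = start + total_patients - 1
    id_length = max(min_length, len(str(last_id)))  # hypothetical length rule
    return [f"{i:0{id_length}d}" for i in range(start, start + total_patients)]


def age_on(dob: date, ref_date: date) -> int:
    """Completed years between dob and ref_date."""
    had_birthday = (ref_date.month, ref_date.day) >= (dob.month, dob.day)
    return ref_date.year - dob.year - (0 if had_birthday else 1)


rng = random.Random(42)           # seeded for reproducibility
ref_date = date(2024, 12, 1)      # hypothetical reference date
ages = [37, 62, 49]               # stand-ins for sampled ages

dobs = [dob_from_age(age, ref_date, rng) for age in ages]
ids = make_ids(start=1, total_patients=len(ages))

for pid, dob, age in zip(ids, dobs, ages):
    # The derived dob reproduces the sampled age at ref_date
    print(pid, dob.isoformat(), age_on(dob, ref_date) == age)
```

Because the offset never exceeds 364 days, the age recomputed at `ref_date` always equals the sampled age, which is consistent with age being derived from `dob` at appointment time rather than stored in this table.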
```python
# Excerpt from patient generation: a female record gets a female name
patients.append({
    "name": self.fake.name_female(),
    "sex": "Female",
    "age": int(age),  # sampled age, from which dob is derived
})
```

---

## Example

```python
from medscheduler import AppointmentScheduler

sched = AppointmentScheduler(seed=42)
patients = sched.generate_patients(total_patients=5)
print(patients)
```

Example output:

| patient_id | name | sex | dob |
|------------|--------------|--------|------------|
| 00001 | Emily Lewis | Female | 1987-05-22 |
| 00002 | James Turner | Male | 1962-10-03 |
| 00003 | Sarah Murray | Female | 1975-08-16 |
| 00004 | Robert Hill | Male | 1953-12-30 |
| 00005 | Olivia Clark | Female | 1967-01-19 |

---

## Related parameters

| Parameter | Description | Reference |
|------------|--------------|------------|
| `age_gender_probs` | Defines population structure and sampling probabilities. | {doc}`patient_demographics` |
| `lower_cutoff`, `upper_cutoff`, `truncated` | Control inclusion of age ranges. | {doc}`patient_demographics` |
| `visits_per_year` | Determines population size relative to appointments. | {doc}`patient_flow` |
| `first_attendance` | Controls share of new patients. | {doc}`patient_flow` |
| `seed` | Enables reproducible patient generation. | {doc}`randomness_and_noise` |
| `noise` | Adds slight variability to sampling probabilities. | {doc}`randomness_and_noise` |

---

## References

- NHS England (2024). *Hospital Outpatient Activity 2023–24: Summary Report 3.*
  [https://files.digital.nhs.uk/34/18846B/hosp-epis-stat-outp-rep-tabs-2023-24-tab.xlsx](https://files.digital.nhs.uk/34/18846B/hosp-epis-stat-outp-rep-tabs-2023-24-tab.xlsx)
- [Faker library documentation](https://faker.readthedocs.io/en/master/) – used for name generation.

---

### Next steps

- Discover how to enrich the patient registry with synthetic fields in {doc}`custom_columns`.
- Explore {doc}`appointments_table` to see how patient IDs are linked to appointment records.
- For related logic, review {doc}`patient_demographics` and {doc}`patient_flow`.