Patients table#
The patients table defines the demographic backbone of the synthetic outpatient population.
Each row represents one unique, fully synthetic patient with assigned demographic attributes — including a generated identifier, name, sex, and date of birth.
These records provide the foundation for all simulated appointments and are later referenced in the appointments table via patient_id.
Overview#
File name: patients.csv
Generated by: AppointmentScheduler.generate_patients()
Schema: One row per unique patient
Linked to: appointments table through patient_id
This table represents the base population.
The number of patients generated depends on visit frequency and attendance ratios (visits_per_year, first_attendance).
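Because patient_id is a zero-padded string, the file is best read without numeric conversion. The following is a minimal sketch using only the standard library. The inline CSV text is a stand-in for patients.csv, reusing two rows from the example output on this page, and it assumes the file's header row matches the column names documented here.

```python
import csv
import io

# Stand-in for open("patients.csv", newline="").
# Assumption: the header row uses the documented column names.
sample = io.StringIO(
    "patient_id,name,sex,dob\n"
    "00001,Emily Lewis,Female,1987-05-22\n"
    "00002,James Turner,Male,1962-10-03\n"
)

# csv.DictReader yields every field as a string, so leading zeros survive.
patients = {row["patient_id"]: row for row in csv.DictReader(sample)}

print(patients["00001"]["name"])
```

Keying the rows by patient_id gives a simple lookup that can later be used to attach patient attributes to appointment records.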
Core columns#
| Column | Type | Description |
|---|---|---|
| patient_id | string | Unique, sequential, zero-padded identifier. |
| name | string | Realistic synthetic name generated using the Faker library. |
| sex | string | Patient sex ("Female" or "Male"). |
| dob | date | Simulated date of birth, used to derive patient age during appointment generation. |
Note:
Age and age group are not stored in this table.
They are dynamically calculated in the appointments table based on each appointment’s date.
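As an illustration of deriving age at appointment time, here is a minimal sketch. It is not the library's implementation: the helper age_on and the appointment_date key are hypothetical names chosen for this example, the appointment rows are invented for illustration, and the dob values come from the example output on this page.

```python
from datetime import date

def age_on(dob: date, on_date: date) -> int:
    """Completed years between dob and on_date."""
    years = on_date.year - dob.year
    # Subtract one if the birthday has not yet occurred in on_date's year.
    if (on_date.month, on_date.day) < (dob.month, dob.day):
        years -= 1
    return years

# dob values keyed by patient_id (taken from the example output).
dob_by_patient = {"00001": date(1987, 5, 22), "00002": date(1962, 10, 3)}

# Hypothetical appointment rows; "appointment_date" is an assumed column name.
appointments = [
    {"patient_id": "00001", "appointment_date": date(2024, 5, 21)},
    {"patient_id": "00001", "appointment_date": date(2024, 5, 22)},
    {"patient_id": "00002", "appointment_date": date(2024, 1, 15)},
]

for appt in appointments:
    dob = dob_by_patient[appt["patient_id"]]
    appt["age"] = age_on(dob, appt["appointment_date"])
```

The same patient can therefore have different ages on different rows, which is why age is kept out of the patients table.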
Demographic generation#
Patient demographics are created according to national-level outpatient statistics from NHS England (2023–24, Summary Report 3).
This ensures realistic, reproducible age–sex distributions while maintaining complete anonymity.
Sampling process#
Age–sex structure: probabilities derived from Patient demographics via age_gender_probs.
Truncation: defined by lower_cutoff, upper_cutoff, and truncated.
Date of birth (dob): reverse-calculated from sampled age at the simulation’s ref_date plus a random 0–364 day offset to avoid clustering.
Population size: determined by the combination of visits_per_year and total appointment volume.
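The date-of-birth step can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: the function name dob_from_age is hypothetical, the offset is assumed to be subtracted (so the patient is still exactly the sampled age on ref_date), and the reference date used here is arbitrary.

```python
import random
from datetime import date, timedelta

def dob_from_age(age: int, ref_date: date, rng: random.Random) -> date:
    """Return a dob for someone who is `age` completed years old on ref_date."""
    try:
        anniversary = ref_date.replace(year=ref_date.year - age)
    except ValueError:
        # ref_date is 29 February and the target year is not a leap year.
        anniversary = ref_date.replace(year=ref_date.year - age, day=28)
    # Shift back by 0-364 days so birthdays do not all fall on ref_date.
    return anniversary - timedelta(days=rng.randint(0, 364))

rng = random.Random(42)        # seeded so repeated runs give the same dates
ref_date = date(2024, 12, 31)  # illustrative reference date
ages = (0, 35, 90)
dobs = [dob_from_age(age, ref_date, rng) for age in ages]
```

Capping the offset at 364 days keeps each birthday within the year before the anniversary, so the sampled age is preserved on ref_date.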
Identity fields#
Field: patient_id#
Each patient is assigned a sequential, zero-padded identifier that ensures uniqueness within the dataset.
Format
Type: string
Example: "00001", "01234"
Length: automatically determined by population size (minimum 5 digits).
Logic
Identifiers are generated after all demographic records are assembled, using an internal counter that increments across simulation runs:
ids = [f"{i:0{id_length}d}" for i in range(start, start + total_patients)]
This guarantees reproducibility and uniqueness, even when multiple populations are created sequentially.
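One way the identifier width could be derived is sketched below. The function make_patient_ids and its min_length parameter are hypothetical, and the rule for choosing id_length (the width of the largest identifier, with a floor of 5 digits) is an assumption inferred from the format description above rather than the library's confirmed logic.

```python
def make_patient_ids(total_patients: int, start: int = 1, min_length: int = 5) -> list[str]:
    """Build sequential, zero-padded string identifiers."""
    last = start + total_patients - 1
    # Assumed rule: wide enough for the largest id, but never below min_length.
    id_length = max(min_length, len(str(last)))
    return [f"{i:0{id_length}d}" for i in range(start, start + total_patients)]

first_batch = make_patient_ids(3)
# A second population continues from where the first one stopped.
second_batch = make_patient_ids(2, start=4)
```

Passing the next free counter value as start is what keeps identifiers unique when several populations are generated one after another.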
Field: name#
Names are generated using the Faker library so that each one is realistic and consistent with the patient's sex.
Format
Type: string
Example: "Emma Clark", "James Turner"
Locale: "en_US" by default (can be customized, e.g. "es_ES" for Spanish names).
Stored only in the patients table (not replicated in appointments).
Logic
For sex = "Female" → fake.name_female()
For sex = "Male" → fake.name_male()
The resulting name is appended to the patient record alongside patient_id, sex, and dob.
This approach adds realism for UI mockups or de-identified educational datasets, without any link to real individuals.
patients.append({
"name": self.fake.name_female(),
"sex": "Female",
"age": int(age)
})
Example#
from medscheduler import AppointmentScheduler
sched = AppointmentScheduler(seed=42)
patients = sched.generate_patients(total_patients=5)
print(patients)
Example output:
| patient_id | name | sex | dob |
|---|---|---|---|
| 00001 | Emily Lewis | Female | 1987-05-22 |
| 00002 | James Turner | Male | 1962-10-03 |
| 00003 | Sarah Murray | Female | 1975-08-16 |
| 00004 | Robert Hill | Male | 1953-12-30 |
| 00005 | Olivia Clark | Female | 1967-01-19 |
References#
NHS England (2024). Hospital Outpatient Activity 2023–24: Summary Report 3.
https://files.digital.nhs.uk/34/18846B/hosp-epis-stat-outp-rep-tabs-2023-24-tab.xlsx
Faker library documentation – used for name generation.
Next steps#
Discover how to enrich the patient registry with synthetic fields in Adding custom columns.
Explore Appointments table to see how patient IDs are linked to appointment records.
For related logic, review Patient demographics and Patient flow.