Patients table#

The patients table defines the demographic backbone of the synthetic outpatient population.
Each row represents one unique, fully synthetic patient with four assigned attributes: a generated identifier, name, sex, and date of birth.
These records provide the foundation for all simulated appointments and are later referenced in the appointments table via patient_id.


Overview#

  • File name: patients.csv

  • Generated by: AppointmentScheduler.generate_patients()

  • Schema: One row per unique patient

  • Linked to: appointments table through patient_id

This table represents the base population.
The number of patients generated depends on visit frequency and attendance ratios (visits_per_year, first_attendance).


Core columns#

| Column | Type | Description |
|---|---|---|
| patient_id | string | Unique, sequential, zero-padded identifier. |
| name | string | Realistic synthetic name generated using the Faker library. |
| sex | string | "Male" or "Female", assigned using NHS outpatient demographic proportions. |
| dob | date | Simulated date of birth, used to derive patient age during appointment generation. |

Note:
Age and age group are not stored in this table.
They are dynamically calculated in the appointments table based on each appointment’s date.
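As an illustration of the note above, age can be derived from dob and an appointment date with a few lines of standard-library Python. This is a minimal sketch, not the library's own code: the helper names age_on and age_group are hypothetical, and the fixed 10-year banding is an assumption made only for the example.

```python
from datetime import date


def age_on(dob: date, on: date) -> int:
    """Return the number of completed years between dob and the given date."""
    # Subtract one year if the birthday has not yet occurred in that calendar year.
    had_birthday = (on.month, on.day) >= (dob.month, dob.day)
    return on.year - dob.year - (0 if had_birthday else 1)


def age_group(age: int, width: int = 10) -> str:
    """Hypothetical fixed-width banding, e.g. 36 -> '30-39' (an assumption)."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


dob = date(1987, 5, 22)
print(age_on(dob, date(2024, 5, 21)))  # 36 (day before the birthday)
print(age_on(dob, date(2024, 5, 22)))  # 37 (on the birthday)
print(age_group(36))                   # 30-39
```

Because the calculation depends on the appointment date, the same patient can fall into different age groups across appointments, which is consistent with keeping age out of the patients table.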


Demographic generation#

Patient demographics are created according to national-level outpatient statistics from NHS England (2023–24, Summary Report 3).
This ensures realistic, reproducible age–sex distributions while maintaining complete anonymity.

Sampling process#

  1. Age–sex structure: each patient's age and sex are sampled from probabilities derived from Patient demographics via age_gender_probs.

  2. Truncation: the sampled age range is controlled by lower_cutoff, upper_cutoff, and truncated.

  3. Date of birth (dob): back-calculated from the sampled age at the simulation's ref_date, with a random 0–364 day offset so that birthdays do not cluster on a single calendar day.

  4. Population size: determined by the combination of visits_per_year and total appointment volume.
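The first three steps can be sketched with the standard library alone. Everything below is illustrative rather than the library's implementation: the AGE_SEX_PROBS values are invented placeholders standing in for age_gender_probs, truncation is read here as "keep only ages inside the cutoffs" (one possible interpretation), and the day offset is subtracted so that the patient's age at ref_date stays equal to the sampled age.

```python
import random
from datetime import date, timedelta

# Placeholder (sex, age) probabilities; real values would come from age_gender_probs.
AGE_SEX_PROBS = {
    ("Female", 30): 0.3,
    ("Female", 60): 0.3,
    ("Male", 30): 0.2,
    ("Male", 60): 0.2,
}


def years_before(d: date, years: int) -> date:
    """Shift a date back by whole years, mapping 29 Feb to 28 Feb when needed."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def sample_patient(rng: random.Random, ref_date: date,
                   lower_cutoff: int = 0, upper_cutoff: int = 120) -> dict:
    # Truncation (assumed reading): drop cells whose age lies outside the cutoffs.
    cells = {k: p for k, p in AGE_SEX_PROBS.items()
             if lower_cutoff <= k[1] <= upper_cutoff}
    # random.choices accepts unnormalised weights, so no renormalisation is needed.
    sex, age = rng.choices(list(cells), weights=list(cells.values()), k=1)[0]
    # Count back `age` years, then a further 0-364 days to spread out birthdays.
    offset = rng.randint(0, 364)
    dob = years_before(ref_date, age) - timedelta(days=offset)
    return {"sex": sex, "age": age, "dob": dob}


rng = random.Random(42)  # seeded generator for reproducible output
ref_date = date(2024, 1, 1)
patients = [sample_patient(rng, ref_date) for _ in range(5)]
for p in patients:
    print(p)
```

Subtracting at most 364 days keeps each dob strictly after the previous anniversary, so the age recomputed at ref_date matches the sampled age.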


Identity fields#

Field: patient_id#

Each patient is assigned a sequential, zero-padded identifier that ensures uniqueness within the dataset.

Format

  • Type: string

  • Example: "00001", "01234"

  • Length: automatically determined by population size (minimum 5 digits).

Logic
Identifiers are generated after all demographic records are assembled, using an internal counter that increments across simulation runs:

ids = [f"{i:0{id_length}d}" for i in range(start, start + total_patients)]

This guarantees reproducibility and uniqueness, even when multiple populations are created sequentially.
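A self-contained sketch of the same idea follows. The function name make_patient_ids and the way id_length is derived (the digit count of the largest identifier, with a floor of 5) are assumptions based on the format rules above, not the library's actual code.

```python
def make_patient_ids(total_patients: int, start: int = 1,
                     min_length: int = 5) -> list[str]:
    """Build sequential, zero-padded IDs starting at `start`."""
    # Width: enough digits for the largest ID, but never fewer than min_length.
    last = start + total_patients - 1
    id_length = max(min_length, len(str(last)))
    return [f"{i:0{id_length}d}" for i in range(start, start + total_patients)]


first_batch = make_patient_ids(3)             # ['00001', '00002', '00003']
second_batch = make_patient_ids(2, start=4)   # ['00004', '00005']
print(first_batch, second_batch)
print(make_patient_ids(2, start=99_999))      # ['099999', '100000']
```

Passing the next free number as start for a later batch is one way to keep identifiers unique across sequentially generated populations.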


Field: name#

Names are generated using the Faker library to produce realistic names that are consistent with each patient's assigned sex.

Format

  • Type: string

  • Example: "Emma Clark", "James Turner"

  • Locale: "en_US" by default (can be customized, e.g. "es_ES" for Spanish names).

  • Stored only in the patients table (not replicated in appointments).

Logic

  • For sex = "Female" → fake.name_female()

  • For sex = "Male" → fake.name_male()

  • The resulting name is appended to the patient record alongside patient_id, sex, and dob.

This approach adds realism for UI mockups or de-identified educational datasets, without any link to real individuals.

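Simplified excerpt for a female patient (the record holds the sampled age at this stage):

# Inside the generator: `self.fake` is the Faker instance, `age` the sampled age.
patients.append({
    "name": self.fake.name_female(),  # sex-specific Faker call
    "sex": "Female",
    "age": int(age)
})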
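The sex-based dispatch can be sketched as a small helper. To keep the example runnable without Faker installed, it uses a hypothetical StubFaker stand-in that returns fixed names; a real Faker instance exposes the same name_female() and name_male() methods and could be passed in instead. The helper name assign_name is also an assumption for illustration.

```python
class StubFaker:
    """Stand-in exposing the two Faker methods used here (fixed names, not random)."""

    def name_female(self) -> str:
        return "Emma Clark"

    def name_male(self) -> str:
        return "James Turner"


def assign_name(sex: str, fake) -> str:
    """Pick the name generator that matches the patient's sex."""
    if sex == "Female":
        return fake.name_female()
    if sex == "Male":
        return fake.name_male()
    raise ValueError(f"unexpected sex value: {sex!r}")


# With Faker installed, this could be: from faker import Faker; fake = Faker("en_US")
fake = StubFaker()
records = [{"sex": s, "name": assign_name(s, fake)} for s in ("Female", "Male")]
print(records)
```

Passing the generator in as an argument makes the locale easy to swap, since only the object handed to assign_name changes.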

Example#

from medscheduler import AppointmentScheduler

sched = AppointmentScheduler(seed=42)
patients = sched.generate_patients(total_patients=5)
print(patients)

Example output:

| patient_id | name | sex | dob |
|---|---|---|---|
| 00001 | Emily Lewis | Female | 1987-05-22 |
| 00002 | James Turner | Male | 1962-10-03 |
| 00003 | Sarah Murray | Female | 1975-08-16 |
| 00004 | Robert Hill | Male | 1953-12-30 |
| 00005 | Olivia Clark | Female | 1967-01-19 |


