# Adding Custom Categorical Variables

## Overview

Beyond demographic and scheduling parameters, **`AppointmentScheduler`** lets you extend the synthetic dataset with new **categorical variables** (such as insurance type, geographic region, or clinic branch) using the `add_custom_column()` method.
These additional fields enrich the simulated patient table (`patients_df`) without altering the core data-generation process.

You will explore:

1. Adding a balanced attribute (uniform distribution).
2. Adding a skewed regional attribute (normal distribution).
3. Adding a highly imbalanced attribute (Pareto distribution).

---

## Example 1 – Uniform distribution (balanced attribute)

A new variable `insurance_type` splits the population evenly into *Public* and *Private* coverage.

```python
from medscheduler import AppointmentScheduler
from medscheduler.utils.plotting import plot_custom_column_distribution

# Generate baseline dataset
sched_uniform = AppointmentScheduler()
slots_df, appts_df, patients_df = sched_uniform.generate()

# Add insurance type with uniform probability
sched_uniform.add_custom_column(
    column_name="insurance_type",
    categories=["Public", "Private"],
    distribution_type="uniform"
)

plot_custom_column_distribution(patients_df, column="insurance_type")
```

**Output preview:**

Below are the main results for this configuration:

1. **Category distribution** – Nearly equal proportions across insurance groups.

![Custom column distribution – Uniform](../_static/visuals/examples/custom_columns/sched_uniform_plot_custom_column_distribution.png)

**Interpretation:**
Uniform distributions suit balanced segmentation variables: each group is represented in roughly equal numbers, which is useful for fair testing or visualization.

---

## Example 2 – Normal distribution (regional variation)

Now we introduce a five-region attribute, with the central region most frequent and the outer ones less represented.

```python
# Generate baseline dataset
sched_region = AppointmentScheduler()
slots_df, appts_df, patients_df = sched_region.generate()

# Add region with a normal (bell-shaped) distribution
sched_region.add_custom_column(
    column_name="region",
    categories=["North", "Center", "South", "West", "East"],
    distribution_type="normal"
)

plot_custom_column_distribution(patients_df, column="region")
```

**Output preview:**

1. **Category distribution** – Bell-shaped distribution with a dominant “Center” group.

![Custom column distribution – Normal](../_static/visuals/examples/custom_columns/sched_region_plot_custom_column_distribution.png)

**Interpretation:**
The *normal* model generates a realistic middle-heavy pattern, useful for representing geographically centered populations or other naturally clustered attributes.

---

## Example 3 – Pareto distribution (dominant providers)

Finally, we simulate a field representing health insurance providers, where a few dominate the market while others serve smaller shares.

```python
# Generate baseline dataset
sched_pareto = AppointmentScheduler()
slots_df, appts_df, patients_df = sched_pareto.generate()

# Add insurance provider with a Pareto (heavily skewed) distribution
sched_pareto.add_custom_column(
    column_name="insurance_provider",
    categories=[
        "Vitalynx Orbit", "Carebubble Spectrum", "CurativeWhale",
        "HealthZenotron", "Mediflora Nexus", "BioCrest Harmony",
        "MediNimbus", "QuantumPetal", "Heliospring Vital",
        "MediSpectra Flux", "CuraQuark Alliance"
    ],
    distribution_type="pareto"
)

plot_custom_column_distribution(patients_df, column="insurance_provider")
```

**Output preview:**

1. **Category distribution** – Heavily right-skewed proportions showing a few dominant insurers.

![Custom column distribution – Pareto](../_static/visuals/examples/custom_columns/sched_pareto_plot_custom_column_distribution.png)

**Interpretation:**
The *Pareto* model mirrors real-world skewness, where a small number of categories account for most observations.
Such variables are useful when simulating unequal resource distribution or modeling provider market share.

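
To build intuition for the three shapes used in the examples above, each one can be pictured as a different weight vector over the category list. The sketch below is a standalone illustration that uses only the Python standard library. The helper names `category_weights` and `sample_column`, the Gaussian and power-law formulas, and the `alpha` and `sigma` values are all assumptions made for this illustration; they are not taken from the medscheduler implementation, which may map categories to probabilities differently.

```python
import math
import random
from collections import Counter


def category_weights(n_categories, shape):
    """Return illustrative probabilities for n_categories under a named shape.

    Hypothetical helper: the formulas are assumptions for illustration,
    not the medscheduler implementation.
    """
    if shape == "uniform":
        # Every category gets the same weight.
        raw = [1.0] * n_categories
    elif shape == "normal":
        # Gaussian bump centred on the middle position of the list.
        center = (n_categories - 1) / 2
        sigma = max(n_categories / 4, 0.5)
        raw = [
            math.exp(-((i - center) ** 2) / (2 * sigma ** 2))
            for i in range(n_categories)
        ]
    elif shape == "pareto":
        # Power-law decay: weight falls with position in the list.
        alpha = 1.5
        raw = [1.0 / (i + 1) ** alpha for i in range(n_categories)]
    else:
        raise ValueError(f"unknown shape: {shape!r}")
    total = sum(raw)
    return [w / total for w in raw]


def sample_column(categories, shape, n_rows, seed):
    """Draw n_rows labels with a seeded generator so results are repeatable."""
    rng = random.Random(seed)
    weights = category_weights(len(categories), shape)
    return rng.choices(categories, weights=weights, k=n_rows)


# Generic labels, to avoid implying anything about the library's ordering.
levels = ["A", "B", "C", "D", "E"]

for shape in ("uniform", "normal", "pareto"):
    counts = Counter(sample_column(levels, shape, n_rows=10_000, seed=42))
    print(shape, [counts[label] for label in levels])
```

Running the loop prints one row of counts per shape: roughly flat for `uniform`, peaked at the middle label for `normal`, and steeply decreasing for `pareto`. Because the generator is seeded, repeated runs with the same seed produce the same column.
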

---

## Summary

| Scenario | Distribution type | Typical shape | Use case |
|----------|-------------------|---------------|----------|
| **Insurance type** | Uniform | Flat | Balanced segmentation |
| **Region** | Normal | Bell-shaped | Central dominance |
| **Insurance provider** | Pareto | Right-skewed | Market inequality |

---

### Notes

- Each column is reproducible under the same `seed`.
- Probabilities can also be manually supplied via `custom_probs`.
- Added fields integrate seamlessly into `patients_df` for analysis, joining, or visualization.

---

### Next Steps

- Review {doc}`../api-reference/patients_table` to see how these columns integrate into patient data.
- Explore {doc}`../api-reference/randomness_and_noise` for details on stochastic sampling.
- Combine with demographic or flow parameters to create richer simulation scenarios.
- Return to {doc}`patient_flow_demographics` to visualize how added attributes interact with baseline population structure.