#pip install textstat pdfplumber python-docx kaleido

Requirement already satisfied: textstat in c:\users\victo\anaconda3\lib\site-packages (0.7.13)
Note: you may need to restart the kernel to use updated packages.

Requirement already satisfied: pdfplumber in c:\users\victo\anaconda3\lib\site-packages (0.11.10)
Requirement already satisfied: python-docx in c:\users\victo\anaconda3\lib\site-packages (1.2.0)
Collecting kaleido
  Downloading kaleido-1.3.0-py3-none-any.whl (55 kB)
     ---------------------------------------- 55.6/55.6 kB ? eta 0:00:00
Requirement already satisfied: nltk in c:\users\victo\anaconda3\lib\site-packages (from textstat) (3.7)
Requirement already satisfied: setuptools in c:\users\victo\anaconda3\lib\site-packages (from textstat) (65.6.3)
Requirement already satisfied: pyphen in c:\users\victo\anaconda3\lib\site-packages (from textstat) (0.17.2)
Requirement already satisfied: pypdfium2>=5.9.0 in c:\users\victo\anaconda3\lib\site-packages (from pdfplumber) (5.10.1)
Requirement already satisfied: pdfminer.six==20260107 in c:\users\victo\anaconda3\lib\site-packages (from pdfplumber) (20260107)
Requirement already satisfied: Pillow>=12.2.0 in c:\users\victo\anaconda3\lib\site-packages (from pdfplumber) (12.2.0)
Requirement already satisfied: charset-normalizer>=2.0.0 in c:\users\victo\anaconda3\lib\site-packages (from pdfminer.six==20260107->pdfplumber) (2.0.4)
Requirement already satisfied: cryptography>=36.0.0 in c:\users\victo\anaconda3\lib\site-packages (from pdfminer.six==20260107->pdfplumber) (39.0.1)
Requirement already satisfied: lxml>=3.1.0 in c:\users\victo\anaconda3\lib\site-packages (from python-docx) (6.1.1)
Requirement already satisfied: typing_extensions>=4.9.0 in c:\users\victo\anaconda3\lib\site-packages (from python-docx) (4.15.0)
Requirement already satisfied: packaging in c:\users\victo\anaconda3\lib\site-packages (from kaleido) (22.0)
Collecting logistro>=1.0.8
  Downloading logistro-2.0.1-py3-none-any.whl (8.6 kB)
Collecting orjson>=3.10.15
  Downloading orjson-3.11.9-cp310-cp310-win_amd64.whl (127 kB)
     -------------------------------------- 127.3/127.3 kB 7.3 MB/s eta 0:00:00
Collecting choreographer>=1.3.0
  Downloading choreographer-1.3.0-py3-none-any.whl (52 kB)
     ---------------------------------------- 52.6/52.6 kB 2.6 MB/s eta 0:00:00
Collecting platformdirs>=4.3.6
  Downloading platformdirs-4.10.0-py3-none-any.whl (22 kB)
Collecting simplejson>=3.19.3
  Downloading simplejson-4.1.1-cp310-cp310-win_amd64.whl (90 kB)
     ---------------------------------------- 90.5/90.5 kB 5.3 MB/s eta 0:00:00
Requirement already satisfied: tqdm in c:\users\victo\anaconda3\lib\site-packages (from nltk->textstat) (4.64.1)
Requirement already satisfied: click in c:\users\victo\anaconda3\lib\site-packages (from nltk->textstat) (8.0.4)
Requirement already satisfied: regex>=2021.8.3 in c:\users\victo\anaconda3\lib\site-packages (from nltk->textstat) (2022.7.9)
Requirement already satisfied: joblib in c:\users\victo\anaconda3\lib\site-packages (from nltk->textstat) (1.1.1)
Requirement already satisfied: cffi>=1.12 in c:\users\victo\anaconda3\lib\site-packages (from cryptography>=36.0.0->pdfminer.six==20260107->pdfplumber) (1.15.1)
Requirement already satisfied: colorama in c:\users\victo\anaconda3\lib\site-packages (from click->nltk->textstat) (0.4.6)
Requirement already satisfied: pycparser in c:\users\victo\anaconda3\lib\site-packages (from cffi>=1.12->cryptography>=36.0.0->pdfminer.six==20260107->pdfplumber) (2.21)
Installing collected packages: simplejson, platformdirs, orjson, logistro, choreographer, kaleido
  Attempting uninstall: platformdirs
    Found existing installation: platformdirs 2.5.2
    Uninstalling platformdirs-2.5.2:
      Successfully uninstalled platformdirs-2.5.2
Successfully installed choreographer-1.3.0 kaleido-1.3.0 logistro-2.0.1 orjson-3.11.9 platformdirs-4.10.0 simplejson-4.1.1


### Readability Scoring


import pdfplumber
import textstat
import csv
import os
from docx import Document
from datetime import datetime


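def extract_text_from_pdf(path):
    # Extract each page only once; pages with no extractable text are skipped.
    with pdfplumber.open(path) as pdf:
        page_texts = (page.extract_text() for page in pdf.pages)
        return " ".join(text for text in page_texts if text)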

def extract_text_from_docx(path):
    doc = Document(path)
    return " ".join(para.text for para in doc.paragraphs if para.text)


documents = [
    {"insurer": "Aetna", "state": "Georgia", "doc_type": "Web Privacy Policy", "path": r"C:\Users\victo\OneDrive\Desktop\Privacy Policies\Aetna privacy.docx"},
    {"insurer": "Anthem BCBS", "state": "Georgia", "doc_type": "HIPAA Notice", "path": r"C:\Users\victo\OneDrive\Desktop\Privacy Policies\anthem BCBS privacy practices.pdf"},
    {"insurer": "Anthem BCBS", "state": "Georgia", "doc_type": "HIPAA Notice (Spanish)", "path": r"C:\Users\victo\OneDrive\Desktop\Privacy Policies\anthem BCBS privacy spanish.pdf"},
    {"insurer": "Cigna", "state": "Georgia", "doc_type": "Data Sharing Notice", "path": r"C:\Users\victo\OneDrive\Desktop\Privacy Policies\Cigna privacy data sharing.docx"},
    {"insurer": "Cigna", "state": "Georgia", "doc_type": "Global Health Benefits Notice", "path": r"C:\Users\victo\OneDrive\Desktop\Privacy Policies\cigna-global-health-benefits-privacy-notice-eng_copy.pdf"},
    {"insurer": "Cigna", "state": "Georgia", "doc_type": "HIPAA Notice", "path": r"C:\Users\victo\OneDrive\Desktop\Privacy Policies\cigna-health-care-and-cigna-supplemental-benefits-privacy-notice-eng_copy.pdf"},
    {"insurer": "Cigna", "state": "Georgia", "doc_type": "GLB Notice", "path": r"C:\Users\victo\OneDrive\Desktop\Privacy Policies\gramm-leach-bliley-act-privacy-notice_copy.pdf"},
    {"insurer": "Humana", "state": "Georgia", "doc_type": "HIPAA Notice", "path": r"C:\Users\victo\OneDrive\Desktop\Privacy Policies\humana privacy practices.pdf"},
    {"insurer": "UnitedHealthcare", "state": "Georgia", "doc_type": "Web Privacy Policy", "path": r"C:\Users\victo\OneDrive\Desktop\Privacy Policies\UHC privacy.docx"},
    {"insurer": "UnitedHealthcare", "state": "Georgia", "doc_type": "HIPAA Notice", "path": r"C:\Users\victo\OneDrive\Desktop\Privacy Policies\united hipaa privacy.pdf"},
]

run_timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
results = []


for doc in documents:
    print(f"Scoring: {doc['insurer']} - {doc['doc_type']}")
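    try:
        # Match the extension case-insensitively so ".PDF" and ".DOCX" also work.
        ext = os.path.splitext(doc["path"])[1].lower()
        if ext == ".pdf":
            text = extract_text_from_pdf(doc["path"])
        elif ext == ".docx":
            text = extract_text_from_docx(doc["path"])
        else:
            print(f"Unsupported file type: {doc['path']}")
            continue

        # Skip files that yield no text (e.g. a scanned PDF with no text layer).
        if not text.strip():
            print(f"No extractable text: {doc['path']}")
            continue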

        results.append({
            "run_timestamp": run_timestamp,
            "insurer": doc["insurer"],
            "state": doc["state"],
            "doc_type": doc["doc_type"],
            "word_count": textstat.lexicon_count(text),
            "flesch_kincaid_grade": round(textstat.flesch_kincaid_grade(text), 2),
            "flesch_reading_ease": round(textstat.flesch_reading_ease(text), 2),
            "smog_index": round(textstat.smog_index(text), 2),
            "path": doc["path"],
        })

    except Exception as e:
        print(f"Error processing {doc['path']}: {e}")

output_path = r"C:\Users\victo\OneDrive\Desktop\Privacy Policies\readability_scores.csv"

Scoring: Aetna - Web Privacy Policy
Scoring: Anthem BCBS - HIPAA Notice
Scoring: Anthem BCBS - HIPAA Notice (Spanish)
Scoring: Cigna - Data Sharing Notice
Scoring: Cigna - Global Health Benefits Notice
Scoring: Cigna - HIPAA Notice
Scoring: Cigna - GLB Notice
Scoring: Humana - HIPAA Notice
Scoring: UnitedHealthcare - Web Privacy Policy
Scoring: UnitedHealthcare - HIPAA Notice


# Load rows from earlier runs so the CSV accumulates a scoring history.
existing_rows = []
if os.path.exists(output_path):
    with open(output_path, "r", newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        existing_rows = list(reader)

# A row is identified by document plus run timestamp, so re-running this cell
# within the same run does not append duplicates.
existing_keys = {
    (row["insurer"], row["state"], row["doc_type"], row["run_timestamp"])
    for row in existing_rows
}

new_rows = [
    r for r in results
    if (r["insurer"], r["state"], r["doc_type"], r["run_timestamp"])
    not in existing_keys
]

all_rows = existing_rows + new_rows

# Write UTF-8 explicitly so the file reads back the same way in pandas.
with open(output_path, "w", newline="", encoding="utf-8") as f:
    fieldnames = ["run_timestamp", "insurer", "state", "doc_type",
                  "word_count", "flesch_kincaid_grade", "flesch_reading_ease",
                  "smog_index", "path"]
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(all_rows)

print(f"\nRun timestamp: {run_timestamp}")
print(f"New rows added: {len(new_rows)}")
print(f"Total rows in file: {len(all_rows)}")
print(f"Results saved to {output_path}")

Run timestamp: 2026-06-17 00:15:55
New rows added: 10
Total rows in file: 10
Results saved to C:\Users\victo\OneDrive\Desktop\Privacy Policies\readability_scores.csv


### Visualizations


import pandas as pd
import plotly.graph_objects as go

df = pd.read_csv(r"C:\Users\victo\OneDrive\Desktop\Privacy Policies\readability_scores.csv")

latest = (
    df.sort_values("run_timestamp", ascending=False)
    .drop_duplicates(subset=["insurer", "state", "doc_type"])
)

# Rank insurers by their hardest-to-read document (highest Flesch-Kincaid grade).
insurer_max_grade = (
    latest.groupby("insurer")["flesch_kincaid_grade"]
    .max()
    .sort_values(ascending=False)
)

insurer_order = {insurer: i for i, insurer in enumerate(insurer_max_grade.index)}
latest["insurer_rank"] = latest["insurer"].map(insurer_order)

latest = latest.sort_values(
    ["insurer_rank", "flesch_kincaid_grade"],
    ascending=[False, False]
)

latest["label"] = latest["insurer"] + "<br>  " + latest["doc_type"]

fig = go.Figure()

fig.add_trace(go.Bar(
    y=latest["label"],
    x=latest["smog_index"],
    name="SMOG index",
    orientation="h",
    marker=dict(color="#888780", opacity=0.5),
    customdata=latest[["insurer", "doc_type", "smog_index"]].values,
    hovertemplate=(
        "<b>%{customdata[0]}</b><br>"
        "%{customdata[1]}<br>"
        "SMOG index: <b>%{customdata[2]:.1f}</b>"
        "<extra></extra>"
    ),
))

fig.add_trace(go.Bar(
    y=latest["label"],
    x=latest["flesch_kincaid_grade"],
    name="Flesch-Kincaid grade level",
    orientation="h",
    marker=dict(color="#3266ad", opacity=0.9),
    customdata=latest[[
        "insurer", "doc_type", "word_count",
        "flesch_reading_ease", "smog_index", "run_timestamp"
    ]].values,
    hovertemplate=(
        "<b>%{customdata[0]}</b><br>"
        "%{customdata[1]}<br>"
        "<br>"
        "Flesch-Kincaid grade: <b>%{x:.1f}</b><br>"
        "Flesch reading ease: <b>%{customdata[3]:.1f}</b><br>"
        "SMOG index: <b>%{customdata[4]:.1f}</b><br>"
        "Word count: <b>%{customdata[2]:,}</b><br>"
        "<br>"
        "<i>Scored: %{customdata[5]}</i>"
        "<extra></extra>"
    ),
))

fig.add_vline(
    x=6,
    line_dash="dash",
    line_color="#c0392b",
    line_width=1.5,
)

fig.update_layout(
    barmode="overlay",
    title=dict(
        text=(
            "Readability of private insurer privacy documents<br>"
            "<sup>Georgia primary case, documents scored June 17, 2026, "
            "dashed line = AMA recommended threshold (grade 6)</sup>"
        ),
        font=dict(size=16, color="#333"),
    ),
    xaxis=dict(
        title="Grade level required to understand document",
        range=[0, 22],
        tickvals=[0, 3, 6, 9, 12, 15, 18, 21],
        ticktext=["0", "3rd", "6th", "9th", "12th", "15th", "18th", "21st"],
        gridcolor="#eeeeee",
        gridwidth=1,
    ),
    yaxis=dict(
        title=None,
        automargin=True,
        tickfont=dict(size=11),
    ),
    legend=dict(
        orientation="h",
        yanchor="top",
        y=-0.08,
        xanchor="center",
        x=0.5,
        font=dict(size=12),
    ),
    height=700,
    margin=dict(l=20, r=20, t=100, b=100),
    plot_bgcolor="white",
    paper_bgcolor="white",
    font=dict(family="Arial", size=12, color="#333"),
)

output_html = r"C:\Users\victo\OneDrive\Desktop\Privacy Policies\readability_chart_grouped.html"
output_png = r"C:\Users\victo\OneDrive\Desktop\Privacy Policies\readability_chart_grouped.png"

fig.write_html(output_html, include_plotlyjs="cdn")
print(f"Interactive chart saved to {output_html}")

try:
    fig.write_image(output_png, width=1200, height=700, scale=2)
    print(f"Static image saved to {output_png}")
except Exception as e:
    print(f"PNG export skipped: {e}")

fig.show()

Interactive chart saved to C:\Users\victo\OneDrive\Desktop\Privacy Policies\readability_chart_grouped.html
PNG export skipped: 
Image export using the "kaleido" engine requires the kaleido package,
which can be installed using pip:
    $ pip install -U kaleido

Readability Analysis

Overview

Metrics
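
The three scores recorded in the CSV come from `textstat`. As a rough illustration of what each one measures, the sketch below writes out the standard published formulas as plain functions of word, sentence, and syllable counts. The function names mirror the `textstat` calls used above, but these are stand-alone helpers, not the library's implementation, and the counts in `example_counts` are hypothetical. Because `textstat` applies its own rules for counting sentences and syllables, its output for a real document will generally differ somewhat from a hand calculation.

```python
import math


def flesch_kincaid_grade(words, sentences, syllables):
    """U.S. school grade level; longer sentences and words raise the grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def flesch_reading_ease(words, sentences, syllables):
    """Ease score where higher means easier to read."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def smog_index(polysyllables, sentences):
    """Grade level based on words of three or more syllables,
    normalized to a 30-sentence sample."""
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


# Hypothetical counts for illustration only; not taken from any scored document.
example_counts = {"words": 100, "sentences": 5, "syllables": 150}

scores = {
    "flesch_kincaid_grade": round(flesch_kincaid_grade(**example_counts), 2),
    "flesch_reading_ease": round(flesch_reading_ease(**example_counts), 2),
    "smog_index": round(smog_index(polysyllables=30, sentences=30), 2),
}
print(scores)
```

Both Flesch measures use the same two ratios (words per sentence and syllables per word) with different weights, which is why they tend to move in opposite directions: a higher grade level usually goes with a lower reading-ease score.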

Flesch-Kincaid Grade Level

Flesch Reading Ease

SMOG Index

Why Readability Matters for This Population

Key Findings

Methods

References

Generative AI Statement