#pip install textstat pdfplumber python-docx kaleido
This analysis scores the reading difficulty of member-facing privacy documents from five major private insurers operating in Georgia. Three validated readability metrics are applied to each document to assess whether members can meaningfully understand and act on the consent terms they are presented with.
The Flesch-Kincaid Grade Level translates readability into a US school grade level using average sentence length and average syllables per word. A score of 8 means the document requires an 8th grade reading level. The American Medical Association recommends that patient health materials be written at or below a 6th grade reading level.
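As a rough illustration of how the grade-level score is built from sentence length and syllable density, the sketch below applies the standard published Flesch-Kincaid formula to raw counts. The function name and the example counts are hypothetical and are not taken from any scored document; textstat derives its own counts from the text, so its results will differ from hand counts.

```python
def flesch_kincaid_grade(total_words: int, total_sentences: int,
                         total_syllables: int) -> float:
    """Flesch-Kincaid Grade Level computed from raw counts."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    # Standard coefficients of the Flesch-Kincaid Grade Level formula.
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Made-up counts for illustration: 100 words, 5 sentences, 150 syllables.
example_grade = flesch_kincaid_grade(100, 5, 150)
print(f"Grade level: {example_grade:.2f}")
print(f"Grades above the 6th grade recommendation: {example_grade - 6:.2f}")
```

Longer sentences and more syllables per word both push the grade up, which is why dense legal phrasing scores so high.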
Flesch Reading Ease is a 0–100 scale where higher scores indicate easier reading. Documents scoring below 30 are considered very difficult and typically require a college education to comprehend. Scores of 60–70 are considered standard and accessible to most adults. The scale has historically been used in state insurance regulation to set minimum readability requirements for policy documents.
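The scale can be sketched the same way. The example below uses the standard Flesch Reading Ease formula plus a lookup of the conventional difficulty bands. The function names, the example counts, and the exact handling of band boundaries are assumptions made for illustration.

```python
def flesch_reading_ease(total_words: int, total_sentences: int,
                        total_syllables: int) -> float:
    """Flesch Reading Ease computed from raw counts."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def difficulty_band(score: float) -> str:
    """Map a score to a conventional Flesch band (boundary handling is assumed)."""
    bands = [
        (30, "very difficult"),
        (50, "difficult"),
        (60, "fairly difficult"),
        (70, "standard"),
        (80, "fairly easy"),
        (90, "easy"),
    ]
    for upper_bound, label in bands:
        if score < upper_bound:
            return label
    return "very easy"


# Made-up counts for illustration: 100 words, 5 sentences, 150 syllables.
score = flesch_reading_ease(100, 5, 150)
print(f"{score:.1f} -> {difficulty_band(score)}")
```

Unlike the grade-level metrics, this scale runs in the opposite direction: a lower number means a harder document.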
The Simple Measure of Gobbledygook (SMOG) estimates the years of education required to understand a document based on the density of polysyllabic words, meaning words with three or more syllables. It is considered more accurate than Flesch-Kincaid for health materials specifically (McLaughlin, 1969) and is the metric most commonly used in health literacy research.
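A minimal sketch of the SMOG calculation follows, using the commonly cited form of the formula that normalizes the polysyllable count to a 30-sentence sample. The function name and the counts are hypothetical; textstat counts sentences and syllables itself, so its output for a real document will not match hand counts exactly.

```python
import math


def smog_index(polysyllable_count: int, sentence_count: int) -> float:
    """SMOG grade from the number of 3+ syllable words and the sentence count."""
    if sentence_count <= 0:
        raise ValueError("need at least one sentence")
    # Scale the polysyllable count to a 30-sentence sample before the square root.
    normalized = polysyllable_count * (30 / sentence_count)
    return 1.0430 * math.sqrt(normalized) + 3.1291


# Made-up counts for illustration: 36 polysyllabic words in 30 sentences.
print(f"SMOG index: {smog_index(36, 30):.2f}")
```

Because only polysyllabic words enter the formula, a document full of long regulatory terms scores high even when its sentences are short.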
One in five US adults may find dense consent documents difficult to read, making truly informed consent a challenge (Kutner et al., 2006). Lower health literacy is more prevalent among racial and ethnic minorities, older adults, people with low educational attainment, and people with chronic illness (Berkman et al., 2011).
A study of IRB consent form text from 114 US medical schools found a mean Flesch-Kincaid score of grade 10.6, which is 4.6 grade levels above the AMA's 6th grade recommendation (Paasche-Orlow et al., 2003).
People navigating insurance consent during periods of housing instability, reentry from incarceration, or recovery from substance use disorder face compounded barriers. They are disproportionately likely to have lower health literacy, less stable access to the internet or printing, and less capacity to engage in multi-step bureaucratic opt-out processes. For this population, a postgraduate-level consent document is not an inconvenience. It is a structural barrier to meaningful consent.
All ten documents in the Georgia primary case exceed the AMA recommended 6th grade threshold. Scores range from grade 6.7 (UnitedHealthcare HIPAA Notice) to grade 18.4 (UnitedHealthcare Online Privacy Policy).
The documents governing the broadest data collection and offering the least member control are consistently the hardest to read. The two most extreme examples belong to the same insurer. UnitedHealthcare's HIPAA notice is the most readable document in the dataset at grade 6.7 while its online privacy policy is the hardest at grade 18.4. The HIPAA notice covers the narrowest data practices. The online privacy policy covers location data, behavioral tracking, device identifiers, and third-party data combining with no unified opt-out.
The Cigna Gramm-Leach-Bliley notice scores grade 16.5. It is the only document in the dataset that explicitly states members cannot limit data sharing. The hardest documents to read contain the most harmful terms.
The Anthem BCBS Spanish notice scores grade 12.4, harder than the English version at grade 9.9. Spanish-speaking members receive a harder document than English-speaking members governed by identical terms.
Text was extracted from PDF documents using pdfplumber and from Word documents
using python-docx. Readability scores were computed using the textstat Python
library. Each document was scored in full without section-level filtering. All runs
are timestamped to enable reproducibility and comparison across document versions.
See scripts/readability_scoring.py for the full scoring script.
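Because every run is appended to the same CSV with its own timestamp, scores for the same document can be compared across runs. The sketch below shows one way to do that with the standard library. The `grade_changes` helper, the insurer name `ExampleCo`, and all values in `SAMPLE_CSV` are invented for illustration; only the column names follow the scoring script.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt with two runs of one document and one run of another.
SAMPLE_CSV = """run_timestamp,insurer,state,doc_type,flesch_kincaid_grade
2026-01-01 09:00:00,ExampleCo,Georgia,HIPAA Notice,10.2
2026-06-01 09:00:00,ExampleCo,Georgia,HIPAA Notice,9.1
2026-06-01 09:00:00,ExampleCo,Georgia,Web Privacy Policy,15.0
"""


def grade_changes(csv_text: str) -> dict:
    """Return latest-minus-earliest grade for each document scored at least twice."""
    runs = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["insurer"], row["state"], row["doc_type"])
        runs[key].append((row["run_timestamp"], float(row["flesch_kincaid_grade"])))

    changes = {}
    for key, scored in runs.items():
        if len(scored) < 2:
            continue  # a single run gives nothing to compare
        # "YYYY-MM-DD HH:MM:SS" strings sort chronologically as plain text.
        scored.sort()
        changes[key] = round(scored[-1][1] - scored[0][1], 2)
    return changes


for document, delta in grade_changes(SAMPLE_CSV).items():
    print(document, f"{delta:+.2f} grade levels")
```

A negative change means the newer version of the document is easier to read than the older one.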
### Readability Scoring
import pdfplumber
import textstat
import csv
import os
from docx import Document
from datetime import datetime
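def extract_text_from_pdf(path):
    """Return the text of all pages in a PDF, joined with spaces."""
    with pdfplumber.open(path) as pdf:
        # Extract each page once; pages with no extractable text are skipped below.
        page_texts = [page.extract_text() for page in pdf.pages]
    return " ".join(text for text in page_texts if text)


def extract_text_from_docx(path):
    """Return the text of all non-empty paragraphs in a Word document."""
    doc = Document(path)
    return " ".join(para.text for para in doc.paragraphs if para.text)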
documents = [
{"insurer": "Aetna", "state": "Georgia", "doc_type": "Web Privacy Policy", "path": r"C:\Users\victo\OneDrive\Desktop\Privacy Policies\Aetna privacy.docx"},
{"insurer": "Anthem BCBS", "state": "Georgia", "doc_type": "HIPAA Notice", "path": r"C:\Users\victo\OneDrive\Desktop\Privacy Policies\anthem BCBS privacy practices.pdf"},
{"insurer": "Anthem BCBS", "state": "Georgia", "doc_type": "HIPAA Notice (Spanish)", "path": r"C:\Users\victo\OneDrive\Desktop\Privacy Policies\anthem BCBS privacy spanish.pdf"},
{"insurer": "Cigna", "state": "Georgia", "doc_type": "Data Sharing Notice", "path": r"C:\Users\victo\OneDrive\Desktop\Privacy Policies\Cigna privacy data sharing.docx"},
{"insurer": "Cigna", "state": "Georgia", "doc_type": "Global Health Benefits Notice", "path": r"C:\Users\victo\OneDrive\Desktop\Privacy Policies\cigna-global-health-benefits-privacy-notice-eng_copy.pdf"},
{"insurer": "Cigna", "state": "Georgia", "doc_type": "HIPAA Notice", "path": r"C:\Users\victo\OneDrive\Desktop\Privacy Policies\cigna-health-care-and-cigna-supplemental-benefits-privacy-notice-eng_copy.pdf"},
{"insurer": "Cigna", "state": "Georgia", "doc_type": "GLB Notice", "path": r"C:\Users\victo\OneDrive\Desktop\Privacy Policies\gramm-leach-bliley-act-privacy-notice_copy.pdf"},
{"insurer": "Humana", "state": "Georgia", "doc_type": "HIPAA Notice", "path": r"C:\Users\victo\OneDrive\Desktop\Privacy Policies\humana privacy practices.pdf"},
{"insurer": "UnitedHealthcare", "state": "Georgia", "doc_type": "Web Privacy Policy", "path": r"C:\Users\victo\OneDrive\Desktop\Privacy Policies\UHC privacy.docx"},
{"insurer": "UnitedHealthcare", "state": "Georgia", "doc_type": "HIPAA Notice", "path": r"C:\Users\victo\OneDrive\Desktop\Privacy Policies\united hipaa privacy.pdf"},
]
run_timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
results = []
for doc in documents:
    print(f"Scoring: {doc['insurer']} - {doc['doc_type']}")
    try:
        # Choose the extractor from the file extension (case-insensitive).
        path_lower = doc["path"].lower()
        if path_lower.endswith(".pdf"):
            text = extract_text_from_pdf(doc["path"])
        elif path_lower.endswith(".docx"):
            text = extract_text_from_docx(doc["path"])
        else:
            print(f"Unsupported file type: {doc['path']}")
            continue
        results.append({
            "run_timestamp": run_timestamp,
            "insurer": doc["insurer"],
            "state": doc["state"],
            "doc_type": doc["doc_type"],
            "word_count": textstat.lexicon_count(text),
            "flesch_kincaid_grade": round(textstat.flesch_kincaid_grade(text), 2),
            "flesch_reading_ease": round(textstat.flesch_reading_ease(text), 2),
            "smog_index": round(textstat.smog_index(text), 2),
            "path": doc["path"],
        })
    except Exception as e:
        # Report the failure and continue with the remaining documents.
        print(f"Error processing {doc['path']}: {e}")
output_path = r"C:\Users\victo\OneDrive\Desktop\Privacy Policies\readability_scores.csv"
Scoring: Aetna - Web Privacy Policy
Scoring: Anthem BCBS - HIPAA Notice
Scoring: Anthem BCBS - HIPAA Notice (Spanish)
Scoring: Cigna - Data Sharing Notice
Scoring: Cigna - Global Health Benefits Notice
Scoring: Cigna - HIPAA Notice
Scoring: Cigna - GLB Notice
Scoring: Humana - HIPAA Notice
Scoring: UnitedHealthcare - Web Privacy Policy
Scoring: UnitedHealthcare - HIPAA Notice
existing_rows = []
if os.path.exists(output_path):
    with open(output_path, "r", newline="") as f:
        reader = csv.DictReader(f)
        existing_rows = list(reader)
existing_keys = {
    (row["insurer"], row["state"], row["doc_type"], row["run_timestamp"])
    for row in existing_rows
}
new_rows = [
    r for r in results
    if (r["insurer"], r["state"], r["doc_type"], r["run_timestamp"])
    not in existing_keys
]
all_rows = existing_rows + new_rows
with open(output_path, "w", newline="") as f:
    fieldnames = ["run_timestamp", "insurer", "state", "doc_type",
                  "word_count", "flesch_kincaid_grade", "flesch_reading_ease",
                  "smog_index", "path"]
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(all_rows)
print(f"\nRun timestamp: {run_timestamp}")
print(f"New rows added: {len(new_rows)}")
print(f"Total rows in file: {len(all_rows)}")
print(f"Results saved to {output_path}")
Run timestamp: 2026-06-17 00:15:55
New rows added: 10
Total rows in file: 10
Results saved to C:\Users\victo\OneDrive\Desktop\Privacy Policies\readability_scores.csv
### Visualizations
import pandas as pd
import plotly.graph_objects as go
df = pd.read_csv(r"C:\Users\victo\OneDrive\Desktop\Privacy Policies\readability_scores.csv")
# Keep only the most recent run for each document.
latest = (
    df.sort_values("run_timestamp", ascending=False)
    .drop_duplicates(subset=["insurer", "state", "doc_type"])
)
# Rank insurers by their hardest document (highest Flesch-Kincaid grade).
insurer_max = (
    latest.groupby("insurer")["flesch_kincaid_grade"]
    .max()
    .sort_values(ascending=False)
)
insurer_order = {insurer: i for i, insurer in enumerate(insurer_max.index)}
latest["insurer_rank"] = latest["insurer"].map(insurer_order)
latest = latest.sort_values(
    ["insurer_rank", "flesch_kincaid_grade"],
    ascending=[False, False]
)
latest["label"] = latest["insurer"] + "<br> " + latest["doc_type"]
fig = go.Figure()
fig.add_trace(go.Bar(
    y=latest["label"],
    x=latest["smog_index"],
    name="SMOG index",
    orientation="h",
    marker=dict(color="#888780", opacity=0.5),
    customdata=latest[["insurer", "doc_type", "smog_index"]].values,
    hovertemplate=(
        "<b>%{customdata[0]}</b><br>"
        "%{customdata[1]}<br>"
        "SMOG index: <b>%{customdata[2]:.1f}</b>"
        "<extra></extra>"
    ),
))
fig.add_trace(go.Bar(
    y=latest["label"],
    x=latest["flesch_kincaid_grade"],
    name="Flesch-Kincaid grade level",
    orientation="h",
    marker=dict(color="#3266ad", opacity=0.9),
    customdata=latest[[
        "insurer", "doc_type", "word_count",
        "flesch_reading_ease", "smog_index", "run_timestamp"
    ]].values,
    hovertemplate=(
        "<b>%{customdata[0]}</b><br>"
        "%{customdata[1]}<br>"
        "<br>"
        "Flesch-Kincaid grade: <b>%{x:.1f}</b><br>"
        "Flesch reading ease: <b>%{customdata[3]:.1f}</b><br>"
        "SMOG index: <b>%{customdata[4]:.1f}</b><br>"
        "Word count: <b>%{customdata[2]:,}</b><br>"
        "<br>"
        "<i>Scored: %{customdata[5]}</i>"
        "<extra></extra>"
    ),
))
fig.add_vline(
    x=6,
    line_dash="dash",
    line_color="#c0392b",
    line_width=1.5,
)
fig.update_layout(
    barmode="overlay",
    title=dict(
        text=(
            "Readability of private insurer privacy documents<br>"
            "<sup>Georgia primary case, documents scored June 17, 2026, "
            "dashed line = AMA recommended threshold (grade 6)</sup>"
        ),
        font=dict(size=16, color="#333"),
    ),
    xaxis=dict(
        title="Grade level required to understand document",
        range=[0, 22],
        tickvals=[0, 3, 6, 9, 12, 15, 18, 21],
        ticktext=["0", "3rd", "6th", "9th", "12th", "15th", "18th", "21st"],
        gridcolor="#eeeeee",
        gridwidth=1,
    ),
    yaxis=dict(
        title=None,
        automargin=True,
        tickfont=dict(size=11),
    ),
    legend=dict(
        orientation="h",
        yanchor="top",
        y=-0.08,
        xanchor="center",
        x=0.5,
        font=dict(size=12),
    ),
    height=700,
    margin=dict(l=20, r=20, t=100, b=100),
    plot_bgcolor="white",
    paper_bgcolor="white",
    font=dict(family="Arial", size=12, color="#333"),
)
output_html = r"C:\Users\victo\OneDrive\Desktop\Privacy Policies\readability_chart_grouped.html"
output_png = r"C:\Users\victo\OneDrive\Desktop\Privacy Policies\readability_chart_grouped.png"
fig.write_html(output_html, include_plotlyjs="cdn")
print(f"Interactive chart saved to {output_html}")
try:
    fig.write_image(output_png, width=1200, height=700, scale=2)
    print(f"Static image saved to {output_png}")
except Exception as e:
    print(f"PNG export skipped: {e}")
fig.show()
Interactive chart saved to C:\Users\victo\OneDrive\Desktop\Privacy Policies\readability_chart_grouped.html
PNG export skipped:
Image export using the "kaleido" engine requires the kaleido package,
which can be installed using pip:
$ pip install -U kaleido
Berkman, N. D., Sheridan, S. L., Donahue, K. E., Halpern, D. J., & Crotty, K. (2011). Low health literacy and health outcomes: An updated systematic review. Annals of Internal Medicine, 155(2), 97–107. https://doi.org/10.7326/0003-4819-155-2-201107190-00005
Kutner, M., Greenberg, E., Jin, Y., & Paulsen, C. (2006). The health literacy of America's adults: Results from the 2003 National Assessment of Adult Literacy (NCES 2006–483). U.S. Department of Education, National Center for Education Statistics. https://nces.ed.gov/pubs2006/2006483.pdf
McLaughlin, G. H. (1969). SMOG grading: A new readability formula. Journal of Reading, 12(8), 639–646.
Paasche-Orlow, M. K., Taylor, H. A., & Brancati, F. L. (2003). Readability standards for informed-consent forms as compared with actual readability. New England Journal of Medicine, 348(8), 721–726. https://doi.org/10.1056/NEJMsa021212
Rudd, R. E. (2007). Health literacy skills of U.S. adults. American Journal of Health Behavior, 31(Suppl 1), S8–S18.
Portions of this analysis were developed with assistance from Claude (Anthropic, claude-sonnet-4-6). AI assistance was used for code generation, literature identification, and methodological scaffolding. All citations were independently verified against primary sources. Coding decisions, interpretive judgments, and research conclusions are the author's own.