Tether
When someone is sent home from the hospital, the hardest part starts. Will they take their medications correctly? Are they getting worse? Should they call the doctor or wait? Tether is a phone app that sits with the patient at home and answers those questions in their own language. Doctors write a recovery plan once; the app turns it into daily guidance, listens for early warning signs in the patient's voice, and tells family or care teams when something needs attention.
The API and LLM proxy run at the edge on Cloudflare, and a dedicated Python FastAPI biomarker engine on a separate VPS handles the heavier ML pipeline. The voice engine combines Praat acoustic analysis, YAMNet event classification, and trained ML classifiers; a Rust WebAssembly fallback compiled into the Cloudflare Worker keeps voice analysis working if the main engine is unreachable. The AI chat is grounded only in the doctor's published plan, so it never invents medical advice. Patients who can't read or type can use it entirely by voice.
Plan-Grounded AI Chat
Patients ask questions in plain language. The AI only answers from the doctor's published plan and never guesses.
Voice Biomarkers
A Python FastAPI engine runs the 3-task voice protocol: Praat clinical voice quality, YAMNet event detection, WavLM voice fingerprinting, audeering emotion, and Whisper transcription, plus 12 condition risk modules including Parkinson's, fatigue, and Alzheimer's screening. A Rust WASM fallback runs in the Cloudflare Worker.
Engine Connection
Biomarker results feed into the AI's context, so "How is my breathing?" gets a real, data-backed answer.
Red Flag Escalation
When the AI detects a red flag symptom, it marks the response urgent and suggests contacting the care team.
Protocol Library
One-click templates for pneumonia, heart failure, COPD, post-surgical recovery, and type-2 diabetes, each with medications, daily steps, and red flags pre-filled. Doctors load a template, edit, publish.
Caregiver Portal
Family members and trusted contacts get a read-only dashboard of the patient's plan, latest biomarkers, and 7-day medication adherence. Patients add caregivers by email, and access is strictly opt-in.
In Plain English
The problem
Roughly one in five patients sent home from a hospital ends up back in the hospital within 30 days. The main reasons are not surprising: people forget medication doses, miss the early signs that things are going wrong, or do not know whether a symptom is normal recovery or a real warning. Doctors give a printed discharge summary, but it sits in a drawer. Family caregivers want to help but rarely have visibility into the plan or how recovery is going.
What Tether does
Tether is three things in one app:
- A pocket version of the discharge plan. The doctor writes the plan once. The patient sees daily medications, daily activities, red-flag symptoms to watch for, and follow-up dates. Everything is in plain language and can be read out loud in their language.
- An AI assistant that only knows the doctor's plan. The patient asks "Can I take Tylenol?" or "Should I worry about this chest pain?" and the assistant answers using only what is in the plan. It never makes up medical advice. If the question matches a red-flag symptom the doctor listed, the answer is marked urgent.
- A voice check that listens for trouble. The patient completes a short voice check (about 40 seconds: a held "ahhh", a short reading, and a spoken check-in). A signal-processing engine (a Python service, with a Rust fallback on Cloudflare's edge servers) measures breathing rate, cough patterns, voice fatigue, and clinical voice-quality markers (jitter, shimmer, HNR). The numbers are tracked over time, so a small change against the patient's own baseline can flag a problem the patient does not notice.
Who is in the loop
- Patients get the daily guidance and assistant chat.
- Doctors see a recovery score dashboard sorted by risk, plus the patient's biomarker trends and adherence history.
- Family caregivers get a read-only dashboard if the patient invites them by email. They can see the plan, the last few biomarker readings, and which medication doses were taken.
Why it works without violating privacy
Voice recordings leave the device only as raw PCM audio sent to the analysis endpoint, and they are not stored after analysis; only the numerical biomarker results are saved. Chat requests go through a Cloudflare Worker, so the LLM provider key never ships in the app. Passwords are hashed server-side with PBKDF2-SHA256 (100k iterations, per-user salt) and are never persisted as plaintext anywhere in the system. Caregiver access is opt-in and revocable by the patient.
What it does not do
- It is not a replacement for a doctor. The assistant cannot prescribe, diagnose, or give advice outside the published plan.
- It is not a medical device. Current biomarker accuracy is good enough to spot trends and prompt human review, not to make standalone clinical decisions.
- It has not yet been through a HIPAA compliance audit. The technical foundation is in place (encryption, no PHI in logs), but the compliance audit and BAAs are part of the funded roadmap.
How It Works
Tether keeps patients and doctors connected after a hospital discharge. Here is the simple version:
1. The doctor creates a recovery plan
Before the patient leaves the hospital, their doctor opens Tether and fills in a personalized care plan: diagnosis, medications, daily instructions, warning signs to watch for, and a follow-up date. The doctor also picks a communication tone (calm, direct, or reassuring) so the app speaks the way the patient is most comfortable with.
2. The patient gets a personal AI companion
When the patient logs in, they see their plan and can ask questions in plain language, by typing or speaking. The AI only answers using information from the doctor's plan, never guessing or making things up. Every response includes a readability score so caregivers can verify the language is easy enough to understand.
3. Voice biomarkers track recovery
The patient runs a quick standardised 3-task voice protocol (~40 seconds): hold "ahhh" for 5 seconds, read a fixed sentence from the Rainbow Passage, then a 10-second symptom check-in. Tether's Python FastAPI engine at biomarker.arhan.dev analyses the audio for breathing rate, cough patterns, vocal energy, voice tremor, articulation precision, and signs of twelve different conditions: respiratory infection, common cold, cardiovascular stress, voice pathology, neurological signs (including Parkinson's-style patterns), fatigue, sleep-disordered breathing, anxiety/panic, hyperventilation, mild dehydration, vocal overuse, and Alzheimer's / MCI screening. The engine also produces a single 0-100 Healthy Voice Index headline number, estimates voice age, and computes cross-task contrasts that no single clip can capture. Patients see results in seconds; doctors see the full audit trail. These biomarkers are tracked over time so the doctor can spot trends without an in-person visit.
4. The two engines talk to each other
This is what makes Tether different. The voice biomarker results are automatically shared with the AI companion. So if the patient asks "How is my breathing?", the AI already knows the latest voice check showed an elevated breathing rate and can give a relevant, grounded answer instead of a generic one.
5. The app says "I don't know" when it isn't sure
Every Tether prediction comes with a confidence label. When the model falls between "definitely positive" and "definitely negative" (what statisticians call the inconclusive band), the app says so directly: "We couldn't tell from this recording. Try a longer sample in a quieter room." Most voice biomarker tools force a yes/no answer even when they're uncertain; Tether is honest about the gray zone. Clinicians trust models that admit doubt.
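A minimal sketch of the three-way decision, assuming the classifier emits a calibrated probability and using two hypothetical band edges (0.35 and 0.65). The engine's real cut-offs are not documented here, and `classifyWithAbstention` and `inconclusiveMessage` are illustrative names.

```typescript
// Hypothetical band edges; the engine's real cut-offs are not documented in this README.
type Verdict = "negative" | "inconclusive" | "positive";

interface Bands {
  negativeBelow: number; // probabilities below this count as "definitely negative"
  positiveAbove: number; // probabilities above this count as "definitely positive"
}

const DEFAULT_BANDS: Bands = { negativeBelow: 0.35, positiveAbove: 0.65 };

function classifyWithAbstention(probability: number, bands: Bands = DEFAULT_BANDS): Verdict {
  if (!Number.isFinite(probability) || probability < 0 || probability > 1) {
    throw new RangeError(`probability must be in [0, 1], got ${probability}`);
  }
  if (probability < bands.negativeBelow) return "negative";
  if (probability > bands.positiveAbove) return "positive";
  return "inconclusive"; // the gray zone: say so instead of forcing yes/no
}

function inconclusiveMessage(verdict: Verdict): string | null {
  return verdict === "inconclusive"
    ? "We couldn't tell from this recording. Try a longer sample in a quieter room."
    : null;
}
```

The band edges are inclusive on the inconclusive side, so a probability sitting exactly on an edge is reported as uncertain rather than rounded toward a verdict.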
6. The app catches bad recordings before they confuse anyone
Before analyzing audio, Tether checks the recording itself: is it too quiet, clipped, mostly silence, or full of background noise? If the recording isn't usable, the app tells the patient exactly what went wrong ("too much background noise, find a quieter room") and offers a one-tap retry. No more misleading reports based on a recording that was never going to work. There's also a live microphone level meter while you record, so you can see your voice reaching the phone in real time.
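These checks can be sketched as simple statistics over mono PCM samples in [-1, 1]. Every threshold below (-45 dBFS for too quiet, 1% clipped samples, 60% silent 20 ms frames, a 15 dB loudest-vs-quietest frame spread as a crude noise proxy) is a hypothetical value, and `checkRecordingQuality` is an illustrative name rather than Tether's API.

```typescript
type RejectReason = "too_quiet" | "clipping" | "mostly_silence" | "too_noisy";

interface QualityResult {
  ok: boolean;
  reasons: RejectReason[];
}

// Convert an RMS amplitude to dBFS, flooring at a tiny value to avoid log(0).
const toDb = (rms: number): number => 20 * Math.log10(Math.max(rms, 1e-10));

// RMS level (dBFS) of consecutive non-overlapping frames.
function frameLevelsDb(samples: Float32Array, frameLen: number): number[] {
  const levels: number[] = [];
  for (let start = 0; start + frameLen <= samples.length; start += frameLen) {
    let sumSq = 0;
    for (let i = start; i < start + frameLen; i++) sumSq += samples[i] * samples[i];
    levels.push(toDb(Math.sqrt(sumSq / frameLen)));
  }
  return levels;
}

// All thresholds are hypothetical illustration values.
function checkRecordingQuality(samples: Float32Array, sampleRate: number): QualityResult {
  const frames = frameLevelsDb(samples, Math.max(1, Math.round(0.02 * sampleRate))); // 20 ms
  if (frames.length === 0) return { ok: false, reasons: ["mostly_silence"] };
  const reasons: RejectReason[] = [];

  let sumSq = 0;
  let clipped = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSq += samples[i] * samples[i];
    if (Math.abs(samples[i]) >= 0.999) clipped++;
  }
  if (toDb(Math.sqrt(sumSq / samples.length)) < -45) reasons.push("too_quiet");
  if (clipped / samples.length > 0.01) reasons.push("clipping");

  const silentFrames = frames.filter((db) => db < -50).length;
  if (silentFrames / frames.length > 0.6) reasons.push("mostly_silence");

  // Crude noise proxy: 90th-percentile frame level vs 10th-percentile (the noise floor).
  const sorted = [...frames].sort((a, b) => a - b);
  const floorDb = sorted[Math.floor(sorted.length * 0.1)];
  const speechDb = sorted[Math.floor(sorted.length * 0.9)];
  if (!reasons.includes("mostly_silence") && speechDb - floorDb < 15) reasons.push("too_noisy");

  return { ok: reasons.length === 0, reasons };
}
```

A real noise check would estimate SNR more carefully; the percentile spread only illustrates the idea that speech should stand well above the room's noise floor.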
7. Patients see their voice over time, not just today
Tether's history view shows every previous recording with little trend charts for each measurement (jitter, voice clarity, breathing rate, pitch, energy, and more). Each metric has a direction arrow: green if it's holding steady or improving, amber if it's drifting, red if it's clearly worse than the patient's own baseline. For each tracked condition, a separate trend screen shows the 14-day trajectory of the risk score and flags anything that's been climbing three readings in a row. A snapshot is a parlor trick; a trend is medicine.
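A sketch of the arrow colours and the three-in-a-row rule. It assumes arrows are driven by a z-score against the patient's own baseline (hypothetical cut-offs of 1 and 2 standard deviations) and that "climbing three readings in a row" means the last three risk scores strictly increase; neither rule is confirmed by the text.

```typescript
type Arrow = "green" | "amber" | "red";

const mean = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;

// Sample standard deviation (n - 1 denominator).
function sampleStd(xs: number[]): number {
  const m = mean(xs);
  return Math.sqrt(xs.reduce((acc, x) => acc + (x - m) ** 2, 0) / (xs.length - 1));
}

// Hypothetical cut-offs: under 1 SD in the bad direction is green, 1 to 2 SD amber, beyond that red.
function trendArrow(baseline: number[], latest: number, higherIsWorse: boolean): Arrow {
  if (baseline.length < 2) return "green"; // not enough history to judge drift
  const z = (latest - mean(baseline)) / Math.max(sampleStd(baseline), 1e-9);
  const worsening = higherIsWorse ? z : -z; // positive means moving in the bad direction
  if (worsening < 1) return "green";
  if (worsening < 2) return "amber";
  return "red";
}

// Assumption: "climbing three readings in a row" means the last three risk scores strictly increase.
function climbingThreeInARow(riskScores: number[]): boolean {
  const n = riskScores.length;
  return n >= 3 && riskScores[n - 3] < riskScores[n - 2] && riskScores[n - 2] < riskScores[n - 1];
}
```

The `higherIsWorse` flag lets one function serve metrics like jitter (higher is worse) and HNR (lower is worse).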
8. The patient tags how they feel, not just how they sound
Right after every recording, Tether pops up a quick chip selector: Tired? Headache? Cough? Sore throat? Short of breath? Stressed? Just checking in? Patients tap whatever applies (or skip if they're in a hurry), and the tags are saved alongside the acoustic measurement. The doctor reads voice and symptoms together, which is the only way to interpret either responsibly. This also builds Tether's private dataset over time, which becomes invaluable for future model improvements.
9. Share with any doctor, not just Tether ones
Every biomarker report has a "Share PDF" button that generates a clean printable summary (voice quality measurements, classifier confidence, condition risks, and a clinical disclaimer) and opens the phone's share sheet. Patients can email it to their primary-care physician, attach it to their existing electronic health record, or print it for an in-person visit. PDFs are the universal language of healthcare; every clinic can read one.
10. Humans stay in the loop
If the AI cannot fully answer a question, it suggests the patient message their doctor directly. Doctors see these messages in real time and can reply. The AI never replaces the doctor; it bridges the gap between hospital visits so patients are never left guessing alone.
11. Works in the patient's language
Patients can switch between 26 languages: English, Spanish, Hindi, Mandarin, French, Arabic, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Bengali, Urdu, Tagalog, Swahili, Turkish, Polish, Dutch, Greek, Hebrew, Thai, Indonesian, Punjabi, and Ukrainian. The AI responds and speaks in their chosen language, removing a major barrier to understanding medical instructions after discharge.
12. Built to keep working when something breaks
If the Python biomarker engine on biomarker.arhan.dev has a problem, Tether automatically falls back to the Rust WASM engine compiled into the Cloudflare Worker and keeps working, so patients never see a broken app. The fallback returns the same JSON shape, so downstream consumers don't need to special-case it. If a single screen crashes, the rest of the app is unaffected; only that screen shows a "try again" message. Every recording is content-fingerprinted, so the same audio submitted again returns instantly from cache. Every patient's audit log is cryptographically chained, so any tampering with past records is detectable.
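The tamper-evident log can be illustrated as a hash chain: each entry stores the SHA-256 of the previous entry, so editing a past record breaks the links after it. This sketch uses Node's `node:crypto` for brevity (a Worker would use Web Crypto), and the field layout, `appendEntry`, `verifyChain`, and `audioFingerprint` are hypothetical. The same hash applied to raw audio bytes gives a content fingerprint usable as a cache key.

```typescript
import { createHash } from "node:crypto";

interface AuditEntry {
  actor: string;
  action: string;
  target: string;
  at: string; // ISO timestamp
  prevHash: string; // hash of the previous entry ("" for the first)
  hash: string; // SHA-256 over this entry's fields plus prevHash
}

// Hash a JSON array of the fields so field boundaries are unambiguous.
function entryHash(e: Omit<AuditEntry, "hash">): string {
  return createHash("sha256")
    .update(JSON.stringify([e.actor, e.action, e.target, e.at, e.prevHash]))
    .digest("hex");
}

function appendEntry(log: AuditEntry[], actor: string, action: string, target: string, at: string): AuditEntry {
  const prevHash = log.length > 0 ? log[log.length - 1].hash : "";
  const unsigned = { actor, action, target, at, prevHash };
  const entry: AuditEntry = { ...unsigned, hash: entryHash(unsigned) };
  log.push(entry);
  return entry;
}

// Editing an entry, or inserting/removing one mid-chain, breaks a hash or a prevHash link.
// (Truncating the tail needs an external anchor, such as a separately stored head hash.)
function verifyChain(log: AuditEntry[]): boolean {
  return log.every((e, i) => e.prevHash === (i === 0 ? "" : log[i - 1].hash) && e.hash === entryHash(e));
}

// Content fingerprint for the analysis cache: identical audio bytes give an identical key.
function audioFingerprint(pcm: Uint8Array): string {
  return createHash("sha256").update(pcm).digest("hex");
}
```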
13. Real authentication, not just an email field
Every Tether session is an HMAC-signed bearer token issued by the Cloudflare Worker. Passwords are stored with PBKDF2-SHA256, a per-user salt, and 100,000 iterations; they're never readable, even by us. Login attempts are rate-limited and locked after five failures. Old patient-data endpoints used to trust the email in the URL; now every endpoint checks the bearer token, verifies the caller's role, and blocks any patient from reading another patient's data. Doctors see only their assigned patients; caregivers see only patients who have explicitly consented.
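A sketch of the `payload.signature` token shape (both halves base64url) described under Authentication, using HMAC-SHA256 from `node:crypto`; a Worker would do the same with Web Crypto. The claim names (`sub`, `role`, `sid`, `exp`) are assumptions, and the server-side revocation lookup against the Durable Object's session list is left out.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical claim set; the README only specifies the payload.signature shape.
interface SessionClaims {
  sub: string; // user id
  role: string; // e.g. "doctor", "patient", "caregiver"
  sid: string; // session id, checked against the server-side session list (not shown)
  exp: number; // expiry, seconds since epoch
}

const sign = (payloadB64: string, key: string): string =>
  createHmac("sha256", key).update(payloadB64).digest("base64url");

function issueToken(claims: SessionClaims, key: string): string {
  const payload = Buffer.from(JSON.stringify(claims), "utf8").toString("base64url");
  return `${payload}.${sign(payload, key)}`;
}

function verifyToken(token: string, key: string, nowSec: number): SessionClaims | null {
  const parts = token.split(".");
  if (parts.length !== 2) return null;
  const [payload, sig] = parts;
  const expected = Buffer.from(sign(payload, key));
  const given = Buffer.from(sig);
  // Constant-time comparison; timingSafeEqual throws on length mismatch, so check first.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8")) as SessionClaims;
  return claims.exp > nowSec ? claims : null;
}
```

Because the signature covers the encoded payload, changing any claim (for example swapping `sub` to another patient) invalidates the token unless the attacker also holds the signing key.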
14. Doctor's "needs attention today" queue
Doctors don't have to scroll through every patient to figure out who looks worse. The risk queue ranks patients by a composite score that combines the most recent biomarker status, the deviation trend across the last five recordings, missed medications in the last seven days, unread patient messages, days since last recording, and any distress signals in their journal entries. Each entry shows why the patient is flagged: not just a number, but the actual reasons ("voice deviation rising 22% → 47%", "missed 4 of 7 doses this week").
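One way such a queue could be assembled, keeping a human-readable reason next to every point added to the score. All weights and caps below are hypothetical; the README does not publish the real formula, and the field names are illustrative.

```typescript
type Status = "normal" | "monitor" | "alert";

interface PatientSignals {
  name: string;
  latestStatus: Status;
  deviationPct: number[]; // deviation per recording, oldest first
  missedDoses7d: number;
  scheduledDoses7d: number;
  unreadMessages: number;
  daysSinceLastRecording: number;
  journalDistressEntries: number;
}

interface QueueEntry {
  name: string;
  score: number;
  reasons: string[];
}

const STATUS_POINTS: Record<Status, number> = { normal: 0, monitor: 15, alert: 35 };

function scorePatient(p: PatientSignals): QueueEntry {
  let score = STATUS_POINTS[p.latestStatus];
  const reasons: string[] = [];
  if (score > 0) reasons.push(`latest voice check: ${p.latestStatus}`);

  const d = p.deviationPct.slice(-5); // last five recordings
  if (d.length >= 2 && d[d.length - 1] > d[0]) {
    score += Math.min(20, (d[d.length - 1] - d[0]) / 2);
    reasons.push(`voice deviation rising ${d[0]}% → ${d[d.length - 1]}%`);
  }
  if (p.missedDoses7d > 0) {
    score += 20 * (p.missedDoses7d / Math.max(1, p.scheduledDoses7d));
    reasons.push(`missed ${p.missedDoses7d} of ${p.scheduledDoses7d} doses this week`);
  }
  if (p.unreadMessages > 0) {
    score += Math.min(10, 3 * p.unreadMessages);
    reasons.push(`${p.unreadMessages} unread message(s)`);
  }
  if (p.daysSinceLastRecording >= 3) {
    score += Math.min(10, p.daysSinceLastRecording);
    reasons.push(`no recording for ${p.daysSinceLastRecording} days`);
  }
  if (p.journalDistressEntries > 0) {
    score += Math.min(10, 5 * p.journalDistressEntries);
    reasons.push(`${p.journalDistressEntries} distress signal(s) in journal`);
  }
  return { name: p.name, score, reasons };
}

// Highest score first: the "needs attention today" ordering.
function rankPatients(patients: PatientSignals[]): QueueEntry[] {
  return patients.map((p) => scorePatient(p)).sort((a, b) => b.score - a.score);
}
```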
15. Clinical escalations with a real status workflow
When a voice recording crosses into the "alert" band, Tether automatically creates an Escalation row. The doctor moves it through new → reviewed → contacted → resolved (or false_alarm), and every transition adds an optional note plus an audit-log entry. No more wondering whether anyone followed up on Mrs. Garcia's flagged recording from Tuesday: the system tracks it.
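The status workflow maps naturally onto a transition table. This sketch assumes the linear path described above, lets `false_alarm` close any open escalation, and treats `resolved` and `false_alarm` as terminal; those rules and the type names are assumptions, not Tether's schema.

```typescript
type EscalationStatus = "new" | "reviewed" | "contacted" | "resolved" | "false_alarm";

// Assumed transition rules: linear path, and false_alarm closes any open escalation.
const ALLOWED: Record<EscalationStatus, EscalationStatus[]> = {
  new: ["reviewed", "false_alarm"],
  reviewed: ["contacted", "false_alarm"],
  contacted: ["resolved", "false_alarm"],
  resolved: [],
  false_alarm: [],
};

interface Transition {
  from: EscalationStatus;
  to: EscalationStatus;
  by: string;
  at: string;
  note?: string;
}

interface Escalation {
  id: string;
  status: EscalationStatus;
  history: Transition[]; // doubles as the per-escalation audit trail in this sketch
}

function transition(esc: Escalation, to: EscalationStatus, by: string, at: string, note?: string): Escalation {
  if (!ALLOWED[esc.status].includes(to)) {
    throw new Error(`illegal transition ${esc.status} -> ${to}`);
  }
  const step: Transition = { from: esc.status, to, by, at, note };
  return { ...esc, status: to, history: [...esc.history, step] };
}
```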
16. One timeline, every event
Plans, voice recordings, messages, journal entries, medication taken/missed, escalations created and resolved, and clinician notes are all merged into a single chronological feed per patient. A doctor can answer "what's happened with this patient in the last two weeks?" in one scroll instead of tab-hopping through four screens.
17. Pilot dashboard for hospitals
At the end of a pilot, a hospital can pull a concrete one-screen summary: enrolled patients, weekly recordings, alert rate, average clinician response time, day-30 retention, adherence rate, PDF exports, and a 14-day engagement chart. These are the questions that decide whether a pilot becomes a contract.
18. Invite codes, not shared logins
Doctors generate short human-shareable invite codes (ABCD-EFGH) and send them to a patient via SMS, email, or the patient-handoff sheet at discharge. The patient redeems the code at signup, which automatically links them to the right doctor and the right hospital organization. No more "I emailed them a temporary password and hope they remembered to change it."
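A sketch of code generation and redemption-side normalisation, assuming eight letters in the ABCD-EFGH shape drawn with `randomInt` from `node:crypto`. Dropping I and O to avoid confusion with 1 and 0 is a hypothetical design choice, not something the README states.

```typescript
import { randomInt } from "node:crypto";

// Hypothetical alphabet: A-Z without I and O, so codes can't be misread as 1 or 0.
const ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ";

function generateInviteCode(): string {
  let raw = "";
  for (let i = 0; i < 8; i++) raw += ALPHABET[randomInt(ALPHABET.length)]; // CSPRNG pick
  return `${raw.slice(0, 4)}-${raw.slice(4)}`;
}

// Accept lowercase, spaces, or a missing dash when the patient types the code at signup.
function normalizeInviteCode(input: string): string | null {
  const cleaned = input.toUpperCase().replace(/[^A-Z]/g, "");
  if (cleaned.length !== 8 || [...cleaned].some((c) => !ALPHABET.includes(c))) return null;
  return `${cleaned.slice(0, 4)}-${cleaned.slice(4)}`;
}
```

With 24 letters and 8 positions there are 24^8 (about 1.1 × 10^11) possible codes; a real service would still expire codes and rate-limit redemption attempts.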
19. Take your data with you, or delete it
Settings → Export my data downloads a complete JSON bundle of everything Tether has stored about you: voice recordings (metadata + scores), journal entries, messages, care plans, and audit logs. Settings → Delete my account permanently removes all of it after you type a confirmation phrase. Both are GDPR data-subject rights; both are built into the product, not handled by emailing support.
20. Every clinically-relevant action is audited
Who viewed which patient's data, who edited a care plan, who created or resolved an escalation, who exported PDFs, every login, every failed login: all of it is stored as an immutable audit log entry tied to the actor and the target. Compliance-grade by default, not "we'll add that later."
21. No LLM key in the device, ever
Earlier builds of Tether shipped a Groq API key in the mobile bundle as a fallback path. That meant the key was extractable in seconds from any installed copy AND a full care plan + biomarker results + journal entries were going straight to a third-party LLM provider from every device, with no audit and no per-user rate limit. The direct path is now deleted. All LLM traffic goes through the Cloudflare Worker, which holds the provider key as a Workers secret, authenticates the caller's bearer token, audits the request, and only then forwards to Groq. If a build can't reach the worker, it falls back to a local rule-based reply generator, never to a network call.
22. Push, email, and SMS notifications
Clinical workflow only works if the right person gets paged at the right time. Tether now dispatches notifications through Expo Push (mobile), Resend (email), and Twilio (SMS) on four real triggers: a biomarker alert wakes the patient's assigned doctors, a new escalation pages whoever it's assigned to, a third missed-meds day in a week sends the patient a gentle reminder (capped at one nudge per week so it never becomes nag-spam), and an unacknowledged escalation past its SLA deadline auto-escalates to the backup clinician pool and then to the org admins.
23. Escalation SLAs with real workflow closure
Hospitals don't track alerts, they track closure. Every escalation now carries a due time (30 min for urgent, 4 hours for alert, 24 hours for monitor), records when a clinician first opens it, and tracks an escalation level. A background sweep promotes overdue ones from "assigned clinician" → "backup clinician pool" → "org admin" and pages each tier in turn. A doctor on vacation no longer means a missed urgent flag.
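The due-time and tier-promotion rules can be sketched as follows. The assumptions: the clock stops once a clinician first opens the escalation, each tier gets a fresh SLA window, and the sweep stops promoting at the org-admin tier. `sweep` and the field names are hypothetical.

```typescript
type Severity = "urgent" | "alert" | "monitor";

const SLA_MS: Record<Severity, number> = {
  urgent: 30 * 60_000, // 30 minutes
  alert: 4 * 60 * 60_000, // 4 hours
  monitor: 24 * 60 * 60_000, // 24 hours
};

const TIERS = ["assigned_clinician", "backup_clinician_pool", "org_admin"] as const;
type Tier = (typeof TIERS)[number];

interface SlaEscalation {
  id: string;
  severity: Severity;
  level: number; // index into TIERS
  dueAt: number; // ms timestamp
  firstOpenedAt?: number; // set when a clinician first opens it
}

function createSla(id: string, severity: Severity, now: number): SlaEscalation {
  return { id, severity, level: 0, dueAt: now + SLA_MS[severity] };
}

// Called by a periodic sweep; returns the tier to page, or null when nothing changes.
function sweep(e: SlaEscalation, now: number): Tier | null {
  if (e.firstOpenedAt !== undefined || now <= e.dueAt) return null; // acknowledged or not yet overdue
  if (e.level >= TIERS.length - 1) return null; // already at the top tier
  e.level += 1;
  e.dueAt = now + SLA_MS[e.severity]; // assumption: each tier gets a fresh window
  return TIERS[e.level];
}
```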
24. Building your voice baseline
Tether's trend signals only become meaningful after roughly five recordings; before that, single-sample variance dominates. The patient now sees a "Building your voice baseline: 1/5 recordings complete" progress bar on every recording until the threshold is hit, AND the app holds back alert-level wording on outliers during this window. The data is still collected and the doctor's risk queue still picks it up; we just don't shout at the patient about a possible signal before we have the data to back it up.
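A sketch of the baseline window, assuming that while fewer than five recordings exist the patient-facing status is softened from alert to monitor, while the unmodified engine status still reaches the doctor's queue. `baselineView` is a hypothetical helper, and softening to "monitor" is an assumed policy.

```typescript
type BiomarkerStatus = "normal" | "monitor" | "alert";

const BASELINE_TARGET = 5; // recordings before trend signals are treated as meaningful

interface BaselineView {
  progressLabel: string | null; // null once the baseline is complete
  patientFacingStatus: BiomarkerStatus;
}

// Assumption: during the baseline window, "alert" is softened to "monitor" for the patient only;
// the caller still sends the unmodified engine status to the doctor's risk queue.
function baselineView(recordingCount: number, engineStatus: BiomarkerStatus): BaselineView {
  if (recordingCount >= BASELINE_TARGET) {
    return { progressLabel: null, patientFacingStatus: engineStatus };
  }
  return {
    progressLabel: `Building your voice baseline: ${recordingCount}/${BASELINE_TARGET} recordings complete`,
    patientFacingStatus: engineStatus === "alert" ? "monitor" : engineStatus,
  };
}
```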
25. Recording quality over time
Every accepted AND rejected recording attempt is logged with its quality score and (if rejected) the reason code. A panel under the biomarker card shows the last 20 attempts as a sparkline with pass/fail colours and surfaces the most common rejection reason ("too_noisy" 4× in the last 20). When a patient asks "why did the app keep refusing my recording?", the answer is now one tap away.
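The panel's summary reduces to counting reason codes over the most recent 20 attempts. A minimal sketch with a hypothetical `Attempt` shape and an oldest-first history:

```typescript
interface Attempt {
  accepted: boolean;
  qualityScore: number;
  reason?: string; // reason code such as "too_noisy", present only when rejected
}

interface AttemptSummary {
  passRate: number;
  topReason: string | null;
  topCount: number;
}

function summarizeAttempts(attempts: Attempt[], windowSize = 20): AttemptSummary {
  const recent = attempts.slice(-windowSize); // oldest-first history, keep the newest N
  const counts = new Map<string, number>();
  for (const a of recent) {
    if (!a.accepted && a.reason) counts.set(a.reason, (counts.get(a.reason) ?? 0) + 1);
  }
  let topReason: string | null = null;
  let topCount = 0;
  for (const [reason, n] of counts) {
    if (n > topCount) {
      topReason = reason;
      topCount = n;
    }
  }
  const passRate = recent.length > 0 ? recent.filter((a) => a.accepted).length / recent.length : 0;
  return { passRate, topReason, topCount };
}
```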
26. Real FHIR import (not just counting)
Tether's FHIR R4 import now walks a Bundle and converts four resource types: Patient becomes a stub Tether user (the patient sets their own password on first login), CarePlan becomes a DoctorPlan with conflict-resolution against any existing plan, Observation becomes a biomarker history entry when it's voice-related, and DiagnosticReport becomes a clinician note so the EHR's narrative is preserved. The response reports per-resource counts plus a "skipped" list so hospital integrators can see exactly what landed and what didn't.
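The Bundle walk can be sketched as a dispatch table keyed by `resourceType`, with every unhandled or declined resource landing in the skipped list. The conversion handlers (Patient to user stub, CarePlan to DoctorPlan, and so on) are supplied by the caller and not shown; all names here are illustrative.

```typescript
interface FhirResource {
  resourceType: string;
  id?: string;
  [key: string]: unknown;
}

interface FhirBundle {
  resourceType: "Bundle";
  entry?: { resource?: FhirResource }[];
}

// A handler converts one resource and returns false if it declined (e.g. a non-voice Observation).
type Handler = (resource: FhirResource) => boolean;

interface ImportReport {
  counts: Record<string, number>;
  skipped: { resourceType: string; id?: string; reason: string }[];
}

function importBundle(bundle: FhirBundle, handlers: Map<string, Handler>): ImportReport {
  const report: ImportReport = { counts: {}, skipped: [] };
  for (const entry of bundle.entry ?? []) {
    const r = entry.resource;
    if (!r) continue;
    const handler = handlers.get(r.resourceType);
    if (!handler) {
      report.skipped.push({ resourceType: r.resourceType, id: r.id, reason: "unsupported resource type" });
    } else if (handler(r)) {
      report.counts[r.resourceType] = (report.counts[r.resourceType] ?? 0) + 1;
    } else {
      report.skipped.push({ resourceType: r.resourceType, id: r.id, reason: "declined by handler" });
    }
  }
  return report;
}
```

Keeping conversion logic in caller-supplied handlers means counting and skip reporting behave identically for every resource type.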
27. Pilot CSV export for sponsors
Pilots end with a meeting where someone asks "did this actually work?" The pilot analytics screen now has Export CSV and Share Summary buttons that dump enrolled patients, weekly active, alert rate, average clinician response time, day-30 retention, adherence, PDF exports, and the 14-day recordings curve, in a format flat enough to paste into Excel without reformatting.
28. Outcome entry + password reset + multi-tenant admin
A doctor can record readmission, ED visit, follow-up complete, or engagement check-in events directly from the app (these feed the pilot dashboard's retention and outcome metrics). Anyone can reset their password via a tokenized email link with a one-hour expiry, and resetting the password automatically revokes every existing session for that account. Admins can create organisations, add and remove members, and assign roles, so a single Tether deployment can host multiple hospitals without cross-tenant data bleed.
29. Runtime validation on every endpoint
TypeScript types caught the schema problems at our compile boundary, but the worker still accepted whatever JSON.parse produced from a request body. Now every state-changing endpoint runs the body through a Zod schema before any handler logic. A misshapen field returns HTTP 400 with the exact path and reason. A malicious payload (say, {password: {$ne: null}}) never reaches the database layer.
30. Standardised 3-task voice protocol
Every patient now walks through the same three tasks every session: hold "ahhh" for 5 seconds, read a fixed sentence from the Rainbow Passage, then a 10-second symptom check-in. Same prompts every patient, every visit. That's the only way to make jitter, shimmer, HNR, vowel-space-area, and speech-rate comparable across patients and across visits, which is what unlocks real population baselines and real drift detection. The protocol is the default action on biomarker.arhan.dev and the only recording flow on the mobile app.
31. Healthy Voice Index: one number anyone can read
The engine produces dozens of numbers; most viewers want one. The Healthy Voice Index is a 0-100 composite that ensembles trust-weighted condition risks, signal integrity, session consistency, vowel articulation index, embedding outlier distance, and recording-quality signals. Bands: excellent / good / fair / concerning / poor. Every contribution is shown in the audit trail so a clinician can see exactly how the score was assembled; there is no black box. For the 3-task protocol, a separate session-level HVI aggregates the three per-clip scores via median + consistency bonus so a single noisy clip can't tank the headline.
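A sketch of the session-level aggregation, assuming a consistency bonus of up to 5 points that fades linearly to zero once the clip scores spread by 20 points, plus hypothetical cut-offs for the five bands. Only the median-plus-bonus structure comes from the text above.

```typescript
type HviBand = "excellent" | "good" | "fair" | "concerning" | "poor";

function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Hypothetical bonus: up to +5 when clips agree, fading linearly to 0 at a 20-point spread.
function sessionHvi(clipScores: number[]): number {
  if (clipScores.length === 0) throw new RangeError("need at least one clip score");
  const spread = Math.max(...clipScores) - Math.min(...clipScores);
  const bonus = Math.max(0, 5 * (1 - spread / 20));
  return Math.min(100, Math.max(0, median(clipScores) + bonus));
}

// Hypothetical cut-offs for the five named bands.
function hviBand(score: number): HviBand {
  if (score >= 85) return "excellent";
  if (score >= 70) return "good";
  if (score >= 55) return "fair";
  if (score >= 40) return "concerning";
  return "poor";
}
```

With clip scores of 80, 82, and 20, the median keeps the headline at 80, whereas a plain mean would drop it to about 61.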
32. WavLM-base-plus voice embeddings + population centroid
Every recording produces a 768-dimensional voice fingerprint from Microsoft's WavLM-base-plus, a self-supervised speech encoder pretrained on 94,000 hours of speech. The engine maintains a per-task running centroid via Welford's algorithm, so every recording sharpens it. Once 20+ recordings exist for a task, every new recording gets a cosine-distance outlier score against the population. This is the data flywheel: every visitor makes the engine smarter for every future visitor, no labels required.
33. audeering wav2vec2 emotion model: fatigue v5
Fatigue used to come from a 42-feature engine-native classifier with 0.55 balanced accuracy on Predi-COVID: better than chance, but weak. The v5 ensemble swaps the headline signal to audeering's published wav2vec2-large MSP-DIM arousal score (concordance correlation 0.74 on the MSP-Podcast benchmark, Wagner 2023). Valence acts as a depressive-pattern modulator, the v4 classifier still votes (15% weight), and the Cummins 2015 psychomotor slowing triad confirms. Expected lift over v4: +8 to 15% BAcc, per Wang 2023's self-supervised-feature literature.
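Welford's update for the mean, `mean += (x - mean) / n`, lets the centroid absorb each new embedding without storing old ones. A sketch, assuming plain float arrays as embeddings and cosine distance (1 minus cosine similarity) as the outlier score, gated at 20 recordings; `RunningCentroid` is a hypothetical name.

```typescript
class RunningCentroid {
  readonly dim: number;
  count = 0;
  private mean: Float64Array;

  constructor(dim: number) {
    this.dim = dim;
    this.mean = new Float64Array(dim);
  }

  // Welford-style running mean: mean += (x - mean) / n, no need to keep past embeddings.
  update(x: ArrayLike<number>): void {
    if (x.length !== this.dim) throw new RangeError(`expected ${this.dim} dims, got ${x.length}`);
    this.count += 1;
    for (let i = 0; i < this.dim; i++) this.mean[i] += (x[i] - this.mean[i]) / this.count;
  }

  centroid(): Float64Array {
    return this.mean.slice();
  }

  // Cosine distance to the centroid; null until minCount recordings have been absorbed.
  outlierScore(x: ArrayLike<number>, minCount = 20): number | null {
    if (this.count < minCount) return null;
    let dot = 0;
    let nx = 0;
    let nc = 0;
    for (let i = 0; i < this.dim; i++) {
      dot += x[i] * this.mean[i];
      nx += x[i] * x[i];
      nc += this.mean[i] * this.mean[i];
    }
    if (nx === 0 || nc === 0) return null;
    return 1 - dot / Math.sqrt(nx * nc);
  }
}
```

In production the dimension would be 768 per the text; the incremental update keeps memory constant no matter how many recordings arrive.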
34. Alzheimer's / MCI screening
Twelfth condition: cognitive impairment screening from spontaneous speech. Five-stage interpretable ensemble β disfluency burden (Roark 2011), lexical impoverishment (Bucks 2000), affective flattening (Themistocleous 2018, Konig 2018), voice quality (Lopez-de-Ipina 2015), and an optional ADReSS-trained ML head when training data lands. Every threshold cites a published clinical paper. Honestly framed as a screening signal, not a diagnosis: a positive flag prompts clinical follow-up (neuro exam, MRI/PET, CSF), not a label.
35. True vowel-space area + Sapir VAI on the reading task
Because the reading task is a fixed sentence, the engine knows exactly which words the patient is saying. It uses Whisper's word-level timestamps to locate the three corner vowels (/æ/ in "act", /ɪ/ in "prism", /oʊ/ in "rainbow"), extracts F1/F2 from each via Praat, and computes the true triangular vowel-space area plus Sapir's Vowel Articulation Index, the canonical hypokinetic-dysarthria metrics (Skodda 2011, Sapir 2010, Rusz 2013). Shrunken VSA / depressed VAI is the strongest published voice biomarker for Parkinson's-style speech changes.
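Given F1/F2 for the three vowels, both metrics are short formulas. This sketch uses the shoelace formula for the triangle's area and Sapir's VAI, (F2i + F1a) / (F1i + F1u + F2u + F2a). Mapping /ɪ/, /æ/, /oʊ/ onto the /i/, /a/, /u/ slots of that formula is an assumption about how the engine adapts it to these words.

```typescript
interface Formants {
  f1: number; // Hz
  f2: number; // Hz
}

// Shoelace formula on the (F1, F2) triangle; result in Hz^2.
function vowelSpaceArea(ae: Formants, ih: Formants, ow: Formants): number {
  return 0.5 * Math.abs(ae.f1 * (ih.f2 - ow.f2) + ih.f1 * (ow.f2 - ae.f2) + ow.f1 * (ae.f2 - ih.f2));
}

// Assumed mapping: /ɪ/ fills the /i/ slot, /æ/ the /a/ slot, /oʊ/ the /u/ slot.
// VAI = (F2i + F1a) / (F1i + F1u + F2u + F2a); lower values indicate more centralised vowels.
function vowelArticulationIndex(ae: Formants, ih: Formants, ow: Formants): number {
  return (ih.f2 + ae.f1) / (ih.f1 + ow.f1 + ow.f2 + ae.f2);
}
```

When the three vowels drift toward the centre of the vowel space, both the triangle's area and the VAI shrink, which is the direction the text associates with Parkinson's-style speech changes.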
36. Cross-task contrasts
When all three protocol tasks are present, the engine computes deltas between them: jitter on vowel minus jitter on reading (laryngeal control under articulatory load), pitch CV on free speech minus pitch CV on reading (spontaneous prosodic range), speech rate on free minus reading (tempo flexibility), and four more. Each delta has a clinical interpretation rule; for example, near-identical pitch CV between free and reading speech is the canonical affective-flattening signature (Cummins 2015). These are signals no single clip can capture.
37. EBU R128 loudness normalisation per task
Phone-mic recordings come in at wildly different volumes; raw amplitude shifts jitter and shimmer estimates by 30-40% just based on how loud the speaker was. The engine now normalises every recording to a target integrated loudness, measured in LUFS per ITU-R BS.1770-4, before any feature extraction, and sets that target per task, since sustained vowels are naturally louder than connected speech (Sapienza 2011): -18 LUFS for vowel, -23 LUFS for reading and free speech, -28 LUFS for breathing. Features are finally comparable across recordings and patients.
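Once a BS.1770 meter has measured a clip's integrated loudness, normalisation itself is a single gain. A sketch using the per-task targets above; measuring LUFS (K-weighting plus gating) is out of scope here, and the hard clamp to [-1, 1] is a simplification where a real pipeline might use a limiter or flag the clip.

```typescript
type ProtocolTask = "vowel" | "reading" | "free_speech" | "breathing";

// Per-task targets from the text above, in LUFS.
const TARGET_LUFS: Record<ProtocolTask, number> = {
  vowel: -18,
  reading: -23,
  free_speech: -23,
  breathing: -28,
};

// measuredLufs must come from a BS.1770 meter (K-weighting + gating), not implemented here.
function normalizationGain(task: ProtocolTask, measuredLufs: number): number {
  const gainDb = TARGET_LUFS[task] - measuredLufs;
  return 10 ** (gainDb / 20); // dB to linear amplitude factor
}

// Apply the gain, hard-clamping to [-1, 1]; a production pipeline might use a limiter instead.
function applyGain(samples: Float32Array, gain: number): Float32Array {
  const out = new Float32Array(samples.length);
  for (let i = 0; i < samples.length; i++) out[i] = Math.max(-1, Math.min(1, samples[i] * gain));
  return out;
}
```

For example, a reading clip measured at -29 LUFS needs +6 dB, a linear gain of about 2.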
38. Reading-task adherence + sustained-vowel stability checks
If the patient was supposed to read the Rainbow Passage but Whisper transcribed something else (or nothing), the engine flags the recording as non-adherent and downweights all reading-derived features. The same goes for the sustained vowel: a sliding 500-ms pitch and intensity check detects whether the vowel was actually steady or wavering. If it wavered, jitter and shimmer aren't clinically reliable, and the score accounts for that. The engine refuses to silently produce garbage from a bad input.
39. Demographic-aware Healthy Voice Index
Healthy 70-year-olds have naturally higher jitter floors, slightly lower HNR, and narrower vowel space than 30-year-olds (Brockmann-Bauser 2018, Stathopoulos 2011). The HVI widens the healthy envelope for older speakers (voice_dysphonia tolerance ×1.5 past age 70, VAI floor shifted down by 0.10), so age-appropriate variation isn't penalised as pathology. The engine accepts patient age and gender on every request and routes them through every relevant threshold.
40. Two engines that improve over time
Five mechanisms make the engine sharper with every recording, none requiring new labels: per-patient baseline z-scores (activates at 3 recordings per patient), population baseline per task (30 patients per task), the WavLM voice-fingerprint centroid (20 patients per task), voiceprint drift detection (1 enrollment per patient), and cross-task contrast norms (~1,000 patients). The standardised protocol is the enabler: same prompts mean compounding statistics. Trained classifiers (Parkinson's, fatigue, Alzheimer's) improve through a separate labelled-data path: training scripts ship in biomarker-engine/scripts/, ready to run when ADReSS / Predi-COVID / DAIC-WOZ access lands.
Architecture
Tether follows a privacy-first architecture. API keys never ship in the mobile bundle; all LLM requests and biomarker analysis are proxied through a Cloudflare Worker at the edge.
Frontend
React Native + Expo SDK 55 with React Navigation native stack. Runs on iOS, Android, and web.
Backend
Cloudflare Worker proxies all API calls. GROQ_API_KEY stored as a Cloudflare secret, never exposed to the client.
Python FastAPI Engine (primary)
Full biomarker pipeline at biomarker.arhan.dev: Praat voice quality, YAMNet, openSMILE eGeMAPS, Whisper, WavLM, audeering emotion, Tsanas nonlinear, trained Parkinson's + fatigue + Alzheimer's classifiers, per-patient + population baselines, Healthy Voice Index.
Rust WASM (fallback)
Lightweight biomarker engine compiled to WebAssembly via wasm-pack, runs inside the Cloudflare Worker for ~50 ms edge-speed signal processing. Used as fallback when the Python engine is unreachable.
LLM
Groq API with LLaMA 3.3 70B. Fallback chain: Worker → local keyword matching. The direct-to-Groq path from earlier builds has been removed, so no provider key ships in the app.
Quickstart
Prerequisites
- Node.js 22+ (wrangler 4.x requires it)
- Expo CLI (runs via `npx expo` from the project's `expo` dependency; the legacy global `expo-cli` package is not needed)
- iOS Simulator (Xcode) or Android Emulator
- Python 3.11+ + Docker (only needed if developing the biomarker engine locally; production runs on the Contabo VPS)
- Rust + wasm-pack (only needed if developing the WASM fallback engine)
Setup (mobile app)
```bash
git clone https://github.com/ArhanCodes/tether.git
cd tether
npm install --legacy-peer-deps
cp src/lib/config.template.ts src/lib/config.ts
npm run ios
```
That's it. The config template comes pre-configured with the shared Tether worker URL, so no API keys or environment variables are needed on the client. The Groq key and the BIOMARKER_ENGINE_URL live on the Cloudflare Worker as secrets and are never exposed to the client.
Run `npx expo start --web` instead to open the app in a browser.
Worker Setup
```bash
# Deploy the Cloudflare Worker (the API + LLM proxy + biomarker forwarder)
cd worker
npm install
npx wrangler secret put GROQ_API_KEY          # for /chat endpoint
npx wrangler secret put SESSION_HMAC_KEY      # for HMAC-signed bearer tokens
npx wrangler secret put BIOMARKER_ENGINE_URL  # points at biomarker.arhan.dev (or your own engine)
npx wrangler deploy
```
Biomarker engine setup (only for self-hosting)
```bash
# The production engine runs at https://biomarker.arhan.dev on a Contabo VPS.
# To run your own copy:
cd biomarker-engine
./install.sh   # docker-compose up; fetches YAMNet + WavLM + Whisper + audeering
# The engine then listens on 127.0.0.1:8765.
# Point a public domain at it via nginx + Let's Encrypt, then set BIOMARKER_ENGINE_URL accordingly.
```
Features
Auth
- Login / signup with role selection (doctor or patient)
- Passwords hashed server-side with PBKDF2-SHA256 (100,000 iterations, 16-byte per-user salt), never readable by us
- Sessions are HMAC-signed bearer tokens with server-side revocation (the worker can kick any session by deleting it from the Durable Object's session list)
- Failed-login rate limiting + lockout (5 failed attempts in 5 minutes → 15-minute lockout)
- Role-based access control β patients can only see their own data, doctors only see assigned patients, caregivers need explicit patient consent
- Terms/privacy consent on signup
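A sketch of the lockout rule listed above (5 failures within 5 minutes triggers a 15-minute lock), kept in an in-memory `Map` for illustration; in Tether the state would live server-side, and `LoginLimiter` is a hypothetical name.

```typescript
const WINDOW_MS = 5 * 60_000; // failures older than this no longer count
const MAX_FAILURES = 5;
const LOCKOUT_MS = 15 * 60_000;

interface LoginState {
  failures: number[]; // timestamps (ms) of recent failed attempts
  lockedUntil: number;
}

class LoginLimiter {
  private state = new Map<string, LoginState>();

  isLocked(email: string, now: number): boolean {
    return (this.state.get(email.toLowerCase())?.lockedUntil ?? 0) > now;
  }

  recordFailure(email: string, now: number): void {
    const key = email.toLowerCase();
    const s: LoginState = this.state.get(key) ?? { failures: [], lockedUntil: 0 };
    s.failures = s.failures.filter((t) => now - t < WINDOW_MS); // sliding window
    s.failures.push(now);
    if (s.failures.length >= MAX_FAILURES) {
      s.lockedUntil = now + LOCKOUT_MS;
      s.failures = [];
    }
    this.state.set(key, s);
  }

  recordSuccess(email: string): void {
    this.state.delete(email.toLowerCase());
  }
}
```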
Doctor Workspace
- Create/edit patient recovery plans (diagnosis, vitals, meds, instructions, red flags, follow-up)
- Set AI tone (calm, direct, reassuring)
- Publish plans to a specific patient email (validates account exists)
- Draft auto-saves locally
- View and reply to patient messages
Patient Companion
- View the recovery plan assigned to your email
- Vitals summary, daily instructions, red flags
- AI chat powered by Groq with keyword-matching fallback
- Quick prompt buttons ("What should I do today?", "When should I call?", etc.)
- Voice input via speech recognition
- Voice output (text-to-speech on AI replies, toggleable)
- Urgency badges on AI responses (routine / contact clinician / urgent)
- Flesch-Kincaid readability score on every AI response (grade level badge)
- Handoff suggestion when AI can't fully answer
- Direct messaging to doctor (real-time via Durable Objects)
- Multilingual support (26 languages; see § "Works in the patient's language" above for the full list)
- Voice biomarker analysis (breathing rate, cough detection, vocal tremor, voice energy)
- Biomarker status levels (normal / monitor / alert) with alert popup
- Biomarker trending: historical chart showing trends over time
- Engine connection: biomarker data injected into AI context automatically
- Patient Journal: daily journal entries that feed into AI context for more personalized responses
- Medication Adherence Tracker: daily yes/no medication logging with 7-day streak visualization
- Time-aware prompting: AI adapts advice based on days since discharge (early/mid/extended recovery)
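The Flesch-Kincaid grade level shown on AI responses is 0.39 × (words / sentences) + 11.8 × (syllables / words) - 15.59. A sketch with a rough vowel-group syllable counter; the counter is a heuristic assumption, so scores will differ slightly from tools that use pronunciation dictionaries.

```typescript
// Rough syllable count: vowel groups, minus a silent trailing "e" (heuristic, not dictionary-based).
function countSyllables(word: string): number {
  const w = word.toLowerCase().replace(/[^a-z]/g, "");
  if (w.length === 0) return 0;
  const groups = w.match(/[aeiouy]+/g)?.length ?? 0;
  const silentE = w.endsWith("e") && !w.endsWith("le") && groups > 1 ? 1 : 0;
  return Math.max(1, groups - silentE);
}

// Flesch-Kincaid grade = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
function fleschKincaidGrade(text: string): number {
  const words = text.split(/\s+/).filter((w) => /[a-z]/i.test(w));
  if (words.length === 0) return 0;
  const sentences = Math.max(1, (text.match(/[.!?]+/g) ?? []).length);
  const syllables = words.reduce((n, w) => n + countSyllables(w), 0);
  return 0.39 * (words.length / sentences) + 11.8 * (syllables / words.length) - 15.59;
}
```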
Doctor Workspace (continued)
- Discharge date: set per patient to enable time-aware recovery guidance
- Recovery Score Dashboard: composite 0-100 score per patient (biomarker + adherence + engagement + journal), sorted by risk
Onboarding
- 5-step tutorial on first launch (welcome, doctors, patients, voice biomarkers, safety)
- Skip button and dot indicators
- Only shows once (stored in AsyncStorage)
Infrastructure
- Cloudflare Worker proxy: API key stays server-side, never ships in the app
- Durable Objects backend β accounts, plans, messages, biomarker history persist across devices
- Python FastAPI biomarker engine runs on a private VPS (`biomarker.arhan.dev`); Rust WASM fallback compiled into the Cloudflare Worker
- AI requests routed through the worker, falling back to local keyword matching (no direct-to-Groq path)
Authentication
Users sign up with a role (Doctor or Patient) and are routed to the appropriate workspace after login. Sessions persist across app restarts via AsyncStorage.
- Password hashing: PBKDF2-SHA256 server-side, 100,000 iterations, per-user 16-byte salt. Hashes live in the Durable Object; plaintext is never stored or transmitted anywhere except over TLS on signup/login
- Sessions: HMAC-SHA256-signed bearer tokens (`payload.signature`, base64url), validated against the Durable Object's session list so any session can be revoked server-side
- Rate limiting: 5 failed logins in 5 minutes triggers a 15-minute lockout per email
- RBAC: Every authenticated endpoint enforces who can read/write which patient. Patients see their own data only. Doctors see their assigned patients. Caregivers need an explicit consent record
- Audit log: Every clinically-relevant action (data view, plan edit, escalation open/close, export, delete, login attempt) is recorded with actor, target, IP, user-agent, and timestamp
- Invite codes: Doctors generate short codes that patients redeem at signup to auto-link to the right care team
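The password and session scheme above can be sketched in a few lines of Python. This is illustrative only: the production code runs in the TypeScript Worker, and the token payload fields (email, sid, exp), the helper names, and the in-memory session set (standing in for the Durable Object's session list) are assumptions.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import json
import os
import time

ITERATIONS = 100_000  # PBKDF2-SHA256 iteration count stated above


def hash_password(password: str, salt: bytes | None = None) -> tuple[bytes, bytes]:
    """Return (salt, derived_key); a fresh 16-byte salt is generated per user."""
    salt = salt if salt is not None else os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, stored_key)  # constant-time comparison


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(body: str, secret: bytes) -> str:
    return _b64url(hmac.new(secret, body.encode(), hashlib.sha256).digest())


def issue_token(payload: dict, secret: bytes) -> str:
    """Build a payload.signature token, both halves base64url-encoded."""
    body = _b64url(json.dumps(payload, separators=(",", ":")).encode())
    return f"{body}.{_sign(body, secret)}"


def verify_token(token: str, secret: bytes, active_sessions: set[str]) -> dict | None:
    """Accept only a correctly signed, unexpired token whose session id was not revoked."""
    parts = token.split(".")
    if len(parts) != 2:
        return None
    if not hmac.compare_digest(parts[1].encode(), _sign(parts[0], secret).encode()):
        return None
    payload = json.loads(_b64url_decode(parts[0]))
    if payload.get("exp", 0) < time.time() or payload.get("sid") not in active_sessions:
        return None
    return payload
```

Checking the session id against a server-side set is what makes revocation work: removing the id invalidates the token even though its signature still verifies.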
Doctor Workspace
Doctors create, edit, and publish recovery plans for specific patients. Plans are the foundation of the entire patient experience: the AI, the UI, and the messaging system all derive from the published plan.
Plan Fields
| Field | Description |
|---|---|
| Patient Name & Email | Must match a registered patient account |
| Diagnosis | Primary condition (e.g. post-discharge pneumonia) |
| Vitals | Heart rate, blood pressure, temperature, O2 saturation |
| Medications | Name, dosage, and frequency (one per line) |
| Daily Instructions | What the patient should do each day |
| Red Flags | Symptoms that require immediate medical attention |
| Follow-up | Next appointment or scheduled check-in |
| Tone | Calm, Direct, or Reassuring; controls AI personality |
| Doctor Notes | Private instructions for how AI should phrase answers |
Messaging
Doctors see all patient message threads, sorted by most recent. They can select a thread and reply directly. When a patient sends a message (or the AI suggests a handoff), it appears here.
Patient Companion
The patient screen surfaces the published recovery plan and provides multiple channels for getting help: AI chat, voice input, quick prompts, biomarker analysis, and direct doctor messaging.
Care Plan Display
Vitals, daily instructions, medications, and red flags, all drawn from the doctor's published plan.
AI Chat
Text or voice questions answered by LLaMA 3.3, constrained to the care plan. Includes urgency badges and handoff suggestions.
Voice Biomarkers
Patient runs the 3-task voice protocol (~40 s). Python FastAPI engine on biomarker.arhan.dev analyzes 100+ features and returns a Healthy Voice Index, with a Rust WASM fallback in the Worker.
Doctor Messaging
Direct messaging channel for when AI isn't enough. The AI can auto-suggest using this when it lacks certainty.
Patient Journal
Write daily entries about how you feel. Recent entries are injected into the AI prompt so responses reflect your current emotional and physical state.
Medication Tracker
Log daily medication adherence with a simple yes/no. A 7-day streak visualization shows your compliance at a glance.
Caregiver Portal
Adult children of elderly patients, partners, and family members often need visibility into post-discharge recovery without being clinical providers. The caregiver portal is a third login type that gives trusted contacts a read-only dashboard for any patient who explicitly links them.
How linking works
- The caregiver creates a Tether account with the caregiver role at sign-up.
- The patient adds the caregiver's email to their account, which triggers POST /api/caregiver/link.
- The caregiver logs in and sees a dashboard of every patient who linked them.
- Either side can revoke the link at any time.
What the caregiver sees
Latest published plan
Diagnosis, doctor name, last-updated timestamp. Tap through for full medications, instructions, and red flags.
Recent voice biomarkers
The last 10 readings with status dots (green / amber / red) for at-a-glance monitoring of breathing trends.
7-day adherence
A pill-grid showing which days the patient took their medication. Missed days highlighted in red.
Privacy model
Caregivers can read but cannot send messages, edit plans, or post journal entries on the patient's behalf. The patient remains the data owner β every link is opt-in and removable. The doctor is not notified of caregiver links by default; the patient controls who sees what.
Data flows
```
GET /api/caregiver/patients?email=<caregiver-email>
→ [
    {
      patientEmail, patientName,
      latestPlan,
      recentBiomarkers,
      recentAdherence
    },
    ...
  ]
```
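The endpoint above can be sketched in Python as a filter over consent links followed by trimming each history to the window the dashboard shows. The link set, the store layout (lists stored oldest-first), and the function name are hypothetical; only the response field names come from the shape above.

```python
from __future__ import annotations


def caregiver_patients(caregiver_email: str, links: set[tuple[str, str]],
                       store: dict[str, dict]) -> list[dict]:
    """Build the read-only dashboard rows for every patient who linked this caregiver."""
    rows = []
    for patient_email, linked_email in sorted(links):
        if linked_email != caregiver_email:
            continue  # no consent record for this caregiver: skip this patient
        record = store[patient_email]
        rows.append({
            "patientEmail": patient_email,
            "patientName": record["name"],
            "latestPlan": record["plans"][-1] if record["plans"] else None,
            "recentBiomarkers": record["biomarkers"][-10:],  # last 10 readings
            "recentAdherence": record["adherence"][-7:],     # last 7 days
        })
    return rows
```

Because rows are produced only from explicit (patient, caregiver) links, revoking a link removes the patient from the dashboard on the next request.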
Protocol Library
Doctors don't write a recovery plan from scratch every time. The protocol library ships five clinically-grounded templates, each one a complete DoctorPlan shape: diagnosis text, medications with dosing, daily instructions, red flags, follow-up timing, and recommended tone.
Included templates (v1)
Post-discharge Pneumonia
ICD-10 J18.9. Amoxicillin + inhaler regimen, breathing-focused red flags, GP follow-up in 3 days.
Heart Failure (CHF)
ICD-10 I50.9. Furosemide + lisinopril + carvedilol, daily weight check (the single most important early warning), cardiology follow-up in 7 days.
COPD Exacerbation
ICD-10 J44.1. Tiotropium + rescue inhaler + 5-day prednisolone + 7-day doxycycline, oximeter-based red flags.
Post-surgical Recovery
ICD-10 Z48.815. Pain-control regimen, DVT prevention with enoxaparin, wound-care daily steps, 6-week lifting restriction.
Type-2 Diabetes (new diagnosis)
ICD-10 E11.9. Metformin titration schedule, atorvastatin, glucose-target ranges, plate-method dietary guidance.
How a doctor uses it
- Open the Doctor Workspace → "Publish Patient Plan" section.
- Click any protocol chip; fields auto-fill with the template defaults.
- Edit anything that's patient-specific (medications, follow-up timing, tone).
- Add the patient's name and email, then publish.
Why this matters
A solo physician can publish 5–10 plans per evening with the protocol library, vs. 1–2 from scratch. More importantly, the templates encode best-practice red flags ("weight gain >1 kg in a day" for CHF, "rescue inhaler more than every 4 hours" for COPD) that an under-the-gun doctor might forget to write. The templates are clinically reviewable and version-controlled in src/lib/protocols.ts.
Extending
Adding a new condition is one object in the PROTOCOL_TEMPLATES array; the UI picks it up automatically. The schema is { id, label, emoji, conditionICD10, defaults }, where defaults is a Partial<DoctorPlan>.
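Loading a template and then editing it amounts to layering the doctor's edits over a copy of the template's partial plan. A minimal Python sketch, assuming plans are plain dictionaries; the real templates are TypeScript objects in src/lib/protocols.ts, and the entry and helper below are hypothetical and abbreviated (no dosing or red flags).

```python
from copy import deepcopy

# Hypothetical template entry following the { id, label, emoji, conditionICD10, defaults } schema.
PNEUMONIA_TEMPLATE = {
    "id": "pneumonia",
    "label": "Post-discharge Pneumonia",
    "emoji": "\U0001FAC1",
    "conditionICD10": "J18.9",
    "defaults": {
        "diagnosis": "Post-discharge pneumonia",
        "medications": ["Amoxicillin", "Inhaler"],
        "tone": "Calm",
    },
}


def plan_from_template(template: dict, edits: dict) -> dict:
    """Copy the template's partial plan, then let the doctor's edits override any field."""
    plan = deepcopy(template["defaults"])  # never mutate the shared template
    plan.update(edits)
    return plan
```

The deep copy matters: without it, appending a medication for one patient would silently change the template for every later patient.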
AI Chat System
The AI is powered by Groq's LLaMA 3.3 70B model, accessed through a Cloudflare Worker proxy. Every response is grounded in the doctor's published care plan.
System Prompt
A dynamic system prompt is built from the care plan that includes the patient's diagnosis, medications, instructions, red flags, and the doctor's preferred tone. The AI is instructed to:
- Only answer from documented care plan data
- Flag red-flag symptoms as "urgent"
- Suggest messaging the doctor when information is missing
- Return structured JSON with message, urgency, supporting points, and handoff flag
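A minimal Python sketch of the prompt assembly and response check described above. The prompt wording, the JSON field names (message, urgency, supportingPoints, handoff), and the plan keys are assumptions; the urgency values are the three levels the app uses. In this sketch any malformed reply degrades to a contact-clinician handoff instead of being shown as-is.

```python
import json

URGENCY_LEVELS = {"routine", "contact-clinician", "urgent"}


def build_system_prompt(plan: dict) -> str:
    """Assemble a plan-grounded system prompt; the wording is illustrative only."""
    lines = [
        f"You are a recovery assistant. Tone: {plan['tone']}.",
        "Answer ONLY from the care plan below. If the answer is not in the plan,",
        "say so and set handoff to true so the patient can message their doctor.",
        "If the patient describes any red flag, set urgency to \"urgent\".",
        f"Diagnosis: {plan['diagnosis']}",
        "Medications: " + "; ".join(plan["medications"]),
        "Instructions: " + "; ".join(plan["instructions"]),
        "Red flags: " + "; ".join(plan["redFlags"]),
        'Reply as JSON: {"message": str, "urgency": "routine"|"contact-clinician"|"urgent",',
        ' "supportingPoints": [str], "handoff": bool}',
    ]
    return "\n".join(lines)


def parse_ai_response(raw: str) -> dict:
    """Validate the model's JSON reply; anything malformed becomes a safe handoff."""
    try:
        data = json.loads(raw)
        if not isinstance(data.get("message"), str) or data.get("urgency") not in URGENCY_LEVELS:
            raise ValueError("schema mismatch")
        data.setdefault("supportingPoints", [])
        data["handoff"] = bool(data.get("handoff", False))
        return data
    except (ValueError, AttributeError):  # bad JSON, wrong fields, or a non-object reply
        return {"message": "I'm not sure. Please message your care team.",
                "urgency": "contact-clinician", "supportingPoints": [], "handoff": True}
```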
Response Urgency Levels
| Level | Meaning | UI Treatment |
|---|---|---|
| routine | Normal informational response | Blue badge |
| contact-clinician | AI suggests speaking with doctor | Yellow badge |
| urgent | Red flag symptom detected | Red badge + escalation banner |
Fallback Chain
1. Cloudflare Worker → Groq API (primary)
2. Direct Groq API call (if worker fails)
3. Keyword matching (if no API configured)
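The chain reduces to trying each tier in order and catching failures. A Python sketch, assuming each remote tier is a callable that returns a parsed reply or raises; the keyword tier here simply matches the plan's red-flag phrases, which is one plausible reading of "keyword matching", not necessarily the app's exact rules.

```python
from typing import Callable, Optional


def keyword_fallback(question: str, red_flags: list) -> dict:
    """Last resort with no LLM: mark urgent if the question mentions a red-flag phrase."""
    q = question.lower()
    if any(flag.lower() in q for flag in red_flags):
        return {"message": "This may be a red flag. Contact your care team now.",
                "urgency": "urgent", "handoff": True}
    return {"message": "Please message your doctor about this.",
            "urgency": "contact-clinician", "handoff": True}


def answer(question: str, providers: list, red_flags: list) -> dict:
    """Try each provider in order (e.g. Worker proxy, then direct Groq), then keywords."""
    tier: Callable[[str], Optional[dict]]
    for tier in providers:
        try:
            result = tier(question)
            if result is not None:
                return result
        except Exception:  # network error, timeout, or bad response: try the next tier
            continue
    return keyword_fallback(question, red_flags)
```

An empty provider list models the "no API configured" case and goes straight to keyword matching.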
Voice Biomarkers
Tether's biomarker system records a short voice sample from the patient using the standardized 3-task protocol (sustained vowel + reading + free speech, ~40 s total) and sends it to a Python FastAPI engine on biomarker.arhan.dev for full clinical-grade signal processing. A Rust WebAssembly engine compiled into the Cloudflare Worker runs in parallel as a fallback so the patient never sees a broken app if the Python engine is unreachable.
How It Works
- Patient taps "Start 3-task voice protocol"; expo-audio begins recording in WAV/PCM at 16 kHz
- Wizard walks the patient through 5 s of sustained "ahhh", a fixed sentence from the Rainbow Passage, and a 10 s symptom check-in
- PCM samples + per-clip recording_type tags sent to the Cloudflare Worker's /api/biomarkers
- Worker forwards to the Python engine's /analyze_multi (or /analyze for single clips)
- Python engine runs the full pipeline (Praat voice quality, YAMNet event detection, openSMILE eGeMAPS, Whisper transcription + disfluency, WavLM voice fingerprint, audeering valence/arousal/dominance, Tsanas nonlinear PD markers, per-patient baselines, population baselines, signal-integrity anti-spoofing, Healthy Voice Index)
- If the Python engine fails or times out, the Worker falls back to the Rust WASM engine, which returns a basic BiomarkerReport with energy/breathing/jitter/shimmer/HNR (same JSON contract, much narrower feature set)
- Results displayed as a card with status badge plus the Healthy Voice Index headline (0-100, banded)
- Report saved to Durable Objects for longitudinal trending and per-patient baseline accumulation
The two engines
- Python FastAPI engine (primary): runs on biomarker.arhan.dev. Pipeline includes Praat clinical voice quality (jitter, shimmer, HNR, CPPS, formants), YAMNet 521-class audio event classifier, openSMILE eGeMAPS (88 features), Whisper transcription + disfluency markers, WavLM-base-plus 768-d voice fingerprint (Microsoft, 94k-hour pretraining), audeering wav2vec2-large emotion model (MSP-Podcast benchmark), Tsanas nonlinear markers (PPE, RPDE, DFA, GNE), per-patient SQLite baseline store, population baselines per task, signal-integrity flags, trained Parkinson's classifier (0.83 BAcc on UCI), trained fatigue v5 ensemble, rule-based Alzheimer's screening, EBU R128 LUFS normalization, demographic-aware Healthy Voice Index. Latency 18-25 s per clip depending on enabled features. Live at https://biomarker.arhan.dev.
- Rust WASM engine (fallback): compiled with wasm-pack, runs in-Worker. Extracts energy, breathing rate, pitch variability, cough events, zero-crossing rate, jitter, shimmer, HNR, CPPS, formants F1/F2/F3, vowel space area. Returns a strictly narrower JSON shape with the same field names so downstream consumers don't need to special-case the fallback. Latency ~50 ms.
Biomarker Trending
Every biomarker report is stored server-side with a timestamp. The patient's biomarker card shows a trend view of the last 10 readings with bar charts for breathing rate, voice energy, and cough events. Alert/monitor/normal counts are summarized as colored pills. This turns a single snapshot into a longitudinal monitoring system that can detect deterioration over days.
Clinical Voice Quality Card
Below the core metrics, the card surfaces the clinical voice quality section: Mean Pitch (Hz), Jitter %, Shimmer %, and HNR (dB), each annotated with the healthy reference range. These are the same metrics computed by Praat, the standard academic tool for voice analysis. The section appears only when the engine successfully extracted enough voiced cycles, so it does not show on whisper-only or breath-only recordings.
Engine Connection
Tether's two AI engines, NLP (Groq LLM, proxied through the Cloudflare Worker) and Bio-Acoustic (Python FastAPI engine on biomarker.arhan.dev with a Rust WASM fallback inside the Worker), share context automatically:
- The latest biomarker report (including confidence score and all 5 metrics) is injected into the AI system prompt before every chat request
- When the patient asks "how am I doing?", the AI references actual biomarker readings (breathing rate, cough events, energy levels, zero-crossing rate)
- If biomarkers are in "alert" status, the AI proactively warns the patient and recommends contacting their care team
- The AI knows the analysis confidence level and can qualify its answers accordingly ("Your latest voice check had moderate confidence β consider recording again in a quieter space")
- One engine listens to the body, the other explains what it means in plain language
Automatic Alert Escalation
When a biomarker recording returns alert status (2+ flags), Tether automatically sends a care message to the assigned doctor; no patient action is needed. The message includes:
- Full biomarker summary with actual values and normal ranges
- Confidence score for the analysis
- A note that the message was sent automatically by the biomarker system
The patient sees "Health Alert - Doctor Notified", confirming the escalation happened. This means a patient can record a voice check that triggers an alert, and their doctor sees it in their inbox within seconds, all without the patient needing to understand or act on the medical data themselves.
Readability Scoring
Every AI response is scored using the Flesch-Kincaid Grade Level formula. A badge on each message shows the grade level (e.g., "Grade 4.2 - Very Easy"). This backs the health literacy claim with a measurable score:
- Grade 0-5: Very Easy (a 5th grader can understand)
- Grade 6-8: Easy (middle school level)
- Grade 9-12: Moderate (high school level)
- Grade 13+: Complex (college level; the AI is prompted to stay below 6)
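The Flesch-Kincaid Grade Level is 0.39 × (words / sentences) + 11.8 × (syllables / words) - 15.59. A Python sketch of the scoring and banding; the vowel-group syllable counter is a rough heuristic (an assumption; the app's tokenizer may count differently), and treating any fractional grade below 6 as Very Easy is also an assumption.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, drop a silent trailing 'e', minimum 1."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fk_grade(text: str) -> float:
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59


def grade_band(grade: float) -> str:
    if grade < 6:
        return "Very Easy"
    if grade < 9:
        return "Easy"
    if grade < 13:
        return "Moderate"
    return "Complex"
```

For example, "Take your pill. Rest now." has 5 words, 2 sentences, and 5 syllables, so it scores about -2.8 and displays as Very Easy.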
Patient Journal
Patients can write daily journal entries describing how they feel. This serves two purposes:
- Patient self-reflection: Writing about symptoms, mood, and progress helps patients track their own recovery
- AI context enrichment: The 3 most recent journal entries are injected into the AI system prompt, allowing responses to account for the patient's current emotional and physical state
Entries are stored server-side via Durable Objects (max 100 per patient, 2000 character limit). The patient sees their entries in reverse chronological order. The journal also contributes to the Recovery Score (up to 20 points).
Medication Adherence Tracker
A simple daily check-in that asks patients: "Did you take all your medicines today?" with Yes/No buttons.
- One log per day: Duplicate entries for the same day are prevented
- 7-day streak: Colored dots show recent adherence (green = taken, red = missed)
- AI awareness: Adherence records are injected into the AI prompt; if the patient has missed 2+ days, the AI gently reminds them about medication importance
- Recovery Score input: Adherence contributes up to 30 points to the composite score
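A minimal Python sketch of the tracker rules above: an upsert keyed by (patient, date) enforces one log per day, and the reminder fires at two or more missed days in the 7-day window. Counting only explicit "No" answers as missed (not days with no entry) is an assumption, as are the class and method names.

```python
from __future__ import annotations

from datetime import date, timedelta


class AdherenceLog:
    """Daily yes/no medication records keyed by (patient, date)."""

    def __init__(self) -> None:
        self.records: dict[tuple[str, date], bool] = {}

    def log(self, patient: str, day: date, taken: bool) -> None:
        # Upsert: a second answer for the same day replaces the first instead of duplicating it.
        self.records[(patient, day)] = taken

    def last_7_days(self, patient: str, today: date) -> list[bool | None]:
        """Oldest to newest; None marks a day with no entry (drawn as an empty dot)."""
        days = [today - timedelta(days=n) for n in range(6, -1, -1)]
        return [self.records.get((patient, d)) for d in days]

    def should_remind(self, patient: str, today: date) -> bool:
        # Only explicit "No" answers count as missed in this sketch.
        missed = sum(1 for v in self.last_7_days(patient, today) if v is False)
        return missed >= 2
```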
Time-aware Prompting
Doctors can set a discharge date on each patient's plan. The AI system prompt then calculates days since discharge and adjusts its approach:
| Phase | Days | AI Behavior |
|---|---|---|
| Early recovery | 0-3 | Extra cautious, encourages rest and monitoring |
| Mid recovery | 4-14 | Encourages gradual activity and adherence |
| Extended recovery | 15+ | Focuses on long-term habits and follow-up |
A "Day X since discharge" badge appears on the patient's journal section for awareness.
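The phase lookup is a range check on days since discharge. A Python sketch; rejecting a future discharge date is an assumption, not documented behavior.

```python
from datetime import date


def recovery_phase(discharge: date, today: date) -> tuple:
    """Return (days since discharge, phase) using the 0-3 / 4-14 / 15+ boundaries."""
    days = (today - discharge).days
    if days < 0:
        raise ValueError("discharge date is in the future")  # assumption: treat as an error
    if days <= 3:
        return days, "Early recovery"
    if days <= 14:
        return days, "Mid recovery"
    return days, "Extended recovery"
```

The same day count is what the "Day X since discharge" badge displays.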
Recovery Score
A composite 0-100 score calculated per patient, visible to doctors on their workspace. Patients are sorted lowest-first so the most at-risk patients get attention first.
Scoring Breakdown
| Component | Max Points | Source |
|---|---|---|
| Biomarker Health | 30 | Ratio of normal/monitor/alert readings in recent biomarker history |
| Medication Adherence | 30 | Proportion of "taken" days in the last 7 days |
| Communication Engagement | 20 | Patient messages sent in the last 7 days (capped at 4) |
| Journal Activity | 20 | Journal entries written in the last 7 days (capped at 4) |
Risk Levels
- 0-39: At Risk (needs immediate attention)
- 40-69: Recovering (progressing but needs monitoring)
- 70-100: On Track (recovery going well)
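A Python sketch of the composite score. The breakdown does not say how normal/monitor/alert readings are weighted or what happens with no biomarker history, so the weights (1, 0.5, 0) and the neutral 15-point default below are assumptions; adherence is taken over all 7 days, so unlogged days count as not taken.

```python
def biomarker_points(statuses: list) -> float:
    """Up to 30 points. Assumed weights: normal = 1, monitor = 0.5, alert = 0."""
    if not statuses:
        return 15.0  # assumption: neutral half score when there is no history yet
    weight = {"normal": 1.0, "monitor": 0.5, "alert": 0.0}
    return 30 * sum(weight[s] for s in statuses) / len(statuses)


def recovery_score(statuses: list, taken_days: int, messages_7d: int, journal_7d: int) -> dict:
    points = (
        biomarker_points(statuses)
        + 30 * taken_days / 7               # adherence over the last 7 days
        + 20 * min(messages_7d, 4) / 4      # engagement, capped at 4 messages
        + 20 * min(journal_7d, 4) / 4       # journal activity, capped at 4 entries
    )
    score = round(points)
    level = "At Risk" if score < 40 else "Recovering" if score < 70 else "On Track"
    return {"score": score, "level": level}


def rank_patients(scores: dict) -> list:
    """Lowest score first, so the most at-risk patients appear at the top."""
    return sorted(scores, key=lambda name: scores[name]["score"])
```

For example, readings of normal, normal, monitor, alert (18.75 points), 7 of 7 days taken (30), 2 messages (10), and 4 journal entries (20) give 78.75, displayed as 79 (On Track).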
Multilingual Support
Patients can select their preferred language from 26 options: English, Spanish, Hindi, Mandarin, French, Arabic, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Bengali, Urdu, Tagalog, Swahili, Turkish, Polish, Dutch, Greek, Hebrew, Thai, Indonesian, Punjabi, and Ukrainian. The language preference is stored server-side and affects:
- AI chat responses: the system prompt instructs the LLM to respond in the selected language at a 5th grade reading level
- Voice output: text-to-speech uses the correct language code via expo-speech
- The biomarker engine receives the language code in its /analyze payload so Whisper transcription picks the right multilingual model variant (English clips use tiny.en for free_speech / small.en for reading; non-English clips use the multilingual tiny)
- The setting persists across devices via Durable Objects
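The model choice is a small lookup. A Python sketch; the source only specifies free_speech and reading for English, so routing other English clip types (such as the sustained vowel) to tiny.en, and matching language codes by their "en" prefix, are assumptions.

```python
def whisper_model(language: str, recording_type: str) -> str:
    """English reading -> small.en, other English clips -> tiny.en, everything else -> tiny."""
    if language.lower().split("-")[0] == "en":
        return "small.en" if recording_type == "reading" else "tiny.en"
    return "tiny"  # multilingual model for every non-English language
```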
Cloudflare Worker
The Worker is the secure API proxy + data backend. It lives at tether-api.arhan-harchandani.workers.dev and exposes auth, app data, the AI proxy, and the biomarker forwarder. Every state-changing endpoint is HMAC-bearer-token authenticated and the request body is validated against a Zod schema before any handler runs.
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /chat | POST | Forwards chat messages to Groq API with the GROQ_API_KEY secret. Default model is llama-3.3-70b-versatile. |
| /api/signup | POST | Create a new account (name, email, password, role). Password hashed with PBKDF2-SHA256, 100,000 iterations, per-user salt. Returns an HMAC-signed bearer token. |
| /api/login | POST | Authenticate and return an HMAC bearer token + user profile. |
| /api/plans | GET/POST | Retrieve or publish doctor care plans (doctor RBAC enforced). |
| /api/messages | GET/POST | Doctor-patient messaging thread. |
| /api/biomarkers | POST | Receives PCM audio samples from the mobile app or web demo. Forwards to the Python engine at BIOMARKER_ENGINE_URL (defaults to https://biomarker.arhan.dev); falls back to the in-Worker Rust WASM engine if the Python engine is unreachable or times out. Also runs voiceprint enrollment + speaker-similarity check before persisting. Returns the merged BiomarkerReport. |
| /api/biomarkers?email=… | GET | Retrieve a patient's biomarker history. Doctor RBAC required if the email is not the caller's own. |
| /api/user/language | POST | Update patient language preference (one of 26 supported languages). |
| /api/users | GET | List users (admin RBAC; password hashes never returned). |
| /api/journal | GET/POST | Patient journal entries (max 100 per patient, 2000 char limit per entry). |
| /api/adherence | GET/POST | Daily medication adherence records (upserts by patient + date). |
| /api/recovery-score | GET | Composite recovery scores for a doctor's patients, sorted by risk. Combines biomarker, adherence, engagement, and journal sub-scores. |
| /api/escalations | GET/POST | Clinical escalation rows with status workflow (new → reviewed → contacted → resolved). |
Biomarker engine endpoints (called via the Worker's /api/biomarkers)
| Endpoint | Method | Description |
|---|---|---|
| /analyze | POST | Single-clip analysis. Accepts samples, sampleRate, recording_type, optional subjective_* self-report fields, threshold_mode, disable_lufs. Returns the full BiomarkerReport. |
| /analyze_multi | POST | Multi-clip 3-task protocol. Accepts recordings (2-5 base64 clips) + recording_types. Returns session-level Healthy Voice Index, cross-task contrasts, consistency score, plus the per-clip reports merged via median + majority vote + max-risk. |
| /health | GET | Liveness probe. Returns engine version + YAMNet load status. |
| /version | GET | Full model metadata: balanced accuracy, ROC AUC, training dataset, n_samples, n_patients, threshold, calibration brier, and 95% CIs for every loaded trained classifier. |
| /learning/status | GET | Public flywheel snapshot. Shows population-baseline sample counts per task, WavLM centroid counts per task, label counts per classifier, current auto-tuned thresholds with their lift-vs-default measurements. |
| /baseline/{patient_id} | GET/DELETE | Per-patient SQLite baseline counts; DELETE wipes the patient's history. Admin auth required. |
| /trend/{patient_id} | GET | N-day time series + per-metric trend direction (up / down / stable). Used by the doctor view for longitudinal charts. |
| /fhir/analyze | POST | Same analysis as /analyze but returns a FHIR R4 Bundle of Observations + DiagnosticReport for EHR integration (NIH Bridge2AI VBAI profile). |
| /admin/retrain | POST | Admin-only. Exports accumulated self-report-labelled samples as JSONL ready to feed the offline training scripts. Gated at 500 samples per classifier. |
| /admin/keys | POST/GET/DELETE | Admin-only B2B API key management. |
| /metrics | GET | Prometheus-style text exposition for ops monitoring. |
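The per-clip merge in /analyze_multi combines three rules. A Python sketch under stated assumptions: numeric fields take the median, non-numeric labels take a majority vote (ties go to the value seen first), and the status field takes the most severe value as the max-risk rule. The field names are hypothetical, and only keys present in the first clip are merged.

```python
from collections import Counter
from statistics import median

SEVERITY = {"normal": 0, "monitor": 1, "alert": 2}


def merge_reports(reports: list) -> dict:
    """Median for numbers, majority vote for labels, worst case for status."""
    if not reports:
        raise ValueError("need at least one clip")
    merged = {}
    for key in reports[0]:
        values = [r[key] for r in reports if key in r]
        if key == "status":
            merged[key] = max(values, key=SEVERITY.__getitem__)  # worst clip wins
        elif all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            merged[key] = median(values)
        else:
            merged[key] = Counter(values).most_common(1)[0][0]  # ties: first value seen
    return merged
```

The median keeps one noisy clip from dragging a metric, while max-risk keeps a single alarming clip from being voted away.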
Durable Objects Backend
All application data (accounts, plans, messages, biomarker history) is stored in a Cloudflare Durable Object (TetherData). This replaces the previous AsyncStorage-only approach and provides:
- Cross-device sync: a doctor publishes a plan on their laptop and the patient sees it on their phone instantly
- Strong consistency: single-instance guarantee means no stale reads across regions
- Edge persistence: data persists in Cloudflare's global network with automatic replication
- Privacy: password hashes (PBKDF2-SHA256, 100k iterations, per-user salt) live in the Durable Object and are never exposed to clients
The DO seeds itself with starter accounts on first access. AsyncStorage is only used for local session state (which user is logged in on this device).
Rust WASM Engine (fallback)
The original biomarker engine is written in Rust, compiled to WebAssembly via wasm-pack, and loaded as an ES module inside the Cloudflare Worker. It now serves as the fallback path when the primary Python FastAPI engine on biomarker.arhan.dev is unreachable, and returns a strictly narrower JSON shape with the same field names so downstream consumers don't need to special-case the fallback. Latency is ~50 ms vs the Python engine's 18-25 s, but the feature set is much narrower (no Whisper, no WavLM, no audeering, no trained classifiers, no continual learning).
Entry Points
```rust
pub fn analyze_audio(samples_i16: &[i16], sample_rate: u32) -> String
pub fn analyze_audio_typed(samples_i16: &[i16], sample_rate: u32, recording_type: &str) -> String
```
Accepts raw PCM samples and sample rate. analyze_audio_typed additionally takes a recording type ("speech" or "breathing") and tunes the envelope window accordingly. Returns a JSON-encoded BiomarkerReport.
Signal Quality & Preprocessing
- Duration Gate: Recordings shorter than 1.5 seconds are rejected outright rather than analyzed with poor statistics.
- Signal Quality Gate: Computes SNR from quartile energy ratios. Recordings with SNR below a threshold are rejected with a "record in a quieter environment" message instead of producing misleading results.
- Clipping Gate: Recordings with more than 1% of samples saturated at the digital ceiling are rejected. The threshold is fraction-based rather than max-sample, so a single peak does not invalidate an otherwise good recording.
- VAD-style Silence Stripping: Splits audio into 20 ms frames, computes an adaptive noise floor at the 20th-percentile frame energy, and additionally checks per-frame ZCR. Frames with high ZCR (fricatives, breath noise) are dropped along with silence. This isolates clean voiced speech for downstream pitch and quality metrics.
- Confidence Scoring: A 0-to-1 composite of 30% signal quality + 25% recording duration + 25% active speech ratio + 20% pitch detection hit rate. Shown to patients as a High, Moderate, or Low badge.
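The duration and clipping gates and the confidence blend can be sketched in Python as follows. The SNR gate is omitted because its threshold is not given, and the sub-scores passed to confidence() are assumed to be already normalized to 0-1, since how each is normalized is not documented here.

```python
from __future__ import annotations

INT16_MAX = 32767
INT16_MIN = -32768


def gate_recording(samples: list[int], sample_rate: int) -> str | None:
    """Return a rejection reason, or None if the clip passes the duration and clipping gates."""
    if len(samples) / sample_rate < 1.5:
        return "too short"
    clipped = sum(1 for s in samples if s >= INT16_MAX or s <= INT16_MIN)
    if clipped / len(samples) > 0.01:  # fraction-based, so one stray peak is tolerated
        return "clipped"
    return None


def confidence(signal_quality: float, duration: float, active_ratio: float,
               pitch_hit_rate: float) -> tuple[float, str]:
    """Weighted 30/25/25/20 blend of sub-scores, then the Low/Moderate/High badge."""
    score = 0.30 * signal_quality + 0.25 * duration + 0.25 * active_ratio + 0.20 * pitch_hit_rate
    if score < 0.4:
        return score, "Low"
    return score, ("Moderate" if score <= 0.7 else "High")
```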
Signal Processing Pipeline
- RMS Energy: Root mean square of silence-stripped samples. Detects fatigue (low energy).
- Zero-Crossing Rate: Frequency of sign changes on active speech. Detects breathy or labored speech.
- Breathing Rate: 200 ms energy envelope, moving-average low-pass smoothing, peak detection with hysteresis (1.2x and 0.8x thresholds). The smoothing step separates real breathing rhythm from speech cadence.
- YIN Pitch Detection: Implementation of the YIN algorithm (de Cheveigné and Kawahara, 2002), the standard for monophonic pitch estimation. Per-frame cumulative mean normalized difference function with parabolic interpolation around the period minimum. Substantially more accurate than basic autocorrelation: detects a 200 Hz sine at 200.01 Hz.
- Jitter: Mean absolute period-to-period variation across YIN-extracted cycles, normalized by mean period. Clinical reference threshold 1.04% (Teixeira et al., 2013). Elevated in tremor and neurological conditions.
- Shimmer: Mean absolute amplitude difference across consecutive voiced cycles, normalized by mean amplitude. Clinical reference threshold 3.81%. Elevated in laryngeal pathology and breathy voice.
- HNR (Harmonics-to-Noise Ratio): Computed as 10 * log10(r / (1 - r)) where r is the mean YIN voicing strength. Reported in dB. Healthy voice typically > 20 dB; values below 7 dB suggest dysphonia.
- Mean Pitch and Voiced Fraction: Average fundamental frequency in Hz across all voiced cycles, plus the fraction of the recording where pitch could be reliably extracted.
- Pitch Variability (CV): Coefficient of variation across YIN-detected pitches. Detects vocal tremor.
- Cough Detection: 30 ms frames, sharp energy spikes (> 4x mean) followed by silence (< 0.5x mean within 150 ms), plus a broadband check (frame ZCR > 0.20) to discriminate cough from sustained tones. Skip-ahead prevents double-counting. Note: this v1 heuristic reached 13.8% sensitivity on Coswara; the shipping v2 detector adds a first-order high-pass spectral check and reaches 20.7%, and YAMNet on the VPS engine reaches 82.8% (see Validation).
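A direct, unoptimized Python sketch of the YIN steps named above (difference function, cumulative mean normalized difference, absolute threshold, parabolic interpolation) for a single frame. The 0.1 threshold and the 60-500 Hz search range are assumptions, not the Rust engine's parameters, and the O(N·τ) loop is written for clarity rather than speed.

```python
from __future__ import annotations

import math


def yin_pitch(frame: list[float], sample_rate: int, fmin: float = 60.0,
              fmax: float = 500.0, threshold: float = 0.1) -> float | None:
    """Estimate f0 of one frame with YIN; return None if no period is found (unvoiced)."""
    tau_min = int(sample_rate / fmax)
    tau_max = int(sample_rate / fmin)
    window = len(frame) - tau_max
    if window <= 0:
        raise ValueError("frame too short for fmin")
    # Difference function d(tau).
    diff = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff[tau] = sum((frame[j] - frame[j + tau]) ** 2 for j in range(window))
    # Cumulative mean normalized difference d'(tau), with d'(0) = 1.
    cmnd = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += diff[tau]
        cmnd[tau] = diff[tau] * tau / running if running > 0 else 1.0
    # Absolute threshold: first dip below it, then walk down to that dip's local minimum.
    tau = tau_min
    while tau < tau_max and cmnd[tau] >= threshold:
        tau += 1
    if tau >= tau_max:
        return None
    while tau + 1 < tau_max and cmnd[tau + 1] < cmnd[tau]:
        tau += 1
    # Parabolic interpolation around the minimum for sub-sample precision.
    a, b, c = cmnd[tau - 1], cmnd[tau], cmnd[tau + 1]
    denom = a - 2 * b + c
    shift = 0.5 * (a - c) / denom if denom != 0 else 0.0
    return sample_rate / (tau + shift)
```

On a pure 200 Hz sine sampled at 16 kHz this returns a value close to 200 Hz; on silence every normalized difference stays at 1, so it returns None.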
Rich Summary Generation
Instead of bare flag names, summaries include actual values and normal ranges. Examples:
- "Breathing rate is 28/min (normal range: 12–20/min). 3 cough events detected. Consider contacting your care team."
- "Voice biomarkers are within normal ranges." (with confidence note if recording quality was moderate)
Building
```shell
cd biomarker
wasm-pack build --target web --out-dir ../worker/wasm --release
# Output: tether_biomarker_bg.wasm (~83KB) + JS bindings
```
Biomarker Metrics Reference
Core signals
| Metric | Range | Flag Threshold | Clinical Significance |
|---|---|---|---|
| Energy (RMS) | 0 to 1 | < 0.015 | Low energy suggests fatigue or weakness |
| Zero-Crossing Rate | 0 to 1 | > 0.3 | High ZCR indicates breathy or labored speech |
| Breathing Rate | breaths/min | > 24 | Tachypnea, elevated respiratory rate (normal: 12 to 20) |
| Pitch Variability (CV) | 0 to 1 | > 0.35 | High variation suggests vocal tremor |
| Cough Events | Count | ≥ 3 | Frequent coughing in a short sample |
| Confidence | 0 to 1 | N/A | Composite of SNR, duration, active speech ratio, and pitch detection hit rate. < 0.4 = Low, 0.4 to 0.7 = Moderate, > 0.7 = High |
Clinical voice quality (new)
These are the same metrics computed by Praat, the standard academic tool for voice analysis. Thresholds are drawn from Teixeira et al. (2013) and the GRBAS scale.
| Metric | Range | Flag Threshold | Clinical Significance |
|---|---|---|---|
| Jitter | 0 to 1 (ratio) | > 0.0104 (1.04%) | Period-to-period frequency variation. Elevated in tremor, vocal fold pathology, neurological conditions. |
| Shimmer | 0 to 1 (ratio) | > 0.0381 (3.81%) | Amplitude variation across cycles. Elevated in laryngeal pathology, breathy or hoarse voice. |
| HNR (dB) | -30 to 60 | < 7 dB | Harmonics-to-noise ratio. Low values indicate raspy, breathy, or aphonic voice. Healthy voice typically > 20 dB. |
| Mean Pitch (Hz) | 0 to 2000 | Reference only | Average fundamental frequency. Typical adult male: 85 to 180 Hz. Typical adult female: 165 to 255 Hz. |
| Voiced Fraction | 0 to 1 | Reference only | Proportion of the recording where the engine detected voiced (pitched) speech. < 0.3 suggests whispering, dysphonia, or microphone failure. |
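The jitter, shimmer, and HNR definitions from the signal-processing pipeline, together with the flag thresholds in this table, fit in a few lines of Python. This sketch assumes per-cycle periods and peak amplitudes have already been extracted (for example by a pitch tracker such as YIN) and takes the mean voicing strength r as an input; the function names are hypothetical.

```python
import math

JITTER_MAX = 0.0104   # 1.04 %
SHIMMER_MAX = 0.0381  # 3.81 %
HNR_MIN_DB = 7.0


def local_jitter(periods: list) -> float:
    """Mean absolute difference of consecutive periods divided by the mean period."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))


def local_shimmer(amplitudes: list) -> float:
    """Mean absolute difference of consecutive cycle amplitudes divided by the mean amplitude."""
    diffs = [abs(b - a) for a, b in zip(amplitudes, amplitudes[1:])]
    return (sum(diffs) / len(diffs)) / (sum(amplitudes) / len(amplitudes))


def hnr_db(r: float) -> float:
    """HNR = 10 * log10(r / (1 - r)) for a mean voicing strength r in (0, 1)."""
    return 10 * math.log10(r / (1 - r))


def voice_quality_flags(periods: list, amplitudes: list, r: float) -> list:
    flags = []
    if local_jitter(periods) > JITTER_MAX:
        flags.append("jitter")
    if local_shimmer(amplitudes) > SHIMMER_MAX:
        flags.append("shimmer")
    if hnr_db(r) < HNR_MIN_DB:
        flags.append("hnr")
    return flags
```

Note that r = 0.5 (equal harmonic and noise energy) maps to exactly 0 dB, and r = 0.99 to about 20 dB, the healthy reference.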
Status Logic
| Flags Triggered | Status | Meaning |
|---|---|---|
| 0 | Normal | No concerning patterns detected |
| 1 | Monitor | One metric outside normal range, worth watching |
| 2+ | Alert | Multiple flags, consider contacting care team |
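The status rule counts how many core-signal thresholds fire. A Python sketch using the thresholds from the core-signals table; the metric key names are hypothetical, and since it is not stated whether the clinical voice quality flags also count toward the status, they are left out here.

```python
# Flag thresholds from the core-signals table; key names are hypothetical.
CORE_RULES = {
    "energy": lambda v: v < 0.015,
    "zcr": lambda v: v > 0.3,
    "breathingRate": lambda v: v > 24,
    "pitchCv": lambda v: v > 0.35,
    "coughEvents": lambda v: v >= 3,
}


def classify(metrics: dict) -> tuple:
    """0 flags -> normal, 1 -> monitor, 2 or more -> alert."""
    flags = [name for name, rule in CORE_RULES.items() if name in metrics and rule(metrics[name])]
    if not flags:
        return "normal", flags
    return ("monitor" if len(flags) == 1 else "alert"), flags
```

An "alert" result is what triggers the automatic message to the doctor described under Automatic Alert Escalation.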
Licensed third-party integrations
Tether's biomarker pipeline integrates six externally validated components. Each is used here under direct license from its original maintainer (verified in writing) or under the open license of the upstream paper/dataset (algorithms cited, code clean-room re-implemented).
| Source | What we use | License path |
|---|---|---|
| kind-lab/voice-biomarker-fhir | FHIR R4 profiles for voice biomarker output (NIH Bridge2AI VBAI initiative) | Free, written permission from maintainers |
| LIHVOICE/Predi_COVID_Fatigue_Vocal_Biomarker | COVID-fatigue biomarker methodology + Predi-COVID dataset (LIH Luxembourg, 3544 recordings) | One-time $100 license; redistribution permitted |
| Ashindustry007/Vocal-Biomarker-ICBHI-final-database | ICBHI 2017 respiratory sound classifier methodology (920 lung recordings, 6 diseases) | Paid license; redistribution permitted |
| ThanasisTsanas/VoiceAnalysisToolbox + UCI Parkinson's voice dataset | PPE, RPDE, DFA, GNE features + UCI Parkinson's classifier | Code is GPL-3.0 (avoided); algorithms re-implemented from Tsanas 2011 (J Royal Soc Interface, open access) and Little 2007 (BioMed Eng Online). UCI dataset is public domain. Compatible with commercial product. |
| Shahabks/my-voice-analysis | Articulation rate, syllable boundary detection, F0 statistics (Praat-backed) | MIT (free for any use) |
| SYSTRAN/faster-whisper | CTranslate2-backed Whisper inference for transcription + disfluency analysis (tiny.en model, ~75 MB) | MIT (free for any use) |
Validation harnesses for each ship with the repo and are reproducible end-to-end:
```shell
python3 scripts/validate_biomarker.py      # Coswara (IISc Bangalore, public)
python3 scripts/validate_predicovid.py     # Predi-COVID (LIH-VOICE)
python3 scripts/validate_icbhi.py          # ICBHI 2017 (BHI Challenge)
python3 scripts/compare_engines.py         # A/B WASM vs VPS engine
python3 scripts/train_parkinsons_uci.py    # train UCI Parkinson's classifier (free, local)
python3 scripts/train_coughvid_modal.py    # fine-tune YAMNet on COUGHVID (paid, $30-50 Modal)
```
Validation
The engine has been benchmarked against the Coswara dataset (Indian Institute of Science, Bangalore) using a randomly sampled batch of 29 patient recordings (cough-heavy and sustained vowel-a). Validation script lives at scripts/validate_biomarker.py and runs against the deployed analyze endpoint.
Pitch detection (sine reference)
200 Hz sine wave detected at 200.01 Hz, a 0.005% error (~99.99% pitch accuracy on this clean synthetic reference).
Cough detection (Coswara, n=29)
| Detector | Sensitivity | False-positive rate | Notes |
|---|---|---|---|
| WASM v1: Energy spike + ZCR (deprecated) | 13.8% | 0.0% | Original heuristic. |
| WASM v2: Spike + ZCR + first-order high-pass spectral check | 20.7% | 0.0% | Currently shipping in production WASM. +50% relative recall, zero specificity loss. |
| VPS v2: YAMNet (Google AudioSet) + per-cough characterization | 82.8% | 0.0% | Standalone benchmark via scripts/compare_engines.py. 4× lift over WASM. |
| Dual engine (WASM + VPS ensemble) | ~88% projected | 0.0% | Activates when BIOMARKER_ENGINE_URL secret is set on the worker. |
| Dual + COUGHVID fine-tune + multi-recording median | ~95% projected | < 1% | $30-50 one-time Modal training run on the COUGHVID dataset, plus three-recording median capture mode. |
Parkinson's disease screening: honest patient-grouped CV
Calibrated stacking ensemble (RF + GBM + XGBoost + LightGBM + LR meta-learner) trained on the combined UCI Parkinson's + UCI Telemonitoring datasets (n=6,070 recordings from 74 patients). Patient-grouped 5-fold cross-validation, bootstrap 95% confidence intervals:
| Metric | Value | 95% CI |
|---|---|---|
| Balanced accuracy | 76.2% | 70.8–81.0% |
| Sensitivity | 67.0% | 65.8–68.1% |
| Specificity | 85.4% | 75.0–93.8% |
| ROC AUC | 78.3% | 70.6–85.0% |
| F1 | 80.2% | 79.4–81.0% |
| Calibration (Brier) | 0.008 | n/a |
Correction notice: we previously published 92.3% accuracy, 98.6% sensitivity, 96.2% AUC on this dataset using stratified-random CV. Those numbers were leakage artifacts: the same patient appeared in both train and test folds. Patient-grouped CV (no patient crossover) reduces the honest accuracy to the figures above. Anyone claiming >90% on UCI Parkinson's with random-split CV is most likely making the same mistake.
Safety gates added 2026-05-21: (1) a plausibility gate refuses to classify biologically implausible signals (sine waves, synthesised tones); (2) a corroborating-marker gate downgrades any "high confidence" classifier output to inconclusive unless at least one independent motor-speech marker is present (tremor 3-7 Hz, bradylalia, monotone, or long pauses). Both gates close out the "confident false positive on a healthy voice" failure mode we caught in live testing on Coswara samples.
Model + training script: biomarker-engine/parkinsons_classifier.py; response field report.parkinsons_classifier.
Voice quality (vowel-a) grouped by COVID status
| Status | n | Jitter | Shimmer | HNR (dB) | Mean Pitch (Hz) |
|---|---|---|---|---|---|
| healthy | 26 | 0.026 | 0.040 | 12.9 | 121.8 |
| no_resp_illness_exposed | 2 | 0.006 | 0.047 | 18.0 | 194.6 |
| resp_illness_not_identified | 1 | 0.002 | 0.014 | 18.9 | 112.8 |
Mean pitch (121.8 Hz) for the healthy adult cohort falls within the published adult male F0 range (85-180 Hz). The other two groups (n=2 and n=1) are too small to support a between-group trend, so their higher HNR and lower jitter should not be read as an effect of status. Absolute jitter in the healthy cohort (2.6%) is above the published clinical threshold (1.04%) because Coswara is home-recorded smartphone audio, not clinic-grade; this is a recording-condition floor that controlled capture would address.
VPS Biomarker Engine v2
The VPS engine is a Python FastAPI service that runs on a private VPS and is called by the Cloudflare worker when configured. It is a strict accuracy upgrade over the in-worker WASM engine: same JSON-shape contract, much richer pipeline. The worker falls back to the WASM engine automatically on any failure, so a VPS outage never leaves a request without a result.
Why a second engine
Cloudflare Workers cap the bundle at 3 MB on the free tier (10 MB on Paid) and CPU per request at 10 ms (30 s on Paid). That is enough for pure DSP, but not enough for full ML inference plus the academic-reference Praat algorithms. The VPS engine has no such limits.
Feature inventory (v2.9.0, ~170 features per recording)
v2.6 changes: added faster-whisper transcription (CTranslate2-backed tiny.en model, ~75 MB, ~3x real-time on CPU) and a disfluency analysis layer extracting filled pauses, word repetitions, stutter patterns, hedge words, lexical diversity (type-token ratio), and pause-to-speech ratios. Added composite scores: stress / anxiety (0-1), cognitive load (0-1), vocal aging index (0-1 frailty marker). Added non-intrusive speech intelligibility (SRMR, Falk et al. 2010). Added multi-language voice reference profiles for English, Spanish, French, Hindi, Mandarin, and Arabic with per-language F0 reference ranges. Added the language and enable_whisper request parameters.
v2.2 changes: spectral-subtraction noise reduction (Sainburg et al. 2020) applied before voice quality extraction; neural-style VAD using spectral-flux thresholding replaces the energy/ZCR heuristic; new /analyze_multi endpoint accepts 2-5 recordings and merges by median+majority+max-risk for 40% variance reduction; thresholds recalibrated for consumer phone audio against Coswara healthy-cohort distributions (ALERT now requires baseline deviation OR pathological event OR 3+ flags, not single threshold breaches).
v2.3 changes: added five nonlinear voice features re-implemented from Tsanas et al. 2011 (J Royal Soc Interface) and Little et al. 2007 (BioMed Eng OnLine), clean-room Python (no GPL contamination, papers cited as methodology source): PPE (Pitch Period Entropy, Tsanas's invented marker), RPDE (Recurrence Period Density Entropy), DFA (Detrended Fluctuation Analysis), GNE (Glottal-to-Noise Excitation Ratio), and MFCC delta + delta-delta. These power the neurological_signs condition module for Parkinson's screening.
v2.4 changes: integrated Shahabks/my-voice-analysis (MIT licensed), a Praat-backed Python wrapper. Adds articulation rate, syllable boundary detection, F0 statistics, and the speaking-vs-articulation rate distinction. Output returned under the myvoice key.
v2.5 / v2.9 changes (current): shipped a calibrated stacking ensemble (RF + GBM + XGBoost + LightGBM + LR meta-learner) trained on UCI Parkinson's + UCI Telemonitoring combined (n=6,070 recordings, 74 patients). Honest patient-grouped 5-fold CV: BAcc 76.2% [70.8-81.0], AUC 78.3% [70.6-85.0], sensitivity 67.0%, specificity 85.4%, Brier 0.008. The earlier "92.3% accuracy" claim was a stratified-random-split leakage artifact and has been retracted. Two safety gates were added on 2026-05-21: a plausibility gate rejects biologically implausible feature vectors (jitter <0.05%, shimmer <1%, HNR >35 dB), and a corroboration gate downgrades "high confidence" to inconclusive unless at least one independent motor-speech marker (tremor in the 3-7 Hz band, bradylalia, monotone speech, long pauses) is also present. Together these close out the confident-false-positive-on-healthy-voice failure mode.
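The /analyze_multi merge described in the v2.2 notes (median + majority + max-risk) can be sketched as below. This is a minimal illustration, not the engine's code: it assumes each per-recording report is a dict with a numeric `features` map, a `status` string (normal / monitor / alert), and a `conditions` map of per-module risk in [0, 1]. That schema, the `merge_reports` name, and the tie-break toward the more severe status are all assumptions.

```python
from collections import Counter
from statistics import median

# Hypothetical severity order, used only to break status ties.
SEVERITY = {"normal": 0, "monitor": 1, "alert": 2}


def merge_reports(reports):
    """Merge 2-5 per-recording reports (hypothetical schema, see lead-in)."""
    if not 2 <= len(reports) <= 5:
        raise ValueError("expected 2-5 recordings")

    # Median: every numeric feature present in all reports.
    shared = set.intersection(*(set(r["features"]) for r in reports))
    features = {k: median(r["features"][k] for r in reports) for k in sorted(shared)}

    # Majority: most common status; ties resolve to the more severe status.
    votes = Counter(r["status"] for r in reports)
    status = max(votes, key=lambda s: (votes[s], SEVERITY.get(s, 0)))

    # Max-risk: one worrying recording is never averaged away.
    modules = set().union(*(r["conditions"] for r in reports))
    conditions = {m: max(r["conditions"].get(m, 0.0) for r in reports) for m in sorted(modules)}

    return {"features": features, "status": status, "conditions": conditions}
```

The median suppresses a single noisy take per feature, while max-risk keeps the merge conservative for the safety-relevant outputs.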
Every numerical field below is computed defensively: if a single feature fails, it returns 0.0 and the rest of the pipeline continues. Citations point to the canonical reference for each algorithm; this is the engine cited in patent prosecution and clinical validation papers.
1. Praat voice quality (clinical reference algorithms)
| Feature | Description | Healthy reference | Citation |
|---|---|---|---|
| mean_pitch_hz | Average fundamental frequency from autocorrelation pitch tracker | Adult male 85-180; adult female 165-255 | Boersma 1993 |
| pitch_variability | Coefficient of variation of voiced-frame F0 | < 0.35 | - |
| voiced_fraction | Fraction of recording with detectable pitch | > 0.3 for speech | - |
| jitter (local) | Period-to-period frequency variation | < 0.0104 (1.04%) | Teixeira et al. 2013 |
| shimmer (local) | Amplitude variation across cycles | < 0.0381 (3.81%) | Teixeira et al. 2013 |
| hnr_db | Harmonics-to-noise ratio (cross-correlation method) | > 7 dB; healthy voice > 20 dB | Boersma 1993 |
| cpps_db | Cepstral Peak Prominence Smoothed, the single most validated acoustic marker of dysphonia | > 14 dB | Maryn et al. 2010, Heman-Ackah 2014 |
| formant_f1_hz | First formant: tongue height (vowel openness) | Vowel-dependent | Hillenbrand 1995 |
| formant_f2_hz | Second formant: tongue front/back position | Vowel-dependent | Hillenbrand 1995 |
| formant_f3_hz | Third formant: lip rounding, speaker identity marker | - | Hillenbrand 1995 |
| vowel_space_area | F1 × F2 / 1000 approximation; reduced in Parkinson's, dysarthria, ALS | > 100 for healthy adult speech | Skodda 2011 |
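The "local" jitter and shimmer rows share one formula: the mean absolute difference between consecutive glottal cycles, divided by the mean. A minimal sketch, assuming cycle periods and per-cycle peak amplitudes have already been extracted by a pitch tracker; Praat's own implementation also filters periods by a period floor/ceiling and a maximum period factor, which this sketch omits. `local_perturbation` is a hypothetical helper name and the cycle values are made up.

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive cycles, divided by the mean.

    Fed glottal periods (seconds) this is local jitter; fed per-cycle peak
    amplitudes it is local shimmer. Both come back as fractions (0.0104 == 1.04%).
    """
    if len(values) < 2:
        raise ValueError("need at least two glottal cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


# Healthy-reference limits from the table above (Teixeira et al. 2013).
JITTER_LIMIT = 0.0104
SHIMMER_LIMIT = 0.0381

periods = [0.0100, 0.0101, 0.0099, 0.0100]  # ~100 Hz voice, made-up cycles
amplitudes = [0.50, 0.52, 0.49, 0.51]
jitter = local_perturbation(periods)
shimmer = local_perturbation(amplitudes)
print(f"jitter={jitter:.4f} over_limit={jitter >= JITTER_LIMIT}")
print(f"shimmer={shimmer:.4f} over_limit={shimmer >= SHIMMER_LIMIT}")
```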
2. YAMNet event classification (Google AudioSet, 521-class)
YAMNet runs on the full recording (not just voiced segments) so we catch coughs that occur in silence, sneezes between words, and breath events. The maximum confidence per tracked class is reported. Cough events are counted by contiguous-run grouping: frames (YAMNet emits one every 0.48 s) whose cough score exceeds 0.25 are grouped, and each run of consecutive above-threshold frames collapses to a single event.
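The run-grouping rule can be sketched as follows. This assumes `frame_scores` already holds YAMNet's per-frame score for the Cough class (pulling that column out of the 521-class output is not shown). The 0.48 s hop matches YAMNet's frame spacing; `count_cough_events` and its returned fields are illustrative, and the per-cough dry/mixed/wet characterization is not included.

```python
def count_cough_events(frame_scores, threshold=0.25, hop_s=0.48):
    """Collapse each run of consecutive above-threshold frames into one event."""
    runs = []
    start = None
    for i, score in enumerate(frame_scores):
        if score > threshold and start is None:
            start = i  # a new run of cough frames begins
        elif score <= threshold and start is not None:
            runs.append((start, i - 1))  # run ended on the previous frame
            start = None
    if start is not None:
        runs.append((start, len(frame_scores) - 1))  # run reaches the last frame

    return [
        {
            "onset_s": s * hop_s,
            "frames": e - s + 1,
            "peak_score": max(frame_scores[s : e + 1]),
        }
        for s, e in runs
    ]


events = count_cough_events([0.1, 0.3, 0.5, 0.2, 0.0, 0.26, 0.9])
print(len(events), [e["peak_score"] for e in events])  # 2 [0.5, 0.9]
```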
| Field | YAMNet class | Clinical relevance |
|---|---|---|
| yamnet_cough_score | Cough | Primary cough detector |
| yamnet_throat_score | Throat clearing | Mucus, irritation, vocal hyperfunction |
| yamnet_sneeze_score | Sneeze | Allergic / infectious indicator |
| yamnet_breathing_score | Breathing | Audible breath effort |
| yamnet_wheeze_score | Wheeze | Bronchospasm, asthma exacerbation |
| yamnet_snoring_score | Snoring | Sleep-disordered breathing |
| yamnet_gasp_score | Gasp | Acute respiratory event |
| yamnet_speech_score | Speech | Quality gate: confirms recording is speech |
| yamnet_whisper_score | Whispering | Dysphonia, fatigue, aphonia |
| yamnet_sigh_score | Sigh | Respiratory pattern marker |
| cough_events | - | Integer count of distinct cough events |
| cough_events_detail | - | Array of per-cough records: peak amplitude, duration ms, spectral centroid, bandwidth, classified type (dry / mixed / wet), YAMNet confidence |
3. Spectral features (librosa)
| Feature | Description |
|---|---|
| spectral_centroid_hz | Brightness; where the spectral mass is |
| spectral_rolloff_hz | Frequency below which 85% of energy lives |
| spectral_flatness | Geometric/arithmetic mean ratio; tonal vs noisy |
| spectral_bandwidth_hz | Spectral spread around the centroid |
| spectral_entropy | Information density of the spectrum; pathological voice has higher entropy |
| spectral_contrast | 7-band valley-to-peak ratio; distinguishes tonal from broadband segments |
| mfcc_means, mfcc_stds | 13 Mel-frequency cepstral coefficients (mean and standard deviation across frames); the de-facto ML feature set for speech |
4. Voice tremor analysis
Pathological tremor (Parkinson's, essential tremor, dystonia) shows up as strong amplitude modulation of the speech envelope in the 3-12 Hz band. The engine takes an FFT of the amplitude envelope, sampled at 50 Hz, and reports the dominant frequency in that band plus a normalized index.
| Feature | Description |
|---|---|
| voice_tremor_hz | Dominant tremor frequency in 3-12 Hz band |
| voice_tremor_index | Tremor-band energy / total envelope energy; healthy < 0.15 |
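Both tremor fields can be sketched from a 50 Hz envelope. This is a minimal illustration under stated assumptions: the envelope is already extracted (for example, frame RMS every 20 ms), the mean is removed before the transform, and "total envelope energy" is taken over the positive-frequency bins. How the engine treats the DC term is not documented. A naive DFT keeps the sketch dependency-free; `tremor_features` is a hypothetical name.

```python
import cmath
import math


def tremor_features(envelope, fs=50.0, band=(3.0, 12.0)):
    """Dominant modulation frequency in `band` and the band's energy share.

    `envelope` is the speech amplitude envelope sampled at `fs` Hz. A naive
    DFT keeps the sketch dependency-free; numpy.fft.rfft is the practical choice.
    """
    n = len(envelope)
    avg = sum(envelope) / n
    x = [v - avg for v in envelope]  # remove DC so the steady level is ignored
    spectrum = []  # (frequency_hz, power) for positive-frequency bins
    for k in range(1, n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append((k * fs / n, abs(coeff) ** 2))
    total = sum(p for _, p in spectrum)
    in_band = [(f, p) for f, p in spectrum if band[0] <= f <= band[1]]
    if total == 0 or not in_band:
        return 0.0, 0.0
    peak_hz = max(in_band, key=lambda fp: fp[1])[0]
    index = sum(p for _, p in in_band) / total
    return peak_hz, index


fs = 50.0
# 4 s envelope with a 5 Hz amplitude modulation (synthetic tremor).
envelope = [1.0 + 0.3 * math.sin(2 * math.pi * 5.0 * t / fs) for t in range(200)]
hz, index = tremor_features(envelope, fs)
print(f"voice_tremor_hz={hz:.2f} voice_tremor_index={index:.2f}")
```

A synthetic 5 Hz modulation puts nearly all envelope energy in band, far above the healthy < 0.15 reference.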
5. Speech rate and pause analysis (De Jong & Wempe 2009)
| Feature | Description |
|---|---|
| speech_rate_syl_per_sec | Syllable nuclei per second; reduced in Parkinson's, depression, fatigue |
| mean_pause_ms | Mean duration of pauses > 200 ms |
| longest_pause_ms | Longest single pause in the recording |
| voiced_segments | Number of distinct voiced segments |
6. GRBAS perceptual rating estimation (Hirano 1981)
GRBAS is the global voice quality scale used by speech-language pathologists worldwide. Each dimension is rated 0 (normal) to 3 (severe). The engine estimates each from acoustic features (regression mappings from Yu et al. 2001, Bhuta et al. 2004). These are estimates intended as a friendly summary; the underlying numbers are the ground truth.
| Field | Dimension | Maps from |
|---|---|---|
| grbas_grade | Overall severity | composite of R, B, A, S |
| grbas_roughness | Aperiodicity | jitter, shimmer |
| grbas_breathiness | Air turbulence | HNR (inverse), CPPS (inverse) |
| grbas_asthenia | Voice weakness | energy, pitch range |
| grbas_strain | Hyperfunction | pitch CV, jitter |
7. Per-patient baseline z-scores (SQLite store)
When a recording is submitted with an optional patient_id, the engine persists the reading into a server-side SQLite store (cap 30 readings per metric per patient) and scores the current reading against the patient's own historical distribution. Tracked metrics: energy, breathing rate, pitch variability, jitter, shimmer, HNR, CPPS, mean pitch, voiced fraction, spectral centroid/rolloff/flatness/entropy, speech rate, voice tremor index, F1, F2, vowel space area. The first three recordings establish the baseline; from then on every metric returns a z-score and the response includes a one-number deviation_score in [0, 1] summarizing how far the recording is from this patient's normal.
| Field | Description |
|---|---|
| baseline_z_scores | Per-metric {mean, std, n, z} against patient's recent history |
| baseline_history | Per-metric count of samples already stored for this patient |
| deviation_score | 0-1 summary: mean absolute z across baselined metrics, scaled so a mean absolute z of 3 maps to 1.0 |
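A minimal in-memory sketch of the baseline logic described in this section: 30 readings kept per metric, z-scores returned once three readings exist, and deviation_score as mean |z| / 3. Assumptions: the current reading is scored against prior readings before it is stored, the score is capped at 1.0, and `PatientBaseline` is a hypothetical stand-in for the SQLite store.

```python
from collections import defaultdict, deque
from statistics import mean, stdev

MAX_READINGS = 30  # cap per metric per patient
MIN_BASELINE = 3   # readings that establish the baseline


class PatientBaseline:
    """In-memory stand-in for the per-patient SQLite store (hypothetical)."""

    def __init__(self):
        self.history = defaultdict(lambda: deque(maxlen=MAX_READINGS))

    def score(self, reading):
        """Z-score each metric against prior readings, then store the reading."""
        z_scores = {}
        for metric, value in reading.items():
            past = self.history[metric]
            if len(past) >= MIN_BASELINE:
                mu, sd = mean(past), stdev(past)
                z = (value - mu) / sd if sd > 0 else 0.0
                z_scores[metric] = {"mean": mu, "std": sd, "n": len(past), "z": z}
            past.append(value)
        deviation_score = None
        if z_scores:
            mean_abs_z = mean(abs(entry["z"]) for entry in z_scores.values())
            deviation_score = min(1.0, mean_abs_z / 3.0)  # mean |z| of 3 -> 1.0
        return z_scores, deviation_score


baseline = PatientBaseline()
for hnr in (10.0, 12.0, 14.0):
    baseline.score({"hnr_db": hnr})  # baseline-building visits
z, deviation = baseline.score({"hnr_db": 18.0})
print(z["hnr_db"]["z"], deviation)  # 3.0 1.0
```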
8. Signal quality (always returned)
| Field | Description |
|---|---|
| snr | Quartile-energy SNR estimate; quality gate requires > 0.005 |
| clip_frac | Fraction of samples saturated at the digital ceiling (> 0.995 magnitude); rejection threshold 0.02 |
| dc_offset | Mean of the signal; large values indicate a DC bias or hardware issue |
| peak_amplitude | Maximum absolute sample value |
| confidence | Weighted blend (0.30·SNR + 0.20·duration + 0.25·voicing + 0.25·pitch yield) |
| elapsed_ms | Per-recording analysis time in ms (for monitoring) |
| feature_count | Number of numerical features the engine computed for this recording |
| engine | Engine version signature, e.g. vps-2.9.0 |
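The simpler quality fields and the documented confidence weights can be sketched as below. The thresholds (0.995 clip magnitude, 0.02 clip rejection, SNR > 0.005) come from the table, and the 1.5 s minimum duration comes from the pipeline order. The quartile-energy SNR estimate itself is not reproduced. Each confidence component is assumed to be pre-normalised to [0, 1], because how the engine maps raw SNR, duration, voicing, and pitch yield onto that scale is not described. All function names are hypothetical.

```python
def signal_quality(samples, sample_rate):
    """Per-recording quality fields for float samples already scaled to [-1, 1]."""
    n = len(samples)
    if n == 0:
        raise ValueError("empty recording")
    return {
        "clip_frac": sum(abs(s) > 0.995 for s in samples) / n,  # saturated samples
        "dc_offset": sum(samples) / n,  # mean of the signal
        "peak_amplitude": max(abs(s) for s in samples),
        "duration_s": n / sample_rate,
    }


def blended_confidence(snr_score, duration_score, voicing_score, pitch_yield):
    """0.30*SNR + 0.20*duration + 0.25*voicing + 0.25*pitch yield.

    Each input is assumed to be pre-normalised to [0, 1]; values outside
    that range are clamped.
    """
    weights = (0.30, 0.20, 0.25, 0.25)
    parts = (snr_score, duration_score, voicing_score, pitch_yield)
    return sum(w * min(1.0, max(0.0, p)) for w, p in zip(weights, parts))


def passes_quality_gates(quality, snr):
    """Documented gates: duration >= 1.5 s, SNR > 0.005, clip_frac < 0.02."""
    return quality["duration_s"] >= 1.5 and snr > 0.005 and quality["clip_frac"] < 0.02
```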
9. openSMILE eGeMAPSv02 (88 academic-standard features)
The extended Geneva Minimalistic Acoustic Parameter Set is the most widely cited feature set in computational voice biology (Eyben et al. 2016). It is used in over 100 peer-reviewed papers on depression detection, Parkinson's screening, COVID-19 voice diagnosis, dementia screening, and emotion recognition. The 88 functionals come from 25 low-level descriptors aggregated across the recording: pitch, jitter (multiple definitions), shimmer (multiple definitions), HNR, formants 1-3 (frequency, bandwidth, amplitude), spectral flux, spectral slope, alpha ratio, Hammarberg index, loudness, voiced/unvoiced segment statistics. Returned under the egemaps key.
10. Tsanas nonlinear voice features (Parkinson's biomarkers)
Five nonlinear voice features re-implemented in clean-room Python from the publications of Tsanas (Oxford D.Phil) and Little (Aston University). These are the gold standard for Parkinson's voice biomarker research and reach 99% reported accuracy on Tsanas's clinic-quality datasets.
| Feature | Meaning | Healthy range | Citation |
|---|---|---|---|
| ppe (Pitch Period Entropy) | Tsanas's invented measure of pitch instability. Captures impairment of vocal pitch control. | 0.10 - 0.20 | Tsanas et al. 2011, JRSI |
| rpde (Recurrence Period Density Entropy) | Quantifies how predictable / periodic the speech signal is. | 0.30 - 0.50 | Little et al. 2007, BioMed Eng Online |
| dfa (Detrended Fluctuation Analysis) | Fractal scaling exponent of speech turbulence. Higher = more long-range correlated dynamics. | 0.7 - 1.0 (Parkinson's > 1.0) | Peng 1994; applied to voice in Tsanas 2011 |
| gne (Glottal-to-Noise Excitation Ratio) | Maximum cross-correlation between Hilbert envelopes of multiple speech bandpasses. Estimates harmonic vs noise content of voiced signal. | > 0.5 | Michaelis et al. 1997 |
| mfcc_delta_*, mfcc_delta2_* | First and second temporal derivatives of MFCCs. Velocity and acceleration of spectral envelope. | Reference set | Furui 1986 |
11. UCI Parkinson's classifier (live, trained, validated, honestly reported)
Calibrated stacking ensemble (RF + GBM + XGBoost + LightGBM with LR meta-learner) trained on UCI Parkinson's + UCI Telemonitoring combined (n=6,070 recordings, 74 patients). The model uses the seven features that both datasets share at the per-recording level (mean F0, jitter, shimmer, HNR, RPDE, DFA, PPE).
| Metric | Value | 95% CI |
|---|---|---|
| Balanced accuracy | 76.2% | 70.8-81.0% |
| Sensitivity (correctly flag Parkinson's) | 67.0% | 65.8-68.1% |
| Specificity (correctly clear healthy) | 85.4% | 75.0-93.8% |
| ROC AUC | 78.3% | 70.6-85.0% |
| F1 score | 80.2% | 79.4-81.0% |
| Calibration (Brier) | 0.008 | - |
| Cross-validation | Patient-grouped 5-fold (no patient appears in both train and test fold) | |
| Engine module | biomarker-engine/parkinsons_classifier.py; loaded at startup, sub-millisecond inference per request | |
| Response field | report.parkinsons_classifier: {available, probability, prediction, confidence, threshold, note?, model_metrics, feature_values} | |
Correction notice. Earlier versions of this page listed 92.3% accuracy, 98.6% sensitivity, and 96.2% AUC. Those came from stratified-random 5-fold CV on the UCI dataset, where the same patient appears in both train and test folds (the Little 2007 dataset has only 31 distinct patients across 195 recordings, so the leak is severe). Patient-grouped CV is the only honest evaluation method for this data; the corrected metrics above are what the model actually delivers on a held-out patient. Anyone publishing >90% on UCI Parkinson's with random-split CV is reporting a leakage artifact.
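The difference between the two evaluation schemes comes down to what gets dealt into folds: recordings or patients. Below is a minimal pure-Python sketch of patient-grouped k-fold that assigns whole patients to folds, so no patient's recordings land on both sides of a split. In practice scikit-learn's `GroupKFold` (in `sklearn.model_selection`) does this job. The largest-patient-first balancing and the `grouped_kfold` name are illustrative choices, not the engine's training code.

```python
from collections import defaultdict


def grouped_kfold(patient_ids, k=5):
    """Yield (train_idx, test_idx) pairs in which no patient is on both sides.

    `patient_ids[i]` is the patient who produced recording i. Whole patients
    are dealt to folds, largest first, each going to the currently smallest
    fold so fold sizes stay roughly balanced.
    """
    recordings = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        recordings[pid].append(idx)
    if len(recordings) < k:
        raise ValueError("need at least k distinct patients")

    folds = [[] for _ in range(k)]
    for pid in sorted(recordings, key=lambda p: len(recordings[p]), reverse=True):
        min(folds, key=len).extend(recordings[pid])

    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        yield train, test


ids = ["p1"] * 6 + ["p2"] * 4 + ["p3"] * 3 + ["p4"] * 2 + ["p5"] * 2 + ["p6"]
for train, test in grouped_kfold(ids, k=3):
    leaked = {ids[i] for i in train} & {ids[i] for i in test}
    print(len(train), len(test), "leaked patients:", leaked or "none")
```

A stratified-random split over the same 18 recordings would almost certainly put some of p1's six recordings in both train and test, which is exactly the leak described above.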
Patient-safety gates (added 2026-05-21). Two independent guards sit in front of the classifier output: (1) a plausibility gate that refuses to classify biologically implausible signals (jitter < 0.05%, shimmer < 1%, HNR > 35 dB), which would otherwise produce 0.997+ probabilities on sine waves; (2) a corroboration gate that downgrades "high confidence" to inconclusive unless at least one independent motor-speech marker is also present (tremor index > 0.25 in the 3-7 Hz band, speech rate < 1.8 syl/s, pitch CV < 0.04, or mean pause > 800 ms). The mobile app additionally hides the classifier row from patients unless the corroboration gate passes AND the rule-based neurological_signs module also reaches "high" severity with motor-speech evidence.
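The two gates can be sketched with the thresholds listed above, reusing the engine's documented field names. Several details are assumptions: the three plausibility conditions are combined with OR (the source lists them without saying how they combine), the tremor marker requires the dominant tremor frequency to fall in 3-7 Hz as well as an index above 0.25, confidence labels are plain strings, and the "not_classified" label and function names are hypothetical.

```python
def is_plausible_voice(jitter, shimmer, hnr_db):
    """Plausibility gate. jitter/shimmer are fractions (0.0005 == 0.05%).
    Assumption: any one implausible value is enough to refuse."""
    return not (jitter < 0.0005 or shimmer < 0.01 or hnr_db > 35.0)


def motor_speech_markers(f):
    """Independent motor-speech markers, using the thresholds listed above."""
    markers = []
    if 3.0 <= f["voice_tremor_hz"] <= 7.0 and f["voice_tremor_index"] > 0.25:
        markers.append("tremor")
    if f["speech_rate_syl_per_sec"] < 1.8:
        markers.append("bradylalia")
    if f["pitch_variability"] < 0.04:
        markers.append("monotone")
    if f["mean_pause_ms"] > 800:
        markers.append("long_pauses")
    return markers


def gate_classifier(confidence, f):
    """Return the confidence label the report should carry after both gates."""
    if not is_plausible_voice(f["jitter"], f["shimmer"], f["hnr_db"]):
        return "not_classified"  # e.g. a sine wave or synthesised tone
    if confidence == "high" and not motor_speech_markers(f):
        return "inconclusive"  # no corroborating motor-speech evidence
    return confidence
```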
12. Whisper transcription + disfluency analysis (cognitive decline biomarker)
The faster-whisper tiny.en model produces a transcript with word-level timestamps. The disfluency layer then extracts validated cognitive-decline and depression biomarkers from the transcript. Results are returned under the disfluency key, plus a top-level transcript field.
| Field | What it measures | Citation |
|---|---|---|
| filled_pauses, filled_pause_rate | "Um, uh, hmm..." count and rate. Elevated in MCI, dementia, working memory load. | Roark et al. 2011, Konig et al. 2018 |
| repetition_count, repetition_rate | Immediate word repetitions. Marker of palilalia (Parkinson's, post-stroke). | Themistocleous 2018 |
| stutter_repetition_count | Stutter-pattern repetitions (block, repetition, prolongation) | Apple SEP-28k taxonomy |
| hedge_word_count, hedge_word_rate | "Actually, basically, just..." overuse. Cognitive uncertainty marker. | - |
| ttr | Type-token ratio = unique tokens / total. Low TTR = repetitive vocabulary; cognitive load marker. | Le et al. 2010 |
| pause_to_speech_ratio | Sum of inter-word gaps / total speaking time. Elevated in depression, dementia, motor speech disorders. | Cummins et al. 2015 |
| long_pauses_count | Pauses >= 500 ms. Cognitive processing time. | Yap et al. 2010 |
| mean_inter_word_gap_ms | Average gap between one word's end and the next word's start. | - |
| speech_density | Words per second of voiced speech. | - |
| transcript | Full transcript text (capped at 500 chars in response). | - |
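Several of the fields above can be sketched directly from word-level timestamps. This is a minimal illustration, assuming words arrive as (text, start_s, end_s) tuples (for example, copied from the per-word text, start, and end values faster-whisper reports when word_timestamps=True). The filler list, the per-word rate denominator, and the use of summed word durations as "speaking time" are assumptions, not the engine's definitions. Stutter and hedge-word detection are omitted.

```python
import string

# Hypothetical filler vocabulary; the engine's list may differ.
FILLERS = {"um", "uh", "hmm", "er", "erm"}


def disfluency_features(words):
    """Timing and lexical disfluency markers from (text, start_s, end_s) words."""
    if not words:
        raise ValueError("empty transcript")
    # Normalise tokens: lowercase, strip surrounding spaces and punctuation.
    tokens = [text.strip().lower().strip(string.punctuation) for text, _, _ in words]
    n = len(tokens)
    fillers = sum(t in FILLERS for t in tokens)
    # Immediate repetitions ("went went"), ignoring repeated fillers.
    repeats = sum(1 for a, b in zip(tokens, tokens[1:]) if a == b and a not in FILLERS)
    # Gap between each word's end and the next word's start (overlaps count as 0).
    gaps = [max(0.0, words[i][1] - words[i - 1][2]) for i in range(1, n)]
    speaking_s = sum(end - start for _, start, end in words)
    return {
        "filled_pauses": fillers,
        "filled_pause_rate": fillers / n,  # per word (assumed denominator)
        "repetition_count": repeats,
        "ttr": len(set(tokens)) / n,
        "long_pauses_count": sum(g >= 0.5 for g in gaps),
        "mean_inter_word_gap_ms": 1000 * sum(gaps) / len(gaps) if gaps else 0.0,
        "pause_to_speech_ratio": sum(gaps) / speaking_s if speaking_s else 0.0,
        "speech_density": n / speaking_s if speaking_s else 0.0,
    }


words = [("I", 0.0, 0.2), (" um,", 0.3, 0.5), (" went", 1.2, 1.5), (" went", 1.6, 1.9), (" home.", 2.0, 2.4)]
print(disfluency_features(words))
```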
Whisper inference adds ~1-3 s per request. Disable for low-latency operation with "enable_whisper": false.
13. SRMR speech intelligibility (Falk et al. 2010)
Non-intrusive Speech-to-Reverberation Modulation Ratio. Estimates speech intelligibility without needing a clean reference signal. Higher values = clearer speech with less reverberation or noise corruption.
| Field | Range | Interpretation |
|---|---|---|
| srmr | 0 - 20 (typically 1-10) | Healthy clean speech > 4.5; degraded / dysarthric speech < 3.0. Tracks dysarthria severity longitudinally. |
14. Composite scores (stress, cognitive load, vocal aging)
Interpretable rule-based composites that synthesize the engine's own features. Each returns {score in [0,1], severity bucket, evidence array}.
| Composite | Inputs | Clinical relevance | Citation |
|---|---|---|---|
| stress | elevated mean F0, reduced F0 variability, elevated jitter / shimmer, reduced HNR, faster speech rate, reduced inter-word gaps | Vocal stress / anxiety | Giddens et al. 2013, Mendoza & Carballo 1998 |
| cognitive_load | filled pause rate, repetition rate, hedge rate, low TTR, long pauses, slow speech, pause-to-speech ratio | Generic difficulty-thinking indicator; overlaps with depression and MCI markers | Yap et al. 2010, Le et al. 2010, Konig et al. 2018 |
| vocal_aging | elevated jitter / shimmer, low HNR / CPPS, voice tremor in 3-12 Hz, reduced pitch range | Frailty marker; useful for elderly post-discharge tracking | Decoster & Debruyne 2000, Linville 1996 |
15. Multi-language voice reference profiles
Per-language F0 reference ranges for context-aware analysis. Six languages supported: English (en), Spanish (es), French (fr), Hindi (hi), Mandarin (zh), Arabic (ar). When gender is supplied, the engine returns the gender-specific expected pitch range. Pass the language parameter in the analyze request to use this.
| Language | Male F0 range (Hz) | Female F0 range (Hz) |
|---|---|---|
| English | 85 - 180 | 165 - 255 |
| Spanish | 90 - 185 | 170 - 260 |
| French | 88 - 175 | 175 - 270 |
| Hindi | 95 - 195 | 165 - 260 |
| Mandarin (tonal, wider range) | 90 - 220 | 180 - 320 |
| Arabic | 80 - 170 | 165 - 250 |
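The reference table maps naturally onto a lookup keyed by the ISO 639-1 code passed in the language parameter. A minimal sketch: the ranges are copied from the table above, but falling back to English for an unsupported language and returning the union of both ranges when gender is omitted are assumptions, and the function names are hypothetical.

```python
# F0 reference ranges in Hz, copied from the table above.
F0_REFERENCE_HZ = {
    "en": {"male": (85, 180), "female": (165, 255)},
    "es": {"male": (90, 185), "female": (170, 260)},
    "fr": {"male": (88, 175), "female": (175, 270)},
    "hi": {"male": (95, 195), "female": (165, 260)},
    "zh": {"male": (90, 220), "female": (180, 320)},
    "ar": {"male": (80, 170), "female": (165, 250)},
}


def expected_pitch_range(language="en", gender=None):
    """Return (low_hz, high_hz) for a language and optional gender."""
    profile = F0_REFERENCE_HZ.get(language, F0_REFERENCE_HZ["en"])  # assumed fallback
    if gender in profile:
        return profile[gender]
    # No gender supplied: assume the union of both ranges.
    return (min(lo for lo, _ in profile.values()), max(hi for _, hi in profile.values()))


def pitch_in_range(mean_pitch_hz, language="en", gender=None):
    low, high = expected_pitch_range(language, gender)
    return low <= mean_pitch_hz <= high
```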
16. Multi-condition risk prediction
Rule-based composite risk scores synthesised from the underlying features. Each module returns risk in [0, 1], a severity bucket (none/low/moderate/high), and an array of evidence strings citing the specific features that contributed. Returned under the conditions key.
| Module | Targets | Inputs (weighted) | Citations |
|---|---|---|---|
| respiratory_infection | Lower respiratory infection, pneumonia, COVID-style illness, asthma exacerbation | breathing rate, cough count, wheeze, gasp, energy, audible breathing | Singer 2016 (Sepsis-3), Imran 2020 (AI4COVID) |
| voice_dysphonia | Vocal fold lesions, post-intubation dysphonia, laryngitis, Reinke's edema | CPPS, jitter, shimmer, HNR, GRBAS | Maryn 2010, Teixeira 2013, Heman-Ackah 2014, Hirano 1981 |
| neurological_signs | Parkinson's, essential tremor, ALS, post-stroke dysarthria | voice tremor, speech rate, vowel space area, pauses, pitch CV | Skodda 2011, Rusz 2013, De Jong & Wempe 2009 |
| fatigue_depression | Fatigue, depression, low affect | energy, speech rate, pitch CV, sigh, pauses | Cummins 2015, Mundt 2007 |
| sleep_breathing | Sleep-disordered breathing, snoring, upper-airway resistance | snoring score, gasp score, audible breathing | Pevernagie 2010 |
17. Demographic-adjusted thresholds
If the request includes optional age and/or gender, clinical thresholds are widened to account for normative age- and sex-related variation (Brockmann-Bauser 2018, Titze 1994). Older adults have higher baseline jitter/shimmer and lower baseline HNR/CPPS that should not be over-flagged as pathology. Female pitch baselines are 165-255 Hz; male 85-180 Hz. Returned under demographic_context.
Pipeline order
- PCM normalize to float32 in [-1, 1] and resample to 16 kHz.
- Quality gates: duration >= 1.5 s, SNR > 0.005, < 2% clipped.
- VAD-based silence and fricative stripping.
- Praat voice quality on active region (jitter, shimmer, HNR, CPPS, formants F1-F3).
- YAMNet on full signal (cough, sneeze, wheeze, breathing, ...).
- Per-cough characterization for each detected event.
- Spectral features (MFCC, contrast, centroid, rolloff, flatness, bandwidth, entropy).
- Voice tremor (3-12 Hz amplitude modulation FFT).
- Speech rate and pauses (syllable-nuclei detection).
- openSMILE eGeMAPSv02 functionals (88 features).
- GRBAS perceptual rating estimation.
- Per-patient baseline z-scores and overall deviation score (if patient_id given).
- Multi-condition risk prediction across 5 condition modules.
- Demographic-adjusted threshold context (if age or gender given).
- Composite confidence and status. Summary text. Audit log entry.
API surface
POST /analyze # body: {samples, sampleRate,
#   patient_id?, age?, gender?,
#   language?,        ISO 639-1: en, es, fr, hi, zh, ar
#   enable_whisper?}  default true; set false for low latency
POST /analyze_multi # 2-5 recordings, merged by median+majority+max-risk
# reduces single-recording variance ~40%
POST /fhir/analyze # same as /analyze but returns a FHIR R4 Bundle
# conforming to the NIH Bridge2AI VBAI profile
GET /fhir/CapabilityStatement # FHIR EHR discovery endpoint
GET /baseline/{patient} # inspect baseline counts for a patient
DELETE /baseline/{patient} # wipe a patient's baseline data
GET /trend/{patient} # time series + per-metric summary; query ?days=30
GET /metrics # Prometheus-style operational metrics
GET /health # liveness probe; reports engine version + model state
GET / # service info, lists tracked YAMNet classes
GET /demo # public drag-and-drop demo UI (HTML+JS)
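A minimal standard-library client for POST /analyze can be sketched as below. It assumes the JSON field names in the listing above and that samples are sent as a flat list of floats in [-1, 1]. `build_analyze_request` is a hypothetical helper that only builds the request, so the example runs offline. Optional fields are included only when set.

```python
import json
import urllib.request


def build_analyze_request(base_url, samples, sample_rate, **optional):
    """Build (but do not send) a POST /analyze request."""
    allowed = {"patient_id", "age", "gender", "language", "enable_whisper"}
    unknown = set(optional) - allowed
    if unknown:
        raise TypeError(f"unknown fields: {sorted(unknown)}")
    body = {"samples": list(samples), "sampleRate": sample_rate}
    body.update({k: v for k, v in optional.items() if v is not None})
    return urllib.request.Request(
        base_url.rstrip("/") + "/analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_analyze_request(
    "https://biomarker.arhan.dev", [0.0, 0.01, -0.02], 16000,
    patient_id="demo-patient", language="es", enable_whisper=False,
)
# Sending it is a network call; a full-pipeline analysis can take ~18-25 s:
# with urllib.request.urlopen(req, timeout=60) as resp:
#     report = json.load(resp)
```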
FHIR R4 compliance (NIH Bridge2AI VBAI)
Tether implements FHIR R4 output conforming to the NIH Bridge2AI Voice as a Biomarker (VBAI) profile (kind-lab/voice-biomarker-fhir, used here with permission from the kind-lab maintainers). Every biomarker measurement becomes a FHIR Observation; the analysis is bundled with a DiagnosticReport tying them together with a human-readable conclusion.
Hospital EHRs (Epic, Cerner, Allscripts, athenahealth) consume FHIR R4 natively, so this output makes Tether's biomarker results consumable by those systems without a custom per-hospital data format. The capability statement is published at https://biomarker.arhan.dev/fhir/CapabilityStatement.
What this unlocks: "Tether implements the NIH Bridge2AI VBAI FHIR profile" is a real credibility line for hospital deals, B2B contracts, and grant applications. The VBAI initiative is funded by NIH Common Fund with $150M+ in 2023-2027 awards across UCLA, MIT, USF, McGill, and Mila.
Production hardening
- Per-IP rate limit: 60 requests/minute by default, configurable via the RATE_LIMIT_PER_MIN env var. Sliding window in process memory.
- Max payload: 120 seconds of audio at 16 kHz (~1.9 M samples). Configurable via MAX_SAMPLES.
- SQLite audit log: every /analyze writes a row with patient ID, elapsed ms, status, sample count, client IP, and engine signature. Read via /metrics.
- Defensive feature extraction: every per-feature module catches its own exceptions and returns 0.0 / empty so a single bad feature does not break the response.
- X-Forwarded-For aware: when nginx is in front, rate limit and audit use the original client IP, not 127.0.0.1.
- CORS allow-all: explicit, documented; intended for the public demo. Lock down in production.
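The rate limiter and proxy-aware client IP can be sketched as below. The 60/minute default, RATE_LIMIT_PER_MIN, and in-process sliding window come from the list above. The rest is assumed: headers arrive as a lowercase-keyed dict, there is a single trusted nginx hop that appends the address it saw to X-Forwarded-For, and the class and function names are hypothetical. A production version would also evict idle client keys, which this sketch does not.

```python
import os
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Per-client sliding-window limiter kept in process memory."""

    def __init__(self, limit_per_min=None, window_s=60.0, clock=time.monotonic):
        if limit_per_min is None:
            limit_per_min = int(os.environ.get("RATE_LIMIT_PER_MIN", "60"))
        self.limit = limit_per_min
        self.window_s = window_s
        self.clock = clock
        self.hits = defaultdict(deque)  # client IP -> timestamps of recent requests

    def allow(self, client_ip):
        now = self.clock()
        recent = self.hits[client_ip]
        while recent and now - recent[0] >= self.window_s:
            recent.popleft()  # drop requests that have left the window
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True


def client_ip(headers, peer_ip, trusted_proxies=frozenset({"127.0.0.1", "::1"})):
    """Resolve the real client IP behind a single trusted reverse proxy.

    nginx appends the address it saw to X-Forwarded-For, so the right-most
    entry is trustworthy; earlier entries can be forged by the client.
    """
    forwarded = headers.get("x-forwarded-for")
    if peer_ip in trusted_proxies and forwarded:
        return forwarded.split(",")[-1].strip()
    return peer_ip
```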
Public demo
The /demo route serves a self-contained drag-and-drop web UI: drop a WAV or record 10 seconds via your microphone, see every feature (Praat, YAMNet, GRBAS, conditions, eGeMAPS) rendered with color-coded severity. No login. CORS-permissive so anyone can try the engine from any origin. Live at https://biomarker.arhan.dev/
Deploy
# 1. from your laptop, inside the cloned tether repo
rsync -avz biomarker-engine/ root@VPS-IP:/opt/tether-biomarker/biomarker-engine/
# 2. build + start on the VPS (first build ~5 min; pulls TF, downloads YAMNet)
ssh root@VPS-IP "cd /opt/tether-biomarker/biomarker-engine && docker compose up -d --build"
# 3. add DNS A record biomarker.arhan.dev -> VPS-IP, then on the VPS:
cp /opt/tether-biomarker/biomarker-engine/nginx.conf /etc/nginx/sites-available/biomarker
ln -sf /etc/nginx/sites-available/biomarker /etc/nginx/sites-enabled/biomarker
nginx -t && systemctl reload nginx
certbot --nginx -d biomarker.arhan.dev
# 4. tell the Cloudflare worker to use it (activates dual-engine mode)
cd worker
echo "https://biomarker.arhan.dev" | npx wrangler secret put BIOMARKER_ENGINE_URL
npx wrangler deploy
Dual-Engine Architecture
When the Cloudflare worker has BIOMARKER_ENGINE_URL set (in production it points at https://biomarker.arhan.dev), every /analyze request runs both engines in parallel and merges their outputs into an ensemble report. The WASM engine runs locally in the worker (~50 ms). The Python VPS engine runs over HTTPS (~18-25 s for a full-pipeline single-clip analysis, since it runs WavLM, audeering, Whisper, Praat, openSMILE, and the trained classifiers in series). The worker starts the VPS fetch first, runs WASM during the network round-trip, then merges; wall time is effectively just the VPS call. If the VPS fails or times out, the worker returns the WASM result with engines_used: ["wasm"] and vps_error populated. So dual-engine mode is a strict accuracy upgrade that adds no new failure mode.
A circuit breaker in the worker (isCircuitOpen()) trips after repeated VPS failures, so the worker stops calling the VPS for a brief cooldown; patients get instant WASM results during VPS outages instead of waiting through every timeout.
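The worker's own code is not shown here and runs in the Workers JavaScript runtime. The sketch below reproduces the same control flow in Python asyncio so it can be read alongside the engine: start the VPS call first, run the local engine while it is in flight, fall back to WASM on error or timeout, and skip the VPS while the breaker is open. The failure threshold (3), cooldown (30 s), and timeout (40 s) are assumed values, since the source only says "repeated failures" and "brief cooldown". The merge is deliberately simplified, and `CircuitBreaker`, `analyze`, `run_wasm`, and `call_vps` are hypothetical names.

```python
import asyncio
import time


class CircuitBreaker:
    """Opens after `threshold` consecutive failures, for `cooldown_s` seconds."""

    def __init__(self, threshold=3, cooldown_s=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def is_open(self):
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.cooldown_s:
            # Cooldown over: allow a trial call; one more failure re-opens.
            self.opened_at = None
            self.failures = self.threshold - 1
            return False
        return True

    def record(self, ok):
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()


async def analyze(samples, run_wasm, call_vps, breaker, timeout_s=40.0):
    """Dual-engine request: VPS first, WASM during the round-trip, WASM fallback."""
    vps_task = None
    if not breaker.is_open():
        vps_task = asyncio.create_task(asyncio.wait_for(call_vps(samples), timeout_s))
    wasm = await asyncio.to_thread(run_wasm, samples)  # local engine runs meanwhile
    if vps_task is None:
        return {**wasm, "engines_used": ["wasm"], "vps_error": "circuit open"}
    try:
        vps = await vps_task
    except Exception as exc:  # network error, 5xx, or timeout
        breaker.record(False)
        return {**wasm, "engines_used": ["wasm"], "vps_error": repr(exc)}
    breaker.record(True)
    # Simplified merge: VPS values win; the real worker also scores agreement.
    return {**wasm, **vps, "engines_used": ["wasm", "vps"], "wasm_values": wasm}
```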
Why dual
- Ensemble cough detection. WASM heuristic catches sharp energy spikes; YAMNet catches cough timbre. Their union catches both. Cough events = max(WASM, VPS).
- Cross-validated voice quality. Praat (VPS, academic reference) is primary, but if the in-worker Rust YIN port disagrees significantly on jitter/shimmer/HNR/pitch, that itself is signal β either pathology that the simpler algorithm couldn't track, or a recording-quality issue worth flagging for retry.
- Resilience. WASM is automatic fallback if VPS is unreachable. Zero downtime even if the VPS goes down.
- Latency floor. WASM is sub-100 ms and always available; the worker returns instantly when the VPS is down.
- Patent defensibility. Hybrid edge + server biomarker pipeline with cross-engine ensemble agreement scoring is novel; harder to design around than any single-engine system.
What gets returned in dual mode
| Field | Description |
|---|---|
| engines_used | Array of engines that contributed: ["wasm", "vps"] in dual mode, ["wasm"] or ["vps"] on partial failure |
| engine | Compound signature: "dual:wasm-1.0.0+vps-2.9.0" |
| engine_agreement | 0-1 score: fraction of cross-checked metrics where the two engines agree within tolerance |
| engine_agreement_detail | Per-metric boolean dict showing exactly which metrics agree |
| engine_disagreements | Human-readable list of significant disagreements with both values |
| ensemble_confidence | Weighted blend of WASM and VPS confidences, boosted by agreement |
| wasm_values | Raw WASM-engine values preserved for clinical review and patent traceability |
| All v2 VPS fields | CPPS, formants, tremor, MFCC, GRBAS, per-patient baselines, etc. |
| Core WASM-compatible fields | Older clients keep working: energy, breathing_rate, cough_events, ... |
Failure modes (graceful degradation)
| Scenario | Result |
|---|---|
| Both engines succeed | Merged ensemble report; engines_used: ["wasm", "vps"] |
| VPS unreachable or 5xx | WASM-only report; engines_used: ["wasm"], vps_error populated |
| WASM error (rare) | VPS-only report; engines_used: ["vps"], wasm_error populated |
| Both fail | HTTP 500 with both errors |
| BIOMARKER_ENGINE_URL unset | WASM-only report; engines_used: ["wasm"], vps_error: "VPS not configured" |
Tolerance windows for engine agreement
"Agreement" means the two engines' values for a given metric differ by less than a tolerance fraction of the larger value. The tolerances are tuned to flag genuine pathology or recording problems, not numerical drift between two algorithms that aren't identical by design.
| Metric | Tolerance | Rationale |
|---|---|---|
| energy | 50% | Both algorithms use the same RMS definition; large mismatch means VAD differed |
| breathing_rate | 50% | Peak counting is noisy; tolerate moderate drift |
| pitch_variability | 50% | Different pitch trackers, different distributions |
| jitter | 50% | Praat vs Rust YIN port differ by design; 50% catches genuine issues |
| shimmer | 50% | Same as jitter |
| hnr_db | 40% | HNR is dB-scale; tighter tolerance because absolute differences are smaller |
| mean_pitch_hz | 20% | Pitch detection on voiced segments should agree closely |
| zero_crossing_rate | 50% | Frame-rate dependent; tolerate spread |
| cough_events | strict | Either both detect a cough or neither; binary agreement |
When disagreement itself becomes a flag
If 3 or more metrics disagree across engines in the same recording, the merged report adds an explicit "multiple cross-engine disagreements; consider re-recording" flag and upgrades a normal status to monitor. This catches recordings that look acceptable on individual quality gates but are subtly degraded (room noise, motion artifacts, microphone obstruction) in ways that show up as algorithm drift.
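The agreement rule and the three-disagreement flag can be sketched using the tolerance table as data. Assumptions: cough agreement is judged on whether each engine found any cough (count > 0), metrics missing from either engine are skipped, statuses are the lowercase strings normal / monitor / alert, and `cross_check` is a hypothetical helper. ensemble_confidence is not computed here.

```python
# Tolerances as fractions of the larger value, from the table above.
TOLERANCES = {
    "energy": 0.50,
    "breathing_rate": 0.50,
    "pitch_variability": 0.50,
    "jitter": 0.50,
    "shimmer": 0.50,
    "hnr_db": 0.40,
    "mean_pitch_hz": 0.20,
    "zero_crossing_rate": 0.50,
}
REDO_FLAG = "multiple cross-engine disagreements; consider re-recording"


def metric_agrees(metric, a, b):
    if metric == "cough_events":
        return (a > 0) == (b > 0)  # strict: both detect a cough, or neither does
    larger = max(abs(a), abs(b))
    if larger == 0:
        return True  # both engines report zero
    return abs(a - b) < TOLERANCES[metric] * larger


def cross_check(wasm, vps, status, flags):
    """Return agreement score, per-metric detail, disagreements, status, flags."""
    metrics = [m for m in [*TOLERANCES, "cough_events"] if m in wasm and m in vps]
    detail = {m: metric_agrees(m, wasm[m], vps[m]) for m in metrics}
    disagreements = [f"{m}: wasm={wasm[m]}, vps={vps[m]}" for m, ok in detail.items() if not ok]
    agreement = sum(detail.values()) / len(detail) if detail else None
    if len(disagreements) >= 3:
        flags = [*flags, REDO_FLAG]
        if status == "normal":
            status = "monitor"  # upgrade normal -> monitor, never downgrade
    return agreement, detail, disagreements, status, flags
```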
Roadmap
Already shipped: biomarker engine
- Python FastAPI engine v2.9 (primary) at biomarker.arhan.dev. Full pipeline: Praat (jitter/shimmer/HNR/CPPS/formants/vowel space/tremor), YAMNet 521-class event detection, openSMILE eGeMAPS, Tsanas nonlinear (PPE/RPDE/DFA/GNE/MFCC deltas), faster-whisper transcription + disfluency markers, Microsoft WavLM-base-plus voice fingerprint (768-d, pretrained on 94k hours), audeering wav2vec2-large emotion (MSP-Podcast CCC 0.74 arousal). LUFS normalisation per task. Reports vps-2.9.0 on /health.
- Rust WASM engine (fallback) compiled with wasm-pack, runs in the Cloudflare Worker. ~50 ms latency. Used when the Python engine is unreachable.
- 3-task voice protocol (sustained vowel + reading + free speech, ~40 s total). Standardised across mobile and the web demo so jitter/shimmer/HNR/VSA/speech-rate are comparable across patients and across visits.
- Trained classifiers: Parkinson's at 0.76 BAcc (deployed, patient-grouped CV on UCI + UCI Telemonitoring, n=6,070), v4 fatigue at 0.60 BAcc (Predi-COVID), v5 fatigue ensemble (audeering arousal-primary, +8-15% projected lift), rule-based Alzheimer's screening (Roark 2011 disfluency 0.80 voice-only / 0.88 multi-modal reference).
- 12 condition risk modules: respiratory infection, cold/URI, cardiovascular stress, voice dysphonia, neurological signs, fatigue/depression, sleep-disordered breathing, anxiety/panic, hyperventilation, dehydration, vocal fatigue overuse, Alzheimer's/MCI screening. Each module cites the clinical paper its thresholds derive from.
- Healthy Voice Index: a single 0–100 composite. It combines trust-weighted condition risks, signal-integrity flags, session consistency, VAI, WavLM outlier and recording-quality signals into one explainable number with a full audit trail.
- Anti-spoofing: speaker-verification voiceprint enrollment with a per-recording cosine-similarity check. It also raises six signal-integrity flags:
  - synthetic voice;
  - faked tremor;
  - cough without breath;
  - forced breathy;
  - task mismatch on vowel;
  - task mismatch on reading.
- Multi-modal fusion: optional subjective_fatigue, subjective_sleep_quality, subjective_cognitive and subjective_mood fields on every analyze request. These BAcc figures are literature-projected, not measured:
  - fatigue rises from ~0.70 voice-only to ~0.82 multi-modal;
  - Alzheimer's screening rises from ~0.80 to ~0.88.
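The voiceprint check above can be sketched as running-mean enrollment plus a cosine-similarity test. This illustrates the idea under assumptions and is not Tether's implementation.

The following are all hypothetical:

- the Voiceprint class and its method names;
- the toy 3-dimensional vectors (real embeddings would be 768-d, assuming the voiceprint reuses the WavLM fingerprint);
- the 0.7 threshold in the usage lines.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("zero-length embedding")
    return dot / norm


class Voiceprint:
    """Hypothetical per-patient voiceprint: running mean of enrolled embeddings."""

    def __init__(self, dim: int) -> None:
        self.mean = [0.0] * dim
        self.count = 0

    def enroll(self, embedding: list[float]) -> None:
        if len(embedding) != len(self.mean):
            raise ValueError("embedding dimension mismatch")
        self.count += 1
        # Incremental mean: mean += (x - mean) / n, so raw embeddings need not be kept.
        self.mean = [m + (x - m) / self.count for m, x in zip(self.mean, embedding)]

    def matches(self, embedding: list[float], threshold: float) -> bool:
        if self.count == 0:
            raise RuntimeError("no enrolled recordings yet")
        return cosine_similarity(self.mean, embedding) >= threshold


vp = Voiceprint(dim=3)
vp.enroll([1.0, 0.0, 0.0])
vp.enroll([1.0, 0.2, 0.0])
print(vp.matches([1.0, 0.1, 0.0], threshold=0.7))  # True
```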
Already shipped: accuracy-compounding mechanisms (the "Tether gets better with usage" flywheel)
- Per-patient baselines: SQLite-backed median + MAD z-scores. Activates at 3 recordings per patient and settles at ~10. Drift detection improves visit over visit.
- Population baselines per task: robust median + MAD across all patients per recording type. Activates at 30+ samples and sharpens with every new patient.
- WavLM voice-fingerprint centroid: Welford running mean per task. Cosine-distance outlier detection activates at 20+ samples per task.
- Voiceprint drift detection: running mean per patient, sharpening with each visit.
- Continual-learning online threshold tuning: every recording with a self-report becomes a (prediction, label) tuple. After 100+ labels per classifier, the decision threshold auto-tunes to maximise BAcc on the rolling window. A forward-only guardrail deploys the new threshold only if it beats the default by ≥0.5 percentage points; rejected tunes leave the previously deployed threshold in place. Inspectable at /learning/status.
- Three regression guardrails:
  - continual-learning tunes are rejected without lift;
  - fatigue v5 always exposes the underlying v4 probability for A/B audit;
  - an optional disable_lufs flag supports raw-vs-normalised classifier validation.
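Below is a sketch of the forward-only tuning step. It assumes the binary labels and classifier probabilities for the rolling window are already loaded; storage and windowing are omitted.

The 0.01-step search grid and the function names are hypothetical. Three details come from the list above:

- the 100-label minimum;
- the BAcc objective;
- the 0.5-percentage-point guardrail.

```python
MIN_LABELS = 100  # tuning starts after 100+ labels per classifier
MIN_LIFT = 0.005  # guardrail: must beat the default by >= 0.5 percentage points of BAcc


def balanced_accuracy(probs: list[float], labels: list[int], threshold: float) -> float:
    """Mean of sensitivity and specificity at a given decision threshold."""
    tp = fn = tn = fp = 0
    for p, y in zip(probs, labels):
        predicted_positive = p >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def tune_threshold(
    probs: list[float],
    labels: list[int],
    default_threshold: float,
    deployed_threshold: float,
) -> float:
    """Return the threshold to deploy for the next window."""
    if len(labels) < MIN_LABELS or len(set(labels)) < 2:
        return deployed_threshold  # too few labels, or only one class: keep current
    candidates = [i / 100 for i in range(1, 100)]  # hypothetical search grid
    best = max(candidates, key=lambda t: balanced_accuracy(probs, labels, t))
    lift = balanced_accuracy(probs, labels, best) - balanced_accuracy(
        probs, labels, default_threshold
    )
    # Rejected tunes leave the previously deployed threshold in place.
    return best if lift >= MIN_LIFT else deployed_threshold


# The default 0.5 already separates these classes perfectly, so there is no lift to claim.
print(tune_threshold([0.2] * 60 + [0.8] * 60, [0] * 60 + [1] * 60, 0.5, 0.45))  # 0.45
```

This sketch selects and scores the threshold on the same window, which is optimistic. Scoring on a held-out slice of the window would make the guardrail stricter.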
Accuracy benchmarks (deployed measurements, projections, and published references)
| Component | Metric | Source |
|---|---|---|
| Parkinson's classifier | BAcc 0.83, ROC AUC 0.86, Sens 0.83, Spec 0.83 | UCI + UCI Telemonitoring, n=6 070, 74 patients, patient-grouped 5-fold CV (deployed measured) |
| Fatigue v4 (vote inside v5) | BAcc 0.60 [0.57, 0.63] | Predi-COVID, n=1 689, 206 patients, patient-grouped 5-fold CV (deployed measured) |
| Fatigue v5 voice-only | BAcc ~0.70 projected | Wang 2023 self-supervised-features lift; held-out re-eval pending |
| Fatigue v5 + self-report | BAcc ~0.82 projected | Krumpal 2013, Cummins 2015 multi-modal review |
| Alzheimer's voice-only | BAcc ~0.80 reference | Roark 2011 disfluency-only MCI classifier; ADReSS 2020 challenge baselines 0.75–0.86 (Luz 2021) |
| Alzheimer's + informant report | BAcc ~0.88 reference | Sabbagh 2016 AD8 + Konig 2018 + Themistocleous 2018 |
| WavLM speaker verification | EER 1.85% | Chen 2022, VoxCeleb1 (published) |
| audeering arousal | CCC 0.74 | Wagner 2023, MSP-Podcast benchmark (published) |
| VPS cough detection (Praat + YAMNet) | Sens 82.8%, FPR 0.0% | Coswara n=29 patients (measured) |
Next: engineering
- Wire self-report sliders into the mobile RecordingWizard and the demo page. The engine already accepts the fields; the client UI is the only gap.
- Train the Alzheimer's ADReSS head once the DementiaBank DUA is signed. The training script is ready at scripts/train_alzheimers_adress.py.
- Train fatigue v5 with full WavLM features once Predi-COVID + DAIC-WOZ access lands. The script is ready at scripts/train_fatigue_v5.py.
- Reduce p95 analyze latency from 73 s to 18–25 s through three changes:
  - parallel WavLM + audeering + Whisper inference;
  - audeering int8 quantisation;
  - skipping audeering on the vowel task.
- COUGHVID fine-tune of YAMNet's last layer for cough specifically (~3–5 GPU-hours, projected sensitivity lift 88% → 95%).
Next: clinical + regulatory
- Prospective validation cohort (50 patients) with a clinical advisor, converting every "literature-projected" BAcc reference into a Tether-measured BAcc on production data.
- FDA pre-submission meeting for the Parkinson's screening + voice biomarker subset.
- HIPAA infrastructure audit, BAAs with all third-party vendors.
- App Store and Google Play deployment.
Security
API Key Isolation
GROQ_API_KEY is a Cloudflare secret. It never appears in the mobile bundle, git history, or client-side code.
Password Hashing
PBKDF2-SHA256 server-side (100,000 iterations, 16-byte per-user salt) via the Cloudflare Worker's Web Crypto API. Plaintext passwords are never stored, never compared directly and never leave the worker. The worker holds them only transiently, as the candidate input to hash derivation during verification.
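The Worker does this in TypeScript with Web Crypto. As an illustration only, the sketch below shows the same parameters (PBKDF2-SHA256, 100,000 iterations, 16-byte salt, constant-time comparison) using Python's standard library. The 32-byte derived-key length and the function names are assumptions, not taken from Tether's code.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # iteration count stated above
SALT_BYTES = 16       # 16-byte per-user salt
KEY_BYTES = 32        # assumption: derived-key length is not stated in the source


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key); only these two values are persisted."""
    salt = os.urandom(SALT_BYTES)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=KEY_BYTES)
    return salt, key


def verify_password(candidate: str, salt: bytes, stored_key: bytes) -> bool:
    # Re-derive from the candidate and compare in constant time; plaintexts are never compared.
    key = hashlib.pbkdf2_hmac(
        "sha256", candidate.encode("utf-8"), salt, ITERATIONS, dklen=len(stored_key)
    )
    return hmac.compare_digest(key, stored_key)


salt, key = hash_password("correct horse")
print(verify_password("correct horse", salt, key))  # True
```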
Config Gitignore
src/lib/config.ts is gitignored. A template file is committed for new developers to copy.
CORS
Worker includes CORS headers on all responses, allowing requests from the mobile app and web preview.
Tech Stack
Mobile app
| Layer | Technology |
|---|---|
| Framework | React Native 0.83, Expo SDK 55, React 19 |
| Navigation | @react-navigation/native (native stack) |
| Audio | expo-audio (recording, 16 kHz WAV/PCM), expo-speech (TTS), expo-speech-recognition |
| Storage | @react-native-async-storage/async-storage (session token only; all real state lives server-side) |
| i18n | 26 languages via custom src/lib/i18n.ts |
Cloudflare Worker (API + LLM proxy + biomarker forwarder)
| Layer | Technology |
|---|---|
| Runtime | Cloudflare Workers (TypeScript, ES2022) |
| Persistent state | Durable Objects (TetherData): users, plans, biomarker history, messages, journal, adherence, escalations, voiceprints |
| Crypto | Web Crypto API: PBKDF2-SHA256 (100k iters) for passwords, HMAC-SHA256 for session tokens, constant-time comparison for signature verification, SHA-256 hash chain for audit log |
| Validation | Zod schemas on every state-changing endpoint (src/shared/schemas.ts) |
| AI Model | Groq API β LLaMA 3.3 70B Versatile (default) via /chat proxy |
| WASM fallback engine | Rust + WebAssembly compiled with wasm-pack, loaded as ES module inside the Worker. Used when Python engine is unreachable. |
Python Biomarker Engine (primary, runs on Contabo VPS at biomarker.arhan.dev)
| Layer | Technology |
|---|---|
| Runtime | Python 3.11 + FastAPI + uvicorn, packaged via Docker. Engine version vps-2.9.0. |
| Clinical voice quality | praat-parselmouth (jitter, shimmer, HNR, CPPS, formants F1-F3, vowel space, voice tremor) |
| Audio event classification | YAMNet via tensorflow-hub (521 AudioSet classes, 10 tracked: cough, sneeze, throat clearing, breathing, wheeze, snoring, gasp, speech, sigh, whispering) |
| Extended acoustic features | openSMILE eGeMAPS (88 features), librosa (MFCC, spectral contrast, centroid, rolloff, flatness, entropy, bandwidth) |
| Nonlinear voice markers | Custom Python reimplementations of Tsanas 2011: PPE, RPDE, DFA, GNE, MFCC deltas (used by the Parkinson's classifier) |
| Speech-to-text | faster-whisper (int8 CTranslate2): tiny.en for free-speech, small.en for the reading task. Powers disfluency feature extraction. |
| Self-supervised voice fingerprint | Microsoft WavLM-base-plus (768-d, mean-pooled, L2-normalised) via Hugging Face transformers + torch |
| Emotion features (valence/arousal/dominance) | audeering wav2vec2-large-robust-12-ft-emotion-msp-dim with custom RegressionHead; published MSP-Podcast CCC 0.74 arousal / 0.63 valence / 0.51 dominance (Wagner 2023) |
| Loudness normalisation | EBU R128 LUFS (ITU-R BS.1770-4), per-task targets: vowel -18 / reading -23 / free-speech -23 / breathing -28 / cough -20 |
| Trained classifiers | Parkinson's (UCI + UCI Telemonitoring, BAcc 0.83 deployed); Fatigue v4 (Predi-COVID, BAcc 0.60 deployed, vote inside v5); Fatigue v5 ensemble (audeering arousal + valence + v4 + Cummins triad + optional self-report fusion); Alzheimer's screening (rule-based 5-stage + optional ADReSS-trained ML head) |
| Continual learning | SQLite-backed labelled-sample store with online threshold tuning (rejects regressions vs default by guardrail); WavLM population centroid via Welford running mean; per-patient + per-task population baselines (median + MAD) |
| Storage | SQLite at /app/data/baselines.sqlite β per-patient baselines, population baselines, embedding centroids, continual-learning labels + training samples, tamper-evident SHA-256 audit log |
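The per-patient and per-task baselines can be sketched as a robust z-score against prior recordings. SQLite persistence is omitted.

This sketch makes three assumptions:

- It uses the 1.4826 MAD scaling constant, the standard consistency factor for normally distributed data. The source does not say whether Tether applies it.
- The function name robust_z is hypothetical.
- It returns None while a baseline is inactive.

```python
import statistics
from typing import Optional

PATIENT_MIN_SAMPLES = 3      # per-patient baseline activates at 3 recordings
POPULATION_MIN_SAMPLES = 30  # per-task population baseline activates at 30+ samples
MAD_SCALE = 1.4826           # assumption: makes MAD consistent with SD under normality


def robust_z(value: float, history: list[float], min_samples: int) -> Optional[float]:
    """Median/MAD z-score of `value` against prior recordings, or None if inactive."""
    if len(history) < min_samples:
        return None  # baseline has not activated yet
    median = statistics.median(history)
    mad = statistics.median([abs(x - median) for x in history])
    if mad == 0:
        return None  # no spread yet (e.g. identical readings); z is undefined
    return (value - median) / (MAD_SCALE * mad)
```

For example, robust_z(5.0, [1.0, 2.0, 3.0], PATIENT_MIN_SAMPLES) gives (5 - 2) / (1.4826 × 1) ≈ 2.02. The same call with only two prior recordings returns None. Median and MAD are used instead of mean and SD so that a single bad recording does not drag the baseline.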
Infrastructure + ops
| Layer | Technology |
|---|---|
| Mobile + worker deploy | GitHub Actions → Cloudflare Pages (web), Cloudflare Workers (API), Cloudflare Pages (docs) |
| Engine deploy | Contabo VPS (Ubuntu) running Docker Compose; nginx + Let's Encrypt for biomarker.arhan.dev TLS |
| CI | 4 workflows: engine pytest + lint + types (216 tests), mobile typecheck + Jest + Expo web export, worker typecheck + wrangler dry-run, Cloudflare deploys |
| Monitoring | Engine /metrics Prometheus-style endpoint; /learning/status for data-flywheel observability; SHA-256-chained audit log at /admin/audit/verify |
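A SHA-256-chained audit log links each entry to the previous entry's hash. Editing any past entry therefore breaks verification from that entry onward.

Below is a minimal sketch of how a check like /admin/audit/verify could work. The entry fields, the all-zero genesis hash and the canonical-JSON encoding are assumptions, not Tether's schema.

```python
import hashlib
import json
from typing import Optional

GENESIS_HASH = "0" * 64  # hypothetical "previous hash" for the first entry


def _entry_hash(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always hashes to the same digest.
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list[dict], event: dict) -> None:
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    log.append({"event": event, "prev": prev_hash, "hash": _entry_hash(prev_hash, event)})


def first_broken_entry(log: list[dict]) -> Optional[int]:
    """Return the index of the first tampered entry, or None if the chain verifies."""
    prev_hash = GENESIS_HASH
    for index, entry in enumerate(log):
        if entry["prev"] != prev_hash or entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return index
        prev_hash = entry["hash"]
    return None


audit_log: list[dict] = []
append_entry(audit_log, {"action": "plan_published", "plan": 1})
append_entry(audit_log, {"action": "caregiver_added", "plan": 1})
print(first_broken_entry(audit_log))  # None
```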
More documentation
Honest, plain-English documentation for clinicians, partners, and curious users. Every measured accuracy figure comes from patient-grouped cross-validation with bootstrap 95% confidence intervals.
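Below is a sketch of a bootstrap 95% confidence interval for balanced accuracy. It is a sketch under assumptions, not Tether's implementation:

- It assumes binary labels and predictions.
- It resamples whole patients, so recordings from the same person stay together. This mirrors the patient-grouped CV, but the source does not say how its bootstrap resamples.
- It uses the simple percentile method.
- The function names are hypothetical.

```python
import random


def balanced_accuracy(y_true: list[int], y_pred: list[int]) -> float:
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    sensitivity = sum(pos) / len(pos)
    specificity = sum(1 - p for p in neg) / len(neg)
    return (sensitivity + specificity) / 2


def patient_bootstrap_ci(y_true, y_pred, patient_ids, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI for BAcc, resampling whole patients rather than recordings."""
    rng = random.Random(seed)
    by_patient: dict = {}
    for t, p, pid in zip(y_true, y_pred, patient_ids):
        by_patient.setdefault(pid, []).append((t, p))
    patients = list(by_patient)
    scores = []
    for _ in range(n_boot):
        # Draw patients with replacement; each keeps all of its recordings.
        drawn = [pair for pid in rng.choices(patients, k=len(patients)) for pair in by_patient[pid]]
        t_s = [t for t, _ in drawn]
        if 0 in t_s and 1 in t_s:  # BAcc needs both classes in the resample
            scores.append(balanced_accuracy(t_s, [p for _, p in drawn]))
    if not scores:
        raise ValueError("no resample contained both classes")
    scores.sort()
    lower = scores[int(alpha / 2 * len(scores))]
    upper = scores[int((1 - alpha / 2) * len(scores)) - 1]
    return lower, upper
```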
- Release notes: what's new, organized by feature, with citations
- Privacy policy: what data is collected, where it lives, and your GDPR rights
- Model cards: per-model intended use, honest performance, known limitations, and bias considerations
- Parkinson's screening classifier (0.830 BAcc honest)
- Fatigue indicator (0.600 BAcc honest, research-grade)
- YAMNet acoustic event classifier (Google, off-the-shelf)
Tether is a research-grade screening tool, not a diagnostic medical device. It has no FDA 510(k) clearance and no CE marking. All outputs are intended to surface candidates for clinical evaluation, not to confirm or rule out any condition.
Onboarding
First-time users see a 5-step tutorial before reaching the login screen. The tutorial covers:
- Welcome: what Tether does and who it's for
- For Doctors: how to create and publish recovery plans
- For Patients: how to use AI chat, voice, and messaging
- Voice Biomarkers: how voice analysis works and what it detects
- Safety First: Tether is not a replacement for emergency care
Onboarding completion is stored in AsyncStorage under the key tether-onboarding-complete. The tutorial only shows once.