🏆 WINNER · NYAS Junior Academy 2026

Tether

When someone is sent home from the hospital, the hardest part starts. Will they take their medications correctly? Are they getting worse? Should they call the doctor or wait? Tether is a phone app that sits with the patient at home and answers those questions in their own language. Doctors write a recovery plan once; the app turns it into daily guidance, listens for early warning signs in the patient's voice, and tells family or care teams when something needs attention.

Most of Tether runs at the edge on Cloudflare, with a dedicated Python FastAPI biomarker engine on a separate VPS for the heavier ML pipeline. The voice engine combines Praat acoustic analysis, YAMNet event classification, and trained ML classifiers; a Rust→WebAssembly fallback compiled into the Cloudflare Worker keeps the app working if the main engine is unreachable. The AI chat is grounded only in the doctor's published plan, so it never invents medical advice. Patients can use it by voice if they can't read or type.

React Native + Expo · Cloudflare Workers · Python FastAPI Engine · WavLM + audeering + Praat + Whisper · 26 Languages · 3-Task Voice Protocol

Plan-Grounded AI Chat

Patients ask questions in plain language. The AI answers only from the doctor's published plan and never guesses.

Voice Biomarkers

Python FastAPI engine runs the 3-task voice protocol: Praat clinical voice quality, YAMNet event detection, WavLM voice fingerprint, audeering emotion, Whisper transcription, plus 12 condition risk modules including Parkinson's, fatigue, and Alzheimer's screening. Rust WASM fallback in the Cloudflare Worker.

Engine Connection

Biomarker results feed into AI context, so "How is my breathing?" gets a real, data-backed answer.

Red Flag Escalation

When the AI detects a red flag symptom, it marks the response urgent and suggests contacting the care team.

Protocol Library

One-click templates for pneumonia, heart failure, COPD, post-surgical recovery, and type-2 diabetes, each with medications, daily steps, and red flags pre-filled. Doctors load a template, edit, publish.

Caregiver Portal

Family members and trusted contacts get a read-only dashboard of the patient's plan, latest biomarkers, and 7-day medication adherence. Patients add caregivers by email, with full opt-in consent.

In Plain English

The problem

Roughly one in five patients sent home from a hospital ends up back in the emergency room within 30 days. The main reasons are not surprising: people forget medication doses, miss the early signs that things are going wrong, or do not know whether a symptom is normal recovery or a real warning. Doctors give a printed discharge summary, but it sits in a drawer. Family caregivers want to help but rarely have visibility.

What Tether does

Tether is three things in one app:

  • A pocket version of the discharge plan. The doctor writes the plan once. The patient sees daily medications, daily activities, red-flag symptoms to watch for, and follow-up dates. Everything is in plain language and can be read out loud in their language.
  • An AI assistant that only knows the doctor's plan. The patient asks "Can I take Tylenol?" or "Should I worry about this chest pain?" and the assistant answers using only what is in the plan. It never makes up medical advice. If the question matches a red-flag symptom the doctor listed, the answer is marked urgent.
  • A voice check that listens for trouble. The patient records about 40 seconds of voice across three short tasks. A signal-processing engine (the Python FastAPI engine, with the Rust WASM fallback on Cloudflare's edge) measures breathing rate, cough patterns, voice fatigue, and clinical voice-quality markers (jitter, shimmer, and harmonics-to-noise ratio, HNR). The numbers are tracked over time, so a small change today against the patient's own baseline can flag a problem the patient does not notice.

Who is in the loop

  • Patients get the daily guidance and assistant chat.
  • Doctors see a recovery score dashboard sorted by risk, plus the patient's biomarker trends and adherence history.
  • Family caregivers get a read-only dashboard if the patient invites them by email. They can see the plan, the last few biomarker readings, and which medication doses were taken.

Why it works without violating privacy

Voice recordings leave the device only as raw PCM audio sent to the analysis endpoint, and they are not stored there after analysis. Only the numerical biomarker results are saved. The chat AI runs through a Cloudflare Worker that never sees the patient's account in raw form. Passwords are hashed server-side with PBKDF2-SHA256 (100k iterations, per-user salt) and never persist as plaintext anywhere in the system. Caregiver access is opt-in and revocable by the patient.

What it does not do

  • It is not a replacement for a doctor. The assistant cannot prescribe, diagnose, or give advice outside the published plan.
  • It is not a medical device. Current biomarker accuracy is good enough to spot trends and prompt human review, not to make standalone clinical decisions.
  • It is not a HIPAA-certified product yet. The technical foundation is correct (encryption, no PHI in logs) but the compliance audit and BAAs are part of the funded roadmap.

How It Works

Tether keeps patients and doctors connected after a hospital discharge. Here is the simple version:

1. The doctor creates a recovery plan

Before the patient leaves the hospital, their doctor opens Tether and fills in a personalized care plan: diagnosis, medications, daily instructions, warning signs to watch for, and a follow-up date. The doctor also picks a communication tone (calm, direct, or reassuring) so the app speaks the way the patient is most comfortable with.

2. The patient gets a personal AI companion

When the patient logs in, they see their plan and can ask questions in plain language, by typing or speaking. The AI only answers using information from the doctor's plan, never guessing or making things up. Every response includes a readability score so caregivers can verify the language is easy enough to understand.

3. Voice biomarkers track recovery

The patient runs a quick standardised 3-task voice protocol (~40 seconds): hold "ahhh" for 5 seconds, read a fixed sentence from the Rainbow Passage, then a 10-second symptom check-in. Tether's Python FastAPI engine at biomarker.arhan.dev analyses the audio for breathing rate, cough patterns, vocal energy, voice tremor, articulation precision, and signs of twelve different conditions: respiratory infection, common cold, cardiovascular stress, voice pathology, neurological signs (including Parkinson's-style patterns), fatigue, sleep-disordered breathing, anxiety/panic, hyperventilation, mild dehydration, vocal overuse, and Alzheimer's / MCI screening. The engine also produces a single 0-100 Healthy Voice Index headline number, estimates voice age, and computes cross-task contrasts that no single clip can capture. Patients see results in seconds; doctors see the full audit trail. These biomarkers are tracked over time so the doctor can spot trends without an in-person visit.

4. The two engines talk to each other

This is what makes Tether different. The voice biomarker results are automatically shared with the AI companion. So if the patient asks "How is my breathing?", the AI already knows the latest voice check showed an elevated breathing rate and can give a relevant, grounded answer rather than a generic one.

5. The app says "I don't know" when it isn't sure

Every Tether prediction comes with a confidence label. When the model is between "definitely positive" and "definitely negative" (what statisticians call the inconclusive band), the app says so directly: "We couldn't tell from this recording; try a longer sample in a quieter room." Most voice biomarker tools force a yes/no answer even when they're uncertain; Tether is honest about the gray zone. Clinicians trust models that admit doubt.
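
A minimal sketch of a three-way confidence label, assuming the classifier emits a probability between 0 and 1. The function name and the 0.35 / 0.65 cut-offs are illustrative assumptions, not Tether's actual values.

```python
def confidence_label(probability: float, low: float = 0.35, high: float = 0.65) -> str:
    """Map a classifier probability to positive / negative / inconclusive.

    `low` and `high` bound the inconclusive band; both are assumed values.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if probability >= high:
        return "positive"
    if probability <= low:
        return "negative"
    return "inconclusive"
```

Anything that lands in the middle band would trigger the "we couldn't tell" message instead of a forced yes/no.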

6. The app catches bad recordings before they confuse anyone

Before analyzing audio, Tether checks the recording itself: is it too quiet? clipping? mostly silence? too much background noise? If the recording isn't usable, the app tells the patient exactly what went wrong ("too much background noise, find a quieter room") and offers a one-tap retry. No more misleading reports based on a recording that was never going to work. There's also a live microphone level meter while you record so you can see your voice reaching the phone in real time.
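
The checks above could look roughly like the sketch below, which works on samples already scaled to the range -1.0 to 1.0. All thresholds and reason codes here are assumptions for illustration, and the background-noise check is left out because it needs a signal-to-noise estimate.

```python
import math
from typing import Optional, Sequence


def check_recording(
    samples: Sequence[float],
    min_rms: float = 0.01,               # assumed "too quiet" floor
    clip_level: float = 0.99,            # assumed clipping level
    max_clipped_fraction: float = 0.01,
    silence_level: float = 0.005,        # assumed per-sample silence floor
    max_silent_fraction: float = 0.8,
) -> Optional[str]:
    """Return a rejection reason code, or None if the clip looks usable."""
    if not samples:
        return "empty"
    n = len(samples)
    silent = sum(1 for s in samples if abs(s) < silence_level) / n
    if silent > max_silent_fraction:
        return "mostly_silence"
    rms = math.sqrt(sum(s * s for s in samples) / n)
    if rms < min_rms:
        return "too_quiet"
    clipped = sum(1 for s in samples if abs(s) >= clip_level) / n
    if clipped > max_clipped_fraction:
        return "clipping"
    return None
```

Returning a short reason code rather than a boolean is what lets the app show a specific retry hint.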

7. Patients see their voice over time, not just today

Tether's history view shows every previous recording with little trend charts for each measurement (jitter, voice clarity, breathing rate, pitch, energy, and more). Each metric has a direction arrow: green if it's holding steady or improving, amber if it's drifting, red if it's clearly worse than the patient's own baseline. For each tracked condition, a separate trend screen shows the 14-day trajectory of the risk score and flags anything that's been climbing three readings in a row. A snapshot is a parlor trick; a trend is medicine.
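
A small sketch of the two trend rules, under stated assumptions: the deviation is expressed in standard deviations from the patient's own baseline and oriented so that higher means worse, the 1.0 / 2.0 cut-offs are invented for illustration, and "climbing three readings in a row" is read as three consecutive increases (which needs four readings).

```python
from typing import Sequence


def direction_colour(z_worse: float, drift_at: float = 1.0, worse_at: float = 2.0) -> str:
    """Colour for a metric's direction arrow; thresholds are assumed values."""
    if z_worse >= worse_at:
        return "red"
    if z_worse >= drift_at:
        return "amber"
    return "green"


def climbing_three_in_a_row(risk_scores: Sequence[float]) -> bool:
    """True if each of the last three readings is higher than the one before it."""
    if len(risk_scores) < 4:
        return False
    recent = list(risk_scores[-4:])
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```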

8. The patient tags how they feel, not just how they sound

Right after every recording, Tether pops a quick chip selector: Tired? Headache? Cough? Sore throat? Short of breath? Stressed? Just checking in? Patients tap whatever applies (or skip if they're in a hurry) and the tags are saved alongside the acoustic measurement. The doctor reads voice and symptoms together, which is the only way to interpret either responsibly. This also builds Tether's private dataset over time, which becomes invaluable for future model improvements.

9. Share with any doctor, not just Tether ones

Every biomarker report has a "Share PDF" button that generates a clean printable summary (voice quality measurements, classifier confidence, condition risks, and a clinical disclaimer) and pops the phone's share sheet. Patients can email it to their primary-care physician, attach it to their existing electronic health record, or print it for an in-person visit. PDFs are the universal language of healthcare; every clinic can read one.

10. Humans stay in the loop

If the AI cannot fully answer a question, it suggests the patient message their doctor directly. Doctors see these messages in real time and can reply. The AI never replaces the doctor; it bridges the gap between hospital visits so patients are never left guessing alone.

11. Works in the patient's language

Patients can switch between 26 languages: English, Spanish, Hindi, Mandarin, French, Arabic, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Bengali, Urdu, Tagalog, Swahili, Turkish, Polish, Dutch, Greek, Hebrew, Thai, Indonesian, Punjabi, and Ukrainian. The AI responds and speaks in their chosen language, removing a major barrier to understanding medical instructions after discharge.

12. Built to keep working when something breaks

If the Python biomarker engine on biomarker.arhan.dev has a problem, Tether automatically falls back to the Rust WASM engine compiled into the Cloudflare Worker and keeps working, so patients never see a broken app. The fallback returns the same JSON shape so downstream consumers don't need to special-case it. If a single screen crashes, the rest of the app is unaffected; only that screen shows a "try again" message. Every recording is content-fingerprinted so the same audio submitted again returns instantly from cache. Every patient's audit log is cryptographically chained, so any tampering with past records is detectable.
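
To make "content-fingerprinted" and "cryptographically chained" concrete, here is a minimal sketch that assumes SHA-256 for both. The record layout, the genesis value, and the function names are hypothetical; this is not Tether's actual storage format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for an empty chain


def fingerprint(audio: bytes) -> str:
    """Content fingerprint: identical audio bytes always give the same cache key."""
    return hashlib.sha256(audio).hexdigest()


def entry_hash(prev_hash: str, entry: dict) -> str:
    """Hash an entry together with the previous hash, which links the chain."""
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list, entry: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"entry": entry, "hash": entry_hash(prev, entry)})


def verify(log: list) -> bool:
    """Recompute every link; an edited past entry breaks the chain from that point on."""
    prev = GENESIS
    for record in log:
        if record["hash"] != entry_hash(prev, record["entry"]):
            return False
        prev = record["hash"]
    return True
```

Because each hash covers the previous one, changing an old entry without recomputing every later hash is detectable by `verify`.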

13. Real authentication, not just an email field

Every Tether session is an HMAC-signed bearer token issued by the Cloudflare Worker. Passwords are stored with PBKDF2-SHA256, a per-user salt, and 100,000 iterations, so they're never readable, even by us. Login attempts are rate-limited and locked after five failures. Old patient-data endpoints used to trust the email in the URL; now every endpoint checks the bearer token, verifies the caller's role, and blocks any patient from reading another patient's data. Doctors see only their assigned patients; caregivers see only patients who have explicitly consented.
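
The Worker itself is not Python, so treat the following as a language-neutral sketch of the two primitives named above, written with the Python standard library. The claim names (`sub`, `role`, `exp`) and helper names are assumptions; the server-side session list used for revocation is not shown.

```python
import base64
import hashlib
import hmac
import json
import os
import time
from typing import Optional, Tuple


def hash_password(password: str, salt: Optional[bytes] = None,
                  iterations: int = 100_000) -> Tuple[bytes, bytes]:
    """PBKDF2-SHA256 with a 16-byte per-user salt; returns (salt, digest)."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = 100_000) -> bool:
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)  # constant-time comparison


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: dict, key: bytes) -> str:
    """Build a payload.signature token, both parts base64url-encoded."""
    payload = _b64(json.dumps(claims, sort_keys=True).encode("utf-8"))
    sig = _b64(hmac.new(key, payload.encode("utf-8"), hashlib.sha256).digest())
    return f"{payload}.{sig}"


def verify_token(token: str, key: bytes, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the signature is valid and the token is unexpired."""
    try:
        payload, sig = token.split(".")
    except ValueError:
        return None
    expected = _b64(hmac.new(key, payload.encode("utf-8"), hashlib.sha256).digest())
    if not hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8")):
        return None
    claims = json.loads(_unb64(payload))
    if claims.get("exp", 0) < (now if now is not None else time.time()):
        return None
    return claims
```

The signature is checked before the payload is decoded, so an attacker-controlled payload is never parsed unless it was signed with the server's key.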

14. Doctor's "needs attention today" queue

Doctors don't have to scroll through every patient to figure out who looks worse. The risk queue ranks patients by a composite score that combines the most recent biomarker status, the deviation trend across the last five recordings, missed medications in the last seven days, unread patient messages, days since last recording, and any distress signals in their journal entries. Each entry shows why the patient is flagged: not just a number, but the actual reasons ("voice deviation rising 22% → 47%", "missed 4 of 7 doses this week").

15. Clinical escalations with a real status workflow

When a voice recording crosses into the "alert" band, Tether automatically creates an Escalation row. The doctor moves it through new → reviewed → contacted → resolved (or false_alarm), and every transition adds an optional note plus an audit-log entry. No more "did anyone follow up on Mrs. Garcia's flagged recording from Tuesday?" The system tracks it.
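
The status workflow can be sketched as a small state machine. The status names come from the description above; which states may jump to `false_alarm` is an assumption, as are the class and field names.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed transition table: false_alarm is reachable from any open state.
ALLOWED = {
    "new": {"reviewed", "false_alarm"},
    "reviewed": {"contacted", "false_alarm"},
    "contacted": {"resolved", "false_alarm"},
    "resolved": set(),
    "false_alarm": set(),
}


@dataclass
class Escalation:
    status: str = "new"
    audit: List[dict] = field(default_factory=list)

    def transition(self, new_status: str, actor: str, note: Optional[str] = None) -> None:
        """Move to new_status if allowed, recording who did it and any note."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"cannot move from {self.status} to {new_status}")
        self.audit.append(
            {"actor": actor, "from": self.status, "to": new_status, "note": note}
        )
        self.status = new_status
```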

16. One timeline, every event

Plans, voice recordings, messages, journal entries, medication taken/missed, escalations created and resolved, and clinician notes are all merged into a single chronological feed per patient. A doctor can answer "what's happened with this patient in the last two weeks?" in one scroll instead of tab-hopping through four screens.

17. Pilot dashboard for hospitals

At the end of a pilot, a hospital can pull a concrete one-screen summary: enrolled patients, weekly recordings, alert rate, average clinician response time, day-30 retention, adherence rate, PDF exports, and a 14-day engagement chart. These are the questions that decide whether a pilot becomes a contract.

18. Invite codes, not shared logins

Doctors generate short human-shareable invite codes (ABCD-EFGH) and send them to a patient via SMS, email, or the patient-handoff sheet at discharge. The patient redeems the code at signup, which automatically links them to the right doctor and the right hospital organization. No more "I emailed them a temporary password and hope they remembered to change it."
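
A sketch of generating a code in the ABCD-EFGH shape with a cryptographically secure random source. Dropping the letters I and O to avoid misreads is an assumption added here, not something the description states.

```python
import secrets

ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ"  # assumed: I and O removed for readability


def new_invite_code(groups: int = 2, group_len: int = 4) -> str:
    """Return a code such as 'QWER-TYPA': `groups` blocks joined by hyphens."""
    return "-".join(
        "".join(secrets.choice(ALPHABET) for _ in range(group_len))
        for _ in range(groups)
    )
```

With 24 letters and 8 positions there are roughly 1.1e11 possible codes, so codes should still be single-use and expire server-side.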

19. Take your data with you, or delete it

Settings → Export my data downloads a complete JSON bundle of everything Tether has stored about you: voice recordings (metadata + scores), journal entries, messages, care plans, audit logs. Settings → Delete my account permanently removes all of it after you type a confirmation phrase. Both are GDPR data-subject rights; both are baked into the product, not handled by emailing support.

20. Every clinically-relevant action is audited

Who viewed which patient's data, who edited a care plan, who created or resolved an escalation, who exported PDFs, every login, every failed login: all stored as an immutable audit log entry tied to the actor and the target. Compliance-grade by default, not "we'll add that later."

21. No LLM key in the device, ever

Earlier builds of Tether shipped a Groq API key in the mobile bundle as a fallback path. That meant the key was extractable in seconds from any installed copy AND a full care plan + biomarker results + journal entries were going straight to a third-party LLM provider from every device, with no audit and no per-user rate limit. The direct path is now deleted. All LLM traffic goes through the Cloudflare Worker, which holds the provider key as a Workers secret, authenticates the caller's bearer token, audits the request, and only then forwards to Groq. If a build can't reach the worker, it falls back to a local rule-based reply generator, never to a network call.

22. Push, email, and SMS notifications

Clinical workflow only works if the right person gets paged at the right time. Tether now dispatches notifications through Expo Push (mobile), Resend (email), and Twilio (SMS) on four real triggers: a biomarker alert wakes the patient's assigned doctors, a new escalation pages whoever it's assigned to, a third missed-meds day in a week sends the patient a gentle reminder (capped at one nudge per week so it never becomes nag-spam), and an unacknowledged escalation past its SLA deadline auto-escalates to the backup clinician pool and then to the org admins.

23. Escalation SLAs with real workflow closure

Hospitals don't track alerts, they track closure. Every escalation now carries a due time (30 min for urgent, 4 hours for alert, 24 hours for monitor), records when a clinician first opens it, and tracks an escalation level. A background sweep promotes overdue ones from "assigned clinician" → "backup clinician pool" → "org admin" and pages each tier in turn. A doctor on vacation no longer means a missed urgent flag.
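
The due times and tier promotion can be sketched as below. The three SLA windows are taken from the text; the dictionary layout, tier identifiers, and the choice to give each new tier a fresh window are assumptions.

```python
from datetime import datetime, timedelta, timezone

SLA = {
    "urgent": timedelta(minutes=30),
    "alert": timedelta(hours=4),
    "monitor": timedelta(hours=24),
}
TIERS = ["assigned_clinician", "backup_pool", "org_admin"]  # hypothetical identifiers


def due_at(created: datetime, severity: str) -> datetime:
    return created + SLA[severity]


def sweep(escalation: dict, now: datetime) -> bool:
    """Promote one tier if overdue and never opened; return True if promoted."""
    if escalation.get("first_opened_at") is not None:
        return False  # a clinician has already acknowledged it
    if now <= escalation["due_at"]:
        return False  # still inside its SLA window
    if escalation["level"] >= len(TIERS) - 1:
        return False  # already at the top tier
    escalation["level"] += 1
    # Assumption: the newly paged tier gets a fresh SLA window.
    escalation["due_at"] = now + SLA[escalation["severity"]]
    return True
```

A background job would call `sweep` on every open escalation and page `TIERS[escalation["level"]]` whenever it returns True.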

24. Building your voice baseline

Tether's trend signals only become meaningful after roughly five recordings; single-sample variance dominates earlier. The patient now sees a "Building your voice baseline: 1/5 recordings complete" progress bar on every recording until the threshold is hit, AND the app holds back alert-level wording on outliers during this window. The data is still collected and the doctor's risk queue still picks it up; we just don't shout at the patient about a possible signal before we have the data to back it up.

25. Recording quality over time

Every accepted AND rejected recording attempt is logged with its quality score and (if rejected) the reason code. A panel under the biomarker card shows the last 20 attempts as a sparkline with pass/fail colours and surfaces the most common rejection reason ("too_noisy" 4× in the last 20). When a patient asks "why did the app keep refusing my recording?" the answer is now one tap away.

26. Real FHIR import (not just counting)

Tether's FHIR R4 import now walks a Bundle and converts four resource types: Patient becomes a stub Tether user (the patient sets their own password on first login), CarePlan becomes a DoctorPlan with conflict-resolution against any existing plan, Observation becomes a biomarker history entry when it's voice-related, and DiagnosticReport becomes a clinician note so the EHR's narrative is preserved. The response reports per-resource counts plus a "skipped" list so hospital integrators can see exactly what landed and what didn't.

27. Pilot CSV export for sponsors

Pilots end with a meeting where someone asks "did this actually work?" The pilot analytics screen now has Export CSV and Share Summary buttons that dump enrolled patients, weekly active, alert rate, average clinician response time, day-30 retention, adherence, PDF exports, and the 14-day recordings curve, flat enough to paste into Excel without reformatting.

28. Outcome entry + password reset + multi-tenant admin

A doctor can record readmission, ED visit, follow-up complete, or engagement check-in events directly from the app (these feed the pilot dashboard's retention and outcome metrics). Anyone can reset their password via a tokenized email link with a one-hour expiry, and resetting the password automatically revokes every existing session for that account. Admins can create organisations, add and remove members, and assign roles, so a single Tether deployment can host multiple hospitals without cross-tenant data bleed.

29. Runtime validation on every endpoint

TypeScript types caught the schema problems at our compile boundary, but the worker still accepted whatever JSON.parse produced from a request body. Now every state-changing endpoint runs the body through a Zod schema before any handler logic. A misshapen field returns HTTP 400 with the exact path and reason. A malicious payload (say {password: {$ne: null}}) never reaches the database layer.

30. Standardised 3-task voice protocol

Every patient now walks through the same three tasks every session: hold "ahhh" for 5 seconds, read a fixed sentence from the Rainbow Passage, then a 10-second symptom check-in. Same prompts every patient, every visit. That's the only way to make jitter, shimmer, HNR, vowel-space-area, and speech-rate comparable across patients and across visits, which is what unlocks real population baselines and real drift detection. The protocol is the default action on biomarker.arhan.dev and the only recording flow on the mobile app.

31. Healthy Voice Index: one number anyone can read

The engine produces dozens of numbers; most viewers want one. The Healthy Voice Index is a 0–100 composite that ensembles trust-weighted condition risks, signal integrity, session consistency, vowel articulation index, embedding outlier distance, and recording-quality signals. Bands: excellent / good / fair / concerning / poor. Every contribution is shown in the audit trail so a clinician can see exactly how the score was assembled, with no black box. For the 3-task protocol, a separate session-level HVI aggregates the three per-clip scores via median + consistency bonus so a single noisy clip can't tank the headline.
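
The "median + consistency bonus" idea can be illustrated as follows. The source does not give the bonus formula or the band cut-offs, so the linear bonus, the 5-point cap, the 20-point spread limit, and the 80/60/40/20 band edges are all invented for this sketch.

```python
from statistics import median
from typing import Sequence


def session_hvi(clip_scores: Sequence[float],
                max_bonus: float = 5.0,
                spread_limit: float = 20.0) -> float:
    """Median of per-clip scores plus a bonus that shrinks as the clips disagree."""
    if not clip_scores:
        raise ValueError("need at least one clip score")
    spread = max(clip_scores) - min(clip_scores)
    bonus = max_bonus * max(0.0, 1.0 - spread / spread_limit)
    return min(100.0, median(clip_scores) + bonus)


def band(score: float) -> str:
    """Map a 0-100 score to a named band; cut-offs are assumed."""
    for cutoff, name in ((80, "excellent"), (60, "good"), (40, "fair"), (20, "concerning")):
        if score >= cutoff:
            return name
    return "poor"
```

With scores of 82, 80, and 30, the median keeps the headline at 80 even though one clip was an outlier; the outlier only costs the session its consistency bonus.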

32. WavLM-base-plus voice embeddings + population centroid

Every recording produces a 768-dimensional voice fingerprint from Microsoft's WavLM-base-plus, a self-supervised speech encoder pretrained on 94,000 hours of speech. The engine maintains a per-task running centroid via Welford's algorithm, so every recording sharpens it. Once 20+ recordings exist for a task, every new recording gets a cosine-distance outlier score against the population. This is the data flywheel: every visitor makes the engine smarter for every future visitor, no labels required.
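
A minimal sketch of the running centroid and the outlier score, using plain Python lists so it stays self-contained (a real engine would likely use NumPy on the 768-dimensional vectors). The class and function names are hypothetical.

```python
import math
from typing import List, Sequence


class RunningCentroid:
    """Per-dimension running mean and variance via Welford's online algorithm."""

    def __init__(self, dim: int) -> None:
        self.n = 0
        self.mean: List[float] = [0.0] * dim
        self.m2: List[float] = [0.0] * dim  # running sum of squared deviations

    def update(self, x: Sequence[float]) -> None:
        if len(x) != len(self.mean):
            raise ValueError("dimension mismatch")
        self.n += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (xi - self.mean[i])

    def variance(self) -> List[float]:
        """Sample variance per dimension (zeros until two vectors are seen)."""
        if self.n < 2:
            return [0.0] * len(self.mean)
        return [m / (self.n - 1) for m in self.m2]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (na * nb)
```

Welford's update needs only the count, the mean, and one accumulator per dimension, so the centroid can be refreshed on every recording without storing past embeddings.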

33. audeering wav2vec2 emotion model: fatigue v5

Fatigue used to come from a 42-feature engine-native classifier with 0.55 balanced accuracy on Predi-COVID, better than chance but weak. The v5 ensemble swaps the headline signal to audeering's published wav2vec2-large MSP-DIM arousal score (concordance correlation 0.74 on the MSP-Podcast benchmark, Wagner 2023). Valence acts as a depressive-pattern modulator, the v4 classifier still votes (15% weight), and the Cummins 2015 psychomotor slowing triad confirms. Expected lift over v4: +8–15% BAcc per Wang 2023's self-supervised-feature literature.

34. Alzheimer's / MCI screening

Twelfth condition: cognitive impairment screening from spontaneous speech. Five-stage interpretable ensemble: disfluency burden (Roark 2011), lexical impoverishment (Bucks 2000), affective flattening (Themistocleous 2018, Konig 2018), voice quality (Lopez-de-Ipina 2015), and an optional ADReSS-trained ML head when training data lands. Every threshold cites a published clinical paper. Honestly framed as a screening signal, not a diagnosis: a positive flag prompts clinical follow-up (neuro exam, MRI/PET, CSF), not a label.

35. True vowel-space area + Sapir VAI on the reading task

Because the reading task is a fixed sentence, the engine knows exactly which words the patient is saying. It uses Whisper's word-level timestamps to locate the three corner vowels (/æ/ in "act", /ɪ/ in "prism", /oʊ/ in "rainbow"), extracts F1/F2 from each via Praat, and computes the true triangular vowel-space-area plus Sapir's Vowel Articulation Index, the canonical hypokinetic-dysarthria metrics (Skodda 2011, Sapir 2010, Rusz 2013). Shrunken VSA / depressed VAI is the strongest published voice biomarker for Parkinson's-style speech changes.
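
The triangular vowel-space area is just the area of the triangle formed by the three vowels in the F1/F2 plane, which the shoelace formula gives directly. The sketch below assumes each vowel is an (F1, F2) pair in Hz; the formant values in the usage note are made up, and the Vowel Articulation Index is not shown.

```python
from typing import Tuple

Formants = Tuple[float, float]  # (F1, F2) in Hz


def triangular_vsa(v1: Formants, v2: Formants, v3: Formants) -> float:
    """Area (Hz^2) of the triangle spanned by three vowels, via the shoelace formula."""
    (x1, y1), (x2, y2), (x3, y3) = v1, v2, v3
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0
```

For example, `triangular_vsa((700, 1700), (400, 2000), (450, 1000))` evaluates to 142500.0 Hz²; a smaller area across visits would indicate the vowels collapsing toward each other.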

36. Cross-task contrasts

When all three protocol tasks are present, the engine computes deltas between them: jitter on vowel minus jitter on reading (laryngeal control under articulatory load), pitch CV on free speech minus pitch CV on reading (spontaneous prosodic range), speech rate on free minus reading (tempo flexibility), and four more. Each delta has a clinical interpretation rule; for example, near-identical pitch CV between free and reading speech is the canonical affective-flattening signature (Cummins 2015). These are signals no single clip can capture.

37. EBU R128 loudness normalisation per task

Phone-mic recordings come in at wildly different volumes; raw amplitude shifts jitter and shimmer estimates by 30-40% just based on how loud the speaker was. The engine now normalises every recording to a target loudness in LUFS (measured per ITU-R BS.1770-4) before any feature extraction, with a separate target per task, since sustained vowels are naturally louder than connected speech (Sapienza 2011): -18 LUFS for vowel, -23 LUFS for reading and free speech, -28 LUFS for breathing. Features are finally comparable across recordings and patients.
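
Once a clip's integrated loudness has been measured, normalising to a per-task target is a single gain. The sketch below covers only that last step and takes the measured loudness as an input; the measurement itself (K-weighting and gating per BS.1770) is not shown, and the task keys are hypothetical names for the targets listed above.

```python
from typing import List, Sequence

TARGET_LUFS = {
    "vowel": -18.0,
    "reading": -23.0,
    "free_speech": -23.0,
    "breathing": -28.0,
}


def gain_db(measured_lufs: float, task: str) -> float:
    """Decibels to add so the clip lands on its task's target loudness."""
    return TARGET_LUFS[task] - measured_lufs


def linear_gain(db: float) -> float:
    """Convert a decibel gain to an amplitude multiplier."""
    return 10.0 ** (db / 20.0)


def normalise(samples: Sequence[float], measured_lufs: float, task: str) -> List[float]:
    g = linear_gain(gain_db(measured_lufs, task))
    return [s * g for s in samples]
```

A reading clip measured at -29 LUFS needs +6 dB, which is roughly a 2x amplitude multiplier; in practice the result should also be checked for clipping after the gain is applied.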

38. Reading-task adherence + sustained-vowel stability checks

If the patient was supposed to read the Rainbow Passage but Whisper transcribed something else (or nothing), the engine flags the recording as non-adherent and downweights all reading-derived features. Same for sustained vowel: a sliding 500-ms pitch and intensity check detects whether the vowel was actually steady or wavering; if not, jitter and shimmer aren't clinically reliable and the score knows. The engine refuses to silently produce garbage from a bad input.

39. Demographic-aware Healthy Voice Index

Healthy 70-year-olds have naturally higher jitter floors, slightly lower HNR, and narrower vowel space than 30-year-olds (Brockmann-Bauser 2018, Stathopoulos 2011). The HVI widens the healthy envelope for older speakers (voice_dysphonia tolerance ×1.5 past age 70, VAI floor shifts down by 0.10) so age-appropriate variation isn't penalised as pathology. The engine accepts patient age and gender on every request and routes them through every relevant threshold.

40. Two engines that improve over time

Five mechanisms make the engine sharper with every recording, none requiring new labels: per-patient baseline z-scores (activates at 3 recordings per patient), population baseline per task (30 patients per task), the WavLM voice-fingerprint centroid (20 patients per task), voiceprint drift detection (1 enrollment per patient), and cross-task contrast norms (~1,000 patients). The standardised protocol is the enabler: same prompts mean compounding statistics. Trained classifiers (Parkinson's, fatigue, Alzheimer's) improve through a separate labelled-data path: training scripts ship in biomarker-engine/scripts/ ready to run when ADReSS / Predi-COVID / DAIC-WOZ access lands.

Architecture

Tether follows a privacy-first architecture. API keys never ship in the mobile bundle; all LLM requests and biomarker analysis are proxied through a Cloudflare Worker at the edge.

[Architecture diagram] Patient and Doctor (each opens the app on their phone) → Mobile App (React Native + Expo: Voice Recording, AI Chat, Local Storage) → voice data and chat messages → Cloudflare Worker (Edge Proxy: routes requests · hides API keys · runs at the edge) → Groq LLM (AI chat responses), Rust WASM (voice biomarker analysis), CF Secrets (encrypted API keys). Diagram captions: "Your API keys never leave the server" and "No patient data stored on servers".

Frontend

React Native + Expo SDK 55 with React Navigation native stack. Runs on iOS, Android, and web.

Backend

Cloudflare Worker proxies all API calls. GROQ_API_KEY stored as a Cloudflare secret, never exposed to the client.

Python FastAPI Engine (primary)

Full biomarker pipeline at biomarker.arhan.dev: Praat voice quality, YAMNet, openSMILE eGeMAPS, Whisper, WavLM, audeering emotion, Tsanas nonlinear, trained Parkinson's + fatigue + Alzheimer's classifiers, per-patient + population baselines, Healthy Voice Index.

Rust WASM (fallback)

Lightweight biomarker engine compiled to WebAssembly via wasm-pack, runs inside the Cloudflare Worker for ~50 ms edge-speed signal processing. Used as fallback when the Python engine is unreachable.

LLM

Groq API with LLaMA 3.3 70B. Graceful fallback chain: Worker → local keyword matching. The earlier direct-to-Groq path from the device has been removed.

Quickstart

Prerequisites

  • Node.js 22+ (wrangler 4.x requires it)
  • Expo CLI (ships with the expo package and runs via npx expo; the global expo-cli package is deprecated)
  • iOS Simulator (Xcode) or Android Emulator
  • Python 3.11+ + Docker (only needed if developing the biomarker engine locally; production runs on the Contabo VPS)
  • Rust + wasm-pack (only needed if developing the WASM fallback engine)

Setup (mobile app)

git clone https://github.com/ArhanCodes/tether.git
cd tether
npm install --legacy-peer-deps
cp src/lib/config.template.ts src/lib/config.ts
npm run ios

That's it. The config template comes pre-configured with the shared Tether worker URL, so no API keys or environment variables are needed on the client. The Groq key and the BIOMARKER_ENGINE_URL live on the Cloudflare Worker as secrets and are never exposed to the client.

Web preview: Run npx expo start --web instead to open in a browser.

Worker Setup

# Deploy the Cloudflare Worker (the API + LLM proxy + biomarker forwarder)
cd worker
npm install
npx wrangler secret put GROQ_API_KEY          # for /chat endpoint
npx wrangler secret put SESSION_HMAC_KEY      # for HMAC-signed bearer tokens
npx wrangler secret put BIOMARKER_ENGINE_URL  # points at biomarker.arhan.dev (or your own engine)
npx wrangler deploy

Biomarker engine setup (only for self-hosting)

# The production engine runs at https://biomarker.arhan.dev on a Contabo VPS.
# To run your own copy:
cd biomarker-engine
./install.sh   # docker-compose up, fetches YAMNet + WavLM + Whisper + audeering
# Engine then listens on 127.0.0.1:8765
# Point a public domain via nginx + letsencrypt; set BIOMARKER_ENGINE_URL accordingly

Features

Auth

  • Login / signup with role selection (doctor or patient)
  • Passwords hashed with PBKDF2-SHA256 server-side (100,000 iterations, 16-byte per-user salt), never readable by us
  • Sessions are HMAC-signed bearer tokens with server-side revocation (the worker can kick any session by deleting it from the Durable Object's session list)
  • Failed-login rate limiting + lockout (5 attempts in 5 minutes → 15-minute lockout)
  • Role-based access control: patients can only see their own data, doctors only see assigned patients, caregivers need explicit patient consent
  • Terms/privacy consent on signup

Doctor Workspace

  • Create/edit patient recovery plans (diagnosis, vitals, meds, instructions, red flags, follow-up)
  • Set AI tone (calm, direct, reassuring)
  • Publish plans to a specific patient email (validates account exists)
  • Draft auto-saves locally
  • View and reply to patient messages

Patient Companion

  • View the recovery plan assigned to your email
  • Vitals summary, daily instructions, red flags
  • AI chat powered by Groq with keyword-matching fallback
  • Quick prompt buttons ("What should I do today?", "When should I call?", etc.)
  • Voice input via speech recognition
  • Voice output (text-to-speech on AI replies, toggleable)
  • Urgency badges on AI responses (routine / contact clinician / urgent)
  • Flesch-Kincaid readability score on every AI response (grade level badge)
  • Handoff suggestion when AI can't fully answer
  • Direct messaging to doctor (real-time via Durable Objects)
  • Multilingual support (26 languages; see "Works in the patient's language" above for the full list)
  • Voice biomarker analysis (breathing rate, cough detection, vocal tremor, voice energy)
  • Biomarker status levels (normal / monitor / alert) with alert popup
  • Biomarker trending: historical chart showing trends over time
  • Engine connection: biomarker data injected into AI context automatically
  • Patient Journal: daily journal entries that feed into AI context for more personalized responses
  • Medication Adherence Tracker: daily yes/no medication logging with 7-day streak visualization
  • Time-aware prompting: AI adapts advice based on days since discharge (early/mid/extended recovery)

Doctor Workspace (continued)

  • Discharge date: set per patient to enable time-aware recovery guidance
  • Recovery Score Dashboard: composite 0-100 score per patient (biomarker + adherence + engagement + journal), sorted by risk

Onboarding

  • 5-step tutorial on first launch (welcome, doctors, patients, voice biomarkers, safety)
  • Skip button and dot indicators
  • Only shows once (stored in AsyncStorage)

Infrastructure

  • Cloudflare Worker proxy: API key stays server-side, never ships in the app
  • Durable Objects backend: accounts, plans, messages, biomarker history persist across devices
  • Python FastAPI biomarker engine runs on a private VPS (biomarker.arhan.dev); Rust WASM fallback compiled into the Cloudflare Worker
  • AI requests routed through worker, falls back to direct Groq, then keyword matching

Authentication

Users sign up with a role (Doctor or Patient) and are routed to the appropriate workspace after login. Sessions persist across app restarts via AsyncStorage.

  • Password hashing: PBKDF2-SHA256 server-side, 100,000 iterations, per-user 16-byte salt. Hashes live in the Durable Object; plaintext is never stored or transmitted anywhere except over TLS on signup/login
  • Sessions: HMAC-SHA256-signed bearer tokens (payload.signature, base64url), validated against the Durable Object's session list so any session can be revoked server-side
  • Rate limiting: 5 failed logins in 5 minutes triggers a 15-minute lockout per email
  • RBAC: Every authenticated endpoint enforces who can read/write which patient. Patients see their own data only. Doctors see their assigned patients. Caregivers need an explicit consent record
  • Audit log: Every clinically-relevant action (data view, plan edit, escalation open/close, export, delete, login attempt) is recorded with actor, target, IP, user-agent, and timestamp
  • Invite codes: Doctors generate short codes that patients redeem at signup to auto-link to the right care team
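
The hashing and token scheme described in the list above can be sketched with the Python standard library. This is a minimal illustration, not the Worker's actual code: the function names, the JSON payload layout, and the secret handling are assumptions, and the server-side session-list check used for revocation is left out.

```python
import base64
import hashlib
import hmac
import json
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # iteration count stated in the text
SALT_BYTES = 16       # per-user salt size stated in the text


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key) using PBKDF2-SHA256."""
    if salt is None:
        salt = os.urandom(SALT_BYTES)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the key and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected)


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _signature(body: str, secret: bytes) -> str:
    return _b64url(hmac.new(secret, body.encode("utf-8"), hashlib.sha256).digest())


def sign_token(payload: dict, secret: bytes) -> str:
    """Build a payload.signature token (payload layout is hypothetical)."""
    body = _b64url(json.dumps(payload, sort_keys=True).encode("utf-8"))
    return f"{body}.{_signature(body, secret)}"


def verify_token(token: str, secret: bytes) -> bool:
    """Check the signature only; revocation would need the session list."""
    body, _, sig = token.partition(".")
    expected = _signature(body, secret)
    return hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8"))
```

A signature check alone cannot revoke a session, which is why the text pairs it with a lookup against the Durable Object's session list.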

Doctor Workspace

Doctors create, edit, and publish recovery plans for specific patients. Plans are the foundation of the entire patient experience: the AI, the UI, and the messaging system all derive from the published plan.

Plan Fields

| Field | Description |
| --- | --- |
| Patient Name & Email | Must match a registered patient account |
| Diagnosis | Primary condition (e.g. post-discharge pneumonia) |
| Vitals | Heart rate, blood pressure, temperature, O2 saturation |
| Medications | Name, dosage, and frequency (one per line) |
| Daily Instructions | What the patient should do each day |
| Red Flags | Symptoms that require immediate medical attention |
| Follow-up | Next appointment or scheduled check-in |
| Tone | Calm, Direct, or Reassuring; controls AI personality |
| Doctor Notes | Private instructions for how AI should phrase answers |

Messaging

Doctors see all patient message threads, sorted by most recent. They can select a thread and reply directly. When a patient sends a message (or the AI suggests a handoff), it appears here.

Patient Companion

The patient screen surfaces the published recovery plan and provides multiple channels for getting help: AI chat, voice input, quick prompts, biomarker analysis, and direct doctor messaging.

Care Plan Display

Vitals, daily instructions, medications, and red flags, all from the doctor's published plan.

AI Chat

Text or voice questions answered by LLaMA 3.3, constrained to the care plan. Includes urgency badges and handoff suggestions.

Voice Biomarkers

Patient runs the 3-task voice protocol (~40 s). Python FastAPI engine on biomarker.arhan.dev analyzes 100+ features and returns a Healthy Voice Index, with a Rust WASM fallback in the Worker.

Doctor Messaging

Direct messaging channel for when AI isn't enough. The AI can auto-suggest using this when it lacks certainty.

Patient Journal

Write daily entries about how you feel. Recent entries are injected into the AI prompt so responses reflect your current emotional and physical state.

Medication Tracker

Log daily medication adherence with a simple yes/no. A 7-day streak visualization shows your compliance at a glance.

Caregiver Portal

Adult children of elderly patients, partners, and family members often need visibility into post-discharge recovery without being clinical providers. The caregiver portal is a third login type that gives trusted contacts a read-only dashboard for any patient who explicitly links them.

How linking works

  1. The caregiver creates a Tether account with the caregiver role at sign-up.
  2. The patient adds the caregiver's email to their account, which triggers POST /api/caregiver/link.
  3. The caregiver logs in and sees a dashboard of every patient who linked them.
  4. Either side can revoke the link at any time.

What the caregiver sees

Latest published plan

Diagnosis, doctor name, last-updated timestamp. Tap through for full medications, instructions, and red flags.

Recent voice biomarkers

The last 10 readings with status dots (green / amber / red) for at-a-glance monitoring of breathing trends.

7-day adherence

A pill-grid showing which days the patient took their medication. Missed days highlighted in red.

Privacy model

Caregivers can read but cannot send messages, edit plans, or post journal entries on the patient's behalf. The patient remains the data owner: every link is opt-in and removable. The doctor is not notified of caregiver links by default; the patient controls who sees what.

Data flows

GET /api/caregiver/patients?email=<caregiver-email>
→ [
    {
      patientEmail, patientName,
      latestPlan,
      recentBiomarkers,
      recentAdherence
    },
    ...
  ]

Protocol Library

Doctors don't write a recovery plan from scratch every time. The protocol library ships five clinically-grounded templates, each one a complete DoctorPlan shape: diagnosis text, medications with dosing, daily instructions, red flags, follow-up timing, and recommended tone.

Included templates (v1)

Post-discharge Pneumonia

ICD-10 J18.9. Amoxicillin + inhaler regimen, breathing-focused red flags, GP follow-up in 3 days.

Heart Failure (CHF)

ICD-10 I50.9. Furosemide + lisinopril + carvedilol, daily weight check (the single most important early warning), cardiology follow-up in 7 days.

COPD Exacerbation

ICD-10 J44.1. Tiotropium + rescue inhaler + 5-day prednisolone + 7-day doxycycline, oximeter-based red flags.

Post-surgical Recovery

ICD-10 Z48.815. Pain-control regimen, DVT prevention with enoxaparin, wound-care daily steps, 6-week lifting restriction.

Type-2 Diabetes (new diagnosis)

ICD-10 E11.9. Metformin titration schedule, atorvastatin, glucose-target ranges, plate-method dietary guidance.

How a doctor uses it

  1. Open the Doctor Workspace → "Publish Patient Plan" section.
  2. Click any protocol chip; fields auto-fill with the template defaults.
  3. Edit anything that's patient-specific (medications, follow-up timing, tone).
  4. Add the patient's name and email → publish.

Why this matters

A solo physician can publish 5–10 plans per evening with the protocol library, vs. 1–2 from scratch. More importantly: the templates encode best-practice red flags ("weight gain >1 kg in a day" for CHF, "rescue inhaler more than every 4 hours" for COPD) that an under-the-gun doctor might forget to write. The templates are clinically reviewable and version-controlled in src/lib/protocols.ts.

Extending

Adding a new condition is one object in the PROTOCOL_TEMPLATES array; the UI picks it up automatically. The schema is { id, label, emoji, conditionICD10, defaults }, where defaults is a Partial<DoctorPlan>.

AI Chat System

The AI is powered by Groq's LLaMA 3.3 70B model, accessed through a Cloudflare Worker proxy. Every response is grounded in the doctor's published care plan.

System Prompt

A dynamic system prompt is built from the care plan that includes the patient's diagnosis, medications, instructions, red flags, and the doctor's preferred tone. The AI is instructed to:

  • Only answer from documented care plan data
  • Flag red-flag symptoms as "urgent"
  • Suggest messaging the doctor when information is missing
  • Return structured JSON with message, urgency, supporting points, and handoff flag

Response Urgency Levels

| Level | Meaning | UI Treatment |
| --- | --- | --- |
| routine | Normal informational response | Blue badge |
| contact-clinician | AI suggests speaking with doctor | Yellow badge |
| urgent | Red flag symptom detected | Red badge + escalation banner |
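
As a rough illustration of how a client might consume the structured JSON reply and pick a badge, consider the sketch below. The JSON key names (message, urgency, supportingPoints, handoff) and the choice to treat unknown urgency values as contact-clinician are assumptions; the text only states which pieces of information the reply carries.

```python
import json
from dataclasses import dataclass, field
from typing import List

# UI treatment per urgency level, as listed in the table above.
BADGES = {
    "routine": "blue badge",
    "contact-clinician": "yellow badge",
    "urgent": "red badge + escalation banner",
}


@dataclass
class ChatReply:
    message: str
    urgency: str
    supporting_points: List[str] = field(default_factory=list)
    handoff: bool = False


def parse_reply(raw: str) -> ChatReply:
    """Parse the model's JSON reply; key names here are hypothetical."""
    data = json.loads(raw)
    urgency = data.get("urgency", "contact-clinician")
    if urgency not in BADGES:
        # Assumption: an unrecognized level falls back to the cautious middle tier.
        urgency = "contact-clinician"
    return ChatReply(
        message=str(data.get("message", "")),
        urgency=urgency,
        supporting_points=list(data.get("supportingPoints", [])),
        handoff=bool(data.get("handoff", False)),
    )
```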

Fallback Chain

1. Cloudflare Worker β†’ Groq API (primary)
2. Direct Groq API call (if worker fails)
3. Keyword matching (if no API configured)

Safety: The AI never diagnoses, prescribes, or advises outside the doctor's documented scope. Emergency symptoms always trigger an urgent flag with instructions to seek immediate care.
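
The three-step fallback chain can be pictured as an ordered list of providers tried in turn. The sketch below is a generic illustration under that assumption; the provider callables and the keyword rules are hypothetical stand-ins, not the app's actual client code.

```python
from typing import Callable, Optional, Sequence, Tuple

# A provider is (name, callable); the callable returns a reply or None.
Provider = Tuple[str, Callable[[str], Optional[str]]]


def answer_with_fallback(question: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try each provider in order and return (provider_name, reply)."""
    for name, call in providers:
        try:
            reply = call(question)
        except Exception:
            continue  # network error, timeout, bad response: try the next one
        if reply:
            return name, reply
    raise RuntimeError("no provider produced a reply")


def keyword_match(question: str) -> Optional[str]:
    """Last-resort matcher; these rules are illustrative only."""
    if "call" in question.lower():
        return "Check the red flags in your plan and call your care team if any apply."
    return "Please message your doctor for help with that."
```

In use, the list would be ordered worker proxy, direct Groq call, keyword matcher, so the offline matcher only runs when both network paths fail.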

Voice Biomarkers

Tether's biomarker system records a short voice sample from the patient using the standardized 3-task protocol (sustained vowel + reading + free speech, ~40 s total) and sends it to a Python FastAPI engine on biomarker.arhan.dev for full clinical-grade signal processing. A Rust WebAssembly engine compiled into the Cloudflare Worker runs in parallel as a fallback so the patient never sees a broken app if the Python engine is unreachable.

How It Works

  1. Patient taps "Start 3-task voice protocol"; expo-audio begins recording in WAV/PCM at 16 kHz
  2. Wizard walks the patient through 5 s of sustained "ahhh", a fixed sentence from the Rainbow Passage, and a 10 s symptom check-in
  3. PCM samples + per-clip recording_type tags sent to the Cloudflare Worker's /api/biomarkers
  4. Worker forwards to the Python engine's /analyze_multi (or /analyze for single clips)
  5. Python engine runs the full pipeline (Praat voice quality, YAMNet event detection, openSMILE eGeMAPS, Whisper transcription + disfluency, WavLM voice fingerprint, audeering valence/arousal/dominance, Tsanas nonlinear PD markers, per-patient baselines, population baselines, signal-integrity anti-spoofing, Healthy Voice Index)
  6. If the Python engine fails or times out, the worker falls back to the Rust WASM engine, which returns a basic BiomarkerReport with energy/breathing/jitter/shimmer/HNR (same JSON contract, much narrower feature set)
  7. Results displayed as a card with status badge plus the Healthy Voice Index headline (0-100, banded)
  8. Report saved to Durable Objects for longitudinal trending and per-patient baseline accumulation

The two engines

  • Python FastAPI engine (primary): runs on biomarker.arhan.dev. Pipeline includes Praat clinical voice quality (jitter, shimmer, HNR, CPPS, formants), YAMNet 521-class audio event classifier, openSMILE eGeMAPS (88 features), Whisper transcription + disfluency markers, WavLM-base-plus 768-d voice fingerprint (Microsoft, 94k-hour pretraining), audeering wav2vec2-large emotion model (MSP-Podcast benchmark), Tsanas nonlinear markers (PPE, RPDE, DFA, GNE), per-patient SQLite baseline store, population baselines per task, signal-integrity flags, trained Parkinson's classifier (0.83 BAcc on UCI), trained fatigue v5 ensemble, rule-based Alzheimer's screening, EBU R128 LUFS normalization, demographic-aware Healthy Voice Index. Latency 18-25 s per clip depending on enabled features. Live at https://biomarker.arhan.dev.
  • Rust WASM engine (fallback): compiled with wasm-pack, runs in-Worker. Extracts energy, breathing rate, pitch variability, cough events, zero-crossing rate, jitter, shimmer, HNR, CPPS, formants F1/F2/F3, vowel space area. Returns a strictly narrower JSON shape with the same field names so downstream consumers don't need to special-case the fallback. Latency ~50 ms.

Biomarker Trending

Every biomarker report is stored server-side with a timestamp. The patient's biomarker card shows a trend view of the last 10 readings with bar charts for breathing rate, voice energy, and cough events. Alert/monitor/normal counts are summarized as colored pills. This turns a single snapshot into a longitudinal monitoring system that can detect deterioration over days.

Clinical Voice Quality Card

Below the core metrics the card surfaces the clinical voice quality section: Mean Pitch (Hz), Jitter %, Shimmer %, and HNR (dB), each annotated with the healthy reference range. These are the same metrics used by Praat (the academic reference tool for voice biology). The section appears only when the engine successfully extracted enough voiced cycles, so it does not show on whisper-only or breath-only recordings.

Engine Connection

Tether's two AI engines, NLP (Groq LLM, proxied through the Cloudflare Worker) and Bio-Acoustic (Python FastAPI engine on biomarker.arhan.dev with a Rust WASM fallback inside the Worker), share context automatically:

  • The latest biomarker report (including confidence score and all 5 metrics) is injected into the AI system prompt before every chat request
  • When the patient asks "how am I doing?", the AI references actual biomarker readings (breathing rate, cough events, energy levels, zero-crossing rate)
  • If biomarkers are in "alert" status, the AI proactively warns the patient and recommends contacting their care team
  • The AI knows the analysis confidence level and can qualify its answers accordingly ("Your latest voice check had moderate confidence. Consider recording again in a quieter space")
  • One engine listens to the body, the other explains what it means in plain language

Automatic Alert Escalation

When a biomarker recording returns alert status (2+ flags), Tether automatically sends a care message to the assigned doctor, with no patient action needed. The message includes:

  • Full biomarker summary with actual values and normal ranges
  • Confidence score for the analysis
  • A note that the message was sent automatically by the biomarker system

The patient sees "Health Alert - Doctor Notified" confirming the escalation happened. This means a patient could record a voice check, trigger an alert, and their doctor sees it in their inbox within seconds, all without the patient needing to understand or act on the medical data themselves.

Readability Scoring

Every AI response is scored using the Flesch-Kincaid Grade Level formula. A badge on each message shows the grade level (e.g., "Grade 4.2 - Very Easy"). This proves the health literacy claim with data:

  • Grade 0-5: Very Easy, a 5th grader can understand
  • Grade 6-8: Easy, middle school level
  • Grade 9-12: Moderate, high school level
  • Grade 13+: Complex, college level (AI is prompted to stay below 6)
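
For reference, the standard Flesch-Kincaid Grade Level formula is 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The sketch below applies it with a crude vowel-group syllable counter, which is an assumption (the app's own counter is not described), and maps the result to the bands above, treating fractional grades such as 5.5 as belonging to the lower band.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, discount a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fk_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level of an English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def band(grade: float) -> str:
    """Map a grade to the badge labels listed above."""
    if grade < 6:
        return "Very Easy"
    if grade < 9:
        return "Easy"
    if grade < 13:
        return "Moderate"
    return "Complex"
```

Very short, simple sentences can score below zero with this formula; a display layer would probably clamp that to 0.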

Patient Journal

Patients can write daily journal entries describing how they feel. This serves two purposes:

  • Patient self-reflection: Writing about symptoms, mood, and progress helps patients track their own recovery
  • AI context enrichment: The 3 most recent journal entries are injected into the AI system prompt, allowing responses to account for the patient's current emotional and physical state

Entries are stored server-side via Durable Objects (max 100 per patient, 2000 character limit). The patient sees their entries in reverse chronological order. The journal also contributes to the Recovery Score (up to 20 points).

Medication Adherence Tracker

A simple daily check-in that asks patients: "Did you take all your medicines today?" with Yes/No buttons.

  • One log per day: Duplicate entries for the same day are prevented
  • 7-day streak: Colored dots show recent adherence (green = taken, red = missed)
  • AI awareness: Adherence records are injected into the AI prompt; if the patient has missed 2+ days, the AI gently reminds them about medication importance
  • Recovery Score input: Adherence contributes up to 30 points to the composite score

Time-aware Prompting

Doctors can set a discharge date on each patient's plan. The AI system prompt then calculates days since discharge and adjusts its approach:

| Phase | Days | AI Behavior |
| --- | --- | --- |
| Early recovery | 0-3 | Extra cautious, encourages rest and monitoring |
| Mid recovery | 4-14 | Encourages gradual activity and adherence |
| Extended recovery | 15+ | Focuses on long-term habits and follow-up |

A "Day X since discharge" badge appears on the patient's journal section for awareness.
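
The phase boundaries above reduce to a small lookup. This is a sketch with hypothetical function names; clamping future discharge dates to day 0 is an assumption.

```python
from datetime import date


def days_since_discharge(discharge: date, today: date) -> int:
    """Whole days elapsed; a future discharge date is clamped to 0 (assumption)."""
    return max((today - discharge).days, 0)


def recovery_phase(days: int) -> str:
    """Map days since discharge to the phase names in the table above."""
    if days <= 3:
        return "early"
    if days <= 14:
        return "mid"
    return "extended"


def badge_text(days: int) -> str:
    return f"Day {days} since discharge"
```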

Recovery Score

A composite 0-100 score calculated per patient, visible to doctors on their workspace. Patients are sorted lowest-first so the most at-risk patients get attention first.

Scoring Breakdown

| Component | Max Points | Source |
| --- | --- | --- |
| Biomarker Health | 30 | Ratio of normal/monitor/alert readings in recent biomarker history |
| Medication Adherence | 30 | Proportion of "taken" days in the last 7 days |
| Communication Engagement | 20 | Patient messages sent in the last 7 days (capped at 4) |
| Journal Activity | 20 | Journal entries written in the last 7 days (capped at 4) |

Risk Levels

  • 0-39: At Risk, needs immediate attention
  • 40-69: Recovering, progressing but needs monitoring
  • 70-100: On Track, recovery going well
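
One way the composite could be assembled from the four components is sketched below. The point caps and risk bands come from the tables above; the per-status weights for the biomarker component, the zero score for an empty biomarker history, and the function names are assumptions, since the exact ratio formula is not spelled out.

```python
from typing import Sequence

# Assumed weighting of each reading when computing the biomarker ratio.
STATUS_WEIGHT = {"normal": 1.0, "monitor": 0.5, "alert": 0.0}


def recovery_score(statuses: Sequence[str], taken_days: Sequence[bool],
                   messages_7d: int, journal_7d: int) -> int:
    """Composite 0-100 score: 30 biomarker + 30 adherence + 20 + 20."""
    if statuses:
        bio = 30 * sum(STATUS_WEIGHT[s] for s in statuses) / len(statuses)
    else:
        bio = 0.0  # assumption: no readings earns no biomarker points
    adherence = 30 * sum(1 for t in taken_days if t) / 7
    engagement = 20 * min(messages_7d, 4) / 4
    journal = 20 * min(journal_7d, 4) / 4
    return round(bio + adherence + engagement + journal)


def risk_level(score: int) -> str:
    if score < 40:
        return "At Risk"
    if score < 70:
        return "Recovering"
    return "On Track"
```

Sorting patients by this score in ascending order gives the lowest-first ordering described above.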

Multilingual Support

Patients can select their preferred language from 26 options: English, Spanish, Hindi, Mandarin, French, Arabic, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Bengali, Urdu, Tagalog, Swahili, Turkish, Polish, Dutch, Greek, Hebrew, Thai, Indonesian, Punjabi, and Ukrainian. The language preference is stored server-side and affects:

  • AI chat responses: the system prompt instructs the LLM to respond in the selected language at a 5th grade reading level
  • Voice output: text-to-speech uses the correct language code via expo-speech
  • The biomarker engine receives the language code in its /analyze payload so Whisper transcription picks the right multilingual model variant (English clips use tiny.en for free_speech / small.en for reading; non-English clips use the multilingual tiny)
  • The setting persists across devices via Durable Objects
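
The Whisper model choice described above can be expressed as a small selector. This is a sketch: the function name is hypothetical, language codes are assumed to be ISO 639-1 style strings such as "en" or "hi", and defaulting other English recording types to tiny.en is an assumption.

```python
def whisper_model(language: str, recording_type: str) -> str:
    """Pick a Whisper model name from the language code and recording type."""
    if language.lower().startswith("en"):
        # English: small.en for the reading task, tiny.en otherwise.
        return "small.en" if recording_type == "reading" else "tiny.en"
    # Non-English clips use the multilingual tiny model.
    return "tiny"
```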

Cloudflare Worker

The Worker is the secure API proxy and data backend, hosted at tether-api.arhan-harchandani.workers.dev. It exposes auth, app data, the AI proxy, and the biomarker forwarder. Every state-changing endpoint is HMAC-bearer-token authenticated and the request body is validated against a Zod schema before any handler runs.

API Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /chat | POST | Forwards chat messages to Groq API with the GROQ_API_KEY secret. Default model is llama-3.3-70b-versatile. |
| /api/signup | POST | Create a new account (name, email, password, role). Password hashed with PBKDF2-SHA256, 100,000 iterations, per-user salt. Returns an HMAC-signed bearer token. |
| /api/login | POST | Authenticate and return an HMAC bearer token + user profile. |
| /api/plans | GET/POST | Retrieve or publish doctor care plans (doctor RBAC enforced). |
| /api/messages | GET/POST | Doctor-patient messaging thread. |
| /api/biomarkers | POST | Receives PCM audio samples from the mobile app or web demo. Forwards to the Python engine at BIOMARKER_ENGINE_URL (defaults to https://biomarker.arhan.dev); falls back to the in-Worker Rust WASM engine if the Python engine is unreachable or times out. Also runs voiceprint enrollment + speaker-similarity check before persisting. Returns the merged BiomarkerReport. |
| /api/biomarkers?email=… | GET | Retrieve a patient's biomarker history. Doctor RBAC required if the email is not the caller's own. |
| /api/user/language | POST | Update patient language preference (one of 26 supported languages). |
| /api/users | GET | List users (admin RBAC; password hashes never returned). |
| /api/journal | GET/POST | Patient journal entries (max 100 per patient, 2000 char limit per entry). |
| /api/adherence | GET/POST | Daily medication adherence records (upserts by patient + date). |
| /api/recovery-score | GET | Composite recovery scores for a doctor's patients, sorted by risk. Combines biomarker, adherence, engagement, and journal sub-scores. |
| /api/escalations | GET/POST | Clinical escalation rows with status workflow (new → reviewed → contacted → resolved). |

Biomarker engine endpoints (called via the Worker's /api/biomarkers)

| Endpoint | Method | Description |
| --- | --- | --- |
| /analyze | POST | Single-clip analysis. Accepts samples, sampleRate, recording_type, optional subjective_* self-report fields, threshold_mode, disable_lufs. Returns the full BiomarkerReport. |
| /analyze_multi | POST | Multi-clip 3-task protocol. Accepts recordings (2-5 base64 clips) + recording_types. Returns session-level Healthy Voice Index, cross-task contrasts, consistency score, plus the per-clip reports merged via median + majority vote + max-risk. |
| /health | GET | Liveness probe. Returns engine version + YAMNet load status. |
| /version | GET | Full model metadata: balanced accuracy, ROC AUC, training dataset, n_samples, n_patients, threshold, calibration brier, and 95% CIs for every loaded trained classifier. |
| /learning/status | GET | Public flywheel snapshot. Shows population-baseline sample counts per task, WavLM centroid counts per task, label counts per classifier, current auto-tuned thresholds with their lift-vs-default measurements. |
| /baseline/{patient_id} | GET/DELETE | Per-patient SQLite baseline counts; DELETE wipes the patient's history. Admin auth required. |
| /trend/{patient_id} | GET | N-day time series + per-metric trend direction (up / down / stable). Used by the doctor view for longitudinal charts. |
| /fhir/analyze | POST | Same analysis as /analyze but returns a FHIR R4 Bundle of Observations + DiagnosticReport for EHR integration (NIH Bridge2AI VBAI profile). |
| /admin/retrain | POST | Admin-only. Exports accumulated self-report-labelled samples as JSONL ready to feed the offline training scripts. Gated at 500 samples per classifier. |
| /admin/keys | POST/GET/DELETE | Admin-only B2B API key management. |
| /metrics | GET | Prometheus-style text exposition for ops monitoring. |
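
To make the /analyze_multi request shape more concrete, here is a hedged sketch of how a client might assemble the body. Only the recordings and recording_types field names come from the table; the little-endian int16 PCM encoding, the sampleRate field on this endpoint, and the sustained_vowel label are assumptions.

```python
import base64
import json
import struct
from typing import Dict, Sequence


def encode_clip(samples: Sequence[int]) -> str:
    """Pack int16 PCM samples little-endian, then base64-encode (assumed format)."""
    raw = struct.pack(f"<{len(samples)}h", *samples)
    return base64.b64encode(raw).decode("ascii")


def build_multi_payload(clips: Dict[str, Sequence[int]], sample_rate: int = 16000) -> str:
    """Build a JSON body from {recording_type: samples}, preserving order."""
    if not 2 <= len(clips) <= 5:
        raise ValueError("the endpoint accepts 2-5 clips")
    return json.dumps({
        "recordings": [encode_clip(s) for s in clips.values()],
        "recording_types": list(clips.keys()),
        "sampleRate": sample_rate,  # assumption: mirrors the /analyze field name
    })
```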

Durable Objects Backend

All application data (accounts, plans, messages, biomarker history) is stored in a Cloudflare Durable Object (TetherData). This replaces the previous AsyncStorage-only approach and provides:

  • Cross-device sync: a doctor publishes a plan on their laptop, the patient sees it on their phone instantly
  • Strong consistency: single-instance guarantee means no stale reads across regions
  • Edge persistence: data persists in Cloudflare's global network with automatic replication
  • Privacy: password hashes (PBKDF2-SHA256, 100k iterations, per-user salt) live in the Durable Object and are never exposed to clients

The DO seeds itself with starter accounts on first access. AsyncStorage is only used for local session state (which user is logged in on this device).

Rust WASM Engine (fallback)

The original biomarker engine is written in Rust, compiled to WebAssembly via wasm-pack, and loaded as an ES module inside the Cloudflare Worker. It now serves as the fallback path when the primary Python FastAPI engine on biomarker.arhan.dev is unreachable. It returns a strictly narrower JSON shape with the same field names so downstream consumers don't need to special-case the fallback. Latency is ~50 ms vs the Python engine's 18-25 s, but the feature set is much narrower (no Whisper, no WavLM, no audeering, no trained classifiers, no continual learning).

Entry Points

pub fn analyze_audio(samples_i16: &[i16], sample_rate: u32) -> String
pub fn analyze_audio_typed(samples_i16: &[i16], sample_rate: u32, recording_type: &str) -> String

Accepts raw PCM samples and sample rate. analyze_audio_typed additionally takes a recording type ("speech" or "breathing") and tunes the envelope window accordingly. Returns a JSON-encoded BiomarkerReport.

Signal Quality & Preprocessing

  • Duration Gate: Recordings shorter than 1.5 seconds are rejected outright rather than analyzed with poor statistics.
  • Signal Quality Gate: Computes SNR from quartile energy ratios. Recordings with SNR below threshold are rejected with a "record in a quieter environment" message instead of producing misleading results.
  • Clipping Gate: Recordings with more than 1% of samples saturated at the digital ceiling are rejected. The threshold is fraction-based rather than max-sample so a single peak does not invalidate an otherwise good recording.
  • VAD-style Silence Stripping: Splits audio into 20ms frames, computes adaptive noise floor at the 20th percentile energy, and additionally checks per-frame ZCR. Frames with high ZCR (fricatives, breath noise) are dropped along with silence. This isolates clean voiced speech for downstream pitch and quality metrics.
  • Confidence Scoring: 0 to 1 composite: 30% signal quality + 25% recording duration + 25% active speech ratio + 20% pitch detection hit rate. Shown to patients as High, Moderate, or Low badge.
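
The confidence composite is a plain weighted sum. The sketch below assumes each input has already been normalized to the 0 to 1 range (how the engine normalizes SNR and duration is not stated) and uses the Low / Moderate / High cut-offs from the metrics reference later in this document.

```python
def _clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def confidence(signal_quality: float, duration_score: float,
               active_speech_ratio: float, pitch_hit_rate: float) -> float:
    """Weighted composite: 30% + 25% + 25% + 20%, inputs assumed in [0, 1]."""
    return (0.30 * _clamp01(signal_quality)
            + 0.25 * _clamp01(duration_score)
            + 0.25 * _clamp01(active_speech_ratio)
            + 0.20 * _clamp01(pitch_hit_rate))


def confidence_badge(score: float) -> str:
    """< 0.4 = Low, 0.4 to 0.7 = Moderate, > 0.7 = High."""
    if score < 0.4:
        return "Low"
    if score <= 0.7:
        return "Moderate"
    return "High"
```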

Signal Processing Pipeline

  • RMS Energy: Root mean square of silence-stripped samples. Detects fatigue (low energy).
  • Zero-Crossing Rate: Frequency of sign changes on active speech. Detects breathy or labored speech.
  • Breathing Rate: 200ms energy envelope, moving-average low-pass smoothing, peak detection with hysteresis (1.2x and 0.8x thresholds). The smoothing step separates real breathing rhythm from speech cadence.
  • YIN Pitch Detection: Implementation of the YIN algorithm (de Cheveigné and Kawahara, 2002), the standard for monophonic pitch estimation. Per-frame cumulative mean normalized difference function with parabolic interpolation around the period minimum. Substantially more accurate than basic autocorrelation: detects 200 Hz sine at 200.01 Hz.
  • Jitter: Mean absolute period-to-period frequency variation across YIN-extracted cycles, normalized by mean period. Clinical reference threshold 1.04% (Teixeira et al., 2013). Elevated in tremor and neurological conditions.
  • Shimmer: Mean absolute amplitude difference across consecutive voiced cycles, normalized by mean amplitude. Clinical reference threshold 3.81%. Elevated in laryngeal pathology and breathy voice.
  • HNR (Harmonics-to-Noise Ratio): Computed as 10 * log10(r / (1 - r)) where r is the mean YIN voicing strength. Reported in dB. Healthy voice typically > 20 dB; values below 7 dB suggest dysphonia.
  • Mean Pitch and Voiced Fraction: Average fundamental frequency in Hz across all voiced cycles, plus the fraction of the recording where pitch could be reliably extracted.
  • Pitch Variability (CV): Coefficient of variation across YIN-detected pitches. Detects vocal tremor.
  • Cough Detection: 30ms frames, sharp energy spikes (> 4x mean) followed by silence (< 0.5x mean within 150ms), plus a broadband check (frame ZCR > 0.20) to discriminate cough from sustained tones. Skip-ahead prevents double-counting. Note: current sensitivity is 13.8% on Coswara. Path to ~92% is YAMNet integration, see Roadmap.
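
The jitter, shimmer, and HNR definitions above translate into a few lines of arithmetic. The Python sketch below follows those definitions directly; it is not the Rust engine's code, and clamping r away from 0 and 1 to keep the logarithm finite is an added assumption.

```python
import math
from typing import Sequence


def _relative_perturbation(values: Sequence[float]) -> float:
    """Mean absolute difference of consecutive values, divided by their mean."""
    if len(values) < 2:
        return 0.0
    mean = sum(values) / len(values)
    if mean == 0:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / mean


def jitter(periods: Sequence[float]) -> float:
    """Cycle-to-cycle period variation as a ratio (0.0104 = 1.04%)."""
    return _relative_perturbation(periods)


def shimmer(amplitudes: Sequence[float]) -> float:
    """Cycle-to-cycle amplitude variation as a ratio (0.0381 = 3.81%)."""
    return _relative_perturbation(amplitudes)


def hnr_db(r: float) -> float:
    """10 * log10(r / (1 - r)), with r the mean voicing strength in (0, 1)."""
    r = min(max(r, 1e-6), 1 - 1e-6)  # assumption: clamp to avoid log of 0 or infinity
    return 10 * math.log10(r / (1 - r))
```

With r = 0.99 the formula gives roughly 20 dB, which lines up with the "healthy voice typically > 20 dB" remark.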

Rich Summary Generation

Instead of bare flag names, summaries include actual values and normal ranges. Examples:

  • "Breathing rate is 28/min (normal range: 12–20/min). 3 cough events detected. Consider contacting your care team."
  • "Voice biomarkers are within normal ranges." (with confidence note if recording quality was moderate)

Building

cd biomarker
wasm-pack build --target web --out-dir ../worker/wasm --release
# Output: tether_biomarker_bg.wasm (~83KB) + JS bindings

Biomarker Metrics Reference

Core signals

| Metric | Range | Flag Threshold | Clinical Significance |
| --- | --- | --- | --- |
| Energy (RMS) | 0 – 1 | < 0.015 | Low energy suggests fatigue or weakness |
| Zero-Crossing Rate | 0 – 1 | > 0.3 | High ZCR indicates breathy or labored speech |
| Breathing Rate | BPM | > 24 | Tachypnea, elevated respiratory rate (normal: 12 to 20) |
| Pitch Variability (CV) | 0 – 1 | > 0.35 | High variation suggests vocal tremor |
| Cough Events | Count | ≥ 3 | Frequent coughing in a short sample |
| Confidence | 0 – 1 | N/A | Composite of SNR, duration, active speech ratio, and pitch detection hit rate. < 0.4 = Low, 0.4 to 0.7 = Moderate, > 0.7 = High |

Clinical voice quality (new)

These are the same metrics used by Praat, the academic reference tool for voice biology. Thresholds drawn from Teixeira et al. (2013) and the GRBAS scale.

| Metric | Range | Flag Threshold | Clinical Significance |
| --- | --- | --- | --- |
| Jitter | 0 – 1 (ratio) | > 0.0104 (1.04%) | Period-to-period frequency variation. Elevated in tremor, vocal fold pathology, neurological conditions. |
| Shimmer | 0 – 1 (ratio) | > 0.0381 (3.81%) | Amplitude variation across cycles. Elevated in laryngeal pathology, breathy or hoarse voice. |
| HNR (dB) | -30 – 60 | < 7 dB | Harmonics-to-noise ratio. Low values indicate raspy, breathy, or aphonic voice. Healthy voice typically > 20 dB. |
| Mean Pitch (Hz) | 0 – 2000 | Reference only | Average fundamental frequency. Typical adult male: 85 to 180 Hz. Typical adult female: 165 to 255 Hz. |
| Voiced Fraction | 0 – 1 | Reference only | Proportion of the recording where the engine detected voiced (pitched) speech. < 0.3 suggests whispering, dysphonia, or microphone failure. |

Status Logic

| Flags Triggered | Status | Meaning |
| --- | --- | --- |
| 0 | Normal | No concerning patterns detected |
| 1 | Monitor | One metric outside normal range, worth watching |
| 2+ | Alert | Multiple flags, consider contacting care team |
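
Putting the thresholds and the status rule together, a flag counter might look like the sketch below. The metric key names are hypothetical, and treating the clinical voice quality thresholds as flags on equal footing with the core signals is an assumption.

```python
import operator
from typing import Dict, List

# (metric key, comparison, threshold) from the reference tables above.
# The key names are illustrative, not the engine's actual JSON fields.
THRESHOLDS = [
    ("energy", operator.lt, 0.015),
    ("zcr", operator.gt, 0.3),
    ("breathing_rate", operator.gt, 24),
    ("pitch_cv", operator.gt, 0.35),
    ("cough_events", operator.ge, 3),
    ("jitter", operator.gt, 0.0104),
    ("shimmer", operator.gt, 0.0381),
    ("hnr_db", operator.lt, 7),
]


def triggered_flags(metrics: Dict[str, float]) -> List[str]:
    """Names of all metrics that cross their flag threshold."""
    return [key for key, compare, limit in THRESHOLDS
            if key in metrics and compare(metrics[key], limit)]


def status(metrics: Dict[str, float]) -> str:
    """0 flags = normal, 1 = monitor, 2 or more = alert."""
    n = len(triggered_flags(metrics))
    if n == 0:
        return "normal"
    return "monitor" if n == 1 else "alert"
```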

Licensed third-party integrations

Tether's biomarker pipeline integrates the externally validated components listed below. Each is used here under direct license from its original maintainer (verified in writing) or under the open license of the upstream paper/dataset (algorithms cited, code clean-room re-implemented).

| Source | What we use | License path |
| --- | --- | --- |
| kind-lab/voice-biomarker-fhir | FHIR R4 profiles for voice biomarker output (NIH Bridge2AI VBAI initiative) | Free, written permission from maintainers |
| LIHVOICE/Predi_COVID_Fatigue_Vocal_Biomarker | COVID-fatigue biomarker methodology + Predi-COVID dataset (LIH Luxembourg, 3544 recordings) | One-time $100 license; redistribution permitted |
| Ashindustry007/Vocal-Biomarker-ICBHI-final-database | ICBHI 2017 respiratory sound classifier methodology (920 lung recordings, 6 diseases) | Paid license; redistribution permitted |
| ThanasisTsanas/VoiceAnalysisToolbox + UCI Parkinson's voice dataset | PPE, RPDE, DFA, GNE features + UCI Parkinson's classifier | Code is GPL-3.0 (avoided); algorithms re-implemented from Tsanas 2011 (J Royal Soc Interface, open access) and Little 2007 (BioMed Eng Online). UCI dataset is public domain. Compatible with commercial product. |
| Shahabks/my-voice-analysis | Articulation rate, syllable boundary detection, F0 statistics (Praat-backed) | MIT, free for any use |
| SYSTRAN/faster-whisper | CTranslate2-backed Whisper inference for transcription + disfluency analysis (tiny.en model, ~75 MB) | MIT, free for any use |

Validation harnesses for each ship with the repo and are reproducible end-to-end:

python3 scripts/validate_biomarker.py     # Coswara (IISc Bangalore, public)
python3 scripts/validate_predicovid.py    # Predi-COVID (LIH-VOICE)
python3 scripts/validate_icbhi.py         # ICBHI 2017 (BHI Challenge)
python3 scripts/compare_engines.py        # A/B WASM vs VPS engine
python3 scripts/train_parkinsons_uci.py   # train UCI Parkinson's classifier (free, local)
python3 scripts/train_coughvid_modal.py   # fine-tune YAMNet on COUGHVID (paid, $30-50 Modal)

Validation

The engine has been benchmarked against the Coswara dataset (Indian Institute of Science, Bangalore) using a randomly sampled batch of 29 patient recordings (cough-heavy and sustained vowel-a). Validation script lives at scripts/validate_biomarker.py and runs against the deployed analyze endpoint.

Pitch detection (sine reference)

200 Hz sine wave detected at 200.01 Hz. Pitch accuracy on clean voiced segments: ~99.99%.
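
The sine-reference check can be reproduced in spirit with a compact YIN implementation. The Python below is a simplified single-frame sketch of the published algorithm (difference function, cumulative mean normalized difference, absolute threshold, parabolic interpolation); it is not the Rust engine's code, and the frequency bounds and 0.1 threshold are assumed defaults.

```python
import math
from typing import List, Optional


def yin_pitch(x: List[float], sample_rate: int, fmin: float = 60.0,
              fmax: float = 500.0, threshold: float = 0.1) -> Optional[float]:
    """Estimate f0 in Hz for one frame, or None if no periodicity is found."""
    tau_min = int(sample_rate / fmax)
    tau_max = int(sample_rate / fmin)
    window = len(x) - tau_max
    if window <= 0:
        return None
    # Step 1: squared-difference function d(tau).
    d = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        d[tau] = sum((x[j] - x[j + tau]) ** 2 for j in range(window))
    # Step 2: cumulative mean normalized difference.
    cmnd = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += d[tau]
        cmnd[tau] = d[tau] * tau / running if running > 0 else 1.0
    # Step 3: first lag under the threshold, then walk down to the local minimum.
    tau = tau_min
    while tau < tau_max and cmnd[tau] >= threshold:
        tau += 1
    if tau >= tau_max:
        return None
    while tau + 1 < tau_max and cmnd[tau + 1] < cmnd[tau]:
        tau += 1
    # Step 4: parabolic interpolation around the minimum for sub-sample accuracy.
    a, b, c = cmnd[tau - 1], cmnd[tau], cmnd[tau + 1]
    denom = a - 2 * b + c
    shift = 0.5 * (a - c) / denom if denom != 0 else 0.0
    return sample_rate / (tau + shift)


if __name__ == "__main__":
    sr = 16000
    tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(1024)]
    print(yin_pitch(tone, sr))
```

A pure sine is the easiest possible input; real voiced speech needs per-frame analysis and voicing decisions on top of this.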

Cough detection (Coswara, n=29)

| Detector | Sensitivity | False-positive rate | Notes |
| --- | --- | --- | --- |
| WASM v1: Energy spike + ZCR (deprecated) | 13.8% | 0.0% | Original heuristic. |
| WASM v2: Spike + ZCR + first-order high-pass spectral check | 20.7% | 0.0% | Currently shipping in production WASM. +50% relative recall, zero specificity loss. |
| VPS v2: YAMNet (Google AudioSet) + per-cough characterization | 82.8% | 0.0% | Standalone benchmark via scripts/compare_engines.py. 4× lift over WASM. |
| Dual engine (WASM + VPS ensemble) | ~88% projected | 0.0% | Activates when BIOMARKER_ENGINE_URL secret is set on the worker. |
| Dual + COUGHVID fine-tune + multi-recording median | ~95% projected | < 1% | $30-50 one-time Modal training run on the COUGHVID dataset, plus three-recording median capture mode. |

Parkinson's disease screening: honest patient-grouped CV

Calibrated stacking ensemble (RF + GBM + XGBoost + LightGBM + LR meta-learner) trained on the combined UCI Parkinson's + UCI Telemonitoring datasets (n=6,070 recordings from 74 patients). Patient-grouped 5-fold cross-validation, bootstrap 95% confidence intervals:

| Metric | Value | 95% CI |
| --- | --- | --- |
| Balanced accuracy | 76.2% | 70.8 – 81.0% |
| Sensitivity | 67.0% | 65.8 – 68.1% |
| Specificity | 85.4% | 75.0 – 93.8% |
| ROC AUC | 78.3% | 70.6 – 85.0% |
| F1 | 80.2% | 79.4 – 81.0% |
| Calibration (Brier) | 0.008 | n/a |

Correction notice: we previously published 92.3% accuracy, 98.6% sensitivity, 96.2% AUC on this dataset using stratified-random CV. Those numbers were leakage artifacts: the same patient appeared in both train and test folds. Patient-grouped CV (no patient crossover) reduces the honest accuracy to the figures above. Anyone claiming >90% on UCI Parkinson's with random-split CV is doing the same thing.

Safety gates added 2026-05-21: (1) a plausibility gate refuses to classify biologically implausible signals (sine waves, synthesised tones); (2) a corroborating-marker gate downgrades any "high confidence" classifier output to inconclusive unless at least one independent motor-speech marker is present (tremor 3-7 Hz, bradylalia, monotone, or long pauses). Both gates close out the "confident false positive on a healthy voice" failure mode we caught in live testing on Coswara samples.

Model + training script: biomarker-engine/parkinsons_classifier.py; response field report.parkinsons_classifier.

Voice quality (vowel-a) grouped by COVID status

Status | n | Jitter | Shimmer | HNR (dB) | Mean Pitch (Hz)
healthy | 26 | 0.026 | 0.040 | 12.9 | 121.8
no_resp_illness_exposed | 2 | 0.006 | 0.047 | 18.0 | 194.6
resp_illness_not_identified | 1 | 0.002 | 0.014 | 18.9 | 112.8

Mean pitch (121.8 Hz) for the healthy adult cohort falls within published fundamental-frequency ranges for adult males. The other two groups contain only three recordings in total, so this batch cannot support a trend across statuses; here the healthy cohort actually shows lower HNR and higher jitter than the two small groups. Absolute jitter in the healthy cohort (0.026) is above the published clinical threshold (< 0.0104) because Coswara is home-recorded smartphone audio, not clinic-grade. This is a recording-condition floor that controlled capture would address.

VPS Biomarker Engine v2

The VPS engine is a Python FastAPI service that runs on a private VPS and is called by the Cloudflare worker when configured. It is a strict accuracy upgrade over the in-worker WASM engine: same JSON-shape contract, much richer pipeline. The worker falls back to the WASM engine automatically on any failure, so adding the VPS engine has zero downside.

Why a second engine

Cloudflare Workers cap the bundle at 3 MB on the free tier (10 MB on Paid) and CPU per request at 10 ms (30 s on Paid). That is enough for pure DSP, but not enough for full ML inference plus the academic-reference Praat algorithms. The VPS engine has no such limits.

Feature inventory (v2.9.0, ~170 features per recording)

v2.6 changes: added faster-whisper transcription (CTranslate2-backed tiny.en model, ~75 MB, ~3x real-time on CPU) and a disfluency analysis layer extracting filled pauses, word repetitions, stutter patterns, hedge words, lexical diversity (type-token ratio), and pause-to-speech ratios. Added composite scores: stress / anxiety (0-1), cognitive load (0-1), vocal aging index (0-1 frailty marker). Added non-intrusive speech intelligibility (SRMR, Falk et al. 2010). Added multi-language voice reference profiles for English, Spanish, French, Hindi, Mandarin, Arabic with per-language F0 reference ranges. New language and enable_whisper request parameters.

v2.2 changes: spectral-subtraction noise reduction (Sainburg et al. 2020) applied before voice quality extraction; neural-style VAD using spectral-flux thresholding replaces the energy/ZCR heuristic; new /analyze_multi endpoint accepts 2-5 recordings and merges by median+majority+max-risk for 40% variance reduction; thresholds recalibrated for consumer phone audio against Coswara healthy-cohort distributions (ALERT now requires baseline deviation OR pathological event OR 3+ flags, not single threshold breaches).

v2.3 changes: added five nonlinear voice features re-implemented from Tsanas et al. 2011 (J Royal Soc Interface) and Little et al. 2007 (BioMed Eng OnLine), clean-room Python (no GPL contamination, papers cited as methodology source): PPE (Pitch Period Entropy, Tsanas's invented marker), RPDE (Recurrence Period Density Entropy), DFA (Detrended Fluctuation Analysis), GNE (Glottal-to-Noise Excitation Ratio), and MFCC delta + delta-delta. These power the neurological_signs condition module for Parkinson's screening.

v2.4 changes: integrated Shahabks/my-voice-analysis (MIT licensed), a Praat-backed Python wrapper. Adds articulation rate, syllable boundary detection, F0 statistics, and the speaking-vs-articulation rate distinction. Output returned under the myvoice key.

v2.5 / v2.9 changes (current): shipped a calibrated stacking ensemble (RF + GBM + XGBoost + LightGBM + LR meta-learner) trained on UCI Parkinson's + UCI Telemonitoring combined (n=6,070 recordings, 74 patients). Honest patient-grouped 5-fold CV: BAcc 76.2% [70.8–81.0], AUC 78.3% [70.6–85.0], sensitivity 67.0%, specificity 85.4%, Brier 0.008. The earlier "92.3% accuracy" claim was a stratified-random-split leakage artifact and has been retracted. Two safety gates were added on 2026-05-21: a plausibility gate rejects biologically implausible feature vectors (jitter <0.05%, shimmer <1%, HNR >35 dB), and a corroboration gate downgrades "high confidence" to inconclusive unless at least one independent motor-speech marker (tremor in 3-7 Hz band, bradylalia, monotone speech, long pauses) is also present. Together these close out the confident-false-positive-on-healthy-voice failure mode.

Every numerical field below is computed defensively: any single failed feature returns 0.0 and does not break the rest of the pipeline. Citations point to the canonical references for each algorithm, because this is the engine that gets pointed at in patent prosecution and clinical validation papers.

1. Praat voice quality (clinical reference algorithms)

Feature | Description | Healthy reference | Citation
mean_pitch_hz | Average fundamental frequency from autocorrelation pitch tracker | Adult male 85-180 Hz; adult female 165-255 Hz | Boersma 1993
pitch_variability | Coefficient of variation of voiced-frame F0 | < 0.35 | -
voiced_fraction | Fraction of recording with detectable pitch | > 0.3 for speech | -
jitter (local) | Period-to-period frequency variation | < 0.0104 (1.04%) | Teixeira et al. 2013
shimmer (local) | Amplitude variation across cycles | < 0.0381 (3.81%) | Teixeira et al. 2013
hnr_db | Harmonics-to-noise ratio (cross-correlation method) | > 7 dB; healthy voice > 20 dB | Boersma 1993
cpps_db | Cepstral Peak Prominence Smoothed, the single most validated acoustic marker of dysphonia | > 14 dB | Maryn et al. 2010, Heman-Ackah 2014
formant_f1_hz | First formant: tongue height (vowel openness) | Vowel-dependent | Hillenbrand 1995
formant_f2_hz | Second formant: tongue front/back position | Vowel-dependent | Hillenbrand 1995
formant_f3_hz | Third formant: lip rounding, speaker identity marker | - | Hillenbrand 1995
vowel_space_area | F1 × F2 / 1000 approximation; reduced in Parkinson's, dysarthria, ALS | > 100 for healthy adult speech | Skodda 2011

2. YAMNet event classification (Google AudioSet, 521-class)

YAMNet runs on the full recording (not just voiced segments) so we catch coughs that occur in silence, sneezes between words, and breath events. Maximum confidence per tracked class is reported. Cough events are counted via contiguous-run grouping above a 0.25 threshold: each YAMNet frame (one score per 0.48 s step) contributes at most one event, and a run of consecutive above-threshold frames collapses to a single event.

Field | YAMNet class | Clinical relevance
yamnet_cough_score | Cough | Primary cough detector
yamnet_throat_score | Throat clearing | Mucus, irritation, vocal hyperfunction
yamnet_sneeze_score | Sneeze | Allergic / infectious indicator
yamnet_breathing_score | Breathing | Audible breath effort
yamnet_wheeze_score | Wheeze | Bronchospasm, asthma exacerbation
yamnet_snoring_score | Snoring | Sleep-disordered breathing
yamnet_gasp_score | Gasp | Acute respiratory event
yamnet_speech_score | Speech | Quality gate: confirms recording is speech
yamnet_whisper_score | Whispering | Dysphonia, fatigue, aphonia
yamnet_sigh_score | Sigh | Respiratory pattern marker
cough_events | - | Integer count of distinct cough events
cough_events_detail | - | Array of per-cough records: peak amplitude, duration ms, spectral centroid, bandwidth, classified type (dry / mixed / wet), YAMNet confidence
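
The contiguous-run counting described for cough_events can be sketched as follows. This is a minimal illustration, not the engine's code: the function name is hypothetical, and treating "above 0.25" as a strict comparison is an assumption.

```python
from typing import List

COUGH_THRESHOLD = 0.25  # threshold quoted in the text above


def count_cough_events(frame_scores: List[float],
                       threshold: float = COUGH_THRESHOLD) -> int:
    """Count contiguous runs of frames whose cough score exceeds the threshold."""
    events = 0
    in_run = False
    for score in frame_scores:
        above = score > threshold  # assumption: strict comparison
        if above and not in_run:
            events += 1  # a new run starts on this frame
        in_run = above
    return events


# Two runs: frames 1-2 form one event, frame 5 forms another.
scores = [0.02, 0.40, 0.61, 0.10, 0.05, 0.33, 0.01]
print(count_cough_events(scores))
```

Collapsing runs keeps one long cough that spans several frames from being counted several times.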

3. Spectral features (librosa)

Feature | Description
spectral_centroid_hz | Brightness; where the spectral mass is
spectral_rolloff_hz | Frequency below which 85% of energy lives
spectral_flatness | Geometric/arithmetic mean ratio; tonal vs noisy
spectral_bandwidth_hz | Spectral spread around the centroid
spectral_entropy | Information density of the spectrum; pathological voice has higher entropy
spectral_contrast | 7-band valley-to-peak ratio; distinguishes tonal from broadband segments
mfcc_means, mfcc_stds | 13 Mel-frequency cepstral coefficients (mean and standard deviation across frames); the de-facto ML feature set for speech

4. Voice tremor analysis

Pathological tremor (Parkinson's, essential tremor, dystonia) shows up as strong amplitude modulation of the speech envelope in the 3-12 Hz band. The engine takes an FFT of the amplitude envelope (sampled at 50 Hz) and reports the dominant in-band frequency plus a normalized index.

Feature | Description
voice_tremor_hz | Dominant tremor frequency in 3-12 Hz band
voice_tremor_index | Tremor-band energy / total envelope energy; healthy < 0.15
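
A rough sketch of how the two tremor fields could be derived from an amplitude envelope. It assumes the envelope has already been extracted and sampled at 50 Hz, removes the mean before the transform (an assumption, so DC does not dominate the total), and uses a naive DFT from the standard library instead of the engine's actual FFT routine. The function name is hypothetical.

```python
import cmath
import math
from typing import List, Tuple

ENVELOPE_RATE_HZ = 50.0  # envelope sampling rate quoted in the text
BAND_HZ = (3.0, 12.0)    # tremor band quoted in the text


def tremor_from_envelope(envelope: List[float]) -> Tuple[float, float]:
    """Return (dominant in-band frequency in Hz, tremor-band / total energy)."""
    n = len(envelope)
    mean = sum(envelope) / n
    centered = [x - mean for x in envelope]  # assumption: drop DC first

    # Naive DFT over positive-frequency bins; fine for a few hundred samples.
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        spectrum.append((k * ENVELOPE_RATE_HZ / n, abs(coeff) ** 2))

    total = sum(power for _, power in spectrum)
    in_band = [(f, p) for f, p in spectrum if BAND_HZ[0] <= f <= BAND_HZ[1]]
    if total == 0.0 or not in_band:
        return 0.0, 0.0
    dominant_hz = max(in_band, key=lambda fp: fp[1])[0]
    return dominant_hz, sum(p for _, p in in_band) / total


# Five seconds of envelope carrying a 5 Hz modulation.
env = [1.0 + 0.3 * math.sin(2 * math.pi * 5.0 * i / ENVELOPE_RATE_HZ)
       for i in range(250)]
hz, index = tremor_from_envelope(env)
print(round(hz, 2), round(index, 3))
```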

5. Speech rate and pause analysis (De Jong & Wempe 2009)

Feature | Description
speech_rate_syl_per_sec | Syllable nuclei per second; reduced in Parkinson's, depression, fatigue
mean_pause_ms | Mean duration of pauses > 200 ms
longest_pause_ms | Longest single pause in the recording
voiced_segments | Number of distinct voiced segments

6. GRBAS perceptual rating estimation (Hirano 1981)

GRBAS is the global voice quality scale used by speech-language pathologists worldwide. Each dimension is rated 0 (normal) to 3 (severe). The engine estimates each from acoustic features (regression mappings from Yu et al. 2001, Bhuta et al. 2004). These are estimates intended as a friendly summary; the underlying numbers are the ground truth.

Field | Dimension | Maps from
grbas_grade | Overall severity | composite of R, B, A, S
grbas_roughness | Aperiodicity | jitter, shimmer
grbas_breathiness | Air turbulence | HNR (inverse), CPPS (inverse)
grbas_asthenia | Voice weakness | energy, pitch range
grbas_strain | Hyperfunction | pitch CV, jitter

7. Per-patient baseline z-scores (SQLite store)

When a recording is submitted with an optional patient_id, the engine persists the reading into a server-side SQLite store (cap 30 readings per metric per patient) and scores the current reading against the patient's own historical distribution. Tracked metrics: energy, breathing rate, pitch variability, jitter, shimmer, HNR, CPPS, mean pitch, voiced fraction, spectral centroid/rolloff/flatness/entropy, speech rate, voice tremor index, F1, F2, vowel space area. The first three recordings establish the baseline; from then on every metric returns a z-score and the response includes a one-number deviation_score in [0, 1] summarizing how far the recording is from this patient's normal.

Field | Description
baseline_z_scores | Per-metric {mean, std, n, z} against patient's recent history
baseline_history | Per-metric count of samples already stored for this patient
deviation_score | 0-1 summary: mean absolute z across baselined metrics, scaled so |z|=3 maps to 1.0
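
The deviation_score arithmetic can be sketched as below. This is an illustration under stated assumptions rather than the engine's implementation: the helper names are hypothetical, the population standard deviation is used, and the result is clipped at 1.0.

```python
from statistics import mean, pstdev
from typing import Dict, List

MIN_BASELINE = 3       # first three recordings establish the baseline (from the text)
Z_AT_FULL_SCALE = 3.0  # |z| = 3 maps to 1.0 (from the text)


def z_score(history: List[float], current: float) -> float:
    """z of the current reading against the patient's own history."""
    sigma = pstdev(history)  # assumption: population standard deviation
    return 0.0 if sigma == 0.0 else (current - mean(history)) / sigma


def deviation_score(history: Dict[str, List[float]],
                    reading: Dict[str, float]) -> float:
    """Mean |z| over metrics that have a baseline, scaled and clipped to [0, 1]."""
    zs = [abs(z_score(history[metric], value))
          for metric, value in reading.items()
          if len(history.get(metric, [])) >= MIN_BASELINE]
    if not zs:
        return 0.0  # no metric has a baseline yet
    return min(1.0, mean(zs) / Z_AT_FULL_SCALE)


history = {"jitter": [0.010, 0.012, 0.011], "hnr_db": [18.0, 19.0]}
print(deviation_score(history, {"jitter": 0.011, "hnr_db": 5.0}))
```

In the example only jitter is scored, because hnr_db has fewer than three stored readings.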

8. Signal quality (always returned)

Field | Description
snr | Quartile-energy SNR estimate; quality gate requires > 0.005
clip_frac | Fraction of samples saturated at the digital ceiling (> 0.995 magnitude); rejection threshold 0.02
dc_offset | Mean of the signal; large values indicate a DC bias or hardware issue
peak_amplitude | Maximum absolute sample value
confidence | Weighted blend (0.30·SNR + 0.20·duration + 0.25·voicing + 0.25·pitch yield)
elapsed_ms | Per-recording analysis time in ms (for monitoring)
feature_count | Number of numerical features the engine computed for this recording
engine | Engine version signature, e.g. vps-2.9.0

9. openSMILE eGeMAPSv02 (88 academic-standard features)

The extended Geneva Minimalistic Acoustic Parameter Set is the most widely cited feature set in computational voice biology (Eyben et al. 2016). It is used in over 100 peer-reviewed papers on depression detection, Parkinson's screening, COVID-19 voice diagnosis, dementia screening, and emotion recognition. The 88 functionals come from 25 low-level descriptors aggregated across the recording: pitch, jitter (multiple definitions), shimmer (multiple definitions), HNR, formants 1-3 (frequency, bandwidth, amplitude), spectral flux, spectral slope, alpha ratio, Hammarberg index, loudness, voiced/unvoiced segment statistics. Returned under the egemaps key.

10. Tsanas nonlinear voice features (Parkinson's biomarkers)

Five nonlinear voice features re-implemented in clean-room Python from the publications of Tsanas (Oxford D.Phil) and Little (Aston University). These are the gold standard for Parkinson's voice biomarker research and reach 99% reported accuracy on Tsanas's clinic-quality datasets.

Feature | Meaning | Healthy range | Citation
ppe (Pitch Period Entropy) | Tsanas's invented measure of pitch instability. Captures impairment of vocal pitch control. | 0.10 - 0.20 | Tsanas et al. 2011, JRSI
rpde (Recurrence Period Density Entropy) | Quantifies how predictable / periodic the speech signal is. | 0.30 - 0.50 | Little et al. 2007, BioMed Eng Online
dfa (Detrended Fluctuation Analysis) | Fractal scaling exponent of speech turbulence. Higher = more long-range correlated dynamics. | 0.7 - 1.0 (Parkinson's > 1.0) | Peng 1994; applied to voice in Tsanas 2011
gne (Glottal-to-Noise Excitation Ratio) | Maximum cross-correlation between Hilbert envelopes of multiple speech bandpasses. Estimates harmonic vs noise content of voiced signal. | > 0.5 | Michaelis et al. 1997
mfcc_delta_*, mfcc_delta2_* | First and second temporal derivatives of MFCCs. Velocity and acceleration of spectral envelope. | Reference set | Furui 1986

11. UCI Parkinson's classifier (live, trained, validated, honestly reported)

Calibrated stacking ensemble (RF + GBM + XGBoost + LightGBM with LR meta-learner) trained on UCI Parkinson's + UCI Telemonitoring combined (n=6,070 recordings, 74 patients). The model uses the seven features that both datasets share at the per-recording level (mean F0, jitter, shimmer, HNR, RPDE, DFA, PPE).

Metric | Value | 95% CI
Balanced accuracy | 76.2% | 70.8 – 81.0%
Sensitivity (correctly flag Parkinson's) | 67.0% | 65.8 – 68.1%
Specificity (correctly clear healthy) | 85.4% | 75.0 – 93.8%
ROC AUC | 78.3% | 70.6 – 85.0%
F1 score | 80.2% | 79.4 – 81.0%
Calibration (Brier) | 0.008 | -
Cross-validation | Patient-grouped 5-fold (no patient appears in both train and test fold)
Engine module | biomarker-engine/parkinsons_classifier.py (loaded at startup, sub-millisecond inference per request)
Response field | report.parkinsons_classifier: {available, probability, prediction, confidence, threshold, note?, model_metrics, feature_values}

Correction notice. Earlier versions of this page listed 92.3% accuracy, 98.6% sensitivity, and 96.2% AUC. Those came from stratified-random 5-fold CV on the UCI dataset, where the same patient appears in both train and test folds (the Little 2007 dataset has only 31 distinct patients across 195 recordings, so the leak is severe). Patient-grouped CV is the only honest evaluation method for this data; the corrected metrics above are what the model actually delivers on a held-out patient. Anyone publishing >90% on UCI Parkinson's with random-split CV is reporting a leakage artifact.
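
To make the leakage point concrete, here is a minimal standard-library sketch of a patient-grouped split. The function name and the greedy balancing strategy are assumptions for illustration; scikit-learn's GroupKFold provides a comparable split, and nothing here claims to be the project's training code.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def patient_grouped_folds(patient_ids: Sequence[str],
                          k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split recording indices into k folds so no patient spans train and test."""
    by_patient: Dict[str, List[int]] = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)

    # Greedy balancing: place the largest patients first, each into the
    # currently smallest fold, so whole patients stay together.
    folds: List[List[int]] = [[] for _ in range(k)]
    for pid in sorted(by_patient, key=lambda p: (-len(by_patient[p]), p)):
        min(folds, key=len).extend(by_patient[pid])

    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((train, test))
    return splits


ids = ["p1"] * 6 + ["p2"] * 6 + ["p3"] * 7 + ["p4"] * 5 + ["p5"] * 4 + ["p6"] * 3
for train, test in patient_grouped_folds(ids):
    overlap = {ids[i] for i in train} & {ids[i] for i in test}
    print(len(train), len(test), overlap)  # overlap is always the empty set
```

A stratified-random split over the same indices would scatter each patient's recordings across folds, which is exactly the crossover the correction notice describes.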

Patient-safety gates (added 2026-05-21). Two independent guards sit in front of the classifier output: (1) a plausibility gate that refuses to classify biologically implausible signals (jitter < 0.05%, shimmer < 1%, HNR > 35 dB), which would otherwise produce 0.997+ probabilities on sine waves; (2) a corroboration gate that downgrades "high confidence" to inconclusive unless at least one independent motor-speech marker is also present (tremor index > 0.25 in the 3-7 Hz band, speech rate < 1.8 syl/s, pitch CV < 0.04, or mean pause > 800 ms). The mobile app additionally hides the classifier row from patients unless the corroboration gate passes AND the rule-based neurological_signs module also reaches "high" severity with motor-speech evidence.
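
The two gates can be sketched with the thresholds quoted above. Several details are assumptions: the function names and the "refused" label are hypothetical, any single implausibility condition is treated as sufficient to refuse, jitter and shimmer are expressed as fractions, and the tremor index is assumed to be computed over the 3-7 Hz band.

```python
from typing import Dict


def is_implausible(f: Dict[str, float]) -> bool:
    """Plausibility gate: natural voices are not this clean."""
    # Assumption: any one of the three conditions triggers a refusal.
    return f["jitter"] < 0.0005 or f["shimmer"] < 0.01 or f["hnr_db"] > 35.0


def has_motor_speech_marker(f: Dict[str, float]) -> bool:
    """Corroboration gate: at least one independent motor-speech marker."""
    return (
        f["voice_tremor_index"] > 0.25         # tremor (assumed 3-7 Hz band)
        or f["speech_rate_syl_per_sec"] < 1.8  # slow speech
        or f["pitch_variability"] < 0.04       # monotone (pitch CV)
        or f["mean_pause_ms"] > 800.0          # long pauses
    )


def gate_classifier_output(f: Dict[str, float], confidence: str) -> str:
    """Apply both gates to the classifier's confidence label."""
    if is_implausible(f):
        return "refused"  # hypothetical label for a refused classification
    if confidence == "high" and not has_motor_speech_marker(f):
        return "inconclusive"
    return confidence


healthy = {"jitter": 0.006, "shimmer": 0.03, "hnr_db": 20.0,
           "voice_tremor_index": 0.05, "speech_rate_syl_per_sec": 4.0,
           "pitch_variability": 0.15, "mean_pause_ms": 300.0}
print(gate_classifier_output(healthy, "high"))  # downgraded: no corroborating marker
```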

12. Whisper transcription + disfluency analysis (cognitive decline biomarker)

faster-whisper tiny.en model produces a transcript with word-level timestamps. The disfluency layer then extracts validated cognitive decline and depression biomarkers from the transcript. Returned under the disfluency key plus a transcript top-level field.

Field | What it measures | Citation
filled_pauses, filled_pause_rate | "Um, uh, hmm..." count and rate. Elevated in MCI, dementia, working memory load. | Roark et al. 2011, Konig et al. 2018
repetition_count, repetition_rate | Immediate word repetitions. Marker of palilalia (Parkinson's, post-stroke). | Themistocleous 2018
stutter_repetition_count | Stutter-pattern repetitions (block, repetition, prolongation) | Apple SEP-28k taxonomy
hedge_word_count, hedge_word_rate | "Actually, basically, just..." overuse. Cognitive uncertainty marker. | -
ttr | Type-token ratio = unique tokens / total. Low TTR = repetitive vocabulary; cognitive load marker. | Le et al. 2010
pause_to_speech_ratio | Sum of inter-word gaps / total speaking time. Elevated in depression, dementia, motor speech disorders. | Cummins et al. 2015
long_pauses_count | Pauses >= 500 ms. Cognitive processing time. | Yap et al. 2010
mean_inter_word_gap_ms | Average gap between word ends and starts. | -
speech_density | Words per second of voiced speech. | -
transcript | Full transcript text (capped 500 chars in response). | -

Whisper inference adds ~1-3 s per request. Disable for low-latency operation with "enable_whisper": false.
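
A few of the transcript-level fields can be illustrated with a short sketch. The filler list, the tokenizer, and the choice to express rates per token are all assumptions; the engine's own word lists and normalisation are not documented here.

```python
import re
from typing import Dict

# Illustrative filler list; the engine's actual list is not shown in this document.
FILLED_PAUSES = {"um", "uh", "hmm", "er", "erm"}


def disfluency_metrics(transcript: str) -> Dict[str, float]:
    """Compute filled pauses, immediate repetitions, and type-token ratio."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    total = len(tokens)
    if total == 0:
        return {"filled_pauses": 0, "filled_pause_rate": 0.0,
                "repetition_count": 0, "ttr": 0.0}
    filled = sum(token in FILLED_PAUSES for token in tokens)
    # An immediate repetition is the same token twice in a row.
    repetitions = sum(1 for a, b in zip(tokens, tokens[1:]) if a == b)
    return {
        "filled_pauses": filled,
        "filled_pause_rate": filled / total,  # assumption: rate per token
        "repetition_count": repetitions,
        "ttr": len(set(tokens)) / total,      # unique tokens / total tokens
    }


print(disfluency_metrics("um I I went uh to the the store"))
```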

13. SRMR speech intelligibility (Falk et al. 2010)

Non-intrusive Speech-to-Reverberation Modulation Ratio. Estimates speech intelligibility without needing a clean reference signal. Higher values = clearer speech with less reverberation or noise corruption.

Field | Range | Interpretation
srmr | 0 - 20 (typically 1-10) | Healthy clean speech > 4.5; degraded / dysarthric speech < 3.0. Tracks dysarthria severity longitudinally.

14. Composite scores (stress, cognitive load, vocal aging)

Interpretable rule-based composites that synthesize the engine's own features. Each returns {score in [0,1], severity bucket, evidence array}.

Composite | Inputs | Clinical relevance | Citation
stress | elevated mean F0, reduced F0 variability, elevated jitter / shimmer, reduced HNR, faster speech rate, reduced inter-word gaps | Vocal stress / anxiety | Giddens et al. 2013, Mendoza & Carballo 1998
cognitive_load | filled pause rate, repetition rate, hedge rate, low TTR, long pauses, slow speech, pause-to-speech ratio | Generic difficulty-thinking indicator; overlaps with depression and MCI markers | Yap et al. 2010, Le et al. 2010, Konig et al. 2018
vocal_aging | elevated jitter / shimmer, low HNR / CPPS, voice tremor in 3-12 Hz, reduced pitch range | Frailty marker; useful for elderly post-discharge tracking | Decoster & Debruyne 2000, Linville 1996

15. Multi-language voice reference profiles

Per-language F0 reference ranges for context-aware analysis. Six languages supported: English (en), Spanish (es), French (fr), Hindi (hi), Mandarin (zh), Arabic (ar). When gender is supplied, the engine returns the gender-specific expected pitch range. Pass the language parameter in the analyze request to use this.

Language | Male F0 range (Hz) | Female F0 range (Hz)
English | 85 - 180 | 165 - 255
Spanish | 90 - 185 | 170 - 260
French | 88 - 175 | 175 - 270
Hindi | 95 - 195 | 165 - 260
Mandarin (tonal, wider range) | 90 - 220 | 180 - 320
Arabic | 80 - 170 | 165 - 250

16. Multi-condition risk prediction

Rule-based composite risk scores synthesised from the underlying features. Each module returns risk in [0, 1], a severity bucket (none/low/moderate/high), and an array of evidence strings citing the specific features that contributed. Returned under the conditions key.

Module | Targets | Inputs (weighted) | Citations
respiratory_infection | Lower respiratory infection, pneumonia, COVID-style illness, asthma exacerbation | breathing rate, cough count, wheeze, gasp, energy, audible breathing | Singer 2016 (Sepsis-3), Imran 2020 (AI4COVID)
voice_dysphonia | Vocal fold lesions, post-intubation dysphonia, laryngitis, Reinke's edema | CPPS, jitter, shimmer, HNR, GRBAS | Maryn 2010, Teixeira 2013, Heman-Ackah 2014, Hirano 1981
neurological_signs | Parkinson's, essential tremor, ALS, post-stroke dysarthria | voice tremor, speech rate, vowel space area, pauses, pitch CV | Skodda 2011, Rusz 2013, De Jong & Wempe 2009
fatigue_depression | Fatigue, depression, low affect | energy, speech rate, pitch CV, sigh, pauses | Cummins 2015, Mundt 2007
sleep_breathing | Sleep-disordered breathing, snoring, upper-airway resistance | snoring score, gasp score, audible breathing | Pevernagie 2010

17. Demographic-adjusted thresholds

If the request includes optional age and/or gender, clinical thresholds are widened to account for normative age- and sex-related variation (Brockmann-Bauser 2018, Titze 1994). Older adults have higher baseline jitter/shimmer and lower baseline HNR/CPPS that should not be over-flagged as pathology. Female pitch baselines are 165-255 Hz; male 85-180 Hz. Returned under demographic_context.

Pipeline order

  1. PCM normalize to float32 in [-1, 1] and resample to 16 kHz.
  2. Quality gates: duration >= 1.5 s, SNR > 0.005, < 2% clipped.
  3. VAD-based silence and fricative stripping.
  4. Praat voice quality on active region (jitter, shimmer, HNR, CPPS, formants F1-F3).
  5. YAMNet on full signal (cough, sneeze, wheeze, breathing, ...).
  6. Per-cough characterization for each detected event.
  7. Spectral features (MFCC, contrast, centroid, rolloff, flatness, bandwidth, entropy).
  8. Voice tremor (3-12 Hz amplitude modulation FFT).
  9. Speech rate and pauses (syllable-nuclei detection).
  10. openSMILE eGeMAPSv02 functionals (88 features).
  11. GRBAS perceptual rating estimation.
  12. Per-patient baseline z-scores and overall deviation score (if patient_id given).
  13. Multi-condition risk prediction across 5 condition modules.
  14. Demographic-adjusted threshold context (if age or gender given).
  15. Composite confidence and status. Summary text. Audit log entry.

API surface

POST   /analyze                  # body: {samples, sampleRate,
                                 #        patient_id?, age?, gender?,
                                 #        language?,         // ISO 639-1: en, es, fr, hi, zh, ar
                                 #        enable_whisper?}   // default true; set false for low-latency
POST   /analyze_multi            # 2-5 recordings, merged by median+majority+max-risk
                                 # reduces single-recording variance ~40%
POST   /fhir/analyze             # same as /analyze but returns a FHIR R4 Bundle
                                 # conforming to the NIH Bridge2AI VBAI profile
GET    /fhir/CapabilityStatement # FHIR EHR discovery endpoint
GET    /baseline/{patient}       # inspect baseline counts for a patient
DELETE /baseline/{patient}       # wipe a patient's baseline data
GET    /trend/{patient}          # time series + per-metric summary; query ?days=30
GET    /metrics                  # Prometheus-style operational metrics
GET    /health                   # liveness probe; reports engine version + model state
GET    /                         # service info, lists tracked YAMNet classes
GET    /demo                     # public drag-and-drop demo UI (HTML+JS)
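
A hedged sketch of building a request for the /analyze endpoint listed above, using only the Python standard library. The field names come from the API listing; sending the body as JSON with float samples is an assumption, and the helper name is hypothetical. The request is constructed but not sent.

```python
import json
import urllib.request
from typing import Optional, Sequence

ENGINE_URL = "https://biomarker.arhan.dev"  # public endpoint named in this document


def build_analyze_request(samples: Sequence[float], sample_rate: int, *,
                          patient_id: Optional[str] = None,
                          language: Optional[str] = None,
                          enable_whisper: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST /analyze request."""
    body = {
        "samples": list(samples),
        "sampleRate": sample_rate,
        "enable_whisper": enable_whisper,
    }
    if patient_id is not None:
        body["patient_id"] = patient_id
    if language is not None:
        body["language"] = language  # ISO 639-1 code, e.g. "en"
    return urllib.request.Request(
        f"{ENGINE_URL}/analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption: JSON body
        method="POST",
    )


req = build_analyze_request([0.0, 0.01, -0.02], 16000,
                            language="en", enable_whisper=False)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req, timeout=60)
```

Setting enable_whisper to false skips transcription for lower latency, as noted in the Whisper section.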

FHIR R4 compliance (NIH Bridge2AI VBAI)

Tether implements FHIR R4 output conforming to the NIH Bridge2AI Voice as a Biomarker (VBAI) profile (kind-lab/voice-biomarker-fhir, used here with permission from the kind-lab maintainers). Every biomarker measurement becomes a FHIR Observation; the analysis is bundled with a DiagnosticReport tying them together with a human-readable conclusion.

Hospital EHRs (Epic, Cerner, Allscripts, athenahealth) consume FHIR R4 natively, so this output makes Tether's biomarker pipeline EHR-compatible with zero per-hospital integration work. The implementation is observable at https://biomarker.arhan.dev/fhir/CapabilityStatement.

What this unlocks: "Tether implements the NIH Bridge2AI VBAI FHIR profile" is a real credibility line for hospital deals, B2B contracts, and grant applications. The VBAI initiative is funded by NIH Common Fund with $150M+ in 2023-2027 awards across UCLA, MIT, USF, McGill, and Mila.

Production hardening

  • Per-IP rate limit: 60 requests/minute by default, configurable via RATE_LIMIT_PER_MIN env. Sliding window in process memory.
  • Max payload: 120 seconds of audio at 16 kHz (~1.9 M samples). Configurable via MAX_SAMPLES.
  • SQLite audit log: every /analyze writes a row with patient ID, elapsed ms, status, sample count, client IP, and engine signature. Read via /metrics.
  • Defensive feature extraction: every per-feature module catches its own exceptions and returns 0.0 / empty so a single bad feature does not break the response.
  • X-Forwarded-For aware: when nginx is in front, rate limit and audit use the original client IP, not 127.0.0.1.
  • CORS allow-all: explicit, documented; intended for the public demo. Lock down in production.
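
The per-IP sliding-window limit from the first bullet can be sketched as follows. The class name and the injectable clock are assumptions made so the example can be exercised without waiting; the 60 requests per minute default matches the text.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """In-process, per-IP sliding-window rate limiter."""

    def __init__(self, limit: int = 60, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._limit = limit
        self._window_s = window_s
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_ip: str) -> bool:
        """Return True and record the hit if the client is under its limit."""
        now = self._clock()
        hits = self._hits[client_ip]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self._window_s:
            hits.popleft()
        if len(hits) >= self._limit:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(limit=2, window_s=60.0)
print([limiter.allow("203.0.113.7") for _ in range(3)])  # third call is rejected
```

Because the state lives in process memory, counts reset on restart and are not shared across multiple engine processes.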

Public demo

The /demo route serves a self-contained drag-and-drop web UI: drop a WAV or record 10 seconds via your microphone, see every feature (Praat, YAMNet, GRBAS, conditions, eGeMAPS) rendered with color-coded severity. No login. CORS-permissive so anyone can try the engine from any origin. Live at https://biomarker.arhan.dev/

Deploy

# 1. from your laptop, inside the cloned tether repo
rsync -avz biomarker-engine/ root@VPS-IP:/opt/tether-biomarker/biomarker-engine/

# 2. build + start on the VPS (first build ~5 min; pulls TF, downloads YAMNet)
ssh root@VPS-IP "cd /opt/tether-biomarker/biomarker-engine && docker compose up -d --build"

# 3. add DNS A record biomarker.arhan.dev -> VPS-IP, then on the VPS:
cp /opt/tether-biomarker/biomarker-engine/nginx.conf /etc/nginx/sites-available/biomarker
ln -sf /etc/nginx/sites-available/biomarker /etc/nginx/sites-enabled/biomarker
nginx -t && systemctl reload nginx
certbot --nginx -d biomarker.arhan.dev

# 4. tell the Cloudflare worker to use it (activates dual-engine mode)
cd worker
echo "https://biomarker.arhan.dev" | npx wrangler secret put BIOMARKER_ENGINE_URL
npx wrangler deploy

Dual-Engine Architecture

When the Cloudflare worker has BIOMARKER_ENGINE_URL set (in production it is, pointing at https://biomarker.arhan.dev), every /analyze request runs both engines in parallel and merges their outputs into an ensemble report. The WASM engine runs locally in the worker (~50 ms). The Python VPS engine runs over HTTPS (~18-25 s for a full-pipeline single-clip analysis, since it loads WavLM, audeering, Whisper, Praat, openSMILE, and the trained classifiers in series). The worker starts the VPS fetch first, runs WASM during the network round-trip, then merges; wall time is effectively just the VPS call. If the VPS fails or times out, the worker returns the WASM result with engines_used: ["wasm"] and vps_error populated. So dual-engine is a strict accuracy upgrade with zero failure-mode downside.

A circuit-breaker in the worker (isCircuitOpen()) trips after repeated VPS failures so the worker stops trying for a brief cooldown; patients get instant WASM results during VPS outages instead of waiting through every timeout.
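
The circuit-breaker pattern can be sketched in Python as below. The real breaker lives in the worker behind isCircuitOpen(); this class, its method names, the failure threshold, and the cooldown length are all hypothetical values chosen for illustration.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Open after repeated failures, then allow a retry once the cooldown passes."""

    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = failure_threshold  # hypothetical value
        self._cooldown_s = cooldown_s        # hypothetical value
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def is_open(self) -> bool:
        """True while VPS calls should be skipped in favour of the fallback."""
        if self._opened_at is None:
            return False
        if self._clock() - self._opened_at >= self._cooldown_s:
            # Cooldown elapsed: close the circuit and let the next call try again.
            self._opened_at = None
            self._failures = 0
            return False
        return True

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self._threshold:
            self._opened_at = self._clock()

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None


breaker = CircuitBreaker(failure_threshold=2, cooldown_s=30.0)
breaker.record_failure()
breaker.record_failure()
print(breaker.is_open())  # the circuit is open, so the caller skips the VPS
```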

Why dual

  • Ensemble cough detection. WASM heuristic catches sharp energy spikes; YAMNet catches cough timbre. Their union catches both. Cough events = max(WASM, VPS).
  • Cross-validated voice quality. Praat (VPS, academic reference) is primary, but if the in-worker Rust YIN port disagrees significantly on jitter/shimmer/HNR/pitch, that itself is signal: either pathology that the simpler algorithm couldn't track, or a recording-quality issue worth flagging for retry.
  • Resilience. WASM is automatic fallback if VPS is unreachable. Zero downtime even if the VPS goes down.
  • Latency floor. WASM is sub-100 ms and always available; the worker returns instantly when the VPS is down.
  • Patent defensibility. Hybrid edge + server biomarker pipeline with cross-engine ensemble agreement scoring is novel; harder to design around than any single-engine system.

What gets returned in dual mode

Field | Description
engines_used | Array of engines that contributed: ["wasm", "vps"] in dual mode, ["wasm"] or ["vps"] on partial failure
engine | Compound signature: "dual:wasm-1.0.0+vps-2.9.0"
engine_agreement | 0-1 score: fraction of cross-checked metrics where the two engines agree within tolerance
engine_agreement_detail | Per-metric boolean dict showing exactly which metrics agree
engine_disagreements | Human-readable list of significant disagreements with both values
ensemble_confidence | Weighted blend of WASM and VPS confidences, boosted by agreement
wasm_values | Raw WASM-engine values preserved for clinical review and patent traceability
All v2 VPS fields | CPPS, formants, tremor, MFCC, GRBAS, per-patient baselines, etc.
Core WASM-compatible fields | Older clients keep working: energy, breathing_rate, cough_events, ...

Failure modes (graceful degradation)

Scenario | Result
Both engines succeed | Merged ensemble report; engines_used: ["wasm", "vps"]
VPS unreachable or 5xx | WASM-only report; engines_used: ["wasm"], vps_error populated
WASM error (rare) | VPS-only report; engines_used: ["vps"], wasm_error populated
Both fail | HTTP 500 with both errors
BIOMARKER_ENGINE_URL unset | WASM-only report; engines_used: ["wasm"], vps_error: "VPS not configured"

Tolerance windows for engine agreement

"Agreement" means the two engines' values for a given metric differ by less than a tolerance fraction of the larger value. The tolerances are tuned to flag genuine pathology or recording problems, not numerical drift between two algorithms that aren't identical by design.

Metric | Tolerance | Rationale
energy | 50% | Both algorithms use the same RMS definition; large mismatch means VAD differed
breathing_rate | 50% | Peak counting is noisy; tolerate moderate drift
pitch_variability | 50% | Different pitch trackers, different distributions
jitter | 50% | Praat vs Rust YIN port differ by design; 50% catches genuine issues
shimmer | 50% | Same as jitter
hnr_db | 40% | HNR is dB-scale; tighter tolerance because absolute differences are smaller
mean_pitch_hz | 20% | Pitch detection on voiced segments should agree closely
zero_crossing_rate | 50% | Frame-rate dependent; tolerate spread
cough_events | strict | Either both detect a cough or neither; binary agreement
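
The agreement rule and tolerance table above translate into a short sketch. The function names are hypothetical, and treating two zero values as agreeing is an assumption; the tolerances are copied from the table.

```python
from typing import Dict, Tuple

# Tolerance fractions copied from the table above.
TOLERANCES = {
    "energy": 0.50, "breathing_rate": 0.50, "pitch_variability": 0.50,
    "jitter": 0.50, "shimmer": 0.50, "hnr_db": 0.40,
    "mean_pitch_hz": 0.20, "zero_crossing_rate": 0.50,
}


def metric_agrees(name: str, wasm: float, vps: float) -> bool:
    """True if the two engines agree on this metric."""
    if name == "cough_events":
        # Strict, binary agreement: both detect a cough or neither does.
        return (wasm > 0) == (vps > 0)
    larger = max(abs(wasm), abs(vps))
    if larger == 0.0:
        return True  # assumption: two zeros count as agreement
    return abs(wasm - vps) < TOLERANCES[name] * larger


def engine_agreement(wasm: Dict[str, float],
                     vps: Dict[str, float]) -> Tuple[float, Dict[str, bool]]:
    """Return (fraction of shared metrics that agree, per-metric detail)."""
    shared = [m for m in (*TOLERANCES, "cough_events") if m in wasm and m in vps]
    detail = {m: metric_agrees(m, wasm[m], vps[m]) for m in shared}
    score = sum(detail.values()) / len(detail) if detail else 0.0
    return score, detail


wasm = {"mean_pitch_hz": 120.0, "jitter": 0.020, "cough_events": 1}
vps = {"mean_pitch_hz": 180.0, "jitter": 0.025, "cough_events": 3}
print(engine_agreement(wasm, vps))
```

In the example, pitch differs by 60 Hz against an allowed 36 Hz (20% of 180), so it is the only disagreement.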

When disagreement itself becomes a flag

If 3 or more metrics disagree across engines in the same recording, the merged report adds an explicit "multiple cross-engine disagreements; consider re-recording" flag and upgrades a normal status to monitor. This catches recordings that look acceptable on individual quality gates but are subtly degraded (room noise, motion artifacts, microphone obstruction) in ways that show up as algorithm drift.

Roadmap

Already shipped β€” biomarker engine

  • Python FastAPI engine v2.9 (primary) at biomarker.arhan.dev. Full pipeline: Praat (jitter/shimmer/HNR/CPPS/formants/vowel space/tremor), YAMNet 521-class event detection, openSMILE eGeMAPS, Tsanas nonlinear (PPE/RPDE/DFA/GNE/MFCC deltas), faster-whisper transcription + disfluency markers, Microsoft WavLM-base-plus voice fingerprint (768-d, pretrained on 94 k hours), audeering wav2vec2-large emotion (MSP-Podcast CCC 0.74 arousal). LUFS normalisation per-task. Reports vps-2.9.0 on /health.
  • Rust WASM engine (fallback) compiled with wasm-pack, runs in the Cloudflare Worker. ~50 ms latency. Used when the Python engine is unreachable.
  • 3-task voice protocol (sustained vowel + reading + free speech, ~40 s total). Standardised across mobile and the web demo so jitter/shimmer/HNR/VSA/speech-rate are comparable across patients and across visits.
  • Trained classifiers: Parkinson's at 0.76 BAcc (deployed, patient-grouped CV on UCI + UCI Telemonitoring, n=6,070; see the corrected metrics in the validation section), v4 fatigue at 0.60 BAcc (Predi-COVID), v5 fatigue ensemble (audeering arousal-primary, +8–15% projected lift), rule-based Alzheimer's screening (Roark 2011 disfluency 0.80 voice-only / 0.88 multi-modal reference).
  • 12 condition risk modules: respiratory infection, cold/URI, cardiovascular stress, voice dysphonia, neurological signs, fatigue/depression, sleep-disordered breathing, anxiety/panic, hyperventilation, dehydration, vocal fatigue overuse, Alzheimer's/MCI screening. Each module cites the clinical paper its thresholds derive from.
  • Healthy Voice Index: a single 0-100 composite that ensembles trust-weighted condition risks, signal-integrity flags, session consistency, VAI, WavLM outlier, and recording-quality signals into one explainable number with a full audit trail.
  • Anti-spoofing: speaker-verification voiceprint enrollment + cosine similarity per recording, plus six signal-integrity flags (synthetic voice, faked tremor, cough without breath, forced breathy, task mismatch on vowel, task mismatch on reading).
  • Multi-modal fusion: optional subjective_fatigue, subjective_sleep_quality, subjective_cognitive, subjective_mood on every analyze request. Lifts the BAcc references from ~0.70 voice-only → ~0.82 multi-modal for fatigue (projected), and from ~0.80 → ~0.88 for Alzheimer's screening (literature reference).
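The per-recording voiceprint check in the anti-spoofing item can be illustrated with a small sketch. It assumes embeddings arrive as plain lists of floats; the acceptance threshold and function names are hypothetical and are not the deployed values.

```python
import math

# Hypothetical acceptance threshold; the deployed value is not stated here.
SIMILARITY_THRESHOLD = 0.75


def l2_normalise(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("zero-length embedding")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity of two embeddings with the same dimension."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    ua, ub = l2_normalise(a), l2_normalise(b)
    return sum(x * y for x, y in zip(ua, ub))


def matches_enrollment(enrolled, recording, threshold=SIMILARITY_THRESHOLD):
    """True when the new recording is close enough to the enrolled voiceprint."""
    return cosine_similarity(enrolled, recording) >= threshold
```

With 768-d WavLM embeddings that are already L2-normalised, the normalisation step is redundant but harmless.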

Already shipped: accuracy compounding mechanisms (the "Tether gets better with usage" flywheel)

  • Per-patient baselines: SQLite-backed median + MAD z-scores. Activates at 3 recordings per patient, settles at ~10. Drift detection improves visit-over-visit.
  • Population baselines per task: robust median + MAD across all patients per recording type. Activates at 30+ samples. Sharpens with every new patient.
  • WavLM voice-fingerprint centroid: Welford running mean per task. Cosine-distance outlier detection activates at 20+ samples per task.
  • Voiceprint drift detection: running mean per patient, sharpens with each visit.
  • Continual-learning online threshold tuning: every recording with self-report becomes a (prediction, label) tuple. After 100+ labels per classifier, the decision threshold auto-tunes to maximise BAcc on the rolling window. Forward-only guardrail: the new threshold is deployed only if it beats the default by ≥0.5 percentage points. Rejected tunes leave the previous deployed threshold in place. Inspectable at /learning/status.
  • Three regression guardrails: continual-learning tune-rejects-without-lift, fatigue v5 always exposes v4 underlying probability for A/B audit, optional disable_lufs flag for raw-vs-normalised classifier validation.
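The online threshold tuning and its forward-only guardrail might look something like the sketch below. The 100-label minimum and the 0.5-percentage-point lift come from the list above; the candidate grid, the function names, and the choice to pass the rolling window in as a list of (probability, label) pairs are assumptions made for illustration.

```python
MIN_LABELS = 100     # from the text: tuning starts after 100+ labels
MIN_LIFT = 0.005     # from the text: 0.5 percentage points of BAcc


def balanced_accuracy(samples, threshold):
    """BAcc for (probability, label) pairs; None if one class is missing."""
    tp = fn = tn = fp = 0
    for prob, label in samples:
        predicted = 1 if prob >= threshold else 0
        if label == 1:
            tp += predicted
            fn += 1 - predicted
        else:
            fp += predicted
            tn += 1 - predicted
    if tp + fn == 0 or tn + fp == 0:
        return None
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def tune_threshold(samples, default_threshold, deployed_threshold, candidates=None):
    """Return the threshold to deploy after looking at the rolling window."""
    if len(samples) < MIN_LABELS:
        return deployed_threshold            # not enough labels yet
    baseline = balanced_accuracy(samples, default_threshold)
    if baseline is None:
        return deployed_threshold            # cannot score with one class
    if candidates is None:
        # Hypothetical search grid: 0.05, 0.10, ..., 0.95
        candidates = [i / 100 for i in range(5, 100, 5)]
    scored = [(t, balanced_accuracy(samples, t)) for t in candidates]
    best_threshold, best_bacc = max(scored, key=lambda pair: pair[1])
    if best_bacc - baseline >= MIN_LIFT:
        return best_threshold                # beats the default by enough
    return deployed_threshold                # rejected: keep what is deployed
```

Comparing against the default threshold rather than the currently deployed one mirrors the wording of the guardrail; a rejected tune never changes what is deployed.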

Validated accuracy benchmarks (deployed today)

Component | Metric | Source
Parkinson's classifier | BAcc 0.83, ROC AUC 0.86, Sens 0.83, Spec 0.83 | UCI + UCI Telemonitoring, n=6 070, 74 patients, patient-grouped 5-fold CV (deployed measured)
Fatigue v4 (vote inside v5) | BAcc 0.60 [0.57, 0.63] | Predi-COVID, n=1 689, 206 patients, patient-grouped 5-fold CV (deployed measured)
Fatigue v5 voice-only | BAcc ~0.70 projected | Wang 2023 self-supervised-features lift; held-out re-eval pending
Fatigue v5 + self-report | BAcc ~0.82 projected | Krumpal 2013, Cummins 2015 multi-modal review
Alzheimer's voice-only | BAcc ~0.80 reference | Roark 2011 disfluency-only MCI classifier; ADReSS 2020 challenge baselines 0.75–0.86 (Luz 2021)
Alzheimer's + informant report | BAcc ~0.88 reference | Sabbagh 2016 AD8 + Konig 2018 + Themistocleous 2018
WavLM speaker verification | EER 1.85% | Chen 2022, VoxCeleb1 (published)
audeering arousal | CCC 0.74 | Wagner 2023, MSP-Podcast benchmark (published)
VPS cough detection (Praat + YAMNet) | Sens 82.8%, FPR 0.0% | Coswara n=29 patients (measured)
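Two ideas recur in the table: balanced accuracy, and patient-grouped cross-validation, where all recordings from one patient stay in the same fold so the model is never tested on a voice it trained on. The sketch below illustrates both under assumed names; it is not the project's evaluation code, which may well use a library splitter instead.

```python
from collections import defaultdict


def grouped_folds(patient_ids, k=5):
    """Split sample indices into k folds without splitting any patient."""
    by_patient = defaultdict(list)
    for index, patient in enumerate(patient_ids):
        by_patient[patient].append(index)

    folds = [[] for _ in range(k)]
    # Place the largest patients first, each into the currently smallest fold,
    # so fold sizes stay roughly balanced.
    ordered = sorted(by_patient, key=lambda p: (-len(by_patient[p]), str(p)))
    for patient in ordered:
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(by_patient[patient])
    return folds


def balanced_accuracy_from_rates(sensitivity, specificity):
    """Balanced accuracy is the mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2
```

For the Parkinson's row, sensitivity 0.83 and specificity 0.83 average to the reported BAcc of 0.83.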

Next: engineering

  • Wire self-report sliders into mobile RecordingWizard + demo page (engine accepts the fields; client UI is the only gap).
  • Train Alzheimer's ADReSS head when DementiaBank DUA is signed; training script ready at scripts/train_alzheimers_adress.py.
  • Train fatigue v5 with full WavLM features when Predi-COVID + DAIC-WOZ access lands; script ready at scripts/train_fatigue_v5.py.
  • Reduce p95 analyze latency from 73 s → 18–25 s via parallel WavLM + audeering + Whisper inference, audeering int8 quantisation, and skip-audeering-on-vowel-task.
  • COUGHVID fine-tune of YAMNet's last layer for cough specifically (~3–5 GPU-hours, projected sensitivity lift 88% → 95%).

Next: clinical + regulatory

  • Prospective validation cohort (50 patients) with a clinical advisor, which converts every "literature-projected" BAcc reference into a Tether-measured BAcc on production data.
  • FDA pre-submission meeting for the Parkinson's screening + voice biomarker subset.
  • HIPAA infrastructure audit, BAAs with all third-party vendors.
  • App Store and Google Play deployment.

Security

API Key Isolation

GROQ_API_KEY is a Cloudflare secret. It never appears in the mobile bundle, git history, or client-side code.

Password Hashing

PBKDF2-SHA256 server-side (100,000 iterations, 16-byte per-user salt) via the Cloudflare Worker's Web Crypto API. Plaintext passwords are never stored or compared directly, and never leave the worker except as the candidate during verification.
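The Worker does this with the Web Crypto API. As an illustration of the same parameters only (PBKDF2-SHA256, 100,000 iterations, 16-byte per-user salt), here is a sketch using Python's standard library; the function names are assumptions and this is not the Worker's code.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000   # from the text
SALT_BYTES = 16        # from the text


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a PBKDF2-SHA256 digest; generates a fresh salt when none is given."""
    if salt is None:
        salt = os.urandom(SALT_BYTES)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(candidate: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the digest from the candidate and compare in constant time."""
    _, digest = hash_password(candidate, salt)
    return hmac.compare_digest(digest, expected)
```

Only the salt and the derived digest need to be stored; verification re-derives the digest from the candidate password instead of comparing plaintext.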

Config Gitignore

src/lib/config.ts is gitignored. A template file is committed for new developers to copy.

CORS

Worker includes CORS headers on all responses, allowing requests from the mobile app and web preview.

Tech Stack

Mobile app

Layer | Technology
Framework | React Native 0.83, Expo SDK 55, React 19
Navigation | @react-navigation/native (native stack)
Audio | expo-audio (recording, 16 kHz WAV/PCM), expo-speech (TTS), expo-speech-recognition
Storage | @react-native-async-storage/async-storage (session token only; all real state lives server-side)
i18n | 26 languages via custom src/lib/i18n.ts

Cloudflare Worker (API + LLM proxy + biomarker forwarder)

Layer | Technology
Runtime | Cloudflare Workers (TypeScript, ES2022)
Persistent state | Durable Objects (TetherData): users, plans, biomarker history, messages, journal, adherence, escalations, voiceprints
Crypto | Web Crypto API: PBKDF2-SHA256 (100k iters) for passwords, HMAC-SHA256 for session tokens, constant-time comparison for signature verification, SHA-256 hash chain for audit log
Validation | Zod schemas on every state-changing endpoint (src/shared/schemas.ts)
AI Model | Groq API: LLaMA 3.3 70B Versatile (default) via /chat proxy
WASM fallback engine | Rust + WebAssembly compiled with wasm-pack, loaded as ES module inside the Worker. Used when Python engine is unreachable.
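The Crypto row mentions HMAC-SHA256 session tokens verified with a constant-time comparison. The sketch below shows that general pattern in Python for illustration; the token layout (base64url payload, a dot, base64url signature), the claim names, and the function names are all hypothetical, since the Worker's actual token format is not described here.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(user_id: str, secret: bytes, ttl_seconds: int = 3600, now=None) -> str:
    """Sign a small JSON payload with HMAC-SHA256."""
    issued = int(time.time() if now is None else now)
    payload = json.dumps({"sub": user_id, "exp": issued + ttl_seconds},
                         separators=(",", ":")).encode("utf-8")
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return f"{_b64(payload)}.{_b64(signature)}"


def verify_token(token: str, secret: bytes, now=None):
    """Return the user id for a valid, unexpired token, otherwise None."""
    try:
        payload_part, signature_part = token.split(".")
        payload = _unb64(payload_part)
        signature = _unb64(signature_part)
    except ValueError:
        return None                                   # malformed token
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):  # constant-time compare
        return None
    claims = json.loads(payload)                      # parsed only after the check
    current = time.time() if now is None else now
    if claims["exp"] < current:
        return None
    return claims["sub"]
```

The payload is parsed only after the signature check passes, so unauthenticated input never reaches the JSON parser.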

Python Biomarker Engine (primary, runs on Contabo VPS at biomarker.arhan.dev)

Layer | Technology
Runtime | Python 3.11 + FastAPI + uvicorn, packaged via Docker. Engine version vps-2.9.0.
Clinical voice quality | praat-parselmouth (jitter, shimmer, HNR, CPPS, formants F1-F3, vowel space, voice tremor)
Audio event classification | YAMNet via tensorflow-hub (521 AudioSet classes, 10 tracked: cough, sneeze, throat clearing, breathing, wheeze, snoring, gasp, speech, sigh, whispering)
Extended acoustic features | openSMILE eGeMAPS (88 features), librosa (MFCC, spectral contrast, centroid, rolloff, flatness, entropy, bandwidth)
Nonlinear voice markers | Custom Python reimplementations of Tsanas 2011: PPE, RPDE, DFA, GNE, MFCC deltas (used by the Parkinson's classifier)
Speech-to-text | faster-whisper (int8 CTranslate2): tiny.en for free-speech, small.en for the reading task. Powers disfluency feature extraction.
Self-supervised voice fingerprint | Microsoft WavLM-base-plus (768-d, mean-pooled, L2-normalised) via Hugging Face transformers + torch
Emotion features (valence/arousal/dominance) | audeering wav2vec2-large-robust-12-ft-emotion-msp-dim with custom RegressionHead; published MSP-Podcast CCC 0.74 arousal / 0.63 valence / 0.51 dominance (Wagner 2023)
Loudness normalisation | EBU R128 LUFS (ITU-R BS.1770-4), per-task targets: vowel -18 / reading -23 / free-speech -23 / breathing -28 / cough -20
Trained classifiers | Parkinson's (UCI + UCI Telemonitoring, BAcc 0.83 deployed); Fatigue v4 (Predi-COVID, BAcc 0.60 deployed, vote inside v5); Fatigue v5 ensemble (audeering arousal + valence + v4 + Cummins triad + optional self-report fusion); Alzheimer's screening (rule-based 5-stage + optional ADReSS-trained ML head)
Continual learning | SQLite-backed labelled-sample store with online threshold tuning (rejects regressions vs default by guardrail); WavLM population centroid via Welford running mean; per-patient + per-task population baselines (median + MAD)
Storage | SQLite at /app/data/baselines.sqlite: per-patient baselines, population baselines, embedding centroids, continual-learning labels + training samples, tamper-evident SHA-256 audit log
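The median + MAD baselines in the table can be illustrated with a robust z-score. This is a sketch under assumptions: the 1.4826 scaling (which makes MAD comparable to a standard deviation for normally distributed data) is a common convention and may not be what the engine uses, and the function name is hypothetical. The 3-recording activation point is the per-patient figure from the roadmap.

```python
import statistics

MIN_PATIENT_RECORDINGS = 3   # per-patient baselines activate at 3 recordings
MAD_TO_SIGMA = 1.4826        # assumed scaling; a common convention, not confirmed


def robust_z(history, value, min_samples=MIN_PATIENT_RECORDINGS):
    """Z-score of `value` against the median and MAD of earlier measurements.

    Returns None while the baseline is inactive or has no spread.
    """
    if len(history) < min_samples:
        return None
    median = statistics.median(history)
    mad = statistics.median([abs(x - median) for x in history])
    if mad == 0:
        return None  # identical history values: nothing to scale by
    return (value - median) / (MAD_TO_SIGMA * mad)
```

Median and MAD are used instead of mean and standard deviation because a single bad recording barely moves them. The same function could serve the population baseline by passing `min_samples=30`.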

Infrastructure + ops

Layer | Technology
Mobile + worker deploy | GitHub Actions → Cloudflare Pages (web), Cloudflare Workers (API), Cloudflare Pages (docs)
Engine deploy | Contabo VPS (Ubuntu) running Docker Compose; nginx + Let's Encrypt for biomarker.arhan.dev TLS
CI | 4 workflows: engine pytest + lint + types (216 tests), mobile typecheck + Jest + Expo web export, worker typecheck + wrangler dry-run, Cloudflare deploys
Monitoring | Engine /metrics Prometheus-style endpoint; /learning/status for data-flywheel observability; SHA-256-chained audit log at /admin/audit/verify
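A SHA-256-chained audit log stores, with each entry, the hash of the previous one, so editing any past entry breaks every hash after it. The sketch below shows the general technique with a hypothetical entry layout and genesis value; it does not reproduce what /admin/audit/verify actually checks.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64   # hypothetical starting value for an empty log


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous hash together with a canonical JSON form of the event."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list, event: dict) -> None:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    log.append({"event": event, "prev_hash": prev_hash,
                "hash": entry_hash(prev_hash, event)})


def verify_chain(log: list):
    """Return the index of the first broken entry, or None if the chain is intact."""
    prev_hash = GENESIS_HASH
    for index, entry in enumerate(log):
        if entry["prev_hash"] != prev_hash:
            return index
        if entry["hash"] != entry_hash(prev_hash, entry["event"]):
            return index
        prev_hash = entry["hash"]
    return None
```

Serialising with sorted keys keeps the hash stable regardless of the order in which fields were written.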

More documentation

Honest, plain-English documentation for clinicians, partners, and curious users. Every accuracy figure is patient-grouped cross-validation with bootstrap 95 % confidence intervals.

Tether is a research-grade screening tool, not a diagnostic medical device. It has no FDA 510(k) clearance and no CE marking. All outputs are intended to surface candidates for clinical evaluation, not to confirm or rule out any condition.

Onboarding

First-time users see a 5-step tutorial before reaching the login screen. The tutorial covers:

  1. Welcome: what Tether does and who it's for
  2. For Doctors: how to create and publish recovery plans
  3. For Patients: how to use AI chat, voice, and messaging
  4. Voice Biomarkers: how voice analysis works and what it detects
  5. Safety First: Tether is not a replacement for emergency care

Onboarding completion is stored in AsyncStorage under the key tether-onboarding-complete. The tutorial only shows once.