Tether Docs

Plan-Grounded AI Chat

Patients ask questions in plain language. The AI only answers from the doctor's published plan — never guesses.

Voice Biomarkers

Python FastAPI engine runs the 3-task voice protocol — Praat clinical voice quality, YAMNet event detection, WavLM voice fingerprint, audeering emotion, Whisper transcription, plus 12 condition risk modules including Parkinson's, fatigue, and Alzheimer's screening. Rust WASM fallback in the Cloudflare Worker.

Engine Connection

Biomarker results feed into AI context — so "How is my breathing?" gets a real, data-backed answer.

Red Flag Escalation

When the AI detects a red flag symptom, it marks the response urgent and suggests contacting the care team.

Protocol Library

One-click templates for pneumonia, heart failure, COPD, post-surgical recovery, and type-2 diabetes — each with medications, daily steps, and red flags pre-filled. Doctors load a template, edit, publish.

Caregiver Portal

Family members and trusted contacts get a read-only dashboard of the patient's plan, latest biomarkers, and 7-day medication adherence. Patients add caregivers by email — full opt-in consent.

In Plain English

The problem

Roughly one in five patients sent home from a hospital ends up back in the emergency room within 30 days. The main reasons are not surprising: people forget medication doses, miss the early signs that things are going wrong, or do not know whether a symptom is normal recovery or a real warning. Doctors give a printed discharge summary, but it sits in a drawer. Family caregivers want to help but rarely have visibility.

What Tether does

Tether is three things in one app:

A pocket version of the discharge plan. The doctor writes the plan once. The patient sees daily medications, daily activities, red-flag symptoms to watch for, and follow-up dates. Everything is in plain language and can be read out loud in their language.
An AI assistant that only knows the doctor's plan. The patient asks "Can I take Tylenol?" or "Should I worry about this chest pain?" and the assistant answers using only what is in the plan. It never makes up medical advice. If the question matches a red-flag symptom the doctor listed, the answer is marked urgent.
A voice check that listens for trouble. The patient runs a ~40-second 3-task voice protocol (sustained "ahhh", a fixed reading passage, then a short symptom check-in). A Python FastAPI biomarker engine on a private VPS (biomarker.avlynor.com) measures breathing rate, cough patterns, voice fatigue, and clinical voice-quality markers (jitter, shimmer, HNR, CPPS, formants, plus 100+ other features). A Rust→WebAssembly fallback compiled into the Cloudflare Worker keeps the app working if the main engine is unreachable. Numbers track over time, so a small change today against the patient's own baseline can flag a problem the patient does not notice.

Who is in the loop

Patients get the daily guidance and assistant chat.
Doctors see a recovery score dashboard sorted by risk, plus the patient's biomarker trends and adherence history.
Family caregivers get a read-only dashboard if the patient invites them by email. They can see the plan, the last few biomarker readings, and which medication doses were taken.

Why it works without violating privacy

Voice recordings leave the device only as raw PCM audio sent to the analysis endpoint, and they are not stored after analysis. Only the numerical biomarker results are saved. The chat AI runs through a Cloudflare Worker that never sees the patient's account in raw form. Passwords are hashed server-side with PBKDF2-SHA256 (100k iterations, per-user salt) and never persist as plaintext anywhere in the system. Caregiver access is opt-in and revocable by the patient.

What it does not do

It is not a replacement for a doctor. The assistant cannot prescribe, diagnose, or give advice outside the published plan.
It is not a medical device. Current biomarker accuracy is good enough to spot trends and prompt human review, not to make standalone clinical decisions.
It is not HIPAA-compliant yet. The technical foundation is in place (encryption, no PHI in logs), but the compliance audit and BAAs are part of the funded roadmap.

How It Works

Tether keeps patients and doctors connected after a hospital discharge. Here is the simple version:

1. The doctor creates a recovery plan

Before the patient leaves the hospital, their doctor opens Tether and fills in a personalized care plan: diagnosis, medications, daily instructions, warning signs to watch for, and a follow-up date. The doctor also picks a communication tone (calm, direct, or reassuring) so the app speaks the way the patient is most comfortable with.

2. The patient gets a personal AI companion

When the patient logs in, they see their plan and can ask questions in plain language — by typing or speaking. The AI only answers using information from the doctor's plan, never guessing or making things up. Every response includes a readability score so caregivers can verify the language is easy enough to understand.
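As a rough illustration of the readability score, the sketch below computes a Flesch-Kincaid grade level (the metric named in the feature list). The syllable counter is a crude vowel-group heuristic chosen for brevity, and the function names are hypothetical, so treat the output as approximate.

```python
import re

def count_syllables(word: str) -> int:
    """Crude heuristic: one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level; returns 0.0 for empty input."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

Shorter sentences and shorter words lower the grade, which is the property a caregiver-facing badge would rely on.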

3. Voice biomarkers track recovery

The patient runs a quick standardised 3-task voice protocol (~40 seconds): hold "ahhh" for 5 seconds, read a fixed sentence from the Rainbow Passage, then a 10-second symptom check-in. Tether's Python FastAPI engine at biomarker.avlynor.com analyses the audio for breathing rate, cough patterns, vocal energy, voice tremor, articulation precision, and signs of twelve different conditions: respiratory infection, common cold, cardiovascular stress, voice pathology, neurological signs (including Parkinson's-style patterns), fatigue, sleep-disordered breathing, anxiety/panic, hyperventilation, mild dehydration, vocal overuse, and Alzheimer's / MCI screening. The engine also produces a single 0-100 Healthy Voice Index headline number, estimates voice age, and computes cross-task contrasts that no single clip can capture. Patients see results in seconds; doctors see the full audit trail. These biomarkers are tracked over time so the doctor can spot trends without an in-person visit.

4. The two engines talk to each other

This is what makes Tether different. The voice biomarker results are automatically shared with the AI companion. So if the patient asks "How is my breathing?", the AI already knows the latest voice check showed an elevated breathing rate and can give a relevant, grounded answer — not a generic one.

5. The app says "I don't know" when it isn't sure

Every Tether prediction comes with a confidence label. When the model is between "definitely positive" and "definitely negative" — what statisticians call the inconclusive band — the app says so directly: "We couldn't tell from this recording — try a longer sample in a quieter room." Most voice biomarker tools force a yes/no answer even when they're uncertain; Tether is honest about the gray zone. Clinicians trust models that admit doubt.
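A minimal sketch of an inconclusive band, assuming the model emits a probability between 0 and 1. The cut-offs 0.35 and 0.65 and the label strings are hypothetical; the source does not state the real values.

```python
def confidence_label(probability: float, low: float = 0.35, high: float = 0.65) -> str:
    """Map a model probability to a three-way label with a gray zone."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if probability >= high:
        return "positive"
    if probability <= low:
        return "negative"
    return "inconclusive"  # between the two cut-offs: say so instead of guessing
```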

6. The app catches bad recordings before they confuse anyone

Before analyzing audio, Tether checks the recording itself — is it too quiet? clipping? mostly silence? too much background noise? If the recording isn't usable, the app tells the patient exactly what went wrong ("too much background noise, find a quieter room") and offers a one-tap retry. No more misleading reports based on a recording that was never going to work. There's also a live microphone level meter while you record so you can see your voice reaching the phone in real time.
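The pre-analysis gate could look roughly like the sketch below, which assumes samples normalised to the range -1 to 1. All thresholds and reason codes other than the idea of a rejection reason are assumptions, and the background-noise check is left out because it needs a signal-to-noise estimate.

```python
import math
from typing import List, Optional

def check_recording(samples: List[float], frame: int = 160) -> Optional[str]:
    """Return a rejection reason code, or None if the clip looks usable."""
    if not samples:
        return "empty"
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    clipped = sum(1 for s in samples if abs(s) >= 0.999) / len(samples)
    frames = [samples[i:i + frame] for i in range(0, len(samples), frame)]
    silent = sum(1 for f in frames if max(abs(s) for s in f) < 0.01) / len(frames)
    if clipped > 0.01:      # more than 1% of samples at full scale
        return "clipping"
    if silent > 0.8:        # more than 80% of frames are near-silent
        return "mostly_silence"
    if rms < 0.01:          # audible in places but very low overall level
        return "too_quiet"
    return None
```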

7. Patients see their voice over time, not just today

Tether's history view shows every previous recording with little trend charts for each measurement (jitter, voice clarity, breathing rate, pitch, energy, and more). Each metric has a direction arrow: green if it's holding steady or improving, amber if it's drifting, red if it's clearly worse than the patient's own baseline. For each tracked condition, a separate trend screen shows the 14-day trajectory of the risk score and flags anything that's been climbing three readings in a row. A snapshot is a parlor trick; a trend is medicine.
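One way to read "climbing three readings in a row" is three consecutive rises, which needs four data points. The sketch below implements that interpretation; the function name is hypothetical.

```python
from typing import List

def climbing_streak(scores: List[float], run: int = 3) -> bool:
    """True if each of the last `run` readings is higher than the one before it."""
    if len(scores) < run + 1:
        return False
    tail = scores[-(run + 1):]
    return all(later > earlier for earlier, later in zip(tail, tail[1:]))
```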

8. The patient tags how they feel, not just how they sound

Right after every recording, Tether pops a quick chip selector — Tired? Headache? Cough? Sore throat? Short of breath? Stressed? Just checking in? Patients tap whatever applies (or skip if they're in a hurry) and the tags are saved alongside the acoustic measurement. The doctor reads voice and symptoms together, which is the only way to interpret either responsibly. This also builds Tether's private dataset over time, which becomes invaluable for future model improvements.

9. Share with any doctor — not just Tether ones

Every biomarker report has a "Share PDF" button that generates a clean printable summary — voice quality measurements, classifier confidence, condition risks, and a clinical disclaimer — and pops the phone's share sheet. Patients can email it to their primary-care physician, attach it to their existing electronic health record, or print it for an in-person visit. PDFs are the universal language of healthcare; every clinic can read one.

10. Humans stay in the loop

If the AI cannot fully answer a question, it suggests the patient message their doctor directly. Doctors see these messages in real time and can reply. The AI never replaces the doctor — it bridges the gap between hospital visits so patients are never left guessing alone.

11. Works in the patient's language

Patients can switch between 26 languages — English, Spanish, Hindi, Mandarin, French, Arabic, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Bengali, Urdu, Tagalog, Swahili, Turkish, Polish, Dutch, Greek, Hebrew, Thai, Indonesian, Punjabi, and Ukrainian. The AI responds and speaks in their chosen language, removing a major barrier to understanding medical instructions after discharge.

12. Built to keep working when something breaks

If the Python biomarker engine on biomarker.avlynor.com has a problem, Tether automatically falls back to the Rust WASM engine compiled into the Cloudflare Worker and keeps working — patients never see a broken app. The fallback returns the same JSON shape so downstream consumers don't need to special-case it. If a single screen crashes, the rest of the app is unaffected; only that screen shows a "try again" message. Every recording is content-fingerprinted so the same audio submitted again returns instantly from cache. Every patient's audit log is cryptographically chained, so any tampering with past records is detectable.
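A cryptographically chained audit log can be sketched as a list where each entry's hash covers the previous entry's hash. The field names and the use of SHA-256 over canonical JSON are assumptions for illustration, not a description of Tether's storage format.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def entry_hash(prev_hash: str, event: Dict) -> str:
    """Hash the previous hash together with a canonical JSON form of the event."""
    payload = prev_hash + json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append(log: List[Dict], event: Dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": entry_hash(prev, event)})

def verify(log: List[Dict]) -> bool:
    """Recompute every link; any edited or removed entry breaks the chain."""
    prev = GENESIS
    for row in log:
        if row["prev"] != prev or row["hash"] != entry_hash(prev, row["event"]):
            return False
        prev = row["hash"]
    return True
```

Editing any past event changes its hash, so every later link stops matching and `verify` returns False.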

13. Real authentication, not just an email field

Every Tether session is an HMAC-signed bearer token issued by the Cloudflare Worker. Passwords are stored with PBKDF2-SHA256, a per-user salt, and 100,000 iterations, so they are never readable, even by us. Login attempts are rate-limited: five failures in five minutes trigger a 15-minute lockout. Old patient-data endpoints used to trust the email in the URL; now every endpoint checks the bearer token, verifies the caller's role, and blocks any patient from reading another patient's data. Doctors see only their assigned patients; caregivers see only patients who have explicitly consented.
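The password scheme described here maps directly onto a standard PBKDF2 call. The sketch below uses Python's standard library to show the parameters (SHA-256, 100,000 iterations, 16-byte salt); the Worker's own implementation is not shown in this document, and the function names are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000
SALT_BYTES = 16

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(SALT_BYTES)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```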

14. Doctor's "needs attention today" queue

Doctors don't have to scroll through every patient to figure out who looks worse. The risk queue ranks patients by a composite score that combines the most recent biomarker status, the deviation trend across the last five recordings, missed medications in the last seven days, unread patient messages, days since last recording, and any distress signals in their journal entries. Each entry shows why the patient is flagged — not just a number, but the actual reasons ("voice deviation rising 22% → 47%", "missed 4 of 7 doses this week").
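A composite score that carries its reasons along could be sketched as follows. Every weight here is hypothetical, the journal distress signal is omitted, and the "of 7 doses" wording assumes one dose per day; only the inputs and the idea of returning reasons come from the description above.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PatientSignals:
    name: str
    biomarker_status: str        # "normal", "monitor", or "alert"
    deviation_trend: List[float]  # deviation % for the last five recordings
    missed_doses_7d: int
    unread_messages: int
    days_since_recording: int

STATUS_POINTS = {"normal": 0.0, "monitor": 20.0, "alert": 40.0}  # hypothetical weights

def risk_entry(p: PatientSignals) -> Tuple[float, List[str]]:
    """Return (score, reasons) so the queue can show why a patient is flagged."""
    score, reasons = STATUS_POINTS[p.biomarker_status], []
    if p.biomarker_status != "normal":
        reasons.append(f"latest biomarker status: {p.biomarker_status}")
    if p.deviation_trend and p.deviation_trend[-1] > p.deviation_trend[0]:
        first, last = p.deviation_trend[0], p.deviation_trend[-1]
        score += min(20.0, last - first)
        reasons.append(f"voice deviation rising {first:.0f}% → {last:.0f}%")
    if p.missed_doses_7d > 0:
        score += 5.0 * p.missed_doses_7d
        reasons.append(f"missed {p.missed_doses_7d} of 7 doses this week")
    if p.unread_messages > 0:
        score += 2.0 * p.unread_messages
        reasons.append(f"{p.unread_messages} unread messages")
    if p.days_since_recording > 3:
        score += float(p.days_since_recording)
        reasons.append(f"{p.days_since_recording} days since last recording")
    return score, reasons

def risk_queue(patients: List[PatientSignals]) -> List[PatientSignals]:
    """Highest-risk patients first."""
    return sorted(patients, key=lambda p: risk_entry(p)[0], reverse=True)
```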

15. Clinical escalations with a real status workflow

When a voice recording crosses into the "alert" band, Tether automatically creates an Escalation row. The doctor moves it through new → reviewed → contacted → resolved (or false_alarm), and every transition adds an optional note plus an audit-log entry. No more "did anyone follow up on Mrs. Garcia's flagged recording from Tuesday?" — the system tracks it.
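The status workflow can be modelled as a small state machine. The status names come from the text above; allowing a jump to false_alarm from any open state is an assumption, as are the field names.

```python
from typing import Dict

ALLOWED = {
    "new": {"reviewed", "false_alarm"},
    "reviewed": {"contacted", "false_alarm"},
    "contacted": {"resolved", "false_alarm"},
    "resolved": set(),      # terminal
    "false_alarm": set(),   # terminal
}

def transition(escalation: Dict, to_status: str, actor: str, note: str = "") -> Dict:
    """Return a new escalation dict with the status moved and an audit entry added."""
    current = escalation["status"]
    if to_status not in ALLOWED[current]:
        raise ValueError(f"cannot move {current} -> {to_status}")
    audit = {"actor": actor, "from": current, "to": to_status, "note": note}
    return {**escalation,
            "status": to_status,
            "audit": escalation.get("audit", []) + [audit]}
```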

16. One timeline, every event

Plans, voice recordings, messages, journal entries, medication taken/missed, escalations created and resolved, and clinician notes are all merged into a single chronological feed per patient. A doctor can answer "what's happened with this patient in the last two weeks?" in one scroll instead of tab-hopping through four screens.

17. Pilot dashboard for hospitals

At the end of a pilot, a hospital can pull a concrete one-screen summary: enrolled patients, weekly recordings, alert rate, average clinician response time, day-30 retention, adherence rate, PDF exports, and a 14-day engagement chart. These are the questions that decide whether a pilot becomes a contract.

18. Invite codes, not shared logins

Doctors generate short human-shareable invite codes (ABCD-EFGH) and send them to a patient via SMS, email, or the patient-handoff sheet at discharge. The patient redeems the code at signup, which automatically links them to the right doctor and the right hospital organization. No more "I emailed them a temporary password and hope they remembered to change it."
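A code in the ABCD-EFGH shape can be generated with a cryptographically secure random source. The alphabet below, which drops look-alike characters, is an assumption; the source only shows the format.

```python
import secrets

# Hypothetical alphabet: omits I, L, O, 0 and 1 to avoid misreads over the phone.
ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ23456789"

def new_invite_code() -> str:
    """Return an 8-character code split into two groups of four."""
    chars = [secrets.choice(ALPHABET) for _ in range(8)]
    return "".join(chars[:4]) + "-" + "".join(chars[4:])
```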

19. Take your data with you, or delete it

Settings → Export my data downloads a complete JSON bundle of everything Tether has stored about you — voice recordings (metadata + scores), journal entries, messages, care plans, audit logs. Settings → Delete my account permanently removes all of it after you type a confirmation phrase. Both are GDPR data-subject rights; both are baked into the product, not handled by emailing support.

20. Every clinically-relevant action is audited

Who viewed which patient's data, who edited a care plan, who created or resolved an escalation, who exported PDFs, every login, every failed login — all stored as an immutable audit log entry tied to the actor and the target. Compliance-grade by default, not "we'll add that later."

21. No LLM key in the device, ever

Earlier builds of Tether shipped a Groq API key in the mobile bundle as a fallback path. That meant the key was extractable in seconds from any installed copy AND a full care plan + biomarker results + journal entries were going straight to a third-party LLM provider from every device, with no audit and no per-user rate limit. The direct path is now deleted. All LLM traffic goes through the Cloudflare Worker, which holds the provider key as a Workers secret, authenticates the caller's bearer token, audits the request, and only then forwards to Groq. If a build can't reach the worker, it falls back to a local rule-based reply generator — never to a network call.

22. Push, email, and SMS notifications

Clinical workflow only works if the right person gets paged at the right time. Tether now dispatches notifications through Expo Push (mobile), Resend (email), and Twilio (SMS) on four real triggers: a biomarker alert wakes the patient's assigned doctors, a new escalation pages whoever it's assigned to, a third missed-meds day in a week sends the patient a gentle reminder (capped at one nudge per week so it never becomes nag-spam), and an unacknowledged escalation past its SLA deadline auto-escalates to the backup clinician pool and then to the org admins.

23. Escalation SLAs with real workflow closure

Hospitals don't track alerts, they track closure. Every escalation now carries a due time (30 min for urgent, 4 hours for alert, 24 hours for monitor), records when a clinician first opens it, and tracks an escalation level. A background sweep promotes overdue ones from "assigned clinician" → "backup clinician pool" → "org admin" and pages each tier in turn. A doctor on vacation no longer means a missed urgent flag.
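The due times and tier promotion described above might be sketched like this. The SLA durations are taken from the text; the field names, tier identifiers, and the choice to restart the clock on promotion are assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict

SLA = {
    "urgent": timedelta(minutes=30),
    "alert": timedelta(hours=4),
    "monitor": timedelta(hours=24),
}
TIERS = ["assigned_clinician", "backup_pool", "org_admin"]

def due_at(created: datetime, severity: str) -> datetime:
    return created + SLA[severity]

def sweep(escalation: Dict, now: datetime) -> Dict:
    """Promote one tier if the escalation is overdue and nobody has opened it."""
    overdue = now > escalation["due_at"] and escalation["first_opened_at"] is None
    level = escalation["level"]
    if overdue and level < len(TIERS) - 1:
        # Assumption: the next tier gets a fresh SLA window of the same length.
        return {**escalation,
                "level": level + 1,
                "due_at": now + SLA[escalation["severity"]]}
    return escalation
```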

24. Building your voice baseline

Tether's trend signals only become meaningful after roughly five recordings — single-sample variance dominates earlier. The patient now sees a "Building your voice baseline: 1/5 recordings complete" progress bar on every recording until the threshold is hit, AND the app holds back alert-level wording on outliers during this window. The data is still collected and the doctor's risk queue still picks it up; we just don't shout at the patient about a possible signal before we have the data to back it up.

25. Recording quality over time

Every accepted AND rejected recording attempt is logged with its quality score and (if rejected) the reason code. A panel under the biomarker card shows the last 20 attempts as a sparkline with pass/fail colours and surfaces the most common rejection reason ("too_noisy" 4× in the last 20). When a patient asks "why did the app keep refusing my recording?" the answer is now one tap away.

26. Real FHIR import (not just counting)

Tether's FHIR R4 import now walks a Bundle and converts four resource types: Patient becomes a stub Tether user (the patient sets their own password on first login), CarePlan becomes a DoctorPlan with conflict-resolution against any existing plan, Observation becomes a biomarker history entry when it's voice-related, and DiagnosticReport becomes a clinician note so the EHR's narrative is preserved. The response reports per-resource counts plus a "skipped" list so hospital integrators can see exactly what landed and what didn't.
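The shape of the import response can be illustrated by walking a Bundle's entry array and tallying resource types. This sketch only counts; the actual conversions and the voice-related filter on Observation are not shown, and the response keys are hypothetical.

```python
from typing import Dict

HANDLED = {"Patient", "CarePlan", "Observation", "DiagnosticReport"}

def summarize_bundle(bundle: Dict) -> Dict:
    """Count handled resource types and list everything that was skipped."""
    counts = {rtype: 0 for rtype in sorted(HANDLED)}
    skipped = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        rtype = resource.get("resourceType")
        if rtype in HANDLED:
            counts[rtype] += 1
        else:
            skipped.append({"resourceType": rtype, "id": resource.get("id")})
    return {"imported": counts, "skipped": skipped}
```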

27. Pilot CSV export for sponsors

Pilots end with a meeting where someone asks "did this actually work?" The pilot analytics screen now has Export CSV and Share Summary buttons that dump enrolled patients, weekly active, alert rate, average clinician response time, day-30 retention, adherence, PDF exports, and the 14-day recordings curve — flat enough to paste into Excel without reformatting.

28. Outcome entry + password reset + multi-tenant admin

A doctor can record readmission, ED visit, follow-up complete, or engagement check-in events directly from the app (these feed the pilot dashboard's retention and outcome metrics). Anyone can reset their password via a tokenized email link with a one-hour expiry — and resetting the password automatically revokes every existing session for that account. Admins can create organisations, add and remove members, and assign roles, so a single Tether deployment can host multiple hospitals without cross-tenant data bleed.

29. Runtime validation on every endpoint

TypeScript types caught the schema problems at our compile boundary, but the worker still accepted whatever JSON.parse produced from a request body. Now every state-changing endpoint runs the body through a Zod schema before any handler logic. A misshapen field returns HTTP 400 with the exact path and reason. A malicious payload — say {password: {$ne: null}} — never reaches the database layer.

30. Standardised 3-task voice protocol

Every patient now walks through the same three tasks every session: hold "ahhh" for 5 seconds, read a fixed sentence from the Rainbow Passage, then a 10-second symptom check-in. Same prompts every patient, every visit. That's the only way to make jitter, shimmer, HNR, vowel-space-area, and speech-rate comparable across patients and across visits — which is what unlocks real population baselines and real drift detection. The protocol is the default action on biomarker.avlynor.com and the only recording flow on the mobile app.

31. Healthy Voice Index — one number anyone can read

The engine produces dozens of numbers; most viewers want one. The Healthy Voice Index is a 0–100 composite that ensembles trust-weighted condition risks, signal integrity, session consistency, vowel articulation index, embedding outlier distance, and recording-quality signals. Bands: excellent / good / fair / concerning / poor. Every contribution is shown in the audit trail so a clinician can see exactly how the score was assembled — no black box. For the 3-task protocol, a separate session-level HVI aggregates the three per-clip scores via median + consistency bonus so a single noisy clip can't tank the headline.
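The session-level aggregation (median plus a consistency bonus) could be sketched as below. The bonus rule is hypothetical; the source states only that a median and a consistency bonus are used.

```python
from statistics import median
from typing import List

def session_hvi(clip_scores: List[float], max_bonus: float = 5.0) -> float:
    """Median of per-clip scores plus a small bonus when the clips agree."""
    if not clip_scores:
        raise ValueError("need at least one clip score")
    spread = max(clip_scores) - min(clip_scores)
    bonus = max(0.0, max_bonus - spread / 4.0)  # hypothetical: shrinks as clips disagree
    return min(100.0, median(clip_scores) + bonus)
```

With scores of 80, 82, and 30, the median keeps the headline at 80 even though one clip scored poorly.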

32. WavLM-base-plus voice embeddings + population centroid

Every recording produces a 768-dimensional voice fingerprint from Microsoft's WavLM-base-plus, a self-supervised speech encoder pretrained on 94,000 hours of speech. The engine maintains a per-task running centroid via Welford's algorithm — every recording sharpens it. Once 20+ recordings exist for a task, every new recording gets a cosine-distance outlier score against the population. This is the data flywheel: every visitor makes the engine smarter for every future visitor, no labels required.
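A running centroid with Welford's update and a cosine-distance outlier score might look like the sketch below. It works on plain lists of any dimension; the class name and the decision to return None before the minimum count is reached are assumptions.

```python
import math
from typing import List, Optional

class RunningCentroid:
    """Per-dimension running mean and variance via Welford's algorithm."""

    def __init__(self, dim: int, min_count: int = 20):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim       # running sum of squared deviations
        self.min_count = min_count

    def update(self, x: List[float]) -> None:
        self.n += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (xi - self.mean[i])

    def variance(self) -> List[float]:
        if self.n < 2:
            return [0.0] * len(self.m2)
        return [m / (self.n - 1) for m in self.m2]

    def outlier_score(self, x: List[float]) -> Optional[float]:
        """Cosine distance to the centroid, or None until enough data exists."""
        if self.n < self.min_count:
            return None
        dot = sum(a * b for a, b in zip(self.mean, x))
        norm = math.sqrt(sum(a * a for a in self.mean)) * math.sqrt(sum(b * b for b in x))
        return None if norm == 0 else 1.0 - dot / norm
```

The update needs only the current mean and count, so no past embeddings have to be kept to refine the centroid.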

33. audeering wav2vec2 emotion model — fatigue v5

Fatigue used to come from a 42-feature engine-native classifier with 0.55 balanced accuracy on Predi-COVID — better than chance but weak. The v5 ensemble swaps the headline signal to audeering's published wav2vec2-large MSP-DIM arousal score (concordance correlation 0.74 on the MSP-Podcast benchmark, Wagner 2023). Valence acts as a depressive-pattern modulator, the v4 classifier still votes (15% weight), and the Cummins 2015 psychomotor slowing triad confirms. Expected lift over v4: +8–15% BAcc per Wang 2023's self-supervised-feature literature.

34. Alzheimer's / MCI screening

Twelfth condition: cognitive impairment screening from spontaneous speech. Five-stage interpretable ensemble — disfluency burden (Roark 2011), lexical impoverishment (Bucks 2000), affective flattening (Themistocleous 2018, Konig 2018), voice quality (Lopez-de-Ipina 2015), and an optional ADReSS-trained ML head when training data lands. Every threshold cites a published clinical paper. Honestly framed as a screening signal, not a diagnosis: a positive flag prompts clinical follow-up (neuro exam, MRI/PET, CSF), not a label.

35. True vowel-space area + Sapir VAI on the reading task

Because the reading task is a fixed sentence, the engine knows exactly which words the patient is saying. It uses Whisper's word-level timestamps to locate the three corner vowels (/æ/ in "act", /ɪ/ in "prism", /oʊ/ in "rainbow"), extracts F1/F2 from each via Praat, and computes the true triangular vowel-space-area plus Sapir's Vowel Articulation Index — the canonical hypokinetic-dysarthria metrics (Skodda 2011, Sapir 2010, Rusz 2013). Shrunken VSA / depressed VAI is the strongest published voice biomarker for Parkinson's-style speech changes.
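Once the three (F1, F2) pairs are available, the triangular vowel-space area is a direct application of the shoelace formula. The sketch below assumes formants in Hz and a hypothetical dictionary layout; formant extraction and the VAI computation are not shown.

```python
from typing import Dict, Tuple

def vowel_space_area(corners: Dict[str, Tuple[float, float]]) -> float:
    """Area in Hz^2 of the triangle formed by three (F1, F2) points."""
    if len(corners) != 3:
        raise ValueError("expected exactly three vowels")
    (x1, y1), (x2, y2), (x3, y3) = corners.values()
    # Shoelace formula for a triangle
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0
```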

36. Cross-task contrasts

When all three protocol tasks are present, the engine computes deltas between them: jitter on vowel minus jitter on reading (laryngeal control under articulatory load), pitch CV on free speech minus pitch CV on reading (spontaneous prosodic range), speech rate on free minus reading (tempo flexibility), and four more. Each delta has a clinical interpretation rule — for example, near-identical pitch CV between free and reading speech is the canonical affective-flattening signature (Cummins 2015). These are signals no single clip can capture.
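The three contrasts named above reduce to simple differences between per-task feature values. In this sketch the dictionary keys and the flat-prosody tolerance are hypothetical.

```python
from typing import Dict

def cross_task_contrasts(vowel: Dict[str, float],
                         reading: Dict[str, float],
                         free: Dict[str, float]) -> Dict[str, float]:
    """Deltas between tasks; each input maps feature name to value."""
    return {
        "jitter_vowel_minus_reading": vowel["jitter"] - reading["jitter"],
        "pitch_cv_free_minus_reading": free["pitch_cv"] - reading["pitch_cv"],
        "speech_rate_free_minus_reading": free["speech_rate"] - reading["speech_rate"],
    }

def flat_prosody(contrasts: Dict[str, float], tolerance: float = 0.02) -> bool:
    """True when pitch variability is nearly identical in free and read speech."""
    return abs(contrasts["pitch_cv_free_minus_reading"]) < tolerance
```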

37. EBU R128 loudness normalisation per task

Phone-mic recordings come in at wildly different volumes; raw amplitude shifts jitter and shimmer estimates by 30-40% just based on how loud the speaker was. The engine now normalises every recording to a per-task target loudness, measured in LUFS per ITU-R BS.1770-4 (the measurement EBU R128 builds on), before any feature extraction. Targets differ by task because sustained vowels are naturally louder than connected speech (Sapienza 2011): -18 LUFS for vowel, -23 LUFS for reading and free speech, -28 LUFS for breathing. Features are finally comparable across recordings and patients.
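Given a measured integrated loudness, the gain needed to reach a per-task target is a decibel difference converted to a linear factor. The sketch below assumes the loudness measurement is done elsewhere by a BS.1770 meter; the task keys and the hard limiter are simplifications for illustration.

```python
from typing import List

TARGET_LUFS = {"vowel": -18.0, "reading": -23.0, "free": -23.0, "breathing": -28.0}

def normalisation_gain(measured_lufs: float, task: str) -> float:
    """Linear gain that moves a clip from its measured loudness to the task target."""
    gain_db = TARGET_LUFS[task] - measured_lufs
    return 10.0 ** (gain_db / 20.0)

def apply_gain(samples: List[float], gain: float) -> List[float]:
    """Scale samples and hard-limit to [-1, 1] (a simplification)."""
    return [max(-1.0, min(1.0, s * gain)) for s in samples]
```

A reading clip measured at -29 LUFS needs +6 dB, which is a linear gain of roughly 2.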

38. Reading-task adherence + sustained-vowel stability checks

If the patient was supposed to read the Rainbow Passage but Whisper transcribed something else (or nothing), the engine flags the recording as non-adherent and downweights all reading-derived features. Same for sustained vowel: a sliding 500-ms pitch and intensity check detects whether the vowel was actually steady or wavering — if not, jitter and shimmer aren't clinically reliable and the score knows. The engine refuses to silently produce garbage from a bad input.
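A simple adherence check compares the transcript with the expected sentence by word overlap. The 0.7 threshold and function names are hypothetical, and a production check would likely also consider word order.

```python
import re
from typing import List

def _words(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())

def reading_adherence(expected: str, transcript: str) -> float:
    """Fraction of expected words that appear in the transcript, in [0, 1]."""
    want, got = _words(expected), set(_words(transcript))
    if not want:
        return 0.0
    return sum(1 for w in want if w in got) / len(want)

def is_adherent(expected: str, transcript: str, threshold: float = 0.7) -> bool:
    return reading_adherence(expected, transcript) >= threshold
```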

39. Demographic-aware Healthy Voice Index

Healthy 70-year-olds have naturally higher jitter floors, slightly lower HNR, and narrower vowel space than 30-year-olds (Brockmann-Bauser 2018, Stathopoulos 2011). The HVI widens the healthy envelope for older speakers — voice_dysphonia tolerance ×1.5 past age 70, VAI floor shifts down by 0.10 — so age-appropriate variation isn't penalised as pathology. The engine accepts patient age and gender on every request and routes them through every relevant threshold.

40. Two engines that improve over time

Five mechanisms make the engine sharper with every recording, none requiring new labels: per-patient baseline z-scores (activates at 3 recordings per patient), population baseline per task (30 patients per task), the WavLM voice-fingerprint centroid (20 patients per task), voiceprint drift detection (1 enrollment per patient), and cross-task contrast norms (~1,000 patients). The standardised protocol is the enabler — same prompts mean compounding statistics. Trained classifiers (Parkinson's, fatigue, Alzheimer's) improve through a separate labelled-data path: training scripts ship in biomarker-engine/scripts/ ready to run when ADReSS / Predi-COVID / DAIC-WOZ access lands. There is now also a live first-party channel: the public research survey at avlynor.com/survey collects voice + validated self-report (PHQ-8, fatigue, an objective reaction-time / PVT task, plus symptom and demographic covariates) into an owned, anonymous labelled corpus via /contribute. The raw audio is retained (with explicit consent) so features can be re-extracted by the current best pipeline as it improves — every engine upgrade retroactively improves the whole back-catalogue.
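The per-patient baseline z-score mechanism can be sketched as follows, using the three-recording activation point stated above. Returning None when the baseline is too short or has zero spread is an assumption.

```python
from statistics import mean, stdev
from typing import List, Optional

def baseline_z(history: List[float], value: float, min_recordings: int = 3) -> Optional[float]:
    """Z-score of a new value against the patient's own earlier recordings."""
    if len(history) < min_recordings:
        return None          # baseline not active yet
    sd = stdev(history)
    if sd == 0:
        return None          # no variation to scale against
    return (value - mean(history)) / sd
```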

Architecture

Tether follows a privacy-first architecture. API keys never ship in the mobile bundle — all LLM requests and biomarker analysis are proxied through a Cloudflare Worker at the edge.

Frontend

React Native + Expo SDK 55 with React Navigation native stack. Runs on iOS, Android, and web.

Backend

Cloudflare Worker proxies all API calls. GROQ_API_KEY stored as a Cloudflare secret, never exposed to the client.

Python FastAPI Engine (primary)

Full biomarker pipeline at biomarker.avlynor.com: Praat voice quality, YAMNet, openSMILE eGeMAPS, Whisper, WavLM, audeering emotion, Tsanas nonlinear, trained Parkinson's + fatigue classifiers, rule-based Alzheimer's screening, per-patient + population baselines, Healthy Voice Index.

Rust WASM (fallback)

Lightweight biomarker engine compiled to WebAssembly via wasm-pack, runs inside the Cloudflare Worker for ~50 ms edge-speed signal processing. Used as fallback when the Python engine is unreachable.

LLM

Groq API with LLaMA 3.3 70B. Graceful fallback chain: Worker → local keyword matching. The direct device-to-Groq path was removed (see "No LLM key in the device, ever").

Quickstart

Prerequisites

Node.js 22+ (wrangler 4.x requires it)
Expo CLI (ships with the expo package; run it as npx expo, no global install needed)
iOS Simulator (Xcode) or Android Emulator
Python 3.11+ and Docker (only needed if developing the biomarker engine locally; production runs on the Contabo VPS)
Rust + wasm-pack (only needed if developing the WASM fallback engine)

Setup (mobile app)

git clone https://github.com/ArhanCodes/tether.git
cd tether
npm install --legacy-peer-deps
cp src/lib/config.template.ts src/lib/config.ts
npm run ios

That's it. The config template comes pre-configured with the shared Tether worker URL — no API keys or environment variables needed on the client. The Groq key and the BIOMARKER_ENGINE_URL live on the Cloudflare Worker as secrets and are never exposed to the client.

Web preview: Run npx expo start --web instead to open in a browser.

Worker Setup

# Deploy the Cloudflare Worker (the API + LLM proxy + biomarker forwarder)
cd worker
npm install
npx wrangler secret put GROQ_API_KEY          # for /chat endpoint
npx wrangler secret put SESSION_HMAC_KEY      # for HMAC-signed bearer tokens
npx wrangler secret put BIOMARKER_ENGINE_URL  # points at biomarker.avlynor.com (or your own engine)
npx wrangler deploy

Biomarker engine setup (only for self-hosting)

# The production engine runs at https://biomarker.avlynor.com on a Contabo VPS.
# To run your own copy:
cd biomarker-engine
./install.sh   # docker-compose up, fetches YAMNet + WavLM + Whisper + audeering
# Engine then listens on 127.0.0.1:8765
# Point a public domain via nginx + letsencrypt; set BIOMARKER_ENGINE_URL accordingly

Features

Auth

Login / signup with role selection (doctor or patient)
Passwords hashed with PBKDF2-SHA256 server-side (100,000 iterations, 16-byte per-user salt) — never readable by us
Sessions are HMAC-signed bearer tokens with server-side revocation (the worker can kick any session by deleting it from the Durable Object's session list)
Failed-login rate limiting + lockout (5 attempts in 5 minutes → 15-minute lockout)
Role-based access control — patients can only see their own data, doctors only see assigned patients, caregivers need explicit patient consent
Terms/privacy consent on signup
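The lockout rule (5 failures in 5 minutes, then a 15-minute lockout) can be sketched with a sliding window per email. The class below keeps state in memory and takes timestamps in seconds as arguments; both choices, and all names, are assumptions made to keep the example self-contained.

```python
from collections import deque
from typing import Deque, Dict

WINDOW_S, MAX_FAILS, LOCKOUT_S = 300, 5, 900

class LoginLimiter:
    def __init__(self) -> None:
        self.fails: Dict[str, Deque[float]] = {}
        self.locked_until: Dict[str, float] = {}

    def is_locked(self, email: str, now: float) -> bool:
        return now < self.locked_until.get(email, 0.0)

    def record_failure(self, email: str, now: float) -> None:
        window = self.fails.setdefault(email, deque())
        window.append(now)
        while window and now - window[0] > WINDOW_S:
            window.popleft()              # drop failures older than the window
        if len(window) >= MAX_FAILS:
            self.locked_until[email] = now + LOCKOUT_S
            window.clear()
```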

Doctor Workspace

Create/edit patient recovery plans (diagnosis, vitals, meds, instructions, red flags, follow-up)
Set AI tone (calm, direct, reassuring)
Publish plans to a specific patient email (validates account exists)
Draft auto-saves locally
View and reply to patient messages

Patient Companion

View the recovery plan assigned to your email
Vitals summary, daily instructions, red flags
AI chat powered by Groq with keyword-matching fallback
Quick prompt buttons ("What should I do today?", "When should I call?", etc.)
Voice input via speech recognition
Voice output (text-to-speech on AI replies, toggleable)
Urgency badges on AI responses (routine / contact clinician / urgent)
Flesch-Kincaid readability score on every AI response (grade level badge)
Handoff suggestion when AI can't fully answer
Direct messaging to doctor (real-time via Durable Objects)
Multilingual support (26 languages — see § "Works in the patient's language" above for the full list)
Voice biomarker analysis (breathing rate, cough detection, vocal tremor, voice energy)
Biomarker status levels (normal / monitor / alert) with alert popup
Biomarker trending — historical chart showing trends over time
Engine connection — biomarker data injected into AI context automatically
Patient Journal — daily journal entries that feed into AI context for more personalized responses
Medication Adherence Tracker — daily yes/no medication logging with 7-day streak visualization
Time-aware prompting — AI adapts advice based on days since discharge (early/mid/extended recovery)

Doctor Workspace (continued)

Discharge date — set per patient to enable time-aware recovery guidance
Recovery Score Dashboard — composite 0-100 score per patient (biomarker + adherence + engagement + journal), sorted by risk

Onboarding

5-step tutorial on first launch (welcome, doctors, patients, voice biomarkers, safety)
Skip button and dot indicators
Only shows once (stored in AsyncStorage)

Infrastructure

Cloudflare Worker proxy — API key stays server-side, never ships in the app
Durable Objects backend — accounts, plans, messages, biomarker history persist across devices
Python FastAPI biomarker engine runs on a private VPS (biomarker.avlynor.com); Rust WASM fallback compiled into the Cloudflare Worker
AI requests are routed through the worker; if the worker is unreachable, the app falls back to local keyword matching (no direct Groq call from the device)

Authentication

Users sign up with a role (Doctor or Patient) and are routed to the appropriate workspace after login. Sessions persist across app restarts via AsyncStorage.

Password hashing: PBKDF2-SHA256 server-side, 100,000 iterations, per-user 16-byte salt. Hashes live in the Durable Object; plaintext is never stored and is transmitted only over TLS at signup and login
Sessions: HMAC-SHA256-signed bearer tokens (payload.signature, base64url), validated against the Durable Object's session list so any session can be revoked server-side
Rate limiting: 5 failed logins in 5 minutes triggers a 15-minute lockout per email
RBAC: Every authenticated endpoint enforces who can read/write which patient. Patients see their own data only. Doctors see their assigned patients. Caregivers need an explicit consent record
Audit log: Every clinically-relevant action (data view, plan edit, escalation open/close, export, delete, login attempt) is recorded with actor, target, IP, user-agent, and timestamp
Invite codes: Doctors generate short codes that patients redeem at signup to auto-link to the right care team
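
The lockout rule above (5 failed logins in 5 minutes, then a 15-minute lockout per email) can be sketched as a small sliding-window limiter. This is an illustrative in-memory version, not the Worker's actual code: the class name `LoginRateLimiter` and its method names are hypothetical, and a real deployment would keep this state in the Durable Object.

```python
import time
from collections import defaultdict, deque

MAX_FAILURES = 5             # failed logins that trigger a lockout
WINDOW_SECONDS = 5 * 60      # failures are counted over this window
LOCKOUT_SECONDS = 15 * 60    # how long the email stays locked


class LoginRateLimiter:
    """Hypothetical in-memory limiter keyed by email address."""

    def __init__(self):
        self._failures = defaultdict(deque)  # email -> timestamps of recent failures
        self._locked_until = {}              # email -> time the lockout ends

    def is_locked(self, email, now=None):
        now = time.time() if now is None else now
        return self._locked_until.get(email, 0.0) > now

    def record_failure(self, email, now=None):
        now = time.time() if now is None else now
        recent = self._failures[email]
        recent.append(now)
        # Drop failures that have fallen out of the 5-minute window.
        while recent and now - recent[0] > WINDOW_SECONDS:
            recent.popleft()
        if len(recent) >= MAX_FAILURES:
            self._locked_until[email] = now + LOCKOUT_SECONDS
            recent.clear()

    def record_success(self, email):
        # A successful login resets the failure count.
        self._failures.pop(email, None)
```

A login handler would call `is_locked` before checking the password and `record_failure` or `record_success` afterwards.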

Doctor Workspace

Doctors create, edit, and publish recovery plans for specific patients. Plans are the foundation of the entire patient experience — the AI, the UI, and the messaging system all derive from the published plan.

Plan Fields

Field	Description
Patient Name & Email	Must match a registered patient account
Diagnosis	Primary condition (e.g. post-discharge pneumonia)
Vitals	Heart rate, blood pressure, temperature, O2 saturation
Medications	Name, dosage, and frequency (one per line)
Daily Instructions	What the patient should do each day
Red Flags	Symptoms that require immediate medical attention
Follow-up	Next appointment or scheduled check-in
Tone	Calm, Direct, or Reassuring — controls AI personality
Doctor Notes	Private instructions for how AI should phrase answers

Messaging

Doctors see all patient message threads, sorted by most recent. They can select a thread and reply directly. When a patient sends a message (or the AI suggests a handoff), it appears here.

Patient Companion

The patient screen surfaces the published recovery plan and provides multiple channels for getting help: AI chat, voice input, quick prompts, biomarker analysis, and direct doctor messaging.

Care Plan Display

Vitals, daily instructions, medications, and red flags — all from the doctor's published plan.

AI Chat

Text or voice questions answered by LLaMA 3.3, constrained to the care plan. Includes urgency badges and handoff suggestions.

Voice Biomarkers

Patient runs the 3-task voice protocol (~40 s). Python FastAPI engine on biomarker.avlynor.com analyzes 100+ features and returns a Healthy Voice Index, with a Rust WASM fallback in the Worker.

Doctor Messaging

Direct messaging channel for when the AI isn't enough. The AI can suggest this channel automatically when it is not certain of an answer.

Patient Journal

Write daily entries about how you feel. Recent entries are injected into the AI prompt so responses reflect your current emotional and physical state.

Medication Tracker

Log daily medication adherence with a simple yes/no. A 7-day streak visualization shows your compliance at a glance.

Caregiver Portal

Adult children of elderly patients, partners, and family members often need visibility into post-discharge recovery without being clinical providers. The caregiver portal is a third login type that gives trusted contacts a read-only dashboard for any patient who explicitly links them.

How linking works

The caregiver creates a Tether account with the caregiver role at sign-up.
The patient adds the caregiver's email to their account → triggers POST /api/caregiver/link.
The caregiver logs in and sees a dashboard of every patient who linked them.
Either side can revoke the link at any time.

What the caregiver sees

Latest published plan

Diagnosis, doctor name, last-updated timestamp. Tap through for full medications, instructions, and red flags.

Recent voice biomarkers

The last 10 readings with status dots — green / amber / red — for at-a-glance monitoring of breathing trends.

7-day adherence

A pill-grid showing which days the patient took their medication. Missed days highlighted in red.

Privacy model

Caregivers can read but cannot send messages, edit plans, or post journal entries on the patient's behalf. The patient remains the data owner — every link is opt-in and removable. The doctor is not notified of caregiver links by default; the patient controls who sees what.

Data flows

GET /api/caregiver/patients?email=<caregiver-email>
→ [
    {
      patientEmail, patientName,
      latestPlan,
      recentBiomarkers,
      recentAdherence
    },
    ...
  ]

Protocol Library

Doctors don't write a recovery plan from scratch every time. The protocol library ships five clinically-grounded templates, each one a complete DoctorPlan shape — diagnosis text, medications with dosing, daily instructions, red flags, follow-up timing, and recommended tone.

Included templates (v1)

Post-discharge Pneumonia

ICD-10 J18.9. Amoxicillin + inhaler regimen, breathing-focused red flags, GP follow-up in 3 days.

Heart Failure (CHF)

ICD-10 I50.9. Furosemide + lisinopril + carvedilol, daily weight check (the single most important early warning), cardiology follow-up in 7 days.

COPD Exacerbation

ICD-10 J44.1. Tiotropium + rescue inhaler + 5-day prednisolone + 7-day doxycycline, oximeter-based red flags.

Post-surgical Recovery

ICD-10 Z48.815. Pain-control regimen, DVT prevention with enoxaparin, wound-care daily steps, 6-week lifting restriction.

Type-2 Diabetes (new diagnosis)

ICD-10 E11.9. Metformin titration schedule, atorvastatin, glucose-target ranges, plate-method dietary guidance.

How a doctor uses it

Open the Doctor Workspace → "Publish Patient Plan" section.
Click any protocol chip — fields auto-fill with the template defaults.
Edit anything that's patient-specific (medications, follow-up timing, tone).
Add the patient's name and email → publish.

Why this matters

A solo physician can publish 5–10 plans per evening with the protocol library, vs. 1–2 from scratch. More importantly: the templates encode best-practice red flags ("weight gain >1 kg in a day" for CHF, "rescue inhaler more than every 4 hours" for COPD) that an under-the-gun doctor might forget to write. The templates are clinically reviewable and version-controlled in src/lib/protocols.ts.

Extending

Adding a new condition is one object in the PROTOCOL_TEMPLATES array — the UI picks it up automatically. The schema is { id, label, emoji, conditionICD10, defaults }, where defaults is a Partial<DoctorPlan>.

AI Chat System

The AI runs on the LLaMA 3.3 70B model hosted by Groq, accessed through a Cloudflare Worker proxy. Every response is grounded in the doctor's published care plan.

System Prompt

A dynamic system prompt is built from the care plan that includes the patient's diagnosis, medications, instructions, red flags, and the doctor's preferred tone. The AI is instructed to:

Only answer from documented care plan data
Flag red-flag symptoms as "urgent"
Suggest messaging the doctor when information is missing
Return structured JSON with message, urgency, supporting points, and handoff flag

Response Urgency Levels

Level	Meaning	UI Treatment
`routine`	Normal informational response	Blue badge
`contact-clinician`	AI suggests speaking with doctor	Yellow badge
`urgent`	Red flag symptom detected	Red badge + escalation banner
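
On the client side, the structured reply and the three urgency levels can be handled with a small parser. The sketch below is one way to do it, under stated assumptions: the JSON key names (`message`, `urgency`, `supportingPoints`, `handoff`) are guesses based on the description above, and the choice to treat malformed or unknown replies as `contact-clinician` is an illustrative safety default, not documented behavior.

```python
import json
from dataclasses import dataclass, field

URGENCY_LEVELS = ("routine", "contact-clinician", "urgent")


@dataclass
class ChatReply:
    message: str
    urgency: str = "routine"
    supporting_points: list = field(default_factory=list)
    handoff: bool = False


def parse_chat_reply(raw: str) -> ChatReply:
    """Parse the model's JSON; degrade to a cautious handoff if it is malformed."""
    try:
        data = json.loads(raw)
        if not isinstance(data, dict) or not isinstance(data.get("message"), str):
            raise ValueError("missing message")
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return ChatReply(
            message="I could not find that in your care plan. Please message your doctor.",
            urgency="contact-clinician",
            handoff=True,
        )
    urgency = data.get("urgency")
    if urgency not in URGENCY_LEVELS:
        urgency = "contact-clinician"  # unknown level: err on the side of caution
    points = data.get("supportingPoints")
    if not isinstance(points, list):
        points = []
    return ChatReply(
        message=data["message"],
        urgency=urgency,
        supporting_points=[str(p) for p in points],
        # Anything above routine implies a handoff suggestion (assumption).
        handoff=bool(data.get("handoff")) or urgency != "routine",
    )
```

The UI can then map `reply.urgency` to the badge colors in the table above.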

Fallback Chain

1. Cloudflare Worker → Groq API (primary)
2. Direct Groq API call (if worker fails)
3. Keyword matching (if no API configured)
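
The chain above amounts to "try each provider in order, then fall back to keywords". A minimal sketch of that control flow follows. The function names, the provider labels, and the keyword matcher's wording are all hypothetical, and the real providers would be HTTP calls rather than the plain callables used here.

```python
def make_keyword_matcher(red_flags):
    """Build the last-resort responder from the plan's red-flag phrases."""
    def match(question):
        text = question.lower()
        if any(flag.lower() in text for flag in red_flags):
            return {
                "urgency": "urgent",
                "message": "This matches a red flag in your plan. Contact your care team now.",
            }
        return {
            "urgency": "contact-clinician",
            "message": "Please message your doctor about this.",
        }
    return match


def answer_with_fallback(question, providers, keyword_answer):
    """Try each (name, callable) provider in order; use keywords if all fail.

    An empty provider list models the "no API configured" case.
    """
    for name, call in providers:
        try:
            return name, call(question)
        except Exception:
            continue  # provider failed; move on to the next one
    return "keyword", keyword_answer(question)
```

Returning the provider name alongside the reply makes it easy to log which tier actually answered.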

Safety: The AI never diagnoses, prescribes, or advises outside the doctor's documented scope. Emergency symptoms always trigger an urgent flag with instructions to seek immediate care.

Voice Biomarkers

Tether's biomarker system records a short voice sample from the patient using the standardized 3-task protocol (sustained vowel + reading + free speech, ~40 s total) and sends it to a Python FastAPI engine on biomarker.avlynor.com for full clinical-grade signal processing. A Rust WebAssembly engine compiled into the Cloudflare Worker runs in parallel as a fallback so the patient never sees a broken app if the Python engine is unreachable.

How It Works

Patient taps "Start 3-task voice protocol" — expo-audio begins recording in WAV/PCM at 16 kHz
Wizard walks the patient through 5 s of sustained "ahhh", a fixed sentence from the Rainbow Passage, and a 10 s symptom check-in
PCM samples + per-clip recording_type tags sent to the Cloudflare Worker's /api/biomarkers
Worker forwards to the Python engine's /analyze_multi (or /analyze for single clips)
Python engine runs the full pipeline (Praat voice quality, YAMNet event detection, openSMILE eGeMAPS, Whisper transcription + disfluency, WavLM voice fingerprint, audeering valence/arousal/dominance, Tsanas nonlinear PD markers, per-patient baselines, population baselines, signal-integrity anti-spoofing, Healthy Voice Index)
If the Python engine fails or times out, the worker falls back to the Rust WASM engine which returns a basic BiomarkerReport with energy/breathing/jitter/shimmer/HNR — same JSON contract, much narrower feature set
Results displayed as a card with status badge plus the Healthy Voice Index headline (0-100, banded)
Report saved to Durable Objects for longitudinal trending and per-patient baseline accumulation

The two engines

Python FastAPI engine (primary) — runs on biomarker.avlynor.com. Pipeline includes Praat clinical voice quality (jitter, shimmer, HNR, CPPS, formants), YAMNet 521-class audio event classifier, openSMILE eGeMAPS (88 features), Whisper transcription + disfluency markers, WavLM-base-plus 768-d voice fingerprint (Microsoft, 94k-hour pretraining), audeering wav2vec2-large emotion model (MSP-Podcast benchmark), Tsanas nonlinear markers (PPE, RPDE, DFA, GNE), per-patient SQLite baseline store, population baselines per task, signal-integrity flags, trained Parkinson's classifier (0.83 BAcc on UCI + Telemonitoring (patient-grouped CV)), trained fatigue v5 ensemble, rule-based Alzheimer's screening, EBU R128 LUFS normalization, demographic-aware Healthy Voice Index. Latency 18-25 s per clip depending on enabled features. Live at https://biomarker.avlynor.com.
Rust WASM engine (fallback) — compiled with wasm-pack, runs in-Worker. Extracts energy, breathing rate, pitch variability, cough events, zero-crossing rate, jitter, shimmer, HNR, mean pitch, voiced fraction, plus a confidence score. Returns a strictly narrower JSON shape with the same field names so downstream consumers don't need to special-case the fallback. Latency ~50 ms. (CPPS, formants, and VSA are Praat-only and live in the Python engine.)

Biomarker Trending

Every biomarker report is stored server-side with a timestamp. The patient's biomarker card shows a trend view of the last 10 readings with bar charts for breathing rate, voice energy, and cough events. Alert/monitor/normal counts are summarized as colored pills. This turns a single snapshot into a longitudinal monitoring system that can detect deterioration over days.

Clinical Voice Quality Card

Below the core metrics the card surfaces the clinical voice quality section: Mean Pitch (Hz), Jitter %, Shimmer %, and HNR (dB), each annotated with the healthy reference range. These are the same metrics reported by Praat (the standard academic tool for acoustic voice analysis). The section appears only when the engine successfully extracted enough voiced cycles, so it does not show on whisper-only or breath-only recordings.

Engine Connection

Tether's two AI engines — NLP (Groq LLM, proxied through the Cloudflare Worker) and Bio-Acoustic (Python FastAPI engine on biomarker.avlynor.com with a Rust WASM fallback inside the Worker) — share context automatically:

The latest biomarker report (including confidence score and all 5 metrics) is injected into the AI system prompt before every chat request
When the patient asks "how am I doing?", the AI references actual biomarker readings (breathing rate, cough events, energy levels, zero-crossing rate)
If biomarkers are in "alert" status, the AI proactively warns the patient and recommends contacting their care team
The AI knows the analysis confidence level and can qualify its answers accordingly ("Your latest voice check had moderate confidence — consider recording again in a quieter space")
One engine listens to the body, the other explains what it means in plain language

Automatic Alert Escalation

When a biomarker recording returns alert status (2+ flags), Tether automatically sends a care message to the assigned doctor — no patient action needed. The message includes:

Full biomarker summary with actual values and normal ranges
Confidence score for the analysis
A note that the message was sent automatically by the biomarker system

The patient sees "Health Alert — Doctor Notified" confirming the escalation happened. This means a patient could record a voice check, trigger an alert, and their doctor sees it in their inbox within seconds — all without the patient needing to understand or act on the medical data themselves.

Readability Scoring

Every AI response is scored using the Flesch-Kincaid Grade Level formula. A badge on each message shows the grade level (e.g., "Grade 4.2 - Very Easy"). This backs the health-literacy claim with data:

Grade 0-5: Very Easy — 5th grader can understand
Grade 6-8: Easy — middle school level
Grade 9-12: Moderate — high school level
Grade 13+: Complex — college level (AI is prompted to stay below 6)
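
For reference, the Flesch-Kincaid Grade Level is 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The sketch below computes it and maps the result to the bands listed above. The syllable counter is a rough vowel-group heuristic chosen for illustration, the band boundaries (below 6, below 9, below 13) are an assumed reading of the list, and negative grades are clamped to 0; the app's actual scorer may differ on all three points.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as vowel groups, treating a trailing 'e' as silent."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    grade = 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59
    return max(round(grade, 1), 0.0)  # clamp: very simple text can score below zero


def readability_label(grade: float) -> str:
    if grade < 6:
        return "Very Easy"
    if grade < 9:
        return "Easy"
    if grade < 13:
        return "Moderate"
    return "Complex"
```

A badge string can then be built as `f"Grade {grade} - {readability_label(grade)}"`.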

Patient Journal

Patients can write daily journal entries describing how they feel. This serves two purposes:

Patient self-reflection: Writing about symptoms, mood, and progress helps patients track their own recovery
AI context enrichment: The 3 most recent journal entries are injected into the AI system prompt, allowing responses to account for the patient's current emotional and physical state

Entries are stored server-side via Durable Objects (max 100 per patient, 2000 character limit). The patient sees their entries in reverse chronological order. The journal also contributes to the Recovery Score (up to 20 points).

Medication Adherence Tracker

A simple daily check-in that asks patients: "Did you take all your medicines today?" with Yes/No buttons.

One log per day: Duplicate entries for the same day are prevented
7-day streak: Colored dots show recent adherence (green = taken, red = missed)
AI awareness: Adherence records are injected into the AI prompt — if the patient has missed 2+ days, the AI gently reminds them about medication importance
Recovery Score input: Adherence contributes up to 30 points to the composite score
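
The one-log-per-day rule and the 7-day view can be sketched with a dictionary keyed by patient and date. This is an illustration only: the function names are hypothetical, the store is a plain dict rather than the Durable Object, and counting missed days over the same 7-day window for the reminder is an assumption.

```python
from datetime import date, timedelta


def log_adherence(records: dict, patient: str, day: date, taken: bool) -> None:
    """Upsert: a second log for the same patient and day overwrites the first."""
    records[(patient, day.isoformat())] = taken


def last_seven_days(records: dict, patient: str, today: date) -> list:
    """Return [(iso_date, 'taken' | 'missed' | 'none')], oldest day first."""
    out = []
    for offset in range(6, -1, -1):
        day = (today - timedelta(days=offset)).isoformat()
        value = records.get((patient, day))
        status = "none" if value is None else ("taken" if value else "missed")
        out.append((day, status))
    return out


def needs_reminder(records: dict, patient: str, today: date) -> bool:
    """True when 2 or more days in the window were logged as missed."""
    week = last_seven_days(records, patient, today)
    return sum(1 for _, status in week if status == "missed") >= 2
```

The `taken` and `missed` statuses map to the green and red dots; `none` covers days with no log at all.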

Time-aware Prompting

Doctors can set a discharge date on each patient's plan. The days since discharge are then calculated when the AI system prompt is built, and the AI adjusts its approach:

Phase	Days	AI Behavior
Early recovery	0-3	Extra cautious, encourages rest and monitoring
Mid recovery	4-14	Encourages gradual activity and adherence
Extended recovery	15+	Focuses on long-term habits and follow-up

A "Day X since discharge" badge appears on the patient's journal section for awareness.
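
The phase table maps directly to a small function. The sketch below uses the day ranges from the table; the function name and the short phase labels are hypothetical, and treating a discharge date in the future as day 0 is an assumption.

```python
from datetime import date


def recovery_phase(discharge: date, today: date) -> tuple:
    """Return (days_since_discharge, phase) using the ranges in the table."""
    days = max((today - discharge).days, 0)  # a future discharge date counts as day 0
    if days <= 3:
        phase = "early"
    elif days <= 14:
        phase = "mid"
    else:
        phase = "extended"
    return days, phase
```

The first element also supplies the X for the "Day X since discharge" badge.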

Recovery Score

A composite 0-100 score calculated per patient, visible to doctors on their workspace. Patients are sorted lowest-first so the most at-risk patients get attention first.

Scoring Breakdown

Component	Max Points	Source
Biomarker Health	30	Ratio of normal/monitor/alert readings in recent biomarker history
Medication Adherence	30	Proportion of "taken" days in the last 7 days
Communication Engagement	20	Patient messages sent in the last 7 days (capped at 4)
Journal Activity	20	Journal entries written in the last 7 days (capped at 4)

Risk Levels

0-39: At Risk — needs immediate attention
40-69: Recovering — progressing but needs monitoring
70-100: On Track — recovery going well
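
To make the arithmetic concrete, here is one way the four components could combine into the 0-100 score. The maximum points and the caps come from the table above; the per-status weights (normal = 1.0, monitor = 0.5, alert = 0.0), the zero score for a patient with no readings, and the linear scaling up to each cap are assumptions made for this sketch, since the exact formula is not given here.

```python
STATUS_WEIGHTS = {"normal": 1.0, "monitor": 0.5, "alert": 0.0}  # assumed weights


def recovery_score(statuses, taken_days, messages, journal_entries) -> int:
    """Combine the four components into a 0-100 score."""
    if statuses:
        biomarker = 30 * sum(STATUS_WEIGHTS[s] for s in statuses) / len(statuses)
    else:
        biomarker = 0.0  # assumption: no readings earns no biomarker points
    adherence = 30 * min(taken_days, 7) / 7          # "taken" days in the last 7
    engagement = 20 * min(messages, 4) / 4           # capped at 4 messages
    journal = 20 * min(journal_entries, 4) / 4       # capped at 4 entries
    return round(biomarker + adherence + engagement + journal)


def risk_level(score: int) -> str:
    if score < 40:
        return "At Risk"
    if score < 70:
        return "Recovering"
    return "On Track"
```

Sorting a doctor's patient list with `sorted(patients, key=lambda p: p["score"])` gives the lowest-first order described above.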

Multilingual Support

Patients can select their preferred language from 26 options: English, Spanish, Hindi, Mandarin, French, Arabic, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Bengali, Urdu, Tagalog, Swahili, Turkish, Polish, Dutch, Greek, Hebrew, Thai, Indonesian, Punjabi, and Ukrainian. The language preference is stored server-side and affects:

AI chat responses — the system prompt instructs the LLM to respond in the selected language at a 5th grade reading level
Voice output — text-to-speech uses the correct language code via expo-speech
The biomarker engine receives the language code in its /analyze payload so Whisper transcription picks the right multilingual model variant (English clips use tiny.en for free_speech / small.en for reading; non-English clips use the multilingual tiny)
The setting persists across devices via Durable Objects
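
The Whisper variant selection described above is a two-way branch. The sketch below encodes it; the function name is hypothetical, and using `tiny.en` for English recording types other than `reading` (for example a sustained vowel) is an assumption, since only `free_speech` and `reading` are specified.

```python
def whisper_model_for(language: str, recording_type: str) -> str:
    """Pick a Whisper variant from the language code and recording type."""
    base_language = language.lower().split("-")[0]  # "en-GB" -> "en"
    if base_language == "en":
        return "small.en" if recording_type == "reading" else "tiny.en"
    return "tiny"  # multilingual tiny for every non-English clip
```

For example, `whisper_model_for("hi", "reading")` selects the multilingual `tiny` model for a Hindi reading clip.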

Cloudflare Worker

The Worker is the secure API proxy and data backend, hosted at tether-api.arhan-harchandani.workers.dev. It exposes auth, app data, the AI proxy, and the biomarker forwarder. Every state-changing endpoint is HMAC-bearer-token authenticated and the request body is validated against a Zod schema before any handler runs.
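
The bearer-token format described under Authentication (payload.signature, base64url, HMAC-SHA256, checked against a revocable session list) can be sketched as follows. This is an illustration in Python rather than the Worker's own code: the helper names and the `sid` payload field are hypothetical, and the session list is modeled as a plain set.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _signature(body: str, secret: bytes) -> str:
    return _b64url(hmac.new(secret, body.encode("ascii"), hashlib.sha256).digest())


def sign_token(payload: dict, secret: bytes) -> str:
    """Return 'payload.signature', both parts base64url-encoded."""
    body = _b64url(json.dumps(payload, separators=(",", ":"), sort_keys=True).encode())
    return f"{body}.{_signature(body, secret)}"


def verify_token(token: str, secret: bytes, active_sessions: set):
    """Return the payload if the signature is valid and the session is still active."""
    try:
        body, sig = token.split(".")
        if not hmac.compare_digest(sig, _signature(body, secret)):
            return None
        payload = json.loads(_b64url_decode(body))
    except (ValueError, TypeError):
        return None  # wrong shape, bad base64, bad JSON, or non-ASCII input
    if not isinstance(payload, dict):
        return None
    # Revocation: a token whose session id was removed server-side is rejected.
    return payload if payload.get("sid") in active_sessions else None
```

Removing a session id from `active_sessions` invalidates the token immediately, even though its signature is still valid.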

API Endpoints

Endpoint	Method	Description
`/chat`	POST	Forwards chat messages to Groq API with the GROQ_API_KEY secret. Default model is `llama-3.3-70b-versatile`.
`/api/signup`	POST	Create a new account (name, email, password, role). Password hashed with PBKDF2-SHA256, 100,000 iterations, per-user salt. Returns an HMAC-signed bearer token.
`/api/login`	POST	Authenticate and return an HMAC bearer token + user profile.
`/api/plans`	GET/POST	Retrieve or publish doctor care plans (doctor RBAC enforced).
`/api/messages`	GET/POST	Doctor-patient messaging thread.
`/api/biomarkers`	POST	Receives PCM audio samples from the mobile app or web demo. Forwards to the Python engine at `BIOMARKER_ENGINE_URL` (defaults to `https://biomarker.avlynor.com`); falls back to the in-Worker Rust WASM engine if the Python engine is unreachable or times out. Also runs voiceprint enrollment + speaker-similarity check before persisting. Returns the merged BiomarkerReport.
`/api/biomarkers?email=…`	GET	Retrieve a patient's biomarker history. Doctor RBAC required if the email is not the caller's own.
`/api/user/language`	POST	Update patient language preference (one of 26 supported languages).
`/api/users`	GET	List users (admin RBAC; password hashes never returned).
`/api/journal`	GET/POST	Patient journal entries (max 100 per patient, 2000 char limit per entry).
`/api/adherence`	GET/POST	Daily medication adherence records (upserts by patient + date).
`/api/recovery-score`	GET	Composite recovery scores for a doctor's patients, sorted by risk. Combines biomarker, adherence, engagement, and journal sub-scores.
`/api/escalations`	GET/POST	Clinical escalation rows with status workflow (`new → reviewed → contacted → resolved`).

Biomarker engine endpoints (called via the Worker's `/api/biomarkers`)

Endpoint	Method	Description
`/analyze`	POST	Single-clip analysis. Accepts `samples`, `sampleRate`, `recording_type`, optional `patient_id` (per-patient baseline), `age`/`gender` (demographic-matched norms), `subjective_*` self-report, `enable_whisper` (transcript), `threshold_mode`, `disable_lufs`. Returns the full BiomarkerReport, including `result_confidence` (calibrated overall confidence + an `inconclusive`/abstain flag).
`/analyze_multi`	POST	Multi-clip 3-task protocol. Accepts `recordings` (2-5 clips) + `recording_types`. Returns session-level Healthy Voice Index, cross-task contrasts, consistency score, `result_confidence`, per-task `transcripts`, `baseline_z_scores` + `reading_adherence`, plus the per-clip reports merged via median + majority vote + max-risk.
`/contribute`	POST	Public research data-collection (the survey at `avlynor.com/survey`). Accepts the 3-task `recordings` + validated survey labels + demographics + `consent`. Saves the raw audio + labels as one anonymous training row (no name/email/identifier) and fsyncs to disk before responding — features are extracted later in batch, so the request is fast and nothing is silently dropped. No API key; per-IP rate limited.
`/contribute/stats`	GET	Public count of survey contributions collected so far.
`/health`	GET	Liveness probe. Returns engine version + YAMNet load status.
`/version`	GET	Full model metadata: balanced accuracy, ROC AUC, training dataset, n_samples, n_patients, threshold, calibration brier, and 95% CIs for every loaded trained classifier.
`/learning/status`	GET	Public flywheel snapshot. Shows population-baseline sample counts per task, WavLM centroid counts per task, label counts per classifier, current auto-tuned thresholds with their lift-vs-default measurements.
`/baseline/{patient_id}`	GET/DELETE	Per-patient SQLite baseline counts; DELETE wipes the patient's history. Admin auth required.
`/trend/{patient_id}`	GET	N-day time series + per-metric trend direction (up / down / stable). Used by the doctor view for longitudinal charts.
`/fhir/analyze`	POST	Same analysis as `/analyze` but returns a FHIR R4 Bundle of Observations + DiagnosticReport for EHR integration (NIH Bridge2AI VBAI profile).
`/admin/retrain`	POST	Admin-only. Exports accumulated self-report-labelled samples as JSONL ready to feed the offline training scripts. Gated at 500 samples per classifier.
`/admin/keys`	POST/GET	Admin-only: mint scoped B2B API keys, list issued keys. Foundation for self-serve voice-biomarker-as-a-service.
`/admin/keys/{key_id}`	DELETE	Admin-only: revoke a key by id.
`/admin/audit/verify`	GET	Verifies the SHA-256 audit-chain integrity end-to-end. Returns OK or the first tampered row.
`/cache/stats`	GET	Ops-internal: content-fingerprint cache hit/miss counters and entry count.
`/cache/clear`	POST	Ops-internal: wipes the per-content cache. Use after model retraining.
`/metrics`	GET	Prometheus-style text exposition for ops monitoring.

Using the API

The biomarker engine at biomarker.avlynor.com is a REST API you can call from any environment — browser, server, mobile, cURL, n8n, Zapier, Postman. Every analysis call needs an API key. Generate one below in two clicks (free, instant, no account), then send it as the X-API-Key header on every request.

Why a key? Each /analyze call runs the full WavLM + audeering + Whisper + Praat pipeline (~18–25 s of CPU). A key lets us meter usage fairly, keep capacity available for clinical pilots, and shut off a key if it's abused. Free tier: 20 requests/min · 500 calls/day. Need more? Email arhan.harchandani@gmail.com.

Generate an API key

Your email is recorded as the key owner so we can contact you or raise your limit. We don't email you otherwise.

Try it live

Paste a key (or generate one above), pick an endpoint, and fire a real request against the production engine — right here.

POST /analyze sends a synthetic 2-second 140 Hz tone so you can see the full request → response shape without a microphone. Real audio (your mic, a WAV) gives a real reading — see the recipes below or the /demo.
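
To build the same kind of test request from a script, a 2-second 140 Hz tone is easy to synthesize. The sketch below only constructs the JSON body; the 0.5 amplitude, the 5-decimal rounding, and the `sustained_vowel` recording type are choices made for this example, not values taken from the widget.

```python
import math


def synthetic_tone(freq_hz=140.0, seconds=2.0, sample_rate=16000, amplitude=0.5):
    """Return a sine wave as a list of floats in [-amplitude, amplitude]."""
    n = int(seconds * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


# Request body in the shape /analyze expects (float PCM in [-1, 1], 16 kHz mono).
payload = {
    "samples": [round(s, 5) for s in synthetic_tone()],  # rounding keeps the JSON small
    "sampleRate": 16000,
    "recording_type": "sustained_vowel",
}
```

Send `payload` as the JSON body of a POST to /analyze with your key in the X-API-Key header.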

The one-line cURL test

Confirm the engine is alive (no key needed for /health):

$ curl https://biomarker.avlynor.com/health
{"ok":true,"uptime_sec":672993,"engine":"vps-2.9.0","yamnet_loaded":true}

One-clip analysis

POST /analyze takes a JSON body with raw PCM samples as a float array in [-1, 1] at 16 kHz mono. Pass your key in the X-API-Key header:

# JSON does not allow comments, so the field notes live up here.
# samples:        float32 PCM in [-1, 1] (array truncated in this example)
# recording_type: sustained_vowel, reading, free_speech, breathing, or cough
# patient_id:     optional; enables per-patient baselines
# age:            optional; enables demographic-adjusted thresholds
# gender:         optional
# language:       optional; ISO 639-1 code; default "en"
curl -X POST https://biomarker.avlynor.com/analyze \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY_HERE" \
  -d '{
    "samples":        [0.001, 0.003, -0.002, ...],
    "sampleRate":     16000,
    "recording_type": "sustained_vowel",
    "patient_id":     "p-1043",
    "age":            68,
    "gender":         "F",
    "language":       "en"
  }'

Returns a JSON document with ~170 fields including:

jitter, shimmer, hnr_db, cpps, f0_mean_hz, formants.f1/f2/f3 — Praat clinical voice quality
healthy_voice_index — 0-100 composite score
conditions.*.severity — 12 condition risk modules with severity bucket (none / low / moderate / high) + evidence array
conditions.*.risk — 0-1 risk score per module
parkinsons_classifier — probability + prediction (gated by plausibility + corroboration guards)
fatigue_classifier — probability (with v4 shadow + v5 ensemble metadata)
alzheimers_classifier — probability + voice-only / multi-modal mode
signal_integrity_flags — array of anti-spoofing checks that fired
baseline_z_scores — per-feature z-scores against patient baseline (when patient_id + history available)
result_confidence — calibrated overall confidence (0-1) + band + inconclusive flag + reasons; the engine abstains rather than guess when voiced-speech quality, cross-clip agreement, or task adherence are poor
transcript — Whisper transcription of the speech (when enable_whisper is set; reading + free-speech tasks)
recording_quality — SNR, clipping, LUFS, duration, voiced fraction
summary — human-readable single-paragraph summary

The 3-task voice protocol

POST /analyze_multi wraps the canonical clinical protocol: sustained vowel + reading + free speech, in any order, merged into a single report with cross-task contrast features and a session-level Healthy Voice Index:

curl -X POST https://biomarker.avlynor.com/analyze_multi \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY_HERE" \
  -d '{
    "recordings": [
      { "samples": [...], "sampleRate": 16000, "recording_type": "sustained_vowel" },
      { "samples": [...], "sampleRate": 16000, "recording_type": "reading" },
      { "samples": [...], "sampleRate": 16000, "recording_type": "free_speech" }
    ],
    "patient_id": "p-1043",
    "age": 68,
    "gender": "F"
  }'
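
The endpoint table describes the per-clip reports as merged via median, majority vote, and max-risk. The sketch below shows what those three rules look like on a simplified report shape. The `metrics`, `status`, and `risk` keys are hypothetical stand-ins for this illustration, and which real fields use which rule is not specified here.

```python
from collections import Counter
from statistics import median


def merge_reports(reports: list) -> dict:
    """Merge per-clip reports: median of metrics, majority status, max risk."""
    if not reports:
        raise ValueError("need at least one report")
    merged = {"metrics": {}, "risk": {}}

    # Numeric metrics: median across the clips that produced a value.
    metric_names = set().union(*(r["metrics"] for r in reports))
    for name in sorted(metric_names):
        values = [r["metrics"][name] for r in reports if r["metrics"].get(name) is not None]
        if values:
            merged["metrics"][name] = median(values)

    # Categorical status: majority vote (ties go to the first status seen).
    merged["status"] = Counter(r["status"] for r in reports).most_common(1)[0][0]

    # Condition risk: keep the highest risk any clip reported.
    for r in reports:
        for condition, risk in r.get("risk", {}).items():
            merged["risk"][condition] = max(risk, merged["risk"].get(condition, 0.0))
    return merged
```

Taking the maximum for risk is the conservative choice: one worrying clip is enough to surface a condition, while the median keeps a single noisy clip from skewing the voice-quality metrics.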

Optional: multi-modal self-report fusion

Passing self-report alongside the audio lifts balanced accuracy (BAcc) for fatigue and Alzheimer's screening from the voice-only reference (~0.70 / ~0.80) to the multi-modal figure (~0.82 / ~0.88):

{
  "samples": [...], "sampleRate": 16000, "recording_type": "free_speech",
  "subjective_fatigue":       7,    // 0-10 self-report
  "subjective_sleep_quality": 4,    // 0-10 (higher = better)
  "subjective_cognitive":     6,    // 0-10 self-report
  "subjective_mood":          5     // 0-10 self-report
}

FHIR R4 output (drop into hospital EHRs)

POST /fhir/analyze returns the same biomarker analysis as a FHIR R4 Bundle (Observation + DiagnosticReport resources) conforming to the NIH Bridge2AI VBAI profile. Epic, Cerner, Cambio, TakeCare ingest this without per-hospital integration. Capability statement at GET /fhir/CapabilityStatement.

Per-patient history + trends

If you pass patient_id consistently, the engine builds a local baseline and exposes trend queries:

# Inspect baseline counts (per-feature)
curl https://biomarker.avlynor.com/baseline/p-1043

# 30-day time series for a patient
curl "https://biomarker.avlynor.com/trend/p-1043?days=30"

Continual-learning flywheel snapshot

GET /learning/status exposes the engine's online-learning state — how many self-report-labelled samples per classifier, current auto-tuned thresholds, lift-vs-default measurements. Useful for ops dashboards.

Rate limits + CORS

CORS: open (Access-Control-Allow-Origin: *). Call from any web origin, including the browser.
Per-key rate limit: free-tier keys are 20 requests/min + 500 calls/day. Exceeding the minute limit returns 429 with a retry_after; exceeding the daily quota returns 429 daily_quota_exceeded (resets 00:00 UTC).
Payload size: nginx caps request body at 16 MB. A 30-second 16 kHz mono PCM as JSON is ~2 MB.
Latency: ~18-25 s end-to-end for a single full-pipeline /analyze call (WavLM + audeering + Whisper + Praat + openSMILE + classifiers run in series). First request after cold start can spike to 60 s — health-check pre-warms via cron.

API keys

Every /analyze and /analyze_multi call requires an X-API-Key header. Generate a key with the widget at the top of this section — it's free, instant, and self-serve. Your key is tied to the email you give so we can contact you or raise your limit, and so an abused key can be shut off without affecting anyone else.

Free tier (self-serve, generated above): 20 requests/min · 500 calls/day. Plenty to build and test a real integration.
Paid tier: higher rate limits + daily quota, metered billing, priority capacity. Email us.
Enterprise: dedicated deployment, hospital-grade SLA, EU data-residency choice, CE-marked classification path.

Problems with your key, or need a higher limit? Email arhan.harchandani@gmail.com.

Keys are stored as SHA-256 hashes — we never see your raw secret after generation, and it can't be recovered (generate a new one if you lose it). /health, /version, and /learning/status stay open (no key needed) so monitoring + status checks don't need credentials.

Error handling

The engine never crashes the call. If a single feature fails (e.g. Whisper times out), that field is set to 0.0 or null and the rest of the pipeline still runs — you always get a valid response. Standard HTTP error codes only fire for:

400 — malformed request body (missing samples, bad sampleRate, etc.)
401 — missing or invalid X-API-Key ({code: "invalid_api_key"}). Generate a key with the widget above.
413 — payload exceeds 16 MB
422 — sample buffer too short (< 1.5 seconds)
429 — rate limited: {code: "rate_limit", retry_after} for the per-minute cap, or {code: "daily_quota_exceeded"} for the daily cap.
500 — engine internal failure; only ever fires if the entire FastAPI process panics. Returns a request-id you can include in a bug report.
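
A client can turn the codes above into a simple retry policy: wait and retry on the per-minute limit, and fail fast on everything else. The sketch below assumes `retry_after` is expressed in seconds and takes the HTTP call as an injected `send` function so it can be tried without network access; both are assumptions for this example, as is the function name.

```python
import time


def call_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call send() -> (status_code, body_dict), retrying only on the per-minute limit."""
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status == 200:
            return body
        code = body.get("code") if isinstance(body, dict) else None
        if status == 429 and code == "rate_limit" and attempt < max_attempts:
            # Per-minute cap: wait as instructed, then try again.
            sleep(float(body.get("retry_after", 1)))
            continue
        # 400/401/413/422, the daily quota, and 500 are not fixed by retrying at once.
        raise RuntimeError(f"analyze failed: HTTP {status} ({code})")
    raise RuntimeError("no attempts were made")
```

With the `requests` library, `send` could be `lambda: (r := requests.post(url, headers=headers, json=payload, timeout=60)) and (r.status_code, r.json())` or an equivalent small wrapper.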

Code samples

JavaScript (browser, with MediaRecorder):

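// Record 15 s of mic audio, downsample to 16 kHz PCM, send to /analyze.
// recordAndDownsample is one possible helper, written for this example;
// biomarker.avlynor.com/demo has a full working implementation to View-Source.
async function recordAndDownsample(stream, seconds, targetRate) {
  const recorder = new MediaRecorder(stream);
  const chunks = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);
  const stopped = new Promise((resolve) => { recorder.onstop = resolve; });
  recorder.start();
  setTimeout(() => recorder.stop(), seconds * 1000);
  await stopped;
  // Decode the compressed recording, then resample to mono at targetRate.
  const encoded = await new Blob(chunks).arrayBuffer();
  const decoded = await new AudioContext().decodeAudioData(encoded);
  const offline = new OfflineAudioContext(1, Math.ceil(decoded.duration * targetRate), targetRate);
  const source = offline.createBufferSource();
  source.buffer = decoded;
  source.connect(offline.destination);
  source.start();
  return (await offline.startRendering()).getChannelData(0);  // Float32Array
}

const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const samples = await recordAndDownsample(stream, 15, 16000);

const res = await fetch("https://biomarker.avlynor.com/analyze", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "X-API-Key": "YOUR_KEY_HERE",        // generate one above
  },
  body: JSON.stringify({
    samples: Array.from(samples),
    sampleRate: 16000,
    recording_type: "free_speech",
    patient_id: "demo-1",
  })
});
if (!res.ok) throw new Error(`analyze failed: HTTP ${res.status}`);
const report = await res.json();
console.log(report.healthy_voice_index, report.conditions);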

Python (server, with librosa):

import librosa
import requests

# Load the file as 16 kHz mono float samples in [-1, 1].
samples, _ = librosa.load("voice.wav", sr=16000, mono=True)
r = requests.post(
    "https://biomarker.avlynor.com/analyze",
    headers={"X-API-Key": "YOUR_KEY_HERE"},   # generate one above
    json={
        "samples": samples.tolist(),
        "sampleRate": 16000,
        "recording_type": "sustained_vowel",
        "patient_id": "py-test",
        "age": 68, "gender": "F",
    },
    timeout=60,
)
r.raise_for_status()   # surface 4xx/5xx instead of parsing an error body
report = r.json()
print(report["healthy_voice_index"], report["conditions"]["voice_dysphonia"])

Quick test in the browser without writing code: open biomarker.avlynor.com/demo, drop a WAV file or hit Record, and watch the full report render in the same page.

Reference implementations

The mobile app at tether.avlynor.com calls the API for every patient check-in. View source for a real-world client.
The static demo at biomarker.avlynor.com/demo is ~800 lines of vanilla JS. Right-click → View Source for a complete browser-side reference.
Validation scripts in scripts/validate_*.py show how to batch-call the API against open datasets (Coswara, Predi-COVID, ICBHI, UCI Parkinson's).

SDKs & OpenAPI

You don't need a hand-written SDK — the engine publishes a live OpenAPI 3 spec, so you can generate a typed client in any language in one command:

# interactive explorer (Swagger UI):  https://biomarker.avlynor.com/docs
# machine-readable spec:               https://biomarker.avlynor.com/openapi.json

# auto-generate a Python (or typescript, go, swift, kotlin…) client:
npx @openapitools/openapi-generator-cli generate \
  -i https://biomarker.avlynor.com/openapi.json \
  -g python -o ./tether-client

For the one genuinely fiddly part — turning mic/WAV audio into 16 kHz PCM — copy the snippets above; that's the only "SDK" most integrations need.

Versioning + breaking changes

The engine reports its build version on every /health call (engine: vps-2.9.0). Major-version bumps (2.x → 3.x) may add fields but will not remove or rename existing ones without a deprecation window. The X-Engine-Version response header is also returned on every analyze call for client-side compatibility checks.
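A client-side compatibility check along these lines might look like the sketch below. It assumes the version tag always has the form shown in the example (a lowercase prefix, a hyphen, then major.minor.patch) and that the X-Engine-Version header carries the same string as the /health engine field; neither is confirmed here, and the helper names are hypothetical.

```python
import re


def parse_engine_version(tag: str) -> tuple:
    """Parse a tag such as 'vps-2.9.0' into (major, minor, patch).

    Assumption: the '<prefix>-<major>.<minor>.<patch>' layout is inferred
    from the single example in the text.
    """
    match = re.fullmatch(r"[a-z]+-(\d+)\.(\d+)\.(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"unrecognised engine version tag: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_compatible(tag: str, supported_major: int = 2) -> bool:
    """True when the engine's major version is the one this client targets."""
    return parse_engine_version(tag)[0] == supported_major


if __name__ == "__main__":
    print(parse_engine_version("vps-2.9.0"), is_compatible("vps-2.9.0"))
```

A client could run this check once per session and log a warning on mismatch instead of failing hard, since the text says existing fields survive a deprecation window.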

Durable Objects Backend

All application data (accounts, plans, messages, biomarker history) is stored in a Cloudflare Durable Object (TetherData). This replaces the previous AsyncStorage-only approach and provides:

Cross-device sync — a doctor publishes a plan on their laptop, the patient sees it on their phone instantly
Strong consistency — single-instance guarantee means no stale reads across regions
Edge persistence — data persists in Cloudflare's global network with automatic replication
Privacy — password hashes (PBKDF2-SHA256, 100k iterations, per-user salt) live in the Durable Object and are never exposed to clients

The DO seeds itself with starter accounts on first access. AsyncStorage is only used for local session state (which user is logged in on this device).
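To make the password-hashing parameters concrete, here is a minimal Python sketch of PBKDF2-SHA256 with 100k iterations and a per-user salt. The production code runs inside the Cloudflare Worker, so this is an illustration of the scheme and not the actual implementation; the 16-byte salt length and the function names are assumptions.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # iteration count stated in the text
SALT_BYTES = 16       # assumption: the salt length is not given in the text


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is drawn for new users."""
    if salt is None:
        salt = os.urandom(SALT_BYTES)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, expected)
```

Only the salt and digest would need to be stored, which is consistent with the hashes never being exposed to clients.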

Rust WASM Engine (fallback)

The original biomarker engine — written in Rust, compiled to WebAssembly via wasm-pack, loaded as an ES module inside the Cloudflare Worker. Now serves as the fallback path when the primary Python FastAPI engine on biomarker.avlynor.com is unreachable. Returns a strictly narrower JSON shape with the same field names so downstream consumers don't need to special-case the fallback. Latency ~50 ms vs the Python engine's 18-25 s, but the feature set is much narrower (no Whisper, no WavLM, no audeering, no trained classifiers, no continual learning).

Entry Points

pub fn analyze_audio(samples_i16: &[i16], sample_rate: u32) -> String
pub fn analyze_audio_typed(samples_i16: &[i16], sample_rate: u32, recording_type: &str) -> String

Accepts raw PCM samples and sample rate. analyze_audio_typed additionally takes a recording type ("speech" or "breathing") and tunes the envelope window accordingly. Returns a JSON-encoded BiomarkerReport.

Signal Quality & Preprocessing

Duration Gate — Recordings shorter than 1.5 seconds are rejected outright rather than analyzed with poor statistics.
Signal Quality Gate — Computes SNR from quartile energy ratios. Recordings with SNR below threshold are rejected with a "record in a quieter environment" message instead of producing misleading results.
Clipping Gate — Recordings with more than 1% of samples saturated at the digital ceiling are rejected. The threshold is fraction-based rather than max-sample so a single peak does not invalidate an otherwise good recording.
VAD-style Silence Stripping — Splits audio into 20ms frames, computes adaptive noise floor at the 20th percentile energy, and additionally checks per-frame ZCR. Frames with high ZCR (fricatives, breath noise) are dropped along with silence. This isolates clean voiced speech for downstream pitch and quality metrics.
Confidence Scoring — 0 to 1 composite: 30% signal quality + 25% recording duration + 25% active speech ratio + 20% pitch detection hit rate. Shown to patients as High, Moderate, or Low badge.
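The duration gate, clipping gate, and confidence composite described above can be sketched as follows. This is Python for illustration (the engine itself is Rust), the function names are hypothetical, and the four confidence inputs are assumed to be pre-scaled to the 0 to 1 range, which the text does not spell out. Badge cut-offs follow the Biomarker Metrics Reference (< 0.4 Low, 0.4 to 0.7 Moderate, > 0.7 High).

```python
MIN_DURATION_S = 1.5      # duration gate from the text
MAX_CLIP_FRACTION = 0.01  # clipping gate: more than 1% saturated samples
INT16_CEILING = 32767     # digital ceiling for 16-bit PCM


def gate_recording(samples_i16, sample_rate):
    """Return a rejection reason, or None when the recording passes both gates."""
    duration_s = len(samples_i16) / sample_rate
    if duration_s < MIN_DURATION_S:
        return "too_short"
    clipped = sum(1 for s in samples_i16 if abs(s) >= INT16_CEILING)
    if clipped / len(samples_i16) > MAX_CLIP_FRACTION:
        return "clipped"
    return None


def confidence_score(signal_quality, duration_score, active_ratio, pitch_hit_rate):
    """Weighted 30/25/25/20 composite; inputs are clamped to 0..1 first."""
    parts = (signal_quality, duration_score, active_ratio, pitch_hit_rate)
    weights = (0.30, 0.25, 0.25, 0.20)
    clamped = [min(1.0, max(0.0, p)) for p in parts]
    return sum(w * p for w, p in zip(weights, clamped))


def confidence_badge(score):
    """Map the composite score to the badge shown to patients."""
    if score < 0.4:
        return "Low"
    if score <= 0.7:
        return "Moderate"
    return "High"
```

The SNR gate is left out because the text does not give its threshold or the exact quartile formula.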

Signal Processing Pipeline

RMS Energy — Root mean square of silence-stripped samples. Detects fatigue (low energy).
Zero-Crossing Rate — Frequency of sign changes on active speech. Detects breathy or labored speech.
Breathing Rate — 200ms energy envelope, moving-average low-pass smoothing, peak detection with hysteresis (1.2x and 0.8x thresholds). The smoothing step separates real breathing rhythm from speech cadence.
YIN Pitch Detection — Implementation of the YIN algorithm (de Cheveigne and Kawahara, 2002), the standard for monophonic pitch estimation. Per-frame cumulative mean normalized difference function with parabolic interpolation around the period minimum. Substantially more accurate than basic autocorrelation: detects 200 Hz sine at 200.01 Hz.
Jitter — Mean absolute period-to-period frequency variation across YIN-extracted cycles, normalized by mean period. Clinical reference threshold 1.04% (Teixeira et al., 2013). Elevated in tremor and neurological conditions.
Shimmer — Mean absolute amplitude difference across consecutive voiced cycles, normalized by mean amplitude. Clinical reference threshold 3.81%. Elevated in laryngeal pathology and breathy voice.
HNR (Harmonics-to-Noise Ratio) — Computed as 10 * log10(r / (1 - r)) where r is the mean YIN voicing strength. Reported in dB. Healthy voice typically > 20 dB; values below 7 dB suggest dysphonia.
Mean Pitch and Voiced Fraction — Average fundamental frequency in Hz across all voiced cycles, plus the fraction of the recording where pitch could be reliably extracted.
Pitch Variability (CV) — Coefficient of variation across YIN-detected pitches. Detects vocal tremor.
Cough Detection — 30ms frames, sharp energy spikes (> 4x mean) followed by silence (< 0.5x mean within 150ms), plus a broadband check (frame ZCR > 0.20) to discriminate cough from sustained tones. Skip-ahead prevents double-counting. Note: this spike + ZCR heuristic reached 13.8% sensitivity on Coswara, and 20.7% with the high-pass spectral check now shipping in WASM (see Validation). Path to ~92% is YAMNet integration, see Roadmap.
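The jitter, shimmer, and HNR definitions in the list above reduce to a few lines of arithmetic. The sketch below is illustrative Python, not the Rust source: it assumes the per-cycle periods and peak amplitudes have already been extracted by the pitch tracker, and the clamping of r away from 0 and 1 is an added safeguard that the text does not mention.

```python
import math


def relative_perturbation(values):
    """Mean absolute difference of consecutive values, divided by their mean."""
    if len(values) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_value = sum(values) / len(values)
    if mean_value == 0:
        return 0.0
    return (sum(diffs) / len(diffs)) / mean_value


def jitter(periods_s):
    """Cycle-to-cycle period variation as a ratio (flag threshold 0.0104)."""
    return relative_perturbation(periods_s)


def shimmer(peak_amplitudes):
    """Cycle-to-cycle amplitude variation as a ratio (flag threshold 0.0381)."""
    return relative_perturbation(peak_amplitudes)


def hnr_db(voicing_strength):
    """10 * log10(r / (1 - r)); r is clamped so the log stays finite."""
    r = min(max(voicing_strength, 1e-6), 1.0 - 1e-6)
    return 10.0 * math.log10(r / (1.0 - r))
```

With r = 0.5 the harmonic and noise parts are equal and the formula gives 0 dB; values of r close to 1 push the result toward the > 20 dB range quoted for healthy voices.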

Rich Summary Generation

Instead of bare flag names, summaries include actual values and normal ranges. Examples:

"Breathing rate is 28/min (normal range: 12–20/min). 3 cough events detected. Consider contacting your care team."
"Voice biomarkers are within normal ranges." (with confidence note if recording quality was moderate)

Building

cd biomarker
wasm-pack build --target web --out-dir ../worker/wasm --release
# Output: tether_biomarker_bg.wasm (~91 KB) + JS bindings

Biomarker Metrics Reference

Core signals

Metric	Range	Flag Threshold	Clinical Significance
Energy (RMS)	0 – 1	< 0.015	Low energy suggests fatigue or weakness
Zero-Crossing Rate	0 – 1	> 0.3	High ZCR indicates breathy or labored speech
Breathing Rate	BPM	> 24	Tachypnea, elevated respiratory rate (normal: 12 to 20)
Pitch Variability (CV)	0 – 1	> 0.35	High variation suggests vocal tremor
Cough Events	Count	≥ 3	Frequent coughing in a short sample
Confidence	0 – 1	N/A	Composite of SNR, duration, active speech ratio, and pitch detection hit rate. < 0.4 = Low, 0.4 to 0.7 = Moderate, > 0.7 = High

Clinical voice quality (new)

These are the same metrics used by Praat, the academic reference tool for voice biology. Thresholds drawn from Teixeira et al. (2013) and the GRBAS scale.

Metric	Range	Flag Threshold	Clinical Significance
Jitter	0 – 1 (ratio)	> 0.0104 (1.04%)	Period-to-period frequency variation. Elevated in tremor, vocal fold pathology, neurological conditions.
Shimmer	0 – 1 (ratio)	> 0.0381 (3.81%)	Amplitude variation across cycles. Elevated in laryngeal pathology, breathy or hoarse voice.
HNR (dB)	-30 – 60	< 7 dB	Harmonics-to-noise ratio. Low values indicate raspy, breathy, or aphonic voice. Healthy voice typically > 20 dB.
Mean Pitch (Hz)	0 – 2000	Reference only	Average fundamental frequency. Typical adult male: 85 to 180 Hz. Typical adult female: 165 to 255 Hz.
Voiced Fraction	0 – 1	Reference only	Proportion of the recording where the engine detected voiced (pitched) speech. < 0.3 suggests whispering, dysphonia, or microphone failure.

Status Logic

Flags Triggered	Status	Meaning
0	Normal	No concerning patterns detected
1	Monitor	One metric outside normal range, worth watching
2+	Alert	Multiple flags, consider contacting care team
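The threshold tables and the status mapping above can be combined into a small rule evaluator. This is a sketch under assumptions: the dictionary keys are hypothetical field names, and counting the clinical voice quality thresholds alongside the core signals is a guess, since the text does not say which metrics contribute to the flag count.

```python
# (direction, limit) pairs copied from the two threshold tables.
THRESHOLDS = {
    "energy_rms": ("below", 0.015),
    "zero_crossing_rate": ("above", 0.3),
    "breathing_rate_bpm": ("above", 24),
    "pitch_variability": ("above", 0.35),
    "cough_events": ("at_least", 3),
    "jitter": ("above", 0.0104),
    "shimmer": ("above", 0.0381),
    "hnr_db": ("below", 7.0),
}


def triggered_flags(metrics):
    """Return the names of metrics that cross their flag threshold."""
    flags = []
    for name, (direction, limit) in THRESHOLDS.items():
        if name not in metrics:
            continue  # metrics the engine did not report are skipped
        value = metrics[name]
        if direction == "below" and value < limit:
            flags.append(name)
        elif direction == "above" and value > limit:
            flags.append(name)
        elif direction == "at_least" and value >= limit:
            flags.append(name)
    return flags


def status_for(flag_count):
    """0 flags is Normal, 1 is Monitor, 2 or more is Alert."""
    if flag_count == 0:
        return "Normal"
    if flag_count == 1:
        return "Monitor"
    return "Alert"


if __name__ == "__main__":
    sample = {"breathing_rate_bpm": 28, "cough_events": 3, "energy_rms": 0.05}
    print(status_for(len(triggered_flags(sample))))
```

The sample mirrors the first rich summary example (28 breaths/min and 3 coughs), which trips two flags.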

Licensed third-party integrations

Tether's biomarker pipeline integrates six externally validated components. Each is used here under direct license from its original maintainer (verified in writing) or under the open license of the upstream paper/dataset (algorithms cited, code clean-room re-implemented).

Source	What we use
kind-lab/voice-biomarker-fhir	FHIR R4 profiles for voice biomarker output (NIH Bridge2AI VBAI initiative)
LIHVOICE/Predi_COVID_Fatigue_Vocal_Biomarker	COVID-fatigue biomarker methodology + Predi-COVID dataset (LIH Luxembourg; Tether's deployed v4 model trained on n=1689 recordings across 206 patients, patient-grouped CV)
Ashindustry007/Vocal-Biomarker-ICBHI-final-database	ICBHI 2017 respiratory sound classifier methodology (920 lung recordings, 6 diseases)
ThanasisTsanas/VoiceAnalysisToolbox + UCI Parkinson's voice dataset	PPE, RPDE, DFA, GNE features + UCI Parkinson's classifier
Shahabks/my-voice-analysis	Articulation rate, syllable boundary detection, F0 statistics (Praat-backed)
SYSTRAN/faster-whisper	CTranslate2-backed Whisper inference for transcription + disfluency analysis (tiny.en model, ~75 MB)

Validation harnesses for each ship with the repo and are reproducible end-to-end:

python3 scripts/validate_biomarker.py     # Coswara (IISc Bangalore, public)
python3 scripts/validate_predicovid.py    # Predi-COVID (LIH-VOICE)
python3 scripts/validate_icbhi.py         # ICBHI 2017 (BHI Challenge)
python3 scripts/compare_engines.py        # A/B WASM vs VPS engine
python3 scripts/train_parkinsons_uci.py   # train UCI Parkinson's classifier (free, local)
python3 scripts/train_coughvid_modal.py   # fine-tune YAMNet on COUGHVID (paid, $30-50 Modal)

Validation

The engine has been benchmarked against the Coswara dataset (Indian Institute of Science, Bangalore) using a randomly sampled batch of 29 patient recordings (cough-heavy and sustained vowel-a). Validation script lives at scripts/validate_biomarker.py and runs against the deployed analyze endpoint.

Pitch detection (sine reference)

A 200 Hz reference sine wave is detected at 200.01 Hz, a relative error of about 0.005% (~99.99% accuracy on this synthetic tone).

Cough detection (Coswara, n=29)

Detector	Sensitivity	False-positive rate	Notes
WASM v1: Energy spike + ZCR (deprecated)	13.8%	0.0%	Original heuristic.
WASM v2: Spike + ZCR + first-order high-pass spectral check	20.7%	0.0%	Currently shipping in production WASM. +50% relative recall, zero specificity loss.
VPS v2: YAMNet (Google AudioSet) + per-cough characterization	82.8%	0.0%	Standalone benchmark via `scripts/compare_engines.py`. 4× lift over WASM.
Dual engine (WASM + VPS ensemble)	~88% projected	0.0%	Activates when `BIOMARKER_ENGINE_URL` secret is set on the worker.
Dual + COUGHVID fine-tune + multi-recording median	~95% projected	< 1%	$30-50 one-time Modal training run on the COUGHVID dataset, plus three-recording median capture mode.

Parkinson's disease screening — honest patient-grouped CV

Calibrated stacking ensemble (RF + GBM + XGBoost + LightGBM + LR meta-learner) trained on the combined UCI Parkinson's + UCI Telemonitoring datasets (n=6,070 recordings from 74 patients). Patient-grouped 5-fold cross-validation, bootstrap 95% confidence intervals:

Metric	Value	95% CI
Balanced accuracy	76.2%	70.8 – 81.0%
Sensitivity	67.0%	65.8 – 68.1%
Specificity	85.4%	75.0 – 93.8%
ROC AUC	78.3%	70.6 – 85.0%
F1	80.2%	79.4 – 81.0%
Calibration (Brier)	0.008	—

Correction notice: we previously published 92.3% accuracy, 98.6% sensitivity, 96.2% AUC on this dataset using stratified-random CV. Those numbers were leakage artifacts — the same patient appeared in both train and test folds. Patient-grouped CV (no patient crossover) reduces the honest accuracy to the figures above. Anyone claiming >90% on UCI Parkinson's with random-split CV is doing the same thing.
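The difference between random-split and patient-grouped CV comes down to how recordings are assigned to folds. The sketch below shows one simple way to build patient-grouped folds in plain Python. It is not the project's training script: it balances fold sizes greedily, does not stratify by label, and the function name is hypothetical.

```python
from collections import defaultdict


def grouped_folds(patient_ids, n_folds=5):
    """Yield (train_idx, test_idx) with every patient confined to one fold."""
    by_patient = defaultdict(list)
    for index, patient in enumerate(patient_ids):
        by_patient[patient].append(index)

    # Place patients with the most recordings first, each into the fold
    # that currently holds the fewest recordings, to keep folds balanced.
    folds = [[] for _ in range(n_folds)]
    ordered = sorted(by_patient, key=lambda p: (-len(by_patient[p]), str(p)))
    for patient in ordered:
        smallest = min(folds, key=len)
        smallest.extend(by_patient[patient])

    for k in range(n_folds):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, test


if __name__ == "__main__":
    ids = ["a", "a", "a", "b", "b", "c", "d", "d", "e", "f"]
    for train, test in grouped_folds(ids, n_folds=3):
        print(sorted({ids[i] for i in test}))
```

A stratified-random split would instead shuffle individual recordings, so several recordings from the same patient could land on both sides of the split, which is the leakage described in the correction notice.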

Safety gates added 2026-05-21: (1) a plausibility gate refuses to classify biologically implausible signals (sine waves, synthesised tones); (2) a corroborating-marker gate downgrades any "high confidence" classifier output to inconclusive unless at least one independent motor-speech marker is present (tremor 3-7 Hz, bradylalia, monotone, or long pauses). Both gates close out the "confident false positive on a healthy voice" failure mode we caught in live testing on Coswara samples.

Model + training script: biomarker-engine/parkinsons_classifier.py; response field report.parkinsons_classifier.

Voice quality (vowel-a) grouped by COVID status

Status	n	Jitter	Shimmer	HNR (dB)	Mean Pitch (Hz)
healthy	26	0.026	0.040	12.9	121.8
no_resp_illness_exposed	2	0.006	0.047	18.0	194.6
resp_illness_not_identified	1	0.002	0.014	18.9	112.8

Mean pitch for the healthy cohort (121.8 Hz) falls within the typical adult male range (85 to 180 Hz). In this sample the healthy group shows higher jitter and lower HNR than the two other groups, but with n=2 and n=1 those groups are too small to support any trend. Absolute jitter in the healthy cohort (0.026) is above the published clinical threshold (0.0104) because Coswara is home-recorded smartphone audio, not clinic-grade. This is a recording-condition floor that controlled capture would address.

Foundation-model benchmarks — independent datasets, patient-grouped (June 2026)

We ran Google's HeAR health-acoustic foundation model (alongside the speech models HuBERT and WavLM) as frozen feature extractors over independent, clinically-labelled public datasets, with a logistic-regression head under patient-grouped StratifiedGroupKFold (no speaker appears in both train and test fold) and bootstrap 95% CIs. These are research-validation results: they establish which signals are real and which model captures them. They are not yet wired into the live engine — HeAR is a 1.2 GB TensorFlow model, so engine integration is a separate, planned step.

Condition	Dataset	Best model	ROC-AUC (95% CI)	Bal. acc.	N (speakers)
Respiratory (PCR COVID status)	Coswara	HeAR	0.75 (0.72–0.79)	0.68	2,953
Voice pathology (healthy vs disordered)	Saarbrücken (SVD)	HeAR	0.76 (0.72–0.79)	0.69	1,362 (1,182)
Voice pathology	VOICED	HeAR	0.70 (0.57–0.81)	0.65	100
Parkinson's (read + spontaneous speech)	MDVR-KCL	HeAR	0.93 (0.84–1.00)	0.79	73 (38)
Depression (read speech)	Androids	WavLM	0.95 (0.91–0.99)	0.88	112
Parkinson's (vowels + speech)	Italian PVS	HeAR	0.97 (0.87–1.00)	0.96	831 (61)
Dysarthria	TORGO	HeAR	0.56 — chance (only 15 speakers)	0.56	798 (15)

What this establishes. HeAR is decisively the best backbone for acoustic-health signals (respiratory, voice pathology, and — on a small sample — Parkinson's). For depression the prosody-oriented speech model WavLM wins (AUC 0.95; a HeAR+HuBERT+WavLM fusion reached 0.96), because depression lives in prosody and affect rather than vocal-tract acoustics. Multi-backbone fusion only helped on the balanced set (Androids) and hurt on the small imbalanced one (VOICED: 0.66 fused vs 0.70 HeAR-alone) — so we use the single best backbone per condition, not blanket fusion.

Honest caveats — these matter as much as the wins:

Cross-dataset generalisation is the unsolved problem — now confirmed across two conditions. A HeAR probe trained on SVD and tested on VOICED collapses to chance (AUC 0.49–0.57). The same holds for Parkinson's: trained on MDVR-KCL → tested on Italian is chance (AUC 0.52); Italian → MDVR reaches only 0.76. So the strong single-dataset numbers (dysphonia 0.76, PD 0.93–0.97) are largely dataset-specific — they do NOT transfer to a different cohort/device/language. In-dataset ≠ deployable. This is precisely why protocol-matched, multi-site data (Bridge2AI) is the only credible path to a model that ships.
Parkinson's (0.93) and depression (0.95) are single-dataset, small-N (73 and 112 recordings, wide CIs). Promising, not settled — each needs a second independent cohort (PC-GITA / mPower for Parkinson's; DAIC-WOZ for depression) before it can be trusted as deployable.
Crowdsourced self-report labels are noise. COUGHVID "I think I have COVID" labels yield chance (AUC ~0.55) for every model — confirming it is the labels, not the model, that cap accuracy.

Reproducible scripts: biomarker-engine/scripts/{hear_respiratory,svd_voice_pathology,open_datasets_campaign,max_accuracy_campaign}.py; raw outputs in biomarker-engine/validation_results/.

Shipped to production: experimental depression screening (WavLM)

The first of these validations crossed into the live engine (2026-06). A calibrated logistic head over the engine's existing WavLM-base-plus embedding (zero new model, MIT-licensed, reuses the vector already computed) now emits a research.depression_wavlm score on connected-speech tasks. It is additive — it does not feed the Healthy Voice Index or any condition score.

Augmentation-robustified (the honest part). The head was trained on Androids reading-task audio + 4× audio augmentation (noise / pitch / speed / gain). Evaluated patient-grouped on degraded audio — the real phone-recording condition users actually submit:

Head	on pristine audio	on degraded / phone-like audio
clean-trained	0.93	0.74
augmentation-trained (deployed)	0.89	0.86

Augmentation buys +0.12 AUC on realistic degraded audio (which is what matters in the field) for −0.04 on pristine studio audio (which the app never produces). Caveat: still validated in-dataset only (Androids / Italian); augmentation adds recording-condition robustness, not language/cohort generalisation. Surfaced as experimental screening, never diagnosis. Code: biomarker-engine/depression_wavlm.py; artifact models/depression_wavlm_head.joblib.

Limitations

Not a diagnostic device. Tether is decision-support / screening / research tooling. It has no FDA clearance and no CE mark yet (the architecture is built toward MDR, but "MDR-ready" ≠ certified). Nothing it outputs should be used as the sole basis for a clinical decision.

Accuracy boundaries

Parkinson's (0.76 BAcc under patient-grouped CV) is screening-grade on the UCI datasets; the original UCI Parkinson's set has only 31 distinct patients. Not yet validated on an external, prospectively collected cohort.
Fatigue (0.60 BAcc) is research-grade. The v5 / multi-modal numbers quoted elsewhere are projected from the literature, not yet Tether-measured on held-out data — anything marked "projected" or "reference" has not been independently confirmed on our pipeline.
Respiratory & Alzheimer's modules are largely rule-based, derived from published clinical thresholds rather than trained on a Tether-owned labelled cohort.
Cough/voice → COVID does not replicate at the AUC-0.9+ levels claimed in 2020–21 papers (those were largely recruitment-bias artifacts). We report subject-grouped numbers with a built-in confound probe and do not claim COVID diagnosis from voice.
Foundation-model signals are in-dataset, not yet cross-dataset. The June-2026 HeAR/WavLM benchmarks (Validation section) show real patient-grouped signal — voice pathology 0.76, Parkinson's 0.93, depression 0.95 — but cross-dataset dysphonia (train SVD → test VOICED) collapses to chance (AUC ~0.5). Treat each single-dataset AUC as an upper bound pending external-cohort replication; the Parkinson's and depression figures are also small-N (73 / 112).

Signal sensitivities

Recording conditions — microphone, codec, room noise, distance. We normalise (EBU R128 LUFS) and gate on SNR, but home smartphone audio has a higher noise floor than clinic capture.
Single-clip variance — one recording is noisy; the signal is reliable as a trend against the patient's own baseline over repeated recordings, which is how the product uses it.
Language, accent, age, sex shift voice features. We apply demographic-adjusted thresholds + multi-language reference profiles; per-subgroup accuracy auditing is on the roadmap.

Operational limits

Latency & scale — ~18–25 s per clip on a single CPU VPS (~3–4 concurrent). Fine for the mobile app, research, and prototyping; not yet high-volume production-grade.
Privacy nuance — "no audio leaves the phone" applies to the on-device Rust-WASM mobile path. The server engine + API (biomarker.avlynor.com) receive audio over TLS; for clinical /analyze calls it is processed in memory and the raw waveform is not persisted. The one exception is the opt-in public research survey (/contribute): there, with explicit consent, the raw audio is retained — anonymously, linked only to survey answers + age/sex/language, never to a name, email, or device identifier — so the corpus can be re-analysed as the engine improves.

VPS Biomarker Engine v2

The VPS engine is a Python FastAPI service that runs on a private VPS and is called by the Cloudflare worker when configured. It is a strict accuracy upgrade over the in-worker WASM engine: same JSON-shape contract, much richer pipeline. The worker falls back to the WASM engine automatically on any failure, so a VPS outage degrades a check-in to the narrower WASM report instead of failing it.

Why a second engine

Cloudflare Workers cap the bundle at 3 MB on the free tier (10 MB on Paid) and CPU per request at 10 ms (30 s on Paid). That is enough for pure DSP, but not enough for full ML inference plus the academic-reference Praat algorithms. The VPS engine has no such limits.

Feature inventory (v2.9.0, ~170 features per recording)

v2.9 changes (current): shipped a calibrated stacking ensemble (RF + GBM + XGBoost + LightGBM + LR meta-learner) trained on UCI Parkinson's + UCI Telemonitoring combined (n=6,070 recordings). Honest patient-grouped 5-fold CV: BAcc 76.2% [70.8–81.0], AUC 78.3% [70.6–85.0], sensitivity 67.0%, specificity 85.4%, Brier 0.008. The earlier "92.3% accuracy" claim was a stratified-random-split leakage artifact and has been retracted. Two safety gates added: a plausibility gate rejects biologically implausible feature vectors (jitter <0.05%, shimmer <1%, HNR >35 dB), and a corroboration gate downgrades "high confidence" to inconclusive unless at least one independent motor-speech marker (tremor in 3-7 Hz band, bradylalia, monotone speech, long pauses) is also present. WavLM-base-plus + audeering wav2vec2-large emotion model integrated into the fatigue v5 ensemble. EBU R128 LUFS normalisation per-task. Continual-learning online threshold tuning with forward-only guardrail (rejects deployments that don't beat the default by ≥0.5 percentage points). Audit-chain verification via /admin/audit/verify. API-key issuance via /admin/keys.

v2.6 history: added faster-whisper transcription (CTranslate2-backed tiny.en model, ~75 MB, ~3x real-time on CPU) and a disfluency analysis layer extracting filled pauses, word repetitions, stutter patterns, hedge words, lexical diversity, and pause-to-speech ratios. Composite scores: stress / anxiety, cognitive load, vocal aging index. Non-intrusive speech intelligibility (SRMR, Falk et al. 2010). Multi-language voice reference profiles for English, Spanish, French, Hindi, Mandarin, Arabic with per-language F0 reference ranges.

v2.5 history: first deployed Parkinson's classifier (single-model RF before the stacking ensemble in v2.9).

v2.4 history: integrated Shahabks/my-voice-analysis (MIT licensed), a Praat-backed Python wrapper. Articulation rate, syllable boundary detection, F0 statistics, speaking-vs-articulation rate distinction. Output under the myvoice key.

v2.3 history: added five nonlinear voice features re-implemented from Tsanas et al. 2011 (J Royal Soc Interface) and Little et al. 2007 (BioMed Eng OnLine), clean-room Python (no GPL contamination): PPE, RPDE, DFA, GNE, MFCC delta + delta-delta. Power the neurological_signs module.

v2.2 history: spectral-subtraction noise reduction (Sainburg et al. 2020) applied before voice quality extraction; neural-style VAD using spectral-flux thresholding; /analyze_multi endpoint accepts 2-5 recordings and merges by median+majority+max-risk for 40% variance reduction; thresholds recalibrated for consumer phone audio against Coswara healthy-cohort distributions.
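As an illustration of the median + majority + max-risk merge mentioned for /analyze_multi, a minimal sketch might look like this. The field names (status, risk), the risk labels, and the choice of which field gets which rule are all hypothetical; the text only names the three merge rules and the 2-5 recording limit.

```python
import statistics
from collections import Counter

RISK_ORDER = ["low", "moderate", "high"]  # hypothetical labels, lowest first


def merge_reports(reports, numeric_keys, label_key="status", risk_key="risk"):
    """Merge 2-5 per-recording reports into a single report."""
    if not 2 <= len(reports) <= 5:
        raise ValueError("expected between 2 and 5 recordings")
    # Median: one outlier recording cannot drag a numeric metric.
    merged = {key: statistics.median(r[key] for r in reports) for key in numeric_keys}
    # Majority vote for the categorical status label.
    merged[label_key] = Counter(r[label_key] for r in reports).most_common(1)[0][0]
    # Max-risk: keep the most severe level seen in any recording.
    merged[risk_key] = max((r[risk_key] for r in reports), key=RISK_ORDER.index)
    return merged


if __name__ == "__main__":
    demo = [
        {"jitter": 0.010, "status": "Normal", "risk": "low"},
        {"jitter": 0.030, "status": "Monitor", "risk": "high"},
        {"jitter": 0.012, "status": "Normal", "risk": "moderate"},
    ]
    print(merge_reports(demo, ["jitter"]))
```

Taking the median of numeric metrics is what would plausibly drive the variance reduction, while max-risk keeps the merge conservative on safety-relevant fields.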

Every numerical field below is computed defensively: any single failed feature returns 0.0 and does not break the rest of the pipeline. Citations point to the canonical reference for each algorithm, because this is the engine cited in patent prosecution and clinical validation papers.

1. Praat voice quality (clinical reference algorithms)

Feature	Description	Healthy reference	Citation
`mean_pitch_hz`	Average fundamental frequency from autocorrelation pitch tracker	Adult male 85-180; adult female 165-255	Boersma 1993
`pitch_variability`	Coefficient of variation of voiced-frame F0	< 0.35	—
`voiced_fraction`	Fraction of recording with detectable pitch	> 0.3 for speech	—
`jitter` (local)	Period-to-period frequency variation	< 0.0104 (1.04%)	Teixeira et al. 2013
`shimmer` (local)	Amplitude variation across cycles	< 0.0381 (3.81%)	Teixeira et al. 2013
`hnr_db`	Harmonics-to-noise ratio (cross-correlation method)	> 7 dB; healthy voice > 20 dB	Boersma 1993
`cpps_db`	Cepstral Peak Prominence Smoothed, the single most validated acoustic marker of dysphonia	> 14 dB	Maryn et al. 2010, Heman-Ackah 2014
`formant_f1_hz`	First formant: tongue height (vowel openness)	Vowel-dependent	Hillenbrand 1995
`formant_f2_hz`	Second formant: tongue front/back position	Vowel-dependent	Hillenbrand 1995
`formant_f3_hz`	Third formant: lip rounding, speaker identity marker	—	Hillenbrand 1995
`vowel_space_area`	F1 × F2 / 1000 approximation; reduced in Parkinson's, dysarthria, ALS	> 100 for healthy adult speech	Skodda 2011

2. YAMNet event classification (Google AudioSet, 521-class)

YAMNet runs on the full recording (not just voiced segments) so we catch coughs that occur in silence, sneezes between words, and breath events. Maximum confidence per tracked class is reported. Cough events are counted via contiguous-run grouping above a 0.25 threshold: each 0.48 s YAMNet frame contributes at most one event, and a run of consecutive above-threshold frames collapses to a single event.

Field	YAMNet class	Clinical relevance
`yamnet_cough_score`	Cough	Primary cough detector
`yamnet_throat_score`	Throat clearing	Mucus, irritation, vocal hyperfunction
`yamnet_sneeze_score`	Sneeze	Allergic / infectious indicator
`yamnet_breathing_score`	Breathing	Audible breath effort
`yamnet_wheeze_score`	Wheeze	Bronchospasm, asthma exacerbation
`yamnet_snoring_score`	Snoring	Sleep-disordered breathing
`yamnet_gasp_score`	Gasp	Acute respiratory event
`yamnet_speech_score`	Speech	Quality gate: confirms recording is speech
`yamnet_whisper_score`	Whispering	Dysphonia, fatigue, aphonia
`yamnet_sigh_score`	Sigh	Respiratory pattern marker
`cough_events`	—	Integer count of distinct cough events
`cough_events_detail`	—	Array of per-cough records: peak amplitude, duration ms, spectral centroid, bandwidth, classified type (dry / mixed / wet), YAMNet confidence
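The contiguous-run cough counting described at the top of this section can be sketched in a few lines. The input is assumed to be the per-frame YAMNet cough-class scores in time order; the function names are hypothetical, and the per-cough characterization (amplitude, spectral centroid, dry/wet type) is not covered.

```python
COUGH_THRESHOLD = 0.25  # threshold stated in the text


def count_cough_events(frame_scores, threshold=COUGH_THRESHOLD):
    """Count runs of consecutive frames whose cough score exceeds the threshold."""
    events = 0
    in_run = False
    for score in frame_scores:
        if score > threshold:
            if not in_run:
                events += 1  # a new run starts here
            in_run = True
        else:
            in_run = False
    return events


def max_class_score(frame_scores):
    """Maximum confidence over the recording, as reported per tracked class."""
    return max(frame_scores, default=0.0)


if __name__ == "__main__":
    scores = [0.10, 0.30, 0.60, 0.20, 0.05, 0.40, 0.10]
    print(count_cough_events(scores), max_class_score(scores))
```

In the demo, frames 2-3 form one run and frame 6 forms another, so two events are counted even though three frames exceed the threshold.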

3. Spectral features (librosa)

Feature	Description
`spectral_centroid_hz`	Brightness; where the spectral mass is
`spectral_rolloff_hz`	Frequency below which 85% of energy lives
`spectral_flatness`	Geometric/arithmetic mean ratio; tonal vs noisy
`spectral_bandwidth_hz`	Spectral spread around the centroid
`spectral_entropy`	Information density of the spectrum; pathological voice has higher entropy
`spectral_contrast`	7-band valley-to-peak ratio; distinguishes tonal from broadband segments
`mfcc_means`, `mfcc_stds`	13 Mel-frequency cepstral coefficients (mean and standard deviation across frames); the de-facto ML feature set for speech

4. Voice tremor analysis

Pathological tremor (Parkinson's, essential tremor, dystonia) shows up as strong amplitude modulation of the speech envelope in the 3-12 Hz band. The engine takes an FFT of the amplitude envelope, sampled at 50 Hz, and reports the dominant in-band frequency plus a normalized index.

Feature	Description
`voice_tremor_hz`	Dominant tremor frequency in 3-12 Hz band
`voice_tremor_index`	Tremor-band energy / total envelope energy; healthy < 0.15
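The two tremor fields can be illustrated with a small spectral sketch. It assumes the amplitude envelope is sampled at 50 Hz, removes the mean before the transform so the DC term does not dominate the total, and uses a plain DFT to stay dependency-free; the engine's own envelope extraction and normalization may differ.

```python
import cmath
import math

ENVELOPE_RATE_HZ = 50.0  # assumption: envelope sample rate
BAND_HZ = (3.0, 12.0)    # tremor band from the text


def tremor_features(envelope):
    """Return (dominant in-band frequency in Hz, band energy / total energy)."""
    n = len(envelope)
    if n < 2:
        return 0.0, 0.0
    mean = sum(envelope) / n
    centred = [x - mean for x in envelope]  # drop DC (assumption)

    total = band = peak_power = peak_hz = 0.0
    for k in range(1, n // 2 + 1):  # positive-frequency bins only
        freq = k * ENVELOPE_RATE_HZ / n
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(centred)
        )
        power = abs(coeff) ** 2
        total += power
        if BAND_HZ[0] <= freq <= BAND_HZ[1]:
            band += power
            if power > peak_power:
                peak_power, peak_hz = power, freq
    index = band / total if total > 0 else 0.0
    return peak_hz, index


if __name__ == "__main__":
    # Four seconds of envelope with a 5 Hz amplitude modulation.
    demo = [1.0 + 0.5 * math.sin(2 * math.pi * 5.0 * i / 50.0) for i in range(200)]
    print(tremor_features(demo))
```

A real implementation would use an FFT library; the direct DFT here is quadratic in the envelope length, which is acceptable for a few hundred envelope samples.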

5. Speech rate and pause analysis (De Jong & Wempe 2009)

Feature	Description
`speech_rate_syl_per_sec`	Syllable nuclei per second; reduced in Parkinson's, depression, fatigue
`mean_pause_ms`	Mean duration of pauses > 200 ms
`longest_pause_ms`	Longest single pause in the recording
`voiced_segments`	Number of distinct voiced segments

6. GRBAS perceptual rating estimation (Hirano 1981)

GRBAS is the global voice quality scale used by speech-language pathologists worldwide. Each dimension is rated 0 (normal) to 3 (severe). The engine estimates each from acoustic features (regression mappings from Yu et al. 2001, Bhuta et al. 2004). These are estimates intended as a friendly summary; the underlying numbers are the ground truth.

Field	Dimension	Maps from
`grbas_grade`	Overall severity	composite of R, B, A, S
`grbas_roughness`	Aperiodicity	jitter, shimmer
`grbas_breathiness`	Air turbulence	HNR (inverse), CPPS (inverse)
`grbas_asthenia`	Voice weakness	energy, pitch range
`grbas_strain`	Hyperfunction	pitch CV, jitter

7. Per-patient baseline z-scores (SQLite store)

When a recording is submitted with an optional patient_id, the engine persists the reading into a server-side SQLite store (cap 30 readings per metric per patient) and scores the current reading against the patient's own historical distribution. Tracked metrics: energy, breathing rate, pitch variability, jitter, shimmer, HNR, CPPS, mean pitch, voiced fraction, spectral centroid/rolloff/flatness/entropy, speech rate, voice tremor index, F1, F2, vowel space area. The first three recordings establish the baseline; from then on every metric returns a z-score and the response includes a one-number deviation_score in [0, 1] summarizing how far the recording is from this patient's normal.

Field	Description
`baseline_z_scores`	Per-metric `{mean, std, n, z}` against patient's recent history
`baseline_history`	Per-metric count of samples already stored for this patient
`deviation_score`	0-1 summary: mean absolute z across baselined metrics, scaled so \|z\|=3 maps to 1.0
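The baseline scoring described above can be sketched as follows. The constants come from the text (three recordings to establish a baseline, 30 readings kept per metric, |z| = 3 mapping to 1.0). The use of a population standard deviation, the neutral 0.0 returned for a constant history, and the clipping of the summary at 1.0 are assumptions.

```python
import statistics

MIN_BASELINE = 3  # the first three recordings establish the baseline
MAX_HISTORY = 30  # cap of 30 readings per metric per patient


def z_score(history, value):
    """Score a new reading against the patient's own recent history."""
    recent = history[-MAX_HISTORY:]
    if len(recent) < MIN_BASELINE:
        return None  # baseline not established yet
    mean = statistics.fmean(recent)
    std = statistics.pstdev(recent)  # assumption: population SD
    if std == 0:
        return 0.0  # constant history; neutral placeholder (assumption)
    return (value - mean) / std


def deviation_score(z_scores):
    """Mean absolute z over baselined metrics, scaled so |z| = 3 maps to 1.0."""
    usable = [abs(z) for z in z_scores if z is not None]
    if not usable:
        return 0.0
    return min(1.0, statistics.fmean(usable) / 3.0)


if __name__ == "__main__":
    print(z_score([10.0, 12.0, 14.0], 16.0), deviation_score([1.5, None]))
```

Metrics without a baseline yet are skipped in the summary, so a new patient's deviation score stays at 0.0 until enough recordings accumulate.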

8. Signal quality (always returned)

Field	Description
`snr`	Quartile-energy SNR estimate; quality gate requires > 0.005
`clip_frac`	Fraction of samples saturated at the digital ceiling (> 0.995 magnitude); rejection threshold 0.02
`dc_offset`	Mean of the signal; large values indicate a DC bias or hardware issue
`peak_amplitude`	Maximum absolute sample value
`confidence`	Weighted blend (0.30·SNR + 0.20·duration + 0.25·voicing + 0.25·pitch yield)
`elapsed_ms`	Per-recording analysis time in ms (for monitoring)
`feature_count`	Number of numerical features the engine computed for this recording
`engine`	Engine version signature, e.g. `vps-2.9.0`

9. openSMILE eGeMAPSv02 (88 academic-standard features)

The extended Geneva Minimalistic Acoustic Parameter Set is the most widely cited feature set in computational voice biology (Eyben et al. 2016). It is used in over 100 peer-reviewed papers on depression detection, Parkinson's screening, COVID-19 voice diagnosis, dementia screening, and emotion recognition. The 88 functionals come from 25 low-level descriptors aggregated across the recording: pitch, jitter (multiple definitions), shimmer (multiple definitions), HNR, formants 1-3 (frequency, bandwidth, amplitude), spectral flux, spectral slope, alpha ratio, Hammarberg index, loudness, voiced/unvoiced segment statistics. Returned under the egemaps key.

10. Tsanas nonlinear voice features (Parkinson's biomarkers)

Five nonlinear voice features re-implemented in clean-room Python from the publications of Tsanas (Oxford D.Phil) and Little (Aston University). These are the gold standard for Parkinson's voice biomarker research and reach 99% reported accuracy on Tsanas's clinic-quality datasets.

Feature	Meaning	Healthy range	Citation
`ppe` — Pitch Period Entropy	Tsanas's invented measure of pitch instability. Captures impairment of vocal pitch control.	0.10 - 0.20	Tsanas et al. 2011, JRSI
`rpde` — Recurrence Period Density Entropy	Quantifies how predictable / periodic the speech signal is.	0.30 - 0.50	Little et al. 2007, BioMed Eng Online
`dfa` — Detrended Fluctuation Analysis	Fractal scaling exponent of speech turbulence. Higher = more long-range correlated dynamics.	0.7 - 1.0 (Parkinson's > 1.0)	Peng 1994; applied to voice in Tsanas 2011
`gne` — Glottal-to-Noise Excitation Ratio	Maximum cross-correlation between Hilbert envelopes of multiple speech bandpasses. Estimates harmonic vs noise content of voiced signal.	> 0.5	Michaelis et al. 1997
`mfcc_delta_`, `mfcc_delta2_`	First and second temporal derivatives of MFCCs. Velocity and acceleration of spectral envelope.	Reference set	Furui 1986

11. UCI Parkinson's classifier (live, trained, validated, honestly reported)

Calibrated stacking ensemble (RF + GBM + XGBoost + LightGBM with LR meta-learner) trained on UCI Parkinson's + UCI Telemonitoring combined (n=6,070 recordings, 74 patients). The model uses the seven features that both datasets share at the per-recording level (mean F0, jitter, shimmer, HNR, RPDE, DFA, PPE).

Metric	Value	95% CI
Balanced accuracy	76.2%	70.8 – 81.0%
Sensitivity (correctly flag Parkinson's)	67.0%	65.8 – 68.1%
Specificity (correctly clear healthy)	85.4%	75.0 – 93.8%
ROC AUC	78.3%	70.6 – 85.0%
F1 score	80.2%	79.4 – 81.0%
Calibration (Brier)	0.008	—
Cross-validation	Patient-grouped 5-fold (no patient appears in both train and test fold)
Engine module	`biomarker-engine/parkinsons_classifier.py` — loaded at startup, sub-millisecond inference per request
Response field	`report.parkinsons_classifier`: `{available, probability, prediction, confidence, threshold, note?, model_metrics, feature_values}`

Correction notice. Earlier versions of this page listed 92.3% accuracy, 98.6% sensitivity, and 96.2% AUC. Those came from stratified-random 5-fold CV on the UCI dataset, where the same patient appears in both train and test folds (the Little 2007 dataset has only 31 distinct patients across 195 recordings, so the leak is severe). Patient-grouped CV is the only honest evaluation method for this data; the corrected metrics above are what the model actually delivers on a held-out patient. Anyone publishing >90% on UCI Parkinson's with random-split CV is reporting a leakage artifact.

Patient-safety gates (added 2026-05-21). Two independent guards sit in front of the classifier output: (1) a plausibility gate that refuses to classify biologically implausible signals — jitter < 0.05%, shimmer < 1%, HNR > 35 dB — which would otherwise produce 0.997+ probabilities on sine waves; (2) a corroboration gate that downgrades "high confidence" to inconclusive unless at least one independent motor-speech marker is also present (tremor index > 0.25 in the 3-7 Hz band, speech rate < 1.8 syl/s, pitch CV < 0.04, or mean pause > 800 ms). The mobile app additionally hides the classifier row from patients unless the corroboration gate passes AND the rule-based neurological_signs module also reaches "high" severity with motor-speech evidence.
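
The two gates can be sketched as plain predicates over the extracted features. This is a minimal illustration, not the engine's code: the `VoiceFeatures` class, the function names, and the "refused" label are hypothetical, and the sketch assumes that violating any single plausibility limit is enough to refuse (the text above does not say whether the limits combine with "and" or "or"). The numeric thresholds are the ones quoted above.

```python
from dataclasses import dataclass


@dataclass
class VoiceFeatures:
    # Hypothetical container; field names are illustrative only.
    jitter_pct: float
    shimmer_pct: float
    hnr_db: float
    tremor_index: float        # tremor index in the 3-7 Hz band
    speech_rate_syl_s: float
    pitch_cv: float
    mean_pause_ms: float


def is_plausible(f: VoiceFeatures) -> bool:
    """Plausibility gate: reject signals too clean to be a human voice.

    Assumption: any one violated limit triggers a refusal.
    """
    too_clean = f.jitter_pct < 0.05 or f.shimmer_pct < 1.0 or f.hnr_db > 35.0
    return not too_clean


def has_corroboration(f: VoiceFeatures) -> bool:
    """Corroboration gate: at least one independent motor-speech marker."""
    return (
        f.tremor_index > 0.25
        or f.speech_rate_syl_s < 1.8
        or f.pitch_cv < 0.04
        or f.mean_pause_ms > 800.0
    )


def gate(f: VoiceFeatures, confidence: str) -> str:
    """Apply both gates to a classifier confidence label."""
    if not is_plausible(f):
        return "refused"
    if confidence == "high" and not has_corroboration(f):
        return "inconclusive"
    return confidence
```

With these definitions, a sine-like input (jitter 0.01%, HNR 45 dB) is refused before the classifier output is used, and a "high" result without any motor-speech marker is reported as inconclusive.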

12. Whisper transcription + disfluency analysis (cognitive decline biomarker)

faster-whisper tiny.en model produces a transcript with word-level timestamps. The disfluency layer then extracts validated cognitive decline and depression biomarkers from the transcript. Returned under the disfluency key plus a transcript top-level field.

Field	What it measures	Citation
`filled_pauses`, `filled_pause_rate`	"Um, uh, hmm..." count and rate. Elevated in MCI, dementia, working memory load.	Roark et al. 2011, Konig et al. 2018
`repetition_count`, `repetition_rate`	Immediate word repetitions. Marker of palilalia (Parkinson's, post-stroke).	Themistocleous 2018
`stutter_repetition_count`	Stutter-pattern repetitions (block, repetition, prolongation)	Apple SEP-28k taxonomy
`hedge_word_count`, `hedge_word_rate`	"Actually, basically, just..." overuse. Cognitive uncertainty marker.	—
`ttr`	Type-token ratio = unique tokens / total. Low TTR = repetitive vocabulary; cognitive load marker.	Le et al. 2010
`pause_to_speech_ratio`	Sum of inter-word gaps / total speaking time. Elevated in depression, dementia, motor speech disorders.	Cummins et al. 2015
`long_pauses_count`	Pauses >= 500 ms. Cognitive processing time.	Yap et al. 2010
`mean_inter_word_gap_ms`	Average gap between the end of one word and the start of the next.	—
`speech_density`	Words per second of voiced speech.	—
`transcript`	Full transcript text (capped 500 chars in response).	—

Whisper inference adds ~1-3 s per request. Disable for low-latency operation with "enable_whisper": false.
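
As a rough illustration of how several of these fields fall out of word-level timestamps, the sketch below computes them from a list of `(word, start_s, end_s)` tuples. It is not the engine's implementation. The function name, the three-word filled-pause list (taken from the examples in the table), and the choice to define "total speaking time" as the sum of word durations are all assumptions.

```python
FILLED_PAUSES = {"um", "uh", "hmm"}  # assumption: the real list is likely longer
LONG_PAUSE_S = 0.5                   # pauses >= 500 ms count as long


def disfluency_metrics(words):
    """words: list of (text, start_s, end_s) tuples in time order."""
    tokens = [text.lower().strip(".,!?") for text, _, _ in words]
    if not tokens:
        return {}
    # Gap between the end of one word and the start of the next.
    gaps = [max(0.0, words[i + 1][1] - words[i][2]) for i in range(len(words) - 1)]
    # Assumption: speaking time is the sum of word durations.
    speaking_s = sum(end - start for _, start, end in words)
    return {
        "filled_pauses": sum(t in FILLED_PAUSES for t in tokens),
        "repetition_count": sum(a == b for a, b in zip(tokens, tokens[1:])),
        "ttr": len(set(tokens)) / len(tokens),
        "long_pauses_count": sum(g >= LONG_PAUSE_S for g in gaps),
        "mean_inter_word_gap_ms": 1000.0 * sum(gaps) / len(gaps) if gaps else 0.0,
        "pause_to_speech_ratio": sum(gaps) / speaking_s if speaking_s > 0 else 0.0,
    }
```

For example, `disfluency_metrics([("um", 0.0, 0.2), ("the", 0.3, 0.4), ("the", 0.4, 0.5), ("cat", 1.2, 1.5)])` counts one filled pause, one immediate repetition, and one long pause, with a type-token ratio of 0.75.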

13. SRMR speech intelligibility (Falk et al. 2010)

Non-intrusive Speech-to-Reverberation Modulation Ratio. Estimates speech intelligibility without needing a clean reference signal. Higher values = clearer speech with less reverberation or noise corruption.

Field	Range	Interpretation
`srmr`	0 - 20 (typically 1-10)	Healthy clean speech > 4.5; degraded / dysarthric speech < 3.0. Tracks dysarthria severity longitudinally.

14. Composite scores (stress, cognitive load, vocal aging)

Interpretable rule-based composites that synthesize the engine's own features. Each returns {score in [0,1], severity bucket, evidence array}.

Composite	Inputs	Clinical relevance	Citation
`stress`	elevated mean F0, reduced F0 variability, elevated jitter / shimmer, reduced HNR, faster speech rate, reduced inter-word gaps	Vocal stress / anxiety	Giddens et al. 2013, Mendoza & Carballo 1998
`cognitive_load`	filled pause rate, repetition rate, hedge rate, low TTR, long pauses, slow speech, pause-to-speech ratio	Generic difficulty-thinking indicator; overlaps with depression and MCI markers	Yap et al. 2010, Le et al. 2010, Konig et al. 2018
`vocal_aging`	elevated jitter / shimmer, low HNR / CPPS, voice tremor in 3-12 Hz, reduced pitch range	Frailty marker; useful for elderly post-discharge tracking	Decoster & Debruyne 2000, Linville 1996

15. Multi-language voice reference profiles

Per-language F0 reference ranges for context-aware analysis. Six languages supported: English (en), Spanish (es), French (fr), Hindi (hi), Mandarin (zh), Arabic (ar). When gender is supplied, the engine returns the gender-specific expected pitch range. Pass the language parameter in the analyze request to use this.

Language	Male F0 range (Hz)	Female F0 range (Hz)
English	85 - 180	165 - 255
Spanish	90 - 185	170 - 260
French	88 - 175	175 - 270
Hindi	95 - 195	165 - 260
Mandarin (tonal, wider range)	90 - 220	180 - 320
Arabic	80 - 170	165 - 250
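
The table above maps naturally onto a lookup keyed by ISO 639-1 code. The sketch below is illustrative only: the names `F0_REFERENCE_HZ` and `expected_f0_range` are hypothetical, and the behaviour when gender is omitted (returning the span that covers both ranges) is an assumption, since the page only describes the gender-specific case.

```python
# Values copied from the reference table; keys are ISO 639-1 codes.
F0_REFERENCE_HZ = {
    "en": {"male": (85, 180), "female": (165, 255)},
    "es": {"male": (90, 185), "female": (170, 260)},
    "fr": {"male": (88, 175), "female": (175, 270)},
    "hi": {"male": (95, 195), "female": (165, 260)},
    "zh": {"male": (90, 220), "female": (180, 320)},
    "ar": {"male": (80, 170), "female": (165, 250)},
}


def expected_f0_range(language, gender=None):
    """Return (low_hz, high_hz) for a language, or None if unsupported."""
    profile = F0_REFERENCE_HZ.get(language)
    if profile is None:
        return None
    if gender in profile:
        return profile[gender]
    # Assumption: without gender, fall back to the span covering both ranges.
    lows, highs = zip(*profile.values())
    return (min(lows), max(highs))
```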

16. Multi-condition risk prediction

Rule-based composite risk scores synthesised from the underlying features. Each module returns risk in [0, 1], a severity bucket (none/low/moderate/high), and an array of evidence strings citing the specific features that contributed. Returned under the conditions key.

Module	Targets	Inputs (weighted)	Citations
`respiratory_infection`	Lower respiratory infection, pneumonia, COVID-style illness, asthma exacerbation	breathing rate, cough count, wheeze, gasp, energy, audible breathing	Singer 2016 (Sepsis-3), Imran 2020 (AI4COVID)
`cold_uri`	Common cold, upper respiratory infection	nasal resonance markers, congestion-modified formants, mild cough, throat-clear events	Pinkas 2020 (vocal cold detection)
`cardiovascular_stress`	Heart-failure decompensation, fluid overload, pulmonary hypertension	jitter (Sara 2020 threshold 1.04%), breathing rate, sigh, vocal effort, voice tremor	Sara et al. 2020 (PLOS ONE), Mayo Clinic 2019
`voice_dysphonia`	Vocal fold lesions, post-intubation dysphonia, laryngitis, Reinke's edema	CPPS, jitter, shimmer, HNR, GRBAS	Maryn 2010, Teixeira 2013, Heman-Ackah 2014, Hirano 1981
`neurological_signs`	Parkinson's, essential tremor, ALS, post-stroke dysarthria. Caps Parkinson's-classifier contribution to its BAcc — never overrides corroborating motor-speech markers.	voice tremor (3-7 Hz band), speech rate, vowel space area, pauses, pitch CV, Parkinson's classifier probability (gated)	Skodda 2011, Rusz 2013, De Jong & Wempe 2009, Tsanas 2011
`fatigue_depression`	Fatigue, depression, low affect — multi-modal lift when subjective_fatigue self-report provided (Krumpal 2013)	energy, speech rate, pitch CV, sigh, pauses, audeering arousal/valence, optional self-report	Cummins 2015, Mundt 2007, Krumpal 2013
`sleep_breathing`	Sleep-disordered breathing, snoring, upper-airway resistance	snoring score, gasp score, audible breathing	Pevernagie 2010
`anxiety_panic`	Anxiety, panic attack signatures	vocal tension, jitter+shimmer combined, fast speech rate, short breath-holds, pitch instability	Mendoza 2014
`hyperventilation`	Hyperventilation syndrome	elevated breathing rate, low pause ratio, short utterances, breathy phonation	Boulding et al. 2016
`dehydration`	Mild dehydration	increased jitter + reduced HNR baseline drift, dry-sounding consonants, low articulation rate	Caudwell 2017
`vocal_fatigue_overuse`	Vocal fatigue, overuse syndrome (teachers, singers, post-extubation)	shimmer drift across the session, CPPS reduction, glottal-source-noise increase	Welham 2003, Solomon 2008
`alzheimers_screening`	Early Alzheimer's / MCI screening. Voice-only ≈ 0.80 BAcc reference; ≈ 0.88 with informant report fusion (Sabbagh 2016)	Roark 2011 disfluency markers (filled pauses, word repetitions, hedge words), lexical diversity (type-token ratio), pause-to-speech ratio, optional `subjective_cognitive` self-report	Roark 2011, Luz 2021 (ADReSS 2020), Sabbagh 2016

17. Demographic-adjusted thresholds

If the request includes optional age and/or gender, clinical thresholds are widened to account for normative age- and sex-related variation (Brockmann-Bauser 2018, Titze 1994). Older adults have higher baseline jitter/shimmer and lower baseline HNR/CPPS that should not be over-flagged as pathology. Female pitch baselines are 165-255 Hz; male 85-180 Hz. Returned under demographic_context.

Pipeline order

PCM normalize to float32 in [-1, 1] and resample to 16 kHz.
Quality gates: duration >= 1.5 s, SNR > 0.005, < 2% clipped.
VAD-based silence and fricative stripping.
Praat voice quality on active region (jitter, shimmer, HNR, CPPS, formants F1-F3).
YAMNet on full signal (cough, sneeze, wheeze, breathing, ...).
Per-cough characterization for each detected event.
Spectral features (MFCC, contrast, centroid, rolloff, flatness, bandwidth, entropy).
Voice tremor (3-12 Hz amplitude modulation FFT).
Speech rate and pauses (syllable-nuclei detection).
openSMILE eGeMAPSv02 functionals (88 features).
GRBAS perceptual rating estimation.
Per-patient baseline z-scores and overall deviation score (if patient_id given).
Multi-condition risk prediction across 12 condition modules.
Demographic-adjusted threshold context (if age or gender given).
Composite confidence and status. Summary text. Audit log entry.
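
The first two steps of the pipeline (normalisation and quality gates) can be sketched in plain Python as follows. This is a simplified illustration, not the engine's code. It assumes 16-bit PCM input, skips resampling, and interprets the 0.005 gate as a floor on RMS level, which is an assumption: the list above labels that gate "SNR". Function names and reason strings are hypothetical.

```python
import math


def normalize_pcm16(samples):
    """Scale 16-bit PCM integers to floats in [-1, 1]."""
    return [s / 32768.0 for s in samples]


def quality_gates(x, sample_rate, min_duration_s=1.5, min_level=0.005, max_clipped=0.02):
    """Return (ok, reasons) for a normalised float signal."""
    n = len(x)
    if n == 0:
        return False, ["empty"]
    reasons = []
    if n / sample_rate < min_duration_s:
        reasons.append("too_short")
    # Assumption: the 0.005 gate is applied to RMS level.
    rms = math.sqrt(sum(v * v for v in x) / n)
    if rms <= min_level:
        reasons.append("too_quiet")
    # A sample counts as clipped when it sits at (or next to) full scale.
    clipped = sum(1 for v in x if abs(v) >= 0.999) / n
    if clipped >= max_clipped:
        reasons.append("clipped")
    return not reasons, reasons
```

Returning the list of failed gates, rather than a bare boolean, lets a client tell the patient why a recording should be repeated.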

API surface

POST   /analyze                  # body: {samples, sampleRate,
                                 #        patient_id?, age?, gender?,
                                 #        language?,         # ISO 639-1: en, es, fr, hi, zh, ar
                                 #        enable_whisper?}   # default true; set false for low-latency
POST   /analyze_multi            # 2-5 recordings, merged by median+majority+max-risk
                                 # reduces single-recording variance ~40%
POST   /fhir/analyze             # same as /analyze but returns a FHIR R4 Bundle
                                 # conforming to the NIH Bridge2AI VBAI profile
GET    /fhir/CapabilityStatement # FHIR EHR discovery endpoint
GET    /baseline/{patient}       # inspect baseline counts for a patient
DELETE /baseline/{patient}       # wipe a patient's baseline data
GET    /trend/{patient}          # time series + per-metric summary; query ?days=30
GET    /metrics                  # Prometheus-style operational metrics
GET    /health                   # liveness probe; reports engine version + model state
GET    /                         # service info, lists tracked YAMNet classes
GET    /demo                     # public drag-and-drop demo UI (HTML+JS)
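
A minimal client sketch for POST /analyze, using only the Python standard library. The body field names come from the listing above, and the X-API-Key header is the one this page describes for key-enforced access. The helper name `build_analyze_request` is hypothetical, and encoding `samples` as a JSON array of floats is an assumption, since the listing does not specify the sample encoding.

```python
import json
import urllib.request


def build_analyze_request(base_url, api_key, samples, sample_rate, **optional):
    """Build (but do not send) a POST /analyze request.

    optional: patient_id, age, gender, language, enable_whisper.
    Arguments left as None are omitted from the body.
    """
    body = {"samples": samples, "sampleRate": sample_rate}
    body.update({k: v for k, v in optional.items() if v is not None})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


# Sending the request (network call, shown for illustration only):
#   req = build_analyze_request(url, key, samples, 16000, language="en",
#                               enable_whisper=False)
#   with urllib.request.urlopen(req, timeout=120) as resp:
#       report = json.load(resp)
```

Building the request separately from sending it keeps the payload easy to inspect and test without touching the network.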

FHIR R4 compliance (NIH Bridge2AI VBAI)

Tether implements FHIR R4 output conforming to the NIH Bridge2AI Voice as a Biomarker (VBAI) profile (kind-lab/voice-biomarker-fhir, used here with permission from the kind-lab maintainers). Every biomarker measurement becomes a FHIR Observation; the analysis is bundled with a DiagnosticReport tying them together with a human-readable conclusion.
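
To make the Observation-plus-DiagnosticReport shape concrete, here is a minimal sketch of such a Bundle built as plain dictionaries. It uses only core FHIR R4 fields (`resourceType`, `status`, `code`, `valueQuantity`, `result`, `conclusion`). The code system URL, resource ids, helper names, and the `collection` bundle type are placeholders chosen for illustration; the actual VBAI profile defines its own codes and constraints, which this sketch does not reproduce.

```python
# Placeholder code system, NOT the VBAI profile's real system URL.
HYPOTHETICAL_SYSTEM = "https://example.org/tether/biomarker-codes"


def make_observation(obs_id, code, display, value, unit, patient_ref):
    """One biomarker measurement as a FHIR R4 Observation."""
    return {
        "resourceType": "Observation",
        "id": obs_id,
        "status": "final",
        "code": {"coding": [{"system": HYPOTHETICAL_SYSTEM, "code": code, "display": display}]},
        "subject": {"reference": patient_ref},
        "valueQuantity": {"value": value, "unit": unit},
    }


def make_bundle(measurements, conclusion, patient_ref):
    """Wrap the observations and a DiagnosticReport that references them."""
    observations = [
        make_observation(f"obs-{i}", m["code"], m["display"], m["value"], m["unit"], patient_ref)
        for i, m in enumerate(measurements)
    ]
    report = {
        "resourceType": "DiagnosticReport",
        "id": "report-0",
        "status": "final",
        "code": {"text": "Voice biomarker analysis"},
        "subject": {"reference": patient_ref},
        "result": [{"reference": f"Observation/{o['id']}"} for o in observations],
        "conclusion": conclusion,
    }
    return {
        "resourceType": "Bundle",
        "type": "collection",
        "entry": [{"resource": r} for r in [report, *observations]],
    }
```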

Hospital EHRs (Epic, Cerner, Allscripts, athenahealth) consume FHIR R4 natively, so this output makes Tether's biomarker pipeline EHR-compatible with zero per-hospital integration work. The implementation is observable at https://biomarker.avlynor.com/fhir/CapabilityStatement.

What this unlocks: "Tether implements the NIH Bridge2AI VBAI FHIR profile" is a real credibility line for hospital deals, B2B contracts, and grant applications. The VBAI initiative is funded by NIH Common Fund with $150M+ in 2023-2027 awards across UCLA, MIT, USF, McGill, and Mila.

Production hardening

Per-IP rate limit: 60 requests/minute by default, configurable via RATE_LIMIT_PER_MIN env. Sliding window in process memory.
Max payload: 120 seconds of audio at 16 kHz (~1.9 M samples). Configurable via MAX_SAMPLES.
SQLite audit log: every /analyze writes a row with patient ID, elapsed ms, status, sample count, client IP, and engine signature. Read via /metrics.
Defensive feature extraction: every per-feature module catches its own exceptions and returns 0.0 / empty so a single bad feature does not break the response.
X-Forwarded-For aware: when nginx is in front, rate limit and audit use the original client IP, not 127.0.0.1.
CORS allow-all: explicit, documented; intended for the public demo. Lock down in production.
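
A sliding-window limiter of the kind described above fits in a few lines. This is a sketch under stated assumptions, not the engine's code: the class and function names are hypothetical, and taking the first entry of X-Forwarded-For as the client IP is only safe when a trusted proxy such as nginx sets that header.

```python
import os
import time
from collections import defaultdict, deque

DEFAULT_LIMIT = int(os.environ.get("RATE_LIMIT_PER_MIN", "60"))


class SlidingWindowLimiter:
    """Per-client sliding window held in process memory."""

    def __init__(self, limit=DEFAULT_LIMIT, window_s=60.0, clock=time.monotonic):
        self._limit = limit
        self._window_s = window_s
        self._clock = clock
        self._hits = defaultdict(deque)  # client IP -> timestamps of recent requests

    def allow(self, client_ip):
        now = self._clock()
        hits = self._hits[client_ip]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self._window_s:
            hits.popleft()
        if len(hits) >= self._limit:
            return False
        hits.append(now)
        return True


def client_ip(headers, peer_ip):
    """Prefer the original client IP when a proxy supplied X-Forwarded-For."""
    forwarded = headers.get("X-Forwarded-For")
    if forwarded:
        return forwarded.split(",")[0].strip()
    return peer_ip
```

Because the state lives in process memory, the counts reset on restart and are not shared across multiple worker processes.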

Public demo

The /demo route serves a self-contained drag-and-drop web UI: drop a WAV or record audio via your microphone, see every feature (Praat, YAMNet, GRBAS, conditions, eGeMAPS) rendered with color-coded severity. No login. CORS-permissive so anyone can try the engine from any origin. Live at https://biomarker.avlynor.com/demo · API root at biomarker.avlynor.com returns the service info JSON.

Deploy

# 1. from your laptop, inside the cloned tether repo
rsync -avz biomarker-engine/ root@VPS-IP:/opt/tether-biomarker/biomarker-engine/

# 2. build + start on the VPS (first build ~5 min; pulls TF, downloads YAMNet)
ssh root@VPS-IP "cd /opt/tether-biomarker/biomarker-engine && docker compose up -d --build"

# 3. add DNS A record biomarker.avlynor.com -> VPS-IP, then on the VPS:
cp /opt/tether-biomarker/biomarker-engine/nginx.conf /etc/nginx/sites-available/biomarker
ln -sf /etc/nginx/sites-available/biomarker /etc/nginx/sites-enabled/biomarker
nginx -t && systemctl reload nginx
certbot --nginx -d biomarker.avlynor.com

# 4. tell the Cloudflare worker to use it (activates dual-engine mode)
cd worker
echo "https://biomarker.avlynor.com" | npx wrangler secret put BIOMARKER_ENGINE_URL
npx wrangler deploy

Dual-Engine Architecture

When the Cloudflare worker has BIOMARKER_ENGINE_URL set (it does: the value is stored as a Workers secret pointing at https://biomarker.avlynor.com), every /analyze request runs both engines in parallel and merges their outputs into an ensemble report. The WASM engine runs locally in the worker (~50 ms). The Python VPS engine runs over HTTPS (~18-25 s for a full-pipeline single-clip analysis, since it runs WavLM, audeering, Whisper, Praat, openSMILE, and the trained classifiers in series). The worker starts the VPS fetch first, runs WASM during the network round-trip, then merges, so wall time is effectively just the VPS call. If the VPS fails or times out, the worker returns the WASM result with engines_used: ["wasm"] and vps_error populated. Dual-engine mode therefore improves accuracy without adding a failure mode: the worst case is the WASM-only result.

A circuit-breaker in the worker (isCircuitOpen()) trips after repeated VPS failures so the worker stops trying for a brief cooldown — patients get instant WASM results during VPS outages instead of waiting through every timeout.
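
The production worker is TypeScript, but the control flow is easy to show as a Python asyncio sketch: run both engines concurrently, fall back to whichever one succeeded, and skip the VPS while the breaker is open. Everything here is illustrative. `CircuitBreaker`, `analyze_dual`, the failure count, the cooldown, and the timeout are hypothetical values, and the dictionary union stands in for the real ensemble merge.

```python
import asyncio
import time


class CircuitBreaker:
    """Opens after max_failures consecutive failures, for cooldown_s seconds."""

    def __init__(self, max_failures=3, cooldown_s=30.0, clock=time.monotonic):
        self.max_failures, self.cooldown_s, self.clock = max_failures, cooldown_s, clock
        self.failures, self.opened_at = 0, None

    def is_open(self):
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.cooldown_s:
            self.failures, self.opened_at = 0, None  # cooldown over, try again
            return False
        return True

    def record(self, ok):
        self.failures = 0 if ok else self.failures + 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


async def analyze_dual(run_wasm, run_vps, breaker, timeout_s=30.0):
    """run_wasm: sync callable; run_vps: async callable. Both return dicts."""
    if breaker.is_open():
        return {**run_wasm(), "engines_used": ["wasm"], "vps_error": "circuit open"}
    wasm, vps = await asyncio.gather(
        asyncio.to_thread(run_wasm),
        asyncio.wait_for(run_vps(), timeout_s),
        return_exceptions=True,  # failures come back as values
    )
    wasm_failed = isinstance(wasm, BaseException)
    vps_failed = isinstance(vps, BaseException)
    breaker.record(not vps_failed)
    if wasm_failed and vps_failed:
        raise RuntimeError(f"both engines failed: wasm={wasm!r}, vps={vps!r}")
    if vps_failed:
        return {**wasm, "engines_used": ["wasm"], "vps_error": repr(vps)}
    if wasm_failed:
        return {**vps, "engines_used": ["vps"], "wasm_error": repr(wasm)}
    # Placeholder merge: the real worker computes an ensemble, not a dict union.
    return {**wasm, **vps, "engines_used": ["wasm", "vps"]}
```

Collecting exceptions as values keeps the fallback branches in one place instead of scattering them across nested try blocks.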

Why dual

Ensemble cough detection. WASM heuristic catches sharp energy spikes; YAMNet catches cough timbre. Their union catches both. Cough events = max(WASM, VPS).
Cross-validated voice quality. Praat (VPS, academic reference) is primary, but if the in-worker Rust YIN port disagrees significantly on jitter/shimmer/HNR/pitch, that itself is signal — either pathology that the simpler algorithm couldn't track, or a recording-quality issue worth flagging for retry.
Resilience. WASM is automatic fallback if VPS is unreachable. Zero downtime even if the VPS goes down.
Latency floor. WASM is sub-100 ms and always available; once the circuit-breaker has tripped, the worker returns the WASM result without waiting on the VPS.
Patent defensibility. Hybrid edge + server biomarker pipeline with cross-engine ensemble agreement scoring is novel; harder to design around than any single-engine system.

What gets returned in dual mode

Field	Description
`engines_used`	Array of engines that contributed: `["wasm", "vps"]` in dual mode, `["wasm"]` or `["vps"]` on partial failure
`engine`	Compound signature: `"dual:wasm-1.0.0+vps-2.9.0"`
`engine_agreement`	0-1 score: fraction of cross-checked metrics where the two engines agree within tolerance
`engine_agreement_detail`	Per-metric boolean dict showing exactly which metrics agree
`engine_disagreements`	Human-readable list of significant disagreements with both values
`ensemble_confidence`	Weighted blend of WASM and VPS confidences, boosted by agreement
`wasm_values`	Raw WASM-engine values preserved for clinical review and patent traceability
All v2 VPS fields	CPPS, formants, tremor, MFCC, GRBAS, per-patient baselines, etc.
Core WASM-compatible fields	Older clients keep working: `energy`, `breathing_rate`, `cough_events`, ...

Failure modes (graceful degradation)

Scenario	Result
Both engines succeed	Merged ensemble report; `engines_used: ["wasm", "vps"]`
VPS unreachable or 5xx	WASM-only report; `engines_used: ["wasm"]`, `vps_error` populated
WASM error (rare)	VPS-only report; `engines_used: ["vps"]`, `wasm_error` populated
Both fail	HTTP 500 with both errors
`BIOMARKER_ENGINE_URL` unset	WASM-only report; `engines_used: ["wasm"]`, `vps_error: "VPS not configured"`

Tolerance windows for engine agreement

"Agreement" means the two engines' values for a given metric differ by less than a tolerance fraction of the larger value. The tolerances are tuned to flag genuine pathology or recording problems, not numerical drift between two algorithms that aren't identical by design.

Metric	Tolerance	Rationale
energy	50%	Both algorithms use the same RMS definition; large mismatch means VAD differed
breathing_rate	50%	Peak counting is noisy; tolerate moderate drift
pitch_variability	50%	Different pitch trackers, different distributions
jitter	50%	Praat vs Rust YIN port differ by design; 50% catches genuine issues
shimmer	50%	Same as jitter
hnr_db	40%	HNR is dB-scale; tighter tolerance because absolute differences are smaller
mean_pitch_hz	20%	Pitch detection on voiced segments should agree closely
zero_crossing_rate	50%	Frame-rate dependent; tolerate spread
cough_events	strict	Either both detect a cough or neither; binary agreement

When disagreement itself becomes a flag

If 3 or more metrics disagree across engines in the same recording, the merged report adds an explicit "multiple cross-engine disagreements; consider re-recording" flag and upgrades a normal status to monitor. This catches recordings that look acceptable on individual quality gates but are subtly degraded (room noise, motion artifacts, microphone obstruction) in ways that show up as algorithm drift.
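
The agreement rule and the disagreement flag can be sketched together. The tolerances are copied from the table above, and the comparison follows the stated definition (difference smaller than a tolerance fraction of the larger value). The function names, the handling of two zero values, and the output key `flag_rerecord` are assumptions made for illustration.

```python
TOLERANCES = {
    "energy": 0.50, "breathing_rate": 0.50, "pitch_variability": 0.50,
    "jitter": 0.50, "shimmer": 0.50, "hnr_db": 0.40,
    "mean_pitch_hz": 0.20, "zero_crossing_rate": 0.50,
}


def metric_agrees(name, wasm_value, vps_value):
    if name == "cough_events":
        # Strict: either both engines detect a cough or neither does.
        return (wasm_value > 0) == (vps_value > 0)
    larger = max(abs(wasm_value), abs(vps_value))
    if larger == 0:
        return True  # assumption: two zeros count as agreement
    return abs(wasm_value - vps_value) < TOLERANCES[name] * larger


def compare_engines(wasm, vps, status="normal"):
    """Compare the metrics both engines reported and derive the flag."""
    names = [n for n in [*TOLERANCES, "cough_events"] if n in wasm and n in vps]
    detail = {n: metric_agrees(n, wasm[n], vps[n]) for n in names}
    disagreements = [n for n, ok in detail.items() if not ok]
    flagged = len(disagreements) >= 3
    return {
        "engine_agreement": sum(detail.values()) / len(detail) if detail else None,
        "engine_agreement_detail": detail,
        "engine_disagreements": disagreements,
        "flag_rerecord": flagged,
        # A normal status is upgraded to monitor; other statuses are kept.
        "status": "monitor" if flagged and status == "normal" else status,
    }
```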

Roadmap

Already shipped — biomarker engine

Python FastAPI engine v2.9 (primary) at biomarker.avlynor.com. Full pipeline: Praat (jitter/shimmer/HNR/CPPS/formants/vowel space/tremor), YAMNet 521-class event detection, openSMILE eGeMAPS, Tsanas nonlinear (PPE/RPDE/DFA/GNE/MFCC deltas), faster-whisper transcription + disfluency markers, Microsoft WavLM-base-plus voice fingerprint (768-d, pretrained on 94 k hours), audeering wav2vec2-large emotion (MSP-Podcast CCC 0.74 arousal). LUFS normalisation per-task. Reports vps-2.9.0 on /health.
Rust WASM engine (fallback) compiled with wasm-pack, runs in the Cloudflare Worker. ~50 ms latency. Used when the Python engine is unreachable.
3-task voice protocol (sustained vowel + reading + free speech, ~40 s total). Standardised across mobile and the web demo so jitter/shimmer/HNR/VSA/speech-rate are comparable across patients and across visits.
Trained classifiers: Parkinson's at 0.76 BAcc (deployed, patient-grouped CV on UCI + UCI Telemonitoring, n=6,070), v4 fatigue at 0.60 BAcc (Predi-COVID, n=1689 recordings from 206 patients), v5 fatigue ensemble (audeering arousal-primary, +8–15% projected lift), rule-based Alzheimer's screening (Roark 2011 disfluency 0.80 voice-only / 0.88 multi-modal reference).
12 condition risk modules: respiratory infection, cold/URI, cardiovascular stress, voice dysphonia, neurological signs, fatigue/depression, sleep-disordered breathing, anxiety/panic, hyperventilation, dehydration, vocal fatigue overuse, Alzheimer's/MCI screening. Each module cites the clinical paper its thresholds derive from.
Healthy Voice Index — single 0-100 composite that ensembles trust-weighted condition risks, signal-integrity flags, session consistency, VAI, WavLM outlier, and recording-quality signals into one explainable number with full audit trail.
Anti-spoofing — speaker-verification voiceprint enrollment + cosine similarity per recording, plus six signal-integrity flags (synthetic voice, faked tremor, cough without breath, forced breathy, task mismatch on vowel, task mismatch on reading).
Multi-modal fusion: optional subjective_fatigue, subjective_sleep_quality, subjective_cognitive, subjective_mood on every analyze request. Raises the projected BAcc reference for fatigue from ~0.70 (voice-only) to ~0.82 (multi-modal), and the Alzheimer's screening reference from ~0.80 to ~0.88.

Already shipped — accuracy compounding mechanisms (the "Tether gets better with usage" flywheel)

Per-patient baselines — SQLite-backed median + MAD z-scores. Activates at 3 recordings per patient, settles at ~10. Drift detection improves visit-over-visit.
Population baselines per task — robust median + MAD across all patients per recording type. Activates at 30+ samples. Sharpens with every new patient.
WavLM voice-fingerprint centroid — Welford running mean per task. Cosine-distance outlier detection activates at 20+ samples per task.
Voiceprint drift detection — running mean per patient, sharpens with each visit.
Continual-learning online threshold tuning: every recording with self-report becomes a (prediction, label) tuple. After 100+ labels per classifier, the decision threshold auto-tunes to maximise BAcc on the rolling window. Forward-only guardrail: a new threshold deploys only if it beats the default by ≥ 0.5 percentage points; a rejected tune leaves the previously deployed threshold in place. Inspectable at /learning/status.
Three regression guardrails: continual-learning tune-rejects-without-lift, fatigue v5 always exposes v4 underlying probability for A/B audit, optional disable_lufs flag for raw-vs-normalised classifier validation.
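
The median + MAD z-score behind the per-patient and population baselines can be sketched as follows. This illustrates the general technique rather than the engine's code. The function name is hypothetical, the 1.4826 factor (which makes the MAD comparable to a standard deviation under normality) is an assumption about the scaling used, and returning `None` for a flat history is a choice made here for clarity.

```python
import statistics

MIN_RECORDINGS = 3     # per-patient baselines activate at 3 recordings
MAD_TO_SIGMA = 1.4826  # assumption: standard normal-consistency scaling


def robust_z(history, value):
    """Robust z-score of value against a patient's earlier recordings.

    Returns None while the baseline is inactive or has no spread.
    """
    if len(history) < MIN_RECORDINGS:
        return None
    median = statistics.median(history)
    mad = statistics.median(abs(v - median) for v in history)
    if mad == 0:
        return None  # flat history: the z-score is undefined
    return (value - median) / (MAD_TO_SIGMA * mad)
```

Median and MAD are used instead of mean and standard deviation because a single bad recording barely moves them, which matters when a baseline holds only a handful of samples.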

Already shipped — public API platform (2026-05-31)

Self-serve API keys — anyone can generate a free-tier key on this docs page (widget in "Using the API"). No account, no approval. POST /keys/request.
Key-enforced access — /analyze + /analyze_multi require a valid X-API-Key (401 otherwise). /health, /version, /learning/status stay open for monitoring.
Per-key metering — free tier 20 req/min · 500 calls/day; per-key usage counts (total + today + last-used) tracked and surfaced to admins at /admin/keys; any key revocable instantly.
First-party keying — the Cloudflare Worker (mobile + web) and the public /demo carry their own keys, so enforcement protects single-VPS capacity for clinical pilots without breaking the app.

Validated accuracy benchmarks (deployed today)

Component	Metric	Source
Parkinson's classifier	BAcc 0.76, ROC AUC 0.78, Sens 0.67, Spec 0.85	UCI + UCI Telemonitoring, n=6,070, patient-grouped 5-fold CV (deployed measured)
Fatigue v4 (vote inside v5)	BAcc 0.60 [0.57, 0.63]	Predi-COVID, n=1689, 206 patients, patient-grouped 5-fold CV (deployed measured)
Fatigue v5 voice-only	BAcc ~0.70 projected	Wang 2023 self-supervised-features lift; held-out re-eval pending
Fatigue v5 + self-report	BAcc ~0.82 projected	Krumpal 2013, Cummins 2015 multi-modal review
Alzheimer's voice-only	BAcc ~0.80 reference	Roark 2011 disfluency-only MCI classifier; ADReSS 2020 challenge baselines 0.75–0.86 (Luz 2021)
Alzheimer's + informant report	BAcc ~0.88 reference	Sabbagh 2016 AD8 + Konig 2018 + Themistocleous 2018
WavLM speaker verification	EER 1.85%	Chen 2022, VoxCeleb1 (published)
audeering arousal	CCC 0.74	Wagner 2023, MSP-Podcast benchmark (published)
VPS cough detection (Praat + YAMNet)	Sens 82.8%, FPR 0.0%	Coswara n=29 patients (measured)

Next — engineering

Wire self-report sliders into mobile RecordingWizard + demo page (engine accepts the fields; client UI is the only gap).
Train Alzheimer's ADReSS head when DementiaBank DUA is signed — training script ready at scripts/train_alzheimers_adress.py.
Train fatigue v5 with full WavLM features when Predi-COVID + DAIC-WOZ access lands — script ready at scripts/train_fatigue_v5.py.
Reduce p95 analyze latency from 73 s to 18–25 s via parallel WavLM + audeering + Whisper inference, audeering int8 quantisation, and skip-audeering-on-vowel-task.
COUGHVID fine-tune of YAMNet's last layer for cough specifically (~3–5 GPU-hours, projected sensitivity lift 88% → 95%).
Respiratory model v2 on commercial-clean public data: pipeline built (scripts/ingest_respiratory_datasets.py) to fold in Coswara (CC-BY, 18k+ subjects, matches our vowel/counting/breathing/cough protocol), CoughVID (CC-BY), and ICBHI 2017. Uses subject-grouped CV with a built-in recruitment-bias confound probe. Realistic expectations: a trained respiratory classifier at ~0.70 BAcc (not the non-replicating 0.9s of the 2021 cough-COVID papers), stronger cough/breathing event detection (~0.8+), and tighter population baselines that cut real-world false positives. Runs on the GPU/data box once provisioned.
GPU fine-tuning track — fine-tune WavLM-base on the clinical tasks instead of frozen embeddings (projected +3–10 BAcc points where data supports it). Modal/RunPod scripted; pending GPU credit.
Validation-rigor suite — calibrated abstention across all conditions, test-retest reliability (ICC), external validation on held-out datasets, and an age/sex/language bias audit. The credibility layer for clinical + investor diligence.

Next — clinical + regulatory

Prospective validation cohort (50 patients) with a clinical advisor — converts every "literature-projected" BAcc reference into a Tether-measured BAcc on production data.
FDA pre-submission meeting for the Parkinson's screening + voice biomarker subset.
HIPAA infrastructure audit, BAAs with all third-party vendors.
App Store and Google Play deployment.

Security

API Key Isolation

GROQ_API_KEY is a Cloudflare secret. It never appears in the mobile bundle, git history, or client-side code.

Password Hashing

PBKDF2-SHA256 server-side (100,000 iterations, 16-byte per-user salt) via the Cloudflare Worker's Web Crypto API. Plaintext passwords are never stored or compared directly, and never leave the worker except as the candidate during verification.
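
The worker does this with the Web Crypto API; the same scheme in Python's standard library looks like the sketch below. The parameters (SHA-256, 100,000 iterations, 16-byte salt) are the ones stated above. The function names are hypothetical, and the constant-time comparison via `hmac.compare_digest` is an assumption about how the stored hash is checked.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000
SALT_BYTES = 16


def hash_password(password, salt=None):
    """Return (salt, derived_key); a fresh random salt is made per user."""
    if salt is None:
        salt = os.urandom(SALT_BYTES)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(candidate, salt, stored_key):
    """Re-derive the key from the candidate and compare in constant time."""
    _, candidate_key = hash_password(candidate, salt)
    return hmac.compare_digest(candidate_key, stored_key)
```

Only the salt and derived key are stored; the plaintext is needed just long enough to derive the candidate key during verification.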

Config Gitignore

src/lib/config.ts is gitignored. A template file is committed for new developers to copy.

CORS

Worker includes CORS headers on all responses, allowing requests from the mobile app and web preview.

Tech Stack

Mobile app

Layer	Technology
Framework	React Native 0.83, Expo SDK 55, React 19
Navigation	@react-navigation/native (native stack)
Audio	expo-audio (recording, 16 kHz WAV/PCM), expo-speech (TTS), expo-speech-recognition
Storage	@react-native-async-storage/async-storage (session token only; all real state lives server-side)
i18n	26 languages via custom `src/lib/i18n.ts`

Cloudflare Worker (API + LLM proxy + biomarker forwarder)

Layer	Technology
Runtime	Cloudflare Workers (TypeScript, ES2022)
Persistent state	Durable Objects (TetherData) — users, plans, biomarker history, messages, journal, adherence, escalations, voiceprints
Crypto	Web Crypto API — PBKDF2-SHA256 (100k iters) for passwords, HMAC-SHA256 for session tokens, constant-time comparison for signature verification, SHA-256 hash chain for audit log
Validation	Zod schemas on every state-changing endpoint (`src/shared/schemas.ts`)
AI Model	Groq API — LLaMA 3.3 70B Versatile (default) via `/chat` proxy
WASM fallback engine	Rust + WebAssembly compiled with `wasm-pack`, loaded as ES module inside the Worker. Used when Python engine is unreachable.

Python Biomarker Engine (primary, runs on Contabo VPS at biomarker.avlynor.com)

Layer	Technology
Runtime	Python 3.11 + FastAPI + uvicorn, packaged via Docker. Engine version `vps-2.9.0`.
Clinical voice quality	`praat-parselmouth` (jitter, shimmer, HNR, CPPS, formants F1-F3, vowel space, voice tremor)
Audio event classification	YAMNet via `tensorflow-hub` (521 AudioSet classes, 10 tracked: cough, sneeze, throat clearing, breathing, wheeze, snoring, gasp, speech, sigh, whispering)
Extended acoustic features	openSMILE eGeMAPS (88 features), `librosa` (MFCC, spectral contrast, centroid, rolloff, flatness, entropy, bandwidth)
Nonlinear voice markers	Custom Python reimplementations of Tsanas 2011 — PPE, RPDE, DFA, GNE, MFCC deltas (used by the Parkinson's classifier)
Speech-to-text	`faster-whisper` (int8 CTranslate2): `tiny.en` for free-speech, `small.en` for the reading task. Powers disfluency feature extraction.
Self-supervised voice fingerprint	Microsoft WavLM-base-plus (768-d, mean-pooled, L2-normalised) via Hugging Face `transformers` + `torch`
Emotion features (valence/arousal/dominance)	audeering `wav2vec2-large-robust-12-ft-emotion-msp-dim` with custom RegressionHead — published MSP-Podcast CCC 0.74 arousal / 0.63 valence / 0.51 dominance (Wagner 2023)
Loudness normalisation	EBU R128 LUFS (ITU-R BS.1770-4), per-task targets: vowel -18 / reading -23 / free-speech -23 / breathing -28 / cough -20
Trained classifiers	Parkinson's (UCI + UCI Telemonitoring, BAcc 0.76 deployed); Fatigue v4 (Predi-COVID, BAcc 0.60 deployed, vote inside v5); Fatigue v5 ensemble (audeering arousal + valence + v4 + Cummins triad + optional self-report fusion); Alzheimer's screening (rule-based 5-stage + optional ADReSS-trained ML head)
Continual learning	SQLite-backed labelled-sample store with online threshold tuning (rejects regressions vs default by guardrail); WavLM population centroid via Welford running mean; per-patient + per-task population baselines (median + MAD)
Storage	SQLite at `/app/data/baselines.sqlite` — per-patient baselines, population baselines, embedding centroids, continual-learning labels + training samples, tamper-evident SHA-256 audit log

Infrastructure + ops

Layer	Technology
Mobile + worker deploy	GitHub Actions → Cloudflare Pages (web), Cloudflare Workers (API), Cloudflare Pages (docs)
Engine deploy	Contabo VPS (Ubuntu) running Docker Compose; nginx + Let's Encrypt for `biomarker.avlynor.com` TLS
CI	4 workflows: engine pytest + lint + types (216 tests), mobile typecheck + Jest + Expo web export, worker typecheck + wrangler dry-run, Cloudflare deploys
Monitoring	Engine `/metrics` Prometheus-style endpoint; `/learning/status` for data-flywheel observability; SHA-256-chained audit log at `/admin/audit/verify`
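
A SHA-256 hash chain makes an audit log tamper-evident because each entry's hash covers the previous entry's hash. The sketch below shows the general technique with an in-memory list. The entry layout, the all-zero genesis value, and the function names are assumptions for illustration; they are not the engine's actual schema or the behaviour of `/admin/audit/verify`.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumption: placeholder hash before the first entry


def _entry_hash(prev_hash, payload):
    # Canonical JSON so the same payload always hashes the same way.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain, payload):
    """Append a payload, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": _entry_hash(prev_hash, payload),
    })


def verify_chain(chain):
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Editing any stored payload changes its recomputed hash, so verification fails at the first altered entry. Detecting truncation of the tail additionally requires keeping the latest hash somewhere outside the log.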

Onboarding

First-time users see a 5-step tutorial before reaching the login screen. The tutorial covers:

Welcome — What Tether does and who it's for
For Doctors — How to create and publish recovery plans
For Patients — How to use AI chat, voice, and messaging
Voice Biomarkers — How voice analysis works and what it detects
Safety First — Tether is not a replacement for emergency care

Onboarding completion is stored in AsyncStorage under the key tether-onboarding-complete. The tutorial only shows once.