An image of a mauve, rock-like mass is shown to four cutting-edge large language models (LLMs), each asked about the location, origin, and extent of a potentially dangerous eye tumor.
LLaVA-Med claims the cancerous growth is in the lining of the cheek, while LLaVA insists it is in the breast. GPT-4V offers a long, imprecise response and cannot pinpoint it.
PathChat, developed in the Mahmood Lab at Brigham and Women’s Hospital, aims to transform computational pathology by acting as a consultant for human pathologists as they locate, assess, and diagnose tumors and other serious conditions.
PathChat outperforms current models on multiple-choice diagnostic questions and delivers clinically relevant responses to open-ended questions. As of this week, it is available through an exclusive license to Boston-based biomedical AI company Modella AI.
“PathChat 2 is a multimodal large language model that understands pathology images and clinically relevant text and can basically have a conversation with a pathologist,” Modella founding CTO Richard Chen said in a demo video.
PathChat Outperforms GPT-4V, LLaVA, And LLaVA-Med
To create PathChat, researchers coupled a pathology-specific vision encoder with a pre-trained LLM, then fine-tuned the combined model on visual-language instructions and question-and-answer turns. The evaluation questions spanned 54 diagnoses across 11 major pathology practices and organ sites.
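The article does not spell out the wiring, but a vision encoder coupled to a pre-trained LLM is commonly implemented by projecting the encoder’s patch embeddings into the LLM’s token space. Below is a minimal PyTorch sketch under that assumption; the class name, dimensions, and module interfaces are illustrative, not PathChat’s actual code.

```python
import torch
import torch.nn as nn

class PathVLM(nn.Module):
    """Sketch: pathology image encoder -> projector -> pre-trained LLM."""

    def __init__(self, vision_encoder: nn.Module, llm: nn.Module,
                 vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.vision_encoder = vision_encoder             # e.g., a ViT pretrained on histology tiles
        self.projector = nn.Linear(vision_dim, llm_dim)  # maps patch embeddings into the LLM token space
        self.llm = llm                                   # pre-trained decoder-only language model

    def forward(self, image: torch.Tensor, text_embeds: torch.Tensor):
        patches = self.vision_encoder(image)        # (batch, num_patches, vision_dim)
        visual_tokens = self.projector(patches)     # (batch, num_patches, llm_dim)
        # Prepend projected image tokens to the text embeddings so the
        # LLM attends jointly to the image and the instruction.
        return self.llm(torch.cat([visual_tokens, text_embeds], dim=1))
```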
Each question was posed under two evaluation strategies: one pairing the image with 10 multiple-choice answer options, and another pairing the image with the patient’s sex, age, clinical history, and radiology findings.
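For illustration, the two prompt formats might be assembled as below; the function names, wording, and fields are assumptions for the sketch, not the study’s exact templates.

```python
def image_only_prompt(choices: list[str]) -> str:
    # Strategy 1: the image plus 10 multiple-choice answer options.
    options = "\n".join(f"{chr(65 + i)}. {c}" for i, c in enumerate(choices))
    return f"What is the most likely diagnosis for the image shown?\n{options}"

def clinical_context_prompt(choices: list[str], sex: str, age: int,
                            history: str, radiology: str) -> str:
    # Strategy 2: the same question preceded by the patient's clinical context.
    context = (f"Patient: {sex}, age {age}. Clinical history: {history}. "
               f"Radiology findings: {radiology}.")
    return f"{context}\n{image_only_prompt(choices)}"
```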
PathChat achieved 78% accuracy on the image alone and 89.5% accuracy with clinical context across X-rays, biopsies, slides, and other medical tests. The model could summarize, classify, and caption images, describe morphological details, and answer questions about pathology and biomedicine.
PathChat was tested against GPT-4V, the open-source LLaVA, and the biomedical domain-specific LLaVA-Med, and beat all three under both strategies. On image-only questions, PathChat outperformed LLaVA by over 52% and LLaVA-Med by over 63%. With clinical context, it outperformed LLaVA by 39% and LLaVA-Med by about 61%.
PathChat also outperformed GPT-4V by 53% on image-only questions and by 27% on prompts with clinical context.
Faisal Mahmood, assistant professor of pathology at Harvard Medical School, told VentureBeat that AI models for pathology have mostly been developed for specific diseases (such as prostate cancer) or tasks (such as recognizing tumor cells). Because these models cannot adapt once trained, pathologists cannot use them in an “intuitive, interactive manner.”
“PathChat moves us one step forward towards general pathology intelligence, an AI copilot that can interactively and broadly assist both researchers and pathologists across many different areas of pathology, tasks and scenarios,” Mahmood told VentureBeat.
Delivering Knowledgeable Pathology Advice
In one image-only, multiple-choice prompt, PathChat was given the case of a 63-year-old man with a chronic cough and five months of unintentional weight loss, along with a chest X-ray showing a solid, spiculated mass.
PathChat correctly identified lung adenocarcinoma from among the 10 answer choices.
In the clinical-context strategy, PathChat was shown a closeup histology image resembling blue and purple sprinkles on a cake and told: “This tumor was found in the liver of a patient. Primary tumor or metastasis?”
The model correctly identified the tumor as a metastasis, noting that “the presence of spindle cells and melanin-containing cells further supports the possibility of a metastatic melanoma.” Melanoma, which typically arises in the skin, frequently metastasizes to the liver.
Most surprising, Mahmood noted, was that after training on comprehensive pathology knowledge, the model could adapt to downstream tasks such as differential diagnosis (when symptoms match more than one condition) and tumor grading (classifying a tumor by its aggressiveness) without labeled training data.
Researchers called this a “notable shift” from prior work, in which training models for specific tasks, such as predicting the origin of a metastatic tumor or detecting heart transplant rejection, typically required “thousands if not tens of thousands of labeled examples specific to the task in order to achieve reasonable performance.”
Clinical Guidance, Research Support
Researchers say PathChat could support human-in-the-loop diagnosis by providing context after an AI-assisted assessment. As in the examples above, the model might ingest a histopathology image, assess its structural appearance, and indicate possible malignancy.
The pathologist could then supply additional details and request a differential diagnosis. If that assessment seems sound, the human user could order further tests and feed the results back to the model to reach a diagnosis.
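A minimal sketch of that loop follows, assuming a hypothetical chat-style interface (`model.chat`) that accepts a running message list; PathChat’s real API may differ.

```python
def assisted_read(model, slide_image):
    # Turn 1: AI-assisted first read of the histopathology image.
    messages = [{"role": "user", "image": slide_image,
                 "content": "Describe the morphology and indicate any malignancy."}]
    messages.append({"role": "assistant", "content": model.chat(messages)})
    print("Model:", messages[-1]["content"])
    # Follow-up turns: the pathologist adds detail, asks for a differential,
    # then feeds back further test results until a diagnosis is reached.
    for followup in iter(lambda: input("Pathologist (blank to finish): "), ""):
        messages.append({"role": "user", "content": followup})
        messages.append({"role": "assistant", "content": model.chat(messages)})
        print("Model:", messages[-1]["content"])
    return messages
```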
Researchers say this could be useful in long, complicated cases such as cancers of unknown primary (where a cancer has spread from an unidentified site elsewhere in the body), as well as in low-resource settings with few experienced pathologists.
In research, an AI copilot could summarize large image cohorts and automate the quantification and interpretation of morphological markers.
“The potential applications of an interactive, multimodal AI copilot for pathology are immense,” the researchers write, contending that LLMs and generative AI will open a new frontier in computational pathology that prioritizes natural language and human interaction.
Effects Beyond Pathology
PathChat is a milestone, researchers say, but its hallucinations could be reduced with reinforcement learning from human feedback (RLHF). They also recommend retrieval-augmented generation (RAG) to keep the model current with evolving terminology and clinical guidelines.
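A generic version of that RAG idea looks like the sketch below, where `retriever` and `generator` are hypothetical stand-ins for an up-to-date guideline index and the language model; the researchers do not publish an implementation.

```python
def answer_with_guidelines(question: str, retriever, generator, k: int = 3) -> str:
    # Fetch the k most relevant passages from a current guideline index.
    passages = retriever.search(question, top_k=k)
    context = "\n".join(p.text for p in passages)
    # Ground the model's answer in the retrieved, up-to-date text.
    prompt = ("Answer using the current guidelines below.\n"
              f"Guidelines:\n{context}\n\nQuestion: {question}")
    return generator.generate(prompt)
```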
Integrations such as digital slide viewers or electronic health records could make the model even more valuable for pathologists and researchers.
Mahmood said the approach behind PathChat could also extend to other medical imaging disciplines and to additional data modalities such as genomics and proteomics.
His lab plans to gather extensive human feedback to better align the model’s behavior with human intent and improve its responses. PathChat will also be integrated with clinical databases so the model can retrieve patient data to answer specific questions.
Mahmood added, “We plan to work with expert pathologists across many different specialties to curate evaluation benchmarks and more comprehensively evaluate PathChat’s capabilities and utility across diverse disease models and workflows.”