Launched in November 2022, ChatGPT is a chatbot that can not only engage in human-like conversation, but also provide accurate answers to questions in a wide range of knowledge domains. The chatbot, created by the firm OpenAI, is based on a family of "large language models"—algorithms that can recognize, predict, and generate text based on patterns they identify in datasets containing hundreds of millions of words.
In a study appearing in PLOS Digital Health this week, researchers report that ChatGPT performed at or near the passing threshold of the U.S. Medical Licensing Examination (USMLE)—a comprehensive, three-part exam that doctors must pass before practicing medicine in the United States.
In an editorial accompanying the paper, Leo Anthony Celi, a principal research scientist at MIT's Institute for Medical Engineering and Science, a practicing physician at Beth Israel Deaconess Medical Center, and an associate professor at Harvard Medical School, and his co-authors argue that ChatGPT's success on this exam should be a wake-up call for the medical community.
Q: What do you think the success of ChatGPT on the USMLE reveals about the nature of the medical education and evaluation of students?
A: The framing of medical knowledge as something that can be encapsulated into multiple choice questions creates a cognitive framing of false certainty. Medical knowledge is often taught as fixed model representations of health and disease. Treatment effects are presented as stable over time despite constantly changing practice patterns. Mechanistic models are passed on from teachers to students with little emphasis on how robustly those models were derived, the uncertainties that persist around them, and how they must be recalibrated to reflect advances worthy of incorporation into practice.
ChatGPT passed an examination that rewards memorizing the components of a system rather than analyzing how it works, how it fails, how it was created, how it is maintained. Its success demonstrates some of the shortcomings in how we train and evaluate medical students. Critical thinking requires appreciation that ground truths in medicine continually shift, and more importantly, an understanding of how and why they shift.
A: Learning is about leveraging the current body of knowledge, understanding its gaps, and seeking to fill those gaps. It requires being comfortable with and being able to probe the uncertainties. We fail as teachers by not teaching students how to understand the gaps in the current body of knowledge. We fail them when we preach certainty over curiosity, and hubris over humility.
Medical education also requires being aware of the biases in the way medical knowledge is created and validated. These biases are best addressed by optimizing the cognitive diversity within the community. More than ever, there is a need to inspire cross-disciplinary collaborative learning and problem-solving. Medical students need data science skills that will allow every clinician to contribute to, continually assess, and recalibrate medical knowledge.
Q: Do you see any upside to ChatGPT's success in this exam? Are there beneficial ways that ChatGPT and other forms of AI can contribute to the practice of medicine?
A: There is no question that large language models (LLMs) such as ChatGPT are very powerful tools in sifting through content beyond the capabilities of experts, or even groups of experts, and extracting knowledge. However, we will need to address the problem of data bias before we can leverage LLMs and other artificial intelligence technologies. The body of knowledge that LLMs train on, both medical and beyond, is dominated by content and research from well-funded institutions in high-income countries. It is not representative of most of the world.
We have also learned that even mechanistic models of health and disease may be biased. These inputs are fed to encoders and transformers that are oblivious to these biases. Ground truths in medicine are continuously shifting, and currently, there is no way to determine when ground truths have drifted. LLMs do not evaluate the quality and the bias of the content they are being trained on. Neither do they provide the level of uncertainty around their output. But the perfect should not be the enemy of the good. There is tremendous opportunity to improve the way health care providers currently make clinical decisions, which we know are tainted with unconscious bias. I have no doubt AI will deliver its promise once we have optimized the data input.
More information: Amarachi B. Mbakwe et al, ChatGPT passing USMLE shines a spotlight on the flaws of medical education, PLOS Digital Health (2023). DOI: 10.1371/journal.pdig.0000205
This story is republished courtesy of MIT News (web.mit.edu/newsoffice/), a popular site that covers news about MIT research, innovation and teaching.
Citation: Q&A: Three questions on ChatGPT and medicine (2023, February 9) retrieved 19 February 2023 from https://techxplore.com/news/2023-02-qa-chatgpt-medicine.html
Doctors could soon have a new tool to help diagnose chronic obstructive pulmonary disease (COPD).
A questionnaire called CAPTURE successfully identified almost half of clinical trial participants who had moderate to severe forms of previously undiagnosed COPD, researchers report.
"The goal with trying to find COPD is to treat it earlier, which will help make patients feel better and hopefully prevent their disease from progressing," said principal investigator Dr. Fernando Martinez, chief of the pulmonary and critical care medicine division at Weill Cornell Medicine in New York City.
More than 15 million Americans have been diagnosed with COPD, and experts think millions more have it but don't know it. COPD is a leading cause of death in the United States.
Common COPD symptoms include coughing, shortness of breath, wheezing or whistling in the chest, and tightness or heaviness of the chest.
The CAPTURE tool asks patients to answer five questions that assess their breathing and exposure to chemicals or air pollution.
Those with medium scores take an in-office breathing test to gauge the force of their exhalation, a sign of lung function.
People who score low on that test, or who score high on the CAPTURE questionnaire, proceed to a spirometry breathing test, which is considered the gold standard for diagnosing COPD.
CAPTURE screening gives doctors additional information to assess patients with respiratory symptoms, the study authors said.
Only about one-third of COPD assessments include spirometry, because the tests can be difficult to integrate into a short visit with a primary care doctor.
"CAPTURE was designed to be easy for physicians to use. The screening is simple, takes less than a minute, and helps identify adults with trouble breathing who should be evaluated further," Dr. Antonello Punturieri, program director of the U.S. National Heart, Lung, and Blood Institute's Chronic Obstructive Pulmonary Disease/Environment Program, said in an institute news release.
CAPTURE's clinical trial involved more than 4,300 adults aged 45 to 80, and ran from October 2018 to April 2022.
By the end, about 2.5% of the study sample had been diagnosed with moderate to severe forms of COPD. Of those cases, CAPTURE accurately identified about 48% as having COPD.
The researchers estimated that 1 in 81 CAPTURE screenings would identify an adult with treatable but previously undiagnosed COPD, based on these results.
However, CAPTURE also gave a false positive result for 479 participants who did not have COPD.
The researchers said they are studying ways to improve the tool's accuracy through minor changes like altering questions or adding others. But they emphasized that the goal is to identify people who would benefit from COPD testing with spirometry.
The findings were published recently in the Journal of the American Medical Association.
"The study shows that there is a high degree of respiratory burden in primary care, and physicians need to ask about it and do the appropriate testing to determine if symptoms are driven by COPD or another process so that patients can get the right treatment," said principal investigator Dr. MeiLan Han, a professor of medicine in the division of pulmonary and critical care at the University of Michigan, in Ann Arbor.
Larger studies are underway to further assess CAPTURE and how doctors use the tool in practice. Results are expected later this year.
The U.S. National Heart, Lung, and Blood Institute has more about COPD and lung health.
Copyright © 2023 HealthDay. All rights reserved.
A new 5-question tool to help identify adults with undiagnosed, treatable chronic obstructive pulmonary disease (COPD) has been developed with the support of the National Heart, Lung, and Blood Institute.
The efficacy of the new screening tool was examined in a study published today in the journal JAMA.
Researchers conducted a multi-year, multi-site clinical trial from October 2018 until April 2022 with 4,325 participants between 45 and 80 years of age.
They used the COPD Assessment in Primary Care to Identify Undiagnosed Respiratory Disease & Exacerbation Risk (CAPTURE) screening tool.
Researchers said the CAPTURE system identified 53 of 110 previously undiagnosed study participants who had moderate to severe COPD.
However, the screening tool also provided 479 false positives. The scientists are looking at modifying or adding questions to improve accuracy.
All participants received COPD testing so the scientists could verify the screening results.
The CAPTURE screening tool consists of a five-question assessment.
A score of 2 to 4 is indicative of moderate breathing issues.
Those participants then take an in-office breathing test called peak expiratory flow rate (PEFR) that measures the force of exhalation or lung function.
For women, a score of less than 250 L/min suggests further COPD testing is needed; for men, the cutoff is 350 L/min.
People who score 5 or 6 on the CAPTURE screening do not need the PEFR. Instead, it’s recommended they have COPD testing.
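The screening flow described above (CAPTURE score, then peak flow, then spirometry) can be sketched as a simple decision function. Only the thresholds come from the article; the function name, argument names, and sex encoding are illustrative choices, not part of the published tool.

```python
def capture_triage(score, pefr_l_min=None, sex=None):
    """Sketch of the CAPTURE screening flow described above.

    Thresholds (scores 2-4 trigger a PEFR check, scores 5-6 go straight
    to spirometry, PEFR cutoffs of 250 L/min for women and 350 L/min
    for men) come from the article; everything else is illustrative.
    """
    if score >= 5:
        # High CAPTURE score: skip the PEFR step and recommend testing.
        return "refer for spirometry"
    if 2 <= score <= 4:
        # Medium score: an in-office peak expiratory flow test is next.
        if pefr_l_min is None:
            return "perform in-office PEFR test"
        cutoff = 250 if sex == "female" else 350
        if pefr_l_min < cutoff:
            return "refer for spirometry"
    # Low score, or adequate peak flow: no further COPD testing indicated.
    return "no further testing indicated"
```

For example, `capture_triage(3, pefr_l_min=240, sex="female")` recommends spirometry, since 240 L/min falls below the 250 L/min cutoff for women.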
The researchers expect 1 in 81 people who take the CAPTURE screening will be identified as having treatable but previously unidentified COPD.
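The headline figures in these write-ups can be cross-checked with quick arithmetic from the trial numbers reported above (4,325 participants, 110 previously undiagnosed moderate-to-severe cases, 53 of them flagged by CAPTURE):

```python
# Trial figures reported in these articles (CAPTURE, Oct 2018-Apr 2022).
participants = 4325   # adults aged 45 to 80 who were screened
true_cases = 110      # previously undiagnosed moderate-to-severe COPD
detected = 53         # true cases that CAPTURE correctly flagged

sensitivity = detected / true_cases         # share of cases caught
prevalence = true_cases / participants      # share of sample with COPD
screens_per_case = participants / detected  # screenings per case found

print(f"sensitivity: {sensitivity:.0%}")    # about 48%
print(f"prevalence: {prevalence:.1%}")      # about 2.5%
print(f"one new case per ~{screens_per_case:.1f} screenings")
```

The last figure works out to roughly 81.6 screenings per newly found case, consistent with the reported 1-in-81 estimate.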
“I think this is a promising step toward an easy-to-use screening tool for primary care physicians to identify those who might benefit from lung function testing to diagnose COPD,” said Dr. Jimmy Johannes, a pulmonologist and critical care medicine specialist at MemorialCare Long Beach Medical Center in California who was not involved in the study. “Nevertheless, the performance of this screening tool could be better; only about half of those diagnosed with COPD had a positive screening test.”
“However, implementing this screening tool may help standardize the approach to deciding who should undergo lung function testing,” Johannes told Healthline. “Importantly, it may also increase awareness among doctors and patients to think about lung health during a short clinic visit.“
In a 2022 report, the United States Preventive Services Task Force (USPSTF) recommended against screening asymptomatic people for COPD. This was a reaffirmation of a previous recommendation in 2016.
“The USPSTF reviewed the evidence for COPD and found that screening for COPD in asymptomatic adults has no net benefit,” task force members concluded.
The scientists conducting the current study disagree.
“The recommendation to not screen is because of a lack of studies evaluating the benefit of screening, not because there is no benefit to patients,” said Dr. Barry Make, the co-director of the COPD program at National Jewish Health and co-senior author of the study. “My view is that finding undiagnosed patients with COPD is beneficial because diagnosis and recognition will lead to earlier treatment and improve patient symptoms.”
“Symptoms in patients with COPD develop slowly and insidiously over a long period, and patients fail to recognize their slow development,” Make told Healthline. “For example, if you ask most people who are current smokers if they have cough and phlegm, they respond ‘no.’ However, if asked if they have a ‘smoker’s cough,’ the answer is more likely to be ‘yes’ with an additional refrain that this is normal and they have had it for a long time.”
“As another example, people may ‘slow down’ and give up some activities,” Make added.
“They relate this to aging when it is due to COPD. To avoid the troubling symptom of shortness of breath, people unconsciously avoid activities that make them short of breath.”
Although he would like to see improvements in accuracy, Johannes does agree that there are benefits to early screening.
“COPD is underdiagnosed, leading to significant untreated symptom burden and overall morbidity,” he said. “Diagnosing COPD early may lead to earlier treatment and symptom improvement. An earlier diagnosis may help identify ongoing risk factors for COPD and create an opportunity to mitigate further lung function decline by modifying those risk factors. For example, it may help motivate those still smoking cigarettes to quit earlier.”
COPD refers to a group of diseases, including emphysema and chronic bronchitis, that cause airway blockage and breathing-related problems, according to the Centers for Disease Control and Prevention.
Estimates indicate that 16 million people in the United States have been diagnosed with COPD and millions more have it but have not been diagnosed.
There is no cure or way to reverse COPD, but treatments are available to make living with the symptoms more manageable.
According to the American Lung Association, common symptoms include coughing, shortness of breath, wheezing, and chest tightness.
Spirometry is the most common diagnostic tool.
For this test, a person breathes into a mouthpiece with tubing connected to a machine. It measures lung function and can detect COPD before symptoms appear.
Other tests include a CT scan and arterial blood gas test to measure the oxygen in your blood.
The Boston Globe
Many of the comments on my recent firing by New York University took me to task for “failing” so many organic chemistry students with grades in the 60s. Is 65 seriously a low grade? Are your chances for admission to the medical school of your choice gone? Is your life really ruined with such a score?
No, your life isn’t ruined — 65 is no big deal.
In my 43 years teaching at Princeton and 15 at NYU, I often got complaints that an average grade of 65 was unacceptably low. They came not only from students but also from parents and even NYU deans. For years, I had aimed for that 65 mean in my organic chemistry exams. I almost always hit it.
It has always seemed to me that getting about two-thirds of difficult material right was actually pretty good in an introductory course. An exam that yields 65 allows for a range of questions, some easy to get students relaxed and started on the right path, one hard enough so that you find the students who can think beyond what they already know. And, of course, there is the in-between. Getting two-thirds of the exam right in my course put you at the B/B- line. Your life was decidedly not ruined.
However, in recent years at NYU, those 65-average exams did not yield the desired mean/median, and complaints intensified. The scores began to slip about 10 years ago and crashed during the COVID-19 pandemic. Moreover, I have to admit that there has been a certain amount of accommodation on my exams. I am not proud of it, but it proved impossible to resist completely the cries for easier exams. If I had given those old exams today, the grades would have been worse.
In post-COVID 2022, my class inadvertently did the critical experiment that made the point — the exams were significantly easier than those of yesteryear and the grades were much worse. I am not talking about minor drops — we were getting single-digit scores and even zeros. Teachers never saw that in the old days. There was definitely trouble at the low end of the grade distribution. And there were changes at the other end of the scale as well. The top students in the course were not getting their usual 90s — they were getting 100 on everything. That seems great, but in fact it is not good. Why are those excellent scores not ideal?
If you get a 92, you still are well within the A range, but at 100 you lose the opportunity to learn from these missing eight points. Students in the 90s always find out what they missed — they go to the book; they go to my office hours; they review the videos; they pester me until they get it. Those top students are stretched by those missing few points and a lot of serious learning takes place.
That learning usually takes place at the limit of the student’s knowledge. They are led onward by those missing, short-of-perfection points. None of that occurs with a 100. In an e-mail to the class after I was fired, I apologized to the students who were getting those 100s. My exams should have been hard enough to send them into the 90s. But that apology produced outrage. Students accused me of trying to “ruin more lives.” I might point out that a score just short of perfection is scarcely ruinous and that to a student scoring zero, easy questions and hard questions are equally impossible.
But these days there is no denying that a 65 upsets many students, even though they have been told that it puts them somewhere in the B range. One might wonder why. Perhaps they have grown up in an education system that is relentlessly upbeat — everyone gets a prize, no team “loses” a game. It’s fine to be dissatisfied with a B — everyone should want an A (81-82 for my exams) and should strive to make that 65 their lowest exam — one that gets dropped.
But it is not fine to be discouraged and disgruntled. I fear that many of today’s students have little or no experience in climbing out of holes — or recovering from adversity. Possibly, they have never felt they were in a hole. Digging out of holes is a critical life skill, as is realizing when that hole is so deep that digging out isn’t likely and a change of direction is necessary. I recall math courses that overwhelmed me. I could struggle and pass but not easily. I could not internalize the concepts and had to survive by blindly learning how to solve certain kinds of problems. Anything else was out of reach. It was clear that for me, heading toward math and physics was a bad idea. I wish that weren’t true, but it was.
What should the students who were getting those single-digit scores do? The answer is not complicated but is, these days, strangely hard to convince students to follow: Go to class. Sit in the front. Go to office hours with your problems. You do yourself no favor to say, as many do, “I was afraid to show how little I know.” Take notes. When I took those math courses, and even some chemistry courses, I was not quick enough to get everything as it unfolded on the blackboard; but I tried to write down everything the professor said. Everything!
If you follow those suggestions and things still don’t work, do what I did and change direction. You are not a bad person if you don’t fully grasp chemistry — go find what does work for you. Among other things, college is for discovering what you were born to do.
Maitland Jones Jr. taught at Princeton University from 1964 to 2007 and at New York University from 2007 to 2022.
Before long, it’ll be easier to list the tasks ChatGPT can’t complete than the ones it can. We have already shared reports about ChatGPT passing law school and business school exams, and now a new study reveals that the AI chatbot can also pass the United States Medical Licensing Examination (USMLE), though its score isn’t especially impressive.
Researchers from healthcare startup Ansible Health shared the results of their study in the journal PLOS Digital Health on February 9. They found that ChatGPT was able to score “at or around the approximately 60 percent passing threshold” for the licensing exam.
As the website explains, the USMLE is a three-step exam that physicians are required to take for medical licensure in the US. In addition to testing the skills and medical knowledge of prospective physicians, the test also assesses their values and attitudes.
After eliminating image-based questions, the researchers fed ChatGPT 350 of the 376 questions from the June 2022 USMLE. Across the three exams, ChatGPT scored between 52.4% and 75%. In most years, the passing threshold is around 60%. ChatGPT also outscored PubMedGPT — a model trained exclusively on biomedical literature — which scored 50.8%.
The authors say: “Reaching the passing score for this notoriously difficult expert exam, and doing so without any human reinforcement, marks a notable milestone in clinical AI maturation.”
Shortly after the study was published, the Federation of State Medical Boards and National Board of Medical Examiners, both USMLE co-sponsors, shared a statement of their own. They note that two recent studies used test prep material and practice questions as opposed to real USMLE exam questions. As such, ChatGPT’s achievement comes with an asterisk:
…it’s important to note that the practice questions used by ChatGPT are not representative of the entire depth and breadth of USMLE exam content as experienced by examinees. For example, certain question types were not included in the studies, such as those using pictures, heart sounds, and computer-based clinical skill simulations. This means that other critical test constructs are not being represented in their entirety in the studies.
“Although there is insufficient evidence to support the current claims that AI can pass the USMLE Step exams, we would not be surprised to see AI models improve their performance dramatically as the technology evolves,” the groups added. “If utilized correctly, these tools can have a positive impact on how assessments are built and how students learn.”
MONDAY, Feb. 13, 2023 (HealthDay News) — A new artificial intelligence system, ChatGPT, scores at or around the passing threshold for the U.S. Medical Licensing Exam, according to a study published online Feb. 9 in PLOS Digital Health.
Tiffany H. Kung, from AnsibleHealth Inc. in Mountain View, California, and colleagues examined the performance of the language model ChatGPT on the U.S. Medical Licensing Exam, which consists of the Step 1, Step 2CK, and Step 3 exams.
The researchers found that ChatGPT performed at or near the passing threshold of 60 percent accuracy, without any specialized training or reinforcement. Across all questions, ChatGPT outputted answers and explanations with 94.6 percent concordance; the high concordance was sustained across all exams. ChatGPT produced at least one significant insight in 88.9 percent of responses.
“We believe that large language models such as ChatGPT are reaching a maturity level that will soon impact clinical medicine at large, enhancing the delivery of individualized, compassionate, and scalable health care,” the authors write.
AARP and New York Life offer group term and whole life insurance policies for seniors, as well as whole life insurance coverage for minors. All of their policies are either simplified issue or guaranteed acceptance, meaning there are no medical exams and coverage is issued very quickly. The downside to this is that insurers assume applicants are at higher risk and, therefore, charge significantly more costly premiums.
You should also note that these policies are only available to AARP members (meaning you have to be at least 50 to qualify) and membership can cost between $12 to $16 per year, depending on your method of payment.
Unless you have a significant pre-existing condition and doubt your ability to pass a medical exam, we would not recommend AARP’s level benefit term life insurance through New York Life. The premiums are incredibly high and increase over time (in contrast to "level term" policies, "level benefit" means the death benefit stays the same while rates rise), and coverage ends when you turn 80.
The AARP offers term life insurance coverage for members between the ages of 50 and 74, and policies can be converted into a permanent life insurance policy at any point during coverage. Term life insurance death benefits only range from $10,000 to $100,000, meaning you may not be able to cover larger financial obligations, such as a mortgage. The AARP’s term death benefits are limited, in part, because their policies don’t require a medical exam in order to get coverage.
AARP’s term life insurance policies from New York Life are 1-year annually renewable policies. While this offers flexibility, it has the downside of causing your premiums to increase as you get older. Your initial premiums are determined by what 5-year age bracket you fall into and, each time you enter a new age bracket, the rates increase.
To demonstrate, say you are a 60-year-old male who wants $100,000 of coverage for 15 years. With New York Life AARP’s program, you would pay three different premiums over that 15-year period:
Ages 60 to 64: $109
Ages 65 to 69: $144
Ages 70 to 74: $208
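The bracket pattern above can be summarized with a short sketch (figures from the example; the article does not state the billing period, so only relative increases are shown):

```python
# AARP/New York Life premiums for a $100,000 15-year term policy,
# by 5-year age bracket (figures from the example above).
premiums = {"60 to 64": 109, "65 to 69": 144, "70 to 74": 208}

base = premiums["60 to 64"]
for bracket, rate in premiums.items():
    pct = round((rate / base - 1) * 100)
    print(f"ages {bracket}: ${rate} ({pct:+d}% vs. the first bracket)")
```

By the final bracket the premium is roughly 91% higher than at age 60, which is the main drawback of annually renewable coverage priced by age band.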
We compared this to quotes for a $100,000 15-year term policy from New York Life and five other top life insurance companies. As you can see below, whether you’re in great health, just average or even a smoker, AARP term life insurance from New York Life is significantly more costly.
While the AARP’s term life insurance rates are incredibly high, they are competitive with other no medical exam policies for some health profiles. However, you should still shop around and get multiple quotes. Depending on your age, height-to-weight ratio, tobacco use and health responses, no medical exam quotes for term life insurance can vary significantly. Using the example above, a $100,000 15-year no medical exam term policy for a 60-year-old would cost:
AARP/New York Life: $226 / $226
New York Life: $311 / $487
Permanent life insurance policies, particularly those that require no medical exam, consistently have higher premiums. Given this, we also would not recommend the AARP and New York Life’s simplified issue whole life insurance unless you have a pre-existing condition that would preclude you from passing a medical exam. However, if you’re a senior and have had a well-managed medical condition, such as diabetes, for over two years, their whole life insurance policy is a strong option.
The AARP’s no medical exam whole life insurance policy is a form of final expense insurance (also called burial insurance), as the amount of coverage available is usually just sufficient to cover end-of-life expenses. AARP’s whole life insurance policy offers $5,000 to $50,000 as a death benefit and is available if you’re between the ages of 50 and 80. While this is certainly enough to cover a funeral and minor debts, it is likely not a large enough death benefit to cover your mortgage. So, if you have large outstanding debts, you would want to consider other insurers.
As with other whole life insurance policies, AARP’s whole life coverage builds cash value over time. This is essentially the surrender value of the policy and can be borrowed against if, for example, you have an emergency medical expense. However, the AARP’s whole life insurance policy is relatively unique in that premium payments end when you turn 95. Relatively few people live to be 95, but the opportunity to stop making payments and continue to have coverage isn’t common among whole life insurance companies.
In addition, AARP’s whole life insurance comes with two riders that offer financial assistance in case you become disabled or ill.
AARP and New York Life also offer guaranteed acceptance whole life insurance, though this option isn’t available in New Jersey or Washington. AARP’s guaranteed acceptance policy is similar in most respects to their simplified issue whole life policy.
A key difference is that their guaranteed acceptance policy only offers between $2,500 and $25,000 of coverage. In addition, if you pass away during the first two years of coverage due to a non-accident, your beneficiary won’t receive the full death benefit. Instead, they will receive 125% of the value of premiums you had paid until that point. Waiting periods are fairly standard for guaranteed acceptance coverage, as insurers want to avoid large payments in case terminally ill patients sign up.
While AARP’s guaranteed acceptance coverage offers competitive rates, if you aren’t already a member, you shouldn’t join the AARP just to get access to this product. As you can see below, AARP’s quotes are on par with those from competitor policies with nearly identical features that don’t require any sort of membership.
AARP’s Young Start program allows you to purchase whole life insurance coverage for a child or grandchild who is younger than 18. There’s no medical exam and the policy builds cash value, similar to their standard whole life policy, but only three levels of coverage are offered.
This type of policy is typically intended to shield parents and relatives from the costs associated with a child dying early. However, AARP’s policy is different from those offered by other insurers as coverage is not interrupted when the child comes of age (turning 21) and premiums are level for as long as the policy remains in force. This means that, while the policy’s cash value will grow very slowly, it can continue to grow for decades and is available if your child or grandchild ever wants to access it.
In addition, if you pass away, the child or grandchild won’t have to pay premiums to keep coverage in place until they turn 21. At that point, premium payments will continue to be set according to whatever amount of coverage was initially purchased.
The AARP’s life insurance policies are underwritten and managed by New York Life, an insurer with an A++ (Superior) financial strength rating for life insurance companies from A.M. Best. However, New York Life’s program with the AARP receives a significant number of critical reviews with regards to claims handling.
When you pass away, your beneficiary will need to file a claim in order to receive your policy’s death benefit. This is normally a simple process unless you pass away during the first two years of coverage, as the insurer is able to investigate and contest the circumstances of your death (they may not have to pay if you die from suicide or a pre-existing condition you failed to disclose). Since AARP’s policies are sold to seniors, a large number of policyholders pass away during that 2-year period and reviews from beneficiaries indicate long and challenging investigations.
To reduce issues, make sure to carefully read all application questions from AARP and New York Life and answer them as honestly and completely as possible. You should also give your beneficiaries access to a copy of the policy and all payment records. Finally, let your beneficiaries know that they’ll be better served by contacting the AARP directly if they have issues during the claims process with New York Life.
To compare AARP/New York Life insurance's cost with its competitors, our editors gathered sample rates for term and guaranteed acceptance whole life policies from AARP/New York Life, as well as top competitors in each category. We considered rates for seniors at a variety of ages and health levels, including whether or not the insured person smokes.
To evaluate the service from AARP/New York Life, ValuePenguin considered online customer reviews from around the web. We also considered A.M. Best's Financial Strength Rating, which describes the company's overall financial health and ability to pay claims.
Guaranteed acceptance quotes were collected directly from each insurer's website. Term life insurance quotes were pulled from Compulife, a software subscription.
ChatGPT can pass parts of the US medical licensing exam, researchers have found, raising questions about whether the AI chatbot could one day help write the exam or help students prepare for it.
Victor Tseng, MD, and his colleagues at Ansible Health, a company that manages mostly homebound patients with chronic lung disease, initially wanted to see whether ChatGPT could aggregate all the communications regarding these patients, which would allow Ansible to better coordinate care.
"Naturally, we wondered how ChatGPT might augment patient care," Tseng, Ansible's vice president and medical director, told Medscape. A group of volunteers at the company decided to test its capabilities by asking it multiple choice questions from the US Medical Licensing Examination (USMLE), given that so many of them had taken the medical licensing exam.
"The results were so shocking to us that we sprinted to turn it into a publication," said Tseng. The results were published as a preprint on medRxiv. They were so impressed that they allowed ChatGPT to collaborate as a contributing author.
ChatGPT wrote the abstract and results sections "with minimal prompting and largely cosmetic adjustments from the human co-authors," said Tseng. The bot also contributed large sections to the introduction and methods sections. The authors "frequently asked it to synthesize, simplify, and offer counterpoints to drafts in progress," Tseng said. He likened it to how co-authors might interact over email. They decided they would not credit ChatGPT as an author, however.
The article has been accepted in the peer-reviewed journal PLOS Digital Health and will be published soon, Tseng told Medscape.
Alex Mechaber, MD, vice president of the USMLE Program at the National Board of Medical Examiners (NBME), said the organization is not surprised by the study's results, "in part because the input material that's used for ChatGPT is largely representative of medical knowledge." AI is most likely to be successful with multiple-choice type questions, Mechaber told Medscape.
San Francisco–based OpenAI developed ChatGPT, a chatbot built on a large language model. The tech giant Microsoft considers ChatGPT and OpenAI's other applications so promising that it has already invested $3 billion and is reportedly poised to put another $10 billion into the company.
ChatGPT's algorithms are "trained to predict the likelihood of a given sequence of words based on the context of the words that come before it." Theoretically, it is "capable of generating novel sequences of words never observed previously by the model, but that represent plausible sequences based on natural human language," according to Tseng and his co-authors.
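The prediction mechanism the authors describe can be illustrated with a toy sketch. This is a minimal bigram counter, nothing like OpenAI's actual architecture, and the sample text is invented for demonstration; it only shows the basic idea of estimating the likelihood of the next word from the words that come before it:

```python
from collections import Counter, defaultdict

# Invented toy corpus -- real models train on billions of words.
corpus = "the patient has a fever the patient has a cough".split()

# Count how often each word follows each preceding word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most likely next word and its estimated probability."""
    counts = following[word]
    best, n = counts.most_common(1)[0]
    return best, n / sum(counts.values())

print(predict_next("patient"))  # ('has', 1.0): "has" always follows "patient"
```

A model like ChatGPT replaces these raw counts with a neural network conditioned on far longer contexts, which is what lets it generate plausible word sequences it never saw verbatim.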
Released to the public in November 2022, ChatGPT has been used to write everything from love poems to high school history papers to website editorial content. The bot draws on a data store that includes everything that has been uploaded to the internet through 2021.
Tseng and colleagues tested ChatGPT on hundreds of multiple-choice questions covered in the three steps of the USMLE exam.
For each step, the researchers prompted the chatbot in three ways. First, it was given a theoretical patient's signs and symptoms and asked what the underlying cause or diagnosis might be.
Next, after ChatGPT was refreshed to eliminate potential bias from any retained information from the previous exercise, it was given the questions from the exam and asked to pick an answer. After again refreshing ChatGPT, the researchers asked it to "please explain why the correct answers are correct and why the incorrect answers are incorrect."
The answers were reviewed and scored by three board-certified, licensed physicians.
For the open-ended format, ChatGPT's accuracy for Step 1 ranged from 43% when "indeterminate" answers were included in the analysis to 68% when those responses were excluded. An indeterminate answer is one in which the chatbot either gave a response that was not among the multiple choices presented or said it could not commit to an answer. For Step 2, accuracy was 51%/58%, and for Step 3, it was 56%/62%.
When asked the questions verbatim, ChatGPT's accuracy was 36%/55% for Step 1, 57%/59% for Step 2 CK, and 55%/61% for Step 3. When asked to justify its responses, its accuracy was 40%/62% for Step 1, 49%/51% for Step 2, and 60%/65% for Step 3.
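The paired figures reflect whether indeterminate responses count against the bot. A minimal sketch of that scoring arithmetic (the response tallies below are illustrative, chosen only to reproduce the Step 1 open-ended figures of 43%/68%):

```python
# Illustrative tallies, not the study's raw data.
responses = ["correct"] * 43 + ["incorrect"] * 20 + ["indeterminate"] * 37

correct = responses.count("correct")
determinate = [r for r in responses if r != "indeterminate"]

acc_including = correct / len(responses)    # indeterminate counted as wrong
acc_excluding = correct / len(determinate)  # indeterminate dropped

print(f"{acc_including:.0%} / {acc_excluding:.0%}")  # 43% / 68%
```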
The pass rate for students varies according to whether it's a first exam or a repeat exam and whether the test-taker is from the United States or a different country. In 2021, for Step 1, the pass rate ranged from a low of 45% for repeaters to a high of 96%. For Step 2, the range was 62% to 99%, and for Step 3, the range was 62% to 98%.
“What's fascinating is that in Steps 2 and 3, which are more clinically advanced, only around 10% of [ChatGPT's] responses were indeterminate,” said Tseng.
USMLE's Mechaber noted that ChatGPT was only given a sampling of questions, not a real practice test. It also did not attempt questions that use images or sounds, or the case-based computer simulation studies administered in Step 3, he said.
Tseng suggests in his article that ChatGPT could potentially be used as a study aide for students preparing for the USMLE or to write questions for the exam.
"We're thinking about that," Mechaber said about its use as a study tool. But since ChatGPT still produces so many wrong answers, the technology is not likely "ready for prime time," he said. As to whether ChatGPT could write test questions, the NBME has shown interest in "automated item generation," he said.
"We're investigating [ChatGPT] with excitement and curiosity" for its potential use in medicine, Mechaber said.
An NBME staff member decided to query ChatGPT about whether it was a threat to the USMLE. The bot said that while it is a "powerful tool for natural language processing," it "is not a threat to the United States Medical Licensing Examination (USMLE)."
In a lengthy response, the bot added, "ChatGPT, while impressive in its ability to generate human-like text, is not specifically designed to test medical knowledge and is not a substitute for the rigorous training and education required to become a licensed physician."
In addition, ChatGPT "does not have the ability to think critically or solve problems in the way that a human physician would," it said.
The bot also brought up ethical considerations, noting that since AI models "are based on machine learning which can be biased, hence the results generated by the model may not be accurate and unbiased.
"ChatGPT is an impressive tool for natural language processing, but it is not a replacement for the specialized knowledge, critical thinking and ethical considerations that are essential for the practice of medicine," it said. "The USMLE remains an important and valid way to evaluate the knowledge and abilities of aspiring physicians," said the bot.
The study was conducted by volunteers and was not funded by any source. Tseng is a full-time employee of, and writes test questions for, UWorld, a USMLE test-prep company.
Alicia Ault is a Saint Petersburg, Florida–based freelance journalist whose work has appeared in publications including JAMA and Smithsonian.com. You can find her on Twitter @aliciaault.
In part one of this two-part article, design engineer Michael Paloian outlined the first four milestones of a product-design project, ending with the “cornerstone” of the product-design cycle — concept refinement and detailing. The milestones leading up to the final phase — testing, verification, validation — are discussed here.
As a reminder, most product-design projects progress through similar evolutionary steps of development. They are:
Design projects vary in complexity, and some may require only a few of the steps outlined above while others may require more phases of development. Each of these development steps can be considered a project milestone. In part two of this article, we will discuss the considerations that must be included within each of these milestones starting with step five, as well as the real-world factors that often alter the best-laid plans. Steps one through four are discussed in part one of this article.
Most projects gradually transition from one phase of development to the next rather than abruptly closing one chapter and opening a new one. The transition from the previous phase to engineering development is gradual, iterative, and highly interactive. As aesthetic design details are worked out by industrial designers, engineers are concurrently superimposing the proposed concept over technical parameters. This lengthy process requires extremely good communication between the engineering department and the entire design team.

Engineering development of plastic parts within a complex system requires multidisciplinary skills, experience, and knowledge. Engineers or engineering teams must completely understand the product, its use and market, environmental considerations, safety, structural requirements, design for manufacturing (DFM), plastics material options, tool design, and many other areas of knowledge. Engineers with expertise in designing plastic parts should have a basic understanding of plastic materials. Since there are hundreds of thousands of plastic materials to choose from, it’s advantageous for these individuals to rely on material suppliers, molders, or plastics material consultants to assist them in selecting the optimal material for a given application. Choosing the right plastic material is critical to the performance, safety, and reliability of any plastic product. It will also affect the design: material selection could influence the number of parts required for a sub-assembly, wall thickness, or structural features such as ribs.

In addition, engineers should reach out to molders and tool makers during the initial phases of part design to avoid the costly task of completely rebuilding a CAD model due to feature changes at the root of the feature tree. Since solid CAD models are created from a series of interdependent features that can often exceed many hundreds, it’s undesirable to change those at the root of the list.
These changes can result in costly man-hours of rework. This milestone phase of design development requires designers and engineers to constantly interact with numerous project contributors to avoid constant and unnecessary rework. A key requirement for this interactive process to progress efficiently is the regular exchange of honest and decisive information. There is no room in this process for bureaucrats and indecisive individuals solely concerned with job security. Information must be shared quickly, honestly, and decisively. Engineers must maintain an open mind and be willing to continually subject their designs to critical scrutiny.
All complicated innovative products require one or more intermediate prototypes during every phase of development. The type of prototype will depend on the parameter being evaluated. It’s surprising how many companies invest hundreds or thousands of man-hours developing a complete design before fabricating a prototype. This approach often results in products with serious hidden design flaws or extremely delayed product introductions with huge cost overruns. Product designs are far more likely to be introduced on time, on budget, and with the highest performance if critical parameters are identified from the outset of the project.

Identifying critical parameters that could affect the reliability or performance of plastic parts requires broad knowledge. Therefore, it is ideally conducted by a team of project contributors with different areas of expertise. Materials experts, for example, may raise questions about environments of use and proposed product assembly. Exposure to ultraviolet light, chemicals, thermal considerations, regulatory requirements, fatigue, and so forth will affect the type of plastic material to be specified. Tool designers and molders may identify potential molding problems that could result in poor fills, knit lines, sink marks, or undercuts.

Engineers are often faced with situations where no reliable information is available. This is when a prototype is required to determine the best solution. An example might be something as simple as evaluating the long-term performance of a plastic part exposed to a chemical or lubricant under stress for a specified period of time. Another example might be evaluating the feel of a snap lock for a cover or cap based on a specific material. A third might be the impact strength of a knit line in a glass-filled nylon material. Each of these examples will require a prototype that could range from a sample chip of plastic to a small injection-molded test part or a 3D-printed prototype.
Establishing the test parameters and procedures is as critical as identifying the potential failure point of a product.
Product documentation is a closing chapter of the engineering development and design process. At one time before 3D CAD, parts were fully detailed and dimensioned with tolerances in 2D orthographic drawings, which were released to tool makers for production tooling. Today, 3D CAD files are released to tool makers for machining molds and the 2D documentation is used as reference information. 2D drawings are rarely fully dimensioned, since CNC machines are programmed directly from data within 3D CAD files that contain all the dimensional and geometric information. Although this process does not depend upon 2D drawings for fabricating molds, it is highly dependent on the technical information stated in the documentation. 2D orthographic production control drawings typically contain the following information:
Although this list is not inclusive of every significant specification, it does represent the most common parameters. Dimensions can be specified using conventional formats or with geometric tolerances that provide more explicit information.
This significant milestone in the design process can be considered equivalent to a contract between the molder and you, the customer. A significant portion of the documentation includes assembly drawings, which are critical to the overall performance, safety, and reliability of the product. Plastic subassemblies frequently include any of the following hardware or information:
Although this list is not all-inclusive, it does represent many of the commonly assembled parts and operations for plastic parts. Engineers should have a comprehensive understanding of these operations, including materials, techniques, precautions, and consequences. Improper specifications or omissions could result in serious product failures or chronic product malfunctions. For example, specifications for adhesives are extremely technical and complex. It’s advisable for engineers to discuss the application, environmental conditions, and materials to be bonded with a reputable adhesives provider as part of this project milestone. A technical adhesives engineer will describe critical requirements for surface preparation, application, and curing conditions. He or she will also provide chemical resistance, thermal limits, and manufacturing considerations, all of which must be considered as well as specified within the documentation. Similar technical advice can be provided by ultrasonic assembly experts, printers, and insert suppliers.
It’s customary for engineers to create, test, and evaluate a fully functional preproduction plastic prototype before releasing CAD files to a tool maker for production tooling. This last step before production release of CAD files should be planned around clear objectives for what the prototype evaluation is meant to confirm.
Rapid prototyping offers engineers the most efficient and cost-effective means of fabricating a preproduction prototype, but it has limitations that must be considered. For example, rapid prototypes will not reveal potential tolerance problems that could be experienced during molding. This information can only be obtained or confirmed by a molder. Rapid prototypes will not represent the real physical properties of the final injection-molded part, which are dependent on the specific resin grade, processing conditions, and mold design. Finally, rapid prototypes will not reveal potential molding problems such as sink marks, knit lines, excessively thin walls, or warpage. These can only be predicted by simulation software or experienced molders who can pinpoint these potential problems based on their experience. It’s therefore recommended to seek the advice of appropriate experts during this final milestone of product development.
A preproduction prototype will provide invaluable information pertaining to overall fit, basic function, overall appearance, and product use. Fully functional prototypes can be finished to look like the final production unit. Engineers can also measure heat dissipation, vibration, effectiveness of snap fits, and many other design features with a fully functional rapid prototype. Interaction and close communication with prototyping resources is as important as any other milestone during the development process. It’s common for pre-production designs to be creatively modified for prototyping purposes, since some molded features may be impossible to effectively prototype. Collaboration with a prototyping partner will often present options for achieving difficult features with good representation.
The most exciting moment during the design of plastic products is releasing CAD files and documentation to a tool maker for fabrication of injection molds. After this milestone is achieved, there is no turning back: once metal is cut, there is little opportunity to make changes. Last-minute design changes can be made without serious cost impact only before molds are designed or, at the latest, before machining begins. This major milestone requires the engineer to be certain about the plastic material specified in the documentation, since this affects shrinkage and mold features. It is also critical to be aware of gate location, which will affect knit lines, part fill, appearance, and function. Part design must include enough draft for all features, including textured surfaces, which require additional draft depending on depth of texture. Wall-thickness cross sections should have been verified to allow adequate fill during molding as well as to prevent sink marks in parts. Discussions with a mold maker will impart information on tool design, knockout pin locations, and overall mold quality, which will affect the final part form, fit, and function.
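One reason the material must be locked in before tooling is that cavity dimensions are scaled up to compensate for the specified resin's shrinkage. The sketch below illustrates that calculation only; the shrinkage values are assumed placeholders, and actual values must come from the resin supplier's data sheet for the exact grade:

```python
# Assumed, illustrative mold-shrinkage values (in/in or mm/mm),
# NOT supplier specifications.
SHRINKAGE = {
    "ABS": 0.006,
    "polypropylene": 0.015,
    "glass-filled nylon": 0.004,
}

def cavity_dimension(part_dim_mm, material):
    """Scale a nominal part dimension up so the cooled part shrinks to size."""
    s = SHRINKAGE[material]
    return part_dim_mm / (1.0 - s)

# A 100 mm feature in polypropylene needs a cavity of roughly 101.52 mm,
# while the same feature in glass-filled nylon needs only about 100.40 mm --
# which is why switching resins after metal is cut is so costly.
print(round(cavity_dimension(100.0, "polypropylene"), 2))
```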
This milestone typically spans a time period ranging from one to six months, depending on mold complexity, number of molds, backlog of the mold shop, and potential changes. Communication between product design engineers and tool makers typically drops off after the machining operations begin. Toward the end of the tooling fabrication phase of the project, it’s important to coordinate with the tool maker to discuss the final steps of the process. This last operation pertains to surface texturing, which involves acid etching specified areas of the molds. Since this final step in the tool-making process precludes adding any design modifications to the mold without major cost, it’s advisable to schedule mold texturing after first shots are evaluated. Although this precautionary step adds time to the schedule, it does provide you with an added level of assurance that your design and the molded production parts are within specifications.
After plastic injection molds are finished and approved, a pilot production run is customarily completed. This milestone typically requires several previous milestones to have been completed: sample lots of material for each part must be available. Some materials may require long lead times and must be ordered well in advance of the pilot run. Often, custom colors or special formulations require high minimum orders, which makes them prohibitive for sample runs. Engineers must be ready to suggest substitutes that satisfy performance criteria.
Frequently, first articles may exhibit slight sink marks, splay, slight warpage, or minor tolerance problems that require processing adjustments and minor tooling revisions. The engineering and design team must be prepared to methodically examine every part against the overall product specifications. This critical phase of a project requires professionals with exceptional problem-solving and analytical skills. The root cause of parts not fitting together as expected is often challenging to pinpoint. The worst choice during this process is yielding to suggestions to change the design without definitively identifying the problem. That path is costly, time consuming, and, worst of all, it complicates solving the problem.
Well-designed parts that are optimized for injection molding are easily molded in spec if they are properly analyzed. The completion of this milestone is accomplished after the final assembled pre-production product has been carefully reviewed by all team members and accepted as suitable for production.
The last milestone in the design and development process is verification and validation. Preproduction samples should be tested and proven to perform in accordance with specifications before authorization for full production is granted by corporate management. Preproduction parts are typically evaluated based on several parameters, some of which are listed below:
All these parameters must be evaluated based on very well-defined protocols and procedures. The results are only as good as the scope of the evaluation. It’s imperative for engineers defining the test procedure to fully understand the product and its potential failure points. Some of these tests may be destructive, while others may require many hours, weeks, or months to complete. Only after the testing is completed can a product be confidently authorized for full production, which officially completes the last milestone in a product-development cycle.
I hope this article has enlightened you to the many considerations and critical milestones throughout the product-development cycle. If you wish to contact me to discuss your experiences or share comments, please feel free to e-mail me at [email protected].
About the author
Michael Paloian is President of Integrated Design Systems Inc. (IDS), located in Oyster Bay, New York. He has an undergraduate degree in plastics engineering from UMass Lowell and a master's of industrial design from Rhode Island School of Design. Paloian has an in-depth knowledge of designing parts in numerous processes and materials, including plastics, metals, and composites. Paloian holds more than 40 patents and was past chair of SPE RMD and PD3. He frequently speaks at SPE, SPI, ARM, MD&M, and IDSA conferences. He has also written hundreds of design-related articles for many publications.