Insights
Exploring the Causes of Clinical Trial Failures
Allan M. Green, MD, PhD, JD
Failed clinical trials can cost biotechnology companies tens of millions of dollars and may seriously impair the survival of a development-stage company. These are good reasons to examine the causes of clinical trial failures in the hope of learning from the experience. There are no experiences so bad, it is said, that they cannot serve as terrible examples for others.
Clinical trials are shaped by a complex interplay of basic scientific research data, clinical experience and instinct, marketing and financial pressures, and labyrinthine FDA statutes, regulations, guidelines and practices. Most often, data from animal models have suggested a bright clinical future for a new compound. It has often been noted that animals are not people, and certain assumptions are always built into preclinical modeling of human disease. Nonetheless, prudence and experience require that such models be studied as guides to the design of safe human trials.
Over the past two decades, the biotechnology industry has achieved a number of impressive clinical successes. It has also waded optimistically into areas of clinical research that were only dimly and imperfectly understood. For example, two areas of research that have yielded little success for many companies are anti-sepsis agents and anti-angiogenic drugs. Between 1993 and 1995, the failure of a series of drugs intended to interfere with endotoxin-mediated septic shock cost the industry hundreds of millions of dollars and led to the sale of several promising R&D stage biotechnology companies. More recently, there have been a number of failures of clinical trials of new drugs intended to prevent the formation of the new blood vessels required for tumor growth. Are there lessons to be learned from these failures?
Six Causes for Clinical Trial Failure
A number of common factors appear to be involved in the failure of phase II or phase III clinical trials to demonstrate clear evidence of therapeutic efficacy:
1) The compound may simply be ineffective. That is, the agent may not be useful for the intended purpose under study. For example, The Scientist reported last year that there had been 12 clinical trial failures of angiogenesis inhibitors since 1999. Pharmacia's SU-5416, which targeted the VEGF receptor, failed in a phase III study in late 2003. When the agent was first developed, it was not appreciated that a broad array of pro-angiogenic factors is at work in the neighborhood of growing tumors. Even if the VEGF receptor is blocked, the body uses other pro-angiogenic molecules to work around the single blocked pathway and support the growth of new tumor blood vessels.
2) The dose or the dosage schedule administered may be ineffective. In the anti-VEGF study noted above, it was also found that the twice-weekly administration schedule for the drug was probably not frequent enough to maintain the blockade of the VEGF receptor.
3) The test compound may fail to reach its intended site of action. Many monoclonal antibodies directed against solid tumors have failed in human clinical trials, either because they are removed from the circulation by the liver or kidneys before they reach the tumor or because tumor blood flow and interstitial pressure prevent the antibody from penetrating the tumor.
4) The compound may be administered at the wrong time in the disease process to mediate the pathophysiological process it was designed to affect. This problem was found in the numerous trials of anti-sepsis agents studied in individuals who had already passed the point of endotoxin induction of what is now called systemic inflammatory response syndrome and probably had irreversible end-organ damage when enrolled.
5) The trial population group may be inappropriate for demonstrating the effect of the agent. For instance, an anticancer drug may be effective for certain tumors, but not for others.
6) The outcome measures used to determine drug effect may lack the sensitivity to detect a change. For example, the standard clinical psychometric scales used to demonstrate changes in Alzheimer's disease may only show a change in 70% of untreated subjects with early stage disease after a year of observation. Unvalidated outcome measures, such as pain scales or activity indices, are a particular source of problems leading to clinical study failures.
Analysis of Failed Clinical Trials for Fabry's Disease
A striking example of the importance of selecting appropriate study outcome measures (or, as the FDA puts it, study end-points) was provided by the recent FDA medical advisory committee reviews of two agents intended to treat Fabry's disease. Fabry's is a rare genetic disorder in which an enzyme known as α-galactosidase A (Gal-A) is not functional. Gal-A is necessary to metabolize a class of lipids known as ceramides (GL-3). Without functional enzyme, these lipids build up in the endothelial cells lining the blood vessels of vital tissues, interfering with organ function. The disease is characterized by pain, renal failure and cardiac failure as the abnormal GL-3 lipid accumulates in the blood vessels supplying the kidneys and the heart.
Since Fabry's is an "orphan" disease, one affecting fewer than 200,000 Americans, the first company that receives approval for a treatment product receives exclusivity for seven years. Two companies were working to develop a form of Gal-A for replacement therapy that might prevent the abnormal lipid accumulation and subsequent organ damage. The first product to receive approval would pre-empt approval of the other product for seven years.
Company A produced recombinant Gal-A and embarked on a clinical trial of the effectiveness of replacement therapy using the expected clinical benefits (i.e., the study end-points) of 1) pain reduction, 2) stabilization or improvement in kidney function, 3) increase in body weight, and 4) reduction of cardiac enlargement and improvement of cardiac function. All phase II clinical studies used a dose of 0.2 mg/kg given intravenously every two weeks. The FDA reviewer noted that this was not a dose evaluated in the Phase I study. Because this is an extremely rare disorder, the sponsor was able to perform only two placebo-controlled studies, each of six months' duration, enrolling a total of just 41 subjects. One of these studies, with 26 subjects, employed a novel pain scale which required subjects to be temporarily off pain medication to assess benefit. It also included kidney biopsies and an assessment of kidney function. The second study enrolled 15 subjects and focused on cardiac outcomes, including endomyocardial biopsies. Renal biopsies were not done in the second study.
Unfortunately, the primary pain endpoint for the first study—change in pain while off pain medications—was difficult to measure reproducibly and did not reach statistical significance in a population of this size. Moreover, the robustness of the study data was undermined by a failure of the sponsor to audit the data and correct certain problems before an initial unblinded data analysis. In addition, the lack of any prospective explicit definition of "pain medications" caused a confusing situation in which non-traditional analgesics such as anti-epileptic agents were considered pain medicines, but narcotics were not. Consequently, the FDA felt that it could not draw any conclusions about the effect of the treatment on pain—the primary study endpoint. Although there was an increase in creatinine clearance, a measure of the adequacy of kidney function, in the treated group at 24 weeks, the FDA reviewers noted that all of the improvement was seen between week 23 and week 24, a physiologically improbable change. Moreover, the glomerular filtration rate, another measure of kidney function which may be expected to parallel changes in creatinine clearance, was no different between the treated and untreated groups.
Finally, the sponsor had looked at kidney biopsies using a complex measuring instrument known as the Acute Lipid Damage Score which involved many different histologic measurements. Of these, the only improvement noted was in the vascular endothelium—the cells lining the blood vessels. The FDA reviewers noted that the degree to which all histologic changes related to renal function was unknown and that the histopathology data was "hampered by a lack of rigor in the slide grading."
A common error in clinical study design is the prospective selection of many secondary endpoints for the study. This problem often arises from a misguided notion that the more endpoints one selects, the better the chance that one of them will reach statistical significance. The sponsor's use of many secondary endpoints actually worked against them in this case. One advisory committee member focused on this failing: "in neither [randomized] study was the primary outcome significant, and one always has to devalue the P values [for clinical significance] that you find in specific [secondary endpoint] components underneath that by the fact that there are very large numbers of other examinations. Therefore, even the solidity of the [statistically significant] finding that we see is attenuated by the fact that there are many, many tests." The Advisory Committee Chairman reviewing the submission summed up the matter: "While some renal function or renal histology outcomes suggested a treatment effect, there were secondary or exploratory endpoints in these studies that were inconsistent and/or contradictory with multiple other endpoints. These data prohibit reaching clear conclusions regarding beneficial effects of treatment on these organs." Based on the many questions raised by the conduct of these clinical studies, FDA requested additional information that would have required the sponsor to launch new clinical trials before product approval would be possible.
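The committee member's point about devaluing P values can be illustrated with a little arithmetic. The sketch below is not taken from the FDA review; it assumes, purely for illustration, that each secondary endpoint is an independent test at the conventional 0.05 level (real endpoints are usually correlated, so the exact figures would differ). The function names are hypothetical.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false-positive result among num_tests
    independent tests, each run at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level that keeps the overall false-positive
    chance at or below alpha (Bonferroni correction)."""
    return alpha / num_tests


if __name__ == "__main__":
    # Illustrative endpoint counts only; not the counts used by either sponsor.
    for k in (1, 5, 10, 20):
        print(
            f"{k:>2} endpoints: chance of a spurious 'hit' = "
            f"{familywise_error_rate(k):.0%}, "
            f"Bonferroni per-test threshold = {bonferroni_threshold(k):.4f}"
        )
```

Under these assumptions, ten endpoints carry roughly a 40% chance that at least one reaches nominal significance by chance alone, which is why an isolated significant secondary finding carries little weight when the primary endpoint has failed.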
Company B, a competitor, had also been studying a recombinant Gal-A product as replacement therapy in Fabry's disease. This sponsor performed a Phase I/II dose ranging study which employed three dose levels of the enzyme replacement administered in two dosing schedules. This allowed the sponsor to plausibly determine that the optimal dose for their product was 1 mg/kg intravenously every two weeks.
This sponsor considered a number of endpoints for its phase II randomized, controlled efficacy trial. It argued that pain is a subjective endpoint that is difficult to quantify, is episodic and wanes over time. The absence of validated pain measurement instruments in Fabry's disease suggested that an impossibly large number of study subjects would be needed to gather valid data. This endpoint was discarded. The sponsor also considered cardiac function as an endpoint; however, this too was eliminated because cardiac event rates were poorly documented and could not be distinguished from problems due to atherosclerosis or hypertension.
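Why a noisy endpoint demands so many subjects can be seen from the standard normal-approximation formula for comparing two group means. The sketch below is a textbook calculation, not the sponsor's actual analysis; the effect sizes are hypothetical, and the function name subjects_per_arm is an assumption made for illustration.

```python
import math
from statistics import NormalDist


def subjects_per_arm(effect_size: float, alpha: float = 0.05,
                     power: float = 0.80) -> int:
    """Approximate subjects needed in each arm of a two-arm trial.

    effect_size is the expected difference between group means divided by
    the standard deviation of the measurement. A noisy, unvalidated scale
    inflates the standard deviation and so shrinks the effect size.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


if __name__ == "__main__":
    # Hypothetical effect sizes, from large to small.
    for d in (0.8, 0.5, 0.2):
        print(f"effect size {d}: about {subjects_per_arm(d)} subjects per arm")
```

With these assumed inputs, a small standardized effect of 0.2 calls for nearly 400 subjects per arm, a number that is out of reach in a disorder this rare.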
They then focused on kidney function as an endpoint. It was noted that renal function may remain normal for many years in these subjects before it deteriorates. Thus, demonstration of a significant difference between treated and untreated subjects might take many years and would require a large trial. However, using the accelerated approval mechanism available for novel drugs intended for the treatment of life-threatening diseases, the sponsor worked with the FDA to define a so-called surrogate endpoint "reasonably likely" to predict clinical benefit. The hallmark of the disease was known to be the abnormal deposition of GL-3 lipid in tissues. The Phase I/II study included kidney biopsies. These revealed significant dose-dependent resolution of GL-3 deposits in the region of the endothelial cells lining the kidney capillaries. Thus, the FDA agreed to a randomized phase II study whose surrogate principal endpoint was reduction in GL-3 in the capillary endothelium of the kidney. By week 20 of treatment with 1 mg/kg Company B Gal-A every two weeks, 69% of the treated population had no detectable abnormal deposits of the GL-3 lipid in the capillaries of the kidney. The phase II study had unequivocally reached its primary endpoint.
Within four months, FDA announced the approval of the Company B drug, stating that "it is believed likely that this reduction of fat deposition will prevent the development of life-threatening organ damage and have a positive health effect on patients." Company B had worked with the agency in the context of the accelerated approval regulations. Company A had not laid such groundwork. During the advisory committee hearing, an FDA reviewer summed up the problem for Company A: "The framework in which we are bringing it [the Company A submission] to you [the advisory committee] is that this BLA was submitted to the agency asking for a conventional approval on the basis of the clinical data that was supplied to us." Since Company A's two randomized studies failed to reach statistical significance for their primary endpoints, there was no regulatory basis for product approval.
Close consultation with the FDA leading to the design of a robust clinical trial with a reasonably achievable primary endpoint was clearly key to clinical and commercial success.
"It is a good morning exercise for a research scientist to discard a pet hypothesis every day before breakfast." – Konrad Lorenz, "On Aggression," 1966