Kert Viele and Christina Saunders
My colleague Scott Berry recently presented a webinar on historical borrowing.
This blog post doesn’t require you to have seen it, but his webinar is a helpful primer to this discussion.
Scott’s webinar focuses on borrowing historical control information, rather than treatment effects. If we are investigating a novel antibiotic and comparing it to doripenem (the control arm), the goal is to find external information on doripenem, with the hope of requiring fewer doripenem patients in the current trial. This is a distinct situation from borrowing information on our novel antibiotic (the risks are greater for reasons beyond the scope of this blog). To my knowledge, borrowing information on the novel arm is quite rare, with one exception: the borrowing of adult treatment effects for the analysis of a pediatric trial that uses the same treatment arms.
Publications, FDA Guidance, and Industry Collaborations
Historical borrowing has a long history in the literature. I apologize to the authors of all the worthy papers I’ve omitted in the references below. One of the earliest papers is Pocock 1976, with much more literature appearing in the early 2000s and particularly in the 2010s. Despite this academic interest, there has been limited use of historical borrowing in practice due to concern about the risks of borrowing the “wrong” control information. However, given the financial and patient costs of repeatedly generating the same control information, focus has recently shifted towards properly balancing the benefits and risks of borrowing control information. Thus, while historical borrowing is now explicitly permitted, sponsors should properly quantify and mitigate the risks.
See, for example, the FDA Draft Guidance document on complex innovative trials found at
Section IV.A of this document only discusses the somewhat narrow setting of leveraging Phase 2 control data for Phase 3 trials. I hope this will be expanded in the final guidance, since the text applies more generally and other FDA guidances do not restrict borrowing in this way.
The FDA draft guidance on adaptive trials also addresses borrowing. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/adaptive-design-clinical-trials-drugs-and-biologics
Information on borrowing starts on page 20, line 836. This guidance gives explicit consideration to external borrowing. Importantly, it allows for the potential increase in type I error, which was previously the main regulatory objection to historical borrowing.
On the industry side, TransCelerate is an extensive pharma collaboration. One part of that collaboration is the systematic storage of clinical trial data explicitly intended for borrowing in future trials. They have had extensive discussions with FDA and EMA on borrowing and report generally positive interactions. https://www.efspi.org/Documents/Events/Events%202018/Statistical%20Leaders%209th%20Meeting/06_EFSPI_StatsLeaders_2018_WS_HistoricalControls.pdf
Thus, with FDA accepting historical borrowing, a wealth of methodology existing in academic journals, and the development of large-scale databases to borrow from, when should we actually borrow?
Benefits and Risks of Historical Borrowing… is it for you?
Historical borrowing, whether from real world evidence or randomized clinical trials, comes with benefits and risks. When people borrow, they hope the historical evidence will add useful information to the current trial, either allowing a decrease in sample size (more common) or an increase in the accuracy/power of the current trial. Unfortunately, the extra information may be deceiving (e.g., a historical trial conducted in 1970, however large, might provide very poor information on survival rates in 2019). When this occurs, borrowing produces inflated type I error or reduced power, combined with heavily biased estimates of treatment effects.
In his webinar, Scott lays out why these benefits and risks occur. The key quantity driving the performance of borrowing is “drift”. Drift is the difference between the observed borrowed data and the unknown true parameter in our current trial. To make this concrete, suppose we were borrowing historical doripenem data into our trial comparing doripenem to a novel antibiotic. We search the literature and find two historical studies that are “on point” (recently conducted, similar inclusion/exclusion criteria, similar sites, etc.). If both studies showed around an 83% cure rate for doripenem, then 83% is our observed historical control estimate.
Our current control parameter is the true doripenem rate for the study we are designing. That true control parameter is unknown, but we can say a lot about “what ifs”. If the doripenem cure rate in our current study is also around 83%, this would be “small drift”. When drift is small, borrowing is beneficial because the historical data will pull any estimate from our current study closer to the truth. The result is decreased type I error, greater power, and better point estimates (you could trade these advantages for a smaller sample size). In contrast, if the true doripenem rate for today’s trial is 70% (suppose resistance to doripenem has increased), then the historical data will incorrectly raise control estimates toward the historical 83%. This understates the treatment effect, creating bias and lowering power. If the true rate for today’s trial is 90% (suppose our current trial enrolls a less sick population), then the borrowing will incorrectly lower current control estimates toward the historical 83%. This makes it easier to win than it should be, inflating type I error. This inflation of type I error is the primary regulatory concern, but all these risks (biases, lower power, inflated type I error) should concern everyone.
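To make the effect of drift concrete, here is a minimal sketch that computes the exact chance of declaring the novel antibiotic superior under three true doripenem rates. Everything specific in it is an assumption made for illustration and not taken from Scott’s webinar: 200 patients per arm, 332 cures among 400 historical controls (83%), a fixed power-prior weight a0 on the historical data, Beta(1,1) priors, and a “win” defined by a normal approximation to the posterior probability that the novel arm beats control exceeding roughly 97.5%.

```python
from math import comb, sqrt

def binom_pmf(n, p):
    """Binomial(n, p) probabilities for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    return mean, mean * (1 - mean) / (a + b + 1)

def win_probability(p_ctrl, p_trt, a0, n=200, x_hist=332, n_hist=400, z_crit=1.96):
    """Exact probability that the trial declares the novel arm superior.

    a0 is a fixed weight on the historical controls (0 = no borrowing,
    1 = full pooling). All default values are hypothetical.
    """
    pmf_c = binom_pmf(n, p_ctrl)
    pmf_t = binom_pmf(n, p_trt)
    # Posterior for the novel arm uses current data only, Beta(1, 1) prior.
    trt_post = [beta_mean_var(1 + xt, 1 + n - xt) for xt in range(n + 1)]
    total = 0.0
    for xc in range(n + 1):
        # Control posterior: current data plus down-weighted historical data.
        mc, vc = beta_mean_var(1 + a0 * x_hist + xc,
                               1 + a0 * (n_hist - x_hist) + n - xc)
        for xt in range(n + 1):
            mt, vt = trt_post[xt]
            # Normal approximation to Pr(p_trt > p_ctrl | data) > 0.975.
            if (mt - mc) / sqrt(vt + vc) > z_crit:
                total += pmf_c[xc] * pmf_t[xt]
    return total

results = {}
for true_ctrl in (0.70, 0.83, 0.90):
    for a0 in (0.0, 0.5):
        type1 = win_probability(true_ctrl, true_ctrl, a0)         # no true effect
        power = win_probability(true_ctrl, true_ctrl + 0.08, a0)  # 8-point effect
        results[(true_ctrl, a0)] = (type1, power)
        print(f"true control {true_ctrl:.2f}, a0={a0}: "
              f"type I error {type1:.3f}, power {power:.3f}")
```

Under these assumptions, the borrowing design (a0=0.5) should gain power when the true control rate matches the historical 83%, lose power at 70%, and show inflated type I error at 90%. A real design would replace the fixed weight and the normal approximation with whatever borrowing method and decision rule the trial actually uses.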
Drift drives the performance of the trial. If drift is small, we should borrow. If drift is large, we shouldn’t. Statistical methods, based on similar calculations to Scott’s webinar, can quantify the range of drift where borrowing will be advantageous (in our doripenem example, we might get something like “if the true rate for doripenem is between 80% and 86%, then borrowing is valuable”). But we are still stuck not knowing if the true doripenem rate actually falls in that range.
Thus, sponsors must assess the likelihood that the drift will be small enough. In our example we must ask, “What is the chance the true rate is between 80% and 86%?” While we can never be certain, this discussion can be guided by the similarity of the current study to previous historical studies. Were they conducted at similar times? Are the patient populations comparable, both in terms of inclusion/exclusion and any other relevant factors? Real world evidence is governed by the same principles but has the added challenge that the data are often non-randomized.
Whether or not to borrow then becomes a sponsor decision. Long-run performance should be considered. If you need historical rates of 80-86% for good performance in the doripenem trial and only feel confident of 78-88%, borrowing may still be worthwhile. If you borrow historical data repeatedly, you will make better decisions for your portfolio as a whole, even if some individual trials are worsened.
How much control do you have over benefits and risks?
The sponsor can choose how aggressively to borrow. This can go anywhere from “completely” (just run a single arm trial and assume the historical data is exactly right) to “none” (just ignore the historical data). The more aggressively you borrow, the larger the benefits to borrowing, but the range of benefit gets smaller. Returning to doripenem, if you did a calculation and got a benefit for borrowing as long as the true rate is between 80-86%, you could borrow less aggressively and get a smaller benefit over a larger range (this might make you more comfortable if you thought 78-88% was more likely for the current trial). You could also borrow more aggressively and get greater benefit, but in a much narrower range (for example 82-84%). Any increased benefit in the narrow range of small drift also comes with greater risks for large drift. If you borrow aggressively, you might get great benefits from 82-84%, but the risks will be elevated for all true doripenem rates outside that range. Unless you are sure of 82-84%, this is a very risky strategy (don’t do this).
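The trade-off between the size of the benefit and the width of the beneficial range can be sketched numerically. The example below is a simplification under stated assumptions: it judges “benefit” by the mean squared error (MSE) of the control-rate estimate instead of type I error and power, uses a fixed weight a0 on the historical data, and plugs in hypothetical numbers (100 current controls, 200 historical controls with an 83% cure rate).

```python
def mse(p_true, n, p_hist, n_hist, a0):
    """MSE of the fixed-weight control estimate (a0*x_hist + X) / (a0*n_hist + n).

    X ~ Binomial(n, p_true) is the current control data; the historical
    result is treated as fixed at p_hist. a0 = 0 means no borrowing.
    """
    w = n / (n + a0 * n_hist)            # weight on the current-trial data
    variance = w**2 * p_true * (1 - p_true) / n
    bias = (1 - w) * (p_hist - p_true)   # pull toward the historical rate
    return variance + bias**2

def beneficial_range(n, p_hist, n_hist, a0, step=0.001):
    """Lowest and highest true rate (on a grid) where borrowing lowers MSE.

    Requires a0 > 0; with a0 = 0 there is nothing to compare.
    """
    grid = [i * step for i in range(1, int(round(1 / step)))]
    good = [p for p in grid
            if mse(p, n, p_hist, n_hist, a0) < mse(p, n, p_hist, n_hist, 0.0)]
    return min(good), max(good)

n, p_hist, n_hist = 100, 0.83, 200       # hypothetical design values
ranges, gains = {}, {}
for a0 in (0.25, 0.5, 1.0):              # increasingly aggressive borrowing
    ranges[a0] = beneficial_range(n, p_hist, n_hist, a0)
    # Ratio of MSEs at zero drift: smaller means a bigger benefit.
    gains[a0] = mse(p_hist, n, p_hist, n_hist, a0) / mse(p_hist, n, p_hist, n_hist, 0.0)
    lo, hi = ranges[a0]
    print(f"a0={a0}: beneficial for true rates {lo:.3f} to {hi:.3f}, "
          f"MSE ratio at zero drift {gains[a0]:.2f}")
```

As a0 grows, the MSE ratio at zero drift shrinks (a larger benefit) while the interval of true rates that still benefits narrows, which is the pattern described above. The specific endpoints depend entirely on the assumed sample sizes and on using MSE as the yardstick.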
This latter point is why I’m very reluctant to use single arm trials unless absolutely necessary. Without any current control patients, you typically have a huge benefit to borrowing, but it’s typically in an exceptionally narrow range. And most people running single arm trials don’t quantify that range. I think if they did, they would be surprised and very worried. Even enrolling a modest number of controls significantly mitigates the risks in that setting.
There is a lot of good work going on in historical borrowing right now. In addition to Scott’s webinar, Satrajit Roychoudhury from Pfizer will be speaking in the Bayesian KOL webinar series on November 22.
If you miss it, slides should be posted at
after the webinar.
There should also be a TransCelerate paper coming out soon (it is working through the publication process after acceptance) that discusses in great detail which studies might be ideal candidates for borrowing. I’ll try to update this post with a link when that occurs.
Below is a very abbreviated set of references. The first few are methodological; the last is a review article from the DIA Bayesian working group on borrowing information. Again, apologies to the many I missed!
Pocock SJ. The combination of randomized and historical controls in clinical trials. Journal of Chronic Diseases 1976;29;175-188.
Hobbs BP, Carlin BP, Sargent DJ. Commensurate prior for incorporating historical information in clinical trials using general and generalized linear models. Bayesian Analysis 2012;7;1-36.
Ibrahim JG, Chen MH. Power prior distributions for regression models. Statistical Science 2000;15;46-60.
Neuenschwander B, Capkun-Niggli G, Branson M, Spiegelhalter D. Summarizing historical information on controls in clinical trials. Clinical Trials 2010;7;5-18.
Chen MH, Ibrahim JG. The relationship between the power prior and hierarchical models. Bayesian Analysis 2006;1;551-574.
Lim J, Walley R, Yuan J, Liu J, Dabral A, Best N, Grieve A, Hampson L, Wolfram J, Woodward P, Yong F, Zhang X, Bowen E. Minimizing patient burden through the use of historical subject-level data in innovative confirmatory clinical trials: review of methods and opportunities. Therapeutic Innovation and Regulatory Science 2018;52;
Viele K, Berry S, Neuenschwander B, Amzal B, Chen F, Enas N, Hobbs B, Ibrahim J, Kinnersley N, Lindborg S, Micallef S, Roychoudhury S, Thompson L. Use of historical control data for assessing treatment effects in clinical trials. Pharmaceutical Statistics 2014;13;41-54.