Simply Statistics: A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

The Supreme Court's interpretation of statistical correlation may determine the future of personalized medicine

Summary/Background

The Supreme Court heard oral arguments last week in the case Mayo Collaborative Services v. Prometheus Laboratories (No. 10-1150). At issue is a patent held by Prometheus Laboratories for making decisions about the treatment of disease on the basis of a measurement of a specific, naturally occurring molecule and a corresponding calculation. The specific language at issue is a little technical, but the key claim from the patent under dispute is:

  1. A method of optimizing therapeutic efficacy for treatment of an immune-mediated gastrointestinal disorder, comprising:

(a) administering a drug providing 6-thioguanine to a subject having said immune-mediated gastrointestinal disorder; and

(b) determining the level of 6-thioguanine in said subject having said immune-mediated gastrointestinal disorder,

wherein the level of 6-thioguanine less than about 230 pmol per 8x10^8 red blood cells indicates a need to increase the amount of said drug subsequently administered to said subject and

wherein the level of 6-thioguanine greater than about 400 pmol per 8x10^8 red blood cells indicates a need to decrease the amount of said drug subsequently administered to said subject.
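Read as an algorithm, the claim is a two-threshold decision rule. The minimal Python sketch below illustrates that reading; the function name `dose_adjustment` and the returned labels are hypothetical, and the sketch treats "about 230" and "about 400" as exact cutoffs, which the claim itself does not do.

```python
# Thresholds from the claim, in pmol of 6-thioguanine per 8x10^8 red blood cells.
# Assumption: the claim's "about" is treated here as an exact cutoff.
LOW_THRESHOLD = 230.0
HIGH_THRESHOLD = 400.0


def dose_adjustment(level_pmol: float) -> str:
    """Map a measured 6-thioguanine level to the adjustment the claim describes.

    Hypothetical helper for illustration only. The claim says nothing about
    levels between the two thresholds, so those are labelled "no change indicated".
    """
    if level_pmol < LOW_THRESHOLD:
        return "increase"
    if level_pmol > HIGH_THRESHOLD:
        return "decrease"
    return "no change indicated"


for level in (150.0, 300.0, 450.0):
    print(level, dose_adjustment(level))
```

Beyond administering the drug and measuring the metabolite, the whole "method" in this reading is the two comparisons inside `dose_adjustment`.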

So, basically, the patent covers a treatment decision made on the basis of a statistical correlation: when the level of a specific molecule (6-thioguanine) is too high, the dose of a drug (thiopurine) should be decreased; when it is too low, the dose should be increased. Here (and throughout the post) “correlation” is interpreted loosely as a relationship between two variables, rather than in the strict sense of a linear relationship between two quantitative variables.
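The difference between the loose and strict senses is not just pedantic. In the sketch below, with entirely made-up data (not from the patent or the 1996 paper), patients are assumed to respond well only when the level falls inside a 230 to 400 window. The outcome is completely determined by the level, yet Pearson's coefficient, the strict linear measure, comes out as zero because the levels are symmetric around the middle of the window.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson (linear) correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical metabolite levels, evenly spaced and symmetric around 315.
levels = [float(v) for v in range(130, 501)]
# Hypothetical outcome: 1 = good response, only inside the 230-400 window.
response = [1.0 if 230 <= v <= 400 else 0.0 for v in levels]

r = pearson(levels, response)
print(f"Pearson r = {r:.3f}")  # essentially zero despite a deterministic relationship
```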

This correlation between levels of 6-thioguanine and patient response was first reported by a group of academics in a 1996 paper. Prometheus developed a diagnostic test based on this correlation. Doctors (including those at the Mayo Clinic) would draw blood and send it to Prometheus, which would measure the level of 6-thioguanine and report it back.

According to Mayo’s brief, some doctors at Mayo who used this test decided it could be improved. So they developed their own diagnostic test, based on a different measurement of 6-thioguanine (6-TGN), which reported different information, including:

  • A blood reading greater than 235 picomoles of 6-TGN is a “target therapeutic range,” and a reading greater than 250 picomoles of 6-TGN is associated with remission in adult patients; and
  • A blood reading greater than 450 picomoles of 6-TGN indicates possible adverse health effects, but in some instances levels over 700 are associated with remission without significant toxicity, while a “clearly defined toxic level” has not been established; and
  • A blood reading greater than 5700 picomoles of 6-MMP (6-methylmercaptopurine, another thiopurine metabolite) is possibly toxic to the liver.

Mayo then began marketing its own proprietary test, at which point Prometheus sued the Mayo Clinic for infringement. The most recent decision in the case came from the Court of Appeals for the Federal Circuit, which upheld Prometheus’ claim. A useful summary is here.

Each side frames the question in its brief. For Mayo:

Whether 35 U.S.C. § 101 is satisfied by a patent claim that covers observed correlations between blood test results and patient health, so that the patent effectively preempts use of the naturally occurring correlations, simply because well-known methods used to administer prescription drugs and test blood may involve “transformations” of body chemistry.

and for Prometheus:

Whether the Federal Circuit correctly held that concrete methods for improving the treatment of patients suffering from autoimmune diseases by using individualized metabolite measurements to inform the calibration of the patient’s dosages of synthetic thiopurines are patentable processes under 35 U.S.C. §101.

Basically, Prometheus claims that the patent covers cases where doctors observe a specific data point and make a decision about a specific drug on the basis of that data point and a known correlation with patient outcomes. Mayo, on the other hand, says that since the correlation between the data and the outcome is a naturally occurring phenomenon, it cannot be patented.

In the oral arguments, the attorney for Mayo argues that the test would be patentable only if Prometheus specified a particular level of 6-thioguanine and a particular treatment associated with that level (see pages 21-24 of the transcript). He then goes on to suggest that Mayo would then be free to pick another level and another treatment option for its own diagnostic test. Justice Breyer appears to disagree even with this narrower option (see page 38 of the transcript and his fertilizer example). He has made this view known before, in his dissent from the dismissal of the writ of certiorari in Labcorp (a very similar case focusing on whether a correlation can be patented).

Brief summary: Prometheus is trying to patent a correlation between a molecule’s level and treatment decisions. Mayo is claiming this is a natural process and can’t be patented.

Implications for Personalized Medicine (a statistician’s perspective)

I believe this case has major potential consequences for the entire field of personalized medicine. The fundamental idea of personalized medicine is that treatment decisions for individual patients will be tailored on the basis of data collected about them and statistical calculations made from those data (e.g. correlations, or more complicated statistical functions).

According to my interpretation, if the Supreme Court rules in favor of Mayo in a broad sense, then this suggests that decisions about treatment made on the basis of data and correlation are not broadly patentable. In both the Labcorp dissent and the oral arguments for the Prometheus case, Justice Breyer argues that the process described by the patents:

…instructs the user to (1) obtain test results and (2) think about them.

He suggests that these are natural correlations and hence cannot be patented, just as a formula like E = mc^2 cannot be patented. The distinction seems subtle: whereas E = mc^2 is a formula that exactly describes a property of nature, an observed correlation is an empirical estimate of a parameter, calculated from noisy data.
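A small simulation makes the "empirical estimate from noisy data" point concrete. It is a sketch under assumed conditions (a linear relationship with slope 0.5, standard normal noise, samples of 30 patients), not a model of the 6-thioguanine data: the true correlation is fixed at about 0.447, but every sample yields a different estimate.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def sample_correlation(rng, n=30, slope=0.5, noise_sd=1.0):
    """Draw n noisy (x, y) pairs from y = slope * x + noise and return the estimated r."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [slope * xi + rng.gauss(0.0, noise_sd) for xi in x]
    return pearson(x, y)


rng = random.Random(2011)  # fixed seed so the run is reproducible
# Population correlation for x ~ N(0, 1), y = 0.5 x + N(0, 1) noise.
true_rho = 0.5 / math.sqrt(0.5 ** 2 + 1.0 ** 2)
estimates = [sample_correlation(rng) for _ in range(200)]

print(f"true correlation: {true_rho:.3f}")
print(f"estimates range from {min(estimates):.2f} to {max(estimates):.2f}")
print(f"mean estimate:    {sum(estimates) / len(estimates):.3f}")
```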

From a statistical perspective, there is little difference between calculating a correlation and calculating something more complicated, like the Oncotype DX signature. Both return a score that can be used to determine treatment or other health care decisions. In some sense, they are both “natural phenomena”; one is just more complicated to calculate than the other. So it is not surprising that Genomic Health, the developer of Oncotype DX, has filed an amicus brief in favor of Prometheus.

Once a score is calculated, regardless of how complicated that calculation is, the personalized decision still comes down to a decision made by a doctor on the basis of a number. So if the court broadly decides in favor of Mayo, from a statistical perspective this would seemingly rule out patenting any personalized medicine decision made by observing data and making a calculation.
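The argument of the last two paragraphs can be written down as code. In the sketch below, the marker names, weights and cutoff are made up and this is explicitly not the Oncotype DX algorithm; a one-marker score and a weighted multi-marker score both feed the same final step, comparing a single number with a cutoff.

```python
from typing import Mapping

# Hypothetical weights for a multi-marker score; NOT the Oncotype DX algorithm.
MULTI_MARKER_WEIGHTS = {"marker_a": 0.8, "marker_b": -0.3, "marker_c": 0.5}


def single_marker_score(measurements: Mapping[str, float]) -> float:
    """Simplest possible score: one measured level, used as-is."""
    return measurements["marker_a"]


def multi_marker_score(measurements: Mapping[str, float]) -> float:
    """A weighted sum over several markers, using the hypothetical weights above."""
    return sum(w * measurements[name] for name, w in MULTI_MARKER_WEIGHTS.items())


def decide(score: float, cutoff: float) -> str:
    """However the score was computed, the decision compares one number to a cutoff."""
    return "treat" if score > cutoff else "do not treat"


patient = {"marker_a": 2.0, "marker_b": 1.0, "marker_c": 1.0}  # hypothetical patient
print(decide(single_marker_score(patient), cutoff=1.5))
print(decide(multi_marker_score(patient), cutoff=1.5))
```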

Unlike traditional medical procedures such as surgery or treatment with a drug, these procedures are based on data and statistics. But in the same way, a very specific set of operations and decisions is carried out with the goal of improving patient health. If these procedures are broadly ruled to be simply “natural phenomena”, that suggests the development of personalized decision-making strategies is not, itself, patentable. The decision would also have implications for other companies that use data and statistics to make money, like the software giant SAP, which has also filed an amicus brief in support of the Federal Circuit’s opinion (and hence Prometheus).

A large share of medical treatment decisions in the future will likely be made on the basis of data and statistical calculations on those data; that is the goal of personalized medicine. So the Supreme Court’s decision about the patentability of correlation has seemingly huge implications for any decision made on the basis of data and statistical calculations. Regardless of the outcome, this case lends further weight to the idea that statistical literacy is critical, including for Supreme Court justices.

Simply Statistics will be following this case closely; look for more in-depth analysis in future blog posts.

Interview w/ Mario Marazzi, Puerto Rico Institute of Statistics Director, on the importance of Government Statisticians

[Scroll down for the Spanish translation / Desplace hacia abajo para la traducción al español]

In my opinion, the importance of government statisticians is underappreciated. In the US, agencies such as the CDC, the Census Bureau, and the Bureau of Labor Statistics employ statisticians to help collect and analyze data that contribute to important policy decisions. How many students will enroll in public schools this year? Is there a type II diabetes epidemic? Is unemployment rising? How many homeless people are in Los Angeles? The answers to these questions can guide policy and spending decisions and they can’t be answered without the help of the government statisticians that collect and analyze relevant data.

Until recently, the Puerto Rican government had no formal mechanisms for collecting data. Puerto Rico, an unincorporated territory of the United States, has many serious economic and social problems. With a very high murder rate, less than 50% of the working-age population in the labor force, an economy that continues to worsen after 5 years of recession, and a substantial traffic problem, Puerto Rico can certainly benefit from sound government statistics to better guide policy-making. Better measurement, information and knowledge can only improve the situation.

In 2007, the Puerto Rico Institute of Statistics was founded. Mario Marazzi, who obtained his PhD in Economics from Cornell University, left a prestigious job at the Federal Reserve to become the first Executive Director of the Institute.  Given the complicated political landscape in Puerto Rico, Mario made an admirable sacrifice for his home country. He was kind enough to answer some questions for Simply Statistics:

What is the biggest success story of the Institute?

I would say that our biggest success story has been to revive the idea that high-quality statistics are critical for the success of any organization in Puerto Rico.  For too long, statistics were neglected and even abused in Puerto Rico.  There is now a palpable sense in Puerto Rico that it is important to devote resources and time to ensure that data are produced with care.

We have also undertaken a number of critical statistical projects since our inauguration in 2007. For instance, the Institute completed the revision of Puerto Rico’s Consumer Price Index, after identifying that official inflation had been overestimated by more than a factor of two for 15 years. The Institute revised Puerto Rico’s Mortality Statistics, after detecting the use of an inconsistent selection methodology for the cause of death, as well as discovering thousands of deaths that had not previously been included in the official data. We also undertook Puerto Rico’s first-ever Science and Technology Survey, which allowed us to measure the economic impact of Research and Development activities in Puerto Rico.

What discovery, made from collecting data in Puerto Rico, has most surprised you?

We performed a study on migration patterns during the last decade. From anecdotal evidence, it was fairly clear that in the last five years there had been an elevated level of migration out of Puerto Rico. Nevertheless, the data revealed a few stunning conclusions. For five consecutive years, about 1 percent of Puerto Rico’s population simply left Puerto Rico every year, even after taking into account the people who migrated to Puerto Rico. The demographic consequences were significant: migration had been accelerating the aging of Puerto Rico’s population, and the people who left Puerto Rico had a higher level of educational attainment than those who arrived. In fact, for the first time in recorded history, Puerto Rico’s population actually declined between the 2000 and 2010 Censuses. Although fertility rates have also been declining, it is now clear that migration was the main cause of the overall population decrease.

Are government agencies usually willing to cooperate with the Institute? If not, what resources does the Institute have available to make them comply?

Frequently, statistical functions are not very high on policymakers’ lists of priorities.  As a result, government statisticians are usually content to collaborate with the Institute, since we can bring resources to help solve the common problems they face.

At times, some agencies can be reluctant to undertake the changes needed to produce high-quality statistics.  In these instances, the Institute is endowed with the authority by law to move the process along, through statistical policy mandates approved by the Board of Directors of the Institute. 

If there is a particular agency that excels at collecting and sharing data, can others learn from them?

Definitely. We encourage agencies to share their best practices with one another. To facilitate this process, the Institute has the responsibility of organizing the Puerto Rico Statistical Coordination Committee, where representatives from each agency can share practical experiences and enhance interagency coordination.

Do you think Puerto Rico needs more statisticians?

Absolutely. Some of our brightest minds in statistics work outside of Puerto Rico, both in universities and in the Federal Government. Puerto Rico needs an injection of human resources to bring its statistical system up to global standards.

What can academic statisticians do to help institutes such as yours?

Academic statisticians are instrumental to furthering the mission of the Institute.  Governments produce statistics in a wide array of disciplines.  Each area can have very specific and unique methodologies.  It is impossible for one to be an expert in every methodology. 

As a result, the Institute depends on the collaboration of academic statisticians that can bring to bear their expertise in specific fields.  For example, academic biostatisticians can help identify needed improvements to existing methodologies in health statistics.  Index theorists can train government statisticians in the latest index methodologies.  Computational statisticians can analyze large data sets to help us explain the otherwise unexplained behavior of the data. 

We also host several Puerto Rico datasets on the Institute’s website, which were provided by professors from a number of different fields.  


Interview with Mario Marazzi (Spanish version)

En mi opinión, la importancia de los estadísticos que trabajan para el gobierno se subestima. En los EEUU, agencias como el Center for Disease Control, el Census Bureau y el Bureau of Labor Statistics emplean estadísticos para ayudar a recopilar y analizar datos que contribuyen a importantes decisiones de política pública. Por ejemplo, ¿cuántos estudiantes se matricularán en las escuelas públicas este año? ¿Hay una epidemia de diabetes tipo II? ¿El desempleo está aumentando? ¿Cuántos deambulantes viven en Los Ángeles? Las respuestas a estas preguntas ayudan a determinar las decisiones presupuestarias y de política pública y no se pueden contestar sin la ayuda de los estadísticos del gobierno que recogen y analizan los datos pertinentes.

Hasta hace poco, el gobierno de Puerto Rico no tenía mecanismos formales de recolección de datos. Puerto Rico, un territorio no incorporado de Estados Unidos, tiene muchos problemas socioeconómicos. Con una tasa de asesinatos muy alta, menos de 50% de la población con edad de trabajar en la fuerza laboral, una economía que sigue empeorando después de 5 años de recesión y problemas serios de tráfico, Puerto Rico se beneficiaría de estadísticas gubernamentales de alta calidad para guiar mejor la formulación de política pública. Mejores medidas, información y conocimientos sólo pueden mejorar la situación.

En 2007, se inauguró el Instituto de Estadísticas de Puerto Rico. Mario Marazzi, quien obtuvo su doctorado en Economía de la Universidad de Cornell, dejó un trabajo prestigioso en la Reserva Federal para convertirse en el primer Director Ejecutivo del Instituto.

Tomando en cuenta el complicado panorama político en Puerto Rico, Mario hizo un sacrificio admirable por su país y cordialmente aceptó contestar unas preguntas para nuestro blog:

¿Cuál ha sido el mayor éxito del Instituto?

Yo diría que nuestro mayor éxito ha sido revivir la idea de que las estadísticas de alta calidad son cruciales para el éxito de cualquier organización en Puerto Rico.  Por mucho tiempo, las estadísticas fueron descuidadas e incluso abusadas en Puerto Rico. En la actualidad existe una sensación palpable en Puerto Rico que es importante dedicar recursos y tiempo para asegurarse de que los datos se produzcan con cuidado.

También, desde nuestra inauguración en 2007, hemos realizado una serie de proyectos críticos de estadística. Por ejemplo, el Instituto concluyó la revisión del Índice de Precios al Consumidor de Puerto Rico, después de identificar que la inflación oficial había sido sobreestimada por más del doble durante 15 años. El Instituto revisó las Estadísticas de Mortalidad de Puerto Rico, después de detectar el uso de una metodología de selección inconsistente para determinar la causa de muerte y tras descubrir miles de muertes que no habían sido incluidas en los datos oficiales. Además, realizamos la primera Encuesta de Ciencia y Tecnología de Puerto Rico, que nos permitió medir el impacto económico de las actividades de investigación y desarrollo en Puerto Rico.

¿Cuál descubrimiento, realizado a partir de la recopilación de datos en Puerto Rico, más te ha sorprendido?

Nosotros realizamos un estudio sobre los patrones de migración durante la última década. A partir de la evidencia anecdótica, era bastante claro que durante los últimos cinco años ha habido un nivel elevado de emigración de Puerto Rico. Sin embargo, los datos revelaron algunas conclusiones sorprendentes. Durante cinco años consecutivos, alrededor del 1 por ciento de la población de Puerto Rico se ha ido de Puerto Rico todos los años, incluso después de tomar en cuenta la gente que inmigró a Puerto Rico. Las consecuencias demográficas eran importantes: la migración ha acelerado el envejecimiento de la población de Puerto Rico y las personas que se fueron de Puerto Rico tienen un mayor nivel de preparación escolar que los que llegaron. De hecho, por primera vez en la historia, la población de Puerto Rico disminuyó entre el Censo de 2000 y el del 2010. A pesar de tasas de fecundidad que disminuyen, ahora está claro que la migración es la causa principal de la reducción de población.

¿Por lo general, las agencias gubernamentales están dispuestas a cooperar con el Instituto?  Si no, ¿qué recursos tiene disponible el Instituto para obligarlos?

Frecuentemente, las estadísticas no aparecen muy altas en las listas de prioridades de los políticos. Como resultado, los estadísticos del gobierno por lo general están contentos de colaborar con el Instituto, ya que nosotros podemos aportar recursos para ayudar a resolver los problemas comunes a que se enfrentan.

A veces, algunas agencias pueden mostrarse reacias a emprender los cambios necesarios para producir estadísticas de alta calidad. En estos casos, el Instituto posee la autoridad legal de acelerar el proceso, a través de mandatos aprobados por el Consejo de Administración del Instituto.

Si hay un organismo en particular que se destaca en la recopilación y el intercambio de datos, ¿otros pueden aprender de ellos?

Definitivamente.  Nosotros animamos a las agencias a compartir sus mejores prácticas con otros. Para facilitar este proceso, el Instituto tiene la responsabilidad de organizar el Comité de Coordinación Estadística de Puerto Rico, donde representantes de cada agencia pueden compartir experiencias prácticas y mejorar la coordinación interinstitucional.

¿Cree usted que Puerto Rico necesita más estadísticos?

Por supuesto. Algunas de nuestras mentes más brillantes en estadísticas trabajan fuera de Puerto Rico, tanto en las universidades como en el Gobierno Federal. Puerto Rico necesita una inyección de recursos humanos para que su sistema estadístico llegue a los estándares mundiales.

¿Qué pueden hacer los estadísticos académicos para ayudar a instituciones como la suya?

Los estadísticos académicos son fundamentales para promover la misión del Instituto. Los gobiernos generan las estadísticas en una amplia gama de disciplinas. Cada área puede tener metodologías muy específicas y únicas. Es imposible que uno sea un experto en cada metodología.

Como resultado, el Instituto cuenta con la colaboración de estadísticos académicos que pueden ejercer sus conocimientos en campos específicos. Por ejemplo, los bioestadísticos académicos pueden ayudar a identificar las mejoras necesarias a las metodologías existentes en el contexto de la salud pública. Los “Index theorists” pueden entrenar a los estadísticos del gobierno en las últimas metodologías de índice. Los estadísticos computacionales pueden analizar grandes “datasets” que nos ayuden a explicar comportamientos de los datos que de otra manera no tendrían explicación.

También organizamos varios datasets de Puerto Rico en la página web del Instituto, que fueron proporcionados por profesores en varios campos diferentes.

Plotting BeijingAir Data

Here’s a bit of R code for scraping the BeijingAir Twitter feed and plotting the hourly PM2.5 values. The script defaults to the past 24 hours, but you can change that simply by modifying the value of the variable ‘n’.
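The script itself is in R and uses the twitteR package. As a sketch of just the parsing step, here is a Python version that assumes each tweet's text is a semicolon-separated line with the PM2.5 concentration in the third field. That format, the function names and the sample tweets are assumptions for illustration, not taken from the script; the parameter `n` plays the role of the script's ‘n’.

```python
from typing import Iterable, List, Optional


def parse_pm25(tweet_text: str) -> Optional[float]:
    """Extract the PM2.5 concentration from one tweet, or None if absent.

    ASSUMED format (hypothetical, check against the real feed):
    "MM-DD-YYYY HH:MM; PM2.5; <concentration>; <AQI>; <category>"
    """
    fields = [f.strip() for f in tweet_text.split(";")]
    if len(fields) < 3 or fields[1] != "PM2.5":
        return None
    try:
        return float(fields[2])
    except ValueError:
        return None  # e.g. a missing-data placeholder instead of a number


def last_n_hourly(tweets: Iterable[str], n: int = 24) -> List[float]:
    """PM2.5 values from the first n parseable tweets (newest first, as a timeline lists them)."""
    values: List[float] = []
    for text in tweets:
        if len(values) >= n:
            break
        value = parse_pm25(text)
        if value is not None:
            values.append(value)
    return values


# Hypothetical tweets in the assumed format.
sample = [
    "12-05-2011 14:00; PM2.5; 255.0; 305; Hazardous",
    "12-05-2011 13:00; PM2.5; No Data; No Data; No Data",
    "12-05-2011 12:00; PM2.5; 240.0; 290; Very Unhealthy",
]
print(last_n_hourly(sample, n=24))  # [255.0, 240.0]
```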

You can just grab the code from this R script. Note that you need to use the latest version of the ‘twitteR’ package because the data structure has changed from previous versions.

Using a modified version of the code in the script, I made a plot of the 24-hour average PM2.5 levels in Beijing over the last 2 months or so. The dashed line shows the US national ambient air quality standard for 24-hour average PM2.5. Note that the plot below shows 24-hour averages, so it is comparable to the US standard; it also looks (somewhat) less extreme than the hourly values.
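Why the daily averages look less extreme can be seen with a toy series. The sketch below uses hypothetical hourly values and non-overlapping 24-hour blocks (the modified script may average differently, e.g. with a rolling window) and compares each day's mean with the 35 µg/m³ level of the US 24-hour PM2.5 standard. Comparing single days with that level is a simplification; the regulatory standard is assessed on the 98th percentile averaged over three years.

```python
from typing import List, Sequence

# Level of the US 24-hour PM2.5 standard (2006 revision), in micrograms per cubic metre.
US_24H_PM25_STANDARD = 35.0


def daily_averages(hourly: Sequence[float], hours_per_day: int = 24) -> List[float]:
    """Average consecutive non-overlapping 24-hour blocks; drop an incomplete final block."""
    full_days = len(hourly) // hours_per_day
    return [
        sum(hourly[d * hours_per_day:(d + 1) * hours_per_day]) / hours_per_day
        for d in range(full_days)
    ]


def days_over_standard(hourly: Sequence[float]) -> int:
    """Count full days whose 24-hour average exceeds the standard's level."""
    return sum(1 for avg in daily_averages(hourly) if avg > US_24H_PM25_STANDARD)


# Hypothetical hourly series: a clean day, then a day with a 6-hour spike.
day1 = [20.0] * 24
day2 = [30.0] * 18 + [300.0] * 6
hourly = day1 + day2
print(daily_averages(hourly))      # [20.0, 97.5]
print(max(hourly))                 # 300.0: the hourly peak is far above the daily mean
print(days_over_standard(hourly))  # 1
```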

Clean Air A 'Luxury' In Beijing's Pollution Zone

Outrage Grows Over Air Pollution and China’s Response
