Blog Archive

Tuesday, June 27, 2023

06-27-2023-1534 - falsifiability, reproducibility, definitions, less risky predictions, preference, opinion, vienna circle, paradigm shift, special considerations, testability, counterexamples, superdeterminism, dogma, falsificationism, skepticism, etc. draft

https://www.digital-science.com/blog/2019/09/new-report-on-falsifiability-and-reproducibility-in-scientific-research/

Pair of black swans swimming
Here are two black swans, but even with no black swans to possibly falsify it, "All swans are white" would still be shown falsifiable by "Here is a black swan"—a black swan would still be a state of affairs, only an imaginary one.[A]

Falsifiability is a deductive standard of evaluation of scientific theories and hypotheses, introduced by the philosopher of science Karl Popper in his book The Logic of Scientific Discovery (1934).[B] A theory or hypothesis is falsifiable (or refutable) if it can be logically contradicted by an empirical test.

Popper proposed falsifiability as the cornerstone solution to both the problem of induction and the problem of demarcation. He insisted that, as a logical criterion, falsifiability is distinct from the related concept "capacity to be proven wrong" discussed in Lakatos' falsificationism.[C][D][E] Though falsifiability is a logical criterion, its purpose is to make the theory predictive and testable, and thus useful in practice.

https://en.wikipedia.org/wiki/Falsifiability

The problem of induction and demarcation

One of the questions in the scientific method is: how does one move from observations to scientific laws? This is the problem of induction. Suppose we want to put the hypothesis that all swans are white to the test. We come across a white swan. We cannot validly argue (or induce) from "here is a white swan" to "all swans are white"; doing so would require a logical fallacy such as, for example, affirming the consequent.[4]

Popper's idea to solve this problem is that while it is impossible to verify that every swan is white, finding a single black swan shows that not every swan is white. We might tentatively accept the proposal that every swan is white, while looking out for examples of non-white swans that would show our conjecture to be false. Falsification uses the valid inference modus tollens: if from a law L we logically deduce the observation Q, but what is observed is ¬Q, we infer that the law L is false. For example, given the statement "all swans are white", we can deduce "the specific swan here is white", but if what is observed is "the specific swan here is not white" (say black), then "all swans are white" is false. More accurately, the statement that can be deduced is broken into an initial condition and a prediction, as in C → P, in which C is "the thing here is a swan" and P is "the thing here is a white swan". If what is observed is C being true while P is false (formally, C ∧ ¬P), we can infer that the law is false.
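To make the logic concrete, here is a minimal sketch in Python (a toy illustration with invented observations, not anything from Popper or the article) of falsification by modus tollens: a single counterexample refutes the universal law, while white swans only leave it unrefuted.

    # Toy falsification check for the universal law "all swans are white",
    # run against a list of observation records (invented data).
    observations = [
        {"is_swan": True, "color": "white"},
        {"is_swan": True, "color": "white"},
        {"is_swan": True, "color": "black"},  # C true, P false
    ]

    def law_is_falsified(obs):
        """Return True if some observation satisfies C ("this is a swan")
        while violating P ("this swan is white")."""
        return any(o["is_swan"] and o["color"] != "white" for o in obs)

    print(law_is_falsified(observations))  # True: the law is refuted

Removing the black swan would print False, which corroborates the law but never proves it; that asymmetry is the point of the criterion.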

For Popper, induction is actually never needed in science.[J][K] Instead, in Popper's view, laws are conjectured in a non-logical manner on the basis of expectations and predispositions.[5] This has led David Miller, a student and collaborator of Popper, to write "the mission is to classify truths, not to certify them".[6] In contrast, the logical empiricism movement, which included such philosophers as Moritz Schlick, Rudolf Carnap, Otto Neurath, and A.J. Ayer wanted to formalize the idea that, for a law to be scientific, it must be possible to argue on the basis of observations either in favor of its truth or its falsity. There was no consensus among these philosophers about how to achieve that, but the thought expressed by Mach's dictum that "where neither confirmation nor refutation is possible, science is not concerned" was accepted as a basic precept of critical reflection about science.[7][8][9]

Popper said that a demarcation criterion was possible, but that we have to use the logical possibility of falsifications, which is falsifiability. He cited his encounter with psychoanalysis in the 1910s: it did not matter what observation was presented, psychoanalysis could explain it. Unfortunately, the reason it could explain everything is that it did not exclude anything either.[L] For Popper, this was a failure, because it meant that it could not make any prediction. From a logical standpoint, finding an observation that does not contradict a law does not mean that the law is true. A verification has no value in itself. But if the law makes risky predictions and these are corroborated, Popper says, there is a reason to prefer this law over another law that makes less risky predictions or no predictions at all.[M][N] In the definition of falsifiability, contradictions with observations are not used to support eventual falsifications, but to support logical "falsifications" that show that the law makes risky predictions, which is completely different.

On the basic philosophical side of this issue, Popper said that some philosophers of the Vienna Circle had mixed two different problems, that of meaning and that of demarcation, and had proposed in verificationism a single solution to both: a statement that could not be verified was considered meaningless. In opposition to this view, Popper said that there are meaningful theories that are not scientific, and that, accordingly, a criterion of meaningfulness does not coincide with a criterion of demarcation.[O] 

https://en.wikipedia.org/wiki/Falsifiability

Basic statements

In Popper's view of science, statements of observation can be analyzed within a logical structure independently of any factual observations.[W][X] The set of all purely logical observations that are considered constitutes the empirical basis. Popper calls them the basic statements or test statements. They are the statements that can be used to show the falsifiability of a theory. Popper says that basic statements do not have to be possible in practice. It is sufficient that they are accepted by convention as belonging to the empirical language, a language that allows intersubjective verifiability: "they must be testable by intersubjective observation (the material requirement)".[23][Y] See the examples in section § Examples of demarcation and applications.

https://en.wikipedia.org/wiki/Falsifiability

Natural selection

In the 5th and 6th editions of On the Origin of Species, following a suggestion of Alfred Russel Wallace, Darwin used "Survival of the fittest", an expression first coined by Herbert Spencer, as a synonym for "Natural Selection".[AK] Popper and others said that, if one uses the most widely accepted definition of "fitness" in modern biology (see subsection § Evolution), namely reproductive success itself, the expression "survival of the fittest" is a tautology.[AL][AM][AN]

The geneticist and statistician Ronald Fisher worked out mathematical theorems to help answer questions regarding natural selection. But, for Popper and others, there is no (falsifiable) law of Natural Selection in this, because these tools apply only to some rare traits.[AO][AP] Instead, for Popper, the work of Fisher and others on Natural Selection is part of an important and successful metaphysical research program.[43]

Mathematics

Popper said that not all unfalsifiable statements are useless in science. Mathematical statements are good examples. Like all formal sciences, mathematics is not concerned with the validity of theories based on observations in the empirical world, but rather, mathematics is occupied with the theoretical, abstract study of such topics as quantity, structure, space and change. Methods of the mathematical sciences are, however, applied in constructing and testing scientific models dealing with observable reality. Albert Einstein wrote, "One reason why mathematics enjoys special esteem, above all other sciences, is that its laws are absolutely certain and indisputable, while those of other sciences are to some extent debatable and in constant danger of being overthrown by newly discovered facts."[44] 

https://en.wikipedia.org/wiki/Falsifiability

Lakatos' falsificationism

Imre Lakatos divided the problems of falsification into two categories. The first category corresponds to decisions that must be agreed upon by scientists before they can falsify a theory. The other category emerges when one tries to use falsifications and corroborations to explain progress in science. Lakatos described four kinds of falsificationism in view of how they address these problems. Dogmatic falsificationism ignores both types of problems. Methodological falsificationism addresses the first type by accepting that decisions must be taken by scientists. Naive methodological falsificationism, or naive falsificationism, does nothing to address the second type.[62][63] Lakatos used dogmatic and naive falsificationism to explain how Popper's philosophy changed over time, and viewed sophisticated falsificationism as his own improvement on Popper's philosophy, but he also said that Popper sometimes appears as a sophisticated falsificationist.[64] Popper responded that Lakatos misrepresented his intellectual history with these terminological distinctions.[65]

https://en.wikipedia.org/wiki/Falsifiability

Controversies

Methodless creativity versus inductive methodology

As described in section § Naive falsificationism, Lakatos and Popper agreed that universal laws cannot be logically deduced (except from laws that say even more). But unlike Popper, Lakatos felt that if the explanation for new laws cannot be deductive, it must be inductive. He urged Popper explicitly to adopt some inductive principle[BM] and set himself the task of finding an inductive methodology.[BU] However, the methodology that he found did not offer any exact inductive rules. In a response to Kuhn, Feyerabend, and Musgrave, Lakatos acknowledged that the methodology depends on the good judgment of the scientists.[BP] Feyerabend wrote in "Against Method" that Lakatos' methodology of scientific research programmes is epistemological anarchism in disguise,[BQ] and Musgrave made a similar comment.[BR] In more recent work, Feyerabend says that Lakatos uses rules, but that whether or not to follow any of these rules is left to the judgment of the scientists.[BS] This is also discussed elsewhere.[BT]

Popper also offered a methodology with rules, but these are not inductive rules either, because they are not by themselves used to accept laws or establish their validity; they do so only through the creativity or "good judgment" of the scientists. For Popper, the required non-deductive component of science never had to be an inductive methodology. He always viewed this component as a creative process beyond the explanatory reach of any rational methodology, yet one used to decide which theories should be studied and applied, to find good problems, and to guess useful conjectures.[BV] Quoting Einstein to support his view, Popper said that this renders obsolete the need for an inductive methodology or a logical path to the laws.[BW][BX][BY] For Popper, no inductive methodology was ever proposed that satisfactorily explains science.

https://en.wikipedia.org/wiki/Falsifiability

Normal science versus revolutionary science

Thomas Kuhn analyzed what he calls periods of normal science as well as revolutions from one period of normal science to another,[84] whereas Popper's view is that only revolutions are relevant.[BZ][CA] For Popper, the role of science, mathematics and metaphysics, actually the role of any knowledge, is to solve puzzles.[CB] In the same line of thought, Kuhn observes that in periods of normal science the scientific theories, which represent some paradigm, are used to routinely solve puzzles and the validity of the paradigm is hardly in question. It is only when important new puzzles emerge that cannot be solved by accepted theories that a revolution might occur. This can be seen as a viewpoint on the distinction made by Popper between the informal and formal process in science (see section § Naive falsificationism). In the big picture presented by Kuhn, the routinely solved puzzles are corroborations. Falsifications or otherwise unexplained observations are unsolved puzzles. All of these are used in the informal process that generates a new kind of theory. Kuhn says that Popper emphasizes formal or logical falsifications and fails to explain how the social and informal process works. 

https://en.wikipedia.org/wiki/Falsifiability

Unfalsifiability versus falsity of astrology

Popper often uses astrology as an example of a pseudoscience. He says that it is not falsifiable because both the theory itself and its predictions are too imprecise.[CC] Kuhn, as a historian of science, remarked that many predictions made by astrologers in the past were quite precise and that they were very often falsified. He also said that astrologers themselves acknowledged these falsifications.[CD]

https://en.wikipedia.org/wiki/Falsifiability

Epistemological anarchism vs the scientific method

Paul Feyerabend rejected any prescriptive methodology at all. He rejected Lakatos' argument for ad hoc hypotheses, arguing that science would not have progressed without making use of any and all available methods to support new theories. He rejected any reliance on a scientific method, along with any special authority for science that might derive from such a method.[85] He said that if one is keen to have a universally valid methodological rule, epistemological anarchism or "anything goes" would be the only candidate.[86] For Feyerabend, any special status that science might have derives from the social and physical value of the results of science rather than from its method.[87]

https://en.wikipedia.org/wiki/Falsifiability

See also

 
https://en.wikipedia.org/wiki/Falsifiability
https://en.wikipedia.org/wiki/Jurisprudence
https://en.wikipedia.org/wiki/Positivism
https://en.wikipedia.org/wiki/Category:Razors_(philosophy)
https://en.wikipedia.org/wiki/Scientific_skepticism
https://en.wikipedia.org/wiki/Superdeterminism
https://en.wikipedia.org/wiki/Falsifiability#Dogmatic_falsificationism
 

Testability is a primary aspect of science[1] and the scientific method. As a property of an empirical hypothesis, it involves two components:

  1. Falsifiability or defeasibility, which means that counterexamples to the hypothesis are logically possible.
  2. The practical feasibility of observing a reproducible series of such counterexamples if they do exist.

In short, a hypothesis is testable if there is a possibility of deciding whether it is true or false based on experimentation by anyone. This allows anyone to decide whether a theory can be supported or refuted by data. However, the interpretation of experimental data may also be inconclusive or uncertain. Karl Popper introduced the idea that scientific knowledge has the property of falsifiability, as published in The Logic of Scientific Discovery.[2]

https://en.wikipedia.org/wiki/Testability
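As a concrete illustration of both components, here is a minimal sketch using Hypothesis, the Python property-based testing library linked further down this post (the claim and the test are my own toy choices): a universal claim is stated as an executable property, and the library searches for a reproducible counterexample.

    from hypothesis import given, strategies as st

    @given(st.integers())
    def test_absolute_value_is_positive(n):
        # The claim "abs(n) > 0 for every integer n" is falsifiable,
        # and in fact false: Hypothesis finds and replays the
        # counterexample n = 0.
        assert abs(n) > 0

    if __name__ == "__main__":
        test_absolute_value_is_positive()  # raises AssertionError, reporting n=0

A claim that cannot be stated as such a property, or whose counterexamples could never practically be observed, fails one of the two components above.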

See also

https://en.wikipedia.org/wiki/Testability

https://www.scribbr.co.uk/category/research-methods/


1.2: Science- Reproducible, Testable, Tentative, Predictive, and Explanatory
 

https://chem.libretexts.org/Bookshelves/Introductory_Chemistry/Chemistry_for_Changing_Times_(Hill_and_McCreary)/01%3A_Chemistry/1.02%3A_Science-_Reproducible_Testable_Tentative_Predictive_and_Explanatory


Learning Objectives
  • Describe the differences between hypothesis and theory as scientific terms.
  • Describe the difference between a theory and scientific law.
  • Identify the components of the scientific method.

Although many have taken science classes throughout their course of studies, incorrect or misleading ideas about some of the most important and basic principles in science are still commonplace. Most students have heard of hypotheses, theories, and laws, but what do these terms really mean? Before you read this section, consider what you have learned about these terms previously, and what they mean to you. When reading, notice if any of the text contradicts what you previously thought. What do you read that supports what you thought?

https://chem.libretexts.org/Bookshelves/Introductory_Chemistry/Chemistry_for_Changing_Times_(Hill_and_McCreary)/01%3A_Chemistry/1.02%3A_Science-_Reproducible_Testable_Tentative_Predictive_and_Explanatory

 

What is a Fact?

A fact is a basic statement established by experiment or observation. All facts are true under the specific conditions of the observation.

What is a Hypothesis?

One of the most common terms used in science classes is a "hypothesis". The word can have many different definitions, dependent on the context in which it is being used:

  • An educated guess: a scientific hypothesis provides a suggested solution based on evidence.
  • Prediction: if you have ever carried out a science experiment, you probably made this type of hypothesis, in which you predicted the outcome of your experiment.
  • Tentative or proposed explanation: hypotheses can be suggestions about why something is observed. In order for a hypothesis to be scientific, a scientist must be able to test the explanation to see if it works, and if it is able to correctly predict what will happen in a situation. For example, "if my hypothesis is correct, I should see _____ result when I perform _____ test."

A hypothesis is tentative; it can be easily changed.

 
https://chem.libretexts.org/Bookshelves/Introductory_Chemistry/Chemistry_for_Changing_Times_(Hill_and_McCreary)/01%3A_Chemistry/1.02%3A_Science-_Reproducible_Testable_Tentative_Predictive_and_Explanatory
 

What is a Theory?

The United States National Academy of Sciences describes a theory as:

"Some scientific explanations are so well established that no new evidence is likely to alter them. The explanation becomes a scientific theory. In everyday language a theory means a hunch or speculation. Not so in science. In science, the word theory refers to a comprehensive explanation of an important feature of nature supported by facts gathered over time. Theories also allow scientists to make predictions about as yet unobserved phenomena."

"A scientific theory is a well-substantiated explanation of some aspect of the natural world, based on a body of facts that have been repeatedly confirmed through observation and experimentation. Such fact-supported theories are not "guesses," but reliable accounts of the real world. The theory of biological evolution is more than "just a theory." It is as factual an explanation of the universe as the atomic theory of matter (stating that everything is made of atoms) or the germ theory of disease (which states that many diseases are caused by germs). Our understanding of gravity is still a work in progress. But the phenomenon of gravity, like evolution, is an accepted fact."

Note some key features of theories that are important to understand from this description:

  • Theories are explanations of natural phenomena. They aren't predictions (although we may use theories to make predictions). They are explanations of why something is observed.
  • Theories aren't likely to change. They have a lot of support and are able to explain many observations satisfactorily. Theories can, indeed, be facts. Theories can change in some instances, but it is a long and difficult process. In order for a theory to change, there must be many observations or evidence that the theory cannot explain.
  • Theories are not guesses. The phrase "just a theory" has no place in science. To be a scientific theory carries a lot of weight; it is not just one person's idea about something.

Theories aren't likely to change.

What is a Law?

Scientific laws are similar to scientific theories in that they are principles that can be used to predict the behavior of the natural world. Both scientific laws and scientific theories are typically well-supported by observations and/or experimental evidence. Usually, scientific laws refer to rules for how nature will behave under certain conditions, frequently written as an equation. Scientific theories are overarching explanations of how nature works, and why it exhibits certain characteristics. As a comparison, theories explain why we observe what we do, and laws describe what happens.

For example, around the year 1800, Jacques Charles and other scientists were working with gases in order to, among other things, improve the design of the hot air balloon. These scientists found, after numerous tests, that certain patterns existed in their observations of gas behavior. If the temperature of the gas increased, the volume of the gas increased. This is known as a natural law. A law is a relationship that exists between variables in a group of data. Laws describe the patterns we see in large amounts of data, but do not describe why the patterns exist.
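As a minimal sketch (with made-up numbers), Charles's law can be written as the equation V1/T1 = V2/T2 at constant pressure; the code below predicts behavior without explaining it.

    def charles_law_volume(v1_litres, t1_kelvin, t2_kelvin):
        """Predict a gas sample's new volume after a temperature change
        at constant pressure, using V1/T1 = V2/T2."""
        if t1_kelvin <= 0 or t2_kelvin <= 0:
            raise ValueError("temperatures must be positive and in kelvin")
        return v1_litres * (t2_kelvin / t1_kelvin)

    # Heating a 2.0 L sample from 300 K to 330 K predicts 2.2 L.
    print(charles_law_volume(2.0, 300.0, 330.0))

Nothing in the function says why gases behave this way; explaining that is the job of a theory (here, kinetic molecular theory).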

Laws vs Theories

A common misconception is that scientific theories are rudimentary ideas that will eventually graduate into scientific laws when enough data and evidence has been accumulated. A theory does not change into a scientific law with the accumulation of new or better evidence. Remember, theories are explanations; laws are patterns seen in large amounts of data, frequently written as an equation. A theory will always remain a theory, a law will always remain a law.

https://chem.libretexts.org/Bookshelves/Introductory_Chemistry/Chemistry_for_Changing_Times_(Hill_and_McCreary)/01%3A_Chemistry/1.02%3A_Science-_Reproducible_Testable_Tentative_Predictive_and_Explanatory


Does falsifiability require reproducibility according to Popper?
https://philosophy.stackexchange.com/questions/57897/does-falsifiability-require-reproducibility-according-to-popper
 

Reproducibility and Replicability in Science.

https://www.ncbi.nlm.nih.gov/books/NBK547546/

THE EVOLVING PRACTICES OF SCIENCE

Scientific research has evolved from an activity mainly undertaken by individuals operating in a few locations to many teams, large communities, and complex organizations involving hundreds to thousands of individuals worldwide. In the 17th century, scientists would communicate through letters and were able to understand and assimilate major developments across all the emerging major disciplines. In 2016—the most recent year for which data are available—more than 2,295,000 scientific and engineering research articles were published worldwide. In addition, the number of scientific and engineering fields and subfields of research is large and has greatly expanded in recent years, especially in fields that intersect disciplines (e.g., biophysics); more than 230 distinct fields and subfields can now be identified. The published literature is so voluminous and specialized that some researchers look to information retrieval, machine learning, and artificial intelligence techniques to track and apprehend the important work in their own fields.

Another major revolution in science came with the recent explosion of the availability of large amounts of data in combination with widely available and affordable computing resources. These changes have transformed many disciplines, enabled important scientific discoveries, and led to major shifts in science. In addition, the use of statistical analysis of data has expanded, and many disciplines have come to rely on complex and expensive instrumentation that generates and can automate analysis of large digital datasets.

Large-scale computation has been adopted in fields as diverse as astronomy, genetics, geoscience, particle physics, and social science, and has added scope to fields such as artificial intelligence. The democratization of data and computation has created new ways to conduct research; in particular, large-scale computation allows researchers to do research that was not possible a few decades ago. For example, public health researchers mine large databases and social media, searching for patterns, while earth scientists run massive simulations of complex systems to learn about the past, which can offer insight into possible future events.

Another change in science is an increased pressure to publish new scientific discoveries in prestigious and what some consider high-impact journals, such as Nature and Science. This pressure is felt worldwide, across disciplines, and by researchers at all levels but is perhaps most acute for researchers at the beginning of their scientific careers who are trying to establish a strong scientific record to increase their chances of obtaining tenure at an academic institution and grants for future work. Tenure decisions have traditionally been made on the basis of the scientific record (i.e., published articles of important new results in a field) and have given added weight to publications in more prestigious journals. Competition for federal grants, a large source of academic research funding, is intense as the number of applicants grows at a rate higher than the increase in federal research budgets. These multiple factors create incentives for researchers to overstate the importance of their results and increase the risk of bias—either conscious or unconscious—in data collection, analysis, and reporting.

In the context of these dynamic changes, the questions and issues related to reproducibility and replicability remain central to the development and evolution of science. How should studies and other research approaches be designed to efficiently generate reliable knowledge? How might hypotheses and results be better communicated to allow others to confirm, refute, or build on them? How can the potential biases of scientists themselves be understood, identified, and exposed in order to improve accuracy in the generation and interpretation of research results? How can intentional misrepresentation and fraud be detected and eliminated?

Researchers have proposed approaches to answering some of these questions over the past decades. As early as the 1960s, Jacob Cohen surveyed psychology articles from the perspective of statistical power to detect effect sizes, an approach that launched many power surveys (also known as meta-analyses) in the social sciences in subsequent years.

Researchers in biomedicine have been focused on threats to validity of results since at least the 1970s. In response to the threat, biomedical researchers developed a wide variety of approaches to address the concern, including an emphasis on randomized experiments with masking (also known as blinding), reliance on meta-analytic summaries over individual trial results, proper sizing and power of experiments, and the introduction of trial registration and detailed experimental protocols. Many of the same approaches have been proposed to counter shortcomings in reproducibility and replicability.

Reproducibility and replicability as they relate to data and computation-intensive scientific work received attention as the use of computational tools expanded. In the 1990s, Jon Claerbout launched the “reproducible research movement,” brought on by the growing use of computational workflows for analyzing data across a range of disciplines. Minor mistakes in code can lead to serious errors in interpretation and in reported results; Claerbout's proposed solution was to establish an expectation that data and code will be openly shared so that results could be reproduced. The assumption was that reanalysis of the same data using the same methods would produce the same results.

In the 2000s and 2010s, several high-profile journal and general media publications focused on concerns about reproducibility and replicability, including the cover story in The Economist noted above. These articles introduced new concerns about the availability of data and code and highlighted problems of publication bias, selective reporting, and misaligned incentives that cause positive results to be favored for publication over negative or nonconfirmatory results. Some news articles focused on issues in biomedical research and clinical trials, which were discussed in the general media partly as a result of lawsuits and settlements over widely used drugs.

Many publications about reproducibility and replicability have focused on the lack of data, code, and detailed description of methods in individual studies or sets of studies. Several attempts have been made to assess non-reproducibility or non-replicability within a field, particularly in the social sciences. In Chapters 4, 5, and 6, we review in more detail the studies, analyses, efforts to improve, and factors that affect the lack of reproducibility and replicability. Before that discussion, we must clearly define these terms.

DEFINING REPRODUCIBILITY AND REPLICABILITY

Different scientific disciplines and institutions use the words reproducibility and replicability in inconsistent or even contradictory ways: What one group means by one word, the other group means by the other word. These terms—and others, such as repeatability—have long been used in relation to the general concept of one experiment or study confirming the results of another. Within this general concept, however, no terminologically consistent way of drawing distinctions has emerged; instead, conflicting and inconsistent terms have flourished. The difficulties in assessing reproducibility and replicability are complicated by this absence of standard definitions for these terms.

In some fields, one term has been used to cover all related concepts: for example, “replication” historically covered all concerns in political science. In many settings, the terms reproducible and replicable have distinct meanings, but different communities adopted opposing definitions. Some have added qualifying terms, such as methods reproducibility, results reproducibility, and inferential reproducibility, to the lexicon. In particular, tension has emerged between the usage recently adopted in computer science and the way that researchers in other scientific disciplines have described these ideas for years.

In the early 1990s, investigators began using the term “reproducible research” for studies that provided a complete digital compendium of data and code to reproduce their analyses, particularly in the processing of seismic wave recordings. The emphasis was on ensuring that a computational analysis was transparent and documented so that it could be verified by other researchers. While this notion of reproducibility is quite different from situations in which a researcher gathers new data in the hopes of independently verifying previous results or a scientific inference, some scientific fields use the term reproducibility to refer to this practice. One source (p. 783) referred to this scenario as “replicability,” noting: “Scientific evidence is strengthened when important results are replicated by multiple independent investigators using independent data, analytical methods, laboratories, and instruments.” Despite efforts to coalesce around the use of these terms, a lack of consensus persists across disciplines. The resulting confusion is an obstacle to moving forward to improve reproducibility and replicability.

In a review paper on the use of the terms reproducibility and replicability, one author outlined three categories of usage, which she characterized as A, B1, and B2:

  • A: The terms are used with no distinction between them.
  • B1: “Reproducibility” refers to instances in which the original researcher's data and computer codes are used to regenerate the results, while “replicability” refers to instances in which a researcher collects new data to arrive at the same scientific findings as a previous study.
  • B2: “Reproducibility” refers to independent researchers arriving at the same results using their own data and methods, while “replicability” refers to a different team arriving at the same results using the original author's artifacts.

B1 and B2 are in opposition to each other with respect to which term involves reusing the original authors' digital artifacts of research (the “research compendium”) and which involves independently created digital artifacts. The same review collected data on the usage of these terms across a variety of disciplines (see Table 3-1).

TABLE 3-1. Usage of the Terms Reproducibility and Replicability by Scientific Discipline.

The terminology adopted by the Association for Computing Machinery (ACM) for computer science was published in 2016 as a system for badges attached to articles published by the society. The ACM declared that its definitions were inspired by the metrology vocabulary, and it associated the use of an original author's digital artifacts with “replicability” and the development of completely new digital artifacts with “reproducibility.” These terminological distinctions contradict the usage in computational science, where reproducibility is associated with transparency and access to the author's digital artifacts, as well as the usage in the social sciences, economics, clinical studies, and other domains, where replication studies collect new data to verify the original findings.

Regardless of the specific terms used, the underlying concepts have long played essential roles in all scientific disciplines. These concepts are closely connected to the following general questions about scientific results:

  • Are the data and analysis laid out with sufficient transparency and clarity that the results can be checked?
  • If checked, do the data and analysis offered in support of the result in fact support that result?
  • If the data and analysis are shown to support the original result, can the result reported be found again in the specific study context investigated?
  • Finally, can the result reported or the inference drawn be found again in a broader set of study contexts?

Computational scientists generally use the term reproducibility to answer just the first question—that is, reproducible research is research that is capable of being checked because the data, code, and methods of analysis are available to other researchers. The term reproducibility can also be used in the context of the second question: research is reproducible if another researcher actually uses the available data and code and obtains the same results. The difference between the first and the second questions is one of action by another researcher; the first refers to the availability of the data, code, and methods of analysis, while the second refers to the act of recomputing the results using the available data, code, and methods of analysis.

In order to answer the first and second questions, a second researcher uses data and code from the first; no new data or code are created by the second researcher. Reproducibility depends only on whether the methods of the computational analysis were transparently and accurately reported and whether that data, code, or other materials were used to reproduce the original results. In contrast, to answer question three, a researcher must redo the study, following the original methods as closely as possible and collecting new data. To answer question four, a researcher could take a variety of paths: choose a new condition of analysis, conduct the same study in a new context, or conduct a new study aimed at the same or similar research question.

For the purposes of this report and with the aim of defining these terms in ways that apply across multiple scientific disciplines, the committee has chosen to draw the distinction between reproducibility and replicability between the second and third questions. Thus, reproducibility includes the act of a second researcher recomputing the original results, and it can be satisfied with the availability of data, code, and methods that makes that recomputation possible. This definition of reproducibility refers to the transparency and reproducibility of computations: that is, it is synonymous with “computational reproducibility,” and we use the terms interchangeably in this report.

When a new study is conducted and new data are collected, aimed at the same or a similar scientific question as a previous one, we define it as a replication. A replication attempt might be conducted by the same investigators in the same lab in order to verify the original result, or it might be conducted by new investigators in a new lab or context, using the same or different methods and conditions of analysis. If this second study, aimed at the same scientific question but collecting new data, finds consistent results or can draw consistent conclusions, the research is replicable. If a second study explores a similar scientific question but in other contexts or populations that differ from the original one and finds consistent results, the research is “generalizable.”
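The distinction can be made concrete with a toy sketch (my own, not the committee's): reproduction reruns the archived data through the archived code, while replication collects new data aimed at the same question and asks whether the results are consistent.

    import random
    import statistics

    def analysis(data):
        # The archived "code" artifact: estimate the mean.
        return statistics.mean(data)

    original_data = [9.8, 10.1, 10.0, 9.9, 10.2]  # archived dataset
    original_result = analysis(original_data)

    # Reproducibility: same data + same code -> the identical result.
    assert analysis(original_data) == original_result

    # Replicability: a new study gathers fresh data on the same question
    # and checks for consistent (not identical) results.
    rng = random.Random(42)
    new_data = [rng.gauss(10.0, 0.15) for _ in range(5)]
    print(abs(analysis(new_data) - original_result) < 0.5)  # consistent?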

In summary, after extensive review of the ways these terms are used by different scientific communities, the committee adopted specific definitions for this report.

CONCLUSION 3-1: For this report, reproducibility is obtaining consistent results using the same input data; computational steps, methods, and code; and conditions of analysis. This definition is synonymous with “computational reproducibility,” and the terms are used interchangeably in this report.

Replicability is obtaining consistent results across studies aimed at answering the same scientific question, each of which has obtained its own data.

Two studies may be considered to have replicated if they obtain consistent results given the level of uncertainty inherent in the system under study. In studies that measure a physical entity (i.e., a measurand), the results may be the sets of measurements of the same measurand obtained by different laboratories. In studies aimed at detecting an effect of an intentional intervention or a natural event, the results may be the type and size of effects found in different studies aimed at answering the same question. In general, whenever new data are obtained that constitute the results of a study aimed at answering the same scientific question as another study, the degree of consistency of the results from the two studies constitutes their degree of replication.

Two important constraints on the replicability of scientific results rest in limits to the precision of measurement and the potential for altered results due to sometimes subtle variation in the methods and steps performed in a scientific study. We expressly consider both here, as they can each have a profound influence on the replicability of scientific studies.

PRECISION OF MEASUREMENT

Virtually all scientific observations involve counts, measurements, or both. Scientific measurements may be of many different kinds: spatial dimensions (e.g., size, distance, and location), time, temperature, brightness, colorimetric properties, electromagnetic properties, electric current, material properties, acidity, and concentration, to name a few from the natural sciences. The social sciences are similarly replete with counts and measures. With each measurement comes a characterization of the margin of doubt, or an assessment of uncertainty. Indeed, it may be said that measurement, quantification, and uncertainties are core features of scientific studies.

One mark of progress in science and engineering has been the ability to make increasingly exact measurements on a widening array of objects and phenomena. Many of the things taken for granted in the modern world, from mechanical engines to interchangeable parts to smartphones, are possible only because of advances in the precision of measurement over time.

The concept of precision refers to the degree of closeness in measurements. As the unit used to measure distance, for example, shrinks from meter to centimeter to millimeter and so on down to micron, nanometer, and angstrom, the measurement unit becomes more exact and the proximity of one measurand to a second can be determined more precisely.

Even when scientists believe a quantity of interest is constant, they recognize that repeated measurement of that quantity may vary because of limits in the precision of measurement technology. It is useful to note that precision is different from the accuracy of a measurement system, as shown in Figure 3-1, demonstrating the differences using an archery target containing three arrows.

FIGURE 3-1. Accuracy and precision of a measurement (see text for discussion). SOURCE: Chemistry LibreTexts, https://chem.libretexts.org/Bookshelves/Introductory_Chemistry/Book%3A_IntroductoryChemistry_(CK-12)/03%3A_Measurements/3.12%3A_Accuracy_and_Precision.

In panel A of Figure 3-1, the three arrows are in the outer ring, not close together and not close to the bull's eye, illustrating low accuracy and low precision (i.e., the shots have been neither accurate nor precise). In panel B, the arrows are clustered in a tight band in an outer ring, illustrating low accuracy and high precision (i.e., the shots have been more precise, but not accurate). The other two panels similarly illustrate high accuracy with low precision (C) and high accuracy with high precision (D).

It is critical to keep in mind that the accuracy of a measurement can be judged only in relation to a known standard of truth. If the exact location of the bull's eye is unknown, one must not presume that a more precise set of measures is necessarily more accurate; the results may simply be subject to a more consistent bias, moving them in a consistent way in a particular direction and distance from the true target.
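The same distinction can be sketched in code (coordinates invented): accuracy is the distance of the average shot from the bull's eye, precision is the tightness of the cluster, and neither implies the other.

    import math
    import statistics

    shots = [(1.9, 2.1), (2.0, 2.2), (2.1, 2.0)]  # tight cluster, off target
    bullseye = (0.0, 0.0)

    centre = (statistics.mean(x for x, _ in shots),
              statistics.mean(y for _, y in shots))

    accuracy_error = math.dist(centre, bullseye)  # large -> low accuracy
    # Small average distance from the cluster centre -> high precision.
    precision_spread = statistics.mean(math.dist(s, centre) for s in shots)

    # This cluster is like panel B of Figure 3-1: precise but inaccurate.
    print(f"accuracy error = {accuracy_error:.2f}, spread = {precision_spread:.2f}")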

It is often useful in science to describe quantitatively the central tendency and degree of dispersion among a set of repeated measurements of the same entity and to compare one set of measurements with a second set. When a set of measurements is repeated by the same operator using the same equipment under constant conditions and close in time, metrologists refer to the proximity of these measurements to one another as measurement repeatability (see Box 3-1). When one is interested in comparing the degree to which the set of measurements obtained in one study are consistent with the set of measurements obtained in a second study, the committee characterizes this as a test of replicability because it entails the comparison of two studies aimed at the same scientific question where each obtained its own data.

BOX 3-1. Terms Used in Metrology and How They Differ from the Committee's Definitions.

Consider, for example, the set of measurements of the fine-structure constant (a physical constant) obtained over time by a number of laboratories (see Figure 3-2). For each laboratory's results, the figure depicts the mean observation (i.e., the central tendency) and the standard error of the mean, indicated by the error bars. The standard error is an indicator of the precision of the obtained measurements, where a smaller standard error represents higher precision. In comparing the measurements obtained by the different laboratories, notice that both the mean values and the degrees of precision (as indicated by the width of the error bars) may differ from one set of measurements to another.

FIGURE 3-2. Evolution of scientific understanding of the fine structure constant over time. NOTES: Error bars indicate the experimental uncertainty of each measurement; see text for discussion. SOURCE: Reprinted with permission from Peter J. Mohr et al.

We may now ask a central question for this study: How well does a second set of measurements (or results) replicate a first set? Answering this question, we suggest, may involve three components:

  1. proximity of the mean value (central tendency) of the second set relative to the mean value of the first set, measured both in physical units and relative to the standard error of the estimate;
  2. similitude in the degree of dispersion in observed values about the mean in the second set relative to the first set; and
  3. likelihood that the second set of values and the first set of values could have been drawn from the same underlying distribution.

Depending on circumstances, one or another of these components could be more salient for a particular purpose. For example, two sets of measures could have means that are very close to one another in physical units, yet each was sufficiently precisely measured as to be very unlikely to be different by chance. A second comparison may find means that are further apart, yet derived from more widely dispersed sets of observations, so that there is a higher likelihood that the difference in means could have been observed by chance. In terms of physical proximity, the first comparison is more closely replicated. In terms of the likelihood of being derived from the same underlying distribution, the second set is more highly replicated.
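Here is a minimal sketch of the three components, with toy measurements of the same quantity from two studies (all numbers invented; a real analysis would use a proper t-test):

    import math
    import statistics

    first = [5.31, 5.29, 5.33, 5.30, 5.32]
    second = [5.34, 5.36, 5.33, 5.35, 5.37]

    def mean_and_sem(xs):
        return statistics.mean(xs), statistics.stdev(xs) / math.sqrt(len(xs))

    m1, se1 = mean_and_sem(first)
    m2, se2 = mean_and_sem(second)

    # 1. Proximity of the means, in physical units and in standard-error units.
    diff = abs(m2 - m1)
    diff_in_se = diff / math.hypot(se1, se2)

    # 2. Similitude of dispersion about each mean.
    spread_ratio = statistics.stdev(second) / statistics.stdev(first)

    # 3. Rough plausibility that both sets were drawn from one underlying
    #    distribution (a z-like rule of thumb).
    same_distribution_plausible = diff_in_se < 2.0

    print(diff, diff_in_se, spread_ratio, same_distribution_plausible)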

A simple visual inspection of the means and standard errors for measurements obtained by different laboratories may be sufficient for a judgment about their replicability. For example, in Figure 3-2, it is evident that the bottom two measurement results have relatively tight precision and nearly identical means, so it seems reasonable to consider them to have replicated one another. It is similarly evident that results from LAMPF (second from the top of reported measurements with a mean value and error bars in Figure 3-2) are better replicated by results from LNE-01 (fourth from top) than by measurements from NIST-89 (sixth from top). More subtle is judging the degree of replication when, for example, one set of measurements has a relatively wide range of uncertainty compared to another. In Figure 3-2, the uncertainty range from NPL-88 (third from top) is relatively wide and includes the mean of NIST-97 (seventh from top); however, the narrower uncertainty range for NIST-97 does not include the mean from NPL-88. Especially in such cases, it is valuable to have a systematic, quantitative indicator of the extent to which one set of measurements may be said to have replicated a second set, and a consistent means of quantifying the extent of replication can be useful in all cases.

VARIATIONS IN METHODS EMPLOYED IN A STUDY

When closely scrutinized, a scientific study or experiment may be seen to entail hundreds or thousands of choices, many of which are barely conscious or taken for granted. In the laboratory, exactly what size of Erlenmeyer flask is used to mix a set of reagents? At what exact temperature were the reagents stored? Was a drying agent such as acetone used on the glassware? Which agent, and in what amount and exact concentration? Within what tolerance of error are the ingredients measured? When ingredient A was combined with ingredient B, was the flask shaken or stirred? How vigorously and for how long? What manufacturer of porcelain filter was used? If conducting a field survey, how, exactly, were the subjects selected? Are the interviews conducted by computer, over the phone, or in person? Are the interviewers female or male, young or old, of the same or a different race than the interviewee? What is the exact wording of a question? If spoken, with what inflection? What is the exact sequence of questions? Without belaboring the point, we can say that many of the exact methods employed in a scientific study may or may not be described in the methods section of a publication. An investigator may or may not realize when a possible variation could be consequential to the replicability of results.

In a later section, we will deal more generally with sources of non-replicability in science (see Chapter 5 and Box 5-2). Here, we wish to emphasize that countless subtle variations in the methods, techniques, sequences, procedures, and tools employed in a study may contribute in unexpected ways to differences in the obtained results (see Box 3-2).

BOX 3-2. Data Collection, Cleaning, and Curation.

Finally, note that a single scientific study may entail elements of the several concepts introduced and defined in this chapter, including computational reproducibility, precision in measurement, replicability, and generalizability or any combination of these. For example, a large epidemiological survey of air pollution may entail portable, personal devices to measure various concentrations in the air (subject to precision of measurement), very large datasets to analyze (subject to computational reproducibility), and a large number of choices in research design, methods, and study population (subject to replicability and generalizability).

RIGOR AND TRANSPARENCY

The committee was asked to “make recommendations for improving rigor and transparency in scientific and engineering research” (refer to Box 1-1 in Chapter 1). In response to this part of our charge, we briefly discuss the meanings of rigor and of transparency below and relate them to our topic of reproducibility and replicability.

Rigor is defined as “the strict application of the scientific method to ensure robust and unbiased experimental design”. Rigor does not guarantee that a study will be replicated, but conducting a study with rigor—with a well-thought-out plan and strict adherence to methodological best practices—makes it more likely. One of the assumptions of the scientific process is that rigorously conducted studies “and accurate reporting of the results will enable the soundest decisions” and that a series of rigorous studies aimed at the same research question “will offer successively ever-better approximations to the truth” (p. 311). Practices that indicate a lack of rigor, including poor study design, errors or sloppiness, and poor analysis and reporting, contribute to avoidable sources of non-replicability (see Chapter 5). Rigor affects both reproducibility and replicability.

Transparency has a long tradition in science. Since the advent of scientific reports and technical conferences, scientists have shared details about their research, including study design, materials used, details of the system under study, operationalization of variables, measurement techniques, uncertainties in measurement in the system under study, and how data were collected and analyzed. A transparent scientific report makes clear whether the study was exploratory or confirmatory, shares information about what measurements were collected and how the data were prepared, which analyses were planned and which were not, and communicates the level of uncertainty in the result (e.g., through an error bar, sensitivity analysis, or p-value). Only by sharing all this information might it be possible for other researchers to confirm and check the correctness of the computations, attempt to replicate the study, and understand the full context of how to interpret the results. Transparency of data, code, and computational methods is directly linked to reproducibility, and it also applies to replicability. The clarity, accuracy, specificity, and completeness in the description of study methods directly affects replicability.

FINDING 3-1: In general, when a researcher transparently reports a study and makes available the underlying digital artifacts, such as data and code, the results should be computationally reproducible. In contrast, even when a study was rigorously conducted according to best practices, correctly analyzed, and transparently reported, it may fail to be replicated.

Footnotes

1. “High-impact” journals are viewed by some as those that score highly on one of several journal impact indicators, such as CiteScore, SCImago Journal Rank (SJR), and Source Normalized Impact per Paper (SNIP), which are available in Scopus, and Journal Impact Factor (IF), Eigenfactor (EF), and Article Influence Score (AIS), which can be obtained from the Journal Citation Reports (JCR).

2. See Chapter 5, Fraud and Misconduct, which further discusses misconduct as a source of non-replicability, its frequency, and its reporting by the media.

3. One such outcome became known as the “file drawer problem”; see Chapter 5.

4. For the negative case, both “non-reproducible” and “irreproducible” are used in scientific work and are synonymous.

5. See also the discussion of the competing taxonomies between the computational sciences (B1) and the newer definitions adopted in computer science (B2), and proposals for resolving the differences.

6. The committee definitions of reproducibility, replicability, and generalizability are consistent with the National Science Foundation's Social, Behavioral, and Economic Sciences Perspectives on Robust and Reliable Science.

Copyright 2019 by the National Academy of Sciences. All rights reserved.
Bookshelf ID: NBK547546

https://www.ncbi.nlm.nih.gov/books/NBK547546/

ORIGINAL ARTICLE

Falsificationism is not just ‘potential’ falsifiability, but requires ‘actual’ falsification: Social psychology, critical rationalism, and progress in science

First published: 27 January 2017
https://onlinelibrary.wiley.com/doi/10.1111/jtsb.12134
 
https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&ved=2ahUKEwifiKvJl-T_AhWntokEHScFCW84ChAWegQIAxAB&url=https%3A%2F%2Fphilarchive.org%2Farchive%2FSFEFAR&usg=AOvVaw31pQjAnEQU_UIfF9o9WbB7&opi=89978449
https://www.frontiersin.org/articles/10.3389/fpsyg.2015.00621/full
https://royalsocietypublishing.org/doi/10.1098/rsta.2020.0210
https://journals.sagepub.com/doi/10.1177/2515245920970949?icid=int.sj-full-text.similar-articles.3
https://www.annualreviews.org/doi/10.1146/annurev-psych-020821-114157
https://academic.oup.com/nc/article/2021/1/niab001/6232324
 
https://twitter.com/naval/status/798265186301845504?lang=en
https://slsa.dev/spec/v0.1/requirements
https://webshop.elsevier.com/language-editing-services/language-editing/?_ga=2.29395135.1267004794.1687881229-806759668.1687881229
 
https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_hypothesistest-means-proportions/bs704_hypothesistest-means-proportions3.html
https://courses.lumenlearning.com/introstats1/chapter/null-and-alternative-hypotheses/
https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&ved=2ahUKEwiKx4egmOT_AhVsg4kEHaSxDN84ChAWegQICBAB&url=https%3A%2F%2Ffiles.eric.ed.gov%2Ffulltext%2FEJ1057150.pdf&usg=AOvVaw1TwGQ2zahfwa1mtQmzNQKs&opi=89978449
 
https://hypothesis.readthedocs.io/en/latest/stateful.html
https://www.accountingnest.com/articles/research/hypothesis-testing-decision-rule-null-hypothesis-z-score
https://online.stat.psu.edu/statprogram/reviews/statistical-concepts/hypothesis-testing/p-value-approach
https://www.nature.com/articles/d41586-022-04581-9
 
https://www.researchgate.net/publication/289963902_In_Search_of_Golden_Rules_Comment_on_Hypothesis-Testing_Approaches_to_Setting_Cutoff_Values_for_Fit_Indexes_and_Dangers_in_Overgeneralizing_Hu_and_Bentler%27s_1999_Findings
 
https://ieeexplore.ieee.org/document/7852310
https://www.investopedia.com/terms/n/null_hypothesis.asp
https://en.wikipedia.org/wiki/Scientific_method