# Combining Probability and Logic

The tenth event in the Progic conference series will take place on **1-3 September 2021, online**. The series has a long-standing interest in combining probability and logic.

The conference is preceded by a one-day Summer School on **31 August 2021**, at which invited speakers will give introductory lectures on the themes of the conference.

## Invited Speakers

- Claudia d'Amato (Bari): Summer School Lecture: Mining the Semantic Web: the problems that need to be known; Invited Talk: Is it really the time to give up with semantics?
- Vincenzo Crupi (Torino)
- Peter Grünwald (CWI and Leiden University)
- Simon Huttegger (Irvine)
- Ute Schmid (Bamberg)

## Venue

The conference will now be held online.

## Program

## Summer School (31 August 2021)

Time | Event |
---|---|
09:40 - 10:00 | Welcome |
10:00 - 11:00 | Peter Grünwald: Safe Probability |
11:00 - 11:15 | Coffee Break |
11:15 - 12:15 | Claudia d'Amato: Machine Learning and Knowledge Graphs: possible issues to be taken into account |
12:15 - 14:00 | Lunch Break |
14:00 - 15:00 | Vincenzo Crupi: Generalised Entropies, Uncertainty, and Evidence |
15:00 - 15:15 | Coffee Break |
15:15 - 16:15 | Simon Huttegger: Bayesian Randomness |

## Conference Day 1 (01 September 2021)

## Conference Day 2 (02 September 2021)

Time | Event |
---|---|
10:00 - 10:50 | Claudia d’Amato: Is it Really the Time to Give Up With Semantics? |
10:50 - 11:00 | Coffee Break |
11:00 - 11:40 | Giuliano Rosella and Jan Michael Sprenger: Expanding Causal Modelling Semantics |
11:40 - 12:20 | Esther Anna Corsi, Tommaso Flaminio and Hykel Hosni: When Belief Functions and Lower Probabilities are Indistinguishable |
12:20 - 14:00 | Lunch Break |
14:00 - 14:40 | Network on Inductive Logic Gathering |
14:40 - 15:20 | Joe Roussos: Awareness Growth and Belief Revision |
15:20 - 15:30 | Coffee Break |
15:30 - 16:10 | Francesca Zaffora Blando: Merging, Polarisation and Algorithmic Randomness |
16:10 - 16:50 | Jonathan Vandenburgh: Learning as Hypothesis Testing |

## Conference Day 3 (03 September 2021)

Time | Event |
---|---|
10:00 - 10:50 | Peter Grünwald, R. de Heide, W. Koolen, A. Ly, M. Perez, R. Turner and J. Ter Schure: E Is for Evidence |
10:50 - 11:00 | Coffee Break |
11:00 - 11:40 | Niki Pfeifer and Giuseppe Sanfilippo: Connexivity in Coherence-Based Probability Logic |
11:40 - 12:20 | Paul Thorn and Gerhard Schurz: The Dependence of Inference Reliability on Environment Entropy |
12:20 - 14:00 | Lunch Break |
14:00 - 14:40 | Mantas Radzvilas, William Peden and Francesco De Pretis: A Battle in the Statistics Wars |
14:40 - 15:20 | Tom Sterkenburg and Peter Grünwald: The No-Free-Lunch Theorems of Supervised Learning |
15:20 - 15:30 | Coffee Break |
15:30 - 16:10 | Bernhard Salow and Jeremy Goodman: Normality and Probability |
16:10 - 17:00 | Simon Huttegger: Superconditioning |

## Abstracts

### Alexandru Baltag and Soroush Rafiee Rad: From Qualitative Probabilities to Full Belief

In recent work, H. Leitgeb has developed a method for extracting qualitative beliefs (satisfying all the traditional doxastic postulates, including Additivity of Beliefs) from probabilistic degrees of belief. His approach vindicates a version of the so-called Lockean thesis, which holds that A is believed if and only if A has a high enough subjective probability (above some given threshold t ∈ (0.5, 1]), and shows that in many cases it is consistent to have a threshold t < 1 without falling prey to the Lottery Paradox. Moreover, Leitgeb shows that this setting gives rise in a natural way to a total plausibility preorder on possible worlds (or, equivalently, a Grove system of nested spheres), thus leading naturally to the AGM theory of Belief Revision. Doubts about this have been raised by Lin and Kelly [2], based on a "no-go" result suggesting an inherent tension between Bayesian conditioning and the AGM axiom of Super-expansion (corresponding to the postulate known to researchers in conditional logic as Rational Monotonicity), but these problems can be addressed successfully in a number of natural ways.
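To see why the choice of threshold matters, the following minimal sketch applies a naive Lockean rule (believe A iff P(A) exceeds t) to a fair 100-ticket lottery. The lottery size and the threshold t = 0.95 are illustrative assumptions, not taken from the paper. Each proposition "ticket i loses" clears the threshold, yet their conjunction has probability 0, so naive Lockean belief is not closed under conjunction. This is the Lottery Paradox that Leitgeb's construction avoids; the sketch does not implement Leitgeb's method itself.

```python
from fractions import Fraction

N_TICKETS = 100                 # illustrative fair lottery with exactly one winner
THRESHOLD = Fraction(95, 100)   # Lockean threshold t (assumed for illustration)

# World w is the world in which ticket w wins; all worlds are equally likely.
worlds = range(N_TICKETS)
prob = {w: Fraction(1, N_TICKETS) for w in worlds}

def probability(event):
    """Probability of an event, represented as a set of worlds."""
    return sum(prob[w] for w in event)

def lockean_believes(event):
    """Naive Lockean rule: believe an event iff its probability exceeds t."""
    return probability(event) > THRESHOLD

# "Ticket i loses" holds in every world except world i.
loses = [set(worlds) - {i} for i in range(N_TICKETS)]
believes_each = all(lockean_believes(e) for e in loses)

# The conjunction of all "ticket i loses" propositions: no ticket wins.
conjunction = set(worlds).intersection(*loses)
believes_conjunction = lockean_believes(conjunction)

print(believes_each, believes_conjunction)  # True False
```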

In this paper, we show that Leitgeb's method can be generalised to de Finetti's "qualitative probability", which takes the relation A ≼ B ("event B is at least as probable as event A") as the primitive notion. In fact, the method works even if we give up de Finetti's totality requirement, thus allowing for events whose likelihoods are incomparable. This is an extremely general setting, encompassing practically all known formal approaches to uncertainty. Our result therefore testifies to the generality and robustness of AGM Belief Revision, which can be given a Lockean interpretation in almost any reasonable theory of uncertainty.

### Esther Anna Corsi, Tommaso Flaminio and Hykel Hosni: When Belief Functions and Lower Probabilities are Indistinguishable

This paper reports on a geometrical investigation of de Finetti’s Dutch Book method as an operational foundation for a wide range of generalisations of probability measures, including lower probabilities, necessity measures and belief functions. Our main result identifies a number of non-limiting circumstances under which de Finetti’s coherence fails to lift from less general to more general models. In particular, it shows that there exist sufficiently rich sets of events on which the coherence criteria for belief functions and for lower probabilities collapse.
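De Finetti's operational criterion can be illustrated in its simplest form. The toy sketch below, assumed for illustration and not the paper's geometric method, treats an agent's announced previsions as fair prices for bets and checks whether a bookie's chosen stakes guarantee the agent a loss in every possible world. With P(A) = P(not A) = 0.6, unit stakes on both bets yield a sure loss of 0.2.

```python
from fractions import Fraction

# Two possible worlds: A is true, or A is false.
WORLDS = ["A", "not-A"]

# Events, represented as the sets of worlds in which they occur.
EVENTS = {"A": {"A"}, "not A": {"not-A"}}

def agent_payoffs(previsions, stakes):
    """Agent's net gain in each world when, for each event E, it buys a bet
    paying stakes[E] if E occurs, at price previsions[E] * stakes[E].
    A negative stake means the agent sells the bet instead."""
    gains = {}
    for world in WORLDS:
        total = Fraction(0)
        for event, p in previsions.items():
            indicator = 1 if world in EVENTS[event] else 0
            total += stakes[event] * (indicator - p)
        gains[world] = total
    return gains

def is_dutch_book(previsions, stakes):
    """True if these stakes make the agent lose money in every world."""
    return all(g < 0 for g in agent_payoffs(previsions, stakes).values())

incoherent = {"A": Fraction(6, 10), "not A": Fraction(6, 10)}
coherent = {"A": Fraction(6, 10), "not A": Fraction(4, 10)}
unit = {"A": 1, "not A": 1}

print(agent_payoffs(incoherent, unit))  # {'A': Fraction(-1, 5), 'not-A': Fraction(-1, 5)}
print(is_dutch_book(incoherent, unit), is_dutch_book(coherent, unit))  # True False
```

Coherence proper requires that *no* choice of stakes produces a sure loss; the sketch only checks one choice of stakes.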

### Vincenzo Crupi: Evidential Conditionals

(Joint work with Andrea Iacona)

Once upon a time, some thought that indicative conditionals could be effectively analyzed as material conditionals. Nowadays, an alternative theoretical construct prevails and receives wide acceptance, namely, the conditional probability of the consequent given the antecedent. Partly following earlier critical remarks made by others (most notably, Igor Douven), I advocate a revision of this consensus and suggest that incremental probabilistic support (rather than conditional probability alone) is key to the understanding of indicative conditionals and their role in human reasoning. There have been motivated concerns that a theory of such evidential conditionals (unlike their more traditional suppositional counterparts) cannot generate a sufficiently interesting logical system. I will present results largely dispelling these worries. Happily, and perhaps surprisingly, appropriate technical variations of Ernst Adams’s classical approach allow for the construction of a new logic of evidential conditionals which is nicely superclassical, fairly strong, restrictedly connexive, and potentially of consequence for empirical investigation. Moreover, much like in the case of Adams’s suppositional conditional, a parallel possible-world account is also available, so that the same logical profile is preserved and made consistent with a truth-conditional semantics. As a bonus, we get new insight into the logical analysis of concessives.
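The contrast between conditional probability and incremental support can be made concrete with a toy joint distribution, assumed here purely for illustration. The sketch computes P(C | A) and checks whether A raises the probability of C: with independent A ("today is Monday") and C ("the sun rises tomorrow"), the conditional probability is high but there is no support. It illustrates only the distinction the abstract draws; it does not implement Crupi and Iacona's logic of evidential conditionals.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution over (A, C), with A and C independent.
# A: "today is Monday"; C: "the sun rises tomorrow".
P_A = Fraction(1, 7)
P_C = Fraction(999, 1000)
joint = {
    (a, c): (P_A if a else 1 - P_A) * (P_C if c else 1 - P_C)
    for a, c in product([True, False], repeat=2)
}

def prob(event):
    """Probability of the set of worlds satisfying `event`."""
    return sum(p for world, p in joint.items() if event(world))

def antecedent(world):
    return world[0]

def consequent(world):
    return world[1]

def both(world):
    return antecedent(world) and consequent(world)

p_c_given_a = prob(both) / prob(antecedent)
# Suppositional reading: the conditional probability is high.
high_conditional = p_c_given_a > Fraction(9, 10)
# Evidential reading: does learning A raise the probability of C?
incremental_support = p_c_given_a > prob(consequent)

print(high_conditional, incremental_support)  # True False
```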

### Vincenzo Crupi: Generalised Entropies, Uncertainty, and Evidence

Epistemic inaccuracy, uncertainty, information, evidential support, the value of given evidence, and the expected value of an evidence search option (to wit, an experiment): all of these notions have been involved in classical and recent discussions within probabilistic approaches to philosophy of science, epistemology, and cognitive science. The formal representations of all these key pieces are tightly and neatly connected at the mathematical and foundational level, as anticipated by leading figures such as Jimmie Savage and others. We will start out with a parametric family of scoring rules (including the so-called logarithmic and Brier scores as special cases) as a basic building block, which will then provide a firm grasp on how to navigate this theoretical maze. Once the foundations are clarified, a number of fascinating (and open) issues will assume a much clearer profile.
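The abstract does not name the parametric family. One well-known candidate, assumed here for illustration, is the Tsallis family of entropies H_t(p) = (1 - Σ p_i^t)/(t - 1). As t approaches 1 it tends to Shannon entropy, the expected logarithmic score; at t = 2 it equals the quadratic entropy 1 - Σ p_i^2, which is the expected Brier score when the Brier score is taken as Σ_j (p_j - δ_ij)^2. The sketch checks both correspondences numerically.

```python
import math
from typing import Sequence

def tsallis_entropy(p: Sequence[float], t: float) -> float:
    """Tsallis entropy of order t; at t = 1 we use the Shannon limit (in nats)."""
    if math.isclose(t, 1.0):
        return -sum(x * math.log(x) for x in p if x > 0)
    return (1.0 - sum(x ** t for x in p if x > 0)) / (t - 1.0)

def expected_brier(p: Sequence[float]) -> float:
    """Expected Brier score of forecast p when outcomes are distributed by p itself."""
    total = 0.0
    for i, p_i in enumerate(p):
        loss = sum((p_j - (1.0 if j == i else 0.0)) ** 2 for j, p_j in enumerate(p))
        total += p_i * loss
    return total

def expected_log_score(p: Sequence[float]) -> float:
    """Expected logarithmic score (-log of the probability given to the outcome)."""
    return sum(-p_i * math.log(p_i) for p_i in p if p_i > 0)

dist = [0.5, 0.3, 0.2]
print(tsallis_entropy(dist, 2.0), expected_brier(dist))          # both about 0.62
print(tsallis_entropy(dist, 1.0001), expected_log_score(dist))   # nearly equal
```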

### Claudia d’Amato: Is it Really the Time to Give Up With Semantics?

This abstract is available as a PDF file (64 kB).

### Claudia d'Amato: Machine Learning and Knowledge Graphs: possible issues to be taken into account

This abstract is available as a PDF file (64 kB).

### Konstantin Genin and Conor Mayo-Wilson: Statistical Decidability in Confounded, Linear Non-Gaussian Models

This abstract is available as a PDF file (149 kB).

### Peter Grünwald, R. de Heide, W. Koolen, A. Ly, M. Perez, R. Turner and J. Ter Schure: E Is for Evidence

How much evidence do the data give us about one hypothesis versus another? The standard way to measure evidence is still the p-value, despite a myriad of problems surrounding it. One central problem is its inability to deal with optional stopping and its dependence on unknowable counterfactuals. We introduce the E-value, a notion of evidence that overcomes these issues. When both hypotheses are simple, the E-value is a likelihood ratio, nowadays the standard notion of probabilistic evidence in courts of law. When the null hypothesis is simple, the E-value coincides with the Bayes factor, the notion of evidence preferred by Bayesians. But when both hypotheses are composite, or an alternative cannot be formulated, E-values and Bayes factors become distinct. Unlike the Bayes factor, the E-value allows for Type-I error control, which hopefully makes it more acceptable to frequentist practitioners. E-values were first put forward by Levin (of P vs NP fame) in the 1970s but lay dormant until the last three years, which have seen a plethora of new results.
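For two simple hypotheses about a coin, the E-value is the likelihood ratio, and multiplying the per-flip ratios gives an E-value that can be monitored after every flip. Under the null, this running product is a nonnegative martingale with expectation 1, so by Ville's inequality the probability that it *ever* reaches 1/α is at most α; this is why optional stopping does not break Type-I error control. The hypotheses (p = 0.5 versus p = 0.7), α = 0.05, and the simulation sizes below are illustrative assumptions, not taken from the paper.

```python
import random

P_NULL, P_ALT = 0.5, 0.7   # hypothetical simple null and alternative for a coin
ALPHA = 0.05

def likelihood_ratio(flip: int) -> float:
    """Per-flip E-value: likelihood under the alternative over likelihood under the null."""
    if flip == 1:
        return P_ALT / P_NULL
    return (1 - P_ALT) / (1 - P_NULL)

def e_value(flips) -> float:
    """E-value for a whole sequence: the product of per-flip likelihood ratios."""
    e = 1.0
    for f in flips:
        e *= likelihood_ratio(f)
    return e

def rejects_with_optional_stopping(rng: random.Random, max_flips: int) -> bool:
    """Flip a fair coin (so the null is true) and stop as soon as E >= 1/alpha."""
    e = 1.0
    for _ in range(max_flips):
        e *= likelihood_ratio(1 if rng.random() < P_NULL else 0)
        if e >= 1 / ALPHA:
            return True
    return False

rng = random.Random(2021)
runs = 2000
false_rejections = sum(rejects_with_optional_stopping(rng, 500) for _ in range(runs))
print(e_value([1, 1, 0]))        # 1.4 * 1.4 * 0.6, about 1.176
print(false_rejections / runs)   # at most alpha = 0.05, up to simulation noise
```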

### Peter Grünwald: Safe Probability

We introduce the idea of probability distributions that lead to reliable predictions about some, but not all aspects of a domain. This idea takes the sting out of many a foundational issue in statistics: so-called 'objective Bayes', 'fiducial' or 'maximum entropy' distributions are typical examples of distributions that are safe for some, but unsafe for other inferences. It also sheds light on the 'dilation' phenomenon in imprecise probability.

A probability distribution P(X,Y) is 'safe' for predicting a random variable Y given X, relative to a class of decision problems D, if for every problem in D predictions based on P are just as good as they would be if P were 'correct', even though P may be wrong.

There is a hierarchy of 'safety' notions, ranging from very strong (validity) via intermediate (calibration) to very weak, depending on the richness of the class D for which safety holds. These are really new, pragmatic notions of *probabilistic truth*: someone who entertains a wrong belief may find the world behaving exactly as if her beliefs were correct.
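The intermediate notion, calibration, can be illustrated with a small example assumed here: Y depends on a binary X, but a forecaster ignores X and always announces the marginal probability P(Y = 1) = 1/2. Its conditional beliefs are wrong, yet on the occasions when it announces 1/2, Y = 1 occurs exactly half the time, so with respect to calibration the world behaves as if its forecasts were correct. This is a toy illustration, not Grünwald's formal definition of safety.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical true distribution: X uniform on {0, 1}; Y depends on X.
P_X = {0: Fraction(1, 2), 1: Fraction(1, 2)}
P_Y1_GIVEN_X = {0: Fraction(1, 5), 1: Fraction(4, 5)}

def forecast(x):
    """A 'wrong' forecaster: ignores x and always predicts P(Y=1) = 1/2."""
    return Fraction(1, 2)

def calibration_table():
    """For each announced value q, the true frequency of Y=1 when q is announced."""
    mass = defaultdict(Fraction)   # P(forecast = q)
    hits = defaultdict(Fraction)   # P(forecast = q and Y = 1)
    for x, px in P_X.items():
        q = forecast(x)
        mass[q] += px
        hits[q] += px * P_Y1_GIVEN_X[x]
    return {q: hits[q] / mass[q] for q in mass}

table = calibration_table()
is_calibrated = all(freq == q for q, freq in table.items())
is_correct = all(forecast(x) == P_Y1_GIVEN_X[x] for x in P_X)
print(table, is_calibrated, is_correct)
```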

### Simon Huttegger: Bayesian Randomness

(Joint work with Sean Walsh and Francesca Zaffora Blando)

The theorem on convergence to the truth establishes that a Bayesian agent expects opinions to converge to the truth almost surely as evidence accumulates. The merging of opinions theorem establishes that two Bayesian agents achieve intersubjective agreement almost surely as evidence accumulates, provided that their prior beliefs are sufficiently aligned to begin with. Both results come with a probability-zero qualification: the states of the world for which convergence to the truth or merging of opinions fails have prior probability zero. Thus, these results don’t establish an objective guarantee of convergence or merging, but rather express a kind of internal consistency. This is significant in and of itself, but the classical results remain somewhat elusive because they don’t specify which states of the world belong to the probability-one sets that are conducive to merging and convergence. In this paper we study merging and convergence within the framework of algorithmic randomness to shed some light on this issue. The theory of algorithmic randomness uses computable probability measures to characterize states that are effectively random, or effectively typical. Loosely speaking, algorithmic randomness zooms in on properties of individual states and shows that states that can be effectively identified as nonrandom constitute a set of probability zero. Our main results establish equivalences between certain notions of algorithmic randomness (Schnorr randomness, Martin-Löf randomness, and density randomness), on the one hand, and convergence to the truth and merging of opinions, on the other. We also discuss what these connections tell us about the significance of the classical results.
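The classical merging phenomenon can be seen in a conjugate toy model, assumed here for illustration: two agents hold different Beta priors over a coin's bias and update on the same data. After k heads in n flips, each agent's predictive probability for heads is (a + k)/(a + b + n), so the gap between the two agents shrinks roughly like 1/n. The priors, true bias, and seed are hypothetical, and the sketch covers only the classical result, not the algorithmic-randomness refinements studied in the paper.

```python
import random

def predictive(a: float, b: float, heads: int, n: int) -> float:
    """Posterior predictive P(next flip is heads) under a Beta(a, b) prior."""
    return (a + heads) / (a + b + n)

# Two hypothetical agents with quite different priors over the coin's bias.
AGENT_1 = (1.0, 1.0)    # uniform prior
AGENT_2 = (20.0, 2.0)   # strongly expects heads

rng = random.Random(7)
TRUE_BIAS = 0.3
heads = 0
gaps = []
for n in range(1, 2001):
    heads += 1 if rng.random() < TRUE_BIAS else 0
    gap = abs(predictive(*AGENT_1, heads, n) - predictive(*AGENT_2, heads, n))
    gaps.append(gap)

print(gaps[0], gaps[99], gaps[-1])  # the disagreement shrinks as n grows
```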

### Simon Huttegger: Superconditioning

According to conditionalization, updating from a prior to a posterior proceeds by conditioning on a proposition in the agent’s probability space. But in more general Bayesian models of learning such propositions may not be available. In the most extreme case, we only have shifts from priors to posteriors. In their paper “Updating Subjective Probability” (1982), Diaconis and Zabell proved under what conditions an arbitrary shift from a prior to a posterior can be represented by conditionalization within a “superconditioning” space that is more fine-grained than the agent’s original probability space. The proposition conditioned upon can be thought of as an idealized characterization of an agent’s learning experience. In this presentation I extend the result of Diaconis and Zabell. The first extension considers shifts from a prior to a set of possible posteriors and shows that such a shift can be embedded in a conditioning model if and only if a generalized version of the reflection principle holds (namely, that the prior is a weighted average of the posteriors). This result provides some new insights into the significance of the reflection principle for updating. The second extension considers shifts from a prior to two or more distinct sets of posteriors and shows that those shifts can be simultaneously embedded in a conditioning model if and only if the sets of posteriors are sufficiently aligned (their convex spans overlap). This result speaks to the common prior assumption in game theory and a model of learning in time slice epistemology.
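The "if" direction of the reflection condition has a simple constructive reading, sketched below under illustrative assumptions. If the prior p is a weighted average Σ_j w_j q_j of the possible posteriors q_j, one can build a finer space whose points are pairs (world, learning experience e_j) with probability w_j q_j(world). Conditioning on e_j then returns q_j, and marginalising over experiences returns p. The numbers are hypothetical, and the sketch covers only finitely many worlds and posteriors.

```python
from fractions import Fraction as F

# Hypothetical prior over three worlds and two possible posteriors.
prior = [F(1, 2), F(3, 10), F(1, 5)]
posteriors = [
    [F(4, 5), F(1, 10), F(1, 10)],
    [F(1, 5), F(1, 2), F(3, 10)],
]
weights = [F(1, 2), F(1, 2)]  # candidate weights for the reflection principle

def satisfies_reflection(prior, posteriors, weights):
    """Check that the prior equals the weighted average of the posteriors."""
    return all(
        sum(w * q[i] for w, q in zip(weights, posteriors)) == prior[i]
        for i in range(len(prior))
    )

def superconditioning_space(posteriors, weights):
    """Joint probability on (world, experience) pairs: P(i, j) = w_j * q_j(i)."""
    return {
        (i, j): w * q[i]
        for j, (w, q) in enumerate(zip(weights, posteriors))
        for i in range(len(q))
    }

def condition_on_experience(joint, j, n_worlds):
    """Posterior over worlds after conditioning the joint on experience e_j."""
    p_e = sum(p for (i, k), p in joint.items() if k == j)
    return [joint[(i, j)] / p_e for i in range(n_worlds)]

assert satisfies_reflection(prior, posteriors, weights)
joint = superconditioning_space(posteriors, weights)
recovered = [condition_on_experience(joint, j, len(prior)) for j in range(len(posteriors))]
marginal = [sum(joint[(i, j)] for j in range(len(posteriors))) for i in range(len(prior))]
print(recovered == posteriors, marginal == prior)  # True True
```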

### Niki Pfeifer and Giuseppe Sanfilippo: Connexivity in Coherence-Based Probability Logic

We present two probabilistic approaches to check the validity of key principles of connexive logic within the setting of coherence. Specifically, we analyze connections between antecedents and consequents firstly, in terms of probabilistic constraints on conditional events (in the sense of defaults, or negated defaults) and secondly, in terms of constraints on compounds of conditionals and iterated conditionals, which was developed recently within the setting of conditional random quantities. We conclude by remarking that coherence-based probability logic offers a rich language to investigate the validity of various connexive principles.

### Mantas Radzvilas, William Peden and Francesco De Pretis: A Battle in the Statistics Wars

There are many inductive logics, with contrasting implications for statistical reasoning. Given a particular practical problem, one way to choose among these approaches is to compare their short-run decision-making performance. We simulated a decision problem involving bets on a series of binomial events. We coded three players, each based on a position within the “Statistics Wars”: a standard Bayesian, a frequentist, and an entropy-maximising Bayesian based on Jon Williamson’s inductive logic. We varied the simulation and player parameters in many ways. Surprisingly, all players had comparable (and good) performance, even in the very short run, where their decisions differed considerably. Our study indicates that all three approaches should be taken seriously. It is unprecedented, apart from a preliminary and relatively obscure 1990s conference paper. The decision problem and the players are easily modifiable, so the study is potentially very fruitful for the philosophy of statistics.

### Giuliano Rosella and Jan Michael Sprenger: Expanding Causal Modelling Semantics

In the present paper, we aim to expand the Causal Modelling Semantics of counterfactuals (CMS) introduced by Pearl (2000) and Galles and Pearl (1998) in order to account for the probability of counterfactuals with complex antecedents. Our work builds upon Briggs' Interventionist Counterfactuals (2012), which extends CMS to counterfactuals with disjunctive antecedents via Fine's Truthmaker Semantics. By combining Briggs' semantics with the work of Eva, Stern and Hartmann (2019) on the similarity among causal structures, and by appealing to the intuitions underlying Lewis' similarity semantics, we introduce an expanded version of CMS that allows us to calculate the probability of counterfactuals with disjunctive antecedents with respect to a causal model.

### Joe Roussos: Awareness Growth and Belief Revision

This abstract is available as a PDF file (310 kB).

### Bernhard Salow and Jeremy Goodman: Normality and Probability

We present a new theory of knowledge and belief, building on two recent strands of research. The first is the idea that knowledge and (rational) belief can be analyzed in terms of a notion of normality: among the possibilities that are compatible with a person's evidence, their knowledge rules out those that are sufficiently less normal than their actual circumstances, and their beliefs rule out those that are sufficiently less normal than some other evidential possibilities. The second is that what a person knows or believes is always relative to a question. Our key observation is that, relative to a question, there are natural ways of defining comparative normality in terms of evidential probability. We defend this reduction by showing that it generates attractive models of various theoretically significant examples, representing knowledge of the future and of lawful regularities, knowledge obtained through noisy instruments, and knowledge in cases that generate versions of the 'preface paradox'. We also use these examples to argue that, despite some important formal similarities to standard belief revision theories, our proposed account rightly predicts counterexamples to many of the principles featuring in these theories.

### Ute Schmid: Reconciling Knowledge-Based and Data-Driven AI for Human-in-the-Loop Machine Learning

Currently, artificial intelligence (AI) is often used synonymously with data-intensive deep learning. Some decades ago, knowledge-based approaches were dominant, and the nearly exclusive focus on methods dealing with explicit knowledge caused an AI winter. Researchers had to acknowledge a knowledge engineering bottleneck: a significant part of human knowledge is implicit, that is, not accessible to introspection and not verbalizable. Today, on the other hand, machine learning is sometimes applied even when explicit knowledge is available. However, a growing number of researchers and practitioners recognize that an exclusive focus on machine learning is neither plausible nor practical. In specialized domains, such as quality control in industrial manufacturing or medical diagnosis, providing correctly labelled data in the amount necessary to train deep networks is either very expensive or even impossible. To avoid a new AI winter due to the data engineering bottleneck, it is appropriate or even necessary to make use of human expertise to compensate for too small an amount or too low a quality of data. Taking into account knowledge that is available in explicit form reduces the amount of data needed for learning. Furthermore, even if domain experts cannot formulate knowledge explicitly, they typically can recognize and correct erroneous decisions or actions. This type of implicit knowledge can be injected into the learning process to guide model adaptation. In the talk, I will introduce inductive logic programming (ILP) as a powerful interpretable machine learning approach that makes it possible to combine logic and learning. ILP represents domain theories, background knowledge, training examples, and the learned model in the same format, namely Horn theories. I will argue that, although ILP-learned models are symbolic (white-box), it might nevertheless be necessary to explain system decisions.
Depending on who needs an explanation for what goal in which situation, different forms of explanation are necessary. I will show how ILP can be combined with different methods for explanation generation and propose a framework for human-in-the-loop learning. There, explanations are designed to be mutual: not only from the AI system to the human but also the other way around. The presented approach will be illustrated with application domains from medical diagnostics, file management, and accountability.

### Tom Sterkenburg and Peter Grünwald: The No-Free-Lunch Theorems of Supervised Learning

The no-free-lunch theorems promote the skeptical conclusion that all possible machine learning algorithms equally lack justification. But how could this leave room for a learning theory that shows that some algorithms are better than others? Drawing parallels to the philosophy of induction, we point out that the no-free-lunch results presuppose a conception of learning algorithms as purely data-driven. On this conception, every algorithm must have an inherent inductive bias that needs justification. We argue that many standard learning algorithms should rather be understood as model-dependent: in each application they also require a model as input, representing a bias. Being generic themselves, such algorithms can be given a model-relative justification.
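The classical no-free-lunch observation behind the skeptical conclusion can be checked by brute force on a tiny problem, set up here purely for illustration. Averaged uniformly over every possible Boolean target function on a four-point domain, any learner that sees the labels of two points achieves exactly 50% accuracy on the two unseen points. The learners below are hypothetical "purely data-driven" rules; the sketch does not model the model-relative alternative the talk defends.

```python
from fractions import Fraction
from itertools import product

DOMAIN = [0, 1, 2, 3]
TRAIN = [0, 1]   # points whose labels the learner sees
TEST = [2, 3]    # off-training-set points

def majority_learner(train_labels):
    """Predict the majority training label (ties go to 1) for every unseen point."""
    guess = 1 if 2 * sum(train_labels) >= len(train_labels) else 0
    return lambda x: guess

def anti_majority_learner(train_labels):
    """Predict the opposite of the majority training label."""
    majority = majority_learner(train_labels)
    return lambda x: 1 - majority(x)

def parity_learner(train_labels):
    """An arbitrary rule that ignores the data: predict the parity of the point."""
    return lambda x: x % 2

def average_ots_accuracy(learner):
    """Mean off-training-set accuracy over all 2**4 target functions."""
    total = Fraction(0)
    targets = list(product([0, 1], repeat=len(DOMAIN)))
    for target in targets:
        h = learner([target[x] for x in TRAIN])
        total += Fraction(sum(h(x) == target[x] for x in TEST), len(TEST))
    return total / len(targets)

for learner in (majority_learner, anti_majority_learner, parity_learner):
    print(learner.__name__, average_ots_accuracy(learner))  # each prints 1/2
```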

### Paul Thorn and Gerhard Schurz: The Dependence of Inference Reliability on Environment Entropy

In this talk, we present two case studies of the connection between environment entropy and the reliability of non-deductive inference. Both studies are based on computer simulations. The first study concerns online learning and the comparative performance of inductive methods, non-inductive methods, and meta-inductive methods. Put succinctly: inductive methods perform best in low-entropy environments, non-inductive methods perform best in high-entropy environments, and meta-inductive methods perform well in both low- and high-entropy environments. The second study concerns inheritance inference (cases where one reasons from the typicality of a property among a class, e.g., the capacity of flight among birds, to the typicality of the property among sub-classes of the class, e.g., the capacity of flight among seabirds). Two approaches to inheritance inference are compared. One approach proceeds by treating any atomic property as determining an admissible class. The second approach identifies classes with the cells of a partition (of size k) of the set of objects that maximizes the similarity of objects assigned to the same class. The performance of the two approaches varies according to environment entropy. Both approaches perform well when the entropy of the environment is very low. However, while the error rate of the second approach remains low regardless of environment entropy, the error rate of the first approach increases as a more or less linear function of environment entropy, from about 4% when entropy is close to zero to about 29% when entropy is 0.8. The overarching theme of the talk is that attending to variations in the reliability of inference methods across environments with different entropy is an excellent means of evaluating the reasonableness of the methods.
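The online-learning comparison can be pictured with a simplified sketch of attractivity-weighted meta-induction, an approach associated with Schurz: each round, the meta-inductivist weights every method by how far that method's cumulative score exceeds its own, and predicts the weighted average. The two object-level methods, the scoring rule 1 - |prediction - event|, the tie-breaking rule, and the event sequence are all simplifying assumptions, not the setup used in the talk.

```python
from typing import Callable, List, Sequence

Method = Callable[[Sequence[int]], float]

def inductive(history: Sequence[int]) -> float:
    """Inductive method: predict the relative frequency of 1s observed so far."""
    return sum(history) / len(history) if history else 0.5

def non_inductive(history: Sequence[int]) -> float:
    """Non-inductive method: ignore the data and always predict 0.5."""
    return 0.5

def score(prediction: float, event: int) -> float:
    """Assumed scoring rule: 1 minus the absolute prediction error."""
    return 1.0 - abs(prediction - event)

def run(events: Sequence[int], methods: List[Method]):
    """Return the cumulative scores of each method and of the meta-inductivist."""
    totals = [0.0] * len(methods)
    meta_total = 0.0
    for n, event in enumerate(events):
        history = events[:n]
        preds = [m(history) for m in methods]
        # Attractivity: how far each method is ahead of the meta-inductivist so far.
        attract = [max(0.0, t - meta_total) for t in totals]
        if sum(attract) > 0:
            meta_pred = sum(a * p for a, p in zip(attract, preds)) / sum(attract)
        else:
            meta_pred = sum(preds) / len(preds)  # assumed tie-break: plain average
        meta_total += score(meta_pred, event)
        totals = [t + score(p, event) for t, p in zip(totals, preds)]
    return totals, meta_total

low_entropy = [1] * 100  # a perfectly regular (zero-entropy) environment
(ind, non), meta = run(low_entropy, [inductive, non_inductive])
print(ind, non, meta)  # 99.5 50.0 99.25
```

In this regular environment the meta-inductivist trails the inductive method by only 0.25 points in total, while the non-inductive method falls far behind.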

### Jonathan Vandenburgh: Learning as Hypothesis Testing

Complex constraints like conditionals ('If A, then B') and probabilistic constraints ('The probability that A is p') pose problems for Bayesian theories of learning. Since these propositions do not express constraints on outcomes, agents cannot simply conditionalize on the new information. Furthermore, a natural extension of conditionalization, relative information minimization, leads to many counterintuitive predictions, as evidenced by the sundowners problem and the Judy Benjamin problem. Building on the notion of a 'paradigm shift' and on empirical research in psychology and economics, I argue that the model of hypothesis testing can explain how people learn complex, theory-laden propositions like conditionals and probability constraints. Theories are formalized as probability distributions over a set of possible outcomes, and theory change is triggered by a constraint that is incompatible with the initial theory. This leads agents to consult a higher-order probability function, or a 'prior over priors', to choose the most likely alternative theory that satisfies the constraint. I apply the hypothesis testing model to two examples: learning a simple probabilistic constraint involving coin bias, and the sundowners problem for conditional learning.
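For the simplest probabilistic constraint, "the probability that A is p", relative information minimization has a known closed form: minimizing the Kullback-Leibler divergence of the posterior Q from the prior P subject to Q(A) = p yields Jeffrey conditionalization, Q(w) = p·P(w | A) for worlds in A and (1 - p)·P(w | not A) otherwise. The sketch below, with a hypothetical uniform prior over four coin-bias hypotheses, computes this update and compares its divergence with another distribution satisfying the same constraint. It illustrates the baseline the talk criticizes, not the hypothesis-testing model it proposes.

```python
import math
from typing import Dict, Set

def jeffrey_update(prior: Dict[str, float], a: Set[str], p: float) -> Dict[str, float]:
    """Minimum-KL update of `prior` subject to Q(A) = p (assumes 0 < P(A) < 1)."""
    p_a = sum(prior[w] for w in a)
    return {
        w: (p * prior[w] / p_a) if w in a else ((1 - p) * prior[w] / (1 - p_a))
        for w in prior
    }

def kl(q: Dict[str, float], prior: Dict[str, float]) -> float:
    """Kullback-Leibler divergence of q from prior (0 * log 0 taken as 0)."""
    return sum(q[w] * math.log(q[w] / prior[w]) for w in q if q[w] > 0)

# Hypothetical prior: four hypotheses about a coin's bias, equally likely.
prior = {"0.2": 0.25, "0.4": 0.25, "0.6": 0.25, "0.8": 0.25}
biased_to_heads = {"0.6", "0.8"}   # A: "the bias exceeds 0.5"
posterior = jeffrey_update(prior, biased_to_heads, 0.9)

# Another distribution that also satisfies Q(A) = 0.9, for comparison.
alternative = {"0.2": 0.1, "0.4": 0.0, "0.6": 0.45, "0.8": 0.45}
print(posterior)
print(kl(posterior, prior) < kl(alternative, prior))  # True
```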

### Olav Vassend: A Novel Accuracy-Based Argument for the Principle of Indifference and a Pragmatist Solution to the Problem of Language Dependence

The principle of indifference states that agents who have no evidence relevant to determining which member of a partition is true should assign an equal probability to each proposition in the partition. This paper gives a novel argument, based on simple and plausible assumptions, which shows that the principle of indifference minimizes expected inaccuracy in cases where agents have no information, where the expectation is calculated with respect to an objective probability distribution rather than any agent’s subjective distribution. The principle of indifference has been criticized for being partition variant, but I argue that the fact that the principle of indifference is partition variant is explained and justified by the fact that epistemic inaccuracy is also partition variant. I furthermore argue that the correct way of partitioning a set of possibilities depends on pragmatic considerations, and that this resolution of the problem turns the language dependence exhibited by the principle of indifference into a feature rather than a flaw. An important consequence of the proposal is that it introduces a pervasive sort of pragmatic encroachment into Bayesian inference.

### Alena Vencovská: Binary Pure Inductive Logic

This contribution concerns pure inductive logic in the case of languages with binary and possibly unary predicates. We consider a principle BEx (Binary Signature Exchangeability) and we survey properties that probability functions satisfying it have been shown to satisfy using a particular representation theorem. We describe further interesting results that can be obtained for these probability functions from the representation theorem using methods that have worked in the unary context and are adaptable to the binary case.

### Jon Williamson, Jürgen Landes and Soroush Rafiee Rad: Determining Maximal Entropy Functions for Objective Bayesian Inductive Logic

According to the objective Bayesian approach to inductive logic, premisses inductively entail a conclusion just when every probability function with maximal entropy, from all those that satisfy the premisses, satisfies the conclusion. However, when premisses and conclusion are constraints on probabilities of sentences of a first-order predicate language, it is by no means obvious how to determine these maximal entropy functions. This paper makes progress on the problem in the following ways. Firstly, we introduce the concept of an entropy limit point and show that, if the set of probability functions satisfying the premisses contains an entropy limit point, then this limit point is unique and is the maximal entropy probability function. Next we turn to the special case in which the premisses are simply sentences of the logical language. We show that if the uniform probability function gives the premisses positive probability, then the maximal entropy function can be found by simply conditionalising this uniform prior on the premisses. We generalise our results to Jeffrey conditionalisation and finally discuss the case in which the uniform prior gives the premisses zero probability.
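The result for sentence premisses has a simple finite analogue, assumed here for illustration (the paper itself addresses first-order languages). For a propositional language with finitely many variables, the uniform distribution over truth assignments has maximal entropy, and conditionalising it on premisses that it gives positive probability yields the uniform distribution over the premisses' models. The sketch computes the resulting probability of a conclusion; the helper names and the example sentences are hypothetical.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, Dict, List, Tuple

Assignment = Dict[str, bool]
Sentence = Callable[[Assignment], bool]

def maxent_given(variables: List[str], premisses: List[Sentence]) -> Dict[Tuple[bool, ...], Fraction]:
    """Conditionalise the uniform distribution over truth assignments on the premisses."""
    worlds = [dict(zip(variables, values))
              for values in product([True, False], repeat=len(variables))]
    models = [w for w in worlds if all(s(w) for s in premisses)]
    if not models:
        raise ValueError("premisses have zero probability under the uniform prior")
    return {tuple(w[v] for v in variables): Fraction(1, len(models)) for w in models}

def probability(variables: List[str], dist, sentence: Sentence) -> Fraction:
    """Probability of a sentence under a distribution over truth assignments."""
    return sum(
        (p for values, p in dist.items() if sentence(dict(zip(variables, values)))),
        Fraction(0),
    )

VARS = ["a", "b"]
premisses = [lambda w: w["a"] or w["b"]]          # premiss: a or b
dist = maxent_given(VARS, premisses)
print(probability(VARS, dist, lambda w: w["a"]))  # 2/3
```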

### Francesca Zaffora Blando: Merging, Polarisation and Algorithmic Randomness

We study the phenomenon of merging of opinions and the phenomenon of polarisation of opinions for computationally limited Bayesian agents from the perspective of algorithmic randomness. When they agree on which data streams are algorithmically random, two agents beginning the learning process with different subjective priors may be seen as having compatible beliefs about the global uniformity of nature. This is because the algorithmically random data streams are of necessity globally regular: they are precisely the sequences that satisfy certain important statistical laws. By virtue of agreeing on what data streams are random, two Bayesian agents can thus be taken to concur on what global regularities they expect to see in the data. We show that this type of compatibility between priors suffices to ensure that two computationally limited Bayesian agents will reach inter-subjective agreement with increasing information. In other words, it guarantees that their respective probability assignments will almost surely become arbitrarily close as the number of observations increases. Thus, when shared by computable learners with different subjective priors, the beliefs about uniformity captured by algorithmic randomness provably lead to merging of opinions. Conversely, we will see that various types of disagreement over algorithmic randomness lead to polarisation of opinions for computable Bayesian agents.

## Organisation

Jürgen Landes (LMU Munich)

## Progic Steering Committee

- Niki Pfeifer: Theoretical Philosophy, University of Regensburg
- Jan-Willem Romeijn: Department of Philosophy, University of Groningen
- Marta Sznajder: Faculty of Philosophy, University of Groningen
- Gregory Wheeler: Centre for Human and Machine Intelligence, Frankfurt School of Finance and Management
- Jon Williamson: Department of Philosophy & Centre for Reasoning, University of Kent

## Acknowledgements

Progic 2021 is also part of a series of events of a network project devoted to the Foundations, Applications & Theory of Inductive Logic (FAT IL). These events are generously funded by the German Research Foundation (DFG).

## Downloads

- Pfeifer slides (482 kB)
- Schmid slides (8 MB)
- Vandenburgh slides (163 kB)
- Vencovská slides (180 kB)