Topics: Large Language Models, Reinforcement Learning, Variational Inference
I am a Staff Research Engineer at Google DeepMind in London. My research interests span reinforcement learning, data-efficient learning, multimodal modeling, and training large-scale models. I’m also interested in building tools that accelerate the pace of research in machine learning and AI.
Among other contributions, I pioneered distributed reinforcement learning at DeepMind and in the wider academic community. Our papers Distributed Prioritized Experience Replay and Distributed Distributional Deterministic Policy Gradients (D4PG) helped demonstrate the effectiveness of distributed reinforcement learning. We also developed and open-sourced Acme, Reverb, and Launchpad to make distributed RL easier.
Recently I have been working on extending transformers to multiple modalities. One example of this is Gato, a multi-modal, multi-task, multi-embodiment generalist policy. As part of Google DeepMind’s Gemini team I am working on the next generation of large-scale multimodal transformer models.
I hold a BA in mathematical economics and a ScM in computer science from Brown University.
Large-scale pre-trained transformers (aka foundation models) have revolutionized natural language processing and computer vision. These models have been shown to be effective at a wide range of downstream tasks, including text classification, machine translation, image captioning, and question answering. However, most research on transformers has focused on single-modality data.
In this talk, I will discuss the current state of multimodal foundation models, which combine multiple modalities, such as text and images, into a single input. I will then introduce some of the challenges we face when combining multiple modalities for transformers. I will also discuss several recent research directions that have shown promising results. I will conclude by highlighting some of the open research questions in this area.
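The core idea of combining modalities into a single input can be sketched as follows: text tokens and image patches are each mapped into a shared embedding space and concatenated into one sequence for the transformer to attend over. All dimensions, the random projections, and the patch layout below are illustrative assumptions for this sketch, not the scheme of any particular model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
d_model = 64      # shared embedding width
vocab_size = 1000
patch = 8         # side length of a square image patch

# Text: token ids looked up in an embedding table (random here,
# learned in a real model).
text_ids = np.array([5, 42, 7])
text_emb_table = rng.normal(size=(vocab_size, d_model))
text_tokens = text_emb_table[text_ids]               # shape (3, d_model)

# Image: split a 32x32 single-channel image into 8x8 patches,
# flatten each patch, and linearly project it to d_model.
image = rng.normal(size=(32, 32))
patches = (image.reshape(4, patch, 4, patch)
                .transpose(0, 2, 1, 3)
                .reshape(16, patch * patch))         # 16 patches of 64 pixels
proj = rng.normal(size=(patch * patch, d_model))
image_tokens = patches @ proj                        # shape (16, d_model)

# One interleaved sequence the transformer attends over jointly.
sequence = np.concatenate([text_tokens, image_tokens], axis=0)
print(sequence.shape)  # (19, 64)
```

Once both modalities live in the same embedding space, the transformer itself needs no modality-specific changes; the design questions shift to tokenization, positional encoding, and how to balance modalities during training.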
Topics: Foundation Models, AI, Reasoning
Tony Cohn is Professor of Automated Reasoning at the University of Leeds. He holds BSc and PhD degrees from the University of Essex, where he studied under Pat Hayes. He spent 10 years at the University of Warwick before moving to Leeds in 1990, where he founded a research group working on knowledge representation and reasoning with a particular focus on qualitative spatial/spatio-temporal reasoning; the best-known result of this work is the widely cited region connection calculus (RCC), and the KR-92 paper describing RCC won the 2020 KR Test-of-Time award. He was awarded the 2021 Herbert A. Simon Prize for Advances in Cognitive Systems for his research on qualitative representation and reasoning about space and time, cognitive vision and robotics, and visually-grounded language processing.
He is Editor-in-Chief of Spatial Cognition and Computation and has been Chairman/President of the UK AI society SSAISB, the European Association for Artificial Intelligence (EurAI), KR Inc, and the IJCAI Board of Trustees, and was Editor-in-Chief of Artificial Intelligence (2007–2014) and of the AAAI Press (2004–2014). He remains a Director of KR Inc.
He is the recipient of the 2015 IJCAI Donald E Walker Distinguished Service Award which honours senior scientists in AI for contributions and service to the field during their careers, as well as the 2012 AAAI Distinguished Service Award for “extraordinary and sustained service to the artificial intelligence community”. He is a Fellow of the Royal Academy of Engineering, and is also a Fellow of AAAI, AISB, EurAI (Founding Fellow), AAIA, the BCS, and the IET. He was a member of the UK Research Excellence Framework (REF) 2014 Sub Panel 11 (Computer Science and Informatics) of Panel B.
Foundation Models such as GPT/ChatGPT have become very popular recently and many claims have been made about their abilities, including for commonsense reasoning. In this talk I will present some investigations to determine the extent to which this is true, for the particular case of spatial reasoning. I will also talk about benchmarks for commonsense reasoning, a schema for categorising evaluation instruments such as benchmarks and competitions, and argue for a shift away from aggregation of metrics and towards reporting granular performance breakdowns.
Topics: Foundation Models, Large Language Models, Deep Learning
Sven Giesselbach leads the Natural Language Understanding (NLU) team at the Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS). His team develops solutions in the areas of medical, legal, and general document understanding which at their core build upon (large) pre-trained language models. He is also part of the Lamarr Institute and the OpenGPT-X project, in which he investigates various aspects of Foundation Models. Drawing on experience from more than 25 natural language understanding projects, he studies the effect of Foundation Models on the execution of NLU projects and the novel challenges and requirements that arise with them. He has published several papers on Natural Language Processing and Understanding, focusing on the creation of application-ready NLU systems and the integration of expert knowledge at various stages of the solution design. Most recently, he co-authored the book Foundation Models for Natural Language Processing – Pre-trained Language Models Integrating Media, to be published by Springer Nature.
Gerhard Paaß, Sven Giesselbach: Foundation Models for Natural Language Processing – Pre-trained Language Models Integrating Media. Springer, May 2023.