About me
I am a Principal Researcher and Group Leader at the Samsung AI Centre in Cambridge, U.K., working on efficient Vision and Language. The mission at SAIC-Cambridge is to conduct research that enables the commercialization of new features across Samsung's ever-growing portfolio of AI products. As part of this mission, we routinely conduct novel research and publish it at top venues, while also working with other teams within Samsung to bring the technology into products.
In recent years, my group has worked on contrastively trained V&L models, LMMs, and image generation. We have significant expertise in both large-scale training and on-device porting, which allows us to exploit synergies between the two, from optimization-based compression and quantization techniques to architectural changes that improve the efficiency of on-device inference.
Before joining Samsung, I worked for about three years at Amazon in Seattle, where I enjoyed being part of the Amazon Go and AWS Rekognition teams.
I am interested in a wide variety of topics in machine learning and computer vision; in fact, it is most often the process, the team, and the prospect of impact that I find appealing, rather than the topic itself. While most of my work during my years in academia was on face analysis, I have worked on topics as diverse as human action recognition, binary neural networks, knowledge distillation, and lipreading.
This is my Google Scholar profile.
News
- 1 paper at NeurIPS
  A Bayesian Approach to Data Point Selection
  https://arxiv.org/abs/2411.03768
- 2 papers at EMNLP
  EMNLP Findings:
  MobileQuant: Mobile-friendly Quantization for On-device Language Models
  https://arxiv.org/abs/2408.13933
  https://github.com/saic-fi/MobileQuant
  EMNLP main track:
  Efficient Vision-Language pre-training via domain-specific learning for human activities
- 2 papers at ECCV'24 and 1 at IJCV
  Knowledge Distillation Meets Open-Set Semi-Supervised Learning @ IJCV
  https://arxiv.org/abs/2205.06701
  You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation @ ECCV'24
  https://arxiv.org/abs/2401.17258
  CLIP-DPO: Vision-Language Models as a Source of Preference for Improved Vision-LLMs @ ECCV'24
  https://arxiv.org/abs/2408.10433
- New arXiv preprint
  You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation
  Super-resolution with Stable Diffusion: we achieve state-of-the-art quality while needing only one step at inference time!
  https://arxiv.org/pdf/2401.17258.pdf
- 1 paper at EACL'24 (Oral)
  Graph Guided Question Answer Generation for Procedural Question-Answering
  Paper: https://arxiv.org/abs/2401.13594
  We train compact AI assistants for procedural tasks (e.g. cooking a meal) that can match or even beat ChatGPT, yet run easily on your phone. The key is to use graph representations of the procedures to automatically create exhaustive, high-quality QA pairs in a controllable manner, so that a specialized in-domain model can be trained; a toy sketch of the idea follows below.
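  To make the graph-guided idea concrete, here is a minimal sketch in Python of how templated QA pairs could be generated by walking a procedure graph. The Step structure and the question templates are illustrative assumptions for this sketch, not the paper's actual pipeline.

      # Hypothetical sketch: templated QA generation over a procedure graph.
      # Step fields and question templates are made up for illustration.
      from dataclasses import dataclass, field

      @dataclass
      class Step:
          name: str                                          # e.g. "whisk the eggs"
          requires: list[str] = field(default_factory=list)  # tools/ingredients
          next: list["Step"] = field(default_factory=list)   # successor steps

      def generate_qa(step: Step) -> list[tuple[str, str]]:
          """Emit (question, answer) pairs for one node of the graph."""
          qa = [(f"What do I need to {step.name}?", item) for item in step.requires]
          qa += [(f"What should I do after I {step.name}?", nxt.name) for nxt in step.next]
          return qa

      # Toy recipe fragment: whisk the eggs -> fry the omelette.
      fry = Step("fry the omelette", requires=["a pan", "butter"])
      whisk = Step("whisk the eggs", requires=["eggs", "a bowl"], next=[fry])
      for question, answer in generate_qa(whisk):
          print(question, "->", answer)

  Because every question is instantiated from a template over the graph's structure, the generated set can be made exhaustive and its coverage controlled, which is what allows a small in-domain model to be trained on it.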
- 4 papers at ICCV'23
  ReGen: A good Generative zero-shot video classifier should be Rewarded
  Paper: OpenAccess link
  Black Box Few-Shot Adaptation for Vision-Language Models
  Paper: https://arxiv.org/abs/2304.01752 Code: https://github.com/saic-fi/LFA
  FSD-Prompt: Few-Shot Detection Prompting without retraining
  Paper: https://arxiv.org/abs/2210.04845
  Bayesian Prompt Learning for Image-Language Model Generalization
  Paper: https://arxiv.org/abs/2210.02390 Code: https://github.com/saic-fi/Bayesian-Prompt-Learning
- 1 paper at ICLR'23
  Efficient Self-supervised Pre-training on Low-compute Networks without Distillation
  Paper: https://arxiv.org/abs/2210.02808 Code: https://github.com/saic-fi/SSLight
3 papers at ECCV'22
EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers
Efficient transformers for Mobile devices
Paper: https://arxiv.org/abs/2205.03436 Code: https://github.com/saic-fi/edgevitLearning hand-held object appearance for compositional action recognition:
SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action RecognitionI was also lucky to be (a small) part of the work led by SAIC-Toronto on instructional videos, accepted as an oral:
Flow graph to Video Grounding for Multi-Step Localization -
- 2 papers at BMVC'21
  Preprints of the two BMVC'21 papers are available on arXiv:
  Few-shot Action Recognition with Prototype-centered Attentive Learning
  Knowing What, Where and When to Look: Efficient Video Action Modeling with Attention
- 2 papers at NeurIPS'21
  Check out the preprint versions of our NeurIPS'21 papers:
  Space-time Mixing Attention for Video Transformer
  Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization
- 1 paper at ICCV'21
  One ICCV'21 paper on temporal action localization: Boundary-sensitive Pre-training for Temporal Localization in Videos
- 1 paper at ICASSP'21
  Our work on lipreading has been accepted for publication at ICASSP'21. See the arXiv version.
- Two new ICLR'21 papers
  One ICLR'21 paper on knowledge distillation: Knowledge Distillation via Softmax Regression Representation Learning
  Code is publicly available here.
  One ICLR'21 paper on binary neural networks: High-Capacity Expert Binary Networks
- AC @ ICCV'21
  I've been selected to act as an Area Chair for the upcoming ICCV'21.
- Organizing CVPR'21 Workshop
  I'm co-organizing the 1st Workshop on Binary Networks, to be held in conjunction with CVPR'21.
- ECCV'20 paper accepted
  BATS: Binary ArchitecTure Search has been accepted at ECCV'20.
- ICLR'20 paper accepted
  Our paper on binary CNNs has been accepted at ICLR'20. It sets a new state of the art for binary networks: 65.4% top-1 accuracy on ImageNet with a binary ResNet-18, an improvement of over 5%!
- ICASSP'20 paper accepted
  Our paper on lipreading has been accepted at ICASSP'20 for an oral presentation. It raises the state of the art on LRW and LRW1000 by 1.2% and 3.2% top-1 accuracy, respectively. You can check out the arXiv version here.
- ICCV'19 paper accepted
  Our paper on action recognition has been accepted at ICCV'19. We achieve 78.8% on Kinetics-400 and 53.4% on Something-Something V1, without even using two-stream or non-local networks. The paper can be accessed here.
- Moving to Samsung
  From April 2019 I'll be part of the Samsung AI Research Center in Cambridge, UK, in a new role as Senior Researcher.
- TPAMI paper accepted!
  You can check it on the IEEE Xplore page. Alternatively, there is an arXiv version.
- Paper at ECCV'16
  You can check it on arXiv here: https://arxiv.org/abs/1608.01137
- Moving to Amazon
  From June 2016 I'll be part of Amazon in a new role as Research Scientist. I'll thus be leaving my position at the University of Nottingham.
- Co-organizing ChaLearn
  I'm a co-organizer of the ChaLearn LAP and FotW challenge and workshop @ CVPR 2016.
  Challenge page: http://gesture.chalearn.org/
- Organizing BMVA Technical Meeting
  The Computational Face: Automatic Face Analysis and Synthesis
  One-day BMVA symposium in London, UK, on 14th October 2015.
  Chairs: Brais Martinez, Yorgos Tzimiropoulos and Michel Valstar
  Keynote speakers: Tim Cootes (University of Manchester), Darren Cosker (University of Bath), Maja Pantic (Imperial College London), Richard Bowden (University of Surrey)
  Webpage and registration: http://www.bmva.org/meetings