I work as a Principal Researcher at the Samsung AI Center in Cambridge, UK, where I lead a group focusing on video analysis, with special attention to human action analysis. This is scoped within the theme of the center, human-centric AI. Our task at SAIC-Cambridge is to solve the challenges stemming from Samsung's product family through technically novel approaches, and to produce technical advancements that can act as enablers for the development of future products. We publish our work at the top venues in computer vision and machine learning on a regular basis.
I am interested in a wide variety of topics in machine learning and computer vision - in fact most often it is the process, the team or the prospect of the impact that I find most appealing rather than the topic itself. While most of my work during my years in academia has been on face analysis, in recent years I have worked on topics as diverse as human action recognition, binary CNNs, knowledge distillation and lipreading.
This is my Google Scholar profile.
New arXiv preprint released
You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation
Super-resolution with Stable Diffusion: We achieve SOTA in terms of quality, and need only one step at inference time to do so!
1 paper at EACL'24 (Oral)
Graph Guided Question Answer Generation for Procedural Question-Answering
We train compact AI assistants for procedural tasks (e.g. cooking a meal) that can compete with or even beat ChatGPT, yet easily run on your phone. The key is to use graph representations of the procedures to automatically create exhaustive and high-quality QA pairs in a controllable manner, so that a specialized in-domain model can be trained.
4 papers at ICCV'23
ReGen: A good Generative zero-shot video classifier should be Rewarded
Paper: Openaccess link
FSD-Prompt: Few-Shot Detection Prompting without retraining
Bayesian Prompt Learning for Image-Language Model Generalization
Paper: https://arxiv.org/abs/2210.02390 Code: https://github.com/saic-fi/Bayesian-Prompt-Learning
3 papers at ECCV'22
EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers
Efficient transformers for Mobile devices
Paper: https://arxiv.org/abs/2205.03436 Code: https://github.com/saic-fi/edgevit
Learning hand-held object appearance for compositional action recognition:
SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition
I was also lucky to be (a small) part of the work led by SAIC-Toronto on instructional videos, accepted as an oral:
Flow graph to Video Grounding for Multi-Step Localization
2 papers at BMVC'21
Preprints of the two BMVC'21 papers are available on arXiv:
Few-shot Action Recognition with Prototype-centered Attentive Learning
Knowing What, Where and When to Look: Efficient Video Action Modeling with Attention
2 papers at NeurIPS'21
Check out the pre-print versions of our NeurIPS'21 papers:
Space-time Mixing Attention for Video Transformer
Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization
1 new paper at ICCV'21
One ICCV'21 paper on temporal action localization: Boundary-sensitive Pre-training for Temporal Localization in Videos
1 paper at ICASSP'21
Our work on lipreading has been accepted for publication at ICASSP'21. See the arXiv version.
Two new ICLR'21 papers
One ICLR'21 paper on knowledge distillation: Knowledge Distillation via Softmax Regression Representation Learning
Code is publicly available here
One ICLR'21 paper on binary neural networks: High-Capacity Expert Binary Networks
I've been selected to act as Area Chair for the upcoming ICCV'21.
Organizing CVPR'21 Workshop
I'm co-organizing the 1st workshop on Binary Networks, to be held in conjunction with CVPR'21.
ECCV'20 paper accepted
BATS: Binary ArchitecTure Search has been accepted for ECCV'20.
Our paper on Binary CNNs has been accepted at ICLR'20. It sets a new state of the art for binary networks: 65.4% top-1 accuracy on ImageNet using a binary ResNet18 (an improvement of over 5%!).
Our paper on lipreading has been accepted at ICASSP'20 for an oral presentation. It raises the state of the art on LRW and LRW1000 by 1.2% and 3.2% top-1 accuracy, respectively. You can check out the arXiv version here.
Our paper on action recognition has been accepted at ICCV'19. We achieve 78.8 on Kinetics400 and 53.4 on Something-SomethingV1, without even using two-stream or non-local networks. The paper can be accessed here.
Moving to Samsung
From April 2019 I'll be part of the Samsung AI Research Center in Cambridge, UK, in a new role as Senior Researcher.
Paper at ECCV'16
You can check it out on arXiv here: https://arxiv.org/abs/1608.01137
From June 2016 I'll be part of Amazon in a new role as Research Scientist. I'll thus be leaving my position at the University of Nottingham.
I'm co-organizer of the Chalearn LAP and FotW challenge and workshop @ CVPR 2016
The challenge page: http://gesture.chalearn.org/
Organising BMVA Technical Meeting
The Computational Face - Automatic Face Analysis and Synthesis
One-day BMVA symposium in London, UK, on 14th October 2015
Chairs: Brais Martinez, Yorgos Tzimiropoulos and Michel Valstar
Keynote speakers: Tim Cootes (University of Manchester), Darren Cosker (University of Bath), Maja Pantic (Imperial College London), Richard Bowden (University of Surrey)
Webpage and Registration: http://www.bmva.org/meetings