Manish Gawali

I'm an Applied Scientist at Amazon. I have research experience in Computer Vision, NLP, Generative AI, Agentic AI, and Distributed Deep Learning.

Previously, I was a Data Science Tech Lead at DeepTek Inc.

I completed my Bachelor's in Computer Science at Pune Institute of Computer Technology (University of Pune) in 2018 and my Master's in Computer Science at the University of Southern California in 2023.

Email / CV / GitHub / LinkedIn / Google Scholar

Updates

[April 2025] Reviewer at Machine Learning For Healthcare (MLHC) 2025
[August 2023] Reviewer at ML4H: Machine Learning for Health
[August 2022] Reviewer at ML4H: Machine Learning for Health
[August 2021] Reviewer at ML4H: Machine Learning for Health
[June 2021] Reviewer at Secure and Privacy-Preserving Machine Learning for Medical Imaging: MICCAI 2021 Workshop and Tutorial
[May 2021] Reviewer at 2021 International Conference on Artificial Intelligence and its Applications

Research

My current research interests are Agentic AI and the evaluation of agents. My publications are listed below:

	SetLexSem Challenge: Using set operations to evaluate the lexical and semantic robustness of language models Bardiya Akhbari, Manish Gawali, Nicholas Dronen NeurIPS 2024 Conference Page / Publication / Code / Blog/Media Coverage Set theory is foundational to mathematics and, when sets are finite, to reasoning about the world. An intelligent system should perform set operations consistently, regardless of superficial variations in the operands. Initially designed for semantically-oriented NLP tasks, large language models (LLMs) are now being evaluated on algorithmic tasks. Because sets are comprised of arbitrary symbols (e.g. numbers, words), they provide an opportunity to test, systematically, the invariance of LLMs’ algorithmic abilities under simple lexical or semantic variations. To this end, we present the SETLEXSEM CHALLENGE, a synthetic benchmark that evaluates the performance of LLMs on set operations. SETLEXSEM assesses the robustness of LLMs’ instruction-following abilities under various conditions, focusing on the set operations and the nature and construction of the set members. Evaluating seven LLMs with SETLEXSEM, we find that they exhibit poor robustness to variation in both operation and operands. We show — via the framework’s systematic sampling of set members along lexical and semantic dimensions — that LLMs are not only not robust to variation along these dimensions but demonstrate unique failure modes in particular, easy-to-create semantic groupings of "deceptive" sets. We find that rigorously measuring language model robustness to variation in frequency and length is challenging and present an analysis that measures them independently.
	Comparison of Privacy-Preserving Distributed Deep Learning Methods in Healthcare Manish Gawali, Arvind C S, Harshit Madaan, Shriya Suryavanshi, Ashrika Gaikwad, Bhanu Prakash KN, Viraj Kulkarni, Aniruddha Pant MIUA 2021 Conference Page / Publication / Video We proposed SplitFedv3 and alternate mini-batch training, and compared SOTA privacy-preserving distributed learning methods on four key metrics: AI model performance, elapsed training time, data communication between entities, and computation.
	Key Technology Considerations in Developing and Deploying Machine Learning Models in Clinical Radiology Practice Viraj Kulkarni, Manish Gawali, Amit Kharat JMIR Medical Informatics Journal Page / Publication The development, deployment, and eventual adoption of AI models in clinical practice are fraught with challenges. In this paper, we propose a list of key considerations that machine learning researchers must recognize and address to make their models accurate, robust, and usable in practice.
	Deep Learning Models for Calculation of Cardiothoracic Ratio from Chest Radiographs for Assisted Diagnosis of Cardiomegaly Tanveer Gupte, Mrunmai Niljikar, Manish Gawali, Viraj Kulkarni, Amit Kharat, Aniruddha Pant icABCD 2021 Conference Page / Publication Proposed an automated method based on deep learning to compute the cardiothoracic ratio and detect the presence of cardiomegaly from chest radiographs.
	Vulnerability Due to Training Order in Split Learning Harshit Madaan, Manish Gawali, Viraj Kulkarni, Aniruddha Pant ICT4SD 2021 Conference Page / Publication We demonstrate a flaw in sequential training in split learning that can lead to catastrophic forgetting. The SplitFedv3 algorithm mitigates this problem while still leveraging the privacy benefits provided by split learning.
	A deep learning approach for automated diagnosis of pulmonary embolism on computed tomographic pulmonary angiography Pranav Ajmera, Amit Kharat, Jitesh Seth, Snehal Rathi, Richa Pant, Manish Gawali, Viraj Kulkarni, Ragamayi Maramraju, Isha Kedia, Rajesh Botchu, Sanjay Khaladkar BMC Medical Imaging Journal Page / Publication The development of an AI model and its use for the identification of pulmonary embolism will support healthcare workers by reducing the rate of missed findings and minimizing the time required to screen the scans.
	Application of Federated Learning in building a robust COVID-19 Chest X-ray classification Model Amartya Bhattacharya, Manish Gawali, Jitesh Seth, Viraj Kulkarni arXiv arXiv Pre-Print In this paper, we applied a Federated Learning-based framework to present a robust solution for classifying COVID and non-COVID chest X-ray images. We trained five different models to compare the results: three were each built on a single client's data, one was built using Federated Learning, and one was built by combining all the data.
	Automated assessment of chest CT severity scores in patients suspected of COVID-19 infection Pranav Ajmera, Snehal Rathi, Udayan Dosi, Suvarna Lakshmi Kalli, Avinav Luthra, Sanjay Khaladkar, Richa Pant, Jitesh Seth, Pranshu Mishra, Manish Gawali, Yash Pargaonkar, Viraj Kulkarni, Amit Kharat medRxiv medRxiv Pre-Print A deep learning model capable of identifying consolidations and ground-glass opacities from the chest CT images of COVID-19 patients was used to provide CT severity scores on a 25-point scale for definitive pathogen diagnosis. The model was tested on a dataset of 469 confirmed COVID-19 cases from a tertiary care hospital. The quantitative diagnostic performance of the model was compared with three experienced human readers.