Publications
Publications by category in reverse chronological order, generated by jekyll-scholar.
2025
- Towards Safer Social Media Platforms: Scalable and Performant Few-Shot Harmful Content Moderation Using Large Language Models. Akash Bonagiri, Lucen Li, Rajvardhan Oak, Zeerak Babar, Magdalena Wojcieszak, and Anshuman Chhabra. arXiv preprint arXiv:2501.13976, 2025.
The prevalence of harmful content on social media platforms poses significant risks to users and society, necessitating more effective and scalable content moderation strategies. Current approaches rely on human moderators, supervised classifiers, and large volumes of training data, and often struggle with scalability, subjectivity, and the dynamic nature of harmful content (e.g., violent content and dangerous challenge trends). To bridge these gaps, we utilize Large Language Models (LLMs) to undertake few-shot dynamic content moderation via in-context learning. Through extensive experiments on multiple LLMs, we demonstrate that our few-shot approaches can outperform existing proprietary baselines (Perspective and OpenAI Moderation) as well as prior state-of-the-art few-shot learning methods in identifying harm. We also incorporate visual information (video thumbnails) and assess whether different multimodal techniques improve model performance. Our results underscore the significant benefits of employing LLM-based methods for scalable and dynamic harmful content moderation online.
@article{bonagiri2025towards,
  title = {Towards Safer Social Media Platforms: Scalable and Performant Few-Shot Harmful Content Moderation Using Large Language Models},
  author = {Bonagiri, Akash and Li, Lucen and Oak, Rajvardhan and Babar, Zeerak and Wojcieszak, Magdalena and Chhabra, Anshuman},
  journal = {arXiv preprint arXiv:2501.13976},
  year = {2025},
  url = {https://arxiv.org/abs/2501.13976}
}
2024
- NLP4Gov: A Comprehensive Library for Computational Policy Analysis. Mahasweta Chakraborti, Sailendra Akash Bonagiri, Santiago Virgüez-Ruiz, and Seth Frey. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 2024.
Formal rules and policies are fundamental in formally specifying a social system: its operation, boundaries, processes, and even ontology. Recent scholarship has highlighted the role of formal policy in collective knowledge creation, game communities, the production of digital public goods, and national social media governance. Researchers have shown interest in how online communities convene tenable self-governance mechanisms to regulate member activities and distribute rights and privileges by designating responsibilities, roles, and hierarchies. We present NLP4Gov, an interactive kit to train and aid scholars and practitioners alike in computational policy analysis. The library explores and integrates methods and capabilities from computational linguistics and NLP to generate semantic and symbolic representations of community policies from text records. Versatile, documented, and accessible, NLP4Gov provides granular and comparative views into institutional structures and interactions, along with other information extraction capabilities for downstream analysis.
@inproceedings{chakraborti2024nlp4gov,
  title = {NLP4Gov: A Comprehensive Library for Computational Policy Analysis},
  author = {Chakraborti, Mahasweta and Bonagiri, Sailendra Akash and Virg{\"u}ez-Ruiz, Santiago and Frey, Seth},
  booktitle = {Extended Abstracts of the CHI Conference on Human Factors in Computing Systems},
  pages = {1--8},
  year = {2024},
  url = {https://dl.acm.org/doi/abs/10.1145/3613905.3650810},
  organization = {ACM}
}
2022
- Aletheia: A fake news detection system for Hindi. Jathin Badam, Akash Bonagiri, Kvln Raju, and Dipanjan Chakraborty. In Proceedings of the 5th Joint International Conference on Data Science & Management of Data (9th ACM IKDD CODS and 27th COMAD), 2022.
Received Best Paper Award for this submission
“Fake News” and misinformation can have far-reaching negative social impacts. Scalable fake news classification techniques for resource-poor languages such as Hindi are in their infancy due to the lack of datasets and of robust NLP libraries in these languages. We present Aletheia, a fake news classification system for Hindi. We curate a dataset of approximately 13,000 news articles by media organizations that flag authentic and fake news. We present preliminary results using several Machine Learning models on this dataset. We also developed a system accessible over the web (http://responsible-tech.bits-hyderabad.ac.in/aletheia/demo/) where users can test whether a given piece of news is fake or authentic. We also use the website to collect crowd-sourced labelled news data and present additional information on the dataset and the models to the users.
@inproceedings{badam2022aletheia,
  title = {Aletheia: A fake news detection system for Hindi},
  author = {Badam, Jathin and Bonagiri, Akash and Raju, Kvln and Chakraborty, Dipanjan},
  booktitle = {Proceedings of the 5th Joint International Conference on Data Science \& Management of Data (9th ACM IKDD CODS and 27th COMAD)},
  pages = {255--259},
  year = {2022},
  url = {https://dl.acm.org/doi/abs/10.1145/3493700.3493736},
  organization = {ACM}
}
2021
- A poster on learnings from an attempt to build an NLP-based fake news classification system for Hindi. BS Akash, Jathin Badam, KVLN Raju, and Dipanjan Chakraborty. In Proceedings of the 4th ACM SIGCAS Conference on Computing and Sustainable Societies, 2021.
The proliferation of “Fake News” and misinformation is resulting in widespread negative social fallout. Scalable fake news classification techniques for resource-poor languages like Hindi are in their early stages because of a lack of datasets and of robust NLP libraries for these languages. In this exploratory study, we curate a dataset of around 13,000 data points comprising true news articles and articles on fake news authored by media organisations that flag fake news. We then use seven ML classification models on this dataset and present the preliminary results. Our results show that concerted efforts need to be made by the research community towards dataset curation and improving the NLP models for resource-poor languages in order to make scalable classification systems.
@inproceedings{akash2021poster,
  title = {A poster on learnings from an attempt to build an NLP-based fake news classification system for Hindi},
  author = {Akash, BS and Badam, Jathin and Raju, KVLN and Chakraborty, Dipanjan},
  booktitle = {Proceedings of the 4th ACM SIGCAS Conference on Computing and Sustainable Societies},
  pages = {397--401},
  year = {2021},
  url = {https://dl.acm.org/doi/abs/10.1145/3460112.3471974},
  organization = {ACM}
}