Sagnik Ray Choudhury

Title: Assistant Professor

Department: Computer Science and Engineering

College: College of Engineering

Discovery Park Building F262
Sagnik.Raychoudhury@unt.edu


Education

PhD, Pennsylvania State University, 2017
Major: Information Sciences and Technology

Current Scheduled Teaching

CSCE 5900.878

Special Problems

Summer 10W 2025

Texas Education Code 51.974 (HB 2504) requires each institution of higher education to make available to the public a syllabus for each undergraduate lecture course offered for credit by the institution.

Previous Scheduled Teaching

CSCE 5934.878	Directed Study	Spring 2025
CSCE 6940.978	Individual Research	Spring 2025
CSCE 4290.002	Introduction to Natural Language Processing	Spring 2025	Syllabus	SPOT
CSCE 5950.878	Master's Thesis	Spring 2025
CSCE 5290.002	Natural Language Processing	Spring 2025		SPOT
CSCE 5934.878	Directed Study	Fall 2024
CSCE 6940.878	Individual Research	Fall 2024
CSCE 5950.878	Master's Thesis	Fall 2024
CSCE 5290.005	Natural Language Processing	Fall 2024	Syllabus	SPOT

Published Intellectual Contributions

Conference Proceeding

Akella, A.P., Choudhury, S., Koop, D., Alhoori, H., Serra, E., Spezzano, F. (2024). Navigating the Landscape of Reproducible Research: A Predictive Modeling Approach. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, CIKM 2024, Boise, ID, USA, October 21-25, 2024. 24–33. ACM. https://doi.org/10.1145/3627673.3679831
Dutt, R., Ray Choudhury, S., Rao, V.V., Rose, C., Vydiswaran, V., Hupkes, D., Dankers, V., Batsuren, K., Kazemnejad, A., Christodoulopoulos, C., Giulianelli, M., Cotterell, R. (2024). Investigating the Generalizability of Pretrained Language Models across Multiple Dimensions: A Case Study of NLI and MRC. Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP. 165–182. Miami, Florida, USA, Association for Computational Linguistics. https://aclanthology.org/2024.genbench-1.11/
Yaneva, V., North, K., Baldwin, P., Ha, Le An, Rezayi, S., Zhou, Y., Ray Choudhury, S., Harik, P., Clauser, B., Kochmar, E., Bexte, M., Burstein, J., Horbach, A., Laarmann-Quante, R., Tack, Anaïs, Yaneva, V., Yuan, Z. (2024). Findings from the First Shared Task on Automated Prediction of Difficulty and Response Time for Multiple-Choice Questions. Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024). 470–482. Mexico City, Mexico, Association for Computational Linguistics. https://aclanthology.org/2024.bea-1.39
Choudhury, S.R., Atanasova, P., Augenstein, I., Bouamor, H., Pino, J., Bali, K. (2023). Explaining Interactions Between Text Spans. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023. 12709–12730. Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.emnlp-main.783
Choudhury, S.R., Kalra, J., Jiang, J., Reitter, D., Deng, S. (2023). Implications of Annotation Artifacts in Edge Probing Test Datasets. Proceedings of the 27th Conference on Computational Natural Language Learning, CoNLL 2023, Singapore, December 6-7, 2023. 575–586. Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.conll-1.39
Choudhury, S.R., Bhutani, N., Augenstein, I., Calzolari, N., Huang, C., Kim, H., Pustejovsky, J., Wanner, L., Choi, K., Ryu, P., Chen, H., Donatelli, L., Ji, H., Kurohashi, S., Paggio, P., Xue, N., Kim, S., Hahm, Y., He, Z., Lee, T.K., Santus, E., Bond, F., Na, S. (2022). Can Edge Probing Tests Reveal Linguistic Knowledge in QA Models? Proceedings of the 29th International Conference on Computational Linguistics, COLING 2022, Gyeongju, Republic of Korea, October 12-17, 2022. International Committee on Computational Linguistics. https://aclanthology.org/2022.coling-1.139
Choudhury, S.R., Rogers, A., Augenstein, I., Calzolari, N., Huang, C., Kim, H., Pustejovsky, J., Wanner, L., Choi, K., Ryu, P., Chen, H., Donatelli, L., Ji, H., Kurohashi, S., Paggio, P., Xue, N., Kim, S., Hahm, Y., He, Z., Lee, T.K., Santus, E., Bond, F., Na, S. (2022). Machine Reading, Fast and Slow: When Do Models "Understand" Language? Proceedings of the 29th International Conference on Computational Linguistics, COLING 2022, Gyeongju, Republic of Korea, October 12-17, 2022. International Committee on Computational Linguistics. https://aclanthology.org/2022.coling-1.8
Lester, B., Choudhury, S.R., Prasad, R., Bangalore, S., Kim, Y., Li, Y., Rambow, O. (2021). Intent Features for Rich Natural Language Understanding. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers, NAACL-HLT 2021, Online, June 6-11, 2021. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.naacl-industry.27
Lester, B., Pressel, D., Hemmeter, A., Choudhury, S.R., Bangalore, S., Cohn, T., He, Y., Liu, Y. (2020). Constrained Decoding for Computationally Efficient Named Entity Recognition Taggers. Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020. EMNLP 2020 Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.findings-emnlp.166
Chiatti, A., Cho, M.J., Gagneja, A., Yang, X., Brinberg, M., Roehrick, K., Choudhury, S.R., Ram, N., Reeves, B., Giles, C.L., Haddad, H.M., Wainwright, R.L., Chbeir, R. (2018). Text extraction and retrieval from smartphone screenshots: building a repository for life in media. Proceedings of the 33rd Annual ACM Symposium on Applied Computing, SAC 2018, Pau, France, April 09-13, 2018. ACM. https://doi.org/10.1145/3167132.3167236
Pressel, D., Ray Choudhury, S., Lester, B., Zhao, Y., Barta, M. (2018). Baseline: A Library for Rapid Modeling, Experimentation and Development of Deep Learning Algorithms targeting NLP. Proceedings of Workshop for NLP Open Source Software (NLP-OSS). 34–40. Melbourne, Australia, Association for Computational Linguistics. https://aclanthology.org/W18-2506
Wu, J., Choudhury, S., Chiatti, A., Liang, C., Giles, C.L. (2017). HESDK: A Hybrid Approach to Extracting Scientific Domain Knowledge Entities. 2017 ACM/IEEE Joint Conference on Digital Libraries, JCDL 2017, Toronto, ON, Canada, June 19-23, 2017. IEEE Computer Society. https://doi.org/10.1109/JCDL.2017.7991580
Al-Zaidy, R.A., Choudhury, S.R., Giles, C.L., Khabsa, M., Giles, C.L., Wade, A.D. (2016). Automatic Summary Generation for Scientific Data Charts. Scholarly Big Data: AI Perspectives, Challenges, and Ideas, Papers from the 2016 AAAI Workshop, Phoenix, Arizona, USA, February 13, 2016. WS-16-13 AAAI Press. http://www.aaai.org/ocs/index.php/WS/AAAIW16/paper/view/12661
Choudhury, S.R., Wang, S., Giles, C.L., Adam, N.R., Lillian (Boots) Cassel, Yesha, Y., Furuta, R., Weigle, M.C. (2016). Curve Separation for Line Graphs in Scholarly Documents. Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries, JCDL 2016, Newark, NJ, USA, June 19 - 23, 2016. ACM. https://doi.org/10.1145/2910896.2925469
Choudhury, S.R., Wang, S., Giles, C.L., Groppe, S., Le Gruenwald. (2016). Scalable algorithms for scholarly figure mining and semantics. Proceedings of the International Workshop on Semantic Big Data, San Francisco, CA, USA, July 1, 2016. ACM. https://doi.org/10.1145/2928294.2928305
Ray Choudhury, S., Giles, C.L. (2015). An Architecture for Information Extraction from Figures in Digital Libraries. Proceedings of the 24th International Conference on World Wide Web. 667–672. New York, NY, USA, Association for Computing Machinery. https://doi.org/10.1145/2740908.2741712
Choudhury, S.R., Mitra, P., Giles, C.L., Vanoirbeek, C., Pierre Genevès. (2015). Automatic Extraction of Figures from Scholarly Documents. Proceedings of the 2015 ACM Symposium on Document Engineering, DocEng 2015, Lausanne, Switzerland, September 8-11, 2015. ACM. https://doi.org/10.1145/2682571.2797085
Wu, J., Killian, J., Yang, H., Williams, K., Choudhury, S., Tuarob, S., Caragea, C., Giles, C.L., Barker, K., José Manuél Gómez-Pérez. (2015). PDFMEF: A Multi-Entity Knowledge Extraction Framework for Scholarly Documents and Semantic Search. Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015, Palisades, NY, USA, October 7-10, 2015. ACM. https://doi.org/10.1145/2815833.2815834
Williams, K., Wu, J., Choudhury, S.R., Khabsa, M., Giles, C.L. (2014). Scholarly big data information extraction and integration in the CiteSeerX digital library. Workshops Proceedings of the 30th International Conference on Data Engineering Workshops, ICDE 2014, Chicago, IL, USA, March 31 - April 4, 2014. IEEE Computer Society. https://doi.org/10.1109/ICDEW.2014.6818305
Wu, Z., Wu, J., Khabsa, M., Williams, K., Chen, H., Huang, W., Tuarob, S., Choudhury, S.R., Ororbia, A., Mitra, P., Giles, C.L. (2014). Towards building a scholarly big data platform: Challenges, lessons and opportunities. IEEE/ACM Joint Conference on Digital Libraries, JCDL 2014, London, United Kingdom, September 8-12, 2014. IEEE Computer Society. https://doi.org/10.1109/JCDL.2014.6970157
Choudhury, S., Tuarob, S., Mitra, P., Rokach, L., Kirk, A., Szep, S., Pellegrino, D.A., Jones, S., Giles, C.L., Downie, J.S., McDonald, R.H., Cole, T.W., Sanderson, R., Shipman, F. (2013). A figure search engine architecture for a chemistry digital library. 13th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL '13, Indianapolis, IN, USA, July 22 - 26, 2013. ACM. https://doi.org/10.1145/2467696.2467757
Choudhury, S., Mitra, P., Kirk, A., Szep, S., Pellegrino, D.A., Jones, S., Giles, C.L. (2013). Figure Metadata Extraction from Digital Documents. 12th International Conference on Document Analysis and Recognition, ICDAR 2013, Washington, DC, USA, August 25-28, 2013. IEEE Computer Society. https://doi.org/10.1109/ICDAR.2013.34
Williams, K., Chen, H., Choudhury, S.R., Giles, C.L., Forner, P., Navigli, R., Tufis, D., Ferro, N. (2013). Unsupervised Ranking for Plagiarism Source Retrieval Notebook for PAN at CLEF 2013. Working Notes for CLEF 2013 Conference, Valencia, Spain, September 23-26, 2013. 1179 CEUR-WS.org. https://ceur-ws.org/Vol-1179/CLEF2013wn-PAN-WilliamsEt2013.pdf
Khabsa, M., Carman, S., Choudhury, S.R., Giles, C.L., Trotman, A., Clarke, C.L., Ounis, I., Culpepper, J.S., Cartright, M., Geva, S. (2012). A Framework for Bridging the Gap Between Open Source Search Tools. Proceedings of the SIGIR 2012 Workshop on Open Source Information Retrieval, OSIR@SIGIR 2012, Portland, Oregon, USA, 16th August 2012. University of Otago, Dunedin, New Zealand.

Journal Article

Stańczak, K., Ray Choudhury, S., Pimentel, T., Cotterell, R., Augenstein, I. (2023). Quantifying gender bias towards politicians in cross-lingual language models. PLOS One. 18 (11) 1-24. Public Library of Science. https://doi.org/10.1371/journal.pone.0277640
Lester, B., Pressel, D., Hemmeter, A., Choudhury, S.R., Bangalore, S. (2020). Multiple Word Embeddings for Increased Diversity of Representation. Other. abs/2009.14394 https://arxiv.org/abs/2009.14394
Kanan, T., Choudhury, S.R., Giles, C.L., Chandrasekar, P., Fox, E.A. (2015). Digital Library and Archiving for Qatar. Other. 11 (2) https://bulletin.jcdl.org/Bulletin/v11n2/papers/kanan.pdf
Lahiri, S., Choudhury, S.R., Caragea, C. (2014). Keyword and Keyphrase Extraction Using Centrality Measures on Collocation Networks. Other. abs/1401.6571 http://arxiv.org/abs/1401.6571


Overall Summative Rating (median):
This rating represents the combined responses of students to the four global summative items and is presented to provide an overall index of the class’s quality. Overall summative statements include the following (response options use a Likert scale on which 5 = Excellent, 3 = Good, and 1 = Very poor):
- The course as a whole was
- The course content was
- The instructor’s contribution to the course was
- The instructor’s effectiveness in teaching the subject matter was
Challenge and Engagement Index:
This rating combines student responses to several SPOT items relating to how academically challenging students found the course to be and how engaged they were. Challenge and Engagement Index items include the following (response options use a Likert scale on which 7 = Much higher, 4 = Average, and 1 = Much lower):
- Do you expect your grade in this course to be
- The intellectual challenge presented was
- The amount of effort you put into this course was
- The amount of effort to succeed in this course was
- Your involvement in course (doing assignments, attending classes, etc.) was