I'm an MSc student in Artificial Intelligence at Oregon State University, currently collaborating with HP under the mentorship of Dr. Allen Fern. Our project aims to revolutionize the printing industry through automatic, AI-driven characterization of the PDF documents printed on HP's large-scale, million-dollar presses.

Alongside the HP project, I'm working with Dr. Prasad to explore commonsense reasoning in large language models (LLMs), with the goal of enhancing their logical and mathematical reasoning capabilities, which are key to achieving advanced artificial intelligence.

In addition to my research, I work as a data analyst for OSU's Academic Affairs, where, in collaboration with the University Innovation Alliance, we're crafting data-driven strategies to reduce student dropout rates and support students on their path to graduation.


Projects

  • Computer Vision

    • AI Image Analysis of HP's Large-scale Printing Presses (2024)
    • Partnered with HP to automate their million-dollar printing presses by characterizing the PDF documents to be printed. We framed document characterization as a semantic image segmentation problem and used models such as DeepLab, U-Net, and vision transformers, significantly improving both speed and accuracy over HP's classical system.
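Semantic segmentation models such as DeepLab and U-Net are commonly scored with mean intersection-over-union (mIoU). The sketch below computes that metric on toy label masks in plain Python. The page does not say which metric HP's evaluation used, so treat this as an illustrative assumption rather than the project's evaluation code; `mean_iou` is a hypothetical helper name.

```python
from typing import List


def mean_iou(pred: List[List[int]], target: List[List[int]], num_classes: int) -> float:
    """Mean intersection-over-union over classes that appear in pred or target.

    pred and target are 2-D grids of integer class labels with the same shape.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Correct pixel: counts once toward both intersection and union.
                intersection[p] += 1
                union[p] += 1
            else:
                # Mismatched pixel: belongs to the union of both classes involved.
                union[p] += 1
                union[t] += 1
    # Skip classes absent from both masks to avoid dividing by zero.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious)


# Toy 2x3 masks with three classes (e.g. background, text, image regions).
predicted = [[0, 0, 1], [1, 1, 2]]
ground_truth = [[0, 1, 1], [1, 1, 2]]
print(mean_iou(predicted, ground_truth, num_classes=3))  # 0.75
```

Here the per-class IoUs are 1/2, 3/4, and 1, so the mean is 0.75; a perfect prediction scores 1.0.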


  • Natural Language Processing

    • Commonsense Reasoning in Large Language Models (2024)
    • The project investigates and aims to enhance the reasoning capabilities of LLMs. We work with problems that require deduction and/or multi-hop reasoning to reach a valid conclusion. We require the LLMs to formulate every reasoning problem as a Constraint Satisfaction Problem (CSP), which helps them understand the constraints and results in better deductions and calculations.
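To make the idea of a CSP formulation concrete, here is a toy seating puzzle written as variables, domains, and constraints, then solved with plain backtracking search. The puzzle, the `backtrack` function, and the constraint encoding are hypothetical illustrations: in the project the LLM produces the formulation, and this sketch does not represent our prompts or pipeline.

```python
from typing import Callable, Dict, List, Optional

Assignment = Dict[str, int]
# Each constraint checks a (possibly partial) assignment and returns False
# only when the variables it mentions are assigned and violate it.
Constraint = Callable[[Assignment], bool]


def backtrack(
    variables: List[str],
    domains: Dict[str, List[int]],
    constraints: List[Constraint],
    assignment: Optional[Assignment] = None,
) -> Optional[Assignment]:
    """Depth-first backtracking search; returns the first complete solution or None."""
    if assignment is None:
        assignment = {}
    if len(assignment) == len(variables):
        return dict(assignment)
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        assignment[var] = value
        # Prune as soon as any constraint is violated by the partial assignment.
        if all(check(assignment) for check in constraints):
            result = backtrack(variables, domains, constraints, assignment)
            if result is not None:
                return result
        del assignment[var]
    return None


# Hypothetical puzzle: three people take seats 1-3, each in a different seat;
# Alice is not in seat 1; Bob sits immediately to the right of Alice.
people = ["Alice", "Bob", "Carol"]
seats = {p: [1, 2, 3] for p in people}
rules: List[Constraint] = [
    lambda a: len(set(a.values())) == len(a),
    lambda a: a.get("Alice") != 1,
    lambda a: "Alice" not in a or "Bob" not in a or a["Bob"] == a["Alice"] + 1,
]

solution = backtrack(people, seats, rules)
print(solution)  # {'Alice': 2, 'Bob': 3, 'Carol': 1}
```

Once a problem is stated this way, each deduction step ("Alice cannot sit in seat 1, so...") corresponds to pruning a branch, which is the kind of explicit constraint tracking the CSP framing is meant to encourage.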

    • Mitigating Bias in Downstream NLP Models (2023)
    • Research on debiasing downstream tasks has mainly focused on a single bias dimension, and the resulting methods frequently do not transfer to other dimensions. We aimed to fill this gap by introducing a generalized adversarial technique for debiasing downstream tasks. We worked with hate speech detection and experimented with three bias dimensions: gender, race, and religion. Our investigation suggests that adversarial training has the potential to serve as a generalized debiasing technique.
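One common way to write adversarial debiasing across several bias dimensions is as a min-max game. This is a sketch, not necessarily the exact objective used in the project: an encoder with parameters $\theta_e$ feeds a hate speech classifier with parameters $\theta_y$ and, for each protected attribute $z_k$ (for example gender, race, religion), an adversary with parameters $\theta_k$. The per-attribute weights $\lambda_k \ge 0$ are assumed hyperparameters.

```latex
\min_{\theta_e,\,\theta_y}\;\max_{\theta_1,\dots,\theta_K}\;
  \mathcal{L}_y(\theta_e,\theta_y)
  \;-\;\sum_{k=1}^{K}\lambda_k\,\mathcal{L}_{z_k}(\theta_e,\theta_k)
```

Each adversary minimizes its own loss $\mathcal{L}_{z_k}$ (which maximizes the bracketed objective), while the encoder and classifier minimize the task loss $\mathcal{L}_y$ and are simultaneously pushed to make every $z_k$ hard to recover from the encoding. A gradient reversal layer between the encoder and each adversary is one common way to optimize this in a single backward pass.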

  • Data Science

    • Black Student Success Initiative @ Oregon State University (2023)
    • We devised data-driven strategic interventions, based on both descriptive and predictive analytics, to reduce stop-out among Black students at Oregon State University and raise their graduation rate. If implemented as policies, the interventions will help identify at-risk students early and automatically, and will streamline the process of getting the required help before students reach a critical point.

    • Missing Value Imputation with Variational Autoencoders (2023)
    • In this project, we studied a paper showing that deep neural networks (DNNs) outperform both conventional linear and nonlinear methods at matrix completion. Building on this idea, we used a variational autoencoder (VAE), exploiting its generative ability to impute the missing entries of a matrix. Our experiments show that the VAE outperforms both DNNs and CNNs and is also more robust.

      View code on GitHub
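The non-network pieces of a VAE-based imputer can be sketched without a deep learning framework: a reconstruction loss that ignores missing entries, the closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior, and a fill step that keeps observed values. The function names are hypothetical, the encoder and decoder are omitted, and using masked mean squared error as the reconstruction term is a common simplification we assume here; this is not the code in the linked repository.

```python
import math
from typing import List, Sequence


def masked_mse(x: Sequence[float], x_hat: Sequence[float], mask: Sequence[int]) -> float:
    """Mean squared reconstruction error over observed entries only (mask == 1)."""
    errors = [(a - b) ** 2 for a, b, m in zip(x, x_hat, mask) if m]
    return sum(errors) / len(errors)


def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)) in closed form."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def fill_missing(x: Sequence[float], x_hat: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Keep observed values; use the decoder's reconstruction where data is missing."""
    return [a if m else b for a, b, m in zip(x, x_hat, mask)]


# One matrix row with its middle entry missing (the placeholder 0.0 is ignored by the loss).
row, mask = [1.0, 0.0, 3.0], [1, 0, 1]
reconstruction = [1.5, 2.0, 2.0]  # would come from the VAE decoder
loss = masked_mse(row, reconstruction, mask) + gaussian_kl([0.0, 0.0], [0.0, 0.0])
print(loss, fill_missing(row, reconstruction, mask))  # 0.625 [1.0, 2.0, 3.0]
```

Masking the loss matters: if placeholder values for missing entries were included, the model would be trained to reproduce the placeholders rather than plausible values.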

    • Topic Modeling with Polya Distribution (2023)
    • In this project, we estimated the parameters of the Pólya distribution, concentrating on the beta-binomial model. Through mathematical analysis, we derived and computed the Fisher Information Matrix (FIM) and the Cramér-Rao Lower Bound (CRLB) under well-defined assumptions. We also explored Maximum Likelihood Estimation (MLE) and the Method of Moments for estimating the parameters, confronting challenges such as the absence of closed-form solutions and the limitations of fixed-point iteration methods.

      View code on GitHub | Presentation on Google Slides
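For reference, the beta-binomial log-likelihood (the quantity an MLE routine maximizes) and its closed-form method-of-moments estimator fit in a few lines of standard-library Python. This is a sketch under the usual parameterization (n trials, shape parameters alpha and beta); the function names are our own and this is not the repository code.

```python
import math
from typing import Sequence, Tuple


def log_beta(a: float, b: float) -> float:
    """log B(a, b) via log-gamma, which stays numerically stable for large arguments."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_logpmf(k: int, n: int, alpha: float, beta: float) -> float:
    """log P(X = k) = log C(n, k) + log B(k + alpha, n - k + beta) - log B(alpha, beta)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def log_likelihood(counts: Sequence[int], n: int, alpha: float, beta: float) -> float:
    """Sum of log-pmfs over observed counts; MLE maximizes this over (alpha, beta)."""
    return sum(beta_binomial_logpmf(k, n, alpha, beta) for k in counts)


def method_of_moments(m1: float, m2: float, n: int) -> Tuple[float, float]:
    """Closed-form (alpha, beta) from raw moments m1 = E[X], m2 = E[X^2].

    The denominator is zero when the data are exactly binomially dispersed and
    negative when they are underdispersed, so the estimates are only meaningful
    for overdispersed counts.
    """
    denom = n * (m2 / m1 - m1 - 1) + m1
    alpha = (n * m1 - m2) / denom
    beta = (n - m1) * (n - m2 / m1) / denom
    return alpha, beta


# Sanity check: moments of Beta-Binomial(n=10, alpha=2, beta=3) map back to (2, 3).
n = 10
pmf = [math.exp(beta_binomial_logpmf(k, n, 2.0, 3.0)) for k in range(n + 1)]
m1 = sum(k * p for k, p in enumerate(pmf))
m2 = sum(k * k * p for k, p in enumerate(pmf))
print(method_of_moments(m1, m2, n))
```

With real data, `m1` and `m2` would be the sample means of k and k² over the observed counts. The moment estimator is a convenient starting point for an iterative MLE, since the likelihood itself has no closed-form maximizer.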

Publications

  • Video Understanding

    • Towards Generalized Violence Detection: a Pose Estimation Approach
      Jubayer Hossain Ahad, Sadiya Yesmin, Jannatul Joinal Ananna, Shakil Ahmed Sumon, Niyaz Bin Hashem, Raihan Gani, Nabeel Mohammed, Towards Generalized Violence Detection: a Pose Estimation Approach, Submitted to Image and Vision Computing.
      [pdf]

      In recent years, there has been growing development of automated systems that can identify human actions, some of which are designed to detect violence of various forms. Deep learning has shown a lot of promise in detecting violence in videos, especially with the rise of neural networks. However, most papers report only intra-dataset test results. In this paper, we evaluate multiple models and demonstrate that they generalize poorly in inter-dataset testing. Furthermore, we propose several approaches that incorporate pose data to improve these initial inter-dataset results. Three datasets were used to carry out the proposed method: the Hockey Fights Dataset, the Movie Dataset, and a dataset based on a South-Asian context.
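The intra- versus inter-dataset distinction in this abstract amounts to a train-on-one, test-on-all evaluation grid. The sketch below assumes each dataset is split into train and test parts and that `train_fn` and `eval_fn` are supplied by the caller; `cross_dataset_matrix` and both callbacks are hypothetical placeholders, not the models from the paper. Dataset names such as Hockey Fights or Movie would be the dictionary keys.

```python
from collections import Counter
from itertools import product
from typing import Any, Callable, Dict, List, Tuple

# name -> (train_split, test_split); the split format is up to the caller.
Datasets = Dict[str, Tuple[Any, Any]]


def cross_dataset_matrix(
    datasets: Datasets,
    train_fn: Callable[[Any], Any],
    eval_fn: Callable[[Any, Any], float],
) -> Dict[Tuple[str, str], float]:
    """Train once per dataset, then score every model on every test split.

    Keys where the train name equals the test name are intra-dataset results;
    all other keys are inter-dataset (generalization) results.
    """
    models = {name: train_fn(train) for name, (train, _) in datasets.items()}
    return {
        (src, dst): eval_fn(models[src], datasets[dst][1])
        for src, dst in product(datasets, datasets)
    }


# Toy usage with a majority-class "model" (purely illustrative).
def majority_label(labels: List[int]) -> int:
    return Counter(labels).most_common(1)[0][0]


def accuracy(label: int, test_labels: List[int]) -> float:
    return sum(y == label for y in test_labels) / len(test_labels)


toy = {"A": ([0, 0, 1], [0, 0]), "B": ([1, 1, 0], [1, 1, 1])}
scores = cross_dataset_matrix(toy, majority_label, accuracy)
print(scores)
```

In the toy run both diagonal entries are perfect while both off-diagonal entries are zero, a deliberately extreme version of the generalization gap the paper reports: intra-dataset numbers alone would hide it.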
    • Violent Crowd Flow Detection Using Deep Learning
      Shakil Ahmed Sumon, Tanzil Shahria Himel, Raihan Goni, Nazmul Hossain, Al Marufuzzaman Sajal, Rashedur M. Rahman, Violent Crowd Flow Detection Using Deep Learning, Book chapter in Lecture Notes in Computer Science, Springer Nature, vol. 11431, pp. 613-625, 2019.
      [pdf]

      A dataset is proposed for detecting violent crowd flows. It was collected in the context of Bangladesh and includes both violent and non-violent crowd flows. Various deep learning algorithms and approaches have been applied to this dataset to detect scenarios that contain violence. Convolutional neural network (CNN) and long short-term memory (LSTM) based architectures were evaluated both separately and in combination. Moreover, a model pretrained on violent movie scenes was used to leverage transfer learning, and it outperformed all other approaches with an accuracy of 95.67%. Surprisingly, the sequence model, alone or in combination with a CNN, did not perform well on this dataset. The model is also lightweight, so it can be deployed easily in security systems built on CCTV cameras or unmanned aerial vehicles (UAVs).

    • Violence Detection by Pretrained Modules with Different Deep Learning Approaches
      Shakil Ahmed Sumon, Raihan Goni, Niyaz Bin Hashem, Tanzil Shahria Himel, Rashedur M. Rahman, Violence Detection by Pretrained Modules with Different Deep Learning Approaches, In Vietnam Journal of Computer Science, World Scientific, vol. 07, no. 01, pp. 19-40, 2020.
      [pdf]

      In this paper, we explore different strategies for assessing the saliency of features from different pretrained models in detecting violence in videos. We created a dataset of violent and non-violent videos in different settings. Three ImageNet models (VGG16, VGG19, and ResNet50) are used to extract features from the video frames. In one experiment, the extracted features are fed into a fully connected network that detects violence at the frame level. In another, the extracted features of 30 frames at a time are fed to a long short-term memory (LSTM) network. Furthermore, we apply attention to the frame features through a spatial transformer network, which also enables transformations such as rotation, translation, and scaling. Along with these models, we designed a custom convolutional neural network (CNN) as a feature extractor and used a model pretrained on a movie violence dataset. In the end, the features extracted by the pretrained ResNet50 proved the most salient for detecting violence: combined with an LSTM, they provide an accuracy of 97.06%, better than the other models we experimented with.
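Feeding "30 frames at a time" to an LSTM means grouping per-frame CNN features into fixed-length sequences. A minimal sketch of that grouping step follows, assuming non-overlapping windows and that a trailing partial window is dropped; the abstract does not say how leftover frames were handled, and `frame_windows` is a hypothetical name. Each element would in practice be one frame's feature vector (for example a pooled ResNet50 embedding).

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def frame_windows(frame_features: Sequence[T], window: int = 30) -> List[List[T]]:
    """Split per-frame feature vectors into consecutive, non-overlapping windows.

    A trailing partial window shorter than `window` is dropped, so every
    returned sequence has exactly `window` elements.
    """
    return [
        list(frame_features[start:start + window])
        for start in range(0, len(frame_features) - window + 1, window)
    ]


# 65 frames -> two full windows (frames 0-29 and 30-59); frames 60-64 are dropped.
windows = frame_windows(list(range(65)))
print(len(windows), windows[1][0])  # 2 30
```

Dropping the remainder keeps every LSTM input the same length; padding the last window or using overlapping (strided) windows are equally valid alternatives.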

  • Speech Signal Processing

    • Bangla Short Speech Commands Recognition Using Convolutional Neural Networks
      Shakil Ahmed Sumon, Joydip Chowdhury, Sujit Debnath, Nabeel Mohammed, Sifat Momen, Bangla Short Speech Commands Recognition Using Convolutional Neural Networks, In International Conference on Bangla Speech and Language Processing (ICBSLP), pp. 1-6, 2018.
      [pdf]

      Despite Bangla being one of the most widely spoken languages in the world, no significant efforts have been made in Bangla speech recognition. Speech recognition is a difficult task, particularly in noisy real-life conditions. In this study, we report a Bangla short speech commands dataset in which all samples were recorded in real-life settings. Three different convolutional neural network (CNN) architectures were designed to recognize these short speech commands. One approach extracts Mel-frequency cepstral coefficient (MFCC) features from the audio files, whereas another CNN architecture uses only the raw audio. Lastly, a model pretrained on a large English short speech commands dataset was fine-tuned by retraining on the Bangla dataset. Experimental results reveal that the MFCC model shows better accuracy in recognizing Bangla short speech commands, although, surprisingly, the model trained on raw audio is very competitive. The models are proficient at identifying single-syllable words but struggle to recognize multi-syllable commands.
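MFCC extraction pools the power spectrum through triangular filters spaced evenly on the mel scale, then takes a log and a discrete cosine transform. The sketch below covers only the mel-spacing step, using the common HTK formula m = 2595 log10(1 + f/700). The filter count and frequency range in the example are assumptions, since the paper's exact MFCC settings are not given here.

```python
import math
from typing import List


def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_filters: int, f_min: float, f_max: float) -> List[float]:
    """Return n_filters + 2 frequencies (Hz), equally spaced on the mel scale.

    Triangular filter i spans edges[i] .. edges[i + 2] and peaks at edges[i + 1].
    """
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_filters + 1)
    return [mel_to_hz(m_min + i * step) for i in range(n_filters + 2)]


# Assumed settings: 26 filters covering 0-8 kHz (e.g. audio sampled at 16 kHz).
edges = mel_band_edges(26, 0.0, 8000.0)
print(len(edges), round(edges[1], 1), round(edges[-2], 1))
```

Because the edges are uniform in mel rather than Hz, the low-frequency filters are narrow and the high-frequency ones wide, roughly mirroring human pitch resolution.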

  • Theory of Fuzzy Systems

    • Fuzzy Predictive Model for Estimating the Risk Level of Maternal Mortality while Childbirth
      Shakil Ahmed Sumon, Rashedur M. Rahman, Fuzzy Predictive Model for Estimating the Risk Level of Maternal Mortality while Childbirth, In 9th International Conference on Intelligent Systems (IS'18), pp. 87-93, 2018.
      [pdf]

      The paper proposes a predictive model, based on fuzzy logic, to estimate the risk level of maternal mortality during childbirth. The model is built on the fuzziness of the following variables: place, total children, antenatal visits, whether blood pressure was checked, whether a tetanus injection was taken, whether iron tablets/syrup were taken, and the habit of watching television. It uses the Mamdani fuzzy inference method, and its output is the risk level as a percentage. Twenty-three rules are generated from the inputs, and a real dataset is used to validate the model. An ANFIS model is also built, and confusion matrices are used to evaluate the accuracy, sensitivity, and specificity of both models. The fuzzy predictive model outperforms the ANFIS model on all three measures. With an accuracy of 88.95%, it is reliable and can be deployed in remote areas to help address the problem of maternal mortality.
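The Mamdani pipeline described here (fuzzify the inputs, combine rule antecedents with min, clip and aggregate rule outputs with max, then defuzzify to a risk percentage) can be sketched in plain Python. The two inputs, their ranges and membership functions, and the three rules below are hypothetical stand-ins chosen for illustration; they are not the paper's 23 rules or its calibrated variables. Centroid defuzzification is also an assumption: it is the most common choice, but the abstract does not name the method.

```python
def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership with feet a, c and peak b (a == b or b == c gives a shoulder)."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def mamdani_risk(antenatal_visits: float, total_children: float, steps: int = 101) -> float:
    """Risk level (%) from two hypothetical inputs via Mamdani min-max inference."""
    v = min(max(antenatal_visits, 0.0), 8.0)  # assumed range: 0-8 visits
    c = min(max(total_children, 0.0), 6.0)    # assumed range: 0-6 children

    # 1. Fuzzify the crisp inputs.
    visits_low, visits_high = tri(v, 0, 0, 4), tri(v, 2, 8, 8)
    children_few, children_many = tri(c, 0, 0, 3), tri(c, 2, 6, 6)

    # 2. Rule strengths (hypothetical rules); AND is min.
    #    R1: visits low AND children many -> risk high
    #    R2: visits high -> risk low;  R3: children few -> risk low
    high_strength = min(visits_low, children_many)
    low_strength = max(visits_high, children_few)  # R2 and R3 share an output set

    # 3. Clip each output set at its rule strength and aggregate with max,
    # 4. then take the centroid over a discretized 0-100 % universe.
    xs = [100.0 * i / (steps - 1) for i in range(steps)]
    mu = [
        max(min(high_strength, tri(x, 50, 100, 100)), min(low_strength, tri(x, 0, 0, 50)))
        for x in xs
    ]
    area = sum(mu)
    if area == 0:
        raise ValueError("no rule fired for these inputs")
    return sum(x * m for x, m in zip(xs, mu)) / area


print(round(mamdani_risk(0, 6), 1), round(mamdani_risk(8, 0), 1))
```

With no antenatal visits and many children only the high-risk rule fires, giving a risk near 84%; with many visits and no children only the low-risk rules fire, giving about 16%. Intermediate inputs activate both output sets and land in between.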

Datasets

  • Bangla Short Speech Commands Dataset: [link]
  • Violence Dataset in the context of South-East Asia: [link]

Contact

  • Email: sumons@oregonstate.edu
  • Phone: +1 (541) 250-7201