Shakil Ahmed Sumon

Projects

Computer Vision
- AI Image Analysis of HP's Large-scale Printing Presses (2024)

Natural Language Processing
- Commonsense reasoning in Large Language Models (2024)
- Mitigating Bias in Downstream NLP Models (2023)

Data Science
- Black Student Success Initiative @ Oregon State University (2023)
- Missing Values Imputation with Variational Autoencoder (2023)
- Topic Modeling with Polya Distribution (2023)

Publications

Video Understanding
- Towards Generalized Violence Detection: a Pose Estimation Approach
  Jubayer Hossain Ahad, Sadiya Yesmin, Jannatul Joinal Ananna, Shakil Ahmed Sumon, Niyaz Bin Hashem, Raihan Gani, Nabeel Mohammed, Towards Generalized Violence Detection: a Pose Estimation Approach. (Submitted to Image and Vision Computing)
  [pdf] [Abstract]
  
  In recent years, there has been a growth in the development of automated systems which can identify human actions. Some of these automated systems are also being developed to detect violence of various forms. Deep learning has shown a lot of promise in detecting violence in videos, especially with the rise of neural networks. However, a trend in most of the papers is to report intra-dataset test results. In this paper we evaluate multiple models and demonstrate that they generalize poorly when tested across datasets. Furthermore, we propose several approaches to incorporate pose data to improve the initial inter-dataset testing results. Three different datasets have been used to carry out the proposed method: the Hockey Fights Dataset, the Movie Dataset, and a dataset based on the South Asian context.
- Violent Crowd Flow Detection Using Deep Learning
  Shakil Ahmed Sumon, Tanzil Shahria Himel, Raihan Goni, Nazmul Hossain, Al Marufuzzaman Sajal, Rashedur M. Rahman, Violent Crowd Flow Detection Using Deep Learning, Book Chapter In Lecture Notes in Computer Science, Springer Nature, vol. 11431, pp. 613-625, 2019
  [pdf] [Abstract]
  
  A dataset has been proposed for detecting violent crowd flows. The dataset has been collected in the context of Bangladesh and includes both violent and non-violent crowd flows. Different deep learning algorithms and approaches have been applied to this dataset to detect scenarios which contain violence. Convolutional neural network (CNN) and long short-term memory network (LSTM) based architectures have been tested on this dataset both separately and in combination. Moreover, a model that was pretrained on violent movie scenes has also been used to leverage transfer learning, and it outperformed all other tested approaches with an accuracy of 95.67%. Surprisingly, the sequence model alone or in combination with a CNN has not performed well on this particular dataset. However, the model is lightweight, hence it can be deployed easily in any security system consisting of CCTV cameras or unmanned aerial vehicles (UAVs).
- Violence Detection by Pretrained Modules with Different Deep Learning Approaches
  Shakil Ahmed Sumon, Raihan Goni, Niyaz Bin Hashem, Tanzil Shahria Himel, Rashedur M. Rahman, Violence Detection by Pretrained Modules with Different Deep Learning Approaches, In Vietnam Journal of Computer Science, World Scientific, vol. 07, no. 01, pp. 19-40, 2020.
  [pdf] [Abstract]
  
  In this paper, we have explored different strategies to find out the saliency of the features from different pretrained models in detecting violence in videos. A dataset has been created which consists of violent and non-violent videos of different settings. Three ImageNet models (VGG16, VGG19, and ResNet50) are used to extract features from the frames of the videos. In one of the experiments, the extracted features have been fed into a fully connected network which detects violence at the frame level. In another experiment, we have fed the extracted features of 30 frames at a time to a long short-term memory (LSTM) network. Furthermore, we have applied attention to the features extracted from the frames through a spatial transformer network, which also enables transformations like rotation, translation and scale. Along with these models, we have designed a custom convolutional neural network (CNN) as a feature extractor and a pretrained model which is initially trained on a movie violence dataset. In the end, the features extracted from the ResNet50 pretrained model proved to be more salient towards detecting violence. These ResNet50 features, in combination with an LSTM, provide an accuracy of 97.06%, which is better than the other models we have experimented with.

Speech Signal Processing
- Bangla Short Speech Commands Recognition Using Convolutional Neural Networks
  Shakil Ahmed Sumon, Joydip Chowdhury, Sujit Debnath, Nabeel Mohammed, Sifat Momen, Bangla Short Speech Commands Recognition Using Convolutional Neural Networks, In International Conference on Bangla Speech and Language Processing (ICBSLP), pp. 1-6, 2018.
  [pdf] [Abstract]
  
  Despite Bangla being one of the most widely spoken languages of the world, no significant efforts have been made in Bangla speech recognition. Speech recognition is a difficult task, particularly if the demand is to do so in noisy real-life conditions. In this study, a Bangla short speech commands data set is reported, where all the samples are taken in real-life settings. Three different convolutional neural network (CNN) architectures have been designed to recognize those short speech commands. Mel-frequency cepstral coefficients (MFCC) features have been extracted from the audio files in one approach, whereas only the raw audio files have been used in another CNN architecture. Lastly, a pre-trained model which is trained on a large English short speech commands data set has been fine-tuned by retraining on the Bangla data set. Experimental results reveal that the MFCC model shows better accuracy in recognizing Bangla short speech commands although, surprisingly, the model predicting on raw audio data is very competitive. The models have shown proficiency in identifying single-syllable words but encounter difficulties in recognizing multi-syllable commands.

Theory of Fuzzy Systems
- Fuzzy Predictive Model for Estimating the Risk Level of Maternal Mortality while Childbirth
  Shakil Ahmed Sumon, Rashedur M. Rahman, Fuzzy Predictive Model for Estimating the Risk Level of Maternal Mortality while Childbirth, In 9th International Conference on Intelligent Systems (IS'18), pp. 87-93, 2018.
  [pdf] [Abstract]
  
  The paper proposes a predictive model to estimate the risk level of maternal mortality while giving birth using fuzzy logic. The model is developed based on the fuzziness of the following variables: place, total children, antenatal visits, blood pressure checked, tetanus injection taken, iron tablets/syrup taken, habit of watching television. The Mamdani fuzzy inference method is used and the output generated by the model is the risk level in percentage. 23 rules are generated based on the inputs and a real data set is used to validate the model. Moreover, an ANFIS model is generated. Confusion matrices are used to evaluate the accuracy, sensitivity and specificity of both models. The fuzzy predictive model has an edge over the ANFIS model in terms of accuracy, sensitivity and specificity. The fuzzy predictive model has an accuracy of 88.95 percent, which is reliable, and it can be deployed in remote areas to help in dealing with the problem of maternal mortality.
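  For readers unfamiliar with the evaluation metrics named in the abstract above, the following is a minimal sketch of how accuracy, sensitivity and specificity are derived from a binary confusion matrix. It assumes a two-class (high-risk vs. low-risk) setup; the function name and the counts are hypothetical illustrations and are not taken from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard metrics from binary confusion-matrix counts.

    tp/fn: positive (e.g. high-risk) cases predicted correctly/incorrectly.
    tn/fp: negative (e.g. low-risk) cases predicted correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    return {
        # Share of all cases that were classified correctly.
        "accuracy": (tp + tn) / total,
        # Share of actual positives that were detected (true positive rate).
        "sensitivity": tp / (tp + fn),
        # Share of actual negatives that were detected (true negative rate).
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = binary_metrics(tp=40, fp=5, tn=45, fn=10)
print(metrics)
```

  With these illustrative counts, 85 of 100 cases are classified correctly, 40 of 50 positives are detected, and 45 of 50 negatives are detected.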

Datasets

Bangla Short Speech Commands Dataset: [link]
Violence Dataset in the context of South-East Asia: [link]

Contact

Email: sumons@oregonstate.edu
Phone: +1 (541) 250-7201