Dr. Sumit Kumar Bag

Senior Principal Scientist

Research Interests

The group works mainly on the design and development of biological software using machine learning and artificial intelligence. We have also studied gene expression during plant development and under stress, and investigated the regulatory genomics of Arabidopsis thaliana under stress. The group is also interested in the evolution of gene families within cotton species.

Research Summary

Design and development of a model-based SNP pipeline in plants

A major current challenge is to distinguish false-positive SNPs from true ones in biological systems using computational tools. This motivates a computational system that can predict SNPs in diploid and polyploid species, classify the potential SNPs, and provide complete biological detail for each predicted SNP. Features spanning the regions around SNP sites have not previously been reported for the classification of true SNPs. The present work is an attempt to predict SNPs efficiently in plant datasets. Nucleotide sequences flanking SNPs in six datasets, i.e., Arabidopsis thaliana, Secale cereale, Solanum lycopersicum, Oryza sativa, Gossypium hirsutum and Triticum aestivum, were analysed to select distinguishable patterns. The study presents a highly accurate prediction method that classifies potential SNPs using features derived solely from the DNA flanking sequences. Mono-, di-, tri- and tetranucleotide composition and binary composition were introduced to improve the prediction performance of machine learning classifiers trained on known SNP sequences. The classifiers achieved high performance, with ROC values ranging from 0.75 to 0.95 under 10-fold cross-validation. The developed models have been integrated into a pipeline with a graphical user interface for ease of use by multiple users. The resulting pipeline, PLANET-SNP, is therefore a promising tool for predicting and annotating potential SNPs within a single system, which is particularly beneficial for researchers from non-computational backgrounds.
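The flanking-sequence features named above can be illustrated with a minimal sketch. It is not the PLANET-SNP implementation: the function names, the A/C/G/T ordering, the handling of ambiguous bases, and the example flank are all assumptions made for illustration. It computes mono- to tetranucleotide composition as relative k-mer frequencies and a binary (one-hot) composition of the flank, then concatenates them into one feature vector of the kind a classifier could be trained on.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed base ordering for all feature vectors


def kmer_composition(seq, k):
    """Relative frequency of every possible k-mer, in a fixed lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # windows with N or other ambiguity codes are skipped
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def binary_composition(seq):
    """One-hot encode each base as four 0/1 values; unknown bases become all zeros."""
    encoded = []
    for base in seq.upper():
        encoded.extend(1 if base == letter else 0 for letter in ALPHABET)
    return encoded


def flank_features(flank, max_k=4):
    """Concatenate k-mer compositions for k = 1..max_k with the binary composition."""
    features = []
    for k in range(1, max_k + 1):
        features.extend(kmer_composition(flank, k))
    features.extend(binary_composition(flank))
    return features


if __name__ == "__main__":
    example_flank = "ACGTACGTAC"  # hypothetical 10-base flanking sequence
    vector = flank_features(example_flank)
    print(len(vector))
```

For a flank of fixed length n, the vector has 4 + 16 + 64 + 256 composition values plus 4n binary values, so every flank in a dataset must be trimmed or padded to the same length before training.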

Evolutionary and conservation analysis of core promoter architecture in Gossypium hirsutum for fiber-specific genes

Gossypium hirsutum (AADD), an allotetraploid cotton, is preferred for agriculture over its diploid progenitors Gossypium arboreum (AA) and Gossypium raimondii (DD). To illuminate the domestication of Gossypium hirsutum during speciation, its genome has been re-sequenced over the past few years by several independent groups, and multi-dimensional data have been generated to decipher the molecular mechanisms of fiber development. The emergence of such big data in bioinformatics, together with high-performance computing power, offers an opportunity to uncover significant biological information that was previously overlooked. In this work, we remapped publicly available mock-treated or untreated RNA-seq data to extract fiber-specific or fiber-exclusive genes through data mining. It is also important to understand the evolutionary pressure on the cis-regulatory elements of fiber-related genes relative to their diploid progenitors, in order to postulate a promoter architecture model that correlates with their innate expression.
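The selection of fiber-exclusive genes from remapped expression data could be sketched as follows. The criteria used by the group are not stated here, so the thresholds, the tissue labels, the gene identifiers and the function name below are all hypothetical: a gene is kept when its expression in fiber reaches an assumed "on" level while staying at or below an assumed "off" level in every other tissue.

```python
def fiber_exclusive_genes(expression, fiber_tissue="fiber", on=1.0, off=0.1):
    """Return genes expressed in fiber but effectively silent elsewhere.

    expression: dict mapping gene id -> dict of tissue name -> expression value
    on / off:   assumed thresholds (for example in TPM), not taken from the source
    """
    selected = []
    for gene, by_tissue in expression.items():
        fiber_value = by_tissue.get(fiber_tissue, 0.0)
        others = [v for tissue, v in by_tissue.items() if tissue != fiber_tissue]
        if fiber_value >= on and all(v <= off for v in others):
            selected.append(gene)
    return sorted(selected)


if __name__ == "__main__":
    # Hypothetical toy expression table; gene ids and values are placeholders.
    toy = {
        "geneA": {"fiber": 12.0, "leaf": 0.0, "root": 0.05},
        "geneB": {"fiber": 8.0, "leaf": 3.0, "root": 0.0},
        "geneC": {"fiber": 0.2, "leaf": 0.0, "root": 0.0},
    }
    print(fiber_exclusive_genes(toy))
```

A stricter or looser definition of "fiber-specific" (for example a fold-change over the maximum of the other tissues) would only change the condition inside the loop.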

Genome-wide identification, functional and evolutionary analysis of Histone deacetylase 2 (HD2) gene family in Gossypium species

Cotton crops are mainly affected by biotic and abiotic stresses, which cause yield loss. Whiteflies (Bemisia tabaci) are the main biotic stress factor affecting cotton growth and yield. Drought and salinity are the main abiotic causes of yield loss. Histone deacetylase 2 (HD2) genes have been reported to be involved in biotic and abiotic stress responses in barley, and in abiotic stress responses in Arabidopsis and rice. Our research has identified nine HD2 genes in Gossypium hirsutum and Gossypium barbadense, which contain the conserved HD2 domain. These cotton HD2 genes also show significant expression under both biotic and abiotic stress conditions. The HD2 gene family may therefore serve as an important target for improving stress tolerance in cotton as well.

Publications

– A Bhardwaj & SK Bag, PLANET-SNP pipeline: PLants based ANnotation and Establishment of TrueSNP pipeline. Genomics 111 (5) (2019) 1066-1077

– S Bhambhani, D Lakhwani, P, A Pandey, Y V Dhar, S K Bag, M H Asif & P K Trivedi, Transcriptome and metabolite analyses in Azadirachta indica: identification of genes involved in biosynthesis of bioactive triterpenoids. Scientific Reports 7 (2017), 5043

– G Verma, Y V Dhar, D Srivastava, M Kidwai, P S Chauhan, S K Bag, M H Asif & D Chakrabarty, Genome-wide analysis of rice dehydrin gene family: Its evolutionary conservedness and expression pattern in response to PEG induced dehydration stress. PLoS ONE 12 (5) (2017), e0176399

– A Bhardwaj, YV Dhar, MH Asif & SK Bag, In Silico identification of SNP diversity in cultivated and wild tomato species: insight from molecular simulations. Scientific Reports 6 (2016), 38715

– Y Indoliya, P Tiwari, A S Chauhan, R Goel, M Shri, S K Bag & D Chakrabarty, Decoding regulatory landscape of somatic embryogenesis reveals differential regulatory networks between japonica and indica rice subspecies. Scientific Reports 6 (2016), 23050

Patents

Research Scholars

Ms. Priti Prasad (SRF)

Ms. Nasreen Bano (SRF)

Mr. Shahre Alam (Project Associate-I)

Address

Computational Biology Lab, Molecular Biology and Biotechnology Division

CSIR-National Botanical Research Institute, Rana Pratap Marg, Lucknow-226001

Phone: 0522-2297914

Email: sumit.bag@nbri.res.in