data mining ppt


Applications and Trends in Data Mining :

Data Mining for Biological Data Analysis

Factors that led to the development:

The past decade has seen explosive growth in:
1. Genomics
2. Proteomics
3. Functional genomics
4. Biomedical research

This work ranges from the identification and comparative analysis of the genomes of humans and other species, for the investigation of genetic networks, to the development of new pharmaceuticals and advances in cancer therapies.

DNA sequences form the foundation of the genetic codes of all living organisms. DNA sequences are composed of four basic building blocks called nucleotides:
1. adenine (A)
2. cytosine (C)
3. guanine (G)
4. thymine (T)

These four nucleotides (or bases) are combined to form long chains that resemble a twisted ladder.

DNA sequence: … CTA CAC ACG TGT AAC …

A gene usually comprises hundreds of individual nucleotides arranged in a particular order.
A genome is the complete set of genes of an organism.
Genomics is the analysis of genome sequences.
A proteome is the complete set of protein molecules present in a cell, tissue, or organism.
Proteomics is the study of proteome sequences.
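As a small illustration of treating a DNA fragment as data, the sketch below counts the four bases and splits the slide's example fragment into three-letter groups. The function names `base_counts` and `triplets` are made up for this example.

```python
from collections import Counter

NUCLEOTIDES = set("ACGT")


def _clean(sequence: str) -> str:
    """Drop spaces, upper-case, and reject anything that is not A, C, G, or T."""
    cleaned = sequence.replace(" ", "").upper()
    unknown = set(cleaned) - NUCLEOTIDES
    if unknown:
        raise ValueError(f"unexpected symbols: {sorted(unknown)}")
    return cleaned


def base_counts(sequence: str) -> Counter:
    """Count how often each nucleotide occurs in the sequence."""
    return Counter(_clean(sequence))


def triplets(sequence: str) -> list[str]:
    """Split the sequence into consecutive groups of three bases.

    Any trailing one or two bases that do not fill a group are dropped.
    """
    cleaned = _clean(sequence)
    usable = len(cleaned) - len(cleaned) % 3
    return [cleaned[i:i + 3] for i in range(0, usable, 3)]


fragment = "CTA CAC ACG TGT AAC"  # the fragment shown on the slide
print(base_counts(fragment))
print(triplets(fragment))
```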

Data mining may contribute to biological data analysis in the following aspects:

Biological data mining has become an essential part of a new research field called bioinformatics.

1) Semantic integration of heterogeneous, distributed genomic and proteomic databases:

Genomic and proteomic data sets are often generated at different labs and by different methods. They are distributed, heterogeneous, and of wide variety. Integrating such data is essential for cross-site analysis of biological data. Such integration and linkage analysis would facilitate the systematic and coordinated analysis of genome and biological data.

This has promoted the development of integrated data warehouses to store and manage derived biological data. Data cleaning, data integration, reference reconciliation, classification, and clustering methods will facilitate the integration of biological data and the construction of data warehouses for biological data analysis.
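A minimal sketch of what such integration can involve: records from two labs that use different field names and identifier spellings are mapped to one schema and grouped by gene. The field names, lab labels, and expression values below are invented for illustration; real pipelines would also need reference reconciliation against curated identifier databases.

```python
# Hypothetical mappings from each lab's field names to a shared schema.
LAB_A_FIELDS = {"gene": "gene_id", "expr": "expression"}
LAB_B_FIELDS = {"GeneSymbol": "gene_id", "expression_level": "expression"}


def to_common_schema(record: dict, field_map: dict, source: str) -> dict:
    """Rename fields, clean the gene identifier, and tag the record's origin."""
    out = {"source": source}
    for key, value in record.items():
        if key in field_map:
            out[field_map[key]] = value
    # Data cleaning: unify identifier spelling and numeric type.
    out["gene_id"] = str(out["gene_id"]).strip().upper()
    out["expression"] = float(out["expression"])
    return out


def integrate(*batches: list) -> dict:
    """Group cleaned records from all batches by gene identifier."""
    merged: dict = {}
    for batch in batches:
        for record in batch:
            merged.setdefault(record["gene_id"], []).append(record)
    return merged


# Made-up measurements, used only to exercise the functions.
lab_a = [{"gene": "brca1 ", "expr": "2.5"}, {"gene": "TP53", "expr": "1.0"}]
lab_b = [{"GeneSymbol": "BRCA1", "expression_level": 3.0}]

warehouse = integrate(
    [to_common_schema(r, LAB_A_FIELDS, "lab_a") for r in lab_a],
    [to_common_schema(r, LAB_B_FIELDS, "lab_b") for r in lab_b],
)
for gene_id, records in warehouse.items():
    print(gene_id, [(r["source"], r["expression"]) for r in records])
```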

2) Alignment, indexing, similarity search, and comparative analysis of multiple nucleotide/protein sequences:

BLAST and FASTA, in particular, are widely used tools for the systematic analysis of genomic and proteomic data. Biological sequence analysis methods differ from many sequential pattern analysis algorithms proposed in data mining: they must allow for gaps and mismatches between sequences in order to handle insertions, deletions, and mutations. For protein sequences, two amino acids should also be considered a “match” if one can be derived from the other by substitutions that are likely to occur in nature.
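The idea of counting likely substitutions as matches can be sketched as follows. The similarity groups are a deliberately coarse assumption made for this example; practical tools score substitutions with matrices such as PAM or BLOSUM rather than yes/no groups.

```python
# Toy similarity groups of one-letter amino acid codes.
# This grouping is an illustrative assumption, not a real substitution matrix.
SIMILARITY_GROUPS = [
    set("ILVM"),  # hydrophobic
    set("FYW"),   # aromatic
    set("KRH"),   # positively charged
    set("DE"),    # negatively charged
    set("STNQ"),  # polar, uncharged
]


def is_match(a: str, b: str) -> bool:
    """True if the residues are identical or fall in the same toy group."""
    a, b = a.upper(), b.upper()
    if a == b:
        return True
    return any(a in group and b in group for group in SIMILARITY_GROUPS)


def match_fraction(seq1: str, seq2: str) -> float:
    """Fraction of positions that match in two equal-length, ungapped sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    if not seq1:
        return 0.0
    hits = sum(is_match(x, y) for x, y in zip(seq1, seq2))
    return hits / len(seq1)


print(is_match("L", "I"))              # same group
print(is_match("L", "D"))              # different groups
print(match_fraction("LKDF", "IRAF"))  # three of four positions match
```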

There is a combinatorial number of ways to approximately align multiple sequences. Two common approaches are:
1) reducing a multiple alignment to a series of pairwise alignments and then combining the results;
2) using Hidden Markov Models (HMMs).

Multiple alignments can be used to identify highly conserved residues among genomes, and to build phylogenetic trees that infer evolutionary relationships among species. Genomic and proteomic sequences isolated from diseased and healthy tissues can be compared to identify critical differences between them. Sequences that occur more frequently in the diseased samples may indicate genetic factors of the disease.
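The pairwise building block of the first approach can be sketched with the classic Needleman-Wunsch dynamic program for global alignment. The scoring values (match +1, mismatch -1, gap -1) and the function names are assumptions chosen for this example; the sketch returns only scores, not the alignments themselves, and does not perform the combining step.

```python
from itertools import combinations


def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Best global alignment score of a and b (Needleman-Wunsch)."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = prev[j] + gap      # a[i-1] aligned to a gap
            gap_in_a = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diagonal, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]


def pairwise_scores(sequences: list) -> dict:
    """Score every pair of sequences, keyed by their positions in the list."""
    return {
        (i, j): global_alignment_score(sequences[i], sequences[j])
        for i, j in combinations(range(len(sequences)), 2)
    }


print(global_alignment_score("ACGT", "AGT"))
print(pairwise_scores(["ACGT", "AGT", "ACGT"]))
```

A progressive multiple aligner would typically use scores like these to decide which sequences to merge first.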

3) Discovery of structural patterns and analysis of genetic networks and protein pathways:

Protein sequences are folded into 3D structures, and such structures interact with each other based on their relative positions and the distances between them. Such complex interactions lead to the formation of genetic networks and protein pathways. It is important to develop powerful and scalable data mining methods to discover patterns and to study regularities and irregularities in complex biological networks.
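One very simple pattern to look for in an interaction network is a highly connected node. The sketch below builds an undirected network from interaction pairs and lists nodes whose degree reaches a threshold. The protein labels `p1` to `p5`, the function names, and the threshold are hypothetical; this is only a starting point, not a scalable mining method.

```python
from collections import defaultdict


def build_network(interactions: list) -> dict:
    """Build an undirected adjacency map from (a, b) interaction pairs.

    Self-interactions are skipped, so a node that only interacts with
    itself does not appear in the result.
    """
    adjacency = defaultdict(set)
    for a, b in interactions:
        if a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)


def hubs(network: dict, min_degree: int) -> list:
    """Names of nodes with at least min_degree distinct neighbours, sorted."""
    return sorted(node for node, nbrs in network.items() if len(nbrs) >= min_degree)


# Made-up interaction pairs between hypothetical proteins.
interactions = [("p1", "p2"), ("p1", "p3"), ("p1", "p4"), ("p2", "p3"), ("p4", "p5")]
network = build_network(interactions)
print(hubs(network, min_degree=3))
```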

4) Association and path analysis: identifying co-occurring gene sequences and linking genes to different stages of disease development:

Many studies have focused on the comparison of one gene to another. However, most diseases are not triggered by a single gene but by a combination of genes acting together. Association analysis methods can be used to determine the kinds of genes that are likely to co-occur in target samples. Because a group of genes may contribute to a disease process at different stages, path analysis is expected to play an important role here.
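Co-occurrence can be sketched by counting, for every pair of genes, the fraction of samples in which both appear (the pair's support) and keeping pairs above a threshold. The gene labels `g1` to `g4`, the sample contents, and the threshold are invented for illustration; full association mining would extend this to larger gene sets and to rule confidence.

```python
from collections import Counter
from itertools import combinations


def frequent_gene_pairs(samples: list, min_support: float) -> dict:
    """Return gene pairs whose support (fraction of samples) is >= min_support."""
    if not samples:
        return {}
    counts = Counter()
    for genes in samples:
        # Sort so each unordered pair is always counted under the same key.
        for pair in combinations(sorted(set(genes)), 2):
            counts[pair] += 1
    total = len(samples)
    return {
        pair: count / total
        for pair, count in counts.items()
        if count / total >= min_support
    }


# Each sample lists hypothetical genes observed in one target sample.
samples = [
    {"g1", "g2", "g3"},
    {"g1", "g2"},
    {"g2", "g3"},
    {"g1", "g2", "g4"},
]
print(frequent_gene_pairs(samples, min_support=0.5))
```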

5) Visualization tools in genetic data analysis:

Alignments among genomic or proteomic sequences, and the interactions between them, can be expressed in graphic forms and transformed into various kinds of easy-to-understand visual displays. Such displays facilitate pattern understanding, knowledge discovery, and interactive data exploration.
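One of the simplest visual displays for comparing two sequences is a dot plot, in which a mark is placed wherever a base in one sequence equals a base in the other, so that shared regions show up as diagonal runs. The text-only sketch below is an illustration under that assumption; the function name `dot_plot` and the mark characters are choices made for this example.

```python
def dot_plot(seq_a: str, seq_b: str, hit: str = "*", miss: str = ".") -> list:
    """Return one text row per symbol of seq_a; column j marks equality with seq_b[j]."""
    return [
        "".join(hit if a == b else miss for b in seq_b)
        for a in seq_a
    ]


# The second sequence is the first with one base removed.
rows = dot_plot("ACGTAC", "ACTAC")
print("  ACTAC")
for symbol, row in zip("ACGTAC", rows):
    print(symbol, row)
```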

Thank you
