Computational Biology
scRNA
- Gene Expression Omnibus
Public functional genomics database.
- Single Cell Portal
Public database for single-cell RNA data.
- Single Cell Expression Atlas
Public database for single-cell RNA data.
Compound
- PubChem
One of the largest chemical databases, covering compounds, genes, and proteins.
- ChEBI
Chemical database focused on small chemical compounds.
- ChEMBL
Database of bioactive molecules with drug-like properties.
- ChemSpider
Chemical structure database.
- KEGG COMPOUND
Collection of small molecules and biopolymers.
- LIPID MAPS
Database of lipids.
- Rhea
Database of chemical reactions.
- Drug Repurposing Hub
Collection of drug repurposing data, including drugs, mechanisms of action (MoA), and targets.
- Therapeutic Target Database
Collection of drug-target, target-disease, and drug-disease datasets.
- ZINC ligand discovery database
Free database of commercially-available compounds for virtual screening.
- MoleculeNet
Benchmark for molecular machine learning.
- Ames Mutagenicity dataset
Dataset for predicting mutagenicity.
- ADCdb
Database for antibody-drug conjugates.
Pathway
- PathwayCommons
Database of Pathways and Interactions.
- KEGG PATHWAY
Collection of drawn pathway maps.
- WikiPathways
Database of biological pathways.
Mass Spectra
- MassBank
Open-source databases and tools for mass spectrometry reference spectra.
- MoNA (MassBank of North America)
Meta database of metabolite mass spectra, metadata and associated compounds.
Protein
- The Human Protein Atlas
One of the largest human protein databases, covering cells, tissues, and organs.
- RCSB Protein Data Bank (PDB)
Repository of 3D structural data of large biological molecules.
- UniProt
Collection of functional information on proteins.
- AlphaFold Protein Structure Database
Database of 3D protein structures.
- Critical Assessment of Structure Prediction (CASP)
Experiment for advancing the methods of predicting protein structure from sequence.
- Uniclust
Collection of clustered protein sequence databases.
- CATH database
Hierarchical classification of protein domain structures.
Genome
- Human Genome Resources at NCBI
Database of image, proteomics, transcriptomics and systems biology.
- GenBank
Database of genetic sequences offered by NCBI.
- UCSC Genome Browser
Genome browser offered by UCSC.
- cBioPortal
Database of cancer genomics, with an overview across many patients.
- 10x Genomics Dataset
Collection of single-cell datasets.
- The Genotype-Tissue Expression (GTEx)
Resource for studying human gene expression and regulation.
- Dependency Map (DepMap)
Genome-wide CRISPR-Cas9 screens in cancer cell lines.
- Catalogue Of Somatic Mutations In Cancer (COSMIC)
Comprehensive resource for exploring somatic mutations in human cancers.
- MGnify
Free resource for archiving, analysis, and browsing of metagenomic and metatranscriptomic data.
- JASPAR
Open-access database of curated, non-redundant transcription factor binding profiles.
Disease
Interaction
- Drug (-Cell line) Response
  - NCI60 - A database focused on 60 cancer cell lines screened against many drugs.
  - Genomics of Drug Sensitivity in Cancer (GDSC) - A drug sensitivity database covering about 1,000 human cancer cell lines and hundreds of compounds.
  - Cancer Cell Line Encyclopedia - A database of about 1,000 cancer cell lines.
  - CellMiner Cross Database (CellMinerCDB) - Integration of multiple cancer cell line databases.
- Chemical Protein Interaction
  - STITCH - A database of chemical-protein interactions.
  - BindingDB - A database of compounds and targets.
  - PDBBind - Database of experimentally measured binding affinity data for biomolecular complexes.
  - CrossDocked2020 - Large-scale dataset f…
- Drug Gene Interaction
  - DGIdb - A database of drug-gene interactions and the druggable genome.
  - Comparative Toxicogenomics Database - A database of chemical-gene interactions, chemical-disease associations, gene-disease associations, and chemical-phenotype associations.
  - SNAP - A dataset which contains drug-gene inter…
- Knowledge Graph
  - Drug Mechanism Database (DrugMechDB) - A database of mechanisms of action from a drug to a disease.
  - DRKG - A library for biological knowledge graphs.
Clinical Trial
- ClinicalTrials.gov
Database of privately and publicly funded clinical studies.
- ICD10
International Classification of Diseases, 10th revision.
- EU Drug Regulating Authorities Clinical Trials DB (EudraCT)
European database of clinical trials.
- MIMIC-IV
Freely accessible critical care database.
API
- PubMed esearch
API for searching articles in PubMed.
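As a minimal sketch of how the esearch endpoint can be queried, the snippet below builds a request against the NCBI E-utilities `esearch.fcgi` URL and reads the PubMed IDs from the JSON response. The helper names (`build_esearch_url`, `parse_pmids`, `search_pubmed`) are hypothetical and chosen for illustration; it uses only the Python standard library and assumes unauthenticated access without an API key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, retmax=5):
    """Build an ESearch query URL for PubMed that requests JSON output."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_pmids(payload):
    """Extract the list of PubMed IDs from an ESearch JSON response body."""
    return json.loads(payload)["esearchresult"]["idlist"]


def search_pubmed(term, retmax=5):
    """Send the query over the network and return the matching PubMed IDs."""
    with urlopen(build_esearch_url(term, retmax), timeout=30) as response:
        return parse_pmids(response.read().decode("utf-8"))
```

Calling `search_pubmed("single cell RNA-seq")` requires network access and returns a list of PubMed ID strings. Separating URL construction and parsing from the network call keeps the first two easy to check offline.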
Preprocess
- Chemistry Development Kit
Software library for cheminformatics and machine learning.
- FlashDeconv
High-performance spatial transcriptomics deconvolution. Processes 1M spots in ~3 minutes.
- RDKit
Software library for cheminformatics and machine learning.
- ChatSpatial
MCP server enabling spatial transcriptomics analysis via natural language.
- Scanpy
scRNA analysis library in Python.
- Seurat
scRNA analysis library in R.
- Squidpy
Spatial single cell analysis library in Python.
Drug Response Prediction
Drug Repurposing
- DeepPurpose
A deep learning library for drug repurposing.
Drug Target Interaction
- NeoDTI
A library for drug-target interaction prediction.
Compound Protein Interaction
- MCPINN
A library for drug discovery using compound-protein interaction and machine learning.
- TransformerCPI
A library for compound-protein interaction prediction using a Transformer.
Pre-trained embedding
- Evolutionary Scale Modeling
A library for protein embeddings.
- ChemBERTa-2
A library for chemical embedding and prediction.
LLM for biology
- AI4Chem/ChemLLM-7B-Chat
LLM for chemical and molecular science.
- BioGPT
LLM for biomedical text generation.
- GeneGPT
LLM for biomedical information that uses several APIs.
- GenePT
Foundation LLM for single-cell data.
- scPRINT
scPRINT is pretrained on 50M cells to denoise and perform zero imputation of any single cell RNAseq profile.
Foundation models
- scFoundation
A large-scale pretrained foundation model for single-cell gene expression data, enabling multiple downstream analysis tasks.
- scGPT
A transformer-based foundation model pretrained on millions of single-cell profiles to support various single-cell analysis tasks.
- BulkFormer
A foundation model pretrained on large-scale bulk RNA-seq data to learn general transcriptomic representations for downstream analysis tasks.