Python Deep Learning Projects

Abstract:

Protein sequence data from computational biology and bioinformatics are ideal input for Language Models (LMs) taken from Natural Language Processing (NLP). These LMs reach for new prediction frontiers at low inference cost. We trained two auto-regressive models (Transformer-XL, XLNet) and four auto-encoder models (BERT, Albert, Electra, T5) on UniRef and BFD data containing up to 393 billion amino acids. The protein LMs (pLMs) were trained on the Summit supercomputer using 5616 GPUs and on a TPU Pod with up to 1024 cores. Dimensionality reduction showed that the raw pLM embeddings from unlabeled data captured some biophysical features of protein sequences. We validated the advantage of using these embeddings as exclusive input for several downstream tasks: (1) per-residue (per-token) prediction of protein secondary structure (3-state accuracy Q3 = 81%-87%); (2) per-protein (pooled) predictions of sub-cellular location (ten-state accuracy Q10 = 81%) and of membrane versus water-soluble proteins (2-state accuracy Q2 = 91%). For secondary structure, the most informative embeddings (ProtT5) for the first time outperformed the state of the art without using multiple sequence alignments (MSAs) or evolutionary information, thereby bypassing expensive database searches. Taken together, the results suggested that pLMs learned some of the grammar of the language of life.
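
The downstream tasks above treat the pre-trained pLM purely as a feature extractor: per-residue embeddings feed the secondary-structure predictor, while a pooled per-protein embedding feeds the location and membrane classifiers. The sketch below is a minimal illustration of that idea, not the project's own code; it assumes the publicly released ProtT5 encoder checkpoint Rostlab/prot_t5_xl_half_uniref50-enc on Hugging Face, the transformers, torch, and sentencepiece packages, an illustrative example sequence, and simple mean pooling.

```python
# Minimal sketch: using ProtT5 embeddings as the exclusive input features.
# Assumptions (not from the abstract): the public Hugging Face checkpoint
# "Rostlab/prot_t5_xl_half_uniref50-enc", the transformers/torch/sentencepiece
# packages, and the example sequence below.
import re

import torch
from transformers import T5EncoderModel, T5Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

checkpoint = "Rostlab/prot_t5_xl_half_uniref50-enc"
tokenizer = T5Tokenizer.from_pretrained(checkpoint, do_lower_case=False)
model = T5EncoderModel.from_pretrained(checkpoint).to(device)
model.eval()

# ProtT5 expects space-separated residues; rare amino acids map to X.
sequences = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"]  # illustrative sequence only
sequences = [" ".join(re.sub(r"[UZOB]", "X", s)) for s in sequences]

batch = tokenizer.batch_encode_plus(
    sequences, add_special_tokens=True, padding="longest", return_tensors="pt"
).to(device)

with torch.no_grad():
    out = model(input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"])

# Per-residue (per-token) embeddings: one 1024-d vector per amino acid,
# dropping the special end-of-sequence token added by the tokenizer.
seq_len = len(sequences[0].replace(" ", ""))
per_residue = out.last_hidden_state[0, :seq_len]   # shape: (L, 1024)

# Per-protein embedding for pooled tasks (e.g. sub-cellular location):
# a simple mean over the residue dimension.
per_protein = per_residue.mean(dim=0)              # shape: (1024,)

print(per_residue.shape, per_protein.shape)
```

In a full pipeline, the per-residue matrix would be passed to a small per-token classifier (e.g. a shallow CNN) for secondary structure, and the pooled vector to a feed-forward classifier for sub-cellular location or membrane versus water-soluble prediction.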

Note: Please discuss with our team before submitting this abstract to your college. The abstract or synopsis can be adapted to individual student project requirements.


To download this project's code along with the thesis report and project training, Click Here.
