10 Exciting Project Ideas Using Large Language Models (LLMs) for Your Portfolio | by Leonie Monigatti | May, 2023
Here are the rough steps you would follow to realize this project:
- Download the video or podcast transcript and load into documents
- Split long documents into chunks
- Summarize the transcript with an LLM
- Optional: Wrap it all in a user-friendly command line interface or even a web application.
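The chunking and summarization steps above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: `split_into_chunks`, `summarize`, the prompt wording, and the word-based chunk sizes are all assumptions for illustration, and `llm` is a hypothetical callable standing in for whichever LLM client you use.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int = 200, overlap: int = 20) -> List[str]:
    """Split text into chunks of at most max_words words with overlapping edges."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def summarize(text: str, llm: Callable[[str], str],
              max_words: int = 200, overlap: int = 20) -> str:
    """Summarize each chunk, then summarize the combined chunk summaries.

    `llm` is a hypothetical stand-in: any function mapping a prompt to a completion.
    """
    chunks = split_into_chunks(text, max_words=max_words, overlap=overlap)
    if not chunks:
        return ""
    partial = [llm(f"Summarize the following transcript excerpt:\n\n{chunk}")
               for chunk in chunks]
    if len(partial) == 1:
        return partial[0]
    combined = "\n".join(partial)
    return llm(f"Combine these partial summaries into one summary:\n\n{combined}")
```

Summarizing chunk by chunk and then summarizing the summaries keeps every individual prompt short, at the cost of one extra LLM call.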
Project Idea 4: Information Extraction
Another practical use case of LLMs is information extraction. For example, you can provide an LLM with a few examples, each pairing a text with the information you want it to extract.
Remember the cover letter generator from earlier? You could extend it with a component to extract the relevant information from a job posting directly:
prompt = """
This program will extract relevant information from a job posting.
Here are some examples:
Job posting:
Lead engineer for software integration (remote possible)
At XYZ Co. we are making the world a better place.
To do so we are looking for a lead engineer with experience in Python and JIRA.
Extracted Text:
Role: Lead engineer for software integration
Company: XYZ Co.
Requirements: Python, JIRA
--
Job posting:
Senior software engineer - Autonomous Mobility
ABC Inc. is a great company.
We are looking for someone with great ability to write complex C code.
Extracted Text:
"""
Here are the rough steps you would follow to realize this project:
- Load job description from job posting into a document
- Extract the relevant information with the LLM by writing a few-shot prompt, i.e., a prompt that includes worked examples
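To avoid hand-editing the prompt string for every new posting, the prompt can be assembled from a list of examples and the model's answer parsed back into fields. The sketch below follows the prompt layout used in this article; the helper names `build_prompt` and `parse_fields` are assumptions for illustration, and the call to the LLM itself is left out.

```python
from typing import Dict, List, Tuple

INSTRUCTION = "This program will extract relevant information from a job posting."


def build_prompt(examples: List[Tuple[str, Dict[str, str]]], new_posting: str) -> str:
    """Assemble a few-shot prompt: solved examples first, then the unsolved posting."""
    parts = [INSTRUCTION, "Here are some examples:", ""]
    for posting, fields in examples:
        parts.append("Job posting:")
        parts.append(posting.strip())
        parts.append("Extracted Text:")
        parts.extend(f"{key}: {value}" for key, value in fields.items())
        parts.append("--")  # separator between examples
    parts += ["Job posting:", new_posting.strip(), "Extracted Text:"]
    return "\n".join(parts) + "\n"


def parse_fields(completion: str) -> Dict[str, str]:
    """Parse 'Key: value' lines from the completion, stopping at the '--' separator."""
    fields: Dict[str, str] = {}
    for line in completion.splitlines():
        line = line.strip()
        if line == "--":
            break  # the model started inventing a new example; ignore the rest
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields
```

Ending the prompt with `Extracted Text:` nudges the model to continue in the same `Key: value` format as the examples, which is what makes the completion easy to parse.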
Project Idea 5: Web Scraper
LLMs are exceptionally good at rewriting (transforming) text. Typical tasks include:
- rewriting text in a specific style (e.g., the style of “The Economist” or “New Yorker”)
- rewriting text at a specific reading level (e.g., grade 6 for easier readability)
- reformatting information from any format to any other format
- text correction (e.g., spelling and grammar)
- translations
It is very common to use LLMs to convert text from one form to another.
A creative idea to apply this rewriting capability is to use it for web scraping. If you have ever written a web scraper, you know how tedious it is. What if you could use LLMs to build a more generic solution to extract data from unstructured websites?
This is exactly what mangotree has done:
Here are the rough steps you would follow to realize this project:
- Scrape the website’s source code and load into a document
- Split long documents into chunks
- Extract the relevant data from the source code using the LLM (see extraction)
- Reformat the extracted data into the desired format with the LLM, again using a prompt that includes worked examples
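Raw HTML is mostly markup, so it usually pays to reduce a page to its visible text before sending it to the LLM. The sketch below does this with Python's standard `html.parser` module and then builds a one-shot reformatting prompt. The names `TextExtractor`, `html_to_text`, and `build_reformat_prompt`, as well as the prompt wording and the JSON target format, are assumptions for illustration. Fetching the page and calling the LLM are left out.

```python
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML page, one text fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def build_reformat_prompt(page_text: str, example_text: str, example_json: str) -> str:
    """Build a one-shot prompt asking the LLM to turn page text into JSON."""
    return (
        "Reformat the information in the text into JSON.\n\n"
        f"Text:\n{example_text}\nJSON:\n{example_json}\n--\n"
        f"Text:\n{page_text}\nJSON:\n"
    )
```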
The project ideas so far were based on generating new text. Another use case of LLMs is based on text representations: you can pass a text to an embeddings model and get back a numerical representation of it, the “text embeddings”.
These text embeddings enable you to perform mathematical operations, including similarity calculations, or apply Machine Learning algorithms.
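A common similarity calculation on embeddings is cosine similarity, which measures the angle between two vectors. Here is a small sketch in plain Python. The three-dimensional vectors are made up for illustration; real embeddings come from an embeddings model and typically have hundreds or thousands of dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b: 1.0 = same direction, 0.0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


# Made-up "embeddings" for illustration only.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
car = [0.0, 0.2, 0.9]

# Texts with related meaning should score higher than unrelated ones.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, car))
```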
In this section, we will discuss project ideas based on the following use cases of text embeddings:
- Search and similarity: searchable database of your documents
- Question answering: question answering over documents or code base
- Clustering: clustering social media posts and podcast episodes into topics
- Classification: classify business inquiries from e-mails
Project Idea 6: Searchable Database of Your Documents
Embeddings can help us search for content based on similarity. Unlike keyword-based search engines, which match exact terms, an embedding-based search compares the embeddings of a search query with the embeddings of each document.
For example, you could turn your personal documents into a searchable database:
Another neat project is Andrej Karpathy’s weekend hack that enables you to search for a specific movie:
Here are the rough steps you would follow to realize a project like these:
- Load the files into documents
- Split long documents into chunks
- Generate and store the embeddings from the documents with an embeddings model
- Define the index query to retrieve the relevant files
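The store-and-query steps can be sketched with a tiny in-memory index. Note the big assumption: `toy_embed` is a hypothetical stand-in that only counts words, so it behaves like keyword matching. A real embeddings model would return dense vectors that also capture similarity in meaning, but the index and query logic would look much the same. `DocumentIndex` is likewise an illustrative name, not a library class.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def toy_embed(text: str) -> Dict[str, float]:
    """Hypothetical stand-in for an embeddings model: a normalized word-count vector."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}


class DocumentIndex:
    """Stores one embedding per document and ranks documents against a query."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, Dict[str, float]]] = []

    def add(self, name: str, text: str) -> None:
        self._entries.append((name, toy_embed(text)))

    def search(self, query: str, top_k: int = 3) -> List[Tuple[str, float]]:
        """Return the top_k (name, score) pairs, best match first."""
        q = toy_embed(query)
        scored = []
        for name, vec in self._entries:
            # Vectors are normalized, so the dot product is the cosine similarity.
            score = sum(weight * vec.get(word, 0.0) for word, weight in q.items())
            scored.append((name, score))
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:top_k]


index = DocumentIndex()
index.add("taxes.txt", "Tax return 2022 with income and deductions")
index.add("recipe.txt", "Pasta recipe with tomato and basil")
print(index.search("income tax", top_k=1)[0][0])  # prints "taxes.txt"
```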
Project Idea 7: Question Answering over Documents
Question answering can be viewed as a combination of search (see search) and summarization (see summarization). It can help you work through any document in a more intuitive way.
You can use it to chat with your documents or any code base:
Here are the rough steps you would follow to realize this project:
- Load the files or source code into documents
- Split long documents into chunks
- Generate and store the embeddings from the documents with an embeddings model
- Define the index query to retrieve context and prompt the LLM on it
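The last step, prompting the LLM on the retrieved context, might look like the sketch below. It assumes the similarity scores between the question and each chunk have already been computed from their embeddings. The function names, the prompt wording, and the `llm` callable are hypothetical stand-ins for illustration.

```python
from typing import Callable, List, Tuple


def build_qa_prompt(question: str, scored_chunks: List[Tuple[float, str]],
                    top_k: int = 2) -> str:
    """Put the top_k highest-scoring chunks into the prompt as context."""
    best = sorted(scored_chunks, key=lambda pair: pair[0], reverse=True)[:top_k]
    context = "\n---\n".join(text for _, text in best)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say that you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer(question: str, scored_chunks: List[Tuple[float, str]],
           llm: Callable[[str], str], top_k: int = 2) -> str:
    """`llm` is a hypothetical callable that maps a prompt to a completion."""
    return llm(build_qa_prompt(question, scored_chunks, top_k))
```

Restricting the model to the retrieved context is a design choice: it trades some flexibility for answers that can be traced back to specific chunks.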
Project Idea 8: Clustering Documents into Topics
Aside from searching documents or answering questions about them, you can also use embeddings to group documents into categories by using clustering (unsupervised learning).
For example, you can use clustering to find topics in a podcast episode.
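As a rough illustration of clustering embeddings, here is a compact k-means sketch in plain Python. The choice of k-means is an assumption, since the article does not name a clustering algorithm, and the two-dimensional vectors are made up. In a real project you would cluster the embeddings of transcript chunks, probably with an established library rather than this toy version.

```python
from typing import List, Sequence


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def kmeans(vectors: List[Sequence[float]], k: int, iterations: int = 20) -> List[int]:
    """Assign each vector to one of k clusters; returns one cluster label per vector."""
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    # Deterministic farthest-point initialization: start from the first vector,
    # then repeatedly pick the vector farthest from the centroids chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(_dist2(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda i: _dist2(v, centroids[i])) for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for i in range(k):
            members = [v for v, label in zip(vectors, labels) if label == i]
            if members:
                centroids[i] = _mean(members)
    return labels


# Made-up chunk "embeddings": two chunks about one topic, two about another.
chunk_vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
print(kmeans(chunk_vectors, k=2))  # prints [0, 0, 1, 1]
```

Each resulting cluster can then be treated as a topic, for example by asking an LLM to name it based on the chunks it contains.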