Techno Blender

10 Exciting Project Ideas Using Large Language Models (LLMs) for Your Portfolio | by Leonie Monigatti | May, 2023



Here are the rough steps you would follow to realize this project:

  1. Download the video or podcast transcript and load it into documents
  2. Split long documents into chunks
  3. Summarize the transcript with an LLM
  4. Optional: Wrap it all in a user-friendly command line interface or even a web application.
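The summarization steps above can be sketched in Python as a simple map-reduce: summarize each chunk, then summarize the partial summaries. This is a minimal sketch under stated assumptions: `llm` is a hypothetical callable that takes a prompt string and returns the model's reply (wrap whichever LLM client you use), and the character-based splitter is a simplified stand-in for a proper text splitter.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int = 2000, overlap: int = 200) -> List[str]:
    """Split text into overlapping windows of at most max_chars characters."""
    if max_chars <= overlap:
        raise ValueError("max_chars must be larger than overlap")
    chunks: List[str] = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step back by `overlap` characters so text cut at a boundary appears in both chunks.
        start = end - overlap
    return chunks


def summarize_transcript(transcript: str, llm: Callable[[str], str], max_chars: int = 2000) -> str:
    """Map-reduce summarization with a hypothetical `llm(prompt) -> str` callable."""
    chunks = split_into_chunks(transcript, max_chars)
    if not chunks:
        return ""
    # Map: summarize every chunk independently.
    partial = [llm(f"Summarize this part of a transcript:\n\n{chunk}") for chunk in chunks]
    if len(partial) == 1:
        return partial[0]
    # Reduce: merge the partial summaries into one final summary.
    bullet_list = "\n".join(f"- {summary}" for summary in partial)
    return llm(f"Combine these partial summaries into one concise summary:\n\n{bullet_list}")
```

The overlap reduces the chance that a sentence is lost at a chunk boundary; the chunk size should be chosen so that each prompt fits into the model's context window.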

Project Idea 4: Information Extraction

Another useful application of LLMs is information extraction. For example, you can give an LLM a few examples, each pairing a text with the information you want extracted from it.

Remember the cover letter generator from earlier? You could extend it with a component that extracts the relevant information directly from a job posting:

prompt = """
This program will extract relevant information from a job posting.
Here are some examples:

Job posting:
Lead engineer for software integration (remote possible)

At XYZ Co. we are making the world a better place.
To do so we are looking for a lead engineer with experience in Python and JIRA.

Extracted Text:
Role: Lead engineer for software integration.
Company: XYZ Co.
Requirements: Python, JIRA
--
Job posting:
Senior software engineer - Autonomous Mobility

ABC Inc. is a great company.
We are looking for someone with great ability to write complex C code.

Extracted Text:
"""

Here are the rough steps you would follow to realize this project:

  1. Load the job description from the job posting into a document
  2. Extract the relevant information with the LLM, using a few-shot prompt that contains examples like the one above
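Step 2 can be sketched as two helpers: one assembles a few-shot prompt in the same layout as the prompt above, and one parses the `Key: value` lines the model is expected to return. The `Example` class, both function names, and the assumption that the model answers in `Key: value` lines are illustrative choices, not part of any library; sending the prompt to an LLM is left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Example:
    posting: str    # raw job posting text
    extracted: str  # the "Key: value" lines we want the model to imitate


def build_extraction_prompt(examples: List[Example], posting: str) -> str:
    """Assemble a few-shot prompt: instructions, worked examples, then the new posting."""
    parts = ["This program will extract relevant information from a job posting.",
             "Here are some examples:", ""]
    for ex in examples:
        parts += ["Job posting:", ex.posting.strip(), "",
                  "Extracted Text:", ex.extracted.strip(), "--"]
    # The new posting goes last; the model is expected to continue after "Extracted Text:".
    parts += ["Job posting:", posting.strip(), "", "Extracted Text:"]
    return "\n".join(parts)


def parse_extracted(completion: str) -> Dict[str, str]:
    """Turn 'Key: value' lines from the model's completion into a dict."""
    fields: Dict[str, str] = {}
    for line in completion.splitlines():
        if line.strip() == "--":
            break  # the model started inventing another example; stop here
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields
```

Parsing with `partition` splits only at the first colon, so values that themselves contain colons are kept intact.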

Project Idea 5: Web Scraper

LLMs are exceptionally good at rewriting (transforming) text, for example:

  • rewriting text in a specific style (e.g., the style of “The Economist” or “New Yorker”)
  • rewriting text at a specific reading level (e.g., grade 6 for easier readability)
  • reformatting information from one format into another
  • text correction (e.g., spelling and grammar)
  • translations

It is very common to use LLMs to convert text from one form to another.

A creative way to apply this rewriting capability is web scraping. If you have ever written a web scraper, you know how tedious it is. What if you could use LLMs to build a more generic solution for extracting data from unstructured websites?

This is exactly what mangotree has done:

Here are the rough steps you would follow to realize this project:

  1. Scrape the website’s source code and load it into a document
  2. Split long documents into chunks
  3. Extract the relevant data from the source code with the LLM (see Project Idea 4: Information Extraction)
  4. Reformat the extracted data into the desired format with the LLM, using a few-shot prompt with examples
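A minimal sketch of these steps using only the Python standard library: fetch a page, reduce the markup to its visible text, split it into chunks, and build one prompt per chunk that asks the LLM to extract the wanted fields and reformat them as JSON. Stripping the tags is a design choice made here to save tokens; the steps above pass the raw source code, which keeps structural hints such as table layout. The function names and prompt wording are hypothetical, and sending the prompts to an LLM is left out.

```python
import urllib.request
from html.parser import HTMLParser
from typing import List


class VisibleTextParser(HTMLParser):
    """Collects text nodes while skipping the contents of <script> and <style> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def fetch_html(url: str) -> str:
    """Step 1: download the page source (no JavaScript rendering, no retries)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def html_to_text(html: str) -> str:
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def build_scrape_prompts(html: str, fields: List[str], max_chars: int = 3000) -> List[str]:
    """Steps 2 to 4: chunk the page text and build one extract-and-reformat prompt per chunk."""
    text = html_to_text(html)
    chunks = [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    keys = ", ".join(fields)
    return [
        f"Extract the fields {keys} from the web page text below. "
        f"Reply with a JSON object using exactly these keys and null for missing values.\n\n"
        f"Text:\n{chunk}"
        for chunk in chunks
    ]
```

Because the requested output format lives in the prompt rather than in site-specific parsing code, the same helper can be pointed at differently structured pages.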

The project ideas so far have been based on generating new text. Another use case of LLMs builds on text representations: you can feed a text into an embeddings model and get back a numerical representation of it, the “text embeddings”.

These text embeddings let you perform mathematical operations on text, such as similarity calculations, or apply machine learning algorithms to it.
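A common similarity calculation on embeddings is cosine similarity: the dot product of two vectors divided by the product of their lengths. A minimal, dependency-free sketch follows; the three-dimensional vectors are made-up toy values, whereas real embedding models return vectors with hundreds or thousands of dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|); returns 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings" for illustration only.
query = [0.9, 0.1, 0.0]
doc_a = [0.8, 0.2, 0.1]
doc_b = [0.0, 0.1, 0.9]
print(cosine_similarity(query, doc_a) > cosine_similarity(query, doc_b))  # True
```

Values close to 1 mean two texts point in the same direction in embedding space, values near 0 mean they are unrelated, and negative values mean they point in opposite directions.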

In this section, we will discuss project ideas based on the following embedding use cases:

  1. Search and similarity: searchable database of your documents
  2. Question answering: question answering over documents or code base
  3. Clustering: clustering social media posts and podcast episodes into topics
  4. Classification: classify business inquiries from e-mails

Project Idea 6: Searchable Database of Your Documents

Embeddings can help us search for content based on similarity. Unlike keyword-based search engines, which match exact words, we can calculate the similarity between a document’s embedding and the embedding of a search query, so relevant documents can be found even when they use different wording.

For example, you could turn your personal documents into a searchable database:

Another neat project is Andrej Karpathy’s weekend hack that enables you to search for a specific movie:

Here are the rough steps you would follow to realize projects like these:

  1. Load the files into documents
  2. Split long documents into chunks
  3. Generate and store the embeddings from the documents with an embeddings model
  4. Define the index query to retrieve the relevant files
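The four steps can be sketched as a small in-memory index. This is a minimal sketch under loud assumptions: `toy_embed` is a hypothetical placeholder that counts words, so it behaves like keyword search and only exists to make the example runnable offline; for real semantic search, pass a function that calls an actual embeddings model (a dense output list can be adapted with `dict(enumerate(vector))`). The class and method names are illustrative.

```python
import math
import re
from collections import Counter
from pathlib import Path
from typing import Any, Callable, Dict, List, Tuple

# Sparse vector: {dimension: value}. Dense model outputs can be adapted with dict(enumerate(vector)).
Vector = Dict[Any, float]


def toy_embed(text: str) -> Vector:
    """HYPOTHETICAL placeholder for an embeddings model: bag-of-words counts.

    It only rewards shared words (keyword-like behaviour); swap in a real model for semantic search.
    """
    return {word: float(n) for word, n in Counter(re.findall(r"[a-z0-9]+", text.lower())).items()}


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class DocumentIndex:
    """In-memory index of (source, chunk, embedding) triples searched by cosine similarity."""

    def __init__(self, embed: Callable[[str], Vector] = toy_embed, chunk_words: int = 200) -> None:
        self.embed = embed
        self.chunk_words = chunk_words
        self.entries: List[Tuple[str, str, Vector]] = []

    def add(self, source: str, text: str) -> None:
        # Steps 2 and 3: split into word-based chunks, embed each chunk, store it.
        words = text.split()
        for i in range(0, len(words), self.chunk_words):
            chunk = " ".join(words[i:i + self.chunk_words])
            self.entries.append((source, chunk, self.embed(chunk)))

    def add_files(self, paths: List[str]) -> None:
        # Step 1: load plain-text files from disk.
        for path in paths:
            self.add(path, Path(path).read_text(encoding="utf-8"))

    def query(self, question: str, k: int = 3) -> List[Tuple[float, str, str]]:
        # Step 4: rank stored chunks by similarity to the query embedding.
        q = self.embed(question)
        scored = [(cosine(q, emb), source, chunk) for source, chunk, emb in self.entries]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:k]
```

For example:

```python
index = DocumentIndex()
index.add("taxes.txt", "tax return receipts for 2022")
index.add("recipes.txt", "pasta recipe with tomato sauce")
print(index.query("tomato pasta", k=1)[0][1])  # recipes.txt
```

For more than a few thousand chunks, a dedicated vector database or approximate nearest-neighbour library would replace the linear scan in `query`.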

Project Idea 7: Question Answering over Documents

Question answering can be viewed as a combination of search (see the searchable database idea above) and summarization (see the summarization project earlier). It can help you work through any document in a more intuitive way.

You can use it to chat with your documents or any code base:

Here are the rough steps you would follow to realize this project:

  1. Load source code into documents
  2. Split long documents into chunks
  3. Generate and store the embeddings from the documents with an embeddings model
  4. Query the index to retrieve the relevant context and prompt the LLM with it
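The retrieve-then-prompt part of steps 3 and 4 can be sketched as follows. `embed` and `llm` are hypothetical callables standing in for an embeddings model and an LLM client, `chunk_embeddings` are assumed to have been computed once and stored (step 3), and the prompt wording is only one possible choice.

```python
import math
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_qa_prompt(question: str, context: List[str]) -> str:
    joined = "\n\n---\n\n".join(context)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say that you don't know.\n\n"
        f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"
    )


def answer_question(
    question: str,
    chunks: List[str],
    chunk_embeddings: List[Sequence[float]],
    embed: Callable[[str], Sequence[float]],
    llm: Callable[[str], str],
    k: int = 3,
) -> str:
    # Retrieve: rank the stored chunk embeddings by similarity to the question.
    q = embed(question)
    ranked = sorted(range(len(chunks)), key=lambda i: cosine(q, chunk_embeddings[i]), reverse=True)
    context = [chunks[i] for i in ranked[:k]]
    # Generate: let the LLM answer from the retrieved context only.
    return llm(build_qa_prompt(question, context))
```

Restricting the model to the retrieved context keeps the prompt short and makes it easier to check which chunks (for example, which source files of a code base) an answer came from.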

Project Idea 8: Clustering Documents into Topics

Aside from querying documents or the information they contain, you can also use embeddings to sort documents into categories through clustering (unsupervised learning).

For example, you can use clustering to find topics in a podcast episode.

Or you can cluster posts on an online forum into topics.
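Once each post or transcript chunk has an embedding, a classic clustering algorithm such as k-means can group them. The sketch below is a plain-Python k-means for illustration only; in practice you would likely use a library implementation such as scikit-learn's `KMeans`. The function name and parameters are hypothetical, and the toy two-dimensional vectors in the test stand in for real embeddings.

```python
import random
from typing import List, Sequence, Tuple


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Sequence[float]]) -> List[float]:
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(vectors: List[Sequence[float]], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm: returns a cluster label per vector and the final centroids."""
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    rng = random.Random(seed)
    # Initialise centroids with k distinct input vectors.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [-1] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _dist2(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # converged: no vector changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = _mean(members)
    return labels, centroids
```

To turn clusters into readable topics, one option is to pass a few texts from each cluster to an LLM and ask it for a short topic label.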

