Your Data and Large Language Models

Eric Nosal

June 19, 2023


In our previous blog post about The Road to Large Language Models (LLMs), we mentioned that we would explore LLMs further and provide an in-depth example of how we can implement them to solve a real business problem. We're back to dive deeper into the practical use of LLMs and into the sensitivity, privacy, and governance of the data we use with them.

Past Language Models

Natural Language Processing (NLP) often involves a variety of common algorithms: N-gram models, which use the preceding words in a sequence to predict the next word; transformers, which use attention mechanisms (originally in an encoder-decoder architecture) to model sequences of text; and neural networks/deep learning, loosely inspired by the brain, which are capable of learning trends and patterns. While each of these models has its own specific purpose and advantages, getting a working solution to run is onerous and requires considerable expertise in NLP. Of these algorithms, neural networks are of particular interest because of their crucial role in deep learning, which is used to train models capable of NLP. This is precisely the area the teams behind ChatGPT and Bard have invested substantial effort in refining. By training deep neural networks on huge datasets, these models learned patterns and relationships in text data. Essentially, ChatGPT takes your prompt and generates the response that its model predicts best answers it.
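To make the N-gram idea concrete, here is a minimal sketch of a bigram (N = 2) model in Python. It counts which word follows each word in a training text and predicts the most frequent follower. The corpus is a made-up sentence used only for illustration; real N-gram systems add smoothing, longer contexts, and far larger corpora.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count how often each word follows each preceding word."""
    words = text.lower().split()
    counts: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts: dict[str, Counter], word: str) -> str | None:
    """Return the most frequent word seen after `word`, or None if `word` was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus, invented for this example.
corpus = (
    "oil production in the campos basin rose while "
    "oil production in the santos basin also rose"
)
model = train_bigrams(corpus)
print(predict_next(model, "oil"))  # production
print(predict_next(model, "in"))   # the
print(predict_next(model, "gas"))  # None
```

The model only knows what it has counted: it cannot predict anything after an unseen word, and it ignores everything except the single previous word. Those limitations are part of why neural approaches took over.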


Present Large Language Models

With the emergence of OpenAI's models and other open-source LLM tools, it is now possible to create interactive chatbot-style applications with only a few lines of code. This is a major contributor to the current disruption in the industry. For example, let’s say that your company has oil production information (dataset courtesy of luciodias: https://www.kaggle.com/datasets/luciodias/brazil-oil-production) stored in a relational database. Someone without any prior knowledge of Structured Query Language (SQL) can pull insights from the data using plain-English questions.

Output: 'The top oil producing basin is Campos with 2121923845.8102937 m³ of oil.'

With just a small amount of code, citizen development on complex datasets becomes easily accessible.
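As a rough illustration of the pattern behind this kind of plain-English querying, here is a minimal sketch in Python using the standard-library sqlite3 module. The table schema and the user's question are combined into a prompt, a language model returns SQL, and that SQL is run against the database. Everything specific here is hypothetical: the `production` table, its columns, the sample rows (Basin A/B/C with made-up volumes, not the Kaggle dataset), and `fake_llm`, which stands in for a real model call and returns a fixed query so the example runs offline.

```python
import sqlite3
from typing import Callable, List, Tuple


def build_prompt(question: str, schema: str) -> str:
    """Combine the database schema and the user's question into one LLM prompt."""
    return (
        "You are given this SQLite schema:\n"
        f"{schema}\n"
        "Write a single SQLite query that answers the question below. "
        "Return only the SQL.\n"
        f"Question: {question}"
    )


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call (hosted or private endpoint).

    A real model would generate SQL from the prompt; this stub ignores the
    prompt and returns a fixed query so the sketch runs without a model.
    """
    return (
        "SELECT basin, SUM(oil_m3) AS total FROM production "
        "GROUP BY basin ORDER BY total DESC LIMIT 1"
    )


def ask(conn: sqlite3.Connection, question: str,
        llm: Callable[[str], str] = fake_llm) -> Tuple[str, List[tuple]]:
    """Translate a plain-English question to SQL via `llm`, then run it."""
    # Read the CREATE TABLE statements so the model knows the table layout.
    schema = "\n".join(
        row[0] for row in conn.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table'"
        )
    )
    sql = llm(build_prompt(question, schema))
    # Minimal guard: refuse anything that does not start with SELECT.
    # This is not a complete defense against harmful generated SQL.
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError(f"Refusing to run non-SELECT SQL: {sql!r}")
    return sql, conn.execute(sql).fetchall()


# Hypothetical in-memory table with made-up values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE production (basin TEXT, oil_m3 REAL)")
conn.executemany(
    "INSERT INTO production VALUES (?, ?)",
    [("Basin A", 100.0), ("Basin B", 250.0), ("Basin A", 50.0), ("Basin C", 75.0)],
)
sql, rows = ask(conn, "Which basin produces the most oil?")
print(rows)  # [('Basin B', 250.0)]
```

Passing the schema in the prompt is what lets a model write SQL against tables it has never seen. Model-generated SQL should still be treated as untrusted input, which is why the sketch refuses anything that is not a SELECT; a production setup would also use a read-only database connection.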


Sensitivity of Data and Data Governance

Data can contain sensitive personally identifiable information, trade secrets, and corporate information that cannot be made public. Misusing public models like ChatGPT can mean uploading that sensitive information to a third-party service outside your control. With the emergence of openly available models such as ‘LLaMA’ from Meta and ‘Dolly’ from Databricks, it has become possible to run a private LLM endpoint. These models can then be fine-tuned and tailored to specific business requirements, which can make them more reliable, precise, and efficient for those tasks.

Is ChatGPT Ready for Enterprise Knowledge and Analytics?

LLMs are changing the way we look at data. It is important to remember that they are merely one tool in the large toolbox of Advanced Analytics, Machine Learning, and Artificial Intelligence; they won’t necessarily be the magic bullet that solves all problems. LLMs are unlikely to replace existing models and implementations, but that is no reason to skip adopting them altogether. Adoption is a learning process, and each journey is unique, so it’s worth exploring options early: doing so can reveal valuable insights into, and shortcomings of, current data strategies.

Natural Language Processing has progressed very rapidly with the advent of large language models like ChatGPT and Bard. While these models have tremendous potential to solve unique problems and help customers in various industries, it is crucial to be transparent about their applications and the ethics of their use. At Arcurve, we continue to invest time and effort into these emerging technologies to better understand how they can be leveraged to add value to our clients' businesses and to be prepared for the potential risks of an unproven innovation. We understand the importance of practical judgment, robustness, and error handling, especially in sensitive areas like IP and ownership. We strive to remove the marketing veneer and help our clients understand the practical value as well as the risks, and how to manage both.
