The full promise of Large Language Models (LLMs) won’t be delivered until LLMs can understand nuanced, industry-specific commercial data. Until then, LLMs will be productivity enhancers (think Google Assistant), capable of automating routine tasks but incapable of operating complex processes.
For that to happen, LLMs will need to be trained on industry-specific data sets. Bloomberg’s fixed-income data is one example of such a data set (ask Google’s Bard to display the daily spread between the 10-year and 2-year yields over the past 10 years, and you will see that Bard lacks access to the relevant data set). IHS/S&P Global’s oil data and Jane’s defense data are two more examples, as are Zillow’s housing data, Cerner/Oracle’s healthcare data, and CoStar’s commercial real estate (CRE) data.
Years ago, I suggested that if IBM wished to be a player in broadly defined AI, it ought to acquire information services companies such as IHS, CSGP, CERN, and SLH in order to gain unfettered access to industry-specific data with which to train its various models (ML, NLP, neural networks, etc.).
Today, ChatGPT/Microsoft and Bard/Alphabet/Google have an opportunity to realize the full potential of LLMs. As large as Microsoft and Google are, they can’t afford to acquire every relevant proprietary data set, but they can partner with data owners where it makes sense to do so. My sense is that both Microsoft and Google ought to be prepared to invest in these partners (through warrants or direct investments), since the partnerships will only work fully with close cooperation.
I’ve always believed that Google should acquire Oracle in order to beef up its GCP business. Oracle’s ownership of Cerner gives Google another reason to acquire it, as Cerner’s healthcare data would be a valuable asset for Bard.