Do llamas dream of electric sheep? Or are they just too scared to sleep?

If the reason you are not already looking at how machine learning and artificial intelligence can benefit your organization is because your parole conditions require that you never touch a computing device again, then this article may be a little late to be of use.

But if you currently have no experience of advanced data crime, here are some pointers that may just help you stay one step ahead of the replicants, or their LEA colleagues who always seem closer.

One thing you will have noticed is that data is expensive.  It is expensive to acquire, it is expensive to process, it is certainly expensive to use, it is expensive to maintain, and it is expensive to get wrong.  All of which makes properly acquired, processed, and usable data an extremely valuable resource for business, and an equally attractive object to steal for those who are cyber criminally inclined.

As you build out your machine learning capabilities and start to master the art of adaptive algorithms and data mining or begin sourcing the appropriate data to teach your growing large language model to deliver incredibly profitable insights, you will have to invest significant time and effort. 

As business risks go, the investments in tools and models are attractive targets on their own, but the real data value at risk may be hidden far deeper than you realise.

If your data model is designed to identify a trend, or enable a decision that leads to greater profit, have you recognized the risk of that model being stolen and used by a competitor? Or, more subtly, the risk of its trend identification or decision-making being quietly altered by irregular data ingestion? Could your design and operation process detect use by an unauthorised party, or a functional alteration to the model that would represent a risk to your business or your investment?

Another aspect to consider is the capital invested in a custom model: potentially the collection and validation of many petabytes of data, and the cost of thousands of hours of rented teraflop-scale computing time. Regardless of how much future business benefit the use of that model might provide, when complete it will easily represent several million dollars of sunk cost even before it is used. It would also very likely fit onto a large thumb drive…

Could your model simply be stolen by an adversary and used to deliver value for an unknown (criminal) third party?  How have you prevented this from happening?  You did build run-time security into your model, didn’t you?  So that nobody but you can benefit from your artificial intelligence?
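Even a very basic run-time defence helps here. As a minimal sketch (the file name, function names, and digest handling are all illustrative assumptions, not a prescribed design), a deployment can refuse to load a serialized model whose checksum no longer matches the one recorded at release time, so a swapped or tampered model file fails loudly instead of silently serving an attacker:

```python
import hashlib

def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def load_model_if_trusted(path: str, expected_digest: str) -> bytes:
    """Load the model bytes only if the file matches its recorded digest.

    Hypothetical sketch: in a real system the expected digest would come
    from a signed release manifest, not a plain string.
    """
    actual = sha256_of(path)
    if actual != expected_digest:
        raise RuntimeError(f"model file {path} failed integrity check")
    with open(path, "rb") as f:
        return f.read()
```

This only proves the artefact is the one you shipped; it says nothing about who is calling it, which is why access control and usage monitoring still matter.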

If your model is being used in a real-time cybersecurity analytics and detection role, or in dangerous plant safety management, did your design include ways to prevent incident or event masking rules from being added outside of a normal change process? Could the very cyber criminals you are trying to keep out use your data model to hide? Or worse, to actually deliver an attack?

Or perhaps your particular idea to take over the world, or at least make your employer much richer, is to develop a fintech model that will improve your returns by an extra 0.5% on every trade it supervises. Does the development of this model need to be a secret itself?  Or can anyone be allowed to use it?  How do you plan to keep it up to date?

And of course if you have chosen to use a shared public large language model, how do you ensure that none of your trade secrets get included in other users’ results, or more importantly, that your decision making is not prejudiced by intentionally false data being either used to build the model, or used to incrementally tune the part you have licensed? Can your model be used to harm your customers, through deliberately incorrect advice, or dangerous information?

While we are talking about imaginary data, do data hallucinations pose any potential risks in your use of the data model you are building? Can unexpected data processing artefacts be dangerous in your particular application?  How are you maintaining the provenance of all data that you learn from?  And can you prove the processing lineage of every decision or recommendation that your model delivers? 
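One way to make that provenance question answerable is an append-only hash chain over the batches of data your model learns from: each entry commits to both its own content and everything before it, so any later alteration breaks the chain. This is only a sketch under stated assumptions (the record fields and function names are invented for illustration, and a production system would add signing and durable storage):

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first link in the chain

def chain_append(ledger: list, record: dict) -> list:
    """Append a training-data record to the provenance chain."""
    prev = ledger[-1]["entry_hash"] if ledger else GENESIS
    payload = json.dumps(record, sort_keys=True)  # canonical serialization
    entry_hash = hashlib.sha256((prev + payload).encode()).hexdigest()
    ledger.append({"record": record, "prev_hash": prev, "entry_hash": entry_hash})
    return ledger

def chain_verify(ledger: list) -> bool:
    """Recompute every link; any tampered record breaks verification."""
    prev = GENESIS
    for entry in ledger:
        payload = json.dumps(entry["record"], sort_keys=True)
        expected = hashlib.sha256((prev + payload).encode()).hexdigest()
        if entry["prev_hash"] != prev or entry["entry_hash"] != expected:
            return False
        prev = entry["entry_hash"]
    return True
```

With something like this at the data engineering layer, "where did this training batch come from, and has it changed since?" becomes a check you can run, rather than a question you can only lose sleep over.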

This last item is certainly enough to keep your data scientists awake at night. Perhaps even more nightmarish: conventional system- and network-based cybersecurity technology can’t easily detect these data attack types unless you build defensive triggers for them at the data engineering level. And that should be enough to keep your llamas awake too!

None of the above threats may be of any concern to your planned use of machine learning and large language models. But if you have any expectation that generative pre-trained transformers might play a part in your future business success, please include these points in discussions with your data analytics and data science teams before you invest the millions that you will inevitably need.

As any correctly prompted query to GPT-4 will reveal, it is extremely difficult to retrofit the risk-reducing, security-by-design components that your business expects you to have used into a data project after it has already been finished.

And don’t forget, although everything we are talking about has been available for many years, the commercial use of massive AI currently represents the bleeding edge of advanced IT hardware and software engineering.  What could possibly go wrong?  You already know that even if it works the first time, it is going to take so much longer, and cost so much more, than you expected, so… Keep your code and your data safe out there!

*Title based on “Do Androids Dream of Electric Sheep?” by Philip K. Dick – the inspiration for the movie “Blade Runner” – which curiously, although a book all about artificial intelligence in the future, contains absolutely no references to GPT or llamas.

Advisory: Blue Battle Llamas are currently known not to be sentient and, unlike Philip K. Dick’s replicants, are not normally equipped with lethal defensive capabilities. But why take the chance? Be nice to your llama.
