Not all AI solutions are created equal!
20th April 2026 – Written by Zahid Bilgrami
Ask for: The names/versions of the foundation language model(s) used, and any other third-party technology in the stack. Where does this technology reside? Try and work out what the AI solution provider is actually paying for and what is their cost model.
Red flags: No plan if the upstream model provider or cloud provider changes price, inability to fix a price regardless of scale.
Green flags: Ability to provide a scalable solution for a fixed price.
Ask for: A concrete list of what is unique in the service and cannot be replicated by anyone else.
Red flags: The UI or idea is the only asset; everything else is off-the-shelf. Overly reliant on third parties. Solution searching internet for required information.
Green flags: Proprietary non-replicable data sets. Proprietary integrations with third parties.
Ask for: Will the software provider indemnify us if they infringe IP of any of the components used?
Red flags: “We can’t indemnify because we use Open-Source/Upstream,” or shifting IP/usage risk to you.
Green flags: Indemnities in place because of properly licenced models and/or where the supplier has their own IP in place.
Ask for: A Data Processing Agreement that names all sub processors (including the model provider) and explicitly disables training/retention by default (“zero-retention/no-training” mode).
Red flags: “By default the model provider may use data to improve services,” or no DPA/sub processor list. Lack of clarity on data IP.
Ask for: Country for storage and processing; list of sub processors and cross-border transfers; encryption details.
Red flags: “It depends” with no documented locations; cross-border transfers with no security or contractual controls.
Green flags: Clarity on storage and processing. Guarantees on geographic location. Encrypted data on transfer.
Ask for: ISO 27001 status, recent third-party pen test summary, encryption in transit/at rest, secrets handling.
Red flags: “We’re pursuing ISO/SOC 2,” no independent testing, non-corporate email/Slack for operations, or used for transferring data (including PII data).
Green flags: ISO 27001 accreditation. Documentation on third-party pen testing.
Ask for: Supported webhooks/APIs, error handling, security protocols and monitoring.
Red flags: CSV uploads/emails as the only option; manual mappings with no ownership of supporting integration in the future.
Green flags: Published APIs with a data dictionary and security protocols. Controls and protocols around breaking changes.
Ask for: If I have the same data input multiple times into the solution, will the output be identical on every occasion? Will there be any differences?
Red flags: Unable to guarantee consistency. Unable to explain how consistency is achieved.
Green flags: Clear explanation as to how consistency is achieved, for example, using a ‘small language model’, where any randomness is seeded.
Note: Typically ‘Large Language Models’ cannot easily replicate consistent output because of the very nature of the underlying technology. Try keying in the same prompt into Chat GPT in two diAerent chats (ie. not in the same
context) – and see the diAerence.
Ask for: Clear unit economics (per page/per token/per call), pass-through fees and markups, minimums, overage pricing, cost controls (caps/budgets), and fixed-price options.
Red flags: “It depends on tokens,” no way to cap spend, surprise add-ons, inability to fix price regardless of use.
Ask for: Proven concurrency limits, latency by workload, upstream rate-limit ceilings, results from load tests at your expected volumes.
Red flags: “No one’s asked us to scale yet,” or dependence on a single region/endpoint. “We only have a Proof in Concept at present”
Green flags: Contractual guarantees on concurrent loads and processing speeds. Specification of large language models and of hardware that they are hosted on.
Ask for: A rough split of spend and roadmap; evidence of investment in security/compliance and domain datasets – not just UI or marketing. Number of people involved and their skill sets. How long have they spent developing the solution?
Red flags: Spend concentrated in demoware/ proof in concepts; no budget for validation, compliance, or support. Compressed timelines in creating a solution.
Green flags: Clarity on investment to date and future investment (and source of funds) with a breakdown on spend.
Ask for: Cash runway at current burn, key-person risk plan, escrow or step-in rights, customer data/export guarantees.
Red flags: “We’re raising investment in the future or we pause,” or “We will be OK if we sign you up as a customer”, or no off-boarding plan. Revenue from your business forms a large part of the companies’ business.
Green flags: Demonstrable longevity regardless of whether you become a customer.
Ask for: A written summary of allowed/prohibited uses from integrated or adjacent parties. If using agentic process on third party websites, have they written authorisation to do so? For example, screen scraping from third party websites (e.g. lenders’ extranets).
Red flags: “We didn’t check,” or uses that breach upstream terms (which can cut you off overnight).
Ask for: How often do you update the foundation model used? How do you know that the new model is fit for purpose? Model risk management practices (independent validation, regression packs, approval gates), release notes, rollback plan, environment separation (dev/test/prod).
Red flags: Continuous pushes to prod with no approvals or rollback.
Green flags: Giving notice in advance of any model changes, and giving you the ability to test on the new model before it changes.
Ask for: Red-team results, content/egress filters, input sanitisation, retrieval whitelisting, tool-use restrictions, and automated policy checks on responses.
Red flags: “Users won’t do that,” or reliance on upstream filters only.
Note:
Prompt injection: When hidden instructions are slipped into a user prompt to trick an AI into doing something unintended, like revealing information or ignoring rules.
Data exfiltration: The unauthorised copying or sending of private data out of a system, often through malicious code or tricked AI outputs.
Jailbreak: A method of bypassing an AI’s built-in safety controls to make it generate restricted or unsafe content
Ask for: Retention schedules for inputs, outputs, embeddings, and logs; deletion SLAs; certificates of destruction; customer-controlled retention settings.
Red flags: “We can’t delete per-request,” or pooled logs across customers.