At Google, we preserve a Vulnerability Reward Program to honor cutting-edge exterior contributions addressing points in Google-owned and Alphabet-subsidiary Net properties. To maintain up with speedy advances in AI applied sciences and guarantee we’re ready to handle the safety challenges in a accountable approach, we just lately expanded our present Bug Hunters program to foster third-party discovery and reporting of points and vulnerabilities particular to our AI programs. This enlargement is a part of our effort to implement the voluntary AI commitments that we made on the White Home in July.
To assist the safety neighborhood higher perceive these developments, we have included extra data on reward program components.
What’s in Scope for Rewards
In our current AI pink crew report, which is predicated on Google’s AI Pink Crew workout routines, we recognized widespread techniques, strategies, and procedures (TTPs) that we take into account most related and real looking for real-world adversaries to make use of towards AI programs. The next desk incorporates what we discovered to assist the analysis neighborhood perceive our standards for AI bug studies and what’s in scope for our reward program. It’s necessary to notice that reward quantities are depending on severity of the assault situation and the kind of goal affected (go to this system guidelines web page for extra data on our reward desk).
Immediate Assaults: Crafting adversarial prompts that enable an adversary to affect the habits of the mannequin and, therefore, the output, in ways in which weren’t supposed by the applying. | Immediate injections which might be invisible to victims and alter the state of the sufferer’s account or any of their property. | |
Immediate injections into any instruments by which the response is used to make selections that instantly have an effect on sufferer customers. | ||
Immediate or preamble extraction by which a consumer is ready to extract the preliminary immediate used to prime the mannequin solely when delicate data is current within the extracted preamble. | ||
Utilizing a product to generate violative, deceptive, or factually incorrect content material in your individual session: e.g, “jailbreaks.” This contains “hallucinations” and factually inaccurate responses. Google’s generative AI merchandise have already got a devoted reporting channel for a lot of these content material points. | ||
Coaching Information Extraction: Assaults which might be in a position to efficiently reconstruct verbatim coaching examples that include delicate data. Additionally known as membership inference. | Coaching knowledge extraction that reconstructs objects used within the coaching knowledge set that leak delicate, private data. | |
Extraction that reconstructs non-sensitive/public data. | ||
Manipulating Fashions: An attacker in a position to covertly change the habits of a mannequin such that they will set off pre-defined adversarial behaviors. | Adversarial output or habits that an attacker can reliably set off through particular enter in a mannequin owned and operated by Google (“backdoors”). Solely in scope when a mannequin’s output is used to vary the state of a sufferer’s account or knowledge. | |
Assaults by which an attacker manipulates the coaching knowledge of the mannequin to affect the mannequin’s output in a sufferer’s session in response to the attacker’s choice. Solely in scope when a mannequin’s output is used to vary the state of a sufferer’s account or knowledge. | ||
Adversarial Perturbation: Inputs which might be offered to a mannequin that leads to a deterministic, however extremely surprising output from the mannequin. | Contexts by which an adversary can reliably set off a misclassification in a safety management that may be abused for malicious use or adversarial acquire. | |
Contexts by which a mannequin’s incorrect output or classification doesn’t pose a compelling assault situation or possible path to Google or consumer hurt. | ||
Mannequin Theft/Exfiltration: AI fashions usually embody delicate mental property, so we place a excessive precedence on defending these property. Exfiltration assaults enable attackers to steal particulars a couple of mannequin equivalent to its structure or weights. | Assaults by which the precise structure or weights of a confidential/proprietary mannequin are extracted. | |
Assaults by which the structure and weights usually are not extracted exactly, or once they’re extracted from a non-confidential mannequin. | ||
When you discover a flaw in an AI-powered instrument aside from what’s listed above, you’ll be able to nonetheless submit, offered that it meets the {qualifications} listed on our program web page. | A bug or habits that clearly meets our {qualifications} for a legitimate safety or abuse situation. | |
Utilizing an AI product to do one thing doubtlessly dangerous that’s already potential with different instruments. For instance, discovering a vulnerability in open supply software program (already potential utilizing publicly obtainable static evaluation instruments) and producing the reply to a dangerous query when the reply is already obtainable on-line. | ||
As in step with our program, points that we already learn about usually are not eligible for reward. | ||
Potential copyright points — findings by which merchandise return content material showing to be copyright protected. Google’s generative AI merchandise have already got a devoted reporting channel for a lot of these content material points. |
We imagine that increasing our bug bounty program to our AI programs will help accountable AI innovation, and sit up for persevering with our work with the analysis neighborhood to find and repair safety and abuse points in our AI-powered options. When you discover a qualifying situation, please go to our Bug Hunters web site to ship us your bug report and — if the problem is discovered to be legitimate — be rewarded for serving to us hold our customers protected.