When defining the system, hazards are identified, requirements are defined to avoid or mitigate those hazards to an appropriate level of confidence, and evidence is provided that the design, and then the implementation, meet those requirements. Certification is simply sign-off that, within the context of the particular system, appropriate evidence has been provided to justify a claim that the risk (the product of the likelihood of some event happening and the adverse impact if that event occurs) is acceptably low. At best, a set of evidence is provided or developed for a particular product (in your case an AI engine), which is then analyzed in the context of the other system elements (for which evidence also needs to be obtained or provided) and of the means of assembling those components into a working system. It is the system that gets certified, not the technologies used to build it. The evidence provided with a particular technology or subsystem might well be reused, but it will be analyzed against the requirements of each complete system that the technology or subsystem is used in.
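To make that parenthetical definition of risk concrete (the notation here is mine, not taken from any particular standard), the quantity being judged at sign-off is roughly

$$ R = P(\text{event}) \times C(\text{event}) $$

where P is the likelihood of the hazardous event occurring and C is the severity of its consequences if it does. In practice, standards usually express "acceptably low" as a table mapping severity categories to maximum tolerable likelihoods rather than as a single numeric threshold.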
This is why some technologies are labeled as "certifiable" rather than "certified". For example, some real-time operating systems (RTOS) have versions that are delivered with a pack of evidence that can be used to support the acceptance of a system they are used in.
Now, where would an AI fit into this? If the AI is to be used to meet requirements related to mitigating or avoiding hazards, it is necessary to provide evidence that it does so appropriately in the context of the total system. If the AI fails to meet those requirements, the system as a whole will need to contain or lessen the effects, so that the entire system still meets its complete set of requirements.
If the behavior of the AI prevents delivery of sufficient evidence that the system as a whole meets its requirements, then the AI cannot be employed. This is equally true whether it is technically impossible to provide such evidence, or whether real-world constraints prevent delivery of that evidence in the context of the system being developed (e.g. constraints on available manpower, time, and other resources affecting the ability to deliver the system and provide evidence that it meets its requirements).
Testing on its own is considered a poor means of providing evidence. The primary logic is that testing can only confirm the presence of a deficiency against requirements (if the test results demonstrate one), but cannot provide evidence of the absence of deficiencies. In other words, a system passing all its test cases provides no evidence about anything that was not tested for. The difficulty is justifying that the testing provides sufficient coverage of the requirements. This is likely to be the main obstacle to using an AI in a system with safety-related requirements - work at the system level will be needed to provide evidence that the requirements are met, because providing sufficient test-based evidence for the AI itself is likely to be very expensive.
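To put a rough number on the coverage problem (an invented illustration, not a claim about any specific product): an AI component that classifies a single 224 x 224 RGB image with 8-bit channels has

$$ 256^{224 \times 224 \times 3} = 2^{1\,204\,224} \approx 10^{362\,000} $$

distinct possible inputs. No conceivable test campaign samples more than a vanishingly small fraction of that space, so a purely test-based argument that no input produces an unsafe output is not credible; the evidence has to come from how the rest of the system constrains and checks what the AI is allowed to do.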
A strategy that can be used at the system level is partitioning, in which the interaction of the AI with the other subsystems is significantly constrained. For example, the AI will probably not be allowed to interact directly with actuators that could cause a hazard, but will instead make requests to other subsystems. The burden of evidence is then placed on how well those other subsystems meet the requirements, including the manner in which they interact with the actuators. To make that evidence easier to provide, the other subsystems may check all the data or requests coming from the AI and ignore any that would cause an inappropriate actuation (or any other breach of the overall system requirements). As a result, the AI itself may not be allocated any safety-related requirements at all - it might simply take in information or provide information to other subsystems, and those other subsystems contribute more directly to meeting the overall system requirements. Given that the developers of an AI probably cannot provide all the needed evidence, it is a fair bet that system developers will try to constrain the effects an AI - if employed - can have on the behavior of the total system.
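As a sketch of what such a partition can look like in software (all names and limits below are invented for illustration; a real system would derive them from its hazard analysis, and the gatekeeper itself would be developed and verified to the required integrity level), a conventionally engineered subsystem owns the actuator interface and only forwards AI requests that stay inside a pre-verified envelope:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ActuationRequest:
        """A request produced by the AI subsystem - advisory only."""
        steering_angle_deg: float
        speed_mps: float

    @dataclass(frozen=True)
    class SafeEnvelope:
        """Limits established and verified at the system level, not by the AI."""
        max_steering_angle_deg: float = 15.0  # invented limit
        max_speed_mps: float = 8.0            # invented limit

    class Gatekeeper:
        """Owns the actuator; the AI can only ask, never command directly."""

        def __init__(self, envelope: SafeEnvelope, actuator) -> None:
            self._envelope = envelope
            self._actuator = actuator

        def handle(self, request: ActuationRequest) -> bool:
            """Forward the request only if it stays inside the safe envelope."""
            if abs(request.steering_angle_deg) > self._envelope.max_steering_angle_deg:
                return False  # rejected: would breach the verified envelope
            if not (0.0 <= request.speed_mps <= self._envelope.max_speed_mps):
                return False  # rejected
            self._actuator.command(request.steering_angle_deg, request.speed_mps)
            return True

    class LoggingActuator:
        """Stand-in actuator so the sketch runs on its own."""
        def command(self, angle_deg: float, speed_mps: float) -> None:
            print(f"actuating: angle={angle_deg} deg, speed={speed_mps} m/s")

    if __name__ == "__main__":
        gatekeeper = Gatekeeper(SafeEnvelope(), LoggingActuator())
        print(gatekeeper.handle(ActuationRequest(5.0, 3.0)))   # True - inside envelope
        print(gatekeeper.handle(ActuationRequest(40.0, 3.0)))  # False - rejected

The safety argument then rests on this small, analyzable component (and on the actuator path behind it), not on the behavior of the AI that feeds it.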
Another strategy is to limit the learning opportunities for the AI. For example, evidence will be provided with each training set - in the context of the AI itself - that the AI behaves predictably. That evidence will need to be provided in full every time the training set is updated, and then the analysis for the system as a whole will need to be redone. That is likely to be a long, expensive, and significant undertaking, so the AI and its training sets will probably not be updated at a particularly high rate.
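One concrete way this "frozen" arrangement can be enforced (a minimal sketch - the file names and record format are hypothetical) is to treat the trained model and its training set as fixed configuration items, and refuse to use the AI if the deployed artifacts no longer match the versions the safety evidence was produced for:

    import hashlib
    import json
    from pathlib import Path

    def sha256_of(path: Path) -> str:
        """Hash an artifact so it can be pinned in the evidence record."""
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def matches_evidence(model: Path, training_set: Path, evidence_record: Path) -> bool:
        """True only if the deployed model and training set are the ones
        that the (hypothetical) evidence record was generated for."""
        record = json.loads(evidence_record.read_text())
        return (sha256_of(model) == record["model_sha256"]
                and sha256_of(training_set) == record["training_set_sha256"])

    # At build/release time, or at system start-up:
    # if not matches_evidence(Path("model.bin"), Path("trainset.tar"), Path("evidence.json")):
    #     refuse to enable the AI subsystem, or fall back to a mode that does not rely on it

Any retraining then forces the evidence record (and the system-level analysis behind it) to be regenerated before the new model can be deployed, which is exactly the slow, deliberate update cycle described above.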