OpenAI has released some great work in the last 12 months, including ImageGPT, CLIP and DALL-E. Working in digital forensics, I want to discuss some theoretical applications of these works.
ImageGPT is an exciting development: given an incomplete image, it can extrapolate something close to what the original image should have been.
Some of the applications I can see for this tool include recovering corrupted, deleted or destroyed media from a hard drive, and reading number plates obscured by inadequate recording quality, incorrect file compression settings or simply non-ideal recording conditions.
Now, there are obvious evidentiary reasons why this might be a bad idea. Indeed, I don’t see a forensic use for these applications; instead, I see this as an investigative or intelligence tool that enables other evidence to be found. I don’t, for one second, believe anything recovered by such a tool could be relied on for forensic purposes. However, I don’t see why such a tool shouldn’t be used to find evidence of something that can be.
Consider a case where ImageGPT is used to recover an image of child exploitation material from someone’s hard drive. This would not be sufficient evidence to lay charges, but we might consider it sufficient to allow further investigation to see if there is any non-generated evidence of an offence.
Reconstructing number plates from poorly compressed (unreadable to the human eye) media files is a much better suited use of this technology. This is because we can cross-reference the plate number against stored databases to confirm the model’s output. From an intelligence perspective, this tool would be very useful to an investigation.
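The confirmation step could look something like the following minimal sketch. The candidate plates, confidence scores and registration database here are all hypothetical stand-ins; a real reconstruction model and registry lookup are assumed, not shown.

```python
# Hypothetical sketch: validating candidate plate reconstructions against a
# registration database. Model output and database are toy stand-ins.

def confirm_candidates(candidate_plates, registration_db):
    """Return only the model-generated plates that exist in the database.

    candidate_plates: list of (plate, confidence) pairs from the model.
    registration_db: dict mapping plate -> registered vehicle record.
    """
    hits = []
    for plate, confidence in candidate_plates:
        record = registration_db.get(plate)
        if record is not None:
            hits.append((plate, confidence, record))
    # Highest-confidence confirmed plates first.
    return sorted(hits, key=lambda h: -h[1])

# Toy data standing in for model output and a vehicle registry.
candidates = [("ABC123", 0.91), ("A8C123", 0.72), ("ABC128", 0.40)]
registry = {"ABC123": "2012 white sedan", "XYZ987": "2018 grey hatchback"}
print(confirm_candidates(candidates, registry))
```

The point is that the generated plate is never treated as evidence in itself; a database hit merely flags a lead worth investigating by conventional means.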
Another open area I work within is OSINT. CLIP is exciting when I look at several of these problems through that lens. While image recognition and classification are not new applications, CLIP shows significant advances on the types of problems my research area could benefit from.
One example given in the CLIP paper is the ability to find someone in surveillance footage just by typing in their name. Another is the ability to find shoplifters through the use of CCTV.
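What makes these queries possible is CLIP's core mechanism: it embeds images and text descriptions into a shared vector space and scores pairs by cosine similarity, so classification is just "which caption is closest to this image?". The sketch below illustrates that principle with toy vectors standing in for CLIP's actual encoders; the embeddings and labels are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_classify(image_embedding, label_embeddings):
    # Pick the caption whose text embedding lies closest to the image embedding.
    return max(label_embeddings,
               key=lambda label: cosine(image_embedding, label_embeddings[label]))

# Toy embeddings standing in for CLIP's image and text encoders.
labels = {
    "a photo of a car": [0.9, 0.1, 0.0],
    "a photo of a person": [0.1, 0.9, 0.2],
}
image = [0.8, 0.2, 0.1]  # in reality, the output of the image encoder
print(zero_shot_classify(image, labels))
```

Because the label set is just text, an investigator could swap in new query strings without retraining anything, which is what makes the surveillance examples above plausible.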
The use case I am most excited about, and want to explore, is the automatic classification of media taken from social media platforms (see my upcoming paper at DFRWS EU 2021). I’d love to be able to use this algorithm to search through an indexed version of footage obtained from a social media site using SQL-like queries. Imagine a crime committed on a Friday night: using a single search string, we could pull all the relevant footage from a set area. I see this tool being able to do that. While I understand it’s very 1984, I also believe the benefit far outweighs the social issues that the paper only touches on.
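To make the idea concrete, here is a minimal sketch of such an index using SQLite. The schema, platform names, suburbs and classifier labels are all hypothetical; in practice the `label` column would be populated by a CLIP-style classifier over collected footage.

```python
import sqlite3

# Hypothetical schema for an index of classified social-media footage.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE clips (
    id INTEGER PRIMARY KEY,
    source TEXT,          -- platform the clip was collected from
    suburb TEXT,          -- coarse geolocation
    captured_at TEXT,     -- ISO 8601 timestamp
    label TEXT            -- classifier-assigned content label
)""")
conn.executemany(
    "INSERT INTO clips (source, suburb, captured_at, label) VALUES (?, ?, ?, ?)",
    [
        ("platform_a", "Northbridge", "2021-03-05T23:40:00", "crowd"),
        ("platform_b", "Northbridge", "2021-03-05T23:55:00", "altercation"),
        ("platform_a", "Fremantle",   "2021-03-05T23:50:00", "altercation"),
    ],
)

# "A crime on Friday night in a set area": everything captured in one
# suburb across that evening, in time order.
rows = conn.execute(
    """SELECT source, captured_at, label FROM clips
       WHERE suburb = ? AND captured_at BETWEEN ? AND ?
       ORDER BY captured_at""",
    ("Northbridge", "2021-03-05T18:00:00", "2021-03-06T03:00:00"),
).fetchall()
print(rows)
```

Storing timestamps as ISO 8601 strings keeps `BETWEEN` comparisons correct lexicographically, so a plain SQL query can answer the "set area, set time window" question directly.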
It’s no secret that one of the areas I work in is the prevention, detection and elimination of child exploitation material.
In 2016, Reed et al. opened their paper Generative Adversarial Text to Image Synthesis with: “Automatic synthesis of realistic images from text would be interesting and useful, but current AI systems are still far from this goal.” In 2021, realistic images can now be generated from a simple string of text.
From the perspective of my research area, the creation of realistic images from a string of text represents a significant emerging challenge for the many jurisdictions in which child exploitation laws have not kept pace with technology. The creation of artificial child exploitation images that are lifelike yet involve no real people is a particular challenge for jurisdictions that would see such items protected under free speech or similar legislation.
Working in the opposite direction to CLIP, DALL-E creates an image from a string of text. While the examples given are benign (interior design, country flags, food and buildings), the power of this tool to generate other media is not lost on me. I have significant concerns about tools such as DALL-E being used in future to generate images of child exploitation or other depraved activities.
The images generated by CLIP/DALL-E are not very useful at present, as they are clearly manipulated. However, that doesn’t mean they won’t be better in the future. Indeed, given the current rate of progress, I would not be surprised if we have realistic images within 5 years, with commercialisation and widespread adoption of the technology taking another 5–10.
A big reason this technology is not yet being used for forensic purposes is that, I believe, no one is doing the research into it. Many consider the technology akin to black magic and thus assume it has no forensic validity. Others are focusing on other use cases, such as deepfakes.
If you’re working in this area and putting AI/ML, deep learning, GANs, CNNs, etc. to use in a forensic capacity, let me know.