Unlike my typical blog entries, this blog is a very serious deep-dive into the C2PA provenance solution for photos, video, and other kinds of media. This solution is in the process of being adopted by hundreds of commercial organizations, from newsrooms and human rights observers to camera manufacturers and financial institutions. I explicitly cover multiple security-related issues in the C2PA specification that enable a wide range of fraudulent activities, from the small-time catphishers and online merchant scammers to nation-state propaganda efforts and large-scale financial fraud. Consumers and corporations need to be aware: C2PA does not provide reliable and validated information about a photo’s origins.
Before I begin, I just want to be clear: my butterfly picture is a forgery. This blog entry includes a wide range of pictures that unrelated people across the internet have uploaded to FotoForensics. All of them are as-is and unaltered by me. The only exception is one of two butterfly pictures. The source butterfly picture came from Adobe and has authoritative credentials, while a variation of this picture is a forgery that I created. I want to emphasize this, because when I reach conclusions like My butterfly picture is original or Mine came first, you need to remember: Mine is a forgery.
This very long blog entry (more of a whitepaper than a blog) covers the following sections:
The problem space (The Easy Button)
Current state of the art (In The Wild)
C2PA design goals (Provenance and Authenticity Design Goals)
Four ways to defeat the cryptographic authentication and enable forgeries (Stripped, Untracked Alterations, Obscured by Appended Edits, and Intentional Forgeries)
Existing tools (Tools of the Trade and In-Camera Options)
Network Tracking and Privacy Issues
Consistency and Complexity Issues
Complexity and Vulnerabilities
Cryptography and Certificates
and: Keeping People Honest
The Easy Button
Everyone wants a one-button authenticity solution or simple litmus test that can tell you if a picture is real or fake. The problem is that it’s never a simple situation. What you consider to be an acceptable alteration may be unacceptable in a different context. On top of this, most cameras automatically make alterations when you capture the photo: auto-focus, auto-brightness, limiting extreme colors, auto-sharpening, etc. (And that’s before adding any filters.) Even a direct-from-the-camera original may have some alterations.
To tell if a picture has been intentionally altered, people usually focus on metadata and image artifacts. These can help identify the original source and any alterations. However, this kind of trail is far from standardized. Clues can be hidden in different metadata blocks (EXIF, IPTC, XMP, etc.) or encoded in the raw pixel data.
The Coalition for Content Provenance and Authenticity (C2PA) has been trying to create a standardized paper trail. Since the format doesn’t have a separate name, it’s just called the C2PA Specification, or more generally ‘C2PA’. To avoid confusion, I use terms like ‘C2PA specification’, ‘C2PA metadata’, and ‘C2PA group’ to differentiate between the specs, the stored data format, and the people.
The C2PA group includes corporate support from a lot of big companies: Adobe, Intel, Microsoft, Sony, Canon, Nikon, BBC, etc. As I mentioned back in 2021, I tried to become a C2PA member, but I could not agree to their clause that prevented public disclosure without approval. Since I’m not a member, I am not restricted by their limitations on free speech, but I also have no insight into their decision process; they are a closed-door and opaque group. As an outsider, I only see their final specifications and not how they reached their decisions.
The C2PA group claims to work closely with another group, the Content Authenticity Initiative (CAI). According to these groups, CAI provides context to C2PA and helps promote the C2PA specification. CAI has hundreds of members. My own company, Hacker Factor, has been a member for about two years. As a member, I will tell you:
I have never been asked to provide any feedback. CAI doesn’t send out surveys or questionnaires, and they don’t actively solicit input or requirements. (I should also mention that Adobe is a founder of C2PA and leads CAI. If there is any feedback, then it’s not from the general members. It looks like Adobe is providing requirements to Adobe.)
I receive about a dozen emails from CAI each year. The emails either promote CAI/C2PA or offer invitations to attend livestream panels that talk about the wonders of C2PA (both the group and the specification). I’ve attended some of the presentations. While they do accept questions, they don’t solicit feedback prior to making their development choices.
As it says on their homepage (my bold emphasis), “We are a community of media and tech companies, NGOs, academics, and others working to promote adoption of an open industry standard for content authenticity and provenance.” Other than having my company’s name listed on their membership page, I have never been asked to help promote anything.
In my viewpoint, C2PA is a closed group of powerful companies who are trying to create a standard without feedback from the general user base. They seem to use CAI as a façade for claiming to have widespread support.
In The Wild
The C2PA specification is supposed to provide a formal paper trail that identifies a picture’s origin, attributions, alterations, authenticity, and provenance. This information is cryptographically signed and stored as metadata that is associated with the file. My FotoForensics service rarely sees pictures with the C2PA metadata.
Last year, I only saw a handful of pictures with C2PA metadata and all came from C2PA developers.
This year, I’m seeing pictures with C2PA metadata about a dozen times per week. All either come from Adobe’s Firefly (AI) or Microsoft’s AI image generation system.
A few camera manufacturers have announced in-camera C2PA support. One camera manufacturer has had this deployed for a few years, but I’ve only seen camera test pictures that were provided to me after I politely asked for samples. I have yet to see a real-world example. More manufacturers should mean more photos with the metadata in the future. I’m just not seeing it right now.
Perhaps it is a good thing that the C2PA solution isn’t being widely adopted yet since it does not appear to provide any form of authenticity or verifiable provenance.
Provenance and Authenticity Design Goals
The C2PA specification includes the following design goals:
Privacy: Enable content creators (photographers, artists, etc.) to control the privacy of their information, including identity, ‘consumption data’ (resources used to create a picture), and information recorded in provenance.
Responsibility: Ensure consumers can determine the provenance of an asset.
Security: Ensure that consumers can trust the integrity and source of provenance, and ensure the design is reviewed by experts.
Harms and Misuse: Design to avert and mitigate potential harms, including threats to human rights and disproportionate risks to vulnerable groups.
In addition, the specification states that the solution should be verifiable (validated) and tamper-proof (my bold for emphasis):
From the overarching goals section of the guiding principles:
C2PA specifications SHOULD NOT provide value judgments about whether a given set of provenance data is ‘good’ or ‘bad,’ merely whether the assertions included within can be validated as associated with the underlying asset, correctly formed, and free from tampering.
A variation of this was also tweeted (pre-Musk) by Forbes (my bold emphasis):
The Coalition for Content Provenance and Authenticity, which includes Microsoft Arm, Intel TruePic and the BBC among its members, said the standard will allow content creators and editors to create media that can’t be secretly tampered with. https://trib.al/3gWk0c7
I like this part: “can’t be secretly tampered with”. Really? Would you like a demonstration?
In this blog entry, I will show how the current C2PA solution fails all of these design goals.
Stripped
To begin, let’s focus on the base case: you can’t validate the data if you don’t have the data.
The C2PA information is stored as metadata. Usually it is attached to the file. However, most online services, including Facebook, Instagram, YouTube, X (formerly known as Twitter), and TikTok, typically strip out the original metadata. The reason is that metadata isn’t used for displaying the content. It’s extra and unnecessary bytes. When you’re operating on the scale of Google or Meta, saving one byte per file translates into a huge amount of bandwidth and storage savings. And with C2PA, the amount of metadata can easily double the file size.
On top of this, applications often strip out the C2PA metadata. The simple act of clicking “Save” could remove this metadata.
What I typically see at FotoForensics is either complete or partial stripping of the C2PA metadata. For example, when Adobe released their beta version of their Firefly software, they would tag the image with the “Adobe Firefly” logo in the lower left corner. They also included C2PA metadata. Here are two pictures with the Adobe Firefly logo in the corner (click on each to see the image and metadata at FotoForensics):
The picture of bread contains the complete C2PA record in the “JUMBF” metadata block. If you submit it to the C2PA’s “Content Credentials” web site (link), then you can see that it has a valid certificate that was issued by Adobe.
The shark picture also has the Adobe Firefly watermark in the lower corner, but the file doesn’t have the C2PA metadata. This was stripped out. However, we can be pretty certain that it came from there because the Content Credentials web site has a search-by-image option. With this example, a search identifies the pre-edited shark image:
Being able to search Content Credentials and find the source is the rare exception and not the norm; I got lucky with this shark image. It appears that the artist used Adobe Firefly to add a person and sea turtle. (I can’t tell if the face on the shark came from Firefly or was a post-Firefly edit.) After the picture was altered, something removed the C2PA metadata.
I also see a lot of pictures like this one:
In this case, the metadata includes an XMP field that says “Provenance” with the value “self#jumbf=/c2pa/adobe:urn:uuid:a37a6826-acbc-473b-8db6-75ad70459901/c2pa.claim”. The value should be a URL to some remotely hosted metadata (called sidecar metadata) or it can begin with “self#jumbf”, indicating that there must be JUMBF metadata in this file that contains the C2PA metadata. In this picture:
The URL says the file should contain a JUMBF block (self#jumbf), but the JUMBF data is missing. This is partially stripped metadata.
The XMP metadata clearly states that this is an Adobe Stock photo and processed using Adobe software, but a search using Content Credentials finds nothing.
If you search Google for Adobe Stock image #591956858 (the number is found in the metadata), you’ll find the source picture.
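The partially stripped case can be checked mechanically. Here is a minimal Python sketch, assuming a baseline JPEG where embedded JUMBF data sits in APP11 segments and the XMP packet sits in an APP1 segment. The function names and the three status labels are made up for this illustration, and the scanner ignores corner cases such as fill bytes and extended XMP.

```python
import struct

APP1 = 0xE1   # EXIF and XMP segments
APP11 = 0xEB  # JUMBF boxes (where embedded C2PA data is stored)
SOS = 0xDA    # start of scan: compressed pixel data follows
EOI = 0xD9    # end of image


def jpeg_segments(data: bytes):
    """Yield (marker, payload) for each header segment before the scan data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker in (SOS, EOI):
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            break  # malformed length; stop instead of looping forever
        # The length field counts itself (2 bytes) plus the payload.
        yield marker, data[pos + 4:pos + 2 + length]
        pos += 2 + length


def c2pa_status(data: bytes) -> str:
    """Classify a JPEG as 'embedded', 'partially stripped', or 'absent'."""
    has_jumbf = False
    points_to_self = False
    for marker, payload in jpeg_segments(data):
        if marker == APP11 and payload.startswith(b"JP"):
            has_jumbf = True
        elif marker == APP1 and b"self#jumbf" in payload:
            points_to_self = True
    if has_jumbf:
        return "embedded"
    if points_to_self:
        return "partially stripped"  # pointer survives, JUMBF data is gone
    return "absent"
```

Note that "absent" covers both a picture that never had C2PA metadata and one where everything was removed. The file alone cannot distinguish those two cases.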
The takeaway here is that C2PA provenance information can be trivially removed. Moreover, the presence of partial C2PA metadata does not authenticate the image. (These may seem like obvious scenarios, but they are the most common cases I’ve seen.) While FotoForensics currently receives a dozen images with C2PA metadata per week, about half have partially-removed metadata. I suspect that probably twice that number has fully removed C2PA metadata.
Untracked Alterations
The C2PA specifications define a method for tracking alterations to a picture. In contrast, the purpose of ‘provenance’ is to identify the origin. (Pedigree refers to the elements that were used to create the final result and would be a better term for what C2PA is trying to do, but they went with provenance.) The amount of detail C2PA tracks is seriously lacking and fails to identify the pedigree, provenance, history, or origin of a picture.
For example, on 16-Oct-2023 FotoForensics received a picture of a labor rally from France:
The FotoForensics error level analysis (ELA) clearly shows the alterations, which helps establish the pedigree and is the first step in identifying the image’s origin.
At minimum, (1) the horse and bicycle along the bottom were added, (2) objects like a dog, tent, and more people were added, and (3) a section of the street appears erased.
The metadata identifies two applications: Adobe Firefly and Adobe Photoshop/25.1.0. However, it doesn’t specify what alterations were made.
This picture includes C2PA metadata. Using Content Credentials, we can see a valid signature and that it was processed on Sep 25, 2023. However, there is no other information about the origin of this picture. Assuming the date is accurate, the date only identifies when the picture was digitally signed (notarized). It does not verify the contents or identify when the picture was created. It was created sometime before that date (could be hours, could be years).
A quick search on TinEye identifies an earlier version of the unedited picture. The image comes from The Guardian, is attributed to “Georges Gobet/AFP/Getty Images”, and traces to May 2018, which is 5 years earlier than the C2PA metadata. (In the ELA, the erased street is where they removed some people who were holding the wrong flags.)
In this example, C2PA failed to identify, record, or validate the origin and alterations. Moreover, the embedded signing date could cause confusion since it does not represent the photo’s creation date.
As another example, FotoForensics often receives pictures of pregnancy tests. (TMI! I think the younger generation often overshares.)
Changing a pregnancy test result is often used for fraud. (“I’m pregnant! Pay up now and I won’t make you pay child support later.” or “I’m not pregnant! And I never want to see you again!”) Alternately, it could be that someone is in denial. With this picture:
FotoForensics ELA shows an alteration in the area where the word “Not” should be located. (The ‘Not’ was removed.) With a deeper dive, I can also tell that the picture was last encoded using Adobe Creative Cloud and then transmitted using WhatsApp.
The picture contains C2PA metadata. According to Content Credentials, it was last known to be modified on September 25, 2023 using Adobe Firefly and Photoshop. (It’s a coincidence that it’s the same date as the previous labor picture. I was looking for sample images and found two examples almost immediately.) Again, the C2PA metadata doesn’t identify what was altered, when it was altered, or the image source.
While the C2PA data does list some of the tools used, it does not identify the provenance, origin, authenticity, or alterations made to the image. The C2PA specification allows the history of sources and alterations to be stored in the metadata. However, this detail is neither essential nor required; it is usually absent.
I’m not just seeing altered photos with authentic C2PA metadata on journalism pictures and pregnancy tests. In the last two months, FotoForensics has received images of altered credit cards, driver’s licenses, diplomas, visa passport photos, drug prescriptions, and “defective merchandise” with this “authoritative provenance” information that is neither authoritative nor recording the provenance. People conducting fraud are already using it.
Obscured by Appended Edits
The C2PA specification permits using one picture with C2PA data as a source for a composite image. Ideally, the C2PA metadata should include both the previous history and the newest information. The problem is, the previous metadata may not be verifiable after the subsequent edits are performed.
The Content Credentials web site uses this complex image (home2.91ab8f2d.jpg) as an example of their provenance solution:
The Content Credentials evaluation shows a large number of dependencies, including some that are nested.
Adobe has provided c2patool to help developers manage C2PA content. Use “c2patool --tree home2.91ab8f2d.jpg” to extract the nesting and “c2patool -d home2.91ab8f2d.jpg” to see the detailed validation. These show the internal structure and validated components:
├── Asset:max_DSC7386-16×9.jpg, Manifest:adobe:urn:uuid:aaa9ede7-11e6-4866-b735-67a019bd7f41
| ├── Assertion:c2pa.thumbnail.claim.jpeg
| ├── Assertion:c2pa.thumbnail.ingredient.jpeg
| ├── Assertion:c2pa.ingredient
| ├── Assertion:c2pa.thumbnail.ingredient__1.jpeg
| ├── Assertion:c2pa.ingredient__1
| ├── Assertion:c2pa.thumbnail.ingredient__2.jpeg
| ├── Assertion:c2pa.ingredient__2
| ├── Assertion:stds.schema-org.CreativeWork
| ├── Assertion:c2pa.actions
| ├── Assertion:c2pa.hash.data
| ├── Asset:_DSC7386.jpg, Manifest:adobe:urn:uuid:4fd5a284-6fe9-479c-a8af-9bcbbf851d92
| | ├── Assertion:c2pa.thumbnail.claim.jpeg
| | ├── Assertion:c2pa.thumbnail.ingredient.jpeg
| | ├── Assertion:c2pa.ingredient
| | ├── Assertion:c2pa.thumbnail.ingredient__1.jpeg
| | ├── Assertion:c2pa.ingredient__1
| | ├── Assertion:adobe.dictionary
| | ├── Assertion:stds.schema-org.CreativeWork
| | ├── Assertion:c2pa.actions
| | ├── Assertion:c2pa.hash.data
| | ├── Asset:_DSC7386.ARW
| | └── Asset:_DSC7386.jpg
For this picture, the C2PA metadata contains three manifests (7b57d33b-a055-40e6-b590-07219089bffd, aaa9ede7-11e6-4866-b735-67a019bd7f41, and 4fd5a284-6fe9-479c-a8af-9bcbbf851d92). This means that three separate compositions (each with many image components) were combined to form the final picture. However, only the most recent two (7b57d33b and aaa9ede7) are verified by certificates. The third one (4fd5a284) is unverified. We just assume it is legitimate because the 3rd unverified manifest is signed by the verifiable 2nd manifest (aaa9ede7), and the second is verified by the first (7b57d33b). Neither c2patool nor Content Credentials points out that one of the manifests is unverifiable.
The C2PA specification says that the new claim should validate the previous claim. However, this validation is not a requirement. (See Section 13.2.4: Signature Validation.) A forgery could skip the validation step, permitting the alteration of any previous claim contents since nobody will be able to validate them. Validation tools will assume that the altered prior information was validated by the subsequent signatory.
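Since the tools don't call out which nested manifests were actually checked, it can help to list them yourself. This is a small Python sketch that pulls the Asset and Manifest entries out of tree text shaped like the listing above. The function name is hypothetical, and the parser assumes the output keeps this exact "Asset:..., Manifest:..." layout, which may differ between c2patool versions.

```python
import re

# Matches "Asset:<name>" with an optional ", Manifest:<label>" suffix.
ASSET_LINE = re.compile(r"Asset:(?P<asset>[^,\n]+)(?:,\s*Manifest:(?P<manifest>\S+))?")


def asset_rows(tree_text: str):
    """Return (depth, asset name, manifest label or None) for each Asset line."""
    rows = []
    for line in tree_text.splitlines():
        match = ASSET_LINE.search(line)
        if match is None:
            continue  # assertion lines and anything else are skipped
        prefix = line[:match.start()]
        # Each vertical bar in the prefix marks one level of nesting.
        depth = sum(ch in "|│" for ch in prefix)
        rows.append((depth, match.group("asset").strip(), match.group("manifest")))
    return rows


def nested_manifests(tree_text: str):
    """List every manifest label that appears below the top of the tree."""
    return [label for depth, _, label in asset_rows(tree_text) if label and depth > 0]
```

Each label returned by nested_manifests() is a manifest that a reviewer would want to see validated on its own, rather than trusted because an outer manifest references it.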
Intentional Forgeries
Excluding the cryptographic checksums, any additional information recorded in the C2PA metadata effectively replicates information that is typically found in EXIF and XMP metadata fields. The only difference is that it has a cryptographic signature that should prevent tampering. But does it really stop tampering?
Consider these two butterfly pictures:
The first picture comes from the C2PA’s Content Credentials web page (home1.33703fa3.jpg) and is used as an example for how C2PA authenticates and tracks provenance. The other is a forgery that I created using C2PA’s public tools.
Let’s assume that you don’t know that my version is a forgery.
Bit for bit, pixel by pixel: both pictures render the exact same image. (I didn’t re-encode the JPEG stream to create my forgery.) The only difference is the metadata.
Since the pixels are the same, they have the same ELA result. ELA clearly shows that the picture was altered. At minimum, someone created an artificial depth-of-field. This alteration is not recorded anywhere in the C2PA authenticated provenance information.
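The claim that two JPEGs carry the same encoded image and differ only in metadata can be tested without decoding any pixels. A minimal Python sketch, assuming that every APPn and COM segment counts as metadata (a simplification, since a few APP segments such as APP14 can influence color handling), hashes everything else. The function name is hypothetical.

```python
import hashlib
import struct


def image_stream_digest(data: bytes) -> str:
    """SHA-256 over a JPEG's non-metadata segments plus its scan data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    digest = hashlib.sha256()
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker == 0xDA:
            # Start of scan: hash the scan header and everything after it.
            digest.update(data[pos:])
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            raise ValueError("malformed segment length")
        end = pos + 2 + length
        is_metadata = 0xE0 <= marker <= 0xEF or marker == 0xFE  # APPn or COM
        if not is_metadata:
            digest.update(data[pos:end])  # quantization tables, Huffman tables, frame header
        pos = end
    return digest.hexdigest()
```

If two files return the same digest, then swapping EXIF, XMP, or JUMBF blocks is the only thing that separates them. Any bytes appended after the end-of-image marker are included in the hash, so trailing data would show up as a difference.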
Both pictures contain C2PA metadata. If you view the first (source) picture at Content Credentials (link), then you’ll see a valid signature dated “Aug 29, 2023”. It identifies the application as Adobe Firefly 1.0.
The second picture (my forgery, link) also has a valid signature. And it has two source images (because I can add in arbitrary source images to support my claims).
This means that we have two images, both validated and authenticated by C2PA, that have conflicting provenance claims.
Content Credentials does not display all of the available and signed C2PA metadata. My picture includes multiple timestamps dated over a year earlier. (I set them to “2022-02-03T04:05:06+00:00”. See the FotoForensics metadata for the extracted dates.)
With two conflicting claims of ownership, we often rely on Occam’s Razor: the simplest explanation is usually the best one. My picture is older so mine must be the original.
Another evaluation heuristic relies on the amount of metadata. Originals usually have lots of camera information and little or no metadata from applications. In contrast, altered images often remove camera metadata and add in non-camera information.
Adobe’s source picture has no information about the camera. It only lists one of Adobe’s drawing applications. Any camera-specific metadata was removed. The remaining metadata clearly indicates an altered file with no conclusive origin.
My version of the image includes a camera make/model, GPS coordinates, lens settings, multiple source images, and more. (And if you don’t like my metadata, I can always revise my forgery to include additional details.)
Since my picture has more metadata and includes information about the camera, mine must be the original.
My image is signed using my own x509 certificate. I could easily change that to any x509 certificate that I control. This includes using a self-signed certificate that allows me to specify any issuer’s name. For this example, my self-signed certificate was issued by “Hacker Factor”, but I can easily change the name to be “Adobe” or “Leica Camera” or some other trusted company. (Whatever helps sell the forgery.)
The source picture is signed by Adobe, a company associated with digital editing. My image is signed using a trusted company’s name that represents an original photo, so mine must be original.
I mentioned that the Content Credentials web site includes a search engine. A search for this image turns up five known sightings:
I want to emphasize: None of these known sightings were planted by me. All of the known sightings are dated after both the source image (Aug 29, 2023) and my forgery (Feb 3, 2022). Moreover, they all include the “Adobe Firefly” logo in the lower left corner.
Adobe’s Firefly was beta software until September 13, 2023. If Adobe’s authoritative version came from Firefly in August 2023, then it should have the Adobe Firefly Beta watermark. While someone at Adobe probably used a pre-release of the commercial version to generate the image, you can’t tell that from the metadata or C2PA provenance information.
With these Firefly logo versions, ELA identifies that the logo was added after the picture was generated. Since we know that neither Adobe’s source nor my forgery removed the logo, we can safely determine that the version without the logo came first.
If you didn’t know that mine was a forgery, then you would probably conclude that someone started with my butterfly photo and then added in the Adobe Firefly logo later. And given both pictures, Adobe’s version likely came after mine and someone gave it a fake Firefly provenance claim. Why would someone do this? How about to cast doubt on the originality? My forgery is the original. Someone added in Adobe’s Firefly logo to make people think it was AI generated and to discredit me!
Adding to the confusion and doubt, I don’t know why someone at Adobe added the Firefly logo a few weeks after the picture appeared on the C2PA’s Content Credentials web site. The confusion caused by Adobe’s real actions just strengthens the argument that my forgery is the original.
This is far from everything I can do to improve my forgery’s C2PA metadata appearance. But even given my current effort, if you didn’t know that mine was a forgery, you’d have a hard time trying to identify Adobe’s as the authentic version. Remember: I don’t have to create a perfect forgery; I just have to cast doubt on your own picture’s credibility. C2PA’s metadata does nothing to validate the picture’s authenticity or provenance.
Tools of the Trade
All of my alterations in the butterfly forgery were made using open source tools: exiftool and Adobe’s c2patool. Using this approach, I can reassign ownership for any image, make AI-generated pictures appear real, or hide evidence of alterations. These are all good options for propaganda, promoting conspiracies, claiming fake news, and committing fraud. This directly goes against the C2PA stated design goal to prevent harm and misuse. And since C2PA claims to provide “provenance and authenticity” (it’s in the name: C2PA), it gives credibility to forgeries without providing any actual protections.
Unfortunately, even if the C2PA metadata is authentic, there is now always a question about whether the content is real or a forgery. You can’t assume the C2PA metadata is always truthful. (I also pointed this out in 2021, when The New York Times released their CAI/C2PA validation solution. Their own example contained provably false metadata that was signed as original.)
If you assume that C2PA shows verifiable and validated claims of authenticity and provenance, then you’re going to be in for a rude awakening. It is trivial to create conflicting and false C2PA metadata that appears verifiable, authentic, and valid. Moreover, you may not have another image for comparison. Without external corroborating information, you should assume that the C2PA metadata could be inaccurate.
In-Camera Options
There’s a belief that, if you can move the digital signing to when the photo is taken, then it can authenticate the picture’s creation. Truepic teamed up with Qualcomm to incorporate C2PA into Snapdragon mobile phones. (As their brochure states, “Truepic photo capture with C2PA standard support lets you snap a photo with Truepic’s cryptographic seal to prove the photo is real and not generative AI.”) The idea, as reported by NBC News, “allows users to take a photo that has a digital signature to prove its provenance, including where and when the photo was taken.”
I covered the basic vulnerability with this solution back in 2021:
Last year, Truepic and Qualcomm announced a partnership for signing the picture as it comes off the camera. Sounds great, right? Well, there’s one big problem. Every digital camera out there has test taps for checking the electronics. Qualcomm offers hardware development kits, such as those available for the Snapdragon 855 and 865. Alternately, you can buy a replacement sensor for testing. Either option can be used to hijack the camera. The software and firmware inside the mobile device will never know that the picture is coming from an alternate source; you can feed it a fake photo for signing by the trusted notary.
A year ago (Oct 2022), Leica and Nikon announced plans for in-camera C2PA support. Last month, Leica announced their first camera with C2PA. However, you must install the latest firmware on a Leica M11 camera.
I don’t own a Leica M11 camera, but I did download the firmware. As I reviewed the binary, I can see where it accesses the private certificate for signing and extracts the public certificate for inclusion in the C2PA metadata. (The certs appear to be stored in a chip in the camera.) If only I knew how to create my own firmware for Leica… then I could call the same crypto-chip and sign any picture I’d like. (Oh wait! There’s a github for hacking Leica M8 firmware! Maybe that could be a good start! From a hacking viewpoint, I bet the M8 is similar to the M11.)
Granted, modifying a camera’s hardware or firmware for signing arbitrary photos is a technical hurdle. However, banking fraud and insurance fraud are multi-billion dollar industries ($88 billion and $308 billion in 2022, respectively). There is plenty of financial incentive to address the technical difficulties and manufacture fake photos that are digitally signed as “provable originals”.
Network Tracking and Privacy Issues
As I mentioned, the C2PA metadata can become very large. In my forged butterfly picture, the C2PA metadata is 596K, while the rest of the JPEG content is 394K. The C2PA specification permits storing the metadata in a separate file (sidecar) that should be accessible via URL. In this case, you only need a small XMP metadata block in the JPEG that identifies the remote location.
This version of my forged butterfly picture uses the remote sidecar:
If you view the metadata, then you’ll see the sidecar’s URL in the XMP block. With this example, the JPEG is 396K (405,068 bytes) rather than 990K with the embedded metadata. This is a 60% reduction in size. While the amount of reduction will definitely vary by picture, the sidecar-enabled picture will always be smaller than a picture with embedded C2PA metadata. This saves the service provider’s bandwidth and associated costs.
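For anyone who wants to check the arithmetic, here it is as a few lines of Python. The numbers are the ones quoted above; only the variable names are mine.

```python
embedded_c2pa_kb = 596    # C2PA metadata inside the embedded version
other_jpeg_kb = 394       # everything else in that same JPEG
sidecar_jpeg_bytes = 405_068  # the sidecar version of the picture

embedded_total_kb = embedded_c2pa_kb + other_jpeg_kb   # 990K
sidecar_jpeg_kb = round(sidecar_jpeg_bytes / 1024)     # 396K
reduction = 1 - sidecar_jpeg_kb / embedded_total_kb    # fraction of bytes saved

print(f"{embedded_total_kb}K -> {sidecar_jpeg_kb}K: {reduction:.0%} smaller")
```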
Again, with my sidecar version, the Content Credentials web site, c2patool, and other C2PA validation systems all show that my picture has valid signatures. (Because they are valid.)
Behind the scenes: when you checked the C2PA signature, the validation tool saw the sidecar URL, retrieved the C2PA metadata from the web server, and then performed the validation checks. Unfortunately, this is where we start getting into privacy issues:
The URL points to a web server. In this example, it’s my web server. As the person who controls the server, I know every time someone tries to evaluate the picture’s provenance. I know the date, time, IP address, web client, and the specific picture you are interested in. Here are some possible uses for this data:
Advertisers can use this to track interest in a specific picture.
Newsrooms can use this to detect a sudden interest in a news photo. Perhaps they can re-investigate the provenance of a picture or use this as a flag for double-checking the associated text. Maybe they can cut off a controversy before it becomes problematic.
Propagandists can use this to determine how much reach their misinformation has obtained. Since only a small percentage of people will evaluate the metadata, seeing a few people performing validations would equate to a very widespread campaign.
Criminals can use this as a honey token, letting them know that someone is investigating them. It can also be used to gather operational intelligence. (“Hey! That bank I’m trying to scam with that photo of a fake check is trying to validate the C2PA metadata! It took them 3 days to get to this point, and they used this specific IP address and web client.” Or maybe “That person I’m catphishing is looking too closely.”)
Nothing says I have to return a static C2PA metadata file. I’m not doing it now, but it is trivial to generate a custom C2PA metadata reply with each request. This way, if the C2PA metadata ever shows up in some kind of report, I can identify the “who” and “when” related to the investigation. Moreover, because I can set timestamps, you’ll never know if the file was dynamically generated.
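To make the exposure concrete, here is a minimal Python sketch of what a server operator can pull from a single access-log entry. It assumes the web server writes the common "combined" log format. The sample line in the usage note, including its IP address, date, and client name, is invented for illustration, and the function name is hypothetical.

```python
import re
from datetime import datetime

# One entry in the "combined" access-log format.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def sidecar_lookup(line: str):
    """Return who/when/what for a request that fetched a .c2pa sidecar, else None."""
    match = COMBINED.match(line)
    if match is None or not match.group("path").endswith(".c2pa"):
        return None
    when = datetime.strptime(match.group("when"), "%d/%b/%Y:%H:%M:%S %z")
    return {
        "ip": match.group("ip"),        # where the validation request came from
        "when": when,                   # exactly when someone checked
        "path": match.group("path"),    # which picture they care about
        "agent": match.group("agent"),  # which tool or browser they used
    }
```

Feeding it a made-up entry such as `203.0.113.7 - - [03/Nov/2023:14:22:05 +0000] "GET /butterfly-forgery-remote.c2pa HTTP/1.1" 200 610304 "-" "ExampleValidator/1.0"` returns all four fields. Nothing here is special to C2PA; it is ordinary web logging, which is exactly why the sidecar design leaks this information by default.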
Going back to forensics and provenance: the C2PA’s file time on my web server is 2 hours 5 seconds after the time that the photo and metadata claim it was created. An investigator would assume that someone took the photo, uploaded it to the web site a little later, and that it’s been sitting on the web site for over a year. (Typically a safe assumption.) This again shows that my forgery must be the original because the server’s timestamp corroborates the metadata and predates Adobe’s source picture. Moreover, an investigator is unlikely to think that I manually backdated the timestamp on my server’s file to match the backdated C2PA metadata.
touch -d '2022-02-03T06:05:11+00:00' butterfly-forgery-remote.c2pa
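The same backdating works from any scripting language, which is why a file's modification time is weak evidence. This Python sketch repeats the touch command on a throwaway file (the temporary file name is a stand-in, not the real sidecar) and confirms the gap between the claimed creation time and the backdated file time.

```python
import os
import tempfile
from datetime import datetime, timedelta, timezone

claimed = datetime.fromisoformat("2022-02-03T04:05:06+00:00")    # time written into the C2PA metadata
backdated = datetime.fromisoformat("2022-02-03T06:05:11+00:00")  # time given to touch -d

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sidecar-stand-in.c2pa")  # hypothetical placeholder file
    with open(path, "wb") as handle:
        handle.write(b"placeholder")
    # Set both access and modification times, just like touch -d.
    os.utime(path, (backdated.timestamp(), backdated.timestamp()))
    mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)

gap = mtime - claimed
expected_gap = timedelta(hours=2, seconds=5)
print(gap, gap == expected_gap)
```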
The ability to use a sidecar helps reduce bandwidth for the provider. However, it does nothing to improve the validity or authenticity of the file’s provenance. Moreover, it introduces privacy issues, could be used to alert a criminal about an ongoing investigation, and could mislead an investigation.
Consistency and Complexity Issues
Photo and video file formats typically include a distinct set of data structures. For example, PNG encodes all data in “chunks”. Each chunk can contain image rendering content or supportive information. The supportive data could be in EXIF, XMP, ICC Profile, or other data formats. Similarly, JPEG uses a well-defined data blocking system. Some JPEG blocks are required for rendering (SOI, DHT, DQT, SOS, EOI, etc.) while others store supportive information (APP1, APP2, etc.). As with PNG, the supportive metadata in a JPEG is often encoded in formats like EXIF, XMP, MPF, ICC Profile, etc.
If you want to process the metadata, then you need to first parse the file format. Then, after you find the supportive information, you need to be able to parse that specific data encoding. Fortunately, EXIF is well-defined and easy to parse. So are XMP (it’s XML), ICC Profile, and other metadata formats. Each has its own distinct encoding method, but it’s one encoding method per metadata block.
The same cannot be said for the C2PA specification. Finding the C2PA metadata inside a JPEG or PNG is relatively easy, but it requires looking in two places: JUMBF for raw data and XMP for any pointer to a sidecar. It’s not like EXIF or IPTC or the other formats, where you just need to look in one place.
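The two-place lookup can be sketched for JPEG as follows. This is a simplified illustration under a few assumptions that are mine rather than stated above: that JUMBF boxes are carried in APP11 (0xFFEB) segments, that XMP packets sit in APP1 (0xFFE1) segments beginning with the Adobe XMP namespace header, and that scanning can stop at the start-of-scan marker. The helper name `find_c2pa_hints` is hypothetical. It only locates candidate segments; the JUMBF and XMP payloads still need their own parsers.

```python
import struct
from typing import Dict, List

XMP_HEADER = b"http://ns.adobe.com/xap/1.0/\x00"


def find_c2pa_hints(jpeg: bytes) -> Dict[str, List[int]]:
    """Return byte offsets of JPEG segments that may carry C2PA data.

    "jumbf": APP11 segments, assumed here to hold embedded JUMBF boxes.
    "xmp":   APP1 segments that start with the XMP namespace header,
             which is where a pointer to a sidecar would have to live.
    """
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI)")
    hints: Dict[str, List[int]] = {"jumbf": [], "xmp": []}
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            break  # lost marker sync; give up rather than guess
        marker = jpeg[pos + 1]
        if marker in (0xDA, 0xD9):
            break  # SOS (compressed image data follows) or EOI
        # The 2-byte length counts itself but not the marker bytes.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        payload = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xEB:
            hints["jumbf"].append(pos)
        elif marker == 0xE1 and payload.startswith(XMP_HEADER):
            hints["xmp"].append(pos)
        pos += 2 + length
    return hints
```

An EXIF block is also stored in APP1, which is why the sketch checks the payload header instead of relying on the marker alone.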
After finding the C2PA metadata, you need to be able to process a wide range of data structures in order to parse the file format. This includes (at minimum):
JUMBF: The JPEG Universal Metadata Box Format (JUMBF) is an overly complicated way of storing nested field/value pairs of data. It includes both generalized structures and hard-coded values for identifying specific types of data that can be buried in a nested JUMBF ‘box’.
CBOR: The Concise Binary Object Representation (CBOR) is a way of storing binary data that is in nested field/value pairs.
JSON: The JavaScript Object Notation (JSON) is a text-based way of storing nested field/value pairs of data.
XML: The eXtensible Markup Language (XML) is a text-based way of storing nested field/value pairs of data. (Like JSON, XML has ways to encode binary data.)
You might notice that each of these file structures is used to store nested field/value pairs of data. There is no logical reason why C2PA needs four equivalent ways to encode the data. Ironically, Adobe’s c2patool converts everything to a JSON control file called a manifest. When creating the C2PA metadata, the JSON manifest is converted to a nesting of JUMBF, CBOR, JSON, and XML data. For a specification, one would think that the C2PA group would just choose one structure and standardize on it.
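To illustrate the equivalence, the sketch below writes one small nested record three ways: as JSON, as XML, and as CBOR. The record and its field names are made up for the example and are not C2PA fields. The CBOR encoder is a deliberately tiny hand-rolled subset (non-negative integers, text strings, lists, and maps), not a full implementation. JUMBF is left out because it is a container format that would need far more code.

```python
import json
import xml.etree.ElementTree as ET


def cbor_head(major: int, n: int) -> bytes:
    """Encode a CBOR initial byte (major type) plus its length/value argument."""
    if n < 24:
        return bytes([(major << 5) | n])
    for extra, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < (1 << (8 * size)):
            return bytes([(major << 5) | extra]) + n.to_bytes(size, "big")
    raise ValueError("integer too large")


def to_cbor(obj) -> bytes:
    """Minimal CBOR encoder: non-negative ints, text, lists, and dicts only."""
    if isinstance(obj, bool):
        raise TypeError("booleans are not supported in this sketch")
    if isinstance(obj, int) and obj >= 0:
        return cbor_head(0, obj)
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        return cbor_head(3, len(raw)) + raw
    if isinstance(obj, list):
        return cbor_head(4, len(obj)) + b"".join(to_cbor(x) for x in obj)
    if isinstance(obj, dict):
        return cbor_head(5, len(obj)) + b"".join(
            to_cbor(k) + to_cbor(v) for k, v in obj.items())
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def to_xml(tag: str, obj) -> ET.Element:
    """Map nested dicts, lists, and scalars onto nested XML elements."""
    elem = ET.Element(tag)
    if isinstance(obj, dict):
        for key, value in obj.items():
            elem.append(to_xml(key, value))
    elif isinstance(obj, list):
        for value in obj:
            elem.append(to_xml("item", value))
    else:
        elem.text = str(obj)
    return elem


if __name__ == "__main__":
    # Hypothetical record; the keys are illustrative only.
    record = {"title": "butterfly", "edits": ["crop", "resize"]}
    print(json.dumps(record))
    print(ET.tostring(to_xml("record", record), encoding="unicode"))
    print(to_cbor(record).hex())
```

All three outputs carry the same nested field/value information; only the byte-level wrapping differs.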
The use of multiple encoding systems adds complexity to the C2PA metadata. However, this complexity increases neither the trustworthiness of the data nor the skill level needed to create a forgery. It appears complicated for the sake of being complicated and not because of any added value.
(I consulted with a few friends who focus on legal issues; some lawyers, some hackers. Each noted that this added complexity makes it easier to patent each step of the C2PA encoding process. This way, you can’t implement it without violating someone’s patent. See the section in this blog entry on Patent Issues.)
Complexity and Vulnerabilities
A Software Bill of Materials (SBOM) itemizes all of a program’s dependencies. It acts like an ingredient list for software. The larger the dependency listing, the more potential there is for a system compromise.
Adobe’s c2patool provides a good example of the dependencies and build requirements. Compiling c2patool requires over 300 dependent libraries. (At 15 megs of dependent libraries, it limits the ability of simple IoT devices to implement C2PA.) The sheer number of requirements creates a huge attack surface. If there is a known vulnerability that impacts any of these dependencies, then it may be possible to create a hostile JUMBF record that will be processed by the validating system.
For example, c2patool compiles with OpenSSL 1.1.1w. New versions of OpenSSL come out often and many address new vulnerabilities. Moreover, the 1.1.1 branch of OpenSSL reached end-of-life on September 11, 2023. This means that c2patool is currently linking to unsupported software. (And already there are known bugs for OpenSSL that may impact 1.1.1w.)
Keep in mind: Patching your OS won’t help in this case because c2patool is statically compiled. With over 300 dependencies, there are likely new patches coming out every few days. You should probably update and recompile often, so as to ensure that you have the latest code.
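For a Rust project, a rough ingredient list can be read straight out of the `Cargo.lock` file, which records every resolved crate as a `[[package]]` entry with a `name` and a `version`. The sketch below is a simple line-based reader, not a real TOML parser, and the helper name `locked_packages` is my own. Note that the list includes the project's own crate as well as its dependencies.

```python
import re
from typing import Dict, List, Optional, Tuple

_FIELD = re.compile(r'^(name|version)\s*=\s*"([^"]*)"\s*$')


def locked_packages(cargo_lock_text: str) -> List[Tuple[str, str]]:
    """Return (name, version) for each [[package]] entry in Cargo.lock text."""
    packages: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    for line in cargo_lock_text.splitlines():
        line = line.strip()
        if line == "[[package]]":
            current = {}
            packages.append(current)
        elif line.startswith("["):
            current = None  # some other table; ignore its fields
        elif current is not None:
            match = _FIELD.match(line)
            if match:
                current[match.group(1)] = match.group(2)
    return [(p.get("name", "?"), p.get("version", "?")) for p in packages]
```

Running `len(locked_packages(open("Cargo.lock").read()))` in a checkout gives a quick count of how many crates end up in the statically compiled binary's build.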
In just the few weeks that I have been playing with c2patool, many of the c2patool dependencies have been updated.
$ cargo update
Updating crates.io index
Updating base64 v0.21.4 -> v0.21.5
Updating cpufeatures v0.2.9 -> v0.2.11
Updating fdeflate v0.3.0 -> v0.3.1
Updating id3 v1.8.0 -> v1.9.0
Updating indexmap v2.0.2 -> v2.1.0
Updating js-sys v0.3.64 -> v0.3.65
Updating libc v0.2.149 -> v0.2.150
Updating num_enum v0.7.0 -> v0.7.1
Updating num_enum_derive v0.7.0 -> v0.7.1
Updating proc-macro-crate v1.3.1 -> v2.0.0
Updating redox_syscall v0.3.5 -> v0.4.1
Adding ring v0.17.5
Updating rustix v0.38.20 -> v0.38.21
Updating rustls v0.21.7 -> v0.21.8
Updating rustls-webpki v0.101.6 -> v0.101.7
Updating sct v0.7.0 -> v0.7.1
Updating serde v1.0.189 -> v1.0.190
Updating serde_derive v1.0.189 -> v1.0.190
Updating serde_json v1.0.107 -> v1.0.108
Updating tempfile v3.8.0 -> v3.8.1
Updating toml_datetime v0.6.3 -> v0.6.5
Updating toml_edit v0.19.15 -> v0.20.7
Adding untrusted v0.9.0
Updating wasm-bindgen v0.2.87 -> v0.2.88
Updating wasm-bindgen-backend v0.2.87 -> v0.2.88
Updating wasm-bindgen-futures v0.4.37 -> v0.4.38
Updating wasm-bindgen-macro v0.2.87 -> v0.2.88
Updating wasm-bindgen-macro-support v0.2.87 -> v0.2.88
Updating wasm-bindgen-shared v0.2.87 -> v0.2.88
Updating web-sys v0.3.64 -> v0.3.65
Updating winnow v0.5.17 -> v0.5.19
I don’t know how many of these updates are security related, but I know there is at least one vulnerability fix. I also don’t know if any of these updates break existing functionality. Unfortunately, c2patool doesn’t include a full end-to-end test suite. Given the complexity of the C2PA specification, a thorough end-to-end test suite may not be feasible.
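Version numbers offer only a weak hint about which updates might change behavior. As a rough sketch, the helper below (the name `classify_updates` is hypothetical) reads `cargo update` output like the listing above and labels each update by the first version component that changed. It says nothing about whether an update is security related.

```python
import re
from typing import Dict

_UPDATE = re.compile(
    r"^\s*Updating (\S+) v(\d+)\.(\d+)\.(\d+)\S* -> v(\d+)\.(\d+)\.(\d+)\S*\s*$")


def classify_updates(output: str) -> Dict[str, str]:
    """Map crate name -> 'major', 'minor', or 'patch' for each update line."""
    result: Dict[str, str] = {}
    for line in output.splitlines():
        match = _UPDATE.match(line)
        if not match:
            continue  # e.g. "Updating crates.io index" or "Adding ..." lines
        name = match.group(1)
        old = tuple(int(x) for x in match.group(2, 3, 4))
        new = tuple(int(x) for x in match.group(5, 6, 7))
        if old[0] != new[0]:
            kind = "major"
        elif old[1] != new[1]:
            kind = "minor"
        else:
            kind = "patch"
        result[name] = kind
    return result
```

For crates still at version 0.x, Cargo treats a change in the minor number as potentially incompatible, so a "minor" label on something like `toml_edit v0.19.15 -> v0.20.7` deserves the same attention as a "major" one.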
Keep in mind: The SBOM and attack surface area issues are not limited to c2patool. Any program that implements the C2PA specification must have a large attack surface due to the wide range of required data structures.
Cryptography and Certificates
The C2PA specification uses two types of cryptographic signatures to deter tampering. The first method is a simple checksum. Each claim in the JUMBF structure has a computed value (typically SHA256). The nested CBOR structure lists the expected checksum values. If someone tampers with any of the provenance information, then the checksums won’t match.
Of course, the workaround for the forger is to recompute the checksums and update the values. If you’re already forging the data, then altering a few more metadata bytes is trivial to do. Or you can take the easy route and use Adobe’s c2patool to regenerate all of the checksums.
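The recompute step is short enough to show in full. This is a toy model, not the real C2PA layout: the record is a plain dictionary with made-up field names, serialized as JSON rather than CBOR inside JUMBF. The point is only that an unkeyed SHA-256 checksum protects against accidental changes, not against someone who can rewrite the stored checksum as well.

```python
import hashlib
import json


def digest(payload: bytes) -> str:
    """Hex SHA-256 of the payload."""
    return hashlib.sha256(payload).hexdigest()


def build_record(assertion: dict) -> dict:
    """Toy stand-in for a claim: the assertion plus its expected SHA-256."""
    payload = json.dumps(assertion, sort_keys=True).encode("utf-8")
    return {"assertion": assertion, "sha256": digest(payload)}


def checksum_ok(record: dict) -> bool:
    """Recompute the hash of the assertion and compare it to the stored value."""
    payload = json.dumps(record["assertion"], sort_keys=True).encode("utf-8")
    return digest(payload) == record["sha256"]


if __name__ == "__main__":
    # Hypothetical assertion fields, for illustration only.
    record = build_record({"author": "Alice", "created": "2023-01-01"})

    record["assertion"]["author"] = "Mallory"   # naive tampering
    print(checksum_ok(record))                   # False: mismatch is detected

    record = build_record(record["assertion"])  # forger recomputes the checksum
    print(checksum_ok(record))                   # True: tampering now looks clean
```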
The stronger cryptographic protection comes from the embedded x509 certificates and associated signature. The signature acts like a notary and the certificate verifies the signature. This prevents the data from being changed after being signed. However, there are limitations:
The signature only records that the data existed at the time of the signing. It does not validate the data, does not authenticate the alleged content owner, does not prevent tampering before signing, and does not prevent a forgery from replacing the entire certificate.
There isn’t just one Certificate Authority (CA) that can issue certificates. For signing, you can use any CA and certificate that meets the signing requirements.
These limitations create some really bad situations.
First, you can have two files, both with valid signatures but from different certificates. So which one is legitimate?
The one that came first? Nope. You might be working on an image while offline and sign it later when you come online. You can’t assume ‘first’ (even by years) makes it more legitimate. The only thing the date tells you is that it existed on that date and was not created later.
The one from the bigger name? Smaller companies have just as much right to copy protection as larger companies. Also, there have been multiple instances of competing Certificate Authorities issuing certs for the same companies. This resulted in fraudulent certificates issued for Google, Microsoft, Yahoo, and others. A big company name does not mean the certificate is trustworthy.
The one associated with the older metadata or most metadata? I already demonstrated that this doesn’t work with my forged butterfly example.
The one that cost more? Ha!
Second, let’s assume you don’t have two pictures. You only have one picture and it is signed. The problem is that you don’t know if this is authoritative or applied to a forgery. While the certificate is authentic and can be traced to a trusted CA server, it doesn’t validate the content or the alleged origin.
Third, as demonstrated with my forged butterfly example, self-signed certificates are permitted. A self-signed certificate can be attributed to anyone. It authenticates the cryptographic signature, but not the signing authority.
In each of these cases, the reliance on x509 without a way to directly validate the content owner’s authenticity means that the certificate is nothing more than a notary. It identifies that the notary signed the data, but not that the data is authoritative or validated. It also does not resolve issues related to valid competing signatures, nested signatures, or authentication.
Patent Issues
With most public standards, there is a declaration that (1) the specification and any associated code is available for commercial and non-commercial use, and (2) the technology is unencumbered by patents. That’s not the case with C2PA.
The C2PA specification has a section titled “Patent Policy”. It effectively says that:
“For materials other than source code or datasets developed by the Working Group, each Working Group Participant agrees to make available any of its Essential Claims”. Keep in mind, “make available” does not mean “free to use”.
They reserve the right to license their patents and specifically reference the W3C RF licensing requirements Section 5. However, Section 5 specifies a “Recommendation” for a royalty-free license that “participants are encouraged” to provide. It is not a requirement.
Participants “may exclude Essential Claims from its licensing commitments”. Again, ‘may’ is a suggestion and not a requirement.
As far as I can tell, none of the C2PA participants have agreed to waive, exclude, or provide a royalty-free license for any of their patents that cover this technology. (It’s not documented anywhere that I can find.)
I checked the US Patent and Trademark Office search engine and found dozens of patents that appear to cover parts of the C2PA specification. These include:
Adobe: US-11632238-B2 and US-11146381-B2 are both titled “Traceability of edits to digital documents via distributed ledgers”, US-20230102162-A1 “ACCELERATED FACT CHECKING WITH DISTRIBUTED LEDGERS”, and US-7660981-B1 “Verifiable chain of transfer for digital documents”.
Microsoft: US-11720754-B2 “Systems and methods for extracting evidence to facilitate claim verification”.
Truepic: They have dozens of pending patents, including: US-20230351054-A1 “SYSTEMS AND METHODS FOR AUTHENTICATING PHOTOGRAPHIC IMAGE DATA”, US-20230269105-A1 “METHODS FOR REQUESTING AND AUTHENTICATING PHOTOGRAPHIC IMAGE DATA”, US-20200349293-A1 “SYSTEMS AND METHODS FOR AUTHENTICATING PHOTOGRAPHIC IMAGE DATA”, and US-11403746-B2 “Methods for requesting and authenticating photographic image data”.
People in Japan have 1 patent and 3 pending patents around the use of JUMBF: US-20200151917-A1, US-20200151331-A1, US-11269998-B2, and US-20230260299-A1.
This is far from every potential patent conflict. I spot-checked companies that are listed as C2PA members. Most seem to have multiple patents that cover different aspects and uses of this technology.
Unless the C2PA license terms either permit free usage or indemnify developers who implement and incorporate this technology, I cannot risk including support for the C2PA specification with any of my products. Otherwise, a submarine patent may surface and torpedo my efforts. Companies that incorporate C2PA may end up deploying solutions that are later found to be violating some patents, resulting in either a forced licensing agreement or a rapid removal of a deployed technology.
In this blog entry, I have shown that the C2PA solution fails to meet its defined goals:
Privacy: Through the use of sidecar metadata, an investigator discloses information about their investigation to external parties. While content creators can control what information they disclose, it does not prevent someone from falsely attributing media to a content creator.
Responsibility: Consumers cannot reliably determine the provenance of an asset.
Security: Consumers cannot trust the integrity and source of provenance. In addition, because it is defined by a closed-door and opaque committee, the design is not reviewed by identifiable external experts prior to publication.
Harms and Misuse: The C2PA specification fails to prevent forgeries and cannot distinguish between competing claims. Having a trusted system that makes fraud appear authentic will directly impact virtually every industry: large companies, small companies, finance, credit cards, insurance, real estate, online merchants, news outlets, and even the legal system and court proceedings. Any industry where photos are used as evidence will be impacted. A widely promoted verification system that fails to validate and authenticate photographic evidence will definitely lead to harm and misuse.
Verifiable: Although the signatures can be verified using the embedded x509 certificates, this does not verify that the certificates match the authorized and representative signatories. It fails to identify the content owner or how the content was handled.
Can’t be secretly tampered with: My butterfly forgery appears more authentic than Adobe’s known-untampered source. Content Credentials has a picture on their homepage with an unverified component. A forger can easily and secretly alter composition components as long as a subsequent signature is valid. (I wonder if the signing companies, like Adobe and Microsoft, can be held liable for authenticating a forgery?)
In my opinion, the C2PA provenance solution is nothing more than a failed attempt to assign blame by attempting to attribute weakly defined source information to an image. It doesn’t provide authenticity, validation, or verifiable provenance. Moreover, the patent issues suggest that C2PA may be an attempt to force license agreements at a later date, or consume technology from smaller companies that cannot afford a license agreement.
Keeping People Honest
In previous blog entries, I showed how the C2PA implementations by Starling Labs and the New York Times included provably altered information in their examples of ‘authenticated proof’. Now I have demonstrated how the C2PA specification has fundamental flaws that prevent trustworthy validations. So why is everyone still trying to support C2PA? I think it comes down to a cascade of fallacies:
Logical fallacy: Appeal to authority. C2PA has the backing of a lot of large and powerful companies, and the specifications explicitly mention that “the design is reviewed by experts”. The assumption is that many large companies using unidentified experts never make big mistakes, so it must be safe to use.
The C2PA steering committee members include large tech companies like Microsoft, Intel, and DigiCert. These companies employ very knowledgeable computer security experts who should have easily identified these vulnerabilities. I assume that these experts were never consulted because the alternative is that they either missed obvious problems or their findings were ignored.
Logical fallacy: False dichotomy. We need a solution. This is the only solution currently being promoted, so we must use it.
Logical fallacy: Bandwagon. Everyone else is trying to support it, so we must too. Citing a façade like CAI helps this fallacy by applying peer pressure.
Logical fallacy: Circular argument. If the person is honest, then it works for tracking provenance.
Economics: Sunk-cost fallacy. We’ve already invested years and reputations into this C2PA solution. We can’t stop now.
The circular argument fallacy is particularly important. Over on Reddit’s “explain like I’m five” (ELI5) forum, user SadFaceFromSpace asked, “ELI5: How is C2PA possible?” and then described a variety of unaddressed forgery techniques. (It’s not just me thinking this way.) The response from TheSkiGeek nailed it: “Ultimately you have to trust someone.”
C2PA provides a framework for recording provenance with a cryptographic signature. However, the detail, accuracy, and trustworthiness are dependent on the person who adds in the metadata. In effect, it’s a solution designed for keeping honest people honest. The problem is: I’m not worried about the honest people. The honest people will identify sources, authorship, and alterations. They can use a textual description or store it inside the metadata using XMP, EXIF, IPTC, or other existing standards that record changes. There is no need for a new (and overly complicated) standard for keeping honest people honest.
In contrast, C2PA does nothing to prevent or deter dishonest people. In fact, it’s the opposite: it enables them. Given that financial fraud is a multi-billion dollar per year industry, having an easy way to create a signed and authenticated fake image is certain to result in more forgeries. Moreover, the investigators who review these claims are more likely to approve forgeries that have verifiable digital signatures and appear authentic. In the worst case, there can be competing provenance claims with no method to distinguish real from fake (or fake from more fake; there doesn’t have to be a real one).
While it does come down to trust, the C2PA specification does not provide something we should trust. Perhaps it is time for C2PA (and CAI) to re-evaluate their solution and include external experts in the design and review process. I’m not saying that the solution needs to address all of their design goals, but it should address at least one of them.
For full disclosure, none of these problems are new to C2PA. Beginning in June 2021, I started disclosing these issues during video chats and in writing with a group that included C2PA leadership. Some of these topics were also reported as issues in their GitHub repository. I often repeated these unaddressed risks for a year and a half (the last being in January 2023). The only new component is the forgery demonstration; our previous discussions only mentioned that it was possible, but it had not been implemented.
In chaos theory, the butterfly effect refers to the condition where a tiny localized change in a complex system can have large effects elsewhere. (The single flap of a butterfly wing could start a chain of events that results in a hurricane on the other side of the world.) With C2PA, what started as a simple idea has grown into a chaotic storm. My examples in this blog entry demonstrate how small, unaddressed issues in this provenance solution will lead to an increase in “authentic” fraud, propaganda, and false attribution.