Events
Each Event is a JSON-representable object. Events have a core set of fields and are extensible.
Field | Type | Optional? | Description |
---|---|---|---|
subj_id |
URI | No | Subject persistent ID. |
relation_type_id |
string | No | Type of the relationship between the subject and object. |
obj_id |
URI | No | Object persistent ID. |
timestamp |
Timestamp | No | Timestamp of when the Event was created. |
occurred_at |
Timestamp | No | Timestamp of when the Event is reported to have occurred. |
id |
UUID | No | Unique ID for the Event. |
source_id |
string | No | A name for the source. |
source_token |
UUID | No | Unique ID that identifies the Agent that generated the Event. |
terms |
URL | Yes | Terms of use for using the API at the point that you acquire the Event. |
license |
URL | Yes | A license under which the Event is made available. |
evidence_record |
URL | Yes | Link to an Evidence Record for this Event. |
subj |
JSON Object | Yes | Subject metadata. |
obj |
JSON Object | Yes | Object metadata. |
subj_id and obj_id
An Event has a Subject and an Object, expressed as a URI. This could be:
- The Canonical URL of a webpage, if one is supplied. E.g. Some blogging platforms report a Canonical URL.
- The URL of a webpage as found.
- The DOI of the content being referenced.
Becuase Event Data is designed to track registered content via DOIs, every Event contains a DOI as the subj_id
or obj_id
. Most events have a webpage URL as the subj_id
and a DOI as the obj_id
, but this is left to the individual Agent that produces the Event.
relation_type_id
The Relation Type ID is taken from a controlled vocabulary. You should refer to the documentation for each source to see which relation types are used.
timestamp and occurred_at
Every Event has a time at which it was created. This is automatically assigned when the Agent sends an Event into the Event Data system. This is usually soon after the Event was observed. You can use this timestamp
to filter Events that were collected within a certain date range, and as a reliable method for iterating over all Events.
In addition to this, every Event has a theoretical date on which it occurred. The precise meaning of this can vary from source to source. Twitter's occurred_at
corresponds to the date that Twitter reports that the tweet was published. The Newsfeed source takes the date from the time that an RSS feed indicates that the blog was published. The web source takes the date that a webpage was observed. For more information, see each source's documentation. There is an in-depth discussion of times in the Time page.
Timestamps found in the timestamp
or occurred_at
fields are given in ISO 8601 UTC Z format, e.g. 2017-05-14T05:04:57Z
. In other fields you may see formats including, e.g. 2017-05-14T05:04:57.000Z
. All dates conform to ISO8601.
id
Every Event is assigned a unique ID. This enables you to unambiguously refer to the Event in question.
source_id
The source_id
identifies the original source of the input data. It corresponds to the list of sources you can find listed in this guide. Every source is monitored by an Agent.
source_token
Each Agent has its own unique ID, which you can find in the Sources documentation. The source_token
identifies the Agent that processed the data to produce the Event.
terms and license
Events will be made available via the Crossref Event Data Query API. If you use this service you must abide by the Terms of Use, which are indicated in the terms
field. You may find the same data made available via other services, for example DataCite's Event Data service. If this is the case, the terms field in Events made available via these other services may be different or missing.
As each source may operate under a different license, every Event includes a license
field. Only licenses from the Open Definition list are used.
evidence_record
An Event may have an Evidence Record associated with it. This is a file that provides an audit trail of how the Event was created. For more information see Evidence Records.
subj and obj
The Event may include metadata about its subject and/or object. For example:
pid
- The persistent ID. Must correspond tosubj_id
orobj_id
.issued
- Publication date.title
- The title of the webpage, comment, etc.author
- Author of the comment, blog etc.url
- URL where this was found. May be different topid
.method
- Optional. The method by which we matched a URL for registered content to a DOI.verification
. Optional. The method by which we verified that this content has the given DOI.work-type-id
. Optional. The work-type id of the object as defined by its Registration Agency, for example journal-article, Dataset, Text etc.
We typically only include metadata that you are unable to get elsewhere and that is unlikely to change. Where the subj_id
or obj_id
is a DOI, we will not include the metadata because you can easily look it up using the Crossref or DataCite APIs and it could go out of date if recorded. We also store a large amount of metadata associated with each DOI, and including it all would balloon the data to many times its size.
People don't always use DOIs to cite registered content, especially on social media, so we often link using the landing page. Where a link to a landing page is identified, and we can match that to a DOI, we will use the DOI as the obj_id
and obj.pid
, and the actual URL of the landing page as the obj.url
. In these cases, we include an optional method
and verification
field to indicate how we matched a DOI, and how we verified that. For a full description see Matching Landing Pages.