# Data Provenance

EVd3x integrates extracellular vesicle (EV) and molecular data from multiple sources and keeps every record traceable to the source it came from.

## Provenance principles

- Preserve source database fields through the UI and into exports.
- Keep publication/provenance identifiers where available.
- Distinguish direct evidence from derived/inferred context.
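
As an illustration of these principles, the sketch below models one evidence row that keeps its upstream fields intact, carries an optional publication identifier, and is tagged as direct or derived. The class name `EvidenceRow`, its field names, the `source.` export prefix, and the sample values are all hypothetical; they are not taken from the EVd3x codebase.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical labels for the direct vs. derived/inferred distinction.
EVIDENCE_KINDS = ("direct", "derived")


@dataclass(frozen=True)
class EvidenceRow:
    """One evidence row with its provenance attached (illustrative only)."""
    entity: str
    source_db: str
    evidence_kind: str
    publication_id: Optional[str] = None
    # Upstream columns, kept verbatim rather than renamed or dropped.
    source_fields: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.evidence_kind not in EVIDENCE_KINDS:
            raise ValueError(f"evidence_kind must be one of {EVIDENCE_KINDS}")


def to_export_record(row: EvidenceRow) -> dict:
    """Flatten a row for export without losing any source field."""
    record = {f"source.{key}": value for key, value in row.source_fields.items()}
    record.update(
        entity=row.entity,
        source_db=row.source_db,
        evidence_kind=row.evidence_kind,
        publication_id=row.publication_id,
    )
    return record


# Made-up example values, for illustration only.
example = EvidenceRow(
    entity="example-entity",
    source_db="ExampleDB",
    evidence_kind="direct",
    publication_id="PUB-PLACEHOLDER",
    source_fields={"sample": "plasma"},
)
export = to_export_record(example)
```

Prefixing the upstream columns keeps them from colliding with the normalized fields, so an export can always be traced back to what the source database actually reported.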

## Source-by-source coverage map

### Identity and mapping layer
- Ensembl / BioMart: canonical gene mapping.
- miRBase: canonical miRNA identifiers.
- UniProt: protein identifier and mapping context.

### Target layer
- miRTarBase, TarBase, TargetScan: miRNA-target support and confidence integration.

### EV evidence layer
- ExoCarta, Vesiclepedia, SVAtlas: EV cargo evidence rows.
- EV-TRACK: provenance fields (such as the EV-TRACK ID), carried over when present in linked upstream records.

### Pathway layer
- Reactome, KEGG, GO, WikiPathways: pathway memberships and enrichment context.

### Cell and communication layer
- HPA, RNALocate, mirMine, miRNA-atlas: expression/localization context.
- CellPhoneDB, OmniPath-linked resources: ligand/receptor communication context.
- CellMarker, PanglaoDB: cell specificity support.

### Disease layer
- DisGeNET, plus linked disease resources in the processed datasets: disease associations and evidence context.
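
The coverage map above can also be expressed as a small lookup table, which is handy when tagging exported rows with the layer their source belongs to. This is a sketch only: the layer keys, `COVERAGE_MAP`, and `layer_of` are hypothetical names, and the source names are copied from the lists above.

```python
from typing import Optional

# Layer -> sources, transcribed from the coverage map in this document.
COVERAGE_MAP = {
    "identity_and_mapping": ["Ensembl / BioMart", "miRBase", "UniProt"],
    "target": ["miRTarBase", "TarBase", "TargetScan"],
    "ev_evidence": ["ExoCarta", "Vesiclepedia", "SVAtlas", "EV-TRACK"],
    "pathway": ["Reactome", "KEGG", "GO", "WikiPathways"],
    "cell_and_communication": [
        "HPA", "RNALocate", "mirMine", "miRNA-atlas",
        "CellPhoneDB", "OmniPath-linked resources",
        "CellMarker", "PanglaoDB",
    ],
    "disease": ["DisGeNET"],
}


def layer_of(source: str) -> Optional[str]:
    """Return the layer a source belongs to, or None if it is not listed."""
    wanted = source.casefold()
    for layer, sources in COVERAGE_MAP.items():
        if any(wanted == name.casefold() for name in sources):
            return layer
    return None
```

For example, `layer_of("kegg")` returns `"pathway"`, while a source that is not in the map returns `None` instead of raising.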

## EV-TRACK semantics (important)

- An EV-TRACK ID is record-level provenance, not a universal entity identifier.
- Some valid EV evidence rows carry no EV-TRACK ID because the upstream rows do not include one; a missing ID does not by itself make a row invalid.
- Always interpret EV-TRACK fields together with source and publication fields.
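
A minimal sketch of these semantics, assuming rows are plain dictionaries with hypothetical keys `source_db`, `publication_id`, and `evtrack_id` (the real export column names may differ). The EV-TRACK ID is treated as optional and is always reported alongside the source and publication fields, never on its own.

```python
from typing import Mapping, Optional


def provenance_label(row: Mapping[str, Optional[str]]) -> str:
    """Build a readable provenance string for one evidence row.

    A missing EV-TRACK ID is not an error: the row is still described
    by its source and publication fields.
    """
    parts = [
        f"source={row.get('source_db') or 'unknown'}",
        f"publication={row.get('publication_id') or 'unknown'}",
    ]
    evtrack_id = row.get("evtrack_id")
    if evtrack_id:
        # Record-level provenance only; not an identifier for the entity.
        parts.append(f"evtrack={evtrack_id}")
    return "; ".join(parts)


# Placeholder values, for illustration only.
with_id = provenance_label(
    {"source_db": "Vesiclepedia", "publication_id": "PUB-1", "evtrack_id": "EVTRACK-PLACEHOLDER"}
)
without_id = provenance_label({"source_db": "ExoCarta", "publication_id": "PUB-2"})
```

Both calls return a usable label; the second simply omits the `evtrack=` part.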

## Reproducibility minimum

When sharing findings, include:
- query text,
- active mode (single/collection),
- top-N or key filters,
- exported files used in interpretation,
- any active cap/threshold settings relevant to the result.
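
One way to bundle this checklist is a small manifest saved next to the exported files. The sketch below is an assumption about how that could look, not an EVd3x feature: `ReproManifest`, its field names, and the sample values are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ReproManifest:
    """Checklist items from this section, gathered in one record."""
    query_text: str
    mode: str  # "single" or "collection"
    filters: dict = field(default_factory=dict)              # top-N or key filters
    exported_files: list = field(default_factory=list)       # files used in interpretation
    caps_and_thresholds: dict = field(default_factory=dict)  # active caps/thresholds

    def to_json(self) -> str:
        if self.mode not in ("single", "collection"):
            raise ValueError("mode must be 'single' or 'collection'")
        # Sorted keys keep the output stable, so manifests diff cleanly.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Made-up values, for illustration only.
manifest = ReproManifest(
    query_text="example query",
    mode="single",
    filters={"top_n": 20},
    exported_files=["example_export.csv"],
    caps_and_thresholds={"max_rows": 500},
)
manifest_json = manifest.to_json()
```

Sharing `manifest_json` together with the exports gives a reader the settings needed to rerun the same query.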
