MDMC-NEP Glossary of Terms

This glossary defines and explains the high-level terms used in the context of MDMC (meta)data management.

The definitions of terms have been designed to balance the specific applications of MDMC with the definitions available in other projects (NEP, HMC, NeXus, CODATA-CASRAI).

The definitions use glossary terms consistently; these terms are written in bold with Capital Initial Letters.

This glossary is intended to be a living document, subject to updates if required by the community. For any inquiries, please contact Dr. Rossella Aversa.

Main contributors: R. Aversa, A. Boubnov, C. Eschke, S. Irvine, R. Joseph, M. Kabbe, N. MacKinnon, I. Modolo, M. Panighel, R. Thelen, D. Valentinis

Analysed Data

Specific type of Research Data, primary output of any kind of Data Analysis performed on Research Data, typically on Processed Data.

Calculation

Computational Data Acquisition performed on a Model to process its input Settings into output calculated properties using a specific computational and/or theoretical Technique based on a theory accepted by the community (e.g., Density Functional Theory, Conformal Field Theory, …).

Conclusions

The primary output of Data Interpretation performed on Research Data, typically Analysed Data. Conclusions are any kind of insight that supports the answer to some specific research question, such as the significance and implications of the research findings of a Study, possibly in comparison with Reference Data, along with recommendations which may support decision-making about the next processes of a Study or about future work. Conclusions form an important part of a Study debrief and are usually reported in Scientific Publications.

Consumable

Auxiliary entity used during Fabrication, Sample Preparation or Measurement which is necessary to the process itself, has a limited lifetime or a limited number of uses before it is disposed of, and is normally bought from third-party manufacturers. Examples are: gloves, syringes, wipes, etching solutions, glass slides, spatulas, weighing paper, two-sided tape.

Correlative Characterization

The action of characterising and connecting the different types of information from co-referenced (in time or space) multimodal Research Data obtained using different Techniques. This may include the output of multiple Data Acquisitions and/or of any of the processes included in the Data Analysis Lifecycle to obtain complementary insights on a region of interest, as well as to relate features of different Systems across multiple length scales over time.

Data Acquisition

Set of actions carried out by one or more Research Users, performed on a System or a set of them to generate a single self-consistent unit of Raw Data using a Technique, an Instrument and other Equipment under constant or varying controlled conditions described by Settings, depending on the particular research context. Data Acquisition may be an experimental (Measurement) or a computational (Calculation, Simulation) process. Data Acquisition is specific to Technique: an investigation on the same System conducted using a different Technique implies a different Data Acquisition. The output of Data Acquisition is Raw Data.
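
As an illustration only, the relationships in this definition can be sketched as a small Python data structure. All class names, field names and example values below (DataAcquisition, RawData, "sample-001", "SEM", "microscope-A", voltage_kV) are hypothetical and are not part of the glossary or of any MDMC schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class AcquisitionKind(Enum):
    # The three kinds of Data Acquisition named in the glossary
    MEASUREMENT = "Measurement"   # experimental
    CALCULATION = "Calculation"   # computational
    SIMULATION = "Simulation"     # computational


@dataclass
class RawData:
    # Hypothetical minimal record of what was acquired and how
    system: str
    technique: str
    values: list


@dataclass
class DataAcquisition:
    kind: AcquisitionKind
    system: str
    technique: str
    instrument: str
    settings: dict = field(default_factory=dict)

    def is_experimental(self) -> bool:
        # Only a Measurement is an experimental Data Acquisition
        return self.kind is AcquisitionKind.MEASUREMENT

    def run(self, values: list) -> RawData:
        # 'values' stands in for whatever the Instrument would return
        return RawData(self.system, self.technique, list(values))


acq = DataAcquisition(
    kind=AcquisitionKind.MEASUREMENT,
    system="sample-001",          # hypothetical identifier
    technique="SEM",              # hypothetical Technique label
    instrument="microscope-A",    # hypothetical Instrument label
    settings={"voltage_kV": 5},   # hypothetical Settings
)
raw = acq.run([0.1, 0.2, 0.3])
```

In this sketch the Technique is a field of the Data Acquisition itself, which mirrors the statement that investigating the same System with a different Technique implies a different Data Acquisition.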

Data Analysis

Set of actions, included in the Data Analysis Lifecycle, performed on Research Data, typically Processed Data, to extract insights that support the answer to some scientific research question (i.e., Conclusions). Data Analysis may include: linear combination fitting, least-squares curve fitting, data modelling, pattern extraction, segmentation. The output of Data Analysis is Analysed Data.

Data Analysis Lifecycle

Set of processes carried out by one or more Research Users, performed on Research Data using one or more Techniques and/or Research Software in order to produce synthesised knowledge to, e.g., detect patterns, determine relationships, develop explanations, test hypotheses, prove theories, and to eventually suggest the Conclusions of a Study. The Data Analysis Lifecycle includes (but is not limited to): Data Processing, Data Analysis, Data Interpretation. These processes may be iterative and may be combined in chains or workflows.
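
A minimal sketch of such a chain, assuming one simple action per process: background subtraction for Data Processing, a least-squares line fit for Data Analysis, and a comparison against a reference value for Data Interpretation. The function names, the tolerance and the numbers are invented for illustration and do not come from the glossary.

```python
def data_processing(raw, background):
    """Data Processing: background subtraction (Raw Data -> Processed Data)."""
    return [y - background for y in raw]


def data_analysis(xs, processed):
    """Data Analysis: least-squares line fit (Processed Data -> Analysed Data)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(processed) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, processed))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {"slope": slope, "intercept": intercept}


def data_interpretation(analysed, reference_slope, tolerance=0.1):
    """Data Interpretation: compare with Reference Data (Analysed Data -> Conclusions)."""
    agrees = abs(analysed["slope"] - reference_slope) <= tolerance
    return "consistent with reference" if agrees else "deviates from reference"


# Made-up Raw Data following y = 2x + 1, with a constant background of 1
xs = [0.0, 1.0, 2.0, 3.0]
raw = [1.0, 3.0, 5.0, 7.0]

processed = data_processing(raw, background=1.0)
analysed = data_analysis(xs, processed)
conclusion = data_interpretation(analysed, reference_slope=2.0)
```

Each function consumes the output of the previous one, so the three calls form the simplest possible chain; an iterative workflow would loop back, for example by re-running the processing step with a different background estimate.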

Data Collaboration Platform

Operational information system which allows Research Users to keep their Research Data, Datasets and related documents (e.g., drafts of Scientific Publications) synchronised and up to date, and to exchange them with other Research Users, who are typically members of the same Project. The system is intended for the long-tail and still volatile data, which can change and are still subject to active research. Therefore, a Data Collaboration Platform offers versioning of all ingested files but does not usually assign Persistent Identifiers to them.

Data Interpretation

Set of actions, included in the Data Analysis Lifecycle, performed on Research Data, typically Analysed Data, to determine the Conclusions of the Study, possibly in comparison with Reference Data. Data Interpretation supports decision-making about the next processes of the Study or about future work.

Data Processing

Set of actions, included in the Data Analysis Lifecycle, performed on Research Data, typically Reference Data or Raw Data, to prepare it for one or more further processes, e.g., Model Preparation, Data Acquisition (in case of Calculations or Simulations), Data Analysis or Data Interpretation. Data Processing usually consists of routine actions. It may include: filtering, denoising, transformation, fusion or compression of Reference Data, as well as calibration, normalisation, statistical data reduction, background subtraction, correction of artefacts. The output of Data Processing is Processed Data.

Data Repository

Information system used to store, manage and provide access to digital resources, following a set of rules that define storage and access norms. A Data Repository is particularly suitable for Research Data (especially Datasets and/or Publication Data) which are not likely to be altered again. Many Data Repositories automatically assign globally unique Persistent Identifiers to deposited resources. Data Repositories may be associated with an Institution or a group of them, with an Instrument or a group of them, or with a Technique or a group of them, or may be run by a third party. Data Repositories may or may not be directly used by Research Users.

Dataset

Collection of scientifically related (depending on the research context) Research Data, along with their respective descriptive Metadata, typically stored in a Data Collaboration Platform and/or in a Data Repository. A Dataset may consist of other Datasets. The components of a Dataset remain individually identifiable.
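
The nesting described here can be sketched as a recursive structure. This is an illustrative model only: the class names, the `components` field and the identifiers are assumptions, not an MDMC or repository schema.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchData:
    identifier: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Dataset:
    identifier: str
    metadata: dict = field(default_factory=dict)
    # Each component is either a ResearchData item or another Dataset
    components: list = field(default_factory=list)

    def iter_research_data(self):
        """Yield every ResearchData item, descending into nested Datasets."""
        for item in self.components:
            if isinstance(item, Dataset):
                yield from item.iter_research_data()
            else:
                yield item


# Hypothetical identifiers, for illustration only
inner = Dataset("ds-inner", components=[ResearchData("raw-1"), ResearchData("raw-2")])
outer = Dataset("ds-outer", components=[inner, ResearchData("analysed-1")])

ids = [rd.identifier for rd in outer.iter_research_data()]
```

Because every component keeps its own identifier and Metadata, the items stay individually identifiable even when they are reached through an enclosing Dataset.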

Equipment

Any kind of physical or virtual item, device, machine or other tool used to perform one or more Fabrication(s), Sample Preparation(s), Model Preparation(s), Data Acquisition(s) and/or any of the processes included in the Data Analysis Lifecycle. Usually, the Equipment is located in a Laboratory hosted by an Institution and/or can be virtually or remotely accessed. Equipment is usually an investment. According to this definition, an Instrument is a particular type of Equipment.

Fabrication

Set of actions (physical changes or chemical reactions) carried out by a commercial enterprise, one or more Research Users or a third party, performed on one or more Inputs to produce one or more Precursors in controlled conditions described by Settings. Fabrication may require the use of Equipment, Consumable(s) and Instrument(s). A Data Acquisition may be performed during the Fabrication, e.g., to characterise the intermediate stages and/or the final resulting Precursor(s). The output of Fabrication is one or more Precursors.

Input

Physical System (typically a piece of material) which undergoes a Fabrication.

Institution

Hierarchical entity which hosts one or more Laboratories.

Instrument

Physical or virtual identifiable piece of Equipment used to perform a Data Acquisition and to generate Raw Data. The Instrument is located in a Laboratory hosted by an Institution and/or can be virtually or remotely accessed. A virtual Instrument may be any computational resource or HPC infrastructure (cloud infrastructure or supercomputer) needed to perform Calculations or Simulations.

Laboratory

Physical or virtual place hosted by an Institution, where one or more Instruments, as well as the Equipment, are located and/or can be virtually or remotely accessed, and the Data Acquisition may be performed.

Measurement

Experimental Data Acquisition, typically performed on a Sample using an experimental Technique. It may also be performed during Fabrication or Sample Preparation, e.g., to characterise the intermediate stages and/or the final resulting Precursor(s) or Sample(s), respectively. A Measurement may require the use of Consumables.

Metadata

Any descriptive data intended to help contextualise or otherwise qualify Research Data and/or Datasets and/or Publication Data and their management through time. Depending on the mode of use, Metadata contains information pertaining to any aspect of the Study, including (but not limited to) processes, outputs, and Research Users involved in the Project. Metadata may include descriptions of how files are named, structured and stored. Metadata may be registered in a Metadata Repository.

Metadata Repository

Information system used to store, manage and provide access to Metadata, following a policy or a set of rules that define storage and access norms. Metadata Repositories may be associated with an Institution or a group of them, or may be run by a third party. Metadata Repositories may or may not be directly used by Research Users.

Model

Digital representation of a System, primary output of any kind of Model Preparation, intended to be used in Calculation(s) or in Simulation(s) for its description or for predictions of its behaviour. A Model represents the System by direct similitude (e.g., small scale replica) or by capturing in a logical framework the relations between its properties (e.g., mathematical Model). A Model typically consists of Settings which may be stored in a file.

Model Preparation

Set of actions carried out by one or more Research Users, performed on Research Data (including collection and Data Processing of Reference Data) to define and/or formulate a Model. Model Preparation may require the use of Equipment and Instrument(s). The output of Model Preparation is a Model.

Persistent Identifier

Long-lasting reference to a digital resource which provides the information required to reliably identify, verify and locate Research Data (typically Datasets or Publication Data) or Scientific Publications.

Precursor

Physical System (typically a piece of material) which is formed or manufactured during the Fabrication and is used during the Sample Preparation to produce a Sample. It may include one or more substrates, layers, masks, evaporation materials, coatings and molecules. A single Precursor might itself become the only Sample Component of a Sample in case it undergoes a Measurement.

Processed Data

Specific type of Research Data, primary output of any kind of Data Processing performed on Research Data, typically Raw Data or Reference Data. Processed Data is usually an intermediate result, to be used as input of one or more further processes, e.g., Model Preparation, Data Acquisition (in case of Calculations or Simulations), Data Analysis or Data Interpretation.

Project

An enterprise (potentially individual but typically collaborative) of one or more Research Users, planned to perform one or more Studies.

Publication Data

Dataset(s) generated in the course of a Study that have undergone quality assessment and can be cited (i.e., a Persistent Identifier is assigned to them), e.g., to validate the results and/or the Conclusions presented in a Scientific Publication or appearing in it. Publication Data may include any kind of Research Data, as well as the relevant Metadata about the actions performed. Publication Data may be attributed to some or to all the Research Users who are members of the Project.

Raw Data

Specific type of Research Data, primary output of a Data Acquisition performed on a System, before any subsequent Data Processing.

Reference Data

Any Research Data not produced during the current Study, which is reused during the Study (e.g., during the Model Preparation) or is used as reference to compare and/or to validate the outputs of the Study, typically during the Data Analysis Lifecycle.

Research Data

Data collected, created or examined by one or more Research Users to be analysed or considered as a basis for reasoning, discussion or calculation in a research context, with the purpose of generating, verifying and validating original scientific claims that support the answer to some specific research question (i.e., Conclusions). Examples of Research Data include files containing the Settings of a Model, as well as any digital resource input or output of Data Acquisition, Data Processing or Data Analysis. According to this definition, Raw Data, Processed Data, Analysed Data and Reference Data are particular types of Research Data. Research Data is typically in the form of a data file, but it may potentially be a data stream or any other form of data which is relevant in a particular data management context. Research Data may be described by Metadata and may be stored in a Data Collaboration Platform and/or in a Data Repository. Research Data may be part of a Dataset.

Research Software

Any software used to process, analyse or visualise Research Data (including data rendering and/or plotting). Depending on the research context, Research Software can be used during Model Preparation, Data Processing, Data Analysis or Data Interpretation. Any software used during Fabrication, Sample Preparation or Data Acquisition is considered part of the Instrument and should be described as such.

Research User

Person, usually a member of a Project, who conducts any part of the Study in order to collect and/or analyse Research Data, or is interested in reusing Research Data produced by a third party (e.g., Reference Data), with the final aim to extract insights that support the answer to some specific research question (i.e., Conclusions). Research Users may be assigned a role (data curator, instrument scientist, team leader, team member).

Sample

Physical System (typically a piece of material) composed of one or more Sample Components, exposed to the Instrument during a Measurement, typically after a Sample Preparation. A Sample may be held by a Sample Holder and/or carried by a Sample Carrier during the Measurement.

Sample Component

Physical System (typically a piece of material) which constitutes a part of a Sample. It may include one or more substrates, layers, masks, embedding or filler or evaporation materials, coatings, conducting powders and molecules.

Sample Carrier

A piece of Equipment used for carrying one or more Samples and/or one or more Sample Holders, which is helpful, e.g., for referencing, handling or height adjustment. A Sample Carrier may be, e.g., a naked wafer, a glass slide or an individually designed metal frame.

Sample Holder

A piece of Equipment that makes one or more Samples accessible for a Measurement, or holds them in place in the predefined position to be mounted inside the Instrument (e.g., glass slide, TEM grid, tilting support). Sample Holder(s) may be carried by a Sample Carrier.

Sample Preparation

Set of actions (physical changes or chemical reactions) carried out by one or more Research Users, performed on (or between) one or more Precursor(s) or Sample(s) to produce one or more Samples and/or to make the Sample(s) fit to perform a Measurement in controlled conditions described by Settings. Sample Preparation may require the use of Equipment, Consumable(s) and Instrument(s). A Measurement may also be performed during the Sample Preparation, e.g., to characterise the intermediate stages and/or the final resulting Sample(s). The output of Sample Preparation is one or more Samples.

Settings

Set of configuration parameters which may be involved, for example, in a Data Acquisition (e.g., Settings of the Instrument), in any of the processes included in the Data Analysis Lifecycle (e.g., Settings of the Research Software), or to describe a Model (e.g., by specifying the type of solver used).

Scientific Publication

Any of the following contributions, peer-reviewed or not: article in a scientific journal (and related supporting information), monograph, book or book chapter, conference proceedings and “grey literature” (informally published material not having gone through a standard publishing process, e.g., reports and highlights). A Persistent Identifier may be assigned to them. Scientific Publications typically report the Conclusions of a Study and may be supplemented by Publication Data. Scientific Publications may be attributed to some or to all the Research Users who are members of the Project.

Simulation

Computational Data Acquisition performed on a Model to manipulate its Settings using a specific computational and/or theoretical Technique in order to study, predict or optimise the behaviour and performance of existing or proposed features and properties of a physical System that would otherwise be too complex, too large/small, too fast/slow, too dangerous, not accessible, or unacceptable to engage or control. Examples of Simulations are: multiscale simulation, finite element simulation, molecular dynamics simulation, discrete dislocation dynamics simulation.

System

Physical or digital entity or set of entities with distinctive properties (structural, chemical, dimensional, functional or others) which is the subject of one or more actions or investigations. According to this definition, Input, Precursor, Sample, Sample Component, and Model are particular types of System.
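
The "particular types of System" relationship can be read as a simple type hierarchy. The sketch below is one hypothetical way to write it down in Python; the `physical` attribute and the constructor of `Sample` are assumptions made for illustration, based on the definitions of Model (digital) and Sample (one or more Sample Components).

```python
class System:
    """Physical or digital entity that is the subject of actions or investigations."""
    physical = True  # assumption: subtypes are physical unless stated otherwise


class Input(System):
    """Physical System which undergoes a Fabrication."""


class Precursor(System):
    """Physical System formed during Fabrication and used in Sample Preparation."""


class SampleComponent(System):
    """Physical System which constitutes a part of a Sample."""


class Sample(System):
    """Physical System composed of one or more Sample Components."""

    def __init__(self, components):
        components = list(components)
        if not components:
            raise ValueError("a Sample needs at least one Sample Component")
        self.components = components


class Model(System):
    """Digital representation of a System."""
    physical = False


sample = Sample([SampleComponent(), SampleComponent()])
subtypes = [Input, Precursor, Sample, SampleComponent, Model]
all_are_systems = all(issubclass(t, System) for t in subtypes)
```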

Study

Set of all the processes and activities performed by one or more Research Users who are part of the same Project, with the purpose of verifying, falsifying or establishing the validity of a hypothesis and supporting the answer to some scientific research question (i.e., Conclusions). The output of a Study is usually reported in one or more Scientific Publications and may be supplemented by Publication Data.

Technique

Any experimental, theoretical or computational method used during Data Acquisition or during any of the processes included in the Data Analysis Lifecycle to acquire, process or analyse Research Data about a System or a set of them with an Instrument.