Processing in EMBRC
Context of processing in EMBRC / St Andrews
Questionnaire answers from EMBRC/St Andrews on Processing are available at: https://envriplus.manageprojects.com/projects/requirements/notebooks/470/pages/40
Summary of EMBRC / St Andrews requirements for Processing
- Data processing desiderata: input
a. What data are to be processed? What are their:
> Typologies: Varies
> Volume: Varies
> Velocity: Varies
> Variety: Varies
b. How is the data made available to the analytics phase? By file, by web (stream/protocol), etc.
c. Please provide concrete examples of data.
> It varies a lot. There are also data protection issues.
- Data processing desiderata: analytics
a. Computing needs quantification:
a.1 How many processes do you need to execute?
a.2 How much time does each process take/should take?
b. Process implementation:
b.1 What do you use in terms of:
> Programming languages: Varies
> Platform: Varies
> Specific software requirements: Varies
c. Is there a possibility to inject proprietary/user defined algorithms/processes for each of the above?
d. Do you use a sandbox to test and tune the algorithm/process for each of the above?
f. Do you use batch or interactive processing?
g. Do you use a monitoring console?
> It varies
h. Please provide concrete examples of processes to be supported/currently in use.
> It varies
- Data processing desiderata: output
a. What data are produced?
> Mainly results of analysis
b. How are analytics outcomes made available?
> By paper
- Statistical questions
a. Is the data collected with a distinct question/hypothesis in mind? Or is simply something being measured?
b. Will questions/hypotheses be generated or refined (broadened or narrowed in scope) after the data has been collected? (N.B. Such activity would not be good statistical practice)
> Hopefully not
- Statistical data
a. Does the question involve analysing the responses of a single set of data (univariate) to other predictor variables, or are there multiple response data (bi- or multivariate data)?
b. Is the data continuous or discrete?
c. Is the data bounded in some form (i.e. what is the possible range of the data)?
d. Approximately how many data points are there typically?
> Can be millions
- Statistical data analysis
a. Is it desired to work within a statistics or data mining paradigm?
> Mainly statistical
b. Is it desired that there is some sort of outlier/anomaly assessment?
c. Are you interested in a statistical approach which rejects null hypotheses (frequentist) or generates probable belief in a hypothesis (Bayesian approach), or do you have no real preference?
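To make the distinction in question c concrete, the following is a minimal sketch, not part of the EMBRC/St Andrews answers. It assumes a simple success/failure experiment with k successes in n trials. The function names, the null value of 0.5, and the uniform Beta(1, 1) prior are all illustrative assumptions. The frequentist function returns a p-value used to reject (or not) a null hypothesis, while the Bayesian function returns the posterior probability that the underlying rate exceeds a threshold.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def frequentist_p_value(k, n, p0=0.5):
    """Two-sided exact binomial test of H0: rate == p0 (hypothetical helper).

    Sums the probabilities, under H0, of every outcome that is no more
    likely than the observed one.
    """
    pmfs = [binom_pmf(i, n, p0) for i in range(n + 1)]
    observed = pmfs[k]
    # Small relative tolerance so that ties are not lost to rounding.
    return sum(q for q in pmfs if q <= observed * (1 + 1e-9))


def bayesian_prob_greater(k, n, threshold=0.5):
    """P(rate > threshold | data), assuming a uniform Beta(1, 1) prior.

    The posterior is Beta(a, b) with a = k + 1 and b = n - k + 1. For
    integer a and b, the Beta CDF at x equals P(Binomial(a + b - 1, x) >= a),
    so it can be computed without external libraries.
    """
    a, b = k + 1, n - k + 1
    m = a + b - 1
    cdf = sum(binom_pmf(j, m, threshold) for j in range(a, m + 1))
    return 1.0 - cdf


if __name__ == "__main__":
    k, n = 9, 10  # illustrative counts only, not EMBRC data
    print("p-value under H0 (rate = 0.5):", frequentist_p_value(k, n))
    print("posterior P(rate > 0.5):", bayesian_prob_greater(k, n))
```

The two numbers answer different questions: the first measures how surprising the data would be if the null hypothesis were true, the second expresses a degree of belief in a hypothesis given the data and the assumed prior.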
Formalities (who & when) 
Cristina Adriana Alexandru
Period of requirements collection