Methods of Data Analysis
      
  
                  A.Y. 2020/2021
      
      
  
Learning objectives
        
            
This course introduces students to the statistical analysis of experimental data and provides the basics for developing C++ applications in the ROOT environment through simple computer exercises. By solving problems, students learn to apply data analysis techniques and to evaluate the results quantitatively.
      
      
  
  Expected learning outcomes
        
            
At the end of the course, students will have learned the fundamentals of the statistical analysis of experimental data and will be able to extract quantitative information from a data sample, including with simple computer applications. In particular, they will have learned the basic concepts of probability, Monte Carlo simulation techniques, statistical tests, event classification, and parameter estimation methods. They will also be able to implement simple data analysis applications in the ROOT environment.
      
      
  
  Lesson period: First semester
Assessment methods: Exam
Assessment result: mark out of 30
Single course
This course cannot be attended as a single course. Please check our list of single courses to find the ones available for enrolment.
Course syllabus and organization
    
            Single session
Responsible
              
          Lesson period
              First semester
Lessons will be delivered remotely in virtual classrooms (Zoom platform) as synchronous sessions, allowing real-time interaction between students and the teacher. Oral exams will be held on request, by arrangement with the teacher, in virtual classrooms (Zoom platform).
      
Course syllabus
1. Introduction
The concept of statistical inference. Classification of types of inference. Statistical and systematic uncertainties. Random variables. Expected values. Transformations of variables. Error propagation.
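As an illustration of error propagation, the following stdlib-only Python sketch (not taken from the course, which uses C++ and ROOT) applies the first-order formula for uncorrelated inputs, σ_f² ≈ Σ_i (∂f/∂x_i)² σ_i², with each partial derivative estimated by a central finite difference. The function name `propagate_error` and the step size `h` are illustrative assumptions; correlated inputs would also need the covariance terms.

```python
import math

def propagate_error(f, values, sigmas, h=1e-6):
    """First-order error propagation for uncorrelated inputs:
    sigma_f^2 = sum_i (df/dx_i)^2 * sigma_i^2, with each partial
    derivative estimated by a central finite difference of step h."""
    var = 0.0
    for i, (x, s) in enumerate(zip(values, sigmas)):
        up, down = list(values), list(values)
        up[i], down[i] = x + h, x - h
        dfdx = (f(*up) - f(*down)) / (2.0 * h)
        var += (dfdx * s) ** 2
    return math.sqrt(var)

# Example: f = x * y with x = 2.0 +- 0.1 and y = 3.0 +- 0.2.
# Analytically sigma_f = sqrt((y*sx)^2 + (x*sy)^2) = sqrt(0.09 + 0.16) = 0.5.
sigma = propagate_error(lambda x, y: x * y, [2.0, 3.0], [0.1, 0.2])
print(f"sigma_f = {sigma:.4f}")
```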
2. Probability
Definition of probability. Kolmogorov axioms. Frequentist probability. Bayesian probability. Bayes' theorem. Probability density function (pdf). Conditional probability. Cumulative distribution function.
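Bayes' theorem for a discrete set of hypotheses can be sketched in a few lines of Python, with the denominator obtained from the law of total probability. The numbers below are hypothetical, chosen only to show how a small prior keeps the posterior low even for an accurate test; `posterior` is an illustrative helper, not a library function.

```python
def posterior(prior, likelihoods):
    """Bayes' theorem for discrete hypotheses H_i:
    P(H_i | data) = P(data | H_i) P(H_i) / sum_j P(data | H_j) P(H_j)."""
    evidence = sum(p * l for p, l in zip(prior, likelihoods))  # P(data)
    return [p * l / evidence for p, l in zip(prior, likelihoods)]

# Hypothetical example: prior probability of a rare condition is 0.1%,
# P(positive | condition) = 0.98, P(positive | no condition) = 0.03.
post = posterior([0.001, 0.999], [0.98, 0.03])
print(f"P(condition | positive) = {post[0]:.4f}")
```

Despite the 98% detection probability, the posterior stays at about 3%, because false positives from the much larger population without the condition dominate.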
3. Examples of probability functions
Binomial, multinomial, Poisson, uniform, exponential, Gaussian, chi-square, Cauchy probability functions. Law of large numbers. Central limit theorem. Examples and applications.
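The central limit theorem can be checked numerically with a short Python sketch: summing `n_terms` uniform random numbers and standardising the sum gives values whose mean and variance are 0 and 1 and whose distribution approaches a Gaussian as the number of terms grows. The choice of 12 terms, the sample size and the seed are arbitrary assumptions.

```python
import random
import statistics

def clt_samples(n_terms=12, n_samples=100_000, seed=1):
    """Standardised sums of n_terms uniform(0, 1) variables.

    Each uniform has mean 1/2 and variance 1/12, so (S - n/2) / sqrt(n/12)
    has mean 0 and variance 1; by the central limit theorem its
    distribution approaches a standard Gaussian as n grows."""
    rng = random.Random(seed)
    scale = (n_terms / 12) ** 0.5
    return [(sum(rng.random() for _ in range(n_terms)) - n_terms / 2) / scale
            for _ in range(n_samples)]

z = clt_samples()
inside = sum(abs(v) < 1.0 for v in z) / len(z)   # Gaussian value: about 0.683
print(f"mean = {statistics.fmean(z):.3f}, variance = {statistics.variance(z):.3f}, "
      f"fraction within 1 sigma = {inside:.3f}")
```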
4. Monte Carlo method
Pseudorandom number generators. Inverse transform method, rejection method (accept-reject). Examples of applications.
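A minimal Python sketch of the two generation methods named above, assuming an exponential pdf for the inverse transform and the example pdf f(x) = 3/8 (1 + x²) on [-1, 1] for accept-reject. It uses Python's `random.Random` generator rather than ROOT's generators, and all function names are illustrative.

```python
import math
import random

rng = random.Random(42)   # fixed seed so the example is reproducible

def exp_inverse_transform(tau):
    """Inverse transform: for u ~ U(0, 1), x = F^-1(u) = -tau * ln(1 - u)
    follows f(x) = (1/tau) exp(-x/tau), x >= 0."""
    u = rng.random()                  # u in [0, 1), so 1 - u is never 0
    return -tau * math.log(1.0 - u)

def accept_reject(f, x_min, x_max, f_max):
    """Accept-reject: draw x uniformly in [x_min, x_max] and y uniformly in
    [0, f_max]; keep x if y < f(x). Requires f(x) <= f_max on the interval."""
    while True:
        x = rng.uniform(x_min, x_max)
        y = rng.uniform(0.0, f_max)
        if y < f(x):
            return x

def pdf(x):
    """Example pdf on [-1, 1]; its maximum is 3/4 at x = +-1."""
    return 3.0 / 8.0 * (1.0 + x * x)

exp_sample = [exp_inverse_transform(2.0) for _ in range(100_000)]
ar_sample = [accept_reject(pdf, -1.0, 1.0, 0.75) for _ in range(100_000)]
print(sum(exp_sample) / len(exp_sample))   # close to tau = 2
```

The accept-reject efficiency equals the area under f divided by the area of the enclosing box, here 1/1.5 = 2/3, so roughly three candidates are drawn for every two accepted values.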
 
5. Point estimation of parameters
Samples of observations, estimators and their properties, bias. Estimators of the mean, variance and covariance.
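The estimators listed above can be written out directly. In this Python sketch (helper names are illustrative), the toy loop at the end shows the bias of the 1/n variance estimator: for samples of size n its expectation is (n - 1)/n times the true variance, which the n - 1 denominator corrects.

```python
import random

def sample_mean(x):
    """Arithmetic mean, an unbiased estimator of the population mean."""
    return sum(x) / len(x)

def sample_variance(x, unbiased=True):
    """Variance estimator. Dividing by n - 1 instead of n removes the bias
    introduced by using the sample mean in place of the true mean."""
    m = sample_mean(x)
    ss = sum((xi - m) ** 2 for xi in x)
    return ss / (len(x) - 1) if unbiased else ss / len(x)

def sample_covariance(x, y):
    """Unbiased covariance estimator for paired observations (x_i, y_i)."""
    mx, my = sample_mean(x), sample_mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

# Bias check: average the 1/n estimator over many samples of size n = 5
# drawn from a unit Gaussian; its expectation is (n - 1)/n = 0.8, not 1.
rng = random.Random(7)
n, trials = 5, 20_000
avg_biased = sum(
    sample_variance([rng.gauss(0.0, 1.0) for _ in range(n)], unbiased=False)
    for _ in range(trials)
) / trials
print(f"mean of 1/n variance estimator: {avg_biased:.3f}")
```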
6. Maximum likelihood method
Likelihood function. Maximum likelihood (ML) estimators. Examples for exponential and Gaussian functions. Variance of ML estimators. Cramér-Rao inequality. Minimum variance bound and the conditions under which it is reached. Extended maximum likelihood (EML). ML fits of histograms. Goodness of fit. Combination of measurements with ML. Bayesian estimators.
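For an exponential pdf f(t; τ) = (1/τ) e^(-t/τ), setting ∂ ln L/∂τ = 0 gives the ML estimator τ̂ = (1/n) Σ t_i, and evaluating -(∂² ln L/∂τ²)^(-1) at τ̂ gives the variance estimate σ̂² = τ̂²/n. The Python sketch below checks the closed form against a numerical maximisation of ln L; golden-section search is used here only as a simple stand-in for a general-purpose minimiser, and the toy sample is generated with an assumed true τ = 2.

```python
import math
import random

def exp_log_likelihood(tau, t):
    """ln L(tau) = sum_i ln[(1/tau) exp(-t_i/tau)] for an exponential pdf."""
    return sum(-math.log(tau) - ti / tau for ti in t)

def golden_section_max(f, a, b, tol=1e-6):
    """Maximise a unimodal function f on [a, b] by golden-section search."""
    g = (math.sqrt(5.0) - 1.0) / 2.0        # 1/phi, about 0.618
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):                     # maximum lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                               # maximum lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return 0.5 * (a + b)

rng = random.Random(3)
t = [rng.expovariate(1.0 / 2.0) for _ in range(1000)]   # true tau = 2

tau_hat = sum(t) / len(t)                  # analytic ML estimator: sample mean
sigma_tau = tau_hat / math.sqrt(len(t))    # from -1 / (d2 lnL/dtau2) at tau_hat
tau_num = golden_section_max(lambda tau: exp_log_likelihood(tau, t), 0.1, 10.0)
print(f"tau_hat = {tau_hat:.3f} +- {sigma_tau:.3f}, numerical = {tau_num:.3f}")
```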
7. Least squares method
Relationship with ML. Linear fit. Polynomial fit. Fits of histograms. Goodness of fit with chi-square. Combination of measurements with least squares.
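For a straight line y = a + bx fitted to points with independent Gaussian errors σ_i, minimising χ² = Σ_i (y_i - a - b x_i)²/σ_i² leads to two linear normal equations with a closed-form solution. The following Python sketch implements that solution, including the parameter covariance matrix and the χ² with n - 2 degrees of freedom; a polynomial fit generalises it to a larger linear system. The data in the example are hypothetical.

```python
def linear_least_squares(x, y, sigma):
    """Weighted least-squares fit of y = a + b*x with independent Gaussian
    errors sigma_i. Returns (a, b, var_a, var_b, cov_ab, chi2, ndf)."""
    w = [1.0 / s ** 2 for s in sigma]                 # weights 1/sigma_i^2
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx ** 2                         # determinant of normal equations
    a = (Sxx * Sy - Sx * Sxy) / delta                 # intercept
    b = (S * Sxy - Sx * Sy) / delta                   # slope
    # Covariance matrix = inverse of [[S, Sx], [Sx, Sxx]]
    var_a, var_b, cov_ab = Sxx / delta, S / delta, -Sx / delta
    chi2 = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    return a, b, var_a, var_b, cov_ab, chi2, len(x) - 2

# Hypothetical points lying exactly on y = 1 + 2x, all with sigma = 1.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
a, b, var_a, var_b, cov_ab, chi2, ndf = linear_least_squares(x, y, [1.0] * 4)
print(f"a = {a:.3f} +- {var_a ** 0.5:.3f}, b = {b:.3f} +- {var_b ** 0.5:.3f}, "
      f"chi2/ndf = {chi2:.2f}/{ndf}")
```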
8. Interval estimation
Confidence intervals. Frequentist method. Confidence intervals for a Gaussian-distributed estimator and for the mean of a Poisson distribution. Confidence intervals from the likelihood function or the chi-square. Bayesian method. Limits on the mean of a Poisson variable in the presence of background.
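The classical (Neyman) upper limit s_up on a Poisson signal mean s, with an expected background b assumed to be known exactly, solves Σ_{k=0..n} P(k; s_up + b) = 1 - CL for n observed events. A bisection-based Python sketch follows; it shows only this frequentist construction (not the Bayesian one), and the helper names are illustrative.

```python
import math

def poisson_cdf(n, mu):
    """P(N <= n) for a Poisson variable with mean mu."""
    total, term = 0.0, math.exp(-mu)
    for k in range(n + 1):
        total += term                 # add P(k; mu)
        term *= mu / (k + 1)          # P(k+1; mu) from P(k; mu)
    return total

def poisson_upper_limit(n_obs, background=0.0, cl=0.95, tol=1e-10):
    """Classical upper limit on the signal mean s for n_obs observed events
    and a known background mean b: solve P(N <= n_obs; s + b) = 1 - cl."""
    alpha = 1.0 - cl
    f = lambda s: poisson_cdf(n_obs, s + background) - alpha   # decreasing in s
    lo, hi = -background, 1.0          # f(lo) = 1 - alpha > 0
    while f(hi) > 0.0:                 # widen until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:               # bisection
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(f"n = 0, b = 0: s_up = {poisson_upper_limit(0):.3f}")    # -ln(0.05), about 3.0
print(f"n = 2, b = 1: s_up = {poisson_upper_limit(2, background=1.0):.3f}")
```

When n is small compared with b, this construction can return a very small or even negative limit, a known shortcoming of the classical approach.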
9. Hypothesis testing
Test statistics. Consistency and significance level of a test. Neyman-Pearson lemma. Goodness of fit. Definition of the p-value. Significance of an observed signal. Pearson chi-square test. Kolmogorov-Smirnov test. Wilks' theorem.
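For a counting experiment with expected background b and no signal, the p-value of observing n or more events is P(N ≥ n; b), and it is conventionally converted to a one-sided Gaussian significance Z = Φ^(-1)(1 - p). A small Python sketch, using `statistics.NormalDist` for Φ^(-1); the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def poisson_p_value(n_obs, background):
    """p = P(N >= n_obs; b) = 1 - sum_{k < n_obs} P(k; b) for Poisson mean b.
    Fine for moderate p; very small p would call for summing the upper tail."""
    total, term = 0.0, math.exp(-background)
    for k in range(n_obs):
        total += term                      # add P(k; b)
        term *= background / (k + 1)       # P(k+1; b) from P(k; b)
    return 1.0 - total

def significance(p):
    """One-sided Gaussian significance Z = Phi^-1(1 - p)."""
    return NormalDist().inv_cdf(1.0 - p)

# Hypothetical example: 10 events observed, background expectation 3.0.
p = poisson_p_value(10, 3.0)
print(f"p = {p:.2e}, Z = {significance(p):.2f}")
```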
10. Classification
Multivariate classification. ROC curve and the Neyman-Pearson lemma. Introduction to machine learning. Multivariate classification techniques. Fisher discriminant. Artificial neural networks. Decision trees.
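A two-variable Fisher discriminant and the resulting ROC points can be sketched in Python with the standard library only. The coefficients follow the common convention a ∝ W^(-1)(μ_s - μ_b) with W = V_s + V_b (their overall scale is arbitrary), and each ROC point gives the signal efficiency and background rejection for the cut t > c. The toy Gaussian samples, the cut values and the helper names are hypothetical choices for illustration; they are not the TMVA implementation used in the course.

```python
import random

def mean_vec(sample):
    """Component-wise mean of a list of 2D points."""
    n = len(sample)
    return [sum(p[0] for p in sample) / n, sum(p[1] for p in sample) / n]

def cov_matrix(sample):
    """2x2 sample covariance matrix (n - 1 denominator)."""
    m, n = mean_vec(sample), len(sample)
    c = [[0.0, 0.0], [0.0, 0.0]]
    for p in sample:
        for i in range(2):
            for j in range(2):
                c[i][j] += (p[i] - m[i]) * (p[j] - m[j]) / (n - 1)
    return c

def fisher_coefficients(sig, bkg):
    """Fisher coefficients a = W^-1 (mu_s - mu_b), with W = V_s + V_b."""
    ms, mb = mean_vec(sig), mean_vec(bkg)
    vs, vb = cov_matrix(sig), cov_matrix(bkg)
    w = [[vs[i][j] + vb[i][j] for j in range(2)] for i in range(2)]
    det = w[0][0] * w[1][1] - w[0][1] * w[1][0]
    w_inv = [[w[1][1] / det, -w[0][1] / det], [-w[1][0] / det, w[0][0] / det]]
    d = [ms[0] - mb[0], ms[1] - mb[1]]
    return [w_inv[i][0] * d[0] + w_inv[i][1] * d[1] for i in range(2)]

def roc_points(t_sig, t_bkg, cuts):
    """(signal efficiency, background rejection) for the selection t > cut."""
    return [(sum(t > c for t in t_sig) / len(t_sig),
             1.0 - sum(t > c for t in t_bkg) / len(t_bkg)) for c in cuts]

# Hypothetical toy data: unit-variance Gaussians centred at (1, 0.5) and (0, 0).
rng = random.Random(11)
sig = [(rng.gauss(1.0, 1.0), rng.gauss(0.5, 1.0)) for _ in range(20_000)]
bkg = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(20_000)]

a = fisher_coefficients(sig, bkg)                 # expected near (0.5, 0.25)
t_sig = [a[0] * x + a[1] * y for x, y in sig]
t_bkg = [a[0] * x + a[1] * y for x, y in bkg]
roc = roc_points(t_sig, t_bkg, [i * 0.1 - 1.0 for i in range(21)])
print(f"a = ({a[0]:.3f}, {a[1]:.3f})")
```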
11. Computer exercises
Generation of pseudorandom numbers. Monte Carlo methods for generating pseudo-experiments. Multivariate analysis with ROOT's TMVA package. Maximum likelihood fits. Chi-square fits. Hypothesis testing and p-value calculation.
Prerequisites for admission
1) Mathematical knowledge: derivatives, integrals, operations with matrices.
2) Basic knowledge of the C++ programming language (or Python).
Teaching methods
Teaching consists of lectures and exercise sessions in which the concepts covered are verified and applied. The exercises include the discussion of problems to be solved on the computer, together with the techniques and programs used to solve them.
Teaching Resources
1) Glen Cowan, "Statistical Data Analysis", Oxford Science Publications.
2) O. Behnke et al., "Data Analysis in High Energy Physics", Wiley-VCH.
Assessment methods and Criteria
The exam consists of an oral test with theory questions and exercises on the topics covered in class. Students are also required to submit a written report presenting their solutions to the computer exercises proposed during the course. Each student must prepare the report individually.
Professor(s)