corrcoef    Correlation coefficient: normalized measure of the strength of the
            linear relationship between variables.
Covariance
cov returns the variance for a vector of data. The variance of the data in the
first column of count is
cov(count(:,1))
ans =
643.6522
For an array of data, cov calculates the covariance matrix. The variance values
for the array columns are arranged along the diagonal of the covariance matrix.
The remaining entries reflect the covariance between the columns of the
original array. For an m-by-n matrix, the covariance matrix has size n-by-n.
For example, the covariance matrix for count, cov(count), is arranged as
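\[
\begin{bmatrix}
\sigma^2_{11} & \sigma^2_{12} & \sigma^2_{13} \\
\sigma^2_{21} & \sigma^2_{22} & \sigma^2_{23} \\
\sigma^2_{31} & \sigma^2_{32} & \sigma^2_{33}
\end{bmatrix},
\qquad \sigma^2_{ij} = \sigma^2_{ji}
\]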
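The layout described above can be checked on a small array. This is only a sketch: the matrix X below is made-up data, not the traffic count array, and the names X and C are introduced for illustration.

```matlab
% Hypothetical 4-by-3 data set: rows are observations, columns are variables.
X = [2 4 1; 3 7 5; 5 8 2; 6 13 8];

C = cov(X);                         % 3-by-3 covariance matrix
sizeC = size(C)

% The diagonal of C holds the variance of each column.
diagErr = max(abs(diag(C)' - var(X)))

% The matrix is symmetric: C(i,j) equals C(j,i).
symErr = max(max(abs(C - C')))

% For a single column, cov returns the same value as var.
vecErr = abs(cov(X(:,1)) - var(X(:,1)))
```

Both error measures should be zero or at round-off level.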
13 Data Analysis and Statistics
Correlation Coefficients
corrcoef produces a matrix of correlation coefficients for an array of data
where each row is an observation and each column is a variable. The
correlation coefficient is a normalized measure of the strength of the linear
relationship between two variables. Uncorrelated data results in a correlation
coefficient of 0; equivalent data sets have a correlation coefficient of 1.
For an m-by-n matrix, the correlation coefficient matrix has size n-by-n. The
arrangement of the elements in the correlation coefficient matrix corresponds
to the location of the elements in the covariance matrix described above.
For our traffic count example
corrcoef(count)
results in
ans =
1.0000 0.9331 0.9599
0.9331 1.0000 0.9553
0.9599 0.9553 1.0000
Clearly there is a strong linear correlation between the three traffic counts
observed at the three locations, as the results are close to 1.
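To make the word "normalized" concrete, the following sketch rebuilds the correlation coefficients from the covariance matrix by dividing each covariance by the product of the two corresponding standard deviations. The data in X are hypothetical, and the names C, s, and R are chosen only for this illustration.

```matlab
% Hypothetical 4-by-3 data set: rows are observations, columns are variables.
X = [2 4 1; 3 7 5; 5 8 2; 6 13 8];

C = cov(X);            % covariance matrix
s = sqrt(diag(C));     % column standard deviations (3-by-1)
R = C ./ (s * s');     % R(i,j) = C(i,j) / (s(i)*s(j))

% Difference from the built-in result; expected to be at round-off level.
err = max(max(abs(R - corrcoef(X))))
```

Each diagonal entry of R is 1, because every variable is perfectly correlated with itself.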
Finite Differences
MATLAB provides three functions for finite difference calculations.
Function    Description
diff        Difference between successive elements of a vector.
            Numerical partial derivatives of a vector.
gradient    Numerical partial derivatives of a matrix.
del2        Discrete Laplacian of a matrix.
The diff function computes the difference between successive elements in a
numeric vector. That is, diff(X) is [X(2)-X(1) X(3)-X(2)...
X(n)-X(n-1)]. So, for a vector A,
A = [9 -2 3 0 1 5 4];
Basic Data Analysis Functions
diff(A)
ans =
-11 5 -3 1 4 -1
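For comparison, here is a short sketch that applies gradient to the same vector A. With its default unit spacing, gradient returns a result the same length as its input, using one-sided differences at the two ends and central differences in between. The variable names d and g are introduced only for this example.

```matlab
A = [9 -2 3 0 1 5 4];

d = diff(A)        % 6 elements: successive differences
g = gradient(A)    % 7 elements: -11 -3 1 -1 2.5 1.5 -1
```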
Besides computing the first difference, diff is useful for determining certain
characteristics of vectors. For example, you can use diff to determine if a
vector is monotonic (elements are always either increasing or decreasing), or if
a vector has equally spaced elements. This table describes a few different ways
to use diff with a vector x.
Test                     Description
diff(x)==0               Tests for repeated (adjacent) elements.
all(diff(x)>0)           Tests for monotonicity (strictly increasing elements).
all(diff(diff(x))==0)    Tests for equally spaced vector elements.
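The tests above can be tried on two small vectors. Both vectors are hypothetical and integer-valued, so the exact comparisons with 0 are safe here. For data with non-integer spacing, round-off can make the second difference slightly nonzero, and comparing against a small tolerance may be more appropriate.

```matlab
% Hypothetical test vectors, chosen only for illustration.
x = [1 3 5 7 9];      % strictly increasing, equally spaced
y = [4 4 2 7];        % adjacent repeat, not monotonic, unevenly spaced

xHasRepeat  = any(diff(x)==0)          % 0
xIncreasing = all(diff(x)>0)           % 1
xEvenSpaced = all(diff(diff(x))==0)    % 1

yHasRepeat  = any(diff(y)==0)          % 1
yIncreasing = all(diff(y)>0)           % 0
yEvenSpaced = all(diff(diff(y))==0)    % 0
```

Note that diff(x)==0 only detects repeats that sit next to each other. Sorting the vector first is one way to catch repeats anywhere in it.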
Data Preprocessing
This section tells you how to work with
• Missing values
• Outliers and misplaced data points
Missing Values
The special value, NaN, stands for Not-a-Number in MATLAB. IEEE
floating-point arithmetic convention specifies NaN as the result of undefined
expressions such as 0/0.
The correct handling of missing data is a difficult problem and often varies in
different situations. For data analysis purposes, it is often convenient to use
NaNs to represent missing values or data that are not available.
MATLAB treats NaNs in a uniform and rigorous way. They propagate naturally
through to the final result in any calculation. Any mathematical calculation
involving NaNs produces NaNs in the results.
For example, consider a matrix containing the 3-by-3 magic square with its
center element set to NaN.
a = magic(3); a(2,2) = NaN

a =
8 1 6
3 NaN 7
4 9 2
Compute a sum for each column in the matrix.
sum(a)

ans =
15 NaN 15
Any mathematical calculation involving NaNs propagates NaNs through to the
final result as appropriate.
You should remove NaNs from the data before performing statistical
computations. Here are some ways to use isnan to remove NaNs from data.
Code                         Description
i = find(~isnan(x));         Find indices of elements in vector that are not
x = x(i)                     NaNs, then keep only the non-NaN elements.
x = x(find(~isnan(x)))       Remove NaNs from vector.
x = x(~isnan(x));            Remove NaNs from vector (faster).
x(isnan(x)) = [];            Remove NaNs from vector.
X(any(isnan(X)'),:) = [];    Remove any rows of matrix X containing NaNs.
Note: You must use the special function isnan to find NaNs because, by IEEE
arithmetic convention, the logical comparison NaN == NaN always produces 0.
You cannot use x(x==NaN) = [] to remove NaNs from your data.
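A short sketch of the difference between the two tests, using a made-up vector x:

```matlab
% Hypothetical vector with two missing values.
x = [1 NaN 3 NaN 5];

eqTest  = (x == NaN)     % 0 0 0 0 0: NaN never compares equal, even to itself
nanTest = isnan(x)       % 0 1 0 1 0
y = x(~isnan(x))         % 1 3 5
```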
If you frequently need to remove NaNs, write a short M-file function.
function X = excise(X)
X(any(isnan(X)'),:) = [];
Now, typing
X = excise(X);
accomplishes the same thing.
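As an illustration of row removal, the sketch below uses a hypothetical 4-by-3 matrix. It is a variant of the statement in excise, not the same code: it passes the dimension argument to any instead of transposing. That form also behaves as intended when X has only one column, a case in which any(isnan(X)') collapses to a single scalar.

```matlab
% Hypothetical matrix with NaNs in rows 2 and 4.
X = [1 2 3; 4 NaN 6; 7 8 9; NaN 11 12];

rowsWithNaN = any(isnan(X), 2);   % 4-by-1 logical: true where a row has a NaN
X(rowsWithNaN, :) = []            % leaves the rows [1 2 3] and [7 8 9]
```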
Removing Outliers
You can remove outliers or misplaced data points from a data set in much the
same manner as NaNs. For the vehicle traffic count data, the mean and
standard deviations of each column of the data are
mu = mean(count)
sigma = std(count)
mu =
32.0000 46.5417 65.5833
sigma =
25.3703 41.4057 68.0281
The number of rows with outliers greater than three standard deviations is
obtained with
[n,p] = size(count)
outliers = abs(count - mu(ones(n, 1),:)) > 3*sigma(ones(n, 1),:);
nout = sum(outliers)
nout =
1 0 0
There is one outlier in the first column. Remove this entire observation with
count(any(outliers'),:) = [];
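The same steps can be sketched on a self-contained data set. The 20-by-2 array data below is made up for this illustration, with one extreme value placed in the first column. The sketch uses repmat to expand mu and sigma to the size of the data, which is equivalent to the mu(ones(n,1),:) indexing above.

```matlab
% Hypothetical data: column 1 has one extreme value (100) in the last row.
col1 = [10 11 9 10 12 10 9 11 10 10 11 9 10 12 10 9 11 10 10 100]';
col2 = (1:20)';
data = [col1 col2];

mu = mean(data);
sigma = std(data);
n = size(data, 1);

% Flag entries more than three standard deviations from their column mean.
outliers = abs(data - repmat(mu, n, 1)) > 3*repmat(sigma, n, 1);
nout = sum(outliers)                  % 1 0

% Remove every observation (row) that contains a flagged entry.
data(any(outliers, 2), :) = [];
remaining = size(data, 1)             % 19
```

With very few observations no point can lie more than three standard deviations from the mean, so the threshold only makes sense for reasonably long data sets.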
Regression and Curve Fitting
It is often useful to find functions that describe the relationship between some
variables you have observed. Identification of the coefficients of the function
often leads to the formulation of an overdetermined system of simultaneous
linear equations. You can find these coefficients efficiently by using the
MATLAB backslash operator.
Suppose you measure a quantity y at several values of time t.
t = [0 .3 .8 1.1 1.6 2.3]';
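A minimal sketch of the backslash approach, assuming a straight-line model y = c(1) + c(2)*t. The y values here are synthetic, generated from a known line rather than measured, so they are not the data of the original example. The names X and c are also introduced only for this illustration.

```matlab
t = [0 .3 .8 1.1 1.6 2.3]';

% Synthetic observations on the line y = 2 + 3*t (an assumption for this sketch).
y = 2 + 3*t;

% Six equations, two unknowns: an overdetermined linear system X*c = y.
X = [ones(size(t)) t];

% Backslash returns the least-squares solution.
c = X\y
```

Because the synthetic data lie exactly on a line, c should come out as approximately [2; 3]. With noisy measurements, c would be the least-squares estimate.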
