Statistics can be regarded as experimental or observational mathematics. It uses collected data to estimate the value of a parameter such as a population mean or an average temperature. Such estimates are not exact, but they give a rough idea in the absence of more elaborate measurements; this is why statistics is so often associated with chance and probability. Many methods have been devised over the years to test the statistical significance of a given set of data. Unfortunately, data are often manipulated to suit a purpose, hence the saying: "There are three kinds of lies: lies, damned lies, and statistics".
Figure 01 House Survey |
The following example illustrates the most elementary statistics: the number of occupants k in each of the N = 20 houses on a street (Figure 01). Here f(k) denotes the frequency of occurrence, so f(k)/N is the probability of finding k occupants in a house chosen at random from the street. The set of values f(k)/N is often referred to as the probability distribution (note that its total is normalized to 1). Table 01 below contains the data collected in the survey.
k | f(k) | f(k)/N | k f(k) | Average age |
---|---|---|---|---|
1 | 3 | 0.15 | 3 | 56 |
2 | 8 | 0.4 | 16 | 45 |
3 | 5 | 0.25 | 15 | 30 |
4 | 2 | 0.1 | 8 | 20 |
5 | 1 | 0.05 | 5 | 25 |
6 | 1 | 0.05 | 6 | 18 |
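
The columns of Table 01 can be reproduced with a few lines of Python. This is only a minimal sketch; the variable names are illustrative and the mean occupancy is simply the sum of the k f(k) column divided by N.

```python
# Frequencies f(k) from Table 01: k occupants -> number of houses
freq = {1: 3, 2: 8, 3: 5, 4: 2, 5: 1, 6: 1}

n_houses = sum(freq.values())                        # N = 20
prob = {k: f / n_houses for k, f in freq.items()}    # column f(k)/N
k_times_f = {k: k * f for k, f in freq.items()}      # column k f(k)

mean_k = sum(k_times_f.values()) / n_houses          # average occupants per house
most_likely_k = max(prob, key=prob.get)              # k with the largest probability

print("N =", n_houses)
print("total probability =", round(sum(prob.values()), 10))
print("mean occupants per house =", mean_k)
print("most probable k =", most_likely_k)
```

The total probability comes out as 1 because every house falls into exactly one of the k categories.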
Figure 02 Statistical Graph |
From these statistical evaluations we obtain only a rough idea about the occupants on this street. Many details such as gender, occupation and age are missing. To obtain a little more information, the survey could include the average age of the occupants in each category, as shown in the 5th column of Table 01.
k | k Rank | Age | Age Rank | d | d² |
---|---|---|---|---|---|
1 | 1 | 56 | 6 | -5 | 25 |
2 | 2 | 45 | 5 | -3 | 9 |
3 | 3 | 30 | 4 | -1 | 1 |
4 | 4 | 20 | 2 | 2 | 4 |
5 | 5 | 25 | 3 | 2 | 4 |
6 | 6 | 18 | 1 | 5 | 25 |
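
The d and d² columns are the ingredients of a rank correlation. The sketch below assumes the standard Spearman formula, rho = 1 - 6 Σd² / (n(n² - 1)), which is valid when there are no tied ranks; the formula itself does not survive in the text above, so treat it as an assumption.

```python
# Ranks taken from the table: rank of k and rank of the average age
k_rank = [1, 2, 3, 4, 5, 6]
age_rank = [6, 5, 4, 2, 3, 1]

d = [a - b for a, b in zip(k_rank, age_rank)]   # rank differences (column d)
sum_d2 = sum(x * x for x in d)                  # sum of column d^2
n = len(d)

# Spearman rank correlation coefficient (no ties assumed)
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print("d =", d)
print("sum of d^2 =", sum_d2)
print("Spearman rho =", round(rho, 4))
```

A value close to -1 would indicate that houses with more occupants tend to have a lower average age.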
The probability of a particular winning lineup (e.g., bronze to B, silver to A, gold to D) is just the reciprocal of the number of arrangements, i.e., 1/P(5,3) = 1/60. This choice conveys a message, or information, about the selection of that year's winning contestants. The information can be quantified as I = -log2(p), where p is the probability of the chosen lineup; here I = log2(60) ≈ 5.9 bits.
Figure 03b Permutation Example |
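
The counting in this example can be checked numerically. The sketch assumes five contestants and three distinct medals, as implied by P(5,3), and measures the information in bits as minus the base-2 logarithm of the probability of the lineup.

```python
import math

n_contestants = 5   # implied by P(5,3)
n_medals = 3        # gold, silver, bronze

# Number of ordered arrangements: 5 * 4 * 3
arrangements = math.perm(n_contestants, n_medals)

p_lineup = 1 / arrangements          # probability of one particular lineup
info_bits = -math.log2(p_lineup)     # information carried by the choice

print("arrangements =", arrangements)
print("probability =", p_lineup)
print("information (bits) =", round(info_bits, 3))
```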
As shown in Figure 05, the distribution is symmetrical about the mid-point because the two outcomes are always complementary to each other. When the Binomial coefficients are normalized by dividing by the sum of all the ![]() ![]()
Figure 04 Bernoulli Distribution |
Figure 05 Binomial Distribution
![]() For the case of n = 6, p = 0.5: ![]() ![]() |
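
For the case n = 6, p = 0.5 the distribution can be tabulated directly. This is a minimal sketch using the standard binomial formula C(n,k) p^k (1-p)^(n-k); the function name is illustrative.

```python
from math import comb


def binomial_pmf(n, p):
    """Return the list of probabilities of k successes, k = 0..n."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]


n, p = 6, 0.5
coeffs = [comb(n, k) for k in range(n + 1)]   # Binomial coefficients
pmf = binomial_pmf(n, p)

for k, (c, prob) in enumerate(zip(coeffs, pmf)):
    print(k, c, prob)

print("sum of coefficients =", sum(coeffs))   # equals 2**n
print("total probability =", sum(pmf))
```

With p = 0.5 each probability is just the coefficient divided by 2^n, which is why the list reads the same forwards and backwards.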
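The Poisson distribution is used for estimating the probability of randomly occurring events from their known average. The formula is: $P(k) = \lambda^k e^{-\lambda} / k!$, where $\lambda$ is the mean number of occurrences and $k$ is the number of occurrences observed.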
Figure 06 Poisson Distribution
Figure 2 (by scaling with a multiplication factor of 22.22). Figure 06 shows that the Poisson Distribution approaches the Normal Distribution as ![]() ![]() |
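
The approach to the Normal Distribution can be illustrated numerically. The sketch below writes the Poisson mean as `lam` (the symbol is an assumption, since the original formula image is missing) and compares the Poisson probabilities with a normal curve of the same mean and variance for a few illustrative values of the mean.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of k occurrences when the mean is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def normal_pdf(x, mean, var):
    """Normal density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


gaps = {}
for lam in (1, 5, 20):   # illustrative means, not taken from Figure 06
    # Largest absolute difference between the two curves over k = 0 .. 4*lam
    gaps[lam] = max(
        abs(poisson_pmf(k, lam) - normal_pdf(k, lam, lam))
        for k in range(4 * lam + 1)
    )
    print("mean =", lam, " largest gap =", round(gaps[lam], 4))
```

The largest gap shrinks as the mean grows, which is the sense in which the two distributions approach each other.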
Quantum physics relies extensively on probability distributions to describe microscopic objects. A very simple example is a particle inside a box, which is represented by an infinite square well. The probability distribution has the form P(x) = sin²(nx), where n is an integer specifying the state of the system; in this case n is associated with the energy level. A few of these probability distributions are shown in Figure 07. Another exactly solvable case of the Schrödinger equation is the hydrogen atom. Its probability distribution is related to the electron (probability) density around the nucleus, the proton. A few of the states are displayed in Figure 08 for the radial distribution. These states are also
Figure 07 Infinite Square Well Probability
Figure 08 H Atom Distribution
related to the energy level, but there are more complicated configurations involving orbital angular momentum, spin, ... Some of these states are
![]() |
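
The infinite square well distribution P(x) = sin²(nx) can be checked numerically. The sketch assumes the box extends from x = 0 to x = π (so that sin(nx) vanishes at both walls) and includes the normalization factor 2/π; both choices are assumptions made for illustration.

```python
import math


def prob_density(x, n):
    """Normalized probability density (2/pi) sin^2(n x) for a box on [0, pi]."""
    return (2 / math.pi) * math.sin(n * x) ** 2


def integrate(f, a, b, steps=10_000):
    """Simple trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


for n in (1, 2, 3):
    whole = integrate(lambda x: prob_density(x, n), 0.0, math.pi)
    left_half = integrate(lambda x: prob_density(x, n), 0.0, math.pi / 2)
    print("n =", n, " total =", round(whole, 6), " left half =", round(left_half, 6))
```

Each state integrates to 1 over the box, and by symmetry the particle is equally likely to be found in either half.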
A confidence interval (CI) is an interval, computed from sets of measurements, that is likely to include the unknown parameter, e.g., the mean of the population sample ![]() ![]() A coffee vending machine customer decides to check the manufacturer's specification, which claims to dispense 250 g (the ![]() ![]()
Figure 09 Confidence Interval
whole population) and the customer's sample are considered to follow a normal distribution. In this case the sample data consist of 50 sets of 25 cups each, as shown in the top panel of Figure 09.
Normal Probability Table |
This example shows that the estimate of ![]() |
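
The vending machine check can be sketched as a small simulation. Only the 250 g claim and the 50 sets of 25 cups come from the text; the standard deviation of 2.5 g, the 95% confidence level and the random seed are hypothetical values introduced for illustration.

```python
import random
from statistics import NormalDist, mean

MU_CLAIMED = 250.0   # grams, manufacturer's specification
SIGMA = 2.5          # grams, hypothetical population standard deviation
N_CUPS = 25          # cups per set
N_SETS = 50          # number of sets
CONF_LEVEL = 0.95    # hypothetical confidence level

# Two-sided critical value of the standard normal distribution
z = NormalDist().inv_cdf(0.5 + CONF_LEVEL / 2)
half_width = z * SIGMA / N_CUPS ** 0.5

rng = random.Random(0)   # fixed seed so the run is repeatable
covered = 0
for _ in range(N_SETS):
    cups = [rng.gauss(MU_CLAIMED, SIGMA) for _ in range(N_CUPS)]
    x_bar = mean(cups)
    # Does this set's interval contain the claimed mean?
    if x_bar - half_width <= MU_CLAIMED <= x_bar + half_width:
        covered += 1

print("critical value z =", round(z, 3))
print("interval half-width (g) =", round(half_width, 3))
print("intervals containing 250 g:", covered, "of", N_SETS)
```

At a 95% level one would expect most, but usually not all, of the 50 intervals to contain the claimed mean.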
The confidence level is used by scientists at the LHC in the search for the Higgs particle. They first calculate, from the Standard Model, all the possible events excluding those related to the Higgs. This is the ![]() ![]() ![]() ![]()
Figure 10 CL in Higgs Discovery
mean (dark dotted curve) in Figure 10. The occurrence of rare events outside the expected boundary indicates an additional contribution by Higgs particles, and the corresponding energy (~125 GeV) is the mass of the Higgs particle. The ultimate goal is to obtain a signal at 99% CL.
The Chi Square Test is a tool to check whether a hypothesis about a set of measurements is true or not. It was originally intended to judge whether statistical data are significant. It does not even provide a degree of correlation, as Spearman's test does. Nevertheless, it has become a gold standard for evaluating statistical data over the years since its invention in the 1920s. A 2014 article in Nature reminds everyone to use it carefully. A table of ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]()
Figure 11a Chi Square Distribution |
assertion) is right. Thus a small number for ![]() ![]() ![]() ![]() ![]() ![]() ![]()
By definition x![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
Figure 11b Online Chi2 Calculation |
For the novice, it is important to distinguish between ![]() ![]() ![]() ![]()
Figure 12 Contingency Table |
There is always one degree of freedom less because of the restriction imposed by the sum. |
![]() |
The expected frequencies are computed by the formula $E_{r,c} = (n_r \times n_c)/n$, where $n_r$ and $n_c$ are the row and column totals and $n$ is the grand total of the survey. The $E_{r,c}$'s have the effect of blurring the distinction between the political parties and genders and so represent the
Figure 13 Frequency Table |
null hypothesis. The $O_{r,c}$'s are the raw data from the survey, shown within the red line boundary in Figure 12. These data are collectively shown in the frequency table in Figure 13.
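
The computation of the expected frequencies can be sketched as follows. The survey counts of Figure 12 are not reproduced in the text, so the 2 x 3 table below is entirely hypothetical; the degrees of freedom are taken as (rows - 1) x (columns - 1), the usual rule for a contingency table.

```python
# Hypothetical observed counts O[r][c]: 2 rows (e.g. gender) x 3 columns (e.g. party)
observed = [
    [200, 150, 50],
    [250, 300, 50],
]

row_totals = [sum(row) for row in observed]           # n_r
col_totals = [sum(col) for col in zip(*observed)]     # n_c
n = sum(row_totals)                                   # grand total

# Expected counts under the null hypothesis: E[r][c] = n_r * n_c / n
expected = [[nr * nc / n for nc in col_totals] for nr in row_totals]

# Chi-square statistic summed over all cells
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (len(row_totals) - 1) * (len(col_totals) - 1)

print("expected =", expected)
print("chi-square =", round(chi2, 4), " degrees of freedom =", dof)
```

The expected table has the same row and column totals as the observed one, but every row has the same proportions, which is what "no association" means.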
Figure 14 Three Dice Gambling
The formula is: $\chi^2 = \sum_{i=1}^{n} (O_i - E_i)^2 / E_i$, where $O_i$ is the observed frequency, $E_i$ the expected frequency as asserted by the null hypothesis, and $n$ the number of cells to be evaluated. For perfect match, $O_i = E_i$, ![]() ![]()
# of Sixes | Observed Counts $O_i$ | Expected Counts $E_i$ |
---|---|---|
0 | 64 | 1(5/6)(5/6)(5/6)x100 = 58 |
1 | 30 | 3(5/6)(5/6)(1/6)x100 = 34.5 |
2 | 5 | 3(5/6)(1/6)(1/6)x100 = 7 |
3 | 1 | 1(1/6)(1/6)(1/6)x100 = 0.5 |
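
The chi-square value for the three dice table can be computed directly from the standard sum of (O - E)²/E over the cells. The sketch evaluates it twice: once with the rounded expected counts printed in the table and once with the unrounded binomial values, assuming 100 throws as implied by the factor of 100.

```python
from math import comb

observed = [64, 30, 5, 1]              # counts for 0, 1, 2, 3 sixes
table_expected = [58, 34.5, 7, 0.5]    # rounded values as printed in the table
n_throws = sum(observed)               # 100 throws of three dice

# Unrounded expectation: binomial with 3 dice and probability 1/6 of a six
exact_expected = [
    comb(3, k) * (1 / 6) ** k * (5 / 6) ** (3 - k) * n_throws for k in range(4)
]


def chi_square(obs, exp):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


print("exact expected counts:", [round(e, 2) for e in exact_expected])
print("chi-square (table values):", round(chi_square(observed, table_expected), 3))
print("chi-square (exact values):", round(chi_square(observed, exact_expected), 3))
```

The two results differ slightly because the last cell has a very small expected count, so rounding it changes its contribution noticeably.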
According to the article in Nature (as mentioned earlier), the problem of relying on the p-value lies in the underlying hypothesis, which challenges the null hypothesis. As shown in Figure 15, the more outlandish the hypothesis, the greater the chance that it fails even at the 0.01 level. In other words, the significance of a p-value varies with the nature of the hypothesis; 0.05 is not the absolute standard that most researchers assume. A subjective element about the likelihood of the hypothesis is thus introduced into the process. On top of this fallacy, there are cases when data
Figure 15 Statistical Errors
are massaged to yield a low p-value. It is suggested that claims with a p-value around 0.05 often involve fishing for significance; failure to replicate results is another sign; p-hacking (manipulation of data) can also be detected by looking at the method of analysis, which is often banished to the discussion section at the end of the paper.
Boltzmann in 1868. The atomic composition of gases was not widely accepted by physicists at that time, and nothing was known about atomic structure; thus, the micro-state was invented to accommodate the particles with population $N_i$ in an energy state $E_i$, which contains a number of cells $g_i$ (corresponding to different orientations, for example). These micro-ensembles are supposed to link with macroscopic variables such as the total number of particles N = ![]() ![]()
Figure 16 Maxwell-Boltzmann Distribution
In thermodynamics, the infinitesimal change in energy is related to the other macroscopic variables by the identity: $dE = T\,dS - p\,dV + \mu\,dN$, where T is the temperature, p the pressure, and ![]() ![]() ![]() ![]()
Figure 17 Chemical Potential
In deriving the Bose-Einstein distribution, it is assumed that the constituent gas particles are indistinguishable and non-interacting at thermodynamic equilibrium. There is no restriction on the number of particles in any quantum state. Such particles are called bosons, and all of them have integer spin. Examples are the photon gas in the cavity of blackbody radiation, the coupled electron pair (Cooper pair) in superconductivity, and Helium-4 in superfluidity.
Figure 19 BE-MB Distribution |
Figure 20 BE-FD Distribution
The methodology for deriving the statistical distribution is similar to the treatment in the last section. The difference lies in the counting of the occupation number.
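
The effect of the different counting rules can be seen by comparing the mean occupation numbers of the three statistics. The sketch uses the standard textbook forms, written in terms of the dimensionless variable x = (E - μ)/kT; the symbols and the sample values of x are assumptions for illustration, since the derived formulas do not survive in the text.

```python
import math


def occupancy(x, kind):
    """Mean occupation number per state at x = (E - mu) / (k T)."""
    if kind == "MB":                      # Maxwell-Boltzmann
        return math.exp(-x)
    if kind == "BE":                      # Bose-Einstein, requires x > 0
        return 1.0 / (math.exp(x) - 1.0)
    if kind == "FD":                      # Fermi-Dirac
        return 1.0 / (math.exp(x) + 1.0)
    raise ValueError("kind must be 'MB', 'BE' or 'FD'")


for x in (0.1, 1.0, 5.0, 10.0):
    row = {kind: occupancy(x, kind) for kind in ("BE", "MB", "FD")}
    print("x =", x, {k: round(v, 6) for k, v in row.items()})
```

For large x all three agree, which is the classical limit; at small x the Bose-Einstein value grows without bound while the Fermi-Dirac value never exceeds 1.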
(5) Photon Gas in Cavity - Photon gas is the quantum version of the electromagnetic wave, in which the particle-like behavior is dual to the wave property. Its main difference from the ideal gas is the way in which it reaches equilibrium. In the ideal gas that state is attained by collisions and the number of particles is conserved, whereas the photon gas arrives at equilibrium by interacting with the matter in the cavity wall (Figure 21). The number of photons in a given state is not constant in the absorption and emission processes. However, the total energy remains the same at equilibrium, i.e., the total negative change of the chemical potential is balanced by the positive change.
Figure 21 Photon Gas |
Figure 22 Blackbody Radiation
Consequently, the overall change is zero which implies the chemical potential ![]() ![]() |
Derivation of the number of cells $g_i$ is similar to the case of Bose-Einstein statistics, counting in phase space: V × 2 × 4![]() ![]() ![]() ![]()
Figure 23 Fermi-Dirac Statistics
In terms of energy $E = p^2/2m$, g(E) dE = [(8![]() ![]()
Table 04 lists the electrical and thermal conductivity for some metals in descending order of $E_F$. Theoretical calculations show that these conductivities are proportional to the electron number density (Figure 24). The table reveals that while some entries follow the trend, the others (the latter half) do not seem to conform. The deviation can be explained by interaction with phonons (coherent wave motion of the lattice), which determines the conductivity for some metals.
Figure 24 Conductivity by Electron Gas |
Metal | Fermi Energy, $E_F$ (eV) | Number Density ($10^{28}$/m³) | Electric Conductivity ($10^7$/ohm-m) | Thermal Conductivity (W/m-K) |
---|---|---|---|---|
Aluminium (Al) | 11.7 | 18.1 | 3.50 | 205.0 |
Iron (Fe) | 11.1 | 17.0 | 1.00 | 79.5 |
Lead (Pb) | 9.47 | 13.2 | 0.455 | 34.7 |
Copper (Cu) | 7.0 | 8.47 | 5.96 | 385.0 |
Silver (Ag) | 5.49 | 5.86 | 6.30 | 406.0 |
Sodium (Na) | 3.24 | 2.40 | 2.22 | 134.0 |
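
The link between the two left-hand columns of Table 04 can be checked numerically. The sketch assumes the standard free electron gas result E_F = (ħ²/2m)(3π²n)^(2/3), which is not written out in the surviving text, and uses rounded CODATA constants; agreement with the tabulated values is only expected to within rounding and may be poorer for individual entries.

```python
import math

HBAR = 1.054571817e-34      # J s, reduced Planck constant
M_E = 9.1093837015e-31      # kg, electron mass
EV = 1.602176634e-19        # J per electron-volt


def fermi_energy_ev(n_density):
    """Free electron gas Fermi energy (eV) for a number density in 1/m^3."""
    e_joule = HBAR ** 2 / (2 * M_E) * (3 * math.pi ** 2 * n_density) ** (2 / 3)
    return e_joule / EV


# (metal, tabulated E_F in eV, tabulated number density in 1e28 per m^3)
table_04 = [
    ("Al", 11.7, 18.1),
    ("Fe", 11.1, 17.0),
    ("Pb", 9.47, 13.2),
    ("Cu", 7.0, 8.47),
    ("Ag", 5.49, 5.86),
    ("Na", 3.24, 2.40),
]

for metal, ef_table, density in table_04:
    ef_calc = fermi_energy_ev(density * 1e28)
    print(metal, "table:", ef_table, "eV   computed:", round(ef_calc, 2), "eV")
```

Because E_F depends only on the number density in this model, sorting the table by E_F is the same as sorting it by density.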