# Difference in pdf and cdf of normal random variables

Posted by Agnès M. on Friday, April 30, 2021


While the whole population of a group has certain characteristics, we can typically never measure all of them.

## Probability density function

Random variables whose spaces are not composed of a countable number of points but are intervals or a union of intervals are said to be of the continuous type.

Continuous distributions are probability models used to describe variables that do not occur in discrete intervals, or when a sample size is too large to treat each individual event in a discrete manner (please see Discrete Distributions for more details on discrete distributions). The main difference between continuous and discrete distributions is that continuous distributions deal with a sample size so large that its random variable values are treated on a continuum (from negative infinity to positive infinity), while discrete distributions deal with smaller sample populations and thus cannot be treated as if they are on a continuum.

This leads to a difference in the methods used to analyze these two types of distributions: continuous distributions are analyzed using calculus, while discrete distributions are analyzed using arithmetic. There are many different types of continuous distributions, including the Beta, Cauchy, Log, Pareto, and Weibull distributions.

In this wiki, though, we will only cover the two most relevant types of continuous distributions for chemical engineers: Normal (Gaussian) distributions and Exponential distributions. In chemical engineering, analysis of continuous distributions is used in a number of applications.

For example, in error analysis, given a set of data or a distribution function, it is possible to estimate the probability that a measurement (temperature, pressure, flow rate) will fall within a desired range, and hence determine how reliable an instrument or piece of equipment is. Also, one can use this analysis to calibrate an instrument. A Gaussian distribution can be used to model the error in a system where the error is caused by relatively small and unrelated events.

This distribution is a bell-shaped curve which is symmetric about the mean. To better explain, consider that a certain percentage of all data points will fall within one standard deviation of the mean. Likewise, more data points will fall within two standard deviations of the mean, and so on.

However, under this model it would require an infinite range to capture ALL the data points, thus presenting a minor difficulty in this approach. F(x) is the number of times a certain value of x occurs in the population. The mean is simply the numerical average of all the samples in the population, and the standard deviation is the measure of how far from the mean the samples tend to deviate. The following sections explain how and why a normal distribution curve is used in control and what it signifies about sets of data.
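As an illustration of these two parameters, both can be computed with Python's `statistics` module (the sample values below are invented purely for this sketch):

```python
# Computing the two parameters of a normal distribution from sample data.
# The data values below are made up purely for illustration.
import statistics

samples = [18.2, 19.5, 20.1, 20.4, 21.0, 21.8, 19.1, 20.9]

mean = statistics.mean(samples)    # numerical average of the samples
stdev = statistics.stdev(samples)  # sample standard deviation (n - 1 denominator)

print(f"mean = {mean:.3f}, standard deviation = {stdev:.3f}")
```

Note that `statistics.stdev` uses the sample (n − 1) form; `statistics.pstdev` would give the population form instead.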

As was mentioned in the Introduction section, distribution curves can be used to determine the probability, P(x), of a certain event occurring. In the figure shown above, the x-axis represents the range of possible events, and the y-axis represents the number of times a certain x value occurs in the population. The PDF can be described mathematically as follows:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

In some cases, it might not be necessary to know the probability of just one event occurring.

Rather, you may want to know the probability of a range of events. When this happens, you must integrate the above PDF over the desired range, in the following manner:

$$P(k_1 \le x \le k_2) = \int_{k_1}^{k_2} f(x)\, dx$$

This integral results in the following expression:

$$P(k_1 \le x \le k_2) = \frac{1}{2}\left[\operatorname{erf}\!\left(\frac{k_2-\mu}{\sigma\sqrt{2}}\right) - \operatorname{erf}\!\left(\frac{k_1-\mu}{\sigma\sqrt{2}}\right)\right]$$

The Erf function can be found in most scientific calculators and can also be calculated using tables of Erf[] values. Determine the value inside the brackets of the erf function through simple arithmetic, then take this value and find the corresponding Erf number from a table.

Finally, use this value in the calculation to determine the probability of a certain point, x, falling within the range bounded by k1 and k2. Given a data set with an average of 20 and a standard deviation of 2, what is the probability that a randomly selected data point will fall between 20 and 23? To solve, simply substitute the values into the equation above.

This yields the following equation:

$$P(20 \le x \le 23) = \frac{1}{2}\left[\operatorname{erf}\!\left(\frac{23-20}{2\sqrt{2}}\right) - \operatorname{erf}\!\left(\frac{20-20}{2\sqrt{2}}\right)\right] = \frac{1}{2}\left[\operatorname{erf}(1.06) - \operatorname{erf}(0)\right]$$

These Erf values must be looked up in a table and substituted into the equation. Doing this yields a probability of approximately 0.433. Thus there is a 43.3% chance that a randomly selected data point falls between 20 and 23. Graphically speaking, this probability is just the area under the normal distribution curve between k1 and k2. Alternatively, rather than using the error function, Mathematica's built-in probability density function can be used to solve this problem.
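The same arithmetic can be reproduced with Python's `math.erf` in place of a calculator or a table of Erf[] values:

```python
# Reproducing the worked example with the error-function expression:
# P(k1 <= x <= k2) = 1/2 * (erf((k2 - mu)/(sigma*sqrt(2))) - erf((k1 - mu)/(sigma*sqrt(2))))
import math

mu, sigma = 20.0, 2.0
k1, k2 = 20.0, 23.0

p = 0.5 * (math.erf((k2 - mu) / (sigma * math.sqrt(2)))
           - math.erf((k1 - mu) / (sigma * math.sqrt(2))))

print(f"P({k1} <= x <= {k2}) = {p:.4f}")  # about 0.4332
```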

This probability density function can be applied to the normal distribution using Mathematica's built-in NormalDistribution function. This shows that the probability of a randomly selected data point falling between 20 and 23 is 0.433. As expected, this value calculated using the built-in probability density function in Mathematica matches the value calculated from using the error function.

Mathematica provides a faster solution to this problem. An important point to note about the PDF is that when it is integrated from negative infinity to positive infinity, the integral will always be equal to one, regardless of the mean and standard deviation of the data set.
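This unit-area property can be checked numerically; the sketch below uses a plain midpoint Riemann sum over a wide (not truly infinite) range rather than symbolic integration:

```python
# Numerical check that the normal PDF integrates to 1 regardless of mu and sigma.
# A midpoint Riemann sum over mu +/- 10 sigma stands in for the integral from
# -infinity to +infinity (the truncated tails are negligibly small).
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def integrate(mu, sigma, width=10.0, n=100_000):
    a, b = mu - width * sigma, mu + width * sigma
    h = (b - a) / n
    return sum(normal_pdf(a + (i + 0.5) * h, mu, sigma) for i in range(n)) * h

print(integrate(0.0, 1.0))   # ~1.0
print(integrate(20.0, 2.0))  # ~1.0, independent of the parameters
```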

Likewise, the integral between negative infinity and the mean is 0.5, since half of the area lies below the mean. To transition our argument to a more holistic perspective on the probability density function for a normal distribution, we present the cumulative distribution function (CDF), which represents the integral (the area under the curve) of the PDF from negative infinity up to a given value of x.

Because of this type of definition, we may circumvent the rigorous error-function analysis presented above by simply subtracting one CDF value from another. For example, if engineers desire to determine the probability of a certain value of x falling within the range defined by k1 to k2, and possess a chart featuring data of the relevant CDF, they may simply compute CDF(k2) - CDF(k1) to find the relevant probability.
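With Python's `statistics.NormalDist` standing in for a CDF chart, the subtraction shortcut looks like this:

```python
# The CDF-subtraction shortcut: P(k1 <= x <= k2) = CDF(k2) - CDF(k1).
# statistics.NormalDist plays the role of the CDF chart here.
from statistics import NormalDist

dist = NormalDist(mu=20, sigma=2)
p = dist.cdf(23) - dist.cdf(20)

print(f"P(20 <= x <= 23) = {p:.4f}")  # about 0.4332, matching the erf result
```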

The cumulative distribution function (CDF) is simply the probability of the random variable, x, falling at or below a given value. For example, this type of function would be used if you wanted to know the probability of your temperature sensor noise being less than or equal to 5 Hz.

The CDF for a normal distribution is described using the following expression:

$$F(x) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$$

The main difference between the PDF and CDF is that the PDF gives the probability of your variable x falling within a definite range, whereas the CDF gives the probability of your variable x falling at or below a certain limit, k. The following figure is the CDF for a normal distribution. You'll notice that as x approaches infinity, the CDF approaches 1. This implies that the probability of x falling between negative and positive infinity is equal to 1.

This simplified model of distribution typically assists engineers, statisticians, business strategists, economists, and other interested professionals in modeling process conditions, and in allocating the attention and time needed to address particular issues.

Also, our grades in many of the courses here at the U of M, both in and outside of the College of Engineering, are based either strictly or loosely on this type of distribution. The benefit of the standard normal distribution is that it can be used in place of the Erf[] function if you do not have access to a scientific calculator or Erf[] tables. To use the standard normal distribution curve, the following procedure must be followed. First, perform a z-transform: a transformation which essentially normalizes any normal distribution into a standard normal distribution.

It is done using the following relationship:

$$z = \frac{x - \mu}{\sigma}$$

Mathematically speaking, the z-transform normalizes the data by converting each raw data point into the number of standard deviations it falls away from the mean. So, regardless of the magnitude of the raw data points, the standardization allows multiple sets of data to be compared to each other.

Next, use a standard normal table to find the p-value. A standard normal table has values for z and corresponding values for F(z), where F(z) is known as the p-value and is just the cumulative probability of getting a particular z value (like a CDF). A more detailed standard normal table can also be consulted. (Note: this is the same table used in the 'Basic Statistics' wiki.)

First, find your two z values that correspond to a and b. For a = 20 and b = 23, these would be z = 0 and z = 1.5, respectively. The probability of x falling between a and b is just F(z_b) - F(z_a), where F(z_b) and F(z_a) are found from the standard normal tables. Let's take the same scenario as used above, where you have a data set with an average of 20 and a standard deviation of 2, and calculate the probability of a randomly selected data point being between 20 and 23. These z scores correspond to probabilities of 0.5 and 0.933.

Their difference, 0.433, is the desired probability. Notice that this is identical to the answer obtained using the Erf method, and that obtaining it required much less effort. This is the main advantage of using z scores. There are several properties of normal distributions that become useful in transformations; they are derived using the limiting results of the central limit theorem.
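The z-score procedure can be sketched in Python, with `statistics.NormalDist()` standing in for the printed standard normal table:

```python
# The z-score route to the same answer: standardize, then use the standard
# normal CDF in place of a printed z-table.
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

mu, sigma = 20.0, 2.0
a, b = 20.0, 23.0

z_a = (a - mu) / sigma  # 0.0
z_b = (b - mu) / sigma  # 1.5

p = standard_normal.cdf(z_b) - standard_normal.cdf(z_a)
print(f"F(z_b) - F(z_a) = {p:.4f}")  # about 0.4332
```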

This result describes the distribution of the sample mean. It has applications for chi-square testing, as seen in other sections of this text. The exponential distribution can be thought of as a continuous version of the geometric distribution, without any memory.

It is often used to model the time for a process to occur at a constant average rate; events that occur with a known average rate build on the theory developed previously. However, do remember that the assumption of a constant rate rarely holds as valid in actuality. The rate of incoming phone calls differs according to the time of day. But if we focus on a time interval during which the rate is roughly constant, such as from 2 to 4 p.m., the exponential distribution can serve as a good approximate model for the time until the next call arrives.
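A minimal sketch of this phone-call example follows; the rate of 12 calls per hour is an invented value, not from the original text:

```python
# Exponential model for the waiting time between phone calls, assuming a
# constant rate over the chosen interval. The rate below is hypothetical.
import math

rate = 12.0  # calls per hour (invented for illustration)

def prob_call_within(t_hours, lam):
    """P(next call arrives within t) = 1 - exp(-lambda * t)."""
    return 1.0 - math.exp(-lam * t_hours)

# Probability the next call arrives within 5 minutes:
p = prob_call_within(5 / 60, rate)
print(f"P(wait <= 5 min) = {p:.4f}")  # about 0.6321
```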

One can implement the exponential distribution function in Mathematica using the command ExponentialDistribution[lambda]. Or, for a more grass-roots understanding of the function, reference the online applet detailing the number of sharks seen in an area of one square mile in different one-hour time periods. Use the "Fish" button to run the applet; to change the parameter lambda, type in the value and hit the "Clear" button.

A few notes are worth mentioning when differentiating the one-parameter exponential PDF from the two-parameter exponential distribution function. This distribution has no shape parameter, as it has only one shape: the exponential form.

Here, x could represent time, while the rate parameter could be the rate at which decay occurs. The rate parameter must be constant and greater than 0. The PDF decreases continuously in this diagram because of its definition as a decay example. Exponential decay typically models radioactive particles, which lose mass per unit of time.

Thus F(x) represents the mass of the particle, with x equal to the elapsed time since the start of the decay. Following the example given above, this graph describes the probability of the particle decaying within a certain amount of time x.
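A sketch of the decay example, using the exponential CDF F(x) = 1 - exp(-λx) and an arbitrary, purely illustrative half-life:

```python
# Probability that a particle has decayed by time x under an exponential model.
# The half-life below is an arbitrary illustrative value.
import math

half_life = 10.0               # time units, hypothetical
lam = math.log(2) / half_life  # rate parameter derived from the half-life

def decayed_by(x):
    return 1.0 - math.exp(-lam * x)

print(f"P(decayed by t = 10) = {decayed_by(10.0):.3f}")  # 0.500 at one half-life
print(f"P(decayed by t = 20) = {decayed_by(20.0):.3f}")  # 0.750 at two half-lives
```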

Among the distribution functions, the exponential distribution function has two unique properties: the memoryless property and a constant hazard rate. If a random variable, X, has survived for "t" units of time, then the probability of X surviving an additional "s" units of time is the same as the probability of X surviving "s" units of time in the first place.
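This memoryless property can be verified numerically; the rate parameter and the values of s and t below are arbitrary:

```python
# Numerical check of the memoryless property:
# P(X > s + t | X > t) should equal P(X > s) for an exponential variable.
import math

lam = 0.5  # arbitrary rate parameter

def survival(x, lam):
    """P(X > x) for an exponential distribution."""
    return math.exp(-lam * x)

s, t = 2.0, 7.0
conditional = survival(s + t, lam) / survival(t, lam)  # P(X > s+t | X > t)
unconditional = survival(s, lam)                       # P(X > s)

print(conditional, unconditional)  # the two values coincide
```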

The random variable has "forgotten" that it has survived for "t" units of time; thus this property is called the "memoryless" property.

## 13.8: Continuous Distributions - normal and exponential

Say you were to take a coin from your pocket and toss it into the air. While it flips through space, what could you possibly say about its future? Will it land heads up? More than that, how long will it remain in the air? How many times will it bounce? How far from where it first hits the ground will it finally come to rest?

## The Relationship Between a CDF and a PDF

In technical terms, a probability density function (PDF) is the derivative of a cumulative distribution function (CDF).
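This derivative relationship can be checked numerically: a central finite difference of the CDF (computed here with Python's `statistics.NormalDist` as a stand-in sketch) should reproduce the PDF at each point.

```python
# Checking numerically that the PDF is the derivative of the CDF:
# a central finite difference of NormalDist.cdf should match NormalDist.pdf.
from statistics import NormalDist

dist = NormalDist(mu=0, sigma=1)
h = 1e-6  # finite-difference step

for x in (-1.0, 0.0, 0.5, 2.0):
    derivative = (dist.cdf(x + h) - dist.cdf(x - h)) / (2 * h)
    print(f"x={x:5.1f}  d/dx CDF = {derivative:.6f}  PDF = {dist.pdf(x):.6f}")
```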

## PDF is not a probability.


I am learning stats. On page 20, my book, All of Statistics (1e), defines a CDF as a function that maps x to the probability that a random variable, X, is less than or equal to x.

In probability theory, a probability density function (PDF), or density of a continuous random variable, is a function whose value at any given sample (or point) in the sample space (the set of possible values taken by the random variable) can be interpreted as providing a relative likelihood that the value of the random variable would equal that sample. In a more precise sense, the PDF is used to specify the probability of the random variable falling within a particular range of values, as opposed to taking on any one value. This probability is given by the integral of this variable's PDF over that range; that is, it is given by the area under the density function but above the horizontal axis and between the lowest and greatest values of the range. The probability density function is nonnegative everywhere, and its integral over the entire space is equal to 1. The terms "probability distribution function" and "probability function" have also sometimes been used to denote the probability density function.