Thursday, December 11, 2025

Data Envelopment Analysis Tutorial


Data Envelopment Analysis, also known as DEA, is a non-parametric method for performing frontier analysis. It uses linear programming to estimate the efficiency of multiple decision-making units and it is commonly used in production, management and economics. The technique was first proposed by Charnes, Cooper and Rhodes in 1978 and since then it has become a valuable tool for estimating production frontiers.

Update: The Datumbox Machine Learning Framework is now open-source and free to download. Check out the package com.datumbox.framework.algorithms.dea to see the implementation of Data Envelopment Analysis in Java.

When I first encountered the method 5-6 years ago, I was amazed by the originality of the algorithm, its simplicity and the cleverness of the ideas that it used. I was even more amazed to see that the technique worked well outside of its usual applications (finance, operations research etc.), since it could be successfully applied in Online Marketing, Search Engine Ranking and for creating composite metrics. Despite this, today DEA is almost exclusively discussed within the context of business. That is why, in this article, I will cover the basic ideas and the mathematical framework behind DEA, and in the next post I will show you some novel applications of the algorithm on web applications.

Why is Data Envelopment Analysis interesting?

Data Envelopment Analysis is a method that enables us to compare and rank records (stores, employees, factories, webpages, marketing campaigns etc.) based on their features (weight, size, cost, revenues and other metrics or KPIs) without making any prior assumptions about the importance or weights of the features. The most interesting part of this technique is that it allows us to compare records comprised of multiple features that have totally different units of measurement. This means that we can have records with features measured in kilometers, kilograms or monetary units and still be able to compare them, rank them and find the best, worst and average performing records. Sounds interesting? Keep reading.

The description and assumptions of Data Envelopment Analysis

[Figure: Data Envelopment Analysis graph]
As we discussed earlier, DEA is a method that was invented to measure productivity in business. Thus several of its ideas come from the way that productivity is measured in this context. One of the core characteristics of the method is the separation of the record features into two categories: input and output. For example, if we measure the efficiency of a car, we could say that the input is the liters of petrol and the output is the number of kilometers that it travels.

In DEA, all features must be positive and it is assumed that the higher their value, the more their input/output is. Additionally, Data Envelopment Analysis assumes that the features can be combined linearly as a weighted sum with non-negative weights and form a ratio of output to input that measures the efficiency of each record. For a record to be efficient it must give us a “good” output relative to the provided input. The efficiency is measured by the ratio between output and input and then compared to the ratio of the other records.

The ingenious idea behind DEA

What we have covered so far is common sense and common practice. We use inputs and outputs, weighted sums and ratios to rank our records. The clever idea of DEA is in the way that the weights of the features are calculated. Instead of having to set the weights of the features and decide on their importance before we run the analysis, Data Envelopment Analysis calculates them from the data. Moreover the weights are NOT the same for every record!

Here is how DEA selects the weights: We try to maximize the ratio of each record by selecting the appropriate feature weights; at the same time though we must ensure that if we use the same weights to calculate the ratios of all the other records, none of them will become larger than 1.

The idea sounds a bit strange at first. Won’t this lead to the calculation of differently weighted ratios? The answer is yes. Doesn’t this mean that we actually calculate the ratios differently for every record? The answer is again yes. So how does this work? The answer is simple: For every record, given its characteristics, we try to find the “ideal situation” (weights) in which its ratio would be as high as possible, thus making it as effective as possible. BUT at the same time, given this “ideal situation”, none of the output/input ratios of the other records should be larger than 1, meaning that they can’t be more effective than 100%! Once we calculate the ratio of every record under its own “ideal situation”, we use these ratios to rank the records.

So the main idea of DEA can be summed up as follows: “Find the ideal situation in which we can achieve the best ratio score based on the characteristics of each record. Then calculate this ideal ratio for each record and use it to compare their effectiveness”.
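For the special case of one input and one output, this weight search has a closed form, which makes the idea easy to check by hand. Every record's ratio is (u/v) times its output/input ratio, and the "no ratio above 1" rule caps u/v at the inverse of the best output/input ratio in the dataset. Each score is therefore the record's own ratio divided by the best ratio. The sketch below assumes this single-feature case only; the function name and the car figures are made up for illustration.

```python
def single_feature_dea(inputs, outputs):
    """DEA scores when every record has exactly one input and one output.

    The record with the best output/input ratio gets a score of 1 and
    every other record is scored relative to it.
    """
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [ratio / best for ratio in ratios]


# Hypothetical cars: liters of petrol used (input) and kilometers traveled (output).
liters = [50, 40, 60]
kilometers = [600, 560, 600]

scores = single_feature_dea(liters, kilometers)
for car, score in enumerate(scores):
    print(f"car {car}: {score:.3f}")
```

With several inputs or outputs there is no such shortcut, which is why the general case needs linear programming.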

Let’s see an example

Let’s see an example where we could use DEA.

Suppose that we are interested in evaluating the efficiency of the supermarket stores of a particular chain based on a number of characteristics: the total number of employees, the size of the store in square meters, the amount of sales that they generate and the number of customers that they serve every month on average. It becomes obvious that finding the most efficient stores requires us to compare records with multiple features.

To apply DEA we must define which features are our inputs and which are our outputs. In this case the outputs are clearly the amount of sales and the number of customers that the stores serve. The inputs are the number of employees and the size of the store. If we run DEA, we will estimate the output to input ratio for every store under the ideal weights (as discussed above). Once we have their ratios we will rank the stores according to their efficiency.

It’s math time!

Now that we have an intuition of how DEA works, it’s time to dig into the math.

The efficiency ratio of a particular record i with x input and y output (both feature vectors with positive values) is estimated by using the following formula:

$$\text{efficiency}_i = \frac{\sum_{r=1}^{s} u_r y_{ri}}{\sum_{k=1}^{m} v_k x_{ki}}$$

Where u and v are the weights of each output and input of the record, s is the number of output features and m is the number of input features.
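As a small illustration of the ratio, here is a sketch that computes it for one record under a fixed set of weights. The function name, the store figures and the weights are all hypothetical; in DEA the weights are not picked by hand but found by the optimization.

```python
def efficiency_ratio(u, v, x, y):
    """Weighted sum of outputs divided by weighted sum of inputs for one record.

    u: output weights (length s), v: input weights (length m),
    x: input features (length m), y: output features (length s).
    """
    if any(weight < 0 for weight in list(u) + list(v)):
        raise ValueError("weights must be non-negative")
    weighted_output = sum(u_r * y_r for u_r, y_r in zip(u, y))
    weighted_input = sum(v_k * x_k for v_k, x_k in zip(v, x))
    return weighted_output / weighted_input


# Hypothetical store: inputs = (employees, square meters), outputs = (sales, customers).
x = [10, 200]
y = [500, 1000]
v = [0.05, 0.0025]   # input weights, chosen by hand for illustration only
u = [0.001, 0.0002]  # output weights, chosen by hand for illustration only

ratio = efficiency_ratio(u, v, x, y)
print(round(ratio, 3))
```

Here the weighted input is 1.0 and the weighted output is 0.7, so this store would be 70% efficient under these particular weights.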

The problem of finding the best/ideal weights for a particular record i can be formulated as follows:

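$$\max_{u, v} \; \frac{\sum_{r=1}^{s} u_r y_{ri}}{\sum_{k=1}^{m} v_k x_{ki}}$$
$$\text{subject to} \quad \frac{\sum_{r=1}^{s} u_r y_{rj}}{\sum_{k=1}^{m} v_k x_{kj}} \le 1, \quad j = 1, \dots, n$$
$$u_r \ge 0, \quad v_k \ge 0, \quad r = 1, \dots, s, \quad k = 1, \dots, m$$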

Again, the above is just the mathematical way of finding the weights u and v that maximize the efficiency of record i, provided that these weights will not make any of the other records more efficient than 100%.

To solve this problem we must use linear programming. Unfortunately the objective and the constraints of a linear program must be linear, so they cannot contain ratios of the unknown weights. Thus we need to transform the formulation of the problem as follows:

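$$\max_{u, v} \; \sum_{r=1}^{s} u_r y_{ri}$$
$$\text{subject to} \quad \sum_{r=1}^{s} u_r y_{rj} - \sum_{k=1}^{m} v_k x_{kj} \le 0, \quad j = 1, \dots, n$$
$$\sum_{k=1}^{m} v_k x_{ki} = 1$$
$$u_r \ge 0, \quad v_k \ge 0, \quad r = 1, \dots, s, \quad k = 1, \dots, m$$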
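One way to see why the linear version gives the same answer as the ratio version is the following short derivation. It is a sketch of the standard normalization argument for linear-fractional programs (usually attributed to Charnes and Cooper); the index names r, k and j and the scaling factor t are notation introduced here. The key fact is that multiplying all weights by the same positive constant t leaves every ratio unchanged:

```latex
\frac{\sum_{r=1}^{s} (t\,u_r)\, y_{rj}}{\sum_{k=1}^{m} (t\,v_k)\, x_{kj}}
  = \frac{\sum_{r=1}^{s} u_r y_{rj}}{\sum_{k=1}^{m} v_k x_{kj}},
  \qquad t > 0, \quad j = 1, \dots, n

% Choose t so that the weighted input of record i equals 1:
\sum_{k=1}^{m} v_k x_{ki} = 1
  \;\Longrightarrow\;
  \frac{\sum_{r=1}^{s} u_r y_{ri}}{\sum_{k=1}^{m} v_k x_{ki}}
  = \sum_{r=1}^{s} u_r y_{ri}

% Each weighted input is positive, so every ratio constraint can be multiplied out:
\frac{\sum_{r=1}^{s} u_r y_{rj}}{\sum_{k=1}^{m} v_k x_{kj}} \le 1
  \;\Longleftrightarrow\;
  \sum_{r=1}^{s} u_r y_{rj} - \sum_{k=1}^{m} v_k x_{kj} \le 0
```

Because rescaling never changes a ratio, restricting the search to weights with a weighted input of 1 for record i loses nothing, and what remains is linear in u and v.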

We should stress that the above linear programming problem gives us the best weights for record i and calculates its efficiency under those optimal weights. The same must be repeated for every record in our dataset. So if we have n records, we have to solve n separate linear problems. Here is the pseudocode of how DEA works:


ratio_scores = [];
for every record i {
    // solve the linear program of record i and keep its optimal ratio
    i_ratio = get_maximum_effectiveness(i);
    ratio_scores[i] = i_ratio;
}
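The loop above can be sketched in runnable form. This is a minimal illustration and not the Datumbox implementation. Instead of a simplex solver it brute-forces the vertices of each small linear program with exact fractions, which is only practical for a handful of records and features; real code would call an LP solver. The helper `solve_linear_system`, the function `dea_scores` and the store figures are hypothetical; `get_maximum_effectiveness` borrows its name from the pseudocode.

```python
from fractions import Fraction
from itertools import combinations


def solve_linear_system(A, b):
    """Gauss-Jordan elimination over Fractions; returns None if A is singular."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [value / pivot_value for value in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def get_maximum_effectiveness(i, inputs, outputs):
    """Optimal ratio of record i: best feasible vertex of its linear program."""
    n, m, s = len(inputs), len(inputs[0]), len(outputs[0])
    d = s + m  # unknowns: u_1..u_s followed by v_1..v_m
    F = Fraction
    # Normalization: the weighted input of record i equals 1.
    equality_row = [F(0)] * s + [F(x) for x in inputs[i]]
    # Rows a with a.w <= 0: one ratio constraint per record, then -w_k <= 0.
    inequalities = [[F(y) for y in outputs[j]] + [-F(x) for x in inputs[j]]
                    for j in range(n)]
    inequalities += [[F(-1) if c == k else F(0) for c in range(d)]
                     for k in range(d)]
    objective = [F(y) for y in outputs[i]] + [F(0)] * m
    best = None
    # A vertex makes d - 1 inequalities tight in addition to the equality.
    for active in combinations(inequalities, d - 1):
        w = solve_linear_system([equality_row] + list(active),
                                [F(1)] + [F(0)] * (d - 1))
        if w is None:
            continue
        if all(sum(a * c for a, c in zip(row, w)) <= 0 for row in inequalities):
            score = sum(o * c for o, c in zip(objective, w))
            if best is None or score > best:
                best = score
    return best


def dea_scores(inputs, outputs):
    """Solve one linear program per record, as in the pseudocode."""
    return [get_maximum_effectiveness(i, inputs, outputs)
            for i in range(len(inputs))]


# Hypothetical stores: inputs = (employees, square meters),
# outputs = (monthly sales, monthly customers).
inputs = [[10, 200], [20, 300], [15, 250], [12, 180]]
outputs = [[500, 1000], [800, 1500], [400, 900], [600, 1200]]

scores = dea_scores(inputs, outputs)
for store, score in enumerate(scores):
    print(f"store {store}: {float(score):.3f}")
```

The third store uses more of both inputs and produces less of both outputs than the first one, so no choice of weights can bring its score up to 1.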

Limitations of Data Envelopment Analysis

DEA is a great technique but it has its limitations. You must understand that DEA is like a black box. Since the weights that are used in the effectiveness ratio of each record are different, trying to explain how and why each score was calculated is pointless. Usually we focus on the ranking of the records rather than on the actual values of the effectiveness scores. Also note that the existence of extreme values can cause the scores of the other records to be very low.

Keep in mind that DEA uses linear combinations of the features to estimate the ratios. Thus if combining them linearly is not appropriate in our application, we must first apply transformations to the features so that they can be combined linearly. Another drawback of this technique is that we have to solve as many linear programming problems as the number of records, something that requires a lot of computational resources.

Another problem that DEA faces is that it does not work well with high dimensional data. To use DEA the number of dimensions d = m + s must be significantly lower than the number of observations n. Running DEA when d is very close to or larger than n does not provide useful results, since most likely all the records will be found to be optimal. Note that as you add a new output variable (dimension), every record that has the best ratio of this new output to any single input will be found optimal.
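This effect can be checked cheaply without solving any linear program. A record that has the best ratio of some output to some input can put all of its weight on that one pair and reach a score of 1. The sketch below counts such records; it is a sufficient test only, so the real number of optimal records can be higher. The function name and the data are hypothetical.

```python
def trivially_efficient(inputs, outputs):
    """Indices of records with the best output/input ratio for at least one
    (output, input) pair. These records always reach a DEA score of 1.
    This is a sufficient condition, not a full DEA run."""
    n = len(inputs)
    winners = set()
    for r in range(len(outputs[0])):
        for k in range(len(inputs[0])):
            ratios = [outputs[j][r] / inputs[j][k] for j in range(n)]
            best = max(ratios)
            winners.update(j for j in range(n) if ratios[j] == best)
    return sorted(winners)


# Hypothetical data: 4 records that all use the same amount of a single input.
inputs = [[10], [10], [10], [10]]
one_output = [[5], [4], [3], [2]]
three_outputs = [[5, 1, 1], [4, 6, 2], [3, 2, 7], [2, 3, 4]]

print(trivially_efficient(inputs, one_output))
print(trivially_efficient(inputs, three_outputs))
```

With one output only the first record qualifies, while with three outputs three of the four records already do, so the scores no longer separate them.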

Finally we should note that in the general form of the algorithm, the weights of the features in DEA are estimated from the data and thus they do not use any prior information about the importance of the features that we might have in our problem (of course it is possible to incorporate this information as constraints in our linear problem). Additionally, the efficiency scores that are calculated are actually the upper limit efficiency ratios of each record, since they are calculated under “ideal situations”. This means that DEA can be a good solution when it is not possible to make any assumptions about the importance of the features, but if we do have any prior information or we can quantify their importance, then using other techniques is advised.

 

In the next article, I will show you how to develop an implementation of Data Envelopment Analysis in JAVA and we will use the method to estimate the popularity of web pages and articles in social media networks.

