The main purpose of this report is to identify the factors that affect the points per game scored by NBA players. The report comprises a correlation analysis, simple linear regression models, a multiple linear regression model with all factors and an improved model that keeps only the statistically significant factors.

2.0 Background

Among the seemingly unbreakable records in NBA history, it is remarkable that Wilt Chamberlain scored 100 points in a single game and averaged more than 50 points per game over a season. What, then, affects the points a player scores in each game? Here, I consider 8 factors that may affect a player's points per game: games played, playing time per game, field goals attempted per game, field goal percentage, 3-point field goals attempted per game, 3-point field goal percentage, free throws attempted per game and free throw percentage.

3.0 Data

I use data for the top 50 leaders in points per game in the 2012-2013 NBA regular season.

PLAYER             GP  MPG   FGA   FG%    3PA  3P%    FTA   FT%    PTS
Kevin Durant       47  39.5  18.5  0.516  4.7  0.414  9.5   0.904  29.6
Carmelo Anthony    38  37.8  22    0.447  6.6  0.409  7.4   0.822  28.5
Kobe Bryant        47  38.7  21.1  0.466  5.7  0.341  7.5   0.838  27.9
LeBron James       43  38.7  18.7  0.547  3.3  0.403  6.4   0.734  26.5
James Harden       48  38.3  17.4  0.44   5.6  0.328  10.1  0.859  25.8
Kyrie Irving       37  35.6  18.6  0.471  4.8  0.412  5.3   0.851  24
Russell Westbrook  47  36.3  18.9  0.419  4.1  0.325  6.7   0.801  22.6
Stephen Curry      43  38    16.7  0.44   7.1  0.457  3.6   0.902  21.1
Dwyane Wade        39  34    15.4  0.508  1.2  0.319  6.3   0.738  20.6
LaMarcus Aldridge  45  38.2  17.6  0.47   0.2  0.1    4.9   0.801  20.5
Tony Parker        47  32.7  15.1  0.534  1.1  0.396  4.4   0.808  20.1
Jrue Holiday       42  38.4  17    0.463  3    0.354  3.3   0.779  19.4
David Lee          46  37.8  15.6  0.514  0.1  0      4.2   0.802  19.4
Brandon Jennings   46  36.8  16.6  0.406  5.7  0.374  3.8   0.828  18.7
Brook Lopez        40  29.4  14.2  0.526  0    0      5.1   0.734  18.7
Paul Pierce        46  33.7  14.8  0.422  5    0.346  5.5   0.788  18.6
Monta Ellis        46  36.4  17.4  0.4    3.5  0.252  4.7   0.799  18.6
Blake Griffin      48  32.6  13.9  0.531  0.3  0.188  5.6   0.658  18.5
Damian Lillard     47  38.6  15.4  0.423  6.3  0.362  3.6   0.845  18.4
O.J. Mayo          47  35.9  13.9  0.461  4.8  0.427  3.7   0.847  18
Kemba Walker       46  35.2  15.3  0.432  3.8  0.349  4.3   0.797  18
DeMar DeRozan      47  36.7  15    0.44   1.6  0.28   4.7   0.826  17.4
DeMarcus Cousins   44  31.9  14.7  0.444  0.2  0.2    5.6   0.762  17.4
Luol Deng          42  40    14.9  0.436  2.9  0.336  4.1   0.82   17.3
Paul George        46  37.3  15.1  0.427  5.7  0.382  2.8   0.808  17.3
Rudy Gay           43  36.6  16.4  0.411  3.1  0.319  3.7   0.772  17.3
Tim Duncan         43  29.8  13.7  0.505  0.1  0.4    4     0.828  17.3
Al Jefferson       47  32.9  15.4  0.477  0.2  0.2    2.9   0.837  17.1
Chris Bosh         42  33.9  12.2  0.54   0.8  0.25   4.6   0.818  17.1
Danilo Gallinari   47  32.9  13.1  0.424  5.4  0.37   4.9   0.811  17
David West         47  33.6  14.5  0.485  0.3  0.214  3.9   0.739  17
Joe Johnson        47  38    15    0.425  5.5  0.381  2.6   0.82   17
Ryan Anderson      48  31.3  14.1  0.434  7.6  0.396  1.9   0.878  16.9
Josh Smith         43  35.5  15.7  0.451  2.2  0.302  4.1   0.497  16.9
Deron Williams     46  36.4  13.5  0.415  5.3  0.34   4.4   0.858  16.8
Klay Thompson      47  35.3  14.5  0.418  7    0.391  2.1   0.888  16.7
Arron Afflalo      43  36.7  14.1  0.442  3.8  0.346  3.4   0.857  16.7
Jamal Crawford     46  29.4  13.5  0.417  5    0.362  4     0.863  16.5
Dwight Howard      43  34.7  10.3  0.577  0.1  0.25   9.3   0.496  16.5
J.R. Smith         45  33.4  15.1  0.402  4.9  0.338  3.1   0.793  16.3
Al Horford         43  37.3  13.4  0.532  0    0      2.9   0.602  16
Nicolas Batum      46  38.9  12.5  0.425  6.5  0.362  3.5   0.849  15.9
Carlos Boozer      44  31.2  14.1  0.475  0    0      3.5   0.699  15.8
Greg Monroe        47  32.6  12.8  0.483  0    0      4.9   0.685  15.7
Zach Randolph      44  35.2  13.6  0.472  0.4  0.125  3.5   0.75   15.5
J.J. Redick        46  32    11.7  0.452  6.2  0.399  2.6   0.892  15.3
Thaddeus Young     46  36    13    0.522  0.1  0.2    2.6   0.57   15.1
Raymond Felton     33  33.5  15.3  0.401  4.2  0.365  1.7   0.782  15.1
Kevin Martin       46  29.8  10.6  0.45   5.2  0.435  3.6   0.904  15.1
Ty Lawson          47  34.3  13.1  0.431  2.9  0.36   3.6   0.737  15

Source: http://espn.go.com/nba/statistics/player/_/stat/scoring-per-game

GP: Games Played

MPG: Minutes Per Game

PTS: Points Per Game

FGA: Field Goals Attempted Per Game

FG%: Field Goal Percentage

3PA: 3-Point Field Goals Attempted Per Game

3P%: 3-Point Field Goal Percentage

FTA: Free Throws Attempted Per Game

FT%: Free Throw Percentage

4.0 Analysis

4.1 Correlation

First, I use scatterplots with fitted regression lines to show the relationship between points per game and each of the 8 factors.

From the scatterplots, we can observe that FGA has the strongest relationship with PTS, as its data points lie closest to the fitted line. GP, by contrast, has the weakest relationship with PTS.

In addition, I use correlation coefficients to show the relationships between points per game and the 8 factors. Below is a correlation matrix for all variables in the model. The numbers are Pearson correlation coefficients, which range from -1 to 1. The closer the absolute value is to 1, the stronger the linear relationship, and a value near 0 indicates little or no linear relationship. A negative value indicates an inverse relationship (roughly, when one goes up the other goes down).

From the table above, we can observe that field goals attempted per game (FGA) and free throws attempted per game (FTA) are the factors most strongly associated with points per game, with correlations of 0.840 and 0.727 respectively.
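As a minimal sketch of how one entry of such a correlation matrix could be computed, the Python function below applies the textbook Pearson formula to two columns. The function name `pearson` is hypothetical, and only the first eight rows of the data table are used, so the result is not expected to reproduce the 0.840 reported for the full sample.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# FGA and PTS for the first eight players in the table (illustrative subset).
fga = [18.5, 22, 21.1, 18.7, 17.4, 18.6, 18.9, 16.7]
pts = [29.6, 28.5, 27.9, 26.5, 25.8, 24, 22.6, 21.1]

r = pearson(fga, pts)
print(f"r(FGA, PTS) on the subset = {r:.3f}")
```

Applying the same function to every pair of columns would fill in the full matrix.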

4.2 Simple linear regression

I take FGA and GP as two contrasting examples of independent variables to illustrate the linear relationship with PTS.

First, I build a simple regression model using FGA as the independent variable and PTS as the dependent variable, with the results below.

The t-value tests the null hypothesis that the coefficient equals 0. To reject it at the 95% confidence level, we need a t-value whose absolute value is greater than about 1.96 (the large-sample critical value; with 48 degrees of freedom it is about 2.01). In this case, the t-value of FGA is 10.72, which indicates that there is a significant linear relationship between PTS and FGA.

The two-tailed p-value tests the same null hypothesis that the coefficient equals 0. To reject it, the p-value has to be lower than 0.05 (an alpha of 0.10 could also be chosen). In this case, with a p-value reported as 0.000, there is very strong evidence that the simple linear regression model is useful for explaining PTS.

The R² value listed in the output is 70.5%, which implies that about 70.5% of the sample variation in points per game (PTS) is explained by field goals attempted per game (FGA) in a straight-line model. There are some unusual observations, and the unexplained variation suggests that other variables also affect PTS.
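The quantities discussed above (slope, intercept, t-value and R²) can be sketched in a few lines of Python. This is an illustrative least-squares fit, not the statistical package that produced the report's output; the function name `simple_ols` is hypothetical, and the eight-row subset means the printed numbers will differ from the reported ones.

```python
import math

def simple_ols(x, y):
    """Fit y = b0 + b1*x by least squares.

    Returns (intercept, slope, t-value of the slope, R-squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares and the standard error of the slope.
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    t_slope = slope / se_slope
    r_squared = 1 - sse / syy
    return intercept, slope, t_slope, r_squared

# FGA and PTS for the first eight players in the table (illustrative subset).
fga = [18.5, 22, 21.1, 18.7, 17.4, 18.6, 18.9, 16.7]
pts = [29.6, 28.5, 27.9, 26.5, 25.8, 24, 22.6, 21.1]

intercept, slope, t_slope, r_squared = simple_ols(fga, pts)
print(f"PTS = {intercept:.2f} + {slope:.2f}*FGA, t = {t_slope:.2f}, R^2 = {r_squared:.3f}")
```

In a simple regression the two statistics are linked by R² = t² / (t² + n - 2). With t = 10.72 and n = 50 this gives about 0.705, which agrees with the reported 70.5%.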

Moreover, the most important part of the ANOVA table is the probability (the p-value). It is calculated by assuming that the independent variable in question has no effect and then gauging the likelihood of observing an outcome at least as extreme as the one obtained. The effect of the variable is called statistically significant if the p-value is less than 0.05 or 0.01, with smaller values indicating stronger evidence. Since the p-value here is well below 0.01, we can reasonably say that the tested factor (FGA) has a statistically significant effect on the response variable (PTS).

The regression equation is PTS = -0.84 + 1.29*FGA. For each additional field goal attempted per game, the predicted points per game increase by 1.29.
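As a worked example of this equation, the short Python sketch below plugs in Kevin Durant's row from the data table (18.5 field goal attempts, 29.6 points per game). The function name `predict_pts` is hypothetical. The line predicts about 23.0 points, roughly 6.6 below his actual average, which illustrates that FGA alone leaves part of the variation unexplained.

```python
def predict_pts(fga):
    """Predicted points per game from the report's fitted line."""
    return -0.84 + 1.29 * fga

# Kevin Durant attempted 18.5 field goals per game and scored 29.6 points.
predicted = predict_pts(18.5)
residual = 29.6 - predicted
print(f"predicted = {predicted:.3f}, residual = {residual:.3f}")
```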

A typical assumption in regression is that the random errors are normally distributed. The normality assumption is important when conducting hypothesis tests on the estimated coefficients. Fortunately, even when the random errors are not normally distributed, the test results are usually reliable when the sample is large enough. In this case, the residuals are well behaved.
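One common way to inspect the normality assumption is a normal quantile-quantile comparison. The sketch below is an assumption about how such a check could be done, not the procedure used in the report: the function name `normal_qq_points` is hypothetical, and the residuals are computed from the fitted line PTS = -0.84 + 1.29*FGA for the first ten players only, purely as an illustration.

```python
from statistics import NormalDist

def normal_qq_points(residuals):
    """Pair each sorted residual with its theoretical standard normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n avoid the infinite quantiles at 0 and 1.
    theoretical = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# (FGA, PTS) for the first ten players in the data table.
sample = [(18.5, 29.6), (22, 28.5), (21.1, 27.9), (18.7, 26.5), (17.4, 25.8),
          (18.6, 24), (18.9, 22.6), (16.7, 21.1), (15.4, 20.6), (17.6, 20.5)]
# Residuals relative to the report's fitted line PTS = -0.84 + 1.29*FGA.
residuals = [pts - (-0.84 + 1.29 * fga) for fga, pts in sample]
qq = normal_qq_points(residuals)
for theoretical, observed in qq:
    print(f"{theoretical:6.2f}  {observed:6.2f}")
```

If the errors are roughly normal, the printed pairs fall close to a straight line when plotted against each other; strong curvature suggests a departure from normality.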

Next, I show a contrasting relationship, using PTS as the response and GP as the predictor.

In this case, the t-value of GP is -0.67, whose absolute value is well below 1.96 (for 95% confidence), which indicates that we have insufficient evidence to conclude that a statistically significant relationship between PTS and GP exists. Likewise, the p-value of 0.509 is greater than 0.05, so we again find no significant relationship between PTS and GP. The R² value listed in the output is only 0.9%, which implies that almost none of the sample variation in points per game (PTS) is explained by games played (GP) in a straight-line model. Moreover, the p-value in the ANOVA table is 0.509, far greater than 0.05, and there are many unusual observations, so we can reasonably say that the tested factor (GP) has no statistically significant effect on the response variable (PTS).

The plot illustrates that the random errors are not normally distributed, so the test results may not be reliable.

Overall, there is no significant relationship between PTS and GP.

4.3 Multiple Linear Regression

To improve on the results obtained above, we can use multiple linear regression to model the relationship between points per game and all 8 factors together.
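As a rough sketch of what a multiple regression fit involves, the Python function below solves the least-squares normal equations by Gaussian elimination. It is not the software used for the report: the function name `fit_multiple_ols` is hypothetical, and the example uses only two predictors (FGA and FTA) on the first ten players to stay short, so its coefficients are illustrative only.

```python
def fit_multiple_ols(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    # Design matrix with a leading column of ones for the intercept.
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Build the normal equations (X'X) b = X'y as an augmented matrix.
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * t for row, t in zip(X, y))]
         for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear or there are too few rows")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    coef = [0.0] * p
    for i in range(p - 1, -1, -1):
        known = sum(A[i][j] * coef[j] for j in range(i + 1, p))
        coef[i] = (A[i][p] - known) / A[i][i]
    return coef

# (FGA, FTA) and PTS for the first ten players in the data table.
predictors = [(18.5, 9.5), (22, 7.4), (21.1, 7.5), (18.7, 6.4), (17.4, 10.1),
              (18.6, 5.3), (18.9, 6.7), (16.7, 3.6), (15.4, 6.3), (17.6, 4.9)]
points = [29.6, 28.5, 27.9, 26.5, 25.8, 24, 22.6, 21.1, 20.6, 20.5]

coef = fit_multiple_ols(predictors, points)
print("intercept, b_FGA, b_FTA =", [round(c, 3) for c in coef])
```

Passing all eight columns for all 50 players to the same function would give the coefficients of the full model, although a statistics package is still needed for the t-values and p-values.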

The t-values test the null hypothesis that each coefficient equals 0. To reject it, we need a t-value whose absolute value is greater than about 1.96 (at the 0.05 significance level). The magnitude of a t-value also indicates the relative importance of a variable in the model. In this case, FGA is the most important.

Alternatively, the two-tailed p-values test the null hypothesis that each coefficient equals 0. To reject it, the p-value has to be lower than 0.05 (an alpha of 0.10 could also be chosen). In this case, GP, MPG and 3P% are not statistically significant in explaining PTS, whereas FGA, FG%, 3PA, FTA and FT% all have a statistically significant effect on PTS.

Moreover, the model explains 99.0% of the variance in PTS.
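One likely reason the fit is this close, offered here as an observation rather than a finding of the report, is that points can be rebuilt almost exactly from six of the predictors. Assuming FG% counts all field goals including 3-pointers (as in standard box scores), points per game are two points per made field goal, one extra point per made 3-pointer and one point per made free throw. The function name `implied_points` below is hypothetical.

```python
def implied_points(fga, fg_pct, tpa, tp_pct, fta, ft_pct):
    """Points per game implied by attempts and shooting percentages."""
    field_goals_made = fga * fg_pct   # includes made 3-pointers
    threes_made = tpa * tp_pct        # each adds one point on top of two
    free_throws_made = fta * ft_pct
    return 2 * field_goals_made + threes_made + free_throws_made

# Kevin Durant's row from the data table; his listed PTS is 29.6.
durant = implied_points(18.5, 0.516, 4.7, 0.414, 9.5, 0.904)
print(f"implied PTS = {durant:.2f}")
```

Because this relationship multiplies attempts by percentages, it is not strictly linear in the predictors, which may be why the linear model comes very close to, but does not reach, a perfect fit.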

The p-value in the ANOVA table is 0.000, far less than 0.05, so we can reasonably say that the tested factors jointly have a statistically significant effect on the response variable (PTS).

Overall, the model describes the variation in the data well. However, we can still improve it, as some factors are not important in explaining PTS.

4.4 Improvement

As discussed before, the three factors GP, MPG and 3P% are not statistically significant in explaining PTS. Therefore, I exclude these three factors and build a new multiple regression model with the other 5 factors.

Similarly, we can conclude that the five factors all have some influence on PTS. Among them, FGA has the largest impact.
