Review

[5]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sympy as sy

Problem: Summations

Use the summation formulas below to evaluate the given summations.

\[\sum_{i = 1}^n i = \frac{n^{2}}{2} + \frac{n}{2}\]
\[\sum_{i = 1}^n i^2 = \frac{n^{3}}{3} + \frac{n^{2}}{2} + \frac{n}{6}\]
\[\sum_{i = 1}^n i^3 = \frac{n^{4}}{4} + \frac{n^{3}}{2} + \frac{n^{2}}{4}\]
  1. \(\sum_{i = 1}^{10} (i^2 - i)\)
  2. \(\sum_{i = 1}^{20} (4i^3 + i^2 - 6)\)
  3. \(\sum_{i = 4}^{10} 2i^2\)
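A minimal sketch of applying the three closed forms directly, using exact Fraction arithmetic so no rounding creeps in. A sum that starts at \(i = 4\) is handled by subtracting the sum up to 3 from the sum up to 10. The helper names S1, S2, S3 are our own.

```python
from fractions import Fraction

# Closed forms from the formulas above, in exact rational arithmetic
def S1(n):  # sum of i for i = 1..n
    return Fraction(n**2, 2) + Fraction(n, 2)

def S2(n):  # sum of i^2 for i = 1..n
    return Fraction(n**3, 3) + Fraction(n**2, 2) + Fraction(n, 6)

def S3(n):  # sum of i^3 for i = 1..n
    return Fraction(n**4, 4) + Fraction(n**3, 2) + Fraction(n**2, 4)

# 1. sum_{i=1}^{10} (i^2 - i): split the sum term by term
p1 = S2(10) - S1(10)
# 2. sum_{i=1}^{20} (4i^3 + i^2 - 6): the constant -6 is added 20 times
p2 = 4 * S3(20) + S2(20) - 6 * 20
# 3. sum_{i=4}^{10} 2i^2 = 2 * (sum up to 10 minus sum up to 3)
p3 = 2 * (S2(10) - S2(3))

print(p1, p2, p3)  # 330 179150 742
```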
[6]:
i = sy.Symbol('i')
[7]:
sy.summation(i**2 - i, (i, 1, 10))
[7]:
$\displaystyle 330$
[9]:
n = sy.Symbol('n')
[11]:
sy.summation(4*i**3 + i**2 - 6, (i, 1, n))
[11]:
$\displaystyle n^{4} + \frac{7 n^{3}}{3} + \frac{3 n^{2}}{2} - \frac{35 n}{6}$
[12]:
sy.summation(2*i**2, (i, 4, 10))
[12]:
$\displaystyle 742$
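The symbolic result for the second sum is a polynomial in \(n\); substituting \(n = 20\) gives the value the problem asks for. A SymPy sketch (declaring the symbols as positive integers is an optional assumption of ours):

```python
import sympy as sy

i, n = sy.symbols('i n', integer=True, positive=True)

# Closed form of sum_{i=1}^{n} (4i^3 + i^2 - 6)
closed_form = sy.summation(4*i**3 + i**2 - 6, (i, 1, n))

# Evaluate the closed form at the upper limit n = 20
value = closed_form.subs(n, 20)
print(value)  # 179150

# Cross-check by summing the 20 terms directly
direct = sy.summation(4*i**3 + i**2 - 6, (i, 1, 20))
```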

Problem: Mean, Variance, Standard Deviation

Below is a sample of historical NYPD arrest data. We count the arrests for each of the 20 most frequent offense descriptions (the pd_desc column) and ask you to use these counts to compute the mean, variance, and standard deviation of the number of incidents.

[13]:
nyc_arrests = pd.read_json('https://data.cityofnewyork.us/resource/8h9b-rp9u.json')
[14]:
nyc_arrests.head()
[14]:
| | arrest_key | arrest_date | pd_cd | pd_desc | ky_cd | ofns_desc | law_code | law_cat_cd | arrest_boro | arrest_precinct | jurisdiction_code | age_group | perp_sex | perp_race | x_coord_cd | y_coord_cd | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 173130602 | 2017-12-31T00:00:00.000 | 566 | MARIJUANA, POSSESSION | 678.0 | MISCELLANEOUS PENAL LAW | PL 2210500 | V | Q | 105 | 0 | 25-44 | M | BLACK | 1063056 | 207463 | 40.735772 | -73.715638 |
| 1 | 173114463 | 2017-12-31T00:00:00.000 | 478 | THEFT OF SERVICES, UNCLASSIFIED | 343.0 | OTHER OFFENSES RELATED TO THEFT | PL 1651503 | M | Q | 114 | 0 | 25-44 | M | ASIAN / PACIFIC ISLANDER | 1009113 | 219613 | 40.769437 | -73.910241 |
| 2 | 173113513 | 2017-12-31T00:00:00.000 | 849 | NY STATE LAWS,UNCLASSIFIED VIOLATION | 677.0 | OTHER STATE LAWS | LOC000000V | V | K | 73 | 1 | 18-24 | M | BLACK | 1010719 | 186857 | 40.679525 | -73.904572 |
| 3 | 173113423 | 2017-12-31T00:00:00.000 | 101 | ASSAULT 3 | 344.0 | ASSAULT 3 & RELATED OFFENSES | PL 1200001 | M | M | 18 | 0 | 25-44 | M | WHITE | 987831 | 217446 | 40.763523 | -73.987074 |
| 4 | 173113421 | 2017-12-31T00:00:00.000 | 101 | ASSAULT 3 | 344.0 | ASSAULT 3 & RELATED OFFENSES | PL 1200001 | M | M | 18 | 0 | 45-64 | M | BLACK | 987073 | 216078 | 40.759768 | -73.989811 |
[16]:
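# Name the count column 'pd_desc' explicitly (pandas >= 2.0 would otherwise call it 'count')
arrests = nyc_arrests['pd_desc'].value_counts().nlargest(20).to_frame(name='pd_desc')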
[18]:
np.mean(arrests)
[18]:
pd_desc    38.25
dtype: float64
[19]:
np.var(arrests)
[19]:
pd_desc    1031.2875
dtype: float64
[20]:
np.std(arrests)
[20]:
pd_desc    32.113665
dtype: float64
[15]:
nyc_arrests['pd_desc'].value_counts().nlargest(20).to_frame()
[15]:
| | pd_desc |
|---|---|
| ASSAULT 3 | 142 |
| LARCENY,PETIT FROM OPEN AREAS,UNCLASSIFIED | 89 |
| TRAFFIC,UNCLASSIFIED MISDEMEAN | 74 |
| MARIJUANA, POSSESSION 4 & 5 | 67 |
| INTOXICATED DRIVING,ALCOHOL | 53 |
| ASSAULT 2,1,UNCLASSIFIED | 47 |
| ROBBERY,UNCLASSIFIED,OPEN AREAS | 37 |
| THEFT OF SERVICES, UNCLASSIFIED | 32 |
| LARCENY,GRAND FROM OPEN AREAS,UNCLASSIFIED | 29 |
| CONTROLLED SUBSTANCE, POSSESSION 7 | 25 |
| PUBLIC ADMINISTRATION,UNCLASSIFIED FELONY | 23 |
| OBSTR BREATH/CIRCUL | 20 |
| MENACING,UNCLASSIFIED | 19 |
| RESISTING ARREST | 16 |
| WEAPONS, POSSESSION, ETC | 16 |
| FORGERY,ETC.,UNCLASSIFIED-FELONY | 16 |
| CONTROLLED SUBSTANCE,INTENT TO SELL 3 | 16 |
| WEAPONS POSSESSION 3 | 15 |
| TRAFFIC,UNCLASSIFIED MISDEMEANOR | 15 |
| WEAPONS POSSESSION 1 & 2 | 14 |
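As a cross-check, the three statistics can be computed straight from their definitions using the 20 counts listed above. Note that np.var and np.std default to ddof=0, i.e. the population formulas that divide by \(N\); passing ddof=1 gives the sample versions that divide by \(N - 1\).

```python
import numpy as np

# Arrest counts for the 20 most frequent offense descriptions (from the output above)
counts = np.array([142, 89, 74, 67, 53, 47, 37, 32, 29, 25,
                   23, 20, 19, 16, 16, 16, 16, 15, 15, 14], dtype=float)

N = counts.size
mean = counts.sum() / N                       # mu = (1/N) * sum x_i
pop_var = ((counts - mean) ** 2).sum() / N    # sigma^2 = (1/N) * sum (x_i - mu)^2
pop_std = np.sqrt(pop_var)                    # sigma = sqrt(sigma^2)

# Sample variance divides by N - 1 instead, like np.var(counts, ddof=1)
sample_var = ((counts - mean) ** 2).sum() / (N - 1)

print(mean, pop_var)  # 38.25 1031.2875
```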

Problem: Root Mean Squared Error

Suppose we build a model to predict the price of an apartment from its square footage and number of bedrooms as follows:

\[c(x,y) = 10x + 1.2y + 200\]

where \(x\) represents square footage and \(y\) the number of bedrooms. The Root Mean Squared Error is defined by:

\[RMSE = \sqrt{\frac{\sum_{i = 1}^n (\hat{y}_i - y_i)^2 }{n}}\]

that is, the square root of the mean of the squared errors between the predicted and real costs.

Use the formula to find the RMSE of our model's predictions below.

| | square footage | bedrooms | real_cost | predicted_cost |
|---|---|---|---|---|
| 0 | 400 | 1 | 4300.76 | 4201.2 |
| 1 | 500 | 1 | 5301.01 | 5201.2 |
| 2 | 600 | 3 | 6302.63 | 6203.6 |
| 3 | 700 | 1 | 7300.91 | 7201.2 |
| 4 | 800 | 1 | 8301.07 | 8201.2 |
| 5 | 900 | 3 | 9303.85 | 9203.6 |
| 6 | 1000 | 2 | 10302.35 | 10202.4 |
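A minimal NumPy sketch of the RMSE for the seven apartments in the table: it first recomputes the predictions from \(c(x, y)\) as a sanity check, then averages the squared errors before taking the square root, exactly as in the formula above.

```python
import numpy as np

# Data from the table above
sqft = np.array([400, 500, 600, 700, 800, 900, 1000])
bedrooms = np.array([1, 1, 3, 1, 1, 3, 2])
real_cost = np.array([4300.76, 5301.01, 6302.63, 7300.91, 8301.07, 9303.85, 10302.35])
predicted_cost = np.array([4201.2, 5201.2, 6203.6, 7201.2, 8201.2, 9203.6, 10202.4])

def c(x, y):
    """Model from the problem: price = 10x + 1.2y + 200."""
    return 10 * x + 1.2 * y + 200

# The tabulated predictions should agree with the model
assert np.allclose(c(sqft, bedrooms), predicted_cost)

squared_errors = (predicted_cost - real_cost) ** 2
rmse = np.sqrt(squared_errors.mean())   # sqrt( sum (yhat_i - y_i)^2 / n )
print(round(rmse, 2))  # 99.74
```

Every prediction is roughly 100 below the real cost, so the RMSE lands close to 100.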
[21]:
(4300.76 - 4201.2)**2  # squared error for the first apartment (row 0)
[21]:
9912.19360000008
[27]:
arrests['predictions'] = arrests['pd_desc'] + np.random.randint(1, 20, 20)
[30]:
mse = np.mean((arrests['pd_desc'] - arrests['predictions'])**2)
mse
[30]:
157.0
[31]:
np.sqrt(mse)
[31]:
12.529964086141668
[ ]:

Problem: Area under a curve

Evaluate the definite integrals below and represent each solution visually as the area under the curve of its integrand.

  1. \(\int_{1}^{3} \left(x^3 - \frac{1}{x}\right) dx\)
  2. \(\int_{\pi}^{4\pi} 2\cos(x) \, dx\)
  3. \(\int_{4}^{9} 1.04 \cdot 5^x \, dx\)
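Each integrand has an elementary antiderivative, so the exact values can be found symbolically and compared with numerical results. A SymPy sketch; the names I1, I2, I3 are ours, and 1.04 is written exactly as 26/25:

```python
import sympy as sy

x = sy.symbols('x', positive=True)

# 1. Antiderivative x^4/4 - ln x, so the value is 81/4 - 1/4 - ln 3 = 20 - ln 3
I1 = sy.integrate(x**3 - 1/x, (x, 1, 3))
# 2. Antiderivative 2 sin x; sin(4*pi) = sin(pi) = 0, so the signed area is 0
I2 = sy.integrate(2*sy.cos(x), (x, sy.pi, 4*sy.pi))
# 3. Antiderivative 1.04 * 5^x / ln 5
I3 = sy.integrate(sy.Rational(26, 25) * 5**x, (x, 4, 9))

print(I1, I2, I3)
print(sy.N(I1), sy.N(I3))   # decimal approximations
```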
[32]:
from scipy.integrate import quad
[37]:
def f(x): return x**3 - 1/x
def g(x): return 2*np.cos(x)
def h(x): return 1.04*5**x
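One way to represent each integral visually is to shade the region between the curve and the \(x\)-axis with plt.fill_between, in the same style as the plotting code at the end of this notebook. A sketch: the integrands are redefined so the cell runs on its own, and the helper name shade_area is hypothetical. Area above the axis is shaded blue and area below it red, which makes the sign of each contribution visible.

```python
import matplotlib.pyplot as plt
import numpy as np

def f(x): return x**3 - 1/x
def g(x): return 2*np.cos(x)
def h(x): return 1.04*5**x

def shade_area(ax, func, a, b, label):
    """Plot func on [a, b] and shade the signed area between it and the x-axis."""
    xs = np.linspace(a, b, 400)
    ys = func(xs)
    ax.plot(xs, ys, label=label)
    ax.axhline(0, color='black', linewidth=0.5)
    # Positive area in blue, negative area in red
    ax.fill_between(xs, ys, 0, where=ys >= 0, color='tab:blue', alpha=0.3, interpolate=True)
    ax.fill_between(xs, ys, 0, where=ys < 0, color='tab:red', alpha=0.3, interpolate=True)
    ax.legend()
    return ax

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
shade_area(axes[0], f, 1, 3, '$x^3 - 1/x$')
shade_area(axes[1], g, np.pi, 4*np.pi, r'$2\cos(x)$')
shade_area(axes[2], h, 4, 9, r'$1.04 \cdot 5^x$')
fig.tight_layout()
```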
[34]:
quad(f, 1, 3)
[34]:
(18.90138771133189, 2.0984755834487654e-13)
[36]:
quad(g, np.pi, 4*np.pi)
[36]:
(-1.5809616649023117e-15, 1.306472773737261e-13)
[38]:
quad(h, 4, 9)
[38]:
(1261682.718116748, 1.4007492034248646e-08)
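The second result is zero up to floating-point error because quad returns the signed area: the parts of \(2\cos(x)\) below the axis cancel the parts above it. If the total (geometric) area is wanted instead, one approach is to split the interval at the zeros of \(\cos(x)\) inside \([\pi, 4\pi]\), namely \(3\pi/2\), \(5\pi/2\), and \(7\pi/2\), and add the absolute values of the pieces. A SymPy sketch:

```python
import sympy as sy

x = sy.symbols('x')
integrand = 2*sy.cos(x)

# Zeros of cos(x) inside [pi, 4*pi], together with the two endpoints
breaks = [sy.pi, 3*sy.pi/2, 5*sy.pi/2, 7*sy.pi/2, 4*sy.pi]

# Signed area of each piece between consecutive break points
pieces = [sy.integrate(integrand, (x, a, b)) for a, b in zip(breaks, breaks[1:])]
print(pieces)  # [-2, 4, -4, 2]

signed_area = sum(pieces)                     # 0, matching quad up to rounding
total_area = sum(sy.Abs(p) for p in pieces)   # 12
```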

Problem: Area between curves

Given the functions:

\[f(x) = 1 + x + e^{x^2 - 2x} \quad g(x) = x^4 - 6.5x^2 + 6x + 2\]

which bound the regions \(R\) and \(S\) shown below.

<img src="images/a4p4.png" />
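  1. Show that the curves intersect at \(x = 0\) and \(x = 2\), and find the third intersection point between them. (It lies close to \(x = 1\) but not at it: \(f(1) = 2 + e^{-1} \approx 2.37\), whereas \(g(1) = 2.5\).)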
  2. Set up definite integrals to represent the areas \(R\) and \(S\).
  3. Evaluate the integrals using technology.
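A numerical sketch using SciPy. Evaluating both functions at \(x = 0, 1, 2\) checks candidate intersection points; brentq then locates the middle crossing \(c\), and quad integrates top minus bottom on each side:

\[R = \int_0^{c} \big(g(x) - f(x)\big)\,dx, \qquad S = \int_{c}^{2} \big(f(x) - g(x)\big)\,dx\]

The bracket \([0.5, 1.5]\) passed to brentq is our choice, read off the figure (f - g changes sign there).

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def f(x): return 1 + x + np.exp(x**2 - 2*x)
def g(x): return x**4 - 6.5*x**2 + 6*x + 2

# The curves meet where f(x) - g(x) = 0; check the candidate points
for x0 in (0, 1, 2):
    print(x0, f(x0), g(x0), f(x0) - g(x0))

# Middle crossing, found by bracketing a sign change of f - g
c = brentq(lambda x: f(x) - g(x), 0.5, 1.5)

# On [0, c] g lies above f (region R); on [c, 2] f lies above g (region S)
area_R, _ = quad(lambda x: g(x) - f(x), 0, c)
area_S, _ = quad(lambda x: f(x) - g(x), c, 2)
print(c, area_R, area_S)
```

The printed difference is zero at \(x = 0\) and \(x = 2\) but not at \(x = 1\); the middle crossing \(c\) falls between 1.03 and 1.04.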
[ ]:

[ ]:

[ ]:

[ ]:

Problem: Volumes of Revolution

  1. Find the volume of the solid generated by rotating the region bounded by \(y = x\), \(x = 0\), and \(y = (x-1)^2 + 1\). Sketch an image of this region or try to use Python to visualize.
  2. Find the volume of the solid formed by rotating the region R from the previous problem about the \(x\)-axis. Sketch an image of this region.
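A sketch of the washer method for both parts. Part 1 does not name an axis of rotation, so the code assumes the \(x\)-axis; adjust if a different axis is intended. The curves \(y = x\) and \(y = (x-1)^2 + 1\) meet where \(x^2 - 3x + 2 = 0\), i.e. at \(x = 1\) and \(x = 2\), so with \(x = 0\) the region runs over \(0 \le x \le 1\), with the parabola on top. For part 2, region R runs from \(x = 0\) to the middle intersection \(c\) of \(f\) and \(g\), with \(g\) on top.

```python
import numpy as np
import sympy as sy
from scipy.integrate import quad
from scipy.optimize import brentq

# Part 1 (ASSUMED axis: the x-axis). Outer radius = parabola, inner radius = line
x = sy.symbols('x')
outer = (x - 1)**2 + 1
inner = x
V1 = sy.pi * sy.integrate(outer**2 - inner**2, (x, 0, 1))
print(V1)  # 23*pi/15

# Part 2: region R between f (bottom) and g (top) on [0, c], rotated about the x-axis
def f(t): return 1 + t + np.exp(t**2 - 2*t)
def g(t): return t**4 - 6.5*t**2 + 6*t + 2

c = brentq(lambda t: f(t) - g(t), 0.5, 1.5)   # middle intersection of f and g
# Both curves stay above the x-axis on [0, c], so the washer formula applies
V2, _ = quad(lambda t: np.pi * (g(t)**2 - f(t)**2), 0, c)
print(V2)
```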
[ ]:

[ ]:

[ ]:

Problem: Gini Index

The World Bank provides access to data on the Gini index of countries around the world here. Take a look at a country of your choice. What does its Gini index say about that country?

The United States Census Bureau gathers and provides data related to income and poverty in the United States. Visit their site here and explore the data available. Download one data table and discuss what you found and what it says about income and poverty in the United States.
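The Gini index ties back to the area problems above: it equals one minus twice the area under the Lorenz curve, \(G = 1 - 2\int_0^1 L(p)\,dp\), where \(L(p)\) is the share of total income held by the poorest fraction \(p\) of the population (the World Bank reports it on a 0 to 100 scale, i.e. \(100G\)). A minimal sketch for small, made-up income lists; the data and the function name gini_from_lorenz are hypothetical, and the area is approximated with the trapezoid rule:

```python
import numpy as np

def gini_from_lorenz(incomes):
    """Gini coefficient G = 1 - 2 * (area under the Lorenz curve), via the trapezoid rule."""
    x = np.sort(np.asarray(incomes, dtype=float))   # poorest to richest
    n = x.size
    # Lorenz curve points: population share p_k = k/n, cumulative income share L_k
    p = np.arange(n + 1) / n
    L = np.concatenate(([0.0], np.cumsum(x) / x.sum()))
    # Trapezoid rule for the area under L(p) on [0, 1]
    area = np.sum((p[1:] - p[:-1]) * (L[1:] + L[:-1]) / 2)
    return 1 - 2 * area

print(gini_from_lorenz([30, 30, 30, 30]))   # perfect equality: 0.0
print(gini_from_lorenz([0, 0, 0, 100]))     # one person holds everything: 0.75
```

With \(n\) people, this discrete version tops out at \(1 - 1/n\) (0.75 for four people) rather than 1.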

[ ]:
# Generates apartment data in the format of the RMSE table (kept for reference)
# def c(x, y): return 10*x + 1.2*y + 200

# x = np.arange(400, 1100, 100)

# y = np.random.randint(1, 4, 7)

# zhat = np.round(c(x, y), 2)

# z = zhat + np.round(np.random.normal(100, size = 7), 2)

# df = pd.DataFrame({'square footage': x, 'bedrooms': y, 'real_cost': z, 'predicted_cost': zhat})
# df.to_html()
[ ]:
# Draws f and g for the area-between-curves problem and saves the figure to images/a4p4.png
# def f(x): return 1 + x + np.e**(x**2 - 2*x)
# def g(x): return x**4 - 6.5*x**2 + 6*x + 2
# x = np.linspace(0, 2, 1000)
# plt.plot(x, f(x), label = '$f(x)$')
# plt.plot(x, g(x), label = '$g(x)$')
# plt.legend()
# plt.fill_between(x, f(x), g(x), color = 'grey', alpha = 0.1)
# plt.text(0.4, 2.5, 'R', fontsize = 30)
# plt.text(1.4, 2.2, 'S', fontsize = 30)
# plt.savefig('images/a4p4.png')