{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Review\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
"import sympy as sy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Problem: Summations\n",
"\n",
"Use the summation formulas below to evaluate the given summations.\n",
"\n",
"$$\\sum_{i = 1}^n i = \\frac{n^{2}}{2} + \\frac{n}{2}\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\\sum_{i = 1}^n i^2 = \\frac{n^{3}}{3} + \\frac{n^{2}}{2} + \\frac{n}{6}\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\\sum_{i = 1}^n i^3 = \\frac{n^{4}}{4} + \\frac{n^{3}}{2} + \\frac{n^{2}}{4}\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"a. $\\sum_{i = 1}^{10} i^2 - i$\n",
"\n",
"b. $\\sum_{i = 1}^{20} 4i^3 + i^2 - 6$\n",
"\n",
"c. $\\sum_{i = 4}^{10} 2i^2$ "
]
},
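{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of how the formulas apply, assuming each summand is read as the full expression after the $\\sum$ sign. Each sum is split by linearity; for part c, the first three terms ($1 + 4 + 9 = 14$) are subtracted from the sum starting at $i = 1$:\n",
"\n",
"$$\\sum_{i = 1}^{10} (i^2 - i) = 385 - 55 = 330$$\n",
"\n",
"$$\\sum_{i = 1}^{20} (4i^3 + i^2 - 6) = 4(44100) + 2870 - 6(20) = 179150$$\n",
"\n",
"$$\\sum_{i = 4}^{10} 2i^2 = 2(385 - 14) = 742$$"
]
},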
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"i = sy.Symbol('i')"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"$\\displaystyle 330$"
],
"text/plain": [
"330"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sy.summation(i**2 - i, (i, 1, 10))"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"$\\displaystyle n^{4} + \\frac{7 n^{3}}{3} + \\frac{3 n^{2}}{2} - \\frac{35 n}{6}$"
],
"text/plain": [
"n**4 + 7*n**3/3 + 3*n**2/2 - 35*n/6"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sy.summation(4*i**3 + i**2 - 6, (i, 1, n))"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"n = sy.Symbol('n')"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"$\\displaystyle 742$"
],
"text/plain": [
"742"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sy.summation(2*i**2, (i, 4, 10))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Problem 2: Mean, Variance, Standard Deviation\n",
"\n",
"Below is a sample of historic data relating to NYPD arrests. We count the 20 most frequent violations, and ask you to use these values to compute the *mean number of incidents*, *variance in incident counts*, and *standard deviation of incident counts*. "
]
},
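{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, a sketch of the definitions these quantities are assumed to follow. These are the population forms, dividing by $n$, which is the default behavior of `np.var` and `np.std`. Here $x_1, \\dots, x_n$ are the $n = 20$ counts:\n",
"\n",
"$$\\mu = \\frac{1}{n}\\sum_{i = 1}^n x_i \\qquad \\sigma^2 = \\frac{1}{n}\\sum_{i = 1}^n (x_i - \\mu)^2 \\qquad \\sigma = \\sqrt{\\sigma^2}$$"
]
},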
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"nyc_arrests = pd.read_json('https://data.cityofnewyork.us/resource/8h9b-rp9u.json')"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" arrest_key | \n",
" arrest_date | \n",
" pd_cd | \n",
" pd_desc | \n",
" ky_cd | \n",
" ofns_desc | \n",
" law_code | \n",
" law_cat_cd | \n",
" arrest_boro | \n",
" arrest_precinct | \n",
" jurisdiction_code | \n",
" age_group | \n",
" perp_sex | \n",
" perp_race | \n",
" x_coord_cd | \n",
" y_coord_cd | \n",
" latitude | \n",
" longitude | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 173130602 | \n",
" 2017-12-31T00:00:00.000 | \n",
" 566 | \n",
" MARIJUANA, POSSESSION | \n",
" 678.0 | \n",
" MISCELLANEOUS PENAL LAW | \n",
" PL 2210500 | \n",
" V | \n",
" Q | \n",
" 105 | \n",
" 0 | \n",
" 25-44 | \n",
" M | \n",
" BLACK | \n",
" 1063056 | \n",
" 207463 | \n",
" 40.735772 | \n",
" -73.715638 | \n",
"
\n",
" \n",
" | 1 | \n",
" 173114463 | \n",
" 2017-12-31T00:00:00.000 | \n",
" 478 | \n",
" THEFT OF SERVICES, UNCLASSIFIED | \n",
" 343.0 | \n",
" OTHER OFFENSES RELATED TO THEFT | \n",
" PL 1651503 | \n",
" M | \n",
" Q | \n",
" 114 | \n",
" 0 | \n",
" 25-44 | \n",
" M | \n",
" ASIAN / PACIFIC ISLANDER | \n",
" 1009113 | \n",
" 219613 | \n",
" 40.769437 | \n",
" -73.910241 | \n",
"
\n",
" \n",
" | 2 | \n",
" 173113513 | \n",
" 2017-12-31T00:00:00.000 | \n",
" 849 | \n",
" NY STATE LAWS,UNCLASSIFIED VIOLATION | \n",
" 677.0 | \n",
" OTHER STATE LAWS | \n",
" LOC000000V | \n",
" V | \n",
" K | \n",
" 73 | \n",
" 1 | \n",
" 18-24 | \n",
" M | \n",
" BLACK | \n",
" 1010719 | \n",
" 186857 | \n",
" 40.679525 | \n",
" -73.904572 | \n",
"
\n",
" \n",
" | 3 | \n",
" 173113423 | \n",
" 2017-12-31T00:00:00.000 | \n",
" 101 | \n",
" ASSAULT 3 | \n",
" 344.0 | \n",
" ASSAULT 3 & RELATED OFFENSES | \n",
" PL 1200001 | \n",
" M | \n",
" M | \n",
" 18 | \n",
" 0 | \n",
" 25-44 | \n",
" M | \n",
" WHITE | \n",
" 987831 | \n",
" 217446 | \n",
" 40.763523 | \n",
" -73.987074 | \n",
"
\n",
" \n",
" | 4 | \n",
" 173113421 | \n",
" 2017-12-31T00:00:00.000 | \n",
" 101 | \n",
" ASSAULT 3 | \n",
" 344.0 | \n",
" ASSAULT 3 & RELATED OFFENSES | \n",
" PL 1200001 | \n",
" M | \n",
" M | \n",
" 18 | \n",
" 0 | \n",
" 45-64 | \n",
" M | \n",
" BLACK | \n",
" 987073 | \n",
" 216078 | \n",
" 40.759768 | \n",
" -73.989811 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" arrest_key arrest_date pd_cd \\\n",
"0 173130602 2017-12-31T00:00:00.000 566 \n",
"1 173114463 2017-12-31T00:00:00.000 478 \n",
"2 173113513 2017-12-31T00:00:00.000 849 \n",
"3 173113423 2017-12-31T00:00:00.000 101 \n",
"4 173113421 2017-12-31T00:00:00.000 101 \n",
"\n",
" pd_desc ky_cd \\\n",
"0 MARIJUANA, POSSESSION 678.0 \n",
"1 THEFT OF SERVICES, UNCLASSIFIED 343.0 \n",
"2 NY STATE LAWS,UNCLASSIFIED VIOLATION 677.0 \n",
"3 ASSAULT 3 344.0 \n",
"4 ASSAULT 3 344.0 \n",
"\n",
" ofns_desc law_code law_cat_cd arrest_boro \\\n",
"0 MISCELLANEOUS PENAL LAW PL 2210500 V Q \n",
"1 OTHER OFFENSES RELATED TO THEFT PL 1651503 M Q \n",
"2 OTHER STATE LAWS LOC000000V V K \n",
"3 ASSAULT 3 & RELATED OFFENSES PL 1200001 M M \n",
"4 ASSAULT 3 & RELATED OFFENSES PL 1200001 M M \n",
"\n",
" arrest_precinct jurisdiction_code age_group perp_sex \\\n",
"0 105 0 25-44 M \n",
"1 114 0 25-44 M \n",
"2 73 1 18-24 M \n",
"3 18 0 25-44 M \n",
"4 18 0 45-64 M \n",
"\n",
" perp_race x_coord_cd y_coord_cd latitude longitude \n",
"0 BLACK 1063056 207463 40.735772 -73.715638 \n",
"1 ASIAN / PACIFIC ISLANDER 1009113 219613 40.769437 -73.910241 \n",
"2 BLACK 1010719 186857 40.679525 -73.904572 \n",
"3 WHITE 987831 217446 40.763523 -73.987074 \n",
"4 BLACK 987073 216078 40.759768 -73.989811 "
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nyc_arrests.head()"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"arrests = nyc_arrests['pd_desc'].value_counts().nlargest(20).to_frame()"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"pd_desc 38.25\n",
"dtype: float64"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.mean(arrests)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"pd_desc 1031.2875\n",
"dtype: float64"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.var(arrests)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"pd_desc 32.113665\n",
"dtype: float64"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.std(arrests)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" pd_desc | \n",
"
\n",
" \n",
" \n",
" \n",
" | ASSAULT 3 | \n",
" 142 | \n",
"
\n",
" \n",
" | LARCENY,PETIT FROM OPEN AREAS,UNCLASSIFIED | \n",
" 89 | \n",
"
\n",
" \n",
" | TRAFFIC,UNCLASSIFIED MISDEMEAN | \n",
" 74 | \n",
"
\n",
" \n",
" | MARIJUANA, POSSESSION 4 & 5 | \n",
" 67 | \n",
"
\n",
" \n",
" | INTOXICATED DRIVING,ALCOHOL | \n",
" 53 | \n",
"
\n",
" \n",
" | ASSAULT 2,1,UNCLASSIFIED | \n",
" 47 | \n",
"
\n",
" \n",
" | ROBBERY,UNCLASSIFIED,OPEN AREAS | \n",
" 37 | \n",
"
\n",
" \n",
" | THEFT OF SERVICES, UNCLASSIFIED | \n",
" 32 | \n",
"
\n",
" \n",
" | LARCENY,GRAND FROM OPEN AREAS,UNCLASSIFIED | \n",
" 29 | \n",
"
\n",
" \n",
" | CONTROLLED SUBSTANCE, POSSESSION 7 | \n",
" 25 | \n",
"
\n",
" \n",
" | PUBLIC ADMINISTRATION,UNCLASSIFIED FELONY | \n",
" 23 | \n",
"
\n",
" \n",
" | OBSTR BREATH/CIRCUL | \n",
" 20 | \n",
"
\n",
" \n",
" | MENACING,UNCLASSIFIED | \n",
" 19 | \n",
"
\n",
" \n",
" | RESISTING ARREST | \n",
" 16 | \n",
"
\n",
" \n",
" | WEAPONS, POSSESSION, ETC | \n",
" 16 | \n",
"
\n",
" \n",
" | FORGERY,ETC.,UNCLASSIFIED-FELONY | \n",
" 16 | \n",
"
\n",
" \n",
" | CONTROLLED SUBSTANCE,INTENT TO SELL 3 | \n",
" 16 | \n",
"
\n",
" \n",
" | WEAPONS POSSESSION 3 | \n",
" 15 | \n",
"
\n",
" \n",
" | TRAFFIC,UNCLASSIFIED MISDEMEANOR | \n",
" 15 | \n",
"
\n",
" \n",
" | WEAPONS POSSESSION 1 & 2 | \n",
" 14 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" pd_desc\n",
"ASSAULT 3 142\n",
"LARCENY,PETIT FROM OPEN AREAS,UNCLASSIFIED 89\n",
"TRAFFIC,UNCLASSIFIED MISDEMEAN 74\n",
"MARIJUANA, POSSESSION 4 & 5 67\n",
"INTOXICATED DRIVING,ALCOHOL 53\n",
"ASSAULT 2,1,UNCLASSIFIED 47\n",
"ROBBERY,UNCLASSIFIED,OPEN AREAS 37\n",
"THEFT OF SERVICES, UNCLASSIFIED 32\n",
"LARCENY,GRAND FROM OPEN AREAS,UNCLASSIFIED 29\n",
"CONTROLLED SUBSTANCE, POSSESSION 7 25\n",
"PUBLIC ADMINISTRATION,UNCLASSIFIED FELONY 23\n",
"OBSTR BREATH/CIRCUL 20\n",
"MENACING,UNCLASSIFIED 19\n",
"RESISTING ARREST 16\n",
"WEAPONS, POSSESSION, ETC 16\n",
"FORGERY,ETC.,UNCLASSIFIED-FELONY 16\n",
"CONTROLLED SUBSTANCE,INTENT TO SELL 3 16\n",
"WEAPONS POSSESSION 3 15\n",
"TRAFFIC,UNCLASSIFIED MISDEMEANOR 15\n",
"WEAPONS POSSESSION 1 & 2 14"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nyc_arrests['pd_desc'].value_counts().nlargest(20).to_frame()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Problem: Root Mean Squared Error\n",
"\n",
"Suppose we build a model to predict the price of an apartment using the square footage and bedrooms as follows:\n",
"\n",
"$$c(x,y) = 10x + 1.2y + 200$$\n",
"\n",
"where $x$ represents square footage and $y$ the number of bedrooms. The **Root Mean Squared Error** is defined by:\n",
"\n",
"$$RMSE = \\sqrt{\\frac{\\sum_{i = 1}^n (\\hat{y} - y_i)^2 }{n}}$$\n",
"\n",
"essentially the square root of summed squared errors between real and predicted cost."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the formula to find the **RMSE** of our models predictions below."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"| | square footage | bedrooms | real_cost | predicted_cost |\n",
"|---|---|---|---|---|\n",
"| 0 | 400 | 1 | 4300.76 | 4201.2 |\n",
"| 1 | 500 | 1 | 5301.01 | 5201.2 |\n",
"| 2 | 600 | 3 | 6302.63 | 6203.6 |\n",
"| 3 | 700 | 1 | 7300.91 | 7201.2 |\n",
"| 4 | 800 | 1 | 8301.07 | 8201.2 |\n",
"| 5 | 900 | 3 | 9303.85 | 9203.6 |\n",
"| 6 | 1000 | 2 | 10302.35 | 10202.4 |"
]
},
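{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough check, assuming the seven rows of the table above are the full data set ($n = 7$): each error (real cost minus predicted cost) is close to $100$, and the squared errors sum to about $69637.34$, so\n",
"\n",
"$$RMSE = \\sqrt{\\frac{69637.34}{7}} \\approx 99.74$$"
]
},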
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"9912.19360000008"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(4300.76-4201.2)**2 + (5301.01 - 5201.2)**2"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"arrests['predictions'] = arrests['pd_desc'] + np.random.randint(1, 20, 20)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"157.0"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.mean((arrests['pd_desc'] - arrests['predictions'])**2)"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"12.529964086141668"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.sqrt(157)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Problem: Area under a curve. \n",
"\n",
"Evaluate the definite integrals below and represent the solution visually as the area under the curve $f(x)$.\n",
"\n",
"a. $\\int_{1}^3 x^3 - \\frac{1}{x} dx$\n",
"\n",
"b. $\\int_{\\pi}^{4\\pi} 2 \\cos(x) dx$\n",
"\n",
"c. $\\int_4^9 1.04(5)^x dx$"
]
},
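{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to sanity-check numerical results for these integrals is with antiderivatives. This is a sketch, assuming each integrand is the full expression between the integral sign and $dx$:\n",
"\n",
"$$\\int_{1}^3 \\left(x^3 - \\frac{1}{x}\\right) dx = \\left[\\frac{x^4}{4} - \\ln x\\right]_1^3 = 20 - \\ln 3 \\approx 18.9014$$\n",
"\n",
"$$\\int_{\\pi}^{4\\pi} 2\\cos(x)\\, dx = \\Big[2\\sin x\\Big]_{\\pi}^{4\\pi} = 0$$\n",
"\n",
"$$\\int_4^9 1.04(5)^x\\, dx = \\frac{1.04}{\\ln 5}\\left(5^9 - 5^4\\right) \\approx 1.2617 \\times 10^6$$"
]
},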
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"from scipy.integrate import quad"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [],
"source": [
"def f(x): return x**3 - 1/x\n",
"def g(x): return 2*np.cos(x)\n",
"def h(x): return 1.04*5**x"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(18.90138771133189, 2.0984755834487654e-13)"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"quad(f, 1, 3)"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(-1.5809616649023117e-15, 1.306472773737261e-13)"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"quad(g, np.pi, 4*np.pi)"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1261682.718116748, 1.4007492034248646e-08)"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"quad(h, 4, 9)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Problem: Area between curves\n",
"\n",
"Given the functions:\n",
"\n",
"$$f(x) = 1 + x + e^{x^2 - 2x} \\quad g(x) = x^4 - 6.5x^2 + 6x + 2$$\n",
"\n",
"define the regions R and S shown below.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"
\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Prove that the lines intersect at $x = 1$.\n",
"\n",
"2. Set up definite integrals to represent the areas $R$ and $S$\n",
"\n",
"3. Evaluate the integrals using technology."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Problem: Volumes and Revolution\n",
"\n",
"1. Find the volume of the solid generated by rotating the region bounded by $y = x$, $x = 0$, and $y = (x-1)^2 + 1$. Sketch an image of this region or try to use Python to visualize.\n",
"\n",
"2. Find the volume of the solid formed by rotating the region R from previous problem about the $x$-axis. Sketch an image of this region.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Problem: Gini Index\n",
"\n",
"The World Bank provides access to data about world GINI Indicies [here](https://data.worldbank.org/indicator/SI.POV.GINI?end=2017&start=1985). Take a look around at a country of your choice. What does the GINI Index say about this country? \n",
"\n",
"The United States Census gathers and provides data related to Income and Poverty in the United States. Visit their site [here](https://www.census.gov/library/publications/2019/demo/p60-266.html), and explore the data available. Download one data table and discuss the information your found and what it says about income and poverty in the United States."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# def c(x, y): return 10*x + 1.2*y + 200\n",
"\n",
"# x = np.arange(400, 1100, 100)\n",
"\n",
"# y = np.random.randint(1, 4, 7)\n",
"\n",
"# zhat = np.round(c(x, y), 2)\n",
"\n",
"# z = zhat + np.round(np.random.normal(100, size = 7), 2)\n",
"\n",
"# df = pd.DataFrame({'square footage': x, 'bedrooms': y, 'real_cost': z, 'predicted_cost': zhat})\n",
"# df.to_html()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# def f(x): return 1 + x + np.e**(x**2 - 2*x)\n",
"# def g(x): return x**4 - 6.5*x**2 + 6*x + 2\n",
"# x = np.linspace(0, 2, 1000)\n",
"# plt.plot(x, f(x), label = '$f(x)$')\n",
"# plt.plot(x, g(x), label = '$g(x)$')\n",
"# plt.legend()\n",
"# plt.fill_between(x, f(x), g(x), color = 'grey', alpha = 0.1)\n",
"# plt.text(0.4, 2.5, 'R', fontsize = 30)\n",
"# plt.text(1.4, 2.2, 'S', fontsize = 30)\n",
"# plt.savefig('images/a4p4.png')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 4
}