Review
```python
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sympy as sy
```
Problem: Summations
Use the standard summation formulas to evaluate the following summations.
- \(\sum_{i = 1}^{10} (i^2 - i)\)
- \(\sum_{i = 1}^{20} (4i^3 + i^2 - 6)\)
- \(\sum_{i = 4}^{10} 2i^2\)
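A worked evaluation, assuming the intended formulas are the standard power-sum identities on the first line below. Linearity splits each summand into power sums, and a sum that starts at \(i = 4\) equals the sum from \(i = 1\) minus its first three terms.

```latex
\sum_{i=1}^{n} i = \frac{n(n+1)}{2}, \qquad
\sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6}, \qquad
\sum_{i=1}^{n} i^3 = \left(\frac{n(n+1)}{2}\right)^2

\sum_{i=1}^{10} (i^2 - i) = \frac{10 \cdot 11 \cdot 21}{6} - \frac{10 \cdot 11}{2} = 385 - 55 = 330

\sum_{i=1}^{20} (4i^3 + i^2 - 6) = 4\left(\frac{20 \cdot 21}{2}\right)^2 + \frac{20 \cdot 21 \cdot 41}{6} - 6 \cdot 20 = 176400 + 2870 - 120 = 179150

\sum_{i=4}^{10} 2i^2 = 2\left(\sum_{i=1}^{10} i^2 - \sum_{i=1}^{3} i^2\right) = 2(385 - 14) = 742
```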
```python
i = sy.Symbol('i')
```
```python
sy.summation(i**2 - i, (i, 1, 10))  # 330
```
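```python
# Closed form for a general upper limit n, then substitute n = 20.
n = sy.Symbol('n')
general = sy.summation(4*i**3 + i**2 - 6, (i, 1, n))
general.subs(n, 20)  # 179150
```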
```python
sy.summation(2*i**2, (i, 4, 10))  # 742
```
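As a cross-check that needs no SymPy, the three power-sum identities can be coded directly and compared with brute-force loops. This is a minimal sketch; the helper names `sum_i`, `sum_i2`, and `sum_i3` are hypothetical, chosen only for this illustration.

```python
def sum_i(n: int) -> int:
    """1 + 2 + ... + n."""
    return n * (n + 1) // 2

def sum_i2(n: int) -> int:
    """1^2 + 2^2 + ... + n^2 (the product is always divisible by 6)."""
    return n * (n + 1) * (2 * n + 1) // 6

def sum_i3(n: int) -> int:
    """1^3 + 2^3 + ... + n^3, which equals (1 + 2 + ... + n)^2."""
    return sum_i(n) ** 2

# Closed-form evaluations of the three exercises.
a = sum_i2(10) - sum_i(10)
b = 4 * sum_i3(20) + sum_i2(20) - 6 * 20  # the constant -6 is added 20 times
c = 2 * (sum_i2(10) - sum_i2(3))          # drop the i = 1..3 terms

# Brute-force loops as a cross-check.
assert a == sum(i**2 - i for i in range(1, 11))
assert b == sum(4 * i**3 + i**2 - 6 for i in range(1, 21))
assert c == sum(2 * i**2 for i in range(4, 11))

print(a, b, c)  # 330 179150 742
```

Integer division is safe here because \(n(n+1)/2\) and \(n(n+1)(2n+1)/6\) are always whole numbers.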
Problem 2: Mean, Variance, Standard Deviation
Below is a sample of historical NYPD arrest data. Count the 20 most frequent offense descriptions (the `pd_desc` column), then use those counts to compute the mean number of incidents, the variance of the incident counts, and the standard deviation of the incident counts.
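For reference, write \(x_1, \dots, x_n\) for the \(n = 20\) incident counts. The quantities are defined below. `np.var` and `np.std` default to `ddof=0`, i.e. the population form \(\sigma^2\); the sample variance \(s^2\) divides by \(n - 1\) instead, which is the default for pandas' `Series.var` and `Series.std`.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
\sigma = \sqrt{\sigma^2}, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
```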
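```python
# Socrata endpoint for NYPD arrest data. Without query parameters it returns only
# a limited page of rows, so this is a sample rather than the full dataset.
nyc_arrests = pd.read_json('https://data.cityofnewyork.us/resource/8h9b-rp9u.json')
```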
```python
nyc_arrests.head()
```
|  | arrest_key | arrest_date | pd_cd | pd_desc | ky_cd | ofns_desc | law_code | law_cat_cd | arrest_boro | arrest_precinct | jurisdiction_code | age_group | perp_sex | perp_race | x_coord_cd | y_coord_cd | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 173130602 | 2017-12-31T00:00:00.000 | 566 | MARIJUANA, POSSESSION | 678.0 | MISCELLANEOUS PENAL LAW | PL 2210500 | V | Q | 105 | 0 | 25-44 | M | BLACK | 1063056 | 207463 | 40.735772 | -73.715638 |
| 1 | 173114463 | 2017-12-31T00:00:00.000 | 478 | THEFT OF SERVICES, UNCLASSIFIED | 343.0 | OTHER OFFENSES RELATED TO THEFT | PL 1651503 | M | Q | 114 | 0 | 25-44 | M | ASIAN / PACIFIC ISLANDER | 1009113 | 219613 | 40.769437 | -73.910241 |
| 2 | 173113513 | 2017-12-31T00:00:00.000 | 849 | NY STATE LAWS,UNCLASSIFIED VIOLATION | 677.0 | OTHER STATE LAWS | LOC000000V | V | K | 73 | 1 | 18-24 | M | BLACK | 1010719 | 186857 | 40.679525 | -73.904572 |
| 3 | 173113423 | 2017-12-31T00:00:00.000 | 101 | ASSAULT 3 | 344.0 | ASSAULT 3 & RELATED OFFENSES | PL 1200001 | M | M | 18 | 0 | 25-44 | M | WHITE | 987831 | 217446 | 40.763523 | -73.987074 |
| 4 | 173113421 | 2017-12-31T00:00:00.000 | 101 | ASSAULT 3 | 344.0 | ASSAULT 3 & RELATED OFFENSES | PL 1200001 | M | M | 18 | 0 | 45-64 | M | BLACK | 987073 | 216078 | 40.759768 | -73.989811 |
```python
# The 20 most frequent offense descriptions and their counts, as a one-column DataFrame.
arrests = nyc_arrests['pd_desc'].value_counts().nlargest(20).to_frame()
```
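The line above tallies how often each `pd_desc` value occurs and keeps the 20 most frequent. To make that counting step explicit, here is a plain-Python analog built on `collections.Counter` (a different tool from pandas, shown only for illustration); the function name `top_n_counts` and the sample rows are hypothetical.

```python
from collections import Counter

def top_n_counts(values, n=20):
    """Return the n most frequent values as (value, count) pairs, most frequent first."""
    return Counter(values).most_common(n)

# Hypothetical pd_desc values standing in for the real column.
rows = ["ASSAULT 3", "ASSAULT 3", "RESISTING ARREST",
        "ASSAULT 3", "MENACING,UNCLASSIFIED", "RESISTING ARREST"]
print(top_n_counts(rows, n=2))
# [('ASSAULT 3', 3), ('RESISTING ARREST', 2)]
```

When counts tie, the order of the tied entries is not guaranteed to match what pandas produces.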
```python
np.mean(arrests)
# pd_desc    38.25
# dtype: float64
```
```python
# np.var uses ddof=0 by default: the population variance, dividing by n.
np.var(arrests)
# pd_desc    1031.2875
# dtype: float64
```
```python
np.std(arrests)
# pd_desc    32.113665
# dtype: float64
```
```python
nyc_arrests['pd_desc'].value_counts().nlargest(20).to_frame()
```
|  | pd_desc |
|---|---|
| ASSAULT 3 | 142 |
| LARCENY,PETIT FROM OPEN AREAS,UNCLASSIFIED | 89 |
| TRAFFIC,UNCLASSIFIED MISDEMEAN | 74 |
| MARIJUANA, POSSESSION 4 & 5 | 67 |
| INTOXICATED DRIVING,ALCOHOL | 53 |
| ASSAULT 2,1,UNCLASSIFIED | 47 |
| ROBBERY,UNCLASSIFIED,OPEN AREAS | 37 |
| THEFT OF SERVICES, UNCLASSIFIED | 32 |
| LARCENY,GRAND FROM OPEN AREAS,UNCLASSIFIED | 29 |
| CONTROLLED SUBSTANCE, POSSESSION 7 | 25 |
| PUBLIC ADMINISTRATION,UNCLASSIFIED FELONY | 23 |
| OBSTR BREATH/CIRCUL | 20 |
| MENACING,UNCLASSIFIED | 19 |
| RESISTING ARREST | 16 |
| WEAPONS, POSSESSION, ETC | 16 |
| FORGERY,ETC.,UNCLASSIFIED-FELONY | 16 |
| CONTROLLED SUBSTANCE,INTENT TO SELL 3 | 16 |
| WEAPONS POSSESSION 3 | 15 |
| TRAFFIC,UNCLASSIFIED MISDEMEANOR | 15 |
| WEAPONS POSSESSION 1 & 2 | 14 |
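Recomputing the statistics by hand from the 20 counts in the table above makes the formulas concrete and shows which variance convention `np.var` follows. A minimal sketch using only the standard library:

```python
import math
import statistics

# The 20 counts from the table above, most frequent first.
counts = [142, 89, 74, 67, 53, 47, 37, 32, 29, 25,
          23, 20, 19, 16, 16, 16, 16, 15, 15, 14]

n = len(counts)
mean = sum(counts) / n
ss = sum((x - mean) ** 2 for x in counts)  # sum of squared deviations
pop_var = ss / n                           # population variance (np.var default, ddof=0)
pop_std = math.sqrt(pop_var)
samp_var = ss / (n - 1)                    # sample variance (pandas default, ddof=1)

# Cross-check against the standard library's implementations.
assert math.isclose(pop_var, statistics.pvariance(counts))
assert math.isclose(samp_var, statistics.variance(counts))

print(mean, pop_var, round(pop_std, 6))  # 38.25 1031.2875 32.113665
```

The population figures match the `np.mean`, `np.var`, and `np.std` outputs above; dividing by \(n - 1\) instead raises the variance to about 1085.57.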
Problem: Root Mean Squared Error
Suppose we build a model to predict the price of an apartment from its square footage and number of bedrooms as follows:
where \(x\) represents square footage and \(y\) the number of bedrooms. The root mean squared error (RMSE) is defined by:
In words, RMSE is the square root of the mean of the squared errors between the actual and predicted prices.
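Assuming the standard definition, the RMSE over \(n\) predictions is \(\sqrt{\frac{1}{n}\sum_{i=1}^{n}(P_i - \hat{P}_i)^2}\), where \(P_i\) is an actual price and \(\hat{P}_i\) the model's prediction (the letter \(P\) is chosen here only to avoid clashing with \(y\), which already denotes bedrooms). A minimal sketch; the price lists are hypothetical placeholders, not the data for this problem.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    # Average the squared errors first, then take the square root.
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical prices, for illustration only.
actual = [200, 250, 300, 350]
predicted = [205, 245, 305, 345]
print(rmse(actual, predicted))  # 5.0
```

Dividing by \(n\) before taking the square root is what distinguishes RMSE from the root of the summed squared errors: for these values the latter would be \(\sqrt{100} = 10\).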
Use the formula to find the RMSE of our model's predictions below.