Title: | Data and Code for Financial Accounting Research |
---|---|
Description: | Handy functions and data to support a course book for accounting research. Gow, Ian D. and Tongqing Ding (2024) 'Empirical Research in Accounting: Tools and Methods' <https://iangow.github.io/far_book/>. |
Authors: | Ian Gow [aut, cre] |
Maintainer: | Ian Gow <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.3.0.9000 |
Built: | 2025-03-12 06:15:57 UTC |
Source: | https://github.com/iangow/farr |
A data set containing dates and descriptions for AAERs
aaer_dates
A tibble with 2,920 rows and 4 variables:
AAER number
Date
Description
Year of AAER
A data set containing AAER firm-years used in Bao et al. (2020).
aaer_firm_year
A tibble with 415 rows and 4 variables:
AAER identifier
GVKEY (firm identifier)
First affected year
Last affected year
A data set containing the dates of Apple media events since 2005.
apple_events
A tibble with 47 rows and 3 variables:
Description of event
First date of event
Last date of event
https://en.wikipedia.org/wiki/List_of_Apple_Inc._media_events
A function returning AUC.
auc(scores, response)
scores | Probability that response is true or 1. |
response | Responses coded as logical or 0-or-1. |
vector including AUC
https://stackoverflow.com/questions/4903092/calculate-auc-in-r
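For intuition, AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following is a minimal base-R sketch of that rank-based calculation; `auc_sketch` is a hypothetical helper written for illustration and is not necessarily how `auc()` is implemented.

```r
# Hypothetical helper (not part of farr): rank-based (Mann-Whitney) AUC.
auc_sketch <- function(scores, response) {
  response <- as.logical(response)
  n_pos <- sum(response)
  n_neg <- sum(!response)
  r <- rank(scores)  # average ranks are used for ties
  (sum(r[response]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy data: three positives and three negatives.
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4)
response <- c(1, 1, 0, 1, 0, 0)
auc_value <- auc_sketch(scores, response)
auc_value
```

In this toy data, eight of the nine positive-negative pairs are ordered correctly, so the sketch returns 8/9.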
A data set containing fundamental financial information for Australian banks.
aus_bank_funds
A tibble with 283 rows and 7 variables:
GVKEY (firm identifier)
Fiscal year-end
Total assets
Income before extraordinary items
Extraordinary items
Income from discontinued operations
A data set containing monthly stock returns and market capitalization for Australian banks.
aus_bank_rets
A tibble with 3,047 rows and 4 variables:
GVKEY (firm identifier)
Last trading date of month
Stock return for month
Market capitalization on datadate
A data set containing identifying information for 10 Australian banks.
aus_banks
A tibble with 10 rows and 3 variables:
GVKEY (firm identifier)
Stock exchange ticker
Bank name
Firm-years in RDD analysis of Bloomfield (2021).
bloomfield_2021
A tibble with 1,855 rows and 2 variables:
Fiscal year
CRSP firm identifier (PERMCO)
A data set containing data on tagged questions on Stack Overflow.
by_tag_year
A tibble with 40,518 rows and 4 variables:
Year
Tag
Number of questions with tag during year
Total number of questions during year
A simulated data set related to camp attendance.
camp_attendance
A tibble with 1,000 rows and 2 variables:
Student identifier
Indicator for student attendance at camp
Data on whistleblowers and enforcement actions from Call et al. (2018)
cmsw_2018
A tibble with 1,133 rows and 31 variables:
CMSW record identifier
The total firm civil and criminal monetary penalties assessed against the firm, its parent and subsidiaries consisting of disgorgement, prejudgment interest, civil fines, criminal restitution, and criminal fines in millions of dollars
The total firm civil and criminal monetary penalties assessed against the agent firms and/or respondents (e.g., the audit firm, bankers, suppliers) in connection with the financial misrepresentation of the target firm, in millions of dollars
The total civil and criminal penalties assessed against all employees consisting of disgorgement, prejudgment interest, civil fines, criminal restitution, and criminal fines in millions of dollars
Total incarceration consisting of jail, prison, home detention, and halfway house in months imposed upon employee respondents named in the enforcement action
An indicator variable equal to one if the violation includes self-dealing such as embezzlement and theft by respondents and equal to zero otherwise
The percentage of blockholder ownership, defined as owners with at least five percent of common shares outstanding from the last 10-K or DEF 14A prior to the first public announcement the firm may be (is) subject to a regulatory enforcement action
The value-weighted market-adjusted return measured at the close of trading on the initial public announcement date that the firm may be (is) subject to a regulatory enforcement action
An indicator variable equal to one if a whistleblower is associated with the enforcement action and equal to zero otherwise
Post-SOX action flag
The natural logarithm of the total time the violation occurred in months as indicated in the regulatory enforcement proceedings
An indicator variable equal to one if the enforcement action includes charges under the Foreign Corrupt Practices Act for bribery of a foreign official and zero otherwise
An indicator variable equal to one if the violation or any of the respondents were associated with a known organized crime family and zero otherwise
An indicator variable equal to one if the violation includes an offense for either option backdating, insider trading, or an offense related to an offering, IPO, merger, or reverse merger and equal to zero otherwise
The natural logarithm of the total number of C-level respondents (e.g. CEO, COO, CFO, CAO, CMO, and CIO) named in the enforcement action
The natural logarithm of the total number of unique code sections and rules violated (charges) associated with the enforcement action
An indicator variable equal to one if fraud under 15 USC §§ 77q, 78j(b), or rules promulgated thereunder is included among the charges in the enforcement action
An indicator variable equal to one if the violation included violations of 17 CFR 240.13b2-2, which prohibits materially false or misleading statements to an accountant in connection with the preparation of financial statements, and zero otherwise
An indicator variable equal to one if the misreporting firm used a Big N auditor, and equal to zero otherwise
An indicator variable equal to one if the firm terminated an executive respondent as a result of the violations and equal to zero otherwise
An indicator variable equal to one if the firm received credit in the assessment of penalties for cooperation as stated in regulatory enforcement documents during the course of the investigation and equal to zero otherwise
An indicator variable equal to one if regulators acknowledged they were deliberately misled and/or charges were included for lying to investigators and equal to zero otherwise
The percentage of the firm's directors that are independent from the last 10-K or DEF 14A prior to the first public announcement the firm may be (is) subject to a regulatory enforcement action
An indicator variable equal to one if the firm was previously the subject of a securities regulatory enforcement action and equal to zero otherwise
The natural logarithm of the market value of equity measured in millions of dollars prior to the first public announcement that the firm may be (is) subject to a regulatory enforcement action
The sum of market value of equity plus total assets minus total debt divided by total assets with market value determined below and total assets and total debt measured at the last fiscal year end prior to the first public announcement the firm may be (is) subject to a regulatory enforcement action
Total debt divided by total assets measured at the last fiscal year end prior to the first public announcement the firm may be (is) subject to a regulatory enforcement action
The natural logarithm of the distance in miles from the location of the firm's headquarters to the offices of the regulator assigned to the geographic area of the firm's headquarter location (closer of the SEC Regional Office or DOJ U.S. District Attorney).
Fama-French industry code (12-industry)
Whistleblower data source
Whistleblower type: tipster or nontipster
A data set containing data about accruals for 2,000 firms.
comp
A tibble with 16,237 rows and 14 variables:
GVKEY (firm identifier)
Fiscal year-end
Fiscal year
Indicator for Big Four auditor
Total accruals (scaled by assets)
Return on assets
Cash flow from operating activities (scaled by assets)
Size
Leverage
Market-to-book ratio
1/Total assets
Change in revenue
Change in accounts receivable
Property, plant & equipment (scaled by assets)
A function returning sensitivity and precision.
confusion_stats(scores, response, predicted = NULL, k = NULL)
scores | Probability that response is true or 1. |
response | Responses coded as logical or 0-or-1. |
predicted | Predicted value coded as 0-or-1. |
k | Percentage to classify as TRUE or 1. |
vector including sensitivity and precision
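Sensitivity is the share of actual positives that are predicted positive, and precision is the share of predicted positives that are actual positives. The base-R sketch below computes both from 0-or-1 predictions; `confusion_sketch` is a hypothetical helper, and flagging scores above the median is an assumption made only for this illustration.

```r
# Hypothetical helper (not part of farr): sensitivity and precision
# from 0/1 (or logical) predictions and responses.
confusion_sketch <- function(predicted, response) {
  predicted <- as.logical(predicted)
  response <- as.logical(response)
  tp <- sum(predicted & response)    # true positives
  fp <- sum(predicted & !response)   # false positives
  fn <- sum(!predicted & response)   # false negatives
  c(sensitivity = tp / (tp + fn), precision = tp / (tp + fp))
}

scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4)
response <- c(1, 1, 0, 1, 0, 0)
# Assumption: treat scores above the median as predicted positives.
predicted <- scores > median(scores)
stats <- confusion_sketch(predicted, response)
stats
```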
A data set containing the GVKEYs and datadates for firm-years used in Fang, Huang and Karpoff (2016).
fhk_firm_years
A tibble with 60,272 rows × 2 variables.
GVKEY (firm identifier)
Fiscal year-end
A data set containing the tickers, GVKEYs, and treatment indicator for the SHO pilot program.
fhk_pilot
A tibble with 3,030 rows × 4 variables.
Ticker
GVKEY (firm identifier)
PERMNO (CRSP security identifier)
SHO pilot program treatment indicator
Calculate deciles for a variable.
form_deciles(x)
x | A vector for which deciles are to be calculated. |
vector
library(farr)
library(dplyr, warn.conflicts = FALSE)

df <-
  tibble(x = rnorm(100)) %>%
  mutate(dec_x = form_deciles(x))
df
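As a rough illustration of what decile assignment involves, the base-R sketch below bins values using quantile breakpoints. `deciles_sketch` is a hypothetical helper, and its handling of ties and boundary values may differ from that of `form_deciles()`.

```r
# Hypothetical helper (not part of farr): decile numbers 1 to 10 based on
# quantile breakpoints of x.
deciles_sketch <- function(x) {
  breaks <- quantile(x, probs = seq(0, 1, 0.1), na.rm = TRUE)
  cut(x, breaks = breaks, labels = FALSE, include.lowest = TRUE)
}

dec <- deciles_sketch(1:100)
table(dec)
```

With the integers 1 to 100 as input, each of the ten groups contains ten values.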
Produce a table mapping announcements to trading dates.
See vignette("wrds-conn", package = "farr")
for more on using this function.
get_annc_dates(conn)
conn | connection to a PostgreSQL database |
tbl_df
## Not run:
library(DBI)
library(dplyr, warn.conflicts = FALSE)
library(RPostgres)
pg <- dbConnect(Postgres())
get_annc_dates(pg)
## End(Not run)
Produce a table of event returns from CRSP.
get_event_cum_rets( data, conn, permno = "permno", event_date = "event_date", win_start = 0, win_end = 0, end_event_date = NULL, suffix = "" )
data | data frame containing data on events |
conn | connection to a PostgreSQL database |
permno | string representing column containing PERMNOs for events |
event_date | string representing column containing dates for events |
win_start | integer representing start of trading window (e.g., -1) |
win_end | integer representing end of trading window (e.g., 1) |
end_event_date | string representing column containing ending dates for events |
suffix | Text to be appended after "ret" in variable names |
tbl_df
## Not run:
library(DBI)
library(dplyr, warn.conflicts = FALSE)
library(RPostgres)
pg <- dbConnect(Postgres())
events <- tibble(permno = c(14593L, 10107L),
                 event_date = as.Date(c("2019-01-31", "2019-01-31")))
get_event_cum_rets(events, pg)
## End(Not run)
Produce a table of event returns from CRSP
See vignette("wrds-conn", package = "farr")
for more on using this function.
get_event_cum_rets_mth( data, conn, permno = "permno", event_date = "event_date", win_start = 0, win_end = 0, end_event_date = NULL, suffix = "" )
data | data frame containing data on events |
conn | connection to a PostgreSQL database |
permno | string representing column containing PERMNOs for events |
event_date | string representing column containing dates for events |
win_start | integer representing start of trading window (e.g., -1) in months |
win_end | integer representing end of trading window (e.g., 1) in months |
end_event_date | string representing column containing ending dates for events |
suffix | Text to be appended after "ret" in variable names. |
tbl_df
## Not run:
library(DBI)
library(dplyr, warn.conflicts = FALSE)
library(RPostgres)
pg <- dbConnect(Postgres())
events <- tibble(permno = c(14593L, 10107L),
                 event_date = as.Date(c("2019-01-31", "2019-01-31")))
get_event_cum_rets_mth(events, pg)
## End(Not run)
Produce a table of event dates for linking with CRSP.
See vignette("wrds-conn", package = "farr")
for more on using this function.
get_event_dates( data, conn, permno = "permno", event_date = "event_date", win_start = 0, win_end = 0, end_event_date = NULL )
data | data frame containing data on events |
conn | connection to a PostgreSQL database |
permno | string representing column containing PERMNOs for events |
event_date | string representing column containing dates for events |
win_start | integer representing start of trading window (e.g., -1) |
win_end | integer representing end of trading window (e.g., 1) |
end_event_date | string representing column containing ending dates for events |
tbl_df
## Not run:
library(DBI)
library(dplyr, warn.conflicts = FALSE)
pg <- dbConnect(RPostgres::Postgres())
events <- tibble(permno = c(14593L, 10107L),
                 event_date = as.Date(c("2019-01-31", "2019-01-31")))
get_event_dates(events, pg, win_start = -3, win_end = 3)
## End(Not run)
Produce a table of event returns from CRSP.
See vignette("wrds-conn", package = "farr")
for more on using this function.
get_event_rets( data, conn, permno = "permno", event_date = "event_date", win_start = 0, win_end = 0, end_event_date = NULL )
data | data frame containing data on events |
conn | connection to a PostgreSQL database |
permno | string representing column containing PERMNOs for events |
event_date | string representing column containing dates for events |
win_start | integer representing start of trading window (e.g., -1) |
win_end | integer representing end of trading window (e.g., 1) |
end_event_date | string representing column containing ending dates for events |
tbl_df
## Not run:
library(DBI)
library(dplyr, warn.conflicts = FALSE)
pg <- dbConnect(RPostgres::Postgres())
events <- tibble(permno = c(14593L, 10107L),
                 event_date = as.Date(c("2019-01-31", "2019-01-31")))
get_event_rets(events, pg, win_start = -3, win_end = 3) %>%
  select(permno, event_date, date, ret)
## End(Not run)
Fetch Fama-French industry grouping from Ken French's website.
get_ff_ind(ind)
ind | Fama-French industry grouping (e.g., 12, 48) |
tbl_df
## Not run:
get_ff_ind(5)
## End(Not run)
Function to generate simulated panel data as described in Gow, Ormazabal and Taylor (2010).
get_got_data(N = 400, T = 20, Xvol, Evol, rho_X, rho_E)
N | Number of firms |
T | Number of years |
Xvol | Cross-sectional correlation of X |
Evol | Cross-sectional correlation of errors |
rho_X | Autocorrelation coefficient for firm-effect portion of X |
rho_E | Autocorrelation coefficient for firm-effect portion of epsilon |
tibble
https://www.jstor.org/stable/20744139
set.seed(2021)
test <- get_got_data(N = 500, T = 10, Xvol = 0.75, Evol = 0.75,
                     rho_X = 0.5, rho_E = 0.5)
Periods defined by precedent-setting legal cases adopting or rejecting the Inevitable Disclosure Doctrine (IDD) by state.
get_idd_periods(min_date, max_date)
min_date | First date of sample period |
max_date | Last date of sample period |
Three kinds of period by state:
Pre-adoption
Post-adoption
Post-rejection
tibble with four columns: state, period_type, start_date, end_date
idd_periods <- get_idd_periods(min_date = "1994-01-01",
                               max_date = "2010-12-31")
idd_periods
Create a table with cut-offs for size portfolios.
get_me_breakpoints()
tbl_df
library(dplyr, warn.conflicts = FALSE)
get_me_breakpoints() %>%
  filter(month == '2022-04-01')
Create a table of monthly returns for size portfolios
get_size_rets_monthly()
tbl_df
A function returning simulated data on test_scores.
get_test_scores( effect_size = 15, n_students = 1000L, n_grades = 4L, include_unobservables = FALSE, random_assignment = FALSE )
effect_size | Effect of attending camp on subsequent test scores |
n_students | Number of students in simulated data set |
n_grades | Number of grades in simulated data set |
include_unobservables | Include talent in returned data (TRUE or FALSE) |
random_assignment | Is assignment to treatment completely random? (TRUE or FALSE) |
tbl_df
set.seed(2021)
library(dplyr, warn.conflicts = FALSE)
get_test_scores() %>% head()
Produce a table mapping dates on CRSP to "trading days". Returned table has two columns: date, a trading date on CRSP; td, a sequence of integers ordered by date.
get_trading_dates(conn)
conn | connection to a PostgreSQL database |
tbl_df
## Not run:
library(DBI)
library(dplyr, warn.conflicts = FALSE)
pg <- dbConnect(RPostgres::Postgres())
get_trading_dates(pg) %>%
  filter(between(date, as.Date("2022-03-18"), as.Date("2022-03-31")))
## End(Not run)
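The value of a date-to-td mapping is that event windows can be expressed as integer offsets in trading days rather than calendar days. The sketch below illustrates the idea without a database connection; the toy calendar treats every weekday as a trading day (an assumption that ignores holidays), and only the column names date and td come from the description above.

```r
# Toy stand-in for the CRSP calendar: weekdays only, holidays ignored.
all_days <- seq(as.Date("2022-03-14"), as.Date("2022-03-31"), by = "day")
is_weekday <- !(format(all_days, "%u") %in% c("6", "7"))
trading_dates <- data.frame(date = all_days[is_weekday])
trading_dates$td <- seq_len(nrow(trading_dates))

# Window running from one trading day before to one trading day after
# an event on Monday, 2022-03-21.
event_date <- as.Date("2022-03-21")
event_td <- trading_dates$td[trading_dates$date == event_date]
in_window <- trading_dates$td >= event_td - 1 & trading_dates$td <= event_td + 1
window_dates <- trading_dates$date[in_window]
window_dates
```

Because td counts trading days, the window starts on the preceding Friday (2022-03-18) rather than on the Sunday.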
Link table from GVKEYs to CIKs
gvkey_ciks
A tibble with 78,339 rows and 5 variables:
GVKEY (Compustat firm identifier)
Issue ID
CIK (SEC firm identifier)
First link date
Last link date
Dates of precedent-setting legal cases adopting or rejecting the Inevitable Disclosure Doctrine (IDD) by state.
idd_dates
A tibble with 24 rows and 3 variables:
Two-letter state abbreviation
Date of precedent-setting legal case
Either "Adopt" or "Reject"
doi:10.1016/j.jfineco.2018.02.008
Data on public float of listed companies from Iliev (2010).
iliev_2010
A tibble with 7,214 rows and 9 variables:
Compustat firm identifier (GVKEY)
Fiscal year
Date of end of fiscal year
Date for public float value
Year for public float value
Public float in $ million
Indicator for filing of a management report
Indicator for accelerated filer
SEC firm identifier (CIK)
doi:10.1111/j.1540-6261.2010.01564.x
GVKEYs used in Li, Lin and Zhang (2018)
llz_2018
A tibble with 5,830 rows and 1 variable:
GVKEY
Function to read data from a parquet file data_dir/schema/table_name.parquet into a table in the DuckDB database at conn.
load_parquet(conn, table, schema = "", data_dir = Sys.getenv("DATA_DIR"))
conn | DuckDB connection |
table | Name of table to be loaded |
schema | Database schema for table |
data_dir | Directory for data repository |
Remote data frame in conn
Data on firms suffering natural disasters based on the sample in Michels (2017).
michels_2017
A tibble with 423 rows and 12 variables:
CUSIP supplied by Michels (2017)
Date of relevant natural disaster supplied by Michels (2017)
Matched CIK (SEC firm identifier)
Matched PERMNO (CRSP security identifier)
Matched GVKEY (Compustat firm identifier)
Date of next filing of type 10-Q, 10-K, 10QSB, 10-K405 after event
List of relevant form types filed on date_filed
Next fiscal period-end after event date
Fiscal quarter of next period-end after event date
Last fiscal period-end before event date
Fiscal quarter of last period-end before event date
Indicator for event being recognized (next_period_end before date_filed)
A function returning NDCG-at-k metric.
ndcg(scores, response, k = 0.01)
scores | Probability that response is true or 1. |
response | Responses coded as logical or 0-or-1. |
k | Percentage to classify as TRUE or 1. |
NDCG-at-k value
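To make the metric concrete, the base-R sketch below follows one common definition of NDCG-at-k: the discounted cumulative gain over the top fraction k of scores, divided by the gain of an ideal ranking. `ndcg_sketch` is a hypothetical helper, and its rounding and gain conventions are assumptions that may differ from those of `ndcg()`.

```r
# Hypothetical helper (not part of farr): NDCG over the top fraction k.
ndcg_sketch <- function(scores, response, k = 0.01) {
  response <- as.numeric(response)
  # Assumption: round the cut-off down, but keep at least one observation.
  n_top <- max(1, floor(k * length(scores)))
  top <- order(scores, decreasing = TRUE)[seq_len(n_top)]
  discount <- 1 / log2(seq_len(n_top) + 1)
  dcg <- sum((2^response[top] - 1) * discount)
  # Ideal ranking: all positives (up to n_top of them) ranked first.
  n_ideal <- min(sum(response), n_top)
  ideal <- sum(discount[seq_len(n_ideal)])
  if (ideal == 0) NA_real_ else dcg / ideal
}

scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4)
response <- c(1, 0, 1, 0, 0, 0)
ndcg_value <- ndcg_sketch(scores, response, k = 0.5)
ndcg_value
```

Here the top three scores contain positives in positions 1 and 3, whereas the ideal ranking would place them in positions 1 and 2, so the value falls slightly below 1.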
Function to get data from a table on the WRDS PostgreSQL server and save to local parquet file using DuckDB.
pg_to_parquet(table_name, schema, data_dir = Sys.getenv("DATA_DIR"))
table_name | Name of table on WRDS |
schema | Database schema for table |
data_dir | Directory for data repository |
Number of rows created
A function returning data for a ROC plot.
roc(scores, response)
scores | Probability that response is true or 1. |
response | Responses coded as logical or 0-or-1. |
tbl_df
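A ROC plot traces the true-positive rate against the false-positive rate as the classification threshold is lowered. The base-R sketch below builds such a table for toy data; `roc_sketch` and its column names are hypothetical, it assumes no tied scores, and the table returned by `roc()` may be structured differently.

```r
# Hypothetical helper (not part of farr): ROC points, assuming no tied scores.
roc_sketch <- function(scores, response) {
  response <- as.logical(response)
  ord <- order(scores, decreasing = TRUE)
  hits <- response[ord]
  data.frame(
    threshold = scores[ord],
    tpr = cumsum(hits) / sum(hits),    # true-positive rate
    fpr = cumsum(!hits) / sum(!hits)   # false-positive rate
  )
}

scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4)
response <- c(1, 1, 0, 1, 0, 0)
roc_pts <- roc_sketch(scores, response)
roc_pts
```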
Function to create temporary training dataset using distribution implied by w.
rus(y_train, ir = 1)
y_train | df on the target variable. |
ir | Imbalance ratio: the number of under-sampled majority-class instances per minority-class instance. |
Following MATLAB, the function samples observations of the minority class with replacement and observations of the majority class without replacement.
vector
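The sampling scheme can be sketched in base R as follows. `rus_sketch` is a hypothetical helper that returns row indices; treating 1 as the minority class and drawing as many minority observations as exist in the data are assumptions made for illustration, and `rus()` may differ in both respects.

```r
# Hypothetical helper (not part of farr): indices for random under-sampling.
rus_sketch <- function(y, ir = 1) {
  minority <- which(y == 1)  # assumption: 1 marks the minority class
  majority <- which(y == 0)
  n_min <- length(minority)
  n_maj <- min(length(majority), round(ir * n_min))
  c(
    # Minority class: sampled with replacement.
    minority[sample.int(n_min, n_min, replace = TRUE)],
    # Majority class: sampled without replacement.
    majority[sample.int(length(majority), n_maj)]
  )
}

set.seed(2021)
y <- c(rep(1, 5), rep(0, 95))
idx <- rus_sketch(y, ir = 2)
table(y[idx])
```

With ir = 2, the sampled data contain two majority-class observations for each minority-class observation.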
RUSBoost for two-class problems
rusboost(formula, df, size, ir = 1, learn_rate = 1, rus = TRUE, control)
formula | A formula specifying predictors and target variable. Target variable should be a factor with levels 0 and 1. Predictors can be either numerical or categorical. |
df | A data frame used for training the model, i.e., the training set. |
size | Ensemble size, i.e., number of weak learners in the ensemble model |
ir | Imbalance ratio: the number of under-sampled majority-class instances per minority-class instance. |
learn_rate | Default of 1. |
rus | TRUE for random undersampling; FALSE for AdaBoost with full sample |
control | Control object passed on to the rpart function. |
rusboost object
A data set containing the tickers and company names for the Russell 3000 at the time the SEC created the pilot sample. Data are created from the sample supplied by FHK.
sho_r3000
A tibble with 3,000 rows × 2 variables.
Ticker
Company name
A data set containing the tickers, PERMNOs, GVKEYs, and treatment assignments for Russell 3000 sample used by SEC.
sho_r3000_gvkeys
A tibble with 2,951 rows × 4 variables.
Ticker
PERMNO (CRSP security identifier)
GVKEY (Compustat firm identifier)
Indicator for stock being part of Reg SHO pilot program
https://iangow.github.io/far_book/natural-revisited.html#the-sho-pilot-sample
A data set containing the tickers, PERMNOs, and treatment assignments for Russell 3000 sample used by SEC.
sho_r3000_sample
A tibble with 2,954 rows × 3 variables.
Ticker
PERMNO (CRSP security identifier)
Indicator for stock being part of Reg SHO pilot program
https://iangow.github.io/far_book/natural-revisited.html#the-sho-pilot-sample
A data set containing the tickers and company names for pilot firms from Reg SHO pilot. Data are scraped from the SEC's own website.
sho_tickers
A tibble with 986 rows × 2 variables.
Ticker
Company name
https://www.sec.gov/rule-release/34-50104
Data on firm headquarters based on SEC EDGAR filings. Dates relate to SEC filing dates. Rather than providing dates for all filings, the data are aggregated into groups of filings by state and CIK, and dates are collapsed into windows over which all filings for a given CIK were associated with a given state. For example, CIK 0000037755 has filings with a CA headquarters from 1994-06-02 until 1996-03-25, then filings with an OH headquarters from 1996-05-30 until 1999-04-05, then filings with a CA headquarters from 1999-06-11 onwards. To ensure continuous coverage over the sample period, it is assumed that any change in state occurs the day after the last observed filing for the previous state.
state_hq
A tibble with 53,133 rows and 4 variables:
SEC's Central Index Key (CIK)
Two-letter abbreviation of state
Date of first filing with CIK-state combination in a contiguous series of filings
Date of last filing with CIK-state combination in a contiguous series of filings
https://sraf.nd.edu/data/augmented-10-x-header-data/
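The continuous-coverage assumption can be illustrated with the CIK 0000037755 example from the description. The base-R sketch below is one reading of that assumption; the column names are hypothetical, and NA is used here to mark the open-ended final window.

```r
# Observed filing windows for CIK 0000037755, taken from the description;
# column names are hypothetical and NA marks the open-ended final window.
obs <- data.frame(
  state = c("CA", "OH", "CA"),
  first_filing = as.Date(c("1994-06-02", "1996-05-30", "1999-06-11")),
  last_filing = as.Date(c("1996-03-25", "1999-04-05", NA)),
  stringsAsFactors = FALSE
)

# Each later window is assumed to start the day after the last observed
# filing for the previous state.
obs$start <- c(obs$first_filing[1], head(obs$last_filing, -1) + 1)

# State assumed for a date falling between observed filings.
query <- as.Date("1996-04-15")
state_on_query <- obs$state[max(which(obs$start <= query))]
state_on_query
```

Under this reading, a date in the gap between the last CA filing (1996-03-25) and the first OH filing (1996-05-30) is assigned to OH.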
system.time() that works with assignment.
Print CPU (and other) times that expr used, and return the value of expr.
system_time(expr)
expr | Valid R expression to be timed, evaluated and returned |
Result of evaluating expr
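The behaviour described here can be approximated in a few lines of base R. `system_time_sketch` is a hypothetical re-creation written for illustration and is not necessarily how `system_time()` is implemented.

```r
# Hypothetical re-creation (not the farr source): time an expression,
# print the timing, and return the expression's value.
system_time_sketch <- function(expr) {
  # expr is a lazily evaluated promise, so it is forced (and timed) here.
  timing <- system.time(value <- expr)
  print(timing)
  value
}

# The timing is printed, while the result is available for assignment.
x <- system_time_sketch(sum(sqrt(1:1e6)))
```

In contrast, `x <- system.time(sum(sqrt(1:1e6)))` would store the timing in `x` rather than the sum.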
A simulated data set of test scores.
test_scores
A tibble with 4,000 rows and 3 variables:
Student identifier
School grade at time of test
Test score
Truncate a vector at prob and 1 - prob. Extreme values are turned into NA values.
truncate(x, prob = 0.01, p_low = prob, p_high = 1 - prob)
x | A vector to be truncated |
prob | Level (two-sided) for truncation (e.g., 0.01 gives 1% and 99%) |
p_low | Optional lower level for truncation (e.g., 0.01 gives 1%) |
p_high | Optional upper level for truncation (e.g., 0.99 gives 99%) |
vector
trunced <- truncate(1:100, prob = 0.05)
min(trunced, na.rm = TRUE)
max(trunced, na.rm = TRUE)
Data to be combined with data in compsegd.seg_customer to create an indicator for non-disclosure of customer names.
undisclosed_names
A tibble with 460 rows and 2 variables:
Matches field in compsegd.seg_customer (WRDS)
Indicator that name is not disclosed
Winsorize a vector at prob and 1 - prob.
winsorize(x, prob = 0.01, p_low = prob, p_high = 1 - prob)
x | A vector to be winsorized |
prob | Level (two-sided) for winsorization (e.g., 0.01 gives 1% and 99%) |
p_low | Optional lower level for winsorization (e.g., 0.01 gives 1%) |
p_high | Optional upper level for winsorization (e.g., 0.99 gives 99%) |
vector
winsorized <- winsorize(1:100, prob = 0.05)
min(winsorized, na.rm = TRUE)
max(winsorized, na.rm = TRUE)
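The difference between winsorizing and truncating can be seen side by side in base R. The sketch below uses `quantile()` with its default settings to obtain the cut-offs, which is an assumption; `winsorize()` and `truncate()` may compute them differently.

```r
# Toy data with one extreme value.
x <- c(1:99, 1000)
cuts <- unname(quantile(x, probs = c(0.05, 0.95), na.rm = TRUE))

# Winsorizing clamps extreme values to the cut-offs.
wins <- pmin(pmax(x, cuts[1]), cuts[2])

# Truncating replaces extreme values with NA.
trunc_x <- ifelse(x < cuts[1] | x > cuts[2], NA, x)

c(max_winsorized = max(wins),
  max_truncated = max(trunc_x, na.rm = TRUE),
  n_dropped = sum(is.na(trunc_x)))
```

Winsorizing keeps all 100 observations but pulls the outlier in to the upper cut-off, while truncating sets the five lowest and five highest values to NA.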
A data set containing the event dates used in Zhang (2007). Data obtained from Panel of Table of Zhang (2007). If an event spans multiple dates, then a row is included for each date.
zhang_2007_events
A tibble with 30 rows × 3 variables.
Identifier for the event
Date of event
Description of the event
doi:10.1016/j.jacceco.2007.02.002
A data set containing the event windows used in Zhang (2007). Data obtained from Panel of Table of Zhang (2007).
zhang_2007_windows
A tibble with 17 rows × 3 variables.
Identifier for the event
First date of event window
Last date of event window
doi:10.1016/j.jacceco.2007.02.002