vaers_flatfile_build.py run_all() ...
validate_dirs_and_files() ...
137 drops in input to process
First (oldest) input: ../Download/ALL_VAERS_DROPS/2020-12-18_VAERS_CSV.zip
Last (newest) input: ../Download/ALL_VAERS_DROPS/2023-08-04_AllVAERSDataCSVS.zip
Already processed files do appear in vaers_changes and the latest will be built upon:
vaers_changes/2020-12-18_VAERS_CHANGES.csv
vaers_changes/2020-12-25_VAERS_CHANGES.csv
vaers_changes/2021-01-08_VAERS_CHANGES.csv
vaers_changes/2021-01-15_VAERS_CHANGES.csv
vaers_changes/2021-01-22_VAERS_CHANGES.csv
... 137 total
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Next date 2023-08-04
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
unzip ../Download/ALL_VAERS_DROPS/2023-08-04_AllVAERSDataCSVS.zip
Creating in ./vaers_working/ date marker file 2023-08-04
Consolidation
Concatenating files, *VAERSDATA.csv, *VAERSVAX.csv, *VAERSSYMPTOMS.csv
open vaers_working\2020VAERSDATA.csv ... Highest VAERS_ID 918561
open vaers_working\2021VAERSDATA.csv ... Highest VAERS_ID 2666063
open vaers_working\2022VAERSDATA.csv ... Highest VAERS_ID 2659681
open vaers_working\2023VAERSDATA.csv ... Highest VAERS_ID 2665923
open vaers_working\NonDomesticVAERSDATA.csv ... Highest VAERS_ID 2665889
open vaers_working\2020VAERSVAX.csv ... Highest VAERS_ID 918561
open vaers_working\2021VAERSVAX.csv ... Highest VAERS_ID 2666063
open vaers_working\2022VAERSVAX.csv ... Highest VAERS_ID 2659681
open vaers_working\2023VAERSVAX.csv ... Highest VAERS_ID 2665923
open vaers_working\NonDomesticVAERSVAX.csv ... Highest VAERS_ID 2665889
4734 exact duplicates dropped in concatenated files, now 2009721 rows
open vaers_working\2020VAERSSYMPTOMS.csv ... Highest VAERS_ID 918561
open vaers_working\2021VAERSSYMPTOMS.csv ... Highest VAERS_ID 2666063
open vaers_working\2022VAERSSYMPTOMS.csv ... Highest VAERS_ID 2659681
open vaers_working\2023VAERSSYMPTOMS.csv ... Highest VAERS_ID 2665923
open vaers_working\NonDomesticVAERSSYMPTOMS.csv ... Highest VAERS_ID 2665889
133358 records removed prior to the first covid report (covid_earliest_vaers_id 896636)
1706758 reports to work with (unique VAERS_IDs)
week_vids_present [896750, 896754, 896765, 896766, 896897, 896637, 896638, 896639 ... 2665866, 2665867, 2665868, 2665884, 2665885, 2665886, 2665887, 2665889]
hi_all_never_published 2662483 lo_ever 896637 hi_this_week 2666063
list_range_all_ever [896637, 896638, 896639, 896640, 896641, 896642, 896643, 896644 ... 2666056, 2666057, 2666058, 2666059, 2666060, 2666061, 2666062, 2666063]
list_range_week_only [2662484, 2662485, 2662486, 2662487, 2662488, 2662489, 2662490, 2662491 ... 2666056, 2666057, 2666058, 2666059, 2666060, 2666061, 2666062, 2666063]
gaps_filled [2661888, 2662400, 2662401, 2662402, 2662403, 2662404, 2662405, 2659847 ... 2662383, 2662385, 2662386, 2662387, 2662388, 2662389, 2662390, 2661887]
week_gaps_new [2662492, 2662519, 2662545, 2662617, 2662630, 2662667, 2662671, 2662674 ... 2666055, 2666056, 2666057, 2666058, 2666059, 2666060, 2666061, 2666062]
remedied_past_all_never_published [896713, 896742, 896875, 896892, 896899, 897093, 897105, 897169 ... 2662173, 2662192, 2662259, 2662260, 2662261, 2662262, 2662267, 2662384]
all_never_published [896713, 896742, 896875, 896892, 896899, 897093, 897105, 897169 ... 2666055, 2666056, 2666057, 2666058, 2666059, 2666060, 2666061, 2666062]
VAERS_IDs 896637 to 2666063 expected 1769427 all_ever 1739161 gaps 30298
124449 dropped in df_data due to no covid VAX_TYPE involved in the report
156910 dropped in df_vax due to no covid VAX_TYPE involved in the report
153932 dropped in df_syms due to no covid VAX_TYPE involved in the report
1582309 covid reports to work with
Repeat sentence removal in SYMPTOM_TEXT, showing each next larger if any (takes time)
806 SYMPTOM_TEXT field repeat sentences deduped in 113 reports, max difference 6192 bytes in VAERS_ID 1645697
Shortening some field values in VAX_NAME, VAX_MANU
Merging DATA into VAX
1665539 rows in df_data_vax
Aggregating symptoms into symptom_entries string, new column
Combining symptoms column items. Grouping by VAERS_ID ...
Appending each symptom in new column called symptom_entries
Cleaning multiple delimiters due to empty columns
Merging symptom_entries into df_data_vax
1665539 rows in df_data_vax_syms_consolidated
Saving result into one file: vaers_consolidated/2023-08-04_VAERS_CONSOLIDATED.csv
Consolidation of 2023-08-04 done
Flattening
Aggregate/flatten VAX items. Grouping by VAERS_ID
1582309 rows in df_vax_flat
Merging DATA into VAX flattened
1582309 rows in df_data_vax_flat
Merging symptom_entries into df_data_vax_syms_flat
Saving result into one file: vaers_flattened/2023-08-04_VAERS_FLATTENED.csv
1582309 rows in vaers_flattened/2023-08-04_VAERS_FLATTENED.csv
Flattening of 2023-08-04 done
open vaers_flattened/2023-07-28_VAERS_FLATTENED.csv ... Highest VAERS_ID 2662484
Using flat 2023-08-04 already in memory, 1582309 rows
Previous changes file for changes, cell_edits and status columns
open vaers_changes/CSV_2023-07-28_VAERS_CHANGES.zip
NULL byte detected. This byte cannot be processed in Python's native csv library at the moment, so please pass in engine='c' instead

vaers_flatfile_build.py run_all() ...

vaers_flatfile_build.py run_all() ...
validate_dirs_and_files() ...
No csv or zip files in dir_input ../Download/ALL_VAERS_DROPS, no point in continuing
Done with vaers_flatfile_build.py at line 378, clock time 2023-08-11 10:44:26.649883
- - - - - - - - - - - - - - - - - - - - - - - -

vaers_flatfile_build.py run_all() ...
validate_dirs_and_files() ...
137 drops in input to process
First (oldest) input: ../Download/ALL_VAERS_DROPS/2020-12-18_VAERS_CSV.zip
Last (newest) input: ../Download/ALL_VAERS_DROPS/2023-08-04_AllVAERSDataCSVS.zip
Already processed files do appear in vaers_changes and the latest will be built upon:
vaers_changes/2020-12-18_VAERS_CHANGES.csv
vaers_changes/2020-12-25_VAERS_CHANGES.csv
vaers_changes/2021-01-08_VAERS_CHANGES.csv
vaers_changes/2021-01-15_VAERS_CHANGES.csv
vaers_changes/2021-01-22_VAERS_CHANGES.csv
... 136 total
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Next date 2023-08-04
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
2023-08-04 already consolidated, no need to copy input files to dir_working
Creating in ./vaers_working/ date marker file 2023-08-04
Consolidation
Consolidation already done: vaers_consolidated/2023-08-04_VAERS_CONSOLIDATED.csv
Flattening
Flattening already done: vaers_flattened/2023-08-04_VAERS_FLATTENED.csv
open vaers_flattened/2023-07-28_VAERS_FLATTENED.csv ... Highest VAERS_ID 2662484
open vaers_flattened/2023-08-04_VAERS_FLATTENED.csv ... Highest VAERS_ID 2666063
Previous changes file for changes, cell_edits and status columns
open vaers_changes/2023-07-28_VAERS_CHANGES.csv ... Highest VAERS_ID 2662484
Comparing 2023-07-28 v. 2023-08-04
1582309 this drop total covid
1579415 previous total covid
1579357 identical set aside
2952 this drop to work with
58 previous to work with
2894 difference
2912 new in 2023-08-04
3 delayed this week
22 deleted this week kept
1 restored this week 1688496
Column value changes
DATEDIED 2 cells of trivial non-letter differences ignored
1 duplicate dropped in df_three_columns
2 DIED [] <> Y [1671758, 1826219]
SYMPTOM_TEXT 1723500 Authorization Holder <> MAH
SYMPTOM_TEXT 1827101 the patient lived in REDACTED <>
SYMPTOM_TEXT 1810007 PHIFDA,PH-PHFDA-300111346 <> RA,PH-PHFDA-300111346
SYMPTOM_TEXT 1824133 of an unspecified race and ethnic origin <>
SYMPTOM_TEXT 1477104 university in unlisted country <> facility
SYMPTOM_TEXT 1447279 Patient reported for herself COVAES Pfizer <> regulatory authority
SYMPTOM_TEXT 1506608 Pfizer <> a
SYMPTOM_TEXT 1763497 GENERAL INVESTIGATION TARGETING THE VACCINES HEALTH CARE PROVIDERS HCPS WHO ARE VACCINATED IN EARLY POST-APPROVAL PHASE FOLLOW-UP STUDY C4591006 <>
SYMPTOM_TEXT 1430774 COVAES <> the regulatory authority
SYMPTOM_TEXT 1409231 COVAES <>
SYMPTOM_TEXT 1841103 from the Kidney Int 2021 100 458-459 10.1016/j.kint.2021.05.006 entitled Letter regarding Minimal change disease relapse following SARS-CoV-2 mRNA vaccine <>
SYMPTOM_TEXT 1721354 CEP ID 5570 in saudi arabia moh <>
1 duplicate dropped in df_three_columns
symptom_entries 1477137 _|_Anaphylactic reaction_|_ <>
symptom_entries 1465481 _|_Abdominal pain_|_Allergic reaction to excipient_|_Blood pressure measurement_|_Body temperature_|_Heart rate_|_Loss of consciousness_|_Oxygen saturation_|_ <>
symptom_entries 1464907 _|_Maternal exposure during pregnancy_|_ <>
symptom_entries 1477186 _|_Angina pectoris_|_ <>
symptom_entries 1481950 _|_Investigation_|_Non-cardiac chest pain_|_Overdose_|_ <>
symptom_entries 1481810 _|_Inappropriate schedule of product administration_|_ <>
symptom_entries 2658613 _|_Axillary pain_|_Pain_|_ <>
symptom_entries 1467412 _|_Blood pressure decreased_|_Blood pressure measurement_|_ <>
symptom_entries 1467444 _|_Skin discolouration_|_ <>
symptom_entries 1452339 _|_SARS-CoV-2 test_|_ <>
symptom_entries 1470933 _|_Drug ineffective_|_ <>
symptom_entries 1199912 _|_Vaccination site pain_|_ <>
symptom_entries 1467587 _|_Incorrect route of product administration_|_ <>
symptom_entries 1465483 _|_Erythema_|_Pain in extremity_|_ <>
symptom_entries 1428557 _|_SARS-CoV-2 test_|_ <>
symptom_entries 1472347 _|_SARS-CoV-2 test_|_ <>
2 symptom_entries _|_SARS-CoV-2 test_|_ <> [1297763, 1304332]
symptom_entries 1424763 _|_Ophthalmological examination_|_ <>
symptom_entries 1311812 _|_SARS-CoV-2 test_|_ <>
4 columns altered
31190 modified reports on 2023-08-04
Writing ... vaers_changes/2023-08-04_VAERS_CHANGES.csv
1 report with the most (18) records/lots/doses: 1900339
1 comparison done
Doing stats
open stats.csv ... ok
column changes: {'symptom_entries': 20, 'SYMPTOM_TEXT': 14, 'DIED': 2, 'BIRTH_DEFECT': 0, 'X_STAY': 0, 'VAX_NAME': 0, 'ONSET_DATE': 0, 'VAX_ROUTE': 0, 'RECVDATE': 0, 'VAX_MANU': 0, 'NUMDAYS': 0, 'AGE_YRS': 0, 'V_ADMINBY': 0, 'CUR_ILL': 0, 'RPT_DATE': 0, 'VAX_DOSE_SERIES': 0, 'OFC_VISIT': 0, 'CAGE_MO': 0, 'HOSPDAYS': 0, 'HOSPITAL': 0, 'VAX_DATE': 0, 'DISABLE': 0, 'FORM_VERS': 0, 'OTHER_MEDS': 0, 'ER_ED_VISIT': 0, 'ER_VISIT': 0, 'ALLERGIES': 0, 'RECOVD': 0, 'VAX_LOT': 0, 'SEX': 0, 'PRIOR_VAX': 0, 'STATE': 0, 'L_THREAT': 0, 'LAB_DATA': 0, 'V_FUNDBY': 0, 'VAX_TYPE': 0, 'HISTORY': 0, 'CAGE_YR': 0, 'SPLTTYPE': 0, 'DATEDIED': 0, 'VAX_SITE': 0, 'TODAYS_DATE': 0}
This week
3 delayed/late/gapfill
22 deleted
1 restored
2 cell edits trivial not printed
36 cell edits significant
0 cells emptied entirely
14 writeups changed
All time
542233 delayed/late/gapfill
31191 deleted
15 restored
29372230 cell edits trivial not printed
30624 cell edits significant
1475720 cells emptied entirely
7410 writeups changed
30298 never published [896713, 896742, 896875, 896892, 896899, 897093, 897105, 897169 ... 2666055, 2666056, 2666057, 2666058, 2666059, 2666060, 2666061, 2666062]
16 reports cleared of duplicate sentences within them
This week 0 hr 9.2 min
Overall 0 hr 9.2 min
No more to do, last set 2023-08-04 >= 2023-08-04 done
Done with vaers_flatfile_build.py at line 2353, clock time 2023-08-11 11:01:32.122742
- - - - - - - - - - - - - - - - - - - - - - - -
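
Note on the "NULL byte detected" stop in the first run above: the message itself suggests passing engine='c' to the reader. Another possible workaround, sketched below under assumptions, is to strip NUL (0x00) bytes from the raw file content before parsing. The helper name `read_csv_rows_without_nul` and the sample data are hypothetical and are not part of vaers_flatfile_build.py; how the script actually got past the error is not shown in this log.

```python
import csv
import io


def read_csv_rows_without_nul(raw: bytes, encoding: str = "utf-8") -> list[dict[str, str]]:
    """Parse CSV bytes after removing NUL (0x00) bytes.

    Hypothetical helper for illustration only; not taken from the script.
    """
    # Drop every NUL byte so the parser never sees one.
    cleaned = raw.replace(b"\x00", b"")
    # Replace undecodable bytes rather than failing on a damaged file.
    text = cleaned.decode(encoding, errors="replace")
    # newline="" leaves line endings untouched, as the csv module expects.
    return list(csv.DictReader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    # Made-up sample with a stray NUL byte inside a field.
    sample = b"id,text\r\n1,ab\x00c\r\n2,def\r\n"
    for row in read_csv_rows_without_nul(sample):
        print(row)
```

Stripping NUL bytes alters the file content slightly, so in a pipeline that compares cell values between drops it may be worth counting the removed bytes and logging that count alongside the other totals.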