vaers_flatfile_build.py
run_all() ...
validate_dirs_and_files() ...
138 drops in input to process
First (oldest) input: ../Download/ALL_VAERS_DROPS/2020-12-18_VAERS_CSV.zip
Last (newest) input: ../Download/ALL_VAERS_DROPS/2023-08-11_AllVAERSDataCSVS.zip
Already processed files do appear in vaers_changes and the latest will be built upon:
vaers_changes/2020-12-18_VAERS_CHANGES.csv
vaers_changes/2020-12-25_VAERS_CHANGES.csv
vaers_changes/2021-01-08_VAERS_CHANGES.csv
vaers_changes/2021-01-15_VAERS_CHANGES.csv
vaers_changes/2021-01-22_VAERS_CHANGES.csv
... 137 total

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Next date 2023-08-11
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
unzip ../Download/ALL_VAERS_DROPS/2023-08-11_AllVAERSDataCSVS.zip
Creating in ./vaers_working/ date marker file 2023-08-11

Consolidation
Concatenating files, *VAERSDATA.csv, *VAERSVAX.csv, *VAERSSYMPTOMS.csv
open vaers_working\2020VAERSDATA.csv ... Highest VAERS_ID 918561
open vaers_working\2021VAERSDATA.csv ... Highest VAERS_ID 2669915
open vaers_working\2022VAERSDATA.csv ... Highest VAERS_ID 2659681
open vaers_working\2023VAERSDATA.csv ... Highest VAERS_ID 2669753
open vaers_working\NonDomesticVAERSDATA.csv ... Highest VAERS_ID 2669669
open vaers_working\2020VAERSVAX.csv ... Highest VAERS_ID 918561
open vaers_working\2021VAERSVAX.csv ... Highest VAERS_ID 2669915
open vaers_working\2022VAERSVAX.csv ... Highest VAERS_ID 2659681
open vaers_working\2023VAERSVAX.csv ... Highest VAERS_ID 2669753
open vaers_working\NonDomesticVAERSVAX.csv ... Highest VAERS_ID 2669669
4742 exact duplicates dropped in concatenated files, now 2013767 rows
open vaers_working\2020VAERSSYMPTOMS.csv ... Highest VAERS_ID 918561
open vaers_working\2021VAERSSYMPTOMS.csv ... Highest VAERS_ID 2669915
open vaers_working\2022VAERSSYMPTOMS.csv ... Highest VAERS_ID 2659681
open vaers_working\2023VAERSSYMPTOMS.csv ... Highest VAERS_ID 2669753
open vaers_working\NonDomesticVAERSSYMPTOMS.csv ... Highest VAERS_ID 2669669
133358 records removed prior to the first covid report (covid_earliest_vaers_id 896636)
1710314 reports to work with (unique VAERS_IDs)
week_vids_present [896750, 896754, 896765, 896766, 896897, 896637, 896638, 896639 ... 2669543, 2669544, 2669545, 2669546, 2669547, 2669548, 2669637, 2669669]
hi_all_never_published 2666062
lo_ever 896637
hi_this_week 2669915
list_range_all_ever [896637, 896638, 896639, 896640, 896641, 896642, 896643, 896644 ... 2669908, 2669909, 2669910, 2669911, 2669912, 2669913, 2669914, 2669915]
list_range_week_only [2666063, 2666064, 2666065, 2666066, 2666067, 2666068, 2666069, 2666070 ... 2669908, 2669909, 2669910, 2669911, 2669912, 2669913, 2669914, 2669915]
gaps_filled [2659849, 2659850, 2661897, 2665997, 2665494, 2665495, 2661915, 2666016 ... 2663380, 2665436, 2665439, 2659811, 2661862, 2665449, 2659836, 2659839]
week_gaps_new [2666130, 2666131, 2666151, 2666160, 2666162, 2666188, 2666189, 2666213 ... 2669906, 2669907, 2669908, 2669909, 2669910, 2669911, 2669912, 2669914]
remedied_past_all_never_published [896713, 896742, 896875, 896892, 896899, 897093, 897105, 897169 ... 2665771, 2665831, 2665861, 2665862, 2665865, 2665869, 2665898, 2666035]
all_never_published [896713, 896742, 896875, 896892, 896899, 897093, 897105, 897169 ... 2669906, 2669907, 2669908, 2669909, 2669910, 2669911, 2669912, 2669914]
VAERS_IDs 896637 to 2669915 expected 1773279 all_ever 1742753 gaps 30558
125223 dropped in df_data due to no covid VAX_TYPE involved in the report
157998 dropped in df_vax due to no covid VAX_TYPE involved in the report
154895 dropped in df_syms due to no covid VAX_TYPE involved in the report
1585091 covid reports to work with
Repeat sentence removal in SYMPTOM_TEXT, showing each next larger if any (takes time)
2020 SYMPTOM_TEXT field repeat sentences deduped in 286 reports, max difference 6192 bytes in VAERS_ID 1645697
Shortening some field values in VAX_NAME, VAX_MANU
Merging DATA into VAX
1668497 rows in df_data_vax
Aggregating symptoms into symptom_entries string, new column
Combining symptoms column items. Grouping by VAERS_ID ...
Appending each symptom in new column called symptom_entries
Cleaning multiple delimiters due to empty columns
Merging symptom_entries into df_data_vax
1668497 rows in df_data_vax_syms_consolidated
Saving result into one file: vaers_consolidated/2023-08-11_VAERS_CONSOLIDATED.csv
Consolidation of 2023-08-11 done

Flattening
Aggregate/flatten VAX items. Grouping by VAERS_ID
1585091 rows in df_vax_flat
Merging DATA into VAX flattened
1585091 rows in df_data_vax_flat
Merging symptom_entries into df_data_vax_syms_flat
Saving result into one file: vaers_flattened/2023-08-11_VAERS_FLATTENED.csv
1585091 rows in vaers_flattened/2023-08-11_VAERS_FLATTENED.csv
Flattening of 2023-08-11 done

open vaers_flattened/2023-08-04_VAERS_FLATTENED.csv ... Highest VAERS_ID 2666063
Using flat 2023-08-11 already in memory, 1585091 rows
Previous changes file for changes, cell_edits and status columns
open vaers_changes/2023-08-04_VAERS_CHANGES.csv ... Highest VAERS_ID 2666063
Comparing 2023-08-04 v. 2023-08-11
1585091 this drop total covid
1582309 previous total covid
1582236 identical set aside
2855 this drop to work with
73 previous to work with
2782 difference
2810 new in 2023-08-11
2 delayed this week
30 deleted this week kept
0 restored this week

Column value changes
DATEDIED 1 cell of trivial non-letter differences ignored
DATEDIED 1822736 [] <> 10/26/2021
DIED 1822736 [] <> Y
HISTORY 1702053 Chinese <>
OTHER_MEDS 1739406 Pfizer, Inc. 027034 <> []
SYMPTOM_TEXT 1685095 DON PHH <> Employee clinic
SYMPTOM_TEXT 1710669 Pfizer number;86260 for BNT162B2 <> the company number 86260
SYMPTOM_TEXT 1760129 COVAES <> portal
SYMPTOM_TEXT 1675522 Pfizer <> company
SYMPTOM_TEXT 1816804 COVAES <>
SYMPTOM_TEXT 1629741 COVAES <> portal
SYMPTOM_TEXT 1749141 Pfizer city <> location
SYMPTOM_TEXT 1835000 ICN <>
SYMPTOM_TEXT 2428471 to Europe <> abroad
SYMPTOM_TEXT 1664707 Pharmaceuticals and Medical Devices Agency PMDA <> regulatory authority RA
SYMPTOM_TEXT 1784389 Pfizer <>
SYMPTOM_TEXT 1688160 GENERAL INVESTIGATION TARGETING THE VACCINES HEALTH CARE PROVIDERS HCPS WHO ARE VACCINATED IN EARLY POST-APPROVAL PHASE FOLLOW-UP STUDY C4591006 <>
SYMPTOM_TEXT 1772647 entitled Pfizer Conmigo Xeljanz <>
SYMPTOM_TEXT 1764523 COVAES <> portal
SYMPTOM_TEXT 1702150 Pfizer <> an
SYMPTOM_TEXT 1784396 Pharmaceuticals and Agency <> Authority
SYMPTOM_TEXT 1819711 COVAES <>
SYMPTOM_TEXT 1807017 PMDA <> RA
SYMPTOM_TEXT 1820320 children's <> health
SYMPTOM_TEXT 2522408 005570 <>
SYMPTOM_TEXT 1747070 fee PMDA <> the regulatory authority
VAX_DOSE_SERIES 1789664 2|UNK <> 2
VAX_DOSE_SERIES 1527391 UNK|2 <> 2
VAX_LOT 1 cell of trivial non-letter differences ignored
VAX_LOT 1789664 Unknown|Unknown <> Unknown
VAX_MANU 1789664 Pfizer-BionT|Unknown <> Pfizer-BionT
VAX_MANU 1527391 Unknown|Pfizer-BionT <> Pfizer-BionT
VAX_NAME 1789664 C19 Pfizer-BionT|Not Specified NO BRAND NAME <> C19 Pfizer-BionT
VAX_NAME 1527391 Not Specified NO BRAND NAME|C19 <> C19
VAX_ROUTE 2 cells of trivial non-letter differences ignored
VAX_SITE 2 cells of trivial non-letter differences ignored
VAX_TYPE 1789664 COVID19|UNK <> COVID19
VAX_TYPE 1527391 UNK|COVID19 <> COVID19
V_ADMINBY 2661822 MIL <> UNK
symptom_entries 1475552 _|_Skin lesion_|_ <>
symptom_entries 1483524 _|_Circumstance or information capable of leading to medication error_|_ <>
symptom_entries 1476656 _|_Maternal exposure during pregnancy_|_ <>
symptom_entries 1483622 _|_Drug ineffective_|_ <>
symptom_entries 1483079 _|_Breast discolouration_|_Pain in extremity_|_ <>
symptom_entries 1475820 _|_SARS-CoV-2 test_|_ <>
symptom_entries 1476732 _|_Dyspnoea_|_ <>
symptom_entries 1476943 _|_Blood pressure measurement_|_Heart rate_|_Oxygen saturation_|_Sensation of blood flow_|_ <>
symptom_entries 1476692 _|_Overdose_|_ <>
symptom_entries 1480479 _|_Off label use_|_ <>
symptom_entries 1472374 _|_SARS-CoV-2 test positive_|_ <>
symptom_entries 1476974 _|_Blood pressure measurement_|_Body temperature_|_Dyspnoea_|_Oxygen saturation_|_ <>
symptom_entries 1476868 _|_Blood pressure measurement_|_Body temperature_|_Body temperature increased_|_Heart rate_|_Oxygen saturation_|_ <>
symptom_entries 1482866 _|_Product dose omission issue_|_ <>
symptom_entries 1481768 _|_General symptom_|_Immune system disorder_|_ <>
14 columns altered
31219 modified reports on 2023-08-11
Writing ... vaers_changes/2023-08-11_VAERS_CHANGES.csv
1 report with the most (18) records/lots/doses: 1900339
1 comparison done

Doing stats
open stats.csv ... ok
column changes: {'SYMPTOM_TEXT': 22, 'symptom_entries': 15, 'VAX_TYPE': 2, 'VAX_DOSE_SERIES': 2, 'VAX_MANU': 2, 'VAX_NAME': 2, 'DATEDIED': 1, 'V_ADMINBY': 1, 'VAX_LOT': 1, 'DIED': 1, 'HISTORY': 1, 'OTHER_MEDS': 1, 'VAX_SITE': 0, 'RPT_DATE': 0, 'CAGE_MO': 0, 'FORM_VERS': 0, 'AGE_YRS': 0, 'TODAYS_DATE': 0, 'OFC_VISIT': 0, 'NUMDAYS': 0, 'ER_VISIT': 0, 'STATE': 0, 'LAB_DATA': 0, 'VAX_DATE': 0, 'RECVDATE': 0, 'RECOVD': 0, 'ALLERGIES': 0, 'ER_ED_VISIT': 0, 'ONSET_DATE': 0, 'L_THREAT': 0, 'DISABLE': 0, 'SPLTTYPE': 0, 'CUR_ILL': 0, 'BIRTH_DEFECT': 0, 'V_FUNDBY': 0, 'SEX': 0, 'HOSPITAL': 0, 'CAGE_YR': 0, 'X_STAY': 0, 'HOSPDAYS': 0, 'PRIOR_VAX': 0, 'VAX_ROUTE': 0}
This week
2 delayed/late/gapfill
30 deleted
0 restored
6 cell edits trivial not printed
51 cell edits significant
1 cells emptied entirely
22 writeups changed
All time
542235 delayed/late/gapfill
31221 deleted
15 restored
29372236 cell edits trivial not printed
30675 cell edits significant
1475721 cells emptied entirely
7432 writeups changed
30558 never published [896713, 896742, 896875, 896892, 896899, 897093, 897105, 897169 ... 2669906, 2669907, 2669908, 2669909, 2669910, 2669911, 2669912, 2669914]
16 reports cleared of duplicate sentences within them
This week 0 hr 33.5 min
Overall 0 hr 33.5 min
No more to do, last set 2023-08-11 >= 2023-08-11 done
Done with vaers_flatfile_build.py at line 2353, clock time 2023-08-18 16:41:54.290928
- - - - - - - - - - - - - - - - - - - - - - - -