vaers_flatfile_build.py
run_all() ...
validate_dirs_and_files() ...
139 drops in input to process
First (oldest) input: ../Download/ALL_VAERS_DROPS/2020-12-18_VAERS_CSV.zip
Last (newest) input: ../Download/ALL_VAERS_DROPS/2023-08-18_AllVAERSDataCSVS.zip
Already processed files do appear in vaers_changes and the latest will be built upon:
    vaers_changes/2020-12-18_VAERS_CHANGES.csv
    vaers_changes/2020-12-25_VAERS_CHANGES.csv
    vaers_changes/2021-01-08_VAERS_CHANGES.csv
    vaers_changes/2021-01-15_VAERS_CHANGES.csv
    vaers_changes/2021-01-22_VAERS_CHANGES.csv
    ... 138 total
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Next date 2023-08-18
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
unzip ../Download/ALL_VAERS_DROPS/2023-08-18_AllVAERSDataCSVS.zip
Creating in ./vaers_working/ date marker file 2023-08-18
Consolidation
Concatenating files, *VAERSDATA.csv, *VAERSVAX.csv, *VAERSSYMPTOMS.csv
open vaers_working\2020VAERSDATA.csv ... Highest VAERS_ID 918561
open vaers_working\2021VAERSDATA.csv ... Highest VAERS_ID 2673233
open vaers_working\2022VAERSDATA.csv ... Highest VAERS_ID 2659681
open vaers_working\2023VAERSDATA.csv ... Highest VAERS_ID 2673108
open vaers_working\NonDomesticVAERSDATA.csv ... Highest VAERS_ID 2673060
open vaers_working\2020VAERSVAX.csv ... Highest VAERS_ID 918561
open vaers_working\2021VAERSVAX.csv ... Highest VAERS_ID 2673233
open vaers_working\2022VAERSVAX.csv ... Highest VAERS_ID 2659681
open vaers_working\2023VAERSVAX.csv ... Highest VAERS_ID 2673108
open vaers_working\NonDomesticVAERSVAX.csv ... Highest VAERS_ID 2673060
4748 exact duplicates dropped in concatenated files, now 2017517 rows
open vaers_working\2020VAERSSYMPTOMS.csv ... Highest VAERS_ID 918561
open vaers_working\2021VAERSSYMPTOMS.csv ... Highest VAERS_ID 2673233
open vaers_working\2022VAERSSYMPTOMS.csv ... Highest VAERS_ID 2659681
open vaers_working\2023VAERSSYMPTOMS.csv ... Highest VAERS_ID 2673108
open vaers_working\NonDomesticVAERSSYMPTOMS.csv ... Highest VAERS_ID 2673060
133358 records removed prior to the first covid report (covid_earliest_vaers_id 896636)
1713499 reports to work with (unique VAERS_IDs)
week_vids_present [896750, 896754, 896765, 896766, 896897, 896637, 896638, 896639 ... 2672744, 2672745, 2672746, 2672747, 2673046, 2673049, 2673059, 2673060]
hi_all_never_published 2669914
lo_ever 896637
hi_this_week 2673233
list_range_all_ever [896637, 896638, 896639, 896640, 896641, 896642, 896643, 896644 ... 2673226, 2673227, 2673228, 2673229, 2673230, 2673231, 2673232, 2673233]
list_range_week_only [2669915, 2669916, 2669917, 2669918, 2669919, 2669920, 2669921, 2669922 ... 2673226, 2673227, 2673228, 2673229, 2673230, 2673231, 2673232, 2673233]
gaps_filled [2669065, 2669067, 2669070, 2669074, 2667567, 2667575, 2667579, 2667580 ... 2669043, 2667509, 2669559, 2667512, 2667513, 2667514, 2667515, 2667516]
week_gaps_new [2669939, 2669940, 2669948, 2669949, 2669950, 2669951, 2669952, 2669954 ... 2673225, 2673226, 2673227, 2673228, 2673229, 2673230, 2673231, 2673232]
remedied_past_all_never_published [896713, 896742, 896875, 896892, 896899, 897093, 897105, 897169 ... 2669621, 2669636, 2669645, 2669769, 2669855, 2669871, 2669875, 2669888]
all_never_published [896713, 896742, 896875, 896892, 896899, 897093, 897105, 897169 ... 2673225, 2673226, 2673227, 2673228, 2673229, 2673230, 2673231, 2673232]
VAERS_IDs 896637 to 2673233 expected 1776597 all_ever 1745980 gaps 30649
125939 dropped in df_data due to no covid VAX_TYPE involved in the report
159060 dropped in df_vax due to no covid VAX_TYPE involved in the report
155813 dropped in df_syms due to no covid VAX_TYPE involved in the report
1587560 covid reports to work with
Repeat sentence removal in SYMPTOM_TEXT, showing each next larger if any (takes time)
479 SYMPTOM_TEXT field repeat sentences deduped in 70 reports, max difference 6192 bytes in VAERS_ID 1645697
Shortening some field values in VAX_NAME, VAX_MANU
Merging DATA into VAX
1671185 rows in df_data_vax
Aggregating symptoms into symptom_entries string, new column
Combining symptoms column items.
Grouping by VAERS_ID ...
Appending each symptom in new column called symptom_entries
Cleaning multiple delimiters due to empty columns
Merging symptom_entries into df_data_vax
1671185 rows in df_data_vax_syms_consolidated
Saving result into one file: vaers_consolidated/2023-08-18_VAERS_CONSOLIDATED.csv
Consolidation of 2023-08-18 done
Flattening
Aggregate/flatten VAX items.
Grouping by VAERS_ID
1587560 rows in df_vax_flat
Merging DATA into VAX flattened
1587560 rows in df_data_vax_flat
Merging symptom_entries into df_data_vax_syms_flat
Saving result into one file: vaers_flattened/2023-08-18_VAERS_FLATTENED.csv
1587560 rows in vaers_flattened/2023-08-18_VAERS_FLATTENED.csv
Flattening of 2023-08-18 done
open vaers_flattened/2023-08-11_VAERS_FLATTENED.csv ... Highest VAERS_ID 2669915
Using flat 2023-08-18 already in memory, 1587560 rows
Previous changes file for changes, cell_edits and status columns
open vaers_changes/2023-08-11_VAERS_CHANGES.csv ... Highest VAERS_ID 2669915
Comparing 2023-08-11 v. 2023-08-18
    1587560 this drop total covid
    1585091 previous total covid
    1585017 identical set aside
    2543 this drop to work with
    74 previous to work with
    2469 difference
    2502 new in 2023-08-18
    0 delayed this week
    33 deleted this week kept
    0 restored this week
Column value changes
DATEDIED 1 cell of trivial non-letter differences ignored
DIED 1479816 [] <> Y
LAB_DATA 1807074 JCS <> CS
OTHER_MEDS 1736626 Pfizer, Inc. EUA 027034 <> []
SYMPTOM_TEXT 1770310 PMDA <> RA
SYMPTOM_TEXT 1796874 PMDA <> RA
SYMPTOM_TEXT 1743085 Pfizer <>
SYMPTOM_TEXT 1787053 PHIFDA <> RA
SYMPTOM_TEXT 1690805 SPECIAL INVESTIGATION OF COMIRNATY INTRAMUSCULAR INJECTION PATIENTS WITH UNDERLYING DISEASE CONSIDERED TO BE AT HIGH RISK AGGRAVATION C4591019 <>
SYMPTOM_TEXT 1847937 PHIFDA unspecified race ethnic origin BGHMC <> RA at medical center
SYMPTOM_TEXT 1807018 GENERAL INVESTIGATION TARGETING THE VACCINES HEALTH CARE PROVIDERS HCPS WHO ARE VACCINATED IN EARLY POST-APPROVAL PHASE FOLLOW-UP STUDY C4591006 <>
SYMPTOM_TEXT 1877599 COVAES <> portal
SYMPTOM_TEXT 1404947 wholesaler Pfizer <>
SYMPTOM_TEXT 1469186 Pfizer <>
SYMPTOM_TEXT 1477372 Pfizer <>
SYMPTOM_TEXT 1476835 Pfizer <>
SYMPTOM_TEXT 1676967 11-Jul-2021 <> Jul-2021
SYMPTOM_TEXT 1725024 PMDA MAH <> RA
SYMPTOM_TEXT 1555664 university <>
SYMPTOM_TEXT 1390415 Pfizer <>
SYMPTOM_TEXT 1784211 C4591006 <>
SYMPTOM_TEXT 1755446 Pfizer <>
SYMPTOM_TEXT 1784414 PMDA <> RA
SYMPTOM_TEXT 1401362 Pfizer <>
SYMPTOM_TEXT 1576821 Pfizer <>
SYMPTOM_TEXT 1747145 privacy <>
SYMPTOM_TEXT 1763827 COVAES <> portal
SYMPTOM_TEXT 1477148 Pharmaceuticals and Medical Devices Agency PMDA <> regulatory authority
SYMPTOM_TEXT 1710352 private <> a
SYMPTOM_TEXT 1541809 Agency DA <> Authority
SYMPTOM_TEXT 1807074 JCS PRIVACY <> CS a
SYMPTOM_TEXT 1845034 Pfizer private <> a
1 duplicate dropped in df_three_columns
2 VAX_DOSE_SERIES 1|UNK <> 1 [1710352, 1789430]
VAX_DOSE_SERIES 1593618 UNK|1 <> 1
VAX_DOSE_SERIES 1866294 UNK|UNK <> UNK
VAX_LOT 3 cells of trivial non-letter differences ignored
VAX_LOT 1866294 FE1573|Unknown <> FE1573
2 duplicates dropped in df_three_columns
3 VAX_MANU Pfizer-BionT|Unknown <> Pfizer-BionT [1710352, 1789430, 1866294]
VAX_MANU 1593618 Unknown|Pfizer-BionT <> Pfizer-BionT
2 duplicates dropped in df_three_columns
3 VAX_NAME C19 Pfizer-BionT|Not Specified NO BRAND NAME <> C19 Pfizer-BionT [1710352, 1789430, 1866294]
VAX_NAME 1593618 Not Specified NO BRAND NAME|C19 <> C19
VAX_ROUTE 1 cell of trivial non-letter differences ignored
1 duplicate dropped in df_three_columns
VAX_ROUTE 1593618 OT| <> []
2 VAX_ROUTE OT|OT <> OT [1710352, 1866294]
VAX_SITE 4 cells of trivial non-letter differences ignored
2 duplicates dropped in df_three_columns
3 VAX_TYPE COVID19|UNK <> COVID19 [1710352, 1789430, 1866294]
VAX_TYPE 1593618 UNK|COVID19 <> COVID19
symptom_entries 1449607 _|_Injection site erythema_|_Maternal exposure during pregnancy_|_ <>
symptom_entries 1486298 _|_Inappropriate schedule of product administration_|_Overdose_|_ <>
symptom_entries 1486014 _|_Rhabdomyolysis_|_ <>
symptom_entries 1477389 _|_Cardiac failure_|_Respiratory failure_|_ <>
symptom_entries 1477109 _|_Body temperature_|_Body temperature increased_|_ <>
symptom_entries 1364013 _|_Intentional dose omission_|_ <>
symptom_entries 1477214 _|_Cardiac failure_|_Renal disorder_|_ <>
symptom_entries 1486761 _|_Drug ineffective_|_ <>
13 columns altered
31247 modified reports on 2023-08-18
Writing ... vaers_changes/2023-08-18_VAERS_CHANGES.csv
1 report with the most (18) records/lots/doses: 1900339
1 comparison done
Doing stats
open stats.csv ... ok
column changes: {'SYMPTOM_TEXT': 28, 'symptom_entries': 8, 'VAX_DOSE_SERIES': 4, 'VAX_NAME': 4, 'VAX_MANU': 4, 'VAX_TYPE': 4, 'VAX_ROUTE': 3, 'OTHER_MEDS': 1, 'LAB_DATA': 1, 'DIED': 1, 'VAX_LOT': 1, 'HOSPITAL': 0, 'RECOVD': 0, 'ER_ED_VISIT': 0, 'PRIOR_VAX': 0, 'RPT_DATE': 0, 'VAX_SITE': 0, 'ONSET_DATE': 0, 'FORM_VERS': 0, 'DATEDIED': 0, 'HOSPDAYS': 0, 'RECVDATE': 0, 'VAX_DATE': 0, 'NUMDAYS': 0, 'CUR_ILL': 0, 'HISTORY': 0, 'V_ADMINBY': 0, 'ER_VISIT': 0, 'DISABLE': 0, 'L_THREAT': 0, 'STATE': 0, 'X_STAY': 0, 'SPLTTYPE': 0, 'OFC_VISIT': 0, 'BIRTH_DEFECT': 0, 'CAGE_YR': 0, 'CAGE_MO': 0, 'TODAYS_DATE': 0, 'ALLERGIES': 0, 'AGE_YRS': 0, 'SEX': 0, 'V_FUNDBY': 0}
This week
    0 delayed/late/gapfill
    33 deleted
    0 restored
    9 cell edits trivial not printed
    59 cell edits significant
    2 cells emptied entirely
    28 writeups changed
All time
    542235 delayed/late/gapfill
    31254 deleted
    15 restored
    29372245 cell edits trivial not printed
    30734 cell edits significant
    1475723 cells emptied entirely
    7460 writeups changed
    30649 never published [896713, 896742, 896875, 896892, 896899, 897093, 897105, 897169 ... 2673225, 2673226, 2673227, 2673228, 2673229, 2673230, 2673231, 2673232]
    16 reports cleared of duplicate sentences within them
This week 0 hr 34.0 min
Overall 0 hr 34.0 min
No more to do, last set 2023-08-18 >= 2023-08-18
done
Done with vaers_flatfile_build.py at line 2354, clock time 2023-08-25 16:33:54.171501
- - - - - - - - - - - - - - - - - - - - - - - -
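The "Comparing 2023-08-11 v. 2023-08-18" block reports its counts as a partition of VAERS_IDs. The sketch below is a hypothetical reconstruction of that bookkeeping, not the script's own code. The names `compare_drops`, `reconcile`, and the `Row` tuple type are assumptions made for illustration, and the "delayed" and "restored" categories are not modeled. It shows that once identical reports are set aside, "this drop to work with" minus "previous to work with" must equal "new" minus "deleted". The logged figures satisfy that check: 2543 − 74 = 2502 − 33 = 2469.

```python
from typing import Dict, Set, Tuple

# Hypothetical row type: one flattened report's cell values in a fixed column order.
Row = Tuple[str, ...]


def compare_drops(previous: Dict[int, Row], current: Dict[int, Row]) -> Dict[str, Set[int]]:
    """Sketch (assumed logic): partition two weekly drops keyed by VAERS_ID."""
    # Reports whose every cell is unchanged are set aside first.
    identical = {vid for vid, row in current.items() if previous.get(vid) == row}
    this_work = set(current) - identical   # "this drop to work with"
    prev_work = set(previous) - identical  # "previous to work with"
    return {
        "identical": identical,
        "new": this_work - prev_work,       # IDs absent from the previous drop
        "deleted": prev_work - this_work,   # IDs that disappeared this week
        "modified": this_work & prev_work,  # same ID, at least one cell differs
    }


def reconcile(this_total: int, prev_total: int, identical: int,
              new: int, deleted: int) -> Tuple[int, int, bool]:
    """Check that the logged comparison counts are mutually consistent."""
    this_work = this_total - identical
    prev_work = prev_total - identical
    # this_work = new + modified and prev_work = deleted + modified,
    # so the modified reports cancel out of the difference.
    return this_work, prev_work, this_work - prev_work == new - deleted


if __name__ == "__main__":
    prev = {1: ("Y", "Pfizer"), 2: ("", "Moderna"), 3: ("N", "Janssen")}
    curr = {1: ("Y", "Pfizer"), 2: ("Y", "Moderna"), 4: ("N", "Pfizer")}
    print(compare_drops(prev, curr))
    # Figures taken from the 2023-08-11 v. 2023-08-18 comparison in the log.
    print(reconcile(1587560, 1585091, 1585017, 2502, 33))  # (2543, 74, True)
```

The set-based split keeps the arithmetic transparent. If the totals ever fail to reconcile, then some category (for example, delayed or restored reports) is being counted in a way this sketch does not capture.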
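The log shows two string-aggregation conventions. Symptoms are wrapped and separated by `_|_`, as in `_|_Cardiac failure_|_Renal disorder_|_`. Per-vaccine VAX fields are joined with `|`, and empty values are kept, as in `OT|`. The following is a minimal sketch of both, assuming each report arrives as one or more rows keyed by VAERS_ID. The function names `symptom_entries` and `flatten_vax` are hypothetical. Note one design difference: the log suggests the script joins all columns and then cleans up doubled delimiters, whereas this sketch skips empty symptom columns up front.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

DELIM = "_|_"  # delimiter as it appears in the logged symptom_entries values


def symptom_entries(rows: Iterable[Tuple[int, List[str]]]) -> Dict[int, str]:
    """Sketch (assumed logic): one symptom_entries string per VAERS_ID.

    Each row is (VAERS_ID, [symptom columns]); a report may span several rows.
    """
    grouped: Dict[int, List[str]] = defaultdict(list)
    for vid, symptoms in rows:
        # Drop empty symptom columns so no doubled delimiters are produced.
        grouped[vid].extend(s for s in symptoms if s.strip())
    return {vid: DELIM + DELIM.join(items) + DELIM if items else ""
            for vid, items in grouped.items()}


def flatten_vax(rows: Iterable[Tuple[int, str]]) -> Dict[int, str]:
    """Sketch (assumed logic): join one VAX field across a report's vaccine rows."""
    grouped: Dict[int, List[str]] = defaultdict(list)
    for vid, value in rows:
        grouped[vid].append(value)  # empty values are kept, giving e.g. "OT|"
    return {vid: "|".join(values) for vid, values in grouped.items()}


if __name__ == "__main__":
    print(symptom_entries([(1477214, ["Cardiac failure", "Renal disorder", "", "", ""])]))
    print(flatten_vax([(1593618, "OT"), (1593618, "")]))
```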