vaers_flatfile_build.py

run_all() ...
validate_dirs_and_files() ...

1 drops in input to process
First (oldest) input: ../Download/ALL_VAERS_DROPS/2020-12-18_VAERS_CSV.zip
Last (newest) input: ../Download/ALL_VAERS_DROPS/2023-10-27_AllVAERSDataCSVS.zip

Already processed files do appear in vaers_changes and the latest will be built upon:
vaers_changes/2020-12-18_VAERS_CHANGES.csv
vaers_changes/2020-12-25_VAERS_CHANGES.csv
vaers_changes/2021-01-08_VAERS_CHANGES.csv
vaers_changes/2021-01-15_VAERS_CHANGES.csv
vaers_changes/2021-01-22_VAERS_CHANGES.csv
... 145 total

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Next date 2023-10-27
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

unzip ../Download/ALL_VAERS_DROPS/2023-10-27_AllVAERSDataCSVS.zip
Creating in ./vaers_working/ date marker file 2023-10-27

Consolidation
Concatenating files, *VAERSDATA.csv, *VAERSVAX.csv, *VAERSSYMPTOMS.csv
open vaers_working\2020VAERSDATA.csv ... Highest VAERS_ID 2679890
open vaers_working\2021VAERSDATA.csv ... Highest VAERS_ID 2681134
open vaers_working\2022VAERSDATA.csv ... Highest VAERS_ID 2678082
open vaers_working\2023VAERSDATA.csv ... Highest VAERS_ID 2703574
open vaers_working\NonDomesticVAERSDATA.csv ... Highest VAERS_ID 2702773
open vaers_working\2020VAERSVAX.csv ... Highest VAERS_ID 2679890
open vaers_working\2021VAERSVAX.csv ... Highest VAERS_ID 2681134
open vaers_working\2022VAERSVAX.csv ... Highest VAERS_ID 2678082
open vaers_working\2023VAERSVAX.csv ... Highest VAERS_ID 2703574
open vaers_working\NonDomesticVAERSVAX.csv ... Highest VAERS_ID 2702773
4833 exact duplicates dropped in concatenated files, now 2051781 rows
open vaers_working\2020VAERSSYMPTOMS.csv ... Highest VAERS_ID 2679890
open vaers_working\2021VAERSSYMPTOMS.csv ... Highest VAERS_ID 2681134
open vaers_working\2022VAERSSYMPTOMS.csv ... Highest VAERS_ID 2678082
open vaers_working\2023VAERSSYMPTOMS.csv ... Highest VAERS_ID 2703574
open vaers_working\NonDomesticVAERSSYMPTOMS.csv ... Highest VAERS_ID 2702773
133356 records removed prior to the first covid report (covid_earliest_vaers_id 896636)
1742074 reports to work with (unique VAERS_IDs)

lo_ever 896636
hi_all_never_published 2688319
hi_this_week 2703574
week_vids_present [2679890, 896750, 896754, 896765, 896766, 896897, 896637, 896638 ... 2702742, 2702743, 2702748, 2702756, 2702757, 2702759, 2702772, 2702773]
list_range_all_ever [896636, 896637, 896638, 896639, 896640, 896641, 896642, 896643 ... 2703567, 2703568, 2703569, 2703570, 2703571, 2703572, 2703573, 2703574]
list_range_week_only [2688320, 2688321, 2688322, 2688323, 2688324, 2688325, 2688326, 2688327 ... 2703567, 2703568, 2703569, 2703570, 2703571, 2703572, 2703573, 2703574]
gaps_filled [2684315, 2684388, 2684976, 2685015, 2685017, 2685046, 2685310, 2685311 ... 2688308, 2688309, 2688310, 2688312, 2688313, 2688316, 2688318, 2688319]
week_gaps_new [2688334, 2688507, 2688517, 2688574, 2688649, 2688671, 2688673, 2688674 ... 2703561, 2703562, 2703563, 2703564, 2703566, 2703567, 2703570, 2703572]
remedied_past_all_never_published [896713, 896742, 896875, 896892, 896899, 897093, 897105, 897169 ... 2688174, 2688175, 2688204, 2688215, 2688222, 2688223, 2688239, 2688255]
all_never_published [896713, 896742, 896875, 896892, 896899, 897093, 897105, 897169 ... 2703561, 2703562, 2703563, 2703564, 2703566, 2703567, 2703570, 2703572]
VAERS_IDs 896637 to 2703574 expected 1806939 all_ever 1774882 gaps 32089

136315 dropped in df_data due to no covid VAX_TYPE involved in the report
172581 dropped in df_vax due to no covid VAX_TYPE involved in the report
168519 dropped in df_syms due to no covid VAX_TYPE involved in the report
1605759 covid reports to work with

Repeat sentence removal in SYMPTOM_TEXT, showing each next larger if any (takes time)
1928 SYMPTOM_TEXT field repeat sentences deduped in 415 reports, max difference 6192 bytes in VAERS_ID 1645697
Shortening some field values in VAX_NAME, VAX_MANU
Merging DATA into VAX
1691930 rows in df_data_vax
Aggregating symptoms into symptom_entries string, new column
Combining symptoms column items. Grouping by VAERS_ID ...
Appending each symptom in new column called symptom_entries
Cleaning multiple delimiters due to empty columns
Merging symptom_entries into df_data_vax
1691930 rows in df_data_vax_syms_consolidated
Saving result into one file: vaers_consolidated/2023-10-27_VAERS_CONSOLIDATED.csv
Consolidation of 2023-10-27 done

Flattening
Aggregate/flatten VAX items. Grouping by VAERS_ID
1605759 rows in df_vax_flat
Merging DATA into VAX flattened
1605759 rows in df_data_vax_flat
Merging symptom_entries into df_data_vax_syms_flat
Saving result into one file: vaers_flattened/2023-10-27_VAERS_FLATTENED.csv
1605759 rows in vaers_flattened/2023-10-27_VAERS_FLATTENED.csv
Flattening of 2023-10-27 done

open vaers_flattened/2023-09-29_VAERS_FLATTENED.csv ... Highest VAERS_ID 2688372
Using flat 2023-10-27 already in memory, 1605759 rows
Previous changes file for changes, cell_edits and status columns
open vaers_changes/2023-09-29_VAERS_CHANGES.csv ... Highest VAERS_ID 2688372

Comparing 2023-09-29 v. 2023-10-27
1605759 this drop total covid
1596978 previous total covid
1596853 identical set aside
8906 this drop to work with
125 previous to work with
8781 difference
8895 new in 2023-10-27
0 delayed this week
114 deleted this week kept
0 restored this week

Column value changes
DATEDIED 1 cell of trivial non-letter differences ignored
1 duplicate dropped in df_three_columns
2 DIED [] <> Y [941999, 1257420]
DIED 1769852 Y <> []
RECOVD 2663740 [] <> Y
RECVDATE 2663740 07/31/2023 <> 10/12/2023
SYMPTOM_TEXT 2613478 Chiu <>
SYMPTOM_TEXT 2427586 12/16/19 <> 12/XX/19
SYMPTOM_TEXT 2208883 White US3432392 A Phase 3 Randomized Stratified Observer-Blind Placebo-Controlled Study to Evaluate Efficacy Safety Immunogenicity mRNA-1273 SARS-CoV-2 Vaccine Adults Aged 18 Years Older mRNA-1273-P301 <> study
SYMPTOM_TEXT 1498573 NY <> a different state
SYMPTOM_TEXT 2663740 Received past BUD <> WAS STORED IN AN UNAPPROVED STORAGE UNIT
TODAYS_DATE 2663740 07/31/2023 <> 10/12/2023
VAX_DOSE_SERIES 2663740 2 <> 1
VAX_SITE 2663740 [] <> LA
symptom_entries 1302548 _|_Abdominal distension_|_ <>
symptom_entries 1496587 _|_Fracture_|_ <>
symptom_entries 1599352 _|_Arthralgia_|_Pyrexia_|_ <>
symptom_entries 2663740 _|_Expired product administered_|_ <> _|_Product storage error_|_
9 columns altered
31416 modified reports on 2023-10-27
Writing ... vaers_changes/2023-10-27_VAERS_CHANGES.csv
1 report with the most (18) records/lots/doses: 1900339
1 comparison done

Doing stats
open stats.csv ... ok
column changes: {'SYMPTOM_TEXT': 5, 'symptom_entries': 4, 'DIED': 3, 'TODAYS_DATE': 1, 'VAX_DOSE_SERIES': 1, 'VAX_SITE': 1, 'RECVDATE': 1, 'RECOVD': 1, 'OFC_VISIT': 0, 'VAX_ROUTE': 0, 'HISTORY': 0, 'SEX': 0, 'DATEDIED': 0, 'CAGE_YR': 0, 'PRIOR_VAX': 0, 'OTHER_MEDS': 0, 'HOSPITAL': 0, 'VAX_TYPE': 0, 'HOSPDAYS': 0, 'SPLTTYPE': 0, 'STATE': 0, 'VAX_NAME': 0, 'V_FUNDBY': 0, 'V_ADMINBY': 0, 'CAGE_MO': 0, 'VAX_MANU': 0, 'NUMDAYS': 0, 'X_STAY': 0, 'CUR_ILL': 0, 'FORM_VERS': 0, 'AGE_YRS': 0, 'LAB_DATA': 0, 'ER_VISIT': 0, 'VAX_DATE': 0, 'VAX_LOT': 0, 'L_THREAT': 0, 'ER_ED_VISIT': 0, 'DISABLE': 0, 'BIRTH_DEFECT': 0, 'RPT_DATE': 0, 'ONSET_DATE': 0, 'ALLERGIES': 0}

This week
0 delayed/late/gapfill
114 deleted
0 restored
1 cell edits trivial not printed
17 cell edits significant
1 cells emptied entirely
5 writeups changed

All time
542236 delayed/late/gapfill
31535 deleted
16 restored
29372271 cell edits trivial not printed
30973 cell edits significant
1475725 cells emptied entirely
7592 writeups changed
32089 never published [896713, 896742, 896875, 896892, 896899, 897093, 897105, 897169 ... 2703561, 2703562, 2703563, 2703564, 2703566, 2703567, 2703570, 2703572]
20 reports cleared of duplicate sentences within them

0 hr 29.6 min This week None
0 hr 29.7 min Overall None
Saving vaers_changes/2023-10-27_VAERS_CHANGES_A.csv, 1048575 rows and vaers_changes/2023-10-27_VAERS_CHANGES_B.csv, 588703 rows
No more to do, last set 2023-10-27 >= 2023-10-27 done
0 hr 31.9 min Done with vaers_flatfile_build.py at line 2375, clock time 2023-11-03 14:13:45.917934
- - - - - - - - - - - - - - - - - - - - - - - -
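
The "Comparing 2023-09-29 v. 2023-10-27" step in the log above sets identical reports aside, then counts new and deleted reports and prints per-cell differences as `COLUMN VAERS_ID old <> new`. What follows is a minimal sketch of that kind of bookkeeping, not the script's actual code: the function name `compare_drops`, the dict-of-dicts layout, and the toy IDs 1 to 4 are all assumptions made for illustration, and the real script appears to work on DataFrames instead.

```python
def compare_drops(previous, current):
    """Classify reports between two drops.

    Both arguments are hypothetical stand-ins for a flattened drop:
    a dict mapping VAERS_ID -> {column name: cell value}.
    """
    prev_ids, curr_ids = set(previous), set(current)
    shared = prev_ids & curr_ids

    # Reports present in both drops with every cell equal are set aside.
    identical = {vid for vid in shared if previous[vid] == current[vid]}
    modified = shared - identical

    # One (column, VAERS_ID, old, new) tuple per differing cell.
    cell_edits = [
        (col, vid, previous[vid].get(col, ""), current[vid].get(col, ""))
        for vid in sorted(modified)
        for col in sorted(set(previous[vid]) | set(current[vid]))
        if previous[vid].get(col, "") != current[vid].get(col, "")
    ]
    return {
        "identical": identical,
        "modified": modified,
        "new": curr_ids - prev_ids,       # only in the current drop
        "deleted": prev_ids - curr_ids,   # only in the previous drop
        "cell_edits": cell_edits,
    }


# Toy data only; these IDs and values are not taken from VAERS.
previous = {
    1: {"DIED": "", "VAX_SITE": ""},
    2: {"DIED": "Y", "VAX_SITE": "LA"},
    3: {"DIED": "", "VAX_SITE": "RA"},
}
current = {
    1: {"DIED": "", "VAX_SITE": ""},
    2: {"DIED": "", "VAX_SITE": "LA"},
    4: {"DIED": "", "VAX_SITE": "LA"},
}

result = compare_drops(previous, current)
for col, vid, old, new in result["cell_edits"]:
    # Empty cells are shown as [] to mimic the log's notation.
    print(col, vid, old or "[]", "<>", new or "[]")
```

The counts in the log are consistent with this kind of set arithmetic: 1605759 - 1596853 = 8906 and 1596978 - 1596853 = 125 reports are left to work with, and 8906 - 125 = 8781 = 8895 new - 114 deleted, which leaves 11 reports present in both drops with at least one changed cell.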