vaers_flatfile_build.py

run_all() ...

validate_dirs_and_files() ...
1 drops in input to process
First (oldest) input: ../Download/ALL_VAERS_DROPS/2020-12-18_VAERS_CSV.zip
Last (newest) input: ../Download/ALL_VAERS_DROPS/2023-09-22_AllVAERSDataCSVS.zip
Already processed files do appear in vaers_changes and the latest will be built upon:
vaers_changes/2020-12-18_VAERS_CHANGES.csv
vaers_changes/2020-12-25_VAERS_CHANGES.csv
vaers_changes/2021-01-08_VAERS_CHANGES.csv
vaers_changes/2021-01-15_VAERS_CHANGES.csv
vaers_changes/2021-01-22_VAERS_CHANGES.csv
... 143 total

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Next date 2023-09-22
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
unzip ../Download/ALL_VAERS_DROPS/2023-09-22_AllVAERSDataCSVS.zip
Creating in ./vaers_working/ date marker file 2023-09-22

Consolidation
Concatenating files, *VAERSDATA.csv, *VAERSVAX.csv, *VAERSSYMPTOMS.csv
open vaers_working\2020VAERSDATA.csv ... Highest VAERS_ID 2679890
open vaers_working\2021VAERSDATA.csv ... Highest VAERS_ID 2681134
open vaers_working\2022VAERSDATA.csv ... Highest VAERS_ID 2678082
open vaers_working\2023VAERSDATA.csv ... Highest VAERS_ID 2684990
open vaers_working\NonDomesticVAERSDATA.csv ... Highest VAERS_ID 2684982
open vaers_working\2020VAERSVAX.csv ... Highest VAERS_ID 2679890
open vaers_working\2021VAERSVAX.csv ... Highest VAERS_ID 2681134
open vaers_working\2022VAERSVAX.csv ... Highest VAERS_ID 2678082
open vaers_working\2023VAERSVAX.csv ... Highest VAERS_ID 2684990
open vaers_working\NonDomesticVAERSVAX.csv ... Highest VAERS_ID 2684982
4798 exact duplicates dropped in concatenated files, now 2031484 rows
open vaers_working\2020VAERSSYMPTOMS.csv ... Highest VAERS_ID 2679890
open vaers_working\2021VAERSSYMPTOMS.csv ... Highest VAERS_ID 2681134
open vaers_working\2022VAERSSYMPTOMS.csv ... Highest VAERS_ID 2678082
open vaers_working\2023VAERSSYMPTOMS.csv ... Highest VAERS_ID 2684990
open vaers_working\NonDomesticVAERSSYMPTOMS.csv ... Highest VAERS_ID 2684982
133356 records removed prior to the first covid report (covid_earliest_vaers_id 896636)
1725094 reports to work with (unique VAERS_IDs)
lo_ever 896636
hi_all_never_published 2682284
hi_this_week 2684990
week_vids_present [2679890, 896750, 896754, 896765, 896766, 896897, 896637, 896638 ... 2684912, 2684913, 2684914, 2684915, 2684916, 2684919, 2684975, 2684982]
list_range_all_ever [896636, 896637, 896638, 896639, 896640, 896641, 896642, 896643 ... 2684983, 2684984, 2684985, 2684986, 2684987, 2684988, 2684989, 2684990]
list_range_week_only [2682285, 2682286, 2682287, 2682288, 2682289, 2682290, 2682291, 2682292 ... 2684983, 2684984, 2684985, 2684986, 2684987, 2684988, 2684989, 2684990]
gaps_filled [2680745, 2681865, 2681866, 2681891, 2681904, 2681921, 2681922, 2681923 ... 2682136, 2682186, 2682195, 2682235, 2682237, 2682244, 2682279, 2682284]
week_gaps_new [2682393, 2682503, 2682517, 2682577, 2682608, 2682645, 2682816, 2682973 ... 2684970, 2684972, 2684973, 2684976, 2684980, 2684983, 2684986, 2684987]
remedied_past_all_never_published [896713, 896742, 896875, 896892, 896899, 897093, 897105, 897169 ... 2682080, 2682131, 2682139, 2682146, 2682158, 2682264, 2682266, 2682268]
all_never_published [896713, 896742, 896875, 896892, 896899, 897093, 897105, 897169 ... 2684970, 2684972, 2684973, 2684976, 2684980, 2684983, 2684986, 2684987]
VAERS_IDs 896637 to 2684990 expected 1788355 all_ever 1757735 gaps 30651
130098 dropped in df_data due to no covid VAX_TYPE involved in the report
164702 dropped in df_vax due to no covid VAX_TYPE involved in the report
160978 dropped in df_syms due to no covid VAX_TYPE involved in the report
1594996 covid reports to work with
Repeat sentence removal in SYMPTOM_TEXT, showing each next larger if any (takes time)
361 SYMPTOM_TEXT field repeat sentences deduped in 79 reports, max difference 6192 bytes in VAERS_ID 1645697
Shortening some field values in VAX_NAME, VAX_MANU
Merging DATA into VAX
1679512 rows in df_data_vax
Aggregating symptoms into symptom_entries string, new column
Combining symptoms column items. Grouping by VAERS_ID ...
Appending each symptom in new column called symptom_entries
Cleaning multiple delimiters due to empty columns
Merging symptom_entries into df_data_vax
1679512 rows in df_data_vax_syms_consolidated
Saving result into one file: vaers_consolidated/2023-09-22_VAERS_CONSOLIDATED.csv
Consolidation of 2023-09-22 done

Flattening
Aggregate/flatten VAX items. Grouping by VAERS_ID
1594996 rows in df_vax_flat
Merging DATA into VAX flattened
1594996 rows in df_data_vax_flat
Merging symptom_entries into df_data_vax_syms_flat
Saving result into one file: vaers_flattened/2023-09-22_VAERS_FLATTENED.csv
1594996 rows in vaers_flattened/2023-09-22_VAERS_FLATTENED.csv
Flattening of 2023-09-22 done

open vaers_flattened/2023-09-15_VAERS_FLATTENED.csv ... Highest VAERS_ID 2682285
Using flat 2023-09-22 already in memory, 1594996 rows
Previous changes file for changes, cell_edits and status columns
open vaers_changes/2023-09-15_VAERS_CHANGES.csv ... Highest VAERS_ID 2682285

Comparing 2023-09-15 v. 2023-09-22
1594996 this drop total covid
1593410 previous total covid
1593379 identical set aside
1617 this drop to work with
31 previous to work with
1586 difference
1609 new in 2023-09-22
0 delayed this week
24 deleted this week kept
1 restored this week 1275594

Column value changes
SYMPTOM_TEXT 2099173 Q2037-Medicare Flu <> Q2037-Flu
SYMPTOM_TEXT 1482870 COVAX US Support <>
SYMPTOM_TEXT 2648998 E2B <>
SYMPTOM_TEXT 1552030 <> Vaccine exposure during pregnancy This spontaneous prospective case was reported by a dietician and describes the occurrence of EXPOSURE DURING PREGNANCY Pregnant in 33-year-old female patient gravida 2 para 1 who received mRNA-1273 Moderna COVID-19 batch no 011J20A for vaccination The patient's past medical history included Alcohol use last had Glass wine 4 days ago glass once every weeks Concomitant products MINERALS NOS VITAMINS PRENATAL
symptom_entries 1492522 _|_Abdominal pain_|_Blood pressure measurement_|_ <>
symptom_entries 2174164 _|_Supraventricular tachycardia_|_Viral load_|_ <>
symptom_entries 1543273 _|_Maternal exposure during breast feeding_|_ <>
2 columns altered
31354 modified reports on 2023-09-22
Writing ... vaers_changes/2023-09-22_VAERS_CHANGES.csv
1 report with the most (18) records/lots/doses: 1900339
1 comparison done

Doing stats
open stats.csv ... ok
column changes: {'SYMPTOM_TEXT': 4, 'symptom_entries': 3, 'VAX_ROUTE': 0, 'RECVDATE': 0, 'ER_VISIT': 0, 'VAX_DATE': 0, 'TODAYS_DATE': 0, 'CAGE_YR': 0, 'DIED': 0, 'HOSPDAYS': 0, 'DISABLE': 0, 'AGE_YRS': 0, 'CUR_ILL': 0, 'PRIOR_VAX': 0, 'RPT_DATE': 0, 'BIRTH_DEFECT': 0, 'LAB_DATA': 0, 'STATE': 0, 'ER_ED_VISIT': 0, 'VAX_MANU': 0, 'RECOVD': 0, 'V_ADMINBY': 0, 'CAGE_MO': 0, 'V_FUNDBY': 0, 'L_THREAT': 0, 'VAX_TYPE': 0, 'HISTORY': 0, 'HOSPITAL': 0, 'SPLTTYPE': 0, 'SEX': 0, 'FORM_VERS': 0, 'VAX_DOSE_SERIES': 0, 'X_STAY': 0, 'ONSET_DATE': 0, 'OFC_VISIT': 0, 'NUMDAYS': 0, 'DATEDIED': 0, 'VAX_LOT': 0, 'ALLERGIES': 0, 'VAX_SITE': 0, 'OTHER_MEDS': 0, 'VAX_NAME': 0}

This week
0 delayed/late/gapfill
24 deleted
1 restored
0 cell edits trivial not printed
7 cell edits significant
0 cells emptied entirely
4 writeups changed

All time
542236 delayed/late/gapfill
31387 deleted
16 restored
29372270 cell edits trivial not printed
30955 cell edits significant
1475724 cells emptied entirely
7586 writeups changed
30651 never published [896713, 896742, 896875, 896892, 896899, 897093, 897105, 897169 ... 2684970, 2684972, 2684973, 2684976, 2684980, 2684983, 2684986, 2684987]
20 reports cleared of duplicate sentences within them

0 hr 25.9 min This week None
0 hr 25.9 min Overall None
Saving vaers_changes/2023-09-22_VAERS_CHANGES_A.csv, 1048575 rows and vaers_changes/2023-09-22_VAERS_CHANGES_B.csv, 577792 rows
No more to do, last set 2023-09-22 >= 2023-09-22 done 0 hr 27.7 min
Done with vaers_flatfile_build.py at line 2375, clock time 2023-09-30 16:55:20.726207
- - - - - - - - - - - - - - - - - - - - - - - -