vaers_flatfile_build.py
run_all() ...
validate_dirs_and_files() ...
1 drops in input to process
First (oldest) input: ../Download/ALL_VAERS_DROPS/2020-12-18_VAERS_CSV.zip
Last (newest) input: ../Download/ALL_VAERS_DROPS/2023-09-08_AllVAERSDataCSVS.zip
Already processed files do appear in vaers_changes and the latest will be built upon:
vaers_changes/2020-12-18_VAERS_CHANGES.csv
vaers_changes/2020-12-25_VAERS_CHANGES.csv
vaers_changes/2021-01-08_VAERS_CHANGES.csv
vaers_changes/2021-01-15_VAERS_CHANGES.csv
vaers_changes/2021-01-22_VAERS_CHANGES.csv
... 141 total
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Next date 2023-09-08
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
unzip ../Download/ALL_VAERS_DROPS/2023-09-08_AllVAERSDataCSVS.zip
Creating in ./vaers_working/ date marker file 2023-09-08
Consolidation
Concatenating files, *VAERSDATA.csv, *VAERSVAX.csv, *VAERSSYMPTOMS.csv
open vaers_working\2020VAERSDATA.csv ... Highest VAERS_ID 2679890
open vaers_working\2021VAERSDATA.csv ... Highest VAERS_ID 2675503
open vaers_working\2022VAERSDATA.csv ... Highest VAERS_ID 2678082
open vaers_working\2023VAERSDATA.csv ... Highest VAERS_ID 2680314
open vaers_working\NonDomesticVAERSDATA.csv ... Highest VAERS_ID 2680269
open vaers_working\2020VAERSVAX.csv ... Highest VAERS_ID 2679890
open vaers_working\2021VAERSVAX.csv ... Highest VAERS_ID 2675503
open vaers_working\2022VAERSVAX.csv ... Highest VAERS_ID 2678082
open vaers_working\2023VAERSVAX.csv ... Highest VAERS_ID 2680314
open vaers_working\NonDomesticVAERSVAX.csv ... Highest VAERS_ID 2680269
4780 exact duplicates dropped in concatenated files, now 2026186 rows
open vaers_working\2020VAERSSYMPTOMS.csv ... Highest VAERS_ID 2679890
open vaers_working\2021VAERSSYMPTOMS.csv ... Highest VAERS_ID 2675503
open vaers_working\2022VAERSSYMPTOMS.csv ... Highest VAERS_ID 2678082
open vaers_working\2023VAERSSYMPTOMS.csv ... Highest VAERS_ID 2680314
open vaers_working\NonDomesticVAERSSYMPTOMS.csv ...
Highest VAERS_ID 2680269
133356 records removed prior to the first covid report (covid_earliest_vaers_id 896636)
1720682 reports to work with (unique VAERS_IDs)
lo_ever 896636
hi_all_never_published 2678490
hi_this_week 2680314
week_vids_present [2679890, 896750, 896754, 896765, 896766, 896897, 896637, 896638 ... 2680262, 2680263, 2680264, 2680265, 2680266, 2680267, 2680268, 2680269]
list_range_all_ever [896636, 896637, 896638, 896639, 896640, 896641, 896642, 896643 ... 2680307, 2680308, 2680309, 2680310, 2680311, 2680312, 2680313, 2680314]
list_range_week_only [2678491, 2678492, 2678493, 2678494, 2678495, 2678496, 2678497, 2678498 ... 2680307, 2680308, 2680309, 2680310, 2680311, 2680312, 2680313, 2680314]
gaps_filled [2678227 ... 2678373]
week_gaps_new [2678576, 2678611, 2678687, 2678759, 2678760, 2678765, 2679093, 2679102 ... 2679907, 2679930, 2679932, 2680043, 2680134, 2680251, 2680257, 2680308]
remedied_past_all_never_published [896713, 896742, 896875, 896892, 896899, 897093, 897105, 897169 ... 2678179, 2678180, 2678181, 2678279, 2678307, 2678436, 2678449, 2678490]
all_never_published [896713, 896742, 896875, 896892, 896899, 897093, 897105, 897169 ... 2679907, 2679930, 2679932, 2680043, 2680134, 2680251, 2680257, 2680308]
VAERS_IDs 896637 to 2680314 expected 1783679 all_ever 1753276 gaps 30434
128428 dropped in df_data due to no covid VAX_TYPE involved in the report
162482 dropped in df_vax due to no covid VAX_TYPE involved in the report
158868 dropped in df_syms due to no covid VAX_TYPE involved in the report
1592254 covid reports to work with
Repeat sentence removal in SYMPTOM_TEXT, showing each next larger if any (takes time)
401 SYMPTOM_TEXT field repeat sentences deduped in 62 reports, max difference 6192 bytes in VAERS_ID 1645697
Shortening some field values in VAX_NAME, VAX_MANU
Merging DATA into VAX
1676434 rows in df_data_vax
Aggregating symptoms into symptom_entries string, new column
Combining symptoms column items.
Grouping by VAERS_ID ...
Appending each symptom in new column called symptom_entries
Cleaning multiple delimiters due to empty columns
Merging symptom_entries into df_data_vax
1676434 rows in df_data_vax_syms_consolidated
Saving result into one file: vaers_consolidated/2023-09-08_VAERS_CONSOLIDATED.csv
Consolidation of 2023-09-08 done
Flattening
Aggregate/flatten VAX items.
Grouping by VAERS_ID
1592254 rows in df_vax_flat
Merging DATA into VAX flattened
1592254 rows in df_data_vax_flat
Merging symptom_entries into df_data_vax_syms_flat
Saving result into one file: vaers_flattened/2023-09-08_VAERS_FLATTENED.csv
1592254 rows in vaers_flattened/2023-09-08_VAERS_FLATTENED.csv
Flattening of 2023-09-08 done
open vaers_flattened/2023-09-01_VAERS_FLATTENED.csv ... Highest VAERS_ID 2678495
Using flat 2023-09-08 already in memory, 1592254 rows
Previous changes file for changes, cell_edits and status columns
open vaers_changes/2023-09-01_VAERS_CHANGES.csv ... Highest VAERS_ID 2678495
Comparing 2023-09-01 v. 2023-09-08
1592254 this drop total covid
1591244 previous total covid
1591205 identical set aside
1049 this drop to work with
39 previous to work with
1010 difference
1033 new in 2023-09-08
0 delayed this week
23 deleted this week kept
0 restored this week
Column value changes
DATEDIED 2 cells of trivial non-letter differences ignored
LAB_DATA 1 cell of trivial non-letter differences ignored
SYMPTOM_TEXT 2418134 Journal of Emergency Medicine 2022 DOI:10.1016/j.ajem.2022.08.012 <>
SYMPTOM_TEXT 2541447 Leukemia Lymphoma LLS <>
SYMPTOM_TEXT 2650170 <> number
SYMPTOM_TEXT 2406543 003110 <>
SYMPTOM_TEXT 1518284 Vaccine was not administered at Military Facility <>
SYMPTOM_TEXT 2564104 2700 dollars <> lot money every
SYMPTOM_TEXT 2576714 <> postmenopausal haemorrhage
SYMPTOM_TEXT 1508747 of 998523 <>
SYMPTOM_TEXT 2515875 005570 <>
SYMPTOM_TEXT 2514808 005570 <>
SYMPTOM_TEXT 1518183 and not Military Facility <>
symptom_entries 1517882 _|_Off label use_|_ <>
symptom_entries 1514674 _|_Pain_|_ <>
symptom_entries 1514817 _|_Bell's palsy_|_Diabetic neuropathy_|_ <>
4 columns altered
31324 modified reports on 2023-09-08
Writing ... vaers_changes/2023-09-08_VAERS_CHANGES.csv
1 report with the most (18) records/lots/doses: 1900339
1 comparison done
Doing stats
open stats.csv ... ok
column changes: {'SYMPTOM_TEXT': 11, 'symptom_entries': 3, 'VAX_SITE': 0, 'L_THREAT': 0, 'TODAYS_DATE': 0, 'HISTORY': 0, 'ER_VISIT': 0, 'RECOVD': 0, 'VAX_DATE': 0, 'SPLTTYPE': 0, 'ER_ED_VISIT': 0, 'DISABLE': 0, 'LAB_DATA': 0, 'OTHER_MEDS': 0, 'CAGE_MO': 0, 'VAX_DOSE_SERIES': 0, 'VAX_NAME': 0, 'CUR_ILL': 0, 'PRIOR_VAX': 0, 'FORM_VERS': 0, 'AGE_YRS': 0, 'VAX_LOT': 0, 'HOSPDAYS': 0, 'DIED': 0, 'SEX': 0, 'NUMDAYS': 0, 'VAX_ROUTE': 0, 'ALLERGIES': 0, 'OFC_VISIT': 0, 'X_STAY': 0, 'CAGE_YR': 0, 'VAX_TYPE': 0, 'DATEDIED': 0, 'V_FUNDBY': 0, 'RECVDATE': 0, 'ONSET_DATE': 0, 'V_ADMINBY': 0, 'RPT_DATE': 0, 'BIRTH_DEFECT': 0, 'STATE': 0, 'HOSPITAL': 0, 'VAX_MANU': 0}
This week
0 delayed/late/gapfill
23 deleted
0 restored
3 cell edits trivial not printed
14 cell edits significant
0 cells emptied entirely
11 writeups changed
All time
542236 delayed/late/gapfill
31346 deleted
15 restored
29372264 cell edits trivial not printed
30902 cell edits significant
1475724 cells emptied entirely
7558 writeups changed
30434 never published [896713, 896742, 896875, 896892, 896899, 897093, 897105, 897169 ... 2679907, 2679930, 2679932, 2680043, 2680134, 2680251, 2680257, 2680308]
20 reports cleared of duplicate sentences within them
This week 0 hr 32.4 min
Overall 0 hr 32.5 min
Saving vaers_changes/2023-09-08_VAERS_CHANGES_A.csv, 1048575 rows and vaers_changes/2023-09-08_VAERS_CHANGES_B.csv, 575010 rows
No more to do, last set 2023-09-08 >= 2023-09-08 done
Done with vaers_flatfile_build.py at line 2375, clock time 2023-09-15 13:45:35.760281
- - - - - - - - - - - - - - - - - - - - - - - -
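Two bookkeeping steps in the log above can be sketched in Python, the language of vaers_flatfile_build.py: building the `_|_`-delimited symptom_entries string per VAERS_ID, and the counts printed under "Comparing 2023-09-01 v. 2023-09-08". This is a minimal sketch under stated assumptions, not the script's actual code: it assumes each drop is held as a dict mapping VAERS_ID to a hashable row tuple, and the names `build_symptom_entries` and `compare_drops` are hypothetical. It leaves out the delayed and restored categories, whose exact definitions the log does not give. Because "this drop to work with" is new plus modified IDs and "previous to work with" is deleted plus modified IDs, their difference equals new minus deleted; the log's figures satisfy this (1049 - 39 = 1010 = 1033 - 23).

```python
from collections import defaultdict

DELIM = "_|_"  # delimiter as it appears in the log's symptom_entries values


def build_symptom_entries(rows):
    """Hypothetical helper: join (vaers_id, symptom) rows into one string per ID.

    Empty symptom cells are skipped, so no doubled delimiters are produced.
    """
    grouped = defaultdict(list)
    for vaers_id, symptom in rows:
        if symptom and symptom.strip():
            grouped[vaers_id].append(symptom.strip())
    # Leading and trailing delimiters, matching e.g. "_|_Pain_|_" in the log
    return {vid: DELIM + DELIM.join(names) + DELIM for vid, names in grouped.items()}


def compare_drops(previous, current):
    """Hypothetical helper: classify VAERS_IDs between two drops.

    Assumes each drop is a dict {VAERS_ID: row_tuple}.
    """
    # IDs present in both drops with byte-for-byte equal rows are set aside
    identical = {vid for vid in previous.keys() & current.keys()
                 if previous[vid] == current[vid]}
    this_work = current.keys() - identical   # new + modified
    prev_work = previous.keys() - identical  # deleted + modified
    return {
        "identical": len(identical),
        "this_work": len(this_work),
        "prev_work": len(prev_work),
        "new": sorted(current.keys() - previous.keys()),
        "deleted": sorted(previous.keys() - current.keys()),
        "modified": sorted(this_work & prev_work),
    }


if __name__ == "__main__":
    prev = {1: ("a",), 2: ("b",), 3: ("c",)}
    curr = {2: ("b",), 3: ("c2",), 4: ("d",)}
    print(compare_drops(prev, curr))
    # {'identical': 1, 'this_work': 2, 'prev_work': 2, 'new': [4], 'deleted': [1], 'modified': [3]}
    print(build_symptom_entries([(1514817, "Bell's palsy"), (1514817, ""),
                                 (1514817, "Diabetic neuropathy")]))
    # {1514817: "_|_Bell's palsy_|_Diabetic neuropathy_|_"}
```

Set operations on `dict.keys()` views keep the comparison close to the log's wording (identical set aside, then what remains on each side); at the scale of this run (about 1.59 million reports per drop) a pandas merge with an indicator column would be the more likely choice.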