vaers_flatfile_build.py
run_all() ...
validate_dirs_and_files() ...
1 drops in input to process
First (oldest) input: ../Download/ALL_VAERS_DROPS/2020-12-18_VAERS_CSV.zip
Last (newest) input: ../Download/ALL_VAERS_DROPS/2023-09-15_AllVAERSDataCSVS.zip
Already processed files do appear in vaers_changes and the latest will be built upon:
vaers_changes/2020-12-18_VAERS_CHANGES.csv
vaers_changes/2020-12-25_VAERS_CHANGES.csv
vaers_changes/2021-01-08_VAERS_CHANGES.csv
vaers_changes/2021-01-15_VAERS_CHANGES.csv
vaers_changes/2021-01-22_VAERS_CHANGES.csv
... 142 total

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Next date 2023-09-15
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
unzip ../Download/ALL_VAERS_DROPS/2023-09-15_AllVAERSDataCSVS.zip
Creating in ./vaers_working/ date marker file 2023-09-15

Consolidation
Concatenating files, *VAERSDATA.csv, *VAERSVAX.csv, *VAERSSYMPTOMS.csv
open vaers_working\2020VAERSDATA.csv ... Highest VAERS_ID 2679890
open vaers_working\2021VAERSDATA.csv ... Highest VAERS_ID 2681134
open vaers_working\2022VAERSDATA.csv ... Highest VAERS_ID 2678082
open vaers_working\2023VAERSDATA.csv ... Highest VAERS_ID 2682291
open vaers_working\NonDomesticVAERSDATA.csv ... Highest VAERS_ID 2682283
open vaers_working\2020VAERSVAX.csv ... Highest VAERS_ID 2679890
open vaers_working\2021VAERSVAX.csv ... Highest VAERS_ID 2681134
open vaers_working\2022VAERSVAX.csv ... Highest VAERS_ID 2678082
open vaers_working\2023VAERSVAX.csv ... Highest VAERS_ID 2682291
open vaers_working\NonDomesticVAERSVAX.csv ... Highest VAERS_ID 2682283
4785 exact duplicates dropped in concatenated files, now 2028469 rows
open vaers_working\2020VAERSSYMPTOMS.csv ... Highest VAERS_ID 2679890
open vaers_working\2021VAERSSYMPTOMS.csv ... Highest VAERS_ID 2681134
open vaers_working\2022VAERSSYMPTOMS.csv ... Highest VAERS_ID 2678082
open vaers_working\2023VAERSSYMPTOMS.csv ... Highest VAERS_ID 2682291
open vaers_working\NonDomesticVAERSSYMPTOMS.csv ... Highest VAERS_ID 2682283
133356 records removed prior to the first covid report (covid_earliest_vaers_id 896636)
1722584 reports to work with (unique VAERS_IDs)
lo_ever 896636
hi_all_never_published 2680308
hi_this_week 2682291
week_vids_present [2679890, 896750, 896754, 896765, 896766, 896897, 896637, 896638 ... 2682232, 2682233, 2682234, 2682236, 2682238, 2682239, 2682240, 2682283]
list_range_all_ever [896636, 896637, 896638, 896639, 896640, 896641, 896642, 896643 ... 2682284, 2682285, 2682286, 2682287, 2682288, 2682289, 2682290, 2682291]
list_range_week_only [2680309, 2680310, 2680311, 2680312, 2680313, 2680314, 2680315, 2680316 ... 2682284, 2682285, 2682286, 2682287, 2682288, 2682289, 2682290, 2682291]
gaps_filled [2679465, 2679504, 2679892, 2679895, 2679907 ... 2679932, 2680134, 2680251, 2680257, 2680308]
week_gaps_new [2680399, 2680437, 2680453, 2680483, 2680487, 2680488, 2680559, 2680561 ... 2682235, 2682237, 2682244, 2682264, 2682266, 2682268, 2682279, 2682284]
remedied_past_all_never_published [896713, 896742, 896875, 896892, 896899, 897093, 897105, 897169 ... 2679169, 2679306, 2679438, 2679449, 2679668, 2679848, 2679875, 2680043]
all_never_published [896713, 896742, 896875, 896892, 896899, 897093, 897105, 897169 ... 2682235, 2682237, 2682244, 2682264, 2682266, 2682268, 2682279, 2682284]
VAERS_IDs 896637 to 2682291 expected 1785656 all_ever 1755198 gaps 30489
129174 dropped in df_data due to no covid VAX_TYPE involved in the report
163482 dropped in df_vax due to no covid VAX_TYPE involved in the report
159856 dropped in df_syms due to no covid VAX_TYPE involved in the report
1593410 covid reports to work with
Repeat sentence removal in SYMPTOM_TEXT, showing each next larger if any (takes time)
540 SYMPTOM_TEXT field repeat sentences deduped in 105 reports, max difference 6192 bytes in VAERS_ID 1645697
Shortening some field values in VAX_NAME, VAX_MANU
Merging DATA into VAX
1677717 rows in df_data_vax
Aggregating symptoms into symptom_entries string, new column
Combining symptoms column items. Grouping by VAERS_ID ...
Appending each symptom in new column called symptom_entries
Cleaning multiple delimiters due to empty columns
Merging symptom_entries into df_data_vax
1677717 rows in df_data_vax_syms_consolidated
Saving result into one file: vaers_consolidated/2023-09-15_VAERS_CONSOLIDATED.csv
Consolidation of 2023-09-15 done

Flattening
Aggregate/flatten VAX items. Grouping by VAERS_ID
1593410 rows in df_vax_flat
Merging DATA into VAX flattened
1593410 rows in df_data_vax_flat
Merging symptom_entries into df_data_vax_syms_flat
Saving result into one file: vaers_flattened/2023-09-15_VAERS_FLATTENED.csv
1593410 rows in vaers_flattened/2023-09-15_VAERS_FLATTENED.csv
Flattening of 2023-09-15 done

open vaers_flattened/2023-09-08_VAERS_FLATTENED.csv ... Highest VAERS_ID 2680314
Using flat 2023-09-15 already in memory, 1593410 rows
Previous changes file for changes, cell_edits and status columns
open vaers_changes/2023-09-08_VAERS_CHANGES.csv ... Highest VAERS_ID 2680314

Comparing 2023-09-08 v. 2023-09-15
1593410 this drop total covid
1592254 previous total covid
1592201 identical set aside
1209 this drop to work with
53 previous to work with
1156 difference
1173 new in 2023-09-15
0 delayed this week
17 deleted this week kept
0 restored this week

Column value changes
OTHER_MEDS 2545207 <> ATORVASTATIN
OTHER_MEDS 1591074 <> TRAMADOL BUPRENORPHINE HEPARINE HEPARIN SODIUM CITALOPRAM IBUPROFENE
RECVDATE 1 cell of trivial non-letter differences ignored
SYMPTOM_TEXT 1928037 Thanksgiving <> holiday
SYMPTOM_TEXT 2325324 005570 <>
SYMPTOM_TEXT 1597119 United States <>
SYMPTOM_TEXT 1481156 C-VIPER COVID 19 Vaccines International Pregnancy Exposure Registry number C00072 <>
SYMPTOM_TEXT 2547540 Figure 2).A <> A
SYMPTOM_TEXT 2571439 E2B <>
SYMPTOM_TEXT 2571343 E2B <>
SYMPTOM_TEXT 2403915 Nonarteritic Anterior Ischemic Optic Neuropathy Associated With COVID-19 Vaccination J Neuroophthalmol 2021 <>
SYMPTOM_TEXT 1300363 CMS/HCC <> HCC
SYMPTOM_TEXT 2266907 white A Phase 3 Randomized Stratified Observer-Blind Placebo-Controlled to Evaluate the Efficacy Safety and Immunogenicity of mRNA-1273 SARS-CoV-2 Vaccine Adults Aged 18 Years Older mRNA-1273-P301 <> a
SYMPTOM_TEXT 2489487 172086 Incyte Corp <>
SYMPTOM_TEXT 2494669 arrested jail <> facility
SYMPTOM_TEXT 2560967 E2B <>
SYMPTOM_TEXT 2527331 005570 <>
SYMPTOM_TEXT 1591074 VigiAccess <>
SYMPTOM_TEXT 2561981 E2B <>
SYMPTOM_TEXT 1609847 INC.-MOD-2021-083099:Kelly Thorn <> INC.-MOD-2021-083099
SYMPTOM_TEXT 1518211 Campbell <>
SYMPTOM_TEXT 2234679 over 3000 <> a lot of money
SYMPTOM_TEXT 2667063 <> with Decadron We will follow-up his outpatient cardiologist Additionally patient found to have iron deficiency anemia Started on oral and recommending endoscopic evaluation Assessment Plan Syncope No obvious cause Possible contributors are mild dehydration covid infection Low suspicion that this is a reflection of symptomatic bradycardia or other arrhythmia Patient reports syncopal episode after driving work preceding symptoms EKG sinus first-degree AV block similar prior Baseline heart rate seems be around 50 60 He metoprolol Monitor telemetry few PVCs V tach NSVT reported more likely artifact per cardiology TTE findings below Overall unremarkable Encourage hydration NSTEMI Suspect type II secondary demand ischemia episode/COVID Coronary calcifications noted CTPA so possibly some atherosclerosis component history CAD however does follow Dr at hospital for unknown reasons Denies ischemic work-up Presented Denied chest pain Troponin 15>34>78>91>83 changes 8/28 LVEF 5 no wall motion abnormality left atrial enlargement Pulmonary hypertension RVSP 58 heparin outlying ED Continue ASA statin Cardiology following further inpatient Can Lexiscan Cardiolite stress test once out COVID isolation Acute hypoxic respiratory failure resolved Symptom onset 7/26 event has cough diarrhea Per family was having vaccinated Tested positive dexamethasone 6 mg IV x 10 days 7/27 outlier Prn nebs mucinex been oxygenating well room air 24 without any than continue discharge HTN Chronic BP stable HLD atorvastatin fenofibrate lipid therapy current dose May need increase pending Will defer OSA Noted pulmonary echo Compliant CPAP On armodafinil
SYMPTOM_TEXT 2527332 005570 <>
SYMPTOM_TEXT 2176517 Clinic/Veterans Administration <> Clinic/Administration
SYMPTOM_TEXT 1522072 Campbell <>
SYMPTOM_TEXT 2545207 <> Other Case identifier(s US-MLMSERVICE-20221226-4002044-1 NZ-Adis-3743469-322644 US-MYLANLABS-2022M1140938
VAX_DOSE_SERIES 2545207 UNK|UNK <> UNK
VAX_DOSE_SERIES 1591074 UNK|UNK|UNK <> UNK
VAX_LOT 2 cells of trivial non-letter differences ignored
VAX_MANU 2545207 Pfizer-BionT|Unknown <> Pfizer-BionT
VAX_MANU 1591074 Pfizer-BionT|Unknown|Unknown <> Pfizer-BionT
VAX_NAME 2545207 C19 Pfizer-BionT|Not Specified NO BRAND NAME <> C19 Pfizer-BionT
VAX_NAME 1591074 C19 Pfizer-BionT|Not Specified NO BRAND NAME|Not Specified NO BRAND NAME <> C19 Pfizer-BionT
VAX_ROUTE 1 cell of trivial non-letter differences ignored
VAX_ROUTE 1591074 OT||OT <> OT
VAX_SITE 2 cells of trivial non-letter differences ignored
VAX_TYPE 2545207 COVID19|UNK <> COVID19
VAX_TYPE 1591074 COVID19|UNK|UNK <> COVID19
symptom_entries 1522261 _|_Pain_|_ <>
symptom_entries 1534343 _|_Restlessness_|_ <>
symptom_entries 1499310 _|_Overdose_|_ <>
symptom_entries 1531006 _|_Burning sensation_|_Paraesthesia_|_Skin irritation_|_ <>
symptom_entries 1492240 _|_Infant irritability_|_ <>
symptom_entries 1487652 _|_Inappropriate schedule of product administration_|_ <>
symptom_entries 1534522 _|_Maternal exposure during pregnancy_|_ <>
symptom_entries 1522159 _|_Feeling hot_|_Pruritus_|_Vaccination site erythema_|_Vaccination site pain_|_Vaccination site swelling_|_ <>
symptom_entries 1522459 _|_Immediate post-injection reaction_|_ <>
symptom_entries 1524872 _|_Incomplete course of vaccination_|_ <>
symptom_entries 1530652 _|_Panic attack_|_ <>
11 columns altered
31367 modified reports on 2023-09-15
Writing ... vaers_changes/2023-09-15_VAERS_CHANGES.csv
1 report with the most (18) records/lots/doses: 1900339
1 comparison done

Doing stats
open stats.csv ... ok
column changes: {'SYMPTOM_TEXT': 24, 'symptom_entries': 11, 'VAX_NAME': 2, 'VAX_MANU': 2, 'OTHER_MEDS': 2, 'VAX_TYPE': 2, 'VAX_DOSE_SERIES': 2, 'VAX_ROUTE': 1, 'ER_VISIT': 0, 'DATEDIED': 0, 'ALLERGIES': 0, 'ONSET_DATE': 0, 'V_ADMINBY': 0, 'BIRTH_DEFECT': 0, 'HOSPITAL': 0, 'VAX_LOT': 0, 'V_FUNDBY': 0, 'SEX': 0, 'VAX_DATE': 0, 'X_STAY': 0, 'CAGE_MO': 0, 'RECOVD': 0, 'FORM_VERS': 0, 'HISTORY': 0, 'HOSPDAYS': 0, 'L_THREAT': 0, 'TODAYS_DATE': 0, 'NUMDAYS': 0, 'RECVDATE': 0, 'VAX_SITE': 0, 'OFC_VISIT': 0, 'ER_ED_VISIT': 0, 'DIED': 0, 'AGE_YRS': 0, 'PRIOR_VAX': 0, 'SPLTTYPE': 0, 'CAGE_YR': 0, 'STATE': 0, 'CUR_ILL': 0, 'RPT_DATE': 0, 'LAB_DATA': 0, 'DISABLE': 0}

This week
0 delayed/late/gapfill
17 deleted
0 restored
6 cell edits trivial not printed
46 cell edits significant
0 cells emptied entirely
24 writeups changed

All time
542236 delayed/late/gapfill
31363 deleted
15 restored
29372270 cell edits trivial not printed
30948 cell edits significant
1475724 cells emptied entirely
7582 writeups changed
30489 never published [896713, 896742, 896875, 896892, 896899, 897093, 897105, 897169 ... 2682235, 2682237, 2682244, 2682264, 2682266, 2682268, 2682279, 2682284]
20 reports cleared of duplicate sentences within them

0 hr 37.6 min This week None
0 hr 37.6 min Overall None
Saving vaers_changes/2023-09-15_VAERS_CHANGES_A.csv, 1048575 rows and vaers_changes/2023-09-15_VAERS_CHANGES_B.csv, 576183 rows
No more to do, last set 2023-09-15 >= 2023-09-15 done
0 hr 40.4 min
Done with vaers_flatfile_build.py at line 2375, clock time 2023-09-22 14:20:12.082060
- - - - - - - - - - - - - - - - - - - - - - - -
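
The log above reports IDs that are new or deleted between two drops, and IDs missing from the expected range ("gaps" / "never published"). The following is a minimal sketch of how such lists could be derived with set operations. It is not taken from vaers_flatfile_build.py: the function names `compare_drops` and `find_gaps` are hypothetical, and the IDs are toy values rather than real VAERS_IDs.

```python
def compare_drops(previous_ids, current_ids):
    """Return IDs that appear only in the current drop (new) and
    IDs that appear only in the previous drop (deleted)."""
    previous, current = set(previous_ids), set(current_ids)
    return {
        "new": sorted(current - previous),
        "deleted": sorted(previous - current),
    }


def find_gaps(ids_ever_seen, lo, hi):
    """Return every ID in the inclusive range lo..hi that was never seen."""
    seen = set(ids_ever_seen)
    return [vid for vid in range(lo, hi + 1) if vid not in seen]


# Toy data for illustration only (assumed, not real VAERS_IDs).
previous = [1, 2, 5]
current = [1, 5, 6, 8]

result = compare_drops(previous, current)
gaps = find_gaps(set(previous) | set(current), lo=1, hi=8)

print(result["new"])      # IDs first seen in the current drop
print(result["deleted"])  # IDs present before but absent now
print(gaps)               # IDs in range never seen in either drop
```

In this sketch a gap is counted against every ID ever seen, not only the current drop, so an ID that was published once and later deleted is reported as deleted rather than as a gap.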