vaers_flatfile_build.py
run_all() ...
validate_dirs_and_files() ...
The expected working directory for processing does not exist, creating vaers_working
140 drops in input to process
First (oldest) input: ../Download/ALL_VAERS_DROPS/2020-12-18_VAERS_CSV.zip
Last (newest) input: ../Download/ALL_VAERS_DROPS/2023-08-25_AllVAERSDataCSVS.zip
Already processed files do appear in vaers_changes and the latest will be built upon:
vaers_changes/2020-12-18_VAERS_CHANGES.csv
vaers_changes/2020-12-25_VAERS_CHANGES.csv
vaers_changes/2021-01-08_VAERS_CHANGES.csv
vaers_changes/2021-01-15_VAERS_CHANGES.csv
vaers_changes/2021-01-22_VAERS_CHANGES.csv
... 139 total
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Next date 2023-08-25
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
unzip ../Download/ALL_VAERS_DROPS/2023-08-25_AllVAERSDataCSVS.zip
Creating in ./vaers_working/ date marker file 2023-08-25
Consolidation
Concatenating files, *VAERSDATA.csv, *VAERSVAX.csv, *VAERSSYMPTOMS.csv
open vaers_working\2020VAERSDATA.csv ... Highest VAERS_ID 918561
open vaers_working\2021VAERSDATA.csv ... Highest VAERS_ID 2675503
open vaers_working\2022VAERSDATA.csv ... Highest VAERS_ID 2675907
open vaers_working\2023VAERSDATA.csv ... Highest VAERS_ID 2676388
open vaers_working\NonDomesticVAERSDATA.csv ... Highest VAERS_ID 2676359
open vaers_working\2020VAERSVAX.csv ... Highest VAERS_ID 918561
open vaers_working\2021VAERSVAX.csv ... Highest VAERS_ID 2675503
open vaers_working\2022VAERSVAX.csv ... Highest VAERS_ID 2675907
open vaers_working\2023VAERSVAX.csv ... Highest VAERS_ID 2676388
open vaers_working\NonDomesticVAERSVAX.csv ... Highest VAERS_ID 2676359
4759 exact duplicates dropped in concatenated files, now 2021504 rows
open vaers_working\2020VAERSSYMPTOMS.csv ... Highest VAERS_ID 918561
open vaers_working\2021VAERSSYMPTOMS.csv ... Highest VAERS_ID 2675503
open vaers_working\2022VAERSSYMPTOMS.csv ... Highest VAERS_ID 2675907
open vaers_working\2023VAERSSYMPTOMS.csv ... Highest VAERS_ID 2676388
open vaers_working\NonDomesticVAERSSYMPTOMS.csv ... Highest VAERS_ID 2676359
133358 records removed prior to the first covid report (covid_earliest_vaers_id 896636)
1716830 reports to work with (unique VAERS_IDs)
week_vids_present [896750, 896754, 896765, 896766, 896897, 896637, 896638, 896639 ... 2676104, 2676105, 2676106, 2676253, 2676254, 2676255, 2676358, 2676359]
hi_all_never_published 2676375
lo_ever 896637
hi_this_week 2676388
list_range_all_ever [896637, 896638, 896639, 896640, 896641, 896642, 896643, 896644 ... 2676381, 2676382, 2676383, 2676384, 2676385, 2676386, 2676387, 2676388]
list_range_week_only [2676376, 2676377, 2676378, 2676379, 2676380, 2676381 ... 2676383, 2676384, 2676385, 2676386, 2676387, 2676388]
gaps_filled ...
week_gaps_new ...
remedied_past_all_never_published [896713, 896742, 896875, 896892, 896899, 897093, 897105, 897169 ... 2676023, 2676024, 2676117, 2676295, 2676305, 2676311, 2676363, 2676375]
all_never_published [896713, 896742, 896875, 896892, 896899, 897093, 897105, 897169 ... 2676023, 2676024, 2676117, 2676295, 2676305, 2676311, 2676363, 2676375]
VAERS_IDs 896637 to 2676388 expected 1779752 all_ever 1749350 gaps 30434
126865 dropped in df_data due to no covid VAX_TYPE involved in the report
160397 dropped in df_vax due to no covid VAX_TYPE involved in the report
156969 dropped in df_syms due to no covid VAX_TYPE involved in the report
1589965 covid reports to work with
Repeat sentence removal in SYMPTOM_TEXT, showing each next larger if any (takes time)
1281 SYMPTOM_TEXT field repeat sentences deduped in 189 reports, max difference 6192 bytes in VAERS_ID 1645697
Shortening some field values in VAX_NAME, VAX_MANU
Merging DATA into VAX
1673835 rows in df_data_vax
Aggregating symptoms into symptom_entries string, new column
Combining symptoms column items.
Grouping by VAERS_ID ...
Appending each symptom in new column called symptom_entries
Cleaning multiple delimiters due to empty columns
Merging symptom_entries into df_data_vax
1673835 rows in df_data_vax_syms_consolidated
Saving result into one file: vaers_consolidated/2023-08-25_VAERS_CONSOLIDATED.csv
Consolidation of 2023-08-25 done
Flattening
Aggregate/flatten VAX items. Grouping by VAERS_ID
1589965 rows in df_vax_flat
Merging DATA into VAX flattened
1589965 rows in df_data_vax_flat
Merging symptom_entries into df_data_vax_syms_flat
Saving result into one file: vaers_flattened/2023-08-25_VAERS_FLATTENED.csv
1589965 rows in vaers_flattened/2023-08-25_VAERS_FLATTENED.csv
Flattening of 2023-08-25 done
open vaers_flattened/2023-08-18_VAERS_FLATTENED.csv ... Highest VAERS_ID 2673233
Using flat 2023-08-25 already in memory, 1589965 rows
Previous changes file for changes, cell_edits and status columns
open vaers_changes/2023-08-18_VAERS_CHANGES.csv ... Highest VAERS_ID 2673233
Comparing 2023-08-18 v. 2023-08-25
1589965 this drop total covid
1587560 previous total covid
1587485 identical set aside
2480 this drop to work with
75 previous to work with
2405 difference
2429 new in 2023-08-25
0 delayed this week
24 deleted this week kept
0 restored this week
Column value changes
DATEDIED 3 cells of trivial non-letter differences ignored
2 duplicates dropped in df_three_columns
3 DIED [] <> Y [1458652, 1687005, 1729430]
1 duplicate dropped in df_three_columns
2 OTHER_MEDS [] <> HUMIRA [2187147, 2439994]
SYMPTOM_TEXT 1622983 Takeda Pharmaceuticals <> a regulatory authority
SYMPTOM_TEXT 2502731 white <>
SYMPTOM_TEXT 1625758 RSI <> index
SYMPTOM_TEXT 1789606 privacy <> a
SYMPTOM_TEXT 1773210 Japan PMDA <> RA
SYMPTOM_TEXT 1878853 6168537 <>
SYMPTOM_TEXT 1878855 Health 6168537 <> regulatory
SYMPTOM_TEXT 1883769 6168537 PGS Puurs NTM <>
SYMPTOM_TEXT 1878854 6168537 <>
SYMPTOM_TEXT 1690985 PHIFDA with unspecified race and ethnicity <>
SYMPTOM_TEXT 1730675 of an unspecified race and ethnic origin <>
SYMPTOM_TEXT 1743220 JCS JCS1-1 <> CS CS1-1
SYMPTOM_TEXT 1773214 PMDA <> RA
SYMPTOM_TEXT 1577037 PHIFDA <>
SYMPTOM_TEXT 1763553 PMDA <> RA
SYMPTOM_TEXT 1763829 LITERATURE REFERENCE Acute following COVID-19 vaccination Vaccines 2021;9(9):1008 FTA SARA <>
SYMPTOM_TEXT 1740497 JCS-1 <> CS-1
SYMPTOM_TEXT 1876161 RSI <> index
SYMPTOM_TEXT 1809887 Takeda Pharmaceuticals and <> regulatory authority
SYMPTOM_TEXT 1974333 for the following literature source(s Acute Retinal Necrosis from Reactivation of Varicella Zoster Virus BNT162b2 mRNA COVID-19 Vaccination Ocular Immunology and Inflammation 2021 DOI:10.1080/09273948.2021.2001540 <>
SYMPTOM_TEXT 1630797 PHIFDA <> RA
SYMPTOM_TEXT 1810018 PHIFDA <> RA
SYMPTOM_TEXT 1787689 MAH <> RA
SYMPTOM_TEXT 1743233 PMDA MAH <> RA
SYMPTOM_TEXT 2237795 for the following source(s COVID-19 vaccine mRNA BNT162b2 and infection-induced thrombotic thrombocytopenic purpura in adolescents Pediatric Blood Cancer 2022 DOI:10.1002/pbc.29681 source entitled from DOI 10.1002/pbc.29681 <>
SYMPTOM_TEXT 2439953 for the following literature source(s Recurrent non-traumatic non-exertional rhabdomyolysis after immunologic stimuli in a healthy adolescent female case report <>
SYMPTOM_TEXT 1878857 6168537 <>
SYMPTOM_TEXT 1660621 COVID-19 Adverse Event Self-Reporting Solution football Shimizu S-Pulse team boxed general Pfizer <> regulatory authority sports
SYMPTOM_TEXT 2490886 005570 <>
SYMPTOM_TEXT 1442946 COVAES <> online portal
SYMPTOM_TEXT 1838418 PMDA <> RA
SYMPTOM_TEXT 1729430 Teva pharmaceuticals <>
VAX_DOSE_SERIES 2187147 1|UNK <> 1
VAX_DOSE_SERIES 2439994 2|UNK <> 2
VAX_DOSE_SERIES 1847306 3|UNK <> 3
VAX_LOT 1 cell of trivial non-letter differences ignored
VAX_LOT 2187147 |1153187 <> []
VAX_LOT 1847306 FG7387|Unknown <> FG7387
2 duplicates dropped in df_three_columns
3 VAX_MANU Pfizer-BionT|Unknown <> Pfizer-BionT [1847306, 2187147, 2439994]
2 duplicates dropped in df_three_columns
3 VAX_NAME C19 Pfizer-BionT|Not Specified NO BRAND NAME <> C19 Pfizer-BionT [1847306, 2187147, 2439994]
2 duplicates dropped in df_three_columns
3 VAX_ROUTE OT|OT <> OT [1847306, 2187147, 2439994]
VAX_SITE 3 cells of trivial non-letter differences ignored
2 duplicates dropped in df_three_columns
3 VAX_TYPE COVID19|UNK <> COVID19 [1847306, 2187147, 2439994]
symptom_entries 1479761 _|_Exposure during pregnancy_|_ <>
symptom_entries 1480298 _|_Maternal exposure before pregnancy_|_ <>
symptom_entries 1469186 _|_Anaphylactic reaction_|_Anaphylactic shock_|_ <>
symptom_entries 1488307 _|_Loss of consciousness_|_ <>
symptom_entries 1507922 _|_Joint swelling_|_Pyrexia_|_ <>
symptom_entries 1493155 _|_Bedridden_|_ <>
symptom_entries 1336418 _|_Blood pressure measurement_|_Body temperature_|_Loss of consciousness_|_Overdose_|_ <>
symptom_entries 1497550 _|_Dyschezia_|_Dysuria_|_Gait disturbance_|_Movement disorder_|_ <>
symptom_entries 1492305 _|_Vaccination failure_|_ <>
symptom_entries 1491074 _|_Drug ineffective_|_ <>
symptom_entries 1836690 _|_Extra dose administered_|_ <>
symptom_entries 1477659 _|_Foetal exposure during pregnancy_|_ <>
symptom_entries 1373066 _|_Overdose_|_ <>
symptom_entries 1497387 _|_Pain in extremity_|_ <>
12 columns altered
31290 modified reports on 2023-08-25
Writing ... vaers_changes/2023-08-25_VAERS_CHANGES.csv
1 report with the most (18) records/lots/doses: 1900339
1 comparison done
Doing stats
open stats.csv ... ok
column changes: {'SYMPTOM_TEXT': 32, 'symptom_entries': 14, 'VAX_TYPE': 3, 'VAX_NAME': 3, 'VAX_ROUTE': 3, 'VAX_DOSE_SERIES': 3, 'DIED': 3, 'VAX_MANU': 3, 'VAX_LOT': 2, 'OTHER_MEDS': 2, 'RPT_DATE': 0, 'VAX_SITE': 0, 'TODAYS_DATE': 0, 'ALLERGIES': 0, 'STATE': 0, 'ER_ED_VISIT': 0, 'PRIOR_VAX': 0, 'ONSET_DATE': 0, 'RECVDATE': 0, 'DISABLE': 0, 'LAB_DATA': 0, 'BIRTH_DEFECT': 0, 'NUMDAYS': 0, 'AGE_YRS': 0, 'DATEDIED': 0, 'V_FUNDBY': 0, 'FORM_VERS': 0, 'SEX': 0, 'SPLTTYPE': 0, 'VAX_DATE': 0, 'CAGE_MO': 0, 'V_ADMINBY': 0, 'HISTORY': 0, 'CUR_ILL': 0, 'HOSPITAL': 0, 'L_THREAT': 0, 'ER_VISIT': 0, 'RECOVD': 0, 'OFC_VISIT': 0, 'CAGE_YR': 0, 'X_STAY': 0, 'HOSPDAYS': 0}
This week
0 delayed/late/gapfill
24 deleted
0 restored
7 cell edits trivial not printed
68 cell edits significant
1 cells emptied entirely
32 writeups changed
All time
542235 delayed/late/gapfill
31278 deleted
15 restored
29372252 cell edits trivial not printed
30802 cell edits significant
1475724 cells emptied entirely
7492 writeups changed
30434 never published [896713, 896742, 896875, 896892, 896899, 897093, 897105, 897169 ... 2676023, 2676024, 2676117, 2676295, 2676305, 2676311, 2676363, 2676375]
16 reports cleared of duplicate sentences within them
This week 0 hr 33.6 min
Overall 0 hr 33.6 min
Saving vaers_changes/2023-08-25_VAERS_CHANGES_A.csv, 1048575 rows and vaers_changes/2023-08-25_VAERS_CHANGES_B.csv, 572653 rows
No more to do, last set 2023-08-25 >= 2023-08-25
done
Done with vaers_flatfile_build.py at line 2359, clock time 2023-09-01 12:01:46.440507
- - - - - - - - - - - - - - - - - - - - - - - -
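The range figure in `VAERS_IDs 896637 to 2676388 expected 1779752` is inclusive: 2676388 - 896637 + 1 = 1779752. Below is a minimal sketch of how never-published IDs can be derived as a set difference. It assumes that VAERS_IDs are issued sequentially, so any integer missing from the range counts as unpublished. `expected_count` and `find_never_published` are hypothetical names, not functions from vaers_flatfile_build.py. Note that 1779752 - 1749350 = 30402, while the log reports `gaps 30434`. The script's own bookkeeping therefore evidently involves more than this single subtraction, and the sketch should not be expected to reproduce its counts.

```python
def expected_count(lo, hi):
    """Number of IDs in the inclusive range lo..hi."""
    return hi - lo + 1


def find_never_published(seen_ids, lo, hi):
    """Return, in ascending order, the IDs in lo..hi that never appeared.

    Hypothetical helper: assumes VAERS_IDs are issued sequentially, so every
    integer in the inclusive range is an "expected" report.
    """
    seen = set(seen_ids)
    return [vid for vid in range(lo, hi + 1) if vid not in seen]


if __name__ == "__main__":
    # Range endpoints taken from the log: lo_ever and hi_this_week.
    print(expected_count(896637, 2676388))                  # 1779752
    print(find_never_published([10, 11, 13, 16], 10, 16))  # [12, 14, 15]
```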
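The `Repeat sentence removal in SYMPTOM_TEXT` step reports `1281 SYMPTOM_TEXT field repeat sentences deduped in 189 reports`. The log does not say how the script detects sentences. The sketch below assumes a naive boundary (whitespace after `.`, `!` or `?`) and drops exact repeats while keeping the first occurrence of each. `dedupe_sentences` is a hypothetical name, and this is not necessarily the script's approach.

```python
import re

# Naive sentence boundary (assumption): whitespace preceded by . ! or ?
_SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def dedupe_sentences(text):
    """Drop exact repeated sentences, keeping the first occurrence of each.

    Returns (deduped_text, removed_count, bytes_saved). bytes_saved also
    counts any extra whitespace collapsed when the sentences are re-joined.
    """
    seen = set()
    kept = []
    removed = 0
    for sentence in _SENTENCE_BREAK.split(text.strip()):
        sentence = sentence.strip()
        if not sentence:
            continue
        if sentence in seen:
            removed += 1
            continue
        seen.add(sentence)
        kept.append(sentence)
    result = " ".join(kept)
    saved = len(text.encode("utf-8")) - len(result.encode("utf-8"))
    return result, removed, saved


if __name__ == "__main__":
    print(dedupe_sentences("Fever. Headache. Fever. Rash."))
    # ('Fever. Headache. Rash.', 1, 7)
```

A split on terminal punctuation also breaks at abbreviations such as "Dr.". A production version would likely need a more careful tokenizer.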
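The `symptom_entries` values in the change list have the form `_|_Anaphylactic reaction_|_Anaphylactic shock_|_`, and the log mentions `Cleaning multiple delimiters due to empty columns`. Below is a sketch of building such a string per report. It assumes that symptom rows are dicts (for example from `csv.DictReader`) with `SYMPTOM1` through `SYMPTOM5` columns, and that a report may span several rows. `aggregate_symptoms` is a hypothetical name. Empty cells are skipped up front here, so no doubled delimiters arise in the first place.

```python
from collections import defaultdict

DELIM = "_|_"
# Assumed column names for the symptom terms in a VAERSSYMPTOMS row.
SYMPTOM_COLS = [f"SYMPTOM{i}" for i in range(1, 6)]


def aggregate_symptoms(rows):
    """Map VAERS_ID -> '_|_A_|_B_|_' built from one or more symptom rows.

    Empty or missing SYMPTOMn cells are skipped, so the result never
    contains doubled delimiters. Terms keep their input order.
    """
    collected = defaultdict(list)
    for row in rows:
        vid = int(row["VAERS_ID"])
        for col in SYMPTOM_COLS:
            term = (row.get(col) or "").strip()
            if term:
                collected[vid].append(term)
    # Leading and trailing delimiters match the values printed in the log.
    return {vid: DELIM + DELIM.join(terms) + DELIM
            for vid, terms in collected.items()}


if __name__ == "__main__":
    rows = [{"VAERS_ID": "1469186", "SYMPTOM1": "Anaphylactic reaction",
             "SYMPTOM2": "Anaphylactic shock", "SYMPTOM3": ""}]
    print(aggregate_symptoms(rows)[1469186])
    # _|_Anaphylactic reaction_|_Anaphylactic shock_|_
```

One plausible reason for the outer delimiters is that a substring search for `_|_Overdose_|_` then matches whole terms only.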
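The row counts point to two different shapes of output. The consolidated file keeps one row per vaccine record (`1673835 rows in df_data_vax_syms_consolidated`). The flattened file has one row per report (`1589965 rows`, the same as the covid report count). Multi-vaccine values then appear pipe-joined in the change list, as in `FG7387|Unknown`, `OT|OT` and `|1153187`. Below is a sketch of that flattening. It assumes dict rows and an illustrative subset of VAX columns. `flatten_vax` is a hypothetical name.

```python
# Assumed subset of VAX columns; the real VAERSVAX file may differ.
VAX_COLS = ["VAX_TYPE", "VAX_MANU", "VAX_LOT", "VAX_DOSE_SERIES",
            "VAX_ROUTE", "VAX_SITE", "VAX_NAME"]


def flatten_vax(rows, cols=VAX_COLS):
    """Collapse several VAX rows per VAERS_ID into one row of '|'-joined values.

    Empty values are kept as empty strings, so the n-th item of every
    column still refers to the same vaccine record.
    """
    grouped = {}
    for row in rows:
        vid = int(row["VAERS_ID"])
        bucket = grouped.setdefault(vid, {c: [] for c in cols})
        for c in cols:
            bucket[c].append((row.get(c) or "").strip())
    return {vid: {c: "|".join(vals) for c, vals in bucket.items()}
            for vid, bucket in grouped.items()}


if __name__ == "__main__":
    rows = [
        {"VAERS_ID": "1847306", "VAX_TYPE": "COVID19", "VAX_LOT": "FG7387", "VAX_ROUTE": "OT"},
        {"VAERS_ID": "1847306", "VAX_TYPE": "UNK", "VAX_LOT": "Unknown", "VAX_ROUTE": "OT"},
    ]
    print(flatten_vax(rows, cols=["VAX_TYPE", "VAX_LOT", "VAX_ROUTE"]))
    # {1847306: {'VAX_TYPE': 'COVID19|UNK', 'VAX_LOT': 'FG7387|Unknown', 'VAX_ROUTE': 'OT|OT'}}
```

The `VAX_LOT 2187147 |1153187` entry in the log suggests that empty values are preserved in the same way: a leading pipe marks an empty first lot.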
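The comparison of 2023-08-18 against 2023-08-25 reports several figures: identical reports set aside (1587485), new (2429) and deleted (24) reports, then per-column value changes. The figures are self-consistent: 1589965 - 1587485 = 2480, 1587560 - 1587485 = 75, and 2429 - 24 = 2405. Below is a much-simplified sketch of such a comparison. It assumes that each drop is loaded as a dict mapping VAERS_ID to a dict of column values. `compare_drops` is a hypothetical name. The sketch omits the script's handling of trivial non-letter differences and of delayed and restored reports.

```python
def compare_drops(prev, curr):
    """Compare two weekly drops keyed by VAERS_ID.

    prev, curr: dict mapping VAERS_ID -> dict of column -> value.
    Returns (new_ids, deleted_ids, identical_count, changes), where changes
    is a list of (column, VAERS_ID, old_value, new_value) tuples.
    """
    new_ids = sorted(curr.keys() - prev.keys())
    deleted_ids = sorted(prev.keys() - curr.keys())
    identical = 0
    changes = []
    for vid in sorted(prev.keys() & curr.keys()):
        old_row, new_row = prev[vid], curr[vid]
        if old_row == new_row:
            identical += 1  # unchanged report: set aside, nothing to diff
            continue
        # Missing columns compare as empty strings.
        for col in sorted(old_row.keys() | new_row.keys()):
            old, new = old_row.get(col, ""), new_row.get(col, "")
            if old != new:
                changes.append((col, vid, old, new))
    return new_ids, deleted_ids, identical, changes


if __name__ == "__main__":
    prev = {1: {"DIED": ""}, 2: {"DIED": ""}, 3: {"X": "a"}}
    curr = {1: {"DIED": "Y"}, 2: {"DIED": ""}, 4: {"X": "b"}}
    print(compare_drops(prev, curr))
    # ([4], [3], 1, [('DIED', 1, '', 'Y')])
```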
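The final save splits the changes file into `_A` and `_B` parts of 1048575 and 572653 rows, 1621228 data rows in total. 1048575 is one less than 1048576, the worksheet row limit in Excel 2007 and later. This suggests each part is sized so that a header row plus its data fits in one sheet, although that is an inference and the log does not state it. Below is a sketch of such a split. `write_in_parts` and its `base_path` argument are hypothetical.

```python
import csv
import string

EXCEL_MAX_ROWS = 1_048_576  # worksheet row limit in Excel 2007 and later


def write_in_parts(header, rows, base_path, max_data_rows=EXCEL_MAX_ROWS - 1):
    """Write rows to base_path_A.csv, base_path_B.csv, ... with a header in each.

    Returns (counts, paths): data rows written per part and the part paths.
    Part letters come from A-Z, so at most 26 parts are supported.
    """
    counts, paths = [], []
    fh = writer = None
    try:
        for row in rows:
            # Start a new part before the first row or when the current one is full.
            if writer is None or counts[-1] == max_data_rows:
                if fh is not None:
                    fh.close()
                path = f"{base_path}_{string.ascii_uppercase[len(counts)]}.csv"
                fh = open(path, "w", newline="", encoding="utf-8")
                writer = csv.writer(fh)
                writer.writerow(header)
                counts.append(0)
                paths.append(path)
            writer.writerow(row)
            counts[-1] += 1
    finally:
        if fh is not None:
            fh.close()
    return counts, paths
```

With the default limit, 1621228 data rows would come out as parts of 1048575 and 572653 rows, matching the two files named in the log.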