vaers_flatfile_build.py
run_all() ...
validate_dirs_and_files() ...
1 drops in input to process
First (oldest) input: ../Download/ALL_VAERS_DROPS/2020-12-18_VAERS_CSV.zip
Last (newest) input: ../Download/ALL_VAERS_DROPS/2023-09-01_AllVAERSDataCSVS.zip
Already processed files do appear in vaers_changes and the latest will be built upon:
vaers_changes/2020-12-18_VAERS_CHANGES.csv
vaers_changes/2020-12-25_VAERS_CHANGES.csv
vaers_changes/2021-01-08_VAERS_CHANGES.csv
vaers_changes/2021-01-15_VAERS_CHANGES.csv
vaers_changes/2021-01-22_VAERS_CHANGES.csv
... 140 total
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Next date 2023-09-01
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
unzip ../Download/ALL_VAERS_DROPS/2023-09-01_AllVAERSDataCSVS.zip
Creating in ./vaers_working/ date marker file 2023-09-01
Consolidation
Concatenating files, *VAERSDATA.csv, *VAERSVAX.csv, *VAERSSYMPTOMS.csv
open vaers_working\2020VAERSDATA.csv ... Highest VAERS_ID 918561
open vaers_working\2021VAERSDATA.csv ... Highest VAERS_ID 2675503
open vaers_working\2022VAERSDATA.csv ... Highest VAERS_ID 2678082
open vaers_working\2023VAERSDATA.csv ... Highest VAERS_ID 2678496
open vaers_working\NonDomesticVAERSDATA.csv ... Highest VAERS_ID 2678495
open vaers_working\2020VAERSVAX.csv ... Highest VAERS_ID 918561
open vaers_working\2021VAERSVAX.csv ... Highest VAERS_ID 2675503
open vaers_working\2022VAERSVAX.csv ... Highest VAERS_ID 2678082
open vaers_working\2023VAERSVAX.csv ... Highest VAERS_ID 2678496
open vaers_working\NonDomesticVAERSVAX.csv ... Highest VAERS_ID 2678495
4770 exact duplicates dropped in concatenated files, now 2024080 rows
open vaers_working\2020VAERSSYMPTOMS.csv ... Highest VAERS_ID 918561
open vaers_working\2021VAERSSYMPTOMS.csv ... Highest VAERS_ID 2675503
open vaers_working\2022VAERSSYMPTOMS.csv ... Highest VAERS_ID 2678082
open vaers_working\2023VAERSSYMPTOMS.csv ... Highest VAERS_ID 2678496
open vaers_working\NonDomesticVAERSSYMPTOMS.csv ... Highest VAERS_ID 2678495
133356 records removed prior to the first covid report (covid_earliest_vaers_id 896636)
1718916 reports to work with (unique VAERS_IDs)
week_vids_present [896750, 896754, 896765, 896766, 896897, 896637, 896638, 896639 ... 2678466, 2678467, 2678468, 2678469, 2678483, 2678484, 2678485, 2678486]
hi_all_never_published 2676375
lo_ever 896636
hi_this_week 2678496
list_range_all_ever [896636, 896637, 896638, 896639, 896640, 896641, 896642, 896643 ... 2678489, 2678490, 2678491, 2678492, 2678493, 2678494, 2678495, 2678496]
list_range_week_only [2676376, 2676377, 2676378, 2676379, 2676380, 2676381, 2676382, 2676383 ... 2678489, 2678490, 2678491, 2678492, 2678493, 2678494, 2678495, 2678496]
gaps_filled [2618858, 2641434, 2653946, 2653955, 2654774, 2657398, 2659683, 2662267 ... 2675307, 2675319, 2675320, 2675507, 2675508, 2676023, 2676024, 2676375]
week_gaps_new [2676421, 2676519, 2676817, 2676883, 2676925, 2676969, 2676970, 2677016 ... 2678227, 2678279, 2678296, 2678307, 2678373, 2678436, 2678449, 2678490]
remedied_past_all_never_published [896713, 896742, 896875, 896892, 896899, 897093, 897105, 897169 ... 2675958, 2675960, 2676015, 2676117, 2676295, 2676305, 2676311, 2676363]
all_never_published [896713, 896742, 896875, 896892, 896899, 897093, 897105, 897169 ... 2678227, 2678279, 2678296, 2678307, 2678373, 2678436, 2678449, 2678490]
VAERS_IDs 896637 to 2678496
expected 1781861
all_ever 1751482
gaps 30410
127672 dropped in df_data due to no covid VAX_TYPE involved in the report
161520 dropped in df_vax due to no covid VAX_TYPE involved in the report
157947 dropped in df_syms due to no covid VAX_TYPE involved in the report
1591244 covid reports to work with
Repeat sentence removal in SYMPTOM_TEXT, showing each next larger if any (takes time)
799 SYMPTOM_TEXT field repeat sentences deduped in 143 reports, max difference 6192 bytes in VAERS_ID 1645697
Shortening some field values in VAX_NAME, VAX_MANU
Merging DATA into VAX
1675290 rows in df_data_vax
Aggregating symptoms into symptom_entries string, new column
Combining symptoms column items.
Grouping by VAERS_ID ...
Appending each symptom in new column called symptom_entries
Cleaning multiple delimiters due to empty columns
Merging symptom_entries into df_data_vax
1675290 rows in df_data_vax_syms_consolidated
Saving result into one file: vaers_consolidated/2023-09-01_VAERS_CONSOLIDATED.csv
Consolidation of 2023-09-01 done
Flattening
Aggregate/flatten VAX items.
Grouping by VAERS_ID
1591244 rows in df_vax_flat
Merging DATA into VAX flattened
1591244 rows in df_data_vax_flat
Merging symptom_entries into df_data_vax_syms_flat
Saving result into one file: vaers_flattened/2023-09-01_VAERS_FLATTENED.csv
1591244 rows in vaers_flattened/2023-09-01_VAERS_FLATTENED.csv
Flattening of 2023-09-01 done
open vaers_flattened/2023-08-25_VAERS_FLATTENED.csv ... Highest VAERS_ID 2676388
Using flat 2023-09-01 already in memory, 1591244 rows
Previous changes file for changes, cell_edits and status columns
open vaers_changes/2023-08-25_VAERS_CHANGES.csv ... Highest VAERS_ID 2676388
Comparing 2023-08-25 v. 2023-09-01
1591244 this drop total covid
1589965 previous total covid
1589851 identical set aside
1393 this drop to work with
114 previous to work with
1279 difference
1323 new in 2023-09-01
1 delayed this week
45 deleted this week kept
0 restored this week
Column value changes
DIED 2671577 [] <> Y
OTHER_MEDS 2527375 [] <> LENVIMA
SYMPTOM_TEXT 2498744 damn <> dang
SYMPTOM_TEXT 1625768 PHIFDA of unknown race ethnicity RSI <> regulatory authority the
SYMPTOM_TEXT 1507349 Asahi Kasei Pharma potassium Ohara Pharmaceutical Co Ltd Company Sawai Company)from <>
SYMPTOM_TEXT 1593733 Moderna PMDA <> company regulatory authority
SYMPTOM_TEXT 2512116 White <>
SYMPTOM_TEXT 1800066 PHIFDA Central vigilance MAC <> RA the
SYMPTOM_TEXT 2538610 Comparative Effectiveness of Moderna Pfizer-BioNTech and Janssen Johnson Vaccines in Preventing COVID-19 Hospitalizations Among Adults Without Immunocompromising Conditions March August 2021 <>
SYMPTOM_TEXT 1640657 PHIFDA <> RA
SYMPTOM_TEXT 1672028 PHIFDA of unspecified race and ethnic origin <>
SYMPTOM_TEXT 1590587 PHIFDA,PH-PHFDA-30009787 of unspecified race and ethnicity <> regulatory authority PH-PHFDA-30009787
SYMPTOM_TEXT 1664959 PHIFDA <>
SYMPTOM_TEXT 1851727 COVAES <> online
SYMPTOM_TEXT 1654891 of unknown race and ethnicity <>
SYMPTOM_TEXT 1669165 PHIFDA with unspecified unknown race and ethnicity <>
SYMPTOM_TEXT 1642699 with an unspecified race and ethnic origin <>
SYMPTOM_TEXT 1767937 Clinic/Veterans Administration facility <> Clinic/facility
SYMPTOM_TEXT 1743181 Pharmaceuticals <> regulatory authority
SYMPTOM_TEXT 1743182 Pharmaceuticals <> regulatory authority
SYMPTOM_TEXT 1462807 COVID-19 Adverse Event Self-Reporting <> the regulatory authority
SYMPTOM_TEXT 1625769 PHIFDA of unspecified race ethnic origin RSI <> regulatory authority the
SYMPTOM_TEXT 1755556 PHIFDA race ethnicity RSI <> the regulatory authority
SYMPTOM_TEXT 1504406 COVID-19 Adverse Event Self-Reporting Solution and from <>
SYMPTOM_TEXT 1690829 Pfizer <>
SYMPTOM_TEXT 1554749 coupon golf <> paperwork sport game
SYMPTOM_TEXT 1479919 PRIVACY <> local
SYMPTOM_TEXT 1784543 Pfizer-sponsored <>
SYMPTOM_TEXT 1669815 PHIFDA of unspecified race and ethnic origin <>
SYMPTOM_TEXT 1672033 PHIFDA with an unspecified race and ethnic origin <>
SYMPTOM_TEXT 1547161 PHIFDA of unspecified race and ethnic origin <> regulatory authority
SYMPTOM_TEXT 1571525 PHIFDA <> RA
SYMPTOM_TEXT 1577127 RSI <> the regulatory authority
SYMPTOM_TEXT 2514857 159558 <>
SYMPTOM_TEXT 1771048 Pharmaceuticals Reference A <> the RA(Reference RA
SYMPTOM_TEXT 1708017 SAE <> AE
SYMPTOM_TEXT 2541359 and Person Nbr as 44239 <>
SYMPTOM_TEXT 2543206 as 6556501 <>
SYMPTOM_TEXT 2513689 Phase 3 Randomized Stratified Observer-Blind Placebo-Controlled <>
SYMPTOM_TEXT 1817009 PHIFDA <>
SYMPTOM_TEXT 1496570 COVAES <>
SYMPTOM_TEXT 1554804 Takeda <> the regulatory authority
SYMPTOM_TEXT 2548790 Select Regional <> Center
SYMPTOM_TEXT 1517493 Pharmaceuticals Moderna orthopedic surgery general <> regulatory authority company
SYMPTOM_TEXT 1717466 Pfizer <>
SYMPTOM_TEXT 1763473 GENERAL INVESTIGATION TARGETING THE VACCINES HEALTH CARE PROVIDERS HCPS WHO ARE VACCINATED IN EARLY POST-APPROVAL PHASE FOLLOW-UP STUDY C4591006 <>
SYMPTOM_TEXT 1763556 to Takeda MAH <> RA
SYMPTOM_TEXT 1500157 Pfizer <>
SYMPTOM_TEXT 1688210 Pharmaceuticals and Medical Devices Agency PMDA <> regulatory authority
SYMPTOM_TEXT 1676725 PRIVACY <> a
SYMPTOM_TEXT 1533644 privacy <>
SYMPTOM_TEXT 2501320 159558 <>
SYMPTOM_TEXT 1772642 COVAES <> the regulatory authority
SYMPTOM_TEXT 1499939 Medical Devices Agency MDA <> regulatory authority RA
SYMPTOM_TEXT 1782171 patient herself <>
SYMPTOM_TEXT 1892100 in United Kingdom <>
SYMPTOM_TEXT 1743244 NCSP 002496 <>
VAX_DOSE_SERIES 1708209 1|UNK <> 1
VAX_DOSE_SERIES 1507349 UNK|1 <> 1
VAX_DOSE_SERIES 1886778 2|UNK <> 2
VAX_DOSE_SERIES 2527375 UNK|UNK <> UNK
VAX_LOT 3 cells of trivial non-letter differences ignored
VAX_LOT 1886778 FE1573|Unknown <> FE1573
2 duplicates dropped in df_three_columns
3 VAX_MANU Pfizer-BionT|Unknown <> Pfizer-BionT [1708209, 1886778, 2527375]
VAX_MANU 1507349 Unknown|Pfizer-BionT <> Pfizer-BionT
2 duplicates dropped in df_three_columns
3 VAX_NAME C19 Pfizer-BionT|Not Specified NO BRAND NAME <> C19 Pfizer-BionT [1708209, 1886778, 2527375]
VAX_NAME 1507349 Not Specified NO BRAND NAME|C19 <> C19
VAX_ROUTE 2 cells of trivial non-letter differences ignored
1 duplicate dropped in df_three_columns
2 VAX_ROUTE OT|OT <> OT [1507349, 1886778]
VAX_SITE 4 cells of trivial non-letter differences ignored
2 duplicates dropped in df_three_columns
3 VAX_TYPE COVID19|UNK <> COVID19 [1708209, 1886778, 2527375]
VAX_TYPE 1507349 UNK|COVID19 <> COVID19
symptom_entries 1508421 _|_Muscle spasms_|_ <>
symptom_entries 1511731 _|_Swelling_|_ <>
symptom_entries 1432221 _|_Circumoral swelling_|_ <>
symptom_entries 1508591 _|_Ultrasound scan_|_ <>
symptom_entries 1508449 _|_SARS-CoV-2 test_|_ <>
symptom_entries 1508983 _|_Impaired work ability_|_ <>
symptom_entries 1511857 _|_Pulmonary pain_|_ <>
symptom_entries 1508789 _|_Potentiating drug interaction_|_ <>
symptom_entries 1072469 _|_Influenza like illness_|_Q fever_|_ <>
symptom_entries 1476800 _|_Injection site reaction_|_ <>
11 columns altered
31332 modified reports on 2023-09-01
Writing ... vaers_changes/2023-09-01_VAERS_CHANGES.csv
1 report with the most (18) records/lots/doses: 1900339
1 comparison done
Doing stats
open stats.csv ... ok
column changes: {'SYMPTOM_TEXT': 55, 'symptom_entries': 10, 'VAX_DOSE_SERIES': 4, 'VAX_NAME': 4, 'VAX_MANU': 4, 'VAX_TYPE': 4, 'VAX_ROUTE': 2, 'DIED': 1, 'VAX_LOT': 1, 'OTHER_MEDS': 1, 'CAGE_YR': 0, 'CAGE_MO': 0, 'HISTORY': 0, 'NUMDAYS': 0, 'RECOVD': 0, 'V_FUNDBY': 0, 'TODAYS_DATE': 0, 'SEX': 0, 'ER_ED_VISIT': 0, 'AGE_YRS': 0, 'HOSPITAL': 0, 'HOSPDAYS': 0, 'DISABLE': 0, 'OFC_VISIT': 0, 'PRIOR_VAX': 0, 'BIRTH_DEFECT': 0, 'ALLERGIES': 0, 'VAX_DATE': 0, 'FORM_VERS': 0, 'SPLTTYPE': 0, 'ER_VISIT': 0, 'CUR_ILL': 0, 'VAX_SITE': 0, 'RPT_DATE': 0, 'DATEDIED': 0, 'ONSET_DATE': 0, 'V_ADMINBY': 0, 'L_THREAT': 0, 'X_STAY': 0, 'RECVDATE': 0, 'LAB_DATA': 0, 'STATE': 0}
This week
1 delayed/late/gapfill
45 deleted
0 restored
9 cell edits trivial not printed
86 cell edits significant
0 cells emptied entirely
55 writeups changed
All time
542236 delayed/late/gapfill
31323 deleted
15 restored
29372261 cell edits trivial not printed
30888 cell edits significant
1475724 cells emptied entirely
7547 writeups changed
30410 never published [896713, 896742, 896875, 896892, 896899, 897093, 897105, 897169 ... 2678227, 2678279, 2678296, 2678307, 2678373, 2678436, 2678449, 2678490]
20 reports cleared of duplicate sentences within them
This week 0 hr 27.5 min
Overall 0 hr 27.5 min
Saving vaers_changes/2023-09-01_VAERS_CHANGES_A.csv, 1048575 rows and vaers_changes/2023-09-01_VAERS_CHANGES_B.csv, 573977 rows
No more to do, last set 2023-09-01 >= 2023-09-01
done
Done with vaers_flatfile_build.py at line 2375, clock time 2023-09-09 20:11:15.292880
- - - - - - - - - - - - - - - - - - - - - - - -
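The "Repeat sentence removal in SYMPTOM_TEXT" step in this log reports repeated sentences being dropped from narrative fields, with a byte difference per report. A minimal sketch of that idea follows. It assumes that a sentence ends in '.', '!' or '?' followed by whitespace and that only exact repeats are removed. The function name dedupe_sentences is hypothetical and is not taken from vaers_flatfile_build.py.

```python
import re

# Naive sentence boundary: '.', '!' or '?' followed by whitespace (an assumption).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def dedupe_sentences(text: str) -> tuple[str, int]:
    """Return (text without exact repeated sentences, UTF-8 bytes removed).

    Hypothetical helper: keeps the first occurrence of each sentence, in order,
    and rejoins the kept sentences with single spaces.
    """
    seen = set()
    kept = []
    for sentence in SENTENCE_END.split(text.strip()):
        if sentence and sentence not in seen:
            seen.add(sentence)
            kept.append(sentence)
    result = " ".join(kept)
    return result, len(text.encode("utf-8")) - len(result.encode("utf-8"))
```

For example, "Pain at site. Fever. Pain at site. Headache." comes back as "Pain at site. Fever. Headache." with 14 bytes removed.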
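The consolidation and flattening steps produce values such as "_|_Influenza like illness_|_Q fever_|_" in symptom_entries and "FE1573|Unknown" or "2|UNK" in the VAX columns. In other words, each VAERS_ID gets one row, and any values spread over several source rows are joined with delimiters. A plain-Python sketch of that shape follows. It assumes VAERSSYMPTOMS rows carry SYMPTOM1 to SYMPTOM5 columns and that rows are dicts keyed by column name. The script itself works on pandas DataFrames (df_vax_flat and similar), and aggregate_symptoms and flatten_vax are hypothetical names.

```python
from collections import defaultdict

SYMPTOM_DELIM = "_|_"  # delimiter seen in symptom_entries values in the log
VAX_DELIM = "|"        # delimiter seen in flattened VAX_* values in the log


def aggregate_symptoms(symptom_rows):
    """Map VAERS_ID -> '_|_A_|_B_|_' from rows with SYMPTOM1..SYMPTOM5 (assumed columns)."""
    by_id = defaultdict(list)
    for row in symptom_rows:
        for i in range(1, 6):
            value = (row.get(f"SYMPTOM{i}") or "").strip()
            if value:  # skipping empty cells avoids doubled delimiters
                by_id[row["VAERS_ID"]].append(value)
    return {
        vid: SYMPTOM_DELIM + SYMPTOM_DELIM.join(items) + SYMPTOM_DELIM
        for vid, items in by_id.items()
    }


def flatten_vax(vax_rows, columns=("VAX_TYPE", "VAX_MANU", "VAX_LOT", "VAX_DOSE_SERIES")):
    """Map VAERS_ID -> {column: values from each vaccine row joined with '|'}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in vax_rows:
        for col in columns:
            grouped[row["VAERS_ID"]][col].append(str(row.get(col) or ""))
    return {
        vid: {col: VAX_DELIM.join(values[col]) for col in columns}
        for vid, values in grouped.items()
    }
```

Joining in row order is what makes values like "UNK|1" and "1|UNK" distinct. That order sensitivity is one plausible reason such cells show up as changes between drops when the underlying rows are reordered or deduplicated.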
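The comparison block sets aside reports that are identical in both drops and works only on the rest. The log's counts are consistent with that split: 1591244 - 1589851 = 1393 "this drop to work with", 1589965 - 1589851 = 114 "previous to work with", and 1393 - 114 = 1279 "difference". A minimal set-based sketch follows. It assumes each drop is a dict from VAERS_ID to a tuple of field values. The name compare_drops is hypothetical, and the script's separate "delayed" category is not modeled.

```python
def compare_drops(previous: dict, current: dict) -> dict:
    """Classify VAERS_IDs between two drops (hypothetical helper).

    previous/current map VAERS_ID -> tuple of field values (assumed shape).
    """
    prev_ids, curr_ids = set(previous), set(current)
    identical = {vid for vid in prev_ids & curr_ids if previous[vid] == current[vid]}
    prev_work = prev_ids - identical  # "previous to work with"
    curr_work = curr_ids - identical  # "this drop to work with"
    return {
        "identical": identical,
        "prev_work": prev_work,
        "curr_work": curr_work,
        "new": curr_work - prev_ids,       # present only in the current drop
        "deleted": prev_work - curr_ids,   # present only in the previous drop
        "changed": prev_work & curr_work,  # in both drops, some field differs
    }
```

Only the "changed" set then needs a cell-by-cell diff. That keeps the expensive per-column comparison limited to a small number of reports instead of all 1.5 million.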
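Several columns in the log report "cells of trivial non-letter differences ignored". One plausible reading, sketched here as an assumption rather than the script's actual rule, is that two cell values count as the same when they match after everything except letters and digits is stripped. Under that reading, punctuation and spacing changes are ignored. This sketch also keeps digits significant, so a changed lot digit still counts as a real edit; that choice is itself an assumption. The names is_trivial_change and _significant are hypothetical.

```python
def _significant(value: str) -> str:
    # Keep letters and digits only; digits are treated as significant (assumption).
    return "".join(ch for ch in value if ch.isalnum())


def is_trivial_change(old: str, new: str) -> bool:
    """True when old and new differ only in non-alphanumeric characters."""
    return old != new and _significant(old) == _significant(new)
```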
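The changes file is saved in two parts: _A with 1048575 rows and _B with 573977 rows. 1048575 is one less than Excel's 1,048,576-row worksheet limit. That suggests each part is capped so it opens in Excel with its header row, although this is an assumption and the log does not state it. A sketch of such a split with the standard csv module follows. The names split_parts and write_parts, and the _A/_B suffix scheme beyond the two files seen here, are hypothetical.

```python
import csv
from pathlib import Path

# Excel worksheets hold 1,048,576 rows; reserve one for the header (assumption).
MAX_DATA_ROWS = 1_048_576 - 1


def split_parts(rows, max_rows=MAX_DATA_ROWS):
    """Return [('A', rows[0:max_rows]), ('B', ...), ...]; rows must support len() and slicing."""
    parts = []
    for index, start in enumerate(range(0, len(rows), max_rows)):
        if index >= 26:
            raise ValueError("more than 26 parts; suffix scheme A-Z exhausted")
        parts.append((chr(ord("A") + index), rows[start:start + max_rows]))
    return parts


def write_parts(stem, header, rows, max_rows=MAX_DATA_ROWS):
    """Write stem_A.csv, stem_B.csv, ... each with the header; return (path, row_count) pairs."""
    written = []
    for suffix, chunk in split_parts(rows, max_rows):
        path = Path(f"{stem}_{suffix}.csv")
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(header)
            writer.writerows(chunk)
        written.append((path, len(chunk)))
    return written
```

With 1048575 + 573977 rows, split_parts yields exactly the two part sizes printed in the log. Calling write_parts("vaers_changes/2023-09-01_VAERS_CHANGES", header, rows) would produce file names of the same form.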