Projects
National Data Platform
The National Data Platform, or NDP, is a federated and extensible data ecosystem to promote collaboration, innovation, and equitable use of data on top of existing cyberinfrastructure capabilities.
NDP is envisioned as a broad data ecosystem that supports data-driven and AI-integrated research and education workflows.
NDP aims to:
- Facilitate data registration, discovery and usage through a centralized hub
- Enhance distributed cyberinfrastructure (CI) capabilities through distributed points of presence
- Cultivate resources for classroom education and data challenges
- Assist research and learning through personalized workspaces

NOURISH
NOURISH empowers people to build small businesses that make fresh food widely available, affordable, and convenient for all.
The NOURISH app provides current and future small business owners with access to:
- A chat-based interface available in multiple languages
- Loan and grant information
- Online maps that optimize the placement of fresh food outlets for foot traffic
- Help with navigating the convoluted business permitting process
- AI-enabled guidance on affordable ways to locally source fresh ingredients

AWESOME
A Tri-Store Data System for Multi-Model Analytics
Modern data science applications increasingly use heterogeneous data sources and analytics, leading to growing interest in polystore systems. Instead of a general-purpose polystore system, we present AWESOME (Analytics WorkbEnch for SOcial MEdia), a specialized “tri-store” system tailored to analytics workloads spanning relational, graph, and text data. AWESOME features a domain-specific language, ADIL, which lets users concisely express complex applications involving cross-DBMS queries, text and graph analytical functions, and transformations across the three data types. Using a learned optimizer, AWESOME selects the platform for each analytical function and the data stores for intermediate processing, so users do not have to write intricate code or make complex placement decisions in heterogeneous data environments.
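
To illustrate the idea of choosing a platform per analytical function, here is a minimal sketch in Python. It is not AWESOME's optimizer or ADIL: the platform names, the workload features, and the cost formulas are all hypothetical, and the hand-written cost functions stand in for models that a learned optimizer would fit to observed runtimes.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Workload:
    """Features of one analytical function call (hypothetical)."""
    num_edges: int


# A cost model maps workload features to a predicted runtime in seconds.
CostModel = Callable[[Workload], float]

# Hypothetical predictors. In a learned optimizer these would be trained
# on past executions; the formulas below are illustrative stand-ins.
COST_MODELS: Dict[str, CostModel] = {
    # Low startup cost, but scales worse on large inputs.
    "in_memory_library": lambda w: 0.5 + 2e-6 * w.num_edges ** 1.2,
    # Higher fixed cost, but scales linearly.
    "graph_dbms": lambda w: 5.0 + 1e-6 * w.num_edges,
}


def choose_platform(w: Workload,
                    models: Dict[str, CostModel] = COST_MODELS) -> str:
    """Return the platform with the lowest predicted cost for this workload."""
    return min(models, key=lambda name: models[name](w))
```

Under these assumed cost curves, a small graph is routed to the in-memory library and a very large one to the graph DBMS; the user never makes that choice explicitly.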

UCSF Industry Documents
Semantic Search for the Digital Library
The UCSF Digital Library holds over 22 million documents, a heterogeneous collection of research papers, reports, emails, industrial memos, and other materials collected to support research in areas such as public health, policy, and drug industry practices. The goal of the project is to enable researchers to perform progressive, contextual search over these documents based on the concepts, entities, and events they mention and the relationships among them. The contextual search will be exposed through a natural-language-like interface. In collaboration with researchers and digital librarians from UCSF, we will develop efficient indexing techniques to analyze these documents and offer a low-latency mechanism to keep the indices updated as new documents are added.
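
The requirement that indices stay current as documents arrive can be sketched with a small in-memory index that is updated one document at a time. This is a plain keyword index, not the concept- and entity-level indexing the project aims for; the class name, tokenizer, and document IDs are assumptions made for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Set


class InvertedIndex:
    """Keyword index that supports adding documents without a full rebuild."""

    def __init__(self) -> None:
        # Maps each token to the set of document IDs that contain it.
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    @staticmethod
    def _tokens(text: str) -> List[str]:
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id: str, text: str) -> None:
        """Index one new document; cost depends only on that document's length."""
        for token in self._tokens(text):
            self._postings[token].add(doc_id)

    def search(self, query: str) -> Set[str]:
        """Return IDs of documents containing every token in the query."""
        result: Optional[Set[str]] = None
        for token in self._tokens(query):
            docs = self._postings.get(token, set())
            result = set(docs) if result is None else result & docs
        return result if result is not None else set()
```

Because `add` touches only the postings for the tokens in the new document, a newly added document is searchable immediately and existing entries are left untouched.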

Physiological Data Analysis
In collaboration with the Smarr Lab (Shu Chien – Gene Lay Department of Bioengineering & Halicioğlu Data Science Institute, UCSD), we are developing information management techniques for large-scale time-series data that monitor multimodal physiological signals. These signals can come from wearable, in-body, or near-body devices. Depending on the device, the data may arrive as a stream or in periodically delivered batches. The time-series data may be accompanied by auxiliary data about the subjects (e.g., medication history, surveys), so that both forms of information can be analyzed jointly. Further, the data may need to be de-identified for analysis and re-identified before analytical results are communicated back to the subjects. Application areas to date include sensor data analysis for COVID-19 prediction, identification of physiological correlates of mental health conditions, detection of pregnancy and prediction of complications, and glucose level prediction for diabetes management.
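
One common way to support the de-identify, analyze, re-identify cycle described above is keyed pseudonymization with a protected reverse lookup. The sketch below is an assumption about how such a step could look, not the project's actual pipeline; the class name, ID format, and 16-character pseudonym length are hypothetical.

```python
import hashlib
import hmac
from typing import Dict


class Pseudonymizer:
    """Replaces subject IDs with keyed pseudonyms and can map them back."""

    def __init__(self, key: bytes) -> None:
        # The key should be random and stored apart from the analysis data.
        self._key = key
        # Reverse lookup kept only on the trusted side (hypothetical design).
        self._reverse: Dict[str, str] = {}

    def deidentify(self, subject_id: str) -> str:
        """Return a stable pseudonym for a subject ID."""
        digest = hmac.new(self._key, subject_id.encode("utf-8"),
                          hashlib.sha256).hexdigest()
        pseudonym = digest[:16]  # Truncated for readability (assumption).
        self._reverse[pseudonym] = subject_id
        return pseudonym

    def reidentify(self, pseudonym: str) -> str:
        """Map a pseudonym back to the original subject ID."""
        return self._reverse[pseudonym]
```

The same subject always receives the same pseudonym, so time-series and auxiliary records can still be joined during analysis, while results are routed back to a subject through `reidentify`. Replacing identifiers alone does not remove quasi-identifiers such as timestamps or rare conditions, which need separate handling.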
