{"id":119,"date":"2020-08-28T12:15:45","date_gmt":"2020-08-28T12:15:45","guid":{"rendered":"https:\/\/eslweb.epfl.ch\/?page_id=119"},"modified":"2020-11-19T08:44:20","modified_gmt":"2020-11-19T08:44:20","slug":"current-master-projects","status":"publish","type":"page","link":"https:\/\/eslweb.epfl.ch\/?page_id=119","title":{"rendered":"Current Master Projects"},"content":{"rendered":"\n<script>\nfunction opendesc(target,bullet) {\n document.getElementById(target).innerHTML = bullet;\n document.getElementById(target).innerHTML += \"<a href=#_ onClick=closedesc(\\\"\"+target+\"\\\",\"+target+\"minus);>[close]<\/a>\";\n}\n\nfunction closedesc(target,bullet) {\n document.getElementById(target).innerHTML = bullet;\n document.getElementById(target).innerHTML += \"<a href=#_ onClick=opendesc(\\\"\"+target+\"\\\",\"+target+\"); style='position:relative;z-index:99;'> [read&nbsp;on]<\/a>\";\n}\n<\/script>\n\n<br><table width='100%' cellspacing='0' cellpadding='8' border='0' id='table3'><tr><td colspan=2><h3>Master Student Assistant Projects<\/h3><h5>(Remunerated, for officially registered EPFL students only)<\/h5><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor699><\/a><b><span style='font-size: 20px;'>Acceleration of TinyML applications using a cutting-edge heterogeneous accelerator platform<\/b><br><script>var project699=\"<p>Edge Artificial Intelligence is a novel computing paradigm that has the potential to revolutionize Internet-of-Things devices. Instead of uploading sensitive data (e.g. audio, video, biosignals) to remote servers, edge-AI devices perform all of their data processing on-board, thus preserving users&rsquo; privacy. 
However, to execute complex edge-AI operations while maximizing battery lifetime, these devices must be equipped with high-performance, ultra-low power processors.<\/p> <p>To this end, the Embedded Systems Laboratory (ESL) of EPFL has designed HEEPatia, an ultra-low power chip for edge-AI. <span>HEEPatia has been submitted for tape-out in 16nm technology and will be ready for testing in Q4 2025.<\/span> It combines a dual-core implementation of the <a rel='noopener noreferrer nofollow' href='https:\/\/github.com\/esl-epfl\/x-heep' target='_blank'><span style='color: #1155cc'>X-HEEP platform<\/span><\/a> with several state-of-the-art hardware IPs. Among these is the <a rel='noopener noreferrer nofollow' href='https:\/\/dl.acm.org\/doi\/abs\/10.1145\/3489517.3530980' target='_blank'><span style='color: #1155cc'>Very Wide Register Reconfigurable Array (VWR2A)<\/span><\/a>, an architecture that integrates high computational density and wide memory structures to efficiently execute data pre-processing kernels. Furthermore, HEEPatia contains two instances of <a rel='noopener noreferrer nofollow' href='https:\/\/ieeexplore.ieee.org\/abstract\/document\/10964076' target='_blank'><span style='color: #1155cc'>NM-Carus<\/span><\/a>, a near-memory computing platform that accelerates Deep Learning kernels. Finally, HEEPatia includes a 256 kB <a rel='noopener noreferrer nofollow' href='https:\/\/blocksandfiles.com\/2022\/04\/18\/gcram\/' target='_blank'><span style='color: #1155cc'>Gain-Cell Random Access Memory<\/span><\/a> (GCRAM), which reduces area and power consumption compared to traditional SRAM cells.<\/p> <p><span>Up until this point, each of the hardware IPs on HEEPatia has been validated independently on its own test kernels, but there are several tinyML applications that could be accelerated further using a combination of the aforementioned IPs. 
The goal of this project is to run and optimize a real-world, end-to-end tinyML application using the IPs available on HEEPatia. Furthermore, the application can be used to evaluate the trade-offs &ndash; in terms of power, performance, and area &ndash; of using each IP to execute a given kernel. The student will have regular guidance and feedback throughout the research assistantship.<\/span> <\/p> <p><span>The expected outcomes of this assistantship are:<\/span> <\/p> <ul>     <li>         <p><span>Developing an FPGA implementation of HEEPatia to facilitate application testing<\/span>         <\/p>     <\/li>     <li>         <p><span>Adapting existing kernels of each of the IPs (e.g. FFT, FIR, MAC, matrix addition) to evaluate trade-offs<\/span>         <\/p>     <\/li>     <li>         <p><span>Adapting a tinyML transformer workload to run on the X-HEEP platform of HEEPatia and evaluating the baseline performance<\/span>         <\/p>     <\/li>     <li>         <p><span>Accelerating the end-to-end application using VWR2A, the two NM-Carus instances, and the dual-core CPU with DSP instruction extensions, and evaluating the effects on performance, energy, and accuracy<\/span>         <\/p>     <\/li>     <li>         <p><span>Investigating the energy-vs-accuracy trade-off of storing weights in the GCRAM versus traditional SRAM<\/span>         <\/p>     <\/li> <\/ul> <p><span>The project will be carried out at the ESL at EPFL, one of the world's top-class universities. ESL is an active group (24 Ph.D. students among 45 members) involved in many research aspects. The student will be under the supervision of Ms. Lara Orlandic, Dr. David Mallas&eacute;n Quintana, and Prof. David Atienza. Please provide an updated CV, transcript, and short statement about why you are interested in this position and which qualifications you have (e.g. 
through coursework, internships, or prior projects) that would make you a good candidate.<\/span> <\/p> <p><span><strong>Assistantship objectives:<\/strong><\/span> <\/p> <ol>     <li>         <p><span>Understanding the HEEPatia platform and functionalities of the IPs<\/span>         <\/p>     <\/li>     <li>         <p><span>Developing an FPGA implementation and validating it using existing kernels<\/span>         <\/p>     <\/li>     <li>         <p><span>Modifying the FFT test kernel of each IP to harmonize the inputs and outputs, then evaluating the performance of each IP, and repeating this for other kernels (e.g. FIR, matrix addition)<\/span>         <\/p>     <\/li>     <li>         <p><span>Running a baseline tinyML application on the X-HEEP MCU<\/span>         <\/p>     <\/li>     <li>         <p><span>Accelerating the workload using the dual-core X-HEEP CPU<\/span>         <\/p>     <\/li>     <li>         <p><span>Accelerating the inference of the application using NM-Carus<\/span>         <\/p>     <\/li>     <li>         <p><span>Accelerating the pre-processing of the application using the most performant IP from step #3<\/span>         <\/p>     <\/li>     <li>         <p><span>Evaluating the trade-offs of using the GCRAM for storing parameters (i.e. 
neural network weights) against the standard SRAM cells<\/span>         <\/p>     <\/li> <\/ol> <p><span><strong>Required knowledge and skills:<\/strong><\/span> <\/p> <ul>     <li>         <p><span>Excellent embedded C programming and debugging<\/span>         <\/p>     <\/li>     <li>         <p><span>Understanding RTL written in SystemVerilog<\/span>         <\/p>     <\/li>     <li>         <p><span>Strong implementation and simulation with FPGAs<\/span>         <\/p>     <\/li>     <li>         <p><span>Ability to work consistently, independently, and ask for help when needed<\/span>         <\/p>     <\/li>     <li>         <p><span>Good communication skills in advanced English<\/span>         <\/p>     <\/li>     <li>         <p><span>Git version control<\/span>         <\/p>     <\/li> <\/ul> <p><span><strong>Type of work:<\/strong> 10% theory analysis, 70% design and simulation, 20% verification and documentation <\/span> <\/p><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Ms. Lara Orlandic, Dr. David Mallas\u00e9n Quintana, and Prof. David Atienza.<br> \";<\/script>\n<script>var project699minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Ms. Lara Orlandic, Dr. David Mallas\u00e9n Quintana, and Prof. David Atienza.<br>\";<\/script>\n<span id=project699><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Ms. Lara Orlandic, Dr. David Mallas\u00e9n Quintana, and Prof. 
David Atienza.<br> <a href=#_ onclick=opendesc('project699',project699); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td>\n    <div style='position:relative;'>\n     <div style='position: absolute;top:-80px;left:-300px;'>\n       <img border=0 src=https:\/\/eslweb.epfl.ch\/projects\/images\/notavailable.gif alt='project no longer available'>\n     <\/div>\n    <\/div><a href=https:\/\/www.epfl.ch\/labs\/esl\/research\/systems-on-chip\/x-heep\/ target=_blank title='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/201.png width=70 alt='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><\/a><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor676><\/a><b><span style='font-size: 20px;'>Automation of a semi-custom design and verification flow for ultra-low-leakage always-on circuits relying on differential logic<\/b><br><script>var project676=\"<div class='entry-content mb-5'> \t\t <p>Microcontrollers (MCUs) are used in 
a wide range of applications ranging from sensor monitoring to robotics and automotive.<\/p> <p>Thanks to their versatility, MCUs are typically chosen as edge computing platforms.&nbsp;<\/p> <p>IoT, wearable, and edge-computing applications are typically profiled in 4 different phases:<\/p> <ol><li>acquisition<\/li><li>pre-processing<\/li><li>processing<\/li><li>transmission or stimulation via actuators<\/li><\/ol> <p>Acquisitions and stimulation can be performed either with external  analog-digital (ADC) and digital-analog (DAC) converters, or with  integrated ones.<\/p> <p>For the latter, the processor (usually implemented in digital) and  the analog components are implemented in the same technology node and  integrated in the same system-on-chip (SoC), requiring  analog-digital-interfaces.<\/p> <p>With a wide collaboration between EPFL, Imperial College London,  Universidad Carlos III de Madrid, and Politecnico di Torino, we are  building HEEPidermis: an SoC that integrates both the processing  elements and the acquisition and stimulation required to obtain  precision measurements of impedance and conductance of the skin in a  low-power and autonomous manner.&nbsp;<\/p> <p>The processor is based on X-HEEP, an open-source RISC-V configurable  and extendable microcontroller, and includes smart pre-processing of  data coming from ADCs, while the acquisition and stimulus components are  based on a set of possible ADCs (VCO-based, &Delta;&Sigma; and Level-Crossing), and  a current DAC, respectively. HEEPidermis also includes an FLL and LDO  to reduce the required count of off-chip components.&nbsp;<\/p> <p>The full chip is going to be implemented into the TSMC 65 LP technology.&nbsp;<\/p> <p>This project proposes to:<\/p> <ul><li>Do the full-custom layout of Analog\/Digital\/Mixed-Signal blocks.  These include ADCs and DACs, as well as mixed-signal components. 
<ul><li>Specifications and schematics will be provided<\/li><li>The layout will be done by using Cadence Virtuoso <ul><li>Possibly requiring modifying the Schematic<\/li><\/ul> <\/li><li>The layout must be equivalent to the schematic (LVS) and performed with Calibre<\/li><li>The layout must be DRC-free and performed with Calibre<\/li><li>Simulations with the parasitic extracted netlist will need to be performed to evaluate the final design<\/li><li>LEF and LIB need to be generated so that such IPs can be integrated into the digital-on-top flow used to build HEEPidermis<\/li><\/ul> <\/li><li>Design level shifters that interface the asynchronous-analog side with the synchronous-digital side.&nbsp; <ul><li>Specifications will be provided &ndash; as well as a baseline schematic and layout<\/li><li>The schematic and the layout will be done by using Cadence Virtuoso<\/li><li>The layout must be equivalent to the schematic (LVS) and performed with Calibre<\/li><li>The layout must be DRC-free and performed with Calibre<\/li><li>Simulations with the parasitic extracted netlist will need to be performed to evaluate the final design<\/li><li>LEF and LIB need to be generated so that such IPs can be integrated into the digital-on-top flow used to build HEEPidermis<\/li><\/ul> <\/li><li>Perform the back-end of the digital and analog components of HEEPidermis using Cadence Innovus <ul><li>LEF and LIB of the analog\/mixed-signal components are generated by the tasks above<\/li><li>the processor netlist and constraints are provided<\/li><li>AMS verification of the whole HEEPidermis will be performed together with the rest of the team<\/li><\/ul> <\/li><\/ul> <p>The project will be carried out at the ESL at EPFL, one of the world&rsquo;s top-class universities.<\/p> <p><strong>Required knowledge and skills:<\/strong><\/p> <ul><li>Synopsys Design Compiler<\/li><li>Cadence Innovus and Virtuoso<\/li><li>Siemens Calibre<\/li><li>Abstraction files syntax of LEF, LIB, 
etc.<\/li><li>Spice\/HSpice and SystemVerilog\/Verilog<\/li><li>Good analytical skills<\/li><li>Teamwork and git<\/li><\/ul> <p><strong>Appreciated skills:<\/strong><\/p> <ul><li>Scientific curiosity<\/li><li>Good communication skills<\/li><li>Advanced English&nbsp;<\/li><\/ul> <p><br \/><strong>Type of work:<\/strong> 10% theory analysis, 90% design and simulation<\/p> \t<\/div>                         <div class='post-nav py-md-1'>                                 <div class='nav-prev'>         <\/div><\/div><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Davide Schiavone, Dr. Alexandre Levisse, Prof. Matias Miguez, Robin Leplae, Juan Sapriza, Prof. David Atienza<br> Contact email: <a href='mailto:davide.schiavone@epfl.ch;alexandre.levisse@epfl.ch;matias.miguez@epfl.ch;robin.leplae@epfl.ch;juan.sapriza@epfl.ch;david.atienza@epfl.ch?subject=Automation of a semi-custom design and verification flow for ultra-low-leakage always-on circuits relying on differential logic'>davide.schiavone@epfl.ch;alexandre.levisse@epfl.ch;matias.miguez@epfl.ch;robin.leplae@epfl.ch;juan.sapriza@epfl.ch;david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project676minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Davide Schiavone, Dr. Alexandre Levisse, Prof. Matias Miguez, Robin Leplae, Juan Sapriza, Prof. David Atienza<br>\";<\/script>\n<span id=project676><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Davide Schiavone, Dr. Alexandre Levisse, Prof. Matias Miguez, Robin Leplae, Juan Sapriza, Prof. 
David Atienza<br> <a href=#_ onclick=opendesc('project676',project676); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><img width=27 src=https:\/\/eslweb.epfl.ch\/img\/1pixel.gif><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><a href=https:\/\/www.epfl.ch\/labs\/esl\/research\/systems-on-chip\/x-heep\/ target=_blank title='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/201.png width=70 alt='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><\/a><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor626><\/a><b><span style='font-size: 20px;'>FPGA accelerator design for urban digital twins<\/b><br><script>var project626=\"<p>Mixed-Integer Linear Programming (MILP) is a critical optimization technique widely used in a variety of industries to solve complex decision-making problems that involve both continuous and discrete variables. MILP formulations are employed in logistics for optimizing supply chain networks, in finance for portfolio optimization and risk management, in energy systems for grid management and resource allocation, and in manufacturing for production planning and scheduling. 
These industries rely on MILP to make informed decisions that balance multiple objectives such as cost, efficiency, and environmental impact.<\/p> <p>A key application of MILP in the context of Urban Digital Twins is the Renewable Energy Hub Optimizer (REHO), a decision support tool designed for sustainable urban energy system planning. REHO addresses the optimal design and operation of urban energy systems, balancing multi-objective considerations such as economic viability, environmental impact, and energy efficiency. By leveraging MILP, REHO can simultaneously optimize capacities and operational strategies, providing robust solutions for sustainable urban development. However, the computational demands of MILP, particularly for large-scale urban models, present challenges in terms of energy consumption and processing time.<\/p> <p>Despite its widespread application, MILP is computationally intensive, especially as problem sizes grow. This has driven the need for more efficient computational solutions. Commercial (Gurobi) and open-source (HiGHS) solvers rely on parallel execution on general-purpose processors, which can lead to high energy consumption and long execution times for large problems. This motivates the exploration of hardware accelerators, particularly Field-Programmable Gate Arrays (FPGAs), which offer the potential for parallel processing and energy-efficient computation.<\/p> <p><strong>Project Objectives<\/strong><\/p> <p>The goal of this project is to design and develop novel energy-efficient hardware accelerators for solving MILP problems on FPGA platforms. For fast prototyping, we will adopt a hardware-software co-design approach to explore novel domain-specific accelerators using High-Level Synthesis. 
By leveraging the parallelism and reconfigurability of FPGAs, the goal is to achieve significant energy savings while maintaining or improving the computational performance of MILP solvers.<\/p> <p><strong>Obligatory Objectives<\/strong><\/p> <ol> <li><strong>Understand and Describe MILP Kernels:<\/strong> <ul> <li>Conduct an in-depth analysis of key MILP algorithms, focusing on the most computationally intensive kernels.<\/li> <li>Identify the parts of the MILP solving process that can benefit most from hardware acceleration.<\/li> <\/ul> <\/li> <li><strong>Implement and Optimize MILP Kernels on FPGA Using High-Level Synthesis (HLS):<\/strong> <ul> <li>Design and implement at least two critical kernels of the MILP solving process using HLS tools.<\/li> <li>Focus on optimizing these kernels for energy efficiency, resource usage, and latency.<\/li> <\/ul> <\/li> <li><strong>Evaluate and Explore Design Trade-offs:<\/strong> <ul> <li>Conduct a comprehensive design space exploration to evaluate the trade-offs between energy consumption, computational latency, resource usage, throughput, and hyperparameters of the solver.<\/li> <li>Present at least three Pareto optimal solutions for each implemented kernel, demonstrating the balance between these metrics.<\/li> <\/ul> <\/li> <li><strong>Experimental Evaluation on FPGA:<\/strong> <ul> <li>Test and validate the designed accelerators on a selected FPGA platform.<\/li> <li>Measure and compare energy consumption, execution time, and solution accuracy against traditional CPU-based implementations.<\/li> <li>[Optional] Experimental validation of the proposed designs on UrbanTwin MILP problems, showcasing the advantages of FPGA-based acceleration.<\/li> <\/ul> <\/li> <\/ol> <p><strong>Additional Objectives<\/strong><\/p> <ol> <li><strong>Integration with Full MILP Solver Pipeline:<\/strong> <ul> <li>Integrate the optimized MILP kernels into a complete MILP solver pipeline on FPGA.<\/li> <li>Evaluate the performance of the integrated 
system in terms of energy efficiency and computational throughput.<\/li> <\/ul> <\/li> <li><strong>Qualitative Analysis of Solution Accuracy:<\/strong> <ul> <li>Assess the accuracy and quality of the solutions provided by the FPGA-accelerated MILP solver.<\/li> <li>Compare the results with those obtained from traditional software-based solvers, focusing on metrics such as solution optimality and robustness.<\/li> <\/ul> <\/li> <\/ol> <p><strong>Required Knowledge and Skills<\/strong><\/p> <ul> <li>Proven experience in hardware description languages such as VHDL or Verilog.<\/li> <li>Experience with Xilinx FPGAs or Intel FPGAs and associated development tools.<\/li> <li>Strong programming skills in Python, C\/C++, and familiarity with High-Level Synthesis (HLS) tools.<\/li> <li>Understanding of optimization techniques, particularly MILP.<\/li> <\/ul> <p><strong>Type of Work<\/strong><\/p> <ul> <li><strong>Theoretical Analysis (30%):<\/strong> Involves understanding MILP algorithms, energy-efficient design principles, and FPGA architecture.<\/li> <li><strong>Design and Experimentation (70%):<\/strong> Focuses on the implementation, optimization, and evaluation of the MILP kernels on FPGA.<\/li> <\/ul> <p>This project will be conducted under the supervision of experts in hardware design and optimization from the Embedded Systems Laboratory (ESL) and Industrial Process and Energy Systems Engineering (IPESE), providing the student with an opportunity to contribute to cutting-edge research in energy-efficient computing.<\/p> <p><strong>References<\/strong><\/p> <ul> <li>Renewable Energy Hub Optimizer (REHO): <a href='https:\/\/reho.readthedocs.io\/en\/main\/'>https:\/\/reho.readthedocs.io\/en\/main\/<\/a><\/li> <li>MILP with Gurobi (commercial): <a href='https:\/\/www.gurobi.com\/resources\/chapter-1-why-mixed-integer-programming-mip\/'>https:\/\/www.gurobi.com\/resources\/chapter-1-why-mixed-integer-programming-mip\/<\/a><\/li> <li>HiGHS (open source): <a 
href='https:\/\/highs.dev\/'>https:\/\/highs.dev\/<\/a><\/li> <li>Vitis HLS User Guide: <a href='https:\/\/docs.amd.com\/r\/en-US\/ug1399-vitis-hls'>https:\/\/docs.amd.com\/r\/en-US\/ug1399-vitis-hls<\/a><\/li> <\/ul><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Denisa-Andreea Constantinescu, Rub\u00e9n Rodr\u00edguez \u00c1lvarez, Ana Catarina Gouveia Braz, C\u00e9dric Terrier, David Atienza Alonso <br> \";<\/script>\n<script>var project626minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Denisa-Andreea Constantinescu, Rub\u00e9n Rodr\u00edguez \u00c1lvarez, Ana Catarina Gouveia Braz, C\u00e9dric Terrier, David Atienza Alonso <br>\";<\/script>\n<span id=project626><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Denisa-Andreea Constantinescu, Rub\u00e9n Rodr\u00edguez \u00c1lvarez, Ana Catarina Gouveia Braz, C\u00e9dric Terrier, David Atienza Alonso <br> <a href=#_ onclick=opendesc('project626',project626); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td>\n    <div style='position:relative;'>\n     <div 
style='position: absolute;top:-80px;left:-300px;'>\n       <img border=0 src=https:\/\/eslweb.epfl.ch\/projects\/images\/notavailable.gif alt='project no longer available'>\n     <\/div>\n    <\/div><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor516><\/a><b><span style='font-size: 20px;'>Firmware development for efficient biosignal data management<\/b><br><script>var project516=\"<div class='entry-content'>              <p>Millions of people worldwide suffer from chronic respiratory  disorders, such as chronic cough, COPD, and lung diseases. In order to  assess treatment efficacies and provide individualized care, we need a  low-cost, noninvasive means of continuously monitoring patients from  their homes. At the Embedded Systems Laboratory (ESL) of EPFL, we have  an ongoing research project that leverages wearable biosignal monitoring  technologies to extract valuable physiological information from  patients over a 24-hour period. We have developed a chest-worn device,  based on the STM32G474 microcontroller, that records audio and kinematic  biosignals that are saved to an on-board Flash memory and subsequently  processed offline. Eventually, we plan to implement Edge-AI capabilities  on the device, meaning that the signal processing algorithms are  implemented on the device itself.<\/p>    <p><strong>Tasks:<\/strong><\/p>    <p>The student assistant will execute various tasks that are crucial to  the firmware development of the embedded device. They will autonomously  work on various sub-projects, report their results to the lab personnel,  and thoroughly test and document their code. 
The tasks include but are not limited to:<\/p>    <ul><li>Implementing data compression algorithms to efficiently manage the Flash storage while maintaining signal quality<\/li><li>Implementing unit tests for the developed modules<\/li><li>Changing the Flash memory structure to overcome storage limitations<\/li><li>Implementing and testing an open-source Flash driver<\/li><li>Porting signal processing algorithms from Python to C for Edge-AI execution<\/li><li>Recording data to test the robustness of the hardware and firmware<\/li><\/ul>    <p>A testing board will be made available for development.<\/p>    <p><strong>Required Skills<\/strong>:<\/p>    <ul><li>Experience programming in C and Python<\/li><li>Experience debugging software for embedded devices<\/li><li>Excellent code documentation<\/li><li>Knowledge of version control (e.g. Git)<\/li><li>Interest in biosignal processing \/ wearable health monitoring<\/li><\/ul>    <p><strong>Desired Skills:<\/strong><\/p>    <ul><li>Familiarity with the STM32 Cube IDE<\/li><li>Knowledge of data compression techniques<\/li><li>Experience developing Flash drivers<\/li><li>Experience in embedded signal processing<\/li><\/ul>    <p><strong>Contact:<\/strong><\/p>    <p>To apply for or ask questions about this student assistant position, please contact Lara Orlandic (<a href='mailto:lara.orlandic@epfl.ch'>lara.orlandic@epfl.ch<\/a>) and J&eacute;r&ocirc;me Thevenot (<a href='mailto:jerome.thevenot@epfl.ch'>jerome.thevenot@epfl.ch<\/a>).  Please provide an updated CV, transcript, and short statement about why  you are interested in this position and which qualifications you have  (e.g. 
through coursework, internships, or prior projects) that would  make you a good candidate.<\/p>         <\/div><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr J\u00e9r\u00f4me Thevenot, Lara Orlandic<br> \";<\/script>\n<script>var project516minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr J\u00e9r\u00f4me Thevenot, Lara Orlandic<br>\";<\/script>\n<span id=project516><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr J\u00e9r\u00f4me Thevenot, Lara Orlandic<br> <a href=#_ onclick=opendesc('project516',project516); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td>\n    <div style='position:relative;'>\n     <div style='position: absolute;top:-80px;left:-300px;'>\n       <img border=0 src=https:\/\/eslweb.epfl.ch\/projects\/images\/notavailable.gif alt='project no longer available'>\n     <\/div>\n    <\/div><a href=https:\/\/www.sensemodi.com\/ target=_blank title='Sensemodi'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/180.png width=70 alt='Sensemodi'><\/a><\/td><\/tr><tr><td 
colspan=2><h3> Master Projects<br><br><\/h3><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor745><\/a><b><span style='font-size: 20px;'>Precision-Scalable Systolic Arrays for High-Efficiency LLM Inference Acceleration<\/b><br><script>var project745=\"<p>This project focuses on the design and implementation of a scalable-bitwidth systolic array, a specialized hardware accelerator aimed at improving the computational efficiency of modern artificial intelligence workloads. Its primary objective is to develop an architecture that can dynamically support varying numerical precisions through a template-based design, ranging from low-bit quantization (e.g., INT2\/INT3\/INT4) to higher-precision formats such as FP16.<\/p>    <p>A systolic array consists of a network of tightly coupled data-processing units, referred to as Processing Elements (PEs), through which data flows in a structured manner. Each PE performs a small portion of the overall computation&mdash;typically multiply-accumulate (MAC) operations&mdash;and forwards intermediate results to neighboring elements. The systolic array architecture enables massive parallelism and high data reuse. By significantly reducing accesses to off-chip memory, systolic arrays achieve very high energy efficiency for matrix multiplication workloads, which form the computational backbone of deep learning and large language models.<\/p>    <p>Despite their efficiency, state-of-the-art systolic arrays are typically limited to a single, fixed bitwidth. Most existing designs target a specific numerical format, such as INT8 or FP16, which makes them less adaptable to the diverse precision requirements of modern workloads. 
As a result, these architectures struggle to efficiently support workloads with varying numerical characteristics, limiting their flexibility and broader applicability.<\/p>    <p>To address this limitation, this project proposes a reconfigurable, scalable-bitwidth systolic array. The student will develop a template-based PE architecture that enables the generation of practical systolic arrays supporting varying bitwidths. Multiple PEs can be dynamically grouped to form scalable matrix-multiplication accelerators with configurable precision. This approach aims to bridge the gap between flexibility and efficiency, enabling hardware accelerators that better match the diverse precision demands of modern AI models.<\/p>    <p><strong>Task description<\/strong><\/p>    <ol class='wp-block-list'> <li>Understand the architecture of the TiC-SAT systolic array [1] under development at ESL, EPFL, including its PE design and interconnects.<\/li>    <li>Implement a PE design template that supports various bitwidth configurations.<\/li>    <li>Generate and test heterogeneous systolic array hardware designs using the scalable-bitwidth PEs to enable flexible precision support.<\/li>    <li>Automate the integration of systolic array instances in RISC-V systems-on-chip within the X-HEEP open-hardware framework.<\/li> <\/ol>    <p><strong>Project objectives<br \/><\/strong>The fulfillment of the following objectives is required for a passing grade (4.0):<\/p>    <ul class='wp-block-list'> <li>Extend the TiC-SAT PE design to support multiple bitwidths, including a testbench and test suite to validate the design.<\/li>    <li>Create a template-based generator of the systolic array. 
The template must enable the generation of instances of the systolic array from configuration parameters, specifying the array size, supported bitwidths and the arrangement of PEs.<\/li>    <li>Characterize the runtime latency and energy efficiency of the modified systolic array for at least four different bitwidth configurations.<\/li>    <li>Integrate an instance of the systolic array design into X-HEEP [2], and test the performance of at least one AI model on the resulting system.<\/li> <\/ul>    <p>The completion of each of the following tasks will add 0.5 extra points to the project grade:<\/p>    <ul class='wp-block-list'> <li>Build a functional simulator for the systolic array in Python.<\/li>    <li>Implement the system on an FPGA development board.<\/li>    <li>Automate the template systolic array integration into X-HEEP.<\/li>    <li>Explore the resource\/accuracy trade-offs realized by different systolic array configurations.<\/li> <\/ul>    <p><strong>Required knowledge and skills<\/strong><\/p>    <ul class='wp-block-list'> <li>Proficiency in RTL design and programming (e.g., VHDL or Verilog).<\/li>    <li>Basic understanding of computer architecture.<\/li>    <li>Strong analytical thinking and scientific curiosity.<\/li> <\/ul>    <p><strong>References<\/strong><\/p>    <p>[1] A. Amirshahi, J. Klein, G. Ansaloni, D. Atienza, &ldquo;TiC-SAT: Tightly-coupled Systolic Accelerator for Transformers,&rdquo;&nbsp;28th Asia and South Pacific Design Automation Conference (ASP-DAC &rsquo;23), Tokyo, Japan, doi: 10.1145\/3566097.3567867.<\/p>    <p>[2] S. Machetti, P. D. Schiavone, G. Ansaloni, M. Pe&oacute;n-Quir&oacute;s and D. Atienza, &ldquo;X-HEEP: An Open-Source, Configurable and Extendible RISC-V Platform for TinyAI Applications,&rdquo;&nbsp;<em>2025 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)<\/em>, Kalamata, Greece, 2025, pp. 
1-6, doi: 10.1109\/ISVLSI65124.2025.11130281.<\/p>    <p><strong>Type of work<\/strong><\/p>    <ul class='wp-block-list'> <li>80% <strong>HW design<\/strong>: development and implementation of the scalable-bitwidth systolic array hardware.<\/li>    <li>20% <strong>Performance Evaluation<\/strong>: Benchmarking and analysis of various machine learning workloads on the designed hardware.<\/li><\/ul><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Rub\u00e9n Rodr\u00edguez \u00c1lvarez, Yuxuan Wang, Dr. Giovanni Ansaloni, Prof. David Atienza<br> Contact email: <a href='mailto:ruben.rodriguezalvarez@epfl.ch; yuxuan.wang@epfl.ch; giovanni.ansaloni@epfl.ch; david.atienza@epfl.ch?subject=Precision-Scalable Systolic Arrays for High-Efficiency LLM Inference Acceleration'>ruben.rodriguezalvarez@epfl.ch; yuxuan.wang@epfl.ch; giovanni.ansaloni@epfl.ch; david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project745minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Rub\u00e9n Rodr\u00edguez \u00c1lvarez, Yuxuan Wang, Dr. Giovanni Ansaloni, Prof. David Atienza<br>\";<\/script>\n<span id=project745><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Rub\u00e9n Rodr\u00edguez \u00c1lvarez, Yuxuan Wang, Dr. Giovanni Ansaloni, Prof. 
David Atienza<br> <a href=#_ onclick=opendesc('project745',project745); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor742><\/a><b><span style='font-size: 20px;'>VersaSens Bluetooth Low Energy (BLE) Communication and High-Accuracy Time Synchronization for Distributed Sensor Networks <\/b><br><script>var project742=\"<p>The objective of this project is to implement robust Bluetooth Low Energy (BLE) communication and high-precision time synchronization across multiple VersaSens [1] platforms. Developed at the Embedded Systems Laboratory (ESL), VersaSens is a modular, multimodal, extendable, and reconfigurable Edge-AI platform. 
In its original configuration, it includes three sensing modules and two processing modules, with the option to integrate additional custom-designed modules, providing a flexible foundation for diverse applications.<\/p><p>In the first phase of the project, a star-topology BLE network will be designed and implemented, in which multiple VersaSens nodes operate as peripherals and communicate with a designated VersaSens node acting as the central. This bidirectional communication will support reliable data exchange as well as precise time synchronization between all nodes, with a target synchronization accuracy of better than 10 microseconds.<\/p><p>In the second phase, the VersaSens node acting as the BLE central for the sensor network will also operate concurrently as a BLE peripheral in communication with an external device (PC, tablet, or mobile phone) acting as the central. In this configuration, the VersaSens hub will aggregate and stream real-time multimodal sensor data received from all connected VersaSens nodes to the external device. This communication link will additionally enable time synchronization between the VersaSens hub and the external device, with a target accuracy of better than 10 microseconds.<\/p><p>As an optional extension, two signals (one start\/stop signal and one trigger signal) generated by the external device will be received by the VersaSens hub and aligned with EEG and other sensor data streams with an accuracy better than 10 microseconds. These signals will be timestamped and either stored locally on the microSD card or streamed over BLE, depending on the VersaSens hub\u2019s operating mode.<\/p><p>All implementations will be fully validated using VersaSens platforms. 
Validation will include the development of a demonstration application capable of visualizing real-time data acquisition, communication, and time synchronization across multiple nodes, verifying the system\u2019s performance and synchronization accuracy.<\/p><p><strong>&nbsp;<\/strong><\/p><p><strong>Mandatory tasks:<\/strong><\/p><p>Completion of <strong>all<\/strong> these tasks is required to pass the exam and obtain a grade of <strong>4<\/strong>. Failure to complete any of these tasks will result in <strong>no pass<\/strong>:<\/p><ol><li>Become familiar with the VersaSens platform, including its hardware architecture, firmware framework, communication interfaces, and practical operation.<\/li><li>Become familiar with BLE communication in a star topology, including its mechanisms for reliable data exchange and high-accuracy time synchronization.<\/li><li>Set up a star-topology BLE network consisting of three VersaSens nodes operating as peripherals and one VersaSens node operating as the central (hub), and design and implement the corresponding firmware to ensure reliable and seamless bidirectional communication. Measure and characterize the maximum achievable communication throughput.<\/li><li>Design and implement firmware for all VersaSens nodes to achieve time synchronization across the sensor network with an accuracy better than 10 microseconds.<\/li><li>Configure the VersaSens hub node to operate concurrently as a BLE peripheral communicating with an external device (PC, tablet, or mobile phone) acting as the central.<\/li><li>Design and implement firmware for the hub to aggregate and stream in real time all sensor data received from the peripheral VersaSens nodes to the external device. 
Measure and characterize the maximum achievable communication throughput.<\/li><li>Design and implement firmware to enable time synchronization between the VersaSens hub and the external device, achieving synchronization accuracy better than 10 microseconds.<\/li><li>Test, verify, and validate all implemented functionalities on VersaSens platforms and the external device, including communication reliability, data integrity, and synchronization performance.<\/li><li>Develop a demonstration application to visualize, monitor, and validate real-time data acquisition, BLE communication, and time synchronization across all VersaSens nodes and the external device.<\/li><li>Prepare and deliver a complete documentation package for upload to the VersaSens GitLab repository, including firmware source code (C\/C++), host-side software (e.g., Python), configuration files, and user and developer documentation.<\/li><\/ol><p><strong>&nbsp;<\/strong><\/p><p><strong>Optional tasks: <\/strong><\/p><p>Once all mandatory tasks have been completed and a grade of <strong>4<\/strong> has been obtained, each optional task completed will contribute an additional <strong>0.5<\/strong> points to the final grade, up to a maximum grade of <strong>6<\/strong>:<\/p><ol><li>Design and develop an application on an external device (PC or tablet) capable of generating two timing signals for an experimental scenario: (i) start and stop control signals, and (ii) event trigger signals. 
These signals must be transmitted to the VersaSens hub with a timing accuracy better than 10 microseconds.<\/li><li>Design and implement firmware on the VersaSens hub to receive these signals, accurately timestamp and align them with EEG and all other aggregated sensor data, and, depending on the system\u2019s operating mode, store them locally and\/or stream them together with the aggregated data to the external device.<\/li><li>Extend the external device application to visualize the generated control and trigger signals alongside real-time EEG and other sensor data, enabling validation and verification of the end-to-end timing accuracy of signal transmission, reception, and alignment.<\/li><li>Prepare and deliver a complete documentation package for upload to the VersaSens GitLab repository, including firmware source code (C\/C++), host-side software (e.g., Python), configuration files, and user and developer documentation.<\/li><\/ol><p><strong>&nbsp;<\/strong><\/p><p><strong>Type of work<\/strong><\/p><ul><li>5% Hardware setup.<\/li><li>15% System architecture and software design.<\/li><li>55% Firmware design and implementation.<\/li><li>20% Testing, validation and performance evaluation.<\/li><li>5% Documentation and deliverables preparation.<\/li><\/ul><p><strong>&nbsp;<\/strong><\/p><p><strong>Desired skills:<\/strong><\/p><ul><li>Strong C\/C++ programming skills for embedded systems<\/li><li>Good Python programming skills<\/li><li>Experience with embedded systems and RTOS (preferably Zephyr)<\/li><li>Understanding of BLE communication concepts<\/li><li>Strong debugging and problem-solving skills<\/li><li>Familiarity with Git version control<\/li><\/ul><p><strong>&nbsp;<\/strong><\/p><p><strong>Soft skills:<\/strong><\/p><ul><li>Scientific curiosity<\/li><li>Good communication skills<\/li><li>Advanced English<\/li><\/ul><p><strong>&nbsp;<\/strong><\/p><p><strong>References:<\/strong><\/p><p>[1] Najafi, Taraneh Aminosharieh, et al. 
'VersaSens: An Extendable Multimodal Platform for Next-Generation Edge-AI Wearables.'&nbsp;<em>IEEE Transactions on Circuits and Systems for Artificial Intelligence<\/em>&nbsp;(2024).<\/p><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Taraneh Aminosharieh Najafi, Dr. J\u00e9r\u00f4me Thevenot, Doctoral candidate Wensi Zhang, Prof. Jes\u00fas Grajal de la Fuente, Prof. David Atienza<br> Contact email: <a href='mailto:taraneh.aminoshariehnajafi@epfl.ch; jerome.thevenot@epfl.ch; wensi.zhang@epfl.ch; jesus.grajal@upm.es; david.atienza@epfl.ch?subject=VersaSens Bluetooth Low Energy (BLE) Communication and High-Accuracy Time Synchronization for Distributed Sensor Networks '>taraneh.aminoshariehnajafi@epfl.ch; jerome.thevenot@epfl.ch; wensi.zhang@epfl.ch; jesus.grajal@upm.es; david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project742minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Taraneh Aminosharieh Najafi, Dr. J\u00e9r\u00f4me Thevenot, Doctoral candidate Wensi Zhang, Prof. Jes\u00fas Grajal de la Fuente, Prof. David Atienza<br>\";<\/script>\n<span id=project742><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Taraneh Aminosharieh Najafi, Dr. J\u00e9r\u00f4me Thevenot, Doctoral candidate Wensi Zhang, Prof. Jes\u00fas Grajal de la Fuente, Prof. 
David Atienza<br> <a href=#_ onclick=opendesc('project742',project742); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor737><\/a><b><span style='font-size: 20px;'>Edge-AI Applications on VersaPants: Exercise Recognition and Gait Phase Classification<\/b><br><script>var project737=\"<p>The project aims to implement two real-time applications on a textile-based capacitive sensing system called VersaPants. The first application focuses on gait phase classification, while the second explores exercise classification. The VersaPants system consists of a pair of jogging pants equipped with conductive patches and a sensor unit built on VersaSens [1]. Developed at the Embedded Systems Laboratory (ESL), VersaSens is a modular, multimodal, extendable, and reconfigurable Edge-AI platform. 
In its original configuration, it includes three sensing modules and two processing modules, with the option to integrate additional custom-designed modules, providing a flexible foundation for diverse applications.<\/p><p>In this project, a capacitive sensing board (CSB), developed as an add-on module for VersaSens, is used for real-time capacitive signal acquisition, and the VersaSens processing module (Main) is used for data storage, on-device processing, and Bluetooth Low Energy (BLE) transmission.<\/p><p>This project extends the work presented in [2], which introduced the VersaPants system for lower-body motion capture. In that study, the raw signals of the capacitive sensing units were fed into a lightweight transformer-based deep learning (DL) model that runs on a smart watch (Tic watch) and predicts lower-body positions and angles in real time with competitive results. In this project, as a continuation of that work, both the gait phase classification and the exercise recognition applications can use the output predictions of the transformer model as input to newly developed models. Note that each model must be lightweight enough to run on the watch alongside the original transformer model, which receives real-time signals from VersaPants over BLE as part of a fully integrated end\u2011to\u2011end application. The models can be trained using a newly collected dataset involving 20 subjects performing a set of exercises and a set of walking sessions.<\/p><p>As an optional second part of the project, two additional lightweight models are to be developed for the same applications: gait phase classification and exercise recognition. These new models should take the raw capacitive sensing signals as input and output gait phase predictions and exercise labels, respectively. 
Each model must be deployed on the Main module of the VersaPants system and provide real\u2011time inference while meeting the constraints of the nRF5340 SoC, including limited flash memory, low energy consumption for battery\u2011powered operation, and execution times that fit within the classification window. Training for these models can also be performed using the dataset collected from 20 subjects.<\/p><p><br><\/p><p><strong>Mandatory tasks:<\/strong><\/p><p>Completion of <strong>all<\/strong> these tasks is required to pass the exam and obtain a grade of <strong>4<\/strong>. Failure to complete any of these tasks will result in <strong>no pass<\/strong>:<\/p><ol><li>Become familiar with VersaPants, including its hardware architecture, firmware framework, and practical usage.<\/li><li>Become familiar with the transformer model developed previously and with the dataset gathered from 20 subjects.<\/li><li>Select an appropriate labeling strategy and annotate the gait data according to multiple levels of granularity:<ol><li>the two main phases of walking (stance and swing);<\/li><li>four sub-phases (heel strike, single-support stance, pre-swing, and single-support swing).<\/li><\/ol><\/li><li>Design a DL model that takes the output of the transformer model (joint positions and angles) as input and performs gait phase classification.<\/li><li>Train, test, optimize, and fine-tune the gait phase classification model to achieve competitive accuracy.<\/li><li>Optimize, quantize, and convert the designed gait phase classification model to C for embedded deployment, and check the accuracy loss of the converted model.<\/li><li>Integrate the converted C gait phase classification model into the Tic watch firmware and evaluate it on the device. 
Test the implemented model using real-time signals acquired from the CSB module of VersaPants.<\/li><li>Using the gait classes, estimate in real time:<ol><li>the number of steps<\/li><li>the length of the steps<\/li><li>cadence<\/li><li>walking speed<\/li><\/ol><\/li><li>Design a DL model that takes the output of the transformer model (joint positions and angles) as input and performs exercise classification.<\/li><li>Train, test, and optimize the exercise classification model to achieve competitive accuracy.<\/li><li>Optimize, quantize, and convert the designed exercise classification model to C for embedded deployment, and check the accuracy loss of the converted model.<\/li><li>Integrate the converted C exercise classification model into the Tic watch firmware and evaluate it on the device. Test the implemented model using real-time signals acquired from the CSB module of VersaPants.<\/li><li>Profile the execution performance of both models, including energy consumption, execution time, and computational workload.<\/li><li>Prepare and deliver a complete documentation package for upload to the VersaSens GitLab repository, including all Python code, C\/C++ code, and the firmware.<\/li><\/ol><p><strong>&nbsp;<\/strong><\/p><p><strong>Optional tasks: <\/strong><\/p><p>Once all mandatory tasks have been completed and a grade of <strong>4<\/strong> has been obtained, each optional task completed will contribute an additional <strong>0.2<\/strong> points to the final grade, up to a maximum grade of <strong>6<\/strong>:<\/p><ol><li>Design a DL model that takes the raw capacitive sensing signals as input and performs gait phase classification.<\/li><li>Train, test, and optimize the gait phase classification model to achieve competitive accuracy.<\/li><li>Optimize, quantize, and convert the designed gait phase classification model to C for embedded deployment, and check the accuracy loss of the converted model.<\/li><li>Integrate the converted C gait phase classification model into the VersaSens firmware and 
evaluate it on the device. Test the implemented model using real-time signals acquired from the CSB module.<\/li><li>Using the gait classes, estimate in real time:<ol><li>the number of steps<\/li><li>the length of the steps<\/li><li>cadence<\/li><li>walking speed<\/li><\/ol><\/li><li>Design a DL model that takes the raw capacitive sensing signals as input and performs exercise classification.<\/li><li>Train, test, and optimize the exercise classification model to achieve competitive accuracy.<\/li><li>Optimize, quantize, and convert the designed exercise classification model to C for embedded deployment, and check the accuracy loss of the converted model.<\/li><li>Integrate the converted C exercise classification model into the VersaSens firmware and evaluate it on the device. Test the implemented model using real-time signals acquired from the CSB module.<\/li><li>Profile the execution performance of both models, including energy consumption, execution time, and computational workload.<\/li><\/ol><p><strong>Type of work<\/strong><\/p><ul><li>40% Software design.<\/li><li>40% Firmware design.<\/li><li>15% Testing and performance evaluation.<\/li><li>5% Preparation and delivery of the complete documentation package.<\/li><\/ul><p><strong>&nbsp;<\/strong><\/p><p><strong>Desired skills:<\/strong><\/p><ul><li>Strong background in C\/C++ and Python programming<\/li><li>Knowledge of capacitive sensing signals<\/li><li>Background in deep learning models, including transformers<\/li><li>Experience with embedded software development, RTOS environments and Zephyr<\/li><li>Familiarity with version control systems (Git)<\/li><\/ul><p><strong>&nbsp;<\/strong><\/p><p><strong>Soft skills:<\/strong><\/p><ul><li>Scientific curiosity<\/li><li>Good communication skills<\/li><li>Advanced English<\/li><\/ul><p><strong>&nbsp;<\/strong><\/p><p><strong>References:<\/strong><\/p><p>[1] Najafi, Taraneh Aminosharieh, et al. 
'VersaSens: An Extendable Multimodal Platform for Next-Generation Edge-AI Wearables.'&nbsp;<em>IEEE Transactions on Circuits and Systems for Artificial Intelligence<\/em>&nbsp;(2024).<\/p><p>[2] Kasap, Deniz, et al. 'VersaPants: A Loose-Fitting Textile Capacitive Sensing System for Lower-Body Motion Capture.'&nbsp;<em>arXiv preprint arXiv:2511.16346<\/em>&nbsp;(2025).<\/p><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Taraneh Aminosharieh Najafi, M.Sc. Deniz Kasap, Dr. J\u00e9r\u00f4me Thevenot, Dr. Jonathan Dan, Prof. David Atienza<br> Contact email: <a href='mailto:taraneh.aminoshariehnajafi@epfl.ch; deniz.kasap@epfl.ch; jerome.thevenot@epfl.ch; jonathan.dan@epfl.ch; david.atienza@epfl.ch?subject=Edge-AI Applications on VersaPants: Exercise Recognition and Gait Phase Classification'>taraneh.aminoshariehnajafi@epfl.ch; deniz.kasap@epfl.ch; jerome.thevenot@epfl.ch; jonathan.dan@epfl.ch; david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project737minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Taraneh Aminosharieh Najafi, M.Sc. Deniz Kasap, Dr. J\u00e9r\u00f4me Thevenot, Dr. Jonathan Dan, Prof. David Atienza<br>\";<\/script>\n<span id=project737><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Taraneh Aminosharieh Najafi, M.Sc. Deniz Kasap, Dr. J\u00e9r\u00f4me Thevenot, Dr. Jonathan Dan, Prof. 
David Atienza<br> <a href=#_ onclick=opendesc('project737',project737); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor736><\/a><b><span style='font-size: 20px;'>From Telemetry to Flexibility Services: Schema Design for Data Centres as Energy Prosumers<\/b><br><script>var project736=\"<p>Data centres hosting HPC, cloud, and AI workloads are becoming megawatt-scale actors in electricity systems, yet they are still treated as passive consumers by energy markets. Recent work shows that data centres can act as grid prosumers by shifting or modulating computation to follow renewable availability and grid needs. 
In practice, however, this potential remains largely untapped because power and energy information is fragmented across servers, clusters, and facilities, with no common abstraction that allows operators to aggregate behaviour or expose flexibility in a verifiable and service-oriented way.<\/p> <p>Today, low-level telemetry such as CPU and GPU power counters, cluster-level utilisation metrics, and facility indicators like total power draw or PUE are collected using heterogeneous tools and formats. For privacy, security, and commercial reasons, these data cannot usually be shared at fine granularity or directly associated with individual jobs or users managed by schedulers such as SLURM or Kubernetes. As a result, data centres lack mechanisms to aggregate power and energy information in a way that preserves confidentiality while still conveying meaningful system-level flexibility to external energy stakeholders.<\/p> <p>The motivation of this thesis is to address the foundational abstraction gap by designing a common, machine-readable schema for power and energy data that spans from the server level to the data-centre level and supports privacy-preserving aggregation. Compatibility with external efforts and common tools of practice, such as SLURM job metadata or Kubernetes pod annotations and reports, is treated as a hard requirement, while ensuring that sensitive information is abstracted, anonymised, or aggregated before exposure. By structuring historical energy data in this way, the schema enables the derivation of auditable flexibility descriptors that can inform flexible Service-Level Agreements (f-SLAs), laying the groundwork for data centres to operate as trustworthy grid prosumers.<\/p> <p>&nbsp;<\/p> <h1>PROJECT OBJECTIVES<\/h1> <p>OBLIGATORY OBJECTIVES&nbsp;<\/p> <p>O1. 
<strong>Define a Common Multi-Scale Power and Energy Data Schema<\/strong>:<br \/>Design a unified, machine-readable schema that captures power and energy information at the server, cluster, and data-centre levels. The schema must define clear semantics, units, temporal resolution, aggregation rules, and provenance metadata, and must be suitable for representing historical energy behaviour in heterogeneous HPC and AI infrastructures.<\/p> <p>O2. <strong>Ensure Mandatory Compatibility with SLURM or Kubernetes<\/strong>:<br \/>Demonstrate that the schema can be instantiated and used in at least one production-relevant scheduling environment. The schema must be directly mappable to either SLURM job metadata and accounting reports, or Kubernetes job or pod annotations and execution reports, using supported interfaces without modifying core scheduler behaviour.<\/p> <p>O3. <strong>Enable Privacy-Preserving Aggregation for Flexibility Extraction:<\/strong><br \/>Design aggregation mechanisms that transform fine-grained power and energy data into non-sensitive, system-level indicators. The aggregation must prevent disclosure of job- or user-level information while preserving the ability to infer meaningful flexibility characteristics at the cluster and data-centre level.<\/p> <p>STRETCH OBJECTIVES&nbsp;<\/p> <p>S1. <strong>Derive Flexibility Descriptors for Flexible Service-Level Agreements:<\/strong> Translate aggregated historical power and energy data into flexibility descriptors such as load-shifting capacity, power modulation ranges, and temporal elasticity, and show how these descriptors can inform flexible Service-Level Agreements between data centres and energy providers.<\/p> <p>S2. 
<strong>Cross-Platform Validation Across HPC and AI Environments<\/strong>:&nbsp; Validate the schema and aggregation mechanisms in both a traditional HPC context and a modern AI platform, demonstrating that equivalent information and flexibility descriptors can be derived from SLURM-based and Kubernetes-based environments.<\/p> <p>S3. <strong>Represent, Validate and Present this work within communities of interest<\/strong>: Collaboration with, presentation to and adoption by data-centre practitioners and researchers from the Energy Efficient HPC Working Group, active scientific data experimental sites and\/or industry partners would be an exceptional metric of success.<\/p> <h1>REQUIRED KNOWLEDGE AND SKILLS<\/h1> <ul><li>Programming skills (Python and\/or C\/C++)<\/li><li>Familiarity with Linux-based systems<\/li><li>Software engineering practices: version control, modular code design, documentation, and reproducibility<\/li><li>Scientific curiosity and good analytical skills<\/li><li>Basic understanding of batch scheduling or container orchestration<\/li><li>Prior experience with SLURM or Kubernetes is beneficial but not required<\/li><li>Interest in distributed systems, HPC, or AI infrastructure<\/li><li>Interest in energy systems, sustainability, or infrastructure policy<\/li><\/ul> <h1>Type of Work<\/h1> <ul><li><strong>Theoretical Analysis (30%)<\/strong> <ul><li>Formalization of user intent and sustainability signals.<\/li><li>Abstraction of scheduler-independent semantics.<\/li><\/ul> <\/li><li><strong>Design and Implementation (45%)<\/strong> <ul><li>Design of data acquisition schema.<\/li><li>Implementation for SLURM and Kubernetes environments.<\/li><\/ul> <\/li><li><strong>Testing and Evaluation (25%)<\/strong> <ul><li>Validation of correctness, expressiveness, and overhead.<\/li><li>Cross-platform comparison of schema instantiations.<\/li><\/ul> <\/li><\/ul> <p><strong>Expected Outcomes<\/strong><\/p> <ul><li>To pass, the student will 
provide a well-defined and documented common schema for power and energy data across data-centre scales and a prototype demonstrating how the schema supports flexibility extraction (obligatory objectives)<\/li><li>For the maximum grade, the student will analyse and demonstrate how this schema can contribute to research on data centres as grid prosumers (any two of the stretch objectives)<\/li><li>A Master&rsquo;s thesis report, presentation, and GitHub repository<\/li><\/ul> <h1>SUPERVISION<\/h1> <p>This project will be conducted in a research environment focused on <strong>sustainable computing and large-scale systems<\/strong>, under the supervision of experts in HPC, AI infrastructure, and energy-aware systems:<\/p> <ul><li>Embedded Systems Laboratory (ESL), EPFL: <strong>Dr. Denisa-Andreea Constantinescu<\/strong> &ndash; <a href='mailto:denisa.constantinescu@epfl.ch'>denisa.constantinescu@epfl.ch<\/a>, <strong>Prof. David Atienza<\/strong><\/li><li>Los Alamos National Laboratory, HPC Workload Management: <strong>Steven Senator<\/strong><\/li><\/ul> <h1>References<\/h1> <ul><li>SLURM Workload Manager:<a href='https:\/\/slurm.schedmd.com\/'> https:\/\/slurm.schedmd.com\/<\/a><\/li><li>Kubernetes Documentation:<a href='https:\/\/kubernetes.io\/'> https:\/\/kubernetes.io\/<\/a><\/li><li>EEHPCWG Operational Data Analytics, <a href='https:\/\/hpc-oda-org.pages.dev\/'>https:\/\/hpc-oda-org.pages.dev\/<\/a>&nbsp;<\/li><li>Horace, Leslie A., Christopher Stokes, Craig S. Walker, Anvitha Ramachandran, William M. Jones, Nathan A. DeBardeleben, and Steven T. Senator. &quot;Energy Forecasting in High Performance Computing Datacenters Using Machine Learning.&quot; In 2025 IEEE International Conference on AI and Data Analytics (ICAD), pp. 1-10. IEEE, 2025.<\/li><\/ul> <p>Colangelo, Philip, Ayse K. Coskun, Jack Megrue, Ciaran Roberts, Shayan Sengupta, Varun Sivaram, Ethan Tiao et al. 
&quot;Turning AI Data Centers into Grid-Interactive Assets: Results from a Field Demonstration in Phoenix, Arizona.&quot; <a href='https:\/\/arxiv.org\/pdf\/2507.00909v1'><em>https:\/\/arxiv.org\/pdf\/2507.00909v1<\/em><\/a> &nbsp;(2025).<\/p><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Denisa Constantinescu, Prof. David Atienza, Steven Senator of Los Alamos National Laboratory, HPC Workload Management<br> Contact email: <a href='mailto:denisa.constantinescu@epfl.ch; david.atienza@epfl.ch; sts@lanl.gov?subject=From Telemetry to Flexibility Services: Schema Design for Data Centres as Energy Prosumers'>denisa.constantinescu@epfl.ch; david.atienza@epfl.ch; sts@lanl.gov<\/a><br>\";<\/script>\n<script>var project736minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Denisa Constantinescu, Prof. David Atienza, Steven Senator of Los Alamos National Laboratory, HPC Workload Management<br>\";<\/script>\n<span id=project736><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Denisa Constantinescu, Prof. 
David Atienza, Steven Senator of Los Alamos National Laboratory, HPC Workload Management<br> <a href=#_ onclick=opendesc('project736',project736); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><a href=https:\/\/urbantwin.ch target=_blank title='An urban digital twin for climate action: Assessing policies and solutions for energy, water and infrastructure'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/197.png width=70 alt='An urban digital twin for climate action: Assessing policies and solutions for energy, water and infrastructure'><\/a><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor709><\/a><b><span style='font-size: 20px;'>Eyes on the Unmeasured: Blind Forecasting through Multivariate Attention Networks<\/b><br><script>var project709=\"<p>Large-scale wastewater systems contain many nodes where installing sensors is impractical or too costly. 
Blind forecasting aims to reconstruct or predict the time evolution of <em>unobserved<\/em> endogenous variables using only the available sensor signals and environmental drivers, such as precipitation. This capability is central to building scalable digital twins [2] that require sparse instrumentation while maintaining high-fidelity, network-wide predictive capabilities.<\/p> <p>Transformers are the state-of-the-art for hydrological forecasting due to their ability to model long-range temporal dependencies and cross-variable interactions. Building on the AquaCast multi-input Transformer [1], this project investigates how such architectures can be extended to perform <strong>blind forecasting<\/strong>&mdash;predicting a target time-series that is not provided as input, but must be inferred from the multivariate structure of the drainage network.<\/p> <p>The key research questions include: Which nodes can be reliably inferred from others? How does hydrodynamic coupling influence blind prediction? How much do exogenous variables such as rainfall contribute to successful reconstruction? 
The goal is to develop a robust, interpretable blind forecasting module and identify architectural or environmental factors that determine blind predictability.<\/p> <p><strong>TASKS<\/strong><\/p> <ol> <li><strong> Literature Review<\/strong><ul> <li>Perform a literature review on time-series forecasting and blind forecasting, focusing on methods for cross-variable prediction and latent reconstruction.<\/li> <\/ul><\/li>    <li><strong> Dataset Preparation &amp; Statistical Analysis<\/strong><ul> <li>Preprocess and construct the dataset from raw Lausanne wastewater records for blind forecasting scenarios.<\/li> <li>Conduct statistical analysis of time-series (cross-correlation, mutual information, hydrodynamic lags) to assess blind predictability.<\/li> <\/ul><\/li>    <li><strong> Blind Forecasting Model Development<\/strong><ul> <li>Design, implement, and test blind forecasting <strong>multivariate networks<\/strong>, preferably Transformer-based.<\/li> <li>Provide quantitative performance metrics and interpretability of current methods and your design.<\/li> <\/ul><\/li>    <li><strong> Exogenous + Endogenous Blind Forecasting<\/strong><ul> <li>Extend the model to handle blind forecasting with both exogenous and endogenous time-series.<br \/>Conduct a detailed quantitative evaluation and interpretability study.<\/li> <\/ul><\/li>  <\/ol> <p><strong>Optional:<\/strong><\/p> <ul> <li><strong>Robustness of missing samples:<\/strong> Propose embedding approaches to handle missing time-steps even under blind forecasting conditions.<\/li> <li><strong>Missing samples representation:<\/strong> Explore self-supervised reconstruction and evaluate transfer to forecasting tasks.<\/li> <\/ul> <p>&nbsp;<\/p> <p><strong>REQUIREMENTS<\/strong><\/p> <ul> <li>Good Python programming skills.<\/li> <li>Scientific curiosity<\/li> <li>Background in machine learning or signal processing.<\/li> <li>Interest in hydrological systems, digital twins, or representation learning.<\/li> 
<\/ul> <h3><strong>TYPE OF WORK<\/strong><\/h3> <ul> <li><strong>35% theory<\/strong> (state-of-the-art study, analytical feasibility, cross-sensor dependency analysis).<\/li> <li><strong>65% implementation<\/strong> (architectural design, training pipelines, model evaluation, robustness studies).<\/li> <\/ul> <p><strong>REFERENCES<\/strong><\/p> <p>[1] Abdollahinejad, Golnoosh, Saleh Baghersalimi, Denisa-Andreea Constantinescu, Sergey Shevchik, and David Atienza. &quot;<strong>AquaCast<\/strong>: Urban Water Dynamics Forecasting with Precipitation-Informed Multi-Input Transformer.&quot; <a href='https:\/\/arxiv.org\/abs\/2509.09458'><em>https:\/\/arxiv.org\/abs\/2509.09458<\/em><\/a><\/p> <p>[2] UrbanTwin project: <a href='https:\/\/urbantwin.ch\/'>https:\/\/urbantwin.ch\/<\/a><\/p><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Golnoosh Abdollahinejad, Dr. Denisa Constantinescu, Prof. Dr. David Atienza<br> Contact email: <a href='mailto:golnoosh.abdollahinejad@epfl.ch; denisa.constantinescu@epfl.ch; david.atienza@epfl.ch?subject=Eyes on the Unmeasured: Blind Forecasting through Multivariate Attention Networks'>golnoosh.abdollahinejad@epfl.ch; denisa.constantinescu@epfl.ch; david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project709minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Golnoosh Abdollahinejad, Dr. Denisa Constantinescu, Prof. Dr. David Atienza<br>\";<\/script>\n<span id=project709><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Golnoosh Abdollahinejad, Dr. Denisa Constantinescu, Prof. Dr. 
David Atienza<br> <a href=#_ onclick=opendesc('project709',project709); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><a href=https:\/\/urbantwin.ch target=_blank title='An urban digital twin for climate action: Assessing policies and solutions for energy, water and infrastructure'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/197.png width=70 alt='An urban digital twin for climate action: Assessing policies and solutions for energy, water and infrastructure'><\/a><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor708><\/a><b><span style='font-size: 20px;'>When Sensors Go Silent: Masked Multivariate Transformers for Robust Urban Water Forecasting<\/b><br><script>var project708=\"<p>Accurate short- and long-horizon forecasting of water dynamics in  urban drainage systems is essential for flood prevention, operational  planning, and digital twin deployments. 
However, the sensing  infrastructure deployed in real-world wastewater networks often suffers  from unreliable conditions, including battery limitations, harsh  underground environments, radio communication losses, or temporary  sensor outages. As a result, forecasting models must remain functional  when one or more sensors suddenly stop reporting data.<\/p> <p>Recent progress in Transformer-based time-series forecasting&mdash;such as  the AquaCast multi-input architecture developed in our group [1, 2]&mdash;has  demonstrated that attention-based models can effectively combine  endogenous water dynamics signals with exogenous inputs, such as  precipitation history and forecast conditions. Yet, these models  generally assume fully available and clean multivariate input streams.  This assumption breaks down in real settings, where missing channels  compromise the learned temporal and cross-sensor dependencies.<\/p> <p>This Master thesis addresses this gap by developing a <strong>masked Transformer forecasting framework<\/strong>  for urban hydrology applications. The central goal is to enable  accurate and stable forecasting even when any subset of input series is  missing, while preserving the benefits of multi-channel information  fusion. This includes extending AquaCast with masked attention,  missing-aware embeddings, and exogenous&ndash;endogenous fusion strategies.  
The project will also explore the limits of robustness and  interpretability under systematic sensor failures.<\/p> <p><strong>TASKS<\/strong><\/p> <p>Toward this end, the student will be responsible for:<\/p> <ol><li><strong> Literature Review<\/strong><ul><li>Conduct a thorough review on time-series forecasting and masked  Transformers, including architectures specifically designed for  channel-dropout, variable masking, or partially observed multivariate  sequences.<\/li><\/ul><\/li>  <li><strong> Dataset Preparation &amp; Statistical Analysis<\/strong><ul><li>Preprocess and construct the dataset from raw Lausanne wastewater records for missing-sensor scenarios.<\/li><li>Perform statistical analysis of the time-series (distributional  characterization, correlations, lag analysis, stationarity, periodicity,  etc.).<\/li><li>Generate controlled masking patterns to simulate sensor failure during training and testing.<\/li><\/ul><\/li>  <li><strong> Masked Multivariate Transformer Architecture<\/strong><ul><li>Design, implement, and test a <strong>masked multivariate Transformer<\/strong> to ensure forecasting remains accurate when one or more sensors stop working.<\/li><li>Provide quantitative evaluation and interpretability analysis  (attention maps, influence of mask tokens, sensitivity to missing  channels).<\/li><\/ul><\/li>  <li><strong> Exogenous + Endogenous Masked Forecasting<\/strong><ul><li>Extend the design to incorporate exogenous and endogenous input time-series together (e.g., rainfall history + forecast).<\/li><li>Evaluate how exogenous data compensates for missing endogenous channels.<\/li><li>Provide quantitative measurements and interpretability as above.<\/li><\/ul><\/li> <\/ol> <p><strong>Optional:<\/strong><\/p> <ul><li><strong>Missing samples robustness:<\/strong> Develop robust embedding strategies to prepare Transformer tokens when time-steps are missing.<\/li><li><strong>Missing samples representation:<\/strong> Investigate self-supervised 
reconstruction to learn representations transferable to forecasting tasks.<\/li><\/ul> <p><strong>REQUIREMENTS<\/strong><\/p> <ul><li>Strong programming skills in Python (PyTorch preferred).<\/li><li>Background in machine learning, deep learning, or time-series modeling.<\/li><li>Interest in AI robustness, forecasting, and spatio-temporal systems.<\/li><\/ul> <h3><strong>TYPE OF WORK<\/strong><\/h3> <ul><li><strong>35% theory<\/strong> (literature review on masked Transformers, missing-data modeling, experimental design, interpretability).<\/li><li><strong>65% implementation<\/strong> (dataset engineering, architecture development, training, evaluation, and result analysis).<\/li><\/ul> <p><strong>REFERENCES<\/strong><\/p> <p>[1] Abdollahinejad, Golnoosh, Saleh Baghersalimi, Denisa-Andreea Constantinescu, Sergey Shevchik, and David Atienza. &ldquo;<strong>AquaCast<\/strong>: Urban Water Dynamics Forecasting with Precipitation-Informed Multi-Input Transformer.&rdquo; <a href='https:\/\/arxiv.org\/abs\/2509.09458'><em>https:\/\/arxiv.org\/abs\/2509.09458<\/em><\/a><\/p> <p>[2] UrbanTwin project: <a href='https:\/\/urbantwin.ch\/'>https:\/\/urbantwin.ch\/<\/a>&nbsp;<\/p><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Golnoosh Abdollahinejad, Dr. Denisa Constantinescu, Prof. Dr. David Atienza<br> Contact email: <a href='mailto:golnoosh.abdollahinejad@epfl.ch; denisa.constantinescu@epfl.ch; david.atienza@epfl.ch?subject=When Sensors Go Silent: Masked Multivariate Transformers for Robust Urban Water Forecasting'>golnoosh.abdollahinejad@epfl.ch; denisa.constantinescu@epfl.ch; david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project708minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Golnoosh Abdollahinejad, Dr. Denisa Constantinescu, Prof. Dr. David Atienza<br>\";<\/script>\n<span id=project708><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Golnoosh Abdollahinejad, Dr. Denisa Constantinescu, Prof. Dr. 
David Atienza<br> <a href=#_ onclick=opendesc('project708',project708); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><a href=https:\/\/urbantwin.ch target=_blank title='An urban digital twin for climate action: Assessing policies and solutions for energy, water and infrastructure'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/197.png width=70 alt='An urban digital twin for climate action: Assessing policies and solutions for energy, water and infrastructure'><\/a><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor697><\/a><b><span style='font-size: 20px;'>High-Throughput Multiply-Accumulate for an Edge-Oriented Near-Memory Computing Device<\/b><br><script>var project697=\"<p class='s5' style='padding-top: 14pt;padding-left: 11pt;text-indent: 0pt;text-align: justify;'><a name='bookmark0'>\u200c<\/a>Background and Motivations<\/p><p style='padding-top: 7pt;padding-left: 11pt;text-indent: 0pt;line-height: 120%;text-align: justify;'>The ever-increasing adoption of data-driven 
algorithms across a wide range of applications has unlocked entirely new possibilities for smarter human-machine interaction, intelligent features, and autonomous operation. We now expect our battery-powered digital companions to perform a variety of tasks, such as image recognition, virtual reality, always-on natural language understanding, and intelligent health monitoring. However, the deployment of such features on edge devices is currently hindered by their limited computing power on the one hand and the bandwidth limitations of the current network infrastructure on the other. Moving data processing close to the data source and the user is therefore paramount to enable the seamless yet sustainable diffusion of smart devices in our everyday lives.<\/p><p style='padding-top: 7pt;padding-left: 11pt;text-indent: 0pt;line-height: 120%;text-align: justify;'>This has driven the exploration of new computing paradigms that address the inefficiencies of the traditional Von Neumann architecture, which serves as the basis for most microcontroller-based edge devices. In particular, significant effort has been invested in mitigating the memory bottleneck by bringing computation within the memory subsystem itself. This approach allows for better exploitation of the available memory bandwidth at the output of the on-chip SRAM macros, rather than relying on the system bus to move data elements between the SRAM and the processing elements inside the CPU. This paradigm is commonly referred to as Processing-in-Memory (PiM) or Compute-in-Memory (CiM) and has been proven to effectively reduce energy consumption. 
Further benefits can be achieved by combining the PiM paradigm with existing data-centric computing approaches such as Single Instruction, Multiple Data (SIMD), where the same operation (i.e., an instruction) operates on multiple data elements (e.g., a vector or matrix), thereby reducing instruction fetch overhead and the overall energy consumption of the system.<\/p><p style='padding-top: 7pt;padding-left: 11pt;text-indent: 0pt;line-height: 120%;text-align: justify;'><a href='https:\/\/ieeexplore.ieee.org\/document\/10964076' style=' color: black; font-family:Calibri, sans-serif; font-style: normal; font-weight: normal; text-decoration: none; font-size: 12pt;' target='_blank'>Building upon these considerations, the Embedded Systems Laboratory (ESL) at EPFL has developed <a href=https:\/\/ieeexplore.ieee.org\/document\/10964076>NM-Carus<\/a>, a near-memory computing IP that tightly integrates off-the-shelf SRAM memory with a programmable controller and a vector-capable execution engine supporting multi-precision integer and fixed-point data types. The resulting circuit exposes a memory-like slave interface to the host system and provides a transparent <i>memory <\/i>mode together with an autonomous <i>computing <\/i>mode where a user-programmed kernel is executed on NM-Carus data. From a software perspective, the NM-Carus CPU-based controller can be programmed using the RISC-V RV32E based instruction set, complemented by a custom, PiM-oriented vector extension that utilizes the private data memory as a large Vector Register File (VRF).<\/p><p style='padding-top: 7pt;padding-left: 11pt;text-indent: 0pt;line-height: 120%;text-align: justify;'>In its current form, NM-Carus offers 10-100\u00d7 higher performance when integrated into a host RISC-V microcontroller (MCU) and executing linear algebra workloads on 8-bit data, with 7-60\u00d7 lower energy consumption and a +25% area overhead compared to the CPU-only MCU. 
While these results are remarkable in comparison to CPU-based systems, they do not yet match the performance and energy efficiency of Application Specific Integrated Circuits (ASICs). One critical reason for this is the way NM-Carus handles multiply-accumulate (MACC) operations, which are the fundamental building blocks of many linear algebra operations commonly used in machine learning and other data-driven workloads. Instead of relying on a dedicated accumulator, NM-Carus Arithmetic-Logic Units (ALUs) store partial results inside the internal SRAM memory, introducing expensive and redundant memory accesses that inflate the overall energy budget.<\/p><p class='s5' style='padding-top: 14pt;padding-left: 11pt;text-indent: 0pt;text-align: justify;'><a name='bookmark1'>\u200c<\/a>Project Description and Main Goal<\/p><p style='padding-top: 15pt;padding-left: 11pt;text-indent: 0pt;line-height: 120%;text-align: justify;'>The primary goal of this thesis is to <b>optimize the energy cost and performance of multiply-accumulate (MACC) operations in NM-Carus<\/b>. This will be achieved by supporting a new class of instructions that define operations between the data stored inside NM-Carus's private memory and a dedicated accumulation register. 
The expected outcome is a peak throughput of four 8-bit operations per cycle in each ALU.<\/p><p style='padding-top: 12pt;padding-left: 11pt;text-indent: 0pt;text-align: justify;'>Throughout this project, the student will learn:<\/p><p style='text-indent: 0pt;text-align: left;'><br><\/p><ul id='l1'><li data-list-text='\u25cf'><p style='padding-left: 47pt;text-indent: -18pt;line-height: 120%;text-align: left;'>How the NM-Carus Near-Memory Computing (NMC) IP works and how to offload computationally expensive tasks to it within the X-HEEP MCU framework.<\/p><\/li><li data-list-text='\u25cf'><p style='padding-left: 47pt;text-indent: -18pt;line-height: 120%;text-align: left;'>How to extend NM-Carus's decoder and execution engine to support optimized multiply-accumulate instructions operating on a dedicated accumulator to reduce memory accesses and increase throughput.<\/p><\/li><li data-list-text='\u25cf'><p style='padding-left: 47pt;text-indent: -18pt;line-height: 120%;text-align: left;'>To assess the impact of the applied additions on the timing, area, and power characteristics of NM-Carus.<\/p><\/li><li data-list-text='\u25cf'><p style='padding-left: 47pt;text-indent: -18pt;text-align: left;'>To add assembler support for the new instructions to the RISC-V toolchain.<\/p><\/li><li data-list-text='\u25cf'><p style='padding-top: 2pt;padding-left: 47pt;text-indent: -18pt;line-height: 120%;text-align: left;'>To write optimized assembly kernels to execute common linear-algebra tasks (e.g., matrix multiplication, convolution).<\/p><\/li><li data-list-text='\u25cf'><p style='padding-top: 3pt;padding-left: 47pt;text-indent: -18pt;line-height: 120%;text-align: left;'>To write comprehensive regression tests to verify the functionality of the added instructions through extensive RTL simulation.<\/p><\/li><li data-list-text='\u25cf'><p style='padding-left: 47pt;text-indent: -18pt;line-height: 120%;text-align: left;'>To assess the system-level impact of the new instructions in 
terms of throughput and area overhead by integrating the updated NM-Carus IP into a host MCU.<\/p><\/li><li data-list-text='\u25cf'><p style='padding-left: 47pt;text-indent: -18pt;line-height: 120%;text-align: left;'>To compare the system-level performance and energy efficiency of an MCU equipped with NM-Carus against other variants integrating different accelerators and coprocessors (e.g., RISC-V compliant vector coprocessors, embedded GPUs, systolic arrays).<\/p><\/li><li data-list-text='\u25cf'><p class='s7' style='padding-left: 47pt;text-indent: -18pt;line-height: 120%;text-align: left;'>[Optional] <span class='p'>To build an MCU system with multiple NM-Carus instances and deploy complex, real-world applications on it (e.g., complex neural networks) to extract real-world performance, which can then be compared with other existing embedded systems in the same domain (low-power edge computing devices).<\/span><\/p><\/li><\/ul><\/p><p style='padding-top: 12pt;padding-left: 11pt;text-indent: 0pt;line-height: 120%;text-align: justify;'>The project will be carried out entirely at the Embedded Systems Laboratory (ESL) of EPFL, one of the world's top-class universities. ESL is an active group (25 PhD students among 51 members) involved in several research aspects, providing a stimulating research and learning environment. The student will be under the supervision of Prof. David Atienza, Dr. Michele Caon, and Dr. Davide Schiavone.<\/p><p class='s5' style='padding-top: 14pt;padding-left: 11pt;text-indent: 0pt;text-align: justify;'><a name='bookmark2'>\u200c<\/a>Project Objectives<\/p><p style='padding-top: 15pt;padding-left: 11pt;text-indent: 0pt;line-height: 120%;text-align: justify;'>The work is subdivided into three major, sequential milestones, cumulatively contributing to the final grade. 
The first milestone is both necessary and sufficient to reach the minimum grade (4), while the latter two are required to reach the maximum grade (6).<\/p><ol id='l2'><li data-list-text='1.'>Optimize the performance of MACC operations in NM-Carus <span class='p'>(0-4 points)<\/span><ol id='l3'><li data-list-text='a.'><p style='padding-top: 2pt;padding-left: 83pt;text-indent: -18pt;line-height: 121%;text-align: left;'>Define a new class of vector instructions that take as input operands one scalar operand (from the CPU GPRs) or vector operand (from the SRAM-based VRF) and the current value of a special-purpose accumulation vector register. The new instruction variants (<span class='s8'>*.av <\/span>and <span class='s8'>*.ax <\/span>for <i>accumulator-vector <\/i>and <i>accumulator-scalar <\/i>respectively) shall be provided for all currently supported arithmetic and move\/slide operations (i.e., not only MACC operations).<\/p><\/li><li data-list-text='b.'><p style='padding-left: 83pt;text-indent: -18pt;line-height: 120%;text-align: left;'>Implement the new instruction class by modifying the RTL description of the decode and execution stages inside the vector pipeline of NM-Carus. The architecture of NM-Carus's ALU shall be modified to provide a throughput of four 8-bit MACC operations per cycle (e.g., by adding the necessary arithmetic units).<\/p><\/li><li data-list-text='c.'><p style='padding-left: 83pt;text-indent: -18pt;line-height: 120%;text-align: left;'>Extensively test the modified RTL description of NM-Carus by adding support for the newly added instructions to the RISC-V GCC assembler and writing a comprehensive set of computing kernels using the new instruction variants, to run in RTL simulation. 
This process implies the modification of the existing C++ UVM-like testbench, particularly the reference model inside the scoreboard unit.<\/p><\/li><li data-list-text='d.'><p style='padding-left: 83pt;text-indent: -18pt;line-height: 120%;text-align: left;'>Assess the cost of hardware support for the new instruction class in terms of timing and circuit area by performing the ASIC synthesis of the original and modified circuits on a low-power standard cell library. If necessary, modify the RTL description of the circuit to achieve better timing and area characteristics.<\/p><\/li><\/ol><\/li><li data-list-text='2.'>Evaluate the system-level performance of the updated NM-Carus <span class='p'>(0-1 points)<\/span><ol id='l4'><li data-list-text='a.'><p style='padding-top: 2pt;padding-left: 83pt;text-indent: -18pt;line-height: 120%;text-align: left;'><a href='https:\/\/github.com\/esl-epfl\/x-heep' style=' color: black; font-family:Calibri, sans-serif; font-style: normal; font-weight: normal; text-decoration: none; font-size: 12pt;' target='_blank'>Integrate the updated NM-Carus IP into an already-existing host MCU based on the<\/a><span style=' color: #1154CC; font-family:Calibri, sans-serif; font-style: normal; font-weight: normal; text-decoration: underline; font-size: 12pt;'> X-HEEP platform<\/span>.<\/p><\/li><li data-list-text='b.'><p style='padding-left: 83pt;text-indent: -17pt;text-align: left;'>Synthesize, place, and route the MCU using a low-power standard cell library.<\/p><\/li><li data-list-text='c.'><p style='padding-top: 2pt;padding-left: 83pt;text-indent: -17pt;text-align: left;'>Modify the existing applications to make use of the newly added instructions.<\/p><\/li><li data-list-text='d.'><p style='padding-top: 2pt;padding-left: 83pt;text-indent: -18pt;line-height: 120%;text-align: left;'>Evaluate the benefits brought by the new instructions at system level in terms of throughput and energy consumption by performing RTL and post-layout 
simulations.<\/p><\/li><\/ol><\/li><li data-list-text='3.'>Compare the performance of NM-Carus with other types of accelerators <span class='p'>(0-1 points)<\/span><ol id='l5'><li data-list-text='a.'><p style='padding-top: 2pt;padding-left: 83pt;text-indent: -18pt;line-height: 120%;text-align: left;'>Extend the MCU assembled in the previous point with at least another existing accelerator\/coprocessor (e.g., RISC-V compliant vector coprocessors, embedded GPUs, systolic arrays).<\/p><\/li><li data-list-text='b.'><p style='padding-left: 83pt;text-indent: -18pt;line-height: 120%;text-align: left;'>Select and deploy one or more meaningful real-world applications (e.g., edge-oriented AI workloads) on the MCU to evaluate the system-level throughput and energy efficiency when offloading computation to the different accelerators\/coprocessors.<\/p><\/li><li data-list-text='c.'><p class='s7' style='padding-left: 83pt;text-indent: -18pt;line-height: 120%;text-align: left;'>[Optional] <span class='p'>Possibly explore the trade-offs of different NM-Carus configurations, varying the number of NM-Carus instances, their memory capacity, and computation parallelism.<\/span><\/p><\/li><\/ol><\/li><\/ol><p class='s5' style='padding-top: 14pt;padding-left: 11pt;text-indent: 0pt;text-align: left;'><a name='bookmark3'>\u200c<\/a>Prerequisites<\/p>Required knowledge and skills<ul id='l6'><li data-list-text='\u25cf'><p style='padding-top: 9pt;padding-left: 47pt;text-indent: -18pt;line-height: 120%;text-align: left;'>Strong background and advanced understanding of computer architecture and microprocessor design.<\/p><\/li><li data-list-text='\u25cf'><p style='padding-left: 47pt;text-indent: -18pt;line-height: 120%;text-align: left;'>In-depth knowledge of Reduced Instruction Set Computer (RISC) architectures, from both hardware and software perspectives.<\/p><\/li><li data-list-text='\u25cf'><p style='padding-left: 47pt;text-indent: -18pt;line-height: 120%;text-align: 
left;'>Extensive experience with the digital implementation flow, from RTL design to place and route.<\/p><\/li><li data-list-text='\u25cf'><p style='padding-left: 47pt;text-indent: -18pt;text-align: left;'>Proficiency in low-level programming, ideally with RISC-V assembly.<\/p><\/li><li data-list-text='\u25cf'><p style='padding-top: 2pt;padding-left: 47pt;text-indent: -18pt;text-align: left;'>Solid experience with object-oriented programming in C++.<\/p><\/li><li data-list-text='\u25cf'><p style='padding-top: 2pt;padding-left: 47pt;text-indent: -18pt;text-align: left;'>Solid experience in collaborative software and hardware development using Git.<\/p><\/li><li data-list-text='\u25cf'><p style='padding-top: 2pt;padding-left: 47pt;text-indent: -18pt;text-align: left;'>Strong knowledge of at least one high-level programming language (e.g., Python).<\/p><\/li><li data-list-text='\u25cf'><p style='padding-top: 2pt;padding-left: 47pt;text-indent: -18pt;text-align: left;'>Good analytical skills.<\/p>Appreciated skills:<\/li><li data-list-text='\u25cf'><p style='padding-top: 9pt;padding-left: 47pt;text-indent: -18pt;line-height: 120%;text-align: left;'>Good understanding and experience with machine learning algorithms, optimization workflows, and deployment frameworks.<\/p><\/li><li data-list-text='\u25cf'><p style='padding-left: 47pt;text-indent: -18pt;line-height: 120%;text-align: left;'>Fundamental experience with hardware acceleration of linear algebra computational kernels.<\/p><\/li><li data-list-text='\u25cf'><p style='padding-left: 47pt;text-indent: -18pt;line-height: 120%;text-align: left;'>Fundamental experience in writing scientific publications and navigating the state of the art.<\/p><\/li><li data-list-text='\u25cf'><p style='padding-left: 47pt;text-indent: -18pt;text-align: left;'>Good understanding of the digital verification flow, ideally with the UVM.<\/p><\/li><li data-list-text='\u25cf'><p style='padding-top: 2pt;padding-left: 
47pt;text-indent: -18pt;text-align: left;'>Advanced proficiency in English.<\/p><\/li><li data-list-text='\u25cf'><p style='padding-top: 2pt;padding-left: 47pt;text-indent: -18pt;text-align: left;'>Effective communication skills.<\/p><p style='padding-top: 2pt;text-indent: 0pt;text-align: left;'><br><\/p><p class='s5' style='padding-left: 11pt;text-indent: 0pt;text-align: left;'><a name='bookmark4'>\u200c<\/a>Type of work<\/p><\/li><li data-list-text='\u25cf'><p style='padding-top: 7pt;padding-left: 47pt;text-indent: -18pt;text-align: left;'>70% hardware\/software co-design, implementation, verification, and validation.<\/p><\/li><li data-list-text='\u25cf'><p style='padding-top: 2pt;padding-left: 47pt;text-indent: -18pt;text-align: left;'>15% software development and deployment.<\/p><\/li><li data-list-text='\u25cf'><p style='padding-top: 2pt;padding-left: 47pt;text-indent: -18pt;text-align: left;'>15% theory and state-of-the-art analysis.<\/p><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Michele Caon, Dr. Davide Schiavone, Prof. David Atienza<br> \";<\/script>\n<script>var project697minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Michele Caon, Dr. Davide Schiavone, Prof. David Atienza<br>\";<\/script>\n<span id=project697><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Michele Caon, Dr. Davide Schiavone, Prof. 
David Atienza<br> <a href=#_ onclick=opendesc('project697',project697); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><img width=27 src=https:\/\/eslweb.epfl.ch\/img\/1pixel.gif><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td>\n    <div style='position:relative;'>\n     <div style='position: absolute;top:-80px;left:-300px;'>\n       <img border=0 src=https:\/\/eslweb.epfl.ch\/projects\/images\/notavailable.gif alt='project no longer available'>\n     <\/div>\n    <\/div><a href=https:\/\/www.epfl.ch\/labs\/esl\/research\/systems-on-chip\/x-heep\/ target=_blank title='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/201.png width=70 alt='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><\/a><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor691><\/a><b><span style='font-size: 20px;'>Fast compilation for Coarse-Grained Reconfigurable Arrays (CGRAs) via Graph Minors<\/b><br><script>var project691=\"<p>This project offers students the opportunity to develop an efficient  mapping algorithm for deploying applications on Coarse-Grained  Reconfigurable Array (CGRA) accelerators [1]. 
Its primary goal is to  create a fast and scalable mapping methodology that achieves minimal  energy consumption and reduced runtime.<\/p>    <p>A CGRA is a 2-dimensional mesh of Processing Elements (PEs) supporting  the execution of arithmetic operations (e.g., add, sub, mul), where each  PE comprises a local register file and an Arithmetic Logic Unit (ALU).  By being programmable at the granularity of operations, CGRAs are very  promising because they offer a middle ground between the  efficiency of fixed-function hardware accelerators and the versatility  of bit-programmable FPGAs. However, they require substantial effort to  deploy applications, as operations must be mapped both in the space and  time dimensions.<\/p>    <p>Hence, most existing mapping methods tackle this challenge with a  limited scope, usually focusing on mapping a single loop onto the  hardware. While some approaches, such as the one proposed in [2], try to  extend support for control operations (e.g., if statements and\/or while  loops), they depend on integer linear programming models that do not  scale well, thus showing exponential growth in compilation time as the  application size increases.<\/p>    <p>In contrast, graph-based methods [3] have shown promise for  significantly faster mapping. This project aims to adopt this paradigm  for implementing a graph-minor-based mapping algorithm that supports a  wide range of applications, including those with complex control  structures. 
The algorithm will automatically generate assembly code from  diverse workloads, which will be evaluated to assess its performance in  terms of latency and energy efficiency.<\/p>    <p><strong>Tasks description<\/strong><\/p>    <ol class='wp-block-list'><li>Understand the architecture of CGRAs, including their components, interconnects, and execution model.<\/li><li>Familiarize yourself with the existing compilation flow developed in ESL-EPFL (Compigra).<\/li><li>Understand graph minor problems in graph theory.<\/li><li>Implement an operation mapping algorithm based on a graph minor approach.<\/li><li>Evaluate the application performance on CGRA with existing benchmarks [4].<\/li><\/ol>    <p><strong>Project objectives<\/strong><\/p>    <ul class='wp-block-list'><li>[Mandatory] Implement a dataflow graph mapping algorithm based on graph minors, targeting the OpenEdge CGRA platform [1].<\/li><li>[Optional] Extend the mapping capability to support control-flow graphs by incorporating graph minor detection prerequisites.<\/li><li>[Optional] Develop graph transformation techniques on subgraph manipulation when direct graph minor mapping fails.<\/li><li>Successful completion of the optional objectives may lead to a scientific publication based on the project&rsquo;s outcomes.<\/li><\/ul>    <p><strong>Required knowledge and skills<\/strong><\/p>    <ul class='wp-block-list'><li>Strong programming skills in C\/C++.<\/li><li>Strong foundation in mathematics, particularly graph theory.<\/li><li>Basic understanding of computer architecture.<\/li><li>Familiarity with LLVM and\/or MLIR frameworks is a plus.<\/li><li>Scientific curiosity.<\/li><\/ul>    <p><strong>References<\/strong><\/p>    <p>[1] &Aacute;lvarez, R. R., Denkinger, B., Sapriza, J., Calero, J. M.,  Ansaloni, G., &amp; Alonso, D. A. (2023, May). An open-hardware  coarse-grained reconfigurable array for edge computing. In <em>Proceedings of the 20th ACM International Conference on Computing Frontiers<\/em> (pp. 
391-392).<\/p>    <p>[2] Wang, Y., Tirelli, C., Orlandic, L., Sapriza, J., &Aacute;lvarez, R. R.,  Ansaloni, G., &amp; Atienza, D. (2025, April). An MLIR-based  compilation framework for CGRA application deployment. In&nbsp;<em>International Symposium on Applied Reconfigurable Computing<\/em>&nbsp;(pp. 33-50). Cham: Springer Nature Switzerland.<\/p>    <p>[3] Zhou, G., Stojilovi\u0107, M. and Anderson, J. H., &ldquo;GRAMM: Fast CGRA  Application Mapping Based on A Heuristic for Finding Graph Minors,&rdquo; In <em>33rd International Conference on Field-Programmable Logic and Applications (FPL)<\/em>, Gothenburg, Sweden, 2023.<br \/><br \/>[4] M. Guthaus, J. Ringenberg, D. Ernst, T. Austin, T. Mudge, and R. Brown, &quot;MiBench: A free, commercially representative embedded benchmark suite,&quot; in Proceedings of the Fourth Annual IEEE International Workshop on Workload Characterization, 2001.<\/p>    <p><strong>Type of work<\/strong><\/p>    <ul class='wp-block-list'><li>80% <strong>SW design<\/strong>: development of the mapping algorithm that automates the generation of the assembly code.<\/li><li>20% <strong>benchmarking<\/strong> of the performance of generated assembly code.<\/li><\/ul><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Ms. Yuxuan Wang, Dr. Giovanni Ansaloni, Prof. David Atienza<br> Contact email: <a href='mailto:yuxuan.wang@epfl.ch; giovanni.ansaloni@epfl.ch; david.atienza@epfl.ch?subject=Fast compilation for Coarse-Grained Reconfigurable Arrays (CGRAs) via Graph Minors'>yuxuan.wang@epfl.ch; giovanni.ansaloni@epfl.ch; david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project691minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Ms. Yuxuan Wang, Dr. Giovanni Ansaloni, Prof. David Atienza<br>\";<\/script>\n<span id=project691><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Ms. Yuxuan Wang, Dr. Giovanni Ansaloni, Prof. 
David Atienza<br> <a href=#_ onclick=opendesc('project691',project691); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor690><\/a><b><span style='font-size: 20px;'>Design of a full-custom, analog filter for feature enhancement of neural signals<\/b><br><script>var project690=\"<h3><strong>Project Description<\/strong><\/h3> <p><strong>The student will design and implement a full-custom analog filter that emphasizes periods of high and fast amplitude variation, aiding in the detection of neural activity. <\/strong>The design needs to reach a careful balance between power efficiency and performance, making this project an exciting challenge. 
The output of the designed filter will be sampled using a Level-Crossing (LC) ADC, which encodes the data into spikes that can be used for low-power transmission or as the input stream for a Spiking Neural Network.<\/p> <p>This project aims to tackle a major bottleneck of wearable and implantable neural interfaces: a large number of channels transmitting high-resolution data require extensive processing and congest transmission channels. While several attempts have been made in the state of the art to reduce the amount of data transmitted by applying digital filters and feature extraction, in this project, the student will bring the feature extraction before the digitization.<\/p> <p>The design produced will be part of a System-on-Chip (SoC) aimed at acquiring and compressing neural signals for research and clinical purposes.<\/p> <p>This <strong>Master Thesis project<\/strong> will be carried out at the Embedded Systems Laboratory (ESL) of EPFL, which has been at the forefront of ultra-low power processing, focusing on embedded algorithms for healthcare wearables. ESL is an active group (22 Ph.D. students among 45 members) involved in various levels of the healthcare electronics stack, from full-custom microelectronic design to device manufacturing and edge-AI. The student will be under the supervision of Mr. Juan Sapriza, Dr. Nicol&aacute;s Calarco, and Prof. David Atienza.<\/p> <h3><strong>Project objectives:<\/strong><\/h3> <p>This project consists of several intermediate objectives. Each objective requires a small deliverable that helps students and supervisors stay on track, quantify progress, and build the final report from the start.<\/p> <p>Objectives 1-5 are required to get a passing grade.<\/p> <p>Objective 6 is required to get a maximum grade.<\/p> <p>Objectives 7-8 are optional and can help reach the maximum grade if other objectives are partially or weakly fulfilled.     
<br \/><span style='color: #999999'><em>Note: Optional objectives do not replace required ones, and will only be considered upon a reasonable attempt of the compulsory objectives.   <\/em><\/span> <\/p> <ol>     <li>         <p>Understand the properties of the neural signals to be enhanced and digitized. Recognize the relevant features that need to be enhanced. <strong>Deliverable: 1-page introduction to the problem that needs to be solved. <\/strong>         <\/p>     <\/li>     <li>         <p>Understand the effect the filter should have on the signal to enhance the detection of neuronal spikes (and not noise). Understand how the subsequent digitization stage (LC ADC) will leverage this enhancement.<strong> Deliverable: 1-page explanation.<\/strong>          <\/p>     <\/li>     <li>         <p>Decide on the operations to be performed over the input signal to enhance its relevant features and perform behavioral simulations to quantify the potential improvements. <strong>Deliverable: 1-page report with circuit requirements.<\/strong>          <\/p>     <\/li>     <li>         <p>Perform the design of the circuit. Characterize the performance of the circuit as a spike-enhancer. Design for testing will be appreciated.<strong> Deliverable: Schematic and spice simulations showing the requirements are reasonably met. <\/strong>         <\/p>     <\/li>     <li>         <p>Perform the layout of the circuit. <strong>Deliverable: GDS with clean DRC and LVS.<\/strong>         <\/p>     <\/li>     <li>         <p>Thorough characterization of the circuit (post-layout simulations). e.g. Dynamic range, input range, power, BW, IRN, offset, etc. <strong>Deliverable: Final report including the characterization. <\/strong>         <\/p>     <\/li>     <li>         <p>Prepare a testing set-up guide. 
<strong>Deliverable: An annex explaining how the circuit shall be connected at PCB level to perform testing.<\/strong>          <\/p>     <\/li>     <li>         <p>Make a behavioral model of the circuit. <strong>Deliverable: A Python and\/or RTL behavioral model of the circuit.<\/strong>          <\/p>     <\/li> <\/ol> <h3><strong>Required knowledge and skills:<\/strong><\/h3> <ul>     <li>         <p>Full-custom analog design (analog filters and\/or operational amplifiers)<\/p>     <\/li>     <li>         <p>Use of Cadence tools<\/p>     <\/li>     <li>         <p>Creativity, autonomy, and scientific rigor<\/p>     <\/li> <\/ul> <h3><strong>Type of work:<\/strong><\/h3> <p>10% Research, 40% Design, 30% Implementation, 20% Testing.<\/p> <p>&nbsp;<\/p> <p><strong>Relevant reading:<\/strong> <\/p> <ol>     <li>         <p><a rel='noopener noreferrer nofollow' href='https:\/\/www.mdpi.com\/1424-8220\/18\/8\/2460' target='_blank'>Kim, Jong Pal, Hankyu Lee, and Hyoungho Ko. &quot;0.6 V, 116 nW neural spike acquisition IC with self-biased instrumentation amplifier and analog spike extraction.&quot; Sensors 18.8 (2018): 2460.<\/a>         <\/p>     <\/li>     <li>         <p><a rel='noopener noreferrer nofollow' href='https:\/\/ieeexplore.ieee.org\/document\/6165392' target='_blank'>Rodriguez-Perez, Alberto, et al. &quot;A low-power programmable neural spike detection channel with embedded calibration and data compression.&quot; IEEE transactions on biomedical circuits and systems 6.2 (2012): 87-100.<\/a>         <\/p>     <\/li> <\/ol><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Juan Sapriza, Dr. Nicolas Calarco, Prof. David Atienza<br> \";<\/script>\n<script>var project690minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Juan Sapriza, Dr. Nicolas Calarco, Prof. David Atienza<br>\";<\/script>\n<span id=project690><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Juan Sapriza, Dr. Nicolas Calarco, Prof. 
David Atienza<br> <a href=#_ onclick=opendesc('project690',project690); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td>\n    <div style='position:relative;'>\n     <div style='position: absolute;top:-80px;left:-300px;'>\n       <img border=0 src=https:\/\/eslweb.epfl.ch\/projects\/images\/notavailable.gif alt='project no longer available'>\n     <\/div>\n    <\/div><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor661><\/a><b><span style='font-size: 20px;'>Development of a Wearable Textile Capacitive Sensing System for Human Body Motion Tracking<\/b><br><script>var project661=\"<p>Textile capacitive sensing is a cutting-edge technology that enables  non-intrusive, comfortable, and effective motion tracking through  everyday garments. This project will build on the innovative approach  described in recent work, where conductive textile patches are embedded  into clothing to sense human body movements without direct contact or  strain. 
The system can detect motion with high comfort and flexibility  by leveraging the deformation of these textile patches within the  garment.<\/p>    <p>The aim of this project is to develop and enhance a wearable system  that uses textile-based capacitive sensing for tracking human body  motion, ranging from single-joint angle measurements to multi-joint body  part tracking. In particular, the student will design and implement a  wearable system incorporating textile capacitive sensors, develop  algorithms for interpreting sensor data, and validate the system through  real-world applications. The system will be designed with an emphasis  on both technical performance (motion tracking accuracy, real-time  processing) and user experience (comfort, ease of use, and integration  into everyday clothing).<\/p>    <p>The student will work on:<\/p>    <ol class='wp-block-list'><li><strong>Integrating existing conductive textiles<\/strong> into a garment that can detect human motion across various body parts.<\/li><li><strong>Developing the analog front end<\/strong> to acquire these signals and integrating it with ESL&rsquo;s VersaSens platform.<\/li><li><strong>Developing signal processing algorithms<\/strong> to interpret the deformation of  textile capacitive sensors and convert it into meaningful motion data  (e.g., joint angles, body posture, etc.).<\/li><li><strong>Creating a user interface<\/strong> (mobile app or PC application) for real-time motion tracking visualization and data logging.<\/li><\/ol>    <h3 class='wp-block-heading'><strong>Tasks:<\/strong><\/h3>    <ul class='wp-block-list'><li><strong>Design and fabricate<\/strong> clothing prototypes  such as shirts, sleeves, or pants with embedded textile capacitive  sensors in strategic areas (e.g., elbows, knees, wrists) to capture  multi-joint body movements.<\/li><li><strong>Sensor signal processing<\/strong>: Design algorithms to process the sensor  signals in real-time, including filtering noise, detecting 
deformation  patterns, and calculating joint angles or other movement parameters.<\/li><li><strong>Develop machine learning or regression models<\/strong>:  Train a machine learning model to interpret complex body motions (e.g.,  multi-joint movements) from the sensor data. Optionally, implement a  gesture recognition system based on the motion data.<\/li><li><strong>Real-time data visualization<\/strong>: Develop a simple mobile app or PC  interface to visualize motion data in real-time. Display joint angles or  body posture metrics for end-users.<\/li><\/ul>    <h3 class='wp-block-heading'><strong>Requirements:<\/strong><\/h3>    <ul class='wp-block-list'><li>Experience with capacitive sensing techniques and sensor signal processing.<\/li><li>Signal processing skills: Experience with noise filtering, sensor calibration<\/li><li>Software development skills: Proficiency in programming (e.g.,  Python for data analysis, C for embedded software) to process and  interpret sensor data in real-time.<\/li><li>Experience with machine learning: Ability to implement simple  machine learning models for gesture recognition or movement analysis  (optional but beneficial).<\/li><li>Prototyping skills: Hands-on ability to work with textiles, sensors, and wearable electronics for prototyping purposes.<\/li><\/ul><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Jonathan Dan, Stefano Albini, Prof. David Atienza<br> Contact email: <a href='mailto:jonathan.dan@epfl.ch; david.atienza@epfl.ch?subject=Development of a Wearable Textile Capacitive Sensing System for Human Body Motion Tracking'>jonathan.dan@epfl.ch; david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project661minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Jonathan Dan, Stefano Albini, Prof. David Atienza<br>\";<\/script>\n<span id=project661><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Jonathan Dan, Stefano Albini, Prof. 
David Atienza<br> <a href=#_ onclick=opendesc('project661',project661); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><img width=27 src=https:\/\/eslweb.epfl.ch\/img\/1pixel.gif><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor655><\/a><b><span style='font-size: 20px;'>Microarchitectural explorations for biomedical applications<\/b><br><script>var project655=\"<p>Microcontrollers (MCUs) are used in a wide range of applications, from wearable devices and sensor monitoring, to robotics and automotive. In particular, the design of low-power microcontrollers for wearables in the biomedical domain has received a lot of attention in recent decades. 
Recent proposals such as BiomedBench [1] have created a collection of biomedical applications and kernels with the aim of informing the design of new processing architectures for wearable devices.<\/p> <p>In particular, the Embedded Systems Laboratory is developing <a href='https:\/\/x-heep.epfl.ch'>X-HEEP<\/a> (eXtendable Heterogeneous Energy-Efficient Platform), an open-source, configurable, and extensible single-core RISC-V 32b MCU, sponsored by the <a href='https:\/\/ecocloud.epfl.ch'>EcoCloud<\/a> sustainable computing center of EPFL. X-HEEP is based on third-party open-source IPs and in-house IPs developed at ESL jointly with other EPFL laboratories. X-HEEP provides a framework to run applications compiled for RISC-V on a simulator (Verilator, Questasim, or VCS), on a Xilinx FPGA, and can be implemented in silicon as well. The first ASIC based on X-HEEP is called HEEPocrates.<\/p> <p><a href='https:\/\/biomedbench.epfl.ch'>BiomedBench <\/a>has recently been ported to X-HEEP. The open-source nature of the platform, and the fact that it is being developed at ESL, creates an excellent chance to investigate which architectural features of low-power microcontrollers can increase the energy efficiency of wearables in the biomedical domain.<\/p> <p>In this project we want to explore whether the use of an in-order superscalar core, in place of the in-order single-scalar core currently used in X-HEEP, the RISC-V OpenHW Group CV32E20 [2], can improve energy efficiency during the processing phase of the applications in BiomedBench. The working hypothesis will be that, whereas out-of-order execution introduces too much power overhead in comparison with the improvements obtained in execution time, in-order superscalar execution, in particular with the RISC-V ISA, produces larger time improvements than power increases. Therefore, in-order superscalar execution is suitable for reducing energy consumption in RISC-V microprocessors for the biomedical wearable domain. 
As a second step, we will evaluate whether other architectural extensions, e.g., specifically for fixed-point arithmetic, can improve the energy efficiency of the microcontroller.<\/p> <p>During this project, the student will first develop a simulator for the RISC-V architecture that supports execution of the applications ported to X-HEEP (binary compatibility). Instead of developing a simulator from scratch, it will also be possible to use other existing open-source solutions, as long as they can be used for the second phase. The simulator will be used to generate dynamic execution traces from the applications in BiomedBench.<\/p> <p>In a second phase, the student will modify the simulator to evaluate the impact on performance of in-order superscalar execution. The simulator will be easily modifiable, so that it will be possible to test which combinations of additional functional units will produce the best performance impact with the minimal cost in HW.<\/p> <p>In a third phase, the student will evaluate, with help from the people participating in the X-HEEP project, the impact on power of the best candidate architectures found based on performance improvement. In this way, at the end of the project it will be possible to determine which architectural optimizations produce the best improvements in terms of total energy consumption, based on the maximum performance improvement with the minimum additional power.<\/p> <p>The previous explorations do not require modifying the traces obtained from the execution in the simulator. Therefore, the compiled binary used for X-HEEP will be valid during these phases. However, other explorations, such as the introduction of specific instructions for fixed-point execution, may require modifying the execution trace. 
This will be done either using the original assembly code produced by the compiler, or dynamically modifying the trace during simulation.<\/p> <p>The expected outcomes of this project are:<\/p> <ul id='l1'> <li> <p>Development of a lightweight RISC-V simulator that can execute the binary (compiled) applications of BiomedBench for X-HEEP. No interrupts will be included in the simulations. The simulator will produce as output a complete memory dump and a summary of processor cycles required for execution of the benchmark.<\/p> <ul id='l2'> <li data-list-text='\u25cb'> <p>The correctness of the simulation will be verified at all times by comparing the output of the application with the expected outputs from BiomedBench.<\/p> <\/li> <\/ul> <\/li> <li> <p>Generation of dynamic execution traces from the BiomedBench applications using the simulator. The student will be allowed to propose a different mechanism to obtain the traces, as long as it allows them to conduct the explorations in the following phases.<\/p> <\/li> <li> <p>Modification of the simulator to account for superscalar execution of the traces. At this point, no extra functional units will be introduced; the simulation will only account for data dependencies between the instructions, the nature of the instructions (e.g., whether the first instruction is a branch or not), and the availability of resource classes. 
For example, at this stage: loads can proceed in parallel with additions; one addition and one multiplication can be executed simultaneously; two additions\/subtractions cannot be executed in parallel.<\/p> <p style='padding-left: 5pt;text-indent: 0pt;text-align: left'>Optional\/additional outcomes:<\/p> <\/li> <li> <p>Exploration of the performance benefits of introducing different types of arithmetic operators, e.g., dual adders.<\/p> <\/li> <li> <p>Exploration of the overhead in area and power of the proposed modifications of the control stage and additional functional units.<\/p> <\/li> <li> <p>Exploration of other optimizations specific to the biomedical domain (e.g., for fixed-point arithmetic), as driven by the BiomedBench applications.<\/p> <p>Throughout the project, the student will learn:<\/p> <\/li> <li> <p>Basic processor architecture concepts and the RISC-V ISA.<\/p> <\/li> <li> <p>The main features of applications in the biomedical wearable domain.<\/p> <\/li> <li> <p>Advanced processor architecture concepts such as superscalar, in-order and out-of-order execution.<\/p> <\/li> <li> <p>How to work with git repositories in a team of contributors to the same project.<\/p> <\/li> <\/ul> <p>The project will be carried out at the ESL at EPFL, one of the world\u2019s top-class universities, with technical support from EcoCloud. ESL is an active group <span style='color: #000'>(24 Ph.D. students among 45 members) <\/span>involved in many research lines. The student will be under the supervision of Prof. David Atienza (ESL) and Dr. 
Miguel Pe\u00f3n-Quir\u00f3s (EcoCloud), with technical support from Stefano Albini (ESL).<\/p> <h4>Project objectives:<\/h4> <ol id='l3'> <li data-list-text='1.'> <p>Understanding the RISC-V architecture and development of a lightweight simulator that can execute the applications of BiomedBench compiled for X-HEEP and produce execution statistics.<\/p> <\/li> <li data-list-text='2.'> <p>Modification of the simulator to evaluate in-order superscalar (parallel) execution of the applications in BiomedBench, using the original or additional numbers of functional units. The output of this evaluation will support or refute our initial hypothesis.<\/p> <\/li> <li data-list-text='3.'> <p>(Optional) Evaluation of the overheads in terms of area\/power of the proposed modifications.<\/p> <\/li> <li>(Optional) Proposal of additional architectural improvements specific to the domain of biomedical wearables.<\/li><p><\/p> <\/ul><h4>Required knowledge and skills:<\/h4> <ul id='l4'> <li> <p>C++ and Python. General Linux use and scripting.<\/p> <\/li> <li> <p>Good background in computer architecture and algorithms.<\/p> <\/li> <li> <p>Some familiarity with any assembly language (RISC-V ISA will be used throughout the project).<\/p> <\/li> <li> <p>Good analytical skills.<\/p> <\/li> <li> <p>Teamwork and git.<\/p><\/li><\/ul> <h4>Appreciated skills:<\/h4> <ul><li> <p>Scientific curiosity.<\/p> <\/li> <li> <p>Good communication skills.<\/p> <\/li> <li> <p>Advanced English (interaction during the project will be in English).<\/p> <\/li> <\/ul>  <\/ol> <h4>Type of work: <span class='p'>40% theory analysis, 60% design and simulation.<\/span><\/h4> <p class='s7' style='padding-top: 14pt;padding-left: 5pt;text-indent: 0pt'>[1]. Dimitrios Samakovlis et al. \u201cBiomedBench: A benchmark suite of TinyML biomedical applications for low-power wearables.\u201d IEEE Design &amp; Test, 2024. 
<a href='https:\/\/infoscience.epfl.ch\/handle\/20.500.14299\/208450.7'>https:\/\/infoscience.epfl.ch\/handle\/20.500.14299\/208450.7<\/a><\/p> <p class='s8'>[2]. Pasquale Davide Schiavone et al. \u201cSlow and steady wins the race? A comparison of ultra-low-power RISC-V cores for Internet-of-Things applications\u201d. In: Int. Symp. on Power and Timing Modeling, Optimization and Simulation (PATMOS). <a href='https:\/\/ieeexplore.ieee.org\/document\/8106976'>IEEE. 2017, pp. 1\u20138.<\/a><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Miguel Pe\u00f3n-Quir\u00f3s, EcoCloud; Prof. David Atienza, ESL; Stefano Albini, ESL <br> Contact email: <a href='mailto:miguel.peon@epfl.ch; david.atienza@epfl.ch; stefano.albini@epfl.ch?subject=Microarchitectural explorations for biomedical applications'>miguel.peon@epfl.ch; david.atienza@epfl.ch; stefano.albini@epfl.ch<\/a><br>\";<\/script>\n<script>var project655minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Miguel Pe\u00f3n-Quir\u00f3s, EcoCloud; Prof. David Atienza, ESL; Stefano Albini, ESL <br>\";<\/script>\n<span id=project655><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Miguel Pe\u00f3n-Quir\u00f3s, EcoCloud; Prof. 
David Atienza, ESL; Stefano Albini, ESL <br> <a href=#_ onclick=opendesc('project655',project655); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><a href=https:\/\/www.epfl.ch\/labs\/esl\/wp-content\/uploads\/2024\/11\/Microarchitectural-explorations-for-biomedical-applications.pdf title='document link'><img width=30 border=0 src=https:\/\/eslweb.epfl.ch\/projects\/images\/doclink.gif hspace=2 alt='document link'><\/a><\/td><td><a href='https:\/\/biomedbench.epfl.ch\/' title='weblink'><img hspace=2 width=30 border=0 src=https:\/\/eslweb.epfl.ch\/projects\/images\/weblink.gif alt='web link'><\/a><\/td><\/table><\/td><\/tr><tr><td><a href=https:\/\/www.epfl.ch\/labs\/esl\/research\/systems-on-chip\/x-heep\/ target=_blank title='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/201.png width=70 alt='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><\/a><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor637><\/a><b><span style='font-size: 20px;'>Enabling Local Tightly, Global Loosely coupled Programmable Accelerators  Heterogeneous Systems by integrating X-HEEP into ESP SoCs<\/b><br><script>var project637=\"<p><i>Artificial 
Intelligence (AI) <\/i>has been one of the most dominant factors driving technology innovation over the last decade and has been exploited in a huge variety of fields ranging from image recognition and natural language processing to autonomous driving and modeling of complex physical systems. For low-power applications, software-programmable microcontrollers (MCUs) are preferred thanks to their versatility and short time-to-market. However, MCUs are often extended with domain-specific accelerators to meet timing and energy requirements. Examples of applications that benefit from on-device AI include small cameras that recognize faces, microphones that filter background noise and recognize voice commands, and wearable Galvanic Skin Response sensors that detect emotions. Most such applications are implemented using neural networks, deployed partially or completely at the edge, on tiny and ultra-low-power SoCs. Many domain-specific accelerators have been proposed to speed up the computations of such kernels, such as feature-extraction engines, matrix-multiplication, or convolutional accelerators.&nbsp; SoCs usually employ one or more accelerators, giving birth to heterogeneous systems.&nbsp;<\/p> <p>Such accelerators are called \u201ctightly coupled\u201d when they are integrated close to the CPU, so that synchronization and data exchange between them are lightweight and incur low latency. In such systems, the accelerators and CPUs are usually connected on the same bus.<\/p> <p>Otherwise, they are called \u201cloosely coupled.\u201d In these systems, the accelerators and CPUs are not connected on the same bus but rather through a network-on-chip (NoC). 
This allows for more scalable SoCs but at the price of slower and more complex communications.<\/p> <p>&nbsp;<\/p> <p>X-HEEP (eXtendable Heterogeneous Energy-Efficient Platform) is an open-source, configurable, and extensible single-core RISC-V 32-bit MCU developed at the Embedded Systems Laboratory (ESL), sponsored by the EcoCloud Sustainable Computing Center of the Swiss Federal Institute of Technology Lausanne (EPFL).<\/p> <p>It has been designed based on existing open-source IPs from the PULP, OpenHW Group, and OpenTitan projects. Its main advantage is that it eases the integration of tightly coupled accelerators into SoCs through the so-called eXtension-Accelerator interface (XAIF). So far, it has been implemented with accelerators such as CGRAs, Near-Memory Computing IPs, Systolic Arrays, POSIT datapaths, and GPUs.<\/p> <p>ESP (Embedded Scalable Platform) is an open-source platform for heterogeneous SoC design that provides a flexible tile-based architecture built on a multi-plane NoC. It was developed at Columbia University by the System-Level Design (SLD) group and is also compatible with RISC-V IPs. ESP also provides a flow to develop and integrate accelerators described in High-Level Synthesis (SystemC, C++) or RTL (Verilog\/SystemVerilog\/VHDL) loosely coupled via the NoC with a specified tile-based interface.&nbsp;<\/p> <p>In this project, we propose to build a Local-Tightly, Global-Loosely coupled Accelerator Heterogeneous System by integrating X-HEEP into an ESP SoC.&nbsp;<\/p> <p>We want to use ESP to build the main tile-based SoC and X-HEEP to build the internals of individual accelerator tiles.&nbsp;<\/p> <p>This highly programmable and heterogeneous SoC will have several levels of accelerator integration. 
In fact, inside a tile, the accelerator will be seen as tightly coupled by the local CPU within the tile (X-HEEP), while it will be seen as loosely coupled by the CPU(s) integrated into another tile.<\/p> <p>The final system can leverage programmable tiles based on flexible CPU+Accelerator architectures, where each tile is specialized for a given function. For example, an ESP system could be composed of one tile specialized in running Linux with a RISC-V CPU; one tile could integrate X-HEEP with a GPU for highly parallel functions; one tile could contain X-HEEP with a CGRA for highly spatial reconfigurable functions; and one final tile could combine X-HEEP with near-memory computing IPs for efficient local processing and last-level cache functions. Extra tiles would provide memories and I\/O.<\/p> <p><b>Project objectives:<\/b><\/p> <ol> <li>Understanding the ESP SoC, with a particular focus on the tile interface and NoC protocol. A provided example should be run and emulated on the FPGA or simulated with Questasim.<\/li> <li>Understanding the X-HEEP SoC, with a particular focus on the XAIF interface and bus protocol. A provided example should be run and emulated on the FPGA or simulated with Questasim.<\/li> <li>Build an interface and the corresponding bridge to connect X-HEEP and ESP and integrate X-HEEP into ESP in a minimalistic configuration to verify and test an application on Questasim or FPGA. To improve compatibility test coverage, one tile must be X-HEEP, while the other must be a tile coming from the ESP IPs. Such a test should include a C program that enables the bidirectional communication of the two tiles.<\/li> <li>Build an ESP SoC based on an ESP example capable of running Linux with the CVA6, including an X-HEEP tile enhanced with a tightly coupled accelerator. 
The final system should leverage X-HEEP to accelerate one AI function.<\/li> <\/ol> <p>Throughout the project, the student will learn:<\/p> <ul> <li>how to design, use, and leverage complex SoCs such as ESP and X-HEEP.<\/li> <li>how to integrate and test different systems, simulating and emulating them on Questasim and\/or FPGA.<\/li> <li>how to design interfaces and specifications.<\/li> <li>how to deploy an application on a complex heterogeneous RISC-V SoC microcontroller.<\/li> <li>how to work with version control (Git) and third-party, open-source repositories.<\/li> <li>how to work in a collaborative team of people from different universities, all contributing to the same and different projects.<\/li> <\/ul> <p>The project will be carried out at the ESL at EPFL, one of the world\u2019s top-class universities. ESL is an active group (24 Ph.D. students among 45 members) involved in many research aspects, therefore providing a stimulating research environment.&nbsp;<\/p> <p>The student will be supervised by Prof. David Atienza, Dr. Davide Schiavone, Prof. Daniele Jahier Pagliari, and Prof. Alessio Burrello from the Polytechnic of Turin.<\/p> <p>In addition, given an ongoing collaboration with the SLD Group at Columbia University, which developed ESP, Prof. Luca Carloni from Columbia University might also become a co-supervisor.<\/p> <p><b>Required knowledge and skills:<\/b><\/p> <ul> <li>RTL design and FPGA implementation in SystemVerilog<\/li> <li>Good understanding of memory architectures and microcontrollers<\/li> <li>Good analytical skills<\/li> <li>Good background in computer architecture<\/li> <li>Teamwork and git<\/li> <\/ul> <p><b>Appreciated skills:<\/b><\/p> <ul> <li>Scientific curiosity<\/li> <li>Good communication skills<\/li> <li>Advanced English <\/li> <\/ul> <p><b>Type of work:<\/b> 40% theory analysis, 60% SW\/HW co-design and simulation<\/p><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Davide Schiavone, Prof. 
Daniele Jahier Pagliari, Prof. Alessio Burrello, Prof. David Atienza<br> Contact email: <a href='mailto:davide.schiavone@epfl.ch;daniele.jahier@polito.it;alessio.burrello@polito.it;david.atienza@epfl.ch?subject=Enabling Local Tightly, Global Loosely coupled Programmable Accelerators  Heterogeneous Systems by integrating X-HEEP into ESP SoCs'>davide.schiavone@epfl.ch;daniele.jahier@polito.it;alessio.burrello@polito.it;david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project637minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Davide Schiavone, Prof. Daniele Jahier Pagliari, Prof. Alessio Burrello, Prof. David Atienza<br>\";<\/script>\n<span id=project637><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Davide Schiavone, Prof. Daniele Jahier Pagliari, Prof. Alessio Burrello, Prof. David Atienza<br> <a href=#_ onclick=opendesc('project637',project637); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><a href=https:\/\/www.epfl.ch\/labs\/esl\/research\/systems-on-chip\/x-heep\/ target=_blank title='eXtendable Heterogeneous 
Energy-Efficient Platform - EPFL'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/201.png width=70 alt='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><\/a><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor565><\/a><b><span style='font-size: 20px;'>Porting and Optimization of biomedical benchmark applications to the X-HEEP Open-Source RISC-V-based microcontroller platform<\/b><br><script>var project565=\"<p>Wearable devices promise to improve  preventive medicine through continuous health monitoring of chronic  diseases. The design of low-power wearables for the biomedical domain  has received much attention in recent decades, as technological  advancements in chip manufacturing have allowed real-time monitoring of  patients within the &micro;W range. To ensure continued progression in this  domain, a co-design view that optimizes both hardware and software  simultaneously, and standardized tools are necessary.<\/p>    <p>In response to the aforementioned needs, the Embedded Systems  Laboratory(ESL) has developed BiomedBench. BiomedBench is a new  benchmark suite composed of state-of-the-art (SoA) biomedical  applications for real-time monitoring of patients using wearable  devices. Each application presents different requirements during the  typical signal acquisition and processing phases, including varying  computational workloads and relations between active and idle times.  Therefore, BiomedBench provides hardware developers with a tool to  assess the efficiency of their ultra-low power (ULP) platform designs  under varying requirements. Moreover, the open-sourcing nature of  BiomedBench will serve as a baseline for future application developers  aspiring to develop and deploy their biomedical applications in ULP  devices.<\/p>    <p>Typically, biomedical applications for patient monitoring tasks  include the modules depicted in Figure 1 below. 
The processing step usually consists of signal preprocessing, feature extraction, and inference based on these features. However, applications can exhibit a wide range of workloads and computational requirements. For example, feature extraction can be implemented explicitly (that is, with manually engineered features) or implicitly (e.g., with a convolutional neural network (CNN)). Similarly, the inference step can use a lightweight machine learning method, such as a random forest, or a complex deep neural network (DNN).<\/p>    <p><img src='https:\/\/ecocloud.ch\/extranet\/wp-content\/uploads\/2023\/12\/figure1.png' alt='' \/><br \/><strong>Figure 1: Typical computational pipeline of biomedical applications<\/strong><\/p>    <p>From an implementation point of view, the system undergoes an always-on acquisition phase and an intermittent processing phase, as presented in Figure 2. A complete processing period consists of an idle period, during which the processing unit is in low-power mode, and a computation upon acquisition of the full input signal. The duration of the idle period varies significantly between applications and can dominate the system&rsquo;s energy consumption.<\/p>    <p><img src='https:\/\/ecocloud.ch\/extranet\/wp-content\/uploads\/2023\/12\/figure2.png' alt='' width='496' height='137' \/><br \/><strong>Figure 2: System operating modes during signal acquisition<\/strong><\/p>    <p>BiomedBench includes eight applications representative of the biomedical domain that offer a variety of workloads and profiles for the processing, idle, and acquisition phases. All applications are coded in C or C++. Four applications are implemented in fixed-point arithmetic, targeting low-end MCUs. The rest are implemented in 32-bit floating-point arithmetic. 
Four applications also include a multi-core implementation that enables significant acceleration in the presence of multiple cores.<\/p>    <h2 class='wp-block-heading'>Considered ULP hardware platform &ndash; X-HEEP \/ HEEPocrates<\/h2>    <p>On the hardware side, ESL has devoted considerable research effort to developing a new open-source hardware platform, called X-HEEP (eXtendable Heterogeneous Energy-Efficient Platform), to support the monitoring of participants in clinical studies with a low energy footprint. X-HEEP is an open-source, configurable, and extensible single-core RISC-V 32b MCU, sponsored by the EcoCloud Sustainable Computing Center of EPFL. It is based on many third-party open-source IPs and in-house IPs developed at the Embedded Systems Laboratory (ESL) jointly with other EPFL laboratories. X-HEEP provides a framework to run applications compiled for RISC-V on a simulator (Verilator, Questasim, or VCS) or on a Xilinx FPGA, and it can be implemented in silicon as well.<\/p>    <p>In 2023, ESL fabricated HEEPocrates, the first ASIC implementation (in TSMC 65nm) deploying X-HEEP configured with the cv32e2 core and with 256kB of memory. HEEPocrates instantiates X-HEEP as the main microcontroller driving a CGRA and an In-Memory Computing macro. HEEPocrates belongs to the category of ULP platforms, featuring a 6mm2 X-HEEP chip and a maximum frequency of 470 MHz while consuming up to 48mW. Hence, HEEPocrates is a suitable platform on which to deploy the BiomedBench applications and conduct a performance and energy analysis.<\/p>    <h2 class='wp-block-heading'>Thesis summary<\/h2>    <p>The goal of this thesis is to <strong>utilize BiomedBench to evaluate the X-HEEP and HEEPocrates platforms<\/strong>. 
To achieve this, the student will have to:<\/p>    <ol><li><strong>Learn to deploy the BiomedBench applications on X-HEEP \/ HEEPocrates <\/strong>by efficiently utilizing the capabilities of the platform for the sleep and acquisition phases.<\/li><li><strong>Perform timing and energy measurements <\/strong>of each application running on X-HEEP \/ HEEPocrates, then analyze them and compare them with SoA results.<\/li><li><strong>Deploy the BiomedBench applications on X-HEEP FPGA <\/strong>changing the memory size and CPU to find the optimal X-HEEP configuration for BiomedBench, including data transfers from the FLASH to the on-chip SRAM when the data exceed the internal capacity.<\/li><li>(OPTIONAL) <strong>Apply algorithmic and software optimizations <\/strong>on each application to speed up computations and\/or reduce memory footprint in HEEPocrates without degradation of the final application-level accuracy result.<\/li><\/ol>    <h2 class='wp-block-heading'>Thesis outcome<\/h2>    <p>The outcome of the M.Sc. thesis will be published open-source in the BiomedBench and X-HEEP (<a href='https:\/\/github.com\/esl-epfl\/x-heep'>link<\/a>) repositories. The expected outcomes of this thesis are:<\/p>    <ul><li>[Deployment of complete applications to X-HEEP and\/or HEEPocrates] Development of software to run all the applications on X-HEEP \/ HEEPocrates. The basic C\/C++ implementation of each application is given, but the porting to the platforms requires some extra code and smart deployment decisions to respect the memory constraints of the platform. 
Complementary to the processing part of each application, the acquisition and sleep modes should be programmed efficiently.<\/li><li>[Performance and energy results] Measuring performance and energy consumption running each complete application on X-HEEP \/ HEEPocrates and comparing with other state-of-the-art platforms.<\/li><li>[Optimal configuration] Find the optimal configuration of the X-HEEP FPGA implementation for the benchmark by varying the CPU and memory capacity.<\/li><li>[Application optimization] Optimize the C\/C++ implementation and\/or the algorithms involved in each application. The target of the optimizations is to improve the energy efficiency of the complete application, which can be achieved by decreasing the execution time or the memory footprint, provided there is no accuracy degradation in the final result.<\/li><\/ul>    <h2 class='wp-block-heading'>Learning outcome<\/h2>    <p>Throughout the thesis, the student will learn:<\/p>    <ul><li>How different real-time patient monitoring applications are structured and what the state of the art in the domain is<\/li><li>How to deploy such applications in resource-constrained devices<\/li><\/ul><ul><li>How to orchestrate the processing, acquisition, and sleeping phases of such applications<\/li><\/ul><ul><li>How to identify application bottlenecks and apply algorithmic or software optimizations<\/li><\/ul><ul><li>How to study the critical architectural features of a platform, such as X-HEEP, to achieve maximum energy efficiency for each application deployment<\/li><\/ul><ul><li>How to use Git to manage projects with multiple developers<\/li><li>How to collaborate with the team and to analyze and present the results<\/li><\/ul>    <p>The thesis will be carried out at the ESL at EPFL, one of the world&rsquo;s top-class universities. ESL is an active group (24 Ph.D. students among 45 members) involved in many research aspects. The student will be under the supervision of Prof. 
David Atienza, Dr. Davide Schiavone, and  two Ph.D. students (Dimitrios Samakovlis and Stefano Albini).<\/p>    <h2 class='wp-block-heading'>Required knowledge and skills:<\/h2>    <ul><li>Low-level software design (C and\/or C++ is going to be used throughout the thesis)<\/li><li>Good understanding of memory architectures and microcontrollers<\/li><li>Good analytical skills<\/li><li>Makefiles for complex project structures<\/li><li>Teamwork and git<\/li><li>Good background in algorithms and common ML models (required for the  optimization part at the end of the project, which is optional)<\/li><\/ul>    <h2 class='wp-block-heading'>Appreciated skills:<\/h2>    <ul><li>Scientific curiosity<\/li><li>Good communication skills<\/li><li>Advanced English<\/li><li>Assembly knowledge (useful for in-depth performance analysis)<\/li><\/ul>    <p><strong>Type of work: <\/strong>10% theory analysis, 90% coding and experimenting<\/p>                                                                                              <h4 class='section-title'><br \/><\/h4><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Davide Schiavone, Mr. Dimitrios Samakovlis, Mr. Stefano Albini, Prof. David Atienza<br> Contact email: <a href='mailto:davide.schiavone@epfl.ch;dimitrios.samakovlis@epfl.ch;stefano.albini@epfl.ch; david.atienza@epfl.ch;?subject=Porting and Optimization of biomedical benchmark applications to the X-HEEP Open-Source RISC-V-based microcontroller platform'>davide.schiavone@epfl.ch;dimitrios.samakovlis@epfl.ch;stefano.albini@epfl.ch; david.atienza@epfl.ch;<\/a><br>\";<\/script>\n<script>var project565minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Davide Schiavone, Mr. Dimitrios Samakovlis, Mr. Stefano Albini, Prof. David Atienza<br>\";<\/script>\n<span id=project565><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Davide Schiavone, Mr. Dimitrios Samakovlis, Mr. Stefano Albini, Prof. 
David Atienza<br> <a href=#_ onclick=opendesc('project565',project565); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><img width=27 src=https:\/\/eslweb.epfl.ch\/img\/1pixel.gif><\/td><td><a href=https:\/\/ecocloud.ch\/extranet\/wp-content\/uploads\/2023\/12\/BiomedBench_X-HEEP-Master-Thesis-Proposal-Spring-2024.pdf title='document link'><img width=30 border=0 src=https:\/\/eslweb.epfl.ch\/projects\/images\/doclink.gif hspace=2 alt='document link'><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><a href=https:\/\/www.epfl.ch\/labs\/esl\/research\/systems-on-chip\/x-heep\/ target=_blank title='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/201.png width=70 alt='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><\/a><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor557><\/a><b><span style='font-size: 20px;'>Implementation of an Accelerator based on Near-Memory Computing IPs for a RISC-V-based Microcontroller<\/b><br><script>var project557=\"<p>Artificial Intelligence (AI) has been one of the most dominant factors driving technology innovation over the last decade and has been exploited in a huge variety of fields ranging from image recognition and natural language processing to autonomous 
driving and modeling of complex physical systems. As integrated semiconductor devices become smaller and faster, systems-on-chip (SoCs) become more and more complex, enabling pocket-size, wearable, battery-powered systems to efficiently support the computationally expensive algorithms at the core of complex AI models while running on a limited power budget. Due to the large number of parameters, software-programmable SoCs are preferred thanks to their versatility and short time-to-market. Small cameras that recognize faces, microphones that filter background noise or recognize voice commands, wearable devices that detect epilepsy attacks, and implantable devices that constantly monitor tens of body parameters and release drugs accordingly to prevent organ failures are just a few examples of what smart edge devices represent in the AI revolution we are experiencing.<\/p> <p>One of the main bottlenecks for performance and energy efficiency of next-generation SoCs resides in the limited memory bandwidth inherent in the traditional von Neumann architecture. One idea to overcome this limitation is to bring computation within the memory subsystem, to better exploit the available memory bandwidth and leverage data reuse more efficiently. This computational paradigm is referred to as Processing-in-Memory (PiM) or Compute-in-Memory (CiM). 
Further benefits can be achieved by leveraging the Single Instruction, Multiple Data (SIMD) approach, where the same operation (i.e., an instruction) operates on a multitude of data (e.g., a vector or matrix), therefore significantly reducing the number of instructions loaded from memory and contributing to reducing the system\u2019s energy consumption.<\/p> <p>The <a href='https:\/\/www.epfl.ch\/labs\/esl\/'>Embedded Systems Laboratory (ESL)<\/a> at the Swiss Federal Institute of Technology Lausanne (EPFL) has developed two SRAM-based low-power architectures (known as Caesar and Carus) that normally behave as traditional memories, but also offer scalar and vector computing capabilities (i.e., arithmetic and logic operations such as addition, and, or, xor, multiplication, multiply-add, etc.) between two or more memory words. Because the data is processed within the memory layout itself, these smart near-memory IPs eliminate the need for moving operands through the system bus and into the local memory elements of processing elements that are physically far from the memory (e.g., inside the system CPU).&nbsp;<\/p> <p>In addition to memory architectures, ESL has also developed X-HEEP (eXtendable Heterogeneous Energy-Efficient Platform). It is an open-source, configurable, and extensible single-core RISC-V 32-bit Microcontroller Unit (MCU), sponsored by the EcoCloud Sustainable Computing center of EPFL. It is based on many third-party open-source IPs as well as in-house IPs developed at the ESL jointly with other EPFL laboratories. X-HEEP provides a framework to configure and extend the MCU and experiment with it as an RTL simulation model (Verilator, Questasim, or VCS), a hardware prototype on a Xilinx FPGA, and even tape it out as a standalone ASIC circuit. 
The framework also provides the RISC-V software toolchain and the SDK that are necessary to deploy applications on the MCU.<\/p> <p>Recently, the X-HEEP system has been extended to integrate Caesar- and Carus-based memories besides traditional SRAM banks. When running in computing mode, they can be programmed or controlled using dedicated software routines that implement application-specific computing kernels (e.g., matrix multiplication). Otherwise, they operate as traditional memories. By definition, these near-memory computing units exclusively process data that is directly mapped inside their private memory space (i.e., the memory banks instantiated within the IP itself).<\/p> <p>From a low-level point of view,&nbsp; this approach reduces data movement and memory bandwidth, thus increasing the system\u2019s energy efficiency. However,&nbsp; from an application point of view, it limits the size of the data that can be processed by the in-memory computing kernel (as it must fit inside a single memory bank) and does not allow for multi-memory bank parallelism opportunities.<\/p> <p>Many edge AI applications rely on fixed-point operations, replacing the more expensive floating-point operations used when deploying the same machine learning models on more powerful hardware. As of today, Carus and Caesar support an integer datapath with generic 32-, 16-, and 8-bit instructions. None of the operations is specifically designed to deal with the fixed-point data format (such as additions or multiplications followed by rounding and shifting instructions). 
Therefore, fixed-point operations must be emulated in software in the current implementation.<\/p> <p>This thesis aims to extend the Instruction Set Architecture (ISA) of the Carus near-memory computing IP with fixed-point instructions to increase performance and energy efficiency.<\/p> <p>Throughout the project, the student will learn:<\/p> <ol class='wp-block-list'> <li>How the Carus NMC IP works and how to offload computationally expensive tasks to it within the X-HEEP framework.<\/li> <li>How to extend the Carus NMC IP decoder and execution pipeline to support fixed-point instructions such as additions, subtractions, and multiplications with rounding and shift in 32-, 16-, and 8-bit modes.<\/li> <li>How to verify the functionality of the new instructions with randomized inputs.<\/li> <li>How to verify that the introduced modifications do not alter the timing characteristics of the system, and how to iterate on the architecture in case they do (for example, with techniques such as multicycle logic paths).<\/li> <li><em>[Optional]<\/em> How to update a few existing applications to use the new fixed-point instructions and test them on the system deployed on an FPGA.<\/li> <li>How to work with version control (Git) and third-party, open-source repositories.<\/li> <li>How to work in a team of people all contributing to the same project.<\/li> <\/ol> <p>The project will be carried out at the ESL at EPFL, one of the world\u2019s top-class universities. ESL is an active group (24 PhD students among 45 members) involved in many research aspects, therefore providing a stimulating research environment. The student will be under the supervision of Prof. David Atienza, Dr. Davide Schiavone, and Dr. 
Michele Caon.<\/p> <p><strong>Project objectives:<\/strong><\/p> <ol class='wp-block-list'> <li>Design a new set of fixed-point instructions as addition, subtraction, multiplication, and multiply-add supporting 32-, 16-, and 8-bit data elements by extending the Carus NMC IP decoder and execution pipeline.<\/li> <li>Verify that such instructions work correctly with randomized tests by extending the Carus testbench.<\/li> <li>Verify that the timing characteristics (e.g., the maximum operating frequency) of Carus ASIC implementation do not get worse and that the area increases negligibly by checking its existing physical implementation flow. In case it does, iterate the hardware.<\/li> <li><em>[Optional]<\/em> Update existing applications to leverage the new instructions and run the application on the system\u2019s hardware model deployed on an FPGA.<\/li> <\/ol> <p><strong>Required knowledge and skills:<\/strong><\/p> <ul class='wp-block-list'> <li>RTL design and FPGA implementation in SystemVerilog<\/li> <li>Good understanding of memory architectures and microcontrollers<\/li> <li>Good analytical skills<\/li> <li>Good background in computer architecture<\/li> <li>Teamwork and Git<\/li> <\/ul> <p><strong>Appreciated skills:<\/strong><\/p> <ul class='wp-block-list'> <li>Scientific curiosity<\/li> <li>Good communication skills<\/li> <li>Advanced English&nbsp;<\/li> <\/ul> <p><strong>Type of work:<\/strong> 40% theory analysis, 60% SW\/HW co-design and simulation<\/p><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Davide Schiavone, Michele Caon, Prof. 
David Atienza<br> Contact email: <a href='mailto:davide.schiavone@epfl.ch; michele.caon@epfl.ch; david.atienza@epfl.ch?subject=Implementation of an Accelerator based on Near-Memory Computing IPs for a RISC-V-based Microcontroller'>davide.schiavone@epfl.ch; michele.caon@epfl.ch; david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project557minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Davide Schiavone, Michele Caon, Prof. David Atienza<br>\";<\/script>\n<span id=project557><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Davide Schiavone, Michele Caon, Prof. David Atienza<br> <a href=#_ onclick=opendesc('project557',project557); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><a href=https:\/\/www.epfl.ch\/labs\/esl\/research\/systems-on-chip\/x-heep\/ target=_blank title='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/201.png width=70 alt='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><\/a><\/td><\/tr><td width=10 rowspan=2 
valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor548><\/a><b><span style='font-size: 20px;'>X-HEEP Accelerators: design, verification, and integration of a general purpose co-processor\/accelerator based on RISC-V for edge-computing SoCs<\/b><br><script>var project548=\"Microcontrollers (MCUs) are used in a wide range of applications, from sensor monitoring all the way to robotics and automotive systems. Despite typically offering lower performance, they are usually preferred over custom circuits and FPGAs thanks to their versatility and easy programmability via software routines typically written in C. <br \/><br \/>Thanks to this versatility, MCUs are typically chosen as edge computing platforms. However, deploying edge computing kernels (e.g., signal processing and neural networks) on resource-constrained and power-limited devices poses serious challenges in delivering real-time performance. In addition, edge computing platforms are battery-powered, so high energy efficiency is needed to extend battery lifetime. &nbsp;<br \/><br \/>For these reasons, edge computing platforms are typically extended with accelerators. However, accelerators are typically built around a specific kernel to maximize performance and energy efficiency, and are thus not versatile, which makes their design and verification more expensive.<br \/><br \/>The goal of this thesis is to design and implement a general-purpose accelerator for edge-computing applications based on RISC-V. The accelerator can be based either on coarse-grain reconfigurable arrays (CGRAs) or on Graphics Processing Units (GPUs), and it will work together with a RISC-V core. <br \/><br \/>The accelerator will exploit data, instruction, or thread parallelism to significantly increase the performance and energy efficiency of typical edge-computing benchmarks. 
<br \/><br \/>The accelerator will be integrated into X-HEEP (eXtendable Heterogeneous Energy-Efficient Platform), an open-source, configurable, and extensible single-core RISC-V MCU, sponsored by the EcoCloud Sustainable Computing center of EPFL and developed at the Embedded Systems Laboratory (ESL) jointly with other EPFL laboratories.<br \/><br \/> The expected outcomes of this thesis are:  &nbsp; <ul>  \t<li>Design or extend the accelerator to improve its performance\/area\/power figures or its programmability, making it more general-purpose<\/li>  \t<li>Extend the open-source RISC-V X-HEEP platform with the accelerator<\/li>  \t<li>Benchmark the performance of the accelerator on a set of kernels on the FPGA<\/li>  \t<li>Compare the performance of the accelerator against the X-HEEP CPU on a given set of kernels on the FPGA<\/li>  \t<li>Provide performance\/power\/area (PPA) figures of the accelerator in TSMC 65 nm LP technology<\/li>  \t<li>Compare the accelerator with other general-purpose or fixed-function state-of-the-art accelerators<\/li> <\/ul> Throughout the project, the student will learn: <ul>  \t<li>How to design or extend an accelerator that can execute general-purpose kernels oriented to signal processing<\/li>  \t<li>How to extend the RISC-V X-HEEP platform with the designed accelerator, which imposes constraints on the accelerator HW and SW interfaces<\/li>  \t<li>How to analyze the PPA figures of the accelerator in a given technology through the ASIC flow<\/li>  \t<li>How to compare the proposed solution against state-of-the-art accelerators<\/li>  \t<li>How to work with Git repositories and in a team of people all contributing to the same project.<\/li> <\/ul>The project will be carried out at the ESL at EPFL, one of the world's top-class universities, with EcoCloud&rsquo;s technical support. ESL is an active group (24 Ph.D. students among 45 members) involved in many research aspects. The student will be under the supervision of Prof. 
David Atienza and Dr. Davide Schiavone.<br \/><br \/><strong>Project objectives:<\/strong> <ol>  \t<li>Understanding the X-HEEP microcontroller, how it works, and how IPs are integrated, including how the configuration script of the bus and CPU is implemented.<\/li>  \t<li>Understanding how to design accelerators in SystemVerilog and how to extend the MCU with such an accelerator.<\/li>  \t<li>Understanding the ASIC flow to analyze the PPA figures.<\/li>  \t<li>Validation of the proposed accelerator with C tests.<\/li>  \t<li>Comparison against state-of-the-art solutions over a set of applications.<\/li> <\/ol> <strong> Required knowledge and skills:<\/strong> <ul>  \t<li>RTL design in any HDL (SystemVerilog is preferred and is going to be used throughout the project)<\/li>  \t<li>Basic Python skills and algorithm implementation<\/li>  \t<li>Good understanding of memory architectures and microcontrollers<\/li>  \t<li>Good analytical skills<\/li>  \t<li>Good background in computer architecture and algorithms<\/li>  \t<li>Teamwork and Git<\/li> <\/ul> <strong>&nbsp;<\/strong>  <strong>Appreciated skills:<\/strong> <ul>  \t<li>Scientific curiosity<\/li>  \t<li>Good communication skills<\/li>  \t<li>Advanced English<\/li> <\/ul> <br \/> <strong>Type of work:<\/strong> 20% theory analysis, 80% design and simulation<br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Davide Schiavone, Prof. David Atienza<br> Contact email: <a href='mailto:davide.schiavone@epfl.ch;david.atienza@epfl.ch?subject=X-HEEP Accelerators: design, verification, and integration of a general purpose co-processor\/accelerator based on RISC-V for edge-computing SoCs'>davide.schiavone@epfl.ch;david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project548minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Davide Schiavone, Prof. 
David Atienza<br>\";<\/script>\n<span id=project548><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Davide Schiavone, Prof. David Atienza<br> <a href=#_ onclick=opendesc('project548',project548); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><img width=27 src=https:\/\/eslweb.epfl.ch\/img\/1pixel.gif><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><a href=https:\/\/www.epfl.ch\/labs\/esl\/research\/systems-on-chip\/x-heep\/ target=_blank title='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/201.png width=70 alt='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><\/a><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor540><\/a><b><span style='font-size: 20px;'>Towards intelligent laser welding: automatic control of processing quality using reinforcement learning and deep neural architectures<\/b><br><script>var project540=\"Laser welding is a critical technology for many key economic sectors, including the aerospace, automotive, and medical industries. The ever-growing importance of this technology requires assurance of processing quality, which is lacking in modern industrial welding systems. 
The reason is the non-linear nature of light-matter interactions, which complicates the development of quality control that operates robustly under varying manufacturing conditions and material properties. Recent advances in artificial intelligence (AI) and machine learning (ML) offer a potential solution, as they allow complex regularities in the data to be recovered in a self-learning manner. The open challenges in developing AI\/ML control are the self-learning\/adaptation mechanisms and the search for optimal solutions under uncertainty constraints. The development of such AI\/ML control is the prime goal of this project.<br \/>Keywords: Laser welding, reinforcement learning, control, self-learning, quality in laser processing<br \/><br \/><strong>Workplan<\/strong><br \/><br \/>As part of the ML\/AI team, you will work in the interdisciplinary field of laser physics, laser technology, sensors, and AI\/ML. Your main activity will be the development of AI\/ML control algorithms (80% of the time), while their efficiency will be supported by experiments involving unique laser welding equipment (20% of the time). During your work, you will touch on topics such as machine learning, probability theory, topology, and non-linear dynamics. You will push your algorithm towards fully autonomous learning of laser welding. In particular, you will develop unsupervised self-learning procedures that take the algorithm from its first baby steps to complete mastery of laser welding within a limited time and without any human intervention.<br \/><br \/><strong>Required skills<\/strong><br \/><br \/>We are looking for interns and master's students with a background in electrical and electronic engineering, applied mathematics, computational science, or engineering. The position assumes basic knowledge of control and probability theory (for the latter in particular, familiarity with the concepts of reinforcement learning is beneficial). 
Hands-on experience with and awareness of the main concepts of machine learning are preferable as well. The position requires programming skills in Python and\/or C++. Experience with real-time systems is beneficial. <br \/><br \/><strong>Languages:<\/strong> English (Advanced)<br \/><br \/><strong>Location: <\/strong>Empa Thun<br \/><br \/><strong>Project\/contact:<\/strong> This work is part of intensive research in intelligent industrial automation that aims to develop digital twins for laser material processing. More details about current activities can be found on the group webpage:<br \/> <br \/><a href='https:\/\/www.empa.ch\/web\/s204'>https:\/\/www.empa.ch\/web\/s204<\/a><br \/><p>Further technical details about the project and applications can be sent\/discussed with <a href='mailto:sergey.shevchik@empa.ch'>Dr. Sergey Shevchik <\/a><\/p><p>&nbsp;<\/p><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Sergey Shevchik, Prof. David Atienza<br> Contact email: <a href='mailto:sergey.shevchik@empa.ch; david.atienza@epfl.ch?subject=Towards intelligent laser welding: automatic control of processing quality using reinforcement learning and deep neural architectures'>sergey.shevchik@empa.ch; david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project540minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Sergey Shevchik, Prof. David Atienza<br>\";<\/script>\n<span id=project540><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Sergey Shevchik, Prof. 
David Atienza<br> <a href=#_ onclick=opendesc('project540',project540); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><a href=http:\/\/www.empa.ch\/ target=_blank title='EMPA (Eidgen\u00f6ssische Materialpr\u00fcfungs- und Forschungsanstalt)'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/29.png width=70 alt='EMPA (Eidgen\u00f6ssische Materialpr\u00fcfungs- und Forschungsanstalt)'><\/a><\/td><\/tr><tr><td colspan=2><h3>Master or Semester Projects<br><br><\/h3><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor728><\/a><b><span style='font-size: 20px;'>Reinventing Numbers for Edge AI: Enabling Posit Arithmetic in Wearable Cough Monitoring<\/b><br><script>var project728=\"<strong>The student will port a state-of-the-art edge-AI cough detection algorithm to posit arithmetic, developing a software abstraction layer and targeting an ultra-low-power RISC-V chip.<\/strong><p><a href='https:\/\/dl.acm.org\/doi\/10.1145\/3772284'>Posit arithmetic<\/a> is an emerging numerical format 
proposed as an alternative to floating-point, offering improved dynamic range and accuracy-energy efficiency trade-offs for certain workloads. While posits are attractive for edge-AI and signal processing, their adoption is currently limited by the lack of mature software ecosystems and high-level programming support. The available <a href='https:\/\/eslweb.epfl.ch\/Xposit'>Xposit <\/a>project adds a custom RISC-V extension to the backend of the LLVM compiler, requiring programmers to explicitly write assembly code to exploit posit instructions. This enables the native execution of posit arithmetic in hardware projects like <a href='http:\/\/github.com\/esl-epfl\/PHEE'>PHEE<\/a> or <a href='https:\/\/x-heep.readthedocs.io\/en\/latest\/ASIC\/asic.html#heepatia'>HEEPatia<\/a>, an ultra-low power chip for edge-AI submitted for tape-out in 16nm technology.<\/p><p>This project aims to bridge this gap by developing a software abstraction layer that enables algorithm developers to write posit-based code at a higher level, while still generating correct and efficient Xposit assembly that can be executed on PHEE or HEEPatia. This can be in the form of C templates, macros, or a lightweight library. Using this abstraction, the student will port the Cough-E cough detection algorithm to posit arithmetic and run it on HEEPatia on an FPGA. The <a href='https:\/\/github.com\/esl-epfl\/Cough-E'>Cough-E<\/a> algorithm, developed at the Embedded Systems Laboratory (ESL) of EPFL, is a multimodal, privacy-preserving cough detector that combines audio and kinematic signals, which are processed in an Edge-AI wearable device.<\/p><p>The student will combine compiler-aware programming, embedded AI, and embedded hardware deployment, and will have the opportunity to influence how emerging arithmetic formats are programmed on real hardware. 
The final outcome will be a software abstraction layer for posit arithmetic and a posit implementation of Cough-E, evaluating the accuracy versus energy consumption trade-offs at the edge. The results of this project will directly contribute to ongoing research at ESL on alternative number formats for ultra-low-power AI, and high-quality results may lead to co-authorship of a research publication.<\/p><p><strong>Project objectives<\/strong><\/p><p>This project consists of several intermediate objectives. Each objective requires a small deliverable that helps students and supervisors stay on track, quantify progress, and build the final report from the start.<br \/><em>Objectives 1-4 are required to get a passing grade. Objectives 5-6 are required to get a maximum grade.<\/em><\/p><ol> <li>Background study and setup: Understand posit arithmetic, its advantages and limitations, and how it differs from floating- and fixed-point formats. Study the Cough-E algorithm and its signal processing pipeline. Analyze the Xposit RISC-V extension and its LLVM backend integration. Set up the environment. <strong>Deliverable: 3-page technical introduction to the problem, including the current setup.<\/strong><\/li> <li>Programming abstraction for posit arithmetic: Design and implement a software abstraction layer that allows posit operations to be expressed in higher-level functions. Possible approaches include C macros, inline assembly wrappers, or a minimal runtime\/library enabling structured posit kernels. Test the outcome on minimal code snippets executed on HEEPatia running on an FPGA. <strong>Deliverable: A Pull Request with the developed software abstraction layer, and a 2-3 page description of its use in a larger Edge-AI application like Cough-E.<\/strong><\/li> <li>Kernel-level porting of Cough-E to posits: Port the signal processing and feature extraction kernels of Cough-E to posit arithmetic using the developed abstraction layer. 
<strong>Deliverable: A Pull Request with the developed kernels and their validation, and a 1-2 page description of the results.<\/strong><\/li> <li>Performance and energy metrics: Obtain performance and energy metrics of the execution of the developed posit kernels on HEEPatia. <strong>Deliverable: A 2-3 page description and analysis of the results.<\/strong><\/li> <li>Kernel validation: Add simulation support using the <a href='https:\/\/github.com\/stillwater-sc\/universal'>Universal numbers library<\/a> to Cough-E. Validate the correctness of the results of Objective 3 against a simulation execution using Universal. <strong>Deliverable: A Pull Request with the simulation support, and a 1-2 page description of the validation.<\/strong><\/li> <li>Full application integration: Integrate the ported kernels from Objective 3 into the complete Cough-E application pipeline. Deploy and execute the full posit-based Cough-E application on HEEPatia running on an FPGA. Verify end-to-end functional correctness. <strong>Deliverable: A Pull Request with the integrated full application, and a 2-3 page description of the integration process and results.<\/strong><\/li><\/ol><p>The pull requests will be submitted to the repository of HEEPatia\/Cough-E, including the required scripts and documentation to replicate the results.<\/p><p><strong>Required knowledge and skills:<\/strong><\/p><ul> <li>Strong embedded C programming, demonstrated through past projects.<\/li> <li>Previous experience or knowledge of computer arithmetic is a plus.<\/li> <li>Basic Python programming.<\/li> <li>Ability to work consistently, independently, and communicate effectively in English.<\/li> <li>Familiarity with Git version control.<\/li><\/ul><p>Type of work: 20% theory analysis, 60% design and implementation, 20% verification and documentation <\/p><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. David Mallas\u00e9n Quintana and Prof. 
David Atienza<br> Contact email: <a href='mailto:david.mallasen@epfl.ch;david.atienza@epfl.ch?subject=Reinventing Numbers for Edge AI: Enabling Posit Arithmetic in Wearable Cough Monitoring'>david.mallasen@epfl.ch;david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project728minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. David Mallas\u00e9n Quintana and Prof. David Atienza<br>\";<\/script>\n<span id=project728><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. David Mallas\u00e9n Quintana and Prof. David Atienza<br> <a href=#_ onclick=opendesc('project728',project728); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor727><\/a><b><span style='font-size: 20px;'>Energy-Efficient Edge AI for Wearable Cough Monitoring via Fixed-Point Optimization<\/b><br><script>var project727=\"<p><span style='color: #000000'><strong>The student will port an existing state-of-the-art cough detection algorithm from 
floating-point to fixed-point arithmetic and study the resulting accuracy-energy trade-offs when deployed on a wearable edge-AI platform.<\/strong><\/span> <\/p> <p><span style='color: #000000'>Chronic cough is a key symptom in several respiratory diseases, and its continuous monitoring can significantly support diagnosis, treatment evaluation, and long-term patient care. However, continuous audio monitoring raises major privacy concerns and energy constraints. Edge-AI wearable devices offer a promising solution by processing sensitive data locally, without transmitting raw signals off-device. The <\/span><a rel='noopener noreferrer nofollow' href='https:\/\/github.com\/esl-epfl\/Cough-E' target='_blank'><span style='color: #1155cc'>Cough-E<\/span><\/a><span style='color: #000000'> algorithm, developed at the Embedded Systems Laboratory (ESL) of EPFL, is a multimodal, privacy-preserving cough detector that combines audio and kinematic signals and has been shown to run in real time on an ARM Cortex-M33 microcontroller.<\/span> <\/p> <p><span style='color: #000000'>Currently, Cough-E relies on floating-point arithmetic, which simplifies algorithm development but limits energy efficiency and scalability to ultra-low-power wearable platforms. This project aims to port the algorithm to fixed-point arithmetic, carefully analyzing the required dynamic range of each processing block (feature extraction, normalization, and classification) to preserve detection accuracy while minimizing energy consumption.<\/span> <\/p> <p><span style='color: #000000'>The student will study quantization effects, dynamic range constraints, and numerical stability across the full inference pipeline, using clinically relevant, event-based metrics. The final outcome will be a fixed-point implementation of Cough-E and a systematic evaluation of accuracy versus energy consumption, enabling informed design decisions for future low-power cough monitoring devices. 
The results of this project will contribute to ongoing research at ESL, and high-quality outcomes may lead to co-authorship of a research publication.<\/span> <\/p> <h3><span style='color: #000000'><strong>Project objectives<\/strong><\/span><\/h3> <p><span style='color: #000000'>This project consists of several intermediate objectives. Each objective requires a small deliverable that helps students and supervisors stay on track, quantify progress, and build the final report from the start.<\/span> <\/p> <p><span style='color: #000000'><em>Objectives 1-4 are required to get a passing grade. Objectives 5-6 are required to get a maximum grade.<\/em><\/span> <\/p> <ol>     <li>         <p><span style='color: #000000'>Understanding the problem and algorithm: Study chronic cough monitoring, clinically relevant evaluation metrics, and the structure of the Cough-E algorithm. Understand the signal processing pipeline, feature extraction, and classification stages, as well as the constraints of wearable edge-AI platforms. <strong>Deliverable: 2-3 page technical introduction to the problem.<\/strong><\/span>         <\/p>     <\/li>     <li>         <p><span style='color: #000000'>Accuracy evaluation on the full dataset: Run inference using the complete evaluation dataset to obtain clinically relevant accuracy metrics of the cough detection algorithm. <strong>Deliverable: A Pull Request<\/strong><sup>1<\/sup><strong> with the accuracy evaluation flow, and a 1-2 page description of the metrics and initial results.<\/strong><\/span>         <\/p>     <\/li>     <li>         <p><span style='color: #000000'>Dynamic range and numerical analysis: Analyze the floating-point implementation to identify the required dynamic range and precision for each kernel and operation. Determine suitable fixed-point formats (Q-formats) and scaling strategies. 
<strong>Deliverable: 2-3 page analysis of dynamic range and precision requirements.<\/strong><\/span>         <\/p>     <\/li>     <li>         <p><span style='color: #000000'>Fixed-point porting of the algorithm: Port the full inference pipeline of Cough-E from floating-point to fixed-point arithmetic in embedded C, taking into account the results of Objective 3. <strong>Deliverable: A Pull Request<\/strong><sup>1<\/sup><strong> with the fixed-point implementation of Cough-E, and a 1-2 page description of the metrics and results.<\/strong><\/span>         <\/p>     <\/li>     <li>         <p><span style='color: #000000'>Per-kernel research on arithmetic impact: Investigate the impact of each arithmetic format at the kernel level, extrapolating the Cough-E results to other biomedical algorithms. <strong>Deliverable: 3-page analysis of the impact of floating-point (FP) and fixed-point (FxP) arithmetic at the kernel level.<\/strong><\/span>         <\/p>     <\/li>     <li>         <p><span style='color: #000000'>Edge-AI hardware evaluation: Adapt the algorithm to run on one of the latest ultra-low power chips for edge-AI developed at ESL, further investigating the energy-vs-accuracy trade-offs. <strong>Deliverable: A Pull Request<\/strong><sup>1<\/sup><strong> with the adapted implementation, and a 2-3 page description and trade-off evaluation.<\/strong><\/span>         <\/p>     <\/li> <\/ol> <p><sup>1<\/sup>On the Cough-E repository, including required scripts and documentation to replicate the results. 
<\/p> <p><span style='color: #0e101a'><strong>Required knowledge and skills:<\/strong><\/span> <\/p> <ul>     <li>         <p><span style='color: #0e101a'>Strong embedded C programming, demonstrated through past projects.<\/span>         <\/p>     <\/li>     <li>         <p><span style='color: #0e101a'>Previous experience or knowledge of computer arithmetic is a plus.<\/span>         <\/p>     <\/li>     <li>         <p><span style='color: #0e101a'>Basic Python programming.<\/span>         <\/p>     <\/li>     <li>         <p><span style='color: #0e101a'>Ability to work consistently, independently, and communicate effectively in English.<\/span>         <\/p>     <\/li>     <li>         <p><span style='color: #0e101a'>Familiarity with Git version control.<\/span>         <\/p>     <\/li> <\/ul> <p><span style='color: #0e101a'><strong>Type of work:<\/strong> 20% theory analysis, 60% design and implementation, 20% verification and documentation<\/span> <\/p><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. David Mallas\u00e9n Quintana and Prof. David Atienza<br> Contact email: <a href='mailto:david.mallasen@epfl.ch;david.atienza@epfl.ch?subject=Energy-Efficient Edge AI for Wearable Cough Monitoring via Fixed-Point Optimization'>david.mallasen@epfl.ch;david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project727minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. David Mallas\u00e9n Quintana and Prof. David Atienza<br>\";<\/script>\n<span id=project727><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. David Mallas\u00e9n Quintana and Prof. 
David Atienza<br> <a href=#_ onclick=opendesc('project727',project727); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><\/td><\/tr><td width=10 rowspan=2 valign=top><a href=#_ onclick=opendesc('project713',project713); style='position:relative;z-index:99;'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/project713.png><\/a><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor713><\/a><b><span style='font-size: 20px;'>Development of low-level algorithms for the conversion of biological signals<\/b><br><script>var project713=\"<div class='entry-content mb-5'> \t\t <p><strong>The student will develop low-level C code to translate the output  of a VCO-based ADC into the correct format for a feature extraction  algorithm, aiding in the detection of fear from measurements of Galvanic  Skin Response (GSR).<\/strong><\/p> <p>The developed code will be deployed on <a href='https:\/\/arxiv.org\/pdf\/2509.04528'>HEEPidermis<\/a>, a <a href='https:\/\/github.com\/esl-epfl\/HEEPidermis\/'>System-on-Chip<\/a>  developed and taped out in 
65 nm at the Embedded Systems Laboratory (ESL) to record and process GSR. GSR refers to changes in the skin&rsquo;s electrical conductance that occur when sweat gland activity shifts. Because the sympathetic nervous system controls these responses, GSR is often used as an indicator of stress, excitement, or cognitive workload.<\/p> <p>The algorithm to be developed in this project will start from raw data coming from a counter that keeps track of the oscillation frequency of a Voltage-Controlled Oscillator (VCO). Using information coming from different circuits, the algorithm should normalize the data so that it can be used in subsequent feature extraction steps. To properly leverage the sparse nature of GSR events, the student should later scale their code to work on event-based samples coming from an on-chip block.&nbsp;<\/p> <p>The results of this project will be incorporated into a broader research effort, meaning the student has the opportunity to co-author a research publication (subject to the quality and rigor of the obtained results). This project is described with the objectives of a Master Semester Project. However, it can be extended to a Master Thesis by combining it with the project titled &ldquo;Development of low-level algorithms for the control of a closed-loop Analog Front-End for biosignal measurement.&rdquo;<\/p> <p>This <strong>Master Semester\/Thesis project<\/strong> will be carried out at the ESL of EPFL, which has been at the forefront of ultra-low power processing, focusing on embedded algorithms for healthcare wearables. ESL is an active group (22 Ph.D. students among 45 members) involved in various levels of the healthcare electronics stack, from full-custom microelectronic design to device manufacturing and edge AI. The student will be under the supervision of Mr. Juan Sapriza, Dr. David Mallas&eacute;n, and Prof. 
David Atienza.<\/p> <h3><strong>Project objectives:<\/strong><\/h3> <p>This project consists of several intermediate objectives. Each objective requires a small deliverable that helps students and supervisors stay on track, quantify progress, and build the final report from the start.&nbsp;<\/p> <p><em>Objectives 1-4 are required to get a passing grade.<\/em><\/p> <p><em>Objectives 5-6 are required to get a maximum grade.<\/em><\/p> <ol><li>Understand the problem: the GSR signal, its properties and features of interest, the hardware blocks of HEEPidermis that make up the recording process, and the requirements of the feature extraction algorithm. <br \/>&nbsp;<strong>Deliverable: 2-page introduction to the problem.<\/strong>&nbsp;&nbsp;&nbsp;<\/li><li>Reconstruction of data: Develop an algorithm to obtain data from the VCO-based ADC and the front-end, and output data in a standard format that fits the feature-extraction algorithm. Evaluate the performance of the feature extraction algorithm at different operating points of the analog front-end.<br \/>&nbsp;<strong>Deliverable: A Pull-Request<\/strong><sup>1<\/sup><strong>, a 3-page description of the theory and implementation of the algorithm, and analysis of results.<\/strong><\/li><li>Reconstruction from event-based data: Develop an algorithm to obtain data from the front-end and an event-based decimation block to output data in the same standard format.&nbsp; Compare performance and resource utilization with respect to the previous solution. <br \/>&nbsp;<strong>Deliverable: A Pull-Request<\/strong><sup>1<\/sup><strong>, a 3-page description of the theory and implementation of the algorithm, and analysis of results.<\/strong><\/li><li>Adaptation to differential data: Develop an alternative conversion algorithm that outputs data differentials. Adapt the feature extraction algorithm to operate on differences instead of absolute values (without reconstructing the signal!). 
Compare performance and resource  utilization with respect to the previous solution.<br \/>&nbsp;<strong>Deliverable: A Pull-Request<\/strong><sup>1<\/sup><strong>, a 3-page description of the theory and implementation of the algorithm, and analysis of results.<\/strong><\/li><li>Adaptation to event-based data: Adapt the feature extraction  algorithm to operate on event-based differences instead of fixed-rate  absolute values (without reconstructing the signal!). Compare  performance and resource utilization with respect to the previous  solution.<br \/>&nbsp;<strong>Deliverable: A Pull-Request<\/strong><sup>1<\/sup><strong>, a 3-page description of the theory and implementation of the algorithm, and analysis of results.<\/strong><\/li><li>Silicon verification: Carry out the experiments on the silicon chip. <br \/>&nbsp;<strong>Deliverable: A thorough comparison of the different results in silicon.&nbsp;<\/strong><\/li><\/ol> <p>&nbsp;<\/p> <p><sup>1<\/sup>on the HEEPidermis repository including the algorithm as a callable function, an example and documentation explaining its use.<\/p> <h3><strong>Required knowledge and skills:<\/strong><\/h3> <ul><li>Strong embedded C programming<\/li><li>Good understanding of logic circuits<\/li><li>Good understanding of electrical circuits<\/li><li>Creativity, autonomy, and scientific rigor&nbsp;<\/li><li>Git<\/li><\/ul> <h3><strong>Type of work: <\/strong><\/h3> <p>10% Research, 20% Design, 40% Implementation, 30% Testing and characterization.<\/p> \t<\/div>                         <div class='post-nav py-md-1'>                                 <div class='nav-prev'>         <\/div><\/div><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Juan Sapriza, Dr. David Mallas\u00e9n, Prof. David Atienza<br> \";<\/script>\n<script>var project713minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Juan Sapriza, Dr. David Mallas\u00e9n, Prof. 
David Atienza<br>\";<\/script>\n<span id=project713><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Juan Sapriza, Dr. David Mallas\u00e9n, Prof. David Atienza<br> <a href=#_ onclick=opendesc('project713',project713); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td>\n    <div style='position:relative;'>\n     <div style='position: absolute;top:-80px;left:-300px;'>\n       <img border=0 src=https:\/\/eslweb.epfl.ch\/projects\/images\/notavailable.gif alt='project no longer available'>\n     <\/div>\n    <\/div><a href=https:\/\/www.epfl.ch\/labs\/esl\/research\/systems-on-chip\/x-heep\/ target=_blank title='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/201.png width=70 alt='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><\/a><\/td><\/tr><td width=10 rowspan=2 valign=top><a href=#_ onclick=opendesc('project712',project712); style='position:relative;z-index:99;'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/project712.png><\/a><\/td><td rowspan=2 
valign=top><a style='position: relative; top:-30px' name=anchor712><\/a><b><span style='font-size: 20px;'>Development of low-level algorithms for the control of a closed-loop Analog Front-End for biosignal measurement<\/b><br><script>var project712=\"<p><strong>The student will develop low-level C code to control the Analog Front-End (AFE) of a SoC with the objective of increasing its range of acquisition and power efficiency while performing measurements of Galvanic Skin Response (GSR).<\/strong><\/p> <p>The developed code will be deployed on <a href='https:\/\/arxiv.org\/pdf\/2509.04528'>HEEPidermis<\/a>, a <a href='https:\/\/github.com\/esl-epfl\/HEEPidermis\/'>System-on-Chip<\/a> developed at the Embedded Systems Laboratory (ESL) to record and process GSR. GSR refers to changes in the skin&rsquo;s electrical conductance that occur when sweat gland activity shifts. Because the sympathetic nervous system controls these responses, GSR is often used as an indicator of stress, excitement, or cognitive workload.<\/p> <p>The algorithm to be developed in this project will control the parameters of the AFE, choosing them wisely to improve performance and optimize energy consumption. For this, the student should improve and integrate a model of the power consumption of the SoC, as well as the performance of the ADC under different operating points.&nbsp;&nbsp;<\/p> <p>The results of this project will be incorporated into a broader research effort, meaning the student has the opportunity to co-author a research publication (subject to the quality and rigor of the obtained results). This project is described with the objectives of a Master Semester Project. 
However, it can be extended to a Master Thesis by combining it with the project titled &ldquo;Development of low-level algorithms for the conversion of biological signals.&rdquo;<\/p> <p>This <strong>Master Semester\/Thesis project<\/strong> will be carried out at the ESL of EPFL, which has been at the forefront of ultra-low power processing, focusing on embedded algorithms for healthcare wearables. ESL is an active group (22 Ph.D. students among 45 members) involved in various levels of the healthcare electronics stack, from full-custom microelectronic design to device manufacturing and edge AI. The student will be under the supervision of Mr. Juan Sapriza, Dr. David Mallas&eacute;n, and Prof. David Atienza.<\/p> <h3><strong>Project objectives:<\/strong><\/h3> <p>This project consists of several intermediate objectives. Each objective requires a small deliverable that helps students and supervisors stay on track, quantify progress, and build the final report from the start.&nbsp;<\/p> <p><em>Objectives 1-4 are required to get a passing grade.<\/em><\/p> <p><em>Objectives 5-6 are required to get a maximum grade.<\/em><\/p> <ol> <li>Understand the problem: the GSR signal, its properties and features of interest, the hardware blocks of HEEPidermis that make up the recording process, and the effects of the different operating points on the whole system. <br \/> <strong>Deliverable: 2-page introduction to the problem.<\/strong>&nbsp;&nbsp;&nbsp;<\/li> <li>Integration of power and performance models: Improve the current power model of the SoC to include considerations on digital back-end usage and signal quality. Develop an algorithm to evaluate the quality of the acquired signal and the power being consumed. 
<br \/> <strong>Deliverable: A Pull-Request<\/strong><sup>1<\/sup><strong>, a 3-page description of the model and algorithm.<\/strong><\/li> <li>Abstraction of the adjustment process: Develop an abstraction layer (C code) that allows the main application to request changes to the operating point (e.g. to change range, sensitivity, or power) without knowing the model. <br \/> <strong>Deliverable: A Pull-Request<\/strong><sup>1<\/sup><strong>, a 1-page description of the layer.<\/strong><\/li> <li>Automatic range adjustment: Configure the needed blocks to trigger interrupts when the signal exits a stipulated range. Develop the interrupt service routine to request the proper adjustment to bring it back. <br \/> <strong>Deliverable: A Pull-Request<\/strong><sup>1<\/sup><strong>, a 2-page description of the implementation of the algorithm.<\/strong><\/li> <li>Automatic operating point adjustment: Integrate the range adjustment with an additional consideration of signal quality and system-level energy. Show how the system can maintain performance while reaching a lower energy requirement. <p> <strong>Deliverable: A Pull-Request<\/strong><sup>1<\/sup><strong>, a 4-page description of the theory and implementation of the algorithm, and a thorough analysis of results.<\/strong><\/p> <\/li> <li>Silicon verification: Carry out the experiments on the silicon chip. <br \/> <strong>Deliverable: A thorough analysis of the results. 
Adaptation of the power\/performance model using silicon measurements.<\/strong><\/li> <\/ol> <p><sup>1<\/sup>on the HEEPidermis repository including the algorithm as a callable function, an example and documentation explaining its use.<\/p> <h3><strong>Required knowledge and skills:<\/strong><\/h3> <ul> <li>Strong embedded C programming<\/li> <li>Good understanding of logic circuits<\/li> <li>Good understanding of electrical circuits<\/li> <li>Creativity, autonomy, and scientific rigor&nbsp;<\/li> <li>Git<\/li> <\/ul> <h3><strong>Type of work: <\/strong><\/h3> <p>10% Research, 30% Design, 20% Implementation, 40% Testing and characterization.<\/p><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Juan Sapriza, Dr. David Mallas\u00e9n, Prof. David Atienza <br> \";<\/script>\n<script>var project712minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Juan Sapriza, Dr. David Mallas\u00e9n, Prof. David Atienza <br>\";<\/script>\n<span id=project712><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Juan Sapriza, Dr. David Mallas\u00e9n, Prof. 
David Atienza <br> <a href=#_ onclick=opendesc('project712',project712); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td>\n    <div style='position:relative;'>\n     <div style='position: absolute;top:-80px;left:-300px;'>\n       <img border=0 src=https:\/\/eslweb.epfl.ch\/projects\/images\/notavailable.gif alt='project no longer available'>\n     <\/div>\n    <\/div><a href=https:\/\/www.epfl.ch\/labs\/esl\/research\/systems-on-chip\/x-heep\/ target=_blank title='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/201.png width=70 alt='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><\/a><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor707><\/a><b><span style='font-size: 20px;'>Custom Hardware Design of Posit Arithmetic Operators for Low-Precision Computing<\/b><br><script>var project707=\"<p>Computer arithmetic is the foundation of all numerical computation in digital systems. 
It encompasses the algorithms and hardware structures used to perform basic operations such as addition, multiplication, and division on binary-encoded numbers. These operations are implemented using digital logic circuits that manipulate bits through combinational and sequential logic. The choice of number representation, such as fixed-point or floating-point, directly impacts the precision, performance, and energy efficiency of arithmetic units. As applications increasingly demand low-power and high-throughput computation, especially in embedded and signal processing systems, the design of efficient arithmetic hardware has become a critical area of research and innovation.<\/p> <p>Posit arithmetic is an emerging number representation system designed to enhance numerical accuracy and efficiency, particularly in low-precision computing applications. Unlike traditional floating-point formats, posits offer a tapered precision scheme that allocates more bits to the exponent or fraction depending on the magnitude of the number. This dynamic allocation enables better precision near unity and a wider dynamic range, making posits especially attractive for signal processing and embedded AI applications.<\/p> <p>This project aims to develop a library of parameterized arithmetic modules for core posit arithmetic operations in SystemVerilog. These modules will enable flexible and efficient hardware design for signal processing or machine learning tasks. The designs will emphasize modularity, configurability (e.g., posit size and exponent size), and synthesis efficiency.<\/p> <p>The project will be carried out at the ESL at EPFL, one of the world's top-class universities. ESL is an active group (24 Ph.D. students among 45 members) involved in many research aspects. The student will be under the supervision of Mr. Tommaso Terzano, Dr. David Mallas&eacute;n Quintana and Prof. David Atienza. 
<\/p> <p><strong>Project objectives:<\/strong> <\/p> <p>The objectives can be adapted to either a semester project or a master thesis. Depending on the workload, some will be mandatory and others optional. The full list is the following: <\/p> <ol>     <li>         <p>Understanding posit arithmetic and studying <a rel='noopener noreferrer nofollow' href='https:\/\/github.com\/RaulMurillo\/Flo-Posit' target='_blank'><span style='color: #1155cc'>previous<\/span><\/a><span style='color: #0e101a'> hardware implementations of posit units.<\/span>         <\/p><\/li><li>         <p>Designing modular and configurable implementations of posit units in SystemVerilog:         <\/p>         <ol>             <li>                 <p>Addition\/Subtraction                 <\/p>             <\/li>             <li>                 <p>Multiplication                 <\/p>             <\/li>             <li>                 <p>Multiply-Accumulate (MAC)                 <\/p>             <\/li>             <li>                 <p>Division                 <\/p>             <\/li>             <li>                 <p>Square root                 <\/p>             <\/li>             <li>                 <p>Conversion to and from integers                 <\/p>             <\/li>             <li>                 <p>Quire MAC                 <\/p>             <\/li>         <\/ol>     <\/li>     <li>         <p>Providing diagrams and documentation for each hardware block.         
<\/p>     <\/li>     <li>         <p>Verifying the functionality of the posit units, for <a rel='noopener noreferrer nofollow' href='https:\/\/github.com\/davidmallasen\/arithmetic_units' target='_blank'><span style='color: #1155cc'>example<\/span><\/a><span style='color: #0e101a'> using cocotb.<\/span>         <\/p>     <\/li>     <li>         <p>Integrating the developed units into a full arithmetic unit, for example <a rel='noopener noreferrer nofollow' href='https:\/\/github.com\/openhwgroup\/cvfpu' target='_blank'><span style='color: #1155cc'>CVFPU<\/span><\/a><span style='color: #0e101a'>.<\/span>         <\/p>     <\/li>     <li>         <p>Providing ASIC synthesis results of the developed blocks and comparing them with existing implementations.         <\/p>     <\/li>     <li>         <p>Implement pipelined versions of selected operators and analyze trade-offs.         <\/p>     <\/li> <\/ol> <p>Note: Evaluation will be based on the overall performance and dedication of the student. If the student successfully completes the objectives, the next step would be participating in the research activity, which will be based on the student&rsquo;s interests. <\/p> <p><strong>Required knowledge and skills:<\/strong> <\/p> <ul>     <li>         <p>Excellent RTL design skills, ideally in SystemVerilog, demonstrated through past projects.         <\/p>     <\/li>     <li>         <p>Previous experience or knowledge of computer arithmetic is a plus.         <\/p>     <\/li>     <li>         <p>Basic Python programming.         <\/p>     <\/li>     <li>         <p>Ability to work consistently, independently, and communicate effectively in English.         <\/p>     <\/li>     <li>         <p>Familiarity with Git version control.         <\/p>     <\/li> <\/ul> <p>     <br \/><strong>Type of work:<\/strong> 20% theory analysis, 60% design and simulation, 20% verification and documentation <\/p><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>STI<br><b>Supervisor<\/b>: Mr. 
Tommaso Terzano, Dr. David Mallas\u00e9n Quintana and Prof. David Atienza<br> Contact email: <a href='mailto:tommaso.terzano@epfl.ch;david.mallasen@epfl.ch;david.atienza@epfl.ch?subject=Custom Hardware Design of Posit Arithmetic Operators for Low-Precision Computing'>tommaso.terzano@epfl.ch;david.mallasen@epfl.ch;david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project707minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>STI<br><b>Supervisor<\/b>: Mr. Tommaso Terzano, Dr. David Mallas\u00e9n Quintana and Prof. David Atienza<br>\";<\/script>\n<span id=project707><b>Lab: <\/b>ESL<br><b>Sections: <\/b>STI<br><b>Supervisor<\/b>: Mr. Tommaso Terzano, Dr. David Mallas\u00e9n Quintana and Prof. David Atienza<br> <a href=#_ onclick=opendesc('project707',project707); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><img width=27 src=https:\/\/eslweb.epfl.ch\/img\/1pixel.gif><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor698><\/a><b><span style='font-size: 20px;'>Acceleration of TinyML applications using a cutting-edge heterogeneous accelerator platform<\/b><br><script>var project698=\"<p>Edge Artificial Intelligence is a novel computing 
paradigm that has the potential to revolutionize Internet-of-Things devices. Instead of uploading sensitive data (e.g. audio, video, biosignals) to remote servers, edge-AI devices perform all of their data processing on-board, thus preserving users&rsquo; privacy. However, to execute complex edge-AI operations while maximizing battery lifetime, these devices must be equipped with high-performance, ultra-low power processors.<\/p> <p>To this end, the Embedded Systems Laboratory (ESL) of EPFL has designed HEEPatia, an ultra-low power chip for edge-AI. <span>HEEPatia has been submitted for tape-out in 16nm technology and will be ready for testing in Q4 2025.<\/span> It combines a dual-core implementation of the <a rel='noopener noreferrer nofollow' href='https:\/\/github.com\/esl-epfl\/x-heep' target='_blank'><span style='color: #1155cc'>X-HEEP platform<\/span><\/a> with several state-of-the-art hardware IPs. Among these is the Very Wide Register Reconfigurable Array (VWR2A), an architecture that integrates high computational density and wide memory structures to efficiently execute data pre-processing kernels. Furthermore, HEEPatia contains two instances of NM-Carus, a near-memory computing platform that accelerates Deep Learning kernels. Finally, HEEPatia includes a 256 kB Gain-Cell Random Access Memory (GCRAM), which offers area and power reductions compared to traditional SRAM cells.<\/p> <p><span>Up until this point, each of the hardware IPs on HEEPatia has been validated independently on its own test kernels, but there are several tinyML applications that could be accelerated further using a combination of the aforementioned IPs. The goal of this project is to run and optimize a real-world, end-to-end tinyML application using the IPs available on HEEPatia. Furthermore, this application can be used to assess the trade-offs &ndash; in terms of power, performance, and area &ndash; of using each IP to execute a given kernel. 
The student will have regular guidance and feedback throughout the whole project. If the student successfully completes the objectives, the next step would be participation in the research activity, which will be based on the student&rsquo;s interests.<\/span> <\/p> <p><span>The expected outcomes of this project are:<\/span> <\/p> <ul>     <li>         <p><span>Developing an FPGA implementation of HEEPatia to facilitate application testing<\/span>         <\/p>     <\/li>     <li>         <p><span>Adapting existing kernels of each of the IPs (e.g. FFT, FIR, MAC, matrix addition) to evaluate trade-offs<\/span>         <\/p>     <\/li>     <li>         <p><span>Adapting a tinyML transformer workload to run on the X-HEEP platform of HEEPatia and evaluating the baseline performance<\/span>         <\/p>     <\/li>     <li>         <p><span>Accelerating the end-to-end application using VWR2A, the two NM-Carus instances, and the dual-core CPU with DSP instruction extensions, then evaluating the effects on performance, energy, and accuracy<\/span>         <\/p>     <\/li>     <li>         <p><span>Investigating the energy-vs-accuracy trade-off of storing weights in the GCRAM versus traditional SRAM<\/span>         <\/p>     <\/li> <\/ul> <p><span>Throughout the project, the student will learn:<\/span> <\/p> <ul>     <li>         <p><span>Software development for ultra-low power MCUs <\/span>         <\/p>     <\/li>     <li>         <p><span>Programming heterogeneous accelerator platforms<\/span>         <\/p>     <\/li>     <li>         <p><span>FPGA implementation and testing workflow<\/span>         <\/p>     <\/li>     <li>         <p><span>How to verify the software and hardware<\/span>         <\/p>     <\/li>     <li>         <p><span>How to work with Git repositories in a team <\/span>         <\/p>     <\/li> <\/ul> <p><span>The project will be carried out at the ESL at EPFL, one of the world's top-class universities. ESL is an active group (24 Ph.D. 
students among 45 members) involved in many research aspects. The student will be under the supervision of Ms. Lara Orlandic, Dr. David Mallas&eacute;n Quintana, and Prof. David Atienza.<\/span> <\/p> <p><span><strong>Project objectives:<\/strong><\/span> <\/p> <ol>     <li>         <p><span>Understanding the HEEPatia platform and functionalities of the IPs<\/span>         <\/p>     <\/li>     <li>         <p><span>Developing an FPGA implementation and validating it using existing kernels<\/span>         <\/p>     <\/li>     <li>         <p><span>Modifying the FFT test kernel of each IP to harmonize the inputs and outputs, then evaluating the performance of each IP, and repeating for other kernels (e.g. FIR, matrix addition)<\/span>         <\/p>     <\/li>     <li>         <p><span>Running a baseline tinyML application on the X-HEEP MCU<\/span>         <\/p>     <\/li>     <li>         <p><span>Accelerating the workload using the dual-core X-HEEP CPU<\/span>         <\/p>     <\/li>     <li>         <p><span>Accelerating the inference of the application using NM-Carus<\/span>         <\/p>     <\/li>     <li>         <p><span>Accelerating the pre-processing of the application using the most performant IP from step #3<\/span>         <\/p>     <\/li>     <li>         <p><span>Evaluating the trade-offs of using the GCRAM for storing parameters (e.g. 
neural network weights) against the standard SRAM cells<\/span>         <\/p>     <\/li> <\/ol> <p><span><strong>Required knowledge and skills:<\/strong><\/span> <\/p> <ul>     <li>         <p><span>Excellent embedded C programming and debugging<\/span>         <\/p>     <\/li>     <li>         <p><span>Understanding RTL written in SystemVerilog<\/span>         <\/p>     <\/li>     <li>         <p><span>Strong implementation and simulation with FPGAs<\/span>         <\/p>     <\/li>     <li>         <p><span>Ability to work consistently, independently, and ask for help when needed<\/span>         <\/p>     <\/li>     <li>         <p><span>Good communication skills in advanced English<\/span>         <\/p>     <\/li>     <li>         <p><span>Git version control<\/span>         <\/p>     <\/li> <\/ul> <p>     <br \/><span><strong>Type of work:<\/strong> 10% theory analysis, 70% design and simulation, 20% verification and documentation<\/span> <\/p><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Ms. Lara Orlandic, Dr. David Mallas\u00e9n Quintana, and Prof. David Atienza<br> \";<\/script>\n<script>var project698minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Ms. Lara Orlandic, Dr. David Mallas\u00e9n Quintana, and Prof. David Atienza<br>\";<\/script>\n<span id=project698><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Ms. Lara Orlandic, Dr. David Mallas\u00e9n Quintana, and Prof. 
David Atienza<br> <a href=#_ onclick=opendesc('project698',project698); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td>\n    <div style='position:relative;'>\n     <div style='position: absolute;top:-80px;left:-300px;'>\n       <img border=0 src=https:\/\/eslweb.epfl.ch\/projects\/images\/notavailable.gif alt='project no longer available'>\n     <\/div>\n    <\/div><a href=https:\/\/www.epfl.ch\/labs\/esl\/research\/systems-on-chip\/x-heep\/ target=_blank title='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/201.png width=70 alt='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><\/a><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor630><\/a><b><span style='font-size: 20px;'>Enhancing the Efficiency and Accuracy of Radio Interferometry Kernels using CGRA-ME Framework<\/b><br><script>var project630=\"<div><strong><br \/><\/strong><\/div><strong>Background<\/strong><div><strong><br \/><\/strong>The Square 
Kilometer Array (SKA) is one of the most ambitious international scientific projects ever undertaken. It aims to build the world's largest radio telescope, with sites in Australia and South Africa. The SKA will enable unprecedented observations of the universe across a wide range of radio frequencies, facilitating groundbreaking research in fields such as astrophysics, cosmology, and fundamental physics. The sheer scale of the SKA, which will generate an enormous amount of data (up to 10 Tb\/s at the input), presents significant computational challenges, particularly in processing and analyzing the data streams in real time. To address these challenges, the SEAMS (sustainable &amp; energy aware methods for SKA observatory) project has been initiated. SEAMS is focused on developing energy-efficient and scalable architectures for processing the massive data generated by SKA. The project seeks to optimize key computational kernels used in radio interferometry, such as Fast Fourier Transform (FFT), convolution, and deconvolution, which are essential for converting raw telescope data into meaningful data products. High-Performance Computing (HPC) is critical to the SKA's success, and optimizing the performance and energy efficiency of these kernels is a primary goal.<p><strong>Project Motivation<\/strong><\/p><p>One promising approach to achieving these objectives is the use of Coarse-Grained Reconfigurable Architectures (CGRA), which offer a balance between the flexibility of general-purpose processors and the efficiency of custom hardware accelerators. The CGRA-ME framework is an open-source tool that allows for the design, configuration, and evaluation of CGRA architectures. By leveraging CGRA-ME, this project aims to enhance the performance and energy efficiency of HPC kernels critical to the SKA and SEAMS projects.<\/p><p><strong>Project Objectives<\/strong><\/p><p>1. 
Configure CGRA-ME and Evaluate the Performance and Efficiency of a Commonly Used HPC Kernel in Radio Interferometry<br \/>Objective Details:<br \/>o Implement and validate the execution of a basic kernel such as FFT, convolution, and\/or deconvolution using the CGRA-ME framework.<br \/>o Evaluate the performance and energy efficiency of these kernels on a basic CGRA architecture to establish a baseline for further optimization.<br \/>o Compare performance results against other platforms.<br \/>o Identify bottlenecks in the computation and propose improvements.<\/p><p>2. Design a New Processing Element for Variable Precision Arithmetic Operators and Integrate it into the Basic CGRA Architecture<br \/>Objective Details:<br \/>o Radio interferometry computations often require varying levels of precision depending on the specific operation and data characteristics. Design a new processing element (PE) within the CGRA architecture that supports variable precision arithmetic. Explore different precision configurations.<br \/>o Test the correct functioning of the design and quantify the precision loss or gain and the performance.<br \/>o Explore other CGRA architecture aspects, such as communication, private memories, etc.<\/p><p>3. 
Evaluate the Precision, Performance, and Energy Trade-offs of the Design<br \/>Objective Details:<br \/>o The final objective of the project is to thoroughly evaluate the trade-offs between precision, performance, and energy consumption for the newly designed processing element.<br \/>o This evaluation will involve comparing the variable precision PE's performance with that of fixed-precision PEs in executing the chosen HPC kernels.<\/p><p><strong>Required Knowledge and Skills<\/strong><\/p><p>&bull; Hardware Design: Experience with hardware description languages (HDLs) such as Verilog or VHDL.<br \/>&bull; Reconfigurable Architectures: Understanding of reconfigurable architectures such as FPGAs and CGRAs.<br \/>&bull; Analysis: Knowledge of techniques to analyze and optimize energy consumption in hardware designs.<\/p><p><strong>Type of Work<\/strong><\/p><p>&bull; Theoretical Analysis (30%): This will involve the design of the new processing element and the theoretical exploration of precision scaling in arithmetic operations.<br \/>&bull; Design and Implementation (50%): Hands-on configuration and modification of the CGRA-ME framework, integration of the new processing element, and execution of kernel evaluations.<br \/>&bull; Testing and Evaluation (20%): Extensive testing, validation, and analysis of performance, precision, and energy trade-offs.<br \/>Expected Outcomes<br \/>&bull; A new processing element for variable precision arithmetic, integrated into the CGRA, validated, and tested.<br \/>&bull; Comprehensive evaluation results showing the trade-offs between precision, performance, and energy efficiency, offering valuable insights for the SKA and SEAMS projects.<\/p><p><strong>Learning objectives<\/strong><\/p><p>&bull; Research Skill: The student will develop the ability to synthesize information from various sources, apply theoretical knowledge to practical problems, and innovate in their design approach.<br \/>&bull; Technical Skills: The student 
will gain hands-on experience with Coarse-Grained Reconfigurable Architectures (CGRA), particularly using the CGRA-ME framework, including configuring, modifying, and optimizing these systems for specific applications.<br \/>&bull; Analytical Skills: The student will develop the ability to critically evaluate and analyze the trade-offs involved in different design choices, particularly regarding precision, performance, and energy efficiency.<br \/>&bull; Communication Skills: The student will learn how to present these trade-offs clearly, which is essential for making informed design decisions in engineering projects.<br \/>&bull; Problem-Solving Skills: Throughout the project, the student will engage in problem-solving and research, learning how to approach complex engineering challenges methodically.<\/p><p>This project will be conducted under the supervision of experts in hardware design, energy efficient HPC, and optimization from the Embedded Systems Laboratory (ESL), providing the student with an opportunity to contribute to cutting-edge research in energy-efficient computing: Dr. Denisa-Andreea Constantinescu denisa.constantinescu@epfl.ch, Rub&eacute;n Rodr&iacute;guez &Aacute;lvarez ruben.rodriguezalvarez@epfl.ch, Dr. Giovanni Ansaloni giovanni.ansaloni@epfl.ch, and Prof. David Atienza Alonso david.atienza@epfl.ch<br \/> <br \/><strong>References<\/strong><\/p><p>1. SEAMS Project: https:\/\/seams-project.com\/<br \/>2. CGRA-ME Framework: https:\/\/cgra-me.ece.utoronto.ca\/<\/p><\/div><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>ESL<br><b>Supervisor<\/b>: Dr. Denisa-Andreea Constantinescu; Rub\u00e9n Rodr\u00edguez \u00c1lvarez; Dr. Giovanni Ansaloni; Prof. 
David Atienza Alonso <br> Contact email: <a href='mailto:denisa.constantinescu@epfl.ch;  ruben.rodriguezalvarez@epfl.ch; giovanni.ansaloni@epfl.ch; david.atienza@epfl.ch?subject=Enhancing the Efficiency and Accuracy of Radio Interferometry Kernels using CGRA-ME Framework'>denisa.constantinescu@epfl.ch;  ruben.rodriguezalvarez@epfl.ch; giovanni.ansaloni@epfl.ch; david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project630minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>ESL<br><b>Supervisor<\/b>: Dr. Denisa-Andreea Constantinescu; Rub\u00e9n Rodr\u00edguez \u00c1lvarez; Dr. Giovanni Ansaloni; Prof. David Atienza Alonso <br>\";<\/script>\n<span id=project630><b>Lab: <\/b>ESL<br><b>Sections: <\/b>ESL<br><b>Supervisor<\/b>: Dr. Denisa-Andreea Constantinescu; Rub\u00e9n Rodr\u00edguez \u00c1lvarez; Dr. Giovanni Ansaloni; Prof. David Atienza Alonso <br> <a href=#_ onclick=opendesc('project630',project630); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' 
name=anchor620><\/a><b><span style='font-size: 20px;'>Integration and optimization of Ultra Low Power CPU on the X-HEEP platform targeting implantable devices\r\n<\/b><br><script>var project620=\"<h3>Project Description<\/h3> <p style='text-align: justify'>The Embedded Systems Laboratory (ESL) of EPFL has been at the forefront of developing open-source, energy-efficient computing platforms, most notably the eXtendable Heterogeneous Energy-Efficient Platform (<a href='https:\/\/github.com\/esl-epfl\/x-heep' target='_blank'>     <strong>X-HEEP<\/strong>   <\/a>). X-HEEP is a versatile, RISC-V-based microcontroller designed to target both small-scale and high-performance applications. This platform provides a customizable and extendable MCU, allowing users to integrate their own accelerators without modifying the core microcontroller architecture. X-HEEP has been proven effective in various implementations, from FPGA to ASIC, showcasing its adaptability and performance efficiency in diverse scenarios.&nbsp;<\/p> <p style='text-align: justify'>Although X-HEEP&rsquo;s energy efficiency has been proven to match the requirements of wearable devices, in the domain of implantable technology, there is an increasing demand for devices that are not only energy efficient, but also extremely low power. Wireless power delivery often found in implantable devices needs to operate at a constrained power budget to avoid damage to tissue due to overheating. This forces devices to go down from hundreds of mW (typical on wearable devices) to tens of &micro;W:   <strong>a four-orders-of-magnitude drop!<\/strong> <\/p> <p style='text-align: justify'>In this light, the X-HEEP project is opening a branch to attain unprecedented power efficiency while retaining the configurability, extendibility, and re-programmability that make it a desirable processing platform. 
The proposed project will kick off this path by instantiating an extremely low-power CPU on X-HEEP: the   <a href='https:\/\/github.com\/olofk\/serv' target='_blank'>SERV processor<\/a>. Then, the integration process will be optimized for the requirements of biosignal processing, usually characterized by time sparsity and medium amplitude resolution.&nbsp;<\/p> <p style='text-align: justify'>The project can be adapted to either a Master-level Semester Project or a Master&rsquo;s Thesis.&nbsp;<\/p> <p style='text-align: justify'>It will be carried out at the ESL at EPFL. ESL is an active group (22 Ph.D. students among 45 members) involved in many research aspects. The student will be under the supervision of Mr. Juan Sapriza, Dr. Davide Schiavone, and Prof. David Atienza.<\/p> <h3 style='text-align: justify'>Project Objectives<\/h3> <ol>   <li>Integrate the SERV processor into the X-HEEP platform.&nbsp;<\/li>   <li>Perform RTL-level adaptations to improve its performance in the biosignal processing context (for example, implementing optimized low-bit-width operations such as the addition of 4-bit data).&nbsp;<\/li>   <li>Characterize the improvements and validate the full functionality of the CPU.&nbsp;<\/li>   <li>(Optional) Help integrate the CPU as the main processor of the latest X-HEEP chip tape-out.&nbsp;<\/li> <\/ol> <h3 style='text-align: justify'>Required knowledge and skills:<\/h3> <ul>   <li>Advanced knowledge of computer architecture and RTL<\/li>   <li>Creativity, autonomy and scientific rigor<\/li>   <li>Embedded C and\/or Assembly<\/li><li>Confidence working with Linux systems<\/li>   <li>Git<\/li> <\/ul> <h3 style='text-align: justify'>Type of work: <\/h3> <p style='text-align: justify'>60% Implementation, 40% Characterization and testing<\/p><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Mr. Juan Sapriza, Dr. Davide Schiavone, Prof. 
David Atienza<br> \";<\/script>\n<script>var project620minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Mr. Juan Sapriza, Dr. Davide Schiavone, Prof. David Atienza<br>\";<\/script>\n<span id=project620><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Mr. Juan Sapriza, Dr. Davide Schiavone, Prof. David Atienza<br> <a href=#_ onclick=opendesc('project620',project620); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><img width=27 src=https:\/\/eslweb.epfl.ch\/img\/1pixel.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><a href=https:\/\/www.epfl.ch\/labs\/esl\/research\/systems-on-chip\/x-heep\/ target=_blank title='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/201.png width=70 alt='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><\/a><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor596><\/a><b><span style='font-size: 20px;'>Energy-efficient HW accelerators design on FPGA for SKA Observatory<\/b><br><script>var project596=\" <p>The Square Kilometer Array Observatory (SKAO) is an ambitious  international project to develop the world's largest radio observatory.  
Its two telescope sites, in Australia and South Africa, will observe  celestial objects in a wide range of radio frequencies with  unprecedented sensitivity, allowing scientists to explore open questions  in fundamental physics, astrophysics, and cosmology. The first  construction phase is expected to be completed by 2029.<\/p> <p>When the observatory is fully operational, two dedicated supercomputing facilities will  process continuous signal streams from the telescopes at an  unprecedented scale (about 10 Tb\/s at the input), reducing them a thousandfold in  real time to produce over 700 PBytes\/year of  science-ready data products for distribution to various general-purpose  facilities across the globe. The decades-long lifespan envisioned for  this major scientific infrastructure makes sustainability a key concern,  including energy efficiency and cost containment across integration,  operations, and maintenance.<\/p> <p>The goal of this project is to design and develop novel  energy-efficient accelerators for the SKAO data processing pipelines.  The student will use a hardware-software co-design approach to design  domain-specific accelerators for FPGAs for radio-interferometry.  Concretely, the student will design HLS accelerators for at least two of  the kernels in the imaging pipeline of SKAO and evaluate them on an  FPGA. During the design space exploration of accelerator prototypes, the  trade-offs between energy, latency, resource usage, and throughput will  be analyzed. The selected hardware accelerator prototypes will be  experimentally evaluated on surrogate and real data from SKAO precursor  instruments.<\/p> <p>The project will be carried out at the ESL at EPFL, one of the  world&rsquo;s top-class universities. ESL is an active group (24 Ph.D.  students among 45 members) involved in many research aspects. The  student will be under the supervision of Mr. Rub&eacute;n Rodr&iacute;guez Alvarez,  Dr. Denisa Constantinescu, Dr. Miguel Peon Quiros, and Prof. 
David  Atienza.<\/p> <p>Throughout the project, the student will learn and practice:<br \/> &bull; Hardware design process: specification, design, test, integration, and validation<br \/> &bull; How to accelerate a real-world application with specific hardware accelerators for the radio-interferometry domain<br \/> &bull; How to present and communicate design decisions and results<br \/> &bull; Scientific inquiry-based skills by analyzing the application,  formulating a design hypothesis, testing your design, and questioning and  interpreting the results<\/p> <p>Project objectives:<br \/> [Obligatory]<br \/> - Understand and describe radio interferometry kernels<br \/> - Implement and optimize at least two kernels from the imaging gridding domain in HLS<br \/> - Evaluate and explore latency, resources, and energy, presenting at least 3 different Pareto optimal solutions for each kernel<br \/> - Analyze the tradeoffs of the presented solutions<\/p> <p>[Additional]<br \/> - Integrate the accelerators with the complete imaging pipeline in PREESM<br \/> - Evaluate quality metrics of the output of your accelerators (signal to noise ratio of the produced images, accuracy, etc.)<\/p> <p>Required knowledge and skills:<br \/> &bull; Hardware description languages such as VHDL or Verilog.<br \/> &bull; Previous experience working with Xilinx FPGAs.<br \/> &bull; C and C++ programming languages.<\/p> <p>Type of work: 30% theoretical analysis, 70% design and experiment execution.<\/p> <p>References:<br \/> - Corda, S., Veenboer, B. and Tolley, E., 2022, November. PMT: Power  Measurement Toolkit. In 2022 IEEE\/ACM International Workshop on HPC User  Support Tools (HUST) (pp. 44-47). IEEE. <a href='https:\/\/arxiv.org\/abs\/2210.03724'>https:\/\/arxiv.org\/abs\/2210.03724<\/a><br \/> - Corda, S., Veenboer, B., Awan, A.J., Romein, J.W., Jordans, R., Kumar,  A., Boonstra, A.J. and Corporaal, H., 2022. 
Reduced-precision  acceleration of radio-astronomical imaging on reconfigurable hardware.  IEEE Access, 10, pp.22819-22843.<br \/> - <a href='https:\/\/ratt-ru.github.io\/fundamentals_of_interferometry\/'>Fundamentals of Radio Interferometry - an introductory course<\/a><br \/> - <a href='https:\/\/github.com\/Xilinx\/Vitis-HLS-Introductory-Examples'>Xilinx Vitis HLS Introductory Examples<\/a><br \/> -<a href='https:\/\/www.skao.int\/'> https:\/\/www.skao.int\/<\/a><\/p><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Ruben Rodriguez Alvarez (ESL), Dr. Denisa Constantinescu (ESL), Dr. Miguel Pe\u00f3n Quir\u00f3s (EcoCloud), Prof. David Atienza (ESL)<br> \";<\/script>\n<script>var project596minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Ruben Rodriguez Alvarez (ESL), Dr. Denisa Constantinescu (ESL), Dr. Miguel Pe\u00f3n Quir\u00f3s (EcoCloud), Prof. David Atienza (ESL)<br>\";<\/script>\n<span id=project596><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Ruben Rodriguez Alvarez (ESL), Dr. Denisa Constantinescu (ESL), Dr. Miguel Pe\u00f3n Quir\u00f3s (EcoCloud), Prof. 
David Atienza (ESL)<br> <a href=#_ onclick=opendesc('project596',project596); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td>\n    <div style='position:relative;'>\n     <div style='position: absolute;top:-80px;left:-300px;'>\n       <img border=0 src=https:\/\/eslweb.epfl.ch\/projects\/images\/notavailable.gif alt='project no longer available'>\n     <\/div>\n    <\/div><a href=https:\/\/skao.int target=_blank title='Square Kilometre Array Observatory'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/202.png width=70 alt='Square Kilometre Array Observatory'><\/a><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor586><\/a><b><span style='font-size: 20px;'>Hardware accelerator design with Chisel and integration in a RISC-V full system<\/b><br><script>var project586=\"<p>The aim of this project is to design a <strong>specialized hardware accelerator <\/strong>for a real-world Machine Learning (ML) <strong>keyword spotting<\/strong> algorithm for wakeword command voice detection 
(e.g. &ldquo;Hey Siri, turn on the lights&rdquo;). In particular, you will use <strong>Chisel<\/strong>, a hardware construction language embedded in <strong>Scala <\/strong>for agile hardware design. You will have the opportunity to test your design on an FPGA and explore different optimization opportunities. Finally, you will be able to test the keyword spotting application execution by integrating your accelerator in a state-of-the-art RISC-V open-source platform: <strong>X-HEEP <\/strong>(eXtendable Heterogeneous Energy-Efficient Platform). Moreover, through this process, you will collaborate with experienced engineers and researchers from the Embedded Systems Lab (ESL) and the Scala Center.<\/p> <p>The RISC-V open-hardware initiative removed the high-cost barriers imposed by the industry when designing hardware and enabled the development of X-HEEP by ESL and EcoCloud Sustainable Computing Center. X-HEEP is a low-cost RISC-V microcontroller for running end-to-end applications and can be extended with hardware accelerators. A specialized hardware accelerator is a digital circuit that optimizes the latency, energy consumption, and hardware resources of a given application. <strong>Throughout this project, you will gain hands-on experience with open-source tools developed by the RISC-V community and used by industry and academia to design novel digital circuits for hardware accelerators.<\/strong><\/p> <p>This project targets accelerating an ML application by designing hardware accelerators with <strong>Chisel<\/strong>. Chisel is a Hardware Construction Language (HCL) proposed as an alternative to traditional Hardware Description Languages, such as Verilog and VHDL, for describing hardware designs. Chisel is a hardware library embedded in <strong>Scala: <\/strong>a high-level, general-purpose language that features a blend of functional and object-oriented programming paradigms. 
This combination enables developers to express complex ideas more concisely than many other languages, significantly reducing the amount of code required. Scala facilitates abstraction over hardware. It allows for the efficient capturing of hardware patterns using high-level language constructs, enabling the description of more hardware scenarios with less code compared to traditional languages like SystemVerilog. You can think of the combination of Scala and Chisel as a flexible templating system. Furthermore, Scala enhances circuit testing through the Chisel workbench, a high-level framework that simulates circuits. This approach makes testing a circuit as easy as evaluating a software function, marking Scala as an instrumental tool in both software and hardware development realms.<\/p> <p>This project presents an exclusive opportunity for you to work with ESL and the Scala Center to experience globally-used tools developed at EPFL. You will be working with X-HEEP, an open-hardware platform from ESL that has gained attention from both industry and academia to deploy real-world applications. You will also be working with Scala, developed by the Scala Center, which has various applications beyond the domain of digital circuits, such as data engineering and processing at scale. It is utilized in the infrastructures of popular web services like Disney Streaming and Netflix, making it a highly valuable skill in the tech industry. 
In addition, you will be able to exploit the extended potential of digital design with Chisel, supported by the combined expertise of engineers and researchers of the two groups.<\/p> <p>This project has the following <strong>learning objectives<\/strong>:<\/p> <ul><li>You will practice the hardware design process: specification, design, test, integration, and validation<\/li><li>You will learn how to accelerate a real-world application with specific hardware accelerators<\/li><li>You will be able to use the extended potential of the Chisel library embedded in Scala to efficiently describe hardware accelerators<\/li><li>You will integrate a hardware accelerator with the X-HEEP framework to simulate the full application<\/li><li>You will be able to present and communicate your design decisions and results<\/li><li>You will learn scientific inquiry-based skills by analyzing the application, formulating a design hypothesis, testing your design, and questioning and interpreting the results<\/li><\/ul> <p><strong>Prior knowledge required<\/strong><\/p> <ul><li>Previous experience (project or course) with a hardware description language such as VHDL or Verilog.<\/li><li>Previous experience (project or course) working with Xilinx FPGAs.<\/li><li>Previous experience (project or course) with the C or C++ programming languages.<\/li><li>Previous contact (project or course) with ML concepts and algorithms.<\/li><\/ul> <p><strong>Type of work:<\/strong> 25% theoretical analysis, 75% design and experiment execution.<\/p> <p><strong>References<\/strong><\/p> <ul><li>X-HEEP GitHub: <a href='https:\/\/github.com\/esl-epfl\/x-heep'>https:\/\/github.com\/esl-epfl\/x-heep<\/a><\/li><li>Chisel introduction: <a href='https:\/\/www.chisel-lang.org\/docs'>https:\/\/www.chisel-lang.org\/docs<\/a><\/li><li>Scala webpage: <a href='https:\/\/www.scala-lang.org\/'>https:\/\/www.scala-lang.org<\/a><\/li><\/ul><br><b>Lab: <\/b>ESL Scala<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Ruben 
Rodriguez Alvarez (ESL), Dr. Denisa Constantinescu (ESL), Jamie Richard Thompson (Scala Center), Anatolii Kmetiuk (Scala Center), Prof. David Atienza (ESL)<br> \";<\/script>\n<script>var project586minus=\"<b>Lab: <\/b>ESL Scala<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Ruben Rodriguez Alvarez (ESL), Dr. Denisa Constantinescu (ESL), Jamie Richard Thompson (Scala Center), Anatolii Kmetiuk (Scala Center), Prof. David Atienza (ESL)<br>\";<\/script>\n<span id=project586><b>Lab: <\/b>ESL Scala<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Ruben Rodriguez Alvarez (ESL), Dr. Denisa Constantinescu (ESL), Jamie Richard Thompson (Scala Center), Anatolii Kmetiuk (Scala Center), Prof. David Atienza (ESL)<br> <a href=#_ onclick=opendesc('project586',project586); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><a href='https:\/\/www.epfl.ch\/labs\/esl\/wp-content\/uploads\/2023\/12\/Chisel_student_project.pdf' title='weblink'><img hspace=2 width=30 border=0 src=https:\/\/eslweb.epfl.ch\/projects\/images\/weblink.gif alt='web link'><\/a><\/td><\/table><\/td><\/tr><tr><td>\n    <div style='position:relative;'>\n     <div style='position: absolute;top:-80px;left:-300px;'>\n       <img border=0 
src=https:\/\/eslweb.epfl.ch\/projects\/images\/notavailable.gif alt='project no longer available'>\n     <\/div>\n    <\/div><a href=https:\/\/scala.epfl.ch target=_blank title='Scala Center, EPFL'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/205.png width=70 alt='Scala Center, EPFL'><\/a><\/td><\/tr><tr><td colspan=2><h3>Semester Projects<br><br><\/h3><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor748><\/a><b><span style='font-size: 20px;'>Comparative Evaluation of Time-Series Embedding Methods for Seizure Classification<\/b><br><script>var project748=\"<p>Wearable sensors and mobile devices continuously generate large volumes of time-series data. These signals enable applications such as Human Activity Recognition (HAR) and accelerometer-based seizure detection. A central challenge in both domains is learning representations that capture meaningful temporal patterns while remaining robust to noise, inter-subject variability, and domain shifts.<\/p> <p>Traditional approaches like convolutional neural networks learn task-specific features, while newer self-supervised models like <strong>TS2Vec<\/strong> learn universal representations that can generalize to multiple downstream tasks. 
Additionally, handcrafted features based on domain knowledge remain competitive in many settings, especially when data is limited.<\/p> <p>This project will explore and compare three distinct embedding strategies (<strong>CNN-based end-to-end models<\/strong>, <strong>TS2Vec self-supervised embeddings<\/strong>, and <strong>manual feature extraction<\/strong>) focusing on their classification accuracy and representational quality across multiple HAR datasets and a seizure dataset.<\/p> <h2><strong>Objectives<\/strong><\/h2> <ol> <li>Implement and compare three embedding approaches:<\/li> <\/ol> <p>\u25cb  A baseline CNN trained end-to-end.<\/p> <p>\u25cb  Self-supervised TS2Vec embeddings with downstream classifiers.<\/p> <p>\u25cb  Manual feature extraction with classical machine learning models.<\/p> <ol> <li>Evaluate classification performance across multiple HAR datasets (e.g., Capture-24, WISDM, Opportunity, recgym) and a private seizure dataset.<\/li> <li>Analyse the quality of learned representations using clustering and visualization methods.<\/li> <li>Assess cross-domain generalization, particularly transfer from HAR data to seizure classification.<\/li> <li>(Optional) Investigate feasibility for deployment on resource-constrained edge devices.<\/li> <\/ol> <h2><strong>Evaluation Metrics<\/strong><\/h2> <ul> <li><strong>Classification:<\/strong> Accuracy, precision\/recall, F1-score, ROC-AUC, false positive rate<\/li> <li><strong>Representation quality:<\/strong><\/li> <\/ul> <p>\u25cb  Dimensionality reduction visualisations (e.g., t-SNE, UMAP).<\/p> <p>\u25cb  Clustering metrics (e.g., silhouette score).<\/p> <h3><strong>Grading Criteria and Task Distribution<\/strong><\/h3> <h4><u>Minimum Requirements (Grade: 4.0)<\/u><\/h4> <p>To obtain a passing grade, the student must successfully complete the following core tasks:<\/p> <ul> <li><strong>Implementation<\/strong>: Develop and validate three baseline models: Supervised CNN (end-to-end), TS2Vec 
(self-supervised), and manual feature extraction.<\/li> <li><strong>Data Processing<\/strong>: Standardize and evaluate the models on at least two HAR datasets and the provided seizure dataset.<\/li> <li><strong>Evaluation<\/strong>: Report standard classification metrics, including Accuracy, F1-score, and ROC-AUC.<\/li> <li><strong>Reporting<\/strong>: Submit a report\/presentation explaining the methodology, experimental setup, and a basic comparative analysis of results.<\/li> <\/ul> <h4><u>Optional Tasks for Higher Grades (Up to 6.0)<\/u><\/h4> <p>The final grade increases by approximately 0.5 points for each additional task completed, provided the execution meets professional standards:<\/p> <ul> <li><strong>Representational Analysis: <\/strong>Quantify embedding quality using dimensionality reduction (t-SNE\/UMAP) and clustering metrics (e.g., silhouette scores).<\/li> <li><strong>Cross-Domain Generalization: <\/strong>Evaluate transfer learning performance by applying representations learned from HAR datasets to seizure detection tasks.<\/li> <li><strong>Extended Benchmarking: <\/strong>Scale the evaluation to include the full suite of six HAR datasets (Capture-24, WEAR, HARTH, WISDM, Opportunity, and Recgym).<\/li> <li><strong>Edge Deployment Study: <\/strong>Profile computational latency and memory usage, and implement a C-based inference prototype for resource-constrained hardware.<\/li> <\/ul> <p>&nbsp;<\/p> <h2><strong>Requirements<\/strong><\/h2> <ul> <li>Basic understanding of signal processing and time-series analysis.<\/li> <li>Good programming skills in Python.<\/li> <li>Experience with machine learning tools (PyTorch, scikit-learn).<\/li> <li>(Optional) C programming and firmware development<\/li> <\/ul> <p><strong>Type of Work<\/strong><\/p> <ul> <li><strong>33%<\/strong>: Literature review, methodological design, representation analysis, and interpretation of results.<\/li> <li><strong>67%<\/strong>: Implementation of models and experimental evaluation 
in Python<\/li> <\/ul> <p><strong>Sources<\/strong><\/p> <ul> <li><a href='https:\/\/github.com\/zhihanyue\/ts2vec'>Ts2vec repo<\/a><\/li> <li><a href='https:\/\/github.com\/ElsevierSoftwareX\/SOFTX%5F2020%5F1?tab=readme-ov-file#time-series-feature-extraction-library'>Time Series Feature Extraction Library<\/a><\/li> <li>Human activity recognition datasets:<br \/>&ndash; capture24: <a href='https:\/\/ora.ox.ac.uk\/objects\/uuid:99d7c092-d865-4a19-b096-cc16440cd001'>https:\/\/ora.ox.ac.uk\/objects\/uuid:99d7c092-d865-4a19-b096-cc16440cd001<br \/><\/a> &ndash; WEAR: <a href='https:\/\/github.com\/drhashimali\/wear'>https:\/\/github.com\/drhashimali\/wear<br \/><\/a> &ndash; HARTH: <a href='https:\/\/archive.ics.uci.edu\/dataset\/779\/harth'>https:\/\/archive.ics.uci.edu\/dataset\/779\/harth<br \/><\/a> &ndash; WISDM: <a href='https:\/\/www.cis.fordham.edu\/wisdm\/dataset.php'>https:\/\/www.cis.fordham.edu\/wisdm\/dataset.php<br \/><\/a> &ndash; Opportunity: <a href='https:\/\/archive.ics.uci.edu\/dataset\/226\/opportunity+activity+recognition'>https:\/\/archive.ics.uci.edu\/dataset\/226\/opportunity+activity+recognition<br \/><\/a> &ndash; Recgym: <a href='https:\/\/zhaxidele.github.io\/RecGym\/'>https:\/\/zhaxidele.github.io\/RecGym\/<\/a><\/li> <\/ul> \t<br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dimitra Tatlin, Dr. Lara Orlandic, Dr. Jonathan Dan, Prof. David Atienza<br> Contact email: <a href='mailto:lara.orlandic@epfl.ch;jonathan.dan@epfl.ch;david.atienza@epfl.ch;dimitra.tatli@epfl.ch?subject=Comparative Evaluation of Time-Series Embedding Methods for Seizure Classification'>lara.orlandic@epfl.ch;jonathan.dan@epfl.ch;david.atienza@epfl.ch;dimitra.tatli@epfl.ch<\/a><br>\";<\/script>\n<script>var project748minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dimitra Tatlin, Dr. Lara Orlandic, Dr. Jonathan Dan, Prof. 
David Atienza<br>\";<\/script>\n<span id=project748><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dimitra Tatlin, Dr. Lara Orlandic, Dr. Jonathan Dan, Prof. David Atienza<br> <a href=#_ onclick=opendesc('project748',project748); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor746><\/a><b><span style='font-size: 20px;'>Precision-Scalable Systolic Arrays for High-Efficiency LLM Inference Acceleration<\/b><br><script>var project746=\"<p>This project focuses on the design and implementation of a scalable-bitwidth systolic array, a specialized hardware accelerator aimed at improving the computational efficiency of modern artificial intelligence workloads. 
Its primary objective is to develop an architecture that can dynamically support varying numerical precisions through a template-based design, ranging from low-bit quantization (e.g., INT2\/INT3\/INT4) to higher-precision formats such as FP16.<\/p>    <p>A systolic array consists of a network of tightly coupled Processing Elements (PEs), through which data flows in a structured manner. Each PE performs a small portion of the overall computation&mdash;typically multiply-accumulate (MAC) operations&mdash;and forwards intermediate results to neighboring elements. Systolic arrays enable massive parallelism and high data reuse. By significantly reducing accesses to off-chip memory, they achieve very high energy efficiency for matrix multiplication workloads, which form the computational backbone of deep learning and large language models.<\/p>    <p>Despite their efficiency, state-of-the-art systolic arrays are typically limited to a single, fixed bitwidth. Most existing designs target a specific numerical format, such as INT8 or FP16, which makes them less adaptable to the diverse precision requirements of modern workloads. As a result, these architectures struggle to efficiently support varying numerical characteristics, limiting their flexibility and broader applicability.<\/p>    <p>To address this limitation, this project proposes a reconfigurable, scalable-bitwidth systolic array. To this end, the student will be tasked with the development of a template-based PE architecture that enables the generation of practical systolic arrays supporting varying bitwidths. Multiple PEs can be dynamically grouped to form scalable matrix-multiplication accelerators with configurable precision. 
This approach aims to bridge the gap between flexibility and efficiency, enabling hardware accelerators that better match the diverse precision demands of modern AI models.<\/p>    <p><strong>Tasks description<\/strong><\/p>    <ol class='wp-block-list'> <li>Understand the architecture of the TiC-SAT systolic array under development at ESL_EPFL, including its PE design and interconnects.<\/li>    <li>Implement a PE design template that supports various bitwidth configurations.<\/li>    <li>Generate and test heterogeneous systolic array hardware designs using the scalable-bitwidth PEs to enable flexible precision support.<\/li> <\/ol>    <p><strong>Project objectives<br \/><\/strong>The fulfillment of the following objectives is required for a passing grade (4.0)<\/p>    <ul class='wp-block-list'> <li>Extend the TiC-SAT PE design to support multiple bitwidths, including a testbench and test suite to validate the design.<\/li>    <li>Create a template-based generator of the systolic array. The template must enable the generation of instances of the systolic array from configuration parameters, specifying the array size, supported bitwidths and the arrangement of PEs.<\/li>    <li>Characterize the runtime latency and energy efficiency of the modified systolic array for at least four different bitwidth configurations.<\/li> <\/ul>    <p>The completion of each of the following tasks will add 1.0 extra point to the project grade<\/p>    <ul class='wp-block-list'> <li>Build a functional simulator for the systolic array in Python.<\/li>    <li>Emulate the system on an FPGA.<\/li> <\/ul>    <p><strong>Required knowledge and skills<\/strong><\/p>    <ul class='wp-block-list'> <li>Proficiency in RTL design and programming (VHDL or Verilog).<\/li>    <li>Basic understanding of computer architecture.<\/li>    <li>Strong analytical thinking and scientific curiosity.<\/li> <\/ul>    <p><strong>References<\/strong><\/p>    <p>[1] A. Amirshahi, J. Klein, G. Ansaloni, D. 
Atienza, &ldquo;TiC-SAT: Tightly-coupled Systolic Accelerator for Transformers&rdquo;,&nbsp; 28th Asia and South Pacific Design Automation Conference (ASP-DAC &rsquo;23), Tokyo, Japan, doi: 10.1145\/3566097.3567867<\/p>    <p>[2] S. Machetti, P. D. Schiavone, G. Ansaloni, M. Pe&oacute;n-Quir&oacute;s and D. Atienza, &ldquo;X-HEEP: An Open-Source, Configurable and Extendible RISC-V Platform for TinyAI Applications,&rdquo;&nbsp;<em>2025 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)<\/em>, Kalamata, Greece, 2025, pp. 1-6, doi: 10.1109\/ISVLSI65124.2025.11130281.<\/p>    <p><strong>Type of work<\/strong><\/p>    <ul class='wp-block-list'> <li>80% <strong>HW design<\/strong>: development and implementation of the scalable-bitwidth systolic array hardware.<\/li>    <li>20% <strong>Performance Evaluation<\/strong>: Benchmarking and analysis of machine learning workloads on the designed hardware.<\/li><\/ul><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Rub\u00e9n Rodr\u00edguez \u00c1lvarez, Yuxuan Wang, Dr. Giovanni Ansaloni, Prof. David Atienza<br> Contact email: <a href='mailto:ruben.rodriguezalvarez@epfl.ch; yuxuan.wang@epfl.ch; giovanni.ansaloni@epfl.ch; david.atienza@epfl.ch?subject=Precision-Scalable Systolic Arrays for High-Efficiency LLM Inference Acceleration'>ruben.rodriguezalvarez@epfl.ch; yuxuan.wang@epfl.ch; giovanni.ansaloni@epfl.ch; david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project746minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Rub\u00e9n Rodr\u00edguez \u00c1lvarez, Yuxuan Wang, Dr. Giovanni Ansaloni, Prof. David Atienza<br>\";<\/script>\n<span id=project746><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Rub\u00e9n Rodr\u00edguez \u00c1lvarez, Yuxuan Wang, Dr. Giovanni Ansaloni, Prof. 
David Atienza<br> <a href=#_ onclick=opendesc('project746',project746); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor743><\/a><b><span style='font-size: 20px;'>Validation of a kinematic approach to extract a mobility score for physical monitoring<\/b><br><script>var project743=\"<p>The study aims to assess kinematic information obtained during dedicated exercises in order to derive a mobility score, using physiotherapist-provided labels as a reference. It is part of a broader project focused on developing tools to quantify and monitor physical improvement during adherence to targeted exercise programs.<\/p><p>This study builds on a previous semester project, which evaluated a set of selected movements and established a data acquisition framework to extract relevant kinematic features for mobility assessment. 
The current project will focus specifically on the squat movement, with the objective of providing a more fine-grained assessment of exercise performance as an indicator of mobility.<\/p><p>The study will be structured into three main phases:<\/p><ol><li>Refinement of the scoring system and its validation using previously acquired data,<\/li><li>Update of the data acquisition protocol and execution of an experimental study with 30 volunteers, using the OptiTrack system as a reference standard, and<\/li><li>Development of machine learning\u2013based models to derive a mobility score from wearable sensor data.<\/li><\/ol><p>The kinematic measurement system will consist of nine Inertial Measurement Units (IMUs) from the VersaSens platform, positioned on the limb segments and torso. VersaSens [1], developed at the Embedded Systems Laboratory (ESL), is a modular, multimodal, extendable, and reconfigurable Edge AI platform. It enables the integration of add-on modules alongside its array of sensing and processing modules, providing a flexible foundation for diverse applications.<\/p><p>In this study, synchronously collected IMU data will be used to reconstruct 3D body motion and extract relevant kinematic features. Pressure sensors will also be incorporated to assess foot contact dynamics. Finally, motion capture data from the OptiTrack system will serve as the ground truth reference for 3D motion analysis.<\/p><p><strong>Mandatory tasks<\/strong><\/p><p>Completion of <strong>all<\/strong> these tasks is required to pass the exam and obtain a grade of <strong>4<\/strong>. 
Failure to complete any of these tasks will result in <strong>no pass<\/strong>:<\/p><ol><li>Perform a series of tests with VersaSens to define and validate the calibration protocol for sensor orientation.<\/li><li>Evaluate correlations between previously acquired IMU data orientations and the new scoring system applied to the squat exercise.<\/li><li>Based on the outcomes of the previous tasks, finalize and test the data acquisition protocol.<\/li><li>Conduct a synchronous data acquisition experiment using all the modalities, involving volunteers (target: 30 participants), according to the developed protocol, following HREC authorization.<\/li><li>Perform a sanity check of the data acquired by all the modalities and correct any errors or missing signals. Have the physiotherapist score the videos.<\/li><li>Design and then train machine-learning models with the acquired data to estimate mobility scores. The models should use as input:<ol><li>Filtered IMU signals<\/li><li>Features extracted from IMU data<\/li><li>OptiTrack data<\/li><\/ol><\/li><li>Develop a criteria-based algorithm to estimate the mobility score from the signals.<\/li><li>Compare the results and report the metrics for all the different models.<\/li><li>Deliver a complete documentation package to be uploaded to the VersaSens GitHub repository, including all datasets, software, and firmware.<\/li><\/ol><p><strong>Optional tasks<\/strong><\/p><p>Once all mandatory tasks have been completed and a grade of <strong>4<\/strong> has been obtained, each optional task completed will contribute an additional <strong>1 <\/strong>point to the final grade, up to a maximum grade of <strong>6<\/strong>:<\/p><ol><li>Perform a comparative analysis between OptiTrack and IMU data, including the calculation of deviations between the two modalities. 
Additionally, compare the mobility scores predicted by each model.<\/li><li>Perform a study on the reduction of the number of IMU sensors in the models and estimate the impact on the accuracy. Establish the optimal locations and number of sensors necessary for acceptable accuracy.<\/li><\/ol><p><strong>Type of work<\/strong><\/p><ul><li>20% data analysis<\/li><li>30% Machine-learning models development<\/li><li>40% Experimental acquisition to mimic real-life application.<\/li><li>5% Communication with physiotherapists and clinical experts.<\/li><li>5% Preparation and delivery of the complete documentation package.<\/li><\/ul><p><strong>Desired skills:<\/strong><\/p><ul><li>Strong analytical skills<\/li><li>Experience in data acquisition<\/li><li>Strong background in signal processing and machine learning<\/li><li>Teamwork and git<\/li><\/ul><p><strong>Appreciated skills:<\/strong><\/p><ul><li>Scientific curiosity<\/li><li>Good communication skills<\/li><li>Advanced English<\/li><\/ul><p><strong>References:<\/strong><\/p><p>[1] Najafi, Taraneh Aminosharieh, et al. 'VersaSens: An Extendable Multimodal Platform for Next-Generation Edge-AI Wearables.' IEEE Transactions on Circuits and Systems for Artificial Intelligence (2024).<\/p><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. J\u00e9r\u00f4me Thevenot, Dr. Taraneh Aminosharieh Najafi, Prof. David Atienza<br> Contact email: <a href='mailto:jerome.thevenot@epfl.ch; taraneh.aminoshariehnajafi@epfl.ch; david.atienza@epfl.ch?subject=Validation of a kinematic approach to extract a mobility score for physical monitoring'>jerome.thevenot@epfl.ch; taraneh.aminoshariehnajafi@epfl.ch; david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project743minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. J\u00e9r\u00f4me Thevenot, Dr. Taraneh Aminosharieh Najafi, Prof. 
David Atienza<br>\";<\/script>\n<span id=project743><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. J\u00e9r\u00f4me Thevenot, Dr. Taraneh Aminosharieh Najafi, Prof. David Atienza<br> <a href=#_ onclick=opendesc('project743',project743); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor741><\/a><b><span style='font-size: 20px;'>Development of a TFT display interface for VersaSens<\/b><br><script>var project741=\"<p>The project aims to develop a user interface to be implemented within the VersaSens sensing platform. VersaSens, developed at the Embedded Systems Laboratory (ESL), is a modular, multimodal, extendable, and reconfigurable Edge AI platform. 
It enables the integration of add-on modules alongside its array of sensing and processing modules, providing a flexible foundation for diverse applications.<\/p> <p>The main aim of this project is to develop a PCB module with an embedded touchscreen TFT display compatible with the VersaSens main board. A secondary aim is to develop firmware to visualize VersaSens-collected signals in real time. The touchscreen capabilities of the interface should be used to enhance the platform's user-friendliness.<\/p> <p>This project is linked to an ongoing research focus in ESL, targeting the development of a multimodal sensing platform with applications in the medical field. Thus, some aspects of this project will benefit from previous developments within the laboratory (firmware, algorithms for physiological data processing with embedded deployment), and it is expected that this semester's project will further improve and adapt the current solution.<\/p> <p><strong>Mandatory tasks<\/strong><\/p> <p>Completion of <strong>all<\/strong> these tasks is required to pass the exam and obtain a grade of <strong>4<\/strong>. Failure to complete any of these tasks will result in <strong>no pass<\/strong>:<\/p> <ol><li>Get familiar with the VersaSens sensing platform. Conduct a comprehensive review of commercial TFT displays, including their specifications, and assess their compatibility with the VersaSens platform. Select a display according to VersaSens requirements and provide a justification.<\/li><li>Design the PCB for the TFT display so that it is compatible with the VersaSens platform and fulfills the mechanical requirements of our casings. Prepare the manufacturing files and send them for production.<\/li><li>Develop the firmware for the communication between the microcontroller and the TFT display and integrate it into the VersaSens firmware. 
Validate the firmware by plotting real-time signals acquired from one of the sensors on screen (ECG, EEG, PPG, EMG, SKT, Sound, Bio-Z).<\/li><li>Further develop the firmware to activate the touchscreen capabilities of the TFT display.<\/li><li>Develop a user interface with a menu to access the plotting of all different sensors in real time (ECG, EEG, PPG, EMG, SKT, Sound, Bio-Z). The battery level should also be shown.<\/li><li>Measure the power consumption of the touchscreen and adapt the visualization for optimized consumption while keeping a user-friendly interface. Here are some potential parameters to assess: backlight brightness, refresh rate, and data processing for plotting...<\/li><li>Deliver a&nbsp;complete documentation package to be uploaded to the VersaSens GitHub repository, including all datasets, software, and firmware.<\/li><\/ol> <p><strong>Optional tasks<\/strong><\/p> <p>Once all mandatory tasks have been completed and a grade of <strong>4<\/strong> has been obtained, each optional task completed will contribute an additional 1 point to the final grade, up to a maximum grade of <strong>6<\/strong>:<\/p> <ol><li>Create a touchscreen keyboard to write a text to be saved on the SD card.<\/li><li>Create a menu to access and modify settings and parameters of the sensors in the firmware.<\/li><\/ol> <p><strong>Type of work<\/strong><\/p> <ul><li>40% firmware development<\/li><li>30% PCB development<\/li><li>25% Performance evaluation through systematic testing of the platform.<\/li><li>5% Preparation and delivery of the complete documentation package.<\/li><\/ul> <p><strong>Desired skills:<\/strong><\/p> <ul><li>Strong analytical skills<\/li><li>Experience in embedded systems development<\/li><li>Teamwork and git<\/li><\/ul> <p><strong>Appreciated skills:<\/strong><\/p> <ul><li>Scientific curiosity<\/li><li>Good communication skills<\/li><li>Advanced English<\/li><\/ul><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. 
J\u00e9r\u00f4me Thevenot, Dr. Taraneh Aminosharieh Najafi, Doctoral candidate Wensi Zhang, Prof. David Atienza<br> Contact email: <a href='mailto:jerome.thevenot@epfl.ch;taraneh.aminoshariehnajafi@epfl.ch;wensi.zhang@epfl.ch;david.atienza@epfl.ch?subject=Development of a TFT display interface for VersaSens'>jerome.thevenot@epfl.ch;taraneh.aminoshariehnajafi@epfl.ch;wensi.zhang@epfl.ch;david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project741minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. J\u00e9r\u00f4me Thevenot, Dr. Taraneh Aminosharieh Najafi, Doctoral candidate Wensi Zhang, Prof. David Atienza<br>\";<\/script>\n<span id=project741><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. J\u00e9r\u00f4me Thevenot, Dr. Taraneh Aminosharieh Najafi, Doctoral candidate Wensi Zhang, Prof. David Atienza<br> <a href=#_ onclick=opendesc('project741',project741); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' 
name=anchor740><\/a><b><span style='font-size: 20px;'>Automatic indoor geolocation of acoustic events using the VersaSens platform<\/b><br><script>var project740=\"<p>The project aims to develop a system capable of autonomously detecting and locating acoustic events indoors. The vision is to develop an acoustic system based on the VersaSens platform, capable of robustly locating acoustic events within a given spatial environment. VersaSens, developed at the Embedded Systems Laboratory (ESL), is a modular, multimodal, extendable, and reconfigurable Edge AI platform. It enables the integration of add-on modules alongside its array of sensing and processing modules, providing a flexible foundation for diverse applications.<\/p><p>The objective of this project is to ensure the resilience and robustness of acoustic event localization with an optimized number of sensors in a given room dimension.<\/p><p>This project is part of an ongoing research focus in ESL, targeting the detection of cough events using the VersaSens platform. Thus, some aspects of this project will benefit from previous developments within the laboratory (algorithms for cough detection with embedded deployment), and it is expected that this semester's project will further improve and adapt the current solution. The ultimate goal of this research is to detect cough associated with tuberculosis in waiting rooms, thereby supporting patient triage.<br><br><img class='alignnone size-full wp-image-4170' src='https:\/\/www.epfl.ch\/labs\/esl\/wp-content\/uploads\/2026\/01\/jerome.png' alt='' width='597' height='442' data-mce-src='https:\/\/www.epfl.ch\/labs\/esl\/wp-content\/uploads\/2026\/01\/jerome.png'><br><\/p><p><strong>Mandatory tasks<\/strong><\/p><p>Completion of <strong>all<\/strong> these tasks is required to pass the exam and obtain a grade of <strong>4<\/strong>. 
Failure to complete any of these tasks will result in <strong>no pass<\/strong>:<\/p><ol><li>Conduct a comprehensive literature review of sound geolocation techniques and get familiar with the VersaSens platform and the microphone characteristics.<\/li><li>Develop the VersaSens firmware to control the acoustic sensors in stereo mode and process the data (storing and Bluetooth transfer).<\/li><li>Develop a computer interface with a visual representation of rooms from given dimensions and VersaSens devices' location.<\/li><li>Develop communication between the multiple VersaSens devices and the computer interface through Bluetooth to collect acoustic signals.<\/li><li>Process and coregister the acoustic data to triangulate the sound origin, and visualize the sound origin in real time in the computer interface.<\/li><li>Develop some preliminary filtering on the VersaSens device to transmit only the minimal information necessary for the geolocation of acoustic events.<\/li><li>Conduct a validation experiment simulating a real-life cough application. 
Establish the geolocation accuracy.<\/li><li>Deliver a&nbsp;complete documentation package to be uploaded to the VersaSens GitHub repository, including all datasets, software, and firmware.<\/li><\/ol><p><strong>Optional tasks<\/strong><\/p><p>Once all mandatory tasks have been completed and a grade of <strong>4<\/strong> has been obtained, each optional task completed will contribute an additional <strong>0.5<\/strong> points to the final grade, up to a maximum grade of <strong>6<\/strong>:<\/p><ol><li>The interface should allow the user to create \u201ccomplex\u201d room dimensions, and the data analysis should consider \u201cwalls\u201d and \u201ccorners\u201d to geolocalize the sound.<\/li><li>Develop an algorithm to automatically suggest the optimal positioning of the VersaSens devices in a room based on their number.<\/li><li>Develop a calibration method to synchronize the clock of the microcontroller among multiple VersaSens devices.<\/li><li>Use the information from the clock to help with the geolocation by considering the speed of sound.<\/li><\/ol><p><strong>Type of work<\/strong><\/p><ul><li>30% firmware development<\/li><li>30% software development<\/li><li>20% Experimental acquisition in different conditions to mimic real-life application.<\/li><li>10% Performance evaluation through systematic testing of the platform.<\/li><li>10% Preparation and delivery of the complete documentation package.<\/li><\/ul><p><strong>Desired skills:<\/strong><\/p><ul><li>Strong analytical skills<\/li><li>Experience in signal processing<\/li><li>Experience in embedded systems development<\/li><li>Teamwork and git<\/li><\/ul><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. J\u00e9r\u00f4me Thevenot, Doctoral candidate Wensi Zhang, Prof. 
David Atienza<br> Contact email: <a href='mailto:jerome.thevenot@epfl.ch; wensi.zhang@epfl.ch; david.atienza@epfl.ch?subject=Automatic indoor geolocation of acoustic events using the VersaSens platform'>jerome.thevenot@epfl.ch; wensi.zhang@epfl.ch; david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project740minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. J\u00e9r\u00f4me Thevenot, Doctoral candidate Wensi Zhang, Prof. David Atienza<br>\";<\/script>\n<span id=project740><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. J\u00e9r\u00f4me Thevenot, Doctoral candidate Wensi Zhang, Prof. David Atienza<br> <a href=#_ onclick=opendesc('project740',project740); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor739><\/a><b><span style='font-size: 20px;'>Efficient SPI: design, verification, and integration of a FLASH-Controller that uses the DMA and QuadSPI routines for efficient off-chip 
communications<\/b><br><script>var project739=\"Microcontrollers (MCUs) are used in a wide range of applications, ranging from sensor monitoring all the way to robotics and automotive. Despite typically being lower in performance, they are usually preferred over custom circuits and FPGAs due to their versatility and easy programmability via software routines typically written in the C language. <br><br>Thanks to these characteristics, MCUs are typically chosen as edge computing platforms. Due to cost (area) and power constraints, the on-chip memory, typically implemented with SRAM technology, is limited (usually below 1MB), thus limiting the software and data that can be processed. That\u2019s why MCUs are often extended with off-chip memories, typically with FLASH memories.<br>FLASH are non-volatile and usually connected to the MCU via serial-peripheral interfaces (SPIs). The SPI protocol is typically very slow to deal with recurrent read\/write operations to FLASH due to its serial nature. For example, in typical 32b MCUs, you need at least 32 SPI-cycles to read an instruction from FLASH.<\/p><p>X-HEEP (eXtendable Heterogeneous Energy-Efficient Platform) is an open-source, configurable, and extensible single-core RISC-V 32b MCU, sponsored by the EcoCloud Sustainable Computing center of EPFL. It is based on many third-party open-source IPs and in-house IPs developed at the Embedded Systems Laboratory (ESL) jointly with other EPFL laboratories.<\/p><p>X-HEEP provides a framework to run applications compiled for RISC-V on a simulator (Verilator, Questasim, or VCS), on a Xilinx FPGA, and can be implemented in silicon as well. <br><br>X-HEEP uses an off-chip FLASH memory connected via SPI. 
Accessing data in the Flash is done by means of SW functions that send commands to the FLASH via SPI (read or write, quad or single mode, etc.) and program the X-HEEP DMA to move the data received over SPI to the memory (or vice versa).<br>These functions result in quite long and inefficient code, making the reading and writing to FLASH slow. To speed up these operations, a hardware FSM (Flash Controller) has been designed to handle the SPI and DMA peripherals without the need for the CPU and its software functions. However, this controller supports only single-mode SPI.<br><br>For this reason, the students applying to this project will extend the Flash Controller so that it supports quad SPI. <br><br>In addition, to further simplify the Flash software runtime, a bridge that translates memory-mapped read and write operations (e.g., CPU loads and stores) into commands to the Flash Controller can be implemented.<\/p><p>The outcome of the thesis will be published open-source in the X-HEEP repository: <a href='https:\/\/github.com\/x-heep\/x-heep'>https:\/\/github.com\/x-heep\/x-heep<\/a><\/p><ol><li aria-level='1'><b>[Read Quad SPI support for the Flash Controller] <\/b>Design\/Extension of the Flash Controller to read the Flash using the DMA and SPI peripherals in Quad-mode. 
The starting points are the existing Flash Controller and the Quad-mode SPI software functions.<ol><li aria-level='2'>Writing the Verilog states to extend the Flash Controller so that it can read in Quad SPI mode<\/li><li aria-level='2'>Software runtime to use the Flash Controller in Single or Quad mode to read<\/li><li aria-level='2'>C applications to test in Simulation (using Questasim) the Quad-mode read operations<\/li><\/ol><\/li><li aria-level='1'><b>[Write Quad SPI support for the Flash Controller] <\/b>Design\/Extension of the Flash Controller to write the Flash using the DMA and SPI peripherals in Quad-mode. The starting points are the existing Flash Controller and the Quad-mode SPI software functions.<ol><li aria-level='2'>Writing the Verilog states to extend the Flash Controller&nbsp; so that it can write in Quad SPI mode<\/li><li aria-level='2'>Software runtime to use the Flash Controller in Single or Quad mode to write<\/li><li aria-level='2'>C applications to test in Simulation (using Questasim) the Quad-mode write operations<\/li><\/ol><\/li><li aria-level='1'><b>[Memory Map support for the Flash Controller] <\/b>Design a bridge that maps memory operations (such as load and stores from the CPU or read and write from the DMA) to the Flash Controller. 
This will allow treating the external Flash as an external memory instead of explicitly using it as a peripheral, simplifying the code and making the applications more flexible.<ol><li aria-level='2'>Writing the Verilog modules required to bridge the bus operations (reads and writes) to commands to the previously designed Flash Controller<\/li><li aria-level='2'>Writing C applications that read\/write the Flash by means of memory-mapped operations rather than peripheral configurations.<\/li><\/ol><\/li><\/ol><p>Throughout the project, the student will learn:<\/p><ul><li aria-level='1'>How the FLASH memories are connected to MCUs and how to communicate with them via the SPI and QSPI protocols.<\/li><li aria-level='1'>How to write an FSM to accelerate existing SW functions.<\/li><li aria-level='1'>How to extend existing HW in Verilog\/SystemVerilog and Python to support more functionalities.<\/li><li aria-level='1'>How to verify the developed software and hardware.<\/li><li aria-level='1'>How to work with git repositories and in a team of people, all contributing to the same open-source project.<\/li><\/ul><p>The project will be carried out at the ESL at EPFL, one of the world's top-class universities, with EcoCloud\u2019s technical support. ESL is an active group (24 Ph.D. students among 45 members) involved in many research aspects. The student will be under the supervision of Mr. Tommaso Terzano, Ms. Anna Burdina, Dr. Davide Schiavone, and Prof. 
David Atienza.<\/p><p><b>Required knowledge and skills:<\/b><\/p><ul><li aria-level='1'>RTL design in any HDL (SystemVerilog is going to be used throughout the project)<\/li><li aria-level='1'>Low-level software design (C and\/or C++ is going to be used throughout the project)<\/li><li aria-level='1'>FPGA design, synthesis, and verification (the Pynq FPGA will be used throughout the project)<\/li><li aria-level='1'>RTL simulation with QuestaSim and Verilator<\/li><li aria-level='1'>Good understanding of microcontrollers<\/li><li aria-level='1'>Good analytical skills<\/li><li aria-level='1'>Good background in computer architecture and algorithms<\/li><li aria-level='1'>Teamwork and git<\/li><\/ul><p><b>Appreciated skills:<\/b><\/p><ul><li aria-level='1'>Scientific curiosity<\/li><li aria-level='1'>Good communication skills<\/li><li aria-level='1'>Advanced English&nbsp;<\/li><\/ul><p><b>Type of work:<\/b> 10% theory analysis, 90% design and simulation&nbsp;<\/p><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Davide Schiavone, Tommaso Terzano, Anna Burdina, Prof. David Atienza<br> Contact email: <a href='mailto:davide.schiavone@epfl.ch; david.atienza@epfl.ch?subject=Efficient SPI: design, verification, and integration of a FLASH-Controller that uses the DMA and QuadSPI routines for efficient off-chip communications'>davide.schiavone@epfl.ch; david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project739minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Davide Schiavone, Tommaso Terzano, Anna Burdina, Prof. David Atienza<br>\";<\/script>\n<span id=project739><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Davide Schiavone, Tommaso Terzano, Anna Burdina, Prof. 
David Atienza<br> <a href=#_ onclick=opendesc('project739',project739); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><img width=27 src=https:\/\/eslweb.epfl.ch\/img\/1pixel.gif><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><a href=https:\/\/www.epfl.ch\/labs\/esl\/research\/systems-on-chip\/x-heep\/ target=_blank title='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/201.png width=70 alt='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><\/a><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor738><\/a><b><span style='font-size: 20px;'>Efficient SPI: verification of a FLASH-Controller on FPGA and optimization of read\/write FLASH operations using caches<\/b><br><script>var project738=\"<p>Microcontrollers (MCUs) are used in a wide range of applications, ranging from sensor monitoring all the way to robotics and automotive. Despite typically being lower in performance, they are usually preferred over custom circuits and FPGAs due to their versatility and easy programmability via software routines typically written in the C language. 
<\/p><p>Thanks to these characteristics, MCUs are typically chosen as edge computing platforms. Due to cost (area) and power constraints, the on-chip memory, typically implemented with SRAM technology, is limited (usually below 1MB), thus limiting the software and data that can be processed. That\u2019s why MCUs are often extended with off-chip memories, typically with FLASH memories.<br>FLASH are non-volatile and usually connected to the MCU via serial-peripheral interfaces (SPIs). The SPI protocol is typically very slow to deal with recurrent read\/write operations to FLASH due to its serial nature. For example, in typical 32b MCUs, you need at least 32 SPI-cycles to read an instruction from FLASH.<\/p><p>X-HEEP (eXtendable Heterogeneous Energy-Efficient Platform) is an open-source, configurable, and extensible single-core RISC-V 32b MCU, sponsored by the EcoCloud Sustainable Computing center of EPFL. It is based on many third-party open-source IPs and in-house IPs developed at the Embedded Systems Laboratory (ESL) jointly with other EPFL laboratories.<\/p><p>X-HEEP provides a framework to run applications compiled for RISC-V on a simulator (Verilator, Questasim, or VCS), on a Xilinx FPGA, and can be implemented in silicon as well. <br><br>X-HEEP uses an off-chip FLASH memory connected via SPI. Accessing data to flash can be done by means of SW functions to send commands to the FLASH via SPI (read or write, quad or single mode, etc.), and program the X-HEEP DMA to move data from the SPI received data to the memory (or vice versa).<br><br>These functions result in quite long and inefficient code, making the reading and writing to FLASH slow. To speed up these operations, a hardware FSM (Flash Controller) has been designed to handle the SPI and DMA peripherals without the need for the CPU and its software functions. 
However, such a controller has been tested only in simulation, and it has never been tested on an FPGA.<br><br>For this reason, the students applying to this project will verify the functionalities of the Flash Controller with extended verification functions on an FPGA. <br>In addition, to optimize the Flash read and write operations, a 1-line Cache that holds the last valid 1 to 4 sectors will be implemented to avoid expensive off-chip communication when not needed.<\/p><p>The outcome of the thesis will be published open-source in the X-HEEP repository: <a href=https:\/\/github.com\/x-heep\/x-heep>https:\/\/github.com\/x-heep\/x-heep<\/a><\/p><ol><li aria-level='1'><b>[VERIFICATION and FPGA Validation of Read operations]<\/b> Testing the Flash Controller on FPGA. This means verifying that the Flash Controller integrated in X-HEEP synthesizes correctly on the FPGA and behaves as expected. New applications that challenge the Flash Controller functionality need to be written in C to test its robustness and unveil bugs. If bugs are found (in Software or Hardware), the student will need to fix them.<ol><li aria-level='2'>Synthesize and run on FPGA X-HEEP with the Flash Controller<\/li><li aria-level='2'>Run the existing reading tests and check if they run correctly. If not, check with the Integrated Logic Analyzer (ILA) of the FPGA and the external logic analyzer why it does not work, and provide an extensive description.<\/li><li aria-level='2'>Write new C tests to check the robustness of the Flash Controller. If bugs are found, the student has to fix them. If limitations of the Controller are found, the student has to report and document them.<\/li><\/ol><\/li><li aria-level='1'><b>[VERIFICATION and FPGA Validation of Write operations]<\/b> Testing the Flash Controller on FPGA. New applications that challenge the Flash Controller functionality need to be written in C to test its robustness and unveil bugs. 
If bugs are found (in Software or Hardware), the student will need to fix them.<ol><li aria-level='2'>Run the existing write tests and check if they run correctly. If they do not, use the Integrated Logic Analyzer (ILA) of the FPGA and the external logic analyzer to determine why, and provide an extensive description. Particular attention must be paid to the ERASE functionality of the Flash.<\/li><li aria-level='2'>Write new C tests to check the robustness of the Flash Controller. If bugs are found, the student has to fix them. If limitations of the Controller are found, the student has to report and document them.<\/li><\/ol><\/li><li aria-level='1'><b>[Cache support for the Flash Controller] <\/b>Design a cache with at least one line and up to four lines, each line holding from 1 to 4 sectors, to speed up the interaction between X-HEEP and the external Flash. This requires the Flash Controller to first check, by comparing the address, whether the Flash data to be read or written is present in the cache. If that is the case, the Flash read\/write operation is skipped. Otherwise, on a cache miss, the Flash Controller must read the missing sectors. 
The cache must also support the Write-Back policy that triggers a write to external Flash memory only upon a Cache Miss that requires a Line Replacement (Eviction).<\/li><\/ol><p>Throughout the project, the student will learn:<\/p><ul><li aria-level='1'>How the FLASH memories are connected to MCUs and how to communicate with them via&nbsp; SPI and QSPI protocols.<\/li><li aria-level='1'>How to write a Cache in Verilog\/SystemVerilog to accelerate existing SW functions.<\/li><li aria-level='1'>How to extend existing HW in Verilog\/SystemVerilog and Python to support more functionalities.<\/li><li aria-level='1'>How to verify the developed software and hardware.<\/li><li aria-level='1'>How to synthesize, run, and debug an FPGA.<\/li><li aria-level='1'>How to work with git repositories and in a team of people, all contributing to the same open-source project.<\/li><\/ul><p>The project will be carried out at the ESL at EPFL, one of the world's top-class universities, including EcoCloud\u2019s technical support. ESL is an active group (24 Ph.D. students among 45 members) involved in many research aspects. The student will be under the supervision of Mr. Tommaso Terzano, Ms. Anna Burdina, Dr. Davide Schiavone, and Prof. 
David Atienza.<\/p><p><b>Required knowledge and skills:<\/b><\/p><ul><li aria-level='1'>RTL design in any HDL (SystemVerilog is going to be used throughout the project)<\/li><li aria-level='1'>Low-level software design (C and\/or C++ is going to be used throughout the project)<\/li><li aria-level='1'>FPGA design, synthesis, and verification (the Pynq FPGA will be used throughout the project)<\/li><li aria-level='1'>RTL simulation with Questasim and Verilator<\/li><li aria-level='1'>Good understanding of microcontrollers<\/li><li aria-level='1'>Good analytical skills<\/li><li aria-level='1'>Good background in computer architecture and algorithms<\/li><li aria-level='1'>Teamwork and git<\/li><\/ul><p><b>Appreciated skills:<\/b><\/p><ul><li aria-level='1'>Scientific curiosity<\/li><li aria-level='1'>Good communication skills<\/li><li aria-level='1'>Advanced English&nbsp;<\/li><\/ul><p><b>Type of work:<\/b> 10% theory analysis, 90% design and simulation <\/p><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Davide Schiavone, Tommaso Terzano, Anna Burdina, Prof. David Atienza<br> Contact email: <a href='mailto:davide.schiavone@epfl.ch; david.atienza@epfl.ch?subject=Efficient SPI: verification of a FLASH-Controller on FPGA and optimization of read\/write FLASH operations using caches'>davide.schiavone@epfl.ch; david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project738minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Davide Schiavone, Tommaso Terzano, Anna Burdina, Prof. David Atienza<br>\";<\/script>\n<span id=project738><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Davide Schiavone, Tommaso Terzano, Anna Burdina, Prof. 
David Atienza<br> <a href=#_ onclick=opendesc('project738',project738); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><img width=27 src=https:\/\/eslweb.epfl.ch\/img\/1pixel.gif><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><a href=https:\/\/www.epfl.ch\/labs\/esl\/research\/systems-on-chip\/x-heep\/ target=_blank title='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/201.png width=70 alt='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><\/a><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor733><\/a><b><span style='font-size: 20px;'>Architecture Definition and FPGA Deployment of a Near-Memory Computing Platform for Edge AI and Multimedia<\/b><br><script>var project733=\"<p>The shift toward data-centric algorithms, particularly in Artificial Intelligence (AI), Machine Learning (ML), and multimedia processing, has exposed the &quot;memory wall&quot; bottleneck in traditional von Neumann architectures. In these systems, the energy cost of moving data between memory and the CPU can be up to 100x higher than the arithmetic operations themselves. 
Near-Memory Computing (NMC) addresses this by performing computation directly where data is stored, significantly reducing data movement and exploiting internal memory bandwidth.<\/p> <p>However, existing CIM solutions often require high implementation effort and lack the flexibility required for general-purpose software integration. To address this, the Embedded Systems Laboratory (ESL) has developed <a href='https:\/\/ieeexplore.ieee.org\/document\/10964076'><strong>NM-Carus<\/strong><\/a>, a fully-autonomous, vector-capable RISC-V programmable NMC unit. NM-Carus is designed as a drop-in replacement for traditional SRAM banks, offering a functionally transparent memory mode alongside a high-performance computing mode. By integrating multiple NM-Carus instances into the <a href='https:\/\/github.com\/x-heep\/x-heep'><strong>X-HEEP<\/strong><\/a> microcontroller platform, this project aims to create a scalable SoC capable of accelerating complex workloads like Visual Transformers and multimedia codecs with peak energy efficiencies.<\/p> <h3><strong>Project Description and Main Goal<\/strong><\/h3> <p>The primary objective of this project is to define, implement, and deploy an NMC-enhanced System-on-Chip (SoC) based on X-HEEP, a configurable, ultra-low-power RISC-V microcontroller developed at EPFL. The student will re-architect the X-HEEP memory subsystem by replacing traditional SRAM banks with instances of the NM-Carus IP. The resulting platform must be tailored to accelerate specific target applications, including multimedia libraries (e.g., image and video codecs) and AI models (e.g., visual transformer), requiring a deep exploration of hardware-software trade-offs.<\/p> <h3><strong>Project Objectives<\/strong><\/h3> <p>The project will require a systematic approach involving RTL design, verification, and FPGA emulation. 
The student will work in close collaboration with another candidate taking care of deploying a set of relevant applications on the developed SoC. The specific tasks are defined in the following sections.<\/p> <ul><li><strong>X-HEEP Platform Familiarization<\/strong><\/li><\/ul> <p>The student must first gain a comprehensive understanding of the X-HEEP architecture. This involves setting up the development environment, installing required EDA tools (e.g., Verilator) and software toolchains. The student is expected to follow X-HEEP documentation to learn how to build the simulation model, compile example software applications, and run RTL simulations. It is crucial for the student to understand the SoC's internal mechanisms, specifically how parameters modify the hardware generation, how peripherals are exposed to the host CPU and what features they offer, how X-HEEP can be extended with new accelerators or coprocessors, and how interrupts are handled. Following simulation, the student will synthesize the design for FPGA and cooperate with the software team to deploy baseline versions of the benchmark applications, establishing a performance reference.<\/p> <ul><li><strong>NM-Carus Architecture Familiarization<\/strong><\/li><\/ul> <p>The student needs to acquire in-depth knowledge of the NM-Carus device, including its hardware microarchitecture and its custom RISC-V ISA extension. Understanding the vector processing capabilities, memory management paradigm, and the interface mechanism with the host system is essential for later optimization of processing kernels.<\/p> <ul><li><strong>Architecture Definition and Integration<\/strong><\/li><\/ul> <p>Starting from a provided template containing initial NM-Carus instances, the student will define the NMC-enhanced architecture. This is a critical design phase where the student must investigate the optimal configuration, bus architecture, and the number and size of the NMC devices to best suit the expected workloads. 
Key design decisions will include the bus architecture (e.g., OBI\/AHB crossbars), memory layout (interleaving schemes, banking factors), and the mapping of system addresses to the NM-Carus Vector Register Files (VRF). The student must also design efficient communication mechanisms between the host CPU and the NMC instances, as well as inter-NMC communication if deemed necessary.<\/p> <ul><li><strong>Functional Verification<\/strong><\/li><\/ul> <p>Once the base platform is defined, the student will verify its functionality through RTL simulation and FPGA emulation. This involves deploying simple &quot;sanity check&quot; applications (e.g. matrix multiplication, data movement tests), developed in collaboration with the software team, to ensure the modified memory subsystem functions correctly as both standard memory and a computing unit. The student will modify these initial tests to exploit the parallelism of multiple NMC instances.<\/p> <ul><li><strong>Benchmark Deployment and Profiling<\/strong><\/li><\/ul> <p>The student will assist the software team in deploying the full target benchmarks. This task involves hardware-level profiling to identify bottlenecks in the computing kernels. The student will explore opportunities to extend the hardware, such as adding custom instructions to the NM-Carus ISA, modifying the banking structure to increase bandwidth, or adding system-level modules to minimize data transfer latency and streamline synchronization with the host core and between NM-Carus instances.<\/p> <ul><li><strong>Architectural Design Space Exploration<\/strong><\/li><\/ul> <p>Based on profiling results, the student will iterate on the architecture. 
This may involve changing the number and size of NM-Carus instances, adjusting internal banking parallelism, or refining the interconnect policy to maximize system throughput and energy efficiency for the selected workloads.<\/p> <ul><li><strong><em>[Optional] <\/em><\/strong><strong>Physical Implementation<\/strong><\/li><\/ul> <p>Implement the SoC on a 65 nm technology node (logic synthesis) to extract timing and energy consumption figures via post-synthesis simulations.<\/p> <h3><strong>Working Environment<\/strong><\/h3> <p>The research will take place at the Embedded Systems Laboratory (ESL) at EPFL, a globally recognized institution for research in embedded systems and computer architecture. ESL offers a stimulating and collaborative research environment, complete with access to cutting-edge tools and resources. The candidate will have the chance to work closely with Prof. David Atienza and other members of the ESL team. The candidate is expected to work in close collaboration with another student taking care of the software aspects of the project.<\/p> <h3><strong>Expected Outcomes and Impact<\/strong><\/h3> <p>The successful completion of this project will result in:<\/p> <ul><li>A novel, scalable SoC architecture exploiting programmable near-memory computing IPs to optimize the execution of multimedia and AI tasks on edge devices.<\/li><li>A functional FPGA prototype of this architecture.<\/li><li>A detailed performance assessment of the proposed architecture while running a predefined set of benchmark applications.<\/li><\/ul> <h3><strong>Prerequisites<\/strong><\/h3> <ul><li>Strong background in computer architecture and digital systems design.<\/li><li>In-depth knowledge of the RISC-V architecture (ISA and microarchitecture).<\/li><li>Proficiency in SystemVerilog\/Verilog for RTL design and verification.<\/li><li>Experience with FPGA synthesis and emulation flows (e.g., Xilinx Vivado).<\/li><li>Familiarity with low-level programming (C and RISC-V 
assembly) to understand hardware-software interactions.<\/li><li>Experience with version control systems (Git) and Linux-based development environments.<\/li><li>Advanced experience with collaborative software and hardware development using Git.<\/li><\/ul> <p><strong>Appreciated skills:<\/strong><\/p> <ul><li>Knowledge of bus protocols (AHB, AXI, OBI).&nbsp;<\/li><li>Familiarity with logic synthesis tools (Synopsys Design Compiler or similar).<\/li><li>Advanced proficiency in English.&nbsp;&nbsp;<\/li><li>Effective communication skills.<\/li><\/ul> <p><strong>Type of work<\/strong><\/p> <ul><li>70% hardware design, RTL implementation, and FPGA emulation.<\/li><li>15% collaboration on software deployment and hardware-software co-design.<\/li><li>15% architecture analysis, documentation, and reporting.<\/li><\/ul><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Michele Caon, Dr. Davide Schiavone, Prof. David Atienza<br> Contact email: <a href='mailto:michele.caon@epfl.ch;davide.schiavone@epfl.ch;david.atienza@epfl.ch?subject=Architecture Definition and FPGA Deployment of a Near-Memory Computing Platform for Edge AI and Multimedia'>michele.caon@epfl.ch;davide.schiavone@epfl.ch;david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project733minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Michele Caon, Dr. Davide Schiavone, Prof. David Atienza<br>\";<\/script>\n<span id=project733><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Michele Caon, Dr. Davide Schiavone, Prof. 
David Atienza<br> <a href=#_ onclick=opendesc('project733',project733); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><a href=https:\/\/www.epfl.ch\/labs\/esl\/research\/systems-on-chip\/x-heep\/ target=_blank title='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/201.png width=70 alt='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><\/a><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor732><\/a><b><span style='font-size: 20px;'>Application Deployment and Software Optimization on a Near-Memory Computing Platform for Edge AI and Multimedia<\/b><br><script>var project732=\"<p>While Near-Memory Computing (NMC) hardware offers significant theoretical gains in energy efficiency and throughput, unlocking this potential requires a specialized software stack and tailored application mapping. 
Traditional embedded software often struggles with the &quot;memory wall,&quot; where data movement dominates the power budget. This project focuses on porting and optimizing real-world, data-intensive applications (specifically transformer neural networks and multimedia codecs) onto a novel NMC-enhanced SoC.<\/p> <p>The target platform integrates the <a href='https:\/\/github.com\/x-heep\/x-heep'><strong>X-HEEP<\/strong><\/a> microcontroller with <a href='https:\/\/ieeexplore.ieee.org\/document\/10964076'><strong>NM-Carus<\/strong><\/a> units. NM-Carus provides a software-programmable solution based on the RISC-V ISA, allowing it to function as either a standard SRAM bank or a vector processing unit. The student will develop the firmware, drivers, and optimized kernels necessary to transition from standard CPU execution to accelerated NMC-based execution on a multi-instance, multiple-instruction, multiple-data architecture, aiming to significantly reduce the system-level energy footprint.<\/p> <h3><strong>Project description and main goal<\/strong><\/h3> <p>The primary objective of this project is to deploy, profile, and optimize a set of industry-relevant benchmark applications&mdash;multimedia libraries (libpng, ffmpeg) and AI models (dinov2)&mdash;on an X-HEEP microcontroller enhanced with NM-Carus devices in place of conventional SRAM banks in its memory subsystem. The student will be responsible for creating a performance baseline on a standard RISC-V microcontroller, porting the applications to the NMC-enhanced variant, and rewriting critical computing kernels to exploit the near-memory vector capabilities.<\/p> <h3><strong>Project objectives<\/strong><\/h3> <p>The project will involve a mix of embedded software development, algorithm analysis, and hardware-software co-design. The student will work in close collaboration with another candidate taking care of developing the SoC and deploying it on FPGA for faster prototyping. 
The specific tasks are defined in the following sections.<\/p> <ul><li><strong>X-HEEP Platform Familiarization<\/strong><\/li><\/ul> <p>The student will start by mastering the X-HEEP software development environment. This involves understanding the CMake-based build system, the linker scripts, and the Hardware Abstraction Layer (HAL). Special focus must be placed on how peripherals (e.g., UART, DMA, Timers) are mapped in memory and exposed to the software, and how to compile, flash, and debug bare-metal RISC-V applications.<\/p> <ul><li><strong>NM-Carus ISA Familiarization<\/strong><\/li><\/ul> <p>The student must gain a deep understanding of the NM-Carus programming model. This includes studying its custom RISC-V ISA extensions, the distinction between its &quot;standard memory&quot; and &quot;computing&quot; modes, and the cycle-accurate performance of its vector instructions. Understanding the theoretical peak throughput is crucial for setting optimization targets and optimizing the processing kernels in later stages of the project.<\/p> <ul><li><strong>Benchmark Analysis<\/strong><\/li><\/ul> <p>Before moving to the novel SoC, the student will analyze the selected benchmarks (libpng, ffmpeg, dinov2) on a standard host PC. The goal is to understand the algorithmic flow, data structures, and dependencies of these complex libraries. The student will identify the &quot;hotspots&quot; that are prime candidates for acceleration (e.g., vectorizable functions like convolution or matrix multiplication).<\/p> <ul><li><strong>Baseline Porting and Profiling<\/strong><\/li><\/ul> <p>The student will port the selected applications to the standard X-HEEP platform. This is a non-trivial task that involves removing OS dependencies (e.g., filesystem calls), managing limited memory resources, and compiling for a bare-metal RISC-V target. 
Once ported, the student will profile the applications using hardware performance counters to establish a firm baseline for execution time, identifying the specific bottlenecks to be addressed.<\/p> <ul><li><strong>NMC Acceleration and Optimization<\/strong><\/li><\/ul> <p>Leveraging the analysis from the previous steps, the student will offload the identified critical kernels to the NM-Carus instances. This involves rewriting specific functions and offloading custom assembly kernels to the available NMC vector units in SIMD or MIMD fashion. The student will work in strict cooperation with the hardware team to optimize data layout (e.g., tiling data to fit into NM-Carus banks) and minimize the overhead of data transfers between the host CPU and the NMC units.<\/p> <ul><li><strong>Performance Evaluation and Design Iteration<\/strong><\/li><\/ul> <p>The NMC-accelerated applications will be profiled and compared against the baseline in terms of execution time. Based on the optimization difficulties or bottlenecks encountered during software development (e.g., need for a specific shuffle instruction, or better DMA synchronization), the student will provide feedback to the hardware team to trigger architectural improvements.<\/p> <h3><strong>Working environment<\/strong><\/h3> <p>The research will take place at the Embedded Systems Laboratory (ESL) at EPFL, a globally recognized institution for research in embedded systems and computer architecture. ESL offers a stimulating and collaborative research environment, complete with access to cutting-edge tools and resources. 
The candidate is expected to work in close collaboration with another student taking care of the hardware definition and implementation.<\/p> <h3><strong>Expected Outcomes and Impact<\/strong><\/h3> <p>The successful completion of this project will result in:<\/p> <ul><li>A set of optimized, real-world applications running on a novel near-memory computing SoC.<\/li><li>An efficient and flexible device driver and SDK for the NM-Carus NMC device.<\/li><li>A comprehensive performance assessment of the proposed SoC when running the selected benchmark applications.<\/li><\/ul> <h3><strong>Prerequisites<\/strong><\/h3> <ul><li>Strong background in computer architecture and embedded software.<\/li><li>Advanced proficiency in C\/C++ programming.<\/li><li>Proficiency in low-level programming, ideally with RISC-V assembly.<\/li><li>Experience with cross-compilation toolchains (GCC\/LLVM) and Make\/CMake build systems.<\/li><li>Experience with bare-metal programming (no OS) and debugging (GDB).<\/li><li>Advanced experience with collaborative software and hardware development using Git.<\/li><li>Strong scripting, data collection, and data analysis skills.<\/li><li>Good analytical skills.<\/li><\/ul> <p><strong>Appreciated skills:<\/strong><\/p> <ul><li>Knowledge of image processing algorithms, video compression standards, or neural network internals (specifically Transformers).<\/li><li>Advanced proficiency in English.&nbsp;&nbsp;<\/li><li>Effective communication skills.<\/li><\/ul> <p><strong>Type of work<\/strong><\/p> <ul><li>50% application software development and deployment.<\/li><li>35% hardware\/software co-design, implementation, verification, and validation.<\/li><li>15% profiling, documentation, and reporting.<\/li><\/ul><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Michele Caon, Dr. Davide Schiavone, Prof. 
David Atienza<br> Contact email: <a href='mailto:michele.caon@epfl.ch; davide.schiavone@epfl.ch; david.atienza@epfl.ch?subject=Application Deployment and Software Optimization on a Near-Memory Computing Platform for Edge AI and Multimedia'>michele.caon@epfl.ch; davide.schiavone@epfl.ch; david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project732minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Michele Caon, Dr. Davide Schiavone, Prof. David Atienza<br>\";<\/script>\n<span id=project732><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Michele Caon, Dr. Davide Schiavone, Prof. David Atienza<br> <a href=#_ onclick=opendesc('project732',project732); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><a href=https:\/\/www.epfl.ch\/labs\/esl\/research\/systems-on-chip\/x-heep\/ target=_blank title='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/201.png width=70 alt='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><\/a><\/td><\/tr><td 
width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor723><\/a><b><span style='font-size: 20px;'>Brain-Computer Interface (BCI) Application: Real-time Ball Mind Control using VersaSens EEG Headset (VersaBrain)<\/b><br><script>var project723=\"<p>The project aims to implement a real-time brain&ndash;computer interface (BCI) application that controls the vertical position of a ball on a computer screen through mental concentration. Cortical activity generates electrical signals known as electroencephalogram (EEG) signals, which can be recorded non-invasively from the scalp. As these signals propagate through the skull and scalp layers, they become significantly attenuated, resulting in low amplitudes typically ranging from 10 to 100 &micro;V. EEG signals span frequencies from approximately 0.5 Hz to 45 Hz and are conventionally grouped into distinct bands, each associated with specific physiological or cognitive states. In this project, the beta band (12&ndash;30 Hz) is of particular interest, as it is commonly linked to concentration, attention, and stress. By computing the power spectral density within this frequency range, the user&rsquo;s concentration level can be estimated.<\/p> <p>In this project, an EEG headset based on VersaSens [1], referred to as VersaBrain, will be used for real-time signal acquisition and processing. VersaSens is a modular, multimodal, extendable, and reconfigurable edge-AI platform developed at the Embedded Systems Laboratory (ESL). It includes three sensing modules and two processing modules, with the possibility of integrating additional custom-designed modules, providing a flexible foundation for a wide range of applications. For this project, VersaBrain is equipped with one sensing module (ExG) for real-time EEG acquisition and one processing module (Main) for data storage, on-device processing, and Bluetooth Low Energy (BLE) transmission. 
Optionally, the co-processor module (HEEPO), featuring the HEEPocrates SoC [2] and offering hardware-acceleration capabilities, may also be integrated.<\/p> <p>For this BCI application, VersaBrain acquires real-time EEG signals from its frontal electrodes (Fp1 and Fp2) at a sampling rate of 200 Hz. In addition to data collection and storage, the processor must compute the power spectral density (PSD) of the beta band over short signal windows, whose optimal length should be experimentally determined. The computed PSD values must then be transmitted in real time to a computer, tablet, or mobile device, where they are converted into the vertical position of a ball. Higher levels of concentration correspond to increased beta-band power, which should lower the position of the ball on the screen, whereas lower concentration results in reduced beta power and consequently a higher ball position.<\/p> <p>Because this is a real-time application, the PSD computation, wireless data transmission, and visualization script must be designed and optimized to avoid any perceptible latency. While PSD calculation is recommended for estimating concentration levels and driving the ball&rsquo;s movement, alternative signal-processing methods or machine\/deep learning approaches may also be employed, provided that (i) the end-to-end real-time application runs smoothly on the sensor within its energy and memory constraints, and (ii) the full system, including the receiving device, operates without visible latency.<\/p> <p><strong>Mandatory tasks:<\/strong><\/p> <p>Completion of the following tasks is required to pass the exam and obtain a grade of 4. 
Failure to complete any of these tasks will result in <strong>no pass<\/strong>:<\/p> <ol class='wp-block-list'><li>Become familiar with EEG signals, their frequency bands, and the physiological and cognitive states associated with each band.<\/li><li>Become familiar with VersaSens, including its hardware architecture, firmware framework, and practical usage, in particular the ExG module and the Main module.<\/li><li>Characterise and configure the ExG module for the BCI application, including sampling frequency, gain, and all other required parameters. Test and validate real-time synchronous signal acquisition, data storage, and data streaming over BLE.<\/li><li>Design and implement the on-device beta-band PSD algorithm for real-time signals acquired by the ExG module.<\/li><li>Design and implement PSD data transmission from the sensor to a computer.<\/li><li>Develop a Python script to receive the transmitted data and map them to the vertical position of a ball on the screen. (A partially implemented version is available in the repository, but it must be adapted for use with real-time signals from VersaBrain.)<\/li><li>Integrate the developed components on both the sensor and the computer to form a complete end-to-end BCI pipeline.<\/li><li>Test the whole pipeline using the VersaBrain headset with real-time EEG signals and produce a short demonstrative video showcasing the functionality of your application.<\/li><li>Prepare and deliver a complete documentation package for upload to the VersaSens GitLab repository, including all Python code, C\/C++ code, and the firmware.<\/li><\/ol> <p><strong>Optional tasks:<\/strong><\/p> <p>Completion of each task will contribute an additional 1 point to the final grade:<\/p> <ol class='wp-block-list'><li>Use the HEEPocrates SoC for beta-band PSD computation. 
In this case, HEEPocrates must receive the real-time signals from the main processor (nRF5340) through the SPI interface, process them, and return the results to the nRF5340.<\/li><li>Test the full pipeline using the VersaBrain headset and the HEEPocrates SoC with real-time EEG signals. Produce a short video showcasing the application&rsquo;s functionality and submit all documentation to the same repository.<\/li><\/ol> <p><strong>Type of work<\/strong><\/p> <ul class='wp-block-list'><li>25% software design.<\/li><li>50% firmware design.<\/li><li>20% testing and performance evaluation.<\/li><li>5% preparation and delivery of the complete documentation package.<\/li><\/ul> <p><strong>Desired skills:<\/strong><\/p> <ul class='wp-block-list'><li>Strong background in C and Python programming<\/li><li>Knowledge of EEG physiological signals<\/li><li>Experience with embedded software development, RTOS environments and Zephyr<\/li><li>Familiarity with version control systems (Git)<\/li><\/ul> <p><strong>Soft skills<\/strong><strong>:<\/strong><\/p> <ul class='wp-block-list'><li>Scientific curiosity<\/li><li>Good communication skills<\/li><li>Advanced English<\/li><\/ul> <p><strong>References:<\/strong><\/p> <p>[1] Najafi, Taraneh Aminosharieh, et al. &ldquo;VersaSens: An Extendable Multimodal Platform for Next-Generation Edge-AI Wearables.&rdquo;&nbsp;<em>IEEE Transactions on Circuits and Systems for Artificial Intelligence<\/em>&nbsp;(2024).<\/p> <p>[2] Machetti, Simone, et al. &ldquo;HEEPocrates: An ultra-low-power RISC-V microcontroller for edge-computing healthcare applications.&rdquo;&nbsp;<em>Europractice, March<\/em>&nbsp;(2024).<\/p><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Taraneh Aminosharieh Najafi, Dr. Jose Miranda, Prof. 
David Atienza <br> Contact email: <a href='mailto:taraneh.aminoshariehnajafi@epfl.ch; jose.mirandacalero@epfl.ch; david.atienza@epfl.ch?subject=Brain-Computer Interface (BCI) Application: Real-time Ball Mind Control using VersaSens EEG Headset (VersaBrain)'>taraneh.aminoshariehnajafi@epfl.ch; jose.mirandacalero@epfl.ch; david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project723minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Taraneh Aminosharieh Najafi, Dr. Jose Miranda, Prof. David Atienza <br>\";<\/script>\n<span id=project723><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Taraneh Aminosharieh Najafi, Dr. Jose Miranda, Prof. David Atienza <br> <a href=#_ onclick=opendesc('project723',project723); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><a href=https:\/\/ecocloud.epfl.ch\/wp-content\/uploads\/2025\/12\/ESL_Semester_project_BCI_Ball_Mind_Controll_on_VersaSens.pdf title='document link'><img width=30 border=0 src=https:\/\/eslweb.epfl.ch\/projects\/images\/doclink.gif hspace=2 alt='document link'><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a 
style='position: relative; top:-30px' name=anchor715><\/a><b><span style='font-size: 20px;'>Embedded Multimodal Physiological Signal Preprocessing  on VersaSens Platform for Mental Engagement Assessment <\/b><br><script>var project715=\"<p>The project aims to implement a real-time preprocessing pipeline for electroencephalogram (EEG), electrocardiogram (ECG), and electrodermal activity (EDA) signals on a resource-constrained embedded platform called <a href=\/labs\/esl\/research\/smart-wearables\/versasens\/>VersaSens<\/a> [1]. Developed at the Embedded Systems Laboratory (ESL), VersaSens is a modular, multimodal, extendable, and reconfigurable Edge-AI platform. It includes three sensing modules and two processing modules, with the option to integrate additional custom-designed modules, providing a flexible foundation for diverse applications.<br \/>In this project, one sensing module (ExG) will be used for real-time signal acquisition, and one processing module (Main) will be used for data storage, on-device processing, and Bluetooth Low Energy (BLE) transmission.<\/p> <p>This project extends the work presented in [2], which investigated multimodal sensor fusion using deep convolutional neural networks (CNNs) for mental engagement assessment. The preprocessing steps described in that study represent common standard methods, some of which can be partially reused in the present work. In particular, the signal-filtering parts can be adapted, whereas the artifact-removal steps for EEG signals are more challenging to implement on an embedded platform. Therefore, we propose using the inertial measurement unit (IMU) as an indicator of movement and quantifying the correlation between accelerations and EEG artifacts to determine the quality of EEG segments as clean or contaminated. 
Given that EEG is highly susceptible to various types of artifacts, any contaminated EEG segment can be used to label the entire synchronized multimodal window as unsuitable for input to the CNN model.<\/p> <p>It is important to note that the full real-time preprocessing pipeline on VersaSens, together with the already designed CNN model as an end-to-end application, must operate within the 3-second classification window. Since the CNN inference requires approximately 1.30 seconds at the 128 MHz operating frequency of the nRF5340 system-on-chip (SoC), the preprocessing pipeline must (i) fit within the SoC\u2019s 1 MB flash memory alongside the VersaSens firmware and the CNN model, and (ii) execute in no more than 1.50 seconds. Nevertheless, the exploration and development of alternative methods or algorithms that achieve comparable performance while satisfying the timing, memory, and power constraints of the platform are equally valued.<\/p> <p><strong>Mandatory tasks:<\/strong><\/p> <p>Completion of these tasks is required to pass the exam and obtain a grade of 4.<\/p> <ol> <li>Become familiar with state-of-the-art IMU, EEG, ECG, and EDA filtering methods, including high-pass, low-pass, and stop-band filtering for all signals. 
All filters are recommended to be 2<sup>nd<\/sup>-order Butterworth.<\/li> <table width='613' border=2> <tbody> <tr> <td width='46'> <p>\u00a0<\/p> <\/td> <td width='152'> <p>High-pass filter (Hz)<\/p> <\/td> <td width='151'> <p>Low-pass filter (Hz)<\/p> <\/td> <td width='265'> <p>Stop-band (notch) filter (Hz)<\/p> <\/td> <\/tr> <tr> <td width='46'> <p>EEG<\/p> <\/td> <td width='152'> <p>4<\/p> <\/td> <td width='151'> <p>45<\/p> <\/td> <td width='265'> <p>46-54 (optional, assess if necessary)<\/p> <\/td> <\/tr> <tr> <td width='46'> <p>ECG<\/p> <\/td> <td width='152'> <p>0.5<\/p> <\/td> <td width='151'> <p>-<\/p> <\/td> <td width='265'> <p>46-54 (optional, assess if necessary)<\/p> <\/td> <\/tr> <tr> <td width='46'> <p>EDA<\/p> <\/td> <td width='152'> <p>-<\/p> <\/td> <td width='151'> <p>2<\/p> <\/td> <td width='265'> <p>46-54 (optional, assess if necessary)<\/p> <\/td> <\/tr> <tr> <td width='46'> <p>IMU<\/p> <\/td> <td width='152'> <p>-<\/p> <\/td> <td width='151'> <p>10<\/p> <\/td> <td width='265'> <p>-<\/p> <\/td> <\/tr> <\/tbody> <\/table> <li>Become familiar with state-of-the-art IMU-based EEG artifact detection. The recommended approach is: <ul> <li>Segment the filtered synchronized signals and calculate the Pearson cross-correlation between IMU and EEG.<\/li> <li>If the correlation value is higher than a certain threshold (to be tested), the corresponding signal segment must be labelled as contaminated.<\/li> <li>Note that both EEG and IMU should have the same segment length. Therefore, either the EEG signals with a sampling rate of 200 Hz should be downsampled (decimated), or the IMU with a sampling rate of 100 Hz should be upsampled to 200 Hz by interpolation.<\/li> <\/ul> <\/li> <li>Design the preprocessing steps defined in Steps 1 and 2 as a complete pipeline. 
The recommended order is:<\/li>  <ul> <li>Notch filter -> High-pass -> Low-pass -> downsample (EEG, ECG, EDA) or upsample (IMU) -> segmentation -> Pearson cross-correlation (EEG and IMU) -> labelling high\/low contamination.<\/li> <\/ul>  <li>Plot and compare the signals before and after preprocessing, with segments and labels, to show the efficacy of your designed pipeline.<\/li> <li>Optimize, quantize, and convert the designed pipeline to C for embedded deployment.<\/li> <li>Become familiar with VersaSens, including its hardware architecture, firmware framework, and practical usage.<\/li> <li>Integrate the converted C pipeline into the VersaSens firmware and evaluate it on the device. Test the implemented pipeline using real-time signals acquired from the ExG module and IMU from the BNO086 SoC.<\/li> <li>Profile the execution performance of the pipeline, including energy consumption, execution time, and computational workload, and ensure that the execution time for a 3-second signal window is within 1.5 seconds.<\/li> <li>Prepare and deliver a complete documentation package for upload to the VersaSens GitLab <a href='https:\/\/eslgit.epfl.ch\/versasens\/v1\/versasens_v1_applications\/mental_engagement_sensor_fusion'>repository<\/a>, including all Python code, C\/C++ code, and the firmware.<\/li> <\/ol> <p><strong>Optional tasks: <\/strong><\/p> <p>Completion of each task will contribute an additional 1 point to the final grade.<\/p> <ol> <li>Integrate an existing CNN model with the designed preprocessing pipeline to create a complete end-to-end application.<\/li> <li>Test the functionality of the end-to-end system using a 3-second classification window on VersaSens, and profile its performance by measuring energy consumption, execution time, and computational operations.<\/li> <\/ol> <p><strong>Type of work<\/strong><\/p> <ul> <li>50% software design.<\/li> <li>30% Firmware design.<\/li> <li>15% Testing and performance evaluation.<\/li> <li>5% Preparation 
and delivery of the complete documentation package.<\/li> <\/ul> <p><strong>Desired skills:<\/strong><\/p> <ul> <li>Strong background in C\/C++ and Python programming<\/li> <li>Knowledge of physiological signals such as EEG, ECG, and EDA<\/li> <li>Background in deep learning models, including CNNs<\/li> <li>Experience with embedded software development, RTOS environments, and Zephyr<\/li> <li>Familiarity with version control systems (Git)<\/li> <\/ul> <p><strong>Soft skills:<\/strong><\/p> <ul> <li>Scientific curiosity<\/li> <li>Good communication skills<\/li> <li>Advanced English<\/li> <\/ul> <p><strong>References:<\/strong><\/p> <p>[1] Najafi, Taraneh Aminosharieh, et al. 'VersaSens: An Extendable Multimodal Platform for Next-Generation Edge-AI Wearables.'\u00a0<em>IEEE Transactions on Circuits and Systems for Artificial Intelligence<\/em>\u00a0(2024).<\/p> <p>[2] Aminosharieh Najafi, Taraneh, et al. 'Drivers\u2019 mental engagement analysis using multi-sensor fusion approaches based on deep convolutional neural networks.'\u00a0<em>Sensors<\/em>\u00a023.17 (2023): 7346.<\/p><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Taraneh Aminosharieh Najafi, Ms. Lara Orlandic, Ms. Dimitra Tatli, Dr. J\u00e9r\u00f4me Thevenot, Prof. David Atienza<br> Contact email: <a href='mailto:taraneh.aminoshariehnajafi@epfl.ch; lara.orlandic@epfl.ch; dimitra.tatli@epfl.ch; jerome.thevenot@epfl.ch; david.atienza@epfl.ch?subject=Embedded Multimodal Physiological Signal Preprocessing  on VersaSens Platform for Mental Engagement Assessment '>taraneh.aminoshariehnajafi@epfl.ch; lara.orlandic@epfl.ch; dimitra.tatli@epfl.ch; jerome.thevenot@epfl.ch; david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project715minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Taraneh Aminosharieh Najafi, Ms. Lara Orlandic, Ms. Dimitra Tatli, Dr. J\u00e9r\u00f4me Thevenot, Prof. 
David Atienza<br>\";<\/script>\n<span id=project715><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Taraneh Aminosharieh Najafi, Ms. Lara Orlandic, Ms. Dimitra Tatli, Dr. J\u00e9r\u00f4me Thevenot, Prof. David Atienza<br> <a href=#_ onclick=opendesc('project715',project715); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><a href=https:\/\/www.epfl.ch\/labs\/esl\/wp-content\/uploads\/2025\/12\/ESL_Semester_Project_Embedded_Multimodal_Signal_PreProcessing_on_VersaSens.pdf title='document link'><img width=30 border=0 src=https:\/\/eslweb.epfl.ch\/projects\/images\/doclink.gif hspace=2 alt='document link'><\/a><\/td><td><a href='https:\/\/www.epfl.ch\/labs\/esl\/research\/smart-wearables\/versasens\/' title='weblink'><img hspace=2 width=30 border=0 src=https:\/\/eslweb.epfl.ch\/projects\/images\/weblink.gif alt='web link'><\/a><\/td><\/table><\/td><\/tr><tr><td><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor714><\/a><b><span style='font-size: 20px;'>SIMD acceleration of Transformer-based ensemble<\/b><br><script>var project714=\"<p><strong>The goal of the project is to implement and explore the  effectiveness of a novel hardware\/software co-optimization strategy for  Transformer 
inference, based on model ensembles and shared-index codebooks.<\/strong><\/p> <p>Transformers are state-of-the-art algorithms in AI, with applications ranging from object detection\/classification to machine translation. Nonetheless, their high storage and computational requirements pose a challenge in resource-constrained edge computing scenarios. In this context, codebook-based optimizations<sup>1<\/sup> greatly reduce the workloads and memory footprints of Transformer models by restricting weights to a few admissible values, stored in a small table (codebook) for each layer, and representing large weight matrices with codebook indexes.<\/p> <p>Codebooking also degrades model accuracy to some degree. To recover from accuracy losses, multiple model instances, with the same structure but different weight values, can be employed (ensembling<sup>2<\/sup>). Crucially, instances can be optimized to employ different codebooks, but <em>with the same indexes<\/em>, hence minimally impacting memory requirements. Moreover, SIMD extensions such as the ARM Scalable Vector Extension (SVE) can be employed to leverage the data parallelism deriving from ensembling to greatly reduce run time.<\/p> <p>To investigate this approach, the student will be asked to implement ensembles of Transformer models in C. The algorithms will then be extended to the case of codebooks with shared indexes between instances, exploiting SIMD instructions for acceleration. System-wide explorations of the resulting implementations will be performed targeting virtual systems of varying characteristics, defined in the gem5 framework.<\/p> <p>The project will be carried out at the <a href='https:\/\/www.epfl.ch\/labs\/esl\/'>Embedded Systems Laboratory (ESL)<\/a> of EPFL. ESL comprises more than 40 researchers, active in many research topics in the hardware\/software co-design spectrum. The student will be under the supervision of Mr. Stefano Albini, Dr. 
Giovanni Ansaloni, and Prof. David Atienza.<\/p> <p>In order to achieve a passing grade of 4.0, the student is expected to:<\/p> <ul><li>Become familiar with full system simulation, codebook-based optimization, and SVE intrinsics.<\/li><li>Implement a software environment for generating C-language ensembles of Transformer models with codebooks and shared indexes.<\/li><li>Adapt the existing C functions, leveraging SVE, to accelerate ensemble execution.<\/li><li>Use gem5 in Full System mode to assess the achieved speedup.<\/li><\/ul> <p>Additionally, the following tasks add 1 additional point to the project grade:<\/p> <ul><li>Explore SW optimizations (e.g., tiling, data rearrangement &hellip;) to further improve run time.<\/li><li>Extend the algorithm to work with data representations smaller than float32.<\/li><\/ul> <h3><strong>Requirements:<\/strong><\/h3> <ul><li>Proficiency in C language and modular programming.<\/li><li>Basic command of Python language for data generation.<\/li><li>Familiarity with git version control system.<\/li><li>Interest in AI algorithms and computer architecture is a plus.<\/li><li>Scientific curiosity.<\/li><\/ul> <p><sup>1<\/sup> Flavio Ponzina et al. &ldquo;An Accuracy-Driven Compression Methodology to Derive Efficient Codebook-Based CNNs&rdquo;. In: 2022 IEEE International Conference on Omni-layer Intelligent Systems (COINS). 2022, pp. 1&ndash;6. DOI: 10.1109\/COINS54846.2022.9854986<\/p> <p><sup>2<\/sup> Flavio Ponzina et al. &ldquo;Using ensemble learning to improve radiation tolerance of CNNs in space applications&rdquo;. In: Proceedings of SPAICE2024: The First Joint European Space Agency\/IAA Conference on AI in and for Space. 2024, pp. 16&ndash;20.<\/p><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Stefano Albini, Dr. Giovanni Ansaloni, Prof. David Atienza<br> \";<\/script>\n<script>var project714minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Stefano Albini, Dr. 
Giovanni Ansaloni, Prof. David Atienza<br>\";<\/script>\n<span id=project714><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Stefano Albini, Dr. Giovanni Ansaloni, Prof. David Atienza<br> <a href=#_ onclick=opendesc('project714',project714); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td>\n    <div style='position:relative;'>\n     <div style='position: absolute;top:-80px;left:-300px;'>\n       <img border=0 src=https:\/\/eslweb.epfl.ch\/projects\/images\/notavailable.gif alt='project no longer available'>\n     <\/div>\n    <\/div><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor710><\/a><b><span style='font-size: 20px;'>Mind the Gaps: Missing-Aware Embedding Layers for Transformer Forecasting<\/b><br><script>var project710=\"<div class='entry-content mb-5'> \t\t <p>Urban wastewater time-series collected from underground sensing  infrastructure are inherently noisy, irregular, and incomplete.  
Communication interruptions, fluctuating battery levels, harsh  environmental conditions, and irregular sampling frequently lead to <em>missing timestamps<\/em>, <em>bursty gaps<\/em>, or <em>temporarily corrupted sequences<\/em>. Forecasting methods deployed in digital twins [2] must therefore handle incomplete temporal signals gracefully.<\/p> <p>Transformer architectures rely heavily on embedding layers that  convert raw time-series into a sequence of tokens. However, most  existing embedding mechanisms&mdash;such as convolutional patch embeddings or  positional encodings&mdash;assume <strong>dense, regularly sampled<\/strong> time-series.  When timestamps are missing, the embedding becomes misaligned,  positional encodings become unreliable, and attention mechanisms degrade  due to incomplete temporal structure. This loss of representational  quality reduces forecasting robustness, particularly for long horizons  or rainfall-driven surges.<\/p> <p>This semester project focuses on designing <strong>missing-aware embedding layers<\/strong>  specifically for hydrological time-series forecasting with  Transformers. 
The goal is to ensure that the model can ingest irregular  or incomplete sequences without degrading performance, thereby improving  reliability for real-world deployments of AquaCast [1] or related  forecasting systems.<\/p> <h3><strong>TASKS&nbsp;<\/strong><\/h3> <ol><li><strong> Literature Review<\/strong><ul><li>Conduct a literature review on embedding layers of Transformers for standard time-series and time-series with missing values.<\/li><\/ul><\/li> <li><strong> Dataset Preparation<\/strong><ul><li>Preprocess and construct the dataset from raw wastewater and precipitation records for this specific task.<\/li><li>Introduce controlled missingness (random, burst, structural) to evaluate robustness.<\/li><\/ul><\/li>  <li><strong> Embedding Layer Design, Implementation &amp; Testing<\/strong><ul><li>Design, implement, and test <strong>different embedding layers<\/strong> for a vanilla Transformer to improve forecasting while enabling the network to handle missing time-steps.<\/li><li>Implement the developed embedding layer for AquaCast, while considering both exogenous and endogenous time-series.<\/li><li>Provide quantitative measurements and interpretability for current methods and your proposed designs.<\/li><\/ul><\/li> <\/ol> <p><strong>Optional:<\/strong><\/p> <ul><li><strong>Missing samples representation:<\/strong> Investigate self-supervised reconstruction (e.g., masked-signal modeling) and evaluate transfer to forecasting tasks.<\/li><\/ul> <p><strong>REQUIREMENTS<\/strong><\/p> <ul><li>Solid Python programming, basic understanding of deep learning.<\/li><li>Interest in robust AI and time-series forecasting.<\/li><li>Scientific curiosity<\/li><\/ul> <h3><strong>TYPE OF WORK<\/strong><\/h3> <ul><li><strong>30% theory<\/strong> (study of embedding mechanisms, missing-data theory).<br \/><strong>70% implementation<\/strong> (embedding design, integration, benchmarking, analysis).<\/li><\/ul> <p><strong>REFERENCES<\/strong><\/p> <p>[1] Abdollahinejad, 
Golnoosh, Saleh Baghersalimi, Denisa-Andreea Constantinescu, Sergey Shevchik, and David Atienza. &ldquo;<strong>AquaCast<\/strong>: Urban Water Dynamics Forecasting with Precipitation-Informed Multi-Input Transformer.&rdquo; <a href='https:\/\/arxiv.org\/abs\/2509.09458'><em>https:\/\/arxiv.org\/abs\/2509.09458<\/em><\/a><\/p> <p>[2] UrbanTwin project: <a href='https:\/\/urbantwin.ch\/'>https:\/\/urbantwin.ch\/<\/a><\/p> \t<\/div>                         <div class='post-nav py-md-1'>                                 <div class='nav-prev'>         <\/div><\/div><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Golnoosh Abdollahinejad, Dr. Denisa Constantinescu, Prof. Dr. David Atienza<br> Contact email: <a href='mailto: golnoosh.abdollahinejad@epfl.ch; denisa.constantinescu@epfl.ch; david.atienza@epfl.ch?subject=Mind the Gaps: Missing-Aware Embedding Layers for Transformer Forecasting'> golnoosh.abdollahinejad@epfl.ch; denisa.constantinescu@epfl.ch; david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project710minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Golnoosh Abdollahinejad, Dr. Denisa Constantinescu, Prof. Dr. David Atienza<br>\";<\/script>\n<span id=project710><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Golnoosh Abdollahinejad, Dr. Denisa Constantinescu, Prof. Dr. 
David Atienza<br> <a href=#_ onclick=opendesc('project710',project710); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><a href=https:\/\/urbantwin.ch target=_blank title='An urban digital twin for climate action: Assessing policies and solutions for energy, water and infrastructure'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/197.png width=70 alt='An urban digital twin for climate action: Assessing policies and solutions for energy, water and infrastructure'><\/a><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor706><\/a><b><span style='font-size: 20px;'>FPGA implementation of a configurable Compute memory architecture for DNN inference applications<\/b><br><script>var project706=\"<div class='entry-content mb-5'> \t\t <p>The objective of this project is to extend an existing SRAM-based  compute-near-memory architectural template to accommodate large models  by employing off-chip DRAM to store weights and intermediate results. 
To this end, targeting the Xilinx UltraScale+ FPGA platform, the student will interface the adopted system bus protocol (OBI, Open Bus Interface) with a DDR4 memory controller (which exposes its AXI ports to the programmable logic of the board), ensuring correct protocol translation and efficient access to the external memory. A central task will be the adaptation of the existing DMA engine to support synchronized data transfers from DDR4, incorporating control logic to ensure that the computing memory waits for data availability before computation begins.<\/p> <p>This integration, by tapping into the vast memory resources available off-chip on the UltraScale+ board, will enable the deployment of large DNNs, thereby validating the scalability of the compute-near-memory architecture being developed at the Embedded Systems Laboratory of EPFL for resource-intensive machine learning tasks. It will hence constitute a crucial step toward the practical deployment of compute-in-memory systems for real-world AI workloads.<\/p> <p>Computing memories are up-and-coming solutions to the Von Neumann bottleneck, i.e. the increasing disparity between the cost of computing and that of memory access. 
Indeed, thanks to their hierarchical  structure, they allow for effective parallelization of computation,  while minimizing data movements, ultimately providing disruptive gains  in performance and efficiency.<\/p><p><strong>Tasks description<br \/><br \/><\/strong><em>Mandatory tasks:<\/em><\/p> <ol><li>Develop a solid understanding of the OBI (Open Bus Interface) and  AXI (Advanced eXtensible Interface) protocols used for memory-mapped  data transfers.<\/li><li>Integrate an OBI-to-AXI protocol bridge to enable communication  between the compute memory system and the external DDR4 memory via the  32-bit RISC-V core platform.<\/li><li>Analyze and adapt the existing DMA engine to synchronize data  transfers from DDR4, ensuring proper coordination with the compute  memory execution.<\/li><\/ol> <p><em>The completion of tasks 1-3 is required for a passing grade.<\/em><\/p> <p><em>&nbsp;<\/em><\/p> <p><em>Optional tasks:<\/em><\/p> <ol><li>Test and validate the full integration on the Xilinx UltraScale+ ZCU104 development board. <em>(required for 5\/6)<\/em><\/li><li>Use the software framework developed at ESL-EPFL to generate the C  application responsible for executing the MobileNet model inference on  the compute memory system. <em>(required for 6\/6)<\/em><\/li><\/ol><p><strong>Required knowledge and skills<\/strong><\/p> <ul><li>Good programming skills in VHDL, Verilog.<\/li><li>Knowledge of hardware design and computer architecture.<\/li><li>Scientific curiosity.<\/li><\/ul><p><strong>Type of work<\/strong><\/p> <ul><li>70% hardware design and integration, including protocol interfacing  (OBI to AXI), adaptation of the DMA engine, and integration of the DDR4  memory.<\/li><li>30% system validation and performance testing on the Xilinx UltraScale+ ZCU104 platform.<\/li><\/ul> \t<\/div><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Gr\u00e9goire Eggermann, Dr. Giovanni Ansaloni, Prof. 
David Atienza<br> \";<\/script>\n<script>var project706minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Gr\u00e9goire Eggermann, Dr. Giovanni Ansaloni, Prof. David Atienza<br>\";<\/script>\n<span id=project706><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Gr\u00e9goire Eggermann, Dr. Giovanni Ansaloni, Prof. David Atienza<br> <a href=#_ onclick=opendesc('project706',project706); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td>\n    <div style='position:relative;'>\n     <div style='position: absolute;top:-80px;left:-300px;'>\n       <img border=0 src=https:\/\/eslweb.epfl.ch\/projects\/images\/notavailable.gif alt='project no longer available'>\n     <\/div>\n    <\/div><a href=https:\/\/www.xilinx.com\/ target=_blank title='Xilinx Dublin'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/163.png width=70 alt='Xilinx Dublin'><\/a><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor703><\/a><b><span style='font-size: 20px;'>Comparative 
evaluation of kinematic features derived from different modalities during a mobility test<\/b><br><script>var project703=\"<p>The study aims to evaluate the performance of different modalities to assess kinematic features relevant to scoring a mobility test. It is part of a larger project to develop tools to score and monitor physical improvement during adherence to targeted exercise programs.<\/p>    <p>In this study, the accuracy and precision of different modalities (and their combination) to evaluate 3D body motion and specific kinematic features will be compared, using a VICON motion system (Optitrack) as reference. Two of the selected modalities are based on the wearable VersaSens platform: inertial measurement units (9-DOF IMUs) and a capacitive sensing system, while the third modality is based on regular videos. VersaSens, developed at the Embedded Systems Laboratory (ESL), is a modular, multimodal, extendable, and reconfigurable Edge AI platform. It enables the integration of add-on modules alongside its array of sensing and processing modules, providing a flexible foundation for diverse applications.<\/p>    <p>This project is linked to ongoing research within ESL, targeting the development of a wearable textile capacitive sensing system (VersaPants) for human body motion tracking. The VersaPants system consists of pants with embedded textile sensors connected to the VersaSens platform, which is an alternative solution to conventional rigid wearables. Thus, some aspects of this project will benefit from previous developments within the laboratory (e.g. 
algorithms to create 3D animated models from sensor data), and it is expected that this semester&rsquo;s project will further improve and validate the current solution.<\/p>    <p>In this project, key priorities include evaluating the acquisition robustness of kinematic features derived from each modality and establishing reliable data acquisition and exercise protocols to score the subjects&rsquo; mobility according to physiotherapists&rsquo; guidance.<\/p>    <p><strong>Mandatory tasks<\/strong><\/p>    <p>Completion of these tasks is required to pass the exam and obtain a grade of 4.<\/p>    <ol class='wp-block-list'><li>Conduct a comprehensive literature review of the different modalities of interest to assess the 3D body motion for different exercises relevant to the physiotherapist&rsquo;s recommendations. Become familiar with the modalities and platforms to be used in this project: IMUs, VersaPants, video, and VICON system.<\/li><li>Perform preliminary experimental testing with all the modalities and define a set of relevant features for limb mobility scoring according to the physiotherapist&rsquo;s recommendations. Develop a protocol for synchronous data acquisition from all modalities, with specific exercises necessary to extract the identified features. Consider the calibration requirements for the sensors when needed.<\/li><li>Conduct a synchronous data acquisition experiment with all the modalities, with the help of volunteers (target: 10 participants), according to the previously developed protocol. Assess the repeatability of data acquisition for each modality. Extract the features from the VICON system data.<\/li><li>Train machine-learning models of choice to obtain as an output the identified relevant features to be used in the mobility scoring. 
The  models should use as input:<ol><li>IMU data<\/li><li>Video data<\/li><li>Combination of IMU and video data.<\/li><\/ol> <\/li><\/ol>    <p>The features extracted from the VICON system will be used as a  ground-truth reference. Prepare a report with the metrics for all the  models.<\/p>    <p><strong>Optional tasks<\/strong><\/p>    <p>Completion of each task will contribute an additional 1 &nbsp;point to the final grade.<\/p>    <ol class='wp-block-list'><li>Train machine-learning models using as input:<ol><li>VersaPants data<\/li><li>Combination of VersaPants data and IMU data<\/li><li>Combination of VersaPants data and video data<\/li><\/ol> The features extracted from the VICON system will be used as a  ground-truth reference. Prepare a report with the metrics for all the  models.<\/li>    <li>Based on the project&rsquo;s outcomes, propose a robust protocol of data  acquisition with the minimal number of sensors necessary to extract  relevant information for the mobility score. The selected modalities  should be justified as a trade-off between accuracy and ease of  implementation in real conditions. 
This document will be used as a base  to apply for an ethical permit for future projects.<\/li><\/ol>    <p><strong>Type of work<\/strong><\/p>    <ul class='wp-block-list'><li>40% Machine-learning models development<\/li><li>30% Experimental acquisition to mimic real-life application.<\/li><li>20% Development of protocols according to application requirements.<\/li><li>10% Preparation and delivery of the complete documentation package.<\/li><\/ul>    <p><strong>Desired skills:<\/strong><\/p>    <ul class='wp-block-list'><li>Strong analytical skills<\/li><li>Strong background in signal processing and machine learning<\/li><li>Experience in embedded systems development<\/li><li>Experience in data acquisition is a plus<\/li><li>Teamwork and git<\/li><\/ul>    <p><strong>Appreciated skills:<\/strong><\/p>    <ul class='wp-block-list'><li>Scientific curiosity<\/li><li>Good communication skills<\/li><li>Advanced English<\/li><\/ul><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. J\u00e9r\u00f4me Thevenot, Dr. Taraneh Aminosharieh Najafi, Dr. Jonathan Dan, M.Sc. Deniz Kasap, Prof. David Atienza <br> Contact email: <a href='mailto:jerome.thevenot@epfl.ch; taraneh.aminoshariehnajafi@epfl.ch; jonathan.dan@epfl.ch; deniz.kasap@epfl.ch; david.atienza@epfl.ch?subject=Comparative evaluation of kinematic features derived from different modalities during a mobility test'>jerome.thevenot@epfl.ch; taraneh.aminoshariehnajafi@epfl.ch; jonathan.dan@epfl.ch; deniz.kasap@epfl.ch; david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project703minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. J\u00e9r\u00f4me Thevenot, Dr. Taraneh Aminosharieh Najafi, Dr. Jonathan Dan, M.Sc. Deniz Kasap, Prof. David Atienza <br>\";<\/script>\n<span id=project703><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. J\u00e9r\u00f4me Thevenot, Dr. Taraneh Aminosharieh Najafi, Dr. Jonathan Dan, M.Sc. Deniz Kasap, Prof. 
David Atienza <br> <a href=#_ onclick=opendesc('project703',project703); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor702><\/a><b><span style='font-size: 20px;'>ML4Seizures: An automated ML framework for training on seizure databases and efficient porting to low-power microcontrollers<\/b><br><script>var project702=\"<div class='entry-content mb-5'> \t\t <p>Machine learning (ML) has emerged as a transformative force across  numerous scientific and industrial domains, enabling systems to learn  from data and make intelligent decisions with minimal human  intervention. In recent years, deep learning&mdash;a subset of ML&mdash;has  demonstrated remarkable success in tasks such as image recognition,  natural language processing, and time-series analysis. 
These  advancements are primarily driven by the increasing availability of  computational resources, large-scale datasets, and sophisticated neural  network architectures, which together have enabled the development of  models with unprecedented accuracy and generalization capabilities.<\/p> <p>Among the many impactful applications of machine learning, healthcare  stands out as a field where data-driven methods can greatly improve  diagnostic accuracy and patient outcomes. One especially challenging  area is the detection and prediction of epileptic seizures from  electroencephalogram (EEG) signals. Seizure detection involves analyzing  complex, high-dimensional time-series data, where subtle patterns need  to be identified in real time. Machine learning techniques, particularly  deep neural networks, have shown promise in automating this process,  enabling continuous monitoring and timely intervention in both clinical  and home settings.<\/p> <p>In the rapidly evolving field of machine learning, the efficiency and  scalability of training pipelines are critical to accelerating research  and deployment. This project aims to automate the end-to-end machine  learning training pipeline on dedicated servers, streamlining tasks such  as data preprocessing, model configuration, training orchestration, and  performance monitoring. By reducing manual intervention and optimizing  resource utilization, the proposed system will enhance reproducibility,  minimize human error, and significantly cut down development time for  iterative experimentation.<\/p> <p>Complementing the automation of the training pipeline, this project  also focuses on the development of a high-performance C library  implementing commonly used deep neural network (DNN) layers. While many  existing frameworks offer extensive functionality, a lightweight and  modular C-based implementation provides greater control, portability,  and integration potential with embedded systems or custom hardware. 
This  dual approach&mdash;automation of training workflows and low-level  optimization of DNN components&mdash;addresses both software efficiency and  hardware adaptability, laying the groundwork for scalable and robust  machine learning solutions.<\/p> <p>More specifically, the project proposes to:<\/p> <ul><li>Implement a <strong>parametrizable<\/strong> framework for training PyTorch models on popular seizure databases running <strong>both<\/strong> on ESL and RCP EPFL servers. <ul><li>Using as baseline the existing ESL repositories<\/li><li>Ensuring efficient usage of underlying HW (i.e., GPUs) in both servers<\/li><li>Covering 3 popular seizure databases (i.e., TUH, CHB-MIT, Siena)<\/li><li>Providing flexibility in the data pre-processing step (i.e., segment  duration, %overlap, undersampling options, etc.), allowing future users  to replicate the state-of-the-art seizure pre-processing techniques  seamlessly<\/li><\/ul> <\/li><li>Implement a <strong>parametrizable<\/strong> C or C++ library for common DNN layers (i.e., CNN and Transformer layers). <ul><li>Using as baseline the existing ESL libraries<\/li><li>Writing generalizable code that allows future users to effortlessly  modify the layer parameters (e.g., padding, stride, in convolution)<\/li><li>Porting and profiling an existing DNN architecture on an embedded microcontroller (e.g., STM32-L4R5ZI or GAP9)<\/li><li>Applying low-level optimizations to the most time-consuming kernels and minimizing memory usage<\/li><\/ul> <\/li><\/ul> <p>The minimum requirements for successfully completing the project and achieving the minimum grade include:<\/p> <ol><li>Delivering a <strong>well-documented<\/strong> Git repository that enables training of an existing DNN architecture on <strong>three<\/strong> different seizure databases, <strong>both<\/strong> on ESL and RCP servers. The training must be <strong>fully parametrized<\/strong>  with respect to the input pre-processing scheme and training  hyperparameters. 
The DNN model (e.g., Zhu&rsquo;s Transformer [1]) will be  expected to achieve comparable performance to the reported one in the  literature.<\/li><li>Delivering a <strong>well-documented<\/strong> Git repository that  correctly implements inference using the most common CNN and Transformer  layers in C or C++. The library will be used to implement inference on  existing DNN architectures (e.g., Transformer in [2]) and will be  validated against PyTorch.<\/li><\/ol> <p>&nbsp;<\/p> <p>To increase the final grade further, the student is expected to  optimize the code in both Python and C and efficiently use the given HW  resources on both the server and the microcontroller. <strong>Half a point<\/strong>  will be awarded if the student correctly utilizes the GPU and achieves  training times comparable to existing ESL efforts. An extra <strong>half point<\/strong>  will be awarded for efficiently managing RAM limitations by batching  the training\/validation dataset. Finally, on the embedded  implementation, a <strong>full point<\/strong> will be awarded if the  student achieves a significant speed-up when running a Transformer  network (e.g., Transformer in [2]) compared to a plain, unoptimized C  implementation or if they prove that no optimization that yields a  significant speed-up can be applied to the baseline code implementation.<\/p> <p>The project will be carried out at the <a href='https:\/\/www.epfl.ch\/labs\/esl\/'>Embedded Systems Laboratory (ESL)<\/a>,  inside the Swiss Federal Institute of Technology (EPFL), one of the  world&rsquo;s top-class universities. ESL is an active group (22 Ph.D.  students among 40 members) involved in many research aspects. The  student will be supervised by Mr. Dimitrios Samakovlis, Dr. Jonathan  Dan, Dr. Giovanni Ansaloni, and Prof. 
David Atienza.<\/p> <p><strong>Required knowledge and skills<\/strong><\/p> <ul><li>Knowledge of deep neural networks and the main training techniques<\/li><li>Good background in Python, PyTorch and C programming<\/li><li>Basic knowledge of computer architecture<\/li><li>Good analytical skills<\/li><\/ul> <p><strong>Appreciated skills<\/strong><\/p> <ul><li>Scientific curiosity<\/li><li>Good communication skills<\/li><li>Advanced English<\/li><li>Ability to work autonomously<\/li><li>Teamwork<\/li><\/ul> <p><strong>Type of work<\/strong><\/p> <ul><li><strong>Theoretical Analysis (10%)<\/strong>: Involves understanding the data preparation and pre-processing techniques in seizure tasks, and the DNN architectures and layers involved.<\/li><li><strong>Coding (70%)<\/strong>: Involves developing parametrizable code that utilizes the underlying HW resources. Two Git repositories that achieve the desired results, one in Python and one in C, are expected as the final deliverable for a passing grade.<\/li><li><strong>Optimizations (20%)<\/strong>: Involves understanding the bottlenecks and the underlying architecture to speed up the training on the server and the inference on the embedded device. Applying effective optimizations that showcase a good understanding of the HW architecture will help the student raise their grade.<\/li><\/ul> <p><strong>References<\/strong><\/p> <p>[1] Y. Zhu and M. D. Wang, &ldquo;Automated Seizure Detection using Transformer Models on Multi-Channel EEGs,&rdquo; 2023 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI), Pittsburgh, PA, USA, 2023, pp. 1-6, doi: 10.1109\/BHI58575.2023.10313440.<\/p> <p>[2] Ma, Yongpei &amp; Liu, Chunyu &amp; Ma, Maria &amp; Yang, Yikai &amp; Truong, Nhan &amp; Kothur, Kavitha &amp; Nikpour, Armin &amp; Kavehei, Omid. (2023). TSD: Transformers for Seizure Detection. 
 10.1101\/2023.01.24.525308.<\/p> \t<\/div><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dimitrios Samakovlis, Dr. Jonathan Dan, Dr. Giovanni Ansaloni, Prof. David Atienza <br> Contact email: <a href='mailto:dimitrios.samakovlis@epfl.ch;jonathan.dan@epfl.ch;giovanni.ansaloni@epfl.ch;david.atienza@epfl.ch?subject=ML4Seizures: An automated ML framework for training on seizure databases and efficient porting to low-power microcontrollers'>dimitrios.samakovlis@epfl.ch;jonathan.dan@epfl.ch;giovanni.ansaloni@epfl.ch;david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project702minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dimitrios Samakovlis, Dr. Jonathan Dan, Dr. Giovanni Ansaloni, Prof. David Atienza <br>\";<\/script>\n<span id=project702><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dimitrios Samakovlis, Dr. Jonathan Dan, Dr. Giovanni Ansaloni, Prof. David Atienza <br> <a href=#_ onclick=opendesc('project702',project702); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><\/td><\/tr><td width=10 
rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor694><\/a><b><span style='font-size: 20px;'>MuxD2D: Adaptive protocol-aware D2D communication support for chiplets in the X-HEEP microcontroller<\/b><br><script>var project694=\"<p><span style='color: #212121'>The growing complexity of heterogeneous multi-processor systems&mdash;particularly in energy-constrained edge applications like IoT, wearables, and biomedical devices&mdash;has driven the adoption of chiplet-based architectures, which distribute SoC components across multiple dies to enhance performance, efficiency, and flexibility. <\/span> <\/p> <p><span style='color: #212121'>To allow chiplet development, Die to Die (D2D) functionality is implemented within <strong>X-HEEP<\/strong> - an open-source (<\/span><a rel='noopener noreferrer nofollow' href='https:\/\/github.com\/esl-epfl\/x-heep' target='_blank'><span style='color: #1155cc'>https:\/\/github.com\/esl-epfl\/x-heep<\/span><\/a><span style='color: #212121'>), modular RISC-V-based microcontroller that enables scalable integration of custom IPs and accelerators. Its flexible architecture makes it well-suited for chiplet extensions, allowing rapid prototyping, FPGA deployment, and silicon implementation across diverse hardware configurations.<\/span> <\/p> <p><span style='color: #212121'>As chiplet systems grow, flexible and transparent D2D (die-to-die) communication becomes critical, requiring support for diverse bus protocols and functional intents such as raw data transfer or low-power wake-up signaling. 
This demands a communication infrastructure that can be easily adapted to the architectural and functional heterogeneity of the target system.<\/span> <\/p> <p><span style='color: #212121'>The project focuses on enhancing flexibility and clarity in D2D (die-to-die) communication within the X-HEEP platform by implementing a configurable SystemVerilog MUX wrapper that supports various signal types and bus protocol translations (e.g., OBI to AXI). Building on the existing integration flow, the work includes updating the RTL-based design and testbench, separating software and hardware resets, extending documentation, and thoroughly validating all enhancements through simulation and hardware testing. The student will have regular guidance and feedback throughout the whole project. The outcomes of this project are expected to be merged into the main X-HEEP repository.<\/span> <\/p> <p><span style='color: #212121'>This Master Semester project will be carried out at the ESL at EPFL. ESL is an active group (22 Ph.D. students among 45 members) involved in many research aspects. The student will be under the supervision of Ms. Anna Burdina, Dr. David Mallas&eacute;n, and Prof. David Atienza.<\/span> <\/p> <h3><span style='color: #212121'><strong>Project objectives:<\/strong><\/span><\/h3> <ol>     <li>         <p><span style='color: #212121'>Understanding X-HEEP and its peripherals: how to integrate a peripheral, what a memory-mapped peripheral is, and how the implementation of a master peripheral differs from that of a slave peripheral. HW\/SW testbench: how to test the peripheral functionality in simulation and on FPGA.<\/span>         <\/p>     <\/li>     <li>         <p>Refactor the Serial Link wrapper to allow master request propagation. This implies a software-mapped MUX that decides whether data is sent and stored in a FIFO on the receiving side or written directly to the destination address (e.g., memory). Update application drivers for both simulation and FPGA. 
Deliverable: pull request with the updated SystemVerilog files, C test files, and documentation, plus a tested, working FPGA emulation.         <\/p>     <\/li>     <li>         <p><span style='color: #212121'>Performance evaluation of the Serial Link for the different modes implemented in objective 2. Deliverable: a 1-2 page report explaining the test case and the efficiency difference between the modes, with a comparison to the original IP performance.<\/span>         <\/p>     <\/li>     <li>         <p><span style='color: #212121'>Implement an interrupt policy that notifies the DMA when data is available on the receiver side of the Serial Link in FIFO mode. Deliverable: pull request with a functional test of the interrupt, the updated C application, and documentation.<\/span>         <\/p>     <\/li>     <li>         <p><span style='color: #212121'>Explore the use of Raw Mode Registers for protocol-free data transfer. Deliverable: pull request with fully functional, updated C drivers for Serial Link raw mode, a C test application, and detailed documentation.<\/span>         <\/p>     <\/li> <\/ol> <p><span style='color: #212121'><br \/><\/span><span style='color: #999999'><em>Note: Objectives 1-3 are required for a passing grade, whereas objectives 4-5 are required for the maximum grade. The grade will be based on the student's overall performance and enthusiasm, given their background and acquired skills. 
If the student successfully completes the objectives, the next step would be participation in the research activity, which will be based on the student's interests.<\/em><\/span> <\/p> <h3><span style='color: #434343'><strong>Required knowledge and skills:<\/strong><\/span><\/h3> <ul>     <li>         <p><span style='color: #0e101a'>Excellent RTL design and FPGA implementation in any HDL (SystemVerilog is preferred and will be used throughout the project).<\/span>         <\/p>     <\/li>     <li>         <p><span style='color: #0e101a'>Good level of C language.<\/span>         <\/p>     <\/li>     <li>         <p><span style='color: #0e101a'>Strong background in computer architecture.<\/span>         <\/p>     <\/li>     <li>         <p><span style='color: #0e101a'>Good understanding of memory architectures and microcontrollers.<\/span>         <\/p>     <\/li> <\/ul> <ul>     <li>         <p><span style='color: #0e101a'>Good analytical skills and a proactive attitude.<\/span>         <\/p>     <\/li> <\/ul> <ul>     <li>         <p><span style='color: #0e101a'>Good Git level. General Linux use and scripting.<\/span>         <\/p>     <\/li> <\/ul> <h3><span style='color: #434343'><strong>Appreciated skills:<\/strong><\/span><\/h3> <ul>     <li>         <p>Scientific curiosity.<\/p>     <\/li>     <li>         <p>Good communication skills.<\/p>     <\/li>     <li>         <p>Advanced English.<\/p>     <\/li> <\/ul><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Ms. Anna Burdina, Dr. David Mallas\u00e9n, and Prof. David Atienza.<br> \";<\/script>\n<script>var project694minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Ms. Anna Burdina, Dr. David Mallas\u00e9n, and Prof. David Atienza.<br>\";<\/script>\n<span id=project694><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Ms. Anna Burdina, Dr. David Mallas\u00e9n, and Prof. 
David Atienza.<br> <a href=#_ onclick=opendesc('project694',project694); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td>\n    <div style='position:relative;'>\n     <div style='position: absolute;top:-80px;left:-300px;'>\n       <img border=0 src=https:\/\/eslweb.epfl.ch\/projects\/images\/notavailable.gif alt='project no longer available'>\n     <\/div>\n    <\/div><a href=https:\/\/www.epfl.ch\/labs\/esl\/research\/systems-on-chip\/x-heep\/ target=_blank title='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/201.png width=70 alt='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><\/a><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor677><\/a><b><span style='font-size: 20px;'>Algorithmic optimizations in transformers for epilepsy detection<\/b><br><script>var project677=\"<p>AI models based on transformers have received a great deal of attention over the last years, particularly for their role in large 
language models. Furthermore, they are also being explored in the domain of biomedical applications. Unfortunately, their implementation in the embedded devices commonly used in the biomedical domain usually poses daunting challenges given the strict constraints in terms of resources and energy of these systems. The Embedded Systems Laboratory (ESL) has developed a transformer-based application to detect epilepsy episodes in patients wearing EEG devices [1, 2]. However, during the porting process to X-HEEP [3], an architecture also developed at ESL in collaboration with the EcoCloud center of EPFL for the prototyping of accelerators for edge-AI in embedded devices, we found that some parts of the transformer architecture (i.e., the computation of the logarithm of the amplitude after FFT, log_amp, and the softmax kernels) are very computationally intensive and, furthermore, are not suitable for the accelerators present in the different instantiations of the X-HEEP architecture.<br><br>To tackle this problem, we have looked into approximation techniques that trade accuracy for reduced computational complexity. After the optimizations, most of the execution time is now consumed by operations that can be executed in the platform accelerators, such as matrix multiplication and FFT, enabling further optimization with already existing techniques.<\/p><p>However, to use these optimizations safely, the transformer model must be re-trained and carefully tested to assess their impact on the accuracy of the complete system. 
The following table shows an initial evaluation of the number of cycles required in an X-HEEP implementation for some of the application kernels, and the number of cycles after optimization:<br><a href=https:\/\/www.epfl.ch\/labs\/lmc\/wp-content\/uploads\/2025\/03\/tableMiguel.png><img decoding='async' class='alignnone size-full wp-image-1961' src='https:\/\/www.epfl.ch\/labs\/lmc\/wp-content\/uploads\/2025\/03\/tableMiguel.png' alt='' width='993' height='346' srcset='https:\/\/www.epfl.ch\/labs\/lmc\/wp-content\/uploads\/2025\/03\/tableMiguel.png 993w, https:\/\/www.epfl.ch\/labs\/lmc\/wp-content\/uploads\/2025\/03\/tableMiguel-300x105.png 300w, https:\/\/www.epfl.ch\/labs\/lmc\/wp-content\/uploads\/2025\/03\/tableMiguel-768x268.png 768w' sizes='(max-width: 993px) 100vw, 993px'><\/a><br><br>The goal of this project is to, first, modify the existing Python implementation of the transformer model to incorporate the approximated kernels. Then, after retraining the model with these new kernels, the student will study the trade-off between accuracy and computational complexity. Finally, the student will optionally integrate the changes into the C\/C++ application that is executed on an FPGA-based X-HEEP instantiation to analyze the improvements on performance.<br><br>The expected outcomes of this project are:<br>\u25cf Re-training of the transformer model using the new approximation algorithms. 
This step will require implementing the new approximate algorithms in Python for integration with standard AI training&nbsp; frameworks (PyTorch).<br>\u25cf Development of a framework in Python to execute the complete application on all the signals of a standard epilepsy database.<br>\u25cf Evaluation of the impact on accuracy of the approximate optimizations for a complete epilepsy dataset.<br><br>Optional\/additional outcomes:<br>\u25cf Performance characterization of the new algorithms using the CPU present in an FPGA-based instantiation of the X-HEEP architecture.<br>\u25cf Helping in the development of CGRA-based implementations for the approximate versions of the softmax algorithm, using the compiler developed in ESL.<br>\u25cf Evaluate the impact of the approximated algorithms on the full transformer-based application and its early-exit variants, analyzing the resulting accuracy-complexity trade-offs.<br><br>Throughout the project, the student will learn:<br>\u25cf The main features of applications in the biomedical wearable domain.<br>\u25cf Basic concepts on the use of transformers in the biosignal processing domain.<br>\u25cf Implementation and training of AI models using the PyTorch framework.<br>\u25cf How to deploy a biosignal processing application on an embedded platform.<br>\u25cf Use of accelerators in embedded platforms.<br>\u25cf How to work with git repositories in a team of contributors to the same project.<br><br>The project will be carried out at the ESL at EPFL, one of the world\u2019s top-class universities, including EcoCloud\u2019s technical support. ESL is an active group (24 Ph.D. students among 45 members) involved in many research lines. The student will be under the supervision of Prof. David Atienza (ESL) and Dr. Miguel Pe\u00f3n-Quir\u00f3s (EcoCloud), with technical support from Hossein Taji (ESL) and the collaboration of Dr. Jos\u00e9 Miranda Calero.<br><br><strong>Project objectives:<\/strong><br>1. 
Understanding the use of transformers in applications for the detection of epilepsy episodes. Analysis of the computational complexity of each step of the application.<br>2. Retraining of the transformer model using approximate versions of the softmax, GELU and (possibly) log_amp algorithms implemented in Python (PyTorch).<br>3. Evaluation of the impact on accuracy of the approximations introduced, on a complete database for epilepsy detection.<br>4. (Optional) Porting of the algorithms to an FPGA-based instantiation of X-HEEP and evaluation of the performance of the different steps of the application.<br>5. (Optional) Implementation of the softmax algorithm for the CGRA present in X-HEEP using an experimental compiler.<br><br>Required knowledge and skills:<br>\u25cf C\/C++ and Python. General Linux use and scripting.<br>\u25cf Recommended: Basic concepts of AI frameworks (PyTorch).<br>\u25cf Good analytical skills.<br>\u25cf Teamwork and Git.<br><br>Appreciated skills:<br>\u25cf Scientific curiosity.<br>\u25cf Good communication skills.<br>\u25cf Advanced English (interaction during the project will be in English).<br><br>Type of work: 40% theory analysis, 60% design and simulation.<br><br>[1] Alireza Amirshahi et al. \u201cMetaWearS: A Shortcut in Wearable Systems Lifecycle with Only a Few Shots.\u201d 2024, arXiv. https:\/\/arxiv.org\/abs\/2408.01988<br>[2] Taraneh Aminosharieh Najafi et al. \u201cVersaSens: An Extendable Multimodal Platform for Next-Generation Edge-AI Wearables.\u201d 2024, IEEE Transactions on Circuits and Systems for Artificial Intelligence.<br>[3] Simone Machetti et al. \u201cX-HEEP: An Open-Source, Configurable and Extendible RISC-V Microcontroller for the Exploration of Ultra-Low-Power Edge Accelerators.\u201d 2024, arXiv. https:\/\/arxiv.org\/abs\/2401.05548<\/p><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Miguel Peon, Hossein Taji, Dr. Jos\u00e9 Miranda Calero, Prof. 
David Atienza<br> Contact email: <a href='mailto:hossein.taji@epfl.ch; jose.mirandacalero@epfl.ch; miguel.peon@epfl.ch; david.atienza@epfl.ch?subject=Algorithmic optimizations in transformers for epilepsy detection'>hossein.taji@epfl.ch; jose.mirandacalero@epfl.ch; miguel.peon@epfl.ch; david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project677minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Miguel Peon, Hossein Taji, Dr. Jos\u00e9 Miranda Calero, Prof. David Atienza<br>\";<\/script>\n<span id=project677><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Miguel Peon, Hossein Taji, Dr. Jos\u00e9 Miranda Calero, Prof. David Atienza<br> <a href=#_ onclick=opendesc('project677',project677); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor675><\/a><b><span style='font-size: 20px;'>Automation of a semi-custom design and verification flow for ultra-low-leakage always-on circuits relying on 
differential logic<\/b><br><script>var project675=\"<div class='entry-content mb-5'> \t\t <p>Microcontrollers (MCUs) are used in a wide range of applications ranging from sensor monitoring to robotics and automotive.<\/p> <p>Thanks to their versatility, MCUs are typically chosen as edge computing platforms.&nbsp;<\/p> <p>IoT, wearable, and edge-computing applications are typically profiled in 4 different phases:<\/p> <ol><li>acquisition<\/li><li>pre-processing<\/li><li>processing<\/li><li>transmission or stimulation via actuators<\/li><\/ol> <p>Acquisitions and stimulation can be performed either with external  analog-digital (ADC) and digital-analog (DAC) converters, or with  integrated ones.<\/p> <p>For the latter, the processor (usually implemented in digital) and  the analog components are implemented in the same technology node and  integrated in the same system-on-chip (SoC), requiring  analog-digital-interfaces.<\/p> <p>With a wide collaboration between EPFL, Imperial College London,  Universidad Carlos III de Madrid, and Politecnico di Torino, we are  building HEEPidermis: an SoC that integrates both the processing  elements and the acquisition and stimulation required to obtain  precision measurements of impedance and conductance of the skin in a  low-power and autonomous manner.&nbsp;<\/p> <p>The processor is based on X-HEEP, an open-source RISC-V configurable  and extendable microcontroller, and includes smart pre-processing of  data coming from ADCs, while the acquisition and stimulus components are  based on a set of possible ADCs (VCO-based, &Delta;&Sigma; and Level-Crossing), and  a current DAC, respectively. HEEPidermis also includes an FLL and LDO  to reduce the required count of off-chip components.&nbsp;<\/p> <p>The full chip is going to be implemented into the TSMC 65 LP technology.&nbsp;<\/p> <p>This project proposes to:<\/p> <ul><li>Do the full-custom layout of Analog\/Digital\/Mixed-Signal blocks.  
These include ADCs and DACs, as well as mixed-signal components. <ul><li>Specifications and schematics will be provided<\/li><li>The layout will be done by using Cadence Virtuoso <ul><li>Possibly requiring modifying the Schematic<\/li><\/ul> <\/li><li>The layout must be equivalent to the schematic (LVS) and performed with Calibre<\/li><li>The layout must be DRC-free and performed with Calibre<\/li><li>Simulations with the parasitic extracted netlist will need to be performed to evaluate the final design<\/li><li>LEF and LIB need to be generated so that such IPs can be integrated into the digital-on-top flow used to build HEEPidermis<\/li><\/ul> <\/li><li>Design level shifters that interface the asynchronous-analog side with the synchronous-digital side.&nbsp; <ul><li>Specifications will be provided &ndash; as well as a baseline schematic and layout<\/li><li>The schematic and the layout will be done by using Cadence Virtuoso<\/li><li>The layout must be equivalent to the schematic (LVS) and performed with Calibre<\/li><li>The layout must be DRC-free and performed with Calibre<\/li><li>Simulations with the parasitic extracted netlist will need to be performed to evaluate the final design<\/li><li>LEF and LIB need to be generated so that such IPs can be integrated into the digital-on-top flow used to build HEEPidermis<\/li><\/ul> <\/li><li>Perform the back-end of the digital and analog components of HEEPidermis using Cadence Innovus <ul><li>LEF and LIB of the analog\/mixed-signal components are generated by the tasks above<\/li><li>the processor netlist and constraints are provided<\/li><li>AMS verification of the whole HEEPidermis will be performed together with the rest of the team<\/li><\/ul> <\/li><\/ul> <p>The project will be carried out at the ESL at EPFL, one of the world&rsquo;s top-class universities.<\/p> <p><strong>Required knowledge and skills:<\/strong><\/p> <ul><li>Synopsys Design Compiler<\/li><li>Cadence Innovus and Virtuoso<\/li><li>Siemens 
Calibre<\/li><li>Abstraction files syntax of LEF, LIB, etc.<\/li><li>Spice\/HSpice and SystemVerilog\/Verilog<\/li><li>Good analytical skills<\/li><li>Teamwork and git<\/li><\/ul> <p><strong>Appreciated skills:<\/strong><\/p> <ul><li>Scientific curiosity<\/li><li>Good communication skills<\/li><li>Advanced English&nbsp;<\/li><\/ul> <p><br \/><strong>Type of work:<\/strong> 10% theory analysis, 90% design and simulation<\/p> \t<\/div>                         <div class='post-nav py-md-1'>                                 <div class='nav-prev'>         <\/div><\/div><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Davide Schiavone, Dr. Alexandre Levisse, Prof. Matias Miguez, Robin Leplae, Juan Sapriza, Prof. David Atienza<br> Contact email: <a href='mailto:davide.schiavone@epfl.ch;alexandre.levisse@epfl.ch;matias.miguez@epfl.ch;robin.leplae@epfl.ch;juan.sapriza@epfl.ch;david.atienza@epfl.ch?subject=Automation of a semi-custom design and verification flow for ultra-low-leakage always-on circuits relying on differential logic'>davide.schiavone@epfl.ch;alexandre.levisse@epfl.ch;matias.miguez@epfl.ch;robin.leplae@epfl.ch;juan.sapriza@epfl.ch;david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project675minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Davide Schiavone, Dr. Alexandre Levisse, Prof. Matias Miguez, Robin Leplae, Juan Sapriza, Prof. David Atienza<br>\";<\/script>\n<span id=project675><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Davide Schiavone, Dr. Alexandre Levisse, Prof. Matias Miguez, Robin Leplae, Juan Sapriza, Prof. 
David Atienza<br> <a href=#_ onclick=opendesc('project675',project675); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><img width=27 src=https:\/\/eslweb.epfl.ch\/img\/1pixel.gif><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><a href=https:\/\/www.epfl.ch\/labs\/esl\/research\/systems-on-chip\/x-heep\/ target=_blank title='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/201.png width=70 alt='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><\/a><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor660><\/a><b><span style='font-size: 20px;'>Extraction of Heart-Related Information in Epileptic Patients<\/b><br><script>var project660=\"<div class='entry-content'> \t\t\t <p>Epilepsy is a neurological condition that can be accompanied by  changes in heart rate and heart rate variability around seizure events.  Portable ECG (electrocardiogram) devices and smartwatch based PPG  (photoplethysmogram) sensors provide a unique opportunity to extract  heart-related features in real-time, enabling continuous monitoring for  patients with epilepsy. 
This project aims to leverage ECG and PPG data  from wearable devices to assess the quality of heart-related signals  before, during, and after seizures. Furthermore, the project will  replicate existing research on seizure detection using heart-related  features to investigate the feasibility of using wearable devices for  seizure detection.<\/p>    <p>The technical challenges include handling noisy ECG and PPG data,  ensuring signal quality in diverse patient conditions, and designing  algorithms capable of detecting relevant heart-related features during  seizure events. The system will need to analyze signal quality and  extract meaningful heart-related features while maintaining real-time  performance on resource-constrained devices like smartwatches.<\/p>    <p>This project will contribute to the field of epilepsy monitoring by  providing a method for using heart-related data from wearable devices to  assess seizure states. It could serve as a valuable tool for  researchers and clinicians working on seizure prediction and monitoring.<\/p>    <h3 class='wp-block-heading'>Tasks:<\/h3>    <ul class='wp-block-list'><li>Preprocess ECG and PPG signals from smartwatch data to remove noise and artifacts, especially during seizure events.<\/li><li>Implement algorithms to extract heart-related features from ECG and  PPG signals (e.g., heart rate, heart rate variability, and respiratory  rate).<\/li><li>Assess the quality of heart-related signals before, during, and  after seizures, identifying any characteristic changes in the features.<\/li><li>Replicate a seizure detection study using extracted heart-related  features to demonstrate the potential of using these features for  seizure detection.<\/li><\/ul>    <h3 class='wp-block-heading'>Requirements:<\/h3>    <ul class='wp-block-list'><li>Strong Python programming skills<\/li><li>Basic signal processing knowledge<\/li><li>Machine learning fundamentals<\/li><li>Ability to handle large-scale data<\/li><li>Interest in 
biomedical applications<\/li><\/ul> \t\t<\/div><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Jonathan Dan, Christodoulos Kechris, Prof. David Atienza<br> Contact email: <a href='mailto:jonathan.dan@epfl.ch; david.atienza@epfl.ch?subject=Extraction of Heart-Related Information in Epileptic Patients'>jonathan.dan@epfl.ch; david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project660minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Jonathan Dan, Christodoulos Kechris, Prof. David Atienza<br>\";<\/script>\n<span id=project660><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Jonathan Dan, Christodoulos Kechris, Prof. David Atienza<br> <a href=#_ onclick=opendesc('project660',project660); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><img width=27 src=https:\/\/eslweb.epfl.ch\/img\/1pixel.gif><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor659><\/a><b><span style='font-size: 20px;'>Physiological Data Collection Platform for WearOS Smartwatches<\/b><br><script>var project659=\"<div class='entry-content'> \t\t\t <p>The increasing availability of smartwatches with advanced 
sensors  presents unique opportunities for continuous health monitoring. Building  on our lab&rsquo;s experience with epilepsy monitoring applications, this  project aims to develop a generalized framework for collecting and  managing physiological data from WearOS smartwatch devices. While  current commercial solutions often provide only processed data (like  step counts), there is a need in the research community for access to  raw sensor data. This project will create a flexible platform that  allows researchers to configure and collect both raw and processed  physiological signals from commercially available smartwatches.<\/p>    <p>The technical challenges include managing battery life while  collecting high-frequency sensor data, ensuring cross-device  compatibility, and implementing secure and efficient data transmission  protocols. The system must be user-friendly enough for research  participants while providing the detailed configuration options  researchers need.<\/p>    <p>This project will contribute to the research community by providing a  flexible, open-source platform for physiological data collection using  commercially available smartwatches, enabling various future research  applications beyond epilepsy monitoring.<\/p>    <p>Tasks:<\/p>    <ul class='wp-block-list'><li>Extend the existing smartwatch lab application to support additional sensor data<\/li><li>Implement a configuration interface in the phone companion app for sensor selection and sampling rates<\/li><li>Develop a robust data storage and transmission architecture<\/li><li>Create interfaces for multiple cloud storage services (minimum: Google Drive, Amazon AWS S3)<\/li><li>Test and validate the system on multiple WearOS devices (minimum 3)<\/li><li>Document the system architecture and API<\/li><li>Bonus: Implement real-time data visualization<\/li><li>Bonus: Create or integrate with a researcher dashboard for monitoring data collection<\/li><\/ul>    <p>Requirements:<\/p>    <ul 
class='wp-block-list'><li>Strong Android development skills (Kotlin)<\/li><li>Understanding of mobile sensor APIs<\/li><li>Knowledge of cloud services and REST APIs<\/li><li>Knowledge of database systems<\/li><li>Experience with Bluetooth communication protocols<\/li><li>Basic understanding of physiological signals<\/li><li>UI\/UX design skills<\/li><\/ul> \t\t<\/div><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Jonathan Dan, Dimitra Tatli, Prof. David Atienza<br> Contact email: <a href='mailto:jonathan.dan@epfl.ch; david.atienza@epfl.ch?subject=Physiological Data Collection Platform for WearOS Smartwatches'>jonathan.dan@epfl.ch; david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project659minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Jonathan Dan, Dimitra Tatli, Prof. David Atienza<br>\";<\/script>\n<span id=project659><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Jonathan Dan, Dimitra Tatli, Prof. David Atienza<br> <a href=#_ onclick=opendesc('project659',project659); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><img width=27 src=https:\/\/eslweb.epfl.ch\/img\/1pixel.gif><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 
valign=top><a style='position: relative; top:-30px' name=anchor636><\/a><b><span style='font-size: 20px;'>Porting X-HEEP to a Lattice FPGA with open-source EDA tools<\/b><br><script>var project636=\"<p>Micro-controller units (MCUs) are used in a wide range of applications, from sensor monitoring all the way to robotics. Despite typically offering lower performance, they are usually preferred over custom circuits thanks to their versatility and easy programmability via software routines typically written in the C language.<\/p> <p>During the design stage, MCUs are typically implemented on FPGAs to speed up emulation and allow SW developers to work on the software development kit in parallel with the hardware team.<\/p> <p>Usually, FPGAs are expensive and rely on commercial EDA tools for synthesis and place-and-route. Lately, however, open-source EDA tools such as YosysHQ have been developed, which allow tool customization and are free of charge.<\/p> <p>One of the FPGA vendors best supported by YosysHQ is Lattice, which provides very cheap FPGAs that can be used for education, rapid prototyping, and even products.<\/p>  <p>In this project, we want to use YosysHQ to implement X-HEEP (eXtendable Heterogeneous Energy-Efficient Platform) on a Lattice FPGA. X-HEEP is an open-source, configurable, and extensible single-core RISC-V 32-bit MCU developed at the Embedded Systems Laboratory (ESL) and sponsored by the EcoCloud Sustainable Computing center of the Swiss Federal Institute of Technology Lausanne (EPFL).<\/p>  <p>In this project, the student will extend the X-HEEP configuration knobs to allow the X-HEEP RTL to be mapped onto the Lattice FPGA using the open-source synthesis flow based on the YosysHQ EDA tools. 
This will require several steps to make the RTL HDL &ldquo;digestible&rdquo; by YosysHQ, which supports fewer advanced HDL features than commercial EDA tools such as Vivado.&nbsp;<\/p>  <p>In particular, the student will need to:<\/p>  <ul> <li>Translate the SystemVerilog description of X-HEEP to Verilog with tools such as sv2v<\/li> <li>Add ifdef\/include options so that only a subset of the supported SystemVerilog RTL is translated to Verilog by sv2v for YosysHQ<\/li> <li>Design a synthesis and place-and-route script for YosysHQ to obtain a working bitstream for the Lattice FPGA<\/li> <li>Load the bitstream onto the ICESugar-Pro board that hosts the Lattice FPGA<\/li> <li>Write a simple &ldquo;hello world&rdquo; application for X-HEEP and run it on the ICESugar-Pro board<\/li> <li>[optional] Integrate\/design the SDRAM IP into the X-HEEP RTL so that programs can access the SDRAM present on the ICESugar-Pro board<\/li> <\/ul>  <p>Throughout the project, the student will learn:<\/p>  <ul> <li>How the X-HEEP build flow is organized<\/li> <li>How to build a YosysHQ flow for X-HEEP on a Lattice FPGA<\/li> <li>How to analyze the output of the open-source EDA tools and fix potential violations<\/li> <li>How to work with several git repositories and in a team of people all contributing to the same project&nbsp;<\/li> <li>How to debug FPGA designs<\/li> <\/ul>  <p>The project will be carried out between the ESL and the TCL groups at EPFL, one of the world&rsquo;s top-class universities. ESL and TCL are active groups (24 Ph.D. students among 45 members for ESL, and 4 Ph.D. students among 12 members for TCL) involved in many research aspects. The student will be under the supervision of Prof. David Atienza, Prof. Andreas Burg, Dr. Davide Schiavone, and Dr. 
Christoph Mueller.<\/p>  <p><strong>Project objectives:<\/strong><\/p> <ol> <li>Understanding the X-HEEP microcontroller, how its build flow works, and learning how IPs are integrated.<\/li> <li>Understanding the sv2v flow to translate SystemVerilog to Verilog.<\/li> <li>How to synthesize, place, and route the translated Verilog with YosysHQ for Lattice FPGAs.<\/li> <li>How to debug and program the design into the ICESugar-Pro board.<\/li> <li>Contribute to the X-HEEP GitHub repository the whole flow.<\/li> <\/ol>  <p><strong>Required knowledge and skills:<\/strong><\/p>  <ul> <li>How the RTL to FPGA flow is usually implemented with the standard flows and tools (e.g. Vivado)<\/li> <li>Python, bash, Linux<\/li> <li>Good analytical skills<\/li> <li>Good background in computer architecture<\/li> <li>Advanced problem-solving skills<\/li> <li>Very curious<\/li> <li>Teamwork and git<\/li> <\/ul>  <p><strong>Appreciated skills:<\/strong><\/p> <ul> <li>Scientific curiosity<\/li> <li>Good communication skills<\/li> <li>Advanced English&nbsp;<\/li> <\/ul> <p><br \/><strong>Type of work:<\/strong> 10% theory analysis, 90% design and simulation<\/p><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Davide Schiavone, Dr. Christoph Mueller, Prof. David Atienza, Prof. Andreas Burg<br> Contact email: <a href='mailto:davide.schiavone@epfl.ch;christoph.mueller@epfl.ch;david.atienza@epfl.ch;andreas.burg@epfl.ch?subject=Porting X-HEEP to a Lattice FPGA with open-source EDA tools'>davide.schiavone@epfl.ch;christoph.mueller@epfl.ch;david.atienza@epfl.ch;andreas.burg@epfl.ch<\/a><br>\";<\/script>\n<script>var project636minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Davide Schiavone, Dr. Christoph Mueller, Prof. David Atienza, Prof. Andreas Burg<br>\";<\/script>\n<span id=project636><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Davide Schiavone, Dr. Christoph Mueller, Prof. David Atienza, Prof. 
Andreas Burg<br> <a href=#_ onclick=opendesc('project636',project636); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><img width=27 src=https:\/\/eslweb.epfl.ch\/img\/1pixel.gif><\/td><td><img width=27 src=https:\/\/eslweb.epfl.ch\/img\/1pixel.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><a href=https:\/\/www.epfl.ch\/labs\/esl\/research\/systems-on-chip\/x-heep\/ target=_blank title='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/201.png width=70 alt='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><\/a><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor635><\/a><b><span style='font-size: 20px;'>Testing and verification of a fabricated IMC memory<\/b><br><script>var project635=\"<div class='entry-content'> \t\t\t <p>Efficiently computing complex AI-based workloads on edge devices is a  challenge that industrial and academic groups try to overcome with  various techniques. Among them, computing part of the algorithms in  memory subsystems appears as a competitive solution. This is called In  and Near Memory Computing (IMC, NMC). Focusing on IMC, one of the main  limiting factors resides in the complexity of the design process, as the  subarray itself must be modified. 
A completely custom memory subarray  has been handcrafted in the Heepocrates chip, designed in 65nm CMOS  technology. This is a tedious and error-prone process.<\/p>    <p>This project proposes to characterize the memory and, if this can be  done quickly, contribute to the development of an IMC memory compiler.<\/p>    <p>The memory characterization will be done using a RISC-V core to  control the memory test structure. First, we will characterize the test  structure to ensure its full functionality. Then, we will test the memory: (i)  perform functional tests to verify its functionality; (ii) write  code to run performance and power tests on the memory; (iii) generate  shmoo plots. The same tests will be repeated on several chips to enable statistical  analysis.<\/p>    <p>Then, depending on the available time, we propose to explore the  automation of the physical design of a similar IMC array in 65nm. Python  scripts shall automatically generate a memory subarray in a GDSII file.<\/p>    <p>The project will be carried out at the <a href='https:\/\/www.epfl.ch\/labs\/esl\/'>Embedded Systems Laboratory (ESL)<\/a>  of the Swiss Federal Institute of Technology Lausanne (EPFL), one of  the world&rsquo;s top-class universities. ESL is an active group that is  involved in many research aspects. The student will be under the  supervision of Dr. Alexandre Levisse and Prof. David Atienza.<\/p>    <p><strong>Project objectives:<\/strong><\/p>    <ol class='wp-block-list'><li>Understanding of the BLADE subarray architecture, schematic, and  floorplan. Specifically, it is important to explicitly understand the  operations that can be done in the memory. Understanding of the  characterization structure.<\/li><li>Getting familiar with the test structure developed to  test the memory. Write some basic code and verify the functionality of  the test structure.<\/li><li>Running tests on the memory: (i) test the functionality. (ii) test  the speed. 
(iii) characterize the power for each operation. (iv) extract  the maximum frequency at various voltages. (v) check several chips. <ol class='wp-block-list'><li>This is mandatory to pass (grade 4). The quality of the generated  code and the efficacy of the student, put in context with the challenges  faced during the project, could positively influence the grade.<\/li><\/ol> <\/li><li>Then, if time allows, we will look into the automation of the array  generation with Python scripts, i.e., generating an arbitrary-size SRAM  array with a Python script using the Nazca library.<\/li><\/ol>    <p><strong>Required knowledge and skills:<\/strong><\/p>    <ul class='wp-block-list'><li>C and Python<\/li><li>Linux environment<\/li><li>Good understanding of memory architectures<\/li><li>Advanced knowledge of digital and analog circuit design<\/li><li>Good analytical skills<\/li><li>Good background in computer architecture<\/li><\/ul>    <p><strong>Appreciated skills:<\/strong><\/p>    <ul class='wp-block-list'><li>Scientific curiosity<\/li><li>Good communication skills<\/li><li>Advanced English<\/li><li>Ability to work autonomously<\/li><li>Teamwork<\/li><\/ul>    <p><strong><u>Type of work:<\/u><\/strong> 20% theory analysis, 80% design and simulation<\/p> \t\t<\/div><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Alexandre Levisse, Prof. David Atienza<br> Contact email: <a href='mailto:alexandre.levisse@epfl.ch;david.atienza@epfl.ch?subject=Testing and verification of a fabricated IMC memory'>alexandre.levisse@epfl.ch;david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project635minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Alexandre Levisse, Prof. David Atienza<br>\";<\/script>\n<span id=project635><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Alexandre Levisse, Prof. 
David Atienza<br> <a href=#_ onclick=opendesc('project635',project635); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor632><\/a><b><span style='font-size: 20px;'>Deployment of probabilistic deep learning methods for context-aware and robust Heart Rate extraction in constrained wearables<\/b><br><script>var project632=\"<div class='entry-content'> \t\t\t\t\t <p>Photoplethysmography (PPG) has emerged as a cost-effective  alternative to electrocardiography (ECG) and is now prevalent in a  variety of commercially available wearable devices. This presents a  practical solution for long-term patient monitoring. However, PPG  signals are more susceptible to interference from body motion than ECG,  potentially distorting crucial information about heart functionality,  such as the blood volume pulse (BVP) morphology. 
To address these motion  artifacts (MA), a plethora of methods have been proposed, with a  predominant focus on accurately extracting heart rate (HR) from PPG. One  proposed solution leverages Deep Learning to integrate motion artifact removal and heart rate estimation  into a single model. This model is designed to simultaneously  perform direct heart rate inference and minimize the effects of motion  artifacts. While these methods aim to diminish the impact of motion  artifacts on the final heart rate estimates, it is noteworthy that  no explicit source separation task is defined during training. Instead,  the implication is that by minimizing the HR-inference loss, the  solution achieved will inherently extract heart rate solely from the  disentangled heart component. This implicit assumption is false.<\/p>    <p>On this basis, the goal of this project is to employ a novel  probabilistic deep learning methodology, which has been designed  and developed at the Embedded Systems Laboratory (ESL) at EPFL.  Specifically, a design space exploration will be performed, given the  restrictions of the application and the HW\/SW capabilities of the  available platform. This will result in a clear understanding of the  bottlenecks and limitations of the proposed deep learning architecture  when deploying it on extreme-edge ultra-low-power devices.<\/p>    <p>Throughout the project, the student will learn:<\/p>    <ul class='wp-block-list'><li>How to interact with cutting-edge Systems-on-Chip.<\/li><li>How to deal with embedded systems (memories, peripherals, etc.).<\/li><li>How to write optimized C code.<\/li><li>How to design and develop efficient embedded APIs.<\/li><li>How to properly debug embedded applications.<\/li><li>How to work with git repositories.<\/li><li>How to interface with the other members of the team (machine learning, etc.) 
contributing to the project.<\/li><\/ul>    <p>The project will be carried out at the ESL at EPFL, one of the  world&rsquo;s top-class universities, with technical support from EcoCloud.  ESL is an active group (24 Ph.D. students among 45 members) involved in  many research aspects. The student will be under the supervision of Christodoulos Kechris, MSc, Dr. Jos&eacute; Miranda, and Dr. Jonathan Dan as key  senior daily supervisors, and Prof. David Atienza.<\/p>    <p><strong>Project objectives:<\/strong><\/p>    <ol class='wp-block-list'><li>Understanding the current deep learning model(s) to be deployed, and  proposing a step-by-step\/white-box plan for deploying them  on a constrained embedded platform.<\/li><li>Developing and integrating the different stages of the probabilistic deep learning pipeline.<\/li><li>Validating the previous point using specific publicly available data.<\/li><li>Analysis of both the memory usage and the inference time of the resulting implementation.<\/li><li>[Optional]. Utilization and analysis of different HW\/SW optimisations to improve the metrics of the previous point.<\/li><li>[Optional]. Plugging in the already implemented first stages of the architecture and validation of the whole system.<\/li><li>[Optional]. 
Testing the architecture using a real Body Area Network system available at the lab (e.g.: VersaSens).<\/li><\/ol>    <p><strong>Required knowledge and skills:<\/strong><\/p>    <ul class='wp-block-list'><li>Low-level software design (C and\/or C++ is going to be used throughout the project)<\/li><li>Good understanding of memory architectures and microcontrollers<\/li><li>Good analytical skills<\/li><li>Good background in computer architecture and algorithms<\/li><li>Teamwork and git<\/li><\/ul>    <p><strong>Appreciated skills:<\/strong><\/p>    <ul class='wp-block-list'><li>Scientific curiosity<\/li><li>Good communication skills<\/li><li>Advanced English<\/li><\/ul>    <p><strong>Type of work:<\/strong> 10% theory analysis, 90% design and simulation<\/p> \t\t\t\t\t\t\t\t\t<\/div><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Christodoulos Kechris, Dr. Jose Miranda, Dr. Jonathan Dan, Prof. David Atienza <br> Contact email: <a href='mailto:christodoulos.kechris@epfl.ch; jose.mirandacalero@epfl.ch; jonathan.dan@epfl.ch; david.atienza@epfl.ch?subject=Deployment of probabilistic deep learning methods for context-aware and robust Heart Rate extraction in constrained wearables'>christodoulos.kechris@epfl.ch; jose.mirandacalero@epfl.ch; jonathan.dan@epfl.ch; david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project632minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Christodoulos Kechris, Dr. Jose Miranda, Dr. Jonathan Dan, Prof. David Atienza <br>\";<\/script>\n<span id=project632><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Christodoulos Kechris, Dr. Jose Miranda, Dr. Jonathan Dan, Prof. 
David Atienza <br> <a href=#_ onclick=opendesc('project632',project632); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><img width=27 src=https:\/\/eslweb.epfl.ch\/img\/1pixel.gif><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor594><\/a><b><span style='font-size: 20px;'>Towards Audio-Visual Speech Separation and Recognition<\/b><br><script>var project594=\"<div class='entry-content'>   <p>Human-computer interaction has evolved rapidly in recent years thanks  to hardware developments enabling complex Deep Neural Networks (DNNs)  to perform machine learning tasks. These scientific advancements have led to  the use of Automatic Speech Recognition (ASR) and Visual Speech  Recognition (VSR) in human-machine communication, with the former  being the task of converting speech into written words  [2], and the latter the conversion of lip movements into written  words (i.e., lipreading) [6]. In the context of voice-controlled  devices, it is also of utmost importance to consider speaker  separation [7]. 
This aims to mimic the cocktail party effect present in  humans, which denotes our capability of focusing on an individual  conversation whilst filtering out other discussions and surrounding  noises. This ensures that only a designated individual can control one  target device at a time. On this basis, the <a href='https:\/\/www.epfl.ch\/labs\/esl\/'>Embedded Systems Laboratory<\/a>  at Ecole Polytechnique F&eacute;d&eacute;rale de Lausanne (EPFL) together with the  Integrated Systems Laboratory at Eidgen&ouml;ssische Technische Hochschule  Z&uuml;rich (ETH) are developing different platforms, tools, and frameworks  to tackle next-generation IoT audio devices. In the current work, we  investigate the potential of performing Audio-Visual Speech Separation  and Recognition (AVSSR) using sensor fusion [3, 4, 8]. Through  multi-task learning, we merge Audio-Visual Speech Separation (AVSS) and  Audio-Visual Speech Recognition (AVSR). We propose to fuse audio and  visual information, thus increasing the data dimensionality and,  implicitly, the amount of useful information. The model trained in this way would  then provide the target user(s) with the transcript of their respective  speech. Lastly, the proposed system must abide by the TinyML [5]  constraints of edge devices, namely reduced memory and storage  requirements, as well as real-time operation on low-power,  battery-operated devices.<\/p>             <p>The project will be carried out at the ESL at EPFL, one of the  world&rsquo;s top-class universities. ESL is an active group (24 Ph.D.  students among 45 members) involved in many research aspects. The  student will be under the supervision of Dr. Jos&eacute; Miranda, Mr. Cristian  Cioflan, and Dr. Miguel de Prado as key senior daily supervisors, and  Prof. 
David Atienza.<\/p>    <p><strong>Project objectives:<\/strong><\/p>    <ol><li>Familiarize yourself with the project specifics (1-2 Weeks) <ol style='list-style-type: lower-alpha'><li>Learn about DNN training and PyTorch, how to visualize results with TensorBoard.<\/li><li>Read up on data fusion and multimodal learning, common approaches and recent advances on the topic.<\/li><li>Read up on multi-task learning in the context of audio-visual time series.<\/li><li>Read up on DNN models aimed at time series (e.g., TCNs, TASMs,  Transformer and Conformer networks) and the recent advances in  AVSR\/AVSS.<\/li><\/ol> <\/li><li>Propose and evaluate AVSSR topologies (4-6 weeks) <ol style='list-style-type: lower-alpha'><li>Considering state-of-the-art works on AVSS[9] and AVSR[10][11][12]  and previous IIS projects, propose and implement AVSSR architectures.<\/li><li>Propose evaluation metrics and novel loss functions; analyse the models&rsquo; performance on the GRID dataset [1]<\/li><\/ol> <\/li><li>Optimize proposed models considering TinyML constraints (2-3 weeks) <ol style='list-style-type: lower-alpha'><li>Reduce the models&rsquo; hardware-associated costs (i.e., memory, storage,  computational complexity), evaluating the trade-offs on the proposed  metrics.<\/li><li>(Only if conducted as a Master&rsquo;s thesis) Deploy the proposed architecture on novel ultra-low-power platforms.<\/li><\/ol> <\/li><li>(Optional) Dataset generalization and ablation study (1-2 Weeks) <ol style='list-style-type: lower-alpha'><li>Investigate alternative datasets for AVSSR.<\/li><li>Propose and implement dataset modifications to enable multi-speaker AVSSR.<\/li><li>Evaluate and compare multimodal learning against audio- and video-only learning.<\/li><li>Evaluate and compare speaker-overlapping and speaker-disjoint training and testing.<\/li><\/ol> <\/li><li>Gather and Present Final Results (2-3 Weeks): Write a final report.  
Include all major decisions taken during the design process and justify  your choices. Include everything that deviates from the standard  case, and show everything that took time to figure out and all your  ideas that have influenced the project.<\/li><\/ol>    <p><strong>Required knowledge and skills:<\/strong><\/p>    <ul><li>Python programming<\/li><li>Deep learning knowledge<\/li><li>Automatic speech recognition understanding\/experience<\/li><li>Strong background in computer science<\/li><li>Teamwork and git<\/li><\/ul>    <p><strong>Appreciated skills:<\/strong><\/p>    <ul><li>Scientific curiosity, good communication skills, and advanced English<\/li><\/ul>    <p><strong>Type of work:<\/strong> 40% research and theory analysis, 60% development and implementation<\/p>    <p><strong>References<\/strong><\/p>    <p>[1]&nbsp;Mishaim Malik, Muhammad Kamran Malik, Khawar Mehmood, and Imran Makhdoom.&nbsp;Automatic speech recognition: a survey.&nbsp;2021.<\/p>    <p>[2]&nbsp;Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu, Meng  Yu, Dong Yu, and Jesper Jensen.&nbsp;An overview of deep-learning-based  audio-visual speech enhancement and separation.&nbsp;2020.<\/p>    <p>[3]&nbsp;Liliane Momeni, Triantafyllos Afouras, Themos Stafylakis, Samuel  Albanie, and Andrew Zisserman.&nbsp;Seeing wake words: Audio-visual keyword  spotting.&nbsp;2020.<\/p>    <p>[4]&nbsp;Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev  Khudanpur.&nbsp;Librispeech: An ASR corpus based on public domain audio  books.&nbsp;2015.<\/p>    <p>[5]&nbsp;Vijay Janapa Reddi, Christine Cheng, David Kanter, Peter Mattson,  Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, Maximilien  Breughe, Mark Charlebois, William Chou, Ramesh Chukka, Cody Coleman, Sam  Davis, Pan Deng, Greg Diamos, Jared Duke, Dave Fick, J. Scott Gardner,  Itay Hubara, Sachin Idgunji, Thomas B. Jablin, Jeff Jiao, Tom St. 
John,  Pankaj Kanwar, David Lee, Jeffery Liao, Anton Lokhmotov, Francisco  Massa, Peng Meng, Paulius Micikevicius, Colin Osborne, Gennady  Pekhimenko, Arun Tejusve Raghunath Rajan, Dilip Sequeira, Ashish  Sirasao, Fei Sun, Hanlin Tang, Michael Thomson, Frank Wei, Ephrem Wu,  Lingjie Xu, Koichi Yamada, Bing Yu, George Yuan, Aaron Zhong, Peizhao  Zhang, and Yuchen Zhou.&nbsp;MLPerf inference benchmark.&nbsp;2020.<\/p>    <p>[6]&nbsp;Changchong Sheng, Gangyao Kuang, Liang Bai, Chenping Hou, Yulan  Guo, Xin Xu, Matti Pietik&auml;inen, and Li Liu.&nbsp;Deep learning for visual  speech analysis: A survey.&nbsp;2022.<\/p>    <p>[7]&nbsp;DeLiang Wang and Jitong Chen.&nbsp;Supervised speech separation based on deep learning: An overview.&nbsp;2018.<\/p>         <\/div><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Mr. Cristian Cioflan, Dr. Miguel de Prado, Dr. Jose Miranda, Prof. David Atienza<br> Contact email: <a href='mailto:cioflanc@iis.ee.ethz.ch; miguel.deprado@verses.ai; jose.mirandacalero@epfl.ch; david.atienza@epfl.ch?subject=Towards Audio-Visual Speech Separation and Recognition'>cioflanc@iis.ee.ethz.ch; miguel.deprado@verses.ai; jose.mirandacalero@epfl.ch; david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project594minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Mr. Cristian Cioflan, Dr. Miguel de Prado, Dr. Jose Miranda, Prof. David Atienza<br>\";<\/script>\n<span id=project594><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Mr. Cristian Cioflan, Dr. Miguel de Prado, Dr. Jose Miranda, Prof. 
David Atienza<br> <a href=#_ onclick=opendesc('project594',project594); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor591><\/a><b><span style='font-size: 20px;'>Implementation of automation techniques for IMC arrays<\/b><br><script>var project591=\"<div class='entry-content mb-5'> \t\t <p>Efficiently computing complex AI-based workloads on edge devices is a  challenge that industrial and academic groups try to overcome with  various techniques. Among them, computing part of the algorithms in  memory subsystems appears as a competitive solution. This is called In  and Near Memory Computing (IMC, NMC). Focusing on IMC, one of the main  limiting factors resides in the complexity of the design process, as the  subarray itself must be modified. In the Heepocrates chip, designed in  65nm CMOS technology, a completely custom memory subarray has been  handcrafted. 
This is a tedious and error-prone process.<\/p> <p>This project proposes to lay the foundations of an IMC memory  compiler by exploring the automation of some parts of the subarray,  specifically the decoder and the controller. Starting from an HDL  definition of these blocks, the objective of the project is to  parametrize them, automate their physical design, and integrate them  inside the subarray. Finally, if time allows, the use  of open-source tool flows could be explored.<\/p> <p>The project will be carried out at the <a href='https:\/\/www.epfl.ch\/labs\/esl\/'>Embedded Systems Laboratory (ESL)<\/a>  of the Swiss Federal Institute of Technology Lausanne (EPFL), one of  the world&rsquo;s top-class universities. ESL is an active group involved in  many research aspects. The student will be under the supervision of  Prof. David Atienza and Dr. Alexandre Levisse.<\/p>  <p><strong>Project objectives:<\/strong><\/p> <ol><li>Understanding of the BLADE subarray architecture, schematic, and  floorplan, in particular the details of the decoder and  subarray controller. Definition of the top entities (inputs, outputs,  functional behavior, metrics).<\/li><li>Design of the RTL code of the decoder. Synthesis and PnR under physical  constraints to fit it in the available area enclosure and pitch-match  the outputs with the array drivers. Verification with SPICE  simulations.<\/li><li>Utilization of the same flow on the subarray controller. Definition  of timing constraints. Floorplan updates. 
Update of the array schematic  and layout.<\/li><li>If time allows, the project targets the utilization of open source synthesis and PnR tools in the flow.<\/li><\/ol>  <p><strong>Required knowledge and skills:<\/strong><\/p> <ul><li>Good understanding of memory architectures<\/li><li>Advanced knowledge of digital and analog circuit design<\/li><li>Good analytical skills<\/li><li>Good background in computer architecture<\/li><\/ul>  <p><strong>Appreciated skills:<\/strong><\/p> <ul><li>Scientific curiosity<\/li><li>Good communication skills<\/li><li>Advanced English<\/li><li>Autonomous workability<\/li><li>Teamwork<\/li><\/ul>  <p><strong><u>Type of work:<\/u><\/strong> &nbsp;&nbsp;&nbsp;&nbsp; 20% theory analysis, 80% design and simulation<\/p> \t<\/div><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Alexandre Levisse, Prof. David Atienza <br> Contact email: <a href='mailto:alexandre.levisse@epfl.ch; david.atienza@epfl.ch?subject=Implementation of automation techniques for IMC arrays'>alexandre.levisse@epfl.ch; david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project591minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Alexandre Levisse, Prof. David Atienza <br>\";<\/script>\n<span id=project591><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Dr. Alexandre Levisse, Prof. 
David Atienza <br> <a href=#_ onclick=opendesc('project591',project591); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor589><\/a><b><span style='font-size: 20px;'>Deployment of an unsupervised deep adaptive filtering method for context-aware and robust Heart Rate extraction in constrained wearables<\/b><br><script>var project589=\"<p>Photoplethysmography (PPG) has emerged as a cost-effective alternative to electrocardiography (ECG) and is now prevalent in a variety of commercially available wearable devices. This presents a practical solution for long-term patient monitoring. However, PPG signals are more susceptible to interference from body motion than ECG, potentially distorting crucial information about heart functionality, such as the blood volume pulse (BVP) morphology. To address these motion artifacts (MA), a plethora of methods have been proposed, with a predominant focus on accurately extracting heart rate (HR) from PPG. 
A proposed solution involves leveraging Deep Learning to integrate both steps into a single model. This model is designed to simultaneously perform direct heart rate inference and minimize the effects of motion artifacts. While these methods aim to diminish the impact of motion artifacts on the final heart rate estimates, it\u2019s noteworthy that no explicit source separation task is outlined during training. Instead, the implication is that by minimizing the HR-inference loss, the solution achieved will inherently extract heart rate solely from the disentangled heart component. However, this implication does not hold.&nbsp;<\/p> <p>On this basis, the goal of this project is to take an unsupervised deep adaptive filtering method, which has been designed and developed at the Embedded Systems Laboratory (ESL) at EPFL, and deploy it on heterogeneous SoCs based on the RISC-V ISA. One example is X-HEEP (eXtendable Heterogeneous Energy-Efficient Platform), which is an open-source, configurable, and extensible single-core RISC-V 32b MCU, sponsored by the EcoCloud Sustainable Computing center of EPFL. It is based on many third-party open-source IPs and in-house IPs developed at ESL jointly with other EPFL laboratories. X-HEEP provides a framework to run applications compiled for RISC-V on a simulator (Verilator, Questasim, or VCS) or on a Xilinx FPGA, and it can be implemented in silicon as well. Specifically, a design space exploration will be performed, given the restrictions of the application and the HW\/SW capabilities of the platform. 
This will result in a clear understanding of the bottlenecks and limitations of the proposed deep learning architecture when deploying it on extreme-edge ultra-low-power devices.<\/p> <p>Throughout the project, the student will learn:<\/p> <ul> <li>How to interact with cutting-edge System-on-Chips.<\/li> <li>How to deal with embedded systems (memories, peripherals, etc).<\/li> <li>How to write optimized C code.<\/li> <li>How to design and develop efficient embedded APIs.<\/li> <li>How to properly debug embedded applications.<\/li> <li>How to work with git repositories.<\/li> <li>How to interface with the other members of the team (machine learning, etc.) contributing to the project.<\/li> <\/ul> <p>The project will be carried out at the ESL at EPFL, one of the world\u2019s top-class universities, with EcoCloud\u2019s technical support. ESL is an active group (24 Ph.D. students among 45 members) involved in many research aspects. The student will be under the supervision of MSc. Christodoulos Kechris, Dr. Jos\u00e9 Miranda, and Dr. Jonathan Dan, as key senior daily supervisors,&nbsp; and Prof. David Atienza.&nbsp;<\/p> <p><strong>Project objectives:<\/strong><\/p> <ol> <li>Understanding the current unsupervised adaptive deep learning model to be deployed.<\/li> <li>Developing and integrating the initial stages of the deep learning architecture into X-HEEP.<\/li> <li>Validation of the previous step using specific publicly available data.<\/li> <li>Developing and integrating the latter stages of the architecture.<\/li> <li>Validation of the whole system.<\/li> <li>Analysis of both memory usage and training\/inference timings for the resulting implementation.&nbsp;<\/li> <li>[Optional]. 
Utilization and analysis of different HW\/SW optimisations to improve the metrics of the previous point.<\/li> <\/ol> <p><strong>Required knowledge and skills:<\/strong><\/p> <ul> <li>Low-level software design (C and\/or C++ is going to be used throughout the project)<\/li> <li>Good understanding of memory architectures and microcontrollers<\/li> <li>Good analytical skills<\/li> <li>Good background in computer architecture and algorithms<\/li> <li>Teamwork and git<\/li> <\/ul> <p><strong>Appreciated skills:<\/strong><\/p> <ul> <li>Scientific curiosity<\/li> <li>Good communication skills<\/li> <li>Advanced English&nbsp;<\/li> <\/ul> <p><strong>Type of work:<\/strong> 5% theory analysis, 95% design and simulation<\/p><br><b>Lab: <\/b>ESL <br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Christodoulos Kechris MSc, Dr. Jose Miranda, Dr. Jonathan Dan, Prof. David Atienza<br> Contact email: <a href='mailto:christodoulos.kechris@epfl.ch;jose.mirandacalero@epfl.ch;jonathan.dan@epfl.ch;david.atienza@epfl.ch?subject=Deployment of an unsupervised deep adaptive filtering method for context-aware and robust Heart Rate extraction in constrained wearables'>christodoulos.kechris@epfl.ch;jose.mirandacalero@epfl.ch;jonathan.dan@epfl.ch;david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project589minus=\"<b>Lab: <\/b>ESL <br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Christodoulos Kechris MSc, Dr. Jose Miranda, Dr. Jonathan Dan, Prof. David Atienza<br>\";<\/script>\n<span id=project589><b>Lab: <\/b>ESL <br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Christodoulos Kechris MSc, Dr. Jose Miranda, Dr. Jonathan Dan, Prof. 
David Atienza<br> <a href=#_ onclick=opendesc('project589',project589); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><a href=# title='theoretical project'><img width=30 alt='theoretical' src=https:\/\/eslweb.epfl.ch\/projects\/images\/t.gif hspace=2 border=0><\/a><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><a href=https:\/\/www.epfl.ch\/labs\/esl\/research\/systems-on-chip\/x-heep\/ target=_blank title='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/201.png width=70 alt='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><\/a><\/td><\/tr><td width=10 rowspan=2 valign=top><\/td><td rowspan=2 valign=top><a style='position: relative; top:-30px' name=anchor588><\/a><b><span style='font-size: 20px;'>Design of efficient Convolutional Layers for Near-Memory Computing IPs based on RISC-V<\/b><br><script>var project588=\"<p><em>Artificial Intelligence<\/em> (AI) has been one of the most dominant factors driving technology innovation over the last decade. Cars that drive autonomously, cameras that recognize faces, microphones that recognize voice commands, and wearable health monitors that detect epilepsy attacks are only a few examples born from the AI revolution. 
As silicon devices become smaller and faster, system-on-chips (SoCs) become more and more complex, enabling pocket-size, wearable, battery-powered systems to compute millions\/billions of operations per second in a limited power budget. To support a large variety of rapidly changing <em>AI<\/em>-backed applications, software programmable SoCs are usually preferred to more specialized architectures thanks to their versatility and short time-to-market.<\/p> <p>One of the main performance and energy limitations of current SoCs resides in the high memory traffic generated by traditional Von Neumann systems, which is one of the critical bottlenecks to solve for the next generation of smart systems.<\/p> <p>One promising idea to overcome this limitation is to bring computation into the memory subsystem to use the available bandwidth more efficiently and leverage data reutilization.<br>Such a computational paradigm is known as In-Memory (or Near-Memory) computing.<\/p> <p>The <a href='https:\/\/www.epfl.ch\/labs\/esl\/'>Embedded Systems Laboratory (ESL)<\/a> at the Swiss Federal Institute of Technology Lausanne (EPFL) has developed a TSMC 65nm low-power SRAM-based programmable near-memory computing architecture (known as Carus) capable of computing operations (such as addition, AND, OR, XOR, multiplication, etc.) between two words within the memory layout without moving the operands out into processing elements outside the memory. This reduces the number of bus transactions, resulting in lower energy consumption when compared to Von Neumann architectures such as CPUs.<\/p> <p>Carus works in two different modes: memory mode, where it behaves exactly like a normal memory; and computing mode, where Carus executes previously loaded programs, typically written in RISC-V assembly.<\/p> <p>Carus has been integrated into a 32b microcontroller called X-HEEP (eXtendable Heterogeneous Energy-Efficient Platform). 
X-HEEP is an open-source, configurable, and extensible single-core RISC-V 32b MCU, sponsored by the EcoCloud Sustainable Computing center of EPFL. It is based on many third-party open-source IPs and in-house IPs developed at the Embedded Systems Laboratory (ESL) jointly with other EPFL laboratories.<\/p> <p>X-HEEP provides a framework to run applications compiled for RISC-V on a simulator (Verilator, Questasim, or VCS) or on a Xilinx FPGA, and it can be implemented in silicon as well.<\/p> <p>To execute kernels, the system RISC-V CPU sets Carus into computing mode, loads the appropriate Carus program, copies the input data to Carus\u2019s memory (leveraging the DMA), and finally triggers Carus execution.<\/p> <p>As AI is dominating edge-computing workloads, Carus has been designed to run convolutional layers efficiently. However, due to the number of parameters, such as the size of the inputs and weight filters, number of channels, filters, padding, strides, data types, etc., optimally partitioning and mapping such layers is not trivial, and requires an in-depth study and characterization of the impact of the available data-flow variants (input, output, or weight stationary), tiling strategies, etc.<\/p> <p>Therefore, we propose to develop a library of computing kernels that support different parameters and data flows to efficiently deploy convolutional layers on the system by leveraging Carus\u2019s computing capabilities.  Such a library takes as input the activation and filter dimensions and other parameters such as stride, padding, data-flow, etc. Then, the system CPU applies an appropriate tiling to the tensors to maximise the exploitation of Carus\u2019s available memory and computing bandwidth. 
The goal is to profile Carus\u2019s performance under different parameters and build a performance and energy model of the IP when executing convolutional layers.<\/p> <p>Throughout the project, the student will learn:<\/p> <ul> <li>How to partition and optimize Convolutional Layer software implementations using the system CPU.<\/li> <li>How to design and optimize Convolutional Layers with the Carus IP.<\/li> <li>How to analyze, profile, and improve the performance of the designed library.<\/li> <li>How to model a system to estimate its performance given a set of input parameters and constraints.<\/li> <li>How to work with version control (Git) and third-party, open-source repositories and tools.<\/li> <li>How to work in a team of people all contributing to the same project.<\/li> <\/ul> <p>The project will be carried out at the ESL at EPFL, one of the world\u2019s top-class universities. ESL is an active group (24 Ph.D. students among 45 members) involved in many research aspects, therefore providing a stimulating research environment. The student will be under the supervision of Prof. David Atienza, Dr. Davide Schiavone, and Mr. Luigi Giuffrida.<\/p> <p><strong>Project objectives:<\/strong><\/p> <ol> <li>Understand the architecture and working principles of the X-HEEP microcontroller, and learn how IPs are integrated into it.<\/li> <li>Understand the Carus memory architecture and learn how it is connected and leveraged in the X-HEEP\/HEEPerator MCU.<\/li> <li>Develop a C implementation of a tiled Convolutional Layer using the system CPU, supporting different data flows and leveraging the existing kernel implementations to offload computation to Carus. 
Analyze the performance.<\/li> <li>Develop a Carus assembly implementation of a Convolutional Layer using different data flows, and analyze the performance.<\/li> <li>Validate, profile, and analyze the new software library on HEEPerator when using it to run real-world neural networks.<\/li> <\/ol> <p>Throughout the project, the student will learn:<\/p> <ul> <li>How to optimize C code targeting execution speed on near-memory devices.<\/li> <li>How to write assembly code for a custom computing engine based on RISC-V.<\/li> <li>The relevance of hardware-software co-design and its impact on performance.<\/li> <li>Different profiling and characterization techniques.<\/li> <li>How to efficiently analyze and model experimental results.<\/li> <\/ul> <p><strong>Required knowledge and skills:<\/strong><\/p> <ul> <li>Low-level, embedded system C programming.<\/li> <li>Strong analytical skills.<\/li> <li>Strong background in computer architecture and algorithms.<\/li> <li>Teamwork and version control using Git.<\/li> <\/ul> <p><strong>Appreciated skills:<\/strong><\/p> <ul> <li>Scientific curiosity, good communication skills, and advanced English.<\/li> <\/ul> <p><strong>Type of work:<\/strong><\/p> <p>10% research and theory analysis, 60% development and implementation, 30% analysis of results.<\/p><br><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Mr. Luigi Giuffrida, Dr. Davide Schiavone, Prof. David Atienza<br> Contact email: <a href='mailto:luigi.giuffrida@epfl.ch;davide.schiavone@epfl.ch;david.atienza@epfl.ch?subject=Design of efficient Convolutional Layers for Near-Memory Computing IPs based on RISC-V'>luigi.giuffrida@epfl.ch;davide.schiavone@epfl.ch;david.atienza@epfl.ch<\/a><br>\";<\/script>\n<script>var project588minus=\"<b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Mr. Luigi Giuffrida, Dr. Davide Schiavone, Prof. 
David Atienza<br>\";<\/script>\n<span id=project588><b>Lab: <\/b>ESL<br><b>Sections: <\/b>SEL<br><b>Supervisor<\/b>: Mr. Luigi Giuffrida, Dr. Davide Schiavone, Prof. David Atienza<br> <a href=#_ onclick=opendesc('project588',project588); style='position:relative;z-index:99;'>[read&nbsp;on]<\/a><\/span><br><hr style='border-top: 1px solid #999; color: #999; background-color: #fff; height: 1px; margin: 20px 0;'><br><\/td><td valign=top width=100><table cellpadding=0 cellspacing=0 border=0><td><a href=# title='experimental project'><img alt='experimental project' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/e.gif hspace=2 border=0><\/a><\/td><td><img width=27 src=https:\/\/eslweb.epfl.ch\/img\/1pixel.gif><\/td><td><a href=# title='computational project'><img alt='computational' width=30 src=https:\/\/eslweb.epfl.ch\/projects\/images\/c.gif hspace=2 border=0><\/a><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/docnolink.gif><\/td><td><img width=30 hspace=2 src=https:\/\/eslweb.epfl.ch\/projects\/images\/webnolink.gif><\/td><\/table><\/td><\/tr><tr><td><a href=https:\/\/www.epfl.ch\/labs\/esl\/research\/systems-on-chip\/x-heep\/ target=_blank title='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><img src=https:\/\/eslweb.epfl.ch\/img\/collaborations\/industry\/201.png width=70 alt='eXtendable Heterogeneous Energy-Efficient Platform - EPFL'><\/a><\/td><\/tr><\/table>\n","protected":false},"excerpt":{"rendered":"<p>Edit Master 
Projects<\/p>\n","protected":false},"author":7,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-119","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/eslweb.epfl.ch\/index.php?rest_route=\/wp\/v2\/pages\/119","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/eslweb.epfl.ch\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/eslweb.epfl.ch\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/eslweb.epfl.ch\/index.php?rest_route=\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/eslweb.epfl.ch\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=119"}],"version-history":[{"count":3,"href":"https:\/\/eslweb.epfl.ch\/index.php?rest_route=\/wp\/v2\/pages\/119\/revisions"}],"predecessor-version":[{"id":138,"href":"https:\/\/eslweb.epfl.ch\/index.php?rest_route=\/wp\/v2\/pages\/119\/revisions\/138"}],"wp:attachment":[{"href":"https:\/\/eslweb.epfl.ch\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=119"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}